Course Schedule

Note: this schedule is tentative and subject to change. Please check it regularly.

Last Updated: 01/13/2025

Week | Starting | Topics | Reading | Notes
1 | 1/13/25 | M: Lec1 - Course intro; Big Data | Week 1 | Mon: HW1 out
 |  | W: Lec2 - Hadoop Ecosystem |  |
 |  | F: Lec3 - HDFS |  |
2 | 1/20/25 | M: MLK Day; No Classes | Week 2 |
 |  | W: Lec4 - MapReduce |  | Tue: HW1 due
 |  | F: Dr. Li on travel; No Classes |  |
3 | 1/27/25 | M: PE1 - MapReduce Exercise | Week 3 | Mon: HW2 out
 |  | W: Lab 1 |  | Team formation due
 |  | F: Lec5 - Apache Pig |  |
4 | 2/3/25 | M: PE2 - Pig Exercise | Week 4 |
 |  | W: Lab 2 |  |
 |  | F: Lec6 - Apache Spark |  | Finalize project topic and dataset
5 | 2/10/25 | M: Lec7 - Spark deployment on GCP | Week 5 | Mon: HW2 due; HW3 out
 |  | W: Lec8 - Spark Low-level API |  |
 |  | F: Lec9 - Spark DataFrame |  |
6 | 2/17/25 | M: PE3 - Spark DataFrame | Week 6 |
 |  | W: Lab 3 |  |
 |  | F: Lec10 - Module 1 summary |  |
7 | 2/24/25 | M: Exam 1 |  | Mon: HW3 due
 |  | W: Dr. Li on travel; No Classes |  |
 |  | F: Dr. Li on travel; No Classes |  | Fri: Proposal due
8 | 3/3/25 | M: Spring Break; No Classes |  |
 |  | W: Spring Break; No Classes |  |
 |  | F: Spring Break; No Classes |  |
9 | 3/10/25 | M: Lec11 - Spark Machine Learning | Week 9 | Mon: HW4 out
 |  | W: Lec12 - Spark Linear Regression |  |
 |  | F: Lec13 - Spark Logistic Regression |  |
10 | 3/17/25 | M: Lec14 - Spark Tree Classifiers | Week 10 |
 |  | W: PE4 - Spark ML |  |
 |  | F: Lab 4 |  |
11 | 3/24/25 | M: Lec15 - Spark KMeans | Week 11 | Mon: Project milestone 1 due (40%)
 |  | W: Lec16 - NLP at Scale |  |
 |  | F: Lec17 - NLP with Spark |  |
12 | 3/31/25 | M: PE5 - ML at Scale |  | Mon: HW4 due
 |  | W: Lab 5 |  |
 |  | F: Exam 2 |  |
13 | 4/7/25 | M: Paper Presentation: Kafka | Paper 1, 2 | Mon: HW5 out
 |  | W: Paper Presentation: Spark Streaming |  |
 |  | F: PE6 - Data Streaming |  |
14 | 4/14/25 | M: Lab 6 | Paper 3, 4 | Mon: Project milestone 2 due (80%)
 |  | W: Paper Presentation: NoSQL Overview |  |
 |  | F: Easter Break; No Classes |  |
15 | 4/21/25 | M: Paper Presentation: HBase | Paper 5, 6 | Mon: HW5 due
 |  | T: SCAD Day: Volunteer Presentation |  |
 |  | W: Paper Presentation: DynamoDB |  |
 |  | F: Paper Presentation: Cassandra |  |
16 | 4/28/25 | M: Project Day; No Classes |  |
 |  | W: Project Presentation Group 1 |  | Wed by noon: presentation and demo due
 |  | F: Project Presentation Group 2 |  |
17 | 5/5/25 | M: Project Group 3: 2:30-5:30 PM |  |
 |  | W: Have a great summer break! |  | Wed: Final report and revised code due
 |  | F: Have a great summer break! |  |

Reading List

Reading # | Title
Week 1 | Hadoop in a heartbeat
Week 2 | HDFS
Week 3 | MapReduce with Python
Week 4 | Pig and Python
Week 5 | Apache Spark and RDD API
Week 6 | Spark SQL and DataFrame API
Week 9 | PySpark Machine Learning
Week 10 | Random Forest in PySpark
Week 11 | NLP with PySpark
Week 13 | Spark Structured Streaming

Paper List

Paper # | Topic
Paper 1 | Apache Kafka
Paper 2 | Spark Streaming
Paper 3 | NoSQL Databases Review
Paper 4 | Apache HBase
Paper 5 | DynamoDB
Paper 6 | Apache Cassandra