Course Schedule

Note: This schedule is tentative and subject to change. Please check it regularly.

Last Updated: 01/13/2025

Week Starting Topics Reading Notes
1 1/13/25 M: Lec1 - Course intro; Big Data Week 1 Mon: HW1 out
W: Lec2 - Hadoop Ecosystem
F: Lec3 - HDFS
2 1/20/25 M: MLK Day; No Classes Week 2
W: Lec4 - MapReduce Tue: HW1 due
F: Dr. Li on travel; No Classes
3 1/27/25 M: PE1 - MapReduce Exercise Week 3 Mon: HW2 out
W: Lab 1 Team formation due
F: Lec5 - Apache Pig
4 2/3/25 M: PE2 - Pig Exercise Week 4
W: Lab 2
F: Lec6 - Apache Spark Finalize project topic and dataset
5 2/10/25 M: Lec7 - Spark deployment on GCP Week 5 Mon: HW2 due; HW3 out
W: Lec8 - Spark Low-level API
F: Lec9 - Spark DataFrame
6 2/17/25 M: PE3 - Spark DataFrame Week 6
W: Lab 3
F: Lec10 - Module 1 summary
7 2/24/25 M: Exam 1 Mon: HW3 due
W: Dr. Li on travel; No Classes
F: Dr. Li on travel; No Classes Fri: Proposal due
8 3/3/25 M: Spring Break; No Classes
W: Spring Break; No Classes
F: Spring Break; No Classes
9 3/10/25 M: Lec11 - Spark Machine Learning Week 9 Mon: HW4 out
W: Lec12 - Spark Linear Regression
F: Lec13 - Spark Logistic Regression
10 3/17/25 M: Lec14 - Spark Tree Classifiers Week 10
W: PE4 - Spark ML
F: Lab 4
11 3/24/25 M: Lec15 - Spark KMeans Week 11 Mon: Project milestone 1 due (40%)
W: Lec16 - NLP at Scale
F: Lec17 - NLP with Spark
12 3/31/25 M: PE5 - ML at Scale Mon: HW4 due
W: Lab 5
F: Exam 2
13 4/7/25 M: Paper Presentation: Kafka Paper 1, 2 Mon: HW5 out
W: Paper Presentation: Spark Streaming
F: PE6 - Data Streaming
14 4/14/25 M: Lab 6 Paper 3, 4 Mon: Project milestone 2 due (80%)
W: Paper Presentation: NoSQL Overview
F: Easter Break; No Classes
15 4/21/25 M: Paper Presentation: HBase Paper 5, 6 Mon: HW5 due
T: SCAD Day: Volunteer Presentation
W: Paper Presentation: DynamoDB
F: Paper Presentation: Cassandra
16 4/28/25 M: Project Day; No Classes
W: Project Presentation Group 1 Wed by noon: presentation and demo due
F: Project Presentation Group 2
17 5/5/25 M: Project Presentation Group 3: 2:30 - 5:30 PM
W: Have a great summer break! Wed: Final report and revised code due
F: Have a great summer break!

Reading List

Reading # Title URL
Week 1 Hadoop in a heartbeat https://peilong.github.io/files/cs354/Reading1_Hadoop.pdf
Week 2 HDFS https://peilong.github.io/files/cs354/Reading2_HDFS.pdf
Week 3 MapReduce with Python https://peilong.github.io/files/cs354/Reading3_MR.pdf
Week 4 Pig and Python https://peilong.github.io/files/cs354/Reading4_Pig.pdf
Week 5 Apache Spark and RDD API https://peilong.github.io/files/cs354/Reading5_RDD.pdf
Week 6 Spark SQL and DataFrame API https://peilong.github.io/files/cs354/Reading6_SparkSQL.pdf
Week 9 PySpark Machine Learning https://www.projectpro.io/hadoop-tutorial/pyspark-machine-learning-tutorial
Week 10 Random Forest in PySpark https://towardsdatascience.com/a-guide-to-exploit-random-forest-classifier-in-pyspark-46d6999cb5db
Week 11 NLP with PySpark https://medium.com/@mrunmayee.dhapre/natural-language-processing-nlp-with-spark-python-f67ac513616f
Week 13 Spark Structured Streaming https://medium.com/analytics-vidhya/apache-spark-structured-streaming-with-pyspark-b4a054a7947d

Paper List

Paper # Topic URL
Paper 1 Apache Kafka https://notes.stephenholiday.com/Kafka.pdf
Paper 2 Spark Streaming https://dl.acm.org/doi/10.1145/2517349.2522737
Paper 3 NoSQL Databases Review https://dl.acm.org/doi/10.1145/1978915.1978919
Paper 4 Apache HBase https://dl.acm.org/doi/10.1145/1365815.1365816
Paper 5 DynamoDB https://dl.acm.org/doi/10.1145/1323293.1294281
Paper 6 Apache Cassandra https://dl.acm.org/doi/10.1145/1773912.1773922