Course Schedule

Note: This schedule is tentative and subject to change. Please check it regularly.

Last Updated: 04/09/2022

| Week | Starting | Topics | Reading | Notes |
|------|----------|--------|---------|-------|
| 1 | 1/16/23 | T: Lec1 - Course introduction; Intro to Big Data | Reading1_Hadoop | Mon: HW1 out. |
| | | H: Lec2 - Hadoop Ecosystem | | |
| 2 | 1/23/23 | T: Lec3 - HDFS | Reading2_HDFS | Mon: HW1 due. |
| | | H: Lec4 - MapReduce | | |
| 3 | 1/30/23 | T: PE1 - MapReduce Exercise | Reading3_MR | Mon: Quiz 1 week; HW2 out; Team formation due. |
| | | H: Lec5 - Apache Pig | | |
| 4 | 2/6/23 | T: PE2 - Pig Exercise | Reading4_Pig | Mon: Quiz 2 week. |
| | | H: Lec6 - NoSQL (Paper 1 Presentation) | | |
| 5 | 2/13/23 | T: Lec7 - HBase (Paper 2 Presentation) | Paper1 & Paper2 | Mon: HW2 due; HW3 out; Finalize project topic and dataset. |
| | | H: Lec8 - MongoDB | | |
| 6 | 2/20/23 | T: Exam 1 | Paper3_Spark | |
| | | H: Lec9 - Intro to Apache Spark (Paper 3 Presentation) | | |
| 7 | 2/27/23 | T: Lec10 - Spark RDD (Paper 4 Presentation) | Paper4_RDD, LS Ch.3 | Mon: HW3 due; Final project proposal due. |
| | | H: Lec11 - Spark DataFrame | | |
| 8 | 3/6/23 | T: Spring Break, No Classes | | |
| | | H: Spring Break, No Classes | | |
| 9 | 3/13/23 | T: PE3 - Spark DataFrame Exercise | Paper5_MLlib, LS Ch.4 | Mon: Quiz 3 week; HW4 out. |
| | | H: Lec12 - Spark Machine Learning (Paper 5 Presentation) | | |
| 10 | 3/20/23 | T: PE4 - Spark Linear Regression Exercise | LS Ch.10 | |
| | | H: Lec13 - Spark Logistic Regression | | |
| 11 | 3/27/23 | T: Lec14 - Spark Tree Classifiers | LS Ch.11 | Mon: Project milestone 1 due (50% work done). |
| | | H: PE5 - Spark Trees | | |
| 12 | 4/3/23 | T: Lec15 - Spark K-Means | LS Ch.7 | Mon: Quiz 4 week. |
| | | H: Exam 2 | | |
| 13 | 4/10/23 | T: Lec16 - Intro to NLP | Reading5_NLP | Mon: HW4 due. |
| | | H: Lec17 - NLP by example | | |
| 14 | 4/17/23 | T: Lec18 - NLP with Spark | LS Ch.8 | Mon: Quiz 5 week. |
| | | H: Lec19 - Apache Kafka | | Fri: Project milestone 2 due (90% work done). |
| 15 | 4/24/23 | T: Scholarship Day, Student Presentation | Spark Document | Mon: HW5 out; SCAD slides due. |
| | | H: Lec20 - Spark Streaming | | Tue: Present at SCAD Day. |
| 16 | 5/1/23 | T: PE6 - Data Streaming | | Mon: Quiz 6 week. |
| | | H: Reading Day, No Classes | | |
| 17 | 5/8/23 | T: Work on Final Project Report | | Mon: Final presentation and demo code due. |
| | | H: Have a great summer! | | Wed: Final report and revised code due; HW5 due. |

Reading List

  1. Textbook “Learning Spark, 2nd Edition” (Short name: LS): Link
  2. Reading1_Hadoop: Link
  3. Reading2_HDFS: Link
  4. Reading3_MR: Link
  5. Reading4_Pig: Link
  6. Reading5_NLP: Link
  7. Paper1_NoSQL: Link
  8. Paper2_Hbase: Link
  9. Paper3_Spark: Link
  10. Paper4_RDD: Link
  11. Paper5_MLlib: Link