CS 354 Big Data

This course covers techniques needed to collect, store, analyze, and visualize big data, particularly for applications in machine learning at scale. The MapReduce paradigm will be taught using the popular Hadoop framework. Both batch and real-time analysis of massive quantities of data will be applied to machine learning problems such as clustering, regression, and classification. Relational (SQL) and NoSQL database models will be discussed and compared through use-case analysis. Natural language processing will be studied as a comprehensive big data application.

Instructor:

Dr. Peilong Li

Office:

Esbenshade 284B

Appointments:

By email

Number of Credits

4

Pre-requisites

  • CS 250 Foundations of AI and Data Science
  • CS 209 Database Systems

Textbooks

  • (Required) Mehrotra, Shrey; Grade, Akash. Apache Spark Quick Start Guide: Quickly Learn the Art of Writing Efficient Big Data Applications with Apache Spark. Birmingham: Packt, 2019. A free version of the textbook can be retrieved at: Link.