Course Overview

Course Coverage

This project-based course covers the fundamental concepts and core techniques for discovering patterns in large-scale data sets. It consists of three main modules: (1) Data Mining Pipeline, which introduces the key steps of data understanding, data preprocessing, data warehousing, data modeling and interpretation/evaluation; (2) Data Mining Methods, which covers core techniques for regression, classification, clustering, dimensionality reduction and association; and (3) Deep Learning, which discusses state-of-the-art deep learning techniques such as CNNs and RNNs, with implementations in TensorFlow.

Student Learning Outcomes

By the end of the course, successful students will be able to:

(Concept Building) Understand the background, uniqueness and application scope of data mining;
(Data Sourcing) Extract and integrate raw data from various data sources and store the data in appropriate formats;
(Data Pipeline) Proficiently use the Python data science toolset, including NumPy, pandas, scikit-learn, seaborn and TensorFlow, to implement computing components within the data pipeline;
(Data Mining) Apply analytical models, including classification, clustering and neural networks, to discover patterns in large data sets;
(Knowledge Transfer) Interpret model results to obtain knowledge, and communicate that knowledge to the appropriate audience.

Last updated on May 5, 2019