Course Overview

Course Coverage

This project-based course covers the fundamental concepts and core techniques for discovering patterns in large-scale datasets. It consists of three main modules: (1) Data Mining Pipeline, which introduces the key steps of data understanding, data preprocessing, data warehousing, data modeling, and interpretation/evaluation; (2) Data Mining Methods, which covers core techniques for regression, classification, clustering, dimensionality reduction, and association; and (3) Deep Learning, which discusses state-of-the-art deep learning techniques such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), with implementations in TensorFlow.

Student Learning Outcomes

By the end of the course, successful students will be able to:

(Foundations) Understand the background, uniqueness, and application scope of data mining.
(Mathematics) Grasp the mathematical foundations underlying data mining models in order to understand and apply these models effectively.
(Data Acquisition) Extract and integrate raw data from various sources and store it in appropriate formats.
(Tool Proficiency) Use the Python data science toolkit (e.g., NumPy, pandas, scikit-learn, Seaborn, and TensorFlow) to build components of the data pipeline.
(Modeling) Apply data analytics models such as classification, clustering, and neural networks to uncover patterns in large datasets.
(Knowledge Communication) Interpret model results to derive knowledge and effectively communicate it to the appropriate audience.

Last updated on May 5, 2019