Course Overview
Course Coverage
This project-based course covers the fundamental concepts and core techniques for discovering patterns in large-scale data sets. It consists of three main modules: (1) Data Mining Pipeline, which introduces the key steps of data understanding, data preprocessing, data warehousing, data modeling, and interpretation/evaluation; (2) Data Mining Methods, which covers core techniques for regression, classification, clustering, dimensionality reduction, and association; and (3) Deep Learning, which discusses state-of-the-art deep learning techniques such as CNNs and RNNs, with implementations in TensorFlow.
Student Learning Outcomes
By the end of the course, successful students will be able to:
- (Foundations) Understand the background, uniqueness, and application scope of data mining.
- (Mathematics) Grasp the mathematical foundations underlying data mining models in order to understand and apply these models effectively.
- (Data Acquisition) Extract and integrate raw data from various sources and store it in appropriate formats.
- (Tool Proficiency) Use the Python data science toolkit, including NumPy, Pandas, Scikit-learn, Seaborn, and TensorFlow, to build components within the data pipeline.
- (Modeling) Apply analytical models such as classification, clustering, and neural networks to uncover patterns in large data sets.
- (Knowledge Communication) Interpret model results to derive knowledge and effectively communicate it to the appropriate audience.