Course Overview

Course Coverage

This project-based course covers the fundamental concepts and core techniques for discovering patterns in large-scale data sets. It consists of three main modules: (1) Data Mining Pipeline, which introduces the key steps of data understanding, data preprocessing, data warehousing, data modeling, and interpretation/evaluation; (2) Data Mining Methods, which covers core techniques for regression, classification, clustering, dimensionality reduction, and association analysis; and (3) Deep Learning, which discusses state-of-the-art deep learning techniques such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), implemented in TensorFlow.

Student Learning Outcomes

By the end of the course, successful students will be able to:

  • (Foundations) Understand the background, distinguishing characteristics, and application scope of data mining.
  • (Mathematics) Grasp the mathematical foundations underlying data mining models in order to understand and apply these models effectively.
  • (Data Acquisition) Extract and integrate raw data from various sources and store it in appropriate formats.
  • (Tool Proficiency) Use the Python data science toolkit, including NumPy, pandas, scikit-learn, Seaborn, and TensorFlow, to build components of the data pipeline.
  • (Modeling) Apply analytical models such as classification, clustering, and neural networks to uncover patterns in large-scale data sets.
  • (Knowledge Communication) Interpret model results to derive knowledge and effectively communicate it to the appropriate audience.