Course Overview

Course Coverage

This project-based course covers the fundamental concepts and core techniques for discovering patterns in large-scale data sets. It consists of three main modules: (1) Data Mining Pipeline, which introduces the key steps of data understanding, data preprocessing, data warehousing, data modeling, and interpretation/evaluation; (2) Data Mining Methods, which covers core techniques for regression, classification, clustering, dimensionality reduction, and association; and (3) Deep Learning, which discusses state-of-the-art deep learning techniques such as CNNs and RNNs and their implementation in TensorFlow.

Student Learning Outcomes

By the end of the course, successful students will be able to:

  • (Concept Building) Understand the background, distinguishing features, and application scope of data mining,
  • (Data Sourcing) Extract and integrate raw data from various data sources and store the data in appropriate formats,
  • (Data Pipeline) Proficiently use the Python data science tool set, including NumPy, Pandas, Scikit-learn, Seaborn, and TensorFlow, to implement computing components within the data pipeline,
  • (Data Mining) Apply analytical models, including classification, clustering, and neural networks, to discover patterns in large data sets,
  • (Knowledge Transfer) Interpret model results to obtain knowledge, and communicate that knowledge to the appropriate audience.