Course Overview
Course Coverage
This project-based course covers the fundamental concepts and core techniques for discovering patterns in large-scale data sets. It consists of three main modules: (1) Data Mining Pipeline, which introduces the key steps of data understanding, data preprocessing, data warehousing, data modeling, and interpretation/evaluation; (2) Data Mining Methods, which covers core techniques for regression, classification, clustering, dimensionality reduction, and association analysis; and (3) Deep Learning, which discusses state-of-the-art deep learning techniques such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), with implementations in TensorFlow.
Student Learning Outcomes
By the end of the course, successful students will be able to:
- (Concept Building) Understand the background, distinguishing characteristics, and application scope of data mining;
- (Data Sourcing) Extract and integrate raw data from various data sources and store the data in appropriate formats;
- (Data Pipeline) Use the Python data science toolset, including NumPy, Pandas, Scikit-learn, Seaborn, and TensorFlow, to implement the computing components of the data pipeline;
- (Data Mining) Apply analytical models, including classification, clustering, and neural networks, to discover patterns in large data sets;
- (Knowledge Transfer) Interpret model results to obtain knowledge, and communicate that knowledge to the appropriate audience.