
Downloading and Data Extraction Pipeline

For small datasets, downloading is a simple task. In the big-data regime, however, data download, transfer, or server migration becomes challenging. One often needs to use command-line tools in shell scripts to transfer data from one remote server to another. Data may also be corrupted during the download, so an MD5 checksum should be computed and compared against the value published by the source to verify the integrity of the transfer.
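As a minimal sketch of the integrity check described above, the Python snippet below computes the MD5 digest of a downloaded file in chunks and compares it with an expected checksum. The function names `md5_of_file` and `verify_download` and the default chunk size are illustrative assumptions, not part of any particular download tool.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the checksum given by the source."""
    # Published checksums are often upper case or end with a newline.
    return md5_of_file(path) == expected_md5.strip().lower()
```

A download script could call `verify_download(local_path, published_checksum)` after each transfer and retry the download when it returns `False`. MD5 catches accidental corruption; it is not suitable as a defense against deliberate tampering.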

Lectures : Using Python to Access Web Data | Data Wrangling with MongoDB | Introduction to SQL

Tutorials: os | wget

Preprocessing & Feature Engineering

Cleaned data may still need preprocessing steps such as one-hot encoding, label encoding, data transformation, and normalization. After preprocessing, feature importance and dependency are usually checked using the underlying statistical properties of the data. Feature engineering is an important step in model design, particularly for online streaming data, where specific features are selected and passed through multiple preprocessing steps in a pipeline before being fed into model training.
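The preprocessing steps named above can be sketched in plain Python as follows. This is an illustrative toy, assuming a hypothetical dataset of colours and weights; in practice a library such as scikit-learn or pandas would usually handle these steps.

```python
def label_encode(values):
    """Map each distinct category to an integer, in sorted order."""
    mapping = {cat: idx for idx, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Turn each category into a 0/1 vector with one column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def min_max_normalize(values):
    """Rescale numbers linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical toy records: a categorical and a numeric column.
colours = ["red", "green", "blue", "green"]
weights = [10.0, 20.0, 30.0, 40.0]

codes, mapping = label_encode(colours)
one_hot, categories = one_hot_encode(colours)
scaled = min_max_normalize(weights)

# A tiny "pipeline": one-hot columns followed by the normalized weight.
features = [row + [w] for row, w in zip(one_hot, scaled)]
```

Label encoding gives a compact integer code but implies an ordering between categories, while one-hot encoding avoids that at the cost of one column per category.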

Lectures : Natural Language Processing | Applied Text Mining in Python | Feature Engineering | Feature Engineering for Improving Learning Environments

Tutorials: Data Transformation | Feature Extraction | Dimensionality Reduction

Machine Learning (Regression, Classification & Clustering)

Machine learning is the process of teaching an algorithm to perform a task from data. If you train the algorithm with known features and labels, it is called supervised learning (regression, classification, etc.). If you train with features only (no labels), it is unsupervised learning (clustering, self-organizing maps, etc.). The best features selected during feature engineering are passed through the machine learning pipeline, which builds a model through tuning: the error is reduced and the accuracy increased while managing the bias-variance trade-off.
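To make the supervised/unsupervised distinction concrete, here is a small sketch in plain Python. It uses ordinary least squares as a stand-in for regression and a one-dimensional k-means as a stand-in for clustering. The data and starting centers are made up for illustration.

```python
def fit_line(xs, ys):
    """Supervised: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def k_means_1d(points, centers, iterations=10):
    """Unsupervised: k-means on 1-D points from given starting centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest center.
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers


# Supervised learning: features xs come with known labels ys (here y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)

# Unsupervised learning: only features, no labels; the groups are discovered.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers = k_means_1d(points, centers=[0.0, 5.0])
```

The regression recovers the slope and intercept because the labels tell it what the right answer is; the clustering receives no labels and only finds that the points fall into two groups.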

Lectures : Regression | Classification | Clustering & Retrieval | Machine Learning Fundamentals | Data Science: Basics of Machine Learning | Machine Learning (Udacity) | Practical Predictive Analytics: Models and Methods (Coursera) | Principles of Machine Learning: Python Edition (edX) | Machine Learning for AI (edX) | Machine Learning with Python: from Linear Models to Deep Learning (edX)

Tutorials: Supervised Learning | Clustering

Interactive Data Visualization

Data visualization is the way to showcase the power of data science. There are two ways to present data: as a static view (PNG, JPG, PDF, etc.) or as an interactive app on a website using JavaScript, Plotly, Tableau, etc. In interactive data visualization, viewers can interact with the visualization app, which deepens their understanding of the presented material.

Lectures : Data Visualization Specialization (4 courses) | Data Visualization with Tableau Specialization (5 courses) | Data Visualization and D3.js

Tutorials: Plotly | Bokeh | Introduction to D3.js