# Introductory


### Getting Started with Python

This introductory Python course covers the fundamentals of programming, including concepts in object-oriented programming and introductory data structures such as lists, tuples, and dictionaries. More programming concepts are presented in the Algorithm and Data Structure section.
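As a rough illustration of the building blocks named above, the sketch below uses a list, a tuple, a dictionary, and a small class. The `Student` class and all the data values are hypothetical and exist only for this example.

```python
# Built-in containers: list, tuple, and dictionary.
scores = [72, 88, 95]            # list: ordered and mutable
scores.append(60)                # lists can grow in place

point = (3.0, 4.0)               # tuple: ordered and immutable

ages = {"ada": 36, "alan": 41}   # dictionary: key -> value lookup
ages["grace"] = 45               # add a new key


class Student:
    """A tiny hypothetical class that bundles data with behaviour."""

    def __init__(self, name, scores):
        self.name = name
        self.scores = list(scores)   # keep a private copy of the scores

    def average(self):
        """Return the arithmetic mean of the student's scores."""
        return sum(self.scores) / len(self.scores)


student = Student("ada", scores)
print(student.name, student.average())
```

The class keeps its own copy of the list, so later changes to `scores` outside the object do not silently alter the student's record.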

### Getting Started with Linux, Git, and GitHub

Why Linux for a data scientist? Linux is the most widely used operating system for servers and large-scale computing. You should be familiar with basic Linux commands so that you can run programs, install packages, download millions of documents on a remote cloud server, and so on. Git and GitHub are widely used tools for keeping track of the versions of your program files: Git is version control software, and GitHub is a hosting service for Git repositories.

### Applied Statistics: Probability, Inferential & Bayesian Statistics

Applied statistics is a fundamental requirement for data science. Basic statistical analysis, such as identifying the underlying distribution, correlation, sampling, and hypothesis testing, is used in preprocessing, feature engineering, and model tuning. Bayesian learning is covered in more detail at the advanced level.
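To make two of these ideas concrete, here is a minimal sketch that computes a Pearson correlation and a one-sample t statistic using only the standard library. The data values, the helper names `pearson` and `one_sample_t`, and the null-hypothesis mean of 60 are all assumptions made up for this illustration.

```python
import math
import statistics

# Hypothetical toy data: hours studied and exam scores for six students.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [52.0, 55.0, 61.0, 64.0, 70.0, 73.0]


def pearson(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def one_sample_t(xs, mu0):
    """t statistic for the null hypothesis that the population mean is mu0."""
    n = len(xs)
    standard_error = statistics.stdev(xs) / math.sqrt(n)
    return (statistics.mean(xs) - mu0) / standard_error


r = pearson(hours, scores)
t = one_sample_t(scores, mu0=60.0)
print(f"correlation = {r:.3f}, t statistic = {t:.3f}")
```

The t statistic on its own is not a decision. It would still need to be compared against a t distribution with `n - 1` degrees of freedom, which in practice is usually left to a statistics library.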

### Applied Math: Linear Algebra, Advanced Calculus & Optimization

Vectors and matrices are the leading players in the playground! Each data sample is a point in an n-dimensional vector space, and the features are the degrees of freedom of the system. During feature engineering and model selection, your central struggle is to reduce the degrees of freedom, because more degrees of freedom imply a more complex model that requires more computational power. Most machine learning algorithms use optimization techniques to train the model. To understand optimization you need multivariable calculus, which means taking derivatives with respect to those vectors and matrices. Deep learning even requires tensors.
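The link between derivatives and training can be sketched with plain gradient descent on a two-parameter linear model. This is only a toy example: the data, the learning rate, and the iteration count are assumptions chosen by hand, and real libraries use more refined optimizers.

```python
# Hypothetical data generated from y = 2x + 1, so the best fit is known.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def gradient(w, b):
    """Partial derivatives of the mean squared error with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


w, b = 0.0, 0.0          # initial guess for slope and intercept
learning_rate = 0.05     # step size, picked by hand for this toy problem

for _ in range(2000):
    dw, db = gradient(w, b)
    w -= learning_rate * dw   # step against the gradient
    b -= learning_rate * db

print(round(w, 3), round(b, 3))
```

Each added feature adds one more parameter and one more partial derivative, which is one concrete way in which more degrees of freedom cost more computation.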

### Introduction to Algorithm and Data Structure

When you write a program, the first questions are how much time it takes to produce the result and how much memory (stack and heap) it occupies while running; together these are called time and space complexity. Writing a program that completes a task is necessary, but it is not sufficient when you apply the same code to big data. There are a variety of algorithms and data structures with different space and time complexities. In distributed and parallel computing, latency and throughput across the separately processed batches play a role similar to space and time complexity.
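A small sketch of why complexity matters as data grows: the two functions below find the same element, but one takes O(n) comparisons and the other O(log n). The function names and the comparison counters are illustrative additions, not part of any library.

```python
def linear_search(items, target):
    """O(n) time: check elements one by one; return (index, comparisons)."""
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1
        if value == target:
            return i, comparisons
    return -1, comparisons


def binary_search(sorted_items, target):
    """O(log n) time: halve the search range each step; needs sorted input."""
    lo, hi = 0, len(sorted_items) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_items[mid] == target:
            return mid, comparisons
        if sorted_items[mid] < target:
            lo = mid + 1      # target can only be in the upper half
        else:
            hi = mid - 1      # target can only be in the lower half
    return -1, comparisons


data = list(range(1_000_000))     # already sorted
target = 999_999                  # worst case for the linear scan

print(linear_search(data, target))
print(binary_search(data, target))
```

Both functions use only a constant amount of extra memory, so here the difference is purely in time: about a million comparisons for the linear scan against at most twenty for the binary search on this input.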

### Data Visualization (Static)

Data visualization is the way to advertise the power of data science. There are two ways to present data: as a static view (PNG, JPG, PDF, etc.) or as an interactive app on a website using JavaScript, Plotly, Tableau, etc. Static data visualization produces a permanent figure without user interaction.

Tutorials: matplotlib | pandas | seaborn | Tidyverse

Cheat Sheets: ggplot part-I | Cheat Sheets: ggplot part-II