UNIT 1:
Data Science – Fundamentals and Components
Terminologies Used in Big Data Environments
Classification of Digital Data
Introduction to Big Data – Characteristics of Data
Classification of Analytics
Top Challenges Facing Big Data – Importance of Big Data Analytics
UNIT 2:
Mean, Median and Mode – Standard Deviation and Variance
Probability Density Function
Types of Data Distribution
Percentiles and Moments – Correlation and Covariance
Conditional Probability – Bayes’ Theorem
Introduction to Univariate, Bivariate and Multivariate Analysis
Principal Component Analysis (PCA)
Dimensionality Reduction using Principal Component Analysis and LDA
UNIT 3:
Linear Regression – Polynomial Regression – Multivariate Regression – Multilevel Models
Data Warehousing Overview
Bias/Variance Trade-Off – K-Fold Cross-Validation
Data Cleaning and Normalization – Cleaning Web Log Data
Introduction to Machine Learning Algorithms
UNIT 4:
Introducing Hadoop – Hadoop Overview – RDBMS versus Hadoop
HDFS (Hadoop Distributed File System): Components and Block Replication
Processing Data with Hadoop – Introduction to MapReduce