Subject Details
Dept       : IT
Sem        : 6
Regulation : 2017
Faculty    : ashokkumar
Phone      : NIL
E-mail     : ashok.r.it@snsce.ac.in

Syllabus

UNIT 1: INTRODUCTION TO DATA SCIENCE AND BIG DATA

Data Science – Fundamentals and Components – Data Scientist – Terminologies Used in Big Data Environments – Types of Digital Data – Classification of Digital Data – Introduction to Big Data – Characteristics of Data – Evolution of Big Data – Big Data Analytics – Classification of Analytics – Top Challenges Facing Big Data – Importance of Big Data Analytics – Data Analytics Tools.

UNIT 2: DESCRIPTIVE ANALYTICS USING STATISTICS

Types of Data – Mean, Median and Mode – Standard Deviation and Variance – Probability – Probability Density Function – Types of Data Distribution – Percentiles and Moments – Correlation and Covariance – Conditional Probability – Bayes’ Theorem – Introduction to Univariate, Bivariate and Multivariate Analysis – Dimensionality Reduction using Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) – PCA example with the Iris Data Set from the UCI repository.

UNIT 3: PREDICTIVE MODELING AND MACHINE LEARNING

Linear Regression – Polynomial Regression – Multivariate Regression – Multilevel Models – Data Warehousing Overview – Bias/Variance Trade-off – K-Fold Cross-Validation – Data Cleaning and Normalization – Cleaning Web Log Data – Normalizing Numerical Data – Detecting Outliers – Introduction to Supervised and Unsupervised Learning – Reinforcement Learning – Dealing with Real-World Data – Machine Learning Algorithms – Clustering – Python-Based Application.

UNIT 4: DATA ANALYTICAL FRAMEWORKS

Introducing Hadoop – Hadoop Overview – RDBMS versus Hadoop – HDFS (Hadoop Distributed File System): Components and Block Replication – Processing Data with Hadoop – Introduction to MapReduce – Features of MapReduce – Introduction to NoSQL: CAP Theorem – MongoDB: RDBMS versus MongoDB – MongoDB Database Model – Data Types and Sharding – Introduction to Hive – Hive Architecture – Hive Query Language (HQL).

UNIT 5: DATA SCIENCE USING PYTHON

Introduction to Essential Data Science Packages: NumPy, SciPy, Jupyter, Statsmodels and Pandas – Data Munging: Introduction to Data Munging, Data Pipeline and Machine Learning in Python – Data Visualization Using Matplotlib – Interactive Visualization with Advanced Data Learning Representation in Python.

Reference Book:

1. Alberto Boschetti, Luca Massaron, “Python Data Science Essentials”, Packt Publishing, 2nd Edition, 2016.
2. DT Editorial Services, “Big Data, Black Book”, Dreamtech Press, 2015.
3. Yuxi (Hayden) Liu, “Python Machine Learning”, Packt Publishing, 2017.

Text Book:

1. Frank Kane, “Hands-On Data Science and Python Machine Learning”, Packt Publishing, 2017.
2. Seema Acharya, Subhashini Chellappan, “Big Data and Analytics”, Wiley, 2015.
