DIU | Data Science Lab

B.Sc. in Software Engineering Major in Data Science
Curriculum & Syllabus

Course Syllabus with Brief Description of Courses

Introduction to Data Science and Data Management & Analysis

Course 1: DS 331 + DS 332

(L3 T3, 9th Semester)

Credits - 3; Theory (2 credits) + Lab (1 credit)

Prerequisites - STA 101 Probability & Statistics in Software Engineering, SE 121 Structured Programming

Lab Platform - Python Ecosystem (Preferred) or R Ecosystem

Introduction to Data Science
Introduction to Machine learning
Supervised & Unsupervised Learning
Linear Algebra
Understanding the Gradient Descent Algorithm
Linear Regression Algorithm
Introduction to Python and R Programming in Data Science

Brief Description

This is the first course in the Data Science Major. It has two components. The first component builds the fundamentals a learner needs to work with data and become a data scientist. The learner is first introduced to the discipline of Data Science, which is the study of gaining insights into data through computation, statistics and visualization. Proper experimental design for a data science project and the concept of big data are also covered in this component, to establish a firm foundation on which the rest of the Major is built.

The second component of the course focuses in detail on the first two steps of the data science process after appropriate question(s) have been formed. These are data acquisition (finding or generating the data and preparing it) and exploratory data analysis (exploring the data through statistics and visualizations). The data acquisition part first discusses the theoretical and practical aspects of obtaining data from heterogeneous sources such as APIs, databases, the web and data repositories holding different file formats. It then covers methods for cleaning, organizing, merging and managing data, which are required for effective downstream data analysis. The exploratory data analysis part covers tools and techniques for summarizing data. These techniques are typically applied before more formal data analysis commences and can help inform the development of more complex statistical analysis. Exploratory techniques are also important for eliminating or sharpening potential hypotheses that are generated to answer the question(s) in a data science project. This part also discusses data visualization using graphs, along with some common multivariate statistical techniques, such as clustering and dimensionality reduction, that can be used to visualize high-dimensional data.
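The cleaning, merging and summarizing steps described above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the inline CSV text, the `REGION_INFO` lookup and all function names are hypothetical stand-ins for real data sources, not part of the course material.

```python
import csv
import io
import statistics

# Hypothetical inline data standing in for a downloaded CSV file.
SALES_CSV = """id,region,amount
1,north,120.5
2,south,
3,north,98.0
4,east,143.2
"""

# Hypothetical second source (for example an API or a database table).
REGION_INFO = {"north": "N", "south": "S", "east": "E"}


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Drop records with a missing amount and convert the rest to float."""
    cleaned = []
    for row in rows:
        if row["amount"].strip() == "":
            continue
        cleaned.append({**row, "amount": float(row["amount"])})
    return cleaned


def merge(rows, lookup):
    """Attach the region code from the second source to each record."""
    return [{**row, "region_code": lookup.get(row["region"])} for row in rows]


def summarize(rows):
    """Compute a few descriptive statistics of the amount column."""
    amounts = [row["amount"] for row in rows]
    return {
        "count": len(amounts),
        "mean": statistics.mean(amounts),
        "median": statistics.median(amounts),
        "stdev": statistics.stdev(amounts),
    }


rows = merge(clean(load_rows(SALES_CSV)), REGION_INFO)
summary = summarize(rows)
print(summary)
```

In lab work, libraries from the Python or R ecosystems would typically replace these hand-written helpers; the sketch only shows the order of the steps.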

Statistical Data Analysis

Course 2: DS 411 + DS 412

(L4 T1, 10th Semester)

Credits - 3; Theory (2 credits) + Lab (1 credit)

Prerequisites – DS 331 + DS 332 Introduction to Data Science and Data Management & Analysis

Lab Platform - Python Ecosystem (Preferred) or R Ecosystem

Introduction to Statistical Data Analysis
Hypothesis testing
Data visualizations
Descriptive statistics
Inferential statistics
How to use data visualization tools

Brief Description

This is the second course in the Data Science Major. In this course the learner will be introduced to statistical tools and techniques that can be used to analyze data. These tools can be used to draw effective conclusions or inferences about populations or scientific truths from data that can then help answer the question(s) in a data science project. The tools and techniques that will be covered in this course are probability, random variables and expected values, variability, distributions, limits, confidence intervals, testing, p values, power, bootstrapping and permutation tests.

As many of these fundamental techniques have already been covered in STA 101 Probability & Statistics in Software Engineering, the focus here is on applying them in a data science project setting. That is, the course covers not only how these techniques work but, more importantly, when, how and why each of them should be applied in a data science project.
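Two of the techniques listed above, bootstrapping and permutation tests, can be illustrated with a short standard-library sketch. The two sample groups, the function names and the default parameters below are assumptions made for illustration; the sketch uses a percentile bootstrap for the mean and a two-sided permutation test for a difference in means.

```python
import random
import statistics


def bootstrap_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.mean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means; returns a p value."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled data and split it into two groups of the original sizes.
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        )
        if diff >= observed:
            extreme += 1
    # The add-one correction keeps the estimated p value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements for two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.9]
group_b = [4.2, 4.8, 4.5, 4.1, 4.9, 4.4, 4.6, 4.3]

ci_low, ci_high = bootstrap_ci(group_a)
p_value = permutation_test(group_a, group_b)
print(ci_low, ci_high, p_value)
```

Both functions take a `seed` so that results are repeatable, which is convenient when the analysis has to be rerun and reported.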

Machine Learning driven Data Analysis-I

Course 3: DS 421 + DS 422

(L4 T2, 11th Semester)

Credits - 3; Theory (2 credits) + Lab (1 credit)

Prerequisites – DS 411 + DS 412 Statistical Data Analysis, SE 443 Machine Learning

Lab Platform - Python Ecosystem (Preferred) or R Ecosystem

Learn different machine learning algorithms
Learn data-driven analysis
Linear regression & Logistic regression
Learning different classification models
Logistic Regression model
K-Nearest Neighbor (KNN) model
Support Vector Machine (SVM) model
Decision Tree Classifier model
Random Forest model
Clustering Algorithm
Apply Machine Learning algorithms in a real-life data science project

Brief Description

This is the third course in the Data Science Major. Its focus is on applying machine learning tools and techniques for predictive data analysis in order to help answer the question(s) in a data science project. The course assumes and builds on the basic components of building and applying prediction functions, including feature creation, algorithms, evaluation or model validation, training and test sets, overfitting, and error rates, that are covered in SE 443 Machine Learning.

The algorithms and machine learning methods covered in this course are multivariate linear regression, logistic regression, KNN, SVM, decision trees, random forests, mean-shift clustering, density-based spatial clustering of applications with noise (DBSCAN), expectation–maximization (EM) clustering using Gaussian mixture models (GMM), agglomerative hierarchical clustering and non-linear dimensionality reduction. Emphasis is put not only on how these algorithms and methods function but also on when, how and why each of them should be applied in a data science project.
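As a small illustration of how one of the listed methods functions, the sketch below implements K-Nearest Neighbor (KNN) classification with a majority vote over Euclidean distances. The training points, labels and the function name `knn_predict` are hypothetical; library implementations from the Python or R ecosystems would normally be used in lab work.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional training data with two classes.
points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (4.0, 4.2), (4.1, 3.9), (3.9, 4.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.0, 1.0)))
print(knn_predict(points, labels, (4.0, 4.0)))
```

An odd `k` is used here so that a vote between two classes cannot end in a tie.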

Machine Learning driven Data Analysis-II and Communicating Data Insights

Course 4: DS 423 + DS 424

(L4 T2, 11th Semester)

Credits - 3; Theory (2 credits) + Lab (1 credit)

Prerequisites – DS 411 + DS 412 Statistical Data Analysis, SE 443 Machine Learning

Lab Platform - Python Ecosystem (Preferred) or R Ecosystem

Introduction to Deep Learning
Research Datasets and Model Cross Validation
Feature Selection Techniques
Natural language processing (NLP)
Computer Vision
Image Processing with Keras and TensorFlow
Big data analytics
Full model concepts of Deep Learning and their applications
Apply Deep Learning algorithms in a real-life data science project

Brief Description

This is the fourth course in the Data Science Major. It has two components. The first component focuses on naive Bayes classifiers, deep learning based classifiers and ensemble methods that can be used for predictive data analysis in order to help answer the question(s) in a data science project. The deep learning based classifiers covered in the course are the convolutional neural network (CNN), the recurrent neural network (RNN) and long short-term memory (LSTM). Emphasis is put not only on how these algorithms and methods function but also on when, how and why each of them should be applied in a data science project. As in DS 421 + DS 422 Machine Learning driven Data Analysis-I, this course also assumes and builds on the basic components of building and applying prediction functions, including feature creation, algorithms, evaluation or model validation, training and test sets, overfitting, and error rates, that are covered in SE 443 Machine Learning.

The second component of this course focuses on communicating the insights gained to an audience after a successful data analysis phase. It covers tools and techniques for effectively presenting the outcomes of a data science project: storytelling, interactive visualizations and presentations for a general audience, and library support along with documentation for engineers building data-driven software/hardware products.

Capstone Project: DS 431 Data Science Major Capstone Project

(L4 T3, 12th Semester)

Credits - 6

Prerequisites – DS 331 + DS 332 Introduction to Data Science and Data Management & Analysis, DS 411 + DS 412 Statistical Data Analysis

Lab Platform - Python Ecosystem (Preferred) or R Ecosystem

Analyze Real Life Industrial Data
Implement Deep learning / Machine Learning Models
Build an ML system
Deploy the project on an online platform
Publish the student's own research paper

Brief Description

The Data Science Major Capstone Project allows learners to apply the knowledge acquired in the four courses of the Major to complete a data science project addressing a real-world problem, preferably in collaboration with an industry or government organization. This project will act as a testament to the learners' skills and knowledge in the data science domain for potential future employers. It will act as a substitute for the course SE 422 Final Year Thesis/Project/Internship. It will be conducted over the course of two semesters, L4 T2 and L4 T3. By the end of L4 T2, a learner is expected to select a suitable real-world problem with the guidance of supervisor(s), define appropriate questions which will steer the rest of the project, acquire relevant data, and carry out exploratory data analysis and, if needed, statistical data analysis as well. By the end of L4 T3, the learner is expected to carry out relevant machine learning driven data analysis if needed, and finally to prepare an effective presentation comprising a story and interactive visualizations and/or library support along with documentation for communicating the outcomes of the project to the appropriate audience.

Course Offer for Data Science Major Students

SEMESTER	COURSE CODE	COURSE NAME	Prerequisite	Theory Credit	Lab Credit	Total Credit
9th (3-3)	DS 331	Introduction to Data Science and Data Management & Analysis (DS Major)	STA 101, SE 121	2		3
9th (3-3)	DS 332	Introduction to Data Science and Data Management & Analysis Lab (DS Major)	STA 101, SE 121		1	3
10th (4-1)	DS 411	Statistical Data Analysis (DS Major)	DS 331, DS 332	2		3
10th (4-1)	DS 412	Statistical Data Analysis Lab (DS Major)	DS 331, DS 332		1	3
11th (4-2)	DS 421	Machine Learning Driven Data Analysis I (DS Major)	DS 411, DS 412, SE 544	2		6
11th (4-2)	DS 422	Machine Learning Driven Data Analysis Lab I (DS Major)	DS 411, DS 412, SE 544		1	6
11th (4-2)	DS 423	Machine Learning Driven Data Analysis II and Communicating Data Insights (DS Major)	DS 411, DS 412, SE 544	2		6
11th (4-2)	DS 424	Machine Learning Driven Data Analysis II and Communicating Data Insights Lab (DS Major)	DS 411, DS 412, SE 544		1	6
12th (4-3)	DS 431	Data Science Major Capstone Project (DS Major)	ALL DS Major Courses	6		6
