B.Sc. in Software Engineering Major in Data Science
Curriculum & Syllabus

Course Syllabus with Brief Description of Courses



Introduction to Data Science and Data Management & Analysis

Course 1: DS 331 + DS 332

(L3 T3, 9th Semester)
Credits - 3; Theory (2 credits) + Lab (1 credit)
Prerequisites - STA 101 Probability & Statistics in Software Engineering, SE 121 Structured Programming
Lab Platform - Python Ecosystem (Preferred) or R Ecosystem

  • Introduction to Data Science
  • Introduction to Machine learning
  • Supervised & Unsupervised Learning
  • Linear Algebra
  • Understanding Gradient Descent Algorithm
  • Linear Regression Algorithm
  • Introduction to Python and R Programming in Data Science

Brief Description

This is the first course in the Data Science Major. It has two components. The first component builds the fundamentals a learner needs to work with data and become a data scientist. The learner is first introduced to the discipline of Data Science, which is the study of gaining insights into data through computation, statistics and visualization. Proper experimental design for a data science project and big data are also covered in this component, to establish a firm foundation on which the rest of the Major is built.

The second component of the course focuses in detail on the first two steps of the data science process that follow once appropriate question(s) have been formed. These are data acquisition (finding or generating the data and preparing it) and exploratory data analysis (exploring the data through statistics and visualizations). The data acquisition part first discusses theoretical and practical aspects of obtaining data from heterogeneous sources such as APIs, databases, the web and data repositories with different file formats, and then covers methods for cleaning, organizing, merging and managing data for effective downstream analysis. The exploratory data analysis part covers tools and techniques for summarizing data. These techniques are typically applied before more formal data analysis commences and can help inform the development of more complex statistical analysis. Exploratory techniques are also important for eliminating or sharpening potential hypotheses that are generated to answer the question(s) in a data science project. This part also discusses data visualization using graphs, along with common multivariate statistical techniques, such as clustering and dimensionality reduction, that can be used for visualizing high-dimensional data.
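As an illustration of the cleaning, merging and summarizing steps described above, the following is a minimal sketch using only the Python standard library. The CSV text, the `GROUPS` lookup and all function names are hypothetical stand-ins for real data sources; in the lab, a library from the Python ecosystem would more likely be used for the same steps.

```python
import csv
import io
import statistics

# Hypothetical raw inputs standing in for two heterogeneous sources.
MEASUREMENTS_CSV = """id,height_cm
1,170
2,
3,182
4,not available
5,176
"""
GROUPS = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}  # e.g. fetched from a database or API


def load_clean_rows(text):
    """Parse CSV text and drop rows whose height is missing or non-numeric."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        try:
            height = float(record["height_cm"])
        except ValueError:
            continue  # cleaning step: skip unusable values
        rows.append({"id": int(record["id"]), "height_cm": height})
    return rows


def merge_groups(rows, groups):
    """Attach the group label from the second source to each cleaned row."""
    return [dict(row, group=groups[row["id"]]) for row in rows if row["id"] in groups]


def summarize(rows):
    """Exploratory summary: count and mean height per group."""
    by_group = {}
    for row in rows:
        by_group.setdefault(row["group"], []).append(row["height_cm"])
    return {g: {"n": len(v), "mean": statistics.mean(v)} for g, v in by_group.items()}


merged = merge_groups(load_clean_rows(MEASUREMENTS_CSV), GROUPS)
summary = summarize(merged)
```

Dropping unusable rows is only one possible cleaning policy; whether to drop, impute or flag missing values depends on the question(s) the project is trying to answer.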

Statistical Data Analysis

Course 2: DS 411 + DS 412

(L4 T1, 10th Semester)
Credits - 3; Theory (2 credits) + Lab (1 credit)
Prerequisites – DS 331 + DS 332 Introduction to Data Science and Data Management & Analysis
Lab Platform - Python Ecosystem (Preferred) or R Ecosystem

  • Introduction to Statistical Data Analysis
  • Hypothesis testing
  • Data visualizations
  • Descriptive statistics
  • Inferential statistics
  • How to use data visualization tools

Brief Description

This is the second course in the Data Science Major. In this course the learner will be introduced to statistical tools and techniques that can be used to analyze data. These tools can be used to draw effective conclusions or inferences about populations or scientific truths from data that can then help answer the question(s) in a data science project. The tools and techniques that will be covered in this course are probability, random variables and expected values, variability, distributions, limits, confidence intervals, testing, p values, power, bootstrapping and permutation tests.

As many of these fundamental techniques have already been covered in STA 101 Probability & Statistics in Software Engineering, the focus here is on applying them in a data science project setting. That is, the course covers not only how these techniques work but, more importantly, when, how and why each of them should be applied in a data science project.
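Two of the techniques listed above, bootstrapping and permutation tests, can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the function names, the default of 2000 resamples, the fixed seeds and the sample values are all assumptions made for this example, and the percentile method shown is just one of several ways to form a bootstrap interval.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.mean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means (estimated p value)."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled data and split it into two groups of the original sizes.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(statistics.mean(perm_a) - statistics.mean(perm_b)) >= observed:
            extreme += 1
    # Add-one correction so the estimated p value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements, for illustration only.
heights = [4.1, 5.0, 5.5, 6.2, 4.8, 5.9, 5.1, 4.7]
lower, upper = bootstrap_mean_ci(heights)
p_value = permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
```

Neither function assumes a particular distribution for the data, which is one reason these resampling methods are often a reasonable choice when the assumptions behind a classical test are in doubt.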

Machine Learning driven Data Analysis-I

Course 3: DS 421 + DS 422

(L4 T2, 11th Semester)
Credits - 3; Theory (2 credits) + Lab (1 credit)
Prerequisites – DS 411 + DS 412 Statistical Data Analysis, SE 443 Machine Learning
Lab Platform - Python Ecosystem (Preferred) or R Ecosystem

  • Learn different machine learning algorithms
  • Learn data-driven analysis
  • Linear regression & Logistic regression
  • Learning different classification models
  • Logistic Regression model
  • K-Nearest Neighbor (KNN) model
  • Support Vector Machine (SVM) model
  • Decision Tree Classifier model
  • Random Forest model
  • Clustering Algorithm
  • Apply Machine Learning algorithms in a real-life data science project

Brief Description

This is the third course in the Data Science Major. The focus of this course is on applying machine learning tools and techniques for predictive data analysis in order to help answer the question(s) in a data science project. The course assumes and builds on the basic components of building and applying prediction functions, including feature creation, algorithms, evaluation or model validation, training and test sets, overfitting, and error rates, that are covered in SE 443 Machine Learning.

The algorithms and machine learning methods that will be covered in this course are multivariate linear regression, logistic regression, KNN, SVM, decision trees, random forests, mean-shift clustering, density-based spatial clustering of applications with noise (DBSCAN), expectation–maximization (EM) clustering using Gaussian mixture models (GMM), agglomerative hierarchical clustering and non-linear dimensionality reduction. Emphasis will be put not only on how these algorithms and methods function but also on when, how and why each of these should be applied in a data science project.
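To give a flavour of how one of the listed methods functions, here is a minimal sketch of a KNN classifier in plain Python. The function name, the toy points and the choice of Euclidean distance with a simple majority vote are assumptions made for illustration; a lab exercise would more likely rely on an established library implementation.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["red", "red", "red", "blue", "blue", "blue"]

near_red = knn_predict(points, labels, (2, 2))
near_blue = knn_predict(points, labels, (7, 8))
```

KNN stores the training data and defers all computation to prediction time, so it tends to suit small datasets with meaningful distance measures better than large or very high-dimensional ones.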

Machine Learning driven Data Analysis-II and Communicating Data Insights

Course 4: DS 423 + DS 424

(L4 T2, 11th Semester)
Credits - 3; Theory (2 credits) + Lab (1 credit)
Prerequisites – DS 411 + DS 412 Statistical Data Analysis, SE 443 Machine Learning
Lab Platform - Python Ecosystem (Preferred) or R Ecosystem

  • Introduction to Deep Learning
  • Research Datasets and Model Cross Validation
  • Feature Selection Techniques
  • Natural language processing (NLP)
  • Computer Vision
  • Image Processing with Keras and TensorFlow
  • Big data analytics
  • Deep Learning model concepts and their applications
  • Apply Deep Learning algorithms in a real-life data science project

Brief Description

This is the fourth course in the Data Science Major. It has two components. The first component focuses on naive Bayes classifiers, deep learning based classifiers and ensemble methods that can be used for predictive data analysis in order to help answer the question(s) in a data science project. The deep learning based classifiers covered in the course are the convolutional neural network (CNN), the recurrent neural network (RNN) and long short-term memory (LSTM). Emphasis will be put not only on how these algorithms and methods function but also on when, how and why each of these should be applied in a data science project. As with DS 421 + DS 422 Machine Learning driven Data Analysis-I, this course also assumes and builds on the basic components of building and applying prediction functions, including feature creation, algorithms, evaluation or model validation, training and test sets, overfitting, and error rates, that are covered in SE 443 Machine Learning.
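The model validation with training and test sets mentioned above is commonly done with k-fold cross-validation. The sketch below shows only the index-splitting step, in plain Python. The function name and the use of contiguous, unshuffled folds are assumptions made for illustration; in practice the data is usually shuffled first, and a library routine would normally be used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Example: 10 samples split into 3 folds.
folds = list(k_fold_indices(10, 3))
```

Each sample appears in exactly one test fold, so every observation is used for both training and evaluation across the k rounds, which gives a less optimistic error estimate than evaluating on the training data.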

The second component of this course will focus on communicating the insights gained to an audience after a successful data analysis phase. Tools and techniques for effective presentation of the outcomes of a data science project that include story-telling, interactive visualizations and presentations for the general audience and library support along with documentation for engineers building data driven software/hardware products will be covered here.

Capstone Project: DS 431 Data Science Major Capstone Project

(L4 T3, 12th Semester)
Credits - 6
Prerequisites – DS 331 + DS 332 Introduction to Data Science and Data Management & Analysis, DS 411 + DS 412 Statistical Data Analysis
Lab Platform - Python Ecosystem (Preferred) or R Ecosystem

  • Analyze Real Life Industrial Data
  • Implement Deep learning / Machine Learning Models
  • Build an ML system
  • Deploy the project on an online platform
  • Publish the student's own research paper

Brief Description

The Data Science Major Capstone Project will allow the learners to apply the knowledge acquired in the four courses of the Major to complete a data science project addressing a real world problem, preferably in collaboration with an industry or government organization. This project will act as a testament to the skills and knowledge of the learners in the data science domain to potential future employers. It will act as a substitute for the course SE 422 Final Year Thesis/Project/Internship. It will be conducted over the course of two semesters, L4 T2 and L4 T3. By the end of L4 T2, a learner is expected to select a suitable real world problem with the guidance of supervisor(s), define appropriate questions which will steer the rest of the project, acquire relevant data, and carry out exploratory data analysis and, if needed, statistical data analysis as well. By the end of L4 T3, the learner is expected to carry out relevant machine learning driven data analysis, if needed, and finally prepare an effective presentation comprising a story and interactive visualizations and/or library support along with documentation, to communicate the outcomes of the project to the appropriate audience.


Course Offer for Data Science Major Students


| Semester | Course Code | Course Name | Prerequisite | Credits | Total Credit (Semester) |
|---|---|---|---|---|---|
| 9th (3-3) | DS 331 | Introduction to Data Science and Data Management & Analysis (DS Major) | STA 101, SE 121 | 2 (Theory) | 3 |
| | DS 332 | Introduction to Data Science and Data Management & Analysis Lab (DS Major) | STA 101, SE 121 | 1 (Lab) | |
| 10th (4-1) | DS 411 | Statistical Data Analysis (DS Major) | DS 331, DS 332 | 2 (Theory) | 3 |
| | DS 412 | Statistical Data Analysis Lab (DS Major) | DS 331, DS 332 | 1 (Lab) | |
| 11th (4-2) | DS 421 | Machine Learning Driven Data Analysis I (DS Major) | DS 411, DS 412, SE 544 | 2 (Theory) | 6 |
| | DS 422 | Machine Learning Driven Data Analysis Lab I (DS Major) | DS 411, DS 412, SE 544 | 1 (Lab) | |
| | DS 423 | Machine Learning Driven Data Analysis II and Communicating Data Insights (DS Major) | DS 411, DS 412, SE 544 | 2 (Theory) | |
| | DS 424 | Machine Learning Driven Data Analysis II and Communicating Data Insights Lab (DS Major) | DS 411, DS 412, SE 544 | 1 (Lab) | |
| 12th (4-3) | DS 431 | Data Science Major Capstone Project (DS Major) | All DS Major Courses | 6 | 6 |