Data Science Lab-II

Course Objective:
The objective of this course is to provide knowledge of data exploration using programming APIs and freely available tools.

Course Outline:
The course introduces the formation of datasets through crawling and the use of APIs. Students apply data cleaning, data transformation, data exploration, and data visualization techniques in the R and Python programming languages, and explore and visualize data using tools such as Gephi, NodeXL, and RapidMiner. Students also learn to explore and visualize data programmatically using Python libraries (Matplotlib, Graphviz, NetworkX) and R libraries (rminer, FactoMineR), among others.

Learning Outcomes:

On successful completion of this course, the students should be able to:

  1. Acquire data through web scraping/crawling and data APIs such as Tweepy (Twitter), Google APIs (YouTube, Google+, Google Search), Facebook APIs (Instagram API, Graph API, Atlas API), etc.
  2. Clean and integrate data, and select appropriate attributes, using various data transformation and discretization techniques.
  3. Explore data for rapid quantitative analysis using data science tools such as Weka 3, OpenRefine (formerly Google Refine), and RapidMiner.
  4. Visualize data interactively using visualization tools such as Gephi, NodeXL, and Cytoscape.
  5. Write programs in R and Python to explore and visualize data using libraries such as Matplotlib, Graphviz, NetworkX, ggplot2, etc.
  6. Handle complex data science problems such as classification, prediction, clustering, and dimension reduction using various R libraries (rminer, caret, FactoMineR) and Python libraries (SciPy, scikit-learn, pandas, NLTK, Gensim, Theano).
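
As an illustration of the discretization techniques named in outcome 2, the following is a minimal Python sketch of equal-width binning. The function name equal_width_bins and the sample ages are hypothetical, chosen only for this example; in the lab itself a library routine would more likely be used.

```python
def equal_width_bins(values, n_bins):
    """Assign each numeric value to one of n_bins equal-width intervals.

    Returns a list of bin indices in the range 0 .. n_bins - 1.
    Illustrative helper only; not part of any course library.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so everything falls into the first bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would otherwise land in bin n_bins,
    # so clamp it into the last valid bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 22, 25, 31, 40, 58, 63]  # made-up sample data
print(equal_width_bins(ages, 3))
```

Equal-width binning is only one option. Equal-frequency binning, which places roughly the same number of records in each bin, is often preferred when the attribute is heavily skewed.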
