Course Objective:
The objective of this course is to provide knowledge of data exploration using programming APIs and freely available tools.
Course Outline:
The course introduces the concept of data formation through crawling and the use of APIs. Students apply various data cleaning, data transformation, data exploration, and data visualization techniques in the R and Python programming languages, and explore and visualize data using tools such as Gephi, NodeXL, and RapidMiner. Students also learn to explore and visualize data programmatically using Python APIs (Matplotlib, Graphviz, NetworkX) and R libraries (rminer, FactoMineR, etc.).
Learning Outcome:
On successful completion of this course, the students should be able to:
- Acquire data through web scraping/crawling and data APIs such as Tweepy (Twitter), Google APIs (YouTube, Google+, Google Search Engine), Facebook APIs (Instagram API, Graph API, Atlas API), etc.
- Clean and integrate data, and select appropriate attributes using various data transformation and discretization techniques.
- Explore data for rapid quantitative analysis using data science tools: Weka 3, OpenRefine (Google), and RapidMiner.
- Visualize data in an interactive manner using visualization tools such as Gephi, NodeXL, and Cytoscape.
- Write programs in R and Python to explore and visualize data using libraries such as Matplotlib, Graphviz, NetworkX, ggplot2, etc.
- Handle complex data science problems such as classification, prediction, clustering, dimension reduction, etc. using various R libraries (rminer, caret, FactoMineR) and Python APIs (SciPy, scikit-learn, Pandas, NLTK, Gensim, Theano).
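
For illustration only, the discretization outcome listed above can be sketched in plain Python. This is a minimal sketch of equal-width binning, one common discretization technique. The function name `equal_width_bins` and the sample ages are hypothetical and are not taken from the course material; in practice a library routine would normally be used instead.

```python
def equal_width_bins(values, n_bins):
    """Assign each numeric value to one of n_bins equal-width intervals.

    Returns a list of bin indices in the range 0 .. n_bins - 1.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so everything falls into the first bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would compute to index n_bins, so clamp it
    # into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 22, 25, 31, 40, 47, 58, 60]  # hypothetical sample data
print(equal_width_bins(ages, 3))
```

Here the continuous attribute is replaced by a small set of ordinal bin indices, which is the form many classification and clustering tools expect for categorical input.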