Data Analysis Project
Generate a summary report from one of the data sources listed below.
Total 100 Points:
- Jupyter Notebook 50 Points – Notebook showing your work and generated tables and graphs.
- Summary Report 50 Points – Document presenting a summary of your data analysis process.
Using one of the public data sources below, create a summary report that presents an observation about the data or answers a specific question about it. Submit a Word document or PDF summarizing your observations, along with your Jupyter notebook output (PDF or print to PDF) showing your work.
Data Sets and Questions
- Instacart Market Basket Analysis (What will I buy next? 3 Million Instacart Orders, Open Sourced) What are the top- and bottom-selling items?
- Amazon Reviews for Sentiment Analysis (Let's get sentimental. Few million rows of Amazon customer review text and star ratings) What are the best- and worst-rated products?
- Indian Premier League | Kaggle (Love Cricket? This is the dataset for you. This dataset has IPL data from all seasons and all matches. Can you predict the winner for the next season?)
- Walmart Recruiting – Store Sales Forecasting (Hmm… you think you can forecast? Data from 45 stores in the US, also bakes in the seasonality and key events so be prepared for ups and downs.) Compute average temperatures and other weather statistics for a city.
- Trending YouTube Video Statistics and Comments (How about good old Exploratory Data Analysis (EDA) and insights generation? Can you identify the attributes that make a video popular? 200 trending videos from US and UK)
- Credit Card Fraud Detection (Let's play fraudster. Data from a European credit card. Can you cope with the low incidence rate of 0.17%?)
- Climate Change: Earth Surface Temperature Data (Is Global Warming for real? Global temperature data from the year 1750 onwards. What will be a good way to statistically segment this data?)
- https://www.kaggle.com/hhs/healt… (Healthcare analytics is booming. Data from US Department of Health on individual and small businesses. What drives the plan rate? Who makes the most money?)
- Used cars database | Kaggle (Data on 370,000 used cars scraped from eBay Germany. Let's keep it simple: build a linear regression model)
- Human Resources Analytics | Kaggle (Why do employees leave? Note that this is simulated data and not very large in size)
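
As one illustration of the kind of work the notebook might show, the sketch below ranks items by how often they were ordered, in the spirit of the first question (top- and bottom-selling items). The toy rows, their two-column layout, and the function name rank_items are hypothetical and are not taken from the actual Instacart files; check the real schema before adapting it. In a notebook, a library such as pandas would typically replace the hand-rolled counting.

```python
from collections import Counter

# Hypothetical toy rows of (order_id, product_name). These are made up for
# illustration only and do not come from the real dataset.
order_rows = [
    (1, "Banana"),
    (1, "Whole Milk"),
    (2, "Banana"),
    (2, "Spinach"),
    (3, "Banana"),
    (3, "Whole Milk"),
    (4, "Limes"),
]


def rank_items(rows, n=2):
    """Return (top_n, bottom_n) lists of (product, count) pairs."""
    counts = Counter(product for _, product in rows)
    # Sort by count (descending), then by name, so ties break deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n], ranked[-n:]


top, bottom = rank_items(order_rows)
print("Top sellers:", top)
print("Bottom sellers:", bottom)
```

The summary report would then state the question, describe how the counts were produced, and present the resulting table or graph alongside the observation.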