Table of contents
R Shiny Application
Data Visualisation using Alteryx
& Tableau
Predictive modeling
Consulting Case studies
R Shiny Application
Approach: The main idea is to create an application, that we named 'Foogle', that would work as an intuitive restaurant finder tool designed for people who have food allergies or special dietary needs. The primary focus was meeting all project goals and deliverables and the next was to provide a very simple and easy-to-use interface allowing the most useful functions. Consequentially, we conceptualized a broad database design that would allow both expansion and contraction of use cases so that the basic table structure need not be disturbed. Then an R shiny interface was created followed by database queries to fetch data.
Data Visualization Project
Project : Commit Partnership – Data Visualization Project Description
Project description: The Commit Partnership is an educational non-profit whose true-north goal is that by 2040 at least half of all 25-34 year old residents of Dallas County, irrespective of race, will be provided the opportunity to earn a living wage.
Preliminary STAAR results show drastic declines in student performance from 2019 to 2021. This data visualization project would create a Tableau dashboard that visualizes these changes, with a particular focus on schools in the DFW Metroplex. Students will also need to use Alteryx to blend various data sources together.
Project scope: Students are encouraged to analyze and provide insights with regards to :
- Which schools had the biggest declines in test scores from 2019-2021? Which ones had the smallest? Are there any trends about schools with the biggest/smallest declines?
- Which schools are “bright spots” (overperforming expectations), and which schools should be “focus areas” (underperforming expectations)?
- Which schools are succeeding at closing the opportunity gap (the difference in standardized test scores and other metrics by race and/or income)?
Project Findings:
- Major factors that were considered :
- Percent of Economically Disadvantaged (EcoDis) students
- Opportunity Gap
- Minority Full Time Staff
- Percent of Bilingual Teachers
- Teacher Experience Average
- Teacher Student Ratio
- Teacher Base Salary
- Insights :
- Factors that heavily influenced outcomes :
- Percent of EcoDis Students
- Opportunity Gap
- Dallas County consistently showed better average test scores for Hispanic students.
- In both Dallas County and Tarrant County, Hispanic students had the largest drop in average test scores.
- Economically disadvantaged students continue to struggle.
- Minority groups are disproportionately affected by hiccups in education.
- Strides have been made to attempt to close the Opportunity Gap
ALTERYX
Alteryx Packaged Workflow here
TABLEAU
A glimse of Tableau work
Tableau packaged workbook here
Predictive Modelling
BIKESHARE PROJECT
Use the publicly available bikeshare dataset to do exploratory data analysis, predict a linear model and provide valuable insights.
Observations and Insights:
- The spread of all users was fairly equal among all weekdays, however, the number of bike riders dropped on weekends.
- This could be due to the fact that customers are mostly registered and use bikes mostly on weekdays.
- The number of bike users both casual and regular, increased in the second year by almost double. Demonstrating that the company has significant growth potential.
- The number of users peaked at about 80 degrees Fahrenheit, at temperatures hotter than that the amount of users begins to decline. The number of riders is much lower at low temperatures (winters)
- The results of the regression indicated that variables like - season, year, month, holiday, weekday, working day, weather situation, temperature, humidity, and wind speed - account for 84.22% of the variation in total rental bike users (count). From this, we can conclude that the time of year and environmental conditions are important elements to consider when predicting total bike users (count).
Suggestions:
- The number of registered users is four times larger than the number of casual users. We also observed that there is a significant drop in casual bike users that rent bikes on weekdays compared to weekends. We could have promotions that are targeted directly at casual users specifically for weekdays and persuade them to become registered users.
- We need to instill confidence to customers that biking can still be a good option for transportation regardless of season or temperature.
- In terms of bike users and temperature and the seasons, it can be understood that the amount of bike riders in the winter is low due to the harsh temperatures. One way to mitigate this is by advertising bikes that are more suitable for that weather.
- Windspeeds appear to have a highly negative impact on ridership and hence we may also explore the option of using windscreens for the bikes on a trial basis. They are not very expensive to install or maintain.
Project final report here
Project R file here
Project data here
Note: This is a group project collectively done by Allen Mattam, Andrea De Leon, Isak Arms, Jessica Zhang and Srikanth Soma.
MOVIE PREDICTION PROJECT
Project: Run a logistic regression on the movies datasets. The goal is to say something interesting about this data. The outcome of interest is whether the movie is considered “good” or “bad” based on the movie’s rating. Part of the purpose of this homework project is to get comfortable obtaining, cleaning, and prepping data for analysis. Combine or manipulate the original variables or make new variables using original variables if needed.
Approach: We cleansed the data using python. We created new variables namely ‘cast_score’, ‘director_score’, and ‘production_score’. We used the top directors, Actors & Actresses and Production houses according to IMDB and incremented the respective variables - ‘director_score’, ‘cast_score’ and ‘production_score’ by 1 every there was one from the list in the movie. We then ran regression in R with variables - Belongs_to_collection, popularity, runtime, cast_score, director_score, production_score.
Observations and Insights:
- There are many movies that have very high popularity but have very low ratings and vice versa. Given that we use the calculated rating to determine whether a movie is good or bad, we assumed that this might be due to movies that are marketed by pushing a lot of money to create a lot of hype from the start, or by having a very high rated actor in the movie but doesn't gain a lot of viewership after the movie is actually released.
- Our model suggests that the popularity of a movie means that the movie is 59% probable to be a good movie.
- The star actors in the movie, the director, and the producers of the movie have a significant influence on the success of the movie by 74%, 85%, and 60% respectively.
- There is a negative relationship between the fact that a movie belongs to a series and the rating of the movie. Based on our model, if everything else remains the same, the probability that a movie is bad is 57% if is part of a movie series. However, this goes against the common perception and there could be unaccounted factors playing into this or it is just that the count of bad rated movies belonging to series is higher than the popular ones.
Suggestions:
- Although great cast, production and direction will add value, they come at a high cost. Having a great director had a better impact than cast and production (according to our model) and could be a balanced trade-off between expense and success factors.
- Other good factors that contribute to the success of the movie:
- Run time between 90-150 minutes.
- Genres: Animation, Adventure and Family Movie (for greater ROI)
- Popularity: IMDB calculates this variable using its proprietary formula. This variable is calculated before the release of the movie and influences the marketing plans.
Project Final report here
Project R file here
Project Python file here
Project data here
Business Analytics Consulting Case Studies
Executive
summary and presentation outlining
top 5 analytics/ process solutions to transform and scale loan
contract-drafting process at financial services firm. Flowcharts of as-is and
to-be processes, and value realization matrix for proper prioritization of
solutions.
CVS service improvement (HBC)
Breakdown
of pharmacy service process stages and identification of areas of improvement
to increase customer satisfaction. Outline of additional decisions to further
improve and scale the current process.
Domestic Auto Parts (HBC)
Outline
of objectives for each of the dimensions of the Balanced Scorecard: Financial,
Customer, Internal Processes, Learning & Growth. Proposals of KPIs to
measure performance against strategy.
UCB data is the new drug (HBC)
Assessment
of existence of an evidence-based culture at pharmaceutical company UCB to
foster better decision making. Planning of decision agenda for the executive
meeting in Shanghai to present technology strategy.
Ethics - Data Science at Target
How
far can data science go? What role does an Analyst have in making ethical decision-making?