09 January 2022

Business Analytics Projects & Learnings


Table of contents

R Shiny Application

Data Visualisation using Alteryx & Tableau

Predictive modeling

Consulting Case studies



R Shiny Application


Project: A R Shiny application using Microsoft Server Database 

Project Goal: This group project will simulate the system design and implementation experience of a typical software consulting firm. The project will require you to analyze user requirements, develop and document a database design to support the user requirements, and implement the project in SQL Server and Shiny. Project deliverables include SQL queries used for developing the database,  R code for the shiny application, Final Report containing project proposal, Conceptual design and Shiny App URL.

Approach: The main idea is to create an application, that we named '
Foogle', that would work as an intuitive restaurant finder tool designed for people who have food allergies or special dietary needs. The primary focus was meeting all project goals and deliverables and the next was to provide a very simple and easy-to-use interface allowing the most useful functions. Consequentially, we conceptualized a broad database design that would allow both expansion and contraction of use cases so that the basic table structure need not be disturbed. Then an R shiny interface was created followed by database queries to fetch data.

More details are available in the Project Paper.


R shiny Application here

Project Paper here

Project Presentation here

Project R file here

Project SQL queries here

Entity Relationship diagram here


Note1: The project was a group effort by Allen Mattam, Audrey Nguyen, Priyanka Raj, Srikanth Soma
Note2: To run the R file, another file containing database credentials is needed. It is not being shared for security reasons.

Screenshots:




Data Visualization Project

Project : Commit Partnership – Data Visualization Project Description

Project description: The Commit Partnership is an educational non-profit whose true-north goal is that by 2040 at least half of all 25-34 year old residents of Dallas County, irrespective of race, will be provided the opportunity to earn a living wage.  

        Preliminary STAAR results show drastic declines in student performance from 2019 to 2021. This data visualization project would create a Tableau dashboard that visualizes these changes, with a particular focus on schools in the DFW Metroplex. Students will also need to use Alteryx to blend various data sources together.   


Project scope: Students are encouraged to analyze and provide insights with regards to :

  • Which schools had the biggest declines in test scores from 2019-2021? Which ones had the smallest? Are there any trends about schools with the biggest/smallest declines?  
  • Which schools are “bright spots” (overperforming expectations), and which schools should be “focus areas” (underperforming expectations)?  
  • Which schools are succeeding at closing the opportunity gap (the difference in standardized test scores and other metrics by race and/or income)?

Project Findings

  • Major factors that were considered​ :
    • Percent of Economically Disadvantaged (EcoDis) students​
    • Opportunity Gap​
    • Minority Full Time Staff​
    • Percent of Bilingual Teachers​
    • Teacher Experience Average​
    • Teacher Student Ratio​
    • Teacher Base Salary

  • Insights​ :
    • Factors that heavily influenced outcomes​ :
      • Percent of EcoDis Students​
      • Opportunity Gap
    • Dallas County consistently showed better average test scores for Hispanic students​.
    • In both Dallas County and Tarrant County, Hispanic students had the largest drop in average test scores​.
    • Economically disadvantaged students continue to struggle.
    • Minority groups are disproportionately affected by hiccups in education.
    • Strides have been made to attempt to close the Opportunity Gap
Note1: This is a group project collectively worked on by David Lin, Edward Zhao, Nicolas Rhodes, Nikhil Cherukuri, and Srikanth Soma​.
Note2: The data used for the purpose of this project is publicly available (most of it at https://tea.texas.gov/ ) and is not proprietory. 

ALTERYX










Alteryx  Packaged Workflow here


TABLEAU


A glimse of Tableau work











Tableau packaged workbook here


Predictive Modelling


BIKESHARE PROJECT


Project
Use the publicly available bikeshare dataset to do exploratory data analysis, predict a linear model and provide valuable insights.

Observations and Insights: 

  • The spread of all users was fairly equal among all weekdays, however, the number of bike riders dropped on weekends. 
  • This could be due to the fact that customers are mostly registered and use bikes mostly on weekdays. 
  • The number of bike users both casual and regular, increased in the second year by almost double. Demonstrating that the company has significant growth potential. 
  • The number of users peaked at about 80 degrees Fahrenheit, at temperatures hotter than that the amount of users begins to decline. The number of riders is much lower at low temperatures (winters) 
  • The results of the regression indicated that variables like - season, year, month, holiday, weekday, working day, weather situation, temperature, humidity, and wind speed - account for 84.22% of the variation in total rental bike users (count). From this, we can conclude that the time of year and environmental conditions are important elements to consider when predicting total bike users (count). 

Suggestions:

  • The number of registered users is four times larger than the number of casual users. We also observed that there is a significant drop in casual bike users that rent bikes on weekdays compared to weekends. We could have promotions that are targeted directly at casual users specifically for weekdays and persuade them to become registered users.
  • We need to instill confidence to customers that biking can still be a good option for transportation regardless of season or temperature.
    • In terms of bike users and temperature and the seasons, it can be understood that the amount of bike riders in the winter is low due to the harsh temperatures. One way to mitigate this is by advertising bikes that are more suitable for that weather.
    • Windspeeds appear to have a highly negative impact on ridership and hence we may also explore the option of using windscreens for the bikes on a trial basis. They are not very expensive to install or maintain.


Project final report here

Project R file here

Project data here

Note: This is a group project collectively done by Allen Mattam, Andrea De Leon, Isak Arms, Jessica Zhang and Srikanth Soma.


MOVIE PREDICTION PROJECT

ProjectRun a logistic regression on the movies datasets. The goal is to say something interesting about this data. The outcome of interest is whether the movie is considered “good” or “bad” based on the movie’s rating. Part of the purpose of this homework project is to get comfortable obtaining, cleaning, and prepping data for analysis. Combine or manipulate the original variables or make new variables using original variables if needed.

ApproachWe cleansed the data using python. We created new variables namely ‘cast_score’, ‘director_score’, and ‘production_score’. We used the top directors, Actors & Actresses and Production houses according to IMDB and incremented the respective variables - ‘director_score’, ‘cast_score’ and ‘production_score’ by 1 every there was one from the list in the movie. We then ran regression in R with variables - Belongs_to_collection, popularity, runtime, cast_score, director_score, production_score.

Observations and Insights: 

  • There are many movies that have very high popularity but have very low ratings and vice versa. Given that we use the calculated rating to determine whether a movie is good or bad, we assumed that this might be due to movies that are marketed by pushing a lot of money to create a lot of hype from the start, or by having a very high rated actor in the movie but doesn't gain a lot of viewership after the movie is actually released.
  • Our model suggests that the popularity of a movie means that the movie is 59% probable to be a good movie.
  • The star actors in the movie, the director, and the producers of the movie have a significant influence on the success of the movie by 74%, 85%, and 60% respectively.
  • There is a negative relationship between the fact that a movie belongs to a series and the rating of the movie. Based on our model, if everything else remains the same, the probability that a movie is bad is 57% if is part of a movie series. However, this goes against the common perception and there could be unaccounted factors playing into this or it is just that the count of bad rated movies belonging to series is higher than the popular ones.

Suggestions:

  • Although great cast, production and direction will add value, they come at a high cost. Having a great director had a better impact than cast and production (according to our model) and could be a balanced trade-off between expense and success factors.
  • Other good factors that contribute to the success of the movie:
    • Run time between 90-150 minutes.
    • Genres:  Animation, Adventure and Family Movie (for greater ROI)
    • Popularity: IMDB calculates this variable using its proprietary formula. This variable is calculated before the release of the movie and influences the marketing plans. 

 

Project Final report here

Project R file here

Project Python file here

Project data here


Business Analytics Consulting Case Studies

Blueprint for FinLaw Case

Executive summary and presentation outlining top 5 analytics/ process solutions to transform and scale loan contract-drafting process at financial services firm. Flowcharts of as-is and to-be processes, and value realization matrix for proper prioritization of solutions.

 

CVS service improvement (HBC)

Breakdown of pharmacy service process stages and identification of areas of improvement to increase customer satisfaction. Outline of additional decisions to further improve and scale the current process.

 

Domestic Auto Parts (HBC)

Outline of objectives for each of the dimensions of the Balanced Scorecard: Financial, Customer, Internal Processes, Learning & Growth. Proposals of KPIs to measure performance against strategy.

 

UCB data is the new drug (HBC)

Assessment of existence of an evidence-based culture at pharmaceutical company UCB to foster better decision making. Planning of decision agenda for the executive meeting in Shanghai to present technology strategy.

 

Ethics - Data Science at Target

How far can data science go? What role does an Analyst have in making ethical decision-making?