Project Foundations for Data Science: FoodHub Data Analysis¶
Marks: 60
Context¶
The number of restaurants in New York is increasing day by day. Lots of students and busy professionals rely on those restaurants due to their hectic lifestyles. Online food delivery service is a great option for them. It provides them with good food from their favorite restaurants. A food aggregator company FoodHub offers access to multiple restaurants through a single smartphone app.
The app allows the restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the food package is handed over to the delivery person, he/she confirms the pick-up in the app and travels to the customer's location to deliver the food. The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The food aggregator earns money by collecting a fixed margin of the delivery order from the restaurants.
Objective¶
The food aggregator company has stored the data of the different orders made by the registered customers in their online portal. They want to analyze the data to get a fair idea about the demand of different restaurants which will help them in enhancing their customer experience. Suppose you are hired as a Data Scientist in this company and the Data Science team has shared some of the key questions that need to be answered. Perform the data analysis to find answers to these questions that will help the company to improve the business.
Data Description¶
The data contains the different data related to a food order. The detailed data dictionary is given below.
Data Dictionary¶
- order_id: Unique ID of the order
- customer_id: ID of the customer who ordered the food
- restaurant_name: Name of the restaurant
- cuisine_type: Cuisine ordered by the customer
- cost_of_the_order: Cost of the order
- day_of_the_week: Indicates whether the order is placed on a weekday or weekend (The weekday is from Monday to Friday and the weekend is Saturday and Sunday)
- rating: Rating given by the customer out of 5
- food_preparation_time: Time (in minutes) taken by the restaurant to prepare the food. This is calculated by taking the difference between the timestamps of the restaurant's order confirmation and the delivery person's pick-up confirmation.
- delivery_time: Time (in minutes) taken by the delivery person to deliver the food package. This is calculated by taking the difference between the timestamps of the delivery person's pick-up confirmation and drop-off information
Let us start by importing the required libraries¶
# import libraries for data manipulation
import numpy as np
import pandas as pd
# import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# to suppress warnings
import warnings
warnings.filterwarnings('ignore')
Understanding the structure of the data¶
# let colab access my google drive
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
# read the data
df = pd.read_csv('/content/drive/MyDrive/MIT - Applied Data Science/Foundations - Python & Statistics/Project - FoodHub/foodhub_order.csv')
# returns the first 5 rows
df.head()
| | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1477147 | 337525 | Hangawi | Korean | 30.75 | Weekend | Not given | 25 | 20 |
| 1 | 1477685 | 358141 | Blue Ribbon Sushi Izakaya | Japanese | 12.08 | Weekend | Not given | 25 | 23 |
| 2 | 1477070 | 66393 | Cafe Habana | Mexican | 12.23 | Weekday | 5 | 23 | 28 |
| 3 | 1477334 | 106968 | Blue Ribbon Fried Chicken | American | 29.20 | Weekend | 3 | 25 | 15 |
| 4 | 1478249 | 76942 | Dirty Bird to Go | American | 11.59 | Weekday | 4 | 25 | 24 |
Observations:¶
The DataFrame has 9 columns as mentioned in the Data Dictionary. Data in each row corresponds to the order placed by a customer.
Question 1: How many rows and columns are present in the data? [0.5 mark]¶
# Write your code here
df.shape
(1898, 9)
Observations:¶
The dataframe has 1898 rows and 9 columns.
Question 2: What are the datatypes of the different columns in the dataset? (The info() function can be used) [0.5 mark]¶
# Use info() to print a concise summary of the DataFrame
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   order_id               1898 non-null   int64
 1   customer_id            1898 non-null   int64
 2   restaurant_name        1898 non-null   object
 3   cuisine_type           1898 non-null   object
 4   cost_of_the_order      1898 non-null   float64
 5   day_of_the_week        1898 non-null   object
 6   rating                 1898 non-null   object
 7   food_preparation_time  1898 non-null   int64
 8   delivery_time          1898 non-null   int64
dtypes: float64(1), int64(4), object(4)
memory usage: 133.6+ KB
Observations:¶
The dataset columns have the following datatypes:
- Integer: order_id, customer_id, food_preparation_time and delivery_time;
- Float: cost_of_the_order;
- Object: restaurant_name, cuisine_type, day_of_the_week and rating.
Question 3: Are there any missing values in the data? If yes, treat them using an appropriate method. [1 mark]¶
# Write your code here
df.isna().sum()
| | 0 |
|---|---|
| order_id | 0 |
| customer_id | 0 |
| restaurant_name | 0 |
| cuisine_type | 0 |
| cost_of_the_order | 0 |
| day_of_the_week | 0 |
| rating | 0 |
| food_preparation_time | 0 |
| delivery_time | 0 |
Observations:¶
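This line of code returns the count of null values per column, and it shows that there are no null values in our data, so no treatment is needed at this stage. Note, however, that the rating column stores unrated orders as the string "Not given" (visible in the first rows above), which isna() does not count as missing.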
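Placeholder strings such as the "Not given" entries in the rating column are not counted by isna(). One possible way to surface them is to declare the placeholder as missing when the file is loaded. The sketch below is only an illustration: it uses a made-up three-row CSV held in memory (sample_csv is a hypothetical name), not the real foodhub_order.csv.

```python
import io

import pandas as pd

# Hypothetical three-row sample; the values are made up for illustration
sample_csv = io.StringIO(
    "order_id,rating\n"
    "1,Not given\n"
    "2,5\n"
    "3,Not given\n"
)

# na_values tells read_csv to treat the placeholder string as a missing value
sample = pd.read_csv(sample_csv, na_values=["Not given"])

# Count missing values per column, as done with df.isna().sum() above
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

With this approach the rating column is read as a numeric column straight away, so the placeholder rows appear in the null count instead of being hidden inside an object column.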
Question 4: Check the statistical summary of the data. What is the minimum, average, and maximum time it takes for food to be prepared once an order is placed? [2 marks]¶
# Write your code here
df.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| order_id | 1898.0 | 1.477496e+06 | 548.049724 | 1476547.00 | 1477021.25 | 1477495.50 | 1.477970e+06 | 1478444.00 |
| customer_id | 1898.0 | 1.711685e+05 | 113698.139743 | 1311.00 | 77787.75 | 128600.00 | 2.705250e+05 | 405334.00 |
| cost_of_the_order | 1898.0 | 1.649885e+01 | 7.483812 | 4.47 | 12.08 | 14.14 | 2.229750e+01 | 35.41 |
| food_preparation_time | 1898.0 | 2.737197e+01 | 4.632481 | 20.00 | 23.00 | 27.00 | 3.100000e+01 | 35.00 |
| delivery_time | 1898.0 | 2.416175e+01 | 4.972637 | 15.00 | 20.00 | 25.00 | 2.800000e+01 | 33.00 |
Observations:¶
The describe() function above gives us a statistical summary for the different columns of the dataset. As we can see, for the food_preparation_time column, we got the following statistical results:
- Minimum: 20 min
- Average: about 27.4 min
- Maximum: 35 min
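If only these three statistics are needed, they can also be requested directly instead of being read off the full describe() table. A minimal sketch, assuming a small made-up DataFrame (the values below are not taken from the dataset):

```python
import pandas as pd

# Made-up preparation times in minutes, for illustration only
sample = pd.DataFrame({"food_preparation_time": [20, 25, 27, 31, 35]})

# agg with a list of function names returns one value per statistic
prep_summary = sample["food_preparation_time"].agg(["min", "mean", "max"])
print(prep_summary.round(2))
```

On the real data the same call would be applied to df["food_preparation_time"].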
Question 5: How many orders are not rated? [1 mark]¶
# Write the code here
# Finding how many rows in the rating column are filled with "Not given", meaning orders there are not rated
(df["rating"] == "Not given").sum()
736
Observations:¶
As we can see in the cell above, the dataset has 736 rows where the rating column has the value "Not given".
Exploratory Data Analysis (EDA)¶
Univariate Analysis¶
Question 6: Explore all the variables and provide observations on their distributions. (Generally, histograms, boxplots, countplots, etc. are used for univariate exploration.) [9 marks]¶
# Write the code here
# Let's start with the cuisine_type column. Since it is a categorical variable, I'll use a count plot
sns.countplot(data=df, x="cuisine_type")
plt.xticks(rotation=90)
plt.show()
Observations:¶
Regarding the cuisine_type column, we can conclude that the most popular cuisine types are American, Japanese, Italian and Chinese, with American cuisine being the most ordered in our dataset.
# Let's now analyse the cost_of_the_order column. Since it is a numerical variable, I'll use a histogram as well as a boxplot to visualize the distribution of the data
sns.histplot(data=df, x="cost_of_the_order", kde=True)
plt.show()
sns.boxplot(data=df, x="cost_of_the_order")
plt.show()
Observations:¶
- For the cost_of_the_order column, we can see that the order values range from approximately 4 to 35 dollars, with a median around 14 dollars and a mean of about 16.5 dollars (see the statistical summary in Question 4).
- We can also see that the data is right-skewed, which indicates that there is a higher number of lower-cost orders and that the number of orders tends to decrease as the price increases.
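The right skew read from the plots can also be checked numerically: in a right-skewed distribution the mean usually sits above the median, and the sample skewness is positive. A minimal sketch on made-up order costs (not the real data):

```python
import pandas as pd

# Made-up order costs with a few expensive orders forming a right tail
costs = pd.Series([5, 8, 9, 10, 12, 12, 14, 25, 30, 35], name="cost_of_the_order")

mean_cost = costs.mean()
median_cost = costs.median()
# Series.skew() returns the sample skewness; positive values indicate a right tail
skewness = costs.skew()

print(f"mean={mean_cost:.2f}, median={median_cost:.2f}, skew={skewness:.2f}")
```

On the real data, the same three calls on df["cost_of_the_order"] would give a quick numeric confirmation of what the histogram suggests.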
# Now we'll analyse the day_of_the_week column, which is a categorical variable, so I'll use a count plot
sns.countplot(data=df, x="day_of_the_week")
plt.show()
Observations:¶
- As we can clearly see, more orders are placed on weekends than on weekdays.
- The number of orders on weekends is more than twice the number on weekdays.
# Moving on to the rating column, I will use a count plot to analyse the ratings count
sns.countplot(data=df, x="rating")
plt.show()
Observations:¶
- We can see that the ratings range from 3 to 5, with 5 being the most common rating given by the customers.
- We also have a lot of customers that haven't given a rating score as we have already seen in the previous question.
- I will have to address this issue later on so that I can do further analysis on this variable.
# I'm now going to plot the food_preparation_time column. Since this is a numerical variable, I'll use a histogram and a boxplot to better visualize this data
sns.histplot(data=df, binwidth=1, x="food_preparation_time")
plt.show()
sns.boxplot(data=df, x="food_preparation_time")
plt.show()
Observations:¶
- Regarding the food_preparation_time variable, we can see that there are around 120 orders in each one-minute bin between 20 and 34 minutes.
- The last bar, though, rises to over 200 orders. This is most likely a binning artifact rather than a real peak: with binwidth=1 the final bin spans 34 to 35 minutes inclusive, so it counts both the 34-minute and the 35-minute orders.
- By analysing the boxplot we can confirm the statistics that we saw earlier: the median time for preparing the food is about 27 minutes, the fastest orders are prepared in 20 minutes, and the slowest ones take 35 minutes.
# Finally, let's analyse the delivery_time column, again with a histogram and a boxplot, since this is a numerical variable as well
sns.histplot(data=df, binwidth=1, x="delivery_time")
plt.show()
sns.boxplot(data=df, x="delivery_time")
plt.show()
Observations:¶
- By analysing the histogram we can immediately observe that the most common delivery time is about 24 minutes.
- That number is very close to the median delivery time of 25 minutes shown in the boxplot, and to the mean of about 24.2 minutes from the statistical summary.
- We can also conclude that the fastest deliveries take 15 minutes, and 33 minutes is the maximum time a customer has to wait for their order to be delivered.
Question 7: Which are the top 5 restaurants in terms of the number of orders received? [1 mark]¶
# Write the code here
df["restaurant_name"].value_counts().nlargest(5)
| restaurant_name | count |
|---|---|
| Shake Shack | 219 |
| The Meatball Shop | 132 |
| Blue Ribbon Sushi | 119 |
| Blue Ribbon Fried Chicken | 96 |
| Parm | 68 |
Observations:¶
The top 5 restaurants by number of orders received are:
- Shake Shack - 219 Orders
- The Meatball Shop - 132 Orders
- Blue Ribbon Sushi - 119 Orders
- Blue Ribbon Fried Chicken - 96 Orders
- Parm - 68 Orders
Question 8: Which is the most popular cuisine on weekends? [1 mark]¶
# Write the code here
# To find out which is the most popular cuisine type on weekends, I'll use a count plot of cuisine_type, split by day_of_the_week through the hue parameter
sns.countplot(data=df, x="cuisine_type", hue="day_of_the_week")
plt.xticks(rotation=90)
plt.show()
Observations:¶
As we can clearly see from the plot above, the most popular cuisine on weekends is the American cuisine with over 400 orders.
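The same answer can be obtained as a number rather than read from the plot, by filtering the weekend rows and counting cuisines. A minimal sketch, assuming a tiny made-up DataFrame with the same two column names:

```python
import pandas as pd

# Made-up orders, for illustration only
sample = pd.DataFrame({
    "cuisine_type": ["American", "Japanese", "American", "Italian", "American", "Japanese"],
    "day_of_the_week": ["Weekend", "Weekend", "Weekend", "Weekday", "Weekday", "Weekday"],
})

# Keep only weekend orders, then count how often each cuisine appears
weekend_counts = sample.loc[
    sample["day_of_the_week"] == "Weekend", "cuisine_type"
].value_counts()

# idxmax returns the label with the highest count
top_weekend_cuisine = weekend_counts.idxmax()
print(weekend_counts)
print(top_weekend_cuisine)
```

Applied to df, this would return the exact weekend order count for each cuisine alongside the visual comparison.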
Question 9: What percentage of the orders cost more than 20 dollars? [2 marks]¶
# Write the code here
# First I will store the total number of orders in a variable
total = df.shape[0]
# Now I will create another variable to store the number of orders that are above 20
above_20 = df[df["cost_of_the_order"] > 20].shape[0]
# Finally, now that we have all the values we need, I'll find the percentage of the orders that cost more than 20 dollars and display the rounded result
percentage_above_20 = (above_20 / total) * 100
round(percentage_above_20)
29
Observations:¶
29 % of the orders cost more than 20 dollars.
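As a side note, the share of rows meeting a condition can also be computed in one step, because the mean of a boolean mask is the fraction of True values. A minimal sketch on made-up costs (not the real data):

```python
import pandas as pd

# Made-up order costs; note that 20.0 itself is not "more than 20"
costs = pd.Series([12.0, 25.0, 8.5, 30.0, 20.0])

# The comparison yields True/False per order; the mean is the fraction of True values
share_above_20 = (costs > 20).mean() * 100
print(round(share_above_20, 1))
```

This is equivalent to dividing the count of matching rows by the total number of rows, as done in the cell above.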
Question 10: What is the mean order delivery time? [1 mark]¶
# Write the code here
round(df["delivery_time"].mean())
24
Observations:¶
The mean order delivery time is around 24 minutes.
Question 11: The company has decided to give 20% discount vouchers to the top 3 most frequent customers. Find the IDs of these customers and the number of orders they placed. [1 mark]¶
# Write the code here
df["customer_id"].value_counts().nlargest(3)
| customer_id | count |
|---|---|
| 52832 | 13 |
| 47440 | 10 |
| 83287 | 9 |
Observations:¶
The top 3 most frequent customer IDs are:
- 52832 - 13 Orders
- 47440 - 10 Orders
- 83287 - 9 Orders
Multivariate Analysis¶
Question 12: Perform a multivariate analysis to explore relationships between the important variables in the dataset. (It is a good idea to explore relations between numerical variables as well as relations between numerical and categorical variables) [10 marks]¶
# Write the code here
# I'll start by creating a correlation table between the most relevant numeric variables
corr = df[["cost_of_the_order", "food_preparation_time", "delivery_time"]].corr()
corr
| | cost_of_the_order | food_preparation_time | delivery_time |
|---|---|---|---|
| cost_of_the_order | 1.000000 | 0.041527 | -0.029949 |
| food_preparation_time | 0.041527 | 1.000000 | 0.011094 |
| delivery_time | -0.029949 | 0.011094 | 1.000000 |
# I'll now plot the data of the corr table using a heatmap to better visualize the correlations between these variables
sns.heatmap(data=corr, annot=True, cmap="YlGnBu")
plt.show()
Observations:¶
Surprisingly, the correlations between these variables are very low: all the coefficients are below 0.05 in absolute value. I was expecting to see at least one strong correlation, between food_preparation_time and delivery_time, but that doesn't seem to be the case. So we can say that these three variables show no meaningful linear correlation.
# I'm now going to analyse the relationship between delivery_time and day_of_the_week to see in which days the orders arrive faster to the customers
sns.boxplot(data=df, x="day_of_the_week", y="delivery_time", palette="Set2")
plt.title("Delivery Time by Day of the Week")
plt.xlabel("Day of the Week")
plt.ylabel("Delivery Time (minutes)")
plt.show()
Observations:¶
- As we can see, customers wait less time for their orders on weekends.
- The typical delivery time is around 22 minutes on weekends and around 28 minutes on weekdays.
- We can also observe that the minimum time it takes on weekends for an order to reach the customer is 15 minutes. In comparison, the minimum delivery time on weekdays is around 24 minutes.
# I'm now going to analyse the relationship between food_preparation_time and day_of_the_week, to see if the food preparation time differs during the week from weekends.
sns.boxplot(data=df, x="day_of_the_week", y="food_preparation_time", palette="Set2")
plt.title("Food Preparation Time by Day of the Week")
plt.xlabel("Day of the Week")
plt.ylabel("Food Preparation Time (minutes)")
plt.show()
Observations:¶
Analysing these two variables, we can see that the time it takes for the food to be prepared is about the same on weekends and on weekdays.
# I'm now going for the cuisine_type and cost_of_the_order variables, to see if there is much difference on the costs regarding the type of cuisine
plt.figure(figsize=(12, 6))
sns.boxplot(data=df, x="cuisine_type", y="cost_of_the_order", palette="Set3")
plt.title("Cost of the Order by Cuisine Type")
plt.xlabel("Cuisine Type")
plt.ylabel("Cost of the Order")
plt.xticks(rotation=90)
plt.show()
Observations:¶
- Looking at the plot, we observe that Vietnamese and Korean foods are the cheapest, with most of the meals costing between 10 and 15 dollars.
- For all the other cuisine types, most of the orders cost from around 12 to 29 dollars.
- For the Korean, Mediterranean and Vietnamese types, we can also notice the presence of some outliers, indicating that these cuisines have some options that are more expensive than usual and rarely ordered, as well as some even cheaper options for the Korean cuisine.
# Let's look at the relationship between delivery_time and rating, to see if customers tend to give higher ratings when the orders are delivered faster
sns.boxplot(data=df, x="rating", y="delivery_time", palette="Set2")
plt.title("Delivery Time by Rating")
plt.xlabel("Rating")
plt.ylabel("Delivery Time")
plt.show()
Observations:¶
As we can see from the boxplots, there appears to be no relationship between the time a customer waits for their order to arrive and the rating they give to the service.
# Now let's analyse if there is a relationship between the rating and how much a customer pays for their order, to see if the cost influences the ratings
sns.boxplot(data=df, x="rating", y="cost_of_the_order", palette="Set3")
plt.title("Cost of the Order by Rating")
plt.xlabel("Rating")
plt.ylabel("Cost")
plt.show()
Observations:¶
It also seems that the cost of the orders doesn't influence the customers' ratings, as the cost distributions look similar across all the rating values.
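The visual impression from the two boxplots can be backed by a per-rating summary table. The sketch below is an illustration on made-up rows (not the real data), keeping the rating as a string column as it is at this point in the notebook:

```python
import pandas as pd

# Made-up orders, for illustration only
sample = pd.DataFrame({
    "rating": ["3", "4", "5", "5", "Not given", "4"],
    "delivery_time": [24, 25, 23, 26, 24, 24],
    "cost_of_the_order": [12.0, 16.0, 14.0, 20.0, 15.0, 18.0],
})

# Mean delivery time and mean cost for each rating value
by_rating = sample.groupby("rating")[["delivery_time", "cost_of_the_order"]].mean()
print(by_rating)
```

If the group means on the real data are close to each other, that supports the reading that neither delivery time nor cost drives the rating.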
Question 13: The company wants to provide a promotional offer in the advertisement of the restaurants. The condition to get the offer is that the restaurants must have a rating count of more than 50 and the average rating should be greater than 4. Find the restaurants fulfilling the criteria to get the promotional offer. [3 marks]¶
# Write the code here
# First I will convert the "Not given" values in the rating column to NaN. Then I'll change the datatype of the rating column to numeric, so that I can calculate the average
df["rating"] = df["rating"].replace("Not given", np.nan)
# Converting the column to numeric datatype
df["rating"] = pd.to_numeric(df["rating"])
# Confirming that the conversion of the datatype was successful
df.info()
# Checking that the "Not given" values were correctly replaced by NaN
df.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   order_id               1898 non-null   int64
 1   customer_id            1898 non-null   int64
 2   restaurant_name        1898 non-null   object
 3   cuisine_type           1898 non-null   object
 4   cost_of_the_order      1898 non-null   float64
 5   day_of_the_week        1898 non-null   object
 6   rating                 1162 non-null   float64
 7   food_preparation_time  1898 non-null   int64
 8   delivery_time          1898 non-null   int64
dtypes: float64(2), int64(4), object(3)
memory usage: 133.6+ KB
| | order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1477147 | 337525 | Hangawi | Korean | 30.75 | Weekend | NaN | 25 | 20 |
| 1 | 1477685 | 358141 | Blue Ribbon Sushi Izakaya | Japanese | 12.08 | Weekend | NaN | 25 | 23 |
| 2 | 1477070 | 66393 | Cafe Habana | Mexican | 12.23 | Weekday | 5.0 | 23 | 28 |
| 3 | 1477334 | 106968 | Blue Ribbon Fried Chicken | American | 29.20 | Weekend | 3.0 | 25 | 15 |
| 4 | 1478249 | 76942 | Dirty Bird to Go | American | 11.59 | Weekday | 4.0 | 25 | 24 |
# Now I'll group the data by restaurant_name and calculate the top rating counts for each rating value
df.groupby(["restaurant_name"])["rating"].value_counts().nlargest(10)
| restaurant_name | rating | count |
|---|---|---|
| Shake Shack | 5.0 | 60 |
| The Meatball Shop | 5.0 | 53 |
| Shake Shack | 4.0 | 50 |
| Blue Ribbon Fried Chicken | 5.0 | 32 |
| Blue Ribbon Sushi | 5.0 | 32 |
| Blue Ribbon Sushi | 4.0 | 25 |
| Shake Shack | 3.0 | 23 |
| Blue Ribbon Fried Chicken | 4.0 | 21 |
| The Meatball Shop | 4.0 | 21 |
| RedFarm Broadway | 5.0 | 18 |
# Grouping the data by restaurant_name and calculating the count and average of ratings for each restaurant (NaN ratings are ignored by both)
stats = df.groupby("restaurant_name")["rating"].agg(["count", "mean"])
# Filtering the data to create a dataframe with only the restaurants that match the promotional criteria
target_restaurants = stats[(stats["count"] > 50) & (stats["mean"] > 4)]
target_restaurants
| restaurant_name | count | mean |
|---|---|---|
| Blue Ribbon Fried Chicken | 64 | 4.328125 |
| Blue Ribbon Sushi | 73 | 4.219178 |
| Shake Shack | 133 | 4.278195 |
| The Meatball Shop | 84 | 4.511905 |
Observations:¶
These are the restaurants that fulfil the criteria to get the promotional offer:
- Blue Ribbon Fried Chicken - 64 ratings, average 4.3
- Blue Ribbon Sushi - 73 ratings, average 4.2
- Shake Shack - 133 ratings, average 4.3
- The Meatball Shop - 84 ratings, average 4.5
Question 14: The company charges the restaurant 25% on the orders having cost greater than 20 dollars and 15% on the orders having cost greater than 5 dollars. Find the net revenue generated by the company across all orders. [3 marks]¶
# Write the code here
# To calculate the company's total net revenue, I'll create a column in the dataframe and assign the defined percentages to the corresponding order values
# The column is initialised as float (0.0) so that fractional revenue values can be assigned without a dtype mismatch
df["company_revenue"] = 0.0
# Assigning the 25% revenue for orders over 20 dollars
df.loc[df["cost_of_the_order"] > 20, "company_revenue"] = df["cost_of_the_order"] * 0.25
# Assigning the 15% revenue for orders above 5 and up to 20 dollars
df.loc[(df["cost_of_the_order"] > 5) & (df["cost_of_the_order"] <= 20), "company_revenue"] = df["cost_of_the_order"] * 0.15
# Calculating the total net revenue
net_revenue = df["company_revenue"].sum()
round(net_revenue, 2)
6166.3
Observations:¶
The net revenue generated by the company across all orders is 6166.3 dollars.
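The tiered commission can also be written as a single vectorized rule. The sketch below is an alternative formulation, not the method used above: it relies on numpy.select, which evaluates the conditions in order and picks the first one that matches, and it runs on made-up costs rather than the real data.

```python
import numpy as np
import pandas as pd

# Made-up order costs covering each tier: no commission, 15%, 15% (boundary), 25%
costs = pd.Series([4.0, 10.0, 20.0, 30.0])

# Conditions are checked in order, so "> 20" must come before "> 5"
conditions = [costs > 20, costs > 5]
rates = [0.25, 0.15]

# Orders of 5 dollars or less match no condition and get the default rate of 0
commission_rate = np.select(conditions, rates, default=0.0)

revenue = costs * commission_rate
total_revenue = revenue.sum()
print(round(total_revenue, 2))
```

Keeping the rates in one list makes it easier to see, and later change, which tier applies to which cost range.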
Question 15: The company wants to analyze the total time required to deliver the food. What percentage of orders take more than 60 minutes to get delivered from the time the order is placed? (The food has to be prepared and then delivered.) [2 marks]¶
# Write the code here
# I'll first create a new column in the dataframe with the total delivery time
df["total_time"] = df["food_preparation_time"] + df["delivery_time"]
# Now I'll count the orders that take more than 60 minutes
above_60 = df[df["total_time"] > 60]
# Finally I'll calculate the percentage
percentage_above_60 = (above_60.shape[0] / df.shape[0]) * 100
# I will also count the number of orders over 60 minutes to give more perspective
total_above_60 = len(above_60)
print(round(percentage_above_60, 1))
print(total_above_60)
10.5
200
Observations:¶
There are 200 orders that take more than 60 minutes to be delivered, which corresponds to 10.5 % of all the orders.
Question 16: The company wants to analyze the delivery time of the orders on weekdays and weekends. How does the mean delivery time vary during weekdays and weekends? [2 marks]¶
# Write the code here
# Let's plot the delivery times by day of the week again to visualise these two variables
sns.boxplot(data=df, x="day_of_the_week", y="delivery_time", palette="Set2")
plt.title("Delivery Time by Day of the Week")
plt.xlabel("Day of the Week")
plt.ylabel("Delivery Time (minutes)")
plt.show()
# Now let's calculate the mean delivery times for weekends and weekdays
delivery_mean = df.groupby(["day_of_the_week"])["delivery_time"].mean()
delivery_mean
| day_of_the_week | delivery_time |
|---|---|
| Weekday | 28.340037 |
| Weekend | 22.470022 |
Observations:¶
- As we can see, customers wait less time for their orders on weekends.
- We can also observe that the minimum time it takes on weekends for an order to reach the customer after pick-up is 15 minutes. In comparison, the minimum delivery time on weekdays is around 24 minutes.
- The maximum time it takes to deliver an order after pick-up is 30 minutes on weekends and around 33 minutes on weekdays.
- The mean delivery time is about 22.5 minutes on weekends and about 28.3 minutes on weekdays.
- Given this, we now know that, on average, once the orders are picked up, they are delivered about 6 minutes faster on weekends.
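The gap between the two group means can be computed directly from the grouped result instead of being subtracted by hand. A minimal sketch on made-up delivery times (not the real data):

```python
import pandas as pd

# Made-up delivery times in minutes, for illustration only
sample = pd.DataFrame({
    "day_of_the_week": ["Weekday", "Weekday", "Weekend", "Weekend"],
    "delivery_time": [28, 30, 22, 24],
})

# Mean delivery time per group, indexed by the day_of_the_week label
mean_by_day = sample.groupby("day_of_the_week")["delivery_time"].mean()

# Positive gap means weekday deliveries are slower on average
gap = mean_by_day["Weekday"] - mean_by_day["Weekend"]
print(gap)
```

On the real data the same subtraction would be applied to the delivery_mean Series computed above.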
Conclusion and Recommendations¶
Question 17: What are your conclusions from the analysis? What recommendations would you like to share to help improve the business? (You can use cuisine type and feedback ratings to drive your business recommendations.) [6 marks]¶
Conclusions:¶
- Cuisine Preferences:
- American cuisine is the most popular, especially on weekends. This suggests that American food has a consistent demand, and focusing on this cuisine type could be a strategy for both restaurants and the FoodHub platform.
- Other popular cuisines include Japanese, Italian, and Chinese, which shows a diverse customer preference.
- Cost Trends:
- There are many orders that fall in the lower price range around 10 to 15 dollars, but there is still a noticeable number of higher value orders, over 20 dollars, which contribute significantly to revenue. This can provide insights into pricing strategies, with opportunities for both budget and premium offerings.
- Delivery and Food Preparation:
- The average delivery time is faster on weekends, which could be due to fewer overall orders or more efficient delivery operations.
- Food preparation time doesn't vary significantly between weekdays and weekends, suggesting that restaurants maintain consistent efficiency in preparing meals.
- Customer Ratings:
- The majority of ratings are 4-5 stars, indicating that customers are generally satisfied with the service. However, a significant number of orders are not rated, which could affect the overall picture of customer satisfaction.
- There is no clear correlation between delivery time and ratings, implying that delivery speed does not strongly influence customer satisfaction.
- Promotional Opportunities:
- Based on the ratings count and average rating, restaurants like Blue Ribbon Fried Chicken and Shake Shack are strong candidates for promotional offers, which could attract more customers.
- The company's revenue model is more profitable on orders above $20, but a significant portion of orders falls below that threshold. The model should consider incentivizing higher-value orders.
- Operational Bottlenecks:
- About 10.5 % of orders take more than 60 minutes for delivery and preparation, which may indicate potential inefficiencies or high demand for certain restaurants during peak hours.
Recommendations:¶
- Focus on American Cuisine and Popular Restaurants:
- Promote American cuisine heavily, especially during weekends, where demand is highest. This could include special deals or features for American restaurants to capitalize on high traffic.
- Invest in marketing campaigns for top restaurants like Shake Shack and The Meatball Shop, which are frequently ordered from.
- Improve Operational Efficiency:
- Since delivery times are generally quicker on weekends, the company should investigate potential reasons for slower delivery times during weekdays, such as traffic patterns or workforce issues. Implementing time-based pricing for weekends could optimize delivery logistics.
- Consider reviewing restaurants with longer food preparation times, especially those that often exceed the average time of 27 minutes.
- Encourage More Ratings:
- Since 736 orders were not rated, FoodHub could implement incentives for customers to provide feedback. For example, offering small discounts or loyalty points for completing the rating could increase the volume of customer feedback.
- Additionally, investigating whether there is a pattern in un-rated orders (for example, whether certain restaurants or delivery times correlate with no rating) could provide valuable insights.
- Leverage Promotions for High-Rating Restaurants:
- The company should continue targeting restaurants with over 50 ratings and an average rating of above 4 for promotional offers. These restaurants have consistently satisfied customers and are likely to be more successful in driving traffic with a targeted advertisement campaign.
- The restaurants eligible for the promotional offer, like Blue Ribbon Sushi and Shake Shack, could be marketed in a way that emphasizes their customer satisfaction.
- Incentivize Higher-Value Orders:
- Given that 29% of orders exceed $20, FoodHub could consider offering discount vouchers or loyalty programs for customers who place higher-value orders.
- Address Long Delivery Times:
- Focus on improving the delivery process, especially for orders that take over 60 minutes, as this is a significant pain point. Identifying peak hours and restaurants that contribute to delays could help improve the overall customer experience and reduce customer complaints.
- Adding real-time tracking and better estimated delivery time features in the app could reduce customer frustration and improve satisfaction.
By focusing on these areas, FoodHub could improve customer satisfaction, streamline operations, and boost revenue.
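As a possible starting point for the recommendation on un-rated orders, the share of un-rated orders per restaurant could be computed as sketched below. This is an illustration on made-up rows with hypothetical restaurant names "A" and "B"; it assumes the rating column still holds the "Not given" string (after the conversion to NaN done in Question 13, isna() would be used in place of eq("Not given")).

```python
import pandas as pd

# Made-up orders, for illustration only
sample = pd.DataFrame({
    "restaurant_name": ["A", "A", "A", "B", "B"],
    "rating": ["Not given", "5", "Not given", "4", "Not given"],
})

# True where the order has no rating
is_unrated = sample["rating"].eq("Not given")

# The mean of a boolean per group is the fraction of un-rated orders
unrated_share = (
    is_unrated.groupby(sample["restaurant_name"])
    .mean()
    .sort_values(ascending=False)
)
print(unrated_share)
```

Restaurants at the top of this list would be the first candidates for a closer look or for targeted rating incentives.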