Cafe in Hong Kong

Posted on Mon 14 October 2024 in notebooks

This notebook is a pyAgrum adaptation of Gautier Marti's "Bayesian Network for Business: Modeling Profit and Loss of a Cafe in Hong Kong".

Thanks a lot, Gautier, for this inspiring notebook!

Bayesian Network of Cafe Profit Model
In [1]:
import numpy as np
import pandas as pd

import pyAgrum as gum
import pyAgrum.clg as gclg

import pyAgrum.lib.notebook as gnb
import pyAgrum.clg.notebook as gclgnb

We use a CLG (Conditional Linear Gaussian) Bayesian network to describe the relationships between the variables of the model. Each CPD defines how one variable depends linearly on its parents (or stays constant, in the case of fixed costs).

  • FootTraffic: This is treated as an independent variable. We model it with a mean value (average_traffic) and a variance, representing the fluctuation in the number of visitors per day.
  • DailySales: This is modeled as a function of foot traffic. The more visitors, the more sales. The average bill per customer is represented by average_bill.
  • RawMaterialCosts: The cost of raw materials is modeled as a percentage of daily sales, reflecting the idea that a fraction of sales goes towards covering ingredient costs. For instance, in this case, 40% of sales goes to raw materials, with a base cost of 100 HKD per day.
  • Wages & Rent: These are fixed daily costs, modeled as constants with no variability.
  • Profit: Finally, we calculate profit as the difference between revenue and costs.

In formula terms, the model is:

$$\begin{eqnarray}
Wages&=&\text{daily_wage}\\
Rent&=&\text{daily_rent}\\
FootTraffic&=&\text{average_traffic}+\epsilon,\ \epsilon\sim{\cal N}(0,(0.1\times\text{average_traffic})^2)\\
DailySales&=&0+FootTraffic\times\text{average_bill}+\epsilon,\ \epsilon\sim{\cal N}(0,(0.1\times\text{average_bill})^2)\\
RawMaterialCosts&=&100+0.4\times DailySales+\epsilon,\ \epsilon\sim{\cal N}(0,20)\\
Profit&=&DailySales-RawMaterialCosts-Wages-Rent+\epsilon,\ \epsilon\sim{\cal N}(0,10)\\
\end{eqnarray}$$
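Because every equation is linear with Gaussian noise, the marginal mean and standard deviation of each variable can be propagated by hand. The sketch below is a plain-Python check, not pyAgrum code. The helper name `propagate_moments` is hypothetical, and the noise parameters are assumed to be the standard deviations that the notebook displays for the default model (100 for FootTraffic, 36 for DailySales, 20 for RawMaterialCosts, 10 for Profit, 0 for the fixed costs).

```python
import math


def propagate_moments(average_bill=60.0, average_traffic=100.0,
                      daily_rent=2000.0, daily_wage=1200.0,
                      sd_traffic=100.0, sd_sales=36.0,
                      sd_raw=20.0, sd_profit=10.0, sd_fixed=0.0):
    """Return {variable: (mean, stdev)} for the linear Gaussian cafe model.

    All sd_* arguments are assumed to be standard deviations of the noise terms.
    """
    material_share = 0.4  # fraction of sales spent on raw materials

    # FootTraffic is a root node.
    m_traffic, v_traffic = average_traffic, sd_traffic ** 2

    # DailySales = average_bill * FootTraffic + noise
    m_sales = average_bill * m_traffic
    v_sales = average_bill ** 2 * v_traffic + sd_sales ** 2

    # RawMaterialCosts = 100 + 0.4 * DailySales + noise
    m_raw = 100 + material_share * m_sales
    v_raw = material_share ** 2 * v_sales + sd_raw ** 2

    # Profit = DailySales - RawMaterialCosts - Wages - Rent + noise.
    # Substituting RawMaterialCosts, DailySales enters with weight (1 - 0.4).
    m_profit = m_sales - m_raw - daily_wage - daily_rent
    v_profit = ((1 - material_share) ** 2 * v_sales
                + sd_raw ** 2 + sd_profit ** 2 + 2 * sd_fixed ** 2)

    return {
        "FootTraffic": (m_traffic, math.sqrt(v_traffic)),
        "DailySales": (m_sales, math.sqrt(v_sales)),
        "RawMaterialCosts": (m_raw, math.sqrt(v_raw)),
        "Profit": (m_profit, math.sqrt(v_profit)),
    }


for name, (mean, sd) in propagate_moments().items():
    print(f"{name:17s} mean={mean:9.3f}  sd={sd:9.3f}")
```

With the default parameters, the mean daily profit comes out at 300 HKD (6000 of sales, minus 2500 of raw materials, 1200 of wages and 2000 of rent), which is the value exact inference without evidence reports for Profit.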

pyAgrum offers a simple syntax to declare a linear SEM (Structural Equation Model). For instance, equation (5), for $RawMaterialCosts$, is written:

RawMaterialCosts = 100+0.4*DailySales [20] (the * is optional; the bracketed value is the one displayed as σ for that node)

In [2]:
def build_model(average_bill=60.0, 
                average_traffic=100.0, 
                daily_rent=2000.0, 
                daily_wage=1200.0,
                epsilon=0.0):
  sem=f"""
Wages            = {daily_wage}                           [{epsilon}] # for exact inference, 
Rent             = {daily_rent}                           [{epsilon}] # stdev=0 not allowed
FootTraffic      = {average_traffic}                      [{0.01*average_traffic*average_traffic}]
DailySales       = {average_bill} FootTraffic             [{0.01*average_bill*average_bill}]
RawMaterialCosts = 100+0.4 DailySales                     [20]
Profit           = DailySales-RawMaterialCosts-Wages-Rent [10]
  """
  return gclg.SEM.toclg(sem)
model=build_model()
model
Out[2]:
[Graph of the CLG. Nodes: Wages (μ=1200.000, σ=0.000), Rent (μ=2000.000, σ=0.000), FootTraffic (μ=100.000, σ=100.000), DailySales (μ=0.000, σ=36.000), RawMaterialCosts (μ=100.000, σ=20.000), Profit (μ=0.000, σ=10.000). Arcs: FootTraffic→DailySales (60.00), DailySales→RawMaterialCosts (0.40), DailySales→Profit (1.00), RawMaterialCosts→Profit (-1.00), Wages→Profit (-1.00), Rent→Profit (-1.00).]

Exact Inference for Conditional Linear Gaussian model

pyAgrum can compute the exact posterior of a CLG (but not for $\sigma=0$):

In [3]:
model2=build_model(epsilon=0.001)
gnb.sideBySide(gclgnb.getInference(model2,evs={}),
               gclgnb.getInference(model2,evs={"DailySales":8000}),
               captions=["Exact inference with no evidence",
                         "Exact inference knowing the value of DailySales (=8000)"])
[Graph: Wages (μ=1200.000, σ=0.001), Rent (μ=2000.000, σ=0.001), FootTraffic (μ=100.000, σ=100.000), DailySales (μ=6000.000, σ=6000.108), RawMaterialCosts (μ=2500.000, σ=2400.127), Profit (μ=300.000, σ=3600.134)]
Exact inference with no evidence
[Graph: Wages (μ=1200.000, σ=0.001), Rent (μ=2000.000, σ=0.001), FootTraffic (μ=133.332, σ=0.600), DailySales (μ=8000.000, σ=0.000), RawMaterialCosts (μ=3300.000, σ=20.000), Profit (μ=1500.000, σ=22.361)]
Exact inference knowing the value of DailySales (=8000)
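The posterior of FootTraffic given DailySales can be reproduced with the textbook formula for conditioning a bivariate Gaussian. The sketch below is plain Python rather than pyAgrum; the function name `condition_on_sales` is hypothetical, and the two noise parameters are assumed to be the standard deviations displayed for the default model (100 for FootTraffic, 36 for DailySales).

```python
import math


def condition_on_sales(observed_sales, average_bill=60.0, average_traffic=100.0,
                       sd_traffic=100.0, sd_sales=36.0):
    """Posterior (mean, stdev) of FootTraffic given DailySales = observed_sales."""
    prior_var = sd_traffic ** 2
    # Var(DailySales) for DailySales = average_bill * FootTraffic + noise
    sales_var = average_bill ** 2 * prior_var + sd_sales ** 2
    # Regression gain: Cov(FootTraffic, DailySales) / Var(DailySales)
    gain = average_bill * prior_var / sales_var
    # Shift the prior mean by the gain times the surprise in DailySales
    post_mean = average_traffic + gain * (observed_sales - average_bill * average_traffic)
    # Observing DailySales removes the explained part of the prior variance
    post_var = prior_var - gain * average_bill * prior_var
    return post_mean, math.sqrt(post_var)


mean, sd = condition_on_sales(8000)
print(f"FootTraffic | DailySales=8000: mean={mean:.3f}, sd={sd:.3f}")
# FootTraffic | DailySales=8000: mean=133.332, sd=0.600
```

These are the values shown for FootTraffic in the graph with evidence. Profit then follows directly: 8000 - (100 + 0.4 × 8000) - 1200 - 2000 = 1500.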

Now, we strictly follow [marti.ai](https://marti.ai/business/2024/10/14/pnl-cafe-simu.html)

Traffic and Bill simulation

The simulation explores a range of

  • foot traffic levels (traffic = [10 * i for i in range(1, 12)])
  • and average customer bill sizes (bill = range(40, 71)).

For each combination of traffic and bill size, a Bayesian network model is built to represent the relationships between key variables like foot traffic, daily sales, raw material costs, wages, rent, and profit.

Once the model is set up, a Monte Carlo simulation is run (N = 1000 samples in the code below). For each simulation:

  • A year’s worth of daily profit is simulated by generating an observed foot traffic level for each day, drawn from a normal distribution around the specified average foot traffic (np.random.normal(average_traffic, average_traffic * 0.1)).
  • Using the observed foot traffic and the conditional relationships between variables, the daily sales and profit are calculated based on the observed traffic.

For each simulation, the cumulative profit over the year is recorded, and then averaged across all simulations for each combination of foot traffic and bill size. This results in an estimate of the mean annual profit (PnL) for a café given different levels of foot traffic and average bill size.
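The procedure described above can be sketched without any Bayesian network library. This is only an illustration under stated assumptions, not the notebook's pyAgrum implementation: the function names are hypothetical, the noise scales (10% of the average traffic, 10% of the average bill, 20 and 10 HKD) are read as standard deviations, and a year is taken as 365 days.

```python
import random


def simulate_year_pnl(average_traffic, average_bill, daily_rent=1500.0,
                      daily_wage=1200.0, n_days=365, rng=None):
    """Cumulative profit of one simulated year (hypothetical helper)."""
    rng = rng or random.Random()
    total = 0.0
    for _ in range(n_days):
        # Observed foot traffic fluctuates around its average
        traffic = rng.gauss(average_traffic, 0.1 * average_traffic)
        # Sales scale with traffic; costs follow the structural equations
        sales = average_bill * traffic + rng.gauss(0.0, 0.1 * average_bill)
        raw_materials = 100 + 0.4 * sales + rng.gauss(0.0, 20.0)
        total += sales - raw_materials - daily_wage - daily_rent + rng.gauss(0.0, 10.0)
    return total


def mean_year_pnl(average_traffic, average_bill, n_simu=200, seed=0):
    """Average the yearly profit over n_simu independent simulated years."""
    rng = random.Random(seed)
    years = [simulate_year_pnl(average_traffic, average_bill, rng=rng)
             for _ in range(n_simu)]
    return sum(years) / n_simu


print(round(mean_year_pnl(average_traffic=100, average_bill=60)))
```

Looping `mean_year_pnl` over every (traffic, bill) pair gives a grid of mean annual profits of the kind the notebook builds with forward sampling.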

In [4]:
from tqdm.auto import tqdm

traffic = [10 * i for i in range(1, 12)]
bill = range(40, 71)
dates = pd.date_range("2024-01-01", "2025-01-01")
N = 1000  # number of simulated trajectories per (traffic, bill) pair

mean_year_pnl = []
for average_traffic in tqdm(traffic):
    mean_year_pnl_per_traffic = []
    for average_bill in bill:
        model = build_model(average_bill=average_bill,
                            average_traffic=average_traffic,
                            daily_rent=1500)
        fs = gclg.ForwardSampling(model)
        # one row per day, one column per simulated trajectory
        all_daily_pnl = pd.DataFrame(
            [fs.makeSample(N).topandas()["Profit"].to_numpy() for date in dates])
        # accumulate over days (rows), keep the year-end value, average over trajectories
        mean_year_pnl_per_traffic.append(all_daily_pnl.cumsum().iloc[-1].mean())
    mean_year_pnl.append(mean_year_pnl_per_traffic)
In [5]:
df_mean_year_pnl = pd.DataFrame(mean_year_pnl, index=traffic, columns=bill)

How to interpret the Simulation

  • Traffic Impact: By varying foot traffic from low to high, the simulation shows how different levels of customer footfall influence the café’s annual profit. Lower traffic may result in negative profits (losses), while higher traffic might lead to profitability.

  • Bill Size Sensitivity: The model also explores the impact of average customer spending (the bill size). A small increase in average bill size could lead to higher profit margins since fixed costs (rent, wages) remain constant, and the additional revenue directly boosts profitability.

  • Annual Profit Ranges: For each scenario of foot traffic and bill size, the simulation yields an estimated mean annual profit, helping to assess how sensitive the café’s financial performance is to these key variables.

In [6]:
import matplotlib.pyplot as plt
plt.rcParams['figure.facecolor']='white'

plt.figure(figsize=(8, 6))
plt.pcolormesh(df_mean_year_pnl, cmap='RdYlGn')
plt.grid(True, which='both', color='lightgray', linestyle='--', linewidth=0.5)
plt.xticks(range(len(bill)), bill, rotation=90, fontsize=12)
plt.yticks(range(len(traffic)), traffic, rotation=90, fontsize=12)
plt.colorbar()
plt.xlabel("Average bill per patron (in HKD)", size=14)
plt.ylabel("Average number of patrons in a day", size=14)
plt.title("Yearly profit (HKD)", size=14)
plt.tight_layout()
[Figure: heatmap "Yearly profit (HKD)"; x-axis: average bill per patron (in HKD); y-axis: average number of patrons in a day]

Quick Comment on the Plot

The plot visualizes the yearly profit of a café as a function of average foot traffic (number of patrons per day) and average bill size (spending per customer). Each cell represents the estimated profit based on the combination of these two factors, with color intensity indicating profit levels.

Key observations:

  • Low foot traffic (bottom rows) generally results in negative profits, regardless of the bill size, indicating that a minimum customer base is essential to cover fixed costs like rent and wages.

  • Higher foot traffic (top rows) leads to a positive profit zone, especially as the average bill size increases.

  • Profit Sensitivity: There is a clear transition from loss to profit as the average number of patrons and their spending increase, highlighting that both high traffic and a sufficient average bill are crucial for the café’s success.

This plot helps identify the break-even points, where running the café becomes profitable, and provides an intuitive visual guide for understanding how small changes in traffic or bill size affect overall profitability.

In [7]:
plt.figure(figsize=(8, 6))
foot_traffic = 90
df_mean_year_pnl.loc[foot_traffic].plot(marker='o', markersize=6, color='blue', lw=2, label='Profit')
plt.axhline(0, color='red', linestyle='--', lw=2, label='Break-even')
plt.grid(True, which='both', linestyle='--', linewidth=0.5, color='gray')
plt.xlabel("Average bill per patron (in HKD)", size=14)
plt.ylabel("Yearly profit (HKD)", size=14)
plt.title(f"Yearly profit in HKD (assuming {foot_traffic} daily patrons)", size=14)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.legend(loc='upper left', fontsize=12)
plt.tight_layout()
plt.show()
[Figure: "Yearly profit in HKD (assuming 90 daily patrons)"; x-axis: average bill per patron (in HKD); y-axis: yearly profit (HKD); red dashed break-even line at 0]

Brief Comment on the Plot

This plot illustrates the projected yearly profit of the café for a foot traffic level of 90 daily patrons, depending on the average spending per customer (bill size).

  • The red dashed line represents the break-even point, where profit is zero.

  • As we can see, with lower average bills, the café operates at a loss. However, once the average bill surpasses approximately HKD 52, the café crosses the break-even threshold and starts generating profit.

  • The plot shows the sensitivity of profitability to the bill size: even small increases in the average bill lead to significant improvements in yearly profit once the business crosses the break-even point.

This graph provides valuable insights into how bill size impacts the café’s financial performance, showing that profitability is highly dependent on maintaining a sufficiently high average spend per customer.
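The break-even bill can also be derived in closed form from the mean equations of the model: the expected daily profit is 0.6 × patrons × bill minus the fixed outlays (100 of base material cost, wages and rent). A minimal sketch, with a hypothetical function name and the rent of 1,500 HKD used in the simulation:

```python
def break_even_bill(daily_patrons, daily_rent=1500.0, daily_wage=1200.0,
                    base_material_cost=100.0, material_share=0.4):
    """Average bill (HKD) at which the expected daily profit is zero."""
    # Costs that do not depend on sales
    fixed_costs = base_material_cost + daily_wage + daily_rent
    # Each HKD of sales leaves (1 - material_share) after raw materials
    margin_per_hkd = 1.0 - material_share
    return fixed_costs / (margin_per_hkd * daily_patrons)


print(round(break_even_bill(90), 2))  # 51.85
```

For 90 daily patrons this gives about HKD 51.85, in line with the threshold of approximately HKD 52 read from the plot.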

In [8]:
plt.figure(figsize=(8, 6))
avg_bill_patron_1 = 55
avg_bill_patron_2 = 65

# Plot the curves with different styles for better distinction
df_mean_year_pnl.T.loc[avg_bill_patron_1].plot(
    label=f"Average bill / patron: HKD {avg_bill_patron_1}", linestyle='-', marker='o', markersize=6, lw=2)
df_mean_year_pnl.T.loc[avg_bill_patron_2].plot(
    label=f"Average bill / patron: HKD {avg_bill_patron_2}", linestyle='--', marker='s', markersize=6, lw=2)

# Add the break-even line
plt.axhline(0, color='red', linestyle='--', lw=2, label='Break-even')

# Add gridlines and labels
plt.grid(True, which='both', linestyle='--', linewidth=0.5, color='gray')
plt.xlabel("Average number of patrons in a day", size=14)
plt.ylabel("Yearly profit (HKD)", size=14)
plt.title("Yearly profit in HKD (depending on number of daily patrons)", size=14)

# Customize ticks and legend
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.legend(loc='upper left', fontsize=12)
plt.tight_layout()
plt.show()
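[Figure: "Yearly profit in HKD (depending on number of daily patrons)"; x-axis: average number of patrons in a day; y-axis: yearly profit (HKD); one curve per average bill (HKD 55 and HKD 65), red dashed break-even line at 0]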

Brief Comment on the Plot

This plot visualizes the yearly profit of the café based on the number of daily patrons for two different average bill amounts: HKD 55 and HKD 65.

  • The solid line represents the yearly profit for an average bill of HKD 55, while the dashed line represents an average bill of HKD 65.

  • The red dashed line marks the break-even point, where the profit equals zero.

  • As expected, a higher average bill significantly boosts the profitability of the café, especially when the daily foot traffic increases.

  • For both bill amounts, the café operates at a loss at lower foot traffic levels, but as the number of daily patrons rises, profitability improves, with the break-even point being reached earlier for the HKD 65 bill compared to the HKD 55 bill.
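How much earlier the break-even is reached with the higher bill can be estimated from the mean equations alone. The sketch below solves the expected daily profit for the number of patrons; the function name is hypothetical and the costs are those used in the simulation (rent 1,500 HKD, wages 1,200 HKD, 100 HKD of base material cost, 40% of sales spent on raw materials).

```python
def break_even_patrons(average_bill, daily_rent=1500.0, daily_wage=1200.0,
                       base_material_cost=100.0, material_share=0.4):
    """Daily patrons needed for the expected daily profit to reach zero."""
    fixed_costs = base_material_cost + daily_wage + daily_rent
    # Expected contribution of one patron after raw materials
    margin_per_patron = (1.0 - material_share) * average_bill
    return fixed_costs / margin_per_patron


for average_bill in (55, 65):
    patrons = break_even_patrons(average_bill)
    print(f"HKD {average_bill}: break-even at about {patrons:.1f} patrons per day")
```

Under these assumptions the break-even sits near 85 patrons per day for a HKD 55 bill and near 72 for a HKD 65 bill.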

Of course, pricing is competitive, and you may lose patrons by increasing price… which is not modeled at all here.

Overall, this simulation provides insights into the break-even points and profitability of a small café, highlighting how critical customer traffic and average spending are to the business’s financial health.

Simulation for 1 year of business, given a set of parameters

This final simulation runs multiple trajectories (1,000 simulations) of daily profit over the course of one year, given a specific set of parameters:

  • Average foot traffic: 80 patrons per day
  • Average bill per patron: HKD 59
  • Daily rent: HKD 1,500
  • Daily wages: HKD 1,200

Explanation of the Process

  • For each simulation, daily profit is computed based on observed daily foot traffic, which fluctuates around the set average (80 patrons), with variability of 20% (i.e., foot traffic is drawn from a normal distribution centered on 80 with a standard deviation of 16).

  • Daily profit is computed through the Bayesian Network, which conditions profit on variables such as foot traffic and daily sales, using the Joint Gaussian Distribution to account for dependencies between the variables.

  • The cumulative yearly profit is then calculated by summing up the daily profits for each simulation.
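The three steps above can be sketched in plain Python. This is an illustration under assumptions, not the pyAgrum code of the notebook: the function names are hypothetical, and the expected profit given the observed traffic is written in closed form from the mean equations, which is what the posterior mean of Profit should reduce to in a linear Gaussian model.

```python
import random
from itertools import accumulate


def expected_profit_given_traffic(traffic, average_bill=59.0,
                                  daily_rent=1500.0, daily_wage=1200.0):
    """E[Profit | FootTraffic = traffic], from the mean equations."""
    sales = average_bill * traffic
    raw_materials = 100 + 0.4 * sales
    return sales - raw_materials - daily_wage - daily_rent


def simulate_trajectories(n_simu=400, n_days=367, average_traffic=80.0, seed=0):
    """One cumulative-profit trajectory (a list of n_days values) per simulation."""
    rng = random.Random(seed)
    trajectories = []
    for _ in range(n_simu):
        daily = [expected_profit_given_traffic(
                     rng.gauss(average_traffic, 0.2 * average_traffic))
                 for _ in range(n_days)]
        # Accumulate along the days of this simulation, not across simulations
        trajectories.append(list(accumulate(daily)))
    return trajectories


trajectories = simulate_trajectories()
yearly_profit = [trajectory[-1] for trajectory in trajectories]
print(len(yearly_profit), round(sum(yearly_profit) / len(yearly_profit)))
```

The accumulation must run along the days of each simulation. With a pandas DataFrame whose rows are simulations and whose columns are days, that is `cumsum(axis=1)`; the default `axis=0` would accumulate across simulations instead.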

In [9]:
FOOT_TRAFFIC = 80
NB_SIMU=1000
model = build_model(
    average_bill=59,
    average_traffic=FOOT_TRAFFIC,
    daily_rent=1500,
    daily_wage=1200,
    epsilon=0.001
)
ie = gclg.CLGVariableElimination(model)
all_daily_pnl=[]
for i in tqdm(range(NB_SIMU)):
  daily_pnl=[]
  for date in dates:
     observed_foot_traffic = np.random.normal(FOOT_TRAFFIC, 0.2 * FOOT_TRAFFIC)
     ie.updateEvidence({"FootTraffic":observed_foot_traffic})
     daily_pnl.append(ie.posterior("Profit").mu())
  all_daily_pnl.append(daily_pnl)

The histogram shows the distribution of cumulative yearly profits across all simulations. It helps assess the variability and risk of the business:

  • The center of the distribution tells us the most likely range of outcomes.
  • The spread (variance) reflects the financial uncertainty the café might face due to fluctuations in foot traffic and other factors.
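For a rough sense of the center and spread to expect, the yearly profit can be approximated analytically. The sketch below assumes independent days, keeps only the foot-traffic fluctuation (as in the simulation, where the posterior mean of Profit is used), and takes 367 days to match the length of `dates`; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def yearly_profit_distribution(average_traffic=80.0, average_bill=59.0,
                               daily_rent=1500.0, daily_wage=1200.0,
                               traffic_cv=0.2, n_days=367):
    """Normal approximation of the cumulative yearly profit."""
    # Expected daily profit from the mean equations
    daily_mean = 0.6 * average_bill * average_traffic - 100 - daily_wage - daily_rent
    # Daily profit varies linearly with traffic (slope 0.6 * average_bill)
    daily_sd = 0.6 * average_bill * traffic_cv * average_traffic
    # Sum of n_days independent days: means add, variances add
    return NormalDist(mu=n_days * daily_mean, sigma=sqrt(n_days) * daily_sd)


dist = yearly_profit_distribution()
print(f"mean={dist.mean:.0f} HKD, sd={dist.stdev:.0f} HKD, "
      f"P(yearly loss)={dist.cdf(0.0):.2f}")
```

With an expected profit of only 32 HKD per day against a daily standard deviation of several hundred HKD, this approximation leaves a non-negligible probability of ending the year at a loss.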
In [10]:
# rows of all_daily_pnl are simulations: transpose so that cumsum() runs over days
pd.DataFrame(all_daily_pnl).T.cumsum().iloc[-1].hist()
Out[10]:
<Axes: >
[Figure: histogram of cumulative yearly profit]

The trajectory plot shows the evolution of cumulative profit throughout the year for each simulation:

  • It visualizes how profits evolve day-by-day, highlighting the range of possible trajectories.
  • We observe significant variations, but overall patterns can emerge, such as the tendency to move into positive or negative profitability over time.
In [11]:
# transpose so that each line is one simulation followed day by day
pd.DataFrame(all_daily_pnl).T.cumsum().plot(legend=False);
[Figure: cumulative profit trajectories]
In [12]:
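# Without a transpose, cumsum() runs across the 1,000 simulations (rows), so this
# summary has one entry per day (count = 367) rather than one per simulated year.
pd.DataFrame(all_daily_pnl).cumsum().iloc[-1].describe()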
Out[12]:
count      367.000000
mean     33318.048156
std      18491.991072
min     -43765.297258
25%      20884.936522
50%      33828.027061
75%      45464.357015
max      83343.925046
Name: 999, dtype: float64

NB: How to generate the image for the first cell of the notebook

In [13]:
import pyAgrum.lib.image as gumimg
gumimg.export(model,"../images/cafe.svg",size="6!")