top of page

Customer Lifetime Value calculation with SQL & Python


Customer lifetime value (CLTV) is a critical metric for businesses as it helps them to identify high-value customers and take targeted marketing actions. In this blog post, we will explore how to calculate CLTV using Python, focusing on data stored in Google BigQuery. The data will be extracted and cleaned, and various metrics such as total revenue per customer, purchase frequency, and repeat rate will be calculated. We will also address the issue of outliers in the data and discuss how to eliminate them using a filtering method. Finally, we will visualize the distribution of CLTV and interpret the results to gain actionable insights.



Usually a tables are spread out across several tables we would first to need to join the tables with SQL query. My data tables are in Googe Bigquery.


To calculate the customer lifetime value, a unique id must be assigned to each customer. While a Customer_Id would typically work, not all customers have one, especially if they checkout as a guest. However, all customers have an email, so for this purpose, their unique id will be their email. Unfortunately, the "bc_order" table doesn't contain email information, so we need to join "bc_order billing addresses" with the "bc_order" table and create a new table for it.



Once I had my table "orders_email"ready I imported it into Google Colab.


I use a unique code for each email to make sure we keep with privacy concerns.


We don't need all the columns from 'orders_email' column so we being to remove by creating a subset.


I remove the extra zeroes in the 'sub_total_excluding_tax' columns and change the name to 'total_rev'

This script will take create a sort of pivot table on email and calculate number of orders per customer, total revenue per customer, days from first order to last date in dataset.



To calculate the AOV for each order we divide total rev by order id


The purchase frequency for the entire dataset is calculated by dividing all the orders for all customers and all the customers which gives us 1.95


Repeat Rate is calculated by dividing the number of repeat purchases by the total number of customers gives us the repeat rate of customers which is .35


Churn Rate is simply the inverse of Repeat Rate. which is .65

Finally, the Customer Lifetime Value is calculated by multiplying the average order value (AOV) by the purchase frequency and dividing it by the churn rate. The resulting value is then multiplied by 0.4 to represent the profit margin.

Our CLTV has many outliers this is usually the case if we have development teams testing orders or special orders put in the system. We can see the outliers by creating a box plot.



To eliminate the outliers we create a filtered subset anything under 300 is considered an outlier.

We plot this into a histogram and the see the CLTV has a mean of $69 and a median of $57





Comments


bottom of page