A Next Product To Buy (NPTB) model for BBB

This Python notebook demonstrates Next-Product-To-Buy modeling using an experiment performed by Bookbinders on 30,000 customers. Dave Lawton sent out offers in the "Art" category to 10,000 randomly selected customers ("The Art History of Florence"). Another 10,000 randomly selected customers got an offer in the "DIY" category ("Paint Like a Pro"), and a final 10,000 randomly selected customers got an offer in the "Cook" category ("Vegetarian Cooking for Everyone"). The dataset contains information on responses from these 30,000 customers. Our task is to use the available data to find the best book to offer each of the 30,000 customers in the test, as well as the best book to offer an additional 10,000 customers who were not part of the test.

Start by importing the relevant packages and the bbb_nptb dataset.

Run a logistic regression

We will estimate a logistic regression with buyer as the response variable and gender, last, total, and book categories child through geog as the explanatory variables. For this test Dave Lawton sent out Art, DIY, and Cook offers to 10,000 customers each, so we will also include the offer each customer received as an explanatory variable in the model.
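As a rough sketch of this specification, the block below fits a main-effects logit with the `statsmodels` formula interface. The use of `statsmodels`, the synthetic data frame `df`, and the simulated coefficients are all assumptions made for illustration; only the column names `buyer`, `gender`, `last`, `total`, `child`, `geog`, and `offer` come from the text, and the book categories between child and geog are left out.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for bbb_nptb (hypothetical values, not the real data)
rng = np.random.default_rng(1234)
n = 3000
df = pd.DataFrame({
    "gender": rng.choice(["M", "F"], size=n),
    "last": rng.integers(1, 36, size=n),
    "total": rng.integers(10, 500, size=n),
    "child": rng.integers(0, 4, size=n),
    "geog": rng.integers(0, 4, size=n),
    "offer": rng.choice(["Art", "DIY", "Cook"], size=n),
})

# Simulate purchases from an arbitrary "true" model (assumed coefficients)
xb = (-2.0 - 0.05 * df["last"] + 0.002 * df["total"]
      + 0.3 * df["geog"] + 0.5 * (df["offer"] == "DIY"))
df["buyer"] = rng.binomial(1, 1 / (1 + np.exp(-xb)))

# Main-effects logit: customer characteristics plus the offer that was sent
formula = "buyer ~ gender + last + total + child + geog + offer"
lr = smf.logit(formula, data=df).fit(disp=0)
print(lr.params)
```

Because `offer` enters only as a main effect, this model shifts every customer's log-odds by the same amount for a given offer.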

We want to predict what each customer would have done if we had sent them any one of the three offers.

Create predictions

If we just use bbb_nptb as the prediction data, we get only predictions based on the offer each customer was actually sent. What we want, however, is to predict what each person would have done if they had been sent, for example, the Art offer. We can modify the prediction data by setting the value of the "offer" column and re-running the model prediction for each book type:
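The idea of overwriting the "offer" column before predicting could be sketched as follows. This is an illustration on synthetic data using `statsmodels` rather than the notebook's own modeling calls; the data frame `df`, the reduced set of predictors, and the simulated coefficients are assumptions, while the prediction column names `p_art`, `p_diy`, and `p_cook` follow the text.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for bbb_nptb (hypothetical values, not the real data)
rng = np.random.default_rng(42)
n = 3000
df = pd.DataFrame({
    "last": rng.integers(1, 36, size=n),
    "total": rng.integers(10, 500, size=n),
    "offer": rng.choice(["Art", "DIY", "Cook"], size=n),
})
xb = -1.5 - 0.05 * df["last"] + 0.002 * df["total"] + 0.5 * (df["offer"] == "DIY")
df["buyer"] = rng.binomial(1, 1 / (1 + np.exp(-xb)))

lr = smf.logit("buyer ~ last + total + offer", data=df).fit(disp=0)

# Overwrite "offer" for every row, then predict: one counterfactual per book
for offer, col in [("Art", "p_art"), ("DIY", "p_diy"), ("Cook", "p_cook")]:
    df[col] = lr.predict(df.assign(offer=offer))

print(df[["offer", "p_art", "p_diy", "p_cook"]].head())
```

`DataFrame.assign` returns a modified copy, so the offer each customer actually received stays intact in `df`.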

Which offer to extend? Use the idxmax function to automatically find the best offer for each customer.

This command provides a label for the category with the maximum predicted probability of buying (i.e., "Art", "Diy", "Cook"). Let's use this option and also create a variable p_target that captures the probability of responding to the best offer selected for a customer. We use max with axis=1 here because we want the maximum response probability for each customer (i.e., each row in the data).
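A minimal sketch of the row-wise `idxmax` and `max` calls, using three made-up customers. The probabilities, the `labels` mapping, and the column name `to_offer` are assumptions for illustration; `p_art`, `p_diy`, `p_cook`, and `p_target` follow the text.

```python
import pandas as pd

# Hypothetical predicted purchase probabilities for three customers
preds = pd.DataFrame({
    "p_art": [0.10, 0.30, 0.05],
    "p_diy": [0.20, 0.10, 0.15],
    "p_cook": [0.15, 0.25, 0.40],
})

# idxmax returns the column name of the row maximum; map it to a book label
labels = {"p_art": "Art", "p_diy": "Diy", "p_cook": "Cook"}
cols = list(labels)
preds["to_offer"] = preds[cols].idxmax(axis=1).map(labels)

# max with axis=1 returns the highest probability in each row
preds["p_target"] = preds[cols].max(axis=1)
print(preds)
```

Selecting `cols` explicitly keeps the label column out of the second row-wise calculation.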

Let's create a crosstab to see which book(s) Dave Lawton should offer his customers.
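One way such a frequency table might be built is with `pd.crosstab`. The labels below are invented, and the `to_offer` name and the added `share` column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical best-offer labels for five customers
to_offer = pd.Series(["Diy", "Diy", "Art", "Cook", "Diy"], name="to_offer")

# Passing a constant string as `columns` yields a one-column frequency table
tab = pd.crosstab(index=to_offer, columns="count")
tab["share"] = tab["count"] / tab["count"].sum()
print(tab)
```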

This does not look very customized! The model predicts that every customer should be sent the DIY book. The (deliberate) mistake in the analysis above was that the specified model is not sufficiently flexible to allow customization across customers!

The whole point of customization is that different offers may work better for different customers. In other words, we want to customize offers because we think that there might be an interaction between (1) who the customer is and (2) how effective the offer is. Hence, we need to interact offer with the variables that describe customer characteristics. For convenience, let's just interact offer with all available customer variables. The model output is shown below:
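To illustrate what interacting offer with customer variables does, the sketch below fits `offer * (last + total)` on synthetic data in which the offer effects truly depend on the customer. The use of `statsmodels`, the simulated coefficients, and the restriction to two customer variables are assumptions for illustration; the text interacts offer with all available customer variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data where the effect of an offer depends on the customer
rng = np.random.default_rng(7)
n = 6000
df = pd.DataFrame({
    "last": rng.integers(1, 36, size=n),
    "total": rng.integers(10, 500, size=n),
    "offer": rng.choice(["Art", "DIY", "Cook"], size=n),
})
art = (df["offer"] == "Art").astype(float)
cook = (df["offer"] == "Cook").astype(float)
xb = -1.0 + art * (0.004 * df["total"] - 1.0) + cook * (0.8 - 0.05 * df["last"])
df["buyer"] = rng.binomial(1, 1 / (1 + np.exp(-xb)))

# offer * (last + total) expands to main effects plus offer:last and offer:total
lr_int = smf.logit("buyer ~ offer * (last + total)", data=df).fit(disp=0)
print(lr_int.params)

# Counterfactual predictions now rank the offers differently across customers
pred_cols = {"Art": "p_arti", "DIY": "p_diyi", "Cook": "p_cooki"}
for offer, col in pred_cols.items():
    df[col] = lr_int.predict(df.assign(offer=offer))
to_offer = df[list(pred_cols.values())].idxmax(axis=1)
print(to_offer.value_counts())
```

With the interaction terms in place, the gap between two offers is no longer a constant shift in log-odds, which is what allows the best offer to vary by customer.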

Now let's repeat the previous analysis steps using the results from the new, more flexible model. Start by generating predictions. Use the names p_arti, p_diyi, and p_cooki to store the predictions from the model with interactions.

Which offer should we extend? Again, use the idxmax function to automatically find the best offer for each customer.

This command, again, provides a label for the category with the maximum predicted probability of buying across the columns p_arti, p_diyi, and p_cooki. Let's use this option and also create a variable p_targeti that captures the probability of responding to the best offer selected for a customer. We use max with axis=1 here because we want the maximum response probability for each customer (i.e., each row in the data).

Let's create a crosstab to see which book(s) Dave Lawton should offer his customers.

Now let's create a table with the average purchase probabilities if we (1) sent the Art book to everyone, (2) sent the DIY book to everyone, (3) sent the Cook book to everyone, or (4) targeted each customer with the book they are most likely to buy according to our model with interactions.
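A comparison table of this kind could be computed by averaging each prediction column. The four customers and their probabilities are made up, and the `avg_prob` column name is an assumption; `p_arti`, `p_diyi`, `p_cooki`, and `p_targeti` follow the text.

```python
import pandas as pd

# Hypothetical predictions from the interaction model for four customers
df = pd.DataFrame({
    "p_arti": [0.10, 0.30, 0.05, 0.20],
    "p_diyi": [0.20, 0.10, 0.15, 0.25],
    "p_cooki": [0.15, 0.25, 0.40, 0.10],
})
df["p_targeti"] = df[["p_arti", "p_diyi", "p_cooki"]].max(axis=1)

# Average purchase probability under each blanket policy and under targeting
cols = ["p_arti", "p_diyi", "p_cooki", "p_targeti"]
avg = df[cols].mean().to_frame("avg_prob")
print(avg)
```

Because `p_targeti` is the row-wise maximum, its average can never fall below the average of any single-book policy.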

Accounting for profitability

So far we have picked offers for each customer according to their predicted purchase probability. However, that is not the right criterion if offers differ in profitability. Let's assume the following: the profit from selling "The Art History of Florence" is \$6, the profit from selling "Paint Like a Pro" is \$4, and the profit from selling "Vegetarian Cooking for Everyone" is \$7.

Now, calculate the expected profit for each book and each customer (i.e., the predicted purchase probability * margin on sale). Let's use the prefix ep_ for these variables, short for "Expected Profit".
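The probability-times-margin calculation might look like the sketch below. The margins (\$6, \$4, \$7) come from the text; the three customers, the `margins` dictionary, and the exact names `ep_art`, `ep_diy`, and `ep_cook` are assumptions for illustration.

```python
import pandas as pd

# Margins per book, as stated in the text
margins = {"art": 6, "diy": 4, "cook": 7}

# Hypothetical predictions from the interaction model
df = pd.DataFrame({
    "p_arti": [0.10, 0.30, 0.05],
    "p_diyi": [0.20, 0.10, 0.15],
    "p_cooki": [0.15, 0.25, 0.40],
})

# Expected profit = predicted purchase probability * margin on sale
for book, margin in margins.items():
    df[f"ep_{book}"] = df[f"p_{book}i"] * margin

print(df[["ep_art", "ep_diy", "ep_cook"]])
```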

To determine the book to offer that will maximize expected profits per customer we can use idxmax again in the following command:

Finally, create a variable ep_target that captures the result from targeting a customer with the book with the highest expected profit:
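Both steps, picking the most profitable book and storing its expected profit, can be sketched with the same row-wise pattern used for probabilities. The expected-profit values and the name `to_offer_ep` are assumptions for illustration; `ep_target` follows the text.

```python
import pandas as pd

# Hypothetical expected profits per customer and book
df = pd.DataFrame({
    "ep_art": [0.60, 1.80, 0.30],
    "ep_diy": [0.80, 0.40, 0.60],
    "ep_cook": [1.05, 1.75, 2.80],
})
ep_cols = ["ep_art", "ep_diy", "ep_cook"]

# Book with the highest expected profit for each customer
labels = {"ep_art": "Art", "ep_diy": "Diy", "ep_cook": "Cook"}
df["to_offer_ep"] = df[ep_cols].idxmax(axis=1).map(labels)

# Expected profit from sending that book
df["ep_target"] = df[ep_cols].max(axis=1)
print(df)
```

In this made-up example the first customer's expected profits correspond to purchase probabilities of 0.10, 0.20, and 0.15, so DIY would win on probability while Cook wins on expected profit.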

Let's create a crosstab to see which book(s) Dave Lawton should offer his customers.

Calculate average expected profits if we (1) sent the Art book to everyone, (2) sent the DIY book to everyone, (3) sent the Cook book to everyone, or (4) targeted each individual customer with the book that has the highest expected profit.

The expected profit per customer from targeting is substantially higher, as we might expect. If we extrapolate this result to the remaining 520,000 customers in the BBB database, we can estimate total profit by multiplying the average expected profit per customer by 520,000.
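The extrapolation amounts to scaling each policy's average expected profit by the number of remaining customers. The sketch below uses made-up expected profits for three customers, so the resulting totals are purely illustrative; only the 520,000 figure comes from the text, and the names `avg_ep`, `remaining`, and `total_profit` are assumptions.

```python
import pandas as pd

# Hypothetical expected profits per customer and book
ep = pd.DataFrame({
    "ep_art": [0.60, 1.80, 0.30],
    "ep_diy": [0.80, 0.40, 0.60],
    "ep_cook": [1.05, 1.75, 2.80],
})
ep["ep_target"] = ep.max(axis=1)

# Average expected profit per customer under each policy
avg_ep = ep.mean()

# Scale up to the remaining customers in the database
remaining = 520_000
total_profit = avg_ep * remaining
print(total_profit.round(2))
```

Comparing `total_profit["ep_target"]` with the best single-book total gives the incremental value of customizing the offer.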