Using Bayesian inference to pick Coffee Shops
I like going to coffee shops in Edinburgh. I have opinions of them: some are better for meeting a friend and others are totally not laptop-friendly.
In this post, I prototype a way to use my opinions to rank coffee shops using a really simple probabilistic model.
Ranking based on Comparisons
Since ranking 20+ coffee shops is not that much fun, I’ll gather data as comparisons of pairs of coffee shops. For example, I’ll tell my system that I think BrewLab is a lot more laptop-friendly than Wellington Coffee, but that BrewLab and Levels are equally laptop-friendly. Then I’ll figure out which are the best and worst coffee shops for laptops using probabilistic modelling.
Using pairs is convenient because it means I can borrow from approaches that rank players based on matches, like this pymc3 demo or Microsoft’s post on TrueSkill. (We also had a homework assignment on the math of this problem.)
Coffee shops
An important detail is what I mean by the attributes of a coffee shop. For now, I’m using four metrics, defined below in METRIC_LIST. They are:
- reading: Chilling out and reading a book
- laptop: Camping out and doing work on my laptop
- meet: Meeting up with someone
- group: Grabbing a table and hanging out with folks
These metrics are completely independent. It’s more like four copies of the same problem with data stored in one place.
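For concreteness, here’s what METRIC_LIST could look like (the name comes from the notebook; the literal definition is my assumption):

```python
# The four metrics I rank coffee shops on.
METRIC_LIST = ["reading", "laptop", "meet", "group"]
```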
Part 1: Data
Metadata
When I go into pymc3 land, the data will end up in a big matrix. I’d like a way to associate indexes with static information about each coffee shop, like its name and location. This will come in handy later when I want to label plots or build an app.
I really like using YAML for human-entered, computer-readable data. I entered a list of metadata about the coffee places I’ve visited in data/coffee_metadata.yaml. The id field is a human-readable unique identifier.
```yaml
- name: BrewLab
  id: brewlab
  location: university
- name: Cult Espresso
  id: cult
  location: university
```
I also like using namedtuples to enforce schemas and catch typos when loading yaml or json files. I’ll define a namedtuple Metadata for the above data and load the file.
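A minimal sketch of that loading step, assuming PyYAML and the three fields shown above:

```python
from collections import namedtuple

import yaml

# Metadata enforces the schema: a misspelled or missing field raises immediately.
Metadata = namedtuple("Metadata", ["name", "id", "location"])

with open("data/coffee_metadata.yaml") as f:
    metadata = [Metadata(**entry) for entry in yaml.safe_load(f)]
```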
Then I’ll make some useful data structures to map back and forth from id to index. The index in the data matrix will just be the position of the metadata dictionary in the data/coffee_metadata.yaml list.
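Something like this, where the variable names are my own:

```python
# Map each id to its row in the data matrix, and back again.
id_to_index = {m.id: i for i, m in enumerate(metadata)}
index_to_id = {i: m.id for i, m in enumerate(metadata)}
```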
(I’m assuming id is unique and won’t ever change. When I save data, I’ll associate it with a coffee shop using its id, not its location in the matrix. I chose a unique id field over the matrix index because it’s human-readable, which makes it easier to update incorrect comparisons, and because it makes it trivial to add new coffee shops without changing matrix sizes.)
Loading comparisons
I like to store data that humans shouldn’t need to mess with in a file with lines of json. Worst case, I can go in and delete or change a value, but I don’t need to think about the weird key ordering that comes with writing yaml.
A file showing two comparisons would look like this:
{"metric": "meet", "first": "artisan_broughton", "last": "cairngorm_george", "weight": 0.5}
{"metric": "meet", "first": "wellington", "last": "twelve_triangles_portobello", "weight": 0.5}
Here’s the idea: metric is which metric I’m trying to measure; in this case, meet means where I like to meet up with someone. first and last are the two ids, which should be in the big list of coffee shop metadata defined above. weight is how much better first is than last. It could be negative if I really want.
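Loading the file back is one json.loads per line. A sketch, with the file path as my assumption:

```python
import json

# Each line of the file is one comparison record.
with open("data/comparisons.jsonl") as f:
    comparisons = [json.loads(line) for line in f if line.strip()]
```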
Initial comparisons
If I have no data so far, I can begin by requesting a few comparisons between randomly selected pairs of coffee shops.
The code will show me a metric name, a first coffee shop id, and a second coffee shop id. Then I’ll type in a number between 1 and 5. Here’s what the keys mean:
- 1: totally the first coffee shop
- 2: lean towards the first coffee shop
- 3: draw
- 4: lean towards the second coffee shop
- 5: totally the second coffee shop
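A sketch of how a key press could turn into one of the stored records above. The exact weight values are my assumption; the constraints from my data are just that stored weights are positive and the favoured shop goes in first:

```python
# Convert a 1-5 key press into a comparison record (weight values assumed).
def key_to_comparison(metric, shop_a, shop_b, key):
    weight = {"1": 1.0, "2": 0.5, "3": 0.0, "4": 0.5, "5": 1.0}[key]
    first, last = (shop_a, shop_b) if key in ("1", "2", "3") else (shop_b, shop_a)
    return {"metric": metric, "first": first, "last": last, "weight": weight}
```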
Inputting comparisons
This part gets nostalgic. A lot of my first programs were about asking for data from the user. Over time I’ve moved to different approaches, like the YAML file above, but I think this way works better because the computer is choosing which items to show me. As an example session:
```
laptop 1) lowdown 5) wellington? 1
laptop 1) black_medicine 5) levels? 4
laptop 1) artisan_stockbridge 5) castello? q
```
I can type q to exit. Otherwise, I’ll be asked to compare two coffee shops and should type a number between 1 and 5.
(If you want to run this, set SHOULD_ASK = True. I turn it off by default so I can run the entire notebook without being asked for input.)
Aside: exploring comparisons data
I can check which comparisons I have data for. Since a comparison between two shops is symmetric, I’ll mark both sides of the matrix.
I can also show a plot of the weights. This shows how my data is a little odd: I only ever store positive numbers, and they’re rarely 0. I’m going to ignore it in this post, but I think it’s something my non-prototype model should take into account.
Part 2: Modelling
For the rest of this notebook, I’ll limit the scope to a single metric by setting METRIC_NAME = "laptop". To explore other metrics, I can update that string and rerun the following cells.
Model
Using the laptop metric as an example, my model says there is some unknown laptop metric for each coffee shop. This is what I’m trying to learn. The metric is Gaussian distributed around some mean. Given enough data, it should approach the actual laptop-friendliness of the coffee shop.
When I said that BrewLab was better for laptop work than Wellington Coffee, my model takes that to mean that BrewLab’s laptop metric is probably higher than Wellington’s. Specifically, the number between 0 and 1 that I gave it is the difference between BrewLab’s mean and Wellington’s mean.
When I make a comparison, I might be a little off and the weights might be noisy. Maybe I’m more enthusiastic about Cairngorm over Press because I haven’t been to Press recently. pymc3 can take that into account too! I’ll say my comparison weight is also Gaussian distributed.
I’m basing my code on this tutorial but with the above model. Like the rugby model, I also use one shared HalfStudentT variable for the metric’s standard deviations.
For each comparison, I compute the difference between the “true_metric” for the first and second coffee shop, and say that should be around the score I actually gave it.
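Here’s a minimal sketch of what that model could look like in pymc3. The index arrays, the prior parameters, and the observation noise scale are my assumptions:

```python
import numpy as np
import pymc3 as pm

# Keep only the comparisons for the metric being modelled.
obs = [c for c in comparisons if c["metric"] == METRIC_NAME]
first_idx = np.array([id_to_index[c["first"]] for c in obs])
last_idx = np.array([id_to_index[c["last"]] for c in obs])
observed_weights = np.array([c["weight"] for c in obs])

with pm.Model() as model:
    # One shared prior on how spread out the per-shop metrics are,
    # like the rugby model's shared HalfStudentT.
    sd = pm.HalfStudentT("sd", nu=3, sd=2.5)
    true_metric = pm.Normal("true_metric", mu=0, sd=sd, shape=len(metadata))

    # Each observed weight should be near the difference between the two
    # shops' latent metrics; the noise term covers my inconsistency.
    difference = true_metric[first_idx] - true_metric[last_idx]
    pm.Normal("weight", mu=difference, sd=0.25, observed=observed_weights)

    trace = pm.sample(1000, chains=3)
```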
Warning
Because the model is probably making terrible assumptions that I can’t recognize yet, I’m mostly using this model to see how a pymc3 model could fit into this project. I can always go back and improve the model!
pymc3 gives a lot of tools to check how well sampling went. I’m still learning how they work, but nothing jumps out yet.
- The sampler warned that the number of effective samples is small, but apparently that’s probably okay.
- Below I plot the traceplot. I told it to sample with 3 chains, so there are three copies of each distribution, and they’re all in roughly the same place.
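The traceplot itself is a single call:

```python
# With 3 chains there are three overlapping density estimates per variable.
pm.traceplot(trace)
```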
Plotting
Watch out: now that I need to interpret the results, I’m at high risk of making embarrassing assumptions that I’ll look back on later as a “don’t do it this way” example :D
Like this tutorial, I’ll plot the medians from the samples and use the Highest Posterior Density (HPD) as credible intervals. HPD finds the smallest range of the posterior distribution that contains 95% of its mass.
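A sketch of that plot, assuming pymc3’s pm.hpd helper and the variable names from earlier:

```python
import matplotlib.pyplot as plt
import numpy as np

samples = trace["true_metric"]        # shape: (n_draws, n_shops)
medians = np.median(samples, axis=0)
hpd = pm.hpd(samples, alpha=0.05)     # 95% HPD interval per shop

# Order shops by median and draw medians with HPD error bars.
order = np.argsort(medians)
errors = np.abs(hpd[order] - medians[order, None]).T
plt.errorbar(medians[order], np.arange(len(order)), xerr=errors, fmt="o")
plt.yticks(np.arange(len(order)), [index_to_id[i] for i in order])
plt.xlabel(METRIC_NAME)
plt.show()
```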
This looks really cool!
I think I can take two coffee shops and ask in how many posterior samples one is better than the other. When the model doesn’t have much of an opinion, the fraction is close to 0.5. Otherwise it’s closer to 1 or 0.
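As a minimal sketch (the function name is mine):

```python
# Fraction of posterior samples in which shop a's metric beats shop b's.
def prob_better(trace, id_a, id_b):
    a = trace["true_metric"][:, id_to_index[id_a]]
    b = trace["true_metric"][:, id_to_index[id_b]]
    return (a > b).mean()
```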
Comparing model results to actual results
Another step I can take to check that the model seems reasonable is to ask what it predicts each observation should be, and plot that. This is asking if the model can predict the data it learned from.
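This is what pymc3’s sample_ppc is for. A sketch, reusing the model and names from above:

```python
# Draw posterior predictive samples of the comparison weights.
with model:
    ppc = pm.sample_ppc(trace, samples=500)

predicted = ppc["weight"].mean(axis=0)  # mean predicted weight per comparison
plt.scatter(observed_weights, predicted)
plt.xlabel("observed weight")
plt.ylabel("mean predicted weight")
plt.show()
```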
It does miss two values, which is a little suspicious. It seems like it tends to have trouble predicting that a comparison could be weighted as 0.
Part 3: Batched active learning
I can use my attempts at quantifying uncertainty as a heuristic for choosing which questions to ask. I do this using is_pretty_certain. This is super cool! If the model is totally sure that Artisan is better for reading than Castello, it doesn’t ask about it.
Like before, update SHOULD_ASK if you want to try it out.
Ways to make this even cooler
One limitation is that I’ll just retrain the model from scratch with this new data. In some special models like TrueSkill, you can update the uncertainty in closed form.
If this were a real product, there might be enough random questions to ask that it’s fine to not always ask the most useful question. If answering questions is time-consuming, it might be worth retraining the model in between, using a different model that’s easy to update with new information, or finding some middle ground.
Conclusion
This was a fun first try! It took me an afternoon to put together, it looks like it’s roughly doing what I want, and it’s cool knowing that it’s backed by a probabilistic model.
It’s also cool because the approach generalizes to other things I’d want to rank, like books. I’d just need to update filenames, drop other entries into a YAML file, and update the list of attributes.
I started experimenting with some of pymc3’s neat tools like sample_ppc and plot_posterior.
More blog material
This project also helped me see how much more I need to learn about pymc3 and Bayesian data analysis.
The model seems to make reasonable predictions and adjust to data like I’d expect. The embarrassing thing is that every time I tried to adjust the model, I would start getting terrible results and not know why. It’s useful to have data and a model I can play with, but I have a lot to learn!