[I was spending way too long on this post so I’m releasing it in bullet point form. Maybe I’ll pass it through an LLM ghostwriter.]

Intro

  • I was recently asked in an interview how you might get a confidence score from a neural network
  • It was very much a blind spot for me, despite having studied Bayesian inference and Gaussian processes at uni
  • I want to turn a weakness into a strength, so this post will be a refresher on some topics
  • There’s a lot of literature on this so I’m just going to cover some basic methods for regression, and maybe revisit in the future
  • Even the stuff I do mention here can be explored in much more depth - I’ll leave links

Contents

  • We’re going to focus on the simple task of predicting a noisy tanh function y = tanh(x) + ε, ε \sim N(0, 0.01)
  • NN point estimate (see the first sketch after this list)
  • BNNs, Bayesian neural networks (https://arxiv.org/pdf/2211.11865)
    • Put a prior on the NN weights
    • Assume the data is normally distributed around a mean function (the NN output) to get a likelihood
    • This gives a posterior, but how do we sample from it?
    • MCMC constructs a Markov chain which converges to the posterior by proposing samples from a conditional proposal dist and accepting/rejecting them (Metropolis-Hastings)
    • This can be applied to the NN to sample weight vectors θ, which we use to produce samples from the posterior predictive dist (see the MH sketch after this list)
  • MC Dropout (Gal https://arxiv.org/pdf/1506.02142, https://www.cs.ox.ac.uk/people/yarin.gal/website/blog_3d801aa532c1ce.html)
    • This is in some sense a special case of a BNN where each row of a weight matrix is either zeroed out or left unaltered, according to a Bernoulli dist
    • The Gal paper shows that training with dropout approximates variational inference in a deep Gaussian process
    • One advantage is that we train the model as normal using gradient descent and dropout, so it’s more computationally tractable - we get the uncertainty essentially for free (see the MC Dropout sketch after this list)
    • However it’s not clear to me what the output distribution actually represents - need to dig into the maths more
  • For comparison, here’s the GP with a quadratic kernel (GPs are very cool, very Cambridge) - see the last sketch after this list
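
Here are some minimal sketches of the above on the toy problem (PyTorch/NumPy; all architectures and hyperparameters are arbitrary choices of mine, not tuned). First, the plain NN point estimate - note it gives a single number per input and no uncertainty at all:

```python
import torch
import torch.nn as nn

# Toy data: y = tanh(x) + eps, eps ~ N(0, 0.01), i.e. sigma = 0.1
torch.manual_seed(0)
x = torch.linspace(-3, 3, 200).unsqueeze(1)
y = torch.tanh(x) + 0.1 * torch.randn_like(x)

# Small MLP regressor (layer sizes, lr, steps are arbitrary)
model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(2000):
    opt.zero_grad()
    nn.functional.mse_loss(model(x), y).backward()
    opt.step()

# Point estimate: one prediction per input, no notion of confidence
x_test = torch.linspace(-4, 4, 100).unsqueeze(1)
with torch.no_grad():
    y_hat = model(x_test)
```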
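Next, the BNN idea with a toy random-walk Metropolis-Hastings sampler over the weights of a tiny one-hidden-layer net. This is a sketch of the accept/reject mechanics only, not a serious sampler - in practice you’d reach for HMC/NUTS, and the proposal scale, burn-in and thinning below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = np.tanh(x) + 0.1 * rng.standard_normal(50)

H = 8  # hidden units; theta packs [w1 (H), b1 (H), w2 (H), b2 (1)]

def unpack(theta):
    return theta[:H], theta[H:2*H], theta[2*H:3*H], theta[-1]

def predict(theta, x):
    w1, b1, w2, b2 = unpack(theta)
    return np.tanh(np.outer(x, w1) + b1) @ w2 + b2

def log_post(theta):
    # N(0, 1) prior on each weight + Gaussian likelihood with sigma = 0.1
    log_prior = -0.5 * theta @ theta
    resid = y - predict(theta, x)
    log_lik = -0.5 * resid @ resid / 0.1**2
    return log_prior + log_lik

theta = 0.1 * rng.standard_normal(3 * H + 1)
lp = log_post(theta)
samples = []
for step in range(20000):
    prop = theta + 0.02 * rng.standard_normal(theta.shape)  # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept/reject
        theta, lp = prop, lp_prop
    if step > 10000 and step % 100 == 0:  # thin after burn-in
        samples.append(theta.copy())

# Posterior predictive: push each sampled theta through the net
x_test = np.linspace(-4, 4, 100)
preds = np.stack([predict(t, x_test) for t in samples])
mean, std = preds.mean(0), preds.std(0)
```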
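MC Dropout: train with dropout as usual, then keep dropout switched on at test time and average over stochastic forward passes. A sketch - the dropout rate and number of passes are arbitrary:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-3, 3, 200).unsqueeze(1)
y = torch.tanh(x) + 0.1 * torch.randn_like(x)

model = nn.Sequential(
    nn.Linear(1, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    nn.functional.mse_loss(model(x), y).backward()
    opt.step()

# model.train() keeps dropout active at test time; each forward pass
# samples a different sub-network, i.e. a different set of weights
model.train()
x_test = torch.linspace(-4, 4, 100).unsqueeze(1)
with torch.no_grad():
    preds = torch.stack([model(x_test) for _ in range(100)])
mean, std = preds.mean(0), preds.std(0)  # predictive mean and spread
```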
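And the GP for comparison. I’m reading “quadratic kernel” as the degree-2 polynomial kernel k(x, x') = (x x' + c)^2, which makes this Bayesian quadratic regression (an RBF or rational quadratic kernel would fit tanh better). The posterior is closed-form:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = np.tanh(x) + 0.1 * rng.standard_normal(50)

def kernel(a, b, c=1.0):
    # Quadratic (degree-2 polynomial) kernel: k(x, x') = (x x' + c)^2
    return (np.outer(a, b) + c) ** 2

sigma2 = 0.1**2  # observation noise variance, matching the toy data
K = kernel(x, x) + sigma2 * np.eye(len(x))
x_test = np.linspace(-4, 4, 100)
K_s = kernel(x_test, x)
K_ss = kernel(x_test, x_test)

# Standard GP posterior mean and covariance
mean = K_s @ np.linalg.solve(K, y)
cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
std = np.sqrt(np.clip(np.diag(cov), 0, None))
```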

Uses

  • Optimisation and signal combination, e.g. in quant trading: what’s your delta?
  • Interpretability
  • Safety (avoiding models that are confidently incorrect)
  • Applications in science, e.g. AlphaFold’s per-residue confidence scores
  • Robustness against adversarial or out-of-sample data

Theory

  • Epistemic (model) vs aleatoric (data noise) uncertainty - see the decomposition below
  • Maffs maffs maffs
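
One piece of the maths worth stating up front: the posterior predictive variance splits via the law of total variance, and the two terms correspond (roughly) to the two kinds of uncertainty:

  Var[y* | x*, D] = E_{\theta|D}[Var[y* | x*, \theta]] + Var_{\theta|D}[E[y* | x*, \theta]]

The first term is the expected noise under the model (aleatoric); the second is the spread of the mean prediction across posterior weights (epistemic).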

Further reading

Code