The Elusive Lighthouse: Part 2

Stuart Burrows · 17 Jan 2017

Let There Be a Lighthouse

In the preceding post we attempted to locate a queer lighthouse from the points at which its flashes, emitted at uniformly distributed angles, strike a straight beach. The geometry of the scenario is shown below.

It turned out to be a difficult problem, because the flashpoints follow a Cauchy distribution, which makes the intuitively appealing sample mean totally useless. In this post, we track down the lighthouse by applying the sound principles of probability theory commonly called “Bayesian”.

The Bayesian method is a very general approach to reasoning in the presence of uncertainty. Many common statistical methods can be derived from it by applying simple assumptions. For example, if the flashpoints were distributed normally, then the most probable value of a, under the simplest Bayesian assumptions, would be the sample mean of the flashpoints. But Bayesian principles really come into their own when common statistical assumptions don’t apply, as for the Cauchy distribution.
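As a quick numerical illustration of that normal-distribution claim, here is a minimal sketch (the flashpoint values and the known spread σ are invented purely for demonstration): under a flat prior, the value of a that maximises a normal likelihood coincides with the sample mean.

```python
import numpy as np

# Hypothetical flashpoints, assumed normally distributed about a with known sigma
x = np.array([7.2, 6.1, 8.4, 6.9, 7.5])
sigma = 1.0

# Flat prior => the posterior for a is proportional to the likelihood
a_grid = np.linspace(0.0, 15.0, 15001)
log_post = -np.sum((x[:, None] - a_grid[None, :])**2, axis=0) / (2.0 * sigma**2)

# The grid maximiser agrees with the sample mean (up to grid resolution)
print(a_grid[np.argmax(log_post)], x.mean())
```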

Enlightenment

In the last post we derived the estimator density p_E\left(\hat{a} | N,a,b,I\right), which enabled us to find an expression for the probability-of-bad-estimate P\left(\left| \hat{a}-a\right| >1 | N,a,b,I\right). But notice that p_E depends on a and b in general, and in this case on b. This is unfortunate, because a and b are unknown. The most we can get from p_E is a series of “if-then” statements like “if (a,b)=(7,5), then \hat{a} is likely to be this close to a”. But what if (a,b)\neq (7,5)? Moreover, what we really want to know is not how \hat{a} stands subject to (a,b), but how (a,b) stands subject to \hat{a}. Fortunately, the inference may be reversed in that way by applying Bayes’ rule, and it obliges us to consider what we know about (a,b) prior to its application.

Bayes’ rule expresses the posterior density for a and b as:

(1)   \begin{equation*} p_P(a,b | \underset{\sim }{x},I)=\frac{p_{\underset{\sim }{F}}(\underset{\sim }{x} | a,b,I)p_{\pi }(a,b | I)}{p_Z(\underset{\sim }{x} | I)} \end{equation*}

where p_{\underset{\sim }{F}}(\underset{\sim }{x} | a,b,I)=\prod _{n=1}^N p_F\left(x_n | a,b,I\right) is called the likelihood, p_{\pi} is the prior for a and b, and p_Z is a normalisation factor. Bayes’ rule is very easy to understand intuitively. The prior captures what we know about the location of the lighthouse before taking account of the flashes. The likelihood specifies where we ought to expect flashes for each possible lighthouse location. And the posterior expresses what is known about the location of the lighthouse after taking account of the flashes.
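To make the likelihood concrete, here is a minimal sketch in Python, recalling from the previous post that p_F is a Cauchy distribution; the flashpoint values are invented for illustration, and the log of the product is used to avoid numerical underflow.

```python
import numpy as np

def p_F(x, a, b):
    """Cauchy density of a flashpoint x for a lighthouse at position a, distance b offshore."""
    return b / (np.pi * ((x - a)**2 + b**2))

def log_likelihood(x, a, b):
    """Log of the likelihood prod_n p_F(x_n | a, b)."""
    x = np.asarray(x)
    return np.sum(np.log(b / (np.pi * ((x - a)**2 + b**2))))

# Invented flashpoints, purely for illustration
x = np.array([4.7, 7.3, 2.1, 6.6, -12.8, 8.0, 5.9, 41.2, 6.4, 7.1])

# Compare the likelihoods of two candidate lighthouse locations
print(log_likelihood(x, a=7.0, b=5.0))
print(log_likelihood(x, a=0.0, b=1.0))
```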

We know that p_F\left(x_n | a,b,I\right) is a Cauchy distribution, so we know what the likelihood is. The normalisation factor will be worked out later. The prior is where we address head-on the aforementioned inescapable question, “What if (a,b)\neq (7,5)?” We must think hard about what we know about the location of the lighthouse before considering any flashpoints. The only relevant information in the opening paragraph is that “you happen to know there’s a lighthouse somewhere out to sea”. That means (i) b>0; and (ii) no location satisfying that condition should be favoured over any other. These constraints imply a uniform distribution on the upper half-plane. Because that is an improper distribution defined only in the limit, we set finite bounds which we later relax to infinity. Our prior becomes

(2)   \begin{equation*} p_{\pi }(a,b | I)=\frac{1}{2l^2} \;\;\; ,\;\;\; -l<a<l, \; 0<b<l \;\;\; \text{(else 0)} \end{equation*}

Now we can write down an expression for the posterior distribution:

(3)   \begin{equation*}  p_P(a,b | \underset{\sim }{x},I)=\frac{1}{p_Z(\underset{\sim }{x} | I)}\cdot \left[\prod _{n=1}^N p_F\left(x_n | a,b,I\right)\right]\cdot \frac{1}{2l^2}\;\;\; , \;\;\; -l<a<l, \; 0<b<l \;\;\; \text{(else 0)} \end{equation*}

Since the posterior is a probability distribution, it integrates to 1, so we have

(4)   \begin{equation*} \int _0^l\int _{-l}^l\frac{1}{p_Z(\underset{\sim }{x} | I)}\cdot \left[\prod _{n=1}^N p_F(x_n | a,b,I)\right]\cdot \frac{1}{2l^2}\:dadb=1 \end{equation*}

(5)   \begin{equation*} \text{i.e.} \;\;\; p_Z(\underset{\sim }{x} | I)=\frac{1}{2l^2}\int _0^l\int _{-l}^l \prod _{n=1}^N p_F(x_n | a,b,I)\:dadb \end{equation*}

Substituting into equation (3) we have

(6)   \begin{equation*} p_P(a,b | \underset{\sim }{x},I)=\frac{\left[\prod _{n=1}^N p_F\left(x_n | a,b,I\right)\right]\cdot \frac{1}{2l^2}}{\frac{1}{2l^2}\int _0^l\int _{-l}^l \prod _{n=1}^N p_F\left(x_n | a,b,I\right)\:dadb}\;\;\; ,\;\;\; -l<a<l, \; 0<b<l\;\;\; \text{(else 0)} \end{equation*}

The \frac{1}{2l^2} terms cancel, and the integral remaining in the denominator is a constant with respect to a and b. Therefore

(7)   \begin{equation*} p_P(a,b | \underset{\sim }{x},I)\propto \prod _{n=1}^N p_F\left(x_n | a,b,I\right)\;\;\; ,\;\;\; -l<a<l, \; 0<b<l\;\;\;\text{(else 0)} \end{equation*}

We can now take the limit l\to \infty and substitute the equation for p_F to get our final expression for the posterior:

(8)   \begin{equation*} p_P(a,b | \underset{\sim }{x},I)\propto b^N\prod _{n=1}^N \frac{1}{\left(x_n-a\right){}^2+b^2}\;\;\; ,\;\;\; b>0\;\;\;\text{(else 0)} \end{equation*}
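Here is a minimal sketch of equation (8) in code. Everything in it is an assumption for illustration: the flashpoints are simulated from a lighthouse at a hypothetical (a, b) = (1, 2) by drawing uniform flash angles, and the grid extent and resolution are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate N flashpoints from a lighthouse at a hypothetical true position.
# Each flash leaves at an angle drawn uniformly from (-pi/2, pi/2); the
# geometry x = a + b*tan(theta) then gives a Cauchy-distributed flashpoint.
a_true, b_true, N = 1.0, 2.0, 10
theta = rng.uniform(-np.pi / 2, np.pi / 2, N)
x = a_true + b_true * np.tan(theta)

def log_posterior(a, b, x):
    """Unnormalised log posterior of equation (8); zero probability for b <= 0."""
    if b <= 0:
        return -np.inf
    return len(x) * np.log(b) - np.sum(np.log((x - a)**2 + b**2))

# Evaluate the posterior on a grid and locate the maximum a posteriori point
a_grid = np.linspace(-10.0, 10.0, 401)
b_grid = np.linspace(0.05, 10.0, 200)
logp = np.array([[log_posterior(a, b, x) for a in a_grid] for b in b_grid])

i, j = np.unravel_index(np.argmax(logp), logp.shape)
print(f"MAP estimate: a ≈ {a_grid[j]:.2f}, b ≈ {b_grid[i]:.2f}")
```

Level curves of np.exp(logp - logp.max()) over this grid, drawn for instance with matplotlib’s contour, give plots of the kind shown below.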

Let’s put our posterior to work to see what the flashes reveal about the location of the lighthouse. The following five images show posterior level curves for five different flashpoint samples of size ten. The level curves are equally-spaced across the height of the posterior, and the maximum is marked with a dot. In order to keep the plot range the same, not all of the flashpoints are shown.

[Figures: posterior level curves for Cases 1–5, each with the maximum marked by a dot]

Not only do the maximum a posteriori estimates for a and b (the dots) seem plausible in light of the flashpoints; we get an appropriately nuanced distribution as well.

Finally, let’s take a look at how the posterior changes as more flashes are gathered. We take a sample of 100 flashpoints and plot the contours that take account of the first 5, 10, 20, 50, and 100 (all of them).

[Figures: posterior level curves after the first 5, 10, 20, 50, and 100 flashpoints]

Notice that from N=10 to N=20, the spread of the contours increases. This is because the overall spread of the flashpoints increases, countervailing the effect of their increasing number. It serves as a reminder that more data does not necessarily mean more certainty. Nevertheless, the convergence of the contours on the lighthouse with increasing N shows that the method is sound.
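Here is a sketch of that experiment, again with simulated flashpoints from a hypothetical true position; the spread is summarised by the standard deviation of the (discretised) marginal posterior for a, which is just one convenient measure.

```python
import numpy as np

rng = np.random.default_rng(1)
a_true, b_true = 1.0, 2.0
x_all = a_true + b_true * np.tan(rng.uniform(-np.pi / 2, np.pi / 2, 100))

# Grid over the (a, b) plane; A varies along columns, B along rows
a_grid = np.linspace(-10.0, 10.0, 201)
b_grid = np.linspace(0.05, 10.0, 200)
A, B = np.meshgrid(a_grid, b_grid)

for N in (5, 10, 20, 50, 100):
    x = x_all[:N]
    # Unnormalised log posterior of equation (8) on the grid
    logp = N * np.log(B) - np.sum(np.log((x[:, None, None] - A)**2 + B**2), axis=0)
    post = np.exp(logp - logp.max())
    post /= post.sum()
    # Marginal posterior for a, with its mean and standard deviation
    marg_a = post.sum(axis=0)
    mean_a = np.sum(a_grid * marg_a)
    sd_a = np.sqrt(np.sum((a_grid - mean_a)**2 * marg_a))
    print(f"N = {N:3d}: posterior mean of a ≈ {mean_a:5.2f}, sd ≈ {sd_a:4.2f}")
```

Depending on the particular sample, the standard deviation need not shrink monotonically with N, mirroring the behaviour noted above.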

How Far That Little Candle Throws His Beams!

We have learnt a number of lessons. Firstly, we have seen how orthodox statistical estimators can be assessed, and how they give rise to a series of “if-then” statements where the antecedent contains the quantities to be inferred, and the consequent contains the estimator (the data). Secondly, there are situations where such estimators are unsound—for example, when the sample mean is used to infer the location parameter of a Cauchy distribution. Thirdly, by switching the inferred quantities and the data back to their natural order, and explicitly addressing the inescapable question of prior knowledge, which orthodox methods neglect, the Bayesian method resolves these shortcomings.

In the real world, lighthouses don’t have an infinite throw, and sensors have limited precision. However, this is easily dealt with by modifying the flashpoint distribution. We might truncate it, widen it, or reduce the tails. In any case, if its form is logically derived from the effects in play, then it will yield an appropriate posterior distribution.
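For instance, here is a sketch of one such modification, assuming the stretch of beach with sensors spans only the interval [-L, L], so that recorded flashpoints follow a truncated Cauchy distribution; the value of L is invented, and the only change to the likelihood is the renormalisation over the observable interval.

```python
import numpy as np

L = 30.0  # hypothetical half-length of the stretch of beach with sensors

def log_p_F_truncated(x, a, b):
    """Log density of a flashpoint recorded on the beach [-L, L]:
    the Cauchy density renormalised over the interval actually observed."""
    log_cauchy = np.log(b) - np.log(np.pi) - np.log((x - a)**2 + b**2)
    # Probability that a flash lands within [-L, L] for this (a, b)
    mass = (np.arctan((L - a) / b) - np.arctan((-L - a) / b)) / np.pi
    return log_cauchy - np.log(mass)

def log_posterior(a, b, x):
    """Unnormalised log posterior with the truncated likelihood (flat prior, b > 0)."""
    if b <= 0:
        return -np.inf
    return np.sum(log_p_F_truncated(np.asarray(x), a, b))

# Invented flashpoints, all within the observable stretch of beach
x = np.array([4.7, 7.3, 2.1, 6.6, -12.8, 8.0, 5.9, 29.2, 6.4, 7.1])
print(log_posterior(7.0, 5.0, x))
```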

One is tempted to ask whether it is possible to derive more accurate predictions than those above. The surprising answer is “No”! It turns out that these posterior distributions capture exactly what the data imply: nothing more, nothing less. Wider distributions are too uncertain, and narrower distributions too confident. The Bayesian method is the means to do inference correctly, just as the propositional calculus is the means to do deduction correctly.

Bayes’ Theorem: casting its light across the world since 1763.

References

Sivia, DS & Skilling, J 2006. Data Analysis: A Bayesian Tutorial. Oxford University Press, Oxford, United Kingdom.