The Elusive Lighthouse: Part 1
A Light Under a Bushel
You’re walking along a straight beach at night and you happen to know there’s a lighthouse somewhere out to sea, shrouded in darkness on a rocky island. But it is no ordinary lighthouse: it emits flashes in random directions, favouring no angle over any other. Fortuitously, you have a set of sensors rigged up along the coastline that register the incident locations of any flashes that hit the beach. Where is the lighthouse?
It turns out that this little problem holds some salutary lessons for data analysts. The geometry of the scenario is shown below, with a sample of ten flash records generated randomly from an example case. Our length units are kilometres. The task is to infer the lighthouse’s position along the shore, $\alpha$, and its distance out to sea, $\beta$, from the flashes.
The Lighthouse Sets Sail
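Let us first try the obvious thing and estimate the lighthouse’s position along the shore, $\alpha$, by averaging the $x$-values of the flashes:
$$\hat{\alpha} = \frac{1}{N}\sum_{k=1}^{N} x_k \tag{1}$$
where the $x$-values of the $N$ flashes are $x_1, \dots, x_N$. Let’s experiment with this a little. If we fix the lighthouse’s position, generate several sets of ten flashes, and apply the estimator to each set, then we get the following estimates of $\alpha$: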
Some of the estimates are close to the true $\alpha$, but others are very different. Very well; this is what happens in random sampling: you get some random variation. Surely we can improve the estimate by increasing the sample size, that is, the number of flashes per estimate. Let’s now try a larger number of flashes for each estimate:
This time we have a ! Okay, it’s time to deploy the nukes: perhaps we can finally swamp these annoying variations with a sample size of a million!
Yet now we have seven results that differ from the true value by more than km! Frankly we still have no clue about the lighthouse’s position along the beach.
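The experiment above can be reproduced with a short simulation. This is a minimal sketch, not the code behind the figures in this post: it assumes each flash leaves at an angle drawn uniformly from $(-\pi/2, \pi/2)$ and lands at $x = \alpha + \beta\tan\theta$. The lighthouse position `ALPHA`, `BETA`, the number of sets and the function names are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_flashes(alpha, beta, n, rng):
    """Draw n flashpoints along the beach for a lighthouse at (alpha, beta).

    Assumption: each flash departs at an angle theta, uniform on
    (-pi/2, pi/2), and hits the shore at x = alpha + beta * tan(theta).
    """
    theta = rng.uniform(-np.pi / 2, np.pi / 2, size=n)
    return alpha + beta * np.tan(theta)


def mean_estimates(alpha, beta, n, sets, seed=0):
    """Return `sets` independent sample-mean estimates, each from n flashes."""
    rng = np.random.default_rng(seed)
    return np.array(
        [simulate_flashes(alpha, beta, n, rng).mean() for _ in range(sets)]
    )


if __name__ == "__main__":
    ALPHA, BETA = 1.0, 1.0  # hypothetical lighthouse position, in km
    for n in (10, 1_000, 1_000_000):
        estimates = mean_estimates(ALPHA, BETA, n, sets=10)
        print(f"N = {n:>9,}:", np.round(estimates, 2))
```

Running it with different seeds should show the same pattern as above: the spread of the estimates does not shrink as the number of flashes per estimate grows.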
And Now, For Something Completely Different
Shall we try thinking it through? Lurking somewhere deep beneath our futile attempt was an implicit assumption that the distribution for our estimate $\hat{\alpha}$ ought to tend towards $\alpha$ as the number of flashes $N$ increases. Let’s make that intuition more precise. Let $I$ stand for the information in the problem description, before any flashes were registered. Now if in addition the location of the lighthouse were known to be $(\alpha, \beta)$, then each flashpoint $x_k$ would be subject to uncertainty which we shall describe by the probability density $p(x_k \mid \alpha, \beta, I)$. The form of this distribution follows from the lighthouse’s uniform habits and the geometry of the scene; we will say more about it later. Our estimator $\hat{\alpha}$ is a function of $N$ and the flashpoints $x_k$, so its density is based on $N$, $\alpha$, $\beta$, and $I$; call it $p(\hat{\alpha} \mid N, \alpha, \beta, I)$. Now, our implicit hope was that as we increased $N$, $p(\hat{\alpha} \mid N, \alpha, \beta, I)$ would pile up more and more narrowly about $\alpha$, so that when we came to take a variate $\hat{\alpha}$ from it, it would be pretty close to $\alpha$. We can make the idea explicit by naming what we consider to be a small deviation from $\alpha$ as $\epsilon$, and considering the probability
$$P(|\hat{\alpha} - \alpha| > \epsilon \mid N, \alpha, \beta, I) = 1 - \int_{\alpha - \epsilon}^{\alpha + \epsilon} p(\hat{\alpha} \mid N, \alpha, \beta, I)\, d\hat{\alpha} \tag{2}$$
which we’d like to be a (generally) decreasing function of $N$. Moreover, we’d like it to decrease steeply enough that it is close to zero for feasible values of $N$. In order for this estimator to have some modest utility, we might specify a deviation of $\epsilon = 1$ km, and hope that this probability falls to a small target value for a cost-effective number of flash records $N$. This means that our estimate would then have a correspondingly high chance of falling within a kilometre of $\alpha$.
The Ship Runs Aground
As you no doubt noticed from our earlier simulations, our estimator achieves nothing of the sort. The reason lies in the form of the flashpoint distribution. Consider first what would have happened if the flashpoint distribution had been the Gaussian distribution with mean $\alpha$ and standard deviation $\beta$. Here is a random set of flashpoints from that distribution for our example case:
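We need to calculate $p(\hat{\alpha} \mid N, \alpha, \beta, I)$, the density for our estimator $\hat{\alpha}$. The estimator is a sum of $N$ independent Gaussian variates, each with mean $\alpha$ and standard deviation $\beta$, divided by $N$. It turns out that the density of a sum of independent variates is the inverse Fourier transform of the product of the Fourier transforms of their densities. Performing this for the sum of Gaussians and following with a change of variables to account for the denominator $N$, we get a Gaussian with mean $\alpha$ and standard deviation $\beta/\sqrt{N}$. Following equation (2) with this density, our probability-of-bad-estimate as a function of $N$ is
$$P(|\hat{\alpha} - \alpha| > \epsilon \mid N, \alpha, \beta, I) = 1 - \operatorname{erf}\!\left(\frac{\epsilon}{\beta}\sqrt{\frac{N}{2}}\right)$$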
which is plotted below for .
Thus, if the flashpoint distribution were the given Gaussian, and the lighthouse were in fact km from shore, we could achieve the desired chance of getting a good estimate by collecting just flashes.
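To get a feel for how quickly the Gaussian case converges, here is a small sketch that evaluates the probability-of-bad-estimate and searches for the smallest adequate number of flashes. It assumes the hypothetical Gaussian flashpoint distribution has mean $\alpha$ and standard deviation $\beta$, so that the sample mean has standard deviation $\beta/\sqrt{N}$. The function names and the values passed in the usage line are illustrative assumptions, not figures from this post.

```python
import math


def p_bad_gaussian(n, beta, eps=1.0):
    """P(|mean of n flashes - alpha| > eps) for Normal(alpha, beta) flashpoints.

    The sample mean is Normal(alpha, beta / sqrt(n)), so the two-sided tail
    probability is erfc(eps * sqrt(n / 2) / beta).
    """
    return math.erfc(eps * math.sqrt(n / 2.0) / beta)


def flashes_needed(beta, target, eps=1.0):
    """Smallest n for which the probability of a bad estimate is <= target."""
    n = 1
    while p_bad_gaussian(n, beta, eps) > target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical inputs: lighthouse 2 km out, 5% tolerated chance of a miss.
    print(flashes_needed(beta=2.0, target=0.05))
```

Because the tail probability falls off like a Gaussian tail in $\sqrt{N}$, even a brute-force search such as this one terminates after a handful of steps for distances of a few kilometres.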
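Now let’s return to the case at hand and figure out the form of the flashpoint distribution $p(x \mid \alpha, \beta, I)$. A flash which is eventually recorded must depart the lighthouse at an angle $\theta$, measured from the perpendicular dropped from the lighthouse to the shore, as shown below.
Before we observe the flash, $\theta$ is distributed uniformly on $(-\pi/2, \pi/2)$:
$$p(\theta \mid \alpha, \beta, I) = \frac{1}{\pi}$$
From the geometry we have
$$x - \alpha = \beta \tan\theta$$
and the change of variables from $\theta$ to $x$ yields
$$p(x \mid \alpha, \beta, I) = p(\theta \mid \alpha, \beta, I)\left|\frac{d\theta}{dx}\right| = \frac{\beta}{\pi\left[\beta^2 + (x - \alpha)^2\right]}$$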
This is the density, known as a Cauchy or Lorentz distribution, governing each flashpoint. If we superimpose it on our original set of flashpoints, it looks like this (scaled up for visibility):
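As a sanity check on the change of variables, the following sketch draws angles uniformly, maps them to the shore via $x = \alpha + \beta\tan\theta$, and compares the empirical distribution of the flashpoints with the Cauchy cumulative distribution $\tfrac{1}{2} + \tfrac{1}{\pi}\arctan\!\big((x - \alpha)/\beta\big)$. The function names, sample size and lighthouse position are assumptions made for this illustration.

```python
import numpy as np


def cauchy_pdf(x, alpha, beta):
    """Cauchy (Lorentz) density with location alpha and scale beta."""
    return beta / (np.pi * (beta**2 + (x - alpha) ** 2))


def cauchy_cdf(x, alpha, beta):
    """Cumulative distribution obtained by integrating cauchy_pdf."""
    return 0.5 + np.arctan((x - alpha) / beta) / np.pi


def max_cdf_gap(alpha, beta, n=200_000, seed=0):
    """Largest gap between the empirical CDF of simulated flashes and cauchy_cdf."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-np.pi / 2, np.pi / 2, size=n)
    x = np.sort(alpha + beta * np.tan(theta))  # flashpoints from the geometry
    empirical = np.arange(1, n + 1) / n
    return float(np.max(np.abs(empirical - cauchy_cdf(x, alpha, beta))))


if __name__ == "__main__":
    # Hypothetical lighthouse position (km); the gap should be close to zero.
    print(max_cdf_gap(alpha=1.0, beta=2.0))
```

A gap that shrinks towards zero as `n` grows is what we would expect if the simulated flashpoints really do follow the density derived above.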
Here’s the rub. When we calculate the density $p(\hat{\alpha} \mid N, \alpha, \beta, I)$ using a Cauchy flashpoint distribution, we get $p(\hat{\alpha} \mid N, \alpha, \beta, I) = \beta / \left(\pi\left[\beta^2 + (\hat{\alpha} - \alpha)^2\right]\right)$! That is, the density for the estimator is just the same as the density for the flashpoints, which appears above, and does not depend on $N$. So, collecting many flashpoints and taking the mean is mathematically equivalent to estimating $\alpha$ by a single flashpoint.
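One way to see why is to follow the Fourier-transform route using characteristic functions. The sketch below is a standard argument rather than necessarily the calculation used for this post; the symbol $\varphi$ is notation introduced here, and $\alpha$ and $\beta$ are taken to be the location and scale of the flashpoint distribution. The Gaussian case is included for contrast.

```latex
\begin{align*}
\varphi_x(t) &= \mathbb{E}\!\left[e^{itx}\right] = e^{i\alpha t - \beta|t|}
  && \text{(Cauchy flashpoint)} \\
\varphi_{\hat{\alpha}}(t) &= \left[\varphi_x(t/N)\right]^{N}
  = e^{i\alpha t - \beta|t|} = \varphi_x(t)
  && \text{(mean of $N$ Cauchy flashpoints)} \\
\varphi_x(t) &= e^{i\alpha t - \beta^2 t^2/2}
  \;\Longrightarrow\;
  \varphi_{\hat{\alpha}}(t) = e^{i\alpha t - \beta^2 t^2/(2N)}
  && \text{(Gaussian, for contrast)}
\end{align*}
```

In the Cauchy case the exponent is linear in $|t|$, so the factors of $N$ cancel exactly; in the Gaussian case the exponent is quadratic in $t$, which leaves the $1/N$ that narrows the distribution.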
For the same reason, the probability-of-bad-estimate is not a function of $N$ and doesn’t decrease with $N$:
$$P(|\hat{\alpha} - \alpha| > \epsilon \mid N, \alpha, \beta, I) = 1 - \frac{2}{\pi}\arctan\frac{\epsilon}{\beta}$$
For our example case this probability is so large that our modest goal for the chance of a good estimate has been all but turned on its head. The Cauchy distribution also has heavy tails, which means many of the estimates in error by more than a kilometre will be out by much more than a kilometre, as we saw in the simulations earlier. Our only hope is that $\beta$ is small:
However, in order to bring the probability-of-bad-estimate down to our target, we need to bring $\beta$ down to a value which, with our $\epsilon$ and target probability, is measured in metres! In other words, the method begins to provide a somewhat-reliable, somewhat-accurate estimate of the lighthouse’s location along the beach only when the lighthouse is almost literally a stone’s throw from the shore.
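The required distance follows from inverting the Cauchy probability-of-bad-estimate, $P = 1 - \tfrac{2}{\pi}\arctan(\epsilon/\beta)$, which gives $\beta = \epsilon\tan(\pi P/2)$. The sketch below evaluates both directions. The function names are made up for this illustration, and the target of 0.05 is a hypothetical value chosen for the example rather than one taken from this post.

```python
import math


def p_bad_cauchy(beta, eps=1.0):
    """P(|estimate - alpha| > eps) when the estimate is Cauchy(alpha, beta).

    This does not depend on the number of flashes, because the mean of
    Cauchy variates has the same distribution as a single variate.
    """
    return 1.0 - (2.0 / math.pi) * math.atan(eps / beta)


def beta_needed(target, eps=1.0):
    """Distance from shore at which p_bad_cauchy equals `target`."""
    return eps * math.tan(math.pi * target / 2.0)


if __name__ == "__main__":
    eps_km = 1.0   # acceptable deviation, as in the text
    target = 0.05  # hypothetical tolerated chance of a bad estimate
    beta_km = beta_needed(target, eps_km)
    print(f"beta must be at most {1000 * beta_km:.0f} m")
```

For these illustrative inputs the distance comes out at a little under 80 metres, which is consistent with the stone’s-throw conclusion above.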
And we haven’t even tried to estimate !
In the next post, we will demonstrate how to remedy our signal failure by seeking safer shores in the sound principles of probability theory.
Sivia, DS & Skilling, J 2006. Data Analysis: A Bayesian Tutorial. Oxford University Press, Oxford, United Kingdom.