Numb3rs S1E1: The Rossmo Formula

When I think back to the TV shows I loved most growing up, Numb3rs stands at the top. It follows FBI agent Don Eppes and his genius mathematician brother, Charlie Eppes, as they solve cases by applying mathematical ideas to investigative work.

I recently started rewatching the series. The first time around—in middle and high school—I enjoyed it mostly as entertainment. Now that I can understand more advanced math and have the programming tools to analyze it, I’ve become curious about the theoretical foundations behind the techniques showcased in the show.

In this series of posts, I plan to revisit episodes that feature methods grounded in real theory, summarize the ideas, and try implementing them. The emphasis will be on translating the key assumptions into computational procedures that can be explored and visualized.

Geographic Profiling: Rossmo’s Formula

In Season 1, Episode 1, the show introduces geographic profiling—a technique that uses the locations of crimes to infer the offender’s home base.

Rossmo's Formula

One of the methods the episode showcases is Rossmo’s formula. Proposed by criminologist Kim Rossmo in 1996 [1], it assumes that the probability that a location $\vb{x}$ is the offender’s residence is proportional to the following function of the observed crime locations $\vb{x}_1, \cdots, \vb{x}_n$:

\begin{equation} P(\vb{x}) \propto \sum_{i=1}^n \qty[ \frac{H(d(\vb{x}, \vb{x}_i) - B)}{d(\vb{x}, \vb{x}_i)^f} + \frac{(1 - H(d(\vb{x}, \vb{x}_i) - B))B^{g- f}}{(2B - d(\vb{x}, \vb{x}_i))^g} ] \end{equation}

Here, $d(\vb{x}, \vb{y})$ denotes the distance between two locations $\vb{x}$ and $\vb{y}$; this can be Manhattan (taxicab) distance or Euclidean distance. The parameter $B$ is the “buffer zone radius,” reflecting the assumption that offenders tend to avoid committing crimes very close to home. The exponents $f$ and $g$ set how quickly the weight changes outside and inside the buffer zone, respectively. The function $H(x)$ is the Heaviside step function, equal to 1 for $x \ge 0$ and 0 for $x < 0$.

At a high level, Rossmo’s formula encodes two behavioral assumptions about how offenders select crime locations:

  1. Distance decay: they rarely travel extremely far from their activity space.
  2. Buffer zone: they also avoid locations that are too close to home.

In the expression, the first term (active when $d \ge B$) captures distance decay: candidate locations far from a crime site receive lower weight. The second term (active when $d < B$) encodes the buffer zone: candidate locations very close to a crime site also receive lower weight, and the weight rises toward its maximum at the edge of the “avoidance” region, $d = B$.
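To make the two branches concrete, here is a minimal Python sketch of the formula. The function names, the crime coordinates, and the default values of `B` and `g` are my own assumptions for illustration; only the exponent 1.2 for `f` comes from the discussion later in this post.

```python
def manhattan(p, q):
    """Taxicab distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def rossmo_weight(d, B, f, g):
    """Weight contributed by one crime at distance d from a candidate home."""
    if d >= B:
        # H(d - B) = 1: plain power-law distance decay
        return 1.0 / d ** f
    # H(d - B) = 0: inside the buffer zone, weight rises as d approaches B
    return B ** (g - f) / (2 * B - d) ** g


def rossmo_score(x, crimes, B=3.0, f=1.2, g=1.2, dist=manhattan):
    """Unnormalized Rossmo score of candidate location x (B and g are assumed values)."""
    return sum(rossmo_weight(dist(x, c), B, f, g) for c in crimes)


crimes = [(2, 5), (6, 1), (7, 8)]  # made-up incident coordinates
print(rossmo_score((5, 5), crimes))
```

One useful sanity check: at $d = B$ both branches evaluate to $B^{-f}$, so the weight is continuous across the buffer boundary and peaks there.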

Actual Implementation

Because Rossmo’s formula is not derived from a fully specified probabilistic model but rather from weighted assumptions, the resulting scores should be normalized over the search region to yield a proper spatial probability distribution. In practice, integration over a continuous domain is difficult, so we approximate on a grid: compute a score map $S(\vb{x})$ on grid cells and normalize via $P(\vb{x}) = \frac{S(\vb{x})}{\sum_{\vb{y}} S(\vb{y})}$.

[Figure: Rossmo’s formula]

This is essentially what the show illustrates: divide the map into a grid, compute a heatmap of scores, and then interpret the normalized map as an approximation to a probability surface over candidate home locations.
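The grid normalization can be sketched in a few lines of plain Python. The helper names and the toy score function below are hypothetical stand-ins chosen for illustration; any non-negative score (such as Rossmo’s) could be passed in instead.

```python
def score_grid(score_fn, width, height):
    """Evaluate score_fn at every integer cell (x, y) of a width x height grid."""
    return {(x, y): score_fn((x, y)) for x in range(width) for y in range(height)}


def normalize(scores):
    """Turn non-negative scores into probabilities that sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {cell: s / total for cell, s in scores.items()}


crime = (4, 4)  # a single made-up crime site


def toy_score(x):
    """Toy score: inverse taxicab distance to the crime site (not Rossmo's formula)."""
    return 1.0 / (1 + abs(x[0] - crime[0]) + abs(x[1] - crime[1]))


prob = normalize(score_grid(toy_score, width=10, height=10))
best = max(prob, key=prob.get)
print(best, round(prob[best], 4))
```

Storing the map as a dictionary keyed by cell keeps the sketch dependency-free; on a realistic map an array library would be the more natural choice.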

Empirical Validation of Assumptions

The ideas are compelling, but how well do the assumptions hold empirically?

The first assumption, distance decay, appears to be well supported. Wiles & Costello (2000) [2] report that despite advances in transportation, offenders tend to operate relatively close to home. Canter & Hammond (2006) [3] analyze U.S. serial homicide cases to study which decay functions fit best. Intuitively, searching for targets far from home requires greater effort, so more crimes occur in nearer areas.

The second assumption, the buffer zone, is more debated. Turner (1969) [4] first noted that juvenile offenders committed crimes at least one block away from home, and Brantingham & Brantingham (1981) [5] formalized the concept. But even after five decades, evidence is mixed. A since-retracted paper by Bernasco & van Dijke (2020) [6] reported that only 11 of 33 studies supported the buffer zone hypothesis, with the others not finding consistent evidence. While the retraction diminishes the weight of that tally, it underscores that the buffer zone is not universally accepted.

Choice of Parameters: Distance Function and Decay Behavior

As a physicist, my first reaction on seeing the formula was that the output must depend on the chosen distance metric and the functional form of decay. Beyond Manhattan and Euclidean distance, there are many ways to define “distance”—including travel time, which implicitly accounts for roads, transit lines, and other constraints that shape perceived or effective distance.

Similarly, the decay with distance need not be a simple power law; negative exponential $e^{-\lambda d}$ or polynomial forms (linear, quadratic in the denominator) are also plausible. Canter & Hammond (2006) [3] examined the efficacy of multiple decay forms and found results to be relatively robust: performance differences were not dramatic. This practical robustness helps explain why the exponent value $1.2$ originally suggested by Rossmo is often used in applications without extensive re-tuning.
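One way to experiment with these choices is to make the distance metric and the decay form interchangeable. The sketch below is an assumption-laden toy: the function names, the rate `lam=0.5`, and the coordinates are invented for illustration, and it omits the buffer zone to keep the comparison simple.

```python
import math


def manhattan(p, q):
    """Taxicab distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def euclidean(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def power_decay(d, f=1.2):
    """Power-law decay d^(-f); undefined at d = 0."""
    return d ** -f


def exponential_decay(d, lam=0.5):
    """Negative exponential decay exp(-lam * d); lam is an assumed rate."""
    return math.exp(-lam * d)


def score(x, crimes, dist, decay):
    """Sum of decayed distances from candidate x to each crime (no buffer zone)."""
    return sum(decay(dist(x, c)) for c in crimes)


crimes = [(2, 5), (6, 1), (7, 8)]  # made-up incident coordinates
candidate = (5, 5)
for dist in (manhattan, euclidean):
    for decay in (power_decay, exponential_decay):
        print(dist.__name__, decay.__name__, round(score(candidate, crimes, dist, decay), 4))
```

A travel-time metric could be dropped in the same way, as long as it takes two points and returns a non-negative number.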

Recent Works: A Bayesian Approach

Despite the robustness to decay forms and distance metrics, Rossmo’s method has notable limitations—especially the fact that the buffer zone scale can vary substantially across contexts. Because serial cases often have few observations, estimating buffer zone parameters while applying the method leaves room for subjective analyst input.

A more principled alternative is to posit a generative model for crime locations conditioned on the offender’s residence and behavioral parameters, and then infer those latent variables from observed data. Mohler & Short (2012) [7] pursue this with kinetic (Fokker–Planck) models of offender movement and a Bayesian framework:

\[P(\vb{x}_0, \alpha | \vb{x}_1, \cdots, \vb{x}_n) \propto \qty[ \prod_{i=1}^n P(\vb{x}_i | \vb{x}_0, \alpha) ] H(\vb{x}_0) \pi(\alpha)\]

Here, $\vb{x}_0$ is the home location, $\alpha$ collects behavioral parameters (e.g., buffer zone scales), $H(\vb{x}_0)$ is a prior over habitable/likely residential locations (e.g., population density; not to be confused with the Heaviside function $H$ in Rossmo’s formula), and $\pi(\alpha)$ is a prior over parameters informed by prior studies. This approach turns the problem into posterior inference over $(\vb{x}_0, \alpha)$ given observed incidents, tying the assumptions to a coherent probabilistic model.
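As a rough illustration of the posterior above, the following sketch evaluates it on a small grid with $\alpha$ held fixed. It is not Mohler & Short’s kinetic model: as a simplifying assumption, the site-selection probability $P(\vb{x}_i | \vb{x}_0)$ is taken to be a Rossmo-style weight normalized over the grid, the prior over homes is flat unless one is supplied, and all names and coordinates are hypothetical.

```python
import math


def manhattan(p, q):
    """Taxicab distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def kernel(d, B=3.0, f=1.2, g=1.2):
    """Rossmo-style weight reused as an unnormalized site-selection kernel (assumed form)."""
    return d ** -f if d >= B else B ** (g - f) / (2 * B - d) ** g


def posterior_over_home(crimes, size, prior=None):
    """Grid posterior P(x0 | crimes) for fixed behavioral parameters."""
    cells = [(x, y) for x in range(size) for y in range(size)]
    log_post = {}
    for home in cells:
        # Normalize the kernel over the grid so that P(x | home) sums to 1.
        z = sum(kernel(manhattan(home, c)) for c in cells)
        # Flat prior by default; a supplied prior must be strictly positive.
        lp = math.log(prior[home]) if prior else 0.0
        for crime in crimes:
            lp += math.log(kernel(manhattan(home, crime)) / z)
        log_post[home] = lp
    # Exponentiate stably, then normalize over candidate homes.
    m = max(log_post.values())
    unnorm = {h: math.exp(lp - m) for h, lp in log_post.items()}
    total = sum(unnorm.values())
    return {h: u / total for h, u in unnorm.items()}


crimes = [(10, 7), (4, 7), (7, 10), (7, 4)]  # made-up sites, each 3 blocks from (7, 7)
post = posterior_over_home(crimes, size=15)
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

Working in log space avoids underflow when many incidents are multiplied together. Inferring $\alpha$ as well would mean adding an outer loop (or sampler) over parameter values weighted by $\pi(\alpha)$.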

Practical Implementation

Rossmo vs Bayesian Approach

Let’s compare the results of Rossmo’s method and the Bayesian approach using taxicab distance as the distance metric.

[Figure: Rossmo vs Bayesian approach]

When I assume that the probability of choosing a crime site given a residence follows the same weight function used in Rossmo’s method, I observe that Rossmo’s approach spreads probability mass over a relatively broad region, whereas the Bayesian approach concentrates it more sharply around the true residence location. This suggests that, to achieve comparable precision with Rossmo’s method, one would need to assume a decay rate smaller than the one that actually governs site selection.
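The difference in sharpness comes largely from adding per-crime weights versus multiplying them. The toy comparison below is a sketch under stated assumptions: it uses made-up crime sites, assumed values for `B` and `g`, a flat prior, and it ignores the grid-edge normalization of each site-selection probability.

```python
import math


def manhattan(p, q):
    """Taxicab distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def weight(d, B=3.0, f=1.2, g=1.2):
    """Rossmo weight for one crime at distance d (B and g are assumed values)."""
    return d ** -f if d >= B else B ** (g - f) / (2 * B - d) ** g


def normalized(values):
    """Scale a dict of non-negative values so they sum to 1."""
    total = sum(values.values())
    return {k: v / total for k, v in values.items()}


size = 15
cells = [(x, y) for x in range(size) for y in range(size)]
crimes = [(10, 7), (4, 7), (7, 10), (7, 4)]  # made-up sites around (7, 7)

sum_map, prod_map = {}, {}
for x in cells:
    ws = [weight(manhattan(x, c)) for c in crimes]
    sum_map[x] = sum(ws)         # Rossmo: add the per-crime weights
    prod_map[x] = math.prod(ws)  # Bayesian-style: multiply them

peak_sum = max(normalized(sum_map).values())
peak_prod = max(normalized(prod_map).values())
print(round(peak_sum, 4), round(peak_prod, 4))
```

In this configuration the multiplicative map puts far more probability on its best cell than the additive one does, which is the qualitative pattern described above.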

Different Distance Functions

I hypothesized that when rapid transit like subways is available, the distance function should change. The intuition is psychological: offenders might perceive distance more in terms of travel time than normal taxicab distance. To test whether modifying the distance metric improves inference, I simulated a scenario where a subway runs along the x-axis, making travel in that direction considerably faster.

\[d((x_1, y_1), (x_2, y_2)) = \beta \abs{x_1 - x_2} + \abs{y_1 - y_2}\]

Here $\beta < 1$ shrinks the effective distance along the subway direction. I adjusted the site-selection probability to use this metric and compared the Bayesian inference with and without accounting for $\beta$.

[Figure: Different distance function]

When $\beta$ is incorporated, the inferred residence probability peaks near the true home location. Without it, the method tends to estimate the residence near the simple average of crime locations. This effect is most pronounced when crimes cluster along the x-axis; in less biased distributions, the average location often coincides with the true residence.
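The anisotropic metric is a one-line change to taxicab distance. In this sketch the function name and the value `beta=0.3` are hypothetical; the post does not state which value was used in the simulation.

```python
def subway_distance(p, q, beta=0.3):
    """Taxicab distance with travel along x scaled by beta (beta < 1 means faster)."""
    return beta * abs(p[0] - q[0]) + abs(p[1] - q[1])


home = (0, 0)
along_line = (10, 0)  # 10 blocks away along the subway direction
across = (0, 10)      # 10 blocks away perpendicular to it
print(subway_distance(home, along_line), subway_distance(home, across))
```

With `beta=1` the function reduces to ordinary taxicab distance, so the same inference code can be run with and without the subway effect by changing a single argument.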

Multiple Residence Cases

In the show, we also see scenarios where the offender commits crimes not just near home but also near their workplace. Given the tendency of naive approaches to estimate location based on the average of observed points, having two anchor locations can significantly degrade estimates.

[Figure: Multiple residence case]

Both panels use the Bayesian approach, but the left assumes a single residence while the right correctly assumes two. Interestingly, even the single-residence assumption manages to identify one of the two true locations reasonably well. This mirrors what happens in the show: the first location the FBI identifies turns out to be the suspect’s former residence, with the true current hideout revealed later. (In retrospect, one could interpret this as three activity centers across different time periods.) The Bayesian model’s behavior here—pinpointing at least one plausible anchor even under model misspecification—may partly explain why the method worked in that episode.
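One simple way to model two anchor points is an equal-weight mixture: each crime is assumed to originate from either anchor with probability 1/2. The sketch below is my own illustrative construction, not necessarily the model behind the figure; it uses an exponential-decay kernel for brevity, and the names, rate, and coordinates are all made up.

```python
import math


def manhattan(p, q):
    """Taxicab distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def site_probability(crime, anchor, cells, lam=1.0):
    """P(crime | anchor) from an exponential-decay kernel normalized over the grid."""
    z = sum(math.exp(-lam * manhattan(anchor, c)) for c in cells)
    return math.exp(-lam * manhattan(anchor, crime)) / z


def log_likelihood(crimes, anchors, cells):
    """Log-likelihood under an equal-weight mixture over the given anchors."""
    total = 0.0
    for crime in crimes:
        p = sum(site_probability(crime, a, cells) for a in anchors) / len(anchors)
        total += math.log(p)
    return total


cells = [(x, y) for x in range(11) for y in range(11)]
# Two made-up clusters, one near (2, 2) and one near (8, 8).
crimes = [(2, 3), (3, 2), (1, 2), (8, 7), (7, 8), (9, 8)]

one_anchor = log_likelihood(crimes, [(5, 5)], cells)
two_anchors = log_likelihood(crimes, [(2, 2), (8, 8)], cells)
print(round(one_anchor, 2), round(two_anchors, 2))
```

For this toy data the two-anchor hypothesis scores a much higher log-likelihood than a single anchor at the midpoint, which is the failure mode of averaging that the paragraph above describes. A full posterior would search over pairs of anchors rather than compare two hand-picked hypotheses.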

References

  1. US Patent 5781704, Rossmo, D. K., “Expert system method of performing crime site analysis”, issued 2002-07-16 

  2. Paul Wiles, Andrew Costello, “Road to Nowhere: The Evidence For Travelling Criminals” (2000) 

  3. Canter, D. and Hammond, L. (2006), A comparison of the efficacy of different decay functions in geographical profiling for a sample of US serial killers. J. Investig. Psych. Offender Profil., 3: 91–103.

  4. Turner, S. (1969). Delinquency and distance. In T. Sellin & M. E. Wolfgang (Eds.), Delinquency: Selected studies. New York: John Wiley. 

  5. Brantingham, P. L., Brantingham, P. J. (1981). Notes on the Geometry of Crime. In P. J. Brantingham & P. L. Brantingham (Eds.), Environmental criminology. Beverly Hills: Sage. 

  6. Bernasco, W., van Dijke, R. RETRACTED ARTICLE: Do offenders avoid offending near home? A systematic review of the buffer zone hypothesis. Crime Sci 9, 8 (2020) 

  7. George O. Mohler and Martin B. Short, “Geographic Profiling from Kinetic Models of Criminal Behavior”, SIAM Journal on Applied Mathematics 2012 72:1, 163–180