The Cramer-Rao Bound

Hello everyone! For my inaugural post I thought I would take some time to talk about and derive the Cramer-Rao Bound (CRB).

I work a lot with radar, which, at its heart, is an estimation theory problem. In its simplest application, a radar sends a known signal x(t) out into the environment, the signal bounces off an object at some distance from the transmitter, and returns to the radar as a delayed version of the original signal, x(t-\tau). To determine how far away the reflecting object is, we simply have to estimate \tau; the range then follows as R = c\tau/2, where c is the speed of light.

Well, let’s say I’ve been contracted to build a radar for the military, and they specify that I need to be able to determine an unknown object’s range to a precision of 1 meter. How can I know whether my radar will be able to meet this specification before I build it? I need to know how well I can possibly do with what I have to work with!

This is where the CRB is useful. In general, the CRB gives you the best possible performance of an estimator: it lower-bounds the mean-squared error of your system. Going back to the radar application, the range CRB would tell me the theoretical best precision I could achieve on a range estimate. A very useful tool indeed!

Let’s get started deriving it. We seek a bound on the mean-squared error matrix \textbf{M}

\textbf{M} = \textup{E}\left [(\hat{\boldsymbol{\theta}}-\boldsymbol{\theta})(\hat{\boldsymbol{\theta}}-\boldsymbol{\theta})^H \right ] = \textup{E}\left [\boldsymbol{\epsilon}\boldsymbol{\epsilon}^H \right ] .
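
To make the MSE matrix a bit more concrete, here’s a minimal Monte Carlo sketch (a toy example of my own with made-up numbers, not part of the derivation): it empirically forms \textbf{M} for the problem of jointly estimating the mean and variance of a Gaussian, using the sample mean and the maximum-likelihood sample variance as \hat{\boldsymbol{\theta}}.

```python
import numpy as np

rng = np.random.default_rng(0)

mu, var = 2.0, 3.0           # true parameters, theta = [mu, var] (made-up numbers)
N, trials = 50, 100_000      # samples per experiment, number of Monte Carlo experiments

x = rng.normal(mu, np.sqrt(var), size=(trials, N))
mu_hat = x.mean(axis=1)                                # sample mean
var_hat = ((x - mu_hat[:, None]) ** 2).mean(axis=1)    # ML (slightly biased) sample variance

# Error vectors eps = theta_hat - theta, one row per experiment
eps = np.column_stack([mu_hat - mu, var_hat - var])

# Empirical mean-squared error matrix M = E[eps eps^H]
M = eps.T @ eps / trials
print(M)
```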

In general, we can say a matrix \textbf{A} is “lower bounded” by a matrix \textbf{B} if the difference \textbf{A}-\textbf{B} is a non-negative definite matrix. Let’s now define the column vector \textbf{x} as

\textbf{x} =\begin{bmatrix}\hat{\boldsymbol{\theta}} - \boldsymbol{\theta} - \textbf{b}(\boldsymbol{\theta})\\\nabla_{\boldsymbol{\theta}}\textup{ln}p_\textbf{X}(\textbf{X};\boldsymbol{\theta})\end{bmatrix} ,

where \textbf{b}(\boldsymbol{\theta}) = \textup{E}[\hat{\boldsymbol{\theta}}]-\boldsymbol{\theta} denotes the column vector of estimator biases. To derive the Cramer-Rao Lower Bound, we evaluate \textup{E}\left [ \textbf{x}\textbf{x}^H \right ].

\textup{E}\left [ \textbf{x}\textbf{x}^H \right ]=\begin{bmatrix}\textbf{M}-\textbf{bb}^H & \textbf{I}+\nabla_{\boldsymbol{\theta}}^H\textbf{b}\\(\textbf{I}+\nabla_{\boldsymbol{\theta}}^H\textbf{b})^H & \textbf{F}\end{bmatrix}

where \nabla^H_{\boldsymbol{\theta}}\textbf{b} denotes the matrix of partial derivatives of the bias, \left [ \partial b_i / \partial \theta_j\right ], with the notation \nabla_{\boldsymbol{\theta}}^H\textbf{b} defined as the matrix (\nabla_{\boldsymbol{\theta}}\textbf{b}^H)^H. The off-diagonal blocks follow from differentiating \textup{E}[\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}-\textbf{b}(\boldsymbol{\theta})]=\textbf{0} with respect to \boldsymbol{\theta} under the integral sign.

The term \textbf{F} in the matrix is a very important quantity in estimation theory, known as the Fisher Information Matrix. Formally, the Fisher Information is the covariance of the score (which itself is the gradient of the log-likelihood function), but more intuitively, the Fisher Information tells us how much information a random variable \textbf{X} contains about an unknown parameter \boldsymbol{\theta} upon which the probability distribution of \textbf{X} depends. This matrix is

\textbf{F} = \textup{E}[(\nabla_{\boldsymbol{\theta}}\textup{ln}p_\textbf{X}(\textbf{X};\boldsymbol{\theta}))(\nabla_{\boldsymbol{\theta}} \textup{ln}p_\textbf{X}(\textbf{X};\boldsymbol{\theta}))^H]=-\textup{E}[\nabla_{\boldsymbol{\theta}}\nabla^H_{\boldsymbol{\theta}}\textup{ln}p_\textbf{X}(\textbf{X};\boldsymbol{\theta})]

The notation \nabla_{\boldsymbol{\theta}}\nabla^H_{\boldsymbol{\theta}} refers to the matrix of all second-order partial derivatives, or the gradient of the gradient; this matrix is known as the Hessian. Demonstrating the equivalence of the two forms of the Fisher Information is rather easy (it follows from differentiating the identity \textup{E}[\nabla_{\boldsymbol{\theta}}\textup{ln}p_\textbf{X}(\textbf{X};\boldsymbol{\theta})]=\textbf{0} with respect to \boldsymbol{\theta} under the integral sign), so we’ll skip the details.
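
If you’d like to convince yourself numerically, here’s a small sketch (my own toy setup: a single Gaussian sample with unknown mean \mu and variance v) that estimates both forms of the Fisher Information by Monte Carlo; both should approach \textup{diag}(1/v, 1/(2v^2)).

```python
import numpy as np

rng = np.random.default_rng(1)

mu, v = 1.0, 2.0          # true mean and variance, theta = [mu, v] (made-up numbers)
trials = 500_000

x = rng.normal(mu, np.sqrt(v), size=trials)

# Score: gradient of the log-likelihood of one sample with respect to [mu, v]
score = np.stack([(x - mu) / v,
                  -0.5 / v + (x - mu) ** 2 / (2 * v ** 2)])

# Form 1: covariance (outer product) of the score
F_outer = score @ score.T / trials

# Form 2: negative expected Hessian of the log-likelihood
h11 = np.full(trials, -1.0 / v)
h12 = -(x - mu) / v ** 2
h22 = 0.5 / v ** 2 - (x - mu) ** 2 / v ** 3
F_hess = -np.array([[h11.mean(), h12.mean()],
                    [h12.mean(), h22.mean()]])

print(F_outer)   # both should be close to diag(1/v, 1/(2 v^2))
print(F_hess)
```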

We can now say that the matrix \textup{E}[\textbf{xx}^H] is non-negative definite because it is a correlation matrix. Thus, for any column vector \boldsymbol{\alpha}, the quadratic form \boldsymbol{\alpha}^H \textup{E}[\textbf{xx}^H] \boldsymbol{\alpha} is non-negative. One convenient choice of \boldsymbol{\alpha} is

\boldsymbol{\alpha}=\begin{bmatrix}\boldsymbol{\beta}\\-\textbf{F}^{-1}\left(\textbf{I}+\nabla^H_{\boldsymbol{\theta}}\textbf{b}\right)^H\boldsymbol{\beta}\end{bmatrix}

where \boldsymbol{\beta} is an arbitrary column vector. In this case, the quadratic form becomes

\boldsymbol{\alpha}^H \textup{E}[\textbf{xx}^H] \boldsymbol{\alpha} = \boldsymbol{\beta}^H \left [ \textbf{M}-\textbf{bb}^H -\left(\textbf{I}+\nabla^H_{\boldsymbol{\theta}}\textbf{b}\right)\textbf{F}^{-1}\left(\textbf{I}+\nabla^H_{\boldsymbol{\theta}}\textbf{b}\right)^H \right ]\boldsymbol{\beta} .
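
To see this step in action, here’s a quick numerical sketch (real-valued toy matrices of my own, so the Hermitian transpose reduces to an ordinary transpose): build a random non-negative definite matrix with the block structure of \textup{E}[\textbf{xx}^H], plug in the chosen \boldsymbol{\alpha}, and confirm that the quadratic form collapses to the bracketed expression and is non-negative.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 3  # number of parameters

# Random non-negative definite matrix with the block structure of E[x x^H]
G = rng.standard_normal((2 * n, 2 * n))
E = G @ G.T
A = E[:n, :n]    # plays the role of M - b b^H
B = E[:n, n:]    # plays the role of I + grad^H b
F = E[n:, n:]    # plays the role of the Fisher Information Matrix

beta = rng.standard_normal(n)
alpha = np.concatenate([beta, -np.linalg.solve(F, B.T @ beta)])

lhs = alpha @ E @ alpha                                  # the quadratic form
rhs = beta @ (A - B @ np.linalg.solve(F, B.T)) @ beta    # the bracketed expression
print(lhs, rhs, lhs >= -1e-12)                           # equal, and non-negative
```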

Since this quadratic form must be non-negative for every choice of \boldsymbol{\beta}, the matrix expression within the brackets must also be non-negative definite. We thus obtain the Cramer-Rao Bound on the mean-squared error matrix as

\textup{E}\left [\boldsymbol{\epsilon}\boldsymbol{\epsilon}^H \right ] \succeq \textbf{b}(\boldsymbol{\theta})\textbf{b}(\boldsymbol{\theta})^H + \left(\textbf{I}+\nabla^H_{\boldsymbol{\theta}}\textbf{b}\right)\textbf{F}^{-1}\left(\textbf{I}+\nabla^H_{\boldsymbol{\theta}}\textbf{b}\right)^H .
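
As a quick sanity check of the biased form of the bound (again a toy example with made-up numbers), consider estimating the mean \mu of a Gaussian with known variance \sigma^2 from N samples using the shrunk sample mean \hat{\mu}=a\bar{x}. Its bias is b(\mu)=(a-1)\mu, the Fisher Information is F=N/\sigma^2, and this particular estimator happens to meet the bound with equality.

```python
import numpy as np

rng = np.random.default_rng(3)

mu, sigma2 = 2.0, 4.0     # true mean and (known) variance (made-up numbers)
N, a = 25, 0.9            # samples per experiment, shrinkage factor
trials = 200_000

x = rng.normal(mu, np.sqrt(sigma2), size=(trials, N))
mu_hat = a * x.mean(axis=1)              # biased estimator: a times the sample mean

mse = np.mean((mu_hat - mu) ** 2)        # empirical mean-squared error

F = N / sigma2                           # Fisher Information for mu
bias = (a - 1) * mu                      # b(mu)
dbias = a - 1                            # db/dmu
crb = bias ** 2 + (1 + dbias) ** 2 / F   # biased Cramer-Rao Bound

print(mse, crb)                          # essentially equal for this estimator
```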

The diagonal elements of \textup{E}\left [\boldsymbol{\epsilon}\boldsymbol{\epsilon}^H \right ] are the mean-squared errors of the estimates of the individual parameters. This bound simplifies greatly if the estimator is assumed unbiased (\textbf{b} = \textbf{0}), which is often the case in practice. Then the CRB simply becomes

\textup{E}\left [ \left ( \hat{\theta}_i - \theta_i\right )^2 \right ] \geq \left ( \textbf{F}^{-1}\right )_{ii}.
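
One last numerical check (same toy Gaussian setup): with N samples and known variance \sigma^2, the Fisher Information for the mean is F = N/\sigma^2, so the CRB on the mean estimate is \sigma^2/N, and the ordinary sample mean attains it.

```python
import numpy as np

rng = np.random.default_rng(4)

mu, sigma2 = 0.0, 4.0     # true mean and (known) variance (made-up numbers)
N, trials = 25, 200_000

x = rng.normal(mu, np.sqrt(sigma2), size=(trials, N))
mu_hat = x.mean(axis=1)                 # unbiased sample-mean estimator

mse = np.mean((mu_hat - mu) ** 2)       # empirical mean-squared error
crb = sigma2 / N                        # (F^{-1})_11 with F = N / sigma2

print(mse, crb)                         # the sample mean attains the CRB
```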

Hopefully this derivation has been helpful and you’ve gained some intuition for the physical meaning of the CRB.

Reference: Don H. Johnson, “Statistical Signal Processing” class notes, Rice University, 2018.
