Earlier this month, I posed some statistics interview questions. Here are possible answers.

1. Stirling’s formula holds that $\lim_{n\to\infty}{{\Gamma(n)e^{n}}\over{n^{n-1/2}}}=\sqrt{2\pi}$, a result with broad utility in numerical recipes (the gamma function and concentration inequalities) and complexity (the notion of log-linear growth).  It can follow directly from the central limit theorem.  How?

Suppose $X_{1},\dots,X_{n}$ are i.i.d. exponential(1).  Then $\overline{X}_{n}$ is distributed $\Gamma(n,1/n)$.  By the CLT, $\sqrt{n}\left(\overline{X}_{n}-1\right)\to N(0,1)$.

Writing out the density of $\sqrt{n}\left(\overline{X}_{n}-1\right)$, the CLT (strengthened to convergence of densities) gives $\lim_{n\to\infty}{{n^{n-1/2}}\over{\Gamma(n)e^{n}}}\left(t/\sqrt{n}+1\right)^{n-1}e^{-\sqrt{n}t}={{1}\over{\sqrt{2\pi}}}e^{-t^{2}/2}$

for each fixed $t$.  Obtaining Stirling’s formula then requires recognizing that ${\left( {t / \sqrt{n}+1}\right)}^{n-1}e^{-\sqrt{n}t}=(t/\sqrt{n}+1)^{-1}\left({{(t/\sqrt{n}+1)^{\sqrt{n}}}\over{e^{t}}}\right)^{\sqrt{n}}\to e^{-t^{2}/2}$.

We’ll omit the details.
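As a quick numerical sanity check (not part of the derivation), the ratio in Stirling’s formula can be evaluated in log space with `math.lgamma`; the helper name `stirling_ratio` is mine, not standard.

```python
import math

def stirling_ratio(n: float) -> float:
    """Compute Gamma(n) * e^n / n^(n - 1/2), working in log space to avoid overflow."""
    return math.exp(math.lgamma(n) + n - (n - 0.5) * math.log(n))

# The ratio approaches sqrt(2*pi) ~ 2.5066 as n grows.
for n in (10, 100, 10_000):
    print(n, stirling_ratio(n))
```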

2. Can you think of how regularization and prior distributions are connected?

Generally we can characterize the cost function as a negative log-likelihood.  For instance, the sum-of-squares error in OLS given by $\sum_{n=1}^{N}\left(ax_{n}+b-y_{n}\right)^{2}$

can be interpreted as a negative log-likelihood of $\mathbb{P}(Y=y|X=x;a,b)={{1}\over{\sqrt{2\pi}}}e^{-(ax+b-y)^{2}/2}$.

We can coerce a Bayesian treatment by thinking of the regression coefficients as random phenomena, so that $\mathbb{P}(Y=y,A=a,B=b|X=x)={{1}\over{\sqrt{2\pi}}}e^{-(ax+b-y)^{2}/2}\times\Pi(a,b)$.

This prior belief about the regression coefficients can take the form of any regularization we may choose to include in the original formulation.  For instance, suppose we really believe that the slope and intercept ought not be too big.  An L2 regularization would mean $\Pi(a,b)=Ke^{-\lambda(a^{2}+b^{2})}$

for a suitable normalizing constant $K$, with $\lambda$ akin to the regularization hyperparameter.
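A short numerical sketch of the correspondence, under the assumptions above (unit noise variance, Gaussian prior on the coefficients): minimizing the negative log posterior is exactly ridge regression, solvable in closed form.  The data and function name here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, 200)

def map_fit(x, y, lam):
    """Minimize 0.5 * sum((a*x + b - y)^2) + lam * (a^2 + b^2),
    the negative log posterior under the Gaussian prior above."""
    D = np.column_stack([x, np.ones_like(x)])  # design matrix for (a, b)
    # Setting the gradient to zero yields the ridge normal equations.
    return np.linalg.solve(D.T @ D + 2.0 * lam * np.eye(2), D.T @ y)

a_ols, b_ols = map_fit(x, y, 0.0)   # flat prior (lam = 0) recovers OLS
a_map, b_map = map_fit(x, y, 1e4)   # strong prior shrinks the coefficients
```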

3. Where might the CLT run aground?

Answer: Any number of obstacles to invoking the CLT exist, including non-finite variance, unstable variance, lack of independence, and so on.  Specific examples include the ratio of two independent standard normal variables (which is Cauchy), ratios of exponentials, waiting times to exceed, say, the first measurement, and so on.
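A small simulation (illustrative only) contrasts a well-behaved sample mean with the ratio-of-normals case, whose batch means never agree because the ratio is Cauchy-distributed:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Exponential(1): finite mean and variance, so the sample mean settles near 1.
exp_mean = rng.exponential(1.0, n).mean()

# Ratio of independent standard normals is Cauchy: the mean does not exist,
# and means over disjoint batches refuse to agree.
ratio = rng.normal(size=n) / rng.normal(size=n)
batch_means = ratio.reshape(10, -1).mean(axis=1)
```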

4. Can you offer a variance-stabilizing statistic for predicting success probability in a binomial sample?  Provide a $100(1-\alpha)$% confidence interval.

Answer: With the delta method, we can offer the statistic $T=2\arcsin(\sqrt{\overline{X}})$.

By the delta method, we have $\sqrt{n}\left(2\arcsin(\sqrt{\overline{X}})-2\arcsin(\sqrt{p})\right)\to N(0,1)$.

The confidence interval, with work, is $\left[A-B,A+B\right]$,

with $A=\overline{x}\cos\left({{|z_{\alpha/2}|}\over{\sqrt{N}}}\right)+\sin^{2}\left({{|z_{\alpha/2}|}\over{2\sqrt{N}}}\right)$

and $B=\sqrt{\overline{x}}\sqrt{1-\overline{x}}\sin\left({{|z_{\alpha/2}|}\over{\sqrt{N}}}\right)$.

A candidate capable of deriving the aforementioned in an hour-long interview would achieve a near unconditional pass.
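The interval can be checked by simulation; a minimal sketch, assuming $p=0.3$, $N=200$, and $|z_{\alpha/2}|\approx1.96$ (the function name `arcsine_ci` is mine):

```python
import numpy as np

def arcsine_ci(xbar, N, z=1.96):
    """The interval [A - B, A + B] from the closed form above."""
    A = xbar * np.cos(z / np.sqrt(N)) + np.sin(z / (2 * np.sqrt(N))) ** 2
    B = np.sqrt(xbar * (1 - xbar)) * np.sin(z / np.sqrt(N))
    return A - B, A + B

rng = np.random.default_rng(7)
p, N, sims = 0.3, 200, 2000
xbar = rng.binomial(N, p, sims) / N
lo, hi = arcsine_ci(xbar, N)
coverage = np.mean((lo <= p) & (p <= hi))  # should land near 0.95
```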

5. Where does maximum likelihood estimation run into trouble?  Name three problems.

Answer: (1) Peakedness of the likelihood function can cause numerical instability, (2) sometimes the optimal solution falls outside the parameter space, and (3) there may be no global optimum.

A natural follow-up is to ask for examples of each case.  Simple ones are estimating the size of a binomial trial, estimating parameters in subtended distributions, and unidentifiable parameters, respectively.  Answers may vary.
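One concrete toy illustration of case (2), my own example rather than the book’s: estimating the mean of a normal when the parameter space is restricted to $\theta\ge0$.  The unconstrained maximizer is the sample mean, which can be negative, so the MLE lands on the boundary.

```python
import numpy as np

rng = np.random.default_rng(3)
# X ~ N(theta, 1), but suppose the model insists theta >= 0.
data = rng.normal(-2.0, 1.0, 50)

unconstrained = data.mean()    # maximizes the likelihood over all reals
mle = max(0.0, unconstrained)  # the constrained MLE clips to the boundary
```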

6. Consider a ratio of two exponential random variables.  If your boss asked you to approximate its expectation, how would you respond?

Answer: If you got number three above, you already know the answer: the expectation does not exist.  Understanding the nuance is helpful in overcoming the challenges posed by ratio metrics.
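To see the divergence concretely: if $X,Y$ are independent exponential(1), the ratio $R=X/Y$ has density $1/(1+r)^{2}$ on $(0,\infty)$, and the truncated mean $\int_{0}^{M}r/(1+r)^{2}\,dr=\ln(1+M)-M/(1+M)$ grows without bound in $M$.  A quick numerical confirmation:

```python
from scipy.integrate import quad

# Density of R = X/Y for independent X, Y ~ Exp(1).
density = lambda r: 1.0 / (1.0 + r) ** 2
total, _ = quad(density, 0, float("inf"))  # integrates to 1, as a density must

# Truncated means keep growing roughly like log(M): no finite expectation.
trunc_mean = lambda M: quad(lambda r: r * density(r), 0, M)[0]
```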

7. If $X_{1},\dots,X_{N}$ are i.i.d. unif$(0,\theta)$, how would you estimate $\theta$?  Give an estimator and justification.

Answer: This is an excellent opportunity to discuss sufficiency, a satisfying means of describing the information necessary to determine a parameter.  It turns out that the maximum order statistic $X_{(N)}$, distributed $\theta\times B(N,1)$, is sufficient for $\theta$.  Therefore, an unbiased estimator is $Y={{N+1}\over{N}}X_{(N)}$,

with $\text{Var}\,Y/\theta^{2}={{1}\over{N(N+2)}}$.

We can invoke Lehmann–Scheffé to claim our estimator is UMVUE, if we can show completeness, another convenient statistical property we’ll discuss more in the days ahead.  Offering a confidence interval is an interesting follow-up.
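A simulation (a sketch, with $\theta$ and $N$ chosen arbitrarily) confirms both the unbiasedness of $Y$ and the variance $\theta^{2}/(N(N+2))$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, N, sims = 5.0, 10, 200_000

# Maximum order statistic of each sample, rescaled to remove the bias.
x_max = rng.uniform(0.0, theta, (sims, N)).max(axis=1)
Y = (N + 1) / N * x_max

mean_Y = Y.mean()  # ~ theta
var_Y = Y.var()    # ~ theta**2 / (N * (N + 2))
```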

Much of the above comes from insights in Statistical Inference by Casella and Berger.  I’ll be interviewing Roger Berger in a few months for Algo-Stats.  If you’ve made it this far in my article, please reach out to me to chat.  npslagle