Much of what we do in statistics requires a deeper understanding than running a package in R or Python, though those skills can’t hurt. Testing for statistical literacy can be a bit tricky, as scientists often fall into one of two camps: statistics is solved and thus not worth cultivating as a skill, or it’s completely opaque and perhaps uninteresting.
Conditioning on my own preferential treatment of statistics, I’d wager very few data scientists could answer the following questions. We’ll defer providing sources to avoid giving away the answers. If you’re interested in playing along, resist the temptation to search for answers online. Think about how you would approach each of these with nothing other than pencil and paper (if those archaisms still exist).
- Stirling’s formula holds that $n! \sim \sqrt{2\pi n}\,(n/e)^n$ as $n \to \infty$, a result with broad utility in numerical recipes (the gamma function and concentration inequalities) and complexity (the notion of log-linear growth). It can follow directly from the central limit theorem. How?
- Can you think of how regularization and prior distributions are connected?
- Where might the CLT run aground?
- Can you offer a variance-stabilizing statistic for predicting success probability in a binomial sample? Provide a $100(1-\alpha)\%$ confidence interval.
- Where does maximum likelihood estimation run into trouble? Name three problems.
- Consider a ratio of two exponential random variables. If your boss asked you to approximate its expectation, how would you answer?
- If $X_1, \dots, X_n$ are i.i.d. unif($0, \theta$), how would you estimate $\theta$? Give an estimator and justification.
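If you want to convince yourself that the Stirling question is worth the effort, a quick numerical check may help. The sketch below only compares $n!$ with $\sqrt{2\pi n}\,(n/e)^n$ for a few values of $n$; it says nothing about the central limit theorem argument, which remains the exercise. The function names `stirling` and `ratio` are illustrative choices of mine, not taken from any source.

```python
import math


def stirling(n: int) -> float:
    """Stirling's approximation to n!: sqrt(2*pi*n) * (n/e)**n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n


def ratio(n: int) -> float:
    """Exact factorial divided by the approximation; should approach 1."""
    return math.factorial(n) / stirling(n)


if __name__ == "__main__":
    # Print the ratio for growing n to see how quickly it settles near 1.
    for n in (1, 5, 10, 50, 100):
        print(f"n = {n:>3}: n! / stirling(n) = {ratio(n):.6f}")
```

The ratio sits slightly above 1 and shrinks toward it as $n$ grows, which is what the $\sim$ in the formula asserts.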
Happy Tuesday! Answers and more to follow soon.