Sample mean and variance are estimators of the population mean and variance.
Sample Mean
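$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
- Note: this is a function of $n$ i.i.d. random variables $X_1, \dots, X_n$, so $\bar{X}_n$ is itself a random variable.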
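A minimal sketch of the sample mean in Python. It assumes Exponential(1) data purely for illustration (population mean 1). `sample_mean` is a hypothetical helper, and `statistics.fmean` is the standard-library function that computes the same quantity. Drawing a second independent sample shows that $\bar{X}_n$ changes from sample to sample, i.e. it is a random variable.

```python
import random
import statistics

def sample_mean(xs):
    """Sample mean: (1/n) * sum of the observations (hypothetical helper)."""
    return sum(xs) / len(xs)

# Assumption for illustration: n i.i.d. Exponential(rate=1) draws, population mean 1.
random.seed(0)
n = 1000
sample = [random.expovariate(1.0) for _ in range(n)]

xbar = sample_mean(sample)
print(xbar)

# statistics.fmean computes the same quantity (up to floating-point rounding).
print(abs(xbar - statistics.fmean(sample)) < 1e-12)

# A second, independent sample gives a different value of the sample mean.
other = [random.expovariate(1.0) for _ in range(n)]
print(sample_mean(other))
```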
Uniform connection?
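To me, $\bar{X}_n$ looks awfully similar to the expectation of a random variable $U$ that is uniformly distributed on the observed values $x_1, \dots, x_n$: $E[U] = \sum_{i=1}^{n} x_i \cdot \frac{1}{n}$.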
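A small worked example with made-up numbers makes the resemblance concrete: take the sample $x = (2, 3, 7)$ and let $U$ be uniform on $\{2, 3, 7\}$.

```latex
E[U] = 2 \cdot \tfrac{1}{3} + 3 \cdot \tfrac{1}{3} + 7 \cdot \tfrac{1}{3}
     = \tfrac{12}{3} = 4 = \tfrac{2 + 3 + 7}{3} = \bar{x}_3
```

Each observation gets weight $1/n$, so the expectation of $U$ and the sample mean are literally the same sum.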
WHY?
Informally:
If you do not assume any model for the data, the best you can do is place equal mass on each observed data point.
No preference among sample points
- The sample is the entire support of a “best guess” distribution
- All sample data points are equally valid draws
Nonparametric Maximum Likelihood
- In a purely nonparametric framework (i.e., we do not assume the data come from a Normal, Exponential, etc.), the simplest discrete distribution that places mass only on the observed data, and does so with maximum likelihood, is the one that puts mass $1/n$ on each of the $n$ observations.
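- Concretely, suppose the $n$ observations are distinct and give them probabilities $p_1, \dots, p_n$ with $p_i \ge 0$ and $\sum_{i} p_i = 1$. The likelihood of the sample is $L(p) = \prod_{i=1}^{n} p_i$. Maximizing $\log L(p) = \sum_{i} \log p_i$ subject to $\sum_{i} p_i = 1$ (e.g., with a Lagrange multiplier, or by the AM-GM inequality) gives $p_i = 1/n$ for all $i$. That is another justification for why the empirical distribution is uniform on the sample points.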
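A quick numerical sanity check, a sketch rather than a proof: treat $n$ distinct observations as the support, score each candidate probability vector $p$ by the log-likelihood $\sum_i \log p_i$, and compare many random candidates against the uniform vector. The helper names are hypothetical; random candidates are drawn uniformly from the probability simplex by normalizing Exponential(1) draws.

```python
import math
import random

def log_likelihood(p):
    """Log-likelihood of observing each of the n support points once: sum_i log p_i."""
    return sum(math.log(pi) for pi in p)

def random_probability_vector(n, rng):
    """Uniform random point on the simplex: normalized Exponential(1) draws (Dirichlet(1, ..., 1))."""
    w = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(w)
    return [wi / total for wi in w]

n = 5
rng = random.Random(42)
uniform = [1.0 / n] * n
best_uniform = log_likelihood(uniform)  # equals n * log(1/n) = -n log n

# Compare against many random candidates; none should beat the uniform weights 1/n.
candidates = [random_probability_vector(n, rng) for _ in range(10_000)]
best_random = max(log_likelihood(p) for p in candidates)
print(best_uniform, best_random)
print(best_random <= best_uniform)
```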
The empirical distribution
In frequentist methods, especially in nonparametric statistics, we often do not assume any specific distribution family. Instead, the data “speak for themselves.”
The empirical distribution is the discrete distribution that places equal probability $1/n$ on each of the observed points $x_1, \dots, x_n$. Formally, it is described by its CDF (cumulative distribution function)
$$\hat{F}_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{x_i \le x\},$$
where $\mathbf{1}\{x_i \le x\}$ is an indicator function that is $1$ if $x_i \le x$ and $0$ otherwise. This is the empirical CDF: at any point $x$, $\hat{F}_n(x)$ is simply the fraction of observed data points that are $\le x$.
“What proportion of the sample is $\le x$?”
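The empirical CDF can be sketched in a few lines of Python. `ecdf` is a hypothetical helper; it sorts the data once and uses the standard-library `bisect_right` to count how many observations are $\le x$. The example data (made up) contain a repeated value, so $\hat{F}_n$ jumps by $2/n$ there.

```python
from bisect import bisect_right

def ecdf(data):
    """Return F_hat with F_hat(x) = (1/n) * #{i : x_i <= x} (hypothetical helper)."""
    xs = sorted(data)
    n = len(xs)

    def F_hat(x):
        # bisect_right returns the number of sorted observations that are <= x.
        return bisect_right(xs, x) / n

    return F_hat

F_hat = ecdf([3.1, 0.4, 2.2, 2.2, 5.0])
print(F_hat(0.0))   # 0.0  (no observations <= 0.0)
print(F_hat(2.2))   # 0.6  (0.4, 2.2, 2.2)
print(F_hat(10.0))  # 1.0  (all observations)
```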
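- As $n \to \infty$, the empirical CDF $\hat{F}_n$ converges uniformly to the true CDF $F$ (assuming the $x_i$ are i.i.d. draws from $F$) by the Glivenko–Cantelli theorem, which says
$$\sup_{x \in \mathbb{R}} \left|\hat{F}_n(x) - F(x)\right| \xrightarrow{\text{a.s.}} 0.$$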
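A small simulation illustrating the Glivenko–Cantelli statement, assuming Uniform(0, 1) data so that $F(x) = x$. The hypothetical helper `sup_distance_uniform` computes $\sup_x |\hat{F}_n(x) - F(x)|$ exactly: for a continuous $F$ the supremum is reached at an order statistic $x_{(i)}$, just before or at a jump, so it suffices to check $i/n - x_{(i)}$ and $x_{(i)} - (i-1)/n$.

```python
import random

def sup_distance_uniform(sample):
    """sup_x |F_hat_n(x) - x| for the Uniform(0, 1) CDF (hypothetical helper).

    Checks both sides of each jump of the empirical CDF at the order statistics.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(
        max(i / n - x, x - (i - 1) / n)
        for i, x in enumerate(xs, start=1)
    )

rng = random.Random(1)
for n in (10, 100, 1_000, 10_000, 100_000):
    sample = [rng.random() for _ in range(n)]
    print(n, round(sup_distance_uniform(sample), 4))
```

The printed distances should shrink as $n$ grows, roughly on the order of $1/\sqrt{n}$.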
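Equivalently, if you treat the data points as distinct mass points, you can define a random variable $X^*$ (distributed according to the empirical distribution) such that:
- $X^*$ is equally likely to be any of the observed data points: $P(X^* = x_i) = \frac{1}{n}$ for $i = 1, \dots, n$. Its mean is therefore $E[X^*] = \sum_{i=1}^{n} x_i \cdot \frac{1}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$, which is exactly the observed sample mean.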
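A sketch of $X^*$ in Python, using made-up data. The exact expectation under the empirical distribution is computed with `fractions.Fraction` (mass $1/n$ on each point) and compared with the sample mean. Drawing from the empirical distribution amounts to picking one of the observations uniformly at random, here with the standard-library `random.Random.choice`, so the average of many draws should land close to that mean.

```python
import random
from fractions import Fraction

data = [4, 8, 15, 16, 23, 42]  # example observations (made up)
n = len(data)

# Exact expectation of X*: each observed x_i carries mass 1/n.
expected = sum(Fraction(x) * Fraction(1, n) for x in data)
print(expected == Fraction(sum(data), n))  # same as the sample mean

# Each draw of X* picks one observation uniformly at random (independently across draws).
rng = random.Random(7)
draws = [rng.choice(data) for _ in range(200_000)]
print(float(expected), sum(draws) / len(draws))
```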