06 September 2013

Box, Hunter, and Hunter

I’ve been studying a book called Statistics for Experimenters: an Introduction to Design, Data Analysis, and Model Building, by the intrepid trio of authors Box, Hunter and Hunter (their last names).  Section 2.2 is called “Theory:  Probability Distributions, Parameters, and Statistics.”  That section starts with a subsection called “Experimental Error,” from which this is the first paragraph:


"When an operation or experiment is repeated under what are, as nearly as possible, the same conditions, the observed results are never quite identical.  The fluctuation that occurs from one repetition to another is called noise, experimental variation, experimental error, or merely error.  In a statistical context the word error is used in a technical and emotionally neutral sense.  It refers to variation that is often unavoidable.  It is not associated with blame."

The authors do, however, mention in the next paragraph that experimental error must be distinguished from "careless mistakes,"  which presumably are associated with blame.  The question would then be:  is there also a resulting shame when the careless mistakes are discovered?  Sometimes yes, sometimes no, would be my guess.

This book nicely discusses the difference between the concepts of sample and population.  In discussing the average value associated with a sample, it uses the letter y with a bar above it to represent the average (whereas I used the brackets < >).  Since the average, it says, "tells us where the scatter of points is centered, it is called a measure of location for the sample."  Lovely!

Next the book discusses the population:  "If we imagine a hypothetical population as containing some very large number N of observations, we can denote the corresponding measure of location of the population by the Greek letter η (eta), so that

                                                           η = Σy/N"

These authors use y as the variable that represents samples and populations.  And they have more to say about samples versus populations:



"To distinguish the sample and population quantities we call η the population mean, and 'y-bar' the sample average. In general, a parameter like the mean η is a quantity directly associated with the population, and a statistic like the average 'y-bar' is a quantity calculated from a set of data often thought of as some kind of sample from a population.  Parameters are usually designated by Greek letters; statistics, by Roman letters." 

Population:  a very large set of N observations from which the sample can be imagined to come.  The population mean, η = Σy/N, is considered to be a "parameter."
Sample:  a small group of n observations.  The sample average, y-bar = Σy/n, is considered to be a "statistic."
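
To make the two definitions concrete, here is a minimal Python sketch.  Everything in it is made up for illustration and is not from the book:  the population is N = 100,000 simulated observations, and the names population, sample, eta, and y_bar are my own.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: N made-up observations y (an assumption, not book data)
N = 100_000
population = [random.gauss(50.0, 5.0) for _ in range(N)]

# Parameter: the population mean, eta = (sum of y) / N
eta = sum(population) / N

# Statistic: the sample average, y_bar = (sum of y) / n,
# calculated from a small group of n observations drawn from the population
n = 10
sample = random.sample(population, n)
y_bar = sum(sample) / n

print(f"population mean eta   = {eta:.3f}")
print(f"sample average  y_bar = {y_bar:.3f}")
```

The two numbers come out close but not equal, and drawing a different sample changes y-bar while η stays put.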

Finally, at least as far as this post is concerned, Box, Hunter, and Hunter mention this:
"The mean of the population is also called the expected value of y, or the mathematical expectation of y, and is then denoted by E(y).  Thus  η = E(y)."
Physicists usually use brackets <> for the expected value, especially in quantum mechanics, where it's called the "expectation value."  In my earlier posts, I was thinking that "average" and "mean" are synonymous, but now I see that they aren't:  the mean η belongs to the population, while the average y-bar is calculated from a sample.  I was also thinking that I could use the formula for standard deviation shown in A Serious Man to rewrite the standard deviation formula in the People Aren't Perfect* lab, but I can't, since the former uses the mean and the latter uses the average.  I can, however, see what error arises when I try to use the mean instead of the average, which I plan to do one of these days.
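
As a rough preview of that comparison, here is a small Python sketch.  It is only my guess at the kind of check involved, not either of the formulas mentioned above:  it adds up the squared deviations of a sample once about the sample average and once about an assumed population mean.  The value of eta, the sample size, and all the names are made up.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

eta = 50.0  # assumed population mean (made up for illustration)
n = 5
sample = [random.gauss(eta, 5.0) for _ in range(n)]
y_bar = sum(sample) / n  # sample average

# Sum of squared deviations, measured from each center in turn
ss_about_average = sum((y - y_bar) ** 2 for y in sample)
ss_about_mean = sum((y - eta) ** 2 for y in sample)

# Algebraic identity: the two sums differ by exactly n * (y_bar - eta)^2
gap = n * (y_bar - eta) ** 2

print(f"about the average y_bar: {ss_about_average:.4f}")
print(f"about the mean eta:      {ss_about_mean:.4f}")
print(f"difference:              {ss_about_mean - ss_about_average:.4f}")
print(f"n * (y_bar - eta)^2:     {gap:.4f}")
```

The sum about the average is never larger than the sum about the mean, and the difference is n times the squared distance between y-bar and η.  So the two centers can't simply be swapped inside a standard deviation formula without changing the answer.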

*and neither are machines