16 September 2013

Degrees of Freedom

How many degrees of freedom do you have in your life?  Well, if you can’t answer that, can you say how many constraints you’re subject to?  The number of constraints would tell you how many degrees of freedom have been subtracted from the unconstrained state you started in.  But freedom versus constraint is a difficult thing to get hold of in the realm of human activity, so let’s go to the realm of physics.
 
I first encountered degrees of freedom and constraints in a two-semester junior-level Newtonian mechanics course I took at Hendrix College in the 1978–79 academic year.  Richard Rolleigh was the professor, and a good one.  I was not that good a student, except in the first three months of the course.  The textbook is a pretty good one:  Mechanics, by Keith R. Symon.  Chapter 9 of the book discusses generalized coordinates, Lagrange’s equations, systems subject to constraints, constants of the motion and ignorable coordinates, Hamilton’s equations, and Liouville’s theorem.
 
In section 9.4, Symon says this about degrees of freedom:  “The number of independent ways in which a mechanical system can move without violating any constraints which may be imposed is called the number of degrees of freedom of the system.  For example, a single particle moving in space has three degrees of freedom, but if it is constrained to move along a certain curve, it only has one.  A system of N free particles has 3N degrees of freedom, a rigid body has 6 degrees of freedom (3 translational and 3 rotational), and a rigid body constrained to rotate about an axis has one degree of freedom.”
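
Symon’s counting rule is simple enough to put in a few lines of code.  Here’s a minimal sketch (my own toy illustration, not from the textbook) that treats the number of degrees of freedom as 3N coordinates minus the number of independent constraint equations:

    # Toy degrees-of-freedom counter: 3 coordinates per particle,
    # minus one degree of freedom per independent constraint equation.
    def degrees_of_freedom(n_particles, n_constraints):
        return 3 * n_particles - n_constraints

    print(degrees_of_freedom(1, 0))  # a free particle in space: 3
    print(degrees_of_freedom(1, 2))  # a particle confined to a curve: 1
                                     # (a curve in 3-space is the intersection
                                     # of two surfaces, i.e., two constraints)
    print(degrees_of_freedom(2, 1))  # two particles joined by a rigid rod: 5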
 
 
The concepts of constraints and degrees of freedom are also used in statistics.  Let’s see what professors Box, Hunter, and Hunter (BH&H) have to say on the subject.  First jog your memory and recall that each deviation for a sample is the difference between an observed value y and the sample average ŷ.  (This symbol is ‘y-hat’, which in physics is used to indicate the unit vector in the y direction, but the symbol set on my computer doesn’t seem to have ‘y-bar’, so I’ll use ŷ instead.)  The deviation for a population is the difference between y and the mean η.  That is, in the case of the sample average, the deviation is y − ŷ, and in the case of the population mean, the deviation is y − η.  Add up the deviations for a given sample or a given population and what do you get?  That’s right, zero.
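
This zero-sum property is easy to check numerically.  A quick sketch in Python with made-up numbers (any numbers will do):

    import numpy as np

    # Deviations from the sample average always sum to zero
    # (up to floating-point roundoff), whatever the data are.
    y = np.array([2.3, 4.1, 3.7, 5.0, 2.9])  # made-up observations
    y_hat = y.mean()                          # the sample average (my ŷ)
    deviations = y - y_hat
    print(deviations.sum())                   # essentially 0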
 
Here’s the BH&H paragraph that discusses why n − 1 is used for calculating the sample variance (and hence the sample standard deviation); it defines the variance along the way:
 
“The deviations of n observations from their sample average must sum to zero.  This requirement, that Σ(y − ŷ) = 0, constitutes a linear constraint on the deviations or residuals y₁ − ŷ, y₂ − ŷ, …, yₙ − ŷ used for calculating s² = Σ(y − ŷ)²/(n − 1).  It implies that any n − 1 of them completely determine the other [one].  The n residuals y − ŷ [and hence their sum of squares Σ(y − ŷ)² and the sample variance, Σ(y − ŷ)²/(n − 1)] are therefore said to have n − 1 degrees of freedom.  In this book the number of degrees of freedom is denoted by the Greek letter ν (nu).”
 
All right!  So in the People Aren’t Perfect lab, we have ν = n − 1 = 10 − 1 = 9 degrees of freedom.  BH&H say this:  “The loss of one degree of freedom is associated with the need to replace the unknown population mean η by ŷ, the sample average derived from the data.”
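
BH&H’s point that any n − 1 of the residuals “completely determine the other” is easy to demonstrate.  A sketch with ten made-up observations (standing in for the lab’s actual data, which I won’t reproduce here):

    import numpy as np

    # With n = 10 observations, any 9 residuals determine the 10th,
    # because all 10 residuals must sum to zero.
    y = np.array([9.8, 10.1, 9.9, 10.4, 9.7, 10.0, 10.2, 9.6, 10.3, 10.5])
    residuals = y - y.mean()

    first_nine = residuals[:9]
    tenth_recovered = -first_nine.sum()   # forced by the linear constraint
    print(tenth_recovered, residuals[9])  # the two values agree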
 
Yep, and as you just noticed, there’s this thing called the variance, which the standard deviation comes from by taking the square root.  The population variance is σ² = Σ(y − η)²/N, and its standard deviation is σ = [Σ(y − η)²/N]^(1/2).  The sample variance is s² = Σ(y − ŷ)²/(n − 1), and its standard deviation is s = [Σ(y − ŷ)²/(n − 1)]^(1/2).
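
Both formulas translate directly into code.  A sketch (made-up numbers again) showing the N and n − 1 divisors side by side, checked against NumPy’s built-ins:

    import numpy as np

    y = np.array([2.3, 4.1, 3.7, 5.0, 2.9])  # made-up data

    # Treating the data as an entire population: divide by N.
    sigma2 = ((y - y.mean())**2).sum() / len(y)

    # Treating the data as a sample: divide by n - 1.
    s2 = ((y - y.mean())**2).sum() / (len(y) - 1)

    # NumPy's ddof ("delta degrees of freedom") switches between the two.
    print(sigma2, np.var(y, ddof=0))        # population variance
    print(s2, np.var(y, ddof=1))            # sample variance
    print(np.sqrt(s2), np.std(y, ddof=1))   # sample standard deviation s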

Would you believe this is the same expression as the one at the bottom of page 1 of the People Are Not Perfect* lab handout?  The only difference is the notation.

*Neither are machines!  [But what about equations?]
 

06 September 2013

Box, Hunter, and Hunter

I’ve been studying a book called Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, by the intrepid trio of authors Box, Hunter, and Hunter (their last names).  Section 2.2 is called “Theory: Probability Distributions, Parameters, and Statistics.”  That section starts with a subsection called “Experimental Error,” from which this is the first paragraph:


"When an operation or experiment is repeated under what are, as nearly as possible, the same conditions, the observed results are never quite identical.  The fluctuation that occurs from one repetition to another is called noise, experimental variation, experimental error, or merely error.  In a statistical context the word error is used in a technical and emotionally neutral sense.  It refers to variation that is often unavoidable.  It is not associated with blame."

The authors do, however, mention in the next paragraph that experimental error must be distinguished from "careless mistakes," which presumably are associated with blame.  The question, then, is whether there is also a resulting shame when the careless mistakes are discovered.  Sometimes yes, sometimes no, would be my guess.

This book nicely discusses the difference between the concepts of sample and population.  In discussing the average value associated with a sample, it uses the letter y with a bar above it to represent the average (whereas I used the brackets < >).  Since the average, it says, "tells us where the scatter of points is centered, it is called a measure of location for the sample."  Lovely!

Next the book discusses the population:  "If we imagine a hypothetical population as containing some very large number N of observations, we can denote the corresponding measure of location of the population by the Greek letter η (eta), so that

                                                           η = Σy/N

(end of quote).  These authors use y as the variable that represents the observations in both samples and populations.  And they have more to say about samples versus populations:



"To distinguish the sample and population quantities we call η the population mean, and 'y-bar' the sample average. In general, a parameter like the mean η is a quantity directly associated with the population, and a statistic like the average 'y-bar' is a quantity calculated from a set of data often thought of as some kind of sample from a population.  Parameters are usually designated by Greek letters; statistics, by Roman letters." 

Population:  a very large set of N observations from which the sample can be imagined to come.  The population mean, η = Σy/N, is considered to be a "parameter."
Sample:  a small group of n observations.  The sample average, y-bar = Σy/n, is considered to be a "statistic."
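
The parameter-versus-statistic distinction can be acted out in a small simulation (my own sketch; the normal distribution and the particular numbers are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)

    # A pretend population of N = 1,000,000 observations.
    population = rng.normal(loc=10.0, scale=2.0, size=1_000_000)
    eta = population.sum() / population.size   # the parameter η

    # A small sample of n = 10 drawn from that population.
    sample = rng.choice(population, size=10)
    y_bar = sample.sum() / sample.size         # the statistic y-bar

    print(eta)    # ~10.0, a fixed property of the population
    print(y_bar)  # close to η, but varies from sample to sample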

Finally, at least as far as this post is concerned, Box, Hunter, and Hunter mention this:
"The mean of the population is also called the expected value of y, or the mathematical expectation of y, and is then denoted by E(y).  Thus η = E(y)."
Physicists usually use brackets < > for the expected value, especially in quantum mechanics, where it's called the "expectation value."  In my earlier posts, I was thinking that "average" and "mean" are synonymous, but now I see that they aren't.  I was also thinking that I could use the formula for standard deviation shown in A Serious Man to rewrite the standard deviation formula in the People Aren't Perfect* lab, but I can't, since the former uses the mean and the latter uses the average.  I can, however, see what error arises when I try to use the mean instead of the average, which I plan to do one of these days.
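
The relation η = E(y) can be illustrated with the simplest example I know, a fair die (my example, not BH&H's): the expectation is Σ y·p(y) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5, and sample averages of many rolls converge toward it.

    import numpy as np

    # Mathematical expectation of one roll of a fair die:
    faces = np.arange(1, 7)
    eta = (faces * (1 / 6)).sum()
    print(eta)   # 3.5

    # Sample averages of many simulated rolls approach η = E(y).
    rng = np.random.default_rng(1)
    rolls = rng.integers(1, 7, size=100_000)  # high end exclusive: faces 1-6
    print(rolls.mean())   # ~3.5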

*and neither are machines