29 October 2013

Who's afraid of the standard deviation?

The lab report in my previous post doesn’t show any actual standard deviation calculations. So I recently followed the instructions of the “People Are Not Perfect” lab handout (after nearly forty years) and timed a metronome.  I used a digital stopwatch that measures to the nearest hundredth of a second to time the metronome on my Yamaha electronic keyboard through four beats.  I repeated the measurement ten times.  These are the results, with the units being seconds:

20.52
20.41
20.49
20.36
20.18
20.49
20.29
20.49
20.46
20.47

Here is the average of these measurements:

<t> = (t1 + t2 + t3 + … + tN )/N  = Σ ti/N =  (204.16)/10 = 20.416 seconds.


The deviation of each value from the average is ti - <t>.  Here they all are:

0.104
-0.006
0.074
-0.056
-0.236
0.074
-0.126
0.074
0.044
0.054
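
For anyone who would rather let a computer do the arithmetic, here is a minimal Python sketch of the two steps so far, the average and the deviations. The variable names (times, mean_t, deviations) are only illustrative, and the rounding to three decimals is there just to hide floating-point noise.

```python
# Ten stopwatch readings from the list above, in seconds.
times = [20.52, 20.41, 20.49, 20.36, 20.18,
         20.49, 20.29, 20.49, 20.46, 20.47]

n = len(times)
mean_t = sum(times) / n  # <t>, the average

# Deviation of each reading from the average: t_i - <t>.
deviations = [round(t - mean_t, 3) for t in times]

print(f"<t> = {mean_t:.3f} s")
print(deviations)
```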

If you add up all the deviations you get zero (try it), because the below-the-average deviations are negative numbers, and they cancel the above-the-average deviations.  So, as you know by now, you can’t find a useful “average deviation” by adding up the deviations and dividing by N.  Instead, you square each deviation (the squares are all positive, see?), then add up all these squared deviations and divide by N. Well, for a set of data (a “sample” as opposed to a “population”) we divide by N-1.  I wrote about the reason for the use of N-1 instead of N in my post called “Degrees of freedom.”  Finally, we take the square root and have the standard deviation.  (In other situations, this process of squaring, finding the mean, then taking the square root goes by the name of “root mean square,” or RMS, which I’ll come back to later, if the need arises.) The deviations squared are:

0.010816
0.000036
0.005476
0.003136
0.055696
0.005476
0.015876
0.005476
0.001936
0.002916

The sum of these is 0.106840.  Lower case sigma is a common symbol for the standard deviation, so we’ll call the standard deviation in time “little-sigma-sub-t.”

σt = sqrt(0.106840/9) = sqrt(0.011871) = 0.10895 seconds,

or, rounding it to the fraction of a second shown on the stopwatch (one hundredth of a second), the sample standard deviation is 0.11 sec.

Well, I know this isn’t the exciting part, sorry.  It just has to be done for some reason I can’t explain.  I hope to get to a more exciting part soon.

If we use N = 10 instead of N-1 = 9, what do we get?

sqrt(0.106840/10) = sqrt(0.010684) = 0.10336 seconds,

or, if we round it to hundredths, 0.10 sec.
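
Python's standard library happens to have both versions built in, which makes a handy cross-check on the hand arithmetic: statistics.stdev divides by N-1 (the sample version) and statistics.pstdev divides by N (the population version). This is only a sketch with illustrative variable names, and the results are rounded to hundredths to match the stopwatch.

```python
import statistics

# Ten stopwatch readings, in seconds.
times = [20.52, 20.41, 20.49, 20.36, 20.18,
         20.49, 20.29, 20.49, 20.46, 20.47]

sample_sd = statistics.stdev(times)       # divides by N-1
population_sd = statistics.pstdev(times)  # divides by N

print(f"sample (N-1):   {sample_sd:.2f} s")      # 0.11 s
print(f"population (N): {population_sd:.2f} s")  # 0.10 s
```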

Now, what about that formula from A Serious Man? I said earlier I wanted to compare it with the UALR lab’s formula, but I later figured out it represents the population or theoretical standard deviation, whereas the lab uses the sample standard deviation. So now I just want to compare it to the "divide by N=10" case and see if they are equal.  Here it is:

σt = (<t^2> - <t>^2)^(1/2)          (population or theoretical std. dev.)

Recall that <t^2> is the average of t-squared, while <t>^2 is the square of the average of t.  You would or could read the former as "tee-squared bracket" and the latter as "bracket-tee squared."

(Compare how, in the movie, Larry writes these and reads them off the board, where it's "p" instead of "t":  he writes both as <p>^2 and says "bracket pee squared minus bracket pee squared,"  so he has written something minus itself, which is identically zero, which is another case of the Coen brothers messing with our sense of reality.  This time they mess with our math anxiety, too.)

What the formula for σt says is “subtract the square of the average of t from the average of the square of t and take the square root.”  Putting in the numbers gives
… oh, no, I haven’t yet squared all the ti’s in order to find <t^2>.  Later!
Okay, now it’s later.  Here's how <t^2> is found. I squared each measurement and got:

421.0704
416.5681
419.8401
414.5296
407.2324
419.8401
411.6841
419.8401
418.6116
419.0209

We want the average of these.  The sum is 4168.237, and dividing by 10 gives:

<t^2> = 416.8237 = "the average of the square of t."

At the beginning, above, I found the average of the t measurements to be 20.416.  So the square of the average is

<t>^2 = (20.416)^2 = 416.813056.

And now…

σt = (<t^2> - <t>^2)^(1/2) = (416.8237 - 416.813056)^(1/2)

   = (0.010644)^(1/2)

   = 0.1032

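When rounded to the nearest hundredth of a second, this is equal to the 0.10 sec from above.  Algebraically the two formulas are the same thing, so with exact arithmetic they have to agree digit for digit, and any leftover disagreement is arithmetic.  One source is easy to spot: I rounded <t^2> to four decimal places before subtracting two nearly equal numbers.  Keeping one more digit, <t^2> = 416.82374, makes the difference 0.010684 and the square root 0.1034.  To measure a gap like that, we can do a percent difference: take the difference of the two, divide it by the average of the two, then multiply by 100.  For 0.1034 and 0.1032 that is [0.0002/0.1033] x 100 = 0.19%.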
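
The percent difference recipe fits in a small helper function. The Python sketch below, with made-up names like percent_difference and bracket_sd, also runs the bracket formula twice: once with the average of t-squared rounded to four decimal places, the way the hand calculation did it, and once with every digit the computer carries. It is one way of seeing how touchy the subtraction of two nearly equal numbers can be.

```python
import math

# Ten stopwatch readings, in seconds.
times = [20.52, 20.41, 20.49, 20.36, 20.18,
         20.49, 20.29, 20.49, 20.46, 20.47]

def percent_difference(a, b):
    """Difference of two values divided by their average, times 100."""
    return abs(a - b) / ((a + b) / 2) * 100

def bracket_sd(mean_of_squares, mean):
    """Population standard deviation: sqrt(<t^2> - <t>^2)."""
    return math.sqrt(mean_of_squares - mean ** 2)

n = len(times)
mean_t = sum(times) / n                  # <t>
mean_sq = sum(t * t for t in times) / n  # <t^2>

sd_rounded = bracket_sd(round(mean_sq, 4), mean_t)  # <t^2> cut to 4 decimals
sd_full = bracket_sd(mean_sq, mean_t)               # full floating-point precision

print(f"rounded <t^2>: {sd_rounded:.4f} s")
print(f"full <t^2>:    {sd_full:.4f} s")
print(f"percent difference: {percent_difference(sd_rounded, sd_full):.2f}%")
```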
 
Now finally I will compare the 0.10 sec result, the "wrong" way of doing this data-based standard deviation, with our calculated sample standard deviation, which rounds off to 0.11 sec.  The percent difference is [0.01/(0.21/2)] x 100 = 9.52%, or about 10%, as maybe you could tell by simple comparison of 0.11 and 0.10?  In a way, this is comparing apples and oranges, because the "N" associated with a population isn't going to be the "N" used in the N-1 associated with the sample.  In the Box, Hunter and Hunter text, this is made clear by their use of n for the number of sample values collected and N for the population number.
 
So that's it, friends, you can draw your own  conclusions.  If you conclude that I don't work with actual numbers very often, you'd be right. 
Next time I'm going to use this 0.11 sec standard deviation to recalculate the numbers for the Cars and Speed Limits lab,  and also convert the speeds from meters per second to miles per hour. The conversion factor to change from m/s to mph is 2.24, so as one quick example from my 1974 lab report, the car measured to have a speed of 18.2 m/s would have been traveling at 40.8 miles an hour.
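
The conversion is a one-liner in code. This Python sketch uses the rounded factor of 2.24 quoted above; the constant and function names are just illustrative.

```python
MPH_PER_MPS = 2.24  # rounded conversion factor quoted in the text

def mps_to_mph(speed_mps):
    """Convert a speed in meters per second to miles per hour."""
    return speed_mps * MPH_PER_MPS

print(f"{mps_to_mph(18.2):.1f} mph")  # 40.8 mph
```

With the more precise factor 1/0.44704, or about 2.237, the same car comes out at 40.7 mph, so the last digit depends on how the factor is rounded.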
A final thought:  these speeds are measurements of average speed over 100 meters rather than measurements of instantaneous speed. (Your speedometer tells you what your instantaneous speed is.)  What if you were speeding down University Avenue in Little Rock on a certain day in early June 1974 and you saw some kids along the side of the road ahead of you, apparently college students, taking some kind of measurements?  You'd likely slow down.  You might even still be slowing down in the 100 meter interval over which they were timing you, and could even have been going, say, 55 mph at the beginning of the 100 meters and 25 mph at the end, to take a very extreme possibility.  If you slowed at a steady rate, your average speed in that case is (55 + 25)/2 = 40 mph even though you were speeding at first.  (Yeh, the speed limit on that four-lane boulevard is 40 mph, and yes, this average found by a different method should equal the measured average.  But from now on I'm going to be more wary of things that should turn out to be equal...)
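
That "should equal" can be checked with a little kinematics, under the assumption that the car slows at a constant rate across the 100 meters (an assumption of this sketch, not something the lab measured). The Python below finds the time the car would take to cover the interval and divides distance by time, which is how students with a stopwatch would get their average speed. The function name and the use of the exact constant 0.44704 m/s per mph are illustrative choices.

```python
MPS_PER_MPH = 0.44704  # exact: 1 mile = 1609.344 m, 1 hour = 3600 s

def measured_average_mph(v_start_mph, v_end_mph, distance_m=100.0):
    """Average speed (distance / time) for a car that changes speed at a
    constant rate over distance_m. Assumes the two speeds differ."""
    v0 = v_start_mph * MPS_PER_MPH
    v1 = v_end_mph * MPS_PER_MPH
    accel = (v1 ** 2 - v0 ** 2) / (2 * distance_m)  # from v1^2 = v0^2 + 2ad
    elapsed = (v1 - v0) / accel                      # from v1 = v0 + at
    return (distance_m / elapsed) / MPS_PER_MPH

print(f"{measured_average_mph(55, 25):.1f} mph")  # 40.0 mph
```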