27 August 2013

A Serious Man's standard deviation

As you know, an average or mean value for a set of measurements is found by adding up all the measured values and dividing by the number of measurements.  The “People Aren’t Perfect” lab handout shows how it’s done, but it doesn’t show a pretty cool shorthand symbol that can be used instead of writing out the sum.  That symbol is the Greek capital letter sigma Σ. You replace the written-out addition with cap sigma and an index (usually i) that goes from 1 to N, like so:

$$\langle t\rangle = \frac{t_1 + t_2 + t_3 + \cdots + t_N}{N} = \frac{1}{N}\sum_{i=1}^{N} t_i .$$
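If it helps to see the Σ as a step-by-step recipe, here is a minimal Python sketch of the average. The reaction times are made-up numbers (ten of them, since the lab handout uses N = 10, but not the handout's actual data), and `mean` is just a name I picked. I used Python's `Fraction` so the arithmetic stays exact instead of getting rounded off.

```python
from fractions import Fraction

def mean(ts):
    """Average <t>: add up t_1 through t_N, then divide by N."""
    total = 0
    for t in ts:              # the sigma, one term at a time
        total += t
    return total / len(ts)    # divide by N

# Made-up reaction times in seconds (N = 10), for illustration only.
times = [Fraction(s) for s in "1.2 1.5 1.1 1.4 1.3 1.6 1.2 1.4 1.5 1.3".split()]
print(mean(times))            # prints 27/20, i.e. 1.35 s
```

Python's built-in `sum(ts) / len(ts)` does the same thing in one line; the loop is spelled out only so it mirrors the Σ.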

Unlike the lab handout, I'm not going to use a separate symbol for the "deviation," because I'm going to square it and multiply it out term by term. So here it is, unsquared:

Deviation of each value from the average $= t_i - \langle t\rangle$.

Now, if you add up all the deviations you get zero, because the below-the-average deviations are negative numbers, and they cancel the above-the-average deviations.  So you can’t find a useful “average deviation” by adding up the deviations and dividing by N. 
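You can watch the cancellation happen with a quick Python sketch, again on made-up data with exact fractions, so the total really is zero rather than some tiny round-off leftover.

```python
from fractions import Fraction

# Made-up reaction times in seconds (N = 10); Fraction keeps everything exact.
times = [Fraction(s) for s in "1.2 1.5 1.1 1.4 1.3 1.6 1.2 1.4 1.5 1.3".split()]
avg = sum(times) / len(times)                  # <t> = 27/20

deviations = [t - avg for t in times]          # t_i - <t>
below = sum(d for d in deviations if d < 0)    # below-average deviations
above = sum(d for d in deviations if d > 0)    # above-average deviations
print(below, above, sum(deviations))           # prints -13/20 13/20 0
```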

But we still want a way of knowing how the measured values are dispersed around the average value: widely dispersed or narrowly dispersed? That's how the “standard deviation” came to be defined. What's called the sample standard deviation, as used in the lab handout, is found by squaring each deviation, adding up all these squared deviations, dividing by N-1, and then taking the square root. The N-1 comes about because we are considering a "sample" rather than the entire "population." Look it up for yourself if you want; it's kind of interesting. Pollsters, for example, can only reasonably poll a sample of the entire population of voters. The sample's own average sits a little closer to the sample values than the true population average would, so the squared deviations measured from it come out slightly too small on average. Dividing by N-1 instead of N compensates, making the sample variance (the standard deviation squared) an unbiased estimate of the population variance, the value you'd get if you could actually measure everybody.
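Here is that recipe as a minimal Python sketch, assuming made-up data (not the lab's actual measurements). The helper name `sample_std` is my own invention. Python's standard `statistics.stdev` also divides by N-1, so it makes a handy cross-check, and `statistics.pstdev` is the divide-by-N population version.

```python
import math
import statistics

def sample_std(ts):
    """Sample standard deviation: square each deviation, add, divide by N-1, take the root."""
    n = len(ts)
    avg = sum(ts) / n
    sum_sq = sum((t - avg) ** 2 for t in ts)   # sum of (t_i - <t>)^2
    return math.sqrt(sum_sq / (n - 1))         # divide by N-1, then square root

# Made-up reaction times in seconds (N = 10), for illustration only.
times = [1.2, 1.5, 1.1, 1.4, 1.3, 1.6, 1.2, 1.4, 1.5, 1.3]
print(sample_std(times))          # about 0.158
print(statistics.stdev(times))    # standard library, also divides by N-1
print(statistics.pstdev(times))   # divide-by-N "population" version, about 0.150
```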

Here we go with the squaring of the deviation:

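$$\begin{aligned}
(t_i - \langle t\rangle)^2 &= (t_i - \langle t\rangle)(t_i - \langle t\rangle) \\
&= t_i^2 - \langle t\rangle t_i - t_i\langle t\rangle + \langle t\rangle\langle t\rangle \\
&= t_i^2 - 2\langle t\rangle t_i + \langle t\rangle^2 .
\end{aligned}$$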
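If you'd rather test the algebra on numbers, this Python sketch (made-up data, exact fractions) checks that the squared form and the expanded form agree for every $t_i$. It's a spot check on one data set, not a proof; the algebra is the proof.

```python
from fractions import Fraction

# Made-up reaction times in seconds (N = 10); exact fractions, so "equal" means equal.
times = [Fraction(s) for s in "1.2 1.5 1.1 1.4 1.3 1.6 1.2 1.4 1.5 1.3".split()]
avg = sum(times) / len(times)                  # <t>

for t in times:
    squared = (t - avg) ** 2                   # (t_i - <t>)^2
    expanded = t**2 - 2 * avg * t + avg**2     # t_i^2 - 2<t>t_i + <t>^2
    assert squared == expanded, t
print("expansion checks out for all", len(times), "values")
```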

Now, as Professor Larry Gopnick says when describing Schrödinger’s Cat to his class, “You following this? So…okay…this part is exciting.” Remember we need to sum these squared deviations, then divide by N-1, then take the square root, to get the standard deviation. Symbolically, before taking the square root, it looks like this:

$$\frac{\sum_{i=1}^{N}\left(t_i^2 - 2\langle t\rangle t_i + \langle t\rangle^2\right)}{N-1},$$

which is just a different way of writing the expression that appears under the square root symbol in the equation for Δt in the People Aren't Perfect lab discussion. 

Next, distribute the division by (N-1) to each term, like so:

$$\sum_{i=1}^{N}\left[\frac{t_i^2}{N-1} - \frac{2\langle t\rangle t_i}{N-1} + \frac{\langle t\rangle^2}{N-1}\right],$$

then do the same with the summation operator Σ

$$\sum_{i=1}^{N}\frac{t_i^2}{N-1} - \sum_{i=1}^{N}\frac{2\langle t\rangle t_i}{N-1} + \sum_{i=1}^{N}\frac{\langle t\rangle^2}{N-1}.$$

And now I've gotten myself in trouble, but like a fool I will keep going. Remember $\langle t\rangle$ is the average, so $\langle t\rangle^2$ is just the average squared. Both of these (and the factor of 2) are just numbers that don't change as i runs from 1 to N, so the summation symbol doesn't have any effect on them, and they can be taken outside it. The (N-1) factor could be taken out also, but I want to keep it inside the summation. What I've got now is

$$\sum_{i=1}^{N}\frac{t_i^2}{N-1} - 2\langle t\rangle\sum_{i=1}^{N}\frac{t_i}{N-1} + \langle t\rangle^2\sum_{i=1}^{N}\frac{1}{N-1}.$$
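Taking $2\langle t\rangle$ and $\langle t\rangle^2$ outside the sums shouldn't change the value, and a short Python sketch on made-up data can confirm it: the original sum of squared deviations over N-1 and the three-term version come out identical.

```python
from fractions import Fraction

# Made-up reaction times in seconds (N = 10); exact arithmetic throughout.
times = [Fraction(s) for s in "1.2 1.5 1.1 1.4 1.3 1.6 1.2 1.4 1.5 1.3".split()]
N = len(times)
avg = sum(times) / N                                     # <t>

# Starting point: sum of (t_i - <t>)^2, divided by N-1
direct = sum((t - avg) ** 2 for t in times) / (N - 1)

# After pulling the constants 2<t> and <t>^2 outside the sums
term1 = sum(t**2 / (N - 1) for t in times)               # sum of t_i^2/(N-1)
term2 = 2 * avg * sum(t / (N - 1) for t in times)        # 2<t> times sum of t_i/(N-1)
term3 = avg**2 * sum(Fraction(1, N - 1) for _ in times)  # <t>^2 times sum of 1/(N-1)

print(direct, term1 - term2 + term3)                     # prints 1/40 1/40
```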

The trouble arises because I need N and not N-1 in each of these terms.  What I'm gonna do is use N, then see what error or difference arises in comparison with the use of N-1.  Y'all know if N is a large number, N and N-1 are not that much different. But N=10, as in the lab handout we'll be getting back to soon, is not a large number.  Hmmm.  Anyway, using N, we have:
 

$$\sum_{i=1}^{N}\frac{t_i^2}{N} - 2\langle t\rangle\sum_{i=1}^{N}\frac{t_i}{N} + \langle t\rangle^2\sum_{i=1}^{N}\frac{1}{N}$$

Recall from way up above that the average is $\langle t\rangle = \sum_{i=1}^{N} t_i/N$ (now you can see why I need N, not N-1), so what we have in the middle term is just $2\langle t\rangle\langle t\rangle = 2\langle t\rangle^2$. And the first term is the average not of $t$ but of $t^2$, written $\langle t^2\rangle$. See how exciting this is? We now have

$$\langle t^2\rangle - 2\langle t\rangle^2 + \langle t\rangle^2\sum_{i=1}^{N}\frac{1}{N},$$

which is just what I wanted except for the "sum of one-over-N" part.  We used the summation symbol in the other terms and we have to use it here, too, even though it seems there's nothing to sum up.  The operation looks like this
 
$$\sum_{i=1}^{N}\frac{1}{N} = \frac{1}{N}\sum_{i=1}^{N} 1,$$
 
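which is fine, because 1/N doesn't change with i, so just like $\langle t\rangle$ earlier it can be taken outside the summation symbol. And the remaining sum just counts the terms: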
 
$$\sum_{i=1}^{N} 1 = \underbrace{1 + 1 + 1 + \cdots + 1}_{N\ \text{terms}} = N.$$
 
So

$$\sum_{i=1}^{N}\frac{1}{N} = \frac{1}{N}\sum_{i=1}^{N} 1 = \frac{1}{N}(1 + 1 + 1 + \cdots + 1) = \frac{1}{N}\,N = 1.$$
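The $\sum 1/N = 1$ step is easy to check by brute force. This Python sketch uses exact fractions on purpose: with ordinary floating-point numbers, adding 0.1 to a running total ten times gives 0.9999999999999999 instead of 1, which would muddy the point.

```python
from fractions import Fraction

for N in (1, 2, 10, 1000):
    # Sum of 1/N for i = 1..N: N copies of 1/N added up
    brute_force = sum(Fraction(1, N) for i in range(1, N + 1))
    # (1/N) times the sum of 1 = (1/N)(1 + 1 + ... + 1) = (1/N) N
    factored = Fraction(1, N) * sum(1 for i in range(1, N + 1))
    assert brute_force == factored == 1
print("sum of 1/N over i = 1..N equals 1 for every N tried")
```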
 
You following this?  We have
 
$$\langle t^2\rangle - 2\langle t\rangle^2 + \langle t\rangle^2 = \langle t^2\rangle - \langle t\rangle^2$$
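Finally, a Python sketch (made-up data, exact fractions) comparing $\langle t^2\rangle - \langle t\rangle^2$ with two functions from Python's standard `statistics` module: `pvariance`, which divides by N, and `variance`, which divides by N-1. It confirms that the shortcut is the divide-by-N version and that the N-1 version is bigger by exactly N/(N-1).

```python
import math
import statistics
from fractions import Fraction

# Made-up reaction times in seconds (N = 10); exact fractions throughout.
times = [Fraction(s) for s in "1.2 1.5 1.1 1.4 1.3 1.6 1.2 1.4 1.5 1.3".split()]
N = len(times)

mean_t = sum(times) / N                      # <t>
mean_t2 = sum(t**2 for t in times) / N       # <t^2>
shortcut = mean_t2 - mean_t**2               # <t^2> - <t>^2

# pvariance divides by N: it matches the shortcut exactly.
assert shortcut == statistics.pvariance(times)
# variance divides by N-1: it is bigger by exactly N/(N-1).
assert statistics.variance(times) == shortcut * N / (N - 1)

print(shortcut, statistics.variance(times))  # prints 9/400 1/40
print(math.sqrt(N / (N - 1)))                # ratio of the standard deviations, about 1.054
```

For N = 10 the last line prints about 1.054, so the N-1 standard deviation is roughly 5% larger than the divide-by-N one, which is the "error or difference" that comes from switching to N.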


 
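Put this under the square root symbol, and you have the desired standard deviation expression, with one catch: because I switched from N-1 to N, it's the divide-by-N version, and the sample standard deviation with N-1 is larger by a factor of $(N/(N-1))^{1/2}$, about 1.054 (roughly 5%) for N = 10. In the movie, this is the standard deviation of the momentum, p, as written on the board behind Larry while he's talking to Sy in the classroom dream scene in A Serious Man. I will use $(\ldots)^{1/2}$ instead of the usual radical symbol to show the square root: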
$$\Delta p = \left(\langle p^2\rangle - \langle p\rangle^2\right)^{1/2}$$
 
 
and we're done.  Except you need to watch the movie again, eh?  You missed so much the first time!