Stats Made Easy

Practical Tools for Effective Experimentation

Sunday, May 25, 2008

Advice from famous physicist Feynman: “You must not fool yourself”

My bookseller friend Rich emailed recently about a find he made:
>In your physics days, did you ever encounter the famous Feynman Lectures in book form (three volume set)? Feynman was a respected renegade, if there IS such a thing. But he was good enough to be appointed to the Challenger's explosion evaluation. Interestingly, before the SSTs flew, he predicted a 2% failure rate -- and he's been right on.<

That reminded me of one of my favorite quotes by this renowned physicist, which I bring up when discussing the dangers of deleting experimental outliers:
“The first principle is that you must not fool yourself -- and you are the easiest person to fool.”
It comes from Feynman’s talk on Cargo Cult Science.

PS. While searching the internet on the topic of bias, I came across an interesting website that provides “an outlet for experiments that do not reach the traditional significance levels (p < .05)”! It’s called the Journal of Articles in Support of the Null Hypothesis. The journal’s purpose is to reduce the ‘file drawer problem,’ that is, the tendency for unpublished results to get buried in researchers' file cabinets.*

*To learn about this propensity to publish only the positive, read this study by Jeffrey D. Scargle of the Space Science Division, National Aeronautics and Space Administration, Ames Research Center.

Tuesday, May 20, 2008

Duck named DOE (pronounced "dewie")

I now have a duck as a roommate in my office at home. My daughter's biology class hatched a bunch* of them and she brought Dewie Pablo Abnot Decker (aka DPAD) home. Abnot is my name that is derived from "You will not get a duck -- absolutely not!" However, DPAD cannot be resisted, so I've taken him (her?) under my wing, so to speak. I view it as an unplanned experiment -- once the fowl flies into your pond, it's best to just go with the flow and make the best of it.

PS. Today my little shrub put out a lovely red flower. That has never happened before. I think the duck did it. I had to move the pot from its corner to a sunnier spot in order to make room for the cage. So this is another unplanned experiment. I like it!

*According to Shannon McKinnon of the Alberta Red Deer Advocate there are six ways to describe a bunch of ducks -- a flock of ducks, a brace of ducks, a flush of ducks (we are not allowing our duck into the bathroom!), a paddling of ducks, a raft of ducks (could be a bit unstable for boating) and a team of ducks (that would be the one in Anaheim).

Monday, May 19, 2008

What would Deming say about the demise of testing in education?

Earlier this month I was listening to a local talk radio show when a caller provided this “end of the world” development: To be more kindly and gentle to students stressed out by math tests, teachers now refer to these as “celebrations of knowledge”!

At the time I heard of this outrageous slackening of educational standards, I was in the midst of reading a book that my buddy Rich sent me, titled “The World of W. Edwards Deming.” (If you are a bibliophile like me, check out his eclectic mix of uncommon books offered for sale via eBay.) My entry into the world of quality assurance was catalyzed by the electrifying documentary “If Japan can... Why can't we?” by NBC in 1980 featuring Deming and his use of statistical methods.

One of my favorite stories in the book on Deming, which was written by his long-time personal assistant Cecelia S. Kilian, involves another pioneer in the field of statistics, a fellow named Harold Dodge. Deming worked with Dodge during WWII to develop statistical standards on a wartime emergency basis. Over a decade before that, Deming had an internship with Bell Telephone Laboratories – Dodge’s employer. During these times the statisticians working under Dodge played a neat trick on him as he tried to get a feeling for the cord length on a newly-developed handset: They clipped off a millimeter or two every day. Deming recalls seeing Dodge stoop to an astoundingly uncomfortable level to take a phone call. Evidently it’s not hard to fool an absent-minded statistical genius!

Getting back to Deming’s views on education, I really do wonder how he would feel about the practice of testing as an incentive for students to develop a profound knowledge of their subject. In his book “Out of the Crisis” (1982, MIT) he said “I have seen a teacher hold a hundred fifty students spellbound, teaching what is wrong.” He credits Sir Ronald Fisher as his inspiration for learning statistics, despite Fisher being a “poor teacher on every count”! Deming made no secret of his dislike for grading, rating, and testing in industrial settings. Therefore I suppose he would approve of the new, more positive approach of celebrating knowledge, rather than making students take final exams.

“When teachers are forced to teach to the test, students get bored and genuine education ceases, no matter what the test scores may say… The examination as a test of the past is of no value for increased learning ability. Like all external motivators, it can produce a short term effect, but examinations for the purpose of grading the past do not hook a student on learning for life.”
-- Myron Tribus (from his essay Quality in Education According to the Teachings of Deming and Feuerstein )

Monday, May 05, 2008

Baseball batting averages throw some curves at statisticians

“I had many years that I was not so successful as a ballplayer, as it is a game of skill.”
-- Casey Stengel (from testimony before United States Senate Anti-Trust and Monopoly Hearing, 1958)

Last week the University of Minnesota School of Statistics sponsored a talk titled “In-season prediction of batting averages: A field test of empirical Bayes and Bayes methodologies,” presented by Lawrence D. Brown from the Statistics Department of Wharton School at the University of Pennsylvania. My colleague Pat Whitcomb attended and told me about a few findings by Brown that baseball fanatics like me would find a bit surprising:
“The simplest prediction method that uses the first half [of a season’s] batting average …performs worse than simply ignoring individual performances and using the overall mean of batting averages as the predictor for all players.”*

Evidently these professional players perform at such a consistent level that the ones hitting at a higher than average rate up until the mid-season break tend to regress back to the mean the rest of the way, and vice-versa.
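For the numerically curious, here is a rough simulation sketch of that regression effect. It is not Brown's method or data: the league mean of .260, the spread of true talent (0.020), the 250 at-bats per half season, and all the function names are my own illustrative assumptions. It simply compares two ways of predicting second-half averages: each player's own first-half average versus the overall mean for everybody.

```python
import random

def simulate_season(num_players=1000, at_bats_per_half=250,
                    league_mean=0.260, talent_sd=0.020, seed=1):
    """Simulate first- and second-half batting averages.

    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    first_half, second_half = [], []
    for _ in range(num_players):
        # Each player's "true" hitting rate, clipped to a valid probability.
        true_ba = min(max(rng.gauss(league_mean, talent_sd), 0.0), 1.0)
        # Each at-bat is a binomial trial: hit or no hit.
        hits_1 = sum(rng.random() < true_ba for _ in range(at_bats_per_half))
        hits_2 = sum(rng.random() < true_ba for _ in range(at_bats_per_half))
        first_half.append(hits_1 / at_bats_per_half)
        second_half.append(hits_2 / at_bats_per_half)
    return first_half, second_half

def mean_squared_error(predictions, actuals):
    """Average squared gap between predicted and actual averages."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)

first_half, second_half = simulate_season()
overall_mean = sum(first_half) / len(first_half)

# Predictor 1: each player's own first-half average.
mse_individual = mean_squared_error(first_half, second_half)
# Predictor 2: the same overall mean for every player.
mse_overall = mean_squared_error([overall_mean] * len(second_half), second_half)

print(f"MSE using each player's first-half average: {mse_individual:.5f}")
print(f"MSE using the overall mean for everyone:    {mse_overall:.5f}")
```

Under these assumptions the overall mean comes out ahead, because the chance variation in half a season of at-bats is larger than the assumed spread in true talent. Widen talent_sd enough and the individual averages start to win.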

Of course, by looking at many years of past performance, one would gain some predictive powers. For example, in 1978, more than ten years into his Hall of Fame (HOF) career, Rod Carew batted .333 for the Minnesota Twins. He made it to the Major Leagues only a few years ahead of fellow Twin Rick Dempsey, who hit at an average of .259 in 1978. Carew finished up his 19-year playing career with a lifetime batting average (BA) of .328, whereas Dempsey hung on for an astounding 24 years with a BA of only .233! It would not require a sabermetrician to predict over any reasonable time frame a higher BA for a HOF ballplayer like Carew versus a dog (but lovable, durable and reliable defensively at catcher) such as Dempsey.

Brown also verifies this ‘no brainer’ for baseball fans: “The naïve prediction that uses the first month’s average to predict later performance is especially poor.” Dempsey demonstrated the converse of this caveat by batting .385 (5 for 13) for his Baltimore Oriole team in the 1983 World Series to earn the Most Valuable Player (MVP) award!

Statistical anomalies like this naturally occur due to the nature of such binomial events, where only two outcomes are possible: When a batter comes to the plate, he either gets a hit, or he does not (forgoing any credit for a walk or sacrifice). It is very tricky to characterize binomial events when very few occur, such as in any given Series of 4 to 7 games. However, as a rule-of-thumb the statistical umpires say that if np>10 (for example over 50 at-bats for a fellow hitting at a rate of 0.200), the normal approximation can be used for binomial distributions and the variance of the batting average becomes approximately p(1-p)/n.** From this equation one can see that the bigger the n, that is, the more at-bats, the less the fraction (batting average) varies.
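To put rough numbers on that formula, here is a minimal sketch that takes the square root of p(1-p)/n to get the standard error of a batting average. The function name and the true hitting rate of .250 are hypothetical choices of mine for illustration, and the at-bat counts are picked to contrast a short Series with a long season.

```python
import math

def batting_average_std_error(p, n):
    """Standard error of an observed batting average.

    p is the assumed true hitting rate and n the number of at-bats;
    the variance of the observed fraction is p(1-p)/n.
    """
    return math.sqrt(p * (1 - p) / n)

# Hypothetical .250 hitter over a short series, the np>10 threshold
# region, and a full season's worth of at-bats.
for at_bats in (13, 50, 500):
    se = batting_average_std_error(0.250, at_bats)
    print(f"{at_bats:4d} at-bats: standard error = {se:.3f}")
```

Note that at 13 at-bats np is only about 3, well short of the rule-of-thumb, so the normal approximation should not be trusted there. The variance formula itself still holds, and it shows how a handful of at-bats leaves plenty of room for a .385 Series.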

PS. I leave you with this paradoxical question: Is it possible for one player to hit for a higher batting average than another player during a given year, and to do so again the next year, but to have a lower BA when the two years are combined?

*Annals of Applied Statistics, Volume 2, Number 1 (2008), 113-152

**This Wikipedia entry on the binomial distribution says that “this approximation is a huge time-saver (exact calculations with large n are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1733.”