Probability and Percentage
Sunday, July 4th, 2010. Author: Tony Demaio
(You Fool Me Once…)
Leo Durocher once said, “Baseball is a game of percentages.” Leo was wrong. What he SHOULD have said was, “Baseball is a game of probabilities.” Of course, while somewhat correct, that is still insufficient. A correct statement is, “LIFE is a game of probabilities.”
Being a professional statistician, I often cringe at the (mis)uses to which my “science” has been put. I suspect most of the modern professionals in the field either wear feathers, a breechcloth, and a shark’s tooth necklace; or a mask and gun. Let us look at percent and probability and perhaps I can alert you to some of the abuses and errors that are often committed in their names.
First, even though a “percentage” and a “probability” are generally computed the same way, they are NOT the same entity. A percentage applies to a past event (e.g., 20% of the students passed the exam); a probability applies to some future event (“You have a probability of .50 of winning a coin toss,” often erroneously stated as, “You have a 50 percent chance of winning a coin toss.”). A percentage can exceed 100 or even be negative (there was a 110% gain in performance); a probability must lie between zero and one (inclusive), with zero meaning no chance of happening and one meaning it must happen. (Note the future tense.)
Under some circumstances, we can “shift” from one to another. Hence, if we survey the students at some school and determine that 10% are on the honor roll, we may conclude that if we take a student at random, we have a .10 probability of selecting someone on the honor roll.
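The shift from a percentage to a probability can be checked with a quick simulation. This is a minimal sketch with a made-up school of 1,000 students, 100 of them on the honor roll to match the survey's 10%; the student counts, the number of draws, and the seed are all assumptions for illustration.

```python
import random

def honor_roll_frequency(students=1000, on_honor_roll=100, draws=100_000, seed=1):
    # Hypothetical school: True marks a student on the honor roll
    school = [True] * on_honor_roll + [False] * (students - on_honor_roll)
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Pick a student at random, many times, and count honor-roll picks
    hits = sum(rng.choice(school) for _ in range(draws))
    return hits / draws

print(honor_roll_frequency())  # lands close to 0.10
```

Over many random picks the fraction of honor-roll students settles near .10. That only works because each pick is genuinely random; a biased way of choosing students would break the link between the past percentage and the future probability.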
What is somewhat disconcerting is the (often deliberate) “reluctance” of some researchers to report the raw numbers used in calculating percentages, or to tell us how they were derived, i.e., from what population or experiment. Be suspicious when the raw numbers are “not available.”
Consider the California law requiring booster baby seats for children. It was stated that 75% of the child fatalities could have been avoided if booster seats were used. Later research showed that, in the year the law was passed, 12 children were killed whom the law would have affected. (Subsequent research also showed that car seat manufacturers made sizeable campaign contributions.)
Or, consider this statement from the U.S. Weather Bureau:
Lightning is a major cause of storm-related deaths in the U.S., outpacing hurricanes and tornadoes in most years.
Research shows that there is an average of 58 reported lightning fatalities each year.
Then there is always the ploy of taking a perfectly rational/reasonable statement and emphasizing it in such a manner that it “sounds” ominous. A member of the British Parliament stated that, “One half the babies born in England are below average birth weight.” Or, “One half the children read below grade level.” (Grade level is usually defined as the average of a particular grade.)
Be suspicious when ONLY raw numbers are given. For instance, recently Congress decided to “investigate” Exxon for “obscene profits” of billions of dollars. Apparently, Congress never bothered to compute Exxon’s return on investment (ROI), which was about 9%.
Consider the statement, “50% of marriages end in divorce.” (Clearly, marriage is the major cause of divorce.) Consider the population on which the number was computed. As an extreme case, assume a closed population of 50 couples that get married. Suppose 10 couples get divorced. That would be a divorce rate of 20%. Now, suppose those 10 couples remarry and then get divorced again. There are now 60 marriages and 20 divorces. Clearly, the divorce rate is now 33%. Of course, we can play that game again, leading to 70 marriages and 30 divorces (43%). If one considers PEOPLE instead of MARRIAGES, one notes that 80% of the people who get married stay married. One must be quite careful to ascertain the population measured, and how the percentage was computed.
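The bookkeeping in the divorce example is easy to reproduce. A sketch of the closed population described above, assuming (as the example does) that the same 10 couples keep remarrying and divorcing; the function name is just for illustration.

```python
def divorce_rates(couples=50, first_divorces=10, rounds=3):
    """Hypothetical closed population: the same divorced couples
    remarry and divorce again each round."""
    marriages, divorces = couples, first_divorces
    rates = []
    for _ in range(rounds):
        rates.append((marriages, divorces, divorces / marriages))
        # Each divorced couple remarries (one more marriage) and divorces again
        marriages += first_divorces
        divorces += first_divorces
    return rates

for m, d, r in divorce_rates():
    print(f"{m} marriages, {d} divorces: {r:.0%}")

# Counting couples instead: the 40 couples who never divorced stay married
print(f"{(50 - 10) / 50:.0%} of the original couples stay married")
```

Counted by marriages, the rate climbs every round (20%, 33%, 43%); counted by couples, it never moves from 80% staying married.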
Consider the statement “63% of the people killed in car accidents were not wearing a seat belt.” Such a statement is meaningless without comparison statements, yet it implies that if the people were wearing a seat belt they would not have been killed. Consider that if there WERE NO seat belts and NOBODY wore them, one could truthfully state, “100% of the people killed in car accidents were not wearing seat belts.” One might consider the implications if one were told that only 1% of the people wear seat belts, yet they account for 37% of fatalities.
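The hypothetical at the end of that paragraph can be turned into a risk comparison. A minimal sketch: divide each group's share of the fatalities by its share of the population, then compare the two groups. The 1% and 37% figures are the paragraph's hypothetical, not real data, and `relative_risk` is an illustrative name.

```python
def relative_risk(pop_share_a, fatality_share_a):
    """How many times more likely a member of group A is to die than a
    member of group B (everyone else). Shares are fractions of 1."""
    rate_a = fatality_share_a / pop_share_a
    rate_b = (1 - fatality_share_a) / (1 - pop_share_a)
    return rate_a / rate_b

# Hypothetical: 1% of people wear belts, yet wearers are 37% of the dead
print(round(relative_risk(0.01, 0.37), 1))
# If wearers were also 37% of the population, the split would say nothing
print(relative_risk(0.37, 0.37))
```

The first case works out to about 58 times the risk for belt wearers; the second gives 1.0. The same fatality percentages mean opposite things depending on the population shares, which is why the bare “63%” tells us nothing.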
Or, consider the statement, “Alcohol is involved in 50 per cent of the fatal accidents.” (Apparently, drunks don’t wear seat belts.) The question is, “What does ‘alcohol involved’ mean?” The implication is that “drunk drivers” are killing people and dying themselves in car accidents. Further investigation into the studies indicates that if one driver had one drink, “Alcohol was involved.” Also, if a passenger had one drink, “Alcohol was involved.” Finally, if there was an (old) open container but no one had ANY alcohol, “Alcohol was involved.” Note that under this definition, if a drunk is stopped at a stop sign and someone rams him from behind and kills him, “Alcohol was involved.” One must pay close attention to the definition of what is being measured.
Particularly insidious are ratios of ratios (percentages of percentages). Consider the statement, “We are growing twice as fast as our competitor.” This may well mean their sales went from one million to two million dollars (sales doubled) while ours went from ten dollars to forty dollars (sales quadrupled).
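Putting both numbers side by side shows what “twice as fast” hides. A small sketch using the paragraph's figures: the claim compares growth factors (x4 against x2), while the dollar gains differ by a factor of more than 30,000. The `growth` helper is illustrative.

```python
def growth(before, after):
    """Return (absolute gain, growth factor) for a change from before to after."""
    return after - before, after / before

theirs = growth(1_000_000, 2_000_000)   # sales doubled
ours = growth(10, 40)                   # sales quadrupled

print(f"Competitor: +${theirs[0]:,} (x{theirs[1]:g})")
print(f"Us:         +${ours[0]:,} (x{ours[1]:g})")
print(f"'Twice as fast': {ours[1] / theirs[1]:g} times the growth factor")
```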
A “real life” example is smoking, as in, “You are 23 times more likely to get lung cancer from smoking than if you do not smoke.” (That is for males; for females, the figure is 13 times.) Quite damning, but let’s look at the whole picture.
85% of lung cancer was caused by smoking
About 25% of the population smoke (whites)
The population of the United States is about 300,000,000
About 25% x 300,000,000 people, or 75,000,000, smoke.
In 2003 157,200 people died of lung cancer
The proportion of the population that dies from lung cancer in a given year is .0005.
Using the above figures, 133,620 of those deaths can be attributed to smoking. So, if you smoke, the proportion dying of lung cancer in a given year works out to 133,620 / 75,000,000, or about .0018
If you do not smoke, then your chance of dying of lung cancer in a given year works out as follows:
The number of people who do not smoke who died of lung cancer: 157,200 - 133,620 = 23,580
Divided by the number of people who do not smoke (225,000,000): about .0001
(Actually, slightly inaccurate since I am counting cancer deaths from all races.)
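The arithmetic can be recomputed directly from the inputs listed above. This is a sketch that takes those inputs at face value (including the all-races death count, as noted). It prints both the relative risk and the absolute difference; with these particular inputs the ratio comes out near 17, which need not match the quoted 23 (a figure for males from a different source).

```python
# Inputs as listed in the text (U.S., about 2003)
population = 300_000_000
smokers = population * 25 // 100                # about 25% smoke
nonsmokers = population - smokers
lung_cancer_deaths = 157_200
from_smoking = lung_cancer_deaths * 85 // 100    # 85% attributed to smoking
not_from_smoking = lung_cancer_deaths - from_smoking

smoker_rate = from_smoking / smokers
nonsmoker_rate = not_from_smoking / nonsmokers

print(f"Deaths attributed to smoking: {from_smoking:,}")
print(f"Other lung cancer deaths:     {not_from_smoking:,}")
print(f"Annual rate, smokers:    {smoker_rate:.4f}")
print(f"Annual rate, nonsmokers: {nonsmoker_rate:.4f}")
print(f"Relative risk: {smoker_rate / nonsmoker_rate:.0f}x")
print(f"Absolute difference: {smoker_rate - nonsmoker_rate:.4f} per year")
```

The relative risk sounds alarming; the absolute difference, under two chances in a thousand per year, is what the next paragraph is getting at.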
When one looks at the whole picture, one wonders what the fuss is all about. If you smoke, there are roughly two chances in a thousand of dying of lung cancer in a given year. And yet, the statement that you are 23 times more likely to get lung cancer if you smoke is perfectly true.
As a comparison, the chance of lightning hitting a given house is about one in 200, and the chance of being involved in a car accident causing death or injury is about one in a hundred; yet few houses have lightning rods, and folks continue to drive.
Much the same kind of analysis can be done with other programs: seat belts, bicycle helmets, etc. We often spend tremendous amounts of money to fix “trivial” problems. Most of the time, the effort can be traced to large campaign contributions by companies that will profit by the effort.
It must be remembered that a “percentage” is a ratio consisting of a numerator and a denominator. The value can be changed by changing the numerator, the denominator, or both. Consider the statements:
100 students cannot get into college because of grades
1000 students cannot get into college
Consequently 100/1000 (10%) of the students who could not get into college could not do so because of grades
Assume that 500 of those 1000 students could not get into college because of SAT scores.
Further assume that the colleges stop requiring SAT scores, so that the 500 students who could not get in now CAN get in. How does that change the percentage? Clearly, now
100/500 = 20% of the students who cannot get into college cannot because of bad grades. Note how a totally extraneous effect can affect our data (e.g. by changing the denominator).
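Only the denominator moves in this example, and that alone doubles the percentage. A minimal sketch with the paragraph's numbers, assuming (as the example implies) that the 500 SAT rejections are part of the 1000 and that those students had acceptable grades.

```python
def share(part, whole):
    return part / whole

grades = 100      # rejected because of grades
rejected = 1000   # rejected for any reason
sat_only = 500    # assumed: rejected only because of SAT scores (part of the 1000)

print(f"Before: {share(grades, rejected):.0%} rejected because of grades")
# Dropping the SAT requirement removes 500 students from the denominator only
print(f"After:  {share(grades, rejected - sat_only):.0%} rejected because of grades")
```

Nobody's grades changed, yet the “grades problem” looks twice as large.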
Far more insidious is when both the numerator and denominator are changing at different rates. Consider the employment data:
At this time, the unemployment rate is considered to be 10%. For the sake of argument, we will assume 100/1000. What will happen to the rate as: 1) people find jobs, 2) more people become unemployed, 3) people drop off the rolls because they quit looking? The reader is encouraged to “play with” different numbers and note that the number of folks unemployed can go down while the percentage goes up, and vice versa.
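Here is one way to “play with” the numbers. Every figure below except the 100/1000 starting point is made up, and “quit looking” is modeled, as the paragraph implies, by removing people from both the unemployed count and the labor force.

```python
def rate(unemployed, labor_force):
    return unemployed / labor_force

scenarios = {
    "start": (100, 1000),                 # 10%, as in the text
    "20 find jobs": (80, 1000),           # 1) they stay in the labor force
    "30 more unemployed": (130, 1000),    # 2) employed people lose jobs
    "40 stop looking": (60, 960),         # 3) they leave both counts
    # Fewer unemployed, yet a higher rate: 10 find jobs while
    # 200 employed people leave the labor force (e.g. retire)
    "10 hired, 200 retire": (90, 800),
}
for name, (u, lf) in scenarios.items():
    print(f"{name:22} {u:4} unemployed / {lf:5} = {rate(u, lf):.2%}")
```

The last line is the paragraph's point in miniature: 90 unemployed is fewer than 100, but 11.25% is higher than 10%.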
Finally, there is survey sampling. At one time, polls/surveys were used to determine public opinion. They are now often used to SHAPE public opinion. It is well known that people like to be “with the crowd.” Hence, if a poll is published showing that a large number of people believe something, it will sway others to “join them.” There are many ways to “shade” poll results, ranging from improper sampling (e.g., oversampling groups you know believe the way you want the poll to come out) to “push polls,” where the questions are framed so as to convince the respondent of the validity of the answer you wish (e.g., “Do you want the government to WASTE taxpayer money on…?”).
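Furthermore, they report a “margin of error” of 3% or so (strictly speaking, about twice the standard error). That means that were the survey repeated many times with the same method, about 95% of the results should fall within 3 points of what asking everyone would give. It does NOT state that the survey was accurate to 3% in terms of phrasing of questions, sampling, etc.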
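The ±3% that polls usually publish is a margin of error at about 95% confidence, roughly 1.96 standard errors of a sample proportion. A sketch, assuming a simple random sample (which real polls only approximate), showing the sample size behind a 3-point margin. Note that nothing in the formula knows about question wording or who was sampled; it measures sampling noise only.

```python
import math

Z_95 = 1.96  # normal-approximation multiplier for 95% confidence

def margin_of_error(p, n):
    """Half-width of a 95% interval for a proportion p from n random respondents."""
    return Z_95 * math.sqrt(p * (1 - p) / n)

def sample_size(moe, p=0.5):
    """Respondents needed for a given margin; p = 0.5 is the worst case."""
    return math.ceil(p * (1 - p) * (Z_95 / moe) ** 2)

print(sample_size(0.03))                    # about a thousand people
print(f"{margin_of_error(0.5, 1068):.3f}")  # about 0.030
```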
Well, folks, thanks for listening. The science/discipline of statistics has been so bastardized and prostituted that it is no wonder Mark Twain said, “There are lies, damned lies, and statistics.” I, and others like me, object to it in much the same fashion that a policeman hates a rogue cop, and a doctor hates medical malpractice.
Desertrat says:
July 4th, 2010
9:22 am
Lord knows how long back I learned that “Half of all people are below average.” Somewhere along the line I started thinking in terms of odds and probabilities. That tends to lead to scepticism. The more you learn about people, the more sceptical you become as to either honesty or dishonesty: What are the probabilities?
As far as corporate profits go, I’ve long griped about the way the mediahcrities report them. Last year you made one dollar on a million bucks’ worth of sales. This year you make two dollars on that same million. Hey, profits doubled, right? Still not enough to walk across the street for…
Again, thanks.
‘Rat
James the Wanderer says:
July 4th, 2010
10:21 am
Not much has changed since Mark Twain’s “Figures don’t lie, but liars figure”.
Thanks for an illuminating read. Have you read Nassim Taleb’s _The Black Swan_? More improper use of statistics, along with reasons why given statistical processes don’t apply to certain situations.
Now, off to buy my lottery tickets for the week!
james
Steve Foste says:
July 4th, 2010
10:28 am
Tony,
Pretty enlightening! You have brought to my attention some things that have been bothering me for many years now that I didn’t have answers for. Over time I have begun to ignore most of the figures and statistics presented to us via the media; many times they just didn’t make sense. I just didn’t know why; it was in the back of my mind that they weren’t accurate for some reason.
I began to realize over the years that whenever a figure was reported, it almost always had a bias with regard to what was being reported on and who was paying for the information, and never, as you stated, was the empirical evidence presented as to how they arrived at this figure.
The worst of the bunch to me are the health industry, pharmaceutical studies, and the herbal industry. I just came to believe that regardless of the study, double blind or otherwise, the results, in most cases, were always interpreted in favor of the company if there were any positive results, and that the results were always skewed to present the favorable side of their argument, ignoring the possible consequences of the side effects.
I was asked one time to write a sales article to sell a men’s multivitamin, its benefits and features. I finally gave up on that assignment because I could not find valid information that would justify real benefits and extraordinary claims. In fact, at one time I was studying to be a natural health nutritionist, but over time I became discouraged because there was so much dissenting information on both sides of the aisle, good and bad. One study would prove the benefits, and the next day I would find something proving the quackery of the preceding information. Since I could not find the truth to the claims, I finally quit fighting the battle.
Don’t get me wrong: I am not on the side of the FDA or the AMA, or the pharmaceutical companies; these people run a revolving door with regard to management and policy, and most of the time it is not in our best interest. I began to see over time that I just could not trust the numbers.
That is just one area of concern, but I also see it in all of the monthly government reports, as well as the quarterly publishing of the financial conditions of stocks. I have become, as Rat said, skeptical.
What was that old saying? “Figures don’t lie, but liars figure.”
I feel that is our government at its best: keep skewing the numbers till you report in my favor. Does anyone here really believe that the Health Care bill will only cost $1 trillion in 10 years? How many government projects have any of you seen that ever came in under budget?
A Skeptic.
Thanks
Steve
Oh yeah, and inflation figures.
Steve Foste says:
July 4th, 2010
10:30 am
James,
Buy two of each, double your chances, and hey, winning would be a true Black Swan event.
Steve Foste says:
July 4th, 2010
11:40 am
So is this the point, Tony? If James buys two lotto tickets, his chances of winning increase by 100 percent, but in the case of a lotto ticket, his probability of winning in reality is still practically zero.
CheriVNB says:
July 4th, 2010
1:11 pm
Hello, can you say Global Warming? Talk about trying to hit a moving target… The original data was lost/destroyed, and the collection and reporting methods changed along the timeline. Not to mention the constant revision of historical data by NASA, or that many natural phenomena are logarithmic or just nonlinear. It is like playing a game of 3-D checkers with 2-D rules and strategies. With everyone so busy figuring out how to profit from the info or control other people, it makes you wonder who is minding the store…
Tony, nice article. As a kid (in high school, still fresh) I used to feel the need to correct people or explain their error. This did not make me very popular, as you can imagine. Eventually, wanting to “fit in,” I became equally sloppy. One of the things I like about the Ring is the overall level of awareness of FACT and context, which can lead to a higher degree of TRUTH/understanding. As Lynne might say, “Reality just is.”
That being said, the reporting of numeric information usually tells you more about the reporter than the situation reported on. Which reminds me of an old joke.
A hot air balloonist had some trouble with his burner and drifted off course on descent. He spots a gentleman on a bicycle and hollers down to him, “Hello sir, can you tell me where I am?”
The gentleman looks up and yells back, “Yes, you are about 100 feet above me!”
The balloonist replies, “You must be a physicist!”
Flattered, the bicyclist answers back, “Why, yes. How did you know?”
The exasperated balloonist shoots back, “Well, because you are absolutely correct and totally useless!”
One last thought, and this one has tripped me up now and then. A percentage appears smaller in a contracting environment than an expanding one for the same amount of change.
Example:
Buy a house for 100K. Real Estate appreciates 100%, the house is “worth” 200K. Something changes, say Barney Frank is in charge, and Real Estate depreciates 50%. The same house is “worth” 100K, again. The value change in both cases is 100K. Same thing is true of stock market portfolio valuations, GDP growth and unemployment/employment rates.
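The asymmetry comes from the base: each percentage is measured against the value at the start of its own move, 100K on the way up and 200K on the way down. A minimal sketch with Cheri's numbers; `pct_change` is an illustrative helper.

```python
def pct_change(old, new):
    """Percentage change, always measured against the starting value."""
    return (new - old) / old * 100

up = pct_change(100_000, 200_000)    # boom
down = pct_change(200_000, 100_000)  # bust back to the purchase price
print(f"Up {up:.0f}%, then down {-down:.0f}%: same $100K either way")
```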
Cheri
CheriVNB says:
July 4th, 2010
2:38 pm
Why is it you only see the spelling errors after hitting “Submit”?
Tony says:
July 4th, 2010
3:48 pm
James,
No, I haven’t read Nassim’s book, but I’ve read about it and corresponded with him.
His thesis, while somewhat correct, has some problems. For one thing, if you presume that NOTHING is “predictable”, then planning is not possible. MOST events are “linear” over a range. Hence, don’t let his thesis blind you to the fact that while past performance does not necessarily predict future performance, past performance is usually the best indicator we have. While some unpredicted and unpredictable event can distort the present “reality”, it don’t happen too often.
Wot’s the old joke. If you want a laff on Wall Street, just say, “It’s different this time.”
As I sed in an interview once, after being asked if I made mistakes: “I’ve made many mistakes, but I seldom make the same mistake twice, and only once did I make it the third time.”
The guy looked puzzled and asked, “What was that mistake?” I replied, “I loaned a friend $3,000, and lost it all–including the friend.” The guy said, “How is it you made the same mistake three times?” To which I replied, “I loaned it to him a thousand dollars at a time.” Often, the past is the best predictor of the future you have.
always,
tony
Tony says:
July 4th, 2010
4:05 pm
Steve,
Here’s another situation I probably should have included. You generate a bunch of events and cherry-pick from the results, much like a stock advisor who makes a hundred recommendations, then chooses to report the 3 that made money.
Wot’s the old joke. A guy gets a letter in the mail. It says, “Buy stocks A, B, C.” He, of course, ignores it. The stocks double.
A month later, he gets a letter saying, “Buy stocks D, E, F.” He, of course, ignores it. The stocks double.
A month later, he gets a letter saying, “Buy stocks G, H, I.” He thinks about it, then chooses not to buy. The stocks double.
A month later, a guy shows up at his door and says, “I told you to buy three sets of three stocks. Had you bought them, you would be rich. I’m tired of giving you free information you won’t act on, so I’ll SELL you 3 more picks for $10,000.”
What the “mark” doesn’t realize is the guy sent out 100,000 letters to 100,000 people–all with different stocks. Then after the first results came in, he sent out 10,000 letters to the “winners”–again with different stocks. At the end, he only had 10 people to “visit”.
In like manner, I’ll sell you my “lucky penny” for $2,000. This penny has predicted every presidential election since 1860. I started with a vast number of pennies and said “heads, Republican; tails, Democrat.” I threw them in the air and half came up heads, and half came up tails. Since Lincoln (a Republican) won the 1860 election, I threw away all the pennies that came up tails, and flipped the rest again. Lincoln got re-elected, so I threw away all the pennies that came up tails and flipped the rest again. I kept this up until I reached the 2008 election. I had two pennies left. One came up heads, the other tails. I call the one that came up tails my lucky penny since it has predicted every election since 1860. I’ll flip it next year and bet the farm on what it predicts the outcome will be.
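The lucky penny is survivorship by construction: every flip throws away about half the remaining pennies, so some penny is guaranteed a perfect record if you start with enough of them. A back-of-the-envelope sketch, counting elections every four years from 1860 through 2008 and assuming each flip discards exactly half (real flips would vary a little). Both function names are hypothetical.

```python
def election_years(first=1860, last=2008):
    # U.S. presidential elections every four years
    return list(range(first, last + 1, 4))

def pennies_needed(rounds, survivors):
    # Each round keeps about half, so start with survivors * 2**rounds
    return survivors * 2 ** rounds

years = election_years()
print(len(years), "elections")
# Two pennies must survive the 37 elections before 2008 to be flipped in 2008
print(f"{pennies_needed(len(years) - 1, 2):,} pennies to start with")
```

That is roughly 275 billion pennies, which is what “a vast number” has to mean for the trick to work. The stock-letter scam above runs on the same arithmetic, with letters in place of pennies.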
always,
tony
David Franklin says:
July 4th, 2010
4:25 pm
REGARDING PURCHASING LOTTERY TICKETS:
The mathematical reality is, the more you buy the greater your losses will be.
This is one where Adam Smith got it 100% correct. Paraphrasing what he wrote: “If you adventure (purchase) upon all the lottery tickets, you will most certainly lose nearly all your money.”
Why is this so? BECAUSE while the lottery may say a jackpot is worth 10 million, they do not divulge the actual TOTAL AMOUNT they have taken in for that particular jackpot. The prize may be 10 million, but they have actually taken in a sum far greater than the published prize.
So, if you buy all the tickets, you will “win” the 10 million, but most likely lose 30-50 million before arriving at the 10 million dollar ticket.
Simply, the lottery odds maker controls the mathematical odds and they are in his favor, NOT yours.
Regards,
Dave
Desertrat says:
July 4th, 2010
6:45 pm
I’ve always used the word “odds” in lieu of “probabilities.” The usage works out the same. Odds are, you’ll throw a seven before you throw a six or eight when the little cubes are rolling. 6:5 against you.
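The 6:5 is the odds against a six (or, separately, an eight) coming up before a seven. Since only the race between the two totals matters, the probability is ways(6) / (ways(6) + ways(7)) = 5/11, and a probability of 5/11 corresponds to odds against of 6:5. Odds and probabilities carry the same information, just written differently. A small sketch that counts dice combinations; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

def ways(total):
    """Number of (die1, die2) combinations that sum to total."""
    return sum(1 for a, b in product(range(1, 7), repeat=2) if a + b == total)

def prob_before(target, rival=7):
    # Rolls that are neither total don't matter; only the race between them does
    t, r = ways(target), ways(rival)
    return Fraction(t, t + r)

for point in (6, 8):
    p = prob_before(point)
    print(f"{point}: probability {p}, odds against {p.denominator - p.numerator}:{p.numerator}")
```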
I’ve never really worried about Black Swans, before nor after learning the meaning of the term. Nothing new about the idea. If a fella worried about Black Swans, he’d never climb into a danged ol’ race car, ’cause that wall never gets softer. Of course, one way of looking at it is that eating concrete is a 100% probability if you stay with it long enough. Mario Andretti made the definitive comment. “There are two kinds of drivers at Indy. Those who have hit the wall, and those who will hit the wall.”
Tony says:
July 4th, 2010
8:56 pm
Folks,
The chances of winning the California lottery are about 1 in 175 million. It’s a combinatorial problem. The number of ways you can draw 5 numbers from 56 is 56 choose 5, or
(56 x 55 x 54 x 53 x 52) / (5 x 4 x 3 x 2 x 1), or about 3.8 million. So, the probability of you picking all 5 winning numbers is about 1 in 4 million. Additionally, you must pick the “mega number” from the numbers 1 – 46, so multiply the 3.8 million by 46, which gives about 175.7 million.
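The count is quick to check with Python's `math.comb` (Python 3.8 or later). This assumes the game exactly as described: 5 numbers from 56, plus one Mega number from 46.

```python
import math

main = math.comb(56, 5)     # ways to choose 5 numbers from 56
combinations = main * 46    # times 46 choices for the Mega number
print(f"{main:,}")          # 3,819,816
print(f"1 in {combinations:,}")  # 1 in 175,711,536
```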
Interestingly enuff, before they had the mega number, it was interesting to watch the volume increase as the expected value became positive. Hence, if the probability was 1 in 10 million and the prize was 11 million, the number of tickets went up dramatically. It was “interesting” because although the expected value was positive, as more and more folks bought tickets, the probability of SHARING the prize went up.
If you could have been assured that you wouldn’t have to share the prize, it would have paid you to purchase enough tickets to completely blanket the output space and thus assure yourself of being a winner.
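The effect of sharing can be put in numbers. A sketch under loudly stated assumptions: a $1 ticket, the 1-in-10-million odds and 11-million prize from the example above, every other ticket picking its numbers at random, an equal split among winners, and no taxes or lump-sum discount. The counts of other tickets in the loop are made up.

```python
import math

def expected_value(prize, p_win, other_tickets, price=1.0):
    """Expected profit of one ticket when winners split the prize equally.

    Assumes every ticket wins independently with probability p_win
    (random picks). A winning ticket's expected share of the prize is
    E[1 / (1 + K)] with K ~ Binomial(other_tickets, p_win), which equals
    (1 - (1 - p)**(n + 1)) / ((n + 1) * p).
    """
    n, p = other_tickets, p_win
    # 1 - (1 - p)**(n + 1): chance at least one of the n + 1 tickets wins,
    # computed with log1p/expm1 so it stays accurate when p is tiny
    some_winner = -math.expm1((n + 1) * math.log1p(-p))
    share = some_winner / ((n + 1) * p)
    return p * prize * share - price

# The example's numbers: 1 chance in 10 million at an 11 million prize
for others in (0, 1_000_000, 10_000_000, 30_000_000):
    ev = expected_value(11_000_000, 1 / 10_000_000, others)
    print(f"{others:>11,} other tickets: {ev:+.2f} dollars per $1 ticket")
```

With no competition the ticket is worth about ten cents more than it costs; by the time ten million other tickets are in play, the chance of splitting has pushed the expected value well below zero. Real players don't pick at random, so treat this as a rough guide only.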
Linda Brady Traynham says:
July 6th, 2010
4:54 am
Happy sigh. Y’all have hit critical mass and you don’t really need me at all! Particularly not when Tony opens the ball by dancing with the princess. I do believe in betting…on our intelligence, skills, drive, and knowledge. Not on luck or any game the house controls. Will it hurt if you buy one lottery ticket every time you fill the gas tank? No, so long as you only do it then and realize that you almost certainly will not win and the chances are you won’t remember to see if you had the winning number. If it makes you feel better to make a small “wild cat” bet occasionally, do so. Just take a look at those who buy lottery tickets by the handful. The lottery is a vicious racket that hurts those at the bottom of the socioeconomic heap the most for the very reasons they are there: not being able to tell reality from fantasy, wanting something big for not very much, not being able to figure odds, and not reading the small print that says scratch-off tickets will continue to be sold even after all the prizes have been awarded. Almost all of us here at the Ring bet on ourselves, and the rest are working towards it.
Dawn is breaking and I’m looking at the Pasture Art. They may never make a dime, but they don’t have to. I can afford to feed them and I love looking at them. They WILL earn their keep, as anyone (Rat, I’ll bet) who has ever tried to move a steer or two on foot or in a truck from one pasture to another when he doesn’t want to go can tell you. This is an easy task for a couple of men on horseback. Our horses are what is called “cow-y,” which means they understand cows, what their jobs are, and how to do them. Animals crack me up. Irish Brook, 2 1/2, ready to start working next year, is teaching the two fillies how to move cattle!
“See, a gentle nip right here at the root of her tail, don’t draw blood, but that gets her attention…now block her so she can’t turn away, that’s it, you’ve got her started, Glenlivet, good, smart girl!” It is IN them to herd cows, just as it is bred into certain dogs to herd sheep, cattle, or goats. Some days Hank tries to herd chickens because everything else is already being moved. This activity is not a success. Neither is herding cats.
The parallel is that all of us are working with amounts Obama and Hillary would scorn, but we’re doing things with our money that give us pleasure now and will pay off big “if.” Lynne is an inspiration to us all with her knack for thinking of better ways to prep. In a TEOTWAWKI world, a footlocker full of matches will become very valuable. There are so many little things we can accumulate at a few dollars a time. So much knowledge is available now that will disappear when the Internet does. Even if we don’t have anarchy, what we learn will get us through depressions, and I’m preaching to the choir instead of going to bed like a sensible female. Hugs, proud of you all, Linda
Tony says:
July 8th, 2010
7:50 am
Linda,
Had a guy tell me that the lottery wasn’t selling tickets, it was selling “hope”.
Personally, I’m wit youse. I buy a scratch-off once in a while, but don’t mess with the lottery unless the prize times the probability of winning exceeds the price of a ticket, i.e., the expected value is positive.
In Calif, the division of money used to be 50% to winners, 34% to schools, 16% to overhead. That tells me that the payback percentage is 50%. In Reno (which I used to frequent) some of the slot machines are advertised to have a payback ratio of 98% or even 99%.
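A payback percentage is the average fraction of each dollar wagered that comes back. A sketch of why the gap between 50% and 98% matters, assuming a hypothetical player who keeps re-betting whatever comes back, ten times over, and looking only at averages (real results scatter around these).

```python
def bankroll_after(start, payback, cycles):
    """Average money left if the whole stake is wagered again and again
    and each pass returns, on average, `payback` of what was bet."""
    return start * payback ** cycles

for label, payback in (("Lottery (50%)", 0.50), ("Slot (98%)", 0.98), ("Slot (99%)", 0.99)):
    print(f"{label:14} $100 after 10 cycles: ${bankroll_after(100, payback, 10):,.2f}")
```

At 50% payback, $100 shrinks to about a dime; at 98% it still leaves roughly $82.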
I had a storekeeper in Oregon tell me that he only made 5 cents on each ticket sold, that it was only 5% and it “wasn’t enough.” He said that the sellers had gotten together and were going to demand more.
I asked, “How often do you turn over the stock?” He replied, “About once a week.” To which I responded, “My gawd, man, that’s 250% per year on a dollar ticket.”
He looked at me and said, “It isn’t enough.”
I felt like saying, “It never is.”
always,
tony