Testing, Testing, Testing
Wednesday, July 21st, 2010
Author: Tony Demaio
INTRODUCTION
Given the response to my probability post, let me pursue another topic which has received some notoriety over the past few years due to a considerable amount of corruption and misinformation. I’m sure most of us have heard the phrase “The teacher is teaching the test.” and/or “70% is a ‘C’, and 60% is a ‘passing’ score”.
In general, testing has one of two possible purposes:
1. Assessment
2. Prediction
In the case of assessment, we wish to determine the status/competency of a person/object on some task (e.g. driving, inserting a voltmeter into a house socket) or in some knowledge (or personality) area. The test score is of paramount importance.
In the case of prediction, the test score per se is of little interest and only has value in terms of its ability to forecast or predict performance. An example of such a score is the SAT score, where the only purpose of the score is to predict college performance. The SAT score per se has little use or interest.
Sometimes a given “score” has dual purposes. For example, high school grades are often important in and of themselves, and they may also be used to predict college success.
PART I
Suppose a mathematics teacher wishes to teach her second grade class “single digit addition”. If we “equate” the two problems 7 + 6 and 6 + 7 (commutative law for those mathematically inclined) there are exactly 55 different problems. I suggest to you that it makes perfect sense for the teacher to “teach the test”, then test the children on all 55 problems.
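The count of 55 can be checked by enumeration. This is a minimal sketch, assuming "single digit" means the ten digits 0 through 9 (the post does not say so explicitly); the variable names are illustrative only.

```python
from itertools import combinations_with_replacement

# Assumption: "single digit" means the ten digits 0 through 9.
digits = range(10)

# Unordered pairs with repeats allowed, so 6 + 7 and 7 + 6 count once
# and doubles such as 4 + 4 are included.
problems = list(combinations_with_replacement(digits, 2))

print(len(problems))  # 55
```

That is 45 pairs of different digits plus 10 doubles, small enough to test every one.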
There are many situations where it is completely proper to “teach the test”–with an obvious one being the driver’s license test. The test consists of the skills we wish the driver to have–left turn, right turn, passing, etc. Thus, we teach those skills, then test those skills, and assess the competency of the student based upon his score.
Let us return to the second grade teacher. After successfully teaching the children single digit addition, the teacher then turns her attention to double digit addition. In this case there are about 5,000 distinct problems. Clearly it is impossible to test the children on each problem. In this case, the teacher teaches the concept, gives some practice items, then SAMPLES from the 5,000 possible problems and tests the children. Suppose the teacher decides to test using 20 problems. It is assumed that the 20 (randomly?) selected items represent a cross section of the entire space of 5,000 items, and can be used to PREDICT how well the students would do on all 5,000 items. Clearly, if the teacher taught the children only those 20 items they would score well, but the scores would not be representative of how the students would perform on the entire space of 5,000 items. The results would not be “valid” in terms of PREDICTING how well the students would do on all 5,000 items, but the score WOULD be valid as an ASSESSMENT of their ability to do THOSE 20 ITEMS.
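The difference between sampling the space and teaching the sample can be sketched in a few lines. Everything here is an assumption made for illustration: operands from 0 to 99 (which gives 5,050 unordered problems, close to the "about 5,000" above), a hypothetical student who has mastered 80% of them, and a fixed random seed.

```python
import random
from itertools import combinations_with_replacement

# Assumption: operands run from 0 to 99, giving 5,050 unordered problems.
problems = list(combinations_with_replacement(range(100), 2))

rng = random.Random(2010)  # fixed seed so the run is repeatable

# Hypothetical student who has mastered 80% of the whole space.
known = set(rng.sample(problems, k=int(0.8 * len(problems))))

def score(test_items, mastered):
    """Fraction of the test items that are in the mastered set."""
    return sum(item in mastered for item in test_items) / len(test_items)

# The teacher's 20-item test, drawn at random from the whole space.
test = rng.sample(problems, k=20)
honest_estimate = score(test, known)  # an estimate of the true 80%

# A student drilled only on the 20 test items aces the test
# but has mastered just 20 of the 5,050 problems.
drilled = set(test)
drilled_test_score = score(test, drilled)
drilled_true_score = score(problems, drilled)

print(honest_estimate, drilled_test_score, drilled_true_score)
```

The drilled student's test score is perfect as an assessment of those 20 items and nearly worthless as a prediction for the other 5,030.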
That same rationale applies to most testing used for assessment purposes. If the test spans the space of concern (knowledge, skills, etc.) it is entirely proper to “teach the test”, then test the space of concern. If the space of concern is too large (e.g. consider the history topic “Civil War”) then the items used in the test are assumed to be a sample/subset of all the items one could ask about the Civil War, and it would make little sense to teach only those items that are to be tested. Clearly, the “worth” of such a test score will also hinge heavily on the quality of the items that were sampled/selected to “represent” the total space.
PART II
Let us take a math test:
1. 6 + 7
2. 7 – 5
3. 8/4
4. 13 + 12
How did you do?
Let’s take another test:
1. Integrate the function sin(2x) dx
2. Solve the differential equation 6x + 3/x
3. A bag contains 9 red and 6 white marbles. What is the probability of drawing 2 white marbles and 3 red marbles if 5 marbles are drawn at random?
4. Find the tangent to the curve y = cos(x) at 30 degrees
How did you do?
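For anyone who wants to check their work on the marble question, it reduces to counting. This sketch assumes the five marbles are drawn without replacement and that the order of drawing does not matter, which is the usual reading of the problem.

```python
from fractions import Fraction
from math import comb

# Ways to pick 2 of the 6 white and 3 of the 9 red marbles,
# divided by the ways to pick any 5 of the 15 marbles.
p = Fraction(comb(6, 2) * comb(9, 3), comb(15, 5))

print(p, float(p))  # 60/143, about 0.42
```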
Clearly, I can “adjust” the percentage most folks will get on a test simply by making the test more or less difficult. To state 70% is passing is really quite arbitrary and subject to the whim of the teacher who can make the test “easy” or “hard”.
We must ask the question, “Why are we testing?”
Suppose we have three scholarships to give out and 100 applicants. Our desire is to give the scholarships to the three best students. In order to determine who those students are, we decide to test all 100 students. In such a case, we would give a “very hard” test where most students would “fail”. We really don’t care about the “bottom” 97 students, we wish to “spread out” the “top students” so we can distinguish among them. We would wish the “normal curve” to be “pulled to the right”, or positively skewed. If we gave an “easy” test, it is entirely possible that 10 students would get “perfect scores” and we would not be able to differentiate among them using the test score.
Alternatively, suppose we wish to find those students who are candidates for “special education”. In such a case, we would make the test “very easy”, such that most students would get “high scores”, but the slower children would have difficulty. We would wish a “negatively skewed” curve, with the “left tail” pulled out and elongated.
Strangely enough, if we wish “maximal discrimination” along the entire spectrum or range, the optimum “difficulty level” (percentage of people answering correctly on an item) is 50%. Such a test would tend to a “normal distribution” or “bell curve”. As such, the average score on a test that maximally discriminated is 50% correct. In everyday language, of course, this would mean that the person who received an average score would also receive an “F”.
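One common way to see why 50% is the optimum, offered here as an assumption rather than as the post's own derivation, is to look at the variance of a right/wrong item. If a fraction p of people answer correctly, the item's variance is p(1 - p), and an item with no variance separates no one.

```python
def item_variance(p):
    """Variance of a right/wrong item answered correctly by a fraction p."""
    return p * (1 - p)

# Difficulty levels from 0% to 100% correct in steps of 10%.
difficulties = [i / 10 for i in range(11)]

for p in difficulties:
    print(f"{p:.1f}  {item_variance(p):.2f}")

best = max(difficulties, key=item_variance)
print(best)  # 0.5
```

An item everyone gets right (p = 1) or everyone gets wrong (p = 0) has zero variance; the spread peaks at p = 0.5 and falls off symmetrically on either side.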
Linda Brady Traynham says:
July 21st, 2010
7:58 pm
Tony…my life has long been joyous, but two new De Maios in one day? An incredible treat. See C. Northcote Parkinson on constructing a test so that only one candidate, the right one, emerges.
Can you imagine a bigger hell than KNOWING you are the token whatever you are, there only to satisfy racist regulation? Knowing, as well, that if you’re good few will believe it, and that if you’re hopelessly out of your depth you pull down the rest of your group?
Loved your math tests. Big hug, Linda
Tony says:
July 21st, 2010
8:12 pm
Linda,
An interesting irony is that the better you are at something, the fewer people there are that are ABLE to appreciate it.
As one guy said, “Suppose you were talking to Einstein and he was hit by lightning and it doubled his I.Q. How would you know?”
I found it to be true in gubbermint that, in general, the whole group has been dumbed down so far that a true incompetent is a “star”. Such a group will “drive out” true competence. At IRS research, the guy sitting next to me was taking college algebra so that he would have the prerequisite to take basic stat. He was making $70k.
Wot’s the old saw? “In the valley of the blind, a man with one eye is king.”
always,
tony
Desertrat says:
July 22nd, 2010
7:29 am
Moving to the Modrun Merkin* Werlt, this whole testing of school kids deal took on a life of its own after comparative tests showed how poorly our kids stacked up against other countries’ kids.
So, test scores became more important than actual education. That set the priority for the whole public school system and we see the result. The real-world solution is quite easy, but damfino what any politically-viable answer would be…
‘Rat
*From time to time, LBJ’s “My fellow Merkins…” comes to mind.
Oldmanriver says:
July 22nd, 2010
8:16 am
I took your tests Tony, first one I did pretty well. Second one….well let’s just say I should probably be taking the college algebra course as well. I haven’t done any math like that in years and I have apparently forgotten all of it. This reminds me of some conversations I have with some Russian friends I have in Latvia. They described the testing that was done in their education. They don’t believe in the multiple guess tests that we have here. Most tests were oral. At the end of the class they were required to write a thesis and defend it in person in front of a panel of experts.
Tony says:
July 22nd, 2010
10:02 am
Rat,
LOTS of reasons for the state of the public schools.
1. Affirmative action. At one time, the “better” women went into nursing and teaching. With affirmative action, those women are now doctors, lawyers, corporate prex, etc., leaving the teaching profession to “lesser” folk. Don’t that give you a warm fuzzy feeling about going to the hospital?
2. School milieu. One of the most (if not THE most) predictive measures of student achievement is the school culture. Given a middle class school with everyone going to college, the kids perform. Given a lower class school, kids find other goals. It is well known that when two cultures “meet”, invariably the more vulgar one “survives”. We destroyed many middle class school cultures with forced integration, which destroyed the city school systems. As the whites fled to the suburbs, we destroyed THOSE schools with bussing. In our rush to “integrate” blacks and whites, we neglected to realize that we were also “integrating” poor and affluent. Any sociologist can tell you the vast differences between the two cultures (middle class, lower class)–ranging from how they value education to how they value drugs. (See my note on “Civil Rights, Civil Wrongs, etc.”)
3. At one time, grades meant something. NOW, it appears that EVERYONE gets an “A”. I was in a faculty meeting back when I was young and foolish and a teacher. We were discussing grades. (At this time, looking back, I can see such a discussion was pointless since we never did define what a grade means.) One lefty teacher said that if she had a student that was “really trying” (as if she could tell), she would give him an “A”. If she had another student that wasn’t trying because he didn’t have to, she might give him a “C” or even “D”, even though he had perfect papers.
Much like the “easy test for scholarship” (above), when everyone gets an “A”, grades become meaningless since there is no differentiation. Same thing with degrees. At one time, a high school diploma was worth something. Then everyone had one, so the AB became the criterion. Then everyone had one of those too. New York City schools decreed that “everyone will have an MA”. Everyone got the MA–which cut the guts out of the MA since no one was gonna kick someone out of the program and cost them their job. The PhD has gone the same way–same as publications since there are thousands of journals that will publish anything. (Many folks write one paper, publish it, change the title, publish it again, change the title…)
I’m sure other folks can come up with many other reasons. Basically, it’s gonna take YEARS to recover–if we can. Funny thing, but folks actually say that all we need to do is pay teachers more. I don’t quite know why you should get DIFFERENT teaching simply because you pay more for it.
rebel without a job,
tony
Tony says:
July 22nd, 2010
10:23 am
OMR,
Don’t worry about it. All those problems were from college level math/stat classes.
Multiple choice tests are quite interesting, taken as a topic independently of any use.
They are easily scored and objective, if you define “objective” as two different scorers who score the test will get exactly the same score except for clerical errors. An essay test might receive ANY grade depending on the expertise, bias, etc. of the person reading the test. Also, while a computer can score an objective test quickly, it takes much time to read an essay, and is quite expensive. Oftentimes, on “important tests”, two or even three people will read the same essay and the scores will be averaged.
Experiments have shown that a properly constructed multiple guess test will correlate highly with essay tests on the same subject. Whatever the essay test is measuring, the multiple guess test is measuring the same thing.
From a personal point of view, the major attraction the multiple guess test has for me is the wide sampling of items that can be tested. It takes about a minute to answer a multiple choice question. It takes quite a bit longer to write an essay. If it takes 10 minutes to write an essay on a topic, you can ask ten different questions on the subject in that time, and direct the questions to specific aspects of the subject.
Hence instead of asking someone to write about the American Revolution, you can ask 10 different questions about various aspects of it.
The major problem with multiple choice tests is making up good questions. Hence, many tests are used year to year. Unfortunately, the questions get compromised and the tests become useless for their intended purpose.
It is not well known, but the SAT consists of three sets of items.
1. 1/3 are new items that are being tested. They are not counted.
2. 1/3 are new items that are incorporated into the test–items which survived last year’s cut.
3. 1/3 are “old” items that allow the new items to be calibrated to the old tests so that the test scores can be equated year to year.
Oral tests are “O.K.”, I guess. Again, the results depend upon the skill and knowledge of the people doing the interview. If there is more than one panel, there is no way to equate the scores of two applicants that faced different panels. For large groups of “applicants”, it is impractical in terms of “fairness”.
Not a simple yes/no situation. In many cases, combining several types of tests can be fruitful. A multiple guess test as a screening, then essays to further screen, then oral panels. That’s usually too expensive for most situations.
always,
tony
Oldmanriver says:
July 22nd, 2010
10:48 am
What I always liked about multiple choice/guess tests was that the answer is there in front of you. I always figured I could usually pass pretty much any multiple choice test even from classes that I never took just because there is a strategy involved in taking them. I remember one course where the test consisted of one question and 3 pages of blank paper to answer on. That tested my knowledge of the subject much more than any multiple choice test, but yes using different methods is probably the best way to go about doing it. LoL btw I took college level math/Stats classes lol that’s the shame of it all.
Tony says:
July 22nd, 2010
11:55 am
OMR,
Pssst. Don’t tell anyone, but I probably couldn’t solve those problems myself–and I taught the stuff.
I used to take one day out when I was teaching high school and tell the students how to take a MC test. I knew I was successful when a kid asked at the end of the lecture, “Mr. DeMaio, shouldn’t you read the question and try to pick the right answer?”
USUALLY, the answer is right in front of you. I used to hate the choice “none of the above”.
I was in the faculty lounge when a fellow teacher was making up a MC test. I asked if she was randomizing the order of the answers. She said, “No, I pretty much move them around so there is no particular order.” I said, “I don’t think so. It is impossible psychologically for you to do so.” She said, “Bull. I tell you I do it.” To which I replied, “Let’s test it. You tell me the right answer and I’ll tally the results.” She said, “O.K.” and read off the answers.
I forget the exact numbers, but it was something like “a” occurred 10 times, “b” occurred 20 times, “c” occurred 2 times, and “d” occurred 1 time. One of the things I used to tell my kiddies was if you have a teacher that gives multiple choice tests, tally the number of times each alternative occurs as the right answer when the teacher goes over the test. Then, on the NEXT test, if you don’t know the answer, you have a good guess.
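The tally itself takes only a few lines. This sketch uses a hypothetical answer key built from the counts recalled above (the real key was not recorded), and it assumes the teacher's habit carries over to the next test.

```python
from collections import Counter

# Hypothetical answer key in the recalled proportions:
# "a" 10 times, "b" 20 times, "c" 2 times, "d" 1 time.
answer_key = ["a"] * 10 + ["b"] * 20 + ["c"] * 2 + ["d"] * 1

tally = Counter(answer_key)
best_guess, count = tally.most_common(1)[0]

# Chance of being right by always guessing the most frequent letter,
# assuming the next test follows the same habit.
hit_rate = count / len(answer_key)

print(tally)
print(best_guess, round(hit_rate, 2))  # b 0.61
```

With four truly randomized alternatives a blind guess is right about 25% of the time; against this key, guessing "b" is right about 61% of the time.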
Here’s an utter won. Here’s a math question. xxxxxxxxxxxx
here are the answers:
a. 1 + sqrt(2)
b. 2 + sqrt(2)
c. 1 + sqrt(3)
d. 1 – sqrt(2)
The answer is most probably “a”. If you analyze the pattern, you notice that “1” appears 3 times, “+” appears 3 times, and “sqrt(2)” appears 3 times.
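The pattern analysis can be written out mechanically. This is only a sketch of the guessing heuristic, not a claim about how any real test is built: each option is split by hand into three parts (constant, sign, radical), and an option scores one point for every option, itself included, that shares the same part in the same position.

```python
from collections import Counter

# Options from the example, split by hand into (constant, sign, radical).
options = {
    "a": ("1", "+", "sqrt(2)"),
    "b": ("2", "+", "sqrt(2)"),
    "c": ("1", "+", "sqrt(3)"),
    "d": ("1", "-", "sqrt(2)"),
}

# Count how often each part appears in each position across the options.
position_counts = [Counter(parts) for parts in zip(*options.values())]

# Score an option by how common each of its parts is in its position.
scores = {
    letter: sum(counts[part] for counts, part in zip(position_counts, parts))
    for letter, parts in options.items()
}
guess = max(scores, key=scores.get)

print(scores)  # {'a': 9, 'b': 7, 'c': 7, 'd': 7}
print(guess)   # a
```

The idea behind the heuristic is that wrong answers tend to be written by changing one piece of the right answer, so the option built entirely from the most common pieces is the likeliest original.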
always,
tony
Steve Foste says:
July 22nd, 2010
4:47 pm
Statistics for the non statatition, did I spell that right. Anyway, no wonder the numbers out of washington never add up. Spin em any way you can to your favor.
James the Wanderer says:
July 22nd, 2010
5:20 pm
The ability to use mathematics at all separates the ignorant or stupid from the capable.
The ability to use mathematics to solve new problems separates the capable from the gifted.
And the ability to create mathematics for new applications separates the gifted from the genius.
Please don’t ask where I fall on the scale – my ego is too fragile.
Cheers! Great article.
james
Tony says:
July 22nd, 2010
9:09 pm
James,
Many thx.
I used to tell my kiddies (high school) that in terms of “making it” in this world, Algebra was the most important course they would take. In terms of a series of courses, 4 years of English was probably more important than 4 years of math.
I told them that when they apply for a job, the company is gonna throw a test at them that consists of half English and half Math. The results of that test will probably determine if they put you behind a desk or behind a broom.
When I came back into this country from an overseas trip, I got into quite a bit of trouble. They caught me with a calculator, compass, protractor, and straight edge. They figured I was a member of the terrorist group ALGEBRA, and I had weapons of math instruction.
rebel without a job,
tony
Cheri says:
July 23rd, 2010
1:44 pm
Ok Tony, no recycled jokes please!
Years ago a test described as an 8th grade graduation test from the first quarter of the 20th Century circulated on the internet. (I can’t find a copy) The knowledge required to pass the test spotlighted clearly the “dumbing down” of the modern education system. Much more logic, math, science theory and grammar were taught and expected to be comprehended.
I couldn’t remember much about the second test questions even with having taken and passed (not just by social promotion) Algebra II and Pre Calculus in high school. Repeated in college and took Calculus I and II and Differential Equations. (This was before they let you use calculators, slide rules were being phased out and I MANUALLY graphed more slopes, functions and parabolas than students today can imagine.) I just don’t use it anymore. Diffy E-Q (Fourier Transforms, FFTs etc.) was conceptually useful when I was working on MRI, but not necessary. Computers do all the work. My job went from “Understand this so you know what to do.” to “Do this and don’t ask questions.” I have often said “I am just smart enough to know I am not smart enough.” Which is kind of a stinky place to be. Where you see something on the horizon but just can’t get it into focus.
James’ comment regarding talent is very true (observable yet unappreciated) and if you throw in Tony’s “It is well known that when two cultures “meet”, invariably the more vulgar one “survives”.” you get a nice summation of society today. I like to remind my friends who many of the politicians were in high school. Most were the popular kids who got B’s and C’s in their “Core” classes, but could talk their way into anything, deserving or not.
Cheri
Tony says:
July 23rd, 2010
3:40 pm
8th Grade Final Exam: Salina, Kansas – 1895
http://maggiesfarm.anotherdotcom.com/archives/2333-American-Education,-1895.html
Desertrat says:
July 23rd, 2010
4:15 pm
In 1948 in Austin High School, the choice was made as to whether a kid would or would not go to college. If college, emphasis on foreign language, higher math and literature. If not, trade courses were a bit over half the curriculum. Auto mechanics, vocational agriculture, wood working, metal working including welding, etc. Compare with today’s lack of availability of practical courses and you can see why kids who just aren’t college material get bored, cause trouble, and “graduate” from high school in a condition of societal uselessness.
The Graduate Record Exam is multiple choice. I did okay. A score of 1200 was the minimum to be accepted at UTex grad school in 1966. 1250 to get into Mensa. The Princeton folks work pretty hard to maintain the validity of that little doofer, for sure.
I was startled at how little higher math was needed in automotive, electrical-generating and civil engineering. And I’d busted hump to do well in advanced calculus–but never used it! Lots of algebra, trig and plane geometry, though.
Being good at trig and physics helps one’s pool-shooting, for sure.
A practical and remunerative body of knowledge!
Nuff fer now…
Cheri says:
July 23rd, 2010
4:42 pm
Tony,
Thanks for the link. I must have seen the West Coast version, cuz I don’t remember it being as tough as that version. ‘course I was much younger, had fewer kids and got more sleep. : )
I nearly laughed out loud at this part thinking of how different it would be answered today by Gore types.
Geography (Time, one hour)
1. What is climate? Upon what does climate depend?
2. How do you account for the extremes of climate in Kansas?
3. Of what use are rivers? Of what use is the ocean?
What? Climate affected by geography?
Rat, advanced math does give you many “hidden” talents….
Cheri
Tony says:
July 23rd, 2010
5:57 pm
Cheri,
It is not well known or accepted, but the cause of global warming is quite obvious.
As you know, the earth spins on its axis.
Also as you know, we are taking all this oil out of the earth.
What is not appreciated is that the purpose of that oil is to lubricate the earth’s axis. As we take more and more oil out of the earth, there is less available for lubricating/cooling the axis. As such, a tremendous amount of heat is being generated, and rising to heat the earth’s surface.
Also, as is well known, the angular velocity of the earth at the equator is about a thousand miles/hr. I have become quite concerned that if we take enough oil out of the earth, there will not be enough oil to lubricate the axis, the bearings will “freeze” and the earth will stop spinning and grind to a halt. Folks at the equator will be hurled into space. Folks at the north/south poles shouldn’t be affected very much. Those of us “in between” should take precautions. I have purchased two large ship anchors and anchored my house, in addition to tying it to some trees.
Forewarned is forearmed.
always,
tony
Tony says:
July 23rd, 2010
7:33 pm
Rat,
When I went to high school, the chairman of the industrial arts department had an “in” with the local unions. The school had sheet metal, electrician, wood working, auto shop, etc. Many students went right into the trades from high school, and did very well.
What is not often appreciated is that schools are comprised of college “educated” people who “know the value” of a college education and sort of “look down” on the trades. Consequently, when the cuts came, the entire industrial arts program was cut. It was a terrible waste and a horrible mistake–if you were interested in “the kids”.
always,
tony
Cheri says:
July 23rd, 2010
11:36 pm
Tony which of the above equations did you use to determine that
“two large ship anchors and anchored my house, in addition to tying it to some trees”
would be sufficient? What type of trees and how deep are their roots?
Actually I am having Déjà vu. Is that idea taken from an earlier article of yours? Maybe one on the promotion of false arguments or something?
Cheri
Tony says:
July 24th, 2010
7:22 am
Cheri,
I used that same logic in a paper I wrote on “Global Warming”. I don’t think I published it (maybe it was rejected). However, you being my #1 girl friend (please don’t tell Linda–she thinks she’s #1), I may have sent you a copy.
I personally developed the equations by modifying those used to describe parallel universes, black magic, and astrology. The trees were three 6 year old nutty pines–typical root structure, but I reinforced the bases by pouring epoxy around them.
Now that this is published, I suggest you immediately purchase boat anchors before there is a run and the price goes up.
always,
tony
p.s. Another cause of global warming is cell phones. As is well known, when two objects rub against each other, they cause friction, which causes heat.
Also, the probability of two objects “bumping into one another” goes up as the SQUARE of the number of objects in that space.
If you think about it, all the radio waves, (non cable) t.v., ham radio, radar, etc. send electrons into the air and they rub against each other. The environment could tolerate that, but with the addition of cell phones, it became too much and critical mass was achieved. All those electrons bump against each other causing too much friction and the earth is not capable of dissipating the heat.
If THAT sounds familiar, you probably read the paper.
Linda Brady Traynham says:
July 28th, 2010
7:46 pm
Dang it, Tony, send submissions to Michael! You KNOW things get lost over here and I’d never turn down a Tony De Maio! I lust, yearn, to know what you think about global warning.