Kids These Days: Is Our Learning Measure Valid?

Kevin Drum has done a couple of education-related posts recently, first noting a story claiming that college kids study less than they used to, and following that up with an anecdotal report on kids these days, from an email correspondent who teaches physics. Kevin’s emailer writes of his recent experiences with two different groups of students:

Since the early 1990s, I have pre- and post-tested all of my introductory mechanics classes using a research-based diagnostic instrument, the Force and Motion Conceptual Evaluation. This instrument is based on research by Ron Thornton at Tufts that identified a reproducible sequence of intermediate states that all people seem to pass through in the process of gaining a Newtonian understanding. So it can give me not only a “do they get it or not” measure but also, along several conceptual dimensions, a measure of how close they are to getting it.

My first job out of graduate school was at an unranked tier 4 institution in Myrtle Beach, South Carolina. Coastal Carolina “University” to be specific. It was the 13th grade. […] I pretty reliably got 50-60% normalized gains on the FMCE.

Normalized gain is the ratio of how much their scores increased compared to how much they could have increased — (post-pre)/(100-pre). 50-60% is actually pretty stupendous on this particular measure. It means they were typically getting 80-90% of the questions right.

[His current employer] Spelman [College, in Georgia] is a top 75 liberal arts college, according to US News, and top 10 according to the Washington Monthly. My personal impression of the students is that the average is generally much higher than it was at Coastal. These are students who can think around a few corners.[…]

I think I’m at least as good an instructor as I used to be, and probably a lot better. I know quite a bit more about developmental psychology and cognitive science as a result of my job at Georgia Tech and I think that improves my instruction considerably.

And yet, in a good year I get about 20-30% normalized gains.

I don’t really know what is different but something clearly is.
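The emailer’s definition of normalized gain translates directly into a few lines of code. This is a minimal sketch, assuming scores are class averages expressed as percentages on a 0-100 scale; the function names `normalized_gain` and `post_from_gain` are hypothetical, not taken from any published analysis.

```python
def normalized_gain(pre: float, post: float) -> float:
    """Normalized gain (post - pre) / (100 - pre), with scores in percent."""
    if pre >= 100:
        # A perfect pre-test leaves no room to improve, so g is undefined.
        raise ValueError("normalized gain is undefined for a pre-test score of 100%")
    return (post - pre) / (100 - pre)


def post_from_gain(pre: float, g: float) -> float:
    """Invert the definition: the post-test score implied by a pre-test score and a gain g."""
    return pre + g * (100 - pre)


if __name__ == "__main__":
    # Hypothetical class: 30% on the pre-test, 65% on the post-test.
    print(normalized_gain(30, 65))  # 35 / 70 = 0.5
    # A gain of 0.6 starting from a 50% pre-test implies an 80% post-test.
    print(post_from_gain(50, 0.6))
```

The inverse function is a reminder that g alone does not fix the post-test score: the same g = 0.5 takes a class from 20% to 60%, or from 60% to 80%.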

I have seen a few comments about this questioning the validity of “normalized gain.” The argument is, basically, that if you start with students who know nothing, it’s easy to teach them quite a bit, but if you start with students who already know quite a bit, it’s difficult to raise their scores significantly.

This is true if you’re talking about absolute gain, but normalized gain is supposed to take that into account. That’s why it’s a fairly standard measure used by the physics education research community to compare instructional methods across courses and institutions.

The concept of “normalized gain” as a general pre/post test measure goes way back; I’ve seen references to papers from the 1940s. Its application to physics really starts in the 1990s, with the key reference being this 1998 paper by Richard Hake looking at test scores from 6000-odd students in introductory physics courses at a variety of institutions (using a slightly different test than the one cited in Drum’s post, but the results are pretty robust). The class average pre-test scores range from around 20% to around 80%.

Hake plots the data in a slightly funny way, shown in this figure:

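[Figure: Hake (1998), average absolute gain versus average pre-test score for introductory physics courses, with diagonal lines of constant normalized gain]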

This is a graph of the absolute gain (that is, the increase in the percentage score from pre-test to post-test) as a function of the pre-test score. As you would expect, this shows a clear downward trend as you move to higher pre-test scores.

The diagonal lines on the graph are lines of constant normalized gain. That is, all points on the lowest solid line have a normalized gain of g=0.23. As you can see from the data, the points associated with “Traditional” courses (standard professor-lecturing-from-the-front-of-the-room courses, represented by shaded points) tend to cluster along that line, whether they were taught in a high school, college, or university setting. Points associated with “Interactive Engagement” courses (any of a variety of reform instruction methods in which students do more group work than note-taking) show a larger spread, but if you draw a line through the middle of the group, you get a decent fit with a normalized gain of 0.48 (the second solid line).
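Each diagonal line in Hake’s figure comes from rearranging the definition: a course with normalized gain g and pre-test score pre has an absolute gain of g * (100 - pre) percentage points. The sketch below tabulates the two trend lines mentioned above (g = 0.23 and g = 0.48); the pre-test values are arbitrary points spanning the 20% to 80% range of the class averages, not values read off the figure, and the function name is hypothetical.

```python
def absolute_gain_on_line(pre: float, g: float) -> float:
    """Absolute gain (post - pre, in percentage points) for a course with normalized gain g."""
    return g * (100 - pre)


# Trend lines discussed in the text: traditional (0.23) and interactive engagement (0.48).
TREND_LINES = {"Traditional": 0.23, "Interactive Engagement": 0.48}

if __name__ == "__main__":
    for label, g in TREND_LINES.items():
        # Pre-test averages of 20, 40, 60, and 80 percent.
        row = [round(absolute_gain_on_line(pre, g), 1) for pre in range(20, 81, 20)]
        print(f"{label:>22} (g = {g}): {row}")
```

Both lines slope downward: the same normalized gain corresponds to a smaller absolute gain for a class that starts higher, which is why the absolute gain falls off with pre-test score even when instruction is equally effective.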

This suggests that normalized gain scores are correlated with instructional method, and not so much with the incoming student knowledge. Hake did the obvious statistical test, and the correlation coefficient between the normalized gain and the pre-test score is only +0.02, which is pretty negligible compared to the correlations of +0.55 for the post-test score and -0.49 for the absolute gain.
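Hake’s check amounts to computing Pearson correlation coefficients over a table of class averages. The sketch below uses made-up, hypothetical class averages (not Hake’s data) in which the normalized gain scatters around 0.23 with no trend in pre-test score, and derives the absolute gain and post-test score from it. The function name `pearson_r` is an illustrative choice.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical class averages: pre-test scores (%) and normalized gains with no built-in trend.
pre = [20, 30, 40, 50, 60, 70, 80]
g = [0.25, 0.20, 0.27, 0.23, 0.21, 0.26, 0.22]

# Derived quantities, using post = pre + g * (100 - pre).
gain = [gi * (100 - p) for p, gi in zip(pre, g)]
post = [p + dg for p, dg in zip(pre, gain)]

if __name__ == "__main__":
    print(f"r(g, pre)    = {pearson_r(pre, g):+.2f}")
    print(f"r(post, pre) = {pearson_r(pre, post):+.2f}")
    print(f"r(gain, pre) = {pearson_r(pre, gain):+.2f}")
```

Only the signs are meant to mirror Hake’s result: near zero for g, positive for the post-test, negative for the absolute gain. The magnitudes here are much larger than his +0.55 and -0.49 because this toy data has no scatter beyond the variation in g.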

That’s just one article, though, and a somewhat self-selected sample by a guy with an agenda. Maybe there’s some correlation to be found in a different study. And, indeed, there is, in this 2005 paper, which looked at correlations between normalized gain and pre-test scores at a range of schools: Loyola Marymount, Southeastern Louisiana University (hey to Rhett), the University of Minnesota, and Harvard University. For all of these schools but Harvard, they found a correlation between pre-test score and normalized gain, as shown in graphs like this one for Loyola Marymount:

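[Figure: Coletta and Phillips (2005), Loyola Marymount data: individual normalized gains versus pre-test score (top) and averages within 17 bins (bottom)]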

The top graph shows all of the individual student normalized gain scores plotted versus pre-test score, while the bottom graph shows the scores averaged within 17 bins. There’s a clear correlation to be seen, but it’s a positive correlation; that is, students with higher pre-test scores are likely to see higher normalized gains than students with low pre-test scores. This is the exact opposite of the obvious argument against the anecdata provided by Kevin’s emailer.
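The binning in the bottom panel amounts to grouping students by pre-test score and averaging their individual normalized gains within each group. A minimal sketch, assuming equal-width bins over the 0-100% range (the paper’s exact binning may differ); the function name and the sample students are hypothetical. Students with a perfect pre-test are skipped because their individual gain is undefined.

```python
def binned_mean_gains(pre_scores, post_scores, n_bins=17):
    """Average individual normalized gains within equal-width pre-test bins.

    Returns (bin_center, mean_gain, count) tuples for the non-empty bins.
    Scores are percentages on a 0-100 scale.
    """
    width = 100 / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pre, post in zip(pre_scores, post_scores):
        if pre >= 100:
            continue  # no room to improve, so the individual gain is undefined
        g = (post - pre) / (100 - pre)
        idx = min(int(pre // width), n_bins - 1)  # clamp to the last bin
        sums[idx] += g
        counts[idx] += 1
    return [
        ((i + 0.5) * width, sums[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]


if __name__ == "__main__":
    # Hypothetical students as (pre-test %, post-test %) pairs.
    pre = [10, 15, 20, 55, 60, 65, 100]
    post = [40, 32, 44, 82, 80, 86, 100]
    for center, mean_g, n in binned_mean_gains(pre, post, n_bins=4):
        print(f"bin centered at {center:5.1f}%: mean g = {mean_g:.2f} (n = {n})")
```

The sample data was chosen to mimic the positive trend described above, not taken from the paper; with real data, more bins trade smoother averages for fewer students per bin.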

So, does this mean that kids these days are dramatically worse than they used to be? Not necessarily. The ellipses in the long quote above stand in for a lot of material that would call into question any conclusions based on the assumption that the classes at Coastal Carolina and Spelman were truly identical in the way they would need to be for these scores to be meaningful. It does strongly suggest, though, that the change cannot be explained away as an obvious effect of starting with smarter students.

(Both of these papers can be found as free PDFs with a little Googling, if you would like to read the source material directly.)

Hake, R. (1998). Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses. American Journal of Physics, 66(1). DOI: 10.1119/1.18809

Coletta, V., & Phillips, J. (2005). Interpreting FCI scores: Normalized gain, preinstruction scores, and scientific reasoning ability. American Journal of Physics, 73(12). DOI: 10.1119/1.2117109

14 thoughts on “Kids These Days: Is Our Learning Measure Valid?”

  1. Incidentally, I should note that both of these graphs are presented terribly, in my not-at-all-humble opinion. I don’t have access to their raw data, though, and I’m not committed enough to go through and count pixels to reconstruct the data and make new plots that wouldn’t make Baby Tufte cry.

  2. I used to track results for one of the first concepts with my high schoolers. Distance/velocity/acceleration was taught either with equations of motion and no graphs, or with velocity/time graphs and no equations. The velocity/time graph crew always did better on the post test. After that, everyone got the combined treatment for a problem set, and they were on their own.
    The relationship of slope and “rates of change” was brought into class at this time, just in case anyone wanted to talk about calculus.

  3. If you love your daughter as much as I love my 4 kids and 2 dogs, you will not send her to American public education.

    A PhD in Mathematics is the hardest to obtain, a PhD in Physics the 2nd hardest, and a PhD in Education the hands-down easiest. Many Masters degrees are easier to obtain than a doctorate in Education. Stay away from these people. It will cost you, but it will be money well spent.

    From one loving husband, Dad, and dog owner to another, I mention this because I root for you and your lovely wife to give your lovely daughter the absolutely best possible education.

    You are a good and busy man, but as you teach and cool atoms and try to explain to Emmy why she will never catch a squirrel, you may not be aware that there is an organization called the Alliance for Separation of School and State. They’re NOT a far-right wing uber-Conservative crackpot bunch, though of course they attract such people. The Alliance actually makes good points. It’s an 1835 thing, as you’ll read.

    Kids, Chad! They grow up SO fast! It was just yesterday my oldest son was born (1989), and next May he will graduate with his bachelors in Biomedical Engineering! Before you know it, you will be walking Steelykid down the aisle, and giving her to the 2nd most important man of her life, about 70 years before the first Fusion Energy plant is up and running, an event she may eventually live long enough to see, and hopefully, with YOU as her Dad, contribute to.

    Onward and upward, bro.

  4. Oops, sorry. I meant to say that most Masters degrees are HARDER to obtain than a PhD in Education (I type too fast), and that goes double for “Educational Psychology.” Ignore that field. “Ad hoc” isn’t nearly sufficient to describe it.

    As an example of how far we have fallen since 1835, take this test:

    Could You Pass This 8th Grade Exam from 1895?

    Forget the SOL (Standards of Learning) tests in public schools; let this be the standard: Imagine a college student who went to public school trying to pass this test, even if the few outdated questions were modernized. With fluency and agility we could do it. We could get Americans, student and professor alike, back up to the 8th grade level of 1895!

    In 1895 the 8th grade was considered upper level education. Many children quit school as soon as they could master the basic fundamentals of the 3 R’s (reading, writing and arithmetic). Most never went past the 3rd or 4th grade. That’s all you needed for the farm and most city jobs.

    Child labor laws were not in existence yet. Additionally, today’s education has much more focus on technology and sociology than the grammar and geography of old. It’s a different world with different requirements and capabilities needed to succeed.

    Could You Have Passed the 8th Grade in 1895?
    Probably Not…Take a Look: This is the eighth-grade final exam from 1895 from Salina, KS. It was taken from the original document on file at the Smoky Valley Genealogical Society and Library in Salina, KS and reprinted by the Salina Journal.

    =================================

    8th Grade Final Exam: Salina, KS – 1895

    Grammar (Time, one hour)

    1. Give nine rules for the use of Capital Letters.
    2. Name the Parts of Speech and define those that have no modifications.
    3. Define Verse, Stanza and Paragraph.
    4. What are the Principal Parts of a verb? Give Principal Parts of do, lie, lay and run.
    5. Define Case, Illustrate each Case.
    6. What is Punctuation? Give rules for principal marks of Punctuation.
    7 – 10. Write a composition of about 150 words and show therein that you understand the practical use of the rules of grammar.

    Arithmetic (Time, 1.25 hours)

    1. Name and define the Fundamental Rules of Arithmetic.
    2. A wagon box is 2 ft. deep, 10 feet long, and 3 ft. wide. How many bushels of wheat will it hold?
    3. If a load of wheat weighs 3942 lbs., what is it worth at 50 cts. per bu., deducting 1050 lbs. for tare?
    4. District No. 33 has a valuation of $35,000. What is the necessary levy to carry on a school seven months at $50 per month, and have $104 for incidentals?
    5. Find cost of 6720 lbs. coal at $6.00 per ton.
    6. Find the interest of $512.60 for 8 months and 18 days at 7 percent.
    7. What is the cost of 40 boards 12 inches wide and 16 ft. long at $20 per m?
    8. Find bank discount on $300 for 90 days (no grace) at 10 percent.
    9. What is the cost of a square farm at $15 per acre, the distance around which is 640 rods?
    10. Write a Bank Check, a Promissory Note, and a Receipt.

    U.S. History (Time, 45 minutes)

    1. Give the epochs into which U.S. History is divided.
    2. Give an account of the discovery of America by Columbus.
    3. Relate the causes and results of the Revolutionary War.
    4. Show the territorial growth of the United States.
    5. Tell what you can of the history of Kansas.
    6. Describe three of the most prominent battles of the Rebellion.
    7. Who were the following: Morse, Whitney, Fulton, Bell, Lincoln, Penn, and Howe?
    8. Name events connected with the following dates: 1607, 1620, 1800, 1849, and 1865?

    Orthography (Time, one hour)

    1. What is meant by the following: Alphabet, phonetic orthography, etymology, syllabication?
    2. What are elementary sounds? How classified?
    3. What are the following, and give examples of each: Trigraph, subvocals, diphthong, cognate letters, linguals?
    4. Give four substitutes for caret ‘u’.
    5. Give two rules for spelling words with final ‘e’. Name two exceptions under each rule.
    6. Give two uses of silent letters in spelling. Illustrate each.
    7. Define the following prefixes and use in connection with a word: Bi, dis, mis, pre, semi, post, non, inter, mono, super.
    8. Mark diacritically and divide into syllables the following, and name the sign that indicates the sound: Card, ball, mercy, sir, odd, cell, rise, blood, fare, last.
    9. Use the following correctly in sentences, Cite, site, sight, fane, fain, feign, vane, vain, vein, raze, raise, rays.
    10. Write 10 words frequently mispronounced and indicate pronunciation by use of diacritical marks and by syllabication.

    Geography (Time, one hour)

    1. What is climate? Upon what does climate depend?
    2. How do you account for the extremes of climate in Kansas?
    3. Of what use are rivers? Of what use is the ocean?
    4. Describe the mountains of N.A.
    5. Name and describe the following: Monrovia, Odessa, Denver, Manitoba, Hecla, Yukon, St. Helena, Juan Fernandez, Aspinwall and Orinoco.
    6. Name and locate the principal trade centers of the U.S.
    7. Name all the republics of Europe and give capital of each.
    8. Why is the Atlantic Coast colder than the Pacific in the same latitude?
    9. Describe the process by which the water of the ocean returns to the sources of rivers.
    10. Describe the movements of the earth. Give inclination of the earth.

  5. What a useless education it would be if I spent my time learning the material on that 1895 test.

    I went to all public schools and came out with college credit for 2 semesters of English, a semester of Spanish, two of calculus, one each of biology and physics, credit for both American and international comparative history… and I went to predominantly minority schools with terrible marks on standard exams.

    If, in ‘Alliance’ schools at the supposed height of pre-collegiate knowledge, kids were reciting to me “nine rules for the use of Capital Letters” (sic) and the partial contents of Google Maps, I’d scream bloody murder.

    Only the mathematical questions seemed even remotely useful. There was little evidence of any analytic focus in the curriculum, which relied instead on rote memorization (which has little use today). Case in point: there are two sections on prescriptivist English usage, but not even a mention of science or the scientific method.

  6. Why as a society do we want to look at normalized gain? Aren’t we really interested in the absolute gain? In terms of understanding physical models and the facts backing them up and the tools used to discover and test them (i.e. physical science) I don’t really care about rate, I care about level and retention.

    Absolute measures of learning to me are what would be more important, to the extent that any of these measures are useful.

  7. Why as a society do we want to look at normalized gain? Aren’t we really interested in the absolute gain?

    Researchers who study education look at normalized gain because it’s a way to compare students across institutions, and get some sense of how well an educational innovation is actually working. This is helpful because the places that have the resources to try new things are often institutions (like Harvard, where Eric Mazur has done a lot of work on physics education) which have student populations way out at the right end of the bell curve. Before some school with less money and weaker students makes a big investment in copying something tried at Harvard, they would like to have some idea of what sort of gains they can reasonably expect for their students. Normalized gain comes closer to providing that information than absolute gain does.

  8. Hey Dan, thank you very, very much for correcting me, and turning me on to that website. I owe you one, bro. On your other point, I happen to know 2 Education PhDs, the one a brilliant and good woman, the other a brilliant and evilish man. The man is the Principal of a public High School, the woman of a public Middle School (grades 6-8). They may well be on the Superintendent track, but there’s only so much room at the top.

    And thank you Chad for that succinct explanation of Normalized vs Absolute gain, and turning us on to Eric Mazur. Well done as usual.

  9. I’ll just add that my local public school system has, for years, been recruiting people with undergraduate degrees in math (among other subjects), including those who have been out of college for years. If someone has a degree in math or history, and decides to take a teaching job, the contents of the standard education curriculum aren’t terribly relevant.

    As for that 1895 test, there may be reasons for avoiding all of twentieth century history, expecting students to memorize obscure if not obsolete units of measure, and do quiz stuff like fain versus feign, but preparing for life in the 21st century isn’t one of them. (I say this as someone who values spelling and is good at it.) Sure, in 1895 you couldn’t be testing students on the First World War, the Cuban Missile Crisis, or the space race, but here and now, I’d rather people be learning about such things than about the battles of the War of the Rebellion. Causes and aftermath, yes, but that test could be passed by someone who knows the details of Second Bull Run but has no idea of why it was fought.

    Oh, and how hard a degree is to get says little or nothing about how useful it would be in a given context. A Ph.D. in biology is harder to obtain than a bachelor’s in accounting, but I know which credential I’d be interested in if I was hiring someone to do my taxes.

  10. @Steve Colyer:

    I agree only to a limited degree:

    The way they taught English was far more structured – phonetically breaking words into syllables, bases and modifiers, diagramming proper pronunciation, defining different classes of words and how they can be used to construct proper sentences, structuring thoughts into paragraphs and paragraphs into fully formed essays. ‘Whole language’ is a shambling horror in comparison.

    On the other hand, most of the math problems are pretty straightforward, given some unfamiliar measures: 1 bushel of wheat fills 1.25 cubic feet and weighs 60 pounds, a ton is 2000 pounds, an ‘m’ is a thousand board-feet, or 144,000 cubic inches, and an acre is 160 square rods (4 rods = 1 chain = 66 feet, 1 acre = 1 chain x 10 chains). How many students today really need to know the weight of a bushel of wheat, or the number of grains in a dram, or how many pints to a hogshead? Switch to a rational measurement system, i.e. metric, and most of these bloody stupid memorized conversion factors simply go away.

    The biggest problem with elementary and high schools today is that their focus is on containment and socialization of children and only incidentally on education – my aunt, an elementary teacher, has been told in so many words that she is *not allowed* to fail any students, that being held back (for not knowing the material) would be too damaging to their self-esteem. If you kicked out all the kids who really aren’t interested in being there in the first place, discipline problems would drop to next to nothing, the remnants could focus on learning, and some of those expelled – after a bit of “life experience” – might decide they really wanted to learn after all!

  11. Hi Hugh, I’m not sure I see where you disagree with me at all, except it’s probably my fault for not explaining myself fully. I do however very much agree with what you said here:

    The biggest problem with elementary and high schools today is that their focus is on containment and socialization of children and only incidentally on education…

    Yes, and one casualty of that has been the suppression of the natural competitiveness of boys. If a boy even THINKS of acting male, the next thing you know he’s accused of having ADD or ADHD. I had one of those, you would not believe how quickly the psychologists and unnecessary medications are brought in when that happens.

    Psychology, by the way, is very much a wonderful field of study, and by no means do I wish to disparage it. But it is so very, very young. It lacks the central themata of other fields, especially Physics and Chemistry and their conservation laws. It gets better in leaps and bounds every day as more data pour in, but in the meantime it is overly politicized, and in improperly trained hands can be quite dangerous.

  12. I try to be laissez faire regarding topic drift in the comments, but this is getting a little too far away from the subject of the post, and drifting into highly charged topics. I would rather not host an argument about the current state of public education at this time, and I especially do not want to host an argument about gender effects in public education at this time.

    If you want to talk about these topics, do it at your own blog. If I ever put up a post inviting such things, you can talk about them in the comments there, but this is not the place.

  13. Thank you, Chad, for explaining the degree of your laissez faire-ness. I was really just responding to an intelligent comment by one of your guests, but you are quite correct that an idea based on an idea based on an idea, etc. and a day … if taken too far, disrespects the host weblogger, even if unintentionally so as in my case. So please accept my apologies and be sure your now well-defined bridge (3 sigma) will not be crossed. No Rubicon for me, thank you.

    GREAT reply to the whole tinier Proton diameter thing, btw. Keep up the good work.
