Asking Entirely Too Much From Testing

Matthew Nisbet has a comment about interpreting the public's understanding of science over at Framing Science:
Consider what a split-ballot comparison in a 2004 University of Michigan survey revealed about the nature of responses to these long standing questions about evolution. In this survey experiment, one half of the sample was asked the following traditionally worded question:

True or false, human beings as we know them today, developed from earlier species of animals.

When asked this way, 42% answered true, a result that has been incredibly consistent across surveys since 1985.

The other half of the sample, however, was asked a slightly different version of the question:

True or false, according to the theory of evolution, human beings as we know them today, developed from earlier species of animals.

When asked this way, 74% answered true.

The implication is that context matters: Americans are not ignorant of what science says about human origins; in fact, as the second version of the question reveals, three-quarters of the public are familiar with the scientifically correct answer.
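As a quick sanity check, the gap between the two halves of the split ballot is far too large to be sampling noise. Here is a minimal sketch of a pooled two-proportion z-test; the half-sample size of 500 is my own assumption for illustration, since the survey's actual sample sizes are not given here:

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic for the difference p1 - p2."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# 74% true vs. 42% true; n = 500 per half is an assumption, not from the survey.
z = two_proportion_z(0.74, 500, 0.42, 500)
print(round(z, 1))  # a |z| this large is far beyond chance variation
```

Even with these modest assumed sample sizes, the z statistic is enormous, so whatever the two wordings are measuring, they are genuinely measuring different things.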

I suspect that the issue demonstrated by the two versions of the question is not one that lends itself to a metric at all.

What it does point out is an important limitation in questioning people about anything at all. Questions are likely to be construed differently by different respondents, both because of differences in how they perceive the semantic underpinnings of language and because of personal learning histories separate from the facts or belief systems at issue. Time also plays a significant part. As people are introduced to a new concept, they put it in context with something they already know. We understand by analogy, which is limiting but effective in helping us make decisions.

But when someone asks us a question about it, the analogy can either get in the way or facilitate. Two people can be at a similar point on the path to comprehension, yet react to a question differently.

As soon as you start asking questions in English, the variety of correct answers widens for the respondent, though the questioner's perception does not. If a thought generated by the question can take you down more than one path and still be correct, the question will be less accurate in terms of what it brings back to the questioner. A second ambiguity may destroy the accuracy of a certain percentage of responses, and if the survey is littered with these, statistical correlation will not generate a warning.
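To make the point concrete, here is a toy simulation of my own construction, not from the post: every respondent actually holds the scientifically correct belief, but some fraction reads the question under a second, divergent interpretation and answers "false" anyway. The aggregate rate then looks like partial ignorance, and nothing in the marginal numbers flags the ambiguity.

```python
import random

def simulate_survey(n, p_misread, seed=0):
    """All n respondents know the correct answer, but a fraction
    p_misread construes the question differently and answers 'false'."""
    rng = random.Random(seed)
    answered_true = sum(1 for _ in range(n) if rng.random() >= p_misread)
    return answered_true / n

# With a 30% misreading rate, universal knowledge looks like ~70% agreement.
rate = simulate_survey(10_000, 0.30)
print(round(rate, 2))
```

The observed 42% versus 74% gap is consistent with a mechanism like this, though a simulation of this kind cannot establish that it is the actual cause.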

The question's statement on lineage suggests to me that humans evolved in a linear way, and it makes me uncomfortable to say that it is unambiguously correct. I just don't like it, because I would never frame a statement using those words; thus I am tempted to say "No." What is more, I might never have passed the question through that filter at all had it not contained the hint "according to the theory of evolution." That addition is a billboard telling my consciousness to stop, think about my understanding of the "theory of evolution," and wonder whether the questioner is asking about the current understanding of natural selection.

If you fall back on definitions, as most scientists do, you fail to understand the nature of language and of human cognition. The fact is that people generally don't parse questions carefully. Scientists do, but they aren't normal people in the sense I'm talking about: scientists have trained themselves to pass any statement through a series of semantic filters and knowledge frameworks before admitting the question to consideration. Thus, Nisbet sees the question as presenting natural selection so that the appropriate answer will follow a particular logical pathway. I don't think it is accurate to say that humans will subject any sufficiently ambiguous question to a predictable logical pathway.

We may never know how many people fail to parse the question at all, in terms of differentiating the framework of science from that of faith, because we desperately need to believe these types of questions give us answers.


Social Production Models vs Prescription

I was listening to Russ Roberts interview Yochai Benkler on his podcast about regulatory frameworks for national infrastructure, and realized that conservative arguments often include a call to principles for guidance, whereas researchers demand examination of evidence.

Roberts is a libertarian economist who assumes that regulation will always stifle innovation. But Benkler showed, using evidence from 2000 to 2010, that lack of regulation took the U.S. from first or second place to about fifteenth in network speed, access, and cost.

In education, we face a similar situation, though probably not for the same reasons. Unfortunately, education reform has a history of panic and prescription.

Careful examination of evidence can allow us to create a community from which to foster change. It is our ability to create community, and the freedom to do so, that matters. Not restriction but freedom creates the ability to innovate.

In a highly restrictive environment, any tiny move can be seen as a huge one because we are focusing on conflict rather than cooperation. These tiny moves have no cumulative effect because they are not mutually reinforcing.


Accountability and Measurements of Effectiveness

You could also call this "The Mismeasure of Teaching": my thoughts on evaluating work product. There are two kinds of work: (1) easily measured productive work, and (2) creative work that is hard to measure.

Typical evaluation techniques such as checklists are used for production: counts of assignments, distribution of test grades, presence of the "essential question" on the whiteboard, presence of lesson-related vocabulary on the wall, number of students off-task over a quarter hour, actions taken by staff, presence of discussion, student evaluations. This is typical list-based management of the outward and visible signs. But is it possible to take the theological next step and read signs of the inward and spiritual state? Does it follow logically, and can it be reliable?

For creative work, the chapter on the economics of social production in Yochai Benkler's The Wealth of Networks notes the inherent difficulty of distinguishing quantity of labor from quality of labor in creative output. His conclusion is that a traditional, broadly based model of production cannot produce increased qualitative output.

In "The Structure of Educational Organizations" (John W. Meyer and Brian Rowan, 1978, pp. 79-109), Meyer and Rowan identify the lack of evaluation procedures as resulting from a lack of will-to-measure and a consequent reliance on credentialism.

Benkler hypothesizes that peer review is an effective method of measuring creative work in a collaborative environment. Such work may be networked and carried out in environments outside the workplace, so internal measurement by checklist will miss it completely. Peer review takes into account publication, involvement in publication, workshops, symposia, and production and collaboration with people outside the institution.

To the extent that teaching is simple production, evaluation using simple metrics should suffice. But to the extent that it is creative, the checkoff sheet will fail to measure creative and networked production.

I believe management technique may improperly identify creativity in the K-12 environment as "enthusiasm." So while it is being taken into consideration, it is mismeasured. Another problem is professionalization. To the extent lawmakers