Tzimiskes: On Interpreting Data

The Economist recently had a couple of blog posts about using data to support political arguments. In the comments there were at least a few allusions to the common notion that data can be used to prove just about anything.

This is true in only the most trivial sense. Just about anything can be "proven"* by sufficiently restricting the data and by choosing exactly the right level of abstraction at which to discuss it. But doing this is just silly. Human beings are pattern-finding animals; we see shapes in clouds and Jesus on our toast, for chrissake. Carefully cherry-picking your data and choosing an arbitrary level of abstraction is equivalent to saying that you can see a face in a rock if you stand on your hands, angle your head just right, and squint with one eye. While this may give the appearance of a face, it compares poorly to a statue, where you can tell that the rock is a face by sight, by touch, by measurement, and by many other approaches to the available evidence. The image produced by one particular method is likely random chance; the image that can be detected by multiple methods is far more likely to have been produced by intent and to be what it is said to be.
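To make the squinting concrete, here is a minimal sketch in Python. Everything in it is my own illustrative assumption: the function names, the 200 candidate series, the 10 data points each, and the random seed. It generates pure noise, then goes looking for whichever noise series happens to line up best with a noise "target". Nothing in the data is related to anything else, yet the hand-picked winner will usually look like an impressive correlation next to the typical candidate.

```python
import random
import statistics


def correlation(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def cherry_pick(n_series=200, n_points=10, seed=0):
    """Return (best, typical) absolute correlation against a noise target.

    Every series, including the target, is independent random noise,
    so any strong correlation found here is an artifact of selection.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    strengths = []
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        strengths.append(abs(correlation(candidate, target)))
    # "best" is the cherry-picked result; "typical" is the median candidate.
    return max(strengths), statistics.median(strengths)


best, typical = cherry_pick()
print(f"cherry-picked: {best:.2f}, typical: {typical:.2f}")
```

The cherry-picked series is the face in the rock: it shows up only because we tried 200 angles and reported the one that worked. Draw fresh data for that same "winner" and the pattern should vanish.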

This basic approach is what is meant when it is said that an idea or theory is well supported. If the data is selected carefully enough and the argument is made carefully enough, the data can be made to look like just about anything. This is the intellectual equivalent of needing to squint just right. However, it is almost impossible for a false image to be generated through multiple distinct approaches. An idea that is supported by comparative evidence and by modeling, that is independently arrived at by multiple fields (say, economics, sociology, and political science all agreeing), that remains true under multiple levels of abstraction, that remains visible under multiple sets of assumptions, and that has survived testing by multiple independent teams is like the face in a statue that can be detected by sight, touch, and measurement. Multiple lines of evidence mean that something is likely true, and the more independent lines of evidence there are, the more likely it is to be true. If an idea is instead asserted by only one group, uses an extremely restrictive set of assumptions, lacks agreement with other fields, uses a high level of abstraction, and lacks independent confirmation, it is almost always going to be BS.
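One rough way to see why independent lines of evidence add up is the odds form of Bayes' rule, where each line of evidence multiplies the odds in favor of an idea by its likelihood ratio. The sketch below is only an illustration under stated assumptions: the function name is mine, the 50/50 prior and the likelihood ratio of 3 are made-up numbers, and the multiplication is only valid if the lines of evidence really are independent of one another.

```python
def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability with independent lines of evidence.

    Each likelihood ratio says how much more likely that piece of
    evidence is if the idea is true than if it is false. Multiplying
    them together assumes the lines of evidence are independent.
    """
    odds = prior / (1 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1 + odds)


# Hypothetical numbers: start undecided, each line of evidence is
# three times likelier if the idea is true than if it is false.
one_line = posterior_probability(0.5, [3])
four_lines = posterior_probability(0.5, [3, 3, 3, 3])
print(f"one line: {one_line:.3f}, four lines: {four_lines:.3f}")
```

A single modest line of evidence leaves plenty of room for doubt, while four of them compound quickly. The catch is the independence assumption: four studies that all lean on the same restrictive assumptions are closer to one line of evidence than to four.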

Now, the contrast in the social sciences is rarely as clear as in my example above. We're dealing with worn statues here, not perfectly preserved pieces. But keeping this in mind does give a rough and ready approach to the validity of various arguments. Generally, the better argument will be the one that adds data to the discussion rather than trying to reject it,** the one that suggests new approaches, the one that uses less restrictive assumptions, the one that is less abstract, and so on. Approached in this way, it becomes fairly easy to assess the validity of the arguments being made with only a fairly cursory knowledge of the subject. Someone who has the confidence to suggest reading widely and deeply and who can point you towards multiple fields to gather evidence is fairly likely to be right. Someone who denigrates other approaches and tells you to use their interpretation or you're doing it wrong is probably wrong themselves.

Now, a separate issue is what we should do with the data once we've agreed on a rough and ready way of interpreting it and moved past the idea that all data is equal. But that is for another post.

* Of course, to a considerable extent it's wrong to look to data or the scientific method to prove anything. The scientific method works primarily by telling us what's not true rather than what is. A lot of people seem to have a problem with this, preferring something more akin to medieval scholasticism, where data is gathered to prove, rather than disprove, what is already known. It is fairly evident from experience that the scientific method works rather better; it is more effective to show conclusively what is not true than to build too much upon what may turn out to be false premises. Of course, both policymakers and the public would prefer to be given a set of rules to follow to a best outcome, rather than simply a more restricted set of possibilities to choose among and some general guidelines. Unfortunately for this viewpoint, we have social scientists, not the social engineers that this outlook really wants. This desire does tend to lead to an unfortunate reliance on the naturalist fallacy in political discourse, where a theoretical perspective is taken to be a state of nature rather than a simple set of methodological assumptions on which to conduct further testing.

** An important caveat here is that sometimes people make things up. This is why there are standard methodologies for many purposes; these have been heavily investigated and shown to work. In cases where the counterargument is that the data was collected badly, it may be that the stronger argument is the one attacking the data. This will not always be the case; sometimes a new approach is better than the old and is facing criticism for it. However, this isn't the case when dealing with older arguments and approaches, so criticisms of well-established methods and fields shouldn't be taken seriously. For instance, I see comparative methods criticized a lot because the resulting data is often very inconvenient for certain approaches. Since comparative evidence tends to work well and is independently confirmed by many other approaches, attempts to discredit it usually reflect poorly on the person doing the criticizing, hinting that their own pet approach lacks independent confirmation.


Tuesday, February 14, 2012
