The Status of Simulations

Most of what would ordinarily be blogging time this morning got used up writing a response to a question at the Physics Stack Exchange. But having put all that effort in over there, I might as well put it to use here, too…

The question comes from a person who did a poster on terminology at the recently concluded American Geophysical Union meeting, offering the following definition of “data”:

Values collected as part of a scientific investigation; may be qualified as ‘science data’. This includes uncalibrated values (raw data), derived values (calibrated data), and other transformations of the values (processed data).

In response, he got a note saying:

You have a bias here towards observational data. Need to recognize that a lot of data comes from models and analyses.

The question is phrased as, basically, “What constitutes ‘data’?” but really it’s about the status given to simulation results within science.

This is, of course, a politically loaded question, which is probably why it got this response at the AGU, where there are people who work on climate change issues. Given the concerted effort in some quarters to cast doubt on the science of climate change in part by disparaging models and simulations as having lesser status, scientifically, it’s not surprising that people would be a little touchy about anything that seems to lean in that direction.

As for the actual status of models and simulations, that varies from (sub)field to (sub)field, more or less in accordance with how difficult it is to interpret experimental or observational data. My own field of experimental Atomic, Molecular, and Optical physics has a fairly clear divide between experiment and theory, largely because the experiments we do are relatively unambiguous: an atom either absorbs light or doesn’t, or it’s either in this position or that one. We do need simulations to compare to some experiments, but there’s never much question that those are theory, and not part of the experiment. The correspondence between things like density distributions observed in experiments and those generated by simulations is often close to perfect, differing only by a bit of noise in the experimental data.

When you get to nuclear and particle physics, where the detectors are the size of office buildings, the line gets a little fuzzier. The systems they use to detect and identify the products of collisions between particles are so complicated that it’s impossible to interpret what happens without a significant amount of simulation. As a result, experimental nuclear and particle physicists spend a great deal of time generating and analyzing simulated data, in order to account for issues of detector efficiency and so on. I don’t think they would call these results “data” per se, but computational simulation is an absolutely essential part of experimental physics in those fields, and those simulations are accorded more status than they would be in AMO physics. Experimental nuclear and particle physicists spend almost as much time writing computer code as theoretical AMO physicists, at least from what I’ve seen.

The situation gets even more complicated when you get to parts of physics that are fundamentally observational rather than experimental. If you’re a particle physicist, you can repeat your experiments millions or billions of times, and build up a very good statistical understanding of what happens. If you’re an astrophysicist or a geophysicist, you only get one data run– we have only one observable universe, and only one Earth within it to study. You can’t rewind the history of the observable universe and try it again with slightly different input parameters. Unless you do it in a simulation.

My outsider’s understanding of those fields is that simulation and modeling are accorded a much higher status than in my corner of physics, just out of necessity. If you want to use a physical model to explain some geological or astrophysical phenomenon, the only way you can really do it is by running a whole lot of simulations, and showing that the single reality we observe is a plausible result of your models. Correctly interpreting and establishing correspondences between simulations and observation is a subtle and complicated business, and constitutes a huge proportion of the work in those communities.
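To make the "run a whole lot of simulations" idea concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the toy model (a quantity that drifts upward with random noise), the parameter name `sensitivity`, and the "observed" value of 21.0 are all hypothetical, and real geophysical or astrophysical codes are vastly more complicated. The point is only the logic: run the model many times, then ask how typical the one observed outcome is among the simulated ones.

```python
import random
import statistics


def run_toy_model(sensitivity, n_steps=100, noise=0.5, rng=None):
    """Hypothetical toy model: a value that drifts by `sensitivity` per step,
    plus Gaussian step-to-step noise. Returns the final value."""
    if rng is None:
        rng = random.Random()
    value = 0.0
    for _ in range(n_steps):
        value += sensitivity + rng.gauss(0.0, noise)
    return value


def ensemble(sensitivity, n_runs=500, seed=0):
    """Run the toy model many times with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return [run_toy_model(sensitivity, rng=rng) for _ in range(n_runs)]


def fraction_at_least_as_extreme(outcomes, observed):
    """Fraction of simulated outcomes lying at least as far from the
    ensemble mean as the single observed value does."""
    center = statistics.fmean(outcomes)
    distance = abs(observed - center)
    return sum(abs(x - center) >= distance for x in outcomes) / len(outcomes)


observed = 21.0  # made-up "single reality"; not a real measurement

# Compare two candidate parameter values against the one observation.
for sensitivity in (0.2, 0.5):
    runs = ensemble(sensitivity)
    frac = fraction_at_least_as_extreme(runs, observed)
    print(f"sensitivity={sensitivity}: ensemble mean={statistics.fmean(runs):.1f}, "
          f"fraction at least as extreme as observed={frac:.3f}")
```

A large fraction says the observation looks like an ordinary outcome of the model with that parameter value; a fraction near zero says the model, as configured, essentially never produces anything like what we actually see. That is a cartoon of the real business, which also has to deal with the uncertainties in the observation itself.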

I don’t know that many astrophysicists, and even fewer geophysicists, so I don’t know the terminology they use. My impression of the astrophysics talks I’ve seen is that they wouldn’t put such simulation results on the same level as observational data or experiments, but then my sample isn’t remotely representative. It may well be that there are fields in which model results are deemed “data” in the local jargon.

Of course, part of the reason for moving this over here is that there will be many more geology/climate science types hanging around ScienceBlogs than there are at the Physics Stack Exchange, so there’s a good chance of getting some clarification from within the relevant communities. The question may even extend to fields outside the physical sciences: I know even less about biology than geophysics, so for all I know this is a question that comes up there, too. If you work in a field where simulation results are commonly termed “data,” leave a comment and let me know.

23 thoughts on “The Status of Simulations”

  1. I don’t understand – what part of that definition leans toward observational data, or away from simulations and analysis? What part of it says that “Values collected as part of a scientific investigation” can’t come from a model?

    In other words, how could one improve the definition to incorporate the response, without simply saying “Data can include values derived through observations, modelling, or analysis?” (I see that sort of statement as redundant – if the rest of the definition is written poorly, then the caveat will contradict the definition; if written well, then it’s superfluous.)

    For the record, I’m a lawyer, not a scientist.


  2. I am a climate scientist. Complex numerical modeling of the climate system is a giant neighborhood in the climate science community. There are a great many scientists who work almost entirely within the world of numerical climate models. Many of them refer to the output of climate models as “data”. This has been exacerbated by the release of huge archives of climate model output on public websites (the CMIP archive hosted by LLNL) which allows anyone to download and analyze the output from more than 20 climate models. This is accomplished in much the same fashion that large observational groups, such as NASA, share publicly-funded observational data. The CMIP archive has been a great development for opening up the world of climate modeling to any interested party (and to valuable scrutiny by any interested party). But it has, in my opinion, blurred the line between data and simulation output.

    My opinion is that the outputs of numerical models should not be referred to as data. Our technical language, particularly in the earth sciences, should distinguish between observations of the one true realization of the earth’s climate and the representations of possible (but not necessarily likely or even plausible) climate states that the models spit out.

    None of this should be taken as demoting the “status of simulations”. I work with models and observations. They both have their place in climate science.

  3. To some extent, this is a pragmatic issue: Data is data, simulated or not, when you get to the part where you put the data into the equation/software/visualization system.

    But we do have a strong cultural bias, with deep roots and a valid raison d’etre, to refer to information we collect from observation as data. THAT kind of data has a special place in the “Scientific Method” and all that.

    One problem with this is that simulation results are sometimes seen, within various fields (certainly in bioanthropology) as “made up data” and “made up data” is sometimes very very evil. If, that is, you called it observation and really just made it up.

    Even the term “simulated data” has a negative connotation. Perhaps “simulation data” is the ideal term that Eric seems to imply that we need.

    I have more than once found myself in a room full of screeching anthropologists insisting that bootstrapping was unethical because it involved made up data. Fortunately, they were my students in my classroom so I had a chance of doing something about it.

  4. Chad: “Given the concerted effort in some quarters to cast doubt on the science of climate change in part by disparaging models and simulations as having lesser status, scientifically…”

    Can any scientist really doubt that models and simulations have lesser status, scientifically, than observational/experimental data? This is THE cornerstone of empiricism and science.

    If your model says jumping from your window is safe and your experiment ends up with you dead on the pavement will anyone argue that your model got it right?

    Yeah, there is a whole spectrum here, since extracting data from observations/experiments usually takes some kind of models, but the general rule is pretty simple – each step away from direct observation, each layer of theory, model or simulation, lowers the reliability of the conclusions.

    For a trivial example, let’s say you measure voltage at some point in a light detecting circuit and your meter shows 10 Volts. The statement that the meter showed 10 Volts is the most reliable one. The statement that there was a 10 V potential there in the circuit is less reliable, since it assumes your meter is working as intended. The statement that this means a 20 mA current was flowing between those points is even less reliable, since it relies on correct knowledge of the impedance and many other aspects of the circuit. Finally, the statement that it means the light intensity was such and such is less reliable still, since it assumes the circuit is working as intended. And so on: each step away from the empirical fact that the meter showed 10 V involves additional assumptions and models, which lowers the reliability of the conclusion.

  5. “each step away from direct observation, each layer of theory, model or simulation, lowers the reliability of the conclusions.”

    I’d have to strongly disagree with this. What if you’re dealing with some process where the only data you have is very far removed from the system? (I’m thinking specifically of things like behavior of Earth’s magnetic field.) Models can do a lot to illuminate the mechanisms that cause the behavior because we can observe what is happening with the model while we cannot ever directly observe what is happening inside the planet: we can only see the generated field and examine historical evidence for what it might have looked like in the past (which is already a somewhat contentious debate). In reality, those observations don’t tell us how the field is generated, and the equations are non-linear and work over huge ranges of scales. There’s no straightforward way to describe the behavior as only the simplest cases are intuitive. The models, however, are extremely illuminating and are increasingly reliable as more computational power and thus better resolution become available.

  6. “each step away from direct observation, each layer of theory, model or simulation, lowers the reliability of the conclusions.”

    This is hogwash; and for reasons more fundamental than suggested by Cherish. An observation that lacks a theory to explain it doesn’t provide much understanding.

  7. @5 & 6: I interpret Paul’s statement as “the more complicated the model (once you include all the intermediate layers of theory)…”

    To use Cherish’s magnetic field example, if you *could* observe what’s happening inside the Earth and built a model based on that, it would likely be better than one based on modeling the unobservable. Chad says essentially the same thing re: particle physics.

    Eric– I don’t think anyone’s arguing that observation without theory is more meaningful. Paul’s statement that “The statement that the meter showed 10 Volts is the most reliable one” is accurate. It’s not terribly meaningful, but it is more reliable.

    This sort of issue comes up all the time in financial modeling as well. It’s always very tempting to use the result of a [smoothed w/ nice boundary behavior] model as input to another one, but you should avoid that where possible.

  8. You two (Cherish and Eric) completely missed the point. I am not talking about understanding; I am talking about *reliability*. All the understanding a model has to offer is worthless if its reliability is close to zero or undefined.

  9. As a professional data torturer, I don’t see any difference in principle between “real” and simulated data: it’s all information that is to be processed. There are even statistical techniques for dealing specifically with simulations.

    Within biology, I think simulations aren’t necessarily seen as better or worse, but just as telling us something different: the model is an abstraction, an idealisation, of the Real World.

  10. Speaking to amplify Cherish’s point and to address Paul’s point, I recall doing several experiments as an undergrad physics student in which the values I was actually recording were far, far removed from the values I was trying to determine.

    The fact is, you really can’t directly measure the energy levels of the nucleons in the atoms of a sample, but if you have the Standard Model from QM you can take the values you can measure (which in a typical Mössbauer spectroscopy experiment would be things like histogram data from an event detector and the linear velocity of your solenoid) and derive those energy levels.

    I understand that the Standard Model is currently considered incomplete or inadequate, but it’s nevertheless valuable because it makes accurate predictions which are usable to physicists and engineers alike. Without the mathematical models made possible by the Standard Model, these kinds of calculations would be impossible, and if any of the assumptions or relationships within the Standard Model were found to be grossly incorrect or inaccurate, any data derived through its equations would be rendered meaningless, just as the model itself would collapse like a proverbial house of cards.

    But we have enough confidence in this particular model, despite its flaws, that we don’t anticipate such a drastic outcome; after all, we engineer semiconductor devices of staggering complexity which push quantum limits, and we are engineering quantum computers and cryptography systems using those same models. Similarly, we know that General Relativity has shortcomings, but we don’t omit relativistic corrections from our GPS software. (At least clock-skew is easier to directly measure, assuming you trust how your atomic clock works…)

    Consider the field of astronomy, where the things you’re measuring are even more remote than in any other scientific discipline. Trying to determine something as simple as distance to some object can be fraught with difficulty, and many estimates have been revised as our tools have gotten better. Still, determining distance relies upon a ladder or hierarchy of metrics, with overlap between the different ranges for each technique at the astronomer’s disposal — the overlap is what gives you confidence that the results for one distance estimate are valid when other objects can be measured using both that and a different method. As long as the methodology for determining some astrophysical results is well-published, so that the model can be tested and analyzed, there’s no reason to reject the calculated results derived by an astrophysicist in favor of the raw observational data from an astronomer. (But don’t throw away the raw data! Someone has to check the work, or maybe even do fresh analysis.)

    If we were to apply Chad’s reasoning to the fields of astronomy and astrophysics, I suspect there would be precious few trustworthy conclusions one could draw about anything.

  11. I am also a climate scientist. It’s not uncommon to refer to model output as “data”, especially by modelers and software engineering types. I do data-model comparisons, and I try to distinguish between “model output” and “data”. (Although “data” can mean either direct observations or observational “data products” that involve modeling). When I work with statisticians they often refer to everything, observations and model output, as “data”, and this can be confusing.

    Regarding the primacy of observations over theory, fine, I understand the perspective: reality is of course the ultimate arbiter of whether a theory is right or wrong. But, in practice, observational data are not always more “reliable” than theory. (Sparse data, high noise, uncontrolled confounding factors, unknown systematic biases, poorly characterized relationships between measured and inferred quantities, etc.) Not every field of science has abundant clean data that can be measured out to ten decimal places.

    Consequently, this makes “falsification” much harder. You can’t always just say that data and theory are “inconsistent” and therefore the theory is ruled out. You also have to wonder whether the data are flawed, because in reality it’s not just theory that can be wrong, but data too. (Or, at least, the data don’t necessarily mean what you think.)

    Rather than “falsification”, I prefer to think of science as iterative theory refinement. This is a broad process which involves looking at the totality of both observational and theoretical evidence, the credibility of individual measurements and calculations, and overall coherency. Does the balance of evidence undermine the credibility of a theory, or support it?

  12. Paul, how is creating a model based on the fundamental laws of physics (i.e. Maxwell’s equations and the laws governing fluid dynamics) that creates a field similar to that observed at the surface of the planet unreliable? When were the laws of physics invalidated?

    I’m not sure what you’re getting at with ‘reliability’. Do you mean that there are errors in measurements when you have to infer some of the information? That’s true, but you also can make a reasonable estimate about what the error is. If a model based on fundamental laws is producing results that are close to the empirical data, I’d have a hard time believing it isn’t valid. Further, any program of that sort of complexity has to be validated at many levels. It has to be able to successfully produce results that can be observed before you throw it at a problem that complicated. If you can’t first reproduce results from a ball of spinning ferromagnetic fluid from a lab, then obviously you aren’t going to tackle a planetary interior until you know the code works correctly.

    It seems to me that what you are saying is that models have no basis in reality. That is false.

  13. I mostly agree with FB’s take. I admire his (her?) attempt to distinguish “model output” from “data”.

    Contrary to Chad’s original article, I’d claim that this nomenclature issue occasionally pops up in AMO physics as well. I’ve heard theorists talk about the correspondence between “theory” (their analytic model) and “data” (the results of their numerical simulation). Which drives me nuts.

    Admittedly this is a bit of a different issue than what goes on with nuclear/particle/climate physics. But we need to update the language to differentiate between data (which, according to my religious beliefs, is something you can only get from an experiment) and the results of simulations.

  14. I think it works better if you consider data as evidence, since the point of collecting data is to support an argument. Scientific argument follows the rules of rhetoric whether one is forensic, arguing the past, or hortatory, arguing the future. Real world observations are essential for forensic argument, since the argument is about what caused what and what happened. On the other hand, the results of simulations can be used for arguing about the past or about the future.

    It makes perfect sense to use the results of a simulation as evidence, but it is evidence about the behavior of the model. It has to be combined with evidence of real world observations to support arguments about the behavior of the real world.

    P.S. The mathematician Gian Carlo Rota considered mathematical proofs to be evidence which could be used to understand mathematical entities. It was rarely obvious which statements were the axioms and which the theorems, but whichever were chosen, the structure of the proof could provide valuable insight.

  15. I find some of the opinions expressed above deeply disturbing; it is hugely important that we maintain a clear separation between observational and experimental data from the Real World, and numbers generated by computer simulations.

    Isaac Newton appreciated the distinction when he promoted the primacy of experiments and pooh-poohed those who tried to argue about the structure of nature without experimenting. Let’s not go backwards now.

    In the computer science sense, of course all numbers are “data”. That’s just a different use of the term and should not trouble us or cause confusion. But a “datum” (from the Latin, meaning “a given [thing]”) in science should be a privileged item.

    Data is never completely raw – the example of the reading on a voltmeter is a good one – but there is a world of difference between a measured item on a super-strong well-verified inferential chain and a number puffed out of a computer using the researchers’ pet models.

    For example, there is a difference between the straightforward algorithms that recover particle tracks from detectors and the more contentious calculations that aggregate these to infer the existence of particles that cannot be observed as directly.

    And of course data needs to be “cleaned” sometimes; raw data is privileged but not necessarily pure. A voltmeter can lose its calibration, a rainfall collector can fill with rubbish, a recorder can record the wrong number.

    A recent post somewhere on ScienceBlogs from a climate scientist whinged that it was unreasonable to demand publication of models for review, and referred to the informal process of aggregation and “verification” of models, and adjustment of parameters. The implication was that the plebs should accept that the masters of climate science look on this work and pronounce it satisfactory. Very, very unconvincing. With that post, and this one, I have become even more unhappy about the status of climate science. Most of us have neither the expertise nor the time to dig deep, and to find that many think that the numerical diarrhea generated by computer models (fiddle factors adjusted to the researchers’ preference) might be accorded the same respect as careful measurements is very worrying.

    I suspect the climate scientists are generally correct. But I don’t trust the climate science community. There is none of the rigor demanded of medical trials or epidemiology in their work.

  16. Sam,

    If you think that the commenters here have claimed that computer models and measured data are equivalent, I seriously suggest that you re-read the comments.

    What commenters have argued against is the position that theory and modeling are useless or unreliable (as you imply with your “numerical diarrhea” comment), or that measured data is always more reliable than theory.

  17. (note — I’m the one who posted the question to stackexchange that spawned this discussion)

    Re: Sam C’s comment (#16) : we actually have models of how sensors degrade over time and how sensitivity in one waveband is related to sensitivity of the light from the test lamp (imagine a 15 year old telescope that’s been pointed at the sun — it gets some burn-in on the CCD [1], and being that it’s in space, you can’t go and swap it out for a fresh one), and that’s used to generate the calibrated data which most scientists use without question.

    For newer missions (eg, SDO), except for EVE, the data is only being released after calibration. (and we had to fight for a reversible calibration, and not data which had a point spread function applied and forced to integers)

    … at the very least, I likely need to differentiate between ‘simulations’ and ‘models’, as the term ‘model’ is used to describe formulas, simulations, specific runs of simulations, multi-dimensional maps, etc.

    And for publication, I think that editors should require that, at the very least, the peer reviewers have access to the underlying data used in the analysis, and information about where it came from and how it was processed. Since peer-reviewed papers are currently the main metric of productivity (ie, tenure, etc.), if we can get the gate-keepers on board, I think we have a chance of fixing the culture that accepts a lack of data transparency.


  18. Sam, you’re totally right. And while we’re at it, let’s stop using computer models to predict the weather, engineer prescription drugs, design cell phone towers, find mineral deposits under the ground, develop an understanding of cancer and other biological processes…all that stuff. Obviously if it’s made on computers, it must be junk and useless, right?

    You may not realize it, but nearly every field of science and engineering uses computers to model complicated processes. Take a look at Nvidia’s site as one example and you can see the breadth of fields that use large-scale computing. Most of these fields take for granted that they can use computers to solve their problems, and most don’t do anything differently than climate science does. It’s just that climate science has been under intense scrutiny, so everyone assumes that it is somehow flawed because it uses computers.

    The inputs to these processes use ‘real’ data, and the outputs need to be measured and evaluated against observational data. Generally, we know that these models can produce some very good science and engineering because you’re living with the results every day. For those who use their modeling to tackle less pragmatic or immediate subjects, peer review comes in to evaluate the validity. If you’re modeling something with an obnoxiously unrealistic parameter regime, it’s fair to say that an observant reviewer is going to ask you what you were thinking. Further, some fields employ benchmarks: if your simulation cannot produce a required set of phenomena, you don’t meet the benchmark and your results can’t be trusted.

    Most modelers don’t use ‘fiddle factors to adjust to the researchers preference’. They evaluate the system at a set of regimes to see how the system responds. This is good science because it’s important to know where your system works, where it doesn’t, and whether the results are unexpected due to the model or the physical system itself.

    Using computers does have its drawbacks, as there will never be enough computational power to have full resolution of all systems at reasonable timescales. However, arguing that computer models produce useless or highly suspect data because of this is like saying we should never make macroscopic approximations to systems and should always rely on quantum mechanics to figure out what’s going to happen because it’s so much more precise.

  19. Interesting to see some cultural divide here. Looks to me that your attitude depends on how complicated your measuring device is, and on the phenomena you are trying to describe. On the one extreme there are the “voltmeter” people, who seem to ignore any layer of interpretation involved in their measurements, thinking about measuring as a simple and unambiguous process. Nice work if you can get it… On the other extreme, high energy experimentalists are confronted with a couple of GB of data per second… Both the collisions and the measuring device are extremely complicated systems, and a big part of the effort is figuring out what constitutes a measurement, what is noise and what is signal, in other words how to interpret the data (which means lots and lots of modelling and simulations). I expect this is a big part of climate science (about which I don’t know all that much).

    I’d argue that this layer of interpretation always exists in any measurement. When the measuring device is simple and/or well-understood this can be done in an intuitive oral-tradition kind of way. When the system is complicated enough this layer has to be explicitly considered and systematized. I’d also argue that such a systematic approach makes the measurement more, not less, reliable. But, I’m only a theorist, and I admit to not having any strong opinions on what the scientific method ought to be…

  20. Sam (#16): The issue of publishing climate model code is a strawman. The codes for most major climate models are publicly available and have been for many years – at least those developed in the US. Indeed the models have many tuning parameters (which are really physical coefficients, such as ice crystal fall speeds and turbulent mixing coefficients, not merely unphysical knobs). Many are chosen based on observations, rather than whim; however, the most crucial parameters (those that determine the properties and behavior of clouds) are only observed with huge uncertainty bars (note that these measurement uncertainties persist in spite of tremendous rigor in the measurement strategy). Therefore the modelers are only helped marginally by the observational community. Nevertheless, the sensitivity of the model results to tuning parameter values across the range of values within the measurement uncertainty has been probed and provides a powerful constraint (though not a complete constraint) on the range of possible future warming for a given increase in greenhouse gases. All of this is well documented in the IPCC reports and the scientific literature. If this is not clearly conveyed by a few voices in the blogs, then that is unfortunate. But the peer-reviewed literature and the assessments upon which they rest clearly document the uncertainties related to climate modeling.

    Asking for the same level of rigor in climate science as in medical trials is not particularly helpful. Medical trials rely on carefully controlled experiments with an experimental group and a control group. Doing so for the climate system would be nice. But alas, we have only one Earth. I would be interested to know how the epidemiology community is a better example of rigor. They, like climate science, are inhibited by a lack of carefully controlled experiments.

  21. I am a theorist/computational scientist who works closely with experimentalists. We always refer to “numerical/simulation data” and “experimental data”. They are both data, and neither is flawless; both come with assumptions and approximations. Theory and numerics start from a hypothesis of “why” something happens and try to describe a particular phenomenon; you test the “why” (our understanding) by testing the prediction of the phenomenon with the experimental observation of the phenomenon. This process is at the core of the scientific method, so I don’t understand people who think so little of theory/simulation.

  22. “the experiments we do are relatively unambiguous: an atom either absorbs light or doesn’t, or it’s either in this position or that one.” Well… dogs know that quantum mechanics allows the superposition of states, right?

Comments are closed.