Why Does Excel Suck So Much?

Yesterday’s bad graphic post spurred me to finally get around to doing the “Why Does Excel Suck So Much?” post I’ve been meaning to do for a while. I gripe about Excel a lot, as we’re more or less forced to use it for data analysis in the intro labs (students who have taken the intro engineering course supposedly are taught how to work with Excel, and it’s kind of difficult to buy a computer without it these days, so it eliminates the “I couldn’t do anything with the data” excuse for not doing lab reports). This is a constant source of irritation, as the default settings are carefully chosen so as to make it difficult for students to do a good job of data presentation.

Let’s say, for example, that we’re doing a lab that asks students to measure two things that are proportional to one another– the magnetic field along the axis of a coil for various currents, for example. The graph I get from students is almost always something like this:

i-ffebd5f9da0ddfba093f38a592e4d222-lineplot.jpg

These students have done a terrible job of taking the data, right? No, they did the data collection right, but were tripped up by Excel.

I’ve constructed my fake data so as to make the effect as bad as possible, but this is the sort of thing that really happens with a lot of graphs. What we have here simulates data that was collected in a perfectly responsible manner, by taking half the points while increasing the current, and the second half filling in the gaps between the first set while decreasing the current. This is a fairly standard experimental technique to guard against things changing over time.

The garbage graph your see results from students using the “Chart Wizard” and choosing the first thing that looks reasonable, a “Line” plot. This is helpfully indicated by a little graphic showing points connected by lines, and it draws students like rotting meat draws flies.

The problem is, a “Line” plot as defined by Excel is a type of plot that is utterly unsuited to scientific data: it plots a series of points in the order in which they occur, spaced equally along the horizontal axis. The horizontal axis labels are just that: labels. This may superficially resemble a graph of magnetic field vs. current, but in fact it’s a graph of magnetic field vs. row number, with the points helpfully labelled by the current corresponding to that row number.

What you want if you’re attempting to do science with Excel is a “Scatter” plot, a little farther down the list. If you choose “Scatter,” the resulting graph looks like this:

i-cf18aabcf69139d84037c1ed2d3b8d7f-scatterplot.jpg

This is at least the right kind of plot, but it’s awful in almost every other respect. There’s the godawful grey background, the inexplicable pastel color scheme, and the axes running right through the middle of the plot, rendering them completely illegible.

There is no way that I know of to change any of these defaults, and fixing them takes an inordinate amount of work. You have to right-click to get “Format Chart Area” to remove the grey, and that’s the easiest of the lot. Changing the axes requires you to format each individual axis, change the maximum and minimum values (or at least tell it not to auto-scale the axes, in case anything else changes), and set the crossing point of the other axis. Changing the data points to colors that don’t resemble one of those eye tests they give you in grade school requires you to change both the “foreground” and “background” colors of each series, and “foreground” and “background” are not defined the way you might think.

After a good deal of pointing and clicking, you can end up with something halfway presentable:

i-70ec01a4a280c8a008354b1c34891fcb-scatterplot_fixed.jpg

Of course, when you realize that you’re going to need to do all this every time you make a graph, the several hundred dollar price tag for a real scientific graphing package doesn’t seem so bad.

Of course, it’s rare to have a lab in which you just plot points with no further analysis. Usually, we really care about the slope of the line, or something like that, so let’s look at how Excel does there. It’s actually quite simple to superficially do the right thing, by right-clicking a point and adding a “Trendline.” You can even get it to display some information (I’ve trimmed the graph down to a single series, for clarity):

i-20f56a27698b393f2e94deb8af4ac2e7-trendline.jpg

Of course, what’s shown on that graph is the sum total of the information you can get about the trendline: an equation, and an R2 value. Which is fine, if you’re doing biology or something, but in physics, we care about the numerical values of things, and the uncertainties in those numerical values.

Having the slope of the line is nice, and all, but if you’re going to do anything with it, you need the uncertainty in the slope. Maybe there’s some way to convert that R2 value into an uncertainty, but I don’t know what it is, and you can be sure that our students have absolutely no idea. Which sort of blows the whole idea of uncertainty analysis out of the water, unless you want to make them do least-squares regressions by hand.

There is another tool that can be brought to bear here– if you have the right version of Excel, and have installed it from the disks, you can get the “Data Analysis ToolPak,” which will do a linear regression relatively painlessly, and give you the uncertainty in the slope and intercept. This brings uncertainty propagation back into play, pedagogically, though it forces some contortions in order to get the data into the right form to show a linear relationship. That’s not entirely a bad thing, but it does make for some additional headaches.

(And, of course, the Data Analysis ToolPak was entirely too useful, and thus has been removed from the new version of Excel. Probably because it didn’t fit nicely in their godawful “ribbon” concept for the interface.)

Now, you might be saying “Well, of course Excel isn’t appropriate for scientific data analysis. It’s not really for scientists, though.” Which is true, but here’s the thing: the things I’ve complained about here aren’t good for anything. The color schemes and axis settings lead to illegible plots no matter what sort of data you’re working with. And I’m completely at a loss as to the purpose of the “Line” plot, or making it difficult to find uncertainties in fitted quantitites.

Another thing you might be saying is “But all of these pictures are from the old Excel– the newer version is much more swizzy!” I used the older version for my demo pictures above, because that’s what’s on the computers in the teaching labs. But just for fun, here’s what you get from the new version:

i-11e8f0b2e965443bc0e501086e334c0f-new_excel.jpg

I’ll admit, there are a few improvements. The dreadful grey background is gone, and the colors, while still weirdly muted, at least have a reasonable contrast with the background. And at least one of the axes is moved out of the middle of the goddamn plot, making it halfway readable. The data points have become cartoonishly large, but that’s a relatively minor problem.

If you’re perceptive, though, you’ll notice something missing. Something… kind of important. That’s right, it defaults to no axis labels at all. Fan-tastic.

Happily, the lack of the analysis tools probably renders this all a moot point, starting next year when ITS upgrades the computers in the teaching labs. Unless somebody in Redmond picks up a clue and provides a simple tool to do fits with uncertainties, I’m scrapping the whole stupid program, and forcing the engineers to learn SigmaPlot. Which has its own stupid defaults, but at least generates something approximately like proper data presentation on the first attempt.