Via social media, John Novak cashes in a Nobel Betting Pool win from a while back, asking:

Please explain to me the relationship between energy, entropy, and free energy. Like you would explain it to a two year old child.

Why? There is a statistical algorithm called Expectation Maximization which is often explained in terms of those three quantities. But apparently entropy is hard to understand, because all the detail in the analogy is heaped on the entropy part (with some side tracks into Kullback-Leibler divergences). Since I have a background in communication theory, entropy is a perfectly natural thing to me, even divorced from physical quantities. Energy, and especially free energy, aren't nearly as natural to me in a statistical context.

Sadly, I am one of the worst people you could possibly ask about this. I’m primarily an atomic physicist, meaning most of the time I’m worried about single atoms or very small groups of atoms. I rarely if ever deal with atoms in the kind of quantity where thermodynamic quantities are really necessary. Most of the time, when I think about changing energy states, I’m thinking about internal energy states of an individual atom in response to something like a narrow-band laser field applied to excite the atom. That’s not really a thermodynamic process, even when the laser is on for long enough to establish some sort of equilibrium distribution.

As a result, despite having taken thermo/statmech multiple times, I’m pretty hazy about a lot of this stuff. It always seemed kind of weird and artificial to me, and never all that relevant to anything I was doing. Taking it at 8:30am didn’t help anything, either.

What you really want here is a chemist or a mechanical engineer, because they’re used to dealing with this stuff on a regular basis. I know some of my wise and worldly readers come from those fields, and maybe one of them can shed some additional light. I can tell you what little I do know, though explaining it as if to a two year old will end up sounding a bit like SteelyKid telling The Pip about movie plots (“…and then Darkrai showed up but also there were these purple things and if they touched anything it disappeared, but the kids had to defeat Darkrai except Darkrai had to battle the other Pokemon and they went [*miscellaneous battle noises*] and…”), because my understanding of the subject is around the five-year-old level.

Anyway, the primary concern of thermodynamics is the “internal energy” of a system, by which we mean basically any energy that’s too complicated to keep track of in detail. This is stuff like the kinetic, rotational, and vibrational energy of macroscopic numbers of particles– you’re not going to individually track the instantaneous velocity of every single atom in a box full of gas, so all of that stuff gets lumped together into the “internal energy.” The “external energy” would presumably be something like the center-of-mass kinetic energy of the box full of gas if you shot it out of a cannon, but that’s classical mechanics, not thermodynamics.

Internal energy in thermodynamics gets the symbol *U* because God only knows. (Confusingly, *U* is generally used for potential energy in quantum mechanics, so I spent a lot of time operating under significant misconceptions on this front.) Of course, you never really measure the total internal energy of a real system, only *changes* in the internal energy, which can be broken down into two pieces: energy flow due to heat entering or leaving the system, and energy flow due to work being done on or by the system. Because the mathematical apparatus for dealing with this stuff is primarily built around systems in equilibrium, we generally consider infinitesimal changes in everything, and handle time evolution by saying that each infinitesimal step results in a new equilibrium; as long as nothing happens too quickly, that works reasonably well.

The change in internal energy is the difference between heat in and work out (the conventional definitions of positive heat flow and positive work meaning heat coming in and work going out), given by:

$latex dU = dQ - dW $

where *Q* is heat (God knows why), and *W* is work (finally, a quantity whose symbol makes sense in English). These can be further broken down in terms of other quantities: for a sample of gas to do work (or have work done upon it), you generally need to change the volume, in which case the work can be written in terms of the pressure driving the volume change and the change in volume. (Work is force times distance, so pressure, which is force per area, times volume, which is distance cubed, has the right units.) Heat flow is associated with temperature, and at a given temperature, for heat to flow there must be something else changing, which was dubbed "entropy." Given this, we can re-write the change in internal energy as:

$latex dU = TdS - PdV $

where *S* is entropy, and the other symbols ought to be self-explanatory. Everything in thermodynamics starts with this relationship.
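(The pressure-volume form of the work term is just force times distance regrouped: for a piston of area *A* moving a distance *dx*,

$latex dW = F dx = \frac{F}{A}(A\,dx) = P dV $

which is where the *PdV* above comes from.)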

The original question had to do with the relationship between these quantities, which can (sort of) be seen from this fundamental equation, by holding various bits constant. The cleanest relationship between internal energy and entropy comes through temperature. If you hold the volume constant (doing no work), then the change in internal energy is just the temperature times the change in entropy. If you're a physicist, you can re-arrange this into a derivative:

$latex \frac{1}{T} = \frac{dS}{dU} $

(at which point the mathematicians in the audience all break out in hives), telling you that the temperature is related to the change in entropy with changes in internal energy, which is one way to understand the negative temperature experiment from earlier this year. If you set up a system where the entropy decreases as you increase the energy, that corresponds to a negative temperature. This happens for a system where you have a maximum possible energy, because at the microscopic level, entropy is a measure of the number of possible states with a given total energy, and if there’s a cap on the energy of a single particle, that means there’s only one way to get that maximum energy, and thus it’s a state of very low entropy.
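To make that counting argument concrete, here's a minimal numerical sketch (my own toy example, not anything from the negative-temperature experiment itself) for a collection of two-level particles, where the entropy is the log of the number of ways to distribute the energy:

```python
import math

def log_states(n, k):
    # ln C(n, k): number of ways to distribute k energy quanta
    # among n two-level particles, via log-gamma to avoid overflow
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

N = 1000
# Dimensionless entropy S/k_B as a function of total energy U = k quanta
S = [log_states(N, k) for k in range(N + 1)]

# Finite-difference dS/dU = 1/T: positive at low energy,
# negative near the energy cap, where there are fewer ways
# to arrange the (nearly maximal) total energy
dS = [S[k + 1] - S[k] for k in range(N)]
assert dS[100] > 0   # low energy: ordinary positive temperature
assert dS[900] < 0   # near maximum energy: negative temperature
```

The entropy peaks at the half-filled point and falls off toward both the zero-energy and maximum-energy ends, which is exactly the shape that makes 1/T go negative above the midpoint.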

Unfortunately, that’s about all you can cleanly say about a relationship between energy and entropy. They’re not really directly related in an obvious way– you can have high-entropy-low-energy states and low-entropy-high-energy states, and everything in between. The only thing you can say for certain is that the total entropy of a closed system will never decrease, no matter what happens to the energy.

“Free energy” is a different quantity, which gets its name because it’s a sort of measure of the useful work that can be done by a system. This is a good illustration of the basic process of thermodynamics classes that I always found sort of bewildering. Basically, you can define a new quantity that’s the difference between internal energy and the product of entropy and temperature (*U – TS*). Physicists tend to give this the symbol “F,” and using the above definition of change in internal energy and a little algebraic sleight of hand (and the product rule from calculus), you get:

$latex dF = d(U-TS) = -SdT - PdV $

This new quantity is the “Helmholtz Free Energy.” What’s the point of this? Well, if you consider an isothermal process, for which *dT* is by definition zero, then the change in this free energy is exactly equal to the negative of the work done by the system. So the free energy is a measure of the energy that is available to be extracted as work, which makes it a useful quantity for thinking about pushing things around with hot gases.
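(For the record, the sleight of hand is nothing more than the product rule applied to *d(TS)*, followed by substituting the fundamental relation for *dU*:

$latex dF = dU - TdS - SdT = (TdS - PdV) - TdS - SdT = -SdT - PdV $

The *TdS* pieces cancel, which is the whole point of the construction.)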

The relevance to the Expectation Maximization business would probably have to do with the fact that for a system in equilibrium, *F* must be a minimum. Physically, this makes sense: if there’s extra free energy available above the minimum value, then you can extract useful work from the system, and it’s not really an equilibrium state in that case. Systems with excess free energy will tend to spontaneously give that energy up and change their volume in the process. Most discussions of this helpfully note that the only time chemists use the Helmholtz free energy (they prefer the Gibbs free energy, which includes the pressure and volume term and only changes when components of the system react to change their chemical identity) is when they’re talking about explosives.

These are all very general and abstract relationships between things. The part that I always found kind of bewildering about this stuff is that completely separate from the above math, you tend to also be able to define an “equation of state” that relates various variables together– the “ideal gas law” *PV = nRT* is about the simplest example of an equation of state. These tended to basically drop out of the sky, usually turning up for the very first time in an exam problem, and I would get hung up on just where the hell they were coming from or what I was supposed to do with them. Which is a big part of why I’ve mostly avoided thinking about this since about 1995.

So that’s all I’ve got. Sorry I couldn’t be more helpful, but maybe somebody else reading this will have some light to shed on this stuff as it relates to the maximization of expectations.

There is also a fourth energy quantity, called enthalpy and denoted *H*, which differs from the internal energy by including the pressure/volume term but not the entropy/temperature term. Chemists use enthalpy quite a bit, but I don't think it sees much use in physics. It was briefly mentioned in my physics stat mech course as the fourth corner of a square, with U (I remember it being called E, but that may have been a particular textbook author's preference), F, and G at the other three corners. I'd have to actually open the textbook to get much farther than this.

Preface: I've already been called out by one physicist to the effect that "free energy" is a thermodynamics concept, not a statistical mechanics concept.

In my defense on that issue, the computer science literature on the EM algorithm is rotten with references to “free energy” (nearly always in actual scare quotes) and/or “variational free energy” described in reference to either statistical physics or statistical mechanics. Doesn’t mean they’re right, but the description approaches ubiquity.

Most of those footnote a paper by Neal and Hinton, who took the classic EM algorithm and turned it around into something using entropy and KL divergences. The math itself is perfectly clear, but then they start tossing around analogies to energy, free energy, and entropy. So far as I can tell, the analogy goes like this:

“And so we’ve derived a form that says ‘This = That – Entropy’ so obviously This is a Free Energy and That is an Energy, aren’t we clever?”

Except… except… goddammit, energy and entropy have different units! That is where my brain tilts and the whole thing falls apart, and you put your fingers on it when you mentioned temperature, the conversion factor between energy and entropy.

About 9/10 of my brain wants to just write this off as a bad analogy using a piece of bad terminology (“free energy”) trying to explain a badly-named algorithm (expectation maximization is *much* better understood as a method of successively improving lower bounds.)
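For what it's worth, the lower-bound picture is easy to see numerically. Here's a toy sketch (my own made-up example, not Neal and Hinton's actual setup): EM for a two-component, unit-variance, equal-weight Gaussian mixture, checking that the log-likelihood — the negative of the "variational free energy" once the E-step has made the bound tight — never decreases:

```python
import math
import random

random.seed(0)
# Toy data: two unit-variance clusters near -2 and +3 (made up for illustration)
data = ([random.gauss(-2, 1) for _ in range(100)]
        + [random.gauss(3, 1) for _ in range(100)])

def normal_pdf(x, mu):
    # Unit-variance Gaussian density
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

mu = [-1.0, 1.0]          # initial guesses for the two component means
prev_ll = -float("inf")
for step in range(50):
    # E-step: responsibilities q(z|x); this choice makes the bound tight,
    # i.e. it minimizes the "variational free energy" over q
    resp = []
    for x in data:
        p = [0.5 * normal_pdf(x, m) for m in mu]
        tot = sum(p)
        resp.append([pk / tot for pk in p])
    # M-step: re-estimate the means (weights and variances held fixed)
    for k in range(2):
        w = [r[k] for r in resp]
        mu[k] = sum(wk * x for wk, x in zip(w, data)) / sum(w)
    # The log-likelihood never decreases from one iteration to the next --
    # that's the successively-improving-lower-bound picture
    ll = sum(math.log(sum(0.5 * normal_pdf(x, m) for m in mu)) for x in data)
    assert ll >= prev_ll - 1e-7
    prev_ll = ll
```

The means converge to roughly -2 and +3, and the monotone-likelihood assertion never fires — which is the whole content of the bound-improvement view, no temperature required.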

The other 1/10 is my gut instinct that this is where all the annealing algorithms in CS come from, if only I could figure out why energy and entropy inexplicably have the same units, and where the fuck did the temperature go?

Frustratingly, most of the more basic papers on the topic go to great lengths to explain what entropy is, because apparently that’s the hard part to understand intuitively. Except in my case, with an engineering and information theory background, that is the only part of the analogy that does not require additional exposition.

Anyway, thanks– one of the reasons I thought of you in conjunction with this was precisely because of the negative temperature stuff you had written up a while back.

(Hinton is a CS/Psychologist/Neural Networks/Machine Learning kinda guy. Radford, I dunno. They provide no footnotes to physics lit. Occasionally, I see a footnote to Feynman’s lectures, which I have no electronic access to and have thus far been too lazy to order through the library.)

(Oh, and I left a statement about Hinton out: Hinton was also involved with the development of Boltzmann and Helmholtz machines, which are physics-inspired neural networks algorithms. He’s a seriously smart guy, so it’s not that I don’t think he knows what he’s talking about.)

Eric – Enthalpy is to Gibbs free energy as the internal energy is to the Helmholtz free energy. Basically, it’s the internal energy plus those little adjustments that Chad mentioned with respect to Gibbs free energy. (The SdT is the same in both.)

If you want to get into details, the difference is what “ensemble” you’re working with – here “ensemble” just means which of the conjugate pairs of parameters you’re viewing as fixed, and which ones you’re viewing as variable. P/V is one pair of conjugate parameters, S/T is another, and N/mu (the number of particles and the chemical potential) is a third. The Helmholtz free energy and the internal energy are relevant parameters when working with ensembles where the volume is seen as constant, and the pressure changes in response to state changes. For the Gibbs free energy and enthalpy, it’s the pressure that’s viewed as being held constant, and the volume changes with respect to reactions. Chemists typically use the latter one because they normally work with open flasks at constant atmospheric pressure. (Explosives work tends to use constant volume assumptions, though, as they happen fast enough that the pressure can’t equilibrate with atmosphere.)

Enthalpy is just internal energy plus a PV term that falls out of integration by parts when you replace the PdV term with a VdP term. (Why are those differentials backwards from what you might expect? God only knows.)

Are these definitions any good for use by laypersons?

Entropy: A measure of disorder, dissipation of energy, and potential configurations of information. Entropy increases as energy dissipates from a source to a sink, as the order of a system breaks down, and as a quantity of bits is randomized.

Free energy: A quantity of energy available to perform work.

Work: Conversion of energy from one form to another (e.g. in photovoltaics, from photons to electrons), minus entropic losses (e.g. in machinery, mechanical friction translated to heat).

—

Re. Chad: “If you set up a system where the entropy decreases as you increase the energy, that corresponds to a negative temperature.”

I read the science news items on negative temperature, but somehow missed that implication. How does what you said relate to the fairly common layperson assumption that adding energy to a system (within the limits of the robustness of the system) tends to decrease the entropy of the system?

In other words, where’s the error in the layperson assumption: the definition of the boundaries of the system, the artificial constraint of the robustness of the system (analogies such as combustion of fuel in an engine vs. the mechanical strength limits of the engine), or somewhere else? Or have definitions changed as a result of the findings on negative temperature?

—

“Free energy” is also used in the fringe literature in conjunction with “over-unity devices” (in that context, euphemism for “perpetual motion machines”). That, in conjunction with Richard Stallman’s often-quoted line “free as in speech does not equal free as in beer,” gets us “free speech != free beer or free energy.”

I think the “entropy” and “free energy” definitions are probably okay. The “work” definition is a little too narrow, though– it should probably encompass any process that turns energy from one form into another, regardless of entropy. The work I do pushing a heavy box up a ramp goes not only into the increased gravitational potential of the box, but also the disordered heat energy of the box, ramp, and surrounding air due to frictional effects.

I would say that “adding energy decreases entropy” is a misconception– in fact, the canonical example of decreasing entropy generally involves cooling things, decreasing the thermal energy of a subsystem. The key is that some other subsystem must be supplying work in a way that increases the thermal energy of its environment. The work moving energy across the boundary of a subsystem with decreasing entropy can be either positive or negative, but must be balanced by other energy flows elsewhere, and the entropy associated with all that will give a net increase.

G is correct that work only includes the mechanically useful portion of the energy transformed (see your definition of “work” and “heat” above). If I push against a static wall, then my muscles consume energy, but I’m doing no useful work on the wall.

By the way, pV=nRT emerges very naturally from a statistical mechanical treatment of the ideal gas by deriving an expression for Helmholtz free energy from the partition function, and then using your equation for dF (p = -dF/dV at constant T). Equations of state for more complex systems can be computed using numerical simulations.
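For anyone curious, the sketch of that standard derivation runs like this (with $latex \lambda $ the thermal de Broglie wavelength):

$latex F = -Nk_BT\left[\ln\left(\frac{V}{N\lambda^3}\right)+1\right] \quad\Rightarrow\quad p = -\left(\frac{\partial F}{\partial V}\right)_T = \frac{Nk_BT}{V} $

and $latex pV = Nk_BT $ is just pV = nRT with Avogadro's number absorbed into the gas constant.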