Q: Why does E=mc2?

Physicist: It’s a little surprising that this question didn’t come up earlier.  Unfortunately, there’s no intuitive way to understand why “the energy of the rest mass of an object is equal to the rest mass times the speed of light squared” (E=MC2).  A complete derivation/proof includes a fair chunk of math (in the second half of this post), a decent understanding of relativity, and (most important) experimental verification.

So first, here’s an old physics trick (used by old physicists) to guess answers without doing any thinking or work, and perhaps while drinking.  Take everything that could have anything to do with the question (any speeds, densities, sizes, etc.) and put them together so that the units line up correctly.  There’s an excellent example in this old post about poo.

Here’s the idea: energy is distance times force (E=DF), force is mass times acceleration (E=DMA), and acceleration is velocity over time, which is the same as distance over time squared (E = DMD/T2 = MD2/T2).

So energy has units of mass times velocity squared.  So, if there were some kind of universal relationship between mass and energy, then it should depend on universal constants.  Quick!  Name a universal speed!  E=MC2

Totally done.

This is a long long way from being a solid argument.  You could just as easily have E=5MC2 or E=πMC2 or something like that, and still have the units correct.  It may even be possible to mix together a bunch of other universal constants until you get velocity squared, or there may just be a new, previously unknown physical constant involved.  This trick is just something used to get a thumbnail sketch of what “looks” like a correct answer, and in this particular case it’s exactly right.
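
If you want to watch the units line up without doing the bookkeeping by hand, a computer algebra system will do it for you.  A minimal sketch using sympy's unit system (the choice of sympy, and of one kilogram as the test mass, are just for illustration):

```python
# Mass times a speed squared really does carry the units of energy.
from sympy.physics.units import kilogram, joule, speed_of_light, convert_to

E = kilogram * speed_of_light ** 2
print(convert_to(E, joule))   # 89875517873681764*joule: it's energy, and a lot of it
```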

For a more formal derivation, you’d have to stir the answer gravy:


Answer gravy: This derivation is going to seem pretty indirect, and it is.  But that’s basically because E = mc2 is an accidental result from a more all-encompassing theory.  So bear with me…

The length of a regular vector (which could be distance, momentum, whatever) remains unchanged by rotations.  If you take a stick and just turn it, then of course it stays the same length.  The same holds true for speed: 60 mph is 60 mph no matter what direction you’re moving in.

Although the vector, v, changes when you rotate your point of view, its length, ||v||, always stays the same.

If you have a vector (x,y,z), then its length is denoted by “||(x,y,z)||”.  According to the Pythagorean theorem, ||(x,y,z)||^2 = x^2+y^2+z^2.

When relativity came along, time suddenly became an important fourth component: (x,y,z,t).  And the true “conserved distance” was revealed to be: ||(x,y,z,t)||^2 = x^2+y^2+z^2-c^2t^2.  Notice that when you ignore time (t=0), then this reduces to the usual definition.

This fancy new “spacetime interval” conserves the length of things under ordinary rotations (which just move around the x, y, z part), but also conserves length under “rotations involving time”.  In ordinary physics you can rotate how you see things by turning your head.  In relativity you rotate how you see things in spacetime, by running past them (changing your speed with respect to what you’re looking at).  “Spacetime rotations” (changing your own speed) are often called “Lorentz boosts”, by people who don’t feel like being clearly understood.

You can prove that the spacetime interval is invariant based only on the speed of light being the same to everyone.  It’s a bit of a tangent, so I won’t include it here. (Update 8/10/13: but it is included here!)
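
The proof lives in that other post, but a quick numeric spot-check is easy.  A minimal sketch in Python (the event coordinates are made up; `boost` is the standard Lorentz transformation):

```python
import math

c = 299792458.0                           # speed of light, m/s

def boost(x, t, v):
    # the standard Lorentz transformation to a frame moving at speed v
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return g * (x - v * t), g * (t - v * x / c ** 2)

x, t = 1000.0, 2.0e-6                     # an event: 1 km away, 2 microseconds out
for v in (0.0, 0.25 * c, 0.9 * c):
    xb, tb = boost(x, t, v)
    print(xb ** 2 - c ** 2 * tb ** 2)     # the same number every time
```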

Some difficulties show up in relativity when you try to talk about velocity.  If your position vector is (x,y,z), then your velocity vector is \vec{v} = (\frac{\partial x}{\partial t},\frac{\partial y}{\partial t},\frac{\partial z}{\partial t}) (velocity is the time derivative of position).

But since relativity messes with distance and time, it’s important to come up with a better definition of time.  The answer Einstein came up with was \tau, which is time as measured by clocks on the moving object in question. \tau is often called “proper time”.  So the better definition of velocity is (\frac{\partial x}{\partial \tau},\frac{\partial y}{\partial \tau},\frac{\partial z}{\partial \tau}, \frac{\partial t}{\partial \tau}).  This way you can talk about how fast an object is moving through time, as well as how fast it’s moving through space.

By the way, as measured by everyone else on the planet, you’re currently moving through their time (t) at almost exactly 1 second per second (\frac{\partial t}{\partial \tau}=1).

One of the most important, simple things that you can know about relativity is the Lorentz factor (usually just written \gamma, and often loosely called “the gamma function”): \gamma = \frac{1}{\sqrt{1-\left(\frac{||v||}{c}\right)^2}}.  You can derive it by thinking about light clocks, or a couple other things, but I don’t want to side track on that just now.

Among other things, \gamma = \frac{\partial t}{\partial \tau}.  That is, \gamma is the ratio of how fast “outside time” (t) passes compared to the object’s “on-board time” (\tau).  So now, using the chain rule from calculus:

(\frac{\partial x}{\partial \tau},\frac{\partial y}{\partial \tau},\frac{\partial z}{\partial \tau}, \frac{\partial t}{\partial \tau}) = (\frac{\partial t}{\partial \tau}\frac{\partial x}{\partial t},\frac{\partial t}{\partial \tau}\frac{\partial y}{\partial t},\frac{\partial t}{\partial \tau}\frac{\partial z}{\partial t}, \frac{\partial t}{\partial \tau}) = (\gamma\frac{\partial x}{\partial t},\gamma\frac{\partial y}{\partial t},\gamma\frac{\partial z}{\partial t}, \gamma).

For succinctness (and tradition) I’ll bundle the first three terms together:

(\gamma\frac{\partial x}{\partial t},\gamma\frac{\partial y}{\partial t},\gamma\frac{\partial z}{\partial t}, \gamma) = (\gamma \vec{v}, \gamma)

Now check this out!  Remember that the spacetime interval for a spacetime vector with spatial component \vec{a} and temporal component b is ||(\vec{a},b)||^2 = a^2-c^2b^2.

||(\gamma\vec{v}, \gamma)||^2 = \gamma^2v^2-c^2\gamma^2 = \gamma^2(v^2-c^2) = \frac{v^2-c^2}{1-\frac{v^2}{c^2}} = c^2\frac{v^2-c^2}{c^2-v^2} = -c^2

(This used a slight abuse of notation: “\vec{v}” is a velocity vector and “v” is the length of the velocity, or “speed”.)

The amazing thing about “spacetime speed” is that, no matter what v is, ||(\gamma \vec{v}, \gamma)||^2 = -c^2.
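
You can check that numerically in a couple of lines.  A quick sketch (the speeds are chosen arbitrarily):

```python
import math

c = 299792458.0

def gamma(v):
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)

for v in (0.0, 340.0, 0.5 * c, 0.99 * c):
    g = gamma(v)
    print(((g * v) ** 2 - c ** 2 * g ** 2) / c ** 2)   # ||(γv, γ)||²/c² = -1, always
```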

(Quick aside: it may concern you that a squared quantity can be negative.  Don’t worry about it.)

Now, Einstein (having a passing familiarity with physics) knew that momentum (\vec{P}) is conserved, and that the magnitude of momentum is unchanged by rotation (in other words, the direction of the momentum is messed up by rotation, but the amount of momentum stays the same; top picture of this post).  He also knew that to get from velocity to momentum, you just multiply by mass (momentum is \vec{P}=m\vec{v}).  Easy nuf.   m^2||(\gamma \vec{v}, \gamma)||^2 = -m^2c^2 \Rightarrow ||(\gamma m\vec{v}, \gamma m)||^2 = -m^2c^2

So if ordinary momentum is given by the first term (the “spatial term”): \vec{P}= \gamma m \vec{v}, then what’s that other stuff (\gamma m)?  Look at the conserved quantity:

\begin{array}{ll} ||(\gamma m\vec{v}, \gamma m)||^2 = -m^2c^2 \\ \Rightarrow \left(\gamma mv\right)^2 - \left(\gamma mc\right)^2 = -m^2c^2 \\\Rightarrow P^2 - \left(\gamma mc\right)^2 = -m^2c^2 \end{array}

What’s interesting here is that m^2c^2 never changes, and P^2 only changes if you start moving. For example, if you were to run as fast as a bullet (in the direction of the bullet), you wouldn’t worry about it hurting you, because from your perspective it has no momentum.

So whatever that last term is (\gamma mc), it’s also conserved (as long as you don’t change your own speed).  Which is interesting.  So the ‘Stein studied its behavior very closely.  If you take its Taylor expansion, which turns functions into polynomials, you get this:

\begin{array}{ll} m c \gamma\\= \frac{mc}{\sqrt{1-\left(\frac{v}{c}\right)^2}}\\= mc\left[1+\frac{1}{2}\frac{v^2}{c^2}+\frac{3}{8}\frac{v^4}{c^4}+\frac{5}{16}\frac{v^6}{c^6} \cdots \right]\\=mc+\frac{1}{2}m\frac{v^2}{c}+\frac{3}{8}m\frac{v^4}{c^3}+\frac{5}{16}m\frac{v^6}{c^5} \cdots \end{array}

The second term there should look kinda familiar (if you’ve taken intro physics); it’s the classical kinetic energy of an object (\frac{1}{2}mv^2) divided by c.  Could this whole thing (thought Einstein) be the energy of the object in question, divided by c?  Energy is definitely conserved.  And, since c is a constant, energy divided by c is also conserved.

So, multiplying by c: E = \gamma m c^2 = mc^2+\frac{1}{2}mv^2+\frac{3}{8}m\frac{v^4}{c^2}+\frac{5}{16}m\frac{v^6}{c^4} \cdots.

You can also plug this into P^2 - \left(\gamma mc\right)^2 = -m^2c^2 and, Alaca-math!  You get Einstein’s (sorta) famous energy/momentum relation: P^2c^2 + m^2c^4 = E^2.
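
A quick numeric sanity check of that relation (one kilogram at 60% of light speed; the mass and speed are just example numbers):

```python
import math

c = 299792458.0
m, v = 1.0, 0.6 * 299792458.0             # assumed mass and speed, just for the check

g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)   # γ = 1.25 at v = 0.6c
E = g * m * c ** 2                        # relativistic energy
P = g * m * v                             # relativistic momentum

print(E ** 2)                             # these two prints agree:
print(P ** 2 * c ** 2 + m ** 2 * c ** 4)  # E² = P²c² + m²c⁴
```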

Notice that the energy and momentum here are not the classical (Newtonian) energy and momentum: E=\frac{1}{2}mv^2 and \vec{P} = m\vec{v}.  Instead they are the relativistic energy and momentum: E=\gamma mc^2 and \vec{P} = \gamma m\vec{v}.  This only has noticeable effects at extremely high speeds, and at lower speeds they look like: E \approx mc^2 + \frac{1}{2}mv^2 and \vec{P} \approx m\vec{v}, which is what you’d hope for.  New theories should always include the old theories as a special case (or disprove them).

Now, holy crap.  If you allow the speed of the object to be zero (v=0), you find that everything other than the first term in that long equation for E vanishes, and you’re left with (drumroll): E=mc2!  So even objects that aren’t moving are still holding energy.  A lot of energy.  One kilogram of matter holds the energy equivalent of 21.4 Megatons of TNT, or about 1500 Hiroshima bombs.
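
That megatons figure is a one-liner to check (using the standard 4.184×10^15 joules per megaton of TNT):

```python
c = 299792458.0               # m/s
E = 1.0 * c ** 2              # joules locked up in one kilogram
print(E / 4.184e15)           # ≈ 21.5: the ~21.4 megatons above, give or take rounding
```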

The first question that should come to mind when you’ve got a new theory that’s, honestly, pretty insane is “why didn’t anyone notice this before?”  Why is it that the only part of the energy that anyone ever noticed was \frac{1}{2}mv^2?  Well, the higher terms are a little hard to see.  Up until Einstein the fastest things around were bullets moving at about the speed of sound.  If you were to use the “\frac{1}{2}mv^2” equation for kinetic energy you would be right to within about one part in a trillion.  All of the higher terms are divided by some power of c (a big number), so until the speed gets ridiculously high they just don’t matter.
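
You can see just how small the correction is with a little high-precision arithmetic.  A sketch using mpmath (the 10 gram bullet mass is an assumption; ordinary floats would drown the answer in round-off):

```python
from mpmath import mp, mpf, sqrt

mp.dps = 40                          # 40 digits of working precision
c = mpf(299792458)
v = mpf(340)                         # about the speed of sound, m/s
m = mpf("0.01")                      # a 10 gram bullet

g = 1 / sqrt(1 - (v / c) ** 2)
KE_rel = (g - 1) * m * c ** 2        # the full relativistic kinetic energy
KE_classical = m * v ** 2 / 2

print(KE_rel / KE_classical - 1)     # ≈ 9.6e-13: one part in a trillion
```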

But what about the mc2?  Well, to be detected energy has to do something.  If somebody flings a battery at you, it really doesn’t matter if the battery is charged up or not.

Side note: This derivation isn’t a “proof” per se, just a really solid argument: “there’s this thing that’s conserved according to relativity, and it looks exactly like energy”.  However, you can’t, using math alone, prove anything about the outside universe.  The “proof” came when E=mc2 was tested experimentally (with particle accelerators ‘n stuff).

But Einstein’s techniques and equations have been verified as many times as they’ve been tested.  One of the most profound conclusions is that, literally, “energy is the time component of momentum”.  Or “E/c” is at least.  So conservation of energy, momentum, and matter are all actually the same conservation law!


Q: What are the equations of electromagnetism? What all do they describe to us?

Physicist: Electromagnetism, and all the math involved, is a surprisingly visual science.  Understanding Maxwell’s equations (the equations of electromagnetism) involves pictures aplenty.

\begin{array}{ll} i)&\nabla \cdot \vec{E} = \frac {\rho} {\varepsilon_0} \\ ii)&\nabla \cdot \vec{B} = 0 \\ iii)&\nabla \times \vec{B} = \mu_0\vec{J} + \mu_0 \varepsilon_0 \frac{\partial \vec{E}} {\partial t}\\ iv)&\nabla \times \vec{E} = -\frac{\partial \vec{B}} {\partial t} \end{array}

In these equations \mu_0 and \varepsilon_0 are physical constants that dictate how strong the electric and magnetic forces are, but when doing calculations (or really whenever) it’s a pain to keep track of them.  So, from now on, I’ll ignore them:

\begin{array}{ll} i)&\nabla \cdot \vec{E} = \rho\\ ii)&\nabla \cdot \vec{B} = 0 \\ iii)&\nabla \times \vec{B} = \vec{J} + \frac{\partial \vec{E}} {\partial t}\\ iv)&\nabla \times \vec{E} = -\frac{\partial \vec{B}} {\partial t} \end{array}

Where:

\begin{array}{ll}\vec{E} & \textrm{Is the electric field}\\\vec{B} & \textrm{Is the magnetic field}\\\vec{J} & \textrm{Is the electric current density}\\\rho & \textrm{Is the density of electric charge}\end{array}

The electric and magnetic fields, “\vec{E}” and “\vec{B}“, are like wind or flowing water; at every location they point in some direction, with some strength.

By the way, you’d expect “M” to be used for the magnetic field, but it’s already used for mass, so some genius decided “B” was a good letter for magnets.  There are two strange symbols (operations) in these equations: “\nabla \cdot” and “\nabla \times”.  The first is called the “divergence”, and the second is called “curl”.  Here’s what they mean.

Divergence (\nabla \cdot) is a measure of how much a field comes together or flies apart.  In what follows “\vec{W}” is the flow of water or wind.

Restrict your attention only to the surface of the sink. Positive divergence means that the field is "flowing" out of a region. Negative divergence means that the field is "flowing" into a region. When the divergence is zero, then the amount that "flows" in must be equal to the amount that flows out.

Curl is how much the field “twists”.  For some idea of what curl does to a field; in a whirlpool or a tornado all of the curl is in the center funnel, and the field (wind) wraps around where the curl is.

If W is the way the wind is blowing, then the tornado is the curl of W. Curl is the "twistiness" of the field, which points around the curl. Although the wind around the tornado is moving, all of the twisting is happening only at the tornado itself.
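
If you’d rather poke at divergence and curl directly, sympy will compute both.  A small sketch with two made-up fields, one that only spreads and one that only twists:

```python
from sympy.vector import CoordSys3D, divergence, curl

R = CoordSys3D('R')

spread = R.x * R.i + R.y * R.j + R.z * R.k   # flows straight out of the origin
whirl = -R.y * R.i + R.x * R.j               # whirlpool around the z-axis

print(divergence(spread), curl(spread))      # 3, zero vector: all spread, no twist
print(divergence(whirl), curl(whirl))        # 0, 2*R.k: all twist, no spread
```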


i) \nabla\cdot \vec{E} = \rho:

You can picture magnetic and electric fields in terms of “field lines”.  The more field lines, the stronger the field.  You can use this metaphor, for example, to picture why the fields lose strength as you get farther and farther away from their source (the lines spread out).

This first equation simply states that electric field lines (\vec{E}) only start at positive charges (\rho>0), and only end at negative charges (\rho<0).

Electric field lines only begin and end at charges. Any region that contains no charges (or an equal number of positive and negative charges) will have the same amount of field lines entering it as exiting.

If a region contains no charges, then the number of field lines entering it is the same as the number of field lines exiting it.  Again, “field lines” don’t exist.  They’re just a really useful metaphor.


ii) \nabla \cdot \vec{B} = 0:

This is the magnetic version of the electric equation above.  This states that magnetic field lines never begin or end.  Instead, they must always form closed loops.

If somehow magnetic monopoles (“magnetic charges”) existed, then this equation would look exactly like (i).  However, as far as anyone has ever been able to detect, there are no magnetic monopoles.


iii) \nabla \times \vec{B} = \vec{J} + \frac{\partial \vec{E}} {\partial t}:

First the “J” part: \nabla \times \vec{B} = \vec{J}.  This equation states that magnetic fields (B) literally curl around electric currents (J).

"Magnetic fields curl around current". This is a current-carrying wire going through a piece of paper with iron fillings on it. Iron fillings have the nice property that they tend to line up along magnetic fields.

So, in terms of the metaphors earlier, an electric current is like the funnel of a tornado, and the magnetic field is like the wind whipping around the funnel.

A different, but equivalent, way of stating this equation is: “if there’s a magnetic field running around a loop, then there must be a current running through the loop”.

Say you have a farm and it’s completely enclosed by a fence.  Then this statement is equivalent to saying “if there’s wind blowing in the same direction all the way around the fence, then there must be a tornado in my farm”.  Basically, if the wind is moving in circles, it must be twisting around at some point, and that’s the curl.

Notice that the word “through” was underlined (for emphasis!).  When a scientist talks about going through a loop, it’s important to have a rigorous definition.  After all, what if the loop is really weirdly shaped?  So you find a “spanning surface”.  A spanning surface is any surface that has the loop as a boundary.

An example of a weird loop, and an equally weird spanning surface.

The current is defined to be “going through the loop” if it passes through that loop’s spanning surface.  This may seem like a useless, ad-hoc way of defining “through”, but it turns out that the math likes it.  It falls out of a theorem called “Stokes’ theorem”.  What’s very important is that the current needs to be passing through every spanning surface.

Which brings us to the second part of the equation: \nabla \times \vec{B} = \frac{\partial \vec{E}} {\partial t}

What if your wire has a capacitor in it?  A capacitor (at its most basic) is a wire going to a plate, then a gap, then another plate, then the wire continues (top diagram in the picture below).  A current, which is just moving charge, causes those charges to build up on one side, and the opposite charge to build up on the other side.  But no actual charge is moving from one plate to the other.

Pick a loop in space. If a current goes through a surface that spans that loop, then the current causes a magnetic field to run around that loop. But in a capacitor a changing electric field takes the place of the current.

So, if you were cruel enough to pick a spanning surface that goes through the capacitor, where there’s no current (middle diagram above), instead of through the wire, where there is a current (top diagram),  you’d get a new and contradictory result.

But the situation itself hasn’t changed!  No matter what surface you pick, the magnetic field around the loop has to be the same.  It can’t be just current that makes the magnetic field curl around.  So, what’s going on in the capacitor?

Well, as current flows into one side a positive charge builds up, and as current flows out of the other side a negative charge builds up.  As a result, an increasing electric field appears in between the plates of the capacitor (bottom diagram).

The conclusion is that current (\vec{J}) and changing electric fields (\frac{\partial \vec{E}}{\partial t}) create curl in magnetic fields: \nabla \times \vec{B} = \vec{J} + \frac{\partial \vec{E}} {\partial t}.


iv)\nabla \times \vec{E} = -\frac{\partial \vec{B}} {\partial t}:

In terms of modern society, this is arguably the most important equation, since the behavior of electrical generators is dictated by it.  Like the last equation, this one states that a changing magnetic field creates curl in the electric field.

If you have an electric field that curls in a circle, then you can generate current and electrical power.

A changing magnetic field creates curl in the electric field. So if you increase the magnetic field through a loop of wire, there will be an electric field along the wire (tornado makes wind along the fence). When an electric field runs along a wire it pushes charges along it. Induced current!

So if you’ve got a loop of wire sitting next to a magnet, there will be no current.  But, if you move that magnet, the magnetic field through the loop changes, and there will be current.

An automotive alternator. When spinning it will change the magnetic field through those loops of wire, and generate current.

Most power plants (hydro, nuclear, gas, coal, wind) simply spin generators, which (basically) move big magnets back and forth, changing the magnetic field through loops of wire and creating current.  In fact, it changes the magnetic field through those loops 60 times a second, which is why the electrical power from your outlet, in turn, also switches 60 times every second (alternating current).
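
In integral form, equation (iv) says the voltage (EMF) around the loop is minus the rate of change of the magnetic flux through it.  A sketch with made-up generator numbers (field strength, loop area):

```python
import sympy as sp

t = sp.symbols('t')
B, A, f = 0.1, 0.01, 60                    # tesla, m², Hz -- all assumed numbers

flux = B * A * sp.cos(2 * sp.pi * f * t)   # flux through the spinning loop
emf = -sp.diff(flux, t)                    # Faraday's law: EMF = -dΦ/dt
print(emf)                                 # 0.12*pi*sin(120*pi*t): 60 Hz AC
```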

Solar panels are the glaring exception, but other than that, effectively all power is made the same way: \nabla \times \vec{E} = -\frac{\partial \vec{B}} {\partial t}

Notice, by the way, that unlike equation (iii), there’s no “J” term.  J is moving electric charge.  If there were such a thing as “magnetic charge”, then you could have “magnetic current”, and there would be a magnetic current term in equation (iv).

But there’s not.


The speed of light: Around the turn of the last century (1900) some smart people were noticing that Maxwell’s equations implied that changing electric fields created magnetic fields (iii) and that changing magnetic fields created electric fields (iv).  They started wondering if electric and magnetic fields could sustain each other without the need for electrical charges or currents.

The short answer is yes.  These self-sustaining fields are in fact light (isn’t that weird?).  So when you see a chunk of sunlight, what you’re seeing is just a moving charge on the surface of the sun, which creates a changing electric field, which creates a changing magnetic field, which creates… until those fields interact with the chemicals in your eye and register as “light”, 93 million miles later.

Answer gravy:

The long answer requires some vector calculus.  Take Maxwell’s equations in vacuum (no charges, no currents):

\begin{array}{ll} i)&\nabla \cdot \vec{E} = 0 \\ ii)&\nabla \cdot \vec{B} = 0 \\ iii)&\nabla \times \vec{B} = \mu_0 \varepsilon_0 \frac{\partial \vec{E}} {\partial t}\\ iv)&\nabla \times \vec{E} = -\frac{\partial \vec{B}} {\partial t} \end{array}

I’ll just try to solve for the electric field, so taking the time derivative of (iii) and then plugging in (iv) yields:

\begin{array}{ll}\mu_0 \varepsilon_0 \frac{\partial \vec{E}} {\partial t} = \nabla \times \vec{B} \\\Rightarrow \mu_0 \varepsilon_0 \frac{\partial^2 \vec{E}} {\partial t^2} = \nabla \times \frac{\partial \vec{B}}{\partial t} \\\Rightarrow \mu_0 \varepsilon_0 \frac{\partial^2 \vec{E}} {\partial t^2} = \nabla \times \left( -\nabla \times \vec{E} \right)\\\Rightarrow \mu_0 \varepsilon_0 \frac{\partial^2 \vec{E}} {\partial t^2} = -\nabla \times \left( \nabla \times \vec{E} \right) \end{array}

Vector calculus is one of those corners of mathematics resplendent with an amazing array of identities that you pretend to memorize, but secretly look up in books 3 minutes before you’re tested.  Here’s one: \nabla \times \left( \nabla \times \vec{A} \right) = \nabla(\nabla \cdot \vec{A}) - \nabla^{2}\vec{A}
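
If you’d rather not memorize (or trust) it, a computer algebra system will happily verify it.  A sketch with sympy, using one made-up polynomial field (the identity holds for any field; this just spot-checks it):

```python
import sympy as sp
from sympy.vector import CoordSys3D, curl, divergence, gradient

R = CoordSys3D('R')
x, y, z = R.x, R.y, R.z

F = (x**2 * y) * R.i + (y * z**3) * R.j + (x * y * z) * R.k   # arbitrary test field

def vector_laplacian(V):
    # component-wise Laplacian of a vector field
    comps = [V.dot(e) for e in (R.i, R.j, R.k)]
    lap = [sum(sp.diff(cc, s, 2) for s in (x, y, z)) for cc in comps]
    return lap[0] * R.i + lap[1] * R.j + lap[2] * R.k

difference = curl(curl(F)) - (gradient(divergence(F)) - vector_laplacian(F))
print([sp.simplify(difference.dot(e)) for e in (R.i, R.j, R.k)])   # [0, 0, 0]
```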
I was thinking briefly about deriving this, but if you know (mathematically) what these symbols mean, then you can prove it yourself (it’s pretty easy), or have a computer grind through it as above. If you don’t, then it’s just scary.  Back to the point:

\begin{array}{ll}\mu_0 \varepsilon_0 \frac{\partial^2 \vec{E}} {\partial t^2} = -\nabla \times \left( \nabla \times \vec{E} \right)\\\Rightarrow \mu_0 \varepsilon_0 \frac{\partial^2 \vec{E}} {\partial t^2} = -\nabla(\nabla \cdot \vec{E}) + \nabla^{2}\vec{E}\\\Rightarrow \mu_0 \varepsilon_0 \frac{\partial^2 \vec{E}} {\partial t^2} = -\nabla(0) + \nabla^{2}\vec{E}\\\Rightarrow \mu_0 \varepsilon_0 \frac{\partial^2 \vec{E}} {\partial t^2} = \nabla^{2}\vec{E}\\\Rightarrow \frac{\partial^2 \vec{E}} {\partial t^2} = \frac{1}{\mu_0 \varepsilon_0}\nabla^{2}\vec{E}\\\Rightarrow \frac{\partial^2 \vec{E}} {\partial t^2} = \left(\frac{1}{\sqrt{\mu_0 \varepsilon_0}}\right)^2\nabla^{2}\vec{E}\end{array}

Here I’ve used that there are no charges, so \nabla \cdot \vec{E} = 0.

This last equation seems to be written in a very strange way, but check it!  The general wave equation is written: \frac{\partial^2 \vec{A}} {\partial t^2} = v^2\nabla^{2}\vec{A}, where A is a wave, and v is the propagation speed of that wave.  So, the electric field E, propagates as a wave at v = \frac{1}{\sqrt{\mu_0 \varepsilon_0}} = 2.99 \times 10^8 m/s (light speed).
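
Plugging the measured constants into that propagation speed (plain Python; the constants are the standard CODATA values):

```python
import math

mu_0 = 4 * math.pi * 1e-7         # magnetic constant, N/A²
epsilon_0 = 8.8541878128e-12      # electric constant, F/m

print(1 / math.sqrt(mu_0 * epsilon_0))   # ≈ 2.998e8 m/s: the speed of light
```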

I suspect that when this was first figured out, the dudes involved shit themselves.  At that time there had been no previous (direct) indication that electricity and magnetism had anything to do with light.

Even more profound, since all physical laws are independent of how fast you’re moving (physics is the same whether you’re moving or sitting still), there was now a speed that’s independent of how fast you’re moving.  Relativity!


Q: What is the entropy of nothing?

Mathematician: In physics, entropy relates to the number of states that a system can be in. If a system actually contained absolutely nothing, then (quantum mechanical considerations aside), it would only have one state, and therefore would have 0 entropy (there would be no uncertainty at all about what state it is in). In reality though, things are more complex. Particles and anti-particles will pop into existence in pairs in empty space.
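
In symbols, that’s Boltzmann’s entropy formula, S = k·ln(W), with W = 1 possible state.  Trivial to evaluate:

```python
import math

k = 1.380649e-23          # Boltzmann constant, J/K
W = 1                     # a system with exactly one possible state
print(k * math.log(W))    # 0.0 -- no states to be uncertain about, no entropy
```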


Q: How can quantum computers break encryption?

Physicist: What follows is the famous Shor algorithm, which (given a big enough quantum computer) can break any RSA encryption key.


The problem: RSA, the most common form of public key encryption, is based on the fact that large numbers are hard to factor.  Without going into too much detail, an encryption key M, is the product of two large primes P and Q (M=PQ), and breaking the key comes down to finding P and Q, given M.  This is difficult and can’t be done in any reasonable time for large values of M.

 

For example, M=6563955109193980058697529924699940996676491413219355771.  What are P and Q?

(ans: 8764325985409367513190343 and 748940091927375783904810247597.  Prime numbers supplied by numberempire.com)

One method involves modular arithmetic.  In modular math you take a number, M, and every time you deal with a number larger than M, you subtract M until you’re dealing with a number smaller than M.  Clocks are the most commonly seen example of “mod 12” arithmetic.  For example, 31 o’clock is the same as 19 o’clock is the same as 7 o’clock.

You could re-write that as [31]_{12} = [19]_{12} = [7]_{12}.  There’s a lot of interesting stuff (card tricks) involving modular math in this older post, so if you’re not familiar with modular math jump over there real quick.
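
In Python, mod is the % operator, so the clock example is one line:

```python
print(31 % 12, 19 % 12, 7 % 12)   # 7 7 7 -- all the same o'clock
```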

Now check this out:

\begin{array}{lll} &[2^0]_{15}=[1]_{15}\\&[2^1]_{15}=[2]_{15}\\&[2^2]_{15}=[4]_{15}\\&[2^3]_{15}=[8]_{15}\\&[2^4]_{15}=[1]_{15}&([16]_{15}=[1]_{15})\\&[2^5]_{15}=[2]_{15}&([32]_{15}=[2]_{15})\\&\cdots\end{array}

This pattern: 1,2,4,8,1,2,4,8,1,… will repeat forever.  Here’s why that’s useful.  For a given A (it doesn’t really matter what A is), if you can find the lowest value, r, for which [A^r]_M=[1]_M then, if r is even, you can do this:

\begin{array}{ll} &[A^r]_M=[1]_M\\&\Rightarrow [A^r-1]_M=[0]_M\\&\Rightarrow [A^{2\frac{r}{2}}-1^2]_M=[0]_M\\&\Rightarrow [(A^{\frac{r}{2}}-1)(A^{\frac{r}{2}}+1)]_M=[0]_M\end{array}

If r isn’t even you change A and try again.  When you say something is equal to [0]_M, what you mean is that it’s a multiple of M.  So (A^{\frac{r}{2}}-1) and (A^{\frac{r}{2}}+1) have factors in common with M, and yet (for mathy reasons) neither of them is a multiple of M on its own.

So in the “15” example, A=2, M=15, and r=4.

\begin{array}{ll} [2^4]_{15}=[1]_{15}\\\Rightarrow [2^4-1]_{15}=[0]_{15}\\\Rightarrow[(2^2-1)(2^2+1)]_{15}=[0]_{15}\\\Rightarrow [(3)(5)]_{15}=[0]_{15}\end{array}

Boom!  There are your factors!  P=3 and Q=5.
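
The whole classical reduction fits in a few lines of Python: brute-force the order r (fine for tiny M like 15, hopeless for real key sizes, which is the whole point), then pull the factors out with a gcd.  This sketch assumes gcd(A, M) = 1; otherwise gcd(A, M) is already a factor and you’re done early:

```python
from math import gcd

def find_order(A, M):
    # smallest r > 0 with A^r = 1 (mod M); assumes gcd(A, M) = 1
    r, value = 1, A % M
    while value != 1:
        value = (value * A) % M
        r += 1
    return r

A, M = 2, 15
r = find_order(A, M)                # r = 4: the 1,2,4,8,1,2,4,8,... period
assert r % 2 == 0                   # if r were odd, pick a different A and retry
print(gcd(A ** (r // 2) - 1, M),    # 3
      gcd(A ** (r // 2) + 1, M))    # 5
```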

For large values of M however, you can’t just raise A to higher and higher powers and wait until [A^r]_M=[1]_M.  Heavens no.  For example, if you were trying this technique on the M from the top of this section you’d find that [2^{6563955109193980058697529175751084743315298140895917832}]_M= [1]_M.

It’s easy to raise A to a large power once, but there isn’t enough time in the universe to raise it to every power up to some huge number.

Enter quantum computing.  A quantum computer has no problem raising A to many, many powers at the same time.


The solution (The Shor algorithm): So the name of the game is to find that “r” value.  It has two properties: it’s the smallest number such that [A^r]_M=[1]_M, and, as you raise A to higher and higher powers, r is how long it takes for the pattern to repeat (like in the “15” example: 1,2,4,8,1,2,4,8,…).

 

2^x mod 15. This is the example used to demonstrate the Shor algorithm below. The important thing to notice is that the function repeats.

In a nutshell: You’d like to find the value of r such that the function ([A^x]_M) repeats every r, so that you can do the “(A^{r/2}+1)(A^{r/2}-1)” trick.  Since this function has such a nice, repeating pattern the “Fourier transform” is very sharp.  The Fourier transform breaks down signals into their component frequencies, and a repeating function has a very definite frequency.  You can use that frequency to give you r.  The smaller r is, the faster the function repeats, and the higher the frequency, and the larger r is, the slower the function repeats, and the lower the frequency.

So why do you need a quantum computer to do this?  You need the function to exist in the computer all at once.  If (like in a conventional computer) you can only have one value at a time, suddenly it doesn’t make sense to talk about the “frequency” of the function.

A classical computer can only see one value of x at a time, so there's no sense talking about "repeating" and "frequencies". A normal computer can do many values in a row, and different computers can handle different values at the same time, but you need a quantum computer to have multiple values in the same processor.

In what follows I’ll go through the algorithm step by step, then run through an example (factoring 15), indicated by italics.  It’s not necessary to understand all of the math in detail to understand the ideas.  So please: don’t stress.

The computer starts with two registers.  In what follows the notation “| 1\rangle | 2\rangle” means the first register holds a 1 and the second register holds a 2.  Several of these added together means that the computer is in multiple states at the same time.  For example: “| 1\rangle | 2\rangle + | 3\rangle | 4\rangle” means that the computer is holding two states at the same time.  1 and 3 in the first register, 2 and 4 in the second register.

Step 1) Initialize the first register to an equal super-position of every possible number from 0 to N-1, where N is some power of 2 that’s larger than M^2, and initialize the second register to zero.

N has to be a power of 2 because it’s dictated by the number of qbits in the first register, but for now, the only thing that’s important is that N is big.

The quantum state looks like this: \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1}|x\rangle |0\rangle.

In practice, for M=15, you’d want N>225.  But never mind that now, to save space we’ll use N=16.  The initialized state looks like:

\frac{1}{4}|0\rangle |0\rangle +\frac{1}{4}|1\rangle |0\rangle +\frac{1}{4}|2\rangle |0\rangle +\frac{1}{4}|3\rangle |0\rangle +\frac{1}{4}|4\rangle |0\rangle +\frac{1}{4}|5\rangle |0\rangle +\frac{1}{4}|6\rangle |0\rangle +\frac{1}{4}|7\rangle |0\rangle +\frac{1}{4}|8\rangle |0\rangle +\frac{1}{4}|9\rangle |0\rangle +\frac{1}{4}|10\rangle |0\rangle +\frac{1}{4}|11\rangle |0\rangle +\frac{1}{4}|12\rangle |0\rangle +\frac{1}{4}|13\rangle |0\rangle +\frac{1}{4}|14\rangle |0\rangle +\frac{1}{4}|15\rangle |0\rangle

The “1/4” in front of each term is there because the probability of an event is the square of the probability amplitude, which is what we’re looking at.  The “1/4” means that each term has a 1/16 ( = (1/4)^2 ) chance of being measured, if you measured right now.  Which you don’t.

Step 2) Define f(x)=[A^x]_M.  Take the first register, run it through f, and put the result in the second register.

Now: \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1}|x\rangle |f(x)\rangle

It doesn’t really matter what A you pick, so generally smaller, easier A’s are the way to go.

In this example, f(x)=[2^x]_{15}, so now you have:

\frac{1}{4}|0\rangle |1\rangle +\frac{1}{4}|1\rangle |2\rangle +\frac{1}{4}|2\rangle |4\rangle +\frac{1}{4}|3\rangle |8\rangle +\frac{1}{4}|4\rangle |1\rangle +\frac{1}{4}|5\rangle |2\rangle +\frac{1}{4}|6\rangle |4\rangle +\frac{1}{4}|7\rangle |8\rangle +\frac{1}{4}|8\rangle |1\rangle +\frac{1}{4}|9\rangle |2\rangle +\frac{1}{4}|10\rangle |4\rangle +\frac{1}{4}|11\rangle |8\rangle +\frac{1}{4}|12\rangle |1\rangle +\frac{1}{4}|13\rangle |2\rangle +\frac{1}{4}|14\rangle |4\rangle +\frac{1}{4}|15\rangle |8\rangle

The same repeating 1,2,4,8,… pattern shows up.

Step 3) “Look” at the second register.  One of the incredibly awesome things about quantum computing is that it’s sometimes necessary to entangle the outside of the computer with part of the internal mechanism.  You can describe this as “wave function collapse” or “projection” or whatever.  In this case, it has the effect that the vast majority of states “vanish”, and that those that remain are spaced at a regular interval (which turns out to be the “r” that the algorithm is looking for).  One of the clever things about the Shor algorithm is that it doesn’t matter what you see, just that it is seen.  (It doesn’t have to be seen by a person or anything).  Let’s say that the observed value of f(x) is “B”.  The new state looks like:

\sqrt{\frac{r}{N}} \sum_{j=0}^{N/r} |x_0+jr\rangle |B\rangle

This is just the collection of all of the inputs that could have led to f(x)=B, starting at the lowest value of x that could do it (x_0) and then every rth number after that.

Let’s say, for example, that you look at the second register and you see “8”.  As a result the second register is in a “definite state” (|8\rangle), but the first register is still in a super-position of several states.

\frac{1}{2}|3\rangle |8\rangle +\frac{1}{2}|7\rangle |8\rangle +\frac{1}{2}|11\rangle |8\rangle +\frac{1}{2}|15\rangle |8\rangle

The states that “disappeared” are still being used.  They’re just being used by different versions of you that saw different results (1, 2, and 4).  That should be pretty off-putting, so just roll with it.  Not that it matters, but in this case B=8, x_0 = 3, and r=4.  This r is what the algorithm is looking for.

Step 4) Take the “quantum Fourier transform” of the first register.  The Fourier transform takes in waves and spits out the frequencies of those waves.  If you have a function, f, then its Fourier transform is written “\hat{f} ” (read “f hat”).  A good way to picture the Fourier transform is as a piano keyboard.  Sound itself is just air moving back and forth (f), but if it’s moving back and forth quickly, then you must be playing the high keys (\hat{f}).

Functions (left column) and their Fourier transforms (right column): Sin(x), Sin(2x), and Sin(x)+Sin(2x)

In this case the pattern of numbers that exist in the first register are all evenly spaced (r apart).  This is a very regular tone, so the Fourier transform will have sharp spikes like the examples in the picture above.  The new state, after the quantum Fourier transform is:

\frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \left[\sqrt{\frac{r}{N}} \sum_{j=0}^{N/r} e^{-2\pi i \frac{k}{N} (x_0+jr)} \right] |k\rangle |B\rangle =\frac{\sqrt{r}}{N} \sum_{k=0}^{N-1} e^{-2\pi i \frac{k}{N} x_0} \left[ \sum_{j=0}^{N/r} e^{-2\pi i \frac{kr}{N}j} \right] |k\rangle |B\rangle

The Fourier transform is the part of this process that takes advantage of constructive and destructive interference.

The Fourier transform of the example state in step 3 is:

\frac{1}{2}|0\rangle |8\rangle + \frac{i}{2}|4\rangle |8\rangle -\frac{1}{2}|8\rangle |8\rangle -\frac{i}{2}|12\rangle |8\rangle

Yes, those are imaginary i’s (as in i = \sqrt{-1}), but it’s cool.  The probability is equal to the square of the absolute value of each of those numbers, and the absolute value of i is 1.  So, for example, \left|\frac{i}{2}\right|^2 = \left(\frac{1}{2}\right)^2 = \frac{1}{4}.  Each of these four values of k (0, 4, 8, and 12) has an equal one-in-four chance of being measured.

Step 5) Measure the first register.  This will give you another number.  The “spikiness” of the Fourier transform in the last step means that when you measure the value of the first register, you’ll be most likely to measure values of k such that \frac{kr}{N} is very near, or is equal to an integer.  The last equation in step 4 is mostly fluff.  The important part is the “\sum_j e^{-2\pi i \frac{kr}{N}j}” part.  When \frac{kr}{N} is close to an integer, then this summation looks like “1+1+1+1+1….” and ends up very large.  Otherwise, it ends up canceling itself out.  For example, if \frac{kr}{N}=\frac{1}{2}, then the summation becomes “1-1+1-1+1-1…”.

The important summation as seen in the complex plane. When kr/N is close to an integer every term in the summation "agrees" and makes the sum bigger. When kr/N is not close to an integer the terms of the sum "disagree" and the terms cancel each other out, leading to a very small total sum.

This is just the very mathematical side of saying “the Fourier transform is very spiked”.  For a better understanding of this math, look up “complex plane”, “complex exponential”, and “geometric sum”.

The probability of measuring any of the values of the state \frac{1}{2}|0\rangle |8\rangle + \frac{i}{2}|4\rangle |8\rangle -\frac{1}{2}|8\rangle |8\rangle -\frac{i}{2}|12\rangle |8\rangle is 1/4.  Notice that each value of k makes \frac{kr}{N} an integer!  (remember that r=4 and N=16)  Let’s say that (by random chance) we measure the first register and get k=12.  It actually doesn’t matter which of the values we get (0, 4, 8, or 12), which is a big part of what makes this algorithm awesome (really awesome).
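
For the brave: here’s a classical statevector simulation of steps 1 through 5 with NumPy, for the A=2, M=15, N=16 example.  Arrays of complex amplitudes stand in for the registers (a real quantum computer doesn’t have to store all of them, which is the whole trick):

```python
import numpy as np

A, M, N = 2, 15, 16

# Steps 1-2: amplitude 1/sqrt(N) on each x, paired with f(x) = A^x mod M
f = np.array([pow(A, x, M) for x in range(N)])
amps = np.ones(N, dtype=complex) / np.sqrt(N)

# Step 3: "look" at the second register; suppose we saw B = 8
B = 8
amps = np.where(f == B, amps, 0)       # only the states with f(x) = B survive
amps /= np.linalg.norm(amps)           # renormalize

# Step 4: quantum Fourier transform of the first register
k = np.arange(N)
QFT = np.exp(-2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)
amps = QFT @ amps

# Step 5: measurement probabilities; the spikes sit at multiples of N/r = 4
probs = np.abs(amps) ** 2
print(np.flatnonzero(probs > 1e-9))    # [ 0  4  8 12], each with probability 1/4
```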

Step 6) Do some math.  So, in step 5 you measure a value of k such that \frac{kr}{N} \approx \ell, where \ell is an integer.  You know what N is (it’s the number of input states from step 1), and you know what k is (just measured it), but r and \ell are unknown.  So what you’ve got is: \frac{k}{N} \approx \frac{\ell}{r}, an approximation of one rational number with another.

I didn’t want to make a big deal about it, but for the number, M, that you’re factoring, you’ll need N>M^2.  And, since M>r, N>r^2 as well.  It turns out that, given this condition, for a given value of k you’ll have a uniquely determined \ell and r.  You don’t care what \ell is, so chuck it out and keep r.  The way you figure out what \frac{\ell}{r} is from \frac{k}{N} is by using a trick called “Approximation by Continued Fractions”.

In the last step the measured value was k=12.  So, \frac{\ell}{r} \approx \frac{12}{16}=\frac{3}{4}.  In this case the approximation was exact, and r=4 and \ell=3 (not that it matters).
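
Python’s fractions module has this approximation built in: Fraction.limit_denominator finds the best rational approximation with a bounded denominator (the bound here is r < M).  A sketch, run on this example and on the M=77 example coming up below:

```python
from fractions import Fraction

print(Fraction(12, 16).limit_denominator(14))      # 3/4: so r = 4 (M = 15)
print(Fraction(5188, 8192).limit_denominator(76))  # 19/30: so r = 30 (M = 77)
```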

So, finally, r=4!  We now know that [2^4]_{15} =[1]_{15}. So:

\begin{array}{ll} &[2^4-1]_{15} =[0]_{15}\\& \Rightarrow [(2^2-1)(2^2+1)]_{15} =[0]_{15}\\& \Rightarrow [(3)(5)]_{15} =[0]_{15}\end{array}

We can now say, with heads held high, that 15 = 3×5.  Tell your friends!


The “15” example is a little ideal (but also simple enough that you can look at each step in detail), so here’s one more example of the math with bigger numbers.

 

Say M=77, and you want to find the factors.  Then you choose A=2 (because it’s easiest), and since M^2=5929, you choose N = 8192 (a 13 bit register).  You run through the algorithm and find that k=5188 (it could have been any one of many values, but this is what you got this time).  Now you know that \frac{5188}{8192} \approx \frac{\ell}{r}, where r<M.  Using approximation by continued fractions you find that the closest values of \ell and r that work are \frac{19}{30}.  So, your guess for r is r=30.

The two numbers you get are: [2^{30/2}-1]_{77} = [32767]_{77} = 42 and [2^{30/2}+1]_{77} = [32769]_{77} = 44.  Now obviously, neither 42 nor 44 is a factor of 77.  However, they each share a factor in common with 77.  All you have to do now is find the greatest common divisor (which is an extremely fast operation).

gcd(77,42) = 7

gcd(77,44) = 11
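
Checking that run is quick, since Python’s three-argument pow does modular exponentiation without ever writing out the big power:

```python
from math import gcd

half = pow(2, 30 // 2, 77)                    # [2^15]_77 = 43
print(half - 1, half + 1)                     # 42 44
print(gcd(77, half - 1), gcd(77, half + 1))   # 7 11 -- and 7 x 11 = 77
```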


The Shor algorithm works perfectly (gives you a useful value of k) more than 40% of the time.  That may not sound great.  But if it doesn’t work the first time, why not try it a few thousand more times?  On a quantum computer, this algorithm is effectively instantaneous.

 

I left a couple of details out of the math here because, frankly, this post is a little over the top.  If you read all the way to here, buy yourself a drink and take a nap.  However, if you’re interested in exactly why you need N>M2, how to execute the quantum Fourier transform, and why the algorithm works better than 40% of the time, then there are some rough notes in this pdf: Shor.


A commenter pointed out that the equations in this post may not be showing up for everyone.  In partial remedy, here’s a pdf of the above post: quantum_RSA_Shor.

 


Q: How does quantum computing work?

The original question was: Could you give a description of the principles behind quantum computing? And how is it that some problems have a better time-complexity when they’re run on a quantum-computer?


Physicist: Particles and sets of particles are frequently seen to be in multiple states simultaneously.  The internal mechanisms of a quantum computer are, similarly, in many states at the same time.  An ordinary computer can take a single input, do a single calculation, and output a single result.  A quantum computer can take many inputs, do many calculations, and produce many results at the same time.

The number of parallel computations increases exponentially with the number of qbits (quantum bits) involved.  So, while a normal computer can take exactly one input with N bits, a quantum computer can take 2^N inputs, simultaneously, with N qbits.
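
That exponential growth is easy to feel classically: simulating N qbits means tracking 2^N complex amplitudes.  A minimal sketch (20 qbits, all in an equal superposition):

```python
import numpy as np

N = 20
state = np.full(2 ** N, 1 / np.sqrt(2 ** N), dtype=complex)
print(state.size)                      # 1048576 amplitudes for just 20 qbits
print(np.sum(np.abs(state) ** 2))      # ≈ 1.0 -- probabilities still add up
```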

Like a Turing machine, the exact construction of a quantum computer (and there are a lot of extremely different constructions) isn’t particularly important for the philosophy behind its functioning.  Some example architectures include: ion traps, anyon world line manipulation, and diamond spintronics.

In the picture below (and in general) the notation “|x\rangle” is read as the “x state”.  Nothing special, just notation.  The number in front of a state is that state’s “probability amplitude”.

One method for generating qbits is to fire a photon at a beam splitter. The photon both passes through (0) and is reflected (1). Unlike an ordinary bit, a qbit is in both states at the same time. If you set this up several times you can get several qbits, and you can talk about larger numbers. With two qbits you can have the numbers/states 0, 1, 2, and 3 (00, 01, 10, and 11 in binary) all at the same time.

But, a quantum computer is more than just a lot of normal computers running in parallel.  Those different internal states interfere with each other, adding and subtracting from each other the same way waves can undergo constructive and destructive interference.  Quantum interference is the means by which a quantum computer does its computations (the quantum part anyway).  Most proposed quantum computers can also do classical computations.

Therein lies the problem with quantum computers.  A quantum computer can do all of those calculations simultaneously, but the answer it spits out is a jumbled mess of every possible answer.  The trick to a good quantum algorithm is to find a way to express that jumbled mess as something useful.

An excellent (free) reference can be found here.  They are the lecture notes for a quantum computation class in the 90’s, by John Preskill.  Were I so equipped, I would have Preskill’s babies.  Awesome notes.  Reads like a good book.

To understand quantum computers it helps to see an example of a quantum algorithm.  So, for the not-faint-of-heart and the girded-of-loin, what follows is answer gravy:


Deutsch’s problem: One of the classic examples of quantum computation is “Deutsch’s problem”.

Say you’ve got an impatient (quantum) Oracle, that has surprisingly strong opinions about all the numbers from 0 to 7, and hates (hates) answering questions.  So, after answering more than a couple of questions she starts stabbing.

If you give the Oracle a number it either says “Good number!” (signified by 1) or “Bad number!” (signified by -1).  Somehow you find out that either the oracle always says the same thing, or it says it likes exactly half of the numbers and hates half of the numbers.  So, for all the numbers 0 to 7 it could hate all of them, like all of them, or like 4 / hate 4.

The Quantum Oracle after three or more questions.

It’s worth noting here that Deutsch’s problem doesn’t really solve anything, it’s just an interesting example.

So, if you ask “0?” and the Oracle says “Good number!” you now know that either she likes all numbers, or likes half / hates half.  To know for certain you’d have to ask as many as 5 questions (it might like 0-3, and hate 4-7), and by then you’d be Oracle stabbed.

If only there were some way to ask it about every number at the same time…

Here it is: Prepare 3 qbits.  With these you can represent a super-position of every number between 0 (000) and 7 (111), using binary.  The Oracle (being a quantum Oracle) can respond to each number at the same time, when you give her a superposition of every possible number, but she’ll give you a superposition of every possible response.

The input is a quantum state that looks like this:

\frac{1}{\sqrt{8}} \sum_{x=0}^7 | x \rangle = \frac{1}{\sqrt{8}} | 0 \rangle + \frac{1}{\sqrt{8}}|1 \rangle +\frac{1}{\sqrt{8}}| 2 \rangle +\frac{1}{\sqrt{8}}| 3 \rangle +\frac{1}{\sqrt{8}}| 4 \rangle +\frac{1}{\sqrt{8}}| 5 \rangle +\frac{1}{\sqrt{8}}| 6 \rangle +\frac{1}{\sqrt{8}}| 7 \rangle

This is just an even combination of every number from 0 to 7 where (since probability is the square of the probability amplitude) the probability of each term being seen is 1/8.

If the Oracle’s answer to the number x is f(x), then as a result (never mind how), you can construct a new state: \left[ \frac{1}{8} \sum_{x=0}^7 f(x) \right] |0\rangle+(\cdots)|1 \rangle+ \cdots.  There are other weird summations for each of the other states (|1 \rangle, |2\rangle, \cdots) but for this problem the zero state is the only important one.

So if the Oracle hates all the numbers you have 8 -1’s (totaling -8), if the Oracle likes all the numbers you have 8 1’s (totaling 8), and if the Oracle likes half and hates half you have 4 1’s and 4 -1’s (totaling zero).  So, if the Oracle likes all the numbers the probability amplitude of the zero state (|0 \rangle) is 1, if she hates them all the amplitude is -1, and if she half-n-halfs the amplitude is 0.

Since probability is equal to the square of the amplitude, if she either hates or likes all the numbers then you will definitely measure a |0 \rangle (since 1^2 = (-1)^2=1), and if she half-n-halfs you’ll definitely see some other state, and not see a |0 \rangle (since 0^2=0).

Notice that that summation in front of the |0\rangle is over every number from 0 to 7.  Normally, you can’t ask about more than one number at a time, but because of “quantum parallelism” you can input multiple, over-lapping, numbers/states at the same time, and even process them simultaneously.  In an ordinary computer the number could be just zero, or just three, or whatever, but never all of them at the same time.  Notice also that when the Oracle “half-n-halfs” that summation becomes “+4-4=0”.  That’s destructive interference!  (the other possibilities involve constructive interference)

So, by asking a single “quantum question” you can discover instantly whether the Oracle always has the same opinion about the numbers 0-7, or has differing opinions.  Normally, you’d be stuck asking as many as 5 ordinary questions.  The same technique continues to work even for very large numbers.  For example, it’s still just one quantum question when you’re considering the Oracle’s opinion on the numbers 0 to 1,048,575 (which is 2^{20} numbers).
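
Here’s the one-question trick as a toy NumPy simulation (vectors of amplitudes standing in for the qbits; the Oracle is a function returning ±1).  The |0\rangle amplitude after transforming back is exactly the \frac{1}{8}\sum f(x) from above:

```python
import numpy as np

def ask_one_quantum_question(f, n=8):
    state = np.ones(n) / np.sqrt(n)                # equal superposition of 0..n-1
    state *= np.array([f(x) for x in range(n)])    # the Oracle's ±1 opinions, all at once
    amp0 = state.sum() / np.sqrt(n)                # |0> amplitude after transforming back
    return abs(amp0) ** 2 > 0.5                    # True: same opinion; False: half-n-half

print(ask_one_quantum_question(lambda x: 1))                  # likes all: True
print(ask_one_quantum_question(lambda x: -1))                 # hates all: True
print(ask_one_quantum_question(lambda x: 1 if x < 4 else -1)) # half-n-half: False
```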

But, the drawback is always that the answer comes back as a combination of every input, requiring the writer of the quantum algorithm to be crazy-clever.


Q: What causes buoyancy?

The original question was: I know that the buoyant force on an object equals the weight of the liquid that it displaces.  However, I have some questions as to what it means to ‘displace’ a liquid,  for example, does an object that rises vertically from the floor of a pool to the surface of the water, such as the very wall of the pool, displace the water and therefore have a buoyant force exerted on it? For that matter, does a shoreline displace the ocean?
If this is true, as I think it is, that an object has buoyancy even if there is no liquid directly below it, then where does the force come from?  I have always imagined that buoyancy originates from particles of the higher density liquid pushing against the bottom of the submerged object.  However, this doesn’t really make sense, because I know a kickboard will always float to the surface of the water even if you attempt to push it all the way to the bottom.
Can you help me understand how this works?


Physicist: All that gravity wants to do is get as much matter as low as it can.

Counter balance scales: Despite the lighter side of the scale being elevated (seemingly defying gravity) there's still a net lowering of mass because the heavier side drops. But, it doesn't drop as fast as it would with no counter weight.

Normally things can just fall,  but if there’s a fluid in the way, everything needs to be rearranged.  For example, as a lead weight sinks in water it’s forcing the water just below it to be moved just above it.  Although there is some mass being moved upward (the water) the net result is that more mass is moved lower, so the lead sinks.

An immersed object displaces water as it rises and falls. As it falls some amount of water has to be elevated. So, the total amount of matter moving downward is less than it is for the same amount of moving matter that's not in water. As a result, the immersed object feels less total downward force.

Similarly, the kickboard will float to the surface because as it moves up the water just above it is moved to just below and, since the water is more dense, the net result is that more mass is moved lower.  If you could somehow get the kickboard absolutely flush against the bottom of a pool it should stay there, but that’s really hard to do.  Maybe with a flat board, and a flat, non-porous pool floor?

Buoyancy can also be derived in terms of the difference in pressures between the top and bottom of an object.  The deeper you are in water, the higher the pressure.  So the bottom of an object is always subject to a higher pressure than the top, resulting in a net upward force.
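
A worked example of the pressure-difference version, for a 10 cm cube whose top sits one meter underwater (all numbers made up for illustration):

```python
rho, g = 1000.0, 9.81          # density of water (kg/m³), gravity (m/s²)
side, depth = 0.1, 1.0         # cube edge (m), depth of the cube's top face (m)

p_top = rho * g * depth               # pressure on the top face (Pa)
p_bottom = rho * g * (depth + side)   # higher pressure on the bottom face

F_up = (p_bottom - p_top) * side**2   # net upward force from the difference
print(F_up)                           # ≈ 9.81 N...
print(rho * g * side**3)              # ...exactly the weight of the displaced water
```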

In either case, if there’s no water below an object, there shouldn’t be any buoyant force.

Objects surrounded by water are subject to the buoyant force, but mere contact with water isn't enough.

Personally, I find the counter-balance analogy easier to think about.  For example, it’s easier to understand why the water displaced by a boat weighs exactly as much as the boat itself, from a counter-balance point of view.  But the pressure-difference approach works exactly as well, and ultimately is a “safer” way to do the math.  They give you exactly the same answers; they’re just two different ways of looking at the situation.
