Q: Is the quantum zeno effect a real thing?

Quick aside: “Zeno’s paradox” was originally proposed (probably) by Zeno, who was basically trying to show that movement is impossible.  His idea was that in order to get to where you’re going you need to cover half of the distance, then half of what remains (1/4), then half of what remains (1/8), and so on.  So you’re covering an infinite number of half-ways, which just can’t be done since there’s an infinity in there.  Mathematicians today ain’t shook, firstly because there aren’t any actual problems with the math (the infinite sum 1/2 + 1/4 + 1/8 + … adds up to exactly 1), and secondly because they can walk over to their chalkboards to check.

The “Quantum Zeno effect” or “Turing Paradox” is the idea that if you measure a quantum system repeatedly and rapidly you can prevent it from changing.   Kind of “a watched quantum pot never boils” sort of thing.  This is yet another example of the math behind quantum theory writing checks that nobody thought the experiments would be able to cash.  And yet…


Physicist: It is a real effect!

However, it’s not like if you pay attention to a uranium brick it becomes less radioactive and if you look away it becomes more radioactive.  The details depend on how you do the measurement, and (it’s worth mentioning) not on whether or not a conscious human being is doing the measuring.

R.I.P. Schrödinger Jr. (1929-1935)

As always, the easiest thing to consider is light, which wears its quantum mechanical properties like they’re going out of fashion.  There are a lot of things that cause the polarization of light to rotate, like passing it through sugar water.  Just by measuring the polarization rapidly you can actually halt (or nearly halt) the rotation.

Horizontally polarized light that passes through a series of “rotators” will eventually become vertically polarized and won’t pass through a horizontal polarizer. However, repeated measurements can prevent that from happening, despite the fact that the measurements aren’t really affecting the light.

“Measure” in this case means “pass the light through a polarizer”: if it passes through it has the same polarization as the polarizer, and if it doesn’t pass through then it didn’t have the same polarization.  The greater the difference in angle between the polarization of the light and the polarization of the polarizer, the lower the chance that the light will get through, until at 90° none of the light gets through.

Classically (“classical” means “before quantum mechanics”) the interaction that we see between a polarized photon and a polarizer is impossible.  It should be that if you have a vertically polarized photon and it tries to pass through a horizontal polarizer, at any time, it’ll be stopped.  A bunch of do-nothing measurements shouldn’t change that.  In the example above, the rotators should make it so that with each rotator the chance of the photon getting through the next horizontal polarizer gets smaller and smaller.

But the quantum mechanical nature of measurements is different and very weird.  If light passes through a polarizer it comes out having the same polarization as the polarizer, without actually changing or being affected by the polarizer (this is where the weirdness of quantum mechanics comes in).  A classical photon (or a classical anything) needs to be in one particular state.  But in quantum mech, as demonstrated by the Stern-Gerlach experiment (among others), even when you know that a photon is polarized in one particular direction, it’s still in multiple states.  It’s just a matter of picking a “basis”.  For example, a vertical state is a combination of the two diagonal states.

The state of a classical photon is what it is, regardless of how you measure it (blue). But the state of a quantum photon (actual photon) takes different forms depending on how it’s being measured.  In this particular case, a vertical state is exactly the same as a combination of diagonal states (red).

Not only can you halt the change of a quantum system through measurement, you can induce change.  For example, you can rotate the polarization of light by measuring it a few times.

If you shoot vertically-polarized light at a horizontal polarizer, nothing will get through (there’s no \rightarrow in \uparrow ).  But if you shoot it at a diagonal polarizer first, half will get through the diagonal polarizer ( \uparrow is half \nwarrow and half \nearrow), and then half of that will get through the horizontal polarizer (\nearrow is half \uparrow and half \rightarrow).  Just by measuring the polarization, some of the light has been changed from vertical to horizontal!  (the rest is destroyed)

Normally light can’t travel through a vertical polarizer and then a horizontal polarizer. But by putting a diagonal polarizer in between them, suddenly it’s possible (about 1/8 of the ambient light gets through all three). This “measurement effect” is impossible in classical physics.
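The arithmetic behind the three-polarizer trick is easy to check numerically.  Here’s a minimal sketch (assuming ideal polarizers and the quantum rule that each surviving photon comes out aligned with the polarizer it just passed):

```python
import math

def transmit(angles_deg, start_deg=90):
    """Fraction of photons (initially polarized at start_deg, 90 = vertical)
    that survive a sequence of ideal polarizers: cos^2 of each angle change."""
    p, current = 1.0, start_deg
    for a in angles_deg:
        p *= math.cos(math.radians(a - current)) ** 2
        current = a  # the photon leaves aligned with this polarizer
    return p

print(transmit([0]))      # vertical straight into horizontal: essentially 0
print(transmit([45, 0]))  # with a 45 degree polarizer in between: about 0.25
```

With an unpolarized source, the initial vertical polarizer passes half the light, which gives the 1/8 from the caption: 0.5 × 0.25 = 0.125.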

By replacing the single 45° intermediate polarizer with a fan of dozens of polarizers you can get almost all of the light through and rotate the polarization as much as you like.  In fact, this is the essential idea behind how most polarization rotators work.  For example, in the face of a digital watch, or in every pixel of an LCD screen, there’s a layer of liquid crystal that twists when an electrical voltage is passed through it.  It goes from acting like a series of vertical polarizers, to acting like a fan that can rotate light.  In this way it can rotate the polarization of the light to either pass through or be stopped by a stationary polarizer, which is generally the outermost surface of the screen.

The Quantum Zeno effect at work in the world of high fashion.

Answer gravy: The probability that a photon of one polarization will pass through a polarizer that’s offset by some angle θ is P=|\cos{(\theta)}|^2.  So for example, if the polarization of a photon is vertical and the polarizer is set at 45°, then there’s a 50% chance of the photon going through: P=|\cos{(45^\circ)}|^2=\left|1/\sqrt{2}\right|^2=1/2.  Once the photon goes through it assumes the polarizer’s angle of polarization.

Technically, this is a result of the polarizer removing the part of the photon’s probability amplitude that’s perpendicular to the polarizer.  Literally, the photon both does and doesn’t go through the polarizer, and the part that goes through is the part that “agrees” with the polarizer.

If you have a fan of N polarizers separated from each other by θ, then the probability of a photon making it through all of them is P=\left(|\cos{(\theta)}|^2\right)^N=|\cos{(\theta)}|^{2N}.  So, let’s say you want to get a bunch of photons to end up rotated by 90° (π/2 radians) by passing them through N polarizers, each offset from the one before by θ = 90°/N = π/(2N): P=\left|\cos{\left(\frac{\pi}{2N}\right)}\right|^{2N}.  But check this out!

\begin{array}{ll}P=\left| \cos^2{\left(\frac{\pi}{2N}\right)}\right|^N\\= \left| 1-\sin^2{\left(\frac{\pi}{2N}\right)}\right|^N\\\ge \left| 1-\left(\frac{\pi}{2N}\right)^2\right|^N&\left(-\sin^2{(x)}\ge -x^2 \right)\\\ge 1-N\left(\frac{\pi}{2N}\right)^2&\left((1-x)^N \ge 1-Nx \right)\\= 1-\frac{\pi^2}{4N}\end{array}

The second term there gets real small, fairly fast, so by adding more and more polarizers at smaller and smaller angles, the chance of a photon getting all the way through can get arbitrarily close to 100%.  For example, 90 polarizers that each shift by 1° allow over 97% of the light to get through.
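The bound derived above is easy to verify numerically.  A quick sketch comparing the exact survival probability against 1 − π²/(4N):

```python
import math

def fan_probability(N):
    """Chance a photon survives N polarizers, each rotated 90/N degrees
    from the one before it."""
    return math.cos(math.pi / (2 * N)) ** (2 * N)

for N in (9, 90, 900):
    p = fan_probability(N)
    assert p >= 1 - math.pi ** 2 / (4 * N)  # the bound from the derivation
    print(N, round(p, 4))
```

fan_probability(90) works out to about 0.973, matching the “over 97%” figure for ninety 1° steps.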

In classical physics (where light can only be in one definite state), you’d expect very nearly zero light to get through, because in that fan of 90 polarizers, some of them will be perpendicular or nearly perpendicular to any given photon that comes along.

The cat picture is from here.

The professional looking polarizer picture was taken from this article.

Posted in -- By the Physicist, Physics, Quantum Theory | 12 Comments

π day!

We suddenly got hit with a barrage of π questions today.  Turns out it’s not a coincidence.

π is defined, very humbly, as the ratio of the circumference of any circle to its diameter.  From that definition alone it’s managed to worm its way into damn near every branch of math and physics.  For example, did you know that 1+\frac{1}{2^2}+\frac{1}{3^2}+\frac{1}{4^2}+\frac{1}{5^2}+\cdots=\frac{\pi^2}{6}?  It even shows up in the mathematical form of the vaunted Heisenberg Uncertainty Principle: \Delta x \Delta p \ge \frac{h}{4\pi}.

But what’s often cited as the most exciting thing about π is that its decimal expansion, “3.14159…”, goes on forever without repeating.  This isn’t really special to π.  In fact this is the case with (effectively) all irrational numbers.  But π is probably most people’s first exposure to the weirder realities of math, so it’s near and dear to a lot of hearts out there.


Q: Could the sequence of numbers making up the infinite expansion of Pi (or any other irrational number) be considered to make up an even random distribution? If not, how does it differ? If yes, couldn’t it be used when randomness is needed?

Mathematician: The digits of Pi are certainly not random, but its first few billion digits work well enough as random numbers for a lot of applications (i.e. it acts nicely as a source of pseudo random numbers).
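One way to see this in action: generate digits of π with a spigot (this sketch uses Gibbons’ unbounded spigot algorithm) and check that each digit 0–9 shows up roughly 10% of the time, the way a fair random source would:

```python
def pi_digits(n):
    """First n decimal digits of pi, via Gibbons' unbounded spigot algorithm."""
    q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
    digits = []
    while len(digits) < n:
        if 4 * q + r - t < m * t:
            digits.append(m)
            q, r, m = 10 * q, 10 * (r - m * t), (10 * (3 * q + r)) // t - 10 * m
        else:
            q, r, t, k, m, x = (q * k, (2 * q + r) * x, t * x, k + 1,
                                (q * (7 * k + 2) + r * x) // (t * x), x + 2)
    return digits

digits = pi_digits(2000)
counts = {d: digits.count(d) for d in range(10)}
# each digit should show up roughly 200 times out of 2000
print(counts)
```

Passing a crude frequency test like this doesn’t make the digits random (they’re completely determined!), but it’s the sort of behavior that makes them usable as pseudo-random numbers.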


Q: If it is true that Pi has all possible finite sequences, and the universe is finite, then the entire universe is somewhere described in the digits of Pi. Talk about your compression algorithms. “You can find a complete description of the universe, zip-encoded, starting at digit 10^120239234884840302929393482022039948393492039483940293849348203949384….”

Physicist: Assuming that decimal expansion of π (“3.1415…”) really is perfectly random, then yes; every possible finite description of the universe is encoded somewhere in the unending digits of π.  That said, there isn’t actually any compression.  If you think of any random number, for example, your 7 digit phone number, then the probability that any particular digit in π is the start of that particular string of 7 numbers is about 1 in 10^7.  That means that, on average, you’ll have to go out about 10^7 digits to find a particular phone number.  But to describe that “address” takes about 7 digits.

Point is, any given sequence of numbers probably exists somewhere in π, but the description of where to find that sequence is effectively always as long as the string of numbers itself.  Sometimes a little shorter, sometimes a little longer.  In fact, try it yourself!  So, if you want to find the string of numbers that describes an entire universe in detail, you’d need a computer about as big as that universe just to hold the location of where that number starts in π.
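You can simulate the “the address is as long as the string” effect with ordinary random digits (standing in for π, on the assumption that π’s digits behave randomly):

```python
import random

def position_of(target, rng):
    """Index at which a digit string first completes in a random digit stream."""
    window = ""
    for i in range(10_000_000):
        window = (window + str(rng.randrange(10)))[-len(target):]
        if window == target:
            return i
    return None

rng = random.Random(42)
positions = [position_of("314", rng) for _ in range(100)]
avg = sum(positions) / len(positions)
# a 3-digit string turns up, on average, around digit 10^3,
# so its "address" takes about 3 digits to write down
print(round(avg))
```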


Q: If we changed the math system away from a base 10 system could we find a system where π was not irrational?

Physicist: Rational numbers are numbers that can be expressed as one integer over another, like “\frac{a}{b}“.  What’s not involved in that definition is the base of the numbers involved, and it turns out not to matter.

The decimal form of a rational number always repeats forever.  It may take some fractions longer than others, and sometimes there’s a “settling down” period, but they always repeat.  In fact, the longest a pattern can go before repeating is at most one less than the denominator of the fraction.  Regardless of the base used.  For example:

5/7 = 0.714285714285714285714285… repeats every 6 digits.

37/40 = 0.925000000… settles down, and then repeats every 1 digit.

2/3 = 0.6666666… repeats every 1 digit.

In binary you’d write “2/3” as “10/11” and its binary representation is 10/11 = 0.10101010101…, which repeats every 2 digits (still less than 3, isn’t that strange?).

Using geometric series, you can convert any repeating number, in any base, into a fraction.  So anything with a repeating representation in decimal (base 10), binary (base 2), hexadecimal (base 16), whatever, can be written as one number over another (it’s a rational number).  Conversely, if a number is not rational, it can never have a repeating representation in any base.  It was difficult to prove the irrationality of π conclusively (for a couple thousand years), but we’ve known for about 250 years that π is definitely irrational, so there’s no way to write it in a way that repeats.
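Here’s a sketch that finds the repetend length of any fraction in any base: strip out the factors the denominator shares with the base (those cause the “settling down” period), then find the multiplicative order of the base modulo what’s left:

```python
from math import gcd

def repetend_length(num, den, base=10):
    """Length of the repeating block in the expansion of num/den in 'base'."""
    den //= gcd(num, den)
    g = gcd(den, base)
    while g > 1:          # factors shared with the base only cause
        den //= g         # the non-repeating "settling down" digits
        g = gcd(den, base)
    if den == 1:
        return 1          # terminating: "repeats every 1 digit" (all zeros)
    r, k = base % den, 1  # multiplicative order of base mod den
    while r != 1:
        r, k = (r * base) % den, k + 1
    return k

print(repetend_length(5, 7))     # 6, matching the example above
print(repetend_length(2, 3, 2))  # 2, the binary case
```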

Posted in -- By the Mathematician, -- By the Physicist, Math | 16 Comments

Q: Is there an intuitive proof for the chain rule?

Physicist: The chain rule is a tool from calculus that says that if you have one function “nested” inside of another, f(g(x)), then the derivative of the whole mess is given by \frac{df}{dx} (g(x))\cdot \frac{dg}{dx} (x).  There are a number of ways to prove this, but one of the more enlightening ways to look at the chain rule (without rigorously proving it) is to look at what happens to any function, f(x), when you muck about with the argument (the “x” part).

Doubling the "argument" of a function scrunches it up. As a result the slope at each corresponding part of the new function is doubled.

When you multiply the argument by some amount, the graph of the function gets squished by the same amount.  If you, for example, plug in “x=3” to f(2x), that’s exactly the same as plugging in “x=6” to f(x).  For f(2x), everything happens at half the original x value.

However, while f(2x) when x=3 is the same as f(x) when x=6, the same is not true of their slopes.  The slope (derivative) is “rise over run” and the run just became half as long, so the slope just got twice as big.  Scrunching a graph makes the slope steeper (see picture above).

So, the slope of f(2x) at x=3 is actually double the slope of f(x) at x=6.  You can write this in general as \frac{d}{dx} \left[ f(2x) \right] = \frac{df}{dx}(2x)\cdot 2.

Here’s the calculus leap: replacing the x in f(x) with 2x clearly means that you’re running through the function twice as fast, so when you take the derivative you just multiply by two to deal with the scrunching.  But, if you instead replace x with a more complicated function, g(x), then the amount of speed up and slow down depends on g(x).  If g(x) has a slope of 2 at some point, then it’s acting like 2x and you get the same “times two” slope.  If it’s got a slope of 3 or 1/5, then the slope of f at the corresponding point will be multiplied by 3 or 1/5 respectively.

sin(x) in blue and sin(x^2/4) in green. x^2 starts slow and gets faster and faster, and as a result the green line gets steeper and steeper.

So, to find the slope of f(g(x)), which is just the derivative, \frac{d}{dx}\left[f(g(x))\right] you first find what the slope of f would be at the appropriate x value, \frac{df}{dx}(g(x)), and then multiply by how much g is speeding things up or slowing things down (scrunching or expanding).  The slope of  g is just the derivative, so you’re multiplying by \frac{dg}{dx}(x).

Boom!  Chain rule: \frac{d}{dx}\left[ f(g(x))\right] = \frac{df}{dx} (g(x))\cdot \frac{dg}{dx} (x)
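A quick numerical sanity check, using the sin(x²/4) example from the figure (a central-difference approximation stands in for the “true” slope):

```python
import math

def numeric_slope(fn, x, h=1e-6):
    """Central-difference approximation of the derivative of fn at x."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

f_of_g = lambda x: math.sin(x ** 2 / 4)           # f(g(x)) with f=sin, g=x^2/4
chain = lambda x: math.cos(x ** 2 / 4) * (x / 2)  # df/dx(g(x)) * dg/dx(x)

for x in (0.5, 1.3, 2.7):
    assert abs(numeric_slope(f_of_g, x) - chain(x)) < 1e-6
```

The agreement is exactly what the scrunching argument predicts: the slope of the composite is the slope of sin at the appropriate point, times how fast x²/4 is changing there.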

It’s worth pointing out that, like all calc rules, it doesn’t matter that this rule only talks about two functions.  If you have something like f(g(h(x))), then you can treat g(h(x)) as one function, and you’ll find that after running through the chain rule once you’ll be faced with another, simpler, chain rule problem:

\frac{d}{dx}\left[f(g(h(x)))\right] = \frac{df}{dx}(g(h(x)))\cdot\frac{d}{dx}\left[g(h(x))\right]
Posted in -- By the Physicist, Equations, Math | 12 Comments

Q: How do you write algorithms to encrypt things?

Physicist: There are several algorithms, but almost all of them are based on “trap-door encryption”.  The idea is that you find some kind of mathematical process that’s easy to run forward, but effectively impossible to run backward, unless you know a trick (which you keep secret). It’s likened to a trap-door because (as every super-villain knows) it’s easy to go through a trap-door in one direction, but difficult in the other.  This is fundamentally different from the encoding schemes most people are familiar with, like “A=2, B=15, C=…” or Igpa Atinla, which are called “substitution ciphers”.  If you know how a substitution cipher is done, then you can not only encode a message, but you can decode it.

But encryption (as opposed to a “cipher”) is different.  Even if you know everything about how to encrypt (encode) a message, you will still be unable to decrypt (decode) it.  Encryption is relatively new (publicly known since 1977).  Even the famous Enigma Device that the Germans used during WW2 was just a rolling substitution cipher.

The idea is, you allow everybody to know how to do the forward operation (which is known as the “public key”), which allows them to encrypt a message, but keep the secret to the reverse operation to yourself (the “private key”), which allows you to decrypt the messages.  You can think of it as like telling everybody else how to build safes that they can put messages in, but keep to yourself the secret behind opening those safes.

So if you want to talk to a particular person, you use their particular public key.  The central idea of encryption is that you can set up a system where other people can talk to you, perfectly securely, while sending all of their messages through a completely open and public channel.  Were you and a friend so inclined, you could communicate with each other entirely through a sign in Times Square, and no one would ever know what you were saying.  You can even do this without meeting beforehand to set up a secret code!

By far the most common method today is RSA encryption.  You can think of RSA as a huge wheel with a different number written on each spoke in a scrambled order.  These numbers correspond to every possible string of letters or numbers of a particular length or shorter.  If you rotate the wheel all the way around (or a multiple of all the way around) you get your message back, but if you only rotate part of the way you get a random number (technically, a pseudo-random number).

RSA encryption. If your message is Johnny Von Purpleshirt, then by “turning the wheel” your encrypted message might be Doggington MacSmellingsomething. Decryption is just turning the wheel the rest of the way to get back to Johnny. The secret is in knowing how many “spokes” the wheel has.  If you know, then you know how to recover or decrypt the message.  If you don’t, then all you can do is encrypt.

The public key is turning the wheel a certain amount (not all the way), and the private key is how much you need to turn the wheel to get it the rest of the way.  The “secret” (and what keeps the private key private) is knowing exactly how big the wheel is.

RSA encryption is secure because the “wheel” involved typically has at least 10^150 “spokes”.  Even with full knowledge of the public key, the “size of the wheel” is really hard to pin down.  It could take any one of approximately 10^75 values.

If you want to send a message that’s too big to encrypt all at once, you just chop it up into smaller pieces and encrypt them one at a time.  This technique is not dissimilar to the standard means by which one eats something larger than one’s face.

Beyond RSA, if you want to create a new form of encryption (like elliptic-curve or knapsack encryption, for example), you just hire a mathematician who studies some obscure branch of number theory and wait for a while.

Completely ordinary cryptanalyst.

There are a hell of a lot of things that can be done with encryption other than just sending messages.  There’s shared random secret distribution, e-cash, e-signatures, secure voting, all kinds of stuff.  It’s awesome.


Answer Gravy: First, anything you can write with words you can turn into a number.  For example, what you’re reading now is being stored on a server somewhere in the form of a bucket of 1’s and 0’s.  So any discussion of codes and encryption can be reduced to a discussion of numbers.  This is going to be pretty dense and fast:

The ideas behind RSA encryption are modular math and some interesting consequences from group theory.  Modular arithmetic is what you’re doing when you try to figure out what time it will be in more than 12 hours.  For example, if it’s 9:00, then in 5 hours it will be 2:00.  This is “mod 12” arithmetic.  Every time a number is larger than 12 you subtract 12 until it’s smaller.  This “9:00 + 5” example can be written [9+5]_{12}=[14]_{12}=[2]_{12}=2.

There’s a function called the Euler phi, “φ(n)”, that’s defined as the number of numbers less than n that have no prime factors in common with n.  For example, φ(10)=4.  The factors of 10 are 2 and 5, and there are 4 numbers less than 10 that don’t contain any 2’s and 5’s: 1, 3, 7, and 9.
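A brute-force sketch of φ(n), perfectly fine for small numbers (real implementations use the prime factorization instead):

```python
from math import gcd

def euler_phi(n):
    """Count of numbers 1 <= x < n that share no prime factors with n."""
    return sum(1 for x in range(1, n) if gcd(x, n) == 1)

print(euler_phi(10))  # 4: namely 1, 3, 7, and 9
```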

It so happens that [x^{\varphi(m)}]_m=1, for any x*.  For example, [3^{\varphi(10)}]_{10}=[3^4]_{10}=[81]_{10}=[1]_{10}=1 or [7^{\varphi(10)}]_{10}=[7^4]_{10}=[2401]_{10}=[1]_{10}=1.  Notice what happens when you raise a number to the “j\varphi(m)+1” power:

\begin{array}{ll}[x^{j\varphi(m)+1}]_m\\=[x^{j\varphi(m)}x]_m\\=[\left(x^{\varphi(m)}\right)^jx]_m\\=[\left(1\right)^jx]_m\\=[x]_m\end{array}

So, if you raise any x to a particular power (mod m) it eventually cycles back and you get x again.  The process of encrypting something is nothing more than getting x part of the way through the cycle, and decryption is just completing the cycle and coming back to x.
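You can watch the cycle happen with m = 10 (so φ(m) = 4): raising x to any jφ(m)+1 power gets you right back to x, mod m.  It even works for x = 2 and other numbers sharing a factor with 10, since 10 is a product of two primes (see the footnote below):

```python
m, phi_m = 10, 4  # phi(10) = 4
for x in (2, 3, 7, 9):
    for j in (1, 2, 3):
        # x^(j*phi + 1) cycles back around to x, mod m
        assert pow(x, j * phi_m + 1, m) == x
```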

Now, say you’ve got a pair of numbers, k and ℓ, such that kℓ = jφ(m)+1.  To get from the original text, T, to the cyphertext, C, you just raise T to the kth power: [T^k]_m = C.  k is the public key.

To recover the original text just raise C to the ℓ power: [C^\ell]_m=[\left(T^k\right)^\ell]_m=[T^{k\ell}]_m=[T^{j\varphi(m)+1}]_m= T.  ℓ is the private key.

That’s basically all there is to RSA encryption.

To create m, you just need to find two large primes, p and q.  To find large primes you just pick a big number and use something like Fermat’s Little Theorem, or a more foolproof modern variant (the Miller–Rabin test, for example), to check whether or not your pick is prime.  Once you have those primes you can generate m=pq and φ(m)=(p-1)(q-1).

To create k you just need a random number that’s coprime to φ(m), and determining that is easy enough: you can use Euclid’s algorithm.  Once you have k and φ(m) you can find ℓ by solving a Diophantine equation, kx+φ(m)y=1, for x and y, and then \ell=[x]_{\varphi(m)}.  Alternatively, \ell=\left[k^{\varphi(\varphi(m))-1}\right]_{\varphi(m)}.  However, in order to calculate φ(x), you need to know the prime factors of x.  The factors of m are known, because m=pq, but the factors of φ(m) may not be known.

When you’ve got all your number-ducks in a row, you make m and k public.  This means everybody can encrypt.  But you keep ℓ, φ(m), p, and q private.  Without p and q, there’s no (easy) way to find φ(m), and without φ(m) there’s no (easy) way to find ℓ.  There is a very quick way to break encryption keys (find ℓ), but it involves hardware that doesn’t exist just yet.  Here’s how!
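All of the above in a few lines, with comically small primes (real keys use primes hundreds of digits long; these numbers are just for illustration):

```python
p, q = 61, 53               # the two secret primes
m = p * q                   # 3233, made public
phi_m = (p - 1) * (q - 1)   # 3120, kept private
k = 17                      # public key: coprime to phi(m)
l = pow(k, -1, phi_m)       # private key: inverse of k mod phi(m) (Python 3.8+)

T = 1234                    # the original text, as a number less than m
C = pow(T, k, m)            # anyone can encrypt...
assert pow(C, l, m) == T    # ...but only the keyholder can decrypt
```

Anyone who could factor m back into p and q would recover φ(m) and then ℓ, which is exactly why m has to be enormous.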


*More accurately, [x^{\varphi(m)+1}]_m=x and [x^{j\varphi(m)+1}]_m=x are always true, for any x, when m is the product of two prime numbers, but [x^{\varphi(m)}]_m=1 is only true when x and m have no common factors.  So, for example, [2^{\varphi(10)}]_{10}=[2^4]_{10}=[16]_{10}=[6]_{10}=6\ne1 and [2^{\varphi(10)+1}]_{10}=[2^5]_{10}=[32]_{10}=[2]_{10}=2.

The details behind why aren’t complicated, but they aren’t generally interesting either.  The point is, in this case the math still holds up and is easier to understand when you say “[x^{\varphi(m)}]_m=1“, even though this statement isn’t exactly true.

The wagon wheel picture is from here.

Posted in -- By the Physicist, Computer Science, Math, Number Theory | 8 Comments

Q: Satellites experience less time because they’re moving fast, but more time because they’re so high. Is there an orbit where the effects cancel out? Is that useful?

The original question was: I read that due to time dilation from both gravity and speed, GPS satellites need their clocks adjusted to match Earth’s time or else the whole idea would fall flat on its face. So my question is wouldn’t there be a natural way to match the GPS clocks with Earth by simply having the time dilation from movement offset the time dilation from gravity? How fast would they have to orbit the Earth to cancel out the gravitational time dilation?


Physicist: That’s a really good question!  The thumbnail sketch of time dilation is: fast things and low things experience less time.  So satellites above our head experience a little more time because they’re high, but a little less because they’re moving.

Something orbiting at ground level (assuming you could orbit at ground level) would be tearing along at about 8 km/s: same height as us, and great speed, means slower in time.

Something orbiting very far away is moving pretty slow.  Great height and low-speed means faster in time.  Somewhere in between there’s an orbit where the effects cancel.

It turns out that a satellite would have to orbit at about the same speed it would hit the ground if it fell, straight down, from the height of that orbit.  That orbit is 50% of the radius of the planet above the surface of the planet.  In the case of Earth, that’s 1,975 miles up, well below the 12,600 mile altitude of the GPS network.  So, their on-board clocks run a little faster than identical ground-based clocks (a gain of about 38 microseconds per day, or roughly 1.4 seconds per century).

For the non-physicists out there, I can’t tell you how unusually straightforward that 50% radius thing is.  You almost never get results that clean.

To a very good approximation the gravitational time dilation can be calculated by using “moving” time dilation and plugging in the speed you’d be falling, if you fell from that height.

The time-slowing and time-speeding effects cancel at an orbital height of 50% of the planet’s radius.  At this altitude the orbital speed, and the speed with which you’d hit the ground falling from that height, are equal.

You can picture this in terms of someone at the higher location dropping clocks to someone at the lower location, who has an uncanny power to read speeding clocks.  The laws of physics, including the ones regarding time, act more or less the way they “should” in a zero-g environment.  For example, in zero gravity if you knock a cup off of a table it doesn’t start magically moving (falling) for no reason.  Impossible!  The falling clocks are basically carrying an accurate record (not needing to worry about gravity) of the time frame from which they started falling.

At least that’s one method of calculating general relativistic time weirdness.  A better (but equivalent) one is here.

As for fixing the problem: no problem.  Relativity, although difficult to wrap your head around, is no mystery.  We know exactly how fast time is passing for every satellite in the sky.  To deal with it, all that’s needed is clocks that run at a slightly different rate.  The specifics can be found on page 7 of this.  Building a clock to keep a very particular “incorrect” time is exactly as difficult as building a clock to keep “correct” time.  No problem.


Answer Gravy: The “general relativity can be approximated by using special relativity and how fast things fall” works well so long as the gravity isn’t really messing up spacetime.  Basically, it doesn’t work on black holes, but it works great for planets and stars.  Once you figure out how fast something would be falling between two levels, you plug that into the “gamma function”, γ, which tells you how many times faster time is traveling at the higher level than the lower.

\gamma = \frac{1}{\sqrt{1-v^2/c^2}}

The speed, v, that an object with mass, m, will be going if it falls from somewhere in space, R, to ground level, r, can be found by looking at the difference in gravitational potential energy and setting that equal to the gain in kinetic energy: \frac{GMm}{r} - \frac{GMm}{R} = \frac{1}{2}mv_g^2.

This speed, vg, is what you use to find how much time is going faster for the satellite.

The speed of a (circular) orbit can be found by setting the gravitational force equal to centrifugal: \frac{GMm}{R^2} = \frac{mv_s^2}{R}.  This speed, vs, is what you plug in to find how much time is going slower.

In general the speed of the clocks on a satellite is faster than clocks on the ground by a factor of \frac{\gamma_g}{\gamma_s}=\frac{\sqrt{1-v_s^2/c^2}}{\sqrt{1-v_g^2/c^2}}.  However, in this case we’re looking for the overall factor to be 1 (time is the same for the satellite as the ground).  So, vg = vs.

For vg: v_g^2=\frac{2GM}{r} - \frac{2GM}{R}.

For vs: \frac{v_s^2}{R}=\frac{GM}{R^2}\Rightarrow v_s^2=\frac{GM}{R}.

\begin{array}{ll}\frac{2GM}{r} - \frac{2GM}{R} = v_g^2 = v_s^2 = \frac{GM}{R}\\\Rightarrow\frac{2GM}{r} - \frac{2GM}{R} =\frac{GM}{R}\\\Rightarrow\frac{2GM}{r} =\frac{3GM}{R}\\\Rightarrow\frac{2}{r} =\frac{3}{R}\\\Rightarrow \quad R =\frac{3}{2}r\end{array}
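Plugging Earth numbers into the result (a sketch; the values of G, M, and r are rounded):

```python
G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
M = 5.972e24    # mass of Earth, kg
r = 6.371e6     # radius of Earth, m
R = 1.5 * r     # the orbital radius where the two effects cancel

v_fall_sq = 2 * G * M / r - 2 * G * M / R  # v^2 falling from R to the ground
v_orbit_sq = G * M / R                     # v^2 for a circular orbit at R
assert abs(v_fall_sq - v_orbit_sq) / v_orbit_sq < 1e-9  # they match

altitude_miles = (R - r) / 1609.34
print(round(altitude_miles))  # roughly 1,979 miles up
```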
Posted in -- By the Physicist, Physics, Relativity | 11 Comments

Q: Is it possible to objectively quantify the amount of information a sentence contains?

The original question was: It seems to me that it’s impossible to measure the information content of a message without considering the recipient of the message. For example, one might say that a coin toss generates a single bit of information: the result is either heads or tails.  I beg to differ.  Someone who already knows that a coin would be tossed, only gains a single bit when he or she hears, “The quarter landed on heads.”  But someone else who didn’t know a coin was being tossed learns not only that the quarter landed on heads, but also that the quarter was tossed.  One could also deduce the speaker’s nationality or what sort of emotional response the coin toss has elicited.  If you’re creative enough, there’s no limit to how much information you can squeeze out of this supposedly one-bit sentence.  Is it really possible to objectively quantify the amount of information a sentence contains?


Physicist: Information theory (like any theory) is unrealistically simple.  By expanding the complexity of a model you’ll find that, in general, you can glean more information.  However, before you jump into figuring out how much information a coin flip could theoretically convey, you start with the simple case: heads and tails.  Then you can get into more complicated ideas like what kind of information can be conveyed with the fact of the coin’s flip, or its year of issue, or what kind of person would bother telling anybody else about their coin hobby.

In an extremely inexact, but still kinda accurate nutshell: the information of a sentence can be measured by how much you’re surprised by it.  The exact definition used by information theorists can be found in Claude Shannon’s original paper (if you know what a logarithm is, you can read the paper).  Even more readable is “Cover and Thomas“.
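Shannon’s measure, in miniature: the information (in bits) of a source is its expected “surprise”, −Σ p·log₂(p):

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # a fair coin: exactly 1 bit
print(entropy([0.9, 0.1]))  # a lopsided coin surprises you less
```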

Unfortunately, there is no completely objective way to quantify exactly how much information a sentence contains.  For example, each sentence in this post has substantially lower entropy (information) for an English language speaker than for a Russian language _____, since after each ____ there are relatively few other words that can reasonably follow, if you adhere to the grammatical rules and spelling __ English (this would be a _____ example if I could write ____).  For the Russian speaker every next word or letter is far less predictable.

If, a thousand years ago, somebody had said “To be or…” the next words would be pretty surprising.  Everyone listening would be expecting something like “…three bee or something like that, I mean I started running the second I saw the hive”, and would be surprised by the actual end of the sentence.  However, today people know the whole quote (“…not to be.”), so you could say that the sentence contains less information, because most people can predict more of the words in it.

The difficulty comes about because information is defined and couched in the mathematics of probability, and how much new information a signal brings you is based on conditional probability (which is what you’re using when you say something like “the chance of blah given blerg”).  So the amount of information you get from something is literally conditioned on what you already know.  So for example, if you send someone the same message twice, they won’t get twice the information.
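The “same message twice isn’t twice the information” point shows up with any decent compressor, since compressed length is a rough proxy for information content:

```python
import zlib

message = b"The quarter landed on heads, by the way. "
once = len(zlib.compress(message * 1))
twice = len(zlib.compress(message * 2))

# the second copy is nearly free: far fewer than double the bytes
assert twice < 2 * once
print(once, twice)
```

The compressor has, in effect, already “conditioned on” the first copy by the time it reaches the second.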

Real problems crop up when you don’t know what information the other person already has.  Not speaking the same language is a common example of this.  But if you have absolutely no “conditions” (things like; a coin is being flipped, written language exists, whatever), then there’s no way to send, or really even define, information.

This brings up buckets of questions about how babies learn anything.  You could also argue that this is why the cover of the Voyager Golden Record ended up making no damn sense.

What?

In case you don’t have the proper previous (and subjective) information to immediately understand what this plate is trying to say, here’s the cheat sheet.

Even so, you can still gauge how much information is being conveyed with pretty minimal conditioning, like “all the information here is contained in this string of symbols”.  In fact, you can determine if a string of unknown symbols is a language, or even what language (given a large enough sample)!

Posted in -- By the Physicist, Entropy/Information | 3 Comments