Q: If a long hot streak is less likely than a short hot streak, then doesn’t that mean that the chance of success drops the more successes there are?

One of the original questions was:  I understand “gambler’s fallacy”, where it is mistaken to assume that if something happens more frequently during a period then it will happen less frequently in the future.  Example:  If I flip a coin 9 times and each time I get HEADS, then to assume that it is more “probable” that the 10th flip will be tails is an incorrect assumption.

I also understand that before I begin flipping that coin in the first place, the odds against getting 10 consecutive HEADS are very long and not a mere 50/50.

My question is:  Is it more likely, more probable, more expectant, or is there a higher chance of a coin turning up TAILS after 9 HEADS?


Physicist: Questions of this ilk come up a lot.  Probability and combinatorics, as a field of study, are just mistake factories, in large part because single words can massively change a calculation, not just in the result but in how you get there.  In this case the problem word is “given”.

Probabilities can change completely when the context, the “conditionals”, change.  For example, the probability that someone is eating a sandwich is normally pretty low, but the probability that a person is eating a sandwich given that there’s half a sandwich in front of them is pretty high.

To understand the coin example, it helps to re-phrase in terms of conditional probabilities.  The probability of flipping ten heads in a row, P(10H), is P(10H) = \left(\frac{1}{2}\right)^{10}\approx 0.1\%.  Not too likely.

The probability of flipping tails given that the 9 previous flips were heads is a conditional probability: P(T | 9H) = P(T) = 1/2.

In the first situation, we’re trying to figure out the probability that a coin will fall a particular way 10 times.  In the second situation, we’re trying to figure out the probability that a coin will fall a particular way only once.  Random things like coins and dice are “memoryless”, which means that previous results have no appreciable impact on future results.  Mathematically, when A and B are unrelated events, we say P(A|B) = P(A).  For example, “the probability that it’s Tuesday given that today is rainy, is equal to the probability that it’s Tuesday” because weather and days of the week are independent.  Similarly, each coin flip is independent, so P(T | 9H) = P(T).
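If you’d rather check that distinction numerically than take the algebra’s word for it, here’s a quick Monte Carlo sketch in Python (the setup below is just one reasonable way to do it): it estimates both P(10H) and P(T | 9H) by brute force.

```python
import random

# Simulate many runs of ten flips and compare P(10 heads) with
# P(tails | the first 9 flips were heads).
random.seed(0)
trials = 1_000_000

ten_heads = 0          # runs where all ten flips were heads
nine_heads = 0         # runs where the first nine flips were heads
tails_after_nine = 0   # ...and the tenth flip came up tails

for _ in range(trials):
    flips = [random.random() < 0.5 for _ in range(10)]  # True = heads
    if all(flips):
        ten_heads += 1
    if all(flips[:9]):
        nine_heads += 1
        if not flips[9]:
            tails_after_nine += 1

print(f"P(10H)    ~ {ten_heads / trials:.5f}")           # about 0.001
print(f"P(T | 9H) ~ {tails_after_nine / nine_heads:.3f}")  # about 0.5
```

Ten heads in a row is rare, but among the runs that do start with nine heads, the tenth flip comes up tails about half the time.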

The probability of the “given” may be large or small, but that isn’t important for determining what happens next.  So, after the 9th coin in a row comes up heads everyone will be waiting with bated breath (9 in a row is unusual after all) for number ten, and will be disappointed exactly half the time (number 10 isn’t affected by the previous 9).

This turns out to not be the case when it comes to human-controlled events.  Nobody is “good at playing craps” or “good at roulette”, but from time to time someone can be good at a sport.  But even in sports, where human beings are controlling things, we find that there still aren’t genuine hot or cold streaks (sans injuries).  That’s not to say that a person can’t tally several goalings in a row, but that these are no more or less common than you’d expect if you modeled the rate of scoring as random.

For example, say Tony Hawk has already gotten three home runs by dribbling a puck into the end zone thrice.  The probability that he’ll get another point isn’t substantially different from the probability that he’d get that first point.  Checkmate.

Notice the ass-covering use of “not substantially different”.  When you’re gathering statistics on the weight of rocks or the speed of light you can be inhumanly accurate, but when you’re gathering statistics on people you can be at best humanly accurate.  There’s enough noise in sports (even bowling) that the best we can say with certainty is that hot and cold streaks are not statistically significant enough to be easily detectable, which they really need to be if you plan to bet on them.


Q: Where do the rules for “significant figures” come from?

Physicist: When you’re doing math with numbers that aren’t known exactly, it’s necessary to keep track of both the number itself and the amount of error that number carries.  Sometimes this is made very explicit.  You may, for example, see something like “3.2 ± 0.08”.  This means: “the value is around 3.2, but don’t be surprised if it’s as high as 3.28 or as low as 3.12… but farther out than that would be mildly surprising.”


120 ± 0.5 cm, due to hair-based error.

However!  Dealing with two numbers is immoral and inescapably tedious.  So, humanity’s mightiest accountants came up with a shorthand: stop writing the number when it becomes pointless.  It’s a decent system.  Significant digits are why driving directions don’t say things like “drive 8.13395942652 miles, then make a slight right”.  Rather than writing a number with its error, just stop writing the number at the digit where noise, error, and lack of damn-giving are big enough to change the next digit.  The value of the number and the error, in one number.  Short.

The important thing to remember about sig figs is that they are an imprecise but typically “good enough” way to deal with errors in basic arithmetic.  They’re not an exact science, and are more at home in the “rules of punctuation” schema than they are in the toolbox of a rigorous scientist.  When a number suddenly stops without warning, the assumption that the reader is supposed to make is “somebody rounded this off”.  When a number is rounded off, the error is at least half of the last digit.  For example, 40.950 and 41.04998 both end up being rounded to the same number, and both are reasonable possible values of “41.0” or “41±0.05”.

For example, using significant figures, 2.0 + 0.001 = 2.0.  What the equation is really saying is that the error on that 2.0 is around ±0.05 (the “true” number will probably round out to 2.0).  That error alone is bigger than the entire second number, never mind what its error is (it’s around ±0.0005).  So the sum 2.0 + 0.001 = 2.0, because both sides of the equation are equal to 2 + “an error of around 0.05 or less, give or take”.


2.0 + 0.001 = 2.0.  The significant digits are conveying a notion of “error”, and the second number is being “drowned out” by the error in the first.
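Here’s the same bookkeeping as a rough Python sketch (the helper half_last_digit is just a name for this illustration): treat a rounded number as carrying an error of half its last digit and compare the sizes involved.

```python
# Treat a number rounded at its last written digit as carrying an error of
# half that digit, and look at the sizes involved in 2.0 + 0.001.
def half_last_digit(decimal_places):
    """Implied error of a number rounded at the given decimal place."""
    return 0.5 * 10 ** (-decimal_places)

err_first = half_last_digit(1)    # "2.0"   carries roughly +/- 0.05
err_second = half_last_digit(3)   # "0.001" carries roughly +/- 0.0005

print(err_first)                        # 0.05: bigger than the entire second number
print(0.001 + err_second < err_first)   # True: 0.001 is "drowned out" by the first error
print(round(2.0 + 0.001, 1))            # 2.0: the sum, written to the same place as "2.0"
```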

“Rounding off” does a terrible violence to math.  Now the error, rather than being a respectable standard deviation that was painstakingly and precisely derived from multiple trials and tabulations, is instead an order-of-magnitude stab in the dark.

The rules for combining errors (answer gravy below) show that if you only know your error to within an order of magnitude (a power of ten that describes its size), then the result of adding or multiplying two numbers together will carry an error of the same relative magnitude; in other words, you keep the same number of significant digits.

For example,

\begin{array}{ll}    1234\times 0.32 \\    = (1234 \pm 0.5)(0.32\pm 0.005) \\    = \left([1.234\pm 0.0005] 10^3\right)\left([3.2\pm 0.05] 10^{-1}\right) & \leftarrow\textrm{Scientific Notation} \\    = \left(1.234\pm 0.0005\right)\left(3.2\pm 0.05\right)10^2 \\    = \left(1.234\times3.2 \pm 1.234\times0.05 \pm 3.2\times0.0005 \pm 0.05\times0.0005\right)10^2 \\    = \left(3.9488 \pm 0.0617 \pm 0.0016 \pm 0.000025\right)10^2 \\    \approx \left(3.9488 \pm 0.0617\right)10^2 \\    \approx \left(3.9488 \pm 0.05\right)10^2 \\    = 3.9 \times 10^2    \end{array}

This last expression could be anywhere from 3.8988 x 10^2 to 3.9988 x 10^2.  The only digits that aren’t being completely swamped by the error are “3.9”.  So the final “correct” answer is “3.9 x 10^2“.  Not coincidentally, this has two significant digits, just like “0.32” which had the fewest significant digits at the start of the calculation.  The bulk of the error in the end came from “±1.234×0.05”, the size of which was dictated by that “0.05”, which was the error from “0.32”.

Notice that in the second to last step it was callously declared that “0.0617 ≈ 0.05”.  Normally this would be a travesty, but significant figures are the mathematical equivalent of “you know, give or take or whatever”.  Rounding off means that we’re ignoring the true error and replacing it with the closest power of ten.  That is, there’s a lot of error in how big the error is.  When you’re already introducing errors by replacing numbers like 68, 337, and 145 with “100” (the nearest power of ten), “0.0617 ≈ 0.05” doesn’t seem so bad.  The initial error was on the order of 1 part in 10, and the final error was likewise on the order of 1 part in 10.  Give or take.  This is the secret beauty of sig figs and scientific notation; they quietly retain the “part in ten to the ___” error.
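For anyone who’d rather let a computer grind through it, here’s a short Python sketch of the same expansion, using the worst-case error terms exactly as in the lines above:

```python
# Re-do the 1234 x 0.32 example: carry each number's "half of the last digit"
# error through the product, expanding (A +/- a)(B +/- b) term by term.
A, a = 1.234, 0.0005   # 1234 written as 1.234e3, error in the mantissa
B, b = 3.2, 0.05       # 0.32 written as 3.2e-1, error in the mantissa

product = A * B                      # 3.9488
error = A * b + B * a + a * b        # 0.0617 + 0.0016 + 0.000025 ~ 0.0633

print(f"({product:.4f} +/- {error:.4f}) x 10^2")
# Only "3.9" survives the error, so the sig-fig answer is 3.9 x 10^2,
# matching the two significant digits of "0.32".
```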

That said, sig figs are kind of a train wreck.  They are not a good way to accurately keep track of errors.  What they do is save people a little effort, manage errors and fudges in a could-be-worse kind of way, and instill a deep sense of fatalism.  Significant figures underscore at every turn the limits either of human expertise or concern.

By far the most common use of sig figs is in grading.  When a student returns an exam with something like “I have calculated the mass of the Earth to be 5.97366729297353452283 x 10^24 kg”, the grader knows immediately that the student doesn’t grok significant figures (the correct answer is “the Earth’s mass is 6 x 10^24 kg, why all the worry?”).  With that in mind, the grader is now a step closer to making up a grade.  The student, for their part, could have saved some paper.


Answer Gravy: You can think of a number with an error as being a “random variable“.  Like rolling dice (a decidedly random event that generates a definitively random variable), things like measuring, estimating, or rounding create random numbers within a certain range.  The better the measurement (or whatever it is that generates the number), the smaller this range.  There are any number of reasons for results to be inexact, but we can sweep all of them under the same carpet labeling them all “error”; keeping track only of their total size using (usually) standard deviation or variance.  When you see the expression “3±0.1”, this represents a random variable with an average of 3 and a standard deviation of 0.1 (unless someone screwed up or is just making up numbers, which happens a lot).

When adding two random variables, (A±a) + (B±b), the means are easy, A+B, but the errors are a little more complex.  (A±a) + (B±b) = (A+B) ± ?.  The standard deviation is the square root of the variance, so a^2 is the variance of the first random variable.  It turns out that the variance of a sum is just the sum of the variances, which is handy.  So, the variance of the sum is a^2 + b^2 and (A±a) + (B±b) = A+B ± √(a^2+b^2).
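That “add in quadrature” rule is easy to sanity-check numerically; here’s a rough sketch using numpy (the particular means and standard deviations below are just picked for illustration).

```python
# Numerically check that the standard deviation of a sum of independent
# variables is sqrt(a^2 + b^2).
import numpy as np

rng = np.random.default_rng(0)
a, b = 0.1, 0.05   # standard deviations of the two "measurements" (made up for this check)
X = rng.normal(3.0, a, size=1_000_000)   # "3 +/- 0.1"
Y = rng.normal(5.0, b, size=1_000_000)   # "5 +/- 0.05"

print(np.std(X + Y))    # ~ 0.1118
print(np.hypot(a, b))   # sqrt(0.1^2 + 0.05^2) = 0.1118...
```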

When adding numbers using significant digits, you’re declaring that a = 0.5 x 10^(-D1) and b = 0.5 x 10^(-D2), where D1 and D2 are the number of significant digits each number has.  Notice that if these are different, then the bigger error takes over.  For example, \sqrt{\left(0.5\cdot10^{-1}\right)^2 + \left(0.5\cdot10^{-2}\right)^2} = 0.5\cdot 10^{-1}\sqrt{1 + 10^{-2}} \approx 0.5\cdot 10^{-1}.  When the digits are the same, the error is multiplied by √2 (same math as last equation).  But again, sig figs aren’t a filbert brush, they’re a rolling brush.  √2?  That’s just another way of writing “1”.


The cornerstone of “sig fig” philosophy; not all over the place, but not super concerned with details.

Multiplying numbers is one notch trickier, and it demonstrates why sig figs can be considered more clever than being lazy normally warrants.  When a number is written in scientific notation, the information about the size of the error is exactly where it is most useful.  The example above of “1234 x 0.32” gives some idea of how the 10’s and errors move around.  What that example blurred over was how the errors (the standard deviations) should have been handled.

First, the standard deviation of a product is a little messed up: (A\pm a)(B\pm b) = AB \pm\sqrt{A^2b^2 + B^2a^2 + a^2b^2}.  Even so!  When using sig figs the larger error is by far the more important, and the product once again has the same number of sig figs.  In the example, 1234 x 0.32 = (1.234 ± 0.0005) (3.2 ± 0.05) x 10^2.  So, a = 0.0005 and b = 0.05.  Therefore, the standard deviation of the product must be:

\begin{array}{ll}    \sqrt{A^2b^2 + B^2a^2 + a^2b^2} \\[2mm]    = Ab\sqrt{1 + \frac{B^2a^2}{A^2b^2} + \frac{a^2}{A^2}} \\[2mm]    = (1.234) (0.05) \sqrt{1.0000069} \\[2mm]    \approx(1.234)(0.05)\\[2mm]    \approx 0.05    \end{array}

Notice that when you multiply numbers, their error increases substantially each time (by a factor of about 1.234 this time).  According to Benford’s law, the average first digit of a number is 3.440*. As a result, if you’re pulling numbers “out of a hat”, then on average every two multiplies should knock off a significant digit, because 3.440^2 ≈ 1 x 10^1.
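The same kind of numerical sanity check works for the product formula; the numbers below are the ones from the 1234 × 0.32 example.

```python
# Numerically check that, for independent measurements, the standard deviation
# of the product is sqrt(A^2 b^2 + B^2 a^2 + a^2 b^2).
import numpy as np

rng = np.random.default_rng(1)
A, a = 1.234, 0.0005
B, b = 3.2, 0.05

X = rng.normal(A, a, size=1_000_000)
Y = rng.normal(B, b, size=1_000_000)

print(np.std(X * Y))                                     # ~ 0.0617
print(np.sqrt(A**2 * b**2 + B**2 * a**2 + a**2 * b**2))  # ~ 0.0617
```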

Personally, I like to prepare everything algebraically, keep track of sig figs and scientific notation from beginning to end, then drop the last 2 significant digits from the final result.  Partly to be extra safe, but mostly to do it wrong.

*they’re a little annoying, right?


Q: If time slows down when you travel at high speeds, then couldn’t you travel across the galaxy within your lifetime by just accelerating continuously?

Physicist: Yup!  But sadly, this will never happen.

This is a good news / really bad news situation.  On the one hand, it is true (for all intents and purposes) that if you travel fast enough, time will slow down and you’ll get to your destination in surprisingly little time.  The far side of the galaxy is about 100,000 lightyears away, so it will always take at least 100,000 years to get there.  However, the on-board clocks run slower (from the perspective of anyone “sitting still” in the galaxy) so the ship and everything on it may experience far less than 100,000 years.

First, when you read about traveling to far-off stars you’ll often hear about “constant acceleration drives”, which are rockets capable of accelerating at a comfortable 1g for years at a time (“1g” means that people on the rocket would feel an acceleration equivalent to the force of Earth’s gravity).  However!  Leaving a rocket on until it’s moving near the speed of light is totally infeasible.  A rocket capable of 1g of acceleration for years is a rocket that can hover just above the ground for years.  While this is definitely possible for a few seconds or minutes (“retro rockets“), you’ll never see people building bridges on rockets, or hanging out and having a picnic for an afternoon or three on a hovering rocket.  Spacecraft in general coast ballistically except for the very beginning and very end of their trip (excluding small corrections).  For example, the shuttle (before the program was shut down) could spend weeks coasting along in orbit, but the main rockets only fire for the first 8 minutes or so.  And those 8 minutes are why the shuttle weighs more than 20 times as much on the launch pad as it does when it lands.

The big exception is ion drives, but a fart produces more thrust than an ion drive (seriously) so… meh.


Rockets: in a hurry for a little while and then not for a long while.

In order to move faster, a rocket needs to carry more fuel, so it’s heavier, so it needs more fuel, etc.  The math isn’t difficult, but it is disheartening.  Even with antimatter fuel (the best possible source by weight) and a photon drive (exhaust velocity doesn’t get better than light speed), your ship would need to be 13 parts fuel to one part everything else, in order to get to 99% of light speed.

That said, if somehow you could accelerate at a comfortable 1g forever, you could cross our galaxy (accelerating halfway, then decelerating halfway) in a mere 20-25 years of on-board time.  According to everyone else in the galaxy, you’d have been cruising at nearly light speed for the full 100,000 years.  By the way, this trip (across the Milky Way, accelerate halfway, decelerate halfway, anti-matter fuel, photon drives) would require a fuel-to-ship ratio of about 10,500,000,000 : 1.  Won’t happen.

The speed of light is still a fundamental limit, so if you were on the ship you’ll still never see stars whipping by faster than the speed of light (which you might expect would be necessary to cross 100,000 light years in only 25 years).  But relativity is a slick science; length contraction and time dilation are two sides of the same coin.  While everyone else in the galaxy explains the remarkably short travel time in terms of the people on the ship moving slower through time, the people on the ship attribute it to the distance being shorter.  The stars pass by slower than light speed, but they’re closer together (in the direction of travel).  “Which explanation is right?” isn’t a useful question; if every party does their math right, they’ll come to the same conclusions.


Answer Gravy: Figuring out how severe relativistic effects are often comes down to calculating \gamma = \frac{1}{\sqrt{1-\left(\frac{v}{c}\right)^2}}, the factor that describes how many times slower time passes and how many times shorter distances contract (for outside observers only, since you will always see yourself as stationary).  Photon ships make the calculation surprisingly simple.  Here’s a back-of-the-envelope trick:

If your fuel is antimatter and matter, then the energy released is E = Mc^2 (it’s actually useful sometimes!).  If the exhaust is light, then the momentum it carries is P = E/c.  Finally, the energy of a moving object is γMc^2 and the momentum is γMv.  It’s not obvious, but for values of v much smaller than c, this is very nearly the same as Newton’s equations.

For a fuel mass of f, a rocket mass of m, and a beam of exhaust light with energy E, lining up the energy and momentum before and after yields:

\begin{array}{ll}\left\{\begin{array}{ll}(m+f)c^2 = \gamma mc^2+E\\0=\gamma mv - \frac{E}{c}\end{array}\right.\\\Rightarrow (m+f)c^2=\gamma mc^2+\gamma mcv=\gamma mc(v+c)\\\Rightarrow \gamma = \frac{c}{v+c}\left(1+\frac{f}{m}\right)\end{array}

So, when v ≈ c (when the ship is traveling near light speed), \gamma \approx \frac{1}{2}\left(1+\frac{f}{m}\right) \approx \frac{f}{2m}.  That means that if, for example, you want your on-board clock to run ten times slower than everyone else’s (so the trip takes a tenth of the time for you), then you need to have around 20 times more fuel than ship.  Even worse, if you want to stop when you get where you’re going, you’ll need to square that ratio (the fuel needed to stop is included as part of the ship’s mass when speeding up).
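Here’s that fuel arithmetic as a rough Python sketch (the helper name fuel_to_ship is just for this illustration):

```python
# For a matter/antimatter photon rocket, gamma = (c/(v+c)) * (1 + fuel/ship),
# so near light speed fuel/ship ~ 2*gamma - 1.  Stopping at the destination
# roughly squares the total mass ratio.
def fuel_to_ship(gamma):
    """Fuel-to-payload ratio to reach a given gamma (v ~ c), one-way, no stopping."""
    return 2 * gamma - 1

gamma = 10                               # want on-board clocks to run ten times slow
print(fuel_to_ship(gamma))               # 19: "around 20 times more fuel than ship"

# To also stop, the "ship" being accelerated must include the braking fuel,
# so the total mass ratio becomes roughly (2*gamma)^2:
print((1 + fuel_to_ship(gamma)) ** 2 - 1)  # ~ 400 parts fuel per part payload
```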

More tricky to derive and/or use is the math behind constant acceleration.  If a ship is accelerating at a rate “a”, the on-board clock reads “τ”, and the position and time of the ship according to everyone who’s “stationary” are “x” and “t”, then

x(\tau) = \frac{c^2}{a}\cosh\left(\frac{a}{c}\tau\right)-\frac{c^2}{a} \approx \frac{c^2}{2a}e^{\frac{a}{c}\tau}

t(\tau) = \frac{c}{a}\sinh\left(\frac{a}{c}\tau\right)

This is lined up so that x(0) = t(0) = 0 (which means that everyone’s clocks are synced when the engines are first turned on).
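Plugging the galaxy-crossing trip into those equations takes a few lines of Python; the numbers below (1g ≈ 1.03 lightyears/year², a 100,000 lightyear crossing) are rough, so treat the output as an estimate.

```python
# Accelerate at 1g to the middle of the galaxy, then flip over and decelerate;
# by symmetry, compute the first half and double it.
import math

c = 1.0            # work in lightyears and years, so c = 1
g = 1.03           # 9.8 m/s^2 is about 1.03 lightyears/year^2
half = 50_000.0    # half of the ~100,000 lightyear crossing

# Invert x(tau) = (c^2/g)(cosh(g*tau/c) - 1) for the accelerating half:
tau_half = (c / g) * math.acosh(1 + half * g / c**2)
t_half = (c / g) * math.sinh(g * tau_half / c)

print(f"on-board time : {2 * tau_half:.1f} years")   # ~ 22 years
print(f"galactic time : {2 * t_half:.0f} years")     # ~ 100,000 years
```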


Q: When something falls on your foot, how much force is involved?

Physicist: There’s a cute trick you can use here.  If a falling object starts at rest and ends at rest, then it gains all of its energy from gravity, and all of that energy is deposited in your unfortunate foot.

Energy is (average) force times distance, whether you’re winding a spring, starting a fire (with friction), firing projectiles, or crushing your foot.  The energy the object gains when falling is equal to its weight (the force of gravity) times the distance it falls.  The energy the object uses to bust metatarsals is equal to the distance it takes for it to come to a stop times the force that does that stopping.  So, D_{fall}F_{fall} = E = D_{stop}F_{stop}.


The distance times the force that gets an object moving is equal to the distance times the force that brings that object to a halt.

Of course, the distance over which the object slows down is much smaller than the distance over which it sped up.  As a result, the stopping force required is proportionately larger.  This is one of the reasons why physicists flinch so much during unrealistic action movies (that, and loud noises make us skittish).  Something falling on your foot stops in about half a cm or a quarter inch, what with skin and bones that flex a little.  Give or take.


Bowling balls: keeping podiatrists gainfully employed for 700 years.

So, if you drop a 10 pound ball 4 feet (48 inches), and it stops in a quarter inch, then the force at the bottom of the fall is F = \frac{48}{0.25}10lbs \approx 2,000lbs.  This is why padding is so important; if that distance was only an eighth of an inch (seems reasonable) then the force jumps to 4,000lbs, and if that distance is increased to half an inch then the force drops to 1,000 lbs.
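The same estimate, as a tiny Python sketch (the numbers are the ones above, and the helper stopping_force is just a name for this illustration):

```python
# Average stopping force = weight * (fall distance / stopping distance).
def stopping_force(weight_lbs, fall_inches, stop_inches):
    return weight_lbs * fall_inches / stop_inches

print(stopping_force(10, 48, 0.25))    # 1920 lbs: roughly 2,000 lbs
print(stopping_force(10, 48, 0.125))   # ~ 4,000 lbs with less padding
print(stopping_force(10, 48, 0.5))     # ~ 1,000 lbs with more padding
```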



Q: If nothing can escape a black hole’s gravity, then how does the gravity itself escape?

Physicist: A black hole is usually described as a singularity, where all the mass is (or isn’t?), which is surrounded by an “event horizon”.  The event horizon is the “altitude” at which the escape velocity is the speed of light, so nothing can escape.  But if gravity is “emitted” by black holes, then how does that “gravity signal” get out?  The short answer is that gravity isn’t “emitted” by matter.  Instead, it’s a property of the spacetime near matter and energy.

It’s worth stepping back and considering where our understanding of black holes, and all of our predictions about their bizarre behavior, comes from: ultimately, it stems from the math we use to describe them.  The extremely short answer to this question is: the math says nothing can escape, and that the gravity doesn’t “escape” so much as it “persists”.  No problem.

Einstein’s whole thing was considering the results of experiments at face value.  When test after test always showed the speed of light was exactly the same, regardless of how the experiment was moving, Einstein said “hey, what if the speed of light is always the same regardless of how you’re moving?”.  Genius.  There’s special relativity.

It also turns out that no experiment can tell the difference between floating motionless in deep space and accelerating under the pull of gravity (when you fall you’re weightless).  Einstein’s stunning insight (paraphrased) was “dudes!  What if there’s no difference between falling and floating?”.  Amazing stuff.

Sarcasm aside, what was genuinely impressive was the effort it took to turn those singsong statements into useful math.  After a decade of work, and buckets of differential geometry (needed to deal with messed up coordinate systems like the surface of Earth, or worse, curved spacetime) the “Einstein Field Equations” were eventually derived, and presumably named after Einstein’s inspiration: the infamous Professor Field.


This is technically 16 equations (μ and ν are indices that take on 4 values each), however there are tricks to get that down to a more sedate 6 equations.

The left side of this horrible mess describes the shape of spacetime and relates it to the right side, which describes the amount of matter and energy (doesn’t particularly matter which) present.  This equation is based on two principles: “matter and energy make gravity… somehow” and “when you don’t feel a push or pull in any direction, then you’re moving in a straight line”.  That push or pull is defined as what an accelerometer would measure.  So satellites are not accelerating because they’re always in free-fall, whereas you are accelerating right now because if you hold an accelerometer it will read 9.8 m/s^2 (1 standard Earth gravity).  Isn’t that weird?  The path of a freely falling object (even an orbiting object) is a straight line through a non-flat spacetime.

Moving past the mind-bending weirdness; this equation, and all of the mathematical mechanisms of relativity, work perfectly for every prediction that we’ve been able to test.  So experimental investigation has given General Relativity a ringing endorsement.  It’s not used/taught/believed merely because it’s pretty, but because it works.

Importantly, the curvature described isn’t merely dependent on the presence of “stuff”, but on the curvature of the spacetime nearby.  Instead of being emitted from some distant source, gravity is a property of the space you inhabit right now, right where you are.  This is the important point that the “bowling ball on a sheet” demonstration is trying to get across.


The Einstein Field Equations describe the stretching of spacetime as being caused both by the presence of matter and also by the curvature of nearby spacetime.  Gravity doesn’t “reach out” any more than the metal ball in the middle does.

So here’s the point.  Gravity is just a question of the “shape” of spacetime.  That’s affected by matter and energy, but it’s also affected by the shape of spacetime nearby.  If you’re far away from a star (or anything else really) the gravity you experience doesn’t come directly from that star, but from the patch of space you’re sitting in.  It turns out that if that star gets smaller and keeps the same mass, the shape of the space you’re in stays about the same (as long as you stay the same distance away, the density of an object isn’t relevant to its gravity).  Even if that object collapses into a black hole, the gravity field around it stays about the same; the shape of the spacetime is stable and perfectly happy to stay the way it is, even when the matter that originally gave rise to it is doing goofy stuff like being a black hole.

This stuff is really difficult / nigh impossible to grok directly.  All we’ve really got are the experiments and observations, which led to a couple simple statements, which led to some nasty math, which led to some surprising predictions (including those concerning black holes), which so far have held up to all of the observations of known black holes that we can do (which is difficult because they’re dark, tiny, and the closest is around 8,000 light years away, which is not walking-distance).  That said: the math comes before understanding, and the math doesn’t come easy.


It’s funny because it’s true.

Here’s the bad news.  In physics we’ve got lots of math, which is nice, but no math should really be trusted to predict reality without lots of tests and verification and experiment (ultimately that’s where physics comes from in the first place).  Unfortunately no information ever escapes from beyond the event horizon.  So while we’ve got lots of tests that can check the nature of gravity outside of the horizon (the gravity here on Earth behaves in the same way that gravity well above the horizon behaves), we have no way even in theory to investigate the interior of the event horizon.  The existence of singularities, and what’s going on in those extreme scenarios in general, may be a mystery forever.  Maybe.

This probably doesn’t need to be mentioned, but the comic is from xkcd.


Q: Is there a formula for finding primes? Do primes follow a pattern?

Physicist: Primes are, for many purposes, basically random.  It’s not easy to “find the next prime” or determine if a given number is prime, but there are tricks.  Which trick depends on the size of the number.  Some of the more obvious ones are things like “no even numbers (other than 2)” and “the last digit can’t be 5”; but those just eliminate possibilities instead of confirming them.  Confirming that a number is prime is a lot more difficult.

Small (~10): The Sieve of Eratosthenes finds primes and also does a decent job demonstrating the “pattern” that they form.


Starting with 2, remove every multiple. The first blank is a new prime. Remove every multiple of that new prime. Repeat forever or until bored.

The integers come in 4 flavors: composites, primes, units (1 and -1), and zero.  2 is the first prime and every multiple of it is composite (because they have 2 as a factor).  If you mark every multiple of 2, you’ll be marking only composite numbers.  The first unmarked number is 3 (another prime), and every multiple of 3 is composite.  Continue forever.  This makes a “map” of all of the primes up to a given number (in the picture above it’s 120).  Every composite number has at least one factor less than or equal to its square root, so if the largest number on your map is N, then you only need to check up to √N.  After that, all of the remaining blanks are primes.

This algorithm is great for people (human beings as opposed to computers) because it rapidly finds lots of primes.  However, like most by-hand algorithms it’s slow (by computer standards).  You wouldn’t want to use it to check all the numbers up to, say, 450787.
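For the curious, here’s a minimal Python sketch of the sieve; a computer, unlike a person, has no trouble running it well past 450787.

```python
# Sieve of Eratosthenes: mark every multiple of each prime as composite;
# whatever is left unmarked is prime.
def sieve(n):
    """Return all primes up to and including n."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:                       # only need to check up to sqrt(n)
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False  # every multiple of a prime is composite
        p += 1
    return [k for k, prime in enumerate(is_prime) if prime]

print(sieve(120)[:10])     # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(len(sieve(450787)))  # how many primes there are up to 450787
```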

Eratosthenes, in a completely unrelated project, accurately calculated the circumference of the Earth around 2200 years ago using nothing more than the Sun, a little trigonometry, and some dude willing to walk the ~900km between Alexandria and Syene.  This marks one of the earliest recorded instances of grad student abuse.

Medium (~10^10): Fermat’s Little Theorem or AKS.

Fermat’s little theorem (as opposed to Fermat’s last theorem) works like this: pick any number A with 1 < A < N.  If A^{N-1} \, mod \, N\ne1, then N is definitely composite, and if A^{N-1} \, mod \, N=1, then N is very likely to be prime.  “Mod N” means every time you have a value bigger than N, you subtract multiples of N until your number is less than N.  Equivalently, it’s the remainder after division by N.  This test has no false negatives, but it does sometimes have false positives.  The worst offenders are the “Carmichael numbers”, composites that fool the test for every valid A, but they’re more and more rare the larger the numbers being considered.  Still, because of their existence we can’t use FLT with impunity.  For most purposes (such as generating encryption keys) FLT is more than good enough.
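Here’s a rough Python sketch of that test (the function name and the number of rounds are arbitrary choices for this illustration); Python’s built-in pow(A, N-1, N) does the modular exponentiation efficiently.

```python
# Fermat primality check: try several random bases A; any base with
# A^(N-1) mod N != 1 proves N composite.
import random

def fermat_test(n, rounds=20):
    """Return False if n is definitely composite, True if it is probably prime."""
    if n < 4:
        return n in (2, 3)
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:
            return False          # found a "witness": n is certainly composite
    return True                   # no witness found: very likely prime

print(fermat_test(457))   # True: 457 is prime
print(pow(2, 560, 561))   # 1, even though 561 = 3*11*17: a Carmichael "false positive"
```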

For a very long time (millennia) there was no way to verify with certainty that a number is prime in an efficient way.  But in 2002 “PRIMES is in P” was published, which introduced AKS (the Agrawal–Kayal–Saxena primality test), which can determine whether or not a number is prime with complete certainty.  The time it takes for both FLT and AKS to work is determined by the log of N (which means they’re fast enough to be useful).

Stupid Big (~10^(10^10)): Even if you have a fantastically fast technique for determining primality, you can render it useless by giving it a large enough number.  The largest prime found to date (May 2014) is N = 2^57,885,161 − 1.  At 17.4 million digits, this number is around ten times longer than the Lord of the Rings, and about twice as interesting as the Silmarillion.


Number of digits in the largest known prime vs. the year it was verified.

To check that a number this big is prime you need to pick the number carefully.  The reason that 2^57,885,161 − 1 can be written so succinctly (just a power of two minus one) is that it’s one of the Mersenne primes, which have a couple nice properties that make them easy to check.

A Mersenne number is of the form M_n = 2^n − 1.  Turns out that if n isn’t prime, then neither is M_n.  Just like FLT there are false positives; for example M_11 = 2^11 − 1 = 2047 = 23×89, which is clearly composite even though 11 is prime.  Fortunately, there’s yet another cute trick.  Create the sequence of numbers S_k, defined recursively as S_k = (S_{k-1})^2 − 2 with S_0 = 4.  If S_{p-2} = 0\,mod\,M_p, then M_p is prime.  This is really, really not obvious, so be cool.
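That trick is the Lucas–Lehmer test, and it’s short enough to sketch in a few lines of Python (fine for small exponents; the record-sized ones need far more muscle and much cleverer multiplication).

```python
# Lucas-Lehmer test: M_p = 2^p - 1 is prime exactly when S_(p-2) = 0 mod M_p,
# where S_0 = 4 and S_k = S_(k-1)^2 - 2.
def lucas_lehmer(p):
    """Return True if the Mersenne number 2**p - 1 is prime (p an odd prime)."""
    m = 2**p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 31) if lucas_lehmer(p)])
# [3, 5, 7, 13, 17, 19, 31]: 2^11 - 1 and 2^23 - 1 drop out because they're composite
```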

With enough computer power this is a thing that can be done, but it typically requires more computing power than can reasonably be found in one place.

Update (Jan, 2016): The largest known prime is now 2^74,207,281 − 1.

Update (Jan, 2019): The largest known prime is now 2^82,589,933 − 1.


Answer Gravy: Fermat’s little theorem is pretty easy to use, but it helps to see an example.  There’s a lot more of this sort of thing (including a derivation) over here.

Example: N=7 and A=2.

[2^{7-1}]_7 = [2^6]_7 = [64]_7 = [64 - 9×7]_7 = [64 - 63]_7 = 1

So, 7 is most likely prime.

Example: N=9 and A=5.

[5^{9-1}]_9 = [5^8]_9 = [25×25×25×25]_9 = [7×7×7×7]_9 = [49×49]_9 = [4×4]_9 = [16]_9 = 7

Astute readers will note that 7 is different from 1, so 9 is definitely not prime.

For bigger numbers a wise nerd will typically exponentiate by squaring.

Example: N=457 and A=2.  First, a bunch of squares:

\begin{array}{ll}\left[2^2\right]_{457}=\left[4\right]_{457}\\\left[2^4\right]_{457}=\left[4^2\right]_{457}=\left[16\right]_{457}\\\left[2^8\right]_{457} =\left[16^2\right]_{457}=\left[256\right]_{457}\\\left[2^{16}\right]_{457}=\left[256^2\right]_{457}=\left[185\right]_{457}\\\left[2^{32}\right]_{457}=\left[185^2\right]_{457}=\left[407\right]_{457}\\\left[2^{64}\right]_{457}=\left[407^2\right]_{457}=\left[215\right]_{457}\\\left[2^{128}\right]_{457}=\left[215^2\right]_{457}=\left[68\right]_{457}\\\left[2^{256}\right]_{457}=\left[68^2\right]_{457}=\left[54\right]_{457}\end{array}

As it happens, 457-1 = 456 = 256 + 128 + 64+8.

\begin{array}{ll}\left[2^{456}\right]_{457}\\[2mm]  =\left[2^{256}\cdot2^{128}\cdot2^{64}\cdot2^{8}\right]_{457}\\[2mm]  =\left[54\cdot68\cdot215\cdot256\right]_{457}\\[2mm]  =\left[202106880\right]_{457}\\[2mm]  =1\end{array}

So 457 is very likely to be prime (it is).  This can be verified with either some fancy algorithm or (more reasonably) by checking that it’s not divisible by any number up to √457.
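Exponentiation by squaring is also only a few lines of code; Python’s built-in pow(a, e, n) already does exactly this, but here it is spelled out as a sketch (the helper power_mod is just a name for this illustration).

```python
# Square-and-multiply modular exponentiation: walk through the bits of the
# exponent, squaring as you go and multiplying in the squares you need.
def power_mod(a, e, n):
    """Compute a**e mod n by repeated squaring."""
    result = 1
    square = a % n
    while e:
        if e & 1:                          # this bit of the exponent is set
            result = (result * square) % n
        square = (square * square) % n
        e >>= 1
    return result

print(power_mod(2, 456, 457))   # 1, so 457 passes the Fermat test
print(pow(2, 456, 457))         # same answer from the built-in
```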
