Darren Irwin
We have now learned a bit about how Julia remembers numbers as belonging to different types. Why is this? Isn’t a number just a number? Well, when it comes to storing them in a computer, the answer is no. To understand why, let’s learn a little about how computers store information.
The great majority of computers today (and certainly your own personal computer) store information as a series of `1`s and `0`s. (You can equally well think of these as on/off or true/false.) A single 1/0 switch is called a bit: the smallest unit of information that a computer can store.
A set of 8 bits is called a byte. The memory on a computer consists of gigantic grids of such bytes. When we talk about memory sizes of computers, we use terms like kilobytes (KB; 1,000 bytes, or 8,000 bits) or megabytes (MB; 1,000,000 bytes), and so forth.
So how can numbers other than `0` and `1` be stored? As a series of `0`s and `1`s, which are interpreted as a group according to some rule. For example, we can simply count in binary to convert our decimal (base-10) numbers to the way computers often store integers:
- Decimal `0` can be stored as binary `0`.
- Decimal `1` can be stored as binary `1`.
- Decimal `2` can be stored as binary `10`.
- Decimal `3` can be stored as binary `11`.
- Decimal `4` can be stored as binary `100`.
- Decimal `5` can be stored as binary `101`.
And so forth. You can see that storing larger integers requires more bits.
The take-home message here: For integers, there is a tradeoff between memory size and the range of integers that can be stored.
The above works fine for integers, but we often want to store numbers like `7.1` or `0.005` or `sqrt(2)`. For this, people have come up with a clever way of encoding numbers called floating-point numbers, because you can think of them as storing each number as an integer multiplied by a number that represents the magnitude (which adjusts where the decimal point goes).
This floating-point number system works great, but there is a limitation: In a fixed set of bits, there is a limit to the precision of such numbers. In other words, there are only certain values such numbers can take, with gaps between them. The more bits we dedicate to storing such numbers, the smaller these gaps are.
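We can see these gaps directly with Julia’s built-in `eps()` function, which reports the gap between `1.0` and the next larger representable number of a given type:

```julia
eps(Float16)   # returns Float16(0.000977)
eps(Float32)   # returns 1.1920929f-7
eps(Float64)   # returns 2.220446049250313e-16
```

The more bits in the type, the tinier the gap.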
The take-home message here: For floating-point numbers, there is a tradeoff between memory size and precision of the numbers stored.
To learn a bit about how numbers are represented by different Julia types, we can use the `bitstring()` function:
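```julia
bitstring(1)
```

```
"0000000000000000000000000000000000000000000000000000000000000001"
```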
This shows us that the number `1` is stored as 63 `0`s and then a `1`. This is Julia’s default way of storing integers, as type `Int64` (a 64-bit integer).
Let’s try the same but use `1.0` instead:
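```julia
bitstring(1.0)
```

```
"0011111111110000000000000000000000000000000000000000000000000000"
```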
Here we have quite a different representation. This is because the number is now stored as type `Float64` (a 64-bit floating-point number).
## `about()` function

We can learn more about how numbers are encoded by adding a nice little package called `About`.
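One way to install and load it, using Julia’s built-in package manager:

```julia
using Pkg
Pkg.add("About")   # download and install the package (needed only once)
using About        # load the package into this session
```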
This package has one function, wisely named `about()`, as it tells us about objects (note though that it can only tell you about relatively simple objects).
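Let’s try it on the integer `1`:

```julia
about(1)
```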
```
Int64 (<: Signed <: Integer <: Real <: Number <: Any), occupies 8B.
0000000000000000000000000000000000000000000000000000000000000001
 = +1
```
The highlighting tells you how the number is encoded (one bit for the sign, here positive; the rest for the integer value).
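And on the floating-point number `1.0`:

```julia
about(1.0)
```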
```
Float64 (<: AbstractFloat <: Real <: Number <: Any), occupies 8B.
0011111111110000000000000000000000000000000000000000000000000000
╨└────┬────┘└────────────────────────┬─────────────────────────┘
+ 2^0 × 1.000000000000000000
= 1.0000000000000000
```
The result shows you which bits encode which components of the floating-point number (the sign, the exponent, and the significand).
### `about()` on other objects

Play around with this function to learn about how other types of things are stored. (For example, `about('x')`, `about("ACGT")`, `about(3//7)`, `about(sqrt(2))`, `about(pi)`.)
Why does all this matter? Well, if you don’t actually need 64 bits for your numbers, then you can tell Julia to store them in fewer bits, and this can increase efficiency (i.e., reduce memory use and increase speed).
For example, if you know that a variable will only need to store integers ranging between -128 and 127, then you can specify that variable as an 8-bit integer:
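```julia
Int8(39)   # store 39 as an 8-bit integer
```

```
39
```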
We have succeeded in storing the number `39` in only 8 bits rather than 64.
As another example, let’s say we need to store a floating-point number but we don’t need a lot of precision. Let’s store it as a 16-bit floating-point number:
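```julia
Float16(3.14159)   # the starting value 3.14159 is just an example
```

```
Float16(3.14)
```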
Julia’s response indicates the rough level of precision with which the number has been stored (only `3.14`). That may not be a wise choice for that one number, but imagine if you had millions of numbers stored in memory; perhaps that precision is good enough for what you are using them for, and it cuts the memory use by 75% (compared to `Float64`).
If you are working with integers, you can find the minimum storable integer for a type like this:
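```julia
typemin(Int16)
```

```
-32768
```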
And the maximum like this:
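```julia
typemax(Int16)
```

```
32767
```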
Hence we know that type `Int16` can store integers from -32768 to 32767.
What happens if you take the number 32767 encoded as an `Int16` and add `10` to it? Try it out by entering `Int16(32767) + Int16(10)`. Can you figure out what happened? (Hint: this is called “overflow”.)
In addition to overflow (with integers), we want to be aware of the consequences of floating-point imprecision. An example:
```julia
0.1 + 0.2 == 0.3   # note the "==" makes this a Boolean statement, asking whether this equation is true
```

```
false
```
The REPL returns `false`. What?!?
This is a result of the imprecision of floating-point numbers. Let’s ask what the left side of the equation is evaluated as:
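```julia
0.1 + 0.2
```

```
0.30000000000000004
```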
We see it is slightly different from `0.3`. This is because both `0.1` and `0.2` are stored with a bit of imprecision, and their total is actually a smidgin bigger than `0.3`.
We are now at risk of a deep dive into computer science, but I will pull us back from the brink. The makers of Julia have provided us biologists a solution to the above. Just use the “approximately equal” symbol `≈` (to get this, type `\approx` and then press the “tab” key):
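```julia
0.1 + 0.2 ≈ 0.3
```

```
true
```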
The REPL now says it is a `true` statement, within the (im)precision of the data type being used.
(The `≈` symbol is a short way of calling the `isapprox()` function; those interested can look that up for the gory details.)
Julia provides another solution to this situation: we can use the `Rational` type to encode rational numbers precisely, as ratios of integers:
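```julia
1//10
```

```
1//10
```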
We can see that Julia is treating this expression in a different way than `0.1` or `1/10` (which Julia would convert to `0.1`). With `1//10`, Julia is remembering both the numerator and the denominator, storing this as a ratio of integers. We can check the type:
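```julia
typeof(1//10)
```

```
Rational{Int64}
```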
Now that the ratios are stored precisely, let’s check our Boolean statement:
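```julia
1//10 + 2//10 == 3//10
```

```
true
```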
Precise math now works! However, there is a limitation to use of the `Rational` type: it cannot store irrational numbers like `sqrt(3)` or `π`.
## `BigInt` and `BigFloat`
For most uses, `Int64` and `Float64` provide excellent flexibility with numbers. In those rare cases where you need even bigger integers or even more precision in your floats, Julia provides the `BigInt` and `BigFloat` types. These use as many bits as needed to represent the numbers you want.
For example, let’s start with the largest integer that can be stored as an `Int16`:
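```julia
typemax(Int16)
```

```
32767
```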
If we add `1` to that, we would have an overflow problem. Instead, let’s make it a `BigInt` and add `1`:
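One way to write this, using `about()` to inspect the result:

```julia
about(BigInt(typemax(Int16)) + 1)
```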
```
BigInt (mutable) (<: Signed <: Integer <: Real <: Number <: Any)
Memory footprint: 16B directly (referencing 32B in total)
alloc::Int32 4B 00000000000000000000000000000010 2
size::Int32 4B 00000000000000000000000000000001 1
d::Ptr{UInt64} 8B 00000000000000000 … 10001010100110000 Ptr{UInt … b2ef1530
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
4B 4B 8B
```
This uses 16 bytes (128 bits) directly, twice as much as a standard 64-bit integer. Its memory footprint can grow even bigger if we need it to.
The `BigInt` type can be useful for storing precise values of things like factorials:
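For example, `factorial(21)` is too big for an `Int64` (it would throw an overflow error), but it is no problem for a `BigInt`:

```julia
factorial(big(21))   # big(21) makes a BigInt, so the result is a BigInt too
```

```
51090942171709440000
```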
Thinking about efficiency of number types will be particularly important when storing large numbers of numbers in Arrays or DataFrames, which we will explore on the next page.