
Chapter 8 Timing Code and Writing for Speed

This chapter covers one aspect of writing fast code, an important part of scientific computing. We will study a number of ways to sum the first \(n\) counting numbers and determine why their timings differ. We will also test the various integer types to see how they affect speed. In each case we create a function and then time it.

Section 8.1 Timing Code

Although we all want our code to run fast, speed is often the difference between solving a problem and not being able to solve it, or it may restrict the size of the problem we can handle. Let’s look at an example of timing code in which we multiply two matrices of size n by n:
function matrixMult(n::Integer)
  local A = rand(n,n)
  local B = rand(n,n)
  A*B
end
We can time the command using the @time macro. For example, @time matrixMult(100) might show a first line of output like 0.000229 seconds (9 allocations: 234.609 KiB).
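If you want the elapsed time as a value rather than printed output, Julia’s built-in @elapsed macro returns the seconds as a Float64. A minimal sketch (repeating the matrixMult definition so the snippet runs on its own):

```julia
# same function as above: multiply two random n-by-n matrices
function matrixMult(n::Integer)
  local A = rand(n,n)
  local B = rand(n,n)
  A*B
end

matrixMult(100)               # run once so compilation isn't timed
t = @elapsed matrixMult(100)  # @elapsed returns the seconds as a Float64
```

Keep in mind that the first call to a Julia function includes compilation time, so timing a second call gives a better picture of the actual running time.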

Section 8.2 Attempt #1

Let’s start by making an array of all of the numbers, then summing them in a for loop:
function sum1(n::Int)
  local arr = collect(1:n)
  local sum = 0
  for i in arr
    sum += i
  end
  sum
end
On my laptop: @time sum1(1_000_000) returns
0.026257 seconds (2 allocations: 7.629 MiB, 91.73% gc time)

500000500000
The @time is called a macro (note the "@" symbol that starts it), which is similar to a function but more flexible. The first line of output clearly shows the time, but also the number of allocations and the memory allocated. The second line shows the value returned by the function.
If we repeat this for the sum of the first billion counting numbers:
@time sum1(1_000_000_000)
it takes a while to get
  4.286106 seconds (2 allocations: 7.451 GiB, 7.30% gc time)

  500000000500000000
The big difference between these runs is the total memory allocated. There is a 1000-fold increase in memory (note the MiB, or mebibytes, versus GiB, or gibibytes). The time, however, increased only about 160-fold; it is perhaps surprising that it didn’t go up 1000-fold as well.
Allocating memory is a time-expensive endeavor. Also, even though the machine this was run on has 16 gigabytes of RAM, perhaps it couldn’t obtain all 7.5 GiB needed for this operation at once. This is reflected in the 7.30% gc time, where gc means garbage collection. In short, some time was needed to handle so much memory. Garbage collection is a way to clean up allocated memory that is no longer needed. It’s complicated, but the gc percentage can reveal why a calculation is taking longer than expected.
If you have some patience, try this with 2 billion or more and see the results. You will probably see more allocation and gc time, and it should take at least twice as long.

Section 8.3 Attempt #2

The big difference seemed to be the total memory allocation, so let’s try a version where we don’t allocate the array.
function sum2(n::Integer)
  local sum = 0
  for i=1:n
    sum+=i
  end
  sum
end
and running it as @time sum2(1_000_000_000) we get
0.000000 seconds

  500000000500000000
and notice that the time has shrunk to essentially zero--we’ll see later why this is true. If we try larger numbers, such as @time sum2(100_000_000_000), it still takes almost no time, but notice that the result is not correct. Why? Consider the ideas from Chapter 3.

Section 8.4 Attempt #3

Hopefully you thought about the strange result above. If you thought overflow, give yourself a gold star. To avoid this, let’s write a BigInt version of this:
function sum3(n::Int)
  local sum = big(0)
  for i=1:n
    sum+=i
  end
  sum
end
and, @time sum3(1_000_000) returns
0.146406 seconds (3.00 M allocations: 45.776 MiB)

  500000500000
Comparing this with the results at the beginning of Attempt #1 shows that BigInts are much slower, as mentioned in Chapter 3.
@time sum3(10_000_000)
returns
2.288845 seconds (30.00 M allocations: 457.764 MiB, 34.08% gc time) 

  50000005000000
which is more than 10 times slower, so the time appears to grow roughly linearly in \(n\), as expected. This isn’t practical for finding much larger sums, though.
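To confirm that overflow really was the culprit in Attempt #2, recall from Chapter 3 that Int64 arithmetic wraps around on overflow rather than raising an error. A quick sketch:

```julia
# the largest Int64 wraps to the most negative Int64 when 1 is added,
# rather than raising an error
@assert typemax(Int64) + 1 == typemin(Int64)

# the true sum of the first 100 billion counting numbers is about 5*10^21,
# far beyond typemax(Int64) (about 9.2*10^18), so sum2 silently overflowed
n = big(100_000_000_000)
true_sum = n * (n + 1) ÷ 2
@assert true_sum > typemax(Int64)
```

This is why the BigInt version above, despite being slower, returns correct results for such large sums.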

Section 8.5 Attempt #4

Let’s try using the reduce function on BigInts:
function sum4(n::Int)
    reduce(+,1:big(n))
end
Then @time sum4(1_000_000) returns
  0.335802 seconds (5.00 M allocations: 91.553 MiB, 20.56% gc time)

  500000500000
Notice that this allocates about 91 MiB of memory, which isn’t a lot, but this is probably due to the BigInt allocations. Also, this is about twice the amount of time as that in Attempt #3. Notice also a significant amount of gc (garbage collection) time.
Also, @time sum4(10_000_000) returns
  3.254972 seconds (50.00 M allocations: 915.528 MiB, 16.57% gc time)

  50000005000000
which is about 10 times slower than the previous one, and still roughly twice as slow as sum3 for the same number n.
Also notice that the number and size of the allocations have increased by a factor of 10, which should not be surprising. However, notice that they are quite large: for the second sum, there was nearly 1 GiB of allocation despite there being no array. This is due to BigInts, which allocate quite a bit of memory. It appears that Julia allocates memory for every BigInt created (and there are 10 million of them in this example).
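The claim that every BigInt is heap-allocated can be checked with Julia’s built-in @allocated macro, which reports the bytes allocated while evaluating an expression. A rough sketch (the helper function names are illustrative, and exact byte counts vary by Julia version):

```julia
# small helpers so we measure the addition itself, not global-scope effects
add_bigs(a::BigInt, b::BigInt) = a + b
add_ints(a::Int, b::Int) = a + b

add_bigs(big(1), big(2))   # call once so compilation isn't measured
add_ints(1, 2)

# each BigInt operation creates a new heap-allocated BigInt, so this
# expression allocates; plain Int64 addition by itself allocates nothing
@assert @allocated(add_bigs(big(1), big(2))) > 0
```

This per-operation allocation is exactly why sum3 and sum4 report millions of allocations while sum2 reports none.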

Section 8.6 Attempt #5

Let’s use the built-in sum command. Entering @time sum(1:big(10)^6) returns
 0.001495 seconds (41 allocations: 816 bytes)

  500000500000
and note that higher powers of 10 do not increase the time much. For example, @time sum(1:big(10)^20) results in
0.000828 seconds (42 allocations: 904 bytes)

  5000000000000000000050000000000000000000
What is going on? How is this so fast? Think about this and we’ll answer it below.

Check Your Understanding 8.1.

Repeat Attempts #1--#4 for Int128. That is, make sure that the calculations are being done using this type. For the for loops, start with
local sum = Int128(0)
and for the reduce function use Int128(n) instead of big(n).
Compare your results with those above.

Section 8.7 Summary of Results

There were a lot of factors going on in the above example. Here’s a summary.
  • Allocating an array is expensive (in terms of memory and time). In Attempt #1, we created an array using the collect function. The creation was why this took so much time.
  • BigInts and Int128 are slower than Int64s. We noticed that switching from Int64 in Attempt #2 to BigInt in Attempt #3 caused a significant drop in speed. In short, BigInts are slow. Only use them when needed.
    You should have noticed from the exercise that Int128 is a viable alternative to Int64 if you need larger integers. Int128 is slower than Int64, but still much better than BigInt. Only use BigInt when you need really large integers.
  • Sometimes Julia is super smart about some operations. In both Attempt #2 and Attempt #5, we got much shorter times than expected. In both, you would expect summing 1000 times more numbers to take 1000 times longer, but this isn’t true. In both cases, Julia recognizes that integers are being summed and is probably using the formula,
    \begin{equation*} 1+2+3 + \cdots + n = \frac{n(n+1)}{2} \end{equation*}
    and using this formula can be done for any number \(n\) with no summing at all.
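The closed-form formula is easy to spot-check against direct summation:

```julia
# verify 1 + 2 + ... + n == n(n+1)/2 for a few values of n
for n in (10, 1_000, 1_000_000)
  @assert sum(1:n) == n * (n + 1) ÷ 2
end
```

Since the formula is a single multiplication and division, it takes essentially the same time for any \(n\), which explains the near-zero timings above.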

Section 8.8 Computing Fibonacci Numbers

In Section 4.9, there was an exercise to use recursion to find the fibonacci numbers. A possible solution to this is:
fibonacci(n::Integer) = (n==1 || n==2) ? 1 : fibonacci(n-1) + fibonacci(n-2)
where we have used the ternary if-then-else. Note: if n is 1 or 2, then 1 is returned; if not, it uses the recursive formula. The first 10 fibonacci numbers are found with
map(fibonacci,1:10)
and this results in the array [1 1 2 3 5 8 13 21 34 55].
It is fast to find the fibonacci numbers for smaller values, but consider that
@time fibonacci(45)
took 3.149046 seconds. If we find @time fibonacci(46), it takes about 60% longer. To determine what’s going on, consider fibonacci(5). Inside the function, it calls fibonacci(4) and fibonacci(3), and each of these then calls the previous two. This can be seen in the following tree graph, where \(f(n)\) is the fibonacci function:
Figure 8.2. A tree graph that shows how the recursive fibonacci numbers are created.
Since each of the endpoints (either \(f(2)\) or \(f(1)\)) requires an evaluation, as does each arrow, there are a total of 13 evaluations. If we define:
function fibonacciEval(n::Integer)
  global num_evals
  if n==1 || n==2
    num_evals +=1
    return 1
  else
    num_evals += 2
    return fibonacciEval(n-1) + fibonacciEval(n-2)
  end
end
then
num_evals=0
fibonacciEval(5)
num_evals
returns 13. Also
num_evals=0
fibonacciEval(20)
num_evals
returns 20293 and
num_evals=0
fibonacciEval(21)
num_evals
returns 32836. Notice that finding the 21st fibonacci number takes about 60% more operations and would therefore take about that much longer.
Thinking about this result shows that we aren’t doing things efficiently. If we have already calculated the 19th and 20th fibonacci numbers, why does the 21st take an extra 60% of the work? Basically, this is because we aren’t saving the results to be reused.
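One standard remedy (a sketch, not from this chapter) is memoization: store each fibonacci number in a cache the first time it is computed, so later calls can reuse it instead of recomputing. The Dict and the function name here are illustrative choices:

```julia
# cache of already-computed fibonacci numbers, seeded with the base cases
const fib_cache = Dict{Int,Int}(1 => 1, 2 => 1)

function fibonacciMemo(n::Integer)
  haskey(fib_cache, n) && return fib_cache[n]    # reuse a saved result
  val = fibonacciMemo(n - 1) + fibonacciMemo(n - 2)
  fib_cache[n] = val                             # save it for next time
  val
end
```

With the cache, each \(f(k)\) is computed only once, so the number of evaluations grows linearly in \(n\) rather than exponentially.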

Subsection 8.8.1 Speeding Up Fibonacci

As we saw above, the standard recursive version of the fibonacci function is very slow (in fact, it is \(O(e^n)\)). We now attempt to find a faster version. The following is a fibonacci function that uses a for loop:
function fibonacci2(n::Integer)
  local x,y = (1,1)
  for i = 1:n-1
    x,y = (y, x+y)
  end
  x
end
and note that there is no recursive call. Instead, we use a tuple of numbers (x,y) that gets updated each time through the for loop. The line local x,y = (1,1) initializes the fibonacci sequence, and the update x,y = (y, x+y) is the key: x takes on the value of y, and y becomes the sum of the previous two values.
You should run this and if you do, note that @time fibonacci2(100) returns
0.000000 seconds
3736710778780434371
and this is clearly much faster than even the original function’s 3-second computation of the 45th fibonacci number. The exercise below explores the fibonacci2 function.
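As a quick sanity check that the loop version matches the recursive one (repeating the definition so the snippet runs on its own):

```julia
function fibonacci2(n::Integer)
  local x,y = (1,1)
  for i = 1:n-1
    x,y = (y, x+y)
  end
  x
end

# the first ten values match those from map(fibonacci,1:10) earlier
@assert map(fibonacci2, 1:10) == [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```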
Check Your Understanding 8.3.
  1. Determine which fibonacci number results in an overflow for Int64. Do this by trial and error with the fibonacci2 function.
  2. Rewrite the fibonacci2 function to return a BigInt version if \(n\) is greater than or equal to the number in #1.
  3. Find the first 200 fibonacci numbers.