This chapter covers one aspect of writing fast code, an important part of scientific computing. We will study a number of ways to sum the first \(n\) counting numbers and investigate why their running times differ. We will also test various integer types to see how each affects speed. In each case we will write a function and then time it.
Although we all want our code to run fast, speed is often the difference between solving a problem and not being able to solve it, or at least between solving it at full size and having to restrict the size of the problem. Let’s look at an example of timing code in which we multiply two \(n \times n\) matrices:
function matrixMult(n::Integer)
    local A = rand(n,n)
    local B = rand(n,n)
    A*B
end
We can time the command using the @time macro. For example, running @time matrixMult(100) might produce a first line of output like 0.000229 seconds (9 allocations: 234.609 KiB).
@time is a macro (note the "@" symbol that starts its name), which is similar to a function but more flexible. The first line of output clearly shows the elapsed time, but also the number of allocations and the total memory allocated. The second line of output is the value returned by the function.
The big difference between these runs is the total memory allocated: there is a 1000-fold increase (note the MiB, or mebibytes, versus GiB, or gibibytes). The time, however, increased only about 160-fold; it is perhaps surprising that it didn’t also go up 1000 times.
Allocating memory is a time-expensive endeavor. Also, even though the machine that I ran this on has 16 gigabytes of RAM, perhaps it couldn’t get all 7.5 GiB needed for this operation at once. This shows up as the 7.30% gc time, where gc means garbage collection. In short, some time was needed to handle so much memory. Garbage collection is the process of cleaning up allocated memory that is no longer needed. The details are complicated, but the gc percentage can reveal why a calculation is taking longer than expected.
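If you want these numbers programmatically rather than printed, recent versions of Julia provide the related @timed macro, which returns them as a named tuple. A minimal sketch:

```julia
# @timed returns a named tuple instead of printing; the matrix sizes
# here are just an illustration.
stats = @timed rand(1000, 1000) * rand(1000, 1000)

println(stats.time)     # elapsed seconds
println(stats.bytes)    # bytes allocated
println(stats.gctime)   # seconds spent in garbage collection
```

The field stats.value holds the result of the expression itself.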
If you have some patience, try this with 2 billion or more and examine the results. You will probably see even more allocation, and the run should take at least twice as long.
and notice that the time has shrunk to essentially zero--we’ll see later why this is true. If we try larger numbers, such as @time sum2(100_000_000_000), it still takes almost no time, but notice that the result is not correct. Why? Consider the ideas from Chapter 3.
Hopefully you thought about the strange result above. If you thought overflow, give yourself a gold star. To avoid this, let’s write a BigInt version of this:
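The original listing is not reproduced in this excerpt, but such a version might look like the following sketch (the name sumBig and the use of a simple loop are assumptions):

```julia
function sumBig(n::Integer)
    local total = big(0)    # a BigInt accumulator, so the sum cannot overflow
    for i = 1:n
        total += i
    end
    total
end
```

For example, sumBig(100) returns 5050 as a BigInt, and the result stays correct for values of n where an Int64 accumulator would wrap around.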
Notice that this allocates about 91 MiB of memory, which isn’t a lot, but is probably due to the BigInt allocations. Also, this takes about twice as long as Attempt #3. Notice also the significant amount of gc (garbage collection) time.
Also notice that the number and size of the allocations have increased by a factor of 10, which should not be surprising. However, notice that they are quite large: for the second sum, there was nearly 1 GiB of allocation despite no array being created. This is due to BigInt, which allocates quite a bit of memory. It appears that Julia allocates memory for every BigInt created (and there are 10 million in this example).
Allocating an array is expensive in terms of both memory and time. In Attempt #1, we created an array using the collect function, and that array creation is why it took so much time.
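To see this effect, compare summing a range directly with summing a collected array. This is a sketch; the exact timings and allocation counts will vary by machine:

```julia
n = 10_000_000

# Summing a range: no array is ever materialized.
@time sum(1:n)

# Summing a collected array: collect first allocates all n integers.
@time sum(collect(1:n))
```

The two calls return the same value; only the second pays for allocating roughly 76 MiB of array before the summation even starts.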
BigInts and Int128s are slower than Int64s. When we switched from Int64 in Attempt #2 to BigInt in Attempt #3, there was a significant slowdown. In short, BigInts are slow; only use them when needed.
You should have noticed from the exercise that Int128 is a viable alternative to Int64 if you need larger integers. Int128 is slower than Int64, but still much better than BigInt. Only use BigInt when you need really large integers.
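For example, promoting the arithmetic to Int128 avoids the overflow we saw with sum2 above. A minimal sketch using the closed-form sum:

```julia
n = 100_000_000_000

# With Int64, n * (n + 1) exceeds typemax(Int64) and wraps around,
# so this expression gives a wrong (possibly negative) answer.
wrong = n * (n + 1) ÷ 2

# Promoting to Int128 first keeps the intermediate product in range.
right = Int128(n) * (Int128(n) + 1) ÷ 2
```

Here right is 5000000000050000000000, well within the range of Int128 but far beyond what Int64 can hold.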
Sometimes Julia is super smart about some operations. In both Attempt #2 and Attempt #5, we got much shorter times than expected. In both, you would expect summing 1000 times more numbers to take 1000 times longer, but this isn’t true. In both cases, Julia recognizes that consecutive integers are being summed and is probably using the formula \(1 + 2 + \cdots + n = \frac{n(n+1)}{2}\),
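We now turn to the Fibonacci sequence. The recursive definition referred to below is not reproduced in this excerpt; it might be written like the following sketch (the name fibonacci matches its later uses in the text):

```julia
# Recursive Fibonacci: f(1) = f(2) = 1 and f(n) = f(n-1) + f(n-2).
fibonacci(n::Integer) = n == 1 || n == 2 ? 1 : fibonacci(n - 1) + fibonacci(n - 2)
```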
where we have used the ternary if-then-else: if n is 1 or 2, then 1 is returned; otherwise the recursive formula is used. The first 10 Fibonacci numbers are found with
took 3.149046 seconds. If we run @time fibonacci(46), it takes about 60% longer. To determine what’s going on, consider fibonacci(5). Inside the function, it calls fibonacci(4) and fibonacci(3), and each of these then calls the previous two. This can be seen in the following tree graph, where \(f(n)\) is the Fibonacci function:
Since each of the endpoints (either \(f(2)\) or \(f(1)\)) requires an evaluation, as does each arrow, there are a total of 13 evaluations for this. If we define:
function fibonacciEval(n::Integer)
    global num_evals
    if n==1 || n==2
        num_evals += 1
        return 1
    else
        num_evals += 2
        return fibonacciEval(n-1) + fibonacciEval(n-2)
    end
end
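To count the evaluations for fibonacci(5), reset the global counter and then call the function. The definition is repeated here so the example is self-contained:

```julia
function fibonacciEval(n::Integer)
    global num_evals
    if n == 1 || n == 2
        num_evals += 1   # a leaf of the call tree: one evaluation
        return 1
    else
        num_evals += 2   # two recursive calls: two arrows in the tree
        return fibonacciEval(n - 1) + fibonacciEval(n - 2)
    end
end

num_evals = 0
fibonacciEval(5)
println(num_evals)   # 13, matching the count from the tree graph
```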
Thinking about this result shows that we aren’t doing things efficiently. If we have already calculated the 19th and 20th Fibonacci numbers, why should the next one take 60% longer? Basically, it is because we aren’t saving earlier results to be reused.
function fibonacci2(n::Integer)
    local x,y = (1,1)
    for i = 1:n-1
        x,y = (y, x+y)
    end
    x
end
and note that there is no recursive call. Instead, we use a tuple of numbers (x,y) that gets updated each time through the for loop. Line 2 initializes the start of the Fibonacci sequence, and line 4 is the key step: x takes on the value of y, and then y becomes the sum of the previous two numbers.
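A quick check that the iterative version produces the expected values (the definition is repeated so the sketch is self-contained):

```julia
function fibonacci2(n::Integer)
    local x, y = (1, 1)      # f(1) and f(2)
    for i = 1:n-1
        x, y = (y, x + y)    # shift the pair forward one step
    end
    x
end

println(fibonacci2(10))      # 55, the 10th Fibonacci number
println(fibonacci2(46))      # essentially instant, unlike the recursive version
```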