This chapter investigates some aspects of Number Theory, which is the study of the integers. Prime numbers are the building blocks of the integers and play a huge role in other fields, like cryptography. We will use this topic as an example of scientific computing. We will look at examples of computational number theory throughout this chapter and then spend some time investigating the efficiency of the code here by formally benchmarking functions.
Recall that a prime number is an integer greater than 1 whose only factors are 1 and itself. Most languages have built-in versions of testing for primes, and Julia has a package for this, and we will see it toward the end of this chapter. However, we will start by writing our own versions.
Based on the definition, if we can find the factors of an integer, then we can use it to determine if the number is prime. Also, recall that a number k is a factor of n if mod(n,k) or n % k is 0. For example, 4 is a factor of 24 because mod(24,4) is 0. Thus we can check all integers between 2 and n-1 are factors. Because we are trying to find a subset of all numbers in 1 to n, we will use the filter command seen in Chapter 6.
Now that we have a method that determines whether a number is prime or not, we’ll turn to finding prime numbers. Many languages have a function called nextPrime that takes a number as an input and returns a prime number greater than that. Also, you will need to have the isPrime function that you were asked to write in the exercise above. If you haven’t done that get the solution above. Before presenting the code, here’s some things to think about:
One way to approach this is a loop (probably a while loop because we don’t know how many steps to do before finding a prime), but this is another great example of a recursive function.
and there are many ways to accomplish this. One includes using filter on the numbers 2:n. Secondly, one might start with the empty array [] and using a while loop, call nextPrime until an index reaches n.
The while looks is more challenging. We will use an index k and set it to 1 initially and every time through the loop call nextPrime(k) to get the next one. The following function finds the primes:
A perfect number is an integer, n which has the sum of the factors (except itself) equal to n. For example, 28 is a perfect number because the factors are 1,2,4,7,14,28. The sum of the factors (except itself) is \(1+2+4+7+14=28\text{.}\) A nice function would be to check if a number is perfect. Here’s a template:
First, entering findAllFactors(28) returns the array [1 2 4 7 14 28]. To use the definition of a perfect number, we need to exclude the last element of the array above. One way to do this is to use the pop! function. If we define
function isPerfect(n::Integer)
A=findAllFactors(n)
pop!(A)
sum(A)==n
end
We should also test this. We know that 6 and 28 are perfect, so isPerfect(6) returns trueand isPerfect(28) returns true. Many other integers are not perfect, so isPerfect(10) and isPerfect(100) both return false.
Although there is nothing wrong with the above function, we can shorten it a bit by not removing the last element of the array. Note that if we sum all of the factors of a perfect number \(n\text{,}\) the sum will be \(2n\text{.}\) The following is alternative way to do this.
Note that in the function at the end is 2n, which has a implicit multiply in there. If you multiply any variable on the left with a numerical literal (that is a number, not a variable that is a number), then multiply is implied.
Subsection9.3.1Timing the two functions and the BenchmarkTools package
An objective way to determine which of the two functions is better is to test if one is faster. Let’s test if 10,000 is perfect or not. Using @time on both of these. First, @time isPerfect(100_000) returns
These both seem to return about the same time, but trying running each of these a number of times. You’ll notice that since the time is fairly short, the results bounce around quite a time. This is common for functions that don’t take much time.
We’ll examine the BenchmarkTools package. Recall that you need to add it using the package manager described in Appendix B. Once you have added the package, we’ll need to tell Julia that we’re using it by
And this package has the macro @btime, which is much like the @time macro, however is much more accurate in that it runs the code a few times, reporting the average results. Running @btime isPerfect(100_000) takes a few seconds then reports
Note that the \(\mu s\) is microseconds where 1 \(\mu s\) is one-millionth of a second or \(10^{-6}\) seconds. If you rerun each of these a few times, you’ll get consistent results and times. That’s the point of the package BenchmarkTools. It runs the code many times and returns an average time.
About the results, since both versions of the isPerfect function are nearly identical, you may want to check a wider range of numbers (try this for numbers in the millions) to see if there is a difference. If you find there is virtually no difference between the functions, typically go with the one is that easier to understand (this is often the shorter one, but not always).
Note: it’s also helpful that if this process results in the number 4, then you can never result in a sequence that reaches 1. You can call these number unhappy. Here’s some examples:
The number 19 is also a happy number because \(1^{2}+9^{2}=1+81=82\text{,}\) then \(8^{2}+2^{2}=64+4=68\text{,}\) then \(6^{2}+8^{2}=36+64=100\text{,}\) then \(1^{2}+0^{2}+0^{2}=1\text{.}\)
and since we have returned to 4, this will continue cycling, so we stop and say 4 is unhappy. Any number in this sequence is unhappy as well. (It has been proven that any unhappy number will eventually hit this cycle.)
Julia has a built-in function to take a number and split the digits called digits. For example digits(1234) returns [4 3 2 1], where the position in the array is the digit in the corresponding place in the number. For example, the 4 in the first slot in the array is the ones digit and the 1 is the thousands digit (since it is element 4 of the array).
Next, let’s write a function called isHappy which determines if a number n is happy. The function should be recursive and return true if it is 1, return false if it is 4 and do the steps above otherwise.
We will find a function that takes in a single integer and outputs a boolean. Our first attempt will be with a loop. We need to go through the digits, square the numbers and find the sum. We can do this as
function isHappy(n::Integer)
if n==1
return true
elseif n==4
return false
else
local d = digits(n)
local sum = 0
for i=1:length(d)
sum += d[i]^2
end
return isHappy(sum)
end
end
For all of these we use (somewhat arbitrarily) n=1_234. @btime isHappy(n) returns 901.786 ns, @btime isHappy3(n) returns 890.957 ns and @btime isHappy3 returns 892.196 ns. These are all within 0.5% of each other, so basically identical.
which will return an array of true/false if the number is happy or not. Try running it. Note that if you have an array of booleans, Julia returns 0 as false and 1 as true.
This isn’t so helpful. If your look down through the list, it’s hard to tell which numbers are happy and which are not. A better way to do this is to use Julia’s built-in function filter that we saw in Chapter 6. For example,
, the largest prime number has 24,862,048 digits and was found in 2018. The prime is \(2^{82,589,933}-1\) 2
It is interesting in that as of writing this (September 2024), this is still the largest. I’m sure people are looking for larger ones, but haven’t found one in 6 years.
The number \(2^{89}-1\) is prime. However, if we type 2^89-1 into Julia, we get -1. Does this imply that \(2^{89}=0\text{?}\) Recall that the problem with this is that \(2^{89}\) does not fit into Int64 and we are getting an overflow error without being informed. You would need at least 89 bits to store this number, so it could be stored as a Int128.
will give 618970019642690137449562111. However, don’t throw this at your isPrime function because it will not complete this because we have used a fairly bad algorithm for finding factors or testing for primality. We spend most of the rest of this chapter improving our algorithm to find primes.
Although this may not seem slow, 1 million is small if you are studying prime numbers and as we saw in the previous section, they get large fast. The reason this is slower is because isPrime relies on the findAllFactors function which checks every every number between 2 and \(n-1\) to see if it is a factor. This is very inefficient, so we will try some ways to speed things up.
When we wrote findAllFactors above, we used the filter command, however, that may not have been the most efficient. Sometimes writing a for loop is faster. Let’s try. First, we know that both 1 and n are already factors, so we don’t need to check those so within the for loop, we will test if all numbers between 2 and \(n-1\) are factors in the following function.
function findAllFactors2(n::Integer)
local factors = [1]
for i=2:n-1
if n % i ==0
push!(factors,i)
end
end
push!(factors,n) # n is always a factor of itself
end
Although above, we were testing the speed of isPrime, let’s compare the speed of findAllFactors versus findAllFactors2 on a reasonably large number. @btime findAllFactors(49_000_000) returns 55.202 ms and @btime findAllFactors2(49_000_000) returns 30.688 about 45% faster. Therefore, this seems like a good move and let’s write a new isPrime function based on this.
and if we time it with the same input as above as @btime isPrime2(1_000_003) we get 624.208 μs, which is about 45% faster than isPrime for this number.
Subsection9.6.2Speedup #2: Rewrite findAllFactors to stop earlier
In the function findAllFactors2 we go to \(n-1\) to check for a factor. There’s actually no reason to do this. For example, consider \(n=100\text{,}\) the factors are (and you can check this) are \(1,2,4,5,10,20,25,50,100\text{.}\) The second largest factor is at most half of the value \(n\text{.}\) Let’s consider then the following
To check the efficiency of this method @btime findAllFactors3(49_000_000) takes 15.312 ms to return the array, which is about 50% faster than findAllFactors2 for the same input. Note that this is precisely half the time because we are checking half of the numbers. Although this example is fast, if it take half the time and this is repeated many times, like many algorithms may include, then this will be quite helpful.
Subsection9.6.3Speedup #3: Rewrite findAllFactors to not check all integers as factors
Notice that in both of the cases, we go through of the integers up to \(n-1\) or \(n \div 2\text{,}\) whereas that isn’t necessary because most of the time, factors come in pairs. Consider \(n=200\text{,}\) We can write the factors in the following way:
and there are pairs of factors and unless the number is a perfect square, they always occur in pairs. For example, when \(n=100\text{,}\) the factors are
function findAllFactors4(n::Integer)
local x = round(Int,sqrt(n)) # closest integer to sqrt(n)
local factors = [1,n]
local j=2 # keep track where to insert elements
for k=2:x
if n%k==0
# Insert the new factors in the middle of the factors array.
# If k^2 is n, just add k, otherwise add the pairs.
splice!(factors,j:(j-1),k^2 == n ? [k] : [k,div(n,k)])
j+=1
end
end
factors
end
and notice that if we try to evaluate time this with @btime findAllFactors4(49_000_000) the function finishes in 7.375 μs, more than 1000 times faster than the findAllFactors3 function. Because the loop goes up to \(\sqrt{n}\text{,}\) instead of \(n/2\text{,}\) then for a number like 49 million, you get a huge speed increase.
and now @btime isPrime4(1_000_003) returns 632.396 ns where ns is nanoseconds. Comparing this to the original isPrime, this is about 2000 times faster and about 500 times faster than version 2.
Subsection9.6.4Speedup #4: Don’t use findAllFactors at all
If we are only interested with determining if a number is prime or not, we don’t really need to calculate all of the factors. Let’s start with the findAllFactors4 function, which seems fast, but we will get rid of the array. Consider the following:
function isPrime5(n::Integer)
for k=2:floor(Int,sqrt(n)) # integer nearest sqrt(n)
if n%k==0 # if k is a factor of n
return false
end
end
true
end
In short, this function goes through all integers between 2 and \(\sqrt{n}\text{.}\) If it finds a factor, it just returns false. If instead, no integers were factors, it returns true. Timing this function with @btime isPrime4(1_000_003) results in finishing in 644.707 ns. This is just a slight speed up from isPrime4, but a speed up nonetheless.
Subsection9.6.5Speedup #5: Don’t do unnecessary calculations
One thing I noticed while writing this function is that the for loops goes through all numbers up to \(\sqrt{n}\text{.}\) A quick way to eliminate many of these is to check 2 then all odds between 3 and \(\sqrt{n}\text{.}\) This is because if \(2\) isn’t a factor of \(n\text{,}\) then any even cannot be either. Here’s a way to do that.
function isPrime6(n::Integer)
if n == 1
return false
elseif n == 2
return true
elseif n%2==0
return false
end
for k=3:2:floor(Int,sqrt(n)) # odd integers to sqrt(n)
if n%k==0
return false
end
end
true
end
and checking the time on this with @btime isPrime6(1_000_003) results in a time of 311.029 ns, about twice as fast as isPrime5, which is expected since it only checks half of the numbers.
However, there’s a bit of circular reasoning here that we are checking to see if a number is prime and we would use primes to check this, but here goes.
An exercise above asked to generated a list of primes up to some number. The Sieve of Eratosthenes is one that generates a list of primes up to a number \(m\) in the following way.
function getPrimes(m::Integer) ## return all primes up to n using
# the sieve of Eratosthenes
local is_prime=trues(m) ## assume all are prime
local k=2
while k < sqrt(m)
is_prime[2*k:k:m] .= false # all multiples of k are not prime
k = nextPrime(k+ (k==2 ? k=1 : k+2)) # find the next prime after k and they will all be odd after k= 2
end
findall(is_prime)[2:end]
end
and for example, getPrimes(100) returns the array [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]
function isPrime7(n::Integer)
if n==1
return false
end
# get all primes up to the square root of n
for k in getPrimes(floor(Int,sqrt(n)))
if n%k==0
return false
end
end
true
end
Lastly, let’s just test the speed of this function:
which takes 3.453 μs and is about 10 times slower than isPrime6. This probably has to do with the fact that the first number of primes must be generated. Below we will test with a larger number to see if this pays off for testing larger primes.
It appears that isPrime6 is the fastest of the algorithms we developed here. Note that this just checks for a single number, whereas, the values may change as one increases the value of n.
returns the answer in 702.833 ns and this is about ten times faster than our fastest algorithm, isPrime6. Note that the time is given in ns or nanoseconds and 1000 nanoseconds is 1 \(\mu\)s.