Skip to main content

Chapter 9 Number theory and Algorithm Development

This chapter investigates some aspects of Number Theory, which is the study of the integers. Prime numbers are the building blocks of the integers and play a huge role in other fields, like cryptography. We will use this topic as an example of scientific computing. We will look at examples of computational number theory throughout this chapter and then spend some time investigating the efficiency of the code here by formally benchmarking functions.

Section 9.1 Prime Numbers

Recall that a prime number is an integer greater than 1 whose only factors are 1 and itself. Most languages have built-in versions of testing for primes, and Julia has a package for this, and we will see it toward the end of this chapter. However, we will start by writing our own versions.
Based on the definition, if we can find the factors of an integer, then we can use it to determine if the number is prime. Also, recall that a number k is a factor of n if mod(n,k) or n % k is 0. For example, 4 is a factor of 24 because mod(24,4) is 0. Thus we can check all integers between 2 and n-1 are factors. Because we are trying to find a subset of all numbers in 1 to n, we will use the filter command seen in Chapter 6.
findAllFactors(n::Integer) = filter(k -> n % k == 0, 1:n)

Check Your Understanding 9.1.

(a)
Type in the function above and test it on various prime and non-prime (composite) numbers.
(b)
Use the function above to write a function isPrime that returns true or false depending on if the input is prime.
Hint.
The template for the function should be
function isPrime(n::Integer)

end
And recall the definition of prime and use that with the findAllFactors function to determine primeness.
Solution.
We basically want to know if the factors is length 2, thus we can use the following:
isPrime(n::Integer) = length(findAllFactors(n)) == 2

Section 9.2 Finding Primes

Now that we have a method that determines whether a number is prime or not, we’ll turn to finding prime numbers. Many languages have a function called nextPrime that takes a number as an input and returns a prime number greater than that. Also, you will need to have the isPrime function that you were asked to write in the exercise above. If you haven’t done that get the solution above. Before presenting the code, here’s some things to think about:
  • Since we want a function that takes in an integer and returns an integer, here’s a good template for our function:
    function nextPrime(n::Integer)
    
    end
    
  • We will need to test if n+1 is prime (that the smallest possible one).
  • If n+1 is prime, return it, if not check the next one.
  • One way to approach this is a loop (probably a while loop because we don’t know how many steps to do before finding a prime), but this is another great example of a recursive function.
Here’s the result:
nextPrime(n::Integer) = isPrime(n+1) ? n+1 : nextPrime(n+1) 
where again, we have used a ternary if-then-else. This first checks if n+1 is prime. If it is return it, if not call nextPrime on the next integer.

Check Your Understanding 9.2.

(a)
Test this function on some numbers that are less than 100 to see if it is working.
Solution.
Examples may include nextPrime(16) which returns 17 and nextPrime(26) which returns 29
(b)
Find the smallest prime less than 1 million.
Solution.
This is found by entering nextPrime(1_000_000) and returns 1000003.
(c)
Write a function that produces the first n prime numbers and returns those numbers as an array.
Hint.
The template for this will look like:
function getPrimes(n::Integer)

end
and there are many ways to accomplish this. One includes using filter on the numbers 2:n. Secondly, one might start with the empty array [] and using a while loop, call nextPrime until an index reaches n.
Solution.
Using the filter method on the numbers 2:n is straightforward in that we can do
filter(isPrime, 2:n)
The while looks is more challenging. We will use an index k and set it to 1 initially and every time through the loop call nextPrime(k) to get the next one. The following function finds the primes:
function getPrimes2(n::Integer)
  local primes = Int[]
  local k = 2
  while k < n
    push!(primes, k)
    k = nextPrime(k)
  end
  primes
end

Section 9.3 Perfect Numbers

A perfect number is an integer, n which has the sum of the factors (except itself) equal to n. For example, 28 is a perfect number because the factors are 1,2,4,7,14,28. The sum of the factors (except itself) is \(1+2+4+7+14=28\text{.}\) A nice function would be to check if a number is perfect. Here’s a template:
function isPerfect(n::Integer)
   # find the factors of n and check the sum
end
The following will go through the steps to look at this. And we will use the function findAllFactors that we wrote above.
First, entering findAllFactors(28) returns the array [1 2 4 7 14 28]. To use the definition of a perfect number, we need to exclude the last element of the array above. One way to do this is to use the pop! function. If we define
A = findAllFactors(28)
pop!(A)
A
then the result is [1 2 4 7 14]. Then we can test if the sum is 28 by sum(A)==28.
We can put all of this together as the function.
function isPerfect(n::Integer)
  A=findAllFactors(n)
  pop!(A)
  sum(A)==n
end
We should also test this. We know that 6 and 28 are perfect, so isPerfect(6) returns trueand isPerfect(28) returns true. Many other integers are not perfect, so isPerfect(10) and isPerfect(100) both return false.
Although there is nothing wrong with the above function, we can shorten it a bit by not removing the last element of the array. Note that if we sum all of the factors of a perfect number \(n\text{,}\) the sum will be \(2n\text{.}\) The following is alternative way to do this.
isPerfect2(n::Integer) = sum(findAllFactors(n)) == 2n
Note that in the function at the end is 2n, which has a implicit multiply in there. If you multiply any variable on the left with a numerical literal (that is a number, not a variable that is a number), then multiply is implied.

Subsection 9.3.1 Timing the two functions and the BenchmarkTools package

An objective way to determine which of the two functions is better is to test if one is faster. Let’s test if 10,000 is perfect or not. Using @time on both of these. First, @time isPerfect(100_000) returns
0.000117 seconds (4 allocations: 528 bytes)

false
And running @time isPerfect2(100_000) returns
0.000118 seconds (4 allocations: 528 bytes)

false
These both seem to return about the same time, but trying running each of these a number of times. You’ll notice that since the time is fairly short, the results bounce around quite a time. This is common for functions that don’t take much time.
We’ll examine the BenchmarkTools package. Recall that you need to add it using the package manager described in Appendix B. Once you have added the package, we’ll need to tell Julia that we’re using it by
using BenchmarkTools
And this package has the macro @btime, which is much like the @time macro, however is much more accurate in that it runs the code a few times, reporting the average results. Running @btime isPerfect(100_000) takes a few seconds then reports
180.292 μs (4 allocations: 528 bytes)

false
returns (and you’ll notice it takes a while)
and running @btime isPerfect2(100_000) returns.
181.000 μs (4 allocations: 528 bytes)

false
Note that the \(\mu s\) is microseconds where 1 \(\mu s\) is one-millionth of a second or \(10^{-6}\) seconds. If you rerun each of these a few times, you’ll get consistent results and times. That’s the point of the package BenchmarkTools. It runs the code many times and returns an average time.
About the results, since both versions of the isPerfect function are nearly identical, you may want to check a wider range of numbers (try this for numbers in the millions) to see if there is a difference. If you find there is virtually no difference between the functions, typically go with the one is that easier to understand (this is often the shorter one, but not always).

Section 9.4 Happy Numbers

Here’s another fun example. A positive integer \(n\) is called happy if you perform the following steps:
  1. Take the digits of \(n\) and square each one.
  2. Sum the squares.
  3. If the sum is 1, then the number is happy. If not repeat these steps.
Note: it’s also helpful that if this process results in the number 4, then you can never result in a sequence that reaches 1. You can call these number unhappy. Here’s some examples:
  • 13 is happy because \(1^{2}+3^{2}=10\) and \(1^{2}+0^{2}=1\text{,}\) so the result ends in 1.
  • The number 19 is also a happy number because \(1^{2}+9^{2}=1+81=82\text{,}\) then \(8^{2}+2^{2}=64+4=68\text{,}\) then \(6^{2}+8^{2}=36+64=100\text{,}\) then \(1^{2}+0^{2}+0^{2}=1\text{.}\)
  • The number \(4\) isn’t happy because
    • \(4^{2}=16\text{,}\) then
    • \(1^{2}+6^{2}=1+36=37\text{,}\) then
    • \(3^{2}+7^{2}=9+49=58\text{,}\) then
    • \(5^{2}+8^{2}=25+64=89\text{,}\) then
    • \(8^{2}+9^{2}=64+81=145\text{,}\) then
    • \(1^{2}+4^{2}+5^{2}=1+16+25=42\text{,}\) then
    • \(4^{2}+2^{2}=16+4=20\text{,}\) then
    • \(\displaystyle 2^{2}+0^{2}=4\)
    and since we have returned to 4, this will continue cycling, so we stop and say 4 is unhappy. Any number in this sequence is unhappy as well. (It has been proven that any unhappy number will eventually hit this cycle.)

Subsection 9.4.1 Computing Happy Numbers in Julia

Julia has a built-in function to take a number and split the digits called digits. For example digits(1234) returns [4 3 2 1], where the position in the array is the digit in the corresponding place in the number. For example, the 4 in the first slot in the array is the ones digit and the 1 is the thousands digit (since it is element 4 of the array).
Next, let’s write a function called isHappy which determines if a number n is happy. The function should be recursive and return true if it is 1, return false if it is 4 and do the steps above otherwise.
We will find a function that takes in a single integer and outputs a boolean. Our first attempt will be with a loop. We need to go through the digits, square the numbers and find the sum. We can do this as
function isHappy(n::Integer)
  if n==1
    return true
  elseif n==4
    return false
  else
    local d = digits(n)
    local sum = 0
    for i=1:length(d)
      sum += d[i]^2
    end
    return isHappy(sum)
  end
end
Here’s a few things to note:
  • Since we want a recursive function we call isHappy inside the function.
  • Also since it is recursive, we need to make sure that there is a way to stop the recursion. This occurs in the first two parts of the if statement.
  • The for loop between lines 9 and 11 sum the squares of each of the digits.
  • the return isHappy(sum) statement on line 12 calls the function recursively on the sum of the digits.
Check Your Understanding 9.3.
Try running the isHappy code on many positive integers.

Subsection 9.4.2 Alternative forms

We can shorten the last block of code into the following:
function isHappy2(n::Integer)
  if n==1
    return true
  elseif n==4
    return false
  else
    return isHappy2(sum(x->x^2,digits(n)))
  end
end
where instead of using a for loop, we sum the squares of an array using the sum command.
Lastly, we can write this as a one-liner again using the ternary if-then-else (but in this case, it is nested as
isHappy3(n::Integer) = n == 1 ? true : n == 4 ? false : isHappy3(sum(x->x^2,digits(n)))
and unless you’re strange (like me) and like such if-then-else statements, this is much less readable.
Check Your Understanding 9.4.
Test the three versions of isHappy presented here using the macro @btime from the package BenchmarkTools to see if there is a difference in speed.
Solution.
For all of these we use (somewhat arbitrarily) n=1_234. @btime isHappy(n) returns 901.786 ns, @btime isHappy3(n) returns 890.957 ns and @btime isHappy3 returns 892.196 ns. These are all within 0.5% of each other, so basically identical.

Subsection 9.4.3 Finding Happy numbers

One way to find happy numbers then is check a range of integers using the function you wrote above. For example, you can type
[isHappy(n) for n=1:100]
which will return an array of true/false if the number is happy or not. Try running it. Note that if you have an array of booleans, Julia returns 0 as false and 1 as true.
This isn’t so helpful. If your look down through the list, it’s hard to tell which numbers are happy and which are not. A better way to do this is to use Julia’s built-in function filter that we saw in Chapter 6. For example,
filter(isHappy, 1:100)
returns all integers between 1 and 100 that are happy or the array [1 7 10 13 19 23 28 31 32 44 49 68 70 79 82 86 91 94 97 100]
We saw the filter command with an array, but you can also use it with a range as shown above.

Section 9.5 Primes can be big!!

According to Wikipedia
 1 
, the largest prime number has 24,862,048 digits and was found in 2018. The prime is \(2^{82,589,933}-1\)
 2 
It is interesting in that as of writing this (September 2024), this is still the largest. I’m sure people are looking for larger ones, but haven’t found one in 6 years.
.
The number \(2^{89}-1\) is prime. However, if we type 2^89-1 into Julia, we get -1. Does this imply that \(2^{89}=0\text{?}\) Recall that the problem with this is that \(2^{89}\) does not fit into Int64 and we are getting an overflow error without being informed. You would need at least 89 bits to store this number, so it could be stored as a Int128.
As we saw in an earlier chapter, we can use BigInt so
n = big(2)^89-1
will give 618970019642690137449562111. However, don’t throw this at your isPrime function because it will not complete this because we have used a fairly bad algorithm for finding factors or testing for primality. We spend most of the rest of this chapter improving our algorithm to find primes.

Section 9.6 Speeding up isPrime

Let’s time the isPrime function that you wrote above on a reasonable-sized number. With
isPrime(n::Integer)= length(findAllFactors(n))==2
@btime isPrime(1_000_003)
and the results show that it takes 1.098 ms to complete.
Although this may not seem slow, 1 million is small if you are studying prime numbers and as we saw in the previous section, they get large fast. The reason this is slower is because isPrime relies on the findAllFactors function which checks every every number between 2 and \(n-1\) to see if it is a factor. This is very inefficient, so we will try some ways to speed things up.

Subsection 9.6.1 Speedup #1: Use a For loop

When we wrote findAllFactors above, we used the filter command, however, that may not have been the most efficient. Sometimes writing a for loop is faster. Let’s try. First, we know that both 1 and n are already factors, so we don’t need to check those so within the for loop, we will test if all numbers between 2 and \(n-1\) are factors in the following function.
function findAllFactors2(n::Integer)
  local factors = [1]
  for i=2:n-1
    if n % i ==0
      push!(factors,i)
    end
  end
  push!(factors,n) # n is always a factor of itself
end
Although above, we were testing the speed of isPrime, let’s compare the speed of findAllFactors versus findAllFactors2 on a reasonably large number. @btime findAllFactors(49_000_000) returns 55.202 ms and @btime findAllFactors2(49_000_000) returns 30.688 about 45% faster. Therefore, this seems like a good move and let’s write a new isPrime function based on this.
isPrime2(n::Integer) = length(findAllFactors2(n)) == 2
and if we time it with the same input as above as @btime isPrime2(1_000_003) we get 624.208 μs, which is about 45% faster than isPrime for this number.

Subsection 9.6.2 Speedup #2: Rewrite findAllFactors to stop earlier

In the function findAllFactors2 we go to \(n-1\) to check for a factor. There’s actually no reason to do this. For example, consider \(n=100\text{,}\) the factors are (and you can check this) are \(1,2,4,5,10,20,25,50,100\text{.}\) The second largest factor is at most half of the value \(n\text{.}\) Let’s consider then the following
function findAllFactors3(n::Integer)
  local factors = [1]
  for i=2:n÷2
    if n % i ==0
      push!(factors,i)
    end
  end
  push!(factors,n)
end
where n÷2 is integer division. You will see in an exercise why we don’t do n/2.
To check the efficiency of this method @btime findAllFactors3(49_000_000) takes 15.312 ms to return the array, which is about 50% faster than findAllFactors2 for the same input. Note that this is precisely half the time because we are checking half of the numbers. Although this example is fast, if it take half the time and this is repeated many times, like many algorithms may include, then this will be quite helpful.
We could then define a new isPrime function to use this instead.
isPrime3(n::Integer) = length(findAllFactors3(n))==2
If we run @btime isPrime3(1_000_003) we get this completing in 312.083 μs which runs in about 50% faster than the isPrime2 function.

Subsection 9.6.3 Speedup #3: Rewrite findAllFactors to not check all integers as factors

Notice that in both of the cases, we go through of the integers up to \(n-1\) or \(n \div 2\text{,}\) whereas that isn’t necessary because most of the time, factors come in pairs. Consider \(n=200\text{,}\) We can write the factors in the following way:
1 2 4 5 8 10
200 100 50 40 25 20
and there are pairs of factors and unless the number is a perfect square, they always occur in pairs. For example, when \(n=100\text{,}\) the factors are
1 2 4 5 10
100 50 25 20
and the factor 10 doesn’t have a pair.
function findAllFactors4(n::Integer)
  local x = round(Int,sqrt(n)) # closest integer to sqrt(n)
  local factors = [1,n]
  local j=2 # keep track where to insert elements
  for k=2:x
    if n%k==0
      # Insert the new factors in the middle of the factors array.
      # If k^2 is n, just add k, otherwise add the pairs.
      splice!(factors,j:(j-1),k^2 == n ? [k] : [k,div(n,k)])
      j+=1
    end
  end
  factors
end
and notice that if we try to evaluate time this with @btime findAllFactors4(49_000_000) the function finishes in 7.375 μs, more than 1000 times faster than the findAllFactors3 function. Because the loop goes up to \(\sqrt{n}\text{,}\) instead of \(n/2\text{,}\) then for a number like 49 million, you get a huge speed increase.
Since we really want to test the speed of an isPrime function, let’s write:
isPrime4(n::Integer) = length(findAllFactors4(n))==2
and now @btime isPrime4(1_000_003) returns 632.396 ns where ns is nanoseconds. Comparing this to the original isPrime, this is about 2000 times faster and about 500 times faster than version 2.

Subsection 9.6.4 Speedup #4: Don’t use findAllFactors at all

If we are only interested with determining if a number is prime or not, we don’t really need to calculate all of the factors. Let’s start with the findAllFactors4 function, which seems fast, but we will get rid of the array. Consider the following:
function isPrime5(n::Integer)
  for k=2:floor(Int,sqrt(n)) # integer nearest sqrt(n)
    if n%k==0 # if k is a factor of n
      return false
    end
  end
  true
end
In short, this function goes through all integers between 2 and \(\sqrt{n}\text{.}\) If it finds a factor, it just returns false. If instead, no integers were factors, it returns true. Timing this function with @btime isPrime4(1_000_003) results in finishing in 644.707 ns. This is just a slight speed up from isPrime4, but a speed up nonetheless.

Subsection 9.6.5 Speedup #5: Don’t do unnecessary calculations

One thing I noticed while writing this function is that the for loops goes through all numbers up to \(\sqrt{n}\text{.}\) A quick way to eliminate many of these is to check 2 then all odds between 3 and \(\sqrt{n}\text{.}\) This is because if \(2\) isn’t a factor of \(n\text{,}\) then any even cannot be either. Here’s a way to do that.
function isPrime6(n::Integer)
  if n == 1
  	return false
  elseif n == 2
    return true
  elseif n%2==0
    return false
  end
  for k=3:2:floor(Int,sqrt(n)) # odd integers to sqrt(n)
    if n%k==0
      return false
    end
  end
  true
end
and checking the time on this with @btime isPrime6(1_000_003) results in a time of 311.029 ns, about twice as fast as isPrime5, which is expected since it only checks half of the numbers.

Subsection 9.6.6 Speedup #6: check only primes as possible factors

We can continue the thought of the previous algorithm in that the only factors we need to check are prime numbers.
However, there’s a bit of circular reasoning here that we are checking to see if a number is prime and we would use primes to check this, but here goes.
An exercise above asked to generated a list of primes up to some number. The Sieve of Eratosthenes is one that generates a list of primes up to a number \(m\) in the following way.
  1. Write down all numbers between 2 and \(m\text{.}\)
  2. Start with 2 and cross out all multiples of 2.
  3. The next number greater than 2 not crossed out is 3. Cross out all multiples of 3.
  4. The next number greater than 3 not crossed out is 5.Cross out all multiples of 5.
  5. Continue this process until you reach \(\sqrt{n}\text{.}\)
  6. The resulting list are all prime.
See the animation on the wikipedia page linked above for a nice visual.
Here’s a Julia version of this algorithm:
function getPrimes(m::Integer) ## return all primes up to n using
  # the sieve of Eratosthenes
  local is_prime=trues(m) ## assume all are prime
  local k=2
  while k < sqrt(m)
    is_prime[2*k:k:m] .= false # all multiples of k are not prime
    k = nextPrime(k+ (k==2 ? k=1 : k+2)) # find the next prime after k and they will all be odd after k= 2
  end
  findall(is_prime)[2:end]
end
and for example, getPrimes(100) returns the array [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]
Now, we can adapt the isPrime code to use only primes in the following way:
function isPrime7(n::Integer)
  if n==1
    return false
  end
  # get all primes up to the square root of n
  for k in getPrimes(floor(Int,sqrt(n)))
    if n%k==0
      return false
    end
  end
  true
end
Lastly, let’s just test the speed of this function:
@btime isPrime7(1_000_003)
which takes 3.453 μs and is about 10 times slower than isPrime6. This probably has to do with the fact that the first number of primes must be generated. Below we will test with a larger number to see if this pays off for testing larger primes.

Section 9.7 Summary of isPrime

Since this was all a lot to take in, let’s summarize all of these functions. We’ll stick with n=1_000_000_007, which is a prime number
Table 9.5. Summary of isPrime timing
command time
@btime isPrime(n) 1.132 s
@btime isPrime2(n) 631.973 ms
@btime isPrime3(n) 315.827 ms
@btime isPrime4(n) 20.167 μs
@btime isPrime5(n) 19.750 μs
@btime isPrime6(n) 9.833 μs
@btime isPrime7(n) 112.959 μs
It appears that isPrime6 is the fastest of the algorithms we developed here. Note that this just checks for a single number, whereas, the values may change as one increases the value of n.

Section 9.8 Testing for Primes using the Primes

Load the Primes library (and if you haven’t added it yet, do so):
using Primes
The Primes library has the isprime function to test for primality. Let’s test a few examples.
@btime isprime(1_000_000_007)
returns the answer in 702.833 ns and this is about ten times faster than our fastest algorithm, isPrime6. Note that the time is given in ns or nanoseconds and 1000 nanoseconds is 1 \(\mu\)s.
Note that the isprime function is quite fast. If we put in \(2^{89}-1\text{,}\) it still finds the answer quite quickly.
@btime isprime(big(2)^89-1)
returns the answer in 9.875 μs. I tried to run isPrime6 on \(2^{89}-1\) and it ran overnight and didn’t finish.