Chapter 5: Functional Programming and an Introduction to Writing fast code

Return to all notes

This chapter introduces ideas of functional programming, which is a way to think about programming using functions. Typically this requires a language that falls into the category of a functional computer language, one of which is Julia. In short, a language that has this feature has a function (or procedure) as a fundamental element of the language and one of which can be passed as arguments.

Functional vs. Statement-based languages (Not the terminology)

Let’s look at a relatively simple example of taking an array of integers from 1 to 5 and producing another array which is the square of each number. In a statement-based point-of-view:

arr=[1,2,3,4,5]
newarr=[]
for num in arr
  push!(newarr,num^2)
end
newarr

which returns the array [1,4,9,16,25]. There are other variations on this. From a functional point-of-view, a common method is a map command which takes in an array and produces another array in which a function acts on each element of the array. In this case, the function is the square.

f(x)=x^2
arr=[1,2,3,4,5]
map(f,arr)

or more simply using an anonymous function,

map(x->x^2,arr)

The key to this example is the ability to pass the function as an argument to the map command.

Exercise

  1. Write a statement-based loop to take the array [1,2,3,4,5] and output an array that is the reciprocal of each number.

  2. Write a functional-based code using the map command to do the same.

map!

So maybe we can be really excited about the map command, but map! is similar with an important difference. Let’s say that you need a new array, but the old one isn’t important and in this case map! is what should be used.

A=[1,2,3,4,5]
map!(x->x^2,A)

and now the content of the array A is [1,4,9,16,25]. Check it by just typing A.

We will see many functions in julia with a bang (!) and the convention is that such functions alter their arguments.

Note: many languages don’t allow functions to change the arguments and there is often good reason to do so, mainly preventing bugs from cropping in. The approach in Julia is to allow it, but make sure that the user is aware is is happening. Also, often there are two versions of functions, one which returns a new thing that is altered (the one without the !) and a second that alters the thing (with the !).

Reducing an array

The map command in general takes in an array and returns an array. Another big idea on array is to reduce the entire array to a single value. The most common example of this is to sum the contents of a numeric array. To do this with the reduce command we type

reduce(+,arr)

and the result on the array arr=[1,2,3,4,5] is 15. Note: the sum command is basically this function call.

To multiply all numbers, we can type:

reduce(*,arr)

These are obvious examples. Here’s one that returns the count of the number of array elements that are greater than zero:

num_pos(arr) = reduce((n,val)->ifelse(val>0,n+1,n),arr,init=0)

and then if it is tested on an array of both positive and negative numbers:

num_pos([-3,5,8,-2,11])

this returns 3. How this works is as follows. There are two values stored within reduce. The first is n and the second is val

  1. the variable n is initially 0 (this is the init=0 part of the function call)
  2. On the first step, val takes on the first value in the array (or -3). It is checked if it is positive and if so, return n+1 or n, so after the first step the internal function returns 0.
  3. On the second step, val is 5 and this time the function returns n+1 or 1
  4. val is 8 and the function returns n+1 or 2
  5. val is -2 and the function returns n or 2
  6. val is 11 and the function returns n+1 or 3.
  7. Since the array has been passed through, the result of the entire function is 3.

Exercise

Write a reduce function that will count the number of times the string “hi” appears in an array. Test it on a few arrays.

Exercise

The mapreduce function

The mapreduce function is perhaps more helpful than reduce. For example, if we want to sum the squares of each number in an array, then mapreduce can do this easily.

mapreduce(x->x^2,+,0,[1,2,3])

is a short-hand way to do $12+22+32=14$

Note: julia also has a version of the sum function that applied a function

sum(x->x^2,[1,2,3])

Also returns 14.

Exercise

The geometric mean of a set of numbers $x_1,x_2, \ldots, x_n$ is defined as
$$ \sqrt[n]{x_1 x_2 \cdots x_n}$$

Use the map reduce function to find the geometric mean of [1,2,5,7,9,10]

Ranges

As we saw in Chapter 4, in a for loop, the following syntax:

for i=1:4
  println(i)
end

where the value of i was printed for 1,2,3,4. The syntax 1:4 is called a Range and is shorthand for all integer between 1 and 4 inclusively. Technically, this is called a UnitRange because the skips between numbers is 1.

If you type

1:4

you will it return UnitRange{Int64}, where is a Unit Type with integers. Note the difference using 1.0:4.0.

A handy command is collect, which returns an array with all of the numbers in the range.

collect(1:10)

returns an integer array with all of the numbers expected. typing

collect(1.0:9.5)

returns an array of all of the floating point numbers between 1.0 and 9.0 and

collect(1:3:30)

returns an array starting a 1 and skipping every 3 integers (stopping at 28). This range is called a StepRange

Writing fast code

The last part of this chapter will show one aspect of writing fast code, an important aspect of scientific computing. We will study a number of ways to sum up the contents of an array and determine why the timing of things are different.

Sum of integers from 1 to n: Try #1

Although this is not a difficult task, we wish to sum all of the integers from one to n, some large positive integer. We have seen this previously, but want to explore a number of ways to do this.

Let’s start by making an array of all of the numbers, then summing them in a for loop:

function sum1(n::Integer)
    local arr = collect(1:n)
    local sum = 0
    for i in arr
        sum += i
    end
    sum
end

Try calling this for 1000 or a bit higher and you’ll notice it’s pretty fast. On a relatively slow machine:

@time sum1(1_000_000)

returns

0.021230 seconds (13.84 k allocations: 8.335 MiB)

500000500000

However, if we try it for a billion:

@time sum1(1_000_000_000)

it takes a while:

20.393715 seconds (7 allocations: 7.451 GiB, 0.25% gc time)

500000000500000000

The big difference between these is the total memory allocation. Above was 8.335 megabytes and the second was 7.451 gigabytes. I ran this on a machine with 16Gb of RAM and it appears that julia didn’t need all 7.451 at the same time, but still, that’s a huge difference.

Sum of integers from 1 to n: Try #2

The big difference seemed to be the total memory allocation, so let’s try a version where we don’t allocate the array.

function sum2(n::Integer)
    local sum = 0
    for i=1:n
        sum+=i
    end
    sum
end

and running it as @time sum2(1_000_000_000) we get:

0.010735 seconds (12.86 k allocations: 686.479 KiB)

500000000500000000

and you can see a HUGE difference in speed and memory allocation between sum1 and sum2. If we crank further with

@time sum2(1_000_000_000_000)

we get:

0.000003 seconds (5 allocations: 176 bytes)

1001882602603448320

however, this super fast, but incorrect. This is because of integer overflow.

Sum of integers from 1 to n: Try #3

So let’s write a BigInt version of this:

function sum3(n::BigInt)
    local sum = big(0)
    for i=1:n
        sum+=i
    end
    sum
end

and:

@time sum3(big(10)^6)

returns

1.460529 seconds (13.04 M allocations: 215.475 MiB, 24.04% gc time)

500000500000

which you can see is much slower due to the fact of operations with BigInts are much slower than the built-in integers. If we do:

@time sum3(big(10)^7)

we get:

14.245302 seconds (130.00 M allocations: 2.086 GiB, 22.76% gc time)

50000005000000

which is 10 time slower, so it appears that this is linear as possibly as expected.

Sum of integers from 1 to n: Try #4

Let’s trying using the reduce function on BigInts:

function sum4(n::BigInt)
    reduce(+,1:n)
end

Then:

@time sum4(big(10)^6)

returns:

0.889843 seconds (9.03 M allocations: 138.714 MiB, 20.15% gc time)

500000500000

which is about half the time (and half the memory allocation) of the previous one. Also:

@time sum4(big(10)^7)

returns

8.512681 seconds (90.00 M allocations: 1.341 GiB, 20.20% gc time)

50000005000000

which is about 10 times slower, but half the speed of sum3 for the same number n. The reduce command probably has a number of built-in savings compared to running a for-loop.

Sum of integers from 1 to n: Try #5

Let’s use the built-in sum command. Entering:

@time sum(1:big(10)^6)

returns

0.000081 seconds (62 allocations: 1.172 KiB)

500000500000

and note that higher powers of 10 do not increase the time much. For example,

@time sum(1:big(10)^20)

results in

0.002809 seconds (111 allocations: 4.117 KiB)

5000000000000000000050000000000000000000

What is going on? How is this so fast?