This chapter introduces ideas of functional programming, which is a way to think about programming using functions. Typically this requires a language that falls into the category of a functional computer language, one of which is Julia. In short, a language that has this feature has a function (or procedure) as a fundamental element of the language and one of which can be passed as arguments.
Let’s look at a relatively simple example of taking an array of integers from 1 to 5 and producing another array which is the square of each number. In a statement-based point-of-view:
arr=[1,2,3,4,5]
newarr=[]
for num in arr
push!(newarr,num^2)
end
newarr
which returns the array [1,4,9,16,25]
. There are other variations on this. From a functional point-of-view, a common method is a map
command which takes in an array and produces another array in which a function acts on each element of the array. In this case, the function is the square.
f(x)=x^2
arr=[1,2,3,4,5]
map(f,arr)
or more simply using an anonymous function,
map(x->x^2,arr)
The key to this example is the ability to pass the function as an argument to the map
command.
Write a statement-based loop to take the array [1,2,3,4,5]
and output an array that is the reciprocal of each number.
Write a functional-based code using the map
command to do the same.
map!
So maybe we can be really excited about the map
command, but map!
is similar with an important difference. Let’s say that you need a new array, but the old one isn’t important and in this case map!
is what should be used.
A=[1,2,3,4,5]
map!(x->x^2,A)
and now the content of the array A
is [1,4,9,16,25]
. Check it by just typing A
.
We will see many functions in julia with a bang (!) and the convention is that such functions alter their arguments.
Note: many languages don’t allow functions to change the arguments and there is often good reason to do so, mainly preventing bugs from cropping in. The approach in Julia is to allow it, but make sure that the user is aware is is happening. Also, often there are two versions of functions, one which returns a new thing that is altered (the one without the !) and a second that alters the thing (with the !).
The map
command in general takes in an array and returns an array. Another big idea on array is to reduce the entire array to a single value. The most common example of this is to sum the contents of a numeric array. To do this with the reduce
command we type
reduce(+,arr)
and the result on the array arr=[1,2,3,4,5]
is 15. Note: the sum
command is basically this function call.
To multiply all numbers, we can type:
reduce(*,arr)
These are obvious examples. Here’s one that returns the count of the number of array elements that are greater than zero:
num_pos(arr) = reduce((n,val)->ifelse(val>0,n+1,n),arr,init=0)
and then if it is tested on an array of both positive and negative numbers:
num_pos([-3,5,8,-2,11])
this returns 3. How this works is as follows. There are two values stored within reduce. The first is n
and the second is val
n
is initially 0 (this is the init=0
part of the function call)val
takes on the first value in the array (or -3
). It is checked if it is positive and if so, return n+1
or n
, so after the first step the internal function returns 0.val
is 5 and this time the function returns n+1
or 1val
is 8 and the function returns n+1
or 2val
is -2 and the function returns n
or 2val
is 11 and the function returns n+1
or 3.Write a reduce function that will count the number of times the string “hi” appears in an array. Test it on a few arrays.
reduce(string,"",["J","u","l","i","a"])
do?mapreduce
function The mapreduce function is perhaps more helpful than reduce
. For example, if we want to sum the squares of each number in an array, then mapreduce
can do this easily.
mapreduce(x->x^2,+,0,[1,2,3])
is a short-hand way to do $12+22+32=14$
Note: julia also has a version of the sum
function that applied a function
sum(x->x^2,[1,2,3])
Also returns 14.
The geometric mean of a set of numbers $x_1,x_2, \ldots, x_n$ is defined as
$$ \sqrt[n]{x_1 x_2 \cdots x_n}$$
Use the map reduce function to find the geometric mean of [1,2,5,7,9,10]
As we saw in Chapter 4, in a for
loop, the following syntax:
for i=1:4
println(i)
end
where the value of i was printed for 1,2,3,4. The syntax 1:4
is called a Range
and is shorthand for all integer between 1 and 4 inclusively. Technically, this is called a UnitRange
because the skips between numbers is 1.
If you type
1:4
you will it return UnitRange{Int64}
, where is a Unit Type with integers. Note the difference using 1.0:4.0
.
A handy command is collect
, which returns an array with all of the numbers in the range.
collect(1:10)
returns an integer array with all of the numbers expected. typing
collect(1.0:9.5)
returns an array of all of the floating point numbers between 1.0 and 9.0 and
collect(1:3:30)
returns an array starting a 1 and skipping every 3 integers (stopping at 28). This range is called a StepRange
The last part of this chapter will show one aspect of writing fast code, an important aspect of scientific computing. We will study a number of ways to sum up the contents of an array and determine why the timing of things are different.
Although this is not a difficult task, we wish to sum all of the integers from one to n
, some large positive integer. We have seen this previously, but want to explore a number of ways to do this.
Let’s start by making an array of all of the numbers, then summing them in a for loop:
function sum1(n::Integer)
local arr = collect(1:n)
local sum = 0
for i in arr
sum += i
end
sum
end
Try calling this for 1000 or a bit higher and you’ll notice it’s pretty fast. On a relatively slow machine:
@time sum1(1_000_000)
returns
0.021230 seconds (13.84 k allocations: 8.335 MiB)
500000500000
However, if we try it for a billion:
@time sum1(1_000_000_000)
it takes a while:
20.393715 seconds (7 allocations: 7.451 GiB, 0.25% gc time)
500000000500000000
The big difference between these is the total memory allocation. Above was 8.335 megabytes and the second was 7.451 gigabytes. I ran this on a machine with 16Gb of RAM and it appears that julia didn’t need all 7.451 at the same time, but still, that’s a huge difference.
The big difference seemed to be the total memory allocation, so let’s try a version where we don’t allocate the array.
function sum2(n::Integer)
local sum = 0
for i=1:n
sum+=i
end
sum
end
and running it as @time sum2(1_000_000_000)
we get:
0.010735 seconds (12.86 k allocations: 686.479 KiB)
500000000500000000
and you can see a HUGE difference in speed and memory allocation between sum1
and sum2
. If we crank further with
@time sum2(1_000_000_000_000)
we get:
0.000003 seconds (5 allocations: 176 bytes)
1001882602603448320
however, this super fast, but incorrect. This is because of integer overflow.
So let’s write a BigInt
version of this:
function sum3(n::BigInt)
local sum = big(0)
for i=1:n
sum+=i
end
sum
end
and:
@time sum3(big(10)^6)
returns
1.460529 seconds (13.04 M allocations: 215.475 MiB, 24.04% gc time)
500000500000
which you can see is much slower due to the fact of operations with BigInts are much slower than the built-in integers. If we do:
@time sum3(big(10)^7)
we get:
14.245302 seconds (130.00 M allocations: 2.086 GiB, 22.76% gc time)
50000005000000
which is 10 time slower, so it appears that this is linear as possibly as expected.
Let’s trying using the reduce
function on BigInts:
function sum4(n::BigInt)
reduce(+,1:n)
end
Then:
@time sum4(big(10)^6)
returns:
0.889843 seconds (9.03 M allocations: 138.714 MiB, 20.15% gc time)
500000500000
which is about half the time (and half the memory allocation) of the previous one. Also:
@time sum4(big(10)^7)
returns
8.512681 seconds (90.00 M allocations: 1.341 GiB, 20.20% gc time)
50000005000000
which is about 10 times slower, but half the speed of sum3
for the same number n
. The reduce
command probably has a number of built-in savings compared to running a for-loop.
Let’s use the built-in sum
command. Entering:
@time sum(1:big(10)^6)
returns
0.000081 seconds (62 allocations: 1.172 KiB)
500000500000
and note that higher powers of 10 do not increase the time much. For example,
@time sum(1:big(10)^20)
results in
0.002809 seconds (111 allocations: 4.117 KiB)
5000000000000000000050000000000000000000
What is going on? How is this so fast?