Collections, Sets and Dictionaries

Chapter 22 Collections, Sets and Dictionaries

Objectives

Learn that many previously-learned data structures like arrays, can be thought of more abstractly.
🔗

🔗
Create and use a Set, a Julia type that is similar to a mathematical set, which is a collection of things without an order to them.
🔗

🔗
Create and use a Dictionary, which is a collection of key/value pairs and tasks to perform on them.
🔗

🔗
Create a user-defined collection by specifying an iterator, which explains how to construct the collection.
🔗

🔗

In Chapter 6, we introduced an array, which is a set of things (integers, floats, strings or other) with an order to it. We will examine many common features that together make these as an abstract notion of a collection. Since Julia does not have classes, there isn’t a formal abstract class like other languagues. Instead, there are common methods associated with collections like an array that we will also see in the Sets and Dictionaries an important collection types. Lastly, we will create our own abstract collection.

🔗

Section 22.1 Abstract Collection and Iteration

A collection is a set of items that have a few common properties including determining how many things are in the collection, whether or not the collection is empty and a way to iterate through all of the items. The concrete objects we have seen so far in this text include arrays, tuples and ranges. Any collection has the following functions applied to it

🔗

isempty 🔗: A boolean function that determines if a collection is empty. isempty([1,2,3]) returns false and isempty(3:2) returns true because 3:2 is a range and since the second number is smaller than the first, there are no elements in it.
🔗
empty! 🔗: This function takes a collection and removes all elements of the collection. If A=[1,2,3,4] then applying empty!(A) results in A being the empty array [].
🔗
length 🔗: A function that returns the number of elements in the collection. As we’ve seen before length([1,2,3]) returns 3, but note also (1,2,3,4) returns 4. length(1:2:11) returns 6.
🔗

Note that a 2D array (Matrix) is also a collection and if C=[1,2; 3, 4; 5, 6] is the 3 by 2 matrix, then length(C) returns 6, the total number of elements in the Matrix. This applies also to higher dimensional arrays.
🔗
in or ∈ and the negations 🔗: These function are boolean and determine if a single item is in or not in the collection. For example 1 ∈ [1,2,3,4] or 1 in [1,2,3,4] returns true and 3 ∈ 1:3:11 returns false.
🔗

The functions notin or the single character ∉ is the negation of the previous one. The expression 1 ∉ [1,2,3,4] returns false.
🔗

🔗

One can access all of the elements of a collection by iterating over the elements. One simple way of doing this is with a for loop and we have seen this in Chapter 5. For example

🔗

for i=1:4
  @show i
end

🔗

will print out (show) all of the elements in the range 1:4. Recall also, we can use a for loop over an array like

🔗

for i in [5, 9, 10, 2]
  @show i
end

🔗

and finally, the other abstract collection we have see so far is a tuple (either named or unnamed). We can iterate over those like

🔗

for i in (x=3, y=4.5, z=7)
  @show i
end

🔗

which results in

🔗

i = 3
i = 4.5
i = 7

🔗

Although this seems quite simple, we will see that what is going on under the hood is that anything that is a collection has a method to go through the elements one at a time in order. This is called an iterator. In fact the for loop with the range (first example above), can be translated to

🔗

range = 1:4
next = iterate(range)
while next !== nothing
    (i, state) = next
    @show i
    next = iterate(range, state)
end

and as we disccused in Chapter 5 a for loop can be written as a while loop. The important part of this is the iterate method that is called before the loop and within the loop. The iterate function takes an iterable collection (range, array, tuple) and a state variable that in this case is just the value in the range until the iteration is finished when it returns nothing.

🔗

All collections need to define the function iterate and in Section 22.4 we show an example of a user-defined collection that defines iterate.

🔗

Section 22.2 Sets

A mathematical set is a collect of things without an order. For example, mathematically, a set is generally written as \(\{1,2,3\}\) and there are important functions like union and intersection that is used. If \(A=\{1,2,3\}\) and \(B=\{3,4,5\}\text{,}\) then \(A \cup B = \{1,2,3,4,5\}\) and \(A \cap B = \{3\}\text{.}\) Also the sets \(\{1,2,3\}\) and \(\{3,2,1\}\) are the same in that order of the elements is not important.

🔗

Julia has a Set is a collection that behaves the same way. If

🔗

A=Set([1,2,3])
B=Set([3,2,1])

🔗

then first notice that the order of the element may be switched around on output, but if we check for equality with A==B, then the result is true

If we did [1,2,3]==[3,2,1] on arrays, the result is false

🔗

Julia also has the functions union and intersection that work on Sets, but also any iterators. union(Set([1,2,3]),Set([3,4,5])) returns Set([1,2,3,4,5]), but union([3,4,8],3:7) returns [3,4,8,5,6,7] and notice that in this case the order is kept.

🔗

Sets don’t need to be sets of integers. A set of a string produces the characters. For example, Set("hello") returns

🔗

Set{Char} with 4 elements:
  'h'
  'l'
  'e'
  'o'

🔗

and notice that is it the set of characters in the work "hello" and "l" is not repeated.

🔗

Another function is the boolean version of in or ϵ that will return true if a element is in the set, false otherwise. For example if A=Set([1,3,5,7,9]), then 7 ϵ A returns true. There is also a not-in operator, ∉ which can be enters as \notin TAB and if we do 4 ∉ A, the result is true.

🔗

There are other functions that can be used with sets include set difference and testing for subsets. See the Julia documentation on sets for more information.

🔗

Section 22.3 Dictionaries

In Julia, a Dictionary is more general version of a named tuple and are sets of key/value pairs. The main difference is that a Dict is mutable and there are many methods that update a Dict. To create a Dict, pass key/value pairs using "fat arrow" notation like:

🔗

d = Dict("a"=>1, "b"=>2, "c" => 3)

🔗

or alternatively, an array of tuples as key/value pairs:

🔗

s = Dict([("bart", 10), ("lisa", 8), ("maggie", 1)])

🔗

The getter/accessor is similar to that of an array. d["a"] will return 1, the value of the key corresponding to "a". And one can set a value in the same way. d["a"] = 10 will update the value associated with the key "a" to be 10. Note that this is one way that a Dictionary differs from a named tuple. The Dictionary can be updated/mutated and a named tuple cannot.

🔗

push! and pop! work with Dicts in a similar manner to those as arrays. For example,

🔗

push!(s, "homer" => 45)

🔗

now returns the Dictionary with this entry added. Unlike an array, pop! requires both the Dict and a key which will be returns and removed.

🔗

pop!(s, "maggie")

🔗

returns 1 the value that corresponds to the "maggie" entry and if s is entered, then it no longer contains this entry.

🔗

The method keys returns an array of the keys in the Dictionary. Since "maggie" is no longer there keys(s) returns ["lisa", "homer", "bart"]. The order of the keys may differ from what you see here. It returns an iterator, but probably better thought of as a set.

🔗

The method values returns an array of the values in the Dictionary in the same order as that of the keys. When running values(s), we get [8,45,10].

🔗

As mentioned above, a dictionary is a more flexible version of a named tuple. Another example is that Dictionaries can be nested. Consider creating a dictionary which stores contact information for a person.

🔗

homer = Dict(
  "first_name" => "Homer",
  "last_name" => "Simpson",
  "phones" => [
    "home" => "987-555-1234",
    "cell" => "987-555-1212"
  ],
  "home_address" => Dict(
    "street" => "742 Evergreen Terrace",
    "city" => "Springfield"
  ),
  "work_address" => Dict(
    "street" => "10 Power Plant Lane",
    "city" => "Springfield"
  )
)

and as you can see, the keys are all strings, but the corresponding values differ. There are two strings, an array (of strings) and two Dicts. Although this is more complex, we can still access things in a similar way. For example homer["work_address"]["street"] will return "10 Power Plant Lane".

🔗

There are other methods that we can use with Dictionaries and the Julia documentation on dictionaries is a good source. Additionally in Chapter 49, we will examine how to interact with a webserver that returns JSON strings which are parsed as dictionaries. This is a common format on the internet and we will see how to handle such things in Julia.

🔗

Section 22.4 User-created collections

Although we have seen many built-in collections and there are plenty of other examples in Julia packages, it should be no surprise to note that we can create a collection. Mainly, there are some key functions that need to be implemented on the collection. Let’s just show this with an example.

🔗

In Chapter 7, we looked at Fibonacci numbers as examples of recursive functions and how to implement them efficiently. We will create a type Fibonacci using a struct like in Chapter 12 in the following manner:

🔗

struct Fibonacci
  n::Integer
  function Fibonacci(n)
    n >= 0 || throw(ArgumentError("This must be called with a nonnegative integer"))
    new(n)
  end
end

where a constructor has been used to check that only nonnegative integers are used. As is, this is just a struct with a single field called n. To make this an iterator we need to implement some of the functions associated with an iterator. We will first implement the functions isempty and length and define them as

🔗

Base.isempty(f::Fibonacci) = f.n == 0
Base.length(f::Fibonacci) = f.n

🔗

and as can be seen that if we create a Fibonacci with 0 elements that it is empty and has length 0. It can be tested with isempty(Fibonacci(0)) which returns true and length(Fibonacci(10)) returns 10.

🔗

So far this doesn’t see to do anything related to the fibonacci numbers, but by defining the iterator function we can get the functionality that we desire. If

🔗

 function Base.iterate(f::Fibonacci, state = 1)
   if state > f.n
    return nothing
   end
   local (x,y) = (1,1)
   for i = 1:state-1
     x,y = (y, x+y)
   end
   (x, state + 1)
end

where the arguments of the iterate function is a object of the type in question and a variable state which acts like an index. The function should return nothing if the iteration is done and otherwise, return a tuple with the next value and increment the index/state. Note in our case above, we have used the faster (non recursive) version of fibonacci in Subsection 8.8.1.

🔗

There are many ways that iterate can be used. For example in a for loop like

🔗

for i in Fibonacci(10)
  @show i
end

🔗

will produce the first 10 fibonacci numbers. Also, collect(Fibonacci(10)) will create an array of length 10 with the Fibonacci numbers or [1, 1, 2, 3, 5, 8, 13, 21, 34, 55].

🔗

Section 22.5 Iterable Collection

There are a few different types of collections in Julia. As we discussed above, a collection can be thought of as a bunch of things. The most general type of this is a iterable collection, which can be iterated over (or you can think that you can run a for loop over these). All of the collections in this chapter (arrays, tuples, ranges, sets, dictionaries) all have this property and the Fibonacci collection is as well.

🔗

Section 22.6 Indexable Collection

Many collections are also indexable and in general if a collection is ordered, then the collection is iterable. Vectors (and arrays in general) are indexable as are tuples, ranges and strings. However Sets, Dicts and named tuples are not.

🔗

If you have a indexable collection then getindex is the way to access it. For example, if A=[1,3,5,7,9], then getindex(A,3) returns the third element or 5. This is generally done with the notation A[3].

🔗

Related, is the setindex! function that assigns a value to a index. For example, setindex!(A,11,3) will update the array A so the 3rd element has the value 11. Note: the order of the elements is that the element is the 2nd argument and the index is the 3rd one. This is the formal method of the shorthand A[3]=11.

🔗

Note 22.1.

Although as discussed, strings are indexable collections, the setindex! method does not apply to them, since strings are immutable.

🔗

Subsection 22.6.1 Listing of Methods for Indexable Collection

We just saw that getindex and setindex! are applicable (and probably one of the most important methods) to such collections. Here’s some other helpful methods. For examples below, consider A=[5,7,9,11,13,15].

🔗

first 🔗: return the first element of a collection. first(A) returns 5.
🔗
last 🔗: returns the last element of a collection. last(A) returns 15.
🔗
firstindex 🔗: returns the first index of the collection. firstindex(A) returns 1. In general this is 1, but some collections may start at other indices.
🔗
lastindex 🔗: returns the last index of the collection. lastindex(A) returns 6.
🔗

🔗

Subsection 22.6.2 Indexing on other collections

We mentioned that Sets and Dicts are not indexable. Let’s take a look at what happens if we treat it as such. Let S = Set([1,2,3,4,5]). If you run this, you will probably get the elements returned in a different order. If you run any of the above methods on this set, you will get an error. For example, last(S) returns

🔗

MethodError: no method matching lastindex(::Set{Int64})
The function `lastindex` exists, but no method is defined for this combination of argument types.

🔗

This is as expected in that there are not indices associated with sets. It’s odd that if your run first(S) you will get an answer in that firstindex(S) returns a value.

🔗

If we try access either getindex, last or lastindex of our user-defined Fibonacci collection, then we will get errors. These seem like reasonable methods to have with this collection, so we can defined it.

🔗

The lastindex of a Fibonacci collection should be n and we can defined it as

🔗

Base.lastindex(f::Fibonacci) = f.n

🔗

And to test this, let’s define f = Fibonacci(10), then lastindex(f) will return 10. Now to create the getindex method on this, we will use the iterate method that we have already created above. Note, if we evaluate iterate(f,7), then we get the tuple (13, 8) where the first is the 7th Fibonacci number and the second element just increments the index to 8. We’d like getindex to return the first number here. We can do this with

🔗

function Base.getindex(f::Fibonacci, i::Int) 
  0 < i <= f.n || throw(ArgumentError("The index must be between 1 and $(f.n)"))
  iterate(f,i)[1]
end

🔗

Now, we can access any element up to the last Fibonacci number. For example, if we redefine f = Fibonacci(1000) and then find the 999th fibonnacii number, as f[999] which returns 8261794739546030242. Also last(f) is now defined and returns 817770325994397771.

🔗

We can also leverage other methods of iterable collection that are typically associated with vectors and arrays. Consider the findfirst method which takes a boolean function and a collection and returns the index of the first element that satsifies the first. Consider finding the first Fibonacci number greater than 1000. We’d like to do this with findfirst(x -> x > 1000, f), however evaluating this returns

🔗

MethodError: no method matching keys(::Fibonacci)
The function `keys` exists, but no method is defined for this combination of argument types.

🔗

indicating that under the scene, the keys method (that we saw associated with named tuples and Dicts) is used. The keys for the fibonacci will be all of the numbers 1 to n, therefore we can define

🔗

Base.keys(f::Fibonacci) = 1:f.n

🔗

and then running findfirst(x -> x > 1000, f) returns 17 indicating that the 17th Fibonacci number is greater than 1000. And the value is f[17] or 1597.

🔗

Section 22.7 Summary of Abstract Collections

We have seen objects like arrays, ranges and tuples have a common structure that allows one to iterate over its elements. We have also seen other objects like Sets and Dictionaries that fall into abstract collections as well. Finally, we saw an example how to write our own abstract collection by defining a struct as well as defining functions of that struct that make it act like an iterator.

🔗

Prev Top Next