Tuesday, April 13, 2021

Sets, Tuples, Named Tuples, and Dictionaries

So far we have seen Booleans, strings, characters, and numeric data types.

Booleans, characters and the numeric types like Int64 and Float64 are called scalar types or atomic types because they can only store one truth value, one character, or one number at any one time. Strings are different - they are composed of zero or more characters. Types like these are called composite types, which means that they are built out of either scalar types or composite types, and they can store more than one value at once.

Ranges are also composite types, since they have at least a start and an end.

In this post we cover several other data types:

  • sets
  • tuples
  • named tuples
  • dictionaries

These are all composite types, as are structs and arrays which we will cover in later blog posts.

There are subtle differences between these types, and these differences will determine when to use each type. Two questions to ask about each data type are:

  • Can the values they contain be changed?
  • Can those values be looped-over?

When a data type's values can be changed, that data type is called mutable. Strings are immutable, so that the only way to change one of the characters is to recreate the entire string. Ranges are also immutable, so that once a range is created, its start and stop values cannot be changed.

A data type is called iterable when it is possible to loop over that data type's values. Ranges and strings are interable. All the composite types we'll cover are iterable.

A composite type is called indexable if we can access methods using an index. Strings are indexible, as we saw: the first character of the string s is accessed by s[1], the second character is accessed by s[2], and so on.

Another difference between these different data types is the speed at which various operations can be performed. This will be discussed in a later blog post.

Sets

Sets are collections of elements, and are completely described by those elements. As such:

  • Duplicate elements are ignored
  • There is no inherent order to the elements

Because sets are unordered, they are not indexible.

Let's start by defining several sets that contain integers:

In [1]:
Out[1]:
Set{Set{Int64}} with 2 elements:
  Set([4, 7, 2, 9, 3])
  Set([4, 6, 2, 1])

Sets can be defined that have no elements! This is called the empty set. Julia doesn't have a built-in constant for the empty set (like it does for pi), but we can define this set as follows, typing ∅ by entering \emptyset [TAB]:

In [2]:
Out[2]:
Set{Any}()

The order of elements in a set is immaterial:

In [3]:
Out[3]:
true

Also, duplicates are ignored:

In [4]:
Out[4]:
true

All that matters about sets is the elements they have (and don't have) in them:

In [5]:
Out[5]:
true

We get the number of elements in a set using the length() function:

In [6]:
Out[6]:
Set([4, 6, 2, 1])
4

Testing for Membership

There are two ways to determine whether an element is in a set:

In [7]:
true
true

We can also test to see if an element is not in a set:

In [8]:
true

Subsets and Subsets

If every element in a set C is also an element in a set A, then C is a subset of A, and it is written C ⊆ A. Equivalently, A is said to be a superset of C, which is written A ⊇ C.

TODO: add illustration

In [9]:
true
true
false
true
true

Mutating Sets

Sets are mutable, so elements can be added to or removed from a set:

In [10]:
Out[10]:
Set{Int64} with 5 elements:
  4
  6
  2
  11
  1
In [11]:
Out[11]:
Set{Int64} with 4 elements:
  4
  6
  2
  1

Note that the push!() and delete!() functions have an exclamantion mark at the end. This is a Julia convention: functions which mutate the thing they're applied to are named with an exclamation mark at the end.

Creating New Sets from Existing Sets

New sets can be created from of existing ones using union (∪), intersection (∩), set difference, and symmetric difference:

In [12]:
A = Set([4, 6, 2, 1])
A = Set([4, 7, 2, 9, 3])

union = Set([4, 6, 7, 2, 9, 3, 1])
intersection = Set([4, 2])

difference = Set([6, 1])
symmetric difference = Set([6, 7, 9, 3, 1])

Iterating Over a Set

Sets are iterable like strings, but (unlike strings) the order of elements is not 100% guaranteed to be the same each time:

In [13]:
4
6
2
1

Set Comprehension

Loops can be used to populate a set like this:

In [14]:
Set(Any[64, 4, 16, 49, 25, 36, 81, 9, 1, 100])

Another way to accomplish this is to use what is called comprehension.

In [15]:
Out[15]:
Set{Int64} with 10 elements:
  64
  4
  16
  49
  25
  36
  81
  9
  1
  100

The syntax looks strange, but it will make more sense if we read the word "for" as meaning "such that", which is how we read the colon in the set-builder notation:

{x2:x1:10}

We'd read this as "the set of all x2 such that x is in the range 1 to 10."

Another way to look at the above expression is like this:

F = Set(x^2 for x ∈ 1:10)

x^2 is the output expression for the variable x when x ranges over the input set 1:10.

Conditions can be added, too:

In [16]:
Out[16]:
Set{Int64} with 5 elements:
  64
  49
  81
  36
  100

Nested loops can be combined. The ordered pairs generated by this next line of code are called tuples, which will be described in a minute:

In [17]:
Out[17]:
Set{Tuple{Int64, Int64}} with 10 elements:
  (2, 4)
  (2, 5)
  (1, 7)
  (1, 3)
  (1, 6)
  (1, 4)
  (2, 7)
  (2, 3)
  (1, 5)
  (2, 6)
In [18]:
syntax: extra token "mutable" after end of expression

Stacktrace:
 [1] top-level scope
   @ In[18]:3
 [2] eval
   @ ./boot.jl:360 [inlined]
 [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1094

Tuples

The Julia manual describes tuples as "an abstraction of the arguments of a function - without the function itself." Tuples can be thought as a "utility" type - many Julia functions return a tuple of values, like for example the keys() function we'll see shortly.

At first glance, tuples look like sets. They're defined in a similar manner:

In [19]:
Out[19]:
(3.4, 6, "Tuples can contain different types", 'x')
In [20]:
syntax: extra token "number" after end of expression

Stacktrace:
 [1] top-level scope
   @ In[20]:1
 [2] eval
   @ ./boot.jl:360 [inlined]
 [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1094

In [21]:
Out[21]:
6
In [22]:
syntax: extra token "and" after end of expression

Stacktrace:
 [1] top-level scope
   @ In[22]:1
 [2] eval
   @ ./boot.jl:360 [inlined]
 [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1094

In [23]:
Out[23]:
true
In [24]:
Out[24]:
1-element Vector{Int64}:
 7
In [25]:
Out[25]:
6-element Vector{Int64}:
  2
  5
  7
  4
 11
 14

Tuples are immutable, so there is nothing corresponding to push!() and delete!() functions - elements cannot be added, removed, or changed.

Looping is the same as with sets, too:

In [26]:
2
5
7

One difference between sets and tuples is that order matters:

In [27]:
Out[27]:
false
In [28]:
syntax: extra token "difference" after end of expression

Stacktrace:
 [1] top-level scope
   @ In[28]:1
 [2] eval
   @ ./boot.jl:360 [inlined]
 [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1094

In [29]:
Out[29]:
false

Tuples, unlike sets, are indexible:

In [30]:
(2, 5, 7)
2
5
7

Summary of Tuples

  • They're iterable
  • They're immutable
  • They are indexed
  • Order does matter
  • Duplicates are allowed

Named Tuples

Named tuples are similar to ordinary tuples, except that each element in a named tuple has a name (also known as a key) and a value. Named tuples are thus an ordered list of key-value pairs. For example:

In [31]:
Out[31]:
(b = 8, a = 2, c = 5)

Even though the keys and values of the elements are the same, the order is different, so those two named tuples are distinct:

In [32]:
Out[32]:
false

Duplicate keys are not allowed:

In [33]:
syntax: field name "a" repeated in named tuple around In[33]:2

Stacktrace:
 [1] top-level scope
   @ In[33]:2
 [2] eval
   @ ./boot.jl:360 [inlined]
 [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1094

Tuples can be indexed in two ways: by position and by name:

In [34]:
2
2

The keys (a, b, and c in the case of nt1) are not strings but "symbols". As symbols, the element names would be written as :a, :b, and :c. So we can retrieve the first element of nt1 as follows:

In [35]:
2

This is useful because the element name can be a variable. For example:

In [36]:
8

The keys of a named tuple can be found using the keys() function:

In [37]:
Out[37]:
(:a, :b, :c)

This returns a tuple of symbols. The values() function returns the... values:

In [38]:
Out[38]:
(2, 8, 5)

There is also a function called pairs() which returns all the key-value pairs in a named tuple:

In [39]:
Out[39]:
pairs(::NamedTuple) with 3 entries:
  :a => 2
  :b => 8
  :c => 5

Iterating over Named Tuples

Because there are these two ways of accessing elements, there are several ways of iterating over a named tuple.

In [40]:
2
8
5
In [41]:
a = 2
b = 8
c = 5
In [42]:
2
8
5
In [43]:
a = 2
b = 8
c = 5

Summary of Named Tuples

  • They are indexed by position or by key
  • They're iterable either by keys or values or both
  • They're immutable
  • Order does matter
  • Duplicate keys are not allowed

Dictionaries

Dictionaries are the last data type examined here, and they're the most flexible.

Like named tuples, dictionaries consist of key-value pairs; unlike named tuples, the order of the keys are irrelevant.

In [44]:
Out[44]:
Dict{String, Integer} with 3 entries:
  "c" => true
  "b" => 76
  "a" => 3
In [45]:
Out[45]:
true

Duplicate keys are allowed, and last value of the key is what counts:

In [46]:
Out[46]:
Dict{String, Integer} with 3 entries:
  "c" => true
  "b" => 76
  "a" => 3

To check whether a dictionary has a specific key, use the haskey() function:

In [47]:
Out[47]:
true

Adding, Removing, and Changing Values

Dictionaries are mutable:

In [48]:
Dict{String, Integer}("c" => true, "b" => 76, "a" => 18)
In [49]:
Dict{String, Integer}("c" => true, "b" => 76, "a" => 18, "d" => -82)
In [50]:
Dict{String, Integer}("c" => true, "b" => 76, "d" => -82)

Here's a more substantial dictionary, one that contains another dictionary as a value:

In [51]:
Out[51]:
Dict{String, Any} with 5 entries:
  "yearReleased" => 1971
  "director"     => "Stanley Kubrick"
  "plot"         => "Jailbird Alex is 'cured' of his criminality. Then his trou…
  "title"        => "A Clockwork Orange"
  "cast"         => Dict("Dim"=>"Warren Clarke", "Prison Guard"=>"Michael Bates…

The subdictionary can be accessed like any other value:

In [52]:
Out[52]:
Dict{String, String} with 9 entries:
  "Dim"             => "Warren Clarke"
  "Prison Guard"    => "Michael Bates"
  "Prison Chaplain" => "Godfrey Quigley"
  "Georgie"         => "James Marcus"
  "Mr. Alexander"   => "Patrick Magee"
  "Alex"            => "Malcolm McDowell"
  "Pete"            => "Michael Tarn"
  "Mr. Deltoid"     => "Aubrey Morris"
  "Minister"        => "Anthony Sharp"

To get individual values inside that subdictionary, just reference the desired key:

In [53]:
Out[53]:
"Malcolm McDowell"

Values can be changed, and changed back, in the same way:

In [54]:
Out[54]:
"Malcolm McDowell"

New key-values are added into the subdictionary in the same way we saw above:

In [55]:
Out[55]:
"Sheila Raynor"

Just like named tuples, dictionaries also have the keys(), values(), and pairs() functions:

Iterating over Dictionaries

Just like named tuples, the keys(), values(), and pairs() functions operate on dictionaries, and those functions can be used to loop over the key-values in a dictionary:

In [56]:
Dim
Georgie
Minister
Mr. Alexander
Pete
Mr. Deltoid
Prison Guard
Prison Chaplain
Dad
Mum
Alex
In [57]:
Warren Clarke
James Marcus
Anthony Sharp
Patrick Magee
Michael Tarn
Aubrey Morris
Michael Bates
Godfrey Quigley
Philip Stone
Sheila Raynor
Malcolm McDowell
In [58]:
Dim is played by Warren Clarke
Georgie is played by James Marcus
Minister is played by Anthony Sharp
Mr. Alexander is played by Patrick Magee
Pete is played by Michael Tarn
Mr. Deltoid is played by Aubrey Morris
Prison Guard is played by Michael Bates
Prison Chaplain is played by Godfrey Quigley
Dad is played by Philip Stone
Mum is played by Sheila Raynor
Alex is played by Malcolm McDowell

For better output, use rpad() and lpad():

In [59]:
Cast of Characters:
Warren Clarke.............................Dim
James Marcus..........................Georgie
Anthony Sharp........................Minister
Patrick Magee...................Mr. Alexander
Michael Tarn.............................Pete
Aubrey Morris.....................Mr. Deltoid
Michael Bates....................Prison Guard
Godfrey Quigley...............Prison Chaplain
Philip Stone..............................Dad
Sheila Raynor.............................Mum
Malcolm McDowell.........................Alex

Summary of Dictionaries

  • Consists of key-value pairs
  • Order of elements is irrelevant
  • Duplicate elements are allowed
  • Is mutable
  • Iterable using keys, values, or both
In [ ]: