So far we have seen Booleans, strings, characters, and numeric data types.
Booleans, characters and the numeric types like Int64 and Float64 are called scalar types or atomic types because they can only store one truth value, one character, or one number at any one time. Strings are different - they are composed of zero or more characters. Types like these are called composite types, which means that they are built out of either scalar types or composite types, and they can store more than one value at once.
Ranges are also composite types, since they have at least a start and an end.
In this post we cover several other data types:
- sets
- tuples
- named tuples
- dictionaries
These are all composite types, as are structs and arrays which we will cover in later blog posts.
There are subtle differences between these types, and these differences will determine when to use each type. Two questions to ask about each data type are:
- Can the values they contain be changed?
- Can those values be looped-over?
When a data type's values can be changed, that data type is called mutable. Strings are immutable, so that the only way to change one of the characters is to recreate the entire string. Ranges are also immutable, so that once a range is created, its start and stop values cannot be changed.
A data type is called iterable when it is possible to loop over that data type's values. Ranges and strings are interable. All the composite types we'll cover are iterable.
A composite type is called indexable if we can access methods using an index. Strings are indexible, as we saw: the first character of the string s
is accessed by s[1]
, the second character is accessed by s[2]
, and so on.
Another difference between these different data types is the speed at which various operations can be performed. This will be discussed in a later blog post.
Sets¶
Sets are collections of elements, and are completely described by those elements. As such:
- Duplicate elements are ignored
- There is no inherent order to the elements
Because sets are unordered, they are not indexible.
Let's start by defining several sets that contain integers:
x
# Three sets that contain only integers:
A = Set([1, 2, 4, 6])
B = Set([3, 2, 4, 7, 9])
C = Set([2, 6])
# The elements in a set don't all have to be the same type:
D = Set([5, 2.3, "Sets can contain strings", 'x'])
# Set can contain other sets:
E = Set([A, B])
Set{Set{Int64}} with 2 elements: Set([4, 7, 2, 9, 3]) Set([4, 6, 2, 1])
Sets can be defined that have no elements! This is called the empty set. Julia doesn't have a built-in constant for the empty set (like it does for pi), but we can define this set as follows, typing ∅ by entering \emptyset [TAB]:
x
∅ = Set([])
Set{Any}()
The order of elements in a set is immaterial:
xxxxxxxxxx
Set([1, 2, 3]) == Set([2, 3, 1])
true
Also, duplicates are ignored:
xxxxxxxxxx
Set([2, 1, 3, 2, 3, 2, 2]) == Set([1, 2, 3])
true
All that matters about sets is the elements they have (and don't have) in them:
xxxxxxxxxx
∅ == Set([])
true
We get the number of elements in a set using the length()
function:
x
println(A)
length(A)
Set([4, 6, 2, 1])
4
Testing for Membership¶
There are two ways to determine whether an element is in a set:
x
println(2 in A)
println(2 ∈ A) # This is entered by typing \in [TAB]
true true
We can also test to see if an element is not in a set:
x
println(3 ∉ A) # This is entered by typing \notin [TAB]
true
Subsets and Subsets¶
If every element in a set C is also an element in a set A, then C is a subset of A, and it is written C ⊆ A. Equivalently, A is said to be a superset of C, which is written A ⊇ C.
TODO: add illustration
xxxxxxxxxx
# Typed using \subseteq [TAB]
println(C ⊆ A)
# Typed using \supseteq [TAB]
println(A ⊇ C)
# Typed using \subsetneq [TAB]
println(A ⊊ A)
# Typed using \supsetneq [TAB]
println(A ⊋ C)
# The empty set is a subset of every other set
println(∅ ⊆ A)
true true false true true
Mutating Sets¶
Sets are mutable, so elements can be added to or removed from a set:
xxxxxxxxxx
# Add element to a set
push!(A, 11)
Set{Int64} with 5 elements: 4 6 2 11 1
xxxxxxxxxx
# Remove an element from a set
delete!(A, 11)
Set{Int64} with 4 elements: 4 6 2 1
Note that the push!()
and delete!()
functions have an exclamantion mark at the end. This is a Julia convention: functions which mutate the thing they're applied to are named with an exclamation mark at the end.
Creating New Sets from Existing Sets¶
New sets can be created from of existing ones using union (∪), intersection (∩), set difference, and symmetric difference:
println("A = ", A)
println("A = ", B)
println()
# Set union - elements which are in one set, the other set, or both
unionset = union(A, B)
# This is the same, using more familiar notation (typed with \cup [TAB])
unionset = A ∪ B
# Set intersection - elements which are in both sets
intersectionset = intersect(A, B)
# Typed with \cap [TAB]
intersectionset = A ∩ B
# There are two types of set difference, neither of which have operators:
# Elements which are in A but not in B
diffset = setdiff(A, B)
# Elements which are members in A or B, but not both
symmetricdiffset = symdiff(A, B)
println("union = ", unionset)
println("intersection = ", intersectionset)
println()
println("difference = ", diffset)
println("symmetric difference = ", symmetricdiffset)
A = Set([4, 6, 2, 1]) A = Set([4, 7, 2, 9, 3]) union = Set([4, 6, 7, 2, 9, 3, 1]) intersection = Set([4, 2]) difference = Set([6, 1]) symmetric difference = Set([6, 7, 9, 3, 1])
Iterating Over a Set¶
Sets are iterable like strings, but (unlike strings) the order of elements is not 100% guaranteed to be the same each time:
xxxxxxxxxx
for x ∈ A
println(x)
end
4 6 2 1
Set Comprehension¶
Loops can be used to populate a set like this:
xxxxxxxxxx
F = Set([])
for x in 1:10
push!(F, x^2)
end
println(F)
Set(Any[64, 4, 16, 49, 25, 36, 81, 9, 1, 100])
Another way to accomplish this is to use what is called comprehension.
F = Set(x^2 for x ∈ 1:10)
Set{Int64} with 10 elements: 64 4 16 49 25 36 81 9 1 100
The syntax looks strange, but it will make more sense if we read the word "for" as meaning "such that", which is how we read the colon in the set-builder notation:
We'd read this as "the set of all such that is in the range 1 to 10."
Another way to look at the above expression is like this:
F = Set(x^2 for x ∈ 1:10)
x^2
is the output expression for the variable x
when x
ranges over the input set 1:10
.
Conditions can be added, too:
G = Set(x^2 for x ∈ 1:10 if x > 5)
Set{Int64} with 5 elements: 64 49 81 36 100
Nested loops can be combined. The ordered pairs generated by this next line of code are called tuples, which will be described in a minute:
xxxxxxxxxx
H = Set((x, y) for x ∈ 1:2, y ∈ 3:7)
Set{Tuple{Int64, Int64}} with 10 elements: (2, 4) (2, 5) (1, 7) (1, 3) (1, 6) (1, 4) (2, 7) (2, 3) (1, 5) (2, 6)
### Summary of Sets
- They're mutable
- They're iterable
- They cannot be indexed
- Order does not matter
- Duplicates are ignored
syntax: extra token "mutable" after end of expression Stacktrace: [1] top-level scope @ In[18]:3 [2] eval @ ./boot.jl:360 [inlined] [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String) @ Base ./loading.jl:1094
Tuples¶
The Julia manual describes tuples as "an abstraction of the arguments of a function - without the function itself." Tuples can be thought as a "utility" type - many Julia functions return a tuple of values, like for example the keys()
function we'll see shortly.
At first glance, tuples look like sets. They're defined in a similar manner:
r = Tuple([2, 5, 7])
# They can also be defined using a shorter notation:
s = (4, 7, 11, 14)
# Even the parentheses are optional:
t = 1, 1, 2, 3, 5, 8
u = (3.4, 6, "Tuples can contain different types", 'x')
(3.4, 6, "Tuples can contain different types", 'x')
xxxxxxxxxx
The number of elements is calculated the same as with sets:
syntax: extra token "number" after end of expression Stacktrace: [1] top-level scope @ In[20]:1 [2] eval @ ./boot.jl:360 [inlined] [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String) @ Base ./loading.jl:1094
xxxxxxxxxx
length(t)
6
xxxxxxxxxx
Membership and the set operations work the same way as sets:
syntax: extra token "and" after end of expression Stacktrace: [1] top-level scope @ In[22]:1 [2] eval @ ./boot.jl:360 [inlined] [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String) @ Base ./loading.jl:1094
xxxxxxxxxx
5 ∈ t
true
xxxxxxxxxx
r ∩ s
1-element Vector{Int64}: 7
xxxxxxxxxx
r ∪ s
6-element Vector{Int64}: 2 5 7 4 11 14
Tuples are immutable, so there is nothing corresponding to push!()
and delete!()
functions - elements cannot be added, removed, or changed.
Looping is the same as with sets, too:
xxxxxxxxxx
for x ∈ r
println(x)
end
2 5 7
One difference between sets and tuples is that order matters:
xxxxxxxxxx
(1, 2, 3) == (3, 2, 1)
false
xxxxxxxxxx
Another difference is that tuples can have duplicate elements:
syntax: extra token "difference" after end of expression Stacktrace: [1] top-level scope @ In[28]:1 [2] eval @ ./boot.jl:360 [inlined] [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String) @ Base ./loading.jl:1094
xxxxxxxxxx
(1, 2, 2, 2, 3, 3, 1, 3) == (1, 2, 3)
false
Tuples, unlike sets, are indexible:
println(r)
println(r[1])
println(r[2])
println(r[3])
(2, 5, 7) 2 5 7
Summary of Tuples¶
- They're iterable
- They're immutable
- They are indexed
- Order does matter
- Duplicates are allowed
Named Tuples¶
Named tuples are similar to ordinary tuples, except that each element in a named tuple has a name (also known as a key) and a value. Named tuples are thus an ordered list of key-value pairs. For example:
xxxxxxxxxx
nt1 = (a = 2, b = 8, c = 5)
nt2 = (b = 8, a = 2, c = 5)
(b = 8, a = 2, c = 5)
Even though the keys and values of the elements are the same, the order is different, so those two named tuples are distinct:
xxxxxxxxxx
nt1 == nt2
false
Duplicate keys are not allowed:
xxxxxxxxxx
# Will produce an error
nt3 = (a = 2, b = 8, c = 5, a = 23)
syntax: field name "a" repeated in named tuple around In[33]:2 Stacktrace: [1] top-level scope @ In[33]:2 [2] eval @ ./boot.jl:360 [inlined] [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String) @ Base ./loading.jl:1094
Tuples can be indexed in two ways: by position and by name:
println(nt1[1])
println(nt1.a)
2 2
The keys (a, b, and c in the case of nt1) are not strings but "symbols". As symbols, the element names would be written as :a, :b, and :c. So we can retrieve the first element of nt1 as follows:
xxxxxxxxxx
println(nt2[:a])
2
This is useful because the element name can be a variable. For example:
xxxxxxxxxx
z = :b
println(nt1[z])
8
The keys of a named tuple can be found using the keys()
function:
xxxxxxxxxx
keys(nt1)
(:a, :b, :c)
This returns a tuple of symbols. The values()
function returns the... values:
xxxxxxxxxx
values(nt1)
(2, 8, 5)
There is also a function called pairs()
which returns all the key-value pairs in a named tuple:
xxxxxxxxxx
pairs(nt1)
pairs(::NamedTuple) with 3 entries: :a => 2 :b => 8 :c => 5
Iterating over Named Tuples¶
Because there are these two ways of accessing elements, there are several ways of iterating over a named tuple.
# Iterating over values
for x in nt1
println(x)
end
2 8 5
xxxxxxxxxx
# Iterating over keys
for k in keys(nt1)
println(k, " = ", nt1[k])
end
a = 2 b = 8 c = 5
xxxxxxxxxx
# Another way to iterate over values
for v in values(nt1)
println(v)
end
2 8 5
xxxxxxxxxx
# Iterating over key-value pairs
for (k, v) in pairs(nt1)
println(k, " = ", v)
end
a = 2 b = 8 c = 5
Summary of Named Tuples¶
- They are indexed by position or by key
- They're iterable either by keys or values or both
- They're immutable
- Order does matter
- Duplicate keys are not allowed
Dictionaries¶
Dictionaries are the last data type examined here, and they're the most flexible.
Like named tuples, dictionaries consist of key-value pairs; unlike named tuples, the order of the keys are irrelevant.
x
d1 = Dict("a" => 3, "b" => 76, "c" => true)
d2 = Dict("c" => true, "b" => 76, "a" => 3)
Dict{String, Integer} with 3 entries: "c" => true "b" => 76 "a" => 3
d1 == d2
true
Duplicate keys are allowed, and last value of the key is what counts:
d3 = Dict("c" => true, "a" => 42, "b" => 76, "a" => 3)
Dict{String, Integer} with 3 entries: "c" => true "b" => 76 "a" => 3
To check whether a dictionary has a specific key, use the haskey()
function:
haskey(d2, "c")
true
Adding, Removing, and Changing Values¶
Dictionaries are mutable:
# Change a value
d1["a"] = 18
println(d1)
Dict{String, Integer}("c" => true, "b" => 76, "a" => 18)
# Add a new key-value pair
d1["d"] = -82
println(d1)
Dict{String, Integer}("c" => true, "b" => 76, "a" => 18, "d" => -82)
xxxxxxxxxx
# Remove a key-value pair:
delete!(d1, "a")
println(d1)
Dict{String, Integer}("c" => true, "b" => 76, "d" => -82)
Here's a more substantial dictionary, one that contains another dictionary as a value:
x
movie = Dict(
"title" => "A Clockwork Orange",
"director" => "Stanley Kubrick",
"yearReleased" => 1971,
"plot" => "Jailbird Alex is 'cured' of his criminality. Then his troubles begin.",
"cast" => Dict(
"Alex" => "Malcolm McDowell",
"Pete" => "Michael Tarn",
"Georgie" => "James Marcus",
"Dim" => "Warren Clarke",
"Mr. Deltoid" => "Aubrey Morris",
"Mr. Alexander" => "Patrick Magee",
"Prison Guard" => "Michael Bates",
"Prison Chaplain" => "Godfrey Quigley",
"Minister" => "Anthony Sharp"
)
)
Dict{String, Any} with 5 entries: "yearReleased" => 1971 "director" => "Stanley Kubrick" "plot" => "Jailbird Alex is 'cured' of his criminality. Then his trou… "title" => "A Clockwork Orange" "cast" => Dict("Dim"=>"Warren Clarke", "Prison Guard"=>"Michael Bates…
The subdictionary can be accessed like any other value:
movie["cast"]
Dict{String, String} with 9 entries: "Dim" => "Warren Clarke" "Prison Guard" => "Michael Bates" "Prison Chaplain" => "Godfrey Quigley" "Georgie" => "James Marcus" "Mr. Alexander" => "Patrick Magee" "Alex" => "Malcolm McDowell" "Pete" => "Michael Tarn" "Mr. Deltoid" => "Aubrey Morris" "Minister" => "Anthony Sharp"
To get individual values inside that subdictionary, just reference the desired key:
movie["cast"]["Alex"]
"Malcolm McDowell"
Values can be changed, and changed back, in the same way:
movie["cast"]["Alex"] = "Malcolm McccccccccccccDowell"
movie["cast"]["Alex"] = "Malcolm McDowell"
"Malcolm McDowell"
New key-values are added into the subdictionary in the same way we saw above:
x
movie["cast"]["Dad"] = "Philip Stone"
movie["cast"]["Mum"] = "Sheila Raynor"
"Sheila Raynor"
Just like named tuples, dictionaries also have the keys()
, values()
, and pairs()
functions:
Iterating over Dictionaries¶
Just like named tuples, the keys()
, values()
, and pairs()
functions operate on dictionaries, and those functions can be used to loop over the key-values in a dictionary:
for character in keys(movie["cast"])
println(character)
end
Dim Georgie Minister Mr. Alexander Pete Mr. Deltoid Prison Guard Prison Chaplain Dad Mum Alex
for actor in values(movie["cast"])
println(actor)
end
Warren Clarke James Marcus Anthony Sharp Patrick Magee Michael Tarn Aubrey Morris Michael Bates Godfrey Quigley Philip Stone Sheila Raynor Malcolm McDowell
x
for (character, actor) in pairs(movie["cast"])
println(character, " is played by ", actor)
end
Dim is played by Warren Clarke Georgie is played by James Marcus Minister is played by Anthony Sharp Mr. Alexander is played by Patrick Magee Pete is played by Michael Tarn Mr. Deltoid is played by Aubrey Morris Prison Guard is played by Michael Bates Prison Chaplain is played by Godfrey Quigley Dad is played by Philip Stone Mum is played by Sheila Raynor Alex is played by Malcolm McDowell
For better output, use rpad()
and lpad()
:
x
println("Cast of Characters:")
for (character, actor) in pairs(movie["cast"])
print(rpad(actor, 20, '.'))
println(lpad(character, 25, '.'))
end
Cast of Characters: Warren Clarke.............................Dim James Marcus..........................Georgie Anthony Sharp........................Minister Patrick Magee...................Mr. Alexander Michael Tarn.............................Pete Aubrey Morris.....................Mr. Deltoid Michael Bates....................Prison Guard Godfrey Quigley...............Prison Chaplain Philip Stone..............................Dad Sheila Raynor.............................Mum Malcolm McDowell.........................Alex
Summary of Dictionaries¶
- Consists of key-value pairs
- Order of elements is irrelevant
- Duplicate elements are allowed
- Is mutable
- Iterable using keys, values, or both
No comments:
Post a Comment