Monday, March 29, 2021

Functions: Recursion

A common approach to defining functions is to specify the value that the function takes on in one or a small number of cases, and then to specify larger values in terms of smaller values. Definitions like this are called recursive definitions, and the functions so defined are called recursive functions.

Here's an extremely artificial and obtuse example of all this: if you wanted to add the numbers from 1 to 5, well...

  • this is just 5 + the sum of the numbers from 1 to 4
  • and that last sum is 4 + the sum of the numbers from 1 to 3
  • and that last sum is 3 + the sum of the numbers from 1 to 2
  • and that last sum is 2 + the sum of the numbers from 1 to 1
  • but the sum of the numbers from 1 to 1 is just 1!

We then work our way back up that series of statements and get the answer: 5 + 4 + 3 + 2 + 1.

That last step - that the sum of the numbers from 1 to 1 is just 1 - is called a base case. Without base cases, recursive functions would not terminate, and we'd get the dreaded stack overflow error! What's a stack overflow error? This fun video explains it:

https://www.youtube.com/watch?v=-PX0BV9hGZY

Recursive functions like these work their way from larger inputs to smaller inputs, so if a base case is specified, how can a function NOT stop at it? Here are some ways:

  • we call the function with a non-integer value or a negative value
  • instead of decreasing the value by one as we did with the sum example, we decrement by two
  • instead of decrementing by a fixed amount, we divide the input by, e.g., ten at each step

The common problem in each case is that the function misses the base case as it works on smaller and smaller values. Each of these situations can be fixed, but if they're not anticipated, we get a stack overflow error and must then fix the base case in hindsight.

Here's an example of the third situation:

In [1]:
Out[1]:
f (generic function with 1 method)
In [2]:
Out[2]:
3
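
Since the cell contents themselves aren't shown, here's a sketch of a definition consistent with those outputs - the name f matches the stack trace below, while the body (a base case of n = 1 returning 0, and a recursive step dividing by ten) is an assumption based on the surrounding description:

    # A sketch (assumed, not the original cell): count how many times n
    # can be divided by ten before reaching 1.
    function f(n)
        if n == 1                # base case
            return 0
        end
        return 1 + f(n / 10)     # recursive step: divide the input by ten
    end

    f(1000)                      # 3, matching the output above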

This function works great when given a power of ten, like 1, 10, 100, 1000, etc., but fails on other values, because dividing by ten at each step then skips right past the base case of n = 1 without ever hitting it exactly. Let's try it:

In [3]:
StackOverflowError:

Stacktrace:
 [1] f(n::Float64) (repeats 79984 times)
   @ Main ./In[1]:5

See, a stack overflow error.

By the way, what exactly does this function calculate when it does terminate?

Later we'll see an example of a recursive function that uses two base cases.

The Factorial Function

For our first real example, consider the factorial function: this function, written n!, takes a positive integer n as input and returns the product of all the integers from one to that number. For example, 4! = 4 * 3 * 2 * 1 = 24. Here's how it can be defined recursively:

$$n! = \begin{cases} 1 & \text{if } n = 1 \\ n \cdot (n-1)! & \text{otherwise} \end{cases}$$

Here's how to translate that definition into a Julia function:

In [4]:
Out[4]:
factorial (generic function with 1 method)
In [5]:
Out[5]:
24
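
Since the cell isn't shown, here's a sketch of a direct translation of that definition (note that this shadows Julia's built-in factorial with our own one-method version, consistent with the output above):

    # A sketch (assumed): the recursive definition, translated directly.
    function factorial(n)
        if n == 1                    # base case
            return 1
        end
        return n * factorial(n - 1)  # recursive step
    end

    factorial(4)                     # 24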

Let's make sure we understand how this works by walking through it by hand:

factorial(4) = 4 * factorial(3)
             = 4 * (3 * factorial(2))
             = 4 * (3 * (2 * factorial(1)))
             = 4 * (3 * (2 * 1))
             = 4 * (3 * 2)
             = 4 * 6
             = 24

Whenever a function is called, a "program state" is saved to a "stack"; once the function is finished, that program state is popped off the stack and the program continues. As we can see, the factorial function as written consumes memory for a whole stack of program states, on top of performing the actual multiplications!

Can we do better?

Tail Recursion for the Factorial Function

One approach to improving the calculation is to use what is called tail recursion. This is a programming technique where the recursive call is the very last thing the function does. Note that the implementation of factorial we just wrote does not operate in this way: after the recursive call, we still have to do all the multiplication!

The idea of tail recursion is to maintain the intermediate results as an argument of the function. In a sense, this extra argument acts like an accumulator.

Ordinarily we think of the factorial function as requiring exactly one argument, so how do we get this extra argument? In languages that don't allow a function's arguments to have default values, one would have to write two functions:

  • one function with two arguments that does the actual tail recursion work, and
  • another function, a wrapper function, with one argument that calls the first function while initializing the value of the second argument

Here are the two functions, written without benefit of default values:

In [6]:
Out[6]:
factorial_2 (generic function with 1 method)
In [7]:
Out[7]:
24
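
Here's a sketch of what those two cells may have contained; the helper name factorial_helper is hypothetical, since only factorial_2 appears in the output above:

    # A sketch (assumed): the helper does the tail-recursive work...
    function factorial_helper(n, accumulator)
        if n == 1
            return accumulator                            # base case: the product is complete
        end
        return factorial_helper(n - 1, n * accumulator)   # multiply on the way down
    end

    # ...and the wrapper initializes the accumulator.
    factorial_2(n) = factorial_helper(n, 1)

    factorial_2(4)    # 24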

Fortunately, Julia allows a function's arguments to have default values! Here is how we would then write the factorial function using tail recursion, without the need for a wrapper function:

In [8]:
Out[8]:
factorial_TR (generic function with 2 methods)
In [9]:
Out[9]:
24
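
A sketch of that version follows; the default value is why Julia reports two methods, one taking just n and one taking both n and the accumulator:

    # A sketch (assumed): tail recursion with a default-valued accumulator.
    function factorial_TR(n, accumulator = 1)
        if n == 1
            return accumulator
        end
        return factorial_TR(n - 1, n * accumulator)
    end

    factorial_TR(4)   # 24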

Let's walk through the calculation by hand:

factorial_TR(4) = factorial_TR(4, 1) = factorial_TR(3, 4) = factorial_TR(2, 12) = factorial_TR(1, 24) = 24

The last line follows because the implementation of factorial_TR tells us to return the accumulator when n is 1. That's the base case.

Notice how the multiplication is performed as factorial_TR is called, instead of waiting for the recursion to "unwind" as in the first factorial implementation.

Can we do better?

Calculating Factorials Using a Loop

Recursion can be avoided, at least with factorials, by just following the original description of the factorial function as given above: "take a positive integer n as input and return the product of all integers from one to that number." Really, this is just a loop, so we can implement that description in Julia as follows:

In [10]:
Out[10]:
factorial_as_loop (generic function with 1 method)
In [11]:
Out[11]:
24
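
As a sketch, the loop version may have looked like this:

    # A sketch (assumed): multiply the integers from 1 to n together.
    function factorial_as_loop(n)
        product = 1
        for i in 1:n
            product *= i
        end
        return product
    end

    factorial_as_loop(4)   # 24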

This is perhaps the most understandable way of calculating factorials.

The Euclidean Algorithm

Suppose we wish to find the greatest common divisor (GCD) of two positive integers a and b. This is the largest whole number that divides evenly into both a and b. How can we do this?

A first solution one may try is to factor a and b into primes, and take the product of the common factors; if a and b have no primes in common, their GCD is just 1. For example, let a = 6750 and b = 1400. Then:

$$a = 6750 = 2^1 \cdot 3^3 \cdot 5^3$$

$$b = 1400 = 2^3 \cdot 5^2 \cdot 7^1$$

Both a and b have 2 as a factor, and the smallest number of 2s is found in a; 5 is a factor of both, and the smallest number of 5s is found in b, so:

$$\gcd(6750, 1400) = 2^1 \cdot 5^2 = 50$$

The problem with this approach is that it requires us to find the prime factorization of both numbers - not an easy task!

The immediate purpose of the Euclidean algorithm is to find the GCD of the two numbers without first factoring them into primes. The fundamental idea behind this algorithm is that the GCD of a and b (where a ≥ b) is the same as the GCD of a - b and b. This is not an obvious statement - see any elementary number theory textbook for a proof.

This fundamental idea tells us, for example, that

$$\gcd(6750, 1400) = \gcd(5350, 1400)$$

We can check this as follows:

$$a - b = 5350 = 2^1 \cdot 5^2 \cdot 107^1$$

$$b = 1400 = 2^3 \cdot 5^2 \cdot 7^1$$

and, using the common prime counting trick,

$$\gcd(5350, 1400) = 2^1 \cdot 5^2 = 50.$$

This key insight - that the GCD of a and b (where a ≥ b) is the same as the GCD of a - b and b - basically states how to convert the original task of finding the GCD into the same task, but with smaller numbers. That sounds like recursion!

Of course, if there were no base case, the Euclidean algorithm would never terminate. The base case is this: the GCD of a number b and 0 is just b.

Let's try implementing it!

In [12]:
Out[12]:
gcd (generic function with 1 method)
In [13]:
Out[13]:
50
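
Here's a sketch of an implementation consistent with the walkthrough below (the exact branch structure is an assumption): return the first argument when the second is zero, swap when the first is smaller, and otherwise subtract.

    # A sketch (assumed); note this shadows Julia's built-in gcd.
    function gcd(a, b)
        if b == 0
            return a          # base case: the GCD of any number and 0 is that number
        end
        if a < b
            return gcd(b, a)  # swap so the larger value comes first
        end
        return gcd(a - b, b)  # the fundamental idea: gcd(a, b) = gcd(a - b, b)
    end

    gcd(6750, 1400)   # 50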

Let's walk through the execution of this function, focusing on the times that a and b are swapped, and the point where the function calls itself:

gcd(6750, 1400) = gcd(5350, 1400) = gcd(3950, 1400) = gcd(2550, 1400) = gcd(1150, 1400) = gcd(1400, 1150) = gcd(250, 1150) = gcd(1150, 250) = gcd(900, 250) = gcd(650, 250) = gcd(400, 250) = gcd(150, 250) = gcd(250, 150) = gcd(100, 150) = gcd(150, 100) = gcd(50, 100) = gcd(100, 50) = gcd(50, 50) = gcd(0, 50) = gcd(50, 0) = 50

We can modify the implementation so that we aren't subtracting b from a over and over, as in the first four steps above where we repeatedly subtracted 1400. Really, the purpose of all this subtraction is to find the remainder - how much is left over after subtracting 1400 from 6750 as many times as possible while still getting something non-negative.

Julia has an operator for finding the remainder: %. Let's use it!

In [14]:
Out[14]:
gcd (generic function with 1 method)
In [15]:
Out[15]:
50
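
A sketch of the remainder version - the same shape as before, with the subtraction step replaced:

    # A sketch (assumed): one % replaces many repeated subtractions.
    function gcd(a, b)
        if b == 0
            return a
        end
        if a < b
            return gcd(b, a)   # still swapping explicitly
        end
        return gcd(a % b, b)
    end

    gcd(6750, 1400)   # 50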

Let's walk through this by hand to see what we gained:

gcd(6750, 1400) = gcd(1150, 1400) = gcd(1400, 1150) = gcd(250, 1150) = gcd(1150, 250) = gcd(150, 250) = gcd(250, 150) = gcd(100, 150) = gcd(150, 100) = gcd(50, 100) = gcd(100, 50) = gcd(0, 50) = gcd(50, 0) = 50

This is better, but notice that we have to swap the arguments each time. By definition, the remainder of a divided by b must be smaller than b, so we can swap them when we recursively call the function:

In [16]:
Out[16]:
gcd (generic function with 1 method)
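
As a sketch, the swap-on-the-recursive-call version might be:

    # A sketch (assumed): since a % b is always smaller than b, passing
    # (b, a % b) keeps the arguments in order without a separate swap afterwards.
    function gcd(a, b)
        if b == 0
            return a
        end
        if a < b
            return gcd(b, a)   # initial swap, removed in the next version
        end
        return gcd(b, a % b)
    end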

We can go further: there is no need to swap the numbers at the start, since b is always greater than a % b. This, then, is our final recursive implementation of Euclid's algorithm:

In [17]:
Out[17]:
gcd (generic function with 1 method)
In [18]:
Out[18]:
50
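
A sketch of this final version:

    # A sketch (assumed). If a < b on the first call, then a % b is just a,
    # so gcd(b, a % b) performs the swap automatically.
    function gcd(a, b)
        if b == 0
            return a
        end
        return gcd(b, a % b)
    end

    gcd(6750, 1400)   # 50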

To compare this with previous versions, here's a walkthrough of this last implementation:

gcd(6750, 1400) = gcd(1400, 1150) = gcd(1150, 250) = gcd(250, 150) = gcd(150, 100) = gcd(100, 50) = gcd(50, 0) = 50

Nice - no swaps and no repeated subtractions!

The Euclidean Algorithm - the Iterative Version

The above implementation is correct, but it doesn't quite match the way the Euclidean algorithm is described in many a modern elementary number theory textbook. Instead, the algorithm is presented as a loop - we're supposed to repeatedly subtract the smaller number from the larger one, swapping as we go along, until the smaller one is zero. At that point, the larger number is the GCD.

We can use the above insight about remainders to avoid the repeated subtractions, and here's how it would be implemented:

In [19]:
Out[19]:
gcd_as_loop (generic function with 1 method)
In [20]:
Out[20]:
50
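
As a sketch, here's an iterative version consistent with that output:

    # A sketch (assumed): replace (a, b) with (b, a % b) until b is zero.
    function gcd_as_loop(a, b)
        while b != 0
            a, b = b, a % b
        end
        return a
    end

    gcd_as_loop(6750, 1400)   # 50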

This implementation precisely matches the Euclidean algorithm as described in modern textbooks, where a typical example calculation might look like this, each step moving the value in column b over to column a and replacing b with the remainder:

a      b
6750   1400
1400   1150
1150   250
250    150
150    100
100    50
50     0

While this is surely an efficient method for computation of the GCD, does it maintain the insight as to why the Euclidean algorithm works?

The Fibonacci Numbers

Here's our last example: the Fibonacci numbers. This is a sequence of numbers related to the growth of plants, animal populations, etc. The first few values of this sequence are:

0, 1, 1, 2, 3, 5, 8, 13, 21, ...

The first value is zero, the second value is one, and subsequent values are the sum of the two preceding terms. Here's how it can be defined as a recurrence relation:

$$F_n = \begin{cases} 0 & \text{if } n = 0 \\ 1 & \text{if } n = 1 \\ F_{n-1} + F_{n-2} & \text{otherwise} \end{cases}$$

Again, this is easy to translate into a Julia function:

In [21]:
Out[21]:
fib (generic function with 1 method)
In [22]:
Out[22]:
13
In [23]:
Out[23]:
5
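
A sketch of the direct translation:

    # A sketch (assumed): the recurrence relation, translated directly.
    function fib(n)
        if n == 0
            return 0                     # first base case
        elseif n == 1
            return 1                     # second base case
        end
        return fib(n - 1) + fib(n - 2)   # recursive step
    end

    fib(7)   # 13
    fib(5)   # 5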

This Fibonacci function has not only the space and delayed-computation problems of the recursive factorial function, it also has a problem of repeated calculation! To see what I mean, let's walk through a small example:

fib(5) = fib(4) + fib(3)
       = (fib(3) + fib(2)) + (fib(2) + fib(1))
       = ((fib(2) + fib(1)) + (fib(1) + fib(0))) + ((fib(1) + fib(0)) + fib(1))
       = (((fib(1) + fib(0)) + fib(1)) + (fib(1) + fib(0))) + ((fib(1) + fib(0)) + fib(1))
       = (((1 + 0) + 1) + (1 + 0)) + ((1 + 0) + 1)
       = ((1 + 1) + 1) + (1 + 1)
       = (2 + 1) + 2
       = 3 + 2
       = 5

Notice how many times fib(3) had to be calculated!

What can be done about this?

Memoization

Memoization is a method for storing the results of calculations for later use: instead of calculating fib(2) over and over, we calculate it once, and then whenever the value is needed again, we look it up rather than repeat the calculation.

How should we store these calculations? One approach is to use dictionaries.

In [24]:
Out[24]:
Dict{Int64, Int64} with 2 entries:
  0 => 0
  1 => 1
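
The cell that produced this presumably created a dictionary seeded with the two base cases; as a sketch (the name FibonacciDict comes from the text below):

    # A sketch (assumed): a dictionary mapping n to the nth Fibonacci number,
    # seeded with the two base cases.
    FibonacciDict = Dict(0 => 0, 1 => 1)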

To make use of this, we make a new version of the fib function. Since FibonacciDict stores the first two values of the sequence, we don't need the base cases.

In [25]:
Out[25]:
fib_memoized (generic function with 1 method)
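
A sketch of the memoized version:

    # A sketch (assumed): look the value up if we already have it; otherwise
    # compute it once and store it. No explicit base cases are needed,
    # because FibonacciDict is seeded with them.
    function fib_memoized(n)
        if haskey(FibonacciDict, n)
            return FibonacciDict[n]      # already known: just look it up
        end
        value = fib_memoized(n - 1) + fib_memoized(n - 2)
        FibonacciDict[n] = value         # store for later use
        return value
    end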

Before running this function...

In [26]:
Out[26]:
Dict{Int64, Int64} with 2 entries:
  0 => 0
  1 => 1
In [27]:
Out[27]:
13

... and afterwards:

In [28]:
Out[28]:
8-element Vector{Pair{Int64, Int64}}:
 0 => 0
 1 => 1
 2 => 1
 3 => 2
 4 => 3
 5 => 5
 6 => 8
 7 => 13
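
Those three cells presumably amounted to something like this sketch:

    FibonacciDict                   # before: only the two seeded entries
    fib_memoized(7)                 # 13
    sort(collect(FibonacciDict))    # after: every intermediate value is stored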

What memoization does is trade the effort needed for duplicate calculations (time) for the cost of storing those calculations in memory (space). This type of space-time tradeoff is very common in computer programming.

Can we do better?

Tail Recursion for the Fibonacci Numbers

Let's try tail recursion. One way of implementing this is to use two extra arguments for storing the previous two values in the series.

As mentioned above, in languages where a function's arguments cannot have default values, we would use two functions: one to perform the calculation, and a wrapper function.

In [29]:
Out[29]:
fib_2_without_defaults (generic function with 1 method)
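
A sketch of the two-function version; the helper name fib_helper is hypothetical, since only fib_2_without_defaults appears in the output above:

    # A sketch (assumed): a and b hold two consecutive Fibonacci numbers.
    function fib_helper(n, a, b)
        if n == 0
            return a                        # after n shifts, a is the answer
        end
        return fib_helper(n - 1, b, a + b)  # shift the pair forward
    end

    # The wrapper seeds the pair with the first two Fibonacci numbers.
    fib_2_without_defaults(n) = fib_helper(n, 0, 1)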

Here's how we'd implement it using default values:

In [30]:
Out[30]:
fib_TR (generic function with 3 methods)
In [31]:
Out[31]:
13
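
A sketch of that version; the two default values are why Julia reports three methods:

    # A sketch (assumed): the same idea, with default values for the pair.
    function fib_TR(n, a = 0, b = 1)
        if n == 0
            return a
        end
        return fib_TR(n - 1, b, a + b)
    end

    fib_TR(7)   # 13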

Were there any duplicate calculations? Let's walk through this by hand to see:

fib_TR(7) = fib_TR(7, 0, 1) = fib_TR(6, 1, 1) = fib_TR(5, 1, 2) = fib_TR(4, 2, 3) = fib_TR(3, 3, 5) = fib_TR(2, 5, 8) = fib_TR(1, 8, 13) = fib_TR(0, 13, 21) = 13

So no duplicate calculations, yay!

Can we do better?

Fibonacci Numbers - the Iterative Version

Another way we can calculate the nth Fibonacci number is to start with 0 and 1, add them, then use that result to calculate the next number in the sequence, repeating this n times. Each time through the loop, we add the two previous values, then shift forward: the sum and its immediate predecessor become the new pair of previous values.

In [32]:
Out[32]:
fib_as_loop (generic function with 1 method)
In [33]:
Out[33]:
13
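
As a sketch, an implementation matching the trace table below:

    # A sketch (assumed): a and b hold consecutive Fibonacci numbers;
    # shift the pair forward n times.
    function fib_as_loop(n)
        a, b = 0, 1
        for i in 1:n
            a, b = b, a + b
        end
        return a
    end

    fib_as_loop(7)   # 13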

Here are the values of n, a, and b as fib_as_loop(7) executes:

n a b
- 0 1
1 1 1
2 1 2
3 2 3
4 3 5
5 5 8
6 8 13
7 13 21

This is a very intuitive approach, but still requires us to calculate all the previous terms in the Fibonacci sequence in order to get the number we want. Can we do better?

Closed Form of the Fibonacci Sequence

Our final approach to calculating the Fibonacci numbers is to use what is called the "closed form" of this sequence. Not all recurrence relations have a closed form, but as luck would have it, there is a formula, called Binet's formula, for calculating the nth Fibonacci number without calculating the prior values. This formula uses two constants, one of which is the golden ratio:

$$\phi = \frac{1 + \sqrt{5}}{2}$$

$$\psi = \frac{1 - \sqrt{5}}{2} = 1 - \phi$$

and the Binet formula itself is:

$$F_n = \frac{1}{\sqrt{5}} \left( \phi^n - \psi^n \right)$$

The golden ratio is available in Julia's Base.MathConstants module:

In [34]:
Out[34]:
φ = 1.6180339887498...
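
That output is how Julia displays the built-in constant, so the cell was presumably just:

    Base.MathConstants.golden   # φ = 1.6180339887498...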

and this makes Binet's formula even easier to convert to a Julia function:

In [35]:
Out[35]:
fib_Binet (generic function with 1 method)
In [36]:
Out[36]:
5.0
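
A sketch of the translation (note that everything here is ordinary Float64 arithmetic, which is why the result is 5.0 rather than the integer 5):

    # A sketch (assumed): Binet's formula, using the built-in golden ratio.
    function fib_Binet(n)
        ϕ = Base.MathConstants.golden
        ψ = 1 - ϕ
        return (ϕ^n - ψ^n) / sqrt(5)
    end

    fib_Binet(5)   # ≈ 5.0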

In exact arithmetic the formula is completely accurate and always returns an integer (which is in itself amazing), but we are working with limited-precision arithmetic, so we should expect rounding errors. We do indeed get them:

In [37]:
Out[37]:
832039.9999999999

We can fix this by rounding, but we can do better by noticing that

$$\left| \frac{\psi^n}{\sqrt{5}} \right| < \frac{1}{2}$$

for all n ≥ 0. Thus:

$$F_n = \left\lfloor \frac{\phi^n}{\sqrt{5}} + \frac{1}{2} \right\rfloor$$

and that is just $\phi^n / \sqrt{5}$ rounded to the nearest integer. Let's try it out!

In [38]:
Out[38]:
fib_Binet2 (generic function with 1 method)
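
A sketch of this improved version:

    # A sketch (assumed): drop the ψ term entirely and round ϕ^n / sqrt(5)
    # to the nearest integer.
    function fib_Binet2(n)
        ϕ = Base.MathConstants.golden
        return round(Int, ϕ^n / sqrt(5))
    end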

Using Binet's formula we can calculate extremely large Fibonacci numbers:

In [39]:
Out[39]:
7540113804746344448

Bear in mind that Float64 carries only about 16 significant digits, so the last few digits of the number above are rounding artifacts; the exact value of F92 is 7540113804746346429. Which of the other versions of the Fibonacci function can calculate F92 without encountering some sort of error?

Conclusion

In this chapter we used recursive functions to compute GCDs, factorials, and Fibonacci numbers... only to rewrite them in non-recursive forms!

Does this mean that recursive functions are to be avoided at all costs?

No! And here's why:

  1. Recursion is an intuitive and natural way for describing and solving problems (though this intuition only comes from experience)
  2. Many problems in compiler design are best solved using recursion
  3. For certain data structures, like trees, it makes sense to use recursion to operate upon them
  4. The performance of recursive algorithms can frequently be analyzed and understood
  5. It makes sense to produce certain types of graphics using recursion; we'll see this later.

There are extremely few hard-and-fast rules when it comes to the art of computer programming (despite the bilge expounded by "best practice experts," "clean coders," and "style guides"), and completely banishing recursion is not one of them. Like any other tool, recursion must be used intelligently in order to get good results.
