A better recursion function example in R
I had a problem to solve that involved using a recursive function - following links through web pages, programmatically, until a target was found. Searching Google for recursive function tutorials provided examples that were all the same and of no real use - calculate a single factorial value or Fibonacci sequence. I wanted a function to capture values and return a ‘complex’ object.
Background
I was attempting to capture all the dataset URLs from an online server by simulating clicking through their file directories. The root page would give a relative link to following pages, and this would happen recursively until you hit the a directory node/end point with ‘dataset’ links. Clearly, I wanted a function that would take the root URL get the links, navigateto each of them, do this again and again until the page resulted in that with the dataset links.
To check whether the current page for held ‘links’ or ‘datasets’ was simple as either the page had catalogref
or dataset
nodes, respectively. I wanted to return the values in the dataset nodes while capturing the path to them from the root URL. Sounds simple, but it was my first real foray in recursive programming.
The example data
In this example, I use a mock list to simulate the URLs, recurse through them and return the dataset values if found.
<-
mock_list list(
a = list(
aa = list(data = 1,
data = 2,
data = 3),
bb = list(data = 4,
data = 5)
),b = list(
ba = list(data = 6),
bb = list(data = 7,
data = 8)
) )
The goal would be to return something like
a:aa:data = 1,2,3
The recursive function
Building the logic
Our function starts with a list.
<- function(list) {
find_data }
It should use the names of the list to access it’s elements/values, and then iterate through them with a for
loop (maybe an apply
function could work also). While we work through this, we’ll message (print) the list name to the console, as well as the element itself.
<- function(list) {
find_data
<- names(list)
list_names
for (list_name in list_names) {
<- list[[list_name]]
list_element
# do something
message(list_name)
print(list_element)
}
}
With this function, we only examine the top-level list elements.
|>
mock_list find_data()
## a
## $aa
## $aa$data
## [1] 1
##
## $aa$data
## [1] 2
##
## $aa$data
## [1] 3
##
##
## $bb
## $bb$data
## [1] 4
##
## $bb$data
## [1] 5
## b
## $ba
## $ba$data
## [1] 6
##
##
## $bb
## $bb$data
## [1] 7
##
## $bb$data
## [1] 8
Adding recursion
Thus we need a tell the function to run itself with the current list element.
<- function(list) {
find_data
<- names(list)
list_names
for (list_name in list_names) {
<- list[[list_name]]
list_element
# do something
message(list_name)
print(list_element)
# run function again
find_data(list_element)
}
}
An like magic, we’ve gone through all the list elements.
|>
mock_list find_data()
## a
## $aa
## $aa$data
## [1] 1
##
## $aa$data
## [1] 2
##
## $aa$data
## [1] 3
##
##
## $bb
## $bb$data
## [1] 4
##
## $bb$data
## [1] 5
## aa
## $data
## [1] 1
##
## $data
## [1] 2
##
## $data
## [1] 3
## data
## [1] 1
## data
## [1] 1
## data
## [1] 1
## bb
## $data
## [1] 4
##
## $data
## [1] 5
## data
## [1] 4
## data
## [1] 4
## b
## $ba
## $ba$data
## [1] 6
##
##
## $bb
## $bb$data
## [1] 7
##
## $bb$data
## [1] 8
## ba
## $data
## [1] 6
## data
## [1] 6
## bb
## $data
## [1] 7
##
## $data
## [1] 8
## data
## [1] 7
## data
## [1] 7
Improving the logic
We only want to return something where the names data
exist in the list elements.
<- function(list) {
find_data
<- names(list)
list_names
for (list_name in list_names) {
<- list[[list_name]]
list_element
# do something
message(list_name)
# examine list element names
<- names(list_element)
list_element_names
# do something depending on 'data' in names of elements
= any('data' %in% list_element_names)
has_data_names
if (has_data_names){
message(' element has data')
else {
} message(' need to recurse')
}
# run function again
find_data(list_element)
}
}
This almost works.
|>
mock_list find_data()
## a
## need to recurse
## aa
## element has data
## data
## need to recurse
## data
## need to recurse
## data
## need to recurse
## bb
## element has data
## data
## need to recurse
## data
## need to recurse
## b
## need to recurse
## ba
## element has data
## data
## need to recurse
## bb
## element has data
## data
## need to recurse
## data
## need to recurse
Reason why it ‘almost’ works is that we go too far into the list.
a # mock_list$a
need to recurse # does not have 'data' elements
aa # mock_list$a$aa
element has data # has 'data' elements ... we want to return data here
data # BUT our function continues for each child element
need to recurse
data # here ..
need to recurse
data # and here ...
need to recurse
ab # We should conintue here
element has data
...
Moving the recursion position
We need to only recurse when we do not find ‘data’.
<- function(list) {
find_data
<- names(list)
list_names
for (list_name in list_names) {
<- list[[list_name]]
list_element
# do something
message(list_name)
# examine list element names
<- names(list_element)
list_element_names
# do something depending on 'data' in names of elements
= any('data' %in% list_element_names)
has_data_names
if (has_data_names){
message(' element has data')
else {
} message(' need to recurse')
# run function when we need to recurse
find_data(list_element)
}
}
}
This looks like what we’re after. But we are only printing to the console right now.
|>
mock_list find_data()
## a
## need to recurse
## aa
## element has data
## bb
## element has data
## b
## need to recurse
## ba
## element has data
## bb
## element has data
Returning data
Now the mind boggling part - what, when and where do our outputs go?
A list object is a good choice, as we can build them up using named elements. We then allocate data to named elements in it both when the function is returning data and when the recursion function is returning data.
I only arrived at this solution through trial and error !!
<- function(list) {
find_data
# Output list to save results
<- list()
output
<- names(list)
list_names
for (list_name in list_names) {
<- list[[list_name]]
list_element
# examine list element names
<- names(list_element)
list_element_names
# do something depending on 'data' in names of elements
= any('data' %in% list_element_names)
has_data_names
if (has_data_names){
# return the list element with data
<- list_element
output[[list_name]]
else {
}
# run function when we need to recurse
<- find_data(list_element)
output[[list_name]]
}
}
# and return the final output
output
}
Bingo! A named list that records the path to data through the names of the list e.g. a$aa
|>
mock_list find_data()
## $a
## $a$aa
## $a$aa$data
## [1] 1
##
## $a$aa$data
## [1] 2
##
## $a$aa$data
## [1] 3
##
##
## $a$bb
## $a$bb$data
## [1] 4
##
## $a$bb$data
## [1] 5
##
##
##
## $b
## $b$ba
## $b$ba$data
## [1] 6
##
##
## $b$bb
## $b$bb$data
## [1] 7
##
## $b$bb$data
## [1] 8
<-
results |>
mock_list find_data()
$a$aa results
## $data
## [1] 1
##
## $data
## [1] 2
##
## $data
## [1] 3
$b$ba results
## $data
## [1] 6
Conclusion
Here we wrote a recursive function to search through a list and return found objects into another list. The logic appeared relatively easily, but at the same time, how the final object is build is slightly perplexing as it seems like we were referencing the same thing - magic nonetheless.