Patterns of fallible iteration

Since Rust is a language without exceptions, errors are signaled by returning a Result: an enum that denotes either the function’s successful return value, or information about the error. This can seem like it requires a lot of error-handling boilerplate, but for most code this is in fact not the case. Rust’s powerful type system, the near-ubiquitous error-returning convention, and the ? operator combine the convenience of exceptions with the safety of explicit return values. However, there are situations when error handling is buried inside callbacks and propagating the error takes a bit of effort. One such situation is when processing iterators that can return errors, where the ? error-handling operator cannot be used in the usual way.

This article will present a fallible iterator and show several ways to handle its errors without resorting to panic.

Explicit loop vs iterator

Reading a file is a typical example of fallible iteration. While the importance of handling errors when opening a file is widely acknowledged because the file can be missing or unreadable, errors while reading the file happen comparatively rarely and are often converted to panic with unwrap(). But such errors can and do occur, for example when the file is behind a network file system, or when it is backed by a pipe or hardware device, and should be handled cleanly. For demonstration purposes we’ll use a simple function that reads a file line by line, parses each line as an integer, and returns the maximum value. To indicate the possibility of failure, the function will return Result<u64, io::Error>, written more shortly as io::Result<u64>:

fn max_line(file_name: &str) -> io::Result<u64> {
    // ...
}

This signature gives the callers freedom to decide how to handle the errors indicated by max_line – they can unwrap() the return value to panic in case of error, they can match on the result and handle the error variant, or they can use ? to propagate the error, if it occurs, to their caller. Here is a straightforward implementation of max_line, itself making copious use of the strange-looking ? operator:

use std::fs::File;
use std::io::{self, BufRead, BufReader};

fn max_line(file_name: &str) -> io::Result<u64> {
    let file = File::open(file_name)?;
    let mut max = 0;
    for line_result in BufReader::new(file).lines() {
        let line = line_result?;
        let n = line
            .parse::<u64>()
            .map_err(|e| io::Error::new(io::ErrorKind::Other, e.to_string()))?;
        if n > max {
            max = n;
        }
    }
    Ok(max)
}

? is a postfix operator best described as use-or-return. If x is Result<T, SomeError>, the type of the expression x? will be T. When x is Ok(val), x? will evaluate to val, and when x is Err(e), the operator will just return the error result from the function. Applying the ? operator to Result actually requires the function to be declared as returning Result with a compatible error variant as return value, and fails to compile if this is not the case. BufRead::lines() provides lines as Results, so the expression let line = line_result? desugars to something like:

let line = match line_result {
    Ok(s) => s,               // a string, use it
    Err(e) => return Err(e),  // an error, return it
};

A minor complication arises from FromStr::parse for u64 not returning an io::Error, but a ParseIntError. Since we don’t particularly care about distinguishing between these two kinds of error condition, we use map_err() to unceremoniously convert ParseIntError into an io::Error, retaining the error message.
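To make the conversion concrete, here is a small sketch that isolates it in a helper. The name parse_u64 is made up for this illustration and is not part of max_line; the point is only that the message of the ParseIntError survives inside the io::Error.

```rust
use std::io;

// Hypothetical helper (not from the article): parse a string as u64,
// reporting a parse failure as an io::Error.
fn parse_u64(s: &str) -> io::Result<u64> {
    s.parse::<u64>()
        // ParseIntError -> io::Error, keeping only the human-readable message
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e.to_string()))
}
```

The original ParseIntError value is gone after the conversion; only its text remains, which is acceptable here because the callers are not expected to tell the two kinds of failure apart.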

Readers experienced in idiomatic Rust will notice that the above function calculates the maximum using a for loop and a mutable accumulator, effectively reimplementing Iterator::max. Since BufRead::lines() already returns an iterator, it seems like an obvious improvement to combine it with map() and max(), something like this:

// XXX does not compile
fn max_line(file_name: &str) -> io::Result<u64> {
    let file = File::open(file_name)?;
    BufReader::new(file)
        .lines()
        .map(|line_result| {
            let line = line_result?;
            let n = line.parse::<u64>()
                .map_err(|e| io::Error::new(io::ErrorKind::Other, e.to_string()))?;
            Ok(n)
        })
        .max()
        // unwrap_or() because max() returns an Option which is None 
        // when the iterator is empty
        .unwrap_or(Ok(0))
}

Not surprisingly, this fails to compile. The compiler complains that “the trait std::cmp::Ord is not implemented for std::io::Error”, which is Rust’s way of saying that Iterator::max() has no idea how to compare the io::Result<u64> values the closure passed to map() is producing. But the problem lies deeper than a simple inability to compare. In the original for loop the ? operator could return from the max_line function, effectively aborting the iteration in case of error. In the iterator version, the parsing happens inside a closure, which takes away the power of ? to abort max_line(). In case of error, ? used in the closure returns an error result from the closure, but the iteration continues.
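The last point is easy to observe on in-memory data. The following sketch uses a made-up function, parse_all, that applies ? inside a map() closure and collects every result; an invalid item in the middle does not prevent the items after it from being processed.

```rust
use std::num::ParseIntError;

// Hypothetical demo function: returns one Result per input string.
fn parse_all(inputs: &[&str]) -> Vec<Result<u64, ParseIntError>> {
    inputs
        .iter()
        .map(|s| -> Result<u64, ParseIntError> {
            // `?` returns from this closure only, not from parse_all
            let n = s.parse::<u64>()?;
            Ok(n)
        })
        .collect()
}
```

Calling parse_all(&["1", "x", "3"]) yields three results: the second one is an error, and the third one is still parsed.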

Stopping iteration on error will require a different approach, with several options depending on what we need to do with the items produced by the iterator.

Collecting items into container – collect

If you need to collect items from a fallible iterator, Rust offers an elegant idiom for aborting the iteration on error. Instead of normally collecting into a container, you can collect into Result<Container>. The effect will be exactly what we are after: collect() will return a Result that will be Ok(container) if all the values were Ok, and Err(e) if an error was encountered, in which case the iteration will have stopped after consuming the error. Applied to max_line, it would look like this:

// correct, but uses an intermediate vector
fn max_line(file_name: &str) -> io::Result<u64> {
    let file = File::open(file_name)?;
    let numbers_result: io::Result<Vec<u64>> = BufReader::new(file)
        .lines()
        .map(parse_line)
        .collect();
    let max = numbers_result?.into_iter().max().unwrap_or(0);
    Ok(max)
}

This performs the processing in two steps: first, the numbers are collected into a result that contains either a vector of numbers, or an error. We use ? to grab the vector or return the error and, provided there was no error, proceed to find the maximum.
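The short-circuiting behavior of collect() can be checked without touching the file system. This is only a sketch: collect_numbers is a name invented for the illustration, and the Cell counter exists solely to show how many items were pulled from the underlying iterator.

```rust
use std::cell::Cell;
use std::num::ParseIntError;

// Hypothetical demo function: returns the collected result together with
// the number of items the map() closure was actually called on.
fn collect_numbers(inputs: &[&str]) -> (Result<Vec<u64>, ParseIntError>, usize) {
    let seen = Cell::new(0);
    let result: Result<Vec<u64>, ParseIntError> = inputs
        .iter()
        .map(|s| {
            seen.set(seen.get() + 1); // count every item the closure sees
            s.parse::<u64>()
        })
        .collect(); // stops pulling items at the first Err
    (result, seen.get())
}
```

For the input ["1", "x", "3"] the result is an error and the counter stops at two, so the third item is never looked at.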

Side note: since the closure passed to map will remain unchanged in subsequent examples, we moved it to a separate helper function:

fn parse_line(line_result: io::Result<String>) -> io::Result<u64> {
    let line = line_result?;
    let n = line
        .parse::<u64>()
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e.to_string()))?;
    Ok(n)
}

The above max_line has the same error-handling behavior as the original for loop, but at the cost of allocating a potentially huge vector of numbers only to find the maximum. If you need the items in the vector for other purposes, that’s the approach you want to take. If you need to process the items in a streaming fashion, read on.

Consuming items – try_fold

max_line() is privileged to control the iteration from start to end: it sets up an iterator and consumes it, either with the for loop or letting a folding function like max() do it. Other than collect() discussed above, Iterator provides two additional consuming methods that stop on error: try_fold and try_for_each.

While a try_max() doesn’t exist, it is easy to emulate it because we can replace max() with fold(0, |max, n| std::cmp::max(max, n)). Likewise, try_max() is neatly expressible in terms of try_fold:

fn max_line(file_name: &str) -> io::Result<u64> {
    let file = File::open(file_name)?;
    BufReader::new(file)
        .lines()
        .map(parse_line)
        .try_fold(0u64, |max, n_result| Ok(std::cmp::max(max, n_result?)))
}

Not quite as pretty as try_max() would have been, but works exactly as the original for loop, and operates on the iterator. We don’t even need to use the ? operator for the result because max_line() returns an io::Result<u64> and that’s exactly what try_fold() gives us. If you can use this in your code, that’s almost certainly what you want.
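For comparison, here is roughly how the same function might look with try_for_each and a mutable accumulator. This is a sketch rather than a recommendation: max_line_from is a hypothetical variant that accepts any BufRead instead of a file name, so that it can be tried on in-memory data.

```rust
use std::io::{self, BufRead};

// Same helper as in the article, repeated to keep the sketch self-contained.
fn parse_line(line_result: io::Result<String>) -> io::Result<u64> {
    let line = line_result?;
    let n = line
        .parse::<u64>()
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e.to_string()))?;
    Ok(n)
}

// Hypothetical variant of max_line that reads from any BufRead.
fn max_line_from<R: BufRead>(reader: R) -> io::Result<u64> {
    let mut max = 0u64;
    reader
        .lines()
        .map(parse_line)
        .try_for_each(|n_result| -> io::Result<()> {
            // `?` leaves the closure with Err, which makes try_for_each stop
            max = std::cmp::max(max, n_result?);
            Ok(())
        })?;
    Ok(max)
}
```

A byte slice implements BufRead, so max_line_from("3\n17\n5\n".as_bytes()) can be used to exercise the function. The try_fold version is shorter because the accumulator is threaded through the closure instead of being captured.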

The limitation of this approach is that it requires that you control how the iterator is consumed. If instead of max() we needed to invoke a function that accepts an iterator provided by a third party, we would have problems. Not every function that consumes an iterator can be easily replaced by a home-grown version based on try_fold or try_for_each. For example, the itertools crate contains a number of useful iterator adapters to solve real-world problems, including itertools::merge_by. We could use it to efficiently merge two sorted streams of records into a single stream, maintaining the sort order:

let iter1 = reader1.lines().map(Record::from_string);
let iter2 = reader2.lines().map(Record::from_string);
let mut out = Writer::new(...);
// XXX where to put try_fold or try_for_each?
for rec in itertools::merge_by(iter1, iter2, comparator) {
    out.write(rec.to_string())?;
}

Handling errors reported by iter1 and iter2 is not a simple matter of using try_fold or try_for_each because the record iterators are consumed by itertools::merge_by. As of this writing there is no try_merge_by in itertools, and it’s not clear there should be one, because adding try_merge_by would imply adding a fallible version of other adapters, eventually doubling the API surface of the module and the trait. There has to be a better way.

Stop-at-error iterator adapter – scan

Stopping at an error is really a special case of stopping the iteration on an arbitrary condition – something Iterator easily handles with take_while. It would seem that adapting an iterator to stop on error is as simple as tacking take_while(Result::is_ok) onto it, possibly adding a map to extract the values. The result looks like this:

// XXX
fn max_line(file_name: &str) -> io::Result<u64> {
    let file = File::open(file_name)?;
    let max = BufReader::new(file)
        .lines()
        .map(parse_line)
        .take_while(|n_result| match n_result {
            Ok(_) => true, // keep going
            Err(e) => {
                // XXX e is &io::Error
                false      // stop
            }
        })
        .map(Result::unwrap)
        .max()
        .unwrap_or(0);
    Ok(max)
}

This is closer to what we want, but not quite there yet. Iteration stops on error, but we don’t have the error value, nor any indication whether an error occurred. Since take_while() passes its callback a shared reference to the item, the closure cannot move the error to a captured local variable – the compiler would complain of a “move out of shared reference”. Also, take_while is a filter and not a transformer, so it requires the unsightly .map(Result::unwrap) making it look like the program might panic on error, when it will in fact correctly stop iterating.

We can address all the issues with take_while() by switching to its big brother Iterator::scan(). Like take_while(), scan() supports stopping the iteration, but it also allows transforming the value and storing intermediate state. We’ll use all those features, ending up with this:

fn max_line(file_name: &str) -> io::Result<u64> {
    let file = File::open(file_name)?;
    let mut err = Ok(());
    let max = BufReader::new(file)
        .lines()
        .map(parse_line)
        .scan(&mut err, until_err)
        .max()
        .unwrap_or(0);
    err?;
    Ok(max)
}

There are several new things going on, so let’s go through them one by one.

First, we initialize err to a Result which we’ll use as the place to store the error if one occurs. Its initial value is a placeholder – we could have used a type like Option<io::Error> and initialized it to None, but it’s nicer to use a Result because it will then work with ? in our function. As err serves only to detect the error, it has no meaningful Ok variant, and we initialize it with a unit value.

Second, after the unchanged map(parse_line), we chain the iterator to the scan() adapter, passing it a utility function which we’ll show shortly. The function passed to scan() provides the values that will be returned from the iterator’s next(), which means it stops the iteration by returning None. And that’s precisely what until_err does: it returns Some(item) if the result is Ok(item), and otherwise stores the error result into the provided reference and returns None.

Third, since max() isn’t privy to our setup, it will return the max of the numbers it gets from the iterator. If there was no error, that covers the whole file; in case of error, we’ll get the maximum of the lines up to the first error. (Which is equivalent to what the max variable would have contained in case of error in our original loop implementation.) So after the call to max(), we must remember to check whether the iteration was prematurely aborted, and if so, return the error. err? does that and is simply a shorthand for if let Err(e) = err { return Err(e) }. Finally, once we’ve gotten the errors out of the way and have proven that max() has observed all lines from the file, we can return it from the function.

The until_err helper function looks like this:

fn until_err<T, E>(err: &mut &mut Result<(), E>, item: Result<T, E>) -> Option<T> {
    match item {
        Ok(item) => Some(item),
        Err(e) => {
            **err = Err(e);
            None
        }
    }
}

The signature looks daunting because the function accepts a unique reference to the object passed to scan, which is itself a reference to our local variable. (This signature allows passing temporary values to scan(), but we cannot do so because we actually need to use the err local variable as a side channel.) The logic of the function is straightforward and described above: a pattern match mapping Ok(item) to Some(item) and Err(e) to None, the latter stopping the iteration and storing e to make it available to the caller.
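The helper is independent of files and of io::Error, so the whole pattern can be tried on a plain vector of results. In this sketch max_until_err is a hypothetical name, and String stands in for a real error type.

```rust
// The helper from the article, repeated to keep the sketch self-contained.
fn until_err<T, E>(err: &mut &mut Result<(), E>, item: Result<T, E>) -> Option<T> {
    match item {
        Ok(item) => Some(item),
        Err(e) => {
            **err = Err(e);
            None
        }
    }
}

// Hypothetical demo function: maximum of the items, or the first error.
fn max_until_err(items: Vec<Result<u64, String>>) -> Result<u64, String> {
    let mut err = Ok(());
    let max = items
        .into_iter()
        .scan(&mut err, until_err) // yields plain u64 values, stops at Err
        .max()
        .unwrap_or(0);
    err?; // report the error recorded by until_err, if any
    Ok(max)
}
```

With the input [Ok(3), Err("boom"), Ok(99)] the function returns the "boom" error, and the 99 that follows it is never seen by max().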

To conclude, here is how scan() would apply to the code that uses itertools::merge_by:

let (mut err1, mut err2) = (Ok(()), Ok(()));
let iter1 = reader1.lines().map(Record::from_string).scan(&mut err1, until_err);
let iter2 = reader2.lines().map(Record::from_string).scan(&mut err2, until_err);
let mut out = Writer::new(...);
for rec in itertools::merge_by(iter1, iter2, comparator) {
    out.write(rec.to_string())?;
}
// The scan adapters hold mutable borrows of err1 and err2 while the merged
// iterator is alive, so the error slots can only be checked after the loop.
err1?;
err2?;

Once merge_by is set up correctly, the remaining for loop can be easily transformed to use try_for_each() (or a separate scan() and for_each() pair!) to handle the write errors:

itertools::merge_by(iter1, iter2, comparator)
    .try_for_each(|rec| -> io::Result<()> {
        out.write(rec.to_string())?;
        Ok(())
    })?;
// the merged iterator is gone, so err1 and err2 are no longer borrowed
err1?;
err2?;
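Since the snippets above depend on itertools and on Record and Writer types that are not shown, here is a self-contained sketch of the same shape that uses Iterator::zip from the standard library as a stand-in for the third-party adapter. The function sum_pairs and the String error type are assumptions made for the illustration. The error slots are checked only after the combined iterator has been consumed, because the scan adapters hold mutable borrows of them until then.

```rust
// The helper from the article, repeated to keep the sketch self-contained.
fn until_err<T, E>(err: &mut &mut Result<(), E>, item: Result<T, E>) -> Option<T> {
    match item {
        Ok(item) => Some(item),
        Err(e) => {
            **err = Err(e);
            None
        }
    }
}

// Hypothetical demo function: adds up corresponding items of two fallible
// streams. zip() knows nothing about errors; it just sees shorter iterators.
fn sum_pairs(
    a: Vec<Result<u64, String>>,
    b: Vec<Result<u64, String>>,
) -> Result<Vec<u64>, String> {
    let (mut err1, mut err2) = (Ok(()), Ok(()));
    let iter1 = a.into_iter().scan(&mut err1, until_err);
    let iter2 = b.into_iter().scan(&mut err2, until_err);
    let sums: Vec<u64> = iter1.zip(iter2).map(|(x, y)| x + y).collect();
    // The iterators have been consumed, so both error slots are free again.
    err1?;
    err2?;
    Ok(sums)
}
```

An error in either input ends the combined iteration early, and the function reports it instead of returning the partial sums.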

Summary

Fallible iteration is often omitted from introductory materials, but Rust does provide several idioms to handle errors in items:

  • If you need to collect the items, use collect::<Result<Container, _>>().
  • If you control how the iterator is consumed, use try_fold() or try_for_each() to stop at first error. Those two methods are even provided by Rayon’s parallel iterators.
  • Otherwise, use scan(&mut err, until_err) and use the iterator as usual. You’ll just need to live with the until_err helper function (which you can also write as a closure) and remember to check err for error after the iterator has been exhausted.
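As for writing until_err as a closure, one possible shape is sketched below. The closure captures the error slot directly, so scan() is given a dummy () state; the function name max_until_err_closure is made up for the illustration.

```rust
// Hypothetical demo function: the scan() pattern with an inline closure
// instead of the until_err helper.
fn max_until_err_closure<I, E>(items: I) -> Result<u64, E>
where
    I: IntoIterator<Item = Result<u64, E>>,
{
    let mut err: Result<(), E> = Ok(());
    let max = items
        .into_iter()
        .scan((), |_, item| match item {
            Ok(n) => Some(n), // pass the value through
            Err(e) => {
                err = Err(e); // remember the error in the captured variable
                None // and stop the iteration
            }
        })
        .max()
        .unwrap_or(0);
    err?;
    Ok(max)
}
```

The closure avoids the double reference in the signature of until_err, at the cost of repeating the match wherever the pattern is needed.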

Note: the original version of this article used Iterator::sum() as the method that consumes the iterator, but a reddit reader pointed out that sum() actually stops on first error automatically.
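This behavior of sum() comes from the standard library’s Sum implementation for Result, and it can be observed with a counter in the same way as for collect(). The function sum_numbers below is a made-up name used only for this sketch.

```rust
use std::cell::Cell;
use std::num::ParseIntError;

// Hypothetical demo function: returns the sum together with the number of
// items the map() closure was actually called on.
fn sum_numbers(inputs: &[&str]) -> (Result<u64, ParseIntError>, usize) {
    let seen = Cell::new(0);
    let total: Result<u64, ParseIntError> = inputs
        .iter()
        .map(|s| {
            seen.set(seen.get() + 1); // count every item the closure sees
            s.parse::<u64>()
        })
        .sum(); // summing into a Result stops at the first Err
    (total, seen.get())
}
```

For ["1", "x", "3"] the total is an error and only two items are consumed.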
