Picture of stu

Living Lazy, Without Variables

Programmers coming to functional languages for the first time cannot imagine life without variables. I address this head-on in the Clojure book. In Section 2.7 (free download here), I port an imperative method from the Apache Commons Lang to Clojure. First the Java version:

// From Apache Commons Lang, http://commons.apache.org/lang/
public static int indexOfAny(String str, char[] searchChars) {
  if (isEmpty(str) || ArrayUtils.isEmpty(searchChars)) {
      return -1;
  }
  for (int i = 0; i < str.length(); i++) {
      char ch = str.charAt(i);
      for (int j = 0; j < searchChars.length; j++) {
        if (searchChars[j] == ch) {
            return i;
        } 
      }
  }
  return -1;
}

And now the Clojure code. I have shown the supporting function indexed as well:

(defn indexed [s] (map vector (iterate inc 0) s))
(defn index-of-any [s chars]
  (some (fn [[idx char]] (if (get chars char) idx)) 
          (indexed s)))

There are many things I like about the Clojure version, but I want to focus on something I didn't mention already in the book. A reader thought the Clojure version did too much work:

...the [Java] version can be seen as *more efficient* when a match is found because scanning stops right there, whereas "indexed" constructs the whole list of pairs, regardless of whether or not a match WILL be found....

The reader's assumption is reasonable, but incorrect. Clojure's sequence library functions are generally lazy. So the call to indexed is really just a promise to generate indexes if they are actually needed.

To see this, create a logging-seq that writes to stdout every time it actually yields an element:

(defn logging-seq [s]
  (if s
    (do (println "Iterating over " (first s))
    (lazy-cons (first s) (logging-seq (rest s))))))

Now, you can add logging-seq to indexed so that each element of indexed is of the form [index, element, logged-element].

(defn indexed [s] (map vector (iterate inc 0) s (logging-seq s)))

Test the modified indexed function at the Clojure REPL:

user=> (indexed "foo")
Iterating over  f
(Iterating over  o
[0 \f \f] Iterating over  o
[1 \o \o] [2 \o \o])

As you can see, the indexed sequence is only produced as needed. (At the REPL it is needed to print the return value.)

Finally, you can test indexed-of-any and see that Clojure only produces enough of the sequence to get an answer. For a match on the first character, it only goes to the first character:

(index-of-any "foo" #{\f})
Iterating over  f
0

If there is no match, index-of-any has to traverse the entire string:

(index-of-any "foo" #{\z})
Iterating over  f
Iterating over  o
Iterating over  o
nil

So give up on those variables, and live lazy!

Comments
  1. PhilDecember 01, 2008 @ 05:09 PM

    Good stuff; thanks for posting this.

    I was really confused initially by the way the REPL interleaves println output with the return value; REPLs in every other language wait till the list is complete before outputting anything.

    Also, it looks like the indentation in logging-seq is not quite right; it makes it seem that the call to do only wraps the println call instead of the whole remainder of the function body.

  2. Howard M. Lewis ShipDecember 01, 2008 @ 06:41 PM

    Laziness is so important! It was one of the big features that drew me to Haskell (though I never made a much progress there as I have in Clojure). I was initially disappointed that Clojure was not inherently lazy, the way Haskell is … but since nearly everything you do in Clojure is operations on collections, and all those operations are fully lazy, the end result is the same. The fastest code is the code that never executes!

  3. VulpyneDecember 01, 2008 @ 09:45 PM

    And in Haskell:

    indexOfAny :: String -> String -> Int indexOfAny s cs = case take 1 $ dropWhile (\(c,) -> c `notElem` cs) $ zip s [0..] of { [] -> -1; ((,p):_) -> p }

  4. Daniel SpiewakDecember 02, 2008 @ 08:37 AM

    I love lazy collections. Actually, I love pervasive laziness alla-Haskell much more, but I’ll settle for what I can get. :-) It’s a shame that Scala can’t put more emphasis on this aspect of FP… (it’s design precludes it)

    It’s worth noting that even if Clojure’s collections weren’t lazy, the functional version would still be mathematically equivalent in worst-case performance to the imperative version. The imperative version makes the optimization assumption that the element will be found early in the collection. However, with truly random data sets and elements which are in the collection, this only cuts the average runtime in half. In other words, we have:

    Functional: O(n) \theta(n) Imperative: O(n) \theta(n / 2)

    The trick is that \theta(n / 2) is actually on the same order as \theta(n), meaning that the performance difference is negligible.

    Of course, FP does impose overhead in other areas, such as an insane number of short-lived objects; but that’s a mind-bogglingly low-cost imposition on modern HotSpot, so we don’t need to worry about it. The bigger issue is that FP requires a lot of anonymous functions/lambdas/closures/thingies. Calling an anonymous function has a bit more overhead than a conventional function, and certainly creating this function in the first place has a great deal of overhead.

    On the whole, FP does tend to be a bit slower than imperative programming, but not because it is missing break/continue.

  5. FalconNLDecember 02, 2008 @ 08:44 AM

    Shorter Haskell version:

    import Maybe

    indexOfAny s cs = maybe (-1) snd . listToMaybe . filter (flip elem cs . fst) $ zip s [0..]

    Even shorter Haskell version:

    import Data.List

    indexOfAny s cs = maybe (-1) id $ findIndex (flip elem cs) s

  6. kunnarDecember 02, 2008 @ 09:41 PM

    great post!

    @Vulpyne i figured out clojure example, but this haskell thing seems to be just random pile of chars :)

  7. Phil BrassDecember 02, 2008 @ 09:42 PM

    As awesome as Haskell’s pure lazy efficient self is, I can’t help thinking its got a very good chance of taking the “write-only” crown from Perl. I just need to keep studying it until I get it, I suppose…

  8. James IryDecember 03, 2008 @ 12:58 AM

    Be careful about saying “no variables.” You certainly do have variables in there. Just like in math where I can say that in the linear function “f(x) = 2x + 1” x is a variable. It’s a variable because…well…it varies. The fact that, unlike in many programming languages, the symbol is not bound to a mutable cell that can be changed imperatively doesn’t make it any less a variable.

    The word variable is in fact used to in describing lambda calculus and Haskell 98, neither of which has a concept of mutation.

  9. Ricky ClarksonDecember 04, 2008 @ 12:08 PM

    C#’s System.Linq is completely lazy. Scala has lots of support for laziness.

  10. Itay MamanDecember 07, 2008 @ 05:34 PM

    I like this example which nails the differences between imperative and functional right to the point. Good job Stuart. I know how hard it is to find great examples.