[Haskell-beginners] Question about lazy evaluation
Zhi-Qiang Lei
zhiqiang.lei at gmail.com
Mon Sep 12 09:05:48 CEST 2011
On Sep 9, 2011, at 1:36 PM, Brandon Allbery wrote:
> On Thu, Sep 8, 2011 at 23:41, Zhi-Qiang Lei <zhiqiang.lei at gmail.com> wrote:
> But the final result is "foobar", so when you said it has completely forgotten about the 'f', were you just meaning in or after that special recursion step, and the 'f' is stored in somewhere else? What is the difference here between Haskell and other imperative language? Thanks.
>
> The 'f' has been passed up to the caller, and is no longer known within *this* computation. Assuming something else isn't holding onto a reference to the computation this one was passed as a parameter, the "cell" that bound the 'f' to the rest of it can be garbage collected immediately and the 'f' will be garbage collected whenever the caller is done with it.
>
> Maybe the part you're missing is that it's effectively incremental. Instead of computing and returning the whole thing, it's done one element at a time and (unless it is forced to be strict somehow) the whole "foobar" isn't ever created as such. (Unless the caller simply re-concatenates the individual elements.) Instead of returning the whole thing, any computation "yield"s (see Python, Ruby, or Icon "generators") the current element and holds on to the rest for the next time it's invoked. So flow of control is not linear; it follows the exact datum needed at any given step and does that and no more.
>
> Strict: compute "foobar" and return to caller
>
> Lazy: Each line below is a subsequent return of control from the caller.
>
> 1. Compute ('f' : ('o' : ("o" ++ "bar"))), yield 'f' to caller, holding onto the tail
> 2. Compute ('o' : ("o" ++ "bar"")), yield 'o'
> The outermost computation is now the (++), so we do that first. The standard expansion of (++) is
>
> > [] ++ ys = ys
> > (x:xs) ++ ys = x : (xs ++ ys)
>
> This is designed to be lazy: the first case just yields up the second parameter to be processed directly, since nothing else can be done with it; the second case picks the first parameter apart one element at a time, yielding each one, then ultimately falls into the first case once the first parameter has been emptied. So:
>
> 3. Compute ('o' : ("" ++ "bar")) by second case of (++), yield 'o'
> 4. Compute "bar" by first case of (++), yield 'b', holding the tail ('a' : 'r' : [])
> 5. Compute ('a' : ('r' : [])), yield 'a'
> 6. Compute ('r' : []), yield 'r'
> 7. Return []
>
> Getting technical: everything (except unboxed values, which you won't see in normal Haskell) is represented as a "thunk", a chunk of code ("computation") which yields up values each time it is invoked until it is exhausted. Values are yielded by evaluating to "weak head normal form", that is, until we reach a data constructor tag. In this case, the constructor for a character constant internally looks something like (C# 'f'#), where C# is the constructor and 'f'# is a raw ("unboxed") Unicode code point stored as a 32-bit machine word. (The (:) is a constructor as well.) This is how we evaluate "through" the (++) to produce a value in step 4; since (++) doesn't produce something in WHNF (when it "returns" ys, it is returning an unevaluated thunk that represents the second parameter), evaluation continues inward until it does (on the next low-level evaluation step, when that anonymous function is invoked).
>
> In normal Haskell, everything is "boxed" (wrapped in a constructor), which is what enables laziness; almost always, what's passed around isn't a value but an anonymous function which computes that value. The stuff that can do the final step of reaching inside a constructor to get at the actual value is hidden inside the runtime code. Much of it is tied to the IO monad, but not all: (+) for a primitive data type such as Int is strict, and reaches inside the constructors of its parameters when evaluated.
>
> Sometimes you want lazy addition, though; often enough that there are a number of packages on Hackage that implement Peano numbers in the type system, aka "type-level naturals". I'll go through one of the simpler implementations this as another example of laziness:
>
> > data Nat = Z | S Nat
> >
> > (+') :: Nat -> Nat -> Nat
> > Z +' ys = ys -- compare to [] case for (++) earlier
> > (S xs) +' ys = x +' S ys -- compare to the other case
>
> The first parameter to (+') is always evaluated to WHNF (because we're pattern matching on it, which is strict); this makes it either Z (zero) or (S x) for an x that is not itself evaluated yet. Beyond that, nothing is evaluated; the result has both x and y unevaluated, and simply moves a constructor inward. Comparing to (++) above, we have (S xs) here vs. (x : xs) above (this is sometimes referred to as a "functional list" since it's a list built via function/constructor application instead of via the list constructor (:)). The other difference is that there's only one non-"empty list" value in the Peano numbers (S) whereas a general list can contain any boxed value (x).
>
> (There is a temptation to "optimize" the above definition by adding in between the two cases
>
> > xs +' Z = xs
>
> This is *not* an optimization in a lazy language! It's forcing the second parameter to be evaluated to WHNF (to see if it's Z or (S x)) when it needn't be; this ends up making (+') strict instead of lazy, and the result is just a painfully slow version of the strict (+). It may be (at least when the full case is expensive) an optimization in a strict language, but in a lazy language you want to avoid these kinds of unnecessary tests; lazy incremental evaluation means the runtime may never need to evaluate the second parameter at all, so why force it?)
>
> I hope this makes things a little clearer.
>
> --
> brandon s allbery allbery.b at gmail.com
> wandering unix systems administrator (available) (412) 475-9364 vm/sms
>
That's much clearer. I appreciate everyone in this thread.
Best regards,
Zhi-Qiang Lei
zhiqiang.lei at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/beginners/attachments/20110912/2d113f6e/attachment.htm>
More information about the Beginners
mailing list