Zip will be a familiar operator to any readers who use Python. It was introduced in .NET 4 - it's not entirely clear why it wasn't part of the first release of LINQ, to be honest. Perhaps no-one thought of it as a useful operator until it was too late in the release cycle, or perhaps implementing it in the other providers (e.g. LINQ to SQL) took too long. Eric Lippert blogged about it in 2009, and I find it interesting to note that aside from braces, layout and names we've got exactly the same code. (I read the post at the time of course, but implemented it tonight without looking back at what Eric had done.) It's not exactly surprising, given how trivial the implementation is. Anyway, enough chit-chat...
Zip has a single signature, which isn't terribly complicated:
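public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(
    this IEnumerable<TFirst> first,
    IEnumerable<TSecond> second,
    Func<TFirst, TSecond, TResult> resultSelector)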
Just from the signature, the name, and experience from the rest of this blog series it should be easy enough to guess what Zip does:
- It uses deferred execution, not reading from either sequence until the result sequence is read
- All three parameters must be non-null; this is validated eagerly
- Both sequences are iterated over "at the same time": it calls GetEnumerator() on each sequence, then moves each iterator forward, then reads from it, and repeats.
- The result selector is applied to each pair of items obtained in this way, and the result yielded
- It stops when either sequence terminates
- As a natural consequence of how the sequences are read, we don't need to perform any buffering: we only care about one element from each sequence at a time.

There are really only two things that I could see might have been designed differently:

- It could have just returned IEnumerable

I don't have any problem with the design that's been chosen here though.
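As a rough illustration of the behaviour described above, here's a small sketch using the framework's own Enumerable.Zip. The ZipBehaviourDemo class, the Counting helper and the ItemsRead counter are names I've made up purely for this demonstration; they aren't part of LINQ or of the code in this series.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipBehaviourDemo
{
    // Hypothetical counter, used only to observe when the source is read.
    public static int ItemsRead;

    // Hypothetical helper: yields 1..count, counting each element it produces.
    public static IEnumerable<int> Counting(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            ItemsRead++;
            yield return i;
        }
    }

    public static void Main()
    {
        string[] letters = { "a", "b", "c" };

        // Deferred execution: building the query reads nothing from Counting.
        IEnumerable<string> query = Counting(5).Zip(letters, (n, s) => n + s);
        Console.WriteLine(ItemsRead); // 0

        // Iteration stops as soon as the shorter sequence (letters) runs out.
        Console.WriteLine(string.Join(",", query)); // 1a,2b,3c

        // Eager validation: the exception is thrown by the call to Zip itself,
        // before anything iterates over the result.
        IEnumerable<string> missing = null;
        try
        {
            Counting(5).Zip(missing, (n, s) => n + s);
            Console.WriteLine("no exception");
        }
        catch (ArgumentNullException)
        {
            Console.WriteLine("threw immediately");
        }
    }
}
```

The last part is the interesting one: we never iterate over the result of the second Zip call, but the exception is thrown anyway.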
There are no really interesting test cases here. We test argument validation, deferred execution, and the obvious "normal" cases. I do have tests where "first" is longer than "second" and vice versa.
The one test case which is noteworthy isn't really present for the sake of testing at all - it's to demonstrate a technique which can occasionally be handy. Sometimes we really want to perform a projection on adjacent pairs of elements. Unfortunately there's no LINQ operator to do this naturally (although it's easy to write one) but Zip can provide a workaround, so long as we don't mind evaluating the sequence twice. (That could be a problem in some cases, but is fine in others.)
Obviously if you just zip a sequence with itself directly you get each element paired with the same one. We effectively need to "shift" or "delay" one sequence somehow. We can do this using Skip, as shown in this test:
[Test]
public void AdjacentElements()
{
    string[] elements = { "a", "b", "c", "d", "e" };
    var query = elements.Zip(elements.Skip(1), (x, y) => x + y);
    query.AssertSequenceEqual("ab", "bc", "cd", "de");
}
It always takes me a little while to work out whether I want to make first skip or second. Wanting the second element of the original sequence as the first element of second (try getting that right ten times in a row - it makes sense, honest!) means that we want to call Skip on the sequence used as the argument for second. Obviously it would work the other way round too - we'd just get the pairs presented with the values switched, so the results of the query above would be "ba", "cb" etc.
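For the "it's easy to write one" claim about a dedicated adjacent-pairs operator, here's a minimal sketch of what one might look like. SelectAdjacent is a hypothetical name of my own choosing, not a real LINQ operator; the point is just that it only evaluates the source once, which the Zip/Skip workaround can't do.

```csharp
using System;
using System.Collections.Generic;

public static class AdjacentExtensions
{
    // Hypothetical operator (not part of LINQ): projects each adjacent pair
    // of elements, reading the source sequence only once.
    public static IEnumerable<TResult> SelectAdjacent<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, TSource, TResult> selector)
    {
        // Eager argument validation, split from the iterator block below.
        if (source == null) throw new ArgumentNullException("source");
        if (selector == null) throw new ArgumentNullException("selector");
        return SelectAdjacentImpl(source, selector);
    }

    private static IEnumerable<TResult> SelectAdjacentImpl<TSource, TResult>(
        IEnumerable<TSource> source,
        Func<TSource, TSource, TResult> selector)
    {
        using (IEnumerator<TSource> iterator = source.GetEnumerator())
        {
            // An empty sequence has no pairs at all.
            if (!iterator.MoveNext())
            {
                yield break;
            }
            TSource previous = iterator.Current;
            while (iterator.MoveNext())
            {
                TSource current = iterator.Current;
                yield return selector(previous, current);
                previous = current;
            }
        }
    }
}
```

With this in place, calling elements.SelectAdjacent((x, y) => x + y) on the array from the test should give the same "ab", "bc", "cd", "de" results. The cost is that we have to remember the previous element ourselves, which is exactly the bookkeeping that Zip with Skip lets us avoid.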
Guess what? It's yet another operator with a split implementation between the argument validation and the "real work". I'll skip argument validation, and get into the tricky stuff. Are you ready? Sure you don't want another coffee?
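private static IEnumerable<TResult> ZipImpl<TFirst, TSecond, TResult>(
    IEnumerable<TFirst> first,
    IEnumerable<TSecond> second,
    Func<TFirst, TSecond, TResult> resultSelector)
{
    using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
    using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
    {
        while (iterator1.MoveNext() && iterator2.MoveNext())
        {
            yield return resultSelector(iterator1.Current, iterator2.Current);
        }
    }
}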
Okay, so possibly "tricky stuff" was a bit of an overstatement. Just about the only things to note are:
- I've "stacked" the using statements instead of putting the inner one in braces and indenting it. For using statements with different variable types, this is one way to keep things readable, although it can be a pain when tools try to reformat the code. (Also, I don't usually omit optional braces like this. It does make me feel a bit dirty.)
- I've used the "symmetric" approach again instead of a using statement with a foreach loop inside it. That wouldn't be hard to do, but it wouldn't be as simple.

That's just about it. The code does exactly what it looks like, which doesn't make for a very interesting blog post, but does make for good readability.
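For comparison, here's roughly what the asymmetric alternative (a using statement with a foreach loop inside it) might look like. This is only a sketch: ZipWithForeach and the ZipForeachSketch class are hypothetical names, and argument validation is left out just as it is in the implementation above.

```csharp
using System;
using System.Collections.Generic;

public static class ZipForeachSketch
{
    // Hypothetical name; no argument validation, to keep the sketch short.
    public static IEnumerable<TResult> ZipWithForeach<TFirst, TSecond, TResult>(
        IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> resultSelector)
    {
        // The second sequence is iterated manually...
        using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
        {
            // ...while foreach handles the iterator for the first sequence.
            foreach (TFirst item1 in first)
            {
                if (!iterator2.MoveNext())
                {
                    yield break;
                }
                yield return resultSelector(item1, iterator2.Current);
            }
        }
    }
}
```

The results are the same, but the two sequences are now treated differently. There's also a subtle behavioural change: in this version GetEnumerator() is called on second before first. Swapping the roles of the two sequences would fix that, but then MoveNext() would be called on second before first instead. The symmetric version avoids having to make that choice.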
Two operators to go, one of which I might not even tackle fully (AsQueryable - it is part of Queryable rather than Enumerable, after all).
AsEnumerable should be pretty easy...

