While implementing Edulinq, I only focused on two implementations: .NET 4.0 and Edulinq. However, I was aware that there were other implementations available, notably LinqBridge and the one which comes with Mono. Obviously it's interesting to see how other implementations behave, so I've now made a few changes in order to make the test code run in these different environments.
I'm using Mono 2.8 (I can't remember the minor version number offhand) but I tend to think of it as "Mono 3.5" or "Mono 4.0" depending on which runtime I'm using and which base libraries I'm compiling against, to correspond with the .NET versions. Both runtimes ship as part of Mono 2.8. I will use these version numbers for this post, and ask forgiveness for my lack of precision: whenever you see "Mono 3.5" please just think "Mono 2.8 running against the 2.0 runtime, possibly using some of the class libraries normally associated with .NET 3.5".
LinqBridge is a bit like Edulinq - a clean room implementation of LINQ to Objects, but built against .NET 2.0. It contains its own Func delegate declarations and its own version of ExtensionAttribute for extension methods. In my experience this makes it difficult to use with the "real" .NET 3.5, so my build targets .NET 2.0 when running against LinqBridge. This means that tests using HashSet had to be disabled. The version of LinqBridge I'm running against is 1.2 - the latest binary available on the web site. This has AsEnumerable as a plain static method rather than an extension method; the code has been fixed in source control, but I wanted to run against a prebuilt binary, so I've just disabled my own AsEnumerable tests for LinqBridge. Likewise the tests for Zip are disabled both for LinqBridge and the "Mono 3.5" tests as Zip was only introduced in .NET 4.
The other issue of not having .NET 4 available in the tests is that the string.Join
There are batch files under a new "testing" directory which will build and run:
- Microsoft's LINQ to Objects and Edulinq under .NET
- LinqBridge, Mono 3.5's LINQ to Objects and Edulinq under Mono 3.5
- Mono 4.0's LINQ to Objects and Edulinq under Mono 4.0
Although I have LinqBridge running under .NET 2.0 in Visual Studio, it's a bit of a pain building the tests from a batch file (at least without just calling msbuild). The failures running under Mono 3.5 are the same as those running under .NET 2.0 as far as I can tell, so I'm not too worried.
Note that while I have built the Mono tests under both the 3.5 and 4.0 profiles, the results were the same apart from differences caused by generic variance, so I've only included the results of the 4.0 profile below.
Don't forget that the Edulinq tests were written in the spirit of investigation. They cover aspects of LINQ's behaviour which are not guaranteed, both in terms of optimization and simple correctness of behaviour. I have included a test which demonstrates the "issue" with calling Contains on an ICollection
The tests which fail against Microsoft's implementation (for known reasons) are normally marked with an [Ignore] attribute to prevent them from alarming me unduly during development. NUnit categories would make more sense here, but I don't believe ReSharper supports them, and that's the way I run the tests normally. Likewise the tests which take a very long time (such as counting more than int.MaxValue elements) are normally suppressed.
In order to truly run all my tests, I now have a horrible hack using conditional compilation: if the ALL_TESTS preprocessor symbol is defined, I build my own IgnoreAttribute class in the Edulinq.Tests namespace, which effectively takes precedence over the NUnit one... so NUnit will ignore the [Ignore], so to speak. Frankly all this conditional compilation is pretty horrible, and I wouldn't use it for a "real" project, but this is a slightly unusual situation.
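As a rough illustration of that hack, here is a minimal sketch. It assumes the test classes live in the Edulinq.Tests namespace and import NUnit with a using directive; the constructor overloads and the AttributeUsage targets are my guesses at what the tests need, not the actual Edulinq source.

```csharp
#if ALL_TESTS
using System;

namespace Edulinq.Tests
{
    // Hypothetical stand-in for NUnit's IgnoreAttribute. Because it is declared
    // in the same namespace as the tests, the compiler binds [Ignore] to this
    // type rather than to the one imported via "using NUnit.Framework;".
    // NUnit knows nothing about this attribute, so the test runs as normal.
    [AttributeUsage(AttributeTargets.Method | AttributeTargets.Class, AllowMultiple = false)]
    internal sealed class IgnoreAttribute : Attribute
    {
        public IgnoreAttribute()
        {
        }

        // The reason is accepted only so existing [Ignore("...")] usages compile.
        public IgnoreAttribute(string reason)
        {
        }
    }
}
#endif
```

The trick relies on C# name lookup: a type declared in the enclosing namespace wins over one brought in by a using directive, so no test code has to change when ALL_TESTS is defined.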
EDIT: It turns out that ReSharper does support categories. I'm not sure how far that support goes yet, but at the very least there's "Group by categories" available. I may go through all my tests and apply a category to each one: optimization, execution mode, time-consuming etc. We'll see whether I can find the energy for that :)
So, let's have a look at what the test results are...
Unsurprisingly, Edulinq passes all its own tests, with the minor exception of CastTest.OriginalSourceReturnedDueToGenericCovariance running under Mono 3.5, which doesn't include covariance. Arguably this test should be conditionalised to not even run in that situation, as it's not expected to work.
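To show what that test is checking, here is a small sketch of a Cast that hands back the original source when it can. The class and method names are made up for the example, and this is not claimed to be the exact Edulinq or Microsoft code. On a runtime with generic covariance, a List<string> is already an IEnumerable<object>, so the "as" check succeeds; without covariance it returns null and we fall through to the iterator.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CastSketch
{
    public static IEnumerable<TResult> Cast<TResult>(IEnumerable source)
    {
        if (source == null)
        {
            throw new ArgumentNullException("source");
        }
        // If the source already implements IEnumerable<TResult> (directly, or via
        // covariance where the runtime supports it), return it untouched.
        IEnumerable<TResult> alreadyTyped = source as IEnumerable<TResult>;
        if (alreadyTyped != null)
        {
            return alreadyTyped;
        }
        return CastIterator<TResult>(source);
    }

    private static IEnumerable<TResult> CastIterator<TResult>(IEnumerable source)
    {
        foreach (object item in source)
        {
            yield return (TResult) item;
        }
    }

    // Helper for experiments: did Cast return the very same reference?
    public static bool ReturnsOriginalSource<TResult>(IEnumerable source)
    {
        return ReferenceEquals(Cast<TResult>(source), source);
    }
}
```

Calling CastSketch.ReturnsOriginalSource<object> on a List<string> should give true where covariance is available. A List<int> always goes through the iterator, because variance doesn't apply to value types.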
Microsoft's implementation has 8 failures, all expected:
- Contains delegates to the ICollection
All of these have been discussed already, so I won't go into them now.
LinqBridge had a total of 33 failures. I haven't looked into them in detail, but just going from the test output I've broken them down into the following broad categories:
- Optimization: Cast never returns the original source, presumably always introducing an intermediate iterator. All three of Microsoft's "missed opportunities" listed above are also missed in LinqBridge.
- Use of input sequences: Except and Intersect appear to read the first sequence first (possibly completely?) and then the second sequence. Edulinq and LINQ to Objects read the second sequence completely and then stream the first sequence. This behaviour is undocumented. Join, GroupBy and GroupJoin appear not to be deferred at all. If I'm right, this is a definite bug.
- Aggregation accuracy: both Average and Sum over an IEnumerable
Mono failed 18 of the tests. There are fewer definite bugs than in LinqBridge, but it's definitely not perfect. Here's the breakdown:
- Optimization: Mono misses the same three opportunities that LinqBridge and Microsoft miss. Contains(item) delegates to ICollection
It didn't seem fair to only test other implementations against the Edulinq tests. After all, it's only natural that my tests should work against my own code. What happens if we run the Mono and LinqBridge tests against my code?
The LinqBridge tests didn't find anything surprising. There were two failures:
- I don't have the "delegate Contains to ICollection
The Mono tests picked up the same two failures as above, and two genuine bugs:
- By implementing Take via TakeWhile, I was iterating too far: in order for the condition to become false, we had to iterate to the first item we wouldn't return.
- ToLookup didn't accept null keys - a fault which propagated to GroupJoin, Join and GroupBy too. (EDIT: It turns out that it's more subtle than that. Nothing should break, but the MS implementation ignores null keys for Join and GroupJoin. Edulinq now does the same, but I've raised a Connect issue to suggest this should at least be documented.)
I've fixed these in source control, and will add an addendum to each of the relevant posts (Take, ToLookup) when I have a moment spare.
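The Take problem is easy to reproduce. The sketch below is not the Edulinq code; the names TakeViaTakeWhile, TakeByCounting and ItemsPulled are invented for the example, and it leans on System.Linq's indexed TakeWhile overload. It compares a Take built on TakeWhile with one that counts down, using an endless source which records how many items were requested from it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeSketch
{
    // Naive version: the predicate can only turn false once TakeWhile has
    // already fetched the first item we don't want to return.
    public static IEnumerable<T> TakeViaTakeWhile<T>(IEnumerable<T> source, int count)
    {
        return source.TakeWhile((item, index) => index < count);
    }

    // Counting version: stops as soon as the last wanted item has been yielded.
    public static IEnumerable<T> TakeByCounting<T>(IEnumerable<T> source, int count)
    {
        if (count <= 0)
        {
            yield break;
        }
        int remaining = count;
        foreach (T item in source)
        {
            yield return item;
            if (--remaining == 0)
            {
                yield break;
            }
        }
    }

    // Endless sequence 0, 1, 2... which records each item it hands out.
    private static IEnumerable<int> CountingSource(int[] pulled)
    {
        for (int i = 0; ; i++)
        {
            pulled[0]++;
            yield return i;
        }
    }

    // Runs the given Take implementation to completion and reports how many
    // items it pulled from the underlying source.
    public static int ItemsPulled(Func<IEnumerable<int>, IEnumerable<int>> take)
    {
        int[] pulled = new int[1];
        foreach (int ignored in take(CountingSource(pulled)))
        {
        }
        return pulled[0];
    }
}
```

With a count of 2, TakeSketch.ItemsPulled(s => TakeSketch.TakeViaTakeWhile(s, 2)) should report one more item pulled than the TakeByCounting equivalent. That extra fetch matters when pulling an item is expensive or has side effects.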
There's one additional failure, trying to find the average of a sequence of two Int64.MaxValue values. That overflows on both Edulinq and LINQ to Objects - that's the downside of using an Int64 to sum the values. As mentioned, Mono suffers a degree of inaccuracy instead; it's all a matter of trade-offs. (A really smart implementation might use Int64 while possible, and then go up to using Double where necessary, I suppose.)
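To make the trade-off concrete, here's a small sketch of the two approaches. The method names are invented for the example. The Int64 version follows the description above; the Double version is only one way that sort of inaccuracy could arise, and I'm not claiming it's how Mono actually does it.

```csharp
using System;
using System.Collections.Generic;

public static class AverageSketch
{
    // Sums in a checked Int64: exact until the running total no longer fits,
    // at which point it throws OverflowException.
    public static double AverageWithInt64Sum(IEnumerable<long> source)
    {
        long sum = 0;
        long count = 0;
        checked
        {
            foreach (long value in source)
            {
                sum += value;
                count++;
            }
        }
        if (count == 0)
        {
            throw new InvalidOperationException("Sequence was empty");
        }
        return (double) sum / count;
    }

    // Sums in a Double: never overflows for Int64 inputs, but each addition may
    // round, so large totals lose their low-order bits.
    public static double AverageWithDoubleSum(IEnumerable<long> source)
    {
        double sum = 0;
        long count = 0;
        foreach (long value in source)
        {
            sum += value;
            count++;
        }
        if (count == 0)
        {
            throw new InvalidOperationException("Sequence was empty");
        }
        return sum / count;
    }

    // Helper for experiments: does the Int64 approach overflow on this input?
    public static bool Overflows(IEnumerable<long> source)
    {
        try
        {
            AverageWithInt64Sum(source);
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }
}
```

Two Int64.MaxValue values should overflow the first version but not the second, whereas something like { 1L << 53, 1, 1 } should expose the rounding in the second: the two small values are lost when added to a Double that large.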
Unfortunately I don't have the tests for the Microsoft implementation, of course... I'd love to know whether there's anything I've failed with there.
This was very interesting - there's a mixture of failure conditions around, and plenty of "non-failures" where each implementation's tests are enforcing their own behaviour.
I do find it amusing that all three of the "mainstream" implementations have the same OrderByDescending bug though. Other than that, the clear bugs between Mono and LinqBridge don't intersect, which is slightly surprising.
It's nice to see that despite not setting out to create a "production-quality" implementation of LINQ to Objects, that's mostly what I've ended up with. Who knows - maybe some aspects of my implementation or tests will end up in Mono in the future :)
Given the various different optimizations mentioned in this post, I think it's only fitting that next time I'll discuss where we can optimize, where it's worth optimizing, and some more tricks we could still pull out of the bag...

