More design decisions around optimization today, but possibly less controversial ones...
Cast and OfType are somewhat unusual LINQ operators. They are extension methods, but they work on the non-generic IEnumerable type instead of the generic IEnumerable<T> type:
public static IEnumerable<TResult> Cast<TResult>(this IEnumerable source)
public static IEnumerable<TResult> OfType<TResult>(this IEnumerable source)
It's worth mentioning what Cast and OfType are used for to start with. There are two main purposes:
- Using a non-generic collection (such as a DataTable or an ArrayList) within a LINQ query (DataTable has the AsEnumerable extension method too)
- Changing the type of a generic collection, usually to use a more specific type (e.g. you have List<Person> but you're confident they're all actually Employee instances - or you only want to query against the Employee instances)
I can't say that I use either operator terribly often, but if you're starting off from a non-generic collection for whatever reason, these two are your only easy way to get "into" the LINQ world.
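As a rough illustration of that first use, here is a small sketch using the framework's own Cast and OfType on an ArrayList. The collection contents and variable names are made up for the example.

```csharp
using System;
using System.Collections;
using System.Linq;

// A non-generic collection: the compiler only knows its elements as object.
ArrayList mixed = new ArrayList { "apple", 42, "banana", 3.5, "cherry" };

// OfType<string>() keeps only the strings, giving an IEnumerable<string>
// that the rest of LINQ can work with.
string[] longFruit = mixed.OfType<string>()
                          .Where(s => s.Length > 5)
                          .ToArray();

// Cast<string>() fits when every element is known to be a string.
ArrayList allStrings = new ArrayList { "x", "yy", "zzz" };
int totalLength = allStrings.Cast<string>().Sum(s => s.Length);

Console.WriteLine(string.Join(", ", longFruit)); // banana, cherry
Console.WriteLine(totalLength);                  // 6
```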
Here's a quick rundown of the behaviour they have in common:
- The source parameter must not be null, and this is validated eagerly
- They use deferred execution: the input sequence is not read until the output sequence is
- They stream their data - you can use them on arbitrarily-long sequences and the extra memory required will be constant (and small :)
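The three shared behaviours can be observed with the framework operators. This is only a sketch: the Naturals helper is a hypothetical infinite sequence written for the example.

```csharp
using System;
using System.Collections;
using System.Linq;

// Eager validation: a null source fails at the call, before any iteration.
IEnumerable missing = null!;
bool threwEagerly = false;
try
{
    missing.Cast<string>();
}
catch (ArgumentNullException)
{
    threwEagerly = true;
}

// Deferred execution: the query reads the list only when it is iterated,
// so an element added after the call is still seen.
ArrayList letters = new ArrayList { "a" };
var query = letters.Cast<string>();
letters.Add("b");
int count = query.Count();

// Streaming: an endless source is fine as long as the consumer stops.
int[] firstThree = Naturals().OfType<int>().Take(3).ToArray();

Console.WriteLine(threwEagerly);                  // True
Console.WriteLine(count);                         // 2
Console.WriteLine(string.Join(", ", firstThree)); // 0, 1, 2

// Hypothetical helper: an infinite non-generic sequence of boxed ints.
static IEnumerable Naturals()
{
    int i = 0;
    while (true)
    {
        yield return i++;
    }
}
```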
Both operators effectively try to convert each element of the input sequence to the result type (TResult). When they're successful, the results are equivalent (ignoring optimizations, which I'll come to later). The operators differ in how they handle elements which aren't of the result type.
Cast simply tries to cast each element to the result type. If the cast fails, it will throw an InvalidCastException in the normal way. OfType, however, sees whether each element is a value of the result type first - and ignores it if it's not.
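A short sketch of that difference, using the framework operators on a made-up mixed array:

```csharp
using System;
using System.Linq;

object[] items = { "one", 2, "three" };

// OfType silently skips the boxed int.
string[] onlyStrings = items.OfType<string>().ToArray();

// Cast fails when iteration reaches the boxed int.
bool castThrew = false;
try
{
    items.Cast<string>().ToArray();
}
catch (InvalidCastException)
{
    castThrew = true;
}

Console.WriteLine(string.Join(", ", onlyStrings)); // one, three
Console.WriteLine(castThrew);                      // True
```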
There's one important case to consider where Cast will successfully return a value and OfType will ignore it: null references (with a nullable return type). In normal code, you can cast a null reference to any nullable type (whether that's a reference type or a nullable value type). However, if you use the "is" C# operator with a null value, it will always return false. Cast and OfType follow the same rules, basically.
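The null case can be seen by counting what each operator yields. This is a sketch with made-up data, covering both a reference type and a nullable value type:

```csharp
using System;
using System.Linq;

object?[] items = { "a", null, "b" };

// Casting null to string succeeds, so Cast keeps all three elements.
int castCount = items.Cast<string>().ToArray().Length;

// "null is string" is false, so OfType drops the null.
int ofTypeCount = items.OfType<string>().ToArray().Length;

// The same holds for a nullable value type.
object?[] boxed = { 1, null, 2 };
int castNullable = boxed.Cast<int?>().ToArray().Length;
int ofTypeNullable = boxed.OfType<int?>().ToArray().Length;

Console.WriteLine($"{castCount} {ofTypeCount}");       // 3 2
Console.WriteLine($"{castNullable} {ofTypeNullable}"); // 3 2
```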
It's worth noting that (as of .NET 3.5 SP1) Cast and OfType only perform reference and unboxing conversions. They won't convert a boxed int to a long, or execute user-defined conversions. Basically they follow the same rules as converting from object to a generic type parameter. (That's very convenient for the implementation!) In the original implementation of .NET 3.5, I believe some other conversions were supported (in particular, I believe that the boxed int to long conversion would have worked). I haven't even attempted to replicate the pre-SP1 behaviour. You can read more details in Ed Maurer's blog post from 2008.
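Here is a sketch of the post-SP1 rules in action with boxed ints. The explicit Select at the end is one way to get the numeric conversion, not something the operators do for you.

```csharp
using System;
using System.Linq;

object[] boxedInts = { 1, 2, 3 };

// Unboxing to the exact type works.
int sum = boxedInts.Cast<int>().Sum();

// Unboxing a boxed int as long is not a valid unboxing conversion.
bool castToLongThrew = false;
try
{
    boxedInts.Cast<long>().ToArray();
}
catch (InvalidCastException)
{
    castToLongThrew = true;
}

// OfType<long>() finds no boxed longs, so it yields nothing.
int longCount = boxedInts.OfType<long>().Count();

// To convert numerically, unbox first and then convert explicitly.
long[] converted = boxedInts.Cast<int>().Select(i => (long) i).ToArray();

Console.WriteLine(sum);              // 6
Console.WriteLine(castToLongThrew);  // True
Console.WriteLine(longCount);        // 0
Console.WriteLine(converted.Length); // 3
```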
There's one final aspect to discuss: optimization. If "source" already implements IEnumerable<TResult>, the Cast operator just returns the parameter directly, within the original method call. (In other words, this behaviour isn't deferred.) Basically we know that every cast will succeed, so there's no harm in returning the input sequence. This means you shouldn't use Cast as an "isolation" call to protect your original data source, in the same way as we sometimes use Select with an identity projection. See Eric Lippert's blog post on degenerate queries for more about protecting the original source of a query.
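The isolation point can be checked directly with the framework operators. In this sketch (names made up), the result of Cast is the very same list, whereas an identity Select wraps it:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<string> names = new List<string> { "a", "b" };

// Cast<string>() on something that is already IEnumerable<string>
// hands back the original object.
IEnumerable<string> viaCast = names.Cast<string>();
bool sameReference = ReferenceEquals(names, viaCast);

// A caller could therefore cast back and mutate the list.
bool exposesList = viaCast is List<string>;

// Select with an identity projection returns a separate iterator object.
IEnumerable<string> viaSelect = names.Select(x => x);
bool isolated = !ReferenceEquals(names, viaSelect) && !(viaSelect is List<string>);

Console.WriteLine(sameReference); // True
Console.WriteLine(exposesList);   // True
Console.WriteLine(isolated);      // True
```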
In the LINQ to Objects implementation, OfType never returns the source directly. It always uses an iterator. Most of the time, it's probably right to do so. Just because something implements IEnumerable<string> doesn't mean everything within it should be returned by OfType... because some elements may be null. The same is true of an IEnumerable<int?> - but not an IEnumerable<int>. For a non-nullable value type T, if source implements IEnumerable<T> then source.OfType<T>() will always contain the exact same sequence of elements as source. It does no more harm to return source from OfType<T>() here than it does from Cast<T>().
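One way that optimization could be written is sketched below. This is an assumption about the shape of such code, not the framework's implementation: OfTypeSketch and OfTypeIterator are hypothetical names, and they are local functions rather than extension methods so that the example stands alone.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

int[] numbers = { 1, 2, 3 };
object[] mixed = { "a", 1, "b" };

// Non-nullable value type and a matching source: returned as-is.
bool shortCircuited = ReferenceEquals(numbers, OfTypeSketch<int>(numbers));

// Reference type: always filtered through the iterator.
string[] words = OfTypeSketch<string>(mixed).ToArray();

Console.WriteLine(shortCircuited);            // True
Console.WriteLine(string.Join(", ", words));  // a, b

static IEnumerable<TResult> OfTypeSketch<TResult>(IEnumerable source)
{
    // Validate eagerly, outside the iterator block.
    if (source == null)
    {
        throw new ArgumentNullException(nameof(source));
    }
    // default(TResult) is non-null only for non-nullable value types,
    // which is exactly the case where no element can be filtered out.
    if (default(TResult) != null && source is IEnumerable<TResult> typed)
    {
        return typed;
    }
    return OfTypeIterator<TResult>(source);
}

static IEnumerable<TResult> OfTypeIterator<TResult>(IEnumerable source)
{
    foreach (var item in source)
    {
        if (item is TResult result)
        {
            yield return result;
        }
    }
}
```

Splitting the method in two keeps the argument check eager while the filtering itself stays deferred.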
There are "obvious" tests for deferred execution and eager argument validation. Beyond that, I effectively have two types of test: ones which focus on whether the call returns the original argument, and ones which test the behaviour of iterating over the results (including whether or not an exception is thrown).
The iteration tests are generally not that interesting - in particular, they're similar to tests we've got everywhere else. The "identity" tests are more interesting, because they show some differences between conversions that are allowed by the CLR and those allowed by C#. It's obvious that an array of strings is going to be convertible to IEnumerable<string>, but a test like this might give you more pause for thought:
[Test]
public void OriginalSourceReturnedForInt32ArrayToUInt32SequenceConversion()
{
    IEnumerable enums = new int[10];
    Assert.AreSame(enums, enums.Cast<uint>());
}
That's trying to "cast" an int[] to an IEnumerable<uint>. If you try the same in normal C# code, it will fail - although if you cast it to "object" first (to distract the compiler, as it were) it's fine at both compile time and execution time.
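A minimal sketch of the two casts just described. The variable names are illustrative, and the direct cast is left commented out because it does not compile.

```csharp
using System;
using System.Collections.Generic;

int[] ints = new int[10];

// The C# compiler rejects the direct cast (error CS0030):
// IEnumerable<uint> direct = (IEnumerable<uint>) ints;

// Going via object defers the decision to the CLR, which allows it.
IEnumerable<uint> viaObject = (IEnumerable<uint>)(object) ints;

// No copy is made: it is the same array instance.
bool sameArray = ReferenceEquals(ints, viaObject);
Console.WriteLine(sameArray);
```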
We can have a bit more fun at the compiler's expense, and note its arrogance:
int[] ints = new int[10];
if (ints is IEnumerable<uint>)
{
    Console.WriteLine("This won't be printed");
}
if (((object) ints) is IEnumerable<uint>)
{
    Console.WriteLine("This will be printed");
}
This generates a warning for the first block "The given expression is never of the provided (...) type" and the compiler has the cheek to remove the block entirely... despite the fact that it would have worked if only it had been emitted as code.
Now, I'm not really trying to have a dig at the C# team here - the compiler is actually acting entirely reasonably within the rules of C#. It's just that the CLR has subtly different rules around conversions - so when the compiler makes a prediction about what would happen with a particular cast or "is" test, it can be wrong. I don't think this has ever bitten me as an issue, but it's quite fun to watch. As well as this signed/unsigned difference, there are similar conversions between arrays of enums and their underlying types.
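The enum case can be probed the same way. This sketch uses DayOfWeek (whose underlying type is int) purely as a convenient example, and goes via object so that the CLR rather than the compiler decides:

```csharp
using System;
using System.Collections.Generic;

DayOfWeek[] days = { DayOfWeek.Monday, DayOfWeek.Friday };

// Ask the CLR, not the compiler, whether the array is also an int sequence.
bool isIntArray = ((object) days) is int[];
bool isIntSequence = ((object) days) is IEnumerable<int>;

Console.WriteLine(isIntArray);
Console.WriteLine(isIntSequence);
```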
There's another type of conversion which is interesting:
[Test]
public void OriginalSourceReturnedDueToGenericCovariance()
{
    IEnumerable strings = new List<string>();
    Assert.AreSame(strings, strings.Cast