The second and third AOOOD operators today... if I'm brave enough to tackle Average tomorrow, I'll have done them all. More surprises here today, this time in terms of documentation...
Min and Max are both extension methods with 22 overloads each. Min looks like this:
public static int Min(this IEnumerablepublic static int Min
this IEnumerable
Func
public static int? Min(this IEnumerable
public static int? Min
this IEnumerable
Func
public static TSource Min
public static TResult Min
this IEnumerable
Func
)
(Max is exactly the same as Min; just replace the name.)
The more obvious aspects of the behaviour are as follows:
source and selector mustn't be null All overloads use immediate execution The minimum or maximum value within the sequence is returned If a selector is present, it is applied to each value within source, and the maximum of the projected values is returned. (Note how the return type of these methods is TResult, not TSource.)Some less obvious aspects - in all cases referring to the result type (as the source type is somewhat incidental when a selector is present; it doesn't affect the behaviour):
The type's IComparableThe first point is particularly interesting when you consider the double and float types, and their "NaN" (not-a-number) values. For example, Math.Max regards NaN as greater than positive infinity, but Enumerable.Max regards positive infinity as being the greater of the two. Math.Min and Enumerable.Min agree, however, that NaN is less than negative infinity. (It would actually make sense to me for NaN to be treated as the numeric equivalent of null here, but that would be strange in other ways...) Basically, NaN behaves oddly in all kinds of ways. I believe that IEEE-754-2008 actually specifies behaviour with NaNs which encourages the results we're getting here, but I haven't verified that yet. (I can't find a free version of the standard online, which is troubling in itself. Ah well.)
The behaviour of the nullable and non-nullable types is well documented for the type-specific overloads using int, Nullable
(I should point out that ArgumentException isn't actually mentioned either for the case where values are incomparable, but that feels like a slightly less important offence for some reason. Possibly just because it didn't trip me up.)
If I remember, I'll open a Connect issue against this hole in the documentation when I find time. Unlike the optimizations and set ordering (where it's reasonably forgivable to deliberately omit implementation details from the contract) you simply can't predict the behaviour in a useful way from the documentation here. And yes, I'm going on about this because it bit me. I had to resort to writing tests and running them against LINQ to Objects to see if they were correct or not. (They were incorrect in various places.)
If you look at the behaviour of the non-generic methods, the generic ones are entirely consistent of course.
There are a couple of things which you might consider "missing" in terms of Max and Min:
The ability to find out the minimum/maximum value of a sequence by a projection. For example, consider a sequence of people. We may wish to find the youngest person in the sequence, in which case we'd like to be able to write something like: var oldest = people.MaxBy(person => person.Age); We can find the maximum age itself easily enough - but then we'd need a second pass to find the first person with that age. I've addressed this in MoreLINQ with the MaxBy and MinBy operators. The System.Interactive assembly in Reactive Extensions has the same methods too. The ability to specify a custom IComparerStill, at least that means there's less to test...
I decided I really couldn't find the energy to replicate all the tests for every type involved here. Instead, I have a bunch of tests for int and Nullable
The tests cover:
Argument validation Empty sequences Sequences of null values where applicable Projections of the above Generic tests for nullable and non-nullable value types, and reference types (with empty sequences etc) Incomparable valuesOkay, let's start off with the simplest detail: the order of implementation:
Max(int) Max(generic) Cut and paste Max implementations for other numeric types (replace the type name, basically) Cut and paste the entirety of Max to Min: Replace "Max" with "Min" everywhere Replace " < " with " > " everywhere (only 4 occurrences; basically the results of calling Compare or ComparerTo and comparing with 0)Just as with Sum, I could have used templating - but I don't think it would actually have saved me significant time.
This time, I thought I'd use Select internally for the overloads with selectors (unlike my approach for Sum which used identity projections). There's no particular reason for this - I just thought it would be interesting to try both approaches. Overall, I think I prefer this one, but I haven't done any benchmarking to find out the relative performance penalties.
Each set of numeric overloads calls into a single pair of generic "implementation" methods. These aren't the public general-purpose ones: they require that the types in use implement IComparable
I've separated out the nullable implementations from the non-nullable ones, as the behaviour differs significantly between the two.
Here's the public code for int:
public static int Max(this IEnumerable{
return PrimitiveMax(source);
}
public static int Max
this IEnumerable
Func
{
return PrimitiveMax(source.Select(selector));
}
public static int? Max(this IEnumerable
{
return NullablePrimitiveMax(source);
}
public static int? Max
this IEnumerable
Func
{
return NullablePrimitiveMax(source.Select(selector));
}
All the methods consider argument validation to be somebody else's problem - either Select or the generic method we're calling to find the maximum value. Part of me thinks this is lazy; part of me likes it in terms of not repeating code. All of me would prefer the ability to specify non-nullable parameters declaratively...
Here are the "primitive" methods called into above:
private static T PrimitiveMax
{
if (source == null)
{
throw new ArgumentNullException("source");
}
using (IEnumerator
{
if (!iterator.MoveNext())
{
throw new InvalidOperationException("Sequence was empty");
}
T max = iterator.Current;
while (iterator.MoveNext())
{
T item = iterator.Current;
if (max.CompareTo(item) < 0)
{
max = item;
}
}
return max;
}
}
private static T? NullablePrimitiveMax
{
if (source == null)
{
throw new ArgumentNullException("source");
}
T? max = null;
foreach (T? item in source)
{
if (item != null &&
(max == null || max.Value.CompareTo(item.Value) < 0))
{
max = item;
}
}
return max;
}
The first method is interesting in terms of its approach to throwing an exception if the first element isn't present, and using that as an initial candidate otherwise.
The second method needs to consider nullity twice on each iteration:
Is the item from the sequence null? If so, we can ignore it. Is our "current maximum" null? If so, we can replace it with the item from the sequence without performing a comparison.Now there's one case which is ambiguous here: when both values are null. At that point we can choose to replace our "current maximum" with the item, or not... it doesn't matter as the values are the same anyway. It is important that we don't try to perform a comparison unless both values are non-null though... the short-circuiting && and || operators keep us safe here.
Having implemented the code above, all the interesting work lies in the generic forms. Here we don't have different public methods to determine which kind of behaviour we'll use: but I wrote two private methods instead, and just delegated to the right one from the public one. This seemed cleaner than putting the code all in one method:
public static TSource Maxthis IEnumerable
{
if (source == null)
{
throw new ArgumentNullException("source");
}
return default(TSource) == null ? NullableGenericMax(source) : NonNullableGenericMax(source);
}
public static TResult Max
this IEnumerable
Func
{
return Max(source.Select(selector));
}
private static T NonNullableGenericMax
{
Comparer
using (IEnumerator
{
if (!iterator.MoveNext())
{
throw new InvalidOperationException("Sequence was empty");
}
T max = iterator.Current;
while (iterator.MoveNext())
{
T item = iterator.Current;
if (comparer.Compare(max, item) < 0)
{
max = item;
}
}
return max;
}
}
private static T NullableGenericMax
{
Comparer
T max = default(T);
foreach (T item in source)
{
if (item != null &&
(max == null || comparer.Compare(max, item) < 0))
{
max = item;
}
}
return max;
}
As you can tell, there's a significant similarity between the "PrimitiveMax" and "NonNullableGenericMax" methods, and likewise between "NullablePrimitiveMax" and "NullableGenericMax". This should come as no surprise. Fundamentally the difference is just between using an IComparable
Once I'd discovered the correct behaviour, this was reasonably simple. Of course, the above code wasn't my first implementation, where I'd completely forgotten about null values, and hadn't thought about how the nullability of the source type might affect the behaviour of empty sequences...
If you're ever in a company which rewards you for checking in lots of lines of code, offer to implement Sum/Min/Max. This weekend I've checked in about 2,500 lines of code in (split between production and test) and none of it's been terribly hard. Of course, if you're ever in such a company you should also consider looking for another job. (Have I mentioned that Google's hiring? Email me if you're interested. I'm serious.)
As you can tell, I was slightly irritated by the lack of clarity around the documentation in some places - but I find it interesting that even a simple-sounding function like "find the maximum value from a sequence" should need the kind of documentation that's missing here. I'm not saying it's a failure of the design - more just musing how a complete specification is almost always going to be longer than you might think at first glance. And if you think I was diligent here, think again: I didn't bother specifying which maximum or minimum value would be returned if there were two. For example, if a sequence consists of references to two equal but distinct strings, which reference should be returned? I have neither stated what my implementation (or the LINQ to Objects implementation) will do, nor tested for it.
Next up is Average - a single method with a mere 20 overloads. There are various corner cases to consider... but that's a post for another day.


0 comments:
Post a Comment