A nice easy pair of operators tonight. I should possibly have covered them at the same time as First/Last/Single and the OrDefault variants, but never mind...
ElementAt and ElementAtOrDefault have a single overload each:
public static TSource ElementAt<TSource>(
    this IEnumerable<TSource> source,
    int index)

public static TSource ElementAtOrDefault<TSource>(
    this IEnumerable<TSource> source,
    int index)
Isn't that blissfully simple after the overload storm of the past few days?
The two operators work in very similar ways:
- They use immediate execution.
- The source parameter must not be null, and this is validated immediately.
- They return the element at the specified zero-based index, if it's in the range 0 <= index < count.

The methods only differ in their handling of an index which falls outside that bound. ElementAt will throw an ArgumentOutOfRangeException; ElementAtOrDefault will return the default value for TSource (e.g. 0, null, false). This is true even if index is negative. You might have expected some way to specify the default value to return if the index is out of bounds, but there isn't one. (This is consistent with FirstOrDefault() and so on, but not with Nullable<T>.GetValueOrDefault(T), which does let you pass in the default value to use.)
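Here's a quick demonstration of the difference in behaviour:

using System;
using System.Linq;

class ElementAtDemo
{
    static void Main()
    {
        int[] source = { 10, 20, 30 };

        Console.WriteLine(source.ElementAt(1));           // 20
        Console.WriteLine(source.ElementAtOrDefault(5));  // 0 (default(int))
        Console.WriteLine(source.ElementAtOrDefault(-1)); // 0 again

        try
        {
            source.ElementAt(5);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("ElementAt threw for an out-of-range index");
        }
    }
}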
This behaviour leaves us some room for common code - for once I haven't used cut and paste for the implementation. Anyway, I'm getting ahead of myself.
As you can imagine, my tests for the two operators are identical except for the expected result in the case of the index being out of range. I've tested the following cases:
- Null source
- A negative index
- An index which is too big on a NonEnumerableCollection
- An index which is too big on a NonEnumerableList
- An index which is too big on a lazy sequence (using Enumerable.Range)
- A valid index in a NonEnumerableList
- A valid index in a lazy sequence

The "non-enumerable" list and collection are there to test that the optimizations we're going to perform are working. In fact, the NonEnumerableCollection test fails on LINQ to Objects - it's only optimized for IList<T>.
None of the tests are very interesting, to be honest.
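For the sake of completeness, here's roughly what two of the lazy-sequence cases look like (a sketch assuming NUnit; the test names are illustrative rather than copied from the real test suite):

using System;
using System.Linq;
using NUnit.Framework;

[TestFixture]
public class ElementAtTest
{
    [Test]
    public void ElementAtWithIndexTooBigOnLazySequence()
    {
        var source = Enumerable.Range(0, 3); // 0, 1, 2
        Assert.Throws<ArgumentOutOfRangeException>(() => source.ElementAt(3));
    }

    [Test]
    public void ElementAtOrDefaultWithValidIndexOnLazySequence()
    {
        var source = Enumerable.Range(10, 5); // 10, 11, 12, 13, 14
        Assert.AreEqual(12, source.ElementAtOrDefault(2));
    }
}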
As I mentioned earlier, I've used some common code for once (although I admit the first implementation used cut and paste). As the only difference between the two methods is the handling of one particular kind of failure, I've used the TryXXX pattern which exists elsewhere in the framework: a common method tries to retrieve the right element via an out parameter, and indicates success or failure via its return value. Not every kind of failure is reported that way, of course - we still want to throw an ArgumentNullException from either public method if source is null.
That leaves our public methods looking quite straightforward:
public static TSource ElementAt<TSource>(
    this IEnumerable<TSource> source,
    int index)
{
    TSource ret;
    if (!TryElementAt(source, index, out ret))
    {
        throw new ArgumentOutOfRangeException("index");
    }
    return ret;
}

public static TSource ElementAtOrDefault<TSource>(
    this IEnumerable<TSource> source,
    int index)
{
    TSource ret;
    TryElementAt(source, index, out ret);
    return ret;
}
TryElementAt will only return false if the index is out of bounds, so the exception is always appropriate. However, there is a disadvantage to this approach: we can't easily indicate in the exception message whether index was too large or negative. We could have specified a message which included the value of index itself, of course. I think it's a minor matter either way, to be honest.
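For what it's worth, here's a rough sketch of that alternative - ArgumentOutOfRangeException has a three-argument constructor which records the offending value for us (this is just a demonstration, not what I've actually done):

using System;

class ExceptionMessageDemo
{
    static void Main()
    {
        int index = -3;
        try
        {
            // The (paramName, actualValue, message) constructor appends
            // "Actual value was -3." to the message for us.
            throw new ArgumentOutOfRangeException("index", index,
                "Index must be non-negative and less than the sequence length");
        }
        catch (ArgumentOutOfRangeException e)
        {
            Console.WriteLine(e.Message);
        }
    }
}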
The main body of the code is in TryElementAt, obviously. It would actually be very simple - just looping and counting up to index, checking as we went - except there are two potential optimizations.
The most obvious - and most profitable - optimization is if the collection implements IList<TSource>. In that case we can check the Count property cheaply and, if the index is within bounds, fetch the element directly via the indexer - no iteration at all.

The less clear optimization is if the collection implements ICollection<TSource> (or the non-generic ICollection) but not IList<TSource>. In that case we can still check the count cheaply and return early if the index is too big - but when the index is valid, the check has gained us nothing: we still have to iterate to the right element.

This is the sort of optimization which is impossible to judge without real data. How often are these operators called with an invalid index? How often does that happen on a collection which implements ICollection<TSource> but not IList<TSource>? Without that sort of data, it's guesswork.
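As an illustration of what the count check buys us in the failure case, LinkedList<T> is a handy example: it implements ICollection<T> but not IList<T>, so with the implementation below an out-of-range index is rejected from Count alone, without walking any nodes. (As noted earlier, LINQ to Objects itself doesn't perform this check, so there the failure case would iterate the whole list.) A quick sketch:

using System;
using System.Collections.Generic;
using System.Linq;

class CollectionOptimizationDemo
{
    static void Main()
    {
        var list = new LinkedList<int>(Enumerable.Range(0, 1000000));

        // Failure case: with the count check, this returns default(int)
        // immediately instead of iterating a million nodes first.
        Console.WriteLine(list.ElementAtOrDefault(2000000)); // 0

        // Success case: LinkedList<T> has no indexer, so we still iterate.
        Console.WriteLine(list.ElementAt(999999)); // 999999
    }
}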
For the moment, I've kept all the optimizations. Here's the code:
private static bool TryElementAt<TSource>(
    IEnumerable<TSource> source,
    int index,
    out TSource element)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    element = default(TSource);
    if (index < 0)
    {
        return false;
    }
    ICollection<TSource> collection = source as ICollection<TSource>;
    if (collection != null)
    {
        int count = collection.Count;
        if (index >= count)
        {
            return false;
        }
        // If it's a list as well, we can use the indexer directly.
        IList<TSource> list = source as IList<TSource>;
        if (list != null)
        {
            element = list[index];
            return true;
        }
    }
    ICollection nonGenericCollection = source as ICollection;
    if (nonGenericCollection != null)
    {
        int count = nonGenericCollection.Count;
        if (index >= count)
        {
            return false;
        }
    }
    using (IEnumerator<TSource> iterator = source.GetEnumerator())
    {
        // Note the -1: it takes one MoveNext() call to get at element 0.
        // We can't use i <= index or index + 1: both break when
        // index == int.MaxValue.
        for (int i = -1; i < index; i++)
        {
            if (!iterator.MoveNext())
            {
                return false;
            }
        }
        element = iterator.Current;
        return true;
    }
}
As you can see, the optimized cases actually form the bulk of the code - part of me thinks it would be worth removing the non-IList<TSource> optimizations entirely, just for the sake of simplicity and readability.
It's worth looking at the "slow" case where we actually iterate. The for loop looks odd until you remember that to get at element 0, you have to call MoveNext() once. We don't want to just add one to index, or use a less-than-or-equal condition: both of those fail when index is int.MaxValue. Incrementing index would overflow, either throwing an exception (in a checked context) or wrapping round to a negative value so that we'd never loop at all; and with a less-than-or-equal condition we'd loop forever, as every int is less than or equal to int.MaxValue.
Another way to look at it is that the loop counter ("i") is the "current index" within the iterator: the iterator starts before the first element, so it's reasonable to start at -1.
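To convince myself the bounds were right, here's a tiny standalone sketch of the same loop (my own demo code, not part of the implementation), fetching element 2 of a lazy sequence by hand:

using System;
using System.Collections.Generic;
using System.Linq;

class LoopDemo
{
    static void Main()
    {
        int index = 2;
        using (IEnumerator<int> iterator = Enumerable.Range(10, 5).GetEnumerator())
        {
            // Starting at -1 means the loop calls MoveNext() exactly
            // index + 1 times, without ever computing index + 1 (which
            // would overflow for int.MaxValue).
            for (int i = -1; i < index; i++)
            {
                if (!iterator.MoveNext())
                {
                    throw new ArgumentOutOfRangeException("index");
                }
            }
            Console.WriteLine(iterator.Current); // Prints 12
        }
    }
}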
The reason I'm drawing attention to this is that I got all of this wrong first time... and was very grateful for unit tests to catch me out.
For me, the most interesting part of ElementAt is the decision about optimization. I'm sure I'm not the only one who optimizes without data at times - but it's a dangerous thing to do. The problem is that this isn't the normal micro-optimization quandary of "it's always a tiny bit better, but it's probably insignificant and makes the code harder to read". For the cases where this is faster, it could make an enormous difference - asking for element one million of a linked list which doesn't quite have enough elements could be very painful. But do failure cases need to be fast? How common are they? As you can tell, I'm dithering. I think it's at least worth thinking about what optimizations might make a difference - even if we later remove them.
Next time, I think I'll tackle Contains - an operator which you might expect to be really fast on a HashSet<T>.

