Introduction
Aggregate functions are used to get summary outputs from queries – things like the total count of elements, summed values, average values, minimum and maximum values. It also includes the All function that will return a Boolean true or false value depending on whether all elements in a collection satisfy a particular condition, and the Any function that returns a Boolean based on whether any element in a collection satisfy a condition.
The syntax of a LINQ expression that uses basic aggregate functionality is fairly simple. For instance, if you just want to return a total count of a complete collection, or the minimum or maximum value, then it’s quite straightforward, as you’ll see in a moment. But in reality, you’ll often want to get these kind of aggregated results for subsets of the data, in which case you’ll need to apply the aggregate functions to data that you've grouped, using the Group By clause in LINQ. In that situation, you have to move up from the Visual Basic clauses and use method based queries. These method based queries mostly include the use of Lambda expressions, a relative newcomer to the game for many of us. Although they can seem quite daunting at first, unless you have very complex needs, they’re not really that tough to master.
Using the Aggregate Clause
Let’s start with something quite simple. Using the Northwind database, you want to get some aggregate values based on Freight costs in the Orders table. If you’ve read my earlier blogs on LINQ to SQL, you’ll have seen how I created the object model. If you want to refer to those blogs, you can find them here and here. The object model I’m using is named NorthwindOrders and the data context is therefore named NorthwindOrdersDataContext. In addition I’ve created an instance of the NorthwindOrdersDataContext that I’ve named NWOrderContext.
Here’s the basic LINQ query:
Dim NWOrders = Aggregate orders In NWOrderContext.Orders
Into Sum(orders.Freight),
Average(orders.Freight),
Max(orders.Freight)
You’ll see that the usual From clause that I’ve used in all the previous queries has been replaced with the Aggregate clause. In addition each of aggregate functions – Sum, Average and Max in this example – performs a calculation and then stores the result in a property that takes the same name as the function. That is, the sum is contained in a property named Sum, the average in a property named Average, and so on. If for some reason you wanted to change the names of any of these properties, you can do that, but it’s not a common thing to do. In that case, you’d use code like:
AverageFreightCost = Average(orders.Freight),
in the LINQ query.
To display the results, you simply access the properties of the anonymous type that the query created:
Console.WriteLine("Total Freight Costs = {0}", FormatCurrency(NWOrders.Sum))
Console.WriteLine("Average Freight Cost = {0}", FormatCurrency(NWOrders.Average))
Console.WriteLine("Most Expensive Item = {0}", FormatCurrency(NWOrders.Max))
Note that because the results aren’t collections, you don’t have to enumerate through them in the way I’ve done previously. The resulting output will be:
Clauses versus the method based approach
I’ll move on to the method based approach in a moment, but here I just want to point out that the Visual Basic clauses are there to make the syntax a little easier for the developer to use. What actually happens under the hood is that the compiler translates the clauses into method based queries for you. In my opinion, you may as well take advantage of the easier to use clauses whenever you can, especially if you’re just starting out with LINQ.
Demystifying Lambda Expressions
I’ve seen some pretty mind-boggling Lambda expressions around the place, but our requirements here are fairly straightforward. However, you do need to get a handle on what they are and how they work.
I’m assuming that you’re familiar with functions in Visual Basic. For example, a simple function that adds 1 to an integer value that’s passed in as a parameter:
Private Function AddOne(ByVal baseNumber As Integer) As Integer
Return baseNumber + 1
End Function
This function has a name (AddOne), a parameter (baseNumber) and it returns a value (baseNumber plus 1). The Lambda equivalent of this function doesn’t need a name and it bundles everything together in a single in-line expression. It would look something like this:
Function(baseNumber) baseNumber + 1
I think you can easily see the correlation between the two approaches in this simple example.
So, in the main example I’m going to use, the Sum aggregate function will take a Lambda expression that uses an Order object as the parameter and the value of the Freight property of an Order object as the expression. It looks like this:
Sum(Function(o) o.Freight)
It’s pretty meaningless on its own, but if I don’t strip it out of the full LINQ query, it all looks frighteningly complicated. But if you compare the two Lambdas I’ve shown – the baseNumber one and the Sum one, you can see that each has a parameter and an expression that returns a value. In the case of the Sum example, ‘o’ represents an Order object, as you’ll see when I show you the full code sample. But for now (based on my own personal learning curve) the key point to understand is how the general syntax of a Lambda expression works.
If you’re still not clear on this, I suggest that you try building your own simple Lambda expressions based on standard functions (whether your own creations or those you find in the .NET classes). If you play around with this until you feel comfortable with the general concept, you’ll find it much easier to use Lambda expressions in the bigger picture.
A LINQ query that sums freight costs by company
The LINQ query that I’m going to use isn’t simple, but it’s a realistic example of the kind of thing you may need to create in your real world applications. I’ll break it down into what I hope are manageable chunks.
The task is to build a LINQ query that lists all companies in the Northwind database who have placed orders and show the total freight cost for each individual company. Here’s the finished query:
Dim NWOrders = From c In NWOrderContext.Customers
Group Join o In NWOrderContext.Orders
On c.CustomerID Equals o.CustomerID
Into customersWithOrders = Group
Select New With {Key c.CompanyName,
Key .TotalAmount = customersWithOrders.Sum(Function(o) o.Freight)}
The first line of this query is the same as I’ve used in previous LINQ queries, so should be familiar to you at this stage. It might be useful though at this point to remind ourselves what the object model looks like:
The model has classes that represent two of the tables in the Northwind database. The LINQ query uses a Join to identify those customers who have entries in the Order table – i.e. customers with orders.
Dim NWOrders = From c In NWOrderContext.Customers
Group Join o In NWOrderContext.Orders
On c.CustomerID Equals o.CustomerID
Into customersWithOrders = Group
Select New With {Key c.CompanyName,
Key .TotalAmount = customersWithOrders.Sum(Function(o) o.Freight)}
We now have references to both customers and orders in the query and have specifically filtered in only those customers who have orders. Notice though that the second line starts with the Group keyword, so this will ensure that all the results will be grouped. The next line names the group as customersWithOrders:
Dim NWOrders = From c In NWOrderContext.Customers
Group Join o In NWOrderContext.Orders
On c.CustomerID Equals o.CustomerID
Into customersWithOrders = Group
Select New With {Key c.CompanyName,
Key .TotalAmount = customersWithOrders.Sum(Function(o) o.Freight)}
By the way, you don’t have to break the query into lines as I’ve done here. I’ve only done it in order to try and break the whole thing down into understandable parts.
The fifth line begins to create the anonymous type and sets one of its properties – CompanyName. This property takes its name directly from the CompanyName property of the Customer object. If I needed to provide an alias to rename this property, I could do so here, but haven’t.
Dim NWOrders = From c In NWOrderContext.Customers
Group Join o In NWOrderContext.Orders
On c.CustomerID Equals o.CustomerID
Into customersWithOrders = Group
Select New With {Key c.CompanyName,
Key .TotalAmount = customersWithOrders.Sum(Function(o) o.Freight)}
‘Select New With’ instructs the compiler to create a new anonymous type that will contain some properties. So far in this example, as at the end of line 5, it has a property named CompanyName.
The Key keyword needs some explanation though, if you’re not too familiar with anonymous types. When an anonymous type is created, it’s possible to designate some properties as key properties. Bear in mind that although this article deals only with LINQ and anonymous types, you can create anonymous types in many scenarios other than LINQ related ones. The properties are ‘key’ in the sense that they are the only properties that are considered when a comparison is made between instances to see if they are equal. Key properties are also automatically read only.
I discovered recently that all properties of anonymous types that are created by a query expression like the one shown above, are actually set as key properties by default. However, as it’s useful to keep in mind that key properties are an option that you should consider when working outside of the LINQ environment I’ve chosen to include this (admittedly redundant) keyword in the query.
The final line at last brings us back to the Lambda expression:
Dim NWOrders = From c In NWOrderContext.Customers
Group Join o In NWOrderContext.Orders
On c.CustomerID Equals o.CustomerID
Into customersWithOrders = Group
Select New With {Key c.CompanyName,
Key .TotalAmount = customersWithOrders.Sum(Function(o) o.Freight)}
The second property of the anonymous type is named TotalAmount. It gets its values by looking into that group of customers created earlier in the query and applying the Sum function to pull the individual sums for each customer.
The Lambda expression, as I explained earlier, takes the Order object as a parameter, and its expression returns the Freight value for that object instance. When the LINQ query runs, the result is an anonymous type whose two properties are CompanyName and TotalAmount. TotalAmount is a sophisticated little property that gets the Freight cost sum for each individual collection of Order objects for each customer (customersWithOrders).
On reflection, I see now that I could probably have devised a better name for the customersWithOrders group. It probably would have been better to call it ‘CustomersOrdersGrouped’ or something similar. I hope my naming hasn’t overly confused you.
The results of this query can be displayed by enumerating through the collection of anonymous type instances, in the way I’ve done in previous examples:
For Each CompanyOrders In NWOrders
Console.WriteLine(String.Format("{0} ({1}) ", CompanyOrders.CompanyName,
FormatCurrency(CompanyOrders.TotalAmount)))
Next
The final few lines of the resulting output look like this:
Summary
You can of course now use the same techniques with other aggregate functions, such as Average, Max, Min, etc. You can also combine more than one these in the same query expression if you need to. I may create some examples of this in a future blog, although hopefully you now won’t need me to do so.
As I usually do, I’ve tried to break it down into manageable chunks of understanding and gone through it step by step. I hope I’ve succeeded in making a difficult topic a little easier to understand.
Posted Apr 06 2011, 01:20 PM by Ged Mead Filed under: .NET, VB.NET, Visual Basic.NET, LINQ, LINQ to SQL, Key property, Anonymous type

0 comments:
Post a Comment