In SQL, the DISTINCT keyword allows us to filter out duplicate values, ensuring that each record appears only once based on the specified column(s). However, when dealing with entire records, using DISTINCT on all columns can result in unexpected outcomes since each record typically has a unique identifier (like an Id). To effectively use DISTINCT, we need to determine the specific column or combination of columns we want to filter by. For example, consider the following query:
SELECT DISTINCT City FROM Customers;
The above query retrieves a unique list of customer locations.. Now let's see how we can achieve the same with the help of LINQ?
LINQ already has a Distinct method that we can use but, unfortunately not all LINQ methods could be translated to SQL. For that reason the code examples below are separated in two sections - LINQ queries that can only be executed in memory and LINQ queries where SQL server execution is possible.
In the approach you will need a IEqualityComparer class that will describe how two Customer records are compared by one another.
var context = new SupermarketContext();
using (context)
{
var customers = context.Customer.ToList().Distinct(new CustomerComparer());
}
public class CustomerComparer : IEqualityComparer<Customer>
{
public bool Equals(Customer x, Customer y)
{
if (x == null || y == null)
return false;
return x.City == y.City;
}
public int GetHashCode(Customer obj)
{
if (obj == null)
return 0;
// Combine hash codes for Name and City properties
int hashCity = obj.City == null ? 0 : obj.City.GetHashCode();
return hashCity;
}
}
DistinctBy is designed for use with one or multiple properties but cannot be translated directly into a SQL query.
var context = new SupermarketContext();
using (context)
{
var customers = context.Customer.ToList().DistinctBy(x => x.City);
}
You can even use DistinctBy with multiple properties like so
var context = new SupermarketContext();
using (context)
{
var customers = context.Customer.ToList().DistinctBy(x => new { x.City, x.Name });
}
By combining Select and Distinct you can accomplish server side filtering. It will work with both single property and multiple properties.
var queryString = context.Customer
.Select(c => new { c.City, c.Name })
.Distinct()
.ToQueryString();
SELECT DISTINCT [c].[City], [c].[Name] FROM [Customer] AS [c]
The result of this operation will be a list of anonymous type objects having two properties City and Name.
Now let's say that you don’t want just the City and Name but the entire Customer object. Consider this example table:
Id | Name | City | Address |
---|---|---|---|
1 | Alice Smith | New York | 123 Main St |
2 | Bob Johnson | San Francisco | 789 Pine Street |
3 | Eve Adams | Seattle | 456 Elm St |
4 | John Doe | Seattle | 143 Spruce St |
5 | John Doe | Seattle | 64 Strawberry St |
Running
var distinctCustomers = context.Customer
.Select(c => new { c.City, c.Name })
.Distinct();
will group records 4 and 5 into one consolidated result. If the goal is to take one customer representation for each distinct group, then we need to decide which exact customer will be that, based on some other criteria. I decided that I want the last customer to represent the group.
var distinctQuery = context.Customer
.GroupBy(x => new { x.Name, x.City })
.Select(g => g.OrderBy(x=>x.Id).Last()).ToQueryString();
SELECT [c3].[Id], [c3].[Address], [c3].[City], [c3].[Name]
FROM (
SELECT [c].[Name], [c].[City]
FROM [Customer] AS [c]
GROUP BY [c].[Name], [c].[City]
) AS [c1]
LEFT JOIN (
SELECT [c2].[Id], [c2].[Address], [c2].[City], [c2].[Name]
FROM (
SELECT [c0].[Id], [c0].[Address], [c0].[City], [c0].[Name], ROW_NUMBER() OVER(PARTITION BY [c0].[Name], [c0].[City] ORDER BY [c0].[Id] DESC) AS [row]
FROM [Customer] AS [c0]
) AS [c2]
WHERE [c2].[row] <= 1
) AS [c3] ON [c1].[Name] = [c3].[Name] AND [c1].[City] = [c3].[City]
To pull the last customer from each distinct group we have to order them by Id descending and take the first from each group.
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (PARTITION BY [City] ORDER BY [Id] DESC) AS ROW_NUM FROM [dbo].[Customer]) AS City_Partition
WHERE City_Partition.ROW_NUM = 1
var customers = context.Customer
.FromSql($"SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (PARTITION BY [City] ORDER BY [Id] DESC) AS ROW_NUM FROM [dbo].[Customer]) AS City_Partition
WHERE City_Partition.ROW_NUM = 1")
.ToList();