Recently I encountered a case where Linq will fall over flatly in terms of performance compared to running your own optimisations.
The case in point is where data is already sorted.
In this scenario I had to retrieve from a data dump of a simulation different values to make meaningful analysis of the drive data. In this case the Frame data was sorted based on duration and distance traveled.
I was using Linq to isolate subsections of the drive for analysis, for example:
var framesBetween = from p in frames where p.SceneData.DistanceTravelled >= bucketStartDistance && p.SceneData.DistanceTravelled < bucketEndDistance select p;
The issue with the above statement, when using standard off the shelf List and ObservableCollection's inside .NET is that the Linq processor seems to be going through the entire array or list before returning the data set. This approach kind of makes sense since .NET has no awareness that the data is already sorted, and therefore would have to scan the entire array.
I ended up having to role out of my own solution which essentially did a for loop where I would exit from the loop when the required distance is reached.
FrameCollection outputCollection = new FrameCollection();
bool inState = true;
for (; startIndex < masterCollection.Count; ++startIndex)
{
Frame frame = masterCollection[startIndex];
if (frame.SceneData.DistanceTravelled >= distanceStart)
{
inState = true;
}
if (inState == true)
{
outputCollection.Add(frame);
}
if (includeFinalFrame == false && frame.SceneData.DistanceTravelled >= distanceEnd)
{
break;
}
if (includeFinalFrame == true && frame.SceneData.DistanceTravelled > distanceEnd)
{
break;
}
}
return outputCollection;
One thing I will investigate and put into a later article is if LINQ will optimise for sorted data in the case where you are using a container type such as SortedList<T> and take into account the above optimisations.