Finding the maximum value in an array using vectorization
This post is part of the series 'SIMD'. Be sure to check out the rest of the blog posts of the series!
- Faster Guid comparisons using Vectors (SIMD) in .NET
- Finding the maximum value in an array using vectorization (this post)
- Replace characters in a string using Vectorization
In the previous post, I described vectorization in .NET and how it can be useful to improve the performance of algorithms.
In this post, I'll explain how to use vectorization to find the maximum value of an array. The algorithm is already implemented in Enumerable.Max
. You can find the original code on GitHub. I simplified the code to make it more readable and added comments to explain how it works.
C#
internal static int MaxIntegerSimd(ReadOnlySpan<int> span)
{
int result;
int index;
if (Vector.IsHardwareAccelerated && span.Length >= Vector<int>.Count * 2)
{
// The span is at least two vectors long. Create a vector from the first N elements,
// and then repeatedly compare that against the next vector from the span. At the end,
// the resulting vector will contain the maximum values found, and we then need only
// to find the max of those.
var maxes = new Vector<int>(span);
index = Vector<int>.Count; // 8 when AVX2 is supported
do
{
// Vector.Max returns a new Vector with the maximum of each item:
// result[0] = max(left[0], right[0])
// result[1] = max(left[1], right[1])
// ...
// result[8] = max(left[8], right[8])
//
// Vector.Max is a single instruction (vpmaxsd), so it is much faster than
// comparing each items manually in a for loop
// https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_max_epi32&expand=3327&ig_expand=4537
//
// For instance,
// left [2, 5, 8, 4, 8, 8, 6, 2]
// right [8, 7, 1, 3, 9, 9, 0, 8]
// result [8, 7, 8, 4, 9, 9, 6, 8]
var right = new Vector<int>(span.Slice(index));
maxes = Vector.Max(maxes, right);
// Continue with the next 8 items of the span
index += Vector<int>.Count;
}
while (index + Vector<int>.Count <= span.Length);
// The "maxes" vector contains the 8 highest values.
// We need to find the maximum values from the 8 values.
result = maxes[0];
for (int i = 1; i < Vector<int>.Count; i++)
{
if (maxes[i] > result)
{
result = maxes[i];
}
}
}
else
{
// The vectorization is not possible => use the basic implementation
result = span[0];
index = 1;
}
// Iterate through the remaining elements, comparing against the max.
// This is needed when the number of items in the collection is not a
// multiple of Vector<int>.Count.
// The for loop is fast as the number of iteration is less than Vector<int>.Count.
// note: the uint cast is a trick to help the JIT removing bound checks.
for (int i = index; (uint)i < (uint)span.Length; i++)
{
if (span[i] > result)
result = span[i];
}
return result;
}
#Benchmark
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22616
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100-preview.4.22252.9
[Host] : .NET 7.0.0 (7.0.22.22904), X64 RyuJIT
DefaultJob : .NET 7.0.0 (7.0.22.22904), X64 RyuJIT
Method | N | Mean | Error | StdDev | Ratio |
---|---|---|---|---|---|
Simple | 8 | 4.201 ns | 0.0600 ns | 0.0532 ns | 1.00 |
Vectorized | 8 | 5.631 ns | 0.0747 ns | 0.0699 ns | 1.34 |
Simple | 16 | 7.483 ns | 0.1118 ns | 0.1046 ns | 1.00 |
Vectorized | 16 | 4.126 ns | 0.0398 ns | 0.0372 ns | 0.55 |
Simple | 100 | 49.090 ns | 0.9797 ns | 1.5252 ns | 1.00 |
Vectorized | 100 | 9.330 ns | 0.1296 ns | 0.1212 ns | 0.19 |
Simple | 1000 | 443.727 ns | 6.9752 ns | 6.5246 ns | 1.00 |
Vectorized | 1000 | 65.149 ns | 0.6617 ns | 0.6189 ns | 0.15 |
Simple | 10000 | 4,369.990 ns | 51.4527 ns | 45.6115 ns | 1.00 |
Vectorized | 10000 | 572.290 ns | 5.7642 ns | 5.3918 ns | 0.13 |
#Additional resources
- Faster Guid comparisons using Vectors (SIMD) in .NET
- Enumerable.MaxInteger
- Exploring .NET Core platform intrinsics: Part 3 - Viewing the code generated by the JIT
Do you have a question or a suggestion about this post? Contact me!
Enjoy this blog?💖 Sponsor on GitHub