# Performance differences when using AVX instructions

Recent news on exploits on both Meltdown and Spectre got me thinking and researching a bit more in depth on Assembly. I ended up reading on the differences and performance gains when using SIMD instructions versus naive implementations. Let’s briefly discuss what SIMD is.

SIMD (Single instruction, multiple data) is the process of piping vector data through a single instruction, effectively speeding up the calculations significantly. Given that SIMD instruction can process larger amount of data in parallel atomically, SIMD does provide a significant performance boost when used. Real-Life applications of SIMD are various, ranging from image processing, audio processing and graphics generation.

Let’s investigate the real performance gains when using SIMD instructions – in this case we’ll be using AVX (Advanced Vector Extensions), which provides newer SIMD instructions. We’ll be using several SIMD instructions, such as VADDPS. VSUBPS, VMULPS, and VDIVPS. Each instruction is responsible for adding, subtracting, multiplying and dividing single precision numbers (floats).

In reality, we will not be writing any Assembly at all, we’ll be using Intrinsics, which ship directly with any decent C/C++ compiler. For our example, we’ll be using MSVC compiler, but any decent compiler will do. The Intel Intrinsics Guide provides a very good platform to look up any required intrinsic functions one may need, thus removing the need to write Assembly, just C code.

There are two benchmarks for each arithmetic operation: one is done naively and one is done using intrinsics thus using the necessary AVX instruction. Each operation is  performed 200,000,000 times thus to make sure that there is enough time to demonstrate it for a benchmark,

Here’s an example of how the multiplication is implemented naively:

```void DoNaiveMultiplication(int iterations)
{
float z;

for (int i = 0; i < iterations; i++)
{
z = x * y;
z = x * y;
z = x * y;
z = x * y;
z = x * y;
z = x * y;
z = x * y;
z = x * y;
}
}
```

Here's an example of how the multiplication is implemented in AVX:

void DoAvxMultiplication(int iterations)
{
__m256 result;

for (int i = 0; i < iterations; i++)
{
result = _mm256_mul_ps(x256, y256);
}
}

Finally, let's take a look on how the results look: 