Performance differences when using AVX instructions

Download source code from here

Recent news about the Meltdown and Spectre exploits got me thinking and researching assembly in a bit more depth. I ended up reading about the differences and performance gains of SIMD instructions versus naive implementations. Let’s briefly discuss what SIMD is.

SIMD (Single Instruction, Multiple Data) is the technique of piping vector data through a single instruction, which speeds up calculations significantly. Since a single SIMD instruction processes a larger amount of data in parallel, it can provide a substantial performance boost. Real-life applications of SIMD are plentiful, ranging from image processing and audio processing to graphics generation.

Let’s investigate the real performance gains when using SIMD instructions – in this case we’ll be using AVX (Advanced Vector Extensions), which provides newer SIMD instructions. We’ll be using several of them, namely VADDPS, VSUBPS, VMULPS and VDIVPS, which respectively add, subtract, multiply and divide single-precision numbers (floats).

In reality, we will not be writing any assembly at all; we’ll be using intrinsics, which ship with any decent C/C++ compiler. For our example we’ll be using the MSVC compiler, but any decent compiler will do. The Intel Intrinsics Guide is an excellent reference for looking up whichever intrinsic functions you need, removing the need to write assembly – just C code.

There are two benchmarks for each arithmetic operation: one implemented naively and one implemented with intrinsics, thus using the corresponding AVX instruction. Each operation is performed 200,000,000 times, so that the benchmark runs long enough to show a measurable difference.

Here’s an example of how the multiplication is implemented naively:

// x and y are float[8] arrays defined elsewhere in the benchmark source
void DoNaiveMultiplication(int iterations)
{
    float z[8];

    for (int i = 0; i < iterations; i++)
    {
        z[0] = x[0] * y[0];
        z[1] = x[1] * y[1];
        z[2] = x[2] * y[2];
        z[3] = x[3] * y[3];
        z[4] = x[4] * y[4];
        z[5] = x[5] * y[5];
        z[6] = x[6] * y[6];
        z[7] = x[7] * y[7];
    }
}

Here's an example of how the multiplication is implemented in AVX:

void DoAvxMultiplication(int iterations)
{
    // Load eight packed single-precision floats from the global x and y arrays
    // (_mm256_loadu_ps takes a float pointer; no cast is needed)
    __m256 x256 = _mm256_loadu_ps(x);
    __m256 y256 = _mm256_loadu_ps(y);
    __m256 result;

    for (int i = 0; i < iterations; i++)
    {
        // VMULPS: multiplies all eight floats in a single instruction
        result = _mm256_mul_ps(x256, y256);
    }
}
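
Incidentally, the same instruction is reachable from C# these days through the System.Runtime.Intrinsics APIs (available since .NET Core 3.0). This was not part of the original benchmarks, but here’s a minimal sketch of the multiplication case:

using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

class AvxMultiplySketch
{
    static void Main()
    {
        if (!Avx.IsSupported)
        {
            Console.WriteLine("AVX is not supported on this CPU");
            return;
        }

        // Two vectors of eight single-precision floats each
        Vector256<float> x = Vector256.Create(1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f);
        Vector256<float> y = Vector256.Create(8f, 7f, 6f, 5f, 4f, 3f, 2f, 1f);

        // Compiles down to VMULPS: all eight multiplications happen in one instruction
        Vector256<float> result = Avx.Multiply(x, y);

        Console.WriteLine(result);
    }
}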

Finally, let’s take a look at the results:

[Image: naive vs AVX benchmark results]

[Chart: Performance gains when using AVX]

From the chart above, one can see the following gains when moving from the naive implementation to AVX:

  • Addition: 217% faster – from 1141ms to 359ms
  • Subtraction: 209% faster – from 1110ms to 359ms
  • Multiplication: 221% faster – from 1156ms to 360ms
  • Division: 300% faster – from 2687ms to 672ms

Of course, the benchmarks show best-case scenarios, so real-life mileage may vary. The benchmarks can be downloaded and tested from here. Kindly note that you’ll need either an Intel CPU from 2011 onwards (Sandy Bridge) or an AMD processor from 2011 onwards (Bulldozer) in order to run them.


Exception Filtering in C#

What do you do when you have a piece of code that can fail, and when it fails, you need to log to a database? You wrap your code in a try-catch block and chuck a Log call in the catch block. That’s all good! What if I told you that there is a better way to do it?

try
{
    // Code that might fail
}
catch(Exception ex)
{
    // Handle
    // Log to database
}

What’s the problem with the typical approach?

When your code enters a catch block, the stack unwinds. This refers to the process where the stack goes backwards / upwards in order to arrive at the stack frame where the original call is located. Wikipedia explains this in a bit more detail. What this means is that we may lose information about the original stack location. If a catch block is entered just to log to the database and the exception is then re-thrown, we lose vital information about where the issue actually originated; this is especially true in release / live environments.
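
To make this concrete, here’s the typical log-and-rethrow shape (DoWork and LogExceptionToDatabase are placeholder methods, not part of the original example):

try
{
    DoWork();
}
catch (Exception ex)
{
    LogExceptionToDatabase(ex); // by the time we get here, the stack has already unwound
    throw;                      // re-throws, but the debugger now breaks on this line,
                                // not on the statement inside DoWork() that actually failed
}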

What’s the way forward?

C# 6 offers the Exception Filtering concept; here’s how to use it.

try
{
    //Code
}
catch (FancyException fe) when (fe.ErrorCode > 0)
{
    //Handle
}

The above catch block won’t be executed if the ErrorCode property of the exception is not greater than zero. Brilliant – we can now introduce logic without interfering with the catch mechanism and without unwinding the stack!

A more advanced example

Let’s now look at a more advanced example. The application below accepts input from the Console – when the input length is zero, an exception with code 0 is raised, otherwise an exception with code 1 is raised. Any time an exception is raised, the application logs it. However, the exception is only caught if the ErrorCode is greater than 0. The complete application is on GitHub.


class Program
{
    static void Main(string[] args)
    {
        while (true)
        {
            new FancyRepository().GetCatchErrorGreaterThanZero(Console.ReadLine());
        }
    }
}

public class FancyRepository
{
    public string GetCatchErrorGreaterThanZero(string value)
    {
        try
        {
            return GetInternal(value);
        }
        // LogToDatabase always returns false, so it runs for every FancyException
        // without ever satisfying the filter on its own
        catch (FancyException fe) when (LogToDatabase(fe.ErrorCode) || fe.ErrorCode > 0)
        {
            throw;
        }
    }

    private string GetInternal(string value)
    {
        if (!value.Any())
           throw new FancyException(0);

        throw new FancyException(1);
    }

    private bool LogToDatabase(int errorCode)
    {
        Console.WriteLine($"Exception with code {errorCode} has been logged");
        return false;
    }
}


1st Scenario – Triggering the filter

In the first scenario, when the exception is thrown by the GetInternal method, the filter executes and prevents the code from entering the catch statement. This is illustrated by the fact that Visual Studio breaks on the throw new FancyException(0); line rather than on the throw; line, meaning the stack has not been unwound; we can prove this by the fact that we can still investigate the randomNumber value (a local variable in the complete sample). The call stack is fully preserved – we can go through each frame and inspect the data in each one.


2nd Scenario – Triggering the catch

In the second scenario, when the exception is thrown by the GetInternal method, the filter passes because the ErrorCode is greater than 0. This means that the catch statement executes and the exception is re-thrown. In the debugger, we can see this because Visual Studio breaks on the throw; line rather than on the throw new FancyException(1); line. This means that we’ve lost a stack frame; it is impossible to investigate the randomNumber value, since the stack has been unwound up to the GetCatchErrorGreaterThanZero call.


What’s happening under the hood?

As one might assume, the generated code must differ at the IL level, since the stack is not being unwound. And one would assume right – the when keyword is translated into an IL filter clause.

Let’s take two try-catch blocks, and see their equivalent IL.


try
{
    throw new Exception();
}
catch(Exception ex)
{

}

Generates

 .try
 {
     IL_0003: nop
     IL_0004: newobj instance void [mscorlib]System.Exception::.ctor()
     IL_0009: throw
 } // end .try
 catch [mscorlib]System.Exception
 {
     IL_000a: stloc.1
     IL_000b: nop
     IL_000c: nop
     IL_000d: leave.s IL_000f
 } // end handler

The next one is just like the previous, but it introduces a filter that checks whether some value is equal to 1.


try
{
    throw new Exception();
}
catch(Exception ex) when(value == 1)
{

}

Generates

.try
 {
     IL_0010: nop
     IL_0011: newobj instance void [mscorlib]System.Exception::.ctor()
     IL_0016: throw
 } // end .try
 filter
 {
     IL_0017: isinst [mscorlib]System.Exception
     IL_001c: dup
     IL_001d: brtrue.s IL_0023
     IL_001f: pop
     IL_0020: ldc.i4.0
     IL_0021: br.s IL_002d
     IL_0023: stloc.2
     IL_0024: ldloc.0
     IL_0025: ldc.i4.1
     IL_0026: ceq
     IL_0028: stloc.3
     IL_0029: ldloc.3
     IL_002a: ldc.i4.0
     IL_002b: cgt.un
     IL_002d: endfilter
 } // end filter
 { // handler
     IL_002f: pop
     IL_0030: nop
     IL_0031: nop
     IL_0032: leave.s IL_0034
 } // end handler

Although the second example generates more IL (which is partly due to the value check), it does not enter the catch block! Interestingly enough, the filter construct is not available in C# directly – it is only reachable through the when keyword.

Credits

This blog post would have been impossible without the feedback that readers of my blog provided. The first version of this post was outright wrong; I’ve taken the feedback received and reworked the post so that it now delivers the intended message. My thanks go to the people below.

Rachel Farrell – Introduced me to the fact that the when keyword generates the filter IL rather than just being syntactic sugar.

Ben Camilleri – Pointed out that when catching the exception, the statement should be throw; instead of throw ex; to maintain the StackTrace property properly.

Cedric Mamo – Pointed out that the logic was flawed and provided the appropriate solution in order to successfully demonstrate it using Visual Studio.

Until the next one!

Code never lies, Documentation sometimes does!

Lately, I was working on a Windows Service in C#. Having never written one in C#, I thought I’d go through the documentation that Microsoft provides. I went through it in quite a breeze; my service was running in no time.

I then added some initialisation code, which means that service startup is no longer instant. No problem with that; in fact, the documentation has a section dedicated to exactly this scenario. The service can report what state its initialisation is in. Unfortunately, C# does not provide this functionality out of the box; you’ll have to call the native API to do so (through the SetServiceStatus call).
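
Calling it from C# boils down to a P/Invoke declaration along these lines (this mirrors the shape of Microsoft’s example; MyService is a placeholder name, and the ServiceStatus struct is the one discussed below):

using System;
using System.Runtime.InteropServices;
using System.ServiceProcess;

public class MyService : ServiceBase
{
    // SetServiceStatus lives in advapi32.dll and takes the service handle
    // plus a pointer to a SERVICE_STATUS structure
    [DllImport("advapi32.dll", SetLastError = true)]
    private static extern bool SetServiceStatus(IntPtr handle, ref ServiceStatus serviceStatus);

    protected override void OnStart(string[] args)
    {
        // Tell the Service Control Manager that start-up is in progress and may take a while
        ServiceStatus status = new ServiceStatus();
        status.dwCurrentState = ServiceState.SERVICE_START_PENDING;
        status.dwWaitHint = 100000;
        SetServiceStatus(this.ServiceHandle, ref status);

        // ... initialisation ...

        status.dwCurrentState = ServiceState.SERVICE_RUNNING;
        SetServiceStatus(this.ServiceHandle, ref status);
    }
}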

As I was going through the C# documentation of the struct that it accepts, I noticed that it does not match the documentation for the native API. The C# documentation says that it accepts long (64-bit) fields, whilst the native API says that it accepts DWORD (32-bit) fields. This got me thinking; is the C# documentation wrong?

I whipped up two applications: one in C++ and one in C#. I checked the size, in bytes, of the SERVICE_STATUS structure that SetServiceStatus expects. The answer was 28 bytes, which makes sense given that it consists of 7 DWORDs (32-bit each) – 7 * 4 = 28 bytes.

// Requires <windows.h> and <iostream>; prints 28
size_t sizeOfServiceStatus = sizeof(SERVICE_STATUS);
cout << "Size: " << sizeOfServiceStatus << endl;

The C# application consists of copying and pasting the example from Microsoft’s documentation. Checking the ServiceStatus struct’s size showed 56 bytes! Again, this was not surprising, since it consists of 6 longs (64-bit each) plus the ServiceState enum (which defaults to int, 32-bit) plus an additional 32 bits of padding – (6 * 8) + 4 + 4 = 56. Therefore, the resultant struct is 56 bytes instead of 28 bytes!

int size = Marshal.SizeOf(typeof(ServiceStatus));
Console.WriteLine("Size: " + size);

Unfortunately, this will still appear to work in code, but the outcome of the call is effectively undefined, since the data being fed in is completely out of alignment. To make matters worse, pinvoke.net reports the same signature as Microsoft does, which threw me off in the beginning as well.

Naturally, fixing this issue is trivial; it’s just a matter of converting all longs to uint (since a DWORD is a 32-bit unsigned integer). The example should therefore look like the following:

public enum ServiceState
{
    SERVICE_STOPPED = 0x00000001,
    SERVICE_START_PENDING = 0x00000002,
    SERVICE_STOP_PENDING = 0x00000003,
    SERVICE_RUNNING = 0x00000004,
    SERVICE_CONTINUE_PENDING = 0x00000005,
    SERVICE_PAUSE_PENDING = 0x00000006,
    SERVICE_PAUSED = 0x00000007,
}

[StructLayout(LayoutKind.Sequential)]
public struct ServiceStatus
{
    public uint dwServiceType;
    public ServiceState dwCurrentState;
    public uint dwControlsAccepted;
    public uint dwWin32ExitCode;
    public uint dwServiceSpecificExitCode;
    public uint dwCheckPoint;
    public uint dwWaitHint;
};
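
With all fields 32 bits wide, the same size check from earlier now reports the expected value:

// Should now print 28, matching the native SERVICE_STATUS
int size = Marshal.SizeOf(typeof(ServiceStatus));
Console.WriteLine("Size: " + size);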


On the usage of ‘out’ parameters

The other day, I was discussing with a colleague whether or not the usage of out parameters is OK. If I’m honest, I immediately cringed, as I am not really a fan of the keyword. But first, let’s briefly discuss how the ‘out’ parameter keyword works.

In C#, the ‘out’ keyword allows a method to return multiple values: the method can return data using the ‘return’ statement and also modify values through its ‘out’ parameters. Why did I say modify instead of return when referring to ‘out’? Simple: what ‘out’ does is receive a pointer to the data structure, then dereference it and apply the value whenever a new value is assigned. In other words, the ‘out’ keyword introduces the concept of pointers.
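
As a minimal illustration of that write-through behaviour (SetToFive is a made-up method, not anything from the framework):

using System;

class OutSketch
{
    // The callee must assign the out parameter before returning;
    // the assignment writes straight through to the caller's variable
    static void SetToFive(out int value)
    {
        value = 5;
    }

    static void Main()
    {
        int number;                // no need to initialise; 'out' guarantees assignment
        SetToFive(out number);
        Console.WriteLine(number); // prints 5
    }
}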

OK, that explanation may not make much sense if you do not have any experience with unmanaged languages and pointers. And that’s exactly the main problem with the ‘out’ parameter: it introduces pointer semantics without the user’s awareness.

Let’s now talk about the pattern and architecture of the ‘out’ parameter. As we said earlier, the ‘out’ keyword is used to allow a method to return multiple values. An ‘out’ parameter guarantees that the value will be initialised by the callee, and the callee does not expect the value passed as the out parameter to be initialised. Let’s see an example:

User GetUser(int id, out string errorMessage)
{
    // User is fetched from database
    // errorMessage is set if there was an error fetching the user
}

This may be used as follows:

string errorMessage;
User user = GetUser(1, out errorMessage);

By the way, C# 7 now allows the out variable to be declared inline, looking something like this:

User user = GetUser(1, out string errorMessage);

This can easily be refactored so that the message is returned encapsulated within the same object. It may look something like the below:

class UserWithError
{
    public User User { get; set; }
    public string ErrorMessage { get; set; }
}

UserWithError GetUser(int id)
{
    // User is fetched from database
    // ErrorMessage is set if there was an error fetching the user
}

Let’s quickly go through the problems that the ‘out’ keyword exposes. Firstly, it’s not easy to discard the return value: with a normal return value we can simply call GetUser and ignore the result, but with the out parameter we still have to pass a string to capture the error message, even when we don’t need it. Secondly, declaration is a bit more cumbersome, since it needs to happen on more than one line (we need to declare the out variable first); although this was fixed in C# 7, there are a lot of code-bases which are not running C# 7 yet. Lastly, out parameters prevent the method from being marked as async.

By the way, ‘out’ parameters also raise a Code Analysis warning, as per Microsoft’s design guidelines.

The last thing I want to mention is the use of the ‘out’ keyword in the Try pattern, which returns a bool as the return type and sets a value using the ‘out’ keyword. This is the only widely accepted pattern which makes use of the ‘out’ keyword.

int amount;
if (Int32.TryParse(amountAsString, out amount))
{
    // amountAsString was indeed an integer
}

Long story short: if you want a method to return multiple values, wrap them in a class; don’t use the ‘out’ keyword.

Run your C# code instantly in Visual Studio (2015 and up)

A lesser-known trick introduced in Visual Studio 2015 (Update 1) is that you can instantly run C# code without having to create a dummy project. The new Roslyn compiler introduced the C# Interactive shell, a REPL engine which provides instant feedback on the input you give it. This means that you do not need a Main method or any other magic; just plug in your C# code and get feedback immediately.

In order to fire up the C# Interactive Shell, go to View -> Other Windows -> C# Interactive.

[Screenshot: Firing up the C# Interactive Shell]

The C# Interactive shell is equipped with many features that we are accustomed to from Visual Studio, such as syntax highlighting, code completion and IntelliSense.

[Screenshot: Sample code running in the C# Interactive shell]
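
A session looks something along these lines (hypothetical input typed straight into the window, with the shell echoing the result back):

> using System.Linq;
> var numbers = Enumerable.Range(1, 10);
> numbers.Where(n => n % 2 == 0).Sum()
30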

When you run the C# Interactive shell, by default it does not take the code that you’re currently editing into consideration; it behaves like a basic REPL engine, nothing more. Visual Studio does, however, provide functionality to run the shell in the context of the currently loaded project. To do that, right click the desired project and press “Initialize Interactive with Project”. Doing this allows the C# Interactive shell to work directly with the loaded project.

[Screenshot: Initialize Interactive with Project]

The C# Interactive shell provides a lot of functionality, such as seamless use of the async features. One must note that, obviously, code will still run synchronously. It also has several other features, which are thoroughly documented on Roslyn’s GitHub page.

One must note that this is NOT a replacement for the Immediate window. Whilst debugging a process, the only way to interact with it immediately is through the Immediate window; the Interactive shell does not work there. To be honest, this makes sense, since the C# Interactive shell is intended to run C# code instantly without requiring a running solution, unlike the Immediate window.

This feature is an ideal addition for any developer who needs to run some experimental / dirty code quickly, without any headache whatsoever. I used to use tools such as LINQPad (which has other uses too) or sites such as RexTester to try something out quickly. With this feature, such tools are not needed anymore!

Edit: Thanks for spotting the typo Christopher Demicoli!

On the usage of bool in method parameters

The number of times that I encountered a piece of code like the following is quite alarming:

Transaction transaction =
    TransactionFactory.CreateTransaction(
        true,
        true,
        false,
        true);

What do those true, true, false, true values mean? We need to view the method signature every time we want to understand what those four booleans represent! This means that anyone trying to skim through your code will have a bad time. Side note: if you’re writing method calls with such parameters, you seriously need to consider re-thinking them.

Let’s try to improve the readability of that code a bit.

Transaction transaction =
    TransactionFactory.CreateTransaction(
        true /* postInSage */,
        true /* isPaidInFull */,
        false /* recurringTransaction */,
        true /* sendEmailReceipt */);

What did we do here? We added a comment next to each boolean so that the reader can quickly identify what each one signifies. Neat – we’ve improved the readability by a lot! Microsoft developers seem to like doing it this way; a quick look at the .NET Framework reference source shows some good examples, such as here, here and here.

But what happens if the order of the booleans changes? Apart from breaking functionality, the comments will not update themselves to reflect the new call. As they say, comments lie, code never does.

Instead of documenting the parameter names with comments, C# offers the facility of naming your arguments. This means that you can choose to ignore the order of the parameters and affix each parameter’s name before the actual value. Let’s apply it to our example.

Transaction transaction =
    TransactionFactory.CreateTransaction(
        postInSage: true,
        isPaidInFull: true,
        recurringTransaction: false,
        sendEmailReceipt: true);

That’s looking great! We can improve it even a bit further by defaulting all boolean parameters to false, so that we only pass the booleans which should be true.

Now, the method signature will look like this:

Transaction CreateTransaction(
    bool postInSage = false,
    bool isPaidInFull = false,
    bool recurringTransaction = false,
    bool sendEmailReceipt = false)

The method call will look like this:

Transaction transaction =
    TransactionFactory.CreateTransaction(
        postInSage: true,
        isPaidInFull: true,
        sendEmailReceipt: true); 

We can also take a totally different approach: eliminate the boolean parameters altogether and introduce enums, specifically enum flags. This means that when we call the CreateTransaction method, we simply pass the required flags. In case you forgot, here’s a quick refresher on how flags work. The call will look something like this:

Transaction transaction =
    TransactionFactory.CreateTransaction(
        TransactionFlags.PostInSage |
        TransactionFlags.IsPaidInFull |
        TransactionFlags.SendEmailReceipt);

Not bad! When you read that piece of code, you can easily identify the properties that should be taken into consideration when creating the transaction. We ended up eliminating the need for booleans in favour of flags.
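
For reference, the flags enum backing that call might be declared along these lines (TransactionFlags being the hypothetical type used above):

[Flags]
public enum TransactionFlags
{
    None = 0,
    PostInSage = 1 << 0,
    IsPaidInFull = 1 << 1,
    RecurringTransaction = 1 << 2,
    SendEmailReceipt = 1 << 3
}

// Inside CreateTransaction, individual flags can then be tested, e.g.:
// if (flags.HasFlag(TransactionFlags.PostInSage)) { ... }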

Does this mean that booleans should never be used as parameters? Of course not. I just wanted to shed some light on approaches that make writing and consuming APIs more readable.

Can we be a bit more careful on how we use the Internal access modifier?

The other day, I was writing some SharePoint code and I required a RunWithElevatedPrivileges call. This call is normally accompanied by the creation of new SPSite and SPWeb objects, as demonstrated in the RunWithElevatedPrivileges MSDN excerpt shown below. What this code does exactly does not really matter for the sake of this post.


SPSecurity.RunWithElevatedPrivileges(delegate()
{
    using (SPSite site = new SPSite(web.Site.ID))
    {
        // implementation details omitted
    }
});

This is all fine and good, but I noticed that the project I’m working on already contains loads of RunWithElevatedPrivileges calls and the accompanying creation of new SPSite and SPWeb objects; thus I thought it would be great to have access to an overload of RunWithElevatedPrivileges that provides a callback with the SPSite and SPWeb as parameters, rather than creating them myself. I assumed this was probably offered by SharePoint, but a quick look at the public SharePoint API shows that it does not exist.

Then I thought: how is this possible? This is a common use case; somewhere in the SharePoint API, this ought to exist. So, grabbing ILSpy, I reflected the code and had a quick look. Unsurprisingly, I found the exact overload that I was looking for. Though, for some weird reason, it’s marked internal rather than public. Hold on a minute – why is this kind of API not public? This is not some internal abstraction; it’s an API that should be readily available to the developer.


// Microsoft.SharePoint.SPSecurity
internal static void RunWithElevatedSiteAndWeb(SPWeb originalWeb, SPSecurity.CodeToRunWithElevatedSite secureCode)
{
    if (originalWeb.CurrentUser != null && originalWeb.CurrentUser.ID == 1073741823 && !originalWeb.Site.HasAppPrincipalContext)
    {
        secureCode(originalWeb.Site, originalWeb);
        return;
    }
    SPSecurity.RunWithElevatedPrivileges(delegate
    {
        using (SPSite sPSite = new SPSite(originalWeb.Site.ID, originalWeb.Site.Zone))
        {
            using (SPWeb sPWeb = sPSite.OpenWeb(originalWeb.ID))
            {
                secureCode(sPSite, sPWeb);
            }
        }
    });
}

This made me think: can we be a bit more careful about how we use the internal access modifier? I understand that portions of the code should be private, since such code is only used within the same class to simplify the underlying implementation. But an API that is clearly useful to developers having a LOT of internal methods is a big no for me. It adds no business value to the API, just frustration for the end developer, who needs to re-implement (or copy) the same functionality in his own solution.
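
For what it’s worth, that re-implementation boils down to a small helper built purely on the public API – something along these lines (the class, method name and delegate choice are mine, not SharePoint’s):

using System;
using Microsoft.SharePoint;

public static class ElevationHelper
{
    // Runs the supplied callback with elevated SPSite / SPWeb objects,
    // mimicking the internal RunWithElevatedSiteAndWeb overload
    public static void RunWithElevatedSiteAndWeb(SPWeb originalWeb, Action<SPSite, SPWeb> secureCode)
    {
        SPSecurity.RunWithElevatedPrivileges(delegate ()
        {
            using (SPSite elevatedSite = new SPSite(originalWeb.Site.ID))
            using (SPWeb elevatedWeb = elevatedSite.OpenWeb(originalWeb.ID))
            {
                secureCode(elevatedSite, elevatedWeb);
            }
        });
    }
}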

Obviously, I am not saying that ALL internal methods are badly designed; if that were the case, the modifier would not exist at all. I’m saying that API developers should think twice before marking as internal an API which could clearly be used by 3rd party developers. Private methods are OK, but with internal methods, I think one needs to be a bit more careful about how they are used.

Or... maybe this is just one of the many, many quirks of the SharePoint API.