Google Hash Code 2018 solution and source code – 1st in Malta and top 20% worldwide

Our solution: https://github.com/albertherd/Hash-Code-2018-Team-Stark

I recently had the honour to team up with my friends (and co-workers) to have a crack at the Google Hash Code 2018 online qualification round. We went to compete with little to no expectation. In fact, by the first two hours of the competition, we had almost nothing!

Google Hash Code is a team based competition, where teams all over the world gather together to participate and solve a Google engineering problem, using a language of their choice.

The problem was: Given a set of rides and a number of cars in a city grid, assign rides to cars, and serve the rides in a restricted time. There are also bonuses if you manage to start the ride as early as possible. Here is the Problem statement, input data set A,  input data set B input data set C  input data set D,  input data set E,

Our solution is not the cleanest solution, I must admit. One has to keep in mind that when undertaking these competitions, the biggest challenge is to come up with a decent solution the problem in a very short amount of time, rather than creating a complex solution which does solves the problem perfectly.

However imperfect, hacks-infested and dirty, it managed to place us first in the Maltese islands, and in the top 20% worldwide (more specifically, 895th).

Capture

Our solution does not make use of a Genetic Algorithm; we tried to create a Greedy Algorithm with several optimizations in order to boost the quality of the output. Optimizations include:

  • Having the main loop revolve around steps, rather than cars or rides (much like a game engine main loop).
  • Sort all the initial rides by start time.
  • Discard any rides that have been already undertaken.
  • When trying to pick out a ride, we make sure that given current time (or step), we check if the ride can actually be served on time. Using this thought, we’ve managed to produce output with 0 late rides across the board!
  • Get the first feasible N rides – sort them out by closest, then by travelling and undertake that ride. Since each ride yielded the same bonus, irrespective of the distance traveled, the shortest rides were always the most profitable.
  • Other minor optimizations.

There are multiple optimizations and enhancements that we wished to include, such as removing those impossible rides from the data set as we go along, and distributing each car / ride according to a “zone”. In fact, the code does have a Zone class, which never got used.

Have a look at our solution : https://github.com/albertherd/Hash-Code-2018-Team-Stark

Until the next one!

Performance differences when using AVX instructions

Download source code from here

Recent news on exploits on both Meltdown and Spectre got me thinking and researching a bit more in depth on Assembly. I ended up reading on the differences and performance gains when using SIMD instructions versus naive implementations. Let’s briefly discuss what SIMD is.

SIMD (Single instruction, multiple data) is the process of piping vector data through a single instruction, effectively speeding up the calculations significantly. Given that SIMD instruction can process larger amount of data in parallel atomically, SIMD does provide a significant performance boost when used. Real-Life applications of SIMD are various, ranging from image processing, audio processing and graphics generation.

Let’s investigate the real performance gains when using SIMD instructions – in this case we’ll be using AVX (Advanced Vector Extensions), which provides newer SIMD instructions. We’ll be using several SIMD instructions, such as VADDPS. VSUBPS, VMULPS, and VDIVPS. Each instruction is responsible for adding, subtracting, multiplying and dividing single precision numbers (floats).

In reality, we will not be writing any Assembly at all, we’ll be using Intrinsics, which ship directly with any decent C/C++ compiler. For our example, we’ll be using MSVC compiler, but any decent compiler will do. The Intel Intrinsics Guide provides a very good platform to look up any required intrinsic functions one may need, thus removing the need to write Assembly, just C code.

There are two benchmarks for each arithmetic operation: one is done naively and one is done using intrinsics thus using the necessary AVX instruction. Each operation is  performed 200,000,000 times thus to make sure that there is enough time to demonstrate it for a benchmark,

Here’s an example of how the multiplication is implemented naively:

void DoNaiveMultiplication(int iterations)
{
    float z[8];

    for (int i = 0; i < iterations; i++)
    {
        z[0] = x[0] * y[0];
        z[1] = x[1] * y[1];
        z[2] = x[2] * y[2];
        z[3] = x[3] * y[3];
        z[4] = x[4] * y[4];
        z[5] = x[5] * y[5];
        z[6] = x[6] * y[6];
        z[7] = x[7] * y[7];
    }
}

Here's an example of how the multiplication is implemented in AVX:

void DoAvxMultiplication(int iterations)
{
__m256 x256 = _mm256_loadu_ps((__m256*)x);
__m256 y256 = _mm256_loadu_ps((__m256*)y);
__m256 result;

for (int i = 0; i < iterations; i++)
{
result = _mm256_mul_ps(x256, y256);
}
}

Finally, let's take a look on how the results look:

naivevsavx

 

AVXPerformanceGains
Performance gains when using AVX

From the graph above, one can see that when optimizing from naive to AVX, there are the following gains:

  • Addition: 217% faster – from 1141ms to 359ms
  • Subtraction: 209% faster – from 1110ms to 359ms
  • Multiplication: 221% faster- from 1156ms to 360ms
  • Division: 300% faster – from 2687ms to 672ms

Of course, the benchmarks show the best case scenarios; so real-life mileage may vary. These benchmarks can be downloaded and tested out from here. Kindly note that you’ll need either an Intel CPU from 2011 onwards (Sandy Bridge), or an AMD processor from 2011 onwards (Bulldozer)  in order to be able to run the benchmarks.

 

 

Exception Filtering in C#

What do you do when you have a piece of code that can fail, and when it fails, you need to log to a database? You wrap your code in a try-catch block and chuck a Log call in the catch block. That’s all good! What if I tell you that there is a better way to do it?

try
{
    // Code that might fail
}
catch(Exception ex)
{
    // Handle
    // Log to database
}

What’s the problem with the typical approach?

When your code enters a catch block – the stack unwinds. This refers to the process when the stack goes backwards / upwards in order to arrive the stack frame where the original call is located. Wikipedia can explain this in a bit more detail. What this means is that we might lose information with regards to the original stack location and information. If a catch block is being entered just to log to the database and then the exception is being re-thrown, this means that we’re losing vital information to discover where the issue exists; this is especially true in release / live environments.

What’s the way forward?

C# 6 offers the Exception Filtering concept; here’s how to use it.

try
{
    //Code
}
catch (FancyException fe) when (fe.ErrorCode > 0)
{
    //Handle
}

The above catch block won’t be executed if the ErrorCode property of the exception is not greater than zero. Brilliant, we can now introduce logic without interfering with the catch mechanism and avoiding stack unwinding!

A more advanced example

Let’s now go and see a more advanced example. The application below accepts input from the Console – when the input length is zero, an exception with code 0 is raised, else an exception with code 1 is raised. Anytime an exception is raised, the application logs it. Though, the exception is only caught if only if the ErrorCode is greater than 0. The complete application is on GitHub.


class Program
{
    static void Main(string[] args)
    {
        while (true)
        {
            new FancyRepository().GetCatchErrorGreaterThanZero(Console.ReadLine());
        }
    }
}

public class FancyRepository
{
    public string GetCatchErrorGreaterThanZero(string value)
    {
        try
        {
            return GetInternal(value);
        }
        catch (FancyException fe) when (LogToDatabase(fe.ErrorCode) || fe.ErrorCode > 0)
        {
            throw;
        }
    }

    private string GetInternal(string value)
    {
        if (!value.Any())
           throw new FancyException(0);

        throw new FancyException(1);
    }

    private bool LogToDatabase(int errorCode)
    {
        Console.WriteLine($"Exception with code {errorCode} has been logged");
        return false;
    }
}

 

1st Scenario – Triggering the filter

In the first scenario, when the exception is thrown by the GetInternal method, the filter successfully executes and prevents the code from entering the catch statement. This can be illustrated by the fact that Visual Studio breaks in the throw new FancyException(0); line rather than in the throw; line. This means that the stack has not been unwound; this can be proven by the fact that we can still investigate the randomNumber value. The Call Stack is fully preserved – we can go through each frame and investigate the data in each call stack.

1

2nd Scenario – Triggering the catch

In the second scenario, when the exception is thrown by the GetInternal method, the filter does not handle it due to the ErrorCode is greater than 0. This means that the catch statement is executed and the error is re-thrown. In the debugger, we can see this due to the fact that Visual Studio break in the throw; line rather than the throw new FancyException(1); line. This means that we’ve lost a stack frame; it is impossible to investigate the randomNumber value, since the stack has been unwound to the GetCatchErrorGreaterThanZero call.

2catch

What’s happening under the hood?

As one can assume, the underlying code being generated must differ at an IL level, since the stack is not being unwound. And one would assume right – the when keyword is being translated into the filter instruction.

Let’s take two try-catch blocks, and see their equivalent IL.


try
{
    throw new Exception();
}
catch(Exception ex)
{

}

Generates

 .try
 {
     IL_0003: nop
     IL_0004: newobj instance void [mscorlib]System.Exception::.ctor()
     IL_0009: throw
 } // end .try
 catch [mscorlib]System.Exception
 {
     IL_000a: stloc.1
     IL_000b: nop
     IL_000c: nop
     IL_000d: leave.s IL_000f
 } // end handler

The next one is just like the previous, but it introduces a filter to check on some value on whether it’s equivalent to 1.


try
{
    throw new Exception();
}
catch(Exception ex) when(value == 1)
{

}

Generates

.try
 {
     IL_0010: nop
     IL_0011: newobj instance void [mscorlib]System.Exception::.ctor()
     IL_0016: throw
 } // end .try
 filter
 {
     IL_0017: isinst [mscorlib]System.Exception
     IL_001c: dup
     IL_001d: brtrue.s IL_0023
     IL_001f: pop
     IL_0020: ldc.i4.0
     IL_0021: br.s IL_002d
     IL_0023: stloc.2
     IL_0024: ldloc.0
     IL_0025: ldc.i4.1
     IL_0026: ceq
     IL_0028: stloc.3
     IL_0029: ldloc.3
     IL_002a: ldc.i4.0
     IL_002b: cgt.un
     IL_002d: endfilter
 } // end filter
 { // handler
     IL_002f: pop
     IL_0030: nop
     IL_0031: nop
     IL_0032: leave.s IL_0034
 } // end handler

Although the second example generates more IL (which is partly due the value checking), it does not enter the catch block! Interestingly enough the filter keyword, is not available in C# directly (only available through the use of the when keyword.

Credits

This blog post would have been impossible if readers of my blog did not provide me with the necessary feedback. I understand that the first version of this post was outright wrong. I’ve taken feedback received from my readers and changed it so now it delivers the intended message. I thank all the below people.

Rachel Farrell – Introduced me to the fact that the when keyword generates the filter IL rather than just being syntactic sugar.

Ben Camilleri – Pointed out that when catching the exception, the statement should be throw; instead of throw ex; to maintain the StackTrace property properly.

Cedric Mamo – Pointed out that the logic was flawed and provided the appropriate solution in order to successfully demonstrate it using Visual Studio.

Until the next one!

On the usage of bool in method parameters

The number of times that I encountered a piece of code like the following is quite alarming:

Transaction transaction =
    TransactionFactory.CreateTransaction(
        true,
        true,
        false,
        true);

What do those true, true, false, true mean? We’ll need to view the method signature each time we need to understand what those four booleans represent!  This means that anyone trying to skim through your code will have a bad time. Sidenote: if you’re writing method calls with such parameters, you seriously need to consider re-thinking such calls.

Let’s try to improve the readability of that code by a bit

Transaction transaction =
    TransactionFactory.CreateTransaction(
        true /* postInSage */,
        true /* isPaidInFull */,
        false /* recurringTransaction */,
        true /* sendEmailReceipt */);

What did we do here? We added some comments next to each boolean so that when reading the code, the reader can quickly identify what each boolean signifies. Neat, we’ve improved the readability by a lot! Microsoft developers seem to like doing it this way; a quick look at .NET Framework Source will show you some good examples, such as here, here and here.

But, what happens in case the order of the booleans change? Apart from breaking functionality, the comments will not update to reflect the new API call. As they say, comments lie, code never does.

Instead of opting to document the parameter names with comments, C# offers the facility of naming your parameters. This means that you can choose to ignore the order of the parameters, but affix the name of the parameter before the actual value. Let’s apply it to our example.

Transaction transaction =
    TransactionFactory.CreateTransaction(
        postInSage: true,
        isPaidInFull: true,
        recurringTransaction: false,
        sendEmailReceipt: true);

That’s looking great! We can even improve a bit by defaulting all boolean arguments to false, thus we’ll only pass those booleans which should be true.

Now, the method signature will look like this:

CreateTransaction(
    bool postInSage = false,
    bool isPaidInFull = false,
    bool recurringTransaction = false,
    bool sendEmailReceipt = false) 

The method call with look like this

Transaction transaction =
    TransactionFactory.CreateTransaction(
        postInSage: true,
        isPaidInFull: true,
        sendEmailReceipt: true); 

We can also take a totally different approach and eliminate the use of boolean parameters and introduce Enums, specifically Enum flags. This means that when we call the CreateTransaction method, we’ll simply pass the required flags. In case you forgot, here’s a quick refresher on how it works.  It will look something of the sort:

Transaction transaction =
    TransactionFactory.CreateTransaction(
        TransactionFlags.PostInSage |
        TransactionFlags.IsPaidInFull |
        TransactionFlags.SendEmailReceipt);

Not bad! When you read that piece of code, you can easily identify any properties that should be taken into consideration when creating the transaction. We ended up eliminating the need of booleans, in favour of flags.

Does this means that booleans should never be used when dealing with parameters? Of course not. I just wanted to shed some light on the fact that there are better approaches than just writing and consuming APIs in a more readable fashion.