Let’s face it: one of the biggest problems we can have when dealing with data in any programming language that has a concept of “null” is ending up, by accident, with a null reference error of some kind, because we did not properly guard against the nullability of the element we have to process.
Null values can come from anywhere, for many different reasons, so we must often think about the process of how we got the data that is provided as a parameter of the function we’re writing, think about how we will manage that nullity if it happens, etc. Usually, you’ll end up with code that looks like this:
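As a minimal sketch of the kind of function I have in mind, consider formatting a piece of text with a formatter, where either one may be null. The names ITextFormatter, UpperCaseFormatter, and FormatInput are hypothetical, made up purely for illustration:

```csharp
using System;

// Hypothetical formatter abstraction, made up for this illustration.
public interface ITextFormatter
{
    string Format(string input);
}

// Hypothetical implementation, only here so the sketch can be exercised.
public sealed class UpperCaseFormatter : ITextFormatter
{
    public string Format(string input)
    {
        return input.ToUpperInvariant();
    }
}

public static class GuardedFormatting
{
    public static string FormatInput(string input, ITextFormatter formatter)
    {
        // Guard 1: no input, nothing to format.
        if (input == null)
        {
            return null;
        }

        // Guard 2: no formatter available, return the input untouched.
        if (formatter == null)
        {
            return input;
        }

        // The only line that does actual work.
        return formatter.Format(input);
    }
}
```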
This is such a short amount of code, and already, about half of my function is dealing with the potential nullity of my inputs…
Today, in this article, we will look at one way to reduce the amount of null guarding: implementing a short-and-sweet monadic “bind” function for the nullable data types in C#. Using a functional approach is obviously not the only way to deal with this kind of situation, but I particularly like functional programming, and I find that extension methods in C# are especially well suited for this task. You will not get a dedicated syntax like you would in Haskell (sadly, the >>= operator cannot be overloaded, the >> behind it must always have an integer on the right side, and, at least so far, there is no expression equivalent to do), but you’ll still get a pretty well-streamlined way of chaining operations that can return null values, while having functions that can feel safe about their inputs.
This article is not about monads, even though we’re using the concept behind the scenes. I really want to focus on the nullable type, not on the generic idea, so I’ll define a monad very simply here, so that you can follow along even if you’re not familiar with it. A monad is a container with two properties: it can be created, and it has a function called bind, which receives an extra function that turns the value stored in the container into another container, potentially of a different type. By container, I mean a generic type that may or may not hold instance(s) of its generic type inside it. In that regard, List<> is a container. Nullable<> is another one. So the idea is that the bind function is allowed to process the data held by these containers directly, but only by returning the result in a container itself, to maintain the contract.
So why do we care? Because it lets users of that monadic container write functions that leave out the basic checks. In the case of List<>, bind (which is called SelectMany in LINQ) helps you write less code by iterating over its content for you, while also making sure your function will not be called if there is no data in the container. In the case of nullable data types, the bind we will make today will help you write more secure functions without the hassle of verifying nullability yourself, as the check will be encapsulated in the bind function directly.
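To make the List<> case concrete, here is a minimal sketch in which SelectMany plays the role of bind. Words and AllWords are hypothetical helpers I made up for the example; only SelectMany itself comes from LINQ. Words takes a plain string and returns a container (a list), and SelectMany takes care of the iteration and of flattening the results:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListBindDemo
{
    // Hypothetical helper: string -> List<string>.
    // It never has to care whether the outer list is empty.
    public static List<string> Words(string line)
    {
        return line
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .ToList();
    }

    // List<string> -> (string -> List<string>) -> List<string>
    // SelectMany calls Words once per element and flattens the results.
    // With an empty list, Words is simply never called.
    public static List<string> AllWords(List<string> lines)
    {
        return lines.SelectMany(line => Words(line)).ToList();
    }
}
```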
How secure the functions can be largely depends on the type system of your language. In C#, we’re halfway there: value types (declared with struct) are not nullable by default, so their nullability must be explicitly typed, while reference types (declared with class) are always nullable implicitly (null is even their default value!). Note that this will change in the future (https://blogs.msdn.microsoft.com/dotnet/2017/11/15/nullable-reference-types-in-csharp/), and I must admit we, here at BesLogic, are very excited to see nullable reference types make their appearance in C#.
The reason why typing nullity is so good is that your compiler can then make sure you are not providing null data to functions that can only process non-nullable data, which lets you trust that these functions will not raise null reference errors at runtime! This is truly amazing: your compiler will refuse to let you pass nullable data to a function that was designed to take non-nullable data as input, so when you are the one writing the latter, you can avoid null guards entirely. No need to write if not null statements everywhere with these data.
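A small sketch of that compiler check, using value types since that is where C# already enforces it. Square and SquareOrZero are hypothetical names for illustration; the commented-out line is the one the compiler rejects:

```csharp
public static class NonNullableDemo
{
    // The parameter is a plain int, so the body needs no null guard at all.
    public static int Square(int side)
    {
        return side * side;
    }

    public static int SquareOrZero(int? side)
    {
        // return Square(side);
        // The line above does not compile (error CS1503:
        // cannot convert from 'int?' to 'int').
        // The caller has to decide what null means before calling Square.
        return Square(side ?? 0);
    }
}
```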
On the other hand, when you deal with nullable data, the compiler can more easily tell you that you seem to be referencing the value of a nullable type without having guarded against its nullability first. Consider this bit of code, for instance:
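Here is a sketch of two versions of the same hypothetical square-root helper working on a double? (the class and method names are made up for illustration). The first reads the value without checking, the second guards first:

```csharp
using System;

public static class NullableAccess
{
    // First version: reads .Value without checking HasValue.
    // Throws InvalidOperationException at runtime when input is null.
    public static double? SquareRootUnguarded(double? input)
    {
        return Math.Sqrt(input.Value);
    }

    // Second version: checks for a value before dereferencing it,
    // and returns null when there is nothing to process.
    public static double? SquareRootGuarded(double? input)
    {
        return input.HasValue ? Math.Sqrt(input.Value) : (double?)null;
    }
}
```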
Depending on the tools at your disposal, you may find that the first version is highlighted, to tell you that you are trying to access the value of data that may be null (which nullables forbid). This kind of warning is not always shown for reference types at the moment, because they could all potentially be null, so you would find most of your code highlighted, even where you know for sure that your value is fine.
But even with that added safety, it remains tedious to deal with such data: you still have to use ifs, or the ternary operator if you prefer expressions; you still have to manually check whether the data has a value, and if so, dereference it, and if not, return null instead.
This is where bind comes into play. With it, you are able to rewrite the functions above like this:
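What follows is a sketch under assumptions, not a definitive listing: ITextFormatter, UpperCaseFormatter, FormatInput, and SquareRoot are hypothetical names, and the two Bind extensions in BindSketch are my assumed shape for them, included only so that the snippet compiles by itself. The first function works on reference types, the second on a nullable value type:

```csharp
using System;

// Hypothetical formatter abstraction and implementation, for illustration only.
public interface ITextFormatter
{
    string Format(string input);
}

public sealed class UpperCaseFormatter : ITextFormatter
{
    public string Format(string input)
    {
        return input.ToUpperInvariant();
    }
}

// Assumed shape of the two Bind extensions, included so the snippet is self-contained.
public static class BindSketch
{
    public static TOut Bind<TIn, TOut>(this TIn input, Func<TIn, TOut> f)
        where TIn : class
        where TOut : class
        => input == null ? null : f(input);

    public static TOut? Bind<TIn, TOut>(this TIn? input, Func<TIn, TOut?> f)
        where TIn : struct
        where TOut : struct
        => input.HasValue ? f(input.Value) : null;
}

public static class Rewritten
{
    // First implementation: reference types.
    // Inside the lambdas, i and f are never null, so no guard is needed.
    public static string FormatInput(string input, ITextFormatter formatter)
        => input.Bind(i => formatter.Bind(f => f.Format(i)));

    // Second implementation: nullable value type.
    // The cast to double? helps the compiler infer the output type.
    public static double? SquareRoot(double? input)
        => input.Bind(v => (double?)Math.Sqrt(v));
}
```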
The first implementation adds a slight change, though: it will always return null if any of its parameters are null. Depending on what null means in the context of your function, propagating it like this (which is how bind works) may not always be the suitable way to deal with it, but let’s say here that the input may be user data gone wrong, and the formatter may be selected automatically, and the faulty input invalidated the selection.
The second implementation keeps the same meaning, but is more verbose than the first. This is a side effect of dealing with this in C# instead of another language. The C# compiler is not able to infer all types properly unless we help it a bit. For instance, here, I need to cast my return type to its nullable version, so that the compiler figures everything out properly.
But still! Now I can write everything as expressions, I know for certain that the data inputs of my functions are all non-null (which guarantees no null reference errors at runtime!), and I can do all of this in a relatively concise and very readable way, thanks to C# extension methods.
Here is how these two Bind functions are implemented:
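A minimal sketch of the two declarations as extension methods. The class name NullableBind and the parameter name transform are only illustrative assumptions; the first overload covers reference types going to reference types, the second covers nullable value types going to nullable value types:

```csharp
using System;

public static class NullableBind
{
    // Reference types: null plays the role of the empty container.
    // transform is only called with a non-null input.
    public static TOut Bind<TIn, TOut>(this TIn input, Func<TIn, TOut> transform)
        where TIn : class
        where TOut : class
    {
        return input == null ? null : transform(input);
    }

    // Nullable value types: TIn? -> (TIn -> TOut?) -> TOut?
    // transform receives the unwrapped value, never a null.
    public static TOut? Bind<TIn, TOut>(this TIn? input, Func<TIn, TOut?> transform)
        where TIn : struct
        where TOut : struct
    {
        return input.HasValue ? transform(input.Value) : null;
    }
}
```

Since extension methods can be called on a null reference, both overloads can be used directly on a value that may be null, and calls can be chained one after the other.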
As you can see, it’s pretty straightforward, and there’s absolutely no magic happening here. We only wrap away the null checks.
Even though the Nullable<> type has a short syntax alias in C# (the ? behind the types), you can clearly see the containers in here: you provide a container of TIn and a function that operates on a TIn and returns a container of TOut, with the whole returning itself a container of TOut. Written differently: TIn? -> (TIn -> TOut?) -> TOut?. The transformation is clear (and is what defines the monadic bind).
I hope you’ll find this useful in order to write more runtime-secure code in the future!
By the way, the two Bind declarations above only cover half of the potential use cases. Can you figure out the two that are missing, and how you would implement them?
See you next week for another article like this!