Safe zero-copy operations in C#

My attempt at talking about one of the most underrated features of C#.

Safe zero-copy operations in C#

C# is a versatile language. You can write mobile apps, desktop apps, games, websites, services and APIs with it. You can write it like Java with all the abstractions and AbstractionFactoryClassProviders. But differently from Java, you can write low-level and unsafe code too. When I say low-level, I mean without the GC, with raw pointers.

Low-level code is usually required for performance or interoperability with C libraries or the operating system. The reason low-level code helps with performance is that it can be used to eliminate runtime checks on memory accesses.

Array element accesses are bounds-checked in C# for safety. But, that means that there's performance impact unless the compiler can eliminate a bounds-checking operation. The bounds-checking elimination logic needs to ensure that the array index was already bounds-checked before, or can be assured to be inside bounds during the compile-time. For example, take this simple function:

int sum(int[] array)
{
  int sum = 0;
  for (int i = 0; i < array.Length; i++)
  {
    sum += array[i];
  }
  return sum;
}

That's an ideal situation for bounds-checking elimination because the index variable i is created with known boundaries, and it depends on the array's length. The index variable's lifetime is shorter than the array's lifetime and it's guaranteed to be contained valid values throughout the function. The native code produced for sum has no bounds-checking:

L0000	xor	    eax, eax
L0002	xor	    edx, edx
L0004	mov     r8d, [rcx+8]          ; read length
L0008	test    r8d, r8d              ; is empty?
L000b	jle	    short L001c           ; skip the loop
L000d	mov	    r10d, edx
L0010	add	    eax, [rcx+r10*4+0x10] ; sum += array[i];
L0015	inc	    edx                   ; i++
L0017	cmp	    r8d, edx              ; compare length with i
L001a	jg	    short L000d           ; loop if still greater
L001c	ret	

But, what if the function signature was slightly different?

int sum(int[] array, int startIndex, int endIndex)
{
  int sum = 0;
  for (int i = startIndex; i <= endIndex; i++)
  {
    sum += array[i];
  }
  return sum;
}

Now, the C# compiler doesn't have a way to know if the passed startIndex and endIndex values are inside the boundaries of array because their lifetimes are distinct. So the native assembly produced becomes way more involved with bounds-checking operations:

L0000	sub		rsp, 0x28				
L0004	xor		eax, eax			; sum = 0
L0006	cmp		edx, r8d			; startIndex > endIndex?
L0009	jg		short L0045			; then skip the entire function
L000b	test	rcx, rcx			; array is null?
L000e	je		short L0031			; then cause NullReferenceException
L0010	mov		r10d, edx
L0013	or		r10d, r8d
L0016	jl		short L0031
L0018	cmp		[rcx+8], r8d		; array.Length <= endIndex ?
L001c	jle		short L0031			; then do bounds-checking
L001e	xchg	ax, ax				; alignment NOP
L0020	mov		r10d, edx
L0023	add		eax, [rcx+r10*4+0x10]   ; sum += array[i]
L0028	inc		edx					; consider i + 1
L002a	cmp		edx, r8d			; i > endIndex?
L002d	jle		short L0020			; no, go on
L002f	jmp		short L0045			; return
L0031	cmp		edx, [rcx+8]		; i > array.Length?
L0034	jae		short L004a			; bounds-checking failed. go to ----+
L0036	mov		r10d, edx												|
L0039	add		eax, [rcx+r10*4+0x10]	; sum += array[i]				|
L003e	inc		edx					; i++								|
L0040	cmp		edx, r8d			; i <= endIndex ?					|
L0043	jle		short L0031			; continue for loop					|
L0045	add		rsp, 0x28												|
L0049	ret							; return sum						|
L004a	call	0x00007ff857ec6200	; throw IndexOutOfRangeException <--+

We could use low-level unsafe functions and pointers in C# (yes, C# supports raw pointers!) to avoid bounds-checking altogether, like this:

unsafe int sum(int* ptr, int length)
{
  int* end = ptr + length;
  int sum = 0;
  for (; ptr < end; ptr++)
  {
    sum += *ptr;
  }
  return sum;
}

That also creates a very optimized code that supports passing along a sub-portion of an array:

L0000	movsxd	rax, edx        
L0003	lea	rax, [rcx+rax*4]    ; end = ptr + length
L0007	xor	edx, edx            ; sum = 0
L0009	cmp	rcx, rax            ; ptr >= end ?
L000c	jae	short L0019         ; then return
L000e	add	edx, [rcx]          ; sum += *ptr
L0010	add	rcx, 4              ; ptr += sizeof(int)
L0014	cmp	rcx, rax            ; ptr < end?
L0017	jb	short L000e         ; then keep looping
L0019	mov	eax, edx
L001b	ret	                    ; return sum

Unsafe code and pointer-arithmetic can be very performant as you can see. The problem is that it's too dangerous. With incorrect values of length, you don't simply get an IndexOutOfRangeException but instead your app either crashes, or returns incorrect results. If your code happened to modify the memory region instead of just reading it, then you could have a nice entry point for a buffer overflow security vulnerability in your app too. Not to mention that all the callers of that function will have to have unsafe blocks too.

But it's possible to handle this safe and fast in C# without resorting to esoteric rituals like that. First, how do you solve this problem of indexes to describe a portion of an array and actual boundaries of the array being disconnected from each other? You create a new immutable type that holds these values together. And that type is called a span in C#. Other programming languages may call it a slice. Declaration of Span type resembles something like this. Well, it's not exactly this, but I want you to understand the concept first:

readonly struct Span<T>
{
  readonly T* _ptr;
  readonly int _len;
}

It's basically an immutable pointer with length. The great thing about a type like this is that the compiler can assure that once an immutable Span is initialized with correct bounds, it will always be safe to access without any bounds-checking. That means, you can pass around sub-views of arrays or even other spans safely and quickly without the performance overhead.

But, how can it be safe? What if the GC decides to throw away the structure that ptr points to? Well, that's where "ref types" come into play in C#.

A ref type is a type that can't leave the stack and escape to the heap, so it's always guaranteed that a T type will outlive a Span<T> instance. That's why the actual Span<T> declaration looks like this:

readonly ref struct Span<T>  // notice "ref" 
{
  readonly ref T _ptr;        // notice "ref"
  readonly int _len;
}

Since a ref type can only live in stack, it can't be a member of a class, nor can it be assigned to a non-ref variable, like, it can't be boxed either. A ref type can only be contained inside another ref type. It's ref types all the way.

Span-based version of our sum function can eliminate bounds-checking despite that it can now have several super powers. The first one is that it can receive a sub-view of an array too with specific indices:

int sum(Span<int> span)
{
  int sum = 0;
  for (int i = 0; i < span.Length; i++)
  {
    sum += span[i];
  }
  return sum;
}

For instance you can call this function with sum(array) or you can call it with a sub-view of an array like sum(array[startIndex..endIndex]). That wouldn't incur new bounds-checking operations other than when you're trying to slice the array using the range operator. See how the generated assembly code for sum becomes optimized again:

L0000	mov	rax, [rcx]
L0003	mov	ecx, [rcx+8]
L0006	xor	edx, edx            ; sum = 0
L0008	xor	r8d, r8d            ; i = 0
L000b	test	ecx, ecx		; span.Length == 0?
L000d	jle	short L001e
L000f	mov	r10d, r8d
L0012	add	edx, [rax+r10*4]    ; sum += span[i]
L0016	inc	r8d                 ; i++
L0019	cmp	r8d, ecx            ; i < Length?
L001c	jl	short L000f         ; then keep looping
L001e	mov	eax, edx            
L0020	ret						; return sum

Another superpower you get is that the ability to declare the data structure you receive immutable in your function signature, so your function is guaranteed not to touch it, and you can find bugs instantly. All you need to do is to replace Span<T> with ReadOnlySpan<T>. Then your attempts to modify the span contents will immediately cause a compiler error. Something impossible with regular arrays, even if you declare them readonly. The readonly directive only protects the reference from modification not the contents of the data structure.

Passing along a smaller portion of a larger data structure to relevant APIs used to involve either copying or passing the relevant part's offset and length values along with that data structure. It required the API to support calls with larger structures with offsets. It was impossible to guarantee the safety of such APIs as the relationship between parameters couldn't be established by the compiler or the runtime.

It's now both easy and expressive to implement zero-copy operations safely using spans. Consider a Quicksort implementation for instance; it usually has a function like this that works with portions of a given array:

int partition(int[] array, int low, int high)
{
  int midpoint = (high + low) / 2; // I know, we'll get there!
  int mid = array[midpoint];

  // tuple swaps in C#! ("..^1" means "Length - 1")
  (array[midpoint], array[^1]) = (array[^1], array[midpoint]);
  int pivotIndex = 0;
  for (int i = low; i < high - 1; i++)
  {
     if (array[i] < mid)
     {
       swap(array, i, pivotIndex);
       pivotIndex += 1;
     }
  }
  (array[midpoint], array[endpoint]) = (array[endpoint], array[midpoint]);
  return pivotIndex;
}

This function receives an array and offsets that designate a part of it, and rearranges the items based on a picked value in it. Values smaller than it move to the left, larger than it move to the right.

The mid = array[midpoint] has to be bounds-checked because the compiler can't know if the index is inside the bounds of this array. The for loop also performs bounds-check for array accesses in the loop, which some of them can be eliminated, but not fully guaranteed.

There is also an overflow error because we pass array ranges using high and low values: (high+low) can overflow for very large arrays, and the results would be catastrophic, can even cause buffer overflow exceptions.

The partition function gets recursively called many times by Quicksort function below, so that means bounds-checking can be a performance issue.

void Quicksort(int[] array, int low, int high)
{
  if (array.Length <= 1)
  {
    return;
  }

  int pivot = partition(array, low, high);
  Quicksort(span, low, pivot - 1);
  Quicksort(span, pivot + 1, high);
}

With spans, the same Quicksort function looks like this:

void Quicksort(Span<int> span)
{
  if (span.Length <= 1)
  {
    return;
  }

  int pivot = partition(span);
  Quicksort(span[..pivot]);
  Quicksort(span[(pivot + 1)..]);
}

See how expressive using spans are especially with the range syntax? It lets you get a new span out of an existing span or an array using double dots (..)? And the partition function even looks much better too:

int partition(Span<int> span)
{
  int midpoint = span.Length / 2; // look ma, no buffer overflows!
  int mid = span[midpoint];
  (span[midpoint], span[^1]) = (span[^1], span[midpoint]);
  int pivotIndex = 0;
  for (int i = 0; i < span.Length - 1; i++)
  {
     if (span[i] < mid)
     {
       swap(array, i, pivotIndex);
       pivotIndex += 1;
     }
  }
  (span[midpoint], span[^1]) = (span[^1], span[midpoint]);
  return pivotIndex;
}

Because C# spans are also zero-based, it's harder to have buffer overflow problems caused by formulae like (low + high) / 2. Now, the implementation is as fast as an unsafe implementation with raw pointers, but still extremely safe.

New zero-copy operations in .NET runtime

I used a recursive example here to show how sub-portions of a larger data structure can be passed to another function, but spans can be used almost everywhere, and now .NET runtime supports zero-copy alternatives of popular functions too.

Take String.Split for example. You can now split a string without creating new copies of every split portion of the string. You can split a CSV line into its parts like this:

string csvLine = // .. read CSV line
string[] parts = csvLine.Split(',');

foreach (string part in parts)
{
  Console.WriteLine(part);
}

The problem with that is, now you're dealing with newly created 5 buffers with varying lengths. .NET allocates new memory for them, GC keeps track of them. It's slow, it hogs memory. It's problematic especially in loops, and can create GC pressure, slowing your app even more.

You can instead cast your CSV line into a ReadOnlySpan<char> and iterate over its components to write it to the output:

string csvLine = // .. read CSV line

var span = csvLine.AsSpan();
var parts = span.Split(',');
foreach (var range in parts)
{
  Console.Out.WriteLine(span[range]);
}

Note that we use a small detour to use Console.Out.WriteLine instead of Console.WriteLine because Console class lacks an overload to output a ReadOnlySpan<char> like a string.

The great ROI of that experiment is making zero memory allocations after reading CSV line into memory. A similar logic that improves performance and memory efficiency can be applied everywhere that can receive a Span<T>/ReadOnlySpan<T> instead of an array.

Embrace the future

Spans and slice-like structures in are the future of safe memory operations in modern programming languages. Embrace them. The quick takeaways are:

  • Use spans over arrays in your function declarations unless you explicitly require a standalone array in your function for some reason. Such a change opens your API into zero-copy optimization scenarios, and calling code will be more expressive.
  • Don't bother with unsafe/pointers if you can code the same logic with spans. You can still perform low-level memory operations without wandering into the dangerous parts of the forest. Your code will still be fast, and yet safer.

Use spans, wherever possible, mostly readonly.