/home/huy/everyday

everyday: Development Log

01.05.2022 - Zig/Where data is stored and how to choose an allocator

Zig has no runtime, hence, no automatic memory management. It's the developer's responsibility to manage the allocation/deallocation of memories. It's important to know how memory allocation works, where your data is stored and what to do with them.

Where data is stored in a Zig program?

String literals, comptime variables, top-level var, struct-level var, and const declarations are stored in the global constant data section.

Function-level var variables are stored in the function's stack frame, once the function returns, any pointers to variables in the function's stack frame will become invalid.

So, Zig will not stop you from making mistakes like this:

fn createNode(value: i32) *Node {
  var node = Node { .value = value, .left = null, .right = null };
  return &node;
}

In the code above, we return a pointer to a Node object that was created inside createNode function. This object, however, will be freed after createNode function returns, and won't be valid anymore. Any dereference to this pointer will lead to undefined behavior.

This also explains why you are able to return a string literal from a function and safety get away with it:

fn sayHello() []const u8 {
    return "hello world";
}

How to choose an allocator?

In the above example, the right way to return a pointer from a function is to first allocate some memory for the pointer and initialize the Node struct to that pointer.

Unlike C, Zig does not come with any default allocator, so we need to determine which allocator we should use based on our use case.

It's a convention in Zig that a function or struct that needs to perform memory allocation accepts an Allocator parameter.

The rules of thumb to choose an allocator is:

  • If we are making a library, do not specify any allocator, but let the user pass their desired Allocator as a parameter into our library and use it.

  • If we are linking with libc, use std.heap.c_allocator.

  • If the maximum memory size is known or need to be bound by any number at compile time, use std.heap.FixedBufferAllocator or std.heap.ThreadSafeFixedBufferAllocator.

  • If we are making a program that runs straight from start to end, like a CLI program, and it would make sense to free everything at once at the end, use something like:

    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    
    // pass the reference of this allocator to other struct as needed
    var allocator = &arena.allocator();
    

    The call to arena.deinit() will try to automatically traverse and free everything we allocated during the program.

  • Otherwise, use std.heap.GeneralPurposeAllocator, this allocator gives us the ability to detect memory leak, but it does not automatically free anything.

    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(!gpa.deinit());
    
    // pass the reference of this allocator to other struct as needed
    var allocator = &gpa.allocator();
    

See std.heap package for the details of each allocator.

We also need to handle allocation error when using allocators, most of the time, an error named error.OutOfMemory will be returned and must be catch:

fn createNode(allocator: *Allocator, value: i32) !*Node {
    const result = try allocator.create(Node);
    result.* = Node{ .value = value };
    return result;
}

var node = createNode(allocator, 12) catch |err| {
  std.debug.print("Error: Out of memory\n{}\n", .{err});
  return null;
};

Lastly, it's highly recommended to watch this great talk What's a Memory Allocator Anyway? by Benjamin Feng.

Read more