
Optimizing Go Slice Allocations: A Step-by-Step Guide to Stack-Friendly Sizing

Last updated: 2026-05-06 01:23:30 · Programming

Introduction

Go programs can become slower when they perform many heap allocations, especially for small slices that grow dynamically. Each heap allocation triggers the memory allocator and adds pressure on the garbage collector, even with modern improvements like the Green Tea collector. Stack allocations, on the other hand, are nearly free—they don't involve the allocator and are automatically cleaned up when the function returns. This guide will walk you through identifying and fixing heap allocation issues in slice usage, specifically by pre-allocating slices with a constant size so that the backing array can live on the stack. You'll learn how to transform a dynamic append loop into a more efficient pattern that avoids repeated heap allocations and reduces GC load.

Source: blog.golang.org

What You Need

  • A Go development environment (Go 1.22 or later recommended).
  • Basic knowledge of Go slices, append, and garbage collection.
  • A sample program that reads tasks from a channel and processes them (e.g., the process function from the original article).
  • Optional: Profiling tools like pprof or benchstat to measure improvements.

Step-by-Step Guide

Step 1: Understand the Problem – Dynamic Slice Growth

Consider the typical pattern of collecting items from a channel into a slice:

func process(c chan task) {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

On each iteration, if the underlying array is full, append allocates a new, larger array (roughly doubling the capacity while the slice is small; the growth factor shrinks for large slices) and copies the old elements over. For small slices this means a burst of allocations early on: capacity 1 -> 2 -> 4 -> 8 -> ... Each new array is heap-allocated and each old array becomes garbage. If the slice never grows large, you pay this cost for nothing.
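To watch this growth in action, here is a small self-contained sketch (the helper capGrowth is illustrative and not from the original article; the exact growth factors are a runtime implementation detail and may vary across Go versions):

```go
package main

import "fmt"

// capGrowth appends n elements to an initially nil slice and records each
// distinct capacity of the backing array, i.e. each reallocation.
func capGrowth(n int) []int {
	var s []int
	var caps []int
	for i := 0; i < n; i++ {
		s = append(s, i)
		if len(caps) == 0 || cap(s) != caps[len(caps)-1] {
			caps = append(caps, cap(s))
		}
	}
	return caps
}

func main() {
	// On current toolchains this prints something like [1 2 4 8 16 32 64]:
	// seven separate heap allocations just to collect 40 ints.
	fmt.Println(capGrowth(40))
}
```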

Step 2: Profile or Reason About Slice Size

Before optimizing, estimate the expected number of tasks that will be read from the channel. Is it always small? Often 0-10? Or could it be hundreds? If you know an upper bound, you can pre-allocate the slice with exactly that capacity. For example, if you expect never more than 32 tasks, set capacity to 32. If the number varies but is small, you might still benefit from a fixed small capacity.
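If you don't know the distribution, you can measure it in a staging run before picking a capacity. The sketch below is hypothetical instrumentation (recordBatchSize and maxTasks are invented names, not from the original article) that tracks the largest batch observed at runtime:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// maxTasks holds the largest batch size seen so far; safe for concurrent use.
var maxTasks atomic.Int64

// recordBatchSize updates maxTasks if n is a new maximum, using a
// compare-and-swap loop so concurrent callers never lose an update.
func recordBatchSize(n int) {
	for {
		cur := maxTasks.Load()
		if int64(n) <= cur || maxTasks.CompareAndSwap(cur, int64(n)) {
			return
		}
	}
}

func main() {
	for _, n := range []int{3, 17, 9} {
		recordBatchSize(n)
	}
	fmt.Println(maxTasks.Load()) // prints 17
}
```

Call recordBatchSize(len(tasks)) at the end of process, log the maximum periodically, and size your buffer from real data.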

Step 3: Pre-allocate the Slice with make

Replace var tasks []task with a make call that specifies a length of 0 but a capacity equal to your expected maximum:

func process(c chan task) {
    tasks := make([]task, 0, 32) // pre-allocate backing array of size 32
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

Now, if the actual number of tasks is ≤ 32, no heap allocation occurs for the backing store (except possibly the initial make itself, but see Step 4).

Step 4: Encourage Stack Allocation

The make call in Step 3 may still place the backing array on the heap. The compiler can only keep it on the stack when the capacity is a compile-time constant and escape analysis proves the slice never leaves the function; a make with a variable capacity is always heap-allocated. The most explicit way to guarantee stack storage is to declare a fixed-size local array and slice it:

func process(c chan task) {
    var buf [32]task
    tasks := buf[:0]  // empty slice backed by stack array
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

Here, buf is a local array, so as long as escape analysis shows it does not escape, it lives on the stack. The slice tasks points into it. Until the number of tasks exceeds 32, every append reuses the stack array with no heap allocation, and the array is reclaimed automatically when the function returns.

Important: If you exceed 32, append will allocate a new backing array on the heap, and the original stack array is no longer used. So choose a capacity that covers the common case but not too large to waste stack space.
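One way to check that the pattern really avoids heap allocations is testing.AllocsPerRun. The sketch below adapts the channel example to a plain slice input for simplicity; collect, task, and processAll are stand-ins, and the zero-allocation result assumes escape analysis keeps buf on the stack:

```go
package main

import (
	"fmt"
	"testing"
)

type task struct{ id int }

// processAll is a trivial consumer that does not retain the slice,
// so nothing forces buf to escape.
func processAll(ts []task) { _ = ts }

// collect gathers up to 32 tasks into a stack-backed slice.
func collect(in []task) {
	var buf [32]task
	tasks := buf[:0]
	for _, t := range in {
		tasks = append(tasks, t)
	}
	processAll(tasks)
}

func main() {
	in := make([]task, 25) // fits within the 32-element buffer
	allocs := testing.AllocsPerRun(100, func() { collect(in) })
	fmt.Println(allocs) // expected 0 when buf stays on the stack
}
```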

Step 5: Handle Overflow Gracefully

If you cannot guarantee the maximum number of tasks, you can fall back to a heap-allocated slice when the fixed-size buffer overflows. For example:

func process(c chan task) {
    var buf [32]task
    tasks := buf[:0]
    for t := range c {
        if len(tasks) < cap(tasks) {
            tasks = append(tasks, t)
            continue
        }
        // Overflow: move everything to a heap-allocated slice with room to grow.
        heapTasks := make([]task, len(tasks), 2*cap(tasks))
        copy(heapTasks, tasks)
        heapTasks = append(heapTasks, t)
        // Keep reading the remaining tasks into the heap slice.
        for t := range c {
            heapTasks = append(heapTasks, t)
        }
        processAll(heapTasks)
        return
    }
    processAll(tasks)
}

But this adds complexity. If overflow is rare, it is usually fine to accept the occasional extra allocation and skip the special case entirely: keep appending to the slice that started from the stack array, and on overflow append transparently allocates a heap array and copies the elements over. The stack array then simply goes unused for the rest of the call (it is stack memory, not garbage), which is harmless.
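A tiny demonstration of this automatic migration, using a deliberately small 4-element buffer so the overflow is easy to trigger:

```go
package main

import "fmt"

func main() {
	var buf [4]byte // deliberately tiny stack buffer
	s := buf[:0]
	for i := byte(0); i < 6; i++ {
		s = append(s, i) // the 5th append overflows capacity 4
	}
	// append moved the data to a new, larger array:
	fmt.Println(cap(s) > len(buf)) // true: capacity grew past the array
	fmt.Println(&s[0] != &buf[0])  // true: s no longer aliases buf
}
```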

Step 6: Benchmark and Verify

Write a benchmark to compare the original code and the optimized version. Use go test -bench=. -benchmem to see allocation counts and bytes. You should see a significant reduction in heap allocations for the common-case size. Profile with pprof to ensure no unexpected allocations remain.

Example benchmark:

func BenchmarkProcess(b *testing.B) {
    for i := 0; i < b.N; i++ {
        ch := make(chan task, 100) // fresh channel each iteration: a closed channel cannot be reused
        go func() {
            for j := 0; j < 25; j++ {
                ch <- task{} // zero-value task as a placeholder payload
            }
            close(ch)
        }()
        process(ch)
    }
}

Compare results. The allocations attributable to the slice's backing array should drop to zero when the item count fits in the stack buffer; any remaining allocations reported by -benchmem come from per-iteration setup such as the channel and goroutine.

Step 7: Apply to Real Code

Identify other hot spots in your Go code where small slices are built incrementally. Common candidates: collecting results from database queries, building request payloads, aggregating log messages, and so on. Where the maximum size is known and small, replace the dynamic append-and-grow pattern with a stack-allocated fixed-size buffer, using the same idiom: var buf [N]T; slice := buf[:0].
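As one concrete illustration of the pattern outside the channel example, here is a hypothetical helper (joinSmall is an invented name, not from the original article) that aggregates a handful of log fields through a stack-backed buffer:

```go
package main

import (
	"fmt"
	"strings"
)

// joinSmall joins up to 8 non-empty fields into one line. The parts slice
// is backed by a fixed-size local array, so collecting the fields performs
// no heap allocation of its own; only the final string is allocated.
func joinSmall(fields ...string) string {
	var buf [8]string
	parts := buf[:0]
	for _, f := range fields {
		if f != "" { // skip empty fields while aggregating
			parts = append(parts, f)
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(joinSmall("GET", "", "/index", "200")) // prints "GET /index 200"
}
```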

Tips

  • Choose the buffer size wisely. It must be a compile-time constant. A capacity of 32 or 64 covers many use cases. Too large a buffer (e.g., 1024 large structs) wastes stack space and can force stack growth, which has its own cost.
  • Use the -gcflags=-m flag to see escape analysis decisions. If the array escapes to the heap, the optimization is defeated. Verify that buf is reported as not escaping; it will escape if, for example, you return a slice of it or store it in a long-lived structure.
  • Combine with compiler optimizations. Go's inliner and escape analysis are improving. In some cases, the compiler may even allocate small slices on the stack automatically (check each Go version).
  • Be careful with slices that are returned. If the slice is returned from the function, the backing array cannot be on the stack because it would be invalid after the return. This optimization only works for slices that are consumed within the same function (or passed to functions that are inlined and don't retain the slice beyond the caller).
  • Measure, measure, measure. Not every slice loop is a bottleneck. Profile your entire application before micro-optimizing. Focus on hotspots identified by CPU profiles or allocation profiles.