Published on 2025-11-21
I have been writing production applications in Go for a few years now. I like some aspects of Go. One aspect I do not like is how easy it is to create data races in Go.
Go is often touted for how easy it makes writing highly concurrent programs. However, it is also mind-boggling how many ways Go happily gives us developers to shoot ourselves in the foot.
Over the years I have encountered and fixed many interesting kinds of data races in Go. If that interests you, I have written in the past about Go concurrency and about some existing footguns, even when they are not strictly 'Go data races':
So what is a 'Go data race'? Quite simply, it is Go code that does not conform to the Go memory model. Importantly, Go defines in its memory model what a Go compiler MUST do and MAY do when faced with a non-conforming program exhibiting data races. Not everything is allowed, quite the contrary in fact. Data races in Go are not benign either: their effects can range from 'no symptoms' to 'arbitrary memory corruption'.
Quoting the Go memory model:
This means that races on multiword data structures can lead to inconsistent values not corresponding to a single write. When the values depend on the consistency of internal (pointer, length) or (pointer, type) pairs, as can be the case for interface values, maps, slices, and strings in most Go implementations, such races can in turn lead to arbitrary memory corruption.
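To make this concrete, here is a deliberately racy sketch of my own (not code from the memory model documentation) showing how an interface value, stored as a multiword (type, data) pair, can end up torn:
package main
// Deliberately racy: do not use. Two goroutines write values of different
// concrete types to the same interface variable without synchronization. An
// interface value is stored as a (type, data) pair of words, and a racy
// reader is not guaranteed to see both words from the same write, so it may
// observe an inconsistent pair (e.g. one write's type with the other's data).
func main() {
	var v interface{}
	go func() {
		for {
			v = int64(42)
		}
	}()
	go func() {
		for {
			v = "hello"
		}
	}()
	for {
		_ = v // may observe a torn (type, data) pair
	}
}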
With this out of the way, let's take a tour of real data races in Go code that I have encountered and fixed. At the end, I will offer some recommendations to (try to) avoid them.
I also recommend reading the paper A Study of Real-World Data Races in Golang. This article humbly hopes to be a spiritual companion to it. Some items here are also present in this paper, and some are new.
In the code I will often use errgroup.Group or sync.WaitGroup because they implement a fork-join pattern, which shortens the code. The exact same can be done with 'raw' Go channels and goroutines. This also serves to show that using higher-level concepts does not magically protect against all data races.
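For reference, here is a minimal sketch of the same fork-join shape with raw goroutines and a channel (my own code, with hypothetical task functions):
// runBoth forks two goroutines and joins on both results, returning the
// first non-nil error, similar to what errgroup.Group does for us.
func runBoth(task1, task2 func() error) error {
	errs := make(chan error, 2)
	go func() { errs <- task1() }() // fork
	go func() { errs <- task2() }() // fork
	var firstErr error
	for range 2 { // join: wait for both goroutines
		if err := <-errs; err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}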
This first data race, an accidental capture of a shared variable by closures, is very common in Go and also very easy to fall into. Here is a simplified reproducer:
package main
import (
"context"
"golang.org/x/sync/errgroup"
)
func Foo() error { return nil }
func Bar() error { return nil }
func Baz() error { return nil }
func Run(ctx context.Context) error {
err := Foo()
if err != nil {
return err
}
wg, ctx := errgroup.WithContext(ctx)
wg.Go(func() error {
err = Baz()
if err != nil {
return err
}
return nil
})
wg.Go(func() error {
err = Bar()
if err != nil {
return err
}
return nil
})
return wg.Wait()
}
func main() {
println(Run(context.Background()))
}
The issue might not be immediately visible.
The problem is that the outer err variable is implicitly captured by the closures, each of which runs in a separate goroutine. They then mutate err concurrently. What the author meant to do was use a variable local to each closure and return that. There is conceptually no need to share any data; the sharing is purely accidental.
The fix is simple; I'll show two variants in the same diff: define a local variable, or use a named return value.
diff --git a/cmd-sg/main.go b/cmd-sg/main.go
index 7eabdbc..4349157 100644
--- a/cmd-sg/main.go
+++ b/cmd-sg/main.go
@@ -18,14 +18,14 @@ func Run(ctx context.Context) error {
wg, ctx := errgroup.WithContext(ctx)
wg.Go(func() error {
- err = Baz()
+ err := Baz()
if err != nil {
return err
}
return nil
})
- wg.Go(func() error {
+ wg.Go(func() (err error) {
err = Bar()
if err != nil {
return err
It is unfortunate that a one-character difference is all it takes to fall into this trap. I feel for the original developer who wrote this code without realizing the implicit capture. As mentioned in a previous article where this silent behavior bit me, we can use the build flag -gcflags='-d closure=1' to make the Go compiler print which variables are captured by each closure:
$ go build -gcflags='-d closure=1'
./main.go:20:8: heap closure, captured vars = [err]
./main.go:28:8: heap closure, captured vars = [err]
But it is not realistic to do this in a big codebase and inspect every closure. It is useful when you already suspect that a given closure might suffer from this problem.
The Go docs state about http.Client:
[...] Clients should be reused instead of created as needed. Clients are safe for concurrent use by multiple goroutines.
So imagine my surprise when the Go race detector flagged a race tied to http.Client. The code looked like this:
package main
import (
"context"
"net/http"
"golang.org/x/sync/errgroup"
)
func Run(ctx context.Context) error {
client := http.Client{}
wg, ctx := errgroup.WithContext(ctx)
wg.Go(func() error {
client.CheckRedirect = func(req *http.Request, via []*http.Request) error {
if req.Host == "google.com" {
return nil
} else {
return http.ErrUseLastResponse
}
}
_, err := client.Get("http://google.com")
return err
})
wg.Go(func() error {
client.CheckRedirect = nil
_, err := client.Get("http://amazon.com")
return err
})
return wg.Wait()
}
func main() {
println(Run(context.Background()))
}
The program makes two concurrent HTTP requests to two different URLs. For the first one, the code restricts redirects (I invented the exact logic, no need to look too closely into it; the real code has complex logic here). For the second one, the default redirect policy is used by setting CheckRedirect to nil. This code is idiomatic and follows the recommendations from the documentation:
CheckRedirect specifies the policy for handling redirects. If CheckRedirect is not nil, the client calls it before following an HTTP redirect. If CheckRedirect is nil, the Client uses its default policy [...].
The problem is that the CheckRedirect field is modified concurrently by two goroutines without any synchronization, which is a data race.
This code also suffers from an I/O race: depending on the network speed and response time for both URLs, the redirects might or might not be checked, since the callback might get overwritten by the other goroutine right when the HTTP client is about to call it.
Alternatively, the http.Client could end up calling a nil callback: the callback is non-nil when the http.Client checks it, but before it gets the chance to call it, the other goroutine sets it to nil. Boom, nil dereference.
Here, the simplest fix is to use two different HTTP clients:
diff --git a/cmd-sg/main.go b/cmd-sg/main.go
index 351ecc0..8abee1c 100644
--- a/cmd-sg/main.go
+++ b/cmd-sg/main.go
@@ -8,10 +8,10 @@ import (
)
func Run(ctx context.Context) error {
- client := http.Client{}
wg, ctx := errgroup.WithContext(ctx)
wg.Go(func() error {
+ client := http.Client{}
client.CheckRedirect = func(req *http.Request, via []*http.Request) error {
if req.Host == "google.com" {
return nil
@@ -23,6 +23,7 @@ func Run(ctx context.Context) error {
return err
})
wg.Go(func() error {
+ client := http.Client{}
client.CheckRedirect = nil
_, err := client.Get("http://amazon.com")
return err
This may affect performance negatively since some resources will not be shared anymore.
Additionally, in some situations this is not so easy, because http.Client does not offer a Clone() method (a recurring issue in Go, as we'll see). For example, a Go test may start an httptest.Server and then call .Client() on this server to obtain a preconfigured HTTP client for it. There is then no easy way to duplicate this client to use it from two different tests running in parallel.
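One workaround is sketched below (with a hypothetical helper, assuming that sharing the server's Transport, which is documented as safe for concurrent use, is acceptable for the tests): give each test its own http.Client while reusing that Transport.
// clientsForParallelTests is a hypothetical helper: it builds one client per
// test/goroutine so the CheckRedirect fields are never shared, while reusing
// the httptest server's Transport (and thus its connection pool and TLS
// configuration).
func clientsForParallelTests(srv *httptest.Server) (restricted, follower *http.Client) {
	base := srv.Client()
	restricted = &http.Client{
		Transport: base.Transport,
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse // never follow redirects
		},
	}
	follower = &http.Client{Transport: base.Transport} // default redirect policy
	return restricted, follower
}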
Again, I would not blame the original developer here. In my view, the docs for http.Client are misleading and should mention that not every single operation is concurrency safe. Perhaps with the wording: 'once an http.Client is constructed, performing an HTTP request is concurrency safe, provided that the http.Client fields are not modified concurrently'. Which is less catchy than 'Clients are safe for concurrent use', period.
The next data race is one that baffled me for a bit, because the code was using a mutex properly and I could not fathom why a race would be possible.
Here's a minimal reproducer:
package main
import (
"encoding/json"
"net/http"
"sync"
)
type Plans map[string]int
type PricingInfo struct {
plans Plans
}
var pricingInfo = PricingInfo{plans: Plans{"cheap plan": 1, "expensive plan": 5}}
type PricingService struct {
info PricingInfo
infoMtx sync.Mutex
}
func NewPricingService() *PricingService {
return &PricingService{info: pricingInfo, infoMtx: sync.Mutex{}}
}
func AddPricing(w http.ResponseWriter, r *http.Request) {
pricingService := NewPricingService()
pricingService.infoMtx.Lock()
defer pricingService.infoMtx.Unlock()
pricingService.info.plans["middle plan"] = 3
encoder := json.NewEncoder(w)
encoder.Encode(pricingService.info)
}
func GetPricing(w http.ResponseWriter, r *http.Request) {
pricingService := NewPricingService()
pricingService.infoMtx.Lock()
defer pricingService.infoMtx.Unlock()
encoder := json.NewEncoder(w)
encoder.Encode(pricingService.info)
}
func main() {
http.HandleFunc("POST /add-pricing", AddPricing)
http.HandleFunc("GET /pricing", GetPricing)
http.ListenAndServe(":12345", nil)
}
A global mutable map of pricing information is guarded by a mutex. One HTTP endpoint reads the map, another adds an item to it. Pretty simple I would say. The locking is done correctly.
Yet the map suffers from a data race. Here is a reproducer:
package main
import (
"net/http"
"net/http/httptest"
"testing"
)
func TestPricing(t *testing.T) {
mux := http.NewServeMux()
mux.HandleFunc("POST /add-pricing", AddPricing)
mux.HandleFunc("GET /pricing", GetPricing)
server := httptest.NewServer(mux)
t.Cleanup(server.Close)
t.Run("get pricing", func(t *testing.T) {
t.Parallel()
_, err := server.Client().Get(server.URL + "/pricing")
if err != nil {
panic(err)
}
})
for range 5 {
t.Run("add pricing", func(t *testing.T) {
t.Parallel()
_, err := server.Client().Post(server.URL+"/add-pricing", "application/json", nil)
if err != nil {
panic(err)
}
})
}
}
The reason is that the data and the mutex guarding it do not have the same 'lifetime'. The pricingInfo map is global and exists from the start of the program to its end. But the mutex infoMtx exists only for the duration of the HTTP handler (and thus of the HTTP request). We effectively have 1 map and N mutexes, none of which is shared between HTTP handlers. So the HTTP handlers cannot synchronize their accesses to the map.
The intent of the code was (I think) to make a deep clone of the pricing information at the beginning of the HTTP handler, in NewPricingService. Unfortunately, Go does a shallow copy of structs, so each PricingService instance ends up sharing the same underlying plans map: the global one. It may be that this worked for a long time because the PricingInfo struct did not yet contain the map (in the real code it contains a lot of ints and strings, which are value types and are copied correctly by a shallow copy), and the map was only added later.
This data race is akin to copying a mutex by value when passing it to a function, which then locks it. That does no synchronization at all, since only a copy of the mutex is being locked: no mutex is shared between the concurrent units.
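That classic pitfall looks like this (a sketch of mine; go vet's copylocks check flags this pattern):
// The mutex parameter is passed by value, so every call locks its own
// private copy: callers are not synchronized with each other at all.
func addVisit(counts map[string]int, mu sync.Mutex, page string) {
	mu.Lock()
	defer mu.Unlock()
	counts[page]++
}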
In any event, the fix is to align the lifetime of the data and the mutex: either make the mutex package-level so that it lives as long as the global map it guards, or deep-clone the pricing data in NewPricingService so that handlers stop sharing the map entirely.
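For illustration, here is a sketch of the first option (not what the real code did), reusing the types from the reproducer above:
// One package-level mutex guards the one package-level map, so every HTTP
// handler locks the same mutex.
var (
	pricingInfo    = PricingInfo{plans: Plans{"cheap plan": 1, "expensive plan": 5}}
	pricingInfoMtx sync.Mutex
)
func AddPricing(w http.ResponseWriter, r *http.Request) {
	pricingInfoMtx.Lock()
	defer pricingInfoMtx.Unlock()
	pricingInfo.plans["middle plan"] = 3
	_ = json.NewEncoder(w).Encode(pricingInfo)
}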
I went with the second approach in the real code because it seemed to be the original intent:
diff --git a/cmd-sg/main.go b/cmd-sg/main.go
index fb59f5c..c7a7a94 100644
--- a/cmd-sg/main.go
+++ b/cmd-sg/main.go
@@ -2,6 +2,7 @@ package main
import (
"encoding/json"
+ "maps"
"net/http"
"sync"
)
@@ -19,8 +20,15 @@ type PricingService struct {
infoMtx sync.Mutex
}
+func ClonePricing(pricingInfo PricingInfo) PricingInfo {
+ cloned := PricingInfo{plans: make(Plans, len(pricingInfo.plans))}
+ maps.Copy(cloned.plans, pricingInfo.plans)
+
+ return cloned
+}
+
func NewPricingService() *PricingService {
- return &PricingService{info: pricingInfo, infoMtx: sync.Mutex{}}
+ return &PricingService{info: ClonePricing(pricingInfo), infoMtx: sync.Mutex{}}
}
func AddPricing(w http.ResponseWriter, r *http.Request) {
It's annoying to have to implement this manually and especially to have to check every single nested field to determine if it's a value type or a reference type (the former will behave correctly with a shallow copy, the latter needs a custom deep copy implementation). I miss the derive(Clone) annotation in Rust. This is something that the compiler can (and should) do better than me.
Furthermore, as mentioned in the previous section, some types from the standard library or from third-party libraries do not implement a deep Clone() function and contain private fields, which prevents us from implementing one ourselves.
I think Rust's API for a mutex is better because a Rust mutex wraps the data it protects and thus it is harder to have uncorrelated lifetimes for the data and the mutex.
Go's mutex API likely could not have been implemented this way originally, since it would have required generics, which did not exist at the time. But as of today, I think it could.
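For example, nothing prevents writing a small generic wrapper today; here is a sketch (a hypothetical type, not part of the standard library):
// Guarded couples a value with the mutex protecting it, so the two cannot
// have different lifetimes and the value cannot be reached without locking.
type Guarded[T any] struct {
	mu    sync.Mutex
	value T
}
// NewGuarded wraps an initial value.
func NewGuarded[T any](value T) *Guarded[T] {
	return &Guarded[T]{value: value}
}
// WithLock runs f with exclusive access to the protected value.
func (g *Guarded[T]) WithLock(f func(*T)) {
	g.mu.Lock()
	defer g.mu.Unlock()
	f(&g.value)
}
// Usage: prices := NewGuarded(map[string]int{})
//        prices.WithLock(func(m *map[string]int) { (*m)["expensive plan"] = 5 })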
Nonetheless, the Go compiler has no way to detect accidental shallow copying, whereas Rust's compiler has the concepts of Copy and Clone. So that issue remains in Go, and it is not a simple API mistake in the standard library that we could fix.
I have encountered many cases of a map, slice, etc. being modified concurrently without any synchronization. That's your run-of-the-mill data race, and such races are typically fixed by 'slapping a mutex on it' or by using a concurrency-safe data structure such as sync.Map.
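For completeness, here is a minimal sketch of the 'slap a mutex on it' fix, using a hypothetical counters type:
// SafeCounts bundles the map with the mutex that guards it; all accesses go
// through methods that take the lock.
type SafeCounts struct {
	mu sync.Mutex
	m  map[string]int
}
func NewSafeCounts() *SafeCounts {
	return &SafeCounts{m: make(map[string]int)}
}
func (c *SafeCounts) Inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key]++
}
func (c *SafeCounts) Get(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.m[key]
}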
I will thus share here a more interesting one where it is not as straightforward.
This time, the code is convoluted but what it does is relatively simple: it starts a Stripe CLI container with dockertest, streams the container logs into a byte buffer, and concurrently polls that buffer from another goroutine until it finds the webhook signing secret:
package main
import (
"bytes"
"context"
"io"
"strings"
"time"
"github.com/ory/dockertest/v3"
"github.com/ory/dockertest/v3/docker"
"golang.org/x/sync/errgroup"
)
func GetSigningSecretFromStripeContainer() string {
dp, err := dockertest.NewPool("")
if err != nil {
panic(err)
}
forwarder, err := dp.RunWithOptions(&dockertest.RunOptions{
Repository: "stripe/stripe-cli",
Tag: "v1.19.1",
})
if err != nil {
panic(err)
}
output := &bytes.Buffer{}
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
var signingSecret string
eg := errgroup.Group{}
eg.Go(func() error {
defer cancel()
for {
ln, err := output.ReadString('\n')
if err == io.EOF {
<-time.After(100 * time.Millisecond)
continue
}
if err != nil {
return err
}
if strings.Contains(ln, "Ready!") {
ln = ln[strings.Index(ln, "whsec_"):]
signingSecret = ln[:strings.Index(ln, " ")]
return nil
}
}
})
dp.Client.Logs(docker.LogsOptions{
Context: ctx,
Stderr: true,
Follow: true,
RawTerminal: true,
Container: forwarder.Container.ID,
OutputStream: output,
})
eg.Wait()
return signingSecret
}
func main() {
println(GetSigningSecretFromStripeContainer())
}
So, the issue may already be clear from the description, but here it is spelled out: one goroutine writes to a (growing) byte buffer, another one reads from it, and there is no synchronization. That's a clear data race.
What is interesting here is that we have to pass an io.Writer as the OutputStream to the library, and the library then writes to the writer we passed. We cannot insert a mutex lock around the write site, since we do not control the library and there are no hooks (e.g. pre/post write callbacks) to do so.
We implement our own writer that does the synchronization with a mutex:
type SyncWriter struct {
Writer io.Writer
Mtx *sync.Mutex
}
func NewSyncWriter(w io.Writer, mtx *sync.Mutex) io.Writer {
return &SyncWriter{Writer: w, Mtx: mtx}
}
func (w *SyncWriter) Write(p []byte) (n int, err error) {
w.Mtx.Lock()
defer w.Mtx.Unlock()
written, err := w.Writer.Write(p)
return written, err
}
We pass it as is to the third-party library, and when we want to read the byte buffer, we lock the mutex first:
diff --git a/cmd-sg/main.go b/cmd-sg/main.go
index 5529d90..42571b9 100644
--- a/cmd-sg/main.go
+++ b/cmd-sg/main.go
@@ -5,6 +5,7 @@ import (
"context"
"io"
"strings"
+ "sync"
"time"
"github.com/ory/dockertest/v3"
@@ -12,6 +13,24 @@ import (
"golang.org/x/sync/errgroup"
)
+type SyncWriter struct {
+ Writer io.Writer
+ Mtx *sync.Mutex
+}
+
+func NewSyncWriter(w io.Writer, mtx *sync.Mutex) io.Writer {
+ return &SyncWriter{Writer: w, Mtx: mtx}
+}
+
+func (w *SyncWriter) Write(p []byte) (n int, err error) {
+ w.Mtx.Lock()
+ defer w.Mtx.Unlock()
+
+ written, err := w.Writer.Write(p)
+
+ return written, err
+}
+
func GetSigningSecretFromStripeContainer() string {
dp, err := dockertest.NewPool("")
if err != nil {
@@ -27,6 +46,8 @@ func GetSigningSecretFromStripeContainer() string {
}
output := &bytes.Buffer{}
+ outputMtx := sync.Mutex{}
+ writer := NewSyncWriter(output, &outputMtx)
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
@@ -37,7 +58,9 @@ func GetSigningSecretFromStripeContainer() string {
defer cancel()
for {
+ outputMtx.Lock()
ln, err := output.ReadString('\n')
+ outputMtx.Unlock()
if err == io.EOF {
<-time.After(100 * time.Millisecond)
continue
@@ -59,7 +82,7 @@ func GetSigningSecretFromStripeContainer() string {
Follow: true,
RawTerminal: true,
Container: forwarder.Container.ID,
- OutputStream: output,
+ OutputStream: writer,
})
eg.Wait()
Most types in the Go standard library (or in third-party libraries, for that matter) are not concurrency safe, and synchronization is typically on you. I still often see questions on the internet about this, so assume a type is not concurrency safe until its documentation states otherwise.
It would also be nice if more types had a 'sync' version, e.g. SyncWriter, SyncReader, etc.
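For instance, a reading counterpart to the SyncWriter above could look like this sketch (a hypothetical type, not something the standard library provides):
// SyncReader serializes reads from the wrapped reader with a mutex, mirroring
// the SyncWriter shown earlier.
type SyncReader struct {
	Reader io.Reader
	Mtx    *sync.Mutex
}
func (r *SyncReader) Read(p []byte) (n int, err error) {
	r.Mtx.Lock()
	defer r.Mtx.Unlock()
	return r.Reader.Read(p)
}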
The Go race detector is great but will not detect all data races. Data races will cause you pain and suffering, be it flaky tests, weird production errors, or in the worst case memory corruption.
Due to how easy it is to spawn goroutines without a care in the world (and also to run tests in parallel), it will happen to you. It's not a question of if, just when, how bad, and how many days/weeks it will cost you to find them and fix them.
If you are not running your test suite with the race detector enabled, you have numerous data races in your code. That's just a fact.
Go the language and the Go linter ecosystem do not have nearly enough answers to this problem. Some language features make it way too easy to trigger data races, for example implicit capture of outer variables in closures.
The best option left to Go developers is to try to reach 100% test coverage of their code and to run the tests with the race detector on (go test -race ./...).
We should be able to do better in 2025. Just like with memory safety, when even expert developers regularly produce data races, it's the fault of the language/tooling/APIs/etc. It is not enough to blame humans and demand they just 'do better'.
Ideas for the Go language:
- const in more places. If something is constant, there cannot be data races with it.
- A Clone() function generated by the compiler for every type (like Rust's derive(Clone)). Maybe it's opt-in or opt-out, not sure. Or perhaps it's a built-in like make.
- A freeze() functionality like JavaScript's Object.freeze() to prevent an object from being mutated.
- A better API for Mutex, taking inspiration from other languages. This has been done successfully in the past with WaitGroup compared to using raw goroutines and channels.
Ideas for Go programs:
- Favor immutable data; for example, the string type in Go is immutable.