
Published on 2026-03-11. Last modified on 2026-03-19.

Detect data races with DTrace in any language

For a gentle introduction to DTrace, especially in conjunction with Go, see my past article: An optimization and debugging story with Go and DTrace, or my other DTrace articles.

I recently fixed a data race in Go at work. Well, I fixed a number of them, but this one is easy to understand so I'll use it to talk about race detectors. Since I work on an open-source project, I can share my work which is great!

A data race is a concurrent access to shared data in a way that does not respect the rules of the programming language. Some languages are stricter or looser when they establish how that can happen, but they all forbid some kinds of concurrent (unsynchronized) accesses, typically write-write or read-write.

The symptoms are bizarre and very painful to diagnose: stale reads, inconsistent data, 'impossible' code path taken, crashes, etc.

Since some compilers are very aggressive at re-ordering and optimizing code in the absence of explicit data dependencies, the compiled program can be very different from what the code looks like.

This is exacerbated by another actor: the CPU. On a typical machine there are multiple CPU cores, some on the same die and some on different dies, and they all have to coordinate to access the same memory. Each core also has caches that must be kept coherent with main memory and with the other cores' caches. A data race typically manifests as a stale, out-of-date value being read from one of these caches.

The theory

Go's memory model defines a data race in terms of 'happens-before' and 'synchronized before':

A read-write data race on memory location x consists of a read-like memory operation r on x and a write-like memory operation w on x, at least one of which is non-synchronizing, which are unordered by happens before (that is, neither r happens before w nor w happens before r).

A write-write data race on memory location x consists of two write-like memory operations w and w' on x, at least one of which is non-synchronizing, which are unordered by happens before.

Note that if there are no read-write or write-write data races on memory location x, then any read r on x has only one possible W(r): the single w that immediately precedes it in the happens before order.

More generally, it can be shown that any Go program that is data-race-free, meaning it has no program executions with read-write or write-write data races, can only have outcomes explained by some sequentially consistent interleaving of the goroutine executions. (The proof is the same as Section 7 of Boehm and Adve's paper cited above.) This property is called DRF-SC.

The intent of the formal definition is to match the DRF-SC guarantee provided to race-free programs by other languages, including C, C++, Java, JavaScript, Rust, and Swift.

C's standard uses similar language, section '5.1.2.5 Multi-threaded executions and data races':

[...] Two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location.

[...] The library defines atomic operations (7.17) and operations on mutexes (7.28.4) that are specially identified as synchronization operations. These operations play a special role in making assignments in one thread visible to another. A synchronization operation on one or more memory locations is one of an acquire operation, a release operation, both an acquire and release operation, or a consume operation. A synchronization operation without an associated memory location is a fence and can be either an acquire fence, a release fence, or both an acquire and release fence. In addition, there are relaxed atomic operations, which are not synchronization operations, and atomic read-modify-write operations, which have special characteristics.

In practice

This is great to have formally defined, but it's not very actionable. Many programming languages do not offer enough compile-time guarantees to rule out data races, and thus provide a runtime race detector instead: Go has one built in, C and C++ have Thread Sanitizer (a.k.a. TSan), etc.

Typically these detect some data races but not all, and incur a big performance penalty (I have experienced 5x to 20x slow-downs in real production code). They also typically require that all the code in the program be compiled with the detector enabled, which can be very time-consuming, or impossible (some projects use closed-source libraries). And then there are emerging programming languages that do not yet have a race detector at all.

What can we do then?

What if we could somehow use DTrace to observe our program and, like a runtime race detector, detect invalid accesses? It does not have to catch every race; catching some would already be nice. Even a crude approach could be helpful.

So here is the idea:

- Trace the functions known to read or write the shared data, recording for each memory location the last access: which thread performed it, and whether it was a read or a write.
- When a new access arrives while another thread's access to the same location is still in flight, and at least one of the two is a write, report a possible data race.
- When an access finishes, clear its record.

This can be refined further as we'll see but it's good enough for now, and that's the basis of what Thread Sanitizer does.

A C example

I have reproduced the Go race in C for simplicity: Go inlines function calls quite heavily, and some functions, e.g. append() and len(), are not real functions but builtins, so they are hard to trace. Also, C is a lingua franca, and since there is no runtime (garbage collector, scheduler) to speak of, I find it easier to understand what's going on. But no worries, we'll come back to the Go code in a bit.

In theory DTrace can trace arbitrary instructions and static probes, but in Go static probes are annoying to declare since that requires CGO, and on ARM64 macOS (my current laptop) tracing arbitrary instructions does not work.

The program appends data to a growable byte array in a thread, and reads the length of this byte array in another thread, without synchronization, until an expected value is reached. Textbook read-write data race, but this kind of thing happens in production code when the compiler does not protect us from ourselves:

  
C
#include <assert.h>
#include <pthread.h>
#include <stdint.h> // Needed for `uint8_t`; missing in the original listing.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define DATA_LEN (1 << 20)

typedef struct {
  uint8_t *data;
  size_t len, cap;
} ByteArray;

size_t byte_array_get_len(ByteArray *b) { return b->len; }
size_t byte_array_get_cap(ByteArray *b) { return b->cap; }
void byte_array_set_len(ByteArray *b, size_t len) { b->len = len; }
void byte_array_set_cap(ByteArray *b, size_t cap) { b->cap = cap; }

void byte_array_ensure_cap_at_least(ByteArray *b, size_t cap) {
  size_t current_cap = byte_array_get_cap(b);
  if (current_cap >= cap) {
    return;
  }
  byte_array_set_cap(b, current_cap < 8 ? 8 : current_cap * 2);
  b->data = realloc(b->data, b->cap);
}

void byte_array_push(ByteArray *b, uint8_t elem) {
  const size_t len = byte_array_get_len(b);
  byte_array_ensure_cap_at_least(b, len + 1);
  b->data[len] = elem;
  byte_array_set_len(b, len + 1);
}

void *async_push(void *any) {
  ByteArray *b = any;
  for (size_t i = 0; i < DATA_LEN; i++) {
    byte_array_push(b, (uint8_t)i);
  }
  return NULL;
}

int main() {
  ByteArray byte_array = {0};
  pthread_t thread_push = {0};
  pthread_create(&thread_push, NULL, async_push, &byte_array);

  for (;;) {
    const size_t len = byte_array_get_len(&byte_array);
    if (len == DATA_LEN) {
      break;
    }
  }
  assert(byte_array_get_len(&byte_array) == DATA_LEN);
}

A few notes about this code:

Thread Sanitizer

Now we run it with Thread Sanitizer:

  
Shell
$ clang blog1.c -g -o blog1 -fsanitize=thread
$ ./blog1

And it appears to run just fine, with no warning whatsoever (well, a warning complains that we never call pthread_join, but that's ok).

If you get lucky, you'll see one diagnostic:

  
Plaintext
==================
WARNING: ThreadSanitizer: data race (pid=2145)
  Read of size 8 at 0x00016d946f08 by main thread:
    #0 byte_array_get_len blog1.c:14 (blog1:arm64+0x100000610)
    #1 main blog1.c:59 (blog1:arm64+0x1000009fc)

  Previous write of size 8 at 0x00016d946f08 by thread T1:
    #0 byte_array_set_len blog1.c:18 (blog1:arm64+0x1000006dc)
    #1 byte_array_push blog1.c:39 (blog1:arm64+0x100000904)
    #2 async_push blog1.c:46 (blog1:arm64+0x100000970)

  Location is stack of main thread.

  Thread T1 (tid=272869675, running) created by main thread at:
    #0 pthread_create <null> (libclang_rt.tsan_osx_dynamic.dylib:arm64+0x31ac0)
    #1 main blog1.c:56 (blog1:arm64+0x1000009f0)

SUMMARY: ThreadSanitizer: data race blog1.c:14 in byte_array_get_len
==================

But this was very rare in my testing.

The DTrace race detector

Alright, now let's implement our idea in DTrace:

  
Dtrace
#pragma D option dynvarsize=16m
#pragma D option cleanrate=100hz

typedef enum {AccessKindRead=1, AccessKindWrite=2} AccessKind;

typedef struct {
  AccessKind kind;
  size_t tid;
  int ts;
} Access;

Access accesses[uintptr_t /* data ptr */];
int func_access[string];

BEGIN {
  // Record which kind of access each function does.
  func_access["byte_array_get_len"] = AccessKindRead;
  func_access["byte_array_set_len"] = AccessKindWrite;
}

// Watch `byte_array_get_len` and `byte_array_set_len`.
pid$target::byte_array_?et_len:entry {
  // `arg0` is the pointer to the `ByteArray` structure,
  // and `len` is at offset `8` in the `ByteArray` structure (on 64 bit systems).
  // So this computes the address of `byte_array->len`.
  self->mem_ptr = arg0 + 8;

  this->theirs = accesses[self->mem_ptr];
  // Get the access kind based on the function:
  // `byte_array_get_len` is a read and `byte_array_set_len` is a write.
  this->my_access_kind = func_access[probefunc];
  this->now = timestamp;

  if (this->theirs.tid != 0 &&  // 'if a thread is concurrently accessing the same memory...'
      this->theirs.tid != tid && // 'and it is a different thread from the current one...'
      (this->my_access_kind == AccessKindWrite ||
       this->theirs.kind == AccessKindWrite)) { // 'and at least one access is a write...'
    printf("possible data race: my_access_kind:%d my_tid=%d my_ts=%d their_access_kind:%d their_tid=%d their_ts=%d mem_ptr=%p\n",
        this->my_access_kind, tid, this->now,
        this->theirs.kind, this->theirs.tid, this->theirs.ts,
        self->mem_ptr);
    ustack();
  }

  // Update the map with the current access.
  accesses[self->mem_ptr].kind = this->my_access_kind;
  accesses[self->mem_ptr].tid = tid;
  accesses[self->mem_ptr].ts = this->now;
}

pid$target::byte_array_?et_len:return
/self->mem_ptr != 0/
{
  // Clear the map since the access is done.
  accesses[self->mem_ptr].kind = 0;
  accesses[self->mem_ptr].tid = 0;
  accesses[self->mem_ptr].ts = 0;
  self->mem_ptr = 0;
}

A few notes about this D script:


Now let's run our D script. We see a ton of these messages:

  
Shell
$ sudo dtrace -s ./blog.d -c ./blog1
[...]
 11 242002  byte_array_get_len:entry possible data race: my_access_kind:1 my_tid=273369021 my_ts=1447848961324000 their_access_kind:2 their_tid=273369096 their_ts=-1694029743 mem_ptr=16bd5f118
              blog1`byte_array_get_len
              blog1`main+0x44
              dyld`start+0x1c10

 12 242003  byte_array_set_len:entry possible data race: my_access_kind:2 my_tid=273369096 my_ts=1447848963598000 their_access_kind:1 their_tid=273369021 their_ts=-1691755409 mem_ptr=16bd5f118
              blog1`byte_array_set_len
              blog1`byte_array_push+0x58
              blog1`async_push+0x40
              libsystem_pthread.dylib`_pthread_start+0x88
              libsystem_pthread.dylib`thread_start+0x8
[...]

Indeed, the main thread, calling byte_array_get_len, has a data race with the other thread, which calls byte_array_set_len, as expected. We can even see from the printed accesses that it is a read-write data race.

Now let's fix the program crudely with a mutex:

  
Diff
--- blog1.c	2026-03-13 16:39:18
+++ blog_mtx.c	2026-03-13 16:38:48
@@ -9,6 +9,7 @@
 typedef struct {
   uint8_t *data;
   size_t len, cap;
+  pthread_mutex_t mtx;
 } ByteArray;
 
 size_t byte_array_get_len(ByteArray *b) { return b->len; }
@@ -32,11 +33,14 @@
 }
 
 void byte_array_push(ByteArray *b, uint8_t elem) {
+  pthread_mutex_lock(&b->mtx);
+
   const size_t len = byte_array_get_len(b);
   byte_array_ensure_cap_at_least(b, len + 1);
   b->data[len] = elem;
   byte_array_set_len(b, len + 1);
+  pthread_mutex_unlock(&b->mtx);
 }
 
 void *async_push(void *any) {
@@ -50,13 +54,15 @@
 }
 
 int main() {
-  ByteArray byte_array = {0};
+  ByteArray byte_array = {.mtx = PTHREAD_MUTEX_INITIALIZER};
   pthread_t thread_push = {0};
   pthread_create(&thread_push, NULL, async_push, &byte_array);
 
   for (;;) {
+    pthread_mutex_lock(&byte_array.mtx);
     const size_t len = byte_array_get_len(&byte_array);
+    pthread_mutex_unlock(&byte_array.mtx);
     if (len == DATA_LEN) {
       break;
     }

And now our D script shows no warning of potential data races. Yay!

There are many ways to fix this race, and this one might not be the best since it will cause a lot of lock contention, so let's try another approach.

Improvements

Now you might be thinking: wait a minute, how do we know that the D script is actually working? It seems to track concurrent accesses, but the mutex fix creates a critical (or 'exclusive') section where only one thread has access to the data structure at any point. As such, there are no concurrent accesses at all.

Let's check the correctness by applying a smarter, possibly more performant fix of the data race using a RW lock (sync.RWMutex in Go):

  
Diff
--- blog1.c	2026-03-13 16:39:18
+++ blog_rwlock.c	2026-03-13 16:38:58
@@ -9,6 +9,7 @@
 typedef struct {
   uint8_t *data;
   size_t len, cap;
+  pthread_rwlock_t rw_lock;
 } ByteArray;
 
 size_t byte_array_get_len(ByteArray *b) { return b->len; }
@@ -32,11 +33,15 @@
 }
 
 void byte_array_push(ByteArray *b, uint8_t elem) {
+  pthread_rwlock_wrlock(&b->rw_lock);
+
   const size_t len = byte_array_get_len(b);
   byte_array_ensure_cap_at_least(b, len + 1);
   b->data[len] = elem;
   byte_array_set_len(b, len + 1);
+
+  pthread_rwlock_unlock(&b->rw_lock);
 }
 
 void *async_push(void *any) {
@@ -50,13 +55,15 @@
 }
 
 int main() {
-  ByteArray byte_array = {0};
+  ByteArray byte_array = {.rw_lock = PTHREAD_RWLOCK_INITIALIZER};
   pthread_t thread_push = {0};
   pthread_create(&thread_push, NULL, async_push, &byte_array);
 
   for (;;) {
+    pthread_rwlock_rdlock(&byte_array.rw_lock);
     const size_t len = byte_array_get_len(&byte_array);
+    pthread_rwlock_unlock(&byte_array.rw_lock);
     if (len == DATA_LEN) {
       break;
     }

The code is very similar in structure to the mutex version, but it now allows N concurrent readers or one exclusive writer. That means multiple threads can now read the same address in memory concurrently.

To take full advantage of this, we spawn another thread that also reads the length in a loop:

  
Diff
diff --git a/blog_rwlock.c b/blog_rwlock.c
index a402298..271acf5 100644
--- a/blog_rwlock.c
+++ b/blog_rwlock.c
@@ -54,12 +54,29 @@ void *async_push(void *any) {
   return NULL;
 }
 
+void *async_read(void *any) {
+  ByteArray *b = any;
+
+  for (;;) {
+    pthread_rwlock_rdlock(&b->rw_lock);
+    const size_t len = byte_array_get_len(b);
+    pthread_rwlock_unlock(&b->rw_lock);
+    if (len == DATA_LEN) {
+      break;
+    }
+  }
+  return NULL;
+}
+
 int main() {
   ByteArray byte_array = {.rw_lock = PTHREAD_RWLOCK_INITIALIZER};
   pthread_t thread_push = {0};
   pthread_create(&thread_push, NULL, async_push, &byte_array);
 
+  pthread_t thread_read = {0};
+  pthread_create(&thread_read, NULL, async_read, &byte_array);
+
   for (;;) {
     pthread_rwlock_rdlock(&byte_array.rw_lock);
     const size_t len = byte_array_get_len(&byte_array);

Let's re-run our D script:

  
Shell
$ sudo dtrace -s ./blog.d -c ./blog_rwlock

And... no warnings, as expected. Terrific!

This is the clause that does all the heavy lifting:

  
Dtrace
(this->my_access_kind == AccessKindWrite ||
 this->theirs.kind == AccessKindWrite)) { // 'and at least one access is a write...'

N concurrent readers are fine. Indeed, if we comment out this condition, we see that we do have two concurrent readers, and lots of false positives get printed:

  
Plaintext
 13 6893  byte_array_get_len:entry possible data race: my_access_kind:1 my_tid=274158070 my_ts=1449489633975000 their_access_kind:1 their_tid=274158134 their_ts=0 mem_ptr=16eed3030
            blog_rwlock`byte_array_get_len
            blog_rwlock`main+0x9c
            dyld`start+0x1c10

 11 6893  byte_array_get_len:entry possible data race: my_access_kind:1 my_tid=274158134 my_ts=1449489634196625 their_access_kind:1 their_tid=274158070 their_ts=-1698664731 mem_ptr=16eed3030
            blog_rwlock`byte_array_get_len
            blog_rwlock`async_read+0x30
            libsystem_pthread.dylib`_pthread_start+0x88
            libsystem_pthread.dylib`thread_start+0x8

Benchmarks

                                     Runtime    Runtime with Thread Sanitizer    Runtime with DTrace race detector
Racy program, debug mode             6.3 ms     275.8 ms                         7.05 s
Racy program, release mode           DNF 1      DNF 1                            DNF 1
Program with mutex, release mode     44.5 ms    714 ms                           156 ms
Program with RW lock, release mode   4.3 s      4.9 s                            4.2 s

How the benchmarks were done:

Commentary:

Back to Go

Alright, now that we have something working, let's try it on the original racy Go program.

Due to the nature of Go, we have to make minor adjustments:

  
Dtrace
#pragma D option dynvarsize=16m
#pragma D option cleanrate=100hz

typedef enum {AccessKindRead=1, AccessKindWrite=2} AccessKind;

typedef struct {
  AccessKind kind;
  size_t tid;
  int ts;
} Access;

typedef struct {
  void* data;
  size_t len, cap;
} GoSlice;

Access accesses[uintptr_t /* data ptr */];
int func_access[string];
uintptr_t gid_to_mem[int];

BEGIN {
  // Record which kind of access each function does.
  func_access["github.com/ory/kratos/courier_test.TestQueueHTTPEmail.func3"] = AccessKindRead;
  func_access["github.com/ory/kratos/courier_test.TestQueueHTTPEmail.func1"] = AccessKindWrite;
}

pid$target::*TestQueueHTTPEmail.func1:entry {
  this->my_access_kind = func_access[probefunc];
  this->now = timestamp;
  this->goroutine_id = uregs[R_X28];
  this->ptr_to_slice_header = *(uintptr_t*)copyin(uregs[R_X26] + 16, 8);
  this->go_slice = (GoSlice*)copyin(this->ptr_to_slice_header, sizeof(GoSlice));
  this->mem_ptr = (int)this->ptr_to_slice_header;
  this->theirs = accesses[this->mem_ptr];

  if (this->theirs.tid != 0 &&  // 'if a goroutine is concurrently accessing the same memory...'
      this->theirs.tid != this->goroutine_id && // 'and it is a different goroutine from the current one...'
      (this->my_access_kind == AccessKindWrite ||
       this->theirs.kind == AccessKindWrite)) { // 'and at least one access is a write...'
    printf("possible data race: my_access_kind:%d my_tid=%d my_ts=%d their_access_kind:%d their_tid=%d their_ts=%d mem_ptr=%p\n",
        this->my_access_kind, this->goroutine_id, this->now,
        this->theirs.kind, this->theirs.tid, this->theirs.ts,
        this->mem_ptr);
    ustack();
  }

  // Update the map with the current access.
  accesses[this->mem_ptr].kind = this->my_access_kind;
  accesses[this->mem_ptr].tid = this->goroutine_id;
  accesses[this->mem_ptr].ts = this->now;
  gid_to_mem[this->goroutine_id] = this->mem_ptr;
}

pid$target::*TestQueueHTTPEmail.func3:entry {
  this->my_access_kind = func_access[probefunc];
  this->now = timestamp;
  this->goroutine_id = uregs[R_X28];
  this->ptr_to_slice_header = *(uintptr_t*)copyin(uregs[R_X26] + 8, 8);
  this->go_slice = (GoSlice*)copyin(this->ptr_to_slice_header, sizeof(GoSlice));
  this->mem_ptr = (int)this->ptr_to_slice_header;
  this->theirs = accesses[this->mem_ptr];

  if (this->theirs.tid != 0 &&  // 'if a goroutine is concurrently accessing the same memory...'
      this->theirs.tid != this->goroutine_id && // 'and it is a different goroutine from the current one...'
      (this->my_access_kind == AccessKindWrite ||
       this->theirs.kind == AccessKindWrite)) { // 'and at least one access is a write...'
    printf("possible data race: my_access_kind:%d my_tid=%d my_ts=%d their_access_kind:%d their_tid=%d their_ts=%d mem_ptr=%p\n",
        this->my_access_kind, this->goroutine_id, this->now,
        this->theirs.kind, this->theirs.tid, this->theirs.ts,
        this->mem_ptr);
    ustack();
  }

  // Update the map with the current access.
  accesses[this->mem_ptr].kind = this->my_access_kind;
  accesses[this->mem_ptr].tid = this->goroutine_id;
  accesses[this->mem_ptr].ts = this->now;
  gid_to_mem[this->goroutine_id] = this->mem_ptr;
}

pid$target::*TestQueueHTTPEmail.func1:return,
pid$target::*TestQueueHTTPEmail.func3:return {
  this->goroutine_id = uregs[R_X28];
  this->mem_ptr = gid_to_mem[this->goroutine_id];
  if (this->mem_ptr != 0) {
    // Clear the map since the access is done.
    accesses[this->mem_ptr].kind = 0;
    accesses[this->mem_ptr].tid = 0;
    accesses[this->mem_ptr].ts = 0;
    gid_to_mem[this->goroutine_id] = 0;
  }
}

A few notes about Go's generated assembly, on ARM64, per the Go ABI:

This is a small difference between the Go and C versions: for C we watched for concurrent accesses to the length field of the byte array, while for Go we watch for concurrent accesses to the slice header. There isn't a deep reason for that, I just wanted to tweak the approach slightly.

When we run it, we see that the read-write race is detected:

  
Plaintext
 12 13933  github.com/ory/kratos/courier_test.TestQueueHTTPEmail.func1:entry possible data race: my_access_kind:2 my_tid=114567426771776 my_ts=84677501145916 their_access_kind:1 their_tid=64 their_ts=0 mem_ptr=cc832d38
             courier.test`github.com/ory/kratos/courier_test.TestQueueHTTPEmail.func1
             courier.test`net/http.HandlerFunc.ServeHTTP+0x38
             courier.test`net/http.serverHandler.ServeHTTP+0xb0
             courier.test`net/http.(*conn).serve+0x51c
             courier.test`net/http.(*Server).Serve.gowrap3+0x24
             courier.test`runtime.goexit.abi0+0x4

  9 13934  github.com/ory/kratos/courier_test.TestQueueHTTPEmail.func3:entry possible data race: my_access_kind:1 my_tid=114567393856320 my_ts=84677501654791 their_access_kind:2 their_tid=114567426771776 their_ts=-2073554395 mem_ptr=cc832d38
             courier.test`github.com/ory/kratos/courier_test.TestQueueHTTPEmail.func3
             courier.test`github.com/ory/x/resilience.Retry+0x28c
             courier.test`github.com/ory/kratos/courier_test.TestQueueHTTPEmail+0x3750
             courier.test`testing.tRunner+0xc4
             courier.test`testing.(*T).Run.gowrap1+0x20
             courier.test`runtime.goexit.abi0+0x4

Conclusion

Thread Sanitizer does a lot more than what we have covered, because it understands all the synchronization primitives: condition variables, atomics, etc. It tracks 'happens before' relationships between threads that call these primitives. We do not do that in our crude D script, even though we perhaps could, with a good amount of post-processing in a general purpose language.

Another major difference is that Thread Sanitizer is general purpose and tries to track all memory accesses, whereas our DTrace approach selectively tracks a few memory accesses: we 'zoom in' on a specific piece of code.

Even so, I think this is already a pretty good approach, in the spirit of '20% of the work gets you 80% of the way there'. The main appeal is that it avoids rebuilding the whole program with different flags, which can be very time-consuming, or sometimes not feasible at all.

However, deciphering assembly produced by the Go compiler is non-trivial, so if you have access to DTrace static probes, I think it's much, much easier that way.

Finally, remember that neither our DTrace approach nor Thread Sanitizer guarantees that all data races will be caught: these are runtime detectors that only see the code paths actually taken during the observed run, and they operate with a limited amount of memory, so they cannot remember every memory access in the program; they only make a best effort to remember most of them. They can show the presence of bugs, never prove their absence.

In fact, while writing this article and the accompanying test programs, Thread Sanitizer only very rarely flagged the glaring data races. That was motivating, as well as terrifying.

My recommendation would still be to use your programming language or platform recommended race detector if possible. DTrace is a good fallback if that's not feasible or if you want to avoid recompiling, for example when troubleshooting in the production environment.

Addendum: The full code

The D script for C
  
Dtrace
#pragma D option dynvarsize=16m
#pragma D option cleanrate=100hz

typedef enum {AccessKindRead=1, AccessKindWrite=2} AccessKind;

typedef struct {
  AccessKind kind;
  size_t tid;
  int ts;
} Access;

Access accesses[uintptr_t /* data ptr */];
int func_access[string];

BEGIN {
  // Record which kind of access each function does.
  func_access["byte_array_get_len"] = AccessKindRead;
  func_access["byte_array_set_len"] = AccessKindWrite;
}

pid$target::byte_array_?et_len:entry {
  // `arg0` is the pointer to the `ByteArray` structure,
  // and `len` is at offset `8` in the `ByteArray` structure.
  // So this computes the address of `byte_array->len`.
  self->mem_ptr = arg0 + 8;

  this->theirs = accesses[self->mem_ptr];
  this->my_access_kind = func_access[probefunc];
  this->now = timestamp;

  if (this->theirs.tid != 0 &&  // 'if a thread is concurrently accessing the same memory...'
      this->theirs.tid != tid && // 'and it is a different thread from the current one...'
      (this->my_access_kind == AccessKindWrite ||
       this->theirs.kind == AccessKindWrite)) { // 'and at least one access is a write...'
    printf("possible data race: my_access_kind:%d my_tid=%d my_ts=%d their_access_kind:%d their_tid=%d their_ts=%d mem_ptr=%p\n",
        this->my_access_kind, tid, this->now,
        this->theirs.kind, this->theirs.tid, this->theirs.ts,
        self->mem_ptr);
    ustack();
  }

  // Update the map with the current access.
  accesses[self->mem_ptr].kind = this->my_access_kind;
  accesses[self->mem_ptr].tid = tid;
  accesses[self->mem_ptr].ts = this->now;
}

pid$target::byte_array_?et_len:return
/self->mem_ptr != 0/
{
  // Clear the map since the access is done.
  accesses[self->mem_ptr].kind = 0;
  accesses[self->mem_ptr].tid = 0;
  accesses[self->mem_ptr].ts = 0;
  self->mem_ptr = 0;
}
The D script for Go
  
Dtrace
#pragma D option dynvarsize=16m
#pragma D option cleanrate=100hz

typedef enum {AccessKindRead=1, AccessKindWrite=2} AccessKind;

typedef struct {
  AccessKind kind;
  size_t tid;
  int ts;
} Access;

typedef struct {
  void* data;
  size_t len, cap;
} GoSlice;

Access accesses[uintptr_t /* data ptr */];
int func_access[string];
uintptr_t gid_to_mem[int];

BEGIN {
  // Record which kind of access each function does.
  func_access["github.com/ory/kratos/courier_test.TestQueueHTTPEmail.func3"] = AccessKindRead;
  func_access["github.com/ory/kratos/courier_test.TestQueueHTTPEmail.func1"] = AccessKindWrite;
}

pid$target::*TestQueueHTTPEmail.func1:entry {
  this->my_access_kind = func_access[probefunc];
  this->now = timestamp;
  this->goroutine_id = uregs[R_X28];
  this->ptr_to_slice_header = *(uintptr_t*)copyin(uregs[R_X26] + 16, 8);
  this->go_slice = (GoSlice*)copyin(this->ptr_to_slice_header, sizeof(GoSlice));
  this->mem_ptr = (int)this->ptr_to_slice_header;
  this->theirs = accesses[this->mem_ptr];

  if (this->theirs.tid != 0 &&  // 'if a goroutine is concurrently accessing the same memory...'
      this->theirs.tid != this->goroutine_id && // 'and it is a different goroutine from the current one...'
      (this->my_access_kind == AccessKindWrite ||
       this->theirs.kind == AccessKindWrite)) { // 'and at least one access is a write...'
    printf("possible data race: my_access_kind:%d my_tid=%d my_ts=%d their_access_kind:%d their_tid=%d their_ts=%d mem_ptr=%p\n",
        this->my_access_kind, this->goroutine_id, this->now,
        this->theirs.kind, this->theirs.tid, this->theirs.ts,
        this->mem_ptr);
    ustack();
  }

  // Update the map with the current access.
  accesses[this->mem_ptr].kind = this->my_access_kind;
  accesses[this->mem_ptr].tid = this->goroutine_id;
  accesses[this->mem_ptr].ts = this->now;
  gid_to_mem[this->goroutine_id] = this->mem_ptr;
}

pid$target::*TestQueueHTTPEmail.func3:entry {
  this->my_access_kind = func_access[probefunc];
  this->now = timestamp;
  this->goroutine_id = uregs[R_X28];
  this->ptr_to_slice_header = *(uintptr_t*)copyin(uregs[R_X26] + 8, 8);
  this->go_slice = (GoSlice*)copyin(this->ptr_to_slice_header, sizeof(GoSlice));
  this->mem_ptr = (int)this->ptr_to_slice_header;
  this->theirs = accesses[this->mem_ptr];

  if (this->theirs.tid != 0 &&  // 'if a goroutine is concurrently accessing the same memory...'
      this->theirs.tid != this->goroutine_id && // 'and it is a different goroutine from the current one...'
      (this->my_access_kind == AccessKindWrite ||
       this->theirs.kind == AccessKindWrite)) { // 'and at least one access is a write...'
    printf("possible data race: my_access_kind:%d my_tid=%d my_ts=%d their_access_kind:%d their_tid=%d their_ts=%d mem_ptr=%p\n",
        this->my_access_kind, this->goroutine_id, this->now,
        this->theirs.kind, this->theirs.tid, this->theirs.ts,
        this->mem_ptr);
    ustack();
  }

  // Update the map with the current access.
  accesses[this->mem_ptr].kind = this->my_access_kind;
  accesses[this->mem_ptr].tid = this->goroutine_id;
  accesses[this->mem_ptr].ts = this->now;
  gid_to_mem[this->goroutine_id] = this->mem_ptr;
}

pid$target::*TestQueueHTTPEmail.func1:return,
pid$target::*TestQueueHTTPEmail.func3:return {
  this->goroutine_id = uregs[R_X28];
  this->mem_ptr = gid_to_mem[this->goroutine_id];
  if (this->mem_ptr != 0) {
    // Clear the map since the access is done.
    accesses[this->mem_ptr].kind = 0;
    accesses[this->mem_ptr].tid = 0;
    accesses[this->mem_ptr].ts = 0;
    gid_to_mem[this->goroutine_id] = 0;
  }
}
The racy C program
  
C
#include <assert.h>
#include <pthread.h>
#include <stdint.h> // Needed for `uint8_t`; missing in the original listing.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define DATA_LEN (1 << 20)

typedef struct {
  uint8_t *data;
  size_t len, cap;
} ByteArray;

size_t byte_array_get_len(ByteArray *b) { return b->len; }
size_t byte_array_get_cap(ByteArray *b) { return b->cap; }
void byte_array_set_len(ByteArray *b, size_t len) { b->len = len; }
void byte_array_set_cap(ByteArray *b, size_t cap) { b->cap = cap; }

void byte_array_ensure_cap_at_least(ByteArray *b, size_t cap) {
  size_t current_cap = byte_array_get_cap(b);
  if (current_cap >= cap) {
    return;
  }
  byte_array_set_cap(b, current_cap < 8 ? 8 : current_cap * 2);
  b->data = realloc(b->data, b->cap);
}

void byte_array_push(ByteArray *b, uint8_t elem) {
  const size_t len = byte_array_get_len(b);
  byte_array_ensure_cap_at_least(b, len + 1);
  b->data[len] = elem;
  byte_array_set_len(b, len + 1);
}

void *async_push(void *any) {
  ByteArray *b = any;
  for (size_t i = 0; i < DATA_LEN; i++) {
    byte_array_push(b, (uint8_t)i);
  }
  return NULL;
}

int main() {
  ByteArray byte_array = {0};
  pthread_t thread_push = {0};
  pthread_create(&thread_push, NULL, async_push, &byte_array);

  for (;;) {
    const size_t len = byte_array_get_len(&byte_array);
    if (len == DATA_LEN) {
      break;
    }
  }
  assert(byte_array_get_len(&byte_array) == DATA_LEN);
}
The C program fixed with a mutex
  
C
#include <assert.h>
#include <pthread.h>
#include <stdint.h> // Needed for `uint8_t`; missing in the original listing.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define DATA_LEN (1 << 20)

typedef struct {
  uint8_t *data;
  size_t len, cap;
  pthread_mutex_t mtx;
} ByteArray;

size_t byte_array_get_len(ByteArray *b) { return b->len; }
size_t byte_array_get_cap(ByteArray *b) { return b->cap; }
void byte_array_set_len(ByteArray *b, size_t len) { b->len = len; }
void byte_array_set_cap(ByteArray *b, size_t cap) { b->cap = cap; }

void byte_array_ensure_cap_at_least(ByteArray *b, size_t cap) {
  size_t current_cap = byte_array_get_cap(b);
  if (current_cap >= cap) {
    return;
  }
  byte_array_set_cap(b, current_cap < 8 ? 8 : current_cap * 2);
  b->data = realloc(b->data, b->cap);
}

void byte_array_push(ByteArray *b, uint8_t elem) {
  pthread_mutex_lock(&b->mtx);

  const size_t len = byte_array_get_len(b);
  byte_array_ensure_cap_at_least(b, len + 1);
  b->data[len] = elem;
  byte_array_set_len(b, len + 1);
  pthread_mutex_unlock(&b->mtx);
}

void *async_push(void *any) {
  ByteArray *b = any;
  for (size_t i = 0; i < DATA_LEN; i++) {
    byte_array_push(b, (uint8_t)i);
  }
  return NULL;
}

int main() {
  ByteArray byte_array = {.mtx = PTHREAD_MUTEX_INITIALIZER};
  pthread_t thread_push = {0};
  pthread_create(&thread_push, NULL, async_push, &byte_array);

  for (;;) {
    pthread_mutex_lock(&byte_array.mtx);
    const size_t len = byte_array_get_len(&byte_array);
    pthread_mutex_unlock(&byte_array.mtx);
    if (len == DATA_LEN) {
      break;
    }
  }
  assert(byte_array_get_len(&byte_array) == DATA_LEN);
}
The C program fixed with a RW lock
  
C
#include <assert.h>
#include <pthread.h>
#include <stdint.h> // Needed for `uint8_t`; missing in the original listing.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define DATA_LEN (1 << 20)

typedef struct {
  uint8_t *data;
  size_t len, cap;
  pthread_rwlock_t rw_lock;
} ByteArray;

size_t byte_array_get_len(ByteArray *b) { return b->len; }
size_t byte_array_get_cap(ByteArray *b) { return b->cap; }
void byte_array_set_len(ByteArray *b, size_t len) { b->len = len; }
void byte_array_set_cap(ByteArray *b, size_t cap) { b->cap = cap; }

void byte_array_ensure_cap_at_least(ByteArray *b, size_t cap) {
  size_t current_cap = byte_array_get_cap(b);
  if (current_cap >= cap) {
    return;
  }
  byte_array_set_cap(b, current_cap < 8 ? 8 : current_cap * 2);
  b->data = realloc(b->data, b->cap);
}

void byte_array_push(ByteArray *b, uint8_t elem) {
  pthread_rwlock_wrlock(&b->rw_lock);

  const size_t len = byte_array_get_len(b);
  byte_array_ensure_cap_at_least(b, len + 1);
  b->data[len] = elem;
  byte_array_set_len(b, len + 1);

  pthread_rwlock_unlock(&b->rw_lock);
}

void *async_push(void *any) {
  ByteArray *b = any;
  for (size_t i = 0; i < DATA_LEN; i++) {
    byte_array_push(b, (uint8_t)i);
  }
  return NULL;
}

void *async_read(void *any) {
  ByteArray *b = any;

  for (;;) {
    pthread_rwlock_rdlock(&b->rw_lock);
    const size_t len = byte_array_get_len(b);
    pthread_rwlock_unlock(&b->rw_lock);
    if (len == DATA_LEN) {
      break;
    }
  }
  return NULL;
}

int main() {
  ByteArray byte_array = {.rw_lock = PTHREAD_RWLOCK_INITIALIZER};
  pthread_t thread_push = {0};
  pthread_create(&thread_push, NULL, async_push, &byte_array);

  pthread_t thread_read = {0};
  pthread_create(&thread_read, NULL, async_read, &byte_array);

  for (;;) {
    pthread_rwlock_rdlock(&byte_array.rw_lock);
    const size_t len = byte_array_get_len(&byte_array);
    pthread_rwlock_unlock(&byte_array.rw_lock);
    if (len == DATA_LEN) {
      break;
    }
  }
  assert(byte_array_get_len(&byte_array) == DATA_LEN);
}
  1. DNF means Did Not Finish, and that's because the compiler compiles the loop polling for the length of the array into an infinite loop with an empty body, because the read of the length is hoisted out of the loop body, as previously explained.
