Background#1: working on Linux system diagnosis tools, I often end up 
scraping the process and system filesystems quite extensively using 
os.ReadDir. On busy systems this might be a place to optimize things a 
little bit, or at least see if it does reap any benefits.

Background#2: "*You can copy the ReadDir source code and customize it. 
Anyway, I think the os.(*File).ReadDir call is hugely *[sic!] *slower than 
the Sort call*[...] 
<https://groups.google.com/g/golang-nuts/c/Q7hYQ9GdX9Q/m/l9WQZhPp5yEJ>" 
(from golang-nuts discussion "io.ioutil.ReadDir sort by default")

So I learned how to write benchmarks using "testing"; for reference, here's 
my benchmarking code at the time of this writing readdir_bench_test.go 
<https://github.com/thediveo/faf/blob/51a7f846d3f851ac65ec9d6b9b03d96cf7365154/readdir_bench_test.go>
 
(Github).

The system used is Ubuntu 24.10 with kernel 6.8.0-41-generic #41-Ubuntu SMP 
PREEMPT_DYNAMIC, the CPU is a AMD Ryzen 9 7950X 16-Core Processor. I run 
the tests inside a "taskset" command so that the Go OS-level threads get 
scheduled only to a certain set of logical CPUs. It seems that explicitly 
scheduling only on one of two hyper-threaded CPUs doesn't seem to have 
significant impact on my results.

First, let's benchmark os.ReadDir (benchmark code 
<https://github.com/thediveo/faf/blob/51a7f846d3f851ac65ec9d6b9b03d96cf7365154/readdir_bench_test.go#L95C1-L103C1>)
 
against the os.Open/os.(*File).ReadDir replacement, where the replacement 
benchmark 
code 
<https://github.com/thediveo/faf/blob/51a7f846d3f851ac65ec9d6b9b03d96cf7365154/readdir_bench_test.go#L105C1-L117C1>
 
lacks the sorting. Let's benchmark with a small number of 16 directory 
entries:

taskset -c 24-31 go test -bench=ReadDir -run=^$ -cpu=2 -benchmem 
-benchtime=10s -dir-entries=16

goos: linux
goarch: amd64
pkg: github.com/thediveo/faf
cpu: AMD Ryzen 9 7950X 16-Core Processor
BenchmarkReadDir/os.ReadDir-2            2119070              5650 ns/op   
         1719 B/op         32 allocs/op
BenchmarkReadDir/os.File.ReadDir-2       1973565              6106 ns/op   
         1718 B/op         32 allocs/op


Oh my red-nosed deer, that back-fired, this optimization is slower. The 
reason can be explained when diving into the stdlib sources, and the 
internal os.openDir 
<https://cs.opensource.google/go/go/+/refs/tags/go1.23.4:src/os/file.go;l=394> 
in particular: a nice comment explains it:

// openDir opens a file which is assumed to be a directory. As such, it 
skips
// the syscalls that make the file descriptor non-blocking as these take 
time
// and will fail on file descriptors for directories.
func openDir(name string) (*File, error) {
    testlog.Open(name)
    return openDirNolog(name)
}

The backfiring disappears for larger directories, such as with 1024 entries:

taskset -c 24-31 go test -bench=ReadDir -run=^$ -cpu=1,2 -benchmem 
-benchtime=10s -dir-entries=1024

goos: linux
goarch: amd64
pkg: github.com/thediveo/faf
cpu: AMD Ryzen 9 7950X 16-Core Processor
BenchmarkReadDir/os.ReadDir-2              52126            230698 ns/op   
       128720 B/op       2055 allocs/op
BenchmarkReadDir/os.File.ReadDir-2         76035            158164 ns/op   
       128657 B/op       2055 allocs/op

For a large directory with 1024 entries, kicking out sorting results in 
about 30% less runtime, whereas for 16 entries, we are almost 10% slower.

Unfortunately, openDir is internal and resorting to linkname tactics will 
come to an end anyway. So let's try what happens, if we simply just use 
os.Open and then throw the resulting fd into os.NewFile (benchmark code 
<https://github.com/thediveo/faf/blob/51a7f846d3f851ac65ec9d6b9b03d96cf7365154/readdir_bench_test.go#L119C1-L132C1>),
 
for 16 entries:

BenchmarkReadDir/os.ReadDir-2            2119070              5650 ns/op   
         1719 B/op         32 allocs/op
BenchmarkReadDir/os.NewFile-2            2218638              5338 ns/op   
         1718 B/op         32 allocs/op

Yes, this brings us back into the benchmark game, we're now faster even for 
these small directories than a plain os.ReadDir.

For reference, these are the abbreviated results for 1024 entries, which 
are much more favorable now, around 1/3 faster:

BenchmarkReadDir/os.ReadDir-2              52126            230698 ns/op   
       128720 B/op       2055 allocs/op
BenchmarkReadDir/os.NewFile-2              76341            156573 ns/op   
       128658 B/op       2055 allocs/op

If you take a look at the linked benchmarks then you will notice that 
there's a fourth benchmark that optimizes further by pushing entries to a 
consumer using the Go 1.23 iterator pattern and using []byte entry names 
instead of strings, where the entry data is valid only for the particular 
iteration. In my use cases I often don't keep the directory entry 
information as-is anyway, so any heap allocations avoided might pay for 
good. Just for reference, for 1024 entries:

BenchmarkReadDir/os.ReadDir-2              52126            230698 ns/op   
       128720 B/op       2055 allocs/op
BenchmarkReadDir/faf.ReadDir-2            106552            111842 ns/op   
           48 B/op          1 allocs/op

Hopefully my venture into benchmarking os.ReadDir might be of interest to 
someone. Happy Go-ing!

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion visit 
https://groups.google.com/d/msgid/golang-nuts/5d7f325a-8c21-45a2-8eaa-cb5c540b68b1n%40googlegroups.com.

Reply via email to