Background#1: working on Linux system diagnosis tools, I often end up scraping the process and system filesystems quite extensively using os.ReadDir. On busy systems this might be a place to optimize things a little bit, or at least see if it does reap any benefits.
Background#2: "*You can copy the ReadDir source code and customize it. Anyway, I think the os.(*File).ReadDir call is hugely *[sic!] *slower than the Sort call*[...] <https://groups.google.com/g/golang-nuts/c/Q7hYQ9GdX9Q/m/l9WQZhPp5yEJ>" (from golang-nuts discussion "io.ioutil.ReadDir sort by default") So I learned how to write benchmarks using "testing"; for reference, here's my benchmarking code at the time of this writing readdir_bench_test.go <https://github.com/thediveo/faf/blob/51a7f846d3f851ac65ec9d6b9b03d96cf7365154/readdir_bench_test.go> (Github). The system used is Ubuntu 24.10 with kernel 6.8.0-41-generic #41-Ubuntu SMP PREEMPT_DYNAMIC, the CPU is a AMD Ryzen 9 7950X 16-Core Processor. I run the tests inside a "taskset" command so that the Go OS-level threads get scheduled only to a certain set of logical CPUs. It seems that explicitly scheduling only on one of two hyper-threaded CPUs doesn't seem to have significant impact on my results. First, let's benchmark os.ReadDir (benchmark code <https://github.com/thediveo/faf/blob/51a7f846d3f851ac65ec9d6b9b03d96cf7365154/readdir_bench_test.go#L95C1-L103C1>) against the os.Open/os.(*File).ReadDir replacement, where the replacement benchmark code <https://github.com/thediveo/faf/blob/51a7f846d3f851ac65ec9d6b9b03d96cf7365154/readdir_bench_test.go#L105C1-L117C1> lacks the sorting. Let's benchmark with a small number of 16 directory entries: taskset -c 24-31 go test -bench=ReadDir -run=^$ -cpu=2 -benchmem -benchtime=10s -dir-entries=16 goos: linux goarch: amd64 pkg: github.com/thediveo/faf cpu: AMD Ryzen 9 7950X 16-Core Processor BenchmarkReadDir/os.ReadDir-2 2119070 5650 ns/op 1719 B/op 32 allocs/op BenchmarkReadDir/os.File.ReadDir-2 1973565 6106 ns/op 1718 B/op 32 allocs/op Oh my red-nosed deer, that back-fired, this optimization is slower. The reason can be explained when diving into the stdlib sources, and the internal os.openDir <https://cs.opensource.google/go/go/+/refs/tags/go1.23.4:src/os/file.go;l=394> in particular: a nice comment explains it: // openDir opens a file which is assumed to be a directory. As such, it skips // the syscalls that make the file descriptor non-blocking as these take time // and will fail on file descriptors for directories. func openDir(name string) (*File, error) { testlog.Open(name) return openDirNolog(name) } The backfiring disappears for larger directories, such as with 1024 entries: taskset -c 24-31 go test -bench=ReadDir -run=^$ -cpu=1,2 -benchmem -benchtime=10s -dir-entries=1024 goos: linux goarch: amd64 pkg: github.com/thediveo/faf cpu: AMD Ryzen 9 7950X 16-Core Processor BenchmarkReadDir/os.ReadDir-2 52126 230698 ns/op 128720 B/op 2055 allocs/op BenchmarkReadDir/os.File.ReadDir-2 76035 158164 ns/op 128657 B/op 2055 allocs/op For a large directory with 1024 entries, kicking out sorting results in about 30% less runtime, whereas for 16 entries, we are almost 10% slower. Unfortunately, openDir is internal and resorting to linkname tactics will come to an end anyway. So let's try what happens, if we simply just use os.Open and then throw the resulting fd into os.NewFile (benchmark code <https://github.com/thediveo/faf/blob/51a7f846d3f851ac65ec9d6b9b03d96cf7365154/readdir_bench_test.go#L119C1-L132C1>), for 16 entries: BenchmarkReadDir/os.ReadDir-2 2119070 5650 ns/op 1719 B/op 32 allocs/op BenchmarkReadDir/os.NewFile-2 2218638 5338 ns/op 1718 B/op 32 allocs/op Yes, this brings us back into the benchmark game, we're now faster even for these small directories than a plain os.ReadDir. For reference, these are the abbreviated results for 1024 entries, which are much more favorable now, around 1/3 faster: BenchmarkReadDir/os.ReadDir-2 52126 230698 ns/op 128720 B/op 2055 allocs/op BenchmarkReadDir/os.NewFile-2 76341 156573 ns/op 128658 B/op 2055 allocs/op If you take a look at the linked benchmarks then you will notice that there's a fourth benchmark that optimizes further by pushing entries to a consumer using the Go 1.23 iterator pattern and using []byte entry names instead of strings, where the entry data is valid only for the particular iteration. In my use cases I often don't keep the directory entry information as-is anyway, so any heap allocations avoided might pay for good. Just for reference, for 1024 entries: BenchmarkReadDir/os.ReadDir-2 52126 230698 ns/op 128720 B/op 2055 allocs/op BenchmarkReadDir/faf.ReadDir-2 106552 111842 ns/op 48 B/op 1 allocs/op Hopefully my venture into benchmarking os.ReadDir might be of interest to someone. Happy Go-ing! -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion visit https://groups.google.com/d/msgid/golang-nuts/5d7f325a-8c21-45a2-8eaa-cb5c540b68b1n%40googlegroups.com.