Hello everyone,

I'd like to get feedback on an idea for changing how Go manages goroutine 
stack growth.
Below is a short draft of the proposal.

## Current State

Currently, almost every Go function prologue includes a stack growth check.
If the remaining stack space is insufficient, the runtime allocates a 
larger stack and copies the old one, adjusting pointers to local variables 
as needed.

**Drawbacks of this approach:**

* Increased CPU usage due to frequent stack size checks and possible 
reallocations
* Larger code size because of the additional prologue instructions

## Proposed Stack Management Mechanism
I would like to hear your opinion on the following stack growth mechanism 
and whether it's worth exploring further. 
If you think the idea has potential, I'll continue by estimating its 
effect on CPU usage and code size and, if the estimates look good enough, 
build a proof of concept.

### Reallocation via Page Faults
The idea is inspired by how Linux manages system thread stacks.

On Linux, each thread reserves 8 MB of virtual memory for its stack by 
default. Physical memory is mapped lazily: a page is allocated only when 
the thread first touches it and triggers a page fault. 
When the thread runs past the end of the reservation, it receives SIGSEGV 
and the program normally aborts. 

In Go, however, instead of aborting, we could reuse the existing stack 
growth logic - relocating the stack to a larger chunk when a page fault 
occurs near the stack boundary.
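
To make the lazy mapping concrete, here is a minimal Linux-only sketch. It is not how the runtime allocates stacks today; the helper names (`reserveStack`, `touchTop`, `releaseStack`) and the 8 MB figure are assumptions for illustration. It reserves a region with `mmap` and touches only its top two pages, the way a downward-growing stack would.

```go
package main

import (
	"fmt"
	"syscall"
)

// reserveStack reserves size bytes of private anonymous memory. The kernel
// assigns physical pages only when an address is first touched.
func reserveStack(size int) ([]byte, error) {
	return syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_PRIVATE|syscall.MAP_ANON|syscall.MAP_NORESERVE)
}

// touchTop writes one byte into each of the n highest pages, imitating a
// downward-growing stack that has used n pages so far.
func touchTop(region []byte, n int) {
	page := syscall.Getpagesize()
	for i := 0; i < n; i++ {
		region[len(region)-1-i*page] = 1
	}
}

// releaseStack unmaps the whole reservation.
func releaseStack(region []byte) error {
	return syscall.Munmap(region)
}

func main() {
	const reserve = 8 << 20 // hypothetical per-goroutine reservation
	region, err := reserveStack(reserve)
	if err != nil {
		fmt.Println("mmap failed:", err)
		return
	}
	defer releaseStack(region)

	// Only the touched pages need physical backing; the rest of the
	// reservation is address space only.
	touchTop(region, 2)
	fmt.Printf("reserved %d KiB, touched %d KiB\n",
		reserve>>10, 2*syscall.Getpagesize()>>10)
}
```

The sketch leaves out the hard part of the proposal: catching a fault at the low end of the region and relocating the stack.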

**Potential drawbacks**
* The Go runtime would need to handle page faults:
    * This might increase the number of page faults and add handling 
overhead
    * It could be tricky to distinguish between stack-related and unrelated 
page faults

* A large number of goroutines will consume a large amount of virtual 
address space
* The minimal stack size would effectively increase from 2 KB to 4 KB (one 
physical page). In the worst case, when all goroutines use less than 2 KB 
of stack, this would double stack memory consumption
* This mechanism would depend on OS-level signal handling and may require 
platform-specific implementations

The main concern, as I see it, is the increased use of virtual address 
space. 
A rough estimate:
100k goroutines with 8 MB stacks each would reserve ~800 GB (2^3 * 10^5 * 
2^20 B ~ 2^40 B), i.e., about 1/300 of a 48-bit (2^48 B) virtual address 
space.
This seems acceptable, especially since we can reserve less than 8 MB.
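
The estimate can be checked with a few lines of Go. This is only a sketch of the arithmetic; the function names and the smaller reservation sizes (1 MiB, 256 KiB) are assumptions added for comparison.

```go
package main

import "fmt"

// reservedBytes returns the address space reserved when each of n
// goroutines reserves stackBytes for its stack.
func reservedBytes(n, stackBytes uint64) uint64 {
	return n * stackBytes
}

// fractionOfAddressSpace returns reserved divided by 2^addrBits.
func fractionOfAddressSpace(reserved uint64, addrBits uint) float64 {
	return float64(reserved) / float64(uint64(1)<<addrBits)
}

func main() {
	const goroutines = 100_000
	// 8 MiB is the figure from the estimate; the other sizes are
	// hypothetical smaller reservations.
	for _, stack := range []uint64{8 << 20, 1 << 20, 256 << 10} {
		r := reservedBytes(goroutines, stack)
		fmt.Printf("stack=%d KiB reserved=%.1f GiB fraction=%.5f\n",
			stack>>10, float64(r)/(1<<30), fractionOfAddressSpace(r, 48))
	}
}
```

For the 8 MiB case this gives about 781 GiB, or roughly 0.3% of a 48-bit address space. The user-space half of that address space is smaller still, so the real share is somewhat higher.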

The second concern is the larger minimum stack size (4 KB vs 2 KB). This 
could double memory consumption in the worst case.
I'm not yet sure whether this trade-off would be acceptable or if it can be 
mitigated.
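
To see where the doubling comes from, here is a simplified model of the two schemes. It assumes power-of-two stack sizes starting at 2 KB for the current scheme and whole 4 KB pages for the proposed one, and it ignores guard space and runtime bookkeeping. Both function names are hypothetical.

```go
package main

import "fmt"

// currentStackBytes models the current scheme: the smallest power-of-two
// stack, starting at 2 KB, that holds `used` bytes.
func currentStackBytes(used uint64) uint64 {
	size := uint64(2 << 10)
	for size < used {
		size *= 2
	}
	return size
}

// pagedStackBytes models the proposed scheme: resident memory is a whole
// number of 4 KB pages, with at least one page per goroutine.
func pagedStackBytes(used uint64) uint64 {
	const page = 4 << 10
	if used == 0 {
		return page
	}
	return (used + page - 1) / page * page
}

func main() {
	for _, used := range []uint64{1 << 10, 3 << 10, 6 << 10} {
		fmt.Printf("used=%d current=%d paged=%d\n",
			used, currentStackBytes(used), pagedStackBytes(used))
	}
}
```

In this model the two schemes differ only for goroutines that stay under 2 KB, so the real cost depends on how many goroutines in a given program are that small.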

Cross-platform support is also a major concern. 

## Additional notes
* The current implementation supports stack shrinking (when less than 1/4 
of the stack is used). I guess we could shrink the stack with 
MADV_DONTNEED.
* Stack growth checks are currently tied to goroutine preemption. 
Removing them might indirectly affect the scheduler. However, Go also has 
asynchronous, signal-based preemption (since Go 1.14), so this may not be 
a major issue.

## Conclusion
What do you think about this idea? 
Is this direction worth exploring further, starting with concrete 
performance estimates and then a PoC?

Thank you for your time and feedback!
