Hi Brendan, take a look at this great Ardan Labs event, where they 
explore in depth how to analyze the GC. I think you will find your answer 
there: how the GC works, and how to balance the number of goroutines against 
GC memory use for the best performance. 

You need to sign in to Ardan Labs, but it is free. I didn't find the video 


They have some interesting info about it on their blog page too: 


Marcos Issler 

On Monday, July 22, 2024 at 9:22:25 PM UTC-3, ben...@gmail.com wrote:

> Hi Brendan, this is a fun problem. I'm looking at 
> https://github.com/Goldabj/1brc-go/blob/main/cmd/brc/log_processor.go, 
> and I suspect the main thing is that you're converting []byte to string, 
> which almost certainly allocates. You're also scanning through the bytes 
> several times (first to find the '\n', then the ';', then the '.'). And 
> you're calling ParseInt (twice), which does more work than you need to for 
> this problem. If you want to go further, I'd suggest keeping things as 
> []byte, and making the code search for the various delimiters *and* parse 
> at the same time.
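
A minimal sketch of the single-pass idea described above, for illustration 
only. It is not taken from Ben's write-up or from either submission. 
parseLine is a hypothetical helper, and it assumes well-formed records of 
the form name;[-]digits.digit\n with exactly one fractional digit. The name 
is returned as a sub-slice of the input, so nothing is converted to string 
and nothing is allocated per line.

```go
package main

import "fmt"

// parseLine reads one "name;temp\n" record from the front of buf in a
// single left-to-right pass. It returns the station name as a sub-slice of
// buf (no copy), the temperature in tenths of a degree, and the number of
// bytes consumed. Assumption: input is well formed; malformed input panics
// with an index-out-of-range error.
func parseLine(buf []byte) (name []byte, tenths int, n int) {
	i := 0
	for buf[i] != ';' {
		i++
	}
	name = buf[:i]
	i++ // skip ';'

	neg := false
	if buf[i] == '-' {
		neg = true
		i++
	}
	// Accumulate digits and skip the '.', so "12.3" becomes 123.
	for buf[i] != '\n' {
		if buf[i] != '.' {
			tenths = tenths*10 + int(buf[i]-'0')
		}
		i++
	}
	if neg {
		tenths = -tenths
	}
	return name, tenths, i + 1 // +1 consumes the '\n'
}

func main() {
	data := []byte("Hamburg;12.0\nOslo;-3.4\n")
	for len(data) > 0 {
		name, tenths, n := parseLine(data)
		fmt.Printf("%s %d\n", name, tenths)
		data = data[n:]
	}
}
```

Storing the temperature as an integer number of tenths avoids both ParseInt 
and floating point in the hot loop; dividing by 10 can wait until the final 
output.
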
> If you want to "cheat" :-), you could look at my write-up, where I present 
> 9 (incrementally faster) solutions to this problem: 
> https://benhoyt.com/writings/go-1brc/
> -Ben
> On Tuesday, July 23, 2024 at 8:17:57 AM UTC+12 Ian Lance Taylor wrote:
>> On Mon, Jul 22, 2024 at 1:05 PM Brendan Goldacker <brend...@gmail.com> 
>> wrote:
>>> I've been using the 1 billion row challenge 
>>> <https://github.com/gunnarmorling/1brc/discussions/67> as a way to 
>>> learn Go over the past few days. With my limited Go experience, I seem 
>>> to have hit a wall. 
>>> I have my submission here <https://github.com/Goldabj/1brc-go>. On my 
>>> M1 MacBook Pro with 16GB of memory, it runs in ~39s. I can get it a few 
>>> seconds faster if I play around with some parameters (numWorkers and 
>>> chunkSizes). My goal is to get it under 20s. However, I'm 
>>> stuck.
>>> I came across another submission from this blog post 
>>> <https://www.bytesizego.com/blog/one-billion-row-challenge-go>. I've 
>>> copied the submission over to my repo (otherSubmission.go) and it runs 
>>> in ~19s. I was looking over their solution and it appears very similar to 
>>> mine in approach. The main difference is that they use os.Read to read 
>>> the file sequentially, while my submission uses MMap, plus a few other 
>>> minor differences. Overall, both approaches look to be about the same. 
>>> However, theirs runs in about half the time. I can't find anything that 
>>> stands out as to why there would be such a large difference in runtime.
>>> I've been trying to use profiles and execution traces to figure out 
>>> why. Below are my flame graphs and execution traces:
[image: Screenshot 2024-07-22 at 1 19 40 PM] 

[image: Screenshot 2024-07-22 at 1 21 47 PM] 

>>> Below are the otherSubmission profile and execution traces:
[image: Screenshot 2024-07-22 at 1 23 39 PM] 

[image: Screenshot 2024-07-22 at 1 22 47 PM] 

>>> The first thing that stands out to me is the number of GC cycles. My 
>>> submission seems to be running GC much more frequently: 100,000 
>>> incremental sweeps, 32,000 background sweeps, and around 1000 
>>> stop-the-world GCs. The heap only grows to 180MB at its peak, and typically 
>>> stays around 72MB. In the otherSubmission, incremental GC only runs 6 times, 
>>> and stop-the-world ~30 times. Their heap grows much larger (up to 3.5GB), 
>>> and GC happens less frequently.
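
One way to put numbers on this without a trace viewer is to read 
runtime.MemStats before and after the workload. The sketch below is only an 
illustration: gcDelta and churn are hypothetical names, and churn is a 
stand-in allocation loop, not the code from either submission.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink is package-level so the compiler cannot keep the buffers on the stack.
var sink []byte

// gcDelta runs fn and reports how many GC cycles completed and how many
// heap objects were allocated while it ran.
func gcDelta(fn func()) (cycles uint32, mallocs uint64) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	fn()
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC, after.Mallocs - before.Mallocs
}

// churn allocates n short-lived 1 KiB buffers, each of which becomes
// garbage as soon as the next one is assigned.
func churn(n int) {
	for i := 0; i < n; i++ {
		sink = make([]byte, 1024)
	}
}

func main() {
	cycles, mallocs := gcDelta(func() { churn(1_000_000) })
	fmt.Printf("GC cycles: %d, heap allocations: %d\n", cycles, mallocs)
}
```

Running the real program with GODEBUG=gctrace=1 prints one line per GC 
cycle, and the GOGC environment variable (or debug.SetGCPercent) changes 
the heap growth target. A larger target generally means fewer cycles at the 
cost of a bigger heap, but reducing allocations is usually the better first 
step.
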
>>> Also, looking at the flame graphs, it looks like much more time is 
>>> being spent on managing the stack (systemstack, runtime.morestack, and 
>>> runtime.mcall). I think this may be due to extra scheduler overhead from 
>>> the frequent GCs (waiting for goroutines to get to a stopping point, 
>>> running GC, then re-scheduling threads to start again).
>>> I believe the time discrepancy is mostly due to GC overhead and Go 
>>> scheduler overhead, although I'm not sure why.
>>> Could anyone help a newbie figure out where I'm going wrong? Also, could 
>>> you share any tools or tricks that could help me figure this out?
>> I have not looked at your code.  But if the GC is running a lot, the 
>> first step is to look at where you are allocating memory, using Go's memory 
>> profiler.  It may help to read https://go.dev/blog/pprof.  See if you 
>> can adjust your code to allocate less memory.
>> Ian

