Hi Brendan, this is a fun problem. I'm looking 
at https://github.com/Goldabj/1brc-go/blob/main/cmd/brc/log_processor.go, 
and I suspect the main thing is that you're converting []byte to string, 
which almost certainly allocates. You're also scanning through the bytes 
several times (first to find the '\n', then the ';', then the '.'). And 
you're calling ParseInt (twice), which does more work than you need to for 
this problem. If you want to go further, I'd suggest keeping things as 
[]byte, and making the code search for the various delimiters *and* parse 
at the same time.
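
For example, something roughly like this (an untested sketch; it assumes 
every temperature has exactly one fractional digit, as in the challenge 
input) parses a whole line in a single pass without allocating:

// parseLine parses one "station;temperature\n" line starting at offset i in
// data, returning the station name, the temperature in tenths of a degree,
// and the offset just past the '\n'.
func parseLine(data []byte, i int) (station []byte, tenths int, next int) {
    // Scan for the ';' that separates the station name from the temperature.
    semi := i
    for data[semi] != ';' {
        semi++
    }
    station = data[i:semi]

    // Parse the temperature as an integer number of tenths ("-12.3" -> -123),
    // consuming bytes until the end of the line.
    j := semi + 1
    neg := false
    if data[j] == '-' {
        neg = true
        j++
    }
    for ; data[j] != '\n'; j++ {
        if data[j] != '.' {
            tenths = tenths*10 + int(data[j]-'0')
        }
    }
    if neg {
        tenths = -tenths
    }
    return station, tenths, j + 1
}

You can then update your map using the station bytes directly: the 
m[string(station)] form of a map lookup doesn't allocate, so you only pay 
for a string conversion the first time you see each station.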

If you want to "cheat" :-), you could look at my write-up, where I present 
9 (incrementally faster) solutions to this 
problem: https://benhoyt.com/writings/go-1brc/

-Ben

On Tuesday, July 23, 2024 at 8:17:57 AM UTC+12 Ian Lance Taylor wrote:

> On Mon, Jul 22, 2024 at 1:05 PM Brendan Goldacker <brend...@gmail.com> 
> wrote:
>
>> I've been using the 1 billion row challenge 
>> <https://github.com/gunnarmorling/1brc/discussions/67> as a way to learn 
>> Go over the past few days. With my limited Go experience, I seem to have 
>> hit a wall. 
>>
>> I have my submission here <https://github.com/Goldabj/1brc-go>. On my M1 
>> MacBook Pro with 16GB of memory, it runs in ~39s. I can get it a few 
>> seconds faster if I play around with some parameters (numWorkers and 
>> chunkSizes). My goal is to get under 20s, but I'm stuck.
>>
>> I came across another submission from this blog post 
>> <https://www.bytesizego.com/blog/one-billion-row-challenge-go>. I've 
>> copied that submission into my repo (otherSubmission.go), and it runs in 
>> ~19s. Looking over their solution, it appears very similar to mine in 
>> approach. The main difference is that they read the file sequentially 
>> with os.File.Read, while my submission uses mmap, plus a few other minor 
>> differences. Overall, both approaches look to be about the same, yet 
>> theirs runs in about half the time. I can't find anything that stands 
>> out to explain such a large difference in runtime.
>>
>> I've been trying to use the profiles and execution traces to figure out 
>> why. Below are my flame graphs and execution traces:
>> [image: Screenshot 2024-07-22 at 1 19 40 PM]
>> [image: Screenshot 2024-07-22 at 1 21 47 PM]
>>
>> Below are the otherSubmission profile and execution traces:
>> [image: Screenshot 2024-07-22 at 1 23 39 PM]
>> [image: Screenshot 2024-07-22 at 1 22 47 PM]
>>
>> The first thing that stands out to me is the number of GC collections. My 
>> submission seems to be running GC much more frequently: 100,000 
>> incremental sweeps, 32,000 background sweeps, and around 1,000 
>> stop-the-world GCs. The heap only grows to 180MB at its peak and 
>> typically stays around 72MB. In the otherSubmission, the incremental GC 
>> only runs 6 times and stop-the-world about 30 times. Their heap grows 
>> much larger (up to 3.5GB), and GC happens less frequently.
>>
>> Also, looking at the flame graphs, much more time is being spent managing 
>> stacks and the scheduler (runtime.systemstack, runtime.morestack, and 
>> runtime.mcall). I think this may be scheduler overhead from the frequent 
>> GCs (waiting for goroutines to reach a safe point, running GC, then 
>> rescheduling them to start again).
>>
>> I believe the time discrepancy is mostly due to GC and Go scheduler 
>> overhead, although I'm not sure why.
>>
>> Could anyone help a newbie figure out where I'm going wrong? Could you 
>> also share any tools or tricks that would help me figure this out?
>>
>
> I have not looked at your code.  But if the GC is running a lot, the first 
> step is to look at where you are allocating memory, using Go's memory 
> profiler.  It may help to read https://go.dev/blog/pprof.  See if you can 
> adjust your code to allocate less memory.
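>
> For example, a minimal sketch (untested; the output filename is arbitrary) 
> is to write a heap profile at the end of the run and inspect it with 
> "go tool pprof":
>
>     import (
>         "log"
>         "os"
>         "runtime"
>         "runtime/pprof"
>     )
>
>     // writeHeapProfile dumps allocation statistics to heap.prof.
>     // Call it once, just before the program exits.
>     func writeHeapProfile() {
>         f, err := os.Create("heap.prof")
>         if err != nil {
>             log.Fatal(err)
>         }
>         defer f.Close()
>         runtime.GC() // the heap profile is as of the last GC, so force one
>         if err := pprof.WriteHeapProfile(f); err != nil {
>             log.Fatal(err)
>         }
>     }
>
> Then something like "go tool pprof -sample_index=alloc_space heap.prof" 
> shows where the bytes are being allocated.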
>
> Ian
>
