Thanks Robert. Of course you are right, and a pull request would be welcome 
:)

Seriously though, I do appreciate the comment. I wrote this about
two years ago and, as far as I remember, I did not want to
complicate the code by having to deal with CSV lines that get
split between two goroutines, which is what happens if you
don't find the newlines first. Once a line can straddle a chunk
boundary, you need more locking to resolve the conflict and to
avoid stepping on memory another goroutine is using.
Much more coordination seemed necessary.

That, and the fact that the true bottleneck is usually the
parsing of the floats, meant that once I matched what
the C code for data.table was doing, I moved on. So yes,
it could be faster, but the simpler code was appealing.
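
To make the trade-off concrete, here is a rough sketch (not the
actual slurpdf code) of parsing floats in parallel once the chunks
already end on line boundaries. The names parseChunk and parseAll are
made up for illustration, and the input is assumed to be purely
numeric, comma-separated data. Each goroutine writes only to its own
slot of the results, so no mutex is needed:

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
	"sync"
)

// parseChunk parses every comma-separated field of every non-empty
// line in chunk as a float64. Hypothetical helper for illustration.
func parseChunk(chunk []byte) ([]float64, error) {
	var out []float64
	for _, line := range bytes.Split(chunk, []byte("\n")) {
		if len(line) == 0 {
			continue // skip the empty piece after a trailing newline
		}
		for _, field := range bytes.Split(line, []byte(",")) {
			f, err := strconv.ParseFloat(string(bytes.TrimSpace(field)), 64)
			if err != nil {
				return nil, err
			}
			out = append(out, f)
		}
	}
	return out, nil
}

// parseAll runs one goroutine per chunk. Goroutine i writes only
// results[i] and errs[i], so the goroutines never share a slot.
// It assumes no line is split across two chunks.
func parseAll(chunks [][]byte) ([][]float64, error) {
	results := make([][]float64, len(chunks))
	errs := make([]error, len(chunks))
	var wg sync.WaitGroup
	for i, c := range chunks {
		wg.Add(1)
		go func(i int, c []byte) {
			defer wg.Done()
			results[i], errs[i] = parseChunk(c)
		}(i, c)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}

func main() {
	chunks := [][]byte{[]byte("1.5,2\n3,4.25\n"), []byte("5,6\n")}
	res, err := parseAll(chunks)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(res)
}
```

The strconv.ParseFloat calls are where most of the time goes in a
loop like this, which is why spreading them across cores pays off.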

- J


On Wednesday, September 24, 2025 at 11:31:17 PM UTC+1 robert engels wrote:

As an aside, your slurp isn’t really doing what you think.

The line byby := bytes.Split(buf, newline) is causing the entire file to be 
read into memory on a single core, which is unnecessary.

You need to modify the code a bit to get the optimum performance.

You should calculate a base offset which is (total file size / number of 
cores).

Then calculate the actual offsets by seeking to that point, then advancing 
to the next newline, then do the same for the rest, so that you end up with 
an array of slices, each of which is a portion of the file.
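
A minimal sketch of that offset calculation, assuming for simplicity
that the data is already addressable as a byte slice (for example a
memory-mapped file). With a plain file, each worker would do the same
scan using Seek or ReadAt on its own region instead. chunkBounds is a
hypothetical helper name, not something taken from slurpdf:

```go
package main

import (
	"bytes"
	"fmt"
)

// chunkBounds returns offsets into buf such that every chunk
// buf[b[i]:b[i+1]] starts at the beginning of a line. It yields at
// most n chunks; it can yield fewer when lines are long relative to
// len(buf)/n.
func chunkBounds(buf []byte, n int) []int {
	if n < 1 {
		n = 1
	}
	bounds := []int{0}
	base := len(buf) / n // base offset: total size / number of cores
	for i := 1; i < n; i++ {
		pos := i * base
		prev := bounds[len(bounds)-1]
		if pos < prev {
			pos = prev // a long line already carried us past this point
		}
		// Advance from the base offset to the next newline.
		nl := bytes.IndexByte(buf[pos:], '\n')
		if nl < 0 {
			break // no more newlines: the last chunk takes the rest
		}
		next := pos + nl + 1 // first byte after the newline
		if next > prev && next < len(buf) {
			bounds = append(bounds, next)
		}
	}
	return append(bounds, len(buf))
}

func main() {
	buf := []byte("a,1\nb,2\nc,3\nd,4\n")
	b := chunkBounds(buf, 2)
	for i := 0; i+1 < len(b); i++ {
		fmt.Printf("chunk %d: %q\n", i, buf[b[i]:b[i+1]])
	}
}
```

Because every boundary sits just after a newline, no line is shared
between two chunks, and each chunk can be handed to its own goroutine.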


On Sep 24, 2025, at 5:19 PM, Jason E. Aten wrote:

Hi Vikram,

Sounds like you got it working--great!  Also the LLMs are terrific for 
explaining language concepts
if you are stuck conceptually.

If you need a dataframe package that scales to big data (as it turns 
out parsing floating point numbers is a very slow operation), I wrote 
a use-all-cores fast parallel loading dataframe for Go called SlurpDF. 
I was envious of how fast R's data.table could read in CSV files in 
parallel. See

https://github.com/glycerine/slurpdf

See slurp_test.go for an example of writing back to CSV on disk.

(this was in service of a little Xgboost-like gradient boosted decision 
tree ensemble machine learner, e.g. https://github.com/glycerine/gocortado)

Enjoy,
Jason


-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To view this discussion visit 
https://groups.google.com/d/msgid/golang-nuts/b45b28a1-63e0-43b1-a56a-ca2f6625459fn%40googlegroups.com.
