Re: [go-nuts] Help with Dataframes

Jason E. Aten Sat, 18 Oct 2025 07:03:24 -0700

Thanks Robert. Of course you are right, and a pull request would be welcome 
:)

Seriously though -- I do appreciate the comment. At the time, if
I remember -- this was 2 years ago when I wrote it -- I recall
not wanting to complicate the code by having to deal
with the CSV file lines that got split between two goroutines
if I didn't find the newlines first. Once you do that
you need more locking to resolve the conflict and
not step on the same memory another goroutine is using...
much more coordination seemed necessary.

That and the true bottleneck usually being the
parsing of the floats means once I matched what
the C code for data.table was doing, I moved on. So yes,
it could be faster, but the simpler code was appealing.
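
To illustrate the point about float parsing and coordination, here is a
minimal sketch (not SlurpDF's actual code) of parsing floats from chunks that
have already been cut on newline boundaries. The function name parseChunks and
the one-float-per-line input format are assumptions made for the example. Each
goroutine writes only to its own slot in the result, so no locking is needed.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
	"sync"
)

// parseChunks (hypothetical helper) parses one float per line from each
// chunk, using one goroutine per chunk. Goroutine i writes only to out[i]
// and errs[i], so the goroutines never touch the same memory.
func parseChunks(chunks [][]byte) ([][]float64, error) {
	out := make([][]float64, len(chunks))
	errs := make([]error, len(chunks))
	var wg sync.WaitGroup
	for i, c := range chunks {
		wg.Add(1)
		go func(i int, c []byte) {
			defer wg.Done()
			for _, line := range bytes.Split(c, []byte("\n")) {
				if len(line) == 0 {
					continue // skip the empty piece after a trailing newline
				}
				v, err := strconv.ParseFloat(string(line), 64)
				if err != nil {
					errs[i] = err
					return
				}
				out[i] = append(out[i], v)
			}
		}(i, c)
	}
	wg.Wait()
	// Report the first error in chunk order, if any.
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return out, nil
}

func main() {
	chunks := [][]byte{[]byte("1.5\n2.25\n"), []byte("3\n")}
	vals, err := parseChunks(chunks)
	fmt.Println(vals, err)
}
```

This only works without locks because no line straddles two chunks; if the
chunks were cut at arbitrary byte offsets, the extra coordination described
above would be needed.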

- J

On Wednesday, September 24, 2025 at 11:31:17 PM UTC+1 robert engels wrote:

As an aside, your slurp isn’t really doing what you think.

The line byby := bytes.Split(buf, newline) is causing the entire file to be
read into memory on a single core, which is unnecessary.

You need to modify the code a bit to get the optimum performance.

You should calculate a base offset which is (total file size / number of
cores).

Then calculate the actual offsets by seeking to that point, then advancing
to the next newline, then do the same for the rest - so then you have an
array of slices - each of which is a portion of the file.
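
The offset calculation described above could be sketched roughly as follows.
This is a simplified illustration, not the code from slurpdf: it assumes the
data is already addressable as a single byte slice (for example via a memory
map) rather than seeking in a file, and the name splitAtNewlines is
hypothetical. It only searches for a newline near each base offset instead of
scanning the whole buffer.

```go
package main

import (
	"bytes"
	"fmt"
)

// splitAtNewlines (hypothetical helper) divides buf into at most n
// sub-slices without copying. Every chunk except possibly the last ends
// just after a newline, so no line is split between two chunks.
func splitAtNewlines(buf []byte, n int) [][]byte {
	if n < 1 {
		n = 1
	}
	var chunks [][]byte
	base := len(buf) / n // base offset: total size / number of workers
	start := 0
	for i := 1; i < n && start < len(buf); i++ {
		target := i * base
		if target <= start {
			continue // an earlier long line already covered this offset
		}
		// Advance from the base offset to the next newline.
		j := bytes.IndexByte(buf[target:], '\n')
		if j < 0 {
			break // no more newlines; the remainder is one chunk
		}
		end := target + j + 1
		chunks = append(chunks, buf[start:end])
		start = end
	}
	if start < len(buf) {
		chunks = append(chunks, buf[start:])
	}
	return chunks
}

func main() {
	data := []byte("a,1\nb,2\nc,3\nd,4\n")
	for i, c := range splitAtNewlines(data, 3) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

Each returned slice can then be handed to its own goroutine. In practice n
would typically come from runtime.NumCPU().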

On Sep 24, 2025, at 5:19 PM, Jason E. Aten wrote:

Hi Vikram,

Sounds like you got it working--great! Also the LLMs are terrific for
explaining language concepts
if you are stuck conceptually.

If you need a dataframe package that scales to big data (as it turns out,
parsing floating point numbers is a very slow operation), I wrote a
use-all-cores, fast, parallel-loading dataframe for Go called SlurpDF. I was
envious of how fast R's data.table could read in CSV files in parallel. See

https://github.com/glycerine/slurpdf

See slurp_test.go for an example of writing back to CSV on disk.

(this was in service of a little Xgboost-like gradient boosted decision
tree ensemble machine learner, e.g. https://github.com/glycerine/gocortado)

Enjoy,
Jason

