Thanks Robert. Of course you are right, and a pull request would be welcome :)
Seriously though -- I do appreciate the comment. At the time (this was 2 years ago when I wrote it), I recall not wanting to complicate the code by having to deal with CSV file lines that get split between two goroutines if I didn't find the newlines first. Once you do that you need more locking to resolve the conflict and avoid stepping on memory another goroutine is using... much more coordination seemed necessary. That, plus the true bottleneck usually being the parsing of the floats, meant that once I matched what the C code for data.table was doing, I moved on. So yes, it could be faster, but the simpler code was appealing.

- J

On Wednesday, September 24, 2025 at 11:31:17 PM UTC+1 robert engels wrote:

As an aside, your slurp isn’t really doing what you think. The line

byby := bytes.Split(buf, newline)

is causing the entire file to be read into memory on a single core, which is unnecessary. You need to modify the code a bit to get the optimum performance.

You should calculate a base offset, which is (total file size / number of cores). Then calculate the actual offsets by seeking to that point and advancing to the next newline, then do the same for the rest. You then have an array of slices, each of which is a portion of the file.

On Sep 24, 2025, at 5:19 PM, Jason E. Aten wrote:

Hi Vikram,

Sounds like you got it working--great! Also, the LLMs are terrific for explaining language concepts if you are stuck conceptually.

If you need a dataframe package that scales to big data (as it turns out, parsing floating point numbers is a very slow operation), I wrote a use-all-cores, fast, parallel-loading dataframe for Go called SlurpDF. I was envious of how fast R's data.table could read in CSV files in parallel.

See https://github.com/glycerine/slurpdf

See slurp_test.go for an example of writing back to CSV on disk.

(This was in service of a little XGBoost-like gradient boosted decision tree ensemble machine learner, e.g. https://github.com/glycerine/gocortado)

Enjoy,
Jason

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion visit https://groups.google.com/d/msgid/golang-nuts/b45b28a1-63e0-43b1-a56a-ca2f6625459fn%40googlegroups.com.
