Tamás Gulácsi, wow, I didn't know that passing a *bufio.Reader to bufio.NewReader returns that same reader instead of wrapping it again. Looks like this is the solution I've been looking for. Thanks!
On Saturday, October 31, 2020 at 6:50:18 PM UTC+1 Tamás Gulácsi wrote:
> Why do you need access to the internal bufio.Reader?
>
> If you provide a bufio.Reader to bufio.NewReader, it will NOT create
> a new reader, but give back your reader.
> So if you keep your bufio.Reader and give it to csv.NewReader, then you
> will have the same *bufio.Reader as the csv.Reader's inner r!
>
> Severyn Lisovsky wrote (Saturday, October 31, 2020, 18:31:34 UTC+1):
>
>> Tamás Gulácsi, this was basically my initial idea, but
>> unfortunately there is no access to the internal bufio.Reader. See:
>> https://golang.org/src/encoding/csv/reader.go#L170
>>
>> peterGo, my file is ~100GB, so downloading it just for the sake of
>> splitting doesn't make sense to me. I want each worker to use the
>> NewRangeReader method
>> <https://godoc.org/cloud.google.com/go/storage#ObjectHandle.NewRangeReader>
>> to download only its own piece of the file.
>>
>> ren...@ix.netcom.com, a ByteCount reader that wraps the underlying reader
>> wouldn't help, because csv.Reader doesn't read from the underlying reader
>> synchronously; it reads from a bufio.Reader, which buffers the bytes. For
>> example, if you read 1 row of CSV (e.g. 10 bytes), up to 4096 bytes will
>> be read from the underlying io.Reader. On the next csv.Reader.Read() call,
>> no bytes will be read from the underlying io.Reader, because the next row
>> is taken out of the buffer.
>>
>> On Saturday, October 31, 2020 at 6:02:32 PM UTC+1 Tamás Gulácsi wrote:
>>
>>> Give csv.NewReader your own *bufio.Reader.
>>> According to the docs (https://pkg.go.dev/pkg/bufio/#NewReaderSize), if the
>>> underlying io.Reader is already a *bufio.Reader with a big enough size (and
>>> csv.NewReader uses the default 4k),
>>> then the underlying reader is used, no new wrapping is introduced.
>>>
>>> This way if you use
>>>     cr := &countingReader{Reader: r}
>>>     br := bufio.NewReader(cr)
>>>     csvR := csv.NewReader(br)
>>>
>>> then cr.N - br.Buffered() is the number of bytes read by csv.Reader, i.e.
>>> the end of the last line read.
>>>
>>> Hope this helps.
>>>
>>> Severyn Lisovsky wrote (Saturday, October 31, 2020, 3:17:26 UTC+1):
>>>
>>>> Hi,
>>>>
>>>> I have difficulty counting the bytes processed by csv.Reader
>>>> because it reads from an internally created bufio.Reader. If I pass a
>>>> counting reader to csv.NewReader, it shows not the number of bytes
>>>> actually "processed" by csv.Reader to produce the output of a
>>>> csv.Reader.Read call, but the number of bytes copied into the
>>>> bufio.Reader's buffer internally (some of those bytes may only be
>>>> consumed from the buffer during the next csv.Reader.Read call).
>>>>
>>>> Is there a way to deal with this issue without forking the
>>>> encoding/csv package?
>>>>
>>>> To give you a more high-level picture: I want to split a remote csv
>>>> file into chunks. Each chunk should be a standalone csv file, starting
>>>> at the actual beginning of a line and ending with a newline byte. So
>>>> I'm trying to do the following: divide the file size by the number of
>>>> chunks, and for each chunk skip the first bytes up to a newline, then
>>>> read to offset+chunkSize+[number of bytes to the next newline].
>>>>
>>>
--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/661de50e-6e00-4d13-b2d1-b729f97aa3fen%40googlegroups.com.