Tamás Gulácsi, that was basically my initial idea too, but unfortunately
there is no access to the internal bufio.Reader. See:
https://golang.org/src/encoding/csv/reader.go#L170

peterGo, my file is ~100GB, so downloading it just for the sake of splitting
it doesn't make sense to me. I want each worker to use the NewRangeReader
method
<https://godoc.org/cloud.google.com/go/storage#ObjectHandle.NewRangeReader> to
download only its own piece of the file.

ren...@ix.netcom.com, a ByteCount reader that wraps the underlying reader
wouldn't help, because csv.Reader doesn't read from the underlying reader
synchronously; it reads from a bufio.Reader, which buffers the bytes. For
example, if you read 1 row (e.g. 10 bytes) from the CSV, 4096 bytes will be
read from the underlying io.Reader. On the next csv.Reader.Read() call, no
bytes will be read from the underlying io.Reader, because the next row is
taken out of the buffer.

On Saturday, October 31, 2020 at 6:02:32 PM UTC+1 Tamás Gulácsi wrote:

> Give csv.NewReader your own *bufio.Reader.
> According to bufio.NewReaderSize (https://pkg.go.dev/pkg/bufio/#NewReaderSize),
> if the underlying io.Reader is already a *bufio.Reader with a big enough
> buffer (csv.NewReader uses the default 4k), then that reader is returned
> as-is and no new wrapping is introduced.
>
> This way, if you use
>   cr := &countingReader{Reader: r}
>   br := bufio.NewReader(cr)
>   csvR := csv.NewReader(br)
>
> then cr.N - br.Buffered() is the number of bytes consumed by csv.Reader,
> i.e. the offset of the end of the last line read.
>
> Hope this helps.
>
> On Saturday, October 31, 2020 at 3:17:26 AM UTC+1 Severyn Lisovsky wrote:
>
>> Hi,
>>
>> I have difficulty counting the bytes that were processed by csv.Reader,
>> because it reads from an internally created bufio.Reader. If I pass a
>> counting reader to csv.NewReader, it shows not the number of bytes actually
>> "processed" by csv.Reader to produce the output of csv.Reader.Read, but
>> the number of bytes copied into the bufio.Reader's internal buffer (some
>> of those bytes may only be consumed from the buffer during the next
>> csv.Reader.Read call).
>>
>> Is there a way I can deal with this issue without forking the encoding/csv
>> package?
>>
>> To give you a more high-level picture: I want to split a remote CSV file
>> into chunks. Each chunk should be a standalone CSV file, starting at the
>> actual beginning of a line and ending with a newline byte. So I'm trying
>> to do the following: divide the file size by the number of chunks, and for
>> each chunk, skip the first bytes up to the newline symbol and read up to
>> offset+chunkSize+[number of bytes to the next newline symbol].
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/24264af8-771b-4250-8eec-a88721572a32n%40googlegroups.com.
