Hi Supun, 
Roughly what happens now is #2. However, in your example, it may be the case 
that we are reading CSV data from disk faster than we can transcode it into 
Parquet and write it out. Note that we attempt to use the full disk bandwidth 
and assign batches to cores as reads complete, so ideally a core is never 
blocked on IO. In other words, even if we have only 2 cores, we may kick off 
100 batch reads and process each one when its read completes and a core is 
available. This is where backpressure is needed: to prevent this huge number 
of reads from piling up and filling memory faster than we can process it. 
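To make the idea concrete, here is a minimal sketch of that kind of backpressure, not Arrow's actual scheduler: a bounded semaphore caps how many batch reads may be in flight, so the producer blocks instead of filling memory. `read_batch`, `process_batch`, and `MAX_INFLIGHT` are hypothetical stand-ins.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_INFLIGHT = 8  # assumed backpressure limit on outstanding batches


def read_batch(i):
    # Stand-in for reading one CSV batch from disk.
    return f"batch-{i}"


def process_batch(batch):
    # Stand-in for transcoding a batch to Parquet and writing it.
    return batch.upper()


def run(num_batches, num_workers=2):
    # The semaphore is the backpressure: once MAX_INFLIGHT batches are
    # outstanding, the loop below blocks until a worker finishes one.
    inflight = threading.BoundedSemaphore(MAX_INFLIGHT)
    results = []
    lock = threading.Lock()

    def handle(i):
        out = process_batch(read_batch(i))
        with lock:
            results.append(out)
        inflight.release()  # batch done: allow another read to start

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for i in range(num_batches):
            inflight.acquire()  # blocks when too many reads are pending
            pool.submit(handle, i)
    return results
```

Without the semaphore, all 100 reads would be queued immediately; with it, memory use stays bounded regardless of how far the disk outpaces the workers.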

Sasha Krassovsky 

> On 20 May 2022, at 20:09, Supun Kamburugamuve <su...@apache.org> wrote:
> 
