Hi Supun,

Roughly what happens now is #2. However, in your example, it may be the case that we are reading CSV data from disk faster than we can transcode it into Parquet and write it out. Note that we attempt to use the full disk bandwidth and assign batches to cores as the reads complete, so ideally a core is never blocked on IO. In other words, even if we have only 2 cores, we may kick off 100 batch reads and process each one when its read completes and a core is available. This is where backpressure is needed: to prevent this huge number of reads from piling up and filling memory faster than we can process it.
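To illustrate the idea (this is just a minimal sketch, not Arrow's actual scanner code): a semaphore can bound the number of in-flight batch reads, so reads are still issued eagerly up to the limit, but memory use stays bounded when the CPU side falls behind. All names here (read_batch, process_batch, MAX_IN_FLIGHT) are hypothetical.

```python
import asyncio

MAX_IN_FLIGHT = 4  # hypothetical backpressure limit on outstanding reads

async def read_batch(i):
    # Stand-in for an async disk read of one CSV batch.
    await asyncio.sleep(0)
    return f"batch-{i}"

async def process_batch(batch):
    # Stand-in for CPU-bound transcoding (CSV -> Parquet) of one batch.
    await asyncio.sleep(0)
    return batch.upper()

async def pipeline(n_batches):
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    results = []

    async def one(i):
        # Backpressure: at most MAX_IN_FLIGHT reads are outstanding at once,
        # so unprocessed batches cannot pile up in memory without limit.
        async with sem:
            batch = await read_batch(i)
            results.append(await process_batch(batch))

    await asyncio.gather(*(one(i) for i in range(n_batches)))
    return results

out = asyncio.run(pipeline(10))
```

Without the semaphore, all 10 (or 100) reads would be outstanding simultaneously, which is exactly the memory-growth scenario described above.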
Sasha Krassovsky

> On May 20, 2022, at 20:09, Supun Kamburugamuve <su...@apache.org> wrote:
>