Hello,

If I am on a cluster with 2 task managers with 64 CPUs each, I can configure 128 slots in accordance with the documentation. If I set parallelism to 128 and read a 64 MB file (one datasource with a single file), will Flink really create 128 slices of roughly 500 KB each? Or will it check the default block size of the host it is reading from and allocate only as many slices as there are blocks?
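
To make the arithmetic behind the two alternatives concrete, here is a minimal sketch. It is not a claim about what Flink actually does: `NaiveSplitSketch` and its methods are hypothetical names for illustration, and the 128 MB block size is an assumed value, not one taken from my cluster.

```java
// Hypothetical helper, NOT part of Flink: illustrates the two split
// strategies asked about above using plain integer arithmetic.
final class NaiveSplitSketch {

    // Bytes per slice if the file is cut into `parallelism` equal byte ranges
    // (ceiling division, so the last slice may be slightly smaller).
    static long sliceBytes(long fileBytes, int parallelism) {
        return (fileBytes + parallelism - 1) / parallelism;
    }

    // Number of slices if slices instead follow the storage block size.
    static long blockAlignedSlices(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long fileBytes = 64L * 1024 * 1024;        // the 64 MB file
        long assumedBlock = 128L * 1024 * 1024;    // assumed block size, illustration only

        // Even split across 128 slots: 524288 bytes (512 KiB) per slice.
        System.out.println(sliceBytes(fileBytes, 128));

        // Block-aligned split with the assumed block size: a single slice.
        System.out.println(blockAlignedSlices(fileBytes, assumedBlock));
    }
}
```

Under the first strategy all 128 slots would get a tiny slice; under the second, with the assumed block size, only one slot would do any reading. My question is which of these (if either) matches Flink's behaviour.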

If the file is on S3:

1. Does a single thread copy it to local disk and then have 128 slices consume it?
2. Does a single thread read the file from S3 and consume it, treating it as one slice?
3. Does Flink talk to S3, make a multi-part read to local storage, and then read from local storage in 128 slices?

If a datasource has a large number of files, does each slot read one file at a time with a single thread, or does each slot read one part of each file, such that all 128 slots consume each file one at a time?

More generally, does Flink try to allocate files to slots such that each slot reads the same volume with as long a sequential read as possible? How does it distinguish between reading from the local HDFS and reading from S3, given that they might have vastly different performance characteristics?

Thanks,
David

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/