Hi, I've been playing around with using Apache Flink to process some data, and I'm starting out with the batch DataSet API.
To start, I read in some data from files in an S3 folder:

DataSet<String> records = env.readTextFile("s3://my-s3-bucket/some-folder/");

The folder contains 20 gzipped files, and the job runs with 20 nodes/tasks (so parallelism 20). It looks like each node is reading in ALL the files in the folder, but what I really want is for each node/task to read in exactly 1 file and process only the data from that file. Is this expected behavior? Am I supposed to be doing something different here to get the result I want? Thanks.
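For reference, a stripped-down version of my job looks roughly like this (the class name and the explicit setParallelism call are just for illustration; in the real job parallelism comes from the cluster configuration):

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class ReadS3FolderJob {
    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(20); // matches the 20 gzipped files in the folder

        // Pointing readTextFile at a directory enumerates every file inside it;
        // Flink decompresses .gz files transparently based on their extension.
        DataSet<String> records = env.readTextFile("s3://my-s3-bucket/some-folder/");

        records.print();
    }
}
```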