Hey,
I have a very specific use case. I have a history of records stored as
Parquet in S3 that I would like to read and process with Flink. The issue
is that the number of files is quite large (>100k). If I provide the full
list of files to the HadoopInputFormat I am using, the job fails with an
AskTimeoutException. This is weird, since I am running on YARN and setting
-yD akka.ask.timeout=600s; even though the logs show the setting is parsed
properly, the job execution still fails with an AskTimeoutException after
10s. I managed to work around this by grouping the files and reading each
group in a loop, so that I end up with a Seq[DataSet[Record]]. But if I
try to union those datasets, I receive the AskTimeoutException again. So
my question is: what can be the reason behind this exception being thrown,
and why is the setting ignored even though it is parsed properly?
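For reference, here is a simplified sketch of the workaround. I read Avro
GenericRecords via AvroParquetInputFormat just for illustration; the bucket
path, the group size, and the record type are placeholders for my actual
setup:

import org.apache.avro.generic.GenericRecord
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.hadoop.mapreduce.HadoopInputFormat
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.parquet.avro.AvroParquetInputFormat

object GroupedParquetRead {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Placeholder file list; in reality this is >100k S3 paths.
    val allFiles: Seq[String] =
      (0 until 100000).map(i => s"s3://my-bucket/records/part-$i.parquet")

    // Read one group of files through a single HadoopInputFormat.
    def readGroup(files: Seq[String]): DataSet[GenericRecord] = {
      val job = Job.getInstance()
      files.foreach(f => FileInputFormat.addInputPath(job, new Path(f)))
      val format = new HadoopInputFormat[Void, GenericRecord](
        new AvroParquetInputFormat[GenericRecord],
        classOf[Void],
        classOf[GenericRecord],
        job)
      // The Parquet key is always null (Void); keep only the records.
      env.createInput(format).map(_._2)
    }

    // Group size is arbitrary; reading group by group works fine.
    val datasets: Seq[DataSet[GenericRecord]] =
      allFiles.grouped(1000).toSeq.map(readGroup)

    // This union over all groups is where the AskTimeoutException
    // shows up again.
    val all: DataSet[GenericRecord] = datasets.reduce(_ union _)

    all.first(10).print()
  }
}

The per-group reads succeed on their own; it is only the final union over
all groups that brings the exception back.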

I'd be grateful for any help.

Best Regards,
Dom.
