Hey, I have increased the `akka.client.timeout` but it has not helped at all. Here is the log with the call stack for the AskTimeoutException:
2019-11-12 10:19:17,425 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Received JobGraph submission 81fbbc3f41ad5e08ac832d0e656478bc (Flink Java Job at Tue Nov 12 10:04:19 UTC 2019).
2019-11-12 10:19:17,425 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Submitting job 81fbbc3f41ad5e08ac832d0e656478bc (Flink Java Job at Tue Nov 12 10:04:19 UTC 2019).
2019-11-12 10:19:17,444 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/jobmanager_0 .
2019-11-12 10:19:17,452 INFO org.apache.flink.runtime.jobmaster.JobMaster - Initializing job Flink Java Job at Tue Nov 12 10:04:19 UTC 2019 (81fbbc3f41ad5e08ac832d0e656478bc).
2019-11-12 10:19:17,477 INFO org.apache.flink.runtime.jobmaster.JobMaster - Using restart strategy NoRestartStrategy for Flink Java Job at Tue Nov 12 10:04:19 UTC 2019 (81fbbc3f41ad5e08ac832d0e656478bc).
2019-11-12 10:19:17,495 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job recovers via failover strategy: full graph restart
2019-11-12 10:19:17,513 INFO org.apache.flink.runtime.jobmaster.JobMaster - Running initialization on master for job Flink Java Job at Tue Nov 12 10:04:19 UTC 2019 (81fbbc3f41ad5e08ac832d0e656478bc).
2019-11-12 10:19:17,777 INFO org.apache.flink.runtime.jobmaster.JobMaster - Successfully ran initialization on master in 264 ms.
2019-11-12 10:19:27,442 ERROR org.apache.flink.runtime.rest.handler.job.JobSubmitHandler - Unhandled exception.
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1507147991]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
    at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
    at java.lang.Thread.run(Thread.java:748)

Thanks in Advance,
Best Regards,
Dom

On Tue, 12 Nov 2019 at 03:26, tison <wander4...@gmail.com> wrote:

> I suspect you are suffering from a client submission failure, which also
> throws AskTimeoutException.
>
> The related configuration option is `akka.client.timeout`, which you can
> increase. However, in some cases the problem can be resolved by upgrading
> Java to at least version 8u212.
>
> Best,
> tison.
>
>
> On Mon, 11 Nov 2019 at 6:03 PM, Zhu Zhu <reed...@gmail.com> wrote:
>
>> Hi Dominik,
>>
>> Would you check the JM GC status?
>> One possible cause is that the large number of file metas in
>> HadoopInputFormat is burdening the JM memory.
>>
>> `akka.ask.timeout` is the default RPC timeout, while some RPCs may
>> override this timeout for their own purposes; e.g. the RPCs from the web
>> usually use `web.timeout`.
>> Providing the detailed call stack of the AskTimeoutException may help to
>> identify where this timeout happened.
>>
>> Thanks,
>> Zhu Zhu
>>
>> On Mon, 11 Nov 2019 at 3:35 AM, Dominik Wosiński <wos...@gmail.com> wrote:
>>
>> > Hey,
>> > I have a very specific use case. I have a history of records stored as
>> > Parquet in S3. I would like to read and process them with Flink. The
>> > issue is that the number of files is quite large (>100k). If I provide
>> > the full list of files to the HadoopInputFormat that I am using, the
>> > job will fail with AskTimeoutException, which is weird since I am
>> > using YARN and setting -yD akka.ask.timeout=600s. Even though,
>> > according to the logs, the setting is processed properly, the job
>> > execution still fails with AskTimeoutException after 10s, which seems
>> > weird to me. I have managed to work around this by grouping the files
>> > and reading them in a loop, so that finally I have a
>> > Seq[DataSet<Record>]. But if I try to union those datasets, I will
>> > receive the AskTimeoutException again. So my question is: what can be
>> > the reason behind this exception being thrown, and why is the setting
>> > ignored even though it is parsed properly?
>> >
>> > I will be glad for any help.
>> >
>> > Best Regards,
>> > Dom.
>> >
>>
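The workaround Dominik describes (grouping the >100k files and reading them batch by batch) can be sketched independently of Flink as a plain path-batching helper. This is only an illustrative sketch: the `PathBatcher` class, the batch size of 1,000, and the S3 path pattern are made up for the example; in the real job each batch would become one source via `HadoopInputFormat`, and the resulting DataSets would then be unioned.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: splits a large list of input paths into smaller
 * batches, so that each batch can be handed to a separate input format
 * instead of passing one giant file list to a single source.
 */
public class PathBatcher {

    /** Splits paths into consecutive batches of at most batchSize elements. */
    public static List<List<String>> batch(List<String> paths, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < paths.size(); i += batchSize) {
            // subList is a view, so copy it into an independent list.
            batches.add(new ArrayList<>(
                    paths.subList(i, Math.min(i + batchSize, paths.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Simulate ~100k Parquet files in S3 (paths are invented here).
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            paths.add("s3://my-bucket/history/part-" + i + ".parquet");
        }
        List<List<String>> batches = batch(paths, 1_000);
        System.out.println(batches.size());          // 100
        System.out.println(batches.get(99).size());  // 1000
        // In the Flink job, each batch would be read with HadoopInputFormat
        // into its own DataSet<Record>, and the DataSets unioned afterwards.
    }
}
```

Note that batching only shrinks the per-request message size; as Zhu Zhu points out, the timeout that actually fires may not be `akka.ask.timeout` at all (REST submissions typically go through `web.timeout`), so batching alone may not remove the exception.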