Thanks Till. Actually, i have around 5GB pods for each TM, and each pod with only one slot. But the metrics i have pulled is as below, which is slightly confusing. It says only ~50MB of Heap is committed for the tasks. Would you be able to point me to the right configuration to be set.
Thanks ~Ramya. [image: image.png] On Tue, Jun 9, 2020 at 3:12 PM Till Rohrmann <trohrm...@apache.org> wrote: > Hi Ramya, > > it looks as if you should give your Flink pods and also the Flink process a > bit more memory as the process fails with an out of memory error. You could > also try Flink's latest version which comes with native Kubernetes support. > > Cheers, > Till > > On Tue, Jun 9, 2020 at 8:45 AM Ramya Ramamurthy <hair...@gmail.com> wrote: > > > Hi, > > > > My flink jobs are constantly going down beyond an hour with the below > > exception. > > This is Flink 1.7 on kubes, with checkpoints to Google storage. > > > > AsynchronousException{java.lang.Exception: Could not materialize > > checkpoint 21 for operator Source: Kafka011TableSource(sid, _zpsbd3, > > _zpsbd4, _zpsbd6, _zpsbd7, _zpsbd9, lvl_1, isBot, botcode, ssresp, > > reason, ts) -> from: (sid, _zpsbd3, _zpsbd6, ts) -> > > Timestamps/Watermarks -> where: (<>(sid, _UTF-16LE'7759')), select: > > (sid, _zpsbd3, _zpsbd6, ts) -> time attribute: (ts) (5/6).} > > at > > > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1153) > > at > > > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:947) > > at > > > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:884) > > at > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > > at > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > at > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748) > > Caused by: java.lang.Exception: Could not materialize checkpoint 21 > > for operator Source: Kafka011TableSource(sid, _zpsbd3, _zpsbd4, > > _zpsbd6, _zpsbd7, _zpsbd9, lvl_1, isBot, botcode, ssresp, reason, ts) > > -> from: (sid, _zpsbd3, _zpsbd6, ts) -> Timestamps/Watermarks -> > > where: (<>(sid, _UTF-16LE'7759')), select: (sid, _zpsbd3, _zpsbd6, ts) > > -> time attribute: (ts) (5/6). > > at > > > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:942) > > ... 6 more > > Caused by: java.util.concurrent.ExecutionException: > > java.lang.OutOfMemoryError: Java heap space > > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > > at > > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53) > > at > > > org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53) > > at > > > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:853) > > ... 5 more > > Caused by: java.lang.OutOfMemoryError: Java heap space > > at > > > com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:609) > > at > > > com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408) > > at > > > com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) > > at > > > com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:558) > > at > > > com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:482) > > at > > > com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:599) > > at > > > com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:272) > > ... 4 more > > > > > > > > Any help here in understanding this would be highly appreciated. > > > > > > Thanks. > > >