Re: [External Sender] Re: ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue

2020-12-08 Thread Kye Bae
> in please look at the logs on other machines (maybe system logs)
> 3. Some OS failure - please look at the system logs on other machines
> 4. Some hardware failure (restart / crash)
> 5. Network problems
>
> Piotrek
>
> On Mon, Dec 7, 2020 at 23:31 Kye Bae wrote:

Re: ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue

2020-12-07 Thread Kye Bae
I forgot to mention: this is Flink 1.10.

-K

On Mon, Dec 7, 2020 at 5:08 PM Kye Bae wrote:
> Hello!
>
> We have a real-time streaming workflow that has been running for about 2.5 weeks.
>
> Then, we began to get the exception below from taskmanagers (random) since y

ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue

2020-12-07 Thread Kye Bae
Hello! We have a real-time streaming workflow that has been running for about 2.5 weeks. Then, we began to get the exception below from taskmanagers (random) since yesterday, and the job began to fail/restart every hour or so. The job does recover after each restart, but sometimes it takes more

Re: [External Sender] Re: Random Task executor shutdown (java.lang.OutOfMemoryError: Metaspace)

2020-11-17 Thread Kye Bae
>
>> I'll keep you up to date with my findings.
>>
>> Best,
>> Flavio
>>
>> On Mon, Nov 16, 2020 at 8:22 PM Kye Bae wrote:
>>
>>> Hello!
>>>
>>> The JVM metaspace is where all the classes (not class instances or

Re: [External Sender] Re: Random Task executor shutdown (java.lang.OutOfMemoryError: Metaspace)

2020-11-16 Thread Kye Bae
Hello! The JVM metaspace is where all the classes (not class instances or objects) get loaded. jmap -histo is going to show you the heap space usage info, not the metaspace. You could inspect what is happening in the metaspace by using jcmd (e.g., jcmd JPID VM.native_memory summary) after restarti

Re: [External Sender] Debugging "Container is running beyond physical memory limits" on YARN for a long running streaming job

2020-09-25 Thread Kye Bae
Not sure about Flink 1.10.x. Can share a few things up to Flink 1.9.x: 1. If your Flink cluster runs only one job, avoid using the dynamic classloader for your job: start it from one of the Flink class paths. As of Flink 1.9.x, using the dynamic classloader results in the same classes getting loaded e