[jira] [Created] (FLINK-9381) BlobServer data for a job is not getting cleaned up at JM

2018-05-16 Thread Amit Jain (JIRA)
Amit Jain created FLINK-9381: Summary: BlobServer data for a job is not getting cleaned up at JM Key: FLINK-9381 URL: https://issues.apache.org/jira/browse/FLINK-9381 Project: Flink Issue Type

Re: Rewriting a new file instead of writing a ".valid-length" file in BucketSink when restoring

2018-05-15 Thread Amit Jain
Hi Timo, Shouldn't we delegate the recovery option to the user? I think we can ask the user to provide a Reader for the respective Writer class and save the valid-length info in operator state, in addition to the current flow. Depending on the user-chosen recovery option, we can stream the Reader output to Writer cl

Re: [ANNOUNCE] Two new committers: Xingcan Cui and Nico Kruber

2018-05-10 Thread Amit Jain
Congrats! On Thu, May 10, 2018 at 10:10 AM, Xingcan Cui wrote: > Thanks, everyone! > > It’s an honor which inspires me to devote more to our community. > > Regards, > Xingcan > >> On May 10, 2018, at 2:06 AM, Peter Huang wrote: >> >> Congratulations Nico and Xingcan! >> >> On Wed, May 9, 2018 at

Re: TaskManager deadlock on NetworkBufferPool

2018-04-19 Thread Amit Jain
FYI > > On Thu, Apr 19, 2018 at 10:04 AM, Amit Jain wrote: > > > @Ufuk Please find execution plan in the attachment. > > > > @Nico Job is not making progress at all. This issue is happening > > randomly. Few of our jobs are working with only few MB of data and still,

Re: TaskManager deadlock on NetworkBufferPool

2018-04-19 Thread Amit Jain
>> https://lists.apache.org/thread.html/a6b6fb1a42a975608fa8641c86df30b47f022985ade845f1f1ec542a@%3Cdev.flink.apache.org%3E > >> > >> 2018-04-04 20:23 GMT+02:00 Ted Yu : > >> > >>> I searched for 0x0005e28fe218 in the two files you attached > >>

[jira] [Created] (FLINK-9207) Client returns SUCCESS(0) return code for canceled job

2018-04-18 Thread Amit Jain (JIRA)
Amit Jain created FLINK-9207: Summary: Client returns SUCCESS(0) return code for canceled job Key: FLINK-9207 URL: https://issues.apache.org/jira/browse/FLINK-9207 Project: Flink Issue Type: Bug

Re: TaskManager deadlock on NetworkBufferPool

2018-04-04 Thread Amit Jain
+u...@flink.apache.org On Wed, Apr 4, 2018 at 11:33 AM, Amit Jain wrote: > Hi, > > We are hitting TaskManager deadlock on NetworkBufferPool bug in Flink 1.3.2. > We have set of ETL's merge jobs for a number of tables and stuck with above > issue randomly daily. > > I

TaskManager deadlock on NetworkBufferPool

2018-04-03 Thread Amit Jain
Hi, We are hitting the TaskManager deadlock on NetworkBufferPool bug in Flink 1.3.2. We have a set of ETL merge jobs for a number of tables and get stuck on the above issue randomly every day. I'm attaching the thread dump of JobManager and one of the Task Man

Re: Batch job getting stuck

2018-02-14 Thread Amit Jain
11 GB -- Thanks, Amit On Mon, Feb 12, 2018 at 9:50 PM, Timo Walther wrote: > Hi Amit, > > how is the memory consumption when the jobs get stuck? Is the Java GC > active? Are you using off-heap memory? > > Regards, > Timo > > Am 2/12/18 um 10:10 AM schrieb Amit Jain: > >

Batch job getting stuck

2018-02-12 Thread Amit Jain
Hi, We have created a batch job in which we are trying to merge a set of S3 directories in TextFormat with the old snapshot in Parquet format. We are running 50 such jobs daily and found that a few random jobs get stuck partway through. We have gone through the logs of JobManager, TaskManager and could

Re: Fetch Metrics

2018-02-05 Thread Amit Jain
Hi Suvimal, You can use the REST API of the JobManager to retrieve the stored metrics. Have a look at this link: https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/metrics.html#rest-api-integration -- Thanks, Amit On Mon, Feb 5, 2018 at 2:47 PM, Suvimal Yashraj < suvimal

Re: Invalid lambda deserialization

2018-01-03 Thread Amit Jain
> Regards, > Timo > > > Am 1/3/18 um 11:41 AM schrieb Amit Jain: > >> Hi, >> >> I'm writing a job to merge old data with changelogs using DataSet API >> where >> I'm reading changelog using TextInputFormat and old data using >> HadoopInpu

Invalid lambda deserialization

2018-01-03 Thread Amit Jain
Hi, I'm writing a job to merge old data with changelogs using the DataSet API, where I'm reading the changelog using TextInputFormat and old data using HadoopInputFormat. I can see that the job manager has successfully deployed the program flow to worker nodes. However, workers are immediately going to failed sta