Re: Flink job hangs/deadlocks (possibly related to out of memory)

2018-07-13 Thread Gerard Garcia
ing, the task manager process will be exited > finally to trigger restarting the job. > > Zhijiang > > ------ > 发件人:Gerard Garcia > 发送时间:2018年7月2日(星期一) 18:29 > 收件人:wangzhijiang999 > 抄 送:user > 主 题:Re: Flink job

Re: When a jobmanager fails, it doesn't restart because it tries to restart non existing tasks

2018-07-18 Thread Gerard Garcia
Hi vino, Seems that jobs id stay in /jobgraphs when we cancel them manually. For example, after cancelling the job with id 75e16686cb4fe0d33ead8e29af131d09 the entry is still in zookeeper's path /flink/default/jobgraphs, but the job disappeared from /home/nas/flink/ha/default/blob/. That is the c

Re: When a jobmanager fails, it doesn't restart because it tries to restart non existing tasks

2018-07-19 Thread Gerard Garcia
Can you please provide log from JobManager/Entry point for further > investigation? > > Cheers, > Andrey > > On 18 Jul 2018, at 10:16, Gerard Garcia wrote: > > Hi vino, > > Seems that jobs id stay in /jobgraphs when we cancel them manually. For > example, after cancelling

Re: When a jobmanager fails, it doesn't restart because it tries to restart non existing tasks

2018-07-23 Thread Gerard Garcia
>> >>> At the end it says removed job graph e403893e5208ca47ace886a77e405291 >>> from ZooKeeper but I still can see it at /flink/default/jobgraphs: >>> >>> [zk: localhost:2181(CONNECTED) 14] ls >>> /flink/default/jobgraphs/e403893e5208ca47ace886a77e40529

Re: Flink job hangs/deadlocks (possibly related to out of memory)

2018-07-23 Thread Gerard Garcia
waiting for lock > which is also occupied by task output process. > > As you mentioned, it makes sense to check the data structure of the output > record and reduces the size or make it lightweight to handle. > > Best, > > Zhijiang > > ---------

Re: When a jobmanager fails, it doesn't restart because it tries to restart non existing tasks

2018-07-23 Thread Gerard Garcia
erard On Mon, Jul 23, 2018 at 11:57 AM Gerard Garcia wrote: > Hi Till, > > I can't post the full log (as there is internal info in them) but I've > found this. Is that what you are looking for? > > 11:29:17.351 [main] INFO org.ap

Re: Override CaseClassSerializer with custom serializer

2018-08-20 Thread Gerard Garcia
Hi Timo, I see. Yes, we have already use the "Object Reuse" option. It was a nice performance improvement when we first set it! I guess another option we can try is to somehow make things "easier" to Flink so it can chain operators together. Most of them are not chained, I think it's because they

Re: When a jobmanager fails, it doesn't restart because it tries to restart non existing tasks

2018-08-29 Thread Gerard Garcia
ess of the newly elected leader. The leader path > should also be logged in the cluster entrypoint logs. You can use the > ZooKeeper cli for accessing the ZNodes. > > Cheers, > Till > > On Mon, Jul 23, 2018 at 4:07 PM Gerard Garcia wrote: > >> We have just started ex

Re: Unbalanced Kafka consumer consumption

2018-10-30 Thread Gerard Garcia
ao,Best > > Original Message > *Sender:* Gerard Garcia > *Recipient:* fearsome.lucidity > *Cc:* user > *Date:* Monday, Oct 29, 2018 17:50 > *Subject:* Re: Unbalanced Kafka consumer consumption > > The stream is partitioned by key after ingestion at the finest granularity

Re: Unbalanced Kafka consumer consumption

2018-11-22 Thread Gerard Garcia
x27;t ingest into the Kafka topic live > but want to read persisted data. > 5. Are you using Flink's metrics to monitor the different source tasks? > Check what the source operator's output rate is (should be visible from the > web UI). > > Cheers, > Till > > On T

Re: Unbalanced Kafka consumer consumption

2018-12-19 Thread Gerard Garcia
We finally figure it out. We had a large value in the Kafka consumer option 'max.partition.fetch.bytes', this made the KafkaConsumer to not consume at a balanced rate from all partitions. Gerard

Re: Timestamp synchronized message consumption across kafka partitions

2019-03-07 Thread Gerard Garcia
I'll answer myself. I guess the most viable option for now is to wait for the work in http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Sharing-state-between-subtasks-td24489.html On Thu, Mar 7, 2019, 3:24 PM gerardg wrote: > I'm wondering if there is a way to avoid consuming too fa

Re: Status.JVM.Memory.Heap.Used metric shows only a few megabytes but profiling shows gigabytes (as expected)

2019-03-20 Thread Gerard Garcia
Thanks, next time I'll check in in jira better. On Tue, Mar 19, 2019 at 6:27 PM Chesnay Schepler wrote: > Known issue, fixed in 1.7.3/1.8.0: > https://issues.apache.org/jira/browse/FLINK-11183 > > On 19.03.2019 15:03, gerardg wrote: > > Hi, > > > > Before Flink 1.7.0 we were getting correct valu

Re: Missing checkpoint when restarting failed job

2017-11-28 Thread Gerard Garcia
I've been monitoring the task and checkpoint 1 never gets deleted. Right now we have: chk-1 chk-1222 chk-326 chk-329 chk-357 chk-358 chk-8945 chk-8999 chk-9525 chk-9788 chk-9789 chk-9790 chk-9791 I made the task fail and it recovered without problems so for now I would say that the pro

Re: Kafka topic partition skewness causes watermark not being emitted

2017-12-13 Thread Gerard Garcia
Thanks Gordon. Don't worry, I'll be careful to not have empty partitions until the next release. Also, I'll keep an eye to FLINK-5479 and if at some point I see that there is a fix and the issue bothers us too much I'll try to apply the patch myself to the latest stable release. Gerard On Wed, D

Re: Get which key groups are assigned to an operator

2018-02-20 Thread Gerard Garcia
Hi Stefan, thanks Yes, we are also using keyed state in other operators the problem is that serialization is quite expensive and in some of them we would prefer to avoid it by storing the state in memory (for our use case one specific operator with in memory state gives at least a 30% throughput i

Re: Get which key groups are assigned to an operator

2018-02-20 Thread Gerard Garcia
a job. But that is currently not possible. I still want to > answer your other question: you could currently compute all things about > key-groups and their assignment to operators by using the methods > from org.apache.flink.runtime.state.KeyGroupRangeAssignment. > > Best, > Stefan