Re: Having a backoff while experiencing checkpointing failures

2018-06-11 Thread Stefan Richter
Hi, I think the behaviour of min_pause_between_checkpoints is either buggy, or we should at least discuss whether it would not be better to respect the pause also for failed checkpoints. As far as I know there is no ongoing work to add a backoff, so I suggest you open a JIRA issue and make a case for this.
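As a point of reference for the discussion, the pause in question is set through the CheckpointConfig. A minimal, untested sketch (the interval and pause values are made up for illustration):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointPauseSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000L); // trigger a checkpoint every 60 s
        // The pause discussed in this thread: Flink waits at least this long
        // after a *completed* checkpoint before triggering the next one; the
        // open question is whether it should also apply after failed ones.
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30_000L);
    }
}
```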

Re: Flink kafka consumer stopped committing offsets

2018-06-11 Thread Piotr Nowojski
Hi, what’s your KafkaConsumer configuration? Especially the values for: - is checkpointing enabled? - enable.auto.commit (or auto.commit.enable for Kafka 0.8) / auto.commit.interval.ms - did you call setCommitOffsetsOnCheckpoints()? Please also refer to https://ci.apache.org/projects/flink/flink-do…
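To make the checklist concrete, here is a minimal sketch of a properties object holding the settings Piotr asks about (the broker address and group id are placeholders, and the values shown are examples, not recommendations):

```java
import java.util.Properties;

public class KafkaCommitProps {
    public static Properties consumerProperties() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "example-group");           // placeholder
        // Only relevant when Flink checkpointing is disabled; with checkpointing
        // enabled, the Flink consumer commits offsets on completed checkpoints
        // if setCommitOffsetsOnCheckpoints(true) is set (the default).
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "5000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProperties().getProperty("auto.commit.interval.ms"));
    }
}
```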

Re: [BucketingSink] notify on moving into pending/ final state

2018-06-11 Thread Piotr Nowojski
Hi, I can see that this could be a useful feature. What exactly is preventing you from inheriting from BucketingSink? Maybe it would be enough to make BucketingSink easier to extend. One thing that could collide with such a feature is that Kostas is currently working on a larger BucketingSink r…

Re: why BlobServer use ServerSocket instead of Netty's ServerBootstrap?

2018-06-11 Thread Timo Walther
Hi, I think this question should rather be sent to the dev@ mailing list, but I will loop in Nico, who might know more about the implementation details. Regards, Timo. On 11.06.18 at 05:07, makeyang wrote: after checking the code, I found that BlobServer uses ServerSocket instead of Netty's ServerBootstrap…

Re: File does not exist prevent from Job manager to start .

2018-06-11 Thread Till Rohrmann
Hi Miki, Flink first tries to store the checkpoint data in Hadoop before writing the handle to the metadata in ZooKeeper. Thus, if the handle is in ZooKeeper, then the data should also have been written to HDFS. Maybe you could check the HDFS logs for anything suspicious. If ZooKeeper fa…

Re: Heap Problem with Checkpoints

2018-06-11 Thread Piotr Nowojski
Hi, what kind of messages are those “logs about S3 operations”? Did you try a Google search for them? Maybe it’s a known S3 issue? Another approach: use a heap-space analyser, backtrack from it the classes that are referencing those “memory leaks”, and again search for any kno…

Re: Take elements from window

2018-06-11 Thread Piotr Nowojski
Hi, do I understand you correctly that you just want three different sliding windows (one per rule) with durations of 10, 20 and 30 minutes? If so, I haven’t tested it, but I would guess there are at least two solutions to the problem: 1. just create three different sliding windows…
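Piotr's first suggestion could look roughly like this; an untested sketch, assuming a keyed stream of events, a one-minute slide, and placeholder rule functions:

```java
// Untested sketch: one sliding window per rule duration, all sliding every minute.
// `events`, the key selector and the window functions are placeholders.
DataStream<Event> events = ...;

events.keyBy(e -> e.getKey())
      .window(SlidingProcessingTimeWindows.of(Time.minutes(10), Time.minutes(1)))
      .apply(new Rule1WindowFunction());

events.keyBy(e -> e.getKey())
      .window(SlidingProcessingTimeWindows.of(Time.minutes(20), Time.minutes(1)))
      .apply(new Rule2WindowFunction());

events.keyBy(e -> e.getKey())
      .window(SlidingProcessingTimeWindows.of(Time.minutes(30), Time.minutes(1)))
      .apply(new Rule3WindowFunction());
```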

Akka version conflict running on Flink cluster

2018-06-11 Thread Wouter Zorgdrager
Hi, I think I'm running into an Akka version conflict when running a Flink job on a cluster. The current situation: - Flink cluster on Flink 1.4.2 (using Docker) - a Flink job which uses the twitter4s [1] library and Akka version 2.5.8. In my Flink job I try to shut down an Akka actor from the twitter…

Re: Stopping of a streaming job empties state store on HDFS

2018-06-11 Thread Stefan Richter
Hi, > On 08.06.2018 at 01:16, Peter Zende wrote: > > Hi all, > > We have a streaming pipeline (Flink 1.4.2) for which we implemented stoppable > sources to be able to gracefully exit from the job with Yarn state > "finished/succeeded". > This works fine; however, after creating a savepoint, …

Re: Flink kafka consumer stopped committing offsets

2018-06-11 Thread Juho Autio
Hi Piotr, thanks for your insights. > What’s your KafkaConsumer configuration? We only set these in the properties passed to the FlinkKafkaConsumer010 constructor: auto.offset.reset=latest bootstrap.servers=my-kafka-host:9092 group.id=my_group flink.partition-discovery.interval-millis=3…

Re: how to emit record to downstream operator in snapshotState and/or onProcessingTime

2018-06-11 Thread Piotr Nowojski
Hi, indeed it seems that it is not possible to emit records on checkpoint/snapshot through a ProcessFunction. However, you could do it via a custom operator (there you have constant access to the output collector). Another workaround might be to register a processing-time service in your ProcessFunction…

[ANNOUNCE] Weekly community update #24

2018-06-11 Thread Till Rohrmann
Dear community, this is the weekly community update thread #24. Please post any news and updates you want to share with the community to this thread. # flink-shaded 4.0 released flink-shaded 4.0 has been released [1], which bumps Flink's shaded Netty dependency to 4.1.24. # Rework of Flink website…

Re: Kafka to Flink to Hive - Writes failing

2018-06-11 Thread Piotr Nowojski
Yes, BucketingSink is a better option. You can start by looking at the BucketingSink Java docs. Please also take a look at this: https://stackoverflow.com/questions/47669729/how-to-write-to-orc-files-using-bucketingsink-in-apache-flink

Re: Akka version conflict running on Flink cluster

2018-06-11 Thread Piotr Nowojski
Hi, please take a look at this thread first: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Akka-Http-used-in-custom-RichSourceFunction-td20314.html

Writing csv to Hadoop Data stream

2018-06-11 Thread miki haiat
Hi, I'm trying to stream data to Hadoop as a CSV. In batch processing I can use HadoopOutputFormat like this (example/WordCount.java…
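For the streaming side, the usual answer at this Flink version is BucketingSink; an untested sketch, assuming the records are already formatted as CSV lines (the HDFS path and input stream are placeholders):

```java
// Untested sketch (Flink 1.4 filesystem connector); csvLines is a placeholder stream.
DataStream<String> csvLines = ...;

BucketingSink<String> sink = new BucketingSink<>("hdfs:///tmp/csv-out"); // placeholder path
sink.setWriter(new StringWriter<>());      // writes each record as a text line
sink.setBatchSize(1024L * 1024L * 128L);   // roll part files at ~128 MB
csvLines.addSink(sink);
```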

Re: Flink kafka consumer stopped committing offsets

2018-06-11 Thread Piotr Nowojski
The more I look into it, the more it seems like a Kafka bug, or some cluster failure from which your Kafka cluster did not recover. In your case auto committing should be set to true, and in that case the KafkaConsumer should commit offsets every so often while it is polling messages. Unless for…

Re: Flink kafka consumer stopped committing offsets

2018-06-11 Thread amit pal
Probably your Kafka consumer is rebalancing. This can be caused by a message-processing time large enough that the Kafka broker marks your consumer as dead and rebalances, all before the consumer can commit the offsets. On Mon, Jun 11, 2018 at 7:37 PM Piotr Nowojski wrote: > The more…
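The timing argument amit describes can be sketched in plain Java. The values below are illustrative; max.poll.interval.ms is the Kafka 0.10.1+ setting that bounds the time between polls (default 300 s), and the per-record cost is an assumption:

```java
public class RebalanceCheck {
    // Returns true if processing a full batch between polls risks exceeding
    // the poll interval, i.e. the broker may consider the consumer dead
    // and trigger a rebalance before offsets are committed.
    static boolean risksRebalance(int recordsPerPoll, long perRecordMillis,
                                  long maxPollIntervalMillis) {
        return recordsPerPoll * perRecordMillis > maxPollIntervalMillis;
    }

    public static void main(String[] args) {
        // 500 records * 800 ms each = 400 s, which exceeds the 300 s default.
        System.out.println(risksRebalance(500, 800, 300_000));
    }
}
```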

Re: how to emit record to downstream operator in snapshotState and/or onProcessingTime

2018-06-11 Thread Steven Wu
Piotr, > However you could do it via a custom Operator (there you have a constant access to output collector). Can you elaborate on that a little bit? Are you referring to the "Output<StreamRecord<OUT>> output" field in the AbstractStreamOperator class? > register processing time service in your ProcessFunction. I think your ti…

Re: how to emit record to downstream operator in snapshotState and/or onProcessingTime

2018-06-11 Thread Steven Wu
@Override
public void processElement(Integer value, Context ctx, Collector<Integer> out) throws Exception {
    ctx.timerService().registerProcessingTimeTimer(...);
}

@Override
public void onTimer(long timestamp, OnTimerContext ctx, Collector<Integer> out) throws Exception {
    // …
}

correcting myself re…
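Put together, the approach Steven is correcting himself towards might look like this; an untested sketch of a ProcessFunction on a keyed stream that re-registers a processing-time timer so it can emit records periodically through the timer's collector (the 60 s period and the marker record are made up):

```java
// Untested sketch (Flink ProcessFunction API, requires a keyed stream).
public class PeriodicEmitFunction extends ProcessFunction<Integer, Integer> {

    @Override
    public void processElement(Integer value, Context ctx, Collector<Integer> out) throws Exception {
        // Register a timer one minute from now; timers are deduplicated per timestamp.
        ctx.timerService().registerProcessingTimeTimer(
                ctx.timerService().currentProcessingTime() + 60_000L);
        out.collect(value);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Integer> out) throws Exception {
        // Here we do have a Collector, so records can be emitted on a schedule.
        out.collect(-1); // placeholder marker record
        ctx.timerService().registerProcessingTimeTimer(timestamp + 60_000L);
    }
}
```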

Ask about aggeragtion on joined streams

2018-06-11 Thread Rad Rad
Hi, could you help me? I want to compute aggregations, such as AVG, over two joined streams:

FirstStream.join(SecondStream)
    .where(new FirstKeySelector())
    .equalTo(new SecondKeySelector())
    .window(Tumbli…
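One way the snippet could continue, sketched and untested: apply a JoinFunction to pair the records, then aggregate over a second window. Window sizes, key selectors, field access and the AverageAggregate class are all placeholders:

```java
// Untested sketch continuing the join above.
firstStream.join(secondStream)
    .where(new FirstKeySelector())
    .equalTo(new SecondKeySelector())
    .window(TumblingEventTimeWindows.of(Time.seconds(10)))
    .apply(new MyJoinFunction())          // pairs matching records into one value
    .keyBy(joined -> joined.getKey())
    .timeWindow(Time.seconds(10))
    .aggregate(new AverageAggregate());   // e.g. an AggregateFunction computing AVG
```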

Flink 1.6 release note!!

2018-06-11 Thread Puneet Kinra
Hi, can anybody please send the link or reference document for the 1.6 release? -- Cheers, Puneet Kinra. Mobile: +918800167808 | Skype: puneet.ki...@customercentria.com | e-mail: puneet.ki...@customercentria.com

Re: Flink 1.6 release note!!

2018-06-11 Thread vipul singh
I think you are looking for this? http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/DISCUSS-Flink-1-6-features-tc20502.html 1.6 release notes as per the current website: https://ci.apache.org/projects/flink/flink-docs-master/release-notes/flink-1.6.html Recently 1.5 was released: ht…

DataStreamCalcRule grows beyond 64 KB

2018-06-11 Thread rakeshchalasani
Hi, we hit a situation where Flink's generated code grows beyond 64 KB and fails. Spark SQL has a similar issue and automatically disables code generation in that case. Is there any way we can control that here? Following is the error stack: org.apache.flink.api.common.InvalidProgramException: T…

How to submit two Flink jobs to the same Flink cluster?

2018-06-11 Thread Angelica
I have a Flink Standalone Cluster based on Flink 1.4.2 (1 JobManager, 4 task slots) and want to submit two different Flink programs. I am not sure whether this is possible at all, as some Flink archive threads say that a JobManager can only run one job. If this is true, any ideas how I can get around this issue?

Re: DataStreamCalcRule grows beyond 64 KB

2018-06-11 Thread Hequn Cheng
Hi rakeshchalasani, at the moment Flink only splits generated methods by fields to avoid the 64 KB problem, so the current implementation still hits the limit if a single field becomes too large. The Flink community has already planned to solve the problem, see [1]. As a workaround, you can define your own UDF to avoid th…
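The workaround Hequn mentions might look like this; an untested sketch of a scalar UDF (class name and logic are hypothetical) that moves a large expression into hand-written code, so the generated code for that field stays small:

```java
// Untested sketch: replace a huge generated expression with a hand-written UDF.
public class BigExpression extends ScalarFunction {
    public long eval(long a, long b, long c) {
        // hand-written logic replacing the large generated expression
        return a * b + c;
    }
}

// Registration and use (names are placeholders):
// tableEnv.registerFunction("bigExpr", new BigExpression());
// table.select("bigExpr(f0, f1, f2)");
```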

Re: why BlobServer use ServerSocket instead of Netty's ServerBootstrap?

2018-06-11 Thread makeyang
thanks -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Flink 1.4.0 release commit

2018-06-11 Thread Abdul Qadeer
Hi! I was trying to find the commit from which 1.4.0 was released. The release was on 29 November 2017, but I am not able to find any commits around that date to verify this. Any help appreciated.
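Release commits are usually reachable via tags rather than by date; assuming the standard Flink tag naming (release-X.Y.Z), something like:

```shell
# Clone (or use an existing checkout) and resolve the release tag to a commit.
git clone https://github.com/apache/flink.git
cd flink
git rev-list -n 1 release-1.4.0   # prints the commit the 1.4.0 tag points at
git show -s release-1.4.0         # shows the tag / commit details
```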

Re: How to submit two Flink jobs to the same Flink cluster?

2018-06-11 Thread Sampath Bhat
Hi Angelica, you can run any number of Flink jobs in a Flink cluster. There is no restriction as such, unless the jobs conflict over shared resources (e.g. two jobs binding the same port). On Tue, Jun 12, 2018 at 5:03 AM, Angelica wrote: > I have a Flink Standalone Cluster based on…
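For completeness, submitting several jobs to one standalone cluster is just repeated use of the CLI (jar names are placeholders):

```shell
# Each invocation submits an independent job to the same JobManager.
bin/flink run -d jobA.jar   # -d: detached
bin/flink run -d jobB.jar
bin/flink list              # lists the running jobs
```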