Flink 1.3 release date

2017-06-01 Thread Tarandeep Singh
Hi, Any updates on 1.3 release date? Thanks, Tarandeep

Re: Fink: KafkaProducer Data Loss

2017-06-01 Thread Kostas Kloudas
Hi Ninad, I think Gordon could shed some more light on this, but I suggest you update your Flink version to at least 1.2. The reason is that we are already in the process of releasing Flink 1.3 (which will probably come today) and a lot of things have changed/fixed/improved sin

Re: Checkpoints very slow with high backpressure

2017-06-01 Thread rhashmi
I tried extending the timeout to 1 hour, but no luck; it is still timing out. So I am guessing something is stuck and will dig further. Here are the configuration details: standalone cluster, with the checkpoint store in S3. I have just 217680 messages in 24 partitions. Any idea?

Re: Checkpoints very slow with high backpressure

2017-06-01 Thread rhashmi
I tried extending the timeout to 1 hour, but no luck; it is still timing out, and there is no exception in the log file. So I am guessing something is stuck and will dig further. Here are the configuration details: standalone cluster, with the checkpoint store in S3. I have just 217680 messages in 24 partitions. Any idea?

Re: Fink: KafkaProducer Data Loss

2017-06-01 Thread Tzu-Li (Gordon) Tai
Hi Ninad, This exception you’re seeing does not cause data loss. As a matter of fact, it’s preventing data loss based on how Flink’s checkpoints / fault-tolerance works. So, a recap of what the problem was when this “uncaught exception leak” issue was first reported: Prior to the fix, on checkpo

Re: Checkpoints very slow with high backpressure

2017-06-01 Thread rhashmi
Enabled info logging; it seems to be stuck. ==> /mnt/ephemeral/logs/flink-flink-jobmanager-0-vpc2w2-rep-stage-flink1.log <== 2017-06-01 12:45:18,229 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 1 @ 1496321118221 ==> /mnt/ephemeral/logs/flink-flink-taskmanag

Re: Does job restart resume from last known internal checkpoint?

2017-06-01 Thread Nico Kruber
Additionally, externalized checkpoints [3] may be retained after cancelling a job. However, externalized checkpoints do not support rescaling (some documentation improvements on this part are already present in a PR[4]). Nico [3] https://ci.apache.org/projects/flink/flink-docs-release-1.3/setu

[ANNOUNCE] Apache Flink 1.3.0 released

2017-06-01 Thread Robert Metzger
The Apache Flink community is pleased to announce the release of Apache Flink 1.3.0. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release is available for download at: https://fl

GC metrics from Job Manager UI

2017-06-01 Thread Flavio Pompermaier
Hi all, Are there any statistics about GC in a TM available from the JobManager UI at the moment? That would be very useful to understand whether Flink is slowing down because of the job itself or because of high GC activity in some operator. Best, Flavio

Re: Excessive stdout is causing java heap out of mem

2017-06-01 Thread Robert Metzger
What you can always do to reduce pressure on the heap from large state is using the RocksDB state backend. Then, all the state will be kept on disk. On Thu, May 25, 2017 at 7:20 AM, Fritz Budiyanto wrote: > Hi Robert, > > Yes, lots of buffering in the heap. State backend is JobManager with Heap

Re: Flink parallel tasks, slots and vcores

2017-06-01 Thread Robert Metzger
Hi Sathi, Are you seeing 8 slots in the JobManager UI? How many shards do you have in your Kinesis stream? On Fri, May 26, 2017 at 3:14 PM, Jason Brelloch wrote: > Can you give us more information about what your Flink job is doing and > the distribution of the Kinesis data/keys? The distrib

Re: Flink - Iteration and Backpressure

2017-06-01 Thread Robert Metzger
Hi Mahesh, why do you need to iterate over the elements? I wonder if you can't just stream the data from kafka1-kafka4 through your topology? On Fri, May 26, 2017 at 7:21 PM, MAHESH KUMAR wrote: > Hi Team, > > I am trying to build an audit like system where I read messages from "n" > Kafka q

Re: Memory ran out. Compaction failed. - Exception

2017-06-01 Thread Robert Metzger
Hi Marc, The CompactingHashtable is not spillable (it is actually the only batch operator that isn't), so you can only either reduce your data size or increase your memory. When you are using unmanaged memory, make sure the allocation of managed memory is reduced to a minimum. On Mon, May 29, 2017 a

Re: Flink Docker Kubernetes Gitlab CI CDeployment

2017-06-01 Thread Robert Metzger
Hi Nancy, there is no documentation available on a GitLab CI integration with a K8s deployment. But I'm pretty sure you can deploy Flink using Docker containers (check this out: http://flink.apache.org/news/2017/05/16/official-docker-image.html ). I'm CCing Patrick, he has some experience deployin

Re: Duplicated data when using Externalized Checkpoints in a Flink Highly Available cluster

2017-06-01 Thread Robert Metzger
Hi Amara, how are you validating whether you have duplicates in your output? If you are just writing the output to another Kafka topic or printing it to standard out, you'll see duplicates even if exactly-once works. Flink does not provide exactly-once delivery. Flink has exactly-once semantics for
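The distinction Robert draws above can be illustrated with a toy model. This is a minimal sketch in plain Python, not Flink API: the function `run`, its parameters, and the in-memory `sink` list are all hypothetical stand-ins. It assumes a job that counts records in checkpointed state and writes each record to a sink that cannot be rolled back (as with a Kafka topic or standard out).

```python
def run(records, checkpoint_every, fail_at):
    """Toy replay model (hypothetical, not Flink code).

    Processes `records` in order, checkpoints every `checkpoint_every`
    records, and simulates one failure just before offset `fail_at`.
    """
    sink = []            # external output; already-written records stay written
    count = 0            # operator state, restored from the checkpoint on failure
    checkpoint = (0, 0)  # (source offset, state) of the last completed checkpoint
    offset = 0
    failed = False
    while offset < len(records):
        if not failed and offset == fail_at:
            # Failure: rewind source offset and state to the last checkpoint.
            failed = True
            offset, count = checkpoint
            continue
        sink.append(records[offset])  # emitted again after a replay
        count += 1                    # state reflects each record exactly once
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, count)
    return sink, count


sink, count = run(list("abcdef"), checkpoint_every=2, fail_at=3)
print(sink)   # record "c" appears twice in the output
print(count)  # 6
```

In this model the record processed between the last checkpoint and the failure is written to the sink twice, while the restored counter still ends at the number of input records.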

Re: Problems submitting Flink to Yarn with Kerberos

2017-06-01 Thread Robert Metzger
Can you check the logs of the JobManager (maybe at DEBUG level) to see whether anything tries to establish a connection with it? Are you sure you are properly authenticated to access the JM? On Tue, May 30, 2017 at 4:41 PM, Dominique Rondé < dominique.ro...@allsecur.de> wrote: > Hi Gor

Re: HTTP listener source

2017-06-01 Thread Robert Metzger
Hi, yes, there's something like this available in Apache Bahir: https://github.com/apache/bahir-flink/tree/master/flink-connector-netty On Wed, May 31, 2017 at 3:51 AM, Madhukar Thota wrote: > Hi > > Has anyone implemented an HTTP listener as a Flink source which acts as a > Rest API to receive JSON

Re: Flink 1.3 release date

2017-06-01 Thread Robert Metzger
It has been released today. Let us know if you find any issues with it. On Thu, Jun 1, 2017 at 9:50 AM, Tarandeep Singh wrote: > Hi, > > Any updates on 1.3 release date? > > Thanks, > Tarandeep >

Re: Problems submitting Flink to Yarn with Kerberos

2017-06-01 Thread Tzu-Li (Gordon) Tai
Hi Dominique, I had another quick look at the error trace you provided; the problem doesn’t seem to be related to Kerberos authentication. For some reason the JobManager simply isn’t reachable from the client, as Robert has pointed out. There should be some clue about this in the JM logs. On 1 J

Re: Flink - Iteration and Backpressure

2017-06-01 Thread MAHESH KUMAR
Hi Robert, The Message Auditor System must monitor all 4 Kafka queues and gather information about messages that made it through all of them, or say specifically which queue a particular message did not make it through. We want the window time to be equivalent to our SLA time so that any message th

Re: Cassandra connector POJO - tombstone question

2017-06-01 Thread Tarandeep Singh
Hi Chesnay, Did your code changes (exposing mapper options) make it into the 1.3 release? Thank you, Tarandeep On Wed, Apr 12, 2017 at 2:34 PM, Tarandeep Singh wrote: > Thanks Chesnay, this will work. > > Best, > Tarandeep > > On Wed, Apr 12, 2017 at 2:42 AM, Chesnay Schepler > wrote: > >> Hello, >

Re: GC metrics from Job Manager UI

2017-06-01 Thread Chesnay Schepler
In the UI, no. But you can access TaskManager metrics through the JobManager REST API. On 01.06.2017 17:00, Flavio Pompermaier wrote: Hi all, Are there any statistics about GC in a TM available from the JobManager UI at the moment? That would be very useful to understand whether Flink is s
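As a rough illustration of Chesnay's suggestion, here is a minimal Python sketch that builds a metrics query URL and parses the reply. Everything specific in it is an assumption not stated in the thread and should be checked against the REST API documentation for your Flink version: the `/taskmanagers/<id>/metrics?get=...` path, the `[{"id": ..., "value": ...}]` response shape, the base URL, the TaskManager ID, and the metric name. The helper names `metrics_url` and `parse_metrics` are hypothetical.

```python
import json
from urllib.parse import quote


def metrics_url(base_url, taskmanager_id, metric_names):
    """Build a TaskManager metrics query URL (path layout is an assumption)."""
    names = ",".join(metric_names)
    return "{}/taskmanagers/{}/metrics?get={}".format(
        base_url.rstrip("/"), quote(taskmanager_id), quote(names, safe=",")
    )


def parse_metrics(body):
    """Turn an assumed JSON list of {"id", "value"} objects into a dict."""
    return {entry["id"]: entry["value"] for entry in json.loads(body)}


# Hypothetical values for illustration only.
url = metrics_url(
    "http://localhost:8081/",
    "tm-1",
    ["Status.JVM.GarbageCollector.PS_Scavenge.Count"],
)
print(url)
print(parse_metrics('[{"id": "example.metric", "value": "3"}]'))
```

The URL would then be fetched with any HTTP client (for example `urllib.request.urlopen`) and the response body passed to `parse_metrics`.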

Re: Cassandra connector POJO - tombstone question

2017-06-01 Thread Chesnay Schepler
No, unfortunately I forgot about them :/ On 01.06.2017 19:39, Tarandeep Singh wrote: Hi Chesnay, Did your code changes (exposing mapper options) make it into the 1.3 release? Thank you, Tarandeep On Wed, Apr 12, 2017 at 2:34 PM, Tarandeep Singh wrote: Thanks Ches

Re: Does job restart resume from last known internal checkpoint?

2017-06-01 Thread Moiz S Jinia
Bump.. On Tue, May 30, 2017 at 10:17 PM, Moiz S Jinia wrote: > In a checkpointed Flink job, will a graceful restart make it resume > from the last known internal checkpoint? Or are all checkpoints discarded when > the job is stopped? > > If discarded, what will be the resume point? > > Moiz >

Re: Fink: KafkaProducer Data Loss

2017-06-01 Thread ninad
Thanks Gordon and Kostas. Gordon, "When a failure occurs in the job, Flink uses the last completed checkpoint to restart the job. In the case of the Flink Kafka producer, this essentially makes sure that records which did not make it into Kafka and caused the last run to fail are reprocessed and

Re: Cassandra connector POJO - tombstone question

2017-06-01 Thread Tarandeep Singh
No problem :) Thanks for letting me know. Best, Tarandeep On Thu, Jun 1, 2017 at 11:18 AM, Chesnay Schepler wrote: > No, unfortunately I forgot about them :/ > > > On 01.06.2017 19:39, Tarandeep Singh wrote: > > Hi Chesnay, > > Did your code changes (exposing mapper options) make it into the 1.3 rele

Restoring Queryable State

2017-06-01 Thread Philip Doctor
Hello, My job differs slightly from the example Queryable State jobs. I have a keyed stream and I will emit managed ValueState at certain points at runtime, but the names aren’t entirely known beforehand. I have checkpointing enabled and when I restore from a checkpoint, everything *almost* works

Re: [ANNOUNCE] Apache Flink 1.3.0 released

2017-06-01 Thread Ted Yu
Robert: Do you know when the Maven artifacts will be populated? Currently I don't see 1.3.0 here: https://mvnrepository.com/artifact/org.apache.flink/flink-core Thanks On Thu, Jun 1, 2017 at 7:48 AM, Robert Metzger wrote: > The Apache Flink community is pleased to announce the release of Apache

Re: [ANNOUNCE] Apache Flink 1.3.0 released

2017-06-01 Thread Ismaël Mejía
Ted, The artifacts are already in Maven Central. https://search.maven.org/#search|ga|1|g%3A%22org.apache.flink%22 Note that Google somehow always puts the mvnrepository.com mirror first in the results, even though it is almost always outdated. I had the same problem when checking for jars b