Re: Strictly use TLSv1.2

2018-06-20 Thread Vinay Patil
Hi, Can someone please help me with this issue. Regards, Vinay Patil -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Breakage in Flink CLI in 1.5.0

2018-06-20 Thread Sampath Bhat
Hello Till Thanks for clarification. But I've few questions based on your reply. In non-HA setups we need the jobmanager.rpc.address to derive the hostname of the rest server. why is there dependency on jobmanager.rpc.address to get the hostname rest server? This holds good only for normal deploy

[no subject]

2018-06-20 Thread Vinod Gavhane
Regards, Vinod Gavhane

Cleaning of state snapshot in state backend(HDFS)

2018-06-20 Thread Garvit Sharma
Hi, Consider a managed keyed state backed by HDFS with checkpointing enabled. Now, as the state grows the state data will be saved on HDFS. Now, let's say, we clear the state. Would the state data be removed from HDFS too? How does Flink manage to clear the state data from state backend on clear

Re: Blob Server Removes Failed Jobs Immediately

2018-06-20 Thread Till Rohrmann
Hi Dominik, all job related files (non-HA as well as HA) are removed once the job reaches a globally terminal state (FINISHED, CANCELLED, FAILED). This is the case because Flink assumes that the job is done and won't be retried afterwards. Thus, the documentation in the Flip is not true and should

Re: Exception while submitting jobs through Yarn

2018-06-20 Thread Till Rohrmann
Great to hear. Please open a PR for the improvements you like to contribute. Cheers, Till On Wed, Jun 20, 2018 at 4:56 PM Garvit Sharma wrote: > So, finally, I have got this working. The issue was because of a poor > library which was using xerces 2.6 :). > > In this process, I found few things

Re: Questions regarding to Flink 1.5.0 REST API change

2018-06-20 Thread Siew Wai Yow
Thanks Chesnay, the application will take value from "state.savepoints.dir" as default if set target-directory to nul. But then it trying to create the directory in local machine, which caused the below error because it is a HDFS directory. The same URL works in previous Flink 1.3.2. Is somethin

Flink 1.5 TooLongFrameException in cluster mode?

2018-06-20 Thread chrisr123
I download Flink 1.5 and I'm trying to run it in standalone mode. 1 job manager, 2 task managers. I can run flink job when I run in local mode: 1 machine as both job manager and task manager. But when I add 2 remote machines as slaves and try to run, I am seeing this error in the log and the job

Re: Backpressure from producer with flink connector kinesis 1.4.2

2018-06-20 Thread Liu, Gavin (CAI - Atlanta)
Hi guys, I have another question related to the KPL problem. I wonder what the consequences of overwhelming KPL internal queue (kinesis) can be. From my observation in experimenting with 1.4.2 (which does not have backpressure support yet in the open pr stated below), when the flink cluster is

How to use broadcast variables in data stream

2018-06-20 Thread zhen li
Hi,all: I want to use some other broadcast resources such as list or map in the flatmap function or customer triggers, but I don’t find some api to satisfy. Anyone can help? thanks

Re: Cluster resurrects old job

2018-06-20 Thread Elias Levy
Alas, that error appears to be a red herring. Admin mistyped the cancel command leading to the error. But immediately corrected it, resulting in the job being canceled next. So seems unrelated to the job coming back to life later on. On Wed, Jun 20, 2018 at 10:04 AM Elias Levy wrote: > The so

Re: Blob Server Removes Failed Jobs Immediately

2018-06-20 Thread Chesnay Schepler
hmm, this indeed looks odd. Looping in Till (cc) who might know more about this. On 20.06.2018 16:43, Dominik Wosiński wrote: Hello, I'm not sure whether the problem is connected with bad configuration or it's some inconsistency in the documentation but according to this document:https://cwi

Re: Cluster resurrects old job

2018-06-20 Thread Elias Levy
The source of the issue may be this error that occurred when the job was being canceled on June 5: June 5th 2018, 14:59:59.430 Failure during cancellation of job c59dd3133b1182ce2c05a5e2603a0646 with savepoint. java.io.IOException: Failed to create savepoint directory at --checkpoint-dir at org.ap

Cluster resurrects old job

2018-06-20 Thread Elias Levy
We had an unusual situation last night. One of our Flink clusters experienced some connectivity issues, with lead to the the single job running on the cluster failing and then being restored. And then something odd happened. The cluster decided to also restore an old version of the job. One we

Re: Backpressure from producer with flink connector kinesis 1.4.2

2018-06-20 Thread Liu, Gavin (CAI - Atlanta)
Thanks, Gordon. You are quick and It is very helpful to me. I tried some other alternatives to resolve this, finally thought about rewriting the FlinkKinesisProducer class for our need. Glad that I asked before I started. Really appreciate the quick response. From: "Tzu-Li (Gordon) Tai" Date: W

Re: Backpressure from producer with flink connector kinesis 1.4.2

2018-06-20 Thread Tzu-Li (Gordon) Tai
Hi Gavin, The problem is that the Kinesis producer currently does not propagate backpressure properly. Records are added to the internally used KPL client’s queue, without any queue size limit. This is considered a bug, and already has a pull request for it [1], which we should probably push t

Backpressure from producer with flink connector kinesis 1.4.2

2018-06-20 Thread Liu, Gavin (CAI - Atlanta)
Hi guys, I am new to flink framework. And we are building an application that takes kinesis stream for both flink source and sink. The flink version we are using is 1.4.2, which is also the version for the flink-connector-kinesis. We built the flink-connector-kinesis jar explicitly with KPL ver

Re: # of active session windows of a streaming job

2018-06-20 Thread Dongwon Kim
Hi Fabian and Chesnay, As Chesnay pointed out, it seems that I need to write the current counter (which is defined inside Trigger) into state which I think should be the operator state of the window operator. However, as I previously said, TriggerContext allows for users to access only the partiti

Re: Exception while submitting jobs through Yarn

2018-06-20 Thread Garvit Sharma
So, finally, I have got this working. The issue was because of a poor library which was using xerces 2.6 :). In this process, I found few things missing from the doc would like to contribute the same. I really appreciate the support provided. Thanks, On Tue, 19 Jun 2018 at 4:05 PM, Ted Yu wrot

Blob Server Removes Failed Jobs Immediately

2018-06-20 Thread Dominik Wosiński
Hello, I'm not sure whether the problem is connected with bad configuration or it's some inconsistency in the documentation but according to this document: https://cwiki.apache.org/confluence/display/FLINK/FLIP-19%3A+Improved+BLOB+storage+architecture . *I*f a job fails, all non-HA files' refCoun

Re: How to get past "bad" Kafka message, restart, keep state

2018-06-20 Thread Tzu-Li (Gordon) Tai
Hi, You can “skip” the corrupted message by returning `null` from the deserialize method on the user-provided DeserializationSchema. This lets the Kafka connector consider the record as processed, advances the offset, but doesn’t emit anything downstream for it. Hope this helps! Cheers, Gordon

Re: How to get past "bad" Kafka message, restart, keep state

2018-06-20 Thread Kien Truong
Hi, You can use FlatMap instead of Map, and only collect valid elements. Regards, Kien On 6/20/2018 7:57 AM, chrisr123 wrote: First time I'm trying to get this to work so bear with me. I'm trying to learn checkpointing with Kafka and handling "bad" messages, restarting without losing state.

Re: Ordering of stream from different kafka partitions

2018-06-20 Thread Andrey Zagrebin
Hi Amol, > In above code also it will sort the records in specific time window only. All windows will be emitted as watermark passes the end of the window. The watermark only increases. So the non-overlapping windows should be also sorted by time and as a consequence the records across windows

Re: Heap Problem with Checkpoints

2018-06-20 Thread Fabian Wollert
to that last one: i'm accessing S3 from one EC2 instance which has a IAM Role attached ... I'll get back to you when i have those stacktraces printed ... will have to build the project and package the custom version first, might take some time, and also some vacation is up next ... Cheers --

Re: Heap Problem with Checkpoints

2018-06-20 Thread Piotr Nowojski
Btw, side questions. Could it be, that you are accessing two different Hadoop file systems (two different schemas) or even the same one from two different users (encoded in the file system URI) within the same Flink JobMaster? If so, the answer might be this possible resource leak in Flink: http

Re: Heap Problem with Checkpoints

2018-06-20 Thread Piotr Nowojski
Hi, I was looking in this more, and I have couple of suspicions, but it’s still hard to tell which is correct. Could you for example place a breakpoint (or add a code there to print a stack trace) in org.apache.log4j.helpers.AppenderAttachableImpl#addAppender And check who is calling it? Since i

Re: # of active session windows of a streaming job

2018-06-20 Thread Chesnay Schepler
Checkpointing of metrics is a manual process. The operator must write the current value into state, retrieve it on restore and restore the counter's count. On 20.06.2018 12:10, Fabian Hueske wrote: Hi Dongwon, You are of course right! We need to decrement the counter when the window is close

Re: Passing records between two jobs

2018-06-20 Thread Fabian Hueske
Hi Avihai, Rafi pointed out the two common approaches to deal with this situation. Let me expand a bit on those. 1) Transactional producing in to queues: There are two approaches to accomplish exactly-once producing into queues, 1) using a system with transactional support such as Kafka or 2) mai

Re: # of active session windows of a streaming job

2018-06-20 Thread Fabian Hueske
Hi Dongwon, You are of course right! We need to decrement the counter when the window is closed. The idea of using Trigger.clear() (the clean up method is called clear() instead of onClose()) method is great! It will be called when the window is closed but also when it is merged. So, I think you

Re: Ordering of stream from different kafka partitions

2018-06-20 Thread Andrey Zagrebin
Hi, Good point, sorry for confusion, BoundedOutOfOrdernessTimestampExtractor of course does not buffer records, you need to apply windowing (e.g. TumblingEventTimeWindows) for that and then sort the window output by time and emit records in sorted order. You can also use windowAll which alread

Re: Memory Leak in ProcessingTimeSessionWindow

2018-06-20 Thread Stefan Richter
Hi, it is possible that the number of processing time timers can grow, because internal timers are scoped by time, key, and namespace (typically this means „window“, because each key can be part of multiple windows). So if the number of keys in your application is steadily growing this can happ

Re: Ordering of stream from different kafka partitions

2018-06-20 Thread sihua zhou
Hi, I think a global ordering is a bit impractical on production, but in theroy, you still can do that. You need to - Firstly fix the operate's parallelism to 1(except the source node). - If you want to sort the records within a bouned time, then you can keyBy() a constant and window it,

Re: Breakage in Flink CLI in 1.5.0

2018-06-20 Thread Till Rohrmann
It will, but it defaults to jobmanager.rpc.address if no rest.address has been specified. On Wed, Jun 20, 2018 at 9:49 AM Chesnay Schepler wrote: > Shouldn't the non-HA case be covered by rest.address? > > On 20.06.2018 09:40, Till Rohrmann wrote: > > Hi Sampath, > > it is no longer possible to

Re: Breakage in Flink CLI in 1.5.0

2018-06-20 Thread Chesnay Schepler
Shouldn't the non-HA case be covered by rest.address? On 20.06.2018 09:40, Till Rohrmann wrote: Hi Sampath, it is no longer possible to not start the rest server endpoint by setting rest.port to -1. If you do this, then the cluster won't start. The comment in the flink-conf.yaml holds only tr

Re: Breakage in Flink CLI in 1.5.0

2018-06-20 Thread Till Rohrmann
Hi Sampath, it is no longer possible to not start the rest server endpoint by setting rest.port to -1. If you do this, then the cluster won't start. The comment in the flink-conf.yaml holds only true for the legacy mode. In non-HA setups we need the jobmanager.rpc.address to derive the hostname o

Re: Passing records between two jobs

2018-06-20 Thread Rafi Aroch
Hi Avihai, The problem is that every message queuing sink only provides at-least-once > guarantee > >From what I see, possible messaging queue which guarantees exactly-once is Kafka 0.11, while using the Kafka transactional messaging

Re: Breakage in Flink CLI in 1.5.0

2018-06-20 Thread Chesnay Schepler
I was worried this might be the case. The rest.port handling was simply copied from the legacy web-server, which explicitly allowed shutting it down. It may (I'm not entirely sure) also not be necessary for all deployment modes; for example if the job is baked into the job/taskmanager images.

Re: Questions regarding to Flink 1.5.0 REST API change

2018-06-20 Thread Chesnay Schepler
I think you can set the target-directory to null. But I'm not sure why this particular request requires this, other request allow optional fields to simply be ommitted... On 20.06.2018 06:12, Siew Wai Yow wrote: Hi all, Seems pass in target-directory is a must now for checkpoints REST API,