Re: Taskmanagers are quarantined

2017-11-27 Thread T Obi
Hello Chesnay, Thank you for answer to my rough question. Not all of taskmanagers are quarantined at a time, but each taskmanager has been quarantined at least once. We are using CDH 5.8 based on hadoop 2.6. We didn't give attention about datanodes. We will check it. However, we are also using t

Non-intrusive way to detect which type is using kryo ?

2017-11-27 Thread Kien Truong
Hi, Are there any way to only log when Kryo serializer is used? It's a pain to disable generic type then try to solve the exception one by one. Best regards, Kien

Flink 1.2.0->1.3.2 TaskManager reporting to JobManager

2017-11-27 Thread Chan, Regina
Hi, As I moved from Flink 1.2.0 to 1.3.2 I noticed that the TaskManager may have all tasks with FINISHED but then take about 2-3 minutes before the Job execution switches to FINISHED. What is it doing that's taking this long? This was a parallelism = 1 case... Regina Chan Goldman Sachs - Enter

Re: How to perform efficient DataSet reuse between iterations

2017-11-27 Thread Fabian Hueske
Hi Miguel, I'm sorry but AFAIK, the situation has not changed. Is it possible that you are calling execute() multiple times? In that case, the 1-st and 2-nd graph would be recomputed before the 3-rd graph is computed. That would explain the increasing execution time of 15 seconds. Best, Fabian

Re: Flink stress testing and metrics

2017-11-27 Thread Ladhari Sadok
It is working with FLINK-7608 , but just to know : how to implement it with slf4jReporter ? I didn't find an example ! 2017-11-27 17:33 GMT+01:00 Chesnay Schepler : > The most reliable way to see the latency metric is configure a metric > reporter >

Re: S3 Access in eu-central-1

2017-11-27 Thread Stephan Ewen
Hi! The endpoint config entry looks correct. I was looking at this issue to see if there are pointers to anything else, but it looks like the explicit endpoint entry is the most important thing: https://issues.apache.org/jira/browse/HADOOP-13324 I cc-ed Steve Loughran, who is Hadoop's S3 expert (

Re: Best way to wait for different events

2017-11-27 Thread Lothium
Thanks for you response! Yes, I think to 99.9% there shouldn't be a "late event" and I would also implement a logic in the ProcessFunction, which checks for a specific order of the events per transaction id. Using the clear() function for the state should free the ressources and using that many

Re: Apache Flink - Question about TriggerResult.FIRE

2017-11-27 Thread M Singh
Thanks Aljoscha. On Sunday, November 26, 2017 11:21 PM, Aljoscha Krettek wrote: Hi, Yes, after the watermark (or processing-time) pass "end-of-window + allowed-lateness" everything that is stored for a window is deleted. Best,Aljoscha On 25. Nov 2017, at 18:07, M Singh wrote: Hi: An

Re: Apache Flink - Non equi joins

2017-11-27 Thread M Singh
Thanks Fabian for your detailed response. Mans On Monday, November 27, 2017 1:47 AM, Fabian Hueske wrote: Hi Mans, no, non-equi joins are not supported by the relational APIs because they can be prohibitively expensive to compute.There's one exception. Cross joins where one of the in

Re: Flink stress testing and metrics

2017-11-27 Thread Chesnay Schepler
The most reliable way to see the latency metric is configure a metric reporter . However, only some reporters can properly work with the latency metric (about to change with FLINK-7608 though!). The

Re: Taskmanagers are quarantined

2017-11-27 Thread Chesnay Schepler
Are only some taskmanagers quarantined, or all of them? Do the quarantined taskmanagers have anything in common? (are the failing ones always on certain machines; do the stacktraces reference the same hdfs datanodes) Which hadoop version are you using? From the stack-trace it appears that mul

Re: Flink stress testing and metrics

2017-11-27 Thread Ladhari Sadok
Thanks Aljoscha, as I see it is not fixed yet ( In Progress ) can you give me another solution to visualize the latency or exporting them to a file , ... I want to get the latency in any way: file, graph, ... just to get idea of the latency. Regards. 2017-11-27 13:17 GMT+01:00 Aljoscha Krettek :

Re: Call for responses: Apache Flink user survey 2017

2017-11-27 Thread Till Rohrmann
Hi everyone, As a reminder, the 2017 Apache Flink User Survey will be open for responses through the end of the day today, Monday November 27. Thank you to everyone who's participated already. Cheers, Till On Tue, Nov 7, 2017 at 4:04 PM, Till Rohrmann wrote: > Hi everyone, > > data Artisans is

Re: external checkpoints

2017-11-27 Thread Aljoscha Krettek
Hi, I think we might have a slight misunderstanding here. In most situations you do in fact have to configure two directories: - the directory where the metadata for externalised checkpoints is stored, this would be "state.checkpoints.dir" - the directory where the actual checkpoint data shoul

Re: Flink stress testing and metrics

2017-11-27 Thread Aljoscha Krettek
Hi, This is a known issue: the latency metrics are reported in a format that the web dashboard does not understand. This is the Jira issue for fixing it: https://issues.apache.org/jira/browse/FLINK-7608 Best, Aljoscha > On 27. Nov 2017, at 0

Re: Best way to wait for different events

2017-11-27 Thread Chesnay Schepler
Your approach with ProcessFunction should work in general. Can you guarantee that no event can arrive for a transaction for which an aggregated event was already emitted? On 26.11.2017 18:22, Lothium wrote: Hi, I have a question to a specific use case and hope that you can help me with that.

Re: Status of Kafka011JsonTableSink for 1.4.0 release?

2017-11-27 Thread Fabian Hueske
Hi George, Flink 1.4 will not include a KafkaTableSink for Kafka 0.11 but a DataStream API SinkFunction (KafkaProducer). As an alternative to usingthe Kafka010TableSink, you can also convert the result Table into a DataStream and use the KafkaProducer for Kafka 0.11 to emit the DataStream. We wil

Re: Apache Flink - Non equi joins

2017-11-27 Thread Fabian Hueske
Hi Mans, no, non-equi joins are not supported by the relational APIs because they can be prohibitively expensive to compute. There's one exception. Cross joins where one of the input tables is guaranteed to have a single row (because it is the result of a non-grouped aggregation) are supported in

Re: Creating flink byte[] deserialiser

2017-11-27 Thread Tzu-Li (Gordon) Tai
Hi Soheil, AFAIK, there is no built-in byte array deserializer in Flink. However, it is very simple to implement one. You can do that by implementing the `DeserializationSchema` interface, and for the implementation of the `deserialize` method, simply return the fetched bytes from Kafka as the

Re: Flink stress testing and metrics

2017-11-27 Thread Ladhari Sadok
Thanks Timo for your answer. Can any one else confirm the bug ? 2017-11-23 9:26 GMT+01:00 Timo Walther : > Yes, I agree that this looks like a bug. You can open an issue about that. > Maybe with a small reproduceble example to give others the chance to fix it. > > > Am 11/22/17 um 10:18 PM schrie

Re: S3 Access in eu-central-1

2017-11-27 Thread Dominik Bruhn
Hey, can anyone give a hint? Does anyone have flink running with an S3 Bucket in Frankfurt/eu-central-1 and can share his config and setup? Thanks, Dominik > On 22. Nov 2017, at 17:52, domi...@dbruhn.de wrote: > > Hey everyone, > I'm trying since hours to get Flink 1.3.2 (downloaded for hadoop

Creating flink byte[] deserialiser

2017-11-27 Thread Soheil Pourbafrani
Hi, I want to read(consume) data from Kafka as byte array just something like Kafka byte array deserializer. In Flink I just find SimpleStringSchema and it is note suitable for my data. Is any built-in byte array deserializer in Flink or if not how can I create a simple byte array deserializer?

Re: Dataset read csv file problem

2017-11-27 Thread ebru
Thank you Fabian, we’ve implemented a custom CsvInputFormat. > On 24 Nov 2017, at 15:35, Fabian Hueske wrote: > > Hi Ebru, > > this case is not supported by Flink's CsvInputFormat. The problem is that > such a file could not be read in parallel because it is not possible to > identify record