Re: Flink job hangs/deadlocks (possibly related to out of memory)

2018-07-17 Thread Piotr Nowojski
Hi, Thanks for the additional data. Just to make sure, are you using Flink 1.5.0? There are a couple of threads that seem to be looping in serialisation, while others are blocked and either waiting for new data or waiting for someone to consume some data. Could you debug or CPU profile the co

Re: Flink CLI properties with HA

2018-07-17 Thread Sampath Bhat
Hi vino, Should the Flink CLI have access to the path mentioned in *high-availability.storageDir*? If my Flink cluster is on a set of machines and I submit my job from the Flink CLI on another independent machine by giving the necessary details, will the CLI try to access the *high-availability.storageDir* path

Re: Why is flink master bump version to 1.7?

2018-07-17 Thread Timo Walther
Hi Tison, I guess this was a mistake that will be fixed soon. Till (in CC) forked off the release-1.6 branch yesterday? Regards, Timo Am 17.07.18 um 04:00 schrieb 陈梓立: Hi, I see no 1.6 branch or tag. What's the reason we skip 1.6 and now 1.7-SNAPSHOT? or there is a 1.6 I miss. Best, tiso

Can not get OutPutTag datastream from Windowing function

2018-07-17 Thread Soheil Pourbafrani
Hi, according to the documents I tried to get late data using side output. final OutputTag> lateOutputTag = new OutputTag>("late-data"){}; DataStream> res = aggregatedTuple .assignTimestampsAndWatermarks(new Bound()) }).keyBy(1).timeWindow(Time.milliseconds(160))/*.countWi

Re: Can not get OutPutTag datastream from Windowing function

2018-07-17 Thread Dawid Wysakowicz
Hi Soheil, The /getSideOutput/ method is part of /SingleOutputStreamOperator/, which extends /DataStream/. Try using /SingleOutputStreamOperator/ as the type for your res variable. Best, Dawid On 17/07/18 09:36, Soheil Pourbafrani wrote: > Hi, according to the documents I tried to get late da

Re: Can not get OutPutTag datastream from Windowing function

2018-07-17 Thread Xingcan Cui
Hi Soheil, The `getSideOutput()` method is defined on the operator instead of the datastream. You can invoke it after any action (e.g., map, window) performed on a datastream. Best, Xingcan > On Jul 17, 2018, at 3:36 PM, Soheil Pourbafrani wrote: > > Hi, according to the documents I tried to
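The routing that sideOutputLateData/getSideOutput expose can be pictured with a plain-Python sketch (not the Flink API; function and tag names here are illustrative): events behind the watermark go to a "late-data" side output, the rest to the main output.

```python
# Conceptual sketch of side-output routing of late data. In Flink this is
# done by the windowed operator itself; getSideOutput() on the resulting
# SingleOutputStreamOperator then retrieves the tagged late stream.
def route_events(events, watermark):
    """Split (timestamp, value) events into (main, late) lists."""
    main, late = [], []
    for ts, value in events:
        (late if ts < watermark else main).append((ts, value))
    return main, late

main, late = route_events([(100, "a"), (40, "b"), (120, "c")], watermark=60)
# main keeps the on-time events, late collects the "late-data" side output
```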

Re: Parallelism and keyed streams

2018-07-17 Thread Nicholas Walton
Martin, To clarify things, the code causing the issue is here, nothing clever. The code fails at the line in bold. The Long index values are set earlier in the sequence 1,2,3,4,5,6,7… val scaledReadings : DataStream[(Int,Long, Double, Double)] = maxChannelReading .keyBy(0) .map { in =

Re: Flink WindowedStream - Need assistance

2018-07-17 Thread Titus Rakkesh
Friends, any assistance regarding this? On Mon, Jul 16, 2018 at 3:34 PM, Titus Rakkesh wrote: > We have 2 independent streams which will receive elements in different > frequency, > > DataStream> splittedActivationTuple; > > DataStream> unionReloadsStream; > > We have a requirement to keep "spli

Global latency metrics

2018-07-17 Thread shimin yang
Hi All, Is there a method to get the global latency directly? Since I only find the latency for each operator in the Flink Rest API. Best, Shimin

Re: Why is flink master bump version to 1.7?

2018-07-17 Thread Chesnay Schepler
The release-1.6 branch exists (https://git-wip-us.apache.org/repos/asf?p=flink.git;a=shortlog;h=refs/heads/release-1.6), but wasn't synced to GitHub yet. On 17.07.2018 09:33, Timo Walther wrote: Hi Tison, I guess this was a mistake that will be fixed soon. Till (in CC) forked off the release-

Re: Global latency metrics

2018-07-17 Thread Chesnay Schepler
No, you can only get the latency for each operator. For starters, how would a global latency even account for multiple sources/sinks? On 17.07.2018 10:22, shimin yang wrote: Hi All, Is there a method to get the global latency directly? Since I only find the latency for each operator in the F
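Since only per-operator latencies are exposed, one rough workaround is to sum the per-operator latency samples along a single source-to-sink path. A minimal sketch, assuming you have already fetched mean latencies per operator (operator names and values below are made up, not a real Flink REST response):

```python
# Sketch: approximate end-to-end latency for ONE source->sink path by summing
# per-operator mean latencies. This ignores network/queueing between operators
# and only works per path; with multiple sources/sinks each path differs.
def path_latency(per_operator_ms, path):
    """Sum the mean latencies (ms) of the operators on a given path."""
    return sum(per_operator_ms[op] for op in path)

metrics = {"source": 2.0, "map": 1.5, "window": 30.0, "sink": 4.0}
total = path_latency(metrics, ["source", "map", "window", "sink"])
```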

Re: Flink CLI properties with HA

2018-07-17 Thread vino yang
Hi Sampath, It seems the Flink CLI for standalone mode would not access *high-availability.storageDir*. What's the exception stack trace in your environment? Thanks, vino. 2018-07-17 15:08 GMT+08:00 Sampath Bhat : > Hi vino > > Should the flink CLI have access to the path mentioned in > *high-availabil

Re: Why is flink master bump version to 1.7?

2018-07-17 Thread Till Rohrmann
Yes, pulling from https://git-wip-us.apache.org/repos/asf/flink.git should show you the release-1.6 branch. Cheers, Till On Tue, Jul 17, 2018 at 10:37 AM Chesnay Schepler wrote: > The release-1.6 branch exists ( > https://git-wip-us.apache.org/repos/asf?p=flink.git;a=shortlog;h=refs/heads/relea

Re: Flink CLI properties with HA

2018-07-17 Thread Till Rohrmann
Hi Sampath, technically the client does not need to know the `high-availability.storageDir` to submit a job. However, due to how we construct the ZooKeeperHaServices it is still needed. The reason behind this is that we use the same services for the server and the client. Thus, the implementation

Re: rowTime from json nested timestamp field in SQL-Client

2018-07-17 Thread Ashwin Sinha
Hi Timo, We want to add this functionality in a forked branch. Can you guide us with steps to quickly apply a patch/fix for the same? On Mon, Jul 16, 2018 at 9:06 PM Ashwin Sinha wrote: > Thanks Timo for the clarification, but our processing actually involves > aggregations on huge past data al

Re: rowTime from json nested timestamp field in SQL-Client

2018-07-17 Thread Timo Walther
Hi Ashwin, if you quickly want to make this work you can look into org.apache.flink.table.descriptors.RowtimeValidator#getRowtimeComponents. This is the component that converts the string property into a org.apache.flink.table.sources.tsextractors.TimestampExtractor. You can implement your c

Object reuse in DataStreams

2018-07-17 Thread Urs Schoenenberger
Hi all, we came across some interesting behaviour today. We enabled object reuse on a streaming job that looks like this: stream = env.addSource(source) stream.map(mapFnA).addSink(sinkA) stream.map(mapFnB).addSink(sinkB) Operator chaining is enabled, so the optimizer fuses all operations into a
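The hazard this job shape runs into can be shown with a plain-Python stand-in (not Flink; function names mirror the snippet's mapFnA/mapFnB): with object reuse and chaining, both branches may receive the *same* record instance, so a mutation in one map is visible to the other.

```python
# Sketch of the object-reuse hazard: one record instance is fanned out to two
# chained consumers instead of being copied for each branch.
def map_fn_a(record):
    record["value"] *= 2      # mutates the shared record in place
    return record

def map_fn_b(record):
    return record["value"]    # may observe mapFnA's mutation

record = {"value": 10}        # the single reused instance from the source
a_out = map_fn_a(record)
b_out = map_fn_b(record)      # sees 20, not the original 10
```

With object reuse disabled (or with defensive copies), map_fn_b would see the original value instead.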

Re: StateMigrationException when switching from TypeInformation.of to createTypeInformation

2018-07-17 Thread Till Rohrmann
Hi Elias, I think introducing a new state and deprecating the old one is currently the only way to solve this problem. The community is currently working on supporting state evolution [1]. With this feature it should be possible to change serializers between two savepoints. Unfortunately, the

Serialization questions

2018-07-17 Thread Flavio Pompermaier
Hi to all, I was trying to check whether our jobs are properly typed or not. I've started disabling generic types[1] in order to discover untyped transformations and so I added the proper returns() to operators. Unfortunately there are jobs where we serialize Thrift and DateTime objects, so I need

[ANNOUNCE] Weekly community update #29

2018-07-17 Thread Till Rohrmann
Dear community, this is the weekly community update thread #29. Please post any news and updates you want to share with the community to this thread. # Feature freeze Flink 1.6 The Flink community has cut off the release branch for Flink 1.6 [1]. From now on, the community will concentrate on fi

Re: Why is flink master bump version to 1.7?

2018-07-17 Thread 陈梓立
Yes I can see it now. Thank you all! Till Rohrmann wrote on Tue, Jul 17, 2018 at 7:53 PM: > Yes, pulling from https://git-wip-us.apache.org/repos/asf/flink.git > should show you the release-1.6 branch. > > Cheers, > Till > > On Tue, Jul 17, 2018 at 10:37 AM Chesnay Schepler > wrote: > >> The release-1.6 branc

RequiredParameters in Flink 1.5.1

2018-07-17 Thread Flavio Pompermaier
Hi to all, I'm trying to migrate a job from Flink 1.3.1 to 1.5.1 but it seems that RequiredParameters and ParameterTool work differently from before... My code is the following: ParameterTool parameters = ParameterTool.fromArgs(args); RequiredParameters required = new RequiredParameters(); requi
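The check this setup is meant to perform can be sketched in plain Python (a stand-in for ParameterTool/RequiredParameters, not the Flink API; the key names below are made up): fail fast if a required key is missing from the parsed arguments.

```python
# Sketch of a fail-fast required-parameter check: validate all required keys
# before the job is built, instead of failing later at first access.
def check_required(params, required):
    missing = [k for k in required if k not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    return params

ok = check_required({"input": "/tmp/in", "output": "/tmp/out"},
                    ["input", "output"])
```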

FlinkCEP and scientific papers ?

2018-07-17 Thread Esa Heikkinen
Hi, I don't know if this is the correct forum to ask, but do there exist some good scientific papers about FlinkCEP (Complex Event Processing)? I know Flink is based on Stratosphere, but what is FlinkCEP based on? BR Esa

RE: AvroInputFormat NullPointerException issues

2018-07-17 Thread Porritt, James
My MyAvroSchema class is as follows. It was generated using avro-tools: /** * Autogenerated by Avro * * DO NOT EDIT DIRECTLY */ import org.apache.avro.specific.SpecificData; import org.apache.avro.message.BinaryMessageEncoder; import org.apache.avro.message.BinaryMessageDecoder; import org.apache

clear method on Window Trigger

2018-07-17 Thread Soheil Pourbafrani
Hi, Can someone elaborate on when the clear method of class Trigger will be called and what its duty is? Also, I don't know what the benefit of FIRE_AND_PURGE over FIRE is, and its use case. For example, in a scenario where we have a count-of-3 window that also will trigger after a timeo

Re: Flink job hangs/deadlocks (possibly related to out of memory)

2018-07-17 Thread Zhijiang(wangzhijiang999)
Hi Gerard, From the jstack you provided, the task is serializing the output record, and during this process it will not process any more input data. This stack does not indicate an out-of-memory issue. And if the output buffer is exhausted, the task will be blocked on requestBufferBlocki

RE: AvroInputFormat NullPointerException issues

2018-07-17 Thread Porritt, James
I got to the bottom of this – it was a namespace issue. My schema was; { "type" : "record", "name" : "MyAvroSchema", "fields" : [ { "name" : "a", "type" : [ "null", "int" ] }, { "name" : "b", "type" : [ "null", "string" ] }] } But actually, I was putting the generated MyAv
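For reference, a schema with an explicit namespace matching the generated class's package might look like this (the namespace value here is hypothetical; it must match wherever avro-tools generated MyAvroSchema):

```json
{
  "type" : "record",
  "name" : "MyAvroSchema",
  "namespace" : "com.example.avro",
  "fields" : [
    { "name" : "a", "type" : [ "null", "int" ] },
    { "name" : "b", "type" : [ "null", "string" ] }
  ]
}
```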

Re: AvroInputFormat NullPointerException issues

2018-07-17 Thread vino yang
Hi Porritt, OK, it looks good. Thanks, vino. 2018-07-17 23:13 GMT+08:00 Porritt, James : > I got to the bottom of this – it was a namespace issue. My schema was; > > { > > "type" : "record", > > "name" : "MyAvroSchema", > > "fields" : [ { > > "name" : "a", > > "type" : [ "null", "

Re: flink javax.xml.parser Error

2018-07-17 Thread antonio saldivar
If somebody is facing this issue: I solved it by adding the exclusion to my POM.xml, and I am also using javax.xml. org.apache.flink flink-core 1.4.2 xml-apis xml-apis javax.xml jaxb-api 2.1 On Mon., 16 Ju
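The coordinates in the mail (flink-core 1.4.2 excluding xml-apis:xml-apis, plus javax.xml:jaxb-api 2.1) suggest a POM fragment along these lines; this is a reconstruction, so verify the groupIds against your own build:

```xml
<!-- Sketch reconstructed from the mail text, not a verified pom.xml -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-core</artifactId>
  <version>1.4.2</version>
  <exclusions>
    <exclusion>
      <groupId>xml-apis</groupId>
      <artifactId>xml-apis</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>javax.xml</groupId>
  <artifactId>jaxb-api</artifactId>
  <version>2.1</version>
</dependency>
```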

[ANNOUNCE] Program for Flink Forward Berlin 2018 has been announced

2018-07-17 Thread Fabian Hueske
Hi everyone, I'd like to announce the program for Flink Forward Berlin 2018. The program committee [1] assembled a program of about 50 talks on use cases, operations, ecosystem, tech deep dive, and research topics. The conference will host speakers from Airbnb, Amazon, Google, ING, Lyft, Microsof

Re: FlinkCEP and scientific papers ?

2018-07-17 Thread vino yang
Hi Esa, AFAIK, the earlier Flink CEP refers to the paper 《Efficient Pattern Matching over Event Streams》[1]. Flink absorbed two major ideas from this paper: 1. the NFA-b model on event streams 2. a shared versioned match buffer, which is an optimized data structure To Till and Chesnay: Did I miss an
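A toy sketch of NFA-style pattern matching in the spirit of that paper, greatly simplified (plain Python; no shared versioned match buffer, no windows): find every "A followed by B" match in an event stream.

```python
# Minimal "A followed by B" matcher: partial matches (pending A's) are kept
# and completed whenever a B arrives, yielding all (a, b) pairs.
def match_a_then_b(events):
    matches, pending_a = [], []
    for e in events:
        if e.startswith("a"):
            pending_a.append(e)       # open a new partial match
        elif e.startswith("b"):
            matches.extend((a, e) for a in pending_a)  # complete all partials
    return matches

found = match_a_then_b(["a1", "c", "a2", "b1"])
```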

Re: clear method on Window Trigger

2018-07-17 Thread vino yang
Hi Soheil, Did you read the documentation about Flink Window/Trigger [1]? FIRE_AND_PURGE is usually used to implement the count window. Flink provides a PurgingTrigger as a wrapper for other triggers to make them purging. One use case of this class is the count window [2][3]. About your exam

Parallel stream partitions

2018-07-17 Thread Nicholas Walton
Suppose I have a data stream of tuples with the sequence of ticks being 1,2,3,… for each separate k. I understand that keyBy(2) will partition the stream so each partition has the same key in each tuple. I now have a sequence of functions to apply to the streams, say f(), g() and h(), in that ord

Re: clear method on Window Trigger

2018-07-17 Thread Hequn Cheng
Hi Soheil, The clear() method performs any action needed upon removal of the corresponding window. It is called when a window is purged. The difference between FIRE and FIRE_AND_PURGE is that FIRE only triggers the computation while FIRE_AND_PURGE triggers the computation and clears the elements in the
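The "count of 3 with timeout" scenario from the question can be sketched as plain-Python decision logic (a stand-in for Flink's Trigger API, not the real interface): FIRE_AND_PURGE when either the count is reached or the timer expires, so the window's elements are also dropped.

```python
# Sketch of a count-with-timeout trigger's decisions. In Flink these would be
# TriggerResult values returned from onElement()/onEventTime().
def on_element(count, max_count):
    """Decision after adding one element to the window."""
    return "FIRE_AND_PURGE" if count >= max_count else "CONTINUE"

def on_timer():
    """Decision when the registered timeout fires (partial window)."""
    return "FIRE_AND_PURGE"

decisions = [on_element(c, max_count=3) for c in (1, 2, 3)]
```

Using plain FIRE instead would emit the same elements again on the next firing, which is why purging matters for count windows.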

Keeping only latest row by key?

2018-07-17 Thread Porritt, James
In Spark if I want to be able to get a set of unique rows by id, using the criteria of keeping the row with the latest timestamp, I would do the following: .withColumn("rn", F.row_number().over( Window.partitionBy
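The same dedup-by-latest-timestamp logic, independent of Spark or Flink, can be sketched in a few lines of Python (in Flink one would typically hold this in keyed state or use a last-value-style aggregation; the row shape below is made up):

```python
# Keep, per id, only the row with the latest timestamp.
def latest_by_key(rows):
    """rows: iterable of (id, timestamp, payload); returns {id: (ts, payload)}."""
    latest = {}
    for key, ts, payload in rows:
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, payload)
    return latest

result = latest_by_key([("x", 1, "old"), ("y", 5, "only"), ("x", 9, "new")])
```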

Re: Parallel stream partitions

2018-07-17 Thread Ken Krugler
Hi Nick, > On Jul 17, 2018, at 9:09 AM, Nicholas Walton > wrote: > > Suppose I have a data stream of tuples > with the sequence of ticks being 1,2,3,…. for each separate k. > > I understand and keyBy(2) I think you want keyBy(1), since it’s 0-based. > will partition t
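The guarantee behind keyBy can be sketched in plain Python (illustrative only; Flink's actual scheme hashes keys into key groups, not a bare modulo): records with equal key always land in the same partition, so the per-key tick order 1,2,3,… is preserved through f(), g() and h() as long as the key does not change between them.

```python
# All records with the same key map to one partition, so their relative
# order survives each keyed stage of the pipeline.
def partition_for(key, parallelism):
    return hash(key) % parallelism

records = [("k1", 1), ("k2", 1), ("k1", 2), ("k1", 3)]
parts = [partition_for(k, 4) for k, _ in records]
same = parts[0] == parts[2] == parts[3]  # all "k1" ticks share one partition
```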

FlinkKafkaConsumer configuration to consume from Multiple Kafka Topics

2018-07-17 Thread sagar loke
Hi, We have a use case where we are consuming from 100s of Kafka topics. Each topic has a different number of partitions. As per the documentation, to parallelize a Kafka topic, we need to use setParallelism() == number of Kafka partitions for a topic. But if we are consuming multiple to
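One way to picture why the source parallelism need not equal any single topic's partition count: all (topic, partition) pairs are spread over the source's subtasks, e.g. round-robin. This is an illustrative distribution in plain Python, not Flink's exact assignment algorithm:

```python
# Spread (topic, partition) pairs over N parallel source subtasks; each
# subtask then reads several partitions, possibly from different topics.
def assign(topic_partitions, parallelism):
    """topic_partitions: list of (topic, partition) pairs."""
    subtasks = {i: [] for i in range(parallelism)}
    for i, tp in enumerate(sorted(topic_partitions)):
        subtasks[i % parallelism].append(tp)
    return subtasks

tps = [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("b", 1)]
assignment = assign(tps, parallelism=2)
```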

Re: FlinkCEP and scientific papers ?

2018-07-17 Thread Till Rohrmann
You are right Vino, the initial implementation was based on the above mentioned paper. Cheers, Till On Tue, Jul 17, 2018 at 5:34 PM vino yang wrote: > Hi Esa, > > AFAIK, the earlier Flink CEP refers to the Paper 《Efficient Pattern > Matching over Event Streams》[1]. Flink absorbed two major id

Re: Cannot configure akka.ask.timeout

2018-07-17 Thread Lukas Kircher
Hello, does anybody have an idea what is going on? I have not yet found a solution. Am I doing something wrong? Or is the 'akka.ask.timeout' parameter not related to the exception stated below? Could somebody please take a look at this? More details can be found in the message prior to this.