Re: Kafka Monitoring

2016-11-07 Thread vinay patil
Hi Limbo, I can see the lag by using that URL, but the lag there does not show updated results; it does not change. Also, if you try to change the consumer group value, it will still show you the same value instead of saying the consumer group does not exist or a similar kind of error :) According to

Re: Kafka Monitoring

2016-11-07 Thread vinay patil
Hi Daniel, My Zookeeper instances are running and I have provided the hosts in the Kafka manager config files. Can you please elaborate on how to set it up with Zookeeper? I have also tried setting offset.storage to Zookeeper, but still I don't see the consumer getting listed in KafkaManager, also the k

Re: Kafka Monitoring

2016-11-07 Thread limbo
Hi, I have the same problem. I think the reason is that the Flink consumer uses the low-level API; when I type the group name in the manager URL I can get the lag of the Flink consumer, like this: http://your_manager_url/clusters//consumers//topic//type/ZK

Re: Kafka Monitoring

2016-11-07 Thread Daniel Santos
Hello, I have been using that setup. From my understanding, if one desires to see the offsets being consumed by Flink in KafkaManager, one has to set it up with Zookeeper. On 0.9 it will only serve as a view of progress. Basically what's mandatory on 0.8 is optional on 0.9, and for viewing
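For reference, a minimal sketch of a consumer set up this way with the 0.8 connector (broker/ZooKeeper addresses, topic and group name are placeholders; offsets are only committed to ZooKeeper when checkpointing or auto-commit is enabled):

    import java.util.Properties;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(5000); // offsets are committed to ZooKeeper on checkpoints

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "broker1:9092");   // placeholder broker list
    props.setProperty("zookeeper.connect", "zkhost:2181");    // where the 0.8 consumer keeps offsets
    props.setProperty("group.id", "my-flink-group");          // the group that should appear in KafkaManager

    DataStream<String> stream = env.addSource(
        new FlinkKafkaConsumer08<>("my-topic", new SimpleStringSchema(), props));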

Re: Memory on Aggr

2016-11-07 Thread Fabian Hueske
First of all, the document only proposes semantics for Flink's support of relational queries on streams. It does not describe the implementation and in fact most of it is not implemented. How the queries will be executed would depend on the definition of the table, i.e., whether the tables are der

Memory on Aggr

2016-11-07 Thread Alberto Ramón
>From "Relational Queries on Data Stream in Apache Flink" > Bounday Memory Requirements ( https://docs.google.com/document/d/1qVVt_16kdaZQ8RTfA_f4konQPW4tnl8THw6rzGUdaqU/edit# ) *SELECT user, page, COUNT(page) AS pCntFROM pageviews* *GROUP BY user, page* *-Versus-* *SELECT user, page, COUNT(p

Re: Csv to windows?

2016-11-07 Thread Yassine MARZOUGUI
Hi Felix, As I see in kddcup.newtestdata_small_unlabeled_index, the first field of connectionRecords (splits[0]) is unique for each record, therefore when a

Kafka Monitoring

2016-11-07 Thread Vinay Patil
Hi, I am monitoring Kafka using KafkaManager to check offset lag and other Kafka metrics; however, I am not able to see the consumers when I use FlinkKafkaConsumer, while for the console consumer they show up in the Consumers list. I have set the required parameters for the Kafka consumer while runni

Re: Csv to windows?

2016-11-07 Thread Felix Neutatz
Hi Till, the mapper solution makes sense :) Unfortunately, in my case it was not a typo in the path. I checked and saw that the records are read. You can find the whole program here: https://github.com/FelixNeutatz/CluStream/blob/master/flink-java-project/src/main/java/org/apache/flink/clustream

Re: Best way of doing some global initialization

2016-11-07 Thread Aljoscha Krettek
Hi, you would not be able to modify the three sources; you would basically have to reimplement a Flink Kafka source yourself that is at the same time an operator listening to this one element. Cheers, Aljoscha On Thu, 3 Nov 2016 at 15:10 Satish Chandra Gupta wrote: > Hi Aljoscha, > > Thanks

Re: window-like use case

2016-11-07 Thread Aljoscha Krettek
Thanks for the update! Let us know if you need anything. On Fri, 4 Nov 2016 at 20:43 Maciek Próchniak wrote: > Hi Aljoscha, > > I know it's tricky... > > Few weeks ago we decided to implement it without windows, using just > stateful operator and some queues/map per key as state - so yeah, we tr

Re: Cannot see all events in window apply() for big input

2016-11-07 Thread Till Rohrmann
And this other job also performs a window operation based on event time? What do you mean by “I have a doubt is the necessary parallelism for window operation if reprocessing a skew input from Kafka”? Also be aware that the windowAll operation is executed with a degree of parallelism (dop) of 1, making it effectively
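To make the parallelism point concrete, a rough sketch (window size, key field and the window function classes are placeholders):

    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // windowAll: a single, non-parallel window operator (parallelism 1)
    stream.windowAll(TumblingEventTimeWindows.of(Time.minutes(1)))
          .apply(new MyAllWindowFunction());

    // keyed window: one window per key, spread across the configured parallelism
    stream.keyBy("userId")
          .window(TumblingEventTimeWindows.of(Time.minutes(1)))
          .apply(new MyWindowFunction());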

Re: Last event of each window belongs to the next window - Wrong

2016-11-07 Thread Aljoscha Krettek
Hi, why are you keying by the source ID and not by the user ID? Cheers, Aljoscha On Mon, 7 Nov 2016 at 15:42 Till Rohrmann wrote: > Hi Samir, > > the windowing API in Flink works the following way: First an incoming > element is assigned to a window. This is defined in the window clause where >

Re: Cannot see all events in window apply() for big input

2016-11-07 Thread Sendoh
Hi Till. Thank you for the suggestion. We know the timestamp is correct because another Flink job runs with the three topics correctly. We also know the operators work well before window apply() because we check the result before window apply(). What I currently have a doubt about is the necessary pa

Re: Cannot see all events in window apply() for big input

2016-11-07 Thread Till Rohrmann
Hi Sendoh, from your description it's really hard to figure out what the problem could be. The first thing to do would be to check how many records you actually consume from Kafka and how many items are output. Next I would take a look at the timestamp extractor. Can it be that records are discard
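For illustration, a typical extractor with a bounded-lateness watermark looks roughly like this (MyEvent and its getter are made up; records arriving after the watermark has passed the end of their window are dropped by default):

    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class EventTimeExtractor extends BoundedOutOfOrdernessTimestampExtractor<MyEvent> {

        public EventTimeExtractor() {
            super(Time.seconds(10)); // allow events to be up to 10 seconds out of order
        }

        @Override
        public long extractTimestamp(MyEvent event) {
            return event.getTimestampMillis(); // hypothetical timestamp field in milliseconds
        }
    }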

Re: Stephan "Apache Flink to very Large State"

2016-11-07 Thread Alberto Ramón
:( thanks 2016-11-07 15:14 GMT+01:00 Till Rohrmann : > Hi Alberto, > > there were some technical problems with the recording. Therefore, the > publication is a little bit delayed. It should be added soon. > > Cheers, > Till > > On Sat, Nov 5, 2016 at 2:26 PM, Alberto Ramón > wrote: > >> Hello >

Re: Last-Event-Only Timer (Custom Trigger)

2016-11-07 Thread Till Rohrmann
Hi Julian, you can use the TriggerContext to register and unregister event time timers which fire when the given event time has been passed. That’s one way to implement what you’ve described. If you don’t want to use time windows you could also use session windows. Take a look at the EventTimeSess
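A rough sketch of such a trigger against the Flink 1.1-style Trigger API (class name, gap handling and state bookkeeping are my own simplification, not taken from the thread):

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.streaming.api.windowing.triggers.Trigger;
    import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
    import org.apache.flink.streaming.api.windowing.windows.Window;

    // Fires only when no further element has arrived for `gap` milliseconds of event time.
    public class LastEventTrigger<W extends Window> extends Trigger<Object, W> {

        private final long gap;
        private final ValueStateDescriptor<Long> timerState =
            new ValueStateDescriptor<>("lastTimer", Long.class, null);

        public LastEventTrigger(long gap) {
            this.gap = gap;
        }

        @Override
        public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception {
            ValueState<Long> lastTimer = ctx.getPartitionedState(timerState);
            if (lastTimer.value() != null) {
                ctx.deleteEventTimeTimer(lastTimer.value()); // unregister the previous element's timer
            }
            long newTimer = timestamp + gap;
            lastTimer.update(newTimer);
            ctx.registerEventTimeTimer(newTimer);
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onEventTime(long time, W window, TriggerContext ctx) {
            return TriggerResult.FIRE; // only the most recently registered timer is still alive
        }

        @Override
        public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) {
            return TriggerResult.CONTINUE;
        }

        @Override
        public void clear(W window, TriggerContext ctx) throws Exception {
            ValueState<Long> lastTimer = ctx.getPartitionedState(timerState);
            if (lastTimer.value() != null) {
                ctx.deleteEventTimeTimer(lastTimer.value());
            }
            lastTimer.clear();
        }
    }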

Re: Using Flink with Accumulo

2016-11-07 Thread Josh Elser
Oliver Swoboda wrote: Hi Josh, thank you for your quick answer! 2016-11-03 17:03 GMT+01:00 Josh Elser <els...@apache.org>: Hi Oliver, Cool stuff. I wish I knew more about Flink to make some better suggestions. Some points inline, and sorry in advance if I suggest somet

Cannot see all events in window apply() for big input

2016-11-07 Thread Sendoh
Hi Flink users, we have an issue seeing all events in the window apply() function, while we do see them before the window operation. The input is from Kafka and contains at least 6 topics, which amount to at least 30 GB in total, and we have tested locally in the IDE and on a cluster using 1.1.3 and 1.0.3. It works

Re: Datastream reset variable in every-time window in map function

2016-11-07 Thread Till Rohrmann
Hi Subash, if you do the outlier detection as part of your window function/reducer, then you don't have to store state in your operator or you could clear it every time the window function is called (indicating that the window has been finished). Cheers, Till On Sun, Nov 6, 2016 at 10:15 PM, sub
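A small sketch of that idea, with all accumulation kept in local variables of apply() (the value type, key type and outlier threshold are placeholders):

    import org.apache.flink.api.java.tuple.Tuple;
    import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    public class OutlierWindowFunction implements WindowFunction<Double, String, Tuple, TimeWindow> {

        @Override
        public void apply(Tuple key, TimeWindow window, Iterable<Double> values, Collector<String> out) {
            // All elements of the finished window arrive in this one call,
            // so nothing has to be reset between windows.
            double sum = 0;
            long count = 0;
            for (Double v : values) {
                sum += v;
                count++;
            }
            double mean = count > 0 ? sum / count : 0.0;
            for (Double v : values) {
                if (Math.abs(v - mean) > 3.0) { // hypothetical outlier threshold
                    out.collect("Outlier " + v + " in window " + window);
                }
            }
        }
    }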

Re: Csv to windows?

2016-11-07 Thread Till Rohrmann
Hi Felix, I'm not sure whether grouping/keyBy by processing time makes any sense semantically. This can be anything depending on the execution order. Therefore, there is no built-in mechanism to group by processing time. But you can always write a mapper which assigns the current processing time
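For example, a mapper along these lines, assuming the input is a DataStream<String> (the 1-second bucket size is arbitrary):

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;

    // Attach the current processing time, truncated to 1-second buckets, to every record
    DataStream<Tuple2<Long, String>> withTime = input.map(
        new MapFunction<String, Tuple2<Long, String>>() {
            @Override
            public Tuple2<Long, String> map(String value) {
                long bucket = System.currentTimeMillis() / 1000;
                return new Tuple2<>(bucket, value);
            }
        });

    // Group by the assigned processing-time bucket
    withTime.keyBy(0);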

Re: Last event of each window belongs to the next window - Wrong

2016-11-07 Thread Till Rohrmann
Hi Samir, the windowing API in Flink works the following way: First an incoming element is assigned to a window. This is defined in the window clause where you create a GlobalWindow. Thus, all elements for the same sourceId will be assigned to the same window. Next, the element is given to a Trigg
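To make the assignment/trigger chain concrete, a sketch of keying by the user id instead, as Aljoscha suggests (field name, count and trigger choice are placeholders, not taken from Samir's job):

    import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows;
    import org.apache.flink.streaming.api.windowing.triggers.CountTrigger;
    import org.apache.flink.streaming.api.windowing.triggers.PurgingTrigger;

    // Each user gets its own window; the trigger decides when it fires and is purged.
    stream
        .keyBy("userId")
        .window(GlobalWindows.create())
        .trigger(PurgingTrigger.of(CountTrigger.of(100))) // e.g. fire and purge every 100 elements per user
        .apply(new MyWindowFunction());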

Re: Using Flink with Accumulo

2016-11-07 Thread Oliver Swoboda
Hi Josh, thank you for your quick answer! 2016-11-03 17:03 GMT+01:00 Josh Elser : > Hi Oliver, > > Cool stuff. I wish I knew more about Flink to make some better > suggestions. Some points inline, and sorry in advance if I suggest > something outright wrong. Hopefully someone from the Flink side

Re: Stephan "Apache Flink to very Large State"

2016-11-07 Thread Till Rohrmann
Hi Alberto, there were some technical problems with the recording. Therefore, the publication is a little bit delayed. It should be added soon. Cheers, Till On Sat, Nov 5, 2016 at 2:26 PM, Alberto Ramón wrote: > Hello > > In FlinkForward 2016, There was a meet: > > http://flink-forward.org/kb_

Re: Parallelizing DataStream operations on Array elements

2016-11-07 Thread Till Rohrmann
In order to parallelize by voxel you have to do a keyBy(rowId) given that rowId is the same as voxel id. Glad to hear that you’ve resolved the problem :-) Cheers, Till ​ On Sat, Nov 5, 2016 at 2:47 AM, danielsuo wrote: > I was able to resolve my issue by collecting all the 'column' Arrays via

Re: Flink Time Window State

2016-11-07 Thread Ufuk Celebi
The goal is to do it before the end of this year. For this to happen, the first release candidate would need to be available by the end of November/beginning of December. There is an ongoing discussion here: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Schedule-and-Scope-fo

Re: Flink failed when can not connect to BlobServer

2016-11-07 Thread Stephan Ewen
Hi! The Blob server runs on the JobManager and is used to distribute JAR files. The best way to handle this scale is the following: Option (1) Use the 1.2-SNAPSHOT version to run Flink on YARN; it will add the JAR files to the job's YARN resources, so no BLOBs need to be fetched. Option (2)
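A sketch of option (1), submitting the job directly to YARN so the jar travels as a YARN resource (container count and jar path are placeholders):

    ./bin/flink run -m yarn-cluster -yn 4 path/to/your-job.jar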

Re: Flink Time Window State

2016-11-07 Thread Daniel Santos
Hi, Thank you Ufuk. Hmm, out of curiosity, is there any idea when 1.2 will be released? Best Regards, Daniel Santos On November 7, 2016 12:45:51 PM GMT+00:00, Ufuk Celebi wrote: >On 7 November 2016 at 13:06:16, Daniel Santos (dsan...@cryptolab.net) >wrote: >> I believe the job won't star

Re: Flink Time Window State

2016-11-07 Thread Ufuk Celebi
On 7 November 2016 at 13:06:16, Daniel Santos (dsan...@cryptolab.net) wrote: > I believe the job won't start from the last savepoint is that correct, > on versions ( > 1.2 ), it will start afresh ? Yes, with 1.2 you will be able to take a savepoint and then resume from that savepoint with differe

Re: automatically submit a job to a HA cluster

2016-11-07 Thread Ufuk Celebi
On 7 November 2016 at 13:00:30, Andrew Ge Wu (andrew.ge...@gmail.com) wrote: > Hi, > > We have a streaming job wants to submit to a HA cluster via jenkins. > Recently we had a downtime on one of the master node and we have it > restarted, it seems the > backup master became master and submitti

Re: Flink Time Window State

2016-11-07 Thread Daniel Santos
Hi, Thank you very much. I see, OK, it makes sense. I believe there is kind of a catch with parallelism. Say one does a savepoint and then changes the parallelism. I believe the job won't start from the last savepoint, is that correct? On versions (> 1.2), will it start afresh? Best Regard

automatically submit a job to a HA cluster

2016-11-07 Thread Andrew Ge Wu
Hi, We have a streaming job we want to submit to an HA cluster via Jenkins. Recently we had downtime on one of the master nodes and had it restarted; it seems the backup master became master, and submitting to the original master does not do anything. Currently we are using the command line to ca
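For reference, a client submitting to an HA cluster can discover the current leader through ZooKeeper rather than a fixed JobManager address, provided flink-conf.yaml on the client side carries the HA settings; a sketch with Flink 1.1-era keys (hosts and paths are placeholders):

    recovery.mode: zookeeper
    recovery.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
    recovery.zookeeper.path.root: /flink
    recovery.zookeeper.storageDir: hdfs:///flink/recovery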

Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs

2016-11-07 Thread Ufuk Celebi
On 4 November 2016 at 17:09:25, Josh (jof...@gmail.com) wrote: > Thanks, I didn't know about the -z flag! > > I haven't been able to get it to work though (using yarn-cluster, with a > zookeeper root configured to /flink in my flink-conf.yaml) > > I can see my job directory in ZK under > /fl

Last-Event-Only Timer (Custom Trigger)

2016-11-07 Thread Julian Bauß
Hello everybody, I'm currently trying to implement a Function that allows me to detect that a certain amount of time has passed after receiving the last element of a stream (in a given time window). For example, if nothing has happened for 6 hours within a given session, I want to do something (set a fl

Flink failed when can not connect to BlobServer

2016-11-07 Thread Si-li Liu
Hi all, I use the Flink DataSet API to do some batch jobs: read some logs, then group and sort them. Our cluster has almost 2000 servers; we are used to traditional MR jobs, so I tried Flink for some experimental jobs, but I encountered this error and cannot continue. Can anyone help with it? Our

Re: Flink stream job change and recovery

2016-11-07 Thread Fabian Hueske
Hi, savepoints are essentially checkpoints with some additional metadata. So changing the parallelism in 1.2 will work as follows: - take a savepoint (i.e., an explicitly triggered checkpoint) and stop the job - restart the application from the savepoint with new parallelism The application will
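As a sketch of that sequence with the CLI (job id, savepoint path, parallelism and jar path are placeholders):

    ./bin/flink savepoint <jobId>                                   # trigger a savepoint, prints its path
    ./bin/flink cancel <jobId>                                      # stop the job
    ./bin/flink run -s <savepointPath> -p 8 path/to/your-job.jar    # resume with the new parallelism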

AW: Re: Lot of RocksDB files in tmp Directory

2016-11-07 Thread Dominique Rondé
First of all, thanks for the explanation. That sounds reasonable. But I started the Flink routes 3 days ago and went out for the weekend. Since we are two people with access to Flink, I guess something strange is happening... Greets, Dominique. Sent from my Samsung device.