Re: Kinesis connector SHARD_GETRECORDS_MAX default value

2017-06-22 Thread Tzu-Li (Gordon) Tai
Hi Steffen, Thanks for bringing up the discussion! I think the reason why SHARD_GETRECORDS_INTERVAL_MILLIS was defaulted to 0 in the first place was because we didn’t want false impressions that the there was some latency introduced in Flink with the default settings. To this end, I’m leaning t

Re: Quick Question...

2017-06-22 Thread Steve Jerman
Thx for the quick answer Get Outlook for iOS From: Chesnay Schepler Sent: Thursday, June 22, 2017 12:13:23 PM To: user@flink.apache.org Subject: Re: Quick Question... Hello, in the DataSet API you can do this when specifying your transform

Re: Recursive Traversal of the Input Path Directory, Not working

2017-06-22 Thread Adarsh Jain
Hi Stefan, Yes your understood right, when I give full path till the filename it works fine however when I give path till directory it does not read the data, doesn't print any exceptions too ... I am also not sure why it is behaving like this. Should be easily replicable, in case you can try. Wi

Re: Quick Question...

2017-06-22 Thread Chesnay Schepler
Hello, in the DataSet API you can do this when specifying your transformations, something along the lines of dataset.map(..).withConfiguration. In the DataStream API you cannot set the Configuration at all. Note that in both APIs you can also just pass the Configuration into the constructor

Quick Question...

2017-06-22 Thread Steve Jerman
Hi, I have a quick question… How do I set the Configuration passed into RichFunction.open? I *thought* that setting GlobalJobParameters would do it ... env.getConfig().setGlobalJobParameters(jobParameters); But it seems not… Steve

Different Window Sizes in keyed stream

2017-06-22 Thread Ahmad Hassan
Hi All, I want to know if flink allows to define sliding window size and slide time on the fly. For example I want to configure sliding window of size 2 min and slide 1 min for tenant A but size 10 min and slide min for tenant B in a keyed stream and so on for other tenants. My code is below.

Re: Keyed windows with single sink

2017-06-22 Thread Ahmad Hassan
Thanks Fabian and Stefan for all the help. Best Regards, > On 22 Jun 2017, at 18:06, Fabian Hueske wrote: > > 1].

Re: Keyed windows with single sink

2017-06-22 Thread Fabian Hueske
You have to register an event-time timer in the `processElement()` method. You'll get a callback to `onTimer()` when the function receives a watermark that is greater than the registered timer. So you can always register a timer for the end time of the next window to get a call back to `onTimer()`

Re: Error "key group must belong to the backend" on restore

2017-06-22 Thread Gyula Fóra
Thanks Stefan for the tip, in this case I have a Long key so it's unlikely that the hash code has changed. And as I mentioned I have several jobs with the same exact topolgy which run just fine after migration. It is super weird... Maybe I am blind to some stupid error, so I'll keep looking. Gyul

Re: Keyed windows with single sink

2017-06-22 Thread Ahmad Hassan
Hi Fabian, How the process function will be called at 12:00:01 as there are no windows output or events after 12:00:00. Thanks > On 22 Jun 2017, at 17:07, Fabian Hueske wrote: > > Let's say window A and window B end at 12:00:00 and window C at 13:00:00. > When the ProcessFunction receives a

Re: Error "key group must belong to the backend" on restore

2017-06-22 Thread Stefan Richter
Hi, I have seen the first exception in cases where the key had no proper and stable hash code method, e.g. when the key was an array. What the first exception basically means is that the backend received a key, which it does not expect because determined by the hash the key belongs to a key gro

Re: Keyed windows with single sink

2017-06-22 Thread Fabian Hueske
Let's say window A and window B end at 12:00:00 and window C at 13:00:00. When the ProcessFunction receives a watermark at 12:00:01, it knows that Window A and B have been finished. When it receives a watermark of 13:00:01 it knows that all results of window C have been received. If there were no r

Re: coGroup exception or something else in Gelly job

2017-06-22 Thread Kaepke, Marc
Hi Greg if you have an idea, I'm still interested. In case you didn't, please give me a feedback too. Best, Marc Sent from my iPhone On 15. Jun 2017, at 15:19, Kaepke, Marc mailto:marc.kae...@haw-hamburg.de>> wrote: Hi Greg, I wanna ask if there was any news about the implementation or oppo

Error "key group must belong to the backend" on restore

2017-06-22 Thread Gyula Fóra
Hi all! I am wondering if anyone has any practical idea why I might get this error when migrating a job from 1.2.1 to 1.3.0? Idea on debugging might help as well. I have several almost exactly similar jobs (minor config differences) and all of them succeed except for this single job. I have see

Re: Keyed windows with single sink

2017-06-22 Thread Ahmad Hassan
Thanks for the answers. My scenario is: | Window A | | Window B | | Window C | If no events are received for Window C, then how process function would know that both window 'A' and window 'B' have finished and need to aggregated their result before sink is called? Thanks On

Re: Performance Improvement on Flink 1.2.0

2017-06-22 Thread Greg Hogan
Some documentation on application profiling with Flink 1.3 (can be manually inserted into the scripts for Flink 1.2): https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/application_profiling.html > On Jun 22, 2017, at 9:24 AM, Stefan Richter > wrote: > > Hi, > > the an

Re: Recursive Traversal of the Input Path Directory, Not working

2017-06-22 Thread Stefan Richter
Hi, I am not sure I am getting the problem right: the code works if you use a file name, but it does not work for directories? What exactly is not working? Do you get any exceptions? Best, Stefan > Am 22.06.2017 um 17:01 schrieb Adarsh Jain : > > Hi, > > I am trying to use "Recursive Travers

Re: Keyed windows with single sink

2017-06-22 Thread Fabian Hueske
Hi Ahmad, Flink's watermark mechanism guarantees that when you receive a watermark for time t all records with a timestamp smaller than t have been received as well. Records emitted from a window have the timestamp of their end time. So, the ProcessFunction receives a timestamp for 12:00:00 you ca

Re: Problem with ProcessFunction timeout feature

2017-06-22 Thread Stefan Richter
Hi, if I understand correctly, your problem is that event time does not progress in case you don’t receive events, so you cannot detect the timeout of devices. Would it make sense to have you source periodically send artificial events to advance the watermark in the absence of device events, wi

About nodes number on Flink

2017-06-22 Thread AndreaKinn
Hello, I'm developing a Flink toy-application on my local machine before to deploy the real one on a real cluster. Now I have to determine how many nodes I need to set the cluster. I already read these documents: jobs and scheduling

Recursive Traversal of the Input Path Directory, Not working

2017-06-22 Thread Adarsh Jain
Hi, I am trying to use "Recursive Traversal of the Input Path Directory" in Flink 1.3 using scala. Snippet of my code below. If I give exact file name it is working fine. Ref https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/index.html import org.apache.flink.api.java.utils.Pa

Re: Keyed windows with single sink

2017-06-22 Thread Ahmad Hassan
Hi Stefan, How process function would know that the last window result has arrived? Because slidingwindows slide every 5 minutes which means that window of new time-range or new watermark will arrive after 5 minutes. Thanks On 22 June 2017 at 15:10, Stefan Richter wrote: > The process functio

Problem with ProcessFunction timeout feature

2017-06-22 Thread Álvaro Vilaplana García
Hi, Please, can you help me with a problem? I summarise in the next points, I hope is enough clear to approach some help. a) We have devices, each with its own ID, which we don’t have control of b) These devices send messages, with an internally generated, non-synced (amongst other devices) tim

Re: Keyed windows with single sink

2017-06-22 Thread Stefan Richter
The process function has the signature void processElement(I value, Context ctx, Collector out) throws Exception where the context is providing access to the current watermark and you can also register timer callbacks, when that trigger when a certain watermark is reached. You can simply monitor

Re: Keyed windows with single sink

2017-06-22 Thread Ahmad Hassan
Thanks Stefan, But how the Process function will have these watermarks? I have sliding windows like below final DataStream eventStream = inputStream .keyBy(TENANT, CATEGORY) .window(SlidingProcessingTimeWindows.of(Time.minutes(100,Time.minutes(5))) .fold(new WindowStats(), new ProductAggregatio

Re: Keyed windows with single sink

2017-06-22 Thread Stefan Richter
Hi, one possible approach could be that you have a process function before the sink. Process function is aware of watermarks, so it can collect and buffer window results until it sees a watermark. This is the signal that all results for windows with an end time smaller than the watermark are co

Re: Performance Improvement on Flink 1.2.0

2017-06-22 Thread Stefan Richter
Hi, the answer highly depends on what you job is doing and there is no information about that. Also what is your target in performance? Are you using batch or streaming? If you feel like the performance is lower than expected, I suggest that you do some profiling to figure out the hotspots. For

Keyed windows with single sink

2017-06-22 Thread Ahmad Hassan
Hi All, I am using categoryID as a keyby attribute for creating keyed stream from a product event stream. Keyed stream then creates time windows for each category. However, when the window time expires, i want to write the output data of all the products in all all categories in a single atomic op

Re: Flink equivalent to Samza's bootstrap stream?

2017-06-22 Thread Stefan Richter
Hi, unfortunately, there is no direct and easy API for this, but it seems worthwhile having this in the future. Nevertheless, I can see two ways of doing this with low effort in Flink: 1. The cleanest way I could think of: implement your own multiplexing in the source. At first, the source wil

Re: HA Standalone Cluster configuration

2017-06-22 Thread Stefan Richter
Hi, I think 1. should not be a problem if the machine has enough capacities to run both. 2. is not truly harmful if you have more than one Zookeeper node, but in case the machine of your JM goes down, it also takes off one ZK node. It is no problem if the remaining ZK nodes can take over to re

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-22 Thread sohimankotia
Hi Chesnay, I have data categorized on some attribute(Key in partition ) which will be having n possible values. As of now job is enabled for only one value of that attribute . In couple of days we will enable all values of attribute with more parallelism so each attribute's type data get process

Re: flink doesn't seem to serialize generated Avro pojo

2017-06-22 Thread Bart van Deenen
Thanks for the explanation, at least it's a start. Bart On Thu, Jun 22, 2017, at 11:20, Stefan Richter wrote: > Hi, > > this „amazing error“ message typically means that the class of the object > instance you created was loaded by a different classloader than the one > that loaded the class in t

Re: Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-22 Thread Chesnay Schepler
So let's get the obvious question out of the way: Why are you adding a partitioner when your parallelism is 1? On 22.06.2017 11:58, sohimankotia wrote: I have a execution flow (Streaming Job) with parallelism 1. source -> map -> partitioner -> flatmap -> sink Since adding partitioner will s

Partitioner is spending around 2 to 4 minutes while pushing data to next operator

2017-06-22 Thread sohimankotia
I have a execution flow (Streaming Job) with parallelism 1. source -> map -> partitioner -> flatmap -> sink Since adding partitioner will start new thread but partitioner is spending average of 2 to 4 minutes while moving data from map to flat map . For more details about this : http://apach

Performance Improvement on Flink 1.2.0

2017-06-22 Thread Samim Ahmed
Hi All, This query regarding the flink performance improvement . *Flink Configuration:* using flink in clustor mode with 3 salves and a master configuration slots used 30 (as the system has 30 core) task manager memory 30GB parallelism used : 30 jobmanager.heap.mb: 20480 taskmanager.heap.mb: 2048

Re: flink doesn't seem to serialize generated Avro pojo

2017-06-22 Thread Stefan Richter
Hi, this „amazing error“ message typically means that the class of the object instance you created was loaded by a different classloader than the one that loaded the class in the code that tries to cast it. A class in Java is fully identified by the canonical classname AND the classloader that l

flink doesn't seem to serialize generated Avro pojo

2017-06-22 Thread Bart van Deenen
Hi All I have a simple avro file {"namespace": "com.kpn.datalab.schemas.omnicrm.contact_history.v1", "type": "record", "name": "contactHistory", "fields": [ {"name": "events", "type": {"type":"array", "items": "bytes"}}, {"name": "krn", "type": "string"} ]

Re: Painful KryoException: java.lang.IndexOutOfBoundsException on Flink Batch Api scala

2017-06-22 Thread Tzu-Li (Gordon) Tai
Thanks a lot Andrea! On 21 June 2017 at 8:36:32 PM, Andrea Spina (andrea.sp...@radicalbit.io) wrote: I Gordon, sadly no news since the last message. At the end I jumped over the issue, I was not able to solve it. I'll try provide a runnable example asap. Thank you. Andrea -- View thi

Re: Related datastream

2017-06-22 Thread nragon
I believe I could try with microbatch system in order to release some memory. Meaning, if I have to generate 1M records splitting in 100m each iteration. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Related-datastream-tp13901p13908.html Se

Re: Kafka and Flink integration

2017-06-22 Thread Urs Schoenenberger
Hi Greg, do you have a link where I could read up on the rationale behind avoiding Kryo? I'm currently facing a similar decision and would like to get some more background on this. Thank you very much, Urs On 21.06.2017 12:10, Greg Hogan wrote: > The recommendation has been to avoid Kryo where p