Re: Test harness for CoProcessFunction outputting Protobuf messages

2019-02-04 Thread Alexey Trenikhun
Sure - https://jira.apache.org/jira/browse/FLINK-11523 Thanks,Alexey From: Fabian Hueske Sent: Monday, February 4, 2019 5:59 AM To: Alexey Trenikhun Cc: user@flink.apache.org Subject: Re: Test harness for CoProcessFunction outputting Protobuf messages Hi Alexey,

Re: KakfaConsumer

2019-02-04 Thread Vishal Santoshi
In fact "*Checkpointing enabled:* if checkpointing is enabled, the Flink Kafka Consumer will commit the offsets stored in the checkpointed states when the checkpoints are completed. This ensures that the committed offsets in Kafka brokers is consistent with the offsets in the checkpointed states."

Re: KakfaConsumer

2019-02-04 Thread Vishal Santoshi
So it does that also commit the offsets to *kafka* on* checkpoint/savepoint *as well to it's own distributed state ? Just wanted to confirm. On Mon, Feb 4, 2019 at 6:56 PM Nagarjun Guraja wrote: > Hi Vishal, > > Flink does checkpoint to Kafka(Offset commits) by default which could be > disabled

Re: KakfaConsumer

2019-02-04 Thread Nagarjun Guraja
Hi Vishal, Flink does checkpoint to Kafka(Offset commits) by default which could be disabled. Look here for more information. Regards, Nagarjun *Success is

KakfaConsumer

2019-02-04 Thread Vishal Santoshi
A simple query Does Flink' KafkaConnector flush the current offsets to kafka on a SP ? Note that the I do koow that Flink consumes data from Kafka topics and periodically checkpoints using Flink's distributed checkpointing mechanism. In case of failure, Flink will restore the records from checkpoi

Re: Is the order guaranteed with Windowall

2019-02-04 Thread morin . david . bzh
ok great. Thanks ! On 2019/02/04 18:00:16, Fabian Hueske wrote: > Yes, I think that should work. > > Best, Fabian > > Am Mo., 4. Feb. 2019 um 18:35 Uhr schrieb morin.david@gmail.com < > morin.david@gmail.com>: > > > Hello Fabian, > > > > Thanks ! > > According to your answers on this

Re: Regarding json/xml/csv file splitting

2019-02-04 Thread Ken Krugler
Normally parallel processing of text input files is handled via Hadoop TextInputFormat, which support splitting of files on line boundaries at (roughly) HDFS block boundaries. There are various XML Hadoop InputFormats available, which try to sync up with splittable locations. The one I’ve used

Re: How to add caching to async function?

2019-02-04 Thread Lasse Nedergaard
Hi William We have created a solution that do it. Please take a look at my presentation from Flink forward. https://www.slideshare.net/mobile/FlinkForward/flink-forward-berlin-2018-lasse-nedergaard-our-successful-journey-with-flink Hopefully you can get inspired. Med venlig hilsen / Best regar

Re: How to add caching to async function?

2019-02-04 Thread Fabian Hueske
Hi William, Does the cache need to be fault tolerant? If not you could use a regular in-memory map as cache (+some LRU cleaning). Or do you expect the cache to group too large for the memory? Best, Fabian Am Mo., 4. Feb. 2019 um 18:00 Uhr schrieb William Saar : > Hi, > I am trying to implement

Re: Gelly Scatter-Gather Iteration, In a single superstep, GatherFunction.updateVertex invoked more then once

2019-02-04 Thread Greg Hogan
Would you perchance have an example program to demonstrate the unexpected behavior? Does this issue always manifest or are you only seeing duplicate calls under specific circumstances? On Mon, Oct 22, 2018 at 8:33 AM 曹建华 wrote: > Hi: > According to the code comment, in Scatter-Gather Iteration,

Re: Is the order guaranteed with Windowall

2019-02-04 Thread Fabian Hueske
Yes, I think that should work. Best, Fabian Am Mo., 4. Feb. 2019 um 18:35 Uhr schrieb morin.david@gmail.com < morin.david@gmail.com>: > Hello Fabian, > > Thanks ! > According to your answers on this post > https://stackoverflow.com/questions/50340107/order-of-events-with-chained-keyby-ca

Re: Is the order guaranteed with Windowall

2019-02-04 Thread morin . david . bzh
Hello Fabian, Thanks ! According to your answers on this post https://stackoverflow.com/questions/50340107/order-of-events-with-chained-keyby-calls-on-same-key, if I'm right I can use my sort function followed by a keyby and use a Window for aggregate these events. And the order will be preser

How to add caching to async function?

2019-02-04 Thread William Saar
Hi, I am trying to implement an async function that looks up a value in a cache or, if the value doesn't exist in the cache, queries a web service, but I'm having trouble creating the cache. I've tried to create a RichAsyncFunction and add a map state as cache, but I'm getting: State is not suppor

Re: Reverse of KeyBy

2019-02-04 Thread Aggarwal, Ajay
Thanks Fabian for the explanation. Let me do some more reading so what you said can sync-in little more. From: Fabian Hueske Date: Monday, February 4, 2019 at 10:22 AM To: "Aggarwal, Ajay" Cc: Congxian Qiu , "user@flink.apache.org" Subject: Re: Reverse of KeyBy Hi, Subpartitions are just a

Re: Reverse of KeyBy

2019-02-04 Thread Fabian Hueske
Hi, Subpartitions are just a logical concept. When you keyBy a stream, the next operator will be applied in a keyed context. After that, the data might still be partitioned, but the keyed context is gone. Is this what you mean by automatic "joining of partitioned sub-streams"? With the program th

Re: Reverse of KeyBy

2019-02-04 Thread Aggarwal, Ajay
Yes, LargeMessageId is globally unique, so I shouldn’t need composite key. So both of you are suggesting I do the following InputStream (1) .keyBy (LargeMessageId) (2) .flatMap(new MyReassemblyFunction()) (3) .keyBy(MyKey) (4) .??? Let me explain my doubt (perhaps due to lack of

Re: Add header to a file produced using the writeAsFormattedText method

2019-02-04 Thread Fabian Hueske
Hi, I'm not aware of any plans for this. It might be an interesting feature for the relational APIs but I don't think it would be added to the DataSet or DataStream API. Best, Fabian Am Mo., 4. Feb. 2019 um 15:23 Uhr schrieb Papadopoulos, Konstantinos < konstantinos.papadopou...@iriworldwide.com

RE: Add header to a file produced using the writeAsFormattedText method

2019-02-04 Thread Papadopoulos, Konstantinos
Hi Fabian, Do you know if there is any plan Flink core framework to support such functionality? Best, Konstantinos From: Fabian Hueske Sent: Δευτέρα, 4 Φεβρουαρίου 2019 3:49 μμ To: Papadopoulos, Konstantinos Cc: user@flink.apache.org Subject: Re: Add header to a file produced using the writeA

Re: Reverse of KeyBy

2019-02-04 Thread Fabian Hueske
Hi, Calling keyBy twice will not work, because the second call overrides the first. You can keyBy on a composite key (MyKey, LargeMessageId). You can do the following InputStream .keyBy ( (MyKey, LargeMessageId) ) \\ use composite key .flatMap(new MyReassemblyFunction()) .keyBy(MyKey) .?

Re: Reverse of KeyBy

2019-02-04 Thread Aggarwal, Ajay
Thank you for your suggestion. But per my understanding if I KeyBy (LargeMessageId) first then I can’t guarantee order of LargeMessages per MyKey. Because MyKey messages will get spread over multiple partitions by LargeMessageId. Am I correct? From: Congxian Qiu Date: Sunday, February 3, 2019

Re: Test harness for CoProcessFunction outputting Protobuf messages

2019-02-04 Thread Fabian Hueske
Hi Alexey, I think you are right. It does not seem to be possible to provide a TypeInformation for side outputs to a TestHarness. This sounds like a useful addition. Would you mind creating a Jira issue for that? Thank you, Fabian Am So., 3. Feb. 2019 um 19:13 Uhr schrieb Alexey Trenikhun : >

Re: Is the order guaranteed with Windowall

2019-02-04 Thread Fabian Hueske
Hi, A WindowAll is executed in a single task. If you sort the data before the window, the sorting must also happen in a single task, i.e., with parallelism 1. The reasons is that an operator somewhat randomly merges multiple input partitions. So even if each input partition is sorted, the merging

Re: Add header to a file produced using the writeAsFormattedText method

2019-02-04 Thread Fabian Hueske
Hi Konstantinos, Writing headers to files is currently not supported by the underlying TextOutputFormat. You can implement a custom OutputFormat by extending TextOutputFormat to add this functionality. Best, Fabian Am Fr., 1. Feb. 2019 um 16:04 Uhr schrieb Papadopoulos, Konstantinos < konstantin

Re: How to load multiple same-format files with single batch job?

2019-02-04 Thread Fabian Hueske
Hi, The files will be read in a streaming fashion. Typically files are broken down into processing splits that are distributed to tasks for reading. How a task reads a file split depends on the implementation, but usually the format reads the split as a stream and does not read the split as a whol

Re: How to load multiple same-format files with single batch job?

2019-02-04 Thread françois lacombe
Hi Fabian, Thank you for this input. This is interesting. With such an input format, will all the file will be loaded in memory before to be processed or will all be streamed? All the best François Le mar. 29 janv. 2019 à 22:20, Fabian Hueske a écrit : > Hi, > > You can point a file-based inp