Re: Flink Dynamodb as sink

2020-02-03 Thread hemant singh
Thanks, I'll check it out. On Tue, Feb 4, 2020 at 12:30 PM 容祖儿 wrote: > you can customize a Sink function (implement SinkFunction) that's not so > hard. > > regards. > > On Tue, Feb 4, 2020 at 2:38 PM hemant singh wrote: > >> Hi All, >> >> I am using dynamodb as a sink for one of the metrics pi

Re: Flink Dynamodb as sink

2020-02-03 Thread 容祖儿
You can customize a sink function (implement SinkFunction); that's not so hard. Regards. On Tue, Feb 4, 2020 at 2:38 PM hemant singh wrote: > Hi All, > > I am using dynamodb as a sink for one of the metrics pipeline. Wanted to > understand if there are any existing connectors. I did searched an
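A minimal sketch of such a custom sink, assuming the AWS SDK v1 DynamoDB client; the `Metric` POJO and the attribute names are hypothetical, and batching (BatchWriteItem), retries, and error handling are left out for brevity:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;

public class DynamoDbSink extends RichSinkFunction<DynamoDbSink.Metric> {

    // Hypothetical metric POJO, stands in for whatever the pipeline emits.
    public static class Metric {
        public String name;
        public double value;
    }

    private final String tableName;
    private transient AmazonDynamoDB client;

    public DynamoDbSink(String tableName) {
        this.tableName = tableName;
    }

    @Override
    public void open(Configuration parameters) {
        // The client is not serializable, so create it per parallel sink instance.
        client = AmazonDynamoDBClientBuilder.defaultClient();
    }

    @Override
    public void invoke(Metric metric, Context context) {
        Map<String, AttributeValue> item = new HashMap<>();
        item.put("metricName", new AttributeValue().withS(metric.name));
        item.put("value", new AttributeValue().withN(String.valueOf(metric.value)));
        client.putItem(tableName, item); // one PutItem per record; batch for throughput
    }

    @Override
    public void close() {
        if (client != null) {
            client.shutdown();
        }
    }
}
```

Attached to a stream with e.g. `metricStream.addSink(new DynamoDbSink("metrics"))`, where the table name is again illustrative.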

Flink Dynamodb as sink

2020-02-03 Thread hemant singh
Hi All, I am using DynamoDB as a sink for one of the metrics pipelines. I wanted to understand whether there are any existing connectors. I searched and could not find one. If none exists, has anyone implemented one? Any hints in that direction will help a lot. Thanks, Hemant

Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-03 Thread Hequn Cheng
Hi Jincheng, +1 for this proposal. From the perspective of users, I think it would be nice to have PyFlink on PyPI, which makes it much easier to install PyFlink. Best, Hequn On Tue, Feb 4, 2020 at 1:09 PM Jeff Zhang wrote: > +1 > > > On Tue, Feb 4, 2020 at 1:07 PM, Xingbo Huang wrote: > >> Hi Jincheng, >> >> T

Performance issue with RegistryAvroSerializationSchema

2020-02-03 Thread Steve Whelan
Hi, I'm running Flink v1.9. I backported the commit adding serialization support for Confluent's schema registry[1]. Using the code as is, I saw a nearly 50% drop in peak throughput for my job compared to using *AvroRowSerializationSchema*. Looking at the code, *RegistryAvroSerializationSchema.se

Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-03 Thread Jeff Zhang
+1 On Tue, Feb 4, 2020 at 1:07 PM, Xingbo Huang wrote: > Hi Jincheng, > > Thanks for driving this. > +1 for this proposal. > > Compared to building from source, downloading directly from PyPI will > greatly save the development cost of Python users. > > Best, > Xingbo > > > > On Tue, Feb 4, 2020 at 12:43 PM, Wei Zhong wrote:

Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-03 Thread Xingbo Huang
Hi Jincheng, Thanks for driving this. +1 for this proposal. Compared to building from source, downloading directly from PyPI will greatly save the development cost of Python users. Best, Xingbo On Tue, Feb 4, 2020 at 12:43 PM, Wei Zhong wrote: > Hi Jincheng, > > Thanks for bringing up this discussion! > > +1

Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-03 Thread Wei Zhong
Hi Jincheng, Thanks for bringing up this discussion! +1 for this proposal. Building from source takes a long time and requires a good network environment; some users may not have such an environment. Uploading to PyPI will greatly improve the user experience. Best, Wei jincheng sun wrote on Tue, Feb 4, 2020 at 1

Re: Flink built-in functions

2020-02-03 Thread Jingsong Li
Hi Sunfulin, Did you use the Blink planner? Which functions are missing? Best, Jingsong Lee On Tue, Feb 4, 2020 at 12:23 PM Wyatt Chun wrote: > They are two different systems for differentiated usage. For your > question, why don't you try Blink directly? > > Regards > > On Tue, Feb 4, 2020 at
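For readers hitting the same issue: in Flink 1.9 the Blink planner must be selected explicitly through `EnvironmentSettings`, since the default is the legacy planner, which lacks many of the Blink built-in functions. A minimal sketch (the commented-out query is only illustrative):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class BlinkPlannerExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Explicitly pick the Blink planner instead of the legacy one.
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, settings);

        // Blink-planner built-in date/string functions can now be used, e.g.:
        // tEnv.sqlQuery("SELECT TO_DATE('2020-02-03') FROM someTable");
    }
}
```

This requires `flink-table-planner-blink` on the classpath in addition to the Table API module.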

Re: Flink built-in functions

2020-02-03 Thread Wyatt Chun
They are two different systems for differentiated usage. For your question, why don't you try Blink directly? Regards On Tue, Feb 4, 2020 at 10:02 AM sunfulin wrote: > As far as I can see, the latest Flink version does not have full > support for the Blink built-in functions. Many date

[DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-03 Thread jincheng sun
Hi folks, I have recently been very happy to receive some user inquiries about the use of the Flink Python API (PyFlink). One of the more common questions is whether it is possible to install PyFlink without building from source. The most convenient and natural way for users is to use `pip install apache-

Flink built-in functions

2020-02-03 Thread sunfulin
As far as I can see, the latest Flink version does not have full support for the Blink built-in functions. Many date and string functions cannot be used in Flink. I want to know when we will be able to use Flink in the same way as we use Blink.

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-02-03 Thread Kostas Kloudas
Hi Mark, Currently no. But if rolling on every checkpoint is OK with you, in future versions it would be easy to allow rolling not only on every checkpoint but also on inactivity intervals. Cheers, Kostas On Mon, Feb 3, 2020 at 5:24 PM Mark Harris wrote: > Hi Kostas, > > Thanks for your help here - I think

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-02-03 Thread Mark Harris
Hi Kostas, Thanks for your help here - I think we're OK with the increased heap size, but I'm happy to explore other alternatives. I see the problem - we're currently using a BulkFormat, which doesn't seem to let us override the rolling policy. Is there an equivalent for the BulkFormat? Best regards, Mark

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-02-03 Thread Kostas Kloudas
Hi Mark, You can use something like the following and change the intervals accordingly:

final StreamingFileSink sink = StreamingFileSink
        .forRowFormat(new Path(outputPath), new SimpleStringEncoder<>("UTF-8"))
        .withRollingPolicy(
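The snippet above is cut off by the archive. A hedged reconstruction against the Flink 1.9/1.10 StreamingFileSink API, with placeholder interval and size values, might look like:

```java
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

public class RollingSinkExample {
    public static StreamingFileSink<String> buildSink(String outputPath) {
        return StreamingFileSink
                .forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8"))
                .withRollingPolicy(
                        DefaultRollingPolicy.create()
                                // roll a part file at the latest after 15 minutes
                                .withRolloverInterval(TimeUnit.MINUTES.toMillis(15))
                                // also roll if no records arrived for 5 minutes
                                .withInactivityInterval(TimeUnit.MINUTES.toMillis(5))
                                // or when the part file reaches 128 MB
                                .withMaxPartSize(128 * 1024 * 1024)
                                .build())
                .build();
    }
}
```

The inactivity interval is the knob relevant to this thread: once a bucket's part files are closed and no new ones arrive, the bucket's state handle can be dropped.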

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-02-03 Thread Mark Harris
Hi Kostas, Sorry, stupid question: How do I set that for a StreamingFileSink? Best regards, Mark From: Kostas Kloudas Sent: 03 February 2020 14:58 To: Mark Harris Cc: Piotr Nowojski ; Cliff Resnick ; David Magalhães ; Till Rohrmann ; flink-u...@apache.org Su

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-02-03 Thread Kostas Kloudas
Hi Mark, Have you tried to set your rolling policy to close inactive part files after some time [1]? If the part files in the buckets are inactive and there are no new part files, then the state handle for those buckets will also be removed. Cheers, Kostas https://ci.apache.org/projects/flink/fl

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-02-03 Thread Piotr Nowojski
Hi, Thanks for getting back with the semi-solution! Sorry that I was not responding before - I was trying to figure this out with some of my colleagues. > I think the DeleteOnExit problem will mean it needs to be restarted every few > weeks, but that's acceptable for now. I hope by the time y

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-02-03 Thread Mark Harris
Hi all, The out-of-memory heap dump had the answer: the job was failing with an OutOfMemoryError because the activeBuckets members of 3 instances of org.apache.flink.streaming.api.functions.sink.filesystem.Buckets were filling a significant enough part of the TaskManager's memory that no

Re: Flink solution for having shared variable between task managers

2020-02-03 Thread Soheil Pourbafrani
Thanks, I'll check it out. On Mon, Feb 3, 2020 at 10:05 AM Fabian Hueske wrote: > Hi, > > I think you are looking for BroadcastState [1]. > > Best, Fabian > > [1] > https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html > > On Fri, Jan 17, 2020 at 2:50 PM

Question: Determining Total Recovery Time

2020-02-03 Thread Morgan Geldenhuys
Community, I am interested in determining the total time to recover for a Flink application after experiencing a partial failure. Let's assume a pipeline consisting of Kafka -> Flink -> Kafka with Exactly-Once guarantees enabled. Taking a look at the documentation (https://ci.apache.org/pro

Re: Does flink support retries on checkpoint write failures

2020-02-03 Thread Till Rohrmann
Glad to hear that you could solve/mitigate the problem and thanks for letting us know. Cheers, Till On Sat, Feb 1, 2020 at 2:45 PM Richard Deurwaarder wrote: > Hi Till & others, > > We enabled setFailOnCheckpointingErrors > (setTolerableCheckpointFailureNumber isn't available in 1.8) and this >
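For readers landing on this thread: a minimal sketch of the two settings mentioned, on a plain `StreamExecutionEnvironment`; the interval and tolerance values are placeholders, and only one of the two calls applies per Flink version:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointToleranceExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // checkpoint every 60 seconds

        // Flink 1.8: keep the job running when a checkpoint fails;
        // the failure is logged and the next checkpoint is attempted.
        env.getCheckpointConfig().setFailOnCheckpointingErrors(false);

        // Flink 1.9+: the finer-grained replacement, e.g. tolerate up to 3 failures:
        // env.getCheckpointConfig().setTolerableCheckpointFailureNumber(3);
    }
}
```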

Re: Flink solution for having shared variable between task managers

2020-02-03 Thread Fabian Hueske
Hi, I think you are looking for BroadcastState [1]. Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html On Fri, Jan 17, 2020 at 2:50 PM, Soheil Pourbafrani < soheil.i...@gmail.com> wrote: > Hi, > > According to the processing logic,
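The broadcast state pattern can be sketched as follows; the stream contents, the "shared-config" descriptor name, and the "mode" key are all illustrative, and both streams are assumed to carry strings:

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class BroadcastStateExample {

    // Descriptor for the shared state, used by both the broadcast and the read side.
    static final MapStateDescriptor<String, String> CONFIG_DESCRIPTOR =
            new MapStateDescriptor<>("shared-config", String.class, String.class);

    public static DataStream<String> apply(DataStream<String> events,
                                           DataStream<String> configUpdates) {
        // Broadcast the update stream so every parallel instance sees each update.
        BroadcastStream<String> broadcastConfig = configUpdates.broadcast(CONFIG_DESCRIPTOR);

        return events.connect(broadcastConfig)
                .process(new BroadcastProcessFunction<String, String, String>() {

                    @Override
                    public void processElement(String value, ReadOnlyContext ctx,
                                               Collector<String> out) throws Exception {
                        // Data side: read-only access to the broadcast state.
                        String mode = ctx.getBroadcastState(CONFIG_DESCRIPTOR).get("mode");
                        out.collect(mode + ":" + value);
                    }

                    @Override
                    public void processBroadcastElement(String update, Context ctx,
                                                        Collector<String> out) throws Exception {
                        // Broadcast side: every instance stores the update identically.
                        ctx.getBroadcastState(CONFIG_DESCRIPTOR).put("mode", update);
                    }
                });
    }
}
```

The state is replicated per parallel instance rather than truly shared, which is why updates must go through the broadcast side only.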