Re: Flink Table API schema doesn't support nested object in ObjectArray

2019-08-02 Thread Fabian Hueske
Thanks for the bug report Jacky! Would you mind opening a Jira issue, preferably with a code snippet that reproduces the bug? Thank you, Fabian. On Fri, 2 Aug 2019 at 16:01, Jacky Du wrote: > Hi, All > > Just found that the Flink Table API has an issue if you define a nested object in > an objec

Re: Best pattern to signal a watermark > t across all tasks?

2019-08-02 Thread Oytun Tez
This bit of info is very useful, Fabian, thank you: > You can get the parallel task id from > RuntimeContext.getIndexOfThisSubtask(). > RuntimeContext.getNumberOfParallelSubtasks() gives the total number of > tasks. --- Oytun Tez *M O T A W O R D* The World's Fastest Human Translatio

Flink Table API schema doesn't support nested object in ObjectArray

2019-08-02 Thread Jacky Du
Hi, All Just found that the Flink Table API has an issue if a nested object is defined in an object array. It will give a 'column not found' exception if a table schema is defined like below: payload : Row(arraylist : ObjectArrayTypeInfo) but the Table API works fine if we don't have a nested object in an array, so b

Re: Best pattern to signal a watermark > t across all tasks?

2019-08-02 Thread Eduardo Winpenny Tejedor
awesome, thanks On Fri, 2 Aug 2019, 10:56 Fabian Hueske wrote: > Hi, > > Regarding step 3, it is sufficient to check that you got one message from > each parallel task of the previous operator. That's because a task > processes the timers of all keys before moving forward. > Timers are always pr

Re: Exactly-once semantics in Flink Kafka Producer

2019-08-02 Thread Mohammad Hosseinian
Hi Vasily, I haven't tested the state recovery under a YARN setup. But in case of a stand-alone Flink cluster setup, I needed to run the application with the proper checkpoint recovery directory (whose name starts with 'chk-') passed as the -s parameter value. This was the only way I could recover my a
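The restore described above can be sketched as a single CLI invocation. This is not confirmed by the thread; the checkpoint path, job id, and jar name below are placeholders:

```
# Restore a standalone Flink job from a retained checkpoint directory.
# The <job-id>, chk-42, and my-job.jar values are placeholders.
flink run -s hdfs:///flink/checkpoints/<job-id>/chk-42 my-job.jar
```

Without the -s argument, the job starts with empty state instead of resuming from the checkpoint.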

Re: How to implement Multi-tenancy in Flink

2019-08-02 Thread Ahmad Hassan
Hi Fabian, Thank you, We will look into it now. Best, On Fri, 2 Aug 2019 at 12:50, Fabian Hueske wrote: > Ok, I won't go into the implementation detail. > > The idea is to track all products that were observed in the last five > minutes (i.e., unique product ids) in a five minute tumbling wind

Re: Exactly-once semantics in Flink Kafka Producer

2019-08-02 Thread Vasily Melnik
Hi, Maxim. My console-consumer command: kafka-console-consumer --zookeeper ... --topic test --from-beginning --isolation-level read_committed It works perfectly well with a manually written Kafka producer - it reads data only after commitTransaction. On Fri, 2 Aug 2019 at 14:19, Maxim Parkachov

Re: How to implement Multi-tenancy in Flink

2019-08-02 Thread Fabian Hueske
Ok, I won't go into the implementation detail. The idea is to track all products that were observed in the last five minutes (i.e., unique product ids) in a five minute tumbling window. Every five minutes, the observed products are sent to a process function that collects the data of the last 24 h

Re: Exactly-once semantics in Flink Kafka Producer

2019-08-02 Thread Maxim Parkachov
Hi Vasily, as far as I know, by default the console-consumer reads uncommitted. Try setting isolation.level to read_committed in the console-consumer properties. Hope this helps, Maxim. On Fri, Aug 2, 2019 at 12:42 PM Vasily Melnik < vasily.mel...@glowbyteconsulting.com> wrote: > Hi, Eduardo. > Maybe i
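A sketch of the suggested change, passing the isolation level as a consumer property (broker address and topic are placeholders, not values from the thread):

```
# Print only committed records; the console consumer reads
# uncommitted records by default. localhost:9092 is a placeholder.
kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic test --from-beginning \
  --consumer-property isolation.level=read_committed
```

With read_committed, records written inside an open or aborted Kafka transaction never appear in the output.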

Re: Exactly-once semantics in Flink Kafka Producer

2019-08-02 Thread Vasily Melnik
Hi, Eduardo. Maybe I should describe the experiment design more precisely: 1) I run Flink on YARN (YARN session method). 2) I do not stop/cancel the application, I just kill the TaskManager process. 3) After that YARN creates another TaskManager process and an automatic checkpoint restore from HDFS happens. That's why I

Re: Flink issue

2019-08-02 Thread Zhu Zhu
For the network issue, this answer might help: http://mail-archives.apache.org/mod_mbox/flink-user/201907.mbox/%3cdb49d6f2-1a6b-490a-8502-edd9562b0163.yungao...@aliyun.com%3E . Thanks, Zhu Zhu. Karthick Thanigaimani wrote on Fri, 2 Aug 2019 at 17:34: > Yes Zhu, the server is online and it didn't die. So it

Re: How to implement Multi-tenancy in Flink

2019-08-02 Thread Ahmad Hassan
Hi Fabian, Thanks for this detail. However, our pipeline is keeping track of the list of products seen in 24 hours with a 5 min slide (288 windows): inStream .filter(Objects::nonNull) .keyBy(TENANT) .window(SlidingProcessingTimeWindows.of(Time.minutes(24), Time.minutes(5))) .trigger(TimeT

Re: Apache Flink and additional fileformats (Excel, blockchains)

2019-08-02 Thread Jörn Franke
Thanks a lot! > On 02.08.2019 at 12:02, Fabian Hueske wrote: > > Hi Joern, > > Thanks for sharing your connectors! > The Flink community is currently working on a website that collects and lists > externally maintained connectors and libraries for Flink. > We are still figuring out some deta

Re: Apache Flink and additional fileformats (Excel, blockchains)

2019-08-02 Thread Fabian Hueske
Hi Joern, Thanks for sharing your connectors! The Flink community is currently working on a website that collects and lists externally maintained connectors and libraries for Flink. We are still figuring out some details, but hope that it can go live soon. Would be great to have your repositories

Re: Exactly-once semantics in Flink Kafka Producer

2019-08-02 Thread Eduardo Winpenny Tejedor
Hi Vasily, You're probably executing this from your IDE or from a local Flink cluster without starting your job from a checkpoint. When you start your Flink job for the second time you need to specify the path to the latest checkpoint as an argument, otherwise Flink will start from scratch. You'

Re: Best pattern to signal a watermark > t across all tasks?

2019-08-02 Thread Fabian Hueske
Hi, Regarding step 3, it is sufficient to check that you got one message from each parallel task of the previous operator. That's because a task processes the timers of all keys before moving forward. Timers are always processed per key, but you could deduplicate on the parallel task id and check t
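The dedup-and-count step can be sketched with plain JDK code. The class name is hypothetical; in a real job the index and total would come from RuntimeContext.getIndexOfThisSubtask() and RuntimeContext.getNumberOfParallelSubtasks():

```java
import java.util.BitSet;

// Tracks which parallel subtasks have signalled that their watermark
// passed t; fires once every subtask has reported at least once.
public class WatermarkBarrier {
    private final int numSubtasks;      // total parallel subtasks
    private final BitSet seen = new BitSet();

    public WatermarkBarrier(int numSubtasks) {
        this.numSubtasks = numSubtasks;
    }

    /** Record a signal from one subtask; returns true once all have reported. */
    public boolean signal(int subtaskIndex) {
        seen.set(subtaskIndex);         // duplicate signals are deduplicated here
        return seen.cardinality() == numSubtasks;
    }
}
```

Duplicates from the same subtask are absorbed by the bit set, which is exactly the deduplication on parallel task id that Fabian describes.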

Re: From Kafka Stream to Flink

2019-08-02 Thread Fabian Hueske
Hi, Flink does not distinguish between streams and tables. For the Table API / SQL, there are only tables that are changing over time, i.e., dynamic tables. A Stream in the Kafka Streams or KSQL sense, is in Flink a Table with append-only changes, i.e., records are only inserted and never deleted

Exactly-once semantics in Flink Kafka Producer

2019-08-02 Thread Vasily Melnik
I'm trying to test Flink 1.8.0 exactly-once semantics with a Kafka source and sink: 1. Run the Flink app, simply transferring messages from one topic to another with parallelism=1 and a checkpoint interval of 20 seconds. 2. Generate messages with incrementing integer numbers using a Python script each
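The thread does not show the sink setup, but a test like this typically constructs its exactly-once producer along these lines (a sketch against the Flink 1.8 universal Kafka connector; topic name, broker address, and timeout value are placeholders, and class names should be checked against the exact connector version):

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

// Sketch: exactly-once Kafka sink configuration for Flink 1.8.
public class ExactlyOnceSinkSketch {
    public static FlinkKafkaProducer<String> create() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        // Must not exceed the broker's transaction.max.timeout.ms
        // (15 minutes by default), or the producer fails at startup.
        props.setProperty("transaction.timeout.ms", "600000");
        return new FlinkKafkaProducer<>(
                "output-topic",                                   // placeholder
                new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
                props,
                FlinkKafkaProducer.Semantic.EXACTLY_ONCE);
    }
}
```

With EXACTLY_ONCE, the sink commits its Kafka transaction only when a checkpoint completes, which is why downstream consumers must read with isolation.level=read_committed to see the guarantee.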

Re: StreamingFileSink part file count reset

2019-08-02 Thread Biao Liu
Hi Sidhartha, I don't think you should worry about this. Currently the `StreamingFileSink` uses a long to keep this counter. The maximum of a long is 9,223,372,036,854,775,807. The counter would be reset if the count of files reaches that value. I don't think that will happen. WRT the max filename leng
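A quick JDK-only sanity check of that bound: even writing a million part files per second, a long counter would last hundreds of thousands of years before wrapping (class and method names here are illustrative, not part of Flink):

```java
public class CounterBound {
    // Years until a long part-file counter overflows at the given write rate.
    public static long yearsUntilOverflow(long filesPerSecond) {
        long secondsPerYear = 365L * 24 * 60 * 60; // 31,536,000
        return Long.MAX_VALUE / filesPerSecond / secondsPerYear;
    }

    public static void main(String[] args) {
        // Long.MAX_VALUE is 9,223,372,036,854,775,807, as stated above.
        System.out.println(yearsUntilOverflow(1_000_000L)); // ~292,471 years
    }
}
```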

Re: Best pattern to signal a watermark > t across all tasks?

2019-08-02 Thread Eduardo Winpenny Tejedor
Hi Oytun, that sounds like a great idea, thanks! Just wanted to confirm a couple of things: - In step 2, by merging do you mean anything else apart from setting the operator parallelism to 1? Forcing a parallelism of 1 should ensure all items go to the same task. - In step 3 I don't think I could ch

Re: Support priority of the Flink YARN application in Flink 1.9

2019-08-02 Thread tian boxiu
Hello everyone, I have created a Jira issue in: https://issues.apache.org/jira/browse/FLINK-13548. Thanks for reviewing it. Thank you~ Boxiu Fabian Hueske wrote on Fri, 2 Aug 2019 at 16:09: > Hi Boxiu, > > This sounds like a good feature. > > Please have a look at our contribution guidelines [1]. > To pr

Re: How to implement Multi-tenancy in Flink

2019-08-02 Thread Fabian Hueske
Hi Ahmad, First of all, you need to preaggregate the data in a 5 minute tumbling window. For example, if your aggregation function is count or sum, this is simple. You have a 5 min tumbling window that just emits a count or sum every 5 minutes. The ProcessFunction then has a MapState (called buff
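The buffer logic above can be sketched with plain JDK collections (class and method names are hypothetical; in Flink the buffer would live in a MapState keyed by window end time): keep one set of product ids per 5-minute pre-aggregate, evict sets older than 24 hours, and union the rest for the distinct count.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

// Simulates the ProcessFunction's buffer: 5-minute pre-aggregates keyed by
// window end timestamp (ms), merged into a 24-hour distinct-product count.
public class ProductBuffer {
    private static final long DAY_MS = 24 * 60 * 60 * 1000L;
    private final TreeMap<Long, Set<String>> buffer = new TreeMap<>();

    /** Add one 5-minute window's unique product ids. */
    public void addWindow(long windowEndMs, Set<String> productIds) {
        buffer.put(windowEndMs, productIds);
        // Evict pre-aggregates that fell out of the 24-hour range.
        buffer.headMap(windowEndMs - DAY_MS, true).clear();
    }

    /** Distinct products over the retained 24 hours. */
    public int distinctCount() {
        Set<String> all = new HashSet<>();
        buffer.values().forEach(all::addAll);
        return all.size();
    }
}
```

Compared to the 288 overlapping sliding windows in Ahmad's version, each element is stored only once in its 5-minute pre-aggregate, which is the point of Fabian's suggestion.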

Re: Support priority of the Flink YARN application in Flink 1.9

2019-08-02 Thread Fabian Hueske
Hi Boxiu, This sounds like a good feature. Please have a look at our contribution guidelines [1]. To propose a feature, you should open a Jira issue [2] and start a discussion there. Please note that the feature freeze for the Flink 1.9 release happened a few weeks ago. The community is currentl

Re: Flink issue

2019-08-02 Thread Zhu Zhu
Hi Karthick, Could you check whether the `lost` TM 'flink-taskmanager-b/2:6121' is still alive when the error is reported? If it is still alive then, this seems to be a network issue between 'flink-taskmanager-c:6121' and 'flink-taskmanager-b/2:6121'. Otherwise, it is needed to check why the TM ex