Re: PyFlink UDF: When to use vectorized vs scalar

2021-04-19 Thread Yik San Chan
Hi Dian, Thank you! Best, Yik San On Tue, Apr 20, 2021 at 9:24 AM Dian Fu wrote: > Yes, your understanding is correct. > > So model.predict accept pandas.Series as inputs? If this is the case, then > I guess Pandas UDF is a perfect choice for your requirements. > > Regards, > Dian > > 2021年4月1

Re: CRD compatible with native and standalone mode

2021-04-19 Thread Yang Wang
I think the compatibility depends on you. For example, you could have the same CustomResourceDefinition for standalone and native Flink applications. They could look like this[1]. Since the CR is defined in yaml[2], native and standalone could have some dedicated fields. And you could easily parse

Re: How to config the flink to load libs in myself path

2021-04-19 Thread Guowei Ma
Hi, chenxuying There is currently no official support for this. What I am curious about is why you have this requirement. In theory, you can always build your own image. Best, Guowei On Mon, Apr 19, 2021 at 9:58 PM chenxuying wrote: > Hi all, I deployed the flink in K8S by session cluster [1] >

Re: Max-parellelism limitation

2021-04-19 Thread Guowei Ma
Hi, Olivier Yes. The introduction of this concept is to solve the problem of rescaling the keystate. Best, Guowei On Mon, Apr 19, 2021 at 8:56 PM Olivier Nouguier wrote: > Hi, > May I have the confirmation that the max-parallelism limitation only > occurs when keyed states are used ? > > > --

Re: Accessing columns from input stream table during Window operations

2021-04-19 Thread Guowei Ma
Hi, Sumeet Thanks you for the sharing. As Dian suggested, I think you could use b as your `group_by`'s key and so the b could be output directly. I think it is more simple. Best, Guowei On Mon, Apr 19, 2021 at 7:31 PM Dian Fu wrote: > Hi Sumeet, > > Thanks for the sharing. > > Then I guess you

Re: PyFlink Kafka-Connector NoClassDefFoundError

2021-04-19 Thread Dian Fu
Hi Giacomo, AFAIK, it should support accepting row type as the array elements. Did you encounter some problems? Besides, it would be great if you could share a minimal example which could reproduce the above exception (along with the test data). Regards, Dian > 2021年4月20日 上午12:20,g.g.m.5...@

Re: PyFlink UDF: When to use vectorized vs scalar

2021-04-19 Thread Dian Fu
Yes, your understanding is correct. So model.predict accept pandas.Series as inputs? If this is the case, then I guess Pandas UDF is a perfect choice for your requirements. Regards, Dian > 2021年4月19日 下午8:23,Yik San Chan 写道: > > Hi Dian, > > By "access data at row basis", do you mean, for inp

CRD compatible with native and standalone mode

2021-04-19 Thread gaurav kulkarni
Hi,  I plan to create a flink K8s operator which supports standalone mode, and and switch to native mode sometime later. I was wondering what are some of the approaches to ensure that CRD is compatible with both native and standalone mode?  Thanks 

Are configs stored as part of savepoints

2021-04-19 Thread gaurav kulkarni
Hi,  I was wondering if configs applied while creating a flink application are also stored as part of savepoint? If yes, an app is restored from a savepoint, does it start with the same configs? Thanks 

Task Local Recovery with mountable disks in the cloud

2021-04-19 Thread Sonam Mandal
Hello, We've been experimenting with Task-local recovery using Kubernetes. We have a way to specify mounting the same disk across Task Manager restarts/deletions for when the pods get recreated. In this scenario, we noticed that task local recovery does not kick in (as expected based on the doc

How can I demarcate which event elements are the boundaries of a window?

2021-04-19 Thread Marco Villalobos
I have a tumbling window that aggregates into a process window function. Downstream there is a keyed process function. [window aggregate into process function] -> keyed process function I am not quite sure how the keyed process knows which elements are at the boundary of the window. Is there a m

Aw: Re: PyFlink Kafka-Connector NoClassDefFoundError

2021-04-19 Thread G . G . M . 5611
Hi Dian, thanks, that did the trick. Unfortunately, I have a new problem now.   As I said I'm trying to read json data from a kafka topic into a datastream. I tried doing this using the JsonRowDeserializationSchema-class as below (the Json-objects are tweets and thus pretty nested and complex). H

[1.9.2] Flink SSL on YARN - NoSuchFileException

2021-04-19 Thread Hailu, Andreas [Engineering]
Hi Flink team, I'm trying to configure a Flink on YARN with SSL enabled. I've followed the documentation's instruction [1] to generate a Keystore and Truststore locally, and added a the properties to my flink-conf.yaml. security.ssl.rest.keystore: /home/user/ssl/deploy-keys/rest.keystore securi

How to config the flink to load libs in myself path

2021-04-19 Thread chenxuying
Hi all, I deployed the flink in K8S by session cluster [1] the default plugin path is /opt/flink/plugins, the default lib path is /opt/flink/lib, the default usrlib path is /opt/flink/usrlib, I wonder if it is possible for change the default path. For example, I wish flink don't load libs from /o

Max-parellelism limitation

2021-04-19 Thread Olivier Nouguier
Hi, May I have the confirmation that the max-parallelism limitation only occurs when keyed states are used ? -- Olivier Nouguier SSE e | olivier.nougu...@teads.com m | 0651383971 Teads France SAS, 159 rue de Thor, Business Plaza, Bat. 4, 34000 Montpellier, France [image: image]

Re: PyFlink UDF: When to use vectorized vs scalar

2021-04-19 Thread Yik San Chan
Hi Dian, By "access data at row basis", do you mean, for input X, for row in X: doSomething(row) If that's the case, I believe I am not accessing the vector like that. What I do is pretty much, for input X1, X2 and X3: model = ... predictions = model.predict(X1, X2, X3) Do I understand it

Re: proper way to manage watermarks with messages combining multiple timestamps

2021-04-19 Thread Arvid Heise
Hi Mathieu, The easiest way is to already emit several inputs on the source level. If you use DeserializationSchema, try to use the method with the collector. The watermarks should then be generated as if you would only receive one element at a time. On Sun, Apr 18, 2021 at 11:08 AM Mathieu D wr

Re: PyFlink UDF: When to use vectorized vs scalar

2021-04-19 Thread Dian Fu
I have not tested this and so I have no direct answer to this question. There are some tricky things behind this. For Pandas UDF, the input data will be organized as columnar format. however, if there are multiple input arguments for the Pandas UDF and you access data at row basis in the Pandas

Re: Accessing columns from input stream table during Window operations

2021-04-19 Thread Dian Fu
Hi Sumeet, Thanks for the sharing. Then I guess you could use `.group_by(col('w'), input.a, input.b)`. Since the value for input.a is always the same, it’s equal to group_by(col(‘w'), input.b) logically. The benefit is that you could access input.a directly in the select clause. Regards, Dian

Re: Batch Task Synchronization

2021-04-19 Thread Guowei Ma
Hi, Mary Flink has an alignment mechanism for synchronization. All upstream taks (for example reduce1) will send a message after the end of a round to inform all downstream that he has processed all the data. When the downstream (reduce2) collected all the messages from all his upstream t

Re: Shall we add an option to ignore the header when flink sql consume filesystem csv source?

2021-04-19 Thread Yik San Chan
Hi Jin, Look forward to the implementation! Best, Yik San On Mon, Apr 19, 2021 at 5:58 PM JIN FENG wrote: > Hi, Yik San > > Yes, I'll implement this soon. > > On Mon, Apr 19, 2021 at 5:56 PM Yik San Chan > wrote: > >> Hi Jinfeng, >> >> Thanks for pointing out! Is there any plan to implement

Re: Accessing columns from input stream table during Window operations

2021-04-19 Thread Sumeet Malhotra
Hi Guowei, Let me elaborate the use case with an example. Sample input table looks like this: timea b c - t0 a0 b0 1 t1 a0 b1 2 t2 a0 b2 3 t3 a0 b0 6 t4 a0 b1 7 t5 a0 b2 8 Basically, every time interval there are new readings fro

Re: Shall we add an option to ignore the header when flink sql consume filesystem csv source?

2021-04-19 Thread JIN FENG
Hi, Yik San Yes, I'll implement this soon. On Mon, Apr 19, 2021 at 5:56 PM Yik San Chan wrote: > Hi Jinfeng, > > Thanks for pointing out! Is there any plan to implement this soon? > > Best, > Yik San > > On Mon, Apr 19, 2021 at 5:43 PM JIN FENG wrote: > >> Hi, Yik San >> This is a reasonable

Re: Shall we add an option to ignore the header when flink sql consume filesystem csv source?

2021-04-19 Thread Yik San Chan
Hi Jinfeng, Thanks for pointing out! Is there any plan to implement this soon? Best, Yik San On Mon, Apr 19, 2021 at 5:43 PM JIN FENG wrote: > Hi, Yik San > This is a reasonable behavior, and the related jira is > https://issues.apache.org/jira/browse/FLINK-22178 . > > On Mon, Apr 19, 2021 at

Re: Accessing columns from input stream table during Window operations

2021-04-19 Thread Dian Fu
Hi Sumeet, 1) Regarding to the above exception, it’s a known issue and has been fixed in FLINK-21922 [1]. It will be available in the coming 1.12.3. You could also cherry-pick that fix to 1.12.2 and build from source following the instruction

Re: Shall we add an option to ignore the header when flink sql consume filesystem csv source?

2021-04-19 Thread JIN FENG
Hi, Yik San This is a reasonable behavior, and the related jira is https://issues.apache.org/jira/browse/FLINK-22178 . On Mon, Apr 19, 2021 at 5:26 PM Yik San Chan wrote: > Hi community, > > According to > https://stackoverflow.com/questions/65359382/apache-flink-sql-reference-guide-for-table-pr

Shall we add an option to ignore the header when flink sql consume filesystem csv source?

2021-04-19 Thread Yik San Chan
Hi community, According to https://stackoverflow.com/questions/65359382/apache-flink-sql-reference-guide-for-table-properties#comment115663505_65404207, there is no way to ignore csv header when flink sql consumes filesystem csv source. I think this is a reasonable behavior that users want. Shall

Re: Accessing columns from input stream table during Window operations

2021-04-19 Thread Sumeet Malhotra
Thanks Guowei. I'm trying out Over Windows, as follows: input \ .over_window( Over.partition_by(col(input.a)) \ .order_by(input.Timestamp) \ .preceding(lit(10).seconds) \ .alias('w')) \ .select( input.b, input.c.avg.over(col('w'))) \ .exe

Batch Task Synchronization

2021-04-19 Thread Maria Xekalaki
Hi All, This is more of a general question. How are tasks synchronized in batch execution? If, for example, we ran an iterative pipeline (map1 -> reduce1 -> reduce2 -> map2), and the first two operators (map1->reduce1) were chained, how would reduce2 be notified that map1 -> reduce1 have comple

Re: PyFlink UDF: When to use vectorized vs scalar

2021-04-19 Thread Yik San Chan
Hmm one more question - as I said, there are 2 gains from using pandas UDF - (1) smaller ser-de and invocation overhead, and (2) vector calculation. (2) depends on use cases, how about (1)? Is the benefit (1) always-true? Best, Yik San On Mon, Apr 19, 2021 at 4:33 PM Yik San Chan wrote: > Hi F

Re: PyFlink UDF: When to use vectorized vs scalar

2021-04-19 Thread Yik San Chan
Hi Fabian and Dian, Thanks for the reply. They make sense. Best, Yik San On Mon, Apr 19, 2021 at 9:49 AM Dian Fu wrote: > Hi Yik San, > > It much depends on what you want to do in your Python UDF implementation. > As you know that, for vectorized Python UDF (aka. Pandas UDF), the input > data

Re: Flink InfluxDb connector not present in Maven

2021-04-19 Thread Chesnay Schepler
Please reach out to the bahir project (bahir.apache.org) for issues related to the release of Bahir connectors. On 4/17/2021 10:58 AM, Vinay Patil wrote: Hi Team, Flink influx db connector `flink-connector-influxdb_2.1` is not present in Maven , can you please upload the same https://repo.ma

Re: Accessing columns from input stream table during Window operations

2021-04-19 Thread Guowei Ma
Hi, Sumeet Maybe you could try the Over Windows[1], which could keep the "non-group-key" column. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/tableApi.html#over-windows Best, Guowei On Mon, Apr 19, 2021 at 3:25 PM Sumeet Malhotra wrote: > Thanks Guowei! Regardin

Re: Accessing columns from input stream table during Window operations

2021-04-19 Thread Sumeet Malhotra
Thanks Guowei! Regarding "input.c.avg" you're right, that doesn't cause any issues. It's only when I want to use "input.b". My use case is to basically emit "input.b" in the final sink as is, and not really perform any aggregation on that column - more like pass through from input to sink. What's