Re: Accessing columns from input stream table during Window operations

2021-04-18 Thread Guowei Ma
Hi Sumeet, For "input.b" I think you should aggregate the non-group-key column[1]. But I am not sure why "input.c.avg.alias('avg_value')" produces resolution errors. Would you mind giving more detailed error information? [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/tabl

Accessing columns from input stream table during Window operations

2021-04-18 Thread Sumeet Malhotra
Hi, I have a use case where I'm creating a Tumbling window as follows: "input" table has columns [Timestamp, a, b, c] input \ .window(Tumble.over(lit(10).seconds).on(input.Timestamp).alias("w")) \ .group_by(col('w'), input.a) \ .select( col('w').start.alias('window_start'),
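Guowei's reply above suggests aggregating every column that is not part of the group key. A minimal sketch of how the select could be completed (the window/group_by lines come from the question; the max/avg aggregates on b and c are illustrative assumptions, and "input" is the table from the original message):

```python
from pyflink.table.expressions import col, lit
from pyflink.table.window import Tumble

# "input" has columns [Timestamp, a, b, c]; only 'w' and 'a' are group keys,
# so b and c must appear inside aggregate functions.
result = input \
    .window(Tumble.over(lit(10).seconds).on(input.Timestamp).alias("w")) \
    .group_by(col('w'), input.a) \
    .select(
        col('w').start.alias('window_start'),
        input.a,
        input.b.max.alias('max_b'),        # aggregated instead of raw input.b
        input.c.avg.alias('avg_value'))    # aggregated non-group-key column
```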

Re: PyFlink Kafka-Connector NoClassDefFoundError

2021-04-18 Thread Dian Fu
Hi, You need to use the fat jar [1] as documented in the Kafka Table & SQL connector page [2]. [1] https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-kafka_2.11/1.12.2/flink-sql-connector-kafka_2.11-1.12.2.jar
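For reference, a sketch of the two usual ways to point a PyFlink job at that fat jar (the paths below are placeholders, not taken from the thread):

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Option 1: declare the connector jar inside the script via pipeline.jars,
# using the flink-sql-connector-kafka fat jar from [1] (path is a placeholder).
env_settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
t_env = TableEnvironment.create(env_settings)
t_env.get_config().get_configuration().set_string(
    "pipeline.jars",
    "file:///home/ubuntu/flink-sql-connector-kafka_2.11-1.12.2.jar")

# Option 2: keep the script unchanged and pass the same jar on the command line:
#   ./bin/flink run --python load_kafka.py \
#       --jarfile /home/ubuntu/flink-sql-connector-kafka_2.11-1.12.2.jar
```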

Re: PyFlink UDF: When to use vectorized vs scalar

2021-04-18 Thread Dian Fu
Hi Yik San, It largely depends on what you want to do in your Python UDF implementation. As you know, for vectorized Python UDFs (a.k.a. Pandas UDFs), the input data is organized in a columnar format. So if your Python UDF implementation could benefit from this, e.g. making use of the functionalit
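As a rough illustration of the difference (a minimal sketch; the add-one logic is only a placeholder):

```python
from pyflink.table import DataTypes
from pyflink.table.udf import udf

# General (scalar) UDF: invoked once per row with plain Python values.
@udf(result_type=DataTypes.BIGINT())
def add_one_scalar(x):
    return x + 1

# Vectorized (Pandas) UDF: invoked once per batch with pandas.Series,
# so implementations built on pandas/numpy operations can benefit.
@udf(result_type=DataTypes.BIGINT(), func_type="pandas")
def add_one_vectorized(x):
    return x + 1  # x is a pandas.Series; the addition is vectorized
```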

Interesting article about correctness and latency

2021-04-18 Thread BenoĆ®t Paris
Hi all! I read this very interesting and refreshing article today, about correctness and (vs?) latency in streaming systems. I thought I'd share it. https://scattered-thoughts.net/writing/internal-consistency-in-streaming-systems/ With some comments on Hacker News: https://news.ycombinator.com/

PyFlink Kafka-Connector NoClassDefFoundError

2021-04-18 Thread G.G.M.5611
Hi, I am trying to run a very basic job in PyFlink (getting data from a Kafka topic and printing the stream). From the command line I run: ./bin/flink run \ --python /home/ubuntu/load_kafka.py \ --jarfile /home/ubuntu/flink-connector-kafka_2.12-1.12.2.jar I downloaded the jar from: https://mvnr

Re: proper way to manage watermarks with messages combining multiple timestamps

2021-04-18 Thread Mathieu D
Hi, I can't change the way devices send their data. We are constrained in the number of messages sent per day per device. To illustrate my question: - at 9:08 a message is emitted. It packs together several measures: - measure m1 taken at 8:52 - measure m2 taken at 9:07 m1 must go in the 8:00-9:00 aggrega
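One common approach (a sketch under assumptions, not from the thread: the message layout, field positions and the bound are invented for illustration, and it assumes a PyFlink version whose DataStream API exposes WatermarkStrategy/TimestampAssigner) is to explode each packed message into one event per measure carrying the measure's own timestamp, and then derive timestamps and watermarks from that field with a bounded-out-of-orderness strategy large enough to cover the reporting delay:

```python
from pyflink.common import Duration
from pyflink.common.watermark_strategy import TimestampAssigner, WatermarkStrategy

class MeasureTimestampAssigner(TimestampAssigner):
    def extract_timestamp(self, value, record_timestamp):
        # value is assumed to be (device_id, name, measure_value, measure_ts_millis)
        return value[3]

def explode(message):
    # Assumed message shape: (device_id, emit_ts, [(name, value, measure_ts_millis), ...])
    device_id, _emit_ts, measures = message
    for name, value, measure_ts in measures:
        yield device_id, name, value, measure_ts

# Usage sketch (packed_stream is the DataStream of raw device messages):
#   measures = packed_stream.flat_map(explode, output_type=...)
#   with_ts = measures.assign_timestamps_and_watermarks(
#       WatermarkStrategy
#           .for_bounded_out_of_orderness(Duration.of_seconds(20 * 60))
#           .with_timestamp_assigner(MeasureTimestampAssigner()))
```

With per-measure timestamps, m1 falls into the 8:00-9:00 aggregation and m2 into the 9:00-10:00 one, at the cost of keeping windows open for the maximum reporting delay.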