Re: PyFlink - Scala UDF - How to convert Scala Map in Table API?

2020-12-02 Thread Xingbo Huang
Hi Pierre, The serialization/deserialization of sparse Rows in Flink is specially optimized. The principle is that each Row carries a leading mask when serialized, identifying whether the field at a given position is NULL, with one bit per field. For example, if you have 10k fields …
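The null-mask principle Xingbo describes can be illustrated with a small, self-contained Python sketch. This is not Flink's actual serializer; the fixed 32-bit-int value encoding and the exact wire layout here are assumptions for illustration only.

```python
import struct

def serialize_row(values):
    """Serialize a row as: a leading null mask (one bit per field),
    followed by only the non-null values (32-bit ints for simplicity).
    A set bit means the field at that position is NULL."""
    n = len(values)
    mask = bytearray((n + 7) // 8)  # one bit per field, rounded up to bytes
    payload = b""
    for i, v in enumerate(values):
        if v is None:
            mask[i // 8] |= 1 << (i % 8)  # mark field i as NULL
        else:
            payload += struct.pack("<i", v)  # NULL fields cost no payload
    return bytes(mask) + payload

def deserialize_row(data, n):
    mask_len = (n + 7) // 8
    mask, payload = data[:mask_len], data[mask_len:]
    values, offset = [], 0
    for i in range(n):
        if mask[i // 8] & (1 << (i % 8)):
            values.append(None)
        else:
            values.append(struct.unpack_from("<i", payload, offset)[0])
            offset += 4
    return values

row = [7, None, None, 42]
assert deserialize_row(serialize_row(row), len(row)) == row
```

With one bit per field, a sparse row pays only the mask as fixed overhead, and NULL fields contribute no payload bytes at all: the 4-field row above serializes to 1 mask byte plus two 4-byte ints.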

Re: Primary keys go missing after using TableFunction + leftOuterJoinLateral

2020-12-02 Thread Rex Fenley
It appears that even when I pass id through the map function and join back with the original table, it does not seem to think that the id passed through map is a unique key. Is there any way to solve this while still preserving the primary key? On Wed, Dec 2, 2020 at 5:27 PM Rex Fenley wrote: > …
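The workaround Rex describes, keeping `id` alongside the splatted columns and joining the result back to the original table on `id`, can be sketched in plain Python (dictionaries standing in for Flink tables; the column names `fruits`, `name`, and the `fruit` prefix are assumptions, not from the original code):

```python
def splat_fruits(row, prefix):
    """Flatten a nested 'fruits' struct into prefixed top-level columns,
    carrying the primary key 'id' through untouched."""
    flat = {"id": row["id"]}
    for key, value in row["fruits"].items():
        flat[f"{prefix}_{key}"] = value
    return flat

def join_on_id(left_rows, right_rows):
    """Inner-join two row lists on their 'id' column."""
    right_by_id = {r["id"]: r for r in right_rows}
    return [{**l, **right_by_id[l["id"]]}
            for l in left_rows if l["id"] in right_by_id]

original = [{"id": 1, "name": "basket", "fruits": {"apple": 2, "pear": 1}}]
splatted = [splat_fruits(r, "fruit") for r in original]
joined = join_on_id(original, splatted)
print(joined[0]["fruit_apple"])  # 2
```

The point of the pattern: because the splatted side carries `id` explicitly, the join key is unambiguous, which is what the Flink planner needs in order to keep treating `id` as a unique key downstream.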

Re: PyFlink - Scala UDF - How to convert Scala Map in Table API?

2020-12-02 Thread Pierre Oberholzer
Hi Xingbo, Community, Thanks a lot for your support. May I finally ask, to conclude this thread and for the wider audience: - Are serious performance issues to be expected with 100k fields per ROW (i.e. due solely to metadata overhead, independently of query logic)? - With sparse population (say …
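As a back-of-the-envelope check on the metadata overhead alone, assuming the one-bit-per-field null mask described earlier in this thread, 100k fields cost about 12.5 KB of mask per serialized row regardless of how sparse the values are:

```python
fields = 100_000
mask_bytes = (fields + 7) // 8  # one bit per field, rounded up to whole bytes
print(mask_bytes)  # 12500 bytes (~12.2 KiB) of mask per serialized row
```

Whether that is "serious" depends on row rate and row size: for densely populated rows it is negligible, while for extremely sparse rows the fixed mask can dominate the payload.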

Re: Running Flink job as a rest

2020-12-02 Thread Yun Tang
Hi Dhurandar, I'm afraid that Flink's REST API cannot satisfy your request, as it does not act as a source. One possible example could be SocketWindowWordCount [1], which listens for data on a port as its source. [1] https://github.com/apache/flink/blob/master/flink-examples/

Re: Questions regarding DDL and savepoints

2020-12-02 Thread Yun Tang
Hi Kevin, If you pass the savepoint path when resuming the application [1], it will resume from the last savepoint. If you change the logic of your DDL, then since no uid can be set by users, I am afraid not all state can be restored as you expect. [1] https://ci.apache.org/projects/fli

Re: PyFlink - Scala UDF - How to convert Scala Map in Table API?

2020-12-02 Thread Xingbo Huang
Hi Pierre, This example is written against the syntax of release-1.12, which is about to be released, and the test passed. In release-1.12, input_type can be omitted and an expression can be used directly. If you are using release-1.11, you only need to adjust the UDF syntax slightly accordingly …

Re: Primary keys go missing after using TableFunction + leftOuterJoinLateral

2020-12-02 Thread Rex Fenley
Even odder, if I pull the constructor of the function out into its own variable, it "works" (though it appears that map only passes through the fields mapped over, which means I'll need an additional join; still, now I think I'm on the right path). I.e. def splatFruits(table: Table, columnPrefix: String …

Re: Primary keys go missing after using TableFunction + leftOuterJoinLateral

2020-12-02 Thread Rex Fenley
Looks like `as` needed to move outside of where it was before to fix that error. Though now I'm receiving > org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Aliasing more fields than we actually have. Example code now: // table will always have pk id def …

Re: Primary keys go missing after using TableFunction + leftOuterJoinLateral

2020-12-02 Thread Rex Fenley
So instead I just tried changing SplatFruitsFunc to a ScalarFunction and leftOuterJoinLateral to a map, and I'm receiving: > org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Only a scalar function can be used in the map operator. Which seems odd, because the doc …

Questions regarding DDL and savepoints

2020-12-02 Thread Kevin Kwon
I have a question regarding DDLs: are they considered operators, and can they be savepointed? For example: CREATE TABLE mytable ( id BIGINT, data STRING WATERMARK(...) ) with ( connector = 'kafka' ). If I create the table like the above, save & exit, and resume the application, will the application start …
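For reference, Kevin's abbreviated snippet written out in full Flink SQL DDL syntax would look roughly like the following. The event-time column `ts` and the watermark interval are assumptions, since the original message elides them with `(...)`:

```sql
CREATE TABLE mytable (
  id BIGINT,
  data STRING,
  ts TIMESTAMP(3),                               -- hypothetical event-time column
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND   -- hypothetical watermark strategy
) WITH (
  'connector' = 'kafka'
);
```

Note that in Flink SQL the WATERMARK clause must reference a declared time attribute column, and connector options are quoted string key/value pairs.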

Primary keys go missing after using TableFunction + leftOuterJoinLateral

2020-12-02 Thread Rex Fenley
Hello, I have a TableFunction, and wherever it is applied with a leftOuterJoinLateral, my table loses any inference of there being a primary key. I see this because all subsequent joins end up with "NoUniqueKey" when I know a primary key of id should exist. I'm wondering if this is expected behavior …

Running Flink job as a rest

2020-12-02 Thread dhurandar S
Can a Flink job run as a REST server, where the Flink job listens on a port (443)? When a user calls this URL with a payload, the data goes directly to the Flink windowing function. Right now Flink can ingest data from Kafka or Kinesis, but we have a use case where we would like to push …

TextFile source && KeyedWindow triggers --> Unexpected execution order

2020-12-02 Thread ANON Marta
Hello! I have a datastream like this: env.readTextFile("events.log") .map(event => StopFactory(event)) // I have defined a Stop class and this creates an instance from the file line .assignTimestampsAndWatermarks(stopEventTimeExtractor) // extract the timestamp from a field of each instance …

Re: PyFlink - Scala UDF - How to convert Scala Map in Table API?

2020-12-02 Thread Pierre Oberholzer
Hi Xingbo, Nice! This looks a bit hacky, but shows that it can be done ;) I just got an exception preventing me from running your code, apparently from udf.py: TypeError: Invalid input_type: input_type should be DataType but contains None. Can you please check again? If the schema is defined in a .avs…

Re: Process windows not firing - > Can it be a Watermak issue?

2020-12-02 Thread Simone Cavallarin
Hi Till, Super, understood! I will also read the page at the link you provided. Thanks, and have a nice eve. Best, s

Re: Routing events to different kafka topics dynamically

2020-12-02 Thread Till Rohrmann
Hi Prasanna, I believe that what Aljoscha suggested in the linked discussion is still the best way forward. Given your description of the problem, this should actually be pretty straightforward, as you can deduce the topic from the message. Hence, you just need to create the ProducerRecord with …
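The approach Till points at, deducing the target topic from the message itself when building the ProducerRecord, boils down to logic like this sketch. Plain Python stands in for Flink's serialization schema here; the `event_type` field, the topic map, and the dead-letter fallback are all assumptions for illustration:

```python
import json

# Hypothetical routing table: event type -> Kafka topic.
TOPIC_BY_TYPE = {
    "order": "orders-topic",
    "payment": "payments-topic",
}

def to_producer_record(message: bytes):
    """Mimic a serialization schema that routes dynamically: inspect a
    field inside the message and return (topic, value), the two pieces
    a ProducerRecord needs."""
    event = json.loads(message)
    topic = TOPIC_BY_TYPE.get(event["event_type"], "dead-letter-topic")
    return (topic, message)

topic, value = to_producer_record(b'{"event_type": "order", "id": 1}')
print(topic)  # orders-topic
```

In a real Flink job the equivalent logic lives in the schema's serialize method, so no per-topic sink wiring is needed: one Kafka producer sink handles all topics.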

Re: Process windows not firing - > Can it be a Watermak issue?

2020-12-02 Thread Till Rohrmann
Hi Simone, You need to set this option because otherwise Flink will not generate watermarks. If you don't need watermarks (e.g. when using processing time), the system then has fewer records to send over the wire. That's basically how Flink has been designed [1]. With Flink 1.12 this option …

Re: Process windows not firing - > Can it be a Watermak issue?

2020-12-02 Thread Simone Cavallarin
Hi Till and David, First of all, thanks for the quick reply, really appreciated! Drilling down into the issue: 1. No session is closing because there isn't a sufficiently long gap in the test data -> It was the first thing that I thought of; before asking, I ran a test checking the gaps in my data …
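The gap check Simone describes, scanning the test data for an interval longer than the session gap, can be sketched like this. Millisecond timestamps and the 5-second gap are assumptions for the example:

```python
def session_ever_closes(timestamps_ms, gap_ms):
    """A session window only closes once two consecutive events are
    further apart than the configured session gap; otherwise the
    window keeps extending and never fires until the stream ends."""
    ts = sorted(timestamps_ms)
    return any(b - a > gap_ms for a, b in zip(ts, ts[1:]))

# Events 1s apart never exceed a 5s gap: the session never closes.
print(session_ever_closes([0, 1000, 2000, 3000], 5000))  # False
# An 8s gap between the last two events closes the session.
print(session_ever_closes([0, 1000, 9000], 5000))        # True
```

If this check returns False over the whole test file, the absence of firing windows is expected behavior rather than a watermark bug.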

Re: Uploading job jar via web UI in flink HA mode

2020-12-02 Thread sidhant gupta
Hi Till, Thanks for the clarification and suggestions. Regards, Sidhant Gupta On Wed, Dec 2, 2020, 10:10 PM Till Rohrmann wrote: > Hi Sidhant, > > Have you seen this discussion [1]? If you want to use S3, then you need to > make sure that you start your Flink processes with the appropriate > Fil…

Re: Partitioned tables in SQL client configuration.

2020-12-02 Thread Till Rohrmann
Hi Maciek, I am pulling in Timo, who might help you with this problem. Cheers, Till On Tue, Dec 1, 2020 at 6:51 PM Maciek Próchniak wrote: > Hello, > > I'm trying to configure the SQL Client to query partitioned ORC data on the local > filesystem. I have a directory structure like this: > > /tmp/table1/startd…

Re: Tracking ID in log4j MDC

2020-12-02 Thread Till Rohrmann
Hi Anil, Flink does not maintain the MDC context between threads. Hence, I don't think it is possible without changes to Flink. One note: if operators are chained, then they are run by the same thread. Cheers, Till On Wed, Dec 2, 2020 at 7:22 AM Anil K wrote: > Hi All, > > Is it possible to h…
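Till's point, that MDC state does not follow a record across operator threads, can be illustrated outside of Java: thread-local state set in one thread is simply invisible in another. Here Python's `threading.local` stands in for log4j's per-thread MDC (the `tracking_id` key is a hypothetical example):

```python
import threading

mdc = threading.local()  # stand-in for log4j's per-thread MDC map

def handler(out):
    # A different thread (think: the next, non-chained operator) does
    # not see the tracking id set on the caller's thread.
    out.append(getattr(mdc, "tracking_id", None))

mdc.tracking_id = "req-42"  # set in the main thread only
seen = []
t = threading.Thread(target=handler, args=(seen,))
t.start(); t.join()
print(seen)  # [None]: the id did not cross the thread boundary
```

This is why chained operators (same thread) see the same MDC while a record handed to another task's thread arrives without it: the context would have to be carried alongside the record explicitly.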

Re: Uploading job jar via web UI in flink HA mode

2020-12-02 Thread Till Rohrmann
Hi Sidhant, Have you seen this discussion [1]? If you want to use S3, then you need to make sure that you start your Flink processes with the appropriate FileSystemProvider for S3 [2]. So the problem you are seeing is most likely caused by the JVM not knowing an S3 file system implementation. Be a…
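For reference, making the JVM aware of an S3 file system is done through Flink's plugins mechanism, by placing the bundled S3 filesystem jar into its own plugin directory before starting the processes. This is a configuration sketch assuming a standard Flink distribution layout (`s3-fs-presto` is the alternative to `s3-fs-hadoop`):

```shell
# From the Flink distribution root: expose the S3 filesystem as a plugin
# so that JobManager and TaskManager processes can resolve s3:// URIs.
mkdir -p ./plugins/s3-fs-hadoop
cp ./opt/flink-s3-fs-hadoop-*.jar ./plugins/s3-fs-hadoop/
```

Both JobManager and TaskManager hosts need this, since in HA mode the JobManagers read and write the uploaded artifacts and checkpoint metadata.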

Routing events to different kafka topics dynamically

2020-12-02 Thread Prasanna kumar
Hi, Events need to be routed to different Kafka topics dynamically, based on some info in the message. We have implemented this using KeyedSerializationSchema, similar to https://stackoverflow.com/questions/49508508/apache-flink-how-to-sink-events-to-different-kafka-topics-depending-on-the-even. But i…

Re: test (harness and minicluster)

2020-12-02 Thread Till Rohrmann
Hi Martin, In general, Flink's MiniCluster should be able to run every complete Flink JobGraph. However, from what I read, you are looking for a test harness for a ProcessWindowFunction, so that you can test this function in a more unit-test style, right? What you can do is use the OneInputStream…

Re: Application Mode support on VVP v2.3

2020-12-02 Thread Till Rohrmann
Hi Narasimha, thanks for reaching out to the community. I am not entirely sure whether VVP 2.3 supports the application mode. Since this is a rather new feature, it could be that it has not been integrated yet. I am pulling in Ufuk and Fabian, who should be able to answer your question definitively.

Application Mode support on VVP v2.3

2020-12-02 Thread narasimha
Hi, Using the Ververica platform to deploy Flink jobs, I found that it does not support the application deployment mode. I just want to check whether this is expected. Below is a brief sketch of how the main method is composed. class Job1 { public void execute() { StreamExecutionEnvironment env = ...

test (harness and minicluster)

2020-12-02 Thread Martin Frank Hansen
Hi, I am trying to build a test suite for our Flink cluster using the test harness and MiniCluster. As we use ProcessWindowFunctions in our pipeline, we need good ways of validating these functions. To my surprise, ProcessWindowFunctions are supported neither by the test harness nor by MiniCluster setups, …

Uploading job jar via web UI in flink HA mode

2020-12-02 Thread sidhant gupta
Hi All, I have 2 JobManagers in a Flink HA cluster setup. I have a load balancer forwarding requests to both (leader and standby) JobManagers in the default round-robin fashion. While uploading the job jar, the web UI fluctuates between the leader and standby pages. It's difficult to upload …

Re: Performance consequence of leftOuterJoinLateral

2020-12-02 Thread Rex Fenley
Yes, exactly. Thanks! On Tue, Dec 1, 2020 at 6:11 PM Danny Chan wrote: > Hi, Rex ~ > > For "leftOuterJoinLateral" do you mean joining a table function through a > lateral table? > If so, yes, the complexity is O(1) for each probe key of the LHS. The table > function evaluates the extra columns and app…