Re: Use Flink for OLAP

2021-11-04 Thread Ww J
Thank you for the information! This is a very interesting topic. At last year's Flink Forward conference, there was an interesting talk about Hologres. https://www.flink-forward.org/sf-2020/conference-program#data-warehouse--data-lakes--what-s-next-

Re: Use Flink for OLAP

2021-11-04 Thread Caizhi Weng
Hi! Flink is a distributed, stateful streaming dataflow engine (with optimizations for batch and OLAP jobs too) and it currently does not ship with any storage system. It needs to be used along with external storage / computation systems like HDFS, Hive, Kafka, Iceberg, etc. to build a data warehouse.

Re: Use Flink for OLAP

2021-11-04 Thread Ww J
Thanks. Can Flink replace the popular OLAP databases, for example, Amazon Redshift? It seems to me that Flink is generally used as ETL for OLAP. > On Nov 4, 2021, at 9:33 PM, Caizhi Weng wrote: > > Hi! > > Yes you can. Note that it is recommended to run Flink in session cluster mode > (instead of

Re: Use Flink for OLAP

2021-11-04 Thread Caizhi Weng
Hi! Yes you can. Note that it is recommended to run Flink in session cluster mode (instead of per-job mode) to minimize deployment and scheduling time for each OLAP query. Ww J wrote on Fri, Nov 5, 2021 at 12:30 PM: > Hi, > > Can Flink be used for OLAP queries? > > Thanks, > > Jack >
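A minimal sketch of the session-mode workflow recommended above, assuming a local standalone Flink distribution (the paths and the jar name are illustrative, not from the thread):

```sh
# Start a long-running session cluster once; its JobManager and TaskManagers
# stay up between jobs
./bin/start-cluster.sh

# Submit each OLAP query as a job to the running session, avoiding the
# per-job cluster startup and scheduling overhead (jar name is hypothetical)
./bin/flink run -d ./examples/table/MyOlapQuery.jar
```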

Use Flink for OLAP

2021-11-04 Thread Ww J
Hi, Can Flink be used for OLAP queries? Thanks, Jack

How to write unit test for stateful operators in Pyflink apps

2021-11-04 Thread Long Nguyễn
Hi. I'm using PyFlink and the Table API to create a window with Python. I have just read the Unit Testing Stateful or Timely UDFs & Custom Operators section

Re: GenericWriteAheadSink, declined checkpoint for a finished source

2021-11-04 Thread Caizhi Weng
Hi! Thanks Austin for the answer. I agree that FLIP-147 has solved the problem; just set execution.checkpointing.checkpoints-after-tasks-finish.enabled to true to enable this feature. The JDBC sink solves this problem in a different way: it flushes the sink when closed (see JdbcOutputFormat#close [1]
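The option named above can be sketched as a flink-conf.yaml entry (available since Flink 1.14 via FLIP-147):

```yaml
# Allow checkpointing to continue after some tasks
# (e.g. bounded/finished sources) have completed
execution.checkpointing.checkpoints-after-tasks-finish.enabled: true
```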

Re: What is Could not retrieve file from transient blob store?

2021-11-04 Thread Guowei Ma
Hi, John. Thanks for your update. From the last 4 lines of the log it seems that some TM was lost, so it is likely that the stopped TM caused the log retrieval to fail. Best, Guowei On Thu, Nov 4, 2021 at 10:10 PM John Smith wrote: > No, I guess it's stable. > > 2021-11-02 22:41:08,276

Re: Need help with window TopN query

2021-11-04 Thread JING ZHANG
Sorry for the late response. Martijn and Francesco have already given good advice for finding the problem. I have only one minor piece of supplementary information: window rank/join/aggregate operations emit results only after the end of the window. Since the window size is 24 hours, is it possible that the first window

Re: IterativeCondition instead of SimpleCondition not matching pattern

2021-11-04 Thread Seth Wiesman
In general I would strongly encourage you to find a way to `key` your stream, it will make everything much simpler. On Thu, Nov 4, 2021 at 6:05 PM Seth Wiesman wrote: > Why not? > > All those classes have a Symbol attribute, why can't you use that to key > the stream? > > On Thu, Nov 4, 2021 at

Re: IterativeCondition instead of SimpleCondition not matching pattern

2021-11-04 Thread Seth Wiesman
Why not? All those classes have a Symbol attribute, why can't you use that to key the stream? On Thu, Nov 4, 2021 at 5:51 PM Isidoros Ioannou wrote: > Hi Seth, > > thank you for your answer. > In this case you are right and it would solve my problem. but actually my > case is a bit more complex

Re: IterativeCondition instead of SimpleCondition not matching pattern

2021-11-04 Thread Isidoros Ioannou
Hi Seth, thank you for your answer. In this case you are right and it would solve my problem, but actually my case is a bit more complex; my mistake, I wanted to provide a simple example. The actual case is: I have DataStream<ServerAwareMessage> inputStream as a source. Message is just an

Re: IterativeCondition instead of SimpleCondition not matching pattern

2021-11-04 Thread Isidoros Ioannou
I am using Processing time characteristic. DataStream inputStream = env.fromElements( Model.of(1, "A", "US"), Model.of(2, "B", "US"), Model.of(3, "C", "US"), Model.of(4, "A", "AU"), Model.of(5, "B", "AU"), Model.of(6, "C", "AU

Re: IterativeCondition instead of SimpleCondition not matching pattern

2021-11-04 Thread Seth Wiesman
Hi Isidoros, If you want the pattern to be scoped to symbols, I suggest you use a `keyBy` on your stream. Constructing the pattern will then look like this: KeyedStream keyedInput = inputStream.keyBy(model -> model.getSymbol); PatternStream marketOpenPatternStream = CEP.pattern(keyedInput, pattern
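The effect of that suggestion can be illustrated without the Flink runtime: keying by symbol means each downstream pattern instance only ever sees one symbol's events. A minimal plain-Java sketch of that partitioning (the Model fields are assumptions inferred from the thread, not Flink API):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class KeyScopingSketch {
    // Hypothetical stand-in for the Model class mentioned in the thread.
    record Model(int id, String symbol, String region) {}

    // Mimics what keyBy(model -> model.getSymbol()) achieves: CEP pattern
    // matching downstream is scoped to a single symbol's event stream.
    static Map<String, List<Model>> keyBySymbol(List<Model> events) {
        return events.stream().collect(Collectors.groupingBy(Model::symbol));
    }

    public static void main(String[] args) {
        List<Model> events = List.of(
                new Model(1, "A", "US"), new Model(2, "B", "US"),
                new Model(3, "C", "US"), new Model(4, "A", "AU"));
        // Symbol "A" appears twice, so its keyed partition holds two events.
        System.out.println(keyBySymbol(events).get("A").size()); // 2
    }
}
```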

Re: IterativeCondition instead of SimpleCondition not matching pattern

2021-11-04 Thread Austin Cawley-Edwards
Thanks for the update, the requirements make sense. Some follow-up questions: * What time characteristic are you using? Processing or Event? * Can you describe a bit more what you mean by "input like the one I have commented below"? What is special about the one you have commented? Best, Austin

Fwd: IterativeCondition instead of SimpleCondition not matching pattern

2021-11-04 Thread Isidoros Ioannou
-- Forwarded message - From: Isidoros Ioannou Date: Thu, Nov 4, 2021 at 10:01 PM Subject: Re: IterativeCondition instead of SimpleCondition not matching pattern To: Austin Cawley-Edwards Hi Austin, thank you for your answer and I really appreciate your willingness to help. Ac

Re: GenericWriteAheadSink, declined checkpoint for a finished source

2021-11-04 Thread Austin Cawley-Edwards
Hi James, You are correct that since Flink 1.14 [1] (which included FLIP-147 [2]) there is support for checkpointing after some tasks have finished, which sounds like it will solve this use case. You may also want to look at the JDBC sink [3], which also supports batching, as well as some other nice

Re: IterativeCondition instead of SimpleCondition not matching pattern

2021-11-04 Thread Austin Cawley-Edwards
Hi Isidoros, Thanks for reaching out to the mailing list. I haven't worked with the CEP library in a long time but can try to help. I'm having a little trouble understanding the desired output + rules. Can you mock up the desired output like you have for the fulfilled pattern sequence? Best, Austin

Re: Need help with window TopN query

2021-11-04 Thread Francesco Guardiani
As a rule of thumb, I would first check that Flink correctly ingests your CSV. Perhaps try to run just a SELECT on your input and see if the input is parsed as expected and is ordered. On Thu, Nov 4, 2021 at 12:47 PM Martijn Visser wrote: > Hi Pavel, > > There's a Flink SQL recipe in the
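A hedged sketch of that sanity check, using the visits table from the original question (column names taken from its DDL):

```sql
-- Inspect the raw parse of the CSV before layering the Top-N logic on top
SELECT user_id, ts FROM visits;
```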

Re: Implementing a Custom Source Connector for Table API and SQL

2021-11-04 Thread Krzysztof Chmielewski
Sure. I have a connector that uses HTTP REST calls to a 3rd-party system to get some data based on a URL and parameters. The idea is to make it available to Flink SQL in order to use it like SELECT * FROM T WHERE t.id = 123. I would like to have two streams: one would be from T, and the second one wou

Re: Implementing a Custom Source Connector for Table API and SQL

2021-11-04 Thread Ingo Bürk
Hi Krzysztof, the new, unified Source interface can only work as a scan source. Could you maybe elaborate a bit on the connector implementation you have and how you intend to have it work as a lookup source? Best Ingo On Thu, Nov 4, 2021 at 4:11 PM Krzysztof Chmielewski < krzysiek.chmielew...@g

Re: Implementing a Custom Source Connector for Table API and SQL

2021-11-04 Thread Krzysztof Chmielewski
Thanks Fabian and Ingo, yes I forgot to add the reference links, so here they are: [1] https://flink.apache.org/2021/09/07/connector-table-sql-api-part1.html [2] https://flink.apache.org/2021/09/07/connector-table-sql-api-part2 [3] https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/dat

Re: Implementing a Custom Source Connector for Table API and SQL

2021-11-04 Thread Ingo Bürk
Hi Krzysztof, instead of SourceFunctionProvider you need to use SourceProvider. If you look at the filesystem connector you can see an example of that, too (FileSystemTableSource#createSourceProvider). Best Ingo On Thu, Nov 4, 2021 at 3:48 PM Krzysztof Chmielewski < krzysiek.chmielew...@gmail.c

Re: Implementing a Custom Source Connector for Table API and SQL

2021-11-04 Thread Fabian Paul
Hi Krzysztof, It is great to hear that you have implemented your source with the unified Source interface. To integrate your source into the Table API you can use the SourceProvider. You can take a look at how our FileSource does it [1]. Btw I think you forgot to add your references ;) Best, Fabian

Implementing a Custom Source Connector for Table API and SQL

2021-11-04 Thread Krzysztof Chmielewski
Hi, I was wondering if it is possible to implement a Source Table connector like the one described in [1][2] with a custom source that implements the new Source interface [3] and not a SourceFunction. I already have my custom source, but when I'm trying to implement a Table Source from LookupTableSource

Re: What is Could not retrieve file from transient blob store?

2021-11-04 Thread John Smith
No, I guess it's stable. 2021-11-02 22:41:08,276 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 2021-11-02 22:41:08,292 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Sta

Re: Need help with window TopN query

2021-11-04 Thread Martijn Visser
Hi Pavel, There's a Flink SQL recipe in the Flink SQL Cookbook for a Window TopN, see https://github.com/ververica/flink-sql-cookbook/blob/main/aggregations-and-analytics/11_window_top_n/11_window_top_n.md. I think that could help you with your use case too. Best regards, Martijn On Thu, 4 Nov

Re: Need help with window TopN query

2021-11-04 Thread Pavel Penkov
When the query is changed to

SELECT user_id, ts, rownum FROM (
  SELECT user_id, ts,
    ROW_NUMBER() OVER (PARTITION BY window_start, window_end, user_id ORDER BY ts ASC) AS rownum
  FROM TABLE(TUMBLE(TABLE visits, DESCRIPTOR(ts), INTERVAL '24' HOURS))
) WHERE rownum = 1

it runs but doesn't produce a

Re: Custom partitioning of keys with keyBy

2021-11-04 Thread naitong Xiao
The keyBy argument function is a deterministic function; under the same max parallelism it makes sure the key group is always the same for the same key. My goal here is to distribute keys evenly among operators even with different parallelism. I implemented the mapping in a different way, by exhausting
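For reference, the key-to-subtask mapping under discussion can be sketched in plain Java. This is a simplified take on Flink's KeyGroupRangeAssignment (real Flink additionally applies a murmur hash to key.hashCode() before the modulo, omitted here):

```java
public class KeyGroupSketch {
    // Simplified key-group assignment: deterministic for a fixed maxParallelism,
    // so the same key always lands in the same key group.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.abs(key.hashCode() % maxParallelism);
    }

    // Which operator subtask a key group lands on for a given parallelism:
    // key groups 0..maxParallelism-1 are split into contiguous ranges.
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128, parallelism = 4;
        // With 128 key groups and parallelism 4, each subtask owns 32 groups.
        System.out.println(operatorIndexFor(0, maxParallelism, parallelism));   // 0
        System.out.println(operatorIndexFor(64, maxParallelism, parallelism));  // 2
        System.out.println(operatorIndexFor(127, maxParallelism, parallelism)); // 3
    }
}
```

Because the subtask index depends on the current parallelism while the key group does not, rescaling only moves whole key-group ranges between subtasks.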

Table DataStream Conversion Lost Watermark

2021-11-04 Thread Yunfeng Zhou
Hi, I found that if I convert a DataStream into a Table and back into a DataStream, the watermark of the stream is lost. As shown in the program below, the TestOperator before the conversion has its processWatermark() method triggered and the watermark value printed, but the one after the conversion

Re: Need help with window TopN query

2021-11-04 Thread Francesco Guardiani
I think the issue here is that the nested SELECT is selecting all the fields produced by the TVF, including window_time (which is implicitly added by the TVF as described here). Because of

Re: NoResourceAvailableException on taskmanager(s)

2021-11-04 Thread Yangze Guo
Hi, Deniz. The exception implies that there are not enough slots in your standalone cluster. You need to increase `taskmanager.numberOfTaskSlots` or the number of TaskManagers. You can search for the related log "Received resource requirements from job" in the JobManager, which indicates how many slots
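A minimal flink-conf.yaml sketch of the knob mentioned above (the value 4 is illustrative; total available slots = number of TaskManagers × slots per TaskManager):

```yaml
# Each TaskManager offers this many parallel slots to the scheduler
taskmanager.numberOfTaskSlots: 4
```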

Need help with window TopN query

2021-11-04 Thread Pavel Penkov
I'm trying to express a supposedly simple query with Flink SQL: log the first visit of the day for each user. The source table is defined like CREATE TABLE visits (user_id int, ts timestamp(3), WATERMARK FOR ts AS ts) WITH ('connector' = 'filesystem', 'path' = 'file:///visits.csv', 'format' = 'csv') The
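The cookbook-style Window Top-N shape discussed elsewhere in this thread can be sketched as follows (column names from the DDL above; the 24-hour tumbling window matches the "first visit of the day" requirement):

```sql
-- Keep only the earliest visit per user inside each 24-hour tumbling window
SELECT user_id, ts
FROM (
  SELECT user_id, ts,
         ROW_NUMBER() OVER (
           PARTITION BY window_start, window_end, user_id
           ORDER BY ts ASC) AS rownum
  FROM TABLE(TUMBLE(TABLE visits, DESCRIPTOR(ts), INTERVAL '24' HOURS))
)
WHERE rownum = 1;
```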

Re: Data Stream countWindow followed by keyBy does not preserve time order

2021-11-04 Thread Yan Shen
Thanks for the advice. I created this issue: https://issues.apache.org/jira/browse/FLINK-24767 On Thu, Nov 4, 2021 at 2:44 PM Guowei Ma wrote: > Hi Yan > After a second thought I think you are right, the downstream operator > should keep the order of the same key from the same upstream. So feel

NoResourceAvailableException on taskmanager(s)

2021-11-04 Thread Deniz Koçak
Hi, We have been running our job on Flink image 1.13.2-stream1-scala_2.12-java11. It's a standalone deployment on a Kubernetes cluster (EKS on AWS, which uses EC2 nodes as hosts and also depends on an auto-scaler to adjust resources cluster-wide). After a few minutes (5-20) we see the exception below

RE: Custom partitioning of keys with keyBy

2021-11-04 Thread Schwalbe Matthias
Hi Yuval, … I had to do some guesswork with regard to your use case … it is still not exactly clear what you want to achieve; however, I remember having done something similar in that area 2 years ago. Unfortunately I cannot find the implementation anymore ☹ * If you tried a combination of .parti

Fwd: IterativeCondition instead of SimpleCondition not matching pattern

2021-11-04 Thread Isidoros Ioannou
I face an issue when trying to match some elements in a Pattern sequence, Flink version 1.11.1. Here is my case: final StreamExecutionEnvironment env = EnvironmentProvider.getEnvironment(); DataStream inputStream = env.fromElements( Model.of(1, "A", "US"), Model.of(2, "B", "US

Re: Upgrading to the new KafkaSink + Flink 1.14 from Flink 1.13.2 + FlinkKafkaProducer

2021-11-04 Thread Fabian Paul
Hi Yuval, We tried to reproduce your behaviour but unfortunately did not succeed. It seems you can reliably reproduce the problem. Do you think there is any way to break the problem down and share with us the problematic case so that we can reproduce it? Best, Fabian

Re: What is Could not retrieve file from transient blob store?

2021-11-04 Thread Guowei Ma
>>> Ok I missed the log below. I guess this happened when the task manager was stopped. I think if the TM stopped you also would not get the log, but it would throw another "UnknownTaskExecutorException", which would include something like “No TaskExecutor registered under ”. >>> But I guess it's ok

Re: Custom partitioning of keys with keyBy

2021-11-04 Thread Yuval Itzchakov
Thank you Schwalbe, David and Naitong for your answers! *David*: This is what we're currently doing ATM, and I wanted to know if there's a simpler approach. This is what we have so far: https://gist.github.com/YuvalItzchakov/9441a4a0e80609e534e69804e94cb57b *Naitong*: The keyBy intern