Questions about checkpoint retention

2022-01-26 Thread Chen-Che Huang
Hi all, To minimize the recovery time from failure, we employ incremental, retained checkpoints with `state.checkpoints.num-retained` set to 10 in our Flink apps. With this setting, Flink automatically creates new checkpoints regularly and keeps only the latest 10. Besides, for app upgra
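As a sketch, the retention setup described above corresponds to configuration along these lines in flink-conf.yaml (the checkpoint directory and backend choice here are placeholders, not taken from the thread):

```yaml
# Keep the last 10 completed checkpoints instead of only the latest one.
state.checkpoints.num-retained: 10
# Incremental checkpoints (RocksDB backend) upload only changed state.
state.backend: rocksdb
state.backend.incremental: true
# Placeholder path; point this at your durable storage.
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
```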

Re: Is it possible to support many different windows for many different keys?

2022-01-26 Thread Alexander Fedulov
> Again, thank you for your input. You are welcome. I want the stream element to define the window. Got it, that was the missing bit of detail. That is also doable - not with the Windows API, but with the lower-level ProcessFunction. Check out my blog post [1], especially its third part [
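The lower-level ProcessFunction route suggested above can be approximated by per-element window bookkeeping: each event carries its own window bounds, and state maps (key, window) to an aggregate. A plain-Java sketch of that bucketing logic, with no Flink dependencies and illustrative names (in a real KeyedProcessFunction this would be keyed state plus an event-time timer firing at each window's end):

```java
import java.util.*;

// Each event defines its own window; aggregate values into
// per-(key, window) buckets, as a KeyedProcessFunction would.
public class ElementDefinedWindows {
    public record Event(String key, long windowStart, long windowEnd, double value) {}

    public static Map<String, Double> aggregate(List<Event> events) {
        Map<String, Double> buckets = new TreeMap<>();
        for (Event e : events) {
            // Bucket id combines the key with the element-defined bounds.
            buckets.merge(e.key() + "@[" + e.windowStart() + "," + e.windowEnd() + ")",
                          e.value(), Double::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        Map<String, Double> out = aggregate(List.of(
            new Event("a", 0, 60, 1.0),   // "a" uses a 60-unit window
            new Event("a", 0, 60, 2.5),
            new Event("b", 0, 15, 4.0))); // "b" defines a shorter one
        System.out.println(out);
    }
}
```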

Re: Is it possible to support many different windows for many different keys?

2022-01-26 Thread Marco Villalobos
Hi Alexander, Thank you for responding. The solution you proposed uses statically defined windows. What I need are dynamically created windows determined by metadata in the stream element. I want the stream element to define the window. That's what I'm trying to research, or an alternate sol

Re: Is it possible to support many different windows for many different keys?

2022-01-26 Thread Alexander Fedulov
Hi Marco, Not sure if I get your problem correctly, but you can process those windows on data "split" from the same input within the same Flink job. Something along these lines: DataStream stream = ... DataStream a = stream.filter( /* time series name == "a" */); a.keyBy(...).window(TumblingEvent
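The "split the input by time series name" idea in this reply can be sketched without Flink: partition records by name, where each group would become its own filtered DataStream, free to use a different window. The `Record` type and names below are illustrative stand-ins:

```java
import java.util.*;
import java.util.stream.*;

// Plain-Java sketch: group records by time-series name, mirroring
// stream.filter(name == "a") producing one sub-stream per name.
public class SplitByName {
    public record Record(String name, double value) {}

    public static Map<String, List<Record>> split(List<Record> input) {
        return input.stream().collect(Collectors.groupingBy(Record::name));
    }

    public static void main(String[] args) {
        List<Record> input = List.of(
            new Record("a", 1.0), new Record("b", 2.0), new Record("a", 3.0));
        // "a" gets two records, "b" gets one.
        System.out.println(split(input).get("a").size());
    }
}
```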

Unbounded streaming with table API and large json as one of the columns

2022-01-26 Thread HG
Hi, I need to calculate elapsed times between steps of a transaction. Each step is an event. All steps belonging to a single transaction have the same transaction id. Every event has a handling time. All information is part of a large JSON structure. But I can have the incoming source supply trans
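The elapsed-time requirement described here (events of one transaction share a transaction id; order each transaction's events by handling time and diff consecutive steps) can be sketched in plain Java. In Flink this maps to keyBy on the transaction id plus a stateful function; all names below are illustrative:

```java
import java.util.*;
import java.util.stream.*;

// Sketch: compute elapsed times between consecutive steps of one
// transaction, ordered by handling time.
public class Elapsed {
    public record Step(String txnId, long handlingTime) {}

    public static List<Long> elapsedTimes(List<Step> steps, String txnId) {
        List<Long> times = steps.stream()
            .filter(s -> s.txnId().equals(txnId))
            .map(Step::handlingTime)
            .sorted()
            .collect(Collectors.toList());
        List<Long> deltas = new ArrayList<>();
        for (int i = 1; i < times.size(); i++) {
            deltas.add(times.get(i) - times.get(i - 1));
        }
        return deltas;
    }

    public static void main(String[] args) {
        List<Step> steps = List.of(
            new Step("t1", 100), new Step("t1", 105), new Step("t1", 112),
            new Step("t2", 200)); // other transactions are ignored
        System.out.println(elapsedTimes(steps, "t1")); // [5, 7]
    }
}
```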

Is it possible to support many different windows for many different keys?

2022-01-26 Thread Marco Villalobos
Hi, I am working with time series data in the form of (timestamp, name, value), and an event time that is the timestamp when the data was published onto Kafka, and I have a business requirement in which each stream element becomes enriched, and then processing requires different time series names

Re: Failure Restart Strategy leads to error

2022-01-26 Thread Siddhesh Kalgaonkar
Hi Yun and Oran, Thanks for your time. Much appreciated! Below are my configs: val env = StreamExecutionEnvironment.getExecutionEnvironment env.setParallelism(1) env.enableCheckpointing(2000) //env.setDefaultSavepointDirectory("file:home/siddhesh/Desktop/savepoints/") env.getCheck
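For reference, a fixed-delay restart strategy can also be set in flink-conf.yaml rather than in code; the attempt count and delay below are illustrative values, not taken from the thread:

```yaml
# Retry a failed job 3 times, waiting 10 s between attempts.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
```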

Re: Reading performance - Kafka VS FileSystem

2022-01-26 Thread Yun Tang
Hi Jasmin, From my knowledge, it seems no big company would adopt a pure file system source as the main data source of Flink. We would in general choose a message queue, e.g. Kafka, as the data source. Best, Yun Tang From: Jasmin Redžepović Sent: Wednesday, Janu

Re: Reading performance - Kafka VS FileSystem

2022-01-26 Thread Jasmin Redžepović
Also, what would you recommend? I have both options available: * Kafka - protobuf messages * S3 - messages copied from Kafka for persistence with the Kafka Connect service On 26.01.2022, at 14:43, Jasmin Redžepović wrote: Hello Flink commit

Re: create savepoint on bounded source in streaming mode

2022-01-26 Thread Shawn Du
DataStream API -- Sender:Guowei Ma Sent At:2022 Jan. 26 (Wed.) 21:51 Recipient:Shawn Du Cc:user Subject:Re: create savepoint on bounded source in streaming mode Hi, Shawn Thank you for your sharing. Unfortunately I do not

Re: create savepoint on bounded source in streaming mode

2022-01-26 Thread Guowei Ma
Hi, Shawn Thank you for your sharing. Unfortunately I do not think there is an easy way to achieve this now. Actually we have a customer who has the same requirement, but the scenario is a little different. The bounded and unbounded pipelines have some differences, but the customer wants to reuse some s

Reading performance - Kafka VS FileSystem

2022-01-26 Thread Jasmin Redžepović
Hello Flink committers :) Just one short question: How is performance of reading from Kafka source compared to reading from FileSystem source? I would be very grateful if you could provide a short explanation. I saw in documentation that both provide exactly-once semantics for streaming, but t

Re: create savepoint on bounded source in streaming mode

2022-01-26 Thread Shawn Du
right! -- Sender:Guowei Ma Sent At:2022 Jan. 26 (Wed.) 19:50 Recipient:Shawn Du Cc:user Subject:Re: create savepoint on bounded source in streaming mode Hi,Shawn You want to use the correct state(n-1) for day n-1 and the fu

Re: create savepoint on bounded source in streaming mode

2022-01-26 Thread Guowei Ma
Hi, Shawn You want to use the correct state(n-1) for day n-1 and the full amount of data for day n to produce the correct state(n) for day n. Then use state(n) to initialize a job to process the data for day n+1. Am I understanding this correctly? Best, Guowei Shawn Du wrote on Wed, Jan 26, 2022 at 7:15 PM:

Resolving a CatalogTable

2022-01-26 Thread Balázs Varga
Hi everyone, I'm trying to migrate from the old set of CatalogTable related APIs (CatalogTableImpl, TableSchema, DescriptorProperties) to the new ones (CatalogBaseTable, Schema and ResolvedSchema, CatalogPropertiesUtil), in a custom catalog. The catalog stores table definitions, and the current l

Re: create savepoint on bounded source in streaming mode

2022-01-26 Thread Shawn Du
Hi Guowei, consider the case: we have one streaming application built with Flink, but for various reasons the events may be disordered or delayed terribly. We want to replay the data day by day (the replayed data is processed as if reordered). It looks like a batch job but with

Re: How to run in IDE?

2022-01-26 Thread Chesnay Schepler
We will need more of the log's contents to help you (preferably the whole thing). On 25/01/2022 23:55, John Smith wrote: I'm using: final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); But no go. On Mon, 24 Jan 2022 at 16:35, John Smith wrote: Hi u

Flink-ML: Sink model data in online training

2022-01-26 Thread thekingofcity
Hi, I want to sink the model data (the coefficients from the logistic regression model in my case) from the flink.ml.api.Model to print or to a file. I figured out how to sink it in batch training mode, but face the following exception when the Estimator takes an UNBOUNDED datastream. ``` Caused by:

Re: Failure Restart Strategy leads to error

2022-01-26 Thread Yun Tang
Hi Siddhesh, The root cause is that the group.id configuration is missing for the Flink program. The restart strategy configuration has no relationship with this. I think you should pay attention to Kafka-related configurations. Best, Yun Tang From: S
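As this reply notes, the fix lives in the Kafka source configuration rather than in the restart strategy. A minimal sketch of the consumer properties involved, with placeholder values:

```properties
# Placeholder broker address and group id; adjust for your cluster.
bootstrap.servers=localhost:9092
group.id=my-flink-consumer-group
auto.offset.reset=earliest
```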

Re: create savepoint on bounded source in streaming mode

2022-01-26 Thread Shawn Du
cool! HybridSource seems much closer to my requirements. Thanks Dawid, I will have a try. Shawn -- Sender: Dawid Wysakowicz Sent At: 2022 Jan. 26 (Wed.) 15:49 Recipient: user Subject: Re: create savepoint on bound