Re: DROOLS rule engine with flink

2020-06-24 Thread Jörn Franke
I would implement them directly in Flink/the Flink Table API. I don’t think Drools plays well in this distributed scenario. It expects a centralized rule store and evaluation. > On 23.06.2020 at 21:03, Jaswin Shah wrote: > >  > Hi I am thinking of using some rule engine like DROOLS with flink t
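For simple cases, "rules" can be plain predicates evaluated per record inside a Flink map/process function, which distributes naturally with the stream. A deliberately Flink-free sketch of the idea; the `Transaction` type and the rule conditions are invented for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: rules as plain predicates, evaluated per record inside
// a Flink map/process function instead of a centralized Drools session.
// Transaction and the rule conditions are invented for illustration.
public class SimpleRules {
    static class Transaction {
        final String userId;
        final double amount;
        Transaction(String userId, double amount) {
            this.userId = userId;
            this.amount = amount;
        }
    }

    // In a real job this list could be updated at runtime, e.g. via a
    // broadcast stream carrying rule definitions.
    static final List<Predicate<Transaction>> RULES = Arrays.asList(
            t -> t.amount > 10_000,          // large transaction
            t -> t.userId.startsWith("tmp")  // throwaway account
    );

    static boolean matchesAnyRule(Transaction t) {
        for (Predicate<Transaction> rule : RULES) {
            if (rule.test(t)) {
                return true;
            }
        }
        return false;
    }
}
```

Keeping the rules as data (rather than a Drools session) is what lets each parallel subtask evaluate them independently.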

Use CheckpointedFunction interface on Custom Window Trigger

2020-06-24 Thread KristoffSC
Hi all, is it possible that a Custom Window Trigger (extending the Trigger class) will also implement CheckpointedFunction? In my custom Trigger I have complicated logic executed in the Trigger::onElement method. Currently I'm using TriggerContext.getPartitionedState to do all reads and writes for data man
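As far as I know, a Trigger does not need CheckpointedFunction: state obtained through TriggerContext.getPartitionedState is keyed, window-scoped state that is included in checkpoints automatically (the CheckpointedFunction callbacks are only invoked for user functions, not for triggers). A minimal sketch of that pattern; the count threshold of 100 is an invented example:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

// Sketch: trigger state kept via TriggerContext.getPartitionedState; it
// participates in checkpoints automatically, so no CheckpointedFunction needed.
public class CountingTrigger extends Trigger<Object, TimeWindow> {

    private final ValueStateDescriptor<Long> countDesc =
            new ValueStateDescriptor<>("element-count", Long.class);

    @Override
    public TriggerResult onElement(Object element, long timestamp,
                                   TimeWindow window, TriggerContext ctx) throws Exception {
        ValueState<Long> count = ctx.getPartitionedState(countDesc);
        long newCount = (count.value() == null ? 0L : count.value()) + 1;
        count.update(newCount);
        return newCount >= 100 ? TriggerResult.FIRE_AND_PURGE : TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) {
        ctx.getPartitionedState(countDesc).clear();
    }
}
```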

Re: Session Window with Custom Trigger

2020-06-24 Thread KristoffSC
I think I've figured it out. I switched to GlobalWindow with my custom trigger. My Trigger combines processingTime trigger logic and onElement trigger logic. Only one should be executed in scope of a particular window. I managed to do this by returning FIRE_AND_PURGE and clearing all timers and state
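The pattern described above (GlobalWindows plus a trigger that combines a processing-time timeout with an element-based condition, where whichever path fires first purges the window and removes the other path's timer and state) could be sketched roughly like this; the firing condition and the timeout value are placeholders:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;

// Illustrative sketch: either the element-based condition or a processing-time
// timeout fires; whichever fires first purges the window and cleans up the
// pending timer. Condition and timeout are placeholders.
public class ElementOrTimeoutTrigger extends Trigger<Object, GlobalWindow> {
    private static final long TIMEOUT_MS = 60_000;

    private final ValueStateDescriptor<Long> timerDesc =
            new ValueStateDescriptor<>("pending-timer", Long.class);

    @Override
    public TriggerResult onElement(Object element, long ts,
                                   GlobalWindow window, TriggerContext ctx) throws Exception {
        ValueState<Long> pendingTimer = ctx.getPartitionedState(timerDesc);
        if (pendingTimer.value() == null) {
            long timer = ctx.getCurrentProcessingTime() + TIMEOUT_MS;
            ctx.registerProcessingTimeTimer(timer);
            pendingTimer.update(timer);
        }
        if (shouldFire(element)) {   // placeholder element-based condition
            clear(window, ctx);      // drop the pending timeout timer
            return TriggerResult.FIRE_AND_PURGE;
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, GlobalWindow window,
                                          TriggerContext ctx) throws Exception {
        // The timer already fired, only the bookkeeping state needs clearing.
        ctx.getPartitionedState(timerDesc).clear();
        return TriggerResult.FIRE_AND_PURGE;
    }

    @Override
    public TriggerResult onEventTime(long time, GlobalWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(GlobalWindow window, TriggerContext ctx) throws Exception {
        ValueState<Long> pendingTimer = ctx.getPartitionedState(timerDesc);
        if (pendingTimer.value() != null) {
            ctx.deleteProcessingTimeTimer(pendingTimer.value());
            pendingTimer.clear();
        }
    }

    private boolean shouldFire(Object element) {
        return false; // replace with the real element-based condition
    }
}
```

Explicit cleanup in the trigger matters here because a GlobalWindow is never garbage-collected by the window operator.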

Re: State leak

2020-06-24 Thread Rafi Aroch
Hi Ori, Once a session ends, its state should get purged. You should take care that a session does end. For example, if you wait for a 'session-end' event, limit it with some time threshold. If it's defined with an inactivity gap and your client sends infinite events, you could limit the session len

Re: DROOLS rule engine with flink

2020-06-24 Thread Jaswin Shah
That's what I wanted to know: will I be able to achieve the same with Flink alone, without using the Drools engine? From: Jörn Franke Sent: 24 June 2020 12:46 To: Jaswin Shah Cc: user@flink.apache.org Subject: Re: DROOLS rule engine with flink I would implement them directly

Performance issue associated with managed RocksDB memory

2020-06-24 Thread Juha Mynttinen
Hello there, In Flink 1.10 the configuration parameter state.backend.rocksdb.memory.managed defaults to true. This is great, since it makes it simpler to stay within the memory budget e.g. when running in a container environment. However, I've noticed performance issues when the switch is enabl
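For reference, the relevant knobs (option names as in the Flink 1.10 configuration docs) look like this in flink-conf.yaml; the values are examples only:

```yaml
# flink-conf.yaml (Flink 1.10)
state.backend: rocksdb
# Default in 1.10: RocksDB is capped at Flink's managed memory budget.
state.backend.rocksdb.memory.managed: true
# To compare against pre-1.10 behavior, disable the cap:
# state.backend.rocksdb.memory.managed: false
# Or give RocksDB a fixed amount per slot instead of the managed fraction:
# state.backend.rocksdb.memory.fixed-per-slot: 512mb
# The managed budget itself is controlled by, e.g.:
# taskmanager.memory.managed.fraction: 0.4
```

Toggling `state.backend.rocksdb.memory.managed` is the quickest way to check whether a performance regression comes from the shared block-cache/write-buffer budget.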

Re: Re: TypeInformation not found

2020-06-24 Thread Yun Gao
Hi Yu, I tried WordCount and the attached test, it should be able to run normally in my IDEA. Could you check the imported project, or re-import it if there are still problems? Best, Yun --Original Mail -- Sender:Yu Wang Send Date:Tue Jun

RE: DROOLS rule engine with flink

2020-06-24 Thread Chenna, Venkatasubbaiah
You can refer to this article to implement rules with a generic flink job https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html From: Jaswin Shah Sent: Wednesday, June 24, 2020 4:34 PM To: Jörn Franke Cc: user@flink.apache.org Subject: Re: DROOLS rule engine with flink [EXTERNAL EMAIL]

Re: [Announce] Flink Forward Call for Proposals Extended

2020-06-24 Thread Seth Wiesman
As a reminder, the CfP for Flink Forward is open until this Sunday, June 28th. If you've never spoken at a conference before and are thinking about submitting, our amazing event manager Laura just wrote an article on dev.to about why virtual conferences are the best way to get started. [1] https:

Re: State leak

2020-06-24 Thread Ori Popowski
Hi, thanks for answering. Sounds reasonable On Wed, Jun 24, 2020 at 11:50 AM Rafi Aroch wrote: > Hi Ori, > > Once a session ends, its state should get purged. You should take care > that a session does end. > For example, if you wait for a 'session-end' event, limit it with some > time thresho

Dynamic partitioner for Flink based on incoming load

2020-06-24 Thread Alexander Filipchik
Hello! We are working on a Flink Streaming job that reads data from multiple Kafka topics and writes them to DFS. We are using StreamingFileSink with a custom implementation for GCS FS and it generates a lot of files as streams are partitioned among multiple JMs. In the ideal case we should have at
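One way to control the file layout is a custom BucketAssigner on the StreamingFileSink, so the partition directory is derived from the record rather than from the subtask; combined with a keyBy on the same key before the sink, each subtask then writes to fewer partitions. A rough sketch against the Flink 1.10 API; the record format, GCS path, and bucket key are invented:

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer;

// Sketch: route records into partition directories by a key derived from the
// record itself, decoupling the directory layout from operator parallelism.
StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path("gs://my-bucket/output"),   // placeholder path
                      new SimpleStringEncoder<String>("UTF-8"))
        .withBucketAssigner(new BucketAssigner<String, String>() {
            @Override
            public String getBucketId(String element, Context context) {
                // placeholder: first CSV field acts as the partition key
                return "topic=" + element.split(",")[0];
            }

            @Override
            public SimpleVersionedStringSerializer getSerializer() {
                return SimpleVersionedStringSerializer.INSTANCE;
            }
        })
        .build();
```

Note that each subtask still writes its own part files per bucket it receives data for, which is why keying the stream by the bucket key first reduces the file count.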

passing additional jvm parameters to the configuration

2020-06-24 Thread Georg Heiler
Hi, how can I pass additional configuration parameters like Spark's extraJavaOptions to a flink job? https://stackoverflow.com/questions/62562153/apache-flink-and-pureconfig-passing-java-properties-on-job-startup contains the details. But the gist is: flink run --class com.github.geoheil.streami
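The usual Flink equivalent is `env.java.opts` (option names as in the Flink docs; the config-file path is a placeholder):

```yaml
# flink-conf.yaml — JVM options applied to all Flink processes:
env.java.opts: "-Dconfig.file=/path/to/application.conf"
# or per role:
# env.java.opts.jobmanager: "..."
# env.java.opts.taskmanager: "..."
```

On YARN the same can be set per job as a dynamic property, e.g. `flink run -m yarn-cluster -yD env.java.opts="-Dconfig.file=application.conf" ...`, since the cluster is brought up for that job.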

Re:

2020-06-24 Thread Arvid Heise
Hi, your error matches what I see when I forget to import org.apache.flink.api.scala._ Could you please double-check, and if you did that, post a (minimal) example? Best, Arvid On Tue, Jun 23, 2020 at 3:42 AM 王宇 wrote: > Hi, all > some error occurred when I run flink in minicluste

Re: Flink Stream job to parquet sink

2020-06-24 Thread Arvid Heise
Hi Anuj, There is currently no way to dynamically change the topology. It would be good to know why your current approach is not working (restart taking too long? Too frequent changes?) So some ideas: - Have some kind of submitter that restarts flink automatically on config change (assumes that rest

Re: jobmanager restart failed. could not set up jobmanager

2020-06-24 Thread Arvid Heise
Could you check if the user that starts the job has permissions to write to that location? If so, then we need more information (Flink version, how you execute job) and the proper logs. It would be interesting to know if there are any earlier errors/warnings in the log. On Tue, Jun 23, 2020 at 9:

Re: Non parallel file sources

2020-06-24 Thread Arvid Heise
Another option if the file is small enough is to load it in the driver and directly initialize an in-memory source (env.fromElements). On Tue, Jun 23, 2020 at 9:57 PM Vishwas Siravara wrote: > Thanks that makes sense. > > On Tue, Jun 23, 2020 at 2:13 PM Laurent Exsteens < > laurent.exste...@eura
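That idea in code, as a rough sketch (the file path and transformation are placeholders):

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Sketch: read a small file in the driver (main method) and ship its contents
// into the job as a non-parallel, in-memory source.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
List<String> lines = Files.readAllLines(Paths.get("/path/to/small-file.txt"));
env.fromCollection(lines)        // collection sources run with parallelism 1
   .map(String::toUpperCase)     // placeholder transformation
   .print();
env.execute("small-file-job");
```

This only works for files small enough to fit in the driver's memory and the serialized job graph.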

Re: RichAggregationFunction

2020-06-24 Thread Arvid Heise
Hi Steven, could you please provide more information. Which Flink version are you using? Why isn't RichAggregationFunction working for you? In general, you always have the option to use a custom window assigner and delegate most of the calls to some provided implementation. Then you modify the be

Re: passing additional jvm parameters to the configuration

2020-06-24 Thread Arvid Heise
Hi Georg, could you check if simply using -D is working as described here [1]. If not, could you please be more precise: do you want the parameter to be passed to the driver, the job manager, or the task managers? [1] https://ci.apache.org/projects/flink/flink-docs-master/ops/cli.html#deployment

Re: passing additional jvm parameters to the configuration

2020-06-24 Thread Georg Heiler
Hi Arvid, thanks for the quick reply. I have a strong Apache Spark background. There, when executing on YARN or locally, the cluster is usually created on demand for the duration of the batch/streaming job, and there is only the concept of A) master/driver (application master) B) slave/executo

Re: Faild to load dependency after migration to Flink 1.10

2020-06-24 Thread Chesnay Schepler
Did you make any modifications to the Flink scripts in order to get log4j2 to work with Flink 1.10? IIRC we had to modify the scripts when we migrated to log4j2 in 1.11; if this is done incorrectly it could break things. Do all Flink processes have this issue, or only TaskExecutors? Can you pro

Re: RichAggregationFunction

2020-06-24 Thread Seth Wiesman
Hi Steven, AggregationFunctions (along with Reduce and other “pre aggregation” functions) are not allowed to be Rich. In general if you need to go outside the predefined bounds of what the window operator provides I’d encourage you to take a look at a KeyedProcessFunction. Seth On Wed, Jun 24,
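A minimal sketch of that suggestion: keyed state plus a timer inside a KeyedProcessFunction, which has the full rich-function lifecycle (open/close, getRuntimeContext). All names, types, and the one-minute interval are illustrative:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Sketch: a KeyedProcessFunction replacing a windowed aggregation when the
// pre-aggregation function would need to be Rich.
public class SumOverMinute extends KeyedProcessFunction<String, Long, Long> {
    private transient ValueState<Long> sum;

    @Override
    public void open(Configuration parameters) {
        // Rich lifecycle hook: set up state (or external resources) here.
        sum = getRuntimeContext().getState(new ValueStateDescriptor<>("sum", Long.class));
    }

    @Override
    public void processElement(Long value, Context ctx, Collector<Long> out) throws Exception {
        Long current = sum.value();
        if (current == null) {
            current = 0L;
            // First element for this key: schedule an emit one minute from now.
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + 60_000);
        }
        sum.update(current + value);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Long> out) throws Exception {
        out.collect(sum.value());
        sum.clear(); // purge, so the next element starts a fresh "window"
    }
}
```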

Re: Dynamic partitioner for Flink based on incoming load

2020-06-24 Thread Seth Wiesman
You can achieve this in Flink 1.10 using the StreamingFileSink. I’d also like to note that Flink 1.11 (which is currently going through release testing and should be available imminently) has support for exactly this functionality in the table API. https://ci.apache.org/projects/flink/flink-docs-

Re: Dynamic partitioner for Flink based on incoming load

2020-06-24 Thread Alexander Filipchik
Maybe I'm misreading the documentation, but: "Data within the partition directories are split into part files. Each partition will contain at least one part file for each subtask of the sink that has received data for that partition." So, it is 1 partition per subtask. I'm trying to figure out how t

Re: Performance issue associated with managed RocksDB memory

2020-06-24 Thread Juha Mynttinen
Hey, Here's a simple test. It's basically the WordCount example from Flink, but using RocksDB as the state backend and having a stateful operator. The javadocs explain how to use it. /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See