Re: [Statefun] Dynamic behavior

2021-02-24 Thread Miguel Araújo
Thanks Seth. I understood Igal's suggestion. My concern was about maintaining a separate service (outside flink/statefun) when this control stream might be an incremental stream as well (think, rules in fraud detection - although this is not a fraud detection application, but the example is good).

Re: Does the Kafka source perform retractions on Key?

2021-02-24 Thread Jan Lukavský
Hi Rex, If I understand correctly, you are concerned about behavior of Kafka source in the case of compacted topic, right? If that is the case, then this is not directly related to Flink, Flink will expose the behavior defined by Kafka. You can read about it for instance here [1]. TL;TD - you

Re: Window Process function is not getting trigger

2021-02-24 Thread Kezhu Wang
Try `env.setParallelism(1)`. Default parallelism for local environment is `Runtime.getRuntime.availableProcessors`. You test data set are so small that when they are scatter cross multiple parallel instances, there will be no data with event time assigned to trigger downstream computation. Or you

Re: Community chat?

2021-02-24 Thread Marta Paes Moreira
Ah! That freenode channel dates back to...2014? The community is not maintaining any channels other than the Mailing List (and Stack Overflow), currently. But this is something we're looking into, as it's coming up more and more frequently. Would Slack be your first pick? Or would something async

[UPDATE] Release 1.13 feature freeze

2021-02-24 Thread Dawid Wysakowicz
Hi all, The agreed date of a feature freeze is due in about a month. Therefore we thought it would be a good time to give an update of the current progress. From the information we gathered there are currently no known obstacles or foreseeable delays. We are still aiming for the end of March as t

Re: Community chat?

2021-02-24 Thread Yuval Itzchakov
Both have their place IMO. There's a lot of value in synchronous communication for which I'd prefer Slack or Discord. For async communication, I think moving away from mailing lists into something like a Discourse forum would be good. On Wed, Feb 24, 2021 at 11:36 AM Marta Paes Moreira wrote: >

Re: Object and Integer size in RocksDB ValueState

2021-02-24 Thread Maciej Obuchowski
Hey. Let me send simplified example, because I don't think this "(given that the actual stored objects (integers) are the same)" is true - I'm just storing object as a placeholder: public class DeduplicationProcessFunction extends KeyedProcessFunction implements CheckpointedFunction { private

Re: Community chat?

2021-02-24 Thread Sebastián Magrí
I agree with Yuval. If we wanted to keep chats in the open source world, there's also Matrix nowadays which works quite well. On Wed, 24 Feb 2021 at 09:58, Yuval Itzchakov wrote: > Both have their place IMO. > > There's a lot of value in synchronous communication for which I'd prefer > Slack or

Co-relate two streams

2021-02-24 Thread Abhinav Sharma
Hi, How can I co-relate two streams of different types in Flink? Scenario: In stream1, I have data in pojo with a field user. In stream2, I have data in a different pojo which also contains the field user. (However, other than the user field, they have no common field). Now what I want to do is r

Re: WatermarkStrategy.for_bounded_out_of_orderness in table API

2021-02-24 Thread joris.vanagtmaal
Thanks Roman, somehow i must have missed this in the documentation. What is the difference (if any) between: Ascending timestamps: WATERMARK FOR rowtime_column AS rowtime_column - INTERVAL '0.001' SECOND. Bounded out of orderness timestamps: WATERMARK FOR rowtime_column AS rowtime_column - I

Re: How to use ProcessWindowFunction in pyflink?

2021-02-24 Thread Arvid Heise
Hi Hongyuan, it seems as if PyFlink's datastream API is still lacking window support [1], which is targeted for next release. Examples for windows in PyFlink's table API are available here [2]. from pyflink.table.window import Tumblefrom pyflink.table.expressions import lit, col orders = t_env.f

Re: latency related to the checkpointing mode EXACTLY ONCE

2021-02-24 Thread Arvid Heise
When Flink fails and restarts, it goes back in time to reprocess the data of the latest checkpoint. That's why it also deleted all uncommitted data on restart or else you would receive duplicates in your output. Hence, to get exactly once, you cannot read uncommitted data. That is true for all stre

Re: Flink 1.11.3 not able to restore with savepoint taken on Flink 1.9.3

2021-02-24 Thread Arvid Heise
A common pitiful when upgrading a Flink application with savepoints is that no explicit UIDs have been assigned to the operators. You can amend that by first adding UIDs to the job in 1.9.3 and create a savepoint with UIDs. Then try upgrading again. On Fri, Feb 19, 2021 at 9:57 AM Tzu-Li (Gordon)

Re: Kafka SQL Connector: dropping events if more partitions then source tasks

2021-02-24 Thread Arvid Heise
Hi Jan, Are you running on historic data? Then your partitions might drift apart quickly. However, I still suspect that this is a bug (Watermark should only be from the slowest partition). I'm pulling in Timo who should know more. On Fri, Feb 19, 2021 at 10:50 AM Jan Oelschlegel < oelschle...@

Re: Netty LocalTransportException: Sending the partition request to 'null' failed

2021-02-24 Thread Arvid Heise
Hi Matthias, most of the debug statements are just noise. You can ignore that. Something with your network seems fishy to me. Either taskmanager 1 cannot connect to taskmanager 2 (and vice versa), or the taskmanager cannot connect locally. I found this fragment, which seems suspicious Failed to

Re: WatermarkStrategy.for_bounded_out_of_orderness in table API

2021-02-24 Thread joris.vanagtmaal
Ah thanks, so event though the method of describing it is exactly the same, because you're using the max resolution it isn't useful for out-of-orderness. Ok, clear -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: How to pass PROCTIME through an aggregate

2021-02-24 Thread Arvid Heise
Hi Rex, just an idea, wouldn't it be possible to just add UNIX_TIMESTAMP() right before your window operation? On Sat, Feb 20, 2021 at 2:14 AM Rex Fenley wrote: > Hello, > > Using the table api, I have a CREATE DDL which adds a PROCTIME() column > and I need to use it deep (like 10 operator

Re: Flink job finished unexpected

2021-02-24 Thread Arvid Heise
Hi Rainie, there are two probably causes: * Network instabilities * Taskmanager died, then you can further dig in the taskmanager logs for errors right before that time. In both cases, Flink should restart the job with the correct restart policies if configured. On Sat, Feb 20, 2021 at 10:07 PM

Re: stop job with Savepoint

2021-02-24 Thread Arvid Heise
Hi Alexey, The list looks complete to me. Please report back if this is not correct. On Sat, Feb 20, 2021 at 11:30 PM Alexey Trenikhun wrote: > Adding "list" to verbs helps, do I need to add anything else ? > > -- > *From:* Alexey Trenikhun > *Sent:* Saturday, Febru

Re: Object and Integer size in RocksDB ValueState

2021-02-24 Thread Roman Khachatryan
Thanks for the clarification. RocksDB stores whatever value Flink passes to it after serialization. The value is passed as an array of bytes so the minimum is single byte. Integer would require 4 bytes, Object - 1 or 2 depending on the serializer (Pojo or Kryo), and boolean just 1 byte. Besides th

Re: Configure operator based on key

2021-02-24 Thread Arvid Heise
Hi Abhinav, I think there is no way and I don't easily see how that could be added. You can however apply the same logic in the trigger also in your process function to detect which case caused the trigger. If that is expensive to calculate, you might want to do it in a map before entering the win

AW: Kafka SQL Connector: dropping events if more partitions then source tasks

2021-02-24 Thread Jan Oelschlegel
Hi Arvid, thanks for bringing back this topic. Yes, I’m running on historic data, but as you mentioned that should not be the problem, even there is a event-time skew between partitions. But maybe this issue with the missing watermark pushdown per partition is the important fact: https://iss

Re: Flink jobs organization and maintainability

2021-02-24 Thread Arvid Heise
If you have many similar jobs, they should be in the same repo (especially if they have the same development cycle). First, how different are the jobs? A) If they are very similar, go with just one job and configure it differently for each application. Then you can use different deployments of the

Re: Flink custom trigger use case

2021-02-24 Thread Arvid Heise
Hi Diwakar, the issue is that you fire_and_purge the state, you should just FIRE on the first element (or else you lose the information that you received the element already). You'd use FIRE_AND_PURGE on the last element though. On Wed, Feb 24, 2021 at 7:16 AM Khachatryan Roman < khachatryan.ro..

Re: BroadcastState dropped when data deleted in Kafka

2021-02-24 Thread Arvid Heise
Could you double-check if your Flink application was restarted between Kafka topic was cleared and the time you saw that the rules have been lost? I suspect that you deleted the Kafka topic and the Flink application then failed and restarted. Upon restart it read the empty rule topic. To solve it

Re: Object and Integer size in RocksDB ValueState

2021-02-24 Thread Maciej Obuchowski
Thanks Roman, that's exactly what I needed. śr., 24 lut 2021 o 14:37 Roman Khachatryan napisał(a): > > Thanks for the clarification. > > RocksDB stores whatever value Flink passes to it after serialization. > The value is passed as an array of bytes so the minimum is single byte. > Integer would

Re: Window Process function is not getting trigger

2021-02-24 Thread sagar
Thanks Kezhu, It worked!!! On Wed, Feb 24, 2021 at 2:47 PM Kezhu Wang wrote: > Try `env.setParallelism(1)`. Default parallelism for local environment is > `Runtime.getRuntime.availableProcessors`. > > You test data set are so small that when they are scatter cross multiple > parallel instances,

Re: Does the Kafka source perform retractions on Key?

2021-02-24 Thread Arvid Heise
Jan's response is correct, but I'd like to emphasize the impact on a Flink application. If the compaction happens before the data arrives in Flink, the intermediate updates are lost and just the final result appears. Also if you restart your Flink application and reprocess older data, it will natu

Re: Co-relate two streams

2021-02-24 Thread Arvid Heise
Hi Abhinav, sounds like you want to implement a join [1]. You usually want to use a window and then correlate the data between them only within the timeframe. You can use global windows if you cannot add a time window, but note that the state will grow indefinitely. If one of the sources is small

Flink Statefun TTL

2021-02-24 Thread Timothy Bess
Hey, I noticed that the Flink Statefun 2.1.0 release notes had this snippet with regards to TTL: Note: The state expiration mode for remote functions is currently > restricted to AFTER_READ_AND_WRITE, and the actual TTL being set is the > longest duration across all registered state, not for each

Jackson object serialisations

2021-02-24 Thread Lasse Nedergaard
Hi I’m looking for advice for the best and simplest solution to handle JSON in Flink. Our system is data driven and based on JSON. As the structure isn’t static mapping it to POJO isn’t an option I therefore transfer ObjectNode and / or ArrayNode between operators either in Tuples Tuple2 or a

Re: BroadcastState dropped when data deleted in Kafka

2021-02-24 Thread bat man
Hi Arvid, The Flink application was not re-started. I had checked on that. By adding rules to the state of process function you mean the state which is local to the keyedprocess function? >From [1] what is being done here - final MapState> state = getRuntimeContext().getMapState( mapStateDesc);

Re: Jackson object serialisations

2021-02-24 Thread Maciej Obuchowski
Hey Lasse, I've had a similar case, albeit with Avro. I was reading from multiple Kafka topics, which all had different objects and did some metadata driven operations on them. I could not go with any concrete predefined types for them, because there were hundreds of different object types. My sol

Re: Flink Statefun TTL

2021-02-24 Thread Tzu-Li (Gordon) Tai
Hi Timothy, Starting from StateFun 2.2.x, in the module.yaml file, you can set for each individual state of a function an "expireMode" field, which values can be either "after-invoke" or "after-write". For example: ``` - function: meta: ... spec: states: - name: state-

Flink 1.12.1 example applications failing on a single node yarn cluster

2021-02-24 Thread Debraj Manna
I am trying out flink example as explained in flink docs in a single node yarn cluster

Re: Flink job finished unexpected

2021-02-24 Thread Rainie Li
I see, I will check tm log. Thank you Arvid. Best regards Rainie On Wed, Feb 24, 2021 at 5:27 AM Arvid Heise wrote: > Hi Rainie, > > there are two probably causes: > * Network instabilities > * Taskmanager died, then you can further dig in the taskmanager logs for > errors right before that tim

Re: Flink custom trigger use case

2021-02-24 Thread Diwakar Jha
Hi Arvid, Thanks. I tried FIRE instead of FIRE_AND_PURGE and it introduced duplicates though the result is still the same i.e record 1 is fired both at the start and the end of the window. so for every window i see the first event of the window is coming twice in the output. I'm trying to explain

Re: Flink 1.11.2 test cases fail with Scala 2.12.12

2021-02-24 Thread soumoks
Thank you! I had scala-library 2.12.8 in my dependency tree (Probably a remnant from when I was testing with Scala 2.12.8). I did the following to fix this issue. Removed scala-library 2.12.8 from my dependency tree and added the below dependency. org.scala-lang scala-library 2.1

Re: Kafka SQL Connector: dropping events if more partitions then source tasks

2021-02-24 Thread Benchao Li
Hi Jan, What you are observing is correct for the current implementation. Current watermark generation is based on subtask instead of partition. Hence if there are more than on partition in the same subtask, it's very easy to see more data dropped. AFAIK, FLIP-27 could solve this problem, howeve

Re: Does the Kafka source perform retractions on Key?

2021-02-24 Thread Rex Fenley
All of our Flink jobs are (currently) used for web applications at the end of the day. We see a lot of latency spikes from record amplification and we were at first hoping we could pass intermediate results through Kafka and compact them to lower the record amplification, but then it hit me that th

Is it possible to specify max process memory in flink 1.8.2, similar to what is possible in flink 1.11

2021-02-24 Thread Bariša
I'm running flink 1.8.2 in a container, and under heavy load, container gets OOM from the kernel. I'm guessing that that reason for the kernel OOM is large size of the off-heap memory. Is there a way I can limit it in flink 1.8.2? I can see that newer version of flink has a config param, checking

BackPressure in RowTime Task of FlinkSql Job

2021-02-24 Thread Aeden Jameson
I have a job made up of a few FlinkSQL statements using a statement set. In my job graph viewed through the Flink UI a few of the tasks/statements are preceded by this task rowtime field: (#11: event_time TIME ATTRIBUTE(ROWTIME)) that has an upstream Kafka source/sink task. Occasionally,

Re: Flink Statefun TTL

2021-02-24 Thread Timothy Bess
Hi Gordon, Ah so when it said "all registered state" that means all state keys defined in the "module.yaml", not all state for all function instances. So the expiration has always been _per_ instance then and not across all instances of a function. Thanks for the heads up, that sounds like a good

Re: Flink Statefun TTL

2021-02-24 Thread Tzu-Li (Gordon) Tai
On Thu, Feb 25, 2021 at 12:06 PM Timothy Bess wrote: > Hi Gordon, > > Ah so when it said "all registered state" that means all state keys > defined in the "module.yaml", not all state for all function instances. So > the expiration has always been _per_ instance then and not across all > instance

apache-flink +spring boot - logs related to spring boot application start up not printed in file in flink 1.12

2021-02-24 Thread Abhishek Shukla
I was getting bean creation logs and spring boot start up logs in Flink 1.9 with flink1.9_log4j-cli.properties (attached) # Licensed to the Apache Software Foundation (ASF) under one # or more contributor licen

Re: Flink 1.12.1 example applications failing on a single node yarn cluster

2021-02-24 Thread Debraj Manna
The same has been asked in StackOverflow also. Any suggestions here? On Wed, Feb 24, 2021 at 10:25 PM Debraj Manna wrote: > I am trying out flink example as explained in flink do

Re: Flink jobs organization and maintainability

2021-02-24 Thread yidan zhao
I used a yarm config file to describe my jobs, and using 'start xxxJobName' to start the job which is implemented by shell scripts. Arvid Heise 于2021年2月24日周三 下午10:09写道: > If you have many similar jobs, they should be in the same repo (especially > if they have the same development cycle). > > Fi

Re: Flink custom trigger use case

2021-02-24 Thread Diwakar Jha
Hello, I tried using *processWindowFunction* since it gives access to *globalstate* through *context*. My question is, Is it possible to discard single events inside *process* function of *processWindowFunction* just like *onElements* of triggers? For my use case it seems that trigger is not suffi

Get JobId and JobManager RPC Address in RichMapFunction executed in TaskManager

2021-02-24 Thread Sandeep khanzode
Hello, I am deploying a standalone-job cluster (cluster with a single Job and Task Manager instance instantiated with a —job-classname and —job-id). I have map/flatmap/process functions being executed in the various stream functions in the Taskmanager for which I need access to the Job Id and t

Re: java.io.IOException: Could not create storage directory for BLOB store in '/tmp'

2021-02-24 Thread wheatdog liou
Turns out the disk used by docker for mac is full. I followed the operation on docker site [1] and everything is fine. [1] https://docs.docker.com/docker-for-mac/space/#delete-unnecessary-containers-and-images wheatdog liou 於 2021年2月19日 週五 上午10:47寫道: > Hi, I am new to Flink and was following Fl