Re: CRD compatible with native and standalone mode

2021-04-20 Thread Yang Wang
Exactly. I think most of the fields could be shared by standalone and native mode. Best, Yang gaurav kulkarni wrote on Wed, Apr 21, 2021 at 10:17 AM: > Thanks a lot for the response, folks! I appreciate it. I plan to use > native mode in the future mostly for the resource management it plans to offer. > Let me

idleTimeMsPerSecond exceeds 1000

2021-04-20 Thread Alexey Trenikhun
Hello, when the Flink job is mostly idle, idleTimeMsPerSecond for a given task_name and subtask_index sometimes exceeds 1000; I have seen values up to 1350, but usually not higher than 1020. Is this due to the accuracy of nanoTime/currentTimeMillis, or is there a bug in the calculations? Thanks, Alexey

Re: CRD compatible with native and standalone mode

2021-04-20 Thread gaurav kulkarni
Thanks a lot for the response, folks! I appreciate it. I plan to use native mode in the future mostly for the resource management it plans to offer. Let me go through the links provided. @Yang Wang "Since the CR is defined in yaml[2], native and standalone could have some dedicated fields. And yo

Contiguity and state storage in CEP library

2021-04-20 Thread tbud
We are evaluating a use case where there will be 100s of events streaming in per second, and we want to run a fixed set of pattern-matching rules on them using relaxed contiguity, as described in the documentation. For example: a pattern sequence "a b+ c" on the stream of "a", "b1"
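
A minimal sketch (Flink CEP, Java) of how such a relaxed-contiguity pattern is typically expressed; the Event POJO and its getName() accessor are assumptions, not from the original post:

    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternStream;
    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.cep.pattern.conditions.SimpleCondition;
    import org.apache.flink.streaming.api.datastream.DataStream;

    public class RelaxedContiguityExample {
        // "a b+ c" with relaxed contiguity: followedBy() allows non-matching
        // events to occur between the matched ones.
        public static PatternStream<Event> match(DataStream<Event> events) {
            Pattern<Event, ?> pattern = Pattern.<Event>begin("a")
                    .where(new SimpleCondition<Event>() {
                        @Override
                        public boolean filter(Event e) { return "a".equals(e.getName()); }
                    })
                    .followedBy("b")
                    .where(new SimpleCondition<Event>() {
                        @Override
                        public boolean filter(Event e) { return e.getName().startsWith("b"); }
                    })
                    .oneOrMore()   // b+ ; relaxed contiguity between iterations is the default
                    .followedBy("c")
                    .where(new SimpleCondition<Event>() {
                        @Override
                        public boolean filter(Event e) { return "c".equals(e.getName()); }
                    });
            return CEP.pattern(events, pattern);
        }
    }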

Contiguity in SQL vs CEP

2021-04-20 Thread tbud
There are 3 different types of contiguity defined in the CEP documentation [1] for looping and non-looping patterns -- strict, relaxed, and non-deterministic relaxed. There is no equivalent in the SQL documentation [2]. Can someone shed some light on what's achievable in SQL and what isn't? Related question: It se

Flink Event specific window

2021-04-20 Thread s_penakalap...@yahoo.com
Hi All, I have one requirement where I need to calculate the total amount of transactions done by each user in the last 1 hour. Say Customer1 has done 2 transactions, one at 11:00 am and the other at 11:20 am. Customer2 has done 1 transaction at 10:00 am. Customer3 has done 3 transactions, one at 11:
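
A minimal sketch of one way to compute this, assuming a hypothetical Transaction POJO with getCustomerId()/getAmount() accessors and event-time timestamps/watermarks already assigned; amounts are summed per customer over a 1-hour sliding window:

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class HourlyTotals {
        // Total transaction amount per customer over the last hour,
        // re-evaluated every 5 minutes (the slide interval is an arbitrary choice).
        public static DataStream<Tuple2<String, Double>> perCustomer(DataStream<Transaction> transactions) {
            return transactions
                    .map(t -> Tuple2.of(t.getCustomerId(), t.getAmount()))
                    .returns(Types.TUPLE(Types.STRING, Types.DOUBLE))
                    .keyBy(t -> t.f0)
                    .window(SlidingEventTimeWindows.of(Time.hours(1), Time.minutes(5)))
                    .sum(1);
        }
    }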

Re: Producing to Kafka topics dynamically without redeployment

2021-04-20 Thread Ejaskhan S
Hi Ahmed, If you have the logic to identify the destination cluster along with the target topic, you will be able to achieve this with the above solution. 1. Create one Kafka producer for each cluster; if there are 10 clusters, create 10 producers. 2. Add a new attribute called 'clusterId' or so

Re: Producing to Kafka topics dynamically without redeployment

2021-04-20 Thread Ahmed A.Hamid
Thank you, Ejaskhan. I think your suggestion would only work if all the topics were on the same Kafka cluster. In my use-case, the topics can be on different clusters, which is why I was thinking of rolling a custom sink that detects config changes and instantiates Kafka producers on demand as
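
A rough sketch of that idea, using the plain kafka-clients producer inside a RichSinkFunction and creating one producer per destination cluster on demand. RoutedEvent (with bootstrapServers/topic/payload fields) is a hypothetical wrapper type, and this bypasses the Flink Kafka connector, so it does not get its checkpointing/exactly-once guarantees:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class MultiClusterKafkaSink extends RichSinkFunction<RoutedEvent> {

        private transient Map<String, KafkaProducer<String, String>> producersByCluster;

        @Override
        public void open(Configuration parameters) {
            producersByCluster = new HashMap<>();
        }

        @Override
        public void invoke(RoutedEvent event, Context context) {
            // Lazily create one producer per destination cluster.
            KafkaProducer<String, String> producer = producersByCluster.computeIfAbsent(
                    event.bootstrapServers, servers -> {
                        Properties props = new Properties();
                        props.put("bootstrap.servers", servers);
                        props.put("key.serializer", StringSerializer.class.getName());
                        props.put("value.serializer", StringSerializer.class.getName());
                        return new KafkaProducer<>(props);
                    });
            producer.send(new ProducerRecord<>(event.topic, event.payload));
        }

        @Override
        public void close() {
            if (producersByCluster != null) {
                producersByCluster.values().forEach(KafkaProducer::close);
            }
        }
    }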

Re: Producing to Kafka topics dynamically without redeployment

2021-04-20 Thread Ejaskhan S
Hi Ahmed, If you want to dynamically produce events to different topics and you have the logic to identify the target topics, you will be able to achieve this in the following way. - Suppose this is your event after the transformation logic (if any): EVENT. - This is the target topic f
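
For the single-cluster case, a minimal sketch of per-record topic routing via a KafkaSerializationSchema with the 1.11-era FlinkKafkaProducer; TopicAwareEvent, getTargetTopic() and getPayload() are hypothetical stand-ins for the EVENT carrying a target-topic attribute as described above:

    import java.nio.charset.StandardCharsets;
    import java.util.Properties;

    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
    import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class DynamicTopicSchema implements KafkaSerializationSchema<TopicAwareEvent> {

        @Override
        public ProducerRecord<byte[], byte[]> serialize(TopicAwareEvent event, Long timestamp) {
            // The topic comes from the record itself, so new topics on the same
            // cluster need no redeployment.
            return new ProducerRecord<>(
                    event.getTargetTopic(),
                    event.getPayload().getBytes(StandardCharsets.UTF_8));
        }

        // Wiring it up: the default topic is only a fallback, since the schema
        // above sets the topic per record.
        public static FlinkKafkaProducer<TopicAwareEvent> buildProducer(Properties kafkaProps) {
            return new FlinkKafkaProducer<>(
                    "fallback-topic",
                    new DynamicTopicSchema(),
                    kafkaProps,
                    FlinkKafkaProducer.Semantic.AT_LEAST_ONCE);
        }
    }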

Producing to Kafka topics dynamically without redeployment

2021-04-20 Thread Ahmed A.Hamid
Hello everyone, I have a use-case where I need to have a Flink application produce to a variable number of Kafka topics (specified through configuration), potentially in different clusters, without having to redeploy the app. Let's assume I maintain the set of destination clusters/topics in conf

Re: Alternatives to JDBCAppendTableSink in Flink 1.11

2021-04-20 Thread Austin Cawley-Edwards
Hi Sambaran, I'm not sure if this is the best approach, though I don't know your full use case/implementation. What kind of error do you get when trying to map into a PreparedStatement? I assume you tried something like this? SingleOutputStreamOperator stream = env.fromElements(Row.of("YourProc
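
For context, a sketch of how a stored-procedure call might look with the JdbcSink that replaced JDBCAppendTableSink in 1.11; the procedure name, connection details, and the single-String-field Row layout are placeholders, not from the thread:

    import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
    import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
    import org.apache.flink.connector.jdbc.JdbcSink;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.types.Row;

    public class ProcedureSinkExample {
        public static void attach(DataStream<Row> rows) {
            rows.addSink(JdbcSink.<Row>sink(
                    "CALL your_stored_procedure(?)",                      // placeholder statement
                    (statement, row) -> statement.setString(1, (String) row.getField(0)),
                    JdbcExecutionOptions.builder().withBatchSize(1).build(),
                    new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                            .withUrl("jdbc:postgresql://localhost:5432/db")  // placeholder URL
                            .withDriverName("org.postgresql.Driver")
                            .withUsername("user")
                            .withPassword("password")
                            .build()));
        }
    }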

Re: Flink Statefun Python Batch

2021-04-20 Thread Timothy Bess
Hi Igal, Yes! That's exactly what I was thinking. The batching will naturally happen as the model applies backpressure. We're using pandas and it's pretty costly to create a dataframe and everything to process a single event. Internally the SDK has access to the batch and is calling my function, w

Re: Max-parellelism limitation

2021-04-20 Thread Austin Cawley-Edwards
Hi Olivier, Someone will correct me if I'm wrong, but I believe the max-parallelism limitation, where you cannot scale up past the previously defined max-parallelism, applies to all stateful jobs no matter which type of state you are using. If you haven't seen it already, I think the Production R
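
A minimal sketch of pinning max parallelism up front (720 is just an assumed value) so the job can later be scaled up to that bound without invalidating keyed state:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class MaxParallelismExample {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Job-wide upper bound for rescaling; can also be set per operator via setMaxParallelism.
            env.setMaxParallelism(720);
            // ... build the pipeline as usual ...
        }
    }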

Re: Are configs stored as part of savepoints

2021-04-20 Thread Austin Cawley-Edwards
Hi Gaurav, Which configs are you referring to? Everything usually stored in `flink-conf.yaml`[1]? The State Processor API[2] is also a good resource to understand what is actually stored, and how you can access it outside of a running job. The SavepointMetadata class[3] is another place to referen

Re: Alternatives to JDBCAppendTableSink in Flink 1.11

2021-04-20 Thread Austin Cawley-Edwards
Hey Sambaran, I'm not too familiar with the 1.7 JDBCAppendTableSink, but to make sure I understand what your current solution looks like, it's something like the following, where you're triggering a procedure on each element of a stream? JDBCAppendTableSink sink = JDBCAppendTableSink.buil

Re: Flink support for Kafka versions

2021-04-20 Thread Austin Cawley-Edwards
Hi Prasanna, It looks like the Kafka 2.5.0 connector upgrade is tied to dropping support for Scala 2.11. The best place to track that would be the ticket for Scala 2.13 support, FLINK-13414 [1], and its subtask FLINK-20845 [2]. I have listed FLINK-20845 as a blocker for FLINK-19168 for better vis

Re: CRD compatible with native and standalone mode

2021-04-20 Thread Austin Cawley-Edwards
Hi Gaurav, I think the name "Native Kubernetes" is a bit misleading – this just means that you can use the Flink CLI/ scripts to run Flink applications on Kubernetes without using the Kubernetes APIs/ kubectl directly. What features are you looking to use in the native mode? I think it would be d

Alternatives to JDBCAppendTableSink in Flink 1.11

2021-04-20 Thread Sambaran
Hi, I am currently using JDBCAppendTableSink to execute database stored procedures from Flink to populate data into external tables using SingleOutputStreamOperator (version 1.7). Now we are trying to update to Flink 1.11 or later and found that JDBCAppendTableSink has been removed; currently, when looking

Re: How to config the flink to load libs in myself path

2021-04-20 Thread cxydevelop
For example, I have a custom table source or sink that is built in an independent jar, and my main code depends on it. But I don't want to package the custom connector jar with the main code into one jar file. In other words, I want a thin jar, not a fat jar. So I think I can put the custom

Re: Max-parellelism limitation

2021-04-20 Thread Olivier Nouguier
Hi, thank you all for your replies. By limitation I meant the impossibility of resuming a job when scaling up because of this max-parallelism. To be more precise, in our deployment the operator (max) parallelism is computed from the number of available slots (~ task-managers * cores); this approach i

Re: Flink Statefun Python Batch

2021-04-20 Thread Igal Shilman
Hi Tim! Indeed the StateFun SDK / StateFun runtime has an internal concept of batching that kicks in when a remote function is slow/congested. Keep in mind that under normal circumstances batching does not happen (effectively a batch of size 1 will be sent). [1] This batch is not curren

Flink support for Kafka versions

2021-04-20 Thread Prasanna kumar
Hi Flinksters, We are researching whether we could use the latest version of Kafka (2.6.1 or 2.7.0). Since we are using Flink as a processor, we came across https://issues.apache.org/jira/browse/FLINK-19168. It says that Flink does not support version 2.5.0 and beyond. That was created 8 mon

Re: Accessing columns from input stream table during Window operations

2021-04-20 Thread Sumeet Malhotra
Thanks Dian, Guowei. I think it makes sense to roll with this approach. On Tue, Apr 20, 2021 at 8:29 AM Guowei Ma wrote: > Hi, Sumeet > Thank you for sharing. As Dian suggested, I think you could use b as > your `group_by`'s key and so b could be output directly. > I think it is more si

Re: Correctly serializing "Number" as state in ProcessFunction

2021-04-20 Thread Klemens Muthmann
Hi, I guess this is more of a Java problem than a Flink problem. If you want it quick and dirty you could implement a class such as: public class Value { private boolean isLongSet = false; private long longValue = 0L; private boolean isIntegerSet = false; private int intValue = 0
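
A possible completion of that quick-and-dirty wrapper (everything beyond the quoted fragment is a guess, not the author's actual code); a flat class with primitive fields like this avoids keeping the abstract java.lang.Number in state:

    public class Value {
        private boolean isLongSet = false;
        private long longValue = 0L;
        private boolean isIntegerSet = false;
        private int intValue = 0;
        private boolean isDoubleSet = false;
        private double doubleValue = 0.0;

        public Value() {}  // no-arg constructor, as typically required for Flink POJO serialization

        public void setLong(long value) { longValue = value; isLongSet = true; }
        public void setInteger(int value) { intValue = value; isIntegerSet = true; }
        public void setDouble(double value) { doubleValue = value; isDoubleSet = true; }

        public boolean hasLong() { return isLongSet; }
        public boolean hasInteger() { return isIntegerSet; }
        public boolean hasDouble() { return isDoubleSet; }

        public long getLong() { return longValue; }
        public int getInteger() { return intValue; }
        public double getDouble() { return doubleValue; }
    }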

Re: Max-parellelism limitation

2021-04-20 Thread Chesnay Schepler
@Olivier Could you clarify which limitation you are referring to? On 4/20/2021 5:23 AM, Guowei Ma wrote: Hi, Olivier Yes. The introduction of this concept is to solve the problem of rescaling the keyed state. Best, Guowei On Mon, Apr 19, 2021 at 8:56 PM Olivier Nouguier mailto:olivier.nougu...

Correctly serializing "Number" as state in ProcessFunction

2021-04-20 Thread Miguel Araújo
Hi everyone, I have a ProcessFunction which needs to store different number types for different keys, e.g., some keys need to store an integer while others need to store a double. I tried to use java.lang.Number as the type for the ValueState, but I got the expected "No fields were detected for c