Re: Flink - Pod Identity

2021-04-03 Thread Sameer Wadkar
Kube2Iam needs to modify IPtables to proxy calls to ec2 metadata to a daemonset which runs privileged pods which maps a IP Address of the pods and its associated service account to make STS calls and return temporary AWS credentials. Your pod “thinks” the ec2 metadata url works locally like in a

Re: Best way to link static data to event data?

2019-09-27 Thread Sameer Wadkar
The main consideration in these type of scenarios is not the type of source function you use. The key point is how does the event operator get the slow moving master data and cache it. And then recover it if it fails and restarts again. It does not matter that the csv file does not change ofte

Re: Per Key Grained Watermark Support

2019-09-23 Thread Sameer Wadkar
You could still handle late data. Just keep state around longer ( within a predefined lateness interval). Say your time window is a tumbling window of 5 mins and your events for a key are allowed to arrive 30 mins late, keep events around for 35 mins before evicting them from state. It means y

Re: Can Flink help us solve the following use case

2019-08-07 Thread Sameer Wadkar
You could do this using custom triggers and evictors in Flink. That way you can control when the windows fire and what elements are fired with it. And lastly the custom evictors know when to remove elements from the window. Yes Flink can support it. Sent from my iPhone > On Aug 7, 2019, at 4

Re: Flink ML Use cases

2019-05-14 Thread Sameer Wadkar
If you can save the model as a PMML file you can apply it on a stream using one of the java pmml libraries. Sent from my iPhone > On May 14, 2019, at 4:44 PM, Abhishek Singh wrote: > > I was looking forward to using Flink ML for my project where I think I can > use SVM. > > I have been able

State Recovery when job fails and auto-recovers

2018-10-17 Thread Sameer Wadkar
Hi, We have a job which is using ValueState. We have turned off checkpoints. The state is backed by rocksdb which is backed by S3. If the job fails for any exception (ex. Partitions not available or an occasional S3 404 error) and auto-recovers, is the entire state lost or does it continue f

Re: Queryable State

2018-09-05 Thread Sameer Wadkar
I have used connected streams where one part of the connected stream maintains state and the other part consumes it. However it was not queryable externally. For state that is queryable externally you are right you probably need another operator to store state and support query-ability. Sent

Re: Does Flink DataStreams using combiners?

2016-08-12 Thread Sameer Wadkar
Streaming cannot use windows. The aggregations happen on the trigger. The elements being aggregated are only known after the trigger delivers the elements to the evaluation function. Since windows can overlap and even assignment to a window is not done until the elements arrive at the sum ope

Re: Flink : CEP processing

2016-08-09 Thread Sameer Wadkar
In that case you need to get them into one stream somehow (keyBy a dummy value for example). There is always some logical key to keyBy on when data is arriving from multiple sources (ex some portion of the time stamp). You are looking for patterns within something (events happening around the s

Re: Having a single copy of an object read in a RichMapFunction

2016-08-05 Thread Sameer Wadkar
You mean "Connected Streams"? I use that for the same requirement. I way it works it looks like it creates multiple copies per co-map operation. I use the keyed version to match side inputs with the data. Sent from my iPhone > On Aug 5, 2016, at 12:36 PM, Theodore Vasiloudis > wrote: > > Ye

Re: Flink Kafka more consumers than partitions

2016-08-03 Thread Sameer Wadkar
What is the parallelism of the sink or the operator which writes to the sinks in the first case. HBase puts are constrained by the following: 1. How your regions are distributed. Are you pre-splitting your regions for the table. Do you know the number of regions your Hbase tables are split into.

Re: how does flink assign windows to task

2016-07-30 Thread Sameer Wadkar
Vishnu, I would imagine based on Max's explanation and how other systems like MapReduce and Spark partition keys, if you have 100 keys and 50 slots, 2 keys would be assigned to each slot. Each slot would maintain one or more windows (more for time based windows) and each window would have upto

Re: counting words (not frequency)

2016-07-22 Thread Sameer Wadkar
It is complicated: 1. If you have a file you should consider using the DataSet API. It is more complicated to use DataStream with files as you have to simulate a stream from a file. 2. You need a tokenizer for a map operator unless you have a word per line. 3. Sum operator is fine it will count

Re: Processing windows in event time order

2016-07-20 Thread Sameer Wadkar
Hi, If watermarks arriving from multiple sources, how long does the Event Time Trigger wait for the slower source to send its watermarks before triggering only from the faster source? I have seen that if one of the sources is really slow then the elements of the faster source fires and when the