Re: Restore from savepoint with Iterations

2020-05-05 Thread ashish pok
Let me see if I can do artificial throttle somewhere. Volume of data is really high and hence trying to avoid rounds in Kafka too. Looks like options are “not so elegant” until FLIP-15. Thanks for pointers again!!! On Monday, May 4, 2020, 11:06 PM, Ken Krugler wrote: Hi Ashish, The workaroun

Re: Kafka consumer behavior with same group Id, 2 instances of apps in separate cluster?

2019-09-04 Thread ashish pok
this point. It would be good to know the reason you want to have such HA and see if Flink meets your requirement in another way. Thanks, Jiangjie (Becket) Qin On Thu, Aug 29, 2019 at 9:19 PM ashish pok wrote: Looks like Flink is using “assign” partitions instead of “subscribe” which will not allow

Re: Kafka consumer behavior with same group Id, 2 instances of apps in separate cluster?

2019-08-29 Thread ashish pok
, ashish pok wrote: All, I was wondering what the expected default behavior is when same app is deployed in 2 separate clusters but with same group Id. In theory idea was to create active-active across separate clusters but it seems like both apps are getting all the data from Kafka.  Anyone

Kafka consumer behavior with same group Id, 2 instances of apps in separate cluster?

2019-08-28 Thread ashish pok
All, I was wondering what the expected default behavior is when same app is deployed in 2 separate clusters but with same group Id. In theory idea was to create active-active across separate clusters but it seems like both apps are getting all the data from Kafka.  Anyone else has tried somethin

Re: JSON to CEP coversion

2019-01-22 Thread ashish pok
of the logic. Best, Dom. [1] https://github.com/apache/bahir [2] https://github.com/wso2/siddhi On Tue, 22 Jan 2019 at 20:20, ashish pok wrote: All, Wondering if anyone in community has started something along the line - idea being CEP logic is abstracted out to metadata instead. That can then

JSON to CEP coversion

2019-01-22 Thread ashish pok
All, Wondering if anyone in community has started something along the line - idea being CEP logic is abstracted out to metadata instead. That can then further be exposed out to users from a REST API/UI etc. Obviously, it would really need some additional information like data catalog etc for it

Re: Unit / Integration Test Timer

2018-09-17 Thread ashish pok
e know if you recommend using latest test utils with 1.4.2 core as a test.  Thanks, Ashish On Monday, September 17, 2018, 9:33:56 AM EDT, ashish pok wrote: Hi Till, I am still in 1.4.2 version and will need some time before we can get later version certified in our Prod env. Timers a

Re: Unit / Integration Test Timer

2018-09-17 Thread ashish pok
at 7:57 PM ashish pok wrote: Hi Till, To answer your first question, I currently don't (and honestly not sure how, other than of course in IDE I can use breakpoint, or if something like Mockito can do it). So did I interpret it correctly that it sounds like execution env started using

Re: Unit / Integration Test Timer

2018-09-14 Thread ashish pok
umed within a fraction of the 2 seconds? For this it would be better to use event time which allows you to control how time passes. If you want to test a specific operator you could try out the One/TwoInputStreamOperatorTestHarness. Cheers, Till On Fri, Sep 14, 2018 at 3:36 PM ashish pok wrote

Unit / Integration Test Timer

2018-09-14 Thread ashish pok
All, Hopefully a quick one. I feel like I have seen this answered a few times before but can't find an appropriate example. I am trying to run a few tests where registered timeouts are invoked (snippet below). Simple example as shown in documentation for integration test (using flink-test-ut

Re: Questions on Unbounded number of keys

2018-07-31 Thread ashish pok
with active windows, windows which have not been purged yet. Maybe Aljoscha knows more about why the window state is growing (I would not rule out a bug). Cheers, Till On Tue, Jul 31, 2018 at 1:45 PM ashish pok wrote: Hi Till, Keys are unbounded (a group of events have same key but that key

Re: Questions on Unbounded number of keys

2018-07-31 Thread ashish pok
Hi Till, Keys are unbounded (a group of events have same key but that key doesn't repeat after it is fired other than some odd delayed events). So basically there is 1 key that will be aligned to a window. When you say key space of active windows, does that include keys for windows that have already

Re: Implement Joins with Lookup Data

2018-07-25 Thread ashish pok
. Thanks, - Ashish On Wednesday, July 25, 2018, 4:07 AM, Michael Gendelman wrote: Hi Ashish, We are planning for a similar use case and I was wondering if you can share the amount of resources you have allocated for this flow? Thanks, Michael On Tue, Jul 24, 2018, 18:57 ashish pok wrote: BTW

Re: Implement Joins with Lookup Data

2018-07-24 Thread ashish pok
is dead is not recoverable. How do you recover from that situation? Will the pipeline die and you go over the entire bootstrap process? On Tue, Jul 24, 2018 at 11:56 ashish pok wrote: BTW,  We got around bootstrap problem for similar use case using a “nohup” topic as input stream. Our CICD

Re: Implement Joins with Lookup Data

2018-07-24 Thread ashish pok
BTW, we got around the bootstrap problem for a similar use case using a “nohup” topic as input stream. Our CICD pipeline currently passes an initialize option to the app IF there is a need to bootstrap, then waits for X minutes before taking a savepoint and restarting the app normally, listening to the right topic(s).

Re: Permissions to delete Checkpoint on cancel

2018-07-23 Thread ashish pok
r/ops/state/large_state_tuning.html#task-local-recovery On 23.07.2018 at 14:18, ashish pok wrote: Sorry, Just a follow-up. In absence of NAS then the best option to go with here is checkpoint and savepoints both on HDFS and StateBackend using local SSDs then? We were trying to not even hit

Re: Permissions to delete Checkpoint on cancel

2018-07-23 Thread ashish pok
Sorry, Just a follow-up. In absence of NAS then the best option to go with here is checkpoint and savepoints both on HDFS and StateBackend using local SSDs then? We were trying to not even hit HDFS other than for savepoints. - Ashish On Monday, July 23, 2018, 7:45 AM, ashish pok wrote

Re: Permissions to delete Checkpoint on cancel

2018-07-23 Thread ashish pok
Stefan, I did have first point at the back of my mind. I was under the impression though for checkpoints, cleanup would be done by TMs as they are being taken by TMs. So for a standalone cluster with its own zookeeper for JM high availability, a NAS is a must have? We were going to go with local

Re: Multi-tenancy environment with mutual auth

2018-07-16 Thread ashish pok
. Nico On 14/07/18 14:02, ashish pok wrote: All, We are running into a blocking production deployment issue. It looks like Flink inter-communications doesn't support SSL mutual auth. Any plans/ways to support it? We are going to have to create DMZ for each tenant

Multi-tenancy environment with mutual auth

2018-07-14 Thread ashish pok
All, We are running into a blocking production deployment issue. It looks like Flink inter-communications doesn't support SSL mutual auth. Any plans/ways to support it? We are going to have to create DMZ for each tenant without that, not preferable of course. - Ashish

Re: Flink Kafka TimeoutException

2018-07-05 Thread ashish pok
Our experience has been that even when the Kafka cluster is healthy, JVM resource contention in our Flink app, caused by high heap utilization and thereby CPU cycles lost to GC, also resulted in this issue. Getting basic JVM metrics like CPU load, GC times and Heap Util from your app (we use Grap

Re: Memory Leak in ProcessingTimeSessionWindow

2018-07-02 Thread ashish pok
June 22, 2018, 9:00:14 AM EDT, ashish pok wrote: Stefan, All,  If there are no further thoughts on this I am going to switch my app to low level Process API. I still think there is an easier solution here which I am missing but I will revisit that after I fix Production issue. Thanks, A

Re: How to partition within same physical node in Flink

2018-07-02 Thread ashish pok
idered failed DataStream cameraWithCubeDataStream = cameraWithCubeDataStreamAsync.keyBy((cameraWithCube) -> cameraWithCube.getTs()); On Thu, Jun 28, 2018 at 9:22 AM ashish pok wrote: Fabian, All, Along this same line, we have a datasource where we have parent key and child key. We need to fi

Re: How to partition within same physical node in Flink

2018-06-28 Thread ashish pok
Fabian, All, Along this same line, we have a datasource where we have parent key and child key. We need to first keyBy parent and then by child. If we want to have physical partitioning in a way where physical partitioning happens first by parent key and localize grouping by child key, is there

Re: Memory Leak in ProcessingTimeSessionWindow

2018-06-22 Thread ashish pok
, June 21, 2018, 7:28 AM, ashish pok wrote: Hi Stefan, Thanks for outlining the steps; they are similar to what we have been doing for OOM issues. However, I was looking for something more high level on whether state / key handling needs some sort of cleanup specifically that is not done by default. I

Re: Memory Leak in ProcessingTimeSessionWindow

2018-06-21 Thread ashish pok
/sundararajan/querying-java-heap-with-oql On 20.06.2018 at 02:33, ashish pok wrote: All, I took a few heap dumps (when app restarts and at 2 hour intervals) using jmap, they are 5GB to 8GB. I did some compares and what I can see is heap shows data tuples (basically instances of object that is

Re: Memory Leak in ProcessingTimeSessionWindow

2018-06-19 Thread ashish pok
based on ProcessingTime, I really would have expected memory to reach a steady state and remain sort of flat from a trending perspective. Appreciate any pointers anyone might have. Thanks, Ashish On Monday, June 18, 2018, 12:54:03 PM EDT, ashish pok wrote: Right, that's where I am heade

Re: Memory Leak in ProcessingTimeSessionWindow

2018-06-18 Thread ashish pok
runs into the problem and share it with us? That would make finding the cause a lot easier. Best, Stefan On 15.06.2018 at 23:01, ashish pok wrote: All, I have another slow Memory Leak situation using basic TimeSession Window (earlier it was GlobalWindow related that Fabian helped clarify). I

Memory Leak in ProcessingTimeSessionWindow

2018-06-15 Thread ashish pok
All, I have another slow Memory Leak situation using basic TimeSession Window (earlier it was GlobalWindow related that Fabian helped clarify).  I have a very simple data pipeline: DataStream processedData = rawTuples .window(ProcessingTimeSessionWindows.withGap(Time.seconds(A

IoT Use Case, Problem and Thoughts

2018-06-05 Thread ashish pok
Fabian, Stephan, All, I started a discussion a while back around having a form of event-based checkpointing policy that will help us in some of our high volume data pipelines. Here is an effort to put this in front of community and understand what capabilities can support these types of use cases

Re: Best way to clean-up states in memory

2018-05-14 Thread ashish pok
s purged. So you cannot (fully) rely on timers to clean up per-window state. Best, Fabian 2018-05-14 9:34 GMT+02:00 Kostas Kloudas : Hi Ashish, It would be helpful to share the code of your custom trigger for the first case. Without that, we cannot tell what state you create and how/when you u

Re: Best way to clean-up states in memory

2018-05-13 Thread ashish pok
ng Flink's RocksDBStateBackend? If your job accumulates state exceeding the available main memory, then you have to use a state backend which can spill to disk. The RocksDBStateBackend offers you exactly this functionality. Cheers, Till On Mon, Apr 30, 2018 at 3:54 PM, ashish pok wrote: All, I am

Best way to clean-up states in memory

2018-04-30 Thread ashish pok
All, I am noticing heap utilization creeping up slowly in a couple of apps, which eventually leads to an OOM issue. Apps only have 1 process function that caches state. I did make sure I have a clear method invoked when events are collected normally, on exception and on timeout. Are any other best

Re: Scaling down Graphite metrics

2018-04-16 Thread ashish pok
sted in. On 13.04.2018 18:52, ashish pok wrote: All, We love Flink's OOTB metrics but boy there is a ton :) Any way to scale them down (frequency and metric itself)? Flink apps are becoming a huge source of data right now. Thanks, -- Ashish

Scaling down Graphite metrics

2018-04-13 Thread ashish pok
All, We love Flink's OOTB metrics but boy there is a ton :) Any way to scale them down (frequency and metric itself)? Flink apps are becoming a huge source of data right now. Thanks, -- Ashish

Re: Fw: 1.4.2 - Unable to start YARN containers with more than 1 vCores per Task Manager

2018-04-03 Thread ashish pok
Thanks Shashank, I will check it out. -- Ashish On Tue, Apr 3, 2018 at 10:11 AM, shashank734 wrote: Check in your YARN configuration whether you are using DefaultResourceCalculator, which only uses memory while allocating resources. If so, you have to change it to DominantResourceCalculator.     ya

Fw: 1.4.2 - Unable to start YARN containers with more than 1 vCores per Task Manager

2018-04-03 Thread ashish pok
Hi All,   I had been using the following command in a Lab environment successfully in 1.3 Flink version.   yarn-session.sh -n 4 -s 4 -jm 2048 -tm 2048 -Dyarn.containers.vcores=2 -nm infra.test3   As expected, I see 4 TMs with 16 slots and taking 8 vCores from YARN. In a new Prod
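The numbers quoted in this snippet follow directly from the flags on the command line. As a quick sanity check, here is a minimal sketch in plain Java; the class and method names are hypothetical, it uses no Flink or YARN API, and it counts only TaskManager containers (the JobManager container is left out, which is an assumption made to match the figures in the message).

```java
/** Back-of-the-envelope check of the yarn-session.sh flags quoted above (hypothetical helper, not a Flink class). */
public class YarnSessionMath {

    /** -n task managers times -s slots per task manager. */
    static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    /** -n task managers times yarn.containers.vcores; the JobManager container is ignored here. */
    static int totalTaskManagerVcores(int taskManagers, int vcoresPerContainer) {
        return taskManagers * vcoresPerContainer;
    }

    public static void main(String[] args) {
        int n = 4;       // -n 4
        int s = 4;       // -s 4
        int vcores = 2;  // -Dyarn.containers.vcores=2
        System.out.println(totalSlots(n, s) + " slots, "
                + totalTaskManagerVcores(n, vcores) + " vCores for TaskManagers");
    }
}
```

With the values from the command this gives 16 slots and 8 vCores, matching what the message reports for the lab environment.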

Re: Error running on Hadoop 2.7

2018-03-26 Thread ashish pok
us. Thanks a lot! Cheers, Till On Wed, Mar 21, 2018 at 6:25 PM, ashish pok wrote: Hi Piotrek, At this point we are simply trying to start a YARN session. BTW, we are on Hortonworks HDP 2.6 which is on 2.7 Hadoop if anyone has experienced similar issues. We actually pulled 2.6 binaries for the

Re: "dynamic" bucketing sink

2018-03-26 Thread ashish pok
Hi Christophe, Have you looked at Kite SDK? We do something like this but using Gobblin and Kite SDK, which is a parallel pipeline to Flink. It feels like if you partition by something logical like topic name, you should be able to sink using Kite SDK. Kite allows you good ways to handle further

Re: Error running on Hadoop 2.7

2018-03-21 Thread ashish pok
On 21 Mar 2018, at 16:52, ashish pok wrote: Hi Piotrek, Yes, this is a brand new Prod environment. 2.6 was in our lab. Thanks, -- Ashish On Wed, Mar 21, 2018 at 11:39 AM, Piotr Nowojski wrote: Hi, Have you replaced all of your old Flink binaries with freshly downloaded Hadoop 2.7 versi

Re: Error running on Hadoop 2.7

2018-03-21 Thread ashish pok
ix in the process? Does some simple word count example work on the cluster after the upgrade? Piotrek On 21 Mar 2018, at 16:11, ashish pok wrote: Hi All, We ran into a roadblock in our new Hadoop environment, migrating from 2.6 to 2.7. It was supposed to be an easy lift to get a YARN sessio

Error running on Hadoop 2.7

2018-03-21 Thread ashish pok
Hi All, We ran into a roadblock in our new Hadoop environment, migrating from 2.6 to 2.7. It was supposed to be an easy lift to get a YARN session but it doesn't seem like it :) We definitely are using 2.7 binaries but it looks like there is a call here to a private method which screams runtime incompa

Restart hook and checkpoint

2018-03-02 Thread ashish pok
All, It looks like Flink's default behavior is to restart all operators on a single operator error - in my case it is a Kafka Producer timing out. When this happens, I see logs that all operators are restarted. This essentially leads to data loss. In my case the volume of data is so high that it

Re: Task Manager detached under load

2018-02-24 Thread ashish pok
taskmanager.Debug.memory.start logthread for debugging. Best regards, Lasse Nedergaard On 20 Jan 2018 at 12:57, Kien Truong wrote: Hi, You should enable and check your garbage collection log. We've encountered a case where Task Manager disassociated due to long GC paus

Re: Task manager not able to rejoin job manager after network hicup

2018-02-24 Thread ashish pok
We see the same in 1.4. I don't think we could see this in 1.3. I had started a thread a while back on this. Till asked for more details. I haven't had a chance to get back to him on this. If you can repro this easily perhaps you can get to it faster. I will find the thread and resend. Thanks, --

Re: Strata San Jose

2018-02-09 Thread ashish pok
Awesome, I will send a note from my work email :)  -- Ashish On Fri, Feb 9, 2018 at 5:12 AM, Fabian Hueske wrote: Hi Ashish, I'll be at Strata San Jose and give two talks. Just ping me and we can meet there :-) Cheers, Fabian 2018-02-09 0:53 GMT+01:00 ashish pok : Wondering if a

Strata San Jose

2018-02-08 Thread ashish pok
Wondering if any of the core Flink team members are planning to be at the conference? It would be great to meet in person. Thanks, -- Ashish

Kafka Producer timeout causing data loss

2018-01-19 Thread ashish pok
Team, One more question to the community regarding hardening Flink Apps. Let me start off by saying we do have known Kafka bottlenecks which we are in the midst of resolving. So during certain times of day, a lot of our Flink Apps are seeing Kafka Producer timeout issues. Most of the logs are som

Understanding Restart Strategy

2018-01-19 Thread ashish pok
Team, Hopefully, this is a quick one. We have set up the restart strategy as follows in pretty much all of our apps: env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, Time.of(30, TimeUnit.SECONDS))); This seems pretty straightforward. App should retry starting 10 times every 30 sec
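To make the expectation in this message concrete, the sketch below models a fixed-delay policy in plain Java. It is not Flink code: the class and method names are hypothetical, and it assumes the usual reading of the two arguments, namely at most 10 restart attempts with a 30 second wait before each one, after which the job is treated as failed.

```java
import java.time.Duration;

/** Plain-Java model of a fixed-delay restart policy (hypothetical names, not Flink classes). */
public class FixedDelayRestartSketch {
    private final int maxAttempts;
    private final Duration delay;
    private int attemptsUsed = 0;

    public FixedDelayRestartSketch(int maxAttempts, Duration delay) {
        this.maxAttempts = maxAttempts;
        this.delay = delay;
    }

    /** Returns true and counts the attempt if another restart is still allowed. */
    public boolean tryRestart() {
        if (attemptsUsed >= maxAttempts) {
            return false; // attempts exhausted: the job would be declared failed
        }
        attemptsUsed++;
        return true;
    }

    /** Total waiting time if every allowed attempt is used. */
    public Duration worstCaseWait() {
        return delay.multipliedBy(maxAttempts);
    }

    public static void main(String[] args) {
        FixedDelayRestartSketch policy =
                new FixedDelayRestartSketch(10, Duration.ofSeconds(30));
        int restarts = 0;
        while (policy.tryRestart()) {
            restarts++;
        }
        System.out.println(restarts + " restarts, up to "
                + policy.worstCaseWait().getSeconds() + " s of waiting");
    }
}
```

Under these assumptions the policy allows 10 restarts and at most 300 seconds of cumulative delay before giving up.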

Task Manager detached under load

2018-01-19 Thread ashish pok
Hi All, We have hit some load related issues and were wondering if anyone has some suggestions. We are noticing task managers and job managers being detached from each other under load and never really sync up again. As a result, Flink session shows 0 slots available for processing. Even though,

Re: Metric Registry Warnings

2017-11-13 Thread ashish pok
Thanks Fabian! Sent from Yahoo Mail on Android On Mon, Nov 13, 2017 at 4:44 AM, Fabian Hueske wrote: Hi Ashish, this is a known issue and has been fixed for the next version [1]. Best, Fabian [1] https://issues.apache.org/jira/browse/FLINK-7100 2017-11-11 16:02 GMT+01:00 Ashish Pokhare

Re: Kafka Consumer fetch-size/rate and Producer queue timeout

2017-11-07 Thread ashish pok
Thanks Fabian. I am seeing this consistently and can definitely use some help. I have plenty of Grafana views I can share if that helps :) Sent from Yahoo Mail on Android On Tue, Nov 7, 2017 at 3:54 AM, Fabian Hueske wrote: Hi Ashish, Gordon (in CC) might be able to help you. Cheers, Fab

Re: Capacity Planning For Large State in YARN Cluster

2017-10-30 Thread ashish pok
, Till On Mon, Oct 30, 2017 at 1:34 AM, ashish pok wrote: Jorn, correct and I suppose that's where we are at this point. RocksDB based backend is definitely looking promising for our use case. Since I haven't gotten a definite no-no on using 30% for YARN cut-off ratio (about 1.8GB from 6GB memory) and off-heap flag turned on, we will continue on that p

Re: Capacity Planning For Large State in YARN Cluster

2017-10-29 Thread ashish pok
Jorn, correct and I suppose that's where we are at this point. RocksDB based backend is definitely looking promising for our use case. Since I haven't gotten a definite no-no on using 30% for YARN cut-off ratio (about 1.8GB from 6GB memory) and off-heap flag turned on, we will continue on that p
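The "about 1.8GB from 6GB" figure in this message can be reproduced with a few lines of arithmetic. The sketch below is plain Java with hypothetical names. It assumes that 6GB means 6144 MB and that the cut-off is the larger of the ratio-based amount and a fixed minimum; the 600 MB minimum is an assumption for illustration, not something stated in the thread.

```java
/** Arithmetic sketch of a YARN container memory cut-off (hypothetical helper, not a Flink class). */
public class HeapCutoffSketch {

    /** Cut-off in MB: ratio-based amount, but never below the assumed minimum. */
    static long cutoffMb(long containerMb, int ratioPercent, long minCutoffMb) {
        return Math.max(containerMb * ratioPercent / 100, minCutoffMb);
    }

    public static void main(String[] args) {
        long containerMb = 6144; // 6 GB container, assuming 1 GB = 1024 MB
        int ratioPercent = 30;   // the 30% cut-off ratio from the message
        long minCutoffMb = 600;  // assumed minimum cut-off, for illustration only

        long cutoff = cutoffMb(containerMb, ratioPercent, minCutoffMb);
        long remaining = containerMb - cutoff;
        System.out.println("cutoff=" + cutoff + " MB, remaining for the TaskManager JVM="
                + remaining + " MB");
    }
}
```

With these inputs the cut-off is 1843 MB (roughly 1.8 GB), leaving 4301 MB of the container for the TaskManager process itself.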