Re: RocksDB KeyValue store

2019-07-29 Thread taher koitawala
I believe Flink serialization is really fast and GC is much better from Flink 1.6 release, along side the state depends on what you do with it. each task manager has its own instance of rocks DB and is responsible for snapshot for his own instance upon checkpointing. Further more if you used a key

RocksDB KeyValue store

2019-07-29 Thread Navneeth Krishnan
Hi All, I looked at the RocksDB KV store implementation and I found that deserialization has to happen for each key lookup. Given a scenario where the key lookup has to happen for every single message would it still be a good idea to store it in rocksdb store or would in-memory store/cache be more

Re: Event time window eviction

2019-07-29 Thread taher koitawala
Hi Navneeth, There are 3 ways we can work with data now flowing and windows and not being fired because watermark flow stopped. 1. Write a custom trigger which fires the window if no elements arrive. 2. Your watermark assigner function can also house the logic that if no more watermarks are

Re:StreamingFileSink part file count reset

2019-07-29 Thread Haibo Sun
Hi Sidhartha, Currently, the part counter is never reset to 0, nor is it allowed to customize the part filename. So I don't think there's any way to reset it right now. I guess the reason why it can't be reset to 0 is that it is concerned that the previous parts will be overwritten. Although

Re: Event time window eviction

2019-07-29 Thread Navneeth Krishnan
Thanks Taher. Are there any examples for this? In my scenario I would have data coming in and it might stop for sometime but I need the window to end after the duration. Also, I believe in version 1.3 the event time will progress only if all partitions in a kafka topic pass the event time. Is that

Re: Graceful Task Manager Termination and Replacement

2019-07-29 Thread Biao Liu
Hi Yu, That's a great proposal. Wish to see this feature soon! On Mon, Jul 29, 2019 at 4:59 PM Yu Li wrote: > Belated but FWIW, besides the region failover and best-efforts failover > efforts, I believe stop with checkpoint as proposed in FLINK-12619 and > FLIP-45 could also help here, FYI. > >

Re: Savepoint process recovery in Jobmanager HA setup

2019-07-29 Thread Bajaj, Abhinav
Thanks much for your response. I was also suspecting the same and just wanted to confirm. I guess the best way forward for now is to request savepoint again. ~ Abhi From: Yun Tang Date: Saturday, July 27, 2019 at 7:35 AM To: "Bajaj, Abhinav" , "user@flink.apache.org" Subject: Re: Savepoint pr

StreamingFileSink part file count reset

2019-07-29 Thread sidhartha saurav
Hi, We are using StreamingFileSink with a custom BucketAssigner and DefaultRollingPolicy. The custom BucketAssigner is simply a date bucket assigner. The StreamingFileSink creates part files with name "part--". The count is an integer and is incrementing on each rollover. Now my doubts are: 1. Wh

Apache Flink and additional fileformats (Excel, blockchains)

2019-07-29 Thread Jörn Franke
Hi, I wrote some time ago several connectors for Apache Flink that are open sourced under the Apache License: * HadoopOffice: Process Excel files (reading/writing) (.xls,.xlsx) in Flink using the datasource or Table API: https://github.com/ZuInnoTe/hadoopoffice/wiki/Using-Apache-Flink-to-read-wri

Re: Is Queryable State not working in standalone cluster with 1 task manager?

2019-07-29 Thread Oytun Tez
And, Flink version: 1.8, incl. Docker container (official flink:1.8 tag) --- Oytun Tez *M O T A W O R D* The World's Fastest Human Translation Platform. oy...@motaword.com — www.motaword.com On Mon, Jul 29, 2019 at 4:32 PM Oytun Tez wrote: > Hi there, > > We have a job that is run inside Dock

Is Queryable State not working in standalone cluster with 1 task manager?

2019-07-29 Thread Oytun Tez
Hi there, We have a job that is run inside Docker via `standalone-job.sh start-foreground --job-classname`, with 1 task manager. I've been trying to make QueryableState available in this setup for 2 days now and I can't seem to enable it. If I create a LocalEnvironment within the code itself and

FlatMap returning Row<> based on ArrayList elements()

2019-07-29 Thread Andres Angel
Hello everyone, I need to parse into an anonymous function an input data to turn it into several Row elements. Originally I would have done something like Row.of(1,2,3,4) but these elements can change on the flight as part of my function. This is why I have decided to store them in a list and righ

Re: Event time window eviction

2019-07-29 Thread taher koitawala
I believe the approach to this is wrong... For fixing windows we can write our custom triggers to fire them... However what I'm not convinced with is switching between event and processing time. Write a custom triggers and fire the event time window if you don't see any activity. That's th

Re: Event time window eviction

2019-07-29 Thread Navneeth Krishnan
Hi All, Any suggestions? Thanks On Thu, Jul 25, 2019 at 11:45 PM Navneeth Krishnan wrote: > Hi All, > > I'm working on a very short tumbling window for 1 second per key. What I > want to achieve is if the event time per key doesn't progress after a > second I want to evict the window, basicall

Re: Memory constrains running Flink on Kubernetes

2019-07-29 Thread wvl
Excellent. Thanks for all the answers so far. So there was another issue I mentioned which we made some progress gaining insight into, namely our metaspace growth when faced with job restarts. We can easily hit 1Gb metaspace usage within 15 minutes if we restart often. We attempted to troubleshoo

Re: Checkpoints timing out for no apparent reason

2019-07-29 Thread spoganshev
Switching to 1.8 didn't help. Timeout exception from Kinesis is a consequence, not a reason. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Why is the size of each checkpoint increasing?

2019-07-29 Thread Till Rohrmann
I think the size of the checkpoint strongly depends on the data you are feeding into this function. Depending on the actual values, it might be that you never fire the window. Please verify what carData actually returns. Cheers, Till On Mon, Jul 29, 2019 at 11:09 AM 陈Darling wrote: > > Flink ve

Re: Extending REST API with new endpoints

2019-07-29 Thread Fabian Hueske
Hi Oytun, Thanks for your input and feature request! The right way to propose a feature and contribute it is described here [1]. Basically, you should open a Jira issue and start a discussion about the feature there. If it is a bigger features, you should also bring it to the dev@f.a.o mailing li

Re: LEFT JOIN issue SQL API

2019-07-29 Thread Fabian Hueske
If you need an outer join, the only solution is to convert the table into a retraction stream and correctly handle the retraction messages. Btw. even then this might not perform as you would like it to be. The query will store all input tables completely in state. So you might run out of space soon

Why is the size of each checkpoint increasing?

2019-07-29 Thread 陈Darling
Flink version is 1.81 The eaxmple is adapted according to TopSpeedWindowing DataStream> topSpeeds = carData .assignTimestampsAndWatermarks(new CarTimestamp()).setParallelism(parallelism) .keyBy(0) .countWindow(countSize, slideSize) .trigger(DeltaTrigger.of(triggerM

Re: Graceful Task Manager Termination and Replacement

2019-07-29 Thread Yu Li
Belated but FWIW, besides the region failover and best-efforts failover efforts, I believe stop with checkpoint as proposed in FLINK-12619 and FLIP-45 could also help here, FYI. W.r.t k8s, there're also some offline discussion about supporting local recovery with persistent volume even when task a

Re: Memory constrains running Flink on Kubernetes

2019-07-29 Thread Yu Li
For the memory usage of RocksDB, there's already some discussion in FLINK-7289 and a good suggestion