Re:are there any ways to test the performance of rocksdb state backend?

2018-05-17 Thread sihua zhou
Hi makeyang, there are some cases under _org.apache.flink.contrib.streaming.state.benchmark.*_ that you can refer to. But, I not sure whether it's possible to upgrade the RocksDB to any higher version because the regression of the merge operator, the comments in this PR https://github.com/apa

are there any ways to test the performance of rocksdb state backend?

2018-05-17 Thread makeyang
I'd like to integrate newer version of rocksdb with flink. I'd like to know if there are existing tools/ways to benchmark the performance of rocksdb state backend to see if there are performence improve or drop? MaKeyang TIG.JD.COM -- Sent from: http://apache-flink-user-mailing-list-archive.2

Re: flink list -r shows CANCELED jobs - Flink 1.5

2018-05-17 Thread Rong Rong
Hi Edward, I dug a little into the CLIFrontend code and seems like there's some discrepancies between the description and the result return from flink list -r I have documented the issue in [1]. Please feel free to comment if I missed anything. Thanks, Rong Reference: [1] https://issues.apache.o

Re: flink list -r shows CANCELED jobs - Flink 1.5

2018-05-17 Thread Rong Rong
This sounds like a bug to me, I can reproduce with the latest RC#3 of Flink 1.5 with: bin/start-cluster.sh > bin/flink run examples/streaming/WordCount.jar > bin/flink list -r > Would you please file a JIRA bug report? I will look into it -- Rong On Thu, May 17, 2018 at 10:50 AM, Edward Rojas

Re: Message guarantees with S3 Sink

2018-05-17 Thread Gary Yao
Hi Amit, The BucketingSink doesn't have well defined semantics when used with S3. Data loss is possible but I am not sure whether it is the only problem. There are plans to rewrite the BucketingSink in Flink 1.6 to enable eventually consistent file systems [1][2]. Best, Gary [1] http://apache-f

Re: flink list -r shows CANCELED jobs - Flink 1.5

2018-05-17 Thread Edward Rojas
I forgot to add an example of the execution: $ ./bin/flink list -r Waiting for response... -- Running/Restarting Jobs --- 17.05.2018 19:34:31 : edec969d6f9609455f9c42443b26d688 : FlinkAvgJob (CANCELED) 17.05.2018 19:36:01 : bd87ffc35e1521806928d6251990d715 : FlinkAv

Re: Message guarantees with S3 Sink

2018-05-17 Thread Amit Jain
Hi Rong, We are using BucketingSink only. I'm looking for the case where TM does not get the chance to call Writer#flush like YARN killed the TM because of OOM. We have configured fs.s3.impl to com.amazon.ws.emr.hadoop.fs.EmrFileSystem in core-site.xml, so BucketingSink is using S3 client internal

flink list -r shows CANCELED jobs - Flink 1.5

2018-05-17 Thread Edward Rojas
Hello all, On Flink 1.5, the CLI returns the CANCELED jobs when requesting only the running job by using the -r flag... is this an intended behavior ? On 1.4 CANCELED jobs does not appear when running this command. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.c

Re: Message guarantees with S3 Sink

2018-05-17 Thread Rong Rong
Hi Amit, Can you elaborate how you write using "S3 sink" and which version of Flink you are using? If you are using BucketingSink[1], you can checkout the API doc and configure to flush before closing your sink. This way your sink is "integrated with the checkpointing mechanism to provide exactly

LocatableInputSplit flink and not Assigning properly

2018-05-17 Thread Daniel Tavares
So I have a cluster with one job manager and two task manager with the IP 192.168.112.74 and 192.168.112.75. Each task manager specific has specific data I want to read and for that, I created LocatableInputSPlit one for each task manager. I am using the LocatableInputSplitAssigner, however, when I

Re: Async Source Function in Flink

2018-05-17 Thread Federico D'Ambrosio
I see, thank you very much for your answer! I'll look into pool connection handling. Alternatively, I suppose that since it is a SourceFunction, even synchronous calls may be used without side effects in Flink? Thank you, Federico Il giorno mar 15 mag 2018 alle ore 16:16 Timo Walther ha scritto

Re: Missing MapState when Timer fires after restored state

2018-05-17 Thread Stefan Richter
Hi, > > This raises a couple of questions: > - Is it a bug though, that the state restoring goes wrong like it does for my > job? Based on my experience it seems like rescaling sometimes works, but then > you can have these random errors. If there is a problem, I would still consider it a bug

Message guarantees with S3 Sink

2018-05-17 Thread Amit Jain
Hi, We are using Flink to process click stream data from Kafka and pushing the same in 128MB file in S3. What is the message processing guarantees with S3 sink? In my understanding, S3A client buffers the data on memory/disk. In failure scenario on particular node, TM would not trigger Writer#clo