Restart from checkpoint after program failure

2018-10-17 Thread chrisr123
Hi Folks, I'm trying to restart my program with restored state from a checkpoint after a program failure (restart strategies tried but exhausted), but I'm not picking up the restored state. What am I doing wrong here? *Summary* I'm using a very simple app on 1 node just to learn checkpointing. A

Re: State Recovery when job fails and auto-recovers

2018-10-17 Thread Hequn Cheng
Hi Sameer, In case of a failure, the job restarts the operators and resets them to the latest successful checkpoint. So if you turn off checkpoints, all data will be lost. Generally speaking, snapshots are very lightweight and can be drawn frequently without much impact on performance. If it

State Recovery when job fails and auto-recovers

2018-10-17 Thread Sameer Wadkar
Hi, We have a job which is using ValueState. We have turned off checkpoints. The state is backed by RocksDB, which is backed by S3. If the job fails on any exception (e.g. partitions not available or an occasional S3 404 error) and auto-recovers, is the entire state lost or does it continue f

Re: How do I initialize the window state on first run?

2018-10-17 Thread Rafi Aroch
Hi Jiayi, This topic has been discussed by others, take a look here for some options by Lyft: https://youtu.be/WdMcyN5QZZQ Rafi On Fri, Oct 12, 2018, 16:51 bupt_ljy wrote: > Yes…that’s an option, but it’ll be very complicated because of our storage > and business. > > Now I’m trying to write a

Re: Take RocksDB state dump

2018-10-17 Thread Gyula Fóra
Hi, If you don't mind a bit of experimenting, I have some nice tooling for exactly this: https://github.com/king/bravo Let me know if it works :) Gyula Harshvardhan Agrawal wrote (on Wed, Oct 17, 2018, 21:50): > Hello, > > We are currently using a RocksDBStateBackend for our Flin

Take RocksDB state dump

2018-10-17 Thread Harshvardhan Agrawal
Hello, We are currently using a RocksDBStateBackend for our Flink pipeline. We want to analyze the data that is stored in the RocksDB state. Is there a recommended process to do that? The sst_dump tool available from RocksDB isn’t working for us and we keep on getting errors like “Snappy not supported

[ANNOUNCE] Weekly community update #42

2018-10-17 Thread Till Rohrmann
Dear community, this is the weekly community update thread #42. Please post any news and updates you want to share with the community to this thread. # Discussion about Flink SQL integration with Hive Xuefu started a discussion about how to integrate Flink SQL with the Hive ecosystem [1]. If tha

Re: Not all files are processed? Stream source with ContinuousFileMonitoringFunction

2018-10-17 Thread Juan Miguel Cejuela
Update: not 100% sure, but I think I fixed my bug. This is what I did: I reduced (quite a lot) the maximum number of parallel operations in my `AsyncDataStream`. I had set it initially to 1000. The default of 100 did not work for me either. But somehow when I set the value to 10, everything is wo

Re: Manual trigger the window in fold operator or incremental aggregation

2018-10-17 Thread Niels van Kaam
Sorry, I would not know that. I have worked with custom triggers, but haven't actually had to implement a custom window function yet. By looking at the interfaces I would not say that is possible. Niels On Wed, Oct 17, 2018 at 2:18 PM Ahmad Hassan wrote: > Hi Niels, > > Can we distinguish with

Re: IndexOutOfBoundsException on deserialization after updating to 1.6.1

2018-10-17 Thread Bruno Aranda
Hi, Thanks for your reply. We are still trying to isolate it, because this job was using a more complex state. I think it is caused by a case class that has an Option[MyOtherClass], and MyOtherClass is an enumerator, implemented using the enumeratum library. I have changed that option to be just a

Re: Either bug in the flink, or out of date documentation ( flink 1.6 , cancel API rest endpoint )

2018-10-17 Thread Chesnay Schepler
The section you're looking at is the legacy documentation which only applies if the cluster is running in legacy mode. You want to look at the "Dispatcher" section (https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/rest_api.html#dispatcher), which documents the PATCH opera
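
As a rough illustration of the difference discussed in this thread, the sketch below builds (but does not send) a cancel request against the Dispatcher-style endpoint, i.e. a PATCH on /jobs/:jobid rather than a DELETE on /jobs/:jobid/cancel. The host, port, and job ID are hypothetical placeholders, and the explicit `mode=cancel` query parameter is an assumption based on the Flink 1.6 REST documentation linked above; check it against the documentation for your version.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_cancel_request(base_url: str, job_id: str) -> Request:
    """Build (without sending) a PATCH request asking the Dispatcher to cancel a job."""
    # Assumption: the termination mode is passed as the "mode" query parameter.
    query = urlencode({"mode": "cancel"})
    url = f"{base_url.rstrip('/')}/jobs/{quote(job_id, safe='')}?{query}"
    return Request(url, method="PATCH")


# Hypothetical JobManager address and job ID, for illustration only.
request = build_cancel_request(
    "http://localhost:8081", "0123456789abcdef0123456789abcdef"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it; that step is left out here so the sketch runs without a cluster.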

Re: IndexOutOfBoundsException on deserialization after updating to 1.6.1

2018-10-17 Thread aitozi
Hi Bruno Aranda, Could you provide a complete example to reproduce the exception? Thanks, Aitozi -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Either bug in the flink, or out of date documentation ( flink 1.6 , cancel API rest endpoint )

2018-10-17 Thread Barisa Obradovic
In the Flink documentation, to cancel a job, a DELETE request should be made to /jobs/:jobid/cancel https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#job-cancellation However, when I run this command, I get a 404 back from the JobManager. After reading the s

Re: Manual trigger the window in fold operator or incremental aggregation

2018-10-17 Thread Ahmad Hassan
Hi Niels, Can we distinguish within the apply function of 'RichWindowFunction' whether it was called due to an onElement trigger call or an onProcessingTime trigger call of a custom Trigger? Thanks! On Wed, 17 Oct 2018 at 12:51, Niels van Kaam wrote: > Hi Zhen Li, > > You can control when a windowed st

Re: Checkpointing when one of the sources has completed

2018-10-17 Thread Paul Lam
Hi Niels, The link was broken, it should be https://issues.apache.org/jira/browse/FLINK-2491 . A similar question was asked a few days ago. Best, Paul Lam > On Oct 17, 2018, at 19:56, Niels van Kaam wrote: > > Hi All, > > Thanks for the responses,

Re: Checkpointing when one of the sources has completed

2018-10-17 Thread Niels van Kaam
Hi All, Thanks for the responses. The finished source explains my issue, then. I can work around the problem by letting my sources negotiate a "final" checkpoint via ZooKeeper. @Paul, I think your answer was meant for the earlier question asked by Joshua? Cheers, Niels On Wed, Oct 17, 2018 at 11

Re: Manual trigger the window in fold operator or incremental aggregation

2018-10-17 Thread Niels van Kaam
Hi Zhen Li, You can control when a windowed stream emits data with "Triggers". See: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#triggers Flink comes with a couple of default triggers, but you can also create your own by implementing https://ci.apache.o
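
To make the idea of a trigger concrete, here is a toy model in plain Python, not Flink code: an incremental fold whose result is emitted either when a count is reached or as soon as a user-supplied condition holds. The names `CountOrConditionTrigger` and `run_window` are hypothetical; only the result names echo Flink's `TriggerResult` values, and firing here also clears the accumulator, which corresponds to fire-and-purge behaviour.

```python
from enum import Enum


class TriggerResult(Enum):
    CONTINUE = "continue"
    FIRE_AND_PURGE = "fire_and_purge"


class CountOrConditionTrigger:
    """Fires after max_count elements, or earlier when condition(element) is true."""

    def __init__(self, max_count, condition):
        self.max_count = max_count
        self.condition = condition
        self.count = 0

    def on_element(self, element):
        self.count += 1
        if self.count >= self.max_count or self.condition(element):
            self.count = 0  # start counting again for the next firing
            return TriggerResult.FIRE_AND_PURGE
        return TriggerResult.CONTINUE


def run_window(elements, trigger, fold, initial):
    """Fold elements one by one; emit and reset the accumulator when the trigger fires."""
    emitted = []
    acc = initial
    for element in elements:
        acc = fold(acc, element)
        if trigger.on_element(element) is TriggerResult.FIRE_AND_PURGE:
            emitted.append(acc)
            acc = initial
    return emitted


# A negative value acts as the "manual" condition; otherwise fire every 3 elements.
trigger = CountOrConditionTrigger(max_count=3, condition=lambda x: x < 0)
results = run_window([1, -1, 2, 3, 4, 5], trigger, lambda acc, x: acc + x, 0)
print(results)
```

In this run the negative element forces an early emission, the next three elements fill the count, and the trailing element stays in the accumulator because nothing fires for it.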

IndexOutOfBoundsException on deserialization after updating to 1.6.1

2018-10-17 Thread Bruno Aranda
Hi, We are trying to update from 1.3.2 to 1.6.1, but one of our jobs keeps throwing an exception during deserialization: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:657) at java.util.ArrayList.get(ArrayList.java:433) at com.e

Manual trigger the window in fold operator or incremental aggregation

2018-10-17 Thread zhen li
Hi all: How can I trigger the window manually in a fold operator or incremental aggregation? For example, when some conditions are met, although the count window or time window is not reach

Re: Checkpointing when one of the sources has completed

2018-10-17 Thread Joshua Fan
Hi Niels, Probably not. An operator does not begin its checkpoint until it has received the barriers from all the upstream sources; if one source cannot send a barrier, the downstream operator cannot checkpoint, FYI. Yours sincerely, Joshua On Wed, Oct 17, 2018 at 4:58 PM Niels van Kaam wrote: > Hi

Re: Checkpointing when one of the sources has completed

2018-10-17 Thread Fabian Hueske
Hi Niels, Checkpoints can only complete if all sources are running. That's because the checkpoint mechanism relies on injecting checkpoint barriers into the stream at the sources. Best, Fabian On Wed, Oct 17, 2018 at 11:11, Paul Lam wrote: > Hi Niels, > > Please see https://issues.apache
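
The barrier behaviour described in these replies can be pictured with a small toy model. This is a plain-Python sketch under simplifying assumptions, not Flink's implementation: the class `AligningOperator` and the channel names are hypothetical, and the operator simply records a checkpoint as done once a barrier with that ID has arrived on every input channel.

```python
class AligningOperator:
    """Toy operator that snapshots only after every input channel delivered the barrier."""

    def __init__(self, input_channels):
        self.input_channels = set(input_channels)
        self.pending = {}    # checkpoint id -> channels whose barrier has arrived
        self.completed = []  # checkpoint ids for which a snapshot was taken

    def on_barrier(self, checkpoint_id, channel):
        """Record a barrier; return True if this barrier completes the checkpoint."""
        if channel not in self.input_channels:
            raise ValueError(f"unknown input channel: {channel}")
        arrived = self.pending.setdefault(checkpoint_id, set())
        arrived.add(channel)
        if arrived == self.input_channels:
            del self.pending[checkpoint_id]
            self.completed.append(checkpoint_id)
            return True
        return False


op = AligningOperator(["source-a", "source-b"])
op.on_barrier(1, "source-a")
op.on_barrier(1, "source-b")  # both barriers arrived: checkpoint 1 completes
op.on_barrier(2, "source-a")  # source-b has finished and never sends barrier 2
print(op.completed, sorted(op.pending))
```

Checkpoint 2 stays pending indefinitely in this model, which mirrors the halted periodic checkpoints reported at the start of the thread.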

Re: Checkpointing when one of the sources has completed

2018-10-17 Thread Paul Lam
Hi Niels, Please see https://issues.apache.org/jira/browse/FLINK-249 . Best, Paul Lam > On Oct 17, 2018, at 16:58, Niels van Kaam wrote: > > Hi All, > > I am debugging an issue where the periodic checkpointing has halted. I > noticed that one of the so

Get nothing from TaskManager in UI

2018-10-17 Thread Joshua Fan
Hi all, Frequently, for some clusters, the UI shows no data from the Task Manager, as the picture below shows. [image: tm-hang.png] But the cluster and the job are running well; only the metrics are unavailable. Is there anything we can do to improve this? Thanks for your assistance. Yours sincerely, Joshua

Checkpointing when one of the sources has completed

2018-10-17 Thread Niels van Kaam
Hi All, I am debugging an issue where the periodic checkpointing has halted. I noticed that one of the sources of my job has completed (finished). The other sources and operators would however still be able to produce output. Does anyone know if Flink's periodic checkpoints are supposed to contin

Re: Need help to understand memory consumption

2018-10-17 Thread jpreisner
Hi all, Thanks for the answers. I confirm I have streaming jobs. To summarize: - "When the job is cancelled, these managed memories will be released to the MemoryManager but not recycled by gc, so you will see no changes in memory consumption" is incorrect because MemoryManager functionality is av

Re: Need help to understand memory consumption

2018-10-17 Thread Fabian Hueske
Hi, As was said before, managed memory (as described in the blog post [1]) is only used for batch jobs. By default, managed memory is only lazily allocated, i.e., when a batch job is executed. Streaming jobs maintain state in state backends. Flink provides state backends that store the state on t