Re: flink async snapshots

2016-05-19 Thread Stavros Kontopoulos
No problem ;) On Thu, May 19, 2016 at 9:54 PM, Abhishek R. Singh < abhis...@tetrationanalytics.com> wrote: > If you can take atomic in-memory copies, then it works (at the cost of > doubling your instantaneous memory). For larger state (say rocks DB), won’t > you have to stop the world (atomic s

Re: flink snapshotting fault-tolerance

2016-05-19 Thread Stavros Kontopoulos
Cool thnx Paris. On Thu, May 19, 2016 at 9:48 PM, Paris Carbone wrote: > Sure, in practice you can set a threshold of retries since an operator > implementation could cause this indefinitely or any other reason can make > snapshotting generally infeasible. If I recall correctly that threshold >

flink async snapshots

2016-05-19 Thread Abhishek R. Singh
If you can take atomic in-memory copies, then it works (at the cost of doubling your instantaneous memory). For larger state (say RocksDB), won’t you have to stop the world (atomic snapshot) and make a copy? Doesn’t that make it synchronous, instead of background/async? Sorry Stavros - for bu
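
The usual way around a stop-the-world copy is versioned or copy-on-write state: triggering a snapshot only captures a cheap immutable version, and the expensive write-out happens in the background. A minimal, Flink-independent Java sketch of that idea (all class and method names here are made up for illustration; real backends such as RocksDB use their own internal snapshot mechanism instead of copying the full map):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration only: updates produce a new immutable version, so triggering a
// snapshot just captures the current reference and the serialization/write
// runs on a background thread without pausing processing.
public class AsyncSnapshotSketch {
    private volatile Map<String, Long> state = Map.of();
    private final ExecutorService io = Executors.newSingleThreadExecutor();

    public void update(String key, long value) {
        Map<String, Long> next = new HashMap<>(state);
        next.put(key, value);
        state = Map.copyOf(next);           // copy-on-write; fine for small state
    }

    public void triggerSnapshot(long checkpointId) {
        Map<String, Long> captured = state; // atomic, O(1), no stop-the-world
        io.submit(() -> writeToDurableStore(checkpointId, captured));
    }

    private void writeToDurableStore(long checkpointId, Map<String, Long> snapshot) {
        // stand-in for the state backend's durable store
        System.out.println("checkpoint " + checkpointId + " -> " + snapshot);
    }
}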

Re: flink snapshotting fault-tolerance

2016-05-19 Thread Paris Carbone
Sure, in practice you can set a threshold of retries since an operator implementation could cause this indefinitely or any other reason can make snapshotting generally infeasible. If I recall correctly that threshold exists in the Flink configuration. On 19 May 2016, at 20:42, Stavros Kontopoul

Re: flink snapshotting fault-tolerance

2016-05-19 Thread Stavros Kontopoulos
The problem here is different though: if something keeps failing (permanently), in practice someone needs to be notified. If the user loses snapshotting, he must know. On Thu, May 19, 2016 at 9:36 PM, Abhishek R. Singh < abhis...@tetrationanalytics.com> wrote: > I was wondering how checkpoints can

Re: flink snapshotting fault-tolerance

2016-05-19 Thread Paris Carbone
Hi Abhishek, I don’t see the problem there (also this is unrelated to the snapshotting protocol). Intuitively, if you submit a copy of your state (full or delta) for a snapshot version/epoch to a store backend and validate the full snapshot for that version when you eventually receive the acknow

Re: flink snapshotting fault-tolerance

2016-05-19 Thread Abhishek R. Singh
I was wondering how checkpoints can be async? Because your state is constantly mutating. You probably need versioned state, or immutable data structs? -Abhishek- > On May 19, 2016, at 11:14 AM, Paris Carbone wrote: > > Hi Stavros, > > Currently, rollback failure recovery in Flink works in the

Re: flink snapshotting fault-tolerance

2016-05-19 Thread Paris Carbone
Invalidations are not necessarily exposed (I hope). Think of it as implementing TCP: you don’t have to warn the user that packets are lost, since eventually a packet will be received at the other side in an eventually synchronous system. Snapshots follow the same paradigm. Hope that helps. On 19

Re: flink snapshotting fault-tolerance

2016-05-19 Thread Stavros Kontopoulos
Yes, that's what I was thinking, thnx. When people hear exactly once they think: are you sure? There is something hidden there... because theory is theory :) So if you keep getting invalidated snapshots but data passes through operators, you issue a warning or fail the pipeline and return an exception t

Re: flink snapshotting fault-tolerance

2016-05-19 Thread Paris Carbone
In that case, typically a timeout invalidates the whole snapshot (all states for the same epoch) until eventually we have a full complete snapshot. On 19 May 2016, at 20:26, Stavros Kontopoulos <st.kontopou...@gmail.com> wrote: "Checkpoints are only confirmed if all parallel subtasks suc

Re: flink snapshotting fault-tolerance

2016-05-19 Thread Paris Carbone
True, if you like formal modelling and stuff like that you can think of it as a more relaxed/abortable operation (e.g. like abortable consensus) which yields the same guarantees and works ok in semi-synchronous distributed systems (as in the case of Flink). On 19 May 2016, at 20:22, Stavros Kon

Re: flink snapshotting fault-tolerance

2016-05-19 Thread Stavros Kontopoulos
"Checkpoints are only confirmed if all parallel subtasks successfully created a valid snapshot of the state." as stated by Robert. So to rephrase my question... how confirmation that all snapshots are finished is done and what happens if some task is very slow...or is blocked? If you have N tasks c

Re: flink snapshotting fault-tolerance

2016-05-19 Thread Stavros Kontopoulos
Hey thnx for the links. There are assumptions though, like reliable channels... since you rely on TCP in practice, and if a checkpoint fails or is very slow then you need to deal with it; that's why I asked previously what happens then.. 3pc does not need assumptions I think, but engineering is mo

Re: flink snapshotting fault-tolerance

2016-05-19 Thread Paris Carbone
Regarding your last question, if a checkpoint expires it just gets invalidated and a new complete checkpoint will eventually occur that can be used for recovery. If I am wrong, or something has changed please correct me. Paris On 19 May 2016, at 20:14, Paris Carbone <par...@kth.se> wro
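
For context, the expiry Paris describes corresponds to the configurable checkpoint timeout on the job. A minimal sketch; the CheckpointConfig call shown here is from later 1.x APIs, so treat its availability in 1.0.x as an assumption:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointTimeoutSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.enableCheckpointing(10_000);                 // trigger a checkpoint every 10s
        // A checkpoint still pending after this long is expired/invalidated;
        // the job keeps running and a later checkpoint can still complete.
        env.getCheckpointConfig().setCheckpointTimeout(60_000);
    }
}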

Re: flink snapshotting fault-tolerance

2016-05-19 Thread Paris Carbone
Hi Stavros, Currently, rollback failure recovery in Flink works at the pipeline level, not at the task level (see Millwheel [1]). It further builds on replayable stream logs (i.e. Kafka), thus there is no need for 3pc or backup in the pipeline sources. You can also check this presentation [2] w
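
To make the "replayable source" point concrete: with a Kafka source, the consumed offsets are part of each checkpoint and are simply rewound on recovery. A sketch against the Kafka 0.8 connector of that era; topic name, group id and addresses are placeholders:

import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class ReplayableSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(5_000);   // consumed offsets are stored with every checkpoint

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("zookeeper.connect", "localhost:2181");
        props.setProperty("group.id", "demo");

        // On restore the consumer seeks back to the offsets of the last completed
        // checkpoint, which is why no 3pc or source-side backup is needed.
        DataStream<String> events =
            env.addSource(new FlinkKafkaConsumer08<>("events", new SimpleStringSchema(), props));

        events.print();
        env.execute("replayable source sketch");
    }
}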

Connect 2 datastreams and iterate

2016-05-19 Thread Biplob Biswas
Hi, Is there a way to connect 2 datastreams and iterate and then get the feedback as a third stream? I tried doing mergedDS = datastream1.connect(datastream2) iterateDS = mergedDS.iterate().withFeedback(datastream3type.class) but this didn't work and it was showing me errors. Is there any oth
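
Not an answer from the thread, but for reference, a sketch of how an iteration with a differently-typed feedback edge is usually wired up in the DataStream API, via iterate().withFeedbackType(...) and closeWith(...); the streams and logic below are invented for illustration:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.IterativeStream.ConnectedIterativeStreams;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoMapFunction;

public class ConnectedIterationSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Long> input = env.generateSequence(0, 100);

        // Open an iteration whose feedback edge carries Strings instead of Longs.
        ConnectedIterativeStreams<Long, String> iteration =
            input.iterate().withFeedbackType(String.class);

        DataStream<String> step = iteration.map(new CoMapFunction<Long, String, String>() {
            @Override
            public String map1(Long value) { return "seen " + value; }        // forward input
            @Override
            public String map2(String feedback) { return "done " + feedback; } // handle feedback
        });

        // Elements matching the filter are fed back; everything else leaves the loop.
        iteration.closeWith(step.filter(s -> s.startsWith("seen")));
        step.print();

        env.execute("connected iteration sketch");
    }
}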

Re: custom sources

2016-05-19 Thread Abhishek R. Singh
Thanks - appreciate the response. The reason I want to control these things is this - my state grows and shrinks over time (user level windowing as application state). I would like to trigger checkpoints just after the state has been crunched/compressed (at the window boundary). Say I crunch ev

Re: flink snapshotting fault-tolerance

2016-05-19 Thread Stavros Kontopoulos
Cool thnx. So if a checkpoint expires, will the pipeline block or fail entirely, or only the specific task related to the operator (running along with the checkpoint task), or does nothing happen? On Tue, May 17, 2016 at 3:49 PM, Robert Metzger wrote: > Hi Stavros, > > I haven't implemented our checkpo

TypeExtractor:1672 - class org.joda.time.DateTime is not a valid POJO type

2016-05-19 Thread Flavio Pompermaier
Hi to all, I'm using Flink 1.0.2 and while testing the job I discovered that I have a lot of log lines with this error: *TypeExtractor:1672 - class org.joda.time.DateTime is not a valid POJO type*. Initially I thought I had forgotten to properly migrate my code from 0.10.x to 1.0.x as stated in [1] but then I checked
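
For what it's worth, this message is usually just a warning: DateTime has no default constructor or getters/setters, so it can never be analysed as a POJO and Flink falls back to Kryo for it. If the generic fallback is a concern, one common workaround is registering a dedicated serializer; a sketch assuming the joda-time serializer from the de.javakaffee kryo-serializers artifact is on the classpath:

import org.apache.flink.api.java.ExecutionEnvironment;
import org.joda.time.DateTime;

import de.javakaffee.kryoserializers.jodatime.JodaDateTimeSerializer;

public class JodaKryoRegistrationSketch {
    public static void main(String[] args) {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Register an efficient Kryo serializer for DateTime instead of relying on
        // Kryo's generic reflection-based handling of the non-POJO type.
        env.getConfig().registerTypeWithKryoSerializer(DateTime.class, JodaDateTimeSerializer.class);
    }
}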

inheritance of Program interface in Program.java in org.apache.flink.api.common

2016-05-19 Thread 윤형덕
Hello, first I'm sorry for my poor English. I was looking at PackagedProgram.java from org.apache.flink.client.program, and in the following constructor: PackagedProgram(File jarFile, List classpaths, String entryPointClassName, String... args) there is some code I couldn't understand. Please look

"Last One" Window

2016-05-19 Thread Artem Bogachev
Hi, I’ve faced a problem trying to model our platform using Flink Streams. Let me describe our model: // Stream of data, ex. stocks: (AAPL, 100.0), (GZMP, 100.0) etc. val realData: DataStream[(K, V)] = env.addSource(…) // Stream of forecasts (same format) based on some window aggregates val fo
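
The subject suggests keeping only the most recent value per key; if that is the intent, a keyed reduce whose state always holds the latest element is the simplest sketch (the types, field positions and sample values below are assumed):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LastOnePerKeySketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Double>> quotes = env.fromElements(
            Tuple2.of("AAPL", 100.0), Tuple2.of("GZMP", 100.0), Tuple2.of("AAPL", 101.5));

        // Per symbol, the reduce state is simply the last element that arrived,
        // so downstream always sees the most recent value for each key.
        quotes.keyBy(0)
              .reduce((previous, latest) -> latest)
              .print();

        env.execute("last value per key sketch");
    }
}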

Re: Performing Reduce on a group of datasets

2016-05-19 Thread Fabian Hueske
I think that sentence is misleading and refers to the internals of Flink. It should be removed, IMO. You can only union two DataSets. If you want to union more, you have to do it one by one. Btw. union does not cause additional processing overhead. Cheers, Fabian 2016-05-19 14:44 GMT+02:00 Rites
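
Chaining the binary union as Fabian describes looks like this (the data sets are placeholders):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class UnionChainSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> a = env.fromElements("a1", "a2");
        DataSet<String> b = env.fromElements("b1");
        DataSet<String> c = env.fromElements("c1", "c2");

        // union is binary, so additional inputs are folded in one by one;
        // it only merges the logical data sets and adds no processing overhead.
        DataSet<String> all = a.union(b).union(c);

        all.print();
    }
}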

Re: killing process in Flink cluster

2016-05-19 Thread Till Rohrmann
Could you check the logs in FLINK_HOME/log/. They might tell you what went wrong after submitting the job to the jobmanager. Cheers, Till On Wed, May 18, 2016 at 10:05 AM, Ramkumar wrote: > Hi, > > As you mentioned, my program contains a loop with infinite iterations. I > was not able to sto

Re: custom sources

2016-05-19 Thread Till Rohrmann
Hi Abhishek, you can implement custom sources by implementing the SourceFunction or the ParallelSourceFunction interface and then calling StreamExecutionEnvironment.addSource. At the moment, it is not possible to control manually or from a source function when to trigger a checkpoint. This is the
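
For reference, a minimal non-parallel source along the lines Till describes; the counter payload and sleep interval are invented for illustration, and checkpoint triggering stays with the runtime:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class CustomSourceSketch {
    // A simple SourceFunction; Flink decides when checkpoints are triggered.
    public static class CountingSource implements SourceFunction<Long> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<Long> ctx) throws Exception {
            long next = 0;
            while (running) {
                // Emit under the checkpoint lock so emission stays consistent
                // with any state captured by a concurrent checkpoint.
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(next++);
                }
                Thread.sleep(100);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Long> numbers = env.addSource(new CountingSource());
        numbers.print();
        env.execute("custom source sketch");
    }
}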

Re: Performing Reduce on a group of datasets

2016-05-19 Thread Ritesh Kumar Singh
Thanks for the reply Fabian, Though here's a small thing I found on the documentation page: https://ci.apache.org/projects/flink/flink-docs-release-0.8/programming_guide.html#transformations If you look into the Union section, "This operation happens implicitly if more than one data set is used f

Re: Zookeeper Session Timeout

2016-05-19 Thread Till Rohrmann
Hi Konstantin, I've checked and the CuratorFramework client should be started with the correct session timeout (see ZooKeeperUtils.java:90). However, the ZooKeeper server has a min and max session timeout value ( http://zookeeper.apache.org/doc/r3.3.1/zookeeperAdmin.html). This interval limits the
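
For anyone hitting the same clamping behaviour, the two knobs involved are the ZooKeeper server's session-timeout bounds and Flink's client-side session timeout; a sketch (the Flink key name is the 1.0-era HA key and should be double-checked against your version):

# zoo.cfg (ZooKeeper server) -- the server clamps client session timeouts to
# [minSessionTimeout, maxSessionTimeout]; the defaults are 2x and 20x tickTime.
tickTime=2000
minSessionTimeout=4000
maxSessionTimeout=60000

# flink-conf.yaml (client side)
recovery.zookeeper.client.session-timeout: 60000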

Re: Flink Version 1.1

2016-05-19 Thread Aljoscha Krettek
I'm afraid these will not be making it into the next release. I'm actively working on both of them, though mostly in the "ideas phase" still. On Wed, 18 May 2016 at 19:49 Vlad Podgurschi wrote: > Is there any chance any of these will make it into 1.1? > > https://issues.apache.org/jira/browse/FL

Re: 1.1-snapshot issues

2016-05-19 Thread Maximilian Michels
This should be resolved according to Apache Infra. On Tue, May 17, 2016 at 11:28 PM, Henry Saputra wrote: > Looks like it has been resolved, Could you try it again? > > On Tue, May 17, 2016 at 7:02 AM, Stefano Baghino > wrote: >> >> I believe that Apache repo is having some issues right now: >>