+1 to the locking interface for the release. I agree with what Marton that it seems that interruptability is to much of a burden on the sources.
The code docs should be very clear (and concise (!)) about why the locking is needed etc. – Ufuk On 31 May 2015, at 14:52, Gyula Fóra <gyula.f...@gmail.com> wrote: > Alright, let's do the locking then :) > > Let's keep only one interface for the release. > > On Sun, May 31, 2015 at 12:58 PM, Márton Balassi <balassi.mar...@gmail.com> > wrote: > >> I am also for having only one source interface. It seems that >> interruptability is to much of a burden on the sources, locking version >> should be still acceptable from the user point of view. We are dealing with >> inherently concurrent tasks, I suppose our users are familiar with locking >> - especially the ones in need for exactly once processing. >> >> On Sat, May 30, 2015 at 2:44 AM, Aljoscha Krettek <aljos...@apache.org> >> wrote: >> >>> I would also prefer having only one source. The PR still has both >>> variants so that people can check them out. >>> >>> In my opinion the assumptions about interruptibility are easier to >>> break than the requirement of locking. Even if we get the kafka source >>> to work with the interruptions (which I doubt, because this fails >>> somewhere in their code) this would not guarantee that this will >>> always work in future versions. With the locking you either have the >>> locking, then it is correct (even for feature versions) or you don't, >>> then it is immediately incorrect. >>> >>> On Fri, May 29, 2015 at 10:56 PM, Gyula Fóra <gyula.f...@gmail.com> >> wrote: >>>> Hey, >>>> >>>> It seems like both interfaces are pretty much capable of doing the same >>>> thing but work on slightly different assumptions. >>>> >>>> Isn't there a way that the kafka source can work with the >> interruptions? >>> I >>>> think the reachedEnd/next interface is slightly easier to grasp than >> the >>>> run() with the locks. But in any case I would slightly prefer having >> only >>>> one of them if they can technically do the same thing. >>>> >>>> Also adding a new interface means we add a new streamtask >> implementation >>>> which is also getting slightly too much. >>>> >>>> What is you opinion on this? >>>> >>>> Gyula >>>> >>>> >>>> >>>> On Fri, May 29, 2015 at 6:51 PM, Aljoscha Krettek <aljos...@apache.org >>> >>>> wrote: >>>> >>>>> Hi All, >>>>> after finishing my pull request that should fix the problems with the >>>>> synchronisation of checkpoints and element emission (the reason for >>>>> the faulty results of the exactly-once tests) I discovered that the >>>>> Kafka source does not deal well with being interrupted. We recently >>>>> changed the SourceFunction to the reachedEnd()/next() interface, with >>>>> the contract that the source must be interruptible to be able to >>>>> perform checkpoints. Now this doesn't seem to work with Kafka. I added >>>>> another Source interface in my PR >>>>> (https://github.com/apache/flink/pull/742). This is similar to the >> old >>>>> interface of run()/cancel(), with the addition that the source must >>>>> acquire a lock before updating state and emitting elements. The update >>>>> of state and the emission of elements must happen in the same >>>>> synchronized block to ensure consistency. This seems to solve the >>>>> problem but now we have two source interfaces. >>>>> >>>>> The question is now. What do you think about the two interfaces? >>>>> Should we keep both? Remove one? >>>>> >>>>> Cheers, >>>>> Aljoscha >>>>> >>> >>