Re: Making state in streaming more explicit

2015-04-30 Thread Stephan Ewen
That would be one way of doing it, yes... On Thu, Apr 30, 2015 at 10:23 PM, Gyula Fóra wrote: > Okay, so the commit would be something like: > > commitState(OperatorState state) > > > On Thu, Apr 30, 2015 at 10:17 PM, Stephan Ewen wrote: > > > I think your assumption (and the current kafka sour

Re: Making state in streaming more explicit

2015-04-30 Thread Gyula Fóra
Okay, so the commit would be something like: commitState(OperatorState state) On Thu, Apr 30, 2015 at 10:17 PM, Stephan Ewen wrote: > I think your assumption (and the current kafka source implementation) is > that there is one state object that you update/mutate all the time. > > If you draw a

Re: Change Streaming Source Function Interface

2015-04-30 Thread Gyula Fóra
Okay, sounds very reasonable :) On Thu, Apr 30, 2015 at 10:15 PM, Stephan Ewen wrote: > For the variant with the "run()" method, this requires strong assumptions > about the internal behavior of the source. > > Unless I am overlooking something, the source needs to guarantee this: > > - It need

Re: Making state in streaming more explicit

2015-04-30 Thread Stephan Ewen
I think your assumption (and the current kafka source implementation) is that there is one state object that you update/mutate all the time. If you draw a snapshot state object at the time of checkpoint, the source can continue and that particular offset is remembered as the state of this checkpoi
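
A minimal sketch of the snapshot-then-commit idea discussed in this thread, assuming the state is a single offset. The names OffsetSnapshot and SnapshottingOffsetSource are hypothetical and are not Flink classes; commitState mirrors the method name floated in the thread, but its signature here is an assumption. Because the snapshot is an immutable object that is handed back on commit, the source can keep mutating its live state without tracking which offset belonged to which checkpoint.

```java
/** Hypothetical immutable snapshot of a source offset; not a Flink class. */
final class OffsetSnapshot {
    final long offset;

    OffsetSnapshot(long offset) {
        this.offset = offset;
    }
}

/** Hypothetical source that separates its live state from checkpoint snapshots. */
class SnapshottingOffsetSource {
    private long currentOffset = 0;    // live state, mutated while the source runs
    private long committedOffset = -1; // last offset confirmed by a completed checkpoint

    /** Simulates reading one record and advancing the live state. */
    void advance() {
        currentOffset++;
    }

    /** Draws an immutable copy of the state at checkpoint time. */
    OffsetSnapshot snapshotState() {
        return new OffsetSnapshot(currentOffset);
    }

    /** Called once the checkpoint is complete; receives the snapshot drawn earlier. */
    void commitState(OffsetSnapshot state) {
        committedOffset = state.offset;
    }

    long currentOffset() {
        return currentOffset;
    }

    long committedOffset() {
        return committedOffset;
    }
}
```

In this sketch the commit uses the offset stored in the snapshot object, not the live offset, so records read between the checkpoint and the commit do not affect what is committed.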

Re: Change Streaming Source Function Interface

2015-04-30 Thread Stephan Ewen
For the variant with the "run()" method, this requires strong assumptions about the internal behavior of the source. Unless I am overlooking something, the source needs to guarantee this: - It needs to lock internally and perform the state update and record emit call inside the locked scope -
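
As an illustration of the locking requirement described above, here is a minimal sketch of a source that performs the state update and the record emit inside one locked scope, and takes its snapshot under the same lock. LockingCounterSource and its methods are hypothetical names, not Flink's source interface, and the in-memory list merely stands in for the real output collector.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not Flink's actual source function interface. */
class LockingCounterSource {
    private final Object lock = new Object();
    private long offset = 0;                              // the source's state
    private final List<Long> emitted = new ArrayList<>(); // stand-in for the output collector

    /** Emits one record and advances the state within a single locked scope. */
    void emitNext() {
        synchronized (lock) {
            emitted.add(offset);
            offset++;
        }
    }

    /**
     * Reads the state under the same lock, so no record can be emitted
     * between reading the state and returning it.
     */
    long snapshotState() {
        synchronized (lock) {
            return offset;
        }
    }

    /** Number of records emitted so far. */
    int emittedCount() {
        synchronized (lock) {
            return emitted.size();
        }
    }
}
```

Since both paths share one lock, a snapshot always reflects exactly the records emitted before it. In a real runtime the barrier would also have to be sent while the lock is held, which this sketch does not model.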

Re: Making state in streaming more explicit

2015-04-30 Thread Gyula Fóra
Regarding the commits (for instance, the Kafka offset): I don't exactly get how you mean to do this. If the source continues processing after the checkpoint and before the commit, it will not know exactly what state has been committed, so it would need to know the time of the checkpoint and store a local co

Re: Change Streaming Source Function Interface

2015-04-30 Thread Gyula Fóra
Hi, The only thing we need is to guarantee that the source will not output any records or update the state while we take the snapshot and send the barrier. There are multiple ways of doing this I guess. We could simply lock on these objects for instance or add the methods you wrote. If we lock, we

Re: Making state in streaming more explicit

2015-04-30 Thread Stephan Ewen
Thanks for the comments! Concerning acknowledging the checkpoint: The sinks definitely need to acknowledge it. If we asynchronously write the state of an operator (and emit downstream barriers before that is complete), then I think that we also need those operators to acknowledge the checkp

Re: Making state in streaming more explicit

2015-04-30 Thread Paris Carbone
I agree with all suggestions, thanks for summing it up Stephan. A few more points I have in mind at the moment: - Regarding the acknowledgements, indeed we don’t need to make all operators commit back, we just have to make sure that all sinks have acknowledged a checkpoint to consider it comple

Re: Gzip support

2015-04-30 Thread Robert Metzger
There is already support for inflate compressed files and I introduced logic to handle unsplittable formats. Sent from my iPhone > On 30.04.2015, at 19:39, Stephan Ewen wrote: > > I think that would be very worthwhile :-) Happy to hear that you want to > contribute that! > > Decorating the i

Making state in streaming more explicit

2015-04-30 Thread Stephan Ewen
I was looking into the handling of state in streaming operators, and it is a bit hidden from the system. Right now, functions can (if they want) put some state into their context. At runtime, state may occur or not. Before runtime, the system cannot tell which operators are going to be stateful, an

Change Streaming Source Function Interface

2015-04-30 Thread Stephan Ewen
Hi all! I think we need to change the interface of the streaming source function. The function currently has simply a run() method where it does its work, until canceled. With this, it is hard to write sources, where the state and the snapshot barriers are exactly aligned. When performing the ch

Re: Gzip support

2015-04-30 Thread Stephan Ewen
I think that would be very worthwhile :-) Happy to hear that you want to contribute that! Decorating the input stream sounds like a great approach and would also work for other compression formats. The other thing that needs to be taken into account is that GZIP files are not splittable in the sa
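
A minimal sketch of the stream-decorating approach mentioned above, using only the JDK's java.util.zip classes. GzipDecoratorSketch, decorate, and the choice to detect compression by a ".gz" file-name suffix are assumptions made for illustration; this is not Flink's input format API. The sketch reads a file as one unit, in line with the point that gzip files cannot be split like plain files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper; not part of Flink. */
class GzipDecoratorSketch {

    /** Wraps the raw stream in a GZIPInputStream when the name ends with ".gz" (assumed convention). */
    static InputStream decorate(InputStream raw, String fileName) throws IOException {
        return fileName.endsWith(".gz") ? new GZIPInputStream(raw) : raw;
    }

    /** Compresses text to gzip bytes; only used to build sample input. */
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the whole (possibly compressed) content through the decorated stream. */
    static String readAll(byte[] fileBytes, String fileName) {
        try (InputStream in = decorate(new ByteArrayInputStream(fileBytes), fileName)) {
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                result.write(chunk, 0, n);
            }
            return new String(result.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The code that consumes the stream stays unchanged whether or not the input is compressed, which is why the same decoration could in principle be applied for other compression formats.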

Re: Migrating our website from SVN to Git

2015-04-30 Thread Henry Saputra
Sounds good to me =) - Henry On Thu, Apr 30, 2015 at 2:29 AM, Maximilian Michels wrote: > Hi everyone, > > As of today [0], the ASF officially offers Git-based repositories for the > project websites. I filed a JIRA [1] to get us a Git repository for our > website. I would assume that everyone li

Re: Migrating our website from SVN to Git

2015-04-30 Thread Robert Metzger
Thank you for the update. I think it's best if we wait for you to come back because you know all the details of the migration ;) On Thu, Apr 30, 2015 at 6:07 PM, Maximilian Michels wrote: > I transferred the website files to the new Git repository. I changed the > website to build to the "conten

Re: Migrating our website from SVN to Git

2015-04-30 Thread Maximilian Michels
I transferred the website files to the new Git repository. I changed the website to build to the "content" instead of the "site" directory, renamed the files, and adapted the "How to contribute" guide. I pushed everything to the "asf-site" branch (has to be in that branch, master is not possible at

Gzip support

2015-04-30 Thread Kruse, Sebastian
Hi everyone, I just recently came across a use-case where I needed to read gzip files and handle byte order marks transparently. I know that gzip can be read with Hadoop input formats but that did not work for me since I wanted to reuse my existing custom Flink input formats. It turned out tha

[jira] [Created] (FLINK-1965) Implement the Orthant-wise Limited Memory QuasiNewton optimization algorithm, a variant of L-BFGS that handles L1 regularization

2015-04-30 Thread Theodore Vasiloudis (JIRA)
Theodore Vasiloudis created FLINK-1965: Summary: Implement the Orthant-wise Limited Memory QuasiNewton optimization algorithm, a variant of L-BFGS that handles L1 regularization Key: FLINK-1965 URL: https://

[jira] [Created] (FLINK-1964) Rework TwitterSource to use a Properties object instead of a file path

2015-04-30 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-1964: Summary: Rework TwitterSource to use a Properties object instead of a file path Key: FLINK-1964 URL: https://issues.apache.org/jira/browse/FLINK-1964 Project: Flink

[jira] [Created] (FLINK-1963) Improve distinct() transformation

2015-04-30 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-1963: Summary: Improve distinct() transformation Key: FLINK-1963 URL: https://issues.apache.org/jira/browse/FLINK-1963 Project: Flink Issue Type: Improvement

Re: Question about SlidingPreReducers

2015-04-30 Thread Szabó Péter
So, would it be a reasonable solution to just modify the WindowUtils.isParallelPolicy() method to return false in case of "eviction instanceof CountEvictionPolicy && trigger instanceof TimeTriggerPolicy" ? 2015-04-30 12:21 GMT+02:00 Gyula Fóra : > I'm referring to Peter's problem. If we just crea

Re: Question about SlidingPreReducers

2015-04-30 Thread Gyula Fóra
I'm referring to Peter's problem. If we just create more count discretizers it doesn't really break the semantics given the network guarantees but it is not very intuitive. On Thursday, April 30, 2015, Aljoscha Krettek wrote: > @Gyula: Are you referring to the pre-aggregator or the thing Peter >

Re: Migrating our website from SVN to Git

2015-04-30 Thread Chiwan Park
Great! :) Regards. Chiwan Park (Sent with iPhone) > On Apr 30, 2015, at 6:52 PM, Fabian Hueske wrote: > > excellent! :-) > > 2015-04-30 11:47 GMT+02:00 Stephan Ewen : > >> git for the win! >> >> On Thu, Apr 30, 2015 at 11:39 AM, Robert Metzger >> wrote: >> >>> Great, thank you for taking

[jira] [Created] (FLINK-1962) Add Gelly Scala API

2015-04-30 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-1962: Summary: Add Gelly Scala API Key: FLINK-1962 URL: https://issues.apache.org/jira/browse/FLINK-1962 Project: Flink Issue Type: Task Components: Gell

Re: Question about SlidingPreReducers

2015-04-30 Thread Aljoscha Krettek
@Gyula: Are you referring to the pre-aggregator or the thing Peter mentioned? On Thu, Apr 30, 2015 at 11:12 AM, Gyula Fóra wrote: > The problem is in the WindowUtils.isParallel policy method. It makes count > policies automatically parallel as well. > > On Thursday, April 30, 2015, Aljoscha Krett

Re: Migrating our website from SVN to Git

2015-04-30 Thread Fabian Hueske
excellent! :-) 2015-04-30 11:47 GMT+02:00 Stephan Ewen : > git for the win! > > On Thu, Apr 30, 2015 at 11:39 AM, Robert Metzger > wrote: > > > Great, thank you for taking care of this. > > > > On Thu, Apr 30, 2015 at 11:29 AM, Maximilian Michels > > wrote: > > > > > Hi everyone, > > > > > > As

Re: Migrating our website from SVN to Git

2015-04-30 Thread Stephan Ewen
git for the win! On Thu, Apr 30, 2015 at 11:39 AM, Robert Metzger wrote: > Great, thank you for taking care of this. > > On Thu, Apr 30, 2015 at 11:29 AM, Maximilian Michels > wrote: > > > Hi everyone, > > > > As of today [0], the ASF officially offers Git-based repositories for the > > project

Re: Migrating our website from SVN to Git

2015-04-30 Thread Robert Metzger
Great, thank you for taking care of this. On Thu, Apr 30, 2015 at 11:29 AM, Maximilian Michels wrote: > Hi everyone, > > As of today [0], the ASF officially offers Git-based repositories for the > project websites. I filed a JIRA [1] to get us a Git repository for our > website. I would assume t

Migrating our website from SVN to Git

2015-04-30 Thread Maximilian Michels
Hi everyone, As of today [0], the ASF officially offers Git-based repositories for the project websites. I filed a JIRA [1] to get us a Git repository for our website. I would assume that everyone likes the idea of switching to Git. If not, please raise your objections. The new repository is alre

Re: Question about SlidingPreReducers

2015-04-30 Thread Gyula Fóra
The problem is in the WindowUtils.isParallelPolicy() method. It makes count policies automatically parallel as well. On Thursday, April 30, 2015, Aljoscha Krettek wrote: > Hi, > no, I think the two are unrelated. But that's another problem we need > to tackle then. > > Cheers, > Aljoscha > > On T

Re: Flink's multi-user support

2015-04-30 Thread Flavio Pompermaier
There was an attempt to build such a queue during the Dopa project when Flink was still Stratosphere. Probably it could be a good idea to collect the good and bad things learned from it to start designing the new scheduler :) On Thu, Apr 30, 2015 at 10:08 AM, Stephan Ewen wrote: > Most component

Re: Flink's multi-user support

2015-04-30 Thread Maximilian Michels
@Fabian: Small misunderstanding :) I suggested getting rid of the multi-user mode in a standalone setup because Flink's support is not sufficient. Implementing proper support, as Stephan mentioned, is some work. It's already done in YARN. IMO we can save a lot of time, lines of code, and bra

Re: Question about SlidingPreReducers

2015-04-30 Thread Aljoscha Krettek
Hi, no, I think the two are unrelated. But that's another problem we need to tackle then. Cheers, Aljoscha On Thu, Apr 30, 2015 at 9:15 AM, Szabó Péter wrote: > Hey, > > our intern, Pablo pointed out that there is some problem with mixed > windowing policies. When you write > ... > .window(C

Re: Flink's multi-user support

2015-04-30 Thread Stephan Ewen
Most components are written multi-job aware. The only thing that is not in there right now is scheduling policies for fair resource sharing. This is important in shared clusters. Since YARN implements all those things (various job queues with different priorities/policies etc), I suggest to not t

Re: Question about SlidingPreReducers

2015-04-30 Thread Szabó Péter
Hey, our intern, Pablo, pointed out that there is some problem with mixed windowing policies. When you write ... .window(Count ...) .every(Time ...) .mapWindow(...) ... The result makes no sense, as the window is not of the specified length. Maybe there is some conflict between Time and