Re: Gelly Roadmap

2015-05-20 Thread Andra Lungu
The Roadmap is now available as a wiki page. https://cwiki.apache.org/confluence/display/FLINK/Flink+Gelly We're still happy to hear comments/suggestions, if any :) On Wed, May 20, 2015 at 8:43 PM, Andra Lungu wrote: > Thanks, Stephan! > > On Wed, May 20, 2015 at 8:42 PM, Stephan Ewen wrote: >

Re: Gelly Roadmap

2015-05-20 Thread Andra Lungu
Thanks, Stephan! On Wed, May 20, 2015 at 8:42 PM, Stephan Ewen wrote: > All right, you should have permissions now. > > On Wed, May 20, 2015 at 8:37 PM, Andra Lungu > wrote: > > > Sure, but first I need permissions! :) > > > > "*NOTE*: Due to spamming, we can not give every confluence user edit

Re: Gelly Roadmap

2015-05-20 Thread Stephan Ewen
All right, you should have permissions now. On Wed, May 20, 2015 at 8:37 PM, Andra Lungu wrote: > Sure, but first I need permissions! :) > > "*NOTE*: Due to spamming, we can not give every confluence user edit > permissions to the wiki. Just write to the dev@flink.apache.org (you can > also emai

Re: Gelly Roadmap

2015-05-20 Thread Andra Lungu
Sure, but first I need permissions! :) "*NOTE*: Due to spamming, we can not give every confluence user edit permissions to the wiki. Just write to the dev@flink.apache.org (you can also email to rmetzger apache.org) mailing list to get edit permissions." My user is lungu.andra Thanks! On Wed,

[jira] [Created] (FLINK-2066) Make delay between execution retries configurable

2015-05-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2066: --- Summary: Make delay between execution retries configurable Key: FLINK-2066 URL: https://issues.apache.org/jira/browse/FLINK-2066 Project: Flink Issue Type: Imp

[jira] [Created] (FLINK-2065) Cancelled jobs finish with final state FAILED

2015-05-20 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-2065: - Summary: Cancelled jobs finish with final state FAILED Key: FLINK-2065 URL: https://issues.apache.org/jira/browse/FLINK-2065 Project: Flink Issue Type: Bug

Re: Flink on Tez Test stuck

2015-05-20 Thread Robert Metzger
Tez has just announced the availability of version 0.6.1. Maybe that version is more stable. I've filed a jira for upgrading the version: https://issues.apache.org/jira/browse/FLINK-2064 On Sun, May 17, 2015 at 12:04 PM, Robert Metzger wrote: > I saw this failure also multiple times now. > This

[jira] [Created] (FLINK-2064) Set Tez version to 0.6.1

2015-05-20 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-2064: - Summary: Set Tez version to 0.6.1 Key: FLINK-2064 URL: https://issues.apache.org/jira/browse/FLINK-2064 Project: Flink Issue Type: Task Component

[jira] [Created] (FLINK-2063) Streaming checkpoints consider only input and output vertices

2015-05-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2063: --- Summary: Streaming checkpoints consider only input and output vertices Key: FLINK-2063 URL: https://issues.apache.org/jira/browse/FLINK-2063 Project: Flink Is

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Stephan Ewen
That's fine, you convinced me ;-) And given a flag to deactivate it, I think it should be okay for everyone. Once we have proper serialized window buffers, the number of copies should go down quite a bit anyways... On Wed, May 20, 2015 at 4:29 PM, Gyula Fóra wrote: > This is not about me, plea

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Gyula Fóra
This is not about me, please don't get me wrong :) It would be good if other people shared their opinions as well. I am just trying to make the point that other systems do this as well for a reason. Users are used to this abstraction. On Wed, May 20, 2015 at 4:18 PM, Stephan Ewen wrote: > I

Re: Gelly Roadmap

2015-05-20 Thread Vasiliki Kalavri
Thank you for your feedback and ideas everyone! @Andra, how about moving the roadmap to the wiki? On 20 May 2015 at 15:48, Kostas Tzoumas wrote: > :-D > > Great! > > On Tue, May 19, 2015 at 4:00 PM, Andra Lungu > wrote: > > > Hi Kostas, > > > > We're way ahead of you! The first draft of the bl

[jira] [Created] (FLINK-2062) Fix names of memory segment config parameter

2015-05-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2062: --- Summary: Fix names of memory segment config parameter Key: FLINK-2062 URL: https://issues.apache.org/jira/browse/FLINK-2062 Project: Flink Issue Type: Bug

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Stephan Ewen
I think it is fair to say that everything that Flink has in its core provides immutability. The mutability effect comes only if the user starts mutating objects across functions. The overhead will depend vastly on whether you are sending smaller records or large records. I see you are very keen o

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Gyula Fóra
I know it is nicer to have no-copy from a performance perspective, but a dataflow system with no immutability guarantee is something very hard to describe. Systems like Storm and Google Dataflow have immutability guarantees, I think for the same reason: to provide very clear, easy-to-use semantics.

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Aljoscha Krettek
We should maybe run some benchmarks and see what the overhead of always running a copy between chained operators actually is. On Wed, May 20, 2015 at 3:45 PM, Stephan Ewen wrote: > A vote is the last resort. Consensus through discussion is much nicer. And > I think we are making progress. > > We

Re: Gelly Roadmap

2015-05-20 Thread Kostas Tzoumas
:-D Great! On Tue, May 19, 2015 at 4:00 PM, Andra Lungu wrote: > Hi Kostas, > > We're way ahead of you! The first draft of the blog post is internally > reviewed as we speak ;) > > > On Tue, May 19, 2015 at 3:49 PM, Kostas Tzoumas > wrote: > > > This is very cool! > > > > Would also love to se

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Stephan Ewen
A vote is the last resort. Consensus through discussion is much nicer. And I think we are making progress. We went for the lightweight version in the batch API, because - there are few cases that are affected (only functions with side effect state) - you can always switch lightweight -> failsafe

[jira] [Created] (FLINK-2061) CSVReader: quotedStringParsing and includeFields yields ParseException

2015-05-20 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-2061: Summary: CSVReader: quotedStringParsing and includeFields yields ParseException Key: FLINK-2061 URL: https://issues.apache.org/jira/browse/FLINK-2061 Project: Flink

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Gyula Fóra
I would go for the Failsafe option as a default behaviour with a clearly documented lightweight (no-copy) setting, but I think having a Vote on this would be the proper way of settling this question. On Wed, May 20, 2015 at 3:37 PM, Aljoscha Krettek wrote: > I think that in the long run (maybe n

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Aljoscha Krettek
I think that in the long run (maybe not too long) we will have to change our stateful operators (windows, basically) to use managed memory and spill to disk. (Think jobs that have sliding windows over days or weeks.) Then the internal operators will take care of copying anyways. The problem Gyu

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Stephan Ewen
It does not mean we have to behave the same way, it is just an indication that well-defined behavior can allow you to mess things up. The question is now what is the default mode: - Failsafe/Heavy (always copy) - Performance/Lightweight (do not copy) On Wed, May 20, 2015 at 3:29 PM, Stephan Ew

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Stephan Ewen
This is something that we can clearly define as "should not be done". Systems do that. I think if you repeatedly emit (or mutate) the same object for example in Spark, you get an RDD with completely messed up contents. On Wed, May 20, 2015 at 3:27 PM, Gyula Fóra wrote: > If the preceding operato

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Gyula Fóra
If the preceding operator is emitting a mutated object, or does something with the output object afterwards, then it's a problem. Emitting the same object is a special case of this. On Wed, May 20, 2015 at 3:09 PM, Stephan Ewen wrote: > The case you are making is if a preceding operator in a chai

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Stephan Ewen
The case you are making is if a preceding operator in a chain is repeatedly emitting the same object, and the succeeding operator is gathering the objects, then it is a problem. Or are there cases where the system itself repeatedly emits the same objects? On Wed, May 20, 2015 at 3:07 PM, Gyula Fór
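
The case described in this message (an upstream operator repeatedly emitting the same object while the downstream operator gathers the objects it receives) can be sketched in plain Java. This is only an illustration under stated assumptions, not Flink code: `Record`, `emitReusingObject`, `runChain`, and the `copyBetweenOperators` flag are hypothetical names invented here, and the "chain" is just a direct method call from one operator to the next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a two-operator chain; none of these names are Flink APIs.
final class ChainedCopyDemo {

    /** A deliberately mutable record type, for illustration only. */
    static final class Record {
        int value;
        Record(int value) { this.value = value; }
        Record copy() { return new Record(value); }
    }

    /** Upstream operator: reuses ONE Record instance and mutates it before each emit. */
    static void emitReusingObject(int n, Consumer<Record> downstream) {
        Record reused = new Record(0);
        for (int i = 1; i <= n; i++) {
            reused.value = i;
            downstream.accept(reused);
        }
    }

    /**
     * Runs the chain. The downstream operator buffers every record it receives,
     * much like a window buffer would. When copyBetweenOperators is true, each
     * record is copied before it is handed over.
     */
    static List<Integer> runChain(int n, boolean copyBetweenOperators) {
        List<Record> buffer = new ArrayList<>();
        Consumer<Record> gather = buffer::add;
        Consumer<Record> link;
        if (copyBetweenOperators) {
            link = r -> gather.accept(r.copy());
        } else {
            link = gather;
        }
        emitReusingObject(n, link);

        List<Integer> values = new ArrayList<>();
        for (Record r : buffer) {
            values.add(r.value);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println("no copy: " + runChain(3, false)); // prints "no copy: [3, 3, 3]"
        System.out.println("copy:    " + runChain(3, true));  // prints "copy:    [1, 2, 3]"
    }
}
```

Without the copy, the buffer holds three references to the same object, so every entry shows the last value written. With the copy, each buffered entry keeps the value it had when it was emitted, at the cost of one allocation per record.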

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Gyula Fóra
We are designing a system for stateful stream computations, assuming long standing operators that gather and store data as the stream evolves (unlike in the dataset api). Many programs, like windowing, sampling etc hold the state in the form of past data. And without careful understanding of the ru

[jira] [Created] (FLINK-2060) Let maven fail on travis when modules reference to outdated internal dependency

2015-05-20 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-2060: - Summary: Let maven fail on travis when modules reference to outdated internal dependency Key: FLINK-2060 URL: https://issues.apache.org/jira/browse/FLINK-2060 Proje

[jira] [Created] (FLINK-2059) Rename modules flink-compiler to flink-optimizer in pom.xml

2015-05-20 Thread Tamara (JIRA)
Tamara created FLINK-2059: - Summary: Rename modules flink-compiler to flink-optimizer in pom.xml Key: FLINK-2059 URL: https://issues.apache.org/jira/browse/FLINK-2059 Project: Flink Issue Type: Bug

RE: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Paris Carbone
@stephan I see your point. If we assume that operators do not hold references in their state to any transmitted records it works fine. We therefore need to make this clear to the users. I need to check if that would break semantics in SAMOA or other integrations as well that assume immutability.

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Gyula Fóra
"Copy before putting it into a window buffer and any other group buffer." Exactly my point. Any stateful operator should be able to implement something like this without having to worry about copying the object (and at this point the user would need to know whether it comes from the network to avo

[jira] [Created] (FLINK-2058) Hadoop Input Splits do not use proper UserCodeClassloader

2015-05-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2058: --- Summary: Hadoop Input Splits do not use proper UserCodeClassloader Key: FLINK-2058 URL: https://issues.apache.org/jira/browse/FLINK-2058 Project: Flink Issue T

[jira] [Created] (FLINK-2057) Remove IOReadableWritable interface from input splits

2015-05-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2057: --- Summary: Remove IOReadableWritable interface from input splits Key: FLINK-2057 URL: https://issues.apache.org/jira/browse/FLINK-2057 Project: Flink Issue Type:

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Stephan Ewen
I am curious why the copying is actually needed. In the batch API, we chain and do not copy, and it is rather predictable. The cornerstones of that design are these rules: 1) Objects read from the network or any buffer are always new objects. That comes naturally when they are deseriali

[jira] [Created] (FLINK-2056) Add guide to create a chainable predictor in docs

2015-05-20 Thread Theodore Vasiloudis (JIRA)
Theodore Vasiloudis created FLINK-2056: -- Summary: Add guide to create a chainable predictor in docs Key: FLINK-2056 URL: https://issues.apache.org/jira/browse/FLINK-2056 Project: Flink I

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Aljoscha Krettek
Yes, in fact I anticipated this. There is one central place where we can insert a copy step, in OperatorCollector in OutputHandler. On Wed, May 20, 2015 at 11:17 AM, Paris Carbone wrote: > I guess it was not intended ^^. > > Chaining should be transparent and not break the correct/expected behavi

Re: Problems building the current master

2015-05-20 Thread Stephan Ewen
Thanks for investigating this, Robert! On Tue, May 19, 2015 at 10:08 PM, Robert Metzger wrote: > Okay .. it seems that maven is downloading the flink-compiler artifacts > from a snapshots repository: > > [INFO] > > [INFO] B

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Paris Carbone
I guess it was not intended ^^. Chaining should be transparent and not break the correct/expected behaviour. Paris? On 20 May 2015, at 11:02, Márton Balassi wrote: +1 for copying. On May 20, 2015 10:50 AM, "Gyula Fóra" wrote: Hey, The latest streaming operator rework removed the copying of

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Márton Balassi
+1 for copying. On May 20, 2015 10:50 AM, "Gyula Fóra" wrote: > Hey, > > The latest streaming operator rework removed the copying of the outputs > before passing them to chained operators. This is a major break for the > previous operator semantics which guaranteed immutability. > > I think this

[jira] [Created] (FLINK-2055) Implement Streaming HBaseSink

2015-05-20 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-2055: - Summary: Implement Streaming HBaseSink Key: FLINK-2055 URL: https://issues.apache.org/jira/browse/FLINK-2055 Project: Flink Issue Type: New Feature

[DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Gyula Fóra
Hey, The latest streaming operator rework removed the copying of the outputs before passing them to chained operators. This is a major break from the previous operator semantics, which guaranteed immutability. I think this change leads to very non-deterministic program behaviour from the user's persp

[jira] [Created] (FLINK-2054) StreamOperator rework removed copy calls when passing output to a chained operator

2015-05-20 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-2054: - Summary: StreamOperator rework removed copy calls when passing output to a chained operator Key: FLINK-2054 URL: https://issues.apache.org/jira/browse/FLINK-2054 Project: F

[jira] [Created] (FLINK-2053) Preregister ML types for Kryo serialization

2015-05-20 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-2053: Summary: Preregister ML types for Kryo serialization Key: FLINK-2053 URL: https://issues.apache.org/jira/browse/FLINK-2053 Project: Flink Issue Type: Improve