The Roadmap is now available as a wiki page.
https://cwiki.apache.org/confluence/display/FLINK/Flink+Gelly
We're still happy to hear comments/suggestions, if any :)
On Wed, May 20, 2015 at 8:43 PM, Andra Lungu wrote:
> Thanks, Stephan!
>
> On Wed, May 20, 2015 at 8:42 PM, Stephan Ewen wrote:
>
Thanks, Stephan!
On Wed, May 20, 2015 at 8:42 PM, Stephan Ewen wrote:
> All right, you should have permissions now.
>
> On Wed, May 20, 2015 at 8:37 PM, Andra Lungu
> wrote:
>
> > Sure, but first I need permissions! :)
> >
> > "*NOTE*: Due to spamming, we can not give every confluence user edit
All right, you should have permissions now.
On Wed, May 20, 2015 at 8:37 PM, Andra Lungu wrote:
> Sure, but first I need permissions! :)
>
> "*NOTE*: Due to spamming, we can not give every confluence user edit
> permissions to the wiki. Just write to the dev@flink.apache.org (you can
> also emai
Sure, but first I need permissions! :)
"*NOTE*: Due to spamming, we can not give every confluence user edit
permissions to the wiki. Just write to the dev@flink.apache.org (you can
also email to rmetzger apache.org) mailing list to get edit
permissions."
My user is lungu.andra
Thanks!
On Wed,
Stephan Ewen created FLINK-2066:
---
Summary: Make delay between execution retries configurable
Key: FLINK-2066
URL: https://issues.apache.org/jira/browse/FLINK-2066
Project: Flink
Issue Type: Imp
Robert Metzger created FLINK-2065:
-
Summary: Cancelled jobs finish with final state FAILED
Key: FLINK-2065
URL: https://issues.apache.org/jira/browse/FLINK-2065
Project: Flink
Issue Type: Bug
Tez has just announced the availability of version 0.6.1.
Maybe that version is more stable. I've filed a JIRA issue for upgrading the
version: https://issues.apache.org/jira/browse/FLINK-2064
On Sun, May 17, 2015 at 12:04 PM, Robert Metzger
wrote:
> I saw this failure also multiple times now.
> This
Robert Metzger created FLINK-2064:
-
Summary: Set Tez version to 0.6.1
Key: FLINK-2064
URL: https://issues.apache.org/jira/browse/FLINK-2064
Project: Flink
Issue Type: Task
Component
Stephan Ewen created FLINK-2063:
---
Summary: Streaming checkpoints consider only input and output
vertices
Key: FLINK-2063
URL: https://issues.apache.org/jira/browse/FLINK-2063
Project: Flink
Is
That's fine, you convinced me ;-)
And given a flag to deactivate it, I think it should be okay for everyone.
Once we have proper serialized window buffers, the number of copies should
go down quite a bit anyways...
On Wed, May 20, 2015 at 4:29 PM, Gyula Fóra wrote:
> This is not about me, plea
This is not about me, please don't get me wrong :)
It would be good if other people would tell their opinions as well.
I am just trying to make the point that other systems do this as well for a
reason. Users are used to this abstraction.
On Wed, May 20, 2015 at 4:18 PM, Stephan Ewen wrote:
> I
Thank you for your feedback and ideas everyone!
@Andra, how about moving the roadmap to the wiki?
On 20 May 2015 at 15:48, Kostas Tzoumas wrote:
> :-D
>
> Great!
>
> On Tue, May 19, 2015 at 4:00 PM, Andra Lungu
> wrote:
>
> > Hi Kostas,
> >
> > We're way ahead of you! The first draft of the bl
Stephan Ewen created FLINK-2062:
---
Summary: Fix names of memory segment config parameter
Key: FLINK-2062
URL: https://issues.apache.org/jira/browse/FLINK-2062
Project: Flink
Issue Type: Bug
I think it is fair to say that everything Flink has in its core
provides immutability. The mutability effect comes only if the user starts
mutating objects across functions.
The overhead will depend heavily on whether you are sending small records
or large records.
I see you are very keen o
I know it is nicer to have no-copy from a performance perspective, but a
dataflow system with no immutability guarantee is very hard to
describe.
Systems like Storm and Google Dataflow have immutability guarantees, I
think for the same reason: to provide very clear, easy-to-use semantics.
We should maybe run some benchmarks and see what the overhead of
always copying between chained operators actually is.
On Wed, May 20, 2015 at 3:45 PM, Stephan Ewen wrote:
> A vote is the last resort. Consensus through discussion is much nicer. And
> I think we are making progress.
>
> We
:-D
Great!
On Tue, May 19, 2015 at 4:00 PM, Andra Lungu wrote:
> Hi Kostas,
>
> We're way ahead of you! The first draft of the blog post is internally
> reviewed as we speak ;)
>
>
> On Tue, May 19, 2015 at 3:49 PM, Kostas Tzoumas
> wrote:
>
> > This is very cool!
> >
> > Would also love to se
A vote is the last resort. Consensus through discussion is much nicer. And
I think we are making progress.
We went for the lightweight version in the batch API, because
- there are few cases that are affected (only functions with side effect
state)
- you can always switch lightweight -> failsafe
Fabian Hueske created FLINK-2061:
Summary: CSVReader: quotedStringParsing and includeFields yields
ParseException
Key: FLINK-2061
URL: https://issues.apache.org/jira/browse/FLINK-2061
Project: Flink
I would go for the Failsafe option as the default behaviour, with a clearly
documented lightweight (no-copy) setting, but I think having a vote on this
would be the proper way of settling this question.
On Wed, May 20, 2015 at 3:37 PM, Aljoscha Krettek
wrote:
> I think that in the long run (maybe n
I think that in the long run (maybe not too long) we will have to
change our stateful operators (windows, basically) to use managed
memory and spill to disk. (Think of jobs that have sliding windows over
days or weeks.) Then the internal operators will take care of
copying anyway. The problem Gyu
It does not mean we have to behave the same way; it is just an indication
that well-defined behavior can allow you to mess things up.
The question is now what is the default mode:
- Failsafe/Heavy (always copy)
- Performance/Lightweight (do not copy)
On Wed, May 20, 2015 at 3:29 PM, Stephan Ew
This is something that we can clearly define as "should not be done".
Systems do that.
For example, I think if you repeatedly emit (or mutate) the same object in
Spark, you get an RDD with completely messed-up contents.
On Wed, May 20, 2015 at 3:27 PM, Gyula Fóra wrote:
> If the preceding operato
If the preceding operator is emitting a mutated object, or does something
with the output object afterwards, then it's a problem.
Emitting the same object is a special case of this.
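To make this case concrete, here is a minimal plain-Java sketch (not Flink's API; ReuseHazard, Counter, emitReusing and gatherWithoutCopy are hypothetical names used only for illustration). An upstream function mutates and re-emits one object, and a chained downstream operator keeps references to what it receives, as a window buffer would. With no copy in between, every buffered entry aliases the same object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ReuseHazard {
    // Hypothetical mutable record, standing in for a user POJO.
    static class Counter {
        int value;
    }

    // Upstream function that mutates and re-emits one object for every record.
    static void emitReusing(int n, Consumer<Counter> downstream) {
        Counter reused = new Counter();
        for (int i = 0; i < n; i++) {
            reused.value = i;
            downstream.accept(reused); // same reference every time
        }
    }

    // Chained downstream operator that stores references without copying.
    static List<Integer> gatherWithoutCopy(int n) {
        List<Counter> buffer = new ArrayList<>();
        emitReusing(n, buffer::add);
        List<Integer> seen = new ArrayList<>();
        for (Counter c : buffer) seen.add(c.value);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(gatherWithoutCopy(3)); // [2, 2, 2] rather than [0, 1, 2]
    }
}
```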
On Wed, May 20, 2015 at 3:09 PM, Stephan Ewen wrote:
> The case you are making is if a preceding operator in a chai
The case you are making is that if a preceding operator in a chain repeatedly
emits the same object, and the succeeding operator gathers the
objects, then it is a problem.
Or are there cases where the system itself repeatedly emits the same
objects?
On Wed, May 20, 2015 at 3:07 PM, Gyula Fór
We are designing a system for stateful stream computations, assuming long-
standing operators that gather and store data as the stream evolves (unlike
in the DataSet API). Many programs, like windowing, sampling, etc., hold
state in the form of past data. And without careful understanding of the
ru
Robert Metzger created FLINK-2060:
-
Summary: Let maven fail on travis when modules reference to
outdated internal dependency
Key: FLINK-2060
URL: https://issues.apache.org/jira/browse/FLINK-2060
Proje
Tamara created FLINK-2059:
-
Summary: Rename modules flink-compiler to flink-optimizer in
pom.xml
Key: FLINK-2059
URL: https://issues.apache.org/jira/browse/FLINK-2059
Project: Flink
Issue Type: Bug
@stephan I see your point. If we assume that operators do not hold references
in their state to any transmitted records, it works fine. We therefore need to
make this clear to the users. I also need to check whether that would break
semantics in SAMOA or other integrations that assume immutability.
"Copy before putting it into a window buffer and any other group buffer."
Exactly my point. Any stateful operator should be able to implement
something like this without having to worry about copying the object (and
at this point the user would need to know whether it comes from the network
to avo
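A minimal sketch of the quoted rule ("copy before putting it into a window buffer"), assuming the buffer owner is handed a copy function for the element type. CopyingWindowBuffer and WindowDemo are hypothetical names, and int[]::clone stands in for a real type serializer's copy. Note that in this design the copying responsibility sits with whoever implements the buffer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical count-based sliding window that copies on insert.
class CopyingWindowBuffer<T> {
    private final int size;
    private final UnaryOperator<T> copier; // how to deep-copy one element
    private final Deque<T> buffer = new ArrayDeque<>();

    CopyingWindowBuffer(int size, UnaryOperator<T> copier) {
        this.size = size;
        this.copier = copier;
    }

    // Store a private copy, so later mutation of the caller's object cannot leak in.
    void add(T element) {
        buffer.addLast(copier.apply(element));
        if (buffer.size() > size) buffer.removeFirst(); // evict the oldest element
    }

    List<T> contents() { return new ArrayList<>(buffer); }
}

class WindowDemo {
    public static void main(String[] args) {
        int[] reused = new int[1]; // upstream keeps mutating this one array
        CopyingWindowBuffer<int[]> window = new CopyingWindowBuffer<>(2, int[]::clone);
        for (int i = 0; i < 4; i++) {
            reused[0] = i;
            window.add(reused);
        }
        for (int[] e : window.contents()) System.out.print(e[0] + " "); // prints: 2 3
    }
}
```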
Stephan Ewen created FLINK-2058:
---
Summary: Hadoop Input Splits do not use proper UserCodeClassloader
Key: FLINK-2058
URL: https://issues.apache.org/jira/browse/FLINK-2058
Project: Flink
Issue T
Stephan Ewen created FLINK-2057:
---
Summary: Remove IOReadableWritable interface from input splits
Key: FLINK-2057
URL: https://issues.apache.org/jira/browse/FLINK-2057
Project: Flink
Issue Type:
I am curious why the copying is actually needed.
In the batch API, we chain and do not copy, and it is rather predictable.
The cornerstones of that design are these rules:
1) Objects read from the network or any buffer are always new objects.
That comes naturally when they are deseriali
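Why objects read from the network are always fresh can be sketched in a few lines. This assumes Java's built-in serialization as a stand-in for Flink's own serializers; Event and FreshOnDeserialize are hypothetical names. Each round trip through bytes yields a new object, so mutating one received record cannot affect another.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical record type; Java serialization stands in for Flink's serializers.
class Event implements Serializable {
    private static final long serialVersionUID = 1L;
    int id;

    Event(int id) { this.id = id; }
}

class FreshOnDeserialize {
    // Simulates sending a record over the network: serialize to bytes, then read back.
    static Event roundTrip(Event e) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(e);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Event) in.readObject(); // a brand-new object every time
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Event sent = new Event(7);
        Event a = roundTrip(sent);
        Event b = roundTrip(sent);
        a.id = 99;                  // mutate one received record
        System.out.println(a != b); // true
        System.out.println(b.id);   // 7
    }
}
```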
Theodore Vasiloudis created FLINK-2056:
--
Summary: Add guide to create a chainable predictor in docs
Key: FLINK-2056
URL: https://issues.apache.org/jira/browse/FLINK-2056
Project: Flink
I
Yes, in fact I anticipated this. There is one central place where we
can insert a copy step, in OperatorCollector in OutputHandler.
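A rough sketch of what a single copy step at the chaining boundary could look like, with a flag for the failsafe (copy) versus lightweight (no-copy) modes discussed in this thread. Out, ChainingOutput and ChainDemo are hypothetical stand-ins, not the actual OperatorCollector or OutputHandler classes, and int[]::clone stands in for a serializer-based deep copy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Minimal stand-in for a collector interface (hypothetical).
interface Out<T> {
    void collect(T record);
}

// Hypothetical chaining output: the one place where the copy happens.
class ChainingOutput<T> implements Out<T> {
    private final Out<T> next;             // input of the chained operator
    private final UnaryOperator<T> copier; // deep copy of one record
    private final boolean copyOutputs;     // true = failsafe, false = lightweight

    ChainingOutput(Out<T> next, UnaryOperator<T> copier, boolean copyOutputs) {
        this.next = next;
        this.copier = copier;
        this.copyOutputs = copyOutputs;
    }

    @Override
    public void collect(T record) {
        next.collect(copyOutputs ? copier.apply(record) : record);
    }
}

class ChainDemo {
    // Runs an object-reusing producer through the chain; reports what the sink kept.
    static List<Integer> run(boolean copyOutputs) {
        List<int[]> kept = new ArrayList<>();
        Out<int[]> sink = kept::add; // chained operator that buffers its inputs
        Out<int[]> out = new ChainingOutput<>(sink, int[]::clone, copyOutputs);
        int[] reused = new int[1];
        for (int i = 0; i < 3; i++) {
            reused[0] = i;
            out.collect(reused);
        }
        List<Integer> values = new ArrayList<>();
        for (int[] r : kept) values.add(r[0]);
        return values;
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // [0, 1, 2]
        System.out.println(run(false)); // [2, 2, 2]
    }
}
```

Because the copy happens in one wrapper, downstream operators need no copying logic of their own; the cost is one copy per record whenever the flag is on.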
On Wed, May 20, 2015 at 11:17 AM, Paris Carbone wrote:
> I guess it was not intended ^^.
>
> Chaining should be transparent and not break the correct/expected behavi
Thanks for investigating this, Robert!
On Tue, May 19, 2015 at 10:08 PM, Robert Metzger
wrote:
> Okay .. it seems that maven is downloading the flink-compiler artifacts
> from a snapshots repository:
>
> [INFO]
>
> [INFO] B
I guess it was not intended ^^.
Chaining should be transparent and not break the correct/expected behaviour.
Paris
On 20 May 2015, at 11:02, Márton Balassi wrote:
> +1 for copying.
> On May 20, 2015 10:50 AM, "Gyula Fóra" wrote:
>> Hey,
>> The latest streaming operator rework removed the copying of
+1 for copying.
On May 20, 2015 10:50 AM, "Gyula Fóra" wrote:
> Hey,
>
> The latest streaming operator rework removed the copying of the outputs
> before passing them to chained operators. This is a major break for the
> previous operator semantics which guaranteed immutability.
>
> I think this
Robert Metzger created FLINK-2055:
-
Summary: Implement Streaming HBaseSink
Key: FLINK-2055
URL: https://issues.apache.org/jira/browse/FLINK-2055
Project: Flink
Issue Type: New Feature
Hey,
The latest streaming operator rework removed the copying of the outputs
before passing them to chained operators. This is a major break from the
previous operator semantics, which guaranteed immutability.
I think this change leads to very nondeterministic program behaviour from
the user's persp
Gyula Fora created FLINK-2054:
-
Summary: StreamOperator rework removed copy calls when passing
output to a chained operator
Key: FLINK-2054
URL: https://issues.apache.org/jira/browse/FLINK-2054
Project: F
Till Rohrmann created FLINK-2053:
Summary: Preregister ML types for Kryo serialization
Key: FLINK-2053
URL: https://issues.apache.org/jira/browse/FLINK-2053
Project: Flink
Issue Type: Improve