[jira] [Created] (FLINK-4035) Bump Kafka producer in Kafka sink to Kafka 0.10.0.0

2016-06-08 Thread Elias Levy (JIRA)
Elias Levy created FLINK-4035: Summary: Bump Kafka producer in Kafka sink to Kafka 0.10.0.0 Key: FLINK-4035 URL: https://issues.apache.org/jira/browse/FLINK-4035 Project: Flink Issue Type: Bug

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-06-08 Thread Aljoscha Krettek
I think it would make sense to also move "State Backends" out from "Runtime". This is also quite complex on its own. I would of course volunteer for this and I think Stephan, who is the current proposal for "Runtime", would also be good. On Wed, 8 Jun 2016 at 19:22 Stephan Ewen wrote: > I am add

Re: Broadcast data sent increases with # slots per TM

2016-06-08 Thread Alexander Alexandrov
> As far as I know, the reason why the broadcast variables are implemented that way is that the senders would have to know which sub-tasks are deployed to which TMs. As the broadcast variables are realized as additionally attached "broadcast channels", I am assuming that the same behavior will app

[jira] [Created] (FLINK-4034) Dependency convergence on com.101tec:zkclient and com.esotericsoftware.kryo:kryo

2016-06-08 Thread Vladislav Pernin (JIRA)
Vladislav Pernin created FLINK-4034: Summary: Dependency convergence on com.101tec:zkclient and com.esotericsoftware.kryo:kryo Key: FLINK-4034 URL: https://issues.apache.org/jira/browse/FLINK-4034

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-06-08 Thread Stephan Ewen
I am adding a dedicated component for "Checkpointing". It would include the checkpoint coordinator, barriers, threads, state handles and recovery. I think that part is big and complex enough to warrant its own shepherd. I would volunteer for that and be happy to also have a second shepherd. On Tu

[jira] [Created] (FLINK-4033) Missing Scala example snippets for the Kinesis Connector documentation

2016-06-08 Thread Tzu-Li (Gordon) Tai (JIRA)
Tzu-Li (Gordon) Tai created FLINK-4033: Summary: Missing Scala example snippets for the Kinesis Connector documentation Key: FLINK-4033 URL: https://issues.apache.org/jira/browse/FLINK-4033 Proj

Re: Broadcast data sent increases with # slots per TM

2016-06-08 Thread Kunft, Andreas
Hi Till, thanks for the fast answer. I'll think about a concrete way of implementing this and open a JIRA issue. Best, Andreas From: Till Rohrmann Sent: Wednesday, 8 June 2016 15:53 To: dev@flink.apache.org Subject: Re: Broadcast data sent increases with # sl

[jira] [Created] (FLINK-4032) Replace all usage of Guava Preconditions

2016-06-08 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-4032: Summary: Replace all usage of Guava Preconditions Key: FLINK-4032 URL: https://issues.apache.org/jira/browse/FLINK-4032 Project: Flink Issue Type: Impr

Re: Future of Python support

2016-06-08 Thread Chesnay Schepler
Hello Julius, I don't think there is any real roadmap for the Python API, regardless of batch or streaming. Off the top of my head, I can think of the following issue: The batch Python API makes heavy use of MapPartitions to transfer data in batches; I'm not sure how well this could be done f

Future of Python support

2016-06-08 Thread Julius Neuffer
Hi, I am interested in using Flink as part of a research project. We normally use Python as our programming language. The Python support for the Batch API is already quite good, but I couldn't find any information on the future roadmap regarding Python support in Flink. Are there plans to add pytho

Re: Broadcast data sent increases with # slots per TM

2016-06-08 Thread Till Rohrmann
Hi Andreas, your observation is correct. The data is sent to each slot and the receiving TM only materializes one copy of the data. The rest of the data is discarded. As far as I know, the reason why the broadcast variables are implemented that way is that the senders would have to know which sub
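
The effect described in this reply can be put into numbers with a small back-of-the-envelope model. The sketch below is not Flink code: the class and method names are hypothetical, and it assumes the simplest possible case, in which one full copy of the broadcast data set travels over the network to every receiving slot (the behaviour described above), compared with a hypothetical scheme that ships one copy per TaskManager.

```java
// Simplified traffic model for a broadcast data set (illustrative assumption, not Flink internals).
final class BroadcastTraffic {

    // Bytes sent when every slot receives its own copy, as described in the thread.
    static long bytesPerSlotDelivery(long broadcastBytes, int taskManagers, int slotsPerTaskManager) {
        long receivingSlots = (long) taskManagers * slotsPerTaskManager;
        return Math.multiplyExact(broadcastBytes, receivingSlots);
    }

    // Bytes sent if only one copy per TaskManager were shipped (hypothetical alternative).
    static long bytesPerTaskManagerDelivery(long broadcastBytes, int taskManagers) {
        return Math.multiplyExact(broadcastBytes, (long) taskManagers);
    }

    public static void main(String[] args) {
        long broadcastBytes = 100L * 1024 * 1024; // 100 MiB, an arbitrary example size
        int taskManagers = 4;
        for (int slots = 1; slots <= 8; slots *= 2) {
            System.out.printf("slots/TM=%d  per-slot=%d bytes  per-TM=%d bytes%n",
                    slots,
                    bytesPerSlotDelivery(broadcastBytes, taskManagers, slots),
                    bytesPerTaskManagerDelivery(broadcastBytes, taskManagers));
        }
    }
}
```

Under this model the traffic grows linearly with the number of slots per TaskManager, while the amount of data each TaskManager actually keeps stays constant, which matches the observation reported in the thread.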

[jira] [Created] (FLINK-4031) Nightly Jenkins job doesn't deploy sources

2016-06-08 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-4031: Summary: Nightly Jenkins job doesn't deploy sources Key: FLINK-4031 URL: https://issues.apache.org/jira/browse/FLINK-4031 Project: Flink Issue Type

[jira] [Created] (FLINK-4030) ScalaShellITCase

2016-06-08 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-4030: Summary: ScalaShellITCase Key: FLINK-4030 URL: https://issues.apache.org/jira/browse/FLINK-4030 Project: Flink Issue Type: Bug Components

Broadcast data sent increases with # slots per TM

2016-06-08 Thread Kunft, Andreas
Hi, we are seeing an unexpected increase in the data sent over the network for broadcasts as the number of slots per TaskManager increases. We provided a benchmark [1]. It not only increases the size of data sent over the network but also hurts performance as seen in the preliminary results bel

[jira] [Created] (FLINK-4029) Multi-field "sum" function just like "keyBy"

2016-06-08 Thread Rami (JIRA)
Rami created FLINK-4029: Summary: Multi-field "sum" function just like "keyBy" Key: FLINK-4029 URL: https://issues.apache.org/jira/browse/FLINK-4029 Project: Flink Issue Type: Improvement Co

Re: DataStream split/select behaviour

2016-06-08 Thread Till Rohrmann
Hi, the directed output via the split and select methods is indeed only available in the DataStream API. Thus, in order to achieve the same with the DataSet API, you would have to apply multiple filters, as you've already written. The result of the select call will only be sent to the same task
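
The "multiple filters" workaround mentioned in this reply can be sketched in plain Java. This is only an illustration of the pattern: a `java.util.List` stands in for a DataSet, and the class `SplitByFilters` and its `select` helper are hypothetical names, not part of any Flink API. In a real batch job each branch would presumably be a separate `filter` call on the same DataSet.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Emulates split/select by applying one filter per output branch to the same input.
final class SplitByFilters {

    // Returns the elements of the input that satisfy the branch condition.
    static <T> List<T> select(List<T> input, Predicate<? super T> condition) {
        return input.stream().filter(condition).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);

        // Each branch scans the full input once with its own predicate.
        List<Integer> even = select(numbers, n -> n % 2 == 0);
        List<Integer> odd = select(numbers, n -> n % 2 != 0);

        System.out.println("even = " + even);
        System.out.println("odd  = " + odd);
    }
}
```

The trade-off in this pattern is that every branch evaluates its predicate over the whole input, whereas a single split step would route each element once.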