Advice on [FLINK-2021]: Rework examples to use new ParameterTool

2015-09-03 Thread Behrouz Derakhshan
Hi, I had a look at this ticket, FLINK-2021. There isn't much to do from a technical standpoint, and it kinda makes sense to use the new "ParameterTool", since it is being used in most other parts of the code base. The only question is do we

[jira] [Created] (FLINK-2618) ExternalSortITCase failure

2015-09-03 Thread Sachin Goel (JIRA)
Sachin Goel created FLINK-2618: -- Summary: ExternalSortITCase failure Key: FLINK-2618 URL: https://issues.apache.org/jira/browse/FLINK-2618 Project: Flink Issue Type: Bug Reporter: Sa

Re: How to force the parallelism on small streams?

2015-09-03 Thread Matthias J. Sax
It's a valid argument and I am not against changing rebalance() with the formula you suggested. I just don't see it as a bug -- only an unfortunate behavior due to implementation details that only occurs on tiny data sets (which is not a target application). I will open a JIRA for it if there are

Re: How to force the parallelism on small streams?

2015-09-03 Thread Fabian Hueske
The purpose of rebalance() should be to rebalance the partitions of a data stream as evenly as possible, right? If all senders start sending data to the same receiver and there is less data in each partition than receivers, partitions are not evenly rebalanced. That is exactly the problem Arnaud r

Re: How to force the parallelism on small streams?

2015-09-03 Thread Matthias J. Sax
For rebalance() this makes sense. I don't think anything must be changed. For regular data, there are no such issues as for this very small data set. However, for shuffle() I would expect that each source task uses a different shuffle pattern... -Matthias On 09/03/2015 03:28 PM, Fabian Hueske wrot

Re: How to force the parallelism on small streams?

2015-09-03 Thread Fabian Hueske
In case of rebalance(), all sources start the round-robin partitioning at index 0. Since each source emits only very few elements, only the first 15 mappers receive any input. It would be better to let each source start the round-robin partitioning at a different index, something like startIdx = (n
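
The effect described above can be reproduced with a small simulation. This is only a sketch, not Flink code: the function name, the receiver count of 100, and the offset rule (source index modulo number of receivers) are assumptions made for illustration, since the formula in the message is cut off. It counts how many receivers get any data when every source starts its round-robin at index 0, compared with a staggered start.

```python
from collections import Counter


def round_robin_load(num_sources, items_per_source, num_receivers, staggered):
    """Count items per receiver when each source round-robins independently."""
    load = Counter()
    for src in range(num_sources):
        # Hypothetical offset rule (an assumption, not the formula from the thread):
        # each source starts at its own index modulo the receiver count.
        start = src % num_receivers if staggered else 0
        for i in range(items_per_source):
            load[(start + i) % num_receivers] += 1
    return load


# 100 sources with 14 elements each, as in the thread; 100 receivers is assumed.
same_start = round_robin_load(100, 14, 100, staggered=False)
offset_start = round_robin_load(100, 14, 100, staggered=True)

print(len(same_start), "receivers busy when every source starts at index 0")
print(len(offset_start), "receivers busy when start indices are staggered")
```

With these assumed numbers, a shared start index leaves most receivers idle, while staggered start indices spread the same elements over all of them.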

Re: How to force the parallelism on small streams?

2015-09-03 Thread Matthias J. Sax
If it would be only 14 elements, you are obviously right. However, if I understood Arnaud correctly, the problem is that there are more than 14 elements: > Each of my 100 sources gives only a few lines (say 14 max) That would be about 140 lines in total. Using a non-parallel source, he is able to

[jira] [Created] (FLINK-2617) ConcurrentModificationException when using HCatRecordReader to access a hive table

2015-09-03 Thread Arnaud Linz (JIRA)
Arnaud Linz created FLINK-2617: -- Summary: ConcurrentModificationException when using HCatRecordReader to access a hive table Key: FLINK-2617 URL: https://issues.apache.org/jira/browse/FLINK-2617 Project:

Re: Outer-join operator integration with DataSet API (FLINK-2576)

2015-09-03 Thread Fabian Hueske
Hi Johann, hi Ricky, Thanks for reaching out to the mailing list before taking action! I also prefer option c. In principle, all inner join strategies can also be applied for all outer joins (for some hash strategies, a special HashTable implementation is required). I propose to add two method

Re: How to force the parallelism on small streams?

2015-09-03 Thread Aljoscha Krettek
Hi, I don't think it's a bug. If there are 100 sources that each emit only 14 elements then only the first 14 mappers will ever receive data. The round-robin distribution is not global, since the sources operate independently from each other. Cheers, Aljoscha On Wed, 2 Sep 2015 at 20:00 Matthias

[jira] [Created] (FLINK-2616) Failing Test: ZooKeeperLeaderElectionTest

2015-09-03 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-2616: -- Summary: Failing Test: ZooKeeperLeaderElectionTest Key: FLINK-2616 URL: https://issues.apache.org/jira/browse/FLINK-2616 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-2615) Multiple restarts of Local Cluster for a single program

2015-09-03 Thread Sachin Goel (JIRA)
Sachin Goel created FLINK-2615: -- Summary: Multiple restarts of Local Cluster for a single program Key: FLINK-2615 URL: https://issues.apache.org/jira/browse/FLINK-2615 Project: Flink Issue Type:

[jira] [Created] (FLINK-2614) Scala Shell's default local execution mode is broken

2015-09-03 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-2614: - Summary: Scala Shell's default local execution mode is broken Key: FLINK-2614 URL: https://issues.apache.org/jira/browse/FLINK-2614 Project: Flink

[jira] [Created] (FLINK-2613) Print usage information for Scala Shell

2015-09-03 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-2613: - Summary: Print usage information for Scala Shell Key: FLINK-2613 URL: https://issues.apache.org/jira/browse/FLINK-2613 Project: Flink Issue Type: I

[jira] [Created] (FLINK-2612) ZooKeeperLeaderElectionITCase failure

2015-09-03 Thread Sachin Goel (JIRA)
Sachin Goel created FLINK-2612: -- Summary: ZooKeeperLeaderElectionITCase failure Key: FLINK-2612 URL: https://issues.apache.org/jira/browse/FLINK-2612 Project: Flink Issue Type: Bug R

Re: Too many changed files building Flink web page

2015-09-03 Thread Matthias J. Sax
+1 for having two commits (if we don't agree on a unique version). However, according to the homepage, you can easily choose the version you want to install: http://jekyllrb.com/docs/installation/ > gem install jekyll -v '2.0.0.alpha.1' Or just build it from the sources. Should not be too difficu

Re: Too many changed files building Flink web page

2015-09-03 Thread Maximilian Michels
> What I also did in the past was to have two commits, one with the changes and > one with the content update. +1 We should always do this to keep the history readable. On Thu, Sep 3, 2015 at 10:50 AM, Ufuk Celebi wrote: > >> On 03 Sep 2015, at 09:56, Maximilian Michels wrote: >> >> Hi Matthia

[jira] [Created] (FLINK-2611) YARN reports failed final state for successful jobs

2015-09-03 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-2611: - Summary: YARN reports failed final state for successful jobs Key: FLINK-2611 URL: https://issues.apache.org/jira/browse/FLINK-2611 Project: Flink Issue Typ

Re: Too many changed files building Flink web page

2015-09-03 Thread Ufuk Celebi
> On 03 Sep 2015, at 09:56, Maximilian Michels wrote: > > Hi Matthias, > > I'm totally with you on this issue. However, enforcing a strict > version is not a trivial thing. For some people, it might be difficult > to install a specific Jekyll version because of the dependencies on > libraries a

Re: Too many changed files building Flink web page

2015-09-03 Thread Maximilian Michels
Hi Matthias, I'm totally with you on this issue. However, enforcing a strict version is not a trivial thing. For some people, it might be difficult to install a specific Jekyll version because of the dependencies on libraries and Ruby versions that come with it. > On my system, version 2.2.0 is i