Re: Terminology: Split, Group and Partition

2017-01-15 Thread Fabian Hueske
Hi Robert, thanks for opening the ticket. Regarding injecting grouping or partitioning information, semantic annotations (forward fields) [1] is probably what you are looking for. Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/batch/index.html#semantic-annotat

Re: Terminology: Split, Group and Partition

2017-01-14 Thread Robert Schmidtke
Hi Fabian, I have opened a ticket for that, thanks. I have another question: now that I have obtained the proper local grouping, I did some aggregation of type [T] -> U, where one aggregated object is of type U, containing information of zero or more Ts. The Us are still tied to the hostname, and

Re: Terminology: Split, Group and Partition

2017-01-13 Thread Fabian Hueske
I think so far getExecutionPlan() was only used for debugging purpose and not in programs that would also be executed. You can open a JIRA issue if you think that this would a valuable feature. Thanks, Fabian 2017-01-13 16:34 GMT+01:00 Robert Schmidtke : > Just a side note, I'm guessing there's

Re: Terminology: Split, Group and Partition

2017-01-13 Thread Robert Schmidtke
Just a side note, I'm guessing there's a bug here: https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/program/ContextEnvironment.java#L68 It should say createProgramPlan("unnamed job", false); Otherwise I'm getting an exception complaining that no new

Re: Terminology: Split, Group and Partition

2017-01-13 Thread Robert Schmidtke
Hi Fabian, thanks for the quick and comprehensive reply. I'll have a look at the ExecutionPlan using your suggestion to check what actually gets computed, and I'll use the properties as well. If I stumble across something else I'll let you know. Many thanks again! Robert On Fri, Jan 13, 2017 at

Re: Terminology: Split, Group and Partition

2017-01-13 Thread Fabian Hueske
Hi Robert, let me first describe what splits, groups, and partitions are. * Partition: This is basically all data that goes through the same task instance. If you have an operator with a parallelism of 80, you have 80 partitions. When you call sortPartition() you'll have 80 sorted streams, if you

Terminology: Split, Group and Partition

2017-01-13 Thread Robert Schmidtke
Hi all, I'm having some trouble grasping what the meaning of/difference between the following concepts is: - Split - Group - Partition Let me elaborate a bit on the problem I'm trying to solve here. In my tests I'm using a 5-node cluster, on which I'm running Flink 1.1.3 in standalone mode. Each