[ https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974346#comment-14974346 ]
ASF GitHub Bot commented on FLINK-7: ------------------------------------ Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1255#discussion_r43004309 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/functions/AssignRangeIndex.java --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.flink.api.java.functions; + +import org.apache.flink.api.common.distributions.DataDistribution; +import org.apache.flink.api.common.functions.RichMapPartitionFunction; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.util.Collector; + +import java.util.ArrayList; +import java.util.List; + +/** + * This mapPartition function require a DataSet with DataDistribution as broadcast input, it read + * target parallelism from parameter, build partition boundaries with input DataDistribution, then + * compute the range index for each record. + * + * @param <IN> The original data type. + * @param <K> The key type. + */ +public class AssignRangeIndex<IN, K extends Comparable<K>> + extends RichMapPartitionFunction<Tuple2<K, IN>, Tuple2<Integer, IN>> { + + private List<K> partitionBoundaries; + private int numberChannels; + + @Override + public void open(Configuration parameters) throws Exception { + this.numberChannels = parameters.getInteger("TargetParallelism", 1); + } + + @Override + public void mapPartition(Iterable<Tuple2<K, IN>> values, Collector<Tuple2<Integer, IN>> out) throws Exception { + + List<Object> broadcastVariable = getRuntimeContext().getBroadcastVariable("DataDistribution"); + if (broadcastVariable == null || broadcastVariable.size() != 1) { --- End diff -- Nevermind, I thought you were using a MapFunction, but its a MapPartitionFunction. So this is only done once. > [GitHub] Enable Range Partitioner > --------------------------------- > > Key: FLINK-7 > URL: https://issues.apache.org/jira/browse/FLINK-7 > Project: Flink > Issue Type: Sub-task > Components: Distributed Runtime > Reporter: GitHub Import > Assignee: Chengxiang Li > Fix For: pre-apache > > > The range partitioner is currently disabled. We need to implement the > following aspects: > 1) Distribution information, if available, must be propagated back together > with the ordering property. > 2) A generic bucket lookup structure (currently specific to PactRecord). > Tests to re-enable after fixing this issue: > - TeraSortITCase > - GlobalSortingITCase > - GlobalSortingMixedOrderITCase > ---------------- Imported from GitHub ---------------- > Url: https://github.com/stratosphere/stratosphere/issues/7 > Created by: [StephanEwen|https://github.com/StephanEwen] > Labels: core, enhancement, optimizer, > Milestone: Release 0.4 > Assignee: [fhueske|https://github.com/fhueske] > Created at: Fri Apr 26 13:48:24 CEST 2013 > State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)