Hello im281,
The transformations equivalent to the first mapper would look like this
in Java:
.flatMap(line -> Arrays.asList(line.split(" ")).iterator())
.filter(word -> Character.isUpperCase(word.charAt(0)))
.mapToPair(word -> new Tuple2<>(word, 1))
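Since those three transformations need a running SparkContext, the per-line logic inside them can be checked with plain Java streams instead; in this sketch, java.util.AbstractMap.SimpleEntry stands in for Tuple2 so the snippet is self-contained:

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class UppercaseWords {
    // Mirrors flatMap(split) -> filter(isUpperCase) -> mapToPair(word, 1)
    public static List<Map.Entry<String, Integer>> pairs(String line) {
        return Arrays.stream(line.split(" "))
                .filter(w -> !w.isEmpty() && Character.isUpperCase(w.charAt(0)))
                .map(w -> new AbstractMap.SimpleEntry<>(w, 1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [Spark=1, MapReduce=1, Jobs=1]
        System.out.println(pairs("Spark makes MapReduce Jobs simpler"));
    }
}
```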
The second mapper would look more like this:
.flatMap(line -> {
Detector detector = new Detector();
return detector.GetClusters(line).stream()
.map(cluster -> {
String cKey = detector.WriteClusterKey(cluster);
String cValue = detector.WriteClusterValue(cluster);
return new Tuple2<>(cKey, cValue);
}).iterator();
})
Both snippets use scala.Tuple2, the pair class from the Scala standard library that ships with Spark.
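Since the Detector class is specific to your project, here is a minimal standalone sketch of the same flatMap shape with an invented stub (DummyDetector and its comma-split "clusters" are placeholders, not your real logic), so the lambda's return type, an iterator of pairs, is easy to verify without Spark:

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ClusterPairs {
    // Stand-in for the real Detector: treats comma-separated tokens as "clusters".
    static class DummyDetector {
        List<String> getClusters(String line) {
            return Arrays.asList(line.split(","));
        }
        String writeClusterKey(String cluster)   { return "key:" + cluster; }
        String writeClusterValue(String cluster) { return "val:" + cluster; }
    }

    // Same shape as the flatMap lambda: one line in, an iterator of (key, value) out.
    static Iterator<Map.Entry<String, String>> toPairs(String line) {
        DummyDetector detector = new DummyDetector();
        return detector.getClusters(line).stream()
                .<Map.Entry<String, String>>map(c -> new AbstractMap.SimpleEntry<>(
                        detector.writeClusterKey(c),
                        detector.writeClusterValue(c)))
                .iterator();
    }

    public static void main(String[] args) {
        // prints key:a=val:a then key:b=val:b
        toPairs("a,b").forEachRemaining(System.out::println);
    }
}
```

Note that because the output pairs are (String, String), using flatMapToPair with a function returning Iterator<Tuple2<String, String>> would give you a JavaPairRDD directly, without a separate mapToPair step.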
Hope this helps.
Michal
On 02.12.2016 17:05, im281 wrote:
Here is a MapReduce Example implemented in Java.
It reads each line of text and, for each word in the line, determines whether the word starts with an uppercase letter. If so, it emits a key-value pair:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class CountUppercaseMapper
extends Mapper<LongWritable,Text,Text,IntWritable> {
@Override
protected void map(LongWritable lineNumber, Text line, Context context)
throws IOException, InterruptedException {
for (String word : line.toString().split(" ")) {
if (Character.isUpperCase(word.charAt(0))) {
context.write(new Text(word), new IntWritable(1));
}
}
}
}
What is the equivalent Spark implementation?
A more use-case-specific example with objects is below. In this case, the mapper emits multiple key:value pairs of type (String, String). What is the equivalent Spark implementation?
import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class IsotopeClusterMapper extends Mapper<LongWritable,
Text, Text, Text> {
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
System.out.println("Inside Isotope Cluster Map !");
String line = value.toString();
// Get isotope clusters here and write them out as text
Detector detector = new Detector();
ArrayList<IsotopeCluster> clusters = detector.GetClusters(line);
for (int i = 0; i < clusters.size(); i++) {
String cKey = detector.WriteClusterKey(clusters.get(i));
String cValue =
detector.WriteClusterValue(clusters.get(i));
context.write(new Text(cKey), new Text(cValue));
}
}
}
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/RDD-flatmap-to-multiple-key-value-pairs-tp28154.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org