Hello im281,
The transformations equivalent to the first mapper would look like this
in Java:
.flatMap(line -> Arrays.asList(line.split(" ")).iterator())
.filter(word -> Character.isUpperCase(word.charAt(0)))
.mapToPair(word -> new Tuple2<>(word, 1))
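Since those three transformations need a running SparkContext, the per-line logic inside them can be checked with plain Java streams instead; in this sketch, java.util.AbstractMap.SimpleEntry stands in for Tuple2 so the snippet is self-contained:

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class UppercaseWords {
    // Mirrors flatMap(split) -> filter(isUpperCase) -> mapToPair(word, 1)
    public static List<Map.Entry<String, Integer>> pairs(String line) {
        return Arrays.stream(line.split(" "))
                .filter(w -> !w.isEmpty() && Character.isUpperCase(w.charAt(0)))
                .map(w -> new AbstractMap.SimpleEntry<>(w, 1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [Spark=1, MapReduce=1, Jobs=1]
        System.out.println(pairs("Spark makes MapReduce Jobs simpler"));
    }
}
```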
The second mapper would look more like this:
.flatMap(line -> {
Detector detector = new Detector();
return detector.GetClusters(line).stream()
.map(cluster -> {
String cKey = detector.WriteClusterKey(cluster);
String cValue = detector.WriteClusterValue(cluster);
return new Tuple2<>(cKey, cValue);
}).iterator();
})
Both snippets use scala.Tuple2, the pair class from the Scala standard library that ships with Spark.
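Since the Detector class is specific to your project, here is a minimal standalone sketch of the same flatMap shape with an invented stub (DummyDetector and its comma-split "clusters" are placeholders, not your real logic), so the lambda's return type, an iterator of pairs, is easy to verify without Spark:

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ClusterPairs {
    // Stand-in for the real Detector: treats comma-separated tokens as "clusters".
    static class DummyDetector {
        List<String> getClusters(String line) {
            return Arrays.asList(line.split(","));
        }
        String writeClusterKey(String cluster)   { return "key:" + cluster; }
        String writeClusterValue(String cluster) { return "val:" + cluster; }
    }

    // Same shape as the flatMap lambda: one line in, an iterator of (key, value) out.
    static Iterator<Map.Entry<String, String>> toPairs(String line) {
        DummyDetector detector = new DummyDetector();
        return detector.getClusters(line).stream()
                .<Map.Entry<String, String>>map(c -> new AbstractMap.SimpleEntry<>(
                        detector.writeClusterKey(c),
                        detector.writeClusterValue(c)))
                .iterator();
    }

    public static void main(String[] args) {
        // prints key:a=val:a then key:b=val:b
        toPairs("a,b").forEachRemaining(System.out::println);
    }
}
```

Note that because the output pairs are (String, String), using flatMapToPair with a function returning Iterator<Tuple2<String, String>> would give you a JavaPairRDD directly, without a separate mapToPair step.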
Hope this helps.
Michal
On 02.12.2016 17:05, im281 wrote:
Here is a MapReduce Example implemented in Java.
It reads each line of text and, for each word in the line, determines whether the word starts with an uppercase letter. If so, it emits a key-value pair:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class CountUppercaseMapper
extends Mapper<LongWritable,Text,Text,IntWritable> {
@Override
protected void map(LongWritable lineNumber, Text line, Context context)
throws IOException, InterruptedException {
for (String word : line.toString().split(" ")) {
if (Character.isUpperCase(word.charAt(0))) {
context.write(new Text(word), new IntWritable(1));
}
}
}
}
What is the equivalent Spark implementation?
A more use-case-specific example with objects is below. In this case, the mapper emits multiple key:value pairs of type (String, String). What is the equivalent Spark implementation?
import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class IsotopeClusterMapper extends Mapper<LongWritable,
Text, Text, Text> {
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
System.out.println("Inside Isotope Cluster Map !");
String line = value.toString();
// Get isotope clusters here and write them out as text
Detector detector = new Detector();
ArrayList<IsotopeCluster> clusters = detector.GetClusters(line);
for (int i = 0; i < clusters.size(); i++) {
String cKey = detector.WriteClusterKey(clusters.get(i));
String cValue =
detector.WriteClusterValue(clusters.get(i));
context.write(new Text(cKey), new Text(cValue));
}
}
}
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/RDD-flatmap-to-multiple-key-value-pairs-tp28154.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org