Hi, 

I’m quite new to Spark and MapReduce, but I need to get all distinct 
values with their respective counts from a transactional file. Let’s assume the 
following file format:

0 1 2 3 4 5 6 7
1 3 4 5 8 9
9 10 11 12 13 14 15 16 17 18
1 4 7 11 12 13 19 20
3 4 7 11 15 20 21 22 23
1 2 5 9 11 12 16

Given this, I would like a Map<String, Integer> (or a list of String/Integer 
pairs) back, where the String is the item identifier and the Integer is the 
count of that item identifier in the file. The following is what I came up with 
to map the values, but I can’t figure out how to do the counting :(

// create RDD of an arraylist of strings
JavaRDD<ArrayList<String>> transactions = sc.textFile(dataPath).map(
    new Function<String, ArrayList<String>>() {
        private static final long serialVersionUID = 1L;

        @Override
        public ArrayList<String> call(String s) {
            return Lists.newArrayList(s.split(" "));
        }
    }
);
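
To make the desired result concrete, here is a minimal plain-Java sketch of the counting I'm after. It does not use Spark at all, and the class and method names (ItemCounts, countItems) are made up purely for illustration. It only shows what the output should look like for the sample file above, not how to compute it on an RDD:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, plain Java only: illustrates the wanted result,
// not a Spark solution.
class ItemCounts {

    // Counts how often each item identifier occurs across all lines.
    static Map<String, Integer> countItems(List<String> lines) {
        // TreeMap keeps keys in (lexicographic) string order for readable output.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String item : line.trim().split("\\s+")) {
                if (item.isEmpty()) {
                    continue; // skip blank lines
                }
                counts.merge(item, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The sample transactions from the file format above.
        List<String> lines = Arrays.asList(
            "0 1 2 3 4 5 6 7",
            "1 3 4 5 8 9",
            "9 10 11 12 13 14 15 16 17 18",
            "1 4 7 11 12 13 19 20",
            "3 4 7 11 15 20 21 22 23",
            "1 2 5 9 11 12 16");
        Map<String, Integer> counts = countItems(lines);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

For the sample data, item "1" should come back with a count of 4 and item "23" with a count of 1. What I'm missing is the equivalent of this loop expressed on the RDD.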


Any ideas?

Thanks!
Patrick
