I have a different question that may be trivial for you (although not for
me :)). Maybe you can answer it?

Here is a MapReduce example implemented in Java. It reads each line of text
and, for each word in the line, checks whether the word starts with an
uppercase letter. If so, it emits a key/value pair. In this case one line of
text can emit multiple key/value pairs, so I can't use the map function,
which returns just a single Tuple2:



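import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountUppercaseMapper
    extends Mapper<LongWritable,Text,Text,IntWritable> {
  @Override
  protected void map(LongWritable lineNumber, Text line, Context context)
      throws IOException, InterruptedException {
    for (String word : line.toString().split(" ")) {
      // Consecutive spaces produce empty tokens; skip them before charAt(0).
      if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
        context.write(new Text(word), new IntWritable(1));
      }
    }
  }
}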

What is the equivalent Spark implementation?
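
To make the question concrete, here is a minimal sketch of the one-to-many
shape I am after, written with plain Java streams so it runs without Spark.
The class and method names (UppercasePairs, pairsForLine, countUppercase) are
made up for illustration, and Map.Entry stands in for Tuple2. My assumption is
that in Spark the same per-line function would be handed to
JavaRDD.flatMapToPair and followed by reduceByKey, but I have not verified
that.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class UppercasePairs {

    // One input line may yield zero, one, or many (word, 1) pairs.
    static Stream<Map.Entry<String, Integer>> pairsForLine(String line) {
        return Arrays.stream(line.split(" "))
                .filter(word -> !word.isEmpty())
                .filter(word -> Character.isUpperCase(word.charAt(0)))
                .<Map.Entry<String, Integer>>map(word -> new SimpleEntry<>(word, 1));
    }

    // flatMap flattens the per-line streams into one stream of pairs;
    // summing the values per key stands in for the reduce step.
    static Map<String, Integer> countUppercase(List<String> lines) {
        return lines.stream()
                .flatMap(UppercasePairs::pairsForLine)
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hello big World", "Hello again");
        System.out.println(countUppercase(lines));
    }
}
```

The point of the sketch is only that the per-line function returns a
collection of pairs rather than a single pair, and something else flattens
them.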

A more use-case-specific example with objects is below. In this case the
mapper emits multiple key/value pairs of type (String, String).

What is the equivalent Spark implementation?

import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class IsotopeClusterMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                        throws IOException, InterruptedException {
                System.out.println("Inside Isotope Cluster Map !");
                String line = value.toString();

                // Get the isotope clusters for this line and write each one out as text
                Detector detector = new Detector();

                ArrayList<IsotopeCluster> clusters = detector.GetClusters(line);

                for (IsotopeCluster cluster : clusters) {
                        String cKey = detector.WriteClusterKey(cluster);
                        String cValue = detector.WriteClusterValue(cluster);
                        context.write(new Text(cKey), new Text(cValue));
                }
        }
}
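
The same shape with objects, again as a plain-Java sketch with no Spark or
Hadoop dependency. Cluster and getClusters below are hypothetical stand-ins
for my IsotopeCluster and Detector.GetClusters (the real ones are not shown
here), and the "id:mass" input format is invented just to make the example
run. What I would want from Spark is a way to plug in a function like
pairsForLine so that each returned pair becomes its own record.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ClusterPairs {

    // Hypothetical stand-in for IsotopeCluster.
    static final class Cluster {
        final String id;
        final double mass;

        Cluster(String id, double mass) {
            this.id = id;
            this.mass = mass;
        }
    }

    // Hypothetical stand-in for Detector.GetClusters: parses "id:mass" tokens
    // separated by whitespace and ignores anything that does not match.
    static List<Cluster> getClusters(String line) {
        List<Cluster> clusters = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            String[] parts = token.split(":");
            if (parts.length == 2) {
                clusters.add(new Cluster(parts[0], Double.parseDouble(parts[1])));
            }
        }
        return clusters;
    }

    // One line in, a list of (String, String) pairs out: one pair per cluster.
    static List<Map.Entry<String, String>> pairsForLine(String line) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Cluster c : getClusters(line)) {
            out.add(new SimpleEntry<>(c.id, Double.toString(c.mass)));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> pair : pairsForLine("a:1.5 b:2.0")) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```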



On Fri, Dec 2, 2016 at 8:23 AM Iman Mohtashemi <iman.mohtash...@gmail.com>
wrote:

> Ok thanks.
>
> On Fri, Dec 2, 2016 at 8:19 AM Sean Owen <so...@cloudera.com> wrote:
>
> I tried, but enforcing the ordering changed a fair bit of behavior and I
> gave up. I think the way to think of it is: a RowMatrix has whatever
> ordering you made it with, so you need to give it ordered rows if you're
> going to use a method like the QR decomposition. That works. I don't think
> the QR method should ever have been on this class though, for this reason.
>
> On Fri, Dec 2, 2016 at 4:13 PM Iman Mohtashemi <iman.mohtash...@gmail.com>
> wrote:
>
> Hi guys,
> Was this bug ever resolved?
> Iman
>
> On Fri, Nov 11, 2016 at 9:59 AM Iman Mohtashemi <iman.mohtash...@gmail.com>
> wrote:
>
> Yes this would be helpful, otherwise the Q part of the decomposition is
> useless. One can use Q to solve the system Ax = b: transpose it, multiply
> it with b, and solve Rx = Qt*b for x, since the upper triangular matrix R
> is correctly available.
>
> On Fri, Nov 11, 2016 at 3:56 AM Sean Owen <so...@cloudera.com> wrote:
>
> @Xiangrui / @Joseph, do you think it would be reasonable to have
> CoordinateMatrix sort the rows it creates when making an IndexedRowMatrix,
> in order to make the ultimate output of toRowMatrix less surprising when
> it's not ordered?
>
>
> On Tue, Nov 8, 2016 at 3:29 PM Sean Owen <so...@cloudera.com> wrote:
>
> I think the problem here is that IndexedRowMatrix.toRowMatrix does *not*
> result in a RowMatrix with rows in order of their indices, necessarily:
>
>
> // Drop its row indices.
> RowMatrix rowMat = indexedRowMatrix.toRowMatrix();
>
> What you get is a matrix where the rows are arranged in whatever order
> they were passed to IndexedRowMatrix. RowMatrix says it's for rows where
> the ordering doesn't matter, but then it's maybe surprising it has a QR
> decomposition method, because clearly the result depends on the order of
> rows in the input. (CC Yuhao Yang for a comment?)
>
> You could say, well, why doesn't IndexedRowMatrix.toRowMatrix return at
> least something with sorted rows? That would not be hard. It also won't
> return "missing" rows (all zeroes), so it would not in any event result in
> a RowMatrix whose implicit rows and ordering represented the same matrix.
> That, at least, strikes me as something to be better documented.
>
> Maybe it would be nicer still to at least sort the rows, given the
> existence of use cases like yours. For example, at least
> CoordinateMatrix.toIndexedRowMatrix could sort? That is less surprising.
>
> In any event you should be able to make it work by manually getting the
> RDD[IndexedRow] out of IndexedRowMatrix, sorting by index, then mapping it
> to Vectors and making a RowMatrix from it.
>
>
>
> On Tue, Nov 8, 2016 at 2:41 PM Iman Mohtashemi <iman.mohtash...@gmail.com>
> wrote:
>
> Hi Sean,
> Here you go:
>
> sparsematrix.txt =
>
> row, col ,val
> 0,0,.42
> 0,1,.28
> 0,2,.89
> 1,0,.83
> 1,1,.34
> 1,2,.42
> 2,0,.23
> 3,0,.42
> 3,1,.98
> 3,2,.88
> 4,0,.23
> 4,1,.36
> 4,2,.97
>
> The vector is just the third column of the matrix which should give the
> trivial solution of [0,0,1]
>
> This translates to the following dense matrix, which is correct.
> There are zeros in the matrix (not really sparse, just an example):
> 0.42  0.28  0.89
> 0.83  0.34  0.42
> 0.23  0.0   0.0
> 0.42  0.98  0.88
> 0.23  0.36  0.97
>
>
> Here is what I get for the Q and R:
>
> Q:
> -0.21470961288429483  0.23590615093828807   0.6784910613691661
> -0.3920784235278427   -0.06171221388256143  0.5847874866876442
> -0.7748216464954987   -0.4003560542230838   -0.29392323671555354
> -0.3920784235278427   0.8517909521421976    -0.31435038559403217
> -0.21470961288429483  -0.23389547730301666  -0.11165321782745863
>
> R:
> -1.0712142642814275  -0.8347536340918976  -1.227672225670157
> 0.0                  0.7662808691141717   0.7553315911660984
> 0.0                  0.0                  0.7785210939368136
>
> When running this in MATLAB the numbers are the same, but row 1 is the last
> row and the last row is interchanged with row 3.
>
>
>
> On Mon, Nov 7, 2016 at 11:35 PM Sean Owen <so...@cloudera.com> wrote:
>
> Rather than post a large section of code, please post a small example of
> the input matrix and its decomposition, to illustrate what you're saying is
> out of order.
>
> On Tue, Nov 8, 2016 at 3:50 AM im281 <iman.mohtash...@gmail.com> wrote:
>
> I am getting the correct rows but they are out of order. Is this a bug or
> am I doing something wrong?
>
>
>
