Re: mLIb solving linear regression with sparse inputs

2016-11-06 Thread im281


Thank you! Would you happen to have this code in Java?

This is extremely helpful!

Iman






On Sun, Nov 6, 2016 at 3:35 AM -0800, "Robineast [via Apache Spark User List]" wrote:

Here’s a way of creating sparse vectors in MLLib:
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.rdd.RDD

val rdd = sc.textFile("A.txt").map(line => line.split(",")).
  map(ary => (ary(0).toInt, ary(1).toInt, ary(2).toDouble))

val pairRdd: RDD[(Int, (Int, Int, Double))] = rdd.map(el => (el._1, el))

val create = (first: (Int, Int, Double)) => (Array(first._2), Array(first._3))
val combine = (head: (Array[Int], Array[Double]), tail: (Int, Int, Double)) => (head._1 :+ tail._2, head._2 :+ tail._3)
val merge = (a: (Array[Int], Array[Double]), b: (Array[Int], Array[Double])) => (a._1 ++ b._1, a._2 ++ b._2)

val A = pairRdd.combineByKey(create, combine, merge).map(el => Vectors.sparse(3, el._2._1, el._2._2))
If you have a separate file of b’s then you would need to manipulate this 
slightly to join the b’s to the A RDD and then create LabeledPoints. I guess 
there is a way of doing this using the newer ML interfaces but it’s not 
particularly obvious to me how.
One point: in the example you give, the b's are exactly the same as column 2 of the A matrix. I presume this is just a quickly hacked-together example, because that would give a trivial result.

---
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action
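
For the Java side of the question, here is a rough sketch of the step described above (join the b's onto the rows of A, build LabeledPoints, then run the older RDD-based regression). The class name and the 100-iteration setting are made up, and it assumes rowsOfA (row index -> sparse feature vector, built e.g. with a combineByKey as in the Scala above) and bByRow (row index -> b value, read from the separate b file) already exist:

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.regression.LinearRegressionModel;
import org.apache.spark.mllib.regression.LinearRegressionWithSGD;

public class SparseRegressionSketch {

    // rowsOfA: row index -> sparse feature vector for that row
    // bByRow:  row index -> b value for that row
    public static LinearRegressionModel solve(JavaPairRDD<Integer, Vector> rowsOfA,
                                               JavaPairRDD<Integer, Double> bByRow) {
        // join b onto A by row index and turn each row into LabeledPoint(b, features)
        JavaRDD<LabeledPoint> training = rowsOfA.join(bByRow)
                .map(kv -> new LabeledPoint(kv._2()._2(), kv._2()._1()));
        training.cache();

        // older RDD-based API; 100 iterations is an arbitrary placeholder
        return LinearRegressionWithSGD.train(training.rdd(), 100);
    }
}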






On 3 Nov 2016, at 18:12, im281 [via Apache Spark User List] wrote:


I would like to use it, but how do I do the following:

1) Read sparse data (from text or a database)

2) Pass the sparse data to the LinearRegression class?


For example:


Sparse matrix A
row, column, value
0,0,.42
0,1,.28
0,2,.89
1,0,.83
1,1,.34
1,2,.42
2,0,.23
3,0,.42
3,1,.98
3,2,.88
4,0,.23
4,1,.36
4,2,.97

Sparse vector b
row, column, value
0,2,.89
1,2,.42
3,2,.88
4,2,.97

Solve Ax = b???


Re: mLIb solving linear regression with sparse inputs

2016-11-06 Thread im281
Hi Robin,
It looks like the linear regression model takes in a Dataset, not a matrix. It would be helpful for this example if you could set up the whole problem end to end, using one of the columns of the matrix as b, so that A is a sparse matrix and b is a sparse vector.
Best regards.
Iman
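
For the Dataset-based (spark.ml) API, the rough shape would be something like the sketch below. The class name is made up, and it assumes the rows of A and the matching b values have already been pulled together into org.apache.spark.ml.feature.LabeledPoint objects (label = b value, features = sparse row of A):

import java.util.List;

import org.apache.spark.ml.feature.LabeledPoint;
import org.apache.spark.ml.regression.LinearRegression;
import org.apache.spark.ml.regression.LinearRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DatasetRegressionSketch {

    public static LinearRegressionModel fit(SparkSession spark, List<LabeledPoint> points) {
        // LabeledPoint becomes a two-column Dataset ("label", "features"),
        // which is what the ml LinearRegression estimator expects by default
        Dataset<Row> training = spark.createDataFrame(points, LabeledPoint.class);
        return new LinearRegression().fit(training);
    }
}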


Re: mLIb solving linear regression with sparse inputs

2016-11-06 Thread im281
In Java as well, please. Thanks again!
Iman


TallSkinnyQR

2016-11-07 Thread im281
I am getting the correct rows but they are out of order. Is this a bug or am
I doing something wrong?




import java.io.FileNotFoundException;
import java.io.PrintWriter;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.linalg.Matrix;
import org.apache.spark.mllib.linalg.QRDecomposition;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
import org.apache.spark.mllib.linalg.distributed.MatrixEntry;
import org.apache.spark.mllib.linalg.distributed.RowMatrix;
import org.apache.spark.sql.SparkSession;

public class CoordinateMatrixDemo {

    public static void main(String[] args) {

        // boilerplate needed to run locally
        SparkConf conf = new SparkConf().setAppName("Simple Application").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        SparkSession spark = SparkSession
                .builder()
                .appName("CoordinateMatrix")
                .getOrCreate()
                .newSession();

        run(sc, "Data/sparsematrix.txt");
    }

    private static void run(JavaSparkContext sc, String file) {

        // Read coordinate matrix from text or database
        JavaRDD<String> fileA = sc.textFile(file);

        // map text file with coordinate data (sparse matrix) to JavaRDD<MatrixEntry>
        JavaRDD<MatrixEntry> matrixA = fileA.map(new Function<String, MatrixEntry>() {
            public MatrixEntry call(String x) {
                String[] indexValue = x.split(",");
                long i = Long.parseLong(indexValue[0]);
                long j = Long.parseLong(indexValue[1]);
                double value = Double.parseDouble(indexValue[2]);
                return new MatrixEntry(i, j, value);
            }
        });

        // coordinate matrix from sparse data
        CoordinateMatrix cooMatrixA = new CoordinateMatrix(matrixA.rdd());

        // create block matrix
        BlockMatrix matA = cooMatrixA.toBlockMatrix();

        // create block matrix after matrix multiplication (square matrix)
        BlockMatrix ata = matA.transpose().multiply(matA);

        // print out the original dense matrix
        System.out.println(matA.toLocalMatrix().toString());

        // print out the transpose of the dense matrix
        System.out.println(matA.transpose().toLocalMatrix().toString());

        // print out the square matrix (after multiplication)
        System.out.println(ata.toLocalMatrix().toString());

        JavaRDD<MatrixEntry> entries = ata.toCoordinateMatrix().entries().toJavaRDD();

        // QR decomposition DEMO
        // Convert it to an IndexedRowMatrix whose rows are sparse vectors.
        IndexedRowMatrix indexedRowMatrix = cooMatrixA.toIndexedRowMatrix();

        // Drop its row indices.
        RowMatrix rowMat = indexedRowMatrix.toRowMatrix();

        // QR decomposition
        QRDecomposition<RowMatrix, Matrix> result = rowMat.tallSkinnyQR(true);

        System.out.println("Q: " + result.Q().toBreeze().toString());
        System.out.println("R: " + result.R().toString());

        Vector[] collectPartitions = (Vector[]) result.Q().rows().collect();

        System.out.println("Q factor is:");
        for (Vector vector : collectPartitions) {
            System.out.println("\t" + vector);
        }

        // compute Qt
        // need to compute d = Qt*b where b is the experimental vector,
        // then solve for d using Gaussian elimination

        // Extract Q values and create matrix
        // TODO:! The array will be HUGE
        String Qm = result.Q().toBreeze().toString();
        String[] Qmatrix = Qm.split("\\s+");

        int rows = (int) result.Q().numRows();
        int cols = (int) result.Q().numCols();

        try {
            PrintWriter pw = new PrintWriter("Data/qMatrix.txt");
            pw.write(Qm);
            pw.close();

            PrintWriter pw1 = new PrintWriter("Data/qMatrix1.txt");
            // write coordinate matrix to file
            int k = 0;
            for (int i = 0; i < (int) result.Q().numRows(); i++) {
                for (int j = 0; j < (int) result.Q().numCols(); j++) {
                    pw1.println(i + "," + j + "," + Qmatrix[k]);
                    k++;
                }
            }
            pw1.close();

        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        // Read coordinate matrix from text or database
        JavaRDD<String> fileQ = sc.textFile("Data/qMatrix1.txt");
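
On the ordering: as far as I can tell this is expected rather than a bug. toRowMatrix() drops the row indices, and tallSkinnyQR is defined on a RowMatrix, so the rows of Q come back in whatever order the underlying RDD happens to have, not necessarily sorted by the original row index of A. Separately, if Q becomes too large to collect via toBreeze().toString(), one alternative is to write its entries straight from the rows. A rough sketch (it reuses the `result` variable from the code above, needs java.util.ArrayList and java.util.List imports, and the output path is made up):

        // Writes Q as "row,col,value" lines without collecting the whole matrix into
        // one string. The row index here is just the position in Q's RDD, which is not
        // guaranteed to match the original row order of A (see note above).
        JavaRDD<String> qTriples = result.Q().rows().toJavaRDD()
                .zipWithIndex()
                .flatMap(rowAndIndex -> {
                    long i = rowAndIndex._2();
                    double[] values = rowAndIndex._1().toArray();
                    List<String> lines = new ArrayList<>();
                    for (int j = 0; j < values.length; j++) {
                        lines.add(i + "," + j + "," + values[j]);
                    }
                    return lines.iterator();
                });
        qTriples.saveAsTextFile("Data/qMatrixEntries");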

  

RDD flatmap to multiple key/value pairs

2016-12-02 Thread im281
Here is a MapReduce example implemented in Java. It reads each line of text and, for each word in the line, determines whether it starts with an upper-case letter. If so, it emits a key-value pair.

public class CountUppercaseMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {
  @Override
  protected void map(LongWritable lineNumber, Text line, Context context)
  throws IOException, InterruptedException {
for (String word : line.toString().split(" ")) {
  if (Character.isUpperCase(word.charAt(0))) {
context.write(new Text(word), new IntWritable(1));
  }
}
  }
}

What is the equivalent Spark implementation?
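
A rough Spark equivalent is sketched below (assuming `lines` is a JavaRDD<String> read with sc.textFile, and that java.util.ArrayList/List, scala.Tuple2 and org.apache.spark.api.java.JavaPairRDD are imported). flatMapToPair lets one input record produce zero or more output pairs, and in Spark 2.x the function returns an Iterator:

JavaPairRDD<String, Integer> upperCaseCounts = lines
        .flatMapToPair(line -> {
            List<Tuple2<String, Integer>> pairs = new ArrayList<>();
            for (String word : line.split(" ")) {
                if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
                    pairs.add(new Tuple2<>(word, 1));
                }
            }
            return pairs.iterator();
        })
        // the MapReduce job presumably summed these in a reducer; reduceByKey does the same
        .reduceByKey((a, b) -> a + b);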

A more use-case-specific example with objects is below.

In this case, the mapper emits multiple key-value pairs that are (String, String).

What is the equivalent Spark implementation? (A sketch follows the mapper code.)

import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class IsotopeClusterMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        System.out.println("Inside Isotope Cluster Map !");
        String line = value.toString();

        // Get isotope clusters here and write them out to text
        Detector detector = new Detector();

        ArrayList clusters = detector.GetClusters(line);

        for (int i = 0; i < clusters.size(); i++) {
            String cKey = detector.WriteClusterKey(clusters.get(i));
            String cValue = detector.WriteClusterValue(clusters.get(i));
            context.write(new Text(cKey), new Text(cValue));
        }
    }
}
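
The (String, String) case follows the same pattern. Again a sketch only: `lines` is assumed to be a JavaRDD<String>, the same imports as the previous sketch apply, and Cluster stands in for whatever element type detector.GetClusters actually returns:

JavaPairRDD<String, String> clusterPairs = lines.flatMapToPair(line -> {
    Detector detector = new Detector();
    List<Tuple2<String, String>> pairs = new ArrayList<>();
    // one output pair per detected cluster in this line
    for (Cluster cluster : detector.GetClusters(line)) {
        pairs.add(new Tuple2<>(detector.WriteClusterKey(cluster),
                               detector.WriteClusterValue(cluster)));
    }
    return pairs.iterator();
});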











Running spark from Eclipse and then Jar

2016-12-07 Thread im281
Hello,
I have a simple word count example in Java and I can run this in Eclipse
(code at the bottom)

I then create a jar file from it and try to run it from the cmd


java -jar C:\Users\Owner\Desktop\wordcount.jar Data/testfile.txt

But I get an error (full log below). I think the main problem is:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: text

Any advice on how to run this jar file in Spark would be appreciated.
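
For what it's worth, a Spark application jar is normally launched with spark-submit rather than plain java -jar, so that Spark's own jars (and their data-source registrations) end up on the classpath. If the goal is just the word count, one way to sidestep the failing "text" data-source lookup is to read through the RDD API instead of spark.read().textFile(). A rough sketch only; the class name is made up and the input path is taken from the first argument:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class JavaWordCountRdd {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("JavaWordCount").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // sc.textFile does not go through the DataSource lookup that fails above
        JavaRDD<String> lines = sc.textFile(args[0]);
        JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey((a, b) -> a + b);

        counts.collect().forEach(t -> System.out.println(t._1() + ": " + t._2()));
        sc.stop();
    }
}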


Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
16/12/07 15:16:41 INFO SparkContext: Running Spark version 2.0.2
16/12/07 15:16:42 INFO SecurityManager: Changing view acls to: Owner
16/12/07 15:16:42 INFO SecurityManager: Changing modify acls to: Owner
16/12/07 15:16:42 INFO SecurityManager: Changing view acls groups to:
16/12/07 15:16:42 INFO SecurityManager: Changing modify acls groups to:
16/12/07 15:16:42 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users  with view permissions: Set(Owner); groups
with view permissions: Set(); users  with modify permissions: Set(Owner);
groups with modify permissions: Set()
16/12/07 15:16:44 INFO Utils: Successfully started service 'sparkDriver' on
port 10211.
16/12/07 15:16:44 INFO SparkEnv: Registering MapOutputTracker
16/12/07 15:16:44 INFO SparkEnv: Registering BlockManagerMaster
16/12/07 15:16:44 INFO DiskBlockManager: Created local directory at
C:\Users\Owner\AppData\Local\Temp\blockmgr-b4b1960b-08fc-44fd-a75e-1a0450556873
16/12/07 15:16:44 INFO MemoryStore: MemoryStore started with capacity 1984.5
MB
16/12/07 15:16:45 INFO SparkEnv: Registering OutputCommitCoordinator
16/12/07 15:16:45 INFO Utils: Successfully started service 'SparkUI' on port
4040.
16/12/07 15:16:45 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at
http://192.168.19.2:4040
16/12/07 15:16:45 INFO Executor: Starting executor ID driver on host
localhost
16/12/07 15:16:45 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 10252.
16/12/07 15:16:45 INFO NettyBlockTransferService: Server created on
192.168.19.2:10252
16/12/07 15:16:45 INFO BlockManagerMaster: Registering BlockManager
BlockManagerId(driver, 192.168.19.2, 10252)
16/12/07 15:16:45 INFO BlockManagerMasterEndpoint: Registering block manager
192.168.19.2:10252 with 1984.5 MB RAM, BlockManagerId(driver, 192.168.19.2,
10252)
16/12/07 15:16:45 INFO BlockManagerMaster: Registered BlockManager
BlockManagerId(driver, 192.168.19.2, 10252)
16/12/07 15:16:46 WARN SparkContext: Use an existing SparkContext, some
configuration may not take effect.
16/12/07 15:16:46 INFO SharedState: Warehouse path is
'file:/C:/Users/Owner/spark-warehouse'.
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find
data source: text. Please find packages at
https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
at
org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
at
org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
at
org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
at
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
at
org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at
org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:504)
at
org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:540)
at
org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:513)
at JavaWordCount.main(JavaWordCount.java:57)
Caused by: java.lang.ClassNotFoundException: text.DefaultSource
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
at
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
at scala.util.Try$.apply(Try.scala:192)
at
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
at
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
at scala.util.Try.orElse(Try.scala:84)
at
org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:132)
... 8 more
16/12/07 15:16:46 INFO SparkContext: Invoking stop() from shutdown hook
16/12/07 15:16:46 INFO SparkUI: Stopped Spark web UI at
http://192.168.19.2:4040
16/12/07 15:16:46 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
16/12/07 15:16:46 INFO MemoryStore: MemoryStore cleared
16/12/07

flatmap pair

2016-12-08 Thread im281

The class 'Detector' has a function 'DetectFeature(cluster)'. However, the method has changed to return a list of features, rather than the single feature used in the code below. How do I change this so that it emits a list of feature objects instead?


// creates key-value pairs of isotope-cluster ID and isotope cluster,
// groups them by key, and passes each collection of isotopes
// to the feature detector
JavaPairRDD<String, String> features =
        scans.flatMapToPair(keyData).groupByKey().mapValues(values -> {

            System.out.println("Inside Feature Detection Reduce !");
            Detector d = new Detector();
            ArrayList clusters = (ArrayList) StreamSupport
                    .stream(values.spliterator(), false)
                    .map(d::GetClustersfromKey)
                    .collect(Collectors.toList());
            return d.WriteFeatureValue(d.DetectFeature(clusters));
        });
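
One way to get one record per feature is to replace mapValues with flatMapValues, which in Spark 2.x takes a function returning an Iterable. A sketch only: it assumes DetectFeature now returns a java.util.List of feature objects, that WriteFeatureValue serialises a single feature, and Cluster/Feature stand in for whatever the real element types are:

JavaPairRDD<String, String> features =
        scans.flatMapToPair(keyData).groupByKey().flatMapValues(values -> {

            Detector d = new Detector();
            List<Cluster> clusters = StreamSupport
                    .stream(values.spliterator(), false)
                    .map(d::GetClustersfromKey)
                    .collect(Collectors.toList());

            // DetectFeature is assumed to return List<Feature>;
            // each feature becomes its own (clusterKey, featureValue) record
            List<String> featureValues = new ArrayList<>();
            for (Feature f : d.DetectFeature(clusters)) {
                featureValues.add(d.WriteFeatureValue(f));
            }
            return featureValues;
        });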


