I think Xiangrui's ALS code implements certain aspects of it. You may want to
check it out.
Best regards,
Wei
-----
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
From: Xiangrui Meng
To: Duy Huynh
Cc: user
Date: 11/05/2014
Thank you Debasish.
I am fine with either Scala or Java. I would like to get a quick
evaluation on the performance gain, e.g., ALS on GPU. I would like to try
whichever library does the business :)
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
Thank you all. Actually I was looking at JCUDA. Function-wise this may be a
perfect solution for offloading computation to the GPU. Will see how the
performance turns out, especially with the Java binding.
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
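For illustration, here is a minimal sketch of what offloading a dense matrix
multiply to the GPU through JCuda's cuBLAS binding might look like from Scala.
Everything below (object name, matrix size) is a made-up example, not code from
this thread; it assumes the jcuda/jcublas jars and the native CUDA libraries
are installed.

  import jcuda.{Pointer, Sizeof}
  import jcuda.jcublas.JCublas
  import scala.util.Random

  object JCublasGemmSketch {
    def main(args: Array[String]): Unit = {
      val n = 1024
      // Column-major dense matrices, as cuBLAS expects.
      val hA = Array.fill(n * n)(Random.nextFloat())
      val hB = Array.fill(n * n)(Random.nextFloat())
      val hC = new Array[Float](n * n)

      JCublas.cublasInit()

      // Allocate device memory and copy the inputs over.
      val dA, dB, dC = new Pointer()
      JCublas.cublasAlloc(n * n, Sizeof.FLOAT, dA)
      JCublas.cublasAlloc(n * n, Sizeof.FLOAT, dB)
      JCublas.cublasAlloc(n * n, Sizeof.FLOAT, dC)
      JCublas.cublasSetVector(n * n, Sizeof.FLOAT, Pointer.to(hA), 1, dA, 1)
      JCublas.cublasSetVector(n * n, Sizeof.FLOAT, Pointer.to(hB), 1, dB, 1)

      // C = 1.0 * A * B + 0.0 * C, computed on the GPU.
      JCublas.cublasSgemm('n', 'n', n, n, n, 1.0f, dA, n, dB, n, 0.0f, dC, n)

      // Copy the result back to the host and release device memory.
      JCublas.cublasGetVector(n * n, Sizeof.FLOAT, dC, 1, Pointer.to(hC), 1)
      JCublas.cublasFree(dA); JCublas.cublasFree(dB); JCublas.cublasFree(dC)
      JCublas.cublasShutdown()
    }
  }

The host-to-device copies of hA/hB/hC are part of what the "Java binding"
overhead mentioned above would measure.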
Hi, I am trying to find a CUDA library for Scala, to see if some of the matrix
manipulation in MLlib can be sped up.
I googled a bit but found no active Scala+CUDA projects. Python is
supported by CUDA, though. Any suggestions on whether this idea makes
sense?
Best regards,
Wei
Hi Deb, thanks for sharing your result. Please find my comments inline in
blue.
Best regards,
Wei
From: Debasish Das
To: Wei Tan/Watson/IBM@IBMUS,
Cc: Xiangrui Meng , "user@spark.apache.org"
Date: 08/17/2014 08:15 PM
Subject: Re: MLLib: implementing ALS with d
cently. Any idea on which
method is better?
Thanks!
Wei
-----
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
From: Xiangrui Meng
To: Wei Tan/Watson/IBM@IBMUS,
Cc: "user@spark.apache.org&q
algorithm easier to implement.
Does it make any sense?
Best regards,
Wei
---------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
Thanks for sharing your experience. I had the same experience -- multiple
moderate JVMs beat a single huge JVM.
Aside from the minor JVM startup overhead, is it always better to have
multiple JVMs rather than a single one?
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
From: Sean Owen
To: user@spark.apache.org,
Date: 07/15/2014 04:37 PM
Subject: Re: parallel stages?
The last two lines are what trigger the computation:
wc2.saveAsTextFile("tables.out")
Would the two reduceByKey stages run in parallel given sufficient
capacity?
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
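For what it's worth, actions submitted one after another from a single driver
thread run as sequential jobs; one common way to let two independent
reduceByKey jobs overlap is to submit them from separate threads, for example
with Scala Futures. A hedged sketch (wc1 and the "words.out" path are
placeholders; wc2 is the RDD from the question):

  import scala.concurrent.{Await, Future}
  import scala.concurrent.ExecutionContext.Implicits.global
  import scala.concurrent.duration.Duration

  // Submit the two independent save jobs from separate driver threads so the
  // scheduler can run their stages concurrently, given sufficient capacity.
  val f1 = Future { wc1.saveAsTextFile("words.out") }
  val f2 = Future { wc2.saveAsTextFile("tables.out") }

  // Block until both jobs have finished.
  Await.result(f1, Duration.Inf)
  Await.result(f2, Duration.Inf)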
Just curious: how about using Scala to drive the workflow? I guess if you
use other tools (Oozie, etc.) you lose the advantage of reading from an RDD --
you have to read from HDFS.
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
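As a rough illustration of "driving the workflow in Scala": two steps can share
a cached RDD inside one driver instead of handing data off through HDFS between
tools. The step functions and paths below are placeholders:

  // Step 1: parse the raw input once and keep it in memory.
  val cleaned = sc.textFile("hdfs:///data/input")
    .map(parseRecord)     // parseRecord is a hypothetical step-1 function
    .cache()

  // Step 2: consumes the cached RDD directly; no intermediate HDFS write/read.
  val features = cleaned.map(extractFeatures)   // extractFeatures is hypothetical
  features.saveAsTextFile("hdfs:///data/features")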
cache?
I will try more workers so that each JVM has a smaller heap.
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
From: Gaurav Jain
To: u...@spark.incubator.apache.org
CPauseMillis=500
-XX:MaxPermSize=256m"
Thanks,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
BTW: nowadays a single machine with huge RAM (200 GB to 1 TB) is really
common. With virtualization you lose some performance. It would be ideal
to see some "best practices" on how to use Spark on these state-of-the-art
machines...
Best regards,
Wei
---------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
Thank you all for the advice, including (1) using the CMS GC, (2) using
multiple worker instances, and (3) using Tachyon.
I will try (1) and (2) first and report back what I find.
I will also try JDK 7 with the G1 GC.
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
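For reference, a hedged sketch of what (1) and (2) might look like in a
2014-era standalone deployment; all numbers are placeholders chosen for a
single large-memory box, not recommendations from this thread:

  # conf/spark-env.sh -- advice (2): several moderate workers instead of one huge JVM
  SPARK_WORKER_INSTANCES=4
  SPARK_WORKER_CORES=8
  SPARK_WORKER_MEMORY=45g

  # conf/spark-defaults.conf -- advice (1): CMS collector for the executor JVMs
  spark.executor.extraJavaOptions  -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails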
brary.path= -Xms180g -Xmx180g
org.apache.spark.deploy.SparkSubmit spark-shell --class
org.apache.spark.repl.Main
Best regards,
Wei
---------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
From: Krishna Sankar
To: user@spark.apache.org,
Date: 06/08/2014 11:19 AM
Subject: Re: How to compile a Spark project in
Thank you all, Madhu, Gerard, and Ryan. All your suggestions work.
Personally I prefer running Spark locally in Eclipse for debugging
purposes.
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
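For completeness, a minimal sketch of the local-debugging setup: with the
master set to local[*] the whole job runs inside the IDE's JVM, so breakpoints
in Eclipse work directly. The app name and input path are placeholders:

  import org.apache.spark.{SparkConf, SparkContext}

  object LocalDebugSketch {
    def main(args: Array[String]): Unit = {
      // local[*] runs the driver and the executor threads in this one JVM.
      val conf = new SparkConf().setAppName("local-debug").setMaster("local[*]")
      val sc = new SparkContext(conf)

      val counts = sc.textFile("src/test/resources/sample.txt")
        .flatMap(_.split("\\s+"))
        .map(word => (word, 1))
        .reduceByKey(_ + _)

      counts.collect().foreach(println)
      sc.stop()
    }
  }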
... without packaging and uploading jars.
What is the best practice for writing a Spark application (like wordcount)
and debugging it quickly on a remote Spark instance?
Thanks!
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
... to run it in Hadoop. It is fairly complex and relies on
a lot of utility Java classes I wrote. Can I reuse the map function written in
Java and port it to Spark?
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
Hello,
I am trying to use Spark in the following scenario:
I have code written for Hadoop and am now trying to migrate it to Spark. The
mappers and reducers are fairly complex, so I wonder if I can reuse the
map() functions I already wrote in Hadoop (Java) and use Spark to chain
them, mixing the Java map functions ...
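One hedged sketch of how the existing Java logic could be reused: extract the
per-record work out of the Hadoop Mapper/Reducer into plain (serializable)
utility methods and call them from Spark transformations. RecordUtils and its
methods below are hypothetical stand-ins for the utility classes mentioned
above, and sc and the paths are placeholders:

  // The old Mapper's per-record logic, now called from an RDD map.
  val input  = sc.textFile("hdfs:///data/input")
  val mapped = input.map(line => RecordUtils.transform(line))

  // The old Reducer's combine logic reused inside reduceByKey; the Java
  // classes only need to be on the executor classpath and serializable.
  val reduced = mapped
    .map(rec => (RecordUtils.key(rec), rec))
    .reduceByKey((a, b) => RecordUtils.merge(a, b))

  reduced.saveAsTextFile("hdfs:///data/output")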