I think Xiangrui's ALS code implements certain aspects of it. You may want to
check it out.
Best regards,
Wei
-----
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
From: Xiangrui Meng
To: Duy Huynh
Cc: user
Date: 11/05/2014
Thank you Debasish.
I am fine with either Scala or Java. I would like to get a quick
evaluation on the performance gain, e.g., ALS on GPU. I would like to try
whichever library does the business :)
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
Thank you all. Actually I was looking at JCUDA. Function-wise this may be a
perfect solution for offloading computation to the GPU. Will see how the
performance turns out, especially with the Java binding.
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
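For illustration, here is a minimal sketch of what offloading a dense matrix
multiply to the GPU through JCuda's cuBLAS binding might look like from Scala.
Everything below (object name, matrix size) is a made-up example, not code from
this thread; it assumes the jcuda/jcublas jars and the native CUDA libraries
are installed.

  import jcuda.{Pointer, Sizeof}
  import jcuda.jcublas.JCublas
  import scala.util.Random

  object JCublasGemmSketch {
    def main(args: Array[String]): Unit = {
      val n = 1024
      // Column-major dense matrices, as cuBLAS expects.
      val hA = Array.fill(n * n)(Random.nextFloat())
      val hB = Array.fill(n * n)(Random.nextFloat())
      val hC = new Array[Float](n * n)

      JCublas.cublasInit()

      // Allocate device memory and copy the inputs over.
      val dA, dB, dC = new Pointer()
      JCublas.cublasAlloc(n * n, Sizeof.FLOAT, dA)
      JCublas.cublasAlloc(n * n, Sizeof.FLOAT, dB)
      JCublas.cublasAlloc(n * n, Sizeof.FLOAT, dC)
      JCublas.cublasSetVector(n * n, Sizeof.FLOAT, Pointer.to(hA), 1, dA, 1)
      JCublas.cublasSetVector(n * n, Sizeof.FLOAT, Pointer.to(hB), 1, dB, 1)

      // C = 1.0 * A * B + 0.0 * C, computed on the GPU.
      JCublas.cublasSgemm('n', 'n', n, n, n, 1.0f, dA, n, dB, n, 0.0f, dC, n)

      // Copy the result back to the host and release device memory.
      JCublas.cublasGetVector(n * n, Sizeof.FLOAT, dC, 1, Pointer.to(hC), 1)
      JCublas.cublasFree(dA); JCublas.cublasFree(dB); JCublas.cublasFree(dC)
      JCublas.cublasShutdown()
    }
  }

The host-to-device copies of hA/hB/hC are part of what the "Java binding"
overhead mentioned above would measure.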
Hi, I am trying to find a CUDA library for Scala, to see if some of the matrix
manipulation in MLlib can be sped up.
I googled a bit but found no active Scala+CUDA projects. Python is
supported by CUDA, though. Any suggestions on whether this idea makes
sense?
Best regards,
Wei
Hi Deb, thanks for sharing your result. Please find my comments inline in
blue.
Best regards,
Wei
From: Debasish Das
To: Wei Tan/Watson/IBM@IBMUS,
Cc: Xiangrui Meng , "user@spark.apache.org"
Date: 08/17/2014 08:15 PM
Subject: Re: MLLib: implementing ALS with d
cently. Any idea on which
method is better?
Thanks!
Wei
-----
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
From: Xiangrui Meng
To: Wei Tan/Watson/IBM@IBMUS,
Cc: "user@spark.apache.org&q
algorithm easier to implement.
Does it make any sense?
Best regards,
Wei
---------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
Thanks for sharing your experience. I had the same experience -- multiple
moderate JVMs beat a single huge JVM.
Aside from the minor JVM startup overhead, is it always better to have
multiple JVMs rather than a single one?
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
From: Sean Owen
To: user@spark.apache.org,
Date: 07/15/2014 04:37 PM
Subject: Re: parallel stages?
The last two lines are what trigger the computation:
wc2.saveAsTextFile("tables.out")
Would the two reduceByKey stages run in parallel given sufficient
capacity?
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
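For what it's worth, actions submitted one after another from a single driver
thread run as sequential jobs; one common way to let two independent
reduceByKey jobs overlap is to submit them from separate threads, for example
with Scala Futures. A hedged sketch (wc1 and the "words.out" path are
placeholders; wc2 is the RDD from the question):

  import scala.concurrent.{Await, Future}
  import scala.concurrent.ExecutionContext.Implicits.global
  import scala.concurrent.duration.Duration

  // Submit the two independent save jobs from separate driver threads so the
  // scheduler can run their stages concurrently, given sufficient capacity.
  val f1 = Future { wc1.saveAsTextFile("words.out") }
  val f2 = Future { wc2.saveAsTextFile("tables.out") }

  // Block until both jobs have finished.
  Await.result(f1, Duration.Inf)
  Await.result(f2, Duration.Inf)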
Just curious: how about using Scala to drive the workflow? I guess if you
use other tools (Oozie, etc.) you lose the advantage of reading from an RDD --
you have to read from HDFS.
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
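As a rough illustration of "driving the workflow in Scala": two steps can share
a cached RDD inside one driver instead of handing data off through HDFS between
tools. The step functions and paths below are placeholders:

  // Step 1: parse the raw input once and keep it in memory.
  val cleaned = sc.textFile("hdfs:///data/input")
    .map(parseRecord)     // parseRecord is a hypothetical step-1 function
    .cache()

  // Step 2: consumes the cached RDD directly; no intermediate HDFS write/read.
  val features = cleaned.map(extractFeatures)   // extractFeatures is hypothetical
  features.saveAsTextFile("hdfs:///data/features")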
cache?
I will try more workers so that each JVM has a smaller heap.
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
From: Gaurav Jain
To: u...@spark.incubator.apache.org
CPauseMillis=500
-XX:MaxPermSize=256m"
Thanks,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
BTW: nowadays a single machine with huge RAM (200 GB to 1 TB) is really
common. With virtualization you lose some performance. It would be ideal
to see some "best practices" on how to use Spark on these state-of-the-art
machines...
Best regards,
Wei
---------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
Thank you all for the advice, including (1) using the CMS GC, (2) using
multiple worker instances, and (3) using Tachyon.
I will try (1) and (2) first and report back what I find.
I will also try JDK 7 with the G1 GC.
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
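For reference, a hedged sketch of what (1) and (2) might look like in a
2014-era standalone deployment; all numbers are placeholders chosen for a
single large-memory box, not recommendations from this thread:

  # conf/spark-env.sh -- advice (2): several moderate workers instead of one huge JVM
  SPARK_WORKER_INSTANCES=4
  SPARK_WORKER_CORES=8
  SPARK_WORKER_MEMORY=45g

  # conf/spark-defaults.conf -- advice (1): CMS collector for the executor JVMs
  spark.executor.extraJavaOptions  -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails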
brary.path= -Xms180g -Xmx180g
org.apache.spark.deploy.SparkSubmit spark-shell --class
org.apache.spark.repl.Main
Best regards,
Wei
---------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
From: Krishna Sankar
To: user@spark.apache.org,
Date: 06/08/2014 11:19 AM
Subject: Re: How to compile a Spark project in
Thank you all, Madhu, Gerard, and Ryan. All your suggestions work.
Personally I prefer running Spark locally in Eclipse for debugging
purposes.
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
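For completeness, a minimal sketch of the local-debugging setup: with the
master set to local[*] the whole job runs inside the IDE's JVM, so breakpoints
in Eclipse work directly. The app name and input path are placeholders:

  import org.apache.spark.{SparkConf, SparkContext}

  object LocalDebugSketch {
    def main(args: Array[String]): Unit = {
      // local[*] runs the driver and the executor threads in this one JVM.
      val conf = new SparkConf().setAppName("local-debug").setMaster("local[*]")
      val sc = new SparkContext(conf)

      val counts = sc.textFile("src/test/resources/sample.txt")
        .flatMap(_.split("\\s+"))
        .map(word => (word, 1))
        .reduceByKey(_ + _)

      counts.collect().foreach(println)
      sc.stop()
    }
  }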
... without packaging and uploading jars.
What is the best practice for writing a Spark application (like wordcount)
and debugging it quickly on a remote Spark instance?
Thanks!
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
... to run it in Hadoop. It is fairly complex and relies on
a lot of utility Java classes I wrote. Can I reuse the map function written in
Java and port it to Spark?
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
Hello,
I am trying to use Spark in the following scenario:
I have code written for Hadoop and am now trying to migrate it to Spark. The
mappers and reducers are fairly complex, so I wonder if I can reuse the
map() functions I already wrote in Hadoop (Java) and use Spark to chain
them, mixing the Java map functions ...
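One hedged sketch of how the existing Java logic could be reused: extract the
per-record work out of the Hadoop Mapper/Reducer into plain (serializable)
utility methods and call them from Spark transformations. RecordUtils and its
methods below are hypothetical stand-ins for the utility classes mentioned
above, and sc and the paths are placeholders:

  // The old Mapper's per-record logic, now called from an RDD map.
  val input  = sc.textFile("hdfs:///data/input")
  val mapped = input.map(line => RecordUtils.transform(line))

  // The old Reducer's combine logic reused inside reduceByKey; the Java
  // classes only need to be on the executor classpath and serializable.
  val reduced = mapped
    .map(rec => (RecordUtils.key(rec), rec))
    .reduceByKey((a, b) => RecordUtils.merge(a, b))

  reduced.saveAsTextFile("hdfs:///data/output")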