Re: Using CUDA within Spark / boosting linear algebra

2015-03-25 Thread Dmitriy Lyubimov
roviding another library. Sam, please suggest if there is another way.

Re: Using CUDA within Spark / boosting linear algebra

2015-03-25 Thread Dmitriy Lyubimov
Alexander, does using netlib imply that one cannot switch between CPU and GPU BLAS alternatives at will at the same time? The choice is always determined by linking alternatives to libblas.so, right? On Wed, Mar 25, 2015 at 2:31 PM, Ulanov, Alexander wrote: > Hi again, > > I finally managed to
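
A minimal sketch of how that choice surfaces at runtime, assuming netlib-java (the library MLlib delegates to) is on the classpath; the class names below are netlib-java's own, and the backend is whatever libblas.so.3 resolves to on the machine (or whatever the system property forces):

    // Prints which BLAS backend netlib-java actually loaded: F2jBLAS (pure JVM),
    // NativeRefBLAS, or NativeSystemBLAS (i.e. whatever libblas.so.3 points to).
    object BlasCheck {
      def main(args: Array[String]): Unit = {
        val blas = com.github.fommil.netlib.BLAS.getInstance()
        println(s"netlib BLAS implementation: ${blas.getClass.getName}")
        // The backend can be forced per JVM, e.g.
        //   -Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.NativeSystemBLAS
        // so CPU vs. GPU BLAS (nvblas behind libblas.so) is a linking/alternatives
        // decision, not something switched per call.
      }
    }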

Re: renaming SchemaRDD -> DataFrame

2015-01-27 Thread Dmitriy Lyubimov
It has been pretty evident for some time that's what it is, hasn't it? Yes that's a better name IMO. On Mon, Jan 26, 2015 at 2:18 PM, Reynold Xin wrote: > Hi, > > We are considering renaming SchemaRDD -> DataFrame in 1.3, and wanted to > get the community's opinion. > > The context is that Sche

Re: Unit test best practice for Spark-derived projects

2014-08-07 Thread Dmitriy Lyubimov
Thanks, let me check this hypothesis (I have a DHCP connection on a private net, and consequently I'm not sure whether there's an inverse/reverse DNS entry). On Thu, Aug 7, 2014 at 10:29 AM, Madhu wrote: > How long does it take to get a spark context? > I found that if you don't have a network connection (reverse DNS looku
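
One way to test that hypothesis, sketched here with Spark 1.x configuration names (the literal address is a placeholder): pin the driver to an explicit local address so context startup does not stall on a reverse-DNS lookup of the DHCP hostname.

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder IP; exporting SPARK_LOCAL_IP=127.0.0.1 in the test
    // environment has a similar effect.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("fast-test-context")
      .set("spark.driver.host", "127.0.0.1")

    val sc = new SparkContext(conf)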

Unit test best practice for Spark-derived projects

2014-08-05 Thread Dmitriy Lyubimov
Hello, I've been switching Mahout from Spark 0.9 to Spark 1.0.x [1] and noticed that tests now run much slower compared to 0.9, with the CPU running idle most of the time. I had to conclude that most of that time is spent on tearing down/resetting the Spark context, which apparently now takes significantly
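
A common way to amortize that cost, sketched below with a made-up trait name: create one local SparkContext per suite (via ScalaTest's BeforeAndAfterAll) rather than per test case.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.scalatest.{BeforeAndAfterAll, Suite}

    // Hypothetical fixture: share a single SparkContext across all tests in a
    // suite so the create/tear-down cost is paid once.
    trait SharedSparkContext extends BeforeAndAfterAll { self: Suite =>
      @transient protected var sc: SparkContext = _

      override def beforeAll(): Unit = {
        super.beforeAll()
        sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName(suiteName))
      }

      override def afterAll(): Unit = {
        try {
          if (sc != null) sc.stop()
          // Clear the driver port so later suites in the same JVM can bind cleanly.
          System.clearProperty("spark.driver.port")
        } finally {
          super.afterAll()
        }
      }
    }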

"log" overloaded in SparkContext/ Spark 1.0.x

2014-08-04 Thread Dmitriy Lyubimov
It would seem that code like

  import o.a.spark.SparkContext._
  import math._
  a = log(b)

no longer compiles with Spark 1.0.x, since SparkContext._ also exposes a `log` function. Which happens a lot to a guy like me. The obvious workaround is to use something like import o.a.spark.Sp
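
A small sketch of one disambiguation (the values are placeholders): rename scala.math's log at import time, or fully qualify the call site, so both wildcard imports can stay.

    import org.apache.spark.SparkContext._
    import scala.math.{log => mathLog, _}   // rename the colliding symbol

    val b = 10.0
    val a = mathLog(b)          // unambiguous
    val c = scala.math.log(b)   // or simply qualify the call instead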

Re: Contributing to MLlib: Proposal for Clustering Algorithms

2014-07-08 Thread Dmitriy Lyubimov
faster inference especially > over billions of users. > > > On Tue, Jul 8, 2014 at 1:24 PM, Dmitriy Lyubimov > wrote: > > > Hector, could you share the references for hierarchical K-means? thanks. > > > > > > On Tue, Jul 8, 2014 at 1:01 PM, Hector Yee

Re: Contributing to MLlib: Proposal for Clustering Algorithms

2014-07-08 Thread Dmitriy Lyubimov
Hector, could you share the references for hierarchical K-means? thanks. On Tue, Jul 8, 2014 at 1:01 PM, Hector Yee wrote: > I would say for bigdata applications the most useful would be hierarchical > k-means with back tracking and the ability to support k nearest centroids. > > > On Tue, Jul
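
Not the implementation being referenced, but a rough sketch of the top-down idea built on MLlib's flat KMeans (no back-tracking and no k-nearest-centroid lookup; depth and iteration counts are arbitrary placeholders):

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    // Recursively bisect the data with k=2 until the requested depth, then
    // return the leaf centroids.
    def bisect(data: RDD[Vector], depth: Int, maxIterations: Int = 20): Seq[Vector] = {
      if (depth == 0 || data.count() < 2) {
        KMeans.train(data, 1, maxIterations).clusterCenters.toSeq
      } else {
        val model = KMeans.train(data, 2, maxIterations)
        val left  = data.filter(v => model.predict(v) == 0).cache()
        val right = data.filter(v => model.predict(v) == 1).cache()
        bisect(left, depth - 1, maxIterations) ++ bisect(right, depth - 1, maxIterations)
      }
    }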

Re: Kryo not default?

2014-05-13 Thread Dmitriy Lyubimov
On Mon, May 12, 2014 at 2:47 PM, Anand Avati wrote: > Hi, > Can someone share the reason why Kryo serializer is not the default? Why should it be? On top of that, the only way to serialize a closure into the backend (even now) is Java serialization (which means Java serialization is required of a
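
For contrast, a sketch of opting in to Kryo for data serialization (the registrator class name is a placeholder); closures are still shipped via Java serialization regardless of this setting:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("kryo-example")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // optionally register application classes via a custom registrator:
      // .set("spark.kryo.registrator", "com.example.MyKryoRegistrator")

    val sc = new SparkContext(conf)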

Double lhbase dependency in spark 0.9.1

2014-04-17 Thread Dmitriy Lyubimov
Not sure if I am seeing double. SparkBuild.scala for 0.9.1 has a double hbase declaration:

  "org.apache.hbase" % "hbase" % "0.94.6" excludeAll(excludeNetty, excludeAsm),
  "org.apache.hbase" % "hbase" % HBASE_VERSION excludeAll(excludeNetty, excludeAsm),

As a result I am no
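
A sketch of what the deduplicated block in SparkBuild.scala presumably should look like, keeping only the HBASE_VERSION-parameterized line (excludeNetty/excludeAsm as already defined in that file):

    libraryDependencies ++= Seq(
      "org.apache.hbase" % "hbase" % HBASE_VERSION excludeAll(excludeNetty, excludeAsm)
    )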