Re: Hanging tasks in spark 1.2.1 while working with 1.1.1

2015-03-17 Thread Dmitriy Lyubimov
FWIW, I observed similar behavior in a similar situation. I was able to work around it by forcing one of the RDDs into cache right before the union, and triggering that by executing take(1). Nothing else ever helped. Seems like a yet-undiscovered 1.2.x thing. On Tue, Mar 17, 2015 at 4:21 PM, Eugen
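
The workaround described in this message can be sketched in PySpark roughly as follows. This is an illustration under assumptions, not code from the thread: the helper name `union_with_materialized` is hypothetical, and the original job was not necessarily written in Python. It relies only on the RDD methods `cache()`, `take()` and `union()`.

```python
def union_with_materialized(left, right):
    """Cache `left`, force a small job with take(1), then union with `right`.

    Hypothetical helper, not code from the thread. `left` and `right` are
    expected to be Spark RDDs (any object exposing cache/take/union works).
    """
    left.cache()   # marks the RDD for caching; lazy on its own
    left.take(1)   # runs a small job, so at least the first partition is computed
    return left.union(right)
```

With a live SparkContext `sc` (assumed here), this could be called as `union_with_materialized(sc.parallelize([1, 2]), sc.parallelize([3]))`. Note that `take(1)` may evaluate only part of the RDD; whether that is enough to avoid the hang is what the message reports, not something this sketch verifies.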

Task result deserialization error (1.1.0)

2015-01-20 Thread Dmitriy Lyubimov
Hi, I am getting a task result deserialization error (kryo is enabled). Is it some sort of `chill` registration issue at the front end? This is an application that lists spark as a maven dependency (so it gets the correct hadoop and chill dependencies in the classpath, i checked). Thanks in advance. 15/01/20 18:2

Re: Upgrade to Spark 1.1.0?

2014-10-19 Thread Dmitriy Lyubimov
Mahout context does not include _all_ possible transitive dependencies. It would not be lightning fast to take all legacy etc. dependencies. There's an "ignored" unit test that asserts context path correctness. You can "unignore" it and run it to verify it still works as expected. The reason it is set to

Re: Spark QL and protobuf schema

2014-08-25 Thread Dmitriy Lyubimov
or that branch, we'll only do that for major bug fixes at this > point. > > > On Thu, Aug 21, 2014 at 10:58 AM, Dmitriy Lyubimov > wrote: > >> ok i'll try. happen to do that a lot to other tools. >> >> So I am guessing you are saying if i wanted to do

Re: Spark QL and protobuf schema

2014-08-21 Thread Dmitriy Lyubimov
gs you can call the applySchema method on SparkContext. > > Would be great if you could contribute this back. > > > On Wed, Aug 20, 2014 at 5:57 PM, Dmitriy Lyubimov > wrote: > >> Hello, >> >> is there any known work to adapt protobuf schema to Spark QL

Spark QL and protobuf schema

2014-08-20 Thread Dmitriy Lyubimov
Hello, is there any known work to adapt protobuf schema to Spark QL data sourcing? If not, would it present interest to contribute one? thanks. -d

Re: MLLib : Math on Vector and Matrix

2014-07-03 Thread Dmitriy Lyubimov
On Wed, Jul 2, 2014 at 11:40 PM, Xiangrui Meng wrote: > Hi Dmitriy, > > It is sweet to have the bindings, but it is very easy to downgrade the > performance with them. The BLAS/LAPACK APIs have been there for more > than 20 years and they are still the top choice for high-performance > linear alg

Re: MLLib : Math on Vector and Matrix

2014-07-02 Thread Dmitriy Lyubimov
in my humble opinion Spark should've supported linalg a-la [1] before it even started dumping methodologies into mllib. [1] http://mahout.apache.org/users/sparkbindings/home.html On Wed, Jul 2, 2014 at 2:16 PM, Thunder Stumpges wrote: > Thanks. I always hate having to do stuff like this. It se

Re: Why Scala?

2014-05-29 Thread Dmitriy Lyubimov
There were a few known concerns about Scala, and some still are, but having been doing Scala professionally for over two years now, i learned to master and appreciate the advantages. Major concern IMO is Scala in a less-than-scrupulous corporate environment. First, Scala requires significantly more di

Re: How to use Mahout VectorWritable in Spark.

2014-05-15 Thread Dmitriy Lyubimov
Mahout now supports doing its distributed linalg natively on Spark, so the problem of loading sequence file input into Spark is already solved there (trunk, http://mahout.apache.org/users/sparkbindings/home.html, drmFromHDFS() call -- and then you can access the underlying rdd directly via the "rdd" matrix property

Re: How to use Mahout VectorWritable in Spark.

2014-05-15 Thread Dmitriy Lyubimov
PPS The shell/spark tutorial i've mentioned is actually being developed in MAHOUT-1542. As it stands, i believe it is now complete in its core. On Wed, May 14, 2014 at 5:48 PM, Dmitriy Lyubimov wrote: > PS spark shell with all proper imports are also supported natively in > Mah

Re: How to use Mahout VectorWritable in Spark.

2014-05-15 Thread Dmitriy Lyubimov
PS spark shell with all proper imports is also supported natively in Mahout (mahout spark-shell command). See M-1489 for specifics. There's also a tutorial somewhere but i suspect it has not yet been finished/published via a public link. Again, you need trunk to use spark shell there. On Wed, M

Re: Spark - ready for prime time?

2014-04-10 Thread Dmitriy Lyubimov
On Thu, Apr 10, 2014 at 9:24 AM, Andrew Ash wrote: > The biggest issue I've come across is that the cluster is somewhat > unstable when under memory pressure. Meaning that if you attempt to > persist an RDD that's too big for memory, even with MEMORY_AND_DISK, you'll > often still get OOMs. I h

Re: Multi master Spark

2014-04-09 Thread Dmitriy Lyubimov
> > > On Wed, Apr 9, 2014 at 3:26 PM, Dmitriy Lyubimov wrote: > >> The only way i know to do this is to use mesos with zookeepers. you >> specify zookeeper url as spark url that contains multiple zookeeper hosts. >> Multiple mesos masters are then elected thru zoo

Re: Multi master Spark

2014-04-09 Thread Dmitriy Lyubimov
The only way i know to do this is to use mesos with zookeeper. You specify, as the spark url, a zookeeper url that contains multiple zookeeper hosts. Multiple mesos masters are then elected thru zookeeper leader election until the current leader dies, at which point mesos will elect another master (if still l
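
The multi-host ZooKeeper URL mentioned in this message takes the form `mesos://zk://host1:port,host2:port/path` in Spark's Mesos documentation. A small sketch of assembling such a master URL is below; the function name, the host names and the `/mesos` znode default are assumptions made for illustration only.

```python
def mesos_zk_master_url(zk_hosts, znode="/mesos"):
    """Build a Spark master URL for a ZooKeeper-backed Mesos cluster.

    `zk_hosts` is a list of "host:port" strings. `znode` is the ZooKeeper
    path the Mesos masters register under ("/mesos" is an assumed default).
    """
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    return "mesos://zk://" + ",".join(zk_hosts) + znode


# Placeholder host names, for illustration only.
url = mesos_zk_master_url(["zk1:2181", "zk2:2181", "zk3:2181"])
print(url)
```

The resulting string would then be used as the Spark master, for example through `SparkConf().setMaster(url)` in PySpark.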