Re: RDD Partitions not distributed evenly to executors

2016-11-21 Thread Thunder Stumpges
Has anyone figured this out yet!? I have gone looking for this exact problem (Spark 1.6.1) and I cannot get my partitions to be distributed evenly across executors no matter what I've tried. It has been mentioned several other times in the user group as well as the dev group (as mentioned by Mike H

Re: Spark transformations

2016-09-12 Thread Thunder Stumpges
opy > in entirety or transitive dependency files as well. > Complex operations that take a column as a whole, instead > of each element in a row, are not possible as of now. > > Trying to find a few pointers to easily solve this. > > On Mon, Sep 12, 2016

Re: Spark transformations

2016-09-12 Thread Thunder Stumpges
Hi Janardhan, I have run into similar issues and asked similar questions. I also ran into many problems with private code when trying to write my own Model/Transformer/Estimator. (You might be able to find my question to the group regarding this; I can't really tell if my emails are getting throug

Re: Getting figures from spark streaming

2016-09-12 Thread Thunder Stumpges
Just a guess, but doesn't the '.apply(0)' at the end of each of your print statements take just the first one of the returned list? On Wed, Sep 7, 2016 at 12:36 AM Ashok Kumar wrote: > Any help on this warmly appreciated. > > > On Tuesday, 6 September 2016, 21:31, Ashok Kumar > wrote: > > > He
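
To illustrate the guess above: in Scala, calling apply(0) on a sequence is the same as indexing it with (0), so only the first element comes back. The original print statements are not shown in this snippet, so the sketch below uses a plain Seq as a hypothetical stand-in for whatever list was returned; the names ApplyZeroDemo, rows and first are made up for illustration.

```scala
object ApplyZeroDemo {
  // Hypothetical stand-in for the list returned by the original query.
  val rows: Seq[Int] = Seq(10, 20, 30)

  // apply(0) is just rows(0): it selects the first element and drops the rest.
  val first: Int = rows.apply(0)

  def main(args: Array[String]): Unit = {
    // Prints only the first element.
    println(first)
    // Prints every element, one per line.
    rows.foreach(println)
  }
}
```

If all of the figures are wanted, dropping the apply(0) and iterating over the whole result (as in the foreach above) is one way to get them.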

Re: Complex RDD operation as DataFrame UDF ?

2016-09-09 Thread Thunder Stumpges
Bump, check if this is actually going to the group? I can't see my recent posts on the archives: http://apache-spark-user-list.1001560.n3.nabble.com/ Is there a reason it would not show up here? Thanks! On Tue, Sep 6, 2016 at 11:28 AM Thunder Stumpges wrote: > Hi guys, Spark 1.6.1 her

Complex RDD operation as DataFrame UDF ?

2016-09-06 Thread Thunder Stumpges
Hi guys, Spark 1.6.1 here. I am trying to "DataFrame-ize" a complex function I have that currently operates on a DataSet, and returns another DataSet with a new "column" added to it. I'm trying to fit this into the new ML "Model" format where I can receive a DataFrame, ensure the input column exis

Re: Coding in the Spark ml "ecosystem" why is everything private?!

2016-08-29 Thread Thunder Stumpges
wn from a library. > > If there's a clear opportunity to expose something cleanly you can > bring it up for discussion. But it's never just a matter of making > something public. Making it public means committing others' time to > supporting it as-is for years. It would h

Coding in the Spark ml "ecosystem" why is everything private?!

2016-08-29 Thread Thunder Stumpges
Hi all, I'm not sure if this belongs here in users or over in dev as I guess it's somewhere in between. We have been starting to implement some machine learning pipelines, and it seemed from the documentation that Spark had a fairly well thought-out platform (see: http://spark.apache.org/docs/1.6.

Re: MLLib : Math on Vector and Matrix

2014-07-02 Thread Thunder Stumpges
k around it anyway... Oh well. Thunder On Wed, Jul 2, 2014 at 2:05 PM, Koert Kuipers wrote: > i did the second option: re-implemented .toBreeze as .breeze using pimp > classes > > > On Wed, Jul 2, 2014 at 5:00 PM, Thunder Stumpges < > thunder.stump...@gmail.com> wrote:
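
The approach mentioned in the quoted reply (re-implementing .toBreeze as .breeze with a "pimp my library" wrapper) might look roughly like the sketch below. This is not the poster's actual code: the names BreezeConversions and RichMLlibVector are hypothetical, it assumes spark-mllib and Breeze are on the classpath, and it always copies into a dense Breeze vector, which is wasteful for sparse input.

```scala
import breeze.linalg.{DenseVector => BDV}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object BreezeConversions {
  // Hypothetical wrapper: adds a public .breeze method to MLlib vectors,
  // standing in for the private toBreeze. Always produces a dense copy.
  implicit class RichMLlibVector(val v: Vector) extends AnyVal {
    def breeze: BDV[Double] = new BDV[Double](v.toArray)
  }
}

object BreezeConversionsDemo {
  import BreezeConversions._

  def main(args: Array[String]): Unit = {
    val a = Vectors.dense(1.0, 2.0, 3.0)
    val b = Vectors.dense(4.0, 5.0, 6.0)

    // Element-wise addition done in Breeze, then wrapped back into MLlib.
    val sum = a.breeze + b.breeze
    val back: Vector = Vectors.dense(sum.toArray)
    println(back)
  }
}
```

Importing BreezeConversions._ wherever vector math is needed keeps the conversion in one place, at the cost of an array copy on each call.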

MLLib : Math on Vector and Matrix

2014-07-02 Thread Thunder Stumpges
I am upgrading from Spark 0.9.0 to 1.0 and I had a pretty good amount of code working with internals of MLLib. One of the big changes was the move from the old jblas.Matrix to the Vector/Matrix classes included in MLLib. However I don't see how we're supposed to use them for ANYTHING other than a