Re: Storing an action result in HDFS

2015-06-22 Thread Chris Gore
> Thanks for the quick reply and the welcome. I am trying to read a file from > hdfs and then write just the first line back to hdfs. > > I am calling first() on the RDD to get the first line. > > Sent from my iPhone > >> On Jun 22, 2015, at 7:42 PM, Chris Gore w
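
A minimal PySpark-style sketch of the approach described in this message, assuming an already-created SparkContext is passed in as `sc`. The function name `save_first_line` and both paths are hypothetical. Because `first()` is an action, it returns a plain string rather than an RDD, so the value is wrapped back into a one-element RDD before `saveAsTextFile` is called.

```python
def save_first_line(sc, src_path, dst_path):
    """Read src_path, keep only its first line, and write that line to dst_path.

    `sc` is assumed to be an existing pyspark SparkContext (not created here).
    """
    # first() is an action: it returns a plain Python string, not an RDD.
    first_line = sc.textFile(src_path).first()
    # Wrap the value in a single-partition RDD so it can be saved as text.
    sc.parallelize([first_line], 1).saveAsTextFile(dst_path)
    return first_line
```

Note that `saveAsTextFile` writes a directory at the destination path containing part files, not a single flat file.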

Re: Storing an action result in HDFS

2015-06-22 Thread Chris Gore
Hi Ravi, Welcome, you probably want RDD.saveAsTextFile("hdfs:///my_file") Chris > On Jun 22, 2015, at 5:28 PM, ravi tella wrote: > > > Hello All, > I am new to Spark. I have a very basic question. How do I write the output of > an action on an RDD to HDFS? > > Thanks in advance for the help.

Re: Compare LogisticRegression results using Mllib with those using other libraries (e.g. statsmodel)

2015-05-20 Thread Chris Gore
I tried running this data set as described with my own implementation of L2 regularized logistic regression using LBFGS to compare: https://github.com/cdgore/fitbox Intercept: -0.886745823033 Weights (['gre', 'gpa', 'rank']):[ 0.28862268 0.19402388 -0.36637964]

Re: Can Spark benefit from Hive-like partitions?

2015-01-26 Thread Chris Gore
Good to hear there will be partitioning support. I’ve had some success loading partitioned data specified with Unix globbing syntax, e.g.: sc.textFile("s3://bucket/directory/dt=2014-11-{2[4-9],30}T00-00-00") would load dates 2014-11-24 through 2014-11-30. Not an ideal solution, but it se
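
To see which directory names a pattern like this selects, here is a small pure-Python sketch (no Spark involved) that mimics the brace and character-class matching. `expand_braces` and `glob_matches` are hypothetical helper names, and Hadoop's own glob implementation may differ in edge cases.

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand {a,b} alternatives into a list of brace-free patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    expanded = []
    for alternative in match.group(1).split(","):
        candidate = pattern[:match.start()] + alternative + pattern[match.end():]
        expanded.extend(expand_braces(candidate))
    return expanded


def glob_matches(pattern, name):
    """Return True if name matches any brace-expanded form of pattern."""
    return any(fnmatch.fnmatchcase(name, p) for p in expand_braces(pattern))


pattern = "dt=2014-11-{2[4-9],30}T00-00-00"
# Candidate partition directory names for 2014-11-20 through 2014-11-30.
names = ["dt=2014-11-%02dT00-00-00" % day for day in range(20, 31)]
selected = [n for n in names if glob_matches(pattern, n)]
print(selected)
```

The brace group is expanded first, so `2[4-9]` covers the 24th through the 29th and the literal `30` adds the last day of the range.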

Re: MLLib sparse vector

2014-09-15 Thread Chris Gore
> val sv = Vectors.sparse(numProducts, productIds.map(x => (x, 1.0))) > > where numProducts should be the largest product id plus one. > > Best, > Xiangrui > > On Mon, Sep 15, 2014 at 12:46 PM, Chris Gore wrote: >> Hi Sameer, >> >> MLLib uses Breez
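
The sizing rule quoted above can be illustrated without Spark. This sketch builds the (size, indices, values) triple that such a sparse vector represents, with the size taken from the largest product id across all records. `vector_size` and `to_sparse` are hypothetical helpers, not part of MLlib, and the sample ids are made up for illustration.

```python
def vector_size(all_product_ids):
    """Vector length: the largest product id plus one (ids are 0-based)."""
    return max(all_product_ids) + 1


def to_sparse(size, product_ids):
    """Return (size, indices, values) with 1.0 at each distinct product id."""
    indices = sorted(set(product_ids))
    if indices and (indices[0] < 0 or indices[-1] >= size):
        raise ValueError("product id out of range for vector size %d" % size)
    return size, indices, [1.0] * len(indices)


# Made-up sample data: each inner list holds the product ids of one record.
baskets = [[7, 2, 5], [0, 7]]
num_products = vector_size(pid for basket in baskets for pid in basket)
vectors = [to_sparse(num_products, basket) for basket in baskets]
print(num_products, vectors)
```

Using one shared size for every record keeps all vectors the same length, which is what downstream linear algebra expects.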

Re: MLLib sparse vector

2014-09-15 Thread Chris Gore
Hi Sameer, MLLib uses Breeze’s vector format under the hood. You can use that. http://www.scalanlp.org/api/breeze/index.html#breeze.linalg.SparseVector For example: import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV} val numClasses = classes.distinct.count.toInt val

Re: Accessing neighboring elements in an RDD

2014-09-03 Thread Chris Gore
There is support for Spark in ElasticSearch’s Hadoop integration package. http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/spark.html Maybe you could split and insert all of your documents from Spark and then query for “MoreLikeThis” on the ElasticSearch index. I haven’t tried

Re: Error: No space left on device

2014-07-15 Thread Chris Gore
Hi Chris, I've encountered this error when running Spark’s ALS methods too. In my case, it was because I set spark.local.dir improperly, and every time there was a shuffle, it would spill many GB of data onto the local drive. What fixed it was setting it to use the /mnt directory, where a net

Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread Chris Gore
We'd love to see a Spark user group in Los Angeles and connect with others working with it here. Ping me if you're in the LA area and use Spark at your company ( ch...@retentionscience.com ). Chris Retention Science call: 734.272.3099 On Mar 31,