I'm very late to this party and I get hbase-spark... what's the
recommendation for pyspark + hbase? I realize this isn't necessarily a
concern of the spark project, but it'd be nice to at least document it here
with a very short and sweet response because I haven't found anything
useful in the wild
2, 2015 at 3:13 PM, Ignacio Zendejas wrote:
> I've run into an error when trying to create a dataframe. Here's the code:
>
> --
> from pyspark import StorageLevel
> from pyspark.sql import Row
>
> table = 'blah'
> ssc = HiveContext(sc)
>
> data = sc
I've run into an error when trying to create a dataframe. Here's the code:
--
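from pyspark import StorageLevel
from pyspark.sql import HiveContext, Row

table = 'blah'
# assumes `sc` is an existing SparkContext (e.g. the one the pyspark shell provides)
ssc = HiveContext(sc)

data = sc.textFile('s3://bucket/some.tsv')

def deserialize(s):
    # split one tab-separated line and convert the last field to a float
    p = s.strip().split('\t')
    p[-1] = float(p[-1])
    ret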
get a higher-level primitive (e.g. stochastic
> gradient descent) that you can plug some functions into, without worrying
> about the communication.
>
> Matei
>
> On August 13, 2014 at 11:10:02 AM, Ignacio Zendejas (
> ignacio.zendejas...@gmail.com) wrote:
>
> Has
uages for
>> ML-oriented programming", and that's why they went ahead with Python.
>> However, as I understand, very few people actually implement algorithms in
>> Python directly because of the sub-optimal performance. Most people
>> implement algorithms in other lan
Has anyone had a chance to look at this paper (with title in subject)?
http://www.cs.rice.edu/~lp6/comparison.pdf
Interesting that they chose to use Python alone. Do we know how much faster
Scala is vs. Python in general, if at all?
As with any and all benchmarks, I'm sure there are caveats, but
Here's the JIRA:
https://issues.apache.org/jira/browse/SPARK-1473
Future discussions should take place in its comments section.
Thanks.
On Fri, Apr 11, 2014 at 11:26 AM, Ignacio Zendejas <
ignacio.zendejas...@gmail.com> wrote:
> Thanks for the response, Xiangrui.
>
> And
on gain
> > computation, so it is easy to track the progress.
> >
> > The sparse vector support for NaiveBayes is already implemented in
> > branch-1.0 and master. You only need to provide an RDD of sparse
> > vectors (created from Vectors.sparse).
> >
> > MLUti
>
>
> The tail change looks good to me.
>
> For foldLeft, I agree with you that the old way is more readable (although
> less idiomatic Scala).
>
>
>
>
> On Thu, Apr 10, 2014 at 1:48 PM, Ignacio Zendejas <
> ignacio.zendejas...@gmail.com> wrote:
>
Hi, again -
As part of the next step, I'd like to make a more substantive contribution
and propose some initial work on feature selection, primarily as it relates
to text classification.
Specifically, I'd like to contribute very straightforward code to perform
information gain feature evaluation.
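
For readers unfamiliar with the term, here is a minimal sketch of information gain for a binary "term occurs in document" feature. It is plain Python rather than Spark code and is not the contribution proposed in this thread; the names `entropy`, `information_gain`, `docs` and `labels` and the toy data are assumptions made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'.

    docs is a list of sets of terms, labels the parallel list of class labels.
    IG = H(class) - [P(term) * H(class | term) + P(no term) * H(class | no term)]
    """
    total = len(labels)
    if total == 0:
        return 0.0
    with_term = [y for doc, y in zip(docs, labels) if term in doc]
    without_term = [y for doc, y in zip(docs, labels) if term not in doc]
    conditional = (
        (len(with_term) / total) * entropy(with_term)
        + (len(without_term) / total) * entropy(without_term)
    )
    return entropy(labels) - conditional


# Hypothetical toy corpus: each document is a set of terms with one class label.
docs = [{"spark", "fast"}, {"spark", "scala"}, {"python", "slow"}, {"python", "easy"}]
labels = ["jvm", "jvm", "py", "py"]

# Score every term and list them from most to least informative.
scores = {t: information_gain(docs, labels, t) for t in set().union(*docs)}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Terms that split the classes cleanly (here "spark" and "python") score highest, so keeping the top-ranked terms is one simple way to do feature selection for text classification.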
Hi, all -
First off, I want to say that I love Spark and am very excited about
MLBase. I'd love to contribute now that I have some time, but before I do
that I'd like to familiarize myself with the process.
In looking for a few projects and settling on one which I'll discuss in
another thread, I