Couldn't we have a pip-installable "pyspark" package that just serves as a
shim to an existing Spark installation? Or it could even download the
latest Spark binary if SPARK_HOME isn't set during installation. Right now,
Spark doesn't play very well with the usual Python ecosystem. For example,
why
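
To make the shim idea concrete, here is a rough sketch of what such a package could do (the distribution URL, cache directory, and function names are placeholders I made up, not a concrete proposal):

# pyspark_shim.py -- hypothetical pip-installable shim around a Spark binary dist.
# If SPARK_HOME points at an existing installation we just reuse it; otherwise
# download a pre-built Spark tarball into a per-user cache and use the pyspark
# that ships inside it.

import glob
import os
import sys
import tarfile
from urllib.request import urlretrieve

# Placeholder URL: whichever pre-built package we decide to default to.
SPARK_DIST_URL = "https://archive.apache.org/dist/spark/.../spark-<version>-bin-<pkg>.tgz"
CACHE_DIR = os.path.expanduser("~/.cache/pyspark-shim")


def ensure_spark_home():
    """Return a usable SPARK_HOME, downloading a distribution if necessary."""
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home and os.path.isdir(spark_home):
        return spark_home

    if not os.path.isdir(CACHE_DIR):
        os.makedirs(CACHE_DIR)
    tarball = os.path.join(CACHE_DIR, os.path.basename(SPARK_DIST_URL))
    if not os.path.exists(tarball):
        urlretrieve(SPARK_DIST_URL, tarball)   # fetch the pre-built binary package

    unpacked = [d for d in glob.glob(os.path.join(CACHE_DIR, "spark-*"))
                if os.path.isdir(d)]
    if not unpacked:
        with tarfile.open(tarball) as tf:
            tf.extractall(CACHE_DIR)           # the tarball unpacks into one spark-* dir
        unpacked = [d for d in glob.glob(os.path.join(CACHE_DIR, "spark-*"))
                    if os.path.isdir(d)]
    os.environ["SPARK_HOME"] = unpacked[0]
    return unpacked[0]


def add_pyspark_to_path():
    """Put the bundled pyspark (and its py4j zip) on sys.path."""
    spark_home = ensure_spark_home()
    sys.path.insert(0, os.path.join(spark_home, "python"))
    for zip_path in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip")):
        sys.path.insert(0, zip_path)

On the user side that would just be "pip install pyspark" followed by an ordinary "import pyspark", with the shim either exposing a one-line call like add_pyspark_to_path() or doing the sys.path wiring itself on import.
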
Could we gracefully fall back to an in-tree Hadoop binary (e.g. 1.0.4)
in that case? I think many new Spark users are confused about why
Spark has anything to do with Hadoop; for example, I could see myself being
confused when the download page asks me to select a "package type". I
know that what I want is
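
A shim like the one sketched above could also paper over the "package type" question by picking a sensible default and letting the few users who actually care override it; something like this (the environment variable name and tarball suffixes are only illustrative):

# Hypothetical default for the Hadoop question: most users get a pre-built
# Hadoop 1 package; anyone who needs a different Hadoop client version can
# set an environment variable instead of puzzling over the download page.
import os

HADOOP_PROFILES = {
    "hadoop1": "bin-hadoop1",   # suffixes of the pre-built tarballs (illustrative)
    "hadoop2": "bin-hadoop2",
}

def default_package_name(spark_version):
    profile = os.environ.get("PYSPARK_HADOOP_PROFILE", "hadoop1")
    suffix = HADOOP_PROFILES.get(profile, HADOOP_PROFILES["hadoop1"])
    return "spark-%s-%s.tgz" % (spark_version, suffix)

# e.g. default_package_name("1.0.0") -> "spark-1.0.0-bin-hadoop1.tgz"
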
That would be awesome. I support this!
On Fri, Mar 21, 2014 at 7:28 PM, Michael Armbrust wrote:
> Hey Everyone,
>
> Here is a pretty major (but source-compatible) change we are considering
> making to the RDD API for 1.0. Java and Python APIs would remain the same,
> but users of Scala would lik