+1 (non-binding, doc and packaging issues aside)
Built from source, ran jobs and spark-shell against a pseudo-distributed
YARN cluster.
On Sun, Mar 8, 2015 at 2:42 PM, Krishna Sankar wrote:
> Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
> Distributions X ...
>
> May
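A minimal sketch of the kind of smoke test described in the +1 above, assuming
a Spark 1.3-era checkout and that HADOOP_CONF_DIR points at the
pseudo-distributed cluster's configuration; the path is illustrative, not
taken from the original message.

    # Run an interactive shell against a local (pseudo-distributed) YARN cluster.
    export HADOOP_CONF_DIR=/etc/hadoop/conf   # illustrative location
    ./bin/spark-shell --master yarn-client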
Hey Patrick,
Yes, I will open a JIRA tomorrow... For now my implementation is a basic
SSL implementation for the TransportServer and TransportClient... I will
write up the design and at the same time look at the Hadoop implementation
for possible improvements... Cheers!
Jeff
On Sun, Mar 8, 2015 at 5:51 PM,
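Since the design write-up is still pending, here is a minimal sketch of how
such an SSL layer could hook into a Netty pipeline of the kind the transport
module builds. addSslToPipeline and the use of SSLContext.getDefault are
illustrative assumptions, not the actual implementation.

    // Illustrative only: a real version would load a configured keystore
    // rather than use the JVM's default SSLContext.
    import javax.net.ssl.{SSLContext, SSLEngine}
    import io.netty.channel.socket.SocketChannel
    import io.netty.handler.ssl.SslHandler

    def addSslToPipeline(ch: SocketChannel, isClient: Boolean): Unit = {
      val engine: SSLEngine = SSLContext.getDefault.createSSLEngine()
      engine.setUseClientMode(isClient)
      // Encryption must wrap everything else, so the handler goes first.
      ch.pipeline().addFirst("ssl", new SslHandler(engine))
    }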
Hi Spark devs!
I'm writing regarding your GSoC 2015 project idea. I'm a graduate student
with experience in Python and discrete mathematics. I'm interested in
machine learning, and understand some of its basic concepts.
I was wondering if someone might be able to elaborate upon the goals for
Spar
Yeah, my concern is that people should get Apache Spark from *Apache*, not from
a vendor. It helps everyone use the latest features no matter where they are.
In the Hadoop distro case, Hadoop made all this effort to have standard APIs
(e.g. YARN), so it should be easy. But it is a problem if we'
I think that yes, longer term we want to have encryption of all
communicated data. However Jeff, can you open a JIRA to discuss the
design before opening a pull request (it's fine to link to a WIP
branch if you'd like)? I'd like to better understand the performance
and operational complexity of usi
I have already written most of the code, just finishing up the unit tests
right now...
Jeff
On Sun, Mar 8, 2015 at 5:39 PM, Andrew Ash wrote:
> I'm interested in seeing this data transfer occurring over encrypted
> communication channels as well. Many customers require that all network
> tran
I'm interested in seeing this data transfer occurring over encrypted
communication channels as well. Many customers require that all network
transfer occur encrypted to prevent the "soft underbelly" that's often
found inside a corporate network.
On Fri, Mar 6, 2015 at 4:20 PM, turp1twin wrote:
Yeah it's not much overhead, but here's an example of where it causes
a little issue.
I like that reasoning. However, the released builds don't track the
later versions of Hadoop that vendors would be distributing -- there's
no Hadoop 2.6 build for example. CDH4 is here, but not the
far-more-used
I think it's important to separate the goals from the implementation.
I agree with Matei on the goal - I think the goal needs to be to allow
people to download Apache Spark and use it with CDH, HDP, MapR,
whatever... This is the whole reason why HDFS and YARN have stable
APIs, so that other projec
Our goal is to let people use the latest Apache release even if vendors fall
behind or don't want to package everything, which is why we put out releases
for vendors' versions. It's fairly low overhead.
Matei
> On Mar 8, 2015, at 5:56 PM, Sean Owen wrote:
>
> Ah. I misunderstood that Matei w
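As a concrete illustration of the point about stable APIs: the standard build
already lets you compile the Apache release against a vendor's Hadoop, along
these lines (the CDH version string is only an example; the exact value
depends on the distribution).

    # Build Apache Spark against a vendor Hadoop via the stock Maven profiles.
    mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.3.0 -DskipTests clean package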
Ah. I misunderstood that Matei was referring to the Scala 2.11 tarball
at http://people.apache.org/~pwendell/spark-1.3.0-rc3/ and not the
Maven artifacts.
Patrick, I see you just commented on SPARK-5134 and will follow up
there. Sounds like this may turn out not to be a problem after all.
On binary tarball
Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
Distributions X ...
Maybe one option is to have a minimum basic set (which I know is what we
are discussing) and move the rest to spark-packages.org. There the vendors
can add the latest downloads - for example when 1.4 is r
We probably want to revisit the way we do binaries in general for
1.4+. IMO, something worth forking a separate thread for.
I've been hesitating to add new binaries because people
(understandably) complain if you ever stop packaging older ones, but
on the other hand the ASF has complained that we
Yeah, interesting question of what is the better default for the
single set of artifacts published to Maven. I think there's an
argument for Hadoop 2 and perhaps Hive for the 2.10 build too. Pros
and cons discussed more at
https://issues.apache.org/jira/browse/SPARK-5134
https://github.com/apache/
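On the Maven side, a downstream build can already pin whichever Hadoop client
it needs regardless of the default baked into the published artifacts; a
hypothetical sbt fragment, with version numbers purely illustrative:

    // Depend on the published Spark artifacts but override the transitive
    // hadoop-client to match the cluster's version.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.3.0",
      "org.apache.hadoop" % "hadoop-client" % "2.6.0"
    )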
+1
Tested it on Mac OS X.
One small issue I noticed is that the Scala 2.11 build is using Hadoop 1
without Hive, which is kind of weird because people are more likely to want
Hadoop 2 with Hive. So it would be good to publish a build for that
configuration instead. We can do it if we do a new RC
Can you paste the complete code?
Thanks
Best Regards
On Sat, Mar 7, 2015 at 2:25 AM, Ulanov, Alexander
wrote:
> Hi,
>
> I've implemented class MyClass in MLlib that does some operation on
> LabeledPoint. MyClass extends Serializable, so I can map this operation on
> data of RDD[LabeledPoint],
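The question is cut off above; a minimal reconstruction of the pattern being
described, with the body of the operation left as a placeholder since the
original code was not included. MyClass's transform and the applyToAll helper
are hypothetical names.

    // A Serializable class whose method is mapped over an RDD[LabeledPoint].
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    class MyClass extends Serializable {
      def transform(p: LabeledPoint): LabeledPoint =
        LabeledPoint(p.label, p.features)  // placeholder operation
    }

    def applyToAll(data: RDD[LabeledPoint]): RDD[LabeledPoint] = {
      val op = new MyClass
      data.map(op.transform)  // op ships to executors, hence Serializable
    }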