+1 (non-binding, doc and packaging issues aside)
Built from source, ran jobs and spark-shell against a pseudo-distributed
YARN cluster.
On Sun, Mar 8, 2015 at 2:42 PM, Krishna Sankar wrote:
> Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
> Distributions X ...
>
> May
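A minimal sketch of the kind of smoke test described in the +1 above, assuming
a Spark 1.3-era checkout and that HADOOP_CONF_DIR points at the
pseudo-distributed cluster's configuration; the path is illustrative, not
taken from the original message.

    # Run an interactive shell against a local (pseudo-distributed) YARN cluster.
    export HADOOP_CONF_DIR=/etc/hadoop/conf   # illustrative location
    ./bin/spark-shell --master yarn-client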
Hey Patrick,
Yes, I will open a JIRA tomorrow... For now my implementation is a basic
SSL implementation for the TransportServer and TransportClient... I will
write up the design and at the same time look at the Hadoop implementation
for possible improvements... Cheers!
Jeff
On Sun, Mar 8, 2015 at 5:51 PM,
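Since the design write-up is still pending, here is a minimal sketch of how
such an SSL layer could hook into a Netty pipeline of the kind the transport
module builds. addSslToPipeline and the use of SSLContext.getDefault are
illustrative assumptions, not the actual implementation.

    // Illustrative only: a real version would load a configured keystore
    // rather than use the JVM's default SSLContext.
    import javax.net.ssl.{SSLContext, SSLEngine}
    import io.netty.channel.socket.SocketChannel
    import io.netty.handler.ssl.SslHandler

    def addSslToPipeline(ch: SocketChannel, isClient: Boolean): Unit = {
      val engine: SSLEngine = SSLContext.getDefault.createSSLEngine()
      engine.setUseClientMode(isClient)
      // Encryption must wrap everything else, so the handler goes first.
      ch.pipeline().addFirst("ssl", new SslHandler(engine))
    }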
Hi Spark devs!
I'm writing regarding your GSoC 2015 project idea. I'm a graduate student
with experience in Python and discrete mathematics. I'm interested in
machine learning, and understand some of its basic concepts.
I was wondering if someone might be able to elaborate upon the goals for
Spar
Yeah, my concern is that people should get Apache Spark from *Apache*, not from
a vendor. It helps everyone use the latest features no matter where they are.
In the Hadoop distro case, Hadoop made all this effort to have standard APIs
(e.g. YARN), so it should be easy. But it is a problem if we'
I think that yes, longer term we want to have encryption of all
communicated data. However Jeff, can you open a JIRA to discuss the
design before opening a pull request (it's fine to link to a WIP
branch if you'd like)? I'd like to better understand the performance
and operational complexity of usi
I have already written most of the code, just finishing up the unit tests
right now...
Jeff
On Sun, Mar 8, 2015 at 5:39 PM, Andrew Ash wrote:
> I'm interested in seeing this data transfer occurring over encrypted
> communication channels as well. Many customers require that all network
> tran
I'm interested in seeing this data transfer occurring over encrypted
communication channels as well. Many customers require that all network
transfer occur encrypted to prevent the "soft underbelly" that's often
found inside a corporate network.
On Fri, Mar 6, 2015 at 4:20 PM, turp1twin wrote:
Yeah it's not much overhead, but here's an example of where it causes
a little issue.
I like that reasoning. However, the released builds don't track the
later versions of Hadoop that vendors would be distributing -- there's
no Hadoop 2.6 build for example. CDH4 is here, but not the
far-more-used
I think it's important to separate the goals from the implementation.
I agree with Matei on the goal - I think the goal needs to be to allow
people to download Apache Spark and use it with CDH, HDP, MapR,
whatever... This is the whole reason why HDFS and YARN have stable
APIs, so that other projec
Our goal is to let people use the latest Apache release even if vendors fall
behind or don't want to package everything, which is why we put out releases
for vendors' versions. It's fairly low overhead.
Matei
> On Mar 8, 2015, at 5:56 PM, Sean Owen wrote:
>
> Ah. I misunderstood that Matei w
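As a concrete illustration of the point about stable APIs: the standard build
already lets you compile the Apache release against a vendor's Hadoop, along
these lines (the CDH version string is only an example; the exact value
depends on the distribution).

    # Build Apache Spark against a vendor Hadoop via the stock Maven profiles.
    mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.3.0 -DskipTests clean package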
Ah. I misunderstood that Matei was referring to the Scala 2.11 tarball
at http://people.apache.org/~pwendell/spark-1.3.0-rc3/ and not the
Maven artifacts.
Patrick, I see you just commented on SPARK-5134 and will follow up
there. Sounds like this may turn out not to be a problem after all.
On binary tarball
Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
Distributions X ...
Maybe one option is to have a minimum basic set (which I know is what we
are discussing) and move the rest to spark-packages.org. There the vendors
can add the latest downloads - for example when 1.4 is r
We probably want to revisit the way we do binaries in general for
1.4+. IMO, something worth forking a separate thread for.
I've been hesitating to add new binaries because people
(understandably) complain if you ever stop packaging older ones, but
on the other hand the ASF has complained that we
Yeah, interesting question of what is the better default for the
single set of artifacts published to Maven. I think there's an
argument for Hadoop 2 and perhaps Hive for the 2.10 build too. Pros
and cons discussed more at
https://issues.apache.org/jira/browse/SPARK-5134
https://github.com/apache/
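On the Maven side, a downstream build can already pin whichever Hadoop client
it needs regardless of the default baked into the published artifacts; a
hypothetical sbt fragment, with version numbers purely illustrative:

    // Depend on the published Spark artifacts but override the transitive
    // hadoop-client to match the cluster's version.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.3.0",
      "org.apache.hadoop" % "hadoop-client" % "2.6.0"
    )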
+1
Tested it on Mac OS X.
One small issue I noticed is that the Scala 2.11 build is using Hadoop 1
without Hive, which is kind of weird because people are more likely to want
Hadoop 2 with Hive. So it would be good to publish a build for that
configuration instead. We can do it if we do a new RC
Can you paste the complete code?
Thanks
Best Regards
On Sat, Mar 7, 2015 at 2:25 AM, Ulanov, Alexander
wrote:
> Hi,
>
> I've implemented class MyClass in MLlib that does some operation on
> LabeledPoint. MyClass extends Serializable, so I can map this operation on
> data of RDD[LabeledPoint],
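The question is cut off above; a minimal reconstruction of the pattern being
described, with the body of the operation left as a placeholder since the
original code was not included. MyClass's transform and the applyToAll helper
are hypothetical names.

    // A Serializable class whose method is mapped over an RDD[LabeledPoint].
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    class MyClass extends Serializable {
      def transform(p: LabeledPoint): LabeledPoint =
        LabeledPoint(p.label, p.features)  // placeholder operation
    }

    def applyToAll(data: RDD[LabeledPoint]): RDD[LabeledPoint] = {
      val op = new MyClass
      data.map(op.transform)  // op ships to executors, hence Serializable
    }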