Re: Signing releases with pwendell or release manager's key?

2017-09-17 Thread Patrick Wendell
Spark's release pipeline is automated and part of that automation includes securely injecting this key for the purpose of signing. I asked the ASF to provide a service account key several years ago but they suggested that we use a key attributed to an individual even if the process is automated. I

Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Patrick Wendell
ra care to make sure that can't happen, even if it > is an annoyance for the release managers. > > On Sun, Sep 17, 2017 at 10:12 PM, Patrick Wendell > wrote: > >> Spark's release pipeline is automated and part of that automation includes >> securely injecting this key fo

Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Patrick Wendell
ark repo. [1] https://github.com/apache/spark/tree/master/dev/create-release - Patrick On Mon, Sep 18, 2017 at 6:23 PM, Patrick Wendell wrote: > One thing we could do is modify the release tooling to allow the key to be > injected each time, thus allowing any RM to insert their own key at

Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Patrick Wendell
For the current release - maybe Holden could just sign the artifacts with her own key manually, if this is a concern. I don't think that would require modifying the release pipeline, except to just remove/ignore the existing signatures. - Patrick On Mon, Sep 18, 2017 at 7:56 PM, Reynold Xin wrot

Re: Signing releases with pwendell or release manager's key?

2017-09-18 Thread Patrick Wendell
SPARK-22055 & SPARK-22054 to port the > release scripts and allow injecting of the RM's key. > > On Mon, Sep 18, 2017 at 8:11 PM, Patrick Wendell > wrote: > >> For the current release - maybe Holden could just sign the artifacts with >> her own key manually, if

Re: sbt scala compiler crashes on spark-sql

2014-11-02 Thread Patrick Wendell
r vice versa. >> A >> clean rebuild can always solve this. >> >> On Mon, Nov 3, 2014 at 11:28 AM, Patrick Wendell >> wrote: >> >> > Does this happen if you clean and recompile? I've seen failures on and >> > off, but haven't been able

branch-1.2 has been cut

2014-11-03 Thread Patrick Wendell
Hi All, I've just cut the release branch for Spark 1.2, consistent with the end of the scheduled feature window for the release. New commits to master will need to be explicitly merged into branch-1.2 in order to be in the release. This begins the transition into a QA period for Spark 1.2, with

Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Patrick Wendell
I'm a +1 on this as well, I think it will be a useful model as we scale the project in the future and recognizes some informal process we have now. To respond to Sandy's comment: for changes that fall in between the component boundaries or are straightforward, my understanding of this model is you

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Patrick Wendell
I think new committers might or might not be maintainers (it would depend on the PMC vote). I don't think it would affect what you could merge, you can merge in any part of the source tree, you just need to get sign off if you want to touch a public API or make major architectural changes. Most pro

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Patrick Wendell
Hey Greg, Regarding subversion - I think the reference is to partial vs full committers here: https://subversion.apache.org/docs/community-guide/roles.html - Patrick On Thu, Nov 6, 2014 at 4:18 PM, Greg Stein wrote: > -1 (non-binding) > > This is an idea that runs COMPLETELY counter to the Apac

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Patrick Wendell
In fact, if you look at the subversion committer list, the majority of people here have commit access only for particular areas of the project: http://svn.apache.org/repos/asf/subversion/trunk/COMMITTERS On Thu, Nov 6, 2014 at 4:26 PM, Patrick Wendell wrote: > Hey Greg, > > Regarding s

Re: Should new YARN shuffle service work with "yarn-alpha"?

2014-11-07 Thread Patrick Wendell
I bet it doesn't work. +1 on isolating its inclusion to only the newer YARN APIs. - Patrick On Fri, Nov 7, 2014 at 11:43 PM, Sean Owen wrote: > I noticed that this doesn't compile: > > mvn -Pyarn-alpha -Phadoop-0.23 -Dhadoop.version=0.23.7 -DskipTests clean > package > > [error] warning: [opt

Re: Should new YARN shuffle service work with "yarn-alpha"?

2014-11-08 Thread Patrick Wendell
ts/spark/core/src/main/scala/org/apache/spark/storage/BlockManager.scala:214: > not found: type ExecutorShuffleInfo > [error] val shuffleConfig = new ExecutorShuffleInfo( > [error] > ... > > > More refactoring needed? Either to support YARN alpha as a separate > shuffle module, or

Re: Should new YARN shuffle service work with "yarn-alpha"?

2014-11-08 Thread Patrick Wendell
> makes yarn-alpha work. I'll run tests and open a quick JIRA / PR for > the change. > > On Sat, Nov 8, 2014 at 8:23 AM, Patrick Wendell wrote: >> This second error is something else. Maybe you are excluding >> netwo

Re: getting exception when trying to build spark from master

2014-11-10 Thread Patrick Wendell
I reverted that patch to see if it fixes it. On Mon, Nov 10, 2014 at 1:45 PM, Josh Rosen wrote: > It looks like the Jenkins maven builds are broken, too. Based on the > Jenkins logs, I think that this pull request may have broken things > (although I'm not sure why): > > https://github.com/apach

Re: JIRA + PR backlog

2014-11-11 Thread Patrick Wendell
I wonder if we should be linking to that dashboard somewhere from our official docs or the wiki... On Tue, Nov 11, 2014 at 12:23 PM, Nicholas Chammas wrote: > Yeah, kudos to Josh for putting that together. > > On Tue, Nov 11, 2014 at 3:26 AM, Yu Ishikawa > wrote: > >> Great jobs! >> I didn't kno

[NOTICE] [BUILD] Minor changes to Spark's build

2014-11-11 Thread Patrick Wendell
Hey All, I've just merged a patch that adds support for Scala 2.11 which will have some minor implications for the build. These are due to the complexities of supporting two versions of Scala in a single project. 1. The JDBC server will now require a special flag to build -Phive-thriftserver on t

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Patrick Wendell
wrote: > >> - Tip: when you rebase, IntelliJ will temporarily think things like the >> Kafka module are being removed. Say 'no' when it asks if you want to remove >> them. >> - Can we go straight to Scala 2.11.4? >> >> On Wed, Nov 12, 2014 at 5:47 AM, Patric

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Patrick Wendell
PM, Prashant Sharma wrote: > One thing we can do is print a helpful error and break. I don't know > about how this can be done, but since now I can write groovy inside maven > build so we have more control. (Yay!!) > > Prashant Sharma > > > > On Thu, Nov 13, 20

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Patrick Wendell
there are no mandatory profiles required to build Spark. I.e. > "mvn package" just works. It seems sad that we would need to break this. > > On Wed, Nov 12, 2014 at 10:59 PM, Patrick Wendell > wrote: >> >> I think printing an error that says "-Pscala-2.10

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-13 Thread Patrick Wendell
a-2.10 profile by default, unless you explicitly activate the 2.11 >> profile, in which case that property will be set and scala-2.10 will >> not activate. If you look at examples/pom.xml, that's the same >> strategy used to choose which hbase profile to activate. >> >> A

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-13 Thread Patrick Wendell
> That's true, but note the code I posted activates a profile based on > the lack of a property being set, which is why it works. Granted, I > did not test that if you activate the other profile, the one with the > property check will be disabled. Ah yeah good call - so then we'd trigger 2.11-vs

Has anyone else observed this build break?

2014-11-14 Thread Patrick Wendell
A recent patch broke clean builds for me, I am trying to see how widespread this issue is and whether we need to revert the patch. The error I've seen is this when building the examples project: spark-examples_2.10: Could not resolve dependencies for project org.apache.spark:spark-examples_2.10:j

Re: Has anyone else observed this build break?

2014-11-14 Thread Patrick Wendell
A workaround for this is identified here: http://dbknickerbocker.blogspot.com/2013/04/simple-fix-to-missing-toolsjar-in-jdk.html However, if this affects more users I'd prefer to just fix it properly in our build. On Fri, Nov 14, 2014 at 12:17 PM, Patrick Wendell wrote: > A rece

Re: Has anyone else observed this build break?

2014-11-14 Thread Patrick Wendell
;t test it, but maybe this can fix it? > > Thanks, > Hari > > > On Fri, Nov 14, 2014 at 12:21 PM, Patrick Wendell > wrote: >> >> A work around for this fix is identified here: >> >> http://dbknickerbocker.blogspot.com/2013/04/simple-fix-to-missing-toolsj

Re: Has anyone else observed this build break?

2014-11-15 Thread Patrick Wendell
"1.7.0_60" >> Java(TM) SE Runtime Environment (build 1.7.0_60-b19) >> Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode) >> >> Let me see if the problem can be solved upstream in HBase >> hbase-annotations module. >> >> Cheers >

Re: mvn or sbt for studying and developing Spark?

2014-11-16 Thread Patrick Wendell
Neither is strictly optimal, which is why we ended up supporting both. Our reference build for packaging is Maven, so you are less likely to run into unexpected dependency issues, etc. Many developers use sbt as well. It's somewhat a matter of religion, and the best thing might be to try both and see which you pr

[ANNOUNCE] Spark 1.2.0 Release Preview Posted

2014-11-17 Thread Patrick Wendell
Hi All, I've just posted a preview of the Spark 1.2.0 release for community regression testing. Issues reported now will get close attention, so please help us test! You can help by running an existing Spark 1.X workload on this and reporting any regressions. As we start voting, etc, the bar for

Re: [VOTE] Release Apache Spark 1.1.1 (RC1)

2014-11-17 Thread Patrick Wendell
Hey Kevin, If you are upgrading from 1.0.X to 1.1.X check out the upgrade notes here [1] - it could be that default changes caused a regression for your workload. Do you still see a regression if you restore the configuration changes? It's great to hear specifically about issues like this, so plea
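A minimal Scala sketch of that check, assuming the regression really does trace back to one of the defaults that changed in 1.1 (the compression-codec key below is purely an illustrative example of such a default):

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative only: pin one default that changed between 1.0.x and 1.1.x back to
    // its old value and re-run the workload. Substitute whichever setting the upgrade
    // notes call out for your case.
    val conf = new SparkConf()
      .setAppName("regression-check")
      .set("spark.io.compression.codec", "org.apache.spark.io.LZFCompressionCodec")

    val sc = new SparkContext(conf)
    // ... re-run the same 1.0.x workload here and compare against the 1.1.x defaults ...
    sc.stop()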

Apache infra github sync down

2014-11-18 Thread Patrick Wendell
Hey All, The Apache-->github mirroring is not working right now and hasn't been working for more than 24 hours. This means that pull requests will not appear as closed even though they have been merged. It also causes diffs to display incorrectly in some cases. If you'd like to follow progress by A

Build break

2014-11-19 Thread Patrick Wendell
Hey All, Just a heads up. I merged this patch last night which caused the Spark build to break: https://github.com/apache/spark/commit/397d3aae5bde96b01b4968dde048b6898bb6c914 The patch itself was fine and previously had passed on Jenkins. The issue was that other intermediate changes merged sin

Spark development with IntelliJ

2014-11-20 Thread Patrick Wendell
Hi All, I noticed people sometimes struggle to get Spark set up in IntelliJ. I'd like to maintain comprehensive instructions on our Wiki to make this seamless for future developers. Due to some nuances of our build, getting to the point where you can build + test every module from within the IDE i

Automated github closing of issues is not working

2014-11-21 Thread Patrick Wendell
After we merge pull requests in Spark they are closed via a special message we put in each commit description ("Closes #XXX"). This feature stopped working around 21 hours ago causing already-merged pull requests to display as open. I've contacted Github support with the issue. No word from them y

Re: How spark and hive integrate in long term?

2014-11-22 Thread Patrick Wendell
There are two distinct topics when it comes to Hive integration. Part of the 1.3 roadmap will likely be better defining the plan for Hive integration as Hive adds future versions. 1. Ability to interact with Hive metastores from different versions ==> I.e. if a user has a metastore, can Spark SQL
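A minimal sketch of that first topic, assuming a Hive-enabled Spark build and the Spark 1.2-era HiveContext API; the table name is illustrative:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    // The metastore-compatibility question is about this entry point: HiveContext reads
    // table definitions from whatever Hive metastore the cluster is configured with.
    val sc = new SparkContext(new SparkConf().setAppName("hive-metastore-demo"))
    val hiveContext = new HiveContext(sc)

    hiveContext.sql("SHOW TABLES").collect().foreach(println)
    hiveContext.sql("SELECT COUNT(*) FROM some_existing_table").collect().foreach(println)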

Re: Apache infra github sync down

2014-11-22 Thread Patrick Wendell
Hi All, Unfortunately this went back down again. I've opened a new JIRA to track it: https://issues.apache.org/jira/browse/INFRA-8688 - Patrick On Tue, Nov 18, 2014 at 10:24 PM, Patrick Wendell wrote: > Hey All, > > The Apache-->github mirroring is not working right no

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-23 Thread Patrick Wendell
+1 (binding). Don't see any evidence of regressions at this point. The issue reported by Hector was not related to this release. On Sun, Nov 23, 2014 at 9:50 AM, Debasish Das wrote: > -1 from me...same FetchFailed issue as what Hector saw... > > I am running Netflix dataset and dumping out recomm

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-23 Thread Patrick Wendell
Hey Stephen, Thanks for bringing this up. Technically when we call a release vote it needs to be on the exact commit that will be the final release. However, one thing I've thought of doing for a while would be to publish the maven artifacts using a version tag with $VERSION-rcX even if the underl

Re: Notes on writing complex spark applications

2014-11-23 Thread Patrick Wendell
Hey Evan, It might be nice to merge this into existing documentation. In particular, a lot of this could serve to update the current tuning section and programming guides. It could also work to paste this wholesale as a reference for Spark users, but in that case it's less likely to get updated w

[VOTE] Release Apache Spark 1.2.0 (RC1)

2014-11-28 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.0! The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=1056e9ec13203d0c51564265e94d77a054498fdb The release files, including signatures, digests, etc. c

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-11-29 Thread Patrick Wendell
rtunately you got some of the text here wrong, saying 1.1.0 > instead of 1.2.0. Not sure it will matter since there can well be another RC > after testing, but we should be careful. > > Matei > >> On Nov 28, 2014, at 9:16 PM, Patrick Wendell wrote: >> >> Pleas

Re: Trouble testing after updating to latest master

2014-11-29 Thread Patrick Wendell
Thanks for reporting this. One thing to try is to just do a git clean to make sure you have a totally clean working space ("git clean -fdx" will blow away any differences you have from the repo, of course only do that if you don't have other files around). Can you reproduce this if you just run "sb

Re: Trouble testing after updating to latest master

2014-11-29 Thread Patrick Wendell
d. Thanks, Patrick! > > > On 11/29/14, 10:52 PM, "Patrick Wendell" wrote: > >>Thanks for reporting this. One thing to try is to just do a git clean >>to make sure you have a totally clean working space ("git clean -fdx" >>will blow away any differen

Re: Spurious test failures, testing best practices

2014-11-30 Thread Patrick Wendell
Hey Ryan, A few more things here. You should feel free to send patches to Jenkins to test them, since this is the reference environment in which we regularly run tests. This is the normal workflow for most developers and we spend a lot of effort provisioning/maintaining a very large jenkins cluste

Re: Spurious test failures, testing best practices

2014-11-30 Thread Patrick Wendell
to the docs >> site. It should go out with the 1.2 release. >> >> Improvements to the documentation on building Spark belong here: >> https://github.com/apache/spark/blob/master/docs/building-spark.md >> >> If there are clear recommendations that come out of th

Re: Spurious test failures, testing best practices

2014-11-30 Thread Patrick Wendell
for /latest is orders of magnitude larger than for snapshot docs). However we could just add /snapshot and publish docs there. - Patrick On Sun, Nov 30, 2014 at 6:15 PM, Patrick Wendell wrote: > Hey Ryan, > > The existing JIRA also covers publishing nightly docs: > https://issues.apache.org

Re: Spurious test failures, testing best practices

2014-11-30 Thread Patrick Wendell
ull request for the branch or is there another interface we > can use to submit a build to Jenkins for testing? > > On 11/30/14, 6:49 PM, "Patrick Wendell" wrote: > >>Hey Ryan, >> >>A few more things here. You should feel free to send patches to >>Jenk

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-01 Thread Patrick Wendell
ng-running Spark standalone deployments, >> so it may be hard to reproduce. I'm going to work on a patch to add >> additional logging in order to help with debugging. >> >> I just wanted to give an early head's up about this issue and to get more >> eyes on

Re: keeping PR titles / descriptions up to date

2014-12-02 Thread Patrick Wendell
Also a note on this for committers - it's possible to re-word the title during merging, by just running "git commit -a --amend" before you push the PR. - Patrick On Tue, Dec 2, 2014 at 12:50 PM, Mridul Muralidharan wrote: > I second that ! > Would also be great if the JIRA was updated accordingl

Re: Spurious test failures, testing best practices

2014-12-02 Thread Patrick Wendell
Hey Ryan, What if you run a single "mvn install" to install all libraries locally - then can you "mvn compile -pl core"? I think this may be the only way to make it work. - Patrick On Tue, Dec 2, 2014 at 2:40 PM, Ryan Williams wrote: > Following on Mark's Maven examples, here is another related

Re: Ooyala Spark JobServer

2014-12-04 Thread Patrick Wendell
Hey Jun, The Ooyala server is being maintained by its original author (Evan Chan) here: https://github.com/spark-jobserver/spark-jobserver This is likely to stay as a standalone project for now, since it builds directly on Spark's public APIs. - Patrick On Wed, Dec 3, 2014 at 9:02 PM, Jun Fe

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Patrick Wendell
Thanks for flagging this. I reverted the relevant YARN fix in the Spark 1.2 release. We can try to debug this in master. On Thu, Dec 4, 2014 at 9:51 PM, Jianshi Huang wrote: > I created a ticket for this: > > https://issues.apache.org/jira/browse/SPARK-4757 > > > Jianshi > > On Fri, Dec 5, 2014 at

Re: zinc invocation examples

2014-12-05 Thread Patrick Wendell
One thing I created a JIRA for a while back was to have a similar script to "sbt/sbt" that transparently downloads Zinc, Scala, and Maven in a subdirectory of Spark and sets it up correctly. I.e. "build/mvn". Outside of brew for MacOS there aren't good Zinc packages, and it's a pain to figure out

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-05 Thread Patrick Wendell
; browsed the web UI. > > On Sat, Nov 29, 2014 at 2:16 PM, Patrick Wendell wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 1.2.0! >> >> The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1): >> >&g

Re: Is this a little bug in BlockTransferMessage ?

2014-12-09 Thread Patrick Wendell
Hey Nick, Thanks for bringing this up. I believe these Java tests are running in the sbt build right now; the issue is that this particular bug was flagged by the triggering of a runtime Java "assert" (not a normal JUnit test assertion) and those are not enabled in our sbt tests. It would be good
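A sketch of how such asserts could be switched on under sbt; this is an assumption about build configuration, not Spark's actual build definition:

    // build.sbt sketch: fork the test JVMs and pass -ea so runtime Java asserts like
    // the one that caught this bug also fire during sbt test runs.
    fork in Test := true
    javaOptions in Test += "-ea"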

Re: Build Spark 1.2.0-rc1 encounter exceptions when running HiveContext - Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

2014-12-10 Thread Patrick Wendell
Hi Andrew, It looks like somehow you are including jars from the upstream Apache Hive 0.13 project on your classpath. For Spark 1.2 Hive 0.13 support, we had to modify Hive to use a different version of Kryo that was compatible with Spark's Kryo version. https://github.com/pwendell/hive/commit/5b

[RESULT] [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-10 Thread Patrick Wendell
This vote is closed in favor of RC2. On Fri, Dec 5, 2014 at 2:02 PM, Patrick Wendell wrote: > Hey All, > > Thanks all for the continued testing! > > The issue I mentioned earlier SPARK-4498 was fixed earlier this week > (hat tip to Mark Hamstra who contributed the fix). > &

[VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-10 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.0! The tag to be voted on is v1.2.0-rc2 (commit a428c446e2): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=a428c446e23e628b746e0626cc02b7b3cadf588e The release files, including signatures, digests, etc.

Re: Is Apache JIRA down?

2014-12-10 Thread Patrick Wendell
I believe many apache services are/were down due to an outage. On Wed, Dec 10, 2014 at 5:24 PM, Nicholas Chammas wrote: > Nevermind, seems to be back up now. > > On Wed Dec 10 2014 at 7:46:30 PM Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> For example: https://issues.apache.org/ji

Re: zinc invocation examples

2014-12-12 Thread Patrick Wendell
> great to get your initial read on it. Per this thread I need to add in the > -scala-home call to zinc, but its close to ready for a PR. > > On 12/5/14, 2:10 PM, "Patrick Wendell" wrote: > >>One thing I created a JIRA for a while back was to have a similar >>

Test failures after Jenkins upgrade

2014-12-15 Thread Patrick Wendell
Hey All, It appears that a single test suite is failing after the jenkins upgrade: "org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDDSuite". My guess is the suite is not resilient in some way to differences in the environment (JVM, OS version, or something else). I'm going to disable the

Re: Test failures after Jenkins upgrade

2014-12-15 Thread Patrick Wendell
b.com/apache/spark/pull/3701 > > We might be close to fixing this via one of those PRs, so maybe we should > try using one of those instead? > > On December 15, 2014 at 10:51:46 AM, Patrick Wendell (pwend...@gmail.com) > wrote: > > Hey All, > > It appears that a sin

Re: Governance of the Jenkins whitelist

2014-12-15 Thread Patrick Wendell
Hey Andrew, The list of admins is maintained by the Amplab as part of their donation of this infrastructure. The reason why we need to have admins is that the pull request builder will fetch and then execute arbitrary user code, so we need to do a security audit before we can approve testing new p

Re: Scala's Jenkins setup looks neat

2014-12-16 Thread Patrick Wendell
Yeah you can do it - just make sure they understand it is a new feature so we're asking them to revisit it. They looked at it in the past and they concluded they couldn't give us access without giving us push access. - Patrick On Tue, Dec 16, 2014 at 6:06 PM, Reynold Xin wrote: > It's worth tryi

Re: RDD data flow

2014-12-16 Thread Patrick Wendell
> Why is that? Shouldn't all Partitions be Iterators? Clearly I'm missing > something. The Partition itself doesn't need to be an iterator - the iterator comes from the result of compute(partition). The Partition is just an identifier for that partition, not the data itself. Take a look at the sig
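A minimal custom-RDD sketch of that split between identifier and data; the class names here are hypothetical:

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    // A Partition is only an identifier for a slice of the data.
    class RangePartition(val index: Int, val start: Int, val end: Int) extends Partition

    class SimpleRangeRDD(sc: SparkContext, n: Int, numParts: Int) extends RDD[Int](sc, Nil) {

      override protected def getPartitions: Array[Partition] = {
        val step = math.max(1, n / numParts)
        (0 until numParts).map { i =>
          new RangePartition(i, i * step, if (i == numParts - 1) n else (i + 1) * step)
        }.toArray
      }

      // The data itself only exists as the Iterator returned here, one call per partition.
      override def compute(split: Partition, context: TaskContext): Iterator[Int] = {
        val p = split.asInstanceOf[RangePartition]
        (p.start until p.end).iterator
      }
    }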

[RESULT] [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-16 Thread Patrick Wendell
This vote has PASSED with 12 +1 votes (8 binding) and no 0 or -1 votes: +1: Matei Zaharia* Madhu Siddalingaiah Reynold Xin* Sandy Ryza Josh Rosen* Mark Hamstra* Denny Lee Tom Graves* GuoQiang Li Nick Pentreath* Sean McNamara* Patrick Wendell* 0: -1: I'll finalize and package this relea

Re: [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-16 Thread Patrick Wendell
ote: >> >> +1 >> >> Tested on OS X. >> >> On Wednesday, December 10, 2014, Patrick Wendell wrote: >> >>> Please vote on releasing the following candidate as Apache Spark version >>> 1.2.0! >>> >>> The tag to be voted o

[ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-16 Thread Patrick Wendell
Hey All, Due to the very high volume of contributions, we're switching to an automated process for generating release credits. This process relies on JIRA for categorizing contributions, so it's not possible for us to provide credits in the case where users submit pull requests with no associated

Re: Which committers care about Kafka?

2014-12-18 Thread Patrick Wendell
Hey Cody, Thanks for reaching out with this. The lead on streaming is TD - he is traveling this week though so I can respond a bit. To the high level point of whether Kafka is important - it definitely is. Something like 80% of Spark Streaming deployments (anecdotally) ingest data from Kafka. Also

Re: [RESULT] [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-18 Thread Patrick Wendell
Update: An Apache infrastructure issue prevented me from pushing this last night. The issue was resolved today and I should be able to push the final release artifacts tonight. On Tue, Dec 16, 2014 at 9:20 PM, Patrick Wendell wrote: > This vote has PASSED with 12 +1 votes (8 binding) and no 0

Announcing Spark 1.2!

2014-12-19 Thread Patrick Wendell
I'm happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is the third release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 172 developers and more than 1,000 commits! This release brings operational and performance improvements in Spark core

Re: Announcing Spark 1.2!

2014-12-19 Thread Patrick Wendell
2.0 and v1.2.0-rc2 are pointed to different commits in >> https://github.com/apache/spark/releases >> >> Best Regards, >> >> Shixiong Zhu >> >> 2014-12-19 16:52 GMT+08:00 Patrick Wendell : >>> >>> I'm happy to announce the availability of S

Re: Use mvn to build Spark 1.2.0 failed

2014-12-22 Thread Patrick Wendell
I also couldn't reproduce this issue. On Mon, Dec 22, 2014 at 2:24 AM, Sean Owen wrote: > I just tried the exact same command and do not see any error. Maybe > you can make sure you're starting from a clean extraction of the > distro, and check your environment. I'm on OSX, Maven 3.2, Java 8 but

Re: Announcing Spark Packages

2014-12-22 Thread Patrick Wendell
Xiangrui asked me to report that it's back and running :) On Mon, Dec 22, 2014 at 3:21 PM, peng wrote: > Me 2 :) > > > On 12/22/2014 06:14 PM, Andrew Ash wrote: > > Hi Xiangrui, > > That link is currently returning a 503 Over Quota error message. Would you > mind pinging back out when the page i

Re: More general submitJob API

2014-12-22 Thread Patrick Wendell
A SparkContext is thread-safe, so you can just have different threads that create their own RDDs and do actions, etc. - Patrick On Mon, Dec 22, 2014 at 4:15 PM, Alessandro Baretta wrote: > Andrew, > > Thanks, yes, this is what I wanted: basically just to start multiple jobs > concurrently in th
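A minimal sketch of that pattern, with hypothetical job bodies:

    import org.apache.spark.{SparkConf, SparkContext}

    // One shared SparkContext; each thread builds its own RDD and triggers an action,
    // so the resulting jobs run concurrently inside the same application.
    val sc = new SparkContext(new SparkConf().setAppName("concurrent-jobs").setMaster("local[4]"))

    val threads = (1 to 3).map { i =>
      new Thread(new Runnable {
        def run(): Unit = {
          val total = sc.parallelize(1 to 1000000).map(_.toLong * i).reduce(_ + _)
          println(s"job $i finished: $total")
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    sc.stop()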

Re: Announcing Spark Packages

2014-12-22 Thread Patrick Wendell
Hey Nick, I think Hitesh was just trying to be helpful and point out the policy - not necessarily saying there was an issue. We've taken a close look at this and I think we're in good shape here vis-a-vis this policy. - Patrick On Mon, Dec 22, 2014 at 5:29 PM, Nicholas Chammas wrote: > Hitesh, >

Re: [ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-22 Thread Patrick Wendell
thing missing we should add. - Patrick On Mon, Dec 22, 2014 at 6:17 PM, Nicholas Chammas wrote: > Does this include contributions made against the spark-ec2 repo? > > On Wed Dec 17 2014 at 12:29:19 AM Patrick Wendell > wrote: >> >> Hey All, >> >> Due to the

Re: [ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-22 Thread Patrick Wendell
s/Josh/Nick/ - sorry! On Mon, Dec 22, 2014 at 10:52 PM, Patrick Wendell wrote: > Hey Josh, > > We don't explicitly track contributions to spark-ec2 in the Apache > Spark release notes. The main reason is that usually updates to > spark-ec2 include a corresponding update t

Re: Problems with large dataset using collect() and broadcast()

2014-12-24 Thread Patrick Wendell
Hi Will, When you call collect() the item you are collecting needs to fit in memory on the driver. Is it possible your driver program does not have enough memory? - Patrick On Wed, Dec 24, 2014 at 9:34 PM, Will Yang wrote: > Hi all, > In my occasion, I have a huge HashMap[(Int, Long), (Double,
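A sketch of the usual alternatives, assuming the driver memory really is the bottleneck; bigRdd and the memory value are illustrative:

    import org.apache.spark.rdd.RDD

    // Keep the driver out of trouble by bounding or aggregating what comes back,
    // instead of materializing the whole RDD with collect().
    def summarize(bigRdd: RDD[(Int, Long)]): Unit = {
      val preview = bigRdd.take(100)   // only a bounded sample returns to the driver
      val rows    = bigRdd.count()     // the full aggregation happens on the executors
      println(s"rows=$rows, first=${preview.headOption}")
    }
    // If collect() really is required, the entire result must fit in the driver heap,
    // e.g. launch with spark-submit --driver-memory 8g (value is illustrative).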

Re: Question on saveAsTextFile with overwrite option

2014-12-24 Thread Patrick Wendell
Is it sufficient to set "spark.hadoop.validateOutputSpecs" to false? http://spark.apache.org/docs/latest/configuration.html - Patrick On Wed, Dec 24, 2014 at 10:52 PM, Shao, Saisai wrote: > Hi, > > > > We have such requirements to save RDD output to HDFS with saveAsTextFile > like API, but need
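A minimal sketch of using that setting; the output path is illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    // Disables Hadoop's output-spec check so saveAsTextFile can write into an existing
    // directory. Note this also removes the protection against accidentally clobbering data.
    val conf = new SparkConf()
      .setAppName("overwrite-demo")
      .set("spark.hadoop.validateOutputSpecs", "false")

    val sc = new SparkContext(conf)
    sc.parallelize(1 to 100).saveAsTextFile("hdfs:///tmp/output")  // path is illustrative
    sc.stop()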

Re: Question on saveAsTextFile with overwrite option

2014-12-24 Thread Patrick Wendell
ble as any alternatives. This is already pretty easy IMO. - Patrick On Wed, Dec 24, 2014 at 11:28 PM, Cheng, Hao wrote: > I am wondering if we can provide more friendly API, other than configuration > for this purpose. What do you think Patrick? > > Cheng Hao > > -Original

ANNOUNCE: New build script ./build/mvn

2014-12-27 Thread Patrick Wendell
Hi All, A consistent piece of feedback from Spark developers has been that the Maven build is very slow. Typesafe provides a tool called Zinc which improves Scala compilation speed substantially with Maven, but is difficult to install and configure, especially for platforms other than Mac OS. I'

Re: Spark driver main thread hanging after SQL insert

2015-01-02 Thread Patrick Wendell
Hi Alessandro, Can you create a JIRA for this rather than reporting it on the dev list? That's where we track issues like this. Thanks! - Patrick On Wed, Dec 31, 2014 at 8:48 PM, Alessandro Baretta wrote: > Here's what the console shows: > > 15/01/01 01:12:29 INFO scheduler.TaskSchedulerImpl:

Re: Spark UI history job duration is wrong

2015-01-05 Thread Patrick Wendell
Thanks for reporting this - it definitely sounds like a bug. Please open a JIRA for it. My guess is that we define the start or end time of the job based on the current time instead of looking at data encoded in the underlying event stream. That would cause it to not work properly when loading from

Re: Hang on Executor classloader lookup for the remote REPL URL classloader

2015-01-07 Thread Patrick Wendell
Hey Andrew, So the executors in Spark will fetch classes from the driver node for classes defined in the repl from an HTTP server on the driver. Is this happening in the context of a repl session? Also, is it deterministic or does it happen only periodically? The reason all of the other threads a

Re: When will spark support "push" style shuffle?

2015-01-07 Thread Patrick Wendell
This question is conflating a few different concepts. I think the main question is whether Spark will have a shuffle implementation that streams data rather than persisting it to disk/cache as a buffer. Spark currently decouples the shuffle write from the read using disk/OS cache as a buffer. The t

Re: Spark development with IntelliJ

2015-01-08 Thread Patrick Wendell
Nick - yes. Do you mind moving it? I should have put it in the "Contributing to Spark" page. On Thu, Jan 8, 2015 at 3:22 PM, Nicholas Chammas wrote: > Side question: Should this section > > in >

Re: Spark development with IntelliJ

2015-01-08 Thread Patrick Wendell
Actually I went ahead and did it. On Thu, Jan 8, 2015 at 10:25 PM, Patrick Wendell wrote: > Nick - yes. Do you mind moving it? I should have put it in the > "Contributing to Spark" page. > > On Thu, Jan 8, 2015 at 3:22 PM, Nicholas Chammas > wrote: >> Sid

Re: Job priority

2015-01-11 Thread Patrick Wendell
Priority scheduling isn't something we've supported in Spark; we've opted to support FIFO and Fair scheduling and asked users to try and fit these to the needs of their applications. In practice, what I've seen of priority schedulers, such as the Linux CPU scheduler, is that strict priority
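A sketch of how FIFO/Fair scheduling is typically wired up in lieu of priorities; the pool name is an assumption, and any weights or minShare values would live in a fairscheduler.xml file:

    import org.apache.spark.{SparkConf, SparkContext}

    // FAIR mode plus per-thread pools is the usual way to approximate priorities in Spark.
    val conf = new SparkConf()
      .setAppName("fair-scheduling-demo")
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    // Jobs submitted from this thread land in the "interactive" pool.
    sc.setLocalProperty("spark.scheduler.pool", "interactive")
    sc.parallelize(1 to 1000).count()

    // Later jobs from this thread fall back to the default pool.
    sc.setLocalProperty("spark.scheduler.pool", null)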

Fwd: [ NOTICE ] Service Downtime Notification - R/W git repos

2015-01-13 Thread Patrick Wendell
FYI our git repo may be down for a few hours today. -- Forwarded message -- From: "Tony Stevenson" Date: Jan 13, 2015 6:49 AM Subject: [ NOTICE ] Service Downtime Notification - R/W git repos To: Cc: Folks, Please note that on Thursday 15th at 20:00 UTC the Infrastructure team wi

Re: Bouncing Mails

2015-01-17 Thread Patrick Wendell
Akhil, Those are handled by ASF infrastructure, not anyone in the Spark project. So this list is not the appropriate place to ask for help. - Patrick On Sat, Jan 17, 2015 at 12:56 AM, Akhil Das wrote: > My mails to the mailing list are getting rejected, have opened a Jira issue, > can someone t

Semantics of LGTM

2015-01-17 Thread Patrick Wendell
Hey All, Just wanted to ping about a minor issue - but one that ends up having consequence given Spark's volume of reviews and commits. As much as possible, I think that we should try and gear towards "Google Style" LGTM on reviews. What I mean by this is that LGTM has the following semantics: "I

Re: Semantics of LGTM

2015-01-17 Thread Patrick Wendell
out it. The TM part acknowledges the >> judgment as a little more subjective. >> >> I think having some concise way to express both of these is useful. >> >> -Sandy >> >> > On Jan 17, 2015, at 5:40 PM, Patrick Wendell wrote: >> > >> >

Re: Semantics of LGTM

2015-01-19 Thread Patrick Wendell
o mean both "I would >> > > like >> > > to see this feature" and "this patch should be committed", although, >> > > at >> > > least in Hadoop, using +1 on JIRA (as opposed to, say, in a release >> > > vote) >&g

Re: Semantics of LGTM

2015-01-19 Thread Patrick Wendell
The wiki does not seem to be operational ATM, but I will do this when it is back up. On Mon, Jan 19, 2015 at 12:00 PM, Patrick Wendell wrote: > Okay - so given all this I was going to put the following on the wiki > tentatively: > > ## Reviewing Code > Community code re

Re: Standardized Spark dev environment

2015-01-20 Thread Patrick Wendell
To respond to the original suggestion by Nick. I always thought it would be useful to have a Docker image on which we run the tests and build releases, so that we could have a consistent environment that other packagers or people trying to exhaustively run Spark tests could replicate (or at least l

Re: Standardized Spark dev environment

2015-01-21 Thread Patrick Wendell
> If the goal is a reproducible test environment then I think that is what > Jenkins is. Granted you can only ask it for a test. But presumably you get > the same result if you start from the same VM image as Jenkins and run the > same steps. But the issue is when users can't reproduce Jenkins fai

Re: Standardized Spark dev environment

2015-01-21 Thread Patrick Wendell
r directly, this will at least serve as an up-to-date list of packages/versions they should try to install locally in whatever environment they have. - Patrick On Wed, Jan 21, 2015 at 5:42 AM, Will Benton wrote: > - Original Message ----- >> From: "Patrick Wendell" >> T

Upcoming Spark 1.2.1 RC

2015-01-21 Thread Patrick Wendell
Hey All, I am planning to cut a 1.2.1 RC soon and wanted to notify people. There are a handful of important fixes in the 1.2.1 branch (http://s.apache.org/Mpn) particularly for Spark SQL. There was also an issue publishing some of our artifacts with 1.2.0 and this release would fix it for downstr

Re: renaming SchemaRDD -> DataFrame

2015-01-26 Thread Patrick Wendell
One thing potentially not clear from this e-mail: there will be a 1:1 correspondence where you can get an RDD to/from a DataFrame. On Mon, Jan 26, 2015 at 2:18 PM, Reynold Xin wrote: > Hi, > > We are considering renaming SchemaRDD -> DataFrame in 1.3, and wanted to > get the community's opinion.
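A minimal sketch of that correspondence, assuming the DataFrame API shape that shipped in 1.3; the data is illustrative:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    case class Person(name: String, age: Int)

    val sc = new SparkContext(new SparkConf().setAppName("df-demo").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val peopleRdd = sc.parallelize(Seq(Person("alice", 30), Person("bob", 25)))
    val peopleDf  = peopleRdd.toDF()   // RDD -> DataFrame
    val rowsRdd   = peopleDf.rdd       // DataFrame -> RDD[Row]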

[VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-26 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.1! The tag to be voted on is v1.2.1-rc1 (commit 3e2d7d3): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3e2d7d310b76c293b9ac787f204e6880f508f6ec The release files, including signatures, digests, etc. can
