Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-28 Thread Marcelo Vanzin
(Does anybody know what's the deal with all the .invalid e-mail addresses?) Anyway. ASF has voting rules, and some things like releases follow specific rules: https://www.apache.org/foundation/voting.html#ReleaseVotes So, for releases, ultimately, the only votes that "count" towards the final tally a
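The release-vote rule referenced here is restated in the later vote threads as "passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes". It can be sketched as a small tally function. This is a minimal illustration, not ASF tooling: the `Vote` record and `release_vote_passes` are hypothetical names, each voter's latest vote is assumed to replace any earlier one, and only binding (PMC) votes are counted, per the ASF voting page linked above. Releases cannot be vetoed, so a binding -1 only counts against the majority.

```python
from typing import Dict, List, NamedTuple


class Vote(NamedTuple):
    """A single vote on a release candidate (hypothetical record type)."""
    voter: str
    value: int      # +1, 0 or -1
    binding: bool   # True if the voter is a PMC member


def release_vote_passes(votes: List[Vote]) -> bool:
    """Sketch of the ASF release rule: at least three binding +1 votes
    and more binding +1 votes than binding -1 votes."""
    latest: Dict[str, Vote] = {}
    for vote in votes:
        latest[vote.voter] = vote  # a changed vote replaces the earlier one
    binding = [v for v in latest.values() if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    return plus >= 3 and plus > minus
```

A +0, or any non-binding vote, counts toward neither side; it is feedback for the release manager rather than part of the tally.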

Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf

2019-04-05 Thread Marcelo Vanzin
The hadoop-3 profile doesn't really work yet, not even on master. That's being worked on still. On Fri, Apr 5, 2019 at 10:53 AM akirillov wrote: > > Hi there! I'm trying to run Spark unit tests with the following profiles: > > And 'core' module fails with the following test failing with > NoClass

Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf

2019-04-05 Thread Marcelo Vanzin
endencies in the classpath, is that correct? > > On Fri, Apr 5, 2019 at 10:57 AM Marcelo Vanzin wrote: >> >> The hadoop-3 profile doesn't really work yet, not even on master. >> That's being worked on still. >> >> On Fri, Apr 5, 2019 at 10:53 AM akirill

Re: [VOTE] Release Apache Spark 2.4.4 (RC3)

2019-08-28 Thread Marcelo Vanzin
+1 On Tue, Aug 27, 2019 at 4:06 PM Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.4.4. > > The vote is open until August 30th 5PM PST and passes if a majority +1 PMC > votes are cast, with a minimum of 3 +1 votes. > > [ ] +1 Release this pa

Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-28 Thread Marcelo Vanzin
Just noticed something before I started to run some tests. The output of "spark-submit --version" is a little weird, in that it's missing information (see end of e-mail). Personally I don't think a lot of that output is super useful (like "Compiled by" or the repo URL), but the branch and revision

Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-28 Thread Marcelo Vanzin
(Ah, and the 2.4 RC has the same issue.) On Wed, Aug 28, 2019 at 2:23 PM Marcelo Vanzin wrote: > > Just noticed something before I started to run some tests. The output > of "spark-submit --version" is a little weird, in that it's missing > information (see end of e-ma

Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-28 Thread Marcelo Vanzin
+1 On Mon, Aug 26, 2019 at 1:28 PM Kazuaki Ishizaki wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.3.4. > > The vote is open until August 29th 2PM PST and passes if a majority +1 PMC > votes are cast, with > a minimum of 3 +1 votes. > > [ ] +1 Release th

dev/merge_spark_pr.py broken on python 2

2019-11-08 Thread Marcelo Vanzin
Hey all, Something broke that script when running with python 2. I know we want to deprecate python 2, but in that case, scripts should at least be changed to use "python3" in the shebang line... -- Marcelo

Re: dev/merge_spark_pr.py broken on python 2

2019-11-08 Thread Marcelo Vanzin
hange was on Oct 1, and should have actually helped it > still work with Python 2: > https://github.com/apache/spark/commit/2ec3265ae76fc1e136e44c240c476ce572b679df#diff-c321b6c82ebb21d8fd225abea9b7b74c > > Hasn't otherwise changed in a while. What's the error? > > On Fr

Re: dev/merge_spark_pr.py broken on python 2

2019-11-08 Thread Marcelo Vanzin
d 'fix' the > author name if that's the case, or just use python 3. > > On Fri, Nov 8, 2019 at 12:20 PM Marcelo Vanzin wrote: > > > > Something related to non-ASCII characters. Worked fine with python 3. > > > > git branch -D PR_TOOL_MERGE_PR_26426_MASTE
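The suggestion to make the interpreter requirement explicit could look like the following: a `python3` shebang plus a fail-fast version check, so that running the script under Python 2 produces a clear message instead of an encoding error on a non-ASCII author name. This is a minimal sketch; `require_python3` and `format_author` are hypothetical helpers, not functions from `dev/merge_spark_pr.py`. The code deliberately avoids type annotations and non-ASCII source literals so that Python 2 can still parse the file and reach the version check.

```python
#!/usr/bin/env python3
import sys


def require_python3():
    """Exit early with a readable message when run under Python 2."""
    if sys.version_info[0] < 3:
        sys.exit("This script requires Python 3; run it with python3.")


def format_author(name, email):
    """Build a git-style 'Name <email>' string.

    In Python 3, str is Unicode, so non-ASCII names need no special handling.
    """
    return "{} <{}>".format(name, email)


if __name__ == "__main__":
    require_python3()
    # "\u00eb" keeps the source ASCII-only while producing a non-ASCII name.
    print(format_author("Zo\u00eb Example", "zoe@example.com"))
```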

Re: Do we need to finally update Guava?

2019-12-16 Thread Marcelo Vanzin
Great that Hadoop has done it (which, btw, probably means that Spark won't work with that version of Hadoop yet), but Hive also depends on Guava, and last time I tried, even Hive 3.x did not work with Guava 27. (Newer Hadoop versions also have a new artifact that shades a lot of dependencies, whic

Jenkins looks hosed

2019-12-23 Thread Marcelo Vanzin
Just in the off-chance that someone with admin access to the Jenkins servers is around this week... they seem to be in a pretty unhappy state, I can't even load the UI. FYI in case you're waiting for your PR tests to finish (or even start running). -- Marcelo

Re: Jenkins looks hosed

2019-12-23 Thread Marcelo Vanzin
3, 2019 at 12:23 PM Shane Knapp wrote: > > > > > > checking it now. > > > > > > On Mon, Dec 23, 2019 at 11:27 AM Marcelo Vanzin > > > wrote: > > > > > > > > Just in the off-chance that someone with admin access to the Jenkins > > >

Re: Keytab, Proxy User & Principal

2020-03-12 Thread Marcelo Vanzin
. But frankly this feels more like something better taken care of in Livy (e.g. by using KRB5CCNAME when running spark-submit). -- Marcelo Vanzin van...@gmail.com "Life's too short to drink cheap beer"

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-10 Thread Marcelo Vanzin
sn't fixed? > == > In order to make timely releases, we will typically not hold the > release unless the bug in question is a regression from the previous > release. That being said, if there is something which is a regression > that has not been correctly targeted please ping me or a committer to > help target the issue. > > > Note: I fully expect this RC to fail. > > > > -- Marcelo Vanzin van...@gmail.com "Life's too short to drink cheap beer"

Re: [VOTE] Decommissioning SPIP

2020-07-01 Thread Marcelo Vanzin
; is at https://www.apache.org/foundation/voting.html. > > Please vote before July 6th at noon: > > [ ] +1: Accept the proposal as an official SPIP > [ ] +0 > [ ] -1: I don't think this is a good idea because ... > > I will start the voting off with a +1 f

Re: [VOTE] Spark 2.1.2 (RC1)

2017-09-21 Thread Marcelo Vanzin
While you're at it, one thing that needs to be done is create a 2.1.3 version on JIRA. Not sure if you have enough permissions to do that. Fixes after an RC should use the new version, and if you create a new RC, you'll need to go and backdate the patches that went into the new RC. On Mon, Sep 18

Re: [VOTE] Spark 2.1.2 (RC4)

2017-10-03 Thread Marcelo Vanzin
Maybe you're running as root (or the admin account on your OS)? On Tue, Oct 3, 2017 at 12:12 PM, Nick Pentreath wrote: > Hmm I'm consistently getting this error in core tests: > > - SPARK-3697: ignore directories that cannot be read. *** FAILED *** > 2 was not equal to 1 (FsHistoryProviderSuite
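Running as root would plausibly explain that failure: a test that expects a directory to be unreadable gets a different count because root bypasses permission bits and can still list it. A minimal POSIX-only sketch of how a test could detect this and skip instead of failing; `permissions_are_enforced` is a hypothetical helper and is not part of Spark's (Scala) test suite.

```python
import os
import stat
import tempfile


def permissions_are_enforced() -> bool:
    """Return True if a mode-000 directory is actually unreadable for this user.

    When running as root (or an equivalent admin account), permission bits
    are bypassed and this returns False, so permission-based tests should
    be skipped rather than reported as failures.
    """
    base = tempfile.mkdtemp()
    locked = os.path.join(base, "locked")
    os.mkdir(locked)
    os.chmod(locked, 0)
    try:
        os.listdir(locked)
        return False  # listing succeeded: permissions are not enforced
    except PermissionError:
        return True   # normal, unprivileged behaviour
    finally:
        # Restore permissions so the temporary directories can be removed.
        os.chmod(locked, stat.S_IRWXU)
        os.rmdir(locked)
        os.rmdir(base)
```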

Re: Set spark.*.retained* configs to 0 when the UI is disabled?

2017-10-13 Thread Marcelo Vanzin
There are some APIs (SparkStatusTracker) that expose job and stage data even when the UI is disabled. I don't think tasks, or SQL stuff, are exposed without the UI though, and maybe the SQL listener doesn't even need to be installed in that case. (Similar for other listeners that don't do anything

Re: Set spark.*.retained* configs to 0 when the UI is disabled?

2017-10-13 Thread Marcelo Vanzin
On Fri, Oct 13, 2017 at 12:49 PM, Craig Ingram wrote: > Are you referring to SPARK-20421 > > and SPARK-18085 ? If I > can lend a hand in this, just let me know. Yes, I'm referring to that pr

Re: [VOTE] Spark 2.2.1 (RC1)

2017-11-17 Thread Marcelo Vanzin
This is https://issues.apache.org/jira/browse/SPARK-20201. On Fri, Nov 17, 2017 at 8:51 AM, Felix Cheung wrote: > I wasn’t able to test this out. > > Is anyone else seeing this error? I see a few JVM fixes and getting back > ported, are they related to this? > > This issue seems important to hold

Re: Rolling policy in Spark event logs for long living streaming applications

2017-12-01 Thread Marcelo Vanzin
There's really no current solution to this. There's a brief discussion about it on SPARK-12140. Here we recommend people disable event logs for streaming apps, as sub-optimal as that might be... On Fri, Dec 1, 2017 at 3:27 AM, kankalapti omkar naidu wrote: > Dear spark developers, > > I did not
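Following that recommendation amounts to setting `spark.eventLog.enabled` to `false` for the streaming application. A small sketch that assembles a `spark-submit` command line with that setting: `build_submit_command`, the jar name and the main class are hypothetical placeholders, while `--class`, `--conf` and `spark.eventLog.enabled` are standard spark-submit options and configuration.

```python
from typing import Dict, List, Optional


def build_submit_command(app: str, main_class: Optional[str] = None,
                         conf: Optional[Dict[str, str]] = None) -> List[str]:
    """Assemble a spark-submit argument list (hypothetical helper)."""
    cmd = ["spark-submit"]
    if main_class:
        cmd += ["--class", main_class]
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", "{}={}".format(key, value)]
    cmd.append(app)
    return cmd


# Long-running streaming job: skip event logging, since the log file would
# otherwise grow without bound (there was no rolling event-log support at
# the time; see the SPARK-12140 discussion).
streaming_cmd = build_submit_command(
    "my-streaming-app.jar",                  # hypothetical application jar
    main_class="com.example.StreamingApp",   # hypothetical main class
    conf={"spark.eventLog.enabled": "false"},
)
```

The trade-off is the one stated above: the history server will have nothing to show for such applications.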

Kubernetes backend and docker images

2018-01-05 Thread Marcelo Vanzin
Hey all, especially those working on the k8s stuff. Currently we have 3 docker images that need to be built and provided by the user when starting a Spark app: driver, executor, and init container. When the initial review went by, I asked why we need 3, and I was told that's because they have

Re: Kubernetes backend and docker images

2018-01-08 Thread Marcelo Vanzin
On Mon, Jan 8, 2018 at 1:39 PM, Matt Cheah wrote: > We would still want images to be able to be uniquely specified for the > driver vs. the executors. For example, not all of the libraries required on > the driver may be required on the executors, so the user would want to > specify a different cu

Kubernetes: why use init containers?

2018-01-09 Thread Marcelo Vanzin
Hello, Me again. I was playing some more with the kubernetes backend and the whole init container thing seemed unnecessary to me. Currently it's used to download remote jars and files, mount the volume into the driver / executor, and place those jars in the classpath / move the files to the worki

Re: Kubernetes: why use init containers?

2018-01-09 Thread Marcelo Vanzin
On Tue, Jan 9, 2018 at 6:25 PM, Nicholas Chammas wrote: > You can argue that executors downloading from > external servers would be faster than downloading from the driver, but > I’m not sure I’d agree - it can go both ways. > > On a tangentially related note, one of the main reasons spark-ec2 is

Re: Kubernetes: why use init containers?

2018-01-09 Thread Marcelo Vanzin
One thing I forgot in my previous e-mail is that if a resource is remote I'm pretty sure (but haven't double checked the code) that executors will download it directly from the remote server, and not from the driver. So there, distributed download without an init container. On Tue, Jan 9, 2018 at

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
is idea in detail, and understand the implications and then get > back to you. > > Thanks for the detailed responses here, and for spending time with the idea. > (Also, you're more than welcome to attend the meeting - there's a link here > if you're around.) > > Cheers,

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
On Wed, Jan 10, 2018 at 1:10 PM, Matt Cheah wrote: > I’d imagine this is a reason why YARN hasn't gone with using spark-submit > from the application master... I wouldn't use YARN as a template to follow when writing a new backend. A lot of the reason why the YARN backend works the way it does i

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
On Wed, Jan 10, 2018 at 1:33 PM, Matt Cheah wrote: > If we use spark-submit in client mode from the driver container, how do we > handle needing to switch between a cluster-mode scheduler backend and a > client-mode scheduler backend in the future? With a config value set by the submission code

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
On Wed, Jan 10, 2018 at 1:47 PM, Matt Cheah wrote: >> With a config value set by the submission code, like what I'm doing to >> prevent client mode submission in my p.o.c.? > > The contract for what determines the appropriate scheduler backend to > instantiate is then going to be different in Ku

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
On Wed, Jan 10, 2018 at 2:00 PM, Yinan Li wrote: > I want to re-iterate on one point, that the init-container achieves a clear > separation between preparing an application and actually running the > application. It's a guarantee provided by the K8s admission control and > scheduling components th

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
On Wed, Jan 10, 2018 at 2:16 PM, Yinan Li wrote: > but we can not rule out the benefits init-containers bring either. Sorry, but what are those again? So far all the benefits are already provided by spark-submit... > Again, I would suggest we look at this more thoroughly post 2.3. Actually, one

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
On Wed, Jan 10, 2018 at 2:30 PM, Yinan Li wrote: > 1. Retries of init-containers are automatically supported by k8s through pod > restart policies. For this point, sorry I'm not sure how spark-submit > achieves this. Great, add that feature to spark-submit, everybody benefits, not just k8s. > 2.
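The reply argues for building retry support into spark-submit itself rather than relying on pod restart policies. For dependency downloads, that could look like the following minimal sketch with exponential backoff. `download_with_retries` and the injected `fetch` callable are hypothetical and do not describe an existing spark-submit feature; a real implementation would also need to decide which errors count as transient.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def download_with_retries(fetch: Callable[[], T], max_attempts: int = 3,
                          initial_delay: float = 1.0,
                          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch() until it succeeds, doubling the delay after each failure.

    Only OSError (I/O and network errors) is retried; the last error is
    re-raised once max_attempts is exhausted.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except OSError:
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay *= 2
    raise ValueError("max_attempts must be at least 1")
```

Injecting `sleep` keeps the helper testable without real waiting, and the same logic works for every cluster manager, not just Kubernetes.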

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
On Wed, Jan 10, 2018 at 2:51 PM, Matt Cheah wrote: > those sidecars may perform side effects that are undesirable if the main > Spark application failed because dependencies weren’t available If the contract is that the Spark driver pod does not have an init container, and the driver handles its

Re: Kubernetes: why use init containers?

2018-01-10 Thread Marcelo Vanzin
On Wed, Jan 10, 2018 at 3:00 PM, Anirudh Ramanathan wrote: > We can start by getting a PR going perhaps, and start augmenting the > integration testing to ensure that there are no surprises - with/without > credentials, accessing GCS, S3 etc as well. > When we get enough confidence and test covera

Re: Kubernetes: why use init containers?

2018-01-12 Thread Marcelo Vanzin
On Fri, Jan 12, 2018 at 4:13 AM, Eric Charles wrote: >> Again, I don't see what is all this hoopla about fine grained control >> of dependency downloads. Spark solved this years ago for Spark >> applications. Don't reinvent the wheel. > > Init-containers are used today to download dependencies. I

Re: Kubernetes: why use init containers?

2018-01-12 Thread Marcelo Vanzin
should be enough time to make the change, test and release with > confidence. > > On Wed, Jan 10, 2018 at 3:45 PM, Marcelo Vanzin wrote: >> >> On Wed, Jan 10, 2018 at 3:00 PM, Anirudh Ramanathan >> wrote: >> > We can start by getting a PR going perhaps, and sta

Re: Kubernetes: why use init containers?

2018-01-12 Thread Marcelo Vanzin
On Fri, Jan 12, 2018 at 1:53 PM, Anirudh Ramanathan wrote: > As I understand, the bigger change discussed here are like the init > containers, which will be more on the implementation side than a user facing > change/behavioral change - which is why it seemed okay to pursue it post 2.3 > as well.

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-22 Thread Marcelo Vanzin
+0 Signatures check out. Code compiles, although I see the errors in [1] when untarring the source archive; perhaps we should add "use GNU tar" to the RM checklist? Also ran our internal tests and they seem happy. My concern is the list of open bugs targeted at 2.3.0 (ignoring the documentation

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-23 Thread Marcelo Vanzin
On Tue, Jan 23, 2018 at 7:01 AM, Sean Owen wrote: > I'm not seeing that same problem on OS X and /usr/bin/tar. I tried unpacking > it with 'xvzf' and also unzipping it first, and it untarred without warnings > in either case. The warnings just show up if you unpack using GNU tar. The exit code is

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-24 Thread Marcelo Vanzin
Given that the bugs I was worried about have been dealt with, I'm upgrading to +1. On Mon, Jan 22, 2018 at 5:09 PM, Marcelo Vanzin wrote: > +0 > > Signatures check out. Code compiles, although I see the errors in [1] > when untarring the source archive; perhaps we should add

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Marcelo Vanzin
Sorry, have to change my vote again. Hive guys ran into SPARK-23209 and that's a regression we need to fix. I'll post a patch soon. So -1 (although others have already -1'ed). On Wed, Jan 24, 2018 at 11:42 AM, Marcelo Vanzin wrote: > Given that the bugs I was worried about ha

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Marcelo Vanzin
On Thu, Jan 25, 2018 at 12:29 PM, Sean Owen wrote: > I am still seeing these tests fail or hang: > > - subscribing topic by name from earliest offsets (failOnDataLoss: false) > - subscribing topic by name from earliest offsets (failOnDataLoss: true) This is something that we are seeing internally

Re: Drop the Hadoop 2.6 profile?

2018-02-08 Thread Marcelo Vanzin
I think it would make sense to drop one of them, but not necessarily 2.6. It kinda depends on what wire compatibility guarantees the Hadoop libraries have; can a 2.6 client talk to 2.7 (pretty certain it can)? Is the opposite safe (not sure)? If the answer to the latter question is "no", then kee

Re: File JIRAs for all flaky test failures

2018-02-08 Thread Marcelo Vanzin
Hey all, I just wanted to bring up Kay's old e-mail about this. If you see a flaky test during a PR, don't just ask for a re-test. File a bug so that we know that test is flaky and someone will eventually take a look at it. A lot of them also make great newbie bugs. I've filed a bunch of these i

Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-15 Thread Marcelo Vanzin
Since it seems there are other issues to fix, I raised SPARK-23413 to blocker status to avoid having to change the disk format of history data in a minor release. On Wed, Feb 14, 2018 at 11:06 PM, Nick Pentreath wrote: > -1 for me as we elevated https://issues.apache.org/jira/browse/SPARK-23377 >

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Marcelo Vanzin
Hey Sameer, Mind including https://github.com/apache/spark/pull/20643 (SPARK-23468) in the new RC? It's a minor bug since I've only hit it with older shuffle services, but it's pretty safe. On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal wrote: > This RC has failed due to https://issues.apache.

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Marcelo Vanzin
Done, thanks! On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal wrote: > Sure, please feel free to backport. > > On 20 February 2018 at 18:02, Marcelo Vanzin wrote: >> >> Hey Sameer, >> >> Mind including https://github.com/apache/spark/pull/20643 >> (SPARK-

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-23 Thread Marcelo Vanzin
+1 Checked the archives; ran a subset of our internal tests on the hadoop2.7 archive, looks good. On Thu, Feb 22, 2018 at 2:23 PM, Sameer Agarwal wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.3.0. The vote is open until Tuesday February 27, 2018 at 8:00:00

Re: Help needed in R documentation generation

2018-02-27 Thread Marcelo Vanzin
I followed Misi's instructions: - click on https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/api/R/index.html - click on "s" at the top - find "sin" and click on it And that does not give me the documentation for the "sin" function. That leads you to a really ugly list of func

Re: Help needed in R documentation generation

2018-02-27 Thread Marcelo Vanzin
k in May 2017. > > I don’t think I can capture the long reviews and many discussed that went > in, for further discussion please start from JIRA SPARK-20889. > > > > ________ > From: Marcelo Vanzin > Sent: Tuesday, February 27, 2018 10:26:23 AM

Re: [Spark-Core]port opened by the SparkDriver is vulnerable to flooding attacks

2018-02-28 Thread Marcelo Vanzin
Spark already has code to monitor idle connections and close them. That's in TransportChannelHandler.java. If there's anything to do here, it's to allow all users of the transport library to support the "close idle connections" feature of that class. On Wed, Feb 28, 2018 at 9:07 AM, sandeep_katta
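The idle-connection idea can be illustrated independently of Netty. This is a minimal single-threaded sketch with an injectable clock; `IdleConnectionTracker` is hypothetical and is not how `TransportChannelHandler.java` is implemented (that Java class reacts to Netty idle events on each channel). It only shows the core policy: record activity per connection, and close whatever has been silent longer than a timeout.

```python
import time
from typing import Callable, Dict, List


class IdleConnectionTracker:
    """Track last activity per connection and report those idle too long."""

    def __init__(self, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def touch(self, conn_id: str) -> None:
        """Record activity (a read or a write) on a connection."""
        self.last_seen[conn_id] = self.clock()

    def close_idle(self) -> List[str]:
        """Forget and return the ids of connections idle past the timeout."""
        now = self.clock()
        idle = [c for c, t in self.last_seen.items()
                if now - t > self.idle_timeout]
        for conn_id in idle:
            del self.last_seen[conn_id]  # a real server would also close the socket
        return idle
```

A connection flood then costs the attacker an open socket only until the next sweep, which is the behaviour the reply suggests exposing to all users of the transport library.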

Re: [Spark-Core]port opened by the SparkDriver is vulnerable to flooding attacks

2018-02-28 Thread Marcelo Vanzin
e completed. > > So needed a mechanism to close only invalid connections. > > > On Wed, 28 Feb 2018 at 10:54 PM, Marcelo Vanzin wrote: >> >> Spark already has code to monitor idle connections and close them. >> That's in TransportChannelHandler.java. >>

Re: [Spark-core] why Executors send HeartBeat to driver but not App Master

2018-03-02 Thread Marcelo Vanzin
The app master doesn't have anything it needs to periodically tell the driver, so there was no need for a heartbeat. On Fri, Mar 2, 2018 at 1:44 AM, sandeep_katta wrote: > I want to attempt *SPARK-23545* bug,so I have some questions regarding the > design, > > I am analyzing the communications b

Re: Hadoop 3 support

2018-04-02 Thread Marcelo Vanzin
Saisai filed SPARK-23534, but the main blocking issue is really SPARK-18673. On Mon, Apr 2, 2018 at 1:00 PM, Reynold Xin wrote: > Does anybody know what needs to be done in order for Spark to support Hadoop > 3? > -- Marcelo

Re: Hadoop 3 support

2018-04-02 Thread Marcelo Vanzin
e possible given the current hive-exec packaging. On Mon, Apr 2, 2018 at 2:58 PM, Reynold Xin wrote: > Is it difficult to upgrade Hive execution version to the latest version? The > metastore used to be an issue but now that part had been separated from the > execution part. > >

Re: time for Apache Spark 3.0?

2018-04-05 Thread Marcelo Vanzin
I remember seeing somewhere that Scala still has some issues with Java 9/10 so that might be hard... But on that topic, it might be better to shoot for Java 11 compatibility. 9 and 10, following the new release model, aren't really meant to be long-term releases. In general, agree with Sean here.

Re: time for Apache Spark 3.0?

2018-04-05 Thread Marcelo Vanzin
On Thu, Apr 5, 2018 at 10:30 AM, Matei Zaharia wrote: > Sorry, but just to be clear here, this is the 2.12 API issue: > https://issues.apache.org/jira/browse/SPARK-14643, with more details in this > doc: > https://docs.google.com/document/d/1P_wmH3U356f079AYgSsN53HKixuNdxSEvo8nw_tgLgM/edit. > >

Re: Spark UI Source Code

2018-05-07 Thread Marcelo Vanzin
On Mon, May 7, 2018 at 1:44 AM, Anshi Shrivastava wrote: > I've found a KVStore wrapper which stores all the metrics in a LevelDb > store. This KVStore wrapper is available as a spark-dependency but we cannot > access the metrics directly from spark since they are all private. I'm not sure what i

Time for 2.3.1?

2018-05-10 Thread Marcelo Vanzin
Hello all, It's been a while since we shipped 2.3.0 and lots of important bug fixes have gone into the branch since then. I took a look at Jira and it seems there's not a lot of things explicitly targeted at 2.3.1 - the only potential blocker (a parquet issue) is being worked on since a new parque

[VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.3.1. The vote is open until Friday, May 18, at 21:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.3.1 [ ] -1 Do not release this package because ... To le

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
e RC ready. Still learning the ropes. Also, if you plan on doing this in the future, *do not* do "svn co" on the dist.apache.org repo. The ASF Infra folks will not be very kind to you. I'll update our RM docs later. On Tue, May 15, 2018 at 2:00 PM, Marcelo Vanzin wrote: > Please vot

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
It's in. That link is only a list of the currently open bugs. On Tue, May 15, 2018 at 2:02 PM, Justin Miller wrote: > Did SPARK-24067 not make it in? I don’t see it in https://s.apache.org/Q3Uo. > > Thanks, > Justin > > On May 15, 2018, at 3:00 PM, Marcelo Vanzin wr

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
e. > > https://issues.apache.org/jira/browse/SPARK-24259 > > Xiao > > > 2018-05-15 14:00 GMT-07:00 Marcelo Vanzin : >> >> Please vote on releasing the following candidate as Apache Spark version >> 2.3.1. >> >> The vote is open until Friday, May 18, at 21:00 UTC

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-16 Thread Marcelo Vanzin
n only bugfixes. >> >> 2018-05-16 12:11 GMT+02:00 kant kodali : >>> >>> Can this https://issues.apache.org/jira/browse/SPARK-23406 be part of >>> 2.3.1? >>> >>> On Tue, May 15, 2018 at 2:07 PM, Marcelo Vanzin >>> wrote:

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-17 Thread Marcelo Vanzin
Wenchen reviewed and pushed that change, so he's the most qualified to make that decision. I plan to cut a new RC tomorrow so hopefully he'll see this by then. On Thu, May 17, 2018 at 10:13 AM, Artem Rudoy wrote: > Can we include https://issues.apache.org/jira/browse/SPARK-22371 as well > please

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-18 Thread Marcelo Vanzin
; pretty serious. I've marked it a blocker, I think it should go into 2.3.1. > I'll also take a closer look comparing to the behavior of the old listener > bus. > > On Thu, May 17, 2018 at 12:18 PM, Marcelo Vanzin > wrote: >> >> Wenchen reviewed and pushed th

Re: Running lint-java during PR builds?

2018-05-21 Thread Marcelo Vanzin
6351319 >>> >>> Actually, I've been monitoring the history here. (It's synced every 30 >>> minutes.) >>> >>> https://travis-ci.org/dongjoon-hyun/spark/builds >>> >>> Could we give a change to this? >>> >>> Be

Re: Running lint-java during PR builds?

2018-05-21 Thread Marcelo Vanzin
- all of ASF shares one queue. > > At the number of PRs Spark has this could be a big issue. > > > ________ > From: Marcelo Vanzin > Sent: Monday, May 21, 2018 9:08:28 AM > To: Hyukjin Kwon > Cc: Dongjoon Hyun; dev > Subject: Re: Running lint-java

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-21 Thread Marcelo Vanzin
FYI the fix for the blocker has just been committed. I'll prepare RC2 tomorrow morning assuming jenkins is reasonably happy with the current state of the branch. On Fri, May 18, 2018 at 10:39 AM, Marcelo Vanzin wrote: > Just to give folks an update. > > In case you haven't

[VOTE] Spark 2.3.1 (RC2)

2018-05-22 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.3.1. The vote is open until Friday, May 25, at 20:00 UTC and passes if at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.3.1 [ ] -1 Do not release this package because ... To learn more about

Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-22 Thread Marcelo Vanzin
Starting with my own +1. Did the same testing as RC1. On Tue, May 22, 2018 at 12:45 PM, Marcelo Vanzin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.3.1. > > The vote is open until Friday, May 25, at 20:00 UTC and passes if > at least 3 +

Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-23 Thread Marcelo Vanzin
n >> > discuss if we should do a new release for 2.0, 2.1, 2.2 later. >> > >> > Thanks, >> > Wenchen >> > >> > On Wed, May 23, 2018 at 9:54 PM, Sean Owen < >> >> > srowen@ >> >> > > wrote: >> > >

Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-25 Thread Marcelo Vanzin
ise we end up creating throwaway RCs that are just overhead. On Tue, May 22, 2018 at 12:45 PM, Marcelo Vanzin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.3.1. > > The vote is open until Friday, May 25, at 20:00 UTC and passes if > at leas

[VOTE] Spark 2.3.1 (RC3)

2018-06-01 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.3.1. Given that I expect at least a few people to be busy with Spark Summit next week, I'm taking the liberty of setting an extended voting period. The vote will be open until Friday, June 8th, at 19:00 UTC (that's 12:00 PD

Re: [VOTE] Spark 2.3.1 (RC3)

2018-06-01 Thread Marcelo Vanzin
gain. On Fri, Jun 1, 2018 at 1:20 PM, Xiao Li wrote: > Sorry, I need to say -1 > > This morning, just found a regression in 2.3.1 and reverted > https://github.com/apache/spark/pull/21443 > > Xiao > > 2018-06-01 13:09 GMT-07:00 Marcelo Vanzin : >> >> Pleas

[VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.3.1. Given that I expect at least a few people to be busy with Spark Summit next week, I'm taking the liberty of setting an extended voting period. The vote will be open until Friday, June 8th, at 19:00 UTC (that's 12:00 PD

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Marcelo Vanzin
Starting with my own +1 (binding). On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.3.1. > > Given that I expect at least a few people to be busy with Spark Summit next > week, I'm taking th

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Marcelo Vanzin
local-m2-cache: tried > > file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar > > I’d guess I’m probably using the wrong version of hadoop-aws, but I called > make-distribution.sh with -Phadoop-2.8 so I’m not sure what else to try. > >

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Marcelo Vanzin
project) and figure > out what I need to change (as due diligence for Flintrock’s users). > > Nick > > > On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin wrote: >> >> Using the hadoop-aws package is probably going to be a little more >> complicated than that.

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-02 Thread Marcelo Vanzin
3 didn’t work for me > either (even building with -Phadoop-2.7). I guess I’ve been relying on an > unsupported pattern and will need to figure something else out going forward > in order to use s3a://. > > > On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin wrote: >> >

Re: Time for 2.2.2 release

2018-06-07 Thread Marcelo Vanzin
Took a look at our branch and most of the stuff that is not already in 2.2 is flaky test fixes, so +1. On Wed, Jun 6, 2018 at 7:54 AM, Tom Graves wrote: > Hello all, > > I think it's time for another 2.2 release. > I took a look at Jira and I don't see anything explicitly targeted for 2.2.2 > tha

Re: Scala 2.12 support

2018-06-07 Thread Marcelo Vanzin
But DB's shell output is on the most recent 2.11, not 2.12, right? On Thu, Jun 7, 2018 at 5:54 PM, Holden Karau wrote: > I agree that's a little odd, could we not add the backspace terminal > character? Regardless even if not, I don't think that should be a blocker > for 2.12 support especially si

[VOTE] [RESULT] Spark 2.3.1 (RC4)

2018-06-08 Thread Marcelo Vanzin
The vote passes. Thanks to all who helped with the release! I'll follow up later with a release announcement once everything is published. +1 (* = binding): - Marcelo Vanzin * - Reynold Xin * - Sean Owen * - Denny Lee - Dongjoon Hyun - Ricardo Almeida - Hyukjin Kwon - John Zhuge - Mark Ha

[ANNOUNCE] Announcing Apache Spark 2.3.1

2018-06-11 Thread Marcelo Vanzin
We are happy to announce the availability of Spark 2.3.1! Apache Spark 2.3.1 is a maintenance release, based on the branch-2.3 maintenance branch of Spark. We strongly recommend all 2.3.x users to upgrade to this stable release. To download Spark 2.3.1, head over to the download page: http://spar

Time for 2.1.3

2018-06-12 Thread Marcelo Vanzin
Hey all, There are some fixes that went into 2.1.3 recently that probably deserve a release. So as usual, please take a look if there's anything else you'd like on that release, otherwise I'd like to start with the process by early next week. I'll go through jira to see what's the status of thing

Re: Missing HiveConf when starting PySpark from head

2018-06-14 Thread Marcelo Vanzin
Yes, my bad. The code in session.py needs to also catch TypeError like before. On Thu, Jun 14, 2018 at 11:03 AM, Li Jin wrote: > Sounds good. Thanks all for the quick reply. > > https://issues.apache.org/jira/browse/SPARK-24563 > > > On Thu, Jun 14, 2018 at 12:19 PM, Xiao Li wrote: >> >> Thanks
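The fix described here is to treat `TypeError` as a "Hive classes are not available" signal alongside Py4J's own errors. As we understand it, looking up a class that is missing from the classpath through Py4J's JVM view yields a non-callable package object, so instantiating it raises `TypeError` rather than a Py4J error. A self-contained sketch of that fallback pattern: `Py4JError` and `create_session` below are local stand-ins, not the real `pyspark.sql.session` code.

```python
class Py4JError(Exception):
    """Stand-in for py4j.protocol.Py4JError so the sketch runs without py4j."""


def create_session(load_hive_conf, with_hive, without_hive):
    """Return a Hive-enabled session if HiveConf loads, else a plain one.

    load_hive_conf: callable that instantiates HiveConf in the JVM.
    with_hive / without_hive: callables that build the two session variants.
    """
    try:
        load_hive_conf()
    except (Py4JError, TypeError):
        # TypeError is what surfaces when the class is missing from the
        # classpath, so it must be caught alongside Py4J's own errors.
        return without_hive()
    return with_hive()
```

Keeping `with_hive()` outside the `try` block means a genuine failure while building the Hive session is not silently swallowed as "Hive missing".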

Re: [ANNOUNCE] Announcing Apache Spark 2.3.1

2018-06-14 Thread Marcelo Vanzin
structured-streaming > Mastering Kafka Streams https://bit.ly/mastering-kafka-streams > Follow me at https://twitter.com/jaceklaskowski > > On Mon, Jun 11, 2018 at 9:47 PM, Marcelo Vanzin wrote: >> >> We are happy to announce the availability of Spark 2.3.1! >> >

Re: Time for 2.1.3

2018-06-19 Thread Marcelo Vanzin
long). On Tue, Jun 12, 2018 at 4:27 PM, Marcelo Vanzin wrote: > Hey all, > > There are some fixes that went into 2.1.3 recently that probably > deserve a release. So as usual, please take a look if there's anything > else you'd like on that release, otherwise I'd l

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-26 Thread Marcelo Vanzin
Starting with my own +1. On Tue, Jun 26, 2018 at 1:25 PM, Marcelo Vanzin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.1.3. > > The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a > majority +1 PMC votes are cast, with

[VOTE] Spark 2.1.3 (RC2)

2018-06-26 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.1.3. The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.1.3 [ ] -1 Do not release this pack

Re: [VOTE] Spark 2.2.2 (RC2)

2018-06-27 Thread Marcelo Vanzin
+1 Checked sigs + ran a bunch of tests on the hadoop-2.7 binary package. On Wed, Jun 27, 2018 at 1:30 PM, Tom Graves wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.2.2. > > The vote is open until Mon, July 2nd @ 9PM UTC (2PM PDT) and passes if a > majority

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-27 Thread Marcelo Vanzin
contrasts = NULL, ...) > Argument names in code not in docs: > singular.ok > Mismatches in argument names: > Position: 16 Code: singular.ok Docs: contrasts > Position: 17 Code: contrasts Docs: ... > > > From: Sean Owen > Sent: Wednesday, June

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-27 Thread Marcelo Vanzin
lakes: https://amplab.cs.berkeley.edu/jenkins/user/vanzin/my-views/view/Spark/ (Look for the 2.1 branch jobs.) > ____ > From: Marcelo Vanzin > Sent: Wednesday, June 27, 2018 6:55 PM > To: Felix Cheung > Cc: Marcelo Vanzin; Tom Graves; dev > > Sub

Re: Time for 2.3.2?

2018-06-27 Thread Marcelo Vanzin
+1. SPARK-24589 / SPARK-24552 are kinda nasty and we should get fixes for those out. (Those are what delayed 2.2.2 and 2.1.3 for those watching...) On Wed, Jun 27, 2018 at 7:59 PM, Wenchen Fan wrote: > Hi all, > > Spark 2.3.1 was released just a while ago, but unfortunately we discovered > and f

Re: Time for 2.3.2?

2018-06-28 Thread Marcelo Vanzin
Thu, Jun 28, 2018 at 12:56 PM Saisai Shao >>>>> wrote: >>>>> >>>>>> +1, like mentioned by Marcelo, these issues seems quite severe. >>>>>> >>>>>> I can work on the release if short of hands :). >>>>>> >

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
> > http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555 > > Since it isn’t a regression I’d say +1 from me. > > > > From: Tom Graves > Sent: Thursday, June 28, 2018 6:56:16 AM > To: Marc

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
BTW that would be a great fix in the docs now that we'll have a 2.3.2 being prepared. On Thu, Jun 28, 2018 at 9:17 AM, Felix Cheung wrote: > Exactly... > > ____ > From: Marcelo Vanzin > Sent: Thursday, June 28, 2018 9:16:08 AM > To: Tom Graves

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
new RC. On Tue, Jun 26, 2018 at 1:25 PM, Marcelo Vanzin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.1.3. > > The vote is open until Fri, June 29th @ 9PM UTC (2PM PDT) and passes if a > majority +1 PMC votes are cast, with a minimu
