Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-15 Thread Josh Rosen
+1 On Mon, Apr 15, 2024 at 11:26 AM Maciej wrote: > +1 > > Best regards, > Maciej Szymkiewicz > > Web: https://zero323.net > PGP: A30CEF0C31A501EC > > On 4/15/24 8:16 PM, Rui Wang wrote: > > +1, non-binding. > > Thanks Dongjoon to drive this! > > > -Rui > > On Mon, Apr 15, 2024 at 10:10 AM Xinro

Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2020-04-29 Thread Josh Rosen
first wrote our own integration / sync script. On Wed, Apr 29, 2020 at 6:21 PM Hyukjin Kwon wrote: > Let actually me just take a look by myself and bring some updates soon. > > On Thu, Apr 30, 2020 at 9:13 AM, Hyukjin Kwon wrote: > >> WDYT @Josh Rosen ? >> Seems

Spark SQL upgrade / migration guide: discoverability and content organization

2019-07-14 Thread Josh Rosen
I'd like to discuss the Spark SQL migration / upgrade guides in the Spark documentation: these are valuable resources and I think we could increase that value by making these docs easier to discover and by adding a bit more structure to the existing content. For folks who aren't familiar with thes

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Josh Rosen
+1 in favor of some sort of JIRA cleanup. My only request is that we attach some sort of 'bulk-closed' label to issues that we close via JIRA filter batch operations (and resolve the issues as "Timed Out" / "Cannot Reproduce", not "Fixed"). Using a label makes it easier to audit what was closed, s

Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2019-04-24 Thread Josh Rosen
The code for this runs in http://spark-prs.appspot.com (see https://github.com/databricks/spark-pr-dashboard/blob/1e799c9e510fa8cdc9a6c084a777436bebeabe10/sparkprs/controllers/tasks.py#L137 ) I checked the AppEngine logs and it looks like we're getting error responses, possibly due to a credential

Re: spark-tests.appspot status?

2017-12-14 Thread Josh Rosen
Yep, it turns out that there was a problem with the Jenkins job. I've restarted it and it should be backfilling now (this might take a while). On Thu, Dec 14, 2017 at 1:57 PM Xin Lu wrote: > Most likely the job that uploads this stuff at databricks is broken. > > On Thu, Dec 14, 2017 at 12:41 PM

Re: Spark build is failing in amplab Jenkins

2017-11-05 Thread Josh Rosen
UPS but the workers > aren't... and when they come back, the PATH variable specified in the > workers' configs get dropped and we see behavior like this. > > josh rosen (whom i am talking with over chat) will be restarting the > ssh/worker processes on all of the worker

Re: Raise Jenkins timeout?

2017-10-09 Thread Josh Rosen
I bumped the timeouts up to 255 minutes (to exceed https://github.com/apache/spark/blame/master/dev/run-tests-jenkins.py#L185). Let's see if this resolves the problem. On Mon, Oct 9, 2017 at 9:30 AM shane knapp wrote: > ++joshrosen > > On Mon, Oct 9, 2017 at 1:48 AM, Sean Owen wrote: > >> I'm s

Re: Are there multiple processes out there running JIRA <-> Github maintenance tasks?

2017-08-30 Thread Josh Rosen
, Aug 30, 2017 at 1:18 PM Marcelo Vanzin wrote: > I'm still seeing some odd behavior. > > I just deleted my repo's branch for > https://github.com/apache/spark/pull/19013 and the script seems to > have done some update to the bug, since I got a bunch of e-mails. >

Re: Are there multiple processes out there running JIRA <-> Github maintenance tasks?

2017-08-28 Thread Josh Rosen
This should be fixed now. The problem was that debug code had been pushed while investigating the JIRA linkage failure but was not removed and this problem went unnoticed because linking was failing well before the debug code was hit. Once the JIRA connectivity issues were resolved, the problematic

Re: Some PRs not automatically linked to JIRAs

2017-08-02 Thread Josh Rosen
Usually the backend of https://spark-prs.appspot.com does the linking while processing PR update tasks. It appears that the site's connections to JIRA have started failing: ConnectionError: ('Connection aborted.', HTTPException('Deadline exceeded while waiting for HTTP response from URL: https://i

Crowdsourced triage Scapegoat compiler plugin warnings

2017-05-24 Thread Josh Rosen
ed warnings in order to help improve Scapegoat itself and eliminate common false-positives. Thanks and happy bug-hunting, Josh Rosen

Re: New Optimizer Hint

2017-05-01 Thread Josh Rosen
The issue of UDFs which return structs being evaluated many times when accessing the returned struct's fields sounds like https://issues.apache.org/jira/browse/SPARK-17728; that issue mentions a trick of using *array* and *explode* to prevent project collapsing. On Thu, Apr 20, 2017 at 8:55 AM Rey

Re: branch-2.2 has been cut

2017-04-24 Thread Josh Rosen
I've created the Jenkins jobs for branch-2.2, including the nightly snapshot, packaging, and docs jobs. You can view the latest nightly package at https://home.apache.org/~pwendell/spark-nightly/spark-branch-2.2-bin/latest/ and nightly docs at https://home.apache.org/~pwendell/spark-nightly/spark-

Re: RFC: deprecate SparkStatusTracker, remove JobProgressListener

2017-03-24 Thread Josh Rosen
.0 release notes. On Fri, Mar 24, 2017 at 1:05 PM Marcelo Vanzin wrote: > On Fri, Mar 24, 2017 at 12:07 PM, Josh Rosen > wrote: > > I think that it should be safe to remove JobProgressListener but I'd > like to > > keep the SparkStatusTracker API. > > Thanks Jos

Re: RFC: deprecate SparkStatusTracker, remove JobProgressListener

2017-03-24 Thread Josh Rosen
I think that it should be safe to remove JobProgressListener but I'd like to keep the SparkStatusTracker API. SparkStatusTracker was originally developed to provide a stable programmatic status API for use by Hive on Spark. SparkStatusTracker predated the Spark REST APIs for status tracking which

Re: Nightly builds for master branch have been failing

2017-02-24 Thread Josh Rosen
I spotted the problem and it appears to be a misconfiguration / missing entry in the template which generates the packaging jobs. I've corrected the problem but now the jobs appear to be hanging / flaking on the Git clone. Hopefully this is just a transient issue, so let's retry tonight and see whe

Re: File JIRAs for all flaky test failures

2017-02-15 Thread Josh Rosen
A useful tool for investigating test flakiness is my Jenkins Test Explorer service, running at https://spark-tests.appspot.com/ This has some useful timeline views for debugging flaky builds. For instance, at https://spark-tests.appspot.com/jobs/spark-master-test-maven-hadoop-2.6 (may be slow to l

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-14 Thread Josh Rosen
He pushed the 2.0.2 release docs but there's a problem with Git mirroring of the Spark website repo which is interfering with the publishing: https://issues.apache.org/jira/browse/INFRA-12913 On Mon, Nov 14, 2016 at 1:15 PM Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote: > The release

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-25 Thread Josh Rosen
+1 On Sun, Sep 25, 2016 at 1:16 PM Yin Huai wrote: > +1 > > On Sun, Sep 25, 2016 at 11:40 AM, Dongjoon Hyun > wrote: > >> +1 (non binding) >> >> RC3 is compiled and tested on the following two systems, too. All tests >> passed. >> >> * CentOS 7.2 / Oracle JDK 1.8.0_77 / R 3.3.1 >>with -Pyar

Re: Unable to run docker jdbc integrations test ?

2016-09-07 Thread Josh Rosen
I think that these tests are valuable so I'd like to keep them. If possible, though, we should try to get rid of our dependency on the Spotify docker-client library, since it's a dependency hell nightmare. Given our relatively simple use of Docker here, I wonder whether we could just write some sim

Re: df.groupBy('m).agg(sum('n)).show dies with 10^3 elements?

2016-09-06 Thread Josh Rosen
I think that this is a simpler case of https://issues.apache.org/jira/browse/SPARK-17405. I'm going to comment on that ticket with your simpler reproduction. On Tue, Sep 6, 2016 at 1:32 PM Jacek Laskowski wrote: > Hi, > > I'm concerned with the OOME in local mode with the version built today: >

Re: master snapshots not publishing?

2016-07-24 Thread Josh Rosen
working. On Thu, Jul 21, 2016 at 3:36 PM Andrew Duffy wrote: > Gotcha, that'd be great! > > On Thu, Jul 21, 2016 at 8:52 PM, Josh Rosen > wrote: > >> Yeah, it's on purpose: we had to disable it back when both the master and >> branch-2.0 branches had the sam

Re: master snapshots not publishing?

2016-07-21 Thread Josh Rosen
Yeah, it's on purpose: we had to disable it back when both the master and branch-2.0 branches had the same versions in their POMs because that was causing the master snapshots to overwrite the 2.0.0-SNAPSHOTS which are generated off of branch-2.0. I can go ahead and re-enable it later today. On T

Re: RFC: Remote "HBaseTest" from examples?

2016-04-19 Thread Josh Rosen
+1; I think that it's preferable for code examples, especially third-party integration examples, to live outside of Spark. On Tue, Apr 19, 2016 at 10:29 AM Reynold Xin wrote: > Yea in general I feel examples that bring in a large amount of > dependencies should be outside Spark. > > > On Tue, Ap

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-06 Thread Josh Rosen
d status 1 > tar: Error is not recoverable: exiting now > > $ ls -l !$ > ls -l spark-1.6.1-bin-hadoop2.4.tgz > -rw-r--r--. 1 hbase hadoop 323614720 Apr 5 19:25 > spark-1.6.1-bin-hadoop2.4.tgz > > Thanks > > On Wed, Apr 6, 2016 at 12:19 PM, Josh Rosen > wrote:

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-06 Thread Josh Rosen
I downloaded the Spark 1.6.1 artifacts from the Apache mirror network and re-uploaded them to the spark-related-packages S3 bucket, so hopefully these packages should be fixed now. On Mon, Apr 4, 2016 at 3:37 PM Nicholas Chammas wrote: > Thanks, that was the command. :thumbsup: > > On Mon, Apr 4

Re: Updating Spark PR builder and 2.x test jobs to use Java 8 JDK

2016-04-05 Thread Josh Rosen
o the platform's default javac, which happens to be Java 7. To fix this, I'm going to modify the build to just prepend $JAVA_HOME/bin to $PATH while setting up the test environment On Tue, Apr 5, 2016 at 5:09 PM Josh Rosen wrote: > I've reverted the bulk of the conf changes while
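The PATH fix described in this message can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Jenkins build change: the helper name `env_with_java_home_first` is hypothetical and does not exist in Spark's dev/ scripts.

```python
import os


def env_with_java_home_first(base_env):
    """Return a copy of base_env with $JAVA_HOME/bin prepended to PATH.

    Hypothetical helper for illustration; not part of Spark's dev/ scripts.
    """
    env = dict(base_env)
    java_home = env.get("JAVA_HOME")
    if not java_home:
        # Nothing to prepend; PATH keeps resolving javac as before.
        return env
    java_bin = os.path.join(java_home, "bin")
    existing = env.get("PATH", "")
    # Prepending (rather than appending) lets the intended JDK's javac
    # shadow a platform-default javac that appears later on PATH.
    env["PATH"] = java_bin + os.pathsep + existing if existing else java_bin
    return env
```

A test runner could pass the result as the `env` argument of `subprocess.run` so that child processes pick up the intended `javac` without touching the system-wide default.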

Re: Updating Spark PR builder and 2.x test jobs to use Java 8 JDK

2016-04-05 Thread Josh Rosen
following error ( > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/566/console > ): > > [error] javac: invalid source release: 1.8 > [error] Usage: javac > [error] use -help for a list of possible options > > > On Tue, Apr 5, 2016 at 2:14 P

Updating Spark PR builder and 2.x test jobs to use Java 8 JDK

2016-04-05 Thread Josh Rosen
In order to be able to run Java 8 API compatibility tests, I'm going to push a new set of Jenkins configurations for Spark's test and PR builders so that those jobs use a Java 8 JDK. I tried this once in the past and it seemed to introduce some rare, transient flakiness in certain tests, so if anyo

Re: Understanding PySpark Internals

2016-03-30 Thread Josh Rosen
One clarification: there *are* Python interpreters running on executors so that Python UDFs and RDD API code can be executed. Some slightly-outdated but mostly-correct reference material for this can be found at https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals. See also: search

Re: Spark build with scala-2.10 fails ?

2016-03-20 Thread Josh Rosen
It looks like the Scala 2.10 Jenkins build is working: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-sbt-scala-2.10/ Can you share more details about how you're compiling with 2.10 (e.g. which commands you ran, git SHA, etc)? On Wed, Mar 16, 2016 at 11:

Re: Apache Spark Exception in thread “main” java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class

2016-03-19 Thread Josh Rosen
See the instructions in the Spark documentation: https://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211 On Wed, Mar 16, 2016 at 7:05 PM satyajit vegesna wrote: > > > Hi, > > Scala version: 2.11.7 (had to upgrade the Scala version to enable case > classes to accept more tha

Re: Apache Spark Exception in thread “main” java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class

2016-03-19 Thread Josh Rosen
Err, whoops, looks like this is a user app and not building Spark itself, so you'll have to change your deps to use the 2.11 versions of Spark. e.g. spark-streaming_2.10 -> spark-streaming_2.11. On Wed, Mar 16, 2016 at 7:07 PM Josh Rosen wrote: > See the instructions in the Spark do
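The dependency rename mentioned in this message (spark-streaming_2.10 -> spark-streaming_2.11) follows the convention of suffixing artifact IDs with the Scala binary version. A small sketch of that renaming, assuming a hypothetical helper `with_scala_binary_version` that is not part of any build tool:

```python
import re

# Matches an artifact ID ending in a Scala binary-version suffix, e.g. "_2.10".
_SCALA_SUFFIX = re.compile(r"^(?P<name>.+)_(?P<scala>\d+\.\d+)$")


def with_scala_binary_version(artifact_id, scala_version):
    """Swap the Scala binary-version suffix of an artifact ID.

    Hypothetical helper for illustration only.
    """
    match = _SCALA_SUFFIX.match(artifact_id)
    if match is None:
        # Artifacts without a suffix are not cross-built per Scala version.
        raise ValueError(f"no Scala suffix in artifact ID: {artifact_id!r}")
    return f"{match.group('name')}_{scala_version}"
```

Every Spark artifact in an application's dependency list would need the same suffix, since mixing 2.10 and 2.11 artifacts is what produces errors like the NoClassDefFoundError in the thread subject.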

Does anyone implement org.apache.spark.serializer.Serializer in their own code?

2016-03-07 Thread Josh Rosen
Does anyone implement Spark's serializer interface (org.apache.spark.serializer.Serializer) in your own third-party code? If so, please let me know because I'd like to change this interface from a DeveloperAPI to private[spark] in Spark 2.0 in order to do some cleanup and refactoring. I think that

Re: Spark 1.6.1

2016-02-26 Thread Josh Rosen
I updated the release packaging scripts to use SFTP via the *lftp* client: https://github.com/apache/spark/pull/11350 I'm starting the process of cutting a 1.6.1-RC1 tag and release artifacts right now, so please be extra careful about merging into branch-1.6 until after the release. Once the RC p

Re: BUILD FAILURE...again?! :( Spark Project External Flume on fire

2016-01-11 Thread Josh Rosen
I've got a hotfix which should address it: https://github.com/apache/spark/pull/10693 On Sun, Jan 10, 2016 at 11:50 PM, Jacek Laskowski wrote: > Hi, > > It appears that the last commit [1] broke the build. Is anyone working > on it? I can when told so. > > ➜ spark git:(master) ✗ ./build/mvn -

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Josh Rosen
> other administrators may not be so compliant. >> >> Saw a small bit about the java version in there; does Spark currently >> prefer Java 1.8.x? >> >> —Ken >> >> On Jan 5, 2016, at 6:08 PM, Josh Rosen wrote: >> >> Note that you _can_ use a Pyth

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Josh Rosen
5, 2016 at 3:07 PM, Josh Rosen wrote: > Yep, the driver and executors need to have compatible Python versions. I > think that there are some bytecode-level incompatibilities between 2.6 and > 2.7 which would impact the deserialization of Python closures, so I think > you need to be runni

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Josh Rosen
> the app we can not ship it with our software because its gpl licensed, so >>>> the client would have to download it and install it themselves, and this >>>> would mean its an independent install which has to be audited and approved >>>> and now you are in fo

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Josh Rosen
If users are able to install Spark 2.0 on their RHEL clusters, then I imagine that they're also capable of installing a standalone Python alongside that Spark version (without changing Python systemwide). For instance, Anaconda/Miniconda make it really easy to install Python 2.7.x/3.x without impac

New processes / tools for changing dependencies in Spark

2015-12-30 Thread Josh Rosen
I just merged https://github.com/apache/spark/pull/10461, a PR that adds new automated tooling to help us reason about dependency changes in Spark. Here's a summary of the changes: - The dev/run-tests script (used in the SBT Jenkins builds and for testing Spark pull requests) now generates a

Re: Is there any way to stop a jenkins build

2015-12-29 Thread Josh Rosen
Yeah, I thought that my quick fix might address the HiveThriftBinaryServerSuite hanging issue, but it looks like it didn't work so I'll now have to do the more principled fix of using a UDF which sleeps for some amount of time. In order to stop builds, you need to have a Jenkins account with the p

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-22 Thread Josh Rosen
+1 On Tue, Dec 22, 2015 at 7:00 PM, Jeff Zhang wrote: > +1 > > On Wed, Dec 23, 2015 at 7:36 AM, Mark Hamstra > wrote: > >> +1 >> >> On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust < >> mich...@databricks.com> wrote: >> >>> Please vote on releasing the following candidate as Apache Spark vers

Re: Spark fails after 6000s because of akka

2015-12-20 Thread Josh Rosen
Would you mind copying this information into a JIRA ticket to make it easier to discover / track? Thanks! On Sun, Dec 20, 2015 at 11:35 AM Alexander Pivovarov wrote: > Usually Spark EMR job fails with the following exception in 1 hour 40 min > - Job cancelled because SparkContext was shut down >

Re: JIRA: Wrong dates from imported JIRAs

2015-12-16 Thread Josh Rosen
Personally, I'd rather avoid the risk of breaking things during the reimport. In my experience we've had a lot of unforeseen problems with JIRA import/export and the benefit here doesn't seem huge (this issue only impacts people that are searching for the oldest JIRAs across all projects, which I t

Re: Fastest way to build Spark from scratch

2015-12-09 Thread Josh Rosen
of Spark > development. Is that right? > > > On Tue, Dec 8, 2015 at 12:33 PM Josh Rosen > wrote: > >> @Nick, on a fresh EC2 instance a significant chunk of the initial build >> time might be due to artifact resolution + downloading. Putting >> pre-populated Ivy and

Re: Fastest way to build Spark from scratch

2015-12-08 Thread Josh Rosen
@Nick, on a fresh EC2 instance a significant chunk of the initial build time might be due to artifact resolution + downloading. Putting pre-populated Ivy and Maven caches onto your EC2 machine could shave a decent chunk of time off that first build. On Tue, Dec 8, 2015 at 9:16 AM, Nicholas Chammas

Re: Bringing up JDBC Tests to trunk

2015-12-06 Thread Josh Rosen
Can you write a script to download and install the JDBC driver to the local Maven repository if it's not already present? If we had that, we could just invoke it as part of dev/run-tests. On Thu, Dec 3, 2015 at 5:55 PM Luciano Resende wrote: > > > On Mon, Nov 30, 2015 at 1:53

Re: Spark doesn't unset HADOOP_CONF_DIR when testing ?

2015-12-06 Thread Josh Rosen
I agree that we should unset this in our tests. Want to file a JIRA and submit a PR to do this? On Thu, Dec 3, 2015 at 6:40 PM Jeff Zhang wrote: > I try to do test on HiveSparkSubmitSuite on local box, but fails. The > cause is that spark is still using my local single node cluster hadoop when >

Re: IntelliJ license for committers?

2015-12-02 Thread Josh Rosen
Yep, I'm the point of contact between us and JetBrains. I forwarded the 2015 license renewal email to the private@ list, so it should be accessible via the archives. I'll go ahead and forward you a copy of our project license, which will have to be renewed in January of next year. On Wed, Dec 2, 2

Re: Bringing up JDBC Tests to trunk

2015-11-30 Thread Josh Rosen
tion about > how the jdbc drivers are actually being setup for the other datasources > (MySQL and PostgreSQL), are these setup directly on the Jenkins slaves ? I > didn't see the jars or anything specific on the pom or other files... > > > Thanks > > On Wed, Oct 21, 2015

Re: VerifyError running Spark SQL code?

2015-11-25 Thread Josh Rosen
I think I've seen this issue as well, but in a different suite. I wasn't able to easily get to the bottom of it, though. What JDK / JRE are you using? I'm on Java(TM) SE Runtime Environment (build 1.7.0_65-b17) Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode) on OSX. On Wed,

Re: Does anyone meet the issue that jars under lib_managed is never downloaded ?

2015-11-17 Thread Josh Rosen
Can you file a JIRA issue to help me triage this further? Thanks! On Tue, Nov 17, 2015 at 4:08 PM Jeff Zhang wrote: > Sure, hive profile is enabled. > > On Wed, Nov 18, 2015 at 6:12 AM, Josh Rosen > wrote: > >> Is the Hive profile enabled? I think it may need to be turn

Re: Does anyone meet the issue that jars under lib_managed is never downloaded ?

2015-11-17 Thread Josh Rosen
ient.java:74) >>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) >>> at >>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) >>> at >>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(De

Re: Does anyone meet the issue that jars under lib_managed is never downloaded ?

2015-11-16 Thread Josh Rosen
As of https://github.com/apache/spark/pull/9575, Spark's build will no longer place every dependency JAR into lib_managed. Can you say more about how this affected spark-shell for you (maybe share a stacktrace)? On Mon, Nov 16, 2015 at 12:03 AM, Jeff Zhang wrote: > > Sometimes, the jars under li

Re: A proposal for Spark 2.0

2015-11-10 Thread Josh Rosen
There's a proposal / discussion of the assembly-less distributions at https://github.com/vanzin/spark/pull/2/files / https://issues.apache.org/jira/browse/SPARK-11157. On Tue, Nov 10, 2015 at 3:53 PM, Reynold Xin wrote: > > On Tue, Nov 10, 2015 at 3:35 PM, Nicholas Chammas < > nicholas.cham...@g

Re: [BUILD SYSTEM] quick jenkins downtime, november 5th 7am

2015-11-06 Thread Josh Rosen
Are you sure that the credentials are missing? Also: did you enable GitHub commit status updating by accident / configuration loss? That might explain the errors here, since our keys don't have permissions to use that API. On Fri, Nov 6, 2015 at 12:54 PM, shane knapp wrote: > alright, i'm downgr

Re: pyspark with pypy not work for spark-1.5.1

2015-11-05 Thread Josh Rosen
v 5, 2015 at 4:14 PM, Chang Ya-Hsuan wrote: > >> Thanks for your quickly reply. >> >> I will test several pypy versions and report the result later. >> >> On Thu, Nov 5, 2015 at 4:06 PM, Josh Rosen wrote: >> >>> I noticed that you're using PyPy 2.2.1

Re: pyspark with pypy not work for spark-1.5.1

2015-11-05 Thread Josh Rosen
I noticed that you're using PyPy 2.2.1, but it looks like Spark 1.5.1's docs say that we only support PyPy 2.3+. Could you try using a newer PyPy version to see if that works? I just checked and it looks like our Jenkins tests are running against PyPy 2.5.1, so that version is known to work. I'm n

Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-27 Thread Josh Rosen
Hi Sjoerd, Did your job actually *fail* or did it just generate many spurious exceptions? While the stacktrace that you posted does indicate a bug, I don't think that it should have stopped query execution because Spark should have fallen back to an interpreted code path (note the "Failed to gener

Re: [VOTE] Release Apache Spark 1.5.2 (RC1)

2015-10-25 Thread Josh Rosen
Hi Mark, The shuffle memory leaks that I identified in SPARK-11239 have been around for multiple releases and it's not clear whether they have caused performance problems in real workloads, so I would say that it's fine to move the release forward without including my patch. If we have to cut anot

Re: Bringing up JDBC Tests to trunk

2015-10-21 Thread Josh Rosen
Hey Luciano, This sounds like a reasonable plan to me. One of my colleagues has written some Dockerized MySQL testing utilities, so I'll take a peek at those to see if there are any specifics of their solution that we should adapt for Spark. On Wed, Oct 21, 2015 at 1:16 PM, Luciano Resende wrote

Re: Spark Event Listener

2015-10-16 Thread Josh Rosen
The reason for having two separate interfaces is developer API backwards-compatibility, as far as I know. SparkFirehoseListener came later. On Tue, Oct 13, 2015 at 4:36 PM, Jakob Odersky wrote: > the path of the source file defining the event API is > `core/src/main/scala/org/apache/spark/schedu

Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-15 Thread Josh Rosen
To clarify, we're asking about the *spark.sql.tungsten.enabled* flag, which was introduced in Spark 1.5 and enables Project Tungsten optimizations in Spark SQL. This option is set to *true* by default in Spark 1.5+ and exists primarily to allow users to disable the new code paths if they encounter

Re: Spark Event Listener

2015-10-13 Thread Josh Rosen
Check out SparkFirehoseListener, an adapter which forwards all events to a single `onEvent` method in order to let you do pattern-matching as you have described: https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/SparkFirehoseListener.java On Tue, Oct 13, 2015 at 4:29
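The adapter idea in this message (many typed callbacks forwarded to a single `onEvent` method) can be illustrated outside of Spark. The following is a Python analogy, not Spark's Java class: `FirehoseListener`, `FailureLogger`, `JobStart`, and `JobEnd` are hypothetical names invented for this sketch, and the real listener interface has many more callbacks.

```python
from dataclasses import dataclass


@dataclass
class JobStart:
    job_id: int


@dataclass
class JobEnd:
    job_id: int
    succeeded: bool


class FirehoseListener:
    """Adapter: forwards every typed callback to a single on_event method."""

    def on_job_start(self, event):
        self.on_event(event)

    def on_job_end(self, event):
        self.on_event(event)

    def on_event(self, event):
        raise NotImplementedError


class FailureLogger(FirehoseListener):
    """Example subclass that only cares about failed jobs."""

    def __init__(self):
        self.failed_jobs = []

    def on_event(self, event):
        # A single place to dispatch on the event type.
        if isinstance(event, JobEnd) and not event.succeeded:
            self.failed_jobs.append(event.job_id)
```

The subclass overrides one method instead of one per event type, which is the convenience the adapter is meant to provide.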

Re: Pyspark dataframe read

2015-10-06 Thread Josh Rosen
Could someone please file a JIRA to track this? https://issues.apache.org/jira/browse/SPARK On Tue, Oct 6, 2015 at 1:21 AM, Koert Kuipers wrote: > i ran into the same thing in scala api. we depend heavily on comma > separated paths, and it no longer works. > > > On Tue, Oct 6, 2015 at 3:02 AM, B

Re: Spark 1.5.1 - Scala 2.10 - Hadoop 1 package is missing from S3

2015-10-05 Thread Josh Rosen
I'm working on a fix for this right now. I'm planning to re-run a modified copy of the release packaging scripts which will emit only the missing artifacts (so we won't upload new artifacts with different SHAs for the builds which *did* succeed). I expect to have this finished in the next day or s

Re: Test workflow - blacklist entire suites and run any independently

2015-09-21 Thread Josh Rosen
For quickly running individual suites: https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-RunningIndividualTests On Mon, Sep 21, 2015 at 8:21 AM, Adam Roberts wrote: > Hi, is there an existing way to blacklist any test suite? > > Ideally we'd have a tex

Does anyone use ShuffleDependency directly?

2015-09-18 Thread Josh Rosen
Does anyone use ShuffleDependency directly in their Spark code or libraries? If so, how do you use it? Similarly, does anyone use ShuffleHandle

Re: Building with sbt "impossible to get artifacts when data has not been loaded"

2015-08-26 Thread Josh Rosen
I ran into a similar problem while working on the spark-redshift library and was able to fix it by bumping that library's ScalaTest version. I'm still fighting some mysterious Scala issues while trying to test the spark-csv library against 1.5.0-RC1, so it's possible that a build or dependency chan

Re: Automatically deleting pull request comments left by AmplabJenkins

2015-08-14 Thread Josh Rosen
The updated prototype listed in https://github.com/databricks/spark-pr-dashboard/pull/59 is now running live on spark-prs as part of its PR comment update task. On Fri, Aug 14, 2015 at 10:51 AM, Josh Rosen wrote: > I think that I'm still going to want some custom code to remove t

Re: Automatically deleting pull request comments left by AmplabJenkins

2015-08-14 Thread Josh Rosen
ed to be more informative, since our custom SparkQA provides nicer output. On Fri, Aug 14, 2015 at 1:57 AM, Iulian Dragoș wrote: > > > On Fri, Aug 14, 2015 at 4:21 AM, Josh Rosen wrote: > >> Prototype is at https://github.com/databricks/spark-pr-dashboard/pull/59 >> >

Re: Automatically deleting pull request comments left by AmplabJenkins

2015-08-13 Thread Josh Rosen
Prototype is at https://github.com/databricks/spark-pr-dashboard/pull/59 On Wed, Aug 12, 2015 at 7:51 PM, Josh Rosen wrote: > *TL;DR*: would anyone object if I wrote a script to auto-delete pull > request comments from AmplabJenkins? > > Currently there are two bots which post

Re: Is OutputCommitCoordinator necessary for all the stages ?

2015-08-11 Thread Josh Rosen
Can you clarify what you mean by "used for all stages"? OutputCommitCoordinator RPCs should only be initiated through SparkHadoopMapRedUtil.commitTask(), so while the OutputCommitCoordinator doesn't make a distinction between ShuffleMapStages and ResultStages there still should not be a perform

Re: Avoiding unnecessary build changes until tests are in better shape

2015-08-05 Thread Josh Rosen
+1. I've been holding off on reviewing / merging patches like the run-tests-jenkins Python refactoring for exactly this reason. On 8/5/15 11:24 AM, Patrick Wendell wrote: Hey All, Was wondering if people would be willing to avoid merging build changes until we have put the tests in better sha

Master JIRA ticket for tracking Spark 1.5.0 configuration renames, defaults changes, and configuration deprecation

2015-08-02 Thread Josh Rosen
To help us track planned / finished configuration renames, defaults changes, and configuration deprecation for the upcoming 1.5.0 release, I have created https://issues.apache.org/jira/browse/SPARK-9550. As you make configuration changes or think of configurations that need to be audited, please u

Re: Should spark-ec2 get its own repo?

2015-08-01 Thread Josh Rosen
I don't think that using git submodules is a good idea here: - The extra `git submodule init && git submodule update` step can lead to confusing problems in certain workflows. - We'd wind up with many commits that serve only to bump the submodule SHA; these commits will be hard to revi

Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature

2015-07-31 Thread Josh Rosen
It would also be great to test this with codegen and unsafe enabled but while continuing to use sort shuffle manager instead of the new tungsten-sort one. On Fri, Jul 31, 2015 at 1:39 AM, Reynold Xin wrote: > Is this deterministically reproducible? Can you try this on the latest > master branch?

Re: Worker memory leaks?

2015-07-27 Thread Josh Rosen
capsulates exactly one > application. That is we create a single context per application submitted > and close it upon success/failure completion of the application. > > Thanks, > > On Mon, Jul 20, 2015 at 3:20 PM, Josh Rosen > wrote: > >> Hi Richard, >> >>

Re: non-deprecation compiler warnings are upgraded to build errors now

2015-07-26 Thread Josh Rosen
Given that 2.11 may be more stringent with respect to warnings, we might consider building with 2.11 instead of 2.10 in the pull request builder. This would also have some secondary benefits in terms of letting us use tools like Scapegoat or SCoverage highlighting. On Sat, Jul 25, 2015 at 8:52 AM,

Re: Worker memory leaks?

2015-07-20 Thread Josh Rosen
Hi Richard, Thanks for your detailed investigation of this issue. I agree with your observation that the finishedExecutors hashmap is a source of memory leaks for very-long-lived clusters. It looks like the finishedExecutors map is only read when rendering the Worker Web UI and in constructing R

Re: KinesisStreamSuite failing in master branch

2015-07-19 Thread Josh Rosen
Yep, I emailed TD about it; I think that we may need to make a change to the pull request builder to fix this. Pending that, we could just revert the commit that added this. On Sun, Jul 19, 2015 at 5:32 PM, Ted Yu wrote: > Hi, > I noticed that KinesisStreamSuite fails for both hadoop profiles i

Re: why doesn't jenkins like me?

2015-07-17 Thread Josh Rosen
The "It is not a test" failed test message means that something went wrong in a suite-wide setup or teardown method. This could be some sort of race or flakiness. If this problem persists, we should file a JIRA and label it with "flaky-test" so that we can find it later. On Thu, Jul 16, 2015 at

Re: KryoSerializer gives class cast exception

2015-07-17 Thread Josh Rosen
We've run into other problems caused by our old Kryo versions. I agree that the Chill dependency is one of the main blockers to upgrading Kryo, but I don't think that it's insurmountable: if necessary, we could just publish our own forked version of Chill under our own namespace, similar to what we

Re: problems with build of latest the master

2015-07-15 Thread Josh Rosen
We may be able to fix this from the Spark side by adding appropriate exclusions in our Hadoop dependencies, right? If possible, I think that we should do this. On Wed, Jul 15, 2015 at 7:10 AM, Ted Yu wrote: > I attached a patch for HADOOP-12235 > > BTW openstack was not mentioned in the first e

Re: Joining Apache Spark

2015-07-13 Thread Josh Rosen
Also, check out https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Mon, Jul 13, 2015 at 4:08 PM, Marcelo Vanzin wrote: > Hello, welcome, and please start by going through the web site ( > http://spark.apache.org/), especially the "Contributors" section at the > bottom. >

Re: Spark master broken?

2015-07-12 Thread Josh Rosen
I think it is just broken for 2.11 since pull requests are building properly. Sent from my phone > On Jul 12, 2015, at 8:22 AM, René Treffer wrote: > > Java 8, make-distribution > > Jenkins does show the same error, though: > https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-Snaps

Re: The latest master branch didn't compile with -Phive?

2015-07-09 Thread Josh Rosen
Jenkins runs compile-only builds for Maven as an early warning system for this type of issue; you can see from https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/ that the Maven compilation is now broken in master. On Thu, Jul 9, 2015 at 8:48 AM, Ted Yu wrote: > I guess the compilation

Re: [VOTE] Release Apache Spark 1.4.1 (RC3)

2015-07-08 Thread Josh Rosen
I've filed https://issues.apache.org/jira/browse/SPARK-8903 to fix the DataFrameStatSuite test failure. The problem turned out to be caused by a mistake made while resolving a merge-conflict when backporting that patch to branch-1.4. I've submitted https://github.com/apache/spark/pull/7295 to fix

Re: Spark 1.5.0-SNAPSHOT broken with Scala 2.11

2015-06-28 Thread Josh Rosen
The 2.11 compile build is going to be green because this is an issue with tests, not compilation. On Sun, Jun 28, 2015 at 6:30 PM, Ted Yu wrote: > Spark-Master-Scala211-Compile build is green. > > However it is not clear what the actual command is: > > [EnvInject] - Variables injected successful

Re: [SQL] codegen on wide dataset throws StackOverflow

2015-06-26 Thread Josh Rosen
Which Spark version are you using? Can you file a JIRA for this issue? On Thu, Jun 25, 2015 at 6:35 AM, Peter Rudenko wrote: > Hi, i have a small but very wide dataset (2000 columns). Trying to > optimize Dataframe pipeline for it, since it behaves very poorly comparing > to rdd operation. > W

Re: Error in invoking a custom StandaloneRecoveryModeFactory in java env (Spark v1.3.0)

2015-06-24 Thread Josh Rosen
This sounds like https://issues.apache.org/jira/browse/SPARK-7436, which has been fixed in Spark 1.4+ and in branch-1.3 (for Spark 1.3.2). On Wed, Jun 24, 2015 at 10:57 PM, Niranda Perera wrote: > Hi all, > > I'm trying to implement a custom StandaloneRecoveryModeFactory in the Java > environmen

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-24 Thread Josh Rosen
At least a couple of those issues are mistargeted; some of the flaky test JIRAs + test improvement tasks should probably be targeted for 1.5.0 instead. On Wed, Jun 24, 2015 at 8:56 AM, Patrick Wendell wrote: > Hey Sean, > > This is being shipped now because there is a severe bug in 1.4.0 that >

Re: [jenkins] ERROR: Publisher 'Publish JUnit test result report' failed: No test report files were found. Configuration error?

2015-06-21 Thread Josh Rosen
This is a side effect of the new pull request tester script interacting badly with a Jenkins plugin, not anything caused by your changes. I'm working on a fix but in the meantime I'd just trust what SparkQA says. Sent from my phone > On Jun 21, 2015, at 1:54 PM, Yu Ishikawa wrote: > > Hi all,

Re: [Tungsten] NPE in UnsafeShuffleWriter.java

2015-06-19 Thread Josh Rosen
I've filed https://issues.apache.org/jira/browse/SPARK-8498 to fix this error-handling code. On Fri, Jun 19, 2015 at 11:51 AM, Josh Rosen wrote: > Hey Peter, > > I think that this is actually due to an error-handling issue: if you look > at the stack trace that you posted,

Re: [Tungsten] NPE in UnsafeShuffleWriter.java

2015-06-19 Thread Josh Rosen
Hey Peter, I think that this is actually due to an error-handling issue: if you look at the stack trace that you posted, the NPE is being thrown from an error-handling branch of a `finally` block: @Override public void write(scala.collection.Iterator<Product2<K, V>> records) throws IOException { boolean success

Re: Sidebar: issues targeted for 1.4.0

2015-06-16 Thread Josh Rosen
Whatever you do, DO NOT use the built-in JIRA 'releases' feature to migrate issues from 1.4.0 to another version: the JIRA feature will have the side-effect of automatically changing the target versions for issues that have been closed, which is going to be really confusing. I've made this mistake

Re: PySpark on PyPi

2015-06-05 Thread Josh Rosen
This has been proposed before: https://issues.apache.org/jira/browse/SPARK-1267 There's currently tighter coupling between the Python and Java halves of PySpark than just requiring SPARK_HOME to be set; if we did this, I bet we'd run into tons of issues when users try to run a newer version of the

Re: Possible space improvements to shuffle

2015-06-02 Thread Josh Rosen
The relevant JIRA that springs to mind is https://issues.apache.org/jira/browse/SPARK-2926 If an aggregator and ordering are both defined, then the map side of sort-based shuffle will sort based on the key ordering so that map-side spills can be efficiently merged. We do not currently do a sort-b
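
As a rough illustration of why key-sorted spills can be merged efficiently, here is a minimal sketch of a k-way merge over sorted runs that combines values for equal keys as it goes. It is not Spark's shuffle implementation: the function name `merge_sorted_spills`, the in-memory lists standing in for spill files, and the `combine` callback are assumptions made for this example.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_sorted_spills(spills, combine):
    """Merge key-sorted runs of (key, value) pairs, combining values per key.

    `spills` is an iterable of iterables, each already sorted by key.
    `combine` is a two-argument function that merges two values for one key.
    """
    # heapq.merge streams the runs in key order without loading them all at once.
    merged = heapq.merge(*spills, key=itemgetter(0))
    # Equal keys are now adjacent, so a single pass can aggregate them.
    for key, group in groupby(merged, key=itemgetter(0)):
        values = (value for _, value in group)
        accumulated = next(values)
        for value in values:
            accumulated = combine(accumulated, value)
        yield key, accumulated


# Three hypothetical spills, each sorted by key.
spills = [
    [("a", 1), ("c", 3)],
    [("a", 2), ("b", 5)],
    [("b", 1), ("c", 1)],
]
totals = list(merge_sorted_spills(spills, lambda left, right: left + right))
```

Because every run is already sorted, the merge only needs to hold one pending record per run, rather than building a hash map of all keys.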
