Re: Official Stance on Not Using Spark Submit

2016-10-10 Thread Marcin Tustin
I've done this for some pyspark stuff. I didn't find it especially problematic. On Mon, Oct 10, 2016 at 12:58 PM, Reynold Xin wrote: > How are they using it? Calling some main function directly? > > > On Monday, October 10, 2016, Russell Spitzer > wrote: > >> I've seen a variety of users attemp

Re: welcoming Xiao Li as a committer

2016-10-04 Thread Marcin Tustin
Congratulations Xiao 🎉 On Tuesday, October 4, 2016, Reynold Xin wrote: > Hi all, > > Xiao Li, aka gatorsmile, has recently been elected as an Apache Spark > committer. Xiao has been a super active contributor to Spark SQL. Congrats > and welcome, Xiao! > > - Reynold > > -- Want to work at Hand

Re: IllegalArgumentException: spark.sql.execution.id is already set

2016-09-30 Thread Marcin Tustin
The solution is to strip it out in a hook on your threadpool, by overriding beforeExecute. See: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadPoolExecutor.html On Fri, Sep 30, 2016 at 7:08 AM, Grant Digby wrote: > Thanks for the link. Yeah if there's no need to copy execut
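
A minimal sketch of the beforeExecute hook described above, in Java. `ExecutionContext.EXECUTION_ID` is a hypothetical stand-in for the inherited setting; it is not how Spark actually stores spark.sql.execution.id. In a real Spark application the hook would presumably clear the corresponding local property on the SparkContext instead; that part is an assumption and is not shown here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an inherited per-thread setting.
class ExecutionContext {
    static final InheritableThreadLocal<String> EXECUTION_ID = new InheritableThreadLocal<>();
}

// Thread pool that strips the inherited value before every task runs.
class CleaningThreadPool extends ThreadPoolExecutor {
    CleaningThreadPool(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        // beforeExecute runs in the worker thread, so remove() clears that worker's copy.
        ExecutionContext.EXECUTION_ID.remove();
        super.beforeExecute(t, r);
    }
}

class BeforeExecuteDemo {
    // Returns the value a pooled task observes while the submitting thread has the value set.
    static String valueSeenByTask() {
        ExecutionContext.EXECUTION_ID.set("42");
        CleaningThreadPool pool = new CleaningThreadPool(1);
        Callable<String> task = () -> ExecutionContext.EXECUTION_ID.get();
        try {
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
            ExecutionContext.EXECUTION_ID.remove();
        }
    }
}
```

The worker thread is created by the submitting thread and so inherits "42", but the hook removes it before the task body runs, so `BeforeExecuteDemo.valueSeenByTask()` returns null.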

Re: IllegalArgumentException: spark.sql.execution.id is already set

2016-09-29 Thread Marcin Tustin
And that PR as promised: https://github.com/apache/spark/pull/12456 On Thu, Sep 29, 2016 at 5:18 AM, Grant Digby wrote: > Yeah that would work although I was worried that they used > InheritableThreadLocal vs Threadlocal because they did want the child > threads to inherit the parent's execution

Re: IllegalArgumentException: spark.sql.execution.id is already set

2016-09-29 Thread Marcin Tustin
That's not possible because inherited primitive values are copied, not shared. Clearing problematic values on thread creation should eliminate this problem. As to your idea as a design goal, that's also not desirable, because Java thread pooling is implemented in a very surprising way. The standar

Re: IllegalArgumentException: spark.sql.execution.id is already set

2016-09-28 Thread Marcin Tustin
I've solved this in the past by using a thread pool which runs clean up code on thread creation, to clear out stale values. On Wednesday, September 28, 2016, Grant Digby wrote: > Hi, > > We've received the following error a handful of times and once it's > occurred > all subsequent queries fail
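
One way the clean-up-on-thread-creation approach could look in Java is a ThreadFactory that wraps each worker's Runnable. This is only a sketch: `CleanThreadFactory` and `STALE_ID` are hypothetical names, and `STALE_ID` stands in for whatever inherited value needs clearing.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Hypothetical factory that runs a clean-up action at the start of every thread it creates.
class CleanThreadFactory implements ThreadFactory {
    private final Runnable cleanup;

    CleanThreadFactory(Runnable cleanup) {
        this.cleanup = cleanup;
    }

    @Override
    public Thread newThread(Runnable r) {
        // The wrapper executes inside the new thread, so the clean-up clears
        // that thread's inherited copy before the worker loop starts.
        return new Thread(() -> {
            cleanup.run();
            r.run();
        });
    }
}

class ThreadCreationCleanupDemo {
    // Hypothetical stand-in for a stale inherited value.
    static final InheritableThreadLocal<String> STALE_ID = new InheritableThreadLocal<>();

    // Returns what a pooled task sees, with or without the cleaning factory.
    static String valueSeenByTask(boolean clean) {
        STALE_ID.set("stale-7");
        ThreadFactory factory = clean
                ? new CleanThreadFactory(STALE_ID::remove)
                : Executors.defaultThreadFactory();
        ExecutorService pool = Executors.newFixedThreadPool(1, factory);
        Callable<String> task = () -> STALE_ID.get();
        try {
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
            STALE_ID.remove();
        }
    }
}
```

With the default factory the task sees the inherited "stale-7"; with the cleaning factory it sees null. The clean-up has to run inside the new thread rather than in `newThread` itself, because `newThread` is called on the submitting thread.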

Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-20 Thread Marcin Tustin
t 12:18 PM, Michael Allman wrote: > Marcin, > > I'm not sure what you're referring to. Can you be more specific? > > Cheers, > > Michael > > On Jul 20, 2016, at 9:10 AM, Marcin Tustin wrote: > > Whatever happened with the query regarding benchmarks? Is that re

Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-20 Thread Marcin Tustin
Whatever happened with the query regarding benchmarks? Is that resolved? On Tue, Jul 19, 2016 at 10:35 PM, Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes > if a majority o

Re: Send real-time alert using Spark

2016-07-12 Thread Marcin Tustin
Priya, You wouldn't necessarily "use spark" to send the alert. Spark is in an important sense one library among many. You can have your application use any other library available for your language to send the alert. Marcin On Tue, Jul 12, 2016 at 9:25 AM, Priya Ch wrote: > Hi All, > > I am b

Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-05 Thread Marcin Tustin
+1, agree that right now the problem is theoretical, esp. if the preview label is in the version coordinates, as it should be. On Saturday, June 4, 2016, Sean Owen wrote: > Artifacts that are not for public consumption shouldn't be in a public > release; this is instead what nightlies are for. However,

Re: Welcoming Yanbo Liang as a committer

2016-06-04 Thread Marcin Tustin
Congrats! On Friday, June 3, 2016, Matei Zaharia wrote: > Hi all, > > The PMC recently voted to add Yanbo Liang as a committer. Yanbo has been a > super active contributor in many areas of MLlib. Please join me in > welcoming Yanbo! > > Matei > ---

Re: Some minor LICENSE and NOTICE issues with 2.0 preview release

2016-06-02 Thread Marcin Tustin
Changing the maven co-ordinates is going to cause everyone in the world who uses a maven-based build system to have to update their builds. Given that sbt uses ivy by default, that's likely to affect almost every spark user. Unless we can articulate what the extra legal protections are (and frankly I

Re: NLP & Constraint Programming

2016-05-30 Thread Marcin Tustin
Hi Ralph, You could look at https://spark-packages.org/ and see if there's anything you want on there, and if not release your packages there. Constraint programming might benefit from integration into Spark, though. Marcin On Mon, May 30, 2016 at 7:12 AM, Debusmann, Ralph wrote: > Hi, > > >

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Marcin Tustin
The use case of docker images in general is that you can deploy and develop with exactly the same binary environment - same java 8, same scala, same spark. This makes things repeatable. On Wed, May 25, 2016 at 8:38 PM, Matei Zaharia wrote: > Just wondering, what is the main use case for the Dock

Spark docker image - does that sound useful?

2016-05-25 Thread Marcin Tustin
think the project would bless anything but the standard > release artifacts since only those are voted on. People are free to > maintain whatever they like and even share it, as long as it's clear > it's not from the Apache project. > > On Wed, May 25, 2016 at 3:41 PM, Marci

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Marcin Tustin
Ah very nice. Would it be possible to have this blessed into an official image? On Wed, May 25, 2016 at 4:12 PM, Luciano Resende wrote: > > > On Wed, May 25, 2016 at 6:53 AM, Marcin Tustin > wrote: > >> Would it be useful to start baking docker images? Would anyone find th

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Marcin Tustin
Would it be useful to start baking docker images? Would anyone find that a boon to their testing? On Wed, May 25, 2016 at 2:44 AM, Reynold Xin wrote: > In the past the Spark community have created preview packages (not > official releases) and used those as opportunities to ask community members

Re: RDD.broadcast

2016-04-28 Thread Marcin Tustin
join attribute -> collectAsMap > ->Broadcast MapA; > > b. RddB -> map (Broadcast> MapA; > > > > The first use case might not be that common, but joining a large RDD with > a small (reference) RDD is quite common and much faster than using “join” > method. >

Re: RDD.broadcast

2016-04-28 Thread Marcin Tustin
Why would you ever need to do this? I'm genuinely curious. I view collects as being solely for interactive work. On Thursday, April 28, 2016, wrote: > Hi, > > > > It is a common pattern to process an RDD, collect (typically a subset) to > the driver and then broadcast back. > > > > Adding an RDD

Re: [Spark-SQL] Reduce Shuffle Data by pushing filter toward storage

2016-04-21 Thread Marcin Tustin
I think that's an important result. Could you format your email to split out your parts a little more? It all runs together for me in gmail, so it's hard to follow, and I very much would like to. On Thu, Apr 21, 2016 at 2:07 PM, atootoonchian wrote: > SQL query planner can have intelligence to p

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcin Tustin
Let's posit that the spark example is much better than what is available in HBase. Why is that a reason to keep it within Spark? On Tue, Apr 19, 2016 at 1:59 PM, Ted Yu wrote: > bq. HBase's current support, even if there are bugs or things that still > need to be done, is much better than the Sp

Re: YARN Shuffle service and its compatibility

2016-04-18 Thread Marcin Tustin
I'm good with option B at least until it blocks something utterly wonderful (like shuffles are 10x faster). On Mon, Apr 18, 2016 at 4:51 PM, Mark Grover wrote: > Hi all, > If you don't use Spark on YARN, you probably don't need to read further. > > Here's the *user scenario*: > There are going t

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Marcin Tustin
+1 and at the same time maybe surface a report to this list of PRs which need committer action and have only had submitters responding to pings in the last 30 days? On Mon, Apr 18, 2016 at 3:33 PM, Holden Karau wrote: > Personally I'd rather err on the side of keeping PRs open, but I > understan

Re: Recent Jenkins always fails in specific two tests

2016-04-17 Thread Marcin Tustin
Also hitting this: https://github.com/apache/spark/pull/12455. On Sun, Apr 17, 2016 at 9:22 PM, Hyukjin Kwon wrote: > +1 > > Yea, I am facing this problem as well, > https://github.com/apache/spark/pull/12452 > > I thought they are spurious because the tests are passed in my local. > > > > 201

Re: Should localProperties be inheritable? Should we change that or document it?

2016-04-15 Thread Marcin Tustin
On Wed, Apr 13, 2016 at 6:15 AM, Marcin Tustin > wrote: > >> *Tl;dr: *SparkContext.setLocalProperty is implemented with >> InheritableThreadLocal. >> This has unexpected consequences, not least because the method >> documentation doesn't say anything about it:
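
To illustrate the kind of surprise an InheritableThreadLocal can cause with pooled threads, here is a small Java sketch. It does not use Spark; `PROPERTY` is a hypothetical stand-in for a value stored this way, and the example only shows the general JVM behaviour, not Spark's own implementation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class InheritanceDemo {
    // Hypothetical stand-in for a property kept in an InheritableThreadLocal.
    static final InheritableThreadLocal<String> PROPERTY = new InheritableThreadLocal<>();

    // Returns what two consecutive pooled tasks observe while the
    // parent thread changes the value between submissions.
    static String[] observedValues() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<String> read = () -> PROPERTY.get();
        try {
            PROPERTY.set("job-1");
            // The worker thread is created on first submit and copies "job-1".
            String first = pool.submit(read).get();
            PROPERTY.set("job-2");
            // The same worker is reused and still holds its own copy.
            String second = pool.submit(read).get();
            return new String[] { first, second };
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
            PROPERTY.remove();
        }
    }
}
```

Both tasks observe "job-1": the value is copied once, when the worker thread is created, and later changes in the parent thread never reach the pooled worker.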

Should localProperties be inheritable? Should we change that or document it?

2016-04-13 Thread Marcin Tustin
upport for this, and ideally secure a committer who can help shepherd this change through. Marcin Tustin