FYI - marking data type APIs stable

2016-10-10 Thread Reynold Xin
I noticed today that our data type APIs (org.apache.spark.sql.types) are actually DeveloperApis, which means they can be changed from one feature release to another. In reality these APIs have been there since the original introduction of the DataFrame API in Spark 1.3, and have not seen any breaki

Kryo on Zeppelin

2016-10-10 Thread Fei Hu
Hi All, I am running some Spark Scala code on Zeppelin on CDH 5.5.1 (Spark version 1.5.0). I customized the Spark interpreter to use org.apache.spark.serializer.KryoSerializer as spark.serializer. And in the dependency I added Kryo-3.0.3 as follows: com.esotericsoftware:kryo:3.0.3 When I wro

[no subject]

2016-10-10 Thread Fei Hu
Hi All, I am running some Spark Scala code on Zeppelin on CDH 5.5.1 (Spark version 1.5.0). I customized the Spark interpreter to use org.apache.spark.serializer.KryoSerializer as spark.serializer. And in the dependency I added Kryo-3.0.3 as follows: com.esotericsoftware:kryo:3.0.3 When I wrot

Re: Quotes within a table name (phoenix table) getting failure: identifier expected at Spark level parsing

2016-10-10 Thread Xiao Li
Hi, Nico, It sounds like you hit a bug in Phoenix Connector. Our general JDBC connector already fixed it, I think. Thanks, Xiao 2016-10-10 15:29 GMT-07:00 Nico Pappagianis : > Hi Xiao, when I try that it gets past spark's sql parser then errors out > at the phoenix sql parser. > > org.apache.p

Re: Improving governance / committers (split from Spark Improvement Proposals thread)

2016-10-10 Thread Holden Karau
I think it is really important to ensure that someone with a good understanding of Kafka is empowered around this component with a formal voice around - but I don't have much dev experience with our Kafka connectors so I can't speak to the specifics around it personally. More generally, I also fee

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
If someone wants to tell me that it's OK and "The Apache Way" for Kafka and Flink to have a proposal process that ends in a lazy majority, but it's not OK for Spark to have a proposal process that ends in a non-lazy consensus... https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+P

Re: Spark Improvement Proposals

2016-10-10 Thread Mark Hamstra
There is a larger issue to keep in mind, and that is that what you are proposing is a procedure that, as far as I am aware, hasn't previously been adopted in an Apache project, and thus is not an easy or exact fit with established practices that have been blessed as "The Apache Way". As such, we n

Re: Spark Improvement Proposals

2016-10-10 Thread Mark Hamstra
I'm not a fan of the SEP acronym. Besides its prior established meaning of "Somebody else's problem", there are other inappropriate or offensive connotations such as this Australian slang that often gets shortened to just "sep": http://www.urbandictionary.com/define.php?term=Seppo On Sun, Oct 9, 20

Re: Spark Improvement Proposals

2016-10-10 Thread Mark Hamstra
If I'm correctly understanding the kind of voting that you are talking about, then to be accurate, it is only the PMC members that have a vote, not all committers: https://www.apache.org/foundation/how-it-works.html#pmc-members On Mon, Oct 10, 2016 at 12:02 PM, Cody Koeninger wrote: > I think th

Re: Quotes within a table name (phoenix table) getting failure: identifier expected at Spark level parsing

2016-10-10 Thread Xiao Li
Hi, Nico, We use back ticks to quote it. For example, CUSTOM_ENTITY.`z02` Thanks, Xiao Li 2016-10-10 12:49 GMT-07:00 Nico Pappagianis : > Hello, > > *Some context:* > I have a Phoenix tenant-specific view named CUSTOM_ENTITY."z02" (Phoenix > tables can have quotes to specify case-sensitivity)
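The back-tick quoting suggested in this reply can be sketched as a small helper. This is a minimal illustration in Python, not code from Spark or Phoenix: `quote_identifier` and `qualified_name` are hypothetical names, and escaping an embedded back tick by doubling it is an assumption about how the Spark SQL parser treats quoted identifiers.

```python
def quote_identifier(part: str) -> str:
    """Wrap one identifier part in back ticks (hypothetical helper).

    Assumption: an embedded back tick is escaped by doubling it.
    """
    return "`" + part.replace("`", "``") + "`"


def qualified_name(schema: str, table: str) -> str:
    """Build schema.`table`, quoting only the case-sensitive table part."""
    return schema + "." + quote_identifier(table)


if __name__ == "__main__":
    # Names taken from the thread: the view CUSTOM_ENTITY."z02" in Phoenix.
    print(qualified_name("CUSTOM_ENTITY", "z02"))
```

As the follow-up in this thread notes, a name quoted this way may get past Spark's SQL parser and still be rejected by the Phoenix parser, so the helper only addresses the Spark side.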

Quotes within a table name (phoenix table) getting failure: identifier expected at Spark level parsing

2016-10-10 Thread Nico Pappagianis
Hello, *Some context:* I have a Phoenix tenant-specific view named CUSTOM_ENTITY."z02" (Phoenix tables can have quotes to specify case-sensitivity). I am attempting to write to this table using Spark via a scala script. I am performing the following read successfully: val table = """CUSTOM_ENTITY

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
Updated on github, https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md I believe I've touched on all feedback with the exception of naming, and API vs Strategy. Do we want a straw poll on naming? Matei, are your concerns about api vs strategy addressed if we add a

Re: Spark Improvement Proposals

2016-10-10 Thread Steve Loughran
This is an interesting process proposal; I think it could work well. -It's got the flavour of the ASF incubator; maybe some of the processes there: mentor, regular reporting in could help, in particular, help stop the -1 at the end of the work -it may also aid collaboration to have a medium live

Re: Spark Improvement Proposals

2016-10-10 Thread Matei Zaharia
Agreed with this. As I said before regarding who submits: it's not a normal ASF process to require contributions to only come from committers. Committers are of course the only people who can *commit* stuff. But the whole point of an open source project is that anyone can *contribute* -- indeed,

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
That seems reasonable to me. I do not want to see lazy consensus used on one of these proposals though, I want a clear outcome, i.e. call for a vote, wait at least 72 hours, get three +1s and no vetoes. On Mon, Oct 10, 2016 at 2:15 PM, Ryan Blue wrote: > Proposal submission: I think we should k
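The outcome rule stated in this message (a vote open for at least 72 hours, at least three +1s, and no vetoes) can be written down as a short check. This is only a sketch of that one sentence: `Vote` and `proposal_passes` are hypothetical names, and the sketch does not model who holds a binding vote, which other messages in the thread discuss separately.

```python
from dataclasses import dataclass
from typing import Iterable

MIN_VOTE_HOURS = 72  # "wait at least 72 hours"
MIN_PLUS_ONES = 3    # "get three +1s"


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int  # +1 approve, 0 abstain, -1 veto


def proposal_passes(votes: Iterable[Vote], hours_open: float) -> bool:
    """Return True only if the vote ran long enough, has enough +1s, and no veto."""
    votes = list(votes)
    plus_ones = sum(1 for v in votes if v.value == 1)
    vetoes = sum(1 for v in votes if v.value == -1)
    return hours_open >= MIN_VOTE_HOURS and plus_ones >= MIN_PLUS_ONES and vetoes == 0


if __name__ == "__main__":
    ballots = [Vote("a", 1), Vote("b", 1), Vote("c", 1), Vote("d", 0)]
    print(proposal_passes(ballots, hours_open=72))
```

Under this rule a single -1 blocks the proposal regardless of how many +1s it has, which is the difference from the majority rule debated elsewhere in the thread.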

Re: Spark Improvement Proposals

2016-10-10 Thread Ryan Blue
Proposal submission: I think we should keep this as open as possible. If there is a problem with too many open proposals, then we should tackle that as a fix rather than excluding participation. Perhaps it will end up that way, but I think it's worth trying a more open model first. Majority vs con

Re: Official Stance on Not Using Spark Submit

2016-10-10 Thread Ofir Manor
Funny, someone from my team talked to me about that idea yesterday. We use SparkLauncher, but it just calls spark-submit that calls other scripts that start a new Java program that tries to submit (in our case in cluster mode - driver is started in the Spark cluster) and exit. That makes it a chall
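For readers unfamiliar with the chain described here, the first link is an ordinary spark-submit command line. The sketch below only assembles such a command for cluster mode and does not run it. `build_spark_submit_args` is a hypothetical helper, and the master, class, and jar values in the usage lines are placeholders rather than anything from the thread. The flags used are `--master`, `--deploy-mode`, `--class`, and `--conf`.

```python
from typing import Dict, List, Optional


def build_spark_submit_args(
    app_jar: str,
    main_class: str,
    master: str,
    conf: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble a spark-submit argument list for cluster deploy mode."""
    args = [
        "spark-submit",
        "--master", master,
        "--deploy-mode", "cluster",  # the driver is started inside the cluster
        "--class", main_class,
    ]
    # Sort keys so the generated command line is deterministic.
    for key, value in sorted((conf or {}).items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app_jar)
    return args


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_spark_submit_args("app.jar", "com.example.Main", "yarn"))
```

Handing this list to a process launcher would start the script chain the message describes. In cluster mode the submitting process exits after handing off the driver, which is the behavior the message is pointing at.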

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
I think this is closer to a procedural issue than a code modification issue, hence why majority. If everyone thinks consensus is better, I don't care. Again, I don't feel strongly about the way we achieve clarity, just that we achieve clarity. On Mon, Oct 10, 2016 at 2:02 PM, Ryan Blue wrote: >

Re: Spark Improvement Proposals

2016-10-10 Thread Ryan Blue
Sorry, I missed that the proposal includes majority approval. Why majority instead of consensus? I think we want to build consensus around these proposals and it makes sense to discuss until no one would veto. rb On Mon, Oct 10, 2016 at 11:54 AM, Ryan Blue wrote: > +1 to votes to approve propos

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
I think the main value is in being honest about what's going on. No one other than committers can cast a meaningful vote, that's the reality. Beyond that, if people think it's more open to allow formal proposals from anyone, I'm not necessarily against it, but my main question would be this: If

Re: Spark Improvement Proposals

2016-10-10 Thread Ryan Blue
+1 to votes to approve proposals. I agree that proposals should have an official mechanism to be accepted, and a vote is an established means of doing that well. I like that it includes a period to review the proposal and I think proposals should have been discussed enough ahead of a vote to surviv

Re: Official Stance on Not Using Spark Submit

2016-10-10 Thread Russell Spitzer
Just folks who don't want to use spark-submit, no real use-cases I've seen yet. I didn't know about SparkLauncher myself and I don't think there are any official docs on that or launching spark as an embedded library for tests. On Mon, Oct 10, 2016 at 11:09 AM Matei Zaharia wrote: > What are th

Re: Official Stance on Not Using Spark Submit

2016-10-10 Thread Matei Zaharia
What are the main use cases you've seen for this? Maybe we can add a page to the docs about how to launch Spark as an embedded library. Matei > On Oct 10, 2016, at 10:21 AM, Russell Spitzer > wrote: > > I actually had not seen SparkLauncher before, that looks pretty great :) > > On Mon, Oct

Re: Official Stance on Not Using Spark Submit

2016-10-10 Thread Russell Spitzer
I actually had not seen SparkLauncher before, that looks pretty great :) On Mon, Oct 10, 2016 at 10:17 AM Russell Spitzer wrote: > I'm definitely only talking about non-embedded uses here as I also use > embedded Spark (cassandra, and kafka) to run tests. This is almost always > safe since every

Re: Official Stance on Not Using Spark Submit

2016-10-10 Thread Russell Spitzer
I'm definitely only talking about non-embedded uses here as I also use embedded Spark (cassandra, and kafka) to run tests. This is almost always safe since everything is in the same JVM. It's only once we get to launching against a real distributed env that we end up with issues. Since PySpark uses

Re: Official Stance on Not Using Spark Submit

2016-10-10 Thread Sean Owen
I have also 'embedded' a Spark driver without much trouble. It isn't that it can't work. The Launcher API is probably the recommended way to do that though. spark-submit is the way to go for non-programmatic access. If you're not doing one of those things and it is not working, yeah I think peopl

Re: Official Stance on Not Using Spark Submit

2016-10-10 Thread Marcin Tustin
I've done this for some pyspark stuff. I didn't find it especially problematic. On Mon, Oct 10, 2016 at 12:58 PM, Reynold Xin wrote: > How are they using it? Calling some main function directly? > > > On Monday, October 10, 2016, Russell Spitzer > wrote: > >> I've seen a variety of users attemp

Re: Official Stance on Not Using Spark Submit

2016-10-10 Thread Reynold Xin
How are they using it? Calling some main function directly? On Monday, October 10, 2016, Russell Spitzer wrote: > I've seen a variety of users attempting to work around using Spark Submit > with at best middling levels of success. I think it would be helpful if the > project had a clear statemen

Official Stance on Not Using Spark Submit

2016-10-10 Thread Russell Spitzer
I've seen a variety of users attempting to work around using Spark Submit with at best middling levels of success. I think it would be helpful if the project had a clear statement that submitting an application without using Spark Submit is truly for experts only or is unsupported entirely. I know

Spark 2.0.0 job completes but hangs

2016-10-10 Thread jamborta
Hi all, I have a Spark job that takes about an hour to run. In the end it completes all the tasks, then the job just hangs and does nothing (it writes to S3 as the last step, which also gets completed, all files appear on S3). Any ideas how to debug this? See the thread dump below: "Attach Lis

Auto start spark jobs

2016-10-10 Thread Deepak Sharma
Hi All, Is there any way to schedule an always-running Spark job so that it comes back up on its own after cluster maintenance? -- Thanks, Deepak www.bigdatabig.com www.keosha.net

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
Yes, users suggesting SIPs is a good thing and is explicitly called out in the linked document under the Who? section. Formally proposing them, not so much, because of the political realities. Yes, implementation strategy definitely affects goals. There are all kinds of examples of this, I'll pi

Re: This Exception has been really hard to trace

2016-10-10 Thread kant kodali
Hi, I use Gradle and I don't think it really has "provided", but I was able to google and create the following file, but the same error still persists. group 'com.company' version '1.0-SNAPSHOT' apply plugin: 'java' apply plugin: 'idea' repositories { mavenCentral() mavenLocal() } configurations {

Re: Monitoring system extensibility

2016-10-10 Thread Pete Robbins
Yes I agree. I'm not sure how important this is anyway. It's a little annoying but easy to work around. On Mon, 10 Oct 2016 at 09:01 Reynold Xin wrote: > I just took a quick look and set a target version on the JIRA. But Pete I > think the primary problem with the JIRA and pull request is that i

Re: Monitoring system extensibility

2016-10-10 Thread Reynold Xin
I just took a quick look and set a target version on the JIRA. But Pete, I think the primary problem with the JIRA and pull request is that it really just argues (or implements) opening up a private API, which is a valid point but there is a lot more that needs to be done before making some private