Re: [Discuss] Datasource v2 support for Kerberos

2018-10-01 Thread tigerquoll
Hi Steve, I think that passing a Kerberos keytab around is one of those bad ideas that is entirely appropriate to re-question every single time you come across it. It has already been used in Spark when interacting with Kerberos systems that do not support delegation tokens. Any such system will e

Re: On Scala 2.12.7

2018-10-01 Thread Sean Owen
I don't think so. Spark brings the Scala dependency, and I don't think the installed scalac matters in this respect. Darcy, there was an open question about whether this enables you to back out the workaround you created for 2.12.6. I tried removing it and it failed again, so I left it in as still nee

Re: On Scala 2.12.7

2018-10-01 Thread Wenchen Fan
SGTM then. Is there anything we need to do to pick up the 2.12.7 upgrade, such as updating the Jenkins config?
On Tue, Oct 2, 2018 at 10:53 AM Sean Owen wrote:
> I tested both ways, and it actually works fine. It calls into question
> whether there's really a fix we need with 2.12.7, but, I hear two

Re: On Scala 2.12.7

2018-10-01 Thread Sean Owen
I tested both ways, and it actually works fine. It calls into question whether there's really a fix we need with 2.12.7, but, I hear two informed opinions (Darcy and the scala release notes) that it was relevant. As we have no prior 2.12 support, I guess my feeling was indeed to get this update in

Re: On Scala 2.12.7

2018-10-01 Thread Felix Cheung
Although, as you said, Spark support for Scala 2.12 is beta anyway, so shouldn’t we get it to a working state by basing it on 2.12.7? There shouldn’t be a stability issue, since it is not officially “supported”.
From: Wenchen Fan
Sent: Monday, October 1, 2018 7:4

Re: On Scala 2.12.7

2018-10-01 Thread Wenchen Fan
My major concern is how it will affect end users if Spark 2.4 is built with Scala versions prior to 2.12.7. Generally, I'm hesitant to upgrade the Scala version when we are very close to a release, and the Scala 2.12 build of Spark 2.4 is beta anyway.
On Sat, Sep 29, 2018 at 6:46 AM Sean Owen wrote:

Re: BroadcastJoin failed on partitioned parquet table

2018-10-01 Thread Wenchen Fan
I'm not sure if Spark 1.6 is still maintained. Can you try a 2.x Spark version and see if the problem still exists?
On Sun, Sep 30, 2018 at 4:14 PM 白也诗无敌 <445484...@qq.com> wrote:
> Besides, I have tried the ANALYZE statement. It is of no use because I need the
> single partition but get the total table s

Re: Data source V2 in spark 2.4.0

2018-10-01 Thread Wenchen Fan
Ryan, thanks for putting up a list! Generally, there are a few tweaks to the data source v2 API in 2.4, and it shouldn't be too hard to upgrade to Spark 2.4 if you already have a data source v2 implementation. However, we do want to make some big API changes for data source v2 in the ne

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-10-01 Thread Michael Heuer
FYI, I’ve opened two new issues against 2.4.0 RC2, https://issues.apache.org/jira/browse/SPARK-25587 and https://issues.apache.org/jira/browse/SPARK-25588, that are regressions against 2.3.1, and may

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-10-01 Thread Wenchen Fan
This RC fails because of the correctness bug SPARK-25538. I'll start a new RC once the fix (https://github.com/apache/spark/pull/22602) is merged.
Thanks, Wenchen
On Tue, Oct 2, 2018 at 1:21 AM Sean Owen wrote:
> Given that this release is probably still 2 weeks from landing, I don't
> think th

Re: saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-10-01 Thread Jacek Laskowski
Hi, OK. Sorry for the noise. I don't know why it started working, but I cannot reproduce it anymore. Sorry for the false alarm (but I could swear it didn't work and I changed nothing). Back to work... Regards, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit

Re: [DISCUSS] Syntax for table DDL

2018-10-01 Thread Ryan Blue
What do you mean by consistent with the syntax in SqlBase.g4? These aren’t currently defined, so we need to decide what syntax to support. There are more details below, but the syntax I’m proposing is more standard across databases than Hive, which uses confusing and non-standard syntax. I doubt t

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-10-01 Thread Sean Owen
Given that this release is probably still 2 weeks from landing, I don't think that waiting on a spark-tensorflow-connector release with TF 1.12 in mid-October is a big deal. Users can use the library with Spark 2.3.x for a week or two before upgrading, if that's the case. I think this kind of bug f

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-10-01 Thread Xiangrui Meng
On Mon, Oct 1, 2018 at 9:52 AM Holden Karau wrote:
> Oh that does look like an important correctness issue.
> -1
>
> On Mon, Oct 1, 2018, 9:57 AM Marco Gaido wrote:
>
>> -1, I was able to reproduce SPARK-25538 with the provided data.
>>
>> On Mon, Oct 1, 2018 at 09:11, Ted Yu wrote

Re: Data source V2 in spark 2.4.0

2018-10-01 Thread Ryan Blue
Hi Assaf,
The major changes to the V2 API that you linked to aren’t going into 2.4. Those will be in the next release because they weren’t finished in time for 2.4. Here are the major updates that will be in 2.4:
- SPARK-23323: The output

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-10-01 Thread Holden Karau
Oh, that does look like an important correctness issue.
-1
On Mon, Oct 1, 2018, 9:57 AM Marco Gaido wrote:
> -1, I was able to reproduce SPARK-25538 with the provided data.
>
> On Mon, Oct 1, 2018 at 09:11, Ted Yu wrote:
>
>> +1
>>
>> Original message
>> From:

Data source V2 in spark 2.4.0

2018-10-01 Thread assaf.mendelson
Hi all, I understood from previous threads that the Data Source V2 API will see some changes in Spark 2.4.0. However, I can't seem to find what these changes are. Is there some documentation that summarizes the changes? The only mention I can find is this pull request: https://github.com/apa

Re: saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-10-01 Thread Steve Loughran
On 30 Sep 2018, at 19:37, Jacek Laskowski <ja...@japila.pl> wrote:
scala> spark.range(1).write.saveAsTable("demo")
2018-09-30 17:44:27 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
2018-09-30 17:44:28 ERROR FileOutputCommitter:314 - Mkdirs f

Re: [Structured Streaming SPARK-23966] Why non-atomic rename is problem in State Store ?

2018-10-01 Thread Steve Loughran
On 11 Aug 2018, at 17:33, chandan prakash <chandanbaran...@gmail.com> wrote:
Hi All, I was going through this pull request about the new CheckpointFileManager abstraction in Structured Streaming coming in 2.4: https://issues.apache.org/jira/browse/SPARK-23966 https://github.com/apache/spar

Re: Some PRs not automatically linked to JIRAs

2018-10-01 Thread Hyukjin Kwon
It seems fixed, but it looks like it has started leaving duplicate PR links on some recent JIRAs. Not a big deal, but are they perhaps being run in multiple places? For instance, https://issues.apache.org/jira/browse/SPARK-25579 https://issues.apache.org/jira/browse/SPARK-25574 https://issues.apache.org/jira/brow

why y.size is 65536 but y size in new dataset is 1000

2018-10-01 Thread hagersaleh
Please help me. My code is written in Spark with Python, and the error is:
Caused by: java.lang.IllegalArgumentException: requirement failed: BLAS.dot(x: Vector, y:Vector) was given Vectors with non-matching sizes: x.size = 1000, y.size = 65536
Why is y.size 65536 when the size of y in the new dataset is 1000? 1-I train model o

Re: SPIP: Support Kafka delegation token in Structured Streaming

2018-10-01 Thread Gabor Somogyi
Hi Saisai,
The reasons why I originally limited the goal to Structured Streaming are the following:
* I haven't seen big interest in new features in the DStream area
* It separates the concerns, even if there is a need
All in all, I'm happy to port the feature to DStream if you think it is worth it and you can

Re: SPIP: Support Kafka delegation token in Structured Streaming

2018-10-01 Thread Gabor Somogyi
Hi Jungtaek,
Thanks for your comments; I have just responded to them.
BR, G
On Sat, Sep 29, 2018 at 2:50 PM Jungtaek Lim wrote:
> Hi Gabor,
>
> Thanks for proposing the feature. I'm definitely interested to see this
> feature, but honestly I'm not familiar with how Spark deals with delegation
> token

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-10-01 Thread Marco Gaido
-1, I was able to reproduce SPARK-25538 with the provided data.
On Mon, Oct 1, 2018 at 09:11, Ted Yu wrote:
> +1
>
> Original message
> From: Denny Lee
> Date: 9/30/18 10:30 PM (GMT-08:00)
> To: Stavros Kontopoulos
> Cc: Sean Owen, Wenchen Fan, dev <dev@sp

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-10-01 Thread Ted Yu
+1
Original message
From: Denny Lee
Date: 9/30/18 10:30 PM (GMT-08:00)
To: Stavros Kontopoulos
Cc: Sean Owen, Wenchen Fan, dev
Subject: Re: [VOTE] SPARK 2.4.0 (RC2)
+1 (non-binding)
On Sat, Sep 29, 2018 at 10:24 AM Stavros Kontopoulos wrote:
+1
Stavros
On Sat, Sep 2