Which alternatives to ThriftServer do we really have? If the ThriftServer is
gone, there is no other way to connect to Spark SQL over JDBC, and that is
the primary way of connecting BI tools to Spark SQL.
Am I missing something?
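For reference, the connection path being discussed is the HiveServer2-style JDBC endpoint the Thrift Server exposes; a typical client invocation (host, port, and database are placeholders) looks something like:

```shell
# Illustrative only: connect beeline (or any HiveServer2-compatible
# JDBC client / BI tool) to a running Spark Thrift Server.
beeline -u "jdbc:hive2://localhost:10000/default"
```

BI tools generally use the same `jdbc:hive2://` URL with the Hive JDBC driver, which is why removing the Thrift Server would close off this route.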
The question is whether Spark would like to be the tool used
Ok, so let's say you made a Spark DataFrame and you call length -- what do
you expect to happen?
Personally, I expect Spark to evaluate the DataFrame; this is what happens
with collections and even iterables.
The interplay with cache is a bit strange, but presumably if you've marked
your DataFrame for
> (2) If the method forces evaluation, and this matches the most obvious way
it would be implemented, then we should add it with a note in the docstring
I am not sure about this, because forcing evaluation could be something that
has side effects. For example, df.count() can realize a cache, and if we
implement _
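The side-effect concern can be illustrated with a toy sketch (this is not real PySpark; `LazyFrame` and its attributes are made up): a lazy "frame" whose `__len__` forces evaluation, which as a side effect realizes a cache.

```python
# Toy sketch, NOT the PySpark API: illustrates how implementing
# __len__ via something like count() silently forces evaluation
# and can materialize a cache as a side effect.
class LazyFrame:
    def __init__(self, compute):
        self._compute = compute   # deferred computation
        self._cache = None        # realized only on first evaluation
        self.evaluations = 0      # how many times we actually computed

    def __len__(self):
        # len() forcing evaluation, as debated above.
        if self._cache is None:
            self._cache = list(self._compute())
            self.evaluations += 1
        return len(self._cache)

df = LazyFrame(lambda: range(3))
assert len(df) == 3
assert df.evaluations == 1   # first len() realized the cache
assert len(df) == 3
assert df.evaluations == 1   # second len() hit the cache
```

The point of the sketch is that an innocent-looking `len(df)` changes the frame's state (the cache), which is exactly the kind of hidden side effect the reply above is worried about.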
People do use it, and the maintenance cost is pretty low, so I don't think
we should just drop it. We can be explicit that there is not a lot of
development going on and that we are unlikely to add many new features to
it, and users are also welcome to use other JDBC/ODBC endpoint
implementations.
Maybe that's what I really mean (you can tell I don't follow the Hive part
closely)
In my travels, indeed the thrift server has been viewed as an older
solution to a problem probably better met by others.
From my perspective it's worth dropping, but that's just anecdotal.
Any other arguments for
Looks like the majority opinion is for Wednesday. I've sent out an invite
to everyone who replied and will add more people as I hear more responses.
Thanks, everyone!
On Fri, Oct 26, 2018 at 3:23 AM Gengliang Wang wrote:
> +1
>
> On Oct 26, 2018, at 8:45 AM, Hyukjin Kwon wrote:
>
> I didn't k
Hi all,
one big problem with getting rid of the Hive fork is the thriftserver,
which relies on the HiveServer from the Hive fork.
We might migrate to an apache/hive dependency, but I am not sure that would
help much.
I think a broader topic would be whether having a thriftserver
That all sounds reasonable, but I think in the case of 4 (and maybe also 3)
I would rather see it implemented to raise an error message that explains
what's going on and suggests the explicit operation that would do the most
equivalent thing. And perhaps raise a warning (using the warnings module)
for
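A hypothetical sketch of the alternative being proposed here (all names are made up for illustration): instead of silently forcing evaluation, `__len__` raises an error that points at the explicit equivalent, and a softer case issues a warning via the standard `warnings` module.

```python
import warnings

# Hypothetical sketch, not the PySpark implementation: refuse the
# implicit operation with a helpful error, and warn on a softer case.
class Frame:
    def count(self):
        return 42  # stand-in for the real, explicit action

    def __len__(self):
        # Raise instead of forcing evaluation, suggesting the
        # explicit equivalent as the error message.
        raise TypeError(
            "len() would force evaluation of the whole DataFrame; "
            "call df.count() explicitly if that is what you want."
        )

    def __iter__(self):
        # Softer case: allowed, but flagged with a warning.
        warnings.warn(
            "iterating a DataFrame pulls data to the driver",
            stacklevel=2,
        )
        return iter(())

df = Frame()
try:
    len(df)
except TypeError as e:
    assert "df.count()" in str(e)
```

This keeps the user in control: the error names the explicit operation (`df.count()`), and the warning makes the cost of the softer operation visible without blocking it.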
OK let's keep this about Hive.
Right, good point, this is really about supporting metastore versions, and
there is a good argument for retaining backwards-compatibility with older
metastores. I don't know how far, but I guess, as far as is practical?
Isn't there still a lot of Hive 0.x test code?
Hi, Sean and All.
For the first question, we support only Hive Metastore 1.x ~ 2.x, and we
can support Hive Metastore 3.0 simultaneously; Spark is designed like that.
I don't think we need to drop old Hive Metastore support. Is it
for avoiding Hive Metastore sharing between Spark2 and Spark
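For context, the metastore version Spark talks to is already configurable independently of the built-in Hive execution classes, which is what makes supporting several metastore versions at once possible. A typical configuration fragment (the version value is illustrative) looks like:

```
# Illustrative Spark SQL configuration: point Spark at a specific
# Hive Metastore version and let it fetch matching jars.
spark.sql.hive.metastore.version  2.3.3
spark.sql.hive.metastore.jars     maven
```

With this mechanism, dropping built-in support for very old metastores is a separate decision from which metastore versions the configuration above can target.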
Coming out of https://github.com/apache/spark/pull/21654, it was agreed that
the helper methods in question made sense, but there was some desire for a
plan as to which helper methods we should use.
I'd like to propose a lightweight solution to start with, for helper
methods that match either Pandas or g
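A hypothetical sketch of the kind of lightweight helper being proposed: an alias that matches a pandas-style name and simply delegates to an existing operation, with the behavior spelled out in the docstring. Everything below (`Frame`, `size`) is made up for illustration, not the actual PR.

```python
# Toy stand-in for a DataFrame; only the delegation pattern matters.
class Frame:
    def __init__(self, rows):
        self._rows = list(rows)

    def count(self):
        return len(self._rows)

    # Pandas-style helper name that delegates to an existing method;
    # the docstring notes that it forces evaluation.
    def size(self):
        """Pandas-style helper; forces evaluation via count()."""
        return self.count()

df = Frame([1, 2, 3])
assert df.size() == df.count() == 3
```

The appeal of this pattern is low maintenance cost: each helper is a thin, documented alias, so the set of helpers can grow method-by-method as each pandas match is agreed on.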
This is all merged to master/2.4. AFAIK there aren't any items I'm
monitoring that are needed for 2.4.
On Thu, Oct 25, 2018 at 6:54 PM Sean Owen wrote:
> Yep, we're going to merge a change to separate the k8s tests into a
> separate profile, and fix up the Scala 2.12 thing. While non-critical th
Here's another thread to start considering, and I know it's been raised
before.
What version(s) of Hive should Spark 3 support?
If at least we know it won't include Hive 0.x, could we go ahead and remove
those tests from master? It might significantly reduce the run time and
flakiness.
It seems t
Sean,
Yes, I updated the PR and re-ran it.
On Fri, Oct 26, 2018 at 2:54 AM, Sean Owen wrote:
> Yep, we're going to merge a change to separate the k8s tests into a
> separate profile, and fix up the Scala 2.12 thing. While non-critical those
> are pretty nice to have for 2.4. I think that's doab
+1
> On Oct 26, 2018, at 8:45 AM, Hyukjin Kwon wrote:
>
> I didn't know I live in the same timezone with you Wenchen :D.
> Monday or Wednesday at 5PM PDT sounds good to me too FWIW.
>
> On Fri, Oct 26, 2018 at 8:29 AM, Ryan Blue wrote:
> Good point. How about Monday or Wednesday at 5PM PDT then?
>