Which alternatives to ThriftServer do we really have? If the ThriftServer is
gone, there is no other way to connect to Spark SQL over JDBC, and that is
the primary way of connecting BI tools to Spark SQL.
Am I missing something?
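For reference, the connection path being discussed is the HiveServer2-style JDBC endpoint the Thrift Server exposes; a typical client invocation (host, port, and database are placeholders) looks something like:

```shell
# Illustrative only: connect beeline (or any HiveServer2-compatible
# JDBC client / BI tool) to a running Spark Thrift Server.
beeline -u "jdbc:hive2://localhost:10000/default"
```

BI tools generally use the same `jdbc:hive2://` URL with the Hive JDBC driver, which is why removing the Thrift Server would close off this route.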
The question is whether Spark would like to be the tool used
Ok, so let's say you made a Spark DataFrame and you call length -- what do
you expect to happen?
Personally, I expect Spark to evaluate the DataFrame; this is what happens
with collections and even iterables.
The interplay with cache is a bit strange, but presumably if you've marked
your DataFrame for
> (2) If the method forces evaluation, and this matches the most obvious way
it would be implemented, then we should add it with a note in the docstring
I am not sure about this, because forcing evaluation could be something that
has side effects. For example, df.count() can realize a cache, and if we
implement _
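The side-effect concern can be illustrated with a toy sketch (this is not real PySpark; `LazyFrame` and its attributes are made up): a lazy "frame" whose `__len__` forces evaluation, which as a side effect realizes a cache.

```python
# Toy sketch, NOT the PySpark API: illustrates how implementing
# __len__ via something like count() silently forces evaluation
# and can materialize a cache as a side effect.
class LazyFrame:
    def __init__(self, compute):
        self._compute = compute   # deferred computation
        self._cache = None        # realized only on first evaluation
        self.evaluations = 0      # how many times we actually computed

    def __len__(self):
        # len() forcing evaluation, as debated above.
        if self._cache is None:
            self._cache = list(self._compute())
            self.evaluations += 1
        return len(self._cache)

df = LazyFrame(lambda: range(3))
assert len(df) == 3
assert df.evaluations == 1   # first len() realized the cache
assert len(df) == 3
assert df.evaluations == 1   # second len() hit the cache
```

The point of the sketch is that an innocent-looking `len(df)` changes the frame's state (the cache), which is exactly the kind of hidden side effect the reply above is worried about.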
People do use it, and the maintenance cost is pretty low, so I don't think
we should just drop it. We can be explicit that there is not a lot of
development going on and that we are unlikely to add many new features to
it, and users are also welcome to use other JDBC/ODBC endpoint
implementations.
Maybe that's what I really mean (you can tell I don't follow the Hive part
closely)
In my travels, indeed the thrift server has been viewed as an older
solution to a problem probably better met by others.
From my perspective it's worth dropping, but that's just anecdotal.
Any other arguments for
Looks like the majority opinion is for Wednesday. I've sent out an invite
to everyone who replied and will add more people as I hear more responses.
Thanks, everyone!
On Fri, Oct 26, 2018 at 3:23 AM Gengliang Wang wrote:
> +1
>
> On Oct 26, 2018, at 8:45 AM, Hyukjin Kwon wrote:
>
> I didn't k
Hi all,
one big problem with getting rid of the Hive fork is the thriftserver,
which relies on the HiveServer from the Hive fork.
We might migrate to an apache/hive dependency, but I am not sure that would
help much.
I think a broader topic would be whether having a thriftserver
That all sounds reasonable, but I think in the case of 4 (and maybe also 3)
I would rather see it implemented to raise an error message that explains
what's going on and suggests the explicit operation that would do the most
equivalent thing. And perhaps raise a warning (using the warnings module)
for
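A hypothetical sketch of the alternative being proposed here (all names are made up for illustration): instead of silently forcing evaluation, `__len__` raises an error that points at the explicit equivalent, and a softer case issues a warning via the standard `warnings` module.

```python
import warnings

# Hypothetical sketch, not the PySpark implementation: refuse the
# implicit operation with a helpful error, and warn on a softer case.
class Frame:
    def count(self):
        return 42  # stand-in for the real, explicit action

    def __len__(self):
        # Raise instead of forcing evaluation, suggesting the
        # explicit equivalent as the error message.
        raise TypeError(
            "len() would force evaluation of the whole DataFrame; "
            "call df.count() explicitly if that is what you want."
        )

    def __iter__(self):
        # Softer case: allowed, but flagged with a warning.
        warnings.warn(
            "iterating a DataFrame pulls data to the driver",
            stacklevel=2,
        )
        return iter(())

df = Frame()
try:
    len(df)
except TypeError as e:
    assert "df.count()" in str(e)
```

This keeps the user in control: the error names the explicit operation (`df.count()`), and the warning makes the cost of the softer operation visible without blocking it.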
OK let's keep this about Hive.
Right, good point, this is really about supporting metastore versions, and
there is a good argument for retaining backwards-compatibility with older
metastores. I don't know how far, but I guess, as far as is practical?
Isn't there still a lot of Hive 0.x test code?
Hi, Sean and All.
For the first question, we support only Hive Metastore 1.x ~ 2.x, and we
can support Hive Metastore 3.0 simultaneously; Spark is designed like that.
I don't think we need to drop old Hive Metastore support. Is it
for avoiding Hive Metastore sharing between Spark2 and Spark
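For context, the metastore version Spark talks to is already configurable independently of the built-in Hive execution classes, which is what makes supporting several metastore versions at once possible. A typical configuration fragment (the version value is illustrative) looks like:

```
# Illustrative Spark SQL configuration: point Spark at a specific
# Hive Metastore version and let it fetch matching jars.
spark.sql.hive.metastore.version  2.3.3
spark.sql.hive.metastore.jars     maven
```

With this mechanism, dropping built-in support for very old metastores is a separate decision from which metastore versions the configuration above can target.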
Coming out of https://github.com/apache/spark/pull/21654, it was agreed that
the helper methods in question made sense, but there was some desire for a
plan as to which helper methods we should use.
I'd like to propose a lightweight solution to start with, for helper
methods that match either Pandas or g
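A hypothetical sketch of the kind of lightweight helper being proposed: an alias that matches a pandas-style name and simply delegates to an existing operation, with the behavior spelled out in the docstring. Everything below (`Frame`, `size`) is made up for illustration, not the actual PR.

```python
# Toy stand-in for a DataFrame; only the delegation pattern matters.
class Frame:
    def __init__(self, rows):
        self._rows = list(rows)

    def count(self):
        return len(self._rows)

    # Pandas-style helper name that delegates to an existing method;
    # the docstring notes that it forces evaluation.
    def size(self):
        """Pandas-style helper; forces evaluation via count()."""
        return self.count()

df = Frame([1, 2, 3])
assert df.size() == df.count() == 3
```

The appeal of this pattern is low maintenance cost: each helper is a thin, documented alias, so the set of helpers can grow method-by-method as each pandas match is agreed on.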
This is all merged to master/2.4. AFAIK there aren't any items I'm
monitoring that are needed for 2.4.
On Thu, Oct 25, 2018 at 6:54 PM Sean Owen wrote:
> Yep, we're going to merge a change to separate the k8s tests into a
> separate profile, and fix up the Scala 2.12 thing. While non-critical th
Here's another thread to start considering, and I know it's been raised
before.
What version(s) of Hive should Spark 3 support?
If at least we know it won't include Hive 0.x, could we go ahead and remove
those tests from master? It might significantly reduce the run time and
flakiness.
It seems t
Sean,
Yes, I updated the PR and re-ran it.
On Fri, Oct 26, 2018 at 2:54 AM, Sean Owen wrote:
> Yep, we're going to merge a change to separate the k8s tests into a
> separate profile, and fix up the Scala 2.12 thing. While non-critical those
> are pretty nice to have for 2.4. I think that's doab
+1
> On Oct 26, 2018, at 8:45 AM, Hyukjin Kwon wrote:
>
> I didn't know I live in the same timezone with you Wenchen :D.
> Monday or Wednesday at 5PM PDT sounds good to me too FWIW.
>
> On Fri, Oct 26, 2018 at 8:29 AM, Ryan Blue wrote:
> Good point. How about Monday or Wednesday at 5PM PDT then?
>