+1
From: Holden Karau
Sent: Wednesday, July 22, 2020 10:49:49 AM
To: Steve Loughran
Cc: dev
Subject: Re: Exposing Spark parallelized directory listing & non-locality listing in core
Wonderful. To be clear, the patch is more to start the discussion about how
we want to do it and less about what I think is the right way.
Hello,
I am prototyping a change in the behavior of the spark.jars conf for my
use case. The spark.jars conf is used to specify a list of jars to include on
the driver and executor classpaths.
*Current behavior:* spark.jars conf value is not read until after the JVM
has already started and the system
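For reference (my own minimal sketch, not code from this thread), this is the
conf the change is about; the jar paths below are placeholders:

import org.apache.spark.sql.SparkSession

// Jars listed in spark.jars are added to the driver and executor classpaths.
// Placeholder paths; in practice these would point at your dependency jars.
val spark = SparkSession.builder()
  .appName("spark-jars-example")
  .config("spark.jars", "/path/to/libA.jar,/path/to/libB.jar")
  .getOrCreate()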
On Wed, Jul 22, 2020 at 7:39 AM Imran Rashid <iras...@apache.org> wrote:
> Hi Holden,
>
> thanks for leading this discussion, I'm in favor in general. I have one
> specific question -- these two sections seem to contradict each other
> slightly:
>
> > If there is a -1 from a non-committer, multiple committers or the PMC
> > should be consulted before moving forward.
Wonderful. To be clear, the patch is more to start the discussion about how
we want to do it and less about what I think is the right way.
On Wed, Jul 22, 2020 at 10:47 AM Steve Loughran wrote:
>
>
> On Wed, 22 Jul 2020 at 00:51, Holden Karau wrote:
>
>> Hi Folks,
>>
>> In Spark SQL there is the abili
On Wed, 22 Jul 2020 at 00:51, Holden Karau wrote:
> Hi Folks,
>
> In Spark SQL there is the ability to have Spark do its partition
> discovery/file listing in parallel on the worker nodes and also avoid
> locality lookups. I'd like to expose this in core, but given the Hadoop
> APIs it's a bit m
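As a rough sketch of the idea (my illustration, not the patch being discussed),
the parallel listing could be expressed with plain RDDs and the Hadoop
FileSystem API; the directory paths are placeholders:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parallel-listing-sketch").getOrCreate()
val sc = spark.sparkContext

// Placeholder partition directories to list.
val dirs = Seq("/data/part=2020-07-01", "/data/part=2020-07-02")

val files = sc
  .parallelize(dirs, dirs.size)
  .flatMap { dir =>
    // Each task lists its own directory on an executor, so the driver does
    // neither the listing nor any per-file locality lookups.
    val path = new Path(dir)
    val fs = path.getFileSystem(new Configuration())
    fs.listStatus(path).filter(_.isFile).map(_.getPath.toString)
  }
  .collect()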
Hi Holden,
thanks for leading this discussion, I'm in favor in general. I have one
specific question -- these two sections seem to contradict each other
slightly:
> If there is a -1 from a non-committer, multiple committers or the PMC
> should be consulted before moving forward.
>
> If the original
On Wednesday, July 22, 2020, Driesprong, Fokko wrote:
> That's probably one-time overhead so it is not a big issue. In my
> opinion, a bigger one is possible complexity. Annotations tend to introduce
> a lot of cyclic dependencies in the Spark codebase. This can be addressed,
> but it doesn't look great.
There is now a full catalog API you can implement, which should give you the
control you are looking for. It is in Spark 3.0, and here is an example
implementation for supporting Cassandra:
https://github.com/datastax/spark-cassandra-connector/blob/master/connector/src/main/scala/com/datastax/spark/
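For orientation (my skeleton, not the Cassandra connector code; the class name
and config values below are placeholders), implementing the V2 catalog roughly
means filling in TableCatalog:

import java.util
import org.apache.spark.sql.connector.catalog.{Identifier, Table, TableCatalog, TableChange}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Skeleton only: each ??? would delegate to your storage system's metadata service.
class MyCatalog extends TableCatalog {
  private var catalogName: String = _

  // Options configured as spark.sql.catalog.<name>.* arrive here.
  override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {
    catalogName = name
  }
  override def name(): String = catalogName

  override def listTables(namespace: Array[String]): Array[Identifier] = ???
  override def loadTable(ident: Identifier): Table = ???
  override def createTable(ident: Identifier, schema: StructType,
      partitions: Array[Transform], properties: util.Map[String, String]): Table = ???
  override def alterTable(ident: Identifier, changes: TableChange*): Table = ???
  override def dropTable(ident: Identifier): Boolean = ???
  override def renameTable(oldIdent: Identifier, newIdent: Identifier): Unit = ???
}

The catalog is then wired in through configuration, for example
spark.sql.catalog.my_cat=org.example.MyCatalog, after which my_cat.db.table
becomes resolvable from SQL.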
Hi Spark developers,
My team has an internal storage format. It already has an implementation of
data source v2.
Now we want to add catalog support for it. I expect each partition can be
stored in this format, and the Spark catalog can manage the partition columns,
just as with ORC and Parquet.
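One possible direction (a sketch under my own assumptions, not your format's
actual code): a V2 Table can advertise its partition columns through
partitioning(), which is how the planner sees Hive-style partition columns for
ORC/Parquet tables; the "date" column below is hypothetical:

import java.util
import org.apache.spark.sql.connector.catalog.{Table, TableCapability}
import org.apache.spark.sql.connector.expressions.{Expressions, Transform}
import org.apache.spark.sql.types.StructType

// Sketch: a table for a custom format reporting an identity partition on a
// hypothetical "date" column. Read/write plumbing is omitted for brevity.
class MyFormatTable(tableSchema: StructType) extends Table {
  override def name(): String = "my_format_table"
  override def schema(): StructType = tableSchema

  override def partitioning(): Array[Transform] =
    Array(Expressions.identity("date"))

  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.BATCH_READ)
}

How far this gets you depends on how much partition management (adding or
dropping partitions through the catalog) your use case needs.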
That's probably one-time overhead so it is not a big issue. In my opinion,
a bigger one is possible complexity. Annotations tend to introduce a lot of
cyclic dependencies in the Spark codebase. This can be addressed, but it
doesn't look great.
This is not true (anymore). With Python 3.6 you can add strin
On 7/22/20 3:45 AM, Hyukjin Kwon wrote:
> For now, I tend to think adding type hints to the code makes it
> difficult to backport or revert and
> more difficult to discuss typing only, especially considering
> typing is arguably still premature.
About being premature: since the typing ecosystem e
On 7/21/20 9:40 PM, Holden Karau wrote:
> Yeah I think this could be a great project now that we're only Python
> 3.5+. One potential is making this an Outreachy project to get more
> folks from different backgrounds involved in Spark.
I am honestly not sure if that's really the case.
At the mom
On 7/22/20 3:45 AM, Hyukjin Kwon wrote:
>
> Yeah, I tend to be positive about leveraging the Python type hints in
> general.
>
> However, just to clarify, I don’t think we should just port the type
> hints into the main code yet, but maybe think about
> having/porting Maciej's work, pyi files as s