Date: Wednesday, September 19, 2018 at 4:35 PM
To: "Thakrar, Jayesh"
Cc: "tigerqu...@outlook.com", Spark Dev List
Subject: Re: [Discuss] Datasource v2 support for manipulating partitions
What does partition management look like in those systems and what are the
optio
I'm open to exploring the idea of adding partition management as a catalog
API. The approach we're taking is to have an interface for each concern a
catalog might implement, like TableCatalog (proposed in SPARK-24252), but
also FunctionCatalog for stored functions and possibly
PartitionedTableCatal
Totally agree with you, Dale, that there are situations where, for efficiency,
performance and better control/visibility/manageability, we need to expose
partition management.
So as described, I suggested two things - the ability to do it in the current
V2 API form via options and appropriate imple
Hi Jayesh,
I get where you are coming from - partitions are just an implementation
optimisation that we really shouldn’t be bothering the end user with.
Unfortunately that view is like saying RPC is like a procedure call, and
details of the network transport should be hidden from the end user. COR
I am not involved with the design or development of the V2 API - so these could
be naïve comments/thoughts.
Just as Dataset abstracts away the RDD, which otherwise required a little
more intimate knowledge of Spark internals, I am guessing the absence of
partition operations is either d
I've been following the development of the new data source abstraction with
keen interest. One of the issues that has occurred to me as I sat down and
planned how I would implement a data source is how I would support
manipulating partitions.
My reading of the current prototype is that Data sourc