There is now a full catalog API you can implement, which should give you the
control you are looking for. It was added in Spark 3.0, and here is an example
implementation supporting Cassandra:

https://github.com/datastax/spark-cassandra-connector/blob/master/connector/src/main/scala/com/datastax/spark/connector/datasource/CassandraCatalog.scala

I would definitely recommend using this API rather than messing with
Catalyst directly.
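
For reference, here is a rough, minimal sketch of what a TableCatalog
implementation can look like. Everything named MyFormat* is a made-up
placeholder for your own code, and the in-memory map just stands in for
whatever metastore you actually have; a real catalog would also mix
SupportsRead/SupportsWrite into the returned Table so you can reuse your
existing data source v2 scan/write builders.

  import java.util
  import java.util.concurrent.ConcurrentHashMap

  import org.apache.spark.sql.catalyst.analysis.NoSuchTableException
  import org.apache.spark.sql.connector.catalog._
  import org.apache.spark.sql.connector.expressions.Transform
  import org.apache.spark.sql.types.StructType
  import org.apache.spark.sql.util.CaseInsensitiveStringMap

  // Minimal Table implementation; a real one would also implement
  // SupportsRead/SupportsWrite and wire up your existing DSv2 readers/writers.
  class MyFormatTable(ident: Identifier, tableSchema: StructType,
      parts: Array[Transform]) extends Table {
    override def name(): String = ident.toString
    override def schema(): StructType = tableSchema
    override def partitioning(): Array[Transform] = parts
    override def capabilities(): util.Set[TableCapability] =
      util.Collections.emptySet[TableCapability]()
  }

  class MyFormatCatalog extends TableCatalog {
    private var catalogName: String = _
    // Stand-in for a real metastore: table metadata kept in memory.
    private val tables = new ConcurrentHashMap[Identifier, MyFormatTable]()

    // Receives the options set as spark.sql.catalog.<name>.* in the Spark conf.
    override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {
      catalogName = name
    }

    override def name(): String = catalogName

    override def listTables(namespace: Array[String]): Array[Identifier] = {
      import scala.collection.JavaConverters._
      tables.keySet().asScala.filter(_.namespace().sameElements(namespace)).toArray
    }

    override def loadTable(ident: Identifier): Table = {
      val table = tables.get(ident)
      if (table == null) {
        throw new NoSuchTableException(ident.namespace().mkString("."), ident.name())
      }
      table
    }

    // The PARTITIONED BY transforms arrive here, so the partitioning metadata
    // is owned by the catalog rather than by a FileFormat fallback.
    override def createTable(
        ident: Identifier,
        schema: StructType,
        partitions: Array[Transform],
        properties: util.Map[String, String]): Table = {
      val table = new MyFormatTable(ident, schema, partitions)
      tables.put(ident, table)
      table
    }

    override def alterTable(ident: Identifier, changes: TableChange*): Table = {
      // A real catalog would apply TableChange.AddColumn, SetProperty, etc. here.
      loadTable(ident)
    }

    override def dropTable(ident: Identifier): Boolean = tables.remove(ident) != null

    override def renameTable(oldIdent: Identifier, newIdent: Identifier): Unit = {
      // A real catalog would fail if oldIdent does not exist.
      tables.put(newIdent, tables.remove(oldIdent))
    }
  }

You then register it in the Spark conf and address tables through it, e.g.

  spark.sql.catalog.myfmt=com.example.MyFormatCatalog

  CREATE TABLE myfmt.db.events (id STRING, dt STRING) PARTITIONED BY (dt)

and Spark routes the DDL and the partition metadata through your catalog
instead of the FileFormat fallback path in providingClass.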

On Wed, Jul 22, 2020, 7:58 AM Kun H. <kuh...@microsoft.com.invalid> wrote:

>
> Hi Spark developers,
>
> My team has an internal storage format. It already has an implementation of
> data source v2.
>
> Now we want to add catalog support for it. I expect each partition to be
> stored in this format, with the Spark catalog managing the partition columns,
> just like when using ORC or Parquet.
>
> After checking the logic of DataSource.resolveRelation, I wonder whether
> introducing another FileFormat for my storage spec is the only way to
> support catalog-managed partitions. Could any expert help confirm?
>
> Another question concerns the following comment: "*now catalog for data source
> V2 is under development*". Does anyone know the progress or design of this
> feature?
>
> lazy val providingClass: Class[_] = {
>   val cls = DataSource.lookupDataSource(className, sparkSession.sessionState.conf)
>   // `providingClass` is used for resolving data source relation for catalog tables.
>   // *As now catalog for data source V2 is under development*, here we fall back all the
>   // [[FileDataSourceV2]] to [[FileFormat]] to guarantee the current catalog works.
>   // [[FileDataSourceV2]] will still be used if we call the load()/save() method in
>   // [[DataFrameReader]]/[[DataFrameWriter]], since they use method `lookupDataSource`
>   // instead of `providingClass`.
>   cls.newInstance() match {
>     case f: FileDataSourceV2 => f.fallbackFileFormat
>     case _ => cls
>   }
> }
>
>
> Thanks,
> Kun
>
