Hi Spark developers,

My team maintains an internal storage format that already has a Data Source V2
implementation.

Now we want to add catalog support for it. I expect each partition to be
stored in this format, with the Spark catalog managing the partition columns,
just as it does for ORC and Parquet.

After checking the logic of DataSource.resolveRelation, I wonder whether
introducing another FileFormat for my storage spec is the only way to support
catalog-managed partitions. Could any expert confirm?
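
If that is indeed the way, my understanding is the shape would be roughly the
following sketch. MyFileFormat and MyDataSourceV2 are placeholder names of my
own, and the actual read/write members are left abstract here:

import org.apache.spark.sql.execution.datasources.FileFormat
import org.apache.spark.sql.execution.datasources.v2.FileDataSourceV2

// Placeholder FileFormat for our storage spec; the real read/write members
// (inferSchema, prepareWrite, buildReader, ...) are left abstract.
abstract class MyFileFormat extends FileFormat

abstract class MyDataSourceV2 extends FileDataSourceV2 {
  override def shortName(): String = "myformat"
  // Catalog resolution (providingClass, quoted below) replaces this V2
  // source with the FileFormat class returned here.
  override def fallbackFileFormat: Class[_ <: FileFormat] =
    classOf[MyFileFormat]
}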

Another question concerns the comment below, "now catalog for data source V2
is under development". Does anyone know the progress or design of this feature?


lazy val providingClass: Class[_] = {
  val cls = DataSource.lookupDataSource(className, sparkSession.sessionState.conf)
  // `providingClass` is used for resolving data source relation for catalog tables.
  // As now catalog for data source V2 is under development, here we fall back all the
  // [[FileDataSourceV2]] to [[FileFormat]] to guarantee the current catalog works.
  // [[FileDataSourceV2]] will still be used if we call the load()/save() method in
  // [[DataFrameReader]]/[[DataFrameWriter]], since they use method `lookupDataSource`
  // instead of `providingClass`.
  cls.newInstance() match {
    case f: FileDataSourceV2 => f.fallbackFileFormat
    case _ => cls
  }
}
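
For what it's worth, my reading of how the two entry points diverge, assuming
the source is registered under a hypothetical short name "myformat" (the table
and path names are illustrative):

// Catalog path: resolves through providingClass, so a FileDataSourceV2 is
// swapped for its fallbackFileFormat and partitions are catalog-managed.
spark.sql("""
  CREATE TABLE events (id BIGINT, dt STRING)
  USING myformat
  PARTITIONED BY (dt)
""")

// Reader path: resolves through lookupDataSource directly, so the
// FileDataSourceV2 implementation itself is used.
val df = spark.read.format("myformat").load("/path/to/data")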

Thanks,
Kun
