gruuya opened a new issue, #11865:
URL: https://github.com/apache/datafusion/issues/11865

   ### Is your feature request related to a problem or challenge?
   
   Presently the `information_schema.tables` builder serially loads all tables 
when constructing the output 
https://github.com/apache/datafusion/blob/bddb6415a50746d2803dd908d19c3758952d74f9/datafusion/core/src/catalog_common/information_schema.rs#L93-L102
   
   In our case those are Delta tables, which has two implications:
   - Each load (likely) results in network request(s) to an object store, so 
hitting many of them in series will result in slow-down (see 
https://github.com/splitgraph/seafowl/issues/589 for an example)
   - Since we already have the table name, the only reason table loading happens 
is to fetch the table type, which in the case of Delta tables is hard-coded 
https://github.com/delta-io/delta-rs/blob/aa28d730e1d69ed419f2dc22404c5bbab8e98647/crates/core/src/delta_datafusion/mod.rs#L700
   
   ### Describe the solution you'd like
   
   It seems that loading the full `TableProvider` for each table is overkill, 
since we only ever want to know the table types.
   In addition, it would be preferable to have a bulk load method for cases where 
the table type is not hard-coded and must be fetched from an external source. 
   
   In principle this could be achieved by having a method on the schema 
provider that returns `Vec<TableSource>`, since `TableSource` also has the 
table type.
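
   A minimal sketch of the idea, under stated assumptions: every name below 
(`SchemaProviderSketch`, `table_types`, `PerTableCatalog`, `HardCodedCatalog`) 
is hypothetical and not part of DataFusion's API, the trait is synchronous for 
brevity, and the bulk method returns `(name, type)` pairs rather than 
`Vec<TableSource>` to keep the example self-contained. The default 
implementation falls back to per-table loading, so existing providers would 
keep working, while a provider with a known table type can override it and 
skip the loads entirely.

```rust
use std::cell::Cell;

/// Mirrors the variants of DataFusion's `TableType` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TableType {
    Base,
    View,
    Temporary,
}

/// Hypothetical stand-in for a fully loaded `TableProvider`.
pub struct LoadedTable {
    pub table_type: TableType,
}

/// Hypothetical, simplified schema provider (not DataFusion's real trait).
pub trait SchemaProviderSketch {
    fn table_names(&self) -> Vec<String>;

    /// Potentially expensive: may hit an object store for every call.
    fn table(&self, name: &str) -> Option<LoadedTable>;

    /// Proposed bulk method. The default falls back to loading each table.
    fn table_types(&self) -> Vec<(String, TableType)> {
        self.table_names()
            .into_iter()
            .filter_map(|name| {
                let table_type = self.table(&name)?.table_type;
                Some((name, table_type))
            })
            .collect()
    }
}

/// Simulates one table load and counts it in `loads`.
fn simulate_load(names: &[String], loads: &Cell<usize>, name: &str) -> Option<LoadedTable> {
    if !names.iter().any(|n| n == name) {
        return None;
    }
    loads.set(loads.get() + 1);
    Some(LoadedTable { table_type: TableType::Base })
}

/// Provider that relies on the default, load-per-table behaviour.
pub struct PerTableCatalog {
    pub names: Vec<String>,
    pub loads: Cell<usize>,
}

impl PerTableCatalog {
    pub fn new(names: Vec<String>) -> Self {
        Self { names, loads: Cell::new(0) }
    }
}

impl SchemaProviderSketch for PerTableCatalog {
    fn table_names(&self) -> Vec<String> {
        self.names.clone()
    }

    fn table(&self, name: &str) -> Option<LoadedTable> {
        simulate_load(&self.names, &self.loads, name)
    }
}

/// Provider whose table type is known up front (assumed to be `Base` here).
pub struct HardCodedCatalog {
    pub names: Vec<String>,
    pub loads: Cell<usize>,
}

impl HardCodedCatalog {
    pub fn new(names: Vec<String>) -> Self {
        Self { names, loads: Cell::new(0) }
    }
}

impl SchemaProviderSketch for HardCodedCatalog {
    fn table_names(&self) -> Vec<String> {
        self.names.clone()
    }

    fn table(&self, name: &str) -> Option<LoadedTable> {
        simulate_load(&self.names, &self.loads, name)
    }

    /// Override: answer from the names alone, without loading any table.
    fn table_types(&self) -> Vec<(String, TableType)> {
        self.names.iter().cloned().map(|n| (n, TableType::Base)).collect()
    }
}
```

   With three tables, calling `table_types()` on `PerTableCatalog` performs 
three simulated loads, while `HardCodedCatalog` returns the same pairs with 
none.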
   
   This is further complicated by the fact that 
`information_schema.columns` and `information_schema.views` also do this serial 
table loading, but in their case it's the table schema and table definition 
that are fetched. `TableSource` does have the former, but not the latter. 
   
   Moreover, to get a Delta table's schema you really need to 
[load](https://github.com/delta-io/delta-rs/blob/aa28d730e1d69ed419f2dc22404c5bbab8e98647/crates/core/src/table/mod.rs#L316)
 it (unless you also keep track of it someplace else) which brings us back to 
the initial problem.
   
   ### Describe alternatives you've considered
   
   If I know that all the tables are Delta tables, I could make a custom 
`information_schema.tables` builder that just returns the hard-coded table 
type, though this doesn't help with `columns` and `views`.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

