alamb opened a new issue, #19573:
URL: https://github.com/apache/datafusion/issues/19573
### Is your feature request related to a problem or challenge?

- Part of https://github.com/apache/datafusion/issues/17214

As part of https://github.com/apache/datafusion/pull/19366, @BlakeOrth added a session (global) cache for the results of calling LIST on a remote directory.

This has the benefit that the cache is now visible, so we can report on it and align it more closely with the other session-scoped caches.

However, it has an unfortunate side effect: there is no longer a good way to force a refresh of the list of files that back an external table. For example:

```sql
-- calls LIST to get the files
create external table foo ...
drop table foo;
-- now reuses the cached file list; previously this would call LIST again
-- to get a (potentially updated) version of the file list
create external table foo ...
```

Previously, the cache was local to each `ListingTable` and thus was recreated on each call to `CREATE EXTERNAL TABLE`. This meant that a user could force a refresh of the file list by recreating the table.

I think reusing the same cached list is pretty confusing.

@jizezhang has helpfully volunteered to help with this feature and has another PR queued up, so we merged https://github.com/apache/datafusion/pull/19366 and will fix this after the fact.

### Describe the solution you'd like

I think the caches should still be (logically) table scoped, so that when a `CREATE EXTERNAL TABLE` command is issued, it actually calls LIST to see the current contents of the remote table.

### Describe alternatives you've considered

One idea is to have table-scoped "sub caches" (or something similar) that the session-level cache holds handles to, so it can still report on them.

@BlakeOrth also proposed:

> it would potentially be reasonable to treat a DROP command like INSERT where we manually invalidate the cache entries for that table's path.

### Additional context

_No response_
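The two ideas under "Describe alternatives you've considered" can be sketched in Rust, the language DataFusion is written in. This is a minimal, std-only model under stated assumptions, not DataFusion's actual cache API: `PathKeyedListCache`, `SessionCacheRegistry`, `get_or_list`, `invalidate_prefix`, `create_table`, `drop_table`, and `report` are all hypothetical names, and the `list` closure stands in for the real LIST call against the object store.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the file paths returned by one LIST call.
type FileList = Vec<String>;

/// Hypothetical cache of LIST results keyed by path. Used both as a
/// session-wide cache (DROP-invalidates idea) and as a per-table sub cache.
#[derive(Default)]
struct PathKeyedListCache {
    entries: Mutex<HashMap<String, FileList>>,
}

impl PathKeyedListCache {
    /// Returns the cached listing for `path`, running `list` (standing in
    /// for the real LIST call) only on a cache miss. For simplicity the
    /// lock is held while `list` runs.
    fn get_or_list(&self, path: &str, list: impl FnOnce() -> FileList) -> FileList {
        let mut entries = self.entries.lock().unwrap();
        let files = entries.entry(path.to_string()).or_insert_with(list);
        files.clone()
    }

    /// Number of cached listings, used for reporting.
    fn len(&self) -> usize {
        self.entries.lock().unwrap().len()
    }

    /// DROP-invalidates idea: forget every listing under the table's path,
    /// as is already done for INSERT. Plain prefix matching can also drop
    /// sibling paths (e.g. `foo_v2`); that only costs extra LIST calls and
    /// never returns stale results.
    fn invalidate_prefix(&self, table_path: &str) {
        self.entries
            .lock()
            .unwrap()
            .retain(|path, _| !path.starts_with(table_path));
    }
}

/// Hypothetical session-level registry holding handles to table-scoped
/// sub caches, so the session can still report on them.
#[derive(Default)]
struct SessionCacheRegistry {
    tables: Mutex<HashMap<String, Arc<PathKeyedListCache>>>,
}

impl SessionCacheRegistry {
    /// CREATE EXTERNAL TABLE: register a brand-new, empty sub cache, so the
    /// first scan of a recreated table always issues a fresh LIST.
    fn create_table(&self, name: &str) -> Arc<PathKeyedListCache> {
        let cache = Arc::new(PathKeyedListCache::default());
        self.tables
            .lock()
            .unwrap()
            .insert(name.to_string(), Arc::clone(&cache));
        cache
    }

    /// DROP TABLE: release the session's handle to the table's sub cache.
    fn drop_table(&self, name: &str) {
        self.tables.lock().unwrap().remove(name);
    }

    /// Reporting hook: (table name, number of cached listings), sorted by name.
    fn report(&self) -> Vec<(String, usize)> {
        let tables = self.tables.lock().unwrap();
        let mut rows: Vec<(String, usize)> = tables
            .iter()
            .map(|(name, cache)| (name.clone(), cache.len()))
            .collect();
        rows.sort();
        rows
    }
}
```

In this model, recreating `foo` against a shared path-keyed cache only triggers a new LIST if DROP explicitly invalidates the table's path, whereas the sub-cache variant gets a fresh LIST on every `CREATE EXTERNAL TABLE` by construction while the registry keeps the caches visible. One trade-off of the sub-cache variant is that two tables over the same location no longer share a cached listing.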
