alamb opened a new issue, #19573:
URL: https://github.com/apache/datafusion/issues/19573

   ### Is your feature request related to a problem or challenge?
   
   - part of https://github.com/apache/datafusion/issues/17214
   
   As part of https://github.com/apache/datafusion/pull/19366, @BlakeOrth added 
 a session (global) cache for the results of calling LIST on a remote 
directory. 
   
   The benefit is that the cache is now visible, so we can report on it and align it with the other session-scoped caches. However, it has an unfortunate side effect: there is no longer a good way to force a refresh of the files that back an external table.
   
   
   For example:
   ```sql
   -- calls LIST to get the files
   create external table foo...
   drop table foo;
   -- reuses the cached file list, but previously would actually call LIST
   -- to get a (potentially updated) list of files
   create external table foo ...
   ```
   
   Previously, the cache was local to each `ListingTable` and thus was recreated on each call to `CREATE EXTERNAL TABLE`. This meant that a user could force a refresh of the file list by recreating the table. I think reusing the same cached list after the table is dropped and recreated is pretty confusing.
   
   
   @jizezhang has helpfully volunteered to help with this feature and has another PR queued up, so we merged https://github.com/apache/datafusion/pull/19366 and will fix this after the fact.
   
   ### Describe the solution you'd like
   
   I think the caches should still be (logically) table scoped, so that when a `CREATE EXTERNAL TABLE` command is issued, it actually makes a call to LIST to see the current contents of the remote table.
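
To make the desired behavior concrete, here is a minimal sketch of the semantics I have in mind. It is not DataFusion code: `FakeStore`, `SessionListCache`, `create_table`, and `files_for_query` are hypothetical names made up for illustration. The idea is only that creating a table always calls LIST and overwrites the cached entry, while queries keep reusing it.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a remote object store that counts LIST calls.
#[derive(Default)]
struct FakeStore {
    files: Vec<String>,
    list_calls: usize,
}

impl FakeStore {
    /// Returns every file under `prefix` and records that a LIST happened.
    fn list(&mut self, prefix: &str) -> Vec<String> {
        self.list_calls += 1;
        self.files
            .iter()
            .filter(|f| f.starts_with(prefix))
            .cloned()
            .collect()
    }
}

/// Hypothetical session-level cache of LIST results, keyed by table path.
#[derive(Default)]
struct SessionListCache {
    entries: HashMap<String, Vec<String>>,
}

impl SessionListCache {
    /// Models CREATE EXTERNAL TABLE: always re-LIST and overwrite the entry,
    /// so the table starts from the current remote contents.
    fn create_table(&mut self, store: &mut FakeStore, path: &str) -> Vec<String> {
        let files = store.list(path);
        self.entries.insert(path.to_string(), files.clone());
        files
    }

    /// Models a query: reuse the cached list, calling LIST only on a miss.
    fn files_for_query(&mut self, store: &mut FakeStore, path: &str) -> Vec<String> {
        if let Some(files) = self.entries.get(path) {
            return files.clone();
        }
        let files = store.list(path);
        self.entries.insert(path.to_string(), files.clone());
        files
    }
}
```

With this shape the cache stays session scoped (and reportable), but dropping and recreating a table behaves like it did before the cache was shared.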
   
   
   ### Describe alternatives you've considered
   
   One idea is to have table-scoped "sub caches" (or something similar) that the session-level cache holds a handle to, so it can still report on them.
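
A rough sketch of what the "sub cache" idea could look like, assuming each table owns its cache through an `Arc` and the session only keeps `Weak` handles for reporting. All names here (`TableListCache`, `SessionCacheRegistry`, `new_table_cache`, `report`) are hypothetical and not existing DataFusion types.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical per-table cache of LIST results.
struct TableListCache {
    table_path: String,
    files: Vec<String>,
}

/// Hypothetical session-level registry. It holds only weak handles, so a
/// table's cache is freed as soon as the table itself is dropped.
#[derive(Default)]
struct SessionCacheRegistry {
    tables: Vec<Weak<Mutex<TableListCache>>>,
}

impl SessionCacheRegistry {
    /// Creates a fresh cache for a newly created table and registers a weak
    /// handle to it. The returned `Arc` is owned by the table.
    fn new_table_cache(
        &mut self,
        table_path: &str,
        files: Vec<String>,
    ) -> Arc<Mutex<TableListCache>> {
        let cache = Arc::new(Mutex::new(TableListCache {
            table_path: table_path.to_string(),
            files,
        }));
        self.tables.push(Arc::downgrade(&cache));
        cache
    }

    /// Reports (table path, cached file count) for every live table cache,
    /// pruning handles whose table has already been dropped.
    fn report(&mut self) -> Vec<(String, usize)> {
        self.tables.retain(|weak| weak.strong_count() > 0);
        let mut rows = Vec::new();
        for weak in &self.tables {
            if let Some(cache) = weak.upgrade() {
                let guard = cache.lock().unwrap();
                rows.push((guard.table_path.clone(), guard.files.len()));
            }
        }
        rows
    }
}
```

Because the table owns the only strong reference, `DROP TABLE` followed by `CREATE EXTERNAL TABLE` naturally starts from an empty cache, while the session can still enumerate whatever caches are alive.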
   
   @BlakeOrth also proposed:
   
   >  it would potentially be reasonable to treat a DROP command like INSERT 
where we manually invalidate the cache entries for that table's path.
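
For comparison, the invalidate-on-DROP approach could be sketched roughly as below. This assumes a cache keyed by the listed path and a hypothetical `invalidate_prefix` helper; neither is taken from the actual DataFusion cache implementation.

```rust
use std::collections::HashMap;

/// Hypothetical session-level LIST cache keyed by the listed path.
#[derive(Default)]
struct ListCache {
    entries: HashMap<String, Vec<String>>,
}

impl ListCache {
    /// Stores the result of a LIST call for `listed_path`.
    fn put(&mut self, listed_path: &str, files: Vec<String>) {
        self.entries.insert(listed_path.to_string(), files);
    }

    /// Returns true if a LIST result is cached for exactly `listed_path`.
    fn contains(&self, listed_path: &str) -> bool {
        self.entries.contains_key(listed_path)
    }

    /// Models what DROP TABLE would do: remove every entry at or below the
    /// table's path and return how many entries were removed.
    /// Assumes paths end with '/', so "data/foo/" does not match "data/foobar/".
    fn invalidate_prefix(&mut self, table_path: &str) -> usize {
        let before = self.entries.len();
        self.entries.retain(|key, _| !key.starts_with(table_path));
        before - self.entries.len()
    }
}
```

One thing to keep in mind with this approach: if two tables point at the same path, dropping one would also evict the entries the other is using. That is only a performance cost (an extra LIST), not a correctness problem.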
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

