westonpace commented on issue #10339: URL: https://github.com/apache/datafusion/issues/10339#issuecomment-2504235618
I've made a stab at this.

> However, my personal opinion is that such encouragement can be done via documentation and if people want to implement RPC network calls during planning then the APIs shouldn't stop them

The easiest way we've found to do this kind of thing is with a metadata cache. However, this cache gets invalidated, has a cold start, etc. The problem with "warming the cache prior to the query" is that it is very difficult to determine which entries will be required by an SQL string. Loading the entire catalog into memory for a single query is prohibitively expensive for us.

> I think the biggest challenge is, as @metesynnada hints at above, the viral nature of async -- if we make such APIs async then everywhere they are called must also be async -- I haven't looked at how far down the stack that is but it could be substantial.

Yes :cold_sweat:

> An alternate approach might be to implement, via some hackery and tokio channels, a struct that implements the SchemaProvider without changes (sync) but can call async methods (though that would block the runtime thread 🤔 )

I could find no reasonable way to implement such hackery.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org