Hi Everyone, Thanks for all the discussion so far. I have created a PR to document the requirements: https://github.com/apache/iceberg/pull/11365. Please feel free to review it or continue the discussion in this thread.
Thanks,
Walaa.

On Fri, Oct 11, 2024 at 5:19 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:

> Hi Benny,
>
> > we don't need to list out such restrictions because they really depend on the setup
>
> I do not think this is correct. The restrictions do not depend on the
> setup; rather, they dictate it. All restrictions discussed in this thread do
> that one way or another.
>
> The single-engine (Dremio) example does not apply to this discussion. The
> spec is clear when a single engine is in use, but the spec is not limited to
> single-engine use cases.
>
> > If Dremio intended for that view to be readable by Spark, it would have to adhere to all those restrictions I listed before.
>
> Sure, but those restrictions are only stated on the mailing list (in many
> forms). We are discussing whether we should add them to the spec (in one form).
>
> Thanks,
> Walaa.
>
> On Fri, Oct 11, 2024 at 4:56 PM Benny Chow <btc...@gmail.com> wrote:
>
>> Hi Russell,
>>
>> Yes, you listed out the requirements to make the two Spark engines case
>> work. Basically, it allows each engine to dynamically resolve the table
>> identifiers under the correct catalog name.
>>
>> Hello Walaa,
>>
>> IMO, we don't need to list out such restrictions because they really
>> depend on the setup. Multiple Iceberg catalogs? Multiple engines?
>> Consistent catalog names? Are views created with USE in context? Today,
>> in Dremio, we save tons of views to Nessie with fully qualified SQL
>> identifiers to other sources such as mysql or snowflake. Those views may
>> or may not have default-catalog and default-namespace set, depending on the
>> USE context. If Dremio intended for that view to be readable by Spark, it
>> would have to adhere to all those restrictions I listed before.
>>
>> Thanks,
>> Benny
>>
>> On Fri, Oct 11, 2024 at 10:00 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>
>>> Benny, "Iceberg View Spec Improvements" includes documenting what is
>>> supported and what is not. You listed a few restrictions. Many of them are
>>> not documented in the current spec. Documenting them is what this thread is
>>> about. We are trying to reach a consensus on the necessary constraints (so
>>> we are not over- or under-restricting).
>>>
>>> Russell, I think what you stated is a version of the restrictions. From
>>> my point of view, the list of necessary restrictions is:
>>>
>>> * Engines must share the same default catalog names, ensuring that
>>> partially specified SQL identifiers with the catalog omitted are resolved to
>>> the same fully specified SQL identifier across all engines.
>>> * Engines must share the same default namespaces, ensuring that SQL
>>> identifiers without catalog and namespace are resolved to the same fully
>>> specified SQL identifier across all engines.
>>> * All engines must resolve a fully specified SQL identifier to the same
>>> storage table in the same storage catalog.
>>>
>>> Please let me know if this aligns with what you stated.
>>>
>>> Thanks,
>>> Walaa.
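
For readers following the thread, here is a rough sketch of why the first two restrictions in the quoted list matter. It is not taken from the spec or from the PR. The names `EngineContext`, `resolve`, and `agree` are hypothetical, the catalog and namespace names are made up, and it assumes single-level namespaces so that a two-part identifier always means namespace.table. The third restriction (fully specified identifiers mapping to the same storage table) is not modeled here.

```python
from typing import NamedTuple, Tuple


class EngineContext(NamedTuple):
    """Hypothetical per-engine defaults used to complete partial identifiers."""
    default_catalog: str
    default_namespace: str  # assumption: single-level namespace


def resolve(identifier: str, ctx: EngineContext) -> Tuple[str, str, str]:
    """Expand a dotted SQL identifier to (catalog, namespace, table)."""
    parts = identifier.split(".")
    if len(parts) == 1:
        # Catalog and namespace omitted: both defaults are applied.
        return (ctx.default_catalog, ctx.default_namespace, parts[0])
    if len(parts) == 2:
        # Catalog omitted: only the default catalog is applied.
        return (ctx.default_catalog, parts[0], parts[1])
    if len(parts) == 3:
        # Fully specified: engine defaults are not consulted.
        return (parts[0], parts[1], parts[2])
    raise ValueError(f"unsupported identifier in this sketch: {identifier!r}")


def agree(identifier: str, *engines: EngineContext) -> bool:
    """True if every engine expands the identifier to the same full name."""
    return len({resolve(identifier, e) for e in engines}) == 1


# Made-up engine setups for illustration only.
engine_a = EngineContext(default_catalog="prod", default_namespace="sales")
engine_b = EngineContext(default_catalog="prod", default_namespace="sales")
engine_c = EngineContext(default_catalog="dev", default_namespace="sales")

print(agree("orders", engine_a, engine_b))             # same defaults
print(agree("orders", engine_a, engine_c))             # different default catalog
print(agree("prod.sales.orders", engine_a, engine_c))  # fully specified
```

In this toy model, engines with matching defaults agree on partially specified identifiers, engines with different default catalogs do not, and a fully specified identifier is expanded identically regardless of defaults.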