I'll share my take on this. My first choice would be leveraging the Java access modifiers, which enforce the visibility by the programming language. Users won't see non-public classes at all. That is best for the users. Peter mentioned the potential downside of collocating 50 classes under one package/namespace and preventing uses of subpackage structure to divide those classes. Personally, I have no problem with 50 classes under one package but I also understand some people's concern on that.
I am definitely fine with using sub-package structures to organize classes. That is pretty natural. That would require exposing some non-public classes as public to allow the plumbing. We have been using the Flink `@Internal` annotation to mark those classes. We can continue to do so. The Flink community also uses those annotations extensively in the Flink repo. I am neutral about having `api` namespaces/subpackages, like `org.apache.iceberg.flink.maintenance.api`. I am -1 to have a separate `iceberg-flink-api` module (in addition to the `iceberg-flink`) for two reasons: (1) it is probably more complicated than necessary. we don't have such conventions for other connectors inside Iceberg (like Spark, Kafka connect etc.). (2) Fink community also doesn't have the convention of separate connector api modules (like flink-connector-kafka). I don't know if we want to start a new convention and convince both communities of this new convention. On Thu, Sep 19, 2024 at 3:20 AM Péter Váry <peter.vary.apa...@gmail.com> wrote: > Thanks Eduard, > > > Are iceberg-flink users aware that classes annotated with @Internal can > potentially break API/ABI compatibility between Iceberg versions > > It is documented here: > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/upgrading/ > > *The classes & members of the Java/Scala APIs that are intended for users > are annotated with the following stability annotations:* > > > - *Public* > - *PublicEvolving* > - *Experimental* > > *Annotations on a class also apply to all members of that class, unless > otherwise annotated.* > *Any API without such an annotation is considered internal to Flink, with > no guarantees being provided.* > > > With these annotations we can do anything. The question is more focused on > "how to do it in a way which is easier to work with". > > Thanks, > Peter > > Eduard Tudenhöfner <etudenhoef...@apache.org> ezt írta (időpont: 2024. > szept. 19., Cs, 9:34): > >> Having a separate "api" gradle module seems a lot of work and so starting >> with an "api" package seems to make sense to me. This obviously has the >> side-effect you mentioned (requiring package-private classes to be public). >> >> I don't think we have anything in particular in the Iceberg codebase that >> would help with this unfortunately, so using Flink's @Internal annotation >> there makes sense. Are iceberg-flink users aware that classes annotated >> with @Internal can potentially break API/ABI compatibility between Iceberg >> versions (RevAPI doesn't check/verify iceberg-flink)? >> >> Eduard >> >> On Thu, Sep 19, 2024 at 6:42 AM Péter Váry <peter.vary.apa...@gmail.com> >> wrote: >> >>> One more idea: >>> - Create a new gradle module for the "api" that would contain all the >>> classes a client could access. >>> >>> This would fit nicely to the Iceberg codebase, but would need a serious >>> refactor of the current code (maybe even the api) >>> >>> I'm still in favor of the api package solution. >>> >>> On Wed, Sep 18, 2024, 22:10 Péter Váry <peter.vary.apa...@gmail.com> >>> wrote: >>> >>>> Hi Team, >>>> >>>> Currently I'm working on the Flink Table Maintenance see: >>>> https://github.com/apache/iceberg/pull/11144#discussion_r1764015878, >>>> and with Steven we are trying to find a good way to organize the incoming >>>> 50 classes. >>>> >>>> There will be: >>>> - ~10 classes which will be used by the users >>>> - ~10 classes which are needed for the infrastructure (scheduling, >>>> locking, etc) >>>> - The rest of the classes are distributed between the 4 planned >>>> maintenance tasks, and some of them are reused between them >>>> >>>> The tools we have: >>>> - Java access modifiers >>>> - Java packages >>>> - Flink annotations (Public/Internal) >>>> - Anything else? >>>> >>>> The possibilities we considered >>>> - Keep everything in a single package, use package private for anything >>>> which is not supposed to be used by the user, and use public modifiers only >>>> on the public API - my issue with this solution that having 50 classes in a >>>> package is too much. >>>> - Start organizing the classes into sub-packages. This requires us to >>>> set public modifiers on classes/methods used in other packages. We can >>>> still leverage the Flink annotations to separate public and internal >>>> classes/methods - my issue with this solution is that it requires >>>> considerable mental effort to remember/check what is public and what is >>>> not. >>>> - Create an 'api' package which contains only the classes used by the >>>> clients, and put everything internal to different packages. We still >>>> can/should use the annotations to mark the other classes internal, but IMHO >>>> this helps understanding the code a lot. >>>> >>>> WDYT? >>>> >>>> Any other suggestions? Maybe even something which is already used in >>>> the Iceberg codebase? >>>> >>>> Thanks, Peter >>>> >>>