I am also in favor of option 3, since the Flink FileSystem has a very similar implementation via the plugin mechanism. It keeps a map, "FS_FACTORIES", to store the plugin-loaded FileSystem implementations (e.g. S3, AzureFS, OSS, etc.) and provides some common interfaces.
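
To make the analogy concrete, here is a rough sketch of how I picture option 3. It is not the actual Flink code: `GpuInfo`, `ExternalResourceRegistry`, the `register` method and the "index" key are hypothetical names made up for illustration, and the `ExternalResourceInfo` shape simply follows the key-value proposal in the quoted thread. The point is that the core and user code only ever see the common interface, much like callers of FileSystem never touch the plugin-loaded classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Common interface that would live in Flink core (shape taken from the
// proposal in the thread; not a finalized API).
interface ExternalResourceInfo {
    String getProperty(String key);
    Map<String, String> getProperties();
}

// Hypothetical implementation that would live inside the driver plugin jar.
// Nothing outside the plugin needs to reference this class by name.
final class GpuInfo implements ExternalResourceInfo {
    private final Map<String, String> properties;

    GpuInfo(String index) {
        Map<String, String> m = new HashMap<>();
        m.put("index", index); // "index" is an assumed property key
        this.properties = Collections.unmodifiableMap(m);
    }

    @Override
    public String getProperty(String key) {
        return properties.get(key);
    }

    @Override
    public Map<String, String> getProperties() {
        return properties;
    }
}

// Hypothetical core-side registry, playing the role FS_FACTORIES plays for
// file systems: it maps a resource name to whatever the plugin provided.
final class ExternalResourceRegistry {
    private final Map<String, Set<ExternalResourceInfo>> infos = new HashMap<>();

    void register(String resourceName, ExternalResourceInfo info) {
        infos.computeIfAbsent(resourceName, k -> new HashSet<>()).add(info);
    }

    // Mirrors the option-3 signature: no concrete class is passed in.
    Set<ExternalResourceInfo> getExternalResourceInfos(String resourceName) {
        return Collections.unmodifiableSet(
                infos.getOrDefault(resourceName, Collections.emptySet()));
    }
}
```

A UDF would then call something like `getExternalResourceInfos("gpu")` and read `getProperty("index")` from each entry, without any cast to a plugin class.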

Best,
Yang

Yangze Guo <karma...@gmail.com> wrote on Wed, Apr 29, 2020 at 3:54 PM:

> For your convenience, I modified the Tokenizer in "WordCount"[1] case
> to show how UDF leverages GPU info and how we found that problem.
>
> [1] https://github.com/KarmaGYZ/flink/blob/7c5596e43f6d14c65063ab0917f3c0d4bc0211ed/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java
>
> Best,
> Yangze Guo
>
> On Wed, Apr 29, 2020 at 3:25 PM Xintong Song <tonysong...@gmail.com> wrote:
> >
> > > Will she ask for some properties and then pass them to another component?
> >
> > Yes. Take GPU as an example, the property needed is "GPU index", and the
> > index will be used to tell the OS which GPU should be used for the
> > computing workload.
> >
> > > Where does this component come from?
> >
> > The component could be either the UDF/operator itself, or some AI libraries
> > used by the operator. For 1.11, we do not have plan for introducing new GPU
> > aware operators in Flink. So all the usages of the GPU resources should
> > come from UDF. Please correct me if I am wrong, @Becket.
> >
> > Thank you~
> >
> > Xintong Song
> >
> > On Wed, Apr 29, 2020 at 3:14 PM Till Rohrmann <trohrm...@apache.org> wrote:
> >
> > > Thanks for bringing this up Yangze and Xintong. I see the problem. Help me
> > > to understand how the ExternalResourceInfo is intended to be used by the
> > > user. Will she ask for some properties and then pass them to another
> > > component? Where does this component come from?
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Apr 29, 2020 at 9:05 AM Xintong Song <tonysong...@gmail.com> wrote:
> > >
> > > > Thanks for kicking off this discussion, Yangze.
> > > >
> > > > First, let me try to explain a bit more about this problem. Since we
> > > > decided to make the `ExternalResourceDriver` a plugin whose implementation
> > > > could be provided by user, we think it makes sense to leverage Flink's
> > > > plugin mechanism and load the drivers in separated class loaders to avoid
> > > > potential risk of dependency conflicts. However, that means
> > > > `RuntimeContext` and user codes do not naturally have access to classes
> > > > defined in the plugin. In the current design,
> > > > `RuntimeContext#getExternalResourceInfos` takes the concrete
> > > > `ExternalResourceInfo` implementation class as an argument. This will cause
> > > > problem when user codes try to pass in the argument, and when
> > > > `RuntimeContext` tries to do the type check/cast.
> > > >
> > > > To my understanding, the root problem is probably that we should not depend
> > > > on a specific implementation of the `ExternalResourceInfo` interface from
> > > > outside the plugin (user codes & runtime context). To that end, regardless
> > > > the detailed interface design, I'm in favor of the direction of the 3rd
> > > > approach. I think it makes sense to add some general information/property
> > > > accessing interfaces in `ExternalResourceInfo` (e.g., a key-value property
> > > > map), so that in most cases users do not need to cast the
> > > > `ExternalResourceInfo` into concrete subclasses.
> > > >
> > > > Regarding the detailed interface design, I'm not sure about using
> > > > `Properties`. I think the information contained in a `ExternalResourceInfo`
> > > > can be considered as a unmodifiable map. So maybe something like the
> > > > following?
> > > >
> > > > > public interface ExternalResourceInfo {
> > > > >     String getProperty(String key);
> > > > >     Map<String, String> getProperties();
> > > > > }
> > > >
> > > > WDYT?
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > > On Wed, Apr 29, 2020 at 2:40 PM Yangze Guo <karma...@gmail.com> wrote:
> > > >
> > > > > Hi, there:
> > > > >
> > > > > The "FLIP-108: Add GPU support in Flink"[1] is now working in
> > > > > progress. However, we met a problem with
> > > > > "RuntimeContext#getExternalResourceInfos" if we want to leverage the
> > > > > Plugin[2] mechanism in Flink.
> > > > > The interface is:
> > > > > The problem is now:
> > > > > public interface RuntimeContext {
> > > > >     /**
> > > > >      * Get the specific external resource information by the resourceName.
> > > > >      */
> > > > >     <T extends ExternalResourceInfo> Set<T> getExternalResourceInfos(String resourceName, Class<T> externalResourceType);
> > > > > }
> > > > > The problem is that the mainClassLoader does not recognize the
> > > > > subclasses of ExternalResourceInfo. Those ExternalResourceInfo is
> > > > > located in ExternalResourceDriver jar and has been isolated from
> > > > > mainClassLoader by PluginManager. So, ClassNotFoundException will be
> > > > > thrown out.
> > > > >
> > > > > The solution could be:
> > > > >
> > > > > - Not leveraging the plugin mechanism. Just load drivers to
> > > > > mainClassLoader. The drawback is that user needs to handle the
> > > > > dependency conflict.
> > > > >
> > > > > - Force user to build two separate jars. One for the
> > > > > ExternalResourceDriver, the other for the ExternalResourceInfo. The
> > > > > jar including ExternalResourceInfo class should be added to "/lib"
> > > > > dir. This approach probably makes sense but might annoy user.
> > > > >
> > > > > - Change the RuntimeContext#getExternalResourceInfos, let it return
> > > > > ExternalResourceInfo and add something like "Properties getInfo()" to
> > > > > ExternalResourceInfo interface. The contract for resolving the return
> > > > > value would be specified by the driver provider and user. The Flink
> > > > > core does not need to be aware of the concrete implementation:
> > > > > public interface RuntimeContext {
> > > > >     /**
> > > > >      * Get the specific external resource information by the resourceName.
> > > > >      */
> > > > >     Set<ExternalResourceInfo> getExternalResourceInfos(String resourceName);
> > > > > }
> > > > > public interface ExternalResourceInfo {
> > > > >     Properties getInfo();
> > > > > }
> > > > >
> > > > > From my side, I prefer the third approach:
> > > > > - Regarding usability, it frees user from handling dependency or
> > > > > packaging two jars.
> > > > > - It decouples the Flink's mainClassLoader from the concrete
> > > > > implementation of ExternalResourceInfo.
> > > > >
> > > > > Looking forward to your feedback.
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > [2] https://ci.apache.org/projects/flink/flink-docs-master/ops/plugins.html
> > > > >
> > > > > Best,
> > > > > Yangze Guo