I am also in favor of the option3. Since the Flink FileSystem has the very
similar implementation via plugin mechanism. It has a map "FS_FACTORIES"
to store the plugin-loaded specific FileSystem(e.g. S3, AzureFS, OSS, etc.).
And provide some common interfaces.


Best,
Yang

Yangze Guo <karma...@gmail.com> 于2020年4月29日周三 下午3:54写道:

> For your convenience, I modified the Tokenizer in "WordCount"[1] case
> to show how UDF leverages GPU info and how we found that problem.
>
> [1]
> https://github.com/KarmaGYZ/flink/blob/7c5596e43f6d14c65063ab0917f3c0d4bc0211ed/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java
>
> Best,
> Yangze Guo
>
> On Wed, Apr 29, 2020 at 3:25 PM Xintong Song <tonysong...@gmail.com>
> wrote:
> >
> > >
> > > Will she ask for some properties and then pass them to another
> component?
> >
> > Yes. Take GPU as an example, the property needed is "GPU index", and the
> > index will be used to tell the OS which GPU should be used for the
> > computing workload.
> >
> >
> > > Where does this component come from?
> >
> > The component could be either the UDF/operator itself, or some AI
> libraries
> > used by the operator. For 1.11, we do not have plan for introducing new
> GPU
> > aware operators in Flink. So all the usages of the GPU resources should
> > come from UDF. Please correct me if I am wrong, @Becket.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Wed, Apr 29, 2020 at 3:14 PM Till Rohrmann <trohrm...@apache.org>
> wrote:
> >
> > > Thanks for bringing this up Yangze and Xintong. I see the problem.
> Help me
> > > to understand how the ExternalResourceInfo is intended to be used by
> the
> > > user. Will she ask for some properties and then pass them to another
> > > component? Where does this component come from?
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Apr 29, 2020 at 9:05 AM Xintong Song <tonysong...@gmail.com>
> > > wrote:
> > >
> > > > Thanks for kicking off this discussion, Yangze.
> > > >
> > > > First, let me try to explain a bit more about this problem. Since we
> > > > decided to make the `ExternalResourceDriver` a plugin whose
> > > implementation
> > > > could be provided by user, we think it makes sense to leverage
> Flink’s
> > > > plugin mechanism and load the drivers in separated class loaders to
> avoid
> > > > potential risk of dependency conflicts. However, that means
> > > > `RuntimeContext` and user codes do not naturally have access to
> classes
> > > > defined in the plugin. In the current design,
> > > > `RuntimeContext#getExternalResourceInfos` takes the concrete
> > > > `ExternalResourceInfo` implementation class as an argument. This will
> > > cause
> > > > problem when user codes try to pass in the argument, and when
> > > > `RuntimeContext` tries to do the type check/cast.
> > > >
> > > >
> > > > To my understanding, the root problem is probably that we should not
> > > depend
> > > > on a specific implementation of the `ExternalResourceInfo` interface
> from
> > > > outside the plugin (user codes & runtime context). To that end,
> > > regardless
> > > > the detailed interface design, I'm in favor of the direction of the
> 3rd
> > > > approach. I think it makes sense to add some general
> information/property
> > > > accessing interfaces in `ExternalResourceInfo` (e.g., a key-value
> > > property
> > > > map), so that in most cases users do not need to cast the
> > > > `ExternalResourceInfo` into concrete subclasses.
> > > >
> > > >
> > > > Regarding the detailed interface design, I'm not sure about using
> > > > `Properties`. I think the information contained in a
> > > `ExternalResourceInfo`
> > > > can be considered as a unmodifiable map. So maybe something like the
> > > > following?
> > > >
> > > >
> > > > public interface ExternalResourceInfo {
> > > > >     String getProperty(String key);
> > > > >     Map<String, String> getProperties();
> > > > > }
> > > >
> > > >
> > > > WDYT?
> > > >
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Wed, Apr 29, 2020 at 2:40 PM Yangze Guo <karma...@gmail.com>
> wrote:
> > > >
> > > > > Hi, there:
> > > > >
> > > > > The "FLIP-108: Add GPU support in Flink"[1] is now working in
> > > > > progress. However, we met a problem with
> > > > > "RuntimeContext#getExternalResourceInfos" if we want to leverage
> the
> > > > > Plugin[2] mechanism in Flink.
> > > > > The interface is:
> > > > > The problem is now:
> > > > > public interface RuntimeContext {
> > > > >     /**
> > > > >      * Get the specific external resource information by the
> > > > resourceName.
> > > > >      */
> > > > >     <T extends ExternalResourceInfo> Set<T>
> > > > > getExternalResourceInfos(String resourceName, Class<T>
> > > > > externalResourceType);
> > > > > }
> > > > > The problem is that the mainClassLoader does not recognize the
> > > > > subclasses of ExternalResourceInfo. Those ExternalResourceInfo is
> > > > > located in ExternalResourceDriver jar and has been isolated from
> > > > > mainClassLoader by PluginManager. So, ClassNotFoundExeption will be
> > > > > thrown out.
> > > > >
> > > > > The solution could be:
> > > > >
> > > > > - Not leveraging the plugin mechanism. Just load drivers to
> > > > > mainClassLoader. The drawback is that user needs to handle the
> > > > > dependency conflict.
> > > > >
> > > > > - Force user to build two separate jars. One for the
> > > > > ExternalResourceDriver, the other for the ExternalResourceInfo. The
> > > > > jar including ExternalResourceInfo class should be added to “/lib”
> > > > > dir. This approach probably makes sense but might annoy user.
> > > > >
> > > > > - Change the RuntimeContext#getExternalResourceInfos, let it return
> > > > > ExternalResourceInfo and add something like “Properties getInfo()”
> to
> > > > > ExternalResourceInfo interface. The contract for resolving the
> return
> > > > > value would be specified by the driver provider and user. The Flink
> > > > > core does not need to be aware of the concrete implementation:
> > > > > public interface RuntimeContext {
> > > > >     /**
> > > > >      * Get the specific external resource information by the
> > > > resourceName.
> > > > >      */
> > > > >     Set<ExternalResourceInfo> getExternalResourceInfos(String
> > > > > resourceName);
> > > > > }
> > > > > public interface ExternalResourceInfo {
> > > > >     Properties getInfo();
> > > > > }
> > > > >
> > > > > From my side, I prefer the third approach:
> > > > > - Regarding usability, it frees user from handling dependency or
> > > > > packaging two jars.
> > > > > - It decouples the Flink's mainClassLoader from the concrete
> > > > > implementation of ExternalResourceInfo.
> > > > >
> > > > > Looking forward to your feedback.
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > [2]
> > > > >
> > >
> https://ci.apache.org/projects/flink/flink-docs-master/ops/plugins.html
> > > > >
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > >
> > >
>

Reply via email to