I agree with Till and Xintong, if the ExternalResourceInfo is only a
holder of properties that doesn't have any sublasses it can just become
the "properties" itself.
Aljoscha
On 30.04.20 12:49, Till Rohrmann wrote:
Thanks for the clarification.
I think you are right that the typed approach does not work with the plugin
mechanism because even if we had the specific ExternalResourceInfo subtype
available one could not cast it into this type because the actual instance
has been loaded by a different class loader.
I also think that the plugin approach is indeed best in order to avoid
dependency conflicts. Hence, I believe that the third proposal is a good
solution. I agree with Xintong, that we should not return a Properties
instance though. Maybe
public interface ExternalResourceInfo {
String getProperty(String key);
Collection<String> getKeys();
}
would be good enough.
Cheers,
Till
On Wed, Apr 29, 2020 at 2:17 PM Yang Wang <danrtsey...@gmail.com> wrote:
I am also in favor of the option3. Since the Flink FileSystem has the very
similar implementation via plugin mechanism. It has a map "FS_FACTORIES"
to store the plugin-loaded specific FileSystem(e.g. S3, AzureFS, OSS,
etc.).
And provide some common interfaces.
Best,
Yang
Yangze Guo <karma...@gmail.com> 于2020年4月29日周三 下午3:54写道:
For your convenience, I modified the Tokenizer in "WordCount"[1] case
to show how UDF leverages GPU info and how we found that problem.
[1]
https://github.com/KarmaGYZ/flink/blob/7c5596e43f6d14c65063ab0917f3c0d4bc0211ed/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java
Best,
Yangze Guo
On Wed, Apr 29, 2020 at 3:25 PM Xintong Song <tonysong...@gmail.com>
wrote:
Will she ask for some properties and then pass them to another
component?
Yes. Take GPU as an example, the property needed is "GPU index", and
the
index will be used to tell the OS which GPU should be used for the
computing workload.
Where does this component come from?
The component could be either the UDF/operator itself, or some AI
libraries
used by the operator. For 1.11, we do not have plan for introducing new
GPU
aware operators in Flink. So all the usages of the GPU resources should
come from UDF. Please correct me if I am wrong, @Becket.
Thank you~
Xintong Song
On Wed, Apr 29, 2020 at 3:14 PM Till Rohrmann <trohrm...@apache.org>
wrote:
Thanks for bringing this up Yangze and Xintong. I see the problem.
Help me
to understand how the ExternalResourceInfo is intended to be used by
the
user. Will she ask for some properties and then pass them to another
component? Where does this component come from?
Cheers,
Till
On Wed, Apr 29, 2020 at 9:05 AM Xintong Song <tonysong...@gmail.com>
wrote:
Thanks for kicking off this discussion, Yangze.
First, let me try to explain a bit more about this problem. Since
we
decided to make the `ExternalResourceDriver` a plugin whose
implementation
could be provided by user, we think it makes sense to leverage
Flink’s
plugin mechanism and load the drivers in separated class loaders to
avoid
potential risk of dependency conflicts. However, that means
`RuntimeContext` and user codes do not naturally have access to
classes
defined in the plugin. In the current design,
`RuntimeContext#getExternalResourceInfos` takes the concrete
`ExternalResourceInfo` implementation class as an argument. This
will
cause
problem when user codes try to pass in the argument, and when
`RuntimeContext` tries to do the type check/cast.
To my understanding, the root problem is probably that we should
not
depend
on a specific implementation of the `ExternalResourceInfo`
interface
from
outside the plugin (user codes & runtime context). To that end,
regardless
the detailed interface design, I'm in favor of the direction of the
3rd
approach. I think it makes sense to add some general
information/property
accessing interfaces in `ExternalResourceInfo` (e.g., a key-value
property
map), so that in most cases users do not need to cast the
`ExternalResourceInfo` into concrete subclasses.
Regarding the detailed interface design, I'm not sure about using
`Properties`. I think the information contained in a
`ExternalResourceInfo`
can be considered as a unmodifiable map. So maybe something like
the
following?
public interface ExternalResourceInfo {
String getProperty(String key);
Map<String, String> getProperties();
}
WDYT?
Thank you~
Xintong Song
On Wed, Apr 29, 2020 at 2:40 PM Yangze Guo <karma...@gmail.com>
wrote:
Hi, there:
The "FLIP-108: Add GPU support in Flink"[1] is now working in
progress. However, we met a problem with
"RuntimeContext#getExternalResourceInfos" if we want to leverage
the
Plugin[2] mechanism in Flink.
The interface is:
The problem is now:
public interface RuntimeContext {
/**
* Get the specific external resource information by the
resourceName.
*/
<T extends ExternalResourceInfo> Set<T>
getExternalResourceInfos(String resourceName, Class<T>
externalResourceType);
}
The problem is that the mainClassLoader does not recognize the
subclasses of ExternalResourceInfo. Those ExternalResourceInfo is
located in ExternalResourceDriver jar and has been isolated from
mainClassLoader by PluginManager. So, ClassNotFoundExeption will
be
thrown out.
The solution could be:
- Not leveraging the plugin mechanism. Just load drivers to
mainClassLoader. The drawback is that user needs to handle the
dependency conflict.
- Force user to build two separate jars. One for the
ExternalResourceDriver, the other for the ExternalResourceInfo.
The
jar including ExternalResourceInfo class should be added to
“/lib”
dir. This approach probably makes sense but might annoy user.
- Change the RuntimeContext#getExternalResourceInfos, let it
return
ExternalResourceInfo and add something like “Properties
getInfo()”
to
ExternalResourceInfo interface. The contract for resolving the
return
value would be specified by the driver provider and user. The
Flink
core does not need to be aware of the concrete implementation:
public interface RuntimeContext {
/**
* Get the specific external resource information by the
resourceName.
*/
Set<ExternalResourceInfo> getExternalResourceInfos(String
resourceName);
}
public interface ExternalResourceInfo {
Properties getInfo();
}
From my side, I prefer the third approach:
- Regarding usability, it frees user from handling dependency or
packaging two jars.
- It decouples the Flink's mainClassLoader from the concrete
implementation of ExternalResourceInfo.
Looking forward to your feedback.
[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
[2]
https://ci.apache.org/projects/flink/flink-docs-master/ops/plugins.html
Best,
Yangze Guo