Hi Biao,

No, there is not. The current scope is so small, and it was so easy to 
implement, that a design doc would have been overkill for it. As for the 
future plans, we consciously decided not to account or plan for them at the 
moment, but to tackle them lazily.

The idea behind the current proposal is described at the moment in the 
Javadocs [1] and in the tests, and there will of course be a documentation 
update. The gist is to have a `./plugins` directory with subdirectories of jars:

./plugins/pluginA/foo-1.0.jar
./plugins/pluginA/bar-0.9.jar

./plugins/pluginB/foo-2.0.jar

All of the jars from each plugin subdirectory will be loaded in a separate 
class loader, and instead of looking for `FileSystem.class` implementations in 
Flink’s main class loader, we will iterate over those plugins’ class loaders 
and search inside them. This is a simple extension of the current mechanism.
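To illustrate, here is a minimal sketch of that discovery loop. The class and 
method names are mine, purely for illustration, and are not taken from the PR:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical sketch: one class loader per plugin subdirectory, then a
// ServiceLoader lookup against each of those loaders instead of against
// Flink's main class loader.
public final class PluginDiscovery {

    /** Builds one URLClassLoader per subdirectory of pluginsDir. */
    static List<URLClassLoader> createPluginLoaders(Path pluginsDir) throws Exception {
        List<URLClassLoader> loaders = new ArrayList<>();
        if (!Files.isDirectory(pluginsDir)) {
            return loaders;
        }
        try (DirectoryStream<Path> plugins =
                Files.newDirectoryStream(pluginsDir, Files::isDirectory)) {
            for (Path plugin : plugins) {
                List<URL> jars = new ArrayList<>();
                try (DirectoryStream<Path> files = Files.newDirectoryStream(plugin, "*.jar")) {
                    for (Path jar : files) {
                        jars.add(jar.toUri().toURL());
                    }
                }
                // A null parent means the bootstrap loader, so plugin classes
                // do not see Flink's (or other plugins') dependencies.
                loaders.add(new URLClassLoader(jars.toArray(new URL[0]), null));
            }
        }
        return loaders;
    }

    /** Discovers all implementations of `service` across all plugin loaders. */
    static <T> List<T> discover(Class<T> service, List<URLClassLoader> loaders) {
        List<T> result = new ArrayList<>();
        for (URLClassLoader loader : loaders) {
            for (T impl : ServiceLoader.load(service, loader)) {
                result.add(impl);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<URLClassLoader> loaders = createPluginLoaders(Paths.get("./plugins"));
        System.out.println("plugin loaders: " + loaders.size());
    }
}
```

The actual PR handles details this sketch ignores (e.g. which classes still 
delegate to the parent loader), so treat it only as the shape of the idea.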

In the future we will need some discussion about how we want to evolve this 
mechanism to handle more than just `FileSystem`s:
1. Do we want to introduce some centralised architecture, where we would 
discover `Plugin.class` implementations and each `Plugin` would provide 
`FileSystem`, `MetricReporter`, and `Connector` implementations? Or would we 
discover `FileSystem`, `MetricReporter`, and `Connector` implementations 
separately? 
2. Do we want to support “dynamically” loaded plugins, provided during job 
submission?
3. Do we want to somehow support DataStream Sources/Sinks as plugins? If yes, 
what would the API look like? Some Source/Sink factory interface shared 
between connectors?
4. How do we expose `Plugins` inside operators, especially in Table API/SQL 
operators, to support TableSource/SinkFactories loaded via plugins?  
5. …?
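Just to make question 1 concrete, the centralised variant could be as small as 
a single entry-point interface. This is purely hypothetical; no such interface 
exists in the PR, and all names here are made up:

```java
import java.util.ServiceLoader;

// Purely hypothetical sketch for question 1: a single `Plugin` entry point
// that would itself be discovered per plugin directory, and that hands out
// implementations of any service interface from its own class loader.
public interface Plugin {

    /** A human-readable identifier, e.g. the plugin directory name. */
    String name();

    /**
     * Looks up implementations of {@code service} contributed by this
     * plugin, searching only this plugin's own class loader.
     */
    default <T> Iterable<T> load(Class<T> service) {
        return ServiceLoader.load(service, getClass().getClassLoader());
    }
}
```

Whether one entry point like this is better than discovering each service 
interface separately is exactly the open question.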

Those are some of the questions that we are intentionally trying to avoid at 
the moment.

Piotrek   

[1] 
https://github.com/apache/flink/pull/8038/commits/06d07e846d1b3682538cea203de1371fc79b9940#diff-0599320f7dc729e412841c58534d6fe5
 

> On 10 Apr 2019, at 17:40, Biao Liu <mmyy1...@gmail.com> wrote:
> 
> Hi Stefan & Piotr,
> Thank you for bringing up this discussion. As Zhijiang said, class conflicts
> cause a lot of trouble in our production environment.
> I was wondering, is there any design document currently? It might be helpful
> for understanding the PR and even the whole picture, as Piotr said that in
> the future it could be extended to other modules.
> 
> Stefan Richter <s.rich...@ververica.com> wrote on Wed, 10 Apr 2019 at 23:22:
> 
>> Thank you Piotr for bringing this discussion to the mailing list! As it
>> was not explicitly mentioned in the first email, I wanted to add that there
>> is also already an open PR[1] with my implementation of the basic plugin
>> mechanism for FileSystem. Looking forward to some feedback from the
>> community.
>> 
>> 
>> [1] https://github.com/apache/flink/pull/8038
>> 
>> Best,
>> Stefan
>> 
>>> On 10. Apr 2019, at 17:08, zhijiang <wangzhijiang...@aliyun.com.INVALID>
>> wrote:
>>> 
>>> Thanks Piotr for proposing this new feature.
>>> 
>>> The solution for the class loader issue is really helpful in production,
>>> and we often encountered this pain point before.
>>> It might bring more possibilities based on this pluggable mechanism.
>>> Hope to see the progress soon. :)
>>> 
>>> Best,
>>> Zhijiang
>>> ------------------------------------------------------------------
>>> From:Jeff Zhang <zjf...@gmail.com>
>>> Send Time: Wednesday, 10 Apr 2019, 22:01
>>> To:dev <dev@flink.apache.org>
>>> Subject:Re: Introducing Flink's Plugin mechanism
>>> 
>>> Thanks Piotr for driving this plugin mechanism. Pluggability is pretty
>>> important for the ecosystem of Flink.
>>> 
>>> Piotr Nowojski <pi...@ververica.com> wrote on Wed, 10 Apr 2019 at 17:48:
>>> 
>>>> Hi Flink developers,
>>>> 
>>>> I would like to introduce a new plugin loading mechanism that we are
>>>> working on right now [1]. The idea is quite simple: isolate services in
>>>> separate, independent class loaders, so that classes and dependencies do
>>>> not leak between them and/or the Flink runtime itself. Currently we have
>>>> quite some problems with dependency convergence in multiple places. Some
>>>> of them we solve by shading (built-in file systems, metrics), some we
>>>> force users to deal with themselves (custom file systems/metrics), and
>>>> others we do not solve (connectors - we do not support using different
>>>> Kafka versions in the same job/SQL). With proper plugins, loaded in
>>>> independent class loaders, those issues could be solved in a generic way.
>>>> 
>>>> The current scope of the implementation targets only file systems,
>>>> without a centralised Plugin architecture, and with Plugins that are only
>>>> “statically” initialised at TaskManager and JobManager start-up. More or
>>>> less, we are just replacing the way FileSystem implementations are
>>>> discovered and loaded.
>>>> 
>>>> In the future this idea could be extended to other modules, like metric
>>>> reporters, connectors, functions/data types (especially in SQL), state
>>>> backends, internal storage, or other future efforts. Some of those would
>>>> be easier than others: the metric reporters would require a smaller
>>>> refactoring, while connectors would require bigger API design
>>>> discussions, which I would like to avoid at the moment. Nevertheless, I
>>>> wanted to reach out with this idea so that if other potential use cases
>>>> pop up in the future, more people will be aware.
>>>> 
>>>> Piotr Nowojski
>>>> 
>>>> 
>>>> [1] https://issues.apache.org/jira/browse/FLINK-11952
>>> 
>>> 
>>> 
>>> --
>>> Best Regards
>>> 
>>> Jeff Zhang
>>> 
>> 
>> 
