Hi, Let me first clarify a few things so that we are on the same page here:
1. The reason that we are looking into having a new base-module for versions 5.3+ is that new Elasticsearch BulkProcessor APIs are breaking some of our original base API assumptions. This is noticeable from the intended cast here [1]. 2. Given that we are looking into a new base module, Christophe proposed that we focus on designing the new base module around Elasticsearch’s new REST API, so that we are more-future proof. 3. I proposed to remove / deprecate support for earlier versions, because once we introduce the new base module, we would be maintaining a lot of Elasticsearch connector modules (2 base modules, 4+ version specific). Of course, whether or not this really is a problem depends on how much capacity the community has to maintain them, as Steve mentioned. Now, moving on to another proposed solution that should work (at least for now, depends on how Elasticsearch changes their API in the future): The main problem we’ve bumped into is that Elasticsearch changed their BulkProcessor API from accepting `add(ActionRequest)`s to `add(DocWriteRequest)`s. The reason why our elasticsearch-base-module used the `ActionRequest` in the first place was so that sinks could have full functionality to perform both delete and write requests. We can actually just consider separating `RequestIndexer#add(ActionRequest …)` to `RequestIndexer#add(IndexRequest)` and `RequestIndexer#add(DeleteRequest)`. AFAIK, the IndexRequest class and DeleteRequest class has remained stable across all ES versions (1.x., 2.x, 5.x, 6.x), so we should be safe with that. If I’m not mistaken, this should allow us to avoid introducing a new module (no need for a new base AND a 5.3+ module), still have equal functionality, and continue to use the existing base module across all versions. The only downside would be that we’ll need to deprecate / break `RequestIndexer#add(ActionRequest …)` in favor of the new separated methods. Whether or not we move on to use the HighLevelRestClient for 6.x, would then be a completely orthogonal consideration. We can simply use it in 6.x as an implementation of the API call bridge. What do you think? Best, Gordon [1] https://github.com/apache/flink/pull/4675/files#diff-5539d0d57fa1ac17a36f08d1bb9a90b5R54 On 15 May 2018 at 9:15:52 AM, Steve Blackmon (sblack...@apache.org) wrote: It seems to me that if the transport client dependency is removed, the same module could perform inserts, updates, and deletes via the http bulk API, and whatever version differences exist with that API could be handled inside the module without any difference to the classpath of the pipeline. If that's true there's no need or benefit to deprecating support for earlier elastic version so long as someone is willing to implement test and maintain them. Steve On 5/13/18 at 2:00 PM, Christophe wrote: Hi Gordon, Thanks for your feedback (and Flavio for your support!) About your remarks/questions: - Maybe we can consider removing support for ES 1.x and 2.x starting from 1.6. Those are very old ES versions (considering that ES 6.x has already been out for a while). Do you think this would simply how our base module APIs are designed? I would tend to say it should not change drastically the picture but would have to look into it. - Wouldn't it be possible to have a REST implementation of the `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If so, once we remove ES 1.x and 2.x, it might actually be possible to completely replace the current `elasticsearch-base` module. The High level REST API was introduced in Elasticsearch 5.6 so it is not possible to cover 5.5 and below with it. If all the necessary APIs are already here (to be double checked) it should be able cover 5.6. What I noticed when working on the PRs is that 6.2 REST Level High Level client API was improved to be closer to original APIs, if we want to support 5.6 with it we might have to rely on APIs they already improved since then. Not dramatic. But does it worth it knowing this would just be giving us 5.6 not 5.2,3,4 and 5? Now on moving forward I read: I'm definitely a +1 to try to move this forward with a proper fix. and Working around that would require introducing a new base module specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't a nice way to go. So if I read you correctly you are ok moving with a proper fix but it should not introduce a new (REST based) base module? Then to be honest I'm not sure how to proceed :) Any more specific feedback on the direction to follow would be great! Thanks, -- Christophe On Sun, May 13, 2018 at 5:39 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi Christophe, Thanks for bringing this up. Yes, the main issue with the existing PRs and preventing it from moving forward is how it currently breaks initial assumptions of APIs in the `elasticsearch-base` module. Working around that would require introducing a new base module specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't a nice way to go. I had a quick stab at the REST API, and it seems to be promising, especially given that you mentioned that starting from next versions, the current API we use will be fully removed. I'm definitely a +1 to try to move this forward with a proper fix. Some other remarks / questions I have: - Maybe we can consider removing support for ES 1.x and 2.x starting from 1.6. Those are very old ES versions (considering that ES 6.x has already been out for a while). Do you think this would simply how our base module APIs are designed? - Wouldn't it be possible to have a REST implementation of the `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If so, once we remove ES 1.x and 2.x, it might actually be possible to completely replace the current `elasticsearch-base` module. Cheers, Gordon On Sun, May 13, 2018 at 12:36 AM, Flavio Pompermaier <pomperma...@okkam.it wrote: +1. Torally agree On Sat, 12 May 2018, 18:14 Christophe Jolif, <cjo...@gmail.com> wrote: Hi all, There is quite some time Flink Elasticsearch sink is broken for Elastisearch 5.x (nearly a year): https://issues.apache.org/jira/browse/FLINK-7386 And there is no support for Elasticsearch 6.x: https://issues.apache.org/jira/browse/FLINK-8101 However several PRs were issued: https://github.com/apache/flink/pull/4675 https://github.com/apache/flink/pull/5374 I also raised the issue on the mailing list in the 1.5 timeframe: http://apache-flink-mailing-list-archive.1008284.n3. nabble.com/DISCUSS-Releasing-Flink-1-5-0-td20867.html#a20905 But things are still not really moving. However this seems something people are looking for, so I would really like the community to consider that for 1.6. The problems I see from comments on the PRs: - getting something that is following the Flink APIs initially created is a nightmare because Elastic is pretty good at breaking compatibility the hard way (see in particular in the last PR the cast that have to be made to get an API that works in all cases) - Elasticsearch is moving away from their "native" API Flink is using to the REST APIs so there is little common ground between pre 6 and post 6 even if Elasticsearch tried to get some level of compatibility in the APIs. My fear is that by trying to kill two birds with one stone, we actually get nothing done. In the hope of moving that forward I would like to propose for 1.6 a new Elasticsearch 6.x+ sink that would follow the design of the previous ones BUT only leverage the new REST API and not inherit from existing classes. It would really be close to what is in my previous PR: https://github.com/apache/flink/pull/5374 but just focus on E6+/REST and so avoid any "strange" cast. This would not fill the gap of the 5.2+ not working but at least we would be back on track with something for the future as REST API is where Elastic is going. If people feel there is actual interest and chances this can be merged I'll be working on issuing a new PR around that. Alternative is to get back working on the existing PR but it seems to be a dead-end at the moment and not necessarily the best option long term as anyway Elasticsearch is looking into promoting the REST API. Please let me know what you think? -- Christophe