Hi Roger,

Got it! I would like to understand more about the SystemProducer API changes required by #1 and #2. Could you elaborate a bit more?

Regarding the JDK8 requirement in the new HTTP-based Elasticsearch producer, I want to ask what motivated you to go w/ JDK8. It does bring a lot of nice features. If we deprecate source-level compatibility with JDK7, we can benefit from many new JDK8 features, like lambdas, the stream APIs, etc. Refactoring Scala code to JDK8 is also much easier.

Thanks!

-Yi

On Tue, Feb 9, 2016 at 4:19 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:

> Hi Yi,
>
> It could be merged into the Samza project if there's enough interest but
> may need some re-working depending on which dependencies are ok to bring
> in. I did it outside of the Samza project first because I had to get it
> done quickly so it relies on Java 8 features, dropwizard metrics for
> histogram metrics, and JEST (https://github.com/searchbox-io/Jest) which
> itself drags in more dependencies (Guava, Gson, commons http).
>
> There are a few issues with the existing ElasticsearchSystemProducer:
>
>    1. The plugin API (IndexRequestFactory) is tied to the Elasticsearch
>    Java API (a bulky dependency)
>    2. It only supports index requests. I needed to also support updates
>    and deletes.
>    3. There is currently no plugin mechanism to register a flush listener.
>    The reason I needed that was to be able to report end-to-end latency
>    stats (total pipeline latency = commit time - event time).
>
> #3 is easily solvable with an additional plugin option. #1 and #2 require
> changing the system producer API.
>
> Roger
>
> On Tue, Feb 9, 2016 at 10:56 AM, Yi Pan <nickpa...@gmail.com> wrote:
>
> > Hi, Roger,
> >
> > That's awesome! Are you planning to submit the HTTP-based system producer
> > in the Samza open-source samza-elasticsearch module? If the Elasticsearch
> > community suggests that HTTP-based clients be the recommended way, we
> > should use it in samza-elasticsearch as well. And what's your opinion on
> > the existing ElasticsearchSystemProducer? If the SystemProducer APIs and
> > configure options do not change, I would vote to replace the
> > implementation w/ the HTTP-based ElasticsearchSystemProducer.
> >
> > Thanks for putting these new additions up!
> >
> > -Yi
> >
> > On Tue, Feb 9, 2016 at 10:39 AM, Roger Hoover <roger.hoo...@gmail.com>
> > wrote:
> >
> > > Hi Samza folks,
> > >
> > > For people who want to use HTTP to integrate with Elasticsearch, I
> > > wrote an HTTP-based system producer and a reusable task, including
> > > latency stats from event origin time, task processing time, and time
> > > spent talking to the Elasticsearch API.
> > >
> > > https://github.com/quantiply/rico/blob/master/docs/common_tasks/es-push.md
> > >
> > > Cheers,
> > >
> > > Roger