Hi Yi,

It could be merged into the Samza project if there's enough interest, but it may need some reworking depending on which dependencies are OK to bring in. I did it outside of the Samza project first because I had to get it done quickly, so it relies on Java 8 features, Dropwizard Metrics for histogram metrics, and Jest (https://github.com/searchbox-io/Jest), which itself drags in more dependencies (Guava, Gson, commons http).
There are a few issues with the existing ElasticsearchSystemProducer:

1. The plugin API (IndexRequestFactory) is tied to the Elasticsearch Java API (a bulky dependency).
2. It only supports index requests. I also needed to support updates and deletes.
3. There is currently no plugin mechanism to register a flush listener. I needed one in order to report end-to-end latency stats (total pipeline latency = commit time - event time).

#3 is easily solvable with an additional plugin option. #1 and #2 require changing the system producer API.

Roger

On Tue, Feb 9, 2016 at 10:56 AM, Yi Pan <nickpa...@gmail.com> wrote:

> Hi, Roger,
>
> That's awesome! Are you planning to submit the HTTP-based system producer
> in Samza open-source samza-elasticsearch module? If ElasticSearch community
> suggest that HTTP-based clients be the recommended way, we should use it in
> samza-elasticsearch as well. And what's your opinion on the existing
> ElasticsearchSystemProducer? If the SystemProducer APIs and configure
> options do not change, I would vote to replace the implementation w/
> HTTP-based ElasticsearchSystemProducer.
>
> Thanks for putting this new additions up!
>
> -Yi
>
> On Tue, Feb 9, 2016 at 10:39 AM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Hi Samza folks,
> >
> > For people who want to use HTTP to integrate with Elasticsearch, I wrote an
> > HTTP-based system producer and a reusable task, including latency stats
> > from event origin time, task processing time, and time spent talking to
> > Elasticsearch API.
> >
> > https://github.com/quantiply/rico/blob/master/docs/common_tasks/es-push.md
> >
> > Cheers,
> >
> > Roger