Hi, Roger,

Got it! I would like to understand more on the SystemProducer API changes
required by #1 and #2. Could you elaborate a bit more?

Regarding to JDK8 required in the new HTTP-based Elasticsearch producer, I
want to ask how you are motivated to go w/ JDK8. It does bring a lot more
nice features. If we deprecate source-level compatibility to JDK7, we can
benefit from a lot of new features from JDK8, like lambda, stream APIs,
etc. And refactor Scala code to JDK8 is also much easier.

Thanks!

-Yi

On Tue, Feb 9, 2016 at 4:19 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:

> Hi Yi,
>
> It could be merged into the Samza project if there's enough interest but
> may need some re-working depending on which dependencies are ok to bring
> in.  I did it outside of the Samza project first because I had to get it
> done quickly so it relies on Java 8 features, dropwizard metrics for
> histogram metrics, and JEST (https://github.com/searchbox-io/Jest) which
> itself drags in more dependencies (Guava, Gson, commons http).
>
> There are few issues with the existing ElasticsearchSystemProducer:
>
>    1. The plugin API (IndexRequestFactory) is tied to the Elasticsearch
>    Java API (a bulky dependency)
>    2. It only supports index requests.  I needed to also support updates
>    and deletes.
>    3. There currently no plugin mechanism to register a flush listener.
>    The reason I needed that was to be able to report end to end latency
> stats
>    (total pipeline latency = commit time - event time).
>
> #3 is easily solvable with a additional plugin options. #1 and #2 require
> changing the system producer API.
>
> Roger
>
> On Tue, Feb 9, 2016 at 10:56 AM, Yi Pan <nickpa...@gmail.com> wrote:
>
> > Hi, Roger,
> >
> > That's awesome! Are you planning to submit the HTTP-based system producer
> > in Samza open-source samza-elasticsearch module? If ElasticSearch
> community
> > suggest that HTTP-based clients be the recommended way, we should use it
> in
> > samza-elasticsearch as well. And what's your opinion on the existing
> > ElasticsearchSystemProducer? If the SystemProducer APIs and configure
> > options do not change, I would vote to replace the implementation w/
> > HTTP-based ElasticsearchSystemProducer.
> >
> > Thanks for putting this new additions up!
> >
> > -Yi
> >
> > On Tue, Feb 9, 2016 at 10:39 AM, Roger Hoover <roger.hoo...@gmail.com>
> > wrote:
> >
> > > Hi Samza folks,
> > >
> > > For people who want to use HTTP to integrate with Elasticsearch, I
> wrote
> > an
> > > HTTP-based system producer and a reusable task, including latency stats
> > > from event origin time, task processing time, and time spent talking to
> > > Elasticsearch API.
> > >
> > >
> >
> https://github.com/quantiply/rico/blob/master/docs/common_tasks/es-push.md
> > >
> > > Cheers,
> > >
> > > Roger
> > >
> >
>

Reply via email to