l reduce tasks that are doing the fetching - see Nutch and
Bixo for examples, e.g. FetchBuffer.
And it's the same for storm-crawler, another project I've been involved
with in the past.
-- Ken
> From: Michael Sklyar
> Sent: September 21, 2015 5:19:52am PDT
> To: dev@samza.apache.org
> Subject: Re: Asynchronous approach and samza
>
> Thanks Navina,
> it is much more clear now.
>
Thanks Navina,
it is much clearer now.
Unfortunately, in our case we cannot bootstrap the data in advance (we
can't pre-fetch the titles and headers of all existing URLs).
It sounds to me that, if we want to use Samza, we will need a background
process that will be synchronized with the main
Hi Michael,
{quote}
Do you mean that in such a case Samza should be combined with another
Stream processing framework (such as Storm)?
{quote}
No. I didn't mean combining it with any other framework.
{quote}
"the job bootstraps the data from the source" - do you mean that
you have a background pro
Thank you for your replies,
I understand that making an external blocking request in a single event
thread will result in extremely low throughput. However, this can be solved
by multithreading and/or an asynchronous approach. It is clear that in any
case using external services can never achieve the
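
To make the throughput point above concrete, here is a minimal sketch in plain Python. It does not use any Samza API. `fetch_title` is a hypothetical stand-in for a blocking HTTP call (it only sleeps), and the 50 ms latency and the pool size of 8 are assumptions chosen for illustration. It compares one blocking call per event in a single thread with the same calls overlapped on a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_title(url, latency=0.05):
    """Hypothetical stand-in for a blocking HTTP request (sleeps, no network)."""
    time.sleep(latency)
    return "title of " + url


def enrich_sequential(urls):
    """One blocking call per event, as in a single event thread."""
    return [fetch_title(u) for u in urls]


def enrich_concurrent(urls, workers=8):
    """Overlap the blocking calls on a thread pool; map() keeps input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_title, urls))


def timed(fn, urls):
    """Run fn(urls) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(urls)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    urls = ["http://example.com/%d" % i for i in range(8)]
    _, t_seq = timed(enrich_sequential, urls)
    _, t_con = timed(enrich_concurrent, urls)
    print("sequential: %.2fs, concurrent: %.2fs" % (t_seq, t_con))
```

In the sequential version the latencies add up, while in the pooled version they overlap, so total time approaches the latency of a single call. The sketch ignores ordering guarantees, checkpointing and back-pressure, which a real stream job would still have to handle.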
Hi Michael,
I agree with what Yan said. While nothing stops you from doing it, it is
not encouraged, as it affects throughput and realtime processing.
{quote}
It seems that Samza design suits very well "data transformation" scenarios,
what is not clear is how well can it support external services?
{
Hi Michael,
Samza is designed for high-throughput and realtime processing. If you are
using HTTP requests or an external service, you may not achieve the same
performance as you would without it. However, technically speaking, there is
nothing stopping you from doing this (well, it is discouraged anyway :). Samza by
defa