Re: Asynchronous approach and samza

2015-09-21 Thread Navina Ramesh
les, e.g. FetchBuffer.
> >
> > And it's the same for storm-crawler, another project I've been involved
> > with in the past.
> >
> > -- Ken
> >
> > > From: Michael Sklyar
> > > Sent: September 21, 2015 5:19:52am PDT
> > > To: dev

Re: Asynchronous approach and samza

2015-09-21 Thread Jordan Shaw
l reduce tasks that are doing the fetching - see Nutch and
> Bixo for examples, e.g. FetchBuffer.
>
> And it's the same for storm-crawler, another project I've been involved
> with in the past.
>
> -- Ken
>
> > From: Michael Sklyar
> > Sent: September

RE: Asynchronous approach and samza

2015-09-21 Thread Ken Krugler
r storm-crawler, another project I've been involved with in the past. -- Ken
> From: Michael Sklyar
> Sent: September 21, 2015 5:19:52am PDT
> To: dev@samza.apache.org
> Subject: Re: Asynchronous approach and samza
>
> Thanks Navina,
> it is much more clear now.

Re: Asynchronous approach and samza

2015-09-21 Thread Michael Sklyar
Thanks Navina, it is much clearer now. Unfortunately, in our case, we cannot bootstrap the data in advance (we can't pre-fetch the titles and headers of all existing URLs). It sounds to me that, if we want to use Samza, we will need a background process that will be synchronized with the main

Re: Asynchronous approach and samza

2015-09-21 Thread Navina Ramesh
Hi Michael, {quote} Do you mean that in such a case Samza should be combined with another Stream processing framework (such as Storm)? {quote} No. I didn't mean combining it with any other framework. {quote} "the job bootstraps the data from the source" - do you mean that you have a background pro

Re: Asynchronous approach and samza

2015-09-21 Thread Michael Sklyar
Thank you for your replies. I understand that making an external blocking request in a single event thread will result in extremely low throughput. However, this can be solved by multithreading and/or an asynchronous approach. It is clear that in any case using external services can never achieve the

Re: Asynchronous approach and samza

2015-09-20 Thread Navina Ramesh
Hi Michael, I agree with what Yan said. While nothing stops you from doing it, it is not encouraged, as it affects throughput and realtime processing. {quote} It seems that Samza design suits very well "data transformation" scenarios, what is not clear is how well can it support external services? {

Re: Asynchronous approach and samza

2015-09-20 Thread Yan Fang
Hi Michael, Samza is designed for high-throughput and realtime processing. If you are using an HTTP request/external service, you may not get the same performance as you would without it. However, technically speaking, there is nothing stopping you from doing this (well, it is discouraged anyway :). Samza by defa
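
The trade-off discussed in this thread (one blocking external request per message versus overlapping those requests across threads) can be sketched outside of any stream-processing framework. The example below is only an illustration under stated assumptions: it does not use the Samza API, `fetch_title` is a hypothetical stand-in that sleeps instead of making a real HTTP call, and the function names and the pool size of 8 are arbitrary choices, not recommendations from the thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_title(url):
    """Hypothetical stand-in for a blocking HTTP lookup (no real network call)."""
    time.sleep(0.05)  # simulated network latency
    return "title of " + url


def enrich_serial(urls):
    # One blocking call per message: total time grows as len(urls) * latency.
    return [(url, fetch_title(url)) for url in urls]


def enrich_concurrent(urls, max_workers=8):
    # Overlap the blocking calls on a bounded thread pool.
    # Executor.map() yields results in input order, so pairing with zip is safe.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        titles = list(pool.map(fetch_title, urls))
    return list(zip(urls, titles))


if __name__ == "__main__":
    urls = ["http://example.com/page%d" % i for i in range(16)]
    for name, fn in (("serial", enrich_serial), ("concurrent", enrich_concurrent)):
        start = time.perf_counter()
        result = fn(urls)
        elapsed = time.perf_counter() - start
        print("%s: %d urls in %.2fs" % (name, len(result), elapsed))
```

Both functions return the same pairs in the same order; only the elapsed time differs. In a real job the pool would also need a bound on in-flight requests and a policy for timeouts and failures, which this sketch leaves out.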