Thanks Mayur and TD for your inputs.

~Shrikar


On Fri, Jun 20, 2014 at 1:20 PM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> If the metadata is directly related to each individual record, then it
> can be done either way. Since I am not sure how easy or hard it will be
> for you to add tags before putting the data into Spark Streaming, it's
> hard to recommend one method over the other.
>
> However, if the metadata is related to each key (the key on which you
> are calling updateStateByKey) and not to every record, then it may be more
> efficient to maintain that per-key metadata in updateStateByKey's state
> object.
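
A minimal sketch of the per-key state idea above, written as a plain Python
update function of the (new_values, previous_state) shape that PySpark's
updateStateByKey expects. The lookup_tags helper, the dict layout of the
state, and the hashtag-based tagging are assumptions made for illustration;
they are not from this thread.

```python
def lookup_tags(record):
    # Hypothetical stand-in for the real tagging/sentiment step:
    # here it simply collects lower-cased hashtags from the record text.
    return sorted({w.lower() for w in record.split() if w.startswith("#")})


def update_state(new_values, state):
    """Keep a running count per key and compute the metadata only once."""
    if state is None:
        if not new_values:
            return None  # nothing seen for this key yet
        # First batch for this key: derive the metadata a single time.
        state = {"count": 0, "tags": lookup_tags(new_values[0])}
    # Later batches reuse the stored tags and only update the count.
    return {"count": state["count"] + len(new_values), "tags": state["tags"]}


# Assumed wiring, where keyed_stream is a DStream of (key, record) pairs
# and checkpointing has been enabled on the streaming context:
#   state_stream = keyed_stream.updateStateByKey(update_state)
```

Because the tags live in the state, they are computed when a key first
appears rather than once per record.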
>
> Regarding making HTTP calls, I would be a bit cautious about performance.
> Making an HTTP call for every record is going to be quite expensive and
> will reduce throughput significantly. If possible, cache values as much
> as you can to amortize the cost of the HTTP calls.
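
A minimal sketch of the caching suggestion above, using functools.lru_cache
from the Python standard library. fetch_sentiment is a hypothetical stand-in
for the real HTTP request (it makes no network call here), and the cache
size is an arbitrary assumption.

```python
from functools import lru_cache


def fetch_sentiment(text):
    # Hypothetical stand-in for the expensive HTTP call to a sentiment
    # service; a real version would issue the request here.
    return "positive" if "good" in text else "neutral"


@lru_cache(maxsize=10000)
def cached_sentiment(text):
    # fetch_sentiment is reached only on a cache miss; repeated inputs
    # are answered from memory.
    return fetch_sentiment(text)


def tag_records(records):
    # Attach a sentiment label to each record, reusing cached answers.
    return [(r, cached_sentiment(r)) for r in records]
```

In a Spark job, a module-level cache like this would exist separately in
each worker process, so it amortizes calls per worker rather than across
the whole cluster.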
>
> TD
>
>
>
>
>
> On Fri, Jun 20, 2014 at 11:16 AM, Shrikar archak <shrika...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I was curious to know which of the two approaches is better for doing
>> analytics using Spark Streaming. Let's say we want to add some metadata,
>> like sentiment, tags, etc., to the stream being processed and then
>> perform some analytics using this added metadata.
>>
>> 1) Is it OK to make an HTTP call and add some extra information to the
>> stream being processed in the updateByKeyAndWindow operations?
>>
>> 2) Add the sentiment/tags beforehand and then stream the data through
>> DStreams.
>>
>> Thanks,
>> Shrikar
>>
>>
>
