Thanks, Mayur and TD, for your input.

~Shrikar

On Fri, Jun 20, 2014 at 1:20 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:

> If the metadata is directly related to each individual record, then it
> can be done either way. Since I am not sure how easy or hard it will be
> for you to add tags before putting the data into Spark Streaming, it's hard
> to recommend one method over the other.
>
> However, if the metadata is related to each key (based on which you are
> calling updateStateByKey) and not to every record, then it may be more
> efficient to maintain that per-key metadata in updateStateByKey's state
> object.
>
> Regarding doing HTTP calls, I would be a bit cautious about performance.
> Doing an HTTP call for every record is going to be quite expensive and
> will reduce throughput significantly. If it is possible, cache values as
> much as possible to amortize the cost of HTTP calls.
>
> TD
>
> On Fri, Jun 20, 2014 at 11:16 AM, Shrikar archak <shrika...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I was curious to know which of the two approaches is better for doing
>> analytics using Spark Streaming. Let's say we want to add some metadata to
>> the stream being processed, like sentiment, tags, etc., and then
>> perform some analytics using this added metadata.
>>
>> 1) Is it OK to make an HTTP call and add some extra information to the
>> stream being processed in the updateByKeyAndWindow operations?
>>
>> 2) Add the sentiment/tags beforehand and then stream through DStreams.
>>
>> Thanks,
>> Shrikar
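
TD's advice above, to cache values so that the cost of HTTP calls is amortized, can be sketched roughly as follows. This is a minimal Python illustration and not code from the thread: fetch_metadata is a hypothetical stand-in for the remote lookup (it makes no network request), and enrich, the (key, value) record shape, and the cache size are all assumptions chosen for the example.

```python
from functools import lru_cache


def fetch_metadata(key):
    """Hypothetical stand-in for an expensive HTTP lookup; makes no network call."""
    return {"tag": "tag-for-" + key}


@lru_cache(maxsize=10000)  # assumed bound; tune to the number of distinct keys
def cached_tag(key):
    # The slow lookup runs only on a cache miss, so repeated keys
    # reuse the stored result instead of triggering another call.
    return fetch_metadata(key)["tag"]


def enrich(record):
    """Attach the cached tag to a (key, value) record."""
    key, value = record
    return (key, value, cached_tag(key))
```

A function like enrich could be applied to each record in a map step ahead of any stateful operation. For three records with keys a, b, a, the lookup runs twice rather than three times. In a distributed job such a cache would typically be local to each worker process, and entries here never expire, so stale metadata would need separate handling.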