Hi Ayan,

Admittedly I haven't done much with Kinesis, but if I'm not mistaken you should be able to use their "processor" interface for that. In this example, it's incrementing a counter:
https://github.com/awslabs/amazon-kinesis-data-visualization-sample/blob/master/src/main/java/com/amazonaws/services/kinesis/samples/datavis/kcl/CountingRecordProcessor.java
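To give a rough feel for the shape of it, here is a small self-contained sketch of a processor whose per-record step transforms the payload and hands it to a sink. Please treat it as an illustration only: `RowSink`, `TransformingProcessor` and the "key,value" payload format are all made up for this mail and are not the KCL or HBase API. As far as I understand, the real processor interface also has initialize/shutdown callbacks and checkpointing, which I've left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for wherever the transformed rows end up (e.g. an HBase writer).
interface RowSink {
    void write(String rowKey, String value);
}

// Hypothetical processor: NOT the KCL interface, just the same general idea.
class TransformingProcessor {
    private final RowSink sink;

    TransformingProcessor(RowSink sink) {
        this.sink = sink;
    }

    // Called once per batch of raw record payloads.
    void processRecords(List<ByteBuffer> payloads) {
        for (ByteBuffer payload : payloads) {
            // duplicate() so decoding does not move the caller's buffer position
            String raw = StandardCharsets.UTF_8.decode(payload.duplicate()).toString();
            // Assumed payload format: "key,value"
            String[] parts = raw.split(",", 2);
            if (parts.length < 2) {
                continue; // skip malformed payloads in this sketch
            }
            // The "simple transformation": trim both fields and upper-case the value
            sink.write(parts[0].trim(), parts[1].trim().toUpperCase(Locale.ROOT));
        }
    }
}

class TransformSketch {
    public static void main(String[] args) {
        List<String> written = new ArrayList<>();
        // An in-memory sink stands in for the real store here
        TransformingProcessor processor =
                new TransformingProcessor((key, value) -> written.add(key + "=" + value));
        processor.processRecords(Arrays.asList(
                ByteBuffer.wrap("user1, hello".getBytes(StandardCharsets.UTF_8)),
                ByteBuffer.wrap("malformed".getBytes(StandardCharsets.UTF_8))));
        System.out.println(written);
    }
}
```

In a real consumer the lambda would be replaced by whatever actually writes to your store, and the transformation by your own logic.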
Instead of incrementing a counter, you could do your transformation and send it to HBase.

On Wed, Jun 17, 2015 at 1:40 PM, ayan guha <guha.a...@gmail.com> wrote:

> Great discussion!!
>
> One question about this comment: "Also, you can do some processing with Kinesis. If all you need to do is straightforward transformation and you are reading from Kinesis to begin with, it might be an easier option to just do the transformation in Kinesis."
>
> - Do you mean a KCL application? Or some kind of processing within Kinesis?
>
> Can you kindly share a link? I would definitely pursue this route as our transformations are really simple.
>
> Best
>
> On Wed, Jun 17, 2015 at 10:26 PM, Ashish Soni <asoni.le...@gmail.com> wrote:
>
>> My use case is below.
>>
>> We are going to receive a lot of events as a stream (basically a Kafka stream), and then we need to process them and compute.
>>
>> Consider that you have a phone contract with ATT, and every call / SMS / data usage is an event. The bill then has to be calculated in real time, so that when you log in to your account you can see how much you have used, how much is left, and what your bill is to date. There are also different rules to consider when calculating the total bill; one simple rule would be that minutes 0-500 are free but anything above that is $1 a minute.
>>
>> How do I maintain shared state (total amount, total minutes, total data, etc.) so that I know how much I have accumulated at any given point, given that events for the same phone can go to any node / executor?
>>
>> Can someone please tell me how I can achieve this in Spark? In Storm I can have a bolt which does this.
>>
>> Thanks,
>>
>> On Wed, Jun 17, 2015 at 4:52 AM, Enno Shioji <eshi...@gmail.com> wrote:
>>
>>> I guess both. In terms of syntax, I was comparing it with Trident.
>>>
>>> If you are joining, Spark Streaming actually does offer windowed join out of the box.
>>> We couldn't use this though, as our event stream can grow "out-of-sync", so we had to implement something on top of Storm. If your event streams don't become out of sync, you may find the built-in join in Spark Streaming useful. Storm also has a join keyword but its semantics are different.
>>>
>>> > Also, what do you mean by "No Back Pressure"?
>>>
>>> So when a topology is overloaded, Storm is designed so that it will stop reading from the source. Spark, on the other hand, will keep reading from the source and spilling it internally. This may be fine, in fairness, but it does mean you have to worry about the persistent store usage in the processing cluster, whereas with Storm you don't have to worry because the messages just remain in the data store.
>>>
>>> Spark came up with the idea of rate limiting, but I don't feel this is as nice as back pressure, because it's very difficult to tune it so that you don't cap the cluster's processing power and yet still prevent the persistent storage from getting used up.
>>>
>>> On Wed, Jun 17, 2015 at 9:33 AM, Spark Enthusiast <sparkenthusi...@yahoo.in> wrote:
>>>
>>>> When you say Storm, did you mean Storm with Trident or Storm?
>>>>
>>>> My use case does not have simple transformations. There are complex events that need to be generated by joining the incoming event stream.
>>>>
>>>> Also, what do you mean by "No Back Pressure"?
>>>>
>>>> On Wednesday, 17 June 2015 11:57 AM, Enno Shioji <eshi...@gmail.com> wrote:
>>>>
>>>> We've evaluated Spark Streaming vs. Storm and ended up sticking with Storm.
>>>>
>>>> Some of the important drawbacks are:
>>>> Spark has no back pressure (receiver rate limit can alleviate this to a certain point, but it's far from ideal)
>>>> There are also no exactly-once semantics.
>>>> (updateStateByKey can achieve these semantics, but is not practical if you have any significant amount of state, because it does so by dumping the entire state on every checkpoint)
>>>>
>>>> There are also some minor drawbacks that I'm sure will be fixed quickly, like no task timeout, not being able to read from Kafka using multiple nodes, and a data loss hazard with Kafka.
>>>>
>>>> It's also not possible to attain very low latency in Spark, if that's what you need.
>>>>
>>>> The pro for Spark is the concise and IMO more intuitive syntax, especially if you compare it with Storm's Java API.
>>>>
>>>> I admit I might be a bit biased towards Storm though, as I'm more familiar with it.
>>>>
>>>> Also, you can do some processing with Kinesis. If all you need to do is straightforward transformation and you are reading from Kinesis to begin with, it might be an easier option to just do the transformation in Kinesis.
>>>>
>>>> On Wed, Jun 17, 2015 at 7:15 AM, Sabarish Sasidharan <sabarish.sasidha...@manthan.com> wrote:
>>>>
>>>> Whatever you write in bolts would be the logic you want to apply on your events. In Spark, that logic would be coded in map() or similar such transformations and/or actions. Spark doesn't enforce a structure for capturing your processing logic like Storm does.
>>>> Regards
>>>> Sab
>>>>
>>>> Probably overloading the question a bit.
>>>>
>>>> In Storm, Bolts have the functionality of getting triggered on events. Is that kind of functionality possible with Spark Streaming? During each phase of the data processing, the transformed data is stored to the database, and this transformed data should then be sent to a new pipeline for further processing.
>>>>
>>>> How can this be achieved using Spark?
>>>> On Wed, Jun 17, 2015 at 10:10 AM, Spark Enthusiast <sparkenthusi...@yahoo.in> wrote:
>>>>
>>>> I have a use case where a stream of incoming events has to be aggregated and joined to create complex events. The aggregation will have to happen at an interval of 1 minute (or less).
>>>>
>>>> The pipeline is:
>>>> send events
>>>> enrich event
>>>> Upstream services -------------------> KAFKA ---------> Event Stream Processor ------------> Complex Event Processor ------------> Elastic Search.
>>>>
>>>> From what I understand, Storm will make a very good ESP and Spark Streaming will make a good CEP.
>>>>
>>>> But we are also evaluating Storm with Trident.
>>>>
>>>> How does Spark Streaming compare with Storm with Trident?
>>>>
>>>> Sridhar Chellappa
>>>>
>>>> On Wednesday, 17 June 2015 10:02 AM, ayan guha <guha.a...@gmail.com> wrote:
>>>>
>>>> I have a similar scenario where we need to bring data from Kinesis to HBase. Data velocity is 20k per 10 mins. A little manipulation of the data will be required, but that's regardless of the tool, so we will be writing that piece as a Java POJO.
>>>> All env is on AWS. HBase is on a long-running EMR and Kinesis on a separate cluster.
>>>> TIA.
>>>> Best
>>>> Ayan
>>>>
>>>> On 17 Jun 2015 12:13, "Will Briggs" <wrbri...@gmail.com> wrote:
>>>>
>>>> The programming models for the two frameworks are conceptually rather different; I haven't worked with Storm for quite some time, but based on my old experience with it, I would equate Spark Streaming more with Storm's Trident API, rather than with the raw Bolt API. Even then, there are significant differences, but it's a bit closer.
>>>>
>>>> If you can share your use case, we might be able to provide better guidance.
>>>>
>>>> Regards,
>>>> Will
>>>>
>>>> On June 16, 2015, at 9:46 PM, asoni.le...@gmail.com wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I am evaluating Spark vs. Storm (Spark Streaming) and I am not able to see what the equivalent of a Storm Bolt is in Spark.
>>>>
>>>> Any help on this will be appreciated.
>>>>
>>>> Thanks,
>>>> Ashish
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
>
> --
> Best Regards,
> Ayan Guha