Thanks for this. It's a KCL-based Kinesis application. But because it's just a
Java application, we are thinking of using Spark on EMR or Storm for fault
tolerance and load balancing. Is that the correct approach?

On 17 Jun 2015 23:07, "Enno Shioji" <eshi...@gmail.com> wrote:
> Hi Ayan,
>
> Admittedly I haven't done much with Kinesis, but if I'm not mistaken you
> should be able to use their "processor" interface for that. In this
> example, it's incrementing a counter:
> https://github.com/awslabs/amazon-kinesis-data-visualization-sample/blob/master/src/main/java/com/amazonaws/services/kinesis/samples/datavis/kcl/CountingRecordProcessor.java
>
> Instead of incrementing a counter, you could do your transformation and
> send it to HBase.
>
> On Wed, Jun 17, 2015 at 1:40 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> Great discussion!!
>>
>> One question about this comment: "Also, you can do some processing with
>> Kinesis. If all you need to do is straightforward transformation and you
>> are reading from Kinesis to begin with, it might be an easier option to
>> just do the transformation in Kinesis."
>>
>> - Do you mean a KCL application? Or some kind of processing within Kinesis?
>>
>> Can you kindly share a link? I would definitely pursue this route as our
>> transformations are really simple.
>>
>> Best
>>
>> On Wed, Jun 17, 2015 at 10:26 PM, Ashish Soni <asoni.le...@gmail.com>
>> wrote:
>>
>>> My use case is below.
>>>
>>> We are going to receive a lot of events as a stream (basically a Kafka
>>> stream) and then we need to process and compute.
>>>
>>> Consider you have a phone contract with ATT, and every call / SMS / data
>>> usage is an event. The bill then needs to be calculated in real time, so
>>> that when you log in to your account you can see how much you have used,
>>> how much is left and what your bill is to date. There are also different
>>> rules which need to be considered when you calculate the total bill; one
>>> simple rule would be that 0-500 min is free but above that it is $1 a min.
>>>
>>> How do I maintain a shared state (total amount, total min, total data,
>>> etc.) so that I know how much I have accumulated at any given point, as
>>> events for the same phone can go to any node / executor?
>>>
>>> Can someone please tell me how I can achieve this in Spark? In Storm
>>> I can have a bolt which can do this.
>>>
>>> Thanks,
>>>
>>> On Wed, Jun 17, 2015 at 4:52 AM, Enno Shioji <eshi...@gmail.com> wrote:
>>>
>>>> I guess both. In terms of syntax, I was comparing it with Trident.
>>>>
>>>> If you are joining, Spark Streaming actually does offer windowed join
>>>> out of the box. We couldn't use this though, as our event streams can grow
>>>> "out-of-sync", so we had to implement something on top of Storm. If your
>>>> event streams don't become out of sync, you may find the built-in join in
>>>> Spark Streaming useful. Storm also has a join keyword but its semantics are
>>>> different.
>>>>
>>>> > Also, what do you mean by "No Back Pressure" ?
>>>>
>>>> So when a topology is overloaded, Storm is designed so that it will
>>>> stop reading from the source. Spark, on the other hand, will keep reading
>>>> from the source and spilling it internally. This may be fine, in fairness,
>>>> but it does mean you have to worry about the persistent store usage in the
>>>> processing cluster, whereas with Storm you don't have to worry because the
>>>> messages just remain in the data store.
>>>>
>>>> Spark came up with the idea of rate limiting, but I don't feel this is
>>>> as nice as back pressure because it's very difficult to tune it such that
>>>> you don't cap the cluster's processing power and yet prevent the
>>>> persistent storage from getting used up.
>>>>
>>>> On Wed, Jun 17, 2015 at 9:33 AM, Spark Enthusiast <
>>>> sparkenthusi...@yahoo.in> wrote:
>>>>
>>>>> When you say Storm, did you mean Storm with Trident or Storm?
>>>>>
>>>>> My use case does not have simple transformations. There are complex
>>>>> events that need to be generated by joining the incoming event stream.
>>>>>
>>>>> Also, what do you mean by "No Back Pressure"?
>>>>>
>>>>> On Wednesday, 17 June 2015 11:57 AM, Enno Shioji <eshi...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> We've evaluated Spark Streaming vs. Storm and ended up sticking with
>>>>> Storm.
>>>>>
>>>>> Some of the important drawbacks are:
>>>>> Spark has no back pressure (receiver rate limit can alleviate this to
>>>>> a certain point, but it's far from ideal).
>>>>> There are also no exactly-once semantics (updateStateByKey can
>>>>> achieve these semantics, but is not practical if you have any significant
>>>>> amount of state because it does so by dumping the entire state on every
>>>>> checkpoint).
>>>>>
>>>>> There are also some minor drawbacks that I'm sure will be fixed
>>>>> quickly, like no task timeout, not being able to read from Kafka using
>>>>> multiple nodes, and a data loss hazard with Kafka.
>>>>>
>>>>> It's also not possible to attain very low latency in Spark, if that's
>>>>> what you need.
>>>>>
>>>>> The pro for Spark is the concise and IMO more intuitive syntax,
>>>>> especially if you compare it with Storm's Java API.
>>>>>
>>>>> I admit I might be a bit biased towards Storm though, as I'm more familiar
>>>>> with it.
>>>>>
>>>>> Also, you can do some processing with Kinesis. If all you need to do
>>>>> is straightforward transformation and you are reading from Kinesis to
>>>>> begin with, it might be an easier option to just do the transformation in
>>>>> Kinesis.
>>>>>
>>>>> On Wed, Jun 17, 2015 at 7:15 AM, Sabarish Sasidharan <
>>>>> sabarish.sasidha...@manthan.com> wrote:
>>>>>
>>>>> Whatever you write in bolts would be the logic you want to apply on
>>>>> your events. In Spark, that logic would be coded in map() or similar such
>>>>> transformations and/or actions. Spark doesn't enforce a structure for
>>>>> capturing your processing logic like Storm does.
>>>>> Regards
>>>>> Sab
>>>>>
>>>>> Probably overloading the question a bit.
>>>>>
>>>>> In Storm, Bolts have the functionality of getting triggered on events.
>>>>> Is that kind of functionality possible with Spark Streaming? During each
>>>>> phase of the data processing, the transformed data is stored to the
>>>>> database and this transformed data should then be sent to a new pipeline
>>>>> for further processing.
>>>>>
>>>>> How can this be achieved using Spark?
>>>>>
>>>>> On Wed, Jun 17, 2015 at 10:10 AM, Spark Enthusiast <
>>>>> sparkenthusi...@yahoo.in> wrote:
>>>>>
>>>>> I have a use case where a stream of incoming events has to be
>>>>> aggregated and joined to create complex events. The aggregation will have
>>>>> to happen at an interval of 1 minute (or less).
>>>>>
>>>>> The pipeline is:
>>>>> send events
>>>>> enrich event
>>>>> Upstream services -------------------> KAFKA ---------> event Stream
>>>>> Processor ------------> Complex Event Processor ------------> Elastic
>>>>> Search.
>>>>>
>>>>> From what I understand, Storm will make a very good ESP and Spark
>>>>> Streaming will make a good CEP.
>>>>>
>>>>> But, we are also evaluating Storm with Trident.
>>>>>
>>>>> How does Spark Streaming compare with Storm with Trident?
>>>>>
>>>>> Sridhar Chellappa
>>>>>
>>>>> On Wednesday, 17 June 2015 10:02 AM, ayan guha <guha.a...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> I have a similar scenario where we need to bring data from Kinesis to
>>>>> HBase. Data velocity is 20k per 10 mins. A little manipulation of the data
>>>>> will be required, but that's regardless of the tool, so we will be writing
>>>>> that piece as a Java POJO.
>>>>> All env is on AWS. HBase is on a long-running EMR and Kinesis on a
>>>>> separate cluster.
>>>>> TIA.
>>>>> Best
>>>>> Ayan
>>>>> On 17 Jun 2015 12:13, "Will Briggs" <wrbri...@gmail.com> wrote:
>>>>>
>>>>> The programming models for the two frameworks are conceptually rather
>>>>> different; I haven't worked with Storm for quite some time, but based on
>>>>> my old experience with it, I would equate Spark Streaming more with
>>>>> Storm's Trident API, rather than with the raw Bolt API. Even then, there
>>>>> are significant differences, but it's a bit closer.
>>>>>
>>>>> If you can share your use case, we might be able to provide better
>>>>> guidance.
>>>>>
>>>>> Regards,
>>>>> Will
>>>>>
>>>>> On June 16, 2015, at 9:46 PM, asoni.le...@gmail.com wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> I am evaluating Spark (Spark Streaming) vs. Storm and I am not able
>>>>> to see what the equivalent of a Storm Bolt is in Spark.
>>>>>
>>>>> Any help on this will be appreciated.
>>>>>
>>>>> Thanks,
>>>>> Ashish
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
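
On the shared-state question in the quoted thread (running totals per phone
when events for the same phone can land on any node), here is a minimal
sketch of the per-key update idea in plain Python. It is not a Spark job:
run_batches is a local stand-in for the streaming engine, and the names
update_usage, bill and run_batches, along with the phone numbers, are
hypothetical. The update function has the same shape that Spark Streaming's
updateStateByKey expects (values seen in the current batch, previous state
or None), so keying the stream by phone number is what would bring all
events for one phone together. The rule coded here is the one from the
thread: the first 500 minutes are free, then $1 per minute.

```python
from collections import defaultdict

FREE_MINUTES = 500      # rule quoted in the thread: first 500 minutes are free
RATE_PER_MINUTE = 1.0   # and $1 per minute above that


def update_usage(new_minutes, total_minutes):
    """Fold one batch of call durations into the running total for one phone.

    Shape matches an updateStateByKey update function:
    (values seen in this batch, previous state or None) -> new state.
    """
    return (total_minutes or 0) + sum(new_minutes)


def bill(total_minutes):
    """Apply the billing rule to an accumulated minute count."""
    return max(0, total_minutes - FREE_MINUTES) * RATE_PER_MINUTE


def run_batches(batches):
    """Local stand-in for the streaming engine (assumption, not Spark itself):
    group each batch by key, then update that key's state."""
    state = {}
    for batch in batches:
        grouped = defaultdict(list)
        for phone, minutes in batch:
            grouped[phone].append(minutes)
        for phone, values in grouped.items():
            state[phone] = update_usage(values, state.get(phone))
    return state


if __name__ == "__main__":
    # Hypothetical events: (phone number, call minutes)
    batches = [
        [("555-0100", 300), ("555-0199", 20)],
        [("555-0100", 250), ("555-0100", 10)],
    ]
    totals = run_batches(batches)
    for phone, minutes in sorted(totals.items()):
        print(phone, minutes, bill(minutes))
```

Keeping only the minute total as state and deriving the bill from it on read
keeps the state small, which matters given the caveat raised in the thread
about updateStateByKey with a significant amount of state.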