Hi Navina & Jagadish,

Your understanding of my use case is correct. Please find my inline comments below in response to your questions.

Thank you very much for your inputs on my use case; we will definitely consider your adapter concept so that we make small batched JDBC calls instead of one call per event.
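For our own reference, a rough sketch of the batched JDBC write we have in mind is below. The table name, columns, and buffering shape are made-up placeholders for illustration, not our actual schema:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.Map;

    public class BatchWriter {
        // Flush a buffer of accumulated rows to the reporting DB in one batch,
        // instead of one INSERT round trip per consumed event.
        public static void flush(Connection conn, Map<String, String> rows)
                throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO report_table (id, payload) VALUES (?, ?)")) {
                for (Map.Entry<String, String> row : rows.entrySet()) {
                    ps.setString(1, row.getKey());
                    ps.setString(2, row.getValue());
                    ps.addBatch();     // queue the row locally; no round trip yet
                }
                ps.executeBatch();     // one round trip for the whole batch
            }
            rows.clear();
        }
    }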





~~Manohar

-----Original Message-----
From: Navina Ramesh [mailto:nram...@linkedin.com.INVALID]
Sent: Wednesday, March 2, 2016 12:42 PM
To: dev@samza.apache.org
Subject: Re: Regarding my use case to explore with Samza-



Hi Manohar,

On a side note regarding your use-case, I have a question.

After consuming the DML changes from the kafka topic, why do you have to query 
back? Are you trying to decorate the event or perform some kind of join?

[Manohar] We are performing joins. The events carry only the primary-table keys, so to get the whole data set we perform a set of joins.





The point I am trying to make is that if you perform a remote lookup for every 
event you consume, it's going to be hard to stay "realtime" (then again, what 
counts as realtime really depends on your SLA).



Instead, I would suggest that you have an adapter that periodically takes a 
snapshot of the entire table and pushes it to another topic in Kafka (not sure 
how hard it is going to be to write an adapter). This way, when your job starts, 
it can partition and cache the entire data set in the Samza task (by using 
RocksDb with changelog, as Jagadish suggested). Samza provides a "bootstrap" 
stream option that is read during job-startup until no more messages are 
available. You can configure your snapshot stream to be a bootstrap stream, 
essentially.
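
Roughly, the configuration might look something like the sketch below. The system, stream, and store names here are placeholders I made up, assuming your Kafka system is called "kafka":

    # Read the snapshot topic as a bootstrap stream, from the oldest offset,
    # before any messages from the DML change topic are processed.
    task.inputs=kafka.dml-changes,kafka.table-snapshot
    systems.kafka.streams.table-snapshot.samza.bootstrap=true
    systems.kafka.streams.table-snapshot.samza.reset.offset=true
    systems.kafka.streams.table-snapshot.samza.offset.default=oldest

    # Local RocksDB store backed by a changelog topic, as Jagadish suggested.
    stores.table-cache.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
    stores.table-cache.changelog=kafka.table-cache-changelog
    stores.table-cache.key.serde=string
    stores.table-cache.msg.serde=json
    serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
    serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory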

Once your job is "bootstrapped", you can process events by looking up the 
local partitioned store rather than the remote store. Please note that the DML 
change topic and the snapshot data set need to be partitioned with the same 
key; otherwise, the lookups won't work correctly.
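
For instance, the snapshot adapter could key each record by the same primary key the DML topic uses, so that the two topics end up co-partitioned (this also assumes both topics have the same partition count). A minimal sketch with the Kafka producer API, with a placeholder topic name and values:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SnapshotPublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                String primaryKey = "42";             // placeholder row key
                String rowAsJson = "{\"id\": 42}";    // placeholder row snapshot
                // Keying by the primary key puts the snapshot record in the same
                // partition as the DML events for that key.
                producer.send(new ProducerRecord<>("table-snapshot", primaryKey, rowAsJson));
            }
        }
    }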





Another alternative is to make a remote call to fetch the data set and cache it 
locally with rocksdb. This is much simpler to implement; however, it depends on 
how you configure your cache, and the job will only get close to "realtime" 
once the cache has warmed up.
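
A minimal sketch of that cache-on-miss pattern with the StreamTask API is below; the store name and the JDBC lookup are placeholders:

    import org.apache.samza.config.Config;
    import org.apache.samza.storage.kv.KeyValueStore;
    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.task.InitableTask;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskContext;
    import org.apache.samza.task.TaskCoordinator;

    public class LookupTask implements StreamTask, InitableTask {
        private KeyValueStore<String, String> cache;

        @Override
        @SuppressWarnings("unchecked")
        public void init(Config config, TaskContext context) {
            // "table-cache" must be defined in config under stores.table-cache.*
            cache = (KeyValueStore<String, String>) context.getStore("table-cache");
        }

        @Override
        public void process(IncomingMessageEnvelope envelope,
                            MessageCollector collector,
                            TaskCoordinator coordinator) throws Exception {
            String key = (String) envelope.getKey();
            String row = cache.get(key);
            if (row == null) {
                row = fetchFromRdbms(key);  // remote JDBC call only on a cache miss
                cache.put(key, row);        // the changelog restores this after restarts
            }
            // ... join/transform 'row' here and write to the reporting DB ...
        }

        private String fetchFromRdbms(String key) {
            // Placeholder for a JDBC SELECT against the source RDBMS.
            return "";
        }
    }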



Hope my suggestions make sense. Apologies if I have misunderstood your 
use-case.



Feel free to ask any questions you may have.



Cheers!

Navina



On Tue, Mar 1, 2016 at 10:43 PM, Jagadish Venkatraman
<jagadish1...@gmail.com> wrote:



> Please take a look at the hello-world example. You can implement your
> business logic in the process() callback.
>
> What kind of transformation are you doing? Are you doing a group
> by/count style aggregation to generate the report? If so, you could
> use the embedded rocksdb store in Samza and potentially batch your
> writes to the database.
>
> How many QPS do you process at peak? Do you expect to buffer any state
> per message? What's the ratio of input to output messages on average?
>
> There's nothing that stops you from using JDBC and Samza.
>

> On Tue, Mar 1, 2016 at 8:58 PM, Manohar Reddy
> <manohar.re...@happiestminds.com> wrote:

>

> > Hello Team,
> >
> > We are part of a services company and are trying to explore the
> > available real-time streaming technologies, so one of the first
> > options we are trying is Samza.
> > Let me briefly explain my use case:
> >
> > We are trying to build a real-time reporting dashboard for the
> > e-learning domain.
> > The input to this dashboard is an RDBMS. Whenever any DML
> > (inserts/updates/deletes) hits the source RDBMS, an adapter
> > immediately publishes the table name and primary keys to Kafka
> > in JSON format.
> > Samza then has to consume the Kafka event and query back to the
> > source RDBMS, using the JSON event information, to get the whole
> > data set from the related tables.
> > It next performs some transformations per the business logic and
> > loads the result into the target (reporting) RDBMS.
> > More or less we are handling a few JDBC calls through Samza, and
> > the daily data load is small, at most 2 GB, but we need a real-time
> > processing ecosystem in place.
> > That's my use case in brief, so team, please provide your inputs on
> > how we can approach this requirement with Samza. Is there any
> > utility API in Samza for JDBC calls?
> >
> > Thank you very much in advance.
> >
> > ~~Manohar

> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University







--
Navina R.
