Your observation is correct. If you use inner KStream-KTable join, the join will implement the filter automatically as the join will not return any result.
-Matthias On 4/30/17 7:23 AM, Michal Borowiecki wrote: > I have something working on the same principle (except not using > connect), that is, I put ids to filter on into a ktable and then (inner) > join a kstream with that ktable. > > I don't believe the value can be null though. In a changlog null value > is interpreted as a delete so won't be put into a ktable. > > The RocksDB store, for one, does this: > > private void putInternal(byte[] rawKey, byte[] rawValue) { > if (rawValue == null) { > try { > db.delete(wOptions, rawKey); > > But any non-null value would do. > Please correct me if miss-understood. > > Cheers, > MichaĆ > > On 27/04/17 22:44, Matthias J. Sax wrote: >>>> I'd like to avoid repeated trips to the db, and caching a large amount of >>>> data in memory. >> Lookups to the DB would be hard to get done anyway. Ie, it would not >> perform well, as all your calls would need to be synchronous... >> >> >>>> Is it possible to send a message w/ the id as the partition key to a topic, >>>> and then use the same id as the key, so the same node which will receive >>>> the data for an id is the one which will process it? >> That is what I did propose (maybe it was not clear). If you use Connect, >> you can just import the ID into Kafka and leave the value empty (ie, >> null). This reduced you cache data to a minimum. And the KStream-KTable >> join work as you described it :) >> >> >> -Matthias >> >> On 4/27/17 2:37 PM, Ali Akhtar wrote: >>> I'd like to avoid repeated trips to the db, and caching a large amount of >>> data in memory. >>> >>> Is it possible to send a message w/ the id as the partition key to a topic, >>> and then use the same id as the key, so the same node which will receive >>> the data for an id is the one which will process it? >>> >>> >>> On Fri, Apr 28, 2017 at 2:32 AM, Matthias J. Sax <matth...@confluent.io> >>> wrote: >>> >>>> The recommended solution would be to use Kafka Connect to load you DB >>>> data into a Kafka topic. >>>> >>>> With Kafka Streams you read your db-topic as KTable and do a (inne) >>>> KStream-KTable join to lookup the IDs. >>>> >>>> >>>> -Matthias >>>> >>>> On 4/27/17 2:22 PM, Ali Akhtar wrote: >>>>> I have a Kafka topic which will receive a large amount of data. >>>>> >>>>> This data has an 'id' field. I need to look up the id in an external db, >>>>> see if we are tracking that id, and if yes, we process that message, if >>>>> not, we ignore it. >>>>> >>>>> 99% of the data will be for ids which are not being tracked - 1% or so >>>> will >>>>> be for ids which are tracked. >>>>> >>>>> My concern is, that there'd be a lot of round trips to the db made just >>>> to >>>>> check the id, and if it'd be better to cache the ids being tracked >>>>> somewhere, so other ids are ignored. >>>>> >>>>> I was considering sending a message to another (or the same topic) >>>> whenever >>>>> a new id is added to the track list, and that id should then get >>>> processed >>>>> on the node which will process the messages. >>>>> >>>>> Should I just cache all ids on all nodes (which may be a large amount), >>>> or >>>>> is there a way to only cache the id on the same kafka streams node which >>>>> will receive data for that id? >>>>> >>>> > > -- > Signature > <http://www.openbet.com/> Michal Borowiecki > Senior Software Engineer L4 > T: +44 208 742 1600 > > > +44 203 249 8448 > > > > E: michal.borowie...@openbet.com > W: www.openbet.com <http://www.openbet.com/> > > > OpenBet Ltd > > Chiswick Park Building 9 > > 566 Chiswick High Rd > > London > > W4 5XT > > UK > > > <https://www.openbet.com/email_promo> > > This message is confidential and intended only for the addressee. If you > have received this message in error, please immediately notify the > postmas...@openbet.com <mailto:postmas...@openbet.com> and delete it > from your system as well as any copies. The content of e-mails as well > as traffic data may be monitored by OpenBet for employment and security > purposes. To protect the environment please do not print this e-mail > unless necessary. OpenBet Ltd. Registered Office: Chiswick Park Building > 9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A company > registered in England and Wales. Registered no. 3134634. VAT no. > GB927523612 >
signature.asc
Description: OpenPGP digital signature