Ah. Sorry. You are right. Nevertheless, you can set a non-null dummy value like `byte[0]` instead of the actual "tuple" so that your storage requirements do not blow up.
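To illustrate the point about null versus dummy values, here is a minimal plain-Java sketch. It deliberately does not use the Kafka Streams API: a `HashMap` stands in for the KTable's store, and the class `TrackedIdFilterSketch` and its method names are made up for this example. It only mimics the two behaviours discussed in this thread, namely that a null value acts as a delete and that an inner join drops stream records whose key is not in the table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the KStream-KTable filter; not Kafka Streams code.
public class TrackedIdFilterSketch {
    // Non-null placeholder stored for every tracked id; empty, so it costs almost nothing.
    public static final byte[] DUMMY = new byte[0];

    // Plays the role of the KTable's state store (assumption: a simple in-memory map).
    private final Map<String, byte[]> table = new HashMap<>();

    // Apply one record from the id topic: a null value is treated as a delete,
    // any non-null value as an insert or update.
    public void applyTableRecord(String id, byte[] value) {
        if (value == null) {
            table.remove(id);
        } else {
            table.put(id, value);
        }
    }

    // Inner join of one stream record against the table: a result is produced
    // only when the id is present, so untracked ids are filtered out.
    public Optional<String> join(String id, String payload) {
        return table.containsKey(id) ? Optional.of(id + "=" + payload) : Optional.empty();
    }

    public static void main(String[] args) {
        TrackedIdFilterSketch filter = new TrackedIdFilterSketch();
        filter.applyTableRecord("id-1", DUMMY); // tracked: stored with the dummy value
        filter.applyTableRecord("id-2", null);  // null value: nothing is stored
        System.out.println(filter.join("id-1", "payload-a")); // Optional[id-1=payload-a]
        System.out.println(filter.join("id-2", "payload-b")); // Optional.empty
    }
}
```

In a real topology the same idea would be expressed with an inner KStream-KTable join, with the id topic carrying the dummy value for each tracked id.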
-Matthias

On 4/30/17 10:24 AM, Michal Borowiecki wrote:
> Apologies, I must not have made myself clear.
>
> I meant the values in the records coming from the input topic (which in
> turn are coming from kafka connect in the example at hand)
>
> and not the records coming out of the join.
>
> My intention was to warn against sending null values from kafka connect
> to the topic that is then meant to be read-in as a ktable to filter against.
>
> Am I clearer now?
>
> Cheers,
>
> Michał
>
> On 30/04/17 18:14, Matthias J. Sax wrote:
>> Your observation is correct.
>>
>> If you use inner KStream-KTable join, the join will implement the
>> filter automatically as the join will not return any result.
>>
>> -Matthias
>>
>> On 4/30/17 7:23 AM, Michal Borowiecki wrote:
>>> I have something working on the same principle (except not using
>>> connect), that is, I put ids to filter on into a ktable and then (inner)
>>> join a kstream with that ktable.
>>>
>>> I don't believe the value can be null though. In a changelog, a null value
>>> is interpreted as a delete so won't be put into a ktable.
>>>
>>> The RocksDB store, for one, does this:
>>>
>>> private void putInternal(byte[] rawKey, byte[] rawValue) {
>>>     if (rawValue == null) {
>>>         try {
>>>             db.delete(wOptions, rawKey);
>>>
>>> But any non-null value would do.
>>> Please correct me if I misunderstood.
>>>
>>> Cheers,
>>> Michał
>>>
>>> On 27/04/17 22:44, Matthias J. Sax wrote:
>>>>>> I'd like to avoid repeated trips to the db, and caching a large amount of
>>>>>> data in memory.
>>>> Lookups to the DB would be hard to get done anyway. I.e., it would not
>>>> perform well, as all your calls would need to be synchronous...
>>>>
>>>>>> Is it possible to send a message w/ the id as the partition key to a topic,
>>>>>> and then use the same id as the key, so the same node which will receive
>>>>>> the data for an id is the one which will process it?
>>>> That is what I did propose (maybe it was not clear). If you use Connect,
>>>> you can just import the ID into Kafka and leave the value empty (i.e.,
>>>> null). This reduces your cached data to a minimum. And the KStream-KTable
>>>> join works as you described it :)
>>>>
>>>> -Matthias
>>>>
>>>> On 4/27/17 2:37 PM, Ali Akhtar wrote:
>>>>> I'd like to avoid repeated trips to the db, and caching a large amount of
>>>>> data in memory.
>>>>>
>>>>> Is it possible to send a message w/ the id as the partition key to a topic,
>>>>> and then use the same id as the key, so the same node which will receive
>>>>> the data for an id is the one which will process it?
>>>>>
>>>>> On Fri, Apr 28, 2017 at 2:32 AM, Matthias J. Sax <matth...@confluent.io>
>>>>> wrote:
>>>>>
>>>>>> The recommended solution would be to use Kafka Connect to load your DB
>>>>>> data into a Kafka topic.
>>>>>>
>>>>>> With Kafka Streams you read your db-topic as KTable and do an (inner)
>>>>>> KStream-KTable join to look up the IDs.
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>> On 4/27/17 2:22 PM, Ali Akhtar wrote:
>>>>>>> I have a Kafka topic which will receive a large amount of data.
>>>>>>>
>>>>>>> This data has an 'id' field. I need to look up the id in an external db,
>>>>>>> see if we are tracking that id, and if yes, we process that message, if
>>>>>>> not, we ignore it.
>>>>>>>
>>>>>>> 99% of the data will be for ids which are not being tracked - 1% or so will
>>>>>>> be for ids which are tracked.
>>>>>>>
>>>>>>> My concern is, that there'd be a lot of round trips to the db made just to
>>>>>>> check the id, and if it'd be better to cache the ids being tracked
>>>>>>> somewhere, so other ids are ignored.
>>>>>>>
>>>>>>> I was considering sending a message to another (or the same topic) whenever
>>>>>>> a new id is added to the track list, and that id should then get processed
>>>>>>> on the node which will process the messages.
>>>>>>>
>>>>>>> Should I just cache all ids on all nodes (which may be a large amount), or
>>>>>>> is there a way to only cache the id on the same kafka streams node which
>>>>>>> will receive data for that id?
>>>>>>>
>>> --
>>> Signature
>>> <http://www.openbet.com/> Michal Borowiecki
>>> Senior Software Engineer L4
>>> T: +44 208 742 1600
>>>
>>> +44 203 249 8448
>>>
>>> E: michal.borowie...@openbet.com
>>> W: www.openbet.com <http://www.openbet.com/>
>>>
>>> OpenBet Ltd
>>> Chiswick Park Building 9
>>> 566 Chiswick High Rd
>>> London
>>> W4 5XT
>>> UK
>>>
>>> <https://www.openbet.com/email_promo>
>>>
>>> This message is confidential and intended only for the addressee. If you
>>> have received this message in error, please immediately notify the
>>> postmas...@openbet.com <mailto:postmas...@openbet.com> and delete it
>>> from your system as well as any copies. The content of e-mails as well
>>> as traffic data may be monitored by OpenBet for employment and security
>>> purposes. To protect the environment please do not print this e-mail
>>> unless necessary. OpenBet Ltd. Registered Office: Chiswick Park Building
>>> 9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A company
>>> registered in England and Wales. Registered no. 3134634. VAT no.
>>> GB927523612