Ah. Sorry.

You are right. Nevertheless, you can set an non-null dummy value like
`byte[0]` instead of the actual "tuple" to not blow up your storage
requirement.


-Matthias


On 4/30/17 10:24 AM, Michal Borowiecki wrote:
> Apologies, I must have not made myself clear.
> 
> I meant the values in the records coming from the input topic (which in
> turn are coming from kafka connect in the example at hand)
> 
> and not the records coming out of the join.
> 
> My intention was to warn against sending null values from kafka connect
> to the topic that is then meant to be read-in as a ktable to filter against.
> 
> 
> Am I clearer now?
> 
> 
> Cheers,
> 
> Michał
> 
> 
> On 30/04/17 18:14, Matthias J. Sax wrote:
>> Your observation is correct.
>>
>> If  you use inner KStream-KTable join, the join will implement the
>> filter automatically as the join will not return any result.
>>
>>
>> -Matthias
>>
>>
>>
>> On 4/30/17 7:23 AM, Michal Borowiecki wrote:
>>> I have something working on the same principle (except not using
>>> connect), that is, I put ids to filter on into a ktable and then (inner)
>>> join a kstream with that ktable.
>>>
>>> I don't believe the value can be null though. In a changlog null value
>>> is interpreted as a delete so won't be put into a ktable.
>>>
>>> The RocksDB store, for one, does this:
>>>
>>> private void putInternal(byte[] rawKey, byte[] rawValue) {
>>>     if (rawValue == null) {
>>>         try {
>>>             db.delete(wOptions, rawKey);
>>>
>>> But any non-null value would do.
>>> Please correct me if miss-understood.
>>>
>>> Cheers,
>>> Michał
>>>
>>> On 27/04/17 22:44, Matthias J. Sax wrote:
>>>>>> I'd like to avoid repeated trips to the db, and caching a large amount of
>>>>>> data in memory.
>>>> Lookups to the DB would be hard to get done anyway. Ie, it would not
>>>> perform well, as all your calls would need to be synchronous...
>>>>
>>>>
>>>>>> Is it possible to send a message w/ the id as the partition key to a 
>>>>>> topic,
>>>>>> and then use the same id as the key, so the same node which will receive
>>>>>> the data for an id is the one which will process it?
>>>> That is what I did propose (maybe it was not clear). If you use Connect,
>>>> you can just import the ID into Kafka and leave the value empty (ie,
>>>> null). This reduced you cache data to a minimum. And the KStream-KTable
>>>> join work as you described it :)
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 4/27/17 2:37 PM, Ali Akhtar wrote:
>>>>> I'd like to avoid repeated trips to the db, and caching a large amount of
>>>>> data in memory.
>>>>>
>>>>> Is it possible to send a message w/ the id as the partition key to a 
>>>>> topic,
>>>>> and then use the same id as the key, so the same node which will receive
>>>>> the data for an id is the one which will process it?
>>>>>
>>>>>
>>>>> On Fri, Apr 28, 2017 at 2:32 AM, Matthias J. Sax <matth...@confluent.io>
>>>>> wrote:
>>>>>
>>>>>> The recommended solution would be to use Kafka Connect to load you DB
>>>>>> data into a Kafka topic.
>>>>>>
>>>>>> With Kafka Streams you read your db-topic as KTable and do a (inne)
>>>>>> KStream-KTable join to lookup the IDs.
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>> On 4/27/17 2:22 PM, Ali Akhtar wrote:
>>>>>>> I have a Kafka topic which will receive a large amount of data.
>>>>>>>
>>>>>>> This data has an 'id' field. I need to look up the id in an external db,
>>>>>>> see if we are tracking that id, and if yes, we process that message, if
>>>>>>> not, we ignore it.
>>>>>>>
>>>>>>> 99% of the data will be for ids which are not being tracked - 1% or so
>>>>>> will
>>>>>>> be for ids which are tracked.
>>>>>>>
>>>>>>> My concern is, that there'd be a lot of round trips to the db made just
>>>>>> to
>>>>>>> check the id, and if it'd be better to cache the ids being tracked
>>>>>>> somewhere, so other ids are ignored.
>>>>>>>
>>>>>>> I was considering sending a message to another (or the same topic)
>>>>>> whenever
>>>>>>> a new id is added to the track list, and that id should then get
>>>>>> processed
>>>>>>> on the node which will process the messages.
>>>>>>>
>>>>>>> Should I just cache all ids on all nodes (which may be a large amount),
>>>>>> or
>>>>>>> is there a way to only cache the id on the same kafka streams node which
>>>>>>> will receive data for that id?
>>>>>>>
>>> -- 
>>> Signature
>>> <http://www.openbet.com/>   Michal Borowiecki
>>> Senior Software Engineer L4
>>>     T:      +44 208 742 1600
>>>
>>>     
>>>     +44 203 249 8448
>>>
>>>     
>>>      
>>>     E:      michal.borowie...@openbet.com
>>>     W:      www.openbet.com <http://www.openbet.com/>
>>>
>>>     
>>>     OpenBet Ltd
>>>
>>>     Chiswick Park Building 9
>>>
>>>     566 Chiswick High Rd
>>>
>>>     London
>>>
>>>     W4 5XT
>>>
>>>     UK
>>>
>>>     
>>> <https://www.openbet.com/email_promo>
>>>
>>> This message is confidential and intended only for the addressee. If you
>>> have received this message in error, please immediately notify the
>>> postmas...@openbet.com <mailto:postmas...@openbet.com> and delete it
>>> from your system as well as any copies. The content of e-mails as well
>>> as traffic data may be monitored by OpenBet for employment and security
>>> purposes. To protect the environment please do not print this e-mail
>>> unless necessary. OpenBet Ltd. Registered Office: Chiswick Park Building
>>> 9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A company
>>> registered in England and Wales. Registered no. 3134634. VAT no.
>>> GB927523612
>>>
> 
> -- 
> Signature
> <http://www.openbet.com/>     Michal Borowiecki
> Senior Software Engineer L4
>       T:      +44 208 742 1600
> 
>       
>       +44 203 249 8448
> 
>       
>        
>       E:      michal.borowie...@openbet.com
>       W:      www.openbet.com <http://www.openbet.com/>
> 
>       
>       OpenBet Ltd
> 
>       Chiswick Park Building 9
> 
>       566 Chiswick High Rd
> 
>       London
> 
>       W4 5XT
> 
>       UK
> 
>       
> <https://www.openbet.com/email_promo>
> 
> This message is confidential and intended only for the addressee. If you
> have received this message in error, please immediately notify the
> postmas...@openbet.com <mailto:postmas...@openbet.com> and delete it
> from your system as well as any copies. The content of e-mails as well
> as traffic data may be monitored by OpenBet for employment and security
> purposes. To protect the environment please do not print this e-mail
> unless necessary. OpenBet Ltd. Registered Office: Chiswick Park Building
> 9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A company
> registered in England and Wales. Registered no. 3134634. VAT no.
> GB927523612
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to