Your observation is correct.

If you use an inner KStream-KTable join, the join implements the filter
automatically: it returns no result for stream records whose key has no
entry in the KTable.
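
To make the two points above concrete, here is a minimal plain-Java model of
the semantics. It is a sketch only and does not use the Kafka Streams API: the
class and method names (JoinFilterSketch, TrackedIds, innerJoin) and the ids
are made up for illustration. In the Streams DSL the corresponding call is
KStream#join(KTable, ValueJoiner).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinFilterSketch {

    /** Toy stand-in for a KTable's state: applies changelog records to a map. */
    static final class TrackedIds {
        private final Map<String, String> store = new HashMap<>();

        // A null value is a tombstone: it deletes the key instead of storing it.
        void apply(String key, String value) {
            if (value == null) {
                store.remove(key);
            } else {
                store.put(key, value);
            }
        }

        String get(String key) {
            return store.get(key);
        }
    }

    /** Inner join: emit a stream record only if its key is present in the table. */
    static List<String> innerJoin(List<String[]> stream, TrackedIds table) {
        List<String> out = new ArrayList<>();
        for (String[] record : stream) {
            String id = record[0];
            String payload = record[1];
            if (table.get(id) != null) {
                out.add(id + ":" + payload);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TrackedIds table = new TrackedIds();
        table.apply("id-1", "tracked"); // tracked: any non-null marker value works
        table.apply("id-2", null);      // tombstone: id-2 never enters the table

        List<String[]> stream = new ArrayList<>();
        stream.add(new String[] {"id-1", "a"});
        stream.add(new String[] {"id-2", "b"});
        stream.add(new String[] {"id-3", "c"});

        System.out.println(innerJoin(stream, table)); // prints [id-1:a]
    }
}
```

Only the record for id-1 survives: id-2 was written with a null value and so
was never stored, and id-3 was never written at all.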


-Matthias



On 4/30/17 7:23 AM, Michal Borowiecki wrote:
> I have something working on the same principle (except not using
> connect), that is, I put ids to filter on into a ktable and then (inner)
> join a kstream with that ktable.
> 
> I don't believe the value can be null though. In a changelog a null value
> is interpreted as a delete, so it won't be put into a ktable.
> 
> The RocksDB store, for one, does this:
> 
> private void putInternal(byte[] rawKey, byte[] rawValue) {
>     if (rawValue == null) {
>         try {
>             db.delete(wOptions, rawKey);
> 
> But any non-null value would do.
> Please correct me if I misunderstood.
> 
> Cheers,
> Michał
> 
> On 27/04/17 22:44, Matthias J. Sax wrote:
>>>> I'd like to avoid repeated trips to the db, and caching a large amount of
>>>> data in memory.
>> Lookups to the DB would be hard to get done anyway. Ie, it would not
>> perform well, as all your calls would need to be synchronous...
>>
>>
>>>> Is it possible to send a message w/ the id as the partition key to a topic,
>>>> and then use the same id as the key, so the same node which will receive
>>>> the data for an id is the one which will process it?
>> That is what I did propose (maybe it was not clear). If you use Connect,
>> you can just import the ID into Kafka and leave the value empty (ie,
>> null). This reduces your cached data to a minimum. And the KStream-KTable
>> join works as you described it :)
>>
>>
>> -Matthias
>>
>> On 4/27/17 2:37 PM, Ali Akhtar wrote:
>>> I'd like to avoid repeated trips to the db, and caching a large amount of
>>> data in memory.
>>>
>>> Is it possible to send a message w/ the id as the partition key to a topic,
>>> and then use the same id as the key, so the same node which will receive
>>> the data for an id is the one which will process it?
>>>
>>>
>>> On Fri, Apr 28, 2017 at 2:32 AM, Matthias J. Sax <matth...@confluent.io>
>>> wrote:
>>>
>>>> The recommended solution would be to use Kafka Connect to load your DB
>>>> data into a Kafka topic.
>>>>
>>>> With Kafka Streams you read your db-topic as a KTable and do an (inner)
>>>> KStream-KTable join to look up the IDs.
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 4/27/17 2:22 PM, Ali Akhtar wrote:
>>>>> I have a Kafka topic which will receive a large amount of data.
>>>>>
>>>>> This data has an 'id' field. I need to look up the id in an external db,
>>>>> see if we are tracking that id, and if yes, we process that message, if
>>>>> not, we ignore it.
>>>>>
>>>>> 99% of the data will be for ids which are not being tracked - 1% or so
>>>>> will be for ids which are tracked.
>>>>>
>>>>> My concern is, that there'd be a lot of round trips to the db made just
>>>>> to check the id, and if it'd be better to cache the ids being tracked
>>>>> somewhere, so other ids are ignored.
>>>>>
>>>>> I was considering sending a message to another (or the same topic)
>>>>> whenever a new id is added to the track list, and that id should then
>>>>> get processed on the node which will process the messages.
>>>>>
>>>>> Should I just cache all ids on all nodes (which may be a large amount),
>>>>> or is there a way to only cache the id on the same kafka streams node
>>>>> which will receive data for that id?
>>>>>
>>>>
> 
