Re: Implementing a non-key in Kafka Streams using the Processor API

Eno Thereska Tue, 21 Feb 2017 11:17:47 -0800

+1 on seeing what Jan did; I'm interested too.

Thanks
Eno


> On 21 Feb 2017, at 19:15, Guozhang Wang <wangg...@gmail.com> wrote:
> 
> Jan,
> 
> Sure, I would love to hear what you did for non-key joins. Last time we
> chatted, there was a discussion of the ordering issue: we HAVE TO
> augment the join result stream's keys to be a combination of both keys,
> which may not be elegant to use in the DSL.
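A minimal plain-Java sketch (no Kafka dependency) of why a combined key helps. It assumes the ordering issue is the case where a child moves from parent A to parent B, and the retraction of its old join result is applied after the new result, for example because the two updates were produced on different partitions. CombinedKey, CombinedKeyDemo and the update sequence are hypothetical, and the result table is modelled as a plain map:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical result key combining the child's key and the parent's key.
final class CombinedKey {
    final String childKey;
    final String parentKey;

    CombinedKey(String childKey, String parentKey) {
        this.childKey = childKey;
        this.parentKey = parentKey;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CombinedKey)) return false;
        CombinedKey other = (CombinedKey) o;
        return childKey.equals(other.childKey) && parentKey.equals(other.parentKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(childKey, parentKey);
    }

    @Override
    public String toString() {
        return childKey + "|" + parentKey;
    }
}

class CombinedKeyDemo {
    // One join-result update; value == null means "delete" (a tombstone).
    static final class Update {
        final String childKey, parentKey, value;

        Update(String childKey, String parentKey, String value) {
            this.childKey = childKey;
            this.parentKey = parentKey;
            this.value = value;
        }
    }

    // Assumed scenario: child c1 moved from parent A to parent B, and the new result
    // (c1 with B) is applied BEFORE the retraction of the old result (c1 with A).
    static final List<Update> OUT_OF_ORDER = List.of(
            new Update("c1", "B", "c1+B"),
            new Update("c1", "A", null));

    // Apply updates to a result table, deriving each result key with keyOf.
    static <K> Map<K, String> apply(List<Update> updates, Function<Update, K> keyOf) {
        Map<K, String> table = new HashMap<>();
        for (Update u : updates) {
            if (u.value == null) {
                table.remove(keyOf.apply(u));
            } else {
                table.put(keyOf.apply(u), u.value);
            }
        }
        return table;
    }

    static Map<String, String> byChildKey(List<Update> updates) {
        return apply(updates, u -> u.childKey);
    }

    static Map<CombinedKey, String> byCombinedKey(List<Update> updates) {
        return apply(updates, u -> new CombinedKey(u.childKey, u.parentKey));
    }

    public static void main(String[] args) {
        System.out.println(byChildKey(OUT_OF_ORDER));    // {} : the late delete wiped the valid row
        System.out.println(byCombinedKey(OUT_OF_ORDER)); // {c1|B=c1+B}
    }
}
```

Keyed by the child alone, the late tombstone removes the valid row; keyed by (child, parent), it only removes the stale (c1, A) row. The price is that consumers of the result see a compound key instead of the original table key.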
> 
> For your proposed solution, it seems you did not do that in the DSL but at
> the PAPI layer, right?
> 
> Guozhang
> 
> On Tue, Feb 21, 2017 at 6:05 AM, Jan Filipiak <jan.filip...@trivago.com>
> wrote:
> 
>> Just a little note here:
>> 
>> If you can fit all rows of the "children" table for each key in memory,
>> you can get away with using group_by and making a list of them. With this
>> aggregation the join is straightforward, and you can use a lateral view
>> later to get to the same result. This way you can use the current DSL to
>> a greater extent.
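Jan's suggestion can be sketched in plain Java, assuming an in-memory model rather than real KTables: group the children by their foreign key into a list, join that list with the parent table on the parent key, then explode the list back into one row per child (the "lateral view" step). Child, groupByParent and joinAndFlatten are hypothetical names. In the DSL this would presumably correspond to KTable#groupBy plus KGroupedTable#aggregate for the list, a regular key-based KTable join, and toStream() followed by flatMapValues, but that mapping is an assumption, not a tested topology.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupThenJoinSketch {
    // Hypothetical child row: its own key, the parent it points to, and a payload.
    static final class Child {
        final String key, parentKey, payload;

        Child(String key, String parentKey, String payload) {
            this.key = key;
            this.parentKey = parentKey;
            this.payload = payload;
        }
    }

    // Step 1: "group_by" the foreign key, collecting all children of a parent in a list.
    // This is the step that assumes all children of one parent fit in memory.
    static Map<String, List<Child>> groupByParent(List<Child> children) {
        Map<String, List<Child>> grouped = new LinkedHashMap<>();
        for (Child c : children) {
            grouped.computeIfAbsent(c.parentKey, k -> new ArrayList<>()).add(c);
        }
        return grouped;
    }

    // Step 2: ordinary key-to-key (inner) join of parents with the grouped children.
    // Step 3: "lateral view" / explode: one output row per (parent, child) pair.
    static List<String> joinAndFlatten(Map<String, String> parents, Map<String, List<Child>> grouped) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, List<Child>> e : grouped.entrySet()) {
            String parent = parents.get(e.getKey());
            if (parent == null) continue; // no parent: inner join drops these children
            for (Child c : e.getValue()) {
                rows.add(c.key + ": " + parent + " / " + c.payload);
            }
        }
        return rows;
    }

    static List<String> demo() {
        Map<String, String> parents = new LinkedHashMap<>();
        parents.put("p1", "Alice");
        parents.put("p2", "Bob");
        List<Child> children = List.of(
                new Child("o1", "p1", "order-1"),
                new Child("o2", "p1", "order-2"),
                new Child("o3", "p2", "order-3"),
                new Child("o4", "p9", "order-4")); // p9 does not exist
        return joinAndFlatten(parents, groupByParent(children));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [o1: Alice / order-1, o2: Alice / order-2, o3: Bob / order-3]
    }
}
```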
>> 
>> Best Jan
>> 
>> On 21.02.2017 13:10, Frank Lyaruu wrote:
>> 
>>> I've read that JIRA (although I don't understand every single detail), and
>>> I got the feeling it is not exactly the same problem.
>>> I am aware of Global Tables, and I tried them first, but I seem
>>> unable to do what I need to do.
>>> 
>>> I'm replicating a relational database, and on a one-to-many relationship
>>> I'd like to publish a joined message if either of the source streams
>>> receives an update.
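The behaviour Frank describes can be modelled in a single process, assuming one store per side plus a reverse index from parent key to child keys, so that a parent update can find the children that reference it. This is a hypothetical sketch only: BothSidesTriggerJoin and its methods are illustrative names, it ignores partitioning (the hard part in Kafka Streams), and it does not emit retractions when a child moves to another parent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical single-process model of a one-to-many join in which an update on
// EITHER side emits fresh joined rows.
class BothSidesTriggerJoin {
    private final Map<String, String> parents = new HashMap<>();
    private final Map<String, String> childValues = new HashMap<>();
    private final Map<String, String> childToParent = new HashMap<>();
    private final Map<String, Set<String>> parentToChildren = new HashMap<>();
    final List<String> emitted = new ArrayList<>();

    // Child side: (re)point the child at a parent and emit its joined row.
    void updateChild(String childKey, String parentKey, String value) {
        String oldParent = childToParent.put(childKey, parentKey);
        if (oldParent != null && !oldParent.equals(parentKey)) {
            parentToChildren.get(oldParent).remove(childKey); // keep the reverse index accurate
        }
        parentToChildren.computeIfAbsent(parentKey, k -> new LinkedHashSet<>()).add(childKey);
        childValues.put(childKey, value);
        emit(childKey);
    }

    // Parent side: re-emit the joined row of every child that references this parent.
    void updateParent(String parentKey, String value) {
        parents.put(parentKey, value);
        for (String childKey : parentToChildren.getOrDefault(parentKey, Set.of())) {
            emit(childKey);
        }
    }

    private void emit(String childKey) {
        String parent = parents.get(childToParent.get(childKey));
        if (parent != null) { // inner-join semantics: nothing until both sides exist
            emitted.add(childKey + " -> " + parent + " / " + childValues.get(childKey));
        }
    }

    public static void main(String[] args) {
        BothSidesTriggerJoin join = new BothSidesTriggerJoin();
        join.updateChild("o1", "p1", "order-1"); // no parent yet: nothing emitted
        join.updateParent("p1", "Alice");        // parent arrival triggers the join
        join.updateChild("o2", "p1", "order-2"); // child arrival triggers the join
        join.updateParent("p1", "Alice Smith");  // parent update re-emits both children
        join.emitted.forEach(System.out::println);
    }
}
```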
>>> 
>>> In the Global Table Wiki:
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-99%3A+Add+Global+Tables+to+Kafka+Streams
>>> 
>>> I see this:
>>> "The GlobalKTable will only be used for doing lookups. That is, data
>>> arriving in the GlobalKTable will not trigger the join. "
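For contrast, a hypothetical in-memory model of the lookup-only behaviour quoted above: stream-side records (here the children) look up the current table value by foreign key, while table-side updates (here the parents) only change state and never emit. LookupOnlyJoin and its methods are illustrative names, not the GlobalKTable API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of lookup-only join semantics.
class LookupOnlyJoin {
    private final Map<String, String> table = new HashMap<>();
    final List<String> emitted = new ArrayList<>();

    // Table side (e.g. the parent rows): update state, emit nothing.
    void updateTable(String key, String value) {
        table.put(key, value);
    }

    // Stream side (e.g. the child rows): look up the current table value by the
    // record's foreign key and emit only if a match exists (inner join).
    void onStreamRecord(String key, String foreignKey, String value) {
        String match = table.get(foreignKey);
        if (match != null) {
            emitted.add(key + " -> " + match + " / " + value);
        }
    }

    public static void main(String[] args) {
        LookupOnlyJoin join = new LookupOnlyJoin();
        join.updateTable("p1", "Alice");
        join.onStreamRecord("o1", "p1", "order-1"); // emits, using "Alice"
        join.updateTable("p1", "Alice Smith");      // emits nothing
        System.out.println(join.emitted);           // [o1 -> Alice / order-1]
    }
}
```

In this model a parent-side change never produces a new joined message, which is the gap described here.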
>>> 
>>> So how would I go about doing this?
>>> regards, Frank
>>> 
>>> 
>>> 
>>> On Tue, Feb 21, 2017 at 10:38 AM, Eno Thereska <eno.there...@gmail.com>
>>> wrote:
>>> 
>>>> Hi Frank,
>>>> 
>>>> As far as I know, the design in that wiki has been superseded by the
>>>> Global KTables design, which is now coming in 0.10.2. The same applies
>>>> to the JIRAs mentioned there (like KAFKA-3705). There are some extensive
>>>> comments in https://issues.apache.org/jira/browse/KAFKA-3705
>>>> illustrating why this design is particularly challenging and why Global
>>>> KTables was chosen instead. I'm not sure you still want to pursue that
>>>> original design, since it is not proven to work.
>>>> 
>>>> Guozhang, perhaps we need to add a note saying that Global KTables is the
>>>> new design?
>>>> 
>>>> Thanks
>>>> Eno
>>>> 
>>>> On 21 Feb 2017, at 07:35, Frank Lyaruu <flya...@gmail.com> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I'm trying to implement joining two Kafka tables using a 'remote' key,
>>>>> basically as described here:
>>>>> 
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Discussion%3A+Non-key+KTable-KTable+Joins
>>>> 
>>>>> Under the "Implementation Details" there is one line I don't know how to
>>>>> do:
>>>>> 
>>>>>   1. First of all, we will repartition this KTable's stream, by key
>>>>>      computed from the *mapper(K, V) → K1*, so that it is co-partitioned
>>>>>      by the same key. The co-partition topic is partitioned on the new
>>>>>      key, but the message key and value are unchanged, and log compaction
>>>>>      is turned off.
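The repartition step can be sketched in plain Java, assuming a hypothetical partition function based on String.hashCode (Kafka's default producer partitioner hashes the serialized key with murmur2, which this does not reproduce). The point is that the partition is chosen from K1 = mapper(K, V), while the record that is written keeps its original key and value. As far as I know, the closest 0.10.x building block is a custom StreamPartitioner passed to KStream#through, writing to a topic created beforehand with cleanup.policy=delete (compaction off); treat that as an assumption to check against the docs. For real co-partitioning, the partition function must also agree with however the other table's topic is partitioned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

class RepartitionByForeignKey {
    // A record as read from the source KTable: original key and value.
    static final class Record {
        final String key;
        final String value;

        Record(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Hypothetical mapper(K, V) -> K1: values are assumed to look like "parentKey,payload".
    static final BiFunction<String, String, String> PARENT_OF_VALUE =
            (key, value) -> value.split(",", 2)[0];

    // Hypothetical partition function, for illustration only.
    static int partitionFor(String k1, int numPartitions) {
        return Math.floorMod(k1.hashCode(), numPartitions);
    }

    // Route each record by K1, but write the record itself unchanged.
    static List<List<Record>> repartition(List<Record> records,
                                          BiFunction<String, String, String> mapper,
                                          int numPartitions) {
        List<List<Record>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Record r : records) {
            String k1 = mapper.apply(r.key, r.value);
            partitions.get(partitionFor(k1, numPartitions)).add(r); // key and value untouched
        }
        return partitions;
    }

    static List<Record> sample() {
        return List.of(
                new Record("o1", "p1,order-1"),
                new Record("o2", "p1,order-2"),
                new Record("o3", "p2,order-3"));
    }

    public static void main(String[] args) {
        List<List<Record>> partitions = repartition(sample(), PARENT_OF_VALUE, 4);
        for (int i = 0; i < partitions.size(); i++) {
            for (Record r : partitions.get(i)) {
                System.out.println("partition " + i + ": " + r.key + " = " + r.value);
            }
        }
    }
}
```

Both children of p1 land in the partition that partitionFor("p1", 4) selects, so a parent record keyed "p1" that is partitioned the same way would be processed by the same task.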
>>>>> 
>>>>> 
>>>>> How do I do that? I've been unable to find any documentation. I've
>>>>> looked at the StreamPartitionAssignor, which seems relevant, but I
>>>>> could use some help. Does anyone have an example?
>>>>> 
>>>>> regards, Frank
>>>>> 
>>>> 
>>>> 
>> 
> 
> 
> -- 
> -- Guozhang
