Thanks. I will reconsider my architecture.

On Thu, Jun 24, 2021 at 1:37 AM Arvid Heise <ar...@apache.org> wrote:

> Hi,
>
> getTargetTopic can really be used as Curt needs it, so it would be good
> to add it to PyFlink as well.
>
> However, I'm a bit skeptical that Kafka can really handle that model
> well. It's usually encouraged to use fewer, larger topics, and you'd
> rather use partitions here instead of topics. There is a huge overhead
> in managing topics, and even some hard limits that can be reached more
> quickly than it initially appears.
>
> Of course, if you want to use ACLs, afaik you cannot define them on
> partitions, so you probably have no choice but to do it your way (or
> consider an alternative architecture).
>
> It's probably also fine if you just have THE one table that you need to
> divide, but it's not a pattern that scales to more tables or to an
> ever-growing number of tenants. Partitions scale much, much better.
>
> Also, from painful past experience: you should absolutely create all
> topics beforehand on your production env and really use getTargetTopic
> just for routing.
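>
> To make the pre-creation concrete, it could look roughly like this with
> the kafka-python package (a minimal sketch, not tested; the broker
> address, naming scheme, partition count, and replication factor are
> placeholders):
>
>     from kafka.admin import KafkaAdminClient, NewTopic
>
>     admin = KafkaAdminClient(bootstrap_servers='broker:9092')
>     # Pre-create one topic per tenant so the running job only routes to
>     # existing topics instead of relying on broker-side auto-creation.
>     topics = [NewTopic(name='tenant%d.sink_topic' % i,
>                        num_partitions=1,
>                        replication_factor=3)
>               for i in range(1, 2001)]
>     admin.create_topics(new_topics=topics)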
>
> On Thu, Jun 24, 2021 at 8:10 AM Dian Fu <dian0511...@gmail.com> wrote:
>
>> OK, got it. Then it seems that splitting streams is also not quite
>> suitable to address your requirements, as you would still need to
>> iterate over each of the side outputs corresponding to each topic.
>>
>> Regarding getTargetTopic [1], per my understanding it's not designed to
>> dynamically assign the topic for an element, according to the
>> documentation. However, I'm not 100% sure about this and maybe I missed
>> something.
>>
>> [1] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/KafkaContextAware.java#L48
>>
>> Regards,
>> Dian
>>
>> On Jun 24, 2021, at 11:45 AM, Curt Buechter <tricksho...@gmail.com> wrote:
>>
>> Hi Dian,
>> Thanks for the reply.
>> I don't think a filter function makes sense here. I have 2,000 tenants
>> in the source database, and I want all records for a single tenant in a
>> tenant-specific topic. So, with a filter function, if I understand it
>> correctly, I would need 2,000 different filters, which isn't very
>> practical.
>>
>> An example:
>> source_topic (tenant_id, first_name, last_name)
>>
>> destination:
>> tenant1.sink_topic (first_name, last_name)
>> tenant2.sink_topic (first_name, last_name)
>> ...
>> tenant2000.sink_topic (first_name, last_name)
>>
>> On 2021/06/24 03:18:36, Dian Fu <d...@gmail.com> wrote:
>> > You are right that split is still not supported. Does it make sense
>> > for you to split the stream using a filter function? There is some
>> > overhead compared to the built-in stream.split, as you need to
>> > provide a filter function for each sub-stream, and so a record will
>> > be evaluated multiple times.
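>> >
>> > For illustration, a rough sketch of that workaround with the PyFlink
>> > DataStream API (untested; it assumes stream is a DataStream of
>> > Debezium JSON strings, and the broker address and topic names are
>> > placeholders):
>> >
>> >     import json
>> >
>> >     from pyflink.common.serialization import SimpleStringSchema
>> >     from pyflink.datastream.connectors import FlinkKafkaProducer
>> >
>> >     producer_config = {'bootstrap.servers': 'broker:9092'}
>> >
>> >     for tenant_id in ['tenant%d' % i for i in range(1, 2001)]:
>> >         # One filter + one Kafka sink per tenant; every record is
>> >         # evaluated by every filter, which is the overhead mentioned
>> >         # above. The default argument pins tenant_id in each lambda.
>> >         sub_stream = stream.filter(
>> >             lambda raw, t=tenant_id: json.loads(raw)['tenant_id'] == t)
>> >         sub_stream.add_sink(FlinkKafkaProducer(
>> >             topic='%s.sink_topic' % tenant_id,
>> >             serialization_schema=SimpleStringSchema(),
>> >             producer_config=producer_config))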
>> >
>> > > On Jun 24, 2021, at 3:08 AM, Curt Buechter <tr...@gmail.com> wrote:
>> > >
>> > > Hi,
>> > > New PyFlink user here. Loving it so far. The first major problem
>> > > I've run into is that I cannot create a Kafka producer with dynamic
>> > > topics. I see that this has been available for quite some time in
>> > > Java with keyed serialization, using the getTargetTopic method.
>> > > Another way to do this in Java may be with stream.split(), adding a
>> > > different sink for each of the split streams. Stream splitting is
>> > > also not available in PyFlink.
>> > > Am I missing anything? Has anyone implemented this before in
>> > > PyFlink, or does anyone know of a way to make it happen?
>> > > The use case here is that I'm using a CDC Debezium connector to
>> > > populate Kafka topics from a multi-tenant database, and I'm trying
>> > > to use PyFlink to split the records into a different topic for each
>> > > tenant.
>> > >
>> > > Thanks