Thanks Alexey for sharing the tickets and code. I found one workaround that
lets me use the update function: if I generate a different KafkaIO step name
for each submission and provide

--transformNameMapping='{"Kafka_IO_3242/Read(KafkaUnboundedSource)/DataflowRunner.StreamingUnboundedRead.ReadWithIds":""}'

my update command succeeds. I have a question about it: is there any way to
prevent data loss? I believe that when I provide that transformNameMapping,
DF destroys the previous KafkaIO step along with its state. Does
.commitOffsetsInFinalize() help me prevent data loss? I am OK with small
data duplication.
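
For reference, a minimal sketch of what the workaround looks like in code
(the step name suffix, broker address, topic, and group id below are
illustrative placeholders, not my actual values):

import com.google.common.collect.ImmutableMap;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaReadWorkaround {
  public static void main(String[] args) {
    // Fresh step name on every submission, so --transformNameMapping can
    // map the previous name to "" (i.e. tell Dataflow it was deleted).
    String stepName = "Kafka_IO_" + System.currentTimeMillis();

    Pipeline pipeline = Pipeline.create();
    pipeline.apply(stepName,
        KafkaIO.<String, String>read()
            .withBootstrapServers("broker-1:9092")
            .withTopic("my-topic")
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // Commit consumed offsets back to Kafka when bundles are
            // finalized, so a re-created source can resume from the
            // consumer group's committed offsets instead of from scratch.
            .withConsumerConfigUpdates(
                ImmutableMap.of(ConsumerConfig.GROUP_ID_CONFIG, "my-group"))
            .commitOffsetsInFinalize());
    pipeline.run();
  }
}

My understanding is that with offsets committed under a stable group id,
the replacement KafkaIO step would resume near where the destroyed one
stopped; since commits happen on bundle finalization, some records may be
re-read, which matches the small duplication I can tolerate.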

Thanks

On Mon, Jun 29, 2020 at 10:17 AM Alexey Romanenko <aromanenko....@gmail.com>
wrote:

> Yes, it’s a known limitation [1], mostly due to the fact that KafkaIO.Read
> is based on the UnboundedSource API and fetches all information about
> topics and partitions only once, during the "split" phase [2]; a simplified
> sketch follows the links below. There is ongoing work to make KafkaIO.Read
> based on Splittable DoFn [3], which should allow getting topic/partition
> information dynamically, without restarting the pipeline.
>
> [1] https://issues.apache.org/jira/browse/BEAM-727
> [2]
> https://github.com/apache/beam/blob/8a54c17235f768f089b36265d79e69ee9b27a2ce/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedSource.java#L54
> [3] https://issues.apache.org/jira/browse/BEAM-9977
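>
> For context, a simplified sketch of what that split phase boils down to
> (condensed, not the exact source; `topics` and `consumerConfig` here stand
> in for the values carried by the read spec):
>
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Map;
> import org.apache.kafka.clients.consumer.Consumer;
> import org.apache.kafka.clients.consumer.KafkaConsumer;
> import org.apache.kafka.common.PartitionInfo;
> import org.apache.kafka.common.TopicPartition;
> import org.apache.kafka.common.serialization.ByteArrayDeserializer;
>
> // The partition list is resolved once, at submission time, and each
> // split pins a fixed subset of it for the lifetime of the pipeline.
> static List<TopicPartition> listPartitionsOnce(
>     List<String> topics, Map<String, Object> consumerConfig) {
>   List<TopicPartition> partitions = new ArrayList<>();
>   try (Consumer<byte[], byte[]> consumer = new KafkaConsumer<>(
>       consumerConfig, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
>     for (String topic : topics) {
>       for (PartitionInfo pi : consumer.partitionsFor(topic)) {
>         partitions.add(new TopicPartition(pi.topic(), pi.partition()));
>       }
>     }
>   }
>   // Partitions are then assigned to the desired number of splits; topics
>   // or partitions added later are never re-discovered, hence the restart.
>   return partitions;
> }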
>
>
> On 29 Jun 2020, at 18:14, Talat Uyarer <tuya...@paloaltonetworks.com>
> wrote:
>
> Hi,
>
> I am using Dataflow. When I update my DF job with a new topic, or update
> the partition count of an existing topic, I cannot use DF's update
> function. Could you help me understand why the update function does not
> work in those cases?
>
> I checked the Beam source code, but I could not find the right place in
> the code to read.
>
> Thanks for your help in advance
> Talat
>
>
>
