Thanks, Alexey, for sharing the tickets and code. I found a workaround that lets me use the update function: if I generate a different KafkaIO step name for each submission and provide --transformNameMapping='{"Kafka_IO_3242/Read(KafkaUnboundedSource)/DataflowRunner.StreamingUnboundedRead.ReadWithIds":""}', my update command succeeds. I have a question about it: is there any way to prevent data loss? I believe that when I provide that transformNameMapping, Dataflow destroys the previous KafkaIO step along with its state. Does .commitOffsetsInFinalize() help me prevent data loss? I am OK with a small amount of data duplication.
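For clarity, here is a rough sketch of what my pipeline does (the broker, topic, and group.id values below are placeholders, not my actual job's config):

import java.util.Collections;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaReadWorkaround {
  public static void main(String[] args) {
    Pipeline pipeline =
        Pipeline.create(PipelineOptionsFactory.fromArgs(args).withValidation().create());

    // Generate a different KafkaIO step name for each submission, so that
    // transformNameMapping can map the previous step to "" (i.e. drop it).
    String stepName = "Kafka_IO_" + System.currentTimeMillis();

    pipeline.apply(stepName,
        KafkaIO.<String, String>read()
            .withBootstrapServers("broker-1:9092") // placeholder
            .withTopic("my-topic")                 // placeholder
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // A group.id is required for committing offsets back to Kafka.
            .withConsumerConfigUpdates(
                Collections.<String, Object>singletonMap(
                    ConsumerConfig.GROUP_ID_CONFIG, "my-group")) // placeholder
            // Commit consumed offsets to Kafka when output is finalized, so a
            // replacement job can resume from the committed offsets instead
            // of "latest" (at the cost of some duplicates).
            .commitOffsetsInFinalize());

    pipeline.run();
  }
}

Then, when re-launching, I pass something like --update --jobName=<same-job-name> --transformNameMapping='{"Kafka_IO_3242/Read(KafkaUnboundedSource)/DataflowRunner.StreamingUnboundedRead.ReadWithIds":""}', where Kafka_IO_3242 is the step name generated by the previous submission.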
Thanks

On Mon, Jun 29, 2020 at 10:17 AM Alexey Romanenko <aromanenko....@gmail.com> wrote:

> Yes, it's a known limitation [1], mostly due to the fact that KafkaIO.Read
> is based on the UnboundedSource API and fetches all information about the
> topic and partitions only once, during a "split" phase [2]. There is
> ongoing work to base KafkaIO.Read on Splittable DoFn [3], which should
> allow it to get topic/partition information dynamically, without
> restarting the pipeline.
>
> [1] https://issues.apache.org/jira/browse/BEAM-727
> [2] https://github.com/apache/beam/blob/8a54c17235f768f089b36265d79e69ee9b27a2ce/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedSource.java#L54
> [3] https://issues.apache.org/jira/browse/BEAM-9977
>
>
> On 29 Jun 2020, at 18:14, Talat Uyarer <tuya...@paloaltonetworks.com> wrote:
>
> Hi,
>
> I am using Dataflow. When I update my DF job with a new topic, or update
> the partition count on the same topic, I cannot use Dataflow's update
> function. Could you help me understand why the update function cannot be
> used for that?
>
> I checked the Beam source code but could not find the right place to read
> up on this.
>
> Thanks for your help in advance,
> Talat
>
>
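P.S. To make the "split once" limitation concrete, this is my understanding of it as a self-contained sketch (not Beam's actual code): the partition lookup below happens a single time at submission, so partitions added later are never picked up.

import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SplitOnceSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker-1:9092"); // placeholder
    props.put("key.deserializer", StringDeserializer.class.getName());
    props.put("value.deserializer", StringDeserializer.class.getName());

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      // This lookup runs once, analogous to the source's "split" phase.
      List<TopicPartition> partitions = consumer.partitionsFor("my-topic").stream()
          .map(p -> new TopicPartition(p.topic(), p.partition()))
          .collect(Collectors.toList());
      // The fixed list is then distributed across the source splits and is
      // never refreshed while the pipeline runs.
      System.out.println("Partitions frozen at startup: " + partitions);
    }
  }
}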