Hi Andrew, I assume the SQL query itself is not going to change. What changes is the Row schema, by adding new columns or renaming existing ones. If we keep version information somewhere, for example as a KV pair where the key is the schema information and the value is the Row, couldn't we generate the SQL code? The reason I am asking: we have 15k pipelines. When we have a schema change, we restart 15k Dataflow jobs, which is painful. I am looking for a possible way to avoid the job restarts. Do you think it is still not doable?
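To make the KV idea concrete, here is a minimal sketch (not Beam code, all names hypothetical): each record carries a schema version, and a registry maps versions to field lists so a reader on the newest schema can still decode older records:

```python
import json

# Hypothetical schema registry: version -> ordered field names.
SCHEMA_REGISTRY = {
    1: ["user_id", "amount"],
    2: ["user_id", "amount", "currency"],  # column added in v2
}

def encode(version, row_values):
    # Tag each record with its schema version so readers can evolve.
    return json.dumps({"v": version, "row": row_values})

def decode(payload, target_version=2):
    msg = json.loads(payload)
    writer_fields = SCHEMA_REGISTRY[msg["v"]]
    target_fields = SCHEMA_REGISTRY[target_version]
    row = dict(zip(writer_fields, msg["row"]))
    # Columns missing from older schema versions come back as None.
    return {f: row.get(f) for f in target_fields}

# A v1 record decoded against the v2 schema:
# decode(encode(1, [42, 9.5]))
# -> {"user_id": 42, "amount": 9.5, "currency": None}
```

The open question is whether Beam SQL could consume something like this, since the generated Java code and the Row coder are fixed at submission time.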
Thanks

On Mon, Dec 7, 2020 at 6:10 PM Andrew Pilloud <apill...@google.com> wrote:

> Unfortunately we don't have a way to generate the SQL Java code on the
> fly, and even if we did, that wouldn't solve your problem. I believe our
> recommended practice is to run both the old and the new pipeline for some
> time, then pick a window boundary to transition the output from the old
> pipeline to the new one.
>
> Beam doesn't handle changing the format of data sent between intermediate
> steps in a running pipeline. Beam uses "coders" to serialize data between
> steps of the pipeline. The built-in coders (including the schema Row coder
> used by SQL) have a fixed data format and don't handle schema evolution.
> They are optimized for performance at all costs.
>
> Even if you worked around this, the Beam model doesn't support changing
> the structure of the pipeline graph. This would significantly limit the
> changes you could make. It would also require some changes to SQL to try
> to produce the same plan for an updated SQL query.
>
> Andrew
>
> On Mon, Dec 7, 2020 at 5:44 PM Talat Uyarer <tuya...@paloaltonetworks.com>
> wrote:
>
>> Hi,
>>
>> We are using BeamSQL in our pipeline. Our data is written in Avro format,
>> and we generate our Rows based on our Avro schema. Over time the schema
>> changes. I believe Beam SQL generates Java code based on the BeamSchema we
>> define when submitting the pipeline. Do you have any idea how we can
>> handle schema changes without resubmitting our Beam job? Is it possible
>> to generate the SQL Java code on the fly?
>>
>> Thanks
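Andrew's point about fixed-format coders can be illustrated with a toy sketch (again, not Beam's actual Row coder, just an analogous positional encoding): fields are packed by position with no schema metadata, so a reader built against the old schema cannot parse bytes written with the new one:

```python
import struct

# Toy fixed-format "coder": fields packed positionally, no schema metadata,
# mimicking the fixed-layout assumption of a schema Row coder.
def encode_v1(user_id, amount):
    return struct.pack(">qd", user_id, amount)   # 8-byte int, 8-byte float

def decode_v1(data):
    return struct.unpack(">qd", data)

def encode_v2(user_id, amount, currency_code):
    # v2 adds a field; the byte layout is now incompatible with v1 readers.
    return struct.pack(">qdi", user_id, amount, currency_code)

# decode_v1(encode_v2(...)) raises struct.error: the v1 reader expects
# exactly 16 bytes, but the v2 payload is 20 bytes.
```

This is why an in-place schema change mid-pipeline corrupts or rejects records, and why the recommendation is to drain on a window boundary and cut over to a new job.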