Unfortunately we don't have a way to generate the SQL Java code on the fly,
and even if we did, that wouldn't solve your problem. I believe our
recommended practice is to run both the old and new pipeline for some time,
then pick a window boundary at which to transition the output from the old
pipeline to the new one.

Beam doesn't handle changing the format of data sent between intermediate
steps in a running pipeline. Beam uses "coders" to serialize data between
steps of the pipeline. The built-in coders (including the Schema Row Coder
used by SQL) have a fixed data format and don't handle schema evolution;
they are optimized for performance at all costs.
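To illustrate why a fixed-format coder can't survive schema evolution, here
is a toy sketch in plain Java (not Beam's actual RowCoder; the class name,
field layout, and schema versions are all hypothetical). Fields are written
at fixed positions with no tags or version markers, so bytes encoded under
the v1 schema are misread the moment a field is inserted:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a fixed-format coder: fields are written in a
// fixed order with no field tags and no version info.
public class FixedFormatCoder {

    // v1 schema: (id: long, name: string)
    static byte[] encodeV1(long id, String name) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeLong(id);
            out.writeUTF(name);
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decoding with the same schema the bytes were written under works fine.
    static String decodeV1(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            return in.readLong() + "/" + in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // v2 schema inserted a field: (id: long, score: double, name: string).
    // Decoding v1 bytes with the v2 layout misreads the stream; here it
    // runs off the end of the buffer, and we surface that as null.
    static String decodeV2(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            long id = in.readLong();
            double score = in.readDouble(); // actually consumes part of the name
            return id + "/" + score + "/" + in.readUTF();
        } catch (IOException e) {
            return null; // no version marker, so no way to recover
        }
    }
}
```

A self-describing or versioned format (Avro, for example, resolves the
reader schema against the writer schema) can tolerate this kind of change;
a purely positional coder cannot, which is why both sides of a step must
agree on the exact byte format.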

Even if you worked around this, the Beam model doesn't support changing the
structure of the pipeline graph of a running job, which would significantly
limit the changes you could make. It would also require some changes to Beam
SQL so that it produces the same plan for an updated SQL query.

Andrew

On Mon, Dec 7, 2020 at 5:44 PM Talat Uyarer <tuya...@paloaltonetworks.com>
wrote:

> Hi,
>
> We are using Beam SQL in our pipeline. Our data is written in Avro format.
> We generate our rows based on our Avro schema. Over time the schema
> changes. I believe Beam SQL generates Java code based on the BeamSchema we
> define when submitting the pipeline. Do you have any idea how we can
> handle schema changes without resubmitting our Beam job? Is it possible to
> generate the SQL Java code on the fly?
>
> Thanks
>
