Dear XQ and Ahmed,

We plan to use dynamic schemas and I tried to implement it by myself. But I
found maybe this is a blocker, Beam's Coder system generally requires a *fixed
schema* to serialize and deserialize data efficiently. When working with
dynamic schemas that are determined at runtime, this can become a blocker.
Is my understanding correct? Could you give some suggestions on how to
solve this?

Best regards,
Luyao

On Thu, Jan 23, 2025 at 10:01 AM XQ Hu <x...@google.com> wrote:

> +kaikua...@gmail.com explicitly. Please check the response.
>
> On Thu, Jan 23, 2025 at 10:45 AM Ahmed Abualsaud <
> ahmedabuals...@apache.org> wrote:
>
>> Hey Luyao!
>>
>> Thanks for reaching out. Beam's Iceberg connector supports dynamic
>> destinations so long as they have equivalent Schemas. There's no support
>> for dynamic schemas yet, i.e. there's no `getDataSchema(<destination>)`
>> method.
>>
>> There's a feature request on Github:
>> https://github.com/apache/beam/issues/33724.
>>
>> > I don't think Managed.write support with DynamicDestinations
>>
>> ManagedIO does support dynamic destinations, in the format described
>> here:
>> https://cloud.google.com/dataflow/docs/guides/managed-io#dynamic-destinations.
>> It also does not support dynamic schemas yet.
>>
>> Best,
>> Ahmed
>>
>> On 2025/01/23 04:02:57 Ling Li wrote:
>> > Hi team,
>> >
>> > Happy New Year! I have a question regarding
>> > the "org.apache.beam:beam-sdks-java-io-iceberg:2.61.0".
>> >
>> > Does the org.apache.beam.sdk.schemas.Schema getDataSchema(String
>> > destination) of class DynamicDestinations already exist?
>> >
>> > Our team tried to use DynamicDestinations to decide which iceberg table
>> to
>> > write in the runtime but failed.
>> >
>> >
>> > Attached is my code class DynamicIcebergDestinations. Our problem is
>> that
>> > we don't have a universal schema that can match all events to be
>> > implemented for getDataSchema(). We need to use a parameter to get the
>> > correct schema for each event, so we want to use: getDataSchema(String
>> > destination). But seems getDataSchema(String destination) is not
>> > implemented for org.apache.beam.sdk.io.iceberg.DynamicDestinations.
>> Since
>> > getDataSchema() will be automatically called and it cannot return a
>> > universal schema, there is no way for us to
>> > use IcebergIO.writeRows(icebergCatalogConfig).to(dynamicDestinations). I
>> > also tried to use managed I/O connector, but I don't think
>> > Managed.write support
>> > with DynamicDestinations. Could you give some urgent guidance?
>> >
>> > Thank you so much.
>> >
>> > Sincerely,
>> > Luyao
>> >
>>
>

Reply via email to