So far I'm happy with the performance of converting Beam rows directly to proto and saving to BQ. But if a field in the Beam schema is nullable yet you cannot actually set it to null, that's a bug, right?
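(For reference, a minimal sketch of such a Row-to-proto conversion. The proto class MyTableProto and the field names "name" and "tags" are hypothetical; the point is that a repeated proto field has no null state, so a null Beam array can only end up as an empty repeated field, in line with GoogleSQL's NULL-array behavior.)

import java.util.Collection;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.Row;
// Hypothetical precompiled proto generated from a .proto mirroring the BQ table.
import com.example.protos.MyTableProto;

/** Converts a Beam Row to the precompiled proto used for the BigQuery write. */
class RowToMyTableProto extends SimpleFunction<Row, MyTableProto> {
  @Override
  public MyTableProto apply(Row row) {
    MyTableProto.Builder builder = MyTableProto.newBuilder();

    // Scalar field: only set when present, since proto3 scalars cannot hold null.
    String name = row.getString("name");
    if (name != null) {
      builder.setName(name);
    }

    // Array field: a repeated proto field cannot be null, so a null Beam array
    // is simply left as an empty repeated field.
    Collection<String> tags = row.getArray("tags");
    if (tags != null) {
      builder.addAllTags(tags);
    }
    return builder.build();
  }
}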
On Tue, Oct 1, 2024 at 10:07 AM [email protected] <[email protected]> wrote:

> Also, is there a tool to generate a proto from an existing BQ table?
>
> On Tue, Oct 1, 2024 at 10:06 AM [email protected] <[email protected]> wrote:
>
>> Yes, but there are transformations designed for PCollection<Row> that we use before writing to BQ.
>>
>> On Tue, Oct 1, 2024 at 9:58 AM Reuven Lax via user <[email protected]> wrote:
>>
>>> Beam's schema transforms should work with protocol buffers as well. Beam automatically infers the type of proto and efficiently calls the accessors (assuming that these are precompiled protos from a .proto file). If the proto matches the BigQuery schema, you can use writeProto and skip the entire conversion stage.
>>>
>>> On Tue, Oct 1, 2024 at 8:47 AM [email protected] <[email protected]> wrote:
>>>
>>>> Well, I'm trying to build something as cost-effective as possible. I was trying to convert Row to TableRow and use the writeTableRows function, but it's too expensive. From the profiler, it seems Row-to-TableRow conversion is expensive, but from the source code I also see it's possible to write Beam rows directly to BigQuery.
>>>>
>>>> Do you guys have any suggestions? I can try to use writeProto, but then I don't get the benefit of all the built-in transformations that are designed for the Beam row format.
>>>>
>>>> On Tue, Oct 1, 2024 at 8:13 AM Reuven Lax via user <[email protected]> wrote:
>>>>
>>>>> Can you explain what you are trying to do here? BigQuery requires the schema to be known before we write. Beam schemas similarly must be known at graph construction time - though this isn't quite the same as Java compile time.
>>>>>
>>>>> Reuven
>>>>>
>>>>> On Tue, Oct 1, 2024 at 12:44 AM [email protected] <[email protected]> wrote:
>>>>>
>>>>>> I mean, how do I create an empty list if the element type is unknown at compile time?
>>>>>>
>>>>>> On Tue, Oct 1, 2024 at 12:42 AM [email protected] <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks @Ahmed Abualsaud <[email protected]>, but how do I get around this error for now if I want to use the Beam schema?
>>>>>>>
>>>>>>> On Mon, Sep 30, 2024 at 4:31 PM Ahmed Abualsaud via user <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hey Siyuan,
>>>>>>>>
>>>>>>>> We use the descriptor because it is derived from the BQ table's schema in a previous step [1]. We are essentially checking against the table schema.
>>>>>>>> You're seeing this error because *nullable* and *repeated* modes are mutually exclusive. I think we can reduce friction, though, by defaulting null values to an empty list, which seems to be in line with GoogleSQL's behavior [2].
>>>>>>>>
>>>>>>>> Opened a PR for this: https://github.com/apache/beam/pull/32604. Hopefully we can get this in for the upcoming Beam version 2.60.0.
>>>>>>>>
>>>>>>>> For now, you can work around this by converting your null array values to empty lists.
>>>>>>>>
>>>>>>>> [1] https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsBeamRow.java#L66-L67
>>>>>>>> [2] https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_nulls
>>>>>>>>
>>>>>>>> On Mon, Sep 30, 2024 at 6:57 PM [email protected] <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I'm trying to write Beam rows directly to BigQuery because it goes through less conversion and is more efficient, but there is a weird error happening.
>>>>>>>>> A nullable array field would throw
>>>>>>>>>
>>>>>>>>> Caused by: java.lang.IllegalArgumentException: Received null value for non-nullable field
>>>>>>>>>
>>>>>>>>> if I set null for that field.
>>>>>>>>>
>>>>>>>>> Here is the related code I found in Beam:
>>>>>>>>>
>>>>>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProto.java#L277
>>>>>>>>>
>>>>>>>>> private static Object messageValueFromRowValue(
>>>>>>>>>     FieldDescriptor fieldDescriptor, Field beamField, int index, Row row) {
>>>>>>>>>   @Nullable Object value = row.getValue(index);
>>>>>>>>>   if (value == null) {
>>>>>>>>>     if (fieldDescriptor.isOptional()) {
>>>>>>>>>       return null;
>>>>>>>>>     } else {
>>>>>>>>>       throw new IllegalArgumentException(
>>>>>>>>>           "Received null value for non-nullable field " + fieldDescriptor.getName());
>>>>>>>>>     }
>>>>>>>>>   }
>>>>>>>>>   return toProtoValue(fieldDescriptor, beamField.getType(), value);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> On line 277, why not use beamField.isNullable() instead of fieldDescriptor.isOptional()? If it's using the Beam schema, it should stick to the nullable setting on the Beam schema field, correct?
>>>>>>>>>
>>>>>>>>> And how do I avoid this?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Siyuan
>>>>>>>>
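(A minimal sketch of the workaround Ahmed suggests above, i.e. replacing null array values with empty lists before the write. It uses only the public Row/Schema APIs; the transform name is made up.)

import java.util.Collections;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.Schema.Field;
import org.apache.beam.sdk.schemas.Schema.TypeName;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.Row;

/** Rebuilds each Row with null array fields replaced by empty lists. */
class NullArraysToEmptyLists extends SimpleFunction<Row, Row> {
  @Override
  public Row apply(Row row) {
    Schema schema = row.getSchema();
    Row.Builder builder = Row.withSchema(schema);
    for (int i = 0; i < schema.getFieldCount(); i++) {
      Field field = schema.getField(i);
      Object value = row.getValue(i);
      if (value == null && field.getType().getTypeName() == TypeName.ARRAY) {
        // An empty list is valid for any element type, so the element type does not
        // need to be known at compile time; Collections.emptyList() is enough.
        builder.addValue(Collections.emptyList());
      } else {
        builder.addValue(value);
      }
    }
    return builder.build();
  }
}

Apply it with MapElements.via(new NullArraysToEmptyLists()) right before the BigQueryIO write, and call setRowSchema(schema) on the resulting PCollection so the Row coder is preserved. This also answers the "empty list when the element type is unknown at compile time" question above: an empty list needs no element type.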

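(And a rough sketch of the writeProto path Reuven mentions, assuming the same hypothetical MyTableProto whose fields match the destination table. The entry-point name used here, BigQueryIO.writeProtos, and the Storage Write API settings are assumptions to check against the BigQueryIO javadoc for your Beam version.)

import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.values.PCollection;
// Hypothetical precompiled proto matching the BigQuery table schema.
import com.example.protos.MyTableProto;

class WriteProtosExample {
  // `protos` would be produced upstream, e.g. by the Row-to-proto conversion sketched earlier.
  static void writeToBigQuery(PCollection<MyTableProto> protos) {
    protos.apply(
        "WriteProtosToBQ",
        BigQueryIO.writeProtos(MyTableProto.class)
            .to("my-project:my_dataset.my_table")
            .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
  }
}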