Thanks @Ahmed Abualsaud <[email protected]>

On Tue, Oct 1, 2024 at 10:22 AM Ahmed Abualsaud via user <[email protected]> wrote:
The PR got merged, so this is fixed for 2.60.0.

The workaround I mentioned would be to add a step before the BQ write step (maybe a MapElements or something similar) that goes through your Rows and overwrites null list values with empty lists.

Or if you like, you can also just use the most recent snapshot version.

On Tue, Oct 1, 2024 at 8:09 PM [email protected] <[email protected]> wrote:

So far I'm happy with the performance of converting Beam Rows directly to Proto and saving to BQ. But if a field is nullable in the Beam schema yet you cannot actually set it to null, it is a bug, right?

On Tue, Oct 1, 2024 at 10:07 AM [email protected] <[email protected]> wrote:

Also, is there a tool to generate a proto from an existing BQ table?

On Tue, Oct 1, 2024 at 10:06 AM [email protected] <[email protected]> wrote:

Yes, but there are transformations designed for PCollection<Row> that we use before writing to BQ.

On Tue, Oct 1, 2024 at 9:58 AM Reuven Lax via user <[email protected]> wrote:

Beam's schema transforms should work with protocol buffers as well. Beam automatically infers the type of the proto and efficiently calls the accessors (assuming these are precompiled protos from a .proto file). If the proto matches the BigQuery schema, you can use writeProto and skip the entire conversion stage.

On Tue, Oct 1, 2024 at 8:47 AM [email protected] <[email protected]> wrote:

Well, I'm trying to build something as cost-effective as possible. I was converting Row to TableRow and using the writeTableRows function, but it's too expensive. From the profiler, the Row-to-TableRow conversion seems to be the costly part, but from the source code I also see it's possible to write Beam Rows directly to BigQuery.

Do you guys have any suggestions? I can try to use writeProto, but then I don't get the benefit of all the built-in transformations designed for the Beam Row format.

On Tue, Oct 1, 2024 at 8:13 AM Reuven Lax via user <[email protected]> wrote:

Can you explain what you are trying to do here? BigQuery requires the schema to be known before we write. Beam schemas similarly must be known at graph construction time, though this isn't quite the same as Java compile time.

Reuven

On Tue, Oct 1, 2024 at 12:44 AM [email protected] <[email protected]> wrote:

I mean, how do I create an empty list if the element type is unknown at compile time?

On Tue, Oct 1, 2024 at 12:42 AM [email protected] <[email protected]> wrote:

Thanks @Ahmed Abualsaud <[email protected]>, but how do I get around this error for now if I want to use the Beam schema?

On Mon, Sep 30, 2024 at 4:31 PM Ahmed Abualsaud via user <[email protected]> wrote:

Hey Siyuan,

We use the descriptor because it is derived from the BQ table's schema in a previous step [1]. We are essentially checking against the table schema. You're seeing this error because *nullable* and *repeated* modes are mutually exclusive.
I think we can reduce friction, though, by defaulting null values to an empty list, which seems to be in line with GoogleSQL's behavior [2].

Opened a PR for this: https://github.com/apache/beam/pull/32604. Hopefully we can get this in for the upcoming Beam version 2.60.0.

For now, you can work around this by converting your null array values to empty lists.

[1] https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsBeamRow.java#L66-L67
[2] https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_nulls

On Mon, Sep 30, 2024 at 6:57 PM [email protected] <[email protected]> wrote:

I'm trying to write Beam Rows directly to BigQuery because it goes through fewer conversions and is more efficient, but there is a weird error: if I set a nullable array field to null, it throws

Caused by: java.lang.IllegalArgumentException: Received null value for non-nullable field

Here is the related code I found in Beam:

https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProto.java#L277

private static Object messageValueFromRowValue(
    FieldDescriptor fieldDescriptor, Field beamField, int index, Row row) {
  @Nullable Object value = row.getValue(index);
  if (value == null) {
    if (fieldDescriptor.isOptional()) {
      return null;
    } else {
      throw new IllegalArgumentException(
          "Received null value for non-nullable field " + fieldDescriptor.getName());
    }
  }
  return toProtoValue(fieldDescriptor, beamField.getType(), value);
}

At line 277, why not use beamField.isNullable() instead of fieldDescriptor.isOptional()? If it's using the Beam schema, it should stick to the nullable setting on the Beam schema field, correct?

And how do I avoid this?

Regards,
Siyuan
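To make the suggested workaround concrete, here is a minimal sketch of a pre-write step that walks each Row's schema and replaces null array values with empty lists. The class and step names are illustrative, and it assumes the PCollection<Row> already carries a schema known at construction time; because the schema describes the element type, an empty java.util.List works for any array field.

import java.util.Collections;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class NullArraysToEmptyLists {

  // Hypothetical helper: rebuild each Row, substituting an empty list wherever
  // a null value appears in an ARRAY-typed field. All other values pass through.
  public static PCollection<Row> apply(PCollection<Row> rows, Schema schema) {
    return rows
        .apply(
            "NullArraysToEmptyLists",
            MapElements.via(
                new SimpleFunction<Row, Row>() {
                  @Override
                  public Row apply(Row row) {
                    Row.Builder builder = Row.withSchema(schema);
                    for (int i = 0; i < schema.getFieldCount(); i++) {
                      Object value = row.getValue(i);
                      boolean isArray =
                          schema.getField(i).getType().getTypeName() == Schema.TypeName.ARRAY;
                      if (value == null && isArray) {
                        // The element type doesn't matter here: the schema already
                        // describes it, so an empty list satisfies the field.
                        builder.addValue(Collections.emptyList());
                      } else {
                        builder.addValue(value);
                      }
                    }
                    return builder.build();
                  }
                }))
        // Re-attach the schema so downstream steps (including the BQ write) see it.
        .setRowSchema(schema);
  }
}

If the pipeline also uses ITERABLE-typed fields, Schema.TypeName.ITERABLE can be handled the same way.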

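For completeness, a sketch of the direct Row write discussed in the thread, using BigQueryIO's useBeamSchema() with the Storage Write API. The method wrapper and table spec are placeholders, and the create/write dispositions are just one reasonable choice, not the only valid configuration.

import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class WriteRowsDirectly {

  // rows: a PCollection<Row> with a schema attached, e.g. the output of the
  // null-to-empty-list step sketched above.
  public static void writeToBigQuery(PCollection<Row> rows, String tableSpec) {
    rows.apply(
        "WriteRowsToBigQuery",
        BigQueryIO.<Row>write()
            .to(tableSpec) // e.g. "my-project:my_dataset.my_table" (placeholder)
            .useBeamSchema() // derive the table schema and row conversion from the Beam schema
            .withMethod(Method.STORAGE_WRITE_API)
            .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(WriteDisposition.WRITE_APPEND));
  }
}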