Yes, but there are transformations designed for PCollection<Row> that we use before writing to BQ.
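Because the goal here is to stay on PCollection<Row>, the workaround Ahmed suggests further down the thread (converting null array values to empty lists) can itself be written as a Row-to-Row step placed just before the write. The sketch below is a hypothetical helper, not a Beam API. `NullArraysToEmpty` is an illustrative name, and it only normalizes top-level ARRAY/ITERABLE fields. It also touches on the empty-list question: Java generics are erased, so an empty `ArrayList` needs no element type at runtime, and the Row's schema already records it.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.values.Row;

/** Hypothetical helper (not part of Beam): replaces null top-level array fields with empty lists. */
public class NullArraysToEmpty {

  public static Row apply(Row row) {
    Schema schema = row.getSchema();
    List<Object> values = new ArrayList<>(schema.getFieldCount());
    for (int i = 0; i < schema.getFieldCount(); i++) {
      Schema.TypeName typeName = schema.getField(i).getType().getTypeName();
      boolean isCollection =
          typeName == Schema.TypeName.ARRAY || typeName == Schema.TypeName.ITERABLE;
      Object value = row.getValue(i);
      if (value == null && isCollection) {
        // Generics are erased at runtime, so the empty list does not need the
        // element type; the Row schema already carries it.
        value = new ArrayList<>();
      }
      values.add(value);
    }
    // Assumption: nested ROW fields are copied as-is; arrays inside them are not normalized.
    return Row.withSchema(schema).addValues(values).build();
  }
}
```

In a pipeline this could run as something like `MapElements.into(TypeDescriptors.rows()).via(NullArraysToEmpty::apply)` followed by `setRowSchema(schema)`, just before `BigQueryIO.<Row>write().useBeamSchema()`. Once the fix from the PR Ahmed opened (apache/beam#32604) is released, this step may no longer be necessary.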

On Tue, Oct 1, 2024 at 9:58 AM Reuven Lax via user <[email protected]> wrote:

> Beam's schema transforms should work with protocol buffers as well. Beam
> automatically infers the type of the proto and efficiently calls the
> accessors (assuming that these are precompiled protos from a .proto file).
> If the proto matches the BigQuery schema, you can use writeProto and skip
> the entire conversion stage.
>
> On Tue, Oct 1, 2024 at 8:47 AM [email protected] <[email protected]> wrote:
>
>> Well, I'm trying to build something as cost-effective as possible. I was
>> trying to convert Row to TableRow and use the writeTableRow function, but
>> it's too expensive. From the profiler, the Row-to-TableRow conversion
>> seems to be the costly part. From the source code, I also see that it's
>> possible to write Beam Rows directly to BigQuery.
>>
>> Do you guys have any suggestions? I could use writeProto, but then I lose
>> the benefit of all the built-in transformations designed for the Beam Row
>> format.
>>
>> On Tue, Oct 1, 2024 at 8:13 AM Reuven Lax via user <[email protected]>
>> wrote:
>>
>>> Can you explain what you are trying to do here? BigQuery requires the
>>> schema to be known before we write. Beam schemas similarly must be known
>>> at graph construction time, though this isn't quite the same as Java
>>> compile time.
>>>
>>> Reuven
>>>
>>> On Tue, Oct 1, 2024 at 12:44 AM [email protected] <[email protected]>
>>> wrote:
>>>
>>>> I mean, how do I create an empty list if the element type is unknown
>>>> at compile time?
>>>>
>>>> On Tue, Oct 1, 2024 at 12:42 AM [email protected] <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks @Ahmed Abualsaud <[email protected]>, but how do I get
>>>>> around this error for now if I want to use Beam schemas?
>>>>>
>>>>> On Mon, Sep 30, 2024 at 4:31 PM Ahmed Abualsaud via user <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hey Siyuan,
>>>>>>
>>>>>> We use the descriptor because it is derived from the BQ table's
>>>>>> schema in a previous step [1]. We are essentially checking against
>>>>>> the table schema.
>>>>>> You're seeing this error because *nullable* and *repeated* modes are
>>>>>> mutually exclusive. I think we can reduce friction, though, by
>>>>>> defaulting null values to an empty list, which seems to be in line
>>>>>> with GoogleSQL's behavior [2].
>>>>>>
>>>>>> Opened a PR for this: https://github.com/apache/beam/pull/32604.
>>>>>> Hopefully we can get this into the upcoming Beam version 2.60.0.
>>>>>>
>>>>>> For now, you can work around this by converting your null array
>>>>>> values to empty lists.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsBeamRow.java#L66-L67
>>>>>> [2]
>>>>>> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_nulls
>>>>>>
>>>>>> On Mon, Sep 30, 2024 at 6:57 PM [email protected] <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm trying to write Beam Rows directly to BigQuery because it would
>>>>>>> go through fewer conversions and be more efficient, but a weird
>>>>>>> error is happening. If I set null for a nullable array field, it
>>>>>>> throws:
>>>>>>>
>>>>>>> Caused by: java.lang.IllegalArgumentException: Received null value
>>>>>>> for non-nullable field
>>>>>>>
>>>>>>> Here is the related code I found in Beam:
>>>>>>>
>>>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProto.java#L277
>>>>>>>
>>>>>>> private static Object messageValueFromRowValue(
>>>>>>>     FieldDescriptor fieldDescriptor, Field beamField, int index, Row row) {
>>>>>>>   @Nullable Object value = row.getValue(index);
>>>>>>>   if (value == null) {
>>>>>>>     if (fieldDescriptor.isOptional()) {
>>>>>>>       return null;
>>>>>>>     } else {
>>>>>>>       throw new IllegalArgumentException(
>>>>>>>           "Received null value for non-nullable field " + fieldDescriptor.getName());
>>>>>>>     }
>>>>>>>   }
>>>>>>>   return toProtoValue(fieldDescriptor, beamField.getType(), value);
>>>>>>> }
>>>>>>>
>>>>>>> At line 277, why not use beamField.isNullable() instead of
>>>>>>> fieldDescriptor.isOptional()? If it's using the Beam schema, it
>>>>>>> should stick to the nullable setting on the Beam schema field,
>>>>>>> correct?
>>>>>>>
>>>>>>> And how do I avoid this?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Siyuan
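The descriptor-based null check quoted in the thread can be illustrated with protobuf-java alone. Assuming, as Ahmed describes, that the descriptor is derived from the BigQuery table schema, and that an array column (mode REPEATED) becomes a repeated protobuf field, `isOptional()` returns false for that field regardless of how the Beam field is declared, so a null is rejected. The message name `ExampleRow`, the fields `tags` and `name`, and the `acceptsNull` method are hypothetical; `acceptsNull` only mirrors the quoted condition and is not Beam code.

```java
import com.google.protobuf.DescriptorProtos.DescriptorProto;
import com.google.protobuf.DescriptorProtos.FieldDescriptorProto;
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.DescriptorValidationException;
import com.google.protobuf.Descriptors.FieldDescriptor;
import com.google.protobuf.Descriptors.FileDescriptor;

/** Illustration only: a hand-built descriptor standing in for one derived from a BQ table. */
public class RepeatedIsNotOptional {

  // Hypothetical table: "tags" is an ARRAY<STRING> column (REPEATED), "name" is NULLABLE.
  public static Descriptor buildDescriptor() {
    DescriptorProto message =
        DescriptorProto.newBuilder()
            .setName("ExampleRow")
            .addField(
                FieldDescriptorProto.newBuilder()
                    .setName("tags")
                    .setNumber(1)
                    .setType(FieldDescriptorProto.Type.TYPE_STRING)
                    .setLabel(FieldDescriptorProto.Label.LABEL_REPEATED))
            .addField(
                FieldDescriptorProto.newBuilder()
                    .setName("name")
                    .setNumber(2)
                    .setType(FieldDescriptorProto.Type.TYPE_STRING)
                    .setLabel(FieldDescriptorProto.Label.LABEL_OPTIONAL))
            .build();
    FileDescriptorProto file =
        FileDescriptorProto.newBuilder()
            .setName("example_row.proto")
            .addMessageType(message)
            .build();
    try {
      return FileDescriptor.buildFrom(file, new FileDescriptor[0])
          .findMessageTypeByName("ExampleRow");
    } catch (DescriptorValidationException e) {
      throw new IllegalStateException(e);
    }
  }

  // Mirrors the quoted check: a null value is accepted only if the field is optional.
  public static boolean acceptsNull(FieldDescriptor field) {
    return field.isOptional();
  }

  public static void main(String[] args) {
    for (FieldDescriptor f : buildDescriptor().getFields()) {
      System.out.println(
          f.getName() + ": repeated=" + f.isRepeated() + ", acceptsNull=" + acceptsNull(f));
    }
  }
}
```

Running `main` reports `repeated=true, acceptsNull=false` for `tags` and `repeated=false, acceptsNull=true` for `name`: a repeated field cannot also be optional, which is why the check throws even though the Beam field is nullable.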
