Also, is there a tool to generate a proto from an existing BQ table?

On Tue, Oct 1, 2024 at 10:06 AM [email protected] <[email protected]> wrote:
> Yes, but there are transformations designed for PCollection<Row> that we
> use before writing to BQ.
>
> On Tue, Oct 1, 2024 at 9:58 AM Reuven Lax via user <[email protected]> wrote:
>
>> Beam's schema transforms should work with protocol buffers as well. Beam
>> automatically infers the type of proto and efficiently calls the
>> accessors (assuming that these are precompiled protos from a .proto
>> file). If the proto matches the BigQuery schema, you can use writeProto
>> and skip the entire conversion stage.
>>
>> On Tue, Oct 1, 2024 at 8:47 AM [email protected] <[email protected]> wrote:
>>
>>> Well, I'm trying to build something as cost-effective as possible. I was
>>> trying to convert Row to TableRow and use the writeTableRow function,
>>> but it's too expensive; the profiler shows the Row-to-TableRow
>>> conversion is the costly part. From the source code I also see it's
>>> possible to write Beam Rows directly to BigQuery.
>>>
>>> Do you guys have any suggestions? I can try to use writeProto, but then
>>> I don't get the benefit of all the built-in transformations designed for
>>> the Beam Row format.
>>>
>>> On Tue, Oct 1, 2024 at 8:13 AM Reuven Lax via user <[email protected]> wrote:
>>>
>>>> Can you explain what you are trying to do here? BigQuery requires the
>>>> schema to be known before we write. Beam schemas similarly must be
>>>> known at graph construction time - though this isn't quite the same as
>>>> Java compile time.
>>>>
>>>> Reuven
>>>>
>>>> On Tue, Oct 1, 2024 at 12:44 AM [email protected] <[email protected]> wrote:
>>>>
>>>>> I mean, how do I create an empty list if the element type is unknown
>>>>> at compile time?
>>>>>
>>>>> On Tue, Oct 1, 2024 at 12:42 AM [email protected] <[email protected]> wrote:
>>>>>
>>>>>> Thanks @Ahmed Abualsaud <[email protected]>, but how do I
>>>>>> get around this error for now if I want to use the Beam schema?
>>>>>>
>>>>>> On Mon, Sep 30, 2024 at 4:31 PM Ahmed Abualsaud via user <[email protected]> wrote:
>>>>>>
>>>>>>> Hey Siyuan,
>>>>>>>
>>>>>>> We use the descriptor because it is derived from the BQ table's
>>>>>>> schema in a previous step [1]. We are essentially checking against
>>>>>>> the table schema. You're seeing this error because *nullable* and
>>>>>>> *repeated* modes are mutually exclusive. I think we can reduce
>>>>>>> friction, though, by defaulting null values to an empty list, which
>>>>>>> seems to be in line with GoogleSQL's behavior [2].
>>>>>>>
>>>>>>> Opened a PR for this: https://github.com/apache/beam/pull/32604.
>>>>>>> Hopefully we can get this in for the upcoming Beam version 2.60.0.
>>>>>>>
>>>>>>> For now, you can work around this by converting your null array
>>>>>>> values to empty lists.
>>>>>>>
>>>>>>> [1] https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsBeamRow.java#L66-L67
>>>>>>> [2] https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_nulls
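
A minimal sketch of that workaround (not from the thread): a Map step that
swaps a null array value for an empty list before the BigQuery write. The
helper name and the arrayField parameter are illustrative, and it assumes
the input PCollection<Row> already carries its schema. Collections.emptyList()
also answers the "empty list when the element type is unknown at compile time"
question: the element type only needs to be known to the Beam schema at graph
construction time, not to the Java compiler.

import java.util.Collections;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TypeDescriptors;

/** Replaces a null value in the given array field with an empty list. */
static PCollection<Row> nullArraysToEmptyLists(
    PCollection<Row> rows, Schema schema, String arrayField) {
  return rows
      .apply(
          "NullArraysToEmptyLists",
          MapElements.into(TypeDescriptors.rows())
              .via(
                  (Row r) ->
                      r.getArray(arrayField) == null
                          // An untyped empty list is fine here; the element type
                          // is resolved from the Beam schema, not at compile time.
                          ? Row.fromRow(r)
                              .withFieldValue(arrayField, Collections.emptyList())
                              .build()
                          : r))
      // MapElements drops the schema'd coder, so reattach the Row schema.
      .setRowSchema(schema);
}
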
>>>>>>>
>>>>>>> On Mon, Sep 30, 2024 at 6:57 PM [email protected] <[email protected]> wrote:
>>>>>>>
>>>>>>>> I'm trying to write Beam Rows directly to BigQuery because that
>>>>>>>> path goes through fewer conversions and is more efficient, but
>>>>>>>> there is a weird error: a nullable array field throws
>>>>>>>>
>>>>>>>> Caused by: java.lang.IllegalArgumentException: Received null value
>>>>>>>> for non-nullable field
>>>>>>>>
>>>>>>>> if I set that field to null.
>>>>>>>>
>>>>>>>> Here is the code in Beam that I found related to this:
>>>>>>>>
>>>>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProto.java#L277
>>>>>>>>
>>>>>>>> private static Object messageValueFromRowValue(
>>>>>>>>     FieldDescriptor fieldDescriptor, Field beamField, int index, Row row) {
>>>>>>>>   @Nullable Object value = row.getValue(index);
>>>>>>>>   if (value == null) {
>>>>>>>>     if (fieldDescriptor.isOptional()) {
>>>>>>>>       return null;
>>>>>>>>     } else {
>>>>>>>>       throw new IllegalArgumentException(
>>>>>>>>           "Received null value for non-nullable field " + fieldDescriptor.getName());
>>>>>>>>     }
>>>>>>>>   }
>>>>>>>>   return toProtoValue(fieldDescriptor, beamField.getType(), value);
>>>>>>>> }
>>>>>>>>
>>>>>>>> At line 277, why not use beamField.isNullable() instead of
>>>>>>>> fieldDescriptor.isOptional()? If it's using the Beam schema, it
>>>>>>>> should stick to the nullable setting on the Beam schema field,
>>>>>>>> correct?
>>>>>>>>
>>>>>>>> And how do I avoid this?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Siyuan
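
For reference, the "write Beam Rows directly" path discussed above looks
roughly like this (a sketch, not a tested pipeline; the table name and
dispositions are placeholders):

import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

// `rows` is a PCollection<Row> whose Beam schema matches the target table.
rows.apply(
    "WriteRowsToBQ",
    BigQueryIO.<Row>write()
        .useBeamSchema() // derive the BigQuery schema from the Beam Row schema
        .to("my-project:my_dataset.my_table")
        .withMethod(Method.STORAGE_WRITE_API)
        .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(WriteDisposition.WRITE_APPEND));

With the Storage Write API this is the path that goes through
BeamRowToStorageApiProto, which is where the nullable-array check quoted
above is hit.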
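
And the proto path Reuven mentions: if the pipeline already produces
precompiled protos whose fields line up with the table schema, the write can
skip the Row conversion entirely. A sketch, assuming a hypothetical generated
proto class MyEvent and a recent SDK; check the exact writeProtos method and
signature in your Beam version:

import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
import org.apache.beam.sdk.values.PCollection;

// `events` is a PCollection<MyEvent>, where MyEvent is a precompiled proto
// whose fields match the BigQuery table schema.
events.apply(
    "WriteProtosToBQ",
    BigQueryIO.writeProtos(MyEvent.class)
        .to("my-project:my_dataset.my_table")
        .withMethod(Method.STORAGE_WRITE_API));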
