Yes, but there are transformations designed for PCollection<Row> that we use
before writing to BQ.

On Tue, Oct 1, 2024 at 9:58 AM Reuven Lax via user <[email protected]>
wrote:

> Beam's schema transforms should work with protocol buffers as well. Beam
> automatically infers the type of proto and efficiently calls the accessors
> (assuming that these are precompiled protos from a .proto file). If the
> proto matches the BigQuery schema, you can use writeProto and skip the
> entire conversion stage.
>
> On Tue, Oct 1, 2024 at 8:47 AM [email protected] <[email protected]> wrote:
>
>> Well, I'm trying to build something as cost-effective as possible. I was
>> trying to convert Row to TableRow and use the writeTableRow function, but
>> it's too expensive. From the profiler, it seems the Row-to-TableRow
>> conversion is expensive, but from the source code I also see it's possible
>> to write Beam Rows directly to BigQuery.
>>
>> Do you guys have any suggestions? I can try to use writeProto, but then I
>> don't get the benefit of all the built-in transformations that are designed
>> for the Beam Row format.
>>
>> On Tue, Oct 1, 2024 at 8:13 AM Reuven Lax via user <[email protected]>
>> wrote:
>>
>>> Can you explain what you are trying to do here? BigQuery requires schema
>>> to be known before we write. Beam schemas similarly must be known at graph
>>> construction time - though this isn't quite the same as Java compile time.
>>>
>>> Reuven
>>>
>>> On Tue, Oct 1, 2024 at 12:44 AM [email protected] <[email protected]>
>>> wrote:
>>>
>>>> I mean, how do I create an empty list if the element type is unknown at
>>>> compile time?
>>>>
>>>> On Tue, Oct 1, 2024 at 12:42 AM [email protected] <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks @Ahmed Abualsaud <[email protected]>, but how do I get
>>>>> around this error for now if I want to use a Beam schema?
>>>>>
>>>>> On Mon, Sep 30, 2024 at 4:31 PM Ahmed Abualsaud via user <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hey Siyuan,
>>>>>>
>>>>>> We use the descriptor because it is derived from the BQ table's
>>>>>> schema in a previous step [1]. We are essentially checking against the
>>>>>> table schema.
>>>>>> You're seeing this error because *nullable* and *repeated* modes are
>>>>>> mutually exclusive. I think we can reduce friction though by defaulting
>>>>>> null values to an empty list, which seems to be in line with GoogleSQL's
>>>>>> behavior [2].
>>>>>>
>>>>>> Opened a PR for this: https://github.com/apache/beam/pull/32604.
>>>>>> Hopefully we can get this in for the upcoming Beam version 2.60.0.
>>>>>>
>>>>>> For now, you can work around this by converting your null array
>>>>>> values to empty lists.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsBeamRow.java#L66-L67
>>>>>> [2]
>>>>>> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_nulls
>>>>>>
>>>>>> On Mon, Sep 30, 2024 at 6:57 PM [email protected] <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm trying to write Beam Rows directly to BigQuery because that would
>>>>>>> go through fewer conversions and be more efficient, but there is a
>>>>>>> weird error happening.
>>>>>>> A nullable array field throws
>>>>>>>
>>>>>>> Caused by: java.lang.IllegalArgumentException: Received null value
>>>>>>> for non-nullable field
>>>>>>>
>>>>>>> If I set null for that field
>>>>>>>
>>>>>>> Here is the related code I found in Beam:
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProto.java#L277
>>>>>>>
>>>>>>>   private static Object messageValueFromRowValue(
>>>>>>>       FieldDescriptor fieldDescriptor, Field beamField, int index, Row row) {
>>>>>>>     @Nullable Object value = row.getValue(index);
>>>>>>>     if (value == null) {
>>>>>>>       if (fieldDescriptor.isOptional()) {
>>>>>>>         return null;
>>>>>>>       } else {
>>>>>>>         throw new IllegalArgumentException(
>>>>>>>             "Received null value for non-nullable field " + fieldDescriptor.getName());
>>>>>>>       }
>>>>>>>     }
>>>>>>>     return toProtoValue(fieldDescriptor, beamField.getType(), value);
>>>>>>>   }
>>>>>>>
>>>>>>> On line 277, why not use beamField.isNullable() instead of
>>>>>>> fieldDescriptor.isOptional()? If it's using the Beam schema, it should
>>>>>>> stick to the nullable setting on the Beam schema field, correct?
>>>>>>>
>>>>>>> And how do I avoid this?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Siyuan
>>>>>>>
>>>>>>
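
The workaround suggested in the thread (converting null array values to empty lists before the write) can be sketched in plain Java. This is only an illustration under stated assumptions, not Beam's implementation: the class NullArrayDefaults, the helper nullArraysToEmpty, and the parallel isArrayField flags are all hypothetical names, and a row is modeled as a simple list of values rather than a Beam Row. It also touches on the question of creating an empty list when the element type is unknown at compile time: Collections.emptyList() needs no element type at runtime, because Java generics are erased.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NullArrayDefaults {

    // Hypothetical helper (not part of Beam): returns a copy of `values` in
    // which every null sitting at an index flagged as an array field is
    // replaced by an empty list. Nulls in non-array fields are left alone.
    static List<Object> nullArraysToEmpty(List<Object> values, List<Boolean> isArrayField) {
        List<Object> out = new ArrayList<>(values.size());
        for (int i = 0; i < values.size(); i++) {
            Object value = values.get(i);
            if (value == null && isArrayField.get(i)) {
                // No element type is needed here: generics are erased at runtime.
                out.add(Collections.emptyList());
            } else {
                out.add(value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed example row: a string field, a null array field, a null scalar field.
        List<Object> values = Arrays.asList("id-1", null, null);
        List<Boolean> isArrayField = Arrays.asList(false, true, false);
        System.out.println(nullArraysToEmpty(values, isArrayField));
    }
}
```

In a real pipeline the same normalization would presumably run in a transform placed just before the BigQuery write, with the array flags derived from the Beam schema's field types; that wiring is not shown here.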
