The PR got merged, so this is fixed for 2.60.0.

The workaround I mentioned would be to add a step before the BQ write
(a MapElements or something similar) that goes through your Rows and
replaces null list values with empty lists.
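As a minimal JDK-only sketch of that defaulting logic (the helper name `defaultNullLists` and the map-based field representation are illustrative, not Beam API): in an actual pipeline you would apply the same idea to each element of your `PCollection<Row>` inside a MapElements, rebuilding the Row, e.g. with something along the lines of `Row.fromRow(row).withFieldValue(name, Collections.emptyList()).build()`.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class NullListDefaulter {

  // Replace null values of the given array-typed fields with an empty list.
  // Collections.emptyList() is safe for any element type because List<T>
  // is erased at runtime -- you don't need to know the element type at
  // compile time to produce an empty list.
  static Map<String, Object> defaultNullLists(
      Map<String, Object> fieldValues, Set<String> arrayFields) {
    Map<String, Object> out = new LinkedHashMap<>(fieldValues);
    for (String field : arrayFields) {
      if (out.containsKey(field) && out.get(field) == null) {
        out.put(field, Collections.emptyList());
      }
    }
    return out;
  }
}
```

The set of array-typed field names can be computed once from the Beam schema (every field whose FieldType is an array), so the per-element work is just a lookup and a conditional rewrite.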

Or if you like, you can also just use the most recent snapshot version.

On Tue, Oct 1, 2024 at 8:09 PM [email protected] <[email protected]> wrote:

> So far I'm happy with the performance of converting Beam rows directly to
> proto and saving to BQ. But if a field is nullable in the Beam schema and
> you cannot actually set it to null, that is a bug, right?
>
> On Tue, Oct 1, 2024 at 10:07 AM [email protected] <[email protected]> wrote:
>
>> Also, is there a tool to generate a proto from an existing BQ table?
>>
>> On Tue, Oct 1, 2024 at 10:06 AM [email protected] <[email protected]>
>> wrote:
>>
>>> Yes, but there are transformations designed for PCollection<Row> that we
>>> use before writing to BQ.
>>>
>>> On Tue, Oct 1, 2024 at 9:58 AM Reuven Lax via user <[email protected]>
>>> wrote:
>>>
>>>> Beam's schema transforms should work with protocol buffers as well.
>>>> Beam automatically infers the type of proto and efficiently calls the
>>>> accessors (assuming that these are precompiled protos from a .proto file).
>>>> If the proto matches the BigQuery schema, you can use writeProto and skip
>>>> the entire conversion stage.
>>>>
>>>> On Tue, Oct 1, 2024 at 8:47 AM [email protected] <[email protected]>
>>>> wrote:
>>>>
>>>>> Well, I'm trying to build something as cost effective as possible. I
>>>>> was converting Row to TableRow and using the writeTableRows function, but
>>>>> it's too expensive. From the profiler, the Row-to-TableRow conversion
>>>>> seems to be the expensive part, but from the source code I see it's also
>>>>> possible to write Beam rows directly to BigQuery.
>>>>>
>>>>> Do you guys have any suggestions? I can try to use writeProto, but then
>>>>> I don't get the benefit of all the built-in transformations designed
>>>>> for the Beam Row format.
>>>>>
>>>>> On Tue, Oct 1, 2024 at 8:13 AM Reuven Lax via user <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Can you explain what you are trying to do here? BigQuery requires the
>>>>>> schema to be known before we write. Beam schemas similarly must be known
>>>>>> at graph construction time - though this isn't quite the same as Java
>>>>>> compile time.
>>>>>>
>>>>>> Reuven
>>>>>>
>>>>>> On Tue, Oct 1, 2024 at 12:44 AM [email protected] <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I mean, how do I create an empty list if the element type is unknown
>>>>>>> at compile time?
>>>>>>>
>>>>>>> On Tue, Oct 1, 2024 at 12:42 AM [email protected] <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks @Ahmed Abualsaud <[email protected]>, but how do I
>>>>>>>> get around this error for now if I want to use the Beam schema?
>>>>>>>>
>>>>>>>> On Mon, Sep 30, 2024 at 4:31 PM Ahmed Abualsaud via user <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hey Siyuan,
>>>>>>>>>
>>>>>>>>> We use the descriptor because it is derived from the BQ table's
>>>>>>>>> schema in a previous step [1]. We are essentially checking against
>>>>>>>>> the table schema.
>>>>>>>>> You're seeing this error because *nullable* and *repeated* modes
>>>>>>>>> are mutually exclusive. I think we can reduce friction, though, by
>>>>>>>>> defaulting null values to an empty list, which seems to be in line
>>>>>>>>> with GoogleSQL's behavior [2].
>>>>>>>>>
>>>>>>>>> Opened a PR for this: https://github.com/apache/beam/pull/32604.
>>>>>>>>> Hopefully we can get this in for the upcoming Beam version 2.60.0.
>>>>>>>>>
>>>>>>>>> For now, you can work around this by converting your null array
>>>>>>>>> values to empty lists.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsBeamRow.java#L66-L67
>>>>>>>>> [2]
>>>>>>>>> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_nulls
>>>>>>>>>
>>>>>>>>> On Mon, Sep 30, 2024 at 6:57 PM [email protected] <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I'm trying to write Beam rows directly to BigQuery because that
>>>>>>>>>> path goes through fewer conversions and is more efficient, but I'm
>>>>>>>>>> hitting a weird error: a nullable array field throws
>>>>>>>>>>
>>>>>>>>>> Caused by: java.lang.IllegalArgumentException: Received null
>>>>>>>>>> value for non-nullable field
>>>>>>>>>>
>>>>>>>>>> if I set that field to null.
>>>>>>>>>>
>>>>>>>>>> Here is the related code I found in Beam:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProto.java#L277
>>>>>>>>>>
>>>>>>>>>>   private static Object messageValueFromRowValue(
>>>>>>>>>>       FieldDescriptor fieldDescriptor, Field beamField, int index, Row row) {
>>>>>>>>>>     @Nullable Object value = row.getValue(index);
>>>>>>>>>>     if (value == null) {
>>>>>>>>>>       if (fieldDescriptor.isOptional()) {
>>>>>>>>>>         return null;
>>>>>>>>>>       } else {
>>>>>>>>>>         throw new IllegalArgumentException(
>>>>>>>>>>             "Received null value for non-nullable field " + fieldDescriptor.getName());
>>>>>>>>>>       }
>>>>>>>>>>     }
>>>>>>>>>>     return toProtoValue(fieldDescriptor, beamField.getType(), value);
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>> At line 277, why not use beamField.isNullable() instead of
>>>>>>>>>> fieldDescriptor.isOptional()? If it's using the Beam schema, it
>>>>>>>>>> should stick to the nullable setting on the Beam schema field,
>>>>>>>>>> correct?
>>>>>>>>>>
>>>>>>>>>> And how do I avoid this?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Siyuan
>>>>>>>>>>
>>>>>>>>>
