Thanks @Ahmed Abualsaud <[email protected]>

On Tue, Oct 1, 2024 at 10:22 AM Ahmed Abualsaud via user <
[email protected]> wrote:

> The PR got merged, so this is fixed for 2.60.0.
>
> The workaround I mentioned would be to add a step before the BQ write step
> (maybe a MapElements or something similar) that goes through your Rows and
> replaces null list values with empty lists.
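>
> Something along these lines (a rough sketch; the names are hypothetical,
> and it assumes the input PCollection<Row> has a row schema attached):
>
>   PCollection<Row> cleaned =
>       rows.apply(
>               "NullArraysToEmptyLists",
>               MapElements.into(TypeDescriptors.rows())
>                   .via(
>                       row -> {
>                         Row.Builder builder = Row.withSchema(row.getSchema());
>                         for (Schema.Field field : row.getSchema().getFields()) {
>                           Object value = row.getValue(field.getName());
>                           if (value == null
>                               && field.getType().getTypeName()
>                                   == Schema.TypeName.ARRAY) {
>                             // Thanks to type erasure, an empty list works
>                             // regardless of the element type.
>                             value = Collections.emptyList();
>                           }
>                           builder.addValue(value);
>                         }
>                         return builder.build();
>                       }))
>           .setRowSchema(rows.getSchema());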
>
> Or if you like, you can also just use the most recent snapshot version.
>
> On Tue, Oct 1, 2024 at 8:09 PM [email protected] <[email protected]> wrote:
>
>> So far I'm happy with the performance of converting Beam Rows directly to
>> Proto and saving to BQ. But if a field is nullable in the Beam schema yet
>> you can't actually set it to null, that's a bug, right?
>>
>> On Tue, Oct 1, 2024 at 10:07 AM [email protected] <[email protected]>
>> wrote:
>>
>>> Also, is there a tool to generate a proto from an existing BQ table?
>>>
>>> On Tue, Oct 1, 2024 at 10:06 AM [email protected] <[email protected]>
>>> wrote:
>>>
>>>> Yes, but there are transformations designed for PCollection<Row> that we
>>>> use before writing to BQ.
>>>>
>>>> On Tue, Oct 1, 2024 at 9:58 AM Reuven Lax via user <
>>>> [email protected]> wrote:
>>>>
>>>>> Beam's schema transforms should work with protocol buffers as well.
>>>>> Beam automatically infers the type of the proto and efficiently calls
>>>>> the accessors (assuming these are precompiled protos from a .proto
>>>>> file). If the proto matches the BigQuery schema, you can use writeProtos
>>>>> and skip the entire conversion stage.
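>>>>>
>>>>> A minimal sketch (MyEvent here is a hypothetical precompiled proto class
>>>>> whose fields match the destination table):
>>>>>
>>>>>   events.apply(
>>>>>       BigQueryIO.writeProtos(MyEvent.class)
>>>>>           .to("my-project:my_dataset.my_table")
>>>>>           .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API));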
>>>>>
>>>>> On Tue, Oct 1, 2024 at 8:47 AM [email protected] <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Well, I'm trying to build something as cost-effective as possible. I
>>>>>> was converting Row to TableRow and using the writeTableRows function,
>>>>>> but it's too expensive. From the profiler, it seems the Row-to-TableRow
>>>>>> conversion is the expensive part, but from the source code I also see
>>>>>> it's possible to write Beam Rows directly to BigQuery.
>>>>>>
>>>>>> Do you guys have any suggestions? I can try to use writeProtos, but
>>>>>> then I don't get the benefit of all the built-in transformations
>>>>>> designed for the Beam Row format.
>>>>>>
>>>>>> On Tue, Oct 1, 2024 at 8:13 AM Reuven Lax via user <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Can you explain what you are trying to do here? BigQuery requires the
>>>>>>> schema to be known before we write. Beam schemas similarly must be
>>>>>>> known at graph construction time - though this isn't quite the same as
>>>>>>> Java compile time.
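>>>>>>>
>>>>>>> For example, a minimal sketch of a schema assembled while the pipeline
>>>>>>> graph is built, e.g. in main() (field names are hypothetical):
>>>>>>>
>>>>>>>   Schema schema =
>>>>>>>       Schema.builder()
>>>>>>>           .addStringField("id")
>>>>>>>           .addNullableField(
>>>>>>>               "tags", Schema.FieldType.array(Schema.FieldType.STRING))
>>>>>>>           .build();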
>>>>>>>
>>>>>>> Reuven
>>>>>>>
>>>>>>> On Tue, Oct 1, 2024 at 12:44 AM [email protected] <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I mean, how do I create an empty list if the element type is unknown
>>>>>>>> at compile time?
>>>>>>>>
>>>>>>>> On Tue, Oct 1, 2024 at 12:42 AM [email protected] <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks @Ahmed Abualsaud <[email protected]>, but how do I
>>>>>>>>> get around this error for now if I want to use the Beam schema?
>>>>>>>>>
>>>>>>>>> On Mon, Sep 30, 2024 at 4:31 PM Ahmed Abualsaud via user <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Siyuan,
>>>>>>>>>>
>>>>>>>>>> We use the descriptor because it is derived from the BQ table's
>>>>>>>>>> schema in a previous step [1]. We are essentially checking against
>>>>>>>>>> the table schema.
>>>>>>>>>> You're seeing this error because *nullable* and *repeated* modes
>>>>>>>>>> are mutually exclusive. I think we can reduce friction though by
>>>>>>>>>> defaulting null values to an empty list, which seems to be in line
>>>>>>>>>> with GoogleSQL's behavior [2].
>>>>>>>>>>
>>>>>>>>>> Opened a PR for this: https://github.com/apache/beam/pull/32604.
>>>>>>>>>> Hopefully we can get this in for the upcoming Beam version 2.60.0.
>>>>>>>>>>
>>>>>>>>>> For now, you can work around this by converting your null array
>>>>>>>>>> values to empty lists.
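>>>>>>>>>>
>>>>>>>>>> For example (a sketch assuming a schema with a string field and a
>>>>>>>>>> nullable array field named "tags", both hypothetical):
>>>>>>>>>>
>>>>>>>>>>   // Null trips the error at the BQ write step today:
>>>>>>>>>>   Row bad = Row.withSchema(schema).addValues("id-1", null).build();
>>>>>>>>>>   // An empty list writes successfully:
>>>>>>>>>>   Row ok =
>>>>>>>>>>       Row.withSchema(schema)
>>>>>>>>>>           .addValues("id-1", Collections.emptyList())
>>>>>>>>>>           .build();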
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsBeamRow.java#L66-L67
>>>>>>>>>> [2]
>>>>>>>>>> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_nulls
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 30, 2024 at 6:57 PM [email protected] <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm trying to write Beam Rows directly to BigQuery because that
>>>>>>>>>>> involves less conversion and is more efficient, but there is some
>>>>>>>>>>> weird error happening. If I set a nullable array field to null, it
>>>>>>>>>>> throws:
>>>>>>>>>>>
>>>>>>>>>>> Caused by: java.lang.IllegalArgumentException: Received null
>>>>>>>>>>> value for non-nullable field
>>>>>>>>>>>
>>>>>>>>>>> Here is the related code I found in Beam:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProto.java#L277
>>>>>>>>>>>
>>>>>>>>>>>   private static Object messageValueFromRowValue(
>>>>>>>>>>>       FieldDescriptor fieldDescriptor, Field beamField, int index, Row row) {
>>>>>>>>>>>     @Nullable Object value = row.getValue(index);
>>>>>>>>>>>     if (value == null) {
>>>>>>>>>>>       if (fieldDescriptor.isOptional()) {
>>>>>>>>>>>         return null;
>>>>>>>>>>>       } else {
>>>>>>>>>>>         throw new IllegalArgumentException(
>>>>>>>>>>>             "Received null value for non-nullable field "
>>>>>>>>>>>                 + fieldDescriptor.getName());
>>>>>>>>>>>       }
>>>>>>>>>>>     }
>>>>>>>>>>>     return toProtoValue(fieldDescriptor, beamField.getType(), value);
>>>>>>>>>>>   }
>>>>>>>>>>>
>>>>>>>>>>> At line 277, why not use beamField.isNullable() instead of
>>>>>>>>>>> fieldDescriptor.isOptional()? If it's using the Beam schema, it
>>>>>>>>>>> should stick to the nullable setting on the Beam schema field,
>>>>>>>>>>> correct?
>>>>>>>>>>>
>>>>>>>>>>> And how do I avoid this?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Siyuan
>>>>>>>>>>>
>>>>>>>>>>
