So far I'm happy with the performance of converting Beam Rows directly to
Proto and saving them to BQ. But if a field is nullable in the Beam schema
and you still cannot actually set it to null, that is a bug, right?
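
For reference, the workaround suggested further down the thread (converting
null array values to empty lists) could look roughly like the sketch below.
It uses plain Java collections as a stand-in for Beam Row values so that it
runs without Beam on the classpath. The class NullArrays and both method
names are made up for illustration and are not Beam APIs. Since
Collections.emptyList() is generic, the element type does not need to be
known at compile time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Beam. */
final class NullArrays {
  private NullArrays() {}

  /** Returns the list unchanged, or an immutable empty list if it is null. */
  static <T> List<T> orEmpty(List<T> values) {
    return values == null ? Collections.<T>emptyList() : values;
  }

  /**
   * Copies the row values, replacing null with an empty list at every
   * position that is flagged as an array (repeated) field. Nulls in
   * non-array positions are left as they are.
   */
  static List<Object> nullArraysToEmpty(List<Object> rowValues, List<Boolean> isArrayField) {
    List<Object> out = new ArrayList<>(rowValues.size());
    for (int i = 0; i < rowValues.size(); i++) {
      Object value = rowValues.get(i);
      boolean replace = (value == null) && isArrayField.get(i);
      out.add(replace ? Collections.emptyList() : value);
    }
    return out;
  }
}
```

In a real pipeline the same idea would be applied to each Row before the
BigQuery write, with the array flags taken from the Beam schema rather than
from a hand-built list.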

On Tue, Oct 1, 2024 at 10:07 AM [email protected] <[email protected]> wrote:

> Also, is there a tool to generate a proto from an existing BQ table?
>
> On Tue, Oct 1, 2024 at 10:06 AM [email protected] <[email protected]> wrote:
>
>> Yes, but there are transformations designed for PCollection<Row> that we
>> use before writing to BQ.
>>
>> On Tue, Oct 1, 2024 at 9:58 AM Reuven Lax via user <[email protected]>
>> wrote:
>>
>>> Beam's schema transforms should work with protocol buffers as well. Beam
>>> automatically infers the type of proto and efficiently calls the accessors
>>> (assuming that these are precompiled protos from a .proto file). If the
>>> proto matches the BigQuery schema, you can use writeProto and skip the
>>> entire conversion stage.
>>>
>>> On Tue, Oct 1, 2024 at 8:47 AM [email protected] <[email protected]>
>>> wrote:
>>>
>>>> Well, I'm trying to build something as cost effective as possible. I
>>>> was trying to convert Row to TableRow and use the writeTableRows function,
>>>> but it's too expensive. From the profiler, it seems the Row to TableRow
>>>> conversion is the expensive part. But from the source code I also see that
>>>> it's possible to write Beam Rows directly to BigQuery.
>>>>
>>>> Do you guys have any suggestions? I can try to use writeProto, but then
>>>> I don't get the benefit of all the built-in transformations that are
>>>> designed for the Beam Row format.
>>>>
>>>> On Tue, Oct 1, 2024 at 8:13 AM Reuven Lax via user <
>>>> [email protected]> wrote:
>>>>
>>>>> Can you explain what you are trying to do here? BigQuery requires
>>>>> schema to be known before we write. Beam schemas similarly must be known
>>>>> at graph construction time - though this isn't quite the same as Java
>>>>> compile time.
>>>>>
>>>>> Reuven
>>>>>
>>>>> On Tue, Oct 1, 2024 at 12:44 AM [email protected] <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I mean, how do I create an empty list if the element type is unknown
>>>>>> at compile time?
>>>>>>
>>>>>> On Tue, Oct 1, 2024 at 12:42 AM [email protected] <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks @Ahmed Abualsaud <[email protected]>  but how do I
>>>>>>> get around this error for now if I want to use beam schema?
>>>>>>>
>>>>>>> On Mon, Sep 30, 2024 at 4:31 PM Ahmed Abualsaud via user <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hey Siyuan,
>>>>>>>>
>>>>>>>> We use the descriptor because it is derived from the BQ table's
>>>>>>>> schema in a previous step [1]. We are essentially checking against the
>>>>>>>> table schema.
>>>>>>>> You're seeing this error because *nullable* and *repeated* modes
>>>>>>>> are mutually exclusive. I think we can reduce friction though by
>>>>>>>> defaulting null values to an empty list, which seems to be in line
>>>>>>>> with GoogleSQL's behavior [2].
>>>>>>>>
>>>>>>>> Opened a PR for this: https://github.com/apache/beam/pull/32604.
>>>>>>>> Hopefully we can get this in for the upcoming Beam version 2.60.0.
>>>>>>>>
>>>>>>>> For now, you can work around this by converting your null array
>>>>>>>> values to empty lists.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsBeamRow.java#L66-L67
>>>>>>>> [2]
>>>>>>>> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_nulls
>>>>>>>>
>>>>>>>> On Mon, Sep 30, 2024 at 6:57 PM [email protected] <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I'm trying to write Beam Rows directly to BigQuery because that
>>>>>>>>> goes through fewer conversions and is more efficient, but I'm
>>>>>>>>> hitting a weird error. A nullable array field throws
>>>>>>>>>
>>>>>>>>> Caused by: java.lang.IllegalArgumentException: Received null value
>>>>>>>>> for non-nullable field
>>>>>>>>>
>>>>>>>>> if I set that field to null.
>>>>>>>>>
>>>>>>>>> Here is the related code I found in Beam:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProto.java#L277
>>>>>>>>>
>>>>>>>>>   private static Object messageValueFromRowValue(
>>>>>>>>>       FieldDescriptor fieldDescriptor, Field beamField, int index, Row row) {
>>>>>>>>>     @Nullable Object value = row.getValue(index);
>>>>>>>>>     if (value == null) {
>>>>>>>>>       if (fieldDescriptor.isOptional()) {
>>>>>>>>>         return null;
>>>>>>>>>       } else {
>>>>>>>>>         throw new IllegalArgumentException(
>>>>>>>>>             "Received null value for non-nullable field " + fieldDescriptor.getName());
>>>>>>>>>       }
>>>>>>>>>     }
>>>>>>>>>     return toProtoValue(fieldDescriptor, beamField.getType(), value);
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> On line 277, why not use beamField.isNullable() instead of
>>>>>>>>> fieldDescriptor.isOptional()? If it's using the Beam schema, it
>>>>>>>>> should stick to the nullable setting on the Beam schema field,
>>>>>>>>> correct?
>>>>>>>>>
>>>>>>>>> And how do I avoid this?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Siyuan
>>>>>>>>>
>>>>>>>>
