Also, is there a tool to generate a proto definition from an existing BQ table?

On Tue, Oct 1, 2024 at 10:06 AM [email protected] <[email protected]> wrote:

> Yes, but there are transformations designed for PCollection<Row> that we
> use before writing to BQ.
>
> On Tue, Oct 1, 2024 at 9:58 AM Reuven Lax via user <[email protected]>
> wrote:
>
>> Beam's schema transforms should work with protocol buffers as well. Beam
>> automatically infers the type of proto and efficiently calls the accessors
>> (assuming that these are precompiled protos from a .proto file). If the
>> proto matches the BigQuery schema, you can use writeProto and skip the
>> entire conversion stage.
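>>
>> Roughly, that path looks like this (a hedged sketch; MyEvent stands in for a
>> hypothetical precompiled proto class whose fields mirror the table schema,
>> and the project/table names are placeholders):
>>
>>   // Write precompiled protos straight to BigQuery over the Storage Write
>>   // API, skipping the Row/TableRow conversion step entirely.
>>   PCollection<MyEvent> events = ...; // MyEvent: hypothetical generated proto class
>>   events.apply(
>>       "WriteProtosToBQ",
>>       BigQueryIO.writeProtos(MyEvent.class)
>>           .to("my-project:my_dataset.my_table") // placeholder table spec
>>           .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
>>           .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API));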
>>
>> On Tue, Oct 1, 2024 at 8:47 AM [email protected] <[email protected]> wrote:
>>
>>> Well, I'm trying to build something as cost-effective as possible. I was
>>> converting Row to TableRow and using writeTableRows, but it's too
>>> expensive: the profiler shows the Row-to-TableRow conversion is the costly
>>> part. From the source code, though, I can see it's also possible to write
>>> Beam Rows directly to BigQuery.
>>>
>>> Do you have any suggestions? I could try writeProto, but then I don't get
>>> the benefit of all the built-in transformations that are designed for the
>>> Beam Row format.
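>>>
>>> For reference, the direct-Row path I mean looks roughly like this (a hedged
>>> sketch, assuming the input PCollection<Row> already carries a Beam schema
>>> and the table name is a placeholder):
>>>
>>>   // Let BigQueryIO derive the BigQuery table schema and the row conversion
>>>   // from the Beam schema attached to the rows.
>>>   PCollection<Row> rows = ...; // schema-aware rows from earlier transforms
>>>   rows.apply(
>>>       "WriteRowsToBQ",
>>>       BigQueryIO.<Row>write()
>>>           .useBeamSchema()
>>>           .to("my-project:my_dataset.my_table") // placeholder table spec
>>>           .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API));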
>>>
>>> On Tue, Oct 1, 2024 at 8:13 AM Reuven Lax via user <[email protected]>
>>> wrote:
>>>
>>>> Can you explain what you are trying to do here? BigQuery requires the
>>>> schema to be known before we write. Beam schemas similarly must be known at
>>>> graph construction time, though this isn't quite the same as Java compile
>>>> time.
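>>>>
>>>> To illustrate the distinction, a hedged sketch: the schema can be assembled
>>>> at pipeline construction time, e.g. from configuration, even though it is
>>>> not fixed at Java compile time (FieldConfig and loadFieldConfig below are
>>>> hypothetical stand-ins for wherever that information comes from):
>>>>
>>>>   // Build a Beam Schema while constructing the pipeline graph; the field
>>>>   // list is driven by configuration rather than hard-coded in Java.
>>>>   Schema.Builder builder = Schema.builder();
>>>>   for (FieldConfig f : loadFieldConfig()) { // hypothetical config source
>>>>     builder.addNullableField(f.getName(), f.getFieldType());
>>>>   }
>>>>   Schema schema = builder.build();
>>>>   // Attach it to the PCollection<Row> with setRowSchema(schema) so
>>>>   // downstream schema-aware transforms and the BigQuery sink can use it.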
>>>>
>>>> Reuven
>>>>
>>>> On Tue, Oct 1, 2024 at 12:44 AM [email protected] <[email protected]>
>>>> wrote:
>>>>
>>>>> I mean, how do I create an empty list if the element type is unknown at
>>>>> compile time?
>>>>>
>>>>> On Tue, Oct 1, 2024 at 12:42 AM [email protected] <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks @Ahmed Abualsaud <[email protected]>, but how do I
>>>>>> get around this error for now if I want to use the Beam schema?
>>>>>>
>>>>>> On Mon, Sep 30, 2024 at 4:31 PM Ahmed Abualsaud via user <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hey Siyuan,
>>>>>>>
>>>>>>> We use the descriptor because it is derived from the BQ table's
>>>>>>> schema in a previous step [1]; we are essentially checking against the
>>>>>>> table schema.
>>>>>>> You're seeing this error because *nullable* and *repeated* modes
>>>>>>> are mutually exclusive. I think we can reduce friction, though, by
>>>>>>> defaulting null values to an empty list, which seems to be in line with
>>>>>>> GoogleSQL's behavior [2].
>>>>>>>
>>>>>>> Opened a PR for this: https://github.com/apache/beam/pull/32604.
>>>>>>> Hopefully we can get it into the upcoming Beam release, 2.60.0.
>>>>>>>
>>>>>>> For now, you can work around this by converting your null array
>>>>>>> values to empty lists.
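>>>>>>>
>>>>>>> Roughly like this, for example (a hedged sketch, assuming the usual Beam
>>>>>>> imports and that your schema-aware input PCollection<Row> is called
>>>>>>> "rows"). Note that Collections.emptyList() needs no element type, so it
>>>>>>> works for any array field:
>>>>>>>
>>>>>>>   // Rebuild each Row so that null array fields become empty lists
>>>>>>>   // before handing the rows to the BigQuery sink.
>>>>>>>   PCollection<Row> cleaned =
>>>>>>>       rows.apply(
>>>>>>>               "NullArraysToEmptyLists",
>>>>>>>               MapElements.into(TypeDescriptors.rows())
>>>>>>>                   .via(
>>>>>>>                       (Row row) -> {
>>>>>>>                         Row.FieldValueBuilder builder = Row.fromRow(row);
>>>>>>>                         for (Schema.Field field : row.getSchema().getFields()) {
>>>>>>>                           if (field.getType().getTypeName() == Schema.TypeName.ARRAY
>>>>>>>                               && row.getValue(field.getName()) == null) {
>>>>>>>                             builder =
>>>>>>>                                 builder.withFieldValue(
>>>>>>>                                     field.getName(), Collections.emptyList());
>>>>>>>                           }
>>>>>>>                         }
>>>>>>>                         return builder.build();
>>>>>>>                       }))
>>>>>>>           .setRowSchema(rows.getSchema());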
>>>>>>>
>>>>>>> [1]
>>>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsBeamRow.java#L66-L67
>>>>>>> [2]
>>>>>>> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_nulls
>>>>>>>
>>>>>>> On Mon, Sep 30, 2024 at 6:57 PM [email protected] <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I'm trying to write Beam Rows directly to BigQuery because that path
>>>>>>>> goes through fewer conversions and should be more efficient, but some
>>>>>>>> weird error is happening: a nullable array field throws
>>>>>>>>
>>>>>>>> Caused by: java.lang.IllegalArgumentException: Received null value
>>>>>>>> for non-nullable field
>>>>>>>>
>>>>>>>> if I set that field to null.
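>>>>>>>>
>>>>>>>> Roughly, the setup is (a hedged repro sketch with made-up field names):
>>>>>>>>
>>>>>>>>   // A nullable array field ("tags", made up for illustration) left as
>>>>>>>>   // null; writing this row with useBeamSchema() over the Storage Write
>>>>>>>>   // API hits the check in the code quoted below.
>>>>>>>>   Schema schema =
>>>>>>>>       Schema.builder()
>>>>>>>>           .addInt64Field("id")
>>>>>>>>           .addNullableField("tags", Schema.FieldType.array(Schema.FieldType.STRING))
>>>>>>>>           .build();
>>>>>>>>   Row row = Row.withSchema(schema).addValues(1L, null).build(); // "tags" is null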
>>>>>>>>
>>>>>>>> Here is the related code I found in Beam:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProto.java#L277
>>>>>>>>
>>>>>>>>   private static Object messageValueFromRowValue(
>>>>>>>>       FieldDescriptor fieldDescriptor, Field beamField, int index, Row row) {
>>>>>>>>     @Nullable Object value = row.getValue(index);
>>>>>>>>     if (value == null) {
>>>>>>>>       if (fieldDescriptor.isOptional()) {
>>>>>>>>         return null;
>>>>>>>>       } else {
>>>>>>>>         throw new IllegalArgumentException(
>>>>>>>>             "Received null value for non-nullable field " + fieldDescriptor.getName());
>>>>>>>>       }
>>>>>>>>     }
>>>>>>>>     return toProtoValue(fieldDescriptor, beamField.getType(), value);
>>>>>>>>   }
>>>>>>>>
>>>>>>>> At line 277, why not use beamField.isNullable() instead of
>>>>>>>> fieldDescriptor.isOptional()? If it's using the Beam schema, it should
>>>>>>>> stick to the nullable setting on the Beam schema field, correct?
>>>>>>>>
>>>>>>>> And how do I avoid this?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Siyuan
>>>>>>>>
>>>>>>>
