Hi,

I'm wondering if this is the intended behaviour?

I've got incoming protobuf bytes read in a pubsubio.Read step and they are 
decoded using the following DoFn.

> func decode(b []byte) (*pb.Item, error) {
>       v := &pb.Item{}
>       if err := proto.Unmarshal(b, v); err != nil {
>               return nil, fmt.Errorf("failed to unmarshal: %w", err)
>       }
>       return v, nil
> }

This function returns a pointer because the generated protobuf code embeds a 
mutex, which makes go vet's copylocks check reject by-value copies with the 
following complaint.

> copylocks: return copies lock value: example.com/pb.Item contains 
> google.golang.org/protobuf/internal/impl.MessageState contains sync.Mutex

When decoding incoming bytes with this DoFn and passing the resulting 
PCollection of struct pointers into bigqueryio.Write, the pipeline fails 
because bigqueryio cannot infer a schema from a pointer type, as per [1].

Because of this, it seems that for every protobuf message type we'd like to 
insert into BigQuery, we would have to duplicate the structure as a plain Go 
struct and translate the decoded protobuf into it, passing the plain Go 
struct along instead.

I'd just like to confirm that this is what we have to do, since it is quite a 
bit of extra work.


Thanks,
--

[1] 
https://github.com/apache/beam/blob/fef98f7597f11aaa93388d547f58bf0482ed74f0/sdks/go/pkg/beam/io/bigqueryio/bigquery.go#L190
