Hey all,

I'm a bit confused about the semantics of DoFn serialization. I have a model defined like this:

```
type SinkCustomAttributes map[string]string

type SinkScan struct {
	DestinationURLQueryParams SinkCustomAttributes `beam:"destination_url_query_params" db:"destination_url_query_params" json:"destination_url_query_params"`
}
```

I figured out pretty quickly that I can't use a `map` _directly_ as the result of a DoFn, but a `map` wrapped in a `struct` worked fine for a batch pipeline. Indeed, that definition has been working fine on both the prism runner and the Dataflow runner for a while now.

However, when I started on a streaming pipeline running on prism, I got the following error:

```
panic:  generating model pipeline
        failed to add input kind: {main.SourceScanToSinkScanFn 5: ParDo [In(Main): models.SourceScan <- {4: models.SourceScan/R[models.SourceScan] FIX[1s]:unbounded}] -> [Out: models.SinkScan -> {5: models.SinkScan/R[models.SinkScan] FIX[1s]:unbounded}]}
                caused by:
        failed to serialize 5: ParDo [In(Main): models.SourceScan <- {4: models.SourceScan/R[models.SourceScan] FIX[1s]:unbounded}] -> [Out: models.SinkScan -> {5: models.SinkScan/R[models.SinkScan] FIX[1s]:unbounded}]
                caused by:
                encoding userfn 5: ParDo [In(Main): models.SourceScan <- {4: models.SourceScan/R[models.SourceScan] FIX[1s]:unbounded}] -> [Out: models.SinkScan -> {5: models.SinkScan/R[models.SinkScan] FIX[1s]:unbounded}]
        bad output type
                caused by:
                encoding full type models.SinkScan
        bad type
                caused by:
                encoding struct models.SinkScan
        bad field type
                caused by:
        unencodable type 'map', try to wrap the type as a field in a struct, see https://github.com/apache/beam/issues/23101 for details
```


The error message seems to imply, paradoxically, that the `map` field could not be serialized and that I can fix it by making it a field on a struct, which it already is.


This raises a couple of questions:

   1. Is it OK to have `map` fields on structs in Beam?
   2. Are there different serialization semantics between batch and streaming pipelines?
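
For what it's worth, if `map` fields genuinely aren't encodable on the streaming path, I assume I could side-step the built-in struct encoding by registering an explicit coder for the type. Something like this untested sketch, using JSON purely for illustration:

```
import (
	"encoding/json"
	"reflect"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
)

func init() {
	// Untested: replace the default encoding for SinkScan with a JSON coder
	// so the map field never reaches the built-in struct encoder.
	beam.RegisterCoder(
		reflect.TypeOf((*SinkScan)(nil)).Elem(),
		func(v SinkScan) ([]byte, error) { return json.Marshal(v) },
		func(b []byte) (SinkScan, error) {
			var v SinkScan
			err := json.Unmarshal(b, &v)
			return v, err
		},
	)
}
```

But I'd rather understand whether the struct-wrapping approach is supposed to work before papering over it with a custom coder.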

There is also a decent chance there is just some subtle bug on my side, but
I wanted to check my understanding here.

Thanks!
