I am trying to build a pipeline that queries paginated data from an
external service via a client, puts each page into its own PCollection,
collects those in a PCollectionList, and flattens the list to get the
final collection of data items.
The data class is encoded using a ProtoCoder.

Here is my code:
---------------------------------------------------------------------------
private static PCollection<Data> loadAllData(
    Pipeline pipeline, DataClient client, Instant from, Instant to) {

  // Fetch the first page to learn the total item count and the page size.
  PaginatedList<Data> paginatedData = getPaginatedData(client, from, to, 0);
  int total = paginatedData.meta().total();
  int limit = paginatedData.meta().limit();

  List<PCollection<Data>> collections = new ArrayList<>();

  PCollection<Data> collection =
      pipeline.apply(
          Create.of(paginatedData.data()).withCoder(ProtoCoder.of(Data.class)));
  collections.add(collection);

  // Fetch the remaining pages, creating one PCollection per page.
  for (int i = 1; i < total / limit + 1; i++) {
    paginatedData = getPaginatedData(client, from, to, i * limit);
    collection =
        pipeline.apply(
            Create.of(paginatedData.data()).withCoder(ProtoCoder.of(Data.class)));
    collections.add(collection);
  }
  PCollectionList<Data> list = PCollectionList.of(collections);

  return list
      .apply(Flatten.pCollections())
      .setCoder(ProtoCoder.of(Data.class));
}
---------------------------------------------------------------------------

When I run this pipeline, Beam complains at each step that objects such as
DataClient have to be serializable.
Even objects created inside the method above, for example PaginatedList,
have to be serializable.

However, many of the classes I use, like DataClient and PaginatedList, are
part of a third-party library and do not implement Serializable.
So is there any way to ensure Beam is able to serialize them?

More generally, when designing a pipeline, how can we identify which
objects will have to be serialized?
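
To make the question concrete, here is a minimal plain-Java sketch (no
Beam involved) of the rule as I understand it: Java serialization follows
every non-transient field of the object being written, so one
non-serializable field is enough to make it fail. FakeClient, EagerHolder
and LazyHolder are hypothetical names made up for this illustration;
FakeClient only stands in for a third-party class like DataClient.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSketch {

    // Hypothetical stand-in for a third-party class that is NOT Serializable.
    static class FakeClient {
        String fetch() {
            return "page";
        }
    }

    // Keeps the client in an ordinary field, so writing the holder
    // also tries to write the client.
    static class EagerHolder implements Serializable {
        private final FakeClient client = new FakeClient();
    }

    // Marks the client transient (skipped by serialization) and
    // recreates it lazily on first use.
    static class LazyHolder implements Serializable {
        private transient FakeClient client;

        FakeClient client() {
            if (client == null) {
                client = new FakeClient();
            }
            return client;
        }
    }

    // Returns true if Java serialization accepts the object graph.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("eager holder: " + canSerialize(new EagerHolder()));
        System.out.println("lazy holder:  " + canSerialize(new LazyHolder()));
    }
}
```

Here the eager holder is rejected and the lazy holder is accepted, even
after its client has been created, because transient fields are skipped.
Whether the same pattern (a transient client that is recreated where it
is needed) is the recommended approach in Beam is part of what I am
asking.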

If, instead of this static method, I create a non-static method and make
the class that contains it implement Serializable, would this help?
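
For context on the static versus non-static point, below is a small
plain-Java sketch (again without Beam) of how an anonymous class created
in an instance method keeps a reference to its enclosing object, while
one created in a static method does not. Task, PlainOuter and
SerializableOuter are hypothetical names for this illustration only, and
I am assuming that function objects handed to a pipeline behave like
Task here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CaptureSketch {

    // Hypothetical serializable callback type.
    interface Task extends Serializable {
        String run();
    }

    // Enclosing class that is NOT Serializable.
    static class PlainOuter {
        private final String label;

        PlainOuter(String label) {
            this.label = label;
        }

        // Reads an instance field, so the anonymous class holds a
        // reference to the enclosing PlainOuter instance.
        Task instanceTask() {
            return new Task() {
                @Override
                public String run() {
                    return label;
                }
            };
        }

        // Created in a static context: no enclosing instance exists.
        static Task staticTask() {
            return new Task() {
                @Override
                public String run() {
                    return "static";
                }
            };
        }
    }

    // Same shape as PlainOuter, but the enclosing class is Serializable
    // and all of its fields are serializable too.
    static class SerializableOuter implements Serializable {
        private final String label;

        SerializableOuter(String label) {
            this.label = label;
        }

        Task instanceTask() {
            return new Task() {
                @Override
                public String run() {
                    return label;
                }
            };
        }
    }

    // Returns true if Java serialization accepts the object graph.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new PlainOuter("a").instanceTask()));
        System.out.println(canSerialize(PlainOuter.staticTask()));
        System.out.println(canSerialize(new SerializableOuter("b").instanceTask()));
    }
}
```

In this sketch, making the enclosing class Serializable only helps
because every field of that class is itself serializable. If the class
held a DataClient field, I would expect the same failure again.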

Thanks
Sachin
