I am trying to create a pipeline where I query paginated data from some external service via a client and join them into a PCollectionList and flatten it to get the final collection of data items. The data class is encoded using a ProtoCoder
Here is my code: ------------------------------------------------------------------------------------------------------------------------- private static PCollection<Data> loadAllData( Pipeline pipeline, DataClient client, Instant from, Instant to) { PaginatedList<Data> paginatedData = getPaginatedData(client, from, to, 0); int total = paginatedData.meta().total(); int limit = paginatedData.meta().limit(); List<PCollection<Data>> collections = new ArrayList<>(); PCollection<Data> collection = pipeline .apply(Create.of(paginatedData.data()).withCoder(ProtoCoder.of(Data.class ))); collections.add(collection); for (int i = 1; i < total / limit + 1; i++) { paginatedData = getPaginatedData(client, from, to, i * limit); collection = pipeline .apply(Create.of(paginatedData.data()).withCoder(ProtoCoder.of(Data.class ))); observationsCollections.add(collection); } PCollectionList<Data> list = PCollectionList.of(collections); return list .apply(Flatten.pCollections()) .setCoder(ProtoCoder.of(Data.class)); } --------------------------------------------------------------------------- When I run this pipeline it is complaining at each step like DataClient has to be serializable. Even any objects created in the above method have to be serializable, for example PaginatedList. However many of the classes I use like DataClient and PaginatedList are part of some third party library and they don't implement serializable. So is there any way to ensure beam is able to serialize them ? Overall in general when designing a pipeline how can we identify what all objects would have to be serialized. Instead of creating this static method, if I create a non-static method and implement serializable in the base class which contains this method, would this help ? Thanks Sachin