Hi Andrei,

Dataflow doesn't execute the Pub/Sub write transform on the user worker; it runs it as an internal operation. Because of that, it is not possible to log messages at the point where the Pub/Sub write happens.

However, you can pass --experiments=enable_custom_pubsub_sink to force Dataflow to use the SDK expansion of the sink <https://github.com/apache/beam/blob/d7655c12f7e0d575513c937ad53c45fbe1339c2d/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSink.java#L411-L436>, which is executed on the user worker. You can then add your logging around here <https://github.com/apache/beam/blob/d7655c12f7e0d575513c937ad53c45fbe1339c2d/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSink.java#L246>. If you believe there is something wrong with Dataflow itself, please contact GCP customer support.
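For reference, a minimal sketch of enabling that experiment when launching the job (passing --experiments=enable_custom_pubsub_sink on the command line is equivalent; the class name EnableCustomSinkExample is just for illustration):

import java.util.Collections;

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class EnableCustomSinkExample {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);

    // Keep Dataflow from replacing the write with its internal sink, so the
    // Beam SDK's PubsubUnboundedSink runs on the user workers.
    // Note: setExperiments overwrites any experiments passed on the command line.
    options.setExperiments(Collections.singletonList("enable_custom_pubsub_sink"));

    Pipeline pipeline = Pipeline.create(options);
    // ... build your pipeline, ending in PubsubIO.writeMessages() ...
    pipeline.run();
  }
}

And a hypothetical fragment of the kind of log you could add around the linked line in WriterFn.publishBatch, assuming you build a patched SDK (the pubsubClient and topic names are my assumption about the surrounding code at that commit, and a LOG field would need to be declared via org.slf4j.LoggerFactory):

int n = pubsubClient.publish(topic, messages);
// Hypothetical log: confirms Pub/Sub accepted the batch. Note that
// PubsubClient.publish returns only a count, not the server-assigned
// message IDs, so surfacing the IDs themselves would need deeper changes.
LOG.info("Pub/Sub accepted {} of {} messages", n, messages.size());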
On Tue, Nov 24, 2020 at 11:34 AM Andrei Litvinov <alitvi...@griddynamics.com> wrote:

> Hello Beam Community,
>
> We are using a streaming Dataflow job writing to Google Pub Sub through
> PubsubIO.Write. Does anybody know how we can log IDs which Pub Sub assigns
> to messages?
> We need that to localize where messages might be lost.
> Case 1. A Pub Sub topic is used as an interface between two projects. We
> are working on the sending side and do not control the receiving side. When
> the job was deployed the first time, Pub Sub did not accept messages
> (because of a permission problem), the job got stuck, but we did not get
> any errors in Dataflow logs. Neither were we able to tell if the messages
> were accepted by Pub Sub.
> Case 2. We suspect some messages are lost while going end-to-end through
> our system, we do not know exactly where. We would like to know IDs of
> messages which were actually accepted by Pub Sub to trace missing messages.
>
> Thank you,
> Andrei Litvinov