That explains a lot: for example, why I was able to intercept calls to
PubsubJsonClient with AspectJ locally but not in Dataflow, and why custom
changes in the beam-sdks-java-io-google-cloud-platform artifact are
ignored in Dataflow.
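
For reference, the interception I used locally was along these lines; a
minimal sketch, assuming an AspectJ aspect woven onto the SDK classes (the
pointcut targets PubsubJsonClient.publish, and the exact method signature
is an assumption, so adjust it to the SDK version in use):

import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;

// Hypothetical aspect: logs every publish call that goes through the Beam
// SDK's PubsubJsonClient. This works under local runners, where the SDK
// code runs in the user's JVM; Dataflow replaces this path internally.
@Aspect
public class PubsubClientLoggingAspect {

  @Before("execution(* org.apache.beam.sdk.io.gcp.pubsub.PubsubJsonClient.publish(..))")
  public void logPublish(JoinPoint joinPoint) {
    System.out.println("PubsubJsonClient.publish called with "
        + joinPoint.getArgs().length + " argument(s)");
  }
}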

Thank you, Boyuan Zhang!


Andrei Litvinov,

Senior Java Developer

Grid Dynamics
5000 Executive Parkway # 520, San Ramon, CA 94583
Cell: +1 (510) 366 4205






On Tue, Nov 24, 2020 at 5:00 PM Boyuan Zhang <boyu...@google.com> wrote:

> Hi Andrei,
>
> Dataflow doesn't execute the pubsub write transform on the user worker
> but runs it as an internal operation. As a result, it is not possible to
> log messages when the pubsub write happens.
>
> But you can use --experiments=enable_custom_pubsub_sink to force Dataflow
> to use the expansion
> <https://github.com/apache/beam/blob/d7655c12f7e0d575513c937ad53c45fbe1339c2d/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSink.java#L411-L436>,
> which will be executed on the user worker. Then you can add your log
> somewhere here
> <https://github.com/apache/beam/blob/d7655c12f7e0d575513c937ad53c45fbe1339c2d/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSink.java#L246>.
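>
> If it helps, here is a minimal sketch of enabling the experiment
> programmatically; it assumes the standard Beam ExperimentalOptions
> interface (passing --experiments=enable_custom_pubsub_sink on the command
> line works equally well):
>
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.options.ExperimentalOptions;
> import org.apache.beam.sdk.options.PipelineOptions;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
>
> public class EnableCustomPubsubSink {
>   public static void main(String[] args) {
>     PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
>     // Force the Pub/Sub write to use the SDK's PubsubUnboundedSink
>     // expansion, which runs on user workers and can therefore be logged.
>     ExperimentalOptions.addExperiment(
>         options.as(ExperimentalOptions.class), "enable_custom_pubsub_sink");
>     Pipeline pipeline = Pipeline.create(options);
>     // ... build the pipeline with the PubsubIO write, then run it.
>     pipeline.run();
>   }
> }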
>
> If you believe there is something wrong with Dataflow, please contact gcp
> customer support.
>
> On Tue, Nov 24, 2020 at 11:34 AM Andrei Litvinov <
> alitvi...@griddynamics.com> wrote:
>
>> Hello Beam Community,
>>
>>   We are using a streaming Dataflow job writing to Google Pub/Sub through
>> PubsubIO.Write. Does anybody know how we can log the IDs which Pub/Sub
>> assigns to messages?
>>   We need that to pin down where messages might be lost.
>>   Case 1. A Pub/Sub topic is used as an interface between two projects.
>> We are working on the sending side and do not control the receiving side.
>> When the job was deployed the first time, Pub/Sub did not accept messages
>> (because of a permission problem) and the job got stuck, but we did not
>> get any errors in Dataflow logs. Neither were we able to tell whether the
>> messages were accepted by Pub/Sub.
>>   Case 2. We suspect some messages are lost while going end-to-end
>> through our system, but we do not know exactly where. We would like to
>> know the IDs of messages which were actually accepted by Pub/Sub, to
>> trace the missing messages.
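>>
>>   One workaround we are considering, in case it helps: publish from a
>> DoFn with the google-cloud-pubsub client instead of PubsubIO, since
>> Publisher.publish returns a future whose value is the server-assigned
>> message ID. A minimal sketch, with placeholder project and topic names
>> (error handling and batching omitted):
>>
>> import com.google.cloud.pubsub.v1.Publisher;
>> import com.google.protobuf.ByteString;
>> import com.google.pubsub.v1.PubsubMessage;
>> import com.google.pubsub.v1.TopicName;
>> import org.apache.beam.sdk.transforms.DoFn;
>>
>> // Hypothetical replacement for PubsubIO.Write: publishes directly and
>> // logs the message ID that Pub/Sub assigns to each accepted message.
>> class PublishAndLogIdFn extends DoFn<String, Void> {
>>   private transient Publisher publisher;
>>
>>   @Setup
>>   public void setup() throws Exception {
>>     publisher = Publisher.newBuilder(
>>         TopicName.of("my-project", "my-topic")).build();  // placeholders
>>   }
>>
>>   @ProcessElement
>>   public void processElement(@Element String element) throws Exception {
>>     PubsubMessage message = PubsubMessage.newBuilder()
>>         .setData(ByteString.copyFromUtf8(element))
>>         .build();
>>     // publish() returns an ApiFuture whose value is the message ID;
>>     // blocking on it per element is simple but slow, fine for tracing.
>>     String messageId = publisher.publish(message).get();
>>     System.out.println("Pub/Sub accepted message with ID " + messageId);
>>   }
>>
>>   @Teardown
>>   public void teardown() throws Exception {
>>     if (publisher != null) {
>>       publisher.shutdown();
>>     }
>>   }
>> }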
>>
>> Thank you,
>> Andrei Litvinov
>>
>>
>>
>>
