I wrote an Apache Beam pipeline in Python to read messages from a Pub/Sub
subscription.

Messages are published to the Pub/Sub topic, which is in the us-east4
region, at a rate of 10,000 tuples/sec.

The pipeline looks like this:
| 'read from pubsub' >> beam.io.ReadFromPubSub(subscription=...)
| 'print' >> beam.Map(print)
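
For completeness, here is a fuller sketch of the same two-step pipeline. It assumes the source is `beam.io.ReadFromPubSub` (since the input is a Pub/Sub subscription), that the payloads are UTF-8 text, and that the job runs in streaming mode. The `subscription_path` helper, the `run` function, and the "decode" step are illustrative additions, not part of my actual job.

```python
def subscription_path(project: str, name: str) -> str:
    """Build the full subscription path that ReadFromPubSub expects."""
    return f"projects/{project}/subscriptions/{name}"


def run(subscription: str, argv=None) -> None:
    """Read from a Pub/Sub subscription and print every message."""
    # Imported here so the helper above can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import (
        PipelineOptions,
        StandardOptions,
    )

    options = PipelineOptions(argv)
    # Pub/Sub is an unbounded source, so the pipeline runs in streaming mode.
    options.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=options) as p:
        (
            p
            | "read from pubsub"
            >> beam.io.ReadFromPubSub(subscription=subscription)
            # Payloads arrive as bytes; UTF-8 is an assumption here.
            | "decode" >> beam.Map(lambda payload: payload.decode("utf-8"))
            | "print" >> beam.Map(print)
        )
```

It would be started with something like `run(subscription_path("my-project", "my-subscription"))`, where both names are placeholders rather than real resources.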


I created a template for this pipeline and submitted the job to Dataflow
with an n2d-standard-4 machine.
It's using 90% of the CPU just to read from Pub/Sub, and the backlog is
around 10 seconds, which stays constant over time.
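
To put those numbers in perspective, here is a rough back-of-the-envelope calculation. It assumes a single worker, that n2d-standard-4 provides 4 vCPUs, and that a 10-second backlog corresponds to about 10 seconds' worth of messages at the publish rate. The function names are only for illustration.

```python
def per_vcpu_rate(messages_per_sec: float, vcpus: int, workers: int = 1) -> float:
    """Messages each vCPU must handle per second if load is spread evenly."""
    return messages_per_sec / (vcpus * workers)


def backlog_messages(messages_per_sec: float, backlog_seconds: float) -> float:
    """Approximate number of unprocessed messages for a given backlog age."""
    return messages_per_sec * backlog_seconds


# 10,000 messages/sec on one 4-vCPU worker (assumed).
print(per_vcpu_rate(10_000, 4))       # 2500.0
# A 10-second backlog at that publish rate (assumed interpretation).
print(backlog_messages(10_000, 10))   # 100000
```

Under those assumptions, each vCPU handles roughly 2,500 messages per second, including the per-element `print` call, and about 100,000 messages are waiting at any moment.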

My questions are:
1. Is it normal to use 90% of the CPU just to read messages from Pub/Sub?
2. What could be the possible reasons for this?
3. Why is it not able to clear the backlog? In fact, the backlog starts
increasing after some time, as the throughput is also decreasing.


Thank you
