Sorry for the late response. Where does the beam set that timestamp field
on element ? Is it set whenever KafkaIO reads that element ? And also I
have a windowing function on my pipeline. Does the timestamp field change
for any kind of operation ? On pipeline I have the following steps: KafkaIO
-> Format Conversion Pardo -> SQL Filter -> Windowing Step -> Custom Sink.
If timestamp set in KafkaIO, Can I see process time by now() - timestamp in
Custom Sink ?

Thanks

On Thu, May 28, 2020 at 2:07 PM Luke Cwik <lc...@google.com> wrote:

> Dataflow provides msec counters for each transform that executes. You
> should be able to get them from stackdriver and see them from the Dataflow
> UI.
>
> You need to keep track of the timestamp of the element as it flows through
> the system as part of data that goes alongside the element. You can use the
> element's timestamp[1] if that makes sense (it might not if you intend to
> use a timestamp that is from the kafka record itself and the record's
> timestamp isn't the same as the ingestion timestamp). Unless you are
> writing your own sink, the sink won't track the processing time at all so
> you'll need to add a ParDo that goes right before it that writes the timing
> information to wherever you want (a counter, your own metrics database,
> logs, ...).
>
> 1:
> https://github.com/apache/beam/blob/018e889829e300ab9f321da7e0010ff0011a73b1/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L257
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_beam_blob_018e889829e300ab9f321da7e0010ff0011a73b1_sdks_java_core_src_main_java_org_apache_beam_sdk_transforms_DoFn.java-23L257&d=DwMFaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=BkW1L6EF7ergAVYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g&m=1202mTv7BP1KzcBJECS98dr7u5riw0NHdl8rT8I6Ego&s=cPdnrK4r-tVd0iAO6j7eAAbDPISOdazEYBrPoC9cQOo&e=>
>
>
> On Thu, May 28, 2020 at 1:12 PM Talat Uyarer <tuya...@paloaltonetworks.com>
> wrote:
>
>> Yes I am trying to track how long it takes for a single element to be
>> ingested into the pipeline until it is output somewhere.
>>
>> My pipeline is unbounded. I am using KafkaIO. I did not think about CPU
>> time. if there is a way to track it too, it would be useful to improve my
>> metrics.
>>
>> On Thu, May 28, 2020 at 12:52 PM Luke Cwik <lc...@google.com> wrote:
>>
>>> What do you mean by processing time?
>>>
>>> Are you trying to track how long it takes for a single element to be
>>> ingested into the pipeline until it is output somewhere?
>>> Do you have a bounded pipeline and want to know how long all the
>>> processing takes?
>>> Do you care about how much CPU time is being consumed in aggregate for
>>> all the processing that your pipeline is doing?
>>>
>>>
>>> On Thu, May 28, 2020 at 11:01 AM Talat Uyarer <
>>> tuya...@paloaltonetworks.com> wrote:
>>>
>>>> I am using Dataflow Runner. The pipeline read from kafkaIO and send
>>>> Http. I could not find any metadata field on the element to set first read
>>>> time.
>>>>
>>>> On Thu, May 28, 2020 at 10:44 AM Kyle Weaver <kcwea...@google.com>
>>>> wrote:
>>>>
>>>>> Which runner are you using?
>>>>>
>>>>> On Thu, May 28, 2020 at 1:43 PM Talat Uyarer <
>>>>> tuya...@paloaltonetworks.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a pipeline which has 5 steps. What is the best way to measure
>>>>>> processing time for my pipeline?
>>>>>>
>>>>>> Thnaks
>>>>>>
>>>>>

Reply via email to