To my knowledge, autoscaling is dependent on how many messages are
backlogged within Pubsub and independent of the second subscription (the
second subscription is really to compute a better watermark).

On Thu, Aug 3, 2017 at 1:34 PM, <[email protected]> wrote:

> Thanks Lukasz, that's good to know! It sounds like we can halve our PubSub
> costs then!
>
> Just to clarify, if I were to remove withTimestampAttribute from a job,
> this would cause the watermark to always be up to date (processing time)
> even if the job starts to lag behind (build up of unacknowledged PubSub
> messages). In this case would Dataflow's autoscaling still scale up? I
> thought the reason the autoscaler scales up is due to the watermark lagging
> behind, but is it also aware of the acknowledged PubSub messages?
>
> On 3 Aug 2017, at 18:58, Lukasz Cwik <[email protected]> wrote:
>
> Your understanding is correct - the data watermark will only matter for
> windowing. It will not affect auto-scaling. If the pipeline is not doing
> any windowing, triggering, etc then there is no need to pay for the cost of
> the second subscription.
>
> On Thu, Aug 3, 2017 at 8:17 AM, Josh <[email protected]> wrote:
>
>> Hi all,
>>
>> We've been running a few streaming Beam jobs on Dataflow, where each job
>> is consuming from PubSub via PubSubIO. Each job does something like this:
>>
>> PubsubIO.readMessagesWithAttributes()
>>             .withIdAttribute("unique_id")
>>             .withTimestampAttribute("timestamp");
>>
>> My understanding of `withTimestampAttribute` is that it means we use the
>> timestamp on the PubSub message as Beam's concept of time (the watermark) -
>> so that any windowing we do in the job uses "event time" rather than
>> "processing time".
>>
>> My question is: is my understanding correct, and does using
>> `withTimestampAttribute` have any effect in a job that doesn't do any
>> windowing? I have a feeling it may also have an effect on Dataflow's
>> autoscaling, since I think Dataflow scales up when the watermark timestamp
>> lags behind, but I'm not sure about this.
>>
>> The reason I'm concerned about this is because we've been using it in all
>> our Dataflow jobs, and have now realised that whenever
>> `withTimestampAttribute` is used, Dataflow creates an additional PubSub
>> subscription (suffixed with `__streaming_dataflow_internal`), which
>> appears to be doubling PubSub costs (since we pay per subscription)! So I
>> want to remove `withTimestampAttribute` from jobs where possible, but want
>> to first understand the implications.
>>
>> Thanks for any advice,
>> Josh
>>
>
>
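To make the distinction in this thread concrete, here is a toy sketch in plain Java (not the Beam or Dataflow API; all class and method names are illustrative): the autoscaling signal in this model looks only at how many messages are backlogged, while the watermark tracks the oldest un-acked event timestamp, which only windowing cares about.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model contrasting backlog size (the autoscaling signal, per the
// thread) with the event-time watermark (relevant only to windowing).
// Names are illustrative, not real Beam/Dataflow classes.
public class WatermarkVsBacklog {
    static class Message {
        final long eventTimeMillis;
        Message(long eventTimeMillis) { this.eventTimeMillis = eventTimeMillis; }
    }

    private final Queue<Message> backlog = new ArrayDeque<>();

    void publish(long eventTimeMillis) { backlog.add(new Message(eventTimeMillis)); }

    // Acknowledge (process) the oldest pending message.
    void ack() { backlog.poll(); }

    // Autoscaling in this model depends only on how many messages are waiting.
    int backlogSize() { return backlog.size(); }

    // Watermark: minimum event timestamp still in flight; advances as
    // older messages are acked.
    long watermark() {
        return backlog.stream().mapToLong(m -> m.eventTimeMillis).min()
                .orElse(Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        WatermarkVsBacklog p = new WatermarkVsBacklog();
        p.publish(1000); p.publish(2000); p.publish(3000);
        System.out.println(p.backlogSize()); // prints 3
        System.out.println(p.watermark());   // prints 1000
        p.ack();                             // oldest message processed
        System.out.println(p.watermark());   // prints 2000
    }
}
```

In this sketch, removing the timestamp attribute would correspond to dropping `watermark()` entirely: `backlogSize()` - the quantity autoscaling reacts to - is unaffected.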