On Fri, Jan 5, 2018 at 12:31 PM, John Sellens <jsell...@syonex.com> wrote:

> Hi Josh - thanks for the info.
>
> Can I make an assertion that having the default read timeout be unlimited
> is a mistake?  In practical terms, anything over 60 seconds means
> something is broken.
>

Timeouts are hard. How does the client know 60 seconds is long enough?
Compile times of ~1 min are not unthinkable. Maybe the server is under
heavy load? If the timeout is reached, what should the client do? Retry
(bad idea since it makes the situation worse)... Fail (bad idea if the
server was making progress)...


> Could I suggest (without having to go and update the bug because I'm a
> bad bad lazy person) that along with the watchdog you change the default
> timeout to, say, 5 minutes?  That's effectively infinite, but would
> likely keep things from getting stuck.
>

It's definitely possible for the connection to be lost between the time
that the server responds and when the agent would normally receive the
response. In this half-open scenario, the agent may wait indefinitely, so I
agree having a timeout "less than infinite" makes sense. I'm thinking it
should be strictly less than runinterval, otherwise you could have agent
runs stacking up, and contending for the agent lock.
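
To make the half-open case concrete, here is a minimal Python sketch. It is not Puppet's code (the agent is Ruby); the function names, the local "silent" server, and the timeout values are all hypothetical and only illustrate the idea of bounding the connect and the read separately.

```python
import socket
import threading


def fetch_with_timeouts(host, port, connect_timeout, read_timeout):
    """Send one request and read one reply, with separate connect/read timeouts.

    Returns the reply bytes, or None if the peer stayed silent for longer
    than read_timeout seconds.
    """
    # The connect timeout bounds only the TCP handshake.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    try:
        # The read timeout bounds each recv() call, not the whole exchange.
        sock.settimeout(read_timeout)
        sock.sendall(b"ping\n")
        try:
            return sock.recv(1024)
        except socket.timeout:
            # Peer went silent: give up instead of waiting forever.
            return None
    finally:
        sock.close()


def start_silent_server():
    """Hypothetical stand-in for a stalled peer: accepts, then never replies."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    held = []  # keep the accepted socket referenced so the connection stays open

    def serve():
        conn, _ = listener.accept()
        held.append(conn)

    threading.Thread(target=serve, daemon=True).start()
    return listener, held


if __name__ == "__main__":
    listener, held = start_silent_server()
    port = listener.getsockname()[1]
    reply = fetch_with_timeouts("127.0.0.1", port, connect_timeout=1.0, read_timeout=0.5)
    print("timed out" if reply is None else reply)
```

With an unlimited read timeout the recv() call above would block for as long as the peer stays silent, which is the stuck-agent case. What to do after the timeout fires (retry or fail) is still the hard part.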

> (I wrote some tools back in the early puppet 3 days to run puppet the
> way I wanted, and of course I included a timeout on the total run time.
> There were some interesting failure modes back in the olden days.)
>

Yeah, "interesting" is one way to put it :) Puppet 2/3 conflated TCP
connect and read timeouts. And it required that the *entire* pluginsync
operation take less than Puppet[:configtimeout] minutes (defaulted to 2),
otherwise the agent would abort the pluginsync operation, even though it
could be making progress downloading individual files (see PUP-2885)!


> Thanks - cheers!
>
> John
>
>
>
> On Fri, 2018/01/05 11:53:12AM -0800, Josh Cooper <j...@puppet.com> wrote:
> | In Puppet 4 we added settings for configuring HTTP connect and read
> | timeouts independently[1]. Previously they were both controlled by
> | configtimeout. The default read timeout is unlimited, so the hung agent
> | may be stuck in a socket read. You might want to strace the stuck agent
> | to see what it's up to.
> |
> | In our upcoming 4.10.x/5.3.x releases, we've added a watchdog to kill a
> | stuck run[2].
> |
> | Josh
> |
> | [1] https://tickets.puppetlabs.com/browse/PUP-3666
> | [2] https://tickets.puppetlabs.com/browse/PUP-7517
> |
> | --
> | Josh Cooper | Software Engineer
> | j...@puppet.com | @coopjn
> |
>



-- 
Josh Cooper | Software Engineer
j...@puppet.com | @coopjn
