I understand now. So the issue is that the report_state greenthread is just blocking and yielding whenever it tries to actually send a message?
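
For concreteness, here is a minimal eventlet sketch (hypothetical names, not actual Neutron code) of the scheduling behaviour I'm asking about: the reporter greenthread only regains control when some other greenthread yields, so a crowd of busy workers can push each report well past report_interval.

    # Minimal sketch (not Neutron code): a periodic "report_state"
    # greenthread competing with many worker greenthreads under eventlet's
    # cooperative scheduler. The reporter runs only when another greenthread
    # yields; long stretches of work between yields delay every report.
    import time

    import eventlet


    def report_state(interval=1.0):
        while True:
            due = time.time() + interval
            eventlet.sleep(interval)        # yield until the next report is due
            lateness = time.time() - due    # how far past due we actually ran
            print("state report sent, %.2fs late" % max(lateness, 0.0))


    def process_resource(res_id, chunks=20, chunk_seconds=0.02):
        # Simulated per-resource worker: blocking-ish slices of work with a
        # cooperative yield only between slices.
        for _ in range(chunks):
            start = time.time()
            while time.time() - start < chunk_seconds:
                pass                        # CPU-bound, does not yield
            eventlet.sleep(0)               # yield back to the hub


    pool = eventlet.GreenPool(size=100)
    eventlet.spawn(report_state)
    for res_id in range(50):
        pool.spawn_n(process_resource, res_id)
    pool.waitall()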
On Sun, Jun 7, 2015 at 8:10 PM, Eugene Nikanorov <enikano...@mirantis.com> wrote:
> Salvatore,
>
> By 'fairness' I meant the chances for the state report greenthread to get
> control. In the DHCP case, each network is processed by a separate
> greenthread, so the more greenthreads the agent has, the lower the chances
> that the report state greenthread will be able to report in time.
>
> Thanks,
> Eugene.
>
> On Sun, Jun 7, 2015 at 4:15 AM, Salvatore Orlando <sorla...@nicira.com> wrote:
>
>> On 5 June 2015 at 01:29, Itsuro ODA <o...@valinux.co.jp> wrote:
>>
>>> Hi,
>>>
>>> > After trying to reproduce this, I'm suspecting that the issue is
>>> > actually on the server side from failing to drain the agent report
>>> > state queue in time.
>>>
>>> I have seen this before.
>>> I thought the scenario at that time was as follows:
>>> * a lot of create/update resource API calls were issued
>>> * the "rpc_conn_pool_size" pool was exhausted for sending notifications,
>>>   which blocked further sending on the RPC side
>>> * the "rpc_thread_pool_size" pool was exhausted by threads waiting on
>>>   the "rpc_conn_pool_size" pool to reply to RPCs
>>> * receiving state_report was blocked because the "rpc_thread_pool_size"
>>>   pool was exhausted
>>
>> I think this could be a good explanation, couldn't it?
>> Kevin proved that the periodic tasks are not mutually exclusive and that
>> long process times for sync_routers are not an issue.
>> However, he correctly suspected a server-side involvement, which could
>> actually be a lot of requests saturating the RPC pool.
>>
>> On the other hand, how could we use this theory to explain why this issue
>> tends to occur when the agent is restarted?
>> Also, Eugene, what do you mean by stating that the issue could be in the
>> agent's "fairness"?
>>
>> Salvatore
>>
>>> Thanks
>>> Itsuro Oda
>>>
>>> On Thu, 4 Jun 2015 14:20:33 -0700 Kevin Benton <blak...@gmail.com> wrote:
>>>
>>> > After trying to reproduce this, I'm suspecting that the issue is
>>> > actually on the server side from failing to drain the agent report
>>> > state queue in time.
>>> >
>>> > I set the report_interval to 1 second on the agent and added a logging
>>> > statement, and I see a report every 1 second even when sync_routers is
>>> > taking a really long time.
>>> >
>>> > On Thu, Jun 4, 2015 at 11:52 AM, Carl Baldwin <c...@ecbaldwin.net> wrote:
>>> >
>>> > > Ann,
>>> > >
>>> > > Thanks for bringing this up. It has been on the shelf for a while now.
>>> > >
>>> > > Carl
>>> > >
>>> > > On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando <sorla...@nicira.com> wrote:
>>> > > > One reason for not sending the heartbeat from a separate
>>> > > > greenthread could be that the agent is already doing it [1].
>>> > > > The current proposed patch addresses the issue blindly - that is to
>>> > > > say, before declaring an agent dead, let's wait some more time
>>> > > > because it could be stuck doing stuff. In that case I would
>>> > > > probably make the multiplier (currently 2x) configurable.
>>> > > >
>>> > > > The reason the state report does not occur is probably that both it
>>> > > > and the resync procedure are periodic tasks. If I got it right,
>>> > > > they're both executed as eventlet greenthreads, but one at a time.
>>> > > > Perhaps then adding an initial delay to the full sync task might
>>> > > > ensure that the first thing an agent does when it comes up is send
>>> > > > a heartbeat to the server?
>>> > > >
>>> > > > On the other hand, while doing the initial full resync, is the
>>> > > > agent able to process updates? If not, perhaps it makes sense to
>>> > > > have it down until it finishes synchronisation.
>>> > >
>>> > > Yes, it can! The agent prioritizes updates from RPC over full resync
>>> > > activities.
>>> > >
>>> > > I wonder if the agent should check how long it has been since its
>>> > > last state report each time it finishes processing an update for a
>>> > > router. It normally doesn't take very long (relatively) to process an
>>> > > update to a single router.
>>> > >
>>> > > I still would like to know why the thread to report state is being
>>> > > starved. Anyone have any insight on this? I thought that with all the
>>> > > system calls, the greenthreads would yield often. There must be
>>> > > something I don't understand about it.
>>> > >
>>> > > Carl
>>> >
>>> > --
>>> > Kevin Benton
>>>
>>> --
>>> Itsuro ODA <o...@valinux.co.jp>

--
Kevin Benton
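
Regarding Salvatore's suggestion in the quoted thread of giving the full sync task an initial delay so that the first heartbeat always goes out first, a rough sketch of the start-up ordering (hypothetical names, plain eventlet rather than the agent's actual looping-call helpers):

    # Sketch of the start-up ordering idea (hypothetical, not the actual
    # agent code): start the heartbeat immediately and delay the expensive
    # full resync, so a freshly restarted agent reports state before it
    # buries itself in synchronisation work.
    import eventlet


    class Agent(object):
        def __init__(self, report_interval=3, resync_initial_delay=5):
            self.report_interval = report_interval
            self.resync_initial_delay = resync_initial_delay

        def _report_state(self):
            while True:
                self.send_report_to_server()    # RPC call; yields on I/O
                eventlet.sleep(self.report_interval)

        def _full_resync(self):
            self.sync_all_routers()             # long-running

        def run(self):
            eventlet.spawn(self._report_state)  # heartbeat starts first
            # Delay the resync so the first heartbeat wins the start-up race.
            eventlet.spawn_after(self.resync_initial_delay, self._full_resync)

        # Placeholders standing in for the real RPC and sync work.
        def send_report_to_server(self):
            print("state report sent")

        def sync_all_routers(self):
            print("full resync running")


    if __name__ == "__main__":
        agent = Agent()
        agent.run()
        eventlet.sleep(10)                      # let the greenthreads run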
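
And a sketch of Carl's idea of checking the time since the last state report whenever a router update finishes (hypothetical names; the real agent would hook this into its update-processing loop):

    # Sketch (hypothetical names): after each router update, check whether
    # the periodic reporter is overdue and, if so, send a report out of band.
    # Processing a single router update is normally quick, so this is a
    # convenient place to compensate for a starved reporter greenthread.
    import time


    class FakeReporter(object):
        def report_state(self):
            print("out-of-band state report sent")


    class RouterUpdateWorker(object):
        def __init__(self, state_reporter, report_interval=30):
            self.state_reporter = state_reporter
            self.report_interval = report_interval
            self.last_report_time = time.time()

        def record_report(self):
            # Called whenever a report goes out (periodic or out of band).
            self.last_report_time = time.time()

        def process_update(self, router_update):
            self._apply_update(router_update)
            if time.time() - self.last_report_time > self.report_interval:
                self.state_reporter.report_state()
                self.record_report()

        def _apply_update(self, router_update):
            pass                                # real per-router work elided


    worker = RouterUpdateWorker(FakeReporter(), report_interval=30)
    worker.process_update({"router_id": "r1"})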
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev