I understand now. So the issue is that the report_state greenthread is just blocking and yielding whenever it tries to actually send a message?
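
For concreteness, here is a minimal eventlet sketch (hypothetical names, not actual Neutron code) of the scheduling behaviour I'm asking about: the reporter greenthread only regains control when some other greenthread yields, so a crowd of busy workers can push each report well past report_interval.

    # Minimal sketch (not Neutron code): a periodic "report_state"
    # greenthread competing with many worker greenthreads under eventlet's
    # cooperative scheduler. The reporter runs only when another greenthread
    # yields; long stretches of work between yields delay every report.
    import time

    import eventlet


    def report_state(interval=1.0):
        while True:
            due = time.time() + interval
            eventlet.sleep(interval)        # yield until the next report is due
            lateness = time.time() - due    # how far past due we actually ran
            print("state report sent, %.2fs late" % max(lateness, 0.0))


    def process_resource(res_id, chunks=20, chunk_seconds=0.02):
        # Simulated per-resource worker: blocking-ish slices of work with a
        # cooperative yield only between slices.
        for _ in range(chunks):
            start = time.time()
            while time.time() - start < chunk_seconds:
                pass                        # CPU-bound, does not yield
            eventlet.sleep(0)               # yield back to the hub


    pool = eventlet.GreenPool(size=100)
    eventlet.spawn(report_state)
    for res_id in range(50):
        pool.spawn_n(process_resource, res_id)
    pool.waitall()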
On Sun, Jun 7, 2015 at 8:10 PM, Eugene Nikanorov <enikano...@mirantis.com> wrote:
> Salvatore,
>
> By 'fairness' I meant the chances for the state report greenthread to get
> control. In the DHCP case, each network is processed by a separate
> greenthread, so the more greenthreads the agent has, the lower the chances
> that the report state greenthread will be able to report in time.
>
> Thanks,
> Eugene.
>
> On Sun, Jun 7, 2015 at 4:15 AM, Salvatore Orlando <sorla...@nicira.com> wrote:
>
>> On 5 June 2015 at 01:29, Itsuro ODA <o...@valinux.co.jp> wrote:
>>
>>> Hi,
>>>
>>> > After trying to reproduce this, I'm suspecting that the issue is
>>> > actually on the server side from failing to drain the agent report
>>> > state queue in time.
>>>
>>> I have seen this before.
>>> I thought the scenario at that time was as follows:
>>> * a lot of create/update resource API calls were issued
>>> * the "rpc_conn_pool_size" pool was exhausted for sending notifications,
>>>   which blocked further sending on the RPC side
>>> * the "rpc_thread_pool_size" pool was exhausted by threads waiting on
>>>   the "rpc_conn_pool_size" pool to reply to RPCs
>>> * receiving state_report was blocked because the "rpc_thread_pool_size"
>>>   pool was exhausted
>>
>> I think this could be a good explanation, couldn't it?
>> Kevin proved that the periodic tasks are not mutually exclusive and that
>> long process times for sync_routers are not an issue.
>> However, he correctly suspected a server-side involvement, which could
>> actually be a lot of requests saturating the RPC pool.
>>
>> On the other hand, how could we use this theory to explain why this issue
>> tends to occur when the agent is restarted?
>> Also, Eugene, what do you mean by stating that the issue could be in the
>> agent's "fairness"?
>>
>> Salvatore
>>
>>> Thanks
>>> Itsuro Oda
>>>
>>> On Thu, 4 Jun 2015 14:20:33 -0700 Kevin Benton <blak...@gmail.com> wrote:
>>>
>>> > After trying to reproduce this, I'm suspecting that the issue is
>>> > actually on the server side from failing to drain the agent report
>>> > state queue in time.
>>> >
>>> > I set the report_interval to 1 second on the agent and added a logging
>>> > statement, and I see a report every 1 second even when sync_routers is
>>> > taking a really long time.
>>> >
>>> > On Thu, Jun 4, 2015 at 11:52 AM, Carl Baldwin <c...@ecbaldwin.net> wrote:
>>> >
>>> > > Ann,
>>> > >
>>> > > Thanks for bringing this up. It has been on the shelf for a while now.
>>> > >
>>> > > Carl
>>> > >
>>> > > On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando <sorla...@nicira.com> wrote:
>>> > > > One reason for not sending the heartbeat from a separate
>>> > > > greenthread could be that the agent is already doing it [1].
>>> > > > The current proposed patch addresses the issue blindly - that is to
>>> > > > say, before declaring an agent dead, let's wait some more time
>>> > > > because it could be stuck doing stuff. In that case I would
>>> > > > probably make the multiplier (currently 2x) configurable.
>>> > > >
>>> > > > The reason the state report does not occur is probably that both it
>>> > > > and the resync procedure are periodic tasks. If I got it right,
>>> > > > they're both executed as eventlet greenthreads, but one at a time.
>>> > > > Perhaps then adding an initial delay to the full sync task might
>>> > > > ensure that the first thing an agent does when it comes up is send
>>> > > > a heartbeat to the server?
>>> > > >
>>> > > > On the other hand, while doing the initial full resync, is the
>>> > > > agent able to process updates? If not, perhaps it makes sense to
>>> > > > have it down until it finishes synchronisation.
>>> > >
>>> > > Yes, it can! The agent prioritizes updates from RPC over full resync
>>> > > activities.
>>> > >
>>> > > I wonder if the agent should check how long it has been since its
>>> > > last state report each time it finishes processing an update for a
>>> > > router. It normally doesn't take very long (relatively) to process an
>>> > > update to a single router.
>>> > >
>>> > > I still would like to know why the thread to report state is being
>>> > > starved. Anyone have any insight on this? I thought that with all the
>>> > > system calls, the greenthreads would yield often. There must be
>>> > > something I don't understand about it.
>>> > >
>>> > > Carl
>>> >
>>> > --
>>> > Kevin Benton
>>>
>>> --
>>> Itsuro ODA <o...@valinux.co.jp>

--
Kevin Benton
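
Regarding Salvatore's suggestion in the quoted thread of giving the full sync task an initial delay so that the first heartbeat always goes out first, a rough sketch of the start-up ordering (hypothetical names, plain eventlet rather than the agent's actual looping-call helpers):

    # Sketch of the start-up ordering idea (hypothetical, not the actual
    # agent code): start the heartbeat immediately and delay the expensive
    # full resync, so a freshly restarted agent reports state before it
    # buries itself in synchronisation work.
    import eventlet


    class Agent(object):
        def __init__(self, report_interval=3, resync_initial_delay=5):
            self.report_interval = report_interval
            self.resync_initial_delay = resync_initial_delay

        def _report_state(self):
            while True:
                self.send_report_to_server()    # RPC call; yields on I/O
                eventlet.sleep(self.report_interval)

        def _full_resync(self):
            self.sync_all_routers()             # long-running

        def run(self):
            eventlet.spawn(self._report_state)  # heartbeat starts first
            # Delay the resync so the first heartbeat wins the start-up race.
            eventlet.spawn_after(self.resync_initial_delay, self._full_resync)

        # Placeholders standing in for the real RPC and sync work.
        def send_report_to_server(self):
            print("state report sent")

        def sync_all_routers(self):
            print("full resync running")


    if __name__ == "__main__":
        agent = Agent()
        agent.run()
        eventlet.sleep(10)                      # let the greenthreads run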
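
And a sketch of Carl's idea of checking the time since the last state report whenever a router update finishes (hypothetical names; the real agent would hook this into its update-processing loop):

    # Sketch (hypothetical names): after each router update, check whether
    # the periodic reporter is overdue and, if so, send a report out of band.
    # Processing a single router update is normally quick, so this is a
    # convenient place to compensate for a starved reporter greenthread.
    import time


    class FakeReporter(object):
        def report_state(self):
            print("out-of-band state report sent")


    class RouterUpdateWorker(object):
        def __init__(self, state_reporter, report_interval=30):
            self.state_reporter = state_reporter
            self.report_interval = report_interval
            self.last_report_time = time.time()

        def record_report(self):
            # Called whenever a report goes out (periodic or out of band).
            self.last_report_time = time.time()

        def process_update(self, router_update):
            self._apply_update(router_update)
            if time.time() - self.last_report_time > self.report_interval:
                self.state_reporter.report_state()
                self.record_report()

        def _apply_update(self, router_update):
            pass                                # real per-router work elided


    worker = RouterUpdateWorker(FakeReporter(), report_interval=30)
    worker.process_update({"router_id": "r1"})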
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev