Re: [openstack-dev] [Neutron] L3 agent rescheduling issue

Eugene Nikanorov Sun, 07 Jun 2015 21:29:47 -0700

No, I don't think the greenthread itself does anything special. It's just that
when there are too many greenthreads, the state_report greenthread can't get
control for too long, since greenthreads are not prioritized.
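
To illustrate the point, here is a minimal sketch of cooperative scheduling
without priorities. It is a simplified round-robin model written with plain
generators, not eventlet's actual hub, and the names worker, reporter and
max_report_gap as well as the 0.01 second slice are assumptions made up for
the example.

```python
from collections import deque


def worker(slice_cost):
    """Hypothetical per-network worker: does one slice of work, then yields."""
    while True:
        yield slice_cost


def reporter(report_times, clock):
    """Records the simulated time whenever the scheduler gives it control."""
    while True:
        report_times.append(clock[0])
        yield 0.0


def max_report_gap(num_workers, slice_cost=0.01, rounds=5):
    """Longest simulated wait between two consecutive state reports."""
    clock = [0.0]  # simulated seconds, advanced by each task's slice
    report_times = []
    ready = deque([reporter(report_times, clock)])
    ready.extend(worker(slice_cost) for _ in range(num_workers))

    # Plain round-robin: every task gets one turn per round, no priorities.
    for _ in range(rounds * (num_workers + 1)):
        task = ready.popleft()
        clock[0] += next(task)
        ready.append(task)

    gaps = [b - a for a, b in zip(report_times, report_times[1:])]
    return max(gaps)
```

In this model the gap between reports grows linearly with the number of
workers: about 0.1 simulated seconds with 10 workers and about 10 with 1000,
even though no single worker runs for long.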


Eugene.

On Sun, Jun 7, 2015 at 8:24 PM, Kevin Benton <blak...@gmail.com> wrote:

> I understand now. So the issue is that the report_state greenthread is
> just blocking and yielding whenever it tries to actually send a message?
>
> On Sun, Jun 7, 2015 at 8:10 PM, Eugene Nikanorov <enikano...@mirantis.com>
> wrote:
>
>> Salvatore,
>>
>> By 'fairness' I meant the chances of the state report greenthread getting
>> control. In the DHCP case, each network is processed by a separate
>> greenthread, so the more greenthreads the agent has, the lower the chance
>> that the report state greenthread will be able to report in time.
>>
>> Thanks,
>> Eugene.
>>
>> On Sun, Jun 7, 2015 at 4:15 AM, Salvatore Orlando <sorla...@nicira.com>
>> wrote:
>>
>>> On 5 June 2015 at 01:29, Itsuro ODA <o...@valinux.co.jp> wrote:
>>>
>>>> Hi,
>>>>
>>>> > After trying to reproduce this, I'm suspecting that the issue is actually
>>>> > on the server side from failing to drain the agent report state queue in
>>>> > time.
>>>>
>>>> I have seen this before.
>>>> At that time I thought the scenario was as follows:
>>>> * a lot of create/update resource API requests were issued
>>>> * the "rpc_conn_pool_size" pool was exhausted by sending notifications,
>>>>   which blocked further sending on the RPC side
>>>> * the "rpc_thread_pool_size" pool was exhausted by threads waiting on the
>>>>   "rpc_conn_pool_size" pool in order to send RPC replies
>>>> * receiving state_report was blocked because the "rpc_thread_pool_size"
>>>>   pool was exhausted
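
The scenario in the list above can be sketched as a toy counter model. It only
mirrors the described chain of events and is not how oslo.messaging is
implemented; the function name and its parameters are hypothetical, with
conn_pool_size and thread_pool_size standing in for "rpc_conn_pool_size" and
"rpc_thread_pool_size".

```python
def state_report_accepted(conn_pool_size, thread_pool_size,
                          pending_notifications, pending_rpc_calls):
    """Return True if a worker slot is still free for an incoming state report."""
    free_conns = conn_pool_size
    free_workers = thread_pool_size

    # Step 1: outgoing notifications take connections and hold on to them.
    free_conns -= min(free_conns, pending_notifications)

    # Step 2: each incoming RPC call occupies a worker, which then needs a
    # connection to send its reply.
    for _ in range(pending_rpc_calls):
        if free_workers == 0:
            break  # no worker left to pick up the call
        if free_conns > 0:
            continue  # reply sent at once; worker and connection are free again
        free_workers -= 1  # worker stays blocked waiting for a connection

    # Step 3: the state report can only be received by a free worker.
    return free_workers > 0
```

For example, with 2 connections, 3 workers, 2 pending notifications and 3
pending RPC calls, every worker ends up waiting for a connection and the
function returns False. Without the pending notifications it returns True.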
>>>>
>>>>
>>> I think this could be a good explanation, couldn't it?
>>> Kevin proved that the periodic tasks are not mutually exclusive and that
>>> long process times for sync_routers are not an issue.
>>> However, he correctly suspected a server-side involvement, which could
>>> actually be a lot of requests saturating the RPC pool.
>>>
>>> On the other hand, how could we use this theory to explain why this
>>> issue tends to occur when the agent is restarted?
>>> Also, Eugene, what do you mean by stating that the issue could be in
>>> the agent's "fairness"?
>>>
>>> Salvatore
>>>
>>>
>>>
>>>> Thanks
>>>> Itsuro Oda
>>>>
>>>> On Thu, 4 Jun 2015 14:20:33 -0700
>>>> Kevin Benton <blak...@gmail.com> wrote:
>>>>
>>>> > After trying to reproduce this, I'm suspecting that the issue is actually
>>>> > on the server side from failing to drain the agent report state queue in
>>>> > time.
>>>> >
>>>> > I set the report_interval to 1 second on the agent and added a logging
>>>> > statement and I see a report every 1 second even when sync_routers is
>>>> > taking a really long time.
>>>> >
>>>> > On Thu, Jun 4, 2015 at 11:52 AM, Carl Baldwin <c...@ecbaldwin.net> wrote:
>>>> >
>>>> > > Ann,
>>>> > >
>>>> > > Thanks for bringing this up.  It has been on the shelf for a while now.
>>>> > >
>>>> > > Carl
>>>> > >
>>>> > > On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando <sorla...@nicira.com>
>>>> > > wrote:
>>>> > > > One reason for not sending the heartbeat from a separate greenthread
>>>> > > > could be that the agent is already doing it [1].
>>>> > > > The current proposed patch addresses the issue blindly - that is to
>>>> > > > say before declaring an agent dead let's wait for some more time
>>>> > > > because it could be stuck doing stuff. In that case I would probably
>>>> > > > make the multiplier (currently 2x) configurable.
>>>> > > >
>>>> > > > The reason the state report does not occur is probably that both it
>>>> > > > and the resync procedure are periodic tasks. If I got it right,
>>>> > > > they're both executed as eventlet greenthreads, but one at a time.
>>>> > > > Perhaps then adding an initial delay to the full sync task might
>>>> > > > ensure that the first thing an agent does when it comes up is send a
>>>> > > > heartbeat to the server?
>>>> > > >
>>>> > > > On the other hand, while doing the initial full resync, is the agent
>>>> > > > able to process updates? If not, perhaps it makes sense to have it
>>>> > > > down until it finishes synchronisation.
>>>> > >
>>>> > > Yes, it can!  The agent prioritizes updates from RPC over full resync
>>>> > > activities.
>>>> > >
>>>> > > I wonder if the agent should check how long it has been since its last
>>>> > > state report each time it finishes processing an update for a router.
>>>> > > It normally doesn't take very long (relatively) to process an update
>>>> > > to a single router.
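
The check described above might look roughly like the following sketch. It is
not code from the L3 agent: the class StateReportCheck, its maybe_report
method and the injected send_report and clock callables are all hypothetical
names chosen for illustration.

```python
import time


class StateReportCheck:
    """Hypothetical helper: sends a state report if the last one is too old."""

    def __init__(self, report_interval, send_report, clock=time.monotonic):
        self._interval = report_interval
        self._send_report = send_report  # callable that sends the heartbeat
        self._clock = clock              # injectable so it can be faked in tests
        self._last_report = clock()

    def maybe_report(self):
        """Call after each router update; returns True if a report was sent."""
        now = self._clock()
        if now - self._last_report < self._interval:
            return False
        self._send_report()
        self._last_report = now
        return True
```

The idea would be to call maybe_report() once per processed router update, so
that a long run of updates cannot delay the heartbeat by much more than the
time needed for a single update.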
>>>> > >
>>>> > > I still would like to know why the thread to report state is being
>>>> > > starved.  Anyone have any insight on this?  I thought that with all
>>>> > > the system calls, the greenthreads would yield often.  There must be
>>>> > > something I don't understand about it.
>>>> > >
>>>> > > Carl
>>>> > >
>>>> > >
>>>> __________________________________________________________________________
>>>> > > OpenStack Development Mailing List (not for usage questions)
>>>> > > Unsubscribe:
>>>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>>> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>> > >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Kevin Benton
>>>>
>>>> --
>>>> Itsuro ODA <o...@valinux.co.jp>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> Kevin Benton
>
>
>

