Rohit, I have seen quite a few issues with this feature so far.  The change
you made in #1538 does not change the actual code at all; it just reduces
the number of tests, so you are less likely to run into the problem, but
the problem still exists.

I am CCing in Simon Weller as well.  I was talking to him this morning and
he had this to say (unprompted).

> Will, we're still seeing odd issues with that NIO SSL concurrency patch
> (1493), even after pulling in the additional PR 1534. The latest problem
> we've seen is 100% cpu on the agents for no apparent reason. I reverted
> both patches from our QA lab this morning and the problem has gone away.


> I pulled it into a second lab where we have haproxy set up to load balance
> and the same behaviour occurs.


> top - 08:18:15 up 1 day, 17:08,  5 users,  load average: 1.92, 2.22, 2.09
> Tasks: 223 total,   1 running, 222 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 22.2 us, 11.9 sy,  0.0 ni, 65.8 id,  0.0 wa,  0.0 hi,  0.1 si,
>  0.0 st
> KiB Mem : 32673608 total, 28312176 free,  3512104 used,   849328 buff/cache
> KiB Swap:  4194300 total,  4194300 free,        0 used. 28757568 avail Mem
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+
> COMMAND
>
> 17985 root      20   0 6937720 162816  22196 S 100.3  0.5   3:24.84
> /usr/lib/jvm/jre/bin/java -Xms256m -Xmx2048m -cp
> /usr/share/cloudstack-agent/lib/activatio+
> 15587 root      20   0 1733288 375976  12164 S 100.0  1.2  10:42.36
> /usr/libexec/qemu-kvm -name v-46-VM -S -machine
> pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 1+
>  4480 root      20   0  909604 305292  12264 S   0.7  0.9   1:10.21
> /usr/libexec/qemu-kvm -name r-44-VM -S -machine
> pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 2+
>  5188 root      20   0  957548 323420  12216 S   0.7  1.0   1:07.35
> /usr/libexec/qemu-kvm -name r-45-VM -S -machine
> pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 2+
> 18336 root      20   0  157840   2392   1556 R   0.7  0.0   0:00.14 top
>
>
> 19023 root      20   0 1002156 449720  12372 S   0.7  1.4  10:57.69
> /usr/libexec/qemu-kvm -name r-32-VM -S -machine
> pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 2+


I am considering reverting this feature (both PRs) until we can understand
what is causing this and we can stabilize this code so it does not cause us
problems.  With this type of behavior, I am not confident with this code in
production right now...

*Will STEVENS*
Lead Developer

*CloudOps* *| *Cloud Solutions Experts
420 rue Guy *|* Montreal *|* Quebec *|* H3J 1S6
w cloudops.com *|* tw @CloudOps_

On Wed, May 11, 2016 at 5:36 AM, Rohit Yadav <rohit.ya...@shapeblue.com>
wrote:

> Please follow up on PR #1538 and comment if that fixes the issue on OSX.
>
> Regards,
>
> Rohit Yadav
>
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HS UK
> @shapeblue
>
> -----Original Message-----
> From: Rohit Yadav [mailto:rohit.ya...@shapeblue.com]
> Sent: Wednesday, May 11, 2016 2:49 PM
> To: dev@cloudstack.apache.org
> Subject: RE: Test failure on master?
>
> I don't have OSX, but it seems to be working on Travis and Linux env in
> general.
> I'll send a PR that relaxes the malicious client attacks in the test, and
> ask you to review it in your env -- Koushik and Mike.
>
> Regards,
>
> Rohit Yadav
>
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HS UK @shapeblue
>
> -----Original Message-----
> From: Koushik Das [mailto:koushik....@accelerite.com]
> Sent: Wednesday, May 11, 2016 12:22 PM
> To: dev@cloudstack.apache.org
> Subject: Re: Test failure on master?
>
> I am also seeing the same failure happening randomly. OS X El Capitan
> 10.11.4.
>
> Results :
>
> Tests in error:
>   NioTest.testConnection:152 > TestTimedOut test timed out after 60000
> milliseco...
>
> Tests run: 200, Failures: 0, Errors: 1, Skipped: 13
>
>
> ________________________________________
> From: Tutkowski, Mike <mike.tutkow...@netapp.com>
> Sent: Tuesday, May 10, 2016 6:31:23 PM
> To: dev@cloudstack.apache.org
> Subject: Re: Test failure on master?
>
> Oh, and it's the OS of my MacBook Pro.
>
> > On May 10, 2016, at 6:59 AM, Tutkowski, Mike <mike.tutkow...@netapp.com>
> wrote:
> >
> > Hi,
> >
> > The environment is Mac OS X El Capitan 10.11.4.
> >
> > Thanks!
> > Mike
> >
> >> On May 10, 2016, at 5:51 AM, Will Stevens <wstev...@cloudops.com>
> wrote:
> >>
> >> I think I can verify that this is still happening on master for him
> >> because you changed the timeout (and the number of tests run, etc)
> >> when you pushed the fix in #1534.  So by looking at the timeout of
> >> 60000, we can verify that it is the latest code from master being run.
> >>
> >> I do think we need to revisit this to make sure we don't have
> >> intermittent issues with this test.
> >>
> >> Thx guys...
> >>
> >> *Will STEVENS*
> >> Lead Developer
> >>
> >> *CloudOps* *| *Cloud Solutions Experts
> >> 420 rue Guy *|* Montreal *|* Quebec *|* H3J 1S6 w cloudops.com *|* tw
> >> @CloudOps_
> >>
> >> On Tue, May 10, 2016 at 7:41 AM, Rohit Yadav
> >> <rohit.ya...@shapeblue.com>
> >> wrote:
> >>
> >>> Mike,
> >>>
> >>> Can you confirm whether you're using the latest master? Can you also
> >>> share the environment where you're running this (in a VM, automated by
> >>> Jenkins, Java version, etc.)?
> >>>
> >>> Will - I think the issue should be fixed on latest master, but if
> >>> Mike and others are getting failures I can further relax the test.
> >>> In virtualized environments, there may be threading/scheduling issues.
> >>>
> >>> Regards,
> >>>
> >>> Rohit Yadav
> >>>
> >>> rohit.ya...@shapeblue.com
> >>> www.shapeblue.com
> >>> 53 Chandos Place, Covent Garden, London  WC2N 4HS UK @shapeblue
> >>>
> >>> On May 10 2016, at 3:20 am, Will Stevens <wstev...@cloudops.com> wrote:
> >>>
> >>> Rohit, can you look into this?
> >>>
> >>> It was first introduced in:
> >>> https://github.com/apache/cloudstack/pull/1493
> >>>
> >>> I thought the problem was fixed with this:
> >>> https://github.com/apache/cloudstack/pull/1534
> >>>
> >>> Apparently we still have a problem. This is intermittently emitting
> >>> false negatives from what I can tell...
> >>>
> >>> *Will STEVENS*
> >>> Lead Developer
> >>>
> >>> *CloudOps* *| *Cloud Solutions Experts
> >>> 420 rue Guy *|* Montreal *|* Quebec *|* H3J 1S6 w cloudops.com *|*
> >>> tw @CloudOps_
> >>>
> >>> On Mon, May 9, 2016 at 5:34 PM, Tutkowski, Mike
> >>> <mike.tutkow...@netapp.com>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>>
> >>>> I've seen this a couple times today.
> >>>>
> >>>>
> >>>> Is this a known issue?
> >>>>
> >>>>
> >>>> Results :
> >>>>
> >>>>
> >>>> Tests in error:
> >>>>
> >>>> NioTest.testConnection:152 > TestTimedOut test timed out after
> >>>> 60000 milliseco...
> >>>>
> >>>>
> >>>> Tests run: 200, Failures: 0, Errors: 1, Skipped: 13
> >>>>
> >>>>
> >>>> [INFO] ------------------------------------------------------------------------
> >>>> [INFO] Reactor Summary:
> >>>> [INFO]
> >>>> [INFO] Apache CloudStack Developer Tools - Checkstyle Configuration SUCCESS [ 1.259 s]
> >>>> [INFO] Apache CloudStack .................................. SUCCESS [ 1.858 s]
> >>>> [INFO] Apache CloudStack Maven Conventions Parent ......... SUCCESS [ 1.528 s]
> >>>> [INFO] Apache CloudStack Framework - Managed Context ...... SUCCESS [ 4.882 s]
> >>>> [INFO] Apache CloudStack Utils ............................ FAILURE [01:20 min]
> >>>>
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Mike
> >>>
