Hi Jason

Just spent a little time looking further at this. I think the stack dump
has the answer after all.

I believe the inefficiency is coming from the fact that for each object
access there is a call to getAccess( T object ) in DefaultAclService.  The
trouble is this in turn is calling currentUserService.getCurrentUser() each
time.  You wouldn't think this is such a load but each time I look at a
snapshot of the thread running it is busy in that method so its obviously
significant.  I guess it is doing a string comparison search each time
(userService.getUserCredentialsByUsername( username )), so with sufficient
number of users in the system its accounting for the bulk of the time - a
lot of users multiplied by a lot of dataelements and its going to be more
problematic.  Do you have a lot of users?

The solution has got to be to call  currentUserService.getCurrentUser()
just once, and then use the getAccess( T object, User user ) method
instead.  I'm guessing that will reduce the computation load for returning
a list of dataelements hugely.

The trouble is I am not 100% of where that getAccess call is coming from in
this object sharing code.

Morten will know better but I guess everything starts in the
AbstractCrudController.  Given that currentUserService.getCurrentUser() is
not such a lightweight operation we clearly need to be much more
conservative in how often we use it.  Currently it is peppered all ove the
place.  Can't we do this just once per session?  And from there on make
sure to use getAccess( T object, User user ) ?

I'd bet a pint of best that this would reduce the time for that call to
list dataelements by 90%.

Bob

On 22 January 2015 at 10:58, Bob Jolliffe <bobjolli...@gmail.com> wrote:

> Sorry Jason.  A quick look through isnt shedding light on me :-(
>
> Is this a test environment or did you run that against a running server
> with other users?  There doesn't seem to be any tight loop like we saw with
> the paging bug, so I guess what we have here is not so much as a logical
> error as some sort of gross inefficiency somewhere.
> One thing I notice is that almost all RUNNABLE threads are somehow engaged
> with operations relating to user service.  getCurrentUser,
> getUserCredentials etc.  Given that this is a static snapshot in time of
> where exactly each thread is at, it seems odd that so many are caught up
> chasing users instead of doing real work.  This might be a red herring or a
> pointer to that gross inefficiency - it might be that if the user
> credentials are being retrieved and tested each time the user is touching a
> metadata object (and it it is maybe touching many of them with this
> metadata export) that that is leading to excessive activity.  Do you have a
> vast number of users in the system by any chance?
>
> On 22 January 2015 at 08:09, Jason Pickering <jason.p.picker...@gmail.com>
> wrote:
>
>> OK, is this better?  Stack attached.
>>
>> Tasks: 178 total,   1 running, 177 sleeping,   0 stopped,   0 zombie
>> Cpu(s): 99.1%us,  0.1%sy,  0.0%ni,  0.4%id,  0.0%wa,  0.0%hi,  0.0%si,
>>  0.3%st
>> Mem:  16433100k total, 16330904k used,   102196k free,    71828k buffers
>> Swap:   262140k total,    25108k used,   237032k free, 11411252k cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 28091 dhis      20   0 12.0g 3.6g 5916 S  796 23.0 328:48.52 java
>>
>>
>> Command used was
>>
>> curl --verbose "
>> http://XXXX/dhis/api/dataElements.json?links=false&paging=false"; -u
>> XXXX:XXXX
>>
>> Regards,
>> Jason
>>
>>
>> On Wed, Jan 21, 2015 at 11:05 PM, Bob Jolliffe <bobjolli...@gmail.com>
>> wrote:
>>
>>> Jason the cpu is not really "through the roof" here.  Can you try
>>> another dump (while the cpu is through the roof)?
>>>
>>> The interesting threads to look at are the http-bio-8080 threads, which
>>> are the tomcat worker threads.  If the server is not being accessed by
>>> dozens of users most of them should be sitting in a WAITING state.  So then
>>> it is interesting to look at those which are marked as RUNNABLE.  These are
>>> the ones which are trying as hard as they can to get the attention of the
>>> scheduler in order to chew cpu cycles.
>>>
>>> So if your process has become cpu bound then sometimes its possible to
>>> figure out which thread it is and what it is trying to do.  This attached
>>> stack dump doesn't reveal much unfortunately, but then again the cpu (from
>>> top) is showing 62% idle.  So try and get the cpu burning up again and grab
>>> 2 or 3 jstack samples.  If it is cpu bound then it might show.
>>>
>>>
>>>
>>> On 21 January 2015 at 19:28, Jason Pickering <
>>> jason.p.picker...@gmail.com> wrote:
>>>
>>>> Not really sure what is going on, but here is the stack dump. Maybe it
>>>> will help.
>>>>
>>>> And some more info below.
>>>>
>>>>
>>>> top - 21:26:40 up 477 days, 16:20,  2 users,  load average: 2.99, 2.05,
>>>> 1.03
>>>> Tasks: 171 total,   1 running, 170 sleeping,   0 stopped,   0 zombie
>>>> Cpu(s): 38.3%us,  0.0%sy,  0.0%ni, 61.6%id,  0.0%wa,  0.0%hi,  0.0%si,
>>>>  0.1%st
>>>> Mem:  16433100k total, 15416512k used,  1016588k free,    86276k buffers
>>>> Swap:   262140k total,    18872k used,   243268k free, 12412748k cached
>>>>
>>>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>> 28091 dhis      20   0 11.8g 2.1g  13m S  307 13.2  29:39.99 java
>>>>
>>>>
>>>> Regards,
>>>> Jason
>>>>
>>>>
>>>> On Wed, Jan 21, 2015 at 7:42 PM, Bob Jolliffe <bobjolli...@gmail.com>
>>>> wrote:
>>>>
>>>>> If it is running for 15 minutes then almost certainly it is stuck in
>>>>> some kind of loop.  Find the pid of the running jvm and also the owner of
>>>>> the java process.  Then run:
>>>>>
>>>>> sudo -u <process owner> jstack <pid>  > jstack.txt
>>>>>
>>>>> That might give an idea of what the threads are up to ...
>>>>>
>>>>> On 21 January 2015 at 18:26, Morten Olav Hansen <morte...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> 18k is not really that much.. is it giving you any kind of errors?
>>>>>> given the SL demo db... we can easily export much more objects that 
>>>>>> that..
>>>>>>
>>>>>> --
>>>>>> Morten
>>>>>>
>>>>>> On Thu, Jan 22, 2015 at 12:42 AM, Bob Jolliffe <bobjolli...@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Can you try and get a jstack dump while it is busy chomping cpu.
>>>>>>> That will probably indicate where in the code it is spinning.  There 
>>>>>>> was a
>>>>>>> problem earlier with a weakness in the paging code but should have been
>>>>>>> fixed by that build.
>>>>>>>
>>>>>>> On 21 January 2015 at 15:00, Jason Pickering <
>>>>>>> jason.p.picker...@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> In one DHIS2 instance, we have a rather large number of data
>>>>>>>> elements, more than 18,000. During the detailed metadata export, a 
>>>>>>>> request
>>>>>>>> is made to "api/dataElements.json?paging=false&links=false". This 
>>>>>>>> throws
>>>>>>>> the CPU of the server through the roof, with the operation taking many
>>>>>>>> minutes (has been running for 15 minutes now with no response).There 
>>>>>>>> are no
>>>>>>>> errors in the logs or the browser. Build revision is not known, but  
>>>>>>>> it is
>>>>>>>> 2.17 from from 2015-01-17 02:03.
>>>>>>>>
>>>>>>>> Any idea what might be going on here?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Jason
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jason P. Pickering
>>>>>>>> email: jason.p.picker...@gmail.com
>>>>>>>> tel:+46764147049
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>>>>>> Post to     : dhis2-devs@lists.launchpad.net
>>>>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>>>>> Post to     : dhis2-devs@lists.launchpad.net
>>>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Jason P. Pickering
>>>> email: jason.p.picker...@gmail.com
>>>> tel:+46764147049
>>>>
>>>
>>>
>>
>>
>> --
>> Jason P. Pickering
>> email: jason.p.picker...@gmail.com
>> tel:+46764147049
>>
>
>
_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to     : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help   : https://help.launchpad.net/ListHelp

Reply via email to