That is a very old version of web2py. The problem may have been solved years ago. ;-)
On Thursday, 3 May 2018 at 16:37:53 UTC-5, Lisandro wrote:

I came back to reply to this particular message because the problem happened again yesterday.

Just to recap: the problem was redis_cache.py failing to write one specific key name. No *__lock object was found in the cache during the incident. Connecting manually with redis-cli showed that the key wasn't present and could be written and read. But from the web2py application (using the redis_cache.py adapter), the application hangs on that specific key. The only solution is to change the key name. Weird.

Yesterday our Redis server crashed (it reported out of memory even though the server had plenty of RAM available; this turned out to be an issue with memory fragmentation). After that crash I ran FLUSHALL and restarted the Redis server. Almost all of the roughly 260 websites we run with web2py started working again, but 4 of them kept hanging. Every request to those websites hung. I added some log statements and found it was the same problem, this time with other random key names.

The solution was, again, to decompile the apps, manually edit the source code, and change the key names. But of course, that is not a real solution. Again, this time there was no *__lock key in the cache, and I could successfully use redis-cli to write and read the key name.

I don't have a proper solution yet. redis-cli FLUSHALL doesn't help.

I'm using an old web2py version (2.10.3-stable+timestamp.2015.04.02.21.42.07) and I already have plans to upgrade. But before that, could I manually update only the gluon/contrib/redis_*.py files? In your opinion, is that test worthwhile? Does it make any sense?

On Friday, 20 April 2018 at 08:28:28 (UTC-3), Lisandro wrote:

Sorry to bother you again with this, but I think I've found the problem. The problem is apparently in the Redis integration. It had nothing to do with connections, the database, or sessions. Here is what I've found.

Remember, the line where my app hangs is this:

    session.important_messages = cache.redis('important-messages-%s' % auth.user.id,
                                             lambda: get_important_messages(),
                                             time_expire=180)

As the problem only occurred in production, on the website of one of my customers, I asked him to let me experiment a little with the code. First I cached request.now instead of calling get_important_messages(), but the problem remained. Then I thought "maybe if I change the key..." and changed the original code to this:

    session.important_messages = cache.redis('important-messages',
                                             lambda: get_important_messages(),
                                             time_expire=180)

Notice that the only thing I changed was the key used to store the value in Redis. And it worked! I thought that maybe auth.user.id was some very large number, but I checked and the user ID is 3. I also tried passing it as int(auth.user.id), with no success. The app still hangs when I try to retrieve that specific key. Only that key.
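Besides redis-cli, I also double-check the key and its lock sibling directly from Python (a rough sketch with redis-py; the "w2p:myapp:" prefix matches what redis-cli shows below, while the ":__lock" suffix is my assumption about the adapter's lock-key name, which may differ across web2py versions):

    import redis

    # inspect the cached key and the lock key the adapter may have left behind
    r = redis.StrictRedis(host='localhost', port=6379)
    key = 'w2p:myapp:important-messages-3'
    for k in (key, key + ':__lock'):
        print('%s exists=%s ttl=%s' % (k, r.exists(k), r.ttl(k)))
        # r.delete(k)  # uncomment to clear a stale entry or lock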
I connected with redis-cli, and it tells me the key isn't there. So I set a "hello" value for the key, read it back, and then deleted it:

    $ redis-cli
    127.0.0.1:6379> DUMP w2p:myapp:important-messages-3
    (nil)
    127.0.0.1:6379> SET w2p:myapp:important-messages-3 "hello"
    OK
    127.0.0.1:6379> DUMP w2p:myapp:important-messages-3
    "\x00\x05hello\x06\x00\xf5\x9f\xb7\xf6\x90a\x1c\x99"
    127.0.0.1:6379> DEL w2p:myapp:important-messages-3
    (integer) 1
    127.0.0.1:6379> DUMP w2p:myapp:important-messages-3
    (nil)

But even after that, web2py hangs with this simple code:

    r = cache.redis('important-messages-3', lambda: request.now, time_expire=30)

This happens only with that specific key. I can set the key to "important-messages-2", "important-messages-999", "important-messages-A", anything I can think of, but with that specific key it hangs.

We run several websites (around 200), and this problem has happened about 5 or 6 times on different websites, always in the same way. The only solution I had (until now) was to create a new account for the user. That explains why a new account worked: the new account had a different auth.user.id, so the key stored in Redis was different.

Could this be a bug in the redis_cache.py integration? Maybe I should open a new thread about this, right?

On Thursday, 19 April 2018 at 10:27:46 (UTC-3), Lisandro wrote:

Hi there,
I've found the issue, but I still don't know how it is produced. Anthony was right from the beginning when he said "the app is not hanging because the locks are being held, but rather the locks are being held because the app is hanging." Since that comment, I had been waiting for the problem to happen again so I could decompile the app and add some logging to find the exact line of code where the application hangs.

That's what I did, and I found that my app does indeed hang at a specific line of code in models/db.py. Here is the relevant part:

    if auth.is_logged_in() and auth.user.responsable:
        # ----------- THIS IS THE LINE WHERE THE CODE HANGS ----------
        session.important_messages = cache.redis('important_messages-%s' % auth.user.id,
                                                 lambda: get_important_messages(),
                                                 time_expire=180)

So I checked what the function get_important_messages() does, and I see that it connects to a webservice (also developed with web2py):

    def get_important_messages():
        from gluon.contrib.simplejsonrpc import ServerProxy
        webservice = ServerProxy('https://main-app-domain.com/ws/call/jsonrpc?token=XXX1')
        try:
            result = webservice.get_account_info(CONFIG.customer_id)
        except Exception as e:
            result = []
        return result
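Looking at this function now, one thing stands out: the call has no explicit timeout, so if the webservice is slow, the whole request blocks on it. A defensive variation I'm considering (just a sketch: I'm assuming this ServerProxy honours the process-wide socket timeout, and the 5-second budget is an arbitrary choice):

    import socket

    def get_important_messages():
        from gluon.contrib.simplejsonrpc import ServerProxy
        old_timeout = socket.getdefaulttimeout()
        socket.setdefaulttimeout(5)  # fail fast instead of hanging the request
        try:
            webservice = ServerProxy('https://main-app-domain.com/ws/call/jsonrpc?token=XXX1')
            return webservice.get_account_info(CONFIG.customer_id)
        except Exception:
            return []  # treat a slow or failed webservice as "no messages"
        finally:
            socket.setdefaulttimeout(old_timeout)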
Then I went to double-check my nginx error.log, this time looking for errors on the URL the app uses to connect to the webservice. Surprisingly, I'm seeing a few timeouts every day for that URL:

    2018/04/17 15:08:22 [error] 23587#23587: *93711423 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/17 15:08:22 [error] 23587#23587: *93711449 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX2 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/17 15:08:36 [error] 23582#23582: *93711928 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/17 15:09:04 [error] 23582#23582: *93713029 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX3 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/17 15:09:16 [error] 23591#23591: *93713451 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/17 15:09:24 [error] 23582#23582: *93713819 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX4 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/17 15:09:25 [error] 23582#23582: *93713839 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX5 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/17 15:10:25 [error] 23582#23582: *93716003 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX1 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/17 15:12:34 [error] 23591#23591: *93720887 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX6 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/17 15:12:36 [error] 23590#23590: *93720938 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX7 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/17 15:12:50 [error] 23589#23589: *93721468 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX8 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/16 10:39:39 [error] 16600#16600: *89723537 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX7 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/16 10:40:10 [error] 16601#16601: *89724987 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/16 10:40:11 [error] 16602#16602: *89725040 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/16 16:59:46 [error] 17874#17874: *90771814 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX8 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/16 17:00:56 [error] 17877#17877: *90774663 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX8 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/16 17:01:11 [error] 17879#17879: *90775407 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"
    2018/04/15 13:46:46 [error] 11395#11395: *86829630 upstream timed out (110: Connection timed out) while reading response header from upstream, client: MY.OWN.SERVER.IP, server: main-app-domain.com, request: "POST /ws/call/jsonrpc?token=XXX9 HTTP/1.1", upstream: "uwsgi://unix:///tmp/medios.socket", host: "main-app-domain.com"

So, what I know now is that the problem is these timeouts, which occur occasionally when an app tries to connect to the main webservice with this code:

    webservice = ServerProxy('https://main-app-domain.com/ws/call/jsonrpc?token=XXX1')

This is the code of the ws.py controller that implements the webservice:

    # -*- coding: utf-8 -*-
    from gluon.tools import Service

    service = Service()

    def call():
        if not request.vars.token or not db(db.websites.token == request.vars.token).count():
            raise HTTP(403)
        session.forget()
        return service()

Notice that the call receives a token, and every app that connects has its own token, in order to validate the connection.
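To see from the server side which calls hang, I'm thinking of timing the dispatch in call(); a hypothetical sketch (the 'ws' logger name and the token truncation in the log are my own inventions for illustration):

    import logging
    import time

    logger = logging.getLogger('ws')

    def call():
        if not request.vars.token or not db(db.websites.token == request.vars.token).count():
            raise HTTP(403)
        session.forget()
        started = time.time()
        try:
            return service()
        finally:
            # log the duration of every call; the hanging ones should stand out
            logger.warning('ws call took %.2fs (token=%s...)',
                           time.time() - started, request.vars.token[:4])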
I'm not sure why some of the calls to the webservice hang, but I am sure of this:

- While some of these calls time out, other identical calls work properly (and they are all identical: just calls to connect to the webservice).
- Just in case, I've checked that my nginx configuration isn't applying request limits to my server IP or anything like that; no warning or error about this shows up in the nginx error.log.
- Also just in case, I checked my pgBouncer log to see if connections to the main database are exhausted, but that's not the case either (if it were, I would see error tickets created, and every other attempt to connect to the webservice would fail as well, which is not happening).

Now I'm lost. I don't see how the attempt to connect to the webservice could fail. Maybe network problems, but those should affect other connections as well.

Any comment or suggestion will be much appreciated.
Regards,
Lisandro.

On Monday, 16 April 2018 at 18:57:47 (UTC-3), Lisandro wrote:

Hi, thank you both for your time and concern.

@Richard: this particular website was still running with sessions stored in the database. As we have several websites, moving sessions to Redis is something we will do progressively over the next weeks.

@Anthony: the database server is PostgreSQL, running on the same VPS, so I wouldn't say it's due to network problems. I do have pgBouncer, and I limit the pool size to only 1 connection (with 2 in the reserve pool) per database. The app didn't have much load (it almost never does), but in this situation, with that query hanging for 60 seconds, it's likely the error ticket appeared because there were no more connections available for that db (for example, if the user with the problem tried simultaneously on a laptop, a PC, and a mobile phone).

Some (weird) points about the problem:

- While it affects one specific account, other user accounts can log in and work with the app perfectly.
- As an admin, I have permission to impersonate other user accounts. When the problem happens, I can impersonate any account except the one with the problem (the impersonation succeeds, but the same timeout occurs once I'm impersonating that account).
- The problem doesn't go away after deleting all web2py_session_table records and clearing cookies.
- The problem doesn't go away after changing the account's email or password.
- The only solution I've applied the last few times it happened was to create a new account for the user and invalidate the old one.

Today, when the problem happened, I created the new account for the user and moved the sessions to Redis. Maybe I should have kept sessions in the db in order to debug the problem with that account; that's no longer possible, because I've already moved to Redis. Of course I could move sessions back to the db, but I don't like the idea of debugging in production on a customer's website, especially one who had a recent issue with this.

So I'll wait to see if it happens again, and I'll try to keep the account around to run some tests.
Thank you very much for your time!
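For reference, the session move is essentially the standard recipe in models/db.py (a sketch based on gluon.contrib.redis_session as documented; argument names may differ across web2py versions):

    from gluon.contrib.redis_utils import RConn
    from gluon.contrib.redis_session import RedisSession

    # connect once per process and keep sessions in Redis instead of the db
    rconn = RConn('localhost', 6379)
    sessiondb = RedisSession(redis_conn=rconn, session_expiry=False)
    session.connect(request, response, db=sessiondb)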
On Monday, 16 April 2018 at 17:31:47 (UTC-3), Anthony wrote:

Where is the database server running? Is it possible there are occasional network problems connecting to it?

Anthony

On Monday, April 16, 2018 at 3:15:54 PM UTC-4, Lisandro wrote:

Hi there, sorry to bother you again; I have some additional info that could help.

The problem happened again, exactly the same as the other times. But this time an error ticket was created with this traceback:

    Traceback (most recent call last):
      File "/var/www/medios/gluon/main.py", line 463, in wsgibase
        session._try_store_in_db(request, response)
      File "/var/www/medios/gluon/globals.py", line 1152, in _try_store_in_db
        if not table._db(table.id == record_id).update(**dd):
      File "/var/www/medios/gluon/packages/dal/pydal/objects.py", line 2117, in update
        ret = db._adapter.update("%s" % table._tablename, self.query, fields)
      File "/var/www/medios/gluon/packages/dal/pydal/adapters/base.py", line 988, in update
        raise e
    DatabaseError: query_wait_timeout
    server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

Could this indicate that for some reason web2py is failing to store the session? Or could it still be that a deadlock in my app code is producing this error?

On Friday, 6 April 2018 at 18:59:28 (UTC-3), Lisandro wrote:

Oh, I see; you made a good point there, I hadn't realised it. I guess I'll have to take a closer look at my app code. Considering that the problem occurs for specific accounts while others work fine, and also that it happens on any request that specific user makes, to any controller or function, I'm wondering: what does my app do differently for one user compared to another at request level? By "request level" I mean all the code the app runs on every request, starting with models/db.py.

I'll take a closer look at that and post another message here if I find something that could point to the root cause of the issue.

Thank you very much for your help!

On Friday, 6 April 2018 at 16:05:13 (UTC-3), Anthony wrote:

On Friday, April 6, 2018 at 10:58:56 AM UTC-4, Lisandro wrote:

Yes, in fact, I've been running that SQL command to check for locks. Sometimes I see locks on other tables, but those live for less than a second. However, when the problem happens, the locks on the auth_user and web2py_session tables remain there for the whole 60 seconds.
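For reference, this is roughly the check I run (a sketch from the web2py shell using db.executesql; I'm assuming PostgreSQL 9.2+, where pg_stat_activity exposes "query" — older versions call it "current_query"):

    # list held/pending locks alongside the query holding them
    rows = db.executesql("""
        SELECT l.relation::regclass::text, l.mode, l.granted, a.query
          FROM pg_locks l
          JOIN pg_stat_activity a ON a.pid = l.pid
         WHERE l.relation IS NOT NULL;
    """)
    for relation, mode, granted, query in rows:
        print('%s %s granted=%s %s' % (relation, mode, granted, query[:60]))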
Yes, but that doesn't mean the lock or the database has anything to do with the app hanging. The locks will be held for the duration of the database transaction, and web2py wraps HTTP requests in a transaction, so the transaction doesn't end until the request ends (unless you explicitly call db.commit()). In other words, the app is not hanging because the locks are being held; rather, the locks are being held because the app is hanging. First you have to figure out why the app is hanging (it could be the database, but it could be something else).

Anthony