Hello,
On Monday, September 22, 2014 at 22:02:26, Sławek Kapłoński wrote:
> Hello,
>
> Answers below.
>
> ---
> Best regards
> Sławek Kapłoński
> sla...@kaplonski.pl
>
> On Monday, September 22, 2014 at 13:41:51, Jay Pipes wrote:
> > Hi Peter, Sławek, answers inline...
> >
> > On 09/22/2014 08:12 AM, Peter Boros wrote:
> > > Hi,
> > >
> > > StaleDataError is not raised by MySQL, but rather by SQLAlchemy. After
> > > a quick look, it seems SQLAlchemy raises this when an UPDATE updated a
> > > different number of rows than it expected. I am not sure what the
> > > expectation is based on; perhaps somebody can chime in and we can put
> > > this together. What transaction isolation level are you running on?
> >
> > The transaction isolation level is REPEATABLE_READ, unless Sławek has
> > changed the defaults (unlikely).
>
> For sure I didn't change it.
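(As an aside, that rowcount check is easy to see outside of neutron. Below is a minimal sketch, using an in-memory SQLite database and a made-up Port model purely for illustration: during flush, the ORM compares the UPDATE's matched-row count against what it expected and raises StaleDataError when another statement has already removed the row.)

    from sqlalchemy import create_engine, text, Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker
    from sqlalchemy.orm.exc import StaleDataError

    Base = declarative_base()

    class Port(Base):
        __tablename__ = 'ports'
        id = Column(Integer, primary_key=True)
        status = Column(String(16))

    engine = create_engine('sqlite://')   # in-memory DB, demo only
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    session.add(Port(id=1, status='DOWN'))
    session.commit()

    port = session.query(Port).get(1)           # ORM has the row loaded
    session.execute(text("DELETE FROM ports"))  # row vanishes underneath it
    port.status = 'ACTIVE'

    try:
        session.flush()   # UPDATE matches 0 rows instead of the expected 1
    except StaleDataError as exc:
        print(exc)        # UPDATE statement on table 'ports' expected to
                          # update 1 row(s); 0 were matched.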
> > > For the timeout setting in neutron: that's a good way to approach it
> > > too; you can even be more aggressive and set it to a few seconds. In
> > > MySQL, making connections is very cheap (at least compared to other
> > > databases), so an idle timeout of a few seconds for a connection is
> > > typical.
> > >
> > > On Mon, Sep 22, 2014 at 12:35 PM, Sławek Kapłoński <sla...@kaplonski.pl> wrote:
> > >> Hello,
> > >>
> > >> Thanks for your explanations. I thought so, and I have now decreased
> > >> "idle_connection_timeout" in neutron and nova. Now, when the master
> > >> server comes back to the cluster, in less than one minute all
> > >> connections are made to the master node again, because the old
> > >> connections that were made to the backup node are closed. So for now
> > >> it looks almost perfect, but when I now test the cluster (with the
> > >> master node active and all connections established to it), in neutron
> > >> I still sometimes see errors like:
> > >> StaleDataError: UPDATE statement on table 'ports' expected to update 1
> > >> row(s); 0 were matched.
> > >>
> > >> And today I also found errors like:
> > >> 2014-09-22 11:38:05.715 11474 INFO sqlalchemy.engine.base.Engine [-] ROLLBACK
> > >> 2014-09-22 11:38:05.784 11474 ERROR neutron.openstack.common.db.sqlalchemy.session [-] DB exception wrapped.
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session Traceback (most recent call last):
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/neutron/openstack/common/db/sqlalchemy/session.py", line 524, in _wrap
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     return f(*args, **kwargs)
> >
> > From looking up the code, it looks like you are using Havana [1]. The
> > code in the master branch of Neutron now uses oslo.db, not
> > neutron.openstack.common.db, so this issue may have been resolved in
> > later versions of Neutron.
> >
> > [1] https://github.com/openstack/neutron/blob/stable/havana/neutron/openstack/common/db/sqlalchemy/session.py#L524
>
> Yes, I'm using Havana, and I have no way to upgrade quickly to Icehouse
> (about the master branch I don't even want to think :)). Are you saying
> that this problem will remain in Havana and can't be fixed in that
> release?
>
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/neutron/openstack/common/db/sqlalchemy/session.py", line 718, in flush
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     return super(Session, self).flush(*args, **kwargs)
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1818, in flush
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     self._flush(objects)
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1936, in _flush
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     transaction.rollback(_capture_exception=True)
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line 58, in __exit__
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     compat.reraise(exc_type, exc_value, exc_tb)
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1900, in _flush
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     flush_context.execute()
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line 372, in execute
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     rec.execute(self)
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line 525, in execute
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     uow
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 64, in save_obj
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     table, insert)
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 541, in _emit_insert_statements
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     execute(statement, multiparams)
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 662, in execute
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     params)
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 761, in _execute_clauseelement
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     compiled_sql, distilled_params
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 874, in _execute_context
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     context)
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1027, in _handle_dbapi_exception
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     util.reraise(*exc_info)
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 856, in _execute_context
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     context)
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/connectors/mysqldb.py", line 60, in do_executemany
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     rowcount = cursor.executemany(statement, parameters)
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 206, in executemany
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     r = r + self.execute(query, a)
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
> >
> > Hmm, this is just bad coding in the MySQLdb driver, frankly. It is
> > assuming a call to Cursor.execute() will return an integer, but it can
> > return None in some circumstances. See the code here:
> >
> > http://sourceforge.net/p/mysql-python/code/ci/8041cc6df636b9c42d52e01b727aa98b43f3632c/tree/MySQLdb/MySQLdb/cursors.py
> >
> > Note that MySQLdb1 (the above, which I believe is what is packaged in
> > your Linux distro) is old and buggy. The maintainer has moved on to
> > MySQLdb2, which has a different call interface in this part of the code.
>
> OK, I checked that code and found that the "execute" method does indeed
> sometimes return None, and when it is called from executemany there is a
> problem. A simple change to the executemany method:
>
>     if not m:
>         r = 0
>         for a in args:
>             iter_r = self.execute(query, a)
>             if iter_r:
>                 r = r + iter_r
>         return r
>
> works for me in the tests I have made, but I don't know what the result
> of that change would be over a longer time period, or why the execute
> method returns that None value :/
>
> You wrote that MySQLdb2 has a different call interface in this part of
> the code - would a change to MySQLdb2 fix this problem or not? If yes,
> can you explain how I can change it? Should I just install another
> package, or should I set something somewhere in the config?
> (I'm using Ubuntu 12.04)
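(To see the failure mode in isolation, here is a toy sketch - not the real cursor code - of the accumulation that executemany() performs. MySQLdb1 folds per-statement rowcounts with "r = r + self.execute(query, a)", so a single None return breaks the sum; the guard in the patch above is what avoids it.)

    def sum_rowcounts_buggy(counts):
        # mirrors MySQLdb1's executemany(): r = r + self.execute(query, a)
        r = 0
        for c in counts:
            r = r + c            # TypeError as soon as c is None
        return r

    def sum_rowcounts_patched(counts):
        # mirrors the patch quoted above: skip a None rowcount
        r = 0
        for c in counts:
            if c:
                r = r + c
        return r

    print(sum_rowcounts_patched([1, None, 1]))  # -> 2
    print(sum_rowcounts_buggy([1, None, 1]))    # -> TypeError: unsupported
                                                #    operand type(s) for +:
                                                #    'int' and 'NoneType'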
> > >> 2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session
> > >>
> > >> And I have to investigate why it is happening, because with a single
> > >> mysql server I have no such errors :/
> >
> > Not sure, frankly. The code is executing many INSERT or UPDATE
> > statements in a single block. The MySQL connection is clearly getting
> > borked on one of those attempts and results in the traceback you see
> > above.
>
> I'm not 100% sure, but on a much bigger cluster with one database server
> there was no such problem even once, so I supposed that it was somehow
> related to galera, which I now want to use.

Today I checked, and the problem always occurs when neutron tries to delete a row from the ipavailabilityranges table; when I moved the whole database to the old (single) mysql server, the problem was the same. So it is not related to galera or haproxy. This problem is strange to me, but I think the small patch for the python-mysqldb library (as I wrote in my previous email) solves it :)

> > best,
> > -jay
>
> > >> ---
> > >> Best regards
> > >> Sławek Kapłoński
> > >> sla...@kaplonski.pl
> > >>
> > >> On Monday, September 22, 2014 at 11:18:27, Peter Boros wrote:
> > >>> Hi,
> > >>>
> > >>> Let me answer this and one of your previous questions in one,
> > >>> because they are related.
> > >>>
> > >>> Earlier you wrote:
> > >>>> I made such modifications today in my infrastructure and generally
> > >>>> it looks better now. I don't see deadlocks. But I have one more
> > >>>> problem with that: generally it works fine when the main node is
> > >>>> active, but when this node is down, haproxy connects to one of the
> > >>>> backup nodes. Everything is still OK, but the problem is when the
> > >>>> main node comes up again - all new connections are made to the main
> > >>>> node, but the active connections that were made to the backup node
> > >>>> stay alive, so neutron (or nova) uses connections to two servers,
> > >>>> and then there are problems with deadlocks again.
> > >>>> Do you know how to prevent such a situation?
> > >>>
> > >>> This is because of how haproxy works. Haproxy's load balancing is at
> > >>> the TCP level; once the TCP connection is established, haproxy has
> > >>> nothing more to do with it. If the MySQL application (neutron in
> > >>> this case) uses persistent connections, haproxy doesn't make an
> > >>> extra decision upon failover, because the connection is already
> > >>> established. This can be mitigated by using haproxy 1.5 and defining
> > >>> the backend servers with on-marked-down shutdown-sessions, which
> > >>> will kill the connections at the TCP level on the formerly active
> > >>> node. Or, in the case of a graceful failover, include killing
> > >>> connections in the failover script on the formerly active node. In
> > >>> this case the application will get error 1 or 2 that you described.
> > >>>
> > >>> From your description, errors 1 and 2 are related to killed
> > >>> connections. Case 1 (MySQL server has gone away) happens when the
> > >>> connection was killed (but not at the MySQL protocol level) while it
> > >>> was idle, and the application is attempting to re-use it. In this
> > >>> case the correct behaviour is to re-establish the connection. Error
> > >>> 2 is the same thing, but while the connection was actually doing
> > >>> something; reconnecting and retrying is the correct behaviour. These
> > >>> errors are not avoidable if the node dies non-gracefully. A server
> > >>> can, for example, lose power while executing a transaction; in this
> > >>> case the transaction will be aborted and the application will get
> > >>> one of the errors described above. The application has to know that
> > >>> the data was not written, since it either didn't commit or the
> > >>> database didn't acknowledge the commit.
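(A hedged sketch of what Peter describes, for anyone reading along: an haproxy >= 1.5 backend where the addresses, ports and check user are purely illustrative, not taken from this thread.)

    listen galera
        bind 127.0.0.1:3306
        mode tcp
        option mysql-check user haproxy_check
        # on-marked-down shutdown-sessions kills established TCP sessions
        # when a server is marked down, so clients cannot keep reusing
        # connections to the failed (or formerly active) node.
        server db1 10.0.0.1:3306 check on-marked-down shutdown-sessions
        server db2 10.0.0.2:3306 check backup on-marked-down shutdown-sessions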
> > >>>
> > >>> On Sat, Sep 20, 2014 at 12:11 AM, Sławek Kapłoński <sla...@kaplonski.pl> wrote:
> > >>>> Hello,
> > >>>>
> > >>>> New questions below.
> > >>>>
> > >>>> ---
> > >>>> Best regards
> > >>>> Sławek Kapłoński
> > >>>> sla...@kaplonski.pl
> > >>>>
> > >>>> On Thursday, September 18, 2014 at 09:45:21, Clint Byrum wrote:
> > >>>>> Excerpts from Sławek Kapłoński's message of 2014-09-18 09:29:27 -0700:
> > >>>>>> Hello,
> > >>>>>>
> > >>>>>> Is anyone here using OpenStack with MySQL Galera and haproxy?
> > >>>>>> Have you had any problems with that?
> > >>>>>> Today I installed such an HA infrastructure for the database (two
> > >>>>>> mysql servers in a galera cluster, and haproxy on the controller
> > >>>>>> and neutron node; this haproxy connects to one of the galera
> > >>>>>> servers with a round-robin algorithm). Generally everything is
> > >>>>>> working fine, but I have a few problems:
> > >>>>>> 1. I have a lot of messages like:
> > >>>>>> WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2006, 'MySQL server has gone away')
> > >>>>>> 2. I have (mostly in neutron) many errors like:
> > >>>>>> OperationalError: (OperationalError) (2013, 'Lost connection to MySQL server during query') 'UPDATE ml2_port_bindings SET vif_type=%s, driver=%s, segment=%s WHERE ml2_port_bindings.port_id =
> > >>>>>
> > >>>>> 1 and 2 look like timeout issues. Check haproxy's timeouts. They
> > >>>>> need to be just a little longer than MySQL's connection timeouts.
> > >>>>
> > >>>> After I made an ACTIVE/PASSIVE cluster and changed sql_idle_timeout
> > >>>> in neutron and nova, problem 1 looks solved. Unfortunately, I found
> > >>>> that when I delete a port from neutron I still sometimes get errors
> > >>>> like in 2. I haven't checked the nova logs closely yet, so I'm not
> > >>>> sure whether it happens only in neutron or in both.
> > >>>> Do you maybe know why it happens in neutron? It didn't happen when
> > >>>> I had a single mysql node without haproxy and galera, so I suppose
> > >>>> that haproxy or galera is responsible for that problem :/
> > >>>>
> > >>>>>> 3. Also errors:
> > >>>>>> StaleDataError: UPDATE statement on table 'ports' expected to update 1 row(s); 0 were matched.
> > >>>>>> 4. And errors:
> > >>>>>> DBDeadlock: (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE ipavailabilityranges SET first_ip=%s WHERE ipavailabilityranges.allocation_pool_id =
> > >>>>>
> > >>>>> 3 and 4 are a known issue. Our code doesn't always retry
> > >>>>> transactions, which is required to use Galera ACTIVE/ACTIVE.
> > >>>>> Basically, that doesn't work.
> > >>>>>
> > >>>>> You can use ACTIVE/PASSIVE, and even do vertical partitioning
> > >>>>> where one of the servers is ACTIVE for Nova but another one is
> > >>>>> ACTIVE for Neutron. But AFAIK, ACTIVE/ACTIVE isn't being tested,
> > >>>>> and the work hasn't been done to make the concurrent transactions
> > >>>>> work properly.
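(For reference, the per-transaction retry Clint describes would look roughly like the sketch below. This is illustrative only - the names are made up and it is not Neutron's actual code; the hard part is that the whole transaction, not just the last statement, has to be re-run from the beginning.)

    import time

    from sqlalchemy.exc import OperationalError

    def retry_on_deadlock(fn, retries=3, delay=0.1):
        """Re-run fn when MySQL/Galera aborts it with a deadlock (1213)."""
        def wrapper(*args, **kwargs):
            for attempt in range(retries):
                try:
                    return fn(*args, **kwargs)
                except OperationalError as exc:
                    # 1213 = 'Deadlock found when trying to get lock;
                    # try restarting transaction'
                    if '1213' not in str(exc.orig) or attempt == retries - 1:
                        raise
                    time.sleep(delay * (attempt + 1))  # back off, then retry
        return wrapper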
---
Best regards
Sławek Kapłoński
sla...@kaplonski.pl
_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators