Hi Peter, Sławek, answers inline...
On 09/22/2014 08:12 AM, Peter Boros wrote:
Hi,
StaleDataError is not raised by MySQL, but rather by SQLAlchemy. After a
quick look, it seems SQLAlchemy raises it when an UPDATE affected a
different number of rows than it expected. I am not sure what the
expectation is based on; perhaps somebody can chime in and we can put
this together. What transaction isolation level are you running on?
The transaction isolation level is REPEATABLE_READ, unless Sławek has
changed the defaults (unlikely).
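For reference, here is a minimal, self-contained sketch of where the ORM raises
this (SQLAlchemy 1.4+ style, an in-memory SQLite database and a toy "ports"
model, so not Neutron code): during flush, when an UPDATE matches fewer rows
than the unit of work expected, for example because another transaction already
deleted the row.

    from sqlalchemy import Column, Integer, String, create_engine, text
    from sqlalchemy.orm import Session, declarative_base
    from sqlalchemy.orm.exc import StaleDataError

    Base = declarative_base()

    class Port(Base):
        __tablename__ = "ports"
        id = Column(Integer, primary_key=True)
        status = Column(String(16))

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)

    session = Session(engine)
    session.add(Port(id=1, status="DOWN"))
    session.commit()

    port = session.get(Port, 1)                               # ORM now tracks this row
    session.execute(text("DELETE FROM ports WHERE id = 1"))   # row vanishes behind the ORM's back
    port.status = "ACTIVE"

    try:
        session.commit()   # flush emits UPDATE ... WHERE id = 1, which matches 0 rows
    except StaleDataError as exc:
        print(exc)   # UPDATE statement on table 'ports' expected to update 1 row(s); 0 were matched.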
For the timeout setting in neutron: that's a good way to approach it
too; you can even be more aggressive and set it to a few seconds. In
MySQL, making connections is very cheap (at least compared to other
databases), so an idle timeout of a few seconds for a connection is
typical.
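If I remember correctly, that idle timeout option ends up being passed to
SQLAlchemy's pool_recycle, so the equivalent outside of the openstack config
machinery would look something like the sketch below (the DSN is a placeholder,
point it at your haproxy frontend or Galera VIP):

    from sqlalchemy import create_engine, text

    engine = create_engine(
        "mysql://neutron:secret@127.0.0.1/neutron",   # placeholder DSN
        pool_recycle=5,    # re-open pooled connections idle for more than a few seconds
        pool_size=10,
    )

    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))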
On Mon, Sep 22, 2014 at 12:35 PM, Sławek Kapłoński <sla...@kaplonski.pl> wrote:
Hello,
Thanks for your explanations. I thought so, and I have now decreased
"idle_connection_timeout" in neutron and nova. Now, when the master server comes back
to the cluster, in less than one minute all connections are made to this
master node again, because the old connections which were made to the backup node are closed.
So for now it looks almost perfect, but when I now test the cluster (with the master
node active and all connections established to this node), in neutron I still
sometimes see errors like:
StaleDataError: UPDATE statement on table 'ports' expected to update 1 row(s); 0 were matched.
and also today I found errors like:
2014-09-22 11:38:05.715 11474 INFO sqlalchemy.engine.base.Engine [-] ROLLBACK
2014-09-22 11:38:05.784 11474 ERROR neutron.openstack.common.db.sqlalchemy.session [-] DB exception wrapped.
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session Traceback (most recent call last):
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/neutron/openstack/common/db/sqlalchemy/session.py", line 524, in _wrap
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     return f(*args, **kwargs)
From looking up the code, it looks like you are using Havana [1]. The
code in the master branch of Neutron now uses oslo.db, not
neutron.openstack.common.db, so this issue may have been resolved in
later versions of Neutron.
[1]
https://github.com/openstack/neutron/blob/stable/havana/neutron/openstack/common/db/sqlalchemy/session.py#L524
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/neutron/openstack/common/db/sqlalchemy/session.py", line 718, in flush
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     return super(Session, self).flush(*args, **kwargs)
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1818, in flush
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     self._flush(objects)
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1936, in _flush
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     transaction.rollback(_capture_exception=True)
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line 58, in __exit__
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     compat.reraise(exc_type, exc_value, exc_tb)
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1900, in _flush
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     flush_context.execute()
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line 372, in execute
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     rec.execute(self)
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line 525, in execute
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     uow
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 64, in save_obj
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     table, insert)
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 541, in _emit_insert_statements
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     execute(statement, multiparams)
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 662, in execute
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     params)
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 761, in _execute_clauseelement
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     compiled_sql, distilled_params
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 874, in _execute_context
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     context)
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1027, in _handle_dbapi_exception
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     util.reraise(*exc_info)
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 856, in _execute_context
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     context)
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/sqlalchemy/connectors/mysqldb.py", line 60, in do_executemany
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     rowcount = cursor.executemany(statement, parameters)
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session   File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 206, in executemany
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session     r = r + self.execute(query, a)
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
Hmm, this is just bad coding in the MySQLdb driver, frankly. It is
assuming a call to Cursor.execute() will return an integer, but it can
return None in some circumstances. See code here:
http://sourceforge.net/p/mysql-python/code/ci/8041cc6df636b9c42d52e01b727aa98b43f3632c/tree/MySQLdb/MySQLdb/cursors.py
Note that MySQLdb1 (the above, which is what is packaged in your Linux
distro I believe) is old and buggy. The maintainer has moved on to
MySQLdb2, which has a different call interface in this part of the code.
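To make the failure mode concrete, here is a stripped-down illustration (not
the actual MySQLdb source) of the pattern in that executemany() fallback loop,
plus a defensive variant that tolerates a None row count:

    def buggy_executemany(execute, query, parameter_sets):
        # mirrors the shape of the MySQLdb1 fallback loop: it assumes that
        # execute() always returns an integer row count
        r = 0
        for params in parameter_sets:
            r = r + execute(query, params)   # TypeError if execute() returns None
        return r

    def safer_executemany(execute, query, parameter_sets):
        # same loop, but a None row count no longer blows up the addition
        total = 0
        for params in parameter_sets:
            count = execute(query, params)
            total += count if count is not None else 0
        return total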
2014-09-22 11:38:05.784 11474 TRACE neutron.openstack.common.db.sqlalchemy.session
And I have to investigate why it is happening, because with a single mysql
server I have no such errors :/
Not sure, frankly. The code is executing many INSERT or UPDATE
statements in a single block. The MySQL connection is clearly getting
borked on one of those attempts, which results in the traceback you see above.
best,
-jay
---
Best regards
Sławek Kapłoński
sla...@kaplonski.pl
On Monday, 22 September 2014 at 11:18:27, Peter Boros wrote:
Hi,
Let me answer this and one of your previous questions in one because
they are related.
Earlier you wrote:
I made such modifications today in my infra and generally it looks
better now. I don't see deadlocks. But I have one more problem with
that: generally it works fine when the main node is active, but when
this node is down, haproxy connects to one of the backup nodes. All is
still ok, but the problem is when the main node is up again: all new
connections are made to the main node, but the active connections which were
made to the backup node are still open, so neutron (or nova) are using
connections to two servers and then there are problems with deadlocks
again.
Do you know how to prevent such a situation?
This is because of how haproxy works. Haproxy's load balancing is at the TCP
level; once the TCP connection is established, haproxy has nothing more to
do with it. If the MySQL application (neutron in this case) uses
persistent connections, haproxy doesn't make an extra decision upon
failover, because the connection is already established. This can be
mitigated by using haproxy 1.5 and defining the backend servers with
on-marked-down shutdown-sessions, which will kill the connections at the
TCP level on the formerly active node. Or, in the case of a graceful
failover, include killing connections in the failover script on the
formerly active node. The application in this case will get error 1 or 2
that you described.
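As an illustration, a backend along these lines is what I mean (server names,
addresses and the check method are placeholders, adjust them to your setup);
the on-marked-down shutdown-sessions keyword is what tears down the already
established TCP sessions when a node is marked down:

    listen mysql-galera
        bind 0.0.0.0:3306
        mode tcp
        balance leastconn
        option mysql-check user haproxy
        server db-node1 192.168.0.11:3306 check on-marked-down shutdown-sessions
        server db-node2 192.168.0.12:3306 check backup on-marked-down shutdown-sessions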
From your description, errors 1 and 2 are related to killed
connections. Case 1 (MySQL server has gone away) happens when the
connection was killed (but not at the MySQL protocol level) while it
was idle, and the application is attempting to re-use it. In this case
the correct behaviour is to re-establish the connection. Error 2
is the same thing, but the connection was actually doing something when
it was killed; reconnecting and retrying is the correct behaviour. These
errors are not avoidable if the node dies non-gracefully. A server
can, for example, lose power while executing a transaction; in this case
the transaction will be aborted, and the application will get one of
the errors described above. The application has to know that the data
is not written, since it either didn't commit or the database didn't
acknowledge the commit.
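As a rough sketch of that reconnect-and-retry behaviour (illustrative only, not
oslo.db or Neutron code), using raw MySQLdb and treating 2006/2013 as the
retriable error codes:

    import MySQLdb
    from MySQLdb import OperationalError

    RETRIABLE = (2006, 2013)   # server has gone away / lost connection during query

    def run_with_retry(connect, sql, params=None, attempts=3):
        """connect is any callable that returns a fresh MySQLdb connection."""
        for attempt in range(attempts):
            conn = connect()
            try:
                cur = conn.cursor()
                cur.execute(sql, params)
                conn.commit()              # data is only durable once the commit is acknowledged
                return cur.rowcount
            except OperationalError as exc:
                try:
                    conn.close()
                except Exception:
                    pass
                if exc.args[0] in RETRIABLE and attempt < attempts - 1:
                    continue               # connection was killed; re-establish and retry
                raise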
On Sat, Sep 20, 2014 at 12:11 AM, Sławek Kapłoński <sla...@kaplonski.pl>
wrote:
Hello,
New questions below
---
Best regards
Sławek Kapłoński
sla...@kaplonski.pl
On Thursday, 18 September 2014 at 09:45:21, Clint Byrum wrote:
Excerpts from Sławek Kapłoński's message of 2014-09-18 09:29:27 -0700:
Hello,
Is anyone here using openstack with mysql galera and haproxy? Have you
had any problems with that?
Today I installed such an HA infrastructure for the database (two mysql servers in
a galera cluster, and haproxy on the controller and neutron node; this haproxy
connects to one of the galera servers with a round robin algorithm).
Generally everything is working fine, but I have a few problems:
1. I have a lot of messages like:
WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql server has gone away: (2006, 'MySQL server has gone away')
2. I have (mostly in neutron) many errors like:
OperationalError: (OperationalError) (2013, 'Lost connection to MySQL server during query') 'UPDATE ml2_port_bindings SET vif_type=%s, driver=%s, segment=%s WHERE ml2_port_bindings.port_id =
1 and 2 look like timeout issues. Check haproxy's timeouts. They need
to be just a little longer than MySQL's connection timeouts.
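For example (hypothetical values, the only point is the relationship between
the two settings): if MySQL's wait_timeout were 3600 seconds, haproxy's idle
timeouts would sit slightly above it, so that MySQL, not haproxy, is the one
closing idle connections:

    defaults
        mode tcp
        timeout connect 10s
        timeout client  3630s
        timeout server  3630s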
After I made an ACTIVE/PASSIVE cluster and changed sql_idle_timeout in neutron
and nova, problem 1 looks to be solved. Unfortunately I found that when
I'm deleting a port from neutron I still sometimes get errors like in
2. I haven't checked the nova logs thoroughly yet, so I'm not sure whether it is
only in neutron or in both.
Do you maybe know why it happens in neutron? It didn't happen when I had a
single mysql node without haproxy and galera, so I suppose that haproxy or
galera is responsible for that problem :/
3. Also errors:
StaleDataError: UPDATE statement on table 'ports' expected to update 1 row(s); 0 were matched.
4. and errors:
DBDeadlock: (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE ipavailabilityranges SET first_ip=%s WHERE ipavailabilityranges.allocation_pool_id =
3 and 4 are a known issue. Our code doesn't always retry transactions,
which is required to use Galera ACTIVE/ACTIVE. Basically, that doesn't
work.
You can use ACTIVE/PASSIVE, and even do vertical partitioning where
one of the servers is ACTIVE for Nova, but another one is ACTIVE for
Neutron. But AFAIK, ACTIVE/ACTIVE isn't being tested and the work hasn't
been done to make the concurrent transactions work properly.
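Roughly, "retrying transactions" means a wrapper along these lines (an
illustrative sketch, not the actual OpenStack code), which restarts the whole
transaction when MySQL reports error 1213:

    import time
    import MySQLdb

    DEADLOCK = 1213   # 'Deadlock found when trying to get lock; try restarting transaction'

    def retry_on_deadlock(txn, attempts=5, delay=0.1):
        """txn is a callable that opens, runs and commits one whole transaction."""
        for attempt in range(attempts):
            try:
                return txn()
            except MySQLdb.OperationalError as exc:
                if exc.args[0] != DEADLOCK or attempt == attempts - 1:
                    raise
                time.sleep(delay * (attempt + 1))   # brief backoff before restarting the transaction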
_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators