Still getting the NoHostAvailable error with more hosts, just less frequently. I created a JIRA issue on the Python cassandra-driver tracker: https://datastax-oss.atlassian.net/browse/PYTHON-891
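The three-step pool hypothesis quoted below can be illustrated with a toy model. This is a hypothetical sketch, not the cassandra-driver's pool code: `ToyPool`, `second_query_result` and the stand-in `NoHostAvailable` class are invented for illustration, the addresses 10.1.2.4 and 10.1.2.5 are made up, and unlike the real driver the model never schedules a reconnection to a dropped host.

```python
class NoHostAvailable(Exception):
    """Stand-in for cassandra.cluster.NoHostAvailable; not the driver's class."""


class ToyPool:
    """Toy model of the hypothesis: a host whose query times out is dropped.

    Unlike the real driver, this model never reconnects a dropped host.
    """

    def __init__(self, hosts):
        self.hosts = list(hosts)

    def execute(self, query, timed_out_hosts=frozenset()):
        if not self.hosts:
            # Step 3: nothing is left in the pool to send the query to.
            raise NoHostAvailable("Unable to complete the operation against any hosts")
        host = self.hosts[0]
        if host in timed_out_hosts:
            # Step 2: the timeout removes that host's connection from the pool.
            self.hosts.remove(host)
            raise TimeoutError(f"{query!r} timed out on {host}")
        return f"{query} ok on {host}"


def second_query_result(hosts, slow_host):
    """Run one query that times out on slow_host, then run a second query."""
    pool = ToyPool(hosts)
    try:
        pool.execute("SELECT 1", timed_out_hosts={slow_host})
    except TimeoutError:
        pass
    try:
        return pool.execute("SELECT 1")
    except NoHostAvailable:
        return "NoHostAvailable"


print(second_query_result(["10.1.2.3"], "10.1.2.3"))  # prints: NoHostAvailable
print(second_query_result(["10.1.2.3", "10.1.2.4", "10.1.2.5"], "10.1.2.3"))  # prints: SELECT 1 ok on 10.1.2.4
```

In the model, a single timeout empties a one-host pool, while with three hosts the second query still has somewhere to go. That is consistent with the error becoming rarer, but not disappearing, after adding nodes: the pool still empties if every host times out before any reconnects.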
On Mon, Jan 1, 2018 at 8:43 PM, Alan Hamlett <alan.haml...@gmail.com> wrote:

> Adding more nodes to the cluster fixed the error. Looks like a bug in the
> python-driver connection pool:
>
> 1. The connection pool only has one host
> 2. A query times out, causing that connection to be removed from the pool
> 3. Another query executes, but there are no hosts in the pool
>
> On Mon, Jan 1, 2018 at 12:21 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
>> Well, the python driver you reference is a third-party driver, because the
>> project doesn’t ship official drivers. You may have better luck looking for
>> a datastax driver support forum, or wait until after the holiday for more
>> people to be checking email.
>>
>> --
>> Jeff Jirsa
>>
>> On Jan 1, 2018, at 12:14 PM, Alan Hamlett <alan.haml...@gmail.com> wrote:
>>
>> Still getting the cassandra.cluster.NoHostAvailable error periodically
>> from uWSGI hosts. Setting up the connection with postfork:
>> https://github.com/alanhamlett/flask-cqlalchemy/blob/653ed3298af7dd617a972e9f87437f6e53f741b9/flask_cqlalchemy/__init__.py#L56
>>
>> Lazy connection is False, Retry connection is True. Could this be a bug
>> in cassandra-driver's connection pooling?
>>
>> P.S. Blocking a web app when a connection isn't available (default non-lazy
>> connect) is really bad. With a web app you want requests that don't depend
>> on Cassandra to complete, but cassandra-driver blocks all requests when
>> there's no Cassandra connection, even if it's not needed for the current web
>> app's request. This design decision gives me very low confidence in the
>> Python cassandra-driver.
>>
>> On Sun, Dec 31, 2017 at 2:34 PM, Alan Hamlett <alan.haml...@gmail.com> wrote:
>>
>>> Thanks for the reply; I think it's related. However, after using a fork
>>> of Flask-CQLAlchemy with postfork I'm still getting the NoHostAvailable
>>> error once per 4k requests.
>>> One strange thing is that the number of errors doesn't increase with the
>>> number of requests: some uWSGI clients with ~20k requests over the same
>>> time period have an error rate of once per 20k requests. Both uWSGI hosts
>>> have the same number of worker processes.
>>>
>>> *Flask-CQLAlchemy Fork with Patch:*
>>>
>>> https://github.com/alanhamlett/flask-cqlalchemy/tree/a7e5c7c7cf0c51a19be98791dd4c47b72b97d9be
>>>
>>> *Error Traceback seen after patch applied:*
>>>
>>> Failed to create connection pool for new host 10.1.2.3:
>>> Traceback (most recent call last):
>>>   File "cassandra/cluster.py", line 2452, in cassandra.cluster.Session.add_or_renew_pool.run_add_or_renew_pool
>>>   File "cassandra/pool.py", line 332, in cassandra.pool.HostConnection.__init__
>>>   File "cassandra/cluster.py", line 1195, in cassandra.cluster.Cluster.connection_factory
>>>   File "cassandra/connection.py", line 341, in cassandra.connection.Connection.factory
>>> cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None
>>> Traceback (most recent call last):
>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1982, in wsgi_app
>>>     response = self.full_dispatch_request()
>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1614, in full_dispatch_request
>>>     rv = self.handle_user_exception(e)
>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1517, in handle_user_exception
>>>     reraise(exc_type, exc_value, tb)
>>>   File "./venv/lib/python3.4/site-packages/flask/_compat.py", line 33, in reraise
>>>     raise value
>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1612, in full_dispatch_request
>>>     rv = self.dispatch_request()
>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1598, in dispatch_request
>>>     return self.view_functions[rule.endpoint](**req.view_args)
>>>   File "./app/api_utils.py", line 876, in get_durations
>>>     use_cassandra=use_cassandra,
>>>   File "./venv/lib/python3.4/site-packages/datadog/dogstatsd/context.py", line 53, in wrapped
>>>     return func(*args, **kwargs)
>>>   File "./app/api_utils.py", line 1339, in heartbeats_to_durations
>>>     for heartbeat in heartbeats:
>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 512, in __iter__
>>>     self._execute_query()
>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 469, in _execute_query
>>>     self._result_generator = (i for i in self._execute(self._select_query()))
>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 401, in _execute
>>>     result = _execute_statement(self.model, statement, self._consistency, self._timeout, connection=connection)
>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1505, in _execute_statement
>>>     return conn.execute(s, params, timeout=timeout, connection=connection)
>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/connection.py", line 341, in execute
>>>     result = conn.session.execute(query, params, timeout=timeout)
>>>   File "cassandra/cluster.py", line 2122, in cassandra.cluster.Session.execute
>>>   File "cassandra/cluster.py", line 3982, in cassandra.cluster.ResponseFuture.result
>>> cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {})
>>>
>>> On Sun, Dec 31, 2017 at 9:04 AM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>>> uWSGI forks, and the driver / cqlalchemy may need to reconnect or
>>>> otherwise fix the state after each fork. You could try to prove this is
>>>> the cause by checking uWSGI logs or ps for indication that a worker process
>>>> has exited/been recycled.
>>>> If you think it may be related to this, check out the
>>>> @postfork decorator.
>>>>
>>>> --
>>>> Jeff Jirsa
>>>>
>>>> On Dec 31, 2017, at 8:52 AM, Alan Hamlett <alan.haml...@gmail.com> wrote:
>>>>
>>>> More info: The NoHostAvailable error is happening at random times on
>>>> each client host, so it's probably a client error. If the Cassandra cluster
>>>> were really offline, then all client hosts would report the error at the same
>>>> time instead of at different random times. The NoHostAvailable error occurs
>>>> about once every 30 minutes, so most requests call Model.create() without
>>>> the error.
>>>>
>>>> On Sun, Dec 31, 2017 at 1:07 AM, Alan Hamlett <alan.haml...@gmail.com> wrote:
>>>>
>>>>> I'm seeing tracebacks in my Python Flask app when creating rows:
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "/opt/app/current/app/api.py", line 1174, in consume_heartbeat
>>>>>     Heartbeat.create(**form_data)
>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/models.py", line 672, in create
>>>>>     return cls.objects.create(**kwargs)
>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 977, in create
>>>>>     .using(connection=self._connection) \
>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/models.py", line 738, in save
>>>>>     if_exists=self._if_exists).save()
>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1476, in save
>>>>>     self._execute(insert)
>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1351, in _execute
>>>>>     results = _execute_statement(self.model, statement, self._consistency, self._timeout, connection=connection)
>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1505, in _execute_statement
>>>>>     return conn.execute(s, params, timeout=timeout, connection=connection)
>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/connection.py", line 341, in execute
>>>>>     result = conn.session.execute(query, params, timeout=timeout)
>>>>>   File "cassandra/cluster.py", line 2122, in cassandra.cluster.Session.execute
>>>>>   File "cassandra/cluster.py", line 3982, in cassandra.cluster.ResponseFuture.result
>>>>> cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {})
>>>>>
>>>>> I'm using the cassandra-driver client library 3.12.0 via
>>>>> Flask-CQLAlchemy 1.2.0 (https://github.com/thegeorgeous/flask-cqlalchemy)
>>>>> with uWSGI (https://github.com/unbit/uwsgi).
>>>>>
>>>>> cassandra.cqlengine.connection.setup is being passed
>>>>> lazy_connect=True and retry_connect=True, because
>>>>> lazy_connect=False causes requests to the Flask app to time out for some
>>>>> reason.
>>>>>
>>>>> Also seeing these errors in my uWSGI log file:
>>>>>
>>>>> [control connection] Error connecting to 10.1.2.3:
>>>>> Traceback (most recent call last):
>>>>>   File "cassandra/cluster.py", line 2781, in cassandra.cluster.ControlConnection._reconnect_internal
>>>>>   File "cassandra/cluster.py", line 2803, in cassandra.cluster.ControlConnection._try_connect
>>>>>   File "cassandra/cluster.py", line 1195, in cassandra.cluster.Cluster.connection_factory
>>>>>   File "cassandra/connection.py", line 341, in cassandra.connection.Connection.factory
>>>>> cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None
>>>>>
>>>>> What's causing these connection and timeout errors? Something related
>>>>> to Flask-CQLAlchemy?
>>
>> --
>> Alan Hamlett
>> ahamlett.com
>
> --
> Alan Hamlett
> ahamlett.com

--
Alan Hamlett
ahamlett.com
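Jeff's suggestion to rebuild the driver state after each uWSGI fork, combined with the lazy_connect=True / retry_connect=True settings from the original message, can be sketched as a postfork hook. This is a minimal sketch, not the code in the linked Flask-CQLAlchemy fork: the `CASSANDRA_HOSTS` / `CASSANDRA_KEYSPACE` environment variables and their defaults are hypothetical, reading module-level `cluster` / `session` attributes from `cassandra.cqlengine.connection` is an assumption about the driver version (hence the `getattr` fallback), and the guarded imports only exist so the module still loads outside uWSGI.

```python
import os

try:
    # Only importable inside a running uWSGI process.
    from uwsgidecorators import postfork
except ImportError:
    def postfork(func):
        # Outside uWSGI there is no fork hook, so leave the function unchanged.
        return func

try:
    from cassandra.cqlengine import connection
except ImportError:  # cassandra-driver is not installed
    connection = None

# Hypothetical settings: replace with your own contact points and keyspace.
CASSANDRA_HOSTS = os.environ.get("CASSANDRA_HOSTS", "127.0.0.1").split(",")
CASSANDRA_KEYSPACE = os.environ.get("CASSANDRA_KEYSPACE", "example_keyspace")


@postfork
def cassandra_init():
    """Give each uWSGI worker its own cluster and session after the fork."""
    if connection is None:
        return False
    # Assumption: the default connection is exposed as module-level `cluster`
    # and `session` attributes (None before any connection has been made).
    # Anything created before the fork belongs to the parent process, so shut
    # it down instead of sharing its sockets with the parent.
    cluster = getattr(connection, "cluster", None)
    if cluster is not None:
        cluster.shutdown()
    session = getattr(connection, "session", None)
    if session is not None:
        session.shutdown()
    connection.setup(
        CASSANDRA_HOSTS,
        CASSANDRA_KEYSPACE,
        lazy_connect=True,   # defer connecting until the first query needs it
        retry_connect=True,  # retry on a later query if the first connect fails
    )
    return True
```

With lazy_connect=True, a worker that never runs a Cassandra query never opens a connection, which is the behavior the P.S. asks for. A hook like this addresses shared state after fork only; the thread reports that NoHostAvailable still occurred after switching to postfork.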