Update: Still getting the NoHostAvailable periodically in client logs. Also
seeing these INFO and WARN messages in /var/log/cassandra/system.log:

INFO [epollEventLoopGroup-2-5] 2018-01-06 01:39:02,412 Message.java:623 - Unexpected exception during request; channel = [id: 0xae99b597, L:/10.1.2.3:9042 - R:/10.1.2.12:54720]
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
    at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]

WARN [ReadStage-1] 2018-01-06 01:39:24,350 ReadCommand.java:533 - Read 344 live rows and 2074 tombstone cells for query SELECT * FROM keyspace.heartbeat WHERE user_id = 66b6796d-eb84-4bb9-b9d2-8dc882f4c6ac AND time >= 1515225599 AND time <= 1515139200 ORDER BY (time ASC) LIMIT 5000 (see tombstone_warn_threshold)

On Tue, Jan 2, 2018 at 8:13 AM, Alan Hamlett <alan.haml...@gmail.com> wrote:

> Still getting the NoHostAvailable with more hosts, just occurring less
> frequently. Created a JIRA issue on the Python cassandra-driver tracker:
> https://datastax-oss.atlassian.net/browse/PYTHON-891
>
> On Mon, Jan 1, 2018 at 8:43 PM, Alan Hamlett <alan.haml...@gmail.com>
> wrote:
>
>> Adding more nodes to the cluster fixed the error. Looks like a bug in
>> python-driver connection pool:
>>
>> 1. The connection pool only has one host
>> 2. A query times out, causing that connection to be removed from the pool
>> 3. Another query executes, but there are no hosts in the pool
>>
>> On Mon, Jan 1, 2018 at 12:21 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> Well the python driver you reference is a third party driver, because
>>> the project doesn’t ship official drivers. You may have better luck looking
>>> for a datastax driver support forum, or wait until after the holiday for
>>> more people to be checking email.
>>>
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Jan 1, 2018, at 12:14 PM, Alan Hamlett <alan.haml...@gmail.com>
>>> wrote:
>>>
>>> Still getting the cassandra.cluster.NoHostAvailable error periodically
>>> from uWSGI hosts.
>>> Setting up the connection with postfork:
>>> https://github.com/alanhamlett/flask-cqlalchemy/blob/653ed3298af7dd617a972e9f87437f6e53f741b9/flask_cqlalchemy/__init__.py#L56
>>>
>>> Lazy connection is False, Retry connection is True. Could this be a bug
>>> in cassandra-driver's connection pooling?
>>>
>>> P.S. Blocking a web app when connection isn't available (default
>>> non-lazy connect) is really bad. With a web app you want requests that
>>> don't depend on Cassandra to complete, but cassandra-driver blocks all
>>> requests when there's no Cassandra connection even if it's not needed for
>>> the current web app's request. This design decision gives me very low
>>> confidence in the Python cassandra-driver.
>>>
>>> On Sun, Dec 31, 2017 at 2:34 PM, Alan Hamlett <alan.haml...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for the reply, I think it's related. However, after using a fork
>>>> of Flask-CQLAlchemy with postfork I'm still getting the NoHostAvailable
>>>> error once per 4k requests. One strange thing is the error rate doesn't
>>>> increase with the number of requests, since some uWSGI clients with ~20k
>>>> requests over the same time period have an error rate of once per 20k
>>>> requests. Both uWSGI hosts have the same number of worker processes.
>>>>
>>>> *Flask-CQLAlchemy Fork with Patch:*
>>>>
>>>> https://github.com/alanhamlett/flask-cqlalchemy/tree/a7e5c7c7cf0c51a19be98791dd4c47b72b97d9be
>>>>
>>>> *Error Traceback seen after patch applied:*
>>>>
>>>> Failed to create connection pool for new host 10.1.2.3:
>>>> Traceback (most recent call last):
>>>>   File "cassandra/cluster.py", line 2452, in cassandra.cluster.Session.add_or_renew_pool.run_add_or_renew_pool
>>>>   File "cassandra/pool.py", line 332, in cassandra.pool.HostConnection.__init__
>>>>   File "cassandra/cluster.py", line 1195, in cassandra.cluster.Cluster.connection_factory
>>>>   File "cassandra/connection.py", line 341, in cassandra.connection.Connection.factory
>>>> cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None
>>>> Traceback (most recent call last):
>>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1982, in wsgi_app
>>>>     response = self.full_dispatch_request()
>>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1614, in full_dispatch_request
>>>>     rv = self.handle_user_exception(e)
>>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1517, in handle_user_exception
>>>>     reraise(exc_type, exc_value, tb)
>>>>   File "./venv/lib/python3.4/site-packages/flask/_compat.py", line 33, in reraise
>>>>     raise value
>>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1612, in full_dispatch_request
>>>>     rv = self.dispatch_request()
>>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1598, in dispatch_request
>>>>     return self.view_functions[rule.endpoint](**req.view_args)
>>>>   File "./app/api_utils.py", line 876, in get_durations
>>>>     use_cassandra=use_cassandra,
>>>>   File "./venv/lib/python3.4/site-packages/datadog/dogstatsd/context.py", line 53, in wrapped
>>>>     return func(*args, **kwargs)
>>>>   File "./app/api_utils.py", line 1339, in heartbeats_to_durations
>>>>     for heartbeat in heartbeats:
>>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 512, in __iter__
>>>>     self._execute_query()
>>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 469, in _execute_query
>>>>     self._result_generator = (i for i in self._execute(self._select_query()))
>>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 401, in _execute
>>>>     result = _execute_statement(self.model, statement, self._consistency, self._timeout, connection=connection)
>>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1505, in _execute_statement
>>>>     return conn.execute(s, params, timeout=timeout, connection=connection)
>>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/connection.py", line 341, in execute
>>>>     result = conn.session.execute(query, params, timeout=timeout)
>>>>   File "cassandra/cluster.py", line 2122, in cassandra.cluster.Session.execute
>>>>   File "cassandra/cluster.py", line 3982, in cassandra.cluster.ResponseFuture.result
>>>> cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {})
>>>>
>>>> On Sun, Dec 31, 2017 at 9:04 AM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>>
>>>>> uWSGI forks and the driver / cqlalchemy may need to reconnect or
>>>>> otherwise fix the state after each fork - you could try to prove this is
>>>>> the cause by checking uWSGI logs or ps for indication that a worker
>>>>> process has exited/been recycled. If you think it may be related to
>>>>> this, check out the @postfork decorator
>>>>>
>>>>>
>>>>> --
>>>>> Jeff Jirsa
>>>>>
>>>>>
>>>>> On Dec 31, 2017, at 8:52 AM, Alan Hamlett <alan.haml...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> More info: The NoHostAvailable error is happening at random times on
>>>>> each client host, so it's probably a client error. If the Cassandra
>>>>> cluster was really offline then all client hosts would report the error
>>>>> at the same time instead of different random times. The NoHostAvailable
>>>>> error occurs about once every 30 minutes, so most requests call
>>>>> Model.create() without the error.
>>>>>
>>>>> On Sun, Dec 31, 2017 at 1:07 AM, Alan Hamlett <alan.haml...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I'm seeing tracebacks in my Python Flask app when creating rows:
>>>>>>
>>>>>> Traceback (most recent call last):
>>>>>>   File "/opt/app/current/app/api.py", line 1174, in consume_heartbeat
>>>>>>     Heartbeat.create(**form_data)
>>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/models.py", line 672, in create
>>>>>>     return cls.objects.create(**kwargs)
>>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 977, in create
>>>>>>     .using(connection=self._connection) \
>>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/models.py", line 738, in save
>>>>>>     if_exists=self._if_exists).save()
>>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1476, in save
>>>>>>     self._execute(insert)
>>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1351, in _execute
>>>>>>     results = _execute_statement(self.model, statement, self._consistency, self._timeout, connection=connection)
>>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1505, in _execute_statement
>>>>>>     return conn.execute(s, params, timeout=timeout, connection=connection)
>>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/connection.py", line 341, in execute
>>>>>>     result = conn.session.execute(query, params, timeout=timeout)
>>>>>>   File "cassandra/cluster.py", line 2122, in cassandra.cluster.Session.execute
>>>>>>   File "cassandra/cluster.py", line 3982, in cassandra.cluster.ResponseFuture.result
>>>>>> cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {})
>>>>>>
>>>>>> I'm using the cassandra-driver client library 3.12.0 via
>>>>>> Flask-CQLAlchemy 1.2.0 (https://github.com/thegeorgeous/flask-cqlalchemy)
>>>>>> with uWSGI (https://github.com/unbit/uwsgi).
>>>>>>
>>>>>> cassandra.cqlengine.connection.setup is being passed
>>>>>> lazy_connect=True and retry_connect=True because lazy_connect=False
>>>>>> causes requests to the Flask app to time out for some reason.
>>>>>>
>>>>>> Also seeing these errors in my uWSGI log file:
>>>>>>
>>>>>> [control connection] Error connecting to 10.1.2.3:
>>>>>> Traceback (most recent call last):
>>>>>>   File "cassandra/cluster.py", line 2781, in cassandra.cluster.ControlConnection._reconnect_internal
>>>>>>   File "cassandra/cluster.py", line 2803, in cassandra.cluster.ControlConnection._try_connect
>>>>>>   File "cassandra/cluster.py", line 1195, in cassandra.cluster.Cluster.connection_factory
>>>>>>   File "cassandra/connection.py", line 341, in cassandra.connection.Connection.factory
>>>>>> cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None
>>>>>>
>>>>>> What's causing these connection and timeout errors? Something related
>>>>>> to Flask-CQLAlchemy?

--
Alan Hamlett
ahamlett.com
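
For anyone following along, here is a rough sketch of the three-step failure sequence hypothesized in the Jan 1 message (the pool has one host, a query times out and that host is dropped, the next query finds no hosts). It is a toy model in plain Python, not the cassandra-driver's actual pool code, and whether the driver really behaves this way is unconfirmed in this thread. ToyPool, simulate, the local NoHostAvailable stand-in, and the second address 10.1.2.4 are all hypothetical names introduced for illustration.

```python
class NoHostAvailable(Exception):
    """Local stand-in for cassandra.cluster.NoHostAvailable (toy model only)."""


class ToyPool:
    """Hypothetical pool that drops a host whenever a query on it times out."""

    def __init__(self, hosts):
        self.hosts = list(hosts)

    def execute(self, query, timed_out=False):
        # Step 3: a query arrives but the pool has no hosts left.
        if not self.hosts:
            raise NoHostAvailable(
                "Unable to complete the operation against any hosts"
            )
        host = self.hosts[0]
        if timed_out:
            # Step 2: the timeout removes the host from the pool.
            self.hosts.remove(host)
            raise TimeoutError("query timed out on %s" % host)
        return host


def simulate(hosts):
    """Run one timed-out query followed by one normal query.

    Returns the outcome of each query: the host that served it, or the
    name of the exception it raised.
    """
    pool = ToyPool(hosts)
    outcomes = []
    for timed_out in (True, False):
        try:
            outcomes.append(pool.execute("SELECT ...", timed_out=timed_out))
        except (TimeoutError, NoHostAvailable) as exc:
            outcomes.append(type(exc).__name__)
    return outcomes


if __name__ == "__main__":
    print(simulate(["10.1.2.3"]))               # Step 1: single-host pool
    print(simulate(["10.1.2.3", "10.1.2.4"]))   # second host is hypothetical
```

Under this model a single-host pool fails the second query with NoHostAvailable, while a two-host pool still serves it from the remaining host. That would be consistent with the error becoming less frequent, rather than disappearing, after nodes were added.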