Update: Still getting the NoHostAvailable periodically in client logs.

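One possible client-side stopgap while the root cause is unresolved is to retry the failing call a few times with a short backoff. The sketch below is an illustration, not something confirmed to fix the problem in this thread: `call_with_retry` and its parameters are hypothetical names, and the helper is deliberately generic so it runs without the driver installed.

```python
import time


def call_with_retry(func, retry_on, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func(); if it raises an exception listed in retry_on, wait and retry.

    The last exception is re-raised once `attempts` calls have failed.
    `sleep` is injectable so the backoff can be observed or skipped in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

With the driver installed, the exception to pass would be `cassandra.cluster.NoHostAvailable` (the class named in the tracebacks), for example `call_with_retry(lambda: Heartbeat.create(**form_data), NoHostAvailable)`. Retrying like this is only reasonable for operations that are safe to repeat.
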
Also seeing these INFO and WARN messages in /var/log/cassandra/system.log:

INFO  [epollEventLoopGroup-2-5] 2018-01-06 01:39:02,412 Message.java:623 - Unexpected exception during request; channel = [id: 0xae99b597, L:/10.1.2.3:9042 - R:/10.1.2.12:54720]
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
        at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
WARN  [ReadStage-1] 2018-01-06 01:39:24,350 ReadCommand.java:533 - Read 344 live rows and 2074 tombstone cells for query SELECT * FROM keyspace.heartbeat WHERE user_id = 66b6796d-eb84-4bb9-b9d2-8dc882f4c6ac AND time >= 1515225599 AND time <= 1515139200 ORDER BY (time ASC) LIMIT 5000 (see tombstone_warn_threshold)

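To see how often the tombstone warning fires and how bad the ratio is, the WARN lines can be pulled out of system.log with a small script. This is a minimal sketch that assumes each warning sits on a single line in the format shown above; `parse_tombstone_warning` and the regular expression are illustrative names, not part of Cassandra or the driver.

```python
import re

# Matches the ReadCommand tombstone warning in the format logged above.
TOMBSTONE_RE = re.compile(
    r"Read (?P<live>\d+) live rows and (?P<tombstones>\d+) tombstone cells "
    r"for query (?P<query>.+?) \(see tombstone_warn_threshold\)"
)


def parse_tombstone_warning(line):
    """Return live rows, tombstone cells and query text, or None if no match."""
    match = TOMBSTONE_RE.search(line)
    if match is None:
        return None
    return {
        "live": int(match.group("live")),
        "tombstones": int(match.group("tombstones")),
        "query": match.group("query"),
    }


sample = (
    "WARN  [ReadStage-1] 2018-01-06 01:39:24,350 ReadCommand.java:533 - "
    "Read 344 live rows and 2074 tombstone cells for query SELECT * FROM "
    "keyspace.heartbeat WHERE user_id = 66b6796d-eb84-4bb9-b9d2-8dc882f4c6ac "
    "AND time >= 1515225599 AND time <= 1515139200 ORDER BY (time ASC) "
    "LIMIT 5000 (see tombstone_warn_threshold)"
)
parsed = parse_tombstone_warning(sample)
```

For the warning above this yields 344 live rows against 2074 tombstone cells, so roughly six tombstones are scanned per live row returned.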

On Tue, Jan 2, 2018 at 8:13 AM, Alan Hamlett <alan.haml...@gmail.com> wrote:

> Still getting the NoHostAvailable with more hosts, just occurring less
> frequently. Created a JIRA issue on the Python cassandra-driver tracker:
> https://datastax-oss.atlassian.net/browse/PYTHON-891
>
> On Mon, Jan 1, 2018 at 8:43 PM, Alan Hamlett <alan.haml...@gmail.com>
> wrote:
>
>> Adding more nodes to the cluster fixed the error. Looks like a bug in
>> python-driver connection pool:
>>
>> 1. The connection pool only has one host
>> 2. A query times out, causing that connection to be removed from the pool
>> 3. Another query executes, but there are no hosts in the pool
>>
>> On Mon, Jan 1, 2018 at 12:21 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> Well, the Python driver you reference is a third-party driver, because
>>> the project doesn’t ship official drivers. You may have better luck looking
>>> for a DataStax driver support forum, or waiting until after the holiday
>>> for more people to be checking email.
>>>
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Jan 1, 2018, at 12:14 PM, Alan Hamlett <alan.haml...@gmail.com>
>>> wrote:
>>>
>>> Still getting the cassandra.cluster.NoHostAvailable error periodically
>>> from uWSGI hosts. Setting up the connection with postfork:
>>> https://github.com/alanhamlett/flask-cqlalchemy/blob/653ed3298af7dd617a972e9f87437f6e53f741b9/flask_cqlalchemy/__init__.py#L56
>>>
>>> Lazy connection is False, Retry connection is True. Could this be a bug
>>> in cassandra-driver's connection pooling?
>>>
>>> P.S. Blocking a web app when connection isn't available (default
>>> non-lazy connect) is really bad. With a web app you want requests that
>>> don't depend on Cassandra to complete, but cassandra-driver blocks all
>>> requests when there's no Cassandra connection even if it's not needed for
>>> the current web app's request. This design decision gives me very low
>>> confidence in the Python cassandra-driver.
>>>
>>> On Sun, Dec 31, 2017 at 2:34 PM, Alan Hamlett <alan.haml...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for the reply, I think it's related. However, after using a fork
>>>> of Flask-CQLAlchemy with postfork I'm still getting the NoHostAvailable
>>>> error once per 4k requests. One strange thing is the error rate doesn't
>>>> increase with the number of requests, since some uWSGI clients with ~20k
>>>> requests over the same time period have an error rate of once per 20k
>>>> requests. Both uWSGI hosts have the same number of worker processes.
>>>>
>>>> *Flask-CQLAlchemy Fork with Patch:*
>>>>
>>>> https://github.com/alanhamlett/flask-cqlalchemy/tree/a7e5c7c7cf0c51a19be98791dd4c47b72b97d9be
>>>>
>>>> *Error Traceback seen after patch applied:*
>>>>
>>>> Failed to create connection pool for new host 10.1.2.3:
>>>> Traceback (most recent call last):
>>>>   File "cassandra/cluster.py", line 2452, in cassandra.cluster.Session.add_or_renew_pool.run_add_or_renew_pool
>>>>   File "cassandra/pool.py", line 332, in cassandra.pool.HostConnection.__init__
>>>>   File "cassandra/cluster.py", line 1195, in cassandra.cluster.Cluster.connection_factory
>>>>   File "cassandra/connection.py", line 341, in cassandra.connection.Connection.factory
>>>> cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None
>>>> Traceback (most recent call last):
>>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1982, in wsgi_app
>>>>     response = self.full_dispatch_request()
>>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1614, in full_dispatch_request
>>>>     rv = self.handle_user_exception(e)
>>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1517, in handle_user_exception
>>>>     reraise(exc_type, exc_value, tb)
>>>>   File "./venv/lib/python3.4/site-packages/flask/_compat.py", line 33, in reraise
>>>>     raise value
>>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1612, in full_dispatch_request
>>>>     rv = self.dispatch_request()
>>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1598, in dispatch_request
>>>>     return self.view_functions[rule.endpoint](**req.view_args)
>>>>   File "./app/api_utils.py", line 876, in get_durations
>>>>     use_cassandra=use_cassandra,
>>>>   File "./venv/lib/python3.4/site-packages/datadog/dogstatsd/context.py", line 53, in wrapped
>>>>     return func(*args, **kwargs)
>>>>   File "./app/api_utils.py", line 1339, in heartbeats_to_durations
>>>>     for heartbeat in heartbeats:
>>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 512, in __iter__
>>>>     self._execute_query()
>>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 469, in _execute_query
>>>>     self._result_generator = (i for i in self._execute(self._select_query()))
>>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 401, in _execute
>>>>     result = _execute_statement(self.model, statement, self._consistency, self._timeout, connection=connection)
>>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1505, in _execute_statement
>>>>     return conn.execute(s, params, timeout=timeout, connection=connection)
>>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/connection.py", line 341, in execute
>>>>     result = conn.session.execute(query, params, timeout=timeout)
>>>>   File "cassandra/cluster.py", line 2122, in cassandra.cluster.Session.execute
>>>>   File "cassandra/cluster.py", line 3982, in cassandra.cluster.ResponseFuture.result
>>>> cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {})
>>>>
>>>> On Sun, Dec 31, 2017 at 9:04 AM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>>
>>>>> uWSGI forks and the driver / cqlalchemy may need to reconnect or
>>>>> otherwise fix the state after each fork - you could try to prove this is
>>>>> the cause by checking uWSGI logs or ps for indication that a worker
>>>>> process has exited/been recycled. If you think it may be related to this,
>>>>> check out the @postfork decorator.
>>>>>
>>>>>
>>>>> --
>>>>> Jeff Jirsa
>>>>>
>>>>>
>>>>> On Dec 31, 2017, at 8:52 AM, Alan Hamlett <alan.haml...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> More info: The NoHostAvailable error is happening at random times on
>>>>> each client host, so it's probably a client error. If the Cassandra
>>>>> cluster were really offline, all client hosts would report the error at
>>>>> the same time instead of at different random times. The NoHostAvailable
>>>>> error occurs about once every 30 minutes, so most requests call
>>>>> Model.create() without the error.
>>>>>
>>>>> On Sun, Dec 31, 2017 at 1:07 AM, Alan Hamlett <alan.haml...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I'm seeing tracebacks in my Python Flask app when creating rows:
>>>>>>
>>>>>> Traceback (most recent call last):
>>>>>>   File "/opt/app/current/app/api.py", line 1174, in consume_heartbeat
>>>>>>     Heartbeat.create(**form_data)
>>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/models.py", line 672, in create
>>>>>>     return cls.objects.create(**kwargs)
>>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 977, in create
>>>>>>     .using(connection=self._connection) \
>>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/models.py", line 738, in save
>>>>>>     if_exists=self._if_exists).save()
>>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1476, in save
>>>>>>     self._execute(insert)
>>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1351, in _execute
>>>>>>     results = _execute_statement(self.model, statement, self._consistency, self._timeout, connection=connection)
>>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1505, in _execute_statement
>>>>>>     return conn.execute(s, params, timeout=timeout, connection=connection)
>>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/connection.py", line 341, in execute
>>>>>>     result = conn.session.execute(query, params, timeout=timeout)
>>>>>>   File "cassandra/cluster.py", line 2122, in cassandra.cluster.Session.execute
>>>>>>   File "cassandra/cluster.py", line 3982, in cassandra.cluster.ResponseFuture.result
>>>>>> cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {})
>>>>>>
>>>>>>
>>>>>> I'm using the cassandra-driver client library 3.12.0 via
>>>>>> Flask-CQLAlchemy 1.2.0 (https://github.com/thegeorgeous/flask-cqlalchemy)
>>>>>> with uWSGI (https://github.com/unbit/uwsgi).
>>>>>>
>>>>>> cassandra.cqlengine.connection.setup is being passed
>>>>>> lazy_connect=True and retry_connect=True because lazy_connect=False
>>>>>> causes requests to the Flask app to time out for some reason.
>>>>>>
>>>>>> Also seeing these errors in my uWSGI log file:
>>>>>>
>>>>>> [control connection] Error connecting to 10.1.2.3:
>>>>>> Traceback (most recent call last):
>>>>>>   File "cassandra/cluster.py", line 2781, in cassandra.cluster.ControlConnection._reconnect_internal
>>>>>>   File "cassandra/cluster.py", line 2803, in cassandra.cluster.ControlConnection._try_connect
>>>>>>   File "cassandra/cluster.py", line 1195, in cassandra.cluster.Cluster.connection_factory
>>>>>>   File "cassandra/connection.py", line 341, in cassandra.connection.Connection.factory
>>>>>> cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None
>>>>>>
>>>>>>
>>>>>> What's causing these connection and timeout errors? Something related
>>>>>> to Flask-CQLAlchemy?
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Alan Hamlett
>>> ahamlett.com
>>>
>>>
>>
>>
>> --
>> Alan Hamlett
>> ahamlett.com
>>
>
>
>
> --
> Alan Hamlett
> ahamlett.com
>



-- 
Alan Hamlett
ahamlett.com
