Re: [python] [flask] [CQLAlchemy] NoHostAvailable on create

Alan Hamlett Tue, 02 Jan 2018 08:13:55 -0800

Still getting the NoHostAvailable with more hosts, just occurring less
frequently. Created a JIRA issue on the Python cassandra-driver tracker:
https://datastax-oss.atlassian.net/browse/PYTHON-891


On Mon, Jan 1, 2018 at 8:43 PM, Alan Hamlett <alan.haml...@gmail.com> wrote:

> Adding more nodes to the cluster fixed the error. Looks like a bug in
> the python-driver connection pool:
>
> 1. The connection pool only has one host
> 2. A query times out, causing that connection to be removed from the pool
> 3. Another query executes, but there are no hosts in the pool
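
The three steps above can be sketched as a toy model. This is only an
illustration of the hypothesis, not cassandra-driver code: ToyPool,
run_scenario, the stand-in NoHostAvailable class, and the second address
10.1.2.4 are all made up for the example, and the real driver's reconnection
logic is deliberately left out.

```python
class NoHostAvailable(Exception):
    """Stand-in for cassandra.cluster.NoHostAvailable (toy model only)."""


class ToyPool:
    """Hypothetical pool that drops a host when a query on it times out."""

    def __init__(self, hosts):
        self.hosts = list(hosts)

    def execute(self, query, timed_out=False):
        if not self.hosts:
            # Step 3: another query runs, but no hosts are left in the pool.
            raise NoHostAvailable(
                "Unable to complete the operation against any hosts"
            )
        host = self.hosts[0]
        if timed_out:
            # Step 2: a timeout removes that host from the pool.
            self.hosts.remove(host)
            raise TimeoutError("query timed out on %s" % host)
        return "%s ok on %s" % (query, host)


def run_scenario(hosts, timeouts):
    """Simulate `timeouts` timed-out queries, then one normal query."""
    pool = ToyPool(hosts)
    for _ in range(timeouts):
        try:
            pool.execute("INSERT", timed_out=True)
        except (TimeoutError, NoHostAvailable):
            pass
    try:
        return pool.execute("INSERT")
    except NoHostAvailable:
        return "NoHostAvailable"


# Step 1: with a single host, one timeout is enough to empty the pool.
print(run_scenario(["10.1.2.3"], timeouts=1))
# With two hosts, one timeout leaves a usable host; two timeouts do not.
print(run_scenario(["10.1.2.3", "10.1.2.4"], timeouts=1))
print(run_scenario(["10.1.2.3", "10.1.2.4"], timeouts=2))
```

In this model, adding hosts only lowers the chance that every host has been
dropped at the same moment; it does not remove the failure mode.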
>
> On Mon, Jan 1, 2018 at 12:21 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
>> Well the python driver you reference is a third party driver, because the
>> project doesn’t ship official drivers. You may have better luck looking for
>> a datastax driver support forum, or wait until after the holiday for more
>> people to be checking email.
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Jan 1, 2018, at 12:14 PM, Alan Hamlett <alan.haml...@gmail.com> wrote:
>>
>> Still getting the cassandra.cluster.NoHostAvailable error periodically
>> from uWSGI hosts. Setting up the connection with postfork:
>> https://github.com/alanhamlett/flask-cqlalchemy/blob/653ed3298af7dd617a972e9f87437f6e53f741b9/flask_cqlalchemy/__init__.py#L56
>>
>> Lazy connection is False, Retry connection is True. Could this be a bug
>> in cassandra-driver's connection pooling?
>>
>> P.S. Blocking a web app when connection isn't available (default non-lazy
>> connect) is really bad. With a web app you want requests that don't depend
>> on Cassandra to complete, but cassandra-driver blocks all requests when
>> there's no Cassandra connection even if it's not needed for the current web
>> app's request. This design decision gives me very low confidence in the
>> Python cassandra-driver.
>>
>> On Sun, Dec 31, 2017 at 2:34 PM, Alan Hamlett <alan.haml...@gmail.com>
>> wrote:
>>
>>> Thanks for the reply, I think it's related. However, after using a fork
>>> of Flask-CQLAlchemy with postfork I'm still getting the NoHostAvailable
>>> error once per 4k requests. One strange thing is the error rate doesn't
>>> increase with the number of requests, since some uWSGI clients with ~20k
>>> requests over the same time period have an error rate of once per 20k
>>> requests. Both uWSGI hosts have the same number of worker processes.
>>>
>>> *Flask-CQLAlchemy Fork with Patch:*
>>>
>>> https://github.com/alanhamlett/flask-cqlalchemy/tree/a7e5c7c7cf0c51a19be98791dd4c47b72b97d9be
>>>
>>> *Error Traceback seen after patch applied:*
>>>
>>> Failed to create connection pool for new host 10.1.2.3:
>>> Traceback (most recent call last):
>>>   File "cassandra/cluster.py", line 2452, in cassandra.cluster.Session.add_or_renew_pool.run_add_or_renew_pool
>>>   File "cassandra/pool.py", line 332, in cassandra.pool.HostConnection.__init__
>>>   File "cassandra/cluster.py", line 1195, in cassandra.cluster.Cluster.connection_factory
>>>   File "cassandra/connection.py", line 341, in cassandra.connection.Connection.factory
>>> cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None
>>> Traceback (most recent call last):
>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1982, in wsgi_app
>>>     response = self.full_dispatch_request()
>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1614, in full_dispatch_request
>>>     rv = self.handle_user_exception(e)
>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1517, in handle_user_exception
>>>     reraise(exc_type, exc_value, tb)
>>>   File "./venv/lib/python3.4/site-packages/flask/_compat.py", line 33, in reraise
>>>     raise value
>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1612, in full_dispatch_request
>>>     rv = self.dispatch_request()
>>>   File "./venv/lib/python3.4/site-packages/flask/app.py", line 1598, in dispatch_request
>>>     return self.view_functions[rule.endpoint](**req.view_args)
>>>   File "./app/api_utils.py", line 876, in get_durations
>>>     use_cassandra=use_cassandra,
>>>   File "./venv/lib/python3.4/site-packages/datadog/dogstatsd/context.py", line 53, in wrapped
>>>     return func(*args, **kwargs)
>>>   File "./app/api_utils.py", line 1339, in heartbeats_to_durations
>>>     for heartbeat in heartbeats:
>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 512, in __iter__
>>>     self._execute_query()
>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 469, in _execute_query
>>>     self._result_generator = (i for i in self._execute(self._select_query()))
>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 401, in _execute
>>>     result = _execute_statement(self.model, statement, self._consistency, self._timeout, connection=connection)
>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1505, in _execute_statement
>>>     return conn.execute(s, params, timeout=timeout, connection=connection)
>>>   File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/connection.py", line 341, in execute
>>>     result = conn.session.execute(query, params, timeout=timeout)
>>>   File "cassandra/cluster.py", line 2122, in cassandra.cluster.Session.execute
>>>   File "cassandra/cluster.py", line 3982, in cassandra.cluster.ResponseFuture.result
>>> cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {})
>>>
>>> On Sun, Dec 31, 2017 at 9:04 AM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>>> uWSGI forks and the driver / cqlalchemy may need to reconnect or
>>>> otherwise fix the state after each fork - you could try to prove this is
>>>> the cause by checking uWSGI logs or ps for indication that a worker process
>>>> has exited/been recycled. If you think it may be related to this, check out
>>>> @postfork decorator
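
A minimal sketch of the @postfork approach suggested above, assuming the
connection is opened once per uWSGI worker after the fork. It is not the code
from Flask-CQLAlchemy or its fork: connect_cassandra, connect_after_fork, the
injectable setup argument, and the keyspace name are hypothetical, and the
fallback decorator exists only so the module can be imported outside uWSGI.

```python
try:
    # uwsgidecorators is only importable inside a running uWSGI process.
    from uwsgidecorators import postfork
except ImportError:
    def postfork(func):
        """Fallback outside uWSGI: return the function unchanged."""
        return func


# Hypothetical settings; replace with real contact points and keyspace.
CASSANDRA_HOSTS = ["10.1.2.3"]
CASSANDRA_KEYSPACE = "example_keyspace"


def connect_cassandra(setup=None):
    """Open the cqlengine connection.

    `setup` can be injected for testing; by default the real
    cassandra.cqlengine.connection.setup is imported and used.
    """
    if setup is None:
        from cassandra.cqlengine import connection
        setup = connection.setup
    setup(
        CASSANDRA_HOSTS,
        CASSANDRA_KEYSPACE,
        lazy_connect=False,   # connect right away in the worker
        retry_connect=True,   # retry if the first attempt fails
    )


@postfork
def connect_after_fork():
    """Runs in each worker after uWSGI forks, so no sockets are inherited."""
    connect_cassandra()
```

The point of the design is that every worker process builds its own cluster
and session objects instead of reusing ones created in the master before the
fork.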
>>>>
>>>>
>>>> --
>>>> Jeff Jirsa
>>>>
>>>>
>>>> On Dec 31, 2017, at 8:52 AM, Alan Hamlett <alan.haml...@gmail.com>
>>>> wrote:
>>>>
>>>> More info: The NoHostAvailable error is happening at random times on
>>>> each client host, so it's probably a client error. If the Cassandra cluster
>>>> was really offline then all client hosts would report the error at the same
>>>> time instead of different random times. The NoHostAvailable error occurs
>>>> about once every 30 minutes, so most requests call Model.create() without
>>>> the error.
>>>>
>>>> On Sun, Dec 31, 2017 at 1:07 AM, Alan Hamlett <alan.haml...@gmail.com>
>>>> wrote:
>>>>
>>>>> I'm seeing tracebacks in my Python Flask app when creating rows:
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "/opt/app/current/app/api.py", line 1174, in consume_heartbeat
>>>>>     Heartbeat.create(**form_data)
>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/models.py", line 672, in create
>>>>>     return cls.objects.create(**kwargs)
>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 977, in create
>>>>>     .using(connection=self._connection) \
>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/models.py", line 738, in save
>>>>>     if_exists=self._if_exists).save()
>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1476, in save
>>>>>     self._execute(insert)
>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1351, in _execute
>>>>>     results = _execute_statement(self.model, statement, self._consistency, self._timeout, connection=connection)
>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1505, in _execute_statement
>>>>>     return conn.execute(s, params, timeout=timeout, connection=connection)
>>>>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/connection.py", line 341, in execute
>>>>>     result = conn.session.execute(query, params, timeout=timeout)
>>>>>   File "cassandra/cluster.py", line 2122, in cassandra.cluster.Session.execute
>>>>>   File "cassandra/cluster.py", line 3982, in cassandra.cluster.ResponseFuture.result
>>>>> cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {})
>>>>>
>>>>>
>>>>> I'm using the cassandra-driver client library 3.12.0 via
>>>>> Flask-CQLAlchemy 1.2.0 (https://github.com/thegeorgeous/flask-cqlalchemy)
>>>>> with uWSGI (https://github.com/unbit/uwsgi).
>>>>>
>>>>> cassandra.cqlengine.connection.setup is being passed
>>>>> lazy_connect=True and retry_connect=True because
>>>>> lazy_connect=False causes requests to the Flask app to time out for some
>>>>> reason.
>>>>>
>>>>> Also seeing these errors in my uWSGI log file:
>>>>>
>>>>> [control connection] Error connecting to 10.1.2.3:
>>>>> Traceback (most recent call last):
>>>>>   File "cassandra/cluster.py", line 2781, in cassandra.cluster.ControlConnection._reconnect_internal
>>>>>   File "cassandra/cluster.py", line 2803, in cassandra.cluster.ControlConnection._try_connect
>>>>>   File "cassandra/cluster.py", line 1195, in cassandra.cluster.Cluster.connection_factory
>>>>>   File "cassandra/connection.py", line 341, in cassandra.connection.Connection.factory
>>>>> cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None
>>>>>
>>>>>
>>>>> What's causing these connection and timeout errors? Something related
>>>>> to Flask-CQLAlchemy?
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Alan Hamlett
>> ahamlett.com
>>
>>
>
>
> --
> Alan Hamlett
> ahamlett.com
>



-- 
Alan Hamlett
ahamlett.com
