Thanks for the reply; I think it's related. However, after switching to a
fork of Flask-CQLAlchemy that uses postfork, I'm still getting the
NoHostAvailable error about once per 4k requests. One strange thing is that
the number of errors doesn't grow with the number of requests: some uWSGI
hosts that handled ~20k requests over the same time period see the error
only about once per 20k requests. Both uWSGI hosts have the same number of
worker processes.

*Flask-CQLAlchemy Fork with Patch:*

https://github.com/alanhamlett/flask-cqlalchemy/tree/a7e5c7c7cf0c51a19be98791dd4c47b72b97d9be
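
For anyone following along, the general postfork idea looks roughly like the
sketch below. This is not the exact code in the fork linked above: the host,
keyspace, and function name are placeholders, and the optional cql_connection
argument is a hypothetical hook that exists only so the function can be
exercised without a live cluster.

```python
try:
    # uwsgidecorators can only be imported inside a running uWSGI process.
    from uwsgidecorators import postfork
except ImportError:
    def postfork(func):
        """No-op stand-in so this module still imports outside uWSGI."""
        return func

# Placeholders (assumptions), not the real settings of the app in this thread.
CASSANDRA_HOSTS = ["10.1.2.3"]
KEYSPACE = "my_keyspace"


@postfork
def connect_cassandra(cql_connection=None):
    """Give the current worker process its own cqlengine connection.

    uWSGI calls this with no arguments after each fork. The optional
    argument is a hypothetical hook for swapping in a fake in tests.
    """
    if cql_connection is None:
        # Imported lazily so the module loads even without the driver.
        from cassandra.cqlengine import connection as cql_connection

    # Shut down any cluster/session objects inherited from the master
    # process; their sockets and threads are not safe to reuse after fork.
    for name in ("cluster", "session"):
        inherited = getattr(cql_connection, name, None)
        if inherited is not None:
            inherited.shutdown()

    # Same flags as described in the original message quoted below.
    cql_connection.setup(
        CASSANDRA_HOSTS,
        KEYSPACE,
        lazy_connect=True,
        retry_connect=True,
    )
```

The module containing the hook has to be imported by the app before uWSGI
forks its workers, otherwise the hook is never registered.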

*Error Traceback seen after patch applied:*

Failed to create connection pool for new host 10.1.2.3:
Traceback (most recent call last):
  File "cassandra/cluster.py", line 2452, in cassandra.cluster.Session.add_or_renew_pool.run_add_or_renew_pool
  File "cassandra/pool.py", line 332, in cassandra.pool.HostConnection.__init__
  File "cassandra/cluster.py", line 1195, in cassandra.cluster.Cluster.connection_factory
  File "cassandra/connection.py", line 341, in cassandra.connection.Connection.factory
cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None

Traceback (most recent call last):
  File "./venv/lib/python3.4/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "./venv/lib/python3.4/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "./venv/lib/python3.4/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "./venv/lib/python3.4/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "./venv/lib/python3.4/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "./venv/lib/python3.4/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "./app/api_utils.py", line 876, in get_durations
    use_cassandra=use_cassandra,
  File "./venv/lib/python3.4/site-packages/datadog/dogstatsd/context.py", line 53, in wrapped
    return func(*args, **kwargs)
  File "./app/api_utils.py", line 1339, in heartbeats_to_durations
    for heartbeat in heartbeats:
  File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 512, in __iter__
    self._execute_query()
  File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 469, in _execute_query
    self._result_generator = (i for i in self._execute(self._select_query()))
  File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 401, in _execute
    result = _execute_statement(self.model, statement, self._consistency, self._timeout, connection=connection)
  File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1505, in _execute_statement
    return conn.execute(s, params, timeout=timeout, connection=connection)
  File "./venv/lib/python3.4/site-packages/cassandra/cqlengine/connection.py", line 341, in execute
    result = conn.session.execute(query, params, timeout=timeout)
  File "cassandra/cluster.py", line 2122, in cassandra.cluster.Session.execute
  File "cassandra/cluster.py", line 3982, in cassandra.cluster.ResponseFuture.result
cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {})

On Sun, Dec 31, 2017 at 9:04 AM, Jeff Jirsa <jji...@gmail.com> wrote:

> uWSGI forks, and the driver / cqlalchemy may need to reconnect or otherwise
> fix their state after each fork - you could try to prove this is the cause by
> checking the uWSGI logs or ps for an indication that a worker process has
> exited/been recycled. If you think it may be related to this, check out the
> @postfork decorator
>
>
> --
> Jeff Jirsa
>
>
> On Dec 31, 2017, at 8:52 AM, Alan Hamlett <alan.haml...@gmail.com> wrote:
>
> More info: The NoHostAvailable error is happening at random times on each
> client host, so it's probably a client-side error. If the Cassandra cluster
> were really offline, all client hosts would report the error at the same
> time instead of at different random times. The NoHostAvailable error occurs
> about once every 30 minutes, so most requests call Model.create() without
> hitting the error.
>
> On Sun, Dec 31, 2017 at 1:07 AM, Alan Hamlett <alan.haml...@gmail.com>
> wrote:
>
>> I'm seeing tracebacks in my Python Flask app when creating rows:
>>
>> Traceback (most recent call last):
>>   File "/opt/app/current/app/api.py", line 1174, in consume_heartbeat
>>     Heartbeat.create(**form_data)
>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/models.py", line 672, in create
>>     return cls.objects.create(**kwargs)
>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 977, in create
>>     .using(connection=self._connection) \
>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/models.py", line 738, in save
>>     if_exists=self._if_exists).save()
>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1476, in save
>>     self._execute(insert)
>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1351, in _execute
>>     results = _execute_statement(self.model, statement, self._consistency, self._timeout, connection=connection)
>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/query.py", line 1505, in _execute_statement
>>     return conn.execute(s, params, timeout=timeout, connection=connection)
>>   File "/opt/app/current/venv/lib/python3.4/site-packages/cassandra/cqlengine/connection.py", line 341, in execute
>>     result = conn.session.execute(query, params, timeout=timeout)
>>   File "cassandra/cluster.py", line 2122, in cassandra.cluster.Session.execute
>>   File "cassandra/cluster.py", line 3982, in cassandra.cluster.ResponseFuture.result
>> cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {})
>>
>>
>> I'm using the cassandra-driver client library 3.12.0 via Flask-CQLAlchemy
>> 1.2.0 (https://github.com/thegeorgeous/flask-cqlalchemy) with uWSGI (
>> https://github.com/unbit/uwsgi).
>>
>> cassandra.cqlengine.connection.setup is being passed lazy_connect=True
>> and retry_connect=True because lazy_connect=False causes requests to the
>> Flask app to time out for some reason.
>>
>> Also seeing these errors in my uWSGI log file:
>>
>> [control connection] Error connecting to 10.1.2.3:
>> Traceback (most recent call last):
>>   File "cassandra/cluster.py", line 2781, in cassandra.cluster.ControlConnection._reconnect_internal
>>   File "cassandra/cluster.py", line 2803, in cassandra.cluster.ControlConnection._try_connect
>>   File "cassandra/cluster.py", line 1195, in cassandra.cluster.Cluster.connection_factory
>>   File "cassandra/connection.py", line 341, in cassandra.connection.Connection.factory
>> cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None
>>
>>
>> What's causing these connection and timeout errors? Something related to
>> Flask-CQLAlchemy?
>>
>
>
>
