There is nothing in the system.log when the aggregation query fails.
Thanks for the DataStax clarification.
Thanks,
Dinesh.
On 12/24/2015 2:46 PM, DuyHai Doan wrote:
The exception stack trace on the client side points to a file
permission problem. Try looking for the same error message in system.log to
chase down the root cause.
"Would trying the Datastax distribution offer any better chances?" -->
No, DSC (DataStax Community) is just a packaging of open-source C*.
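The system.log search suggested above can be scripted. This is a minimal Python sketch that assumes a plain-text log file. The function name `find_log_matches`, the default search string and the log path in the `__main__` block are illustrative assumptions, not part of Cassandra. The function returns each matching line together with a few of the lines that follow it, because Java stack traces span several lines.

```python
from pathlib import Path


def find_log_matches(log_path, needle="AccessControlException", context=2):
    """Return (line_number, snippet) pairs for every line containing `needle`.

    Each snippet includes `context` extra lines after the hit, since a Java
    stack trace continues on the following lines of system.log.
    """
    lines = Path(log_path).read_text(errors="replace").splitlines()
    matches = []
    for i, line in enumerate(lines):
        if needle in line:
            snippet = "\n".join(lines[i : i + 1 + context])
            matches.append((i + 1, snippet))  # 1-based line numbers
    return matches


if __name__ == "__main__":
    # Hypothetical location: a tarball install typically logs under <install>/logs/.
    log = Path("/home/wpl/CassandraInstall-3.1/logs/system.log")
    if log.exists():
        for lineno, snippet in find_log_matches(log):
            print(f"--- line {lineno} ---")
            print(snippet)
```

To look for the exact path from the client-side error, pass a different search string, for example `find_log_matches(log, needle="logback.xml")`.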
On Thu, Dec 24, 2015 at 7:07 AM, Dinesh Shanbhag
<dinesh.shanb...@isanasystems.com> wrote:
Even if an aggregation that forces a full table scan across
partitions is not recommended, the message/exception does seem
unrelated to partitioning:
cqlsh:flightdata> select late_flights(uniquecarrier, depdel15) from
flightsbydate in ('2015-09-15', '2015-09-16',
'2015-09-17', '2015-09-18', '2015-09-19', '2015-09-20',
'2015-09-21');
Traceback (most recent call last):
  File "CassandraInstall-3.1/bin/cqlsh.py", line 1258, in perform_simple_statement
    result = future.result()
  File "/home/wpl/CassandraInstall-3.1/bin/../lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/cluster.py", line 3122, in result
    raise self._final_exception
FunctionFailure: code=1400 [User Defined Function failure] message="execution of 'flightdata.state_late_flights[map<text, frozen<tuple<int, int>>>, text, decimal]' failed: java.security.AccessControlException: access denied ("java.io.FilePermission" "/home/wpl/CassandraInstall-3.1/conf/logback.xml" "read")"
Is that right?
And note that this same aggregation query (on a subset of the
month's days) does complete successfully sometimes.
The behavior is similar with Cassandra 3.0: on the same
set of days, the query sometimes succeeds but fails most of the time.
Would trying the DataStax distribution offer any better chances?
Thanks,
Dinesh.
On 12/24/2015 2:59 AM, DuyHai Doan wrote:
Thanks for the pointer on internal paging, Tyler; I missed that
one. But it raises some questions:
1. Is it possible to "tune" the page size, or is it hard-coded
internally?
2. Is read repair performed on EACH page, or is it done on
all the requested rows once they have been fetched?
Question 2 matters in particular scenarios where the
user reads at CL QUORUM (or higher) and some replicas are
out of sync. Even for an aggregation over a single
partition, if that partition is wide and spans many fetch
pages, the time the coordinator spends on read repair
and reconciliation across the QUORUM replicas for every page
may quickly push the query past its timeout.
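The timeout concern in question 2 can be expressed as rough arithmetic. This is a back-of-envelope model, not a description of how Cassandra actually schedules paging or read repair. It assumes that every internal page takes a fixed `per_page_ms` (read repair included) and that pages are fetched one after another, so total time grows linearly with the number of pages. The function name, its parameters and the example numbers are all hypothetical.

```python
import math


def estimate_paged_read(rows, page_size, per_page_ms, timeout_ms):
    """Rough model of a paged aggregation over one wide partition.

    Hypothetical assumptions: each page costs a fixed per_page_ms
    (including any read repair done for it), and pages run sequentially.
    Returns (pages, total_ms, exceeds_timeout).
    """
    pages = math.ceil(rows / page_size)
    total_ms = pages * per_page_ms
    return pages, total_ms, total_ms > timeout_ms
```

For example, `estimate_paged_read(1_000_000, 10_000, 80, 5_000)` returns `(100, 8000, True)`: 100 pages at an assumed 80 ms each overrun an assumed 5-second budget. At 20 ms per page the same partition gives `(100, 2000, False)` and stays within the budget.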
On Fri, Dec 18, 2015 at 5:26 PM, Tyler Hobbs
<ty...@datastax.com> wrote:
On Fri, Dec 18, 2015 at 9:17 AM, DuyHai Doan
<doanduy...@gmail.com> wrote:
Cassandra will perform a full table scan and fetch all the
data in memory to apply the aggregate function.
Just to clarify for others on the list: when executing aggregation
functions, Cassandra /will/ use paging internally, so at most one
page's worth of data will be held in memory at a time. However, if
your aggregation function retains a large amount of data, this may
contribute to heap pressure.
-- Tyler Hobbs
DataStax <http://datastax.com/>
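A small Python model can illustrate Tyler's point. It is a sketch under stated assumptions and does not reproduce Cassandra's implementation: rows stream through in fixed-size pages, and a state function folds each row into a running state. The names `pages`, `state_fn` and `paged_aggregate`, the page size and the state layout (carrier mapped to a (late, total) pair) are hypothetical. The layout only loosely follows the `map<text, frozen<tuple<int, int>>>` state type of `flightdata.state_late_flights` in the traceback earlier in the thread.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical row shape: (carrier, depdel15), where depdel15 is 1.0 for a
# flight delayed 15+ minutes and 0.0 otherwise.
Row = Tuple[str, float]
State = Dict[str, Tuple[int, int]]  # carrier -> (late, total); assumed layout


def pages(rows: Iterable[Row], page_size: int) -> Iterator[List[Row]]:
    """Yield rows in fixed-size pages so only one page is materialized at a time."""
    page: List[Row] = []
    for row in rows:
        page.append(row)
        if len(page) == page_size:
            yield page
            page = []  # start a fresh list for the next page
    if page:
        yield page


def state_fn(state: State, carrier: str, depdel15: float) -> State:
    """Fold one row into the running state (hypothetical stand-in for the UDA state function)."""
    late, total = state.get(carrier, (0, 0))
    state[carrier] = (late + (1 if depdel15 >= 1 else 0), total + 1)
    return state


def paged_aggregate(rows: Iterable[Row], page_size: int) -> State:
    """Apply state_fn page by page; only the state is carried across pages."""
    state: State = {}
    for page in pages(rows, page_size):
        for carrier, depdel15 in page:
            state = state_fn(state, carrier, depdel15)
        # The finished page is no longer referenced once the next one is fetched.
    return state
```

For example, `paged_aggregate(iter([("AA", 1.0), ("AA", 0.0), ("DL", 1.0)]), 2)` returns `{"AA": (1, 2), "DL": (1, 1)}` after processing two pages. In this model, peak memory is one page plus the state. The state grows with the number of distinct carriers and not with the number of rows scanned. A state function that appended every row to its state would instead grow with the scan, which is the heap-pressure case Tyler describes.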