Re: Error during select query - Found other issues with cluster too

Dipan Shah Wed, 20 Dec 2017 03:20:53 -0800

Hello Nicolas,


Here's our data model:


    CREATE TABLE hhahistory.history (
        tablename text,
        columnname text,
        tablekey bigint,
        updateddate timestamp,
        dateyearpart bigint,
        historyid bigint,
        appname text,
        audittype text,
        createddate timestamp,
        dbsession uuid,
        firstname text,
        historybatch uuid,
        historycassandraid uuid,
        hostname text,
        isvlm boolean,
        lastname text,
        loginname text,
        newvalue text,
        notes text,
        oldvalue text,
        reason text,
        updatedby text,
        updatedutcdate timestamp,
        dbname text,
        PRIMARY KEY ((tablename, columnname, dateyearpart), tablekey, updateddate, historyid)
    );


We are using this table to store audit data from our primary SQL Server database. Our partition key consists of the original table name, the column name and the month+year combination.


I just realized that a script managed to sneak in more than 100 million rows on the same day, so that might be the reason all this data is going into the same partition. I'll see what I can do about this.
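A common way to keep one burst of writes from landing in a single partition is to add a bucket column to the partition key. The Python sketch below is hypothetical and is not the schema from this thread: it assumes `dateyearpart` is encoded as YYYYMM, invents the table/column names "patients"/"status", and derives a made-up bucket from `tablekey` modulo `NUM_BUCKETS`. It then counts how many rows would land in each partition with and without that bucket.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical bucket count; size it so each partition stays well below the
# large-partition range for the expected number of rows per month.
NUM_BUCKETS = 16


def date_year_part(ts: datetime) -> int:
    # Assumption: dateyearpart encodes the month as YYYYMM, e.g. 2017-12 -> 201712.
    return ts.year * 100 + ts.month


def partition_key(row):
    # Mirrors PRIMARY KEY ((tablename, columnname, dateyearpart), ...) from the thread.
    tablename, columnname, ts, _tablekey = row
    return (tablename, columnname, date_year_part(ts))


def bucketed_partition_key(row):
    # Hypothetical variant: an extra partition-key column, tablekey % NUM_BUCKETS.
    tablename, columnname, ts, tablekey = row
    return (tablename, columnname, date_year_part(ts), tablekey % NUM_BUCKETS)


def rows_per_partition(rows, key_fn):
    # Count how many rows land in each partition under the given key function.
    return Counter(key_fn(row) for row in rows)


if __name__ == "__main__":
    # Simulated burst: 10,000 updates to one hypothetical table/column on one day,
    # each for a different tablekey.
    start = datetime(2017, 12, 19)
    burst = [("patients", "status", start + timedelta(seconds=i), i) for i in range(10_000)]

    before = rows_per_partition(burst, partition_key)
    after = rows_per_partition(burst, bucketed_partition_key)
    print(len(before), max(before.values()))  # 1 10000
    print(len(after), max(after.values()))    # 16 625
```

Bucketing on `tablekey` only helps when the burst spans many `tablekey` values; a burst against a single key would need a time-based bucket (for example a day column) instead. Reads that know the `tablekey` can compute the bucket directly, while reading a whole month means querying every bucket.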


Thanks,

Dipan Shah


________________________________
From: Nicolas Guyomar <[email protected]>
Sent: Wednesday, December 20, 2017 2:48 PM
To: [email protected]
Subject: Re: Error during select query - Found other issues with cluster too

Hi Dipan,

This looks like a really unbalanced data model; you have some very wide rows!

Can you share your model and explain a bit about what you are storing in this table? Your partition key might not be appropriate.

On 20 December 2017 at 09:43, Dipan Shah <[email protected]> wrote:

Hello Kurt,


I think I might have found the problem:


Can you please look at the tablehistograms output for this table and see whether that could be the problem? I think the max partition size and cell count are too high:


Percentile  SSTables  Write Latency (micros)  Read Latency (micros)  Partition Size (bytes)  Cell Count
50.00%      0.00      0.00                    0.00                   29521                   2299
75.00%      0.00      0.00                    0.00                   379022                  29521
95.00%      0.00      0.00                    0.00                   5839588                 454826
98.00%      0.00      0.00                    0.00                   30130992                2346799
99.00%      0.00      0.00                    0.00                   89970660                7007506
Min         0.00      0.00                    0.00                   150                     0
Max         0.00      0.00                    0.00                   53142810146             1996099046
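To put the Max row in perspective, the Python sketch below converts it into more familiar units. It compares the figures against two reference points chosen as assumptions for illustration: the default `compaction_large_partition_warning_threshold_mb: 100` in the 3.x cassandra.yaml (read here as MiB), and the commonly documented limit of about 2 billion cells per partition. The inputs are histogram estimates, so treat the results as approximate.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Reference points used for illustration (assumptions, see lead-in):
LARGE_PARTITION_WARN_BYTES = 100 * MIB  # compaction_large_partition_warning_threshold_mb: 100, read as MiB
CELL_LIMIT = 2_000_000_000              # ~2 billion cells per partition

# "Max" row of the tablehistograms output above (histogram estimates).
max_partition_bytes = 53_142_810_146
max_cell_count = 1_996_099_046


def summarize(size_bytes: int, cells: int) -> dict:
    return {
        # Partition size in GiB, one decimal place.
        "size_gib": round(size_bytes / GIB, 1),
        # How many whole multiples of the compaction warning threshold.
        "times_warn_threshold": size_bytes // LARGE_PARTITION_WARN_BYTES,
        # Share of the ~2 billion cell limit already used, in percent.
        "cell_limit_used_pct": round(100 * cells / CELL_LIMIT, 1),
    }


print(summarize(max_partition_bytes, max_cell_count))
# {'size_gib': 49.5, 'times_warn_threshold': 506, 'cell_limit_used_pct': 99.8}
```

By contrast, the 99th percentile (89970660 bytes, about 86 MiB) is still under that warning threshold, so the problem is concentrated in a few outlier partitions rather than spread across the table.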



Thanks,

Dipan Shah


________________________________
From: Dipan Shah <[email protected]>
Sent: Wednesday, December 20, 2017 12:04 PM
To: User
Subject: Re: Error during select query - Found other issues with cluster too


Hello Kurt,


We are using version 3.11.0, and I think this might be part of a bigger problem. I can see that nodes in my cluster are failing unexpectedly and that repair commands are also failing.


Repair command failure error:


INFO  [Native-Transport-Requests-2] 2017-12-19 17:06:02,332 Message.java:619 - Unexpected exception during request; channel = [id: 0xacc9a54a, L:/10.10.52.17:9042 ! R:/10.10.55.229:58712]
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
    at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
INFO  [Native-Transport-Requests-2] 2017-12-19 17:06:11,056 Message.java:619 - Unexpected exception during request; channel = [id: 0xeebf628d, L:/10.10.52.17:9042 ! R:/10.10.55.229:58130]
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer


Node failure error:


ERROR [STREAM-IN-/10.10.52.22:7000] 2017-12-20 01:17:17,691 JVMStabilityInspector.java:142 - JVM state determined to be unstable.  Exiting forcefully due to:
java.io.FileNotFoundException: /home/install/cassandra-3.11.0/data/data/hhahistory/history-065e0c90d9be11e7afbcdfeb48785ac5/mc-19095-big-Filter.db (Too many open files)
    at java.io.FileOutputStream.open0(Native Method) ~[na:1.8.0_131]
    at java.io.FileOutputStream.open(FileOutputStream.java:270) ~[na:1.8.0_131]
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213) ~[na:1.8.0_131]
    at java.io.FileOutputStream.<init>(FileOutputStream.java:101) ~[na:1.8.0_131]
    at org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.flushBf(BigTableWriter.java:486) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.doPrepare(BigTableWriter.java:516) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.format.big.BigTableWriter$TransactionalProxy.doPrepare(BigTableWriter.java:364) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish(Transactional.java:184) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.format.SSTableWriter.finish(SSTableWriter.java:264) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.finish(SimpleSSTableMultiWriter.java:59) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.format.RangeAwareSSTableWriter.finish(RangeAwareSSTableWriter.java:129) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.streaming.StreamReceiveTask.received(StreamReceiveTask.java:110) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:656) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:523) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:317) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]




Thanks,

Dipan Shah


________________________________
From: kurt greaves <[email protected]>
Sent: Wednesday, December 20, 2017 2:23 AM
To: User
Subject: Re: Error during select query

Can you send through the full stack trace as reported in the Cassandra logs? 
Also, what version are you running?

On 19 Dec. 2017 9:23 pm, "Dipan Shah" <[email protected]> wrote:

Hello,


I am getting an error message when I run a SELECT query from one particular node. The error is "ServerError: java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed".
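The error message suggests that a histogram with a fixed number of buckets received a value larger than its top bucket. As far as I understand, Cassandra tracks per-SSTable partition sizes and cell counts in such histograms (EstimatedHistogram), and a "max" can only be reported as a bucket bound, so it is undefined after an overflow. The thread does not show which code path hits this during the SELECT. The Python sketch below is a simplified model under those assumptions; the class name `BoundedHistogram`, the bucket count and the ~1.2x growth rule are illustrative, not Cassandra's code.

```python
class BoundedHistogram:
    """Simplified model of a fixed-bucket histogram with an overflow slot.

    Assumption for illustration only: bucket upper bounds start at 1 and grow
    by roughly 1.2x, loosely modeled on Cassandra's EstimatedHistogram.
    This is not Cassandra's implementation.
    """

    def __init__(self, num_buckets: int):
        self.offsets = []
        last = 1
        for _ in range(num_buckets):
            self.offsets.append(last)
            nxt = int(last * 1.2 + 0.5)  # round half up
            last = nxt if nxt > last else last + 1  # bounds must strictly increase
        # One counter per bucket, plus a final slot for values above every bound.
        self.counts = [0] * (num_buckets + 1)

    def add(self, value: int) -> None:
        # Count the value in the first bucket whose upper bound is >= value.
        for i, bound in enumerate(self.offsets):
            if value <= bound:
                self.counts[i] += 1
                return
        self.counts[-1] += 1  # larger than the top bound: overflow

    def is_overflowed(self) -> bool:
        return self.counts[-1] > 0

    def max(self) -> int:
        # The "max" is the upper bound of the highest non-empty bucket, so it is
        # undefined once any value has landed in the unbounded overflow slot.
        if self.is_overflowed():
            raise OverflowError("Unable to compute ceiling for max when histogram overflowed")
        for i in range(len(self.offsets) - 1, -1, -1):
            if self.counts[i] > 0:
                return self.offsets[i]
        return 0


if __name__ == "__main__":
    h = BoundedHistogram(num_buckets=20)
    h.add(10)
    print(h.max())  # upper bound of the bucket holding 10
    h.add(h.offsets[-1] + 1)  # one value past the top bucket
    try:
        h.max()
    except OverflowError as e:
        print("error:", e)
```

Cassandra's real histograms have far more buckets, so their top bound is very large; an overflow therefore points to an extraordinarily large partition, which would fit the near-2-billion Max cell count in the tablehistograms output above.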


Has anyone faced this error before? I searched for it but did not find anything that matches my scenario.


Please note that I do not get this error when I run the same query from any other node. I'm connecting to the node using cqlsh.


Thanks,

Dipan Shah
