Hi list,

We'd appreciate some insight into irregular read performance we're seeing.

We have a column family that has become problematic recently. We've noticed
a few queries take enormous amounts of time and seem to clog up read
resources on the machine (read pending tasks pile up, then are suddenly
relieved).

I've included the output of cfhistograms for this table[1]. The sampled
latencies do not cover a read of one of these problematic partitions, but
they show two things: 1) the vast majority of queries to this table appear
healthy, and 2) the maximum partition size is absurd (4139110981 bytes,
roughly 3.9 GB).
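For scale, here's that max against the 99th percentile, using the numbers
straight out of the cfhistograms output in [1]:

```python
# Partition sizes from the cfhistograms output in [1].
max_bytes = 4_139_110_981  # reported max partition size
p99_bytes = 14_237         # 99th-percentile partition size

print(f"max partition ~= {max_bytes / 2**30:.2f} GiB")
print(f"max / p99     ~= {max_bytes / p99_bytes:,.0f}x")
```

So the worst partition is around five orders of magnitude larger than the
99th percentile.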

This particular cf is not expected to be updated beyond an initial set of
writes, but it can be read many times. The data model includes several
hashes that amount to a few KB at most, a set<ascii> that can reach ~30-40
entries, and three list<int> columns that top out at a hundred or so
entries. There doesn't appear to be any material difference in the size or
character of the data between "good" and "bad" partitions. Often, the same
extremely slow partition queried at consistency ONE returns cleanly and
very quickly from other replicas.

I've included a trace of one of these slow reads[2], which I find very
strange: the vast majority of operations are quick, but the final step is
extremely slow. Nothing exceeds 2ms until the final "Read 1 live and 0
tombstone cells" step, which takes a whopping 69 seconds [!!]. We've
checked garbage collection over this period and saw no significant
collections.
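To make the stall obvious, here's a small sketch that diffs consecutive
SourceElapsed values (in microseconds) to get time-spent-per-step,
abbreviated to three of the events from the dump in [2]:

```python
import json

# Diff consecutive SourceElapsed values from the trace in [2]
# (abbreviated to three events) to see where the time actually goes.
events = json.loads("""
[
  {"Activity": "Merging memtable tombstones",                "SourceElapsed": 1029},
  {"Activity": "Merging data from memtables and 2 sstables", "SourceElapsed": 1359},
  {"Activity": "Read 1 live and 0 tombstone cells",          "SourceElapsed": 69666648}
]
""")

prev = 0
steps = []  # (time spent in step, activity)
for e in events:
    steps.append((e["SourceElapsed"] - prev, e["Activity"]))
    prev = e["SourceElapsed"]

for us, activity in sorted(steps, reverse=True):
    print(f"{us:>12,} us  {activity}")
```

Everything up to the merge completes in about 1.4 ms total; the last step
alone accounts for ~69.7 seconds.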

As far as I can tell, the trace doesn't raise any red flags, and we're
largely stumped.

We've got two main questions:

1) What's up with the megapartition, and what's the best way to debug it?
Our data model is largely write-once; we don't do any updates. We do
DELETE, but the partitions giving us trouble haven't been deleted. We had
some suspicions about
https://issues.apache.org/jira/browse/CASSANDRA-10547, but that issue
seems to be triggered largely by UPDATE operations.

2) What could cause the read to take such an absurd amount of time when
only a pair of sstables and the memtable are being examined, and it's just
a single cell being read? We originally suspected memory pressure from the
huge sstables, but without a corresponding GC pause that seems unlikely?
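One back-of-envelope check, purely speculation on our part: if that final
trace step were actually scanning the full ~3.9 GB megapartition rather
than a single cell, the implied throughput lands in ordinary
disk-sequential-scan territory:

```python
# Speculative arithmetic only: assume the ~69.7 s final trace step scanned
# the entire max-size partition reported by cfhistograms.
partition_bytes = 4_139_110_981  # max partition size from [1]
step_us = 69_666_648 - 1_359     # time spent in the final step of [2]

mb_per_s = partition_bytes / (step_us / 1e6) / 1e6
print(f"implied scan rate ~= {mb_per_s:.0f} MB/s")
```

That comes out to roughly 60 MB/s, which at least makes a
scan-the-whole-partition explanation plausible, though we have no direct
evidence that's what's happening.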

Any ideas?

Thanks in advance!

--Bryan


[1]
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             1.00             35.00             72.00              1109                14
75%             1.00             50.00            149.00              1331                17
95%             1.00             72.00            924.00              4768                35
98%             2.00            103.00           1597.00              9887                72
99%             2.00            149.00           1597.00             14237               103
Min             0.00             15.00             25.00                43                 0
Max             2.00            258.00           6866.00        4139110981             20501

[2]

[
  {
    "Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
    "Eventid": "4f524890-ba2f-11e5-8729-e1d125cb9b2d",
    "Activity": "Parsing select * from pooltx where hash = 0x5f805c68d66e7d271361e7774a7eeec0591eb5197d4f420126cea83171f0a8ff;",
    "Source": "172.31.54.46",
    "SourceElapsed": 26
  },
  {
    "Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
    "Eventid": "4f526fa0-ba2f-11e5-8729-e1d125cb9b2d",
    "Activity": "Preparing statement",
    "Source": "172.31.54.46",
    "SourceElapsed": 79
  },
  {
    "Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
    "Eventid": "4f52bdc0-ba2f-11e5-8729-e1d125cb9b2d",
    "Activity": "Executing single-partition query on pooltx",
    "Source": "172.31.54.46",
    "SourceElapsed": 1014
  },
  {
    "Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
    "Eventid": "4f52e4d0-ba2f-11e5-8729-e1d125cb9b2d",
    "Activity": "Acquiring sstable references",
    "Source": "172.31.54.46",
    "SourceElapsed": 1016
  },
  {
    "Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
    "Eventid": "4f530be0-ba2f-11e5-8729-e1d125cb9b2d",
    "Activity": "Merging memtable tombstones",
    "Source": "172.31.54.46",
    "SourceElapsed": 1029
  },
  {
    "Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
    "Eventid": "4f5332f0-ba2f-11e5-8729-e1d125cb9b2d",
    "Activity": "Bloom filter allows skipping sstable 387133",
    "Source": "172.31.54.46",
    "SourceElapsed": 1040
  },
  {
    "Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
    "Eventid": "4f535a00-ba2f-11e5-8729-e1d125cb9b2d",
    "Activity": "Key cache hit for sstable 386331",
    "Source": "172.31.54.46",
    "SourceElapsed": 1046
  },
  {
    "Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
    "Eventid": "4f538110-ba2f-11e5-8729-e1d125cb9b2d",
    "Activity": "Seeking to partition beginning in data file",
    "Source": "172.31.54.46",
    "SourceElapsed": 1047
  },
  {
    "Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
    "Eventid": "4f53a820-ba2f-11e5-8729-e1d125cb9b2d",
    "Activity": "Key cache hit for sstable 379893",
    "Source": "172.31.54.46",
    "SourceElapsed": 1256
  },
  {
    "Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
    "Eventid": "4f53cf30-ba2f-11e5-8729-e1d125cb9b2d",
    "Activity": "Seeking to partition beginning in data file",
    "Source": "172.31.54.46",
    "SourceElapsed": 1257
  },
  {
    "Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
    "Eventid": "4f53f640-ba2f-11e5-8729-e1d125cb9b2d",
    "Activity": "Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones",
    "Source": "172.31.54.46",
    "SourceElapsed": 1357
  },
  {
    "Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
    "Eventid": "4f541d50-ba2f-11e5-8729-e1d125cb9b2d",
    "Activity": "Merging data from memtables and 2 sstables",
    "Source": "172.31.54.46",
    "SourceElapsed": 1359
  },
  {
    "Sessionid": "4f51fa70-ba2f-11e5-8729-e1d125cb9b2d",
    "Eventid": "78d82a90-ba2f-11e5-8729-e1d125cb9b2d",
    "Activity": "Read 1 live and 0 tombstone cells",
    "Source": "172.31.54.46",
    "SourceElapsed": 69666648
  }
]
