Hi Jeff,

It is happening on multiple servers, and even in different DCs.
The schema contains two keyspaces, as follows:

CREATE KEYSPACE ks1 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;

CREATE TABLE ks1.tbl1 (
    account_id text,
    metric_name text,
    day_of_month int,
    hour_of_day int,
    insert_time timestamp,
    metric_value text,
    PRIMARY KEY ((account_id, metric_name, day_of_month, hour_of_day), insert_time)
) WITH CLUSTERING ORDER BY (insert_time DESC)
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND default_time_to_live = 2764800
    AND gc_grace_seconds = 864000;

CREATE TABLE ks1.tbl2 (
    account_id text,
    lambda_uuid uuid,
    metric_name text,
    day_of_month int,
    insert_time timestamp,
    hour_of_day int,
    metric_value text,
    PRIMARY KEY ((account_id, lambda_uuid, metric_name, day_of_month), insert_time)
) WITH CLUSTERING ORDER BY (insert_time DESC)
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND default_time_to_live = 2764800
    AND gc_grace_seconds = 864000;


CREATE KEYSPACE ks2 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;

CREATE TABLE ks2.tbl3 (
    account_id text,
    lambda_uuid uuid,
    insert_time timestamp,
    data text,
    error_count int,
    info_count int,
    request_id text,
    warn_count int,
    PRIMARY KEY ((account_id, lambda_uuid), insert_time)
) WITH CLUSTERING ORDER BY (insert_time DESC)
    AND default_time_to_live = 2764800
    AND gc_grace_seconds = 864000;


I'm afraid I'm unable to provide the actual key cache file, but for what
it's worth, here are example rows from the tables above:


*ks1.tbl1:*

 account_id   | 12345678
 metric_name  | com.selfbuilds.faas.system.invocation-finished-successful
 day_of_month | 1
 hour_of_day  | 10
 insert_time  | 2020-01-01 10:59:03.588000+0000
 metric_value | {"invocationIdentifier":"51176dd4-aa34-4834-bcf5-7e3d5b5494d1","lambdaUUID":"ADD3A5B4-7497-499A-8364-A97108C7400B","event":"","invocationState":"SUCCEEDED","executionTimeInMs":3392}


*ks1.tbl2:*

 account_id   | 12345678
 lambda_uuid  | ed7f0629-8d5c-47dc-a66d-4a1197aa01bc
 metric_name  | com.selfbuilds.faas.system.invocation-triggered
 day_of_month | 9
 insert_time  | 2020-01-09 13:03:06.653000+0000
 hour_of_day  | 13
 metric_value | {"invocationIdentifier":"3631e9ea-b230-4c4d-a051-c7b4241ee231","lambdaUUID":"ED7F0629-8D5C-47DC-A66D-4A1197AA01BC","event":"","lambdaName":"1578574875000","lastModifierOfLambda":"FaaSAll"}


*ks2.tbl3:*

 account_id  | 12345678
 lambda_uuid | 9f4eaa4f-42de-4c6e-9c96-02a4749281f3
 insert_time | 2020-01-01 01:30:22.302000+0000
 data        | [{"timestamp":1577842222458,"level":"Info","message":"Best Logs in town","extras":[]}]
 error_count | 0
 info_count  | 1
 request_id  | 2966683f-cf37-4ea3-9d82-1de46207d51e
 warn_count  | 0
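
For what it's worth, my reading of the two errors is that they are the same
failure mode: the deserializer reads a length prefix from the saved cache
file, and if the bytes at that position are stale or misaligned, the value
comes out either negative (Server 1's NegativeArraySizeException) or
absurdly large (Server 2's "longer than maximum of 65535"). A minimal
Python sketch of that reasoning (the 4-byte big-endian signed length prefix
is an assumption based on the exception messages, not the real KeyCache
file format):

```python
import struct

# Same limit as the check in CacheService$KeyCacheSerializer.deserialize
MAX_UNSIGNED_SHORT = 65535

def read_key(buf, offset=0):
    """Read one length-prefixed key from a saved-cache buffer.

    Hypothetical layout: a 4-byte signed big-endian length (like Java's
    readInt()) followed by the key bytes -- an assumption for illustration.
    """
    (length,) = struct.unpack_from(">i", buf, offset)
    if length < 0:
        # In Java, allocating new byte[length] with a negative length
        # raises NegativeArraySizeException -- the Server 1 error.
        raise ValueError("negative key length: %d" % length)
    if length > MAX_UNSIGNED_SHORT:
        # The Server 2 error, word for word.
        raise IOError("Corrupted key cache. Key length of %d is longer "
                      "than maximum of %d" % (length, MAX_UNSIGNED_SHORT))
    start = offset + 4
    return buf[start:start + length]

# A well-formed entry deserializes fine:
good = struct.pack(">i", 5) + b"hello"
print(read_key(good))  # b'hello'

# Garbage bytes sitting where a length prefix is expected reproduce the log:
bad = struct.pack(">i", 489735494)
try:
    read_key(bad)
except IOError as e:
    print(e)  # Corrupted key cache. Key length of 489735494 is longer than maximum of 65535
```

If that reading is right, the "489MB key" is not a real key at all, just
whatever four bytes happened to be read as a length prefix, which would
also explain why Cassandra logs it as harmless and simply skips loading
the rest of the file.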



Thanks for your help on this one!



On Thu, Jan 23, 2020 at 5:40 PM Jeff Jirsa <jji...@gmail.com> wrote:

> 489MB key seems huge. Unlikely it’s really that large.
>
> If it’s happening on multiple servers it’d be interesting to see the
> schema and/or saved cache file if it’s not incredibly sensitive
>
> On Jan 23, 2020, at 3:48 AM, Shalom Sagges <shalomsag...@gmail.com> wrote:
>
> 
> Hi All,
>
> Cassandra 3.11.4.
> On one of our clusters, during startup, I see two types of "Harmless
> error" notifications regarding the key cache:
>
> *Server 1:*
>
> INFO  [pool-3-thread-1] 2020-01-23 04:34:46,167 AutoSavingCache.java:263 -
> Harmless error reading saved cache /path/to/saved_caches/KeyCache-e.db
> java.lang.NegativeArraySizeException: null
>         at
> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401)
>         at
> org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:472)
>         at
> org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:220)
>         at
> org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:165)
>         at
> org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:161)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> INFO  [pool-3-thread-1] 2020-01-23 04:34:46,168 AutoSavingCache.java:174 -
> Completed loading (188 ms; 10916 keys) KeyCache cache
>
>
> *Server 2:*
>
> INFO  [pool-3-thread-1] 2020-01-15 11:18:35,797 AutoSavingCache.java:263 -
> Harmless error reading saved cache /path/to/saved_caches//KeyCache-e.db
> java.io.IOException: Corrupted key cache. Key length of 489735494 is
> longer than maximum of 65535
>         at
> org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:469)
>         at
> org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:220)
>         at
> org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:165)
>         at
> org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:161)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> INFO  [pool-3-thread-1] 2020-01-15 11:18:35,799 AutoSavingCache.java:174 -
> Completed loading (285 ms; 20681 keys) KeyCache cache
>
> The KeyCache size is well below the default maximum (100MB). So does this
> mean that the key itself is the culprit?
>
> I'd really appreciate some guidance on what this message actually means in
> the logs during startup and what problems may arise from it.
> I see that one of the nodes experiences high CPU load and even kernel
> panics from time to time, so I want to understand whether the above may
> cause such issues.
>
> Thanks a lot!
>
>
