Hotspots on Time Series based Model

Chandra Sekar KR Tue, 17 Nov 2015 02:30:07 -0800

Hi,


I have a time-series table with the structure and partition size/volume figures
below. The purpose of this table is to enable range scans on LOG_TS and retrieve
the matching LOG_ID values, which are then used to look up the actual data in
the main table (EVENT_LOG). EVENT_LOG_BY_DATE therefore acts as a lookup (index)
table for the main table.
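A minimal in-memory sketch of the two-step lookup described above. It assumes plain Python dicts stand in for the two tables. `index_by_hour`, `event_log`, `insert_event`, and `lookup_events` are hypothetical names, and this is not Cassandra driver code. The index is keyed by the hourly partition key, and within a partition LOG_TS acts as the clustering key.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for EVENT_LOG_BY_DATE:
# partition key (year, month, day, hour) -> {log_ts: log_id}
index_by_hour = {}

# Hypothetical stand-in for the main EVENT_LOG table: log_id -> event payload
event_log = {}


def insert_event(log_ts, log_id, payload):
    """Write the event to the main table and its timestamp to the index table."""
    event_log[log_id] = payload
    key = (log_ts.year, log_ts.month, log_ts.day, log_ts.hour)
    # Same (partition key, LOG_TS) pair overwrites, like a CQL upsert
    index_by_hour.setdefault(key, {})[log_ts] = log_id


def lookup_events(partition_key, start, end):
    """Step 1: range-scan one index partition; step 2: fetch each event by log_id."""
    rows = index_by_hour.get(partition_key, {})
    # Sort newest first to mimic CLUSTERING ORDER BY (LOG_TS DESC)
    ids = [rows[ts] for ts in sorted(rows, reverse=True) if start <= ts <= end]
    return [event_log[i] for i in ids]


utc = timezone.utc
insert_event(datetime(2015, 11, 15, 10, 5, tzinfo=utc), 1, "event-a")
insert_event(datetime(2015, 11, 15, 10, 30, tzinfo=utc), 2, "event-b")
print(lookup_events((2015, 11, 15, 10),
                    datetime(2015, 11, 15, 10, 0, tzinfo=utc),
                    datetime(2015, 11, 15, 11, 0, tzinfo=utc)))
```

Because LOG_TS is the only clustering column, two events with an identical timestamp in the same hour would share one row; the sketch reproduces that by keying each partition on `log_ts`.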


CREATE TABLE EVENT_LOG_BY_DATE (
  YEAR INT,
  MONTH INT,
  DAY INT,
  HOUR INT,
  LOG_TS TIMESTAMP,
  LOG_ID VARINT,
  PRIMARY KEY ((YEAR, MONTH, DAY, HOUR), LOG_TS)
) WITH CLUSTERING ORDER BY (LOG_TS DESC);

SELECT LOG_TS, LOG_ID FROM EVENT_LOG_BY_DATE
  WHERE YEAR = 2015
    AND MONTH = 11
    AND DAY = 15
    AND HOUR IN (10, 11)
    AND LOG_TS >= '2015-11-15 10:00:00+0000'
    AND LOG_TS <= '2015-11-15 11:00:00+0000';
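The HOUR IN (10, 11) clause exists because the LOG_TS range can touch more than one hourly partition. A small sketch of how a client might work out which (YEAR, MONTH, DAY, HOUR) keys a range covers; `hour_partitions` is a hypothetical helper, not part of any driver. Since this range ends exactly at 11:00:00, the hour-11 partition contributes only rows stamped exactly 11:00:00.

```python
from datetime import datetime, timedelta, timezone


def hour_partitions(start, end):
    """Return the (year, month, day, hour) partition keys a LOG_TS range touches."""
    # Truncate the start to the top of its hour, then step hour by hour
    current = start.replace(minute=0, second=0, microsecond=0)
    keys = []
    while current <= end:
        keys.append((current.year, current.month, current.day, current.hour))
        current += timedelta(hours=1)
    return keys


utc = timezone.utc
print(hour_partitions(datetime(2015, 11, 15, 10, 0, tzinfo=utc),
                      datetime(2015, 11, 15, 11, 0, tzinfo=utc)))
```

The same helper also handles a range that crosses midnight, where the resulting keys span two different DAY values.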


The average daily volume for this table is ~10 million records, and the average
row size is ~40 B. Each hourly partition holds about 416K rows and comes close
to 13 MB. Will partitioning on (YEAR, MONTH, DAY, HOUR) cause hotspots on a
node, given that each hourly partition is ~13 MB?
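The per-partition figures follow from the daily volume. A quick sketch of the arithmetic using only the two averages stated above. Multiplying them gives roughly 16.7 MB of raw row data per hour, the same order of magnitude as the ~13 MB quoted. The actual size also depends on compression and per-row storage overhead, so the two numbers need not match exactly.

```python
ROWS_PER_DAY = 10_000_000  # ~10 million records per day (from the post)
AVG_ROW_BYTES = 40         # ~40 B average row size (from the post)

# One partition per hour, so the daily volume splits 24 ways
rows_per_hour = ROWS_PER_DAY / 24
raw_bytes_per_hour = rows_per_hour * AVG_ROW_BYTES

print(f"rows per hourly partition: {rows_per_hour:,.0f}")
print(f"raw row data per partition: {raw_bytes_per_hour / 1_000_000:.1f} MB")
```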


Is there an alternative way to model this time-series table that still supports
range scans?
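The hotspot concern comes from every write in the current hour sharing one partition key, and therefore one replica set. One commonly discussed alternative is to add a bucket component to the partition key so each hour is spread over several partitions. The sketch below illustrates the idea under loudly stated simplifications. It uses CRC32 plus modulo instead of Cassandra's Murmur3 token ring, and it ignores replication. `NUM_NODES`, `NUM_BUCKETS`, and the `log_id % NUM_BUCKETS` bucketing rule are all hypothetical choices.

```python
import zlib
from collections import Counter

NUM_NODES = 6     # hypothetical cluster size
NUM_BUCKETS = 4   # hypothetical number of buckets per hour


def node_for(partition_key):
    """Map a partition key to a node (simplified: CRC32 + modulo, not a token ring)."""
    return zlib.crc32(repr(partition_key).encode()) % NUM_NODES


def hourly_key(year, month, day, hour, log_id):
    # Original model: every write in the hour shares one partition key
    return (year, month, day, hour)


def bucketed_key(year, month, day, hour, log_id):
    # Alternative: spread each hour over NUM_BUCKETS partitions
    return (year, month, day, hour, log_id % NUM_BUCKETS)


def nodes_hit(key_fn, log_ids):
    """Count how many writes for 2015-11-15 hour 10 land on each node."""
    return Counter(node_for(key_fn(2015, 11, 15, 10, i)) for i in log_ids)


print("hourly:  ", nodes_hit(hourly_key, range(1000)))
print("bucketed:", nodes_hit(bucketed_key, range(1000)))
```

The trade-off is on the read side: a range scan for one hour must then query all NUM_BUCKETS partitions for that hour and merge the results by LOG_TS.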


Regards, Chandra KR
