With the following schema:
- TimeStamp
- Device ID
- Device Name
- Device Owner
- Device Color

Primary key: (TimeStamp, DeviceID)

Each record is 40 bytes. I'm trying to fetch all the rows for a particular TimeStamp (the partition key):

    SELECT * FROM schema WHERE TimeStamp = '.'

There are 500K such rows per timestamp. I have found that paginating gives much better throughput than trying to fetch everything in one shot. Even so, fetching the 500K rows (40 MB) with a page size of 1000 or 10000 took around 25-30 seconds. I have the following questions:

(A) Will all the data I'm querying be stored sequentially on disk for a particular TimeStamp? (And yes, I've run the compact command.)

(B) If the answer to (A) is yes, why am I not getting throughput close to the disk's (40 MB/s)? Retrieving 40 MB of data in 25-30 seconds works out to barely 1.5 MB/s.

(C) If the answer to (A) is yes, could I speed up the response further?

(D) Is serialization/deserialization the culprit for the slow throughput? If so, can something be done to avoid it altogether?

Thanks
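For concreteness, here is a minimal sketch of the paging approach I'm describing, using the DataStax Python driver; the contact point, keyspace, table name, and timestamp literal are placeholders, not my real values:

```python
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Placeholder contact point and keyspace.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

# fetch_size sets the page size; the driver fetches subsequent
# pages transparently as the result set is iterated.
query = SimpleStatement(
    "SELECT * FROM devices WHERE TimeStamp = %s",
    fetch_size=10000,
)

count = 0
for row in session.execute(query, ("2020-01-01 00:00:00",)):
    count += 1  # consume each row; the driver pages through all 500K

print(count)
```

With this, only the page size changes between my runs (1000 vs. 10000); the iteration itself is the same either way.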