netudima opened a new pull request, #4789:
URL: https://github.com/apache/cassandra/pull/4789

   Currently, when we execute a local read, we fetch data from SSTables and 
Memtables using a merging iterator and serialize it to a byte buffer. Later, 
when we assemble the CQL response, we deserialize that data again to iterate 
over it as part of the coordinator logic. So, when the data is read locally, we 
allocate rows and cells twice: once during the read from SSTables/Memtables and 
once during deserialization by the coordinator logic. This is a typical 
scenario because drivers usually send requests to replicas.
   
   The idea of the optimization: for a single-partition read of a small number 
of rows, we can keep the data in memory and avoid allocating the row objects 
twice.
   
   A system property limits the number of rows we keep in memory in this 
scenario (to avoid excessive GC pressure from the extended lifetime of these 
objects and their promotion to the old generation). The property also allows 
the logic to be disabled in case of any issues.
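
   As a rough illustration of how such a limit could be read, here is a minimal 
sketch. The property name, the default value, and the convention that a value 
of 0 disables the logic are all assumptions made for this example; the 
description above does not state them.

```java
// Hypothetical sketch: none of these names come from the patch itself.
final class InMemoryRowLimitSketch
{
    // Assumed property name, for illustration only.
    static final String PROPERTY = "example.local_read.max_in_memory_rows";

    // Assumed default; the real default is not stated in the description.
    static final int DEFAULT_LIMIT = 0;

    /** Returns the configured row limit; negative values are clamped to 0. */
    static int limit()
    {
        return Math.max(0, Integer.getInteger(PROPERTY, DEFAULT_LIMIT));
    }

    /** In this sketch, a limit of 0 means the in-memory path is disabled. */
    static boolean enabled()
    {
        return limit() > 0;
    }
}
```

   With this shape, starting the JVM with 
`-Dexample.local_read.max_in_memory_rows=0` would switch the in-memory path off 
without a code change.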
   
   We cannot know the number of rows in advance, so we read the first N rows 
into memory. If any rows remain, we serialize the remainder to a buffer and 
then, when we need to iterate over the result, concatenate the iterator over 
the in-memory rows with the iterator over the deserialized ones.
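
   The read-then-concatenate scheme described above can be sketched roughly as 
follows. This is a simplified illustration, not the patch's code: rows are 
plain strings rather than Cassandra row objects, and `HybridReadSketch`, 
`LocalResult`, and `read` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: strings stand in for rows, writeUTF for row serialization.
final class HybridReadSketch
{
    /** First rows kept as objects; any remaining rows kept in serialized form. */
    static final class LocalResult
    {
        final List<String> inMemoryRows;
        final byte[] serializedTail; // empty when everything fit in memory
        final int tailCount;

        LocalResult(List<String> inMemoryRows, byte[] serializedTail, int tailCount)
        {
            this.inMemoryRows = inMemoryRows;
            this.serializedTail = serializedTail;
            this.tailCount = tailCount;
        }

        /** Iterates the in-memory rows first, then lazily deserializes the tail. */
        Iterator<String> iterator()
        {
            Iterator<String> head = inMemoryRows.iterator();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(serializedTail));
            return new Iterator<String>()
            {
                private int remainingTail = tailCount;

                public boolean hasNext()
                {
                    return head.hasNext() || remainingTail > 0;
                }

                public String next()
                {
                    if (head.hasNext())
                        return head.next();
                    if (remainingTail == 0)
                        throw new NoSuchElementException();
                    remainingTail--;
                    try
                    {
                        return in.readUTF();
                    }
                    catch (IOException e)
                    {
                        throw new UncheckedIOException(e);
                    }
                }
            };
        }
    }

    /** Keeps up to maxInMemoryRows rows as objects and serializes the rest. */
    static LocalResult read(Iterator<String> source, int maxInMemoryRows)
    {
        List<String> head = new ArrayList<>();
        while (head.size() < maxInMemoryRows && source.hasNext())
            head.add(source.next());

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int tailCount = 0;
        try (DataOutputStream out = new DataOutputStream(bytes))
        {
            while (source.hasNext())
            {
                out.writeUTF(source.next());
                tailCount++;
            }
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        return new LocalResult(head, bytes.toByteArray(), tailCount);
    }
}
```

   In this sketch, a limit of 0 sends every row through the serialized path, 
which corresponds to the existing behaviour, while a small result set never 
touches the buffer at all.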
   
   patch by Dmitry Konstantinov; reviewed by TBD for CASSANDRA-21354


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

