Thanks. For those interested: I opened CASSANDRA-14415.

SK

On 2018-04-19 06:04, Benjamin Lerer wrote:
> Hi Sam,
>
> Your finding is interesting. Effectively, if the number of bytes to skip is
> larger than the remaining bytes in the buffer + the buffer size, it could be
> faster to use seek.
> Feel free to open a JIRA ticket and attach your patch. It would be great if
> you could add your table schema to the ticket, as well as some information
> on your environment (e.g. disk type).
>
> On Tue, Apr 17, 2018 at 8:53 PM, Sam Klock <skl...@akamai.com> wrote:
>
>> Thanks (and apologies for the delayed response); that was the kind of
>> feedback we were looking for.
>>
>> We backported the fix for CASSANDRA-10657 to 3.0.16, and it partially
>> addresses our problem in the sense that it does limit the data sent on
>> the wire. The performance is still extremely poor, however, because
>> Cassandra continues to read large volumes of data from disk.
>> (We've also confirmed this behavior in 3.11.2.)
>>
>> With a bit more investigation, we now believe the problem (after
>> CASSANDRA-10657 is applied) is in RebufferingInputStream.skipBytes(),
>> which appears to read bytes in order to skip them. The subclass used in
>> our case, RandomAccessReader, exposes a seek(), so we overrode
>> skipBytes() in it to make use of seek(), and that seems to resolve the
>> problem.
>>
>> This change is intuitively much safer than the one we'd originally
>> identified, but we'd still like to confirm with you folks whether it's
>> likely safe and, if so, whether it's also potentially worth contributing.
>>
>> Thanks,
>> Sk
>>
>>
>> On 2018-03-22 18:16, Benjamin Lerer wrote:
>>
>>> You should check the 3.x release. CASSANDRA-10657 could have fixed your
>>> problem.
>>>
>>>
>>> On Thu, Mar 22, 2018 at 9:15 PM, Benjamin Lerer
>>> <benjamin.le...@datastax.com> wrote:
>>>
>>>> Sylvain explained the problem in CASSANDRA-4536:
>>>> " Let me note that in CQL3 a row that have no live column don't exist, so
>>>> we can't really implement this with a range slice having an empty columns
>>>> list. Instead we should do a range slice with a full-row slice predicate
>>>> with a count of 1, to make sure we do have a live column before including
>>>> the partition key. "
>>>>
>>>> By using ColumnFilter.selectionBuilder(); you do not select all the
>>>> columns. As a consequence, some partitions might be returned when they
>>>> should not be.
>>>>
>>>> On Thu, Mar 22, 2018 at 6:24 PM, Sam Klock <skl...@akamai.com> wrote:
>>>>
>>>>> Cassandra devs,
>>>>>
>>>>> We use workflows in some of our clusters (running 3.0.15) that involve
>>>>> "SELECT DISTINCT key FROM..."-style queries. For some tables, we
>>>>> observed extremely poor performance under light load (i.e., a small
>>>>> number of rows per second and frequent timeouts), which we eventually
>>>>> traced to replicas shipping entire rows (which in some cases could store
>>>>> on the order of MBs of data) to service the query. That surprised us
>>>>> (partly because 2.1 doesn't seem to behave this way), so we did some
>>>>> digging, and we eventually came up with a patch that modifies
>>>>> SelectStatement.java in the following way: if the selection in the query
>>>>> only includes the partition key, then when building a ColumnFilter for
>>>>> the query, use:
>>>>>
>>>>>     builder = ColumnFilter.selectionBuilder();
>>>>>
>>>>> instead of:
>>>>>
>>>>>     builder = ColumnFilter.allColumnsBuilder();
>>>>>
>>>>> to initialize the ColumnFilter.Builder in gatherQueriedColumns(). That
>>>>> seems to repair the performance regression, and it doesn't appear to
>>>>> break any functionality (based on the unit tests and some smoke tests we
>>>>> ran involving insertions and deletions).
>>>>>
>>>>> We'd like to contribute this patch back to the project, but we're not
>>>>> convinced that there aren't subtle correctness issues we're missing,
>>>>> judging both from comments in the code and the existence of
>>>>> CASSANDRA-5912, which suggests optimizing this kind of query is
>>>>> nontrivial.
>>>>>
>>>>> So: does this change sound safe to make, or are there corner cases we
>>>>> need to account for? If there are corner cases, are there plausible
>>>>> ways of addressing them at the SelectStatement level, or will we need to
>>>>> look deeper?
>>>>>
>>>>> Thanks,
>>>>> SK
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
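
For readers following along, here is a minimal, self-contained sketch of the difference between skipping by reading and skipping by seeking, as discussed above. It is not the patch attached to the ticket and it does not use Cassandra's classes: SeekingSkipReader, its fields, and both skip methods are hypothetical names built on java.io.RandomAccessFile purely for illustration. The sketch seeks whenever the target falls outside the current buffer; a threshold such as the "remaining bytes + buffer size" one Benjamin mentions could be applied instead.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

/** Hypothetical buffered reader; illustrates the idea only, not Cassandra's RandomAccessReader. */
final class SeekingSkipReader {
    private final RandomAccessFile file;
    private final byte[] buffer;
    private long bufferStart = 0;   // file offset of buffer[0]
    private int bufferLength = 0;   // number of valid bytes in buffer
    private int bufferPos = 0;      // index of the next unread byte in buffer
    long bytesReadFromDisk = 0;     // instrumentation for comparing the two strategies

    SeekingSkipReader(RandomAccessFile file, int bufferSize) {
        this.file = file;
        this.buffer = new byte[bufferSize];
    }

    /** Current logical position in the file. */
    long position() {
        return bufferStart + bufferPos;
    }

    /** Returns the next byte as 0..255, or -1 at end of file. */
    int read() {
        if (bufferPos == bufferLength && !refill()) return -1;
        return buffer[bufferPos++] & 0xFF;
    }

    /** Loads the next buffer-sized chunk starting right after the current buffer. */
    private boolean refill() {
        try {
            bufferStart += bufferLength;
            file.seek(bufferStart);
            int n = file.read(buffer, 0, buffer.length);
            bufferLength = Math.max(n, 0);
            bufferPos = 0;
            bytesReadFromDisk += bufferLength;
            return bufferLength > 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Skips by consuming bytes one at a time, so every skipped byte is read from disk. */
    int skipBytesByReading(int n) {
        int skipped = 0;
        while (skipped < n && read() >= 0) skipped++;
        return skipped;
    }

    /** Skips inside the buffer when possible; otherwise drops the buffer and repositions. */
    int skipBytesBySeeking(int n) {
        if (n <= 0) return 0;
        int remaining = bufferLength - bufferPos;
        if (n <= remaining) {
            bufferPos += n;
            return n;
        }
        try {
            long current = position();
            long target = Math.min(current + n, file.length());
            bufferStart = target;   // the next refill() seeks here
            bufferLength = 0;
            bufferPos = 0;
            return (int) (target - current);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a 4096-byte buffer, skipping most of a large file with skipBytesByReading() pulls every intervening chunk from disk, while skipBytesBySeeking() only reads the chunk at the destination. The bytesReadFromDisk counter makes that difference visible.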