Thank you! I have one more question ;-) If I use the regular "get" function then I can be sure that it takes ~5ms. So I suppose that if I use the "get_indexed_slices" function, the response time depends on how many rows match the most selected equality predicate. Am I right?
Augi

2011/6/14 aaron morton <aa...@thelastpickle.com>:
> From a quick read of the code in o.a.c.db.ColumnFamilyStore.scan()...
>
> Candidate rows are first read by applying the most selected equality
> predicate.
>
> From those candidate rows...
>
> 1) If the SlicePredicate has a SliceRange, the query execution will read all
> columns for the candidate row if the byte size of the largest tracked row is
> less than the column_index_size_in_kb config setting (defaults to 64K). Meaning
> if no more than one column index page of columns is (probably) going to be
> read, they will all be read.
>
> 2) Otherwise the query will read the columns specified by the SliceRange.
>
> 3) If the SlicePredicate uses a list of column names, those columns and the
> ones referenced in the IndexExpressions (except the one selected as the
> primary pivot above) are read from disk.
>
> If additional columns are needed (in case 2 above) they are read in
> separate reads from the candidate row.
>
> Then, when applying the SlicePredicate to produce the final projection into
> the result set, all the columns required to satisfy the filter will be in
> memory.
>
> So, yes, it reads just the columns from disk you ask for, unless it thinks
> it will take no more work to read more.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13 Jun 2011, at 08:34, Michal Augustýn wrote:
>
>> Hi,
>>
>> as I wrote, I don't want to install Hadoop etc. - I want just to use
>> the Thrift API. The core of my question is how the get_indexed_slices
>> function works.
>>
>> I know that it must first get all keys using the equality expression -
>> but what about additional expressions? Does Cassandra fetch whole
>> filtered rows, or just the columns used in the additional filtering
>> expressions?
>>
>> Thanks!
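The scan flow Aaron describes (pick the most selective equality predicate, read those candidate rows, then apply the remaining expressions to the columns already in memory) could be sketched roughly like this. This is a toy model in plain Python with invented names, not the actual o.a.c.db.ColumnFamilyStore.scan() code:

```python
def get_indexed_slices(rows, index, expressions):
    """Toy model of an indexed scan.

    rows:        {row_key: {column: value}}
    index:       {column: {value: set(row_keys)}} -- per-column secondary index
    expressions: list of (column, value) equality predicates
    """
    # 1. Choose the equality predicate whose index entry matches the
    #    fewest rows -- the "most selected" predicate, used as the pivot.
    pivot = min(expressions, key=lambda e: len(index[e[0]].get(e[1], ())))
    candidates = index[pivot[0]].get(pivot[1], set())

    results = {}
    for key in candidates:
        row = rows[key]  # read the candidate row's columns
        # 2. Apply the remaining expressions in memory; the pivot
        #    predicate is already satisfied by the index lookup.
        rest = [e for e in expressions if e != pivot]
        if all(row.get(col) == val for col, val in rest):
            results[key] = row
    return results
```

So the cost is driven by how many candidate rows the pivot predicate returns: each candidate costs a row read, and the extra predicates are then cheap in-memory checks.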
>>
>> Augi
>>
>> 2011/6/12 aaron morton <aa...@thelastpickle.com>:
>>> Not exactly sure what you mean here, all data access is through the Thrift
>>> API unless you code Java and embed Cassandra in your app.
>>>
>>> As well as Pig support there is also Hive support in Brisk (which will also
>>> have Pig support soon) http://www.datastax.com/products/brisk
>>>
>>> Can you provide some more info on the use case? Personally, if you have a
>>> read query you know you need to support, I would consider supporting it in
>>> the data model without secondary indexes.
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 11 Jun 2011, at 19:23, Michal Augustýn wrote:
>>>
>>> Hi all,
>>>
>>> I'm thinking of the get_indexed_slices function as a simple map-reduce job
>>> (one that just maps) - am I right?
>>>
>>> Well, I would like to be able to run simple queries on values, but I
>>> don't want to install Hadoop, write map-reduce jobs in Java (the whole
>>> application is in C# and I don't want to introduce a new development
>>> stack - maybe Pig would help), or have a second interface to
>>> Cassandra (in addition to Thrift). So secondary indexes seem to be
>>> the rescue for me. I would have just one indexed column holding a
>>> day-timestamp value (~100k items per day), and the equality expression
>>> for this column would be in each query (and I would add more ad-hoc
>>> expressions).
>>>
>>> Will this scenario work, or is there some issue I could run into?
>>>
>>> Thanks!
>>>
>>> Augi
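The day-bucket scenario Augi proposes (one indexed day-timestamp column, equality on it in every query, plus ad-hoc filters) could be modelled like this. Again a hypothetical sketch with made-up names, not real Thrift or Cassandra API calls:

```python
from collections import defaultdict


class DayIndexedStore:
    """Toy model: rows with a secondary index on a single 'day' column."""

    def __init__(self):
        self.rows = {}                     # row_key -> {column: value}
        self.day_index = defaultdict(set)  # day value -> set of row_keys

    def insert(self, key, columns):
        self.rows[key] = columns
        # Maintain the secondary index on the 'day' column at write time.
        self.day_index[columns["day"]].add(key)

    def query(self, day, extra=None):
        """Equality lookup on the indexed 'day' column, then apply any
        ad-hoc (column, value) equality predicates to the candidate rows."""
        extra = extra or []
        out = {}
        for key in self.day_index.get(day, ()):
            row = self.rows[key]
            if all(row.get(col) == val for col, val in extra):
                out[key] = row
        return out
```

With ~100k rows per day, every query pays for reading the whole day's candidate set before the ad-hoc predicates narrow it down, which is consistent with Aaron's suggestion to consider modelling the read in the data model directly rather than relying on the secondary index.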