Not sure it's a feature Cassandra needs; it would radically change the meaning of get_indexed_slices(). If you already know the row keys, the assumption would be that you know they are the rows you want to get.
Feel free to add a Jira though. IMHO this sounds more like Sphinx not supporting all the features you need, rather than Cassandra. Can you use a different search engine such as Solr, Solandra or Elasticsearch?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 13/09/2011, at 10:27 AM, Evgeniy Ryabitskiy wrote:

> Something like this.
>
> Actually I think it's better to extend the get_indexed_slices() API instead of
> creating a new Thrift method.
> I wish to have something like this:
>
> // here we run the query against the external search engine
> List<byte[]> keys = performSphinxQuery(someFullTextSearchQuery);
> IndexClause indexClause = new IndexClause();
>
> // required API to set the list of keys
> indexClause.setKeys(keys);
> indexClause.setExpressions(someFilteringExpressions);
> List finalResult = get_indexed_slices(colParent, indexClause, colPredicate, cLevel);
>
> I can't solve my issue with a single get_indexed_slices() call.
> Here is the issue in more detail:
> 1) I have ~6 million records; in future there could be many more
> 2) I have > 10k different properties (stored as column values in Cassandra);
> in future there could be many more
> 3) properties are text descriptions, int/float values, string values
> 4) I need to implement search over all properties. For text descriptions: full
> text search. For int/float properties: range search.
> 5) A search query could use any combination of properties, such as a full
> text search on a description plus a range expression on an int/float field.
> 6) I have an external search engine (Sphinx) that has indexed all string and text
> properties
> 7) I still need to perform range search on int and float fields.
>
> So now I split my query expressions into 2 groups:
> 1) expressions that can be handled by the search engine
> 2) others (additional filters)
>
> For example, I run the first query against Sphinx and get a list of row keys with a
> length of 100k. (mark as RESULT1)
> Now I need to filter it by the second group of expressions. For example, I have
> a simple expression: "age > 25".
> So imagine I run get_indexed_slices() with this query and possibly
> get half of my records in the result. (mark as RESULT2)
> Then I would need to compute the intersection of RESULT1 and RESULT2 on the client
> side, which could take a lot of time and memory.
> That is why I can't use a single get_indexed_slices() call here.
>
> For me it is better to iterate over RESULT1 (with 100k records) on the client side,
> filter by age, and get 10-50k records as the final result. The disadvantage here is
> that I have to fetch all 100k records.
>
> Evgeny.
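
For reference, the client-side approach described in the quoted message (iterate over RESULT1 and keep only the rows with age > 25) could be sketched roughly as below. This is only an illustration under stated assumptions, not an existing Cassandra API: `BatchedKeyFilter` and `AgeFetcher` are hypothetical names, row keys are simplified to `String` rather than `byte[]`, and `AgeFetcher` stands in for whatever batched read the client uses (for example a Thrift `multiget_slice` call restricted to the age column). Fetching in batches keeps memory bounded, although every key in RESULT1 is still read once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: filters a list of row keys returned by the search
// engine (RESULT1) by a numeric column, reading the column in batches.
final class BatchedKeyFilter {

    // Hypothetical stand-in for a batched read such as multiget_slice.
    // Returns the age for each key that has one; missing keys are omitted.
    interface AgeFetcher {
        Map<String, Integer> fetchAges(List<String> keys);
    }

    // Returns the keys whose age is strictly greater than minAgeExclusive,
    // preserving the order of the input list.
    static List<String> filterByMinAge(List<String> keys,
                                       AgeFetcher fetcher,
                                       int minAgeExclusive,
                                       int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> matches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            List<String> batch = keys.subList(start, end);

            // One round trip per batch instead of one per key.
            Map<String, Integer> ages = fetcher.fetchAges(batch);

            for (String key : batch) {
                Integer age = ages.get(key);
                // Rows without an age column are treated as non-matching.
                if (age != null && age > minAgeExclusive) {
                    matches.add(key);
                }
            }
        }
        return matches;
    }
}
```

With the 100k keys from the example, a call such as `BatchedKeyFilter.filterByMinAge(result1Keys, fetcher, 25, 1000)` would issue about 100 batched reads; the batch size of 1000 is an arbitrary choice for illustration and would need tuning.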