Re: Index search in provided list of rows (list of rowKeys).

aaron morton Tue, 13 Sep 2011 14:55:52 -0700

Not sure it's a feature Cassandra needs; it would radically change the meaning 
of get_indexed_slices(). If you already know the row keys, the assumption would 
be that you know they are the rows you want to get.


Feel free to add a Jira though. 

IMHO this sounds more like Sphinx not supporting all the features you need, 
rather than Cassandra. Can you use a different search engine such as Solr, 
Solandra or Elastic Search? Or 

Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 13/09/2011, at 10:27 AM, Evgeniy Ryabitskiy wrote:

> Something like this.
> 
> Actually I think it's better to extend the get_indexed_slices() API instead of 
> creating a new Thrift method.
> I would like to have something like this:
> 
> // First, run the full-text query against the external search engine.
> List<byte[]> keys = performSphinxQuery(someFullTextSearchQuery);
> IndexClause indexClause = new IndexClause();
> 
> // Proposed API (does not exist yet): restrict the index search to these keys.
> indexClause.setKeys(keys);
> indexClause.setExpressions(someFilteringExpressions);
> List<KeySlice> finalResult = get_indexed_slices(colParent, indexClause,
>     colPredicate, cLevel);
> 
> 
> 
> I can't solve my issue with a single get_indexed_slices() call.
> Here is the issue in more detail: 
> 1) I have ~6 million records; in future there could be many more.
> 2) I have > 10k different properties (stored as column values in Cassandra); 
> in future there could be many more.
> 3) Properties are text descriptions, int/float values, and string values.
> 4) I need to implement search over all properties. For text descriptions: full 
> text search. For int/float properties: range search.
> 5) A search query could use any combination of properties, such as a full 
> text search on a description plus a range expression on an int/float field.
> 6) I have an external search engine (Sphinx) that has indexed all string and 
> text properties.
> 7) I still need to perform range searches on int and float fields.
> 
> So now I split my query expressions into 2 groups:
> 1) expressions that can be handled by search engine
> 2) others (additional filters)
> 
> For example, I run the first query against Sphinx and get a list of rowKeys 
> with a length of 100k (call it RESULT1).
> Now I need to filter it by the second group of expressions. For example, I have 
> a simple expression: "age > 25".
> If I ran get_indexed_slices() with this expression, I could possibly get half 
> of my records in the result (call it RESULT2).
> Then I would need to compute the intersection of RESULT1 and RESULT2 on the 
> client side, which could take a lot of time and memory.
> That is why I can't use a single get_indexed_slices() call here.
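
To make the cost of the client-side intersection concrete, here is a minimal sketch. It assumes both results are already fully loaded on the client, and it models row keys as Strings for readability (real row keys are bytes). The class and method names are hypothetical and are not part of any Cassandra or Sphinx API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the Cassandra Thrift API.
class KeyIntersection {

    // Returns the keys present in both lists, in the order of the larger list.
    // Only the smaller list is hashed, but both lists must already be in memory,
    // which is the cost the message above is concerned about.
    static List<String> intersect(List<String> a, List<String> b) {
        List<String> small = a.size() <= b.size() ? a : b;
        List<String> large = (small == a) ? b : a;

        Set<String> lookup = new HashSet<>(small);
        List<String> common = new ArrayList<>();
        for (String key : large) {
            if (lookup.contains(key)) {
                common.add(key);
            }
        }
        return common;
    }
}
```

With RESULT1 at 100k keys and RESULT2 at roughly half of 6 million records, the hash set stays small, but RESULT2 still has to be paged out of Cassandra in full before the loop can finish.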
> 
> For me it is better to iterate over RESULT1 (100k records) on the client side, 
> filter by age, and end up with 10-50k records as the final result. The 
> disadvantage is that I have to fetch all 100k records.
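
The client-side alternative described above could be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: row keys are modeled as Strings, and the `fetchAges` function is a hypothetical stand-in for whatever batched read the client uses (for example a Thrift multiget_slice call that reads only the "age" column). The class name, method name, and batch size are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for illustration only; not part of the Cassandra Thrift API.
class ClientSideFilter {

    // Walks the keys returned by the search engine in fixed-size batches,
    // fetches the "age" value for each batch, and keeps keys with age > minAgeExclusive.
    // fetchAges is an assumed callback: it maps a batch of keys to key -> age,
    // omitting keys that have no age column.
    static List<String> filterByAge(List<String> keys,
                                    int batchSize,
                                    int minAgeExclusive,
                                    Function<List<String>, Map<String, Integer>> fetchAges) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> matches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            List<String> batch = keys.subList(start, end);
            Map<String, Integer> ages = fetchAges.apply(batch);
            for (String key : batch) {
                Integer age = ages.get(key);
                if (age != null && age > minAgeExclusive) {
                    matches.add(key);
                }
            }
        }
        return matches;
    }
}
```

Batching keeps only one batch of fetched rows in memory at a time, but every one of the 100k keys is still read from the cluster, which is the disadvantage noted above.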
> 
> Evgeny.
