We do not use Cassandra for search. We made our own modifications to Lucene instead. Here is a post on our engineering blog that describes what we did:
http://engineering.twitter.com/2011/04/twitter-search-is-now-3x-faster_1656.html

On Sun, Mar 18, 2012 at 11:22 PM, Tharindu Mathew <mcclou...@gmail.com> wrote:

> Sasha,
>
> It depends on the way you implement it, I guess... Maybe Twitter uses
> Solandra, which is very good at indexing these in different ways but has
> the power of Cassandra underneath...
>
> If you're doing your own implementation of indexing, be mindful that you
> can break the sentence into four words and index those, or you can index
> the whole sentence. The two would produce different results, as they can
> mean completely different things based on the context.
>
>
> On Mon, Mar 19, 2012 at 7:35 AM, Andrey V. Panov <panov.a...@gmail.com> wrote:
>
>> Why do you suppose they did search on Cassandra?
>>
>>
>> On 19 March 2012 00:16, Sasha Dolgy <sdo...@gmail.com> wrote:
>>
>>> yes -- but given i have two keywords, and want to find all tweets that
>>> have "cassandra" and "bestest" ... that means retrieving all columns +
>>> values in each row, iterating through both to see if tweet ids in one
>>> exist in the other, and finishing up with a consolidated list of tweet
>>> ids that exist in both. just seems clunky to me ... ?
>>>
>>>
>>> On Sun, Mar 18, 2012 at 4:12 PM, Benoit Perroud <ben...@noisette.ch> wrote:
>>>
>>>> The simplest modeling you could have is using the keyword as key, a
>>>> timestamp/time UUID as column name and the tweetid as value
>>>>
>>>> -> cf['keyword']['timestamp'] = tweetid
>>>>
>>>> then you do a range query to get all tweetids sorted by time (you may
>>>> want them in reverse order) and you can limit to the number of tweets
>>>> displayed on the page.
>>>>
>>>> As some rows can become large, you could use key partitioning by
>>>> concatenating, for instance, the keyword with the month and year.
>>>>
>>>>
>>>> 2012/3/18 Sasha Dolgy <sdo...@gmail.com>:
>>>> > Hi All,
>>>> >
>>>> > With twitter, when I search for words like: "cassandra is the
>>>> > bestest", 4 tweets will appear, including one i just did. My
>>>> > understanding is that the internals of twitter work in that each
>>>> > word in a tweet is allocated, irrespective of the presence of a
>>>> > # hash tag, and the tweet id is assigned to a row for that word.
>>>> > What is puzzling to me, and I am hopeful that some smart people on
>>>> > here can shed some light on it -- is how would this work with
>>>> > Cassandra?
>>>> >
>>>> > row [ cassandra ]: key -> tweetid / timestamp
>>>> > row [ bestest ]: key -> tweetid / timestamp
>>>> >
>>>> > I had thought that I could simply pull a list of all column names
>>>> > from each row (representing each word) and flag all occurrences
>>>> > (tweet ids) that exist in each row ... however, these rows would
>>>> > get quite long over time.
>>>> >
>>>> > Am I missing an easier way to get a list of all tweet ids that
>>>> > exist in multiple rows?
>>>> >
>>>> > --
>>>> > Sasha Dolgy
>>>> > sasha.do...@gmail.com
>>>>
>>>>
>>>> --
>>>> sent from my Nokia 3210
>>>
>>>
>>> --
>>> Sasha Dolgy
>>> sasha.do...@gmail.com
>
>
> --
> Regards,
>
> Tharindu
>
> blog: http://mackiemathew.com/
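
For anyone following the quoted thread: the keyword-row model Benoit suggests (cf['keyword']['timestamp'] = tweetid) and the two-keyword intersection Sasha describes can be sketched in a few lines of Python. This is only an in-memory illustration, not what Twitter does (we use Lucene, as noted above) and not a Cassandra client. The names `rows`, `index_tweet` and `search_all` are hypothetical, and plain integers stand in for the time UUID column names.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the column family in the thread:
# rows[keyword][timestamp] = tweet_id
rows = defaultdict(dict)


def index_tweet(tweet_id, timestamp, text):
    """Write one column per distinct word, named by the tweet's timestamp."""
    for word in set(text.lower().split()):
        # Assumption: timestamps are unique per tweet (a time UUID would be).
        rows[word][timestamp] = tweet_id


def search_all(keywords, limit=20):
    """Return ids of tweets that contain every keyword, newest first."""
    if not keywords:
        return []
    # Visit the shortest row first so the candidate set starts small.
    ordered = sorted((rows.get(k.lower(), {}) for k in keywords), key=len)
    # Candidates map tweet_id -> timestamp so the result can be sorted.
    candidates = {tid: ts for ts, tid in ordered[0].items()}
    for row in ordered[1:]:
        ids = set(row.values())
        candidates = {tid: ts for tid, ts in candidates.items() if tid in ids}
    newest_first = sorted(candidates, key=candidates.get, reverse=True)
    return newest_first[:limit]
```

Calling index_tweet(1, 100, "cassandra is the bestest") and then search_all(["cassandra", "bestest"]) walks both keyword rows and keeps only the ids present in each. The cost still grows with the length of the rows involved, which is the clunkiness Sasha points out and the reason Benoit suggests partitioning the row key by month and year.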