Re: design that mimics twitter tweet search

Chris Goffinet Mon, 19 Mar 2012 01:24:05 -0700

We do not use Cassandra for search. We made modifications to Lucene.

Here is a blog post on our engineering section that talks about what we did:


http://engineering.twitter.com/2011/04/twitter-search-is-now-3x-faster_1656.html


On Sun, Mar 18, 2012 at 11:22 PM, Tharindu Mathew <mcclou...@gmail.com> wrote:

> Sasha,
>
> It depends on how you implement it, I guess. Maybe Twitter uses
> Solandra, which is very good at indexing these in different ways but has the
> power of Cassandra underneath.
>
> If you're doing your own implementation of indexing, be mindful that you
> can either break the sentence into four words and index each one, or index
> the whole sentence. The two approaches produce different results, because
> the words can mean a completely different thing depending on the context.
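To make the difference concrete, here is a minimal Python sketch with two toy in-memory indexes, one keyed by individual words and one keyed by the whole normalized sentence. The function names, the tokenizer regex, and the sample tweets are all made up for illustration; nothing here reflects how Solandra or Twitter actually index.

```python
import re
from collections import defaultdict

# Hypothetical sample data: tweet id -> tweet text.
SAMPLE_TWEETS = {
    1: "Cassandra is the bestest",
    2: "The bestest thing is not Cassandra",
}

def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_word_index(tweets):
    """One entry per word: word -> set of tweet ids containing it."""
    index = defaultdict(set)
    for tweet_id, text in tweets.items():
        for word in tokens(text):
            index[word].add(tweet_id)
    return index

def build_phrase_index(tweets):
    """One entry per whole sentence: normalized sentence -> set of tweet ids."""
    index = defaultdict(set)
    for tweet_id, text in tweets.items():
        index[" ".join(tokens(text))].add(tweet_id)
    return index
```

With these two sample tweets, a word-level lookup for "cassandra" and "bestest" matches both tweets, while the whole-sentence key "cassandra is the bestest" matches only the first.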
>
>
> On Mon, Mar 19, 2012 at 7:35 AM, Andrey V. Panov <panov.a...@gmail.com> wrote:
>
>> Why do you suppose they did search on Cassandra?
>>
>>
>> On 19 March 2012 00:16, Sasha Dolgy <sdo...@gmail.com> wrote:
>>
>>> Yes, but given I have two keywords and want to find all tweets that
>>> have "cassandra" and "bestest", that means retrieving all columns + values
>>> in each row, iterating through both to see if tweet ids in one exist in
>>> the other, and finishing up with a consolidated list of tweet ids that
>>> exist in both. It just seems clunky to me ... ?
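The client-side step described here is a posting-list intersection. A minimal Python sketch follows, assuming each keyword row has already been fetched into a plain list of tweet ids; the `rows` dictionary and both function names are hypothetical, and no Cassandra client API is shown.

```python
def intersect_sorted(a, b):
    """Return the ids present in both ascending lists, in one linear pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def tweets_with_all(rows, keywords):
    """AND query: intersect the id lists of every keyword, shortest list first."""
    # `rows` maps keyword -> list of tweet ids (an assumed, already-fetched shape).
    lists = sorted((sorted(set(rows.get(k, []))) for k in keywords), key=len)
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = intersect_sorted(result, other)
        if not result:
            break  # nothing can match once the running result is empty
    return result
```

Starting from the shortest list keeps the running result small, but every row still has to be read in full, which is the cost the message above is worried about.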
>>>
>>>
>>> On Sun, Mar 18, 2012 at 4:12 PM, Benoit Perroud <ben...@noisette.ch> wrote:
>>>
>>>> The simplest modeling you could have is using the keyword as key, a
>>>> timestamp/time UUID as column name and the tweetid as value
>>>>
>>>> -> cf['keyword']['timestamp'] = tweetid
>>>>
>>>> Then you do a range query to get all tweet ids sorted by time (you may
>>>> want them in reverse order), and you can limit the result to the number
>>>> of tweets displayed on the page.
>>>>
>>>> As some rows can become large, you could use key partitioning, for
>>>> instance by concatenating the keyword with the month and year.
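The layout above can be sketched in plain Python with an in-memory dictionary standing in for the column family. This is only an illustration under assumptions: the `keyword:YYYY-MM` row-key format, the function names, and the default page size are invented here, and a real deployment would go through a Cassandra client rather than a dict.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone

# In-memory stand-in for a column family: row key -> columns kept sorted by name.
# Each column is (timestamp, tweet_id), mirroring cf['keyword']['timestamp'] = tweetid.
cf = defaultdict(list)

def row_key(keyword, ts):
    """Bucket a keyword by year and month so no single row grows without bound."""
    return f"{keyword.lower()}:{ts.strftime('%Y-%m')}"

def index_tweet(keyword, ts, tweet_id):
    """Insert one column, keeping the row ordered by timestamp."""
    bisect.insort(cf[row_key(keyword, ts)], (ts, tweet_id))

def latest_tweets(keyword, month, limit=20):
    """Reverse range query over one monthly bucket: newest ids first, capped at `limit`."""
    row = cf.get(row_key(keyword, month), [])
    return [tweet_id for _, tweet_id in row[::-1][:limit]]
```

A page that spans more than one month would have to query the newest bucket first and fall back to older buckets until `limit` ids are collected.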
>>>>
>>>>
>>>> 2012/3/18 Sasha Dolgy <sdo...@gmail.com>:
>>>> > Hi All,
>>>> >
>>>> > With Twitter, when I search for words like "cassandra is the bestest", 4
>>>> > tweets will appear, including one I just did. My understanding is that the
>>>> > internals of Twitter work in that each word in a tweet is allocated,
>>>> > irrespective of the presence of a # hash tag, and the tweet id is assigned
>>>> > to a row for that word. What is puzzling to me, and I am hopeful that some
>>>> > smart people on here can shed some light on it, is how this would work
>>>> > with Cassandra?
>>>> >
>>>> > row [ cassandra ]: key -> tweetid  / timestamp
>>>> > row [ bestest ]: key -> tweetid / timestamp
>>>> >
>>>> > I had thought that I could simply pull a list of all column names from
>>>> > each row (representing each word) and flag all occurrences (tweet ids)
>>>> > that exist in each row ... however, these rows would get quite long over
>>>> > time.
>>>> >
>>>> > Am I missing an easier way to get a list of all tweet ids that exist in
>>>> > multiple rows?
>>>> >
>>>> > --
>>>> > Sasha Dolgy
>>>> > sasha.do...@gmail.com
>>>>
>>>>
>>>>
>>>> --
>>>> sent from my Nokia 3210
>>>>
>>>
>>>
>>>
>>> --
>>> Sasha Dolgy
>>> sasha.do...@gmail.com
>>>
>>
>>
>
>
> --
> Regards,
>
> Tharindu
>
> blog: http://mackiemathew.com/
>
>
