Re: Design for 'Most viewed Discussions' in a forum

openvictor Open Wed, 18 May 2011 12:04:45 -0700

Sorry, I made a mistake in topics-seen!
When you insert, it should be:

topics-seen[topic:TopicX:timestampN]={TimeUUID3:whatever}
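
To make the corrected layout concrete, here is a rough Python sketch that models the two CFs from the quoted message below as plain dictionaries (no Cassandra client involved). The 15-minute bucket, the helper names bucket, record_view and most_viewed, and the empty column values are assumptions for illustration only, not a definitive implementation.

```python
import uuid
from collections import defaultdict

BUCKET_SECONDS = 15 * 60  # assumed window length

# Stand-ins for the two column families: row key -> {column name: value}
active_topics = defaultdict(dict)  # columns are topic names
topics_seen = defaultdict(dict)    # columns are TimeUUIDs


def bucket(ts):
    """Truncate a Unix timestamp to the start of its window (timestampN)."""
    return int(ts) - int(ts) % BUCKET_SECONDS


def record_view(topic, ts):
    """Insert one column into each CF for a single view of a topic."""
    b = bucket(ts)
    active_topics[f"topics:{b}"][topic] = ""
    topics_seen[f"topic:{topic}:{b}"][uuid.uuid1()] = ""


def most_viewed(ts, limit=10):
    """Slice the active topics for the window, then count columns per topic."""
    b = bucket(ts)
    topics = active_topics.get(f"topics:{b}", {})
    counts = [(t, len(topics_seen.get(f"topic:{t}:{b}", {}))) for t in topics]
    counts.sort(key=lambda item: item[1], reverse=True)
    return counts[:limit]
```

With the timestamp in the topics-seen row key, each window gets its own row per topic, so the column count of a row is the view count for that window only.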


Sorry about that,
Victor

2011/5/18 openvictor Open <openvic...@gmail.com>

> I guess you can use the same system. You need two CFs for that, and I think
> it's better to use 0.8 because it supports counters:
>
> One CF with UTF8Type called active-topics and one CF with UUIDType called
> topics-seen. Then, using the same principle:
>
> For each period timestampN, and for each visit (say to Topic1, then Topic2,
> then Topic1 again), you create a TimeUUID and you insert:
> active-topics[topics:timestampN] = {Topic1:whateveryouwant}
> and :
> topics-seen[topic:Topic1]={TimeUUID1:whatever}
>
>
> active-topics[topics:timestampN] = {Topic2:whateveryouwant}
> and :
> topics-seen[topic:Topic2]={TimeUUID2:whatever}
>
>
> active-topics[topics:timestampN] = {Topic1:whateveryouwant}
> and :
> topics-seen[topic:Topic1]={TimeUUID3:whatever}
>
>
> Then, when you want to query, you first fetch all the topics (a slice) in
> active-topics for topics:timestampN, and then you get the column counts in
> the topics-seen CF for each of those topics.
>
> Not so simple... By the way, it adds overhead compared to a simple counter
> solution, but I think it is far more elegant. This is just my opinion.
>
>
> Victor
>
>
> 2011/5/18 Aditya Narayan <ady...@gmail.com>
>
>> Thanks victor!
>>
>> Aren't there any good ways to do this using Cassandra alone?
>>
>>
>> On Wed, May 18, 2011 at 11:41 PM, openvictor Open
>> <openvic...@gmail.com> wrote:
>>
>>> Have you thought about using another kind of database, one which supports
>>> volatile content, for example?
>>>
>>> I am currently thinking about doing something similar. The best and
>>> simplest option that I can think of at the moment is Redis. In Redis you
>>> have the option of querying keys with wildcards. Your problem can be solved
>>> by just inserting a UUID into Redis for a certain amount of time (the best
>>> is to tailor this amount of time as an inverse function of the number of
>>> keys existing in Redis).
>>>
>>> *With Redis*
>>> What I would do: I cut time into pieces of X minutes (15 minutes, for
>>> example) by truncating a timestamp. Let timestampN be the timestamp for
>>> the period of time [N, N+15], and let Topic1 and Topic2 be two topics. Then:
>>>
>>> One or more people view Topic1, then Topic2, then Topic1 again in
>>> this period of 15 minutes
>>> (HINCRBY <http://redis.io/commands/hincrby> is the increment):
>>> HINCRBY topics:Topic1:timestampN viewcount 1
>>> HINCRBY topics:Topic2:timestampN viewcount 1
>>> HINCRBY topics:Topic1:timestampN viewcount 1
>>>
>>> Then you just query in the following way:
>>>
>>> KEYS <http://redis.io/commands/keys> topics:*:timestampN
>>>
>>> * is the wildcard (MGET itself only takes explicit key names). You read
>>> viewcount from each matching hash with HGET, order by viewcount, and you
>>> have what you are asking for!
>>> This is a simplified version of what you should do, but personally I
>>> really like the combination of Cassandra and Redis.
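
The Redis flow described above can be sketched in plain Python, with a dictionary standing in for the Redis hashes so that it runs without a server. The helper names (hincrby, keys, top_topics) only mirror the Redis commands and are assumptions for illustration. The pattern lookup is modelled on KEYS rather than MGET, since MGET takes explicit key names and does not expand wildcards.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Stand-in for Redis: key -> hash of field -> integer value
hashes = defaultdict(lambda: defaultdict(int))


def hincrby(key, field, amount):
    """Mimic HINCRBY: increment a hash field and return the new value."""
    hashes[key][field] += amount
    return hashes[key][field]


def keys(pattern):
    """Mimic KEYS: return every key matching a glob-style pattern."""
    return [k for k in hashes if fnmatchcase(k, pattern)]


def top_topics(timestamp_n, limit=10):
    """Rank the topics viewed in one time window by their viewcount field."""
    suffix = f":{timestamp_n}"
    ranked = []
    for key in keys(f"topics:*{suffix}"):
        topic = key[len("topics:"):-len(suffix)]
        ranked.append((topic, hashes[key]["viewcount"]))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

Against a real server, KEYS walks the whole keyspace, so an incremental SCAN with a MATCH pattern is probably the safer choice for the lookup step.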
>>>
>>>
>>> Victor
>>>
>>> 2011/5/18 Aditya Narayan <ady...@gmail.com>
>>>
>>>> I would arrange the memtable flush period so that the time period for
>>>> which these most viewed discussions are generated equals the memtable
>>>> flush period, so that the entire row of most viewed discussions on a
>>>> topic is in one or at most two memtables/SSTables.
>>>> This would also help minimize the number of versions of the same column
>>>> spread across the parts of the row in different SSTables.
>>>>
>>>>
>>>>
>>>> On Wed, May 18, 2011 at 11:04 PM, Aditya Narayan <ady...@gmail.com> wrote:
>>>>
>>>>> *************
>>>>> For a discussion forum, I need to show a page of the most viewed
>>>>> discussions.
>>>>>
>>>>> To implement this, I maintain a count of views for each discussion, and
>>>>> when this view count passes a certain threshold, the discussion Id is
>>>>> added to a row of most viewed discussions.
>>>>>
>>>>> This row of most viewed discussions contains columns with integer names
>>>>> and values containing serialized lists of the Ids of all discussions
>>>>> whose view count equals the integer name of the column.
>>>>>
>>>>> Thus, if the view count of a discussion increases, I'll need to move its
>>>>> Id from the serialized list in one column to the serialized list in
>>>>> another column whose name represents the updated view count of that
>>>>> discussion.
>>>>>
>>>>> Thus I can get the most viewed discussions by reading the appropriate
>>>>> number of columns from one end of this integer-sorted row.
>>>>>
>>>>> ************
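
As a rough illustration of the design described above, the following Python sketch models the row as a dictionary from view count (the integer column name) to a list of discussion Ids. The threshold value, the helper names add_view and top_discussions, and the choice to treat "passes the threshold" as "reaches it" are all assumptions, and nothing here talks to Cassandra.

```python
from collections import defaultdict

THRESHOLD = 2  # hypothetical minimum view count for the most-viewed row

view_counts = defaultdict(int)       # discussion Id -> total views
most_viewed_row = defaultdict(list)  # integer column name -> list of Ids


def add_view(discussion_id):
    """Count one view and move the Id to the column for its new count."""
    old = view_counts[discussion_id]
    new = old + 1
    view_counts[discussion_id] = new
    if old >= THRESHOLD:
        # The Id was already in the row: remove it from its old column.
        most_viewed_row[old].remove(discussion_id)
        if not most_viewed_row[old]:
            del most_viewed_row[old]
    if new >= THRESHOLD:
        most_viewed_row[new].append(discussion_id)


def top_discussions(limit):
    """Read columns from the high end of the integer-sorted row."""
    result = []
    for count in sorted(most_viewed_row, reverse=True):
        result.extend((d, count) for d in most_viewed_row[count])
    return result[:limit]
```

Every view of a discussion that is already in the row costs a read-modify-write of two serialized lists, which is the main overhead this layout carries.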
>>>>>
>>>>> I wanted to get feedback from you all, to know if this is a good
>>>>> design.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
