For #2 you could pipe through wc -l to get the count directly:

sort -n keys.txt | uniq | wc -l

But both examples are just refinements of iterate:

#1 is just a distributed iterate
#2 is just an optimized iterate based on knowledge of the on-disk
format (and may give inaccurate results... tombstones...)
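
If the key dumps come from several machines, the same pipeline can be
wrapped up along these lines. This is only a rough sketch: the function
name count_unique_keys and the file names are made up for illustration,
and it assumes each dump file holds one key per line. sort -u is used in
place of sort -n | uniq; it drops duplicate lines whether or not the
keys are numeric.

```shell
#!/bin/sh
# Hypothetical helper: count distinct keys across one or more dump files
# (for example, one key dump per node, one key per line).
count_unique_keys() {
    # sort -u merges all input files and drops duplicate lines;
    # wc -l counts what is left; tr strips the padding some wc builds add.
    sort -u -- "$@" | wc -l | tr -d ' '
}

# Example call (file names are placeholders):
#   count_unique_keys node1_keys.txt node2_keys.txt
```

The tombstone caveat still applies to whatever number comes out.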

On 28 March 2011 14:16, Or Yanay <o...@peer39.com> wrote:
> I use one of two ways to achieve that:
>  1. run a map reduce. Pig is really helpful in these cases. Make sure you run 
> your MR using Hadoop task tracker on your nodes - or your performance will 
> take a hit.
>  2. dump all keys using sstablekeys script from relevant files on all 
> machines and count unique values. I do that using "sort -n  keys.txt |uniq >> 
> unique_keys.txt"
>
> Dumping all keys is much faster but less elegant and can be more annoying if 
> you want to do that from your application.
>
> Hope that does the trick for you.
> -Orr
>
> -----Original Message-----
> From: Joshua Partogi [mailto:joshua.j...@gmail.com]
> Sent: Monday, March 28, 2011 2:39 PM
> To: user@cassandra.apache.org
> Subject: Re: newbie question: how do I know the total number of rows of a cf?
>
> Not all NoSQL is like that. Or perhaps the term NoSQL has become vague
> these days.
>
> On Mon, Mar 28, 2011 at 6:16 PM, Stephen Connolly
> <stephen.alan.conno...@gmail.com> wrote:
>> iterate.
>>
>> otherwise if that will be too slow and you will do it often, the nosql way
>> is to create a separate column family updated with each row add/delete to
>> hold the answer for you.
>>
>> - Stephen
>>
>> ---
>> Sent from my Android phone, so random spelling mistakes, random nonsense
>> words and other nonsense are a direct result of using swype to type on the
>> screen
>>
>> On 28 Mar 2011 07:40, "Sheng Chen" <chensheng2...@gmail.com> wrote:
>>> Hi all,
>>> I want to know how many records I am holding in Cassandra, just like
>>> count(*) in sql.
>>> What can I do ? Thank you.
>>>
>>> Sheng
>>
>
>
>
> --
> http://twitter.com/jpartogi
>
