For #2 you could pipe through wc -l to get the answer:

    sort -n keys.txt | uniq | wc -l
But both examples are just refinements of iterate.

#1 is just a distributed iterate.

#2 is just an optimized iterate based on knowledge of the on-disk format (and may give inaccurate results... tombstones...)

On 28 March 2011 14:16, Or Yanay <o...@peer39.com> wrote:
> I use one of two ways to achieve that:
> 1. run a map reduce. Pig is really helpful in these cases. Make sure you run
> your MR using Hadoop task tracker on your nodes - or your performance will
> take a hit.
> 2. dump all keys using sstablekeys script from relevant files on all
> machines and count unique values. I do that using "sort -n keys.txt |uniq >>
> unique_keys.txt"
>
> Dumping all keys is much faster but less elegant and can be more annoying if
> you want to do that from your application.
>
> Hope that does the trick for you.
> -Orr
>
> -----Original Message-----
> From: Joshua Partogi [mailto:joshua.j...@gmail.com]
> Sent: Monday, March 28, 2011 2:39 PM
> To: user@cassandra.apache.org
> Subject: Re: newbie question: how do I know the total number of rows of a cf?
>
> Not all NoSQL is like that. Or perhaps the term NoSQL has become vague
> these days.
>
> On Mon, Mar 28, 2011 at 6:16 PM, Stephen Connolly
> <stephen.alan.conno...@gmail.com> wrote:
>> iterate.
>>
>> otherwise if that will be too slow and you will do it often, the nosql way
>> is to create a separate column family updated with each row add/delete to
>> hold the answer for you.
>>
>> - Stephen
>>
>> On 28 Mar 2011 07:40, "Sheng Chen" <chensheng2...@gmail.com> wrote:
>>> Hi all,
>>> I want to know how many records I am holding in Cassandra, just like
>>> count(*) in sql.
>>> What can I do ? Thank you.
>>>
>>> Sheng
>
> --
> http://twitter.com/jpartogi
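
For reference, here is a rough Python sketch of the key-dump count discussed in this thread (option 2). It assumes the sstablekeys output from each node has already been saved to a plain-text file with one key per line. The file names and the count_unique_keys helper are made up for illustration and are not part of Cassandra. It does the same job as sort | uniq | wc -l, and shares the caveat mentioned above: the dump reflects what is on disk, so the result may be inaccurate (tombstones).

```python
import os
import tempfile


def count_unique_keys(paths):
    """Return the number of distinct non-empty lines across all files in paths."""
    seen = set()
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                key = line.strip()
                if key:  # skip blank lines
                    seen.add(key)
    return len(seen)


if __name__ == "__main__":
    # Hypothetical dumps from two nodes; the same key can show up in both.
    with tempfile.TemporaryDirectory() as tmp:
        dumps = {"node1_keys.txt": "a\nb\nc\n", "node2_keys.txt": "b\nc\nd\n"}
        paths = []
        for name, body in dumps.items():
            path = os.path.join(tmp, name)
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(body)
            paths.append(path)
        print(count_unique_keys(paths))  # prints 4
```

The set keeps every distinct key in memory, so for very large dumps the sort-based pipeline is likely the safer choice.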