Re: newbie question: how do I know the total number of rows of a cf?

2011-03-29 Thread Sheng Chen
Thanks all.

2011/3/28 Stephen Connolly 

> for #2 you could pipe through wc -l to get the answer
>
> sort -n keys.txt | uniq | wc -l
>
> but both examples are just refinements of iterate.
>
> #1 is just a distributed iterate
> #2 is just an optimized iterate based on knowledge of the on-disk
> format (and may give inaccurate results... tombstones...)
>
> On 28 March 2011 14:16, Or Yanay  wrote:
> > I use one of two ways to achieve that:
> >  1. run a map reduce. Pig is really helpful in these cases. Make sure you
> run your MR using Hadoop task tracker on your nodes - or your performance
> will take a hit.
> >  2. dump all keys using sstablekeys script from relevant files on all
> machines and count unique values. I do that using "sort -n  keys.txt |uniq
> >> unique_keys.txt"
> >
> > Dumping all keys is much faster but less elegant and can be more annoying
> if you want to do that from your application.
> >
> > Hope that does the trick for you.
> > -Orr
> >
> > -Original Message-
> > From: Joshua Partogi [mailto:joshua.j...@gmail.com]
> > Sent: Monday, March 28, 2011 2:39 PM
> > To: user@cassandra.apache.org
> > Subject: Re: newbie question: how do I know the total number of rows of a
> cf?
> >
> > Not all NoSQL is like that. Or perhaps the term NoSQL has become vague
> > these days.
> >
> > On Mon, Mar 28, 2011 at 6:16 PM, Stephen Connolly
> >  wrote:
> >> iterate.
> >>
> >> otherwise if that will be too slow and you will do it often, the nosql
> way
> >> is to create a separate column family updated with each row add/delete
> to
> >> hold the answer for you.
> >>
> >> - Stephen
> >>
> >> ---
> >> Sent from my Android phone, so random spelling mistakes, random nonsense
> >> words and other nonsense are a direct result of using swype to type on
> the
> >> screen
> >>
> >> On 28 Mar 2011 07:40, "Sheng Chen"  wrote:
> >>> Hi all,
> >>> I want to know how many records I am holding in Cassandra, just like
> >>> count(*) in sql.
> >>> What can I do ? Thank you.
> >>>
> >>> Sheng
> >>
> >
> >
> >
> > --
> > http://twitter.com/jpartogi
> >
>
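For anyone who wants to try the plain "iterate" approach from this thread, here
is a minimal sketch using the Hector client (the column family name and page
size are assumptions, and as Stephen notes, tombstoned rows can inflate the
count):

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.OrderedRows;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.RangeSlicesQuery;

    public class RowCounter {
        public static long countRows(Keyspace ks, String columnFamily) {
            StringSerializer ss = StringSerializer.get();
            long count = 0;
            String start = "";
            while (true) {
                RangeSlicesQuery<String, String, String> q =
                    HFactory.createRangeSlicesQuery(ks, ss, ss, ss);
                q.setColumnFamily(columnFamily).setKeys(start, "").setRowCount(1000)
                 .setReturnKeysOnly(); // keys are enough for counting
                OrderedRows<String, String, String> rows = q.execute().get();
                count += rows.getCount();
                if (rows.getCount() < 1000) break; // last page
                // the first key of the next page duplicates the last key of this page
                start = rows.peekLast().getKey();
                count--;
            }
            return count;
        }
    }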


[ANN] Mojo's Cassandra Maven Plugin 0.7.4-1 released

2011-03-29 Thread Stephen Connolly
Hi,

The Mojo team is pleased to announce the release of Mojo's Cassandra
Maven Plugin version 0.7.4-1.

Mojo's Cassandra Plugin is used when you want to install and control a
test instance of Apache Cassandra from within your Apache Maven build.

The plugin has the following goals.

  * cassandra:start Starts up a test instance of Cassandra in the background.
  * cassandra:stop Stops the test instance of Cassandra that was started
using cassandra:start.
  * cassandra:run Starts up a test instance of Cassandra in the foreground.
  * cassandra:load Runs a cassandra-cli script against the test instance
of Cassandra.
  * cassandra:repair Runs nodetool repair against the test instance of
Cassandra.
  * cassandra:flush Runs nodetool flush against the test instance of Cassandra.
  * cassandra:compact Runs nodetool compact against the test instance of
Cassandra.
  * cassandra:cleanup Runs nodetool cleanup against the test instance of
Cassandra.
  * cassandra:delete Deletes the test instance of Cassandra.

http://mojo.codehaus.org/cassandra-maven-plugin/

To use this version, simply specify the version in your project's
plugin configuration:

<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>cassandra-maven-plugin</artifactId>
  <version>0.7.4-1</version>
</plugin>

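As a usage illustration (not part of the announcement), a common pattern is to
bind the start and stop goals to the integration-test phases, something like:

  <plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>cassandra-maven-plugin</artifactId>
    <version>0.7.4-1</version>
    <executions>
      <execution>
        <id>start-cassandra</id>
        <phase>pre-integration-test</phase>
        <goals><goal>start</goal></goals>
      </execution>
      <execution>
        <id>stop-cassandra</id>
        <phase>post-integration-test</phase>
        <goals><goal>stop</goal></goals>
      </execution>
    </executions>
  </plugin>

Running 'mvn integration-test' will then bring a test Cassandra instance up and
down around your tests.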
Release Notes - Mojo's Cassandra Maven Plugin - Version 0.7.4-1

** Improvement
* [MCASSANDRA-6] - Upgrade to Cassandra 0.7.4

Enjoy,

The Mojo team.

Apache, Apache Maven, Apache Cassandra, Maven and Cassandra are
trademarks of The Apache Software Foundation.


Compaction doubles disk space

2011-03-29 Thread Sheng Chen
I use the 'nodetool compact' command to start a compaction.
I can understand that extra disk space is required during the compaction,
but after the compaction, the extra space is not released.

Before compaction:
SSTable count: 10
space used (live): 19G
space used (total): 21G

After compaction:
SSTable count: 1
space used (live): 19G
space used (total): 42G


BTW, given that compaction requires double the disk space, does it mean that I
should never fill more than half of my total disk space?
e.g. if I have 505GB of data on a 1TB disk, I cannot even delete any data at all.


Re: Compaction doubles disk space

2011-03-29 Thread Sheng Chen
From a previous thread on the same topic: I forced a GC and the extra
space was released.

What about my second question?




2011/3/29 Sheng Chen 

> I use the 'nodetool compact' command to start a compaction.
> I can understand that extra disk space is required during the compaction,
> but after the compaction, the extra space is not released.
>
> Before compaction:
> SSTable count: 10
> space used (live): 19G
> space used (total): 21G
>
> After compaction:
> sstable count: 1
> space used (live): 19G
> space used (total): 42G
>
>
> BTW, given that compaction requires double the disk space, does it mean that I
> should never fill more than half of my total disk space?
> e.g. if I have 505GB of data on a 1TB disk, I cannot even delete any data at
> all.
>
>
>
>
>
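For reference, the full GC can also be forced without JConsole, over JMX; a
minimal sketch (localhost and the 0.7-era default JMX port 8080 are assumptions
to adjust for your nodes):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class ForceGc {
        public static void main(String[] args) throws Exception {
            // assumes JMX is reachable on the node; 0.7 ships with port 8080 by default
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                // same as pressing the "gc" button on java.lang:type=Memory in JConsole
                mbs.invoke(new ObjectName("java.lang:type=Memory"), "gc", null, null);
            } finally {
                jmxc.close();
            }
        }
    }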


Re: Compaction doubles disk space

2011-03-29 Thread Sylvain Lebresne
> BTW, given that compaction requires double the disk space, does it mean that I
> should never fill more than half of my total disk space?
> e.g. if I have 505GB of data on a 1TB disk, I cannot even delete any data at all.

It is not so black and white. What is true is that in practice, reaching
half the disk should be a first alert, from which you should start to
monitor things more carefully to avoid problems.

There are two kinds of compaction, major and minor. The major ones are the
ones that compact all the sstables for a given column family. Minor
compactions are the ones that are triggered automatically and regularly. By
definition they don't compact everything and thus don't need half your disk
space. Note however that over time, even minor compactions will require a
fair amount of disk space and could very well require as much as half the
disk space, but in practice it won't happen all the time.

The other thing is that even a major compaction only has to be applied to
one column family at a time. So unless you have only one CF, or 90% of your
data in one CF (and for the record, there's nothing wrong with that, it's
just not necessarily your case), you won't need exactly half your disk for
a compaction.

All this to say that it is not as simple as: you've reached half your disk
space so you are necessarily doomed. Chances are you'll never hit any
problem until you're, say, 70% full (or more). But there is no foolproof
number here, so as I said earlier, hitting 50% should be a first sign that
you may need a plan for the future.

--
Sylvain


Re: balance between concurrent_[reads|writes] and feeding/reading threads i clients

2011-03-29 Thread aaron morton
The concurrent_reads and concurrent_writes set the number of threads in the 
relevant thread pools. You can view the number of active and queued tasks using 
nodetool tpstats. 

The thread pool uses a blocking linked list for its work queue, with a max size 
of Integer.MAX_VALUE, so its size is essentially unbounded. When (internode) 
messages are received by a node they are queued into the relevant thread pool 
for processing. When (certain) messages are executed, the node checks the send 
time of the message and will not process it if it is more than rpc_timeout 
(typically 10 seconds) old. This is where the "dropped messages" log messages 
come from.  

The coordinator will wait up to rpc_timeout for the CL number of nodes to 
respond. So if say one node is under severe load and cannot process the read in 
time, but the others are ok a request at QUORUM would probably succeed. However 
if a number of nodes are getting a beating the co-ordinator may time out 
resulting in the client getting a TimedOutException. 

For the read path it's a little more touchy. Only the "nearest" node is sent a 
request for the actual data; the others are asked for a digest of the data they 
would return. So if the "nearest" node is the one under load and times out, the 
request will time out even if CL nodes returned. That's what the DynamicSnitch 
is there for: a node under load would be less likely to be considered the 
"nearest" node. 

The read and write thread pools are really just dealing with reading and 
writing data on the local machine. Your request moves through several other 
threads / thread pools: connection thread, outbound TCP pool, inbound TCP pool 
and message response pool. The SEDA paper referenced on this page was the model 
for using thread pools to manage access to resources 
http://wiki.apache.org/cassandra/ArchitectureInternals

In summary, don't worry about it unless you see the thread pools backing up and 
messages being dropped. 
 
Hope that helps
Aaron

On 28 Mar 2011, at 19:55, Terje Marthinussen wrote:

> Hi, 
> 
> I was pondering about how the concurrent_read and write settings balances 
> towards max read/write threads in clients.
> 
> Lets say we have 3 nodes, and concurrent read/write set to 8.
> That is, 8*3=24 threads for reading and writing.
> 
> Replication factor is 3.
> 
> Lets say we have clients that in total set up 16 connections to each node.
> 
> Now all the clients write at the same time. Since the replication factor is 
> 3, you could get up to 16*3=48 concurrent write requests per node (which 
> need to be handled by 8 threads)?
> 
> What is the result if this load continues?
> Could you see that replication of data fails (at least initially) causing all 
> kinds of fun timeouts around in the system?
> 
> Same on the read side. 
> If all clients read at the same time with Consistency level QUORUM, you get 
> 16*2 read requests in best case (and more in worst case)?
> 
> Could you see that one node answers, but another one times out due to lack of 
> read threads, causing read repair which again further degrades?
> 
> How does this queue up internally between thrift, gossip and the threads 
> doing the actual read and writes? 
> 
> Regards,
> Terje
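To automate the tpstats check Aaron describes above, the same counters can be
read over JMX; a rough sketch (the MBean name is an assumption for 0.7; browse
the org.apache.cassandra domains in jconsole to confirm the exact names for
your version):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class TpStatsCheck {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                // assumed name for the read stage pool; MutationStage is the write side
                ObjectName readStage =
                    new ObjectName("org.apache.cassandra.request:type=ReadStage");
                long pending = ((Number) mbs.getAttribute(readStage, "PendingTasks")).longValue();
                int active = ((Number) mbs.getAttribute(readStage, "ActiveCount")).intValue();
                System.out.println("ReadStage active=" + active + " pending=" + pending);
            } finally {
                jmxc.close();
            }
        }
    }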



Re: Help on how to configure an off-site DR node.

2011-03-29 Thread aaron morton
Be aware that at RF 2 the Quorum is 2, so you cannot afford to lose a replica 
when working at Quorum. 3 is really the starting point if you want some 
redundancy. 

If you want to get your data offsite how about doing snapshots and moving them 
off site http://wiki.apache.org/cassandra/Operations#Consistent_backups

The guide from Data Stax will give you a warm failover site, which sounds a bit 
more than what you need.  

Hope that helps. 
Aaron

On 28 Mar 2011, at 22:47, Brian Lycett wrote:

> Hello.
> 
> I'm setting up a cluster that has three nodes in our production rack.
> My intention is to have a replication factor of two for this.
> For disaster recovery purposes, I need to have another node (or two?)
> off-site.
> 
> The off-site node is entirely for the purpose of having an offsite
> backup of the data - no clients will connect to it.
> 
> My question is, is it possible to configure Cassandra so that the
> offsite node will have a full copy of the data set?
> That is, somehow guarantee that a replica of all data will be written to
> it, but without having to resort to an ALL consistency level for writes?
> Although the offsite node will be on a 20Mbit leased line, I'd rather not
> have the risk that the link goes down and breaks the cluster.
> 
> I've seen this suggestion here:
> http://www.datastax.com/docs/0.7/operations/datacenter#disaster
> but that configuration is vulnerable to the link breaking, and uses four
> nodes in the offsite location.
> 
> 
> Regards,
> 
> Brian
> 
> 



Re: Compaction doubles disk space

2011-03-29 Thread Karl Hiramoto
Would it be possible to improve the current compaction disk space issue 
by compacting only a few SSTables at a time and then immediately 
deleting the old ones?  Looking at the logs, it seems like deletions of 
old SSTables are taking longer than necessary.


--
Karl


improving speed/space for repair/compact - Big O notation

2011-03-29 Thread Karl Hiramoto

Can someone roughly advise the Big O() complexity of repair and compact as a
function of the number of keys in a CF?


Is it advisable to partition data into more Column Families and 
Keyspaces to improve repair and compact performance?


Thanks
--
Karl



Re: Problem about freeing space after a major compaction

2011-03-29 Thread aaron morton
Cassandra will request a GC to free compacted SSTables if there is not 
sufficient space to write an SSTable or perform a compaction. 

Aaron

On 29 Mar 2011, at 02:15, Roberto Bentivoglio wrote:

> Thank you again, we're going to update our environment.
> 
> Regards,
> Roberto
> 
> On 28 March 2011 17:08, Ching-Cheng Chen  wrote:
> 
> AFAIK, setting gc_grace_period to 0 shouldn't cause this issue.   In fact, 
> that's what I'm using now in a single-node environment like yours.
> 
> However, I'm using 0.7.2 with some patches.   If you are still using 0.7.0, 
> most likely you got hit with this bug.
> You might want to patch it or upgrade to latest release.
> 
> https://issues.apache.org/jira/browse/CASSANDRA-2059
> 
> Regards,
> 
> Chen
> Senior Developer, EvidentSoftware(Leaders in Monitoring of NoSQL & JAVA )
> http://www.evidentsoftware.com
> 
> On Mon, Mar 28, 2011 at 11:04 AM, Roberto Bentivoglio 
>  wrote:
> Hi Chen,
> we've set the gc grace period of the column families to 0 as suggested in a 
> single-node environment.
> Can this setting cause the problem? I don't think so...
> 
> Thanks,
> Roberto
> 
> On 28 March 2011 16:54, Ching-Cheng Chen  wrote:
> tombstones removal also depends on your gc grace period setting.
> 
> If you are pretty sure that you have proper gc grace period set and still on 
> 0.7.0, then probably related to this bug.
> 
> https://issues.apache.org/jira/browse/CASSANDRA-2059
> 
> Regards,
> 
> Chen
> Senior Developer, EvidentSoftware(Leaders in Monitoring of NoSQL & JAVA )
> http://www.evidentsoftware.com
> 
> On Mon, Mar 28, 2011 at 10:40 AM, Roberto Bentivoglio 
>  wrote:
> Hi all,
> we're working on a Cassandra 0.7.0 production environment with a data store 
> of nearly 500 GB.
> We need to periodically remove the tombstones from deleted/expired data 
> by performing a major compaction through nodetool.
> After invoking the compaction on a single column family we can see from 
> JConsole that the LiveSSTableCount goes from 15 to 3 while the 
> LiveDiskSpaceUsed goes from 90GB to 50GB.
> The problem now is that the space on the file system is still held by 
> Cassandra (I assume by the old SSTables) and isn't freed. We have tried 
> to perform a full GC from JConsole as described in 
> http://wiki.apache.org/cassandra/MemtableSSTable without any success. The 
> space is freed only after a database restart.
>
> How can we free this disk space without restarting the db?
> 
> Thanks you very much,
> Roberto Bentivoglio
> 
> 
> 
> 



Re: design cassandra issue client when moving from version 0.6.* to 0.7.3

2011-03-29 Thread aaron morton
There should only be one active request on the socket at a time. Otherwise 
things could get confused on the server side. 

Also is there a reason you are not calling CassandraClient::multiget_slice ?

Aaron

On 29 Mar 2011, at 10:59, Anurag Gujral wrote:

> Hi All,
>  I am currently porting a cassandra c++ client from 0.6.* to 0.7.3.
> The c++ client in 0.6.* used the function
> conn->client->send_multiget_slice, which took a cseqid parameter.
> The signature of the function in 0.6.* was
> void CassandraClient::send_multiget_slice(const std::string& keyspace, const 
> std::vector<std::string>& keys, const ColumnParent& column_parent, const 
> SlicePredicate& predicate, const ConsistencyLevel consistency_level, const 
> int32_t cseqid)
> 
> 
> In case send_multiget_slice did not return success, the code used to wait on 
> the socket by calling select, and read the data when it became available 
> using recv_multiget_slice, provided the cseqid passed to 
> send_multiget_slice was the same as that in the call to 
> recv_multiget_slice.
> 
> In Cassandra 0.7.3 the functions send_multiget_slice and recv_multiget_slice 
> don't take cseqid as a parameter.
> 
> How can I achieve the 0.6.* behaviour in 0.7.3?
> 
> Please Suggest
> Thanks
> Anurag
> 



Re: New committer Sylvain Lebresne

2011-03-29 Thread aaron morton
Congratulations Sylvain

On 29 Mar 2011, at 11:47, Jake Luciani wrote:

> Great job, well deserved Sylvain!
> 
> On Mon, Mar 28, 2011 at 4:33 PM, Jonathan Ellis  wrote:
> The Cassandra PMC has voted to add Sylvain as a committer.
> 
> Welcome, Sylvain, and thanks for the hard work!
> 
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
> 
> 
> 
> -- 
> http://twitter.com/tjake



Re: Problem about freeing space after a major compaction

2011-03-29 Thread Roberto Bentivoglio
Hi Aaron,
we had tried invoking a full GC on Cassandra without any success.
The space is still used.

Regards,
Roberto

On 29 March 2011 13:12, aaron morton  wrote:

> Cassandra will request a GC to free compacted SSTables if there is not
> sufficient space to write an SSTable or perform a compaction.
>
> Aaron
>
> On 29 Mar 2011, at 02:15, Roberto Bentivoglio wrote:
>
> Thank you again, we're going to update our environment.
>
> Regards,
> Roberto
>
> On 28 March 2011 17:08, Ching-Cheng Chen wrote:
>
>>
>> AFAIK, setting gc_grace_period to 0 shouldn't cause this issue.   In fact,
>> that's what I'm using now in a single-node environment like yours.
>>
>> However, I'm using 0.7.2 with some patches.   If you are still using
>> 0.7.0, most likely you got hit with this bug.
>> You might want to patch it or upgrade to latest release.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-2059
>>
>> Regards,
>>
>> 
>> Chen
>> Senior Developer, EvidentSoftware(Leaders in Monitoring of NoSQL & JAVA )
>> http://www.evidentsoftware.com
>>
>> On Mon, Mar 28, 2011 at 11:04 AM, Roberto Bentivoglio <
>> roberto.bentivog...@gmail.com> wrote:
>>
>>> Hi Chen,
>>> we've set the gc grace period of the column families to 0 as suggested in a
>>> single-node environment.
>>> Can this setting cause the problem? I don't think so...
>>>
>>> Thanks,
>>> Roberto
>>>
>>> On 28 March 2011 16:54, Ching-Cheng Chen wrote:
>>>
>>>> tombstones removal also depends on your gc grace period setting.
>>>>
>>>> If you are pretty sure that you have a proper gc grace period set and
>>>> are still on 0.7.0, then it is probably related to this bug.
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-2059
>>>>
>>>> Regards,
>>>>
>>>> Chen
>>>> Senior Developer, EvidentSoftware (Leaders in Monitoring of NoSQL & JAVA)
>>>> http://www.evidentsoftware.com
>>>>
>>>> On Mon, Mar 28, 2011 at 10:40 AM, Roberto Bentivoglio <
>>>> roberto.bentivog...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>> we're working on a Cassandra 0.7.0 production environment with a data
>>>>> store of nearly 500 GB.
>>>>> We need to periodically remove the tombstones from deleted/expired data
>>>>> by performing a major compaction through nodetool.
>>>>> After invoking the compaction on a single column family we can see from
>>>>> JConsole that the LiveSSTableCount goes from 15 to 3 while the
>>>>> LiveDiskSpaceUsed goes from 90GB to 50GB.
>>>>> The problem now is that the space on the file system is still held by
>>>>> Cassandra (I assume by the old SSTables) and isn't freed. We have tried
>>>>> to perform a full GC from JConsole as described in
>>>>> http://wiki.apache.org/cassandra/MemtableSSTable without any success.
>>>>> The space is freed only after a database restart.
>>>>>
>>>>> How can we free this disk space without restarting the db?
>>>>>
>>>>> Thank you very much,
>>>>> Roberto Bentivoglio
>>>>
>>>
>>
>
>


Re: Help on how to configure an off-site DR node.

2011-03-29 Thread Brian Lycett
Hi.

Cheers for your reply.

Unfortunately there's too much data for snapshots to be practical.  The
data set will be at least 400GB initially, and the offsite node will be
on a 20Mbit leased line.

However I don't need the consistency level to be quorum for read/writes
in the production cluster, so am I right in still assuming that a
replication factor of 2 in a three node cluster allows for one node to
die without data loss?

If that's the case, I still don't understand how to ensure that the
offsite node will get a copy of the whole data set.
I've read through the O'Reilly book, and that doesn't seem to address
this scenario (unless I still don't get the Cassandra basics at a
fundamental level).

Does anyone know any tutorials/examples of such a set-up that would help
me out?

Cheers,

Brian



On Tue, 2011-03-29 at 21:56 +1100, aaron morton wrote:
> Be aware that at RF 2 the Quorum is 2, so you cannot afford to lose a
> replica when working at Quorum. 3 is really the starting point if you
> want some redundancy. 
> 
> 
> If you want to get your data offsite how about doing snapshots and
> moving them off
> site http://wiki.apache.org/cassandra/Operations#Consistent_backups
> 
> 
> The guide from Data Stax will give you a warm failover site, which
> sounds a bit more than what you need.  
> 
> 
> Hope that helps. 
> Aaron
> 
> 
> On 28 Mar 2011, at 22:47, Brian Lycett wrote:
> 
> > Hello.
> > 
> > I'm setting up a cluster that has three nodes in our production
> > rack.
> > My intention is to have a replication factor of two for this.
> > For disaster recovery purposes, I need to have another node (or
> > two?)
> > off-site.
> > 
> > The off-site node is entirely for the purpose of having an offsite
> > backup of the data - no clients will connect to it.
> > 
> > My question is, is it possible to configure Cassandra so that the
> > offsite node will have a full copy of the data set?
> > That is, somehow guarantee that a replica of all data will be
> > written to
> > it, but without having to resort to an ALL consistency level for
> > writes?
> > Although the offsite node will be on a 20Mbit leased line, I'd rather
> > not
> > have the risk that the link goes down and breaks the cluster.
> > 
> > I've seen this suggestion here:
> > http://www.datastax.com/docs/0.7/operations/datacenter#disaster
> > but that configuration is vulnerable to the link breaking, and uses
> > four
> > nodes in the offsite location.
> > 
> > 
> > Regards,
> > 
> > Brian
> > 
> > 
> > 
> 
> 




Two column families or One super column family?

2011-03-29 Thread T Akhayo
Good afternoon,

I'm making my data model from scratch for cassandra, this means I can tune
and fine-tune it for performance.

At this time I'm having problems choosing between 2 column families or 1
super column family. I will illustrate with an example.

Sector: this defines a place; it has one or two properties.
Entry: an entry that is bound to a sector; this is simply some text and a few
properties.

I can model this with a super column family:

sectors{ //super column family
sector1{
uid1{
text: a text
user: joop
}
uid2{
text: more text
user: piet
}
}
sector2{
uid10{
text: even more text
user: marie
}
}
}

But i can also model this with 2 column families:

sectors{ // column family
sector1{
textid1: null
textid2: null
}
sector2{
textid4: null
}
}

texts{ //column family
textid1{
text: a text
user: joop
}
textid2{
text: more text
user: piet
}
}

With the super column family I can retrieve a list of texts for a specific
sector with only 1 request to cassandra.

With the 2 column families I need to send 2 requests to cassandra:
1. give me all textids from sector x. (returns x, y, z)
2. give me all texts that have id x, y, z.

In my final application it is likely that there will be a bit more writes
compared to reads.

I was wondering what the best approach is when it comes to performance. I
suspect that using super column families is slower compared to using column
families, but is it still slower when using 2 column families with 2
requests to cassandra instead of 1 (with the super column family)?

Kind regards,
T. Akhayo


International language implementations

2011-03-29 Thread A J
Can someone list some of the current international language
implementations of cassandra ?

Thanks.


NegativeArraySizeException during upgrade from 0.7.0 to 0.7.4

2011-03-29 Thread Wenjun Che
I am getting a NegativeArraySizeException at startup, even after I ran compact
on all keyspaces.  Cassandra exits after the exception so I can't try "nodetool
scrub".
There is just one node.

java.lang.NegativeArraySizeException
at
org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:280)
at
org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:219)
at
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:472)
at
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:453)
at org.apache.cassandra.db.Table.initCf(Table.java:317)
at org.apache.cassandra.db.Table.<init>(Table.java:254)
at org.apache.cassandra.db.Table.open(Table.java:110)
at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:207)
at
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:127)
at
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:314)
at
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79)


Thanks


How to determine if repair need to be run

2011-03-29 Thread mcasandra
Is there a way to monitor and tell if one of the nodes requires repair? For eg:
a node was down and came back up, but in the meantime HHs were dropped. Only
if we are really careful in all such scenarios would we avoid problems :) but
in general, when things are going awry, you might forget about running repair
or other commands until there is a customer impact.

Is there a way to monitor and alert on such things like repair?



Re: How to determine if repair need to be run

2011-03-29 Thread Peter Schuller
> Is there a way to monitor and tell if one of the nodes requires repair? For eg:
> a node was down and came back up, but in the meantime HHs were dropped. Only
> if we are really careful in all such scenarios would we avoid problems :) but
> in general, when things are going awry, you might forget about running repair
> or other commands until there is a customer impact.
>
> Is there a way to monitor and alert on such things like repair?

You should always run repair as required by GCGraceSeconds, unless you
really really know what you're doing. So no, there is no 'needs
repair' state. See
http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair

(What *would* be useful perhaps is to be able to ask a node for the
time of its most recently started repair, to facilitate easier
comparison with GCGraceSeconds for monitoring purposes.)

-- 
/ Peter Schuller


Re: How to determine if repair need to be run

2011-03-29 Thread mcasandra
Yes, but that doesn't really provide the monitoring that would really be
helpful. If I don't realize it for 2 days then we could potentially be
returning inconsistent results, or not have data in sync for 2 days until
repair is run. It would be best to be able to monitor these things so that
repair can be run as soon as it is required (e.g. node down). Having such
monitoring will be helpful for the operations team, who may not know all the
internals of cassandra.



Re: How to determine if repair need to be run

2011-03-29 Thread Peter Schuller
> Yes, but that doesn't really provide the monitoring that would really be
> helpful. If I don't realize it for 2 days then we could potentially be
> returning inconsistent results, or not have data in sync for 2 days until
> repair is run. It would be best to be able to monitor these things so that
> repair can be run as soon as it is required (e.g. node down). Having such
> monitoring will be helpful for the operations team, who may not know all the
> internals of cassandra.

For the purpose of this discussion, nodes are always down in any
non-trivial time window. You may have flapping in the ring, individual
requests may time out, etc. Do not assume repair is not required just
because you have not had some kind of major outage where a human
became consciously aware that a node was officially "down".

Unless you really know what you're doing, the thing to monitor is the
completion of repairs at sufficient frequency. In the event that
repair *doesn't* run, there needs to be enough time left until
tombstone expiry for someone to take some kind of action (whether that
is re-running repair or re-configuring gcgraceseconds
temporarily is another matter).

Repair is not something that you only run in the event of some major
issue; repair is a regularly scheduled operation for your typical
cluster.

The invariant required by Cassandra is that repairs complete prior to
tombstones expiring (see URL in previous e-mail). Some applications,
given some combination of consistency levels, use-case and
requirements, may benefit from more frequent repair. But the important
part, is the minimum repair frequency mandated by Cassandra - and
determined by GCGraceSeconds.

-- 
/ Peter Schuller


Re: How to determine if repair need to be run

2011-03-29 Thread mcasandra
Thanks! I was keeping the discussion simple. But you make my case stronger
that we need such monitoring, since it looks like it should always be run and
we want to run it as soon as it is required.



Re: How to determine if repair need to be run

2011-03-29 Thread Peter Schuller
> Thanks! I was keeping the discussion simple. But you make my case stronger
> that we need such monitoring, since it looks like it should always be run and
> we want to run it as soon as it is required.

The way to deal with individual requests timing out or transient
flapping, is to use a consistency level which is appropriate for your
application along with an appropriately configured level of read
repair.

If you *require* that reads see writes, use QUORUM. If you only softly
require it for "99.x% of cases" or similar, use CL.ONE with read
repair turned on. If requirements are very lax, maybe use CL.ONE with
read repair turned off or set very low (only useful for the
performance improvement it will imply relative to full read repair).

Running nodetool repair as soon as a single write times out to some
node, is not the way to go (ok, I can think of situations where it
might be - but those would be very very obscure cases unless I am
overlooking something).

Bottom line: If you want a flag that is set to true whenever some node
ever may have dropped a write, that functionality currently does not
exist. It may be possible to add, but I would be skeptical as to it
being committed unless a clear need can be shown. Maybe if you
describe your situation we can better agree on what is appropriate.

For monitoring that repair does happen within desired time periods,
there *is* a clear need, and exposing something like a
time-of-start-of-last-successful-repair would be helpful I think, but
it doesn't currently exist (as far as I know), so the script (or
whatever) doing the repairs would have to solve that problem.

-- 
/ Peter Schuller


Re: How to determine if repair need to be run

2011-03-29 Thread mcasandra
I think my problem is that I don't want to have to remember to run repair. I
want to know from cassandra that I "need" to run repair "now". This seems
like important functionality that needs to be there. I don't really want to
find out the hard way that I forgot to run "repair" :)

Say Nodes A, B, C. Now A is inconsistent and needs repair. Now Node B goes
down. Even with Quorum this will fail reads and writes. There could be other
scenarios. It looks like repair is a critical command that is expected to be
run, but "when"? Saying once within GCGraceSeconds might be OK for some, but
not for everyone, where we want to bring all nodes in sync ASAP.



Re: NegativeArraySizeException during upgrade from 0.7.0 to 0.7.4

2011-03-29 Thread Jonathan Ellis
Remove the cache file.

On Tue, Mar 29, 2011 at 11:44 AM, Wenjun Che  wrote:
>
> even after I ran compact on all keyspaces.  Cassandra exits after the
> exception so I can't try "nodetool scrub".
> There is just one node.
>
> java.lang.NegativeArraySizeException
>     at
> org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:280)
>     at
> org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:219)
>     at
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:472)
>     at
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:453)
>     at org.apache.cassandra.db.Table.initCf(Table.java:317)
>     at org.apache.cassandra.db.Table.<init>(Table.java:254)
>     at org.apache.cassandra.db.Table.open(Table.java:110)
>     at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:207)
>     at
> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:127)
>     at
> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:314)
>     at
> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79)
>
>
> Thanks
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


client connection timeouts vs. thrift timeouts

2011-03-29 Thread David Hawthorne
I've been scratching my head on this one for a day now and I'm hoping someone 
can help clear it up.

The initial question was: does it make sense to have a configurable connection 
timeout (for a client connecting to a cassandra server) separate from the 
thrift socket timeout (which governs *all* interactions)?

The context is that I've written a java middleware (using hector with 
connection pooling, talking to a round-robin vip that will do healthchecking 
and will also need a connection timeout configured) for a php frontend to talk 
to, which sits between cassandra and the frontend.  I'd like to let the 
frontend know that the cassandra service is down fairly quickly so it can move 
on to other logic sooner rather than later, and I don't want it to have to wait 
however long I've set the thrift socket timeout to be.  The feedback I got 
initially was that I would run into problems with high load, and could run into 
delays when cassandra gets overwhelmed.

Does this make sense or am I just looking at this from the wrong angle?

Re: How to determine if repair need to be run

2011-03-29 Thread Peter Schuller
First some specifics:

> I think my problem is that I don't want to have to remember to run repair. I

You are not expected to remember to do so manually. Typically periodic
repairs would be automated in some fashion, such as by having a cron
job on each node that starts the repair. Typically some kind of logic
may be applied to avoid running repair on all nodes at the same time.

> want to know from cassandra that I "need" to run repair "now". This seems
> like important functionality that needs to be there. I don't really want to
> find out the hard way that I forgot to run "repair" :)

See further below.

> Say Nodes A, B, C. Now A is inconsistent and needs repair. Now Node B goes
> down. Even with Quorum this will fail reads and writes. There could be other

With writes and reads at QUORUM, a read following a write is
guaranteed to see the write. If enough nodes are down such that QUORUM
is not satisfied, the read operation will fail. Node B going down
above is not a problem. If your RF is 3, a write would have been
required to succeed on A and B, or B and C, or A and C. Since reads
have the same requirement, there is by definition always overlap
between the read set and write set. This is the fundamental point of
using QUORUM.

> scenarios. It looks like repair is a critical command that is expected to be
> run, but "when"? Saying once within GCGraceSeconds might be OK for some, but
> not for everyone, where we want to bring all nodes in sync ASAP.

Let me try to put it in a different light.

The reasons to use 'nodetool repair' seem to fall roughly into two categories:

(a) Ensuring that 'nodetool repair' has been run within GCGraceSeconds.
(b) Helping to increase the 'average' level of consistency as observed
by the application.

These two cases are very very different. Cassandra makes certain
promises on data consistency, that clients can control in part by
consistency levels. If (a) fails, such that a 'nodetool repair' was
not run in time, the cluster will behave *incorrectly*. It will fail
to satisfy the guarantees that it supposedly promises. This is
essentially a binary condition; either you run nodetool repair as
often as is required for correct functioning, or you don't. This is a
*hard* requirement, but is entirely irrelevant until you actually
reach the limit imposed by GCGraceSeconds. There is no need to run
'repair' as soon as possible (for some definition of 'soon as
possible') in order to satisfy (a). You're 100% fine until you're not,
at which time you've caused Cassandra to violate its guarantees. So -
it's *important* to run repair due to (a), but it is not *urgent* to
do so.

(b) on the other hand is very different. Assuming your application and
cluster is one that wants to run repair more often than GCGraceSeconds
for whatever reason (for example, for performance you want to use
CL.ONE and turn off read-repair, but your data set is such that it's
practical to use pretty frequent repairs to keep inconsistencies
down), it may be beneficial to do so. But this is essentially a soft
'preference' for how often repairs should be run; there is no magic
limit at which something breaks where it did not break before. This
becomes a matter of setting a reasonable repair frequency for your use
case, and an individual node perhaps failing a repair once for some
obscure reason is not an issue.

For (b), you should be fine just triggering repair sufficiently often
as appropriate with no need to even have strict monitoring or demands.
Almost by definition the requirements are not strict; if they were
stricter, you should be using QUORUM or maybe ONE + read repair. So in
this case, "remembering" is not a problem - you just install your
cronjob that does it often enough, approximately, and don't worry
about it.

For (a), there is the hard requirement. So this is where you *really*
want it completing, and preferably have some kind of
alarm/notification if a repair doesn't run in time.

Note that for (b), it doesn't help to know the moment a write didn't
get replicated fully. That's bound to happen often (every time a node
is restarted, there is some short hiccup, etc). A single write failing
to replicate is an almost irrelevant event.

For (a) on the other hand, it *is* helpful and required to keep track
of the time of the last successful repair. Cassandra could be better
at making this easier I think, but it is an entirely different problem
than detecting that "somewhere in the cluster, a non-zero amount of
writes may possibly have failed to replicate". The former is directly
relevant and important; the latter is almost always completely
irrelevant to the problem at hand.

Sorry to be harping on the same issue, but I really think it's worth
trying to be clear about this from the start :) If you do have a
use-case that somehow truly is not consistent with the above, it would
however be interesting to hear what it is.

Is the above clear? I'm thinking maybe it's worth adding to the FAQ
unless it's more confusing than helpful.

Re: How to determine if repair need to be run

2011-03-29 Thread mcasandra
Looks like you didn't get to see my updated post :) This is the scenario I
was referring to:

Say Nodes A, B, C. Now A is inconsistent and needs repair. Now after a day
Node B goes down and comes up. Now both nodes are inconsistent. Even with
Quorum this will fail reads and writes by returning inconsistent results
when A & B are used as the Quorum.

So we need to remember to run repair on A ASAP. But if we rely on doing it
at some point later, we run into issues as stated above. I really think this
needs to be beefed up with monitoring. It will save developers from having
to do operational stuff, and the operations team can easily monitor and run
commands to reduce customer impact.

Is it really complicated to expose this monitoring in Cassandra? Could there
be a configuration parameter that defines the threshold, or a monitor for
whether the number of inconsistent writes keeps increasing?



OOM in compaction - cassandra 0.7.4

2011-03-29 Thread Marek Żebrowski
Hi, I am getting a repeatable OOM during compaction:

ERROR [CompactionExecutor:1] 2011-03-29 14:52:29,193
AbstractCassandraDaemon.java (line 112) Fatal exception in thread
Thread[CompactionExecutor:1,1,main]
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at 
org.apache.cassandra.utils.ByteBufferUtil.write(ByteBufferUtil.java:237)
at 
org.apache.cassandra.utils.ByteBufferUtil.writeWithLength(ByteBufferUtil.java:230)
at 
org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
at 
org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:35)
at 
org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFamilySerializer.java:87)
at 
org.apache.cassandra.db.ColumnFamilySerializer.serializeWithIndexes(ColumnFamilySerializer.java:106)
at 
org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:97)
at 
org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:147)
at 
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108)
at 
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43)
at 
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at 
org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
at 
org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
at 
org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:449)
at 
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:124)
at 
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:94)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

-- 
Marek Żebrowski


Re: client connection timeouts vs. thrift timeouts

2011-03-29 Thread Narendra Sharma
I think it makes sense to have two different timeouts. The client timeout
clearly affects the user's experience with the application. A timeout could be
due to a number of factors not directly related to the thrift connection, and
the client timeout could trigger a retry of the operation on a different
instance of the application.

-Naren

On Tue, Mar 29, 2011 at 12:16 PM, David Hawthorne  wrote:

> I've been scratching my head on this one for a day now and I'm hoping
> someone can help clear it up.
>
> The initial question was: does it make sense to have a configurable
> connection timeout (for a client connecting to a cassandra server) separate
> from the thrift socket timeout (which governs *all* interactions)?
>
> The context is that I've written a java middleware (using hector with
> connection pooling, talking to a round-robin vip that will do healthchecking
> and will also need a connection timeout configured) for a php frontend to
> talk to, which sits between cassandra and the frontend.  I'd like to let the
> frontend know that the cassandra service is down fairly quickly so it can
> move on to other logic sooner rather than later, and I don't want it to have
> to wait however long I've set the thrift socket timeout to be.  The feedback
> I got initially was that I would run into problems with high load, and could
> run into delays when cassandra gets overwhelmed.
>
> Does this make sense or am I just looking at this from the wrong angle?




-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*
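For what it's worth, here is a sketch of how the two knobs might look on the
Hector side of such a middleware (host, cluster name, and values are
illustrative only):

    import me.prettyprint.cassandra.service.CassandraHostConfigurator;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.factory.HFactory;

    public class PoolSetup {
        public static Cluster connect() {
            CassandraHostConfigurator conf =
                new CassandraHostConfigurator("cassandra-vip:9160");
            // per-operation thrift socket timeout (ms): bounds how long any call may block
            conf.setCassandraThriftSocketTimeout(10000);
            // fail fast when the pool has no free connection, instead of waiting
            // the full socket timeout
            conf.setMaxWaitTimeWhenExhausted(500);
            return HFactory.getOrCreateCluster("prod-cluster", conf);
        }
    }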


Re: How to determine if repair need to be run

2011-03-29 Thread Peter Schuller
> Looks like you didn't get to see my updated post :) This is the scenario I
> was referring to:

I don't see what's different. If you write at QUORUM and read at
QUORUM, your read is guaranteed to see a previous write, period. If
that cannot be satisfied, the read will fail due to not being able to
satisfy QUORUM.

What is the actual sequence of events that you are worried about? B
going down is not a problem. *By definition*, any two quorums of nodes
overlap by at least one node. Either a read succeeds and you
see the data, or there is no quorum and the read fails.
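(To spell the overlap out: with RF=3 the quorum size is floor(3/2)+1 = 2, the
possible quorums over {A, B, C} are {A,B}, {A,C} and {B,C}, and any two of
those share at least one node, so every quorum read intersects every quorum
write.)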

-- 
/ Peter Schuller


Re: How to determine if repair need to be run

2011-03-29 Thread mcasandra
So from what I am understanding, there is no need to monitor this and no
need to remember to run repair? If that's the case then manual repair would
never be needed, correct?

But if manual repair is needed, then shouldn't there be the ability to
monitor it? Having dealt with production problems I know how handy such
monitoring can be. In our case there is generally a development team and
then an operations team. The operations team is generally monitoring the
system and may not know about different scenarios like GCGracePeriod.

I would say for anything manual that someone needs to do, there should be
some ability to monitor it in order to have successful operations. And
nothing better than if Cassandra can expose that ability. Otherwise we will
need yet another way of reminding ourselves to run repair :)



Re: How to determine if repair need to be run

2011-03-29 Thread Peter Schuller
> So from what I am understanding, there is no need to monitor this and no
> need to remember to run repair? If that's the case then manual repair would
> never be needed, correct?

No. See my next-to-last e-mail where I go through two reasons to run
nodetool repair, of which (a) is absolutely required in order for
Cassandra to deliver on the consistency guarantees it promises.

At this point I'm not sure how to rephrase/clarify because I'm not
really clear on where we're not understanding each other.

-- 
/ Peter Schuller


Re: International language implementations

2011-03-29 Thread Peter Schuller
> Can someone list some of the current international language
> implementations of cassandra ?

What is an "international language implementation of Cassandra"?

-- 
/ Peter Schuller


Re: International language implementations

2011-03-29 Thread Sasha Dolgy
I store multiple languages in a cf if this is what you are on about...
On Mar 29, 2011 5:42 PM, "Peter Schuller" 
wrote:
>> Can someone list some of the current international language
>> implementations of cassandra ?
>
> What is an "international language implementation of Cassandra"?
>
> --
> / Peter Schuller


Re: International language implementations

2011-03-29 Thread A J
Example: taobao.com is a Chinese online bidding site. All data is Chinese
and they use MongoDB successfully.
Are there similar installations of cassandra where the data is non-Latin?

I know in theory it should all work, as cassandra has full UTF-8
support. But unless there are real implementations, you cannot be sure
of the issues related to storing, sorting, etc.

Regards.


On Tue, Mar 29, 2011 at 5:41 PM, Peter Schuller
 wrote:
>> Can someone list some of the current international language
>> implementations of cassandra ?
>
> What is an "international language implementation of Cassandra"?
>
> --
> / Peter Schuller
>


Re: Help on how to configure an off-site DR node.

2011-03-29 Thread aaron morton
Snapshots use hard links and do not take additional disk space 
http://www.mail-archive.com/user@cassandra.apache.org/msg11028.html

WRT losing a node, it's not the total number of nodes that's important, it's 
the number of replicas. If you have 3 nodes with RF 2 and you lose one of the 
replicas you will not be able to work at Quorum level. 

You *may* be able to use the NetworkTopologyStrategy to have 2 replicas in DC1 
and 1 replica in DC2, then use the property file snitch to only put one node in 
the second DC (see the topology file sketch after this message). Finally, work 
against DC1 with LOCAL_QUORUM so you do not wait on DC2 and you can tolerate 
the link to DC2 failing. That also means there is no guarantee DC2 is up to 
date. If you were to ship snapshots you would have a better idea of what you 
had in DC2. 

FWIW I'm not convinced that setting things up so that one node gets *all* the 
data in DC2 is a good idea. It would make an offsite replica that could only 
work at essentially CL ONE, and would require a lot of streaming to move to a 
cluster with more nodes. I don't have time right now to think through all of 
the implications (I may be able to do some more thinking tonight), but the 
data stax guide creates a warm failover that is ready to work. I'm not sure 
what this approach would give you in case of failure: a backup to be restored 
or a failover installation. 

Hope that helps. 
Aaron


On 30 Mar 2011, at 00:38, Brian Lycett wrote:

> Hi.
> 
> Cheers for your reply.
> 
> Unfortunately there's too much data for snapshots to be practical.  The
> data set will be at least 400GB initially, and the offsite node will be
> on a 20Mbit leased line.
> 
> However I don't need the consistency level to be quorum for read/writes
> in the production cluster, so am I right in still assuming that a
> replication factor of 2 in a three node cluster allows for one node to
> die without data loss?
> 
> If that's the case, I still don't understand how to ensure that the
> offsite node will get a copy of the whole data set.
> I've read through the O'Reilly book, and that doesn't seem to address
> this scenario (unless I still don't get the Cassandra basics at a
> fundamental level).
> 
> Does anyone know any tutorials/examples of such a set-up that would help
> me out?
> 
> Cheers,
> 
> Brian
> 
> 
> 
> On Tue, 2011-03-29 at 21:56 +1100, aaron morton wrote:
>> Be aware that at RF 2 the Quorum is 2, so you cannot afford to lose a
>> replica when working at Quorum. 3 is really the starting point if you
>> want some redundancy. 
>> 
>> 
>> If you want to get your data offsite how about doing snapshots and
>> moving them off
>> site http://wiki.apache.org/cassandra/Operations#Consistent_backups
>> 
>> 
>> The guide from Data Stax will give you a warm failover site, which
>> sounds a bit more than what you need.  
>> 
>> 
>> Hope that helps. 
>> Aaron
>> 
>> 
>> On 28 Mar 2011, at 22:47, Brian Lycett wrote:
>> 
>>> Hello.
>>> 
>>> I'm setting up a cluster that has three nodes in our production
>>> rack.
>>> My intention is to have a replication factor of two for this.
>>> For disaster recovery purposes, I need to have another node (or
>>> two?)
>>> off-site.
>>> 
>>> The off-site node is entirely for the purpose of having an offsite
>>> backup of the data - no clients will connect to it.
>>> 
>>> My question is, is it possible to configure Cassandra so that the
>>> offsite node will have a full copy of the data set?
>>> That is, somehow guarantee that a replica of all data will be
>>> written to
>>> it, but without having to resort to an ALL consistency level for
>>> writes?
>>> Although the offsite node will be on a 20Mbit leased line, I'd rather
>>> not
>>> have the risk that the link goes down and breaks the cluster.
>>> 
>>> I've seen this suggestion here:
>>> http://www.datastax.com/docs/0.7/operations/datacenter#disaster
>>> but that configuration is vulnerable to the link breaking, and uses
>>> four
>>> nodes in the offsite location.
>>> 
>>> 
>>> Regards,
>>> 
>>> Brian
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
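To make the property file snitch part concrete: the snitch reads a
cassandra-topology.properties file mapping node addresses to data centre and
rack. A sketch with made-up addresses (the DC names must match those used in
the keyspace's strategy options):

    # conf/cassandra-topology.properties (PropertyFileSnitch)
    # production rack, three nodes
    192.168.1.10=DC1:RAC1
    192.168.1.11=DC1:RAC1
    192.168.1.12=DC1:RAC1
    # single off-site DR node
    10.20.30.40=DC2:RAC1
    # unknown nodes fall back to this
    default=DC1:RAC1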



Re: How to determine if repair need to be run

2011-03-29 Thread mcasandra
I think what I feel is that there needs to be a "repair is required" flag in
order for a team to manage the cluster.

At a minimum, is there a flag somewhere that tells whether repair was run
within GCGracePeriod?



Re: How to determine if repair need to be run

2011-03-29 Thread Peter Schuller
> I think what I feel is that there needs to be a "repair is required" flag in
> order for a team to manage the cluster.

And again, repair is always required essentially. You should *always*
run it within the necessary period as determined by GCGraceSeconds.

> At a minimum, is there a flag somewhere that tells whether repair was run
> within GCGracePeriod?

No, and it's not what you want either, since by the time that flag
says "false", it's already too late :) This is why my best suggestion
for a simple improvement would be to expose the time since the last
successful repair.

Currently this information is, to my knowledge, not exposed by
Cassandra so it is the responsibility of your deployment strategy to
monitor for this. One simple version (not to be used as-is) might be:

  set -e # important
  touch /path/to/flagfile.tmp
  nodetool -h localhost repair
  mv /path/to/flagfile.tmp /path/to/flagfile

The mtime of /path/to/flagfile is the indicator of when repair
succeeded last, assuming a recent version of Cassandra where 'nodetool
repair' is blocking.

The key point is: What you want to monitor is the time since the last
successful repair. If that time grows beyond some triggering high water
mark, someone needs to be informed, because you are X hours away from
violating the requirements imposed by GCGraceSeconds.

(Cassandra could make this easier, but just be clear on what it is
that you're actually looking for. You're *not* looking for "has a
write been timed out ever in the cluster", but rather "are we closer
to GCGraceSeconds than some threshold which we normally should never
reach if repairs are functioning and running as intended".)

-- 
/ Peter Schuller
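Building on the flag-file idea above, the monitoring side can be as simple as
comparing the file's mtime against GCGraceSeconds; a rough sketch (the path and
alert margin are placeholders, and 864000 seconds is the 0.7 default
gc_grace_seconds):

    import java.io.File;
    import java.util.concurrent.TimeUnit;

    public class RepairFreshnessCheck {
        public static void main(String[] args) {
            // touched by the wrapper script after each successful 'nodetool repair'
            File flag = new File("/path/to/flagfile");
            long gcGraceMillis = TimeUnit.SECONDS.toMillis(864000); // default gc_grace_seconds
            long alertMarginMillis = TimeUnit.DAYS.toMillis(2);     // leave time to react
            // lastModified() is 0 for a missing file, which makes the age huge
            long age = System.currentTimeMillis() - flag.lastModified();
            if (!flag.exists() || age > gcGraceMillis - alertMarginMillis) {
                System.err.println("ALERT: last successful repair was "
                    + TimeUnit.MILLISECONDS.toDays(age) + " days ago");
                System.exit(2); // non-zero exit for nagios-style checks
            }
        }
    }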


Re: Two column families or One super column family?

2011-03-29 Thread aaron morton
I would go with the solution that means you only have to make one request to 
serve your reads, so consider the super CF approach. 

There are some downsides to super columns (see 
http://wiki.apache.org/cassandra/CassandraLimitations) and they tend to have a 
love-them-or-hate-them reputation.

One thing to consider is that you do not need to model every attribute of your 
entity as a column in cassandra, especially if you are always going to pull 
back all the attributes. So you could take your super CF approach with a 
standard CF: just pack the columns into some sort of structure such as JSON 
and store them as a blob. 

Or you can use a naming scheme in the column names with a standard CF, e.g. 
uuid1.text and uuid2.text (see the sketch after the quoted message below). 

Hope that helps. 
Aaron

On 30 Mar 2011, at 01:05, T Akhayo wrote:

> Good afternoon,
> 
> I'm making my data model from scratch for Cassandra, which means I can tune
> and fine-tune it for performance.
>
> At this time I'm having problems choosing between 2 column families or 1
> super column family. I will illustrate with an example.
>
> Sector: this defines a place; it is one or two properties.
> Entry: an entry that is bound to a sector; it is simply some text and a few
> properties.
> 
> I can model this with a super column family:
> 
> sectors{ //super column family
> sector1{
> uid1{
> text: a text
> user: joop
> }
> uid2{
> text: more text
> user: piet
> }
> }
> sector2{
> uid10{
> text: even more text
> user: marie
> }
> }
> }
> 
> But i can also model this with 2 column families:
> 
> sectors{ // column family
> sector1{
> textid1: null
> textid2: null
> }
> sector2{
> textid4: null
> }
> }
> 
> texts{ //column family
> textid1{
> text: a text
> user: joop
> }
> textid2{
> text: more text
> user: piet
> }
> }
> 
> With the super column family I can retrieve a list of texts for a specific
> sector with only 1 request to Cassandra.
>
> With the 2 column families I need to send 2 requests to Cassandra:
> 1. give me all textids from sector x. (returns x, y, z)
> 2. give me all texts that have id x, y, z.
> 
> In my final application it is likely that there will be a bit more writes 
> compared to reads.
> 
> I was wondering what the best approach is when it comes to performance. I
> suspect that using super column families is slower compared to using column
> families, but is it still slower than using 2 column families with 2
> requests to Cassandra instead of 1 (with the super column family)?
> 
> Kind regards,
> T. Akhayo



Re: OOM in compaction - cassandra 0.7.4

2011-03-29 Thread Tyler Hobbs
You might want to lower your in-memory compaction limit
(in_memory_compaction_limit_in_mb in cassandra.yaml), but I would also
recommend checking your heap size and monitoring (with something like
jconsole) to see how much heap pressure there is.
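
If jconsole isn't convenient, a rough way to watch heap from a script -- this
assumes your 0.7.x nodetool's 'info' output includes a "Heap Memory" line,
which is worth verifying against your build:

  import subprocess
  import time

  # Poll nodetool every 30s and echo the heap line; crude, but enough to
  # spot sustained heap pressure around compactions.
  while True:
      out = subprocess.Popen(['nodetool', '-h', 'localhost', 'info'],
                             stdout=subprocess.PIPE).communicate()[0]
      for line in out.splitlines():
          if 'Heap Memory' in line:
              print time.strftime('%H:%M:%S'), line.strip()
      time.sleep(30)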

On Tue, Mar 29, 2011 at 3:36 PM, Marek Żebrowski wrote:

> Hi, I am getting repeatable OOM during compaction:
>
> ERROR [CompactionExecutor:1] 2011-03-29 14:52:29,193
> AbstractCassandraDaemon.java (line 112) Fatal exception in thread
> Thread[CompactionExecutor:1,1,main]
> java.lang.OutOfMemoryError: Java heap space
>at java.util.Arrays.copyOf(Arrays.java:2786)
>at
> java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
>at java.io.DataOutputStream.write(DataOutputStream.java:90)
>at
> org.apache.cassandra.utils.ByteBufferUtil.write(ByteBufferUtil.java:237)
>at
> org.apache.cassandra.utils.ByteBufferUtil.writeWithLength(ByteBufferUtil.java:230)
>at
> org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
>at
> org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:35)
>at
> org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFamilySerializer.java:87)
>at
> org.apache.cassandra.db.ColumnFamilySerializer.serializeWithIndexes(ColumnFamilySerializer.java:106)
>at
> org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:97)
>at
> org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:147)
>at
> org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108)
>at
> org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43)
>at
> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
>at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>at
> org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
>at
> org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
>at
> org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:449)
>at
> org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:124)
>at
> org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:94)
>at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>at java.lang.Thread.run(Thread.java:662)
>
> --
> Marek Żebrowski
>



-- 
Tyler Hobbs
Software Engineer, DataStax
Maintainer of the pycassa Cassandra Python client library


Ditching Cassandra

2011-03-29 Thread Gregori Schmidt
hi,

After using Cassandra during development for the past 8 months my team and I
made the decision to switch from Cassandra to MongoDB this morning.  I
thought I'd share some thoughts on why we did this and where Cassandra might
benefit from improvement.

   - The API is horrible and it produces pointlessly verbose code in
   addition to being utterly confusing.  EVERYTHING takes a lot of time to
   implement with Cassandra, and to be frank, it is incredibly tiring.  For
   this reason alone I no longer recommend Cassandra.  If you want an
   example, pick up the O'Reilly book on Cassandra and look through the
   examples.  Such MASSIVE amounts of code for doing nearly NOTHING.  This
   is ridiculous.  Didn't this strike anyone else as ridiculous?  It should
   have!
   - You need to have official client libraries and they need to be
   programmer friendly.  Yes, I know there are nice people maintaining a
   plethora of different libraries, but you need to man up and face reality:
   the chaos that is the Cassandra client space is a horrible mess.
   - It is buggy and the solution seems to be to just go to the next
   release.  And the next.  And the next.  Which would be okay if you could
   upgrade all the time, but what to do once you hit production?

I would recommend that everyone interested in improving Cassandra take the
day off, download MongoDB and read
https://github.com/karlseguin/the-little-mongodb-book . Then, while you are
downloading, unpacking, looking at what was in the JAR, reading the book and
pawing through the examples: _pay attention_ to the neatness and the
effortlessness, the ease with which you can use MongoDB.  Then spend the rest
of the day implementing something on top of it to gain some hacking
experience.

No, really.  Do it.  This is important.  You need to connect with the user
and you need to understand what you ought to be aspiring to.

In any case, thanks for all the effort that went into Cassandra.  I will
check back from time to time and perhaps in a year or so it'll be time to
re-evaluate Cassandra.

PS: one last thing.  It took us less time to rewrite the DB-interface for
our system to MongoDB AND port over our data than it took to write the
Cassandra implementation.

~G


Re: Ditching Cassandra

2011-03-29 Thread Drew Kutcharian
Hi Gregori,

I'm about to start a new project and I was considering using MongoDB too, but I
just couldn't find a nice way to scale it. It seems that for scaling you need
to use the same style as MySQL, with masters/slaves and replicas, which for us
was a deal breaker. We just couldn't see how you would scale MongoDB to support
the massive databases that you can reach with Cassandra/HBase.

I personally think that's where Cassandra shines, and if you don't need that
massive scale, then there are a lot of nicer solutions out there.

How do you scale MongoDB to store massive amounts of data?

- Drew





On Mar 29, 2011, at 5:11 PM, Gregori Schmidt wrote:

> hi,
> 
> After using Cassandra during development for the past 8 months my team and I 
> made the decision to switch from Cassandra to MongoDB this morning.  I 
> thought I'd share some thoughts on why we did this and where Cassandra might 
> benefit from improvement.
> The API is horrible and it produces pointlessly verbose code in addition to 
> being utterly confusing.  EVERYTHING takes a lot of time to implement with 
> Cassandra, and to be frank, it is incredibly tiring.  For this reason alone I 
> no longer recommend Cassandra.  If you want an example, pick up the O'Reilly 
> book on Cassandra and look through the examples.  Such MASSIVE amounts of 
> code for doing nearly NOTHING.  This is ridiculous.  Didn't this strike 
> anyone else as ridiculous?  It should have!
> You need to have official client libraries and they need to be programmer 
> friendly.  Yes, I know there are nice people maintaining a plethora of 
> different libraries, but you need to man up and face reality:  the chaos that 
> is the Cassandra client space is a horrible mess.
> It is buggy and the solution seems to be to just go to the next release.  And 
> the next.  And the next.  Which would be okay if you could upgrade all the 
> time, but what to do once you hit production?
> I would recommend that everyone interested in improving Cassandra take the 
> day off,  download MongoDB and read 
> https://github.com/karlseguin/the-little-mongodb-book . Then, while you are 
> downloading, unpacking, looking at what was in the JAR, reading the book and 
> pawing through the examples: _pay attention_ to the neatness and the 
> effortlessness the ease with which you can use MongoDB.  Then spend the rest 
> of the day implementing something on top of it to gain some hacking 
> experience.
> 
> No, really.  Do it.  This is important.  You need to connect with the user 
> and you need to understand what you ought to be aspiring to.
> 
> In any case, thanks for all the effort that went into Cassandra.  I will 
> check back from time to time and perhaps in a year or so it'll be time to 
> re-evaluate Cassandra.
> 
> PS: one last thing.  It took us less time to rewrite the DB-interface for our 
> system to MongoDB AND port over our data than it took to write the Cassandra 
> implementation.
> 
> ~G



Re: Ditching Cassandra

2011-03-29 Thread Eric Evans
On Wed, 2011-03-30 at 02:11 +0200, Gregori Schmidt wrote:
>- The API is horrible and it produces pointlessly verbose code in
>addition to being utterly confusing.  EVERYTHING takes a lot of
> time to implement with Cassandra, and to be frank, it is incredibly
> tiring.  For this reason alone I no longer recommend Cassandra.  If
> you want an example, pick up the O'Reilly book on Cassandra and look
> through the examples.  Such MASSIVE amounts of code for doing nearly
> NOTHING.  This is ridiculous. Didn't this strike anyone else as
> ridiculous?  It should have!

Yes, it did, which is why for 0.8 we have CQL
(https://svn.apache.org/viewvc/cassandra/trunk/doc/cql/CQL.html?view=co).

>- You need to have official client libraries and they need to be
>programmer friendly.  Yes, I know there are nice people maintaining
> a plethora of different libraries, but you need to man up and face
> reality: the chaos that is the Cassandra client space is a horrible
> mess.

The client space as a whole *is* a mess, despite heroic efforts on the
part of our third-party API maintainers, but forcing them in-tree is not
going to solve anything.  In fact, it would very likely make it worse by
adding unnecessary overhead to contribution, and discouraging
innovation.

The root cause goes back to your first point, the RPC interface is
baroque, and too tightly coupled to Cassandra's internals.  The
third-party library maintainers can only do so much to paper over that;
The Fail shines through.

The solution here is the same as for point #1 above, CQL.  And, the idea
is to include in-tree "drivers", basically, the minimum amount of common
code that all third-party libs would need to implement (think connection
pooling, parameter substitution, etc).  We already have drivers for
Java, Python, and Twisted, and folks are working on PHP and Perl (that I'm
aware of).
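
For a feel of what that looks like from the Python driver, here is a rough
sketch (CQL is still settling, so treat the statement syntax and the
connect() signature as approximate, and the keyspace/CF/key names are all
made up):

  import cql

  # DB-API-style usage against a local node's Thrift port.
  conn = cql.connect('localhost', 9160)
  cursor = conn.cursor()
  cursor.execute("USE MyKeyspace;")
  cursor.execute("UPDATE users SET 'name' = 'jsmith' WHERE KEY = 'user1';")
  cursor.execute("SELECT 'name' FROM users WHERE KEY = 'user1';")
  print cursor.fetchone()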

>- It is buggy and the solution seems to be to just go to the next
>release.  And the next.  And the next.  Which would be okay if you
> could upgrade all the time, but what to do once you hit production?

0.7 has been a rough ride, no doubt.  We spent too much time pushing in
too many features, and didn't do a good job of drawing a line in the
sand when it came time to release.  Our track record prior to 0.7 was
Not Horrible, and trending toward Better And Better, and we've made some
adjustment to the release process, so I'm hopeful we'll get back on
track.

Also, new for 0.8 is backward compatible messaging, which will allow you
to smoothly perform rolling (non-disruptive) upgrades.  That, combined
with a stable query interface (CQL), will really reduce the barrier to
upgrades.

> I would recommend that everyone interested in improving Cassandra take
> the day off,  download MongoDB and read
> https://github.com/karlseguin/the-little-mongodb-book . Then, while
> you are downloading, unpacking, looking at what was in the JAR,
> reading the book and pawing through the examples: _pay attention_ to
> the neatness and the effortlessness the ease with which you can use
> MongoDB.  Then spend the rest of the day implementing something on top
> of it to gain some hacking experience.
> 
> No, really.  Do it.  This is important.  You need to connect with the
> user and you need to understand what you ought to be aspiring to.
> 
> In any case, thanks for all the effort that went into Cassandra.  I
> will check back from time to time and perhaps in a year or so it'll be
> time to re-evaluate Cassandra.

In a year we'll have achieved Total World Domination. :)

> PS: one last thing.  It took us less time to rewrite the DB-interface
> for our system to MongoDB AND port over our data than it took to write
> the Cassandra implementation. 
-- 
Eric Evans
eev...@rackspace.com



Re: Ditching Cassandra

2011-03-29 Thread Jake Luciani
Hi Gregori,

What language *were* you using to interact with Cassandra? Were you unable
to find a wrapper API that you found usable?

We have discussed adopting the "best of" client APIs in Cassandra, but we
decided it's better for the community to develop them naturally.  I think
this has also motivated Eric to develop CQL in response to the folks who
find the Thrift API hard to use.

-Jake

On Tue, Mar 29, 2011 at 8:11 PM, Gregori Schmidt  wrote:

> hi,
>
> After using Cassandra during development for the past 8 months my team and
> I made the decision to switch from Cassandra to MongoDB this morning.  I
> thought I'd share some thoughts on why we did this and where Cassandra might
> benefit from improvement.
>
>- The API is horrible and it produces pointlessly verbose code in
>addition to being utterly confusing.  EVERYTHING takes a lot of time to
>implement with Cassandra, and to be frank, it is incredibly tiring.  For
>this reason alone I no longer recommend Cassandra.  If you want an example,
>pick up the O'Reilly book on Cassandra and look through the examples.  Such
>MASSIVE amounts of code for doing nearly NOTHING.  This is ridiculous.
> Didn't this strike anyone else as ridiculous?  It should have!
>- You need to have official client libraries and they need to be
>programmer friendly.  Yes, I know there are nice people maintaining a
>plethora of different libraries, but you need to man up and face reality:
> the chaos that is the Cassandra client space is a horrible mess.
>- It is buggy and the solution seems to be to just go to the next
>release.  And the next.  And the next.  Which would be okay if you could
>upgrade all the time, but what to do once you hit production?
>
> I would recommend that everyone interested in improving Cassandra take the
> day off,  download MongoDB and read
> https://github.com/karlseguin/the-little-mongodb-book . Then, while you
> are downloading, unpacking, looking at what was in the JAR, reading the book
> and pawing through the examples: _pay attention_ to the neatness and the
> effortlessness the ease with which you can use MongoDB.  Then spend the rest
> of the day implementing something on top of it to gain some hacking
> experience.
>
> No, really.  Do it.  This is important.  You need to connect with the user
> and you need to understand what you ought to be aspiring to.
>
> In any case, thanks for all the effort that went into Cassandra.  I will
> check back from time to time and perhaps in a year or so it'll be time to
> re-evaluate Cassandra.
>
> PS: one last thing.  It took us less time to rewrite the DB-interface for
> our system to MongoDB AND port over our data than it took to write the
> Cassandra implementation.
>
> ~G
>



-- 
http://twitter.com/tjake


Re: Ditching Cassandra

2011-03-29 Thread Colin
Eric,

Seems like the answer to everything is 8. 

8 has been very painful.

Are you saying that 8 will or will not be compatible with 7?

If not, would you recommend waiting until 8?  We have done an awful lot of
work, have an awful lot of work left, and have become very frustrated.

Any idea on when 8 will be available?
 

On Mar 29, 2011, at 8:15 PM, Eric Evans  wrote:

> On Wed, 2011-03-30 at 02:11 +0200, Gregori Schmidt wrote:
>>   - The API is horrible and it produces pointlessly verbose code in
>>   addition to being utterly confusing.  EVERYTHING takes a lot of
>> time to implement with Cassandra, and to be frank, it is incredibly
>> tiring.  For this reason alone I no longer recommend Cassandra.  If
>> you want an example, pick up the O'Reilly book on Cassandra and look
>> through the examples.  Such MASSIVE amounts of code for doing nearly
>> NOTHING.  This is ridiculous. Didn't this strike anyone else as
>> ridiculous?  It should have!
> 
> Yes, it did, which is why for 0.8 we have CQL
> (https://svn.apache.org/viewvc/cassandra/trunk/doc/cql/CQL.html?view=co).
> 
>>   - You need to have official client libraries and they need to be
>>   programmer friendly.  Yes, I know there are nice people maintaining
>> a plethora of different libraries, but you need to man up and face
>> reality: the chaos that is the Cassandra client space is a horrible
>> mess.
> 
> The client space as a whole *is* a mess, despite heroic efforts on the
> part of our third-party API maintainers, but forcing them in-tree is not
> going to solve anything.  In fact, it would very likely make it worse by
> adding unnecessary overhead to contribution, and discouraging
> innovation.
> 
> The root cause goes back to your first point, the RPC interface is
> baroque, and too tightly coupled to Cassandra's internals.  The
> third-party library maintainers can only do so much to paper over that;
> The Fail shines through.
> 
> The solution here is the same as for point #1 above, CQL.  And, the idea
> is to include in-tree "drivers", basically, the minimum amount of common
> code that all third-party libs would need to implement (think connection
> pooling, parameter substitution, etc).  We already have drivers for
> Java, Python, and Twisted, and folks are working PHP and Perl (that I'm
> aware of).
> 
>>   - It is buggy and the solution seems to be to just go to the next
>>   release.  And the next.  And the next.  Which would be okay if you
>> could upgrade all the time, but what to do once you hit production?
> 
> 0.7 has been a rough ride, no doubt.  We spent too much time pushing in
> too many features, and didn't do a good job of drawing a line in the
> sand when it came time to release.  Our track record prior to 0.7 was
> Not Horrible, and trending toward Better And Better, and we've made some
> adjustment to the release process, so I'm hopeful we'll get back on
> track.
> 
> Also, new for 0.8 is backward compatible messaging, which will allow you
> to smoothly perform rolling (non-disruptive) upgrades.  That, combined
> with a stable query interface (CQL), will really reduce the barrier to
> upgrades.
> 
>> I would recommend that everyone interested in improving Cassandra take
>> the day off,  download MongoDB and read
>> https://github.com/karlseguin/the-little-mongodb-book . Then, while
>> you are downloading, unpacking, looking at what was in the JAR,
>> reading the book and pawing through the examples: _pay attention_ to
>> the neatness and the effortlessness the ease with which you can use
>> MongoDB.  Then spend the rest of the day implementing something on top
>> of it to gain some hacking experience.
>> 
>> No, really.  Do it.  This is important.  You need to connect with the
>> user and you need to understand what you ought to be aspiring to.
>> 
>> In any case, thanks for all the effort that went into Cassandra.  I
>> will check back from time to time and perhaps in a year or so it'll be
>> time to re-evaluate Cassandra.
> 
> In a year we'll have achieved Total World Domination. :)
> 
>> PS: one last thing.  It took us less time to rewrite the DB-interface
>> for our system to MongoDB AND port over our data than it took to write
>> the Cassandra implementation. 
> -- 
> Eric Evans
> eev...@rackspace.com
> 


Re: Compaction doubles disk space

2011-03-29 Thread Sheng Chen
Yes.
I think at least we can remove the tombstones for each sstable first, and
then do the merge.

2011/3/29 Karl Hiramoto 

> Would it be possible to improve the current compaction disk space issue by
> compacting only a few SSTables at a time, then immediately deleting the
> old ones?  Looking at the logs it seems like deletions of old SSTables are
> taking longer than necessary.
>
> --
> Karl
>


Re: Ditching Cassandra

2011-03-29 Thread Edward Capriolo
On Tue, Mar 29, 2011 at 9:56 PM, Colin  wrote:
> Eric,
>
> Seems like the answer to everything is 8.
>
> 8 has been very painful.
>
> Are you saying that 8 will or not be compatible with 7?
>
> If not, would you recommend waiting until 8?  We have done an awful lot of 
> work, have an awful  lot of work left, and have become very frustrated.
>
> Any idea on when 8 will be available?
>
>
> On Mar 29, 2011, at 8:15 PM, Eric Evans  wrote:
>
>> On Wed, 2011-03-30 at 02:11 +0200, Gregori Schmidt wrote:
>>>   - The API is horrible and it produces pointlessly verbose code in
>>>   addition to being utterly confusing.  EVERYTHING takes a lot of
>>> time to implement with Cassandra, and to be frank, it is incredibly
>>> tiring.  For this reason alone I no longer recommend Cassandra.  If
>>> you want an example, pick up the O'Reilly book on Cassandra and look
>>> through the examples.  Such MASSIVE amounts of code for doing nearly
>>> NOTHING.  This is ridiculous. Didn't this strike anyone else as
>>> ridiculous?  It should have!
>>
>> Yes, it did, which is why for 0.8 we have CQL
>> (https://svn.apache.org/viewvc/cassandra/trunk/doc/cql/CQL.html?view=co).
>>
>>>   - You need to have official client libraries and they need to be
>>>   programmer friendly.  Yes, I know there are nice people maintaining
>>> a plethora of different libraries, but you need to man up and face
>>> reality: the chaos that is the Cassandra client space is a horrible
>>> mess.
>>
>> The client space as a whole *is* a mess, despite heroic efforts on the
>> part of our third-party API maintainers, but forcing them in-tree is not
>> going to solve anything.  In fact, it would very likely make it worse by
>> adding unnecessary overhead to contribution, and discouraging
>> innovation.
>>
>> The root cause goes back to your first point, the RPC interface is
>> baroque, and too tightly coupled to Cassandra's internals.  The
>> third-party library maintainers can only do so much to paper over that;
>> The Fail shines through.
>>
>> The solution here is the same as for point #1 above, CQL.  And, the idea
>> is to include in-tree "drivers", basically, the minimum amount of common
>> code that all third-party libs would need to implement (think connection
>> pooling, parameter substitution, etc).  We already have drivers for
>> Java, Python, and Twisted, and folks are working PHP and Perl (that I'm
>> aware of).
>>
>>>   - It is buggy and the solution seems to be to just go to the next
>>>   release.  And the next.  And the next.  Which would be okay if you
>>> could upgrade all the time, but what to do once you hit production?
>>
>> 0.7 has been a rough ride, no doubt.  We spent too much time pushing in
>> too many features, and didn't do a good job of drawing a line in the
>> sand when it came time to release.  Our track record prior to 0.7 was
>> Not Horrible, and trending toward Better And Better, and we've made some
>> adjustment to the release process, so I'm hopeful we'll get back on
>> track.
>>
>> Also, new for 0.8 is backward compatible messaging, which will allow you
>> to smoothly perform rolling (non-disruptive) upgrades.  That, combined
>> with a stable query interface (CQL), will really reduce the barrier to
>> upgrades.
>>
>>> I would recommend that everyone interested in improving Cassandra take
>>> the day off,  download MongoDB and read
>>> https://github.com/karlseguin/the-little-mongodb-book . Then, while
>>> you are downloading, unpacking, looking at what was in the JAR,
>>> reading the book and pawing through the examples: _pay attention_ to
>>> the neatness and the effortlessness the ease with which you can use
>>> MongoDB.  Then spend the rest of the day implementing something on top
>>> of it to gain some hacking experience.
>>>
>>> No, really.  Do it.  This is important.  You need to connect with the
>>> user and you need to understand what you ought to be aspiring to.
>>>
>>> In any case, thanks for all the effort that went into Cassandra.  I
>>> will check back from time to time and perhaps in a year or so it'll be
>>> time to re-evaluate Cassandra.
>>
>> In a year we'll have achieved Total World Domination. :)
>>
>>> PS: one last thing.  It took us less time to rewrite the DB-interface
>>> for our system to MongoDB AND port over our data than it took to write
>>> the Cassandra implementation.
>> --
>> Eric Evans
>> eev...@rackspace.com
>>
>

While I respect your decision...
If you are tired of writing everything by hand, there are tools that work
around the boilerplate, like http://code.google.com/p/kundera/

This is verbose:
http://java.sun.com/developer/onlineTraining/Beans/EJBTutorial/step4.html
Or this:
http://www.roseindia.net/hibernate/firstexample.shtml

I am sure someone will soon generate all kinds of fancy "easy" stuff for
Cassandra, like an EJB or "Cassanbernate" layer, that will be much more
complex and less efficient than writing your own POJOs, but that everyone
will have wet dreams over.


Re: Ditching Cassandra

2011-03-29 Thread mcasandra
I am also interested in knowing when 8 will be released. Also, is there
someplace where we can read about the features that will be released in 8? It
looks like some major changes are going to come out.



Re: International language implementations

2011-03-29 Thread Edward Capriolo
On Tue, Mar 29, 2011 at 5:54 PM, A J  wrote:
> For example, taobao.com is a Chinese online bidding site. All of its data is
> Chinese and they use MongoDB successfully.
> Are there similar installations of Cassandra where the data is non-Latin?
>
> I know that in theory it should all work, as Cassandra has full UTF-8
> support. But unless there are real implementations, you cannot be sure
> of the issues related to storing, sorting, etc.
>
> Regards.
>
>
> On Tue, Mar 29, 2011 at 5:41 PM, Peter Schuller
>  wrote:
>>> Can someone list some of the current international language
>>> implementations of cassandra ?
>>
>> What is an "international language implementation of Cassandra"?
>>
>> --
>> / Peter Schuller
>>
>
Keyspace -> Java String
ColumnFamily -> Java String
Row key -> byte[]
Column name -> byte[]
Column value -> byte[]

So you can encode/store any type of data you like.
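
As a quick illustration (pycassa, hypothetical keyspace/CF names; assumes
BytesType or UTF8Type comparator/validators), non-Latin data is just UTF-8
bytes on the wire:

  # -*- coding: utf-8 -*-
  import pycassa

  pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
  users = pycassa.ColumnFamily(pool, 'Users')

  # Chinese row key, column name, and value, explicitly encoded as UTF-8.
  key = u'用户1'.encode('utf-8')
  users.insert(key, {u'名字'.encode('utf-8'): u'陈胜'.encode('utf-8')})
  print users.get(key)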

As for internationalization, I have not found any NadaSQL groups yet.


Data Modeling advise for Cassandra 0.8

2011-03-29 Thread Drew Kutcharian
I'm pretty new to Cassandra and I would like to get your advice on modeling.
The object model of the project that I'm working on will be pretty close to
Blogger, Tumblr, etc. (or any other blogging website), where you have Users,
each of which can have many Blogs, and each Blog can have many Comments. How
would you model this efficiently considering:

1) Be able to directly link to a User
2) Be able to directly link to a Blog
3) Be able to query and get all the Blogs for a User ordered by time created 
descending (new blogs first)
4) Be able to query and get all the Comments for each Blog ordered by time 
created ascending (old comments first)
5) Be able to link different Users to each other, as a network.
6) Have a well distributed hash so we don't end up with "hot" nodes, while the 
rest of the nodes are idle
7) It would be nice to show a User how many Blogs they have or how many 
comments are on a Blog, without iterating thru the whole dataset.

The target Cassandra version is 0.8, so we can use the secondary indexes. The
goal is to be very efficient, so no text keys; we were thinking of using
time-based 64-bit ids generated with Snowflake.

Thanks,

Drew

Revised: Data Modeling advise for Cassandra 0.8 (added #8)

2011-03-29 Thread Drew Kutcharian
I'm pretty new to Cassandra and I would like to get your advice on modeling.
The object model of the project that I'm working on will be pretty close to
Blogger, Tumblr, etc. (or any other blogging website), where you have Users,
each of which can have many Blogs, and each Blog can have many Comments. How
would you model this efficiently considering:

1) Be able to directly link to a User
2) Be able to directly link to a Blog
3) Be able to query and get all the Blogs for a User ordered by time created 
descending (new blogs first)
4) Be able to query and get all the Comments for each Blog ordered by time 
created ascending (old comments first)
5) Be able to link different Users to each other, as a network.
6) Have a well distributed hash so we don't end up with "hot" nodes, while the 
rest of the nodes are idle
7) It would be nice to show a User how many Blogs they have or how many 
comments are on a Blog, without iterating thru the whole dataset.
NEW: 8) Be able to query for the most recently added Blogs. For example, Blogs 
added today, this week, this month, etc.

The target Cassandra version is 0.8, so we can use the secondary indexes. The
goal is to be very efficient, so no text keys; we were thinking of using
time-based 64-bit ids generated with Snowflake.
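
To make #3 concrete, here is the sort of layout we were imagining (a rough
pycassa-style sketch with made-up names; it assumes a 'UserBlogs' CF with a
TimeUUIDType comparator so columns sort by creation time):

  import uuid
  import pycassa

  pool = pycassa.ConnectionPool('Blogger', ['localhost:9160'])
  user_blogs = pycassa.ColumnFamily(pool, 'UserBlogs')

  def add_blog(user_id, blog_id):
      # column name = TimeUUID (time-ordered), column value = the blog's id
      user_blogs.insert(user_id, {uuid.uuid1(): blog_id})

  def newest_blogs(user_id, count=10):
      # a reversed slice returns the newest columns first
      return user_blogs.get(user_id, column_count=count, column_reversed=True)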

Thanks,

Drew

RE: Ditching Cassandra

2011-03-29 Thread Colin
Edward,

My issue isn't in doing the work, I just don't want to do a lot of work if 8
is going to be out in a month or two.  That's just common sense.  Especially
if I can't upgrade an existing implementation without incurring undue risk.



-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Tuesday, March 29, 2011 9:58 PM
To: user@cassandra.apache.org
Subject: Re: Ditching Cassandra

On Tue, Mar 29, 2011 at 9:56 PM, Colin  wrote:
> Eric,
>
> Seems like the answer to everything is 8.
>
> 8 has been very painful.
>
> Are you saying that 8 will or not be compatible with 7?
>
> If not, would you recommend waiting until 8?  We have done an awful lot of
work, have an awful  lot of work left, and have become very frustrated.
>
> Any idea on when 8 will be available?
>
>
> On Mar 29, 2011, at 8:15 PM, Eric Evans  wrote:
>
>> On Wed, 2011-03-30 at 02:11 +0200, Gregori Schmidt wrote:
>>>   - The API is horrible and it produces pointlessly verbose code in
>>>   addition to being utterly confusing.  EVERYTHING takes a lot of 
>>> time to implement with Cassandra, and to be frank, it is incredibly 
>>> tiring.  For this reason alone I no longer recommend Cassandra.  If 
>>> you want an example, pick up the O'Reilly book on Cassandra and look 
>>> through the examples.  Such MASSIVE amounts of code for doing nearly 
>>> NOTHING.  This is ridiculous. Didn't this strike anyone else as 
>>> ridiculous?  It should have!
>>
>> Yes, it did, which is why for 0.8 we have CQL 
>> (https://svn.apache.org/viewvc/cassandra/trunk/doc/cql/CQL.html?view=co).
>>
>>>   - You need to have official client libraries and they need to be
>>>   programmer friendly.  Yes, I know there are nice people 
>>> maintaining a plethora of different libraries, but you need to man 
>>> up and face
>>> reality: the chaos that is the Cassandra client space is a horrible 
>>> mess.
>>
>> The client space as a whole *is* a mess, despite heroic efforts on 
>> the part of our third-party API maintainers, but forcing them in-tree 
>> is not going to solve anything.  In fact, it would very likely make 
>> it worse by adding unnecessary overhead to contribution, and 
>> discouraging innovation.
>>
>> The root cause goes back to your first point, the RPC interface is 
>> baroque, and too tightly coupled to Cassandra's internals.  The 
>> third-party library maintainers can only do so much to paper over 
>> that; The Fail shines through.
>>
>> The solution here is the same as for point #1 above, CQL.  And, the 
>> idea is to include in-tree "drivers", basically, the minimum amount 
>> of common code that all third-party libs would need to implement 
>> (think connection pooling, parameter substitution, etc).  We already 
>> have drivers for Java, Python, and Twisted, and folks are working PHP 
>> and Perl (that I'm aware of).
>>
>>>   - It is buggy and the solution seems to be to just go to the next
>>>   release.  And the next.  And the next.  Which would be okay if you 
>>> could upgrade all the time, but what to do once you hit production?
>>
>> 0.7 has been a rough ride, no doubt.  We spent too much time pushing 
>> in too many features, and didn't do a good job of drawing a line in 
>> the sand when it came time to release.  Our track record prior to 0.7 
>> was Not Horrible, and trending toward Better And Better, and we've 
>> made some adjustment to the release process, so I'm hopeful we'll get 
>> back on track.
>>
>> Also, new for 0.8 is backward compatible messaging, which will allow 
>> you to smoothly perform rolling (non-disruptive) upgrades.  That, 
>> combined with a stable query interface (CQL), will really reduce the 
>> barrier to upgrades.
>>
>>> I would recommend that everyone interested in improving Cassandra 
>>> take the day off,  download MongoDB and read 
>>> https://github.com/karlseguin/the-little-mongodb-book . Then, while 
>>> you are downloading, unpacking, looking at what was in the JAR, 
>>> reading the book and pawing through the examples: _pay attention_ to 
>>> the neatness and the effortlessness the ease with which you can use 
>>> MongoDB.  Then spend the rest of the day implementing something on 
>>> top of it to gain some hacking experience.
>>>
>>> No, really.  Do it.  This is important.  You need to connect with 
>>> the user and you need to understand what you ought to be aspiring to.
>>>
>>> In any case, thanks for all the effort that went into Cassandra.  I 
>>> will check back from time to time and perhaps in a year or so it'll 
>>> be time to re-evaluate Cassandra.
>>
>> In a year we'll have achieved Total World Domination. :)
>>
>>> PS: one last thing.  It took us less time to rewrite the 
>>> DB-interface for our system to MongoDB AND port over our data than 
>>> it took to write the Cassandra implementation.
>> --
>> Eric Evans
>> eev...@rackspace.com
>>
>

While I respect your decision...
If you are tired of writing code there are solutions around coding
everything there are too

Re: Ditching Cassandra

2011-03-29 Thread Eric Evans
On Tue, 2011-03-29 at 20:56 -0500, Colin wrote:
> Are you saying that 8 will or not be compatible with 7?

You will be able to perform a rolling upgrade from 0.7.x to 0.8.  That
is to say, you'll be able to upgrade each node one at a time, mixing 0.7
and 0.8 nodes until the upgrade is complete.

> If not, would you recommend waiting until 8?  We have done an awful
> lot of work, have an awful  lot of work left, and have become very
> frustrated.

If you're interested in exploring the CQL route, and if your time-line
permits it, that's what I would do.

> Any idea on when 8 will be available?

Provided nothing crops up, we'll freeze on April 11th (2 weeks from
yesterday), and release on the week of May 9th (4 weeks later).

-- 
Eric Evans
eev...@rackspace.com



Re: Ditching Cassandra

2011-03-29 Thread Eric Evans
On Tue, 2011-03-29 at 19:58 -0700, mcasandra wrote:
> I am also interested in knowing when 8 will be released. 

We're targeting the week of May 9th.

> Also, is there someplace where we can read about features that will be
> relased in 8? Looks like some major changes are going to come out. 

The big stuff is CQL and backward compatible messaging (though that is
some pretty big stuff).  Most of the work done since the first 0.7.x
release has been features and bug fixes that landed in 0.7 and were
merged forward to trunk (for 0.8).

-- 
Eric Evans
eev...@rackspace.com



RE: Ditching Cassandra

2011-03-29 Thread Colin
Thank you Eric.  I appreciate it.

-Original Message-
From: Eric Evans [mailto:eev...@rackspace.com] 
Sent: Tuesday, March 29, 2011 11:47 PM
To: user@cassandra.apache.org
Subject: Re: Ditching Cassandra

On Tue, 2011-03-29 at 20:56 -0500, Colin wrote:
> Are you saying that 8 will or not be compatible with 7?

You will be able to perform a rolling upgrade from 0.7.x to 0.8.  That is to 
say, you'll be able to upgrade each node one at a time, mixing 0.7 and 0.8 
nodes until the upgrade is complete.

> If not, would you recommend waiting until 8?  We have done an awful 
> lot of work, have an awful  lot of work left, and have become very 
> frustrated.

If you're interested in exploring the CQL route, and if your time-line permits 
it, that's what I would do.

> Any idea on when 8 will be available?

Provided nothing crops up, we'll freeze on April 11th (2 weeks from yesterday), 
and release on the week of May 9th (4 weeks later).

--
Eric Evans
eev...@rackspace.com