Problem with libcassandra
I am trying to run the code below, but it gives this error. It compiles without any errors. Kindly help me.
(source of the code: http://posulliv.github.io/2011/02/27/libcassandra-sec-indexes/)

terminate called after throwing an instance of 'org::apache::cassandra::InvalidRequestException'
  what():  Default TException.
Aborted

#include
#include
#include
#include
#include
#include
#include
#include

#include
#include
#include
#include
#include

using namespace std;
using namespace libcassandra;

static string host("127.0.0.1");
static int port= 9160;

int main()
{

  CassandraFactory cf(host, port);
  tr1::shared_ptr c(cf.create());

  KeyspaceDefinition ks_def;
  ks_def.setName("demo");
  c->createKeyspace(ks_def);

  ColumnFamilyDefinition cf_def;
  cf_def.setName("users");
  cf_def.setKeyspaceName(ks_def.getName());

  ColumnDefinition name_col;
  name_col.setName("full_name");
  name_col.setValidationClass("UTF8Type");

  ColumnDefinition sec_col;
  sec_col.setName("birth_date");
  sec_col.setValidationClass("LongType");
  sec_col.setIndexType(org::apache::cassandra::IndexType::KEYS);

  ColumnDefinition third_col;
  third_col.setName("state");
  third_col.setValidationClass("UTF8Type");
  third_col.setIndexType(org::apache::cassandra::IndexType::KEYS);

  cf_def.addColumnMetadata(name_col);
  cf_def.addColumnMetadata(sec_col);
  cf_def.addColumnMetadata(third_col);

  c->setKeyspace(ks_def.getName());
  c->createColumnFamily(cf_def);

  return 0;
}
Re: Cassandra as storage for cache data
In our case we have a continuous flow of data to be cached. Every second we receive tens of PUT requests. Each request carries a ~500 KB payload on average, with a TTL of about 20 minutes.

On the other side we have a similar flow of GET requests. Every GET request is translated into a "get by key" query against Cassandra.

This is a very simple and straightforward solution:
- one CF
- one key that corresponds directly to the cache entry key
- one value of type bytes that corresponds to the cache entry payload

To be honest, I don't see how we could switch this solution to a multi-CF scheme playing with time-based snapshots.

Today this solution crashed again with overload symptoms:
- almost non-stop compactions on every node in the cluster
- large io-wait on the system
- clients start failing with timeout exceptions

At the same time we see that Cassandra uses only half of the Java heap. How can we force it to use all available resources (namely main memory)?

Best regards,
Dmitry Olshansky
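For what it's worth, the figures above imply a sizable live working set before any tombstone or compaction overhead. A back-of-the-envelope sketch (the 30 requests/sec value is an assumed concrete stand-in for "tens of PUT requests"):

```python
# Rough steady-state size of the cache: request rate * payload size * TTL.
# 30 req/sec is an assumed stand-in for "tens of PUT requests" per second.
PUTS_PER_SEC = 30
PAYLOAD_BYTES = 500 * 1024      # ~500 KB per request
TTL_SECONDS = 20 * 60           # 20 minute TTL

live_bytes = PUTS_PER_SEC * PAYLOAD_BYTES * TTL_SECONDS
live_gb = live_bytes / 1024**3
print(f"~{live_gb:.1f} GB of live data at steady state")  # → ~17.2 GB
```

Expired columns remain on disk as tombstones until compaction removes them, so the on-disk footprint can be a multiple of this estimate, which would be consistent with the near-continuous compaction being observed.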
Re: Streaming performance with 1.2.6
On Mon, Jul 1, 2013 at 10:06 PM, Mike Heffner wrote:
>
> The only changes we've made to the config (aside from dirs/hosts) are:
>

Forgot to include we've changed this as well:

-partitioner: org.apache.cassandra.dht.Murmur3Partitioner
+partitioner: org.apache.cassandra.dht.RandomPartitioner

Cheers,

Mike
--
Mike Heffner
Librato, Inc.
Re: very inefficient operation with tombstones
I've seen the same thing.

From: Sylvain Lebresne
Date: Tue, 2 Jul 2013 08:32:06 +0200
To: "user@cassandra.apache.org"
Subject: Re: very inefficient operation with tombstones

This is https://issues.apache.org/jira/browse/CASSANDRA-5677.
--
Sylvain

On Tue, Jul 2, 2013 at 6:04 AM, Mohica Jasha wrote:
> Querying a table with 5000 tombstones takes 3 minutes to complete! But
> querying the same table with the same data pattern and 10,000 live entries
> takes a fraction of a second to complete!
>
> Details:
> 1. Created the following table:
> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
> use test;
> CREATE TABLE job_index ( stage text, "timestamp" text, PRIMARY KEY (stage, "timestamp"));
>
> 2. Inserted 5000 entries into the table:
> INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '0001' );
> INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '0002' );
> ...
> INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '4999' );
> INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '5000' );
>
> 3. Flushed the table:
> nodetool flush test job_index
>
> 4. Deleted the 5000 entries:
> DELETE from job_index WHERE stage ='a' AND timestamp = '0001' ;
> DELETE from job_index WHERE stage ='a' AND timestamp = '0002' ;
> ...
> DELETE from job_index WHERE stage ='a' AND timestamp = '4999' ;
> DELETE from job_index WHERE stage ='a' AND timestamp = '5000' ;
>
> 5. Flushed the table:
> nodetool flush test job_index
>
> 6. Querying the table takes 3 minutes to complete:
> cqlsh:test> SELECT * from job_index limit 2;
> tracing: http://pastebin.com/jH2rZN2X
>
> While the query was executing I saw a lot of GC entries in cassandra's log:
> DEBUG [ScheduledTasks:1] 2013-07-01 23:47:59,221 GCInspector.java (line 121) GC for ParNew: 30 ms for 6 collections, 263993608 used; max is 2093809664
> DEBUG [ScheduledTasks:1] 2013-07-01 23:48:00,222 GCInspector.java (line 121) GC for ParNew: 29 ms for 6 collections, 186209616 used; max is 2093809664
> DEBUG [ScheduledTasks:1] 2013-07-01 23:48:01,223 GCInspector.java (line 121) GC for ParNew: 29 ms for 6 collections, 108731464 used; max is 2093809664
>
> It seems that something very inefficient is happening in managing tombstones.
>
> If I start with a clean table and do the following:
> 1. insert 5000 entries
> 2. flush to disk
> 3. insert 5000 new entries
> 4. flush to disk
> then querying job_index for all 10,000 entries takes a fraction of a second:
> tracing: http://pastebin.com/scUN9JrP
>
> The fact that iterating over 5000 tombstones takes 3 minutes, while iterating
> over 10,000 live cells takes a fraction of a second, suggests that something
> very inefficient is happening in managing tombstones.
>
> I'd appreciate it if any developer can look into this.
>
> -M
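For anyone wanting to reproduce this, the statement lists in steps 2 and 4 need not be typed by hand. A minimal sketch that only builds the CQL text; actually executing it against a cluster (via cqlsh or a driver) is left out:

```python
def make_statements(n=5000):
    """Generate the INSERT and DELETE statements from the repro above."""
    inserts = [
        "INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '%04d' );" % i
        for i in range(1, n + 1)
    ]
    deletes = [
        "DELETE from job_index WHERE stage ='a' AND timestamp = '%04d' ;" % i
        for i in range(1, n + 1)
    ]
    return inserts, deletes

inserts, deletes = make_statements()
print(inserts[0])   # INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '0001' );
print(deletes[-1])  # DELETE from job_index WHERE stage ='a' AND timestamp = '5000' ;
```

Piping the generated statements into cqlsh, with the two `nodetool flush` calls in between, should reproduce the tombstone-heavy first case.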
Re: schema management
Franc,

We manage our schema through the Astyanax driver. It runs in a listener at application startup. We read a self-defined schema version, update the schema if needed based on the version number, and then write the new schema version number.

There is a chance two or more app servers will try to update the schema at the same time, but in our testing we haven't seen any problems result from this, even when we forced many servers to all update the schema with many different updates at the same time. And besides, we typically do a rolling restart anyway.

Todd, Mutagen Cassandra looks pretty similar to what we're doing, but is perhaps a bit more elegant. Will take a look at that now :)

Cheers

On Mon, Jul 1, 2013 at 5:55 PM, Franc Carter wrote:
> On Tue, Jul 2, 2013 at 10:33 AM, Todd Fast wrote:
>
>> Franc--
>>
>> I think you will find Mutagen Cassandra very interesting; it is similar
>> to schema management tools like Flyway for SQL databases:
>
> Oops - forgot to mention in my original email that we will be looking into
> Mutagen Cassandra in the medium term. I'm after something with a low
> barrier to entry initially as we are quite time constrained.
>
> cheers
>
>>> Mutagen Cassandra is a framework (based on Mutagen) that provides schema
>>> versioning and mutation for Apache Cassandra.
>>>
>>> Mutagen is a lightweight framework for applying versioned changes (known
>>> as mutations) to a resource, in this case a Cassandra schema. Mutagen takes
>>> into account the resource's existing state and only applies changes that
>>> haven't yet been applied.
>>>
>>> Schema mutation with Mutagen helps you make manageable changes to the
>>> schema of live Cassandra instances as you update your software, and is
>>> especially useful when used across development, test, staging, and
>>> production environments to automatically keep schemas in sync.
>>
>> https://github.com/toddfast/mutagen-cassandra
>>
>> Todd
>>
>> On Mon, Jul 1, 2013 at 5:23 PM, sankalp kohli wrote:
>>
>>> You can generate schema through the code. That is also one option.
>>>
>>> On Mon, Jul 1, 2013 at 4:10 PM, Franc Carter wrote:
>>>
>>>> Hi,
>>>>
>>>> I've been giving some thought to the way we deploy schemas and am
>>>> looking for something better than our current approach, which is to use
>>>> cassandra-cli scripts. What do people use for this?
>>>>
>>>> cheers
>>>>
>>>> --
>>>> Franc Carter | Systems architect | Sirca Ltd
>>>> franc.car...@sirca.org.au | www.sirca.org.au
>>>> Tel: +61 2 8355 2514
>>>> Level 4, 55 Harrington St, The Rocks NSW 2000
>>>> PO Box H58, Australia Square, Sydney NSW 1215
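The version-gated approach described above (read a stored version, apply only the migrations that are newer, write the version back) reduces to a small amount of logic. A hedged sketch: the migration registry and DDL strings here are purely illustrative, and the actual read/write of the version number against Cassandra is omitted.

```python
# Illustrative migration registry: version number -> DDL to apply.
# The DDL strings are placeholders, not a real schema.
MIGRATIONS = {
    1: "CREATE TABLE users (id uuid PRIMARY KEY, full_name text);",
    2: "ALTER TABLE users ADD email text;",
}

def pending_migrations(current_version, migrations=MIGRATIONS):
    """Return (version, ddl) pairs newer than current_version, lowest first."""
    return [(v, ddl) for v, ddl in sorted(migrations.items())
            if v > current_version]

# A node that last recorded version 1 only needs migration 2.
for version, ddl in pending_migrations(1):
    print(version, ddl)  # each would be executed, then the version written back
```

The concurrent-startup race mentioned above is benign here as long as each DDL statement is idempotent, since two servers applying the same pending migration converge on the same schema.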
Re: Does cassandra recover from too many writes?
On Tue, Jul 2, 2013 at 7:18 AM, Eric Marshall wrote:
>
> My query: Should a Cassandra node be able to recover from too many writes
> on its own? And if it can, what do I need to do to reach such a blissful
> state?
>

In general, applications running within the JVM are unable to recover when the JVM garbage collection process has failed in a catastrophic fashion. This is almost certainly the error condition you are triggering, which is why your Cassandra node does not recover.

To confirm whether this is the case, enable more verbose GC logging and/or consult existing JVM GC log messages in system.log.

=Rob
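As a concrete aid for the "consult existing JVM GC log messages" step, here is a small sketch that pulls the numbers out of Cassandra's GCInspector lines. The format is taken from the 1.2-era DEBUG output shown elsewhere in this digest; the regex is an assumption about that format, not an official specification.

```python
import re

# Matches e.g.: "GC for ParNew: 30 ms for 6 collections, 263993608 used; max is 2093809664"
GC_RE = re.compile(
    r"GC for (\w+): (\d+) ms for (\d+) collections, (\d+) used; max is (\d+)"
)

def parse_gc_line(line):
    """Extract collector name, pause time, and heap usage from a GCInspector line."""
    m = GC_RE.search(line)
    if not m:
        return None
    name, ms, n, used, mx = m.groups()
    return {"collector": name, "ms": int(ms), "collections": int(n),
            "heap_used_pct": 100.0 * int(used) / int(mx)}

sample = ("DEBUG [ScheduledTasks:1] 2013-07-01 23:47:59,221 GCInspector.java "
          "(line 121) GC for ParNew: 30 ms for 6 collections, 263993608 used; "
          "max is 2093809664")
info = parse_gc_line(sample)
print(info)
```

Long pauses or heap usage pinned near 100% across consecutive lines would support the catastrophic-GC diagnosis described above.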
Re: Streaming performance with 1.2.6
This was a problem pre-vnodes. I had filed several JIRAs for that, but some of them were voted down on the grounds that performance would improve with vnodes. The main problem is that it streams one sstable at a time and not in parallel.

CASSANDRA-4784 can speed up the bootstrap performance. You can also do a zero copy and not touch the caches of the nodes which are contributing to the build.

https://issues.apache.org/jira/browse/CASSANDRA-4663
https://issues.apache.org/jira/browse/CASSANDRA-4784

On Tue, Jul 2, 2013 at 7:35 AM, Mike Heffner wrote:
>
> On Mon, Jul 1, 2013 at 10:06 PM, Mike Heffner wrote:
>
>> The only changes we've made to the config (aside from dirs/hosts) are:
>
> Forgot to include we've changed this as well:
>
> -partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> +partitioner: org.apache.cassandra.dht.RandomPartitioner
>
> Cheers,
>
> Mike
> --
> Mike Heffner
> Librato, Inc.
RE: Does cassandra recover from too many writes?
Makes sense - I will confirm. Thanks again for the help.

Cheers,
Eric

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Tuesday, July 02, 2013 12:53 PM
To: user@cassandra.apache.org
Subject: Re: Does cassandra recover from too many writes?

On Tue, Jul 2, 2013 at 7:18 AM, Eric Marshall <emarsh...@pulsepoint.com> wrote:

> My query: Should a Cassandra node be able to recover from too many writes
> on its own? And if it can, what do I need to do to reach such a blissful
> state?

In general, applications running within the JVM are unable to recover when the JVM garbage collection process has failed in a catastrophic fashion. This is almost certainly the error condition you are triggering, which is why your Cassandra node does not recover.

To confirm whether this is the case, enable more verbose GC logging and/or consult existing JVM GC log messages in system.log.

=Rob
Re: Streaming performance with 1.2.6
Sankalp,

Parallel sstableloader streaming would definitely be valuable.

However, this ring is currently using vnodes and I was surprised to see that a bootstrapping node only streamed from one node in the ring. My understanding was that a bootstrapping node would stream from multiple nodes in the ring.

We started with a 3 node/3 AZ, RF=3 ring. We then increased that to 6 nodes, adding one per AZ. The 4th, 5th and 6th nodes only streamed from the node in their own AZ/rack, which led to the serial sstable streaming. Is this the correct behavior for the snitch? Is there an option to stream from multiple replicas across the az/rack configuration?

Mike

On Tue, Jul 2, 2013 at 1:53 PM, sankalp kohli wrote:
> This was a problem pre vnodes. I had several JIRA for that but some of
> them were voted down saying the performance will improve with vnodes.
> The main problem is that it streams one sstable at a time and not in
> parallel.
>
> Jira 4784 can speed up the bootstrap performance. You can also do a zero
> copy and not touch the caches of the nodes which are contributing in the
> build.
>
> https://issues.apache.org/jira/browse/CASSANDRA-4663
> https://issues.apache.org/jira/browse/CASSANDRA-4784
>
> On Tue, Jul 2, 2013 at 7:35 AM, Mike Heffner wrote:
>>
>> On Mon, Jul 1, 2013 at 10:06 PM, Mike Heffner wrote:
>>
>>> The only changes we've made to the config (aside from dirs/hosts) are:
>>
>> Forgot to include we've changed this as well:
>>
>> -partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>> +partitioner: org.apache.cassandra.dht.RandomPartitioner
>>
>> Cheers,
>>
>> Mike
>> --
>> Mike Heffner
>> Librato, Inc.

--
Mike Heffner
Librato, Inc.
Re: Streaming performance with 1.2.6
As a test, we added a 7th node in the first AZ, and it streamed from both of the two existing nodes in the same AZ.

Aggregate streaming bandwidth at the 7th node is approximately 12 MB/sec when all limits are set at 800 MB/sec, or about double what I saw streaming from a single node. This would seem to indicate that the sending node is limiting our streaming rate.

Mike

On Tue, Jul 2, 2013 at 3:00 PM, Mike Heffner wrote:
> Sankalp,
>
> Parallel sstableloader streaming would definitely be valuable.
>
> However, this ring is currently using vnodes and I was surprised to see
> that a bootstrapping node only streamed from one node in the ring. My
> understanding was that a bootstrapping node would stream from multiple
> nodes in the ring.
>
> We started with a 3 node/3 AZ, RF=3 ring. We then increased that to 6
> nodes, adding one per AZ. The 4th, 5th and 6th nodes only streamed from the
> node in their own AZ/rack which led to the serial sstable streaming. Is
> this the correct behavior for the snitch? Is there an option to stream from
> multiple replicas across the az/rack configuration?
>
> Mike
>
> On Tue, Jul 2, 2013 at 1:53 PM, sankalp kohli wrote:
>
>> This was a problem pre vnodes. I had several JIRA for that but some of
>> them were voted down saying the performance will improve with vnodes.
>> The main problem is that it streams one sstable at a time and not in
>> parallel.
>>
>> Jira 4784 can speed up the bootstrap performance. You can also do a zero
>> copy and not touch the caches of the nodes which are contributing in the
>> build.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-4663
>> https://issues.apache.org/jira/browse/CASSANDRA-4784
>>
>> On Tue, Jul 2, 2013 at 7:35 AM, Mike Heffner wrote:
>>
>>> On Mon, Jul 1, 2013 at 10:06 PM, Mike Heffner wrote:
>>>
>>>> The only changes we've made to the config (aside from dirs/hosts) are:
>>>
>>> Forgot to include we've changed this as well:
>>>
>>> -partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>>> +partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>
>>> Cheers,
>>>
>>> Mike
>>> --
>>> Mike Heffner
>>> Librato, Inc.

--
Mike Heffner
Librato, Inc.
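The scaling observed above is easy to sanity-check: if aggregate throughput roughly doubles when a second sender is added, each sender is the bottleneck, and at these numbers each one sits far below the configured limit. A small sketch of the arithmetic (the per-sender figure is inferred from the numbers in the thread, not measured directly):

```python
# Observed: ~12 MB/sec aggregate at the joining node with two senders,
# versus roughly half that when streaming from a single sender.
aggregate_mb_s = 12.0
senders = 2
per_sender_mb_s = aggregate_mb_s / senders      # ~6 MB/sec per sending node

configured_limit_mb_s = 800.0
utilization_pct = 100.0 * per_sender_mb_s / configured_limit_mb_s
print(f"each sender at ~{utilization_pct:.2f}% of its configured limit")
```

With each sender at well under 1% of the configured throttle, the bottleneck is plausibly the serial one-sstable-at-a-time streaming path on the sending side rather than the bandwidth limit itself.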
Re: Problem with libcassandra
Have you tried running your code in GDB to find which line is causing the error? That would be what I'd do first.

Aaron Turner
http://synfin.net/
Twitter: @synfinatic
https://github.com/synfinatic/tcpreplay - Pcap editing and replay tools for Unix & Windows
"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety." -- Benjamin Franklin
"carpe diem quam minimum credula postero"

On Tue, Jul 2, 2013 at 3:18 AM, Shubham Mittal wrote:
> I am trying to run the code below, but it gives this error. It compiles
> without any errors. Kindly help me.
> (source of the code:
> http://posulliv.github.io/2011/02/27/libcassandra-sec-indexes/)
>
> terminate called after throwing an instance of
> 'org::apache::cassandra::InvalidRequestException'
>   what():  Default TException.
> Aborted
>
> #include
> #include
> #include
> #include
> #include
> #include
> #include
> #include
>
> #include
> #include
> #include
> #include
> #include
>
> using namespace std;
> using namespace libcassandra;
>
> static string host("127.0.0.1");
> static int port= 9160;
>
> int main()
> {
>   CassandraFactory cf(host, port);
>   tr1::shared_ptr c(cf.create());
>
>   KeyspaceDefinition ks_def;
>   ks_def.setName("demo");
>   c->createKeyspace(ks_def);
>
>   ColumnFamilyDefinition cf_def;
>   cf_def.setName("users");
>   cf_def.setKeyspaceName(ks_def.getName());
>
>   ColumnDefinition name_col;
>   name_col.setName("full_name");
>   name_col.setValidationClass("UTF8Type");
>
>   ColumnDefinition sec_col;
>   sec_col.setName("birth_date");
>   sec_col.setValidationClass("LongType");
>   sec_col.setIndexType(org::apache::cassandra::IndexType::KEYS);
>
>   ColumnDefinition third_col;
>   third_col.setName("state");
>   third_col.setValidationClass("UTF8Type");
>   third_col.setIndexType(org::apache::cassandra::IndexType::KEYS);
>
>   cf_def.addColumnMetadata(name_col);
>   cf_def.addColumnMetadata(sec_col);
>   cf_def.addColumnMetadata(third_col);
>
>   c->setKeyspace(ks_def.getName());
>   c->createColumnFamily(cf_def);
>
>   return 0;
> }
Re: Streaming performance with 1.2.6
I don't know much about streaming with vnodes, but you might be hitting this:
https://issues.apache.org/jira/browse/CASSANDRA-4650

On Tue, Jul 2, 2013 at 12:43 PM, Mike Heffner wrote:
> As a test, adding a 7th node in the first AZ will stream from both the two
> existing nodes in the same AZ.
>
> Aggregate streaming bandwidth at the 7th node is approximately 12 MB/sec
> when all limits are set at 800 MB/sec, or about double what I saw streaming
> from a single node. This would seem to indicate that the sending node is
> limiting our streaming rate.
>
> Mike
>
> On Tue, Jul 2, 2013 at 3:00 PM, Mike Heffner wrote:
>
>> Sankalp,
>>
>> Parallel sstableloader streaming would definitely be valuable.
>>
>> However, this ring is currently using vnodes and I was surprised to see
>> that a bootstrapping node only streamed from one node in the ring. My
>> understanding was that a bootstrapping node would stream from multiple
>> nodes in the ring.
>>
>> We started with a 3 node/3 AZ, RF=3 ring. We then increased that to 6
>> nodes, adding one per AZ. The 4th, 5th and 6th nodes only streamed from the
>> node in their own AZ/rack which led to the serial sstable streaming. Is
>> this the correct behavior for the snitch? Is there an option to stream from
>> multiple replicas across the az/rack configuration?
>>
>> Mike
>>
>> On Tue, Jul 2, 2013 at 1:53 PM, sankalp kohli wrote:
>>
>>> This was a problem pre vnodes. I had several JIRA for that but some of
>>> them were voted down saying the performance will improve with vnodes.
>>> The main problem is that it streams one sstable at a time and not in
>>> parallel.
>>>
>>> Jira 4784 can speed up the bootstrap performance. You can also do a zero
>>> copy and not touch the caches of the nodes which are contributing in the
>>> build.
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-4663
>>> https://issues.apache.org/jira/browse/CASSANDRA-4784
>>>
>>> On Tue, Jul 2, 2013 at 7:35 AM, Mike Heffner wrote:
>>>
>>>> On Mon, Jul 1, 2013 at 10:06 PM, Mike Heffner wrote:
>>>>
>>>>> The only changes we've made to the config (aside from dirs/hosts) are:
>>>>
>>>> Forgot to include we've changed this as well:
>>>>
>>>> -partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>>>> +partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>>
>>>> Cheers,
>>>>
>>>> Mike
>>>> --
>>>> Mike Heffner
>>>> Librato, Inc.
Re: Cassandra as storage for cache data
If this is a tombstone problem, as suggested by some, and it is OK to turn off replication, as suggested by others, it may be an idea to add an optimization in Cassandra along the lines of:

if replication_factor < 1: do not create tombstones

Terje

On Jul 2, 2013, at 11:11 PM, Dmitry Olshansky wrote:
> In our case we have continuous flow of data to be cached. Every second we're
> receiving tens of PUT requests. Every request has 500Kb payload in average
> and TTL about 20 minutes.
>
> On the other side we have the similar flow of GET requests. Every GET request
> is transformed to "get by key" query for cassandra.
>
> This is very simple and straightforward solution:
> - one CF
> - one key that directly corresponds to cache entry key
> - one value of type bytes that corresponds to cache entry payload
>
> To be honest, I don't see how we can switch this solution to multi-CF scheme
> playing with time-based snapshots.
>
> Today this solution crashed again with overload symptoms:
> - almost non-stop compactions on every node in cluster
> - large io-wait in the system
> - clients start failing with timeout exceptions
>
> At the same time we see that cassandra uses only half of java heap. How can
> we enforce it to start using all available resources (namely operating
> memory)?
>
> Best regards,
> Dmitry Olshansky
Re: Problem with libcassandra
If you are using 1.2, I would check out https://github.com/mstump/libcql

-Jeremiah

On Jul 2, 2013, at 5:18 AM, Shubham Mittal wrote:
> I am trying to run the code below, but it gives this error. It compiles
> without any errors. Kindly help me.
> (source of the code:
> http://posulliv.github.io/2011/02/27/libcassandra-sec-indexes/)
>
> terminate called after throwing an instance of
> 'org::apache::cassandra::InvalidRequestException'
>   what():  Default TException.
> Aborted
>
> #include
> #include
> #include
> #include
> #include
> #include
> #include
> #include
>
> #include
> #include
> #include
> #include
> #include
>
> using namespace std;
> using namespace libcassandra;
>
> static string host("127.0.0.1");
> static int port= 9160;
>
> int main()
> {
>   CassandraFactory cf(host, port);
>   tr1::shared_ptr c(cf.create());
>
>   KeyspaceDefinition ks_def;
>   ks_def.setName("demo");
>   c->createKeyspace(ks_def);
>
>   ColumnFamilyDefinition cf_def;
>   cf_def.setName("users");
>   cf_def.setKeyspaceName(ks_def.getName());
>
>   ColumnDefinition name_col;
>   name_col.setName("full_name");
>   name_col.setValidationClass("UTF8Type");
>
>   ColumnDefinition sec_col;
>   sec_col.setName("birth_date");
>   sec_col.setValidationClass("LongType");
>   sec_col.setIndexType(org::apache::cassandra::IndexType::KEYS);
>
>   ColumnDefinition third_col;
>   third_col.setName("state");
>   third_col.setValidationClass("UTF8Type");
>   third_col.setIndexType(org::apache::cassandra::IndexType::KEYS);
>
>   cf_def.addColumnMetadata(name_col);
>   cf_def.addColumnMetadata(sec_col);
>   cf_def.addColumnMetadata(third_col);
>
>   c->setKeyspace(ks_def.getName());
>   c->createColumnFamily(cf_def);
>
>   return 0;
> }
Strange preparedStatment response...
Hi All,

When I use a secondary index with a JDBC PreparedStatement, I get no rows back. If I replace the ? with an integer literal, I get the rows I expect. If I use setObject() instead of setInt(), I get the following exception:

encountered object of class: class java.lang.Integer, but only 'String' is supported to map to the various VARCHAR types

Makes me wonder what I am doing wrong. I use IntegerType for the column value type, but it looks like maybe, when the statement has column_name = ?, it is comparing against the column name rather than the Integer values in the column?

Thanks for the help.

-Tony
columns disappearing intermittently
Hi All,

We're having a problem with our Cassandra cluster and are at a loss as to the cause. We have what appear to be columns that disappear for a little while, then reappear. The rest of the row is returned normally during this time. This is, of course, very disturbing, and is wreaking havoc with our application.

A bit more info about what's happening: we are repeatedly executing the same query against our cluster. Every so often, one of the columns will disappear from the row and will remain gone for some time. Then, after continually executing the same query, the column will come back.

The queries are being executed against a 3-node cluster, with a replication factor of 3, and all reads and writes are done with a quorum consistency level. We upgraded from Cassandra 1.1.12 to 1.2.6 last week, but only started seeing issues this morning.

Has anyone had a problem like this before, or have any idea what might be causing it?