Cassandra on a single (under-powered) instance?
Hello All. I am new to Cassandra and I am evaluating it for a project I am working on.

This project has several distribution models, ranging from a cloud distribution where we would be collecting hundreds of millions of rows per day to a single-box distribution where we could be collecting as few as 5 to 10 million rows per day.

Based on the experimentation and testing I have done so far, I believe that Cassandra would be an excellent fit for our large-scale cloud distribution, but from a maintenance/support point of view, we would like to keep our storage engine consistent across all distributions.

For our single-box distribution, it could be running on a box as small as an i3 processor with 4 GB of RAM and about 180 GB of disk space available for use... A rough estimate would be that our storage engine could be allowed to consume about half of the processor and RAM resources.

I know that running Cassandra on a single instance throws away the majority of the benefits of using a distributed storage solution (distributed writes and reads, fault tolerance, etc.), but it might be worth the trade-off if we don't have to support two completely different storage solutions, even if they were hidden behind an abstraction layer from the application's point of view.

My question is: are we completely out to lunch thinking that we might be able to run Cassandra in a reasonable way on such an under-powered box? I believe I recall reading in the DataStax documentation that the minimum recommended system requirements are 8 to 12 cores and 8 GB of RAM, which is a far cry from the lowest-end machine I'm considering.

Any info or help anyone could provide would be most appreciated.

Regards,

Daniel Morton
Bulk loading into CQL3 Composite Columns
Hi All. I am trying to bulk load some data into a CQL3 table using the sstableloader utility and I am having some difficulty figuring out how to use the SSTableSimpleUnsortedWriter with composite columns.

I have created this simple contrived table for testing:

create table test (key varchar, val1 int, val2 int, primary key (key, val1, val2));

Loosely following the bulk loading example in the docs, I have constructed the following method to create my temporary SSTables.

public static void main(String[] args) throws Exception {
    final List<AbstractType<?>> compositeTypes = new ArrayList<>();
    compositeTypes.add(UTF8Type.instance);
    compositeTypes.add(IntegerType.instance);
    compositeTypes.add(IntegerType.instance);
    final CompositeType compType =
            CompositeType.getInstance(compositeTypes);
    SSTableSimpleUnsortedWriter ssTableWriter =
            new SSTableSimpleUnsortedWriter(
                    new File("/tmp/cassandra_bulk/bigdata/test"),
                    new Murmur3Partitioner(),
                    "bigdata",
                    "test",
                    compType,
                    null,
                    128);

    final Builder builder = new CompositeType.Builder(compType);

    builder.add(bytes("20101201"));
    builder.add(bytes(5));
    builder.add(bytes(10));

    ssTableWriter.newRow(bytes("20101201"));
    ssTableWriter.addColumn(
            builder.build(),
            ByteBuffer.allocate(0),
            System.currentTimeMillis());

    ssTableWriter.close();
}

When I execute this method and load the data using sstableloader, if I do a 'SELECT * FROM test' in cqlsh, I get the results:

 key      | val1       | val2
----------+------------+------
 20101201 | '20101201' | 5

And the error: Failed to decode value '20101201' (for column 'val1') as int.

The error I get makes sense, as apparently it tried to place the key value into the val1 column. From this error, I then assumed that the key value should not be part of the composite type when the row is added, so I removed the UTF8Type from the composite type and only added the two integer values through the builder, but when I repeat the select with that data loaded, Cassandra throws an ArrayIndexOutOfBoundsException in the ColumnGroupMap class.
Can anyone offer any advice on the correct way to insert data via the bulk loading process into CQL3 tables with composite columns? Does the fact that I am not inserting a value for the columns make a difference? For my particular use case, all I care about is the values in the column names themselves (and the associated sorting that goes with them). Any info or help anyone could provide would be very much appreciated. Regards, Daniel Morton
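When chasing a problem like this, it can help to look at the raw bytes the writer actually produced. Cassandra's CompositeType serializes each component as a 2-byte big-endian length, the value bytes, and a trailing end-of-component byte (0 in the common case). The sketch below uses only the JDK (no Cassandra dependency; the class and method names are invented for illustration) and round-trips that layout, so you can decode a built composite and see whether, for example, the partition key accidentally ended up as a leading component:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class CompositeDecoder {

    // Splits a CompositeType-encoded buffer into its raw components.
    // Layout per component: 2-byte big-endian length, value bytes,
    // one end-of-component (EOC) byte (0 inside a normal composite).
    public static List<byte[]> decode(ByteBuffer composite) {
        List<byte[]> parts = new ArrayList<>();
        ByteBuffer bb = composite.duplicate();
        while (bb.hasRemaining()) {
            int len = bb.getShort() & 0xFFFF;    // unsigned short length
            byte[] value = new byte[len];
            bb.get(value);
            bb.get();                            // consume the EOC byte
            parts.add(value);
        }
        return parts;
    }

    // Encodes components the same way, for round-trip inspection.
    public static ByteBuffer encode(byte[]... components) {
        int size = 0;
        for (byte[] c : components) size += 2 + c.length + 1;
        ByteBuffer bb = ByteBuffer.allocate(size);
        for (byte[] c : components) {
            bb.putShort((short) c.length);
            bb.put(c);
            bb.put((byte) 0);                    // EOC = 0
        }
        bb.flip();
        return bb;
    }

    public static void main(String[] args) {
        // Two 4-byte big-endian ints, as a fixed-width int type would
        // serialize 5 and 10.
        ByteBuffer composite = encode(
                ByteBuffer.allocate(4).putInt(5).array(),
                ByteBuffer.allocate(4).putInt(10).array());
        for (byte[] part : decode(composite)) {
            System.out.println(ByteBuffer.wrap(part).getInt()); // prints 5 then 10
        }
    }
}
```

Dumping the column-name buffers from a generated SSTable through a decoder like this makes the component count and per-component byte widths visible, which is exactly the information a ColumnGroupMap ArrayIndexOutOfBoundsException hides.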
Re: Cassandra on a single (under-powered) instance?
Hi Tyler... Thank you very much for the response. It is nice to know that there is some possibility this might work. :) Regards, Daniel Morton On Wed, May 29, 2013 at 2:03 PM, Tyler Hobbs wrote: > You can get away with a 1 to 2GB heap if you don't put too much pressure > on it. I commonly run stress tests against a 400M heap node while > developing and I almost never see OutOfMemory errors, but I'm not keeping a > close eye on latency and throughput, which will be impacted when the JVM GC > is running nonstop. > > Cassandra doesn't tend to become CPU bound, so an i3 will probably work > fine. > > > On Tue, May 28, 2013 at 9:42 AM, Daniel Morton wrote: > >> Hello All. >> >> I am new to Cassandra and I am evaluating it for a project I am working >> on. >> >> This project has several distribution models, ranging from a cloud >> distribution where we would be collecting hundreds of millions of rows per >> day to a single box distribution where we could be collecting as few as 5 >> to 10 million rows per day. >> >> Based on the experimentation and testing I have done so far, I believe >> that Cassandra would be an excellent fit for our large scale cloud >> distribution, but from a maintenance/support point of view, we would like >> to keep our storage engine consistent across all distributions. >> >> For our single box distribution, it could be running on a box as small as >> an i3 processor with 4 GB of RAM and about 180 GB of disk base available >> for use... A rough estimate would be that our storage engine could be >> allowed to consume about half of the processor and RAM resources. 
>> >> I know that running Cassandra on a single instance throws away the >> majority of the benefits of using a distribution storage solution >> (distributed writes and reads, fault tolerance, etc.), but it might be >> worth the trade off if we don't have to support two completely different >> storage solutions, even if they were hidden behind an abstraction layer >> from the application's point of view. >> >> My question is, are we completely out-to-lunch thinking that we might be >> able to run Cassandra in a reasonable way on such an under-powered box? I >> believe I recall reading in the Datastax documentation that the minimum >> recommended system requirements are 8 to 12 cores and 8 GB of RAM, which is >> a far cry from the lowest-end machine I'm considering. >> >> Any info or help anyone could provide would be most appreciated. >> >> Regards, >> >> Daniel Morton >> > > > > -- > Tyler Hobbs > DataStax <http://datastax.com/> >
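For anyone trying the small-heap setup Tyler describes, the JVM will report what it was actually given, which is a quick way to confirm a deliberately low -Xmx on a constrained box. A stdlib-only sketch (nothing Cassandra-specific; the class name is invented):

```java
public class RuntimeCheck {

    // Maximum heap the JVM will attempt to use, in megabytes
    // (reflects -Xmx when one is set).
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Number of processors the JVM sees.
    public static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        // On a 4 GB box with roughly half reserved for the storage
        // engine, one would expect a figure in the 1-2 GB range here
        // once the heap flag is applied.
        System.out.println("max heap (MB): " + maxHeapMb());
        System.out.println("cores:         " + cores());
    }
}
```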
Re: Bulk loading into CQL3 Composite Columns
Hi Keith... Thanks for the help.

I'm presently not importing the Hector library (which is where classes like CompositeSerializer and StringSerializer come from, yes?), only the cassandra-all maven artifact. Is the behaviour of the CompositeSerializer much different than using a Builder from a CompositeType? When I saw the error about '20101201' failing to decode, I tried only including the values for val1 and val2 like:

final List<AbstractType<?>> compositeTypes = new ArrayList<>();
compositeTypes.add(IntegerType.instance);
compositeTypes.add(IntegerType.instance);

final CompositeType compType = CompositeType.getInstance(compositeTypes);
final Builder builder = new CompositeType.Builder(compType);

builder.add(bytes(5));
builder.add(bytes(10));

ssTableWriter.newRow(bytes("20101201"));
ssTableWriter.addColumn(builder.build(), ByteBuffer.allocate(0),
        System.currentTimeMillis());

(where bytes is the statically imported ByteBufferUtil.bytes method)

But doing this resulted in an ArrayIndexOutOfBounds exception from Cassandra. Is doing this any different than using the CompositeSerializer you suggest?

Thanks again,

Daniel Morton

On Thu, May 30, 2013 at 3:32 PM, Keith Wright wrote:

> You do not want to repeat the first item of your primary key again. If
> you recall, in CQL3 a primary key as defined below indicates that the row
> key is the first item (key) and then the column names are composites of
> val1,val2. Although I don't see why you need val2 as part of the primary
> key in this case.
In any event, you would do something like this (although > I've never tested passing a null value): > > ssTableWriter.newRow(StringSerializer.get().toByteBuffer("20101201")); > Composite columnComposite = new Composite(); > columnComposite(0,5,IntegerSerializer.get()); > columnComposite(0,10,IntegerSerializer.get()); > ssTableWriter.addColumn( > CompositeSerializer.get().toByteBuffer(columnComposite), > null, > System.currentTimeMillis() > ); > > From: Daniel Morton > Reply-To: "user@cassandra.apache.org" > Date: Thursday, May 30, 2013 1:06 PM > To: "user@cassandra.apache.org" > Subject: Bulk loading into CQL3 Composite Columns > > Hi All. I am trying to bulk load some data into a CQL3 table using the > sstableloader utility and I am having some difficulty figuring out how to > use the SSTableSimpleUnsortedWriter with composite columns. > > I have created this simple contrived table for testing: > > create table test (key varchar, val1 int, val2 int, primary key (key, > val1, val2)); > > Loosely following the bulk loading example in the docs, I have constructed > the following method to create my temporary SSTables. 
> > public static void main(String[] args) throws Exception { >final List> compositeTypes = new ArrayList<>(); >compositeTypes.add(UTF8Type.instance); >compositeTypes.add(IntegerType.instance); >compositeTypes.add(IntegerType.instance); >final CompositeType compType = > CompositeType.getInstance(compositeTypes); >SSTableSimpleUnsortedWriter ssTableWriter = > new SSTableSimpleUnsortedWriter( > new File("/tmp/cassandra_bulk/bigdata/test"), > new Murmur3Partitioner() , > "bigdata", > "test", > compType, > null, > 128); > >final Builder builder = > new CompositeType.Builder(compType); > >builder.add(bytes("20101201")); >builder.add(bytes(5)); >builder.add(bytes(10)); > >ssTableWriter.newRow(bytes("20101201")); >ssTableWriter.addColumn( > builder.build(), > ByteBuffer.allocate(0), > System.currentTimeMillis() >); > >ssTableWriter.close(); > } > > When I execute this method and load the data using sstableloader, if I do > a 'SELECT * FROM test' in cqlsh, I get the results: > > key | val1 | val2 > > 20101201 | '20101201' | 5 > > And the error: Failed to decode value '20101201' (for column 'val1') as > int. > > The error I get makes sense, as apparently it tried to place the key value > into the val1 column. From this error, I then assumed that the key value > should not be part of the composite type when the row is added, so I > removed the UTF8Type from the composite type, and only added the two > integer values through the builder, but when I repeat the select with that > data loaded, Cassandra throws an ArrayIndexOutOfBoundsException in the > ColumnGroupMap class. > > Can anyone offer any advice on the correct way to insert data via the bulk > loading process into CQL3 tables with composite columns? Does the fact > that I am not inserting a value for the columns make a difference? For my > particular use case, all I care about is the values in the column names > themselves (and the associated sorting that goes with them). 
> > Any info or help anyone could provide would be very much appreciated. > > Regards, > > Daniel Morton >
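One detail worth ruling out alongside the composite structure: in CQL3 an `int` column is Int32Type, a fixed 4-byte big-endian value, whereas IntegerType (used in the code above) is Cassandra's varint, which serializes to a minimal-length two's-complement byte array. The same small number encodes differently under the two types, so mixing them between the writer's comparator and the table schema leaves bytes that cqlsh cannot decode as expected. A stdlib-only illustration (the class is invented; BigInteger.toByteArray() produces the same minimal two's-complement bytes the varint representation uses):

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class IntEncodings {

    // 4-byte big-endian, as a fixed-width 32-bit int type stores it.
    public static byte[] int32(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Minimal two's-complement bytes, as a varint type stores it;
    // BigInteger.toByteArray() uses this same representation.
    public static byte[] varint(long v) {
        return BigInteger.valueOf(v).toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(int32(5)));  // prints [0, 0, 0, 5]
        System.out.println(Arrays.toString(varint(5))); // prints [5]
    }
}
```

The practical upshot: 5 occupies four bytes under one encoding and one byte under the other, so a comparator declared with one type will mis-slice components written with the other.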
Re: Bulk loading into CQL3 Composite Columns
Hi Edward... Thanks for the pointer. I will use that going forward. Daniel Morton On Thu, May 30, 2013 at 4:09 PM, Edward Capriolo wrote: > You should probably be using system.nanoTime() not > system.currentTimeInMillis(). The user is free to set the timestamp to > whatever they like but nano-time is the standard (it is what the cli uses, > and what cql will use) > > > On Thu, May 30, 2013 at 3:33 PM, Keith Wright wrote: > >> Sorry, typo in code sample, should be: >> >> ssTableWriter.newRow(StringSerializer.get().toByteBuffer("20101201")); >> Composite columnComposite = new Composite(); >> columnComposite.setComponent(0,5,IntegerSerializer.get()); >> columnComposite.setComponent(1,10,IntegerSerializer.get()); >> >> ssTableWriter.addColumn( >> CompositeSerializer.get().toByteBuffer(columnComposite), null, >> System.currentTimeMillis() ); >> >> From: Keith Wright >> Date: Thursday, May 30, 2013 3:32 PM >> To: "user@cassandra.apache.org" >> Subject: Re: Bulk loading into CQL3 Composite Columns >> >> You do not want to repeat the first item of your primary key again. If >> you recall, in CQL3 a primary key as defined below indicates that the row >> key is the first item (key) and then the column names are composites of >> val1,val2. Although I don't see why you need val2 as part of the primary >> key in this case. In any event, you would do something like this (although >> I've never tested passing a null value): >> >> ssTableWriter.newRow(StringSerializer.get().toByteBuffer("20101201")); >> Composite columnComposite = new Composite(); >> columnComposite(0,5,IntegerSerializer.get()); >> columnComposite(0,10,IntegerSerializer.get()); >> ssTableWriter.addColumn( >> CompositeSerializer.get().toByteBuffer(columnComposite), >> null, >> System.currentTimeMillis() >> ); >> >> From: Daniel Morton >> Reply-To: "user@cassandra.apache.org" >> Date: Thursday, May 30, 2013 1:06 PM >> To: "user@cassandra.apache.org" >> Subject: Bulk loading into CQL3 Composite Columns >> >> Hi All. 
I am trying to bulk load some data into a CQL3 table using the >> sstableloader utility and I am having some difficulty figuring out how to >> use the SSTableSimpleUnsortedWriter with composite columns. >> >> I have created this simple contrived table for testing: >> >> create table test (key varchar, val1 int, val2 int, primary key (key, >> val1, val2)); >> >> Loosely following the bulk loading example in the docs, I have >> constructed the following method to create my temporary SSTables. >> >> public static void main(String[] args) throws Exception { >>final List> compositeTypes = new ArrayList<>(); >>compositeTypes.add(UTF8Type.instance); >>compositeTypes.add(IntegerType.instance); >>compositeTypes.add(IntegerType.instance); >>final CompositeType compType = >> CompositeType.getInstance(compositeTypes); >>SSTableSimpleUnsortedWriter ssTableWriter = >> new SSTableSimpleUnsortedWriter( >> new File("/tmp/cassandra_bulk/bigdata/test"), >> new Murmur3Partitioner() , >> "bigdata", >> "test", >> compType, >> null, >> 128); >> >>final Builder builder = >> new CompositeType.Builder(compType); >> >>builder.add(bytes("20101201")); >>builder.add(bytes(5)); >>builder.add(bytes(10)); >> >>ssTableWriter.newRow(bytes("20101201")); >>ssTableWriter.addColumn( >> builder.build(), >> ByteBuffer.allocate(0), >> System.currentTimeMillis() >>); >> >>ssTableWriter.close(); >> } >> >> When I execute this method and load the data using sstableloader, if I do >> a 'SELECT * FROM test' in cqlsh, I get the results: >> >> key | val1 | val2 >> >> 20101201 | '20101201' | 5 >> >> And the error: Failed to decode value '20101201' (for column 'val1') as >> int. >> >> The error I get makes sense, as apparently it tried to place the key >> value into the val1 column. 
From this error, I then assumed that the key >> value should not be part of the composite type when the row is added, so I >> removed the UTF8Type from the composite type, and only added the two >> integer values through the builder, but when I repeat the select with that >> data loaded, Cassandra throws an ArrayIndexOutOfBoundsException in the >> ColumnGroupMap class. >> >> Can anyone offer any advice on the correct way to insert data via the >> bulk loading process into CQL3 tables with composite columns? Does the >> fact that I am not inserting a value for the columns make a difference? >> For my particular use case, all I care about is the values in the column >> names themselves (and the associated sorting that goes with them). >> >> Any info or help anyone could provide would be very much appreciated. >> >> Regards, >> >> Daniel Morton >> > >
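One caveat on the timestamp discussion quoted above: System.nanoTime() counts from an arbitrary origin and is only meaningful for measuring elapsed time, so it is not a drop-in substitute for a write timestamp. The long-standing convention for Cassandra write timestamps is microseconds since the Unix epoch, which can be derived from currentTimeMillis. A minimal sketch (the class name is invented):

```java
public class WriteTimestamps {

    // Conventional Cassandra write timestamp: microseconds since the
    // Unix epoch, at millisecond resolution.
    public static long microsNow() {
        return System.currentTimeMillis() * 1000L;
    }

    public static void main(String[] args) {
        long millis = System.currentTimeMillis();
        long micros = microsNow();
        // nanoTime() has an arbitrary origin; its value says nothing
        // about the current date and must not be compared with
        // epoch-based timestamps.
        long nanos = System.nanoTime();
        System.out.println("millis since epoch: " + millis);
        System.out.println("micros since epoch: " + micros);
        System.out.println("nanoTime (opaque):  " + nanos);
    }
}
```

Mixing nanoTime-based and epoch-based timestamps across writers would make last-write-wins resolution effectively arbitrary, which is why sticking to one epoch-based convention matters more than the unit itself.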
Re: Bulk loading into CQL3 Composite Columns
compositeTypes.add(IntegerType.instance);
compositeTypes.add(IntegerType.instance);
compositeTypes.add(IntegerType.instance);

final CompositeType compType = CompositeType.getInstance(compositeTypes);
final SSTableSimpleUnsortedWriter ssTableWriter =
        new SSTableSimpleUnsortedWriter(
                new File("/tmp/cassandra_bulk/bigdata/test"),
                new Murmur3Partitioner(),
                "bigdata",
                "test",
                compType,
                null,
                128);

final Builder builder = new CompositeType.Builder(compType);

builder.add(bytes(5));
builder.add(bytes(10));
builder.add(bytes(20));

ssTableWriter.newRow(bytes("0|20101201"));
ssTableWriter.addColumn(
        builder.build(),
        ByteBuffer.allocate(0),
        System.nanoTime());

        ssTableWriter.close();
    }
}

Any thoughts?

Daniel Morton

On Thu, May 30, 2013 at 8:12 PM, Keith Wright wrote:

> StringSerializer and CompositeSerializer are actually from Astyanax for
> what's it worth. I would recommend you change your table definition so
> that only val1 is part of the primary key. There is no reason to include
> val2. Perhaps sending the IndexOutOfBoundsException would help.
>
> All the StringSerializer is really doing is
>
> ByteBuffer.wrap(obj.getBytes(charset))
>
> Using UTF-8 as the charset (see
> http://grepcode.com/file/repo1.maven.org/maven2/com.netflix.astyanax/astyanax/1.56.26/com/netflix/astyanax/serializers/StringSerializer.java#StringSerializer
> )
>
> You can see the source for CompositeSerializer here:
> http://grepcode.com/file/repo1.maven.org/maven2/com.netflix.astyanax/astyanax/1.56.26/com/netflix/astyanax/serializers/CompositeSerializer.java
>
> Good luck!
>
> From: Daniel Morton
> Reply-To: "user@cassandra.apache.org"
> Date: Thursday, May 30, 2013 4:33 PM
> To: "user@cassandra.apache.org"
> Subject: Re: Bulk loading into CQL3 Composite Columns
>
> Hi Keith... Thanks for the help.
>
> I'm presently not importing the Hector library (which is where classes
> like CompositeSerializer and StringSerializer come from, yes?), only the
> cassandra-all maven artifact. Is the behaviour of the CompositeSerializer
> much different than using a Builder from a CompositeType?
When I saw the
> error about '20101201' failing to decode, I tried only including the values
> for val1 and val2 like:
>
> final List<AbstractType<?>> compositeTypes = new ArrayList<>();
> compositeTypes.add(IntegerType.instance);
> compositeTypes.add(IntegerType.instance);
>
> final CompositeType compType = CompositeType.getInstance(compositeTypes);
> final Builder builder = new CompositeType.Builder(compType);
>
> builder.add(bytes(5));
> builder.add(bytes(10));
>
> ssTableWriter.newRow(bytes("20101201"));
> ssTableWriter.addColumn(builder.build(), ByteBuffer.allocate(0),
> System.currentTimeMillis());
>
> (where bytes is the statically imported ByteBufferUtil.bytes method)
>
> But doing this resulted in an ArrayIndexOutOfBounds exception from
> Cassandra. Is doing this any different than using the CompositeSerializer
> you suggest?
>
> Thanks again,
>
> Daniel Morton
>
> On Thu, May 30, 2013 at 3:32 PM, Keith Wright wrote:
>
>> You do not want to repeat the first item of your primary key again. If
>> you recall, in CQL3 a primary key as defined below indicates that the row
>> key is the first item (key) and then the column names are composites of
>> val1,val2. Although I don't see why you need val2 as part of the primary
>> key in this case. In any event, you would do something like this (although
>> I've never tested passing a null value):
>>
>> ssTableWriter.newRow(StringSerializer.get().toByteBuffer("20101201"));
>> Composite columnComposite = new Composite();
>> columnComposite(0,5,IntegerSerializer.get());
>> columnComposite(0,10,IntegerSerializer.get());
>> ssTableWriter.addColumn(
>> CompositeSerializer.get().toByteBuffer(columnComposite),
>> null,
>> System.currentTimeMillis()
>> );
>>
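On Daniel's question of whether the two approaches differ for the row key: the Astyanax StringSerializer source linked above reduces to ByteBuffer.wrap(s.getBytes(UTF-8)), and ByteBufferUtil.bytes(String) yields the same UTF-8 bytes, so both produce identical key buffers. A stdlib sketch of that common denominator (the class name is invented):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class KeyBytes {

    // What both ByteBufferUtil.bytes(String) and Astyanax's
    // StringSerializer boil down to: UTF-8 bytes wrapped in a buffer.
    public static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        ByteBuffer key = utf8("20101201");
        // An 8-character ASCII string occupies 8 bytes under UTF-8.
        System.out.println("key length in bytes: " + key.remaining());
    }
}
```

Since the key encoding is identical either way, any difference in behaviour between the two code paths has to come from how the column-name composite is built, not from the serializers.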