Re: Cassandra 0.8 questions
It's not really possible to give a general answer to your second question; it depends on your implementation. Personally I do two things: the first is to map arrays onto columns, with the name of the column as the key of your array and the value of the column as the data storage. For some applications, since I am using Java, I instead just serialize my ArrayList (or List) and push the whole content into one column. It all depends on what you want to achieve. Third question: design your CFs according to what you want to achieve. I am designing an internal messaging system and I use only two column families to hold the message lists, message and message box. I would have used one, but I need one sorted by TimeUUID and the other by UTF8Type. I think there is a general consensus here: try to avoid super columns. Two sets of columns can do the same job as one SuperColumn, and that is the preferred scheme. Again, just experiment and be ready to change your organization if you are beginning with Cassandra; this is the best way to figure out how to organize your data. Victor Kabdebon http://www.voxnucleus.fr http://www.victorkabdebon.net 2011/5/24 Jian Fang > Does anyone have a good suggestion on my second question? I believe that > question is a pretty common one. > > My third question is a design question. For the same data, we can store > them into multiple column families or a single column family with multiple > super columns. > From a Cassandra read/write performance point of view, what are the general > rules to make multiple column families and when to use a single column > family? > > Thanks again, > > John > > > On Mon, May 23, 2011 at 5:47 PM, Jian Fang > wrote: > >> Hi, >> >> I am pretty new to Cassandra and am going to use Cassandra 0.8.0. I have >> two questions (sorry if they are very basic ones): >> >> 1) I have a column family to hold many super columns, say 30. When I first >> insert the data to the column family, do I need to insert each column one at >> a time or can I insert the whole column family in one transaction (or >> call?)? The latter one seems to be more efficient to me. Does Cassandra >> support that? >> >> For example, I saw the following code to do insertion (with Hector), >> >> Mutator m = HFactory.createMutator(keyspace, stringSerializer); >> //Mutator m = >> HFactory.createMutator(keyspace,stringSerializer); >> m.insert(p.getCassandraKey(), colFamily, >> HFactory.createStringColumn("type", >> p.getStringValue())); >> m.insert(p.getCassandraKey(), colFamily, >> HFactory.createColumn("data", >> p.getCompressedXML(), StringSerializer.get(), >> BytesArraySerializer.get())); >> >> Will the insertions be two separate calls to Cassandra? Or are they just >> one transaction? If it is the former case, is there any way to make them >> one call to Cassandra? >> >> 2) How to store a list/array of data in Cassandra? For example, I have a >> data field called categories, which includes none or many categories, and each >> category includes a category id and a category description. Usually, how do >> people handle this scenario when they use Cassandra? >> >> Thanks in advance, >> >> John >> > >
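A minimal sketch of the two approaches above, assuming the same Hector API as the quoted snippet (the "Categories" CF name and row key are illustrative). Note that queueing insertions with addInsertion() and calling execute() once sends everything to Cassandra as a single batch, which also answers the first question about grouping the two m.insert() calls:

import java.util.List;

import me.prettyprint.cassandra.serializers.BytesArraySerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class CategoryStorageSketch {
    private static final StringSerializer SS = StringSerializer.get();

    // Approach 1: one column per element; the column name is the array key,
    // the column value is the element. All insertions are queued on the
    // mutator and sent as one batch when execute() is called.
    static void storeAsColumns(Keyspace ks, String rowKey, List<String> categories) {
        Mutator<String> m = HFactory.createMutator(ks, SS);
        for (int i = 0; i < categories.size(); i++) {
            m.addInsertion(rowKey, "Categories",
                    HFactory.createStringColumn(String.valueOf(i), categories.get(i)));
        }
        m.execute();
    }

    // Approach 2: serialize the whole list client-side and push it into a
    // single column.
    static void storeSerialized(Keyspace ks, String rowKey, byte[] serializedList) {
        Mutator<String> m = HFactory.createMutator(ks, SS);
        m.insert(rowKey, "Categories", HFactory.createColumn("data",
                serializedList, SS, BytesArraySerializer.get()));
    }
}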
Re: Appending to fields
As Jonathan stated, I believe the insert is O(N + M), unless there are some operations that I don't know about. There are other NoSQL databases that can be used alongside Cassandra as "buffers" for quick access and modification; afterwards the content can be dumped into Cassandra for long-term storage. Here is an example with Redis: http://redis.io/commands/append The "append" command is documented as amortized O(1), though that is a little bit suspicious to me... Best regards, Victor Kabdebon http://www.voxnucleus.fr 2011/5/31 Jonathan Ellis > On Tue, May 31, 2011 at 2:22 PM, Marcus Bointon > wrote: > > mysql reads the entire value of y, appends the data, then writes the > whole thing back, which unfortunately is an O(n^2) operation. > > Actually, this analysis is incorrect. Appending M bytes to N is O(N + > M) which isn't the same as N^2 at all. > > At least in Cassandra, nor can I think of any possible algorithm which > would allow MySQL to achieve N^2, but I don't claim to be an expert > there. > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com >
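For concreteness, the client-side append pattern under discussion (read the whole value, concatenate, write the whole value back) looks roughly like this; a hedged sketch, with the surrounding read and write calls left to whatever client you use. The allocation and the two copies are each linear, which is where O(N + M) comes from:

public class AppendSketch {
    // Hypothetical helper: the caller reads the existing N bytes from the
    // store, appends M bytes here, then writes all N + M bytes back.
    static byte[] append(byte[] existing, byte[] suffix) {
        byte[] result = new byte[existing.length + suffix.length];            // O(N + M) allocation
        System.arraycopy(existing, 0, result, 0, existing.length);            // copy N bytes
        System.arraycopy(suffix, 0, result, existing.length, suffix.length);  // copy M bytes
        return result;
    }
}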
Re: When should I use Solandra?
Why do you need Solandra for storing data? If you just want to retrieve data, use Cassandra. Solandra is for search and indexing; it is a search engine. I do not recommend storing data solely in a search engine. Use the following design: *Store ALL data in Cassandra, then extract from Cassandra only the data you need to index in Solandra. For that matter, you can use Solr instead of Solandra. In Solr you have something called schema.xml where you can set up which fields to index. My advice: do not store your passwords in plain text. Add salt (a random sequence) AND hash them, then insert the bytes into Cassandra. Otherwise you'll end up like Sony, with a massive lawsuit when hackers breach your website and steal the passwords.* If you really want to use Solandra, I guess there is an equivalent to schema.xml where you have lines that tell it whether or not to index some fields. Victor Kabdebon http://www.victorkabdebon.com 2011/6/4 Jean-Nicolas Boulay Desjardins > Hi, > > I am planning to use Cassandra to store my users' passwords and at the same > time data for my website that needs to be accessible via search. My question > is: should I use two DBs, Cassandra (for users' passwords) and Solandra (for > the website's data), or can I put everything in Solandra? > > Is there a way to stop Solandra from indexing my users' passwords? > > Thanks in advance for any help. >
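A minimal sketch of the salt-and-hash advice above, using only JDK classes (for a real system a deliberately slow password hash is preferable; this only illustrates the shape of the bytes you would store in Cassandra):

import java.security.MessageDigest;
import java.security.SecureRandom;

public class PasswordHashSketch {
    // Returns salt + SHA-256(salt + password). Store these bytes in
    // Cassandra instead of the plain-text password; to check a login,
    // split off the salt and recompute the hash.
    static byte[] saltAndHash(String password) throws Exception {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);               // random salt per user
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(salt);
        md.update(password.getBytes("UTF-8"));
        byte[] hash = md.digest();                        // 32 bytes
        byte[] stored = new byte[salt.length + hash.length];
        System.arraycopy(salt, 0, stored, 0, salt.length);
        System.arraycopy(hash, 0, stored, salt.length, hash.length);
        return stored;
    }
}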
Re: When should I use Solandra?
Again, I don't really know the specifics of Solandra, but in Solr (and Solandra being a cousin of Solr, it should hold there too) you have XML field declarations like the one shown after this message. Just turn indexed to false and the field is not going to be indexed... Thrift won't affect Solandra at all. 2011/6/4 Jean-Nicolas Boulay Desjardins > Hi, > > So if I understand Solandra: > > All the data is in Solandra and you can query it like you would normally > with a normal Cassandra setup, and search through it. > > The data from the indexing of Solr is stored in a Cassandra column family... > > Second question: I have Thrift already installed; will it affect my setup of > Solandra? > > Third question: my passwords (yes, I know, I will hash them :) I am no > Sony): even hashed, I don't want them to be indexed by Solr in Solandra; is there > a way to stop Solandra from indexing the hashed passwords or any other data, or > should I put this information in another DB? > > Is Solandra as stable as Cassandra? > > Sorry, I am just EXTREMELY curious! :) > > Thanks a lot for your time and help! > > On Sat, Jun 4, 2011 at 8:29 PM, Jake Luciani wrote: > >> On Saturday, June 4, 2011, Kirk Peterson wrote: >> > I think the OP was asking if you can use the same Cassandra cluster >> that Solandra is integrated with to store non-Solandra data in a different >> keyspace. This would remove the need to run two Cassandra clusters, one for >> storing his Solandra index, and another for his other data. >> > >> >> Yes. Both services are running: Cassandra Thrift and Solr. >> >> > I'm not sure if Solandra supports this, but I would start by checking to >> see if the Cassandra thrift daemon is binding when running the Solandra >> server. If the thrift daemon for cassandra is available, then there is a >> good chance (albeit, I'm not sure how you would configure it) that it would >> be possible, so long as you didn't mess with the Solandra keyspace. >> > >> > cheers, >> > -kirk >> > >> > >> > On Sat, Jun 4, 2011 at 11:57 AM, Norman Maurer < >> norman.mau...@googlemail.com> wrote: >> > >> > Are you sure you really need cassandra for this? For me it sounds >> > like mysql or other databases would be a better fit for you (if you >> > don't need to store a very huge amount of data...) >> > >> > Bye, >> > Norman >> > >> > 2011/6/4 Jean-Nicolas Boulay Desjardins : >> >> Hi, >> >> I am planning to use Cassandra to store my users' passwords and at the >> same >> >> time data for my website that needs to be accessible via search. My >> question >> >> is: should I use two DBs, Cassandra (for users' passwords) and Solandra >> (for >> >> the website's data), or can I put everything in Solandra? >> >> Is there a way to stop Solandra from indexing my users' passwords? >> >> Thanks in advance for any help. >> > >> > >> > -- >> > ⑆gmail.com⑆necrobious⑈ >> > >> > >> >> -- >> http://twitter.com/tjake >> > > > > -- > Name / Nom: Boulay Desjardins, Jean-Nicolas > Website / Site Web: www.jeannicolas.com >
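A typical such declaration in a standard Solr schema.xml looks like this (the field name is illustrative); setting indexed="false" keeps the value stored and retrievable but out of the search index:

<field name="password" type="string" indexed="false" stored="true"/>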
Re: New web client & future API
Hello Markus, Actually from what I understood (please correct me if I am wrong) CQL is based on Thrift / Avro. Victor Kabdebon 2011/6/14 Markus Wiesenbacher | Codefreun.de > > Hi, > > what is the future API for Cassandra? Thrift, Avro, CQL? > > I just released an early version of my web client > (<http://www.codefreun.de/apollo> > http://www.codefreun.de/apollo) which is Thrift-based, and therefore I > would like to know what the future is ... > > Many thanks > MW >
Re: New web client & future API
Ok, thanks for the update. I thought the query string was translated to Thrift calls and then sent to the server. Victor Kabdebon 2011/6/15 Eric Evans > On Tue, 2011-06-14 at 09:49 -0400, Victor Kabdebon wrote: > > Actually from what I understood (please correct me if I am wrong) CQL > > is based on Thrift / Avro. > > In this project, we tend to use the word "Thrift" as a sort of shorthand > for "Cassandra's RPC interface", and not "The serialization and RPC > framework from the Apache Thrift project". > > CQL does not (yet) have its own networking protocol, so it uses Thrift > as a means of delivering queries and serializing the results, but it is > *not* a wrapper around the existing RPC methods. The query string you > provide is parsed entirely on the server. > > -- > Eric Evans > eev...@rackspace.com > >
Re: solandra or pig or....?
I can speak to what I know. Pig: I have only taken a quick look, and maybe some guys from Twitter can answer better than me on that particular program. Pig is not for "on demand" queries: queries are quite slow, and as you said, you extract the relevant information and append it to another CF from which you can retrieve the statistics quickly. Solr is purely a search engine. It is not only text based but also time based, etc. To do statistics you need mathematical operations, which Solr won't provide; it can do simple things in terms of statistics, but mostly it is a search engine. Personally, for what you are asking, I would use Pig, store the results in CFs, and update those CFs regularly. For simple statistics you can generate them with your favorite language, or a specialized language such as R, as long as it concerns small sets. Hope it helps, Victor Kabdebon 2011/6/21 Sasha Dolgy > Folks, > > Simple question ... Assuming my current use case is the ability to log > lots of trivial and seemingly useless sports statistics ... I want a > user to be able to query / compare. For example: > > --> Show me all baseball players in Cheektowaga and Ontario, > California who have hit a grand slam on Tuesdays where it was just a > leap year. > > Each baseball player is represented by a single row in a CF: > > player_uuid, fullname, hometown, game1, game2, game3, game4 > > Games are UUIDs that are a reference to another row in the same CF > that provides information about that game... > > location, final score, date (unix timestamp or ISO format), and > statistics which are represented as a new column timestamp:player_uuid > > I can use PIG, as I understand, to run a query to generate specific > information about specific "things" and populate that data back into > Cassandra in another CF ... similar to the hypothetical search > above. As the information is structured already, I assume PIG is the > right tool for the job, but may not be ideal for a web application and > enabling ad-hoc queries ... it could take anywhere from 2-? > seconds for that query to generate, populate, and return to the > user...? > > On the other hand, I have started to read about Solr / Solandra / > Lucandra ... can this provide similar functionality or better? Or > is it more geared towards full text search and indexing ... > > I don't want to get into the habit of guessing what my potential users > want to search for ... trying to think of ways to offload this to > them. > > > > -- > Sasha Dolgy > sasha.do...@gmail.com >
Re: [SOLVED] Very high memory utilization (not caused by mmap on sstables)
Hello everybody, I actually have the exact same problem. I have a very small amount of data (a few hundred KB) and the memory consumption goes up without any end in sight. On my node I have limited RAM (2 GB) to run Cassandra, but since I have very little data I thought it was not a problem. Here is the result of $du: vic...@:~$ du /opt/cassandra/data/ -h 40K /opt/cassandra/data/system 1,7M /opt/cassandra/data/FallingDown 1,7M /opt/cassandra/data/ Now, if I look at: vic...@:~$ sudo ps aux | grep "cassandra" cassandra 11034 0.2 22.9 *1107772 462764* ? Sl Dec17 6:13 /usr/bin/java -ea -Xms128M -Xmx512M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar org.apache.cassandra.thrift.CassandraDaemon Cassandra uses 462764 KB, roughly 460 MB, for 2 MB of data... And it keeps getting bigger. It is important to know that I have just a few inserts, but quite a lot of reads. Also, Cassandra seems to completely ignore JVM limits such as Xmx. If I don't stop and relaunch Cassandra every 15 or 20 days it simply crashes due to OOM errors. Is there an explanation for this? Thank you all, Victor 2010/12/18 Zhu Han > Here is a typo, sorry... > > best regards, > hanzhu > > > On Sun, Dec 19, 2010 at 10:29 AM, Zhu Han wrote: > >> The problem still seems to be the C-heap of the JVM, which leaks 70MB every >> day. Here is the summary: >> >> on 12/19: 010c3000 178548K rw---[ anon ] >> on 12/18: 010c3000 110320K rw---[ anon ] >> on 12/17: 010c3000 39256K rw---[ anon ] >> >> This should not be the JVM object heap, because the object heap size is >> fixed per the JVM settings below. Here is the map of the JVM object heap, >> which remains constant. >> >> 010c3000 39256K rw---[ anon ] >> > > It should be: > 2b58433c 1069824K rw---[ anon ] > > >> >> I'll paste it to the open-jdk mailing list to seek help. >> >> Zhu, >>> Couple of quick questions: >>> How many threads are in your JVM? >>> >> >> There are hundreds of threads. Here are the settings of Cassandra: 1) *8 >> 128* >> >> The thread stack size on this server is 1MB. So I observe hundreds of >> single mmap segments of 1MB. >> >> Can you also post the full command line as well? >>> >> Sure. All of them are default settings.
>> >> /usr/bin/java -ea -Xms1G -Xmx1G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC >> -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 >> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly >> -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8080 >> -Dcom.sun.management.jmxremote.ssl=false >> -Dcom.sun.management.jmxremote.authenticate=false >> -Dstorage-config=bin/../conf -cp >> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.8.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar >> org.apache.cassandra.thrift.CassandraDaemon >> >> >>> Also, output of cat /proc/meminfo >>> >> >> This is an openvz based testing environment. So /proc/meminfo is not very >> helpful. Whatever, I paste it here. >> >> >> MemTotal: 9838380 kB >> MemFree: 4005900 kB >> Buffers: 0 kB >> Cached: 0 kB >> SwapCached: 0 kB >> Active: 0 kB >> Inactive:0 kB >> HighTotal: 0 kB >> HighFree:0 kB >> LowTotal: 9838380 kB >> LowFree: 4005900 kB >> SwapTotal:
Re: [SOLVED] Very high memory utilization (not caused by mmap on sstables)
Hello Peter, So, more information on that problem: yes, I am using this node with very little data; it is used to design requests, so I don't need a very large dataset. I am running Apache Cassandra 0.6.6 on Debian stable, with java version "1.6.0_22". I recently restarted Cassandra, hence the low memory use, but if I keep it running for 2 or 3 weeks then Cassandra will take about 1.5 GB. Here is the result of the command, one day after the previous one: vic...@***:~$ sudo ps aux | grep "cassandra" root 11034 0.2 26.8 1167304 *540176* ? Sl Dec17 8:09 /usr/bin/java -ea -Xms128M -Xmx512M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar org.apache.cassandra.thrift.CassandraDaemon I have done very little work on it (a few inserts and reads). Thank you, Victor 2010/12/19 Peter Schuller > > vic...@:~$ sudo ps aux | grep "cassandra" > > cassandra 11034 0.2 22.9 1107772 462764 ? Sl Dec17 6:13 > > /usr/bin/java -ea -Xms128M -Xmx512M -XX:+UseParNewGC > -XX:+UseConcMarkSweepGC > > -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 > -XX:MaxTenuringThreshold=1 > > -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly > > -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 > > -Dcom.sun.management.jmxremote.ssl=false > > -Dcom.sun.management.jmxremote.authenticate=false > > -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp > > > bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar > > org.apache.cassandra.thrift.CassandraDaemon > > > > Cassandra uses 462764 KB, roughly 460 MB, for 2 MB of data... And it keeps > > getting bigger. > > It is important to know that I have just a few inserts, but quite a lot of > reads. Also, Cassandra seems to completely ignore JVM limits such > as Xmx. > > If I don't stop and relaunch Cassandra every 15 or 20 days it simply > crashes > > due to OOM errors.
> > The resident size is not unexpected given that your Xmx is 512 MB. The > virtual may or may not be expected depending; for example, thread > stacks as previously discussed in this thread. > > If you're not seeing the *resident* set size go above the maximum heap > size, you're unlikely to be seeing the same problem. > > With respect to OOM, see > http://www.riptano.com/docs/0.6/operations/tuning - but without more > information it's difficult to know what specifically it is that you're > hitting. Are you seriously saying you're running for 15-20 days with > only 2 MB of live data? > > -- > / Peter Schuller >
Storing big objects into columns
Dear all, In a project I would like to store "big" objects in columns, serialized. For example entire images (several KB to several MB), flash animations (several MB), etc. Does anyone use Cassandra with such relatively big columns, and if so, does it work well? Are there any drawbacks to this method? Thank you, Victor K.
Re: Storing big objects into columns
Is there any recommended maximum size for a column? (Not the very upper limit, which is 2 GB.) Why is it useful to chunk the content into multiple columns? Thank you, Victor K. 2011/1/13 Ryan King > On Thu, Jan 13, 2011 at 2:38 PM, Victor Kabdebon > wrote: > > Dear all, > > In a project I would like to store "big" objects in columns, serialized. > For > > example entire images (several KB to several MB), flash animations > (several > > MB), etc. > > Does anyone use Cassandra with such relatively big columns, and if so, > does > > it work well? Are there any drawbacks to this method? > > I haven't benchmarked this myself, but I think you'll want to chunk > your content into multiple columns in the same row. > > -ryan >
Re: Storing big objects into columns
Ok, thank you very much for this information! If somebody has more insight on this matter I am still interested! Victor K. 2011/1/13 Ryan King > On Thu, Jan 13, 2011 at 2:44 PM, Victor Kabdebon > wrote: > > Is there any recommended maximum size for a column? (Not the very upper > > limit, which is 2 GB.) > > Why is it useful to chunk the content into multiple columns? > > I think you're going to have to do some tests yourself. > > You want to chunk it so that you can pseudo-stream the content. You > don't want to have to load the whole content at once. > > -ryan >
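A minimal sketch of the chunking approach Ryan describes, using the same Hector API seen elsewhere in these threads (CF name and chunk size are illustrative). Each chunk becomes one column in the object's row, so a reader can slice chunk by chunk instead of loading the whole blob at once:

import me.prettyprint.cassandra.serializers.BytesArraySerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class BlobChunkSketch {
    private static final int CHUNK_SIZE = 512 * 1024; // 512 KB per column

    static void storeChunked(Keyspace ks, String blobKey, byte[] blob) {
        Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
        int chunk = 0;
        for (int off = 0; off < blob.length; off += CHUNK_SIZE, chunk++) {
            int len = Math.min(CHUNK_SIZE, blob.length - off);
            byte[] part = new byte[len];
            System.arraycopy(blob, off, part, 0, len);
            // zero-padded column names keep the chunks in order under UTF8Type
            m.addInsertion(blobKey, "Blobs", HFactory.createColumn(
                    String.format("chunk-%06d", chunk), part,
                    StringSerializer.get(), BytesArraySerializer.get()));
        }
        m.execute();
    }
}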
Re: live data migration from mysql to cassandra
I personally did it the other way around: from Cassandra to PostgreSQL. I needed a hybrid system: Cassandra solidly holds all the data, while PostgreSQL holds less data but requests are simple and efficient (with SELECT ... WHERE). This is pretty easy once you master key browsing and iterating. I think that Cassandra is a totally different design: a Cassandra schema is tailored to one's needs, whereas *SQL is more general. So all migrations are different. Ruslan, if you are interested in "big" migrations you should check Reddit's blog or Digg's blog. They switched from *SQL to Cassandra and they hold a lot of data. Best Regards, Victor K. http://www.voxnucleus.fr 2011/1/14 Edward Capriolo > On Fri, Jan 14, 2011 at 10:40 AM, ruslan usifov > wrote: > > Hello > > > > Dear community, please share your experience: how do you make a live (without > > stopping) migration from mysql or another RDBMS to cassandra? > > > > There is no built-in way to do this. I remember hearing at Hadoop > World this year that the HBase guys have a system to read mysql slave > logs and replay them into HBase. Since all the nosql community seems to do > this maybe we can 'borrow' this idea. > > Edward >
Re: live data migration from mysql to cassandra
Gosh, sorry for the mistakes, I am tired! Victor K. 2011/1/14 Victor Kabdebon > I personally did it the other way around: from Cassandra to PostgreSQL. I > needed a hybrid system: Cassandra solidly holds all the data, while PostgreSQL > holds less data but requests are simple and efficient (with SELECT ... WHERE). > This is pretty easy once you master key browsing and iterating. > > I think that Cassandra is a totally different design: a Cassandra schema > is tailored to one's needs, whereas *SQL is more general. So all migrations are > different. Ruslan, if you are interested in "big" migrations you should > check Reddit's blog or Digg's blog. They switched from *SQL to Cassandra and > they hold a lot of data. > > Best Regards, > Victor K. > http://www.voxnucleus.fr > > 2011/1/14 Edward Capriolo > > On Fri, Jan 14, 2011 at 10:40 AM, ruslan usifov >> wrote: >> > Hello >> > >> > Dear community, please share your experience: how do you make a live (without >> > stopping) migration from mysql or another RDBMS to cassandra? >> > >> >> There is no built-in way to do this. I remember hearing at Hadoop >> world this year that the HBase guys have a system to read mysql slave >> logs and replay them into HBase. Since all the nosql community seems to do >> this maybe we can 'borrow' this idea. >> >> Edward >> > >
Re: Do you have a site in production environment with Cassandra? What client do you use?
Same here: Hector + Java. Best Regards, Victor K 2011/1/14 Ran Tavory > Java > On Jan 14, 2011 8:25 PM, "Ertio Lew" wrote: > > What is the technology stack that you use? > > > > On 1/14/11, Ran Tavory wrote: > >> I use Hector, if that counts... > >> On Jan 14, 2011 7:25 PM, "Ertio Lew" wrote: > >>> Hey, > >>> > >>> If you have a site in a production environment, or are considering one, what > >>> is the client that you use to interact with Cassandra? I know that > >>> there are several clients available out there according to the > >>> language you use, but I would love to know what clients are being used > >>> widely in production environments and are best to work with (support > >>> most required features for performance). > >>> > >>> Also preferably tell about the technology stack for your applications. > >>> > >>> Any suggestions and comments appreciated. > >>> > >>> Thanks > >>> Ertio > >> >
Re: Cassandra in less than 1G of memory?
Dear Rajat, Yes it is possible, I have the same constraints. However I must warn you: from what I see, Cassandra's memory consumption is not bounded in 0.6.X on Debian 64-bit. Here is an example of an instance launched on a node: root 19093 0.1 28.3 1210696 *570052* ? Sl Jan11 9:08 /usr/bin/java -ea -Xms128M *-Xmx512M* -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar org.apache.cassandra.thrift.CassandraDaemon Look at the second bold value: Xmx indicates the maximum memory that Cassandra can use; it is set to 512, so it should easily fit into 1 GB. Now look at the first one: 570 MB > 512 MB. Moreover, if I come back in one day the first value will be even higher, probably around 610 MB. It actually increases to the point where I need to restart Cassandra, otherwise other programs are shut down by Linux so that Cassandra can further expand its memory usage... By the way, this is a call to other Cassandra users: am I the only one to encounter this problem? Best regards, Victor K. 2011/1/14 Rajat Chopra > Hello. > > > > According to the JVM heap size topic at > http://wiki.apache.org/cassandra/MemtableThresholds , Cassandra would need > at least 1G of memory to run. Is it possible to have a running Cassandra > cluster with machines that have less than that memory… say 512M? > > I can live with slow transactions, no compactions etc., but do not want an > OutOfMemory error. The reason for a smaller bound for Cassandra is that I > want to leave room for other processes to run. > > > > Please help with specific parameters to tune. > > > > Thanks, > > Rajat > > >
Re: Cassandra in less than 1G of memory?
Hi Jonathan, hi Edward, Jonathan: but it looks like mmapping wants to consume the entire memory of my server. It goes up to 1.7 GB for a ridiculously small amount of data. Am I doing something wrong, or is there something I should change to prevent this never-ending increase in memory consumption? Edward: I am not sure, I will check that tomorrow, but my disk access mode is standard, not mmap. Anyway, thank you very much, Victor K. PS: here is the result of ps aux | grep cassandra a few hours later: root 19093 0.1 30.0 1243940 *605060* ? Sl Jan11 10:15 /usr/bin/java -ea -Xms128M *-Xmx512M* -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar org.apache.cassandra.thrift.CassandraDaemon 2011/1/15 Jonathan Ellis > mmapping only consumes memory that the OS can afford to feed it. > > On Fri, Jan 14, 2011 at 7:29 PM, Edward Capriolo > wrote: > > On Fri, Jan 14, 2011 at 2:13 PM, Victor Kabdebon > > wrote: > >> Dear Rajat, > >> > >> Yes it is possible, I have the same constraints. However I must warn you: > >> from what I see, Cassandra's memory consumption is not bounded in 0.6.X on > >> Debian 64-bit. > >> > >> Here is an example of an instance launched on a node: > >> > >> root 19093 0.1 28.3 1210696 570052 ? Sl Jan11 9:08 > >> /usr/bin/java -ea -Xms128M -Xmx512M -XX:+UseParNewGC > -XX:+UseConcMarkSweepGC > >> -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 > -XX:MaxTenuringThreshold=1 > >> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly > >> -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 > >> -Dcom.sun.management.jmxremote.ssl=false > >> -Dcom.sun.management.jmxremote.authenticate=false > >> -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp > >> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar > >> org.apache.cassandra.thrift.CassandraDaemon > >> > >> Look at the second bold value: Xmx indicates the maximum memory that > >> Cassandra can use; it is set to 512, so it should easily fit into 1 GB. > >> Now look at the first one: 570 MB > 512 MB. Moreover, if I come back in one > >> day the first value will be even higher, probably around 610 MB. It actually > >> increases to the point where I need to restart Cassandra, otherwise other > >> programs are shut down by Linux so that Cassandra can further expand its memory > >> usage... > >> > >> By the way, this is a call to other Cassandra users: am I the only one to > >> encounter this problem? > >> > >> Best regards, > >> > >> Victor K. > >> > >> 2011/1/14 Rajat Chopra > >>> > >>>
Re: cass0.7: Creating column family & Sorting
The comparator sorts only the columns inside a key; row ordering is determined by your partitioner. With RandomPartitioner, rows come back ordered by the hash of their keys, which is why the listing above looks unsorted. Best regards, Victor Kabdebon 2011/1/16 kh jo > I am having some problems with creating column families and sorting them. > > I want to create a Countries column family where I can get a sorted list of > countries (by country name). > > The following command fails: > > create column family Countries with comparator=LongType > and column_metadata=[ > {column_name: cid, validation_class: LongType, index_type: KEYS}, > {column_name: cname, validation_class: UTF8Type}, > {column_name: code, validation_class: UTF8Type, index_type: KEYS} > ]; > > IT SHOWS: 'id' could not be translated into a LongType. > > > The following works: > > create column family Countries with comparator=UTF8Type > and column_metadata=[ > {column_name: cid, validation_class: LongType, index_type: KEYS}, > {column_name: cname, validation_class: UTF8Type}, > {column_name: code, validation_class: UTF8Type, index_type: KEYS} > ]; > > > but when I insert some columns, they are not sorted as I want > > $countries = new ColumnFamily(Cassandra::con(), 'Countries'); > $countries->insert('Afghanistan', array('cid'=> '1', 'cname' => 'Afghanistan', 'code' => 'AF')); > $countries->insert('Germany', array('cid'=> '2', 'cname' => 'Germany', 'code' =>'DE')); > $countries->insert('Zimbabwe', array('cid'=> '3', 'cname' => 'Zimbabwe', 'code' =>'ZM')); > > now: > list Countries; > > shows: > --- > RowKey: Germany > => (column=cid, value=2, timestamp=1295211346716047) > => (column=cname, value=Germany, timestamp=1295211346716047) > => (column=code, value=DE, timestamp=1295211346716047) > --- > RowKey: Zimbabwe > => (column=cid, value=3, timestamp=1295211346713570) > => (column=cname, value=Zimbabwe, timestamp=1295211346713570) > => (column=code, value=ZM, timestamp=1295211346713570) > --- > RowKey: Afghanistan > => (column=cid, value=1, timestamp=1295211346709448) > => (column=cname, value=Afghanistan, timestamp=1295211346709448) > => (column=code, value=AF, timestamp=1295211346709448) > > > I don't see any sorting here?! > >
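Concretely, if you want countries back in name order without changing the partitioner, one common design (CF and row key names illustrative, using the same CLI syntax as the quoted question) is to hold them as columns of a single row, since columns within a row are always sorted by the comparator:

create column family CountriesByName with comparator=UTF8Type;

set CountriesByName['all']['Afghanistan'] = 'AF';
set CountriesByName['all']['Germany'] = 'DE';
set CountriesByName['all']['Zimbabwe'] = 'ZM';

get CountriesByName['all'];

The get returns the columns in UTF8 order of their names (Afghanistan, Germany, Zimbabwe), whatever the partitioner does with the row keys.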
Re: Cassandra in less than 1G of memory?
If it's because of swapping by Linux, wouldn't I only see the swap usage rise? Because the problem is (apart from swap getting bigger and bigger) that Cassandra's RAM consumption is going through the roof. Still, I want to give the proposed method a try. Thank you very much, Best Regards, Victor Kabdebon PS: memory consumption: root 19093 0.1 35.8 *1362108 722312* ? Sl Jan11 14:01 /usr/bin/java -ea -Xms128M -Xmx512M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar org.apache.cassandra.thrift.CassandraDaemon 2011/1/16 Aaron Morton > The OS will make its best guess as to how much memory it can give over to > mmapped files. Unfortunately it will not always make the best decision; see > the information on adding JNA and mlockall() support in cassandra 0.6.5 > http://www.datastax.com/blog/whats-new-cassandra-065 > > As Jonathan says, > try setting the disk mode to standard to see the difference. > > WRT the resident memory for the process, not all memory allocation is done > on the heap. To see the non-heap usage, connect to the process using > JConsole and take a look at the Memory tab. For example, on my box right now > Cassandra has 110M of heap memory and 20M of non-heap. AFAIK memory such as > class definitions is not included in the heap memory usage. > > Hope that helps. > Aaron > > > On 15 Jan, 2011, at 08:03 PM, Victor Kabdebon > wrote: > > Hi Jonathan, hi Edward, > > Jonathan: but it looks like mmapping wants to consume the entire memory of > my server. It goes up to 1.7 GB for a ridiculously small amount of data. > Am I doing something wrong, or is there something I should change to prevent > this never-ending increase in memory consumption? > Edward: I am not sure, I will check that tomorrow, but my disk access > mode is standard, not mmap. > > Anyway, thank you very much, > Victor K. > > PS: here is the result of ps aux | grep cassandra a few hours later: > root 19093 0.1 30.0 1243940 *605060* ?
Sl Jan11 10:15 > /usr/bin/java -ea -Xms128M *-Xmx512M* -XX:+UseParNewGC > -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 > -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 > -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError > -Dcom.sun.management.jmxremote.port=8081 > -Dcom.sun.management.jmxremote.ssl=false > -Dcom.sun.management.jmxremote.authenticate=false > -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp > bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar > org.apache.cassandra.thrift.CassandraDaemon > > > 2011/1/15 Jonathan Ellis > >> mmapping only consumes memory that the OS can afford to feed it. >> >> >
Re: Cassandra in less than 1G of memory?
Peter: what do you recommend? Using Aaron Morton's solution with JNA, or just disabling mmap? (Or is it the same thing and I missed something?) Thank you all for your advice; I am surprised to be the only one having this problem even though I'm using a pretty standard distribution. Best regards, Victor K. 2011/1/16 Peter Schuller > > bigger and bigger) that Cassandra's RAM consumption is going through > > the roof. > > mmap():ed memory will be counted as virtual address space. > > Disable mmap() and use standard I/O if you want to see how it behaves > for real; then if you want mmap() for performance you can re-enable > it. > > -- > / Peter Schuller >
Re: Secondary Index information
Dear Sasha, I am currently thinking about using secondary indexes in the future. I see two pros: _ Simplicity: it is simpler to query on a secondary index than to look up a first column and then a second. _ Consistency: with a hand-rolled inverted index, depending on where you store it, it may be unavailable to query because a node is down; or an error lets you insert the primary column, but then you crash before inserting the inverted entry. After that you cannot query it, and you have to periodically check the consistency of the data between the two columns. That's what I am doing right now for my applications, and making it simpler and more consistent would be great. Remember: I don't know the details of the implementation; I take the feature at face value, as if it worked perfectly. But I am interested in others' experiences. Best regards, Victor K. http://www.voxnucleus.fr 2011/1/28 Sasha Dolgy > Thank you. So, after reading, I'm still unsure if this feature will > afford me a larger benefit when compared to an inverted index > solution. > > Has anyone done a pros / cons? > > -sd > > > On Fri, Jan 28, 2011 at 3:22 PM, Jake Luciani wrote: > > http://www.datastax.com/blog/whats-new-cassandra-07-secondary-indexes > > > > On Fri, Jan 28, 2011 at 7:15 AM, Sasha Dolgy > wrote: > >> > >> Hi there, > >> > >> Where can I find information regarding secondary indexes? Spent the > >> past 2 days looking for some good details. > >> > >> http://wiki.apache.org/cassandra/SecondaryIndexes doesn't yet exist, > >> although it's referenced from > >> http://wiki.apache.org/cassandra/StorageConfiguration > >> > >> Trying to understand if this feature will afford me a larger benefit > >> when compared to an inverted index solution. > >> > >> Thanks in advance, > >> -sd > >> > >> -- > >> Sasha Dolgy > >> @sdolgy > >> sasha.do...@gmail.com >
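For reference, the hand-rolled inverted-index pattern being compared against looks roughly like this with Hector (a sketch; the CF names and fields are illustrative). Writing both directions through one mutator narrows, but does not eliminate, the window where the two rows can diverge, which is exactly the consistency issue described above:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class InvertedIndexSketch {
    // Store a user row and the inverted entry (email -> userId) together.
    static void storeUser(Keyspace ks, String userId, String email) {
        Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
        m.addInsertion(userId, "Users", HFactory.createStringColumn("email", email));
        m.addInsertion(email, "UsersByEmail", HFactory.createStringColumn("userId", userId));
        m.execute(); // one batch, but not atomic across the two rows
    }
}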
Re: Cassandra and count
Buddhasystem is right: a count returns the columns to the client, which counts them. My advice: do not run counts over rows or supercolumns with many columns. People on the dev team are working on distributed counters, but I don't know the current state of that work. Best regards, Victor Kabdebon http://www.voxnucleus.fr 2011/1/28 buddhasystem > > As far as I know, there are no aggregate operations built into Cassandra, > which means you'll have to retrieve all of the data to count it in the > client. I had a thread on this topic 2 weeks ago. It's pretty bad. > > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-and-count-tp5969159p5970315.html > Sent from the cassandra-u...@incubator.apache.org mailing list archive at > Nabble.com. >
Re: Using Cassandra to store files
Dear Brendan, I would really be interested in your findings too. I need a system to store various documents; I am thinking of Cassandra (which I am already using), a second type of database, or some other system entirely, maybe MogileFS as Dan suggested. Thank you, Victor Kabdebon http://www.voxnucleus.fr 2011/2/3 Dan Kuebrich > >> CouchDB >> > That's not what document-oriented means! (har har) > > I don't know all the details of your case, but with serving static files I > suspect you could do ok with something that has a much smaller memory/cpu > footprint, as you won't have as great write-throughput / read-latency > concerns. I've used mogilefs <http://www.danga.com/mogilefs/> for this > before. > > -- >> >> View this message in context: >> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-to-store-files-tp5988698p5989122.html >> Sent from the cassandra-u...@incubator.apache.org mailing list archive at >> Nabble.com. >> > >
Re: revisioned data
Hello Raj, No, that actually won't do what you want: OrderPreservingPartitioner preserves the order of the *keys*, while the ordering you care about here is by *supercolumn name* and by the column names inside each supercolumn. You can set the ordering of supercolumn names with the CF's comparator (CompareWith), and the ordering of the columns inside each supercolumn with the subcolumn comparator (CompareSubcolumnsWith; sorry, I don't remember the exact terms in current Cassandra, but that's the idea). However, and I think many will agree here, try to avoid SuperColumns. Rather than using SuperColumns, think of it like this: CF1 "ObjectStore": key = ID (long); columns = { name, other fields, update time (long [date]), ... }. CF2 "ObjectOrder": key = "myorderedobjects"; columns = { { name: identifier that can be sorted, value: object ID }, ... }. Best regards, Victor Kabdebon, http://www.voxnucleus.fr 2011/2/5 Raj Bakhru > Hi all - > > We're new to Cassandra and have read plenty on the data model, but we > wanted to poll for thoughts on how to best handle this structure. > > We have simple objects that have an ID and we want to maintain a history > of all the revisions. > > e.g. > MyObject: > ID (long) > name > other fields > update time (long [date]) > > > Any time the object changes, we'll store down a new version of the object > (same ID, but different update time and other fields). We need to be able > to query out what the object was as-of any time historically. We also need > to be able to query out what some or all of the items of this object type > were as-of any time historically. > > In SQL, we'd just find the max(id) where update time < queried_as_of_time > > In Cassandra, we were thinking of modeling as follows: > > CF: MyObjectType > Super-Column: ID of object (e.g. 625) > Column: updatetime (e.g. "1000245242") > Value: byte[] of serialized object > > We were thinking of using the OrderingPartitioner and using range queries > against the data. > > Does this make sense? Are we approaching this in the wrong way? > > Thanks a lot > > > >
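A sketch of the as-of read under a column-per-revision layout (row key = object ID, column name = update time as a long, value = serialized object; Hector API, names illustrative): a reversed slice starting at the as-of time with count 1 returns the latest revision at or before that time, which is the SQL "max(updatetime) where updatetime <= as_of" query:

import me.prettyprint.cassandra.serializers.BytesArraySerializer;
import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

public class AsOfQuerySketch {
    static byte[] asOf(Keyspace ks, String objectId, long asOfTime) {
        SliceQuery<String, Long, byte[]> q = HFactory.createSliceQuery(
                ks, StringSerializer.get(), LongSerializer.get(), BytesArraySerializer.get());
        q.setColumnFamily("ObjectStore").setKey(objectId);
        // reversed slice: walk down from asOfTime and take the first column,
        // i.e. the latest revision with update time <= asOfTime
        q.setRange(asOfTime, null, true, 1);
        ColumnSlice<Long, byte[]> slice = q.execute().get();
        return slice.getColumns().isEmpty() ? null : slice.getColumns().get(0).getValue();
    }
}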
Re: unique key generation
Hello Kallin. If you use TimeUUIDs, the chance of generating the same UUID twice is the following: assuming both clients generate the UUID in the *same millisecond*, the chance of a collision is 1 in 1.84467441 × 10^19 (that is, 1 in 2^64), which is like winning a national lottery every day for 10^11 days in a row (about 270 million years). If you do get a collision, you really should play the lottery :). Best regards, Victor Kabdebon http://www.voxnucleus.fr 2011/2/7 Kallin Nagelberg > Hey, > > I am developing a session management system using Cassandra and need > to generate unique session IDs (cassandra columnfamily keys). Does > anyone know of an elegant/simple way to accomplish this? I am not sure > about using time based uuids on the client as there is a chance that > multiple clients could generate the same ID. I've heard suggestions of > using zookeeper as a source for the IDs, but was just hoping that > there might be something simpler for my purposes. > > Thanks, > -Kal >
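For completeness, a TimeUUID can be generated client-side like this (a sketch assuming Hector's TimeUUIDUtils helper; any version-1 UUID generator works the same way):

import java.util.UUID;

import me.prettyprint.cassandra.utils.TimeUUIDUtils;

public class SessionIdSketch {
    // Version-1 (time-based) UUID: a 60-bit timestamp plus clock-sequence and
    // node bits, so a collision between two clients requires the same instant
    // AND the same 64-bit tail, which is the probability discussed above.
    static UUID newSessionId() {
        return TimeUUIDUtils.getUniqueTimeUUIDinMillis();
    }
}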
Cassandra memory consumption
Dear all, Sorry to come back to this point again, but I am really worried about Cassandra's memory consumption. I have a single machine that runs one Cassandra server. There is almost no data on it, but I see crazy memory consumption that takes no notice of the JVM settings at all... Note that I am not using mmap but "standard" disk access mode, I also use JNA (in the lib folder), and I am running on Debian 5 64-bit, so a pretty normal configuration. I also use Cassandra 0.6.8. Here is the information I gathered on Cassandra: 105 16765 0.1 34.1 1089424 *687476* ? Sl Feb02 14:58 /usr/bin/java -ea *-Xms128M* *-Xmx256M* -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar org.apache.cassandra.thrift.CassandraDaemon result of nodetool info: 116024732779488843382476400091948985708 *Load: 1,94 MB* Generation No: 1296673772 Uptime (seconds): 467550 *Heap Memory (MB): 120,26 / 253,94* I have about 21 column families, none of which hold much data (as you can see, I have 2 MB of text, which is really small). Even with Xmx set to 256, 687 MB of memory is used. Where does this memory come from? Bad garbage collection? Something I am missing? Thank you for your help; I really need to get rid of this problem. Best regards, Victor Kabdebon
Re: Cassandra memory consumption
It is really weird that I am the only one to have this issue. I restarted Cassandra today and the memory consumption is already over the limit: root 1739 4.0 24.5 664968 *494996* pts/4 SLl 15:51 0:12 /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=bin/../conf -cp bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar org.apache.cassandra.thrift.CassandraDaemon It is a really annoying problem if we cannot foresee memory consumption. Best regards, Victor K 2011/2/8 Victor Kabdebon > Dear all, > > Sorry to come back to this point again, but I am really worried about > Cassandra's memory consumption. I have a single machine that runs one > Cassandra server. There is almost no data on it, but I see crazy memory > consumption that takes no notice of the JVM settings at all... > Note that I am not using mmap but "standard" disk access mode, I also use JNA (in the lib > folder), and I am running on Debian 5 64-bit, so a pretty normal configuration. > I also use Cassandra 0.6.8. > > > Here is the information I gathered on Cassandra: > > 105 16765 0.1 34.1 1089424 *687476* ? Sl Feb02 14:58 > /usr/bin/java -ea *-Xms128M* *-Xmx256M* -XX:+UseParNewGC > -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 > -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 > -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError > -Dcom.sun.management.jmxremote.port=8081 > -Dcom.sun.management.jmxremote.ssl=false > -Dcom.sun.management.jmxremote.authenticate=false > -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp > bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar > org.apache.cassandra.thrift.CassandraDaemon > > result of nodetool info: > > 116024732779488843382476400091948985708 > *Load: 1,94 MB* > Generation No: 1296673772 > Uptime (seconds): 467550 > *Heap Memory (MB): 120,26 / 253,94* > > > I have about 21 column families, none of which hold much data (as > you can see, I have 2 MB of text, which is really small). Even with Xmx set at > 256, 687 MB of memory is used. Where does this memory come from? Bad > garbage collection? Something I am missing? > Thank you for your help; I really need to get rid of this problem. > > Best regards, > Victor Kabdebon >
Re: Cassandra memory consumption
Sorry Jonathan: most of this information was obtained using the command: sudo ps aux | grep cassandra For the nodetool information it is: /bin/nodetool --host localhost --port 8081 info Regards, Victor K. 2011/2/8 Jonathan Ellis > I missed the part where you explained where you're getting your numbers > from. > > On Tue, Feb 8, 2011 at 9:32 AM, Victor Kabdebon > wrote: > > It is really weird that I am the only one to have this issue. > > I restarted Cassandra today and the memory consumption is already over the > > limit: > > > > root 1739 4.0 24.5 664968 494996 pts/4 SLl 15:51 0:12 > > /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC > -XX:+UseConcMarkSweepGC > > -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 > -XX:MaxTenuringThreshold=1 > > -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly > > -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 > > -Dcom.sun.management.jmxremote.ssl=false > > -Dcom.sun.management.jmxremote.authenticate=false > > -Dstorage-config=bin/../conf -cp > > bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar > > org.apache.cassandra.thrift.CassandraDaemon > > > > It is a really annoying problem if we cannot foresee memory > > consumption. > > > > Best regards, > > Victor K > > > > 2011/2/8 Victor Kabdebon > >> > >> Dear all, > >> > >> Sorry to come back to this point again, but I am really worried about > >> Cassandra's memory consumption. I have a single machine that runs one > >> Cassandra server. There is almost no data on it, but I see crazy memory > >> consumption that takes no notice of the JVM settings at all... > >> Note that I am not using mmap but "standard" disk access mode, I also use JNA (inside the lib > >> folder), and I am running on Debian 5 64-bit, so a pretty normal > configuration. > >> I also use Cassandra 0.6.8. > >> > >> > >> Here is the information I gathered on Cassandra: > >> > >> 105 16765 0.1 34.1 1089424 687476 ? Sl Feb02 14:58 > >> /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC > -XX:+UseConcMarkSweepGC > >> -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 > -XX:MaxTenuringThreshold=1 > >> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly > >> -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 > >> -Dcom.sun.management.jmxremote.ssl=false > >> -Dcom.sun.management.jmxremote.authenticate=false > >> -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp > >> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar > >> org.apache.cassandra.thrift.CassandraDaemon > >> > >> result of nodetool info: > >> > >> 116024732779488843382476400091948985708 > >> Load: 1,94 MB > >> Generation No: 1296673772 > >> Uptime (seconds): 467550 > >> Heap Memory (MB): 120,26 / 253,94 > >>
Re: Cassandra memory consumption
Information on the system : Debian 5 JVM : victor@testhost:~/database/apache-cassandra-0.6.6$ java -version java version "1.6.0_22" Java(TM) SE Runtime Environment (build 1.6.0_22-b04) Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode) RAM : 2 GB 2011/2/8 Victor Kabdebon > Sorry Jonathan : > > So most of this information was taken using the command : > > sudo ps aux | grep cassandra > > For the nodetool information it is : > > /bin/nodetool --host localhost --port 8081 info > > > Regards, > > Victor K. > > > 2011/2/8 Jonathan Ellis > > I missed the part where you explained where you're getting your numbers >> from. >> >> On Tue, Feb 8, 2011 at 9:32 AM, Victor Kabdebon >> wrote: >> > It is really weird that I am the only one to have this issue. >> > I restarted Cassandra today and already the memory consumption is over the >> > limit : >> > >> > root 1739 4.0 24.5 664968 494996 pts/4 SLl 15:51 0:12 >> > /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC >> -XX:+UseConcMarkSweepGC >> > -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 >> -XX:MaxTenuringThreshold=1 >> > -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly >> > -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 >> > -Dcom.sun.management.jmxremote.ssl=false >> >> > -Dcom.sun.management.jmxremote.authenticate=false >> > -Dstorage-config=bin/../conf -cp >> > >> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar >> >> > org.apache.cassandra.thrift.CassandraDaemon >> > >> > It is really an annoying problem if we cannot really foresee memory >> > consumption. >> > >> > Best regards, >> > Victor K >> > >> > 2011/2/8 Victor Kabdebon >> >> >> >> Dear all, >> >> >> >> Sorry to come back again to this point but I am really worried about >> >> Cassandra memory consumption. I have a single machine that runs one >> >> Cassandra server. There is almost no data on it but I see a crazy >> memory >> >> consumption and it doesn't respect the configured limits at all... >> >> Note that I am not using mmap but "standard" disk access, and I also use JNA (inside the >> lib >> >> folder); I am running on Debian 5 64-bit, so a pretty normal >> configuration. >> >> I also use Cassandra 0.6.8. >> >> >> >> >> >> Here is the information I gathered on Cassandra : >> >> >> >> 105 16765 0.1 34.1 1089424 687476 ? 
Sl Feb02 14:58 >> >> /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC >> -XX:+UseConcMarkSweepGC >> >> -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 >> -XX:MaxTenuringThreshold=1 >> >> -XX:CMSInitiatingOccupancyFraction=75 >> -XX:+UseCMSInitiatingOccupancyOnly >> >> -XX:+HeapDumpOnOutOfMemoryError >> -Dcom.sun.management.jmxremote.port=8081 >> >> -Dcom.sun.management.jmxremote.ssl=false >> >> -Dcom.sun.management.jmxremote.authenticate=false >> >> -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp >> >> >> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.
Re: Cassandra memory consumption
I will do that in the future and I will post my results here (I upgraded the server to Debian 6 to see if there is any change, so memory is back to normal). I will report in a few days. In the meantime I am open to any suggestion... 2011/2/8 Aaron Morton > When you attach to the JVM with JConsole how much non-heap memory and how > much heap memory is reported on the memory tab? > > Xmx controls the total size of the heap memory, which excludes the > permanent generation. > see > > http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#generation_sizing > and > > http://blogs.sun.com/jonthecollector/entry/presenting_the_permanent_generation > > Total non-heap memory on a 0.7 box I have is around 27M. Your numbers seem > large but it would be interesting to know what the JVM is reporting. > > Aaron > > On 09 Feb, 2011, at 05:57 AM, Victor Kabdebon > wrote: > > Information on the system : > > Debian 5 > JVM : > victor@testhost:~/database/apache-cassandra-0.6.6$ java -version > java version "1.6.0_22" > Java(TM) SE Runtime Environment (build 1.6.0_22-b04) > Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode) > > RAM : 2 GB > > > 2011/2/8 Victor Kabdebon > >> Sorry Jonathan : >> >> So most of this information was taken using the command : >> >> sudo ps aux | grep cassandra >> >> For the nodetool information it is : >> >> /bin/nodetool --host localhost --port 8081 info >> >> >> Regards, >> >> Victor K. >> >> >> 2011/2/8 Jonathan Ellis >> >> >> I missed the part where you explained where you're getting your numbers >>> from. >>> >>> >>> On Tue, Feb 8, 2011 at 9:32 AM, Victor Kabdebon >>> wrote: >>> > It is really weird that I am the only one to have this issue. 
>>> > I restarted Cassandra today and already the memory consumption is over >>> the >>> > limit : >>> > >>> > root 1739 4.0 24.5 664968 494996 pts/4 SLl 15:51 0:12 >>> > /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC >>> -XX:+UseConcMarkSweepGC >>> > -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 >>> -XX:MaxTenuringThreshold=1 >>> > -XX:CMSInitiatingOccupancyFraction=75 >>> -XX:+UseCMSInitiatingOccupancyOnly >>> > -XX:+HeapDumpOnOutOfMemoryError >>> -Dcom.sun.management.jmxremote.port=8081 >>> > -Dcom.sun.management.jmxremote.ssl=false >>> >>> > -Dcom.sun.management.jmxremote.authenticate=false >>> > -Dstorage-config=bin/../conf -cp >>> > >>> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar >>> >>> > org.apache.cassandra.thrift.CassandraDaemon >>> > >>> > It is really an annoying problem if we cannot really foresee memory >>> > consumption. >>> > >>> > Best regards, >>> > Victor K >>> > >>> > 2011/2/8 Victor Kabdebon >>> >> >>> >> Dear all, >>> >> >>> >> Sorry to come back again to this point but I am really worried about >>> >> Cassandra memory consumption. I have a single machine that runs one >>> >> Cassandra server. There is almost no data on it but I see a crazy >>> memory >>> >> consumption and it doesn't respect the configured limits at all... >>> >> Note that I am not using mmap but "standard" disk access, and I also use JNA (inside >>> the lib >>> >> folder); I am running on Debian 5 64-bit, so a pretty normal >>> configuration.
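A side note on Aaron's JConsole suggestion: if JConsole cannot connect, the same heap / non-heap split it displays can be read from the standard java.lang.management beans. A minimal, self-contained sketch (plain JDK code, not part of Cassandra; run it in-process, or adapt it to attach to Cassandra's JMX port, otherwise it reports on its own JVM):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class MemoryReport {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();       // bounded by -Xmx
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage(); // perm gen etc., NOT bounded by -Xmx
        // Note: getMax() may return -1 when no limit is defined.
        System.out.printf("heap:     used=%dM committed=%dM max=%dM%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
        System.out.printf("non-heap: used=%dM committed=%dM max=%dM%n",
                nonHeap.getUsed() >> 20, nonHeap.getCommitted() >> 20, nonHeap.getMax() >> 20);
    }
}

Whatever ps reports on top of heap plus non-heap comes from thread stacks, JNI/NIO buffers and, with mmap disk access, memory-mapped files.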
Re: Cassandra memory consumption
Yes I have, but I have to add that this is a server with so little data (2.0 MB of text, roughly a book) that even if there were an overhead due to those things it would be minimal. I don't understand what is eating up all that memory; is it because Linux has difficulty reclaiming used memory...? I really am puzzled. (By the way, it is not an Amazon EC2 server; this is a dedicated server.) Regards, Victor K. 2011/2/8 Edward Capriolo > On Tue, Feb 8, 2011 at 4:56 PM, Victor Kabdebon > wrote: > > I will do that in the future and I will post my results here (I upgraded > > the server to Debian 6 to see if there is any change, so memory is back > to > > normal). I will report in a few days. > > In the meantime I am open to any suggestion... > > > > 2011/2/8 Aaron Morton > >> > >> When you attach to the JVM with JConsole how much non-heap memory and > how > >> much heap memory is reported on the memory tab? > >> Xmx controls the total size of the heap memory, which excludes the > >> permanent generation. > >> see > >> > >> http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#generation_sizing > >> and > >> > >> http://blogs.sun.com/jonthecollector/entry/presenting_the_permanent_generation > >> Total non-heap memory on a 0.7 box I have is around 27M. Your numbers > seem > >> large but it would be interesting to know what the JVM is reporting. > >> Aaron > >> On 09 Feb, 2011, at 05:57 AM, Victor Kabdebon > > >> wrote: > >> > >> Information on the system : > >> > >> Debian 5 > >> JVM : > >> victor@testhost:~/database/apache-cassandra-0.6.6$ java -version > >> java version "1.6.0_22" > >> Java(TM) SE Runtime Environment (build 1.6.0_22-b04) > >> Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode) > >> > >> RAM : 2 GB > >> > >> > >> 2011/2/8 Victor Kabdebon > >>> > >>> Sorry Jonathan : > >>> > >>> So most of this information was taken using the command : > >>> > >>> sudo ps aux | grep cassandra > >>> > >>> For the nodetool information it is : > >>> > >>> /bin/nodetool --host localhost --port 8081 info > >>> > >>> > >>> Regards, > >>> > >>> Victor K. > >>> > >>> > >>> 2011/2/8 Jonathan Ellis > >>> > >>>> I missed the part where you explained where you're getting your > numbers > >>>> from. > >>>> > >>>> > >>>> On Tue, Feb 8, 2011 at 9:32 AM, Victor Kabdebon > >>>> wrote: > >>>> > It is really weird that I am the only one to have this issue. 
> >>>> > I restarted Cassandra today and already the memory consumption is over > >>>> > the > >>>> > limit : > >>>> > > >>>> > root 1739 4.0 24.5 664968 494996 pts/4 SLl 15:51 0:12 > >>>> > /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC > >>>> > -XX:+UseConcMarkSweepGC > >>>> > -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 > >>>> > -XX:MaxTenuringThreshold=1 > >>>> > -XX:CMSInitiatingOccupancyFraction=75 > >>>> > -XX:+UseCMSInitiatingOccupancyOnly > >>>> > -XX:+HeapDumpOnOutOfMemoryError > >>>> > -Dcom.sun.management.jmxremote.port=8081 > >>>> > -Dcom.sun.management.jmxremote.ssl=false > >>>> > -Dcom.sun.management.jmxremote.authenticate=false > >>>> > -Dstorage-config=bin/../conf -cp > >>>> > > >>>> > bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:b
Re: unique key generation
Yes, I made a mistake, I know! But I hoped nobody would notice :). What I computed is the odds of winning 3 days in a row (a standard probability mistake). Still, it is totally unlikely. Sorry about this mistake, Best regards, Victor K.
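For readers landing on this thread without the earlier messages: collision odds for randomly generated keys are usually sanity-checked with the birthday approximation, P(collision) ~ 1 - e^(-n(n-1)/2N) for n keys drawn from a space of size N. A quick sketch with hypothetical numbers (both n and N here are made up for illustration, not taken from the thread):

public class KeyCollisionOdds {
    public static void main(String[] args) {
        double n = 1e6;              // hypothetical: one million generated keys
        double N = Math.pow(2, 64);  // hypothetical: 64-bit random key space
        // Birthday approximation: P ~= 1 - exp(-n(n-1) / 2N)
        double p = 1 - Math.exp(-n * (n - 1) / (2 * N));
        System.out.printf("P(at least one collision) ~= %.3e%n", p);
    }
}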
Re: online chat scenario
Hello Sasha. In this sort of real-time application, the way you insert (QUORUM, ONE, etc.) and the way you retrieve are extremely important, because your data may not have had time to propagate to all your nodes. Be sure to use adequate consistency levels: write to enough nodes to be safe, but don't sacrifice too much time doing it, or you lose the real-time component. Here is a presentation of how chat is built at Facebook; it may be useful to you : http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf It's more focused on Erlang, but it might give you ideas on how to deal with that problem (I am not sure that databases are the best way to deal with it... but it's just my opinion). Victor Kabdebon http://www.voxnucleus.fr 2011/2/15 Sasha Dolgy > thanks for the response. thinking about this, this would not allow for the > sorting of messages into a chronological order for end user display. i had > thought about having each message as its own column against the room or the > user, but i have had some inconsistencies in retrieving the data. sometimes > i get 3 columns, sometimes i get 50... (i think this is because of the > random partitioner) > > i had thought about this structure: > > [messages][nickname][message id => message data] > [chatrooms][room_name][message id] > > this way i can pull all messages a user ever posted, not specific to a > room. what i haven't been able to do so far is print the timestamp on the > row or column. does this have to be explicitly added somewhere or can it be > returned as part of a 'get' request? > > -sd > > > On Tue, Feb 15, 2011 at 2:12 PM, Michal Augustýn < > augustyn.mic...@gmail.com> wrote: > >> The schema design depends on chatrooms/users/messages numbers. I.e. you >> can have one CF, where key is chatroom, column name is username, column >> value is the message and message time is the same as column timestamp. >> You can add day-timestamp to the chatroom name to avoid large rows. >> >> Augi >> >> 2011/2/15 Andrey V. Panov >> >> I never did it. But I suppose you can use "chatroom name" as key and store >>> messages & nicks as columns in JSON and timestamp as columnName. >>> >> >> > > > -- > Sasha Dolgy > sasha.do...@gmail.com >
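The layouts suggested above share one key point: if the column name encodes the message time, Cassandra returns the columns of a row already sorted, which is exactly the chronological order Sasha is missing. A toy model of one chatroom row (a local sketch of the column layout only, assuming LongType column names; this is not Hector or Thrift client code):

import java.util.Map;
import java.util.TreeMap;

public class ChatRoomRowSketch {
    public static void main(String[] args) {
        // One row: key = chatroom name; column name = timestamp (millis);
        // column value = message payload (JSON carrying nick and text).
        // A TreeMap mimics Cassandra's sorted columns under LongType.
        TreeMap<Long, String> room = new TreeMap<>();
        room.put(1297775530000L, "{\"nick\":\"victor\",\"msg\":\"hello\"}");
        room.put(1297775520000L, "{\"nick\":\"sasha\",\"msg\":\"hi\"}");
        // A column slice comes back oldest-first, regardless of insert order.
        for (Map.Entry<Long, String> col : room.entrySet()) {
            System.out.println(col.getKey() + " -> " + col.getValue());
        }
    }
}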
Re: Subscribe
Looks like your wish has been granted. 2011/2/15 Chris Goffinet > I would like to subscribe to your newsletter. > > On Tue, Feb 15, 2011 at 8:04 AM, A J wrote: > >> >> >
Re: Cassandra memory consumption
Yes, I didn't see there were 2 different parameters. I was personally setting (in Cassandra 0.6.6) MemtableThroughputInMB, but I don't know what BinaryMemtableThroughputInMB is. And I take this opportunity to ask a question: if you have a small amount of data per key, so that your memtable is only a few KB big, is the memory footprint of the memtable going to be MemtableThroughputInMB MB, or a few KB plus overhead? Ruslan, I have seen your question in the other mail and I have the same problem. How many CFs do you have? 2011/2/16 ruslan usifov > > Each of your 21 column families will have its own memtable; if you have >> the default memtable settings your memory usage will grow quite large >> over time. Have you tuned down your memtable size? >> > > Which config parameter makes this? binary_memtable_throughput_in_mb? >
Re: Cassandra memory consumption
Someone please correct me if I am wrong, but I think the overhead you can expect is something like: 16 * MemtableThroughputInMB, but I don't know when BinaryMemtableThroughputInMB comes into account... 2011/2/16 ruslan usifov > > > 2011/2/16 Victor Kabdebon > > >> >> Ruslan, I have seen your question in the other mail and I have the same >> problem. How many CFs do you have? >> >> >> 16 >
Re: Cassandra memory consumption
Thanks Robert, and do you know if there is a way to control the maximum likely number of memtables? (I'd like to cap it at 2.) 2011/2/16 Robert Coli > On Wed, Feb 16, 2011 at 7:12 AM, Victor Kabdebon > wrote: > > Someone please correct me if I am wrong, but I think the overhead you can > > expect is something like : > > > > MemtableThroughputInMB * <JavaOverHeadFudgeFactor> * <number of such memtables which might exist at once, due to flushing logic> > > JavaOverHeadFudgeFactor is "at least 2". > > The maximum likely number of such memtables is usually roughly "3" > when considered across an assortment of columnfamilies with different > write patterns. > > > but I don't know when BinaryMemtableThroughputInMB comes into account... > > BinaryMemtable options are only considered when using the Binary > Memtable interface. If you don't know what that is, you're not using > it. > > =Rob >
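Putting Robert's rule together with the numbers from this thread gives a rough idea of why a small dataset can still need a large heap. A back-of-envelope sketch (the per-CF throughput value is hypothetical; plug in your own storage-conf.xml settings):

public class MemtableHeapEstimate {
    public static void main(String[] args) {
        int columnFamilies = 21;      // as reported earlier in this thread
        int throughputMb = 64;        // hypothetical MemtableThroughputInMB per CF
        int javaOverheadFudge = 2;    // "at least 2" per Robert Coli
        int memtablesAtOnce = 3;      // "usually roughly 3" due to flushing logic
        long worstCaseMb = (long) columnFamilies * throughputMb
                * javaOverheadFudge * memtablesAtOnce;
        System.out.println("worst-case memtable heap ~= " + worstCaseMb + " MB");
    }
}

With values like these, 21 column families can in principle claim several GB of heap, which is why tuning the per-CF thresholds down matters so much on small heaps.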
Re: memory consumption
Is it possible to change the maximum JVM heap memory used in 0.6.X? 2011/2/17 Aaron Morton > What are you using for disk_access_mode ? > Have you tried reducing the JVM heap size? > Have you added the jna.jar file to lib/ ? This will allow Cassandra to lock > the JVM memory. > > > Aaron > > > On 17/02/2011, at 9:20 PM, ruslan usifov wrote: > > > > 2011/2/16 Aaron Morton < aa...@thelastpickle.com> > >> JVM heap memory is controlled by the settings in conf/cassandra-env.sh >> >> Memory-mapped files will use additional virtual memory; this is controlled in >> conf/cassandra.yaml via disk_access_mode >> >> > And??? The JVM heap in Cassandra 0.7 is by default half of system memory, > in my case 4 GB; here is a part of cassandra-env.sh: > > calculate_heap_size() > { > case "`uname`" in > Linux) > system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'` > MAX_HEAP_SIZE=$((system_memory_in_mb / 2))M > return 0 > ;; > FreeBSD) > system_memory_in_bytes=`sysctl hw.physmem | awk '{print $2}'` > MAX_HEAP_SIZE=$((system_memory_in_bytes / 1024 / 1024 / 2))M > return 0 > ;; > *) > MAX_HEAP_SIZE=1024M > return 1 > ;; > esac > } > > > > I left all these options at their defaults. All my nodes have 8 GB of memory, and I > am afraid that after some time all my nodes go into hard swap, and only a reboot > helps them :-((( > > PS: am I right to understand that occasional downtime of Cassandra is normal? > >
Re: memory consumption
Oh right, but Cassandra doesn't really respect that; I thought there was another option to set it. Just for your information, I set Xms and Xmx very low with a small amount of data. I am waiting to be able to connect JConsole; I don't know why it is not reachable at the moment. Here is my result : 105 26115 0.2 27.3 1125328 755316 ? Sl Feb09 23:58 /usr/bin/java -ea -Xms64M -Xmx128M 2011/2/17 Aaron Morton > bin/cassandra.in.sh > set Xms and Xmx in the JVM_OPTS > > Aaron > > > On 18 Feb, 2011, at 09:10 AM, Victor Kabdebon > wrote: > > Is it possible to change the maximum JVM heap memory used in 0.6.X? > > 2011/2/17 Aaron Morton > >> What are you using for disk_access_mode ? >> Have you tried reducing the JVM heap size? >> Have you added the jna.jar file to lib/ ? This will allow Cassandra to >> lock the JVM memory. >> >> >> Aaron >> >> >> >> On 17/02/2011, at 9:20 PM, ruslan usifov wrote: >> >> >> >> >> >> 2011/2/16 Aaron Morton < aa...@thelastpickle.com >> > >> >>> JVM heap memory is controlled by the settings in conf/cassandra-env.sh >>> >>> Memory-mapped files will use additional virtual memory; this is controlled in >>> conf/cassandra.yaml via disk_access_mode >>> >>> >> And??? The JVM heap in Cassandra 0.7 is by default half of system memory, >> in my case 4 GB; here is a part of cassandra-env.sh: >> >> calculate_heap_size() >> { >> case "`uname`" in >> Linux) >> system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'` >> MAX_HEAP_SIZE=$((system_memory_in_mb / 2))M >> return 0 >> ;; >> FreeBSD) >> system_memory_in_bytes=`sysctl hw.physmem | awk '{print $2}'` >> MAX_HEAP_SIZE=$((system_memory_in_bytes / 1024 / 1024 / 2))M >> return 0 >> ;; >> *) >> MAX_HEAP_SIZE=1024M >> return 1 >> ;; >> esac >> } >> >> >> >> I left all these options at their defaults. All my nodes have 8 GB of memory, and >> I am afraid that after some time all my nodes go into hard swap, and only a reboot >> helps them :-((( >> >> PS: am I right to understand that occasional downtime of Cassandra is normal? >> >>
Re: memory consumption
Sorry, I forgot to say that this is the partial output of : ps aux | grep cassandra Best regards 2011/2/17 Victor Kabdebon > Oh right, but Cassandra doesn't really respect that; I thought there was > another option to set it. > > Just for your information, I set Xms and Xmx very low with a small amount > of data. I am waiting to be able to connect JConsole; I don't know why it is > not reachable at the moment. Here is my result : > > > 105 26115 0.2 27.3 1125328 755316 ? Sl Feb09 23:58 > /usr/bin/java -ea -Xms64M -Xmx128M > > > 2011/2/17 Aaron Morton > >> bin/cassandra.in.sh >> set Xms and Xmx in the JVM_OPTS >> >> Aaron >> >> >> On 18 Feb, 2011, at 09:10 AM, Victor Kabdebon >> wrote: >> >> Is it possible to change the maximum JVM heap memory used in 0.6.X? >> >> 2011/2/17 Aaron Morton >> >>> What are you using for disk_access_mode ? >>> Have you tried reducing the JVM heap size? >>> Have you added the jna.jar file to lib/ ? This will allow Cassandra to >>> lock the JVM memory. >>> >>> >>> Aaron >>> >>> >>> >>> On 17/02/2011, at 9:20 PM, ruslan usifov >>> wrote: >>> >>> >>> >>> >>> >>> 2011/2/16 Aaron Morton < >>> aa...@thelastpickle.com> >>> >>>> JVM heap memory is controlled by the settings in conf/cassandra-env.sh >>>> >>>> Memory-mapped files will use additional virtual memory; this is controlled in >>>> conf/cassandra.yaml via disk_access_mode >>>> >>>> >>> And??? The JVM heap in Cassandra 0.7 is by default half of system memory, >>> in my case 4 GB; here is a part of cassandra-env.sh: >>> >>> calculate_heap_size() >>> { >>> case "`uname`" in >>> Linux) >>> system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'` >>> MAX_HEAP_SIZE=$((system_memory_in_mb / 2))M >>> return 0 >>> ;; >>> FreeBSD) >>> system_memory_in_bytes=`sysctl hw.physmem | awk '{print $2}'` >>> MAX_HEAP_SIZE=$((system_memory_in_bytes / 1024 / 1024 / 2))M >>> return 0 >>> ;; >>> *) >>> MAX_HEAP_SIZE=1024M >>> return 1 >>> ;; >>> esac >>> } >>> >>> >>> >>> I left all these options at their defaults. All my nodes have 8 GB of memory, and >>> I am afraid that after some time all my nodes go into hard swap, and only a reboot >>> helps them :-((( >>> >>> PS: am I right to understand that occasional downtime of Cassandra is normal? >>> >>> >> >
Re: memory consumption
Already done. The disk access mode is standard in storage-conf.xml (I am using 0.6.6 at the moment; I will upgrade to 0.7.x later). But this memory consumption is a real issue. 2011/2/17 Aaron Morton > Looks like you are using virtual memory for memory-mapped files. Change the > disk_access_mode to standard if you want to reduce the overall memory > usage. > > Aaron > > On 18 Feb, 2011, at 09:34 AM, Victor Kabdebon > wrote: > > Sorry, I forgot to say that this is the partial output of : > ps aux | grep cassandra > > Best regards > > 2011/2/17 Victor Kabdebon > >> Oh right, but Cassandra doesn't really respect that; I thought there was >> another option to set it. >> >> Just for your information, I set Xms and Xmx very low with a small amount >> of data. I am waiting to be able to connect JConsole; I don't know why it is >> not reachable at the moment. Here is my result : >> >> >> 105 26115 0.2 27.3 1125328 755316 ? Sl Feb09 23:58 >> /usr/bin/java -ea -Xms64M -Xmx128M >> >> >> >> 2011/2/17 Aaron Morton >> >>> bin/cassandra.in.sh >>> set Xms and Xmx in the JVM_OPTS >>> >>> Aaron >>> >>> >>> >>> On 18 Feb, 2011, at 09:10 AM, Victor Kabdebon >>> wrote: >>> >>> >>> Is it possible to change the maximum JVM heap memory used in 0.6.X? >>> >>> 2011/2/17 Aaron Morton >>> >>>> What are you using for disk_access_mode ? >>>> Have you tried reducing the JVM heap size? >>>> Have you added the jna.jar file to lib/ ? This will allow Cassandra to >>>> lock the JVM memory. >>>> >>>> >>>> Aaron >>>> >>>> >>>> >>>> On 17/02/2011, at 9:20 PM, ruslan usifov >>>> wrote: >>>> >>>> >>>> >>>> >>>> >>>> 2011/2/16 Aaron Morton < >>>> aa...@thelastpickle.com> >>>> >>>>> JVM heap memory is controlled by the settings in conf/cassandra-env.sh >>>>> >>>>> Memory-mapped files will use additional virtual memory; this is controlled >>>>> in conf/cassandra.yaml via disk_access_mode >>>>> >>>>> >>>> And??? The JVM heap in Cassandra 0.7 is by default half of system memory, >>>> in my case 4 GB; here is a part of cassandra-env.sh: >>>> >>>> calculate_heap_size() >>>> { >>>> case "`uname`" in >>>> Linux) >>>> system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'` >>>> MAX_HEAP_SIZE=$((system_memory_in_mb / 2))M >>>> return 0 >>>> ;; >>>> FreeBSD) >>>> system_memory_in_bytes=`sysctl hw.physmem | awk '{print >>>> $2}'` >>>> MAX_HEAP_SIZE=$((system_memory_in_bytes / 1024 / 1024 / 2))M >>>> return 0 >>>> ;; >>>> *) >>>> MAX_HEAP_SIZE=1024M >>>> return 1 >>>> ;; >>>> esac >>>> } >>>> >>>> >>>> >>>> I left all these options at their defaults. All my nodes have 8 GB of memory, >>>> and I am afraid that after some time all my nodes go into hard swap, and only >>>> a reboot helps them :-((( >>>> >>>> PS: am I right to understand that occasional downtime of Cassandra is normal? >>>> >>>> >>> >> >
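Much of the confusion in this thread comes from comparing unlike numbers: ps measures the whole process, while -Xmx bounds only the Java heap. A tiny JDK sketch to see what the JVM itself thinks its ceiling is (illustrative code, not Cassandra's; run it with your own -Xmx and compare against the ps output):

public class HeapCeiling {
    public static void main(String[] args) {
        // -Xmx caps only the Java heap. The VSZ/RSS columns from ps also
        // count thread stacks, the permanent generation, JNI and NIO
        // buffers, and (with disk_access_mode mmap) memory-mapped SSTables.
        Runtime rt = Runtime.getRuntime();
        System.out.printf("maxMemory   (-Xmx ceiling) = %dM%n", rt.maxMemory() >> 20);
        System.out.printf("totalMemory (committed)    = %dM%n", rt.totalMemory() >> 20);
        System.out.printf("freeMemory  (of committed) = %dM%n", rt.freeMemory() >> 20);
    }
}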
Re: Abnormal memory consumption
And for production, is 7 GB of RAM sufficient, or is 11 GB the minimum? Thank you for your input on the JVM; I'll try to tune that. 2011/4/4 Peter Schuller > > You can change VM settings and tweak things like memtable thresholds > > and in-memory compaction limits to get it down and get away with a > > smaller heap size, but honestly I don't recommend doing so unless > > you're willing to spend some time getting that right and probably > > repeating some of the work in the future with future versions of > > Cassandra. > > That said, if you do want to do so to give it a try, I suggest (1) > changing cassandra-env to remove all the GC stuff: > > JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC" > JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC" > JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled" > JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8" > JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1" > JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75" > JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly" > > And then setting a fixed heap size, and removing the manual fixation of new > gen: > > JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}" > > Then maybe remove the initial heap size enforcement, but that might > not help depending: > > JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}" > > And then go through cassandra.yaml and tune down all the various > limitations. Less concurrent readers/writers, all the *_mb_* settings > way down, and the RPC framing limitations. > > But let me re-iterate: I don't recommend running in any such > configuration in production. But if you just want it running for > testing/for just being available, with no special requirements, and > not in production, the above might work. I haven't really tested it > myself; there may be gotchas involved. > > -- > / Peter Schuller >
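If you do experiment with Peter's stripped-down GC flags, watch the collectors while you test: the counts and pause totals that JConsole graphs are also available directly from the management beans. A minimal sketch (plain JDK code, reporting on whichever JVM it runs in, so run it in-process or attach via JMX):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcReport {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            // One line per collector (e.g. ParNew and ConcurrentMarkSweep
            // with the flags discussed above).
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: collections=%d totalPause=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(10_000); // print every 10 seconds
        }
    }
}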
Re: database design
Dear Jean-Yves, You can take a different approach to the problem. You need on one side a relational database (MySQL, PostGreSQL) or SolR (as a very efficient index), and on the other side Cassandra. The relational database or SolR must contain the minimum amount of information possible: a date and only the relevant data; this enabled me to keep a simple model for Cassandra. Cassandra acts as a "vault" where you keep all the data, and you dispatch the data from Cassandra to the relational database or SolR. When you want to query, you query SolR or the relational database for the key / column / supercolumn, and then you retrieve the complete data from Cassandra. The hard thing is to maintain the coherence between the query part and the Cassandra part. I speak from personal experience: it was very hard for me to use only Cassandra to do everything my (small amateur) website needed. Now I have found an alternative that I use: Cassandra (data vault) + Redis (sessions and other volatile data) + SolR (search engine) + PostGreSQL (relational queries). Best regards, Victor Kabdebon http://www.voxnucleus.fr 2011/4/13 Edward Capriolo > On Wed, Apr 13, 2011 at 10:39 AM, Jean-Yves LEBLEU > wrote: > > Hi all, > > > > Just some thoughts and a question I have about Cassandra data modeling. > > > > If I understand well, Cassandra is better at writing than at reading, > > so you have to think about your queries to design a Cassandra schema. We > > are doing incremental design, already have our system in > > production, and we have to develop new queries. > > What do you usually do when you have new queries? Do you write a > > specific job to update data in the database to match the new query you > > are writing? > > > > Thanks for your help. > > > > Jean-Yves > > > > Good point. Generally you will need to write some type of range > scanning/map-reduce application to process and back-fill your data. >
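A minimal sketch of the dispatch pattern Victor describes, using hypothetical interfaces (Index standing in for the SolR or relational side, Vault for Cassandra; none of these names are a real client API):

import java.util.ArrayList;
import java.util.List;

interface Index {                          // hypothetical: SolR / PostGreSQL side
    void indexMinimal(String key, String date, String relevantField);
    List<String> findKeys(String query);   // returns matching Cassandra row keys
}

interface Vault {                          // hypothetical: Cassandra side
    void store(String key, String fullRecord);
    String fetch(String key);
}

class VaultWithIndex {
    private final Index index;
    private final Vault vault;

    VaultWithIndex(Index index, Vault vault) {
        this.index = index;
        this.vault = vault;
    }

    // Write path: everything goes to the vault, only query fields to the index.
    void write(String key, String fullRecord, String date, String relevantField) {
        vault.store(key, fullRecord);
        index.indexMinimal(key, date, relevantField);
    }

    // Read path: resolve keys in the index, hydrate full records from the vault.
    List<String> query(String q) {
        List<String> results = new ArrayList<>();
        for (String key : index.findKeys(q)) {
            results.add(vault.fetch(key));
        }
        return results;
    }
}

The coherence problem Victor mentions lives in write(): if the two calls are not kept in step (retried or repaired together), the index and the vault slowly drift apart.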
Re: CQL v1.0.0: why super column family not described in it?
Hello Eric, Compound columns seem to be a very interesting feature. Do you have any idea in which Cassandra version it is going to be introduced: 0.8.X or 0.9.X? Thanks, Victor 2011/5/5 Eric Evans > On Thu, 2011-05-05 at 18:19 +0800, Guofeng Zhang wrote: > > I read the CQL v1.0 document. There are operations about column > > families, but it does not describe how to operate on super column > > families. Why? Does this mean that super column families would not be > > supported by CQL in this version? Will it be supported in the future? > > No, CQL will never support super columns, but later versions (not 1.0.0) > will support compound columns. Compound columns are better; instead of > a two-deep structure, you can have one of arbitrary depth. > > What you see is what you get for 1.0.0; there simply wasn't enough time > to do everything (you have to start somewhere). > > -- > Eric Evans > eev...@rackspace.com > >
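To make the "arbitrary depth" point concrete: a compound column name is several components packed into a single column name, and the comparator sorts component-by-component. A toy sketch of the packing idea, using a simple 2-byte length prefix per component (deliberately NOT Cassandra's actual CompositeType wire format, just the concept):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CompoundNameSketch {
    // Pack components as <2-byte length><utf8 bytes>, in order.
    static byte[] pack(String... components) {
        int size = 0;
        for (String c : components) size += 2 + c.getBytes(StandardCharsets.UTF_8).length;
        ByteBuffer out = ByteBuffer.allocate(size);
        for (String c : components) {
            byte[] b = c.getBytes(StandardCharsets.UTF_8);
            out.putShort((short) b.length).put(b);
        }
        return out.array();
    }

    // Unpack back into components; a comparator would compare these pairwise.
    static List<String> unpack(byte[] name) {
        ByteBuffer in = ByteBuffer.wrap(name);
        List<String> components = new ArrayList<>();
        while (in.hasRemaining()) {
            byte[] b = new byte[in.getShort()];
            in.get(b);
            components.add(new String(b, StandardCharsets.UTF_8));
        }
        return components;
    }

    public static void main(String[] args) {
        // e.g. what a super column "user42" holding column "2011-05-05"
        // could become as one compound column name of depth three.
        byte[] name = pack("user42", "2011-05-05", "login");
        System.out.println(unpack(name));   // [user42, 2011-05-05, login]
    }
}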
Re: CQL v1.0.0: why super column family not described in it?
Thank you, I will look into that, and I will probably wait until there is an "out of the box" comparator. But it's an excellent new feature! Regards, Victor K. 2011/5/5 Eric Evans > On Thu, 2011-05-05 at 10:49 -0400, Victor Kabdebon wrote: > > Hello Eric, > > > > Compound columns seem to be a very interesting feature. Do you have any > idea > > in which Cassandra version it is going to be introduced : 0.8.X or 0.9.X > ? > > You can use these today with a custom comparator[1]. There is an open > issue[2] (marked as for-0.8.1) to ship one out-of-the-box. > > Language support[3] for CQL will probably take a bit longer. > > [1]: https://github.com/edanuff/CassandraCompositeType > [2]: https://issues.apache.org/jira/browse/CASSANDRA-2231 > [3]: https://issues.apache.org/jira/browse/CASSANDRA-2474 > > -- > Eric Evans > eev...@rackspace.com > >