Get cassandra SuperColumn only!
Hi,

I have a Cassandra datastore as follows:

key : { supercol (utf8) : { subcol (timeuuid) : data } }

Now, for a particular use case I want to slice on two levels: first on the supercolumns, and then, within the selected supercolumns, slice the subcolumns (mostly to restrict the number of items fetched into memory). I have tried various APIs and there doesn't seem to be a way to do this; when I slice supercolumns I get the subcolumns in the result too! Now, of course, I can add another index as follows:

key : { supercol (utf8) : } }

Haven't looked at Cassandra storage in too much detail, but hoping there is a better solution! Thanks in advance.
0.7 live schema updates
Hi! I like the new feature of making live schema updates. You can add, drop and rename column families and keyspaces via Thrift, but how do you modify column family attributes like key_cache_size or rows_cached? Thank you.
Re: 0.7 live schema updates
You can change these attributes using the JMX interface. Take a look at the setCacheCapacities method in org.apache.cassandra.tools.NodeProbe.
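For anyone who wants to script this, a rough sketch of driving that operation through NodeProbe; the constructor and the parameter order are from memory of the 0.6/0.7 source, so treat them as assumptions and check your tree:

import org.apache.cassandra.tools.NodeProbe;

public class CacheTuner
{
    public static void main(String[] args) throws Exception
    {
        // 8080 was the default JMX port in the 0.6.x era
        NodeProbe probe = new NodeProbe("localhost", 8080);
        // assumed order: keyspace, column family, key cache capacity, row cache capacity
        probe.setCacheCapacities("Keyspace1", "Standard1", 200000, 1000);
    }
}

nodetool's setcachecapacity command wraps the same call if you'd rather stay on the shell.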
busy thread on IncomingStreamReader
Hi - has anyone made any progress with this issue? We are having the same problem with our Cassandra nodes in production. At some point a node (and sometimes all 3) will jump to 100% CPU usage and stay there for hours until restarted. Stack traces reveal several threads in a seemingly endless loop doing this:

"Thread-21770" - Thread t...@25278
   java.lang.Thread.State: RUNNABLE
   at sun.nio.ch.FileChannelImpl.size0(Native Method)
   at sun.nio.ch.FileChannelImpl.size(Unknown Source)
   - locked java.lang.obj...@7a2c843d
   at sun.nio.ch.FileChannelImpl.transferFrom(Unknown Source)
   at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
   at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

My understanding from reading the code is that this trace shows a thread belonging to the StreamingService which is writing an incoming stream to disk. There seems to be some kind of bizarre problem which is causing the FileChannel.size() function to spin with high CPU. Also, this problem is not easy to replicate, so I would appreciate any information on how the StreamingService works and what triggers it to transfer these file streams.

Thanks,
Joseph Mermelstein
LivePerson
http://solutions.liveperson.com

> Hi all,
>
> We set up two nodes and simply set replication factor=2 for a test run.
> After both nodes, say node A and node B, serve for several hours, we found
> that node A always keeps 300% CPU usage (the other node is under 100% CPU,
> which is normal).
>
> A thread dump on node A shows that there are 3 busy threads related to
> IncomingStreamReader:
>
> ==
> "Thread-66" prio=10 tid=0x2aade4018800 nid=0x69e7 runnable [0x4030a000]
>    java.lang.Thread.State: RUNNABLE
>    at sun.misc.Unsafe.setMemory(Native Method)
>    at sun.nio.ch.Util.erase(Util.java:202)
>    at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
>    at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
>    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
>    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>
> "Thread-65" prio=10 tid=0x2aade4017000 nid=0x69e6 runnable [0x4d44b000]
>    java.lang.Thread.State: RUNNABLE
>    at sun.misc.Unsafe.setMemory(Native Method)
>    at sun.nio.ch.Util.erase(Util.java:202)
>    at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
>    at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
>    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
>    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>
> "Thread-62" prio=10 tid=0x2aade4014800 nid=0x4150 runnable [0x4d34a000]
>    java.lang.Thread.State: RUNNABLE
>    at sun.nio.ch.FileChannelImpl.size0(Native Method)
>    at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:309)
>    - locked <0x2aaac450dcd0> (a java.lang.Object)
>    at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:597)
>    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
>    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
> ===
>
> Has anyone experienced a similar issue?
> Environments:
>
> OS --- CentOS 5.4, Linux 2.6.18-164.15.1.el5 SMP x86_64 GNU/Linux
> Java --- build 1.6.0_16-b01, Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
> Cassandra --- 0.6.0
> Node configuration --- node A and node B; both nodes use node A as Seed
> Client --- Java Thrift clients pick one node randomly to do reads and writes.
>
> --
> Ingram Chen
> online share order: http://dinbendon.net
> blog: http://www.javaworld.com.tw/roller/page/ingramchen
Getting client only example to work
Hi,

I am using 0.7.0-beta1 and trying to get the contrib/client_only example to work. I am running Cassandra on host1 and trying to access it from host2. When using Thrift (via cassandra-cli) and in my application, I am able to connect and do all operations as expected. But I am not able to connect to Cassandra when using the code in client_only (or, for that matter, using contrib/bmt_example). Since my test requires bulk insertion of about 1.4 TB of data, I need to use a non-Thrift interface.

The error I am getting follows (the keyspace and the column family exist and can be used via Thrift):

10/09/16 12:35:31 INFO config.DatabaseDescriptor: DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
10/09/16 12:35:31 INFO service.StorageService: Starting up client gossip
Exception in thread "main" java.lang.IllegalArgumentException: Unknown ColumnFamily Standard1 in keyspace Keyspace1
    at org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:1009)
    at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:418)
    at gaia.cu7.cassandra.input.Ingestor.testWriting(Ingestor.java:103)
    at gaia.cu7.cassandra.input.Ingestor.main(Ingestor.java:187)

I am using the following code (from the client_only example), also passing the JVM parameter -Dstorage-config=path_2_cassandra.yaml:

public static void main(String[] args) throws Exception
{
    System.setProperty("storage-config", "cassandra.yaml");
    testWriting();
}

// from client_only example
private static void testWriting() throws Exception
{
    StorageService.instance.initClient();
    // sleep for a bit so that gossip can do its thing.
    try
    {
        Thread.sleep(1L);
    }
    catch (Exception ex)
    {
        throw new AssertionError(ex);
    }
    // do some writing.
    final AbstractType comp = ColumnFamily.getComparatorFor("Keyspace1", "Standard1", null);
    for (int i = 0; i < 100; i++)
    {
        RowMutation change = new RowMutation("Keyspace1", ("key" + i).getBytes());
        ColumnPath cp = new ColumnPath("Standard1").setColumn(("colb").getBytes());
        change.add(new QueryPath(cp), ("value" + i).getBytes(), new TimestampClock(0));
        // don't call change.apply(). The reason is that it makes a static call into Table,
        // which will perform local storage initialization, which creates local directories.
        // change.apply();
        StorageProxy.mutate(Arrays.asList(change));
        System.out.println("wrote key" + i);
    }
    System.out.println("Done writing.");
    StorageService.instance.stopClient();
}
RE: 0.7 live schema updates
But you'll lose these settings after a Cassandra restart.

-----Original Message-----
From: Oleg Anastasyev [mailto:olega...@gmail.com]
Sent: Thursday, September 16, 2010 11:21 AM
To: user@cassandra.apache.org
Subject: Re: 0.7 live schema updates

You can change these attributes using the JMX interface. Take a look at the setCacheCapacities method in org.apache.cassandra.tools.NodeProbe.
Indexing & Locking in Cassandra
Hello,

I have a few questions about indexing and locking in Cassandra:
- If I understood correctly, only row-level indexing exists prior to v0.7; I mean only the primary keys are indexed. Is that true?
- Is it possible to use composite primary keys? For instance, I have a user object User(name, birthday, gender, address) and I want to have the (name, birthday) columns as the PK. Can I do that? If yes, how?
- Does Cassandra support CF (table) level locking? Could someone explain / provide a link on how?

Thanks in advance,
Sandor
Re: Indexing & Locking in Cassandra
Hello,

> - If I understood correctly, only row-level indexing exists prior to v0.7;
> I mean only the primary keys are indexed. Is that true?

Yes and no. The row name is the key which you use to fetch the row from Cassandra. There are methods to iterate through rows, but that's not efficient and should be used only in batch operations. Columns inside rows are sorted by their names, so they are also indexed: you use the column name to fetch the contents of the column. If you want to index data in other ways you need to build your own application code which maintains such indexes, and the upcoming 0.7 version will bring some handy features which make the coder's job much easier.

> - Is it possible to use composite primary keys? For instance, I have a user
> object User(name, birthday, gender, address) and I want to have the
> (name, birthday) columns as the PK. Can I do that? If yes, how?

You can always create your row key as a string like "$name_$birthday". Does this answer your question?

> - Does Cassandra support CF (table) level locking? Could someone explain /
> provide a link on how?

No, Cassandra doesn't have any locking capabilities. You can always use some external locking mechanism like ZooKeeper [http://hadoop.apache.org/zookeeper/] or implement your own solution on top of Cassandra (not recommended, as it's quite hard to get right).

- Juho Mäkinen / Garo
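To make the concatenated-key suggestion concrete, a minimal sketch against the 0.6-style Thrift API; the keyspace, CF, and column names are made up for illustration:

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;

public class CompositeKeyExample
{
    // store the user's address under a "composite" row key built from two fields
    static void saveAddress(Cassandra.Client client, String name, String birthday,
                            String address) throws Exception
    {
        String rowKey = name + "_" + birthday; // e.g. "jsmith_1980-02-14"
        ColumnPath path = new ColumnPath("Users").setColumn("address".getBytes());
        client.insert("Keyspace1", rowKey, path, address.getBytes(),
                      System.currentTimeMillis(), ConsistencyLevel.QUORUM);
    }
}

Note the limitation: you can only fetch by the full (name, birthday) pair this way; a lookup by birthday alone would need a second, manually maintained index CF.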
RE: Indexing & Locking in Cassandra
Thanks for your fast answer. Regarding the composite keys: that's what I thought by default, I just needed some confirmation. Unfortunately I cannot use this approach in our application, so I will figure out something else. I will check out ZooKeeper to see if I can use it. Thanks again!

> You can always create your row key as a string like "$name_$birthday".
> [...]
> No, Cassandra doesn't have any locking capabilities. You can always use
> some external locking mechanism like ZooKeeper or implement your own
> solution on top of Cassandra.
>
> - Juho Mäkinen / Garo
Re: Getting client only example to work
I discovered some problems with the fat client earlier this week when I tried using it. It needs some fixes to keep up with all the 0.7 changes.

Gary.

On Thu, Sep 16, 2010 at 05:48, Asif Jan wrote:
> Hi
> I am using 0.7.0-beta1 and trying to get the contrib/client_only example
> to work. I am running Cassandra on host1 and trying to access it from
> host2. [...]
Re: 0.7 live schema updates
beta-2 will include the ability to set these values and others. Look for the system_update_column_family() and system_update_keyspace() methods.

Gary.

On Thu, Sep 16, 2010 at 02:38, Marc Canaleta wrote:
> Hi!
> I like the new feature of making live schema updates. You can add, drop and
> rename column families and keyspaces via Thrift, but how do you modify
> column family attributes like key_cache_size or rows_cached?
> Thank you.
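To make that concrete, a rough sketch against the 0.7 Thrift API. The CfDef field names (key_cache_size, row_cache_size) come from the 0.7 IDL, but the exact update semantics in the released beta-2 may differ, so treat this as an assumption to verify:

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.CfDef;
import org.apache.cassandra.thrift.KsDef;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class LiveCacheUpdate
{
    public static void main(String[] args) throws Exception
    {
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        transport.open();
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        client.set_keyspace("Keyspace1");

        // fetch the current definitions so that only the cache settings change
        KsDef ksDef = client.describe_keyspace("Keyspace1");
        for (CfDef cfDef : ksDef.getCf_defs())
        {
            if (cfDef.getName().equals("Standard1"))
            {
                cfDef.setKey_cache_size(200000); // number of keys to cache
                cfDef.setRow_cache_size(1000);   // number of rows to cache
                client.system_update_column_family(cfDef);
            }
        }
        transport.close();
    }
}

Unlike the JMX route above, a change made this way is part of the schema, so it survives a restart.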
Re: Build an index for a join query
Alvin - assuming I understand what you're after correctly, why not make a CF Name_Address(name, address)? Modifying the Cassandra methods to do the "join" you describe seems like overkill to me...

-Paul

On Sep 15, 2010, at 7:34 PM, Alvin UW wrote:
> Hello,
>
> I am going to build an index to join two CFs. First, we see this index as a
> CF/SCF; the difference is I don't materialise it. Assume we have two tables:
> ID_Address(Id, address) and Name_ID(name, id). Then the index is:
> Name_Address(name, address).
>
> When the application tries to query Name_Address, the value of "name" is
> given by the application. I want to direct the read operation to Name_ID to
> get the "Id" value, then go to ID_Address to get the "address" value by that
> "Id". So far, I consider only the read operation. This way, the join query
> is transparent to the user.
>
> So I think I should find out which methods or classes are in charge of the
> read operation described above. For example, exactly which server-side
> methods does the CLI operation "get Keyspace1.Standard2['jsmith']" call? I
> noted CassandraServer is used to listen to clients, and there are some
> methods such as get() and get_slice(). Is that the right place to modify to
> implement my idea?
>
> Thanks.
> Alvin
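To illustrate Paul's point, the "join" can also live entirely in client code as two reads, with no server changes. A minimal sketch against the 0.6-style Thrift API; the keyspace name and the helper are hypothetical, and error handling (NotFoundException etc.) is omitted:

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;

public class NameAddressJoin
{
    // look up name -> id in Name_ID, then id -> address in ID_Address
    static byte[] addressForName(Cassandra.Client client, String name) throws Exception
    {
        ColumnPath idPath = new ColumnPath("Name_ID").setColumn("id".getBytes());
        byte[] id = client.get("Keyspace1", name, idPath, ConsistencyLevel.QUORUM)
                          .getColumn().getValue();

        ColumnPath addrPath = new ColumnPath("ID_Address").setColumn("address".getBytes());
        return client.get("Keyspace1", new String(id), addrPath, ConsistencyLevel.QUORUM)
                     .getColumn().getValue();
    }
}

The second read's key is derived from the first read's value, which is exactly the redirect Alvin describes, just done by the caller instead of inside CassandraServer.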
Problem with memtable_throughput_in_mb?
Hi,

I am trying out the latest trunk version and I get an error when starting Cassandra with -Xmx3G:

Fatal error: memtable_operations_in_millions must be a positive double

I guess it is caused by line 76 in org/apache/cassandra/config/Config.java [0]:

    public Integer memtable_throughput_in_mb = (int) Runtime.getRuntime().maxMemory() / 8;

The cast to (int) is applied to maxMemory(), but that method returns a long, so for -Xmx3G the result overflows to a negative integer. Thus memtable_operations_in_millions becomes negative (Double memtable_operations_in_millions = memtable_throughput_in_mb / 64 * 0.3) and the exception is thrown. Also, maxMemory() is measured in bytes, but memtable_throughput_in_mb should be in MB (as its name implies), which is not the case here.

What do you think? Thanks for any input you have on this.

Cheers

[0] http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/config/Config.java
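A tiny standalone illustration of the precedence problem (plain Java, not from the Cassandra tree; the "fixed" line is one plausible way to express the intent, not necessarily the committed fix):

public class CastDemo
{
    public static void main(String[] args)
    {
        long max = 3L * 1024 * 1024 * 1024;        // maxMemory() with -Xmx3G, in bytes
        int buggy = (int) max / 8;                 // cast binds first: 3 GB overflows int, goes negative
        int fixed = (int) (max / 1024 / 1024 / 8); // bytes -> MB, divide, then cast
        System.out.println(buggy);                 // -134217728
        System.out.println(fixed);                 // 384
    }
}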
Re: Problem with memtable_throughput_in_mb?
On Thu, Sep 16, 2010 at 11:00 AM, Thomas Boucher wrote:
> Hi,
>
> I am trying out the latest trunk version and I get an error when
> starting Cassandra with -Xmx3G:
> Fatal error: memtable_operations_in_millions must be a positive double
>
> I guess it is caused by line 76 in org/apache/cassandra/config/Config.java [0]:
>
>     public Integer memtable_throughput_in_mb = (int) Runtime.getRuntime().maxMemory() / 8;
>
> The cast to (int) is applied to maxMemory(), but that method returns a
> long, so for -Xmx3G the result overflows to a negative integer.
> [...]

Oops, good catch. Fixed in r997841.

-Brandon
Building an Ubuntu / Debian package for Cassandra
Guys,

I am trying to build a Debian package in order to deploy Cassandra 0.6.5 on Ubuntu. I see that you have a ./debian directory in the source builds; do you have a bit more background on how it is used and built?

P.S. I am new to Ubuntu/Debian packaging, so any kind of pointer will help.

Thanks,

FR

Francois Richard
Re: Building an Ubuntu / Debian package for Cassandra
Hi Francois,

Any reason http://wiki.apache.org/cassandra/DebianPackaging isn't working for you?

Dave Viner

On Thu, Sep 16, 2010 at 10:30 AM, Francois Richard wrote:
> Guys,
>
> I am trying to build a Debian package in order to deploy Cassandra 0.6.5 on
> Ubuntu. I see that you have a ./debian directory in the source builds; do
> you have a bit more background on how it is used and built? [...]
Secondary Index Null Pointer Error
Hi,

I am using Cassandra 0.7 trunk (r997357) and am having issues with a secondary index. I have a ColumnFamily with a secondary index on column "X". Not every row of data has column X. It looks like when I write a row that does not have column X, Cassandra throws the following NPE when it writes the index:

ERROR 20:05:37,015 Uncaught exception in thread Thread[FLUSH-WRITER-POOL:1,5,main]
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:87)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:637)
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    ... 2 more
Caused by: java.lang.NullPointerException
    at org.apache.cassandra.io.sstable.IndexSummary.complete(IndexSummary.java:63)
    at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:383)
    at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:138)
    at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:132)
    at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:156)
    at org.apache.cassandra.db.Memtable.access$000(Memtable.java:44)
    at org.apache.cassandra.db.Memtable$1.runMayThrow(Memtable.java:168)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    ... 6 more

This doesn't necessarily happen as soon as the row is written; it happens after you write enough rows, or after a restart of the server when the commitlog is replayed. Is it the case that indexed columns must exist?

Thanks
Colin
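For context, a minimal sketch of how a secondary index like the one on "X" is declared through the 0.7 Thrift API; the CF and validator names here are illustrative, and this is not a confirmed reproduction of the bug:

import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.CfDef;
import org.apache.cassandra.thrift.ColumnDef;
import org.apache.cassandra.thrift.IndexType;

public class IndexedCfDef
{
    static CfDef indexedCf()
    {
        ColumnDef x = new ColumnDef(ByteBuffer.wrap("X".getBytes()),
                                    "org.apache.cassandra.db.marshal.BytesType");
        x.setIndex_type(IndexType.KEYS);  // secondary index on column X
        CfDef cf = new CfDef("Keyspace1", "Indexed1");
        cf.addToColumn_metadata(x);       // rows need not actually contain X
        return cf;
    }
}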
Re: Building an Ubuntu / Debian package for Cassandra
Hello Francois,

There are already .debs available here:
http://wiki.apache.org/cassandra/DebianPackaging

I've also set up a PPA to build the packages on Ubuntu here:
https://launchpad.net/~cassandra-ubuntu/+archive/stable

It's currently still at v0.6.4, but I am in the process of uploading 0.6.5 as I write this email. The .debs are nearly identical. The only difference is that I've packaged the jars necessary to build, so that you get the same exact versions of all libraries if you need to patch + repeat the build. Also, these are built specifically for Ubuntu releases, so if we find any incompatibilities between Debian/Ubuntu we can fix them for Ubuntu users.

I hope this helps!

On Sep 16, 2010, at 10:30 AM, Francois Richard wrote:
> Guys,
>
> I am trying to build a Debian package in order to deploy Cassandra 0.6.5 on
> Ubuntu. [...]
Re: Get cassandra SuperColumn only!
AFAIK there is no way to get a list of the super columns without also getting the sub columns. I do not know if there is a technical reason that would prevent this from being added. In general it's more efficient to make one request that pulls back more data than two or more that pull back just enough data. But you also want to design to answer the queries you need to make. Keeping an index of super column names in another CF does not sound too bad.

It might pay to take another look at why you are using a super CF. It may be better to use two standard CFs if, say, you want one sort of request that gets a list of things, and another sort of request that gets the details for a number of things.

Aaron

On 16 Sep, 2010, at 07:25 PM, Saurabh Raje wrote:
> Hi,
> I have a Cassandra datastore as follows:
> key : { supercol (utf8) : { subcol (timeuuid) : data } }
> Now, for a particular use case I want to slice on two levels: first on the
> supercolumns, and then, within the selected supercolumns, slice the
> subcolumns (mostly to restrict the number of items fetched into memory).
> [...]
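A minimal sketch of the index-CF idea against the 0.6-style Thrift API; "SuperColIndex" and "SuperCF" are hypothetical names, and paging and error handling are omitted:

import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

public class SuperColumnNameIndex
{
    static void sliceBothLevels(Cassandra.Client client, String key) throws Exception
    {
        // level 1: slice super column *names* out of the companion standard CF
        SlicePredicate firstHundred = new SlicePredicate()
            .setSlice_range(new SliceRange(new byte[0], new byte[0], false, 100));
        List<ColumnOrSuperColumn> names = client.get_slice(
            "Keyspace1", key, new ColumnParent("SuperColIndex"),
            firstHundred, ConsistencyLevel.ONE);

        // level 2: for each selected name, slice a bounded number of subcolumns
        for (ColumnOrSuperColumn cosc : names)
        {
            ColumnParent parent = new ColumnParent("SuperCF")
                .setSuper_column(cosc.getColumn().getName());
            SlicePredicate firstTen = new SlicePredicate()
                .setSlice_range(new SliceRange(new byte[0], new byte[0], false, 10));
            List<ColumnOrSuperColumn> subcols = client.get_slice(
                "Keyspace1", key, parent, firstTen, ConsistencyLevel.ONE);
            // process subcols...
        }
    }
}

The write path has to keep SuperColIndex in sync with SuperCF, e.g. by writing both in one batch_mutate.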
RE: Building an Ubuntu / Debian package for Cassandra
Thanks Clint,

I am going to look up the links below. I am pretty new to DEB packaging in general, and from what I have seen so far, a lot of the tutorials on the web are mostly based on the classic [ ./configure | make | make install ] flow of an application built in C. In this case I wanted to figure out DEB packaging in the context of a Java application. I'll read more and will stay in touch.

My goal, at the end of the day, is to install "the stock" package for Cassandra and then to create a special Cassandra-config package that would move and deploy my customized configuration files on the system.

Thanks,

FR

-----Original Message-----
From: Clint Byrum [mailto:cl...@ubuntu.com]
Sent: Thursday, September 16, 2010 10:54 AM
To: user@cassandra.apache.org
Subject: Re: Building an Ubuntu / Debian package for Cassandra

Hello Francois,

There are already .debs available here:
http://wiki.apache.org/cassandra/DebianPackaging [...]
Re: Getting client only example to work
OK, did something about the message service change in the initClient method? Essentially, now one cannot call initClient when a Cassandra instance is running on the same machine.

thanks

On Sep 16, 2010, at 3:48 PM, Gary Dusbabek wrote:
> I discovered some problems with the fat client earlier this week when I
> tried using it. It needs some fixes to keep up with all the 0.7 changes.
>
> Gary. [...]
Re: Bootstrapping stays stuck
Thanks to driftx from the cassandra IRC channel for helping out. This was resolved by increasing the RPC timeout for the bootstrap process (a config sketch follows the quoted logs below).

On Wed, Sep 15, 2010 at 11:43 AM, Gurpreet Singh wrote:
> This problem still stays unresolved despite numerous restarts of the
> cluster. I can't seem to find a way out of this one, and I am not really
> looking for a workaround; I kinda need this to work if I am to go to
> production.
>
> I turned on ALL logging in log4j, and now I see the following exception
> (EOFException) on the destination. After receiving each file, it seems to
> be throwing this exception. The transfer is successful except for this
> exception. The source successfully declares the transfer complete, but the
> destination does not move out of the bootstrapping mode, and just sits
> there.
>
> DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line 65) Receiving stream: finished reading chunk, awaiting more
> DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line 87) Removing stream context /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Index.db:522051369
> DEBUG [Thread-15] 2010-09-15 10:56:59,767 StreamCompletionHandler.java (line 73) Sending a streaming finished message with org.apache.cassandra.streaming.completedfilesta...@54828e7 to IP1
> TRACE [Thread-15] 2010-09-15 10:56:59,769 IncomingTcpConnection.java (line 82) eof reading from socket; closing
> java.io.EOFException
>     at java.io.DataInputStream.readInt(Unknown Source)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59)
> DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line 51) Receiving stream
> DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line 54) Creating file for /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db
> DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line 65) Receiving stream: finished reading chunk, awaiting more
> DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line 87) Removing stream context /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db:7489045
> DEBUG [Thread-16] 2010-09-15 10:56:59,876 StreamCompletionHandler.java (line 73) Sending a streaming finished message with org.apache.cassandra.streaming.completedfilesta...@7b41a32f to IP1
> TRACE [Thread-16] 2010-09-15 10:56:59,876 IncomingTcpConnection.java (line 82) eof reading from socket; closing
> java.io.EOFException
>     at java.io.DataInputStream.readInt(Unknown Source)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59)
>
> /G
>
> On Tue, Sep 14, 2010 at 11:40 AM, Gurpreet Singh wrote:
>
>> Hi Vineet,
>> I have tracked the nodetool streams to completion each time. Below are the
>> logs on the source and destination node. There are 3 sstables being
>> transferred, and the transfer seems to be successful. However, after the
>> streams finish, the source prints out messages about the dropped messages,
>> which may point to the problem. Ideas? I checked that port 7000 is open for
>> communication. 9160 is not up on the node being bootstrapped, but that
>> comes up after the node is bootstrapped, is that right?
>>
>> Thanks a ton,
>> /G
>>
>> Logs on the source node (IP2):
>>
>> INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 79) Flushing memtables for userdata...
>> INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 95) Performing anticompaction ...
>> INFO [COMPACTION-POOL:1] 2010-09-14 09:54:07,900 CompactionManager.java (line 339) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_list_items-5823-Data.db')]
>> INFO [GC inspection] 2010-09-14 09:56:54,712 GCInspector.java (line 129) GC for ParNew: 212 ms, 29033016 reclaimed leaving 579419360 used; max is 4415946752
>> INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,508 CompactionManager.java (line 396) AntiCompacted to /data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Data.db. 49074138589/36770836242 bytes for 5990912 keys. Time: 1438607ms.
>> INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,528 CompactionManager.java (line 339) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user-22-Data.db')]
>> INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,839 CompactionManager.java (line 396) AntiCompacted to /data/mysql/cassandrastorage/userdata/stream/user-24-Data.db. 28185244/21126422 bytes for 47722 keys. Time: 2310ms.
>> INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,840 CompactionManager.java (line 339) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_lists-502-Data.db')]
>> INFO [COMPACTION-POOL:1] 2010-09-14 1
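As mentioned at the top of this thread, the timeout that was raised is presumably the 0.6-era RpcTimeoutInMillis setting in storage-conf.xml; a sketch of the change (the value 30000 is illustrative, not the one actually used here):

    <!-- How long this node waits for replies to inter-node messages before
         timing out; raising it was the fix described above. -->
    <RpcTimeoutInMillis>30000</RpcTimeoutInMillis>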
Re: Building an Ubuntu / Debian package for Cassandra
On Sep 16, 2010, at 2:03 PM, Francois Richard wrote:
> Thanks Clint,
>
> I am going to look up the links below. I am pretty new to DEB packaging in
> general, and from what I have seen so far, a lot of the tutorials on the
> web are mostly based on the classic [ ./configure | make | make install ]
> flow of an application built in C. In this case I wanted to figure out DEB
> packaging in the context of a Java application. I'll read more and will
> stay in touch.

Actually there's a lot of ambiguity in packaging that arises given Java's unique properties as a compiled, architecture-independent platform. I wouldn't recommend *starting* your Debian packaging journey with Java. Maybe find a nice C program first. ;)

> My goal, at the end of the day, is to install "the stock" package for
> Cassandra and then to create a special Cassandra-config package that would
> move and deploy my customized configuration files on the system.

You're probably better off using a configuration management system such as puppet, chef, or cfengine to.. well.. manage your configuration.
questions on cassandra (repair and multi-datacenter)
Hi,

I have a few questions and was looking for answers. I have a cluster of 7 Cassandra 0.6.5 nodes in my test setup, RF=2. The original data size is about 100 gigs; with RF=2, I see the total load on the cluster is about 200 gigs, all good.

1. I was looking to increase the RF to 3. This process entails changing the config and calling repair on the keyspace one node at a time, right? So I started with one node at a time: changed the config file on the first node for the keyspace, restarted the node, and then called a nodetool repair on the node. I followed these same steps for every node after that, as I read somewhere that repair should be invoked one node at a time.
(a) What is the best way to ascertain that the repair is completed on a node?
(b) After the repair was finished, I was expecting the total data load to be 300 gigs. However, the ring command shows the total load to be 370 gigs. I double checked, and the config on all machines says RF=3. I am calling a cleanup on each node right now. Is cleanup required after calling a repair? Am I missing something?

2. This question is regarding multi-datacenter support. I plan to have a cluster of 6 machines across 2 datacenters, with the machines from the datacenters alternating on the ring, and RF=3. I already have the test setup described above, which has most of the data, but it's still configured with the default RackUnaware strategy. I was hoping to find the right steps to move it to the RackAware strategy with the PropertyFileEndpointSnitch that I read about somewhere (not sure if that's supported in 0.6.5, but the CustomEndPointSnitch is the same, right?), all without having to repopulate any data. Currently there is only 1 datacenter, but I was still planning to set the cluster up as it would be in a multi-datacenter deployment and run it like that in the one datacenter; when the second datacenter comes up, just copy all the files across to the new nodes in the second datacenter and bring the whole cluster up. Will this work? I have tried copying files to a new node, shutting down all nodes, and bringing everything back up, and it recognized the new IPs.

Thanks
Gurpreet
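On the PropertyFileEndpointSnitch point: that snitch reads a plain properties file mapping each node to datacenter:rack. A hypothetical layout for the 6-node, 2-DC plan described above (the file name and exact key syntax differ between the 0.6 contrib snitch and later releases, so check the contrib README before copying this):

    # ip = datacenter:rack
    192.168.1.1=DC1:RAC1
    192.168.1.2=DC2:RAC1
    192.168.1.3=DC1:RAC1
    192.168.1.4=DC2:RAC1
    192.168.1.5=DC1:RAC1
    192.168.1.6=DC2:RAC1
    default=DC1:RAC1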
What Thrift version does Cassandra 0.7 beta use?
What Thrift version does Cassandra 0.7 beta use?

--
Best regards,

Ivy Tang
Re: What Thrift version does Cassandra 0.7 beta use?
It doesn't use a specific released version - it uses a specific Subversion revision. The revision number is appended to the thrift jar in the cassandra lib folder (a name of the form libthrift-rNNNNNN.jar).

On Sep 16, 2010, at 9:10 PM, Ying Tang wrote:
> What Thrift version does Cassandra 0.7 beta use?
>
> --
> Best regards,
>
> Ivy Tang
Re: What Thrift version does Cassandra 0.7 beta use?
So the Thrift lib may change as Cassandra is updated?

On Thu, Sep 16, 2010 at 10:36 PM, Jeremy Hanna wrote:
> It doesn't use a specific released version - it uses a specific Subversion
> revision. The revision number is appended to the thrift jar in the
> cassandra lib folder.
> [...]

--
Best regards,

Ivy Tang
Re: questions on cassandra (repair and multi-datacenter)
On Thu, Sep 16, 2010 at 3:19 PM, Gurpreet Singh wrote:
> 1. I was looking to increase the RF to 3. This process entails changing the
> config and calling repair on the keyspace one node at a time, right?
> So I started with one node at a time: changed the config file on the first
> node for the keyspace, restarted the node, and then called a nodetool
> repair on the node.

You need to change the RF on _all_ nodes in the cluster _before_ running repair on _any_ of them. If nodes disagree on which nodes should have replicas for keys, repair will not work correctly. Different RF for the same keyspace creates that disagreement.

b
Re: questions on cassandra (repair and multi-datacenter)
Thanks Benjamin. I realised that; I have reverted using cleanup, got it back to the old state, and am testing the scenario exactly the way you put it.

On Thu, Sep 16, 2010 at 10:56 PM, Benjamin Black wrote:
> You need to change the RF on _all_ nodes in the cluster _before_
> running repair on _any_ of them. If nodes disagree on which nodes
> should have replicas for keys, repair will not work correctly.
> Different RF for the same keyspace creates that disagreement.
>
> b