Re: strange get_range_slices behaviour v0.6.1
Thanks Jonathan. After looking at the Lucandra code I realized my confusions has to do with get_range_slices and the RandomPartitioner. When I switched to the OPP I got the expected behaviour. I was noticing cases under the random partitioner where keys I expected to be returned were not. Can you give a little advice on the expected behaviour of get_range_slices with the RP and I'll try to write a JUnit for it. e.g. Is it essentially the same as under the OPP but order is undefined? Thanks Aaron On Mon, 3 May 2010 10:27:37 -0500, Jonathan Ellis wrote: > Util.range returns a Range object which is end-exclusive. (You want > "Bounds" for end-inclusive.) > > On Sun, May 2, 2010 at 7:19 AM, aaron morton > wrote: >> He there, I'm still getting odd behavior with get_range_slices. I've >> created >> a JUNIT test that illustrates the case. >> Could someone take a look and either let me know where my understanding >> is >> wrong or is this is a real issue? >> >> >> I added the following to ColumnFamilyStoreTest.java >> >> >> private ColumnFamilyStore insertKey1Key2Key3() throws IOException, >> ExecutionException, InterruptedException >> { >> List rms = new LinkedList(); >> RowMutation rm; >> rm = new RowMutation("Keyspace2", "key1".getBytes()); >> rm.add(new QueryPath("Standard1", null, "Column1".getBytes()), >> "asdf".getBytes(), 0); >> rms.add(rm); >> >> rm = new RowMutation("Keyspace2", "key2".getBytes()); >> rm.add(new QueryPath("Standard1", null, "Column1".getBytes()), >> "asdf".getBytes(), 0); >> rms.add(rm); >> >> rm = new RowMutation("Keyspace2", "key3".getBytes()); >> rm.add(new QueryPath("Standard1", null, "Column1".getBytes()), >> "asdf".getBytes(), 0); >> rms.add(rm); >> return Util.writeColumnFamily(rms); >> } >> >> >> �...@test >> public void testThreeKeyRangeAll() throws IOException, >> ExecutionException, InterruptedException >> { >> ColumnFamilyStore cfs = insertKey1Key2Key3(); >> >> IPartitioner p = StorageService.getPartitioner(); >> RangeSliceReply result = >> cfs.getRangeSlice(ArrayUtils.EMPTY_BYTE_ARRAY, >> >> Util.range(p, "key1", >> "key3"), >> >> 10, >> >> null, >> >> Arrays.asList("Column1".getBytes())); >> assertEquals(3, result.rows.size()); >> } >> >> �...@test >> public void testThreeKeyRangeSkip1() throws IOException, >> ExecutionException, InterruptedException >> { >> ColumnFamilyStore cfs = insertKey1Key2Key3(); >> >> IPartitioner p = StorageService.getPartitioner(); >> RangeSliceReply result = >> cfs.getRangeSlice(ArrayUtils.EMPTY_BYTE_ARRAY, >> >> Util.range(p, "key2", >> "key3"), >> >> 10, >> >> null, >> >> Arrays.asList("Column1".getBytes())); >> assertEquals(2, result.rows.size()); >> } >> >> Running this with "ant test" the partial output is >> >> [junit] Testsuite: org.apache.cassandra.db.ColumnFamilyStoreTest >> [junit] Tests run: 7, Failures: 2, Errors: 0, Time elapsed: 1.405 >> sec >> [junit] >> [junit] Testcase: >> testThreeKeyRangeAll(org.apache.cassandra.db.ColumnFamilyStoreTest): >> FAILED >> [junit] expected:<3> but was:<2> >> [junit] junit.framework.AssertionFailedError: expected:<3> but >> was:<2> >> [junit] at >> org.apache.cassandra.db.ColumnFamilyStoreTest.testThreeKeyRangeAll(ColumnFamilyStoreTest.java:170) >> [junit] >> [junit] >> [junit] Testcase: >> testThreeKeyRangeSkip1(org.apache.cassandra.db.ColumnFamilyStoreTest): >> FAILED >> [junit] expected:<2> but was:<1> >> [junit] junit.framework.AssertionFailedError: expected:<2> but >> was:<1> >> [junit] at >> org.apache.cassandra.db.ColumnFamilyStoreTest.testThreeKeyRangeSkip1(ColumnFami
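For reference, a rough sketch of the change Jonathan describes: swap the end-exclusive Range that Util.range() returns for an end-inclusive Bounds so the end key itself can be returned. The Bounds constructor and getToken call below are assumptions about the 0.6-era test code and may need adjusting for other versions.

    IPartitioner p = StorageService.getPartitioner();
    // Bounds is end-inclusive, Range is end-exclusive; the constructor signature is assumed here.
    RangeSliceReply result =
        cfs.getRangeSlice(ArrayUtils.EMPTY_BYTE_ARRAY,
                          new Bounds(p.getToken("key1"), p.getToken("key3")),
                          10,
                          null,
                          Arrays.asList("Column1".getBytes()));
    // Under the RandomPartitioner the rows covered still depend on token order,
    // so the expected row count is whatever falls between token(key1) and token(key3).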
RE: read operation is slow
I'm not sure about the client you're using, but I've noticed in the past the incorrect Thrift stack can make things run slow (like 40 times slower). Check that the network stack wraps the socket in a Transport, preferably the TBufferedTransport. I'm guessing the client your're using is doing the right thing, just an suggestion. Aaron On Fri, 11 Jun 2010 18:49:46 -0700, "caribbean410" wrote: > Thanks for the suggestion. For the test case, it is 1 key and 1 column. I > once changed 10 to 1, as I remember there is no much difference. > > > > I have 200k keys and each key is randomly generated. I will try the > optimized query next week. But maybe you still have to face the case that > each time a client just wants to query one key from db. > > > > From: Dop Sun [mailto:su...@dopsun.com] > Sent: Friday, June 11, 2010 6:05 PM > To: user@cassandra.apache.org > Subject: RE: read operation is slow > > > > And also, you are only select 1 key and 10 columns? > > > > criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst, > nameFirst, 10); > > > > Then, if you have 200k keys, you have 200k Thrift calls. If this is the > case, you may need to optimize the way you do the query (to combine > multiple > keys into a single query), and to reduce the number of calls. > > > > From: Dop Sun [mailto:su...@dopsun.com] > Sent: Saturday, June 12, 2010 8:57 AM > To: user@cassandra.apache.org > Subject: RE: read operation is slow > > > > You mean after you "I remove some unnecessary column family and change the > size of rowcache and keycache, now the latency changes from 0.25ms to > 0.09ms. In essence 0.09ms*200k=18s.", it still takes 400 seconds to > returning? > > > > From: Caribbean410 [mailto:caribbean...@gmail.com] > Sent: Saturday, June 12, 2010 8:48 AM > To: user@cassandra.apache.org > Subject: Re: read operation is slow > > > > Hi, do you mean this one should not introduce much extra delay? To read a > record, I need select here, not sure where the extra delay comes from. > > On Fri, Jun 11, 2010 at 5:29 PM, Dop Sun wrote: > > Jassandra is used here: > > > > Map> map = criteria.select(); > > > > The select here basically is a call to Thrift API: get_range_slices > > > > > > From: Caribbean410 [mailto:caribbean...@gmail.com] > Sent: Saturday, June 12, 2010 8:00 AM > > > To: user@cassandra.apache.org > Subject: Re: read operation is slow > > > > I remove some unnecessary column family and change the size of rowcache and > keycache, now the latency changes from 0.25ms to 0.09ms. In essence > 0.09ms*200k=18s. I don't know why it takes more than 400s total. Here is > the > client code and cfstats. There are not many operations here, why is the > extra time so large? > > > > long start = System.currentTimeMillis(); > for (int j = 0; j < 1; j++) { > for (int i = 0; i < numOfRecords; i++) { > int n = random.nextInt(numOfRecords); > ICriteria criteria = cf.createCriteria(); > userName = keySet[n]; > > criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst, > nameFirst, 10); > Map> map = criteria.select(); > List list = map.get(userName); > // ByteArray bloc = list.get(0).getValue(); > // byte[] byteArrayloc = bloc.toByteArray(); > // loc = new String(byteArrayloc); > > // readBytes = readBytes + loc.length(); > readBytes = readBytes + blobSize; > } > } > > long finish=System.currentTimeMillis(); > > float totalTime=(finish-start)/1000; > > > Keyspace: Keyspace1 > Read Count: 60 > Read Latency: 0.090530067 ms. > Write Count: 20 > Write Latency: 0.01504989 ms. 
> Pending Tasks: 0 > Column Family: Standard2 > SSTable count: 3 > Space used (live): 265990358 > Space used (total): 265990358 > Memtable Columns Count: 2615 > Memtable Data Size: 2667300 > Memtable Switch Count: 3 > Read Count: 60 > Read Latency: 0.091 ms. > Write Count: 20 > Write Latency: 0.015 ms. > Pending Tasks: 0 > Key cache capacity: 1000 >
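One way to act on the "combine multiple keys into a single query" suggestion above is a multiget. A hedged sketch using Hector (the thread itself uses Jassandra); the keyspace handle, column family, keys and column name below are placeholders:

    // Fetch a batch of keys in one round trip instead of one Thrift call per key.
    MultigetSliceQuery<String, String, String> query =
        HFactory.createMultigetSliceQuery(keyspace, StringSerializer.get(),
                                          StringSerializer.get(), StringSerializer.get());
    query.setColumnFamily("Standard2");
    query.setKeys("key_001", "key_002", "key_003");   // batch many keys per call
    query.setColumnNames("name");                     // placeholder column name
    Rows<String, String, String> rows = query.execute().get();
    for (Row<String, String, String> row : rows)
        System.out.println(row.getKey() + " -> " + row.getColumnSlice().getColumns().size());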
stalled streaming
Hello,
stalled streaming
hello, I have a 4 node cassandra cluster with 0.6.1 installed. We've been running a mixed read / write workload test how it works in our environment, we run about 4M bath mutations and 40M get_range_slice requests over 6 to 8 hours that load about 10 to 15 GB of data. Yesterday while there was no activity I noticed 2 nodes sitting at 200% CPU on 8 Core machine. Thought nothing of it. Checked again this morning and they are still sitting at that level of activity with no requests going into them. Checking the streams using node tool I see node 3 is streaming to node 0 and 2, and appears to have stalled. The information in the JMX console for streams matches the info below. I cannot see any errors in the logs. This is just a test system, so am happy to bounce the JVM's. Before I do is there anything else I should be looking for to understand why this happened? Also, sorry for the previous empty email. Node 0 Mode: Normal Nothing streaming to /192.168.34.27 Nothing streaming to /192.168.34.28 Nothing streaming to /192.168.34.29 Streaming from: /192.168.34.29 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-137-Filter.db 0/22765 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-137-Data.db 0/10750717 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-108-Index.db 0/58 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-108-Filter.db 0/325 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-108-Data.db 0/695 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-119-Index.db 0/58 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-119-Filter.db 0/325 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-119-Data.db 0/695 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-163-Index.db 0/587164 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-163-Filter.db 0/22765 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-163-Data.db 0/5159652 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-124-Data.db 22765/4966927 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-137-Index.db 22765/1223053 Node 1 Mode: Normal Nothing streaming to /192.168.34.26 Nothing streaming to /192.168.34.28 Nothing streaming to /192.168.34.29 Not receiving any streams. 
Node 2 Mode: Normal Nothing streaming to /192.168.34.26 Nothing streaming to /192.168.34.27 Nothing streaming to /192.168.34.29 Streaming from: /192.168.34.29 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-137-Filter.db 0/22765 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-137-Data.db 0/2161660 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-147-Index.db 0/787524 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-147-Filter.db 0/22765 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-147-Data.db 0/6917064 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-130-Index.db 0/58 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-130-Filter.db 0/565 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-130-Data.db 0/695 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-168-Index.db 0/581779 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-168-Filter.db 0/22765 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-168-Data.db 0/5111887 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-125-Data.db 361367/3173057 junkbox.mycompany: /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-125-Index.db 695/361367 Node 3 ode: Normal Streaming to: /192.168.34.26 /local1/junkbox/cassandra/data/junkbox.mycompany/stream/Buckets-69-Filter.db 22765/22765 /local1/junkbox/cassandra/data/junkbox.mycompany/stream/Buckets-69-Data.db 0/4966927 /local1/junkbox/cassandra/data/junkbox.mycompany/stream/Databases-42-Index.db 0/58 /local1/junkbox/cassandra/data/junkbox.mycompany/stream/Databases-42-Filter.db 0/325 /local1/junkbox/cassandra/data/junkbox.mycompany/stream/Databases-42-Data.db 0/695 /local1/junkbox/cassandra/data/junkbox.mycompany/stream/Buckets-82-Index.db 0/1223053 /local1/junkbox/cassandra/data/junkbox.mycompany/stream/Buckets-82-Filter.db 0/22765 /local1/junkbox/cassandra/data/junkbox.mycompany/stream/Buckets-82-Data.db 0/10750717 /local1/junkbox/cassandra/data/junkbox.mycompany/stream/Databases-52-Index.db 0/58 /local1/junkb
Re: stalled streaming
Thanks, will move to 0.6.2. Aaron On Tue, 15 Jun 2010 15:55:46 -0700, Benjamin Black wrote: > Known bug, fixed in latest 0.6 release. > > On Tue, Jun 15, 2010 at 3:29 PM, aaron wrote: >> hello, >> >> I have a 4 node cassandra cluster with 0.6.1 installed. We've been >> running >> a mixed read / write workload test how it works in our environment, we >> run >> about 4M bath mutations and 40M get_range_slice requests over 6 to 8 >> hours >> that load about 10 to 15 GB of data. >> >> Yesterday while there was no activity I noticed 2 nodes sitting at 200% >> CPU on 8 Core machine. Thought nothing of it. Checked again this morning >> and they are still sitting at that level of activity with no requests >> going >> into them. Checking the streams using node tool I see node 3 is streaming >> to node 0 and 2, and appears to have stalled. The information in the JMX >> console for streams matches the info below. >> >> I cannot see any errors in the logs. >> >> This is just a test system, so am happy to bounce the JVM's. Before I do >> is there anything else I should be looking for to understand why this >> happened? >> >> Also, sorry for the previous empty email. >> >> Node 0 >> Mode: Normal >> Nothing streaming to /192.168.34.27 >> Nothing streaming to /192.168.34.28 >> Nothing streaming to /192.168.34.29 >> Streaming from: /192.168.34.29 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-137-Filter.db >> 0/22765 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-137-Data.db >> 0/10750717 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-108-Index.db >> 0/58 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-108-Filter.db >> 0/325 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-108-Data.db >> 0/695 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-119-Index.db >> 0/58 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-119-Filter.db >> 0/325 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-119-Data.db >> 0/695 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-163-Index.db >> 0/587164 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-163-Filter.db >> 0/22765 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-163-Data.db >> 0/5159652 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-124-Data.db >> 22765/4966927 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-137-Index.db >> 22765/1223053 >> >> Node 1 >> Mode: Normal >> Nothing streaming to /192.168.34.26 >> Nothing streaming to /192.168.34.28 >> Nothing streaming to /192.168.34.29 >> Not receiving any streams. 
>> >> Node 2 >> Mode: Normal >> Nothing streaming to /192.168.34.26 >> Nothing streaming to /192.168.34.27 >> Nothing streaming to /192.168.34.29 >> Streaming from: /192.168.34.29 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-137-Filter.db >> 0/22765 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-137-Data.db >> 0/2161660 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-147-Index.db >> 0/787524 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-147-Filter.db >> 0/22765 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-147-Data.db >> 0/6917064 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-130-Index.db >> 0/58 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-130-Filter.db >> 0/565 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Databases-tmp-130-Data.db >> 0/695 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Buckets-tmp-168-Index.db >> 0/581779 >> junkbox.mycompany: >> /local1/junkbox/cassandra/data/junkbox.mycompany/Bucket
strange get_range_slices behaviour v0.6.1
I've been looking at the get_range_slices feature and have found some odd behaviour I do not understand. Basically the keys returned in a range query do not match what I would expect to see. I think it may have something to do with the ordering of keys that I don't know about, but I'm just guessing. On Cassandra v0.6.1, single node local install; RandomPartitioner. Using Python and my own thin wrapper around the Thrift Python API. Step 1. Insert 3 keys into the "Standard1" column family, called "object1", "object2" and "object3", each with a single column called 'name' with a value like 'object1'. Step 2. Do a get_range_slices call in the "Standard1" CF, for column names ["name"] with start_key "object1" and end_key "object3". I expect to see three results, but I only see results for object1 and object3. Below are the thrift types I'm passing into the Cassandra.Client object... - ColumnParent(column_family='Standard1', super_column=None) - SlicePredicate(column_names=['name'], slice_range=None) - KeyRange(end_key='object3', start_key='object1', count=4000, end_token=None, start_token=None) and the output [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439, name='name', value='object1'), super_column=None)], key='object1'), KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362, name='name', value='object3'), super_column=None)], key='object3')] Step 3. Modify the get_range_slices call, so the start_key is object2. In this case I expect to see 2 rows returned, but I get 3. Thrift args and return are below... - ColumnParent(column_family='Standard1', super_column=None) - SlicePredicate(column_names=['name'], slice_range=None) - KeyRange(end_key='object3', start_key='object2', count=4000, end_token=None, start_token=None) and the output [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250265190715, name='name', value='object2'), super_column=None)], key='object2'), KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439, name='name', value='object1'), super_column=None)], key='object1'), KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362, name='name', value='object3'), super_column=None)], key='object3')] Can anyone explain these odd results? As I said, I've got my own Python wrapper around the client, so I may be doing something wrong. But I've pulled out the thrift objects and they go in and out of the thrift Cassandra.Client, so I think I'm ok. (I have not noticed a systematic problem with my wrapper.) On a more general note, is there information on the sort order of keys when using key ranges? I'm guessing the hashes of the keys are compared, and I'm wondering if the hashes maintain the order of the original values? Also I assume the order is byte order, rather than ascii or utf8. I was experimenting with the difference between column slicing and key slicing. In my case I could write the keys in as column names (they are in buckets) as well and slice there first, then use the results to make a multi-key get. I'm trying to support features like "get me all the data where the key starts with 'foo.bar'". Thanks for the fun project. Aaron
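A small illustrative sketch (plain Java, not Cassandra code) of why the results above look shuffled: under the RandomPartitioner rows are ordered by the MD5-derived token of the key, not by the key bytes, so a key range really covers a token interval. The token derivation below is only an approximation of what the partitioner does.

    import java.math.BigInteger;
    import java.security.MessageDigest;
    import java.util.TreeMap;

    public class TokenOrder
    {
        public static void main(String[] args) throws Exception
        {
            TreeMap<BigInteger, String> byToken = new TreeMap<BigInteger, String>();
            for (String key : new String[] { "object1", "object2", "object3" })
            {
                byte[] digest = MessageDigest.getInstance("MD5").digest(key.getBytes("UTF-8"));
                byToken.put(new BigInteger(1, digest), key);   // unsigned 128-bit value, roughly the RP token
            }
            // Prints the keys in token order, which need not match lexical key order.
            System.out.println(byToken.values());
        }
    }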
Re: strange get_range_slices behaviour v0.6.1
ard1 [636f6c5f6e616d65:false:9...@1272315595271798,])),Row(key='object1', cf=ColumnFamily(Standard1 [636f6c5f6e616d65:false:9...@1272315595268837,])),Row(key='object3', cf=ColumnFamily(Standard1 [636f6c5f6e616d65:false:9...@1272315595272693,]))} to 1...@localhost/127.0.0.1 DEBUG 09:34:48,133 Processing response on a callback from 1...@localhost/127.0.0.1 DEBUG 09:34:48,133 range slices read object2 DEBUG 09:34:48,133 range slices read object1 DEBUG 09:34:48,133 range slices read object3 In [39]: cass_test.read_range(conn, "object3", "") Out[39]: [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595272693, name='col_name', value='col_value'), super_column=None)], key='object3')] DEBUG 09:35:26,090 range_slice DEBUG 09:35:26,090 RangeSliceCommand{keyspace='Keyspace1', column_family='Standard1', super_column=null, predicate=SlicePredicate(column_names:[...@24e33e18]), range=[123092639156685888118746480803115294277,0], max_keys=1000} DEBUG 09:35:26,090 Adding to restricted ranges [123092639156685888118746480803115294277,0] for (75349581786326521367945210761838448174,75349581786326521367945210761838448174] DEBUG 09:35:26,090 reading RangeSliceCommand{keyspace='Keyspace1', column_family='Standard1', super_column=null, predicate=SlicePredicate(column_names:[...@24e33e18]), range=[123092639156685888118746480803115294277,0], max_keys=1000} from 1...@localhost/127.0.0.1 DEBUG 09:35:26,090 Sending RangeSliceReply{rows=Row(key='object3', cf=ColumnFamily(Standard1 [636f6c5f6e616d65:false:9...@1272315595272693,]))} to 1...@localhost/127.0.0.1 DEBUG 09:35:26,090 Processing response on a callback from 1...@localhost/127.0.0.1 DEBUG 09:35:26,090 range slices read object3 thanks Aaron On Sun, 25 Apr 2010 20:23:05 -0700, aaron wrote: > I've been looking at the get_range_slices feature and have found some odd > behaviour I do not understand. Basically the keys returned in a range query > do not match what I would expect to see. I think it may have something to > do with the ordering of keys that I don't know about, but I'm just > guessing. > > On Cassandra v 0.6.1, single node local install; RandomPartitioner. Using > Python and my own thin wrapper around the Thrift Python API. > > Step 1. > > Insert 3 keys into the "Standard 1" column family, called "object 1" > "object 2" and "object 3", each with a single column called 'name' with a > value like 'object1' > > Step 2. > > Do a get_range_slices call in the "Standard 1" CF, for column names > ["name"] with start_key "object1" and end_key "object3". I expect to see > three results, but I only see results for object1 and object2. Below are > the thrift types I'm passing into the Cassandra.Client object... > > - ColumnParent(column_family='Standard1', super_column=None) > - SlicePredicate(column_names=['name'], slice_range=None) > - KeyRange(end_key='object3', start_key='object1', count=4000, > end_token=None, start_token=None) > > and the output > > [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439, > name='name', value='object1'), super_column=None)], key='object1'), > KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362, > name='name', value='object3'), super_column=None)], key='object3')] > > Step 3. > > Modify the get_range_slices call, so the start_key is object2. In this case > I expect to see 2 rows returned, but I get 3. Thrift args and return are > below... 
> > - ColumnParent(column_family='Standard1', super_column=None) > - SlicePredicate(column_names=['name'], slice_range=None) > - KeyRange(end_key='object3', start_key='object2', count=4000, > end_token=None, start_token=None) > > and the output > > [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250265190715, > name='name', value='object2'), super_column=None)], key='object2'), > KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439, > name='name', value='object1'), super_column=None)], key='object1'), > KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362, > name='name', value='object3'), super_column=None)], key='object3')] > > > > Can anyone explain these odd results? As I said I've got my own python > wrapper around the client, so I may be doing something wrong. But I've > pulled out the thrift objects and they go in and out of the thrift > Cassandra.Client, so I think I'm ok. (I have not noticed a systematic > problem with my wrapper). > > On a more general note, is there information on the sort order of keys when > using key ranges? I'm guessing the hash of the keys is compared and I > wondering if the hash's of the keys maintain the order of the original > values? Also I assume the order is byte order, rather than ascii or utf8. > > I was experimenting with the difference between column slicing and key > slicing. In my I could write the keys in as column names (they are in > buckets) as well and slice there first, then use the results to to make a > multi key get. I'm trying to support features like, get me all the data > where the key starts with "foo.bar". > > Thanks for the fun project. > > Aaron
Re: Cassandra didn't order data according to clustering order
It also helps to think about it with the token values of the partition key in place. Assume I have a table "users_by_dept" keyed like this: PRIMARY KEY ((department),username). Querying that table with the token function on the partition key looks like this: SELECT token(department),department,username,email FROM users_by_dept ; system.token(department) | department | username | email --+-+--+-- -8838453544589358145 | Engineering | Dinesh | din...@piedpiper.com -8838453544589358145 | Engineering | Gilfoyle | thedark...@piedpiper.com -8838453544589358145 | Engineering | Richard | rich...@piedpiper.com -4463195796437695053 | Marketing | Erlich | erl...@piedpiper.com -3086103490616810985 | Finance/HR |Jared | don...@piedpiper.com (5 rows) As you can see, department doesn't appear to be in any discernable order until you apply the token function to it. Aaron On Sun, Jul 15, 2018 at 8:05 AM, Soheil Pourbafrani wrote: > To the point, Thanks! > > On Sun, Jul 15, 2018 at 4:31 PM, shalom sagges > wrote: > >> The clustering column is ordered per partition key. >> >> So if for example I create the following table: >> create table desc_test ( >>id text, >>name text, >>PRIMARY KEY (id,name) >> ) WITH CLUSTERING ORDER BY (name DESC ); >> >> >> I insert a few rows: >> >> insert into desc_test (id , name ) VALUES ( 'abc', 'abc'); >> insert into desc_test (id , name ) VALUES ( 'abc', 'bcd'); >> insert into desc_test (id , name ) VALUES ( 'abc', 'aaa'); >> insert into desc_test (id , name ) VALUES ( 'fgh', 'aaa'); >> insert into desc_test (id , name ) VALUES ( 'fgh', 'bcd'); >> insert into desc_test (id , name ) VALUES ( 'fgh', 'abc'); >> >> >> And then read: >> select * from desc_test; >> >> id | name >> -+-- >> fgh | bcd >> fgh | abc >> fgh | aaa >> abc | bcd >> abc | abc >> abc | aaa >> >> (6 rows) >> >> >> You can see that the data is properly ordered in descending mode, BUT >> *for each partition key. * >> So in order to achieve what you want, you will have to add the relevant >> partition key for each select query. >> >> Hope this helps >> >> >> On Sun, Jul 15, 2018 at 2:16 PM, Soheil Pourbafrani < >> soheil.i...@gmail.com> wrote: >> >>> I created table using the command: >>> CREATE TABLE correlated_data ( >>> processing_timestamp bigint, >>> generating_timestamp bigint, >>> data text, >>> PRIMARY KEY (processing_timestamp, generating_timestamp) >>> ) WITH CLUSTERING ORDER BY (generating_timestamp DESC); >>> >>> >>> When I get data using the command : >>> SELECT * FROM correlated_data LIMIT 1 ; >>> >>> I expect it return the row with the biggest field "generating_timestamp", >>> but I got the same row every time I run the query, while row with bigger " >>> generating_timestamp" exists. What's the problem? >>> >> >> >
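A minimal sketch of the point above using the DataStax Java driver (contact point, keyspace name and the timestamp value are placeholders): the DESC clustering order only applies within a partition, so restrict the partition key to get the newest generating_timestamp for a given processing_timestamp.

    Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    Session session = cluster.connect("my_keyspace");   // keyspace name is assumed

    // Within the partition, rows are already ordered by generating_timestamp DESC,
    // so LIMIT 1 returns the newest row for that processing_timestamp.
    Row newest = session.execute(
            "SELECT * FROM correlated_data WHERE processing_timestamp = ? LIMIT 1",
            1531600000000L).one();
    System.out.println(newest.getLong("generating_timestamp"));

    cluster.close();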
Re: Question about hector api documentation
I used to be surprised that people still ask about Hector here; and that questions here on Hector always seem to mirror new Hector questions on Stack Overflow. The problem (I think), is that places like Edureka! are still charging people $300 for a Cassandra training class, where they still actively teach people to use Hector: http://www.edureka.co/cassandra-course-curriculum Lesson to be learned, not all online training resources are created equally. New development with a framework that's been dead for 2 years isn't the best way to ensure success. Stick with the DataStax Java driver. Aaron On Fri, Jun 24, 2016 at 11:09 AM, Jonathan Haddad wrote: > +1, do not use Hector. It hasn't had a commit in years and uses the thrift > protocol which is now marked deprecated. The DataStax Java driver is > recommended, possibly with Achilles to make things a bit nicer. > On Thu, Jun 23, 2016 at 9:20 PM Noorul Islam K M > wrote: > >> >> The very first line README tells the story >> >> >> THIS PROJECT IS NO LONGER ACTIVE >> >> >> But you should be able to generate doc from source code. >> >> Regards, >> Noorul >> >> >> Sungju Hong writes: >> >> > Hello, >> > >> > I'm finding hector java api doc. >> > >> > I searched though google but couldn't find hector api doc. >> > >> > This link is broken also. >> > https://hector-client.github.io/hector/build/html/content/api.html# >> > >> > Can I know the way to get the doc? >> > >> > Thanks. >> > >> > Sungju. >> >
Re: Openstack and Cassandra
Shalom, We (Target) have been challenged by our management team to leverage OpenStack whenever possible, and that includes Cassandra. I was against it at first, but we have done some stress testing with it and had application teams try it out. So far, there haven't been any issues. A good use case for Cassandra on OpenStack, is to support an internal-facing application that needs to scale for disk footprint, or to spin-up a quick dev environment. When building clusters to support those solutions, we haven't had any problems due to simply deploying on OpenStack. Our largest Cassandra cluster on OpenStack is currently around 30 nodes. OpenStack is a good solution for that particular use case as we can easily add/remove nodes to accommodate the dynamic disk usage requirements. However, when query latency is a primary concern, I do still recommend that we use one of our external cloud providers. Hope that helps, Aaron On Thu, Dec 22, 2016 at 9:51 AM, Shalom Sagges wrote: > Thanks Vladimir! > > I guess I'll just have to deploy and continue from there. > > > > > Shalom Sagges > DBA > T: +972-74-700-4035 <+972%2074-700-4035> > <http://www.linkedin.com/company/164748> <http://twitter.com/liveperson> > <http://www.facebook.com/LivePersonInc> We Create Meaningful Connections > <https://liveperson.docsend.com/view/8iiswfp> > > > On Thu, Dec 22, 2016 at 5:20 PM, Vladimir Yudovin > wrote: > >> Hi Shalom, >> >> I don't see any reason why it wouldn't work, but obviously, any resource >> sharing affects performance. You can expect less degradation with SSD >> disks, I guess. >> >> >> Best regards, Vladimir Yudovin, >> *Winguzone <https://winguzone.com?from=list> - Cloud Cassandra Hosting* >> >> >> On Wed, 21 Dec 2016 13:31:22 -0500 *Shalom Sagges >> >* wrote >> >> Hi Everyone, >> >> I am looking into the option of deploying a Cassandra cluster on >> Openstack nodes instead of physical nodes due to resource management >> considerations. >> >> Does anyone has any insights regarding this? >> Can this combination work properly? >> Since the disks (HDDs) are part of one physical machine that divide their >> capacity to various instances (not only Cassandra), will this affect >> performance, especially when the commitlog directory will probably reside >> with the data directory? >> >> I'm at a loss here and don't have any answers for that matter. >> >> Can anyone assist please? >> >> Thanks! >> >> >> >> >> Shalom Sagges >> DBA >> T: +972-74-700-4035 <+972%2074-700-4035> >> <http://www.linkedin.com/company/164748> >> <http://twitter.com/liveperson> >> <http://www.facebook.com/LivePersonInc> >> We Create Meaningful Connections >> >> >> >> >> This message may contain confidential and/or privileged information. >> If you are not the addressee or authorized to receive this on behalf of >> the addressee you must not use, copy, disclose or take action based on this >> message or any information herein. >> If you have received this message in error, please advise the sender >> immediately by reply email and delete this message. Thank you. >> >> >> > > This message may contain confidential and/or privileged information. > If you are not the addressee or authorized to receive this on behalf of > the addressee you must not use, copy, disclose or take action based on this > message or any information herein. > If you have received this message in error, please advise the sender > immediately by reply email and delete this message. Thank you. >
Re: JNA on Windows
Processes are started differently on Windows (there is no fork), so the 2x memory concern does not apply in the same way. On Windows, Cassandra uses mklink to make the hard link: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/CLibrary.java#L170 Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 6/07/2012, at 5:57 AM, Fredrik Stigbäck wrote: > Hello. > I have a question regarding JNA and Windows. > I read about the problem that taking snapshots might require 2x the > process space due to how hard links are created. > Is JNA for Windows supported? > Looking at jira issue > https://issues.apache.org/jira/browse/CASSANDRA-1371 it looks like it, but > checking in the Cassandra code base > org.apache.cassandra.utils.CLibrary the only thing I see is > Native.register("c"), which tries to load the c library but which I think > doesn't exist on Windows, which will result in creating links with cmd > or fsutil and which might then trigger these extensive memory > requirements. > I'd be happy if someone could shed some light on this issue. > Regards > /Fredrik
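A minimal sketch (not the actual Cassandra code linked above) of creating a hard link on Windows by shelling out to cmd's mklink, which is the approach the reply describes; the paths are placeholders and this only works on Windows.

    import java.io.IOException;

    public class HardLinkSketch
    {
        // mklink /H <link> <existing file> creates a hard link; mklink is a cmd built-in.
        public static void createHardLink(String existingFile, String linkName)
                throws IOException, InterruptedException
        {
            Process p = new ProcessBuilder("cmd", "/c", "mklink", "/H", linkName, existingFile)
                                .redirectErrorStream(true)
                                .start();
            if (p.waitFor() != 0)
                throw new IOException("mklink failed for " + linkName);
        }
    }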
Re: CQL 3 with a right API
Row keys are distinct. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 6/07/2012, at 7:30 AM, Shahryar Sedghi wrote: > Hi > > I am new to to Cassandra and we started with 1.1 and modeled everything with > Composite columns and wide rows and chose CQL 3 even if it is beta. Since I > could not find a way in Hector to set CQL 3, I started with Thrift and > prototyped all my scenarios with Thrift including retrieving all row keys > (without CQL). Recently I saw a JDBC driver for 1.1.1 and it is so promising > (slightly slower than thrift in most of my scenarios). Apparently JDBC "will > be" the ultimate Java API for Cassandra, so the question is: > > Since there is no distinct clause in CQL 3, is there a way to retrieve all > row keys "with JDBC" without browsing all columns of the CF (and make it > distinct yourself) ? > > Thanks > > Shahryar Sedghi > > -- > "Life is what happens while you are making other plans." ~ John Lennon
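A hedged sketch only: the JDBC driver class name and URL format below are assumptions about the cassandra-jdbc project mentioned in the thread, and de-duplicating client side is just one pragmatic option, since CQL 3 in 1.1 has no DISTINCT clause.

    // Assumed driver class and URL format; check the cassandra-jdbc docs for your version
    // (you may also need to request CQL 3 explicitly via the connection URL).
    Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
    Connection conn = DriverManager.getConnection("jdbc:cassandra://localhost:9160/MyKeyspace");

    Set<String> keys = new LinkedHashSet<String>();
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery("SELECT key FROM MyColumnFamily");   // hypothetical CF
    while (rs.next())
        keys.add(rs.getString(1));   // duplicates, if any, collapse in the set
    rs.close();
    stmt.close();
    conn.close();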
Re: Multiple keyspace question
I would do a test to see the latency difference under load between having 1 KS with 5 CF's and 50 KS with 5 CF's. Your test will need to read and write to all the CF's. Having many CF's may result in more frequent memtables flushes. (Personally it's not an approach I would take.) Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 7/07/2012, at 8:15 AM, Shahryar Sedghi wrote: > Aaron > > I am going to have many (over 50 eventually) keyspaces with limited number of > CFs (5-6) do you think this one can cause a problem too. > > Thanks > > On Fri, Jul 6, 2012 at 2:28 PM, aaron morton wrote: > Also, all CF's in the same KS share one commit log. So all writes for the row > row key, across all CF's, are committed at the same time. > > Some other settings, such as caches in 1.1, are machine wide. > > If you have a small KS for something like app config, I'd say go with > whatever feels right. If you are talking about two full "application" KS's I > would think about their prospective workloads and growth patterns. Will you > always want to manage the two together ? > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 6/07/2012, at 9:47 PM, Robin Verlangen wrote: > >> Hi Ben, >> >> The amount of keyspaces is not the problem: the amount of column families >> is. Each column family adds a certain amount of memory usage to the system. >> You can cope with this by adding memory or using generic column families >> that store different types of data. >> >> With kind regards, >> >> Robin Verlangen >> Software engineer >> >> W http://www.robinverlangen.nl >> E ro...@us2.nl >> >> Disclaimer: The information contained in this message and attachments is >> intended solely for the attention and use of the named addressee and may be >> confidential. If you are not the intended recipient, you are reminded that >> the information remains the property of the sender. You must not use, >> disclose, distribute, copy, print or rely on this e-mail. If you have >> received this message in error, please contact the sender immediately and >> irrevocably delete this message and any copies. >> >> 2012/7/6 Ben Kaehne >> Good evening, >> >> I have read multiple keyspaces are bad before in a few discussions, but to >> what extent? >> >> We have some reasonably powerful machines and looking to host an additional >> (currently we have 1) 2 keyspaces within our cassandra cluster (of 3 nodes, >> using RF3). >> >> At what point does adding extra keyspaces start becoming an issue? Is there >> anything special we should be considering or watching out for as we >> implement this? >> >> I could not imagine that all cassandra users out there are running one >> massive keyspace, and at the same time can not imaging that all cassandra >> users have multiple clusters just to host different keyspaces. >> >> Regards. >> >> -- >> -Ben >> >> > > > > > -- > "Life is what happens while you are making other plans." ~ John Lennon
Re: Composite Slice Query returning non-sliced data
Something like: This is how I did the write in CLI and this is what it printed. and then This is how I did the read in the CLI and this is what it printed. It's hard to imagine what data is in cassandra based on code. cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 7/07/2012, at 1:28 PM, Sunit Randhawa wrote: > Aaron, > > For writing, i am using cli. > Below is the piece of code that is reading column names of different types. > > > Composite start = new Composite(); > > start.addComponent(0, beginTime, > Composite.ComponentEquality.EQUAL); > > if (columns != null){ > int colCount =1; > for (String colName : columns){ > > start.addComponent(colCount,colName,Composite.ComponentEquality.EQUAL); > colCount++; > } > } > > Composite finish = new Composite(); > finish.addComponent(0, endTime, > Composite.ComponentEquality.EQUAL); > > if (columns != null){ > int colCount =1; > for (String colName : columns){ > if (colCount == columns.size()) > finish.addComponent(colCount,colName+ > Character.MAX_VALUE, > Composite.ComponentEquality.GREATER_THAN_EQUAL); > //Greater_than_equal is meant for any subslices > to A:B:C if searched on A:B > else > > finish.addComponent(colCount,colName,Composite.ComponentEquality.EQUAL); > colCount++; > } > } > SliceQuery sq > = HFactory.createSliceQuery(keyspace, StringSerializer.get(), > new CompositeSerializer(), > StringSerializer.get()); > sq.setColumnFamily(columnFamilyName); > > sq.setKey(key); > logger.debug("Start:"+start+",finish:"+finish); > sq.setRange(start, finish, false, 1); > > QueryResult> result = sq > .execute(); > ColumnSlice orderedRows = result.get(); > > Please let me know if you additional info. > > Thanks, > Sunit. > > On Fri, Jul 6, 2012 at 10:59 AM, aaron morton wrote: >> Can you provide an example of writing and reading column names of a >> different type. >> >> Thanks >> >> - >> Aaron Morton >> Freelance Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 6/07/2012, at 11:30 AM, Sunit Randhawa wrote: >> >> HI Aaron, >> >> It is >> >> create column family CF >> with comparator = >> 'CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type)' >> and key_validation_class = UTF8Type >> and default_validation_class = UTF8Type; >> >> This is allowing me to insert column names of different type. >> >> Thanks, >> Sunit. >> On Thu, Jul 5, 2012 at 4:24 PM, aaron morton >> wrote: >> >> #2 has the Composite Column and #1 does not. >> >> >> They are both strings. >> >> >> All column names *must* be of the same type. What was your CF definition ? >> >> >> Cheers >> >> >> - >> >> Aaron Morton >> >> Freelance Developer >> >> @aaronmorton >> >> http://www.thelastpickle.com >> >> >> On 6/07/2012, at 7:26 AM, Sunit Randhawa wrote: >> >> >> Hello, >> >> >> I have 2 Columns for a 'RowKey' as below: >> >> >> #1 : set CF['RowKey']['1000']='A=1,B=2'; >> >> #2: set CF['RowKey']['1000:C1']='A=2,B=3''; >> >> >> #2 has the Composite Column and #1 does not. >> >> >> Now when I execute the Composite Slice query by 1000 and C1, I do get >> >> both the columns above. >> >> >> I am hoping get #2 only since I am specifically providing "C1" as >> >> Start and Finish Composite Range with >> >> Composite.ComponentEquality.EQUAL. >> >> >> >> I am not sure if this is by design. >> >> >> Thanks, >> >> Sunit. >> >> >> >>
Re: Effect of rangequeries with RandomPartitioner
For background: http://wiki.apache.org/cassandra/FAQ#range_rp It maps the start key to a token, and then scans X rows from there on CL number of nodes. Rows are stored in token order. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 7/07/2012, at 11:52 PM, prasenjit mukherjee wrote: > Wondering how a rangequery request is handled if RP is used. Will the > receiving node do a fan-out to all the nodes in the ring or will it > just execute the rangequery on its own local partition ? > > -- > Sent from my mobile device
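For completeness, a rough sketch of the FAQ pattern for walking all rows under the RandomPartitioner with the raw Thrift API (1.x-era signatures; key types and setters vary between versions, so treat this as an outline). The keyspace and column family names are placeholders.

    client.set_keyspace("Keyspace1");
    ColumnParent parent = new ColumnParent("Standard1");
    SlicePredicate predicate = new SlicePredicate();
    predicate.setSlice_range(new SliceRange(ByteBuffer.wrap(new byte[0]),
                                            ByteBuffer.wrap(new byte[0]), false, 100));

    ByteBuffer start = ByteBuffer.wrap(new byte[0]);            // empty key = start of the ring
    while (true)
    {
        KeyRange range = new KeyRange().setStart_key(start)
                                       .setEnd_key(ByteBuffer.wrap(new byte[0]))
                                       .setCount(100);
        List<KeySlice> page = client.get_range_slices(parent, predicate, range, ConsistencyLevel.ONE);
        for (KeySlice ks : page)
            System.out.println(new String(ks.getKey()));         // rows arrive in token order
        if (page.size() < 100)
            break;                                               // short page means we are done
        start = ByteBuffer.wrap(page.get(page.size() - 1).getKey());
        // the next page repeats this last key as its first row, so skip the duplicate
    }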
Re: cannot build 1.1.2 from source
Did you try running ant clean first ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 8/07/2012, at 1:57 PM, Arya Goudarzi wrote: > Hi Fellows, > > I used to be able to build cassandra 1.1 up to 1.1.1 with the same set > of procedures by running ant on the same machine, but now the stuff > associated with gen-cli-grammar breaks the build. Any advice will be > greatly appreciated. > > -Arya > > Source: > source tarball for 1.1.2 downloaded from one of the mirrors in > cassandra.apache.org > OS: > Ubuntu 10.04 Precise 64bit > Ant: > Apache Ant(TM) version 1.8.2 compiled on December 3 2011 > Maven: > Apache Maven 3.0.3 (r1075438; 2011-02-28 17:31:09+) > Java: > java version "1.6.0_32" > Java(TM) SE Runtime Environment (build 1.6.0_32-b05) > Java HotSpot(TM) 64-Bit Server VM (build 20.7-b02, mixed mode) > > > > Buildfile: /home/arya/workspace/cassandra-1.1.2/build.xml > > maven-ant-tasks-localrepo: > > maven-ant-tasks-download: > > maven-ant-tasks-init: > > maven-declare-dependencies: > > maven-ant-tasks-retrieve-build: > > init-dependencies: > [echo] Loading dependency paths from file: > /home/arya/workspace/cassandra-1.1.2/build/build-dependencies.xml > > init: >[mkdir] Created dir: > /home/arya/workspace/cassandra-1.1.2/build/classes/main >[mkdir] Created dir: > /home/arya/workspace/cassandra-1.1.2/build/classes/thrift >[mkdir] Created dir: /home/arya/workspace/cassandra-1.1.2/build/test/lib >[mkdir] Created dir: > /home/arya/workspace/cassandra-1.1.2/build/test/classes >[mkdir] Created dir: /home/arya/workspace/cassandra-1.1.2/src/gen-java > > check-avro-generate: > > avro-interface-generate-internode: > [echo] Generating Avro internode code... > > avro-generate: > > build-subprojects: > > check-gen-cli-grammar: > > gen-cli-grammar: > [echo] Building Grammar > /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g > > [java] warning(209): > /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:697:1: > Multiple token rules can match input such as "'-'": > IntegerNegativeLiteral, COMMENT > [java] > [java] As a result, token(s) COMMENT were disabled for that input > [java] warning(209): > /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1: > Multiple token rules can match input such as "'I'": INCR, INDEX, > Identifier > [java] > [java] As a result, token(s) INDEX,Identifier were disabled for that input > [java] warning(209): > /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1: > Multiple token rules can match input such as "'0'..'9'": IP_ADDRESS, > IntegerPositiveLiteral, DoubleLiteral, Identifier > [java] > [java] As a result, token(s) > IntegerPositiveLiteral,DoubleLiteral,Identifier were disabled for that > input > [java] warning(209): > /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1: > Multiple token rules can match input such as "'T'": TRUNCATE, TTL, > Identifier > [java] > [java] As a result, token(s) TTL,Identifier were disabled for that input > [java] warning(209): > /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1: > Multiple token rules can match input such as "'A'": T__109, > API_VERSION, AND, ASSUME, Identifier > [java] > [java] As a result, token(s) API_VERSION,AND,ASSUME,Identifier > were disabled for that input > [java] warning(209): > /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1: > Multiple token rules can match input such 
as "'E'": EXIT, Identifier > [java] > [java] As a result, token(s) Identifier were disabled for that input > [java] warning(209): > /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1: > Multiple token rules can match input such as "'L'": LIST, LIMIT, > Identifier > [java] > [java] As a result, token(s) LIMIT,Identifier were disabled for that input > [java] warning(209): > /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1: > Multiple token rules can match input such as "'B'": BY, Identifier > [java] > [java] As a result, token(s) Identifier were disabled for that input > [java] warning(209): > /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cas
Re: Serious issue updating Cassandra version and topology
To be clear, this happened on a 1.1.2 node and it happened again *after* you had run a scrub ? Has this cluster been around for a while or was the data created with 1.1 ? Can you confirm that all sstables were re-written for the CF? Check the timestamp on the files. Also also files should have the same version, the -h?- part of the name. Can you repair the other CF's ? If this cannot be repaired by scrub or upgradetables you may need to cut the row out of the sstables. Using sstable2json and json2sstable. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 8/07/2012, at 4:05 PM, Michael Theroux wrote: > Hello, > > We're in the process of trying to move a 6-node cluster from RF=1 to RF=3. > Once our replication factor was upped to 3, we ran nodetool repair, and > immediately hit an issue on the first node we ran repair on: > > INFO 03:08:51,536 Starting repair command #1, repairing 2 ranges. > INFO 03:08:51,552 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] new > session: will sync xxx-xx-xx-xxx-132.compute-1.amazonaws.com/10.202.99.101, > /10.29.187.61 on range > (Token(bytes[d558]),Token(bytes[])] > for x.[a, b, c, d, e, f, g, h, i, > j, k, l, m, n, o, p, q, r, s] > INFO 03:08:51,555 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] requesting > merkle trees for a (to [/10.29.187.61, > xxx-xx-xx-xxx-compute-1.amazonaws.com/10.202.99.101]) > INFO 03:08:52,719 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] Received > merkle tree for a from /10.29.187.61 > INFO 03:08:53,518 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] Received > merkle tree for a from > xxx-xx-xx-xxx-.compute-1.amazonaws.com/10.202.99.101 > INFO 03:08:53,519 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] requesting > merkle trees for b (to [/10.29.187.61, > xxx-xx-xx-xxx-132.compute-1.amazonaws.com/10.202.99.101]) > INFO 03:08:53,639 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] Endpoints > /10.29.187.61 and xxx-xx-xx-xxx-132.compute-1.amazonaws.com/10.202.99.101 are > consistent for a > INFO 03:08:53,640 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] a is > fully synced (18 remaining column family to sync for this session) > INFO 03:08:54,049 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] Received > merkle tree for b from /10.29.187.61 > ERROR 03:09:09,440 Exception in thread Thread[ValidationExecutor:1,1,main] > java.lang.AssertionError: row > DecoratedKey(Token(bytes[efd5654ce92a705b14244e2f5f73ab98c3de2f66c7adbd71e0e893997e198c47]), > efd5654ce92a705b14244e2f5f73ab98c3de2f66c7adbd71e0e893997e198c47) received > out of order wrt > DecoratedKey(Token(bytes[f33a5ad4a45e8cac7987737db246ddfe9294c95bea40f411485055f5dbecbadb]), > f33a5ad4a45e8cac7987737db246ddfe9294c95bea40f411485055f5dbecbadb) > at > org.apache.cassandra.service.AntiEntropyService$Validator.add(AntiEntropyService.java:349) > at > org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:712) > at > org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:68) > at > org.apache.cassandra.db.compaction.CompactionManager$8.call(CompactionManager.java:438) > at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) > at java.util.concurrent.FutureTask.run(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown > Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > > It looks from the log above, the sync of the "a" column family was > successful. 
However, the "b" column family resulted in this error. In > addition, the repair hung after this error. We ran node tool scrub on all > nodes and invalidated the key and row caches and tried again (with RF=2), and > it didn't help alleviate the problem. > > Some other important pieces of information: > We use ByteOrderedPartitioner (we MD5 hash the keys ourselves) > We're using Leveled Compaction > As we're in the middle of a transition, one node is on 1.1.2 (the one we > tried repair on), the other 5 are on 1.1.1 > > Thanks, > -Mike >
Re: Effect of rangequeries with RandomPartitioner
Index files map keys (not tokens) to offsets in the data file. A range scan uses the index file to seek to the start position in the data file and then does a partial scan of the data file. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 9/07/2012, at 7:24 PM, prasenjit mukherjee wrote: > Thanks for the response. Further questions inline.. > > On Mon, Jul 9, 2012 at 11:50 AM, samal wrote: >>> 1. With RandomPartitioner, on a given node, are the keys sorted by >>> their hash_values or original/unhashed keys ? >> >> hash value, > > 1. Based on the second answer in > http://stackoverflow.com/questions/2359175/cassandra-file-structure-how-are-the-files-used > it seems that the index-file ( for a given ssTable ) contains the > row-key ( and not the hash_keys ). Or may be I am missing something. > > 2. Do the "keys" in Index-file ( ref > http://hi.csdn.net/attachment/20/28/0_1322461982l3D8.gif ) > actually contain : hash(row_key)+row_key or something like that ? > Otherwise you need a separate mapping info from hash_bucket -> rows > for reading. > > -Thanks, > Prasenjit
Re: Setting the Memtable allocator on a per CF basis
> Would you guys consider adding this option to a future release? All improvements are considered :) Please create a ticket on https://issues.apache.org/jira/browse/CASSANDRA and reference CASSANDRA-3073 > If you want I can try to create a patch myself and submit it to you? Sounds like a plan. Thanks - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 10/07/2012, at 1:47 AM, Joost van de Wijgerd wrote: > Hello Cassandra Devs, > > We are currently trying to optimize our Cassandra system with > different workloads. One of our workload is update heavy (very). > > Currently we are running with a patch that allows the Live Ratio to go > below 1.0 (lower bound set to 0.1 now) which gives us a > bit better performance in terms of flushes on this particular CF. We > then experienced unexpected memory issues which on further > inspection seems to be related to the SlabAllocator. What happens is > that we allocate a Region of 1MB every couple of seconds (the columns > we write in this CF contain serialized session data, can be 100K > each), so overwrites are actually done into another Region and these > regions are only freed (most of the time) when the Memtable is > flushed. We actually added some debug logs and to write about 300MB to > disk we created roughly 3000 regions. (3GB of data, some of them might > be collected before the flush but probably not much) > > It would really great if we could use the native allocator only for > this CF. Since the SlabAllocator gives us very good results on our > other > CFs. (we tried running on a patched version with the HeapAllocator set > but went OOM almost immediately) > > I have found this issue in which Jonathan mentions he is ok with > adding a configuration option: > > https://issues.apache.org/jira/browse/CASSANDRA-3073 > > Unfortunately it seems the issue was closed and nothing was implemented. > > Would you guys consider adding this option to a future release? > SlabAllocator should be the default but in the CF properties the > HeapAllocator > can be set. > > If you want I can try to create a patch myself and submit it to you? > > Kind Regards > > Joost > > -- > Joost van de Wijgerd > Visseringstraat 21B > 1051KH Amsterdam > +31624111401 > joost.van.de.wijgerd@Skype > http://www.linkedin.com/in/jwijgerd
Re: Composite Slice Query returning non-sliced data
Ah, it's a Hector query question. You may have bette luck on the Hector email list. Or if you can turn on debug logging on the server and grab the query that would be handy. The first thing that stands out is that (in cassandra) comparison operations are not used in a slice range. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 10/07/2012, at 12:36 PM, Sunit Randhawa wrote: > Aaron, > > Let me start from the beginning. > > 1- I have a ColumnFamily called Rollup15 with below definition: > > create column family Rollup15 > with comparator = > 'CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type)' >and key_validation_class = UTF8Type >and default_validation_class = UTF8Type; > > > 2- Once created, it is empty. Below is the output of CLI: > > [default@Schema] list Rollup15; > Using default limit of 100 > > 0 Row Returned. > Elapsed time: 16 msec(s). > > 3- I use the Code below to insert the Composite Data into Cassandra: > > public void insertData(String columnFamilyName, String key, > String value, int rollupInterval, String... > columnSlice) { > > Composite colKey = new Composite(); > colKey.addComponent(rollupInterval, IntegerSerializer.get()); > if (columnSlice != null){ > for (String colName : columnSlice){ > colKey.addComponent(colName, serializer); > } > } > createMutator(keyspace, serializer).addInsertion(key, > columnFamilyName, > createColumn(colKey, value, new > CompositeSerializer(), > serializer)).execute(); > > } > > 4- After insertion, below is the CLI Output: > > [default@Schema] list Rollup15; > Using default limit of 100 > --- > RowKey: query1_1337295600 > => (column=15:Composite1:Composite2, value=value123, timesta > mp=134187983347) > > 1 Row Returned. > Elapsed time: 9 msec(s). > > So, there is record with 3 Composite Keys (15, Composite1 and Composite2) > > > 5- Now I am doing fetch based on Code Below. I am doing a fetch for > column "15:Composite3" which I know it is not there: > > Composite start = new Composite(); > >start.addComponent(0, 15, >Composite.ComponentEquality.EQUAL); > start.addComponent(1, > "Composite3",Composite.ComponentEquality.EQUAL); > > >Composite finish = new Composite(); >finish.addComponent(0, 15, >Composite.ComponentEquality.EQUAL); > >finish.addComponent(1,"Composite3"+ > Character.MAX_VALUE, Composite.ComponentEquality.GREATER_THAN_EQUAL); > >SliceQuery sq > = HFactory.createSliceQuery(keyspace, StringSerializer.get(), > new CompositeSerializer(), > StringSerializer.get()); >sq.setColumnFamily("Rollup15"); > >sq.setKey("query1_1337295600"); >sq.setRange(start, finish, false, 1); > >QueryResult> result = sq >.execute(); > ColumnSlice orderedRows = result.get(); > > 6- And I get output for RowKey: query1_1337295600 as > (column=15:Composite1:Composite2, value=value123, timesta > mp=134187983347) which should not be the case since it does not > belong to the 'Composite3' slice. > > Sunit. > > > On Sun, Jul 8, 2012 at 11:45 AM, aaron morton wrote: >> Something like: >> >> This is how I did the write in CLI and this is what it printed. >> >> and then >> >> This is how I did the read in the CLI and this is what it printed. >> >> It's hard to imagine what data is in cassandra based on code. >> >> cheers >> >> - >> Aaron Morton >> Freelance Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 7/07/2012, at 1:28 PM, Sunit Randhawa wrote: >> >> Aaron, >> >> For writing, i am using cli. 
>> Below is the piece of code that is reading column names of different types. >> >> >> Composite start = new Composite(); >> >> start.addComponent(0, beginTime, >> Composite.
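A minimal sketch of slice bounds that should return nothing for the "15:Composite3" prefix, reusing the Composite.addComponent(index, value, equality) overload and the Hector classes already shown in the code above (the column family and row key are the ones from the CLI output, everything else is illustrative, not a tested fix):

    // Keep both bounds on the "Composite3" prefix. Marking the last component of
    // the finish bound GREATER_THAN_EQUAL makes it the end of that prefix, so no
    // Character.MAX_VALUE padding is needed.
    Composite start = new Composite();
    start.addComponent(0, 15, Composite.ComponentEquality.EQUAL);
    start.addComponent(1, "Composite3", Composite.ComponentEquality.EQUAL);

    Composite finish = new Composite();
    finish.addComponent(0, 15, Composite.ComponentEquality.EQUAL);
    finish.addComponent(1, "Composite3", Composite.ComponentEquality.GREATER_THAN_EQUAL);

    SliceQuery<String, Composite, String> sq =
        HFactory.createSliceQuery(keyspace, StringSerializer.get(),
                                  new CompositeSerializer(), StringSerializer.get());
    sq.setColumnFamily("Rollup15");
    sq.setKey("query1_1337295600");
    sq.setRange(start, finish, false, 10);
    ColumnSlice<Composite, String> slice = sq.execute().get();
    // With only 15:Composite1:Composite2 in the row, slice.getColumns() should be empty.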
Re: BulkLoading sstables from v1.0.3 to v1.1.1
Do you have the full error logs ? There should be a couple of "caused by:" errors that will help track down where the original assertion is thrown. The second error is probably the result of the first. Something has upset the SSTable tracking. If you can get the full error stack, and some steps to reproduce, can you raise a ticket on https://issues.apache.org/jira/browse/CASSANDRA ? Thanks - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 10/07/2012, at 7:43 PM, rubbish me wrote: > Thanks Ivo. > > We are quite close to releasing so we'd hope to understand what causing the > error and may try to avoid it where possible. As said, it seems to work ok > the first time round. > > The problem you referring in the last mail, was it restricted to bulk loading > or otherwise? > > Thanks > > -A > > Ivo Meißner wrote on 10 Jul 2012 07:20: > >> Hi, >> >> there are some problems in version 1.1.1 with secondary indexes and key >> caches that are fixed in 1.1.2. >> I would try to upgrade to 1.1.2 and see if the error still occurs. >> >> Ivo >> >> >> >>> >>> >>> Hi >>> >>> As part of a continuous development of a system migration, we have a test >>> build to take a snapshot of a keyspace from cassandra v 1.0.3 and bulk load >>> it to a cluster of 1.1.1 using the sstableloader.sh. Not sure if relevant, >>> but one of the cf contains a secondary index. >>> >>> The build basically does: >>> Drop the destination keyspace if exist >>> Add the destination keyspace, wait for schema to agree >>> run sstableLoader >>> Do some validation of the streamed data >>> >>> Keyspace / column families schema are basically the same, apart from in the >>> one of v1.1.1, we had compression and key cache switched on. >>> >>> On a clean cluster, (empty data, commit log, saved-cache dirs) the sstables >>> loaded beautifully. >>> >>> But subsequent build failed with >>> -- >>> [21:02:02][exec] progress: []... [total: 0 - 0MB/s (avg: >>> 0MB/s)]ERROR 21:02:02,811 Error in >>> ThreadPoolExecutorjava.lang.RuntimeException: java.net.SocketException: >>> Connection reset
Re: Using a node in separate cluster without decommissioning.
> Since replication factor is 2 in first cluster, I > won't lose any data. Assuming you have been running repair or working at CL QUORUM (which is the same as CL ALL for RF 2) > Is it advisable and safe to go ahead? um, so the plan is to turn off 2 nodes in the first cluster, retask them into the new cluster and then reverse the process ? If you simply turn two nodes off in the first cluster you will reduce the availability for a portion of the ring. 25% of the keys will now have at best 1 node they can be stored on. If a node is having any sort of problems, and it is a replica for one of the down nodes, the cluster will appear down for 12.5% of the keyspace. If you work at QUORUM you will not have enough nodes available to write / read 25% of the keys. If you decommission the nodes, you will still have 2 replicas available for each key range. This is the path I would recommend. If you _really_ need to do it, what you suggest will probably work. Some tips: * do safe shutdowns - nodetool disablegossip, disablethrift, drain * don't forget to copy the yaml file. * in the first cluster the other nodes will collect hints for the first hour the nodes are down. You are not going to want these so disable HH. * get the nodes back into the first cluster before gc_grace_seconds expires. * bring them back and repair them. * when you bring them back, reading at CL ONE will give inconsistent results. Reading at QUORUM may result in a lot of repair activity. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 11/07/2012, at 6:35 AM, rohit bhatia wrote: > Hi > > I want to take out 2 nodes from a 8 node cluster and use in another > cluster, but can't afford the overhead of streaming the data and > rebalance cluster. Since replication factor is 2 in first cluster, I > won't lose any data. > > I'm planning to save my commit_log and data directories and > bootstrapping the node in the second cluster. Afterwards I'll just > replace both the directories and join the node back to the original > cluster. This should work since cassandra saves all the cluster and > schema info in the system keyspace. > > Is it advisable and safe to go ahead? > > Thanks > Rohit
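For reference, the safe-shutdown tip above maps to a short nodetool sequence run against each node you are about to take out (the host name is illustrative):

    nodetool -h 10.0.0.1 disablegossip
    nodetool -h 10.0.0.1 disablethrift
    nodetool -h 10.0.0.1 drain
    # then stop the cassandra process and copy cassandra.yaml along with the
    # commit log and data directories before retasking the node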
Re: failed to delete commitlog, cassandra can't accept writes
I don't think it's related to 4337. There is an explicit close call just before the deletion attempt. Can you create a ticket on https://issues.apache.org/jira/browse/CASSANDRA with all of the information you've got here (including the full JVM vendor, version, build). Can you also check if the file it tries to delete exists ? (I assume it does, otherwise it would be a different error). Thanks for digging into this. ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 11/07/2012, at 9:36 AM, Frank Hsueh wrote: > oops; I missed log line: > > >>>> > ERROR [COMMIT-LOG-ALLOCATOR] 2012-07-10 14:19:39,776 > AbstractCassandraDaemon.java (line 134) Exception in thread > Thread[COMMIT-LOG-ALLOCATOR,5,main] > java.io.IOError: java.io.IOException: Failed to delete > C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log > at > org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:176) > at > org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223) > at > org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95) > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) > at java.lang.Thread.run(Thread.java:662) > Caused by: java.io.IOException: Failed to delete > C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log > at > org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54) > at > org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172) > ... 4 more > <<<< > > > On Tue, Jul 10, 2012 at 2:35 PM, Frank Hsueh wrote: > after reading the JIRA, I decided to use Java 6. > > with Casandra 1.1.2 on Java 6 x64 on Win7 sp1 x64 (all latest versions), > after a several minutes of sustained writes, I see: > > from system.log: > >>>> > java.io.IOError: java.io.IOException: Failed to delete > C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log > at > org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:176) > at > org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223) > at > org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95) > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) > at java.lang.Thread.run(Thread.java:662) > Caused by: java.io.IOException: Failed to delete > C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log > at > org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54) > at > org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172) > ... 4 more > <<<< > > anybody seen this before? is this related to 4337 ? > > > > > On Sat, Jul 7, 2012 at 6:36 PM, Frank Hsueh wrote: > bug already reported: > > https://issues.apache.org/jira/browse/CASSANDRA-4337 > > > > On Sat, Jul 7, 2012 at 6:26 PM, Frank Hsueh wrote: > Hi, > > I'm running Casandra 1.1.2 on Java 7 x64 on Win7 sp1 x64 (all latest > versions). If it matters, I'm using a late version of Astyanax as my client. > > I'm using 4 threads to write a lot of data into a single CF. > > After several minutes of load (~ 30m at last incident), Cassandra stops > accepting writes (client reports an OperationTimeoutException). 
I looked at > the logs and I see on the Cassandra server: > > >>>> > ERROR 18:00:42,807 Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main] > java.io.IOError: java.io.IOException: Rename from > \var\lib\cassandra\commitlog\CommitLog-701533048437587.log to 703272597990002 > failed > at > org.apache.cassandra.db.commitlog.CommitLogSegment.(CommitLogSegment.java:127) > at > org.apache.cassandra.db.commitlog.CommitLogSegment.recycle(CommitLogSegment.java:204) > at > org.apache.cassandra.db.commitlog.CommitLogAllocator$2.run(CommitLogAllocator.java:166) > at > org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95) > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) > at java.lang.Thread.run(Thread.java:722) > Caused by: java.io.IOException: Rename from > \var\lib\cassandra\commitlog\CommitLog-701533048437587.log to 703272597990002 > failed > at > org.apache.cassandra.db.commitlog.CommitLogSegment.(CommitLogSegment.java:105) > ... 5 more > <<<< > > Anybody else seen this before ? > > > -- > Frank Hsueh | frank.hs...@gmail.com > > > > -- > Frank Hsueh | frank.hs...@gmail.com > > > > -- > Frank Hsueh | frank.hs...@gmail.com > > > > -- > Frank Hsueh | frank.hs...@gmail.com
Re: snapshot issue
Make sure JNA is in the class path http://wiki.apache.org/cassandra/FAQ#jna Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 11/07/2012, at 9:38 PM, Adeel Akbar wrote: > Hi, > > I am trying to taking snapshot of my data but faced following error. Please > help me to resolve this issue. > > [root@cassandra1 bin]# ./nodetool -h localhost snapshot 20120711 > Exception in thread "main" java.io.IOError: java.io.IOException: Cannot run > program "ln": java.io.IOException: error=12, Cannot allocate memory >at > org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1660) >at > org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1686) >at org.apache.cassandra.db.Table.snapshot(Table.java:198) >at > org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:1393) >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >at java.lang.reflect.Method.invoke(Method.java:616) >at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111) >at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45) >at > com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226) >at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) >at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251) >at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857) >at > com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795) >at > javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1450) >at > javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90) >at > javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285) >at > javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383) >at > javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807) >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >at java.lang.reflect.Method.invoke(Method.java:616) >at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) >at sun.rmi.transport.Transport$1.run(Transport.java:177) >at java.security.AccessController.doPrivileged(Native Method) >at sun.rmi.transport.Transport.serviceCall(Transport.java:173) >at > sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553) >at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808) >at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667) >at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) >at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) >at java.lang.Thread.run(Thread.java:636) > Caused by: java.io.IOException: Cannot run program "ln": java.io.IOException: > error=12, Cannot allocate memory >at java.lang.ProcessBuilder.start(ProcessBuilder.java:475) >at > org.apache.cassandra.utils.CLibrary.createHardLinkWithExec(CLibrary.java:181) >at > 
org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:147) >at > org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:730) >at > org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1653) >... 33 more > Caused by: java.io.IOException: java.io.IOException: error=12, Cannot > allocate memory >at java.lang.UNIXProcess.(UNIXProcess.java:164) >at java.lang.ProcessImpl.start(ProcessImpl.java:81) >at java.lang.ProcessBuilder.start(ProcessBuilder.java:468) >... 37 more > > -- > > > Thanks & Regards > > *Adeel**Akbar* > > >
Re: Connected file list in Cassandra
Can pages appear in many documents ? If not, try this: Document CF: the row_key is the doc_id, the column name is the page_number (so columns sort in page order) and the column value is the page_id, which is the row key for the Page CF below. Page CF: the row_key is the page_id, with columns doc_id and page_data. If you know the page_id, read the doc_id from the Page CF, then iterate over that document's row in the Document CF and read each page from the Page CF. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 12/07/2012, at 7:47 AM, David Brosius wrote: > > why not just hold the pages as different columns in the same row? columns are > automatically sorted such that if the column name was associated with the > page number it would automatically flow the way you wanted. > > - Original Message - > From: "Tomek Hankus" > Sent: Wed, July 11, 2012 14:34 > Subject: Connected file list in Cassandra > > Hi, > at the moment I'm doing research about keeping "linked/connected file list" > in Cassandra- e.g. PDF file cut into pages (multiple PDFs) where first page > is connected to second, second to third etc. > This "files connection/link" is not specified. Main goal is to be able to > get all "linked files" (the whole PDF/ all pages) while having only key to > first file (page). > > Is there any Cassandra tool/feature which could help me to do that or the > only way is to create some wrapper holding keys relations? > > > Tom H > >
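A rough Hector sketch of the read path described at the top of this mail, assuming the two column families are named Document and Page as in the outline, page numbers are longs and everything else is UTF8 (the keyspace object, serializers, names and the pageId value are illustrative only):

    // Hector 1.0.x; HFactory, HColumn, the query classes and serializers are the
    // usual me.prettyprint.* classes.
    String pageId = "page-42";   // the only key we know (illustrative)

    // 1. Given a page_id, read the doc_id column from the Page CF.
    ColumnQuery<String, String, String> docIdQuery =
        HFactory.createColumnQuery(keyspace, StringSerializer.get(),
                                   StringSerializer.get(), StringSerializer.get());
    docIdQuery.setColumnFamily("Page");
    docIdQuery.setKey(pageId);
    docIdQuery.setName("doc_id");
    String docId = docIdQuery.execute().get().getValue();

    // 2. Slice the document's row in the Document CF; column names are page
    //    numbers and values are page_ids, so they come back in page order.
    SliceQuery<String, Long, String> pagesQuery =
        HFactory.createSliceQuery(keyspace, StringSerializer.get(),
                                  LongSerializer.get(), StringSerializer.get());
    pagesQuery.setColumnFamily("Document");
    pagesQuery.setKey(docId);
    pagesQuery.setRange(null, null, false, 1000);
    for (HColumn<Long, String> page : pagesQuery.execute().get().getColumns()) {
        String linkedPageId = page.getValue();
        // 3. read the page_data column for linkedPageId from the Page CF,
        //    same pattern as step 1
    }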
Re: Composite column/key creation via Hector
You may have better luck on the Hector Mailing list… https://groups.google.com/forum/?fromgroups#!forum/hector-users Here is something I found in the docs though http://hector-client.github.com/hector/build/html/content/composite_with_templates.html Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 12/07/2012, at 9:04 AM, Michael Cherkasov wrote: > Hi all, > > What is the right way to create CF with dynamic composite column and > composite key? > > Now I use code like this: > > private static final String DEFAULT_DYNAMIC_COMPOSITE_ALIAES = > > "(a=>AsciiType,b=>BytesType,i=>IntegerType,x=>LexicalUUIDType,l=>LongType,t=>TimeUUIDType,s=>UTF8Type,u=>UUIDType,A=>AsciiType(reversed=true),B=>BytesType(reversed=true),I=>IntegerType(reversed=true),X=>LexicalUUIDType(reversed=true),L=>LongType(reversed=true),T=>TimeUUIDType(reversed=true),S=>UTF8Type(reversed=true),U=>UUIDType(reversed=true))"; > > for composite columns: > BasicColumnFamilyDefinition columnFamilyDefinition = new > BasicColumnFamilyDefinition(); > columnFamilyDefinition.setComparatorType( > ComparatorType.DYNAMICCOMPOSITETYPE ); > columnFamilyDefinition.setComparatorTypeAlias( > DEFAULT_DYNAMIC_COMPOSITE_ALIAES ); > columnFamilyDefinition.setKeyspaceName( keyspaceName ); > columnFamilyDefinition.setName( "TestCase" ); > columnFamilyDefinition.setColumnType( ColumnType.STANDARD ); > ColumnFamilyDefinition cfDefStandard = new ThriftCfDef( > columnFamilyDefinition ); > cfDefStandard.setKeyValidationClass( > ComparatorType.UTF8TYPE.getClassName() ); > cfDefStandard.setDefaultValidationClass( > ComparatorType.UTF8TYPE.getClassName() ); > > for keys: > columnFamilyDefinition = new BasicColumnFamilyDefinition(); > columnFamilyDefinition.setComparatorType( ComparatorType.UTF8TYPE ); > columnFamilyDefinition.setKeyspaceName( keyspaceName ); > columnFamilyDefinition.setName( "Parameter" ); > columnFamilyDefinition.setColumnType( ColumnType.STANDARD ); > cfDefStandard = new ThriftCfDef( columnFamilyDefinition ); > cfDefStandard.setKeyValidationClass( > ComparatorType.DYNAMICCOMPOSITETYPE.getClassName() + > DEFAULT_DYNAMIC_COMPOSITE_ALIAES ); > cfDefStandard.setDefaultValidationClass( > ComparatorType.UTF8TYPE.getClassName() ); > > Does it correct code? Do I really need so terrible > DEFAULT_DYNAMIC_COMPOSITE_ALIAES ?
Re: Concerns about Cassandra upgrade from 1.0.6 to 1.1.X
It's always a good idea to have a read of the NEWS.txt file https://github.com/apache/cassandra/blob/cassandra-1.1/NEWS.txt Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 12/07/2012, at 5:51 PM, Tyler Hobbs wrote: > On Wed, Jul 11, 2012 at 8:38 PM, Roshan wrote: > > > Currently we are using Cassandra 1.0.6 in our production system but suffer > with the CASSANDRA-3616 (it is already fixed in 1.0.7 version). > > We thought to upgrade the Cassandra to 1.1.X versions, to get it's new > features, but having some concerns about the upgrade and expert advices are > mostly welcome. > > 1. Can Cassandra 1.1.X identify 1.0.X configurations like SSTables, commit > logs, etc without ant issue? And vise versa. Because if something happens to > 1.1.X after deployed to production, we want to downgrade to 1.0.6 version > (because that's the versions we tested with our applications). > > 1.1 can handle 1.0 data/schemas/etc without a problem, but the reverse is not > necessarily true. I don't know what in particular might break if you > downgrade from 1.1 to 1.0, but in general, Cassandra does not handle > downgrading gracefully; typically the SSTable formats have changed during > major releases. If you snapshot prior to upgrading, you can always roll back > to that, but you will have lost anything written since the upgrade. > > > 2. How do we need to do upgrade process? Currently we have 3 node 1.0.6 > cluster in production. Can we upgrade node by node? If we upgrade node by > node, will the other 1.0.6 nodes identify 1.1.X nodes without any issue? > > Yes, you can do a rolling upgrade to 1.1, one node at a time. It's usually > fine to leave the cluster in a mixed state for a short while as long as you > don't do things like repairs, decommissions, or bootstraps, but I wouldn't > stay in a mixed state any longer than you have to. > > It's best to test major upgrades with a second, non-production cluster if > that's an option. > > -- > Tyler Hobbs > DataStax >
Re: How to come up with a predefined topology
> WIll it also use the > snitch/strategy info to find next 'R' replicas 'closest' to > coordinator-node ? yes. > 2. In a single DC ( with n racks and r replicas ) what algorithm The logic is here https://github.com/apache/cassandra/blob/cassandra-1.1/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java#L78 > a. n>r : I am assuming, have 1 replica in each rack. You have 1 replica in the first n racks. > b. n in each racks. int(n/r) racks will have the same number of replicas. n % r will have more. This is why multi rack replication can be tricky. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 12/07/2012, at 8:05 PM, prasenjit mukherjee wrote: > Thanks. Some follow up questions : > > 1. How do the reads use strategy/snitch information ? I am assuming > the reads can go to any of the replicas. WIll it also use the > snitch/strategy info to find next 'R' replicas 'closest' to > coordinator-node ? > > 2. In a single DC ( with n racks and r replicas ) what algorithm > cassandra uses to write its replicas in following scenarios : > a. n>r : I am assuming, have 1 replica in each rack. > b. n in each racks. > > -Thanks, > Prasenjit > > On Thu, Jul 12, 2012 at 11:24 AM, Tyler Hobbs wrote: >> I highly recommend specifying the same rack for all nodes (using >> cassandra-topology.properties) unless you really have a good reason not too >> (and you probably don't). The way that replicas are chosen when multiple >> racks are in play can be fairly confusing and lead to a data imbalance if >> you don't catch it. >> >> >> On Wed, Jul 11, 2012 at 10:53 PM, prasenjit mukherjee >> wrote: >>> >>>> As far as I know there isn't any way to use the rack name in the >>>> strategy_options for a keyspace. You >>>> might want to look at the code to dig into that, perhaps. >>> >>> Aha, I was wondering if I could do that as well ( specify rack options ) >>> :) >>> >>> Thanks for the pointer, I will dig into the code. >>> >>> -Thanks, >>> Prasenjit >>> >>> On Thu, Jul 12, 2012 at 5:33 AM, Richard Lowe >>> wrote: >>>> If you then specify the parameters for the keyspace to use these, you >>>> can control exactly which set of nodes replicas end up on. >>>> >>>> For example, in cassandra-cli: >>>> >>>> create keyspace ks1 with placement_strategy = >>>> 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options >>>> = { DC1_realtime: 2, DC1_analytics: 1, DC2_realtime: 1 }; >>>> >>>> As far as I know there isn't any way to use the rack name in the >>>> strategy_options for a keyspace. You might want to look at the code to dig >>>> into that, perhaps. >>>> >>>> Whichever snitch you use, the nodes are sorted in order of proximity to >>>> the client node. How this is determined depends on the snitch that's used >>>> but most (the ones that ship with Cassandra) will use the default ordering >>>> of same-node < same-rack < same-datacenter < different-datacenter. Each >>>> snitch has methods to tell Cassandra which rack and DC a node is in, so it >>>> always knows which node is closest. Used with the Bloom filters this can >>>> tell us where the nearest replica is. 
>>>> >>>> >>>> >>>> -Original Message- >>>> From: prasenjit mukherjee [mailto:prasen@gmail.com] >>>> Sent: 11 July 2012 06:33 >>>> To: user >>>> Subject: How to come up with a predefined topology >>>> >>>> Quoting from >>>> http://www.datastax.com/docs/0.8/cluster_architecture/replication#networktopologystrategy >>>> : >>>> >>>> "Asymmetrical replication groupings are also possible depending on your >>>> use case. For example, you may want to have three replicas per data center >>>> to serve real-time application requests, and then have a single replica in >>>> a >>>> separate data center designated to running analytics." >>>> >>>> Have 2 questions : >>>> 1. Any example how to configure a topology with 3 replicas in one DC ( >>>> with 2 in 1 rack + 1 in another rack ) and one replica in another DC ? >>>> The default networktopologystrategy with rackinferringsnitch will only >>>> give me equal distribution ( 2+2 ) >>>> >>>> 2. I am assuming the reads can go to any of the replicas. Is there a >>>> client which will send query to a node ( in cassandra ring ) which is >>>> closest to the client ? >>>> >>>> -Thanks, >>>> Prasenjit >>>> >>>> >> >> >> >> >> -- >> Tyler Hobbs >> DataStax >>
Re: Increased replication factor not evident in CLI
Do multiple nodes say the RF is 2 ? Can you show the output from the CLI ? Do show schema and show keyspace say the same thing ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 13/07/2012, at 7:39 AM, Dustin Wenz wrote: > We recently increased the replication factor of a keyspace in our cassandra > 1.1.1 cluster from 2 to 4. This was done by setting the replication factor to > 4 in cassandra-cli, and then running a repair on each node. > > Everything seems to have worked; the commands completed successfully and disk > usage increased significantly. However, if I perform a describe on the > keyspace, it still shows replication_factor:2. So, it appears that the > replication factor might be 4, but it reports as 2. I'm not entirely sure how > to confirm one or the other. > > Since then, I've stopped and restarted the cluster, and even ran an > upgradesstables on each node. The replication factor still doesn't report as > I would expect. Am I missing something here? > > - .Dustin >
Re: Concerns about Cassandra upgrade from 1.0.6 to 1.1.X
The advice Tyler gave is correct. Do a rolling upgrade, and snapshot if you want to have a rollback. My personal approach is to upgrade a node or two and let them run for a few hours. Just to avoid the situation where you upgrade every node and then discover some problem that causes wailing and gnashing of teeth. In general, node by node: * drain * snapshot * shutdown * upgrade * turn on. When they are all up I snapshot again if there is space. Then run upgradesstables. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 13/07/2012, at 11:06 AM, Roshan wrote: > Thanks Aaron. My major concern is upgrade node by node. Because currently we > are using 1.0.6 in production and plan is to upgrade singe node to 1.1.2 at > a time. > > Any comments? > > Thanks. > > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Concerns-about-Cassandra-upgrade-from-1-0-6-to-1-1-X-tp7581197p7581221.html > Sent from the cassandra-u...@incubator.apache.org mailing list archive at > Nabble.com.
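As a sketch, the per-node sequence above in nodetool terms (the host name and snapshot tag are illustrative; how you swap the binaries depends on how Cassandra was installed):

    nodetool -h node1 drain
    nodetool -h node1 snapshot -t pre-1.1.2-upgrade
    # stop the cassandra process, install the new version, start it again
    # once every node is upgraded and back up (and snapshotted again if space allows):
    nodetool -h node1 upgradesstables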
Re: High RecentWriteLatencyMicro
The write path for counters is different from the one for non-counter columns; for background see http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf_counters.pdf The write is applied on the leader *and then* replicated to the other replicas. This was controlled by a config setting called replicate_on_write which IIRC has been removed because you always want to do this. You can see this traffic in the REPLICATE_ON_WRITE thread pool. Have a look at the ROW stage and see if it is backing up. > 1) Is the whole of 7-8ms being spent in thrift overheads and > Scheduling delays ? (there is insignificant .1ms ping time between > machines) The storage proxy / jmx latency is the total latency for the coordinator after the thrift deserialisation (and before serialising the response). 7 to 8 ms sounds a little high considering the low local node latency. But it would make sense if the nodes were at peak throughput. At max throughput request latency is wait time + processing time. What happens to node local latency and cluster latency when the throughput goes down? Also this will be responsible for some of that latency… > (GC > stops threads for 100ms every 1-2 seconds, effectively pausing > cassandra 5-10% of its time, but this doesn't seem to be the reason) > 2) Do keeping a large number of CF(17 in our case) adversely affect > write performance? (except from the extreme flushing scenario) Should be fine with 17 > 3) I see a lot of threads(4,000-10,000) with names like > "pool-2-thread-*" These are connection threads. Use connection pooling or try the thread pooled connection manager, see the yaml for details. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 13/07/2012, at 3:48 PM, rohit bhatia wrote: > Hi > > As I understand that writes in cassandra are directly pushed to memory > and using counters with CL.ONE shouldn't take the read latency for > counters in account. So Writes for incrementing counters with CL.ONE > should basically be really fast. > > But in my 8 node cluster(16 core/32G ram/cassandra1.0.5/java7 each) > with RF=2, At a traffic of 55k qps = 14k increments per node/7k write > requests per node, the write latency(from jmx) increases to around 7-8 > ms from the low traffic value of 0.5ms. The Nodes aren't even pushed > with absent I/O, lots of free RAM and 30% CPU idle time/OS Load 20. > The write latency by cfstats (supposedly the latency for 1 node to > increment its counter) is a small amount (< 0.05ms). > > 1) Is the whole of 7-8ms being spent in thrift overheads and > Scheduling delays ? (there is insignificant .1ms ping time between > machines) > > 2) Do keeping a large number of CF(17 in our case) adversely affect > write performance? (except from the extreme flushing scenario) > > 3) I see a lot of threads(4,000-10,000) with names like > "pool-2-thread-*" (pointed out as client-connection-threads on the > mailing list before) periodically forming up. but with idle cpu time > and zero pending tasks in tpstats, why do requests keep piling up (GC > stops threads for 100ms every 1-2 seconds, effectively pausing > cassandra 5-10% of its time, but this doesn't seem to be the reason) > > Thanks > Rohit
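For reference, the ROW stage shows up in nodetool tpstats, so a quick way to check whether counter replication is backing up is (host is illustrative):

    nodetool -h localhost tpstats
    # look at the ReplicateOnWriteStage row: a steadily growing Pending count
    # means replicate on write cannot keep up with the incoming counter writes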
Re: How to come up with a predefined topology
> Is the above understanding correct ? yes, sorry. Cheers ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 13/07/2012, at 4:24 PM, prasenjit mukherjee wrote: > On Fri, Jul 13, 2012 at 4:04 AM, aaron morton wrote: >> The logic is here >> https://github.com/apache/cassandra/blob/cassandra-1.1/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java#L78 > > Thanks Aaron for pointing to the code. > >> >> a. n>r : I am assuming, have 1 replica in each rack. >> >> You have 1 replica in the first n racks. >> >> b. n> in each racks. >> >> int(n/r) racks will have the same number of replicas. n % r will have more. > > Did you mean r%n ( since r>n) ? > > Shouldn't the logic be : all racks will have at least int(r/n) and r%n > will have 1 additional replica ? > > Sample use case ( r = 8, n = 3 ) > n1 : 3 ( 2+1 ) > n2: 3 ( 2+1 ) > n3: 2 > > Is the above understanding correct ? > > -Thanks, > Prasenjit > >> >> This is why multi rack replication can be tricky. >> >> Hope that helps. >> >> >> - >> Aaron Morton >> Freelance Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 12/07/2012, at 8:05 PM, prasenjit mukherjee wrote: >> >> Thanks. Some follow up questions : >> >> 1. How do the reads use strategy/snitch information ? I am assuming >> the reads can go to any of the replicas. WIll it also use the >> snitch/strategy info to find next 'R' replicas 'closest' to >> coordinator-node ? >> >> 2. In a single DC ( with n racks and r replicas ) what algorithm >> cassandra uses to write its replicas in following scenarios : >> a. n>r : I am assuming, have 1 replica in each rack. >> b. n> in each racks. >> >> -Thanks, >> Prasenjit >> >> On Thu, Jul 12, 2012 at 11:24 AM, Tyler Hobbs wrote: >> >> I highly recommend specifying the same rack for all nodes (using >> >> cassandra-topology.properties) unless you really have a good reason not too >> >> (and you probably don't). The way that replicas are chosen when multiple >> >> racks are in play can be fairly confusing and lead to a data imbalance if >> >> you don't catch it. >> >> >> >> On Wed, Jul 11, 2012 at 10:53 PM, prasenjit mukherjee >> >> wrote: >> >> >> As far as I know there isn't any way to use the rack name in the >> >> strategy_options for a keyspace. You >> >> might want to look at the code to dig into that, perhaps. >> >> >> Aha, I was wondering if I could do that as well ( specify rack options ) >> >> :) >> >> >> Thanks for the pointer, I will dig into the code. >> >> >> -Thanks, >> >> Prasenjit >> >> >> On Thu, Jul 12, 2012 at 5:33 AM, Richard Lowe >> >> wrote: >> >> If you then specify the parameters for the keyspace to use these, you >> >> can control exactly which set of nodes replicas end up on. >> >> >> For example, in cassandra-cli: >> >> >> create keyspace ks1 with placement_strategy = >> >> 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options >> >> = { DC1_realtime: 2, DC1_analytics: 1, DC2_realtime: 1 }; >> >> >> As far as I know there isn't any way to use the rack name in the >> >> strategy_options for a keyspace. You might want to look at the code to dig >> >> into that, perhaps. >> >> >> Whichever snitch you use, the nodes are sorted in order of proximity to >> >> the client node. How this is determined depends on the snitch that's used >> >> but most (the ones that ship with Cassandra) will use the default ordering >> >> of same-node < same-rack < same-datacenter < different-datacenter. 
Each >> >> snitch has methods to tell Cassandra which rack and DC a node is in, so it >> >> always knows which node is closest. Used with the Bloom filters this can >> >> tell us where the nearest replica is. >> >> >> >> >> -Original Message- >> >> From: prasenjit mukherjee [mailto:prasen@gmail.com] >> >> Sent: 11 July 2012 06:33 >> >> To: user >> >> Subject: How to come up with a predefined topology >> >> >> Quoting from &g
Re: Never ending manual repair after adding second DC
> Now, pretty much every single scenario points towards connectivity > problem, however we also have few PostgreSQL replication streams In the before time someone had problems with a switch/router that was dropping persistent but idle connections. Doubt this applies, and it would probably result in an error, just throwing it out there. Have you combed through the logs logging for errors or warnings ? I would repair a single small CF with -pr and watch closely. Consider setting DEBUG logging (you can do it via JMX) org.apache.cassandra.service.AntiEntropyService <- the class the manages repair org.apache.cassandra.streaming <- package that handles streaming There was a fix to repair in 1.0.11 but that has to do with streaming https://github.com/apache/cassandra/blob/cassandra-1.0/CHANGES.txt#L5 Good luck. ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 13/07/2012, at 10:16 PM, Bart Swedrowski wrote: > Hello everyone, > > I'm facing quite weird problem with Cassandra since we've added > secondary DC to our cluster and have totally ran out of ideas; this > email is a call for help/advice! > > History looks like: > - we used to have 4 nodes in a single DC > - running Cassandra 0.8.7 > - RF:3 > - around 50GB of data on each node > - randomPartitioner and SimpleSnitch > > All was working fine for over 9 months. Few weeks ago we decided we > want to add another 4 nodes in a second DC and join them to the > cluster. Prior doing that, we upgraded Cassandra to 1.0.9 to push it > out of the doors before the multi-DC work. After upgrade, we left it > working for over a week and it was all good; no issues. > > Then, we added 4 additional nodes in another DC bringing the cluster > to 8 nodes in total, spreading across two DCs, so now we've: > - 8 nodes across 2 DCs, 4 in each DC > - 100Mbps low-latency connection (sub 5ms) running over Cisco ASA > Site-to-Site VPN (which is ikev1 based) > - DC1:3,DC2:3 RFs > - randomPartitioner and using PropertyFileSnitch now > > nodetool ring looks as follows: > $ nodetool -h localhost ring > Address DC RackStatus State Load > OwnsToken > >148873535527910577765226390751398592512 > 192.168.81.2DC1 RC1 Up Normal 37.9 GB > 12.50% 0 > 192.168.81.3DC1 RC1 Up Normal 35.32 GB > 12.50% 21267647932558653966460912964485513216 > 192.168.81.4DC1 RC1 Up Normal 39.51 GB > 12.50% 42535295865117307932921825928971026432 > 192.168.81.5DC1 RC1 Up Normal 19.42 GB > 12.50% 63802943797675961899382738893456539648 > 192.168.94.178 DC2 RC1 Up Normal 40.72 GB > 12.50% 85070591730234615865843651857942052864 > 192.168.94.179 DC2 RC1 Up Normal 30.42 GB > 12.50% 106338239662793269832304564822427566080 > 192.168.94.180 DC2 RC1 Up Normal 30.94 GB > 12.50% 127605887595351923798765477786913079296 > 192.168.94.181 DC2 RC1 Up Normal 12.75 GB > 12.50% 148873535527910577765226390751398592512 > > (please ignore the fact that nodes are not interleaved; they should be > however there's been hiccup during the implementation phase. Unless > *this* is the problem!) > > Now, the problem: over 7 out of 10 manual repairs are not being > finished. They usually get stuck and show 3 different sympoms: > > 1). Say node 192.168.81.2 runs manual repair, it requests merkle > trees from 192.168.81.2, 192.168.81.3, 192.168.81.5, 192.168.94.178, > 192.168.94.179, 192.168.94.181. It receives them from 192.168.81.2, > 192.168.81.3, 192.168.81.5, 192.168.94.178, 192.168.94.179 but not > from 192.168.94.181. 
192.168.94.181 logs are saying that it has sent > the merkle tree back but it's never received by 192.168.81.2. > 2). Say node 192.168.81.2 runs manual repair, it requests merkle > trees from 192.168.81.2, 192.168.81.3, 192.168.81.5, 192.168.94.178, > 192.168.94.179, 192.168.94.181. It receives them from 192.168.81.2, > 192.168.81.3, 192.168.81.5, 192.168.94.178, 192.168.94.179 but not > from 192.168.94.181. 192.168.94.181 logs are not saying *anything* > about merkle tree being sent. Also compactionstats are not even > saying anything about them being validated (generated) > 3). Merkle trees are being delivered, and nodes are sending data > across to sync theirselves. On a certain occasions, they'll get > "stuck" streaming files between each other at 100% and won't move > forward. Now the interesting bit is, the ones that are getting stuck > are always placed in diffe
Re: bootstrapping problem. 1.1.2 version
Check net stats a few times to look for progress, if there is none take a look at the logs on both sides for errors. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 14/07/2012, at 10:53 PM, Michael Cherkasov wrote: > Hi all, > > I have only one node and trying to add new DC with one node too. > > So I do all steps according this instruction > http://www.datastax.com/docs/1.0/operations/cluster_management#adding-nodes-to-a-cluster > > But looks like nothing happens: > > D:\db\apache-cassandra-1.1.2\bin>nodetool netstats > Starting NodeTool > Mode: JOINING > Not sending any streams. > Streaming from: /192.168.33.118 >DevKS: \home\user\cassandra-data\data\DevKS\Test\DevKS-Test-hd-12-Data.db > sections=1 progress=0/369 - 0% >DevKS: > \home\user\cassandra-data\data\DevKS\TestCase\DevKS-TestCase-hd-11-Data.db > sections=1 progress=0/7255721 - 0% >DevKS: > \home\user\cassandra-data\data\DevKS\Parameter\DevKS-Parameter-hd-5-Data.db > sections=1 progress=0/113 - 0% >DevKS: > \home\user\cassandra-data\data\DevKS\Parameter\DevKS-Parameter-hd-6-Data.db > sections=1 progress=0/601578 - 0% >DevKS: \home\user\cassandra-data\data\DevKS\Test\DevKS-Test-hd-13-Data.db > sections=1 progress=0/5138 - 0% >DevKS: > \home\user\cassandra-data\data\DevKS\Parameter\DevKS-Parameter-hd-4-Data.db > sections=1 progress=0/4049601 - 0% >DevKS: \home\user\cassandra-data\data\DevKS\Test\DevKS-Test-hd-14-Data.db > sections=1 progress=0/4481977 - 0% > Pool NameActive Pending Completed > Commandsn/a 0 4 > Responses n/a 0 3030 > > > there's really good connection between DCs, I'm pretty sure that there's no > problem with connection. > So what can be wrong there? > > Also there was one more DC before, which was removed by 'removetoken' > command. for new DC I reused the same DC name.
Re: Cassandra occupy over 80% CPU when take a compaction
Are you able to put together a test case, maybe using the stress testing tool, that models your data layout? If so can you add it to https://issues.apache.org/jira/browse/CASSANDRA-3592 Thanks - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 16/07/2012, at 8:17 PM, 黄荣桢 wrote: > Hello, > > I find the compaction of my secondary index takes a long time and occupy a > lot of CPU. > > INFO [CompactionExecutor:8] 2012-07-16 12:03:16,408 CompactionTask.java > (line 213) Compacted to [XXX]. 71,018,346 to 9,020 (~0% of original) bytes > for 3 keys at 0.22MB/s. Time: 397,602ms. > > The stack of this over load Thread is: > "CompactionReducer:5" - Thread t@1073 >java.lang.Thread.State: RUNNABLE > at java.util.AbstractList$Itr.remove(AbstractList.java:360) > at > org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(ColumnFamilyStore.java:851) > at > org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:835) > at > org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:826) > at > org.apache.cassandra.db.compaction.PrecompactedRow.removeDeletedAndOldShards(PrecompactedRow.java:77) > at > org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer$MergeTask.call(ParallelCompactionIterable.java:224) > at > org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer$MergeTask.call(ParallelCompactionIterable.java:198) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > >Locked ownable synchronizers: > - locked <4be5863d> (a > java.util.concurrent.locks.ReentrantLock$NonfairSync) > > I guess this problem due to huge amount of columns in my index. The column > which is indexed only have 3 kinds of values, and one possible value have > several million of record, so this index have several million columns. > Compact these columns take a long time. > > I find a similar issue on the jira: > https://issues.apache.org/jira/browse/CASSANDRA-3592 > > Is there any way to work around this issue? Is there any way to improve the > efficiency to compact this index? >
Re: Enable CQL3 from Astyanax
Can you provide an example where you add data, run a CQL statement in cqlsh that does not work and maybe list the data in the CLI. cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 16/07/2012, at 8:25 PM, Thierry Templier wrote: > Hello Aaron, > > I try to simulate a composition relationship within a single column family / > table (for example, an entity and its fields). I dynamically add columns the > contained elements. > > Let's take an example. Here is my table definition with CQL 3: > > CREATE TABLE "Entity" ( >"id" varchar, >"name" varchar, >PRIMARY KEY ("id") > ); > > If I want to store an entity with its two fields, I'll have the following > fields: > > id: "myentityid" > name: "myentityname" > fields.0.id: "myfield1id" > fields.0.name: "myfield1name" > fields.1.id: "myfield2id" > fields.1.name: "myfield2name" > > When accessing Cassandra data through Astyanax, I get all fields on a "load" > operation but not from a CQL3 request. > > Thanks very much for your help. > Thierry > >> Can you provide an example ? >> >> select * should return all the columns from the CF. >> >> Cheers
Re: bootstrapping problem. 1.1.2 version
> DC located in different environments one on Win7 other on Linux. Running different operating systems is not supported. Cheers ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/07/2012, at 12:30 AM, Michael Cherkasov wrote: > I found this error: > > ERROR [Streaming to /192.168.36.25:10] 2012-07-16 16:26:25,206 > AbstractCassandraDaemon.java (line 134) Exception in thread Thread[Streaming > to /192.168.36.25:10,5,main] > java.lang.IllegalStateException: target reports current file is > \home\user\cassandra-data\data\DevKS\TestCase\DevKS-TestCase-hd-12-Data.db > but is > /home/user/cassandra-data/data/DevKS/TestCase/DevKS-TestCase-hd-12-Data.db > at > org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:174) > at > org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:59) > at > org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:208) > at > org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:181) > at > org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:94) > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:636) > > But I have no idea how to fix this. Also as you notice there's problem with > folder separator. DC located in different environments one on Win7 other on > Linux. > > 2012/7/16 aaron morton > Check net stats a few times to look for progress, if there is none take a > look at the logs on both sides for errors. > > Hope that helps. > > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 14/07/2012, at 10:53 PM, Michael Cherkasov wrote: > >> Hi all, >> >> I have only one node and trying to add new DC with one node too. >> >> So I do all steps according this instruction >> http://www.datastax.com/docs/1.0/operations/cluster_management#adding-nodes-to-a-cluster >> >> But looks like nothing happens: >> >> D:\db\apache-cassandra-1.1.2\bin>nodetool netstats >> Starting NodeTool >> Mode: JOINING >> Not sending any streams. >> Streaming from: /192.168.33.118 >>DevKS: \home\user\cassandra-data\data\DevKS\Test\DevKS-Test-hd-12-Data.db >> sections=1 progress=0/369 - 0% >>DevKS: >> \home\user\cassandra-data\data\DevKS\TestCase\DevKS-TestCase-hd-11-Data.db >> sections=1 progress=0/7255721 - 0% >>DevKS: >> \home\user\cassandra-data\data\DevKS\Parameter\DevKS-Parameter-hd-5-Data.db >> sections=1 progress=0/113 - 0% >>DevKS: >> \home\user\cassandra-data\data\DevKS\Parameter\DevKS-Parameter-hd-6-Data.db >> sections=1 progress=0/601578 - 0% >>DevKS: \home\user\cassandra-data\data\DevKS\Test\DevKS-Test-hd-13-Data.db >> sections=1 progress=0/5138 - 0% >>DevKS: >> \home\user\cassandra-data\data\DevKS\Parameter\DevKS-Parameter-hd-4-Data.db >> sections=1 progress=0/4049601 - 0% >>DevKS: \home\user\cassandra-data\data\DevKS\Test\DevKS-Test-hd-14-Data.db >> sections=1 progress=0/4481977 - 0% >> Pool NameActive Pending Completed >> Commandsn/a 0 4 >> Responses n/a 0 3030 >> >> >> there's really good connection between DCs, I'm pretty sure that there's no >> problem with connection. >> So what can be wrong there? >> >> Also there was one more DC before, which was removed by 'removetoken' >> command. for new DC I reused the same DC name. > >
Re: Snapshot issue in Cassandra 0.8.1
> #./nodetool -h localhost snapshot cassandra_01_bkup tells cassandra to snapshot the keyspace called cassandra_01_bkup. To specify a name for the snapshot use the -t option: snapshot [keyspaces...] -t [snapshotName] - Take a snapshot of the specified keyspaces using optional name snapshotName Cheers ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/07/2012, at 4:00 AM, Adeel Akbar wrote: > Hi, > > I have created snapshot with following command; > > #./nodetool -h localhost snapshot cassandra_01_bkup > > but the problem is, the snapshot is created on snapshot folder with different > name (like 1342269988711) and I have no idea that if I used this command in > script then how I gzip snapshot with script. Please help me to resolve this > issue. > -- > > Thanks & Regards > > Adeel Akbar >
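For example, assuming the keyspace is actually called MyKeyspace (a placeholder, substitute your real keyspace name), this takes a snapshot named cassandra_01_bkup:

    ./nodetool -h localhost snapshot MyKeyspace -t cassandra_01_bkup

The snapshot directory name should then include cassandra_01_bkup, which gives the script something stable to look for and gzip.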
Re: Never ending manual repair after adding second DC
Even if it is a network error it would be good to detect it. If you can run a small repair with those log settings I can take a look at the logs if you want. Cannot promise anything but another set of eyes may help. Ping me off list if you want to send me the logs. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/07/2012, at 4:32 AM, Bill Au wrote: > I had run into the same problem before: > > http://comments.gmane.org/gmane.comp.db.cassandra.user/25334 > > I have not found any solutions yet. > > Bill > > On Mon, Jul 16, 2012 at 11:10 AM, Bart Swedrowski wrote: > > > On 16 July 2012 11:25, aaron morton wrote: > In the before time someone had problems with a switch/router that was > dropping persistent but idle connections. Doubt this applies, and it would > probably result in an error, just throwing it out there. > > Yes, been through them few times. There's literally no errors or warning at > all. And sometimes, as aforementioned, there's actually INFO that merkle > tree has been sent where the other side is not receiving it. > > Just now, I kicked off manual repair on node with IP 192.168.94.178 and just > got stuck on streaming files again. > > Node 192.168.94.179: > > Streaming from: /192.168.81.5 >Medals: /var/lib/cassandra/data/Medals/dataa-hd-1127-Data.db sections=46 > progress=0/5096 - 0% >Medals: /var/lib/cassandra/data/Medals/dataa-hd-1128-Data.db sections=244 > progress=0/1548510 - 0% >Medals: /var/lib/cassandra/data/Medals/dataa-hd-1119-Data.db sections=228 > progress=0/82859 - 0% > > Node 192.168.81.5: > > Streaming to: /192.168.94.179 >/var/lib/cassandra/data/Medals/dataa-hd-1129-Data.db sections=2 > progress=168/168 - 100% >/var/lib/cassandra/data/Medals/dataa-hd-1128-Data.db sections=244 > progress=0/1548510 - 0% >/var/lib/cassandra/data/Medals/dataa-hd-1127-Data.db sections=46 > progress=0/5096 - 0% >/var/lib/cassandra/data/Medals/dataa-hd-1119-Data.db sections=228 > progress=0/82859 - 0% > > Looks like streaming this specific SSTable hasn't finished (or been ACKed on > the other side) > >/var/lib/cassandra/data/Medals/dataa-hd-1129-Data.db sections=2 > progress=168/168 - 100% > > This morning I've tightened monitoring so now we've each node monitoring each > other with ICMP packets (20 every minute) and monitoring is silent; no issues > reported since the morning, not a single packet lost. > > I got some help from Acunu guys, first we believed we fixed the problem by > disabling bonding on the servers and blamed it for messing up stuff with > interrupts however this morning problem resurfaced. > > I can see (and Acunu says) everything is pointing to network related problem > (although I'd expect IP stack to correct simple PL) but there's no way to > back this up (unless only Cassandra related traffic is getting lost but *how* > to monitor for it???). > > Honestly, running out of ideas - further advice highly appreciated. >
Re: high i/o usage on one node
Is your client balancing between the two nodes ? Heavy writes at CL ONE could result in nodes dropping messages and having an unbalanced load. Are you sure there is nothing else running on the machines ? Just for fun have you turned off GC logging to see the impact ? Is there swapping going on ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/07/2012, at 5:12 AM, feedly team wrote: > I am having an issue where one node of a 2 node cluster seems to be using > much more I/O than the other node. the cassandra read/write requests seem to > be balanced, but iostat shows the data disk to be maxed at 100% utilization > for one machine and <50% for the other. r/s is about 3x greater on the > high i/o node. I am using a RF of 2 and consistency mode of ALL for reads and > ONE for writes (current requests are very read heavy). user CPU seems to be > fairly low and the same on both machines, but the high i/o machine shows an > os load of 34 (!) while the other machine reports 7. I ran a nodetool > compactionstats and there are no tasks pending which i assume means there is > no compaction going on, and the logs seem to be ok as well. the only > difference is that on the high i/o node, i am doing full gc logging, but > that's on a separate disk than the data. > > Another oddity is that the high i/o node shows a data size of 86GB while the > other shows 71GB. I understand there could be differences, but with a RF of 2 > I would think they would be roughly equal? > > I am using version 1.0.10. >
Re: Truncate failing with 1.0 client against 0.7 cluster
UnavailableException is a server side error, what's the full error message ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/07/2012, at 5:31 AM, Guy Incognito wrote: > i'm doing an upgrade of Cassandra 0.7 to 1.0 at the moment, and as part of > the preparation i'm upgrading to 1.0 client libraries (we use Hector 1.0-5) > prior to upgrading the cluster itself. I'm seeing some of our integration > tests against the dev 0.7 cluster fail as they get UnavailableExceptions when > trying to truncate the test column families. This is new behaviour with the > 1.0 client libraries, it doesn't happen with the 0.7 libraries. > > It seems to fail immediately, it doesn't eg wait for the 10 second RPC > timeout, it fails straight away. Anyone have any ideas as to what may be > happening? Interestingly I seem to be able to get around it if i only tell > Hector about one of the nodes (we have 4). If I give it all four then it > throws the UnavailableException.
Re: Truncate failing with 1.0 client against 0.7 cluster
Truncate requires all nodes to be available. Let us know when you have the full error. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/07/2012, at 10:04 AM, Guy Incognito wrote: > sorry i don't have the exact text right now but it's along the lines of 'not > enough replicas available to handle the requested consistency level'. i'm > requesting quorum but i've tried with one, and any and it made no difference. > > On 16/07/2012 19:30, aaron morton wrote: >> UnavailableException is a server side error, what's the full error message ? >> >> >> Cheers >> >> - >> Aaron Morton >> Freelance Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 17/07/2012, at 5:31 AM, Guy Incognito wrote: >> >>> i'm doing an upgrade of Cassandra 0.7 to 1.0 at the moment, and as part of >>> the preparation i'm upgrading to 1.0 client libraries (we use Hector 1.0-5) >>> prior to upgrading the cluster itself. I'm seeing some of our integration >>> tests against the dev 0.7 cluster fail as they get UnavailableExceptions >>> when trying to truncate the test column families. This is new behaviour >>> with the 1.0 client libraries, it doesn't happen with the 0.7 libraries. >>> >>> It seems to fail immediately, it doesn't eg wait for the 10 second RPC >>> timeout, it fails straight away. Anyone have any ideas as to what may be >>> happening? Interestingly I seem to be able to get around it if i only tell >>> Hector about one of the nodes (we have 4). If I give it all four then it >>> throws the UnavailableException. >> > >
Re: Cassandra 1.0 hangs during GC
Assuming all the memory and yaml settings default that does not sound right. The first thought would be the memory meter not counting correctly... Do you do a lot of deletes ? Do you have a lot of CF's and/or secondary indexes ? Can you see log lines about the "liveRatio" for your cf's ? I would upgrade to 1.0.10 before getting too carried away though. Cheers ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/07/2012, at 8:14 PM, Nikolay Kоvshov wrote: > > This is a cluster of 2 nodes, each having 8G of operating memory, > replicationfactor=2 > Write/read pressure is quite low and almost never exceeds 10/second > > From time to time (2-3 times in a month) I see GC activity in logs and for > this time cassandra stops responding to requests which results in a timeout > in upper-layer application. Total time of unavailability can be over 5 minues > (like in the following case) > > What can I do with that? Wiil it become much worse when my cluster grows up? > > INFO [GossipTasks:1] 2012-07-16 13:10:50,055 Gossiper.java (line 736) > InetAddress /10.220.50.9 is now dead. > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,056 GCInspector.java (line 123) > GC for ParNew: 391383 ms for 1 collections, 2025808488 used; max is 8464105472 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,067 StatusLogger.java (line 50) > Pool NameActive Pending Blocked > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,071 StatusLogger.java (line 65) > ReadStage 0 0 0 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,071 StatusLogger.java (line 65) > RequestResponseStage 0 0 0 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,072 StatusLogger.java (line 65) > ReadRepairStage 0 0 0 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,072 StatusLogger.java (line 65) > MutationStage 0 0 0 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,073 StatusLogger.java (line 65) > ReplicateOnWriteStage 0 0 0 > INFO [GossipStage:1] 2012-07-16 13:10:50,074 Gossiper.java (line 722) > InetAddress /10.220.50.9 is now UP > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,075 StatusLogger.java (line 65) > GossipStage 159 0 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,075 StatusLogger.java (line 65) > AntiEntropyStage 0 0 0 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,076 StatusLogger.java (line 65) > MigrationStage0 0 0 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,076 StatusLogger.java (line 65) > StreamStage 0 0 0 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,077 StatusLogger.java (line 65) > MemtablePostFlusher 0 0 0 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,077 StatusLogger.java (line 65) > FlushWriter 0 0 0 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,077 StatusLogger.java (line 65) > MiscStage 0 0 0 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,078 StatusLogger.java (line 65) > InternalResponseStage 0 0 0 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,078 StatusLogger.java (line 65) > HintedHandoff 0 0 0 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,079 StatusLogger.java (line 69) > CompactionManager n/a 0 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,079 StatusLogger.java (line 81) > MessagingServicen/a 0,0 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,080 StatusLogger.java (line 85) > ColumnFamilyMemtable ops,data Row cache size/cap Key cache > size/cap > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,080 StatusLogger.java (line 88) > Keyspace1.PSS 36712,343842617 0/0 > 97995/100 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,085 StatusLogger.java (line 88) > Keyspace1.Standard1 128679,162567721 0/0 > 0/100 > INFO 
[ScheduledTasks:1] 2012-07-16 13:10:50,085 StatusLogger.java (line 88) > system.NodeIdInfo 0,0 0/0 > 0/1 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,086 StatusLogger.java (line 88) > system.IndexInfo 0,0 0/0 > 0/1 > INFO [ScheduledTasks:1] 2012-07-16 13:10:50,086 StatusLogger.java (line 88) > system.LocationInfo 0,0 0/0
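A quick way to check for the liveRatio and GC lines mentioned above, assuming the default log location of /var/log/cassandra/system.log (adjust the path to match your log4j-server.properties):

    grep -i "liveRatio" /var/log/cassandra/system.log
    grep "GC for" /var/log/cassandra/system.log | tail -20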
Re: create if not exists ? create or update ?
It's not in the language as it stands. If you would like to see it, add a request to https://issues.apache.org/jira/browse/CASSANDRA and maybe help out :) Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 18/07/2012, at 1:14 AM, Илья Шипицин wrote: > Hello! > > is it possible to write a CQL statement for creation of a ColumnFamily in "create > if not exists" manner ? > or "create or update" manner ? > > Cheers, > Ilya Shipitsin
Re: Presentation, CQL 3 and Paging
> the data... basically each row will represent a country, and each column of a > particular row will represent the data of a single user. Almost. The first field in the composite primary key is the row key, the remaining fields are used to prefix the column names. So each column will be a single column value for a single user. > 1) Is there an approach to randomly getting a User? Right now I'm doing all > this in the receiving end... I retrieve all the corresponding Users and then > select a random portion of them. Assuming your CF definition includes… PRIMARY KEY (country, last_login_date, email) You will need to know all three bits of information to get a single user. You may want to also store the user details in CF that is keyed on the user id / email. > 2) Unless I'm missing something there's now way to page a query using CQL. > Some of my queries returns a high volume of data and I don't have enough RAM. > Do I have to use the Thrift API or are there other high level libraries which > I could use (I'm using Ruby so I'm looking into Twitter's Cassandra gem but > I've not found yet a way to do this with it) The only way to page is to provide a start column and get the next X columns. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 18/07/2012, at 3:54 AM, Bruno Di Pentima wrote: > Hello all! > > I'm new to Cassandra and the list. Still learning and testing basic stuff. > I've got a couple of questions which I'm hoping you could help me with. > > I'm currently using Cassandra 1.1.1 and CQL Spec 3. > > My schema consists of a table which represents a user list with with their > according properties (country, last_login_date, email, birthdate, name, etc). > I've created a composite key on the columns I always query (country and > last_login_date [plus email as the unique part of the key]). If I understand > correctly, this generates a wide row Columnfamily where the partition key is > the country and the remaining parts of the composite (last_login_date and > email) are used for clustering the data... basically each row will represent > a country, and each column of a particular row will represent the data of a > single user. > What I need querying this Columnfamily is a random list of X users from Y > country which have logged in in the last Z months. So... > 1) Is there an approach to randomly getting a User? Right now I'm doing all > this in the receiving end... I retrieve all the corresponding Users and then > select a random portion of them. > 2) Unless I'm missing something there's now way to page a query using CQL. > Some of my queries returns a high volume of data and I don't have enough RAM. > Do I have to use the Thrift API or are there other high level libraries which > I could use (I'm using Ruby so I'm looking into Twitter's Cassandra gem but > I've not found yet a way to do this with it) > > Thanks in advance. Hope I made myself understood... English is not my primary > language :) > > Best, > > -- Bruno Di Pentima > Santa Fe, Argentina > > > >
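A rough sketch of the start-column paging described above, written against the hypothetical users table from this thread (table and column names are assumptions, and the CQL 3 accepted by 1.1.1 may be stricter than this; the same idea applies when slicing columns over Thrift):

    -- first page: users in a country who logged in within the window
    SELECT last_login_date, email FROM users
      WHERE country = 'AR' AND last_login_date > '2012-04-18'
      LIMIT 100;

    -- next page: restart the slice from the last last_login_date returned,
    -- dropping any users already seen for that exact timestamp client side
    SELECT last_login_date, email FROM users
      WHERE country = 'AR' AND last_login_date >= '<last value from previous page>'
      LIMIT 100;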
Re: Cassandra Evaluation/ Benchmarking: Throughput not scaling as expected neither latency showing good numbers
I would benchmark a default installation, then start tweaking. That way you can see if your changes result in improvements. To simplify things further try using the tools/stress utility in the cassandra source distribution first. It's pretty simple to use. Add clients until you see the latency increase and tasks start to back up in nodetool tpstats. If you see it report dropped messages it is over loaded. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 18/07/2012, at 4:48 AM, Code Box wrote: > Thanks a lot for your reply guys. I was trying fsyn = batch and window =0ms > to see if the disk utilization is happening full on my drive. I checked the > numbers using iostat the numbers were around 60% and the CPU usage was also > not too high. > > Configuration of my Setup :- > > I have three m1.xlarge hosts each having 15 GB RAM and 4 CPU. It has 8 EC2 > Compute Units. > I have kept the replication factor equal to 3. The typical write size is 1 > KB. > > I tried adding different nodes each with 200 threads and the throughput got > split into two. If i do it from a single host with FSync Set to Periodic and > Window Size equal to 1000ms and using two nodes i am getting these numbers :- > > > [OVERALL], Throughput(ops/sec), 4771 > [INSERT], AverageLatency(us), 18747 > [INSERT], MinLatency(us), 1470 > [INSERT], MaxLatency(us), 446413 > [INSERT], 95thPercentileLatency(ms), 55 > [INSERT], 99thPercentileLatency(ms), 167 > > [OVERALL], Throughput(ops/sec), 4678 > [INSERT], AverageLatency(us), 22015 > [INSERT], MinLatency(us), 1439 > [INSERT], MaxLatency(us), 466149 > [INSERT], 95thPercentileLatency(ms), 62 > [INSERT], 99thPercentileLatency(ms), 171 > > Is there something i am doing wrong in cassandra Setup ?? What is the bet > Setup for Cassandra to get high throughput and good write latency numbers ? > > > > On Tue, Jul 17, 2012 at 7:02 AM, Sylvain Lebresne > wrote: > FSync = Batch and Window = 0ms is expected to give relatively crappy result. > It means C* will fsync on disk pretty much all write. This is an overly safe > setting and no database with that kind of setting will perform correctly > because you're far too much bound by the hard drive. > > If you want strong local durability, use Batch (so that C* never ack a > non-fsynced write) but keep a bigger window. And in any case, Periodic will > give you better results and provided you use a replication factor > 1, it is > good enough in 99% of the case. > > As for the exact numbers, you didn't even say what kind of instance you are > using, nor the replication factor, nor the typical size of each write, so > it's hard to tell you if it seems reasonable or not. > > As for the scalability, as horschi said, it's about adding nodes, not adding > clients. > > -- > Sylvain > > > On Tue, Jul 17, 2012 at 3:43 PM, horschi wrote: > When they say "linear scalibility" they mean "throughput scales with the > amount of machines in your cluster". > > Try adding more machines to your cluster and measure the thoughput. I'm > pretty sure you'll see linear scalibility. > > regards, > Christian > > > > On Tue, Jul 17, 2012 at 6:13 AM, Code Box wrote: > I am doing Cassandra Benchmarking using YCSB for evaluating the best > performance for my application which will be both read and write intensive. I > have set up a three cluster environment on EC2 and i am using YCSB in the > same availability region as a client. 
I have tried various combinations of > tuning cassandra parameters like FSync ( Setting to batch and periodic ), > Increasing the number of rpc_threads, increasing number of concurrent reads > and concurrent writes, write consistency one and Quorum i am not getting very > great results and also i do not see a linear graph in terms of scalability > that is if i increase the number of clients i do not see an increase in the > throughput. > > Here are some sample numbers that i got :- > > Test 1:- Write Consistency set to Quorum Write Proportion = 100%. FSync = > Batch and Window = 0ms > > Threads Throughput ( write per sec )Avg Latency (ms) > TP95(ms) TP99(ms) Min(ms) Max(ms) > > > 1021493.198 45 1.499 291 > 1004070 23.828 70 2.2 260 > 200 4151 45.96 57 130 1.71242 > 300 419764.68115422 2.09 216 > > > If you look at the numbers the number of threads do not increase the > throughput. Also the latency values are no
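For reference, a typical run of the stress utility mentioned above (the option names below are from the 1.0/1.1 era tool, check bin/stress -h for your version, and the source distribution may need an ant build first), followed by the tpstats check:

    # insert 1,000,000 rows from 100 client threads against three nodes
    tools/stress/bin/stress -d 10.0.0.1,10.0.0.2,10.0.0.3 -o insert -n 1000000 -t 100

    # read them back
    tools/stress/bin/stress -d 10.0.0.1,10.0.0.2,10.0.0.3 -o read -n 1000000 -t 100

    # watch for pending tasks and dropped messages while the test runs
    nodetool -h 10.0.0.1 tpstats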
Re: Cassandra Evaluation/ Benchmarking: Throughput not scaling as expected neither latency showing good numbers
> Three node cluster with replication factor of 3 gets me around 10 ms 100% > writes with consistency equal to ONE. The reads are really bad and they are > around 65ms. Using CL ONE in that situation, with a test that runs in a tight loop, can result in the clients overloading the cluster. Every node is a replica, so a write at CL ONE only has to wait for the local node to ACK. It will then return to the client before the remote nodes ACK, which means the client can send another request very quickly. In normal operation this may not be an issue, but load tests that run in a tight loop do not generate normal traffic. A better approach is to work at QUORUM so that network latency slows down individual client threads. Or generate the traffic using a Poisson distribution. The new load test from twitter uses that https://github.com/twitter/iago/ or you can use numpy for python. Cheers ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 18/07/2012, at 11:29 PM, Manoj Mainali wrote: > What kind of client are you using in YCSB? If you want to improve latency, try > distributing the requests among nodes instead of stressing a single node, try > host connection pooling instead of creating connection for each request. > Check high level clients like hector or asyantax for use if you are not > already using them. Some clients have ring aware request handling. > > You have a 3 nodes cluster and using a RF of three, that means all the node > will get the data. What CL are you using for writes? Latency increases for > strong CL. > > If you want to increase throughput, try increasing the number of clients. Of > course, it doesnt mean that throughtput will always increase. My observation > was that it will increase and after certain number of clients throughput > decrease again. > > Regards, > Manoj Mainali > > > On Wednesday, July 18, 2012, Code Box wrote: > The cassandra stress tool gives me values around 2.5 milli seconds for > writing. The problem with the Cassandra Stress Tool is that it just gives the > average latency numbers and the average latency numbers that i am getting are > comparable in some cases. It is the 95 percentile and 99 percentile numbers > are the ones that are bad. So it means that the 95% of requests are really > bad and the rest 5% are really good that makes the average go down. I want to > make sure that the 95% and 99% values are in one digit milli seconds. I want > them to be single digit because i have seen people getting those numbers. > > This is my conclusion till now with all the investigations:- > > Three node cluster with replication factor of 3 gets me around 10 ms 100% > writes with consistency equal to ONE. The reads are really bad and they are > around 65ms. > > I thought that network is the issue so i moved the client on a local machine. > Client on the local machine with one node cluster gives me again good average > write latencies but the 99%ile and 95%ile are bad. I am getting around 10 ms > for write and 25 ms for read. > > Network Bandwidth between the client and server is 1 Gigabit/second. I was > able to at the max generate 25 K requests. So it could be the client is the > bottleneck. I am using YCSB. May be i should change my client to some other. > > Throughput that i got from a client at the maximum local was 35K and remote > was 17K. > > > I can try these things now:- > > Use a different client and see how much numbers i get for 99% and 95%.
I am > not sure if there is any client that gives me this detailed or i have to > write one of my own. > > Tweak some hard disk settings raid0 and xfs / ext4 and see if that helps. > > Could be a possibility that the cassandra 0.8 to 1.1 the 95% and 99% numbers > have gone down. The throughput numbers have also gone down. > > Is there any other client that i can use except the cassandra stress tool and > YCSB and what ever numbers i have got are they good ? > > > --Akshat Vig. > > > > > On Tue, Jul 17, 2012 at 9:22 PM, aaron morton wrote: > I would benchmark a default installation, then start tweaking. That way you > can see if your changes result in improvements. > > To simplify things further try using the tools/stress utility in the > cassandra source distribution first. It's pretty simple to use. > > Add clients until you see the latency increase and tasks start to back up in > nodetool tpstats. If you see it report dropped messages it is over loaded. > > Hope that helps. > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 18/07/2012, at 4:48 AM,
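A minimal sketch of the Poisson-shaped load suggested above, using numpy: exponentially distributed gaps between requests give a Poisson arrival process instead of a tight loop (send_request and the rate are placeholders for your own client call and target load):

    import time
    import numpy

    rate = 200.0  # assumed target requests per second for this client thread

    while True:
        send_request()  # placeholder for your write or read against the cluster
        # exponential inter-arrival times produce Poisson arrivals
        time.sleep(numpy.random.exponential(1.0 / rate))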
Re: Can't change replication factor in Cassandra 1.1.2
Check the logs server to see if any errors are reported. If possible can you change the logging to DEBUG and run it ? > Note that the UUID did not change, Sounds fishy. There is an issue fixed in 1.1.3 similar to this https://issues.apache.org/jira/browse/CASSANDRA-4432 But I doubt it applies here. Cheers ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 19/07/2012, at 5:27 AM, Douglas Muth wrote: > Hi folks, > > I have an interesting problem in Cassandra 1.1.2, a Google Search > wasn't much help, so I thought I'd ask here. > > Essentially, I have a "problem keyspace" in my 2-node cluster that > keeps me from changing the replication factor on a specific keyspace. > It's probably easier to show what I'm seeing in cassandra-cli: > > [default@foobar] update keyspace test1 with strategy_options = > {replication_factor:1}; > 2d5f0d16-bb4b-3d75-a084-911fe39f7629 > Waiting for schema agreement... > ... schemas agree across the cluster > [default@foobar] update keyspace test1 with strategy_options = > {replication_factor:1}; > 7745dd06-ee5d-3e74-8734-7cdc18871e67 > Waiting for schema agreement... > ... schemas agree across the cluster > > Even though keyspace "test1" had a replication_factor of 1 to start > with, each of the above UPDATE KEYSPACE commands caused a new UUID to > be generated for the schema, which I assume is normal and expected. > > Then I try it with the problem keyspace: > > [default@foobar] update keyspace foobar with strategy_options = > {replication_factor:1}; > 7745dd06-ee5d-3e74-8734-7cdc18871e67 > Waiting for schema agreement... > ... schemas agree across the cluster > > Note that the UUID did not change, and the replication_factor in the > underlying database did not change either. > > The funny thing is that foobar had a replication_factor of 1 > yesterday, then I brought my second node online and changed the > replication_factor to 2 without incident. I only ran into issues when > I tried changing it back to 1. > > I tried running "nodetool clean" on both nodes, but the problem persists. > > Any suggestions? > > Thanks, > > -- Doug > > -- > http://twitter.com/dmuth
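To get the DEBUG logging suggested above, assuming the stock log4j configuration shipped with 1.1, change the root logger in conf/log4j-server.properties and restart the node (expect a lot of output):

    # conf/log4j-server.properties
    log4j.rootLogger=DEBUG,stdout,R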
Re: Cassandra startup times
45 minutes for 90GB is high. The odd one out here is using NFS, local storage is the norm. I would look into the NFS first, low network IO and low CPU would suggest it is waiting on disk IO. The simple thing would be to try starting from local disk and see how much faster it is. Or look at the await time in iostat. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 19/07/2012, at 4:54 PM, Ben Kaehne wrote: > Good evening, > > I am interested in improving the startup time of our cassandra cluster. > > We have a 3 node cluster (replication factor of 3) in which our application > requires quorum reads and writes to function. > > Each machine is well specced with 24gig of ram, 10 cores, jna enabled etc. > > On each server our keyspace files are so far around 90 Gb (stored on NFS > although I am not seeing signs that we have much network io). This size will > grow in future. > > Our startup time for 1 server at the moment is greater then half an hour (45 > minutes to 50 minutes even) which is putting a risk factor on the resiliance > of our service. I have tried version 1.09 to latest 1.12. > > I do not see too much system utilization while starting either. > > I gazed apon an article suggesting increased speed in 1.2 although when I set > it up, it did not seem to be any faster at all (if not slower). > > I was observing what was happening during startup and I noticed (via strace), > cassandra was doing lots of 8 byte reads from: > > > /var/lib/cassandra/data/XX/YY/XXX-YYY-hc-1871-CompressionInfo.db > > /var/lib/cassandra/data/XX/YY/XXX-YYY-hc-1874-CompressionInfo.db > > Also... Is there someone I can change the 8 byte reads to something greater? > 8 byte reads across NFS is terribly inefficient (and I am guessing the cause > of our terribly slow startup times). > > Regards, > > -- > -Ben
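One way to watch the disk wait mentioned above while a node starts, assuming the sysstat iostat is installed (for an NFS mount, nfsiostat gives similar per-mount numbers if available):

    # extended device stats every 5 seconds; watch the await and %util columns
    iostat -x 5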
Re: Tripling size of a cluster
I would check for stored hints in /var/lib/cassandra/data/system Putting nodes in different racks can make placement tricky so… Are you running a multi DC setup ? Are you using the NTS ? What is the RF setting ? What setting do you have for the Snitch ? What is the full node assignments. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 19/07/2012, at 6:00 PM, Mariusz Dymarek wrote: > Hi again, > we have now moved all nodes to correct position in ring, but we can see > higher load on 2 nodes, than on other nodes: > ... > node01-05 rack1 Up Normal 244.65 GB 6,67% > 102084710076281539039012382229530463432 > node02-13 rack2 Up Normal 240.26 GB 6,67% > 107756082858297180096735292353393266961 > node01-13 rack1 Up Normal 243.75 GB 6,67% > 113427455640312821154458202477256070485 > node02-05 rack2 Up Normal 249.31 GB 6,67% > 119098828422328462212181112601118874004 > node01-14 rack1 Up Normal 244.95 GB 6,67% > 124770201204344103269904022724981677533 > node02-14 rack2 Up Normal 392.7 GB 6,67% > 130441573986359744327626932848844481058 > node01-06 rack1 Up Normal 249.3 GB 6,67% > 136112946768375385385349842972707284576 > node02-15 rack2 Up Normal 286.82 GB 6,67% > 141784319550391026443072753096570088106 > node01-15 rack1 Up Normal 245.21 GB 6,67% > 147455692332406667500795663220432891630 > node02-06 rack2 Up Normal 244.9 GB 6,67% > 153127065114422308558518573344295695148 > ... > > Node: > * node01-15 = > 286.82 GB > * node02-14 = > 392.7 GB > > average load on all other nodes is around 245 GB, nodetool cleanup command > was invoked on problematic nodes after move operation... > Why this has happen? > And how can we balance cluster? > On 06.07.2012 20:15, aaron morton wrote: >> If you have the time yes I would wait for the bootstrap to finish. It >> will make you life easier. >> >> good luck. >> >> >> - >> Aaron Morton >> Freelance Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 6/07/2012, at 7:12 PM, Mariusz Dymarek wrote: >> >>> Hi, >>> we`re in the middle of extending our cluster from 10 to 30 nodes, >>> we`re running cassandra 1.1.1... 
>>> We`ve generated initial tokens for new nodes: >>> "0": 0, # existing: node01-01 >>> "1": 5671372782015641057722910123862803524, # new: node02-07 >>> "2": 11342745564031282115445820247725607048, # new: node01-07 >>> "3": 17014118346046923173168730371588410572, # existing: node02-01 >>> "4": 22685491128062564230891640495451214097, # new: node01-08 >>> "5": 28356863910078205288614550619314017621, # new: node02-08 >>> "6": 34028236692093846346337460743176821145, # existing: node01-02 >>> "7": 39699609474109487404060370867039624669, # new: node02-09 >>> "8": 45370982256125128461783280990902428194, # new: node01-09 >>> "9": 51042355038140769519506191114765231718, # existing: node02-02 >>> "10": 56713727820156410577229101238628035242, # new: node01-10 >>> "11": 62385100602172051634952011362490838766, # new: node02-10 >>> "12": 68056473384187692692674921486353642291, # existing: node01-03 >>> "13": 7372784616620750397831610216445815, # new: node02-11 >>> "14": 79399218948218974808120741734079249339, # new: node01-11 >>> "15": 85070591730234615865843651857942052864, # existing: node02-03 >>> "16": 90741964512250256923566561981804856388, # new: node01-12 >>> "17": 96413337294265897981289472105667659912, # new: node02-12 >>> "18": 102084710076281539039012382229530463436, # existing: node01-05 >>> "19": 107756082858297180096735292353393266961, # new: node02-13 >>> "20": 113427455640312821154458202477256070485, # new: node01-13 >>> "21": 119098828422328462212181112601118874009, # existing: node02-05 >>> "22": 124770201204344103269904022724981677533, # new: node01-14 >>> "23": 130441573986359744327626932848844481058, # new: node02-14 >>> "24": 136112946768375385385349842972707284582, # existing: node01-06 >>> "25": 141784319550391026443072753096570088106, # new: node02-15 >>> "26": 147455692332406667500795663220432891630, # new: node01-15 >>> "27": 153127065114422308558518573344295695155, # existing: node02-06 >>> "28": 158798437896437
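Two quick checks related to the above, with paths and counts as assumptions: the stored hints mentioned earlier, and a re-derivation of evenly spaced RandomPartitioner tokens for a 30 node ring (the script reproduces the style of token list quoted in this thread):

    # roughly how much hinted handoff data is sitting on a node
    # (directory layout varies slightly between versions)
    du -sh /var/lib/cassandra/data/system/HintsColumnFamily*

and, in Python:

    num_nodes = 30
    for i in range(num_nodes):
        print(i * (2 ** 127 // num_nodes))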
Re: Unreachable node, not in nodetool ring
I would: * run repair on 10.58.83.109 * run cleanup on 10.59.21.241 (I assume this was the first node). It looks like 0.56.62.211 is out of the cluster. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 19/07/2012, at 9:37 PM, Alain RODRIGUEZ wrote: > Not sure if this may help : > > nodetool -h localhost gossipinfo > /10.58.83.109 > RELEASE_VERSION:1.1.2 > RACK:1b > LOAD:5.9384978406E10 > SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8 > DC:eu-west > STATUS:NORMAL,85070591730234615865843651857942052864 > RPC_ADDRESS:0.0.0.0 > /10.248.10.94 > RELEASE_VERSION:1.1.2 > LOAD:3.0128207422E10 > SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8 > STATUS:LEFT,0,1342866804032 > RPC_ADDRESS:0.0.0.0 > /10.56.62.211 > RELEASE_VERSION:1.1.2 > LOAD:11594.0 > RACK:1b > SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f > DC:eu-west > REMOVAL_COORDINATOR:REMOVER,85070591730234615865843651857942052864 > STATUS:removed,170141183460469231731687303715884105727,1342453967415 > RPC_ADDRESS:0.0.0.0 > /10.59.21.241 > RELEASE_VERSION:1.1.2 > RACK:1b > LOAD:1.08667047094E11 > SCHEMA:e7e0ec6c-616e-32e7-ae29-40eae2b82ca8 > DC:eu-west > STATUS:NORMAL,0 > RPC_ADDRESS:0.0.0.0 > > Story : > > I had 2 node cluster > > 10.248.10.94 Token 0 > 10.59.21.241 Token 85070591730234615865843651857942052864 > > Had to replace node 10.248.10.94 so I add 10.56.62.211 on token 0 - 1 > (170141183460469231731687303715884105727). This failed, I removed > token. > > I repeat the previous operation with the node 10.59.21.241 and it went > fine. Next I decommissionned the node 10.248.10.94 and moved > 10.59.21.241 to the token 0. > > Now I am on the situation described before. > > Alain > > > 2012/7/19 Alain RODRIGUEZ : >> Hi, I wasn't able to see the token used currently by the 10.56.62.211 >> (ghost node). >> >> I already removed the token 6 days ago : >> >> -> "Removing token 170141183460469231731687303715884105727 for /10.56.62.211" >> >> "- check in cassandra log. It is possible you see a log line telling >> you 10.56.62.211 and 10.59.21.241 o 10.58.83.109 share the same >> token" >> >> Nothing like that in the logs >> >> I tried the following without success : >> >> $ nodetool -h localhost removetoken 170141183460469231731687303715884105727 >> Exception in thread "main" java.lang.UnsupportedOperationException: >> Token not found. >> ... >> >> I really thought this was going to work :-). >> >> Any other ideas ? >> >> Alain >> >> PS : I heard that Octo is a nice company and you use Cassandra so I >> guess you're fine in there :-). I wish you the best thanks for your >> help. >> >> 2012/7/19 Olivier Mallassi : >>> I got that a couple of time (due to DNS issues in our infra) >>> >>> what you could try >>> - check in cassandra log. It is possible you see a log line telling you >>> 10.56.62.211 and 10.59.21.241 o 10.58.83.109 share the same token >>> - if 10.56.62.211 is up, try decommission (via nodetool) >>> - if not, move 10.59.21.241 or 10.58.83.109 to current token + 1 >>> - use removetoken (via nodetool) to remove the token associated with >>> 10.56.62.211. in case of failure, you can use removetoken -f instead. >>> >>> then, the unreachable IP should have disappeared. >>> >>> >>> HTH >>> >>> On Thu, Jul 19, 2012 at 10:38 AM, Alain RODRIGUEZ >>> wrote: >>>> >>>> Hi, >>>> >>>> I tried to add a node a few days ago and it failed. 
I finally made it >>>> work with an other node but now when I describe cluster on cli I got >>>> this : >>>> >>>> Cluster Information: >>>> Snitch: org.apache.cassandra.locator.Ec2Snitch >>>> Partitioner: org.apache.cassandra.dht.RandomPartitioner >>>> Schema versions: >>>> UNREACHABLE: [10.56.62.211] >>>> e7e0ec6c-616e-32e7-ae29-40eae2b82ca8: [10.59.21.241, 10.58.83.109] >>>> >>>> And nodetool ring gives me : >>>> >>>> Address DC RackStatus State Load >>>> OwnsToken >>>> >>>>85070591730234615865843651857942052864 >>>> 10.59.21.241eu-west 1b Up
Re: Counters values are less than expected [1.0.6 - Java/Pelops]
Nothing jumps out, can you reproduce the problem ? If you can repro it, let us know along with the RF / CL. Good luck. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 20/07/2012, at 1:07 AM, cbert...@libero.it wrote: > Hi all, I have a problem with counters I'd like to solve before going in > production. > When a user write a comment in my platform I increase a counter (there is a > counter for each user) and I write a new column in the user specific row. > Everything worked fine but yesterday I noticed that the column count of the > Row was different from the counters value ... > > In my test environment the user had 7 comments, so 7 columns and 7 as value > of > his countercolumn. > I wrote 3 comments in few minutes, the counter value was still 7, the columns > number was 10! > Counters and columns are written in the same operation. I've checked for my > application log but all was normal. > I wrote one more comment today to check and now counter is 8 and column > number > is 11 . > > I'm trying to get permissions to read the cassandra log (no comment) but in > the meanwhile I'd like to know if anyone faced problems like this one ... > I've > read that sometimes people had counters bigger than expected due to client > retry of succesful operation marked as failed ... > > I will post log results ... thanks for any help > > Regards, > Carlo > >
Re: random partitioner and key scan
Repair and token moves work on ranges of Tokens, not row keys. These operations need to scan through all the rows in the token range. Ordering the rows by row key locally would mean that every row on the node would have to be scanned to find the ones whose token was in the required token range. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 20/07/2012, at 4:46 AM, prasenjit mukherjee wrote: > When a request for token change is issued ( via nodetool ) then on > what basis a node will move some of its rows to other node, as there > will be no way to scan rows based on MD5 hash in a given node ( if the > keys are not prefixed with MD5 hash ) > > On Thu, Jul 19, 2012 at 1:43 PM, Patrik Modesto > wrote: >> Hi Prasenjit, >> >> I don't see the need to recalculate anything. One key has a one MD5 >> hash, it doesn't change. Just use the hash to select a node, than just >> the plain key. Can you elaborate on the redistribution please? >> >> Regards, >> P. >> >> On Thu, Jul 19, 2012 at 9:09 AM, prasenjit mukherjee >> wrote: >>> The probem could be while redistributing the tokens. In that case the >>> hashes has to be recalculated on each fo the candidate node. >>> >>> -Thanks, >>> Prasenjit >>> >>> On Thu, Jul 19, 2012 at 12:19 PM, Patrik Modesto >>> wrote: >>>> Hi, >>>> >>>> I know that RandomPartitioner does MD5 of a key and the MD5 is then >>>> used for key distribution AND key ordering. I was just wondering if >>>> it's possible to have RandomPartitioner just for key distribution and >>>> OrderedPartitioner just for per-node key ordering. That would solve >>>> the often requested key scan feature. >>>> >>>> Regards, >>>> Patrik
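Concretely, the RandomPartitioner derives a token from the MD5 of the row key; a rough Python illustration (this approximates the idea, it is not the production code path):

    import hashlib

    def rp_token(key):
        # roughly: the MD5 of the key folded into the 0 .. 2**127 - 1 token range
        return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16) % (2 ** 127)

    print(rp_token('key1'))

Repair and token moves walk a contiguous range of these token values, which is only cheap if the rows are stored in token order.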
Re: Seeing writes when only expecting reads
My first guess would be read repair, are you seeing any increase in ReadRepairStage tasks ? RR (in 1.X) is normally only enabled for 10% of the request. cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 20/07/2012, at 5:17 AM, jmodha wrote: > Hi, > > We've observed a some strange behaviour after finishing streaming data into > our new cluster. > > Below are the steps we're doing: > > 1 - We stream data from an old cluster (1.0.3) to a new cluster (1.1.1). > This completes with no errors. > 2 - We then run some validation scripts (some Java code which uses hector) > to ensure what we've streamed in matches the data in our old cluster. > 3 - The validation script is a read-only operation, but when observing the > WriteCount JMX property on one of the column families its continuously > increasing across all nodes! > > I have verified that we only do reads and no writes, so why am I seeing the > WriteCount JMX property increasing on all of my nodes for this one column > family only? > > Some other info.. we're reading at LOCAL_QUORUM, its a two DC setup, 6 nodes > in each DC. All of our CF's have rows that contain a mixture of TTL and > non-TTL columns. > > Any ideas as to why this is happening would be greatly appreciated! > > P.S. I've attached some graphs which show reads vs. writes on the CF in > question. > > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/file/n7581333/reads.png > > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/file/n7581333/writes.png > > > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Seeing-writes-when-only-expecting-reads-tp7581333.html > Sent from the cassandra-u...@incubator.apache.org mailing list archive at > Nabble.com.
Re: does secondary index get created(rebuilt?) every time Cassandra restarts?
> INFO [OptionalTasks:1] 2012-07-18 14:05:27,648 SecondaryIndexManager.java > (line 183) Creating new index : ColumnDefinition{name=74696d657374616d70, > validator=org.apache.cassandra.db.marshal.DateType, index_type=KEYS, > index_name='MtsTrackingData_timestamp_idx'} Is the system reading the index meta data. Do you see any INFO level messages with "Submitting index build" ? cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 20/07/2012, at 5:52 AM, Feng Qu wrote: > We have a CF with 11 secondary indexes (don't ask me why) and I noticed > restarting cassandra takes much longer time comparing to other clusters > without secondary indexes. In system.log I see 20 mins pause on building > index. > > this example shows a 12 mins gap. > INFO [SSTableBatchOpen:13] 2012-07-18 13:53:51,556 SSTableReader.java (line > 153) Opening /data/cassandra/data/mobileks/MtsTrackingData-hc-5744 > (1950145442 bytes) > INFO [SSTableBatchOpen:12] 2012-07-18 13:53:51,556 SSTableReader.java (line > 153) Opening /data/cassandra/data/mobileks/MtsTrackingData-hc-5197 > (12551211807 bytes) > INFO [OptionalTasks:1] 2012-07-18 14:05:27,648 SecondaryIndexManager.java > (line 183) Creating new index : ColumnDefinition{name=74696d657374616d70, > validator=org.apache.cassandra.db.marshal.DateType, index_type=KEYS, > index_name='MtsTrackingData_timestamp_idx'} > INFO [SSTableBatchOpen:2] 2012-07-18 14:05:27,673 SSTableReader.java (line > 153) Opening > /data/cassandra/data/mobileks/MtsTrackingData.MtsTrackingData_timestamp_idx-hc-4354 > (64493843 bytes) > INFO [SSTableBatchOpen:1] 2012-07-18 14:05:27,673 SSTableReader.java (line > 153) Opening > /data/cassandra/data/mobileks/MtsTrackingData.MtsTrackingData_timestamp_idx-hc-4344 > (258674041 bytes) > INFO [SSTableBatchOpen:5] 2012-07-18 14:05:27,673 SSTableReader.java (line > 153) Opening > /data/cassandra/data/mobileks/MtsTrackingData.MtsTrackingData_timestamp_idx-hc-1826 > (3397211685 bytes) > > Is this by design? Why it has to be created during start up? > > Feng Qu
Re: Replication factor - Consistency Questions
> But isn't QUORUM on a 2-node cluster still 2 nodes? Yes. 3 is where you start to get some redundancy - http://thelastpickle.com/2011/06/13/Down-For-Me/ Cheers ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 20/07/2012, at 10:24 AM, Kirk True wrote: > But isn't QUORUM on a 2-node cluster still 2 nodes? > > On 07/17/2012 11:50 PM, Jason Tang wrote: >> Yes, for ALL, it is not good for HA, and because we meet problem when use >> QUORAM, and current solution is switch Write:QUORAM / Read:QUORAM when got >> "UnavailableException" exception. >> >> 2012/7/18 Jay Parashar >> Thanks..but write ALL will fail for any downed nodes. I am thinking of >> QUORAM. >> >> >> From: Jason Tang [mailto:ares.t...@gmail.com] >> Sent: Tuesday, July 17, 2012 8:24 PM >> To: user@cassandra.apache.org >> Subject: Re: Replication factor - Consistency Questions >> >> >> Hi >> >> >> I am starting using Cassandra for not a long time, and also have problems in >> consistency. >> >> >> Here is some thinking. >> >> If you have Write:Any / Read:One, it will have consistency problem, and if >> you want to repair, check your schema, and check the parameter "Read repair >> chance: " >> >> http://wiki.apache.org/cassandra/StorageConfiguration >> >> >> And if you want to get consistency result, my suggestion is to have >> Write:ALL / Read:One, since for Cassandra, write is more faster then read. >> >> >> For performance impact, you need to test your traffic, and if your memory >> can not cache all your data, or your network is not fast enough, then yes, >> it will impact to write one more node. >> >> >> BRs >> >> >> 2012/7/18 Jay Parashar >> >> Hello all, >> >> There is a lot of material on Replication factor and Consistency level but I >> am a little confused by what is happening on my setup. (Cassandra 1.1.2). I >> would appreciate any answers. >> >> My Setup: A cluster of 2 nodes evenly balanced. My RF =2, Consistency Level; >> Write = ANY and Read = 1 >> >> I know that my consistency is Weak but since my RF = 2, I thought data would >> be just duplicated in both the nodes but sometimes, querying does not give >> me the correct (or gives partial) results. In other times, it gives me the >> right results >> Is the Read Repair going on after the first query? But as RF = 2, data is >> duplicated then why the repair? >> Note: My query is done a while after the Writes so data should have been in >> both the nodes. Or is this not the case (flushing not happening etc)? >> >> I am thinking of making the Write as 1 and Read as QUORAM so R + W > RF (1 + >> 2 > 2) to give strong consistency. Will that affect performance a lot >> (generally speaking)? >> >> Thanks in advance >> Regards >> >> Jay >> >> >> >> >
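The arithmetic behind that, for anyone following along (QUORUM = floor(RF / 2) + 1, and R + W > RF is the usual strong-consistency test):

    RF = 2:  QUORUM = floor(2/2) + 1 = 2  ->  QUORUM is the same as ALL; one node down blocks quorum reads and writes
    RF = 3:  QUORUM = floor(3/2) + 1 = 2  ->  W = 2 and R = 2 gives 2 + 2 > 3, and one node can be down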
Re: Batch update efficiency with composite key
I'm assuming the logical row is in a CQL 3 CF with composite PRIMARYKEY http://www.datastax.com/dev/blog/whats-new-in-cql-3-0 It will still be a no look write. The exception being secondary indexes and counters which include reads in the write path. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 20/07/2012, at 10:26 AM, Kirk True wrote: > In Cassandra you don't read-then-write updates, you just write the updates. > > Sorry for being dense, but can you clarify a logical vs. physical row? > > Batching is useful for reducing round trips to the server. > > On 07/18/2012 06:18 AM, Leonid Ilyevsky wrote: >> I have a question about efficiency of updates to a CF with composite key. >> >> Let say I have 100 of logical rows to update, and they all belong to the >> same physical wide row. In my naïve understanding (correct me if I am >> wrong), in order to update a logical row, Cassandra has to retrieve the >> whole physical row, add columns to it, and put it back. So I put all my 100 >> updates in a batch and send it over. Would Cassandra be smart enough to >> recognize that they all belong to one physical row, retrieve it once, do all >> the updates and put it back once? Is my batch thing even relevant in this >> case? What happens if I just send updates one by one? >> >> I want to understand why I should use batches. I don’t really care about one >> timestamp for all records, I only care about efficiency. So I thought, I >> want to at least save on the number of remote calls, but I also wonder what >> happens on Cassandra side. >> >> >> This email, along with any attachments, is confidential and may be legally >> privileged or otherwise protected from disclosure. Any unauthorized >> dissemination, copying or use of the contents of this email is strictly >> prohibited and may be in violation of law. If you are not the intended >> recipient, any disclosure, copying, forwarding or distribution of this email >> is strictly prohibited and this email and any attachments should be deleted >> immediately. This email and any attachments do not constitute an offer to >> sell or a solicitation of an offer to purchase any interest in any >> investment vehicle sponsored by Moon Capital Management LP (“Moon Capital”). >> Moon Capital does not provide legal, accounting or tax advice. Any statement >> regarding legal, accounting or tax matters was not intended or written to be >> relied upon by any person as advice. Moon Capital does not waive >> confidentiality or privilege as a result of this email. >
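For what it's worth, a sketch of the batch shape being discussed, against a made-up CQL 3 table whose rows share one partition (the batch only saves round trips; each statement is still an append on the server, as described above):

    BEGIN BATCH
      UPDATE prices SET value = 42.1 WHERE series = 'abc' AND seq = 1;
      UPDATE prices SET value = 42.3 WHERE series = 'abc' AND seq = 2;
      UPDATE prices SET value = 42.7 WHERE series = 'abc' AND seq = 3;
    APPLY BATCH;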
Re: 0.8 --> 1.1 Upgrade: Any Issues?
>> Can a rolling upgrade be done or is it all-or-nothing? > Rolling upgrade, take a look at news…. https://github.com/apache/cassandra/blob/cassandra-1.1/NEWS.txt (my personal approach is to test in dev, and upgrade a single node for a few hours to make sure everything is ok) Cheers ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 20/07/2012, at 4:10 PM, Peter Schuller wrote: >> We currently have a 0.8 production cluster that I would like to upgrade to >> 1.1. Are there any know compatibility or upgrade issues between 0.8 and 1.1? >> Can a rolling upgrade be done or is it all-or-nothing? > > If you have lots of keys: https://issues.apache.org/jira/browse/CASSANDRA-3820 > > -- > / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
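A rough per-node sequence for the rolling upgrade, with the caveat that the exact steps (in particular whether sstables need rewriting) are version specific and NEWS.txt is the authority:

    nodetool -h localhost drain             # flush memtables and stop accepting writes
    # stop the old Cassandra, install the new version, merge cassandra.yaml changes, start it
    nodetool -h localhost upgradesstables   # only if NEWS.txt says the on-disk format changed
    # watch the logs and nodetool ring, then move on to the next node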
Re: Cassandra data model help
On Thu, Aug 9, 2012 at 5:52 AM, wrote: > Hi, > I am trying to create a Cassandra schema for cluster monitoring system, where > one cluster can have multiple nodes and I am monitoring multiple matrices > from a node. My raw data schema looks like and taking values in every 5 min > interval > > matrix_name + daily time stamp as row key, composite column name of node name > and time stamp and matrix value as column value > > the problem I am facing is a node can go back and forth between the > clusters(system can have more than one clusters) so if i need monthly > statistics plotting of a cluster I have to consider the nodes that are > leaving and joining during this period of time, some node might be part of > the cluster for just 15 days and some could join the cluster last 10 day of > month, so to plot data for a particular cluster for a time interval I need to > know the nodes which were part of that cluster for that period of time, what > could be the best schema for this solution ? I have tried few ideas so far no > luck, any suggestions ? Store each node stat in it's own row. Then decide if you want to track when a node joins/leaves a cluster so you can build the aggs on the fly or just store cluster aggregates in their own row as well. If the latter, depending on your polling methodology, you may want to use counters for the cluster aggregates. Also, if you're doing 5 min intervals with each row = 1 day, then your disk space usage is going to grow pretty quickly due to per-column overhead. You didn't say what the values are that you're storing, but if they're just 64bit integers or something like that, most of your disk space is actually being used for column overhead not your data. I worked around this by creating a 2nd CF, where each row = 1 year worth of data and each column = 1 days worth of data. The values are just a vector of the 5min values from the original CF. Then I just have a cron job which reads the previous days data and builds the vectors in the new CF and then deletes the original row. By doing this, my disk space requirements (before replication) went from over 1.1TB/year to 305GB/year. -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin "carpe diem quam minimum credula postero"
Re: Cassandra data model help
You need to track node membership separately. I do that in a SQL database, but you can use cassandra for that. For example: rowkey = cluster name column name Composite[ :] = [join|leave] Then every time a node joins or leaves a cluster, write an entry. Then you can just read the row (ordered by epoch times) to build your list of active nodes for a given time period. Note, you can set a ending read range, but you basically have to start reading from 0. Notice that is really for figuring out which nodes are in a cluster for a given period of time. You wouldn't want to model it that way if you wanted to know which cluster(s) a single node was in over a given period of time. In that case you'd model it this way: rowkey = node name column name Composite[ :] = [join|leave] Depending on your needs, you may end up using both! On Fri, Aug 10, 2012 at 1:34 AM, wrote: > Thanks Aaron for your reply, > creating vector for raw data is good work around for decreasing disk space, > but I am not still clear tracking time for nodes, say if we want a query like > give me the list of nodes for a cluster between this period of time then how > do we get that information? do we scan through each node row as we will have > row for each node? > > thanks > > -Aaron Turner wrote: - > To: user@cassandra.apache.org > From: Aaron Turner > Date: 08/09/2012 07:38PM > Subject: Re: Cassandra data model help > > On Thu, Aug 9, 2012 at 5:52 AM, wrote: >> Hi, >> I am trying to create a Cassandra schema for cluster monitoring system, >> where one cluster can have multiple nodes and I am monitoring multiple >> matrices from a node. My raw data schema looks like and taking values in >> every 5 min interval >> >> matrix_name + daily time stamp as row key, composite column name of node >> name and time stamp and matrix value as column value >> >> the problem I am facing is a node can go back and forth between the >> clusters(system can have more than one clusters) so if i need monthly >> statistics plotting of a cluster I have to consider the nodes that are >> leaving and joining during this period of time, some node might be part of >> the cluster for just 15 days and some could join the cluster last 10 day of >> month, so to plot data for a particular cluster for a time interval I need >> to know the nodes which were part of that cluster for that period of time, >> what could be the best schema for this solution ? I have tried few ideas so >> far no luck, any suggestions ? > > Store each node stat in it's own row. Then decide if you want to > track when a node joins/leaves a cluster so you can build the aggs on > the fly or just store cluster aggregates in their own row as well. If > the latter, depending on your polling methodology, you may want to use > counters for the cluster aggregates. > > Also, if you're doing 5 min intervals with each row = 1 day, then your > disk space usage is going to grow pretty quickly due to per-column > overhead. You didn't say what the values are that you're storing, > but if they're just 64bit integers or something like that, most of > your disk space is actually being used for column overhead not your > data. > > I worked around this by creating a 2nd CF, where each row = 1 year > worth of data and each column = 1 days worth of data. The values are > just a vector of the 5min values from the original CF. Then I just > have a cron job which reads the previous days data and builds the > vectors in the new CF and then deletes the original row. 
By doing > this, my disk space requirements (before replication) went from over > 1.1TB/year to 305GB/year. > > > -- > Aaron Turner > http://synfin.net/ Twitter: @synfinatic > http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & > Windows > Those who would give up essential Liberty, to purchase a little temporary > Safety, deserve neither Liberty nor Safety. > -- Benjamin Franklin > "carpe diem quam minimum credula postero" > -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin "carpe diem quam minimum credula postero"
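Spelling out the column layout described above, with the composite components read from the surrounding text (epoch timestamp plus node or cluster name; all names are illustrative):

    # cluster-centric view: which nodes were in a cluster, and when
    row key     = <cluster name>
    column name = Composite(<epoch timestamp>, <node name>)
    value       = 'join' or 'leave'

    # node-centric view: which cluster(s) a node belonged to over time
    row key     = <node name>
    column name = Composite(<epoch timestamp>, <cluster name>)
    value       = 'join' or 'leave'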
quick question about data layout on disk
Curious, but does cassandra store the rowkey along with every column/value pair on disk (pre-compaction) like Hbase does? If so (which makes the most sense), I assume that's something that is optimized during compaction? -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin "carpe diem quam minimum credula postero"
Re: quick question about data layout on disk
So how does that work? An sstable is for a single CF, but it can and likely will have multiple rows. There is no read to write and as I understand it, writes are append operations. So if you have an sstable with say 26 different rows (A-Z) already in it with a bunch of columns and you add a new column to row J, how does Cassandra store the column/value pair on disk in a way to refer to row J without re-writing the row key or some representation of it? Thanks, Aaron On Fri, Aug 10, 2012 at 7:53 PM, Terje Marthinussen wrote: > Rowkey is stored only once in any sstable file. > > That is, in the spesial case where you get sstable file per column/value, you > are correct, but normally, I guess most of us are storing more per key. > > Regards, > Terje > > On 11 Aug 2012, at 10:34, Aaron Turner wrote: > >> Curious, but does cassandra store the rowkey along with every >> column/value pair on disk (pre-compaction) like Hbase does? If so >> (which makes the most sense), I assume that's something that is >> optimized during compaction? >> >> >> -- >> Aaron Turner >> http://synfin.net/ Twitter: @synfinatic >> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & >> Windows >> Those who would give up essential Liberty, to purchase a little temporary >> Safety, deserve neither Liberty nor Safety. >>-- Benjamin Franklin >> "carpe diem quam minimum credula postero" -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin "carpe diem quam minimum credula postero"
Re: quick question about data layout on disk
Thanks Russell, that's the info I was looking for! On Sat, Aug 11, 2012 at 11:23 AM, Russell Haering wrote: > Your update doesn't go directly to an sstable (which are immutable), > it is first merged to an in-memory table. Eventually the memtable is > flushed to a new sstable. > > See http://wiki.apache.org/cassandra/MemtableSSTable > > On Sat, Aug 11, 2012 at 11:03 AM, Aaron Turner wrote: >> So how does that work? An sstable is for a single CF, but it can and >> likely will have multiple rows. There is no read to write and as I >> understand it, writes are append operations. >> >> So if you have an sstable with say 26 different rows (A-Z) already in >> it with a bunch of columns and you add a new column to row J, how does >> Cassandra store the column/value pair on disk in a way to refer to row >> J without re-writing the row key or some representation of it? >> >> Thanks, >> Aaron >> >> On Fri, Aug 10, 2012 at 7:53 PM, Terje Marthinussen >> wrote: >>> Rowkey is stored only once in any sstable file. >>> >>> That is, in the spesial case where you get sstable file per column/value, >>> you are correct, but normally, I guess most of us are storing more per key. >>> >>> Regards, >>> Terje >>> >>> On 11 Aug 2012, at 10:34, Aaron Turner wrote: >>> >>>> Curious, but does cassandra store the rowkey along with every >>>> column/value pair on disk (pre-compaction) like Hbase does? If so >>>> (which makes the most sense), I assume that's something that is >>>> optimized during compaction? >>>> >>>> >>>> -- >>>> Aaron Turner >>>> http://synfin.net/ Twitter: @synfinatic >>>> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & >>>> Windows >>>> Those who would give up essential Liberty, to purchase a little temporary >>>> Safety, deserve neither Liberty nor Safety. >>>>-- Benjamin Franklin >>>> "carpe diem quam minimum credula postero" >> >> >> >> -- >> Aaron Turner >> http://synfin.net/ Twitter: @synfinatic >> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & >> Windows >> Those who would give up essential Liberty, to purchase a little temporary >> Safety, deserve neither Liberty nor Safety. >> -- Benjamin Franklin >> "carpe diem quam minimum credula postero" -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin "carpe diem quam minimum credula postero"
Re: DSE solr HA
You may have more luck on the DS forums http://www.datastax.com/support-forums/ Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 13/08/2012, at 6:00 AM, Mohit Anchlia wrote: > > Going through this page and it looks like indexes are stored locally > http://www.datastax.com/dev/blog/cassandra-with-solr-integration-details . My > question is what happens if one of the solr nodes crashes? Is the data > indexed again on those nodes? > > Also, if RF > 1 then is the same data being indexed on all RF nodes or is > that RF only for document replication? >
Re: [gem] does "disconnect!" work properly?
My rough understanding of the ruby code is that it auto connects if necessary when you make a request. i.e. your get turns into a _multiget call here https://github.com/twitter/cassandra/blob/master/lib/cassandra/protocol.rb#L52 and that results in a client connection being opened if necessary https://github.com/twitter/cassandra/blob/master/lib/cassandra/cassandra.rb#L1043 If you still feel there is a problem, the Issues list on GitHub is probably the place to go https://github.com/twitter/cassandra/issues Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 13/08/2012, at 2:27 PM, Satoshi Yamada wrote: > > hi, > > I wonder if disconnect! method works properly in gem cassandra > because the code below does not cause exception. > > - > > client = Cassandra.new('pool', host_ip) > > ret = client.get(:db, 'test', key, option_one) > p ret > client.disconnect! > > ret = client.get(:db, 'test', key, option_one) > p ret > > - > > I use gem cassandra 0.14.0 > http://rubygems.org/gems/cassandra/versions/0.14.0 > > thanks in advance, > satoshi
Re: Problem with cassandra startup on Linux
Hi Dwight, I can confirm that issue on my MBP under Mountain Lion. Can you create a ticket at https://issues.apache.org/jira/browse/CASSANDRA and include the platform you are running on. For reference the change was added by https://issues.apache.org/jira/browse/CASSANDRA-4447 The change is only relevant if you are running on Java 7. As a workaround, change the relevant section of cassandra-env.sh to look like:

#startswith () [ "${1#$2}" != "$1" ]
if [ "`uname`" = "Linux" ] ; then
    # reduce the per-thread stack size to minimize the impact of Thrift
    # thread-per-client. (Best practice is for client connections to
    # be pooled anyway.) Only do so on Linux where it is known to be
    # supported.
    #if startswith "$JVM_VERSION" '1.7.'
    #then
    #    JVM_OPTS="$JVM_OPTS -Xss160k"
    #else
        JVM_OPTS="$JVM_OPTS -Xss128k"
    #fi
fi

Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 13/08/2012, at 5:16 PM, Dwight Smith wrote: > Installed 1.1.3 on my Linux cluster – the JVM_OPTS were truncated due to a > script error in Cassandra-env.sh: > > Invalid token in the following. > > startswith () [ "${1#$2}" != "$1" ] >
Re: Custom Partitioner Type
Yes, you need to implement the org.apache.cassandra.dht.IPartitioner interface; there are a couple of abstract implementations you could base it on. > I want all even keys to go to > node1 and odd keys to node2, is it feasible ? I'm not endorsing the idea of doing this, but as a hack to see the effects you could use the BOP and format the keys as (python): str(key % 2) + "{0:0>#10}".format(key) So all keys are 11 digit strings, even keys start with 0 and odd with 1. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 14/08/2012, at 7:33 AM, A J wrote: > Is it possible to use a custom Partitioner type (other than RP or BOP) ? > Say if my rowkeys are all Integers and I want all even keys to go to > node1 and odd keys to node2, is it feasible ? How would I go about ? > > Thanks.
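A small illustration of that key format, assuming integer row keys as in the question (just a sketch of the hack above):

    def bop_key(key):
        # even keys sort under the '0' prefix, odd keys under the '1' prefix
        return str(key % 2) + "{0:0>10}".format(key)

    bop_key(42)  # -> '00000000042'
    bop_key(7)   # -> '10000000007'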
Re: GCInspector info messages in cassandra log
There are a couple of steps you can take if compaction is causing GC. - if you have a lot of wide rows consider reducing the in_memory_compaction_limit_in_mb yaml setting. This will slow down compaction but will reduce the memory usage. - reduce concurrent_compactors Both of these may slow down compaction. Once you have GC under control you may want to play with memory settings. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 14/08/2012, at 4:45 PM, Tamar Fraenkel wrote: > Hi! > I have 3 nodes ring running on Amazon EC2. > About once a week I see in the logs compaction messages and around the same > time info messages about GC (see below) that I think means it is taking too > long and happening too often. > > Does it mean I have to reduce my cache size? > Thanks, > Tamar > > INFO [ScheduledTasks:1] 2012-08-13 12:50:57,593 GCInspector.java (line 122) > GC for ParNew: 242 ms for 1 collections, 1541590352 used; max is 1937768448 > INFO [ScheduledTasks:1] 2012-08-13 12:51:27,740 GCInspector.java (line 122) > GC for ParNew: 291 ms for 1 collections, 1458227032 used; max is 1937768448 > INFO [ScheduledTasks:1] 2012-08-13 12:51:29,741 GCInspector.java (line 122) > GC for ParNew: 261 ms for 1 collections, 1228861368 used; max is 1937768448 > INFO [ScheduledTasks:1] 2012-08-13 12:51:30,833 GCInspector.java (line 122) > GC for ParNew: 319 ms for 1 collections, 1120131360 used; max is 1937768448 > INFO [ScheduledTasks:1] 2012-08-13 12:51:32,863 GCInspector.java (line 122) > GC for ParNew: 241 ms for 1 collections, 983144216 used; max is 1937768448 > INFO [ScheduledTasks:1] 2012-08-13 12:51:33,864 GCInspector.java (line 122) > GC for ParNew: 215 ms for 1 collections, 967702720 used; max is 1937768448 > INFO [ScheduledTasks:1] 2012-08-13 12:51:34,964 GCInspector.java (line 122) > GC for ParNew: 248 ms for 1 collections, 973803344 used; max is 1937768448 > INFO [ScheduledTasks:1] 2012-08-13 12:51:41,211 GCInspector.java (line 122) > GC for ParNew: 265 ms for 1 collections, 1071933560 used; max is 1937768448 > INFO [ScheduledTasks:1] 2012-08-13 12:51:43,212 GCInspector.java (line 122) > GC for ParNew: 326 ms for 1 collections, 1217367792 used; max is 1937768448 > INFO [ScheduledTasks:1] 2012-08-13 12:51:44,212 GCInspector.java (line 122) > GC for ParNew: 245 ms for 1 collections, 1203481536 used; max is 1937768448 > INFO [ScheduledTasks:1] 2012-08-13 12:51:45,213 GCInspector.java (line 122) > GC for ParNew: 209 ms for 1 collections, 1208819416 used; max is 1937768448 > INFO [ScheduledTasks:1] 2012-08-13 12:51:46,237 GCInspector.java (line 122) > GC for ParNew: 248 ms for 1 collections, 1338361648 used; max is 1937768448 > > > Tamar Fraenkel > Senior Software Engineer, TOK Media > > > > ta...@tok-media.com > Tel: +972 2 6409736 > Mob: +972 54 8356490 > Fax: +972 2 5612956 > > >
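For reference, both knobs live in cassandra.yaml and take effect after a restart; the values below are just the kind of reduced settings being suggested, not a recommendation for every workload:

    # cassandra.yaml
    in_memory_compaction_limit_in_mb: 32
    concurrent_compactors: 1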
Re: What are the basic steps to improve Cassandra performance
> optimize the Cassandra for performance in general It's a lot easier to answer specific questions. Cassandra is fast, and there are ways to make it faster in specific use cases. > improve the performance for "select * from X" type of queries Ah. Are you specifying a row key or are you trying to get multiple rows ? Getting the columns from the start of a row is the most efficient query; see my presentation here http://www.datastax.com/events/cassandrasummit2012/presentations Hope that helps. ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 14/08/2012, at 4:56 PM, A Geek wrote: > hi all, > I'm a bit new to Cassandra and was wondering what are the basic steps that we > must follow to optimize the Cassandra for performance in general and how to > improve the performance for "select * from X" type of queries. Any help would > be much appreciated. > Note that, we have huge data sitting in our schema. > > Thanks, > DK
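For example, a minimal pycassa sketch of that kind of query (the keyspace, column family and row key are made up):

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
cf = ColumnFamily(pool, 'MyCF')

# No column_start means the slice begins at the first column of the row,
# which is the cheapest access path; column_count bounds the result size.
columns = cf.get('some_row_key', column_count=100)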
Re: GCInspector info messages in cassandra log
> According to cfstats there are the some CF with high Comacted row maximum > sizes (1131752, 4866323 and 25109160). Others max sizes are < 100. Are > these considered to be problematic, what can I do to solve that? They are only 1, 4 and 25 MB. Not too big. > What should be the values of in_memory_compaction_limit_in_mb and > concurrent_compactors and how do I change them? Sounds like you dont have very big CF's, so changing the in_memory_compaction_limit_in_mb may not make too much difference. Try changing concurrent_compactors to 2 in the yaml file. This change will let you know if GC and compaction are related. > change yaml file and restart, yes > What do I do about the long rows? What value is considered too big. They churn more memory during compaction. If you have a lot of rows +32 MB I would think about it, does not look that way. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 15/08/2012, at 3:15 AM, Tamar Fraenkel wrote: > Hi! > It helps, but before I do more actions I want to give you some more info, and > ask some questions: > > Related Info > According to my yaml file (where do I see these parameters in the jmx? I > couldn't find them): > in_memory_compaction_limit_in_mb: 64 > concurrent_compactors: 1, but it is commented out, so I guess it is the > default value > multithreaded_compaction: false > compaction_throughput_mb_per_sec: 16 > compaction_preheat_key_cache: true > According to cfstats there are the some CF with high Comacted row maximum > sizes (1131752, 4866323 and 25109160). Others max sizes are < 100. Are > these considered to be problematic, what can I do to solve that? > During compactions Cassandra is slower > Running Cassandra Version 1.0.8 > Questions > What should be the values of in_memory_compaction_limit_in_mb and > concurrent_compactors and how do I change them? change yaml file and restart, > or can it be done using jmx without restarting Cassandra? > What do I do about the long rows? What value is considered too big. > > I appreciate your help! > Thanks, > > > > Tamar Fraenkel > Senior Software Engineer, TOK Media > > > > ta...@tok-media.com > Tel: +972 2 6409736 > Mob: +972 54 8356490 > Fax: +972 2 5612956 > > > > > > On Tue, Aug 14, 2012 at 1:22 PM, aaron morton wrote: > There are a couple of steps you can take if compaction is causing GC. > > - if you have a lot of wide rows consider reducing the > in_memory_compaction_limit_in_mb yaml setting. This will slow down compaction > but will reduce the memory usage. > > - reduce concurrent_compactors > > Both of these may slow down compaction. Once you have GC under control you > may want to play with memory settings. > > Hope that helps. > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 14/08/2012, at 4:45 PM, Tamar Fraenkel wrote: > >> Hi! >> I have 3 nodes ring running on Amazon EC2. >> About once a week I see in the logs compaction messages and around the same >> time info messages about GC (see below) that I think means it is taking too >> long and happening too often. >> >> Does it mean I have to reduce my cache size? 
>> Thanks, >> Tamar >> >> INFO [ScheduledTasks:1] 2012-08-13 12:50:57,593 GCInspector.java (line 122) >> GC for ParNew: 242 ms for 1 collections, 1541590352 used; max is 1937768448 >> INFO [ScheduledTasks:1] 2012-08-13 12:51:27,740 GCInspector.java (line 122) >> GC for ParNew: 291 ms for 1 collections, 1458227032 used; max is 1937768448 >> INFO [ScheduledTasks:1] 2012-08-13 12:51:29,741 GCInspector.java (line 122) >> GC for ParNew: 261 ms for 1 collections, 1228861368 used; max is 1937768448 >> INFO [ScheduledTasks:1] 2012-08-13 12:51:30,833 GCInspector.java (line 122) >> GC for ParNew: 319 ms for 1 collections, 1120131360 used; max is 1937768448 >> INFO [ScheduledTasks:1] 2012-08-13 12:51:32,863 GCInspector.java (line 122) >> GC for ParNew: 241 ms for 1 collections, 983144216 used; max is 1937768448 >> INFO [ScheduledTasks:1] 2012-08-13 12:51:33,864 GCInspector.java (line 122) >> GC for ParNew: 215 ms for 1 collections, 967702720 used; max is 1937768448 >> INFO [ScheduledTasks:1] 2012-08-13 12:51:34,964 GCInspector.java (line 122) >> GC for ParNew: 248 ms for 1 collections, 973803344 used; max is 1937768448 >> INFO [ScheduledTasks:1] 2012-08-13 12:51:41,211 GCInspector.java (line 122) >> GC for ParNew: 265 ms for 1 collections, 1071933560 used; max is 1937768448 >> INFO [ScheduledTasks:1] 2012-08-13 1
Re: incremental backup and Priam?
The Priam code is looking for the //backups directory created by cassandra during incremental backups. If it finds it the files are uploaded to S3. It's taking the built in incremental backups off node. (AFAIK) Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 15/08/2012, at 8:16 AM, Yang wrote: > in the initial incremental backup implementation, > the hardlinking to the backup dir was in the CFS.addSSTable() code, so it's > part of the Cassandra code. > > I looked at Priam, > https://github.com/Netflix/Priam/blob/master/priam/src/main/java/com/netflix/priam/backup/IncrementalBackup.java > > this code pretty much does the same thing as the original addSSTable() > incremental backup . > so the Priam backup code operates outside of Cassandra write path? > any insight into why this approach was chosen instead of using the > incremental backup provided by Cassandra? > > thanks > Yang
Re: replace dead node? " token -1 "
> Using this method, when choosing the new , should we still use the T-1 > ? (AFAIK) No. replace_token is used when you want to replace a node that is dead. In this case the dead node will be identified by its token. > if so, would the duplicate token (same token but different ip) cause problems? If the nodes are bootstrapping an error is raised. Otherwise the token ownership is passed to the new node. Cheers ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 15/08/2012, at 11:07 AM, Yang wrote: > previously when a node dies, I remember the documents describes that it's > better to assign T-1 to the new node, > where T was the token of the dead node. > > > the new doc for 1.x here > > http://wiki.apache.org/cassandra/Operations#Replacing_a_Dead_Node > > > shows a new way to pass in cassandra.replace_token= > for the new node. > Using this method, when choosing the new , should we still use the T-1 > ? > > > Also in Priam code: > https://github.com/Netflix/Priam/blob/master/priam/src/main/java/com/netflix/priam/identity/InstanceIdentity.java > > line 148, it does not seem that Priam does the "-1" thing, but assigns the > original token T to the new node. > if so, would the duplicate token (same token but different ip) cause problems? > > > Thanks > Yang
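For reference, the option is passed as a JVM system property when the replacement node is started, e.g. (the token value is a placeholder):

bin/cassandra -Dcassandra.replace_token=<token_of_dead_node>

or by adding it to JVM_OPTS in cassandra-env.sh. The new node then takes over that token and streams the data for it before serving.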
Re: Partial composite result limit possible?
You cannot do that in a single query. The order of columns in output is the order they are stored in. And the API can only return a contiguous range of columns. In this case I would get the larger slice and then discard columns client side. Or build a second row that has the order of the columns reversed so you can select from (0, null) to (2, null). http://pycassa.github.com/pycassa/assorted/composite_types.html?#fetching-compositetype-data Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 15/08/2012, at 4:10 PM, David Turnbull wrote: > Hi, I have a CF with a composite type (LongType, IntegerType) with some data > like this: > > RowKey: hihi > => (column=1000:1, value=616263) > => (column=1000:2, value=6465) > => (column=1000:3, value=66) > => (column=1000:4, value=6768) > => (column=2000:1, value=616263) > => (column=2000:2, value=6465) > => (column=2000:3, value=66) > => (column=2000:4, value=6768) > > I want to query from (1000,0 to 2000,2) such that I get 1000:1, 1000:2, > 2000:1 and 2000:2 back. > Is this possible? > > In pycassa, I can do cf.get('hihi', column_start=(1000,0), > column_finish=(2000,2) but that gives me 1000:1-4 and 2000:1-2. > Specifying a limit of 2 columns for the query just applies to the total > results, i.e. only 1000:1 and 1000:2. > > I could specify the composite columns fully in the query, but I'm hoping to > query over at least 300 columns, which seems bad. >
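A minimal pycassa sketch of the first option, reading the wider contiguous slice and discarding unwanted columns client side (the keyspace and CF names are made up; the row key and bounds are the ones from the question):

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
cf = ColumnFamily(pool, 'MyCF')

# The contiguous slice (1000, 0) .. (2000, 2) also returns 1000:3 and 1000:4;
# drop anything whose second component is out of range on the client.
cols = cf.get('hihi', column_start=(1000, 0), column_finish=(2000, 2))
wanted = dict((name, value) for name, value in cols.items() if 1 <= name[1] <= 2)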
Re: GCInspector info messages in cassandra log
> Is there anything to do before that? like drain or flush? For a clean shutdown I do:
nodetool -h localhost disablethrift
nodetool -h localhost disablegossip && sleep 10
nodetool -h localhost drain
then kill the process. > Would you recommend that? If I do it, how often should I do a full snapshot, > and how often should I backup the backup directory? Sounds like you could use Priam and be happier... http://techblog.netflix.com/2012/02/announcing-priam.html > I just saw that there is an option global_snapshot, is it still supported? I cannot find it. Try Priam or the instructions here, which are pretty much what you have described http://www.datastax.com/docs/1.0/operations/backup_restore Cheers ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 15/08/2012, at 4:57 PM, Tamar Fraenkel wrote: > Aaron, > Thank you very much. I will do as you suggested. > > One last question regarding restart: > I assume, I should do it node by node. > Is there anything to do before that? like drain or flush? > > I am also considering enabling incremental backups on my cluster. Currently I > take a daily full snapshot of the cluster, tar it and load it to S3 (size now > is 3.1GB). Would you recommend that? If I do it, how often should I do a full > snapshot, and how often should I backup the backup directory? > > Another snapshot related question, currently I snapshot on each node and use > parallel-slurp to copy the snapshot to one node where I tar them. I just saw > that there is an option global_snapshot, is it still supported? Does that > mean that if I run it on one node the snapshot will contain data from all > cluster? How does it work in restore? Is it better than my current backup > system? > > Tamar Fraenkel > Senior Software Engineer, TOK Media > > > > ta...@tok-media.com > Tel: +972 2 6409736 > Mob: +972 54 8356490 > Fax: +972 2 5612956 > > > > > > On Tue, Aug 14, 2012 at 11:51 PM, aaron morton > wrote: >> According to cfstats there are the some CF with high Comacted row maximum >> sizes (1131752, 4866323 and 25109160). Others max sizes are < 100. Are >> these considered to be problematic, what can I do to solve that? > They are only 1, 4 and 25 MB. Not too big. > >> What should be the values of in_memory_compaction_limit_in_mb and >> concurrent_compactors and how do I change them? > Sounds like you dont have very big CF's, so changing the > in_memory_compaction_limit_in_mb may not make too much difference. > > Try changing concurrent_compactors to 2 in the yaml file. This change will > let you know if GC and compaction are related. > >> change yaml file and restart, > yes > >> What do I do about the long rows? What value is considered too big. > They churn more memory during compaction. If you have a lot of rows +32 MB I > would think about it, does not look that way. > > Cheers > > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 15/08/2012, at 3:15 AM, Tamar Fraenkel wrote: > >> Hi! >> It helps, but before I do more actions I want to give you some more info, >> and ask some questions: >> >> Related Info >> According to my yaml file (where do I see these parameters in the jmx? 
I >> couldn't find them): >> in_memory_compaction_limit_in_mb: 64 >> concurrent_compactors: 1, but it is commented out, so I guess it is the >> default value >> multithreaded_compaction: false >> compaction_throughput_mb_per_sec: 16 >> compaction_preheat_key_cache: true >> According to cfstats there are the some CF with high Comacted row maximum >> sizes (1131752, 4866323 and 25109160). Others max sizes are < 100. Are >> these considered to be problematic, what can I do to solve that? >> During compactions Cassandra is slower >> Running Cassandra Version 1.0.8 >> Questions >> What should be the values of in_memory_compaction_limit_in_mb and >> concurrent_compactors and how do I change them? change yaml file and >> restart, or can it be done using jmx without restarting Cassandra? >> What do I do about the long rows? What value is considered too big. >> >> I appreciate your help! >> Thanks, >> >> >> >> Tamar Fraenkel >> Senior Software Engineer, TOK Media >> >> >> >> ta...@tok-media.com >> Tel: +972 2 6409736 >> Mob: +972 54 8356490 >> Fax: +972 2 5612956 >> >> >> >> >> >> On Tue, Aug 14, 2012 at 1:22 PM, aaron morton
Re: CQL3: Do boolean values need quoting in inserts?
Quoting false is correct from my reading of the Antlr grammar. Constant terms are either strings, UUID, int or long. I'm sure someone from DS will pickup the comment you made on the post and fix the example. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 15/08/2012, at 9:06 PM, Andy Ballingall TF wrote: > Hi, > > I was following the examples on this page: > > http://www.datastax.com/dev/blog/whats-new-in-cql-3-0#comment-116250 > > In the example, a table was created as follows: > > CREATE TABLE altercations ( > instigator text, > started_at timestamp, > ships_destroyed int, > energy_used float, > alliance_involvement boolean, > PRIMARY KEY (instigator, started_at) > ); > > The example then showed an insert into this table: > > INSERT INTO altercations (instigator, started_at, ships_destroyed, > energy_used, alliance_involvement) > VALUES ('Jayne Cobb', '7943-07-23', 2, 4.6, false); > > This threw the following error: > > Bad Request: line 3:65 no viable alternative at input ‘false’ > > I managed to get it to work by single-quoting the boolean value. And > if I look at the data, it does indeed seem to have worked: > > cqlsh:test> select * from altercations; > instigator | started_at | alliance_involvement | energy_used | ships_destroyed > +–+———-+-+—– > Jayne Cobb | 7943-07-23 00:00:00+ | False | 4.6 | 2 > > > I was using cqlsh -3 ( cqlsh 2.2.0 | Cassandra 1.1.2 | CQL spec 3.0.0 > | Thrift protocol 19.32.0) > > Is this a bug, or must we quote booleans in CQL? > > Thanks > Andy > > -- > Andy Ballingall > Senior Software Engineer > > The Foundry > 6th Floor, The Communications Building, > 48, Leicester Square, > London, WC2H 7LT, UK > Tel: +44 (0)20 7968 6828 - Fax: +44 (0)20 7930 8906 > Web: http://www.thefoundry.co.uk/ > > The Foundry Visionmongers Ltd. > Registered in England and Wales No: 4642027
Re: Migrating to a new cluster (using SSTableLoader or other approaches)
> WARN 09:02:38,534 Unable to instantiate cache provider > org.apache.cassandra.cache.SerializingCacheProvider; using default > org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider@5d59054d instead Happens when JNA is not in the path. Nothing to worry about when using the sstableloader. > ERROR 09:02:38,614 Error in ThreadPoolExecutor > java.lang.RuntimeException: java.io.EOFException: unable to seek to position > 93069003 in /opt/analytics/analytics/chart-hd-104-Data.db (65737276 bytes) in > read-only mode This one looks like an error. Can you run nodetool with DEBUG level logging and post the logs ? Cheers ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 15/08/2012, at 9:32 PM, Filippo Diotalevi wrote: > Hi, > we are trying to use SSTableLoader to bootstrap a new 7-node cassandra (v. > 1.0.10) cluster with the snapshots taken from a 3-node cassandra cluster. The > new cluster is in a different data centre. > > After reading the articles at > [1] http://www.datastax.com/dev/blog/bulk-loading > [2] > http://geekswithblogs.net/johnsPerfBlog/archive/2011/07/26/how-to-use-cassandrs-sstableloader.aspx > > we are tried to follow this procedure > 1) we took a snapshot of our keyspaces in the old cluster and moved them to > the data folder of 3 of the new machines > 2) started cassandra in the new cluster > but we noticed that some column families were missing, other had missing data. > > After that we tried to use sstableloader > 1) we reinstalled cassandra in the new cluster > 2) run sstableloader (as explained in [2]) to load the keyspaces > > SSTableLoader starts, but the progress is always 0 and the transfer rate is > 0MB/s. Some warning and exceptions are present in the logs > > ./sstableloader /opt/analytics/analytics/ > Starting client (and waiting 30 seconds for gossip) ... > Streaming revelant part of /opt/analytics/analytics/chart-hd-104-Data.db > /opt/analytics/analytics/chart-hd-105-Data.db > /opt/analytics/analytics/chart-hd-106-Data.db > /opt/analytics/analytics/chart-hd-107-Data.db > /opt/analytics/analytics/chart-hd-108-Data.db to [/1x.xx.xx.xx5, > /1x.xx.xx.xx7, /1x.xx.xx.xx0, /1x.xx.xx.xx7, /1x.xx.xx.xx3, /1x.xx.xx.xx8, > /1x.xx.xx.xx7] > WARN 09:02:38,534 Unable to instantiate cache provider > org.apache.cassandra.cache.SerializingCacheProvider; using default > org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider@5d59054d instead > WARN 09:02:38,549 Unable to instantiate cache provider > org.apache.cassandra.cache.SerializingCacheProvider; using default > org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider@5d59054d instead > > > [….] 
> ERROR 09:02:38,614 Error in ThreadPoolExecutor > java.lang.RuntimeException: java.io.EOFException: unable to seek to position > 93069003 in /opt/analytics/analytics/chart-hd-104-Data.db (65737276 bytes) in > read-only mode > at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:689) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > Caused by: java.io.EOFException: unable to seek to position 93069003 in > /opt/analytics/analytics/chart-hd-104-Data.db (65737276 bytes) in read-only > mode > at > org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:253) > at > org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:136) > at > org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) > ... 3 more > Exception in thread "Streaming:1" java.lang.RuntimeException: > java.io.EOFException: unable to seek to position 93069003 in > /opt/analytics/analytics/chart-hd-104-Data.db (65737276 bytes) in read-only > mode > at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:689) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > Caused by: java.io.EOFException: unable to seek to position 93069003 in > /opt/analytics/analytics/chart-hd-104-Data.db (65737276 bytes) in read-only > mode > at > org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:253) > at > org.apache.cassandra.streaming.FileStreamTa
Re: Composite Column Slice query, wildcard first component?
> Is there a way to create a slice query that returns all columns where the > _second_ component is A? No. You can only get a contiguous slice of columns. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 16/08/2012, at 7:21 AM, Mike Hugo wrote: > Hello, > > Given a row like this > > "key1" => (A:A:C), (A:A:B), (B:A:C), (B:C:D) > > Is there a way to create a slice query that returns all columns where the > _second_ component is A? That is, I would like to get back the following > columns by asking for columns where component[0] = * and component[1] = A > > (A:A:C), (A:A:B), (B:A:C) > > I could do some iteration and figure this out in more of a brute force > manner, I'm just curious if there's anything built in that might be more > efficient > > Thanks! > > Mike
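To illustrate the brute-force approach, a pycassa sketch that walks the row and keeps only columns whose second component is 'A' (keyspace and CF names are made up; assumes a pycassa version with xget):

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
cf = ColumnFamily(pool, 'MyCF')

# xget pages through the whole row in buffered slices; the filter on the
# second component of the composite name happens client side.
matches = [(name, value)
           for name, value in cf.xget('key1', buffer_size=1024)
           if name[1] == 'A']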
Re: indexing question related to playOrm on github
> 1. Can playOrm be listed on cassandra's list of ORMs? It supports a JQL/HQL > query on a trillion rows in under 100ms (partitioning is the trick so you can > JQL a partition) No sure if we have an ORM specific page. If it's a client then feel free to add it to http://wiki.apache.org/cassandra/ClientOptions > I was wondering if cassandra has or will ever support eventual constancy > where it keeps both the REMOVE AND the ADD together such until it is on all 3 > replicated nodes and in resolving the consistency would end up with an index > that only has the very last one in the index. Not sure I fully understand but it sounds like you want a transaction, which is not going to happen. Internally when Cassandra updates a secondary index it does the same thing. But it synchronises updates around the same row so one thread will apply the changes at a time. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 16/08/2012, at 12:34 PM, "Hiller, Dean" wrote: > 1. Can playOrm be listed on cassandra's list of ORMs? It supports a JQL/HQL > query on a trillion rows in under 100ms (partitioning is the trick so you can > JQL a partition) > 2. Many applications have a common indexing problem and I was wondering if > cassandra has or could have any support for this in the future…. > > When using wide row indexes, you frequently have . > as the composite key. This means when you have your object like so in the > database > > Activity { > pk: 65 > name: bill > } > > And then two servers want to save it as > > Activity { > pk:65 > name:tim > } > Activity { > pk:65 > name:mike > } > > Each server will remove <65> and BOTH servers will add <65> AND > <65> BUT one of them will really be a lie! I was wondering if > cassandra has or will ever support eventual constancy where it keeps both the > REMOVE AND the ADD together such until it is on all 3 replicated nodes and in > resolving the consistency would end up with an index that only has the very > last one in the index. > > Thanks, > Dean >
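For readers unfamiliar with the wide-row index pattern being discussed, a pycassa sketch of the racy update (all keyspace, CF and row names are assumptions; the index row holds composite columns of (indexed_value, pk)):

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily
from pycassa.batch import Mutator

pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
activity = ColumnFamily(pool, 'Activity')
name_index = ColumnFamily(pool, 'ActivityByName')  # comparator: CompositeType(UTF8Type, LongType)

def rename(pk, old_name, new_name):
    # Two servers running this concurrently for the same pk is the race in
    # the thread: both remove (old_name, pk) but each adds its own new column,
    # leaving two index entries where only one is true.
    b = Mutator(pool)
    b.insert(activity, str(pk), {'name': new_name})
    b.remove(name_index, 'by_name', columns=[(old_name, pk)])
    b.insert(name_index, 'by_name', {(new_name, pk): ''})
    b.send()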
Re: SSTable Index and Metadata - are they cached in RAM?
> What about SSTable index, Not sure what you are referring to there. Each row in an SSTable has a bloom filter and may have an index of columns. This is not cached. See http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ or http://www.slideshare.net/aaronmorton/cassandra-sf-2012-technical-deep-dive-query-performance > and Metadata? This is the meta data we hold in memory for every open sstable https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableMetadata.java Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 16/08/2012, at 7:34 PM, Maciej Miklas wrote: > Hi all, > > bloom filter for row keys is always in RAM. What about SSTable index, and > Metadata? > > Is it cached by Cassandra, or it relays on memory mapped files? > > > Thanks, > Maciej
Re: Migrating to a new cluster (using SSTableLoader or other approaches)
> Which nodetool command are you referring to? (info, cfstats, ring,….) My bad. I meant to write sstableloader > Do I modify the log4j-tools.properties in $CASSANDRA_HOME/conf to set the > nodetool logs to DEBUG? You can use the --debug option with sstableloader to get a better exception message. Also change the logging in log4j-tools.properties for get DEBUG messages so we can see what's going on. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 16/08/2012, at 8:51 PM, Filippo Diotalevi wrote: >>> ERROR 09:02:38,614 Error in ThreadPoolExecutor >>> java.lang.RuntimeException: java.io.EOFException: unable to seek to >>> position 93069003 in /opt/analytics/analytics/chart-hd-104-Data.db >>> (65737276 bytes) in read-only mode >> >> >> This one looks like an error. >> >> Can you run nodetool with DEBUG level logging and post the logs ? > > Thank Aaron. > Which nodetool command are you referring to? (info, cfstats, ring,….) > Do I modify the log4j-tools.properties in $CASSANDRA_HOME/conf to set the > nodetool logs to DEBUG? > > Thanks, > -- > Filippo Diotalevi > > > > On Wednesday, 15 August 2012 at 22:53, aaron morton wrote: > >>> WARN 09:02:38,534 Unable to instantiate cache provider >>> org.apache.cassandra.cache.SerializingCacheProvider; using default >>> org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider@5d59054d >>> instead >> >> Happens when JNA is not in the path. Nothing to worry about when using the >> sstableloader. >> >>> ERROR 09:02:38,614 Error in ThreadPoolExecutor >>> java.lang.RuntimeException: java.io.EOFException: unable to seek to >>> position 93069003 in /opt/analytics/analytics/chart-hd-104-Data.db >>> (65737276 bytes) in read-only mode >> >> This one looks like an error. >> >> Can you run nodetool with DEBUG level logging and post the logs ? >> >> Cheers >> >> - >> Aaron Morton >> Freelance Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> >> >> >> >> >> On 15/08/2012, at 9:32 PM, Filippo Diotalevi > (mailto:fili...@ntoklo.com)> wrote: >>> Hi, >>> we are trying to use SSTableLoader to bootstrap a new 7-node cassandra (v. >>> 1.0.10) cluster with the snapshots taken from a 3-node cassandra cluster. >>> The new cluster is in a different data centre. >>> >>> After reading the articles at >>> [1] http://www.datastax.com/dev/blog/bulk-loading >>> [2] >>> http://geekswithblogs.net/johnsPerfBlog/archive/2011/07/26/how-to-use-cassandrs-sstableloader.aspx >>> >>> we are tried to follow this procedure >>> 1) we took a snapshot of our keyspaces in the old cluster and moved them to >>> the data folder of 3 of the new machines >>> 2) started cassandra in the new cluster >>> but we noticed that some column families were missing, other had missing >>> data. >>> >>> After that we tried to use sstableloader >>> 1) we reinstalled cassandra in the new cluster >>> 2) run sstableloader (as explained in [2]) to load the keyspaces >>> >>> SSTableLoader starts, but the progress is always 0 and the transfer rate is >>> 0MB/s. Some warning and exceptions are present in the logs >>> >>> ./sstableloader /opt/analytics/analytics/ >>> Starting client (and waiting 30 seconds for gossip) ... 
>>> Streaming revelant part of /opt/analytics/analytics/chart-hd-104-Data.db >>> /opt/analytics/analytics/chart-hd-105-Data.db >>> /opt/analytics/analytics/chart-hd-106-Data.db >>> /opt/analytics/analytics/chart-hd-107-Data.db >>> /opt/analytics/analytics/chart-hd-108-Data.db to [/1x.xx.xx.xx5, >>> /1x.xx.xx.xx7, /1x.xx.xx.xx0, /1x.xx.xx.xx7, /1x.xx.xx.xx3, /1x.xx.xx.xx8, >>> /1x.xx.xx.xx7] >>> WARN 09:02:38,534 Unable to instantiate cache provider >>> org.apache.cassandra.cache.SerializingCacheProvider; using default >>> org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider@5d59054d >>> instead >>> WARN 09:02:38,549 Unable to instantiate cache provider >>> org.apache.cassandra.cache.SerializingCacheProvider; using default >>> org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider@5d59054d >>> instead >>> >>> >>> [….] >>> ERROR 09:02:38,614 Error in ThreadPoolExecutor >
Re: wild card on query
> I want to retrieve all the photos from all the users of certain project. My > sql like query will be "select projectid * photos from Users". How can i run > this kind of row key predicate while executing query on cassandra? You cannot / should not do that using the data model you have. (i.e. you could do it with a secondary index, but in this case you probably should not). Try to de-normalise your data. Say a CF called ProjectPhotos * row key is the project_id * column name is * column value is image_url or JSON data about the image. You would then slice some columns from one row in the ProjectPhotos CF. You then need to know what images a user has uploaded, with say the UserPhotos CF. * row key is user_id * column name is timestamp * column value is image_url or JSON data about the image. I did a twitter sample app at http://wdcnz.com a couple of weeks ago that shows denormalising data https://github.com/amorton/wdcnz-2012-site and http://www.slideshare.net/aaronmorton/hellow-world-cassandra Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 12:39 AM, Swathi Vikas wrote: > Hi, > > I am trying to run query on cassandra cluster with predicate on row key. > > I have column family called "Users" and rows with row key like > "projectid_userid_photos". Each user within a project can have rows like > projectid_userid_blog, projectid_userid_status and so on. > > I want to retrieve all the photos from all the users of certain project. My > sql like query will be "select projectid * photos from Users". How can i run > this kind of row key predicate while executing query on cassandra? > > Any sugesstion will help. > > Thank you, > swat.vikas >>> >>> >> >> >> > > >
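A minimal pycassa sketch of the denormalised write and read paths described above (the keyspace name and the exact column layout are assumptions):

import json
import time
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily
from pycassa.batch import Mutator

pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
project_photos = ColumnFamily(pool, 'ProjectPhotos')  # row key: project_id
user_photos = ColumnFamily(pool, 'UserPhotos')        # row key: user_id

def add_photo(project_id, user_id, image_url):
    # Write the same fact to both CFs so each query is a single row slice.
    col = str(int(time.time() * 1e6))  # a timestamp-ish column name
    b = Mutator(pool)
    b.insert(project_photos, project_id, {col: json.dumps({'user': user_id, 'url': image_url})})
    b.insert(user_photos, user_id, {col: image_url})
    b.send()

# "All photos for a project" then becomes one contiguous slice of one row:
photos = project_photos.get('project_42', column_count=1000)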
Re: indexing question related to playOrm on github
>> I am not sure synchronization fixes thatŠŠIt would be kind of >> nice if the column <65> would not actually be removed until after >> all servers are eventually consistent... Not sure thats possible. You can either serialise updating your custom secondary index on the client site or resolve the inconsistency on read. Not sure this fits with your workload but as an e.g. when you read from the index, if you detect multiple row PK's resolve the issue on the client and leave the data in cassandra as is. Then queue a job that will read the row and try to repair it's index entries. When repairing the index entry play with the timestamp so any deletions you make only apply to the column as it was when you saw the error. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 12:47 AM, "Hiller, Dean" wrote: > Maybe this would be a special type of column family that could contain > these as my other tables definitely don't want the feature below by the > way. > > Dean > > On 8/16/12 6:29 AM, "Hiller, Dean" wrote: > >> Yes, the synch may work, and no, I do "not" want a transactionŠI want a >> different kind of eventually consistent >> >> That might work. >> Let's say server 1 sends a mutation (65 is the pk) >> Remove: <65> Add <65> >> Server 2 also sends a mutation (65 is the pk) >> Remove: <65> Add <65> >> >> What everyone does not want is to end up with a row that has <65> >> and <65>. With the wide row pattern, we would like to have ONE or >> the other. I am not sure synchronization fixes thatŠŠIt would be kind of >> nice if the column <65> would not actually be removed until after >> all servers are eventually consistent AND would keep a reference to the >> add that was happening so that when it goes to resolve eventually >> consistent between the servers, it would see that <65> is newer and >> it would decide to drop the first add completely. >> >> Ie. In a full process it might look like this >> Cassandra node 1 receives remove <65>, add <65> AND in the >> remove column stores info about the add <65> until eventual >> consistency is completed >> Cassandra node 2 one ms later receives remove <65> and <65> >> AND in the remove column stores info about the add <65> until >> eventual consistency is completed >> Eventual consistency starts comparing node 1 and node 2 and finds >> <65> is being removed by different servers and finds add info >> attached to that. ONLY THE LAST add info is acknowledged and it makes >> the row consistent across the cluster. >> >> That makes everyone's wide row indexing pattern tend to get less corrupt >> over time. >> >> Thanks, >> Dean >> >> >> From: aaron morton >> mailto:aa...@thelastpickle.com>> >> Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" >> mailto:user@cassandra.apache.org>> >> Date: Wednesday, August 15, 2012 8:26 PM >> To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" >> mailto:user@cassandra.apache.org>> >> Subject: Re: indexing question related to playOrm on github >> >> 1. Can playOrm be listed on cassandra's list of ORMs? It supports a >> JQL/HQL query on a trillion rows in under 100ms (partitioning is the >> trick so you can JQL a partition) >> No sure if we have an ORM specific page. 
If it's a client then feel free >> to add it to http://wiki.apache.org/cassandra/ClientOptions >> >> I was wondering if cassandra has or will ever support eventual constancy >> where it keeps both the REMOVE AND the ADD together such until it is on >> all 3 replicated nodes and in resolving the consistency would end up with >> an index that only has the very last one in the index. >> Not sure I fully understand but it sounds like you want a transaction, >> which is not going to happen. >> >> Internally when Cassandra updates a secondary index it does the same >> thing. But it synchronises updates around the same row so one thread will >> apply the changes at a time. >> >> Hope that helps. >> - >> Aaron Morton >> Freelance Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 16/08/2012, at 12:34 PM, "Hiller, Dean" >> mailto:dean.hil...@nrel.gov>> wrote: >> >> 1. Can playOrm be listed on cassandra&
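A rough pycassa sketch of resolving the inconsistency on read, as described above (all names are assumptions; queue_index_repair is a hypothetical helper, and a real implementation would also pin the delete timestamps as suggested):

from pycassa import NotFoundException
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
activity = ColumnFamily(pool, 'Activity')
name_index = ColumnFamily(pool, 'ActivityByName')  # composite columns of (indexed_value, pk)

def lookup_by_name(name):
    hits, stale = [], []
    for col_name, _ in name_index.xget('by_name'):
        idx_value, pk = col_name
        if idx_value != name:
            continue
        try:
            row = activity.get(str(pk), columns=['name'])
        except NotFoundException:
            row = {}
        if row.get('name') == name:
            hits.append(pk)
        else:
            # False positive: the index still carries an old entry for this pk.
            stale.append(col_name)
    if stale:
        # Resolve on the client and repair later: re-read the row and delete the
        # stale index columns, using a delete timestamp no newer than this read
        # so a concurrent legitimate re-add is not clobbered.
        queue_index_repair('by_name', stale)  # hypothetical
    return hits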
Re: nodetool repair uses insane amount of disk space
What version are you using? There were issues with repair using lots-o-space in 0.8.X; it's fixed in 1.X. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 2:56 AM, Michael Morris wrote: > Occasionally as I'm doing my regular anti-entropy repair I end up with a node > that uses an exceptional amount of disk space (node should have about 5-6 GB > of data on it, but ends up with 25+GB, and consumes the limited amount of > disk space I have available) > > How come a node would consume 5x its normal data size during the repair > process? > > My setup is kind of strange in that it's only about 80-100GB of data on a 35 > node cluster, with 2 data centers and 3 racks, however the rack assignments > are unbalanced. One data center has 8 nodes, and the other data center is > split into 2 racks with one rack of 9 nodes, and the other with 18 nodes. > However, within each rack, the tokens are distributed equally. It's a long > sad story about how we ended up this way, but it basically boils down to > having to utilize existing resources to resolve a production issue. > > Additionally, the repair process takes (what I feel is) an extremely long > time to complete (36+ hours), and it always seems that nodes are streaming > data to each other, even on back-to-back executions of the repair. > > Any help on these issues is appreciated. > > - Mike >
Re: Opscenter 2.1 vs 1.3
You may have better luck on the Data Stax forums http://www.datastax.com/support-forums/ Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 4:36 AM, Robin Verlangen wrote: > Hi there, > > I just upgraded to opscenter 2.1 (from 1.3). It appears that my writes have > tripled. Is this a change in the display/measuring of opscenter? > > > Best regards, > > Robin Verlangen > Software engineer > > W http://www.robinverlangen.nl > E ro...@us2.nl > > Disclaimer: The information contained in this message and attachments is > intended solely for the attention and use of the named addressee and may be > confidential. If you are not the intended recipient, you are reminded that > the information remains the property of the sender. You must not use, > disclose, distribute, copy, print or rely on this e-mail. If you have > received this message in error, please contact the sender immediately and > irrevocably delete this message and any copies. >
Re: C++ Bulk loader and Result set streaming.
> But i couldn't find any information on bulk loading using C++ client > interface. You cannot. To bulk load data use the sstableloader, otherwise you need to use the RPC / CQL API. > 2) I want to retrieve all the result of the query(not just first 100 result > set) using C++ client. Is there any C++ supporting code or information on > streaming the result set into a file or something. I've not looked at the C++ client, but normally you use the last column returned as the start column for the next call. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 6:08 AM, Swathi Vikas wrote: > Hi All, > > I am using C++ client libQtCassandra. I have two questions. > > 1) I want to bulk load data into cassandra through C++ interface. It is > required by my group where i am doing internship. I could bulk load using > sstableloader as specified in Datastax > :http://www.datastax.com/dev/blog/bulk-loading. But i couldn't find any > information on bulk loading using C++ client interface. > > 2) I want to retrieve all the result of the query(not just first 100 result > set) using C++ client. Is there any C++ supporting code or information on > streaming the result set into a file or something. > > If anyone has any information please direct me where i can look into. > > Thank you very much, > Swat.vikas
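The paging pattern, shown with pycassa for illustration (the C++ client calls will differ, but the idea is the same; keyspace and CF names are made up, and the row is assumed to exist):

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
cf = ColumnFamily(pool, 'MyCF')

def all_columns(key, page_size=100):
    start = ''
    while True:
        page = cf.get(key, column_start=start, column_count=page_size)
        names = list(page.keys())
        for name in names:
            # Every page after the first starts with the last column of the
            # previous page (it was used as the new start), so skip it.
            if start != '' and name == start:
                continue
            yield name, page[name]
        if len(names) < page_size:
            break
        start = names[-1]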
Re: 'WHERE' with several indexed columns
> If I have a WHERE clause in CQL with several 'AND' and each column is > indexed, which index(es) is(are) used ? The most selective one is used, based on the average number of columns per row: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/index/keys/KeysSearcher.java > Also is index used only with an equality operator or also with greater > than /less than comparator as well ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 7:13 AM, A J wrote: > Hi > If I have a WHERE clause in CQL with several 'AND' and each column is > indexed, which index(es) is(are) used ? > Just the first field in the where clause or all the indexes involved > in the clause ? > > Also is index used only with an equality operator or also with greater > than /less than comparator as well ? > > Thanks.
Re: Why the StageManager thread pools have 60 seconds keepalive time?
That's some pretty old code. I would guess it was done that way to conserve resources. And _i think_ thread creation is pretty light weight. Jonathan / Brandon / others - opinions ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 8:09 AM, Guillermo Winkler wrote: > Hi, I have a cassandra cluster where I'm seeing a lot of thread trashing from > the mutation pool. > > MutationStage:72031 > > Where threads get created and disposed in 100's batches every few minutes, > since it's a 16 core server concurrent_writes is set in 100 in the > cassandra.yaml. > > concurrent_writes: 100 > > I've seen in the StageManager class this pools get created with 60 seconds > keepalive time. > > DebuggableThreadPoolExecutor -> allowCoreThreadTimeOut(true); > > StageManager-> public static final long KEEPALIVE = 60; // seconds to keep > "extra" threads alive for when idle > > Is it a reason for it to be this way? > > Why not have a fixed size pool with Integer.MAX_VALUE as keepalive since > corePoolSize and maxPoolSize are set at the same size? > > Thanks, > Guille >
Re: nodetool repair uses insane amount of disk space
I would take a look at the replication: what's the RF per DC and what does nodetool ring say? It's hard (as in not recommended) to get NTS with rack allocation working correctly. Without knowing much more I would try to understand what the topology is and if it can be simplified. >> Additionally, the repair process takes (what I feel is) an extremely long >> time to complete (36+ hours), and it always seems that nodes are streaming >> data to each other, even on back-to-back executions of the repair. Run some metrics to clock the network IO during repair. Also run an experiment to repair a single CF twice from the same node and look at the logs for the second run. This will give us an idea of how much data is being transferred. Note that very wide rows can result in large repair transfers as the whole row is diff'd and transferred if needed. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 11:14 AM, Michael Morris wrote: > Upgraded to 1.1.3 from 1.0.8 about 2 weeks ago. > > On Thu, Aug 16, 2012 at 5:57 PM, aaron morton wrote: > What version are you using? There were issues with repair using lots-o-space in > 0.8.X; it's fixed in 1.X > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 17/08/2012, at 2:56 AM, Michael Morris wrote: > >> Occasionally as I'm doing my regular anti-entropy repair I end up with a >> node that uses an exceptional amount of disk space (node should have about >> 5-6 GB of data on it, but ends up with 25+GB, and consumes the limited >> amount of disk space I have available) >> >> How come a node would consume 5x its normal data size during the repair >> process? >> >> My setup is kind of strange in that it's only about 80-100GB of data on a 35 >> node cluster, with 2 data centers and 3 racks, however the rack assignments >> are unbalanced. One data center has 8 nodes, and the other data center is >> split into 2 racks with one rack of 9 nodes, and the other with 18 nodes. >> However, within each rack, the tokens are distributed equally. It's a long >> sad story about how we ended up this way, but it basically boils down to >> having to utilize existing resources to resolve a production issue. >> >> Additionally, the repair process takes (what I feel is) an extremely long >> time to complete (36+ hours), and it always seems that nodes are streaming >> data to each other, even on back-to-back executions of the repair. >> >> Any help on these issues is appreciated. >> >> - Mike >> > >
Re: Cassandra 1.0 row deletion
> If you use the remove function to delete an entire row, is that an atomic > operation? Yes. Row level deletes are atomic. cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 3:39 PM, Derek Williams wrote: > On Thu, Aug 16, 2012 at 9:08 PM, Terry Cumaranatunge > wrote: > We have a Cassandra 1.0 cluster that we run with RF=3 and perform operations > using a consistency level of quorum. We use batch_mutate for all inserts and > updates for atomicity across column families with the same row key, but use > the thrift interface remove API call in C++ to delete a row so that we can > delete an entire row without having to specify individual column names. If > you use the remove function to delete an entire row, is that an atomic > operation? In other words, can it delete a partial number of columns in the > row and leave other columns around? > > It all depends on the timestamp for the column. A row level delete will place > a row tombstone at the timestamp given, causing all columns with an earlier > timestamp to be deleted. If a column has a later timestamp then the row > tombstone, then it wont be deleted. > > More info here: http://wiki.apache.org/cassandra/DistributedDeletes > > -- > Derek Williams >
Re: SSTable Index and Metadata - are they cached in RAM?
> 2) Rad from disk all row keys, in order to find one (binary search) No. At startup cassandra samples the -index.db component every index_interval keys. At worst index_interval keys must be read from disk. > As I understand, in the worst case, we can have three disk seeks (2, 4, 6) > pro SSTable in order to check whenever it contains given column, it that > correct ? It depends on the size of the row. For a small row (less than column_index_size_in_kb), getting a specific column takes: * 1 seek in index.db * 1 seek in data.db > I would expect, that sorted row keys (from point 2) ) already contain bloom > filter for their columns. But bloom filter is stored together with column > index, is that correct? Yes Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 7:31 PM, Maciej Miklas wrote: > Great articles, I did not find those before ! > > SSTable Index - yes I mean column Index. > > I would like to understand, how many disk seeks might be required to find > column in single SSTable. > > I am assuming positive bloom filter on row key. Now Cassandra needs to find > out whenever given SSTable contains column name, and this might require few > disk seeks: > 1) Check key cache, if found go to 5) > 2) Rad from disk all row keys, in order to find one (binary search) > 3) Found row key contains disk offset to its column index > 4) Read from disk column index for our row key. Index contains also bloom > filter on column names > 5) Use bloom filter on column name, to find out whenever this SSTable might > contain our column > 6) Read column to finally make sure that is exists > > As I understand, in the worst case, we can have three disk seeks (2, 4, 6) > pro SSTable in order to check whenever it contains given column, it that > correct ? > > I would expect, that sorted row keys (from point 2) ) already contain bloom > filter for their columns. But bloom filter is stored together with column > index, is that correct? > > > Cheers, > Maciej > > On Fri, Aug 17, 2012 at 12:06 AM, aaron morton > wrote: >> What about SSTable index, > Not sure what you are referring to there. Each row in an SSTable has a > bloom filter and may have an index of columns. This is not cached. > > See http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ or > http://www.slideshare.net/aaronmorton/cassandra-sf-2012-technical-deep-dive-query-performance > >> and Metadata? > > This is the meta data we hold in memory for every open sstable > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableMetadata.java > > Cheers > > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 16/08/2012, at 7:34 PM, Maciej Miklas wrote: > >> Hi all, >> >> bloom filter for row keys is always in RAM. What about SSTable index, and >> Metadata? >> >> Is it cached by Cassandra, or it relays on memory mapped files? >> >> >> Thanks, >> Maciej > >
Re: Omitting empty columns from CQL SELECT
If you specify the columns by name in the select clause the query returns them because they should be projected in the result set. Can you use a column slice instead ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 11:09 AM, Mat Brown wrote: > Hello all, > > I've noticed that when performing a SELECT statement with a list of > columns specified, Cassandra returns all columns in the resulting > row(s) even if they have no value. This creates an apparently > considerable amount of transport and deserialization overhead, > particularly in one use case I'm looking at, in which we select a > large collection of columns but expect only a small fraction of them > to contain values. Is there any way to get around this and only > receive columns that have values in the results? > > Thanks, > Mat
Re: What is the ideal server-side technology stack to use with Cassandra?
My stack: Java + JRuby + Rails + Torquebox I'm using the Hector client (arguably the most mature out there) and JRuby+RoR+Torquebox gives me a great development platform which really scales (full native thread support for example) and is extremely powerful. Honestly I expect, all my future RoR apps will be built on JRuby/Torquebox because I've been so happy with it even if I don't have a specific need to utilize Java libraries from inside the app. And the best part is that I've yet to have to write a single line of Java! :) On Fri, Aug 17, 2012 at 6:53 AM, Edward Capriolo wrote: > The best stack is the THC stack. :) > > Tomcat Hadoop Cassandra :) > > On Fri, Aug 17, 2012 at 6:09 AM, Andy Ballingall TF > wrote: >> Hi, >> >> I've been running a number of tests with Cassandra using a couple of >> PHP drivers (namely PHPCassa (https://github.com/thobbs/phpcassa/) and >> PDO-cassandra (http://code.google.com/a/apache-extras.org/p/cassandra-pdo/), >> and the experience hasn't been great, mainly because I can't try out >> the CQL3. >> >> Aaron Morton (aa...@thelastpickle.com) advised: >> >> "If possible i would avoid using PHP. The PHP story with cassandra has >> not been great in the past. There is little love for it, so it takes a >> while for work changes to get in the client drivers. >> >> AFAIK it lacks server side states which makes connection pooling >> impossible. You should not pool cassandra connections in something >> like HAProxy." >> >> So my question is - if you were to build a new scalable project from >> scratch tomorrow sitting on top of Cassandra, which technologies would >> you select to serve HTTP requests to ensure you get: >> >> a) The best support from the cassandra community (e.g. timely updates >> of drivers, better stability) >> b) Optimal efficiency between webservers and cassandra cluster, in >> terms of the performance of individual requests and in the volumes of >> connections handled per second >> c) Ease of development and and deployment. >> >> What worked for you, and why? What didn't work for you? -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin "carpe diem quam minimum credula postero"
Re: Omitting empty columns from CQL SELECT
> there a situation in which that behavior would be useful? guessing, makes life easier to client implementations and is consistent in the sense that when doing a slice by name the server is the entity that decides which columns are in the result set. I took a look at the performance of various query techniques here http://www.slideshare.net/aaronmorton/cassandra-sf-2012-technical-deep-dive-query-performance and http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ . If you are selecting by name on wide rows you will eventually see latency increase. This simply has to do with the amount of data pages that must be read to satisfy the query. Note though that this is better in 1.X . See slide 61 in the first link and "In Motion - Name Locality" section in the second. Hope that helps. ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 10:07 PM, Mat Brown wrote: > Hi Aaron, > > Thanks for the answer. That makes sense and I can see it as a formal > reason for returning empty columns, but as a practical matter, is > there a situation in which that behavior would be useful? > > Unfortunately a column slice won't do the trick -- the columns we're > looking for at any given time wouldn't correspond to a particular > range; it's essentially "random access". > > For what it's worth, I've managed to make this operation about 30x > faster in a quick benchmark by just not selecting for specific columns > at all, and throwing away columns I don't care about in the > application layer instead. It's unclear whether the performance > improvements will continue to accrue as the column family becomes more > densely populated, though. > > Anyway, thanks again! > Mat > > On Fri, Aug 17, 2012 at 5:06 AM, aaron morton wrote: >> If you specify the columns by name in the select clause the query returns >> them because they should be projected in the result set. >> >> Can you use a column slice instead ? >> >> Cheers >> >> - >> Aaron Morton >> Freelance Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 17/08/2012, at 11:09 AM, Mat Brown wrote: >> >> Hello all, >> >> I've noticed that when performing a SELECT statement with a list of >> columns specified, Cassandra returns all columns in the resulting >> row(s) even if they have no value. This creates an apparently >> considerable amount of transport and deserialization overhead, >> particularly in one use case I'm looking at, in which we select a >> large collection of columns but expect only a small fraction of them >> to contain values. Is there any way to get around this and only >> receive columns that have values in the results? >> >> Thanks, >> Mat >> >>
Re: Bad Request: Duplicate index name
Can you provide: * the CF existing schema and output from nodetool cfstats * the command you are running * the error you get Also it's handy to know what version the schema was originally created in. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 18/08/2012, at 12:54 AM, Abhijit Chanda wrote: > Hi all, > > I am using a CF which is of 50 columns X 1M rows. As per my requirement > suddenly i have > to add a new column into the CF. After creating the new column when i tried > to index that > particular column, its reflecting bad request. > I have gone through this link > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Default-behavior-of-generate-index-name-for-columns-td6586428.html > But Still not able to figure out the root of the problem? I am using > Cassandra 1.0.10. > Any suggestions > > Thanks, > Abhijit >
Re: indexing question related to playOrm on github
Each column mutation (insert / update or delete) includes an int64 timestamp. Typically this is sent by the client, or in the case of CQL typically it is set by the coordinating server. When we have multiple values for a column we compare timestamps, the higher timestamp wins and deletes win if the timestamps match. (The final comparison is between the byte value of the columns). Sometimes you can game this system. If you want to delete a column with that has timestamp 1, send a delete for the column with timestamp 1. That way you delete will be ignored if someone else has re-written with timestamp 2. Remember I said "sometimes". Playing with timestamps often leads to questions such as "why did my inserts not work". Cheers ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 18/08/2012, at 1:30 AM, "Hiller, Dean" wrote: > I am not sure what you mean by play with the timestamp. I think this works > without playing with the timestamp(thanks for you help as it got me here). > > 1. On a scan I hit > 2. I end up looking up the pk > 3. I compare the value in the row with the indexed value "mike" but I see > the row with that pk has Sam not Mike > 4. I now know I can discard this result as a false positive. I also know my > index has duplicates. > 5. I kick off a job to scan the complete index now AND read in each pk row > of the index comparing indexed value with the actual value in the row to fix > the index. > > I think that might work pretty well. > > Thanks, > Dean > > From: aaron morton mailto:aa...@thelastpickle.com>> > Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" > mailto:user@cassandra.apache.org>> > Date: Thursday, August 16, 2012 4:55 PM > To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" > mailto:user@cassandra.apache.org>> > Subject: Re: indexing question related to playOrm on github > > I am not sure synchronization fixes thatŠŠIt would be kind of > nice if the column <65> would not actually be removed until after > all servers are eventually consistent... > Not sure thats possible. > > You can either serialise updating your custom secondary index on the client > site or resolve the inconsistency on read. > > Not sure this fits with your workload but as an e.g. when you read from the > index, if you detect multiple row PK's resolve the issue on the client and > leave the data in cassandra as is. Then queue a job that will read the row > and try to repair it's index entries. When repairing the index entry play > with the timestamp so any deletions you make only apply to the column as it > was when you saw the error. > > Hope that helps. > > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 17/08/2012, at 12:47 AM, "Hiller, Dean" > mailto:dean.hil...@nrel.gov>> wrote: > > Maybe this would be a special type of column family that could contain > these as my other tables definitely don't want the feature below by the > way. > > Dean > > On 8/16/12 6:29 AM, "Hiller, Dean" > mailto:dean.hil...@nrel.gov>> wrote: > > Yes, the synch may work, and no, I do "not" want a transactionŠI want a > different kind of eventually consistent > > That might work. > Let's say server 1 sends a mutation (65 is the pk) > Remove: <65> Add <65> > Server 2 also sends a mutation (65 is the pk) > Remove: <65> Add <65> > > What everyone does not want is to end up with a row that has <65> > and <65>. With the wide row pattern, we would like to have ONE or > the other. 
I am not sure synchronization fixes that… It would be kind of > nice if the column <65> would not actually be removed until after > all servers are eventually consistent AND would keep a reference to the > add that was happening so that when it goes to resolve eventual > consistency between the servers, it would see that <65> is newer and > it would decide to drop the first add completely. > > I.e. in a full process it might look like this > Cassandra node 1 receives remove <65>, add <65> AND in the > remove column stores info about the add <65> until eventual > consistency is completed > Cassandra node 2 one ms later receives remove <65> and <65> > AND in the remove column stores info about the add <65> until > eventual consistency is completed > Eventual consistency starts comparing node 1 and node 2 and finds > <65> is being removed by
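To make the reconciliation rule at the top of this thread concrete, here is a minimal sketch of the resolution logic: highest timestamp wins, a delete wins a timestamp tie, and otherwise the raw byte values break the tie. The ColumnVersion class and its fields are illustrative stand-ins, not Cassandra's internal types.

// Illustrative stand-in for one version of a column; not Cassandra's internal class.
class ColumnVersion {
    final long timestamp;
    final byte[] value;
    final boolean isDelete; // tombstone marker

    ColumnVersion(long timestamp, byte[] value, boolean isDelete) {
        this.timestamp = timestamp;
        this.value = value;
        this.isDelete = isDelete;
    }

    // Returns the winner between two versions of the same column.
    static ColumnVersion reconcile(ColumnVersion a, ColumnVersion b) {
        // 1. The higher timestamp wins outright.
        if (a.timestamp != b.timestamp)
            return a.timestamp > b.timestamp ? a : b;
        // 2. On a timestamp tie, a delete beats a live column.
        if (a.isDelete != b.isDelete)
            return a.isDelete ? a : b;
        // 3. Otherwise compare the raw byte values of the columns.
        return compareUnsigned(a.value, b.value) >= 0 ? a : b;
    }

    private static int compareUnsigned(byte[] x, byte[] y) {
        for (int i = 0; i < Math.min(x.length, y.length); i++) {
            int cmp = Integer.compare(x[i] & 0xff, y[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(x.length, y.length);
    }
}

Under this rule you can see why sending a delete with the same timestamp as the insert you want to remove only "sometimes" works: a concurrent re-write at timestamp 2 beats the delete at timestamp 1, so the delete only removes the exact version that was seen.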
Re: Why the StageManager thread pools have 60 seconds keepalive time?
You're seeing dropped mutations reported from nodetool tpstats? Take a look at the logs. Look for messages from the MessagingService with the pattern "{} {} messages dropped in last {}ms". They will be followed by info about the TP stats. First would be the workload. Are you sending very big batch_mutate or multiget requests? Each row in the requests turns into a command in the appropriate thread pool. This can result in other requests waiting a long time for their commands to get processed. Next would be looking for GC and checking the memtable_flush_queue_size is set high enough (check yaml for docs). After that I would look at winding concurrent_writers (and I assume concurrent_readers) back. Anytime I see weirdness I look for config changes and see what happens when they are returned to the default or near default. Do you have 16 _physical_ cores? Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 18/08/2012, at 10:01 AM, Guillermo Winkler wrote: > Aaron, thanks for your answer. > > I'm actually tracking a problem where mutations get dropped and cfstats show > no activity whatsoever, I have 100 threads for the mutation pool, no running > or pending tasks, but some mutations get dropped nonetheless. > > I'm thinking about some scheduling problems but not really sure yet. > > Have you ever seen a case of dropped mutations with the system under light > load? > > Thanks, > Guille > > > On Thu, Aug 16, 2012 at 8:22 PM, aaron morton wrote: > That's some pretty old code. I would guess it was done that way to conserve > resources. And _I think_ thread creation is pretty light weight. > > Jonathan / Brandon / others - opinions ? > > Cheers > > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 17/08/2012, at 8:09 AM, Guillermo Winkler wrote: > >> Hi, I have a cassandra cluster where I'm seeing a lot of thread thrashing >> from the mutation pool. >> >> MutationStage:72031 >> >> Where threads get created and disposed in batches of 100s every few minutes, >> since it's a 16 core server concurrent_writes is set to 100 in the >> cassandra.yaml. >> >> concurrent_writes: 100 >> >> I've seen in the StageManager class these pools get created with 60 seconds >> keepalive time. >> >> DebuggableThreadPoolExecutor -> allowCoreThreadTimeOut(true); >> >> StageManager-> public static final long KEEPALIVE = 60; // seconds to keep >> "extra" threads alive for when idle >> >> Is there a reason for it to be this way? >> >> Why not have a fixed size pool with Integer.MAX_VALUE as keepalive since >> corePoolSize and maxPoolSize are set at the same size? >> >> Thanks, >> Guille >> > >
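For anyone who wants to see the keepalive behaviour from the question above in isolation, here is a small standalone sketch using the plain JDK ThreadPoolExecutor (not Cassandra's DebuggableThreadPoolExecutor). With core size equal to max size and allowCoreThreadTimeOut(true), idle threads are discarded after the keepalive and recreated on the next burst, which is what shows up as ever-increasing MutationStage thread numbers. The pool sizes and the sleep are only for illustration.

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class KeepAliveDemo {
    public static void main(String[] args) throws InterruptedException {
        // Core size == max size, like concurrent_writes for the mutation stage.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                100, 100,             // corePoolSize, maximumPoolSize
                60, TimeUnit.SECONDS, // keepalive for idle threads
                new LinkedBlockingQueue<Runnable>());

        // The same setting StageManager applies: idle core threads may time out.
        pool.allowCoreThreadTimeOut(true);

        // A burst of work spins threads up...
        for (int i = 0; i < 1000; i++)
            pool.execute(() -> { /* pretend to apply a mutation */ });
        System.out.println("threads after burst: " + pool.getPoolSize());

        // ...and once the keepalive expires with no work, they are torn down,
        // so the next burst has to create (and number) new threads again.
        Thread.sleep(61_000);
        System.out.println("threads after idle:  " + pool.getPoolSize());

        pool.shutdown();
    }
}

Not enabling allowCoreThreadTimeOut, or using an effectively infinite keepalive, keeps the 100 threads alive between bursts at the cost of holding their stacks while idle; that is the trade-off behind the original question.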
Re: What is the ideal server-side technology stack to use with Cassandra?
> Aaron Morton (aa...@thelastpickle.com) advised: > > "If possible I would avoid using PHP. The PHP story with cassandra has > not been great in the past. There is little love for it, so it takes a > while for changes to get into the client drivers. > > AFAIK it lacks server side state which makes connection pooling > impossible. You should not pool cassandra connections in something > like HAProxy." Please note, this was a personal opinion expressed off list. It is not a judgement on the quality of PHPCassa or PDO-cassandra, neither of which I have used. My comments were mostly informed by past issues with Thrift and PHP. Aaron - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 10:09 PM, Andy Ballingall TF wrote: > Hi, > > I've been running a number of tests with Cassandra using a couple of > PHP drivers (namely PHPCassa (https://github.com/thobbs/phpcassa/) and > PDO-cassandra (http://code.google.com/a/apache-extras.org/p/cassandra-pdo/), > and the experience hasn't been great, mainly because I can't try out > CQL3. > > Aaron Morton (aa...@thelastpickle.com) advised: > > "If possible I would avoid using PHP. The PHP story with cassandra has > not been great in the past. There is little love for it, so it takes a > while for changes to get into the client drivers. > > AFAIK it lacks server side state which makes connection pooling > impossible. You should not pool cassandra connections in something > like HAProxy." > > So my question is - if you were to build a new scalable project from > scratch tomorrow sitting on top of Cassandra, which technologies would > you select to serve HTTP requests to ensure you get: > > a) The best support from the cassandra community (e.g. timely updates > of drivers, better stability) > b) Optimal efficiency between webservers and cassandra cluster, in > terms of the performance of individual requests and in the volumes of > connections handled per second > c) Ease of development and deployment. > > What worked for you, and why? What didn't work for you? > > > Thanks, > Andy > > > -- > Andy Ballingall > Senior Software Engineer > > The Foundry > 6th Floor, The Communications Building, > 48, Leicester Square, > London, WC2H 7LT, UK > Tel: +44 (0)20 7968 6828 - Fax: +44 (0)20 7930 8906 > Web: http://www.thefoundry.co.uk/ > > The Foundry Visionmongers Ltd. > Registered in England and Wales No: 4642027
Re: Secondary index and/or row key in the read path ?
> - do we need to post-process (filter) the result of the query in our > application ? That's the one :) Right now the code paths don't exist to select a row using a row key *and* apply a column level filter. The RPC API does not work that way and I'm not sure if this is something that is planned for CQL. There is a rough client-side sketch of that filtering approach after the quoted thread below. Cheers ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 20/08/2012, at 6:33 PM, Jean-Armel Luce wrote: > > Hello, > > I am using Cassandra 1.1.1 and CQL3. > > Could you tell me what is the best strategy for retrieving a row using a > condition on a row key (operator =) and also filter on a 2nd column? > > For example, I create a table named "testwhere" with a row key on column > "mykey" and 2 other columns "col1" and "col2". > > I would like to retrieve the row with the key 'key1' only if col1 = 'abcd' > I send the request SELECT mykey, col1 from testwhere where mykey = 'key1' > and col1 = 'abcd'; > As you can see, the 1st condition in the WHERE clause is based on the row key. > However the request doesn't work if no secondary index is created on the > column used in the 2nd condition of the WHERE clause. It works only if a > secondary index is created on this 2nd column (see below). > Does that mean that the secondary index is used in the read path instead of > the row key, even if there is a condition on the row key in the WHERE clause ? > > Here is an example : > > jal@jal-VirtualBox:~/cassandra/apache-cassandra-1.1.1/bin$ ./cqlsh -3 > Connected to Test Cluster at localhost:9160. > [cqlsh 2.2.0 | Cassandra 1.1.1 | CQL spec 3.0.0 | Thrift protocol 19.32.0] > Use HELP for help. > cqlsh> use test1; > cqlsh:test1> CREATE TABLE testwhere (mykey varchar PRIMARY KEY, > ... col1 varchar, > ... col2 varchar); > cqlsh:test1> INSERT INTO testwhere (mykey, col1, col2) VALUES ('key1', > 'abcd', 'efgh'); > > cqlsh:test1> SELECT mykey, col1 from testwhere where mykey = 'key1'; > mykey | col1 > ---+-- > key1 | abcd > > cqlsh:test1> SELECT mykey, col1 from testwhere where mykey = 'key1' and col1 > = 'abcd'; > Bad Request: No indexed columns present in by-columns clause with Equal > operator > > cqlsh:test1> CREATE INDEX col1_idx ON testwhere (col1); > cqlsh:test1> SELECT mykey, col1 from testwhere where mykey = 'key1' and col1 > = 'abcd'; > mykey | col1 > ---+-- > key1 | abcd > > cqlsh:test1> > > > My understanding is : > The 1st SELECT is working because there is only the row key in the WHERE > clause > The 2nd SELECT is not working because the row key is in the WHERE clause, but > there is no index on col1 > The 3rd SELECT (which is the same as the 2nd SELECT) is working because the > row key is in the WHERE clause, and a secondary index is created on col1 > > > For this use case, what are the recommendations of the Cassandra community ? > - do we need to create a secondary index for each column we want to filter ? > - do we need to post-process (filter) the result of the query in our > application ? > - or is there another solution ? > > > Thanks. > > Jean-Armel
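Until the read path supports a key lookup combined with a column-value filter, the usual workaround is to fetch the row by its key and apply the predicate in the application. A rough sketch of that, using a hypothetical client wrapper rather than any particular driver API (SimpleClient, fetchRow and findByKeyAndCol1 are all invented names for illustration):

import java.util.Collections;
import java.util.Map;

// Hypothetical minimal client interface; substitute your driver's call
// (Hector, a raw Thrift client, a CQL driver, ...) for fetchRow().
interface SimpleClient {
    // Returns the named columns of one row, or an empty map if the row is missing.
    Map<String, String> fetchRow(String columnFamily, String rowKey, String... columns);
}

class FilteredLookup {
    private final SimpleClient client;

    FilteredLookup(SimpleClient client) {
        this.client = client;
    }

    // Equivalent of: SELECT mykey, col1 FROM testwhere
    //                WHERE mykey = 'key1' AND col1 = 'abcd'
    // done as a key lookup plus a client-side check, so no secondary
    // index on col1 is needed.
    Map<String, String> findByKeyAndCol1(String rowKey, String expectedCol1) {
        Map<String, String> row = client.fetchRow("testwhere", rowKey, "col1", "col2");
        String col1 = row.get("col1");
        if (col1 == null || !col1.equals(expectedCol1))
            return Collections.emptyMap(); // key exists but col1 does not match
        return row;
    }
}

The row is still located by its key, so this stays a single cheap read; the extra equality check in the application costs almost nothing compared with maintaining a secondary index that is only ever queried together with the row key.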
Re: Cassandra with large number of columns per row
> I think the limit of the size per row in cassandra is 2G? That was a pre 0.7 restriction http://wiki.apache.org/cassandra/CassandraLimitations > and I insert 10000 columns into a row, each column has 1MB of data. So a single row with 10GB of data. That's what we call a big one. > /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/socket.rb:109:in > `read': CassandraThrift::Cassandra::Client::TransportException > from > /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/base_transport.rb:87:in > `read_all' I would expect to see a string description there as well. Check on the server side error logs. > this script crashed again, same error message. And the cassandra process remains > at 100% cpu usage. Counting columns involves reading them, so you are asking cassandra to read 10GB of data. This will take a while. It's probably the size of the row that is causing problems. You can easily have rows with millions of columns (here is an experiment that uses 10MM cols in a row http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/) In general you will want to avoid rows with more than say 32 or 64 MB of data. It's not a hard restriction but big rows cause issues and it's often easier to avoid them. There is a small bucketing sketch after this thread showing one way to keep individual rows under that size. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 20/08/2012, at 8:15 PM, Chuan-Heng Hsiao wrote: > I think the limit of the size per row in cassandra is 2G? > > 10000 x 1M = 10G. > > Hsiao > > On Mon, Aug 20, 2012 at 1:07 PM, oupfevph wrote: > I setup cassandra with default configuration in a clean AWS instance, and I > insert 10000 columns into a row, each column has 1MB of data. I use this > ruby (version 1.9.3) script: > > 10000.times do > key = rand(36**8).to_s(36) > value = rand(36**1024).to_s(36) * 1024 > Cas_client.insert(TestColumnFamily,TestRow,{key=>value}) > end > > every time I run this script, it will crash: > > /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/socket.rb:109:in > `read': CassandraThrift::Cassandra::Client::TransportException > from > /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/base_transport.rb:87:in > `read_all' > from > /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/framed_transport.rb:104:in > `read_frame' > from > /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/framed_transport.rb:69:in > `read_into_buffer' > from > /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/client.rb:45:in > `read_message_begin' > from > /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/client.rb:45:in > `receive_message' > from > /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/vendor/0.8/gen-rb/cassandra.rb:251:in > `recv_batch_mutate' > from > /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/vendor/0.8/gen-rb/cassandra.rb:243:in > `batch_mutate' > from > /usr/local/lib/ruby/gems/1.9.1/gems/thrift_client-0.8.1/lib/thrift_client/abstract_thrift_client.rb:150:in > `handled_proxy' from > /usr/local/lib/ruby/gems/1.9.1/gems/thrift_client-0.8.1/lib/thrift_client/abstract_thrift_client.rb:60:in > `batch_mutate' > from > /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/lib/cassandra/protocol.rb:7:in > `_mutate' > from > /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/lib/cassandra/cassandra.rb:463:in > `insert' > from a.rb:6:in `block in ' > from a.rb:3:in `times' > from a.rb:3:in `' > > yet cassandra performs normally, then I run another ruby script to get how > many columns I have inserted: > > p cas_client.count_columns(TestColumnFamily,TestRow) > > this script crashed again, same error message. And the cassandra process remains > at 100% cpu usage. > > > AWS m1.xlarge type instance (15GB mem, 800GB harddisk, 4 cores cpu) > cassandra-1.1.2 > ruby-1.9.3-p194 > jdk-7u6-linux-x64 > ruby-gems: > cassandra (0.15.0) > thrift (0.8.0) > thrift_client (0.8.1) > > What is the problem? > >
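The "keep rows under a few tens of MB" advice above usually translates into bucketing: spread the columns of one logical row across several physical rows and fan reads out over the buckets. A minimal sketch of the key scheme only (the class name, bucket size, and key format are illustrative, not a Cassandra or playOrm API):

// Spread N large columns over several physical rows so no single row
// holds much more than bucketSize * columnSize bytes.
class BucketedKey {
    private final String baseKey;
    private final int bucketSize; // e.g. 32 columns of ~1MB => ~32MB per physical row

    BucketedKey(String baseKey, int bucketSize) {
        this.baseKey = baseKey;
        this.bucketSize = bucketSize;
    }

    // Physical row key to use for the i-th column of the logical row.
    String rowKeyFor(int columnIndex) {
        return baseKey + ":" + (columnIndex / bucketSize);
    }

    // Number of physical rows needed for columnCount columns.
    int bucketCount(int columnCount) {
        return (columnCount + bucketSize - 1) / bucketSize;
    }
}

With 10000 x 1MB columns and a bucket size of 32, writes target rowKeyFor(i) instead of one 10GB row, and a column count becomes the sum of many small per-bucket counts rather than a single read that drags the whole row through the server.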
Re: Best strategy to increase cluster size and keep nodes balanced
Unless you really need 7, consider moving to 6 instead; it will be easier. That said, if you want to get to 7 I would: * bring the new nodes in with tokens selected for 7. * move the old nodes to the new 7-node tokens. * run cleanup on the old nodes. There is a way to expedite things by copying files around, _but_ if this is the first time you have grown the ring I would do it the normal documented way. I would make the changes one node at a time. There are a couple of reasons, but here are two: * if something goes wrong the impact will be less and the cleanup will be less. IMHO with a cluster of machines it's a good idea to make changes one at a time, so that if / when things go wrong you get the benefit of the other machines working. * bootstrapping new nodes will have some performance impact on the existing nodes. With fewer existing nodes it will have a larger impact. There is a small sketch for computing the 7-node tokens after the quoted message below. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 21/08/2012, at 12:57 AM, Filippo Diotalevi wrote: > What's the best strategy to increase the cluster size from 3 nodes to 7 > nodes, while keeping the nodes balanced? > The datastax documentation at > http://www.datastax.com/docs/1.0/operations/cluster_management seems to > suggest that it's best practice to add one node at a time. Is it the only > approach? > > Can we add 4 new nodes, reset the tokens, and expect the cluster to rebalance > correctly? > > Thanks, > -- > Filippo Diotalevi > >
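A minimal sketch for computing those evenly spaced tokens, assuming the default RandomPartitioner whose token range runs from 0 to 2^127 (double-check the output against whatever token tool you normally use before running any nodetool move):

import java.math.BigInteger;

public class InitialTokens {
    public static void main(String[] args) {
        int nodeCount = 7; // target ring size
        BigInteger ringSize = BigInteger.valueOf(2).pow(127); // RandomPartitioner token range

        for (int i = 0; i < nodeCount; i++) {
            // token_i = i * 2^127 / N, evenly spaced around the ring
            BigInteger token = ringSize.multiply(BigInteger.valueOf(i))
                                       .divide(BigInteger.valueOf(nodeCount));
            System.out.println("node " + i + ": " + token);
        }
    }
}

With these in hand, the four new nodes bootstrap at four of the seven positions, the three existing nodes are then moved (nodetool move <token>) to the remaining positions one at a time, and cleanup is run on the old nodes afterwards, as described above.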