Re: Commit log + Data directory on same partition (software raid)

2012-08-11 Thread Thibaut Britz
Unfortunately, SSD drives are not an option at the moment. I have to use 2
regular HDs. Has anyone tried the above scenario?

Thanks,
Thibaut


On Fri, Aug 10, 2012 at 3:30 PM, Radim Kolar  wrote:

>
>  I was thinking about putting both the commit log and the data directory
>> on a software raid partition spanning over the two disks. Would this
>> increase the general read performance? In theory I could get twice the read
>> performance, but I don't know how the commit log will influence the read
>> performance on both disks?
>>
> ZFS + an SSD cache is best. Get FreeBSD 8.3 and install Cassandra from ports.
>
>


Query for last (composite) columns

2012-08-11 Thread Ersin Er
Hi,

I am new to Cassandra and trying to understand whether it's a good fit for
my problems. So here is a case from my domain:

Assume that we're storing session events of users in composite columns
within a column family partitioned by user id. This is from an example
given about composite columns in the following page:
https://github.com/Netflix/astyanax/wiki/Examples

What I would like to query for is the last session event of each user.
(So it's like a group-by query.) Can I get this information in a single
query and would it be an efficient way to do it (regarding the schema or
the query)? A CQL 3 solution would be great if possible. Any other
suggestions are also welcome.
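
To make the question concrete, here is a rough sketch of the kind of CQL 3
table and per-user query I have in mind (all names here are made up by me):

CREATE TABLE session_events (
    user_id text,
    event_time timestamp,
    event_type text,
    PRIMARY KEY (user_id, event_time)
);

-- latest event for a single user; what I am after is the equivalent
-- of this across all users in one query
SELECT event_time, event_type FROM session_events
WHERE user_id = 'user1'
ORDER BY event_time DESC LIMIT 1;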

Regards,

-- 
Ersin Er


Re: Commit log + Data directory on same partition (software raid)

2012-08-11 Thread Tom Duffield
Having both the commit log and the data directory on the same volume is
generally not recommended. You would actually see a performance decrease
unless most of your reads are cache hits.

On Friday, August 10, 2012, Thibaut Britz wrote:

> Hi,
>
> Has anyone of you made some experience with software raid (raid 1,
> mirroring 2 disks)?
>
> Our workload is rather read based at the moment (Commit Log directory only
> grows by 128MB every 2-3 minutes), while the second hd is under high load
> due to the read requests to our cassandra cluster.
>
> I was thinking about putting both the commit log and the data directory on
> a software raid partition spanning over the two disks. Would this increase
> the general read performance? In theory I could get twice the read
> performance, but I don't know how the commit log will influence the read
> performance on both disks?
>
> Thanks,
> Thibaut
>
>


Cassandra OOM crash while mapping commitlog

2012-08-11 Thread Robin Verlangen
Hi there,

I currently see Cassandra crash every couple of days. I run a 3 node
cluster on version 1.1.2. Does anyone have a clue why it crashes? I
couldn't find a fix for it in a newer release. Is this an actual bug, or
did I do something wrong?

Thank you in advance for your time.

Last 100 log lines before crash:

INFO [FlushWriter:39] 2012-08-11 12:51:00,933 Memtable.java (line 307)
Completed flushing
/var/lib/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-hd-7-Data.db
(10778171 bytes) for commitlog position
ReplayPosition(segmentId=2831860362157183, position=89962041)
INFO [OptionalTasks:1] 2012-08-11 13:12:30,940 MeteredFlusher.java (line
62) flushing high-traffic column family CFS(Keyspace='CloudPelican',
ColumnFamily='wordevents') (estimated 74393593 bytes)
INFO [OptionalTasks:1] 2012-08-11 13:12:30,941 ColumnFamilyStore.java
(line 643) Enqueuing flush of Memtable-wordevents@32552383(22883734/74393593
serialized/live bytes, 227279 ops)
INFO [FlushWriter:40] 2012-08-11 13:12:30,941 Memtable.java (line 266)
Writing Memtable-wordevents@32552383(22883734/74393593 serialized/live
bytes, 227279 ops)
INFO [FlushWriter:40] 2012-08-11 13:12:31,703 Memtable.java (line 307)
Completed flushing
/var/lib/cassandra/data/CloudPelican/wordevents/CloudPelican-wordevents-hd-158-Data.db
(11800327 bytes) for commitlog position
ReplayPosition(segmentId=2831860362157183, position=116934579)
INFO [MemoryMeter:1] 2012-08-11 14:01:36,942 Memtable.java (line 213)
CFS(Keyspace='OpsCenter', ColumnFamily='rollups7200') liveRatio is
6.158919689235077 (just-counted was 4.408341190092955).  calculation took
100ms for 16409 columns
INFO [CompactionExecutor:88] 2012-08-11 14:08:27,875 AutoSavingCache.java
(line 262) Saved KeyCache (38164 items) in 70 ms
INFO [OptionalTasks:1] 2012-08-11 14:18:37,519 MeteredFlusher.java (line
62) flushing high-traffic column family CFS(Keyspace='CloudPelican',
ColumnFamily='wordevents') (estimated 74346493 bytes)
INFO [OptionalTasks:1] 2012-08-11 14:18:37,519 ColumnFamilyStore.java
(line 643) Enqueuing flush of Memtable-wordevents@10789879(22869246/74346493
serialized/live bytes, 226341 ops)
INFO [FlushWriter:41] 2012-08-11 14:18:37,520 Memtable.java (line 266)
Writing Memtable-wordevents@10789879(22869246/74346493 serialized/live
bytes, 226341 ops)
INFO [FlushWriter:41] 2012-08-11 14:18:38,288 Memtable.java (line 307)
Completed flushing
/var/lib/cassandra/data/CloudPelican/wordevents/CloudPelican-wordevents-hd-159-Data.db
(11796722 bytes) for commitlog position
ReplayPosition(segmentId=2838466681767183, position=67094743)
WARN [MemoryMeter:1] 2012-08-11 14:21:55,676 Memtable.java (line 197)
setting live ratio to minimum of 1.0 instead of 0.45760196307363504
INFO [MemoryMeter:1] 2012-08-11 14:21:55,676 Memtable.java (line 213)
CFS(Keyspace='Wupa', ColumnFamily='PageViewsHost') liveRatio is
1.0421914932457101 (just-counted was 1.0).  calculation took 2ms for 175
columns
INFO [MemoryMeter:1] 2012-08-11 14:33:20,916 Memtable.java (line 213)
CFS(Keyspace='OpsCenter', ColumnFamily='rollups60') liveRatio is
4.067582667928898 (just-counted was 4.031462910772899).  calculation took
711ms for 169224 columns
INFO [OptionalTasks:1] 2012-08-11 14:59:20,909 MeteredFlusher.java (line
62) flushing high-traffic column family CFS(Keyspace='OpsCenter',
ColumnFamily='pdps') (estimated 74395427 bytes)
INFO [OptionalTasks:1] 2012-08-11 14:59:20,909 ColumnFamilyStore.java
(line 643) Enqueuing flush of Memtable-pdps@30500189(9222554/74395427
serialized/live bytes, 214478 ops)
INFO [FlushWriter:42] 2012-08-11 14:59:20,910 Memtable.java (line 266)
Writing Memtable-pdps@30500189(9222554/74395427 serialized/live bytes,
214478 ops)
INFO [FlushWriter:42] 2012-08-11 14:59:21,420 Memtable.java (line 307)
Completed flushing
/var/lib/cassandra/data/OpsCenter/pdps/OpsCenter-pdps-hd-11351-Data.db
(6928124 bytes) for commitlog position
ReplayPosition(segmentId=2838466681767183, position=117115966)
INFO [MemoryMeter:1] 2012-08-11 14:59:31,138 Memtable.java (line 213)
CFS(Keyspace='OpsCenter', ColumnFamily='pdps') liveRatio is
14.460953759840738 (just-counted was 14.460953759840738).  calculation took
28ms for 878 columns
INFO [OptionalTasks:1] 2012-08-11 15:25:41,366 MeteredFlusher.java (line
62) flushing high-traffic column family CFS(Keyspace='CloudPelican',
ColumnFamily='wordevents') (estimated 74974061 bytes)
INFO [OptionalTasks:1] 2012-08-11 15:25:41,367 ColumnFamilyStore.java
(line 643) Enqueuing flush of Memtable-wordevents@24703812(23062288/74974061
serialized/live bytes, 228878 ops)
INFO [FlushWriter:43] 2012-08-11 15:25:41,367 Memtable.java (line 266)
Writing Memtable-wordevents@24703812(23062288/74974061 serialized/live
bytes, 228878 ops)
INFO [FlushWriter:43] 2012-08-11 15:25:42,144 Memtable.java (line 307)
Completed flushing
/var/lib/cassandra/data/CloudPelican/wordevents/CloudPelican-wordevents-hd-160-Data.db
(11891766 bytes) for commitlog position
Rep

Re: quick question about data layout on disk

2012-08-11 Thread Aaron Turner
So how does that work?  An sstable is for a single CF, but it can and
likely will have multiple rows.  There is no read before a write, and as I
understand it, writes are append operations.

So if you have an sstable with say 26 different rows (A-Z) already in
it with a bunch of columns and you add a new column to row J, how does
Cassandra store the column/value pair on disk in a way to refer to row
J without re-writing the row key or some representation of it?

Thanks,
Aaron

On Fri, Aug 10, 2012 at 7:53 PM, Terje Marthinussen
 wrote:
> Rowkey is stored only once in any sstable file.
>
> That is, in the special case where you get an sstable file per column/value, you 
> are correct, but normally, I guess most of us are storing more per key.
>
> Regards,
> Terje
>
> On 11 Aug 2012, at 10:34, Aaron Turner  wrote:
>
>> Curious, but does cassandra store the rowkey along with every
>> column/value pair on disk (pre-compaction) like Hbase does?  If so
>> (which makes the most sense), I assume that's something that is
>> optimized during compaction?
>>
>>
>> --
>> Aaron Turner
>> http://synfin.net/ Twitter: @synfinatic
>> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & 
>> Windows
>> Those who would give up essential Liberty, to purchase a little temporary
>> Safety, deserve neither Liberty nor Safety.
>>-- Benjamin Franklin
>> "carpe diem quam minimum credula postero"



-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
"carpe diem quam minimum credula postero"


Re: quick question about data layout on disk

2012-08-11 Thread Russell Haering
Your update doesn't go directly to an sstable (which are immutable),
it is first merged to an in-memory table. Eventually the memtable is
flushed to a new sstable.

See http://wiki.apache.org/cassandra/MemtableSSTable

On Sat, Aug 11, 2012 at 11:03 AM, Aaron Turner  wrote:
> So how does that work?  An sstable is for a single CF, but it can and
> likely will have multiple rows.  There is no read to write and as I
> understand it, writes are append operations.
>
> So if you have an sstable with say 26 different rows (A-Z) already in
> it with a bunch of columns and you add a new column to row J, how does
> Cassandra store the column/value pair on disk in a way to refer to row
> J without re-writing the row key or some representation of it?
>
> Thanks,
> Aaron
>
> On Fri, Aug 10, 2012 at 7:53 PM, Terje Marthinussen
>  wrote:
>> Rowkey is stored only once in any sstable file.
>>
>> That is, in the special case where you get an sstable file per column/value, 
>> you are correct, but normally, I guess most of us are storing more per key.
>>
>> Regards,
>> Terje
>>
>> On 11 Aug 2012, at 10:34, Aaron Turner  wrote:
>>
>>> Curious, but does cassandra store the rowkey along with every
>>> column/value pair on disk (pre-compaction) like Hbase does?  If so
>>> (which makes the most sense), I assume that's something that is
>>> optimized during compaction?
>>>
>>>
>>> --
>>> Aaron Turner
>>> http://synfin.net/ Twitter: @synfinatic
>>> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & 
>>> Windows
>>> Those who would give up essential Liberty, to purchase a little temporary
>>> Safety, deserve neither Liberty nor Safety.
>>>-- Benjamin Franklin
>>> "carpe diem quam minimum credula postero"
>
>
>
> --
> Aaron Turner
> http://synfin.net/ Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & 
> Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
> -- Benjamin Franklin
> "carpe diem quam minimum credula postero"


Re: quick question about data layout on disk

2012-08-11 Thread Edward Capriolo
Aaron,

I have not deep dived the data files in a while but this is how I understand it.

http://wiki.apache.org/cassandra/ArchitectureSSTable

There is no need to store the row key each time with the column:
row key to columns is a one-to-many relationship. Below is a diagram of a
physical file.

HBase does it like this, I guess (or it used to; I am not up on the news):

rowkey1,column1,value1
rowkey1,column2,value2

First, I believe they repeat the row key for each column, which is not
a huge deal because you should always use compression, but it is a bit
wasteful, especially for a non-compressed table.

I know this has some impact on very wide rows, because a single row key
must fit inside the structure of an HFile (again, it's been a while).

But to get back to your question, in Cassandra:

sstable1
rowkey1: number of columns: 26
(column1,value1,ts1)
...
(column26,value26,ts26)

sstable2
rowkey1: number of columns: 1
(column1,value1,ts2)

The row key appears once in a given sstable if the row has 1 or more
columns in that sstable.

On the read path, Cassandra searches all sstables to find all the columns
for a row (bloom filters and other criteria eliminate some sstables from
the read path). It then merges the row, factoring in tombstones and the
last-update-wins rule for each column.
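
A toy sketch of that merge rule in Python (purely an illustration, not
Cassandra's actual code):

def merge_rows(columns_a, columns_b):
    # each argument maps column name -> (value, timestamp); the newest
    # timestamp wins, which is the last-update-wins rule described above
    merged = dict(columns_a)
    for name, (value, ts) in columns_b.items():
        if name not in merged or ts > merged[name][1]:
            merged[name] = (value, ts)
    return merged

# the two sstables above: sstable2's column1 carries the newer timestamp ts2
sstable1 = {'column1': ('value1', 1), 'column26': ('value26', 1)}
sstable2 = {'column1': ('value1', 2)}
print(merge_rows(sstable1, sstable2))  # column1 resolves to timestamp 2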



On Sat, Aug 11, 2012 at 2:03 PM, Aaron Turner  wrote:
> So how does that work?  An sstable is for a single CF, but it can and
> likely will have multiple rows.  There is no read to write and as I
> understand it, writes are append operations.
>
> So if you have an sstable with say 26 different rows (A-Z) already in
> it with a bunch of columns and you add a new column to row J, how does
> Cassandra store the column/value pair on disk in a way to refer to row
> J without re-writing the row key or some representation of it?
>
> Thanks,
> Aaron
>
> On Fri, Aug 10, 2012 at 7:53 PM, Terje Marthinussen
>  wrote:
>> Rowkey is stored only once in any sstable file.
>>
>> That is, in the special case where you get an sstable file per column/value, 
>> you are correct, but normally, I guess most of us are storing more per key.
>>
>> Regards,
>> Terje
>>
>> On 11 Aug 2012, at 10:34, Aaron Turner  wrote:
>>
>>> Curious, but does cassandra store the rowkey along with every
>>> column/value pair on disk (pre-compaction) like Hbase does?  If so
>>> (which makes the most sense), I assume that's something that is
>>> optimized during compaction?
>>>
>>>
>>> --
>>> Aaron Turner
>>> http://synfin.net/ Twitter: @synfinatic
>>> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & 
>>> Windows
>>> Those who would give up essential Liberty, to purchase a little temporary
>>> Safety, deserve neither Liberty nor Safety.
>>>-- Benjamin Franklin
>>> "carpe diem quam minimum credula postero"
>
>
>
> --
> Aaron Turner
> http://synfin.net/ Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & 
> Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
> -- Benjamin Franklin
> "carpe diem quam minimum credula postero"


Re: quick question about data layout on disk

2012-08-11 Thread Aaron Turner
Thanks Russell, that's the info I was looking for!

On Sat, Aug 11, 2012 at 11:23 AM, Russell Haering
 wrote:
> Your update doesn't go directly to an sstable (which are immutable),
> it is first merged to an in-memory table. Eventually the memtable is
> flushed to a new sstable.
>
> See http://wiki.apache.org/cassandra/MemtableSSTable
>
> On Sat, Aug 11, 2012 at 11:03 AM, Aaron Turner  wrote:
>> So how does that work?  An sstable is for a single CF, but it can and
>> likely will have multiple rows.  There is no read to write and as I
>> understand it, writes are append operations.
>>
>> So if you have an sstable with say 26 different rows (A-Z) already in
>> it with a bunch of columns and you add a new column to row J, how does
>> Cassandra store the column/value pair on disk in a way to refer to row
>> J without re-writing the row key or some representation of it?
>>
>> Thanks,
>> Aaron
>>
>> On Fri, Aug 10, 2012 at 7:53 PM, Terje Marthinussen
>>  wrote:
>>> Rowkey is stored only once in any sstable file.
>>>
>>> That is, in the special case where you get an sstable file per column/value, 
>>> you are correct, but normally, I guess most of us are storing more per key.
>>>
>>> Regards,
>>> Terje
>>>
>>> On 11 Aug 2012, at 10:34, Aaron Turner  wrote:
>>>
 Curious, but does cassandra store the rowkey along with every
 column/value pair on disk (pre-compaction) like Hbase does?  If so
 (which makes the most sense), I assume that's something that is
 optimized during compaction?


 --
 Aaron Turner
 http://synfin.net/ Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & 
 Windows
 Those who would give up essential Liberty, to purchase a little temporary
 Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
 "carpe diem quam minimum credula postero"
>>
>>
>>
>> --
>> Aaron Turner
>> http://synfin.net/ Twitter: @synfinatic
>> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & 
>> Windows
>> Those who would give up essential Liberty, to purchase a little temporary
>> Safety, deserve neither Liberty nor Safety.
>> -- Benjamin Franklin
>> "carpe diem quam minimum credula postero"



-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
"carpe diem quam minimum credula postero"


Re: anyone have any performance numbers? and here are some perf numbers of my own...

2012-08-11 Thread Tyler Hobbs
One node can typically handle 30k+ inserts per second, so you should be
able to insert the 9 million rows in about 5 minutes with a single node
cluster.  My guess is that you're inserting with a single thread, which
means you're bound by network latency.  Try using 100 threads, or better,
just use the stress tool that comes with Cassandra:
http://www.datastax.com/docs/1.0/references/stress_java
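
As a rough sketch of the multi-threaded approach with pycassa (keyspace,
column family, and counts here are hypothetical):

import threading
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('Keyspace1', ['localhost:9160'], pool_size=100)
cf = ColumnFamily(pool, 'Standard1')

def insert_range(start, stop):
    # each thread spends most of its time waiting on the network, so 100
    # threads keep many requests in flight at once
    for i in range(start, stop):
        cf.insert('key%d' % i, {'col': 'value'})

# 100 threads x 90k rows each = 9 million rows
threads = [threading.Thread(target=insert_range,
                            args=(n * 90000, (n + 1) * 90000))
           for n in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()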

On Fri, Aug 10, 2012 at 5:02 PM, Hiller, Dean  wrote:

> Ignore the third one, my math was bad... worked out to 733 bytes / row and it
> ended up being 6.6 gig as it compacted it some after it was done when the
> load was light (noticed that a bit later)
>
> But what about the other two?  Is that approximately the expected time?
>
> Thanks,
> Dean
>
> On 8/10/12 3:50 PM, "Hiller, Dean"  wrote:
>
> >** 3. In my test below, I see there is now 8Gig of data and 9,000,000
> >rows.  Does that sound right?  Nearly 1MB of space is used per row for a
> >50-column row?  That sounds like a huge amount of overhead. (My values
> >are long on every column, but that is still not much.)  I was expecting
> >KB / row maybe, but MB / row?  My column names are "col"+I as well, so
> >they are very short too.
> >
> >A common configuration is 1T drives per node, so I was wondering if
> >anyone ran any tests with map/reduce on reading in all those rows (not
> >doing anything with it, just reading it in).
> >
> >** 1. How long does it take to go through the 500GB that would be on
> >that node?
> >
> >I ran some tests on just writing a fake table 50 columns wide, and am
> >seeing it will take about 31 hours to write 500GB of information (a node
> >is about full at 500GB, since we need to reserve 30-50% of the space for
> >compaction and such).  I.e., if I need to rerun any kind of indexing, it
> >will take 31 hours... does this sound about normal/ballpark?  Obviously
> >many nodes will be below that, so that would be the worst case with 1T
> >drives.
> >
> >** 2. Anyone have any other data?
> >
> >Thanks,
> >Dean
>
>


-- 
Tyler Hobbs
DataStax 


Re: Cassandra OOM crash while mapping commitlog

2012-08-11 Thread Tyler Hobbs
We've seen something similar when running on a 32-bit JVM, so make sure
you're using the latest 64-bit Java 6 JVM.

On Sat, Aug 11, 2012 at 11:59 AM, Robin Verlangen  wrote:

> Hi there,
>
> I currently see Cassandra crash every couple of days. I run a 3 node
> cluster on version 1.1.2. Does anyone have a clue why it crashes? I
> couldn't find it as fix in a newer release. Is this an actual bug or did I
> do something wrong?
>
> Thank you in advance for your time.
>
> Last 100 log lines before crash:
>
> INFO [FlushWriter:39] 2012-08-11 12:51:00,933 Memtable.java (line 307)
> Completed flushing
> /var/lib/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-hd-7-Data.db
> (10778171 bytes) for commitlog position
> ReplayPosition(segmentId=2831860362157183, position=89962041)
> INFO [OptionalTasks:1] 2012-08-11 13:12:30,940 MeteredFlusher.java
> (line 62) flushing high-traffic column family CFS(Keyspace='CloudPelican',
> ColumnFamily='wordevents') (estimated 74393593 bytes)
> INFO [OptionalTasks:1] 2012-08-11 13:12:30,941 ColumnFamilyStore.java
> (line 643) Enqueuing flush of Memtable-wordevents@32552383(22883734/74393593
> serialized/live bytes, 227279 ops)
> INFO [FlushWriter:40] 2012-08-11 13:12:30,941 Memtable.java (line 266)
> Writing Memtable-wordevents@32552383(22883734/74393593 serialized/live
> bytes, 227279 ops)
> INFO [FlushWriter:40] 2012-08-11 13:12:31,703 Memtable.java (line 307)
> Completed flushing
> /var/lib/cassandra/data/CloudPelican/wordevents/CloudPelican-wordevents-hd-158-Data.db
> (11800327 bytes) for commitlog position
> ReplayPosition(segmentId=2831860362157183, position=116934579)
> INFO [MemoryMeter:1] 2012-08-11 14:01:36,942 Memtable.java (line 213)
> CFS(Keyspace='OpsCenter', ColumnFamily='rollups7200') liveRatio is
> 6.158919689235077 (just-counted was 4.408341190092955).  calculation took
> 100ms for 16409 columns
> INFO [CompactionExecutor:88] 2012-08-11 14:08:27,875
> AutoSavingCache.java (line 262) Saved KeyCache (38164 items) in 70 ms
> INFO [OptionalTasks:1] 2012-08-11 14:18:37,519 MeteredFlusher.java
> (line 62) flushing high-traffic column family CFS(Keyspace='CloudPelican',
> ColumnFamily='wordevents') (estimated 74346493 bytes)
> INFO [OptionalTasks:1] 2012-08-11 14:18:37,519 ColumnFamilyStore.java
> (line 643) Enqueuing flush of Memtable-wordevents@10789879(22869246/74346493
> serialized/live bytes, 226341 ops)
> INFO [FlushWriter:41] 2012-08-11 14:18:37,520 Memtable.java (line 266)
> Writing Memtable-wordevents@10789879(22869246/74346493 serialized/live
> bytes, 226341 ops)
> INFO [FlushWriter:41] 2012-08-11 14:18:38,288 Memtable.java (line 307)
> Completed flushing
> /var/lib/cassandra/data/CloudPelican/wordevents/CloudPelican-wordevents-hd-159-Data.db
> (11796722 bytes) for commitlog position
> ReplayPosition(segmentId=2838466681767183, position=67094743)
> WARN [MemoryMeter:1] 2012-08-11 14:21:55,676 Memtable.java (line 197)
> setting live ratio to minimum of 1.0 instead of 0.45760196307363504
> INFO [MemoryMeter:1] 2012-08-11 14:21:55,676 Memtable.java (line 213)
> CFS(Keyspace='Wupa', ColumnFamily='PageViewsHost') liveRatio is
> 1.0421914932457101 (just-counted was 1.0).  calculation took 2ms for 175
> columns
> INFO [MemoryMeter:1] 2012-08-11 14:33:20,916 Memtable.java (line 213)
> CFS(Keyspace='OpsCenter', ColumnFamily='rollups60') liveRatio is
> 4.067582667928898 (just-counted was 4.031462910772899).  calculation took
> 711ms for 169224 columns
> INFO [OptionalTasks:1] 2012-08-11 14:59:20,909 MeteredFlusher.java
> (line 62) flushing high-traffic column family CFS(Keyspace='OpsCenter',
> ColumnFamily='pdps') (estimated 74395427 bytes)
> INFO [OptionalTasks:1] 2012-08-11 14:59:20,909 ColumnFamilyStore.java
> (line 643) Enqueuing flush of Memtable-pdps@30500189(9222554/74395427
> serialized/live bytes, 214478 ops)
> INFO [FlushWriter:42] 2012-08-11 14:59:20,910 Memtable.java (line 266)
> Writing Memtable-pdps@30500189(9222554/74395427 serialized/live bytes,
> 214478 ops)
> INFO [FlushWriter:42] 2012-08-11 14:59:21,420 Memtable.java (line 307)
> Completed flushing
> /var/lib/cassandra/data/OpsCenter/pdps/OpsCenter-pdps-hd-11351-Data.db
> (6928124 bytes) for commitlog position
> ReplayPosition(segmentId=2838466681767183, position=117115966)
> INFO [MemoryMeter:1] 2012-08-11 14:59:31,138 Memtable.java (line 213)
> CFS(Keyspace='OpsCenter', ColumnFamily='pdps') liveRatio is
> 14.460953759840738 (just-counted was 14.460953759840738).  calculation took
> 28ms for 878 columns
> INFO [OptionalTasks:1] 2012-08-11 15:25:41,366 MeteredFlusher.java
> (line 62) flushing high-traffic column family CFS(Keyspace='CloudPelican',
> ColumnFamily='wordevents') (estimated 74974061 bytes)
> INFO [OptionalTasks:1] 2012-08-11 15:25:41,367 ColumnFamilyStore.java
> (line 643) Enqueuing flush of Memtable-wordevents@24703812(23062288/74974061
> serialized/live bytes, 228878 ops)
> INFO [FlushWriter:43] 2012-08-11 15:25:41,367

Re: Problem with version 1.1.3

2012-08-11 Thread Tyler Hobbs
On Fri, Aug 10, 2012 at 4:29 PM, Dwight Smith
wrote:

> Further info – it seems I had the seeds list backwards – it did not need
> both nodes – I have corrected that with each pointing to the other as a
> single seed entry – and it works fine.


This might have worked by accident, but in general, you want to use exactly
the same seed list for every node.

-- 
Tyler Hobbs
DataStax 


Re: Project Management

2012-08-11 Thread Tyler Hobbs
On Tue, Aug 7, 2012 at 2:32 AM, Baskar Sikkayan wrote:

>
> If I create one more column family based on my query instead of going with
> a secondary index,
> will it affect the write performance?
>

It won't affect writes much more than the built-in secondary indexes would,
and you'll get better read performance.


> Since I need to duplicate the data in the second column family as well
> while writing data, will it hit write performance?


Same answer.

-- 
Tyler Hobbs
DataStax 


Re: Assume Keys in cqlsh?

2012-08-11 Thread Tyler Hobbs
As far as I know, "assume" isn't a CQL feature, it's only part of
cassandra-cli.
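
For example, in cassandra-cli (the column family name and key here are
hypothetical):

assume MyCF keys as utf8;
get MyCF['some-text-key'];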

On Tue, Aug 7, 2012 at 10:16 PM, Jason Hill  wrote:

> Hello,
>
> I'm using:
>
> [cqlsh 2.0.0 | Cassandra 1.0.10 | CQL spec 2.0.0 | Thrift protocol 19.20.0]
>
> I have a column family with a key that is a blob so I query it like this:
>
> SELECT FIRST 10 1..1344385804 FROM  WHERE KEY =
> '436170616369747943616c63756c61746f727c33';
>
> Is there any way to avoid the hex I'm using for the key?
>
> I tried the following
>
> ASSUME  KEYS ARE text;
>
> but it gave this error:
>
> Improper assume command.
>
>
> I'm thinking I've missed something here and hope a kind soul would
> point me to a solution.
>
> Cheers,
> Jason
>



-- 
Tyler Hobbs
DataStax 


Re: Syncing nodes + Cassandra Data Availability

2012-08-11 Thread Tyler Hobbs
On Wed, Aug 8, 2012 at 8:58 PM, Ben Kaehne  wrote:

>
>
> Our application runs on a 3 node cassandra cluster with RF of 3.
>
> We use quorum operations against this cluster in hopes of guaranteeing
> consistency.
>
> One scenario in which an issue can occur here is:
> Out of our 3 nodes, only 2 are up.
> We perform a write to say, a new key.
> The down node is started again, at the same time, a different node is
> brought offline.
> At this point. The data we have written above is on one node, but not the
> other online node. Meaning quorum reads will fail.
>

So only one of the three nodes is up?  The data should be written to two
nodes (since your quorum write succeeded): one node that is up, and one
that is down.  With RF=3 a quorum is 2 replicas, so any quorum read will
reach at least one replica that has the write.


>
> Surely other people have encountered such issue before.
>
> We disabled hinted handoffs originally as to not have to worry about race
> conditions of disk space on servers filling up due to piling up handoffs.
> Although perhaps this may somewhat aid the situation (although from what I
> read, it does not completely remedy the circumstance).
>

Hints stop being stored after a node has been down for a while (I believe
the default is 1 hour, but it's configurable through cassandra.yaml), so
you shouldn't have to worry about running out of disk space.  Hinted
handoff is definitely the fastest way to restore consistency, and it will
catch almost all cases in Cassandra 1.1 and later.
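
For reference, the knob in cassandra.yaml looks like this (a sketch matching
the one-hour figure above; check your own yaml for the exact default):

# how long hints are collected for a dead node before new hints stop
max_hint_window_in_ms: 3600000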

-- 
Tyler Hobbs
DataStax 


Re: cassandra unable to start after upgrading to 1.1

2012-08-11 Thread Tyler Hobbs
Usually when you're using the packaged installations, you want to start
cassandra with:

sudo service cassandra start

On Thu, Aug 9, 2012 at 4:18 AM, Ahmed Ababne  wrote:

> Hi
>
> I am running 12.04 Ubuntu, and had cassandra ubuntu packaged installation.
>
> I have just upgraded cassandra from 1.0 to 1.1.
>
> I followed the step provided by this link
> http://www.datastax.com/docs/1.1/install/upgrading
>
> Anyway, when I try to start cassandra using the command: sudo cassandra -f
> It returns the following error without starting cassandra:
>
> xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1024M -Xmx1024M
> -Xmn200M -XX:+HeapDumpOnOutOfMemoryError -Xss128k
>
> I would be thankful, if anybody can give me an idea on what to do to
> successfully start cassandra?
>
> Thanks
>
>
>
>
>
>


-- 
Tyler Hobbs
DataStax 


Re: Key order check in sstable2json

2012-08-11 Thread Tyler Hobbs
Sounds like bad behavior.  Can you open a JIRA ticket for that (once JIRA
is back up)? :)

On Thu, Aug 9, 2012 at 9:14 AM, Mat Brown  wrote:

> Hello,
>
> We've noticed that when passing multiple -k arguments to the
> sstable2json utility, we pretty much always get an IOException with
> "Key out of order!". Looking at this:
>
> https://github.com/apache/cassandra/blob/cassandra-1.0.10/src/java/org/apache/cassandra/tools/SSTableExport.java#L241
> it looks like it's iterating over the keys in the order given, and
> then enforcing partitioner ordering of the keys. Is that correct? If
> so, why? The original patch states that the key ordering check is to
> detect corrupt sstables, and it does provide that benefit
> where the check runs in the enumerateKeys method, but I don't see
> any advantage to enforcing key order in the export method, other than
> I suppose making scanning for the next key maximally efficient.
>
> Anyway, with the current situation, it seems that the only way to pass
> multiple key arguments to sstable2json would be to use sstablekeys
> first to get the key order, grep for the keys I'm interested in, and
> then pass those in order to sstable2json? Is this worth it, or would
> it be comparably efficient to just call sstable2json on one key at a
> time?
>
> Thanks,
> Mat
>
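
A rough sketch of that workaround as a shell pipeline (paths and file names
hypothetical):

# sstablekeys prints keys in partitioner order; keep only the wanted ones
# and hand them to sstable2json in that same order
bin/sstablekeys /path/to/MyCF-hc-1-Data.db | grep -F -f wanted_keys.txt \
  | sed 's/^/-k /' | xargs bin/sstable2json /path/to/MyCF-hc-1-Data.db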



-- 
Tyler Hobbs
DataStax 


Re: Cassandra commitlog directory size increase on every restart - Cassandra 1.1.0

2012-08-11 Thread Tyler Hobbs
There have been some commitlog-related fixes in later versions of 1.1, so
it's worth trying an upgrade. If that doesn't resolve the issue, open a
JIRA ticket with these details.

On Thu, Aug 9, 2012 at 9:15 AM, Kasun Weranga  wrote:

> Any idea on how to fix this?
>
> Thanks,
> Kasun
>
>
> On Wed, Aug 8, 2012 at 11:56 AM, Kasun Weranga  wrote:
>
>> Hi all,
>>
>> I am facing the above issue in Cassandra 1.1.0: it adds a 134.2MB
>> commitlog file on every restart, but never deletes it. We can't control
>> the commitlog dir size even
>> by explicitly setting commitlog_total_space_in_mb in cassandra.yaml.
>> I set commitlog_total_space_in_mb to 512 in cassandra.yaml and did some
>> testing, and now my commitlog directory size has reached 1.1 GB.
>>
>> Also, I turned on debug logging for CommitLog. This is what I get when the
>> server starts.
>>
>>
>> [2012-08-08 11:25:30,860]  INFO
>> {org.apache.cassandra.config.DatabaseDescriptor} -  Global memtable
>> threshold is enabled at 151MB
>> [2012-08-08 11:25:31,427]  INFO
>> {org.apache.cassandra.service.CacheService} -  Initializing key cache with
>> capacity of 12 MBs.
>> [2012-08-08 11:25:31,443]  INFO
>> {org.apache.cassandra.service.CacheService} -  Scheduling key cache save to
>> each 14400 seconds (going to save all keys).
>> [2012-08-08 11:25:31,445]  INFO
>> {org.apache.cassandra.service.CacheService} -  Initializing row cache with
>> capacity of 0 MBs and provider
>> org.apache.cassandra.cache.SerializingCacheProvider
>> [2012-08-08 11:25:31,448]  INFO
>> {org.apache.cassandra.service.CacheService} -  Scheduling row cache save to
>> each 0 seconds (going to save all keys).
>> [2012-08-08 11:25:31,677]  INFO
>> {org.apache.cassandra.io.sstable.SSTableReader} -  Opening
>> ./repository/database/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-6
>> (1644 bytes)
>> [2012-08-08 11:25:31,677]  INFO
>> {org.apache.cassandra.io.sstable.SSTableReader} -  Opening
>> ./repository/database/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-5
>> (5967 bytes)
>> [2012-08-08 11:25:31,784]  INFO
>> {org.apache.cassandra.io.sstable.SSTableReader} -  Opening
>> ./repository/database/cassandra/data/system/Versions/system-Versions-hc-5
>> (247 bytes)
>> [2012-08-08 11:25:31,789]  INFO
>> {org.apache.cassandra.io.sstable.SSTableReader} -  Opening
>> ./repository/database/cassandra/data/system/Versions/system-Versions-hc-6
>> (247 bytes)
>> [2012-08-08 11:25:31,887]  INFO
>> {org.apache.cassandra.io.sstable.SSTableReader} -  Opening
>> ./repository/database/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-hc-1
>> (242 bytes)
>> [2012-08-08 11:25:31,897]  INFO
>> {org.apache.cassandra.io.sstable.SSTableReader} -  Opening
>> ./repository/database/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-hc-2
>> (244 bytes)
>> [2012-08-08 11:25:31,935]  INFO
>> {org.apache.cassandra.io.sstable.SSTableReader} -  Opening
>> ./repository/database/cassandra/data/system/LocationInfo/system-LocationInfo-hc-14
>> (80 bytes)
>> [2012-08-08 11:25:31,935]  INFO
>> {org.apache.cassandra.io.sstable.SSTableReader} -  Opening
>> ./repository/database/cassandra/data/system/LocationInfo/system-LocationInfo-hc-13
>> (346 bytes)
>> [2012-08-08 11:25:31,935]  INFO
>> {org.apache.cassandra.io.sstable.SSTableReader} -  Opening
>> ./repository/database/cassandra/data/system/LocationInfo/system-LocationInfo-hc-15
>> (163 bytes)
>> [2012-08-08 11:25:32,980]  INFO
>> {org.apache.cassandra.io.sstable.SSTableReader} -  Opening
>> ./repository/database/cassandra/data/EVENT_KS/org_wso2_sample_httpd_logs/EVENT_KS-org_wso2_sample_httpd_logs-hc-2
>> (312328 bytes)
>> [2012-08-08 11:25:32,980]  INFO
>> {org.apache.cassandra.io.sstable.SSTableReader} -  Opening
>> ./repository/database/cassandra/data/EVENT_KS/org_wso2_sample_httpd_logs/EVENT_KS-org_wso2_sample_httpd_logs-hc-3
>> (312992 bytes)
>> [2012-08-08 11:25:32,984]  INFO
>> {org.apache.cassandra.io.sstable.SSTableReader} -  Opening
>> ./repository/database/cassandra/data/EVENT_KS/org_wso2_sample_httpd_logs/EVENT_KS-org_wso2_sample_httpd_logs-hc-1
>> (647552 bytes)
>> [2012-08-08 11:25:33,183]  INFO
>> {org.apache.cassandra.io.sstable.SSTableReader} -  Opening
>> ./repository/database/cassandra/data/EVENT_KS/bam_service_data_publisher/EVENT_KS-bam_service_data_publisher-hc-1
>> (249143 bytes)
>> [2012-08-08 11:25:33,266]  INFO
>> {org.apache.cassandra.io.sstable.SSTableReader} -  Opening
>> ./repository/database/cassandra/data/META_KS/STREAM_DEFINITION_ID_TO_KEY/META_KS-STREAM_DEFINITION_ID_TO_KEY-hc-1
>> (152 bytes)
>> [2012-08-08 11:25:33,266]  INFO
>> {org.apache.cassandra.io.sstable.SSTableReader} -  Opening
>> ./repository/database/cassandra/data/META_KS/STREAM_DEFINITION_ID_TO_KEY/META_KS-STREAM_DEFINITION_ID_TO_KEY-hc-2
>> (150 bytes)
>> [2012-08-08 11:25:33,362]  INFO
>> {org.apache.cassandra.io.sstable.SSTableReader} -  Opening
>> ./repository/database/cassandra/data/META_KS/STREAM_DEFINI

Re: Thrift batch_mutate erase previous data?

2012-08-11 Thread Tyler Hobbs
On Thu, Aug 9, 2012 at 10:43 AM, Cyril Auburtin wrote:

> It seems the Thrift method batch_mutate, with Mutations, will not
> update the previous data with the given mutation, but clear and replace it,
> right?
>

I'm not sure what you're asking.  Writes in Cassandra are always blind
overwrites; there's no concept of clearing or replacing.
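
A quick illustration with pycassa (an assumed client; keyspace and column
family names are hypothetical): writing one column blindly overwrites that
column and leaves the rest of the row untouched.

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
cf = ColumnFamily(pool, 'MyCF')

cf.insert('row1', {'a': '1', 'b': '2'})  # row1 now has columns a and b
cf.insert('row1', {'a': '9'})            # overwrites a only; b survives
print(cf.get('row1'))                    # {'a': '9', 'b': '2'}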

-- 
Tyler Hobbs
DataStax 


Re: Physical storage of rowkey

2012-08-11 Thread Tyler Hobbs
Yes, if you're using RandomPartitioner.  The hash is MD5.
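
A rough sketch of the idea (illustration only; Cassandra's actual token
computation differs slightly in how it handles the sign of the hash):

import hashlib

def md5_token(row_key):
    # RandomPartitioner places rows by the MD5 hash of the key,
    # interpreted as a large non-negative integer
    return int(hashlib.md5(row_key).hexdigest(), 16)

print(md5_token(b'some-row-key'))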

On Thu, Aug 9, 2012 at 1:29 PM, A J  wrote:

> Are row key hashed before being physically stored in Cassandra ? If
> so, what hash function is used to ensure collision is minimal.
>
> Thanks.
>



-- 
Tyler Hobbs
DataStax 


Re: triggering the assertion at the start of ColumnFamilyStore.getRangeSlice

2012-08-11 Thread Tyler Hobbs
You can use something like the Maven Shade plugin to use both of the
libthrift jars.

On Thu, Aug 9, 2012 at 3:57 PM, Jose Flexa  wrote:

> Hi.
> I've avoided the issue by disabling assertions (-da).  Any suggestions
> on a better strategy?
>
> Thanks
> José
>
> On Thu, Aug 9, 2012 at 5:29 PM, Jose Flexa  wrote:
> > Hi,
> >
> > I am triggering the assertion at the start of
> > ColumnFamilyStore.getRangeSlice when setting a SlicePredicate with
> > sliceRange.setStart(new byte[0]), sliceRange.setFinish(new byte[0]);
> > sliceRange.setCount(Integer.MAX_VALUE); in a AWS EMR job flow. AWS EMR
> > uses libthrift 0.7.0 but my cassandra cluster uses libthrift 0.6.0
> > (cassandra 1.0.8).
> >
> > Thanks
> > José
>



-- 
Tyler Hobbs
DataStax 


Re: problem of inserting columns of a great amount

2012-08-11 Thread Tyler Hobbs
There is a fair amount of overhead in the Thrift structures for columns and
mutations, so that's a pretty large mutation.

In general, you'll see better performance inserting many small batch
mutations in parallel.
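
For example, with pycassa (an assumed client; names hypothetical), a batch
with a small queue_size sends many modest batch_mutate calls instead of one
huge one:

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
cf = ColumnFamily(pool, 'MyCF')

b = cf.batch(queue_size=100)  # flushes automatically every 100 queued inserts
for i in range(50000):
    b.insert('row%d' % i, {'col': 'value'})
b.send()  # flush whatever is left in the queue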

On Fri, Aug 10, 2012 at 2:04 AM, Jin Lei  wrote:

> Sorry, something is wrong with my previous problem description. The fact
> is that Cassandra denies my requests when I try to insert 50k rows
> (rather than 50k columns) into a column family at one time. Each row has 1
> column.
>
> 2012/8/10 Jin Lei 
>
>> Hello everyone,
>> I'm a novice to Cassandra and met a problem recently.
>> I want to insert over 50k columns into Cassandra at one time, the total size
>> of which doesn't exceed 16MB, but the database returns an exception as
>> follows.
>>
>> [E 120809 15:37:31 service:1251] error in write to database
>> Traceback (most recent call last):
>>   File "/home/stoneiii/mycode/src/user/service.py", line 1248, in
>> flush_mutator
>> self.mutator.send()
>>
>>   File "/home/stoneiii/mycode/pylib/pycassa/batch.py", line 127, in
>> send
>>
>> conn.batch_mutate(mutations, write_consistency_level)
>>   File "/home/stoneiii/gaia2/pylib/pycassa/pool.py", line 145, in
>> new_f
>> return new_f(self, *args, **kwargs)
>>   File "/home/stoneiii/mycode/pylib/pycassa/pool.py", line 145, in
>> new_f
>> return new_f(self, *args, **kwargs)
>>   File "/home/stoneiii/mycode/pylib/pycassa/pool.py", line 145, in
>> new_f
>> return new_f(self, *args, **kwargs)
>>   File "/home/stoneiii/mycode/pylib/pycassa/pool.py", line 145, in
>> new_f
>> return new_f(self, *args, **kwargs)
>>   File "/home/stoneiii/mycode/pylib/pycassa/pool.py", line 145, in
>> new_f
>> return new_f(self, *args, **kwargs)
>>   File "/home/stoneiii/mycode/pylib/pycassa/pool.py", line 140, in
>> new_f
>> (self._retry_count, exc.__class__.__name__, exc))
>> MaximumRetryException: Retried 6 times. Last failure was error:
>> [Errno 104] Connection reset by peer
>>
>> Since Cassandra supports 2 billion columns in one table, why can't I
>> insert 50k columns in this way? Or what settings should I adjust to break
>> this limit?
>> Thanks for any hint in advance!
>>
>>
>>
>>
>


-- 
Tyler Hobbs
DataStax 


Re: Question regarding tombstone removal and compaction

2012-08-11 Thread Tyler Hobbs
On Fri, Aug 10, 2012 at 5:54 AM, Fredrik
wrote:

> We've had a bug that caused one of our column families to grow very big:
> 280 GB on a 500 GB disk. We're using size-tiered compaction.
> Since it's "only append" data, I've now issued deletes of 260 GB of
> superfluous data.
>
> 1. There are some quite large SSTables (80 GB, 40 GB, etc.). If I run a
> major compaction before GC grace, which is 6 hours, will the compaction
> succeed, or will it fail because GC grace hasn't elapsed, so the major
> compaction will ignore the tombstones and then fail due to insufficient
> disk space?
>

The major compaction would fail, because the tombstones could not be purged
yet.


>
> 2. If I wait until GC grace has elapsed, will it be possible to run a
> major compaction, since there are only deletes, which shouldn't require
> double the SSTable size when merging tombstones with the large SSTables?
>

Yes, although it's a better idea to let minor compactions take care of that.
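
For reference, if you do go the major compaction route once GC grace has
passed, it's just (with your keyspace and column family substituted):

nodetool -h <host> compact <Keyspace> <ColumnFamily>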

-- 
Tyler Hobbs
DataStax 


Re: Node doesn't rejoin ring after restart

2012-08-11 Thread Tyler Hobbs
Make sure that your seed list is the same for every node.  Just pick two of
the three nodes and use those as the seeds everywhere.

If that's not the issue, check your cassandra log to see if there are any
exceptions during startup.

On Fri, Aug 3, 2012 at 5:25 PM, Edward Sargisson <
edward.sargis...@globalrelay.net> wrote:

>  Hi all,
> I'm testing our procedures for handling some Cassandra failure scenarios
> and I'm not understanding something.
>
> I'm testing on a 3 node cluster with a replication_factor of 3.
> I stopped one of the nodes for 5 or so minutes and ran some application
> tests. Everything was fine.
>
> Then I started cassandra on that node again and it refuses to re-join the
> ring. It can see itself as up but not the other nodes. The other nodes can
> see themselves but don't see it as up.
>
> I deliberately haven't followed any of the token replacement methods
> outlined in the docs. I'm working on the assumption that a small outage on
> one node shouldn't cause extraordinary action.
>
> Nor do I want to have to stop every node before bringing them up one by
> one.
>
> What am I missing? Am I forced into those time-consuming methods every
> time I want to restart?
>
> Thoughts?
>
> Cheers,
> Edward
>
> --
>
> Edward Sargisson
>
> senior java developer
> Global Relay
>
> edward.sargis...@globalrelay.net
>
>
> 866.484.6630
> New York | Chicago | Vancouver  |  London  (+44.0800.032.9829)  |  Singapore
> (+65.3158.1301)
>
> Global Relay Archive supports email, instant messaging, BlackBerry,
> Bloomberg, Thomson Reuters, Pivot, YellowJacket, LinkedIn, Twitter,
> Facebook and more.
>
>
> Ask about Global Relay Message — The Future of Collaboration in the
> Financial Services World
>
> All email sent to or from this address will be retained by Global
> Relay’s email archiving system. This message is intended only for the use
> of the individual or entity to which it is addressed, and may contain
> information that is privileged, confidential, and exempt from disclosure
> under applicable law.  Global Relay will not be liable for any compliance
> or technical information provided herein.  All trademarks are the property
> of their respective owners.
>



-- 
Tyler Hobbs
DataStax