Re: secondary indexes TTL - strange issues

2012-09-17 Thread Roland Gude
Issue created.

Will attach debug logs asap
CASSANDRA-4670

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, 17 September 2012 03:46
To: user@cassandra.apache.org
Subject: Re: secondary indexes TTL - strange issues

 Data gets inserted and is accessible via index query for some time. At some point 
in time the indexes are completely empty and start filling again (while new data 
enters the system).
If you can reproduce this please create a ticket on 
https://issues.apache.org/jira/browse/CASSANDRA .

If you can include DEBUG level logs that would be helpful.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/09/2012, at 10:08 PM, Roland Gude <roland.g...@ez.no> wrote:


I am not sure it is compacting an old file: the same thing happens every time I 
rebuild the index. New files appear, get compacted, and vanish.

We have set up a new smaller cluster with fresh data. The same thing happens here 
as well. Data gets inserted and is accessible via index query for some time. At 
some point in time the indexes are completely empty and start filling again (while 
new data enters the system).

I am currently testing with SizeTiered on both the fresh set and the imported 
set.

For the fresh set (which is significantly smaller) first results imply that the 
issue is not happening with SizeTieredCompaction - I have not yet tested 
everything that comes into my mind and will update if something new comes up.

As for the failing query, it is from the cli:
get EventsByItem where 0003--1000--=utf8('someValue');
0003--1000-- is a TUUID we use as a marker for a 
TimeSeries.
(and equivalent queries with astyanax and hector as well)

This is a cf with the issue:

create column family EventsByItem
  with column_type = 'Standard'
  and comparator = 'TimeUUIDType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'BytesType'
  and read_repair_chance = 0.5
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'NONE'
  and column_metadata = [
{column_name : '--1000--',
validation_class : BytesType,
index_name : 'ebi_mandatorIndex',
index_type : 0},
{column_name : '0002--1000--',
validation_class : BytesType,
index_name : 'ebi_itemidIndex',
index_type : 0},
{column_name : '0003--1000--',
validation_class : BytesType,
index_name : 'ebi_eventtypeIndex',
index_type : 0}]
  and compression_options={sstable_compression:SnappyCompressor, 
chunk_length_kb:64};

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Friday, 14 September 2012 10:46
To: user@cassandra.apache.org
Subject: Re: secondary indexes TTL - strange issues

INFO [CompactionExecutor:181] 2012-09-13 12:58:37,443 CompactionTask.java (line 221) Compacted to [/var/lib/cassandra/data/Eventstore/EventsByItem/Eventstore-EventsByItem.ebi_eventtypeIndex-he-10-Data.db,].  78,623,000 to 373,348 (~0% of original) bytes for 83 keys at 0.000280MB/s.  Time: 1,272,883ms.
There are a lot of weird things here.
It could be levelled compaction compacting an older file for the first time. 
But that would be a guess.

Rebuilding the index gives us back the data for a couple of minutes - then it 
vanishes again.
Are you able to do a test with SizeTieredCompaction?

Are you able to replicate the problem with a fresh testing CF and some test 
data?

If it's only a problem with imported data, can you provide a sample of the 
failing query? And maybe the CF definition?

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/09/2012, at 2:46 AM, Roland Gude <roland.g...@ez.no> wrote:



Hi,

we have been running a system on Cassandra 0.7 heavily relying on secondary 
indexes for columns with TTL.
This has been working like a charm, but we are trying hard to move forward with 
Cassandra and are struggling at that point:

When we put our data into a new cluster (any 1.1.x version - currently 1.1.5), 
rebuild indexes, and run our system, everything seems to work well - until at 
some point in time index queries do not return any data at all anymore (note 
that the TTL will not expire for several more months).
Rebuilding the index gives us back the data for a couple of minutes - then it 
vanishes again.

What seems strange is that compaction apparently is very aggressive:

INFO [CompactionExecutor:181] 2012-09-13 12:58:37,443 CompactionTask.java (line 221) Compacted to [/var/lib/cassandra/data/Eventstore/EventsByItem/Eventstore-EventsByIte

Re: nodetool cfstats and compression

2012-09-17 Thread aaron morton
Yes. 
It is the space taken up on disk, including compaction. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2012, at 6:30 AM, Jim Ancona  wrote:

> Do the row size stats reported by 'nodetool cfstats' include the
> effect of compression?
> 
> Thanks,
> 
> Jim



Re: minor compaction and delete expired column-tombstones

2012-09-17 Thread aaron morton
> Does minor compaction delete expired column-tombstones when the row is
> also present in another table which is
No. 
Compaction is per Column Family. 

Tombstones will be expired by Minor Compaction if all fragments of the row are 
contained in the SSTables being compacted. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2012, at 6:32 AM, Rene Kochen  wrote:

> Hi all,
> 
> Does minor compaction delete expired column-tombstones when the row is
> also present in another table which is not subject to the minor
> compaction?
> 
> Example:
> 
> Say there are 5 SStables:
> 
> - Customers_0 (10 MB)
> - Customers_1 (10 MB)
> - Customers_2 (10 MB)
> - Customers_3 (10 MB)
> - Customers_4 (30 MB)
> 
> A minor compaction is triggered which will compact the similar sized
> tables 0 to 3. In these tables is a customer record with key "C1" with
> an expired column tombstone. Customer "C1" is also present in table 4.
> Will the minor compaction delete the column (i.e. will the tombstone
> be present in the newly created table)?
> 
> Thanks,
> 
> Rene
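Applying Aaron's rule to Rene's five-SSTable example: here is a toy sketch (an editor's illustration, not Cassandra's actual code) of when a compaction may purge a row's tombstones.

```python
# Toy model (not Cassandra internals): a compaction may purge a row's
# tombstones only if no SSTable outside the compaction also holds a
# fragment of that row.
def can_purge_tombstones(row_key, compacting, all_sstables):
    outside = [s for s in all_sstables if s not in compacting]
    return not any(row_key in s["keys"] for s in outside)

sstables = [
    {"name": "Customers_0", "keys": {"C1"}},
    {"name": "Customers_1", "keys": {"C1"}},
    {"name": "Customers_2", "keys": {"C1"}},
    {"name": "Customers_3", "keys": {"C1"}},
    {"name": "Customers_4", "keys": {"C1"}},  # the 30 MB table left out of the minor compaction
]
minor = sstables[:4]
print(can_purge_tombstones("C1", minor, sstables))     # False: C1 also lives in Customers_4
print(can_purge_tombstones("C1", sstables, sstables))  # True: a major compaction sees every fragment
```

So for Rene's question: the tombstone stays in the newly created table, because Customers_4 also holds row "C1".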



Re: Disk configuration in new cluster node

2012-09-17 Thread aaron morton
>  4 drives for data and 1 drive for commitlog, 
How are you configuring the drives? It's normally best to present one big data 
volume, e.g. using RAID 0, and put the commit log on, say, the system mirror.

> will the node balance out the load on the drives, or is it agnostic to usage 
> of drives underlying data directories?
It will not. 
There is a feature coming in v1.2 to add better support for JBOD 
configurations. 

A word of warning: if you put more than 300GB to 400GB per node you may 
experience issues such as repair, compaction or disaster recovery taking a 
long time. These are simply soft limits that provide a good rule of thumb for 
HDD-based systems with 1 GigE networking.
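Rough arithmetic behind that soft limit (an editor's back-of-the-envelope with an assumed wire speed): even at the full ~125 MB/s of 1 GigE, just re-streaming a node's data takes about an hour per 400GB, and real repair or recovery also reads, hashes and rewrites the data, so wall-clock times grow much faster than this lower bound.

```python
# Back-of-the-envelope only: assumes an ideal 125 MB/s (1 GigE wire speed)
# and ignores the CPU, disk and validation overhead that dominate in practice.
def transfer_hours(gb, mb_per_s=125.0):
    return gb * 1024 / mb_per_s / 3600

print(round(transfer_hours(400), 2))   # ~0.91 h of pure network transfer for 400 GB
print(round(transfer_hours(2000), 1))  # ~4.6 h for a 2 TB node, before any real work
```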

Hope that helps. 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2012, at 7:39 AM, Casey Deccio  wrote:

> I'm building a new "cluster" (to replace the broken setup I've written about 
> in previous posts) that will consist of only two nodes.  I understand that 
> I'll be sacrificing high availability of writes if one of the nodes goes 
> down, and I'm okay with that.  I'm more interested in maintaining high 
> consistency and high read availability.  So I've decided to use a write-level 
> consistency of ALL and read-level consistency of ONE.
> 
> My first question is about the drives in this setup.  If I initially set up 
> the system with, say, 4 drives for data and 1 drive for commitlog, and later 
> I decide to add more capacity to the node by adding more drives for data 
> (adding the new data directory entries in cassandra.yaml), will the node 
> balance out the load on the drives, or is it agnostic to usage of drives 
> underlying data directories?
> 
> My second question has to do with RAID striping.  Would it be more useful to 
> stripe the disk with the commitlog or the disks with the data?  Of course, 
> with a single striped volume for data directories, it would be more difficult 
> to add capacity to the node later, as I've suggested above.
> 
> Casey



Re: Disk configuration in new cluster node

2012-09-17 Thread Robin Verlangen
" A word of warning. If you put more than 300GB to 400GB per node you may
experience some issues  ... "

I think this is probably the "solution" to your multiple-disk problem. You
could easily use one single disk to store the data on, and one disk for the
commitlog. No issues with JBOD, RAID or whatever. If you want to improve
throughput you might consider a RAID-0 setup.

Best regards,

Robin Verlangen
*Software engineer*
*
*
W http://www.robinverlangen.nl
E ro...@us2.nl

Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.



2012/9/17 aaron morton 

>  4 drives for data and 1 drive for commitlog,
>
> How are you configuring the drives ? It's normally best to present one big
> data volume, e.g. using raid 0, and put the commit log on say the system
> mirror.
>
> will the node balance out the load on the drives, or is it agnostic to
> usage of drives underlying data directories?
>
> It will not.
> There is a feature coming in v1.2 to add better support for JBOD
> configurations.
>
> A word of warning. If you put more than 300GB to 400GB per node you may
> experience some issues such as repair, compaction or disaster recovery
> taking a long time. These are simply soft limits that provide a good rule
> of thumb for HDD based systems with 1 GigE networking.
>
> Hope that helps.
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 15/09/2012, at 7:39 AM, Casey Deccio  wrote:
>
> I'm building a new "cluster" (to replace the broken setup I've written
> about in previous posts) that will consist of only two nodes.  I understand
> that I'll be sacrificing high availability of writes if one of the nodes
> goes down, and I'm okay with that.  I'm more interested in maintaining high
> consistency and high read availability.  So I've decided to use a
> write-level consistency of ALL and read-level consistency of ONE.
>
> My first question is about the drives in this setup.  If I initially set
> up the system with, say, 4 drives for data and 1 drive for commitlog, and
> later I decide to add more capacity to the node by adding more drives for
> data (adding the new data directory entries in cassandra.yaml), will the
> node balance out the load on the drives, or is it agnostic to usage of
> drives underlying data directories?
>
> My second question has to do with RAID striping.  Would it be more useful
> to stripe the disk with the commitlog or the disks with the data?  Of
> course, with a single striped volume for data directories, it would be more
> difficult to add capacity to the node later, as I've suggested above.
>
> Casey
>
>
>


Re: minor compaction and delete expired column-tombstones

2012-09-17 Thread Rene Kochen
OK, thanks!

So a column tombstone will only be removed if all row fragments are
present in the tables being compacted.

I have a row called "Index" which contains columns like "page0",
"page1", "page2", etc. Every several minutes, new columns are created
and old ones deleted. The problem is that I now have an "Index" row in
several SSTables, but the column tombstones are never deleted. And
reading the "Index" row (and all its column tombstones) takes longer
and longer.

If I do a major compaction, all tombstones are deleted and reading the
"Index" row takes one millisecond again (and the garbage-collection
issues caused by the tombstones go away).

Is it not advised to use rows with many new column creates/deletes
(because of how minor compactions work)?

Thanks!

Rene
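The slowdown Rene describes can be sketched like this (editor's illustration; the column names and counts are invented): a slice read has to skim every column fragment, tombstones included, before it can return the live columns, so read cost tracks the number of accumulated deletes until compaction purges them.

```python
# Toy model of a slice read over a wide row: tombstones must be scanned
# and skipped even though they contribute nothing to the result.
def slice_read(columns, limit):
    scanned, live = 0, []
    for name, value, is_tombstone in columns:
        scanned += 1
        if not is_tombstone:
            live.append((name, value))
            if len(live) == limit:
                break
    return live, scanned

# An "Index" row where 9000 old "pageN" columns were deleted over time:
row = [("page%d" % i, "data", i < 9000) for i in range(10000)]
live, scanned = slice_read(row, limit=10)
print(len(live), scanned)  # 10 live columns returned, 9010 columns touched
```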

2012/9/17 aaron morton :
> Does minor compaction delete expired column-tombstones when the row is
> also present in another table which is
>
> No.
> Compaction is per Column Family.
>
> Tombstones will be expired by Minor Compaction if all fragments of the row
> are contained in the SSTables being compacted.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 15/09/2012, at 6:32 AM, Rene Kochen  wrote:
>
> Hi all,
>
> Does minor compaction delete expired column-tombstones when the row is
> also present in another table which is not subject to the minor
> compaction?
>
> Example:
>
> Say there are 5 SStables:
>
> - Customers_0 (10 MB)
> - Customers_1 (10 MB)
> - Customers_2 (10 MB)
> - Customers_3 (10 MB)
> - Customers_4 (30 MB)
>
> A minor compaction is triggered which will compact the similar sized
> tables 0 to 3. In these tables is a customer record with key "C1" with
> an expired column tombstone. Customer "C1" is also present in table 4.
> Will the minor compaction delete the column (i.e. will the tombstone
> be present in the newly created table)?
>
> Thanks,
>
> Rene
>
>


Re: Query advice to prevent node overload

2012-09-17 Thread André Cruz
On Sep 17, 2012, at 3:04 AM, aaron morton  wrote:

>> I have a schema that represents a filesystem and one example of a Super CF 
>> is:
> This may help with some ideas
> http://www.datastax.com/dev/blog/cassandra-file-system-design
> 
> In general we advise to avoid Super Columns if possible. They are often 
> slower, and the sub columns are not indexed. Meaning all the sub columns have 
> to be read into memory. 
> 
> 
>> So if I set column_count = 1, as I have now, but fetch 1000 dirs (rows) 
>> and each one happens to have 1 files (columns) the dataset is 1000x1.
> This is the way the query works internally. Multiget is simply a collection 
> of independent gets. 
> 
>  
>> The multiget() is more efficient, but I'm having trouble trying to limit the 
>> size of the data returned in order to not crash the cassandra node.
> Often less is more. I would only ask for a few tens of rows at a time, or try 
> to limit the size of the returned query to a few MBs. Otherwise a lot of 
> data gets dragged through cassandra, the network and finally Python. 
> 
> You may want to consider a CF like the inode CF in the article above, where 
> the parent dir is a column with a secondary index. 

Thanks Aaron! I will take your points into consideration.

Best regards,
André
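Aaron's "few tens of rows at a time" advice amounts to paging the multiget. A sketch of the pattern in Python (`fetch_rows` is a stand-in for whatever client call is used, e.g. a pycassa or Hector multiget; it is an assumption, not a real driver API):

```python
# Sketch of batching a large multiget so only a small slice of data moves
# through Cassandra, the network and the client at any one time.
# `fetch_rows` is an assumed callback, not a real driver API.
def multiget_in_batches(fetch_rows, keys, batch_size=30):
    results = {}
    for i in range(0, len(keys), batch_size):
        results.update(fetch_rows(keys[i:i + batch_size]))
    return results

# Toy in-memory stand-in to show the call pattern:
store = {"dir%d" % i: ["file-a", "file-b"] for i in range(100)}
rows = multiget_in_batches(lambda ks: {k: store[k] for k in ks},
                           sorted(store), batch_size=30)
print(len(rows))  # 100 rows fetched, at most 30 keys per round trip
```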



Re: Repair: Issue in netstats

2012-09-17 Thread B R
Sorry for the delay; been out of the loop.

Could this problem be due to running repair on a node upgraded to 1.0.11
while the other node in the cluster is still at 0.8.x?

On Fri, Sep 7, 2012 at 9:11 PM, Sylvain Lebresne wrote:

> That obviously shouldn't happen and I don't remember any open ticket
> related to that. You might want to open a ticket on jira
> (https://issues.apache.org/jira/browse/CASSANDRA).
>
> --
> Sylvain
>
> On Fri, Sep 7, 2012 at 10:50 AM, B R 
> wrote:
> > We have upgraded a 0.8 cluster to 1.0.11. After upgrading the first node
> > and
> > running upgradesstables, we have run a routine repair operation. This
> > operation has been running for a long time and does not seem to be
> > progressing.
> >
> > Running netstats has shown unexpected values for percentages as shown
> below.
> > Any clue as to what could be the issue?
> >
> > bin/nodetool -h 172.16.0.34 netstats
> > Mode: NORMAL
> > Streaming to: /172.16.0.29
> >/data/cassandra/data/Keyspace1/Standard1-hd-16609-Data.db sections=116
> > progress=19946657796608/334406146 - 5964800%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16618-Data.db sections=116
> > progress=0/179880575 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16620-Data.db sections=12
> > progress=0/1448134 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16616-Data.db sections=116
> > progress=0/350403675 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16602-Data.db sections=89
> > progress=0/27569594 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16615-Data.db sections=1
> > progress=0/95043 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16617-Data.db sections=1
> > progress=0/232800 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16612-Data.db sections=1
> > progress=0/82705 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16603-Data.db sections=116
> > progress=0/724836994 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16607-Data.db sections=116
> > progress=0/401797714 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16608-Data.db sections=2
> > progress=0/301297 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16619-Data.db sections=3
> > progress=0/829914 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16604-Data.db sections=2
> > progress=0/288460 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16610-Data.db sections=13
> > progress=0/1954639 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16606-Data.db sections=8
> > progress=0/1187649 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16613-Data.db sections=1
> > progress=0/141714 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16614-Data.db sections=116
> > progress=0/390168999 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16609-Data.db sections=111
> > progress=13620592201686/303748754 - 4484163%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16618-Data.db sections=110
> > progress=0/162808076 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16620-Data.db sections=10
> > progress=0/1922996 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16616-Data.db sections=111
> > progress=0/350744309 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16602-Data.db sections=87
> > progress=0/24364920 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16615-Data.db sections=2
> > progress=0/228764 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16603-Data.db sections=111
> > progress=0/720722886 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16607-Data.db sections=111
> > progress=0/364643588 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16608-Data.db sections=4
> > progress=0/963207 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16619-Data.db sections=2
> > progress=0/360024 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16604-Data.db sections=1
> > progress=0/72842 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16610-Data.db sections=11
> > progress=0/1381176 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16606-Data.db sections=13
> > progress=0/3266736 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16613-Data.db sections=2
> > progress=0/639705 - 0%
> >/data/cassandra/data/Keyspace1/Standard1-hd-16614-Data.db sections=111
> > progress=0/358443928 - 0%
> >  Nothing streaming from /172.16.0.29
> > Pool NameActive   Pending  Completed
> > Commandsn/a 0 19
> > Responses   n/a 02050444
> >
> > Regards.
>
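The suspicious figures are just `progress/total` rendered as a percentage; checking one of them (editor's arithmetic) shows the progress counter is exactly 59,648 times the section total, i.e. a runaway or corrupted byte counter rather than a display rounding bug.

```python
# netstats prints progress/total as a percentage; the "5964800%" line
# implies the streamed-byte counter overran its ~334 MB section total
# by a factor of exactly 59648.
def stream_pct(progress, total):
    return progress * 100 // total

print(stream_pct(19946657796608, 334406146))  # 5964800 -> the bogus "5964800%"
print(19946657796608 // 334406146)            # 59648x the section's total bytes
print(stream_pct(0, 179880575))               # 0 -> the healthy "0%" lines
```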


Re: Query advice to prevent node overload

2012-09-17 Thread André Cruz
On Sep 17, 2012, at 3:04 AM, aaron morton  wrote:

>> I have a schema that represents a filesystem and one example of a Super CF 
>> is:
> This may help with some ideas
> http://www.datastax.com/dev/blog/cassandra-file-system-design

Could you explain the usage of the "sentinel"? Which nodes have it? I 
understand that it should be used for recursive dir listings, to restrict the 
nodes returned to the "/tmp/" dir, but I'm not sure I understand how it 
works.

Thanks,
André

Re: Many ParNew collections

2012-09-17 Thread Rene Kochen
Thanks Aaron,

I found the problem. It's in this thread: "minor compaction and delete
expired column-tombstones".

The problem was that I have one big row called "Index" which contains
many tombstones. Reading all these tombstones caused the memory
issues.

I think node 1 and 3 have had enough minor compactions so that the
tombstones were removed. The second node still contains several old
SSTables and it takes some time before the whole thing is compacted
again.

Thanks,

Rene
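For reference, the two compaction knobs Aaron mentions in the quoted reply below live in cassandra.yaml; the spellings used in the 1.x line are shown here (a hedged example only; verify option names and defaults against the yaml shipped with your version):

```yaml
# cassandra.yaml (Cassandra 1.x). Lower values trade compaction throughput
# for less memory/CPU pressure during compaction.
concurrent_compactors: 1               # defaults to one per CPU core
in_memory_compaction_limit_in_mb: 32   # default 64; rows above this size use
                                       # the slower two-pass compaction path
```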

2012/9/17 aaron morton :
> The second node (the one suffering from many GC) has a high read
> latency compared to the others. Another thing is that the compacted
> row maximum size is bigger than on the other nodes.
>
> Node 2 also:
> * has about 220MB of data, while the others have about 45MB
> * has about 1 Million keys while the others have about 0.3 Million
>
> - Should the other nodes also have that wide row,
>
> yes. Are you running repair ? What CL are you using ?
>
> - Could repeatedly reading a wide row cause parnew problems?
>
> Maybe. Are you reading the whole thing ?
> It's only 22MB, it's big but not huge.
>
> I would:
>
> * ensure repair is running and completing, this may even out the data load.
> * determine if GC is associate with compactions, repair or general activity.
> * if GC is associated with compactions, the simple thing is to reduce
> concurrent_compactions and in_memory_compaction_limit in the yaml. Note this
> is often a simple / quick fix that can increase IO load and slow down
> compaction. The harder thing is to tune the JVM memory settings (the
> defaults often do a good job).
>
> Hope that helps.
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 14/09/2012, at 10:41 PM, Rene Kochen  wrote:
>
> Thanks Aaron,
>
> At another production site the exact same problems occur (also after
> ~6 months). Here I have a very small cluster of three nodes with
> replication factor = 3.
> One of the three nodes begins to have many long Parnews and high CPU
> load. I upgraded to Cassandra 1.0.11, but the GC problem still
> continues on that node.
>
> If I look at the CFStats of the three nodes, there is one CF which is
> different:
>
> Column Family: Logs
> SSTable count: 1
> Space used (live): 47606705
> Space used (total): 47606705
> Number of Keys (estimate): 338176
> Memtable Columns Count: 22297
> Memtable Data Size: 51542275
> Memtable Switch Count: 1
> Read Count: 189441
> Read Latency: 0,768 ms.
> Write Count: 123411
> Write Latency: 0,035 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 0
> Bloom Filter False Ratio: 0,0
> Bloom Filter Space Used: 721456
> Key cache capacity: 20
> Key cache size: 56685
> Key cache hit rate: 0.9132482658217008
> Row cache: disabled
> Compacted row minimum size: 73
> Compacted row maximum size: 263210
> Compacted row mean size: 94
>
> Column Family: Logs
> SSTable count: 3
> Space used (live): 233688199
> Space used (total): 233688199
> Number of Keys (estimate): 1191936
> Memtable Columns Count: 20147
> Memtable Data Size: 47067518
> Memtable Switch Count: 1
> Read Count: 188473
> Read Latency: 4031,791 ms.
> Write Count: 120412
> Write Latency: 0,042 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 234
> Bloom Filter False Ratio: 0,0
> Bloom Filter Space Used: 2603808
> Key cache capacity: 20
> Key cache size: 5153
> Key cache hit rate: 1.0
> Row cache: disabled
> Compacted row minimum size: 73
> Compacted row maximum size: 25109160
> Compacted row mean size: 156
>
> Column Family: Logs
> SSTable count: 1
> Space used (live): 47714798
> Space used (total): 47714798
> Number of Keys (estimate): 338176
> Memtable Columns Count: 29046
> Memtable Data Size: 66585390
> Memtable Switch Count: 1
> Read Count: 196048
> Read Latency: 1,466 ms.
> Write Count: 127709
> Write Latency: 0,034 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 8
> Bloom Filter False Ratio: 0,00847
> Bloom Filter Space Used: 720496
> Key cache capacity: 20
> Key cache size: 54166
> Key cache hit rate: 0.9833443960960739
> Row cache: disabled
> Compacted row minimum size: 73
> Compacted row maximum size: 263210
> Compacted row mean size: 95
>
> The second node (the one suffering from many GC) has a high read
> latency compared to the others. Another thing is that the compacted
> row maximum size is bigger than on the other nodes.
>
> What puzzles me:
>
> - Should the other nodes also have that wide row, because the
> replication factor is three and I only have three nodes? I must say
> that the wide row is probably the index row which has columns
> added/removed continuously. Maybe the other nodes lost much data
> because of compactions?
> - Could repeatedly reading a wide row cause parnew problems?
>
> Thanks!
>
> Rene
>
> 2012/8/17 aaron morton :
>
> - Cassandra 0.7.10
>
> You _really_ should look at getting up to 1.1 :) Memory management is much
> better and the JVM heap requirements are less.
>
> However, there i

Re: Repair: Issue in netstats

2012-09-17 Thread Sylvain Lebresne
On Mon, Sep 17, 2012 at 11:06 AM, B R  wrote:
> Could this problem be due to running repair on a node upgraded to 1.0.11 but
> the other node in the cluster is still at 0.8.x ?

Yes, repair (like all operations requiring streaming) doesn't work
correctly across major Cassandra versions. The first thing you should do
is finish the upgrade of the nodes.

--
Sylvain

>
> On Fri, Sep 7, 2012 at 9:11 PM, Sylvain Lebresne 
> wrote:
>>
>> That obviously shouldn't happen and I don't remember any open ticket
>> related to that. You might want to open a ticket on jira
>> (https://issues.apache.org/jira/browse/CASSANDRA).
>>
>> --
>> Sylvain
>>
>> On Fri, Sep 7, 2012 at 10:50 AM, B R 
>> wrote:
>> > We have upgraded a 0.8 cluster to 1.0.11. After upgrading the first node
>> > and
>> > running upgradesstables, we have run a routine repair operation, This
>> > operation has been running for a long time and does not seem to be
>> > progressing.
>> >
>> > Running netstats has shown unexpected values for percentages as shown
>> > below.
>> > Any clue as to what could be the issue?
>> >
>> > bin/nodetool -h 172.16.0.34 netstats
>> > Mode: NORMAL
>> > Streaming to: /172.16.0.29
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16609-Data.db
>> > sections=116
>> > progress=19946657796608/334406146 - 5964800%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16618-Data.db
>> > sections=116
>> > progress=0/179880575 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16620-Data.db sections=12
>> > progress=0/1448134 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16616-Data.db
>> > sections=116
>> > progress=0/350403675 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16602-Data.db sections=89
>> > progress=0/27569594 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16615-Data.db sections=1
>> > progress=0/95043 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16617-Data.db sections=1
>> > progress=0/232800 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16612-Data.db sections=1
>> > progress=0/82705 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16603-Data.db
>> > sections=116
>> > progress=0/724836994 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16607-Data.db
>> > sections=116
>> > progress=0/401797714 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16608-Data.db sections=2
>> > progress=0/301297 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16619-Data.db sections=3
>> > progress=0/829914 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16604-Data.db sections=2
>> > progress=0/288460 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16610-Data.db sections=13
>> > progress=0/1954639 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16606-Data.db sections=8
>> > progress=0/1187649 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16613-Data.db sections=1
>> > progress=0/141714 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16614-Data.db
>> > sections=116
>> > progress=0/390168999 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16609-Data.db
>> > sections=111
>> > progress=13620592201686/303748754 - 4484163%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16618-Data.db
>> > sections=110
>> > progress=0/162808076 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16620-Data.db sections=10
>> > progress=0/1922996 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16616-Data.db
>> > sections=111
>> > progress=0/350744309 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16602-Data.db sections=87
>> > progress=0/24364920 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16615-Data.db sections=2
>> > progress=0/228764 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16603-Data.db
>> > sections=111
>> > progress=0/720722886 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16607-Data.db
>> > sections=111
>> > progress=0/364643588 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16608-Data.db sections=4
>> > progress=0/963207 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16619-Data.db sections=2
>> > progress=0/360024 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16604-Data.db sections=1
>> > progress=0/72842 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16610-Data.db sections=11
>> > progress=0/1381176 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16606-Data.db sections=13
>> > progress=0/3266736 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16613-Data.db sections=2
>> > progress=0/639705 - 0%
>> >/data/cassandra/data/Keyspace1/Standard1-hd-16614-Data.db
>> > sections=111
>> > progress=0/358443928 - 0%
>> >  Nothing streaming from /172.16.0.29
>> > Pool NameActive   Pending  Completed
>> > Commandsn/a 0 19
>> > Responses   n/a 02050444
>> >
>> > Regards.
>
>


Re: cassandra/hadoop BulkOutputFormat failures

2012-09-17 Thread Brian Jeltema
As suggested, it was a version-skew problem. 

Thanks.

Brian

On Sep 14, 2012, at 11:34 PM, Jeremy Hanna wrote:

> A couple of guesses:
> - are you mixing versions of Cassandra?  Streaming differences between 
> versions might throw this error.  That is, are you bulk loading with one 
> version of Cassandra into a cluster that's a different version?
> - (shot in the dark) is your cluster overwhelmed for some reason?
> 
> If the temp dir hasn't been cleaned up yet, you are able to retry, fwiw.
> 
> Jeremy
> 
> On Sep 14, 2012, at 1:34 PM, Brian Jeltema  
> wrote:
> 
>> I'm trying to do a bulk load from a Cassandra/Hadoop job using the 
>> BulkOutputFormat class.
>> It appears that the reducers are generating the SSTables, but is failing to 
>> load them into the cluster:
>> 
>> 12/09/14 14:08:13 INFO mapred.JobClient: Task Id : 
>> attempt_201208201337_0184_r_04_0, Status : FAILED
>> java.io.IOException: Too many hosts failed: [/10.4.0.6, /10.4.0.5, 
>> /10.4.0.2, /10.4.0.1, /10.4.0.3, /10.4.0.4] 
>>   at 
>> org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:242)
>>   at 
>> org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:207)
>>   at 
>> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:579)
>>   at 
>> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
>>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255) 
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:396)   
>>   at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>   at org.apache.hadoop.mapred.Child.main(Child.java:249)  
>> 
>> A brief look at the BulkOutputFormat class shows that it depends on 
>> SSTableLoader. My Hadoop cluster
>> and my Cassandra cluster are co-located on the same set of machines. I 
>> haven't found any stated restrictions,
>> but does this technique only work if the Hadoop cluster is distinct from the 
>> Cassandra cluster? Any suggestions
>> on how to get past this problem?
>> 
>> Thanks in advance.
>> 
>> Brian
> 
> 



Cassandra Messages Dropped

2012-09-17 Thread Michael Theroux
Hello,

While under load, we have occasionally been seeing "messages dropped" errors in 
our Cassandra log.  Doing some research, I understand this is part of 
Cassandra's design to shed load, and we should look at the tpstats-like output 
to determine what should be done to resolve the situation.  Typically, you will 
see lots of messages blocked or pending, which might indicate that a 
specific piece of hardware needs to be improved/tuned/upgraded.  

However, looking at the output we are getting, I'm finding it difficult to see 
what needs to be tuned, as it looks to me like Cassandra is handling the load 
within the mutation stage:

INFO [ScheduledTasks:1] 2012-09-17 06:28:03,266 MessagingService.java (line 658) 
119 MUTATION messages dropped in last 5000ms
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,645 StatusLogger.java (line 57) 
Pool NameActive   Pending   Blocked
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,836 StatusLogger.java (line 72) 
ReadStage 3 3 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
RequestResponseStage  0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
ReadRepairStage   0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
MutationStage 0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) 
ReplicateOnWriteStage 0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) 
GossipStage   0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
AntiEntropyStage  0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
MigrationStage0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
StreamStage   0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
MemtablePostFlusher   1 5 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
FlushWriter   1 5 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
MiscStage 0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
commitlog_archiver0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) 
InternalResponseStage 0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) 
AntiEntropySessions   0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 72) 
HintedHandoff 0 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 77) 
CompactionManager 0 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 89) 
MessagingServicen/a   0,0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 99) 
Cache Type Size Capacity   
KeysToSave Provider
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 100) 
KeyCache2184533  2184533
  all 
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 106) 
RowCache  00
  all  org.apache.cassandra.cache.SerializingCacheProvider
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 113) 
ColumnFamilyMemtable ops,data
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 116) 
system.NodeIdInfo 0,0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) 
system.IndexInfo  0,0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) 
system.LocationInfo   0,0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) 
system.Versions   0,0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) 
system.schema_keyspaces   0,0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) 
system.Migrations 0,0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) 
system.schema_columnfamilies 0,0
 IN

Re: Composite Column Types Storage

2012-09-17 Thread Ravikumar Govindarajan
Yes Aaron, I was not clear about Bloom Filters. I was thinking about the
column bloom filters when I specify an absolute value for Part1 of the
composite column and a start/end value for Part2 of the composite column

It is slowly dawning on me that I need a super-column to use column blooms
effectively, while at the same time I don't want the entire sub-column list
deserialized.

In fact, for my use-case I also do not need a column sampling index. Rather
I would much prefer a multi-level skip-list

Is there a way to customize how Cassandra writes/reads its key/column
indexes to SSTables? Any hooks/APIs available as of now would be
greatly helpful.

On Fri, Sep 14, 2012 at 10:33 AM, aaron morton wrote:

> Range queries do not use bloom filters.
>
> Are you talking about row range queries ? Or a slice of columns in a row ?
>
> If you are getting a slice of columns from a single row, a bloom filter is
> used to locate the row.
> If you are getting a slice of columns from a range of rows, the bloom
> filter is used to locate the first row. After that is a scan.
>
> There are also row-level bloom filters for the columns in a row. These are
> used when you fetch columns by name. If you are doing a slice with a start,
> the bloom filter is not used; instead the row-level column index is used (if
> present).
>
> Hope that helps.
>
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13/09/2012, at 2:30 AM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
> Thanks for the clarification. Even though compression solves the disk space
> issue, we might still have Memtable bloat, right?
>
> There is another issue to be handled for us. The queries are always going
> to be range queries with absolute match on part1 and range on part 2 of the
> composite columns
>
> Ex: Query
>
> Range queries do not use bloom filters. It holds good for
> composite-columns also right? I believe I will end up writing BF bytes only
> to skip it later.
>
> If sharing had been possible, then  alone could have gone
> into the bloom-filter, speeding up my queries really effectively.
>
> But as I understand, there are many levels of nesting possible in a
> composite type and casing at every level is a big task
>
> May be casing for the top-level or the first-part should be a good start?
>
> --
> Ravi
>
> On Wed, Sep 12, 2012 at 5:46 PM, Sylvain Lebresne wrote:
>
>> > Is every / combination stored separately in disk
>>
>> Yes, each combination is stored separately on disk (the storage engine
>> itself doesn't have special casing for composite column, at least not
>> yet). But as far as disk space is concerned, I suspect that sstable
>> compression makes this largely a non issue.
>>
>> --
>> Sylvain
>>
>
>
>
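
Aaron's description above maps onto the classic bloom-filter membership test: each
SSTable keeps a bit array, and a key lookup only touches SSTables whose filter says
the key might be present. Below is a toy sketch of that idea in Java (my own
illustrative code and hash function, not Cassandra's murmur-based implementation):

```java
import java.util.BitSet;

// Toy per-SSTable bloom filter. Before reading an SSTable for a row key,
// the filter is consulted: a negative answer guarantees the key is absent,
// so the disk read can be skipped; a positive answer means "maybe present".
public class BloomSketch {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    public BloomSketch(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derive the i-th bit position for a key (illustrative hash, not murmur).
    private int position(byte[] key, int i) {
        int h = i * 0x9E3779B9;
        for (byte b : key) h = h * 31 + b;
        return Math.floorMod(h, size);
    }

    public void add(byte[] key) {
        for (int i = 0; i < hashes; i++) bits.set(position(key, i));
    }

    // false => key definitely not in this SSTable; true => possibly present.
    public boolean mightContain(byte[] key) {
        for (int i = 0; i < hashes; i++)
            if (!bits.get(position(key, i))) return false;
        return true;
    }

    public static void main(String[] args) {
        BloomSketch filter = new BloomSketch(1 << 16, 3);
        filter.add("row-1".getBytes());
        System.out.println(filter.mightContain("row-1".getBytes())); // true
        System.out.println(filter.mightContain("row-x".getBytes())); // almost certainly false
    }
}
```

The false-positive rate is tuned by the bit-array size and hash count; Cassandra
sizes its filters from the expected number of keys in the SSTable.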


Astyanax error

2012-09-17 Thread A J
Hello,

I am trying to retrieve a list of column names (that are defined as
Integer) from a CF with an Integer RowKey as well. (I don't care about
the column values, which are just nulls.)

Following is a snippet of my Astyanax code. I am getting 0 columns, but I
know the key I am querying contains a few hundred columns. Any
idea what part of the code below is incorrect?

Thanks.

Astyanax code:

ColumnFamily<Integer, Integer> CF1 =
new ColumnFamily<Integer, Integer>(
"CF1", // Column Family Name
IntegerSerializer.get(),   // Key Serializer
IntegerSerializer.get());  // Column Serializer

//Reading data
int NUM_EVENTS = 9;

StopWatch clock = new StopWatch();
clock.start();
for (int i = 0; i < NUM_EVENTS; ++i) {
ColumnList<Integer> result = keyspace.prepareQuery(CF1)
.getKey(1919)
.execute().getResult();
System.out.println( "results are: " + result.size() );
}
clock.stop();



CF definition:
===
[default@ks1] describe CF1;
ColumnFamily: CF1
  Key Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.IntegerType


Re: Disk configuration in new cluster node

2012-09-17 Thread Casey Deccio
On Mon, Sep 17, 2012 at 1:19 AM, aaron morton wrote:

>  4 drives for data and 1 drive for commitlog,
>
> How are you configuring the drives ? It's normally best to present one big
> data volume, e.g. using raid 0, and put the commit log on say the system
> mirror.
>
>
Given the advice to use a single RAID 0 volume, I think that's what I'll
do.  By system mirror, you are referring to the volume on which the OS is
installed?  Should the volume with the commit log also have multiple disks
in a RAID 0 volume?  Alternatively, would a RAID 1 setup be reasonable for
the system volume/OS, so the system itself can be resilient to disk
failure, or would that kill commit performance?

Any preference to hardware RAID 0 vs. using something like mdadm?

A word of warning. If you put more than 300GB to 400GB per node you may end
> experience some issues such as repair, compaction or disaster recovery
> taking a long time. These are simply soft limits that provide a good rule
> of thumb for HDD based systems with 1 GigE networking.
>

Hmm.  My hope was to be able to run a minimal number of nodes and maximize
their capacity because it doesn't make sense in my case to build or
maintain a large cluster.  I wanted to run a two-node setup (RF=1, RCL=ONE,
WCL=ALL), each with several disks having large capacity, totaling 10 - 12
TB.  Is this (another) bad idea?

Casey


Re: minor compaction and delete expired column-tombstones

2012-09-17 Thread Josep Blanquer
We've run exactly into the same problem recently. Some specific keys in a
couple CFs accumulate a fair amount of column churn over time.

Pre Cassandra 1.x we scheduled full compactions often to purge them.
However, when we moved to 1.x we adopted the recommended practice of
avoiding full compactions. The problem took a while to manifest itself, but
over the course of several weeks (a few months) of not doing full compactions
the load on those services slowly increased... and despite having
everything monitored, it was not trivial to find out that it was the
accumulation of tombstones on 'some' keys, for 'some' CFs in the cluster
that was really causing the long latencies and CPU spikes (high CPU is a
typical signature of having a fair amount of tombstones in the SSTables).

Is there any JIRA or enhancement to detect when certain
column tombstones can be deleted in minor compactions? Might the new
introduction of SSTable min-max timestamps help? Or perhaps there are new ones
coming up that I'm not aware of...

I'm saying this because there is absolutely no way (that I know of) to find
out or monitor when Cassandra encounters many column tombstones when doing
searches. That alone could help detect these cases so one can change the
data model and/or realize that full compactions are needed. For example, a new
metric at the CF level that tracks the % of tombstones read per row (ideally a
histogram based on row size), or perhaps spitting something out in the logs (a
la the MySQL slow-query log) when a wide row is read and a certain % of
tombstone columns is encountered... this alone could be a huge help in at
least detecting the latent problem.

...what we had to do to fully debug and understand the issue was to build
some tools that scanned SSTables and provided some of those stats. In a
large cluster that is painful to do.

Anyway, just wanted to chime in the thread to provide our input in the
matter.

Cheers,

Josep M.

On Mon, Sep 17, 2012 at 2:01 AM, Rene Kochen
wrote:

> OK, thanks!
>
> So a column tombstone will only be removed if all row fragments are
> present in the tables being compacted.
>
> I have a row called "Index" which contains columns like "page0",
> "page1", "page2", etc. Every several minutes, new columns are created
> and old ones deleted. The problem is that I now have an "Index" row in
> several SSTables, but the column tombstones are never deleted. And
> reading the "Index" row (and all its column tombstones) takes longer
> and longer.
>
> If I do a major compaction, all tombstones are deleted and reading the
> "index" row takes one millisecond again (and all the garbage-collect
> issues because of this).
>
> Is it not advised to use rows with many new column creates/deletes
> (because of how minor compactions work)?
>
> Thanks!
>
> Rene
>
> 2012/9/17 aaron morton :
> > Does minor compaction delete expired column-tombstones when the row is
> > also present in another table which is
> >
> > No.
> > Compaction is per Column Family.
> >
> > Tombstones will be expired by Minor Compaction if all fragments of the
> row
> > are contained in the SSTables being compacted.
> >
> > Cheers
> >
> > -
> > Aaron Morton
> > Freelance Developer
> > @aaronmorton
> > http://www.thelastpickle.com
> >
> > On 15/09/2012, at 6:32 AM, Rene Kochen  wrote:
> >
> > Hi all,
> >
> > Does minor compaction delete expired column-tombstones when the row is
> > also present in another table which is not subject to the minor
> > compaction?
> >
> > Example:
> >
> > Say there are 5 SStables:
> >
> > - Customers_0 (10 MB)
> > - Customers_1 (10 MB)
> > - Customers_2 (10 MB)
> > - Customers_3 (10 MB)
> > - Customers_4 (30 MB)
> >
> > A minor compaction is triggered which will compact the similar sized
> > tables 0 to 3. In these tables is a customer record with key "C1" with
> > an expired column tombstone. Customer "C1" is also present in table 4.
> > Will the minor compaction delete the column (i.e. will the tombstone
> > be present in the newly created table)?
> >
> > Thanks,
> >
> > Rene
> >
> >
>
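
The rule Aaron states above, that a tombstone survives a minor compaction unless
every SSTable holding a fragment of the row participates in that compaction, can
be expressed as a small predicate. This is purely illustrative code (not
Cassandra's), modeling each SSTable as the set of row keys it contains:

```java
import java.util.List;
import java.util.Set;

// Sketch of the minor-compaction purge rule: an expired tombstone for a row
// may be dropped only if every SSTable containing a fragment of that row is
// part of the compaction. Otherwise the tombstone must be rewritten, or the
// older data in the untouched SSTable would "resurrect" on reads.
public class TombstonePurge {
    public static boolean canPurge(String rowKey,
                                   List<Set<String>> compacting,
                                   List<Set<String>> allSSTables) {
        for (Set<String> sstable : allSSTables) {
            if (sstable.contains(rowKey) && !compacting.contains(sstable)) {
                return false; // a fragment of the row lives outside this compaction
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> customers0 = Set.of("C1", "C2");
        Set<String> customers4 = Set.of("C1", "C9");
        List<Set<String>> all = List.of(customers0, customers4);

        // Only customers0 is compacted: C1 also lives in customers4.
        System.out.println(canPurge("C1", List.of(customers0), all));             // false
        // Both tables compacted: the tombstone can finally be dropped.
        System.out.println(canPurge("C1", List.of(customers0, customers4), all)); // true
    }
}
```

Applied to Rene's example: compacting Customers_0..3 while "C1" also lives in
Customers_4 yields false, so the tombstone is carried into the new SSTable, which
is exactly the accumulation he observes.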


Re: minor compaction and delete expired column-tombstones

2012-09-17 Thread Sylvain Lebresne
> Is there any JIRA or enhancement to perhaps be able to detect when certain
> column tombstones can be deleted in minor compactions? The new introduction
> of SSTable min-max timestamps might help? or perhaps there are new ones
> coming up that I'm not aware of 

https://issues.apache.org/jira/browse/CASSANDRA-4671

--
Sylvain


Re: Astyanax error

2012-09-17 Thread A J
I traced this to the misnomer of the Integer datatype in Cassandra.
IntegerType in Cassandra is in fact a variable-length BigInteger. Changing
it to Int32Type solved the issue.
https://github.com/Netflix/astyanax/issues/59
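
The encoding mismatch behind this is easy to see with the plain JDK: IntegerType
stores a variable-length two's-complement value (the same bytes that
BigInteger.toByteArray() yields), while Int32Type is a fixed 4-byte big-endian
int. A quick standalone illustration (no Astyanax needed):

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.Arrays;

// IntegerType compares variable-length two's-complement bytes, so the key
// 1919 is stored as 2 bytes; a serializer that emits a fixed 4-byte int
// produces a different byte sequence and therefore never matches the row.
public class IntegerEncodings {
    public static void main(String[] args) {
        byte[] varint = BigInteger.valueOf(1919).toByteArray();      // IntegerType-style
        byte[] int32  = ByteBuffer.allocate(4).putInt(1919).array(); // Int32Type-style

        System.out.println(Arrays.toString(varint)); // [7, 127]       (2 bytes)
        System.out.println(Arrays.toString(int32));  // [0, 0, 7, 127] (4 bytes)
        System.out.println(Arrays.equals(varint, int32)); // false -> key bytes differ
    }
}
```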



On Mon, Sep 17, 2012 at 10:51 AM, A J  wrote:
> Hello,
>
> I am trying to retrieve a list of column names (that are defined as
> Integer) from a CF with an Integer RowKey as well. (I don't care about
> the column values, which are just nulls.)
>
> Following is a snippet of my Astyanax code. I am getting 0 columns, but I
> know the key I am querying contains a few hundred columns. Any
> idea what part of the code below is incorrect?
>
> Thanks.
>
> Astyanax code:
> 
> ColumnFamily<Integer, Integer> CF1 =
> new ColumnFamily<Integer, Integer>(
> "CF1", // Column Family Name
> IntegerSerializer.get(),   // Key Serializer
> IntegerSerializer.get());  // Column Serializer
>
> //Reading data
> int NUM_EVENTS = 9;
>
> StopWatch clock = new StopWatch();
> clock.start();
> for (int i = 0; i < NUM_EVENTS; ++i) {
> ColumnList<Integer> result = keyspace.prepareQuery(CF1)
> .getKey(1919)
> .execute().getResult();
> System.out.println( "results are: " + result.size() );
> }
> clock.stop();
>
>
>
> CF definition:
> ===
> [default@ks1] describe CF1;
> ColumnFamily: CF1
>   Key Validation Class: org.apache.cassandra.db.marshal.IntegerType
>   Default column value validator: 
> org.apache.cassandra.db.marshal.BytesType
>   Columns sorted by: org.apache.cassandra.db.marshal.IntegerType


persistent compaction issue (1.1.4 and 1.1.5)

2012-09-17 Thread Michael Kjellman
Hi All,

I have an issue where each one of my nodes (currently all running 1.1.5) is 
reporting around 30,000 pending compactions. I understand that a pending 
compaction doesn't necessarily mean it is a scheduled task, but I'm confused 
about why this behavior is occurring. It is the same on all nodes: the count 
occasionally drops by 5k pending compaction tasks, and then returns to 
25,000-35,000 tasks pending.

I have tried repair and scrub operations on two of the nodes, and while 
compactions initially happen, the number of pending compactions does not 
decrease.

Any ideas? Thanks for your time.

Best,
michael






Bloom Filters in Cassandra

2012-09-17 Thread Bill Hastings
How are bloom filters used in Cassandra? Is my understanding correct
in that there is one per SSTable encapsulating what keys are in the
SSTable? Please advise.


Is Cassandra right for me?

2012-09-17 Thread Marcelo Elias Del Valle
Hello,

 I am new to Cassandra and I am unsure whether Cassandra is the right
technology to use in the architecture I am defining. Also, I saw a
presentation which said that if I don't have rows with more than a hundred
columns in Cassandra, then either I am doing something wrong or I shouldn't be
using Cassandra. Therefore, it might be the case that I am doing something
wrong. If you could help me find the answers to these questions by
giving any feedback, it would be highly appreciated.
 Here is my need and what I am thinking in using Cassandra for:

   - I need to support a high volume of writes per second. I might have a
   billion writes per hour
   - I need to write non-structured data that will be processed later by
   hadoop processes to generate structured data from it. Later, I index the
   structured data using SOLR or SOLANDRA, so the data can be consulted by my
   end user application. Is Cassandra recommended for that, or should I be
   thinking in writting directly to HDFS files, for instance? What's the main
   advantage I get from storing data in a nosql service like Cassandra, when
   compared to storing files into HDFS?
   - Usually I will write json data associated to an ID and my hadoop
   processes will process this data to write data to a database. I have two
   doubts here:
   - If I don't need to perform complicated queries in Cassandra, should
  I store the json-like data just as a column value? I am afraid of doing
  something wrong here, as I would just need to store the json file and
  some 5 or 6 more fields to query the files later.
  - Does it make sense to you to use hadoop to process data from
  Cassandra and store the results in a database, like HBase? Once I have
  structured data, is there any reason I should use Cassandra instead of
  HBase?

 I am sorry if the questions are too basic; I have been watching a lot
of videos and reading a lot of documentation about Cassandra, but honestly,
the more I read, the more questions I have.

Thanks in advance.

Best regards,
-- 
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr


are counters stable enough for production?

2012-09-17 Thread Bartłomiej Romański
Hi,

Does anyone have any experience with using Cassandra counters in production?

We rely heavily on them, and recently we've had a few very serious
problems. Our counter values suddenly became a few times higher than
expected. From the business point of view this is a disaster :/ Also,
there are a few open major bugs related to them, some of them open for
quite a long time (months).

We are seriously considering going back to other solutions (e.g. SQL
databases). We simply cannot afford incorrect counter values. We can
tolerate losing a few increments from time to time, but we cannot
tolerate having counters suddenly 3 times higher or lower than the
expected values.

What is the current status of counters? Should I consider them a
production-ready feature and assume we just had some bad luck? Or should I
rather consider them an experimental feature and look for other
solutions?

Do you have any experiences with them? Any comments would be very
helpful for us!

Thanks,
Bartek


Re: Query advice to prevent node overload

2012-09-17 Thread aaron morton
> Could you explain the usage of the "sentinel"?
Queries that use a secondary index must include an equality clause. That's what 
the sentinel is there for…

> select filename from inode where filename > ‘/tmp’ and filename < ‘/tmq’ and 
> sentinel = ‘x’;

Cheers 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/09/2012, at 9:17 PM, André Cruz  wrote:

> On Sep 17, 2012, at 3:04 AM, aaron morton  wrote:
> 
>>> I have a schema that represents a filesystem and one example of a Super CF 
>>> is:
>> This may help with some ideas
>> http://www.datastax.com/dev/blog/cassandra-file-system-design
> 
> Could you explain the usage of the "sentinel"? Which nodes have it? I 
> understand that it should be used for recursive dir listings, to restrict the 
> nodes returned to the "/tmp/" dir, but I'm not sure I understand how it 
> works
> 
> Thanks,
> André



Re: Cassandra Messages Dropped

2012-09-17 Thread aaron morton
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
> MemtablePostFlusher   1 5 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
> FlushWriter   1 5 0
Looks suspiciously like 
http://mail-archives.apache.org/mod_mbox/cassandra-user/201209.mbox/%3c9fb0e801-b1ed-41c4-9939-bafbddf15...@thelastpickle.com%3E

What version are you on ? 

Are there any ERROR log messages before this ? 

Are you seeing MutationStage back up ? 

Are you see log messages from GCInspector ?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/09/2012, at 2:16 AM, Michael Theroux  wrote:

> Hello,
> 
> While under load, we have occasionally been seeing "messages dropped" errors 
> in our cassandra log.  Doing some research, I understand this is part of 
> Cassandra's design to shed load, and we should look at the tpstats-like 
> output to determine what should be done to resolve the situation.  Typically, 
> you will see lots of messages blocked or pending, and that might be an 
> indicator that a specific part of hardware needs to be 
> improved/tuned/upgraded.  
> 
> However, looking at the output we are getting, I'm finding it difficult to 
> see what needs to be tuned, as it looks to me cassandra is handling the load 
> within the mutation stage:
> 
> NFO [ScheduledTasks:1] 2012-09-17 06:28:03,266 MessagingService.java (line 
> 658) 119 MUTATION messages dropped in last 5000ms
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,645 StatusLogger.java (line 57) 
> Pool NameActive   Pending   Blocked
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,836 StatusLogger.java (line 72) 
> ReadStage 3 3 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
> RequestResponseStage  0 0 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
> ReadRepairStage   0 0 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
> MutationStage 0 0 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) 
> ReplicateOnWriteStage 0 0 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) 
> GossipStage   0 0 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
> AntiEntropyStage  0 0 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
> MigrationStage0 0 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
> StreamStage   0 0 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
> MemtablePostFlusher   1 5 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
> FlushWriter   1 5 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
> MiscStage 0 0 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
> commitlog_archiver0 0 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) 
> InternalResponseStage 0 0 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) 
> AntiEntropySessions   0 0 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 72) 
> HintedHandoff 0 0 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 77) 
> CompactionManager 0 0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 89) 
> MessagingServicen/a   0,0
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 99) 
> Cache Type Size Capacity   
> KeysToSave Provider
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 100) 
> KeyCache2184533  2184533  
> all 
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 106) 
> RowCache  00  
> all  org.apache.cassandra.cache.SerializingCacheProvider
> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 113) 
> ColumnFamilyMemtable ops,data
> INFO [ScheduledTasks:1] 2

Re: Cassandra Messages Dropped

2012-09-17 Thread Michael Theroux
Thanks for the response.

We are on version 1.1.2.  We don't see the MutationStage backing up: the dump 
from the messages-dropped error doesn't show a backup, and watching 
"nodetool tpstats" doesn't show any backup there either.

nodetool info also shows we have over a gig of available memory on the JVM heap 
of each node.

The earliest GCInspector traces I see before one of the more recent incidents 
in which messages were dropped are:

INFO [ScheduledTasks:1] 2012-09-18 02:25:53,928 GCInspector.java (line 
122) GC for ParNew: 396 ms for 1 collections, 2064505088 used; max is 4253024256
 
INFO [ScheduledTasks:1] 2012-09-18 02:25:55,929 GCInspector.java (line 
122) GC for ParNew: 485 ms for 1 collections, 1961875064 used; max is 4253024256
 
INFO [ScheduledTasks:1] 2012-09-18 02:25:57,930 GCInspector.java (line 
122) GC for ParNew: 265 ms for 1 collections, 1968074096 used; max is 4253024256

But this was 45 minutes before messages were dropped.

It's appreciated,
-Mike
 
On Sep 17, 2012, at 11:27 PM, aaron morton wrote:

>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
>> MemtablePostFlusher   1 5 0
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
>> FlushWriter   1 5 0
> Looks suspiciously like 
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201209.mbox/%3c9fb0e801-b1ed-41c4-9939-bafbddf15...@thelastpickle.com%3E
> 
> What version are you on ? 
> 
> Are there any ERROR log messages before this ? 
> 
> Are you seeing MutationStage back up ? 
> 
> Are you see log messages from GCInspector ?
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 18/09/2012, at 2:16 AM, Michael Theroux  wrote:
> 
>> Hello,
>> 
>> While under load, we have occasionally been seeing "messages dropped" errors 
>> in our cassandra log.  Doing some research, I understand this is part of 
>> Cassandra's design to shed load, and we should look at the tpstats-like 
>> output to determine what should be done to resolve the situation.  
>> Typically, you will see lots of messages blocked or pending, and that might 
>> be an indicator that a specific part of hardware needs to be 
>> improved/tuned/upgraded.  
>> 
>> However, looking at the output we are getting, I'm finding it difficult to 
>> see what needs to be tuned, as it looks to me cassandra is handling the load 
>> within the mutation stage:
>> 
>> NFO [ScheduledTasks:1] 2012-09-17 06:28:03,266 MessagingService.java (line 
>> 658) 119 MUTATION messages dropped in last 5000ms
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,645 StatusLogger.java (line 57) 
>> Pool NameActive   Pending   Blocked
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,836 StatusLogger.java (line 72) 
>> ReadStage 3 3 0
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
>> RequestResponseStage  0 0 0
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
>> ReadRepairStage   0 0 0
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) 
>> MutationStage 0 0 0
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) 
>> ReplicateOnWriteStage 0 0 0
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) 
>> GossipStage   0 0 0
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
>> AntiEntropyStage  0 0 0
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
>> MigrationStage0 0 0
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
>> StreamStage   0 0 0
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) 
>> MemtablePostFlusher   1 5 0
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
>> FlushWriter   1 5 0
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
>> MiscStage 0 0 0
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) 
>> commitlog_archiver0 0 0
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) 
>> InternalResponseStage 0 0 0
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) 
>> AntiEntropySessions   0 0 0
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 72) 
>> HintedHandoff

Re: Stream definition is lost after server restart

2012-09-17 Thread Ishan Thilina
Sorry,

Forgot to mention that I'm using Cassandra 1.1.3
--
Thank you..!
-
071-6372089

Ishan's info: www.ishans.info
මගේ සටහන්: www.siblog.ishans.info
Ishan's way: www.blog.ishans.info
-



On Mon, Sep 17, 2012 at 9:32 PM, Ishan Thilina  wrote:

> Hi all,
>
> I am currently working on a project which uses Cassandra. I have a task
> running in my server which will periodically look at a certain set of
> pre-defined data (of the server) and writes them to Cassandra. The
> procedure for this work is as follows.
>
> 1. I give a name and a version to the task.
>
> 2. I configure what data should the task monitor.
>
> 3. The task will then look if a stream definition exists for the task
> using the task name and its version.
>
> 4. If a definition does not exist, then the task will create a definition
> (By looking at the types of data to be monitored).
>
> 5. Then (or if a stream definition exists) the task will write the data to
> Cassandra
>
> 6. The task will repeat the steps 3 to 5 forever (even after server
> restart).
>
>
> Please note that there can be multiple tasks like this monitoring
> different sets of data.
>
>
> The problem occurs when the server has been used for a few days and several
> (around 100) stream definitions have been created: I have observed that after
> the server is restarted, a "stream definition does not exist" exception is
> thrown in step 3. I manually checked, and the stream definition actually exists.
>
> When a new server is used (with a clean Cassandra install), everything
> works fine for a few days. But most of the time, after a few days, the same
> issue arises.
>
> Has anyone experienced this..?
>
>
>
> --
> Thank you..!
> -
> 071-6372089
>
> Ishan's info: www.ishans.info
> මගේ සටහන්: www.siblog.ishans.info
> Ishan's way: www.blog.ishans.info
> -
>
>


HTimedOutException and cluster not working

2012-09-17 Thread Jason Wee
Hello,

A bit of context on our environment: we have a cluster of 9 nodes with a few
keyspaces. The client writes to the cluster with a consistency level of ONE to
a keyspace with a replication factor of 3. The Hector client
is configured with all the nodes in the cluster specified, so that for any
write request two nodes can fail and the write still succeeds as long as one
node accepts it.

However, under certain situations we see HTimedOutException
logged during writes to the cluster. The Hector client thus fails over to the
next node in the cluster, but what we noticed is that the same exception,
HTimedOutException, is logged for all the nodes. The result is that the
cluster is not working as a whole. Naturally, we checked all the nodes in
the cluster for load. Only node-3 seems to have a high pending MutationStage
when nodetool tpstats is run. The other nodes are fine, with 0 active and 0
pending for all stages.

./nodetool -h localhost tpstats
Pool Name              Active   Pending   Completed  Blocked  All time blocked
ReadStage                   0         0        6983        0                 0
RequestResponseStage        0         0  1252368951        0                 0
MutationStage              16   2177067   879092633        0                 0
ReadRepairStage             0         0     3648106        0                 0
ReplicateOnWriteStage       0         0    33722610        0                 0
GossipStage                 0         0    20504608        0                 0
AntiEntropyStage            0         0        1197        0                 0
MigrationStage              0         0          89        0                 0
MemtablePostFlusher         0         0        5659        0                 0
StreamStage                 0         0         296        0                 0
FlushWriter                 0         0        5616        0              1321
MiscStage                   0         0        5964        0                 0
AntiEntropySessions         0         0          88        0                 0
InternalResponseStage       0         0          27        0                 0
HintedHandoff               1         2        5976        0                 0

Message type      Dropped
RANGE_SLICE             0
READ_REPAIR             0
BINARY                  0
READ                  178
MUTATION            17467
REQUEST_RESPONSE        0

We proceeded to check whether any compaction was running on node-3 and found
the following:

./nodetool -h localhost compactionstats
pending tasks: 196
compaction type  keyspace    column family  bytes compacted  bytes total  progress
Cleanup          MyKeyspace  MyCF              6946398685   10230720119    67.90%


Questions:

* With a replication factor of 3 in the keyspace and a client write
  consistency level of ONE, given the current Hector client and cluster
  settings, should it be possible in this scenario for the write to succeed
  on one of the other nodes even though node-3 is too busy or failing for
  whatever reason?

* When the Hector client fails over to the other nodes, basically all the
  nodes fail. Why is this so?

* What factors increase the MutationStage active and pending values?
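On the first question: with RF=3 and consistency level ONE, a write needs only one replica acknowledgement, so two replica failures should still be survivable; the coordinator can nonetheless time out if all replicas responsible for a given key are backed up. A small sketch of the standard replica math (function names are illustrative):

```python
def required_acks(level, rf):
    """Replica acknowledgements needed for a write at the given
    consistency level (standard Cassandra semantics)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # majority of replicas
    if level == "ALL":
        return rf
    raise ValueError("unknown consistency level: %s" % level)


def write_can_succeed(level, rf, failed_replicas):
    """True if the surviving replicas can still satisfy the level."""
    return rf - failed_replicas >= required_acks(level, rf)
```

So with RF=3 at ONE, two failed replicas are tolerable but three are not; and note that "failed" here means replicas for that key's token range, which is why one overloaded node can affect writes routed through any coordinator.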

Thank you for any comments and insight.

Regards,
Jason


Lock on Cassandra ---- using bakery algo

2012-09-17 Thread Yang
https://github.com/yangyangyyy/cassandra/commit/c98795333c9c5252c9fb261aea0b7becf5b60da6

This has been described on the wiki. I recently needed such a thing, so I
implemented it in the commit above and would like to contribute it to the
community.

Note that this uses a simplified version of the Bakery algorithm that
provides no fairness, as the MAX() is replaced by a random choice.

Pardon my beginner Python usage :)
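As a rough illustration of the simplified scheme (random ticket instead of MAX(), hence no fairness guarantee), here is a toy in-process version: a dict guarded by a local threading.Lock stands in for atomic Cassandra column reads and writes. It shows the shape of the algorithm, not the actual commit:

```python
import random
import threading
import time


class SimplifiedBakery:
    """Toy sketch of the simplified bakery lock: each client takes a
    random ticket instead of MAX(existing) + 1, so mutual exclusion
    holds but fairness does not. Illustrative only; the real commit
    stores tickets in Cassandra, not in process memory."""

    def __init__(self):
        self.tickets = {}              # client id -> (ticket, client id)
        self.guard = threading.Lock()  # models atomic column read/write

    def acquire(self, client_id):
        # Take a random ticket; ties are broken by client id.
        with self.guard:
            self.tickets[client_id] = (random.randrange(1 << 30), client_id)
        # Spin until our (ticket, id) pair is the smallest outstanding;
        # in the Cassandra version each iteration re-reads the lock row.
        while True:
            with self.guard:
                mine = self.tickets[client_id]
                others = [t for cid, t in self.tickets.items()
                          if cid != client_id]
                if all(t > mine for t in others):
                    return
            time.sleep(0.001)

    def release(self, client_id):
        # Delete our ticket; models deleting the client's column.
        with self.guard:
            del self.tickets[client_id]
```

Because the ticket is random rather than max+1, a late arrival can draw a smaller number and jump the queue, which is exactly the fairness trade-off mentioned above.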

thanks
Yang