compression

2012-09-23 Thread Tamar Fraenkel
Hi!
In the DataStax documentation there is an explanation of which CFs are a
good fit for compression:

When to Use Compression

Compression is best suited for column families where there are many rows,
with each row having the same columns, or at least many columns in common.
For example, a column family containing user data such as username, email,
etc., would be a good candidate for compression. The more similar the data
across rows, the greater the compression ratio will be, and the larger the
gain in read performance.

Compression is not as good a fit for column families where each row has a
different set of columns, or where there are just a few very wide rows.
Dynamic column families such as this will not yield good compression ratios.

I have many column families where rows share some of the columns and have
varied number of unique columns per row.
For example, I have a CF where each row has ~13 shared columns, plus between
zero and many unique columns. Will such a CF be a good fit for compression?

More generally, is there a rule of thumb for how many shared columns (or
percentage of columns which are shared) is considered a good fit for
compression?

Thanks,

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956

Re: Correct model

2012-09-23 Thread Marcelo Elias Del Valle
2012/9/20 aaron morton 

> I would consider:
>
> # User CF
> * row_key: user_id
> * columns: user properties, key=value
>
> # UserRequests CF
> * row_key: <user_id, partition_start> where partition_start is the start
> of a time partition that makes sense in your domain, e.g. partition
> monthly. You generally want to avoid rows that grow forever; as a rule of thumb,
> avoid rows of more than a few tens of MB.
> * columns: two possible approaches:
> 1) If the requests are immutable and you generally want all of the data
> store the request in a single column using JSON or similar, with the column
> name a timestamp.
> 2) Otherwise use a composite column name of <timestamp, property>
> to store the request in many columns.
> * In either case consider using Reversed comparators so the most recent
> columns are first  see
> http://thelastpickle.com/2011/10/03/Reverse-Comparators/
>
> # GlobalRequests CF
> * row_key: partition_start - time partition as above. It may be easier to
> use the same partition scheme.
> * column name: <timestamp, user_id>
> * column value: empty
>

Ok, I think I understood your suggestion... But is the only advantage of this
solution that it splits data among partitions? I understood how it would work,
but I didn't understand why it's better than the other solution, without
the GlobalRequests CF.


> - Select all the requests for an user
>
> Work out the current partition client side, get the first N columns. Then
> page.
>

What do you mean here by current partition? You mean I would perform a
query for each partition? If I want all the requests for the user,
couldn't I just select all UserRequest records which start with "userId"? I
might be missing something here, but in my understanding, if I use Hector to
query a column family I can do that, and Cassandra servers will
automatically communicate with each other to get the data I need, right? Is
it bad? I really didn't understand why to use partitions.



> - Select all the users which has new requests, since date D
>
> Work out the current partition client side, get the first N columns from
> GlobalRequests, make a multi get call to UserRequests
> NOTE: Assuming the size of the global requests space is not huge.
> Hope that helps.
>
For sure, it is helping a lot. However, I don't know what a multiget is...
I saw the Hector API reference and found this method, but I'm not sure about
what Cassandra would do internally if I do a multiget... Is this expensive
in terms of performance and latency?

-- 
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr


Re: Correct model

2012-09-23 Thread Hiller, Dean
But the only advantage in this solution is to split data among partitions?

You need to split data among partitions or your query won't scale as more and 
more data is added to the table.  Having the partition means you are querying a 
lot fewer rows.

What do you mean here by current partition?

He means determine the ONE partition key and query that partition.  I.e., if you 
want just the latest user requests, figure out the partition key based on which 
month you are in and query it.  If you want the latest independent of user, 
query the correct single partition of the GlobalRequests CF.
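That "work out the current partition client side" step can be sketched as below. The exact key layout (a user id joined with a month string) is my illustrative assumption, not something prescribed in the thread:

```python
from datetime import datetime, timezone

def partition_start(ts: datetime) -> str:
    # Truncate a timestamp to its month: the "partition_start" component
    # of the row key when partitioning monthly.
    return ts.strftime("%Y-%m")

def user_requests_row_key(user_id: str, ts: datetime) -> str:
    # Hypothetical composite row key: one row per user per month, so no
    # single row grows forever.
    return f"{user_id}:{partition_start(ts)}"

# The "current partition" is simply the key computed for the current time:
now = datetime(2012, 9, 23, 17, 0, tzinfo=timezone.utc)
print(user_requests_row_key("user42", now))  # user42:2012-09
```

To page further back in time, the client just computes the previous month's key and queries that row next.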

If I want all the requests for the user, couldn't I just select all UserRequest 
records which start with "userId"?

He designed it so the user requests table was completely scalable so he has 
partitions there.  If you don't have partitions, you could run into a row that 
is too long.  You don't need to design it this way if you know none of your 
users are going to go into the millions as far as number of requests.  In his 
design then, you need to pick the correct partition and query into that 
partition.

I really didn't understand why to use partitions.

Partitions are a way, if you know your rows will go into the trillions, of 
breaking them up so each partition has 100k rows or so, or even 1 million, but 
maxes out in the millions most likely.  Without partitions, you hit a limit in 
the millions.  With partitions, you can keep scaling past that, as you can have 
as many partitions as you want.

A multi-get is a query that finds IN PARALLEL all the rows with the matching 
keys you send to Cassandra.  If you do 1000 gets (instead of a multi-get) with 
1ms latency, you will find it takes 1 second + processing time.  If you do ONE 
multi-get, you only have 1 request and therefore 1ms latency.  The other 
solution is you could send 1000 "async" gets, but I have a feeling that would 
be slower with all the marshalling/unmarshalling of the envelope; it really 
depends on the envelope size.  If we were using HTTP, you would get killed 
doing 1000 requests instead of 1 with 1000 keys in it.
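Dean's latency arithmetic can be written out as a toy model. It deliberately ignores server-side processing time and envelope size, exactly as he notes:

```python
def total_latency_ms(n_keys: int, rtt_ms: float, batched: bool) -> float:
    # Sequential gets pay the network round trip once per key; a single
    # multi-get pays it once in total, since the server fans out the reads
    # in parallel.  (Server-side processing and envelope size are ignored.)
    return rtt_ms if batched else n_keys * rtt_ms

print(total_latency_ms(1000, 1.0, batched=False))  # 1000.0 ms -> ~1 second
print(total_latency_ms(1000, 1.0, batched=True))   # 1.0 ms -> one round trip
```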

Later,
Dean

From: Marcelo Elias Del Valle <mvall...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Sunday, September 23, 2012 10:23 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Correct model




found major difference in CQL vs Scalable SQL(PlayOrm) and question

2012-09-23 Thread Hiller, Dean
I have been digging more and more into CQL vs. PlayOrm S-SQL and found a
major difference that is quite interesting(thought you might be interested
plus I have a question).  CQL uses a composite row key with the prefix so
now any other tables that want to reference that entity have references to
that row "with" the partition key embedded in the row key.

Scalable-SQL does a similar form of partitioning but
1. You can partition a table 2 ways, not just one way (i.e. by account and
by user perhaps), for queries into either type of partition
2. If you decide to repartition your data a different way, with S-SQL you
don't have to map/reduce all those rows with foreign keys.  In CQL, you
have to map/reduce the partitioned table AND all the rows referencing all
those rows since the partition key is basically embedded everywhere.

I found that quite interesting, but that said, I need to add support for
PlayOrm on top of partitioned CF's so we are compatible with CQL as well.

1. Is there any meta information I can grab from the meta model on this?
2. Also, how can I query the indexes without involving CQL at all such
that I can translate the playOrm Scalable-SQL to re-use the existing
indices?  (ie. Is there an index column family and how to form the row key
to access the index?)

Thanks,
Dean



Re: batch_mutate and erlang

2012-09-23 Thread Tyler Hobbs
It's a pretty solid standard at this point.  The large majority of client
library work from this point on will be based on cql.

On Sun, Sep 23, 2012 at 12:45 AM, Bradford Toney
wrote:

> Yeah, I've seen how it's done in CQL3; I just wasn't sure if it was a solid
> standard yet. I will probably go the CQL route, as right now I am doing each
> insert individually.
>
>
> On Sat, Sep 22, 2012 at 11:34 AM, Tyler Hobbs  wrote:
>
>> If there's not already a well-written client in place, you should
>> strongly consider using cql3 instead.  It will save you a ton of work.
>>
>> If you want to ignore that advice, you can look at the send() and
>> insert() methods in phpcassa:
>> https://github.com/thobbs/phpcassa/blob/master/lib/phpcassa/Batch/Mutator.php#L53
>>
>> and in pycassa:
>> https://github.com/pycassa/pycassa/blob/master/pycassa/batch.py#L113
>>
>>
>> On Sat, Sep 22, 2012 at 12:22 AM, Bradford Toney <
>> bradford.to...@gmail.com> wrote:
>>
>>> I was using batch_mutate through the thrift interface and kept getting
>>> supercolumn errors, I was wondering if there are any examples of
>>> batch_mutate in erlang anywhere, or maybe something similar.
>>
>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax 
>>
>>
>


-- 
Tyler Hobbs
DataStax 


Re: compression

2012-09-23 Thread Tyler Hobbs
Due to repetition in the column metadata, you're still likely to get a
reasonable amount of compression.  This is especially true if there is some
amount of repetition in the column names, values, or TTLs in wide rows.
Compression will almost always be beneficial unless you're already somehow
CPU bound or are using large column values that are high in entropy, such
as pre-compressed or encrypted data.
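Tyler's point can be demonstrated with any general-purpose compressor; the sketch below uses Python's zlib as a stand-in (not Cassandra's own compressor) on rows that share column names versus high-entropy bytes:

```python
import json
import os
import zlib

# Rows that share the same column names compress well even when values differ.
rows = [json.dumps({"username": f"user{i}", "email": f"user{i}@example.com",
                    "created": "2012-09-23"}) for i in range(1000)]
similar = "\n".join(rows).encode()

# High-entropy data (pre-compressed or encrypted payloads look like this).
random_bytes = os.urandom(len(similar))

ratio_similar = len(similar) / len(zlib.compress(similar))
ratio_random = len(random_bytes) / len(zlib.compress(random_bytes))
print(f"repetitive rows: {ratio_similar:.1f}x, high-entropy: {ratio_random:.2f}x")
```

The repetitive rows compress many times over, while the random bytes barely compress at all, which is why pre-compressed or encrypted values gain nothing.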

On Sun, Sep 23, 2012 at 10:29 AM, Tamar Fraenkel wrote:



-- 
Tyler Hobbs
DataStax 

Re: Cassandra Messages Dropped

2012-09-23 Thread Michael Theroux
There were no errors in the log (other than the messages dropped exception 
pasted below), and the node does recover.  We have only a small number of 
secondary indexes (3 in the whole system).

However, I went through the cassandra code, and I believe I've worked through 
this problem.

Just to finish out this thread, I realized that when you see:

INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 
72) FlushWriter   1 5 0

It is an issue.  Cassandra will at various times enqueue many memtables for 
flushing.  By default, the queue size for this is 4.  If more than 5 memtables 
get queued for flushing (4 + 1 for the one currently being flushed), a lock 
will be acquired and held across all tables until all memtables that need to be 
flushed are enqueued.  If it takes more than rpc_timeout_in_ms time to 
flush enough information to allow all the pending memtables to be enqueued, a 
"messages dropped" will occur.  To put in other words, Cassandra will lock down 
all tables until all pending flush requests fit in the pending queue.  If your 
queue size is 4, and 8 tables need to be flushed, Cassandra will lock down all 
tables until a minimum of 3 memtables are flushed.

With this in mind, I went through the cassandra log and found this was indeed 
the case looking at log entries similar to these:

 INFO [OptionalTasks:1] 2012-09-16 05:54:29,750 ColumnFamilyStore.java (line 
643) Enqueuing flush of Memtable-p@1525015234(18686281/341486464 
serialized/live bytes, 29553 ops)
...
INFO [FlushWriter:29] 2012-09-16 05:54:29,768 Memtable.java (line 266) Writing 
Memtable-p@1525015234(18686281/341486464 serialized/live bytes, 29553 ops)
...
INFO [FlushWriter:29] 2012-09-16 05:54:30,254 Memtable.java (line 307) 
Completed flushing /data/cassandra/data/open/people/open-p-hd-441-Data.db

I was able to figure out what the rpc_timeout_in_ms needed to be to temporarily 
prevent the problem.

We had plenty of write I/O available.  We also had free memory.  I increased 
the memtable_flush_writers to "2" and memtable_flush_queue_size to "8".  We 
haven't had any timeouts for a number of days now.
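As a toy model of the backpressure Mike describes (assuming, per his description, capacity for memtable_flush_queue_size waiting memtables plus one in flight):

```python
def writes_blocked(memtables_to_flush: int, flush_queue_size: int) -> bool:
    # Up to `flush_queue_size` memtables may wait, plus one being flushed;
    # beyond that, Cassandra holds the lock (stalling writes) until the
    # queue drains enough for every pending memtable to be enqueued.
    return memtables_to_flush > flush_queue_size + 1

print(writes_blocked(8, 4))  # True: the default queue of 4 stalls under 8 flushes
print(writes_blocked(8, 8))  # False: a queue of 8 absorbs the same burst
```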

Thanks for your help,
-Mike

On Sep 18, 2012, at 5:14 AM, aaron morton wrote:

> Any errors in the log ?
> 
> The node recovers ? 
> 
> Do you use secondary indexes ? If so check comments for  
> memtable_flush_queue_size in the yaml. if this value is too low writes may 
> back up. But I would not expect it to cause dropped messages. 
> 
>> nodetool info also shows we have over a gig of available memory on the JVM 
>> heap of each node.
> 
> Not all memory is created equal :)
> ParNew is kicking in to GC the Eden space in the New Heap. 
>  
> It may just be that the node is getting hammered by something and IO is 
> getting overwhelmed. If you can put the logs up someone might take a look. 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 18/09/2012, at 3:46 PM, Michael Theroux  wrote:
> 
>> Thanks for the response.
>> 
>> We are on version 1.1.2.  We don't see the MutationStage back up.  The dump 
>> from the messages dropped error doesn't show a backup, but also watching 
>> "nodetool tpstats" doesn't show any backup there.
>> 
>> nodetool info also shows we have over a gig of available memory on the JVM 
>> heap of each node.
>> 
>> The earliest GCInspector traces I see before one of the more recent 
>> incidents in which messages were dropped are:
>> 
>>  INFO [ScheduledTasks:1] 2012-09-18 02:25:53,928 GCInspector.java (line 
>> 122) GC for ParNew: 396 ms for 1 collections, 2064505088 used; max is 
>> 4253024256
>>  
>>  NFO [ScheduledTasks:1] 2012-09-18 02:25:55,929 GCInspector.java (line 
>> 122) GC for ParNew: 485 ms for 1 collections, 1961875064 used; max is 
>> 4253024256
>>  
>>  INFO [ScheduledTasks:1] 2012-09-18 02:25:57,930 GCInspector.java (line 
>> 122) GC for ParNew: 265 ms for 1 collections, 1968074096 used; max is 
>> 4253024256
>> 
>> But this was 45 minutes before messages were dropped.
>> 
>> It's appreciated,
>> -Mike
>>  
>> On Sep 17, 2012, at 11:27 PM, aaron morton wrote:
>> 
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 
 72) MemtablePostFlusher   1 5 0
 INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 
 72) FlushWriter   1 5 0
>>> Looks suspiciously like 
>>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201209.mbox/%3c9fb0e801-b1ed-41c4-9939-bafbddf15...@thelastpickle.com%3E
>>> 
>>> What version are you on ? 
>>> 
>>> Are there any ERROR log messages before this ? 
>>> 
>>> Are you seeing MutationStage back up ? 
>>> 
>>> Are you see log messages from GCInspector ?
>>> 
>>> Cheers
>>> 
>>> -
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 

Re: compression

2012-09-23 Thread Hiller, Dean
As well, your unlimited column names may all have the same prefix, right? 
Like "accounts".rowkey56, "accounts".rowkey78, etc., so the "accounts" prefix 
gets a ton of compression then.

Later,
Dean

From: Tyler Hobbs <ty...@datastax.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Sunday, September 23, 2012 11:46 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: compression



Secondary index loss on node restart

2012-09-23 Thread Michael Theroux
Hello,

We have been noticing an issue where, about 50% of the time in which a node 
fails or is restarted, secondary indexes appear to be partially lost or 
corrupted.  A drop and re-add of the index appears to correct the issue.  There 
are no errors in the cassandra logs that I see.  Part of the index seems to be 
simply missing.  Sometimes this corruption/loss doesn't happen immediately, but 
sometime after the node is restarted.  In addition, the index never appears to 
have an issue when the node comes down, it is only after the node comes back up 
and recovers in which we experience an issue.

We developed some code that goes through all the rows in the table, by key, in 
which the index is present.  It then attempts to look up the information via 
secondary index, in an attempt to detect when the issue occurs.  Another odd 
observation is that the number of members present in the index when we have the 
issue varies up and down (the index and the tables don't change that often).

We are running a 6 node Cassandra cluster with a replication factor of 3, 
consistency level for all queries is LOCAL_QUORUM.  We are running Cassandra 
1.1.2.

Anyone have any insights?

-Mike

Re: [problem with OOM in nodes]

2012-09-23 Thread aaron morton
> /var/log/cassandra$ cat system.log | grep "Compacting large" | grep -E
> "[0-9]+ bytes" -o | cut -d " " -f 1 |  awk '{ foo = $1 / 1024 / 1024 ;
> print foo "MB" }'  | sort -nr | head -n 50

> Is it bad signal?
Sorry, I do not know what this is outputting. 

>> As I can see in cfstats, compacted row maximum size: 386857368 !
Yes. 
Having rows in the 100's of MB will cause problems. Doubly so if they are 
large super columns. 

Cheers



-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/09/2012, at 5:07 AM, Denis Gabaydulin  wrote:

> And some stuff from log:
> 
> 
> /var/log/cassandra$ cat system.log | grep "Compacting large" | grep -E
> "[0-9]+ bytes" -o | cut -d " " -f 1 |  awk '{ foo = $1 / 1024 / 1024 ;
> print foo "MB" }'  | sort -nr | head -n 50
> 3821.55MB
> 3337.85MB
> 1221.64MB
> 1128.67MB
> 930.666MB
> 916.4MB
> 861.114MB
> 843.325MB
> 711.813MB
> 706.992MB
> 674.282MB
> 673.861MB
> 658.305MB
> 557.756MB
> 531.577MB
> 493.112MB
> 492.513MB
> 492.291MB
> 484.484MB
> 479.908MB
> 465.742MB
> 464.015MB
> 459.95MB
> 454.472MB
> 441.248MB
> 428.763MB
> 424.028MB
> 416.663MB
> 416.191MB
> 409.341MB
> 406.895MB
> 397.314MB
> 388.27MB
> 376.714MB
> 371.298MB
> 368.819MB
> 366.92MB
> 361.371MB
> 360.509MB
> 356.168MB
> 355.012MB
> 354.897MB
> 354.759MB
> 347.986MB
> 344.109MB
> 335.546MB
> 329.529MB
> 326.857MB
> 326.252MB
> 326.237MB
> 
> Is it bad signal?
> 
> On Fri, Sep 21, 2012 at 8:22 PM, Denis Gabaydulin  wrote:
>> Found one more intersting fact.
>> As I can see in cfstats, compacted row maximum size: 386857368 !
>> 
>> On Fri, Sep 21, 2012 at 12:50 PM, Denis Gabaydulin  wrote:
>>> Reports - is a SuperColumnFamily
>>> 
>>> Each report has unique identifier (report_id). This is a key of
>>> SuperColumnFamily.
>>> And a report saved in separate row.
>>> 
>>> A report is consisted of report rows (may vary between 1 and 50,
>>> but most are small).
>>> 
>>> Each report row is saved in separate super column. Hector based code:
>>> 
>>> superCfMutator.addInsertion(
>>>  report_id,
>>>  "Reports",
>>>  HFactory.createSuperColumn(
>>>report_row_id,
>>>mapper.convertObject(object),
>>>columnDefinition.getTopSerializer(),
>>>columnDefinition.getSubSerializer(),
>>>inferringSerializer
>>>  )
>>> );
>>> 
>>> We have two frequent operation:
>>> 
>>> 1. count report rows by report_id (calculate number of super columns
>>> in the row).
>>> 2. get report rows by report_id and range predicate (get super columns
>>> from the row with range predicate).
>>> 
>>> I can't see here a big super columns :-(
>>> 
>>> On Fri, Sep 21, 2012 at 3:10 AM, Tyler Hobbs  wrote:
 I'm not 100% that I understand your data model and read patterns correctly,
 but it sounds like you have large supercolumns and are requesting some of
 the subcolumns from individual super columns.  If that's the case, the 
 issue
 is that Cassandra must deserialize the entire supercolumn in memory 
 whenever
 you read *any* of the subcolumns.  This is one of the reasons why composite
 columns are recommended over supercolumns.
 
 
 On Thu, Sep 20, 2012 at 6:45 AM, Denis Gabaydulin  
 wrote:
> 
> p.s. Cassandra 1.1.4
> 
> On Thu, Sep 20, 2012 at 3:27 PM, Denis Gabaydulin 
> wrote:
>> Hi, all!
>> 
>> We have a cluster with virtual 7 nodes (disk storage is connected to
>> nodes with iSCSI). The storage schema is:
>> 
>> Reports:{
>>1:{
>>1:{"value1":"some val", "value2":"some val"},
>>2:{"value1":"some val", "value2":"some val"}
>>...
>>},
>>2:{
>>1:{"value1":"some val", "value2":"some val"},
>>2:{"value1":"some val", "value2":"some val"}
>>...
>>}
>>...
>> }
>> 
>> create keyspace osmp_reports
>>  with placement_strategy = 'SimpleStrategy'
>>  and strategy_options = {replication_factor : 4}
>>  and durable_writes = true;
>> 
>> use osmp_reports;
>> 
>> create column family QueryReportResult
>>  with column_type = 'Super'
>>  and comparator = 'BytesType'
>>  and subcomparator = 'BytesType'
>>  and default_validation_class = 'BytesType'
>>  and key_validation_class = 'BytesType'
>>  and read_repair_chance = 1.0
>>  and dclocal_read_repair_chance = 0.0
>>  and gc_grace = 432000
>>  and min_compaction_threshold = 4
>>  and max_compaction_threshold = 32
>>  and replicate_on_write = true
>>  and compaction_strategy =
>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>>  and caching = 'KEYS_ONLY';
>> 
>> =
>> 
>> Read/Write CL: 2
>> 
>> Most of the reports are small, but some of them could have a half
>> million of rows (XML). Typical operations on this dataset are:
>> 
>> count report rows by report

Re: any ways to have compaction use less disk space?

2012-09-23 Thread Віталій Тимчишин
If you think about space, use Leveled compaction! It will not only let you
fill more of your disk, but will also shrink your data much faster in case of
updates. Size-tiered compaction can give you 3x-4x more space used than there
is live data. Consider the following (our simplified) scenario:
1) The data is updated weekly.
2) Each week a large SSTable is written (say, 300GB) after full update
processing.
3) In 3 weeks you will have 1.2TB of data in 3 large SSTables.
4) Only after the 4th week will they all be compacted into one 300GB SSTable.

Leveled compaction has tamed space usage for us. Note that you should set
sstable_size_in_mb to a reasonably high value (it is 512 for us with ~700GB
per node) to prevent creating a lot of small files.
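The worst case in that weekly-update scenario can be written out as simple arithmetic (a rough model, using the thread's own numbers):

```python
def size_tiered_peak_gb(weekly_gb: float = 300.0, merge_threshold: int = 4) -> float:
    # Size-tiered compaction waits until `merge_threshold` similar-sized
    # SSTables exist before merging, so up to 4 near-identical ~300GB
    # tables can sit on disk at once for ~300GB of live data.
    return weekly_gb * merge_threshold

print(size_tiered_peak_gb())  # 1200.0 -> ~4x the live data, matching "3x-4x"
```

Leveled compaction instead continuously rewrites small fixed-size sstables, keeping the on-disk footprint close to the live data size.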

Best regards, Vitalii Tymchyshyn.

2012/9/20 Hiller, Dean 

> While diskspace is cheap, nodes are not that cheap, and usually systems
> have a 1T limit on each node which means we would love to really not add
> more nodes until we hit 70% disk space instead of the normal 50% that we
> have read about due to compaction.
>
> Is there any way to use less disk space during compactions?
> Is there any work being done so that compactions take less space in the
> future meaning we can buy less nodes?
>
> Thanks,
> Dean
>



-- 
Best regards,
 Vitalii Tymchyshyn


Re: any ways to have compaction use less disk space?

2012-09-23 Thread Aaron Turner
On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин  wrote:

512MB per sstable?  Wow, that's freaking huge.  From my conversations
with various developers, 5-10MB seems far more reasonable.  I guess it
really depends on your usage patterns, but that seems excessive to me,
especially as sstables are promoted.



-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
"carpe diem quam minimum credula postero"


Re: CQL 2, CQL 3 and Thrift confusion

2012-09-23 Thread Sylvain Lebresne
In CQL3, names are case insensitive by default, while they were case
sensitive in CQL2. You can force whatever case you want in CQL3
however using double quotes. So in other words, in CQL3,
  USE "TestKeyspace";
should work as expected.
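A toy model of the identifier handling Sylvain describes (this is an illustration of the folding rule, not the actual parser code):

```python
def cql3_identifier(name: str) -> str:
    # CQL3 folds unquoted identifiers to lower case, while a
    # double-quoted identifier keeps its exact case.
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        return name[1:-1]
    return name.lower()

print(cql3_identifier("TestKeyspace"))    # testkeyspace  -> lookup fails
print(cql3_identifier('"TestKeyspace"'))  # TestKeyspace  -> matches the keyspace
```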

--
Sylvain

On Sun, Sep 23, 2012 at 9:22 PM, Oleksandr Petrov
 wrote:
> Hi,
>
> I'm currently using Cassandra 1.1.5.
>
> When I'm trying to create a Keyspace from CQL 2 with a command (`cqlsh -2`):
>
>   CREATE KEYSPACE TestKeyspace WITH strategy_class = 'SimpleStrategy' AND
> strategy_options:replication_factor = 1
>
> Then try to access it from CQL 3 (`cqlsh -3`):
>
>   USE TestKeyspace;
>
> I get an error: Bad Request: Keyspace 'testkeyspace' does not exist
>
> Same thing is applicable to Thrift Interface. Somehow, I can only access
> keyspaces created from CQL 2 via Thrift Interface.
>
> Basically, I get same exact error: InvalidRequestException(why:There is no
> ring for the keyspace: CascadingCassandraCql3)
>
> Am I missing some switch? Or maybe it is intended to work that way?...
> Thanks!
>
> --
> alex p


Re: Disk configuration in new cluster node

2012-09-23 Thread Aaron Turner
On Fri, Sep 21, 2012 at 2:05 AM, aaron morton  wrote:
>> Would it help if I partitioned the computing resources of my physical
>> machines into VMs?
>
> No.
> Just like cutting a cake into smaller pieces does not mean you can eat more
> without getting fat.
>
> In the general case, regular HDD and 1 Gbe and 8 to 16 virtual cores and 8GB
> to 16GB ram, you can expect to comfortably run up to 400GB of data (maybe
> 500GB). That is replicated storage, so 400 / 3 = 133GB if you replicate
> data 3 times.

Remember also that these numbers reflect total size of your sstables.
This is both good and bad:

1. Good, because if you use compression you can store more data.  I'm
doing time series data for network statistics and I'm seeing extremely
good compression numbers (better than 10:1)

2. Bad, because if you're doing a lot of deletes, the old data +
tombstones count against you until they're actually purged from disk.

This can create rather interesting disk usage situations where my
"rolling 48 hours" of current data CF takes significantly more disk
space than my historical CF, which currently stores over 4 months' worth
of data.   I'm thinking about repairing the rolling 48 hours CF more
often and reducing the gc_grace time so that compaction has a better
chance of removing stale data from disk.
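The gc_grace arithmetic behind that idea can be sketched as follows (864000 seconds is Cassandra's default of 10 days; the 1-day value is just an example of a reduced setting):

```python
from datetime import datetime, timedelta

def earliest_purge(deleted_at: datetime, gc_grace_seconds: int) -> datetime:
    # A tombstone (and the data it shadows) can only be dropped by a
    # compaction that runs after gc_grace has elapsed since the delete,
    # which is why lowering gc_grace lets compaction free disk sooner.
    return deleted_at + timedelta(seconds=gc_grace_seconds)

deleted = datetime(2012, 9, 1)
print(earliest_purge(deleted, 864000))  # default 10 days -> 2012-09-11
print(earliest_purge(deleted, 86400))   # 1 day -> 2012-09-02
```

The trade-off is that gc_grace must stay longer than the repair interval, or deleted data can resurface from unrepaired replicas.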


-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
"carpe diem quam minimum credula postero"


Re: Varchar indexed column and IN(...)

2012-09-23 Thread aaron morton
> If this is intended behavior, could somebody please point me to where this is
> documented? 
It is intended. 

The docs don't make it totally clear though. The syntax is:

 <key> { = | < | > | <= | >= } <key_value>
 <key> IN ( <key_value> [,...])

http://www.datastax.com/docs/1.1/references/cql/SELECT

where <key> / <key_value> refer only to the primary key field.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/09/2012, at 6:58 AM, Sergei Petrunia  wrote:

> Hello,
> 
> Does CQL's IN(...) predicate have the same meaning as SQL's IN(...)? I'm 
> asking this, because I get results that I cannot explain:
> 
> cqlsh:xpl1> select * from t1 where col2='bar1';
> pk   | col1 | col2
> --+--+--
> pk1b | foo1 | bar1
>  pk1 | foo1 | bar1
> pk1a | foo1 | bar1
> pk1c | foo1 | bar1
> 
> cqlsh:xpl1> select * from t1 where col2 in ('bar1', 'bar2') ;
> cqlsh:xpl1> 
> 
> The first query shows there are records with col2='bar1'. I would expect the
> second query return a superset of what the first query returned. However, it
> returns nothing!
> 
> If this is intended behavior, could somebody please point me to where this is
> documented? 
> 
> == Complete example == 
> # Repeatable on Cassandra 1.1.4 or 1.1.5:
> 
> [cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
> Use HELP for help.
> cqlsh> 
> cqlsh> create keyspace xpl1 WITH strategy_class ='SimpleStrategy' and 
> strategy_options:replication_factor=1;
> cqlsh> use xpl1;
> cqlsh:xpl1> create table t1 (pk varchar primary key, col1 varchar, col2 
> varchar);
> cqlsh:xpl1> create index t1_c1 on t1(col1);
> cqlsh:xpl1> create index t1_c2 on t1(col2);
> cqlsh:xpl1> insert into t1  (pk, col1, col2) values ('pk1','foo1','bar1');
> cqlsh:xpl1> insert into t1  (pk, col1, col2) values ('pk1a','foo1','bar1');
> cqlsh:xpl1> insert into t1  (pk, col1, col2) values ('pk1b','foo1','bar1');
> cqlsh:xpl1> insert into t1  (pk, col1, col2) values ('pk1c','foo1','bar1');
> cqlsh:xpl1> insert into t1  (pk, col1, col2) values ('pk2','foo2','bar2');
> cqlsh:xpl1> insert into t1  (pk, col1, col2) values ('pk3','foo3','bar3');
> cqlsh:xpl1> select * from t1 where col2='bar1';
> pk   | col1 | col2
> --+--+--
> pk1b | foo1 | bar1
>  pk1 | foo1 | bar1
> pk1a | foo1 | bar1
> pk1c | foo1 | bar1
> 
> cqlsh:xpl1> select * from t1 where col2 in ('bar1', 'bar2') ;
> cqlsh:xpl1> 
> 
> BR
> Sergei
> -- 
> Sergei Petrunia, Software Developer
> Monty Program AB, http://askmonty.org
> Blog: http://s.petrunia.net/blog



Re: Correct model

2012-09-23 Thread aaron morton
Yup.

(Multi get is just a convenience method, it explodes into multiple gets on the 
server side. )

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/09/2012, at 5:01 AM, "Hiller, Dean"  wrote:

> But the only advantage in this solution is to split data among partitions?
> 
> You need to split data among partitions or your query won't scale as more and 
> more data is added to table.  Having the partition means you are querying a 
> lot less rows.
> 
> What do you mean here by current partition?
> 
> He means determine the ONE partition key and query that partition.  Ie. If 
> you want just latest user requests, figure out the partition key based on 
> which month you are in and query it.  If you want the latest independent of 
> user, query the correct single partition for GlobalRequests CF.
> 
> If I want all the requests for the user, couldn't I just select all 
> UserRequest records which start with "userId"?
> 
> He designed it so the user requests table was completely scalable so he has 
> partitions there.  If you don't have partitions, you could run into a row 
> that is too long.  You don't need to design it this way if you know none of 
> your users are going to go into the millions as far as number of requests.  
> In his design then, you need to pick the correct partition and query into 
> that partition.
> 
> I really didn't understand why to use partitions.
> 
> Partitions are a way if you know your rows will go into the trillions of 
> breaking them up so each partition has 100k rows or so or even 1 million but 
> maxes out in the millions most likely.  Without partitions, you hit a limit 
> in the millions.  With partitions, you can keep scaling past that as you can 
> have as many partitions as you want.
> 
> A multi-get is a query that finds IN PARALLEL all the rows with the matching 
> keys you send to cassandra.  If you do 1000 gets(instead of a multi-get) with 
> 1ms latency, you will find, it takes 1 second+processing time.  If you do ONE 
> multi-get, you only have 1 request and therefore 1ms latency.  The other 
> solution is you could send 1000 "asycnh" gets but I have a feeling that would 
> be slower with all the marshalling/unmarshalling of the envelope…..really 
> depends on the envelope size like if we were using http, you would get killed 
> doing 1000 requests instead of 1 with 1000 keys in it.
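Dean's "work out the partition client side" step is just a little date arithmetic. A minimal sketch of what that could look like, assuming Aaron's layout where the UserRequests row key is `<user_id>:<partition_start>` with monthly partitions (the key format and function names are illustrative, not anything Cassandra mandates):

```python
from datetime import datetime

def partition_start(ts: datetime) -> str:
    """Monthly time-partition bucket, e.g. 2012-09-23 -> '201209'."""
    return ts.strftime("%Y%m")

def user_requests_row_key(user_id: str, ts: datetime) -> str:
    # Row key of the hypothetical UserRequests CF: <user_id>:<partition_start>
    return f"{user_id}:{partition_start(ts)}"

def partitions_between(start: datetime, end: datetime) -> list[str]:
    """All monthly partition buckets covering [start, end], newest first,
    so a client can walk backwards through a user's requests and page."""
    keys = []
    y, m = end.year, end.month
    while (y, m) >= (start.year, start.month):
        keys.append(f"{y:04d}{m:02d}")
        y, m = (y, m - 1) if m > 1 else (y - 1, 12)
    return keys
```

A client wanting several months of one user's history would build the key list with `partitions_between` and issue a single multi-get over `[user_requests_row_key(uid, p) ...]`, paying one round trip instead of one per partition.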
> 
> Later,
> Dean
> 
> From: Marcelo Elias Del Valle mailto:mvall...@gmail.com>>
> Reply-To: "user@cassandra.apache.org" 
> mailto:user@cassandra.apache.org>>
> Date: Sunday, September 23, 2012 10:23 AM
> To: "user@cassandra.apache.org" 
> mailto:user@cassandra.apache.org>>
> Subject: Re: Correct model
> 
> 
> 2012/9/20 aaron morton 
> mailto:aa...@thelastpickle.com>>
> I would consider:
> 
> # User CF
> * row_key: user_id
> * columns: user properties, key=value
> 
> # UserRequests CF
> * row_key:  where partition_start is the start of 
> a time partition that makes sense in your domain. e.g. partition monthly. 
> Generally want to avoid rows the grow forever, as a rule of thumb avoid rows 
> more than a few 10's of MB.
> * columns: two possible approaches:
> 1) If the requests are immutable and you generally want all of the data store 
> the request in a single column using JSON or similar, with the column name a 
> timestamp.
> 2) Otherwise use a composite column name of  to 
> store the request in many columns.
> * In either case consider using Reversed comparators so the most recent 
> columns are first  see 
> http://thelastpickle.com/2011/10/03/Reverse-Comparators/
> 
> # GlobalRequests CF
> * row_key: partition_start - time partition as above. It may be easier to use 
> the same partition scheme.
> * column name: 
> * column value: empty
> 
> Ok, I think I understood your suggestion... But the only advantage in this 
> solution is to split data among partitions? I understood how it would work, 
> but I didn't understand why it's better than the other solution, without the 
> GlobalRequests CF
> 
> - Select all the requests for an user
> Work out the current partition client side, get the first N columns. Then 
> page.
> 
> What do you mean here by current partition? You mean I would perform a query 
> for each particition? If I want all the requests for the user, couldn't I 
> just select all UserRequest records which start with "userId"? I might be 
> missing something here, but in my understanding if I use hector to query a 
> column familly I can do that and Cassandra servers will automatically 
> communicate to each other to get the data I need, right? Is it bad? I really 
> didn't understand why to use partitions.
> 
> 
> - Select all the users which has new requests, since date D
> Worm out the current partition client side, get the first N columns from 
> GlobalRequests, make a multi get call to UserRequests
> NOTE: Assuming the size of the g

Re: Cassandra Messages Dropped

2012-09-23 Thread aaron morton
> To put in other words, Cassandra will lock down all tables until all pending 
> flush requests fit in the pending queue.
This was the first issue I looked at in my Cassandra SF talk 
http://www.datastax.com/events/cassandrasummit2012/presentations

I've seen it occur more often with lots-o-secondary indexes. 
 

> We had plenty of write I/O available.  We also had free memory.  I increased 
> the memtable_flush_writers to "2" and memtable_flush_queue_size to "8".  We 
> haven't had any timeouts for a number of days now.
Cool. 

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/09/2012, at 6:09 AM, Michael Theroux  wrote:

> There were no errors in the log (other than the messages dropped exception 
> pasted below), and the node does recover.  We have only a small number of 
> secondary indexes (3 in the whole system).
> 
> However, I went through the cassandra code, and I believe I've worked through 
> this problem.
> 
> Just to finish out this thread, I realized that when you see:
> 
>   INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 
> 72) FlushWriter   1 5 0
> 
> It is an issue.  Cassandra will at various times enqueue many memtables for 
> flushing.  By default, the queue size for this is 4.  If more than 5 
> memtables get queued for flushing (4 + 1 for the one currently being 
> flushed), a lock will be acquired and held across all tables until all 
> memtables that need to be flushed are enqueued.  If it takes more than 
> rpc_timeout_time_in_ms time to flush enough information to allow all the 
> pending memtables to be enqueued, a "messages dropped" will occur.  To put in 
> other words, Cassandra will lock down all tables until all pending flush 
> requests fit in the pending queue.  If your queue size is 4, and 8 tables 
> need to be flushed, Cassandra will lock down all tables until a minimum of 3 
> memtables are flushed.
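The two knobs Mike tuned live in cassandra.yaml (option names are from the 1.1 yaml; the values shown are simply the ones he reports using, not general recommendations):

```yaml
# Memtables allowed to wait for a flush writer before writes are
# blocked across all column families (1.1 default: 4).
memtable_flush_queue_size: 8

# Threads flushing memtables to disk concurrently; by default this
# scales with the number of data directories. Worth raising only if
# there is spare write I/O.
memtable_flush_writers: 2
```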
> 
> With this in mind, I went through the cassandra log and found this was indeed 
> the case looking at log entries similar to these:
> 
>  INFO [OptionalTasks:1] 2012-09-16 05:54:29,750 ColumnFamilyStore.java (line 
> 643) Enqueuing flush of Memtable-p@1525015234(18686281/341486464 
> serialized/live bytes, 29553 ops)
> ...
> INFO [FlushWriter:29] 2012-09-16 05:54:29,768 Memtable.java (line 266) 
> Writing Memtable-p@1525015234(18686281/341486464 serialized/live bytes, 29553 
> ops)
> ...
> INFO [FlushWriter:29] 2012-09-16 05:54:30,254 Memtable.java (line 307) 
> Completed flushing /data/cassandra/data/open/people/open-p-hd-441-Data.db
> 
> I was able to figure out what the rpc_timeout_in_ms needed to be to 
> temporarily prevent the problem.
> 
> We had plenty of write I/O available.  We also had free memory.  I increased 
> the memtable_flush_writers to "2" and memtable_flush_queue_size to "8".  We 
> haven't had any timeouts for a number of days now.
> 
> Thanks for your help,
> -Mike
> 
> On Sep 18, 2012, at 5:14 AM, aaron morton wrote:
> 
>> Any errors in the log ?
>> 
>> The node recovers ? 
>> 
>> Do you use secondary indexes ? If so check comments for  
>> memtable_flush_queue_size in the yaml. if this value is too low writes may 
>> back up. But I would not expect it to cause dropped messages. 
>> 
>>> nodetool info also shows we have over a gig of available memory on the JVM 
>>> heap of each node.
>> 
>> Not all memory is created equal :)
>> ParNew is kicking in to GC the Eden space in the New Heap. 
>>  
>> It may just be that the node is getting hammered by something and IO is 
>> getting overwhelmed. If you can put the logs up someone might take a look. 
>> 
>> Cheers
>> 
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 18/09/2012, at 3:46 PM, Michael Theroux  wrote:
>> 
>>> Thanks for the response.
>>> 
>>> We are on version 1.1.2.  We don't see the MutationStage back up.  The dump 
>>> from the messages dropped error doesn't show a backup, but also watching 
>>> "nodetool tpstats" doesn't show any backup there.
>>> 
>>> nodetool info also shows we have over a gig of available memory on the JVM 
>>> heap of each node.
>>> 
>>> The earliest GCInspector traces I see before one of the more recent 
>>> incidents in which messages were dropped are:
>>> 
>>> INFO [ScheduledTasks:1] 2012-09-18 02:25:53,928 GCInspector.java (line 
>>> 122) GC for ParNew: 396 ms for 1 collections, 2064505088 used; max is 
>>> 4253024256
>>>  
>>> NFO [ScheduledTasks:1] 2012-09-18 02:25:55,929 GCInspector.java (line 
>>> 122) GC for ParNew: 485 ms for 1 collections, 1961875064 used; max is 
>>> 4253024256
>>>  
>>> INFO [ScheduledTasks:1] 2012-09-18 02:25:57,930 GCInspector.java (line 
>>> 122) GC for ParNew: 265 ms for 1 collections, 1968074096 used; max is 
>>> 4253024256
>>> 
>>> But this was 45 minutes before messages were dropped.
>>> 
>>> It's appreciated,
>>> -Mike
>>>  
>>> O

Re: Cassandra Messages Dropped

2012-09-23 Thread Michael Theroux
Love the Mars lander analogies :)

On Sep 23, 2012, at 5:39 PM, aaron morton wrote:

>> To put in other words, Cassandra will lock down all tables until all pending 
>> flush requests fit in the pending queue.
> This was the first issue I looked at in my Cassandra SF talk 
> http://www.datastax.com/events/cassandrasummit2012/presentations
> 
> I've seen it occur more often with lots-o-secondary indexes. 
>  
> 
>> We had plenty of write I/O available.  We also had free memory.  I increased 
>> the memtable_flush_writers to "2" and memtable_flush_queue_size to "8".  We 
>> haven't had any timeouts for a number of days now.
> Cool. 
> 
> Cheers
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 24/09/2012, at 6:09 AM, Michael Theroux  wrote:
> 
>> There were no errors in the log (other than the messages dropped exception 
>> pasted below), and the node does recover.  We have only a small number of 
>> secondary indexes (3 in the whole system).
>> 
>> However, I went through the cassandra code, and I believe I've worked 
>> through this problem.
>> 
>> Just to finish out this thread, I realized that when you see:
>> 
>>  INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 
>> 72) FlushWriter   1 5 0
>> 
>> It is an issue.  Cassandra will at various times enqueue many memtables for 
>> flushing.  By default, the queue size for this is 4.  If more than 5 
>> memtables get queued for flushing (4 + 1 for the one currently being 
>> flushed), a lock will be acquired and held across all tables until all 
>> memtables that need to be flushed are enqueued.  If it takes more than 
>> rpc_timeout_time_in_ms time to flush enough information to allow all the 
>> pending memtables to be enqueued, a "messages dropped" will occur.  To put 
>> in other words, Cassandra will lock down all tables until all pending flush 
>> requests fit in the pending queue.  If your queue size is 4, and 8 tables 
>> need to be flushed, Cassandra will lock down all tables until a minimum of 3 
>> memtables are flushed.
>> 
>> With this in mind, I went through the cassandra log and found this was 
>> indeed the case looking at log entries similar to these:
>> 
>>  INFO [OptionalTasks:1] 2012-09-16 05:54:29,750 ColumnFamilyStore.java (line 
>> 643) Enqueuing flush of Memtable-p@1525015234(18686281/341486464 
>> serialized/live bytes, 29553 ops)
>> ...
>> INFO [FlushWriter:29] 2012-09-16 05:54:29,768 Memtable.java (line 266) 
>> Writing Memtable-p@1525015234(18686281/341486464 serialized/live bytes, 
>> 29553 ops)
>> ...
>> INFO [FlushWriter:29] 2012-09-16 05:54:30,254 Memtable.java (line 307) 
>> Completed flushing /data/cassandra/data/open/people/open-p-hd-441-Data.db
>> 
>> I was able to figure out what the rpc_timeout_in_ms needed to be to 
>> temporarily prevent the problem.
>> 
>> We had plenty of write I/O available.  We also had free memory.  I increased 
>> the memtable_flush_writers to "2" and memtable_flush_queue_size to "8".  We 
>> haven't had any timeouts for a number of days now.
>> 
>> Thanks for your help,
>> -Mike
>> 
>> On Sep 18, 2012, at 5:14 AM, aaron morton wrote:
>> 
>>> Any errors in the log ?
>>> 
>>> The node recovers ? 
>>> 
>>> Do you use secondary indexes ? If so check comments for  
>>> memtable_flush_queue_size in the yaml. if this value is too low writes may 
>>> back up. But I would not expect it to cause dropped messages. 
>>> 
 nodetool info also shows we have over a gig of available memory on the JVM 
 heap of each node.
>>> 
>>> Not all memory is created equal :)
>>> ParNew is kicking in to GC the Eden space in the New Heap. 
>>>  
>>> It may just be that the node is getting hammered by something and IO is 
>>> getting overwhelmed. If you can put the logs up someone might take a look. 
>>> 
>>> Cheers
>>> 
>>> -
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 18/09/2012, at 3:46 PM, Michael Theroux  wrote:
>>> 
 Thanks for the response.
 
 We are on version 1.1.2.  We don't see the MutationStage back up.  The 
 dump from the messages dropped error doesn't show a backup, but also 
 watching "nodetool tpstats" doesn't show any backup there.
 
 nodetool info also shows we have over a gig of available memory on the JVM 
 heap of each node.
 
 The earliest GCInspector traces I see before one of the more recent 
 incidents in which messages were dropped are:
 
INFO [ScheduledTasks:1] 2012-09-18 02:25:53,928 GCInspector.java (line 
 122) GC for ParNew: 396 ms for 1 collections, 2064505088 used; max is 
 4253024256
  
NFO [ScheduledTasks:1] 2012-09-18 02:25:55,929 GCInspector.java (line 
 122) GC for ParNew: 485 ms for 1 collections, 1961875064 used; max is 
 4253024256
  
INFO [ScheduledTasks:1] 2012-09-18 02:25:57,930 GCInspe

Cassandra simulator

2012-09-23 Thread Shankaranarayanan P N
Hi,

Has there been any updates on the cassandra simulator
https://issues.apache.org/jira/browse/CASSANDRA-561 ?

I have been trying to build it using Cassandra 0.4 (which I believe was the
version the simulator was built with ), but the build breaks at multiple
places. I thought it would be useful to ask around if someone else had
tried the simulator anytime earlier and actually got it to work.

Thanks,
Shankar


Re: Cassandra simulator

2012-09-23 Thread Tyler Hobbs
You might find these two projects useful:

- ccm, which makes it easy to run a cluster on a single machine:
https://github.com/pcmanus/ccm
- Cassanova, which supports a large portion of the Thrift API with a
lightweight python process: https://github.com/riptano/Cassanova


On Sun, Sep 23, 2012 at 5:04 PM, Shankaranarayanan P N <
shankarp...@gmail.com> wrote:

> Hi,
>
> Has there been any updates on the cassandra simulator
> https://issues.apache.org/jira/browse/CASSANDRA-561 ?
>
> I have been trying to build it using Cassandra 0.4 (which I believe was
> the version the simulator was built with ), but the build breaks at
> multiple places. I thought it would be useful to ask around if someone else
> had tried the simulator anytime earlier and actually got it to work.
>
> Thanks,
> Shankar
>



-- 
Tyler Hobbs
DataStax 


Re: [problem with OOM in nodes]

2012-09-23 Thread Denis Gabaydulin
On Sun, Sep 23, 2012 at 10:41 PM, aaron morton  wrote:
> /var/log/cassandra$ cat system.log | grep "Compacting large" | grep -E
> "[0-9]+ bytes" -o | cut -d " " -f 1 |  awk '{ foo = $1 / 1024 / 1024 ;
> print foo "MB" }'  | sort -nr | head -n 50
>
>
> Is it bad signal?
>
> Sorry, I do not know what this is outputting.
>

This outputs the sizes of the large rows Cassandra has compacted so far.

> As I can see in cfstats, compacted row maximum size: 386857368 !
>
> Yes.
> Having rows in the 100's of MB is will cause problems. Doubly so if they are
> large super columns.
>

What exactly is the problem with big rows?
And how should we place our data in this case (see the schema in
the previous replies)? Splitting one report across multiple rows is
uncomfortable :-(


> Cheers
>
>
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/09/2012, at 5:07 AM, Denis Gabaydulin  wrote:
>
> And some stuff from log:
>
>
> /var/log/cassandra$ cat system.log | grep "Compacting large" | grep -E
> "[0-9]+ bytes" -o | cut -d " " -f 1 |  awk '{ foo = $1 / 1024 / 1024 ;
> print foo "MB" }'  | sort -nr | head -n 50
> 3821.55MB
> 3337.85MB
> 1221.64MB
> 1128.67MB
> 930.666MB
> 916.4MB
> 861.114MB
> 843.325MB
> 711.813MB
> 706.992MB
> 674.282MB
> 673.861MB
> 658.305MB
> 557.756MB
> 531.577MB
> 493.112MB
> 492.513MB
> 492.291MB
> 484.484MB
> 479.908MB
> 465.742MB
> 464.015MB
> 459.95MB
> 454.472MB
> 441.248MB
> 428.763MB
> 424.028MB
> 416.663MB
> 416.191MB
> 409.341MB
> 406.895MB
> 397.314MB
> 388.27MB
> 376.714MB
> 371.298MB
> 368.819MB
> 366.92MB
> 361.371MB
> 360.509MB
> 356.168MB
> 355.012MB
> 354.897MB
> 354.759MB
> 347.986MB
> 344.109MB
> 335.546MB
> 329.529MB
> 326.857MB
> 326.252MB
> 326.237MB
>
> Is it bad signal?
>
> On Fri, Sep 21, 2012 at 8:22 PM, Denis Gabaydulin  wrote:
>
> Found one more intersting fact.
> As I can see in cfstats, compacted row maximum size: 386857368 !
>
> On Fri, Sep 21, 2012 at 12:50 PM, Denis Gabaydulin 
> wrote:
>
> Reports - is a SuperColumnFamily
>
> Each report has unique identifier (report_id). This is a key of
> SuperColumnFamily.
> And a report saved in separate row.
>
> A report is consisted of report rows (may vary between 1 and 50,
> but most are small).
>
> Each report row is saved in separate super column. Hector based code:
>
> superCfMutator.addInsertion(
>  report_id,
>  "Reports",
>  HFactory.createSuperColumn(
>report_row_id,
>mapper.convertObject(object),
>columnDefinition.getTopSerializer(),
>columnDefinition.getSubSerializer(),
>inferringSerializer
>  )
> );
>
> We have two frequent operation:
>
> 1. count report rows by report_id (calculate number of super columns
> in the row).
> 2. get report rows by report_id and range predicate (get super columns
> from the row with range predicate).
>
> I can't see here a big super columns :-(
>
> On Fri, Sep 21, 2012 at 3:10 AM, Tyler Hobbs  wrote:
>
> I'm not 100% that I understand your data model and read patterns correctly,
> but it sounds like you have large supercolumns and are requesting some of
> the subcolumns from individual super columns.  If that's the case, the issue
> is that Cassandra must deserialize the entire supercolumn in memory whenever
> you read *any* of the subcolumns.  This is one of the reasons why composite
> columns are recommended over supercolumns.
>
>
> On Thu, Sep 20, 2012 at 6:45 AM, Denis Gabaydulin  wrote:
>
>
> p.s. Cassandra 1.1.4
>
> On Thu, Sep 20, 2012 at 3:27 PM, Denis Gabaydulin 
> wrote:
>
> Hi, all!
>
> We have a cluster with virtual 7 nodes (disk storage is connected to
> nodes with iSCSI). The storage schema is:
>
> Reports:{
>1:{
>1:{"value1":"some val", "value2":"some val"},
>2:{"value1":"some val", "value2":"some val"}
>...
>},
>2:{
>1:{"value1":"some val", "value2":"some val"},
>2:{"value1":"some val", "value2":"some val"}
>...
>}
>...
> }
>
> create keyspace osmp_reports
>  with placement_strategy = 'SimpleStrategy'
>  and strategy_options = {replication_factor : 4}
>  and durable_writes = true;
>
> use osmp_reports;
>
> create column family QueryReportResult
>  with column_type = 'Super'
>  and comparator = 'BytesType'
>  and subcomparator = 'BytesType'
>  and default_validation_class = 'BytesType'
>  and key_validation_class = 'BytesType'
>  and read_repair_chance = 1.0
>  and dclocal_read_repair_chance = 0.0
>  and gc_grace = 432000
>  and min_compaction_threshold = 4
>  and max_compaction_threshold = 32
>  and replicate_on_write = true
>  and compaction_strategy =
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>  and caching = 'KEYS_ONLY';
>
> =
>
> Read/Write CL: 2
>
> Most of the reports are small, but some of them could have half a million
> rows (xml). Typical operations on this dataset are:
>
> count report rows by report_id (top level id of super column);
> g
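Picking up Tyler's suggestion earlier in this thread: the supercolumn layout above could be rebuilt with a composite comparator, so each (report_row_id, field) pair becomes an ordinary column and a range read only deserializes the slice requested, not a whole supercolumn. A cassandra-cli sketch (CF name hypothetical; match the CompositeType component types to your actual serializers):

```
create column family QueryReportResultComposite
  with column_type = 'Standard'
  and comparator = 'CompositeType(BytesType, BytesType)'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'BytesType';
```

Counting report rows then becomes a column count on the row, and the range-predicate read becomes a column slice on the first composite component.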

Re: CQL 2, CQL 3 and Thrift confusion

2012-09-23 Thread Oleksandr Petrov
Yup, that was exactly the cause. Somehow I could not figure out why it was
downcasing my keyspace name all the time.
May be good to put it somewhere in reference material with a more detailed
explanation.
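For the archives, the behavior Sylvain describes looks roughly like this in cqlsh (sketch; CQL3 syntax as of Cassandra 1.1):

```
cqlsh> CREATE KEYSPACE "TestKeyspace" WITH strategy_class = 'SimpleStrategy'
   ...   AND strategy_options:replication_factor = 1;
cqlsh> USE TestKeyspace;    -- unquoted: folded to 'testkeyspace', not found
cqlsh> USE "TestKeyspace";  -- double-quoted: case preserved, works
```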

On Sun, Sep 23, 2012 at 9:30 PM, Sylvain Lebresne wrote:

> In CQL3, names are case insensitive by default, while they were case
> sensitive in CQL2. You can force whatever case you want in CQL3
> however using double quotes. So in other words, in CQL3,
>   USE "TestKeyspace";
> should work as expected.
>
> --
> Sylvain
>
> On Sun, Sep 23, 2012 at 9:22 PM, Oleksandr Petrov
>  wrote:
> > Hi,
> >
> > I'm currently using Cassandra 1.1.5.
> >
> > When I'm trying to create a Keyspace from CQL 2 with a command (`cqlsh
> -2`):
> >
> >   CREATE KEYSPACE TestKeyspace WITH strategy_class = 'SimpleStrategy' AND
> > strategy_options:replication_factor = 1
> >
> > Then try to access it from CQL 3 (`cqlsh -3`):
> >
> >   USE TestKeyspace;
> >
> > I get an error: Bad Request: Keyspace 'testkeyspace' does not exist
> >
> > Same thing is applicable to Thrift Interface. Somehow, I can only access
> > keyspaces created from CQL 2 via Thrift Interface.
> >
> > Basically, I get same exact error: InvalidRequestException(why:There is
> no
> > ring for the keyspace: CascadingCassandraCql3)
> >
> > Am I missing some switch? Or maybe it is intended to work that way?...
> > Thanks!
> >
> > --
> > alex p
>



-- 
alex p


Re: compression

2012-09-23 Thread Tamar Fraenkel
Thanks all, that helps. Will start with one or two CFs and let you know the
effect.

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Sun, Sep 23, 2012 at 8:21 PM, Hiller, Dean  wrote:

> As well as your unlimited column names may all have the same prefix,
> right? Like "accounts".rowkey56, "accounts".rowkey78, etc. etc.  so the
> "accounts gets a ton of compression then.
>
> Later,
> Dean
>
> From: Tyler Hobbs mailto:ty...@datastax.com>>
> Reply-To: "user@cassandra.apache.org" <
> user@cassandra.apache.org>
> Date: Sunday, September 23, 2012 11:46 AM
> To: "user@cassandra.apache.org" <
> user@cassandra.apache.org>
> Subject: Re: compression
>
>  column metadata, you're still likely to get a reasonable amount of
> compression.  This is especially true if there is some amount of repetition
> in the column names, values, or TTLs in wide rows.  Compression will almost
> always be beneficial unless you're already somehow CPU bound or are using
> large column values that are high in entropy, such as pre-compressed or
> encrypted data.
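Tyler's point about repetition is easy to demonstrate outside Cassandra. The sketch below uses zlib as a stand-in for an SSTable compressor (Cassandra 1.1 actually defaults to Snappy); the row layout is invented for illustration, mirroring Tamar's username/email example:

```python
import json
import os
import zlib

# Rows sharing the same column names ("username", "email", ...) are highly
# repetitive on disk, so they compress well; random bytes do not.
rows = b"".join(
    json.dumps({"username": f"user{i}", "email": f"user{i}@example.com"}).encode()
    for i in range(1000)
)
noise = os.urandom(len(rows))

ratio = len(rows) / len(zlib.compress(rows))
noise_ratio = len(noise) / len(zlib.compress(noise))

print(f"repetitive rows: {ratio:.1f}:1, random bytes: {noise_ratio:.2f}:1")
```

The repetitive data compresses several-fold while the random data stays near 1:1, which is the intuition behind the "many columns in common" guidance.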
>