CQL3 dynamic CF design questions

2013-04-07 Thread Marko Asplund
Hi,

I'm currently designing a backend service that would store user profile
information for different applications. Most of the properties in a user
profile would be unknown to the service and specified by the applications
using the service, so the properties would need to be added dynamically.

I was planning to use CQL3 and a dynamic column family defined something
like this:

CREATE TABLE user (
  id UUID,
  propertyset_key TEXT,
  propertyset_val TEXT,
  PRIMARY KEY (id, propertyset_key)
);

There would be N (assuming < 50) property sets associated with a user.
The property set values would be complex object graphs represented as JSON.

This would lead the storage engine to store rows similar to this (AFAIK):

8b2c0b60-977a-11e2-99c2-c8bcc8dc5d1d
- basic_info:propertyset_val = { firstName:"john", lastName:"smith", ...}
- contact_info:propertyset_val = { address: {streetAddr:"1 infinite loop",
postalCode: ""}, ... }
- meal_prefs:propertyset_val = { ... }
- ...
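
For concreteness, reads and writes against this table would look something
like the following (the JSON values are just illustrative):

```
INSERT INTO user (id, propertyset_key, propertyset_val)
VALUES (8b2c0b60-977a-11e2-99c2-c8bcc8dc5d1d, 'basic_info',
        '{"firstName":"john","lastName":"smith"}');

-- all property sets for a user: a single-partition slice
SELECT propertyset_key, propertyset_val FROM user
WHERE id = 8b2c0b60-977a-11e2-99c2-c8bcc8dc5d1d;

-- one property set
SELECT propertyset_val FROM user
WHERE id = 8b2c0b60-977a-11e2-99c2-c8bcc8dc5d1d
  AND propertyset_key = 'basic_info';
```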

Any comments on this design?

Another option would be to use the Cassandra map type for storing property
sets like this:

CREATE TABLE user (
  id UUID,
  property_sets MAP<TEXT, TEXT>,
  PRIMARY KEY (id)
);

Based on the documentation, I understood that each map element would
internally be stored as a separate column, so are these two user table
definitions equivalent from the storage engine's perspective?
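
For reference, the map would be read and written with the CQL 3 collection
syntax, something like this (values again illustrative):

```
INSERT INTO user (id, property_sets)
VALUES (8b2c0b60-977a-11e2-99c2-c8bcc8dc5d1d,
        { 'basic_info' : '{"firstName":"john","lastName":"smith"}' });

-- add or replace a single property set in place
UPDATE user
SET property_sets['meal_prefs'] = '{"vegetarian":true}'
WHERE id = 8b2c0b60-977a-11e2-99c2-c8bcc8dc5d1d;
```

One caveat from the docs: a SELECT always returns the whole map; as far as I
can tell there is no way to fetch a single map key server-side.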

I'm using Astyanax, which seems to support Cassandra collections.

With the second definition, it should be possible to later migrate a
dynamic property, e.g. job_title, to a static property, so that I could
execute CQL queries like this:

SELECT * FROM user WHERE job_title = 'developer';

but is it possible to accomplish that somehow with the first definition?
Or should I instead create separate column families for static and dynamic
properties?
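
For the second definition, my understanding is that the job_title query
would also need the column promoted out of the map plus a secondary index
before the WHERE clause works. A sketch, assuming 1.2-style CQL 3 (the
index name is made up):

```
ALTER TABLE user ADD job_title TEXT;
CREATE INDEX user_job_title_idx ON user (job_title);

SELECT * FROM user WHERE job_title = 'developer';
```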


marko


Re: Problem loading Saved Key-Cache on 1.2.3 after upgrading from 1.1.10

2013-04-07 Thread Arya Goudarzi
Yes, I know blowing them away would fix it, and that is what I did, but I
want to understand why this happens in the first place. I was upgrading from
1.1.10 to 1.2.3.


On Fri, Apr 5, 2013 at 2:53 PM, Edward Capriolo wrote:

> This has happened before; the saved cache files were not compatible between
> 0.6 and 0.7. I have run into this a couple of other times before. The good
> news is the saved key cache is just an optimization; you can blow it away
> and it is usually not a big deal.
>
>
>
>
> On Fri, Apr 5, 2013 at 2:55 PM, Arya Goudarzi  wrote:
>
>> Here is a chunk of bloom filter sstable skip messages from the node I
>> enabled DEBUG on:
>>
>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
>> 737) Bloom filter allows skipping sstable 39459
>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
>> 737) Bloom filter allows skipping sstable 39483
>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
>> 737) Bloom filter allows skipping sstable 39332
>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
>> 737) Bloom filter allows skipping sstable 39335
>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
>> 737) Bloom filter allows skipping sstable 39438
>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
>> 737) Bloom filter allows skipping sstable 39478
>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
>> 737) Bloom filter allows skipping sstable 39456
>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
>> 737) Bloom filter allows skipping sstable 39469
>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,451 SSTableReader.java (line
>> 737) Bloom filter allows skipping sstable 39334
>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,451 SSTableReader.java (line
>> 737) Bloom filter allows skipping sstable 39406
>>
>> This is the last chunk of log before C* gets stuck, right before I stop
>> the process, remove key caches and start again (This is from another node
>> that I upgraded 2 days ago):
>>
>> INFO [SSTableBatchOpen:2] 2013-04-03 01:59:39,769 SSTableReader.java
>> (line 166) Opening
>> /var/lib/cassandra/data/cardspring_production/UniqueIndexes/cardspring_production-UniqueIndexes-hf-316499
>> (5273270 bytes)
>>  INFO [SSTableBatchOpen:1] 2013-04-03 01:59:39,858 SSTableReader.java
>> (line 166) Opening
>> /var/lib/cassandra/data/cardspring_production/UniqueIndexes/cardspring_production-UniqueIndexes-hf-314755
>> (5264359 bytes)
>>  INFO [SSTableBatchOpen:2] 2013-04-03 01:59:39,894 SSTableReader.java
>> (line 166) Opening
>> /var/lib/cassandra/data/cardspring_production/UniqueIndexes/cardspring_production-UniqueIndexes-hf-314762
>> (5260887 bytes)
>>  INFO [SSTableBatchOpen:1] 2013-04-03 01:59:39,980 SSTableReader.java
>> (line 166) Opening
>> /var/lib/cassandra/data/cardspring_production/UniqueIndexes/cardspring_production-UniqueIndexes-hf-315886
>> (5262864 bytes)
>>  INFO [OptionalTasks:1] 2013-04-03 01:59:40,298 AutoSavingCache.java
>> (line 112) reading saved cache
>> /var/lib/cassandra/saved_caches/cardspring_production-UniqueIndexes-KeyCache
>>
>>
>> I finally upgraded all 12 nodes in our test environment yesterday. This
>> issue seemed to exist on 7 of the 12 nodes. They didn't always get
>> stuck on the same CF loading its saved KeyCache.
>>
>>
>> On Fri, Apr 5, 2013 at 9:56 AM, aaron morton wrote:
>>
>>> skipping sstable due to bloom filter debug messages
>>>
>>> What were these messages?
>>>
>>> Do you have the logs from the start up ?
>>>
>>> Cheers
>>>
>>>-
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 4/04/2013, at 6:11 AM, Arya Goudarzi  wrote:
>>>
>>> Hi,
>>>
>>> I have upgraded 2 nodes out of a 12 node test cluster from 1.1.10 to
>>> 1.2.3. During startup while tailing C*'s system.log, I observed a series of
>>> SSTable batch load messages and skipping sstable due to bloom filter debug
>>> messages which is normal for startup, but when it reached loading saved key
>>> caches, it gets stuck forever. The I/O wait stays high in the CPU graph and
>>> I/O ops are sent to disk, but C* never passes that step of loading the key
>>> cache file successfully. The saved key cache file was about 75MB on one
>>> node and 125MB on the other node and they were for different CFs.
>>>
>>> [attached: graph of CPU I/O wait during startup]
>>>
>>> The CPU I/O wait constantly stayed at 40%~ while system.log was stuck at
>>> loading one saved key cache file. I have marked that on the graph above.
>>> The workaround was to delete the saved cache files and things loaded fine
>>> (See marked Normal Startup).
>>>
>>> These machines are m1.xlarge EC2 instances, and this issue happened on
>>> both upgraded nodes. It did not happen during an exercise upgrade from
>>> 1.1.6 to 1.2.2 using the same snapshot.
>>>
>>> Should I raise a JIRA?
>>>
>>> -Arya
>>>
>>>
>>>
>>
>


Problems with shuffle

2013-04-07 Thread Rustam Aliyev

Hi,

After upgrading to vnodes, I created and enabled a shuffle operation as 
suggested. After running it for a couple of hours I had to disable it 
because the nodes were not catching up with compactions. I repeated this 
process 3 times (enable/disable).


I have 5 nodes, and each of them had ~35GB. After the shuffle operations 
described above, some nodes are now reaching ~170GB. In the log files I 
can see the same files transferred 2-4 times to the same host within the 
same shuffle session. Worst of all, after all of this only 20 vnodes had 
been transferred out of 1280. If it continues at the same speed, it will 
take about a month or two to complete the shuffle.


I have a few questions to better understand shuffle:

1. Does disabling and re-enabling shuffle start the shuffle process from
   scratch, or does it resume from the last point?

2. Will vnode reallocations speed up as the shuffle proceeds, or will the
   rate remain the same?

3. Why do I see multiple transfers of the same file to the same host? e.g.:

   INFO [Streaming to /10.0.1.8:6] 2013-04-07 14:27:10,038
   StreamReplyVerbHandler.java (line 44) Successfully sent
   /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
   to /10.0.1.8
   INFO [Streaming to /10.0.1.8:7] 2013-04-07 16:27:07,427
   StreamReplyVerbHandler.java (line 44) Successfully sent
   /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
   to /10.0.1.8

4. When I enable/disable shuffle I receive warning messages such as the
   ones below. Do I need to worry about them?

   cassandra-shuffle -h localhost disable
   Failed to enable shuffling on 10.0.1.1!
   Failed to enable shuffling on 10.0.1.3!

I couldn't find many docs on shuffle; I only read through JIRA and the 
original proposal by Eric.


BR,
Rustam.



Re: Problem loading Saved Key-Cache on 1.2.3 after upgrading from 1.1.10

2013-04-07 Thread Edward Capriolo
It was not something you did wrong. The key cache format/classes involved
changed; there are a few JIRA issues around this:

https://issues.apache.org/jira/browse/CASSANDRA-4916
https://issues.apache.org/jira/browse/CASSANDRA-5253

Depending on how you moved between versions, you may or may not have been
affected.


On Sun, Apr 7, 2013 at 4:56 AM, Arya Goudarzi  wrote:

> Yes, I know blowing them away would fix it and that is what I did, but I
> want to understand why this happens in first place. I was upgrading from
> 1.1.10 to 1.2.3
>
>
> On Fri, Apr 5, 2013 at 2:53 PM, Edward Capriolo wrote:
>
>> This has happened before the save caches files were not compatible
>> between 0.6 and 0.7. I have ran into this a couple other times before. The
>> good news is the save key cache is just an optimization, you can blow it
>> away and it is not usually a big deal.
>>
>>
>>
>>
>> On Fri, Apr 5, 2013 at 2:55 PM, Arya Goudarzi  wrote:
>>
>>> Here is a chunk of bloom filter sstable skip messages from the node I
>>> enabled DEBUG on:
>>>
>>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
>>> 737) Bloom filter allows skipping sstable 39459
>>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
>>> 737) Bloom filter allows skipping sstable 39483
>>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
>>> 737) Bloom filter allows skipping sstable 39332
>>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
>>> 737) Bloom filter allows skipping sstable 39335
>>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
>>> 737) Bloom filter allows skipping sstable 39438
>>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
>>> 737) Bloom filter allows skipping sstable 39478
>>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
>>> 737) Bloom filter allows skipping sstable 39456
>>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
>>> 737) Bloom filter allows skipping sstable 39469
>>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,451 SSTableReader.java (line
>>> 737) Bloom filter allows skipping sstable 39334
>>> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,451 SSTableReader.java (line
>>> 737) Bloom filter allows skipping sstable 39406
>>>
>>> This is the last chunk of log before C* gets stuck, right before I stop
>>> the process, remove key caches and start again (This is from another node
>>> that I upgraded 2 days ago):
>>>
>>> INFO [SSTableBatchOpen:2] 2013-04-03 01:59:39,769 SSTableReader.java
>>> (line 166) Opening
>>> /var/lib/cassandra/data/cardspring_production/UniqueIndexes/cardspring_production-UniqueIndexes-hf-316499
>>> (5273270 bytes)
>>>  INFO [SSTableBatchOpen:1] 2013-04-03 01:59:39,858 SSTableReader.java
>>> (line 166) Opening
>>> /var/lib/cassandra/data/cardspring_production/UniqueIndexes/cardspring_production-UniqueIndexes-hf-314755
>>> (5264359 bytes)
>>>  INFO [SSTableBatchOpen:2] 2013-04-03 01:59:39,894 SSTableReader.java
>>> (line 166) Opening
>>> /var/lib/cassandra/data/cardspring_production/UniqueIndexes/cardspring_production-UniqueIndexes-hf-314762
>>> (5260887 bytes)
>>>  INFO [SSTableBatchOpen:1] 2013-04-03 01:59:39,980 SSTableReader.java
>>> (line 166) Opening
>>> /var/lib/cassandra/data/cardspring_production/UniqueIndexes/cardspring_production-UniqueIndexes-hf-315886
>>> (5262864 bytes)
>>>  INFO [OptionalTasks:1] 2013-04-03 01:59:40,298 AutoSavingCache.java
>>> (line 112) reading saved cache
>>> /var/lib/cassandra/saved_caches/cardspring_production-UniqueIndexes-KeyCache
>>>
>>>
>>> I finally upgrade all 12 nodes in our test environment yesterday. This
>>> issue seemed to exists on 7 nodes out of 12 nodes. They didn't alway get
>>> stuck on the same CF loading its saved KeyCache.
>>>
>>>
>>> On Fri, Apr 5, 2013 at 9:56 AM, aaron morton wrote:
>>>
 skipping sstable due to bloom filter debug messages

 What were these messages?

 Do you have the logs from the start up ?

 Cheers

-
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 4/04/2013, at 6:11 AM, Arya Goudarzi  wrote:

 Hi,

 I have upgraded 2 nodes out of a 12 mode test cluster from 1.1.10 to
 1.2.3. During startup while tailing C*'s system.log, I observed a series of
 SSTable batch load messages and skipping sstable due to bloom filter debug
 messages which is normal for startup, but when it reached loading saved key
 caches, it gets stuck forever. The I/O wait stays high in the CPU graph and
 I/O ops are sent to disk, but C* never passes that step of loading the key
 cache file successfully. The saved key cache file was about 75MB on one
 node and 125MB on the other node and they were for different CFs.

 

 The CPU I/O wait constantly stayed at 40%~ while syste

Re: Problems with shuffle

2013-04-07 Thread Edward Capriolo
I am not familiar with shuffle, but if you attempt a shuffle and it fails,
it would be a good idea to let compaction die down, or even trigger a major
compaction on the nodes where the size grew. The reason is that once the
data files are on disk, even if they are duplicates, Cassandra does not
know that fact. Thus if you do a move or shuffle again, Cassandra will try
to move all that duplicated data again. In other words, if a failed
operation grows the size of your data, deal with that first before trying
the same operation again.

For now, your best bet is to run a major compaction on each node and get
the data sizes small again.
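
Concretely, that would be something like the following, one node at a time
(the host and keyspace names are placeholders):

```
# trigger a major compaction, then watch its progress
nodetool -h 10.0.1.1 compact Keyspace
nodetool -h 10.0.1.1 compactionstats
```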


On Sun, Apr 7, 2013 at 8:43 AM, Rustam Aliyev  wrote:

>  Hi,
>
> After upgrading to the vnodes I created and enabled shuffle operation as
> suggested. After running for a couple of hours I had to disable it because
> nodes were not catching up with compactions. I repeated this process 3
> times (enable/disable).
>
> I have 5 nodes and each of them had ~35GB. After shuffle operations
> described above some nodes are now reaching ~170GB. In the log files I can
> see same files transferred 2-4 times to the same host within the same
> shuffle session. Worst of all, after all of these I had only 20 vnodes
> transferred out of 1280. So if it will continue at the same speed it will
> take about a month or two to complete shuffle.
>
> I had few question to better understand shuffle:
>
>1. Does disabling and re-enabling shuffle starts shuffle process from
>scratch or it resumes from the last point?
>
> 2. Will vnode reallocations speedup as shuffle proceeds or it will
>remain the same?
>
> 3. Why I see multiple transfers of the same file to the same host?
>e.g.:
>
>INFO [Streaming to /10.0.1.8:6] 2013-04-07 14:27:10,038
>StreamReplyVerbHandler.java (line 44) Successfully sent
>/u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
>to /10.0.1.8
>INFO [Streaming to /10.0.1.8:7] 2013-04-07 16:27:07,427
>StreamReplyVerbHandler.java (line 44) Successfully sent 
> /u01/cassandra/data/
>Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db to /10.0.1.8
>
> 4. When I enable/disable shuffle I receive warning message such as
>below. Do I need to worry about it?
>
> cassandra-shuffle -h localhost disable
> Failed to enable shuffling on 10.0.1.1!
> Failed to enable shuffling on 10.0.1.3!
>
> I couldn't find many docs on shuffle, only read through JIRA and original
> proposal by Eric.
>
> BR,
> Rustam.
>
>


Re: Cassandra services down frequently [Version 1.1.4]

2013-04-07 Thread 金剑
Cassandra also uses off-heap memory outside the JVM heap;
SerializingCacheProvider is one such case.

Best Regards!

Jian Jin


2013/4/6 

> Thank you Aaron and Bryan for your advice.
>
> I have changed the following parameters and now Cassandra is running
> absolutely fine. Please review the settings below and advise whether I am
> headed in the right direction.
>
> cassandra-env.sh
> #JVM_OPTS="$JVM_OPTS -ea"
> MAX_HEAP_SIZE="6G"
> HEAP_NEWSIZE="500M"
>
>  cassandra.yaml
> # do not persist caches to disk
> key_cache_save_period: 0
> row_cache_save_period: 0
>
> key_cache_size_in_mb: 512
> row_cache_size_in_mb: 14336
> row_cache_provider: SerializingCacheProvider
>
> I have a query: if Cassandra uses the JVM for all operations, why do we
> need to change the above parameters separately in cassandra.yaml?
>
>
> Thanks & Regards
>
> Adeel Akbar
>
>
> Quoting aaron morton :
>
>  We can see from below that you've tweaked and disabled many of the
>>>  memory "safety valve" and other memory related settings.
>>>
>> Agree.
>> Also, you are running with a JVM heap size of 3.81GB, which is non-default.
>> For a 16GB node I would expect 8GB.
>>
>> Try restoring the yaml values to the defaults and allowing the
>> cassandra-env.sh file to determine the memory size.
>>
>> Cheers
>>
>> -
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 5/04/2013, at 12:36 PM, Bryan Talbot  wrote:
>>
>>  On Thu, Apr 4, 2013 at 1:27 AM,  wrote:
>>>
>>> After some time (1 hour / 2 hour) cassandra shut services on one or  two
>>> nodes with follwoing errors;
>>>
>>>
>>> Wonder what the workload and schema is like ...
>>>
>>> We can see from below that you've tweaked and disabled many of the
>>> memory "safety valve" and other memory related settings. Those could be
>>> causing issues too.
>>>
>>>
>>> hinted_handoff_throttle_delay_in_ms: 0
>>> flush_largest_memtables_at: 1.0
>>> reduce_cache_sizes_at: 1.0
>>> reduce_cache_capacity_to: 0.6
>>> rpc_keepalive: true
>>> rpc_server_type: sync
>>> rpc_min_threads: 16
>>> rpc_max_threads: 2147483647
>>> in_memory_compaction_limit_in_mb: 256
>>> compaction_throughput_mb_per_sec: 16
>>> rpc_timeout_in_ms: 15000
>>> dynamic_snitch_badness_threshold: 0.0
>>>
>>
>>
>>
>