Re: Restrict Cassandra users seeing all keyspaces

2013-11-08 Thread Bhathiya Jayasekara
Hi Mikhail,

Thank you for pointing that out. It is helpful.

Thanks,
Bhathiya


On Fri, Nov 8, 2013 at 1:18 PM, Mikhail Stepura  wrote:

> Please take a look at https://issues.apache.org/jira/browse/CASSANDRA-6266
> for details
>
> -M
>
> "Bhathiya Jayasekara"  wrote in message
> news:capt+24qd4e_+_6oc3rrmzpq2c6yoij1yqydlkcepngabxk_...@mail.gmail.com...
>
> Hi all,
>
> If I understood correctly, in Cassandra authenticated users can see the
> list of all keyspaces (i.e. the full schema).
>
> Is that the default behavior in Cassandra? Can we restrict that behavior?
>
> Thanks,
> Bhathiya
>
>
>
>


OOM while reading key cache

2013-11-08 Thread olek.stas...@gmail.com
Hello,
I'm facing an OOM while reading the key cache.
The cluster configuration is as follows:
- 6 machines with 8 GB RAM each and three 150 GB disks each
- default heap configuration
- default key cache configuration
- the biggest keyspace is about 500 GB in size (RF: 2, so in fact there is
250 GB of raw data).

After upgrading the first of the machines from 1.2.11 to 2.0.2, I received this error:
 INFO [main] 2013-11-08 10:53:16,716 AutoSavingCache.java (line 114)
reading saved cache
/home/synat/nosql_filesystem/cassandra/data/saved_caches/production_storage-METADATA-KeyCache-b.db
ERROR [main] 2013-11-08 10:53:16,895 CassandraDaemon.java (line 478)
Exception encountered during startup
java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:394)
at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:352)
at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:119)
at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:264)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:409)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:381)
at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:314)
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:268)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:274)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)


The error appears on every start, so I decided to disable the key cache (this
did not help) and temporarily moved the key cache file out of the saved_caches
folder (the file was 13 MB in size). That lets the node start, but it is only
a workaround, not the desired configuration. Does anyone have any idea
what the real cause of the OOM problem is?
best regards
Aleksander
PS: I still have 5 nodes to upgrade; I'll report back if the problem appears on the rest.


Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled

2013-11-08 Thread Jiri Horky
Hi,

On 11/07/2013 05:18 AM, Aaron Morton wrote:
>> Class Name | Shallow Heap | Retained Heap
>> ---------------------------------------------------------------------------------------------
>> java.nio.HeapByteBuffer @ 0x7806a0848 | 48 | 80
>> '- name org.apache.cassandra.db.Column @ 0x7806424e8 | 32 | 112
>>    |- [338530] java.lang.Object[540217] @ 0x57d62f560 Unreachable | 2,160,888 | 2,160,888
>>    |- [338530] java.lang.Object[810325] @ 0x591546540 | 3,241,320 | 7,820,328
>>    |  '- elementData java.util.ArrayList @ 0x75e8424c0 | 24 | 7,820,352
>>    |     |- list org.apache.cassandra.db.ArrayBackedSortedColumns$SlicesIterator @ 0x5940e0b18 | 48 | 128
>>    |     |  '- val$filteredIter org.apache.cassandra.db.filter.SliceQueryFilter$1 @ 0x5940e0b48 | 32 | 7,820,568
>>    |     |     '- val$iter org.apache.cassandra.db.filter.QueryFilter$2 @ 0x5940e0b68 Unreachable | 24 | 7,820,592
>>    |     |- this$0, parent java.util.ArrayList$SubList @ 0x5940e0bb8 | 40 | 40
>>    |     |  '- this$1 java.util.ArrayList$SubList$1 @ 0x5940e0be0 | 40 | 80
>>    |     |     '- currentSlice org.apache.cassandra.db.ArrayBackedSortedColumns$SlicesIterator @ 0x5940e0b18 | 48 | 128
>>    |     |        '- val$filteredIter org.apache.cassandra.db.filter.SliceQueryFilter$1 @ 0x5940e0b48 | 32 | 7,820,568
>>    |     |           '- val$iter org.apache.cassandra.db.filter.QueryFilter$2 @ 0x5940e0b68 Unreachable | 24 | 7,820,592
>>    |     |- columns org.apache.cassandra.db.ArrayBackedSortedColumns @ 0x5b0a33488 | 32 | 56
>>    |     |  '- val$cf org.apache.cassandra.db.filter.SliceQueryFilter$1 @ 0x5940e0b48 | 32 | 7,820,568
>>    |     |     '- val$iter org.apache.cassandra.db.filter.QueryFilter$2 @ 0x5940e0b68 Unreachable | 24 | 7,820,592
>>    |     '- Total: 3 entries
>>    |- [338530] java.lang.Object[360145] @ 0x7736ce2f0 Unreachable | 1,440,600 | 1,440,600
>>    '- Total: 3 entries
>
> Are you doing large slices, or could you have a lot of tombstones on
> the rows?
I don't really know - how can I monitor that?
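(One thing I could try, I suppose, is cqlsh's request tracing - the keyspace
and table names below are placeholders:

TRACING ON;
SELECT * FROM mykeyspace.mycf WHERE key = 'some-key';

If I read the docs right, the trace output includes a line like "Read 100
live and 20000 tombstoned cells" for the slice, which would show the
tombstone overhead.)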
>
>> We have disabled row cache on one node to see  the  difference. Please
>> see attached plots from visual VM, I think that the effect is quite
>> visible.
> The default row cache is stored off the JVM heap; have you changed to
> the ConcurrentLinkedHashCacheProvider?
Answered by Chris already :) No.
>
> One way the SerializingCacheProvider could impact GC is if the CF
> takes a lot of writes. The SerializingCacheProvider invalidates the
> row when it is written to, and has to read the entire row and serialise
> it on a cache miss.
>
>>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms10G -Xmx10G
>>> -Xmn1024M -XX:+HeapDumpOnOutOfMemoryError
> You probably want the heap to be 4G to 8G in size; 10G will encounter
> longer pauses.
> Also the size of the new heap may be too big depending on the number
> of cores. I would recommend trying 800M.
I tried decreasing it first to 384M and then to 128M with no change in the
behaviour. I don't mind the extra memory overhead of the cache - the
objects needed to reference it - but I don't see any reason why it should
create and delete that many objects so quickly.
>
>
>> prg01.visual.vm.png
> Shows the heap growing very quickly. This could be due to wide reads
> or a high write throughput.
Well, both prg01 and prg02 receive the same load, which is about ~150-250
(during peak) read requests per second and 100-160 write requests per
second. The only nodes where the heap grows rapidly and GC kicks in are the
ones with the row cache enabled.

>
> Hope that helps.
Thank you!

Jiri Horky



How would you model that?

2013-11-08 Thread pavli...@gmail.com
Hey guys, I need to retrieve a list of distinct users based on their
activity datetime. How can I model a table to store that kind of
information?

The straightforward decision was this:

CREATE TABLE user_activity (user text primary key, ts timeuuid);

but it turned out it is impossible to do a select like this:

select * from user_activity order by ts;

as it fails with "ORDER BY is only supported when the partition key is
restricted by an EQ or an IN".

How would you model the thing? Just need to have a list of users based on
their last activity timestamp...

Thanks!


Re: How would you model that?

2013-11-08 Thread Laing, Michael
You could try this:

CREATE TABLE user_activity (shard text, user text, ts timeuuid, primary key
(shard, ts));

select user, ts from user_activity where shard in ('00', '01', ...) order
by ts desc;

Grab each user and ts the first time you see that user.

Use as many shards as you think you need to control row size and spread the
load.

Set ttls to expire user_activity entries when you are no longer interested
in them.
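For illustration, a write could look like this (just a sketch - the shard
value is picked at random per write, and the 7-day TTL is only an example):

insert into user_activity (shard, user, ts) values ('02', 'user-123', now())
using ttl 604800; -- 7 days, in seconds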

ml


On Fri, Nov 8, 2013 at 6:10 AM, pavli...@gmail.com wrote:

> Hey guys, I need to retrieve a list of distinct users based on their
> activity datetime. How can I model a table to store that kind of
> information?
>
> The straightforward decision was this:
>
> CREATE TABLE user_activity (user text primary key, ts timeuuid);
>
> but it turned out it is impossible to do a select like this:
>
> select * from user_activity order by ts;
>
> as it fails with "ORDER BY is only supported when the partition key is
> restricted by an EQ or an IN".
>
> How would you model the thing? Just need to have a list of users based on
> their last activity timestamp...
>
> Thanks!
>
>


IN predicates on non-primary-key columns (%s) is not yet supported - then will it be ?

2013-11-08 Thread Сергей Нагайцев
CREATE COLUMNFAMILY post (
KEY uuid,
author uuid,
blog uuid,
name text,
data text,
PRIMARY KEY ( KEY )
);

SELECT * FROM post WHERE blog IN (1,2) AND author=3 ALLOW FILTERING;
(don't look at the fact that the numbers are not uuids :)

Error: IN predicates on non-primary-key columns (blog) is not yet supported

And how can I work around this?
Manual index tables? Any guidelines on how to design them?
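(To be concrete, by a manual index table I mean something like the sketch
below - table and column names are invented:

CREATE TABLE post_by_blog (
blog uuid,
author uuid,
post_id uuid,
PRIMARY KEY ( blog, author, post_id )
);

SELECT post_id FROM post_by_blog WHERE blog IN (1,2) AND author = 3;

Since blog is the partition key there, the IN predicate should be allowed.)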


Re: How would you model that?

2013-11-08 Thread Franc Carter
How about something like using a time-range as the key (e.g. an hour,
depending on your update rate) and a composite (time:user) as the column
name?
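In CQL terms that could look roughly like this (just a sketch of the idea,
with invented names):

CREATE TABLE user_activity_by_hour (
hour text, -- e.g. '2013110812', an hourly bucket
ts timeuuid,
user text,
PRIMARY KEY (hour, ts, user)
);

SELECT user, ts FROM user_activity_by_hour
WHERE hour = '2013110812' ORDER BY ts DESC;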

cheers



On Fri, Nov 8, 2013 at 10:45 PM, Laing, Michael
wrote:

> You could try this:
>
> CREATE TABLE user_activity (shard text, user text, ts timeuuid, primary
> key (shard, ts));
>
> select user, ts from user_activity where shard in ('00', '01', ...) order
> by ts desc;
>
> Grab each user and ts the first time you see that user.
>
> Use as many shards as you think you need to control row size and spread
> the load.
>
> Set ttls to expire user_activity entries when you are no longer interested
> in them.
>
> ml
>
>
> On Fri, Nov 8, 2013 at 6:10 AM, pavli...@gmail.com wrote:
>
>> Hey guys, I need to retrieve a list of distinct users based on their
>> activity datetime. How can I model a table to store that kind of
>> information?
>>
>> The straightforward decision was this:
>>
>> CREATE TABLE user_activity (user text primary key, ts timeuuid);
>>
>> but it turned out it is impossible to do a select like this:
>>
>> select * from user_activity order by ts;
>>
>> as it fails with "ORDER BY is only supported when the partition key is
>> restricted by an EQ or an IN".
>>
>> How would you model the thing? Just need to have a list of users based on
>> their last activity timestamp...
>>
>> Thanks!
>>
>>
>


-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 8355 2514

Level 4, 55 Harrington St, The Rocks NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: IN predicates on non-primary-key columns (%s) is not yet supported - then will it be ?

2013-11-08 Thread Laing, Michael
try this:

CREATE COLUMNFAMILY post (
KEY uuid,
author uuid,
blog timeuuid, -- sortable
name text,
data text,
PRIMARY KEY ( KEY, blog )
);

create index on post (author);

SELECT * FROM post
WHERE
blog >= 4d6b5fc5-487b-11e3-a6f4-406c8f1838fa
AND blog <= 50573ef8-487b-11e3-be65-406c8f1838fa
AND author= a6c9f405-487b-11e3-bd38-406c8f1838fa
;

works if blog can be modeled this way...

ml


On Fri, Nov 8, 2013 at 6:58 AM, Сергей Нагайцев  wrote:

> CREATE COLUMNFAMILY post (
> KEY uuid,
> author uuid,
> blog uuid,
> name text,
> data text,
> PRIMARY KEY ( KEY )
> );
>
> SELECT * FROM post WHERE blog IN (1,2) AND author=3 ALLOW FILTERING;
> (don't look at the fact that the numbers are not uuids :)
>
> Error: IN predicates on non-primary-key columns (blog) is not yet supported
>
> And how can I work around this?
> Manual index tables? Any guidelines on how to design them?
>


Re: Endless loop LCS compaction

2013-11-08 Thread Chris Burroughs

On 11/07/2013 06:48 AM, Desimpel, Ignace wrote:

Total data size is only 3.5 GB. The column family was created with
SSTableSize: 10 MB.


You may want to try a significantly larger size.

https://issues.apache.org/jira/browse/CASSANDRA-5727
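For example, something along these lines (keyspace/table names are
hypothetical; 160 MB is the new default discussed in that ticket):

ALTER TABLE mykeyspace.mytable
WITH compaction = {'class': 'LeveledCompactionStrategy',
'sstable_size_in_mb': 160};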


Re: Why truncate previous hints when upgrade from 1.1.9 to 1.2.6?

2013-11-08 Thread Chris Burroughs

NEWS.txt has some details and suggested procedures

- The hints schema was changed from 1.1 to 1.2. Cassandra automatically
  snapshots and then truncates the hints column family as part of
  starting up 1.2 for the first time.  Additionally, upgraded nodes
  will not store new hints destined for older (pre-1.2) nodes. It is
  therefore recommended that you perform a cluster upgrade when all
  nodes are up. Because hints will be lost, a cluster-wide repair (with
  -pr) is recommended after upgrade of all nodes.

On 11/07/2013 07:33 AM, Boole.Z.Guo (mis.cnsh04.Newegg) 41442 wrote:

Hi all,
When I upgraded C* from 1.1.9 to 1.2.6, I noticed that the previous
hints column family was truncated directly.
Can you tell me why? Consistency is important to my services.


Best Regards,
Boole Guo





Re: cleanup failure; FileNotFoundException deleting (wrong?) db file

2013-11-08 Thread Elias Ross
On Thu, Nov 7, 2013 at 7:01 PM, Krishna Chaitanya wrote:

> Check if it's an issue with permissions or broken links...
>
>
I don't think permissions are an issue. You might be on to something
regarding the links.

I've been seeing this on 4 nodes, configured identically.

Here's what I think the problem may be (or it may be a combination of a few
problems):

1. I have symlinked the data directories. This confuses Cassandra in some
way, causing it to create multiple files. Does Cassandra care if the data
directory was symlinked from someplace? Would this cause an issue?

lrwxrwxrwx1 root root 6 Oct 30 18:37 data01 -> /data1 # [1]

Evidence for:
a. Somehow it's creating duplicate hard links.
b. It is unlikely other Cassandra users would have set up their directories
like this, and this seems like a serious bug.
c. Also, my other cluster is nearly identical (OS, JVM, 6 drives, same
Cassandra/RHQ, hardware similar) and not seeing the same issues, although
that is a two node cluster.

If I were to grep through the code, I would check whether the path that Java
sees, maybe via File.getAbsoluteFile() (which might resolve the link), fails
to match the path of another file. In other words, it would be a Cassandra
bug based on assumptions the JVM makes about paths.

2. When I created the cluster, I had a single data directory for each node.
I then added 5 more. Somehow Cassandra mis-remembers where the data was
put, causing all sorts of issues. How does Cassandra decide where to put
its data and where to read it from? What happens when additional data
directories are added? There could be a bug in the code.

Evidence for:
a. Somehow it's looking for data in the wrong directory. It also seems
unlikely a user would create a cluster, then add 5 more drives.

# [1] The reason the links are set up is that the mount points didn't
match my Puppet setup, which sets up my directory permissions. So I added
the links to compensate.


Re: CQL Tables in Cassandra with an Index

2013-11-08 Thread Techy Teck
If I execute the above query from CQL shell, it doesn't work for me at
all... This is what I get -

cqlsh:pp> insert into test_new (employee_id, employee_name, value,
last_modified_date) values ('1', 'e29',  'some_new_value', now()) if not
exists
;
Bad Request: line 1:123 missing EOF at 'if'

Is there anything I am missing here? I am running Cassandra 1.2.3.




On Fri, Nov 8, 2013 at 5:33 AM, DuyHai Doan  wrote:

> Consider using the new lightweight transaction
>
>  insert into test_new (employee_id, employee_name, value,
> last_modified_date) values ('1', 'e29',  'some_new_value', now()) *if not
> exists*;
>
>
>
> On Friday, 8 November 2013 at 03:53:12 UTC+1, Techy Teck wrote:
>
>> I am using the below table in our use case -
>>
>> create table test_new (
>> employee_id text,
>> employee_name text,
>> value text,
>> last_modified_date timeuuid,
>> primary key (employee_id, last_modified_date)
>>);
>>
>> create index employee_name_idx on test_new (employee_name);
>>
>> In my above table employee_id will be unique always starting from 1 till
>> 32767. So our query pattern is like this -
>>
>> 1. Give me everything for any of the employee_id?
>> 2. Give me everything for what has changed in last 5 minutes?
>> 3. Give me everything for any of the employee_name?
>>
>>
>> I will be inserting below data into my above table -
>>
>> insert into test_new (employee_id, employee_name, value,
>> last_modified_date) values ('1', 'e27',  'some_value', now());
>> insert into test_new (employee_id, employee_name, value,
>> last_modified_date) values ('2', 'e27',  'some_new_value', now());
>> insert into test_new (employee_id, employee_name, value,
>> last_modified_date) values ('3', 'e28',  'some_new_again_value', now());
>>
>> So now is there any way to avoid this particular scenario in my above
>> table for the below query.. Somehow somebody by mistake is trying to
>> execute the below query then it will create another row with employee_id as
>> 1 and with other fields? I don't want anyone to insert the same employee_id
>> again if it is already there in the cassandra database.. Any thoughts?
>>
>> insert into test_new (employee_id, employee_name, value,
>> last_modified_date) values ('1', 'e29',  'some_new_value', now());
>>
>>


Statistics

2013-11-08 Thread Parag Patel
Hi,

I'm looking for a way to view statistics.  Mainly, I'd like to see the 
distribution of writes and reads over the course of a day or a set of days. Is 
there a way to do this through nodetool or by downloading a utility?

Thanks,
Parag


Re: CQL Tables in Cassandra with an Index

2013-11-08 Thread Alex Popescu
Conditional inserts/updates (lightweight transactions) are available only
in C* 2.0+.

Also, most of the time you should try to think about alternative ways to
solve the problem, and rely on these only if you cannot find a different
solution. The reason is that they come with performance penalties, and
you'd be better off with a scalable & performant design rather than
taking the easy way out ;-)
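For reference, on 2.0+ the conditional insert from above would be:

insert into test_new (employee_id, employee_name, value, last_modified_date)
values ('1', 'e29', 'some_new_value', now()) if not exists;

cqlsh then reports an [applied] column: true if no row with that primary key
existed, false otherwise (the existing row is returned alongside). Note that
the check covers the full primary key (employee_id, last_modified_date).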


On Fri, Nov 8, 2013 at 11:40 AM, Techy Teck  wrote:

> If I execute the above query from CQL shell, it doesn't work for me at
> all... This is what I get -
>
> cqlsh:pp> insert into test_new (employee_id, employee_name, value,
> last_modified_date) values ('1', 'e29',  'some_new_value', now()) if not
> exists
> ;
> Bad Request: line 1:123 missing EOF at 'if'
>
> Is there anything I am missing here? I am running Cassandra 1.2.3
>
>
>
>
> On Fri, Nov 8, 2013 at 5:33 AM, DuyHai Doan  wrote:
>
>> Consider using the new lightweight transaction
>>
>>  insert into test_new (employee_id, employee_name, value,
>> last_modified_date) values ('1', 'e29',  'some_new_value', now()) *if
>> not exists*;
>>
>>
>>
>> On Friday, 8 November 2013 at 03:53:12 UTC+1, Techy Teck wrote:
>>
>>> I am using the below table in our use case -
>>>
>>> create table test_new (
>>> employee_id text,
>>> employee_name text,
>>> value text,
>>> last_modified_date timeuuid,
>>> primary key (employee_id, last_modified_date)
>>>);
>>>
>>> create index employee_name_idx on test_new (employee_name);
>>>
>>> In my above table employee_id will be unique always starting from 1 till
>>> 32767. So our query pattern is like this -
>>>
>>> 1. Give me everything for any of the employee_id?
>>> 2. Give me everything for what has changed in last 5 minutes?
>>> 3. Give me everything for any of the employee_name?
>>>
>>>
>>> I will be inserting below data into my above table -
>>>
>>> insert into test_new (employee_id, employee_name, value,
>>> last_modified_date) values ('1', 'e27',  'some_value', now());
>>> insert into test_new (employee_id, employee_name, value,
>>> last_modified_date) values ('2', 'e27',  'some_new_value', now());
>>> insert into test_new (employee_id, employee_name, value,
>>> last_modified_date) values ('3', 'e28',  'some_new_again_value', now());
>>>
>>> So now is there any way to avoid this particular scenario in my above
>>> table for the below query.. Somehow somebody by mistake is trying to
>>> execute the below query then it will create another row with employee_id as
>>> 1 and with other fields? I don't want anyone to insert the same employee_id
>>> again if it is already there in the cassandra database.. Any thoughts?
>>>
>>> insert into test_new (employee_id, employee_name, value,
>>> last_modified_date) values ('1', 'e29',  'some_new_value', now());
>>>
>>>



-- 

:- a)


Alex Popescu
Sen. Product Manager @ DataStax
@al3xandru


Creating custom secondary index in Cassandra.

2013-11-08 Thread mahesh rajamani
Hello,

I am looking for some additional details about how to create a custom
secondary index.
I see CQL documentation where we can provide our own implementation of a
secondary index using the syntax:

CREATE CUSTOM INDEX ON users (email) USING 'path.to.the.IndexClass';

But there is no information about the interface that needs to be
implemented. Can someone give more information/references on how to use this?

-- 
Regards,
Mahesh Rajamani


Re: cleanup failure; FileNotFoundException deleting (wrong?) db file

2013-11-08 Thread Elias Ross
On Fri, Nov 8, 2013 at 10:31 AM, Elias Ross  wrote:


> On Thu, Nov 7, 2013 at 7:01 PM, Krishna Chaitanya 
> wrote:
>
>> Check if it's an issue with permissions or broken links...
>>
>>
> I don't think permissions are an issue. You might be on to something
> regarding the links.
>
>
As it turns out (and I noted in CASSANDRA-6298 already) this was a user
issue. One of my links was pointing to the same drive:

lrwxrwxrwx1 root root 6 Oct 30 18:37 data05 -> /data5
lrwxrwxrwx1 root root 6 Oct 30 18:37 data06 -> /data5

Thanks for the help everyone, I'm happy it's all working. I'm not so happy
that I messed up my configuration like this.


Re: Statistics

2013-11-08 Thread David Chia
http://www.datastax.com/dev/blog/metrics-in-cassandra12


On Fri, Nov 8, 2013 at 11:42 AM, Parag Patel wrote:

>Hi,
>
>
>
> I’m looking for a way to view statistics.  Mainly, I’d like to see the
> distribution of writes and reads over the course of a day or a set of days.
> Is there a way to do this through nodetool or by downloading a utility?
>
>
>
> Thanks,
>
> Parag
>


Best data structure for tracking most recent updates.

2013-11-08 Thread Jacob Rhoden
I need to be able to show the most recent changes that have occurred in a
system. I understand inserting every update into a tracking table and
deleting old updates may not be great, as I may end up creating millions of
tombstones. i.e. don't do this:

create table recent_updates(uuid timeuuid primary key, message text);
insert into recent_updates(now(), 'the message');
insert into recent_updates(now(), 'the message');
...
insert into recent_updates(now(), 'the message');
// delete all but the most recent ten messages.

So how do people solve it? The following option occurs to me, but I am not
sure if it's the best option:

create table recent_updates(record int primary key, message text, uuid
timeuuid);
insert into recent_updates(1, 'the message', now());
insert into recent_updates(2, 'the message', now());
...
insert into recent_updates(10, 'the message', now());
// rotate back to 1
insert into recent_updates(1, 'the message', now());

Doing it this way would require a query to find out what number in the
sequence we are up to.

Best regards,
Jacob

Re: Best data structure for tracking most recent updates.

2013-11-08 Thread Laing, Michael
Here are a couple ideas:

1. You can rotate tables and truncate to avoid deleting.
2. You can shard your tables (partition key) to mitigate hotspots.
3. You can use a column key to store rows in timeuuid sequence.

create table recent_updates_00 (shard text, uuid timeuuid, message text,
primary key (shard, uuid));
create table recent_updates_01 (shard text, uuid timeuuid, message text,
primary key (shard, uuid));
...

You can determine 'shard' randomly within a range, e.g. 1 of 24 shards,
when you write. Sharding spreads the load as each shard is a row.

You determine which table to write to from the current datetime, e.g. hour of
day, day of week, etc., taking a modulus based upon, e.g., every 5 hours or
every 3 days. So you are only writing to 1 table at a time. Usually I
derive the datetime from the timeuuid so all is consistent. Within your
modulus range, you can truncate currently unused tables so they are ready
for reuse - truncation is overall much cheaper than deletion.
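A concrete (made-up) example of this write/rotation pattern, assuming 24
shards and table 00 as the currently active one:

insert into recent_updates_00 (shard, uuid, message)
values ('07', now(), 'the message');

truncate recent_updates_01; -- make the next table ready for reuse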

You can retrieve 'the latest' updates by doing a query like this - the
table is determined by current time, but possibly you will want to append
results from the 'prior' table if you do not satisfy your limit:

select uuid, message from recent_updates_xx where shard in ('00', '01',
...) order by uuid desc limit 10; -- get the latest 10

This is a very efficient query. You can improve efficiency somewhat by
altering the storage order in the table creates.
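For instance, the storage-order variant of the create would be something
like this (a sketch) - with a reversed clustering order the 'latest 10'
query reads rows in the order they are stored:

create table recent_updates_00 (shard text, uuid timeuuid, message text,
primary key (shard, uuid))
with clustering order by (uuid desc);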

ml








On Fri, Nov 8, 2013 at 6:02 PM, Jacob Rhoden  wrote:

> I need to be able to show the most recent changes that have occurred in a
> system, I understand inserting every update into a tracking table and
> deleting old updates may not be great, as I may end up creating millions of
> tombstones. i.e. don't do this:
>
> create table recent_updates(uuid timeuuid primary key, message text);
> insert into recent_updates(now(), 'the message');
> insert into recent_updates(now(), 'the message');
> 
> insert into recent_updates(now(), 'the message');
> // delete all but the most recent ten messages.
>
> So how do people solve it? The following option occurs to me, but I am not
> sure if its the best option:
>
> create table recent_updates(record int primary key, message text, uuid
> timeuuid);
> insert into recent_updates(1, 'the message', now());
> insert into recent_updates(2, 'the message', now());
> 
> insert into recent_updates(10, 'the message', now());
> // rotate back to 1
> insert into recent_updates(1, 'the message', now());
>
> Doing it this way would require a query to find out what number in the
> sequence we are up to.
>
> Best regards,
> Jacob
>


Re: Cassandra 1.1.6 - New node bootstrap not completing

2013-11-08 Thread Chris Burroughs

On 11/01/2013 03:03 PM, Robert Coli wrote:

On Fri, Nov 1, 2013 at 9:36 AM, Narendra Sharma
wrote:


I was successfully able to bootstrap the node. The issue was RF > 2.
Thanks again Robert.



For the record, I'm not entirely clear why bootstrapping two nodes into the
same range should have caused your specific bootstrap problem, but I am
glad to hear that bootstrapping one node at a time was a usable workaround.

=Rob



(A) If it can't work, shouldn't a node refuse to bootstrap if it sees
another node already in that state?


(B) It would be nice if nodes in independent DCs could at least be 
bootstrapped at the same time.