Re: Seed vs non-seed in YAML

2011-09-28 Thread Peter Schuller
> Seeds will not auto-bootstrap themselves when you add them to the cluster.

And having a bunch of seeds that aren't even in the cluster any more (when
you move them around, decommission them, etc.) seems like a bad idea.

As for the number: I would say definitely have at least RF seeds (for some
reason I haven't heard that recommendation before, but it makes sense
to me).
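
For example, with RF=3 you might list three seed entries in every node's
cassandra.yaml; a minimal sketch with hypothetical addresses (the exact
placement of the seeds entry depends on your yaml layout/version):

seeds: "10.0.1.1,10.0.1.2,10.0.1.3"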

-- 
/ Peter Schuller (@scode on twitter)


Partitioner per keyspace

2011-09-28 Thread Philippe
Hi, is there any reason why configuring a partitioner per keyspace wouldn't
be technically possible?

Thanks.


Re: GC for ParNew on 0.8.6

2011-09-28 Thread Peter Schuller
> I have changed absolutely nothing and the workload is perhaps even lower
> than before upgrading because I have paused most updates to the cluster. Did
> a log level change or does this have deeper meaning?

You don't say which version you upgraded from, but I suspect 0.7? JVM
options like the heap size calculations have changed. I am guessing you're
seeing it in the logs because the young generation was sized
differently, causing young generation garbage collections to take longer
than before and thus triggering the logging by Cassandra's
GCInspector.
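
If you want to rule the sizing change in or out, you can pin the heap and
young generation explicitly in conf/cassandra-env.sh instead of relying on
the calculated defaults; a minimal sketch, values illustrative only:

MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="400M"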

-- 
/ Peter Schuller (@scode on twitter)


Re: how does compaction_throughput_kb_per_sec affect disk io?

2011-09-28 Thread Peter Schuller
> I would think that compaction_throughput_kb_per_sec does have an indirect
> impact on disk IO. A high number, or setting it to 0, means there is no
> throttling on how much IO is being performed. Wouldn't it impact normal
> reads from disk while disk IO or utilization is high because compaction
> is taking place?

It would. I don't think anyone intended to say otherwise; but the
setting itself does not directly change the behavior of flushes or
reads. I think that's what jbellis meant, i.e. there is no throttling
of flushes or reads themselves.
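
For reference, the knob itself lives in cassandra.yaml; as of 0.8 it is
expressed in MB/s (note the unit), and 0 disables throttling entirely, e.g.:

compaction_throughput_mb_per_sec: 16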

-- 
/ Peter Schuller (@scode on twitter)


Re: Assertion error in AntiEntropyService.rendezvous()

2011-09-28 Thread aaron morton
Looks like you are hitting this:
https://issues.apache.org/jira/browse/CASSANDRA-3256

Does the repair complete?

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 28/09/2011, at 7:04 PM, Philippe wrote:

> Hello,
> I've just run into a new assertion error, again after upgrading a 2-month-old 
> cluster to 0.8.6.
> Can someone explain what this means and the possible consequences?
> 
> Thanks
> ERROR [AntiEntropyStage:2] 2011-09-27 06:07:41,960 
> AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
> Thread[AntiEntropyStage:2,5,main]
> java.lang.AssertionError
> at 
> org.apache.cassandra.service.AntiEntropyService.rendezvous(AntiEntropyService.java:170)
> at 
> org.apache.cassandra.service.AntiEntropyService.access$100(AntiEntropyService.java:90)
> at 
> org.apache.cassandra.service.AntiEntropyService$TreeResponseVerbHandler.doVerb(AntiEntropyService.java:518)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 



Re: Partitioner per keyspace

2011-09-28 Thread aaron morton
The first thing I can think of is that the initial_token for the node must be a 
valid token according to the configured partitioner, as the tokens created by 
the partitioner are the things stored in the distributed hash table. If you had a 
partitioner per KS you would need to configure the initial_token per KS. 

Also, it's not possible to *ever* change the partitioner, so it would 
have to be excluded from the KS update. 

They are not show stoppers, just the first things that come to mind. 

IIRC a lot of the other access happens in the context of a KS; there may be 
other issues but I've not checked the code. 

Anyone else? 
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 28/09/2011, at 8:28 PM, Philippe wrote:

> Hi, is there any reason why configuring a partitioner per keyspace wouldn't be 
> technically possible?
> 
> Thanks.
> 



Re: [RELEASE CANDIDATE] Apache Cassandra 1.0.0-rc1 released

2011-09-28 Thread Sylvain Lebresne
On Wed, Sep 28, 2011 at 8:32 AM, Philippe  wrote:
> Congrats.
> Is there a target date for the release? If not, is it likely to be in
> October?

October 8th.

Of course, it's a target only; we make no guarantee. But the beta has been
rather calm (which isn't to say there weren't a few bugs fixed), so hopes are
good that we'll hit the target, or close to it if not.

--
Sylvain

>
> On 27 Sept 2011 18:57, "Sylvain Lebresne"  wrote:
>> The Cassandra team is pleased to announce the release of the first release
>> candidate for the future Apache Cassandra 1.0.
>>
>> The warnings first: this is *not* the final release and hence should not be
>> considered ready for production use just yet. However, unless major
>> regressions are found in the testing of this release candidate, the final
>> release should be very similar to this RC.
>>
>> Your help in testing this release candidate will be highly appreciated, and
>> while doing so, please report any problem you may encounter[3,4]. The changes
>> since beta1 can be found in the change log[1], and see the release notes[2]
>> to find out what Cassandra 1.0 is made of.
>>
>> Apache Cassandra 1.0.0-rc1[5] is available as usual from the cassandra
>> website:
>>
>> http://cassandra.apache.org/download/
>>
>> Thank you for your help in testing and have fun with it, Cassandra 1.0 is
>> right around the corner!
>>
>> [1]: http://goo.gl/1wJ1h (CHANGES.txt)
>> [2]: http://goo.gl/O5DmR (NEWS.txt)
>> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>> [4]: user@cassandra.apache.org
>> [5]: https://svn.apache.org/repos/asf/cassandra/tags/cassandra-1.0.0-rc1
>


Re: Partitioner per keyspace

2011-09-28 Thread Sylvain Lebresne
https://issues.apache.org/jira/browse/CASSANDRA-295

--
Sylvain

On Wed, Sep 28, 2011 at 10:06 AM, aaron morton  wrote:
> The first thing I can think of is that the initial_token for the node must be
> a valid token according to the configured partitioner, as the tokens created
> by the partitioner are the things stored in the distributed hash table. [...]
>


Re: Bulk uploader issue on multi-node cluster

2011-09-28 Thread aaron morton
That's just a warning; it should keep trying to connect. Does it make any 
progress?

After 8 attempts it should fail with an error. 

With the debug option you should get a full stack trace printed if it fails; 
can you send that along as well?

It's probably some sort of config problem.

Cheers



-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 28/09/2011, at 3:44 AM, Thamizh wrote:

> Hi,
> 
> I had set below config on SSTable instance (127.0.0.2).
> 
> auto_bootstrap: false
> seeds: "172.27.15.2" (lab02)
> rpc_address: 127.0.0.2
> listen_address: 127.0.0.2
> rpc_port: 9160
> storage_port: 7000
> 
> #ifconfig
> lo:2  Link encap:Local Loopback  
>   inet addr:127.0.0.2  Mask:255.0.0.0
>   UP LOOPBACK RUNNING  MTU:16436  Metric:1
> 
> But When I ran sstableloader command it ended up with below error
> 
> nutch@lab02:/code/SST/apache-cassandra-0.8.6$ bin/sstableloader --debug -v 
> /code/sstable0-cassandra-0.8.6/SSTableUploader/ipovw
> Starting client (and waiting 30 seconds for gossip) ...
> Streaming revelant part of 
> /code/sstable0-cassandra-0.8.6/SSTableUploader/ipovw/ip-g-1-Data.db to 
> [/172.27.15.4, /172.27.15.2, /172.27.15.3]
> 
> progress: [/172.27.15.4 0/0 (100)] [/172.27.15.2 0/1 (0)] [/172.27.15.3 0/0 
> (100)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 09:42:14,883 Failed attempt 1 to 
> connect to /172.27.15.4 to stream null. Retrying in 2 ms. 
> (java.net.SocketException: Invalid argument)
> progress: [/172.27.15.4 0/0 (100)] [/172.27.15.2 0/1 (0)] [/172.27.15.3 0/0 
> (100)] [total: 0 - 0MB/s (avg: 0MB/s)]
> 
> 
> When I issue bin/nodetool --host  ring, I can see that the 
> ring topology has formed. I am using Cassandra 0.8.6.
> 
> Any suggestion would be appreciated.
> 
> Regards,
> Thamizhannal P
> 
> --- On Fri, 23/9/11, Benoit Perroud  wrote:
> 
> From: Benoit Perroud 
> Subject: Re: Bulk uploader issue on multi-node cluster
> To: user@cassandra.apache.org
> Date: Friday, 23 September, 2011, 9:01 PM
> 
> On the sstableloader config, make sure you have the seed set and rpc_address 
> and rpc_port pointing to your cassandra instance (127.0.0.2) 
> 
> 
> 
> 2011/9/23 Thamizh 
> Hi All,
> 
> I am using bulk-loading to upload data (from lab02) to a multi-node cluster of 3 
> machines (lab02, lab03 & lab04) with a single ethernet card. I have created an 
> SSTable instance on lab02 by duplicating the loopback address (sudo ifconfig 
> lo:2 127.0.0.2 netmask 255.0.0.0 up;) "127.0.0.2" as the rpc and storage 
> address. Here 'sstableloader' ended up with the below error message,
> 
> Starting client (and waiting 30 seconds for gossip) ...
> java.lang.IllegalStateException: Cannot load any sstable, no live member 
> found in the cluster
> 
> Here, in my case, should the lab02 machine have 2 ethernet cards (one for 
> the original cassandra instance and another for 'sstableloader')?
> 
> Regards,
> Thamizhannal
> 



Re: Partitioner per keyspace

2011-09-28 Thread aaron morton
That's the one I was thinking of. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 28/09/2011, at 9:12 PM, Sylvain Lebresne wrote:

> https://issues.apache.org/jira/browse/CASSANDRA-295
> 
> --
> Sylvain
> 



Re: unable to start as a service on Ubuntu server

2011-09-28 Thread Ramesh S
Thanks a lot Shyamal.
That was the solution. It works now :)

regards
Ramesh



On Tue, Sep 27, 2011 at 8:36 PM, Shyamal Prasad wrote:

>
> > "Ramesh" =3D=3D Ramesh S  writes:
>
>Ramesh> Hello all, We installed Cassandra on our development
>Ramesh> server , which is a fresh Ubuntu server running only
>Ramesh> Cassandra.  We followed all the instructions on this
>Ramesh> link and when we want to start the server as a service by
>Ramesh> issuing either of the command.
>
>Ramesh> service cassandra start
>Ramesh> /etc/init.d/cassandra start
>
>Ramesh> it waits for a few seconds, then returns the prompt.
>Ramesh> But the service never starts.  When we try to start with
>Ramesh> cassandra -f , it works fine.
>
> If you installed from the Debian package you probably have a known
> permission problem, and you should see it recorded in
> /var/log/cassandra/output.log when starting from the init script. See
> https://issues.apache.org/jira/browse/CASSANDRA-3198
>
> The fix is to:
>  chown -R cassandra: /var/lib/cassandra
>  chown -R cassandra: /var/log/cassandra
>
> Cheers!
> Shyamal
>
>


Re: Partitioner per keyspace

2011-09-28 Thread Edward Capriolo
On Wed, Sep 28, 2011 at 4:36 AM, aaron morton wrote:

> That's the one I was thinking of. [...]
>
The last time I asked about this I heard it was "really baked in", which led
me to plan on this not happening any time soon. If you really need two
partitioners my advice is to run two clusters. In some cases multi-tenancy,
depending on how you use the word, is possible, but in other cases it is a
pipe dream.

The reason I say this is that as you add more CFs and KSs to a cluster you
lower your ability to optimize for a specific keyspace. You inevitably get
different workloads and they internally start contending for resources. You
may also run into a situation where you need to scale only one CF, but because
of the constraints of another you end up having to get resources/hardware you
do not need.

**depending on your workload; not a hard and fast rule**
For example, say you have two column families and a 10 node cluster:
ColumnFamily A: 10GB data/node, 500 reads/sec
ColumnFamily B: 500GB data/node, 100 reads/sec

Imagine column family A needs to double its read traffic but column family B
does not. With one cluster you end up buying 10 more nodes with 600GB of disk
space each.
With two clusters you could have just extended the capacity of one cluster
without touching the other.

You can get this vibe by listening to some of the talks at CassandraSF
http://twitter.com/#!/slideshare/status/78906858169057280

In particular, Twitter had precomputed a matrix of data size/number of
servers/ops per sec. Rather than have one large cluster that has all your data
but is tuned for none of it, have smaller distinct clusters exactly tuned for
your workload.

I am a bit off topic, but in general if you are considering two partitioners
you almost certainly want 2 distinct clusters. Really, NONE of the operations
(batch_mutate, mget, ...) work across keyspaces anyway, so a design that
spans keyspaces can be unorthodox.


RE: Removal of old data files

2011-09-28 Thread hiroyuki.watanabe
Thank you.  You have been very helpful.

We store only one day's worth of data for now. However, we want to store 5 days' 
worth of data eventually.
That is 5 times more disk space.
That is the main reason for us to look at older SSTables that appear to be 
holding only tombstones.

- yuki



From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, September 27, 2011 8:22 PM
To: user@cassandra.apache.org
Subject: Re: Removal of old data files

Short Answer: Cassandra will actively delete files when it needs to make space. 
Otherwise they will be deleted "some time later". Unless you are getting out of 
disk space errors it's not normally something to worry about.

Longer:
The TTL guarantee is "do not return this data to get requests after this many 
seconds".

Data is "purged" from an SSTable when we run compactions (either minor/auto or 
major/manual). Purging means it will not be written in the new SSTable created 
by the compaction process. The main criteria for purging is that either 
gc_grace_seconds OR ttl have expired on the column.
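
For reference, the TTL is set per column at write time; a minimal sketch in
the 0.8 CLI, with hypothetical names:

set Orders['key1']['col1'] = 'some value' with ttl = 43200;

After 43200 seconds the column is no longer returned to reads, and it becomes
a candidate for purging the next time its SSTables are compacted.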

After a compaction completes, it writes the -Compacted marker for the SSTables 
that were compacted. But there is a bunch of logic associated with which files 
are compacted; it's not "compact the oldest 3 files". Basically it tries to 
compact files which are about the same size.

Remember we *never* modify data on disk. If we want to remove data from an 
SSTable we have to write a new SSTable. It's one of the reasons writes 
are fast http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/

Hope that helps.

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 28/09/2011, at 10:16 AM, hiroyuki.watan...@barclayscapital.com wrote:


Now, we use a TTL of 12 hours and a GC grace period of 8 hours to encourage 
Cassandra to remove old data/files more aggressively.

Cassandra does remove a fair amount of old data files.
Cassandra tends to remove 4 out of every 5 files.
I notice it because each data file has a sequence number as part of its name.

I also noticed that when Cassandra generated *-Compacted files, it generated 4 
files at a time.
They have consecutive numbers as file names, but skip one number from the 
previous group of 4.
The one missing is the file that fails to be removed in the end and stays 
forever.

I looked at the keys in an index file that failed to be removed.  If I query 
any of those keys, Cassandra indicates that there is no data, which is 
correct because these files are older than 24 hours.  All the data must be 
obsolete due to the TTL.

I am wondering why Cassandra does not remove all data files whose time stamps 
are much older than TTL + grace period.

Does anybody have similar experience?


Yuki Watanabe



-Original Message-
From: Watanabe, Hiroyuki: IT (NYK)
Sent: Friday, September 02, 2011 9:01 AM
To: user@cassandra.apache.org
Subject: RE: Removal of old data files


I see. Thank you for the helpful information.

Yuki



-Original Message-
From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Friday, September 02, 2011 3:40 AM
To: user@cassandra.apache.org
Subject: Re: Removal of old data files

On Fri, Sep 2, 2011 at 12:11 AM, hiroyuki.watan...@barclayscapital.com wrote:
Yes, I see files with names like
Orders-g-6517-Compacted

However, all of those files have a size of 0.

From Monday to Thursday we have 5642 files for -Data.db,
-Filter.db and -Statistics.db, and only 128 -Compacted files,
and all of the -Compacted files have a size of 0.

Is this normal, or are we doing something wrong?

You are not doing something wrong. The -Compacted files are just markers, to 
indicate that the corresponding -Data files (with the same number) are, in 
fact, compacted and will eventually be removed. So those files will always 
have a size of 0.

--
Sylvain



yuki


From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Thursday, August 25, 2011 6:13 PM
To: user@cassandra.apache.org
Subject: Re: Removal of old data files

If Cassandra does not have enough disk space to create a new file it
will provoke a JVM GC, which should result in compacted SSTables that
are no longer needed being deleted. Otherwise they are deleted at some
time in the future.
Compacted SSTables have a file written out with a "compacted" extension.
Do you see compacted sstables in the data directory?
Cheers.
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 26/08/2011, at 2:29 AM, yuki watanabe wrote:

We are using Cassandra 0.8.0 with an 8 node ring and only one CF.
Every column has a TTL of 86400 (24 hours). We also set 'GC grace
second' to 43200
(12 hours).  We have to store a massive amount of data for one day now
and eventually for five days.

Re: Can not connect to cassandra 0.7 using CLI

2011-09-28 Thread Julio Julio
thanks a lot for the help! I don't know what exactly was wrong (because I
deleted everything, including Java, and started from the very beginning) but
now everything seems to work perfectly and I'm taking my first steps with
Cassandra.

By the way... 
Do you have any useful advice (or know of any articles) about getting
the "ring" working? I want to build databases under Cassandra's control which
will consist of a few computers working together. (?)

thanks for the help
best regards 





Re: Can not connect to cassandra 0.7 using CLI

2011-09-28 Thread Jake Luciani
http://screenr.com/5G6

On Wed, Sep 28, 2011 at 11:56 AM, Julio Julio wrote:

> thanks a lot for the help! I don't know what exactly was wrong (because I
> deleted everything, including Java, and started from the very beginning)
> but now everything seems to work perfectly and I'm taking my first steps
> with Cassandra.
>
> By the way...
> Do you have any useful advice (or know of any articles) about getting
> the "ring" working? I want to build databases under Cassandra's control
> which will consist of a few computers working together. (?)
>
> thanks for the help
> best regards
>
>
>
>


-- 
http://twitter.com/tjake


Hiring Software Engineers With Cassandra Experience (Boston, MA)

2011-09-28 Thread Chris Herron
Hi everybody,

Apptegic is hiring software engineers and we are especially interested in 
anybody who has developed with and used Cassandra in production.

We are based in Boston and are looking for local people. However, we would love 
to hear from Cassandra experts anywhere in the United States.

Please contact me directly with your questions.

Cheers,

Chris

P.S. Please, no recruiting agencies or consulting offers.


How get is working?

2011-09-28 Thread Julio Julio
Hi everyone!

I'm a new user of Cassandra (I'm using Cassandra 0.8.6) 
and I created a CF with the command:

create column family People with comparator=UTF8Type and
default_validation_class=UTF8Type;

and then I inserted some values like:
set People[wking][name]='ala';
set People[wking][sname]='sala';

set People[jking][name]='aala';
set People[jking][sname]='saala';
 
and now when I add a column to, for example, 'wking' like:
set People[wking][age]=long(55);
it's also added to 'jking'. What am I doing wrong? 
It also surprised me that when I wrote:
get People[akingl]; 
I got some columns back, although I haven't inserted anything with 
a key like this.

Can anyone explain why this is happening?

Best regards
julio




Weird problem with empty CF

2011-09-28 Thread Daning
I have an app polling a few CFs (select first N * from CF). There was 
data in the CFs, but it was later deleted, so the CFs had been empty for a 
long time. I found Cassandra CPU usage was getting as high as 80%; normally 
it uses less than 30%. I issued the select query manually and the response 
felt slow. I tried nodetool compact/repair for those CFs but that did not 
help. Later, I issued 'truncate' for all the CFs and CPU usage dropped 
to 1%.


Can somebody explain to me why I needed to truncate an empty CF? And what 
else could I do to bring the CPU usage down?


I am running 0.8.6.

Thanks,

Daning



Best indexing solution for Cassandra

2011-09-28 Thread Anthony Ikeda
Well, we go live with our project very soon and we are now looking into what
we will be doing for the next phase. One of the enhancements we would like
to consider is an indexing platform to start building searches into our
application.

Right now we are just using column families to index the information
(different views based on what we want to find); however, it is proving to be
quite a task to keep the index views in sync with the data. Although not a
showstopper, it isn't something we want to be handling all the time,
especially since operations like deletions require changes to multiple
column families.

I've heard of Solandra and Lucandra, but I want to understand the experiences
of people who may have used them, or hear other suggestions.

Anthony


Re: Best indexing solution for Cassandra

2011-09-28 Thread Rafael Almeida
From Anthony Ikeda:
> Well, we go live with our project very soon and we are now looking into what 
> we will be doing for the next phase. One of the enhancements we would like to 
> consider is an indexing platform to start building searches into our 
> application. [...]
> I've heard of Solandra and Lucandra, but I want to understand the experiences 
> of people who may have used them, or hear other suggestions.


I've had some experience with that. My main problem was that I had a limited 
vocabulary and a large number of documents. It seems like Solandra kept all my 
documents in the same row for a given term. That means the documents don't get 
spread out throughout the cluster, and search was painfully slow. We ended up 
rolling our own solution and not using Cassandra at all for that purpose 
(although we still use it for storage).



Re: Best indexing solution for Cassandra

2011-09-28 Thread Mohit Anchlia
look at elasticsearch too. It shards differently.

On Wed, Sep 28, 2011 at 1:45 PM, Rafael Almeida  wrote:
> From Anthony Ikeda:
>> Well, we go live with our project very soon and we are now looking into
>> what we will be doing for the next phase. [...]
>> I've heard of Solandra and Lucandra, but I want to understand the
>> experiences of people who may have used them, or hear other suggestions.
>
>
> I've had some experience with that. My main problem was that I had a limited 
> vocabulary and a large number of documents. It seems like Solandra kept all 
> my documents in the same row for a given term. That means the documents don't 
> get spread out throughout the cluster, and search was painfully slow. We 
> ended up rolling our own solution and not using Cassandra at all for that 
> purpose (although we still use it for storage).
>
>


Re: Removal of old data files

2011-09-28 Thread aaron morton
For background:

Minor compaction will bucket files (see 
https://github.com/apache/cassandra/blob/cassandra-0.8.6/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L989)
and then compact a bucket if it contains more than the min_compaction_threshold 
set per CF.

It will then purge tombstones if a row is only contained in the SSTables 
involved in the compaction (see 
https://github.com/apache/cassandra/blob/cassandra-0.8.6/src/java/org/apache/cassandra/db/compaction/CompactionController.java#L84)
 

So there are a couple of approaches you can take if you want to ensure all 
TTL'd data is purged ASAP. Note that tombstones and TTL'd data will be 
automatically purged at some point, but if you want more precise control you 
may need to take a few steps.

First, if all the data you are storing has a 24 hour TTL, you can include a 
manual major compaction via nodetool in your maintenance routine. We normally 
advise against it because it tries to create one large file, but if all your 
data is going to be removed it's probably OK.
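
A minimal sketch of that nodetool invocation (keyspace and CF names are
hypothetical):

bin/nodetool -h 127.0.0.1 compact MyKeyspace Orders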

Second, play around with the minor compaction settings to increase the chances 
that data is purged soon after it expires.

Third, monkey up a process to kick off user-defined compaction runs for 
SSTables that are over 24 hours old.
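
As a rough sketch, something like the following could select the candidates
(the data path is hypothetical); each file would then be passed to the
forceUserDefinedCompaction operation on the
org.apache.cassandra.db:type=CompactionManager MBean, using whatever JMX
client you have at hand:

# list -Data.db files last modified more than 24 hours ago
find /var/lib/cassandra/data/MyKeyspace -name '*-Data.db' -mmin +1440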

I know disk space can be an issue, but if you have the spare capacity you can 
just let Cassandra manage things. Also, 1.0 has some major changes in this area.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 29/09/2011, at 4:39 AM, hiroyuki.watan...@barclayscapital.com wrote:

> Thank you.  You have been very helpful.
>  
> We store only one day's worth of data for now. However, we want to store 5 
> days' worth of data eventually.
> That is 5 times more disk space.
> That is the main reason for us to look at older SSTables that appear to be 
> holding only tombstones.
>  
> - yuki
>  
> 
> From: aaron morton [mailto:aa...@thelastpickle.com] 
> Sent: Tuesday, September 27, 2011 8:22 PM
> To: user@cassandra.apache.org
> Subject: Re: Removal of old data files
> 
> Short Answer: Cassandra will actively delete files when it needs to make 
> space. Otherwise they will be deleted "some time later". [...]

Re: Weird problem with empty CF

2011-09-28 Thread Jonathan Ellis
Sounds like you have non-expired tombstones.

http://wiki.apache.org/cassandra/DistributedDeletes

On Wed, Sep 28, 2011 at 12:35 PM, Daning  wrote:
> I have an app polling a few CFs (select first N * from CF). There was data
> in the CFs, but it was later deleted, so the CFs had been empty for a long
> time. I found Cassandra CPU usage was getting as high as 80% [...]
>
> Can somebody explain to me why I needed to truncate an empty CF? And what
> else could I do to bring the CPU usage down?
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: How get is working?

2011-09-28 Thread aaron morton
You should have gotten an error from the CLI saying it cannot parse hex bytes, 
because it would not know that the key you are passing is a utf8 / ascii string 
to be converted into bytes. 

It's throwing on the 0.8 head, but not in the release. I will try to work out 
why; anyway, it should do this:

[default@abc] set People[wking][name]='ala';  
org.apache.cassandra.db.marshal.MarshalException: cannot parse 'wking' as hex 
bytes

If you add a key validation class the CLI will know what to do, try this…

create column family People2
with comparator=UTF8Type
and key_validation_class = UTF8Type 
and default_validation_class=UTF8Type;

[default@abc] list People2;
Using default limit of 100
---
RowKey: jking
=> (column=name, value=aala, timestamp=1317255173285000)
=> (column=sname, value=saala, timestamp=1317255173854000)
---
RowKey: wking
=> (column=name, value=ala, timestamp=1317255173281000)
=> (column=sname, value=sala, timestamp=1317255173283000)

2 Rows Returned.

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 29/09/2011, at 6:12 AM, Julio Julio wrote:

> Hi everyone!
> 
> I'm a new user of Cassandra (I'm using Cassandra 0.8.6) and I created a CF
> with the command:
> 
> create column family People with comparator=UTF8Type and
> default_validation_class=UTF8Type;
> [...]
> Can anyone explain why this is happening?
> 
> Best regards
> julio
> 
> 



Re: Weird problem with empty CF

2011-09-28 Thread aaron morton
If I had to guess, I would say it was spending time handling tombstones. If you 
see it happen again, and are interested, turn the logging up to DEBUG and look 
for messages from something starting with "Slice".
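
In 0.8 that is a one-line change in conf/log4j-server.properties (then
restart, or set the level via JMX if you prefer):

log4j.rootLogger=DEBUG,stdout,R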

Minor (automatic) compaction will, over time, purge the tombstones. Until then, 
reads must read and discard the data deleted by the tombstones. If you perform 
a big (i.e. 100k's) delete, this can reduce performance until compaction does 
its thing.

My second guess would be read repair (or the simple consistency checks on read) 
kicking in. That would show up in the "ReadRepairStage" in nodetool tpstats.

It may have been neither of those two things, just guesses. If you have more 
issues, let us know and provide some more info.

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 29/09/2011, at 6:35 AM, Daning wrote:

> I have an app polling a few CFs (select first N * from CF). There was data
> in the CFs, but it was later deleted, so the CFs had been empty for a long
> time. I found Cassandra CPU usage was getting as high as 80% [...]
>
> Can somebody explain to me why I needed to truncate an empty CF? And what
> else could I do to bring the CPU usage down?
>



nodetools cfstats question

2011-09-28 Thread Sanjeev Kulkarni
Hey guys,
I'm using a three node cluster running 0.8.6 with an RF of 3. It's a freshly
installed cluster with no upgrade history.
I have 6 CFs and only one of them is written to. That CF has around one
thousand keys. A quick key_range_scan verifies this.
However when I do cfstats, I see the following for this cf.

Number of Keys (estimate): 5248
Key cache capacity: 20
Key cache size: 99329

What is the definition of these three output values? Both the Number of Keys
and Key Cache size are way over what they should be.
Thanks!


Re: GC for ParNew on 0.8.6

2011-09-28 Thread Philippe
No, it was an upgrade from 0.8.4 or 0.8.5 depending on the node.
No cassandra-env files were changed during the update.
Any other ideas? The cluster has just been weird ever since running 0.8.6:
has anyone else upgraded and not run into this?
On 28 Sept 2011 09:32, "Peter Schuller"  wrote:
>> I have changed absolutely nothing and the workload is perhaps even lower
>> than before upgrading because I have paused most updates to the cluster. Did
>> a log level change or does this have deeper meaning?
>
> You don't say which version you upgraded from but I suspect 0.7? JVM
> options like heap size calculations have changed. I am guessing you're
> seeing it in the logs because the young generation was sized
> differently, causing young generation garbage collections to be longer
> than before and thus triggering the logging by the Cassandra
> GCInspector.
>
> --
> / Peter Schuller (@scode on twitter)