Re: Some commit logs not deleting

2011-07-25 Thread Sylvain Lebresne
This is likely due to
https://issues.apache.org/jira/browse/CASSANDRA-2829. Basically, if
you have a column family to which you suddenly stop writing forever
(which will be the case if you drop a CF), one commit log could get
retained forever ("forever" meaning "until the next restart of the node" in
that context). The patch is targeted for 0.8.3 and I see no good
reason why it wouldn't make it in, but I don't think we'll push it in
the 0.7 series.

--
Sylvain

On Mon, Jul 25, 2011 at 4:48 AM, Chad Johnson  wrote:
> Hello,
>
> We are running Cassandra 0.7.5 on a 15 node cluster, RF=3. We are having a
> problem where some commit logs do not get deleted. Our write load generates
> a new commit log about every two to three minutes. On average, one commit
> log an hour is not deleted. Without draining, deleting the remaining commit
> log files and restarting each node in the cluster, the commit log partition
> will fill up. We do one thing with our cluster that is probably not very
> common. We make schema changes four times per day. We cycle column families
> by dropping old column families for old data we don't care about any longer
> and creating new ones for new data.
>
> Is anybody else seeing this problem? I assume that the dirty bit for those
> commit logs is still set, but why? How can I determine which CF memtable is
> still dirty?
>
> Please let me know if there is additional information I can provide and
> thanks for your help.
>
> Chad
>


cassandra server disk full

2011-07-25 Thread Donna Li

All:
Could anyone help me?


Best Regards
Donna li

-----Original Message-----
From: Donna Li [mailto:donna...@utstar.com] 
Sent: July 22, 2011 11:23
To: user@cassandra.apache.org
Subject: cassandra server disk full


All:

Is there an easy way to fix the bug by changing the server's code?


Best Regards
Donna li

-----Original Message-----
From: Donna Li [mailto:donna...@utstar.com] 
Sent: July 8, 2011 11:29
To: user@cassandra.apache.org
Subject: cassandra server disk full


Has CASSANDRA-809 been resolved, or is there any other patch that can resolve the problem? Is
there any way to avoid rebooting the cassandra server?
Thanks!

Best Regards
Donna li

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: July 8, 2011 11:03
To: user@cassandra.apache.org
Subject: Re: cassandra server disk full

Yeah, ideally it should probably die or drop into read-only mode if it
runs out of space.
(https://issues.apache.org/jira/browse/CASSANDRA-809)

Unfortunately dealing with disk-full conditions tends to be a low
priority for many people because it's relatively easy to avoid with
decent monitoring, but if it's critical for you, we'd welcome the
assistance.

On Thu, Jul 7, 2011 at 8:34 PM, Donna Li  wrote:
> All:
>
> When one of the cassandra servers' disks is full, the cluster cannot work
> normally, even after I free up space. I must reboot the server whose disk is full before the
> cluster can work normally again.
>
>
>
> Best Regards
>
> Donna li



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: host clocks

2011-07-25 Thread Paul Loy
And I guess there's the next problem, if you have more than one client...

On Mon, Jul 25, 2011 at 5:25 AM, Edward Capriolo wrote:

> You should always sync your host clocks. Clients provide timestamp but for
> the server gc_grace and ttl columns can have issues if server clocks are not
> correct.
>
> On Sunday, July 24, 2011, 魏金仙  wrote:
> > hi all,
> >I'm launching a cassandra cluster with 30 nodes. I wonder whether the
> inconsistency of host clocks will influence the performance of cluster.
> >   Thanks!
> >
> >
>



-- 
-
Paul Loy
p...@keteracel.com
http://uk.linkedin.com/in/paulloy


Re: Cassandra 0.7.8 and 0.8.1 fail when major compaction on 37GB database

2011-07-25 Thread lebron james
>There are many things you can do to lower caches,optimize memtables, and
tune jvms.

Please tell me what things I can do to lower caches, optimize memtables, and
tune JVMs?

>From experience with similar-sized data sets, 1.5GB may be too little.
>Recently I bumped our java HEAP limit from 3GB to 4GB to get past an OOM
>doing a major compaction.

In the future I will need a database of more than 10TB, so I need to solve the RAM problem,
because I need to use no more than 4 GB of RAM for a 5TB database.


Re: host clocks

2011-07-25 Thread Sooraj S
Maybe you can sync the servers with "pool.ntp.org" by running ntpd as a
service.

On 25 July 2011 14:34, Paul Loy  wrote:

> And I guess there's the next problem, if you have more than one client...
>
>
> On Mon, Jul 25, 2011 at 5:25 AM, Edward Capriolo wrote:
>
>> You should always sync your host clocks. Clients provide timestamp but for
>> the server gc_grace and ttl columns can have issues if server clocks are not
>> correct.
>>
>> On Sunday, July 24, 2011, 魏金仙  wrote:
>> > hi all,
>> >I'm launching a cassandra cluster with 30 nodes. I wonder whether the
>> inconsistency of host clocks will influence the performance of cluster.
>> >   Thanks!
>> >
>> >
>>
>
>
>
> --
> -
> Paul Loy
> p...@keteracel.com
> http://uk.linkedin.com/in/paulloy
>



-- 
Thanks & Regards
 SOORAJ S


http://SparkSupport.com | http://migrate2cloud.com
Thanks and Regards,
Sooraj S
Software Engineer



Re: host clocks

2011-07-25 Thread Paul Loy
As I understand it, this will not guarantee that they are millisecond-accurate,
which is what you need for Cassandra to use the correct commit.
We've seen problems in production and had to rearchitect parts of the system
due to this even though all servers are NTP synched.

http://www.endruntechnologies.com/faq.htm#How_accurate_is

On Mon, Jul 25, 2011 at 1:38 PM, Sooraj S  wrote:

> may be you can sync the servers with "pool.ntp.org" with running ntp as a
> service.
>
>
> On 25 July 2011 14:34, Paul Loy  wrote:
>
>> And I guess there's the next problem, if you have more than one client...
>>
>>
>> On Mon, Jul 25, 2011 at 5:25 AM, Edward Capriolo 
>> wrote:
>>
>>> You should always sync your host clocks. Clients provide timestamp but
>>> for the server gc_grace and ttl columns can have issues if server clocks are
>>> not correct.
>>>
>>> On Sunday, July 24, 2011, 魏金仙  wrote:
>>> > hi all,
>>> >I'm launching a cassandra cluster with 30 nodes. I wonder whether
>>> the inconsistency of host clocks will influence the performance of cluster.
>>> >   Thanks!
>>> >
>>> >
>>>
>>
>>
>>
>> --
>> -
>> Paul Loy
>> p...@keteracel.com
>> http://uk.linkedin.com/in/paulloy
>>
>
>
>
> --
> Thanks & Regards
>  SOORAJ S
>
>
> http://SparkSupport.com | http://migrate2cloud.com
> Thanks and Regards,
> Sooraj S
> Software Engineer
> 
>



-- 
-
Paul Loy
p...@keteracel.com
http://uk.linkedin.com/in/paulloy


Re: do I need to add more nodes? minor compaction eat all IO

2011-07-25 Thread Yan Chunlu
I am using normal SATA disks; actually, I was worrying about whether it
is okay that Cassandra uses all the IO resources every time.
Furthermore, when is a good time to add more nodes, given that I am just
using normal SATA disks and with only 100 r/s they can reach 100 %util?

How large should the data size be on each node?


Below is my iostat -x 2 output when doing node repair. I have to repair each
column family separately, otherwise the load gets even crazier:

Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda   1.50 1.50  121.50   14.00 3.68 0.30
60.19   116.98 1569.46   59.49 14673.86   7.38 100.00






On Sun, Jul 24, 2011 at 8:04 AM, Jonathan Ellis  wrote:
> On Sat, Jul 23, 2011 at 4:16 PM, Francois Richard  wrote:
>> My understanding is that during compaction cassandra does a lot of non-sequential
>> reads and then dumps the results with a big sequential write.
>
> Compaction reads and writes are both sequential, and 0.8 allows
> setting a MB/s to cap compaction at.
>
> As to the original question "do I need to add more machines" I'd say
> that depends more on whether your application's SLA is met, than what
> % io util spikes to.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: host clocks

2011-07-25 Thread zGreenfelder
On Mon, Jul 25, 2011 at 8:51 AM, Paul Loy  wrote:

> As I understand it, this will not quarantee that they are millisecond
> accurate which is what you need for Cassandra to use the correct commit.
> We've seen problems in production and had to rearchitect parts of the system
> due to this even though all servers are NTP synched.
>
> http://www.endruntechnologies.com/faq.htm#How_accurate_is
>
>
>
Perhaps you'd do well to do some more intensive NTP configuration.
Something like:

outside server(s)  <--- 2 local NTP machines (time[01].mydomain)  <--- X
number of cassandra servers over the LAN

Assuming your LAN is speedy enough, you should be able to get to the .5 ms
range of sameness (which might still be 100 ms off 'real' time, but I don't
think you'd really care about that).

it might even be worth creating your own NTP LAN that connects all the
machines on an interface separate from Data & Management.
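
For what it's worth, a minimal /etc/ntp.conf on each cassandra node for that
layout might look roughly like this (time01/time02.mydomain being the
hypothetical local NTP machines from the diagram above, not real hosts):

    # /etc/ntp.conf on a cassandra node -- sketch only
    server time01.mydomain iburst
    server time02.mydomain iburst
    driftfile /var/lib/ntp/ntp.drift

and time01/time02 would in turn point their server lines at the outside
servers (e.g. pool.ntp.org).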


-- 
Even the Magic 8 ball has an opinion on email clients: Outlook not so good.


Re: host clocks

2011-07-25 Thread Paul Loy
we don't have those guarantees on EC2. Networks can fluctuate wildly.

On Mon, Jul 25, 2011 at 2:00 PM, zGreenfelder wrote:

>
>
> On Mon, Jul 25, 2011 at 8:51 AM, Paul Loy  wrote:
>
>> As I understand it, this will not quarantee that they are millisecond
>> accurate which is what you need for Cassandra to use the correct commit.
>> We've seen problems in production and had to rearchitect parts of the system
>> due to this even though all servers are NTP synched.
>>
>> http://www.endruntechnologies.com/faq.htm#How_accurate_is
>>
>>
>>
> Perhaps you'd do well to do some more intensive NTP configuration.
> Something like:
>
> outside server(s)  <--- 2 local NTP machines (time[01].mydomain) < X
> number of cassandra servers over lan
>
> assuming your lan is speedy enough, you should be able to get to the .5 MS
> range of sameness (which might still be 100MS off 'real' time, but I don't
> think you'd really care about that).
>
> it might even be worth creating your own NTP LAN that connects all the
> machines on an interface separate from Data & Management.
>
>
> --
> Even the Magic 8 ball has an opinion on email clients: Outlook not so good.
>



-- 
-
Paul Loy
p...@keteracel.com
http://uk.linkedin.com/in/paulloy


Re: host clocks

2011-07-25 Thread Peter Schuller
> As I understand it, this will not quarantee that they are millisecond 
> accurate which is what you need for Cassandra to use the correct commit. 
> We've seen problems in production and had to rearchitect parts of the system 
> due to this even though all servers are NTP synched.

It does not matter if all clocks are infinitely well in synch. Without
actual co-ordination you cannot rely on clocks to resolve conflicting
writes correctly if there is an ordering requirement. Even with
perfect clocks, all sorts of other effects can affect the "timeline"
as observed by an outside observer. For example, the "first" client
may get context switched away by the OS just before grabbing the
current time, allowing the "second" client to obtain an earlier
timestamp.

("First" and "second" refer to some arbitrary ordering with respect to
something else; whatever it is that you are looking to achieve
ordering with respect to.)

Clocks should be synchronized, yes. But either your data model is such
that conflicting writes are okay, or you need external co-ordination.
There is no third option of just hoping for the best by keeping clocks better in sync.
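
To make that concrete, here is a toy sketch (plain Python, not Cassandra
code) of timestamp-based reconciliation: the pair with the larger
client-supplied timestamp wins, no matter in which order the writes arrive.

    # Toy model of last-write-wins reconciliation on a single column.
    def reconcile(existing, incoming):
        """Keep whichever (timestamp, value) pair has the larger timestamp."""
        if existing is None or incoming[0] > existing[0]:
            return incoming
        return existing

    # Intended order: A writes, then B overwrites. But A is context-switched
    # away just before reading its clock, so B ends up with the *earlier*
    # timestamp even though its write is logically "second".
    write_b = (100, "B: meant to be the final value")
    write_a = (101, "A: logically first, but got the later timestamp")

    store = None
    store = reconcile(store, write_b)
    store = reconcile(store, write_a)
    print(store)  # A wins on timestamp, so B's update is silently discarded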

--
/ Peter Schuller (@scode on twitter)


Re: do I need to add more nodes? minor compaction eat all IO

2011-07-25 Thread Yan Chunlu
as the wiki suggested:
http://wiki.apache.org/cassandra/LargeDataSetConsiderations
Adding nodes is a slow process if each node is responsible for a large
amount of data. Plan for this; do not try to throw additional hardware
at a cluster at the last minute.


I really would like to know the status of my cluster, and whether it is normal.


On Mon, Jul 25, 2011 at 8:59 PM, Yan Chunlu  wrote:
> I am using normal SATA disk,  actually I was worrying about whether it
> is okay if every time cassandra using all the io resources?
> further more when is the good time to add more nodes when I was just
> using normal SATA disk and with 100r/s it could reach 100 %util
>
> how large the data size it should be on each node?
>
>
> below is my iostat -x 2 when doing node repair, I have to repair
> column family separately otherwise the load will be more crazy:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda               1.50     1.50  121.50   14.00     3.68     0.30
> 60.19   116.98 1569.46   59.49 14673.86   7.38 100.00
>
>
>
>
>
>
> On Sun, Jul 24, 2011 at 8:04 AM, Jonathan Ellis  wrote:
>> On Sat, Jul 23, 2011 at 4:16 PM, Francois Richard  wrote:
>>> My understanding is that during compaction cassandra does a lot of non 
>>> sequential readsa then dumps the results with a big sequential write.
>>
>> Compaction reads and writes are both sequential, and 0.8 allows
>> setting a MB/s to cap compaction at.
>>
>> As to the original question "do I need to add more machines" I'd say
>> that depends more on whether your application's SLA is met, than what
>> % io util spikes to.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>


Re: After column deletion cassandra won't insert more data to a specific key

2011-07-25 Thread Guillermo Winkler
Hi, thanks both for the answers.

The problem was indeed with the timestamps.

What was happening also was that in a mutation involving 1 deletion and
various insertions for the same key, all were using the same timestamp, so
beside looking at the code doing this

remove key
insert key, col, val
insert key, col, val
insert key, col, val

With quorum 1, the insertions were always missing.
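
For anyone hitting the same thing, here is a rough sketch of the timestamp
pattern that avoids the tie (Python pseudo-code; client.remove/client.insert
are hypothetical stand-ins for whatever Thrift wrapper you use, not a real
API):

    import time

    def usecs():
        """Current time in microseconds, matching the cli's precision."""
        return int(time.time() * 1000000)

    def rewrite_row(client, key, columns):
        """Delete a row and re-insert columns, making sure the delete
        cannot tie with (and therefore shadow) the new columns."""
        t = usecs()
        client.remove(key, timestamp=t)              # hypothetical call
        for name, value in columns.items():
            # strictly greater timestamp, so the inserts win over the delete
            client.insert(key, name, value, timestamp=t + 1)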

I've been reading past threads regarding time sync inside/outside cassandra,
I guess this ain't changing in the near future?

Best,
Guille


On Sun, Jul 24, 2011 at 1:07 PM, Edward Capriolo wrote:

> Remember the cli uses microsecond precision, so if your app is not using
> the same precision, weird things will result, with the client writing the biggest
> timestamp winning the final value.
>
>
> On Saturday, July 23, 2011, Jonathan Ellis  wrote:
> > You must have given it a delete timestamp in the "future."
> >
> > On Sat, Jul 23, 2011 at 3:46 PM, Guillermo Winkler
> >  wrote:
> >> I'm having a strange behavior on one of my cassandra boxes, after all
> >> columns are removed from a row, insertion on that key stops working
> (from
> >> API and from the cli)
> >> [default@Agent] get Schedulers['atendimento'];
> >> Returned 0 results.
> >> [default@Agent] set Schedulers['atendimento']['test'] = 'dd';
> >> Value inserted.
> >> [default@Agent] get Schedulers['atendimento'];
> >> Returned 0 results.
> >> Already tried nodetool flush/compact/repair on the CF, doesn't fix the
> >> problem.
> >> With a very simple setup:
> >> * only one node in the cluster (the cluster never had more nodes nor
> >> replicas)
> >> * random partitioner
> >> * CF defined as "create column family Schedulers with
> comparator=BytesType;"
> >> The only way for it to start working again is to truncate the CF.
> >> Do you have any clues how to diagnose what's going on?
> >> Thanks,
> >> Guille
> >>
> >>
> >
> >
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of DataStax, the source for professional Cassandra support
> > http://www.datastax.com
> >
>




Re: cassandra server disk full

2011-07-25 Thread Ryan King
We have a patch somewhere that will kill the node on IOErrors, since
those tend to be of the class that are unrecoverable.

-ryan

On Thu, Jul 7, 2011 at 8:02 PM, Jonathan Ellis  wrote:
> Yeah, ideally it should probably die or drop into read-only mode if it
> runs out of space.
> (https://issues.apache.org/jira/browse/CASSANDRA-809)
>
> Unfortunately dealing with disk-full conditions tends to be a low
> priority for many people because it's relatively easy to avoid with
> decent monitoring, but if it's critical for you, we'd welcome the
> assistance.
>
> On Thu, Jul 7, 2011 at 8:34 PM, Donna Li  wrote:
>> All:
>>
>> When one of the cassandra servers disk full, the cluster can not work
>> normally, even I make space. I must reboot the server that disk full, the
>> cluster can work normally.
>>
>>
>>
>> Best Regards
>>
>> Donna li
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: Counter consistency - are counters idempotent?

2011-07-25 Thread Aaron Turner
On Sun, Jul 24, 2011 at 3:36 PM, aaron morton  wrote:
> What's your use case ? There are people out there having good times with 
> counters, see
>
> http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
> http://www.scribd.com/doc/59830692/Cassandra-at-Twitter

It's actually pretty similar to Twitter's click counting, but
apparently we have different requirements for accuracy.  It's possible
Rainbird does something on the front end to solve for this issue- I'm
honestly not sure since they haven't released the code yet.

Anyways, when you're building network aggregate graphs and fail to add
the +100G of traffic from one switch to your site or metro aggregate,
people around here notice.  And people quickly start distrusting
graphs which don't look "real" and either ignore them completely or
complain.

Obviously, one should manage their Cassandra cluster to limit the
occurrence of Timeouts, but frankly I don't want to be paged at 2am to
"fix" these kind of problems.  If I knew "timeout" meant "failed to
increment counter", I could spool my changes on the client and try
again later, but that's not what timeout means.  Without any means to
recover I've actually lost a lot of reliability that I currently have
with my single PostgreSQL server backed data store.

Right now I'm trying to come up with a way that my distributed snmp
pollers can build aggregates efficiently without counters, but that's
going to add a lot of complexity. :(

-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"


Re: cassandra server disk full

2011-07-25 Thread Ryan King
Actually I was wrong: our patch will disable gossip and thrift but
leave the process running:

https://issues.apache.org/jira/browse/CASSANDRA-2118

If people are interested in that I can make sure it's up to date with
our latest version.

-ryan

On Mon, Jul 25, 2011 at 10:07 AM, Ryan King  wrote:
> We have a patch somewhere that will kill the node on IOErrors, since
> those tend to be of the class that are unrecoverable.
>
> -ryan
>
> On Thu, Jul 7, 2011 at 8:02 PM, Jonathan Ellis  wrote:
>> Yeah, ideally it should probably die or drop into read-only mode if it
>> runs out of space.
>> (https://issues.apache.org/jira/browse/CASSANDRA-809)
>>
>> Unfortunately dealing with disk-full conditions tends to be a low
>> priority for many people because it's relatively easy to avoid with
>> decent monitoring, but if it's critical for you, we'd welcome the
>> assistance.
>>
>> On Thu, Jul 7, 2011 at 8:34 PM, Donna Li  wrote:
>>> All:
>>>
>>> When one of the cassandra servers disk full, the cluster can not work
>>> normally, even I make space. I must reboot the server that disk full, the
>>> cluster can work normally.
>>>
>>>
>>>
>>> Best Regards
>>>
>>> Donna li
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>


Re: Counter consistency - are counters idempotent?

2011-07-25 Thread Sylvain Lebresne
On Mon, Jul 25, 2011 at 7:35 PM, Aaron Turner  wrote:
> On Sun, Jul 24, 2011 at 3:36 PM, aaron morton  wrote:
>> What's your use case ? There are people out there having good times with 
>> counters, see
>>
>> http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
>> http://www.scribd.com/doc/59830692/Cassandra-at-Twitter
>
> It's actually pretty similar to Twitter's click counting, but
> apparently we have different requirements for accuracy.  It's possible
> Rainbird does something on the front end to solve for this issue- I'm
> honestly not sure since they haven't released the code yet.
>
> Anyways, when you're building network aggregate graphs and fail to add
> the +100G of traffic from one switch to your site or metro aggregate,
> people around here notice.  And people quickly start distrusting
> graphs which don't look "real" and either ignore them completely or
> complain.
>
> Obviously, one should manage their Cassandra cluster to limit the
> occurrence of Timeouts, but frankly I don't want to be paged at 2am to
> "fix" these kind of problems.  If I knew "timeout" meant "failed to
> increment counter", I could spool my changes on the client and try
> again later, but that's not what timeout means.  Without any means to
> recover I've actually lost a lot of reliability that I currently have
> with my single PostgreSQL server backed data store.

Just to make it very clear: *nobody* is arguing this is not a limitation.

The thing is, some people find counters useful even while perfectly aware of
that limitation and seem to be very productive with them, so we have
added them. Truth is, if you can live with the limitations and keep
timeouts to a bare minimum (hopefully 0), then you won't find many
systems that are able to scale counting both in terms of number of
counters and number of ops/s on each counter, and that across
datacenters, like Cassandra counters do. And let's recall that
while you don't know what happened on a timeout, you at least know
when those happen, so you can compute the error margin.
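
To sketch what I mean by computing the error margin (plain Python, with a
hypothetical client.add() standing in for your counter increment call):

    class CounterWriter:
        """Track how much of the intended count is 'uncertain' because an
        increment timed out (it may or may not have been applied)."""
        def __init__(self, client):
            self.client = client
            self.applied = 0      # increments we know succeeded
            self.uncertain = 0    # increments that timed out

        def incr(self, key, column, delta):
            try:
                self.client.add(key, column, delta)   # hypothetical call
                self.applied += delta
            except TimeoutError:  # stand-in for a TimedOutException
                # We don't know whether this landed, but we know its size, so
                # the true count is within [applied, applied + uncertain].
                self.uncertain += delta

        def error_margin(self):
            return self.uncertain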

Again, this does not mean we don't want to fix the limitations, nor
that we want you to wake up at 2am, and there is actually a ticket
open for that:
https://issues.apache.org/jira/browse/CASSANDRA-2495
The problem is, so far, we haven't found any satisfying solution to
that problem. If someone has a solution, please, please, share!

But yes, counters in their current state don't fit everyone's needs
and we certainly don't want to hide that.

--
Sylvain


Re: Counter consistency - are counters idempotent?

2011-07-25 Thread Aaron Turner
On Mon, Jul 25, 2011 at 11:24 AM, Sylvain Lebresne  wrote:
> On Mon, Jul 25, 2011 at 7:35 PM, Aaron Turner  wrote:
>> On Sun, Jul 24, 2011 at 3:36 PM, aaron morton  
>> wrote:
>>> What's your use case ? There are people out there having good times with 
>>> counters, see
>>>
>>> http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
>>> http://www.scribd.com/doc/59830692/Cassandra-at-Twitter
>>
>> It's actually pretty similar to Twitter's click counting, but
>> apparently we have different requirements for accuracy.  It's possible
>> Rainbird does something on the front end to solve for this issue- I'm
>> honestly not sure since they haven't released the code yet.
>>
>> Anyways, when you're building network aggregate graphs and fail to add
>> the +100G of traffic from one switch to your site or metro aggregate,
>> people around here notice.  And people quickly start distrusting
>> graphs which don't look "real" and either ignore them completely or
>> complain.
>>
>> Obviously, one should manage their Cassandra cluster to limit the
>> occurrence of Timeouts, but frankly I don't want to be paged at 2am to
>> "fix" these kind of problems.  If I knew "timeout" meant "failed to
>> increment counter", I could spool my changes on the client and try
>> again later, but that's not what timeout means.  Without any means to
>> recover I've actually lost a lot of reliability that I currently have
>> with my single PostgreSQL server backed data store.
>
> Just to make it very clear: *nobody* is arguing this is not a limitation.
>
> The thing is some find counters useful even while perfectly aware of
> that limitation and seems to be very productive with it, so we have
> added them. Truth is, if you can live with the limitations and manage
> the timeout to a bare minimum (hopefully 0), then you won't find much
> system that are able to scale counting both in term of number of
> counters and number of ops/s on each counter, and that across
> datacenters, like Cassandra counters does. And let's recall that
> while you don't know what happened on a timeout, you at least know
> when those happens, so you can compute the error margin.
>
> Again, this does not mean we don't want to fix the limitations, nor
> that we want you to wake up at 2am, and there is actually a ticket
> open for that:
> https://issues.apache.org/jira/browse/CASSANDRA-2495
> The problem is, so far, we haven't found any satisfying solution to
> that problem. If someone has a solution, please, please, share!
>
> But yes, counters in their current state don't fit everyone needs
> and we certainly don't want to hide it.

I think the Cassandra community has been pretty open about the
limitations and I can see there are some uses for them in their
current state.  Probably my biggest concern is that I'm pretty new to
Cassandra and don't understand why occasionally I see timeouts even
under very low load (one single threaded client).  Once I understood
the impacts wrt counters it went from "annoying" to "oh crap".

Anyways, as I said earlier, I understand this problem is "hard" and I
don't expect a fix in 0.8.2 :)

Mostly right now I'm just bummed because I'm pretty much back at
square one trying to create a scalable solution which meets our needs.
  Not to say Cassandra won't be a part of it, but just that the
solution design has become a lot less obvious.


-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"


sstableloader throws storage_port error

2011-07-25 Thread John Conwell
I'm trying to figure out how to use the sstableloader tool.  For my test I
have a single node cassandra instance running on my local machine.  I have
cassandra running, and validate this by connecting to it with cassandra-cli.

I run sstableloader using the following command:

bin/sstableloader /Users/someuser/cassandra/mykeyspace

and I get the following error:

org.apache.cassandra.config.ConfigurationException:
localhost/127.0.0.1:7000 is in use by another process.  Change
listen_address:storage_port in
cassandra.yaml to values that do not conflict with other services

I've played around with different ports, but nothing works.  Is it because
I'm trying to run sstableloader on the same machine that cassandra is
running on?  It would be odd I think, but I can't think of another reason I
would get that error.

Thanks,
John


Re: cassandra server disk full

2011-07-25 Thread aaron morton
If the commit log or data disk is full it's not possible for the server to 
process any writes; the best it could do is serve reads. But reads may result 
in a write due to read repair, and the server will also need to do some app logging, so 
IMHO it's really down / dead. 

You should free space and restart the cassandra service. Restarting a cassandra 
service should be something your installation can handle. 

Is there something else I'm missing here ? 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 25 Jul 2011, at 20:06, Donna Li wrote:

> 
> All:
>   Could anyone help me?
> 
> 
> Best Regards
> Donna li
> 
> -----Original Message-----
> From: Donna Li [mailto:donna...@utstar.com] 
> Sent: July 22, 2011 11:23
> To: user@cassandra.apache.org
> Subject: cassandra server disk full
> 
> 
> All:
> 
> Is there an easy way to fix the bug by change server's code?
> 
> 
> Best Regards
> Donna li
> 
> -----Original Message-----
> From: Donna Li [mailto:donna...@utstar.com] 
> Sent: July 8, 2011 11:29
> To: user@cassandra.apache.org
> Subject: cassandra server disk full
> 
> 
> Does CASSANDRA-809 resolved or any other path can resolve the problem? Is 
> there any way to avoid reboot the cassandra server?
> Thanks!
> 
> Best Regards
> Donna li
> 
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbel...@gmail.com] 
> Sent: July 8, 2011 11:03
> To: user@cassandra.apache.org
> Subject: Re: cassandra server disk full
> 
> Yeah, ideally it should probably die or drop into read-only mode if it
> runs out of space.
> (https://issues.apache.org/jira/browse/CASSANDRA-809)
> 
> Unfortunately dealing with disk-full conditions tends to be a low
> priority for many people because it's relatively easy to avoid with
> decent monitoring, but if it's critical for you, we'd welcome the
> assistance.
> 
> On Thu, Jul 7, 2011 at 8:34 PM, Donna Li  wrote:
>> All:
>> 
>> When one of the cassandra servers disk full, the cluster can not work
>> normally, even I make space. I must reboot the server that disk full, the
>> cluster can work normally.
>> 
>> 
>> 
>> Best Regards
>> 
>> Donna li
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com



Re: Cassandra 0.7.8 and 0.8.1 fail when major compaction on 37GB database

2011-07-25 Thread aaron morton
How much memory you need depends on a few things such as how many CF's you 
have, what your data is like, and what the usage patterns are like. There is no 
exact formula. 

Generally…
* i would say 4GB of JVM heap is a good start
* key and row caches are set when the CF is created, see "help create column 
family" in cassandra-cli
* CF sizes are set when running create column family, BUT under 0.8 there 
is new automagical memory management, see 
http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/
* see the setting in_memory_compaction_limit_in_mb in cassandra.yaml for info on 
reducing the memory requirements for compaction. 

The simplest thing you can do is increase the JVM heap size to 3 or 4 GB. 
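
If it helps, the place to set that in 0.8 is conf/cassandra-env.sh. Roughly,
and adjusted to your hardware:

    # in conf/cassandra-env.sh: override the auto-calculated heap
    # (set both values together)
    MAX_HEAP_SIZE="4G"
    HEAP_NEWSIZE="400M"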

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 26 Jul 2011, at 00:35, lebron james wrote:

> >There are many things you can do to lower caches,optimize memtables, and 
> >tune jvms.
> 
> Please tell what thins i can do   to lower caches,optimize memtables, and 
> tune jvms?  
> 
> >From experience with similar-sized data sets, 1.5GB may be too little. 
> >Recently I bumped our java HEAP limit from 3GB to 4GB to get ?>past an OOM 
> >doing a major compaction.
> 
> In future i need database more than 10TB, so i need solve problem with ram, 
> bacause i need use not more that 4 GB ram on 5TB database.
> 



Re: do I need to add more nodes? minor compaction eat all IO

2011-07-25 Thread aaron morton
There are no hard and fast rules to add new nodes, but here are two guidelines:

1) Single node load is getting too high, rule of thumb is 300GB is probably too 
high. 
2) There are times when the cluster cannot keep up with throughput, for example 
the client is getting TimedOutExceptions or TPStats is showing consistently 
high (a multiple of the available threads) read or write pending queues. 
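
For (2), the quickest check is something like:

    nodetool -h <node> tpstats   # look at the Pending column for ReadStage / MutationStage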

What works for you will be what keeps your site running and keeps the ops/dev 
team sleeping at night.   

In your case, high IO during repair may be OK if the cluster can keep up with 
demands. Or it may mean you need to upgrade the IO capacity or add nodes. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 26 Jul 2011, at 01:17, Yan Chunlu wrote:

> as the wiki suggested:
> http://wiki.apache.org/cassandra/LargeDataSetConsiderations
> Adding nodes is a slow process if each node is responsible for a large
> amount of data. Plan for this; do not try to throw additional hardware
> at a cluster at the last minute.
> 
> 
> I really would like to know what's the status of my cluster, if it is normal
> 
> 
> On Mon, Jul 25, 2011 at 8:59 PM, Yan Chunlu  wrote:
>> I am using normal SATA disk,  actually I was worrying about whether it
>> is okay if every time cassandra using all the io resources?
>> further more when is the good time to add more nodes when I was just
>> using normal SATA disk and with 100r/s it could reach 100 %util
>> 
>> how large the data size it should be on each node?
>> 
>> 
>> below is my iostat -x 2 when doing node repair, I have to repair
>> column family separately otherwise the load will be more crazy:
>> 
>> Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s
>> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> sda   1.50 1.50  121.50   14.00 3.68 0.30
>> 60.19   116.98 1569.46   59.49 14673.86   7.38 100.00
>> 
>> 
>> 
>> 
>> 
>> 
>> On Sun, Jul 24, 2011 at 8:04 AM, Jonathan Ellis  wrote:
>>> On Sat, Jul 23, 2011 at 4:16 PM, Francois Richard  
>>> wrote:
 My understanding is that during compaction cassandra does a lot of non 
 sequential readsa then dumps the results with a big sequential write.
>>> 
>>> Compaction reads and writes are both sequential, and 0.8 allows
>>> setting a MB/s to cap compaction at.
>>> 
>>> As to the original question "do I need to add more machines" I'd say
>>> that depends more on whether your application's SLA is met, than what
>>> % io util spikes to.
>>> 
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>> 
>> 



Re: After column deletion cassandra won't insert more data to a specific key

2011-07-25 Thread aaron morton
It's just not possible to control time, as many super villains and Peter 
Schuller have shown us 
http://www.mail-archive.com/user@cassandra.apache.org/msg15636.html

Often it's not necessary: you can design around simultaneous updates to the same 
key, use a coordination layer such as ZooKeeper, or rely on consensus. 

If you have a design problem provide some details and someone may be able to 
help. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 26 Jul 2011, at 04:25, Guillermo Winkler wrote:

> Hi, thanks both for the answers.
> 
> The problem was indeed with the timestamps.
> 
> What was happening also was that in a mutation involving 1 deletion and 
> various insertions for the same key, all were using the same timestamp, so 
> beside looking at the code doing this
> 
> remove key
> insert key, col, val
> insert key, col, val
> insert key, col, val
> 
> With quorum 1, the insertions were always missing.
> 
> I've been reading past threads regarding time sync inside/outside cassandra, 
> I guess this ain't changing in the near future? 
> 
> Best,
> Guille
> 
> 
> On Sun, Jul 24, 2011 at 1:07 PM, Edward Capriolo  
> wrote:
> Remember the cli uses microsecond precision . so if your app is not using the 
> same precision weird this will result in clients writing the biggest 
> timsetamp winning the final value.
> 
> 
> On Saturday, July 23, 2011, Jonathan Ellis  wrote:
> > You must have given it a delete timestamp in the "future."
> >
> > On Sat, Jul 23, 2011 at 3:46 PM, Guillermo Winkler
> >  wrote:
> >> I'm having a strange behavior on one of my cassandra boxes, after all
> >> columns are removed from a row, insertion on that key stops working (from
> >> API and from the cli)
> >> [default@Agent] get Schedulers['atendimento'];
> >> Returned 0 results.
> >> [default@Agent] set Schedulers['atendimento']['test'] = 'dd';
> >> Value inserted.
> >> [default@Agent] get Schedulers['atendimento'];
> >> Returned 0 results.
> >> Already tried nodetool flush/compact/repair on the CF, doesn't fix the
> >> problem.
> >> With a ver simple setup:
> >> * only one node in the cluster (the cluster never had more nodes nor
> >> replicas)
> >> * random partitioner
> >> * CF defined as "create column family Schedulers with 
> >> comparator=BytesType;"
> >> The only way for it to start working again is to truncate the CF.
> >> Do you have any clues how to diagnose what's going on?
> >> Thanks,
> >> Guille
> >>
> >>
> >
> >
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of DataStax, the source for professional Cassandra support
> > http://www.datastax.com
> >
> 



Re: After column deletion cassandra won't insert more data to a specific key

2011-07-25 Thread Guillermo Winkler
I guess the problem is not whether you can control time in a distributed
system or not; in this case at least, it's whether you consider a timestamp
set by a client outside the cluster as *safe*.

When the timestamp gets hidden behind a client/wrapper library
implementation default, realizing it's your responsibility to handle time
sync gets lost in the abstraction. Maybe it would be better for the cluster
to set a default value in those cases.

Not happening to me again, that's for sure :)

Thanks,
Guille

On Mon, Jul 25, 2011 at 7:48 PM, aaron morton wrote:

> It's just not possible to control time, as many super villains and Peter
> Schuller have shown us
> http://www.mail-archive.com/user@cassandra.apache.org/msg15636.html
>
> Often it's not necessary, you can design around simultaneous updates the
> same key, use a coordination layer such as zoo keeper or rely on consensus.
>
> If you have a design problem provide some details and someone may be able
> to help.
>
> Cheers
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 26 Jul 2011, at 04:25, Guillermo Winkler wrote:
>
> Hi, thanks both for the answers.
>
> The problem was indeed with the timestamps.
>
> What was happening also was that in a mutation involving 1 deletion and
> various insertions for the same key, all were using the same timestamp, so
> beside looking at the code doing this
>
> remove key
> insert key, col, val
> insert key, col, val
> insert key, col, val
>
> With quorum 1, the insertions were always missing.
>
> I've been reading past threads regarding time sync inside/outside
> cassandra, I guess this ain't changing in the near future?
>
> Best,
> Guille
>
>
> On Sun, Jul 24, 2011 at 1:07 PM, Edward Capriolo wrote:
>
>> Remember the cli uses microsecond precision . so if your app is not using
>> the same precision weird this will result in clients writing the biggest
>> timsetamp winning the final value.
>>
>>
>> On Saturday, July 23, 2011, Jonathan Ellis  wrote:
>> > You must have given it a delete timestamp in the "future."
>> >
>> > On Sat, Jul 23, 2011 at 3:46 PM, Guillermo Winkler
>> >  wrote:
>> >> I'm having a strange behavior on one of my cassandra boxes, after all
>> >> columns are removed from a row, insertion on that key stops working
>> (from
>> >> API and from the cli)
>> >> [default@Agent] get Schedulers['atendimento'];
>> >> Returned 0 results.
>> >> [default@Agent] set Schedulers['atendimento']['test'] = 'dd';
>> >> Value inserted.
>> >> [default@Agent] get Schedulers['atendimento'];
>> >> Returned 0 results.
>> >> Already tried nodetool flush/compact/repair on the CF, doesn't fix the
>> >> problem.
>> >> With a ver simple setup:
>> >> * only one node in the cluster (the cluster never had more nodes nor
>> >> replicas)
>> >> * random partitioner
>> >> * CF defined as "create column family Schedulers with
>> comparator=BytesType;"
>> >> The only way for it to start working again is to truncate the CF.
>> >> Do you have any clues how to diagnose what's going on?
>> >> Thanks,
>> >> Guille
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Jonathan Ellis
>> > Project Chair, Apache Cassandra
>> > co-founder of DataStax, the source for professional Cassandra support
>> > http://www.datastax.com
>> >
>>
>
>
>




Re: Interpreting the output of cfhistograms

2011-07-25 Thread Maki Watanabe
Offset represents a different "unit" for each column.
In the SSTables column, you can see the following histogram:

20  4291637
24  28680590
29  3876198 

It means that 4291637 of your read operations required 20 SSTables to read,
28680590 ops required 24, and so on.
In the Write/Read Latency columns, Offset represents microseconds:
3711340 read operations completed
in 2 microseconds.
Most of your rows are between 925 and 1331 bytes.
Most of your rows have between 925 and 1331 columns.

maki


2011/7/26 Aishwarya Venkataraman 
>
> Hello,
>
> I need help understanding the output of cfhistograms option provided as part 
> of nodetool.
> When I run cfhistograms on one node of a 3 node cluster, I get the following :
>
> Offset SSTables Write Latency Read Latency Row Size Column Count
> 1 0 0 458457 0 0
> 2 0 0 3711340 0 0
> 3 0 0 12159992 0 0
> 4 0 0 14350840 0 0
> 5 0 0 7866204 0 0
> 6 0 0 3427977 0 0
> 7 0 0 2407296 0 0
> 8 0 0 2516075 0 0
> 10 0 0 5392567 0 0
> 12 0 0 4239979 0 0
> 14 0 0 2415529 0 0
> 17 0 0 1406153 0 0
> 20 4291637 0 380625 0 0
> 24 28680590 0 191431 0 0
> 29 3876198 0 141841 0 0
> 35 0 0 57855 0 0
> 42 0 0 15403 0 0
> 50 0 0 4291 0 0
> 60 0 0 2118 0 0
> 72 0 0 1096 0 0
> 86 0 0 662 0 0
> 179 0 0 115 173 173
> 215 0 0 70 35 35
> 258 0 0 48 0 0
> 310 0 0 41 404 404
> 372 0 0 37 0 0
> 446 0 0 22 975 975
> 770 0 0 12 3668 3668
> 924 0 0 4 10142 10142
> 1331 0 0 4 256983543 256983543
> What do these numbers mean? How can I interpret the above data? I found 
> some explanation here 
> http://narendrasharma.blogspot.com/2011/04/cassandra-07x-understanding-output-of.html,
>  but I did not understand this completely.
>
> Thanks,
> Aishwarya


Re: Repair fails with java.io.IOError: java.io.EOFException

2011-07-25 Thread Sameer Farooqui
Looks like the repair finished successfully the second time. However, the
cluster is still severely unbalanced. I was hoping the repair would balance
the nodes. We're using random partitioner. One node has 900GB and others
have 128GB, 191GB, 129GB, 257 GB, etc. The 900GB and the 646GB are just
insanely high. Not sure why or how to troubleshoot.



On Fri, Jul 22, 2011 at 1:28 PM, Sameer Farooqui wrote:

> I don't see a JVM crashlog ( hs_err_pid[pid].log) in
> ~/brisk/resources/cassandra/bin or /tmp. So maybe JVM didn't crash?
>
> We're running a pretty up-to-date Sun Java:
>
> ubuntu@ip-10-2-x-x:/tmp$ java -version
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>
> I'm gonna restart the Repair process in a few more hours. If there are any
> additional debug or troubleshooting logs you'd like me to enable first,
> please let me know.
>
> - Sameer
>
>
>
>
> On Thu, Jul 21, 2011 at 5:31 PM, Jonathan Ellis  wrote:
>
>> Did you check for a JVM crash log?
>>
>> You should make sure you're running the latest Sun JVM, older versions
>> and OpenJDK in particular are prone to segfaulting.
>>
>> On Thu, Jul 21, 2011 at 6:53 PM, Sameer Farooqui
>>  wrote:
>> > We are starting Cassandra with "brisk cassandra", so as a stand-alone
>> > process, not a service.
>> >
>> > The syslog on the node doesn't show anything regarding the Cassandra
>> Java
>> > process around the time the last entries were made in the Cassandra
>> > system.log (2011-07-21 13:01:51):
>> >
>> > Jul 21 12:35:01 ip-10-2-206-127 CRON[12826]: (root) CMD (command -v
>> > debian-sa1 > /dev/null && debian-sa1 1 1)
>> > Jul 21 12:45:01 ip-10-2-206-127 CRON[13420]: (root) CMD (command -v
>> > debian-sa1 > /dev/null && debian-sa1 1 1)
>> > Jul 21 12:55:01 ip-10-2-206-127 CRON[14021]: (root) CMD (command -v
>> > debian-sa1 > /dev/null && debian-sa1 1 1)
>> > Jul 21 14:26:07 ip-10-2-206-127 kernel: imklog 4.2.0, log source =
>> > /proc/kmsg started.
>> > Jul 21 14:26:07 ip-10-2-206-127 rsyslogd: [origin software="rsyslogd"
>> > swVersion="4.2.0" x-pid="663" x-info="http://www.rsyslog.com";]
>> (re)start
>> >
>> >
>> > The last thing in the Cassandra log before INFO Logging initialized is:
>> >
>> >  INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line
>> 128)
>> > GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max
>> is
>> > 4030726144
>> >
>> >
>> > I can start Repair again, but am worried that it will crash Cassandra
>> again,
>> > so I want to turn on any debugging or helpful logs to diagnose the crash
>> if
>> > it happens again.
>> >
>> >
>> > - Sameer
>> >
>> >
>> > On Thu, Jul 21, 2011 at 4:30 PM, aaron morton 
>> > wrote:
>> >>
>> >> The default init.d script will direct std out/err to that file, how are
>> >> you starting brisk / cassandra ?
>> >> Check the syslog and other logs in /var/log to see if the OS killed
>> >> cassandra.
>> >> Also, what was the last thing in the casandra log before INFO [main]
>> >> 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) Logging
>> >> initialised ?
>> >>
>> >> Cheers
>> >>
>> >> -
>> >> Aaron Morton
>> >> Freelance Cassandra Developer
>> >> @aaronmorton
>> >> http://www.thelastpickle.com
>> >> On 22 Jul 2011, at 10:50, Sameer Farooqui wrote:
>> >>
>> >> Hey Aaron,
>> >>
>> >> I don't have any output.log files in that folder:
>> >>
>> >> ubuntu@ip-10-2-x-x:~$ cd /var/log/cassandra
>> >> ubuntu@ip-10-2-x-x:/var/log/cassandra$ ls
>> >> system.log system.log.11  system.log.4  system.log.7
>> >> system.log.1   system.log.2   system.log.5  system.log.8
>> >> system.log.10  system.log.3   system.log.6  system.log.9
>> >>
>> >>
>> >>
>> >> On Thu, Jul 21, 2011 at 3:40 PM, aaron morton > >
>> >> wrote:
>> >>>
>> >>> Check /var/log/cassandra/output.log (assuming the default init
>> scripts)
>> >>> A
>> >>> -
>> >>> Aaron Morton
>> >>> Freelance Cassandra Developer
>> >>> @aaronmorton
>> >>> http://www.thelastpickle.com
>> >>> On 22 Jul 2011, at 10:13, Sameer Farooqui wrote:
>> >>>
>> >>> Hmm. Just looked at the log more closely.
>> >>>
>> >>> So, what actually happened is while Repair was running on this
>> specific
>> >>> node, the Cassandra java process terminated itself automatically. The
>> last
>> >>> entries in the log are:
>> >>>
>> >>>  INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java
>> (line
>> >>> 128) GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888
>> used; max
>> >>> is 4030726144
>> >>>  INFO [ScheduledTasks:1] 2011-07-21 13:00:27,375 GCInspector.java
>> (line
>> >>> 128) GC for ParNew: 266 ms, 158835624 reclaimed leaving 1864471688
>> used; max
>> >>> is 4030726144
>> >>>  INFO [ScheduledTasks:1] 2011-07-21 13:00:57,658 GCInspector.java
>> (line
>> >>> 128) GC for ParNew: 251 ms, 148861328 reclaimed leaving 193120
>> used; max
>> >>> is 4030726144
>> >>>  INFO [ScheduledTasks:1] 2011-07-2

Re: Repair fails with java.io.IOError: java.io.EOFException

2011-07-25 Thread aaron morton
Background: http://wiki.apache.org/cassandra/Operations

Use nodetool ring to check if the tokens are evenly distributed. If not then 
check the Load Balancing and Moving Nodes sections in the page above.

If they are, and repair has completed, use nodetool cleanup to remove the data 
the node is no longer responsible for. See the bootstrap section above. 
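
Concretely, something like (replace the host with one of your nodes):

    nodetool -h 10.2.x.x ring      # token ownership and load per node
    nodetool -h 10.2.x.x cleanup   # drop data the node no longer owns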

Hope that helps. 


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 26 Jul 2011, at 12:44, Sameer Farooqui wrote:

> Looks like the repair finished successfully the second time. However, the 
> cluster is still severely unbalanced. I was hoping the repair would balance 
> the nodes. We're using random partitioner. One node has 900GB and others have 
> 128GB, 191GB, 129GB, 257 GB, etc. The 900GB and the 646GB are just insanely 
> high. Not sure why or how to troubleshoot.
> 
> 
> 
> On Fri, Jul 22, 2011 at 1:28 PM, Sameer Farooqui  
> wrote:
> I don't see a JVM crashlog ( hs_err_pid[pid].log) in 
> ~/brisk/resources/cassandra/bin or /tmp. So maybe JVM didn't crash?
> 
> We're running a pretty up to date with Sun Java:
> 
> ubuntu@ip-10-2-x-x:/tmp$ java -version
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
> 
> I'm gonna restart the Repair process in a few more hours. If there are any 
> additional debug or troubleshooting logs you'd like me to enable first, 
> please let me know.
> 
> - Sameer
> 
> 
> 
> 
> On Thu, Jul 21, 2011 at 5:31 PM, Jonathan Ellis  wrote:
> Did you check for a JVM crash log?
> 
> You should make sure you're running the latest Sun JVM, older versions
> and OpenJDK in particular are prone to segfaulting.
> 
> On Thu, Jul 21, 2011 at 6:53 PM, Sameer Farooqui
>  wrote:
> > We are starting Cassandra with "brisk cassandra", so as a stand-alone
> > process, not a service.
> >
> > The syslog on the node doesn't show anything regarding the Cassandra Java
> > process around the time the last entries were made in the Cassandra
> > system.log (2011-07-21 13:01:51):
> >
> > Jul 21 12:35:01 ip-10-2-206-127 CRON[12826]: (root) CMD (command -v
> > debian-sa1 > /dev/null && debian-sa1 1 1)
> > Jul 21 12:45:01 ip-10-2-206-127 CRON[13420]: (root) CMD (command -v
> > debian-sa1 > /dev/null && debian-sa1 1 1)
> > Jul 21 12:55:01 ip-10-2-206-127 CRON[14021]: (root) CMD (command -v
> > debian-sa1 > /dev/null && debian-sa1 1 1)
> > Jul 21 14:26:07 ip-10-2-206-127 kernel: imklog 4.2.0, log source =
> > /proc/kmsg started.
> > Jul 21 14:26:07 ip-10-2-206-127 rsyslogd: [origin software="rsyslogd"
> > swVersion="4.2.0" x-pid="663" x-info="http://www.rsyslog.com";] (re)start
> >
> >
> > The last thing in the Cassandra log before INFO Logging initialized is:
> >
> >  INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line 128)
> > GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is
> > 4030726144
> >
> >
> > I can start Repair again, but am worried that it will crash Cassandra again,
> > so I want to turn on any debugging or helpful logs to diagnose the crash if
> > it happens again.
> >
> >
> > - Sameer
> >
> >
> > On Thu, Jul 21, 2011 at 4:30 PM, aaron morton 
> > wrote:
> >>
> >> The default init.d script will direct std out/err to that file, how are
> >> you starting brisk / cassandra ?
> >> Check the syslog and other logs in /var/log to see if the OS killed
> >> cassandra.
> >> Also, what was the last thing in the casandra log before INFO [main]
> >> 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) Logging
> >> initialised ?
> >>
> >> Cheers
> >>
> >> -
> >> Aaron Morton
> >> Freelance Cassandra Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >> On 22 Jul 2011, at 10:50, Sameer Farooqui wrote:
> >>
> >> Hey Aaron,
> >>
> >> I don't have any output.log files in that folder:
> >>
> >> ubuntu@ip-10-2-x-x:~$ cd /var/log/cassandra
> >> ubuntu@ip-10-2-x-x:/var/log/cassandra$ ls
> >> system.log system.log.11  system.log.4  system.log.7
> >> system.log.1   system.log.2   system.log.5  system.log.8
> >> system.log.10  system.log.3   system.log.6  system.log.9
> >>
> >>
> >>
> >> On Thu, Jul 21, 2011 at 3:40 PM, aaron morton 
> >> wrote:
> >>>
> >>> Check /var/log/cassandra/output.log (assuming the default init scripts)
> >>> A
> >>> -
> >>> Aaron Morton
> >>> Freelance Cassandra Developer
> >>> @aaronmorton
> >>> http://www.thelastpickle.com
> >>> On 22 Jul 2011, at 10:13, Sameer Farooqui wrote:
> >>>
> >>> Hmm. Just looked at the log more closely.
> >>>
> >>> So, what actually happened is while Repair was running on this specific
> >>> node, the Cassandra java process terminated itself automatically. The last
> >>> entries in the log are:
> >>>
> >>>  INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java (line
> >>> 128) GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 used; 
> >>> max
> 

Re: sstableloader throws storage_port error

2011-07-25 Thread Jonathan Ellis
sstableloader uses gossip to discover the Cassandra ring, so you'll
need to run it on a different IP (127.0.0.2 is fine).
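
Roughly (untested sketch; the copied install directory name is just an
example):

    # give the loader its own loopback address (needed on OS X; Linux
    # already answers on 127.0.0.2)
    sudo ifconfig lo0 alias 127.0.0.2

    # run the loader from a second copy of the install whose
    # conf/cassandra.yaml has listen_address: 127.0.0.2
    cp -r apache-cassandra-0.8.1 loader
    loader/bin/sstableloader /Users/someuser/cassandra/mykeyspace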

On Mon, Jul 25, 2011 at 2:41 PM, John Conwell  wrote:
> I'm trying to figure out how to use the sstableloader tool.  For my test I
> have a single node cassandra instance running on my local machine.  I have
> cassandra running, and validate this by connecting to it with cassandra-cli.
> I run sstableloader using the following command:
> bin/sstableloader /Users/someuser/cassandra/mykeyspace
> and I get the following error:
> org.apache.cassandra.config.ConfigurationException: localhost/127.0.0.1:7000
> is in use by another process.  Change listen_address:storage_port in
> cassandra.yaml to values that do not conflict with other services
>
> I've played around with different ports, but nothing works.  It it because
> I'm trying to run sstableloader on the same machine that cassandra is
> running on?  It would be odd I think, but cant thing of another reason I
> would get that eror.
> Thanks,
> John



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Cassandra 0.7.8 and 0.8.1 fail when major compaction on 37GB database

2011-07-25 Thread lebron james
I have only 4GB on the server, so I gave the JVM 3 GB of heap, but this doesn't help;
Cassandra still falls over when I launch a major compaction on the 37 GB database.