Re: Working backwards from production to staging/dev

2011-03-26 Thread Edward Capriolo
On Fri, Mar 25, 2011 at 2:11 PM, ian douglas  wrote:
> On 03/25/2011 10:12 AM, Jonathan Ellis wrote:
>>
>> On Fri, Mar 25, 2011 at 11:59 AM, ian douglas  wrote:
>>>
>>> (we're running v0.60)
>>
>> I don't know if you could hear that from where you are, but our whole
>> office just yelled, "WTF!" :)
>
> Ah, that's what that noise was... And yeah, we know we're way behind. Our
> initial delay in upgrading was waiting for 0.7 to come out and then we
> learned we needed a whole new Thrift client for our PHP code base, and then
> we got busy on other things, but we're at a point where we have some time to
> take care of Cassandra and get it upgraded.
>
>  Our planned path, now, is:
>
> (our nodes' tokens were generated with the usual Python formula (0, 1/3 and 2/3
> times 2^127) and are called node 1 through 3, respectively; our RF is set to 2
> right now)
>
> 1. remove node 1 from our software
> 2. bring node 1 offline after a flush/repair/cleanup
> 3. run a cleanup on node 2 and then on node 3 so they have a full copy of
> all data from the old node 1 and each other.
> 4. bring up a new Large 64-bit instance, install 0.6.12, assign a Token
> value of 0 (node 1), RF:2, on a new gossip ring, and copy all data from the
> 32-bit nodes 2 and 3 and run a repair/cleanup to remove any duplicated data
> 5. remove node 3 from our software
> 6. point our code to the new 64-bit node 1
> 7. bring node 3 offline after a flush/repair/cleanup so node 2 has the last
> fresh copy of everything
> 8. bring node 2 offline after a flush/repair/cleanup
> 9. bring up another Large instance, get a copy of all data from our old node
> 2, assign a Token value of (1/2 * 2^127), RF:2, on the new gossip ring, run
> a repair to remove duplicate data, and then a cleanup so it gets replicated
> data from the new node 1
> 10. add the new node 2 to our software
> 11. run a final cleanup on the new node 1 and then on node 2 to make sure
> all data is replicated evenly on both nodes
>
> ... at this point, we should have two 64-bit Large instances, with RF:2, on
> a new gossip ring, replacing three 32-bit systems, with minimal down time
> and no data loss (just a data delay between steps 6 and 10 above).
>
> Questions:
> 1. Does it appear that we've missed any steps, or are we doing something out
> of order?
> 2. Is the flush/repair/cleanup overkill when bringing the old nodes offline,
> or is that the correct sequence to follow?
> 3. Will the difference in compute units (lower on Large instances than
> Medium instances) make any noticeable difference, or will the fact that the
> machine is 64-bit handle things efficiently enough such that a Large
> instance works harder than a Medium instance? (never did figure out how
> their compute units work)
> 4. Can we follow similar steps when we're ready to upgrade to 0.7x and have
> our new Thrift client for PHP all squared away?
>
>
> Thanks again for the help!!!
>
>

If you have a node with an old column family you are not using
anymore: stop the node, delete that column family's data files, and
start the node again.

Edward
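
For reference, the token values ian describes follow the standard even-spacing
formula over the RandomPartitioner's 2^127 token space; a minimal Python sketch
(the node counts below match the old and new rings in the plan above):

    def initial_tokens(node_count):
        # evenly spaced tokens across the RandomPartitioner's 2**127 range
        return [i * 2**127 // node_count for i in range(node_count)]

    print(initial_tokens(3))  # old three-node ring: 0, 1/3 and 2/3 of 2**127
    print(initial_tokens(2))  # new two-node ring: 0 and 1/2 of 2**127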


Re: Starter GUI Tool for Windows

2011-03-26 Thread Edward Capriolo
I don't know. Apache web server is a patchy web server, but with
Crapssandra there's just no way to put that in a good light.

On Friday, March 25, 2011, Dario Bravo  wrote:
> People: Crapssandra.
> I'm starting a Cassandra project and starting to learn about this beautiful 
> Cassandra, so I thought that it would be nice to have a DB GUI tool under my 
> current OS.
> It doesn't do anything other than showing some info about the server or the 
> selected keyspace... but I hope it'll do many things such as manage 
> keyspaces, column families, columns and super columns, show data contained on 
> columns, allow to perform queries (get, set, mostly), etc.
>
>
> If anyone wishes to help in any way, please feel free to download the code 
> and modify it.
> It's called Crapssandra because it started as crappy, simple code, and its 
> features are gonna be developed as I need them... so it will have crappy 
> code, mostly.
>
>
> It's done using .NET 3.5 and Thrift.
> The address to download it and its source code 
> is: http://code.google.com/p/crapssandra/
>
>
>  Hope this helps someone, that the app 
> grows as I wish, and that I get some help from the community.
> Thanks!
>
> --
> Darío Bravo
>
>
>
>


Re: ParNew (promotion failed)

2011-03-26 Thread ruslan usifov
2011/3/23 ruslan usifov 

> Hello
>
> Sometimes i seen in gc log follow message:
>
> 2011-03-23T14:40:56.049+0300: 14897.104: [GC 14897.104: [ParNew (promotion
> failed)
> Desired survivor size 41943040 bytes, new threshold 2 (max 2)
> - age   1:    5573024 bytes,    5573024 total
> - age   2:    5064608 bytes,   10637632 total
> : 672577K->670749K(737280K), 0.1837950 secs]14897.288: [CMS:
> 1602487K->779310K(2326528K), 4.7525580 secs] 2270940K->779310K(3063808K), [
> CMS Perm : 20073K->19913K(33420K)], 4.9365810 secs] [Times: user=5.06
> sys=0.00, real=4.93 secs]
> Total time for which application threads were stopped: 4.9378750 seconds
>

After investigating, I found that this happens when a memtable flush and a
compaction happen at the same time. At that moment the young part of the heap
overflows and a full GC happens.

So to resolve this, should I tune the young generation (HEAP_NEWSIZE, -Xmn)
and the in_memory_compaction_limit_in_mb config parameter?

Also, memtables flush due to "memtable_flush_after"; if I separate the
memtable flushes in time, can this help?


Re: help modeling a requirement in cassandra

2011-03-26 Thread buddhasystem
That would depend on how much data is generated per day. If it can still fit
in a row, the solution would be to just have rows keyed by date, like
20110326. This way you don't have to move data inside the cluster; the
selection logic will be in the client.

Even if the data is too large to be put in a row for a single day, you can
store all IDs of objects in this fashion, which is usually practical (plus you
can make use of column names for additional indexing capability). Then, you
have to have a separate (and potentially very large) table which contains
the actual data. Essentially, one table would serve as an index to the
other, which is the data store. I have a working application that's
structured like that and it works OK.
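
A rough pycassa sketch of the two-CF layout described above (the keyspace and
column family names here are hypothetical):

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace')
    index = pycassa.ColumnFamily(pool, 'DailyIndex')  # row key = YYYYMMDD, column names = object IDs
    data = pycassa.ColumnFamily(pool, 'Objects')      # row key = object ID, columns = the data

    def store(day, obj_id, columns):
        data.insert(obj_id, columns)
        index.insert(day, {obj_id: ''})  # empty value; the column *name* is the index entry

    def load_day(day):
        ids = list(index.get(day, column_count=10000).keys())
        return data.multiget(ids)  # index row -> actual rows in the data store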




Re: how does cassandra pick its replicant peers?

2011-03-26 Thread Jonathan Colby
Hi Peter - 

Thanks for the response and adding the FAQ.   Really great answers.

So if I understand correctly, the nodes getting the replica copies are 
predetermined, based on the replication strategy.

One thing is still a bit unclear. So once a node establishes who its 
replicants are, are those "replicant nodes" always used? In other words, 
given RF=3, if the other two replica nodes go down, the "original" node will 
not automatically pick 2 new nodes for its replica copies?


Jon



On Mar 26, 2011, at 12:05 AM, Peter Schuller wrote:

>> Does anyone know how cassandra chooses the nodes for its other replicant 
>> copies?
> 
> This keeps coming up so I added a FAQ entry:
> 
>   http://wiki.apache.org/cassandra/FAQ#replicaplacement
> 
> I don't quite like the phrasing but I couldn't come up with anything that
> was sufficiently clear and complete right now.
> 
>> The first node gets the first copy because its token is assigned for that 
>> key.   But what about the other copies of the data?
>> Do the replicant nodes stay the same based on the token range?  Or are the 
>> other copies send to any random node based on its load and availability?
>> I think this is important in order to understand because it affects how to 
>> plan for situations where a significant number of nodes are suddenly 
>> unavailable, such as the loss of a data center.
> 
> I hope the above is answered by the FAQ. If it's unclear please say so
> and we can clarify.
> 
>> If the replicants are copied just based on random availability, then quorum 
>> writes could survive on the remaining nodes.  But if the replicant nodes are 
>> somehow pre-determined, those replicants may not be available and writes 
>> will fail.
> 
> I'm not really following this though. Why would you ever want data to
> be placed based on "random availability"?
> 
> If you are writing at QUORUM, a quorum of nodes in the replica set
> must have acked the write in order for the write to be considered
> successful (and similarly for reads). If a sufficient number of nodes are
> up, you're fine. If not, then no - fundamentally that would violate
> the requirement of quorum.
> 
> For example, if you're at RF=3, at least two nodes (in the replica set
> for a given key) must be responding to your requests in order for them
> to succeed at QUORUM.
> 
> -- 
> / Peter Schuller
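
The quorum arithmetic Peter describes is just a majority of the replica set; a
trivial sketch:

    def quorum(rf):
        # smallest majority of the replica set: RF=3 -> 2, RF=5 -> 3
        return rf // 2 + 1

Note that quorum(2) == 2, so at RF=2 a QUORUM read or write needs *both*
replicas of a key to be up.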



Re: how does cassandra pick its replicant peers?

2011-03-26 Thread Peter Schuller
> So if I understand correctly, the nodes getting the replica copies are 
> predetermined, based on the replication strategy.

Yes, pre-determined as a function of ring layout (which nodes have
which tokens), the replication factor, and the replication strategy.

> One thing is still a bit unclear.   So once a node establishes who its 
> replicants are,  are those "replicant nodes" always used?

To be clear, it's not that there is a primary node that internally
decides who is responsible for data. Rather, given the known rules for
determining which nodes are responsible for a given row key, any node
in the cluster is aware of that information.

In order to satisfy consistency levels during read and write
operations, requests are routed as appropriate to nodes that are
responsible for the row key.

So yes, the nodes responsible for a given row key are always used for
reads and writes to that row key.

>  In other words, given a RF=3, if the other two replica nodes go down, the 
> "original" node will not automatically pick 2 new nodes for its replica 
> copies?

No. Actually moving data between nodes is done by ring management
operations; i.e., an operator deciding to decommission or move nodes.
Moving data is quite an expensive/significant operation and there is
no automatic mechanism to do this. The way to survive nodes going down
is to select an appropriate replication factor on the key space, and
select an appropriate consistency level for reads and writes.

(The only exception would be hinted hand-off, e.g. when writing at
CL.ANY, but I'll leave that beyond the scope of this.)

-- 
/ Peter Schuller
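
To make "pre-determined as a function of ring layout" concrete, here is a
minimal Python sketch in the spirit of SimpleStrategy placement: find the
first node whose token is >= the row's token, then walk the ring clockwise
until RF distinct nodes are collected. Tokens and node names are illustrative,
and real placement also depends on the configured strategy and snitch:

    from bisect import bisect_left

    def replicas_for(row_token, ring, rf):
        # ring: list of (token, node) pairs sorted by token
        tokens = [t for t, _ in ring]
        i = bisect_left(tokens, row_token) % len(ring)  # wraps past the last token
        return [ring[(i + j) % len(ring)][1] for j in range(min(rf, len(ring)))]

    ring = [(0, 'A'), (2**127 // 3, 'B'), (2 * 2**127 // 3, 'C')]
    print(replicas_for(42, ring, 3))  # ['B', 'C', 'A'] -- the same answer on every node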


Re: ParNew (promotion failed)

2011-03-26 Thread Peter Schuller
> So to resolve this, should I tune the young generation (HEAP_NEWSIZE, -Xmn)
> and the in_memory_compaction_limit_in_mb config parameter?

More likely adjust the initial occupancy trigger and/or the heap size.
Probably just the latter. This is assuming you're on 0.7 with mostly
default JVM options. See cassandra-env.sh.

In fact, you may even be helped by *decreasing* the young generation
size if you're running a version which has a cassandra-env that
specifies -Xmn. I'm not entirely sure, because I don't know off hand
exactly what the occupancy trigger is based on, but if the
young gen is large and the workload is such that young-gen GCs
promote a high percentage of its data, I suspect that can lead to CMS
triggering too late. (So this paragraph is speculation.)

-- 
/ Peter Schuller
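
For reference, these are the kinds of settings involved (the variable names
follow cassandra-env.sh conventions; the values below are examples to
experiment with, not recommendations):

    MAX_HEAP_SIZE="4G"     # total heap (-Xms/-Xmx)
    HEAP_NEWSIZE="400M"    # young generation (-Xmn); per the above, try smaller
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"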


Re: ParNew (promotion failed)

2011-03-26 Thread Peter Schuller
>> So to resolve this, should I tune the young generation (HEAP_NEWSIZE, -Xmn)
>> and the in_memory_compaction_limit_in_mb config parameter?
>
> More likely adjust the initial occupancy trigger and/or the heap size.
> Probably just the latter. This is assuming you're on 0.7 with mostly
> default JVM options. See cassandra-env.sh.

Well, in_memory_compaction_limit_in_mb can be relevant. But this
should only matter if you have large rows in your column family.

-- 
/ Peter Schuller


Re: URGENT HELP PLEASE!

2011-03-26 Thread Peter Schuller
> What happened is this:
> You started your cluster with only one node, so at first, all data was on 
> this node.
> Then you added a second node. Cassandra then moved (approximately)
> half of the data to the second node. In theory, at that
> point the data that was moved to the second node could be removed from
> the first node (since you had RF=1). However, Cassandra
> doesn't do that removal automatically, for safety reasons. You'll
> have to run cleanup on the first node for that to happen.
> So there was stale data on the first node that never got updated,
> because the first node was not responsible anymore for that data.

But this doesn't explain why he was able to read the stale data, does
it? Or did I miss something about the second node actually having been
removed from the ring after it was shut off?

-- 
/ Peter Schuller


question on saved_cache_directory

2011-03-26 Thread Anurag Gujral
Hi All,
I have an SSD and a normal disk. I am using the SSD for the data directory;
should I also use the SSD for the saved_cache directory?
Thanks
Anurag


Re: Starter GUI Tool for Windows

2011-03-26 Thread Dario Bravo
hehe, okay, maybe I'd chosen a bad name... can anybody think of a better one?

If you check out the source, it can do a few new things, such as drop
keyspaces (except "system"), and show info on selected nodes...

Tomorrow I'll be adding a bunch of new features, I hope.


2011/3/26 Edward Capriolo 

> I don't know. Apache web server is a patchy web server, but with
> Crapssandra there's just no way to put that in a good light.
>
> On Friday, March 25, 2011, Dario Bravo  wrote:
> > People: Crapssandra.
> > I'm starting a Cassandra project and starting to learn about this
> beautiful Cassandra, so I thought that it would be nice to have a DB GUI
> tool under my current OS.
> > It doesn't do anything other than showing some info about the server or
> the selected keyspace... but I hope it'll do many things such as manage
> keyspaces, column families, columns and super columns, show data contained
> on columns, allow to perform queries (get, set, mostly), etc.
> >
> >
> > If anyone wishes to help in any way, please feel free to download the
> code and modify it.
> > It's called Crapssandra because it started as crappy, simple code, and
> its features are gonna be developed as I need them... so it will have
> crappy code, mostly.
> >
> >
> > It's done using .net 3.5 and Thrift.
> > The address to download it and its source code is:
> http://code.google.com/p/crapssandra/
> >
> >
> >  Hope this helps someone, that
> the app grows as I wish, and that I get some help from the community.
> > Thanks!
> >
> > --
> > Darío Bravo
> >
> >
> >
> >
>



-- 
Darío Bravo


Re: data aggregation in Cassandra

2011-03-26 Thread aaron morton
If you are using OPP you will need to understand how to balance the data around 
the ring; start with RP until you have an idea of why it's not working for you. 
The RP will transform the key with a hash function, which is then compared to 
the node tokens to locate the first replica for the data. The OPP uses the raw 
key. See http://wiki.apache.org/cassandra/Operations#Ring_management and 
http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/

 
Reading 20 to 30 million records will take a while. Perhaps look at 
http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
 and http://www.datastax.com/products/brisk for background. 

Consider how you can denormalise to support your queries, e.g. in a CF use 
keys such as "attr1/value", column names as the time stamps, and values as the 
stuff you need (you could pack all the data you need into a structure like JSON).

CF's have a (potentially) large memory overhead. Use fewer and store mixed but 
related content in them. 
  
Hope that helps. 
Aaron


On 26 Mar 2011, at 05:38, Saurabh Sehgal wrote:

> Thanks for all the responses. 
> 
> My leading questions then are ->
> 
> - Should I go with the OrderPreservingPartitioner based on timestamps so I 
> can do time range queries - is this recommended ? any special cases regarding 
> load balancing I need to keep in mind ? I have read buzz over blogs/forums on 
> how RandomPartitioner yields better load balancing, and it is discouraged to 
> use OrderPreservingPartitioner. Can someone expand/comment on this ?
> 
> - Also, let's say I query all partitioned data between timestampuuid1 and 
> timestampuuid2 (over several weeks)... this would potentially, in my case, 
> return anywhere from 20 to 30 million records. How would I go about aggregating 
> this data "by hand"? Will this perform?
> 
> Since I am only interested in aggregating over a finite set of 10-20 
> attributes, does it make more sense to have a column family per finite 
> attribute? In this case, I do not need to do any aggregation, since all the 
> data for that attribute resides in one column family. Is there an upper bound 
> to the number of column families Cassandra currently supports?
> 
> 
> 
> On Fri, Mar 25, 2011 at 7:31 AM, buddhasystem  wrote:
> Hello Saurabh,
> 
> I have a similar situation, with a more complex data model, and I do an
> equivalent of map-reduce "by hand". The redeeming value is that you have
> complete freedom in how you hash, and you design the way you store indexes
> and similar structures. If there is a pattern in data store, you use it to
> your advantage. In the end, you get good performance.
> 
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/data-aggregation-in-Cassandra-tp6206994p6207879.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.
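
A rough pycassa sketch of the denormalisation aaron describes, with one row per
attribute/value pair and timestamp-named columns (the keyspace and CF names are
hypothetical, and the CF is assumed to use a LongType comparator so columns
sort by time):

    import json
    import time
    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace')
    events = pycassa.ColumnFamily(pool, 'EventsByAttr')  # assumed comparator: LongType

    def record(attr, value, payload):
        # column name = millisecond timestamp, column value = packed JSON
        events.insert('%s/%s' % (attr, value),
                      {int(time.time() * 1000): json.dumps(payload)})

    def time_range(attr, value, start_ms, end_ms):
        # ordering *within* a row is by column name, so time-range slices
        # work fine under the RandomPartitioner; only row-key ranges need OPP
        return events.get('%s/%s' % (attr, value),
                          column_start=start_ms, column_finish=end_ms)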



Re: data aggregation in Cassandra

2011-03-26 Thread Saurabh Sehgal
Thanks for the reply. The reason I want to go with OPP is to do range-based
queries on time. All queries against the data are going to be time based.
With an RP partitioning scheme, will it be efficient to do range-based
queries?
On Mar 26, 2011 9:12 PM, "aaron morton"  wrote:
> If you are using OPP you will need to understand how to balance the data
around the ring; start with RP until you have an idea of why it's not working
for you. The RP will transform the key with a hash function, which is then
compared to the node tokens to locate the first replica for the data. The
OPP uses the raw key. see
http://wiki.apache.org/cassandra/Operations#Ring_management and
http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
>
>
> Reading 20 to 30 million records will take a while. Perhaps look at
http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011 and
http://www.datastax.com/products/brisk for background.
>
> Consider how you can denormalise to support your queries, e.g. in a CF
use keys such as "attr1/value" column name as the time stamp and value as
the stuff you need (you could pack all the data you need into a structure
like JSON )
>
> CF's have a (potentially) large memory overhead. Use fewer and store mixed
but related content in them.
>
> Hope that helps.
> Aaron
>
>
> On 26 Mar 2011, at 05:38, Saurabh Sehgal wrote:
>
>> Thanks for all the responses.
>>
>> My leading questions then are ->
>>
>> - Should I go with the OrderPreservingPartitioner based on timestamps so
I can do time range queries - is this recommended ? any special cases
regarding load balancing I need to keep in mind ? I have read buzz over
blogs/forums on how RandomPartitioner yields better load balancing, and it
is discouraged to use OrderPreservingPartitioner. Can someone expand/comment
on this ?
>>
>> - Also, lets say I query all partitioned data between timestampuuid1 and
timestampuuid2 (over several weeks) .. this would potentially , in my case,
return anywhere to 20 - 30 million records. How would I go about aggregating
this data "by hand" ? Will this perform ?
>>
>> Since I am only interested in aggregating over a finite set of 10-20
attributes. Does it make more sense to have a column family per finite
attribute ? In this case, I do not need to do any aggregation, since all the
data for that attribute resides in one column family. Is there an upper
bound to the number of column families Cassandra currently supports ?
>>
>>
>>
>> On Fri, Mar 25, 2011 at 7:31 AM, buddhasystem  wrote:
>> Hello Saurabh,
>>
>> I have a similar situation, with a more complex data model, and I do an
>> equivalent of map-reduce "by hand". The redeeming value is that you have
>> complete freedom in how you hash, and you design the way you store
indexes
>> and similar structures. If there is a pattern in data store, you use it
to
>> your advantage. In the end, you get good performance.
>>
>> --
>> View this message in context:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/data-aggregation-in-Cassandra-tp6206994p6207879.html
>> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
Nabble.com.
>


How to repair HintsColumnFamily?

2011-03-26 Thread Shotaro Kamio
Hi,

Our cluster uses Cassandra 0.7.4 (upgraded from 0.7.3) with
replication = 3. I found that an error occurs on one node during hinted
handoff (log #1 below).
When I tried "scrub system HintsColumnFamily", I saw an ERROR in the
log (log #2 below).
Do you think these errors are critical?
I tried "repair system HintsColumnFamily", but it refuses to run
with "No neighbors". I can understand that, because hints are not
replicated. But then, is there any way to fix it without data loss?

 INFO [manual-repair-0996a2ec-26d3-4243-9586-d56daf30f9bd] 2011-03-27
13:55:05,664 AntiEntropyService.java (line 752) No neighbors to repair
with: manual-repair-0996a2ec-26d3-4243-9586-d56daf30f9bd completed.


Best regards,
Shotaro


 Log #1: Error on hinted handoff


ERROR [HintedHandoff:1] 2011-03-26 20:04:22,528
DebuggableThreadPoolExecutor.java (line 103) Error in
ThreadPoolExecutor
java.lang.RuntimeException: java.lang.RuntimeException: error reading
4976040 of 4976067
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: error reading 4976040 of 4976067
at 
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:83)
at 
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at 
org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
at 
org.apache.commons.collections.iterators.CollatingIterator.anyHasNext(CollatingIterator.java:364)
at 
org.apache.commons.collections.iterators.CollatingIterator.hasNext(CollatingIterator.java:217)
at 
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:63)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at 
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:116)
at 
org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:130)
at 
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1368)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1245)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1173)
at 
org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:321)
at 
org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:88)
at 
org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:409)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 3 more
Caused by: java.io.EOFException
at java.io.RandomAccessFile.readByte(RandomAccessFile.java:591)
at 
org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:324)
at 
org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:335)
at 
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:351)
at 
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:311)
at 
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:79)
... 21 more

--

 Log #2: Error on scrub ---

 INFO [CompactionExecutor:1] 2011-03-27 08:07:34,527
CompactionManager.java (line 512) Scrubbing
SSTableReader(path='/data/cassandra/system/HintsColumnFamily-f-530-Data.db')
 WARN [CompactionExecutor:1] 2011-03-27 08:07:34,602
CompactionManager.java (line 607) Non-fatal error reading row
(stacktrace follows)
java.io.IOError: java.io.IOException: Impossible row size 406136901
at 
org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:589)
at 
org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
at 
org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util

Re: How to repair HintsColumnFamily?

2011-03-26 Thread aaron morton
Similar case here 
http://www.mail-archive.com/user@cassandra.apache.org/msg11358.html

Suggestion was to run repair if scrub raises an error. 

Aaron

On 27 Mar 2011, at 16:17, Shotaro Kamio wrote:

> Hi,
> 
> Our cluster uses cassandra 0.7.4 (upgraded from 0.7.3) with
> replication = 3. I found that error occurs on one node during hinted
> handoff with following error (log #1 below).
> When I tried out "scrub system HintsColumnFamily", I saw an ERROR in
> log (log #2 below).
> Do you think these errors are critical ?
> I tried to "repair system HintsColumnFamily". But, it refuses to run
> with "No neighbors". I can understand because hints are not
> replicated. But then, is there any way to fix it without data loss?
> 
> INFO [manual-repair-0996a2ec-26d3-4243-9586-d56daf30f9bd] 2011-03-27
> 13:55:05,664 AntiEntropyService.java (line 752) No neighbors to repair
> with: manual-repair-0996a2ec-26d3-4243-9586-d56daf30f9bd completed.
> 
> 
> Best regards,
> Shotaro
> 
> 
>  Log #1: Error on hinted handoff
> 
> 
> ERROR [HintedHandoff:1] 2011-03-26 20:04:22,528
> DebuggableThreadPoolExecutor.java (line 103) Error in
> ThreadPoolExecutor
> java.lang.RuntimeException: java.lang.RuntimeException: error reading
> 4976040 of 4976067
>at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.RuntimeException: error reading 4976040 of 4976067
>at 
> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:83)
>at 
> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40)
>at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>at 
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
>at 
> org.apache.commons.collections.iterators.CollatingIterator.anyHasNext(CollatingIterator.java:364)
>at 
> org.apache.commons.collections.iterators.CollatingIterator.hasNext(CollatingIterator.java:217)
>at 
> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:63)
>at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>at 
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:116)
>at 
> org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:130)
>at 
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1368)
>at 
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1245)
>at 
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1173)
>at 
> org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:321)
>at 
> org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:88)
>at 
> org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:409)
>at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>... 3 more
> Caused by: java.io.EOFException
>at java.io.RandomAccessFile.readByte(RandomAccessFile.java:591)
>at 
> org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:324)
>at 
> org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:335)
>at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:351)
>at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:311)
>at 
> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:79)
>... 21 more
> 
> --
> 
>  Log #2: Error on scrub ---
> 
> INFO [CompactionExecutor:1] 2011-03-27 08:07:34,527
> CompactionManager.java (line 512) Scrubbing
> SSTableReader(path='/data/cassandra/system/HintsColumnFamily-f-530-Data.db')
> WARN [CompactionExecutor:1] 2011-03-27 08:07:34,602
> CompactionManager.java (line 607) Non-fatal error reading row
> (stacktrace follows)
> java.io.IOError: java.io.IOException: Impossible row size 406136901
>at 
> org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:589)
>at 
> org.apache.cassandra.db.CompactionManager.access$600(Compa

Re: How to repair HintsColumnFamily?

2011-03-26 Thread Shotaro Kamio
Hi Aaron,

I saw the discussion, and I already tried to repair. But it doesn't
work, since it's HintsColumnFamily: it's not replicated to other nodes.
Any suggestions are welcome.

Thanks,
Shotaro



On Sun, Mar 27, 2011 at 2:25 PM, aaron morton  wrote:
> Similar case
> here http://www.mail-archive.com/user@cassandra.apache.org/msg11358.html
> Suggestion was to run repair if scrub raises an error.
> Aaron
> On 27 Mar 2011, at 16:17, Shotaro Kamio wrote:
>
> Hi,
>
> Our cluster uses cassandra 0.7.4 (upgraded from 0.7.3) with
> replication = 3. I found that error occurs on one node during hinted
> handoff with following error (log #1 below).
> When I tried out "scrub system HintsColumnFamily", I saw an ERROR in
> log (log #2 below).
> Do you think these errors are critical ?
> I tried to "repair system HintsColumnFamily". But, it refuses to run
> with "No neighbors". I can understand because hints are not
> replicated. But then, is there any way to fix it without data loss?
>
> INFO [manual-repair-0996a2ec-26d3-4243-9586-d56daf30f9bd] 2011-03-27
> 13:55:05,664 AntiEntropyService.java (line 752) No neighbors to repair
> with: manual-repair-0996a2ec-26d3-4243-9586-d56daf30f9bd completed.
>
>
> Best regards,
> Shotaro
>
>
>  Log #1: Error on hinted handoff
> 
>
> ERROR [HintedHandoff:1] 2011-03-26 20:04:22,528
> DebuggableThreadPoolExecutor.java (line 103) Error in
> ThreadPoolExecutor
> java.lang.RuntimeException: java.lang.RuntimeException: error reading
> 4976040 of 4976067
>    at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>    at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>    at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>    at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.RuntimeException: error reading 4976040 of 4976067
>    at
> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:83)
>    at
> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40)
>    at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>    at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>    at
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
>    at
> org.apache.commons.collections.iterators.CollatingIterator.anyHasNext(CollatingIterator.java:364)
>    at
> org.apache.commons.collections.iterators.CollatingIterator.hasNext(CollatingIterator.java:217)
>    at
> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:63)
>    at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>    at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>    at
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:116)
>    at
> org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:130)
>    at
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1368)
>    at
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1245)
>    at
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1173)
>    at
> org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:321)
>    at
> org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:88)
>    at
> org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:409)
>    at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>    ... 3 more
> Caused by: java.io.EOFException
>    at java.io.RandomAccessFile.readByte(RandomAccessFile.java:591)
>    at
> org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:324)
>    at
> org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:335)
>    at
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:351)
>    at
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:311)
>    at
> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:79)
>    ... 21 more
>
> --
>
>  Log #2: Error on scrub ---
>
> INFO [CompactionExecutor:1] 2011-03-27 08:07:34,527
> CompactionManager.java (line 512) Scrubbing
> SSTableReader(path='/data/cassandra/system/HintsColumnFamily-f-530-Data.db')
> WARN [CompactionExecutor:1] 2011-03-27 08:07:34,602
> CompactionManager.java (line 607) Non-fatal error reading row
> (stacktrace follows)
> java.io.IOEr