Re: Some questions about using Binary Memtable to import data.
Thanks for the information. I have looked at some of the source code for the implementation, but two questions remain:

1. How do I know that a binary write message sent to an endpoint succeeded?
2. What happens if some of the natural endpoints are dead?

Thanks again.

On Wed, May 19, 2010 at 2:26 PM, Jonathan Ellis wrote:
> 1. yes
> 2. yes
> 3. compaction will slow down the load
> 4. it will flush the memtable
>
> On Tue, May 18, 2010 at 12:24 AM, Peng Guo wrote:
> > Hi All:
> >
> > I am trying to use Binary Memtable to import a large amount of data.
> >
> > After reading the wiki intro, http://wiki.apache.org/cassandra/BinaryMemtable,
> > I have some questions about using BinaryMemtable:
> >
> > 1. Will the data be replicated automatically?
> > 2. Can we modify data that already exists in Cassandra?
> > 3. What will happen if we do not turn off compaction?
> > 4. What will happen if the data exceeds the BinaryMemtableThroughputInMB limit?
> >
> > Thanks.
> >
> > --
> > Regards
> > Peng Guo
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com

--
Regards
Peng Guo
Cassandra compaction disk space logic
Hi!

We have a mail archive application, so we have a lot of data (30 TB across multiple nodes) and need to delete data after a few months of storage.

Questions:

1) Compaction requires extra space to run. What happens if a node has no extra space for compaction? Will it crash, or just stop the compaction process?
2) Is it possible to limit the maximum SSTable file size? I am worried about the following situation: we have a 1 TB disk, 600 GB of data in a single file, and need to delete 50 GB of outdated data. Compaction could then generate another ~550 GB data file, which cannot fit on the disk.
3) If we have 30 TB of data plus replicas, how much disk space is required to handle this, including adding new data, deleting old data, compaction, etc.?
4) What happens if we run decommission but the target node does not have enough disk space?
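To put rough numbers on question 2, here is a back-of-the-envelope sketch in Python, assuming the old SSTable is only removed after the merged SSTable has been fully written (an assumption about the general mechanism, not a documented figure; the exact bookkeeping depends on the Cassandra version):

    # Back-of-the-envelope sketch of peak disk usage during a compaction.
    # Assumption (not an official Cassandra figure): the old SSTable is only
    # removed once the merged output has been fully written, so for a short
    # time both files coexist on disk.

    def compaction_peak_gb(existing_sstable_gb, reclaimable_gb):
        """Estimate peak disk usage while compacting one SSTable."""
        new_sstable_gb = existing_sstable_gb - reclaimable_gb  # merged output
        return existing_sstable_gb + new_sstable_gb            # old + new coexist briefly

    # The scenario from question 2: 600 GB file, 50 GB of outdated data.
    peak = compaction_peak_gb(600, 50)
    print(peak)            # ~1150 GB
    print(peak <= 1000)    # False: does not fit on a 1 TB disk

In other words, with a single 600 GB file the peak can briefly approach 600 + 550 GB, which is exactly the situation I am worried about.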
Strange error with data reading
Hello!

I have a 3-node cluster: node1, node2, node3, with replication factor = 2. I ran decommission on node3 and it is in progress, moving data to node1. Ring on all nodes shows all 3 nodes up, no problems (but node1 responds with a 3-5 second delay).

I tried to execute a few "get" statements using the cli, like

get MailArchive.Meta['ec3-n2:1274046482!5C/9B-05558-11860FB4!c']

On node 1 and node 3 everything works fine, but on node 2 the cli always returns "Exception null". The data is 3 days old, so it doesn't seem like a temporary effect. There are no errors in the log, and restarting node 2 doesn't help either. tpstats returns 0 active/pending in all rows.

What is going on?
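If it helps to narrow this down, here is a rough diagnostic sketch (Python; read_key is a placeholder you would wire to your actual client, not a real API): read the same key through each node at ConsistencyLevel ONE and again at QUORUM, to see whether a replica is actually missing the data or whether one coordinator just can't reach a live replica during the decommission.

    # Hypothetical diagnostic sketch. read_key is a stand-in you would implement
    # with your real client (Thrift, etc.). Reading through each node at ONE and
    # QUORUM helps distinguish "a replica is missing the data" from "this
    # coordinator cannot reach a live replica".

    KEY = "ec3-n2:1274046482!5C/9B-05558-11860FB4!c"
    NODES = ["node1", "node2", "node3"]

    def read_key(node, consistency_level):
        """Placeholder: hook up your Cassandra client here."""
        raise NotImplementedError("wire this to a real client call against `node`")

    def probe_all(nodes=NODES):
        results = {}
        for node in nodes:
            for level in ("ONE", "QUORUM"):
                try:
                    results[(node, level)] = read_key(node, level)
                except Exception as exc:   # timeouts, unavailable, etc.
                    results[(node, level)] = "ERROR: %r" % exc
        return results

    # for (node, level), value in sorted(probe_all().items()):
    #     print(node, level, value)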
Re: Data migration from mysql to cassandra
Thanks Jonathan, using mysql as an id sequence generator is definitely a good option. One thing though: does using sequential ids defeat the purpose of the random partitioner?

On Tue, May 18, 2010 at 11:25 PM, Jonathan Ellis wrote:
> Those are 2 of the 3 options (the other one being: continue to
> generate incrementing IDs, either by continuing to use mysql solely for
> that purpose, or by using another system like redis for that).
>
> On Mon, May 17, 2010 at 10:48 PM, Beier Cai wrote:
> > I'm currently moving my existing mysql database to cassandra. One particular
> > problem I have is migrating all those integer auto-increment ids to
> > keys generated in code (like UUIDs). One way I can do this is to dump all
> > the existing records into Cassandra and start with UUIDs for new records, but
> > this will leave a mix of id styles. Another way I can think of is to
> > re-create the existing records using UUIDs and deal with all
> > those referential keys. Either way seems kind of awkward. Is there any
> > good practice for dealing with this? I know many people here come from mysql,
> > what did you do?
> >
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
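On the partitioner question: with the random partitioner the row key is run through MD5 before placement, so even strictly sequential ids end up scattered around the ring. A small standard-library-only Python sketch of that idea (md5_token is a rough stand-in, not the exact token computation Cassandra uses):

    # Sketch: why sequential ids do not defeat the random partitioner.
    # A random partitioner places a row by an MD5 hash of its key, so
    # consecutive keys land at unrelated positions on the ring.

    import hashlib
    import uuid

    def md5_token(key):
        """Rough stand-in for deriving a token from a row key via MD5."""
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    # Sequential legacy ids hash to widely scattered tokens:
    for legacy_id in ("1000001", "1000002", "1000003"):
        print(legacy_id, md5_token(legacy_id))

    # New records could use client-generated UUIDs instead:
    print(uuid.uuid1())   # time-based UUID
    print(uuid.uuid4())   # random UUID

So mixing old sequential ids with UUIDs for new records should not hurt data distribution; the awkwardness is only in the application having two key styles.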
key path vs super column
We are currently working on a prototype that uses Cassandra for a realtime-ish statistics system. This seems to be quite a common use case, so if people are interested, maybe it would be worth collaborating on this beyond design discussions on the list. But first let me explain our approach and where we could use some input.

We are storing the raw events in minute buckets:

<minute bucket> => {
  <event key> => { 'id' => 1, 'attrA' => 'a1', 'attrB' => 'b1' },
  <event key> => { 'id' => 2, 'attrA' => 'a2', 'attrB' => 'b1' }
  ...
}

The number of attributes is quite limited currently (below 20), and for now we only plan to have no more than 1000 events per minute. So this should really be a piece of cake for Cassandra, and with this little data using a super column should be no problem.

Now the idea is to iterate over the minute buckets and build hour, day, month and year aggregates. With that, getting the totals across a certain time frame is nothing more than a few gets (or a multiget) and summing it all up. I guess the idea is straightforward.

One could use a super column to store and access the aggregated data from the time buckets:

<time bucket> => {
  'id/1' => { 'count' => 12 },
  'id/2' => { 'count' => 21 }
  ...
}

While this feels natural, the hierarchy might not be the best choice with the current Cassandra if the number of different ids becomes too large, IIUC. One could also move the id part into the row key space instead:

<time bucket> + 'id/1' => 12
<time bucket> + 'id/2' => 21

...at least as long as we don't have to access all data for one time slot (like one hour in this case). (This should still be possible with a row key range query, though, if the ordered partitioner is being used.)

Q: Is the only difference the limitation on row size? What performance considerations weigh in for one or the other approach? Does Cassandra first have to load the whole row into memory before one can access e.g. "id/1" with the super column approach?

cheers
--
Torsten
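For what it's worth, here is a short sketch of the hourly roll-up under the second layout (id moved into the row key). The client object and its get_row/insert methods are hypothetical stand-ins rather than a real driver API; only the key and column naming follows the scheme above.

    # Sketch of building hourly aggregates from minute buckets under the
    # "id in the row key" layout. `client.get_row` and `client.insert` are
    # hypothetical stand-ins for a real Cassandra client.

    from collections import defaultdict

    def rollup_hour(client, hour_bucket, minute_buckets):
        """Sum per-id event counts from minute buckets into one hourly bucket."""
        totals = defaultdict(int)
        for minute in minute_buckets:
            events = client.get_row("Events", minute)   # { event_key: {'id': ..., ...} }
            for event in events.values():
                totals["id/%s" % event["id"]] += 1
        # One row per (hour bucket, id) pair, e.g. "<hour bucket> + id/1" => 12
        for id_key, count in totals.items():
            client.insert("HourlyCounts",
                          "%s + %s" % (hour_bucket, id_key),
                          {"count": count})
        return totals

The super-column variant would instead write all the id/N subcolumns under a single <time bucket> row, which is where the row-size and "whole row in memory" questions come in.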
Re: Cassandra compaction disk space logic
2010/5/19 Maxim Kramarenko:
> Hi!
>
> We have a mail archive application, so we have a lot of data (30 TB across
> multiple nodes) and need to delete data after a few months of storage.
>
> Questions:
>
> 1) Compaction requires extra space to run. What happens if a node has no
> extra space for compaction? Will it crash, or just stop the compaction process?

Stop compaction.

> 2) Is it possible to limit the maximum SSTable file size? I am worried about the
> following situation: we have a 1 TB disk, 600 GB of data in a single file, and
> need to delete 50 GB of outdated data. Compaction could then generate another
> ~550 GB data file, which cannot fit on the disk.

Not currently. You have to manage this operationally.

> 3) If we have 30 TB of data plus replicas, how much disk space is required to
> handle this, including adding new data, deleting old data, compaction, etc.?

All of that depends on the cardinality of those operations (how much is
deleted, etc.). You're going to have to benchmark it.

> 4) What happens if we run decommission but the target node does not have enough
> disk space?

I don't know. Please let me know when you find out. :)

-ryan
Ring out of sync, cassandra_UnavailableException being thrown
In a 5-node cluster, I noticed in our client error log that one of the nodes was consistently throwing cassandra_UnavailableException during a read operation. Looking at JMX, it was obvious that one node's view of the ring was out of sync.

$ nodetool -host 192.168.20.150 ring
Address         Status  Load     Range                                        Ring
                                 139508497374977076191526400448759597506
192.168.20.156  Up      5.73 GB  733665530305941485083898696792520436        |<--|
192.168.20.158  Up      3.41 GB  9629533262984150011756238989685472219       |   ^
192.168.20.154  Up      2.44 GB  31048334058970902242412812423471654868      v   |
192.168.20.150  Up      4.89 GB  105769574715070648260922426249777160699     |   ^
192.168.20.152  Up      5.24 GB  139508497374977076191526400448759597506     |-->|

$ nodetool -host 192.168.20.158 ring
Address         Status  Load     Range                                        Ring
192.168.20.158  Up      3.41 GB  9629533262984150011756238989685472219       |<--|

Looking at the CF stats on that node, it is obvious that reads and writes are happening, but I have to assume those are coming from proxy connections via the other nodes. When restarting that node, the error logs on the other cluster nodes show that they detect the server going away and then coming back into the ring:

INFO [WRITE-/192.168.20.158] 2010-05-19 21:27:39,448 OutboundTcpConnection.java (line 102) error writing to /192.168.20.158
INFO [WRITE-/192.168.20.158] 2010-05-19 21:27:55,475 OutboundTcpConnection.java (line 102) error writing to /192.168.20.158
INFO [GMFD:1] 2010-05-19 21:27:56,481 Gossiper.java (line 582) Node /192.168.20.158 has restarted, now UP again
INFO [GMFD:1] 2010-05-19 21:27:56,482 StorageService.java (line 538) Node /192.168.20.158 state jump to normal

Any ideas on how to kick that node and remind it of its buddies?

thanks!
-keith
Re: Disk usage doubled after nodetool decommission and node still in ring
Run nodetool streams.

On May 18, 2010 4:14 PM, "Maxim Kramarenko" wrote:

Hi!

After nodetool decommission, the data size on all nodes grew to twice its previous size; the node is still up and in the ring, and there is no streaming and no tmp SSTables now.

BTW, I have an ssh connection to the server, so after running nodetool decommission I assume the server has received the command, then press Ctrl-C and close the shell. Is that correct?

What is the best way to check the current node state, i.e. to check whether decommission has finished? Should the node accept new data after I run the "decommission" command?
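For the "has it finished" question, one low-tech option is simply to keep re-running nodetool streams until nothing is left. A minimal Python sketch, assuming you would rather eyeball the output than parse it (the output format varies by version):

    # Sketch: periodically re-run `nodetool streams` against a host and print
    # the output, so you can watch the decommission drain. No parsing is
    # attempted because the output format differs between versions.

    import subprocess
    import time

    def watch_streams(host, interval_seconds=60):
        while True:
            result = subprocess.run(["nodetool", "-h", host, "streams"],
                                    capture_output=True, text=True)
            print(result.stdout)
            time.sleep(interval_seconds)

    # watch_streams("node1")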
Re: ConcurrentModificationException in gossiper while decommissioning another node
that sounds like it, thanks

On Tue, May 18, 2010 at 3:53 PM, roger schildmeijer wrote:
> This is hopefully fixed in trunk (CASSANDRA-757, revision 938597):
> "Replace synchronization in Gossiper with concurrent data structures and
> volatile fields."
>
> // Roger Schildmeijer
>
>
> On Tue, May 18, 2010 at 1:55 PM, Ran Tavory wrote:
>
>> While the node 192.168.252.61 was in the process of decommissioning I see
>> this error in two other nodes:
>>
>> INFO [Timer-1] 2010-05-18 06:01:12,048 Gossiper.java (line 179) InetAddress /192.168.252.62 is now dead.
>> INFO [GMFD:1] 2010-05-18 06:04:00,189 Gossiper.java (line 568) InetAddress /192.168.252.62 is now UP
>> INFO [Timer-1] 2010-05-18 06:11:45,311 Gossiper.java (line 401) FatClient /192.168.252.61 has been silent for 360ms, removing from gossip
>> ERROR [Timer-1] 2010-05-18 06:11:45,315 CassandraDaemon.java (line 88) Fatal exception in thread Thread[Timer-1,5,main]
>> java.lang.RuntimeException: java.util.ConcurrentModificationException
>>     at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:97)
>>     at java.util.TimerThread.mainLoop(Timer.java:512)
>>     at java.util.TimerThread.run(Timer.java:462)
>> Caused by: java.util.ConcurrentModificationException
>>     at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
>>     at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:382)
>>     at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:91)
>>     ... 2 more
>>
>>
>> .61 is the decommissioned node. .62 was under load (streams transferred to
>> it from .61).
>>
>> I simply ran nodetool decommission on the .61 node and then (after an hour,
>> I guess) I saw this error in two other live nodes.
>>
>> Does this ring any bell? It's either a bug, or I wasn't
>> running decommission correctly...
>>
>
>
Re: decommission and org.apache.thrift.TApplicationException: get_slice failed: unknown result
My decommission was progressing OK, although very slowly, but I'll send another question to the list about that... The exception must have been a hiccup; I hope I won't get it again, I suppose...

On Tue, May 18, 2010 at 4:10 PM, Gary Dusbabek wrote:
> If I had to guess, I'd say that something at the transport layer had
> trouble. Possibly some kind of thrift hiccup that we haven't seen
> before.
>
> Your description makes it sound as if the decommission is proceeding
> normally though.
>
> Gary.
>
> On Tue, May 18, 2010 at 04:42, Ran Tavory wrote:
> > What's the correct way to remove a node from a cluster?
> > According to this page http://wiki.apache.org/cassandra/Operations a
> > decommission call should be enough.
> > When decommissioning one of the nodes from my cluster I see an error in the
> > client:
> >
> > org.apache.thrift.TApplicationException: get_slice failed: unknown result
> >     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:407)
> >     at org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:367)
> >
> > The client isn't talking to the decommissioned node; it's connected to
> > another node, so I'd expect all operations to continue as normal (although
> > slower), right?
> > I simply called "nodetool -h ... decommission" on the host and waited. After
> > a while, while the node was still decommissioning, I saw the error at the
> > client.
> > The current state of the node is Decommissioned and it's not in the ring
> > now. It is still moving streams to other hosts, though. I can't be sure,
> > though, whether the error happened during the time it was Leaving the ring or
> > whether it was already Decommissioned.
> > The server logs don't show anything of note (no errors or warnings).
> > What do you think?
>
how to decommission two slow nodes?
In my cluster setup I have two datacenters, with 5 hosts in one DC and 3 in the other. In the 5-host DC I'd like to remove two hosts so I'd end up with 3 and 3 in each. The two nodes I'd like to decommission have less RAM than the other 3, so they operate more slowly. What's the most effective way to decommission them?

At first I thought I'd decommission the first and then, when it's done, decommission the second. The problem was that when I decommissioned the first, it started streaming its data to the second node (as well as others, I think), and since the second node was under heavy load and did not have enough RAM, it was busy GCing and worked horribly slowly. Eventually, after almost 24h of horribly slow streaming, I gave up. This also caused the entire cluster to operate horribly slowly.

So, is there a better way to decommission the two under-provisioned nodes without slowing down the cluster, or at least with minimum effect?

My replication factor is 2 and I'm using RackAwareStrategy, so (if everything is configured correctly with the EndPointSnitch) at any given time two copies of the data exist, one in each DC.

Thanks