Re: singular or plural column family names

2010-07-27 Thread Aaron Morton
For RDBMS I *always* used singular for table names. And was prepared to back up this position with force if necessary :)

Anyway, nowadays it's all about how it makes you feel inside. And I feel it should still be singular. It tends to work better when there are multiple CFs related to the same logical entity, e.g. Monkey and MonkeyIndex or MonkeyAccess vs Monkeys and MonkeysIndex or MonkeysAccess.

Aaron

On 27 Jul, 2010, at 06:38 PM, uncle mantis wrote:
I know this is an age old question. Kinda like the chicken and the egg. I know that everyone's solution is different but I wanted to get an open opinion.

Do you all use singular or plural column family names in your keyspaces?

I have been using plural for years, but I have worked at jobs that used singular, and the reasoning behind it made sense too.

Thanks!

Regards,
Michael


SV: what causes MESSAGE-DESERIALIZER-POOL to spike

2010-07-27 Thread Thorvaldsson Justus
AFAIK you could use more nodes and read from them in parallel, making your read
rate go up. Also, not writing and reading to the same disk may help some. It's
not so much about "Cassandra's" read rate as what your hardware can manage.

/Justus

From: Dathan Pattishall [mailto:datha...@gmail.com]
Sent: 27 July 2010 08:56
To: user@cassandra.apache.org
Subject: Re: what causes MESSAGE-DESERIALIZER-POOL to spike


On Mon, Jul 26, 2010 at 8:30 PM, Jonathan Ellis 
mailto:jbel...@gmail.com>> wrote:
MDP is backing up because RRS is full at 4096.  This means you're not
able to process reads as quickly as the requests are coming in.  Make
whatever is doing those reads be less aggressive.

So, for Cassandra to function correctly I need to throttle my reads? What
request rate is ideal? 100s of reads a second? 1000s? For me, I would love to
do 100s of thousands of reads a second. Is Cassandra not suited for this?
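One way to make the reading side "less aggressive", as Jonathan suggests, is a client-side throttle. A minimal token-bucket sketch (illustrative only, not from this thread; the rate and burst numbers are placeholders, not recommendations):

```python
import time

class TokenBucket:
    """Client-side read throttle: allow at most `rate` requests/sec."""

    def __init__(self, rate, burst=None):
        self.rate = float(rate)               # tokens added per second
        self.capacity = float(burst or rate)  # max accumulated tokens
        self.tokens = self.capacity
        self.stamp = time.monotonic()

    def acquire(self, n=1):
        # Refill based on elapsed time, then block until n tokens exist.
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.stamp) * self.rate)
            self.stamp = now
            if self.tokens >= n:
                self.tokens -= n
                return
            time.sleep((n - self.tokens) / self.rate)

# e.g. wrap each read:
#   bucket = TokenBucket(rate=500)   # cap this client at 500 reads/sec
#   bucket.acquire(); client.get_slice(...)
```

Throttling in the client keeps ROW-READ-STAGE's pending queue from pinning at its 4096 cap while you work out the real bottleneck.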


As to why the reads are slow in the first place, usually this means
you are disk i/o bound.  Posting your cfstats can help troubleshoot
but is no substitute for thinking about your application workload.

How should I think about my application workload? I use Cassandra as a
distributed hash table, accessing it by individual keys (O(1)). I randomly hit
a node through an F5 load balancer, using the CF definition from the sample
storage-conf.xml. Each key is no more than 30 bytes; the value is a timestamp.
I store a total of 20 million keys and update 1.5 million keys a day. Is there
anything else I should really think about? What are the limitations in
Cassandra that would affect this workload?





On Mon, Jul 26, 2010 at 12:32 PM, Anthony Molinaro
mailto:antho...@alumni.caltech.edu>> wrote:
> It's usually I/O which causes backup in MESSAGE-DESERIALIZER-POOL.  You
> should check iostat and see what it looks like.  It may be that you
> need more nodes in order to deal with the read/write rate.   You can also
> use JMX to get latency values on reads and writes and see if the backup
> has a corresponding increase in latency.  You may be able to get more
> out of your hardware and memory with row caching but that really depends
> on your data set.
>
> -Anthony
>
> On Mon, Jul 26, 2010 at 12:22:46PM -0700, Dathan Pattishall wrote:
>> I have 4 nodes on enterprise type hardware (Lots of Ram 12GB, 16 i7 cores,
>> RAID Disks).
>>
>> ~# /opt/cassandra/bin/nodetool --host=localhost --port=8181 tpstats
>> Pool Name                    Active   Pending  Completed
>> STREAM-STAGE                      0         0          0
>> RESPONSE-STAGE                    0         0     516280
>> ROW-READ-STAGE                    8      4096    1164326
>> LB-OPERATIONS                     0         0          0
>> *MESSAGE-DESERIALIZER-POOL        1    682008    1818682*
>> GMFD                              0         0       6467
>> LB-TARGET                         0         0          0
>> CONSISTENCY-MANAGER               0         0     661477
>> ROW-MUTATION-STAGE                0         0     998780
>> MESSAGE-STREAMING-POOL            0         0          0
>> LOAD-BALANCER-STAGE               0         0          0
>> FLUSH-SORTER-POOL                 0         0          0
>> MEMTABLE-POST-FLUSHER             0         0          4
>> FLUSH-WRITER-POOL                 0         0          4
>> AE-SERVICE-STAGE                  0         0          0
>> HINTED-HANDOFF-POOL               0         0          3
>>
>> EQX r...@cass04:~# vmstat -n 1
>>
>> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>>  r  b   swpd   free   buff    cache   si   so    bi    bo    in    cs us sy id wa st
>>  6 10   7096 121816  16244 10375492    0    0     1     3     0     0  5  1 94  0  0
>>  2 10   7096 116484  16248 10381144    0    0  5636     4 21210  9820  2  1 79 18  0
>>  1  9   7096 108920  16248 10387592    0    0  6216     0 21439  9878  2  1 81 16  0
>>  0  9   7096 129108  16248 10364852    0    0  6024     0 23280  8753  2  1 80 17  0
>>  2  9   7096 122460  16248 10370908    0    0  6072     0 20835  9461  2  1 83 14  0
>>  2  8   7096 115740  16260 10375752    0    0  5168   292 21049  9511  3  1 77 20  0
>>  1 10   7096 108424  16260 10382300    0    0  6244     0 21483  8981  2  1 75 22  0
>>  3  8   7096 125028  16260 10364104    0    0  5584     0 21238  9436  2  1 81 16  0
>>  3  9   7096 117928  16260 10370064    0    0  5988     0 21505 10225  2  1 77 19  0
>>  1  8   7096 109544  16260 10376640    0    0  6340    28 20840  8602  2  1 80 18  0
>>  0  9   7096 127028  16240 10357652    0    0  5984     0 20853  9158  2  1 79 18  0
>>  9  0   7096 121472  16240 10363492    0    0  5716     0 20520  8489  1  1 82 16  0
>>  3  9   7096 112668  16240 10369872    0    0  6404     0 21314  9459  2  1 84 13  0
>>  1  9   7096 127300  16236 10353440    0    0  5684

SV: Help! Cassandra Data Loader threads are getting stuck

2010-07-27 Thread Thorvaldsson Justus
I made one program doing just this with Java. Basically:

I read with one thread from the file into an array, stopping when its size
reaches 20k and waiting until it drops below 20k before continuing to read the
data file. (This is the raw data I want to move.)

I have n threads, each with one batch of their own and one connection to
Cassandra of their own. They fill their batch with data taken out of the array
(this is synchronized); when a batch reaches 1k it is sent to Cassandra.

I had some problems, but none regarding Cassandra; it was my own code that
faltered. I could provide code if you want.
Justus
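The pattern Justus describes (one reader thread filling a bounded buffer, n workers each batching 1k rows) can be sketched in Python; `send_batch` is a hypothetical stand-in for a worker's batch insert over its own Cassandra connection:

```python
import queue
import threading

BATCH = 1_000       # rows per batch sent to Cassandra
QUEUE_MAX = 20_000  # the reader blocks while the buffer holds 20k lines

def load(lines, n_workers, send_batch):
    buf = queue.Queue(maxsize=QUEUE_MAX)  # the synchronized "array"

    def worker():
        # One batch and (conceptually) one Cassandra connection per thread.
        batch = []
        while True:
            line = buf.get()
            if line is None:          # poison pill: flush and exit
                if batch:
                    send_batch(batch)
                return
            batch.append(line)
            if len(batch) == BATCH:
                send_batch(batch)
                batch = []

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for line in lines:                # reader: blocks when buffer is full
        buf.put(line)
    for _ in workers:                 # one pill per worker
        buf.put(None)
    for w in workers:
        w.join()
```

The bounded queue gives the stop-at-20k/resume-below-20k behaviour for free, since `put()` blocks while the buffer is full.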

From: Aaron Morton [mailto:aa...@thelastpickle.com]
Sent: 26 July 2010 23:32
To: user@cassandra.apache.org
Subject: Re: Help! Cassandra Data Loader threads are getting stuck

Try running it without threading to see if it's a cassandra problem or an issue 
with your threading.

Perhaps split the file and run many single threaded processes to load the data.

Aaron

On 27 Jul, 2010, at 07:14 AM, Rana Aich wrote:
Hi All,

I have to load a huge quantity of data into Cassandra (~10 billion rows).

I'm trying to load the Data from files using multithreading.

The idea is that each thread will read the TAB-delimited file and process a
chunk of records.

For example, Thread 1 reads lines 1-1000 and inserts into Cassandra.
Thread 2 reads lines 1001-2000 and inserts into Cassandra.
Thread 3 reads lines 2001-3000 and inserts into Cassandra.
...
Thread 10 reads lines 9001-10000 and inserts into Cassandra.
Thread 1 reads lines 10001-11000 and inserts into Cassandra.
Thread 2 reads lines 11001-12000 and inserts into Cassandra.

and so on...

I'm testing with a small file of 20,000 records.

But somehow the process gets stuck and doesn't proceed any further after 
processing say 16,000 records.

I've attached my working file.

Any help will be very much appreciated.

Regards

raich


Re: what causes MESSAGE-DESERIALIZER-POOL to spike

2010-07-27 Thread Dathan Pattishall
Ah, the weird thing is that I/O is assumed to be the limiting factor, but
IOPS on the box were very low. Service time and atime were very low, and the
data access was only 6MB a second. With all of this, I'm tending to believe
that the problem may be someplace else.

Maybe there is a preferred Java version for Cassandra 0.6.3? I am not
running the latest 1.6 in production.


On Tue, Jul 27, 2010 at 12:01 AM, Thorvaldsson Justus <
justus.thorvalds...@svenskaspel.se> wrote:

>  AFAIK You could use more nodes and read in parallel from them making your
> read rate go up. Also don’t write and read to the same disk may help some.
> It’s not so much about “Cassandra’s” read rate but what your hardware can
> manage.
>
>
>
> /Justus
>

Re: Key Caching

2010-07-27 Thread Peter Schuller
> @Todd, I noticed some new ops in your cassandra.in.sh. Is there any
> documentation on what these ops are, and what they do?
>
> For instance AggressiveOpts, etc.

A fairly complete list is here:

http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp

-- 
/ Peter Schuller


Re: what causes MESSAGE-DESERIALIZER-POOL to spike

2010-07-27 Thread Peter Schuller
> Ah, the weird thing is I/O is assumed to be the limiting factor, but iops on
> the box was very low. Service time and atime very low, and the data access
> was only 6MB a second. With all of this, I'm tending to believe that the
> problem may be someplace else.

Your vmstat output shows idle and wait time. I suspect an "iostat -x 1"
will show that you're not keeping your underlying device busy (the
right-most column not being stuck at 100% or close to it). Is this the
case? If it is at or close to 100%, you'd want to look at the average
queue size column too. But given the vmstat output I doubt this is the
case, since you should either be seeing a lot more wait time or a lot
less idle time.

The question is what the limiting factor is. What does jconsole/etc
say about the state of the threads in ROW-READ-STAGE? Statistically if
you poll them a few times; what does it mostly seem to be waiting on?
Normally, if the expectation is that ROW-READ-STAGE is disk bound,
that should show up by the threads usually being busy waiting for disk
I/O upon repeated polling (or waiting for more work to do if they are
idle).
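One low-tech way to do that repeated polling is to take several `jstack` dumps a few seconds apart and tally the states of the stage's threads. A sketch (the thread-name match is an assumption based on Cassandra 0.6's stage naming):

```python
import re
from collections import Counter

def stage_states(thread_dump, stage="ROW-READ-STAGE"):
    # Tally java.lang.Thread.State values for one stage's threads in a
    # jstack / kill -3 dump. Note: threads blocked in a disk read syscall
    # show up as RUNNABLE, so also compare stack frames across dumps to
    # tell CPU-bound work from I/O waits; WAITING usually means idle.
    states = Counter()
    for name, state in re.findall(
            r'"([^"]+)"[^\n]*\n\s+java\.lang\.Thread\.State: (\S+)',
            thread_dump):
        if stage in name:
            states[state] += 1
    return states
```

Feeding it a few dumps taken seconds apart gives the statistical picture described above without attaching jconsole.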

(And btw for some reason I totally missed the fact that ROW-READ-STAGE
had 4096 pending and 8 active... oh well.)

-- 
/ Peter Schuller


Re: what causes MESSAGE-DESERIALIZER-POOL to spike

2010-07-27 Thread Peter Schuller
> average queue size column too. But given the vmstat output I doubt
> this is the case since you should either be seeing a lot more wait
> time or a lot less idle time.

Hmm, another thing: you mention 16 i7 cores. I presume that's 16 in
total, counting hyper-threading? Because that means 8 threads should
only be able to saturate 50% (as perceived by the operating system). If
you have 32 virtual cores (can you even get that yet?) then I'd say that
your vmstat output could be consistent with ROW-READ-STAGE being CPU
bound rather than disk bound (presumably with data fitting in cache
and not having to go down to disk). If this is the case, increasing
read concurrency should at least make the actual problem more obvious
(i.e., achieving CPU saturation), though it probably won't increase
throughput much unless Cassandra is very friendly to
hyperthreading...

-- 
/ Peter Schuller


SV: Key Caching

2010-07-27 Thread Thorvaldsson Justus
I can test on 3 servers, using up to 86GB on each; is there anything
specific you want to test in this case? I am using Cassandra 0.6.3 and
running with a much smaller amount of RAM, but if you think it is
interesting I will add it to my ToDo list. I don't know if I will have
more servers soon, because that is under consideration.

/Justus

-----Original Message-----
From: B. Todd Burruss [mailto:bburr...@real.com]
Sent: 27 July 2010 01:33
To: user@cassandra.apache.org
Subject: Re: Key Caching

i run cassandra with a 30gb heap on machines with 48gb total with good
results.  i don't use more just because i want to leave some for the OS
to cache disk pages, etc.  i did have the problem a couple of times with
GC doing a full stop on the JVM because it couldn't keep up.  my
understanding of the CMS GC is that it kicks in when a certain
percentage of the JVM heap is used.  by tweaking
XX:CMSInitiatingOccupancyFraction you can make this kick in sooner (or
later) and this fixed it for me.

my JVM opts differ just slightly from the latest cassandra changes in
0.6

JVM_OPTS=" \
-ea \
-Xms30G \
-Xmx30G \
-XX:SurvivorRatio=128 \
-XX:MaxTenuringThreshold=0 \
-XX:TargetSurvivorRatio=90 \
-XX:+AggressiveOpts \
-XX:+UseParNewGC \
-XX:+UseConcMarkSweepGC \
-XX:+CMSParallelRemarkEnabled \
-XX:CMSInitiatingOccupancyFraction=88 \
-XX:+HeapDumpOnOutOfMemoryError \
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc \
-Dnetworkaddress.cache.ttl=60 \
-Dcom.sun.management.jmxremote.port=6786 \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.authenticate=false \
"






On Mon, 2010-07-26 at 14:04 -0700, Peter Schuller wrote:
> > If the cache is stored in the heap, how big can the heap be made
> > realistically on a 24gb ram machine? I am a java newbie but I have read
> > concerns with going over 8gb for the heap as the GC can be too painful/take
> > too long. I already have seen timeout issues (node is dead errors) under
> > load during GC or compaction. Can/should the heap be set to 16gb with 24gb
> > ram?
> 
> I have never run Cassandra in production with such a large heap, so
> I'll let others comment on practical experience with that.
> 
> In general however, with the JVM and the CMS garbage collector (which
> is enabled by default with Cassandra), having a large heap is not
> necessarily a problem depending on the application's workload.
> 
> In terms of GC:s taking too long - with the default throughput
> collector used by the JVM you will tend to see the longest pause times
> scale roughly linearly with heap size. Most pauses would still be
> short (these are what is known as young generation collections), but
> periodically a so-called full collection is done. With the throughput
> collector, this implies stopping all Java threads while the *entire*
> Java heap is garbage collected.
> 
> With the CMS (Concurrent Mark/Sweep) collector the intent is that the
> periodic scans of the entire Java heap are done concurrently with the
> application without pausing it. Fallback to full stop-the-world
> garbage collections can still happen if CMS fails to complete such
> work fast enough, in which case tweaking of garbage collection
> settings may be required.
> 
> One thing to consider in any case is how much memory you actually
> need; the more you give to the JVM, the less there is left for the OS
> to cache file contents. If for example your true working set in
> cassandra is, to grab a random number, 3 GB and you set the heap
> size to 15 GB - now you're wasting a lot of memory by allowing the JVM
> to postpone GC until it starts approaching the 15 GB mark. This is
> actually good (normally) for overall GC throughput, but not
> necessarily good overall for something like cassandra where there is a
> direct trade-off with cache eviction in the operating system possibly
> causing additional I/O.
> 
> Personally I'd be very interested in hearing any stories about running
> cassandra nodes with 10+ gig heap sizes, and how well it has worked.
> My gut feeling is that it should work reasonably well, but I have no
> evidence of that and I may very well be wrong. Anyone?
> 
> (On a related noted, my limited testing with the G1 collector with
> Cassandra has indicated it works pretty well. Though I'm concerned
> with the weak ref finalization based cleanup of compacted sstables
> since the G1 collector will be much less deterministic in when a
> particular object may be collected. Has anyone deployed Cassandra with
> G1 on very large heaps under real load?)
> 




Re: Key Caching

2010-07-27 Thread Dathan Pattishall
woot thnx, lots of knobs to play with!

On Tue, Jul 27, 2010 at 12:16 AM, Peter Schuller <
peter.schul...@infidyne.com> wrote:

> > @Todd, I noticed some new ops in your cassandra.in.sh. Is there any
> > documentation on what these ops are, and what they do?
> >
> > For instance AggressiveOpts, etc.
>
> A fairly complete list is here:
>
> http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp
>
> --
> / Peter Schuller
>


Re: Cassandra to store 1 billion small 64KB Blobs

2010-07-27 Thread aaron morton
> Some possibilities open up when using OPP, especially with aggregate
> keys. This is more of an option when RF==cluster size, but not
> necessarily a good reason to make RF=cluster size if you haven't
> already.

This use of the OPP sounds like the way Lucandra stores data; they
want to have range scans and some random key distribution.

http://github.com/tjake/Lucandra

See the hash_key() function in CassandraUtils.java for how they manually hash 
the key before storing it in cassandra. 
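The technique amounts to prefixing each key with a hash of itself. A rough Python equivalent (illustrative only, not Lucandra's actual implementation):

```python
import hashlib

def hash_key(key):
    # Prefix the key with a slice of its MD5 so that rows scatter evenly
    # around an OrderPreservingPartitioner ring; keeping the original key
    # as the suffix preserves readability and uniqueness.
    return hashlib.md5(key.encode("utf-8")).hexdigest()[:8] + ":" + key
```

The trade-off is that you give up meaningful range scans over the raw keys in exchange for even load distribution under OPP.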


> 64MB per row, 1MB columns
> customerABC:file123:00000000 (colnames: 00000000, 00100000, 00200000, ...)
> customerABC:file123:04000000 (colnames: 04000000, 04100000, ...)
> if 0xFFFFFFFF is not enough for the file size (4,294,967,295), then
> you can start with 10 or 12 digits instead (up to 2.8e+14)
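A small sketch of that offset-to-hex addressing (assuming 64MB row extents of 1MB chunks and 8 hex digits; all names are illustrative):

```python
MB = 2 ** 20

def chunk_location(customer, file_id, offset, row_span=64 * MB, chunk=MB):
    # Map a byte offset within a file to (row key, column name): each row
    # covers a 64MB extent, each column a 1MB chunk, and both are named by
    # zero-padded hex offsets so columns sort in file order.
    row_start = offset - (offset % row_span)
    chunk_start = offset - (offset % chunk)
    return ("%s:%s:%08X" % (customer, file_id, row_start),
            "%08X" % chunk_start)
```

Because the column names are fixed-width hex, a slice over one row returns chunks in byte order, which is what makes the 'read ahead' idea below workable.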

Grouping chunks together into larger groups/extents is an interesting idea;
you could have a 'read ahead' buffer. I'm sure somewhere in all these designs
there is a magical balance between row size and the number of rows. They were
saying chunks with the same hash should only be stored once, though, so I'm
not sure it's applicable in this case.

> If you needed to add metadata to chunk groups/chunks, you can use
> column names which are disjoint from '0'-'F', as long as your API
> knows how to set your predicates up likewise. If there is at least one
> column name which is dependable in each chunk row, then you can use it
> as your predicate for "what's out there" queries. This avoids loading
> column data for the chunks when looking up names (row/file/... names).
> On the other hand, if you use an empty predicate, there is not an easy
> way to avoid tombstone rows unless you make another trip to Cassandra
> to verify.

I've experimented with namespacing columns before, and found it easier to
use a super CF in the long run.

Cheers
Aaron
  



non blocking Cassandra with Tornado

2010-07-27 Thread aaron morton
Today I worked out how to make non blocking calls to Cassandra inside of the 
non blocking Tornado web server (http://www.tornadoweb.org/) using Python. I 
thought I'd share it here and see if anyone thinks I'm abusing Thrift too much 
and inviting trouble.

It's a bit mucky and I have not tested it for things like timeouts and errors. 
But here goes...

The idea is rather than calling a cassandra client function like get_slice(), 
call the send_get_slice() then have a non blocking wait on the socket thrift is 
using, then call recv_get_slice().

So the steps in Tornado are:

1.  Web handler creates an object from the model, calls a function on it 
like start_read() to populate it.

2.  model.start_read() needs to call get_slice() on the thrift generated 
Cassandra client. Instead it calls send_get_slice() and returns to the calling 
web handler. 

3.  Web Handler then asks Tornado to epoll wait for any activity on the
thrift socket. It gets access to the socket file descriptor by following this 
chain from the thrift generated Cassandra client 
_iprot.trans.__TTransportBase__trans.handle.fileno()

4.  Web handler function called in 1 above returns; Tornado keeps the http
connection alive and the web handler instance alive. Later when the socket has 
activity Tornado will call back into the web handler. 

5.  To get the result of the call to cassandra the Web Handler calls a 
function on the model such as finish_read(). finish_read() wants to get the 
results of the get_slice() and do something, so it calls recv_get_slice on the 
thrift Cassandra client. Processes the result and returns to the web handler. 
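
Stripped of the Tornado and Thrift specifics, the send / epoll-wait / recv pattern in steps 1-5 can be shown with only the standard library (a socketpair stands in for the Thrift transport's socket; all names here are illustrative, not Tornado's or Thrift's API):

```python
import selectors
import socket

def non_blocking_call():
    """Mimic send_get_slice() / epoll wait / recv_get_slice()."""
    sel = selectors.DefaultSelector()
    client, server = socket.socketpair()   # stands in for the Thrift socket

    client.sendall(b"get_slice request")   # step 2: send_get_slice()
    server.sendall(b"get_slice response")  # Cassandra's reply arriving

    client.setblocking(False)
    sel.register(client, selectors.EVENT_READ)  # step 3: hand the fd to the loop
    events = sel.select(timeout=1)              # step 4: loop wakes us on activity
    assert events, "socket never became readable"

    reply = client.recv(4096)              # step 5: recv_get_slice()
    sel.unregister(client)
    client.close()
    server.close()
    return reply
```

Note this simplification shares the weakness discussed later in the thread: one readable event does not guarantee the whole response has arrived.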


This looks like the same process the TTwisted.py transport in the thrift 
package is using. Except it's not using the nasty reference to get to the raw 
socket. 

I'm not sure about any adverse effects on the Cassandra server from the client
not servicing the socket immediately when it starts sending data back. I'm
guessing there are some buffers there, but not sure. Could I be accidentally
blocking / hurting the Cassandra server?

Thanks
Aaron

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-27 Thread Jonathan Ellis
On Fri, Jul 23, 2010 at 8:57 AM, Julie  wrote:
> But in my focused testing today I see that if I run nodetool "cleanup" on the
> nodes taking up way more space than I expect, I see multiple SSTables being
> combined into 1 or 2 and the live disk usage going way down, down to what I
> know the raw data requires.
>
> This is great news!  I haven't tested it on hugely bloated nodes yet (where
> the disk usage is 6X the size of the raw data) since I haven't reproduced that
> problem today, but I would think using nodetool "cleanup" will work.
>
> I just have two questions:
>
>       (1) How can I set up Cassandra to do this automatically, to allow my
> nodes to store more data?

You'd have to use cron or a similar external service.

>       (2) I am a bit confused why cleanup is working this way since the doc
> claims it just cleans up keys no longer belonging to this node.  I have 8 
> nodes
> and do a simple sequential write of 10,000 keys to each of them.  I'm using
> random partitioning and give each node an Initial Token that should force even
> spacing of tokens around the hash space:

a) cleanup is a superset of compaction, so if you've been doing
overwrites at all then it will reduce space used for that reason
b) if you have added, moved, or removed any nodes then you will have
"keys no longer belonging to this node"

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: non blocking Cassandra with Tornado

2010-07-27 Thread Sandeep Kalidindi at PaGaLGuY.com
@aaron - thanks a lot. i will test it. This is very much needed.

Cheers,
Deepu.



On Tue, Jul 27, 2010 at 6:03 PM, aaron morton wrote:

> Today I worked out how to make non blocking calls to Cassandra inside of
> the non blocking Tornado web server (http://www.tornadoweb.org/) using
> Python. I thought I'd share it here and see if anyone thinks I'm abusing
> Thrift too much and inviting trouble.
>
> It's a bit mucky and I have not tested it for things like timeouts and
> errors. But here goes...
>
> The idea is rather than calling a cassandra client function like
> get_slice(), call the send_get_slice() then have a non blocking wait on the
> socket thrift is using, then call recv_get_slice().
>
> So the steps in Tornado are:
>
> 1.  Web handler creates an object from the model, calls a function on it
> like start_read() to populate it.
>
> 2. model.start_read() needs to call get_slice() on the thrift generated
> Cassandra client. Instead it calls send_get_slice() and returns to the
> calling web handler.
>
> 3  Web Handler then asks Tornado to epoll wait for any activity on the
> thrift socket. It gets access to the socket file descriptor by following
> this chain from the thrift generated Cassandra client
> _iprot.trans.__TTransportBase__trans.handle.fileno()
>
> 4,  Web handler function called in 1 above returns, Tornado keeps the http
> connection alive and the web handler instance alive. Later when the socket
> has activity Tornado will call back into the web handler.
>
> 5. To get the result of the call to cassandra the Web Handler calls a
> function on the model such as finish_read(). finish_read() wants to get the
> results of the get_slice() and do something, so it calls recv_get_slice on
> the thrift Cassandra client. Processes the result and returns to the web
> handler.
>
>
> This looks like the same process the TTwisted.py transport in the thrift
> package is using. Except it's not using the nasty reference to get to the
> raw socket.
>
> I'm not sure about any adverse affects on the Cassandra server from the
> client not servicing the socket immediately when it starts sending data
> back. I'm guessing there are some buffers there, but not sure. Could I be
> accidentally blocking / hurting the cassandra server ?
>
> Thanks
> Aaron
>


Quick Poll: Server names

2010-07-27 Thread uncle mantis
I will be naming my servers after insect family names. What do you all use
for yours?

If this is something that is too off topic please contact a moderator.

Regards,

Michael


RE: Quick Poll: Server names

2010-07-27 Thread John Hogan
Star Trek ship names.

JH

From: uncle mantis [mailto:uncleman...@gmail.com]
Sent: Tuesday, July 27, 2010 9:55 AM
To: cassandra-u...@incubator.apache.org
Subject: Quick Poll: Server names

I will be naming my servers after insect family names. What do you all use for 
yours?

If this is something that is too off topic please contact a moderator.

Regards,

Michael


Cassandra vs MongoDB

2010-07-27 Thread Mark
Can someone quickly explain the differences between the two? Other than
the fact that MongoDB supports ad-hoc querying, I don't know what's
different. It also appears (using Google Trends) that MongoDB seems to
be growing while Cassandra is dying off. Is this the case?


Thanks for the help


Can't find the storageproxy using jconsole

2010-07-27 Thread Mingfan Lu
I am using JConsole to access JMX and find that I can't see
StorageProxy under the MBeans tab, while I can get information for
StorageService. It is very interesting, because I find that StorageProxy
is registered in the source code:

private StorageProxy() {}

static
{
    MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
    try
    {
        mbs.registerMBean(new StorageProxy(),
                new ObjectName("org.apache.cassandra.service:type=StorageProxy"));
    }
    catch (Exception e)
    {
        throw new RuntimeException(e);
    }
}

my cassandra is 0.6.3


Re: Quick Poll: Server names

2010-07-27 Thread Michael Widmann
Stargate Series Names:

ONeil
Asgard
Jumper
ZPM1 - till ZPMx
Chevron1 till Chevron9




2010/7/27 John Hogan 

>  Star Trek ship names.
>
>
>
> JH



-- 
bayoda.com - Professional Online Backup Solutions for Small and Medium Sized
Companies


Re: Quick Poll: Server names

2010-07-27 Thread Dave Viner
I've seen & used several...

names of children of employees of the company
names of streets near office
names of diseases (led to very hard-to-spell names after a while, but was
quite educational for most developers)
names of characters from famous books (e.g., lord of the rings, asimov
novels, etc)


On Tue, Jul 27, 2010 at 7:54 AM, uncle mantis  wrote:

> I will be naming my servers after insect family names. What do you all use
> for yours?
>
> If this is something that is too off topic please contact a moderator.
>
> Regards,
>
> Michael
>


Re: non blocking Cassandra with Tornado

2010-07-27 Thread Peter Schuller
> The idea is rather than calling a cassandra client function like
> get_slice(), call the send_get_slice() then have a non blocking wait on the
> socket thrift is using, then call recv_get_slice().

(disclaimer: I've never used tornado)

Without looking at the generated thrift code, this sounds dangerous.
What happens if send_get_slice() blocks? What happens if
recv_get_slice() has to block because you didn't happen to receive the
response in one packet?

Normally you're either doing blocking code or callback oriented
reactive code. It sounds like you're trying to use blocking calls in a
non-blocking context under the assumption that readable data on the
socket means the entire response is readable, and that the socket
being writable means that the entire request can be written without
blocking. This might seems to work and you may not block, or block
only briefly. Until, for example, a TCP connection stalls and your
entire event loop hangs due to a blocking read.

Apologies if I'm misunderstanding what you're trying to do.

-- 
/ Peter Schuller


Re: Quick Poll: Server names

2010-07-27 Thread Brett Thomas
I like names of colleges

On Tue, Jul 27, 2010 at 11:40 AM, Dave Viner  wrote:

> I've seen & used several...
>
> names of children of employees of the company
> names of streets near office
> names of diseases (lead to very hard to spell names after a while, but was
> quite educational for most developers)
> names of characters from famous books (e.g., lord of the rings, asimov
> novels, etc)


Re: Quick Poll: Server names

2010-07-27 Thread uncle mantis
Ah S**T! The Pooh server is down again! =)

What does one do if they run out of themed names?

Regards,

Michael


On Tue, Jul 27, 2010 at 10:46 AM, Brett Thomas wrote:

> I like names of colleges
>
>
> On Tue, Jul 27, 2010 at 11:40 AM, Dave Viner  wrote:
>
>> I've seen & used several...
>>
>> names of children of employees of the company
>> names of streets near office
>> names of diseases (lead to very hard to spell names after a while, but was
>> quite educational for most developers)
>> names of characters from famous books (e.g., lord of the rings, asimov
>> novels, etc)
>>
>>
>> On Tue, Jul 27, 2010 at 7:54 AM, uncle mantis wrote:
>>
>>> I will be naming my servers after insect family names. What do you all
>>> use for yours?
>>>
>>> If this is something that is too off topic please contact a moderator.
>>>
>>> Regards,
>>>
>>> Michael
>>>
>>
>>
>


Re: non blocking Cassandra with Tornado

2010-07-27 Thread Dave Viner
FWIW - I think this is actually more of a question about Thrift than about
Cassandra.  If I understand you correctly, you're looking for an async
client.  Cassandra "lives" on the other side of the thrift service.  So, you
need a client that can speak Thrift asynchronously.

You might check out the new async Thrift client in Java for inspiration:

http://blog.rapleaf.com/dev/2010/06/23/fully-async-thrift-client-in-java/

Or, even better, port the Thrift async client to work for python and other
languages.

Dave Viner


On Tue, Jul 27, 2010 at 8:44 AM, Peter Schuller  wrote:

> > The idea is rather than calling a cassandra client function like
> > get_slice(), call the send_get_slice() then have a non blocking wait on
> the
> > socket thrift is using, then call recv_get_slice().
>
> (disclaimer: I've never used tornado)
>
> Without looking at the generated thrift code, this sounds dangerous.
> What happens if send_get_slice() blocks? What happens if
> recv_get_slice() has to block because you didn't happen to receive the
> response in one packet?
>
> Normally you're either doing blocking code or callback oriented
> reactive code. It sounds like you're trying to use blocking calls in a
> non-blocking context under the assumption that readable data on the
> socket means the entire response is readable, and that the socket
> being writable means that the entire request can be written without
> blocking. This might seems to work and you may not block, or block
> only briefly. Until, for example, a TCP connection stalls and your
> entire event loop hangs due to a blocking read.
>
> Apologies if I'm misunderstanding what you're trying to do.
>
> --
> / Peter Schuller
>


Re: Quick Poll: Server names

2010-07-27 Thread Nick Jones

Counties in Texas make a significant list:
http://en.wikipedia.org/wiki/List_of_counties_in_Texas

Nick




Re: Quick Poll: Server names

2010-07-27 Thread Edward Capriolo
On Tue, Jul 27, 2010 at 11:49 AM, uncle mantis  wrote:
> Ah S**T! The Pooh server is down again! =)
>
> What does one do if they run out of themed names?
>
> Regards,
>
> Michael
>
>
> On Tue, Jul 27, 2010 at 10:46 AM, Brett Thomas 
> wrote:
>>
>> I like names of colleges
>>
>> On Tue, Jul 27, 2010 at 11:40 AM, Dave Viner  wrote:
>>>
>>> I've seen & used several...
>>> names of children of employees of the company
>>> names of streets near office
>>> names of diseases (lead to very hard to spell names after a while, but
>>> was quite educational for most developers)
>>> names of characters from famous books (e.g., lord of the rings, asimov
>>> novels, etc)
>>>
>>>
>>> On Tue, Jul 27, 2010 at 7:54 AM, uncle mantis 
>>> wrote:

 I will be naming my servers after insect family names. What do you all
 use for yours?

 If this is something that is too off topic please contact a moderator.

 Regards,

 Michael
>>>
>>
>
>

I know this is a fun thread, and I hate being a "Debbie Downer,"
but... in my opinion, naming servers after anything other than their
function is not a great idea. Let's look at some positives and negatives:

System1:
cassandra01
cassandra02
cassandra03

VS

System2:
tom
dick
harry

Forward and reverse DNS:

System1 is easy to manage: with the server number you can easily figure
out an offset.
System2 requires careful mapping and will be more error prone.

The future:
So way back when, a company I was at used Native American tribe names.
Guess what happened: at about 20 nodes we ran out of common names like
Cherokee, and we had servers named choctaw. These names became hard to
spell and hard to say. Once you run out of Native American names and
start using 'country names', what is the point? It is not even a
convention any more. Cassandra servers are named after Native
Americans, or possibly food, or possibly a dog.

Quick, someone... fido just went down! What does fido do? Is it
important? Is it in our web cluster or our cassandra cluster?

Someone above mentioned Chevron1 through Chevron9. Look, they ran out of
unique names after the 5th server. So essentially 5 unique fun names,
then chevron6-1000.  Why is chevron6-1000 better than cassandra6-1000,
and is it any more fun?

Reboots:
Have you ever called a data center at 1 AM for a server reboot? Picking
a fancy, non-phonetic name is a great way for a tired NOC operator to
reboot the wrong one.


Re: Quick Poll: Server names

2010-07-27 Thread Benjamin Black
[role][sequence].[airport code][sequence].[domain].[tld]
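That scheme is mechanical enough to generate; a quick sketch (the role name and domain below are placeholders, not a real convention):

```python
def node_name(role, role_seq, airport, site_seq, domain="example.com"):
    """Compose a hostname per the
    [role][sequence].[airport code][sequence].[domain].[tld] convention.
    'example.com' stands in for the real domain.tld."""
    return "%s%02d.%s%d.%s" % (role, role_seq, airport, site_seq, domain)
```

e.g. `node_name("cass", 3, "sfo", 1)` yields `cass03.sfo1.example.com` — the role and site are recoverable from the name alone, which is the point of the scheme.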


Re: Quick Poll: Server names

2010-07-27 Thread Colin Vipurs
+1 for this

> I know this is a fun thread, and I hate being a "Debbie Downer,"
> but... in my opinion, naming servers after anything other than their
> function is not a great idea. Let's look at some positives and negatives:
>
> System1:
> cassandra01
> cassandra02
> cassandra03
>
> VS
>
> System2:
> tom
> dick
> harry
>
> Forward and reverse DNS:
>
> System1 is easy to manage: with the server number you can easily figure
> out an offset.
> System2 requires careful mapping and will be more error prone.
>
> The future:
> So way back when, a company I was at used Native American tribe names.
> Guess what happened: at about 20 nodes we ran out of common names like
> Cherokee, and we had servers named choctaw. These names became hard to
> spell and hard to say. Once you run out of Native American names and
> start using 'country names', what is the point? It is not even a
> convention any more. Cassandra servers are named after Native
> Americans, or possibly food, or possibly a dog.
>
> Quick, someone... fido just went down! What does fido do? Is it
> important? Is it in our web cluster or our cassandra cluster?
>
> Someone above mentioned Chevron1 through Chevron9. Look, they ran out of
> unique names after the 5th server. So essentially 5 unique fun names,
> then chevron6-1000.  Why is chevron6-1000 better than cassandra6-1000,
> and is it any more fun?
>
> Reboots:
> Have you ever called a data center at 1 AM for a server reboot? Picking
> a fancy, non-phonetic name is a great way for a tired NOC operator to
> reboot the wrong one.
>
>



-- 
Maybe she awoke to see the roommate's boyfriend swinging from the
chandelier wearing a boar's head.

Something which you, I, and everyone else would call "Tuesday", of course.


Re: Quick Poll: Server names

2010-07-27 Thread Michael Widmann
Hmm

I would never let anyone but one of my team reboot an instance or
server of mine.
Meaning: if I don't have the ability to remotely "terminate" the task, or to
do a remote (IP-based) power reboot,
the data center isn't my data center ;-)

Just my 2 cents - my names (chevron etc) are already on the list ;-)

greetings

Mike

2010/7/27 Edward Capriolo 

> On Tue, Jul 27, 2010 at 11:49 AM, uncle mantis 
> wrote:
> > Ah S**T! The Pooh server is down again! =)
> >
> > What does one do if they run out of themed names?
> >
> > Regards,
> >
> > Michael
> >
> >
> > On Tue, Jul 27, 2010 at 10:46 AM, Brett Thomas 
> > wrote:
> >>
> >> I like names of colleges
> >>
> >> On Tue, Jul 27, 2010 at 11:40 AM, Dave Viner 
> wrote:
> >>>
> >>> I've seen & used several...
> >>> names of children of employees of the company
> >>> names of streets near office
> >>> names of diseases (lead to very hard to spell names after a while, but
> >>> was quite educational for most developers)
> >>> names of characters from famous books (e.g., lord of the rings, asimov
> >>> novels, etc)
> >>>
> >>>
> >>> On Tue, Jul 27, 2010 at 7:54 AM, uncle mantis 
> >>> wrote:
> 
>  I will be naming my servers after insect family names. What do you all
>  use for yours?
> 
>  If this is something that is too off topic please contact a moderator.
> 
>  Regards,
> 
>  Michael
> >>>
> >>
> >
> >
>
> I know this is a fun thread, and I hate being a "Debbie Downer,"
> but... in my opinion, naming servers after anything other than their
> function is not a great idea. Let's look at some positives and negatives:
>
> System1:
> cassandra01
> cassandra02
> cassandra03
>
> VS
>
> System2:
> tom
> dick
> harry
>
> Forward and reverse DNS:
>
> System1 is easy to manage: with the server number you can easily figure
> out an offset.
> System2 requires careful mapping and will be more error prone.
>
> The future:
> So way back when, a company I was at used Native American tribe names.
> Guess what happened: at about 20 nodes we ran out of common names like
> Cherokee, and we had servers named choctaw. These names became hard to
> spell and hard to say. Once you run out of Native American names and
> start using 'country names', what is the point? It is not even a
> convention any more. Cassandra servers are named after Native
> Americans, or possibly food, or possibly a dog.
>
> Quick, someone... fido just went down! What does fido do? Is it
> important? Is it in our web cluster or our cassandra cluster?
>
> Someone above mentioned Chevron1 through Chevron9. Look, they ran out of
> unique names after the 5th server. So essentially 5 unique fun names,
> then chevron6-1000.  Why is chevron6-1000 better than cassandra6-1000,
> and is it any more fun?
>
> Reboots:
> Have you ever called a data center at 1 AM for a server reboot? Picking
> a fancy, non-phonetic name is a great way for a tired NOC operator to
> reboot the wrong one.
>
>



-- 
bayoda.com - Professional Online Backup Solutions for Small and Medium Sized
Companies


Re: Quick Poll: Server names

2010-07-27 Thread uncle mantis
+1. Quick and simple.

Regards,

Michael


On Tue, Jul 27, 2010 at 10:54 AM, Benjamin Black  wrote:

> [role][sequence].[airport code][sequence].[domain].[tld]
>


Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-27 Thread Peter Schuller
> a) cleanup is a superset of compaction, so if you've been doing
> overwrites at all then it will reduce space used for that reason

I had failed to consider over-writes as a possible culprit (since
removals were stated not to be done). However thinking about it I
believe the effect of this should be limited to roughly a doubling of
disk space in the absolute worst case of over-writing all data in the
absolute worst possible order (such as writing everything twice in the
same order).

Or more accurately, it should be limited to wasting as much space
as the size of the overwritten values. If you're overwriting with
larger values, it will no longer be a "doubling" relative to the
actual live data set.

Julie, did you do over-writes or was your disk space measurements
based on the state of the cluster after an initial set of writes of
unique values?

-- 
/ Peter Schuller


Re: Cassandra behaviour

2010-07-27 Thread Peter Schuller
> So userspace throttling is probably the answer?

I believe so.

>  Is the normal way of
> doing this to go through the JMX interface from a userspace program,
> and hold off on inserts until the values fall below a given threshold?
>  If so, that's going to be a pain, since most of my system is
> currently using python :)

I don't know what the normal way is or what people have done with
cassandra in production.

What I have tended to do personally and in general (not Cassandra
specific) is to do domain-specific rate limiting as required whenever
I do batch jobs / bulk reads/writes. Regardless of whether your
database is cassandra, postgresql or anything else - throwing writes
(or reads for that matter) at the database at maximum possible speed
tends to have adverse effects on latency on other normal traffic. Only
during offline batch operations where latency of other traffic is
irrelevant, do I ever go "all in" and throw traffic at a database at
full speed.

That said, a simple measure like "a single sequential
writer subject to the RTT of RPC requests" is often sufficient to rate limit
pretty well in practice. But of course that depends on the nature of
the writes and how expensive they are relative to RTT and/or RPC.

FWIW, whenever I have needed a hard "maximum of X per second" rate
limit I have implemented or re-used a rate limiter (e.g. a token
bucket) for the language in question and used it in my client code.
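For illustration, a token bucket is only a few lines. This sketch is a hypothetical helper, not from any particular library; the injectable clock is just for testability:

```python
import time

class TokenBucket:
    """Rate limiter allowing at most `rate` operations per second on
    average, with bursts of up to `capacity` operations."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = float(capacity)    # start full
        self.clock = clock
        self.last = clock()

    def try_acquire(self, n=1):
        """Consume n tokens if available; return False to signal 'back off'."""
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A batch client would call try_acquire() before each write and sleep briefly whenever it returns False.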

-- 
/ Peter Schuller


cassandra summit, making videos?

2010-07-27 Thread S Ahmed
Will there be videos of the session at the Cassandra Summit in SF?

I am really interested in the Cassandra codebase/internals seminar.


Re: cassandra summit, making videos?

2010-07-27 Thread uncle mantis
Why is everything always in California or Las Vegas? :-(

Regards,

Michael


On Tue, Jul 27, 2010 at 11:49 AM, S Ahmed  wrote:

> Will there be videos of the session at the Cassandra Summit in SF?
>
> I am really interested in the Cassandra codebase/internals seminar.
>
>
>


Re: Can't find the storageproxy using jconsole

2010-07-27 Thread Jonathan Ellis
I have also seen StorageProxy missing from the mbeans tab -- I'm not
sure if it is being removed after being registered, or somehow never
being registered at all, or possibly even a jconsole bug where
querying the object manually (or, say, with jmxterm) would still work.
I haven't spent any time troubleshooting yet, so any insight here
would be welcome.

On Tue, Jul 27, 2010 at 8:19 AM, Mingfan Lu  wrote:
> I am using Jconsole to access JMX and find out that I can't find
> storageproxy under mbean tab while I can get information of
> storageservice.
> It is very interesting that I find the storageproxy has been
> registered in source code.
>
> private StorageProxy() {}
>    static
>    {
>        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
>        try
>        {
>            mbs.registerMBean(new StorageProxy(), new
> ObjectName("org.apache.cassandra.service:type=StorageProxy"));
>        }
>        catch (Exception e)
>        {
>            throw new RuntimeException(e);
>        }
>   }
>
> my cassandra is 0.6.3
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-27 Thread Jonathan Ellis
On Tue, Jul 27, 2010 at 9:26 AM, Peter Schuller
 wrote:
> I had failed to consider over-writes as a possible culprit (since
> removals were stated not to be done). However thinking about it I
> believe the effect of this should be limited to roughly a doubling of
> disk space in the absolute worst case of over-writing all data in the
> absolute worst possible order (such as writing everything twice in the
> same order).

Minor compactions (see
http://wiki.apache.org/cassandra/MemtableSSTable) will try to keep the
growth in check but it is by no means limited to 2x.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-27 Thread Peter Schuller
> Minor compactions (see
> http://wiki.apache.org/cassandra/MemtableSSTable) will try to keep the
> growth in check but it is by no means limited to 2x.

Sorry I was being unclear. I was rather thinking along the lines of a
doubling of data triggering an implicit major compaction. However I
was wrong anyway, since minimumCompactionThreshold in
CompactionManager is set to 4.

This does make me realize that the actual worst-case spike of disk
space usage is decidedly non-trivial to figure out, even if we are
allowed to assume that compaction speed is equal to or greater than
the speed of writes.

-- 
/ Peter Schuller


Re: Cassandra vs MongoDB

2010-07-27 Thread Drew Dahlke
There's a good post on stackoverflow comparing the two
http://stackoverflow.com/questions/2892729/mongodb-vs-cassandra

It seems to me that both projects have pretty vibrant communities behind them.

On Tue, Jul 27, 2010 at 11:14 AM, Mark  wrote:
> Can someone quickly explain the differences between the two? Other than the
> fact that MongoDB supports ad-hoc querying, I don't know what's different. It
> also appears (using google trends) that MongoDB seems to be growing while
> Cassandra is dying off. Is this the case?
>
> Thanks for the help
>


Re: about cassandra compression

2010-07-27 Thread Jeremy Davis
I've been wondering about this question as well, but from a different angle:
more along the lines of "should I bother to compress myself?" Specifically in
cases where I might want to take several small columns and compress them into
one more compact column. Each column by itself is pretty spartan and won't
benefit from compression on its own. However, a collection of columns
certainly would.
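Client-side bundling of that sort is straightforward; a sketch using zlib (hypothetical helpers — Cassandra itself knows nothing about this format):

```python
import json
import zlib

def pack_columns(columns):
    """Bundle several small columns (a dict of name -> value) into one
    compressed blob, to be stored as a single column value."""
    return zlib.compress(json.dumps(columns, sort_keys=True).encode("utf-8"))

def unpack_columns(blob):
    """Inverse of pack_columns: decompress and decode the bundled columns."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

The tradeoff is losing per-column reads and updates: the whole bundle must be fetched and rewritten to change a single value.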

I see that there is work in 0.8 to compress the SSTables on disk, which is
great and is half the problem. But what about on the wire? Will the planned
Avro work provide any compression on data to/from the Cassandra Server?

-JD


On Mon, Jul 26, 2010 at 3:35 AM, Ran Tavory  wrote:

> cassandra doesn't compress before storing, no.
> It may be beneficial to compress, depending on the size of your data,
> network latency, disk size and data compressability... You'll need to test.
> I sometimes compress, depending on data size but it's done in the client,
>
>
> On Mon, Jul 26, 2010 at 1:31 PM, john xie  wrote:
>
>> does cassandra compress data before storing it?
>> when I store the data, is compression beneficial to reduce the
>> storage space?
>>
>>
>>
>


Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-27 Thread Julie
Peter Schuller  infidyne.com> writes:

> > a) cleanup is a superset of compaction, so if you've been doing
> > overwrites at all then it will reduce space used for that reason
> 

Hi Peter and Jonathan,

In my test, I write 80,000 rows (100KB each row) to an 8 node cluster.  The
80,000 rows all have unique keys '1' through '8' so no overwriting is
occurring.  I also don't do any deletes.  I simply write the 80,000 rows to 
the 8 node cluster which should be about 1GB of data times 3 (replication 
factor=3) on each node.  

The only thing I am doing special, is I use Random Partitioning and set the
Initial Token on each node to try to get the data evenly distributed:

# Create tokens for the RandomPartitioner that evenly divide token space
# The RandomPartitioner hashes keys into integer tokens in the range 0 to
# 2^127.
# So we simply divide that space into N equal sections.
# serverCount = the number of Cassandra nodes in the cluster

for ((ii=1; ii<=serverCount; ii++)); do
host=ec2-server$ii
echo Generating InitialToken for server on $host
token=$(bc<<-EOF
($ii*(2^127))/$serverCount
EOF)
echo host=$host initialToken=$token
echo "$token" >> storage-conf-node.xml
cat storage-conf-node.xml
...
done
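The arithmetic in the loop is just an even division of the token space; the same computation in Python, mirroring the bc formula with ii running 1..serverCount:

```python
def initial_tokens(server_count):
    """Evenly spaced RandomPartitioner initial tokens: the ii-th node
    (1-based) gets (ii * 2**127) // server_count, as in the bc snippet."""
    return [(ii * 2**127) // server_count
            for ii in range(1, server_count + 1)]
```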

24 hours after my writes, the data is evenly distributed according to 
cfstats (I see almost identical numRows from node to node) but there is 
a lot of extra disk space being used on some nodes, again according to 
cfstats.  This disk usage drops back down to 2.7GB (exactly what I expect 
since that's how much raw data is on each node) when I run "nodetool 
cleanup".

I am confused as to why there is anything to clean up 24 hours after my last 
write. All nodes in the cluster are fully up and aware of each other 
before I begin the writes.  The only other thing that could possibly be
considered unusual is I cycle through all 8 nodes, rather than 
communicating with a single Cassandra node.  I use a write consistency 
setting of ALL.  I can't see how these would increase the amount of disk 
space used but just mentioning it.

Any help would be greatly appreciated,
Julie

Peter Schuller  infidyne.com> writes:

> > a) cleanup is a superset of compaction, so if you've been doing
> > overwrites at all then it will reduce space used for that reason
> 
> I had failed to consider over-writes as a possible culprit (since
> removals were stated not to be done). However thinking about it I
> believe the effect of this should be limited to roughly a doubling of
> disk space in the absolute worst case of over-writing all data in the
> absolute worst possible order (such as writing everything twice in the
> same order).
> 
> Or more accurately, it should be limited to wasting as much space
> as the size of the overwritten values. If you're overwriting with
> larger values, it will no longer be a "doubling" relative to the
> actual live data set.
> 
> Julie, did you do over-writes or was your disk space measurements
> based on the state of the cluster after an initial set of writes of
> unique values?






Re: UnavailableException on QUORUM write

2010-07-27 Thread Per Olesen

On Jul 27, 2010, at 12:23 AM, Jonathan Ellis wrote:

> Can you turn on debug logging and try this patch?

Yes, but..I am on vacation now, so it will be about 3 weeks from now.




Upgrading to Cassanda 0.7 Thrift Erlang

2010-07-27 Thread J T
Hi,

I just tried upgrading a perfectly working Cassandra 0.6.3 to Cassandra 0.7
and am finding that even after re-generating the erlang thrift bindings
I am unable to perform any operation.
I can get a connection but if I try to login or set the keyspace I get a
report from the erlang bindings to say that the connection is closed.

I then tried upgrading to a later version of thrift but still get the same
error.

e.g.
(zotonic3...@127.0.0.1)1> thrift_client:start_link("localhost", 9160,
cassandra_thrift).
{ok,<0.327.0>}
(zotonic3...@127.0.0.1)2> {ok,C}=thrift_client:start_link("localhost", 9160,
cassandra_thrift).
{ok,<0.358.0>}
(zotonic3...@127.0.0.1)3> thrift_client:call( C, set_keyspace, [ <<"Test">>
 ]).

=ERROR REPORT==== 27-Jul-2010::03:48:08 ===
** Generic server <0.358.0> terminating
** Last message in was {call,set_keyspace,[<<"Test">>]}
** When Server state == {state,cassandra_thrift,
 {protocol,thrift_binary_protocol,
  {binary_protocol,
   {transport,thrift_buffered_transport,<0.359.0>},
   true,true}},
 0}
** Reason for termination ==
** {{case_clause,{error,closed}},
[{thrift_client,read_result,3},
 {thrift_client,catch_function_exceptions,2},
 {thrift_client,handle_call,3},
 {gen_server,handle_msg,5},
 {proc_lib,init_p_do_apply,3}]}
** exception exit: {case_clause,{error,closed}}
 in function  thrift_client:read_result/3
 in call from thrift_client:catch_function_exceptions/2
 in call from thrift_client:handle_call/3
 in call from gen_server:handle_msg/5
 in call from proc_lib:init_p_do_apply/3

The cassandra log seems to indicate that a connection has been made
(although that's only apparent from a TRACE log message saying that a logout
has been done).

The cassandra-cli program is able to connect and function normally so I can
only assume that there is a problem with the erlang bindings.

Has anyone else had any success using 0.7 from Erlang ?

JT.


NoServer Available

2010-07-27 Thread Daniel Bernstein
I've set up a 2 node cluster and I'm trying to connect using pycassa.

My thrift address is set to the default:  localhost:9160

I've verified that the port is open and I'm able to connect to it via
telnet.

My keyspace "Ananda" is defined as is the column family "URL" in storage.xml

Running the following commands locally, I get this:


>>> client = pycassa.connect("Ananda", ['127.0.0.1:9160'],timeout=3.5)
>>> cf = pycassa.ColumnFamily(client, "URL")
>>> cf.insert("foo", {"bar":"value"})
Traceback (most recent call last):
  File "", line 1, in 
  File "pycassa/columnfamily.py", line 332, in insert
self._wcl(write_consistency_level))
  File "pycassa/connection.py", line 164, in _client_call
conn = self._ensure_connection()
  File "pycassa/connection.py", line 175, in _ensure_connection
conn = self.connect()
  File "pycassa/connection.py", line 193, in connect
return self.connect()
  File "pycassa/connection.py", line 186, in connect
server = self._servers.get()
  File "pycassa/connection.py", line 136, in get
raise NoServerAvailable()
pycassa.connection.NoServerAvailable


NB: When I try to connect without any timeout, it just hangs.
When I shutdown cassandra, it fails immediately (rather than failing in 3.5
seconds when I use a timeout).

Any help would be much appreciated.

--Daniel


Re: Key Caching

2010-07-27 Thread B. Todd Burruss
AggressiveOpts, if I remember correctly, enables options that are not
documented but will probably make it into a future release of the JVM.

Cassandra used it once upon a time.  We probably should take it out, but
things work just fine for me now ;)


On Tue, 2010-07-27 at 01:48 -0700, Dathan Pattishall wrote:
> woot thnx, lots of knobs to play with!
> 
> On Tue, Jul 27, 2010 at 12:16 AM, Peter Schuller
>  wrote:
> > @Todd, I noticed some new ops in your cassandra.in.sh. Is
> there any
> > documentation on what these ops are, and what they do?
> >
> > For instance AggressiveOpts, etc.
> 
> 
> A fairly complete list is here:
> 
> http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp
> 
> --
> / Peter Schuller
> 




Re: Cassandra vs MongoDB

2010-07-27 Thread Jonathan Shook
Also, google trends is only a measure of what terms people are
searching for. To equate this directly to growth would be misleading.

On Tue, Jul 27, 2010 at 12:27 PM, Drew Dahlke  wrote:
> There's a good post on stackoverflow comparing the two
> http://stackoverflow.com/questions/2892729/mongodb-vs-cassandra
>
> It seems to me that both projects have pretty vibrant communities behind them.
>
> On Tue, Jul 27, 2010 at 11:14 AM, Mark  wrote:
>> Can someone quickly explain the differences between the two? Other than the
>> fact that MongoDB supports ad-hoc querying, I don't know what's different. It
>> also appears (using google trends) that MongoDB seems to be growing while
>> Cassandra is dying off. Is this the case?
>>
>> Thanks for the help
>>
>


Re: Cassandra vs MongoDB

2010-07-27 Thread Dave Gardner
There are quite a few differences. Ultimately it depends on your use
case! For example, Mongo has a limit on the maximum "document" size of
4MB, whereas with Cassandra you are not really limited in the volume
of data/columns per row (I think there may be a limit of 2GB per row;
basically none).

Another point re: search volumes is that mongo has been actively
promoting over the last few months. I recently attended an excellent
conference day in London which was very cheap; tickets probably didn't
cover the costs. I guess this is part of their strategy. Eg: encourage
adoption.

Dave

On Tuesday, July 27, 2010, Jonathan Shook  wrote:
> Also, google trends is only a measure of what terms people are
> searching for. To equate this directly to growth would be misleading.
>
> On Tue, Jul 27, 2010 at 12:27 PM, Drew Dahlke  wrote:
>> There's a good post on stackoverflow comparing the two
>> http://stackoverflow.com/questions/2892729/mongodb-vs-cassandra
>>
>> It seems to me that both projects have pretty vibrant communities behind 
>> them.
>>
>> On Tue, Jul 27, 2010 at 11:14 AM, Mark  wrote:
>>> Can someone quickly explain the differences between the two? Other than the
>>> fact that MongoDB supports ad-hoc querying, I don't know what's different. It
>>> also appears (using google trends) that MongoDB seems to be growing while
>>> Cassandra is dying off. Is this the case?
>>>
>>> Thanks for the help
>>>
>>
>


Re: Quick Poll: Server names

2010-07-27 Thread Benoit Perroud
We use names of (European) cities for "logical" functions:

- berlin01, berlin02, berlin03 are part of a mysql cluster,
- zurich1 and zurich2 are AD,
- roma01, roma02, and so on are the Cassandra cluster for the Roma project,
- and so on.

We found this a good tradeoff.

Regards,

Benoit.

2010/7/27 uncle mantis :
> +1. Quick and simple.
>
> Regards,
>
> Michael
>
>
> On Tue, Jul 27, 2010 at 10:54 AM, Benjamin Black  wrote:
>>
>> [role][sequence].[airport code][sequence].[domain].[tld]
>
>


Re: Cassandra vs MongoDB

2010-07-27 Thread Mark

On 7/27/10 12:42 PM, Dave Gardner wrote:

There are quite a few differences. Ultimately it depends on your use
case! For example, Mongo has a limit on the maximum "document" size of
4MB, whereas with Cassandra you are not really limited in the volume
of data/columns per row (I think there may be a limit of 2GB per row;
basically none).

Another point re: search volumes is that mongo has been actively
promoting over the last few months. I recently attended an excellent
conference day in London which was very cheap; tickets probably didn't
cover the costs. I guess this is part of their strategy. Eg: encourage
adoption.

Dave

On Tuesday, July 27, 2010, Jonathan Shook  wrote:

Also, google trends is only a measure of what terms people are
searching for. To equate this directly to growth would be misleading.

On Tue, Jul 27, 2010 at 12:27 PM, Drew Dahlke  wrote:

There's a good post on stackoverflow comparing the two
http://stackoverflow.com/questions/2892729/mongodb-vs-cassandra

It seems to me that both projects have pretty vibrant communities behind them.

On Tue, Jul 27, 2010 at 11:14 AM, Mark  wrote:

Can someone quickly explain the differences between the two? Other than the
fact that MongoDB supports ad-hoc querying, I don't know what's different. It
also appears (using google trends) that MongoDB seems to be growing while
Cassandra is dying off. Is this the case?

Thanks for the help

Well, my initial use case would be to store our search logs and perform 
some ad-hoc querying, which I know is a win for Mongo. However, I don't 
think I fully understand how to build indexes in Cassandra, so maybe it's 
just an issue of ignorance. I know going forward, though, we would be 
expanding it to house our per-item translations.


repair failed or stopped after 7-8 hours?

2010-07-27 Thread Michael Andreasen
I started a repair on 6 nodes some 7-8 hours ago.

The nodes still have a load of 2-3 (normally 0.5), and if I grep AE in
system.log I get lines like this on most of the nodes:

   Performing streaming repair of 30 ranges to /172.19.0.32 for 

Load is 400-500gb on the nodes.

Any word of advice on this one? Should I just wait, or do you think it's
failed?

Thanks
Mike


Re: Quick Poll: Server names

2010-07-27 Thread Daniel Jue
Names of Transformers

Blurr, Megatron, Sideswipe, Unicron, Arcee etc

On Tue, Jul 27, 2010 at 3:57 PM, Benoit Perroud  wrote:
> We use names of (European) cities for "logical" functions:
>
> - berlin01, berlin02, berlin03 are part of a mysql cluster,
> - zurich1 and zurich2 are AD,
> - roma01, roma02, and so on are the Cassandra cluster for the Roma project,
> - and so on.
>
> We found this a good tradeoff.
>
> Regards,
>
> Benoit.
>
> 2010/7/27 uncle mantis :
>> +1. Quick and simple.
>>
>> Regards,
>>
>> Michael
>>
>>
>> On Tue, Jul 27, 2010 at 10:54 AM, Benjamin Black  wrote:
>>>
>>> [role][sequence].[airport code][sequence].[domain].[tld]
>>
>>
>


Re: Upgrading to Cassanda 0.7 Thrift Erlang

2010-07-27 Thread Jonathan Ellis
trunk is using framed thrift connections by default now (was unframed)
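For a client still speaking unframed Thrift, that mismatch shows up exactly as the closed-connection error below. On the wire, a framed transport simply wraps each message in a 4-byte big-endian length prefix; a minimal sketch of the envelope (not the Thrift library itself):

```python
import struct

def frame(message):
    """Wrap one Thrift message in the framed-transport envelope:
    a 4-byte big-endian length prefix followed by the payload."""
    return struct.pack("!I", len(message)) + message

def unframe(data):
    """Inverse: strip the prefix, returning (message, leftover_bytes)."""
    (length,) = struct.unpack("!I", data[:4])
    return data[4:4 + length], data[4 + length:]
```

The usual fix on the client side is to open the connection with the framed transport instead of the buffered one; in most Thrift bindings, including Erlang's, the transport is configurable when starting the client.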

On Tue, Jul 27, 2010 at 11:33 AM, J T  wrote:
> Hi,
> I just tried upgrading a perfectly working Cassandra 0.6.3 to Cassandra 0.7
> and am finding that even after re-generating the erlang thrift bindings
> I am unable to perform any operation.
> I can get a connection but if I try to login or set the keyspace I get a
> report from the erlang bindings to say that the connection is closed.
> I then tried upgrading to a later version of thrift but still get the same
> error.
> e.g.
> (zotonic3...@127.0.0.1)1> thrift_client:start_link("localhost", 9160,
> cassandra_thrift).
> {ok,<0.327.0>}
> (zotonic3...@127.0.0.1)2> {ok,C}=thrift_client:start_link("localhost", 9160,
> cassandra_thrift).
> {ok,<0.358.0>}
> (zotonic3...@127.0.0.1)3> thrift_client:call( C, set_keyspace, [ <<"Test">>
>  ]).
> =ERROR REPORT 27-Jul-2010::03:48:08 ===
> ** Generic server <0.358.0> terminating
> ** Last message in was {call,set_keyspace,[<<"Test">>]}
> ** When Server state == {state,cassandra_thrift,
>                          {protocol,thrift_binary_protocol,
>                           {binary_protocol,
>                            {transport,thrift_buffered_transport,<0.359.0>},
>                            true,true}},
>                          0}
> ** Reason for termination ==
> ** {{case_clause,{error,closed}},
>     [{thrift_client,read_result,3},
>      {thrift_client,catch_function_exceptions,2},
>      {thrift_client,handle_call,3},
>      {gen_server,handle_msg,5},
>      {proc_lib,init_p_do_apply,3}]}
> ** exception exit: {case_clause,{error,closed}}
>      in function  thrift_client:read_result/3
>      in call from thrift_client:catch_function_exceptions/2
>      in call from thrift_client:handle_call/3
>      in call from gen_server:handle_msg/5
>      in call from proc_lib:init_p_do_apply/3
> The Cassandra log seems to indicate that a connection has been made
> (although that's only apparent from a TRACE log message saying that a logout
> has been done).
> The cassandra-cli program is able to connect and function normally so I can
> only assume that there is a problem with the erlang bindings.
> Has anyone else had any success using 0.7 from Erlang ?
> JT.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
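Jonathan's point explains the symptom above: framed and unframed transports are wire-incompatible. A framed peer prefixes every Thrift message with a 4-byte big-endian length, so an unframed peer misreads the stream and the connection simply appears to close. The fix would be to open the client's transport in framed mode as well (exactly how depends on the Thrift version and binding). A small sketch of the framing itself:

```python
import struct

def frame(payload: bytes) -> bytes:
    """What a framed transport puts on the wire: 4-byte length, then payload."""
    return struct.pack(">i", len(payload)) + payload

def unframe(data: bytes) -> bytes:
    """Peel one frame off the wire."""
    (length,) = struct.unpack(">i", data[:4])
    return data[4:4 + length]

msg = b"set_keyspace"
framed = frame(msg)
print(framed[:4])               # the length prefix an unframed peer chokes on
assert unframe(framed) == msg   # round-trips cleanly between framed peers
```

An unframed peer would try to decode that length prefix as the start of a Thrift message (and vice versa), which is why the failure shows up as an abrupt `{error,closed}` rather than a clear protocol error.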


Re: Help! Cassandra Data Loader threads are getting stuck

2010-07-27 Thread Rana Aich
Thanks for your offer... there was a problem reading the *.gz files from
System.in.
I've rectified my code.


On Tue, Jul 27, 2010 at 12:09 AM, Thorvaldsson Justus <
justus.thorvalds...@svenskaspel.se> wrote:

>  I made one program doing just this with Java
>
> Basically
>
> I read with one thread from the file into an array, stopping when its size
> reaches 20k and waiting until it drops below 20k before continuing to read
> the data file. (This is the raw data I want to move.)
>
>
>
> I have n number of threads
>
> Each with one batch of their own and one connection to Cassandra of their
> own.
>
> They fill their batches with data taken out of the array (this is
> synchronized); when a batch reaches 1k rows, they send it to Cassandra.
>
> I had some problems, but none regarding Cassandra; it was my own code that
> faltered.
>
> I could provide code if you want.
>
> Justus
>
>
>
> *Från:* Aaron Morton [mailto:aa...@thelastpickle.com]
> *Skickat:* den 26 juli 2010 23:32
> *Till:* user@cassandra.apache.org
> *Ämne:* Re: Help! Cassandra Data Loader threads are getting stuck
>
>
>
> Try running it without threading to see if it's a cassandra problem or an
> issue with your threading.
>
> Perhaps split the file and run many single threaded processes to load the
> data.
>
> Aaron
>
>   On 27 Jul, 2010,at 07:14 AM, Rana Aich  wrote:
>
>  Hi All,
>
>
>
> I have to load huge quantity of data into Cassandra (~10Billion rows).
>
>
>
> I'm trying to load the Data from files using multithreading.
>
>
>
> The idea is each thread will read the TAB delimited file and process chunk
> of records.
>
>
>
> For example Thread1 reads line 1-1000 lines
>
> Thread 2 reads line 1001-2000 and insert into Cassandra.
>
> Thread 3 reads line 2001-3000 and insert into Cassandra.
>
>
>
> Thread 10 reads line 9001-10000 and insert into Cassandra.
>
> Thread 1  reads line 10001-11000 and insert into Cassandra.
>
> Thread 2 reads line 11001-12000 and insert into Cassandra.
>
>
>
> and so on...
>
>
>
> I'm testing with a small file size with 20 records.
>
>
>
> But somehow the process gets stuck and doesn't proceed any further after
> processing say 16,000 records.
>
>
>
> I've attached my working file.
>
>
>
> Any help will be very much appreciated.
>
>
>
> Regards
>
>
>
> raich
>
>
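The pattern Justus describes — one reader thread feeding a bounded buffer, N worker threads pulling rows and flushing fixed-size batches — can be sketched with standard-library pieces. The flush below just records batch sizes; real code would call batch_mutate on a per-thread Cassandra connection, and all names and sizes here are illustrative:

```python
import queue
import threading

QUEUE_MAX, BATCH_SIZE, WORKERS = 20_000, 1_000, 4
rows = queue.Queue(maxsize=QUEUE_MAX)  # put() blocks the reader when full
SENTINEL = object()
flushed = []                           # batch sizes, standing in for writes
lock = threading.Lock()

def reader(lines):
    for line in lines:
        rows.put(line)                 # back-pressure: blocks at QUEUE_MAX
    for _ in range(WORKERS):
        rows.put(SENTINEL)             # one stop marker per worker

def worker():
    batch = []
    while True:
        item = rows.get()
        if item is SENTINEL:
            break
        batch.append(item)
        if len(batch) >= BATCH_SIZE:
            with lock:
                flushed.append(len(batch))  # stand-in for batch_mutate
            batch = []
    if batch:                          # flush the trailing partial batch
        with lock:
            flushed.append(len(batch))

lines = (f"key{i}\tvalue{i}" for i in range(20_000))
threads = [threading.Thread(target=worker) for _ in range(WORKERS)]
for t in threads:
    t.start()
reader(lines)
for t in threads:
    t.join()
print(sum(flushed))  # 20000 rows flushed in total
```

The bounded queue is what keeps the reader from racing ahead of the writers; one plausible cause of a loader "getting stuck" is a hand-rolled version of this coordination deadlocking, which a blocking queue avoids.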


Re: non blocking Cassandra with Tornado

2010-07-27 Thread Aaron Morton


> Without looking at the generated thrift code, this sounds dangerous.
> What happens if send_get_slice() blocks? What happens if
> recv_get_slice() has to block because you didn't happen to receive the
> response in one packet?

get_slice() has two lines in it: a call to send_get_slice() and one to
recv_get_slice(). send_get_slice() sends the request down the socket to the
server and returns. recv_get_slice() takes a blocking read (with timeout)
against the socket, pulls the entire message, decodes it, and returns it.

> Normally you're either doing blocking code or callback oriented
> reactive code. It sounds like you're trying to use blocking calls in a
> non-blocking context under the assumption that readable data on the
> socket means the entire response is readable, and that the socket
> being writable means that the entire request can be written without
> blocking. This might seem to work and you may not block, or block
> only briefly. Until, for example, a TCP connection stalls and your
> entire event loop hangs due to a blocking read.

I'm not interrupting any of the work Thrift is doing when reading or writing
to the socket; those functions still get to complete as normal. The goal is
to let the Tornado server work on another request while the first one is
waiting for Cassandra to do its work. That's otherwise wasted time on the web
heads that could be spent servicing other requests.

Once Tornado detects the socket state has changed, it adds the callback into
the event loop, and I then ask the Cassandra client to read all the data from
the socket. It's still a blocking call; we just don't bother to call it until
we know there is data sitting there for it. The recv could still block or
hang, but it would do that in the normal blocking model too. I'll need to
test the timeouts and error propagation in these cases.

Thanks for the feedback
Aaron
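A self-contained sketch of the split-call pattern Aaron describes: send the request, then register the socket with an event loop and issue the (normally blocking) read only once the loop reports data waiting. Here the stdlib selectors module stands in for Tornado's IOLoop, a thread stands in for Cassandra, and plain send/recv stand in for send_get_slice/recv_get_slice. As Peter points out, "readable" does not mean the whole response has arrived, so real code must still cope with partial reads:

```python
import selectors
import socket
import threading

client, server = socket.socketpair()

def fake_cassandra():
    # Pretend server: read the request, then send back a response.
    request = server.recv(1024)
    server.sendall(b"response:" + request)

threading.Thread(target=fake_cassandra).start()

sel = selectors.DefaultSelector()
result = []

def on_readable(sock):
    # Only invoked once the loop says data is waiting, so this recv
    # returns promptly instead of stalling the whole event loop --
    # though it may still deliver only part of a large response.
    result.append(sock.recv(1024))
    sel.unregister(sock)

client.sendall(b"get_slice")          # the send_get_slice() half
sel.register(client, selectors.EVENT_READ, on_readable)
while not result:                     # minimal stand-in event loop
    for key, _ in sel.select(timeout=1):
        key.data(key.fileobj)
print(result[0])
```

The design trade-off the thread is circling: this keeps the web process busy while Cassandra works, but the final read is only "non-blocking" to the extent that the response fits in what has already arrived.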

Re: non blocking Cassandra with Tornado

2010-07-27 Thread Aaron Morton
Thanks for the link. It is more of a Thrift thing; perhaps I need to do some
tests where the web handler sends the get_slice to Cassandra but never calls
recv, to see what could happen. I'll take a look at the Java binding and see
what it would take to offer a patch to Thrift. Most people coding Python
(including the guy sitting next to me) would probably say to use the Thrift
Twisted binding. I may also take a look at the Avro bindings.

Aaron

On 28 Jul, 2010, at 03:51 AM, Dave Viner wrote:

FWIW - I think this is actually more of a question about Thrift than about
Cassandra. If I understand you correctly, you're looking for an async
client. Cassandra "lives" on the other side of the Thrift service, so you
need a client that can speak Thrift asynchronously.

You might check out the new async Thrift client in Java for inspiration:
http://blog.rapleaf.com/dev/2010/06/23/fully-async-thrift-client-in-java/

Or, even better, port the Thrift async client to work for Python and other
languages.

Dave Viner

On Tue, Jul 27, 2010 at 8:44 AM, Peter Schuller wrote:
> The idea is rather than calling a cassandra client function like
> get_slice(), call the send_get_slice() then have a non blocking wait on the
> socket thrift is using, then call recv_get_slice().

(disclaimer: I've never used tornado)

Without looking at the generated thrift code, this sounds dangerous.
What happens if send_get_slice() blocks? What happens if
recv_get_slice() has to block because you didn't happen to receive the
response in one packet?

Normally you're either doing blocking code or callback oriented
reactive code. It sounds like you're trying to use blocking calls in a
non-blocking context under the assumption that readable data on the
socket means the entire response is readable, and that the socket
being writable means that the entire request can be written without
blocking. This might seem to work and you may not block, or block
only briefly. Until, for example, a TCP connection stalls and your
entire event loop hangs due to a blocking read.

Apologies if I'm misunderstanding what you're trying to do.

--
/ Peter Schuller



Re: Cassandra to store 1 billion small 64KB Blobs

2010-07-27 Thread Bryan Whitehead
Just a warning about ZFS. If the plan is to use JBOD w/RAID-Z, don't.
3, 4, 5, ... or N disks in a RAID-Z array (using ZFS) will result in
read performance equivalent to only 1 disk.

Check out this blog entry:
http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance

The second chart and the section "The Parity Performance Rathole" are
both a must read.

On Fri, Jul 23, 2010 at 11:51 PM, Michael Widmann
 wrote:
> Hi Jonathan
>
> Thanks for your very valuable input on this.
>
> I maybe didn't enough explanation - so I'll try to clarify
>
> Here are some thoughts:
>
> binary data will not be indexed - only stored.
> The file name to the binary data (a hash) should be indexed for search
> We could group the hashes into 62 "entry" points for search retrieval -> I
> think supercolumns (if I'm right on terms) (a-z, A-Z, 0-9)
> the 64k Blobs meta data (which one belong to which file) should be stored
> separate in cassandra
> For Hardware we rely on solaris / opensolaris with ZFS in the backend
> Write operations occur much more often than reads
> Memory should hold the hash values mainly for fast search (not the binary
> data)
> Read Operations (restore from cassandra) may be async - (get about 1000
> Blobs) - group them restore
>
> So my question is too:
>
> 2 or 3 Big boxes or 10 till 20 small boxes for storage...
> Could we separate "caching": hash-value CFs cached and indexed, binary
> data CFs not ...
> Writes happen around the clock, not at tremendous speed but constantly
> Would compaction of the database really need much disk space?
> Is it reliable at this size? (more my fear)
>
> thx for thinking and answers...
>
> greetings
>
> Mike
>
> 2010/7/23 Jonathan Shook 
>>
>> There are two scaling factors to consider here. In general the worst
>> case growth of operations in Cassandra is kept near to O(log2(N)). Any
>> worse growth would be considered a design problem, or at least a high
>> priority target for improvement.  This is important for considering
>> the load generated by very large column families, as binary search is
>> used when the bloom filter doesn't exclude rows from a query.
>> O(log2(N)) is basically the best achievable growth for this type of
>> data, but the bloom filter improves on it in some cases by paying a
>> lower cost every time.
>>
>> The other factor to be aware of is the reduction of binary search
>> performance for datasets which can put disk seek times into high
>> ranges. This is mostly a direct consideration for those installations
>> which will be doing lots of cold reads (not cached data) against large
>> sets. Disk seek times are much more limited (low) for adjacent or near
>> tracks, and generally much higher when tracks are sufficiently far
>> apart (as in a very large data set). This can compound with other
>> factors when session times are longer, but that is to be expected with
>> any system. Your storage system may have completely different
>> characteristics depending on caching, etc.
>>
>> The read performance is still quite high relative to other systems for
>> a similar data set size, but the drop-off in performance may be much
>> worse than expected if you are wanting it to be linear. Again, this is
>> not unique to Cassandra. It's just an important consideration when
>> dealing with extremely large sets of data, when memory is not likely
>> to be able to hold enough hot data for the specific application.
>>
>> As always, the real questions have lots more to do with your specific
>> access patterns, storage system, etc. I would look at the benchmarking
>> info available on the lists as a good starting point.
>>
>> On Fri, Jul 23, 2010 at 11:51 AM, Michael Widmann
>>  wrote:
>> > Hi
>> >
>> > We plan to use cassandra as a data storage on at least 2 nodes with RF=2
>> > for about 1 billion small files.
>> > We do have about 48TB discspace behind for each node.
>> >
>> > now my question is - is this possible with cassandra - reliable - means
>> > (every blob is stored on 2 jbods)..
>> >
>> > we may grow up to nearly 40TB or more on cassandra "storage" data ...
>> >
>> > anyone out did something similar?
>> >
>> > for retrieval of the blobs we are going to index them with an hashvalue
>> > (means hashes are used to store the blob) ...
>> > so we can search fast for the entry in the database and combine the
>> > blobs to
>> > a normal file again ...
>> >
>> > thanks for answer
>> >
>> > michael
>> >
>
>
>
> --
> bayoda.com - Professional Online Backup Solutions for Small and Medium Sized
> Companies
>
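The layout Michael describes — content-addressed 64 KB blobs plus a separate metadata row per file listing the chunk hashes in order — can be sketched as below. Dicts stand in for the two column families, SHA-1 is just an example hash, and all names are illustrative:

```python
import hashlib

CHUNK_SIZE = 64 * 1024
blob_cf, meta_cf = {}, {}   # stand-ins for the blob CF and the metadata CF

def store(name, data):
    """Split data into 64 KB chunks, key each chunk by its hash."""
    hashes = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        h = hashlib.sha1(chunk).hexdigest()
        blob_cf[h] = chunk          # identical chunks dedup to one row
        hashes.append(h)
    meta_cf[name] = hashes          # ordered chunk list for reassembly

def restore(name):
    """Reassemble the original bytes from the ordered hash list."""
    return b"".join(blob_cf[h] for h in meta_cf[name])

data = bytes(range(256)) * 1000     # 256,000 bytes -> 4 chunks
store("file1", data)
assert restore("file1") == data
# The first three 64 KB chunks happen to be identical here, so they
# collapse into a single blob row while the metadata keeps 4 entries.
print(len(meta_cf["file1"]), len(blob_cf))
```

This also shows why the hash CF is the one worth caching: reads touch the small metadata and hash rows first, and only then fetch the (much larger, rarely re-read) blob rows.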