Re: Minimum database size and ops/second to start considering Cassandra

2014-04-10 Thread Tim Wintle
On Thu, 2014-04-10 at 11:17 -0700, motta.lrd wrote: > What is the minimum database size and number of Operations/Second (reads and > write) for which I should seriously consider this database? Significant number of writes / second -> possibly a good use case for cassandra. Database size is a di

Re: Provisioning/Configuration Question

2014-03-01 Thread Tim Wintle
137GB would fairly easily fit in core memory on a single node these days, so it seems a very low amount for a 27 node cluster. Off the top of my head: would 99th percentile latency be improved by using replication factor 5, assuming you are doing quorum operations? Sent from my phone On 1 Mar 2

Re: calculating sizes on disk

2013-12-07 Thread Tim Wintle
I have found in (limited) practice that it's fairly hard to estimate due to compression and compaction behaviour. I think measuring and extrapolating (with an understanding of the data structures) is the most effective. Tim Sent from my phone On 6 Dec 2013 20:54, "John Sanda" wrote: > I hav

Re: Moving a cluster between networks.

2013-08-23 Thread Tim Wintle
On Wed, 2013-08-21 at 10:42 -0700, Robert Coli wrote: > On Wed, Aug 21, 2013 at 3:58 AM, Tim Wintle wrote: > > > What would the best way to achieve this? (We can tolerate a fairly short > > period of downtime). > > > > I think this would work, but may require a

Moving a cluster between networks.

2013-08-21 Thread Tim Wintle
Hi, Suppose we have two networks: 10.1.0.0/16 and 10.2.0.0/16. It is not possible to route packets between the two networks, but all nodes have interfaces on both networks, so any node can communicate with any address on either network. We are currently running all our nodes on one network, but

Re: Minimum CPU and RAM for Cassandra and Hadoop Cluster

2013-07-15 Thread Tim Wintle
I might be missing something, but if it is all on one machine then why use Cassandra or hadoop? Sent from my phone On 13 Jul 2013 01:16, "Martin Arrowsmith" wrote: > Dear Cassandra experts, > > I have an HP Proliant ML350 G8 server, and I want to put virtual > servers on it. I would like to put

Re: Populating seeds dynamically

2013-06-06 Thread Tim Wintle
On Mon, 2013-06-03 at 17:20 -0700, Aiman Parvaiz wrote: > @Faraaz check out the comment by Aaron morton here : > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Seed-Nodes-td6077958.html > Having same nodes is a good idea but it is not necessary. > > In your case, sure the nodes w

Time complexity of cassandra operations

2013-02-11 Thread Tim Wintle
Hi, I've tried searching for this all over the place, but I can't find an answer anywhere... What is the (theoretical) time complexity of basic C* operations? I assume that single lookups are O(log(R/N)) for R rows across N nodes (as SST lookups should be O(log(n)) and there are R/N rows per nod
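A rough sketch of the estimate in the post above, for readers who want to plug in numbers. It only evaluates the poster's own assumption (a single lookup costing about log2(R/N) comparisons for R rows spread evenly over N nodes); it is not a measured or documented property of Cassandra, and the function name is hypothetical.

```python
import math


def estimated_lookup_steps(total_rows, nodes):
    """Evaluate log2(R/N), the per-lookup cost assumed in the post.

    Assumption (from the question, not from Cassandra documentation):
    rows are spread evenly, so each node holds R/N rows, and a lookup
    on one node behaves like a binary search over those rows.
    """
    if total_rows <= 0 or nodes <= 0:
        raise ValueError("total_rows and nodes must be positive")
    rows_per_node = total_rows / nodes
    return math.log2(rows_per_node)


if __name__ == "__main__":
    # One billion rows on 10 nodes versus 20 nodes.
    print(estimated_lookup_steps(10**9, 10))
    print(estimated_lookup_steps(10**9, 20))
```

Under this assumption, doubling the number of nodes removes exactly one comparison per lookup, which is why the node count matters far less than the row count in the estimate.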

Re: Pycassa vs YCSB results.

2013-02-06 Thread Tim Wintle
On Tue, 2013-02-05 at 13:51 -0500, Edward Capriolo wrote: > Without stating the obvious, if you are interested in scale, then why > pick python?. I would (kind of) agree with this point. If you absolutely need performance here then python isn't the right choice. If, however, you are currently w

Re: Pycassa vs YCSB results.

2013-02-05 Thread Tim Wintle
On Tue, 2013-02-05 at 21:38 +1300, aaron morton wrote: > The first thing I noticed is your script uses python threading library, which > is hampered by the Global Interpreter Lock > http://docs.python.org/2/library/threading.html > > You don't really have multiple threads running in parallel, tr

RE: what is more important (RAM vs Cores)

2012-10-12 Thread Tim Wintle
On Fri, 2012-10-12 at 10:20 +, Viktor Jevdokimov wrote: > IMO, in most cases you'll be limited by the RAM first. +1 - I've seen our 8-core boxes limited by RAM and inter-rack networking, but not by CPU (yet). Tim

Re: Help for creating a custom partitioner

2012-10-01 Thread Tim Wintle
es - assuming the number of categories is significantly smaller than the number of documents that could make a major difference to latency. Tim > > Regards, > Clément > > 2012/9/28 Tim Wintle > > > On Fri, 2012-09-28 at 18:20 +0200, Clement Honore wrote: > > > Hi,

Re: Remove node from cluster and have it run as a single node cluster by itself

2012-09-29 Thread Tim Wintle
On Fri, 2012-09-28 at 18:53 +, Xu, Zaili wrote: > Hi, > > I have an existing Cassandra Cluster. I removed a node from the cluster. Then > I decommissioned the removed node, stopped it, updated its config so that it > only has itself as the seed and in the cassandra-topology.properties file,

Re: Help for creating a custom partitioner

2012-09-28 Thread Tim Wintle
On Fri, 2012-09-28 at 18:20 +0200, Clement Honore wrote: > Hi, > > I have hierarchical data. > > I'm storing them in CF with rowkey somewhat like (category, doc id), and > plenty of columns for a doc definition. > > I have hierarchical data traversal too. >

Re: Order of the cyclic group of hashed partitioners

2012-09-05 Thread Tim Wintle
> > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 3/09/2012, at 8:20 PM, Tim Wintle wrote: > > > On Tue, 2012-08-28 at 16:57 +1200, aaron morton wrote: > > > Sorry I don't understand your q

Re: Order of the cyclic group of hashed partitioners

2012-09-03 Thread Tim Wintle
On Tue, 2012-08-28 at 16:57 +1200, aaron morton wrote: > Sorry I don't understand your question. > > Can you explain it a bit more or maybe someone else knows. I believe the question is why is the maximum 2**127 and not 0x Tim > > Cheers > > - > Aaron Morton >

Re: What is the ideal server-side technology stack to use with Cassandra?

2012-08-17 Thread Tim Wintle
Data layer into parts that are stateless and parts which aren't then you can load balance the horizontally scalable parts of that layer using something like haproxy too if you need to. Tim Wintle

Re: increased RF and repair, not working?

2012-07-30 Thread Tim Wintle
> Tamar Fraenkel > Senior Software Engineer, TOK Media > > ta...@tok-media.com > Tel: +972 2 6409736 > Mob: +972 54 8356490 > Fax: +972 2 5612956 > > > On Mon, Jul 30, 2012 at 3:14 PM, Tim Wintle > wrote: > On Mon, 2012-07

Re: increased RF and repair, not working?

2012-07-30 Thread Tim Wintle
On Mon, 2012-07-30 at 14:40 +0300, Tamar Fraenkel wrote: > Hi! > To clarify it a bit more, > Let's assume the setup is changed to > RF=3 > W_CL=QUORUM (or two for that matter) > R_CL=ONE > The setup will now work for both read and write in case of one node > failure. > What are the disadvantages,
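One way to reason about the setup quoted above is the usual replica-overlap rule: a read is guaranteed to touch at least one replica that acknowledged the latest write only when R + W > RF. The sketch below is a minimal illustration of that arithmetic; the helper names are hypothetical and it does not call any Cassandra API.

```python
def quorum(replication_factor):
    """Number of replicas that make up a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_overlap_writes(replication_factor, write_replicas, read_replicas):
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


def survives_one_node_down(replication_factor, write_replicas, read_replicas):
    """True when both operations can still be served with one replica down."""
    available = replication_factor - 1
    return write_replicas <= available and read_replicas <= available


if __name__ == "__main__":
    rf = 3
    w = quorum(rf)  # W_CL=QUORUM means 2 replicas when RF=3
    r = 1           # R_CL=ONE
    print(survives_one_node_down(rf, w, r))  # availability with one node down
    print(reads_overlap_writes(rf, w, r))    # whether reads must see the latest write
```

With RF=3, W=2 and R=1 the setup stays available with one node down, but R + W equals RF rather than exceeding it, so a read at ONE may land on the one replica that has not yet received the latest write.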

Re: Distinct Counter Proposal for Cassandra

2012-06-29 Thread Tim Wintle
Would it be possible to support this in a more general case by providing a distributed |= operator over arbitrary byte strings (like the + operator on counter columns), which would allow distributed bloom filters as well? Tim Wintle On Fri, Jun 29, 2012 at 6:31 AM, Chris Burroughs wrote: > W

RE: Problem in getting data from a 2 node cluster

2012-06-06 Thread Tim Wintle
entire dataset in the single node cluster, or has it been lost along the way? What is the replication factor for your data? Tim Wintle

Re: Data modeling advice (time series)

2012-05-02 Thread Tim Wintle
On Tue, 2012-05-01 at 11:00 -0700, Aaron Turner wrote: > Tens or a few hundred MB per row seems reasonable. You could do > thousands/MB if you wanted to, but that can make things harder to > manage. thanks (Both Aarons) > Depending on the size of your data, you may find that the overhead of > ea

Data modeling advice (time series)

2012-05-01 Thread Tim Wintle
I believe that the general design for time-series schemas looks something like this (correct me if I'm wrong): (storing time series for X dimensions for Y different users) Row Keys: "{USER_ID}_{TIMESTAMP/BUCKETSIZE}" Columns: "{DIMENSION_ID}_{TIMESTAMP%BUCKETSIZE}" -> {Counter} But I've not fou
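A minimal sketch of the key layout described in the post above, assuming integer Unix timestamps in seconds and integer division for the bucket number. The function name and the example IDs are hypothetical, and the sketch only builds the strings; it does not talk to Cassandra.

```python
def time_series_keys(user_id, dimension_id, timestamp, bucket_size):
    """Build the row key and column name for one counter increment.

    Assumptions (not stated in the post): timestamp and bucket_size are
    integers in the same unit, and TIMESTAMP/BUCKETSIZE means integer
    division, so every timestamp in a bucket shares one row.
    """
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    bucket, offset = divmod(timestamp, bucket_size)
    row_key = f"{user_id}_{bucket}"
    column_name = f"{dimension_id}_{offset}"
    return row_key, column_name


if __name__ == "__main__":
    # 2012-05-01 12:00:00 UTC with one-day buckets (86400 seconds).
    print(time_series_keys("u42", "clicks", 1335873600, 86400))
```

The bucket size controls the trade-off raised in the thread: larger buckets mean fewer, wider rows, while smaller buckets mean more rows with fewer columns each.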