Re: memory consumption

2011-02-18 Thread Peter Schuller
> Jonathan,
> When you get time could you please explain that a little more. Got a feeling
> I'm about to learn something :)

I'm not Jonathan, but: the operating system's virtual memory system
supports mapping files into a process' address space. This "uses"
virtual memory, i.e. address space. On 32-bit systems this was until
recently a real issue, since running out of address space was a
practical possibility; with 64-bit systems (even if you can't address
the full 64 bits) it won't be an issue for a long while - making
virtual address space essentially "free".

What matters from the perspective of "memory use", in the sense
normally meant, is the amount of data allocated in brk():ed memory or
mmap():ed /dev/zero, which represents real memory used (or possibly
swap space, but unless the memory is never accessed again, that's
usually not interesting from the point of view of "how much memory do
I need").

The key issue is that for an mmap():ed file, there is never a need to
retain the data in physical memory (= resident). Thus, whatever you do
keep resident in physical memory is essentially just there as a cache,
in the same way that normal I/O causes the kernel page cache to
retain the data you read/write. The difference between normal I/O
and mmap() is that in the mmap() case the memory is actually mapped
into the process, thus affecting the virtual size as reported by top.
The main argument for using mmap() instead of standard I/O is that
reading entails just touching memory - if the memory is resident,
you just read it; you don't even take a page fault (so no overhead
from entering the kernel and doing a semi-context switch).
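
To make that read path concrete, here is a minimal Java sketch of
mmap() as exposed through NIO (the file name and offset are made up):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapRead {
    public static void main(String[] args) throws Exception {
        // Mapping grows the virtual size immediately, but costs
        // physical memory only for pages that become resident.
        RandomAccessFile raf = new RandomAccessFile("Data.db", "r");
        FileChannel ch = raf.getChannel();
        MappedByteBuffer buf =
            ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        // Reading is just a memory access; if the page is resident,
        // no syscall is involved at all.
        System.out.println(buf.get(0));
        raf.close();
    }
}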

A downside of mmap() is that you have less control over how I/O is
done (you can't say "read 60 MB starting here"; instead you traverse
pages hoping prefetch and/or read-ahead will help you. This can be
mitigated with posix_fadvise(), but then you're back to doing syscalls).

The other effect of mmap() is that it seems to influence the kernel's
sense of the relative priority of pages when deciding what to drop
or swap out, such that mmap() has a tendency to cause swapping out of
the JVM heap. But this is not because the process actually uses more
memory as such.

I didn't re-read it just now, but from scrolling through it, the
Wikipedia article seems like a pretty good intro:

   http://en.wikipedia.org/wiki/Virtual_memory

-- 
/ Peter Schuller


Re: memory consumption

2011-02-18 Thread Peter Schuller
> main argument for using mmap() instead of standard I/O is the fact
> that reading entails just touching memory - in the case of the memory
> being resident, you just read it - you don't even take a page fault
> (so no overhead in entering the kernel and doing a semi-context
> switch).

Oh, and in the case of Java/Cassandra, as Jonathan clued me in on
earlier, there is also the issue that byte[] arrays are mandated to be
zeroed when allocated. That typically causes overhead, because there
has to be a loop [1] somewhere writing in a bunch of zeroes that
you're then just going to overwrite immediately. Mapping a file has no
such implications as long as you read directly from the underlying
direct ByteBuffer.
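
As an illustration (a made-up sketch, not Cassandra's actual read
path): copying into a byte[] forces the JVM to zero the array first,
while reading in place off the mapped buffer does not:

import java.nio.MappedByteBuffer;

public class ZeroingCost {
    // The new array must be zeroed per the JVM spec, and those
    // zeroes are immediately overwritten by the file contents.
    static byte[] copyOut(MappedByteBuffer buf, int len) {
        byte[] dst = new byte[len];
        buf.get(dst);
        return dst;
    }

    // Reading in place: no intermediate array, nothing to zero.
    static long readInPlace(MappedByteBuffer buf, int offset) {
        return buf.getLong(offset);
    }
}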

[1] Not *necessarily*; a JVM could theoretically do byte[] allocations
in such a way that it already knows the contents are zeroed, but
whether this is practical depends heavily on the GC/memory-management
technique used by the JVM. (It just occurred to me that Azul should
get this for 'free' in their GC. Wonder if that's true.)

-- 
/ Peter Schuller


Re: Frequent updates of freshly written columns

2011-02-18 Thread Sylvain Lebresne
On Fri, Feb 18, 2011 at 8:14 AM, Aklin_81  wrote:

> Are the very freshly written columns to a row in memtables, efficiently
> updated/overwritten by edited/new column values.
>
> After flushing of memtable, are those(edited + unedited ones) columns
> stored together on disk (in same blocks!?) as if they were written in one
> single operation or same time ?? I know if old columns are edited then
> several copies of same column will be dispersed in different sst tables,
> what about fresh columns ?
>
> Are there any disadvantages to frequently updating fresh columns present in
> memtable ?
>

The SSTables are immutable but the memtables are not. As long as you
update/overwrite a column that is still in a memtable, it is simply
replaced in memory (so it's as efficient as it gets).
In other words, when the memtable is flushed, only the last version of the
column goes in.
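
As a toy illustration of why the overwrite is cheap (not Cassandra's
actual memtable code) - an in-memory sorted map just replaces the
value, and a flush would write only the last version:

import java.util.concurrent.ConcurrentSkipListMap;

public class ToyMemtable {
    public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> memtable =
            new ConcurrentSkipListMap<String, String>();
        memtable.put("row1:col1", "v1");
        memtable.put("row1:col1", "v2"); // replaced in place
        // Flushing now would write only "v2" to the new SSTable.
        System.out.println(memtable.get("row1:col1"));
    }
}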

--
Sylvain


Re: cluster size, several cluster on one node for multi-tenancy

2011-02-18 Thread Mimi Aluminium
Thanks a lot for your suggestions.
I will check the virtual keyspace solution - btw, currently I am using
a Thrift client with Pycassa, and I am not familiar with Hector - does it
mean we'll need to move to the Hector client?

I thought of using a keyspace for each tenant, but I don't understand how to
define the whole cluster. Meaning: assuming the tenants are distributed
(replicated) across hundreds of DCs, each consisting of tens of racks and
servers, can I define a single Cassandra cluster for all the servers? It
does not seem reasonable; this is the reason I thought of separating
the clusters. Please let me know how you would solve it?
Thanks,
Miriam



On Thu, Feb 17, 2011 at 10:30 PM, Nate McCall  wrote:

> Hector's virtual keyspaces would work well for what you describe. Ed
> Anuff, who added this feature to Hector, showed me a working
> multi-tennancy based app the other day and it worked quite well.
>
> On Thu, Feb 17, 2011 at 1:44 PM, Norman Maurer  wrote:
> > Maybe you could make use of "Virtual Keyspaces".
> >
> > See this wiki for the idea:
> > https://github.com/rantav/hector/wiki/Virtual-Keyspaces
> >
> > Bye,
> > Norman
> >
> > 2011/2/17 Frank LoVecchio :
> >> Why not just create some sort of ACL on the client side and use one
> >> Keyspace?  It's a lot less management.
> >>
> >> On Thu, Feb 17, 2011 at 12:34 PM, Mimi Aluminium <
> mimi.alumin...@gmail.com>
> >> wrote:
> >>>
> >>> Hi,
> >>> I really need your help in this matter.
> >>> I will try to simplify my problem and ask specific questions
> >>>
> >>> I am thinking of solving the multi-tenancy problem by providing a
> separate
> >>> cluster per each tenant. Does it sound reasonable?
> >>> I can end-up with one node belongs to several clusters.
> >>> Does Cassandra support several clusters per node? Does it mean several
> >>> Cassandra daemons on each node? Do you recommend doing that ? what is
> the
> >>> overhead? is there any link that explain how to do that?
> >>>
> >>> Thanks a lot,
> >>> Mimi
> >>>
> >>>
> >>> On Wed, Feb 16, 2011 at 6:43 PM, Mimi Aluminium <
> mimi.alumin...@gmail.com>
> >>> wrote:
> 
>  Hi,
>  We are interested in a multi-tenancy environment, that may consist of
> up
>  to hundreds of data centers. The current design requires cross rack
> and
>  cross DC replication. Specifically, the per-tenant CFs will be
> replicated 6
>  times: in three racks,  with 2 copies inside a rack, the racks will be
>  located in at least two different DCs. In the future other replication
>  policies will be considered. The application will decide where (which
> racks
>  and DC)  to place each tenant's replicas.  and it might be that one
> rack can
>  hold more than one tenant.
> 
>  Separating each tenant in a different keyspace, as was suggested
>  in  previous mail thread in this subject, seems to be a good approach
>  (assuming the memtable problem will be solved somehow).
>  But then we had concern with regard to the cluster size.
>  and here are my questions:
>  1) Given the above, should I define one Cassandra cluster that hold
> all
>  the DCs? sounds not reasonable  given hundreds DCs tens of servers in
> each
>  DC etc. Where is the bottleneck here? keep-alive messages, the gossip,
>  request routing? what is the largest number of servers a cluster can
> bear?
>  2) Now assuming that I can create the per-tenant  keyspace only for
>  the
>  servers that in the three racks where the replicas are held,  does
> such
>  definition reduces the messaging transfer among the other servers.
> Does
>  Cassandra optimizes the message transfer in such case?
>  3) Additional possible solution was to create a separate clusters per
>  each tenant. But it can cause a situation where one server has to run
> two or
>  more Cassandra's clusters. Can we run more than one cluster in
> parallel,
>  does it means two cassandra daemons / instances on one server? what
> will be
>  the overhead? do you have a link that explains how to deal with it?
> 
>  Please can you help me to decide which of these solution can work or
> you
>  are welcome to suggest something else.
>  Thanks a lot,
>  Mimi
> 
> 
> 
> 
> 
> 
> 
> >>
> >>
> >>
> >> --
> >> Frank LoVecchio
> >> Senior Software Engineer | Isidorey, LLC
> >> Google Voice +1.720.295.9179
> >> isidorey.com | facebook.com/franklovecchio | franklovecchio.com
> >>
> >
>


How to use NetworkTopologyStrategy

2011-02-18 Thread Héctor Izquierdo Seliva
Hi!

Can somebody give me some hints about how to configure a keyspace with
NetworkTopologyStrategy via cassandra-cli? Or what is the preferred
method to do so?
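
For reference, the kind of thing I have been attempting is below - the
syntax is guessed from the docs, so the strategy_options part in
particular may well be wrong:

create keyspace MyKeyspace
    with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
    and strategy_options = [{DC1:2, DC2:1}];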

Thanks!



Queries on secondary indexes

2011-02-18 Thread Rauan Maemirov
With this schema:

create column family Userstream with comparator=UTF8Type and rows_cached =
1 and keys_cached = 10
and column_metadata=[{column_name:account_id, validation_class:IntegerType,
index_type: 0, index_name:UserstreamAccountidIdx},
{column_name:from_id, validation_class:IntegerType, index_type: 0,
index_name:UserstreamFromidIdx},
{column_name:type, validation_class:IntegerType, index_type: 0,
index_name:UserstreamTypeIdx}];

I'm seeing this:

[default@Keyspace1] get Userstream where from_id=5 and type<4;
---
RowKey: 23:feed:12980301937245
=> (column=account_id, value=23, timestamp=1298031252270173)
=> (column=activities,
value=5b2232313864333936302d336235362d313165302d393838302d666235613434333135343865225d,
timestamp=1298031252270173)
=> (column=from_id, value=5, timestamp=1298031252270173)
=> (column=type, value=5, timestamp=1298031252270173)
---
RowKey: 5:feed:12980301937196
=> (column=account_id, value=5, timestamp=1298031252270173)
=> (column=activities,
value=5b2232313863376339302d336235362d313165302d623536342d666235303739333835303234225d,
timestamp=1298031252270173)
=> (column=from_id, value=5, timestamp=1298031252270173)
=> (column=type, value=5, timestamp=1298031252270173)
---
RowKey: 9:feed:12980301937207
=> (column=account_id, value=9, timestamp=1298031252270173)
=> (column=activities,
value=5b2232313863613637302d336235362d313165302d39622d373530393638613764326561225d,
timestamp=1298031252270173)
=> (column=from_id, value=5, timestamp=1298031252270173)
=> (column=type, value=0, timestamp=1298031252270173)

3 Rows Returned.


and

[default@Keyspace1] get Userstream where from_id=5 and type=5;

0 Row Returned.



What's wrong with it?


Re: Understand eventually consistent

2011-02-18 Thread Anthony John
At Quorum - if 2 of 3 nodes are down, a read should not be returned, right ?

But yes - if single node READs are opted for, it will go through.

The original question was - "Why is Cassandra called eventually consistent
data store?"
Because at write time, there is not a guarantee that all replicas are
consistent. But they eventually will be!

At Quorum write and Read - you will not get inconsistent results and your
read will force consistency, if such a state has not yet been arrived at for
the particular piece of data.

But you have the option of writing and reading at a lower standard, which
could result in inconsistencies.

HTH,

-JA

On Fri, Feb 18, 2011 at 12:00 AM, Stu Hood  wrote:

> But, the reason that it isn't safe to say that we are a strongly consistent
> store is that if 2 of your 3 replicas were to die and come back with no
> data, QUORUM might return the wrong result.
>
> A requirement of a strongly consistent store is that replicas cannot begin
> answering queries until they are consistent: this is not a requirement in
> Cassandra, although arguably it should be an option at some point in the
> distant future.
>
>
> On Thu, Feb 17, 2011 at 5:26 PM, Aaron Morton wrote:
>
>> For background...
>>
>> http://wiki.apache.org/cassandra/ArchitectureOverview
>> (There is a section on consistency in there)
>>
>> For  deep background...
>> http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
>>
>> http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
>>
>> In short, yes (for all your questions) if you read and write at Quorum you
>> have consistency behavior for your operations. Even though some nodes
>> may have an inconsistent view of the data, e.g. one node is partitioned by
>> a broken network or is overloaded and does not respond.
>>
>> Aaron
>>
>> On 18 Feb, 2011,at 02:11 PM, mcasandra  wrote:
>>
>>
>> Why is Cassandra called an eventually consistent data store? Wouldn't it be
>> consistent if QUORUM is used?
>>
>> Another question is when I specify replication factor of 3 and write with
>> factor of 2 and read with factor of 2 then what happens?
>>
>> 1. When write occurs cassandra will return to the client only when the
>> writes go to commit log on 2 nodes successfully?
>>
>> 2. When read happens cassandra will return only when it is able to read
>> from
>> 2 nodes and determine that it has consistent copy?
>> --
>> View this message in context:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Understand-eventually-consistent-tp6038330p6038330.html
>> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
>> Nabble.com.
>>
>>
>


Re: Understand eventually consistent

2011-02-18 Thread Markus Klems
Related question: Is it a good idea to specify ConsistencyLevels on a
per-operation basis? For example: Read ONE Write ALL would deliver
consistent read results, just like Read ALL Write ONE. However, if you
specify Read ONE Write QUORUM you cannot give such guarantees anymore.
Should there be (is there) a programming abstraction on top of
ConsistencyLevel that takes care of these things and makes them
explicit to the application developer?
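
A small sketch of such a check (a hypothetical helper illustrating the
R + W > N rule, not an existing library API):

public class ConsistencyCheck {
    // true if a read is guaranteed to overlap the latest write
    static boolean consistent(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        System.out.println(consistent(1, 3, n)); // R=ONE, W=ALL    -> true
        System.out.println(consistent(3, 1, n)); // R=ALL, W=ONE    -> true
        System.out.println(consistent(1, 2, n)); // R=ONE, W=QUORUM -> false
    }
}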

On Fri, Feb 18, 2011 at 2:04 PM, Anthony John  wrote:
> At Quorum - if 2 of 3 nodes are down, a read should not be returned, right ?
> But yes - if single node READs are opted for, it will go through.
> The original question was - "Why is Cassandra called eventually consistent
> data store?"
> Because at write time, there is not a guarantee that all replicas are
> consistent. But they eventually will be!
> At Quorum write and Read - you will not get inconsistent results and your
> read will force consistency, if such a state has not yet been arrived at for
> the particular piece of data.
> But you have the option of writing and reading at a lower standard, which
> could result in inconsistencies.
> HTH,
> -JA
> On Fri, Feb 18, 2011 at 12:00 AM, Stu Hood  wrote:
>>
>> But, the reason that it isn't safe to say that we are a strongly
>> consistent store is that if 2 of your 3 replicas were to die and come back
>> with no data, QUORUM might return the wrong result.
>> A requirement of a strongly consistent store is that replicas cannot begin
>> answering queries until they are consistent: this is not a requirement in
>> Cassandra, although arguably it should be an option at some point in the
>> distant future.
>>
>> On Thu, Feb 17, 2011 at 5:26 PM, Aaron Morton 
>> wrote:
>>>
>>> For background...
>>> http://wiki.apache.org/cassandra/ArchitectureOverview
>>> (There is a section on consistency in there)
>>> For  deep background...
>>> http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
>>>
>>> http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
>>> In short, yes (for all your questions) if you read and write at Quorum
>>> you have consistency behavior for your operations. Even though some nodes
>>> may have an inconsistent view of the data, e.g. one node is partitioned
>>> by a broken network or is overloaded and does not respond.
>>>
>>> Aaron
>>> On 18 Feb, 2011,at 02:11 PM, mcasandra  wrote:
>>>
>>>
>>> Why is Cassandra called an eventually consistent data store? Wouldn't it be
>>> consistent if QUORUM is used?
>>>
>>> Another question is when I specify replication factor of 3 and write with
>>> factor of 2 and read with factor of 2 then what happens?
>>>
>>> 1. When write occurs cassandra will return to the client only when the
>>> writes go to commit log on 2 nodes successfully?
>>>
>>> 2. When read happens cassandra will return only when it is able to read
>>> from
>>> 2 nodes and determine that it has consistent copy?
>>> --
>>> View this message in context:
>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Understand-eventually-consistent-tp6038330p6038330.html
>>> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
>>> Nabble.com.
>>
>
>


Re: Frequent updates of freshly written columns

2011-02-18 Thread James Churchman
but a compaction will mutate the sstables and reclaim the space (eventually)  ? 


james

On 18 Feb 2011, at 08:36, Sylvain Lebresne wrote:

> On Fri, Feb 18, 2011 at 8:14 AM, Aklin_81  wrote:
> Are the very freshly written columns to a row in memtables, efficiently 
> updated/overwritten by edited/new column values. 
> 
> After flushing of memtable, are those(edited + unedited ones) columns stored 
> together on disk (in same blocks!?) as if they were written in one single 
> operation or same time ?? I know if old columns are edited then several 
> copies of same column will be dispersed in different sst tables, what about 
> fresh columns ?
> 
> Are there any disadvantages to frequently updating fresh columns present in 
> memtable ? 
> 
> The SSTables are immutable but the memtable are not. As long as you 
> update/overwrite a column that is still in memtable, it is simply replaced in 
> memory (so it's as efficient as it gets).
> In other words, when the memtable is flushed, only the last version of the 
> column goes in. 
> 
> --
> Sylvain



Re: Frequent updates of freshly written columns

2011-02-18 Thread Aklin_81
Compaction does not 'mutate' the sst files; it 'merges' several sst files
into one, with new indexes, merged data rows & tombstones deleted. Thus you
reclaim your disk space.


On Fri, Feb 18, 2011 at 7:34 PM, James Churchman
wrote:

> but a compaction will mutate the sstables and reclaim the
> space (eventually)  ?
>
>
> james
>
> On 18 Feb 2011, at 08:36, Sylvain Lebresne wrote:
>
> On Fri, Feb 18, 2011 at 8:14 AM, Aklin_81  wrote:
>
>> Are the very freshly written columns to a row in memtables, efficiently
>> updated/overwritten by edited/new column values.
>>
>> After flushing of memtable, are those(edited + unedited ones) columns
>> stored together on disk (in same blocks!?) as if they were written in one
>> single operation or same time ?? I know if old columns are edited then
>> several copies of same column will be dispersed in different sst tables,
>> what about fresh columns ?
>>
>> Are there any disadvantages to frequently updating fresh columns present
>> in memtable ?
>>
>
> The SSTables are immutable but the memtable are not. As long as you
> update/overwrite a column that is still in memtable, it is simply replaced
> in memory (so it's as efficient as it gets).
> In other words, when the memtable is flushed, only the last version of the
> column goes in.
>
> --
> Sylvain
>
>
>


Replacing Redis

2011-02-18 Thread Benson Margulies
I'm about to launch off on replacing redis with cassandra. I wonder if
anyone else has ever been there and done that.


Re: Replacing Redis

2011-02-18 Thread Joshua Partogi
Any reason why you want to do that?

On Sat, Feb 19, 2011 at 1:32 AM, Benson Margulies  wrote:
> I'm about to launch off on replacing redis with cassandra. I wonder if
> anyone else has ever been there and done that.
>



-- 
http://twitter.com/jpartogi


Re: Replacing Redis

2011-02-18 Thread Benson Margulies
redis times out at random regardless of what we configure for client
timeouts; the platform-sensitive binaries are painful for us since we
support many platforms; just to name two reasons.

On Fri, Feb 18, 2011 at 10:04 AM, Joshua Partogi  wrote:
> Any reason why you want to do that?
>
> On Sat, Feb 19, 2011 at 1:32 AM, Benson Margulies  
> wrote:
>> I'm about to launch off on replacing redis with cassandra. I wonder if
>> anyone else has ever been there and done that.
>>
>
>
>
> --
> http://twitter.com/jpartogi
>


R and N

2011-02-18 Thread A J
Questions about R and N (and W):
1. If I set R to Quorum and cassandra identifies a need for read
repair before returning, would the read repair happen on R nodes (I
mean subset of R that needs repair) or N nodes before the data is
delivered to the client ?
2. Also does the repair happen at level of row (key) or at level of column ?

3. During write, if W is met but N-W is not met for some reason; would
cassandra try to repair N-W nodes in the background as and when it
can. Or the N-W are only repaired when a read is issued ?

4. What is the significance of the 'primary' replica for writes, from a
usage point of view? Writes to primary and non-primary replicas all happen
simultaneously. Meeting W is decided irrespective of a replica being primary
or not, and meeting R is decided by any R of the N nodes.
I know the tokens are divided per the primary replica. But other than
that, for read and write operations, does the primary replica play any
special role?

Thanks.


Re: Frequent updates of freshly written columns

2011-02-18 Thread James Churchman
ok great, thanks for the exact clarification

On 18 Feb 2011, at 14:11, Aklin_81 wrote:

> Compaction does not 'mutate' the sst files, it 'merges' several sst files 
> into one with new indexes, merged data rows & deleting tombstones. Thus you 
> reclaim your disk space.
> 
> 
> On Fri, Feb 18, 2011 at 7:34 PM, James Churchman  
> wrote:
> but a compaction will mutate the sstables and reclaim the space (eventually)  
> ? 
> 
> 
> james
> 
> On 18 Feb 2011, at 08:36, Sylvain Lebresne wrote:
> 
>> On Fri, Feb 18, 2011 at 8:14 AM, Aklin_81  wrote:
>> Are the very freshly written columns to a row in memtables, efficiently 
>> updated/overwritten by edited/new column values. 
>> 
>> After flushing of memtable, are those(edited + unedited ones) columns stored 
>> together on disk (in same blocks!?) as if they were written in one single 
>> operation or same time ?? I know if old columns are edited then several 
>> copies of same column will be dispersed in different sst tables, what about 
>> fresh columns ?
>> 
>> Are there any disadvantages to frequently updating fresh columns present in 
>> memtable ? 
>> 
>> The SSTables are immutable but the memtable are not. As long as you 
>> update/overwrite a column that is still in memtable, it is simply replaced 
>> in memory (so it's as efficient as it gets).
>> In other words, when the memtable is flushed, only the last version of the 
>> column goes in. 
>> 
>> --
>> Sylvain
> 
> 



Re: Coordinator node

2011-02-18 Thread A J
Hi,
Are there any blogs/writeups anyone is aware of that talk about using the
primary replica as the coordinator node (rather than a random coordinator
node) in production scenarios?

Thank you.


On Wed, Feb 16, 2011 at 10:53 AM, A J  wrote:
> Thanks for the confirmation. Interesting alternatives to avoid random
> coordinator.
> Are there any blogs/writeups of them (primary node as coordinator) being
> used in production scenarios? I googled but could not find anything
> relevant.
> On Wed, Feb 16, 2011 at 3:25 AM, Oleg Anastasyev  wrote:
>>
>> A J  gmail.com> writes:
>>
>> >
>> >
>> > Makes sense ! Thanks.
>> > Just a quick follow-up:
>> > Now I understand the write is not made to coordinator (unless it is part
>> > of
>> the replica for that key). But does the write column traffic 'flow'
>> through the
>> coordinator node. For a 2G column write, will I see 2G network traffic on
>> the
>> coordinator node  or just a few bytes of traffic on the co-ordinator of it
>> reading the key and talking to nodes/client etc ?
>>
>> Yes, if you talk to a random (AKA coordinator) node first - all 2G of
>> traffic will flow to it first and then be forwarded to the natural nodes
>> (those owning replicas of the row to be written).
>> If you want to avoid extra traffic, you should determine natural nodes of
>> the
>> row and send your write directly to one of natural nodes (i.e. one of
>> natural
>> nodes became coordinator). This natural coordinator node will accept write
>> locally and submit write to other replicas in parallel.
>> If your client is written in java this can be implemented relatively easy.
>> Look
>> at TokenMetadata.ringIterator().
>>
>> If you have no requirement on using thrift interface of cassandra, it
>> could be
>> more efficient to write using StorageProxy interface. The latter plays a
>> local
>> coordinator role, so it talks directly to all replicas, so these 2G will
>> be
>> passed directly from your client to all row replicas.
>>
>>
>> >
>> > This will be a factor for us. So need to make sure exactly.
>>
>>
>
>


Re: Understand eventually consistent

2011-02-18 Thread Jonathan Ellis
On Fri, Feb 18, 2011 at 12:00 AM, Stu Hood  wrote:
> But, the reason that it isn't safe to say that we are a strongly consistent
> store is that if 2 of your 3 replicas were to die and come back with no
> data, QUORUM might return the wrong result.

Not so.  If you allow vaporizing arbitrary numbers of machines
without a trace, then only systems that block for all replicas on each
update could be considered strongly consistent, and I don't know of
any systems in the wild that actually do that.  Certainly other
systems commonly considered "strongly consistent", like HBase, do not.

> A requirement of a strongly consistent store is that replicas cannot begin
> answering queries until they are consistent

The system as a whole can be consistent even if an individual replica
is not; that is the point of CL > ONE.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Schema init 'best practice'

2011-02-18 Thread Benson Margulies
I want to package some schema with a library.

I could use the hector API to create the schema if not found. Or I
could, what, stuff a yaml file into something? Is there an API for
that, or do I end up where I started?


Re: R and N

2011-02-18 Thread A J
Couple of more related questions:

5. For reads, does Cassandra read all N nodes or just the R nodes it
selects? I am thinking that unless it reads all N nodes, how will it
know which node has the latest write?

6. Who decides the timestamp that gets inserted into the timestamp
field of every column? I would guess the coordinator node picks up its
system's timestamp.  If that is true, the clocks on all the nodes
should be synchronized, right? Otherwise conflict resolution cannot
be done correctly.
For a distributed system, this is not always possible. How do folks
get around this issue?

Thanks.



On Fri, Feb 18, 2011 at 10:23 AM, A J  wrote:
> Questions about R and N (and W):
> 1. If I set R to Quorum and cassandra identifies a need for read
> repair before returning, would the read repair happen on R nodes (I
> mean subset of R that needs repair) or N nodes before the data is
> delivered to the client ?
> 2. Also does the repair happen at level of row (key) or at level of column ?
>
> 3. During write, if W is met but N-W is not met for some reason; would
> cassandra try to repair N-W nodes in the background as and when it
> can. Or the N-W are only repaired when a read is issued ?
>
> 4. What is the significance of the 'primary' replica for writes from
> usage point ? Writes to primary and non-primary replicas all happen
> simultaneously. Ensuring W is decided irrespective of it being primary
> or not. Ensuring R is decided by any of the R nodes out of N.
> I know the tokens are divided per the primary replica. But other than
> that, for read and write operations, do the primary replica play any
> special role ?
>
> Thanks.
>


Re: Schema init 'best practice'

2011-02-18 Thread Jonathan Ellis
On Fri, Feb 18, 2011 at 9:59 AM, Benson Margulies  wrote:
> I want to package some schema with a library.
>
> I could use the hector API to create the schema if not found.

That's probably simplest for your users.  (This is what stress.java
does, for instance.)

Otherwise, I'd recommend bundling a cli script that does the desired
create statements.
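
For example, a minimal bundled script might look like this (keyspace
and column family names are hypothetical; adjust options to taste):

create keyspace MyApp with replication_factor = 1;
use MyApp;
create column family Users with comparator = UTF8Type;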

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Replacing Redis

2011-02-18 Thread Jonathan Shook
Benson,
I was considering using Redis for a specific project. Can you
elaborate a bit on your problem with it? What were the circumstances,
loading factors, etc?

On Fri, Feb 18, 2011 at 9:19 AM, Benson Margulies  wrote:
> redis times out at random regardless of what we configure for client
> timeouts; the platform-sensitive binaries are painful for us since we
> support many platforms; just to name two reasons.
>
> On Fri, Feb 18, 2011 at 10:04 AM, Joshua Partogi  
> wrote:
>> Any reason why you want to do that?
>>
>> On Sat, Feb 19, 2011 at 1:32 AM, Benson Margulies  
>> wrote:
>>> I'm about to launch off on replacing redis with cassandra. I wonder if
>>> anyone else has ever been there and done that.
>>>
>>
>>
>>
>> --
>> http://twitter.com/jpartogi
>>
>


RE: cassandra & php

2011-02-18 Thread David Quattlebaum
John,

 

Just wondering what you are using if not phpcassa?

 

Thanks!

 

David

 

From: John Lennard [mailto:j...@gravitate.co.nz] 
Sent: Thursday, February 17, 2011 6:41 PM
To: user@cassandra.apache.org
Subject: Re: cassandra & php

 

Hi,

 

How does this connection pooling fit in with the TSocketPool.php
classes? Or am I off the wicket here?

 

These are just a few of my observations in relation to what I have seen
so far when working with PHP and Cassandra. I have been working with
Cassandra / PHP for the last 8 months now on a project, and while not
using phpcassa, it strikes me that the Thrift layer in PHP may need some
energy directed at it. Reads in particular do seem noticeably slow, and I
am not sure if this is tied in with the PHP socket implementation, how
my test cluster is currently set up, or how I am currently working with
and structuring my data. I also wonder if there are other aspects of the
thrift layer that could be pushed into a native module, as there still
seems to be lots of PHP code present in the thrift classes.

 

Another observation I have made during this work is that xdebug has a
significant effect on performance, which can make profiling a little
more challenging.

 

 

 

Regards

John

 

 

 

On 18/02/2011, at 10:49 AM, Tyler Hobbs wrote:





what i'm not entirely happy with in using php versus java/hector
is that there isn't any connection pooling.  maybe that's just me and my
poor skills.  


Better connection pooling and failover are on the way.  You can check on
the progress in the connection-pooling branch here:
https://github.com/thobbs/phpcassa/tree/connection-pooling

I just haven't had time to wrap it up lately, but it should be done
soon.

-- 
Tyler Hobbs
Software Engineer, DataStax
Maintainer of the pycassa Cassandra Python client library

 



Re: Replacing Redis

2011-02-18 Thread Benson Margulies
typical experiment.

Redis 2.0.4 deployed on my macbook pro.

Saves enabled.

appendfsync off.

vm enabled, 1g max memory.

72 databases. Each database asked to store 13*N key-value pairs with
lpush, bucket size not very big, N -> 500,000.

Client jredis.

Start running against a stream of inputs. Sooner or later, client
times out 'last operation not performed'.

These could be jredis bugs, but I don't care to find out.


Re: Frequent updates of freshly written columns

2011-02-18 Thread Aklin_81
Sylvain,
I also need to store data that is frequently updated, the same column
being updated several times during each user session, at each action
by the user. But this data is not very fresh, and hence when I update
this column frequently, there will be many versions of the same column
in several sst files!
Reading this type of data would not be too efficient, I guess, as the
row would be totally scattered!

Could there be any better strategy to store such data in Cassandra?

(Since the column holds aggregate data obtained from all actions of
the users, I have the need to update that same column again & again.)


My other doubt: when an old column has been updated and exists in the
memtable, but other versions of the column exist in SST tables, do the
reads also scan the sst tables for that column after the memtable, or
is it smart enough to know that this column is the most recent one?

On Fri, Feb 18, 2011 at 10:32 PM, Aklin_81  wrote:
>
> Sylvain,
> I also need to store data that is frequently updated, same column being 
> updated several times during each user session, at each action by user, But, 
> this data is not very fresh and hence when I update this column frequently, 
> there would be many versions of the same column in several sst files!
> Reading this type of data would not be too efficient I guess as the row would 
> be totally scattered!
>
> Could there be any better strategy to store such data in cassandra?
>
> (Since the column holds an aggregate data obtained from all actions of the 
> users, I have the need of updating that same column again & again)
>
>
> my another doubt,  When old column has been updated and exists in the 
> memtable, but other versions of the column in SST tables exist, do the reads 
> also scan the sst tables for that column, after memtable. or is that smart 
> enough to say that this column is the most recent one ?
>
>
>
>
> On Fri, Feb 18, 2011 at 8:54 PM, James Churchman  
> wrote:
>>
>> ok great, thanks for the exact clarification
>> On 18 Feb 2011, at 14:11, Aklin_81 wrote:
>>
>> Compaction does not 'mutate' the sst files, it 'merges' several sst files 
>> into one with new indexes, merged data rows & deleting tombstones. Thus you 
>> reclaim your disk space.
>>
>>
>> On Fri, Feb 18, 2011 at 7:34 PM, James Churchman  
>> wrote:
>>>
>>> but a compaction will mutate the sstables and reclaim the 
>>> space (eventually)  ?
>>>
>>> james
>>> On 18 Feb 2011, at 08:36, Sylvain Lebresne wrote:
>>>
>>> On Fri, Feb 18, 2011 at 8:14 AM, Aklin_81  wrote:

 Are the very freshly written columns to a row in memtables, efficiently 
 updated/overwritten by edited/new column values.

 After flushing of memtable, are those(edited + unedited ones) columns 
 stored together on disk (in same blocks!?) as if they were written in one 
 single operation or same time ?? I know if old columns are edited then 
 several copies of same column will be dispersed in different sst tables, 
 what about fresh columns ?

 Are there any disadvantages to frequently updating fresh columns present 
 in memtable ?
>>>
>>> The SSTables are immutable but the memtable are not. As long as you 
>>> update/overwrite a column that is still in memtable, it is simply replaced 
>>> in memory (so it's as efficient as it gets).
>>> In other words, when the memtable is flushed, only the last version of the 
>>> column goes in.
>>> --
>>> Sylvain
>>
>>
>


Re: R and N

2011-02-18 Thread Anthony John
OK - let me state the facts first (as I know them):
- I do not know the inner workings, so interpret my response with that
caveat. Although, at an architectural level, one should be able to keep
detailed implementation at bay.
- Quorum is (N+1)/2 where N is the Replication Factor (RF)
- And consistency is guaranteed if R(ead) + W(rite) > RF (which Quorum
gives you, but it can be achieved via other permutations, depending on
whether read or write performance is desired)

Now, getting to your questions:
1. If a read at Q is nondeterministic, it would likely have to read the
other (RF-Q) nodes to achieve quorum on a deterministic value. At which
point syncing all with writes should not be that expensive. But at what
point precisely the read is returned - I do not know - you will have to
look at the code. IMO, at this level it should not matter.
2. Should be at the granularity of the data divergence
3. Read repair or nodetool repair (whichever comes first)
4. All peers - there is no primary. There might be a connected node - but
no special role/privileges
5. It tries Q - and returns on a deterministic read. If not - see (1)
6. The writer supplies the timestamp value - it can be any value that
makes sense within the scope of the data/application. (See the sketch below.)
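
To make (5) and (6) concrete, here is a tiny standalone sketch of the
arithmetic - an illustration only, not Cassandra code:

public class QuorumMath {
    public static void main(String[] args) {
        int rf = 3;              // replication factor (N)
        int quorum = rf / 2 + 1; // e.g. 2 of 3
        int r = quorum, w = quorum;
        // R + W > RF means the read set and the write set overlap in
        // at least one replica, so a read sees the latest acked write.
        System.out.println("overlap guaranteed: " + (r + w > rf));
        // The writer supplies the column timestamp; microseconds
        // since the epoch is the usual convention.
        long timestamp = System.currentTimeMillis() * 1000;
        System.out.println("timestamp: " + timestamp);
    }
}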

HTH,

-JA

On Fri, Feb 18, 2011 at 10:28 AM, A J  wrote:

> Couple of more related questions:
>
> 5. For reads, does Cassandra first read N nodes or just the R nodes it
> selects ? I am thinking unless it reads all the N nodes, how will it
> know which node has the latest write.
>
> 6. Who decides the timestamp that gets inserted into the timestamp
> field of every column. I would guess the coordinator node picks up its
> system's timestamp.  If that is true, the clocks on all the nodes
> should be synchronized, right ? Otherwise conflict resolution cannot
> be done correctly.
> For a distributed system, this is not always possible. How do folks
> get around this issue ?
>
> Thanks.
>
>
>
> On Fri, Feb 18, 2011 at 10:23 AM, A J  wrote:
> > Questions about R and N (and W):
> > 1. If I set R to Quorum and cassandra identifies a need for read
> > repair before returning, would the read repair happen on R nodes (I
> > mean subset of R that needs repair) or N nodes before the data is
> > delivered to the client ?
> > 2. Also does the repair happen at level of row (key) or at level of
> column ?
> >
> > 3. During write, if W is met but N-W is not met for some reason; would
> > cassandra try to repair N-W nodes in the background as and when it
> > can. Or the N-W are only repaired when a read is issued ?
> >
> > 4. What is the significance of the 'primary' replica for writes from
> > usage point ? Writes to primary and non-primary replicas all happen
> > simultaneously. Ensuring W is decided irrespective of it being primary
> > or not. Ensuring R is decided by any of the R nodes out of N.
> > I know the tokens are divided per the primary replica. But other than
> > that, for read and write operations, do the primary replica play any
> > special role ?
> >
> > Thanks.
> >
>


Re: Understand eventually consistent

2011-02-18 Thread mcasandra

I have a couple more questions:

1. What happens when RF = 3, R = 2 and W = 2 and 2 machines go down? Would
reads and writes fail, or return results from the one machine that is up?
2. Someone in this thread mentioned that a write is eventually consistent. Is
that because the response is returned to the client as soon as the data is
written to the commit log? But isn't this the same as other RDBMSs? Oracle
does the same thing: it writes to the REDO log and at some point later does
a checkpoint and flushes data to disk. But an RDBMS is not called eventually
consistent.
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Understand-eventually-consistent-tp6038330p6040893.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Understand eventually consistent

2011-02-18 Thread Anthony John
Again, my understanding!

1. Writes will go through w/ hinted handoff; reads will fail
2. Yes - but Oracle and others have no partition tolerance and lower levels
of availability. To build in partition tolerance and high availability and
still be shared-nothing to avoid a SPOF (to cover the RAC implementation),
you have to write to multiple nodes and then read off multiple nodes to
ensure consistency.

You could always run RF=1 to be like most of the traditional DBMSs. The
issues you would face are the ones that Cassandra is trying to prevent!

HTH,

-JA

On Fri, Feb 18, 2011 at 11:53 AM, mcasandra  wrote:

>
> I have couple of more quesitons:
>
> 1. What happens when RF = 3, R = 2 and W = 2 and 2 machines go down? Would
> read and write fail or get the results from that one machine that is up?
> 2. Someone in this thread mentioned that write is eventually consistent. Is
> it because response is returned to the client as soon as data is written to
> commit log. But isn't this same as other RDBMS? Oracle does the same thing
> it writes to REDO log and somepoint later does a checkpoint and flushes
> data
> to disk. But RDBMS is not called eventually consistent.
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Understand-eventually-consistent-tp6038330p6040893.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>


managing a limited-length list as a value

2011-02-18 Thread Benson Margulies
The following is derived from the redis list operations.

The data model is that a key maps to a list of items. The operation
is to push a new item onto the front, and discard any items from the
end above a threshold number of items.

Of course, this can be done by reading the value, fiddling with it, and
writing it back. I write this email to wonder if there's any native
trickery to avoid having to read the value, instead permitting some
sort of 'push' operation.
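
For the record, the read-modify-write version I'd like to avoid looks
roughly like this - a sketch against a hypothetical client interface,
not any real driver API:

import java.util.LinkedList;
import java.util.List;

public class CappedList {
    static final int THRESHOLD = 100;

    // Hypothetical client operations; a real driver would substitute
    // its own read/write calls here.
    interface Store {
        List<String> read(String key);
        void write(String key, List<String> items);
    }

    // Push to the front and trim the tail. The read is exactly the
    // round trip a native 'push' operation would save.
    static void push(Store store, String key, String item) {
        LinkedList<String> items = new LinkedList<String>(store.read(key));
        items.addFirst(item);
        while (items.size() > THRESHOLD) {
            items.removeLast();
        }
        store.write(key, items);
    }
}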


Re: Frequent updates of freshly written columns

2011-02-18 Thread Sylvain Lebresne
On Fri, Feb 18, 2011 at 6:19 PM, Aklin_81  wrote:

> Sylvain,
> I also need to store data that is frequently updated, same column
> being updated several times during each user session, at each action
> by user, But, this data is not very fresh and hence when I update this
> column frequently, there would be many versions of the same column in
> several sst files!
> Reading this type of data would not be too efficient I guess as the
> row would be totally scattered!
>
> Could there be any better strategy to store such data in cassandra?


> (Since the column holds an aggregate data obtained from all actions of
> the users, I have the need of updating that same column again & again)
>

That's what compaction is for. Even if the column is scattered across
many sstables, compaction will hopefully keep that to a handful of them.
Chances are you won't see too-bad read performance. But other than that,
tweaking the memtable thresholds so that you don't flush too often will
also help.

Now, I don't know what your use case is exactly or what this aggregate is.
But if there is a natural way to split the aggregate into multiple columns,
so that each update touches only one of the columns forming the aggregate,
that would hopefully help. It really depends on what we are talking about.
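
For example (a made-up sketch of the idea, not tied to any particular
client): instead of one serialized blob that every action rewrites,

    user123 : { "aggregate" : <whole serialized aggregate> }

you could split it so that each action rewrites only one small column,

    user123 : { "loginCount" : "17",
                "lastAction" : "login",
                "score"      : "42" }

and reassemble the aggregate from its columns at read time.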


> my another doubt,  When old column has been updated and exists in the
> memtable, but other versions of the column in SST tables exist, do the
> reads also scan the sst tables for that column, after memtable. or is
> that smart enough to say that this column is the most recent one ?
>

It can't skip the sstables. The problem is that you never know whether the
value you see in a sstable is the most recent one. To take a concrete
example, suppose a node was down. When it comes back up, chances are that
it will see new updates before it sees the old updates that were made while
it was down (those will arrive via hinted handoff, read repair or repair).
And more generally, there is never any guarantee that messages will arrive
at replicas in the order they were received by the coordinator(s).

--
Sylvain


>
> On Fri, Feb 18, 2011 at 10:32 PM, Aklin_81  wrote:
> >
> > Sylvain,
> > I also need to store data that is frequently updated, same column being
> updated several times during each user session, at each action by user, But,
> this data is not very fresh and hence when I update this column frequently,
> there would be many versions of the same column in several sst files!
> > Reading this type of data would not be too efficient I guess as the row
> would be totally scattered!
> >
> > Could there be any better strategy to store such data in cassandra?
> >
> > (Since the column holds an aggregate data obtained from all actions of
> the users, I have the need of updating that same column again & again)
> >
> >
> > my another doubt,  When old column has been updated and exists in the
> memtable, but other versions of the column in SST tables exist, do the reads
> also scan the sst tables for that column, after memtable. or is that smart
> enough to say that this column is the most recent one ?
> >
> >
> >
> >
> > On Fri, Feb 18, 2011 at 8:54 PM, James Churchman <
> jameschurch...@gmail.com> wrote:
> >>
> >> ok great, thanks for the exact clarification
> >> On 18 Feb 2011, at 14:11, Aklin_81 wrote:
> >>
> >> Compaction does not 'mutate' the sst files, it 'merges' several sst
> files into one with new indexes, merged data rows & deleting tombstones.
> Thus you reclaim your disk space.
> >>
> >>
> >> On Fri, Feb 18, 2011 at 7:34 PM, James Churchman <
> jameschurch...@gmail.com> wrote:
> >>>
> >>> but a compaction will mutate the sstables and reclaim the
> space (eventually)  ?
> >>>
> >>> james
> >>> On 18 Feb 2011, at 08:36, Sylvain Lebresne wrote:
> >>>
> >>> On Fri, Feb 18, 2011 at 8:14 AM, Aklin_81  wrote:
> 
>  Are the very freshly written columns to a row in memtables,
> efficiently updated/overwritten by edited/new column values.
> 
>  After flushing of memtable, are those(edited + unedited ones) columns
> stored together on disk (in same blocks!?) as if they were written in one
> single operation or same time ?? I know if old columns are edited then
> several copies of same column will be dispersed in different sst tables,
> what about fresh columns ?
> 
>  Are there any disadvantages to frequently updating fresh columns
> present in memtable ?
> >>>
> >>> The SSTables are immutable but the memtable are not. As long as you
> update/overwrite a column that is still in memtable, it is simply replaced
> in memory (so it's as efficient as it gets).
> >>> In other words, when the memtable is flushed, only the last version of
> the column goes in.
> >>> --
> >>> Sylvain
> >>
> >>
> >
>


Re: Understand eventually consistent

2011-02-18 Thread A J
#1: R=2, so if only one machine is up, by definition R cannot be
satisfied, and the read will not return.

#2: consistency is an involved topic with no quick and easy
explanation or answers. My 2 cents:
the question of eventual consistency arises in distributed systems, where
you can write to one machine but read from another machine.

If it is not distributed and is just one machine, then of course you will
always read your write and be strongly consistent.
Check the post:
http://www.allthingsdistributed.com/2007/12/eventually_consistent.html

Also, Cassandra is not really eventually consistent; it has tunable
consistency. You can make it strongly consistent at the cost of
availability. Check out the video and slides at:
http://cassandra.apache.org/


On Fri, Feb 18, 2011 at 12:53 PM, mcasandra  wrote:
>
> I have couple of more quesitons:
>
> 1. What happens when RF = 3, R = 2 and W = 2 and 2 machines go down? Would
> read and write fail or get the results from that one machine that is up?
> 2. Someone in this thread mentioned that write is eventually consistent. Is
> it because response is returned to the client as soon as data is written to
> commit log. But isn't this same as other RDBMS? Oracle does the same thing
> it writes to REDO log and somepoint later does a checkpoint and flushes data
> to disk. But RDBMS is not called eventually consistent.
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Understand-eventually-consistent-tp6038330p6040893.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.
>


Re: managing a limited-length list as a value

2011-02-18 Thread Norman Maurer
Hi there,

there is no such operation in Cassandra. The only thing which
comes "close" is the TTL support, which will "delete" columns after a
given time. See:

http://www.datastax.com/dev/blog/whats-new-cassandra-07-expiring-columns
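
Setting a TTL is just a per-column field on the write path. Roughly,
from memory (the exact Thrift constructor is an assumption here -
check the generated classes):

import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.Column;

public class ExpiringColumn {
    public static void main(String[] args) {
        // Assumed 0.7 signature: Column(name, value, timestamp).
        Column col = new Column(
            ByteBuffer.wrap("name".getBytes()),
            ByteBuffer.wrap("value".getBytes()),
            System.currentTimeMillis() * 1000);
        col.setTtl(3600); // seconds; the column expires after an hour
    }
}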

Bye,
Norman

2011/2/18 Benson Margulies :
> The following is derived from the redis list operations.
>
> The data model is that a key maps to an list of items. The operation
> is to push a new item into the front, and discard any items from the
> end above a threshold number of items.
>
> of course, this can be done by reading a value, fiddling with it, and
> writing it back. I write this email to wonder if there's any native
> trickery to avoid having to read the value, but rather permitting some
> sort of 'push' operation.
>


Metadata

2011-02-18 Thread A J
If I wish to find name of all the keys in all the column families
along with other related metadata (such as last updated, size of
column value field), is there an additional solution that caches this
metadata OR do I have to always perform range queries and get the
information ?

I am not interested in the actual value contents of the columns but
just the metadata.

Thanks.


Re: cluster size, several cluster on one node for multi-tenancy

2011-02-18 Thread Mimi Aluminium
Nick,
Assuming I have a tenant that has only one CF, and I am using a network-aware
replication strategy where the keys of this CF are replicated 3 times, each
copy in a different DC (DC1, DC2, DC3).
Now let's assume the cluster holds 5 DCs. As far as I understand, only the
servers that belong to the three DCs that hold a copy will build this
CF's memtable. The servers that belong to the other 2 DCs (DC4, DC5) won't
have any trace of this CF nor of this keyspace - am I correct?

I have an additional, more basic question:
Is there a way to define two clusters on the same node? Is it by
configuration in the storage-conf file, or does it mean an additional
Cassandra daemon?
Thanks a lot,
Miriam

On Fri, Feb 18, 2011 at 12:08 PM, Nick Telford wrote:

> Large numbers of keyspaces/column-families are not a good idea, as each
> column-family memtable requires its own memory. If you have 1000 tenants in
> the same cluster, each with only 1 CF, regardless of the cluster size
> *every* node will require 1 memtable per tenant CF - 1000 memtables.
>
> This limitation is the primary reason for workarounds (such as "virtual
> keyspaces") to enable multi-tenant setups.
>
> You might have more luck partitioning tenants into different clusters, but
> then you end up with potential hot-spots (where more active tenants generate
> more load on a specific cluster).
>
> Regards,
> Nick
>
>
> On 18 February 2011 09:55, Mimi Aluminium wrote:
>
>>  Thanks a lot for you suggestions,
>> I will check the virtual keyspace solution - btw, currently I am using
>> Thrift client with Pycassa, I am not familiar with Hector - does it mean
>> we'll need to move to Hector client?
>>
>> I thought of using keyspaces for each tenant, but I dont understand how to
>> define the whole cluster. Meaning, assuming the tenants are distributed
>> (replicated) across hundreds  of DCs each consists of tens of racks and
>> servers, so can I define a single cassandra cluster for all the servers? it
>> does not seem to be reasonable , this is the reason I thought of sepearating
>> the clusters. Please let me know how would you solve it?
>> Thanks,
>> Miriam
>>
>>
>>
>> On Thu, Feb 17, 2011 at 10:30 PM, Nate McCall  wrote:
>>
>>> Hector's virtual keyspaces would work well for what you describe. Ed
>>> Anuff, who added this feature to Hector, showed me a working
>>> multi-tennancy based app the other day and it worked quite well.
>>>
>>> On Thu, Feb 17, 2011 at 1:44 PM, Norman Maurer 
>>> wrote:
>>> > Maybe you could make use of "Virtual Keyspaces".
>>> >
>>> > See this wiki for the idea:
>>> > https://github.com/rantav/hector/wiki/Virtual-Keyspaces
>>> >
>>> > Bye,
>>> > Norman
>>> >
>>> > 2011/2/17 Frank LoVecchio :
>>> >> Why not just create some sort of ACL on the client side and use one
>>> >> Keyspace?  It's a lot less management.
>>> >>
>>> >> On Thu, Feb 17, 2011 at 12:34 PM, Mimi Aluminium <
>>> mimi.alumin...@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Hi,
>>> >>> I really need your help in this matter.
>>> >>> I will try to simplify my problem and ask specific questions
>>> >>>
>>> >>> I am thinking of solving the multi-tenancy problem by providing a
>>> separate
>>> >>> cluster per each tenant. Does it sound reasonable?
>>> >>> I can end-up with one node belongs to several clusters.
>>> >>> Does Cassandra support several clusters per node? Does it mean
>>> several
>>> >>> Cassandra daemons on each node? Do you recommend doing that ? what is
>>> the
>>> >>> overhead? is there any link that explain how to do that?
>>> >>>
>>> >>> Thanks a lot,
>>> >>> Mimi
>>> >>>
>>> >>>
>>> >>> On Wed, Feb 16, 2011 at 6:43 PM, Mimi Aluminium <
>>> mimi.alumin...@gmail.com>
>>> >>> wrote:
>>> 
>>>  Hi,
>>>  We are interested in a multi-tenancy environment, that may consist
>>> of up
>>>  to hundreds of data centers. The current design requires cross rack
>>> and
>>>  cross DC replication. Specifically, the per-tenant CFs will be
>>> replicated 6
>>>  times: in three racks,  with 2 copies inside a rack, the racks will
>>> be
>>>  located in at least two different DCs. In the future other
>>> replication
>>>  policies will be considered. The application will decide where
>>> (which racks
>>>  and DC)  to place each tenant's replicas.  and it might be that one
>>> rack can
>>>  hold more than one tenant.
>>> 
>>>  Separating each tenant in a different keyspace, as was suggested
>>>  in  previous mail thread in this subject, seems to be a good
>>> approach
>>>  (assuming the memtable problem will be solved somehow).
>>>  But then we had concern with regard to the cluster size.
>>>  and here are my questions:
>>>  1) Given the above, should I define one Cassandra cluster that hold
>>> all
>>>  the DCs? sounds not reasonable  given hundreds DCs tens of servers
>>> in each
>>>  DC etc. Where is the bottleneck here? keep-alive messages, the
>>> gossip,
>>>  request routing? what is the largest number of servers a cluster can bear?

Error when bringing up 3rd node

2011-02-18 Thread mcasandra

I see following error. Is it because I have initial token defined? What token
should I use as initial token?

 INFO 12:31:36,689 Finished hinted handoff of 0 rows to endpoint
/172.16.208.12
 INFO 12:32:58,448 Joining: getting bootstrap token
ERROR 12:32:58,451 Fatal error: Bootstraping to existing token 0 is not
allowed (decommission/removetoken the old node first).
Bad configuration; unable to start server

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Error-when-bringing-up-3rd-node-tp6041409p6041409.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Error when bringing up 3rd node

2011-02-18 Thread Eric Gilmore
It sounds like one of your existing nodes already has the initial token
zero.  Did you set the initial token of the first node you brought online to
zero?

On Fri, Feb 18, 2011 at 12:35 PM, mcasandra  wrote:

>
> I see following error. Is it because I have initial token defined? What
> token
> should I use as initial token?
>
>  INFO 12:31:36,689 Finished hinted handoff of 0 rows to endpoint
> /172.16.208.12
>  INFO 12:32:58,448 Joining: getting bootstrap token
> ERROR 12:32:58,451 Fatal error: Bootstraping to existing token 0 is not
> allowed (decommission/removetoken the old node first).
> Bad configuration; unable to start server
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Error-when-bringing-up-3rd-node-tp6041409p6041409.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>


Re: Error when bringing up 3rd node

2011-02-18 Thread mcasandra

Yes, I had set the first node to token 0. I think I read that somewhere in
the docs. What should I do? Should I write a Java program to calculate the
tokens for 3 nodes and distribute them across the 3 nodes?
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Error-when-bringing-up-3rd-node-tp6041409p6041430.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Async write

2011-02-18 Thread mcasandra

I am still trying to understand how writes work. Is there any concept of sync
and async writes? For example:

If I want to have W=2, but with 1 write as sync and the 2nd as async.

Or say I want W=3 with NetworkTopologyStrategy, with DC1 getting 1 sync write
+ 1 async write, and DC2 always getting an async write.




Re: Error when bringing up 3rd node

2011-02-18 Thread Eric Gilmore
A Java program should work fine.  The Wiki and the DataStax documentation
use a Python program for the same purpose:

http://www.datastax.com/docs/0.7/operations/clustering#calculating-tokens

On Fri, Feb 18, 2011 at 12:45 PM, mcasandra  wrote:

>
> Yes I had set the first node to token 0. I think I read somewhere in the
> docs. What should I do. Should I write a java program to calculate the hash
> for 3 nodes and distribute it accross 3 nodes?


Re: Async write

2011-02-18 Thread Anthony John
This is transparent!

Essentially, when enough writes are acknowledged to meet the desired
consistency level, the call returns.

On Fri, Feb 18, 2011 at 2:48 PM, mcasandra  wrote:

>
> I am still trying to understand how writes work. Is there any concept of
> sync
> and async writes? For eg:
>
> If I want to have W=2 but 1 write as sync and the 2nd as async.
>
> Or say I want to have W=3 with networktopology with DC1 getting 1 sync
> write
> + 1 async write and DC2 always getting async write.
>
>


Re: Error when bringing up 3rd node

2011-02-18 Thread mcasandra

Thanks! This is what I got. Is this right?

public class TokenCalc {
  public static void main(String... args) {
    int nodes = 3;
    for (int i = 1; i <= nodes; i++) {
      System.out.println((2 ^ 127) / nodes * i);
    }
  }
}

41
82
123


Re: Async write

2011-02-18 Thread mcasandra

So does it mean there is no way to mix sync + async? I am thinking that if I
have to write across data centers, doing it synchronously is going to be very
slow, and it will be bad for clients to have to wait. What are my options or
alternatives?

Use N=3 and W=2? And the 3rd write (which I assume will be async) will be in
the other DC? How do I set that up, or is it even possible to set up?


Re: frequent client exceptions on 0.7.0

2011-02-18 Thread Andy Skalet
On Thu, Feb 17, 2011 at 12:22 PM, Aaron Morton  wrote:
> Messages being dropped means the node is overloaded. Look at the
> thread pool stats to see which thread pools have queues. It may be IO
> related, so also check the read and write latency on the CFs and use iostat.
>
> I would try those first, then jump into GC land.

Thanks, Aaron.  I am looking at the thread pool queues; I don't have enough
data yet, but so far I've seen ReadStage queues of 4-30 (once 100) and
MemtablePostFlusher as high as 70, though not consistently.

The read latencies on the CFs in this cluster are sitting around
20-40 ms, and the write latencies are all around 0.01 ms. That seems
good to me, but I don't have a baseline.

I do see high (90-100%) utilization from time to time on the disk that
holds the data, based on reads.  This doesn't surprise me too much
because IO on these machines is fairly limited in performance.

Does this sound like the node is overloaded?

Andy
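
For reference, the checks Aaron suggests map onto commands along these lines
(a sketch for the 0.7-era tools; the host is a placeholder):

# which thread pools are queueing or dropping messages
nodetool -h <host> tpstats

# per-CF read/write latency
nodetool -h <host> cfstats

# disk utilization, extended stats every 5 seconds
iostat -x 5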


Are row-keys sorted by the compareWith?

2011-02-18 Thread cbert...@libero.it
Hi all,
I created a CF from which I need to get the rows sorted by time. Each row
represents a comment.

I've created a few rows using a generated TimeUUID as the row key, but when I
call the Pelops method "GetColumnsFromRows" I don't get the data back as I
expect: the rows are not sorted by TimeUUID.
I thought it was probably because of the random part of the TimeUUID, so I
created a new CF ...

This time I created a few rows using the Java System.currentTimeMillis(),
which returns a long. I called "GetColumnsFromRows" again, with the same
result: the data are not sorted!
I've read many times that rows are sorted as specified in the CompareWith,
but I can't see it.
To work around this for the moment I've used a SuperColumnFamily with a
single row ... but I think this is just a workaround and not the solution.

Now when I call "GetSuperColumnsFromRow" I get all the SuperColumns as I
expected: sorted by TimeUUID. Why doesn't the same happen with the rows?
I'm confused.

TIA for any help.

Best Regards

Carlo


Re: Async write

2011-02-18 Thread A J
W always stands for the number of sync writes; N-W is the number of async
writes. Note that N decides the number of replicas. W only decides how many
of those N replicas must be written synchronously before success is returned
to the client. All writes always go to a total of N nodes (W right away and
the rest later).
The higher the value of W, the more sync writes, and so the more latency.

I might be wrong, but I think you cannot decide which of the N nodes will
get the sync write. On a write-by-write basis, I think Cassandra needs the
flexibility to decide, based on several parameters, which W out of the N
nodes it writes synchronously.
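
To make this concrete: the client never picks which replicas are sync, it
only picks the consistency level. With NetworkTopologyStrategy, LOCAL_QUORUM
is the usual way to get the "sync in my DC, async to the other DC" behavior
asked about above. A minimal sketch against the 0.7 Thrift Java client (host,
keyspace, and CF names are placeholders, not from this thread):

import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class WriteConsistencyDemo {
    public static void main(String[] args) throws Exception {
        TFramedTransport transport =
                new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client =
                new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("MyKeyspace");  // placeholder keyspace

        Column col = new Column(
                ByteBuffer.wrap("name".getBytes("UTF-8")),
                ByteBuffer.wrap("value".getBytes("UTF-8")),
                System.currentTimeMillis() * 1000);  // microsecond timestamp

        // Wait for a quorum of replicas in the local DC only; the write is
        // still sent to all N replicas, and remote-DC replicas apply it
        // asynchronously.
        client.insert(
                ByteBuffer.wrap("key1".getBytes("UTF-8")),
                new ColumnParent("MyColumnFamily"),  // placeholder CF
                col,
                ConsistencyLevel.LOCAL_QUORUM);

        transport.close();
    }
}

The coordinator still sends the write to all N replicas; LOCAL_QUORUM only
changes how many acknowledgements it waits for before returning.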



On Fri, Feb 18, 2011 at 3:48 PM, mcasandra  wrote:
>
> I am still trying to understand how writes work. Is there any concept of sync
> and async writes? For eg:
>
> If I want to have W=2 but 1 write as sync and the 2nd as async.
>
> Or say I want to have W=3 with networktopology with DC1 getting 1 sync write
> + 1 async write and DC2 always getting async write.
>
>


Re: Are row-keys sorted by the compareWith?

2011-02-18 Thread Jonathan Ellis
No.  CompareWith is for columns.

On Fri, Feb 18, 2011 at 3:16 PM, cbert...@libero.it  wrote:
> Hi all,
> I created a CF in which i need to get, sorted by time, the Rows inside. Each
> Row represents a comment.
>
> 
>
> I've created a few rows using as Row Key a generated TimeUUID but when I call
> the Pelops method "GetColumnsFromRows" I don't get the data back as I expect:
> rows are not sorted by TimeUUID.
> I though it was probably cause of the random-part of the TimeUUID so I create
> a new CF ...
>
> 
>
> This time I created a few rows using the java System.CurrentTimeMillis() that
> retrieve a long. I call again the "GetColumnsFromRows" and again the same
> results: data are not sorted!
> I've read many times that Rows are sorted as specified in the compareWith but
> I can't see it.
> To solve this problem for the moment I've used a SuperColumnFamily with an
> UNIQUE ROW ... but I think this is just a workaround and not the solution.
>
>  CompareSubcolumnsWith="BytesType"/ >
>
> Now when I call the "GetSuperColumnsFromRow" I get all the SuperColumns as I
> expected: sorted by TimeUUID. Why it does not happen the same with the Rows?
> I'm confused.
>
> TIA for any help.
>
> Best Regards
>
> Carlo
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Error when bringing up 3rd node

2011-02-18 Thread Ching-Cheng Chen
41
82
123

These are certainly not correct. You can't just use 2 ^ 127; it will overflow.

You can't use Java's primitive types for this calculation; long is only 64
bits.

You'd need to use the BigInteger class for this calculation.

Regards,

Chen

www.evidentsoftware.com

On Fri, Feb 18, 2011 at 4:04 PM, mcasandra  wrote:

>
> Thanks! This is what I got. Is this right?
>
> public class TokenCalc{
>  public static void main(String ...args){
>   int nodes=3;
>   for(int i = 1 ; i <= nodes; i++) {
> System.out.println( (2 ^ 127) / nodes * i);
>   }
>  }
> }
>
> 41
> 82
> 123


Re: Async write

2011-02-18 Thread Anthony John
Facts as I understand them:
- A write call to the DB triggers a number of async writes to all nodes where
the particular write should be recorded (and which are up per gossip and so
on)
- Once the desired CL number of writes have been acknowledged, the call
returns

So your issue is moot. That is what is happening under the covers!


On Fri, Feb 18, 2011 at 3:08 PM, mcasandra  wrote:

>
> So does it mean there is no way to say use sync + async ? I am thinking if
> I
> have to write accross data center and doing it synchronuosly is going to be
> very slow and will be bad for clients to have to wait. What are my options
> or alternatives?
>
> Use N=3 and W=2? And the 3rd one (assuming will be async) will be in other
> DC? How do I set it up or possible to setup?


Re: Error when bringing up 3rd node

2011-02-18 Thread Eric Gilmore
I'm not sure I can say exactly why, but I'm sure those numbers can't be
correct.  One node should be zero and the other values should be very long
numbers like 85070591730234615865843651857942052863.

We need another Java expert's opinion here, but it looks like your snippet
may have "integer overflow" or "integer overload" going on.

On Fri, Feb 18, 2011 at 1:04 PM, mcasandra  wrote:

>
> Thanks! This is what I got. Is this right?
>
> public class TokenCalc{
>  public static void main(String ...args){
>   int nodes=3;
>   for(int i = 1 ; i <= nodes; i++) {
> System.out.println( (2 ^ 127) / nodes * i);
>   }
>  }
> }
>
> 41
> 82
> 123


Re: Error when bringing up 3rd node

2011-02-18 Thread Jonathan Ellis
Also, ^ means xor in Java, not exponentiation.

Just use the Python Eric linked. :)

On Fri, Feb 18, 2011 at 3:24 PM, Ching-Cheng Chen
 wrote:
> 41
> 82
> 123
> These certainly not correct.  Can't just use 2 ^ 127, will overflow
> You can't use Java's primitive type to do this calculation.   long only use
> 64 bit.
> You'd need to use BigInteger class to do this calculation.
> Regards,
> Chen
> www.evidentsoftware.com
>
> On Fri, Feb 18, 2011 at 4:04 PM, mcasandra  wrote:
>>
>> Thanks! This is what I got. Is this right?
>>
>> public class TokenCalc{
>>  public static void main(String ...args){
>>       int nodes=3;
>>       for(int i = 1 ; i <= nodes; i++) {
>>                 System.out.println( (2 ^ 127) / nodes * i);
>>       }
>>  }
>> }
>>
>> 41
>> 82
>> 123



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Error when bringing up 3rd node

2011-02-18 Thread Ching-Cheng Chen
Try this (using java.math.BigInteger):

BigInteger two = new BigInteger("2");
BigInteger token = new BigInteger("2");
// build 2^127 by repeated multiplication
for (int i = 1; i < 127; i++) {
    token = token.multiply(two);
}
// one third of the token range
token = token.divide(new BigInteger("3"));
for (int i = 0; i < 3; i++) {
    System.out.println(token.multiply(new BigInteger("" + i)));
}

which generate

0
56713727820156410577229101238628035242
113427455640312821154458202477256070484
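
For what it's worth, the same calculation can be written a little more
directly with shiftLeft; a minimal sketch (class and variable names are mine)
that prints the same three tokens as above:

import java.math.BigInteger;

public class TokenGen {
    public static void main(String[] args) {
        int nodes = 3;
        // RandomPartitioner's token space is [0, 2^127)
        BigInteger step = BigInteger.ONE.shiftLeft(127)
                                        .divide(BigInteger.valueOf(nodes));
        for (int i = 0; i < nodes; i++) {
            System.out.println(step.multiply(BigInteger.valueOf(i)));
        }
    }
}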

Regards,

Chen

www.evidentsoftware.com

On Fri, Feb 18, 2011 at 4:24 PM, Eric Gilmore  wrote:

> I'm not sure I can say exactly why, but I'm sure those numbers can't be
> correct.  One node should be zero and the other values should be very long
> numbers like 85070591730234615865843651857942052863.
>
> We need another Java expert's opinion here, but it looks like your snippet
> may have "integer 
> overflow"
> or "integer overload" going on.
>
>
> On Fri, Feb 18, 2011 at 1:04 PM, mcasandra  wrote:
>
>>
>> Thanks! This is what I got. Is this right?
>>
>> public class TokenCalc{
>>  public static void main(String ...args){
>>   int nodes=3;
>>   for(int i = 1 ; i <= nodes; i++) {
>> System.out.println( (2 ^ 127) / nodes * i);
>>   }
>>  }
>> }
>>
>> 41
>> 82
>> 123


Re: Are row-keys sorted by the compareWith?

2011-02-18 Thread Michal Augustýn
Hi,

The wiki (http://wiki.apache.org/cassandra/StorageConfiguration) says: "The
CompareWith attribute tells Cassandra how to sort the columns for slicing
operations." So CompareWith defines how columns (or super-columns) are sorted
within the scope of one row; it relates to the (multi)get_slice operations.

I'm not sure you can retrieve sorted rows. The only way to get multiple rows
is the "get_range_slices" method (or "get_indexed_slices"), and there is no
sorting there.

Augi
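
The usual way to get comments back in time order is therefore to invert the
model: one row per discussion, one TimeUUID-named column per comment, and a
reversed slice to read newest-first. A minimal sketch against the raw 0.7
Thrift client (CF and key names are illustrative; it assumes the CF is
declared with CompareWith="TimeUUIDType"):

import java.nio.ByteBuffer;
import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

public class CommentsByTime {
    // Newest 20 comments of one discussion row.
    static List<ColumnOrSuperColumn> latest(Cassandra.Client client)
            throws Exception {
        SliceRange range = new SliceRange(
                ByteBuffer.allocate(0),  // empty start = beginning of row
                ByteBuffer.allocate(0),  // empty finish = end of row
                true,                    // reversed: newest TimeUUID first
                20);                     // at most 20 columns
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(range);
        return client.get_slice(
                ByteBuffer.wrap("discussion-42".getBytes("UTF-8")),
                new ColumnParent("Comments"),
                predicate,
                ConsistencyLevel.QUORUM);
    }
}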

2011/2/18 cbert...@libero.it 

> Hi all,
> I created a CF in which i need to get, sorted by time, the Rows inside.
> Each
> Row represents a comment.
>
> 
>
> I've created a few rows using as Row Key a generated TimeUUID but when I
> call
> the Pelops method "GetColumnsFromRows" I don't get the data back as I
> expect:
> rows are not sorted by TimeUUID.
> I though it was probably cause of the random-part of the TimeUUID so I
> create
> a new CF ...
>
> 
>
> This time I created a few rows using the java System.CurrentTimeMillis()
> that
> retrieve a long. I call again the "GetColumnsFromRows" and again the same
> results: data are not sorted!
> I've read many times that Rows are sorted as specified in the compareWith
> but
> I can't see it.
> To solve this problem for the moment I've used a SuperColumnFamily with an
> UNIQUE ROW ... but I think this is just a workaround and not the solution.
>
>  CompareSubcolumnsWith="BytesType"/ >
>
> Now when I call the "GetSuperColumnsFromRow" I get all the SuperColumns as
> I
> expected: sorted by TimeUUID. Why it does not happen the same with the
> Rows?
> I'm confused.
>
> TIA for any help.
>
> Best Regards
>
> Carlo
>


Re: Error when bringing up 3rd node

2011-02-18 Thread mcasandra

Thanks! I feel so horrible after realizing what a mistake I made :)

After I bring up the new node, I just need to run the following on the old
nodes?

1) On the new node, set the initial token to 56713727820156410577229101238628035242
2) Start the new node
3) On the second node, run nodetool move 113427455640312821154458202477256070484



Cassandra as write-behind, Cassandra as Cache

2011-02-18 Thread Benson Margulies
Cassandra as dessert topping? Cassandra as floor-wax?

I do apologize for this basket of clueless questions, but I'm
exploring territory that is new to me.

The overall problem has two datasets with distinct storage characteristics.

The first is a set of data that can fit in memory, but which needs
reliable persistence. In the first instance, there's no need for
replication of this data. One server can have it in memory, it can
update it, but it needs to persist the updates to disk, reliably, so
that it can pick up where it left off. This data is shaped like a hash
table (it's an LSH implementation, if anyone cares), with on the order
of 50-100 'tables', each with 2^13 slots, each slot containing an array
of pairs of strings. In memory on one machine, it's just 72 ordinary
Java arrays of references to arrays of strings. This is enough to
accommodate the results of applying it to 1M documents. The arrays are
of bounded size.

To use Cassandra as the persistence mechanism, I would be using it as
a fast log. Each insertion would create an item consisting of a
generated timestamp key, the table index, the slot index, and a pair
of strings. Loading up after a reboot would mean reading all the
records and building the in-memory data structure.

The other part of this is, in some ways, a lot simpler. It's a
key-value map from string keys to blobs, where the blobs derive, by
some serialization or another, from hash tables. (bag-of-words feature
vectors, for the entertained.) The size of this is less bounded, so
I'm inclined to assume that I need to use a read-write persistence
mechanism from the start. However, a lot of it will fit into memory.

Theory 1: use EHCache or something like it.
Theory 2: having it in memory in the Cassandra server is nearly as
good as having it in memory in my JVM, since Thrift is thrifty.
Theory 3: I've seen some blogs from a while back about embedding
Cassandra. I'm not clear on the current viability of this, or of the
efficiency thereof.

So, there you have it. Am I on the right mailing list at all, or have
I wandered, as it were, into the wrong sort of bar?


simple erlang example

2011-02-18 Thread Sasha Dolgy
hi,

does anyone have an Erlang example of connecting to Cassandra and
performing an operation like a get?

I'm not having much luck with: \thrift-0.5.0\test\erl\src\* as a reference
point.

I generated all of the erlang files using thrift and have successfully
compiled them but am having a pretty rough go at it.

Found this old post:
http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg02893.html
...
but it seems the examples never made it to the wiki.

-sd

-- 
Sasha Dolgy
sasha.do...@gmail.com


Re: Error when bringing up 3rd node

2011-02-18 Thread Ching-Cheng Chen
If you know you will have 3 nodes, you should set the initial token inside
the cassandra.yaml for each node.

Then you won't need to run nodetool move.
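
For example, with the tokens computed earlier in this thread, each node's
cassandra.yaml would carry its own line (a sketch, one file per node):

# node 1 cassandra.yaml
initial_token: 0

# node 2 cassandra.yaml
initial_token: 56713727820156410577229101238628035242

# node 3 cassandra.yaml
initial_token: 113427455640312821154458202477256070484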

Regards,

Chen

www.evidentsoftware.com

On Fri, Feb 18, 2011 at 5:24 PM, mcasandra  wrote:

>
> Thanks! I feel so horrible after realizing what mistaked I made :)
>
> After I bring up the new node I just need to run the following on old
> nodes?
>
> 1) New node set the initial token to 56713727820156410577229101238628035242
> 2) start new node
> 3) On second node run nodetool move 113427455640312821154458202477256070484
>


Timeout

2011-02-18 Thread mcasandra

I have the code below, and when I run it there is a timeout when I try to
insert a column. But when I comment out the first 4 lines (drop through
display) it works without any issues. I am trying to understand why. If
required, I can sleep and then insert. Is it because the insert happens too
fast, before Cassandra is able to propagate the keyspace info across the
nodes?




hUtil.dropKeyspace(c, KEYSPACE);            // drop on the server
hUtil.createKeyspace(c, KEYSPACE, CF_NAME); // create on the server
hUtil.addColumn(c, KEYSPACE, CF_NAME);      // add a column definition on the server
hUtil.display(c, KEYSPACE);                 // display keyspace info

ExampleDaoV2 ed = new ExampleDaoV2(createKeyspace(KEYSPACE, c));
ed.insert("key1", "value2", StringSerializer.get());
System.out.println(ed.get("key1", StringSerializer.get()));


Caused by: TimedOutException()
at
org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:16493)
at
org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:916)
at
org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:890)
at
me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:93)
... 14 more



Re: Timeout

2011-02-18 Thread Javier Canillas
Why don't you post some details about your Cassandra cluster: version,
information about the keyspace you are creating (for example, what its
replication factor is)? It might be of help.

Besides, I don't fully understand your code. First you drop KEYSPACE, then
create it again with a column family CF_NAME, then you add a column to
CF_NAME's column definitions (am I right?), and finally you display
KEYSPACE's information.

After that, you create your own structure by passing in another object.
What does the method "createKeyspace()" actually do?

If you can add more details, it might be of help.

On Fri, Feb 18, 2011 at 10:02 PM, mcasandra  wrote:

>
> I have this below code and what I see is that when I run this below code
> there is a timeout that occurs when I try to insert a column. But when I
> comment out first 4 lines (drop to display) then it works without any
> issues. I am trying to understand why. If required I can sleep and then
> insert. Is it because it's getting insert too fast before Cassandra is able
> to persist keyspace info accross nodes?
>
>
>
>
>hUtil.dropKeyspace(c, KEYSPACE); //drop on the server
> hUtil.createKeyspace(c, KEYSPACE, CF_NAME); //Create on the
> server
>hUtil.addColumn(c, KEYSPACE, CF_NAME); // Add on the server
>hUtil.display(c, KEYSPACE); //Display keyspace info
>
>ExampleDaoV2 ed = new ExampleDaoV2(createKeyspace(KEYSPACE, c));
>ed.insert("key1", "value2", StringSerializer.get());
>System.out.println(ed.get("key1", StringSerializer.get()));
>
>
> Caused by: TimedOutException()
>at
>
> org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:16493)
>at
>
> org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:916)
>at
>
> org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:890)
>at
>
> me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:93)
>... 14 more
>


Re: Timeout

2011-02-18 Thread mcasandra

This is a test cluster of 3 nodes.

This is test code that does the following:

1) The first 4 lines physically drop and re-create the keyspace, and then
create the CF and column definition on the server.
2) Right after, from the 5th line onwards, it gets a reference to the
keyspace and tries to insert a row and columns.

The first 4 lines are similar to creating the keyspace, CF definitions, etc.
in cassandra-cli.
The rest of the code is similar to setting/inserting column values in
cassandra-cli.


Re: Timeout

2011-02-18 Thread mcasandra

Forgot to mention: the replication factor is 1 and I am running Cassandra
0.7.0. It's using SimpleStrategy.
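
For what it's worth, schema changes propagate asynchronously in 0.7, which
fits the symptom here; a common fix is to wait for schema agreement between
the DDL calls and the first insert, instead of sleeping for an arbitrary
time. A minimal sketch against the Thrift client (the call is named
describe_schema_versions in later 0.7 releases, check_schema_agreement in
0.7.0; the polling loop itself is mine):

import java.util.List;
import java.util.Map;
import org.apache.cassandra.thrift.Cassandra;

public class SchemaWait {
    // Poll until all reachable nodes report the same schema version.
    static void waitForSchemaAgreement(Cassandra.Client client)
            throws Exception {
        while (true) {
            // schema version -> endpoints currently on that version
            Map<String, List<String>> versions =
                    client.describe_schema_versions();
            versions.remove("UNREACHABLE");  // down nodes report no version
            if (versions.size() == 1) {
                return;  // every reachable node agrees
            }
            Thread.sleep(100);
        }
    }
}

Calling this between dropKeyspace/createKeyspace/addColumn and the insert
should remove the need for a sleep.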


Virtues and pitfall of using TYPES?

2011-02-18 Thread buddhasystem

I've been too smart for my own good trying to type columns, on the theory
that it would later increase performance by having more efficient
comparators in place. So if a string represents an integer, I would convert
it to an integer and declare the column as such. Same for LONG.

What I found is that during the write operation, the type conversion kills
the performance. It's really not a trivial amount of time.

Has anyone had a similar experience?



Re: Virtues and pitfall of using TYPES?

2011-02-18 Thread Jonathan Ellis
That doesn't make sense to me.  IntegerType validation is a no-op and
LongType validation is pretty close (just a size check).

If you meant that the conversion is killing performance on your
client, you should switch to a more performant client language. :)

On Fri, Feb 18, 2011 at 9:56 PM, buddhasystem  wrote:
>
> I've been too smart for my own good trying to type columns, on the theory
> that it would later increase performance by having more efficient
> comparators in place. So if a string represents an integer, I would convert
> it to an integer and declare the column as such. Same for LONG.
>
> What I found is that during the write operation, the type conversion kills
> the performance. It's really not too trivial amount of time.
>
> Has anyone had a similar experience?
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Virtues and pitfall of using TYPES?

2011-02-18 Thread buddhasystem

Dude, I never mentioned the server side; sorry if that wasn't obvious.
As for Python being slow, I'm not moving away from it. It performs
amazingly well in other circumstances.


Jonathan Ellis-3 wrote:
> 
> That doesn't make sense to me.  IntegerType validation is a no-op and
> LongType validation is pretty close (just a size check).
> 
> If you meant that the conversion is killing performance on your
> client, you should switch to a more performant client language. :)
> 
> On Fri, Feb 18, 2011 at 9:56 PM, buddhasystem  wrote:
>>
>> I've been too smart for my own good trying to type columns, on the theory
>> that it would later increase performance by having more efficient
>> comparators in place. So if a string represents an integer, I would
>> convert
>> it to an integer and declare the column as such. Same for LONG.
>>
>> What I found is that during the write operation, the type conversion
>> kills
>> the performance. It's really not too trivial amount of time.
>>
>> Has anyone had a similar experience?
>>
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
> 
> 



Re: simple erlang example

2011-02-18 Thread Joshua Partogi
Is there any reason why you would be interested in using Erlang with
Cassandra instead of another Erlang-based database (e.g. Couchbase, Riak)?

I am interested to know the reason.

Kind regards,
Joshua

On Sat, Feb 19, 2011 at 9:39 AM, Sasha Dolgy  wrote:
> hi,
> does anyone have an erlang example for connecting to cassandra and
> performing an operation like a get?
> I'm not having much luck with: \thrift-0.5.0\test\erl\src\* as a reference
> point.
> I generated all of the erlang files using thrift and have successfully
> compiled them but am having a pretty rough go at it.
> Found this old post:
>  http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg02893.html ...
> but seems the examples never made it to the wiki.
> -sd
>
> --
> Sasha Dolgy
> sasha.do...@gmail.com
>



-- 
http://twitter.com/jpartogi


Re: simple erlang example

2011-02-18 Thread Sasha Dolgy
There is a current strategy to use Cassandra for data storage, and it makes
sense to have user management and roster management exist in the same place
for all the different services that we provide.

Specific to user interaction, I started looking at ejabberd because Apache
Vysper is not as feature-rich. ejabberd is Erlang-based and supports
different external sources as a storage provider ... MySQL, SQL Server,
Active Directory, etc.

-sd

On Sat, Feb 19, 2011 at 8:18 AM, Joshua Partogi wrote:

> Is there any reason why you would be interested to use erlang with
> cassandra instead of other erlang based database [i.e Couchbase, Riak]
> ?
>
> I am interested to know the reason.
>
> Kind regards,
> Joshua
>
>