Re: Cassandra with large number of columns per row

2012-08-20 Thread Chuan-Heng Hsiao
I think the limit on the size of a row in cassandra is 2GB?

10000 x 1MB = 10GB.
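
If so, one workaround is to shard the columns across many rows so that no
single row approaches that limit. A rough sketch against the same ruby client
as the script below ('Keyspace1' is a placeholder keyspace; the bucketing
scheme is just an example):

require 'cassandra'

client = Cassandra.new('Keyspace1', '127.0.0.1:9160')

10000.times do |i|
  key    = rand(36**8).to_s(36)
  value  = rand(36**1024).to_s(36) * 1024   # ~1MB per column, as in the script
  bucket = "TestRow-#{i / 100}"             # ~100 columns (~100MB) per row
  client.insert(:TestColumnFamily, bucket, { key => value })
end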

Hsiao

On Mon, Aug 20, 2012 at 1:07 PM, oupfevph  wrote:

> I set up cassandra with the default configuration on a clean AWS instance,
> and I insert 10000 columns into a row; each column has 1MB of data. I use
> this ruby (version 1.9.3) script:
>
> 10000.times do
> key = rand(36**8).to_s(36)
> value = rand(36**1024).to_s(36) * 1024
> Cas_client.insert(TestColumnFamily, TestRow, {key => value})
> end
>
> every time I run this script, it will crash:
>
> /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/socket.rb:109:in `read': CassandraThrift::Cassandra::Client::TransportException
> from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/base_transport.rb:87:in `read_all'
> from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/framed_transport.rb:104:in `read_frame'
> from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/transport/framed_transport.rb:69:in `read_into_buffer'
> from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/client.rb:45:in `read_message_begin'
> from /usr/local/lib/ruby/gems/1.9.1/gems/thrift-0.8.0/lib/thrift/client.rb:45:in `receive_message'
> from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/vendor/0.8/gen-rb/cassandra.rb:251:in `recv_batch_mutate'
> from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/vendor/0.8/gen-rb/cassandra.rb:243:in `batch_mutate'
> from /usr/local/lib/ruby/gems/1.9.1/gems/thrift_client-0.8.1/lib/thrift_client/abstract_thrift_client.rb:150:in `handled_proxy'
> from /usr/local/lib/ruby/gems/1.9.1/gems/thrift_client-0.8.1/lib/thrift_client/abstract_thrift_client.rb:60:in `batch_mutate'
> from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/lib/cassandra/protocol.rb:7:in `_mutate'
> from /usr/local/lib/ruby/gems/1.9.1/gems/cassandra-0.15.0/lib/cassandra/cassandra.rb:463:in `insert'
> from a.rb:6:in `block in <main>'
> from a.rb:3:in `times'
> from a.rb:3:in `<main>'
>
> yet cassandra performs normally. Then I run another ruby script to get how
> many columns I have inserted:
>
> p cas_client.count_columns(TestColumnFamily, TestRow)
>
> this script crashed again with the same error message, and the cassandra
> process remains at 100% cpu usage.
>
>
> AWS m1.xlarge instance (15GB mem, 800GB hard disk, 4-core cpu)
> cassandra-1.1.2
> ruby-1.9.3-p194
> jdk-7u6-linux-x64
> ruby-gems:
> cassandra (0.15.0)
> thrift (0.8.0)
> thrift_client (0.8.1)
>
> What is the problem?
>
>


Re: Why so slow?

2012-08-20 Thread Peter Morris
Thanks, I shall get onto the developer of the library :)


On Sun, Aug 19, 2012 at 10:13 PM, Peter Schuller <
peter.schul...@infidyne.com> wrote:

> You're almost certainly using a client that doesn't set TCP_NODELAY on
> the thrift TCP socket. The Nagle algorithm is then enabled, leading to 200
> ms of latency for each request, and thus 5 requests/second.
>
> http://en.wikipedia.org/wiki/Nagle's_algorithm
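>
> For example, with a raw ruby socket the fix looks like this (a sketch; where
> you set the option depends on your client library):
>
>   require 'socket'
>
>   sock = TCPSocket.new('127.0.0.1', 9160)  # thrift port
>   # Disable Nagle so small thrift frames are sent immediately
>   sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)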
>
>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>


Re: What is the ideal server-side technology stack to use with Cassandra?

2012-08-20 Thread Andy Ballingall TF
On Aug 19, 2012 9:55 AM, "aaron morton"  wrote:
>
> > Aaron Morton (aa...@thelastpickle.com) advised:
> >
> > "If possible i would avoid using PHP. The PHP story with cassandra has
> > not been great in the past. There is little love for it, so it takes a
> > while for work changes to get in the client drivers.
> >
> > AFAIK it lacks server side states which makes connection pooling
> > impossible. You should not pool cassandra connections in something
> > like HAProxy."
>
> Please note, this was a personal opinion expressed off list.
>
> It is not a judgement on the quality of PHPCassa or PDO-cassandra,
neither of which I have used.

I'd like to apologise to Aaron for taking part of a private message and
sharing it in public without permission, and for any potential
embarrassment caused. I'm certainly embarrassed by the thoughtlessness I've
displayed.

I've used PHP successfully in many projects, and though I didn't take his
comment as a criticism of the efforts of others in the PHP community, I now
appreciate that some might do so. In any case, nothing excuses my lack of
respect for Aaron's privacy, which fell victim purely to a particularly
clumsy attempt to open up a public debate, and no more.

I suspect I'm not the only person trying to identify the suitability or
otherwise of an application stack with Cassandra. In general, if there's a
path of least resistance, or a path supported by a larger community, then
I'd consider that path rather than impose choices that worked in previous
projects. As Cassandra will probably be the core of what I'm working on, if
there are good reasons why PHP isn't an optimal choice, then I'd consider
adopting the alternative, and that's all I'm trying to get to the bottom
of. Believe me, I'd prefer not to learn yet-another-software stack if I can
help it!

Finally, I do hope that despite my stupidity, Aaron will forgive me and
contribute to this discussion.


Andy

>
> My comments were mostly informed by past issues with Thrift and PHP.
>
> Aaron
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/08/2012, at 10:09 PM, Andy Ballingall TF <
balling...@thefoundry.co.uk> wrote:
>
> > Hi,
> >
> > I've been running a number of tests with Cassandra using a couple of
> > PHP drivers (namely PHPCassa (https://github.com/thobbs/phpcassa/) and
> > PDO-cassandra (
http://code.google.com/a/apache-extras.org/p/cassandra-pdo/),
> > and the experience hasn't been great, mainly because I can't try out
> > the CQL3.
> >
> > Aaron Morton (aa...@thelastpickle.com) advised:
> >
> > "If possible i would avoid using PHP. The PHP story with cassandra has
> > not been great in the past. There is little love for it, so it takes a
> > while for work changes to get in the client drivers.
> >
> > AFAIK it lacks server side states which makes connection pooling
> > impossible. You should not pool cassandra connections in something
> > like HAProxy."
> >
> > So my question is - if you were to build a new scalable project from
> > scratch tomorrow sitting on top of Cassandra, which technologies would
> > you select to serve HTTP requests to ensure you get:
> >
> > a) The best support from the cassandra community (e.g. timely updates
> > of drivers, better stability)
> > b) Optimal efficiency between webservers and cassandra cluster, in
> > terms of the performance of individual requests and in the volumes of
> > connections handled per second
> > c) Ease of development and deployment.
> >
> > What worked for you, and why? What didn't work for you?
> >
> >
> > Thanks,
> > Andy
> >
> >
> > --
> > Andy Ballingall
> > Senior Software Engineer
> >
> > The Foundry
> > 6th Floor, The Communications Building,
> > 48, Leicester Square,
> > London, WC2H 7LT, UK
> > Tel: +44 (0)20 7968 6828 - Fax: +44 (0)20 7930 8906
> > Web: http://www.thefoundry.co.uk/
> >
> > The Foundry Visionmongers Ltd.
> > Registered in England and Wales No: 4642027
>


AUTO: Ken Robbins is out of the office (returning 08/22/2012)

2012-08-20 Thread Ken Robbins


I am out of the office until 08/22/2012.

I will be out of the office with no access to email on Monday and Tuesday
(8/20, 8/21). For urgent issues, please call or text 781-856-0078.



Note: This is an automated response to your message  "Cassandra with large
number of columns per row" sent on 08/19/2012 23:07:33.

This is the only notification you will receive while this person is away.

Re: Why so slow?

2012-08-20 Thread Peter Morris
I've set NoDelay = true on the socket, and although it is much better, it is
still only giving me 500 record inserts per second over a 1Gbps crossover
cable (I now also get 200 record inserts per second over wireless.)

I would expect the crossover to have much better performance than this.
Any other ideas?

Re: ColumnFamilies.ReadCount

2012-08-20 Thread Rene Kochen
Okay, thanks for the info! I was just trying to understand what I saw.

2012/8/20 Tyler Hobbs :
>
>
> On Sun, Aug 19, 2012 at 6:27 AM, Rene Kochen 
> wrote:
>>
>>
>> Why does it not increase when servicing a range operation?
>
>
> It doesn't because, basically, it wasn't designed to.  Range queries aren't
> very commonly used with Cassandra, so I doubt that there has been any demand
> for it.  If that's something you'd like to see, feel free to open a ticket
> on jira for it: https://issues.apache.org/jira/browse/CASSANDRA
>
> --
> Tyler Hobbs
> DataStax
>


Re: Why so slow?

2012-08-20 Thread Hiller, Dean
If one request has a 1ms delay and the other has 0.001ms, then over 1000
requests that is a whole second of delay tacked on (which is huge).  This is
why he suggested multi-threading ;).  Maybe there are some other factors as
well.

Dean

From: Peter Morris <mrpmor...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, August 20, 2012 4:49 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Why so slow?

I've set NoDelay = true on the socket, and although it is much better, it is
still only giving me 500 record inserts per second over a 1Gbps crossover
cable (I now also get 200 record inserts per second over wireless.)

I would expect the crossover to have much better performance than this.
Any other ideas?




Re: What is the ideal server-side technology stack to use with Cassandra?

2012-08-20 Thread Hiller, Dean
As far as opinions go, the stack we are using is

Playframework 1.2.5 (the stateless nature rocks compared to other
platforms like tomcat or servlet container stuff).
playOrm
Astyanax

Later,
Dean

On 8/17/12 11:54 AM, "Aaron Turner"  wrote:

>My stack:
>
>Java + JRuby + Rails + Torquebox
>
>I'm using the Hector client (arguably the most mature out there) and
>JRuby+RoR+Torquebox gives me a great development platform which really
>scales (full native thread support for example) and is extremely
>powerful.  Honestly, I expect all my future RoR apps will be built on
>JRuby/Torquebox because I've been so happy with it, even if I don't
>have a specific need to utilize Java libraries from inside the app.
>
>And the best part is that I've yet to have to write a single line of
>Java! :)
>
>
>
>On Fri, Aug 17, 2012 at 6:53 AM, Edward Capriolo 
>wrote:
>> The best stack is the THC stack. :)
>>
>> Tomcat Hadoop Cassandra :)
>>
>> On Fri, Aug 17, 2012 at 6:09 AM, Andy Ballingall TF
>>  wrote:
>>> Hi,
>>>
>>> I've been running a number of tests with Cassandra using a couple of
>>> PHP drivers (namely PHPCassa (https://github.com/thobbs/phpcassa/) and
>>> PDO-cassandra 
>>>(http://code.google.com/a/apache-extras.org/p/cassandra-pdo/),
>>> and the experience hasn't been great, mainly because I can't try out
>>> the CQL3.
>>>
>>> Aaron Morton (aa...@thelastpickle.com) advised:
>>>
>>> "If possible i would avoid using PHP. The PHP story with cassandra has
>>> not been great in the past. There is little love for it, so it takes a
>>> while for work changes to get in the client drivers.
>>>
>>> AFAIK it lacks server side states which makes connection pooling
>>> impossible. You should not pool cassandra connections in something
>>> like HAProxy."
>>>
>>> So my question is - if you were to build a new scalable project from
>>> scratch tomorrow sitting on top of Cassandra, which technologies would
>>> you select to serve HTTP requests to ensure you get:
>>>
>>> a) The best support from the cassandra community (e.g. timely updates
>>> of drivers, better stability)
>>> b) Optimal efficiency between webservers and cassandra cluster, in
>>> terms of the performance of individual requests and in the volumes of
>>> connections handled per second
>>> c) Ease of development and deployment.
>>>
>>> What worked for you, and why? What didn't work for you?
>
>-- 
>Aaron Turner
>http://synfin.net/ Twitter: @synfinatic
>http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
>Windows
>Those who would give up essential Liberty, to purchase a little temporary
>Safety, deserve neither Liberty nor Safety.
>-- Benjamin Franklin
>"carpe diem quam minimum credula postero"



(new nosqlOrm linke) composite table with cassandra without using cql3?

2012-08-20 Thread Hiller, Dean
Sorry, project went through a rename and I didn't realize links changed…

https://github.com/deanhiller/playorm/blob/master/input/javasrc/com/alvazan/orm/layer9z/spi/db/cassandra/CassandraSession.java

NOTE: You can look for the trick we use to store all longs, ints, and shorts
as the smallest possible bytes every time.  For instance, a long between -128
and 127 takes up one byte in the value part (and of course still takes up
space in the name part).  Overall, it can save lots of space.  (We do the
same thing with decimal types.)
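
The idea, sketched in ruby rather than playOrm's Java (an illustration of the
encoding only, not the project's actual code):

def to_smallest_bytes(n)
  bytes = []
  loop do
    bytes.unshift(n & 0xFF)   # peel off the low byte
    n >>= 8                   # arithmetic shift preserves the sign
    # stop once the remaining bits are pure sign extension
    break if (n == 0  && (bytes[0] & 0x80) == 0) ||
             (n == -1 && (bytes[0] & 0x80) != 0)
  end
  bytes.pack('C*')
end

to_smallest_bytes(127)   # => 1 byte
to_smallest_bytes(128)   # => 2 bytes (leading 0x00 keeps it positive)
to_smallest_bytes(-128)  # => 1 byte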

Later,
Dean


From: Ben Frank <b...@airlust.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, August 17, 2012 5:29 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: composite table with cassandra without using cql3?

Hi Dean,
   I'm interested in this too, but I get a 404 with the link below, looks like 
I can't see your nosqlORM project.

-Ben

On Thu, Aug 2, 2012 at 9:04 AM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
For how to do it with astyanax, you can see here...

Lines 310 and 335

https://github.com/deanhiller/nosqlORM/blob/indexing/input/javasrc/com/alvazan/orm/layer3/spi/db/cassandra/CassandraSession.java


For how to do it with thrift, you could look at astyanax.

I use it on that project for indexing in the ORM layer we use (which is
not listed on the cassandra ORMs page as of yet ;) ).

Later,
Dean


On 8/2/12 9:50 AM, "Greg Fausak" <g...@named.com> wrote:

>I've been using the cql3 to create a composite table.
>Can I use the thrift interface to accomplish the
>same thing?  In other words, do I have to use cql 3 to
>get a composite table type? (The same behavior as
>multiple PRIMARY key columns).
>
>Thanks,
>---greg




Re: What is the ideal server-side technology stack to use with Cassandra?

2012-08-20 Thread Alex Major
On Sun, Aug 19, 2012 at 11:04 PM, Tyler Hobbs  wrote:

> On Sun, Aug 19, 2012 at 3:55 AM, aaron morton wrote:
>
>>
>>
>> It is not a judgement on the quality of PHPCassa or PDO-cassandra,
>> neither of which I have used.
>>
>> My comments were mostly informed by past issues with Thrift and PHP.
>>
>
> Eh, you don't need to disclaim your opinion that much :)
>
> The PHP clients have, overall, been a bit rough and slow moving compared
> to the Java and Python clients.  My hope is that the transition to cql3
> will make it easier to maintain the drivers and clients; it just tends to
> be a lot of work with PHP.
>
> Thrift does have some issues of its own, so perhaps the custom protocol
> that's replacing it will smooth out some of the issues.  Regardless, some
> work on enabling persistent connections is definitely needed.  If anybody
> is familiar enough with that to lend a hand, I would be glad to get some
> kind of support in.
>
> --
> Tyler Hobbs
> DataStax 
>
>
The company I work for currently uses PHP with Cassandra in production and
we're certainly interested in helping out with this. However for persistent
connections and some of the more advanced features, I think it would
require a move away from PHPCassa to the PDO extension. I was around when
the mysql extension introduced persistent connections and it wasn't as
painful as first thought.

We're using PHPCassa at the moment, but currently doing a data-model
re-write towards CQL3 with compound columns/sets. For our part, we were
looking at first moving the PDO driver (which needs some TLC) to CQL3, but
not until the native driver is out in Cassandra 1.2.

The only thing that the PHP driver won't natively be able to handle
properly is connection pooling (as it's stateless); however, that can fairly
painlessly be handled in the application via APC (we currently use this
option).

Given a little time, I'm confident that the PHP drivers will catch up to the
other language drivers; I know we're not the only ones interested in helping
out with that effort.


new node joins the cluster but can't drop schemas

2012-08-20 Thread mdione.ext

  We used to have a nice test cluster with 2 nodes and everything was peachy.
At some point we (re)added a third node, which seems to work all right. But
then we tried to delete one CF, requeried it, and got this:

root@pnscassandra03:~# cqlsh -3
[cqlsh 2.2.0 | Cassandra 1.1.2hebex1 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
cqlsh:test_restore> drop table cf1;
cqlsh:test_restore> select count (*) from cf1;
 count
-------
  6703

  "Of course" if we drop the CF from nodes 1 or 2 it disappears as expected:

cqlsh:test_restore> drop table cf1;
cqlsh:test_restore> select count (*) from cf1;
Bad Request: unconfigured columnfamily cf1

  As it's a test cluster and several people had used it on and off to do some
tests, including the restore tests I posted about last week, I can't say
exactly what happened with these machines. All I can say is that the logs look
good[1]. I also have the debug logs if they're useful. Other info on the setup:

Virtual machines running ubuntu 12.04, sun's java6, and not much changed
from the default config. KS created as:

CREATE KEYSPACE test_restore WITH strategy_class = 'SimpleStrategy'
  AND strategy_options:replication_factor = '2';

USE test_restore;

CREATE TABLE cf2 (
  id int PRIMARY KEY,
  val text
) WITH
  comment='' AND
  comparator=text AND
  read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  default_validation=text AND
  min_compaction_threshold=4 AND
  max_compaction_threshold=32 AND
  replicate_on_write='true' AND
  compaction_strategy_class='SizeTieredCompactionStrategy' AND
  compression_parameters:sstable_compression='SnappyCompressor';

and the deceased cf1 is similar to cf2. Any ideas?

--
[1] http://pastebin.lugmen.org.ar/7647
--
Marcos Dione
SysAdmin
Astek Sud-Est
pour FT/TGPF/OPF/PORTAIL/DOP/HEBEX @ Marco Polo
04 97 12 62 45 - mdione@orange.com


_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
France Telecom - Orange decline toute responsabilite si ce message a ete 
altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, France Telecom - Orange is not liable for messages 
that have been modified, changed or falsified.
Thank you.



Re: Opscenter 2.1 vs 1.3

2012-08-20 Thread Nick Bailey
Robin,

RF shouldn't affect the numbers on that graph at all. The only
explanation for those differences that I can see is the increase in
the number of writes OpsCenter itself is doing. Do you see the same
jump in writes when viewing graphs just for your application's column
families?

-Nick

On Sun, Aug 19, 2012 at 3:02 PM, Robin Verlangen  wrote:
> Hi Nick,
>
> I'm talking about the total writes/reads in the dashboard (left graph). It
> exactly tripled during our update. I guess this is a change because of the
> fact we also replicate with RF=3. Is that true?
>
> With kind regards,
>
> Robin Verlangen
> Software engineer
>
> W http://www.robinverlangen.nl
> E ro...@us2.nl
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
>
>
> 2012/8/17 Nick Bailey 
>>
>> Robin,
>>
>> Are you talking about total writes to the cluster, writes to  a
>> specific column family, or something else?
>>
>> There has been some changes to OpsCenters metric collection/storage
>> system but nothing that should cause something like that. Also its
>> possible the number of writes to the OpsCenter keyspace itself would
>> have changed quite a bit between those versions, I'm assuming you
>> don't mean the column families in the OpsCenter keyspace though right?
>>
>> -Nick
>>
>> On Thu, Aug 16, 2012 at 7:05 PM, aaron morton 
>> wrote:
>> > You may have better luck on the Data Stax forums
>> > http://www.datastax.com/support-forums/
>> >
>> > Cheers
>> >
>> > -
>> > Aaron Morton
>> > Freelance Developer
>> > @aaronmorton
>> > http://www.thelastpickle.com
>> >
>> > On 17/08/2012, at 4:36 AM, Robin Verlangen  wrote:
>> >
>> > Hi there,
>> >
>> > I just upgraded to opscenter 2.1 (from 1.3). It appears that my writes
>> > have
>> > tripled. Is this a change in the display/measuring of opscenter?
>> >
>> >
>> > Best regards,
>> >
>> > Robin Verlangen
>> > Software engineer
>> >
>> > W http://www.robinverlangen.nl
>> > E ro...@us2.nl
>> >
>> > Disclaimer: The information contained in this message and attachments is
>> > intended solely for the attention and use of the named addressee and may
>> > be
>> > confidential. If you are not the intended recipient, you are reminded
>> > that
>> > the information remains the property of the sender. You must not use,
>> > disclose, distribute, copy, print or rely on this e-mail. If you have
>> > received this message in error, please contact the sender immediately
>> > and
>> > irrevocably delete this message and any copies.
>> >
>> >
>
>


RE: new node joins the cluster but can't drop schemas

2012-08-20 Thread mdione.ext
From: mdione@orange.com [mailto:mdione@orange.com]
>   We used to have a nice test cluster with 2 nodes and everything was
> peachy. At some point we (re)added a third node, which seems to work
> all right. But then we tried to delete one CF, requeried it, and got
> this:

  Seems we've been bitten by (at least one of) these bugs:

https://issues.apache.org/jira/browse/CASSANDRA-4432

https://issues.apache.org/jira/browse/CASSANDRA-4472

  Sorry for the noise.

_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
France Telecom - Orange decline toute responsabilite si ce message a ete 
altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, France Telecom - Orange is not liable for messages 
that have been modified, changed or falsified.
Thank you.



Re: Why so slow?

2012-08-20 Thread Peter Morris
I'm assessing how quickly on average I can deal with a single request.  I
cannot believe that connecting through a 1Gbps network cable is 14 times
slower.  I think I get a higher insert rate for SQL Server.



On Mon, Aug 20, 2012 at 1:20 PM, Hiller, Dean  wrote:

>  If one request has a 1ms delay and the other has 0.001ms, then over 1000
> requests that is a whole second of delay tacked on (which is huge).  This is
> why he suggested multi-threading ;).  Maybe there are some other factors as
> well.
>
> Dean
>
> From: Peter Morris <mrpmor...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday, August 20, 2012 4:49 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Why so slow?
>
> I've set NoDelay = true on the socket, and although it is much better, it
> is still only giving me 500 record inserts per second over a 1Gbps
> crossover cable (I now also get 200 record inserts per second over
> wireless.)
>
> I would expect the crossover to have much better performance than this.
> Any other ideas?
>
>
>


Re: Why so slow?

2012-08-20 Thread Hiller, Dean
There is latency and there is throughput.  These are two totally different
things, even for MySQL.  If you are single threaded, each request (even with
MySQL) has to be delayed by 1ms or whatever your ping time is.  To fully
utilize 1Gbps of bandwidth, you NEED to be multithreaded or you are wasting
bandwidth…and even then, you probably waste bandwidth, as one CPU can't
always keep up with keeping the pipe filled.
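
As a sketch of what multithreaded inserts look like with the ruby client used
earlier in this digest (one connection per thread, since the client isn't
assumed to be thread-safe; the keyspace and CF names are placeholders):

require 'cassandra'

threads = 8.times.map do |t|
  Thread.new do
    client = Cassandra.new('Keyspace1', '127.0.0.1:9160')  # one connection per thread
    1000.times do |i|
      client.insert(:TestColumnFamily, "row-#{t}-#{i}", { 'col' => 'value' })
    end
  end
end
threads.each(&:join)  # 8 threads keep the pipe fuller than 1 ever could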

Later,
Dean

From: Peter Morris <mrpmor...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, August 20, 2012 9:29 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Why so slow?

I'm assessing how quickly on average I can deal with a single request.  I 
cannot believe that connecting through a 1Gbps network cable is 14 times 
slower.  I think I get a higher insert rate for SQL Server.



On Mon, Aug 20, 2012 at 1:20 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
 If one request has a 1ms delay and the other has 0.001ms, then over 1000
requests that is a whole second of delay tacked on (which is huge).  This is
why he suggested multi-threading ;).  Maybe there are some other factors as
well.

Dean

From: Peter Morris <mrpmor...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, August 20, 2012 4:49 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Why so slow?

I've set NoDelay = true on the socket, and although it is much better, it is
still only giving me 500 record inserts per second over a 1Gbps crossover
cable (I now also get 200 record inserts per second over wireless.)

I would expect the crossover to have much better performance than this.  Any
other ideas?





Re: Why so slow?

2012-08-20 Thread Peter Morris
My misunderstanding, thanks for correcting me!


On Mon, Aug 20, 2012 at 4:32 PM, Hiller, Dean  wrote:

> There is latency and there is throughput.  These are two totally different
> things, even for MySQL.  If you are single threaded, each request (even with
> MySQL) has to be delayed by 1ms or whatever your ping time is.  To fully
> utilize 1Gbps of bandwidth, you NEED to be multithreaded or you are wasting
> bandwidth…and even then, you probably waste bandwidth, as one CPU can't
> always keep up with keeping the pipe filled.
>
>


CQL logical operator: OR

2012-08-20 Thread Peter Morris
select * from Users where UserName='me' or EmailAddress='m...@home.com';
Bad Request: line 1:40 mismatched input 'or' expecting EOF

Could someone tell me how to use OR conditions in CQL? I am able to find
examples of AND, but none for OR and it doesn't seem to work.


Re: CQL logical operator: OR

2012-08-20 Thread Juan Ezquerro
Cassandra doesn't support disjunctions (OR) yet, so you'll have to do
multiple queries.

https://groups.google.com/forum/?fromgroups#!topic/phpcassa/Py42QgDHm3w%5B1-25%5D
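
For example, instead of the OR, run two separate queries and merge the results
client-side (this assumes an index exists on EmailAddress, since it is not
part of the primary key):

select * from Users where UserName = 'me';
select * from Users where EmailAddress = 'm...@home.com';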

2012/8/20 Peter Morris 

> select * from Users where UserName='me' or EmailAddress='m...@home.com';
> Bad Request: line 1:40 mismatched input 'or' expecting EOF
>
> Could someone tell me how to use OR conditions in CQL? I am able to find
> examples of AND, but none for OR and it doesn't seem to work.
>
>
>


-- 
Juan Ezquerro LLanes 

Telf: 618349107/964051479


Re: Why so slow?

2012-08-20 Thread Carlos Carrasco
Are you inserting in bulk? Try increasing the number of mutations you send
in a single batch; otherwise you are just measuring the TCP round-trip time.
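
For example, with the ruby client used earlier in this digest, a single
insert call can carry many columns in one mutation (a sketch; the keyspace,
CF and row names are placeholders):

require 'cassandra'

client  = Cassandra.new('Keyspace1', '127.0.0.1:9160')
columns = {}
100.times { |i| columns["col-#{i}"] = "value-#{i}" }

# One round trip for 100 columns instead of 100 round trips
client.insert(:TestColumnFamily, 'row-key', columns)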

On 20 August 2012 17:36, Peter Morris  wrote:

> My misunderstanding, thanks for correcting me!
>
>
> On Mon, Aug 20, 2012 at 4:32 PM, Hiller, Dean wrote:
>
>> There is latency and there is throughput.  These are two totally different
>> things, even for MySQL.  If you are single threaded, each request (even with
>> MySQL) has to be delayed by 1ms or whatever your ping time is.  To fully
>> utilize 1Gbps of bandwidth, you NEED to be multithreaded or you are wasting
>> bandwidth…and even then, you probably waste bandwidth, as one CPU can't
>> always keep up with keeping the pipe filled.
>>
>>


Re: Why so slow?

2012-08-20 Thread Hiller, Dean
Be careful with bulk sizes, as cassandra takes a bit longer to process large
batches.  In our performance testing it was faster to send a modest number of
rows at a time from multiple threads, and if I remember correctly Aaron Morton
told me the same.

Definitely use the cassandra bulk testing tool as well.  I used it and
compared it against my own tool until mine was on par with theirs.  You can
post the numbers from the bulk testing tool here; someone on this list once
told me the expected writes/ms (it was probably Aaron as well).
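
For reference, a typical invocation of the bundled stress tool looks
something like this (the path and flags vary a little between versions, so
check tools/stress in your distribution):

tools/stress/bin/stress -d 127.0.0.1 -o insert -n 100000 -t 50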

Later,
Dean

From: Carlos Carrasco <carlos.carra...@groupalia.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, August 20, 2012 10:03 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Why so slow?

Are you inserting in bulk? Try increasing the number of mutations you send in
a single batch; otherwise you are just measuring the TCP round-trip time.

On 20 August 2012 17:36, Peter Morris <mrpmor...@gmail.com> wrote:
My misunderstanding, thanks for correcting me!


On Mon, Aug 20, 2012 at 4:32 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
There is latency and there is throughput.  These are two totally different
things, even for MySQL.  If you are single threaded, each request (even with
MySQL) has to be delayed by 1ms or whatever your ping time is.  To fully
utilize 1Gbps of bandwidth, you NEED to be multithreaded or you are wasting
bandwidth…and even then, you probably waste bandwidth, as one CPU can't
always keep up with keeping the pipe filled.





get_slice on wide rows

2012-08-20 Thread feedly team
I have a column family that I am using for consistency purposes. Basically
a marker column is written to a row in this family before some actions take
place and is deleted only after all the actions complete. The idea is that
if something goes horribly wrong this table can be read to see what needs
to be fixed.

In my dev environment things worked as planned, but in a larger scale/high
traffic environment, the slice query times out and then cassandra quickly
runs out of memory. The main difference here is that there is a very large
number of writes (and deleted columns) in the row my code is attempting to
read. Is the problem that cassandra is attempting to load all the deleted
columns into memory? I did an sstableToJson dump and saw that the "d"
deletion marker seemed to be present for the columns, though i didn't write
any code to check all values. Is the solution here partitioning the wide
row into multiple narrower rows?
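
A sketch of that partitioning idea in ruby (the CF name and the hourly bucket
granularity are made up for illustration):

require 'cassandra'

client = Cassandra.new('Keyspace1', '127.0.0.1:9160')

# Bucket marker columns by hour so no single row accumulates millions of
# (mostly deleted) columns; a cleanup scan only has to slice recent buckets.
def marker_row(time)
  "markers-#{time.getutc.strftime('%Y%m%d%H')}"
end

bucket = marker_row(Time.now)
client.insert(:Markers, bucket, { 'txn-123' => '' })
# ... perform the actions ...
client.remove(:Markers, bucket, 'txn-123')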


[RELEASE] Apache Cassandra 1.1.4 released

2012-08-20 Thread Eric Evans
The Cassandra team is pleased to announce the release of Apache Cassandra 1.1.4

This is a maintenance release; the list of changes[1] is quite small,
but practice safe upgrades and always read the release notes[2].  If
you encounter any problems, please let us know[3].

Downloads of source and binary distributions are listed on the
download page of the website:

http://cassandra.apache.org/download

Cheers,

[1]: http://goo.gl/Iu7W3 (CHANGES.txt)
[2]: http://goo.gl/yi8Iu (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


How to add secondary index to existing column family with CLI?

2012-08-20 Thread Ryabin, Thomas
I want to add a secondary index to an existing column family, but am running 
into some trouble. I'm trying to use the Cassandra CLI to add the secondary 
index. The column family is called "books", the column I'm trying to index is 
called "title", the key validation class is UTF8Type, and the default column 
value validator is BytesType.

I first tried running this command with no success:
update column family books with column_metadata=[{column_name: title, 
index_type: KEYS}];

I got the error:
cannot parse 'title' as hex bytes.


I then tried running:
update column family books with column_metadata=[{column_name: utf8('title'), 
index_type: KEYS}];

but got the error:
cannot parse 'FUNCTION_CALL' as hex bytes


Is there something I should be doing differently?
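
One variant I have not tried yet (a guess, since the comparator isn't shown
above): if it defaulted to BytesType, the CLI would try to parse column names
as hex, so giving the name as the hex bytes of 'title' (7469746c65) might
parse where the literal name did not:

update column family books with column_metadata=[{column_name: 7469746c65,
index_type: KEYS}];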

Thanks,
Thomas




CQL results are confusing me

2012-08-20 Thread Peter Morris
Consider the following statements

#1 New family is created so I have no data
create columnfamily Test (UserName varchar primary key, EmailAddress
varchar);

#2 Count how many rows I have
select count(1) from Test;
-Expected: 0
-Actual: 0

#3 Select all users with a specific email address
select * from Test where EmailAddress = 'x';
-Expected: Zero rows
-Actual: Bad Request: No indexed columns present in by-columns clause with
"equals" operator


#4 Select all users with a specific user name (primary key)
select * from Test where UserName = 'x';
-Expected: Zero rows
-Actual: 1 row with 1 column 'UserName' = 'x';


I am simply trying to determine whether a user already exists with a specific
email address or with a specific user name.  Item #4 is the most confusing;
what is going on?


Re: CQL results are confusing me

2012-08-20 Thread Juan Ezquerro
EmailAddress is not indexed; you must create an index on that column before
you can search on it.
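
A minimal sketch against the schema from the question (the index name is
optional):

create index on Test (EmailAddress);
select * from Test where EmailAddress = 'x';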

2012/8/20 Peter Morris 

> Consider the following statements
>
> #1 New family is created so I have no data
> create columnfamily Test (UserName varchar primary key, EmailAddress
> varchar);
>
> #2 Count how many rows I have
> select count(1) from Test;
> -Expected: 0
> -Actual: 0
>
> #3 Select all users with a specific email address
> select * from Test where EmailAddress = 'x';
> -Expected: Zero rows
> -Actual: Bad Request: No indexed columns present in by-columns clause with
> "equals" operator
>
>
> #4 Select all users with a specific user name (primary key)
> select * from Test where UserName = 'x';
> -Expected: Zero rows
> -Actual: 1 row with 1 column 'UserName' = 'x';
>
>
> I am simply trying to determine whether a user already exists with a specific
> email address or with a specific user name.  Item #4 is the most confusing;
> what is going on?
>



-- 
Juan Ezquerro LLanes 

Telf: 618349107/964051479


Re: Index build status

2012-08-20 Thread Jeremy Hanna
For an individual node, you can check the status of building indexes using
nodetool compactionstats.  Similarly, if you want to speed up building the
indexes (and you have the extra IO) you can increase or unthrottle your
compaction throughput temporarily: nodetool setcompactionthroughput 0 will
unthrottle it.  The default is 16.
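
For example (assuming default JMX settings on localhost):

nodetool -h localhost compactionstats              # index builds show up here
nodetool -h localhost setcompactionthroughput 0    # unthrottle while it builds
nodetool -h localhost setcompactionthroughput 16   # restore the default afterwards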

On Aug 20, 2012, at 2:05 PM, A J  wrote:

> Hi
> What command gives me the status of an index as it is being built? On a CF
> with thousands of rows, the index is taking a while, and I need to find the
> status of its build.
> 
> Thanks.



Re: Thrift batch_mutate erase previous data?

2012-08-20 Thread Cyril Auburtin
No, you're right, it's ok; it was a bug on my side.

2012/8/11 Tyler Hobbs 

>
>
> On Thu, Aug 9, 2012 at 10:43 AM, Cyril Auburtin 
> wrote:
>
>> It seems the Thrift method *batch_mutate*, with Mutations, will not
>> update the previous data with the given mutation, but will clear and
>> replace it? right?
>>
>
> I'm not sure what you're asking.  Writes in Cassandra are always blind
> overwrites, there's not a concept of clearing or replacing.
>
> --
> Tyler Hobbs
> DataStax 
>
>


nodetool output through REST API?

2012-08-20 Thread Yang
I'm trying to write a little python script to manage our cassandra cluster.

it uses output from nodetool, for example to find the current token
assignment, node status etc.

I could do this by parsing output from "nodetool ring" command.

but is there a more "native way" , for example through some REST API or
python API, so that I avoid
the possible changes in formatting of the output?

I checked pycassa, it doesn't seem to have an API for the JMX services


Thanks
Yang


nodetool , localhost connection refused

2012-08-20 Thread A J
I am running 1.1.3
Nodetool on the database node (just a single node db) is giving the error:
Failed to connect to 'localhost:7199': Connection refused

Any idea what could be causing this ?

Thanks.


Re: nodetool , localhost connection refused

2012-08-20 Thread Hiller, Dean
My guess is "telnet localhost 7199" also fails?  And if you are on linux
and run netstat -anp, you will see no one is listening on that port?

So the database node did not start and bind to that port, and you would see
an exception in the logs of that database node… just a guess.

Dean

On 8/20/12 4:10 PM, "A J"  wrote:

>I am running 1.1.3
>Nodetool on the database node (just a single node db) is giving the error:
>Failed to connect to 'localhost:7199': Connection refused
>
>Any idea what could be causing this ?
>
>Thanks.



Re: nodetool , localhost connection refused

2012-08-20 Thread A J
Yes, the telnet does not work.
Don't know what it was but switching to 1.1.4 solved the issue.

On Mon, Aug 20, 2012 at 6:17 PM, Hiller, Dean  wrote:
> My guess is "telnet localhost 7199" also fails?  And if you are on linux
> and run netstat -anp, you will see no one is listening on that port?
>
> So the database node did not start and bind to that port, and you would see
> an exception in the logs of that database node… just a guess.
>
> Dean
>
> On 8/20/12 4:10 PM, "A J"  wrote:
>
>>I am running 1.1.3
>>Nodetool on the database node (just a single node db) is giving the error:
>>Failed to connect to 'localhost:7199': Connection refused
>>
>>Any idea what could be causing this ?
>>
>>Thanks.
>


Re: nodetool output through REST API?

2012-08-20 Thread Tyler Hobbs
Your best bet is probably to set up mx4j with Cassandra, which will expose
a REST api for all of the JMX stuff.
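
A sketch of the setup, assuming the mx4j-tools jar is on Cassandra's
classpath and the commented-out hooks in conf/cassandra-env.sh (check your
version's file for the exact variable names):

# in conf/cassandra-env.sh
MX4J_ADDRESS="-Dmx4jaddress=0.0.0.0"
MX4J_PORT="-Dmx4jport=8081"

# then, for example, fetch an mbean as XML:
curl 'http://localhost:8081/mbean?objectname=org.apache.cassandra.db:type=StorageService'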

On Mon, Aug 20, 2012 at 2:46 PM, Yang  wrote:

> I'm trying to write a little python script to manage our cassandra cluster.
>
> it uses output from nodetool, for example to find the current token
> assignment, node status etc.
>
> I could do this by parsing output from "nodetool ring" command.
>
> but is there a more "native way" , for example through some REST API or
> python API, so that I avoid
> the possible changes in formatting of the output?
>
> I checked pycassa, it doesn't seem to have an API for the JMX services
>
>
> Thanks
> Yang
>



-- 
Tyler Hobbs
DataStax 


Re: nodetool repair uses insane amount of disk space

2012-08-20 Thread Michael Morris
Thanks everyone, for the pointers.  I've found an opportunity to simplify
the setup: still 2 DCs and a 3-rack setup (RF = 1 for the DC with 1 rack, and
RF = 2 for the DC with 2 racks), but now each rack contains 9 nodes with even
token distribution.

Once I got the new topology in place, I ran multiple repairs (serially) on
a single node to see if I could get the Merkle trees to sync up with the
other nodes in that range.  I knew the 1st run, and even expected the 2nd,
would be a bit out of sync.  What surprised me was that on the 3rd
repair run there were still over 600 ranges out of sync for one CF, and over
1000 ranges out of sync for another.  To me, this isn't a big deal
(unless someone more knowledgeable about these things thinks it is), but
the repair process isn't using nearly as much space while it's doing its
work.