Re: Doubt regarding CQL

2012-02-22 Thread Mateusz Korniak
On Wednesday 22 of February 2012, Rishabh Agrawal wrote:
> I have installed CQL drivers for python. When I try execute cqlsh I get
> following error
> cql-1.0.3$ cqlsh localhost 9160
> (...)
>   File "/usr/local/lib/python2.7/dist-packages/cql/cassandra/ttypes.py",
> line 7, in  from thrift.Thrift import *
> ImportError: No module named thrift.Thrift

Seems you do not have the Python thrift module installed.

In my distro (PLD) it is:
Package:python-thrift-0.5.0-4.i686
/usr/lib/python2.7/site-packages:  Thrift-0.1-py2.7.egg-info,
/usr/lib/python2.7/site-packages/thrift:  TSCons.pyc, TSCons.pyo, 
TSerialization.pyc, TSerialization.pyo, Thrift.pyc, Thrift.pyo, __init__.pyc, 
__init__.pyo,
/usr/lib/python2.7/site-packages/thrift/protocol:  TBinaryProtocol.pyc, 
TBinaryProtocol.pyo, TCompactProtocol.pyc, TCompactProtocol.pyo, 
TProtocol.pyc, TProtocol.pyo, __init__.pyc, __init__.pyo, fastbinary.so
/usr/lib/python2.7/site-packages/thrift/server:  THttpServer.pyc, 
THttpServer.pyo, TNonblockingServer.pyc, TNonblockingServer.pyo, TServer.pyc, 
TServer.pyo, __init__.pyc, __init__.pyo
/usr/lib/python2.7/site-packages/thrift/transport:  THttpClient.pyc, 
THttpClient.pyo, TSocket.pyc, TSocket.pyo, TTransport.pyc, TTransport.pyo, 
TTwisted.pyc, TTwisted.pyo, __init__.pyc, __init__.pyo
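
If it helps, a quick sanity check (just a sketch, nothing Cassandra-specific) to
confirm whether the interpreter that cqlsh runs under can actually see those
bindings:

# Does this Python see the thrift bindings that cql/cqlsh need?
import sys

try:
    import thrift.Thrift  # the same import that ttypes.py fails on
    print("thrift found at %s" % thrift.Thrift.__file__)
except ImportError as exc:
    print("no thrift module for %s: %s" % (sys.executable, exc))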


Regards,

-- 
Mateusz Korniak


Re: [BETA RELEASE] Apache Cassandra 1.1.0-beta1 released

2012-02-22 Thread Sylvain Lebresne
Arf, you're right, sorry.
I've fixed it (but it could take ~1 to get propagated to all apache mirrors).

--
Sylvain

On Wed, Feb 22, 2012 at 2:46 AM, Maki Watanabe  wrote:
> The link is wrong.
> http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.1.0/apache-cassandra-1.1.0-beta1-bin.tar.gz
> Should be:
> http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.1.0-beta1/apache-cassandra-1.1.0-beta1-bin.tar.gz
>
>
> 2012/2/21 Sylvain Lebresne :
>> The Cassandra team is pleased to announce the release of the first beta for
>> the future Apache Cassandra 1.1.
>>
>> Let me first stress that this is beta software and as such is *not* ready for
>> production use.
>>
>> The goal of this release is to give a preview of what will become Cassandra
>> 1.1 and to get wider testing before the final release. All help in testing
>> this release would be therefore greatly appreciated and please report any
>> problem you may encounter[3,4]. Have a look at the change log[1] and the
>> release notes[2] to see where Cassandra 1.1 differs from the previous series.
>>
>> Apache Cassandra 1.1.0-beta1[5] is available as usual from the cassandra
>> website (http://cassandra.apache.org/download/) and a debian package is
>> available using the 11x branch (see
>> http://wiki.apache.org/cassandra/DebianPackaging).
>>
>> Thank you for your help in testing and have fun with it.
>>
>> [1]: http://goo.gl/6iURu (CHANGES.txt)
>> [2]: http://goo.gl/hWilW (NEWS.txt)
>> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>> [4]: user@cassandra.apache.org
>> [5]: 
>> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.1.0-beta1
>
>
>
> --
> w3m


Re: What linux distro for the Cassandra nodes ?

2012-02-22 Thread aaron morton
Here are the platforms the DataStax distro supports: 
http://www.datastax.com/products/community/platforms

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/02/2012, at 12:11 AM, Aditya Gupta wrote:

> I am about to choose a linux distro to be installed on Cassandra nodes. Which 
> are the most popular & recommended ones by Cassandra community? (Not 
> interested in paying licensing fees)
> 
> 



RE: Doubt regarding CQL

2012-02-22 Thread Rishabh Agrawal
Thanks for the reply
I installed the 0.8.0 thrift package, but the problem still persists.

-Original Message-
From: Mateusz Korniak [mailto:mateusz-li...@ant.gliwice.pl]
Sent: Wednesday, February 22, 2012 1:47 PM
To: user@cassandra.apache.org
Subject: Re: Doubt regarding CQL

On Wednesday 22 of February 2012, Rishabh Agrawal wrote:
> I have installed CQL drivers for python. When I try execute cqlsh I
> get following error cql-1.0.3$ cqlsh localhost 9160
> (...)
>   File
> "/usr/local/lib/python2.7/dist-packages/cql/cassandra/ttypes.py",
> line 7, in  from thrift.Thrift import *
> ImportError: No module named thrift.Thrift

Seems you do not have installed python thrift module.

In my distro (PLD) it is:
Package:python-thrift-0.5.0-4.i686
/usr/lib/python2.7/site-packages:  Thrift-0.1-py2.7.egg-info,
/usr/lib/python2.7/site-packages/thrift:  TSCons.pyc, TSCons.pyo, 
TSerialization.pyc, TSerialization.pyo, Thrift.pyc, Thrift.pyo, __init__.pyc, 
__init__.pyo,
/usr/lib/python2.7/site-packages/thrift/protocol:  TBinaryProtocol.pyc, 
TBinaryProtocol.pyo, TCompactProtocol.pyc, TCompactProtocol.pyo, TProtocol.pyc, 
TProtocol.pyo, __init__.pyc, __init__.pyo, fastbinary.so
/usr/lib/python2.7/site-packages/thrift/server:  THttpServer.pyc, 
THttpServer.pyo, TNonblockingServer.pyc, TNonblockingServer.pyo, TServer.pyc, 
TServer.pyo, __init__.pyc, __init__.pyo
/usr/lib/python2.7/site-packages/thrift/transport:  THttpClient.pyc, 
THttpClient.pyo, TSocket.pyc, TSocket.pyo, TTransport.pyc, TTransport.pyo, 
TTwisted.pyc, TTwisted.pyo, __init__.pyc, __init__.pyo


Regards,

--
Mateusz Korniak





List all keys with RandomPartitioner

2012-02-22 Thread Flavio Baronti

I need to iterate over all the rows in a column family stored with 
RandomPartitioner.
When I reach the end of a key slice, I need to find the token of the last key 
in order to ask for the next slice.
I saw in an old email that the token for a specific key can be recovered through FBUtilities.hash(). That class however 
is inside the full Cassandra jar, not inside the client-specific part.

Is there a way to iterate over all the keys which does not require the 
server-side Cassandra jar?

Thanks
Flavio


Re: List all keys with RandomPartitioner

2012-02-22 Thread Henrik Schröder
I had to port that piece of code to C#, and it's just a few lines of code,
so just write your own. Here's the original so you can see what it does:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=src/java/org/apache/cassandra/utils/FBUtilities.java;hb=refs/heads/trunk
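
For what it's worth, a rough Python sketch of the same logic (my own reading of
FBUtilities plus RandomPartitioner, so please double-check it against the source
above before trusting the tokens): the token is just the MD5 digest of the raw
key, read as a signed big-endian integer the way java.math.BigInteger(byte[])
does it, with RandomPartitioner taking the absolute value.

import hashlib

def random_partitioner_token(key_bytes):
    # 16-byte MD5 digest of the raw key bytes
    digest = hashlib.md5(key_bytes).digest()
    # interpret it as a signed big-endian integer (BigInteger(byte[]) semantics)
    value = int(digest.encode('hex'), 16)
    if value >= 2 ** 127:
        value -= 2 ** 128
    # RandomPartitioner uses the absolute value as the token
    return abs(value)

print(random_partitioner_token('some row key'))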


/Henrik

On Wed, Feb 22, 2012 at 10:47, Flavio Baronti wrote:

> I need to iterate over all the rows in a column family stored with
> RandomPartitioner.
> When I reach the end of a key slice, I need to find the token of the last
> key in order to ask for the next slice.
> I saw in an old email that the token for a specific key can be recoveder
> through FBUtilities.hash(). That class however is inside the full Cassandra
> jar, not inside the client-specific part.
> Is there a way to iterate over all the keys which does not require the
> server-side Cassandra jar?
>
> Thanks
> Flavio
>


Re: List all keys with RandomPartitioner

2012-02-22 Thread Franc Carter
On Wed, Feb 22, 2012 at 8:47 PM, Flavio Baronti wrote:

> I need to iterate over all the rows in a column family stored with
> RandomPartitioner.
> When I reach the end of a key slice, I need to find the token of the last
> key in order to ask for the next slice.
> I saw in an old email that the token for a specific key can be recoveder
> through FBUtilities.hash(). That class however is inside the full Cassandra
> jar, not inside the client-specific part.
> Is there a way to iterate over all the keys which does not require the
> server-side Cassandra jar?
>

Does this help ?

 http://wiki.apache.org/cassandra/FAQ#iter_world

cheers


> Thanks
> Flavio
>



-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: List all keys with RandomPartitioner

2012-02-22 Thread Flavio Baronti

On 2/22/2012 12:24 PM, Franc Carter wrote:

On Wed, Feb 22, 2012 at 8:47 PM, Flavio Baronti  wrote:

I need to iterate over all the rows in a column family stored with 
RandomPartitioner.
When I reach the end of a key slice, I need to find the token of the last 
key in order to ask for the next slice.
I saw in an old email that the token for a specific key can be recoveder 
through FBUtilities.hash(). That class
however is inside the full Cassandra jar, not inside the client-specific 
part.
Is there a way to iterate over all the keys which does not require the 
server-side Cassandra jar?


Does this help ?

http://wiki.apache.org/cassandra/FAQ#iter_world

cheers



Looks good... I thought you were not supposed to iterate directly over row keys 
with a RandomPartitioner!
Thanks
Flavio


Re: List all keys with RandomPartitioner

2012-02-22 Thread Rafael Almeida
>
> From: Franc Carter 
>To: user@cassandra.apache.org 
>Sent: Wednesday, February 22, 2012 9:24 AM
>Subject: Re: List all keys with RandomPartitioner
> 
>
>On Wed, Feb 22, 2012 at 8:47 PM, Flavio Baronti  
>wrote:
>
>I need to iterate over all the rows in a column family stored with 
>RandomPartitioner.
>>When I reach the end of a key slice, I need to find the token of the last key 
>>in order to ask for the next slice.
>>I saw in an old email that the token for a specific key can be recoveder 
>>through FBUtilities.hash(). That class however is inside the full Cassandra 
>>jar, not inside the client-specific part.
>>Is there a way to iterate over all the keys which does not require the 
>>server-side Cassandra jar?
>>
>
>
>Does this help ?
>
>
> http://wiki.apache.org/cassandra/FAQ#iter_world


I don't get it. It says to use the last key read as start key, but what should 
be used as end key?


Re: List all keys with RandomPartitioner

2012-02-22 Thread R. Verlangen
You can leave the end key empty.

1) Start with "startkey" = ""
2) Next iteration start with "startkey" = "last key of the previous batch"
3) Keep going until you run out of results
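
A rough sketch of that loop against the raw Thrift interface (assuming the
generated "cassandra" Python module that ships with Cassandra is on the path,
and with made-up keyspace/column family names -- pycassa's get_range() will do
this paging for you if you'd rather not write it):

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from cassandra import Cassandra
from cassandra.ttypes import (ColumnParent, SlicePredicate, SliceRange,
                              KeyRange, ConsistencyLevel)

# plain framed-transport connection to one node
socket = TSocket.TSocket('localhost', 9160)
transport = TTransport.TFramedTransport(socket)
client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
transport.open()
client.set_keyspace('MyKeyspace')

parent = ColumnParent(column_family='MyColumnFamily')
predicate = SlicePredicate(slice_range=SliceRange('', '', False, 100))
batch_size = 1000

start_key = ''
while True:
    key_range = KeyRange(start_key=start_key, end_key='', count=batch_size)
    rows = client.get_range_slices(parent, predicate, key_range,
                                   ConsistencyLevel.ONE)
    for row in rows:
        if row.key == start_key:
            continue  # first row of a batch repeats the previous batch's last key
        print("%s has %d columns" % (row.key, len(row.columns)))  # per-row work
    if len(rows) < batch_size:
        break  # short batch: we have walked the whole ring
    start_key = rows[-1].key

transport.close()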

2012/2/22 Rafael Almeida 

> >
> > From: Franc Carter 
> >To: user@cassandra.apache.org
> >Sent: Wednesday, February 22, 2012 9:24 AM
> >Subject: Re: List all keys with RandomPartitioner
> >
> >
> >On Wed, Feb 22, 2012 at 8:47 PM, Flavio Baronti 
> wrote:
> >
> >I need to iterate over all the rows in a column family stored with
> RandomPartitioner.
> >>When I reach the end of a key slice, I need to find the token of the
> last key in order to ask for the next slice.
> >>I saw in an old email that the token for a specific key can be recoveder
> through FBUtilities.hash(). That class however is inside the full Cassandra
> jar, not inside the client-specific part.
> >>Is there a way to iterate over all the keys which does not require the
> server-side Cassandra jar?
> >>
> >
> >
> >Does this help ?
> >
> >
> > http://wiki.apache.org/cassandra/FAQ#iter_world
>
>
> I don't get it. It says to use the last key read as start key, but what
> should be used as end key?
>


Re: Doubt regarding CQL

2012-02-22 Thread paul cannon
Rishabh-

It looks like you're not actually using the cqlsh that comes with Cassandra
1.0.7.  Are you using an old version of the Python CQL driver?  Old
versions of the driver had cqlsh bundled with them, instead of with Cassandra.

The 1.0.7 Debian/Ubuntu packages do not include cqlsh, because of some
packaging+distribution difficulties (resolved in 1.1).  One easy way to get
cqlsh as part of a package is to use the free DataStax Community Edition:
see http://www.datastax.com/products/community .  Cqlsh is included in the
"dsc" package.  That package will also bring in thrift and any other
dependencies you need.

p


On Wed, Feb 22, 2012 at 3:00 AM, Rishabh Agrawal <
rishabh.agra...@impetus.co.in> wrote:

> Thanks for the reply
> I installed the 0.8.0 thrift package, but the problem still persists.
>
> -Original Message-
> From: Mateusz Korniak [mailto:mateusz-li...@ant.gliwice.pl]
> Sent: Wednesday, February 22, 2012 1:47 PM
> To: user@cassandra.apache.org
> Subject: Re: Doubt regarding CQL
>
> On Wednesday 22 of February 2012, Rishabh Agrawal wrote:
> > I have installed CQL drivers for python. When I try execute cqlsh I
> > get following error cql-1.0.3$ cqlsh localhost 9160
> > (...)
> >   File
> > "/usr/local/lib/python2.7/dist-packages/cql/cassandra/ttypes.py",
> > line 7, in  from thrift.Thrift import *
> > ImportError: No module named thrift.Thrift
>
> Seems you do not have installed python thrift module.
>
> In my distro (PLD) it is:
> Package:python-thrift-0.5.0-4.i686
> /usr/lib/python2.7/site-packages:  Thrift-0.1-py2.7.egg-info,
> /usr/lib/python2.7/site-packages/thrift:  TSCons.pyc, TSCons.pyo,
> TSerialization.pyc, TSerialization.pyo, Thrift.pyc, Thrift.pyo,
> __init__.pyc, __init__.pyo,
> /usr/lib/python2.7/site-packages/thrift/protocol:  TBinaryProtocol.pyc,
> TBinaryProtocol.pyo, TCompactProtocol.pyc, TCompactProtocol.pyo,
> TProtocol.pyc, TProtocol.pyo, __init__.pyc, __init__.pyo, fastbinary.so
> /usr/lib/python2.7/site-packages/thrift/server:  THttpServer.pyc,
> THttpServer.pyo, TNonblockingServer.pyc, TNonblockingServer.pyo,
> TServer.pyc, TServer.pyo, __init__.pyc, __init__.pyo
> /usr/lib/python2.7/site-packages/thrift/transport:  THttpClient.pyc,
> THttpClient.pyo, TSocket.pyc, TSocket.pyo, TTransport.pyc, TTransport.pyo,
> TTwisted.pyc, TTwisted.pyo, __init__.pyc, __init__.pyo
>
>
> Regards,
>
> --
> Mateusz Korniak
>
> 
>


Please advise -- 750MB object possible?

2012-02-22 Thread Maxim Potekhin

Hello everybody,

I'm being asked whether we can serve an "object", which I assume is a 
blob, of 750MB in size.
I guess the real question is how to chunk it, and/or whether it's even 
possible to chunk it.


Thanks!

Maxim



Re: Please advise -- 750MB object possible?

2012-02-22 Thread Dan Retzlaff
Chunking is a good idea, but you'll have to do it yourself. A few of the
columns in our application got quite large (maybe ~150MB) and the failure
mode was RPC timeout exceptions. Nodes couldn't always move that much data
across our data center interconnect in the default 10 seconds. With enough
heap and a faster network you could probably get by without chunking, but
it's not ideal.

On Wed, Feb 22, 2012 at 9:04 AM, Maxim Potekhin  wrote:

> Hello everybody,
>
> I'm being asked whether we can serve an "object", which I assume is a
> blob, of 750MB size?
> I guess the real question is of how to chunk it and/or even it's possible
> to chunk it.
>
> Thanks!
>
> Maxim
>
>


Re: Please advise -- 750MB object possible?

2012-02-22 Thread Mohit Anchlia
In my opinion, if you are a busy site or application, keep blobs out of the
database.

On Wed, Feb 22, 2012 at 9:37 AM, Dan Retzlaff  wrote:

> Chunking is a good idea, but you'll have to do it yourself. A few of the
> columns in our application got quite large (maybe ~150MB) and the failure
> mode was RPC timeout exceptions. Nodes couldn't always move that much data
> across our data center interconnect in the default 10 seconds. With enough
> heap and a faster network you could probably get by without chunking, but
> it's not ideal.
>
>
> On Wed, Feb 22, 2012 at 9:04 AM, Maxim Potekhin  wrote:
>
>> Hello everybody,
>>
>> I'm being asked whether we can serve an "object", which I assume is a
>> blob, of 750MB size?
>> I guess the real question is of how to chunk it and/or even it's possible
>> to chunk it.
>>
>> Thanks!
>>
>> Maxim
>>
>>
>


Re: reads/s suddenly dropped

2012-02-22 Thread Franc Carter
On Mon, Feb 20, 2012 at 9:42 PM, Franc Carter wrote:

> On Mon, Feb 20, 2012 at 12:00 PM, aaron morton wrote:
>
>> Aside from iostats..
>>
>> nodetool cfstats will give you read and write latency for each CF. This
>> is the latency for the operation on each node. Check that to see if latency
>> is increasing.
>>
>> Take a look at nodetool compactionstats to see if compactions are running
>> at the same time. The IO is throttled but if you are on aws it may not be
>> throttled enough.
>>
>>
> compaction had finished
>
>
>> The sweet spot for non netflix deployments seems to be a m1.xlarge with
>> 16GB. THe JVM can have 8 and the rest can be used for memmapping files.
>> Here is a good post about choosing EC2 sizes…
>> http://perfcap.blogspot.co.nz/2011/03/understanding-and-using-amazon-ebs.html
>>
>
> Thanks - good article. I'll go up to m1.xlarge and explore that behaviour
>

the m1.xlarge is giving much better and more consistent results

thanks


>
> cheers
>
>
>
>>
>> Cheers
>>
>>   -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 20/02/2012, at 9:31 AM, Franc Carter wrote:
>>
>> On Mon, Feb 20, 2012 at 4:10 AM, Philippe  wrote:
>>
>>> Perhaps your dataset can no longer be held in memory. Check iostats
>>>
>>
>> I have been flushing the keycache and dropping the linux disk caches
>> before each to avoid testing memory reads.
>>
>> One possibility that I thought of is that the success keys are now 'far
>> enough away' that they are not being included in the previous read and
>> hence the seek penalty has to be paid a lot more often  - viable ?
>>
>> cheers
>>
>>>
>>> On 19 Feb 2012 at 11:24, "Franc Carter"  wrote:
>>>
>>>
 I've been testing Cassandra - primarily looking at reads/second for our
 fairly data model - one unique key with a row of columns that we always
 request. I've now setup the cluster with with m1.large (2 cpus 8GB)

 I had loaded a month's worth of data in and was doing random requests as
 a torture test - and getting very nice results. I then loaded another day's
 worth of data and repeated the tests while the load was running - still 
 good.

 I then started loading more days and at some point the performance
 dropped by close to an order of magnitude ;-(

 Any ideas on what to look for ?

 thanks

 --
 *Franc Carter* | Systems architect | Sirca Ltd
  
 franc.car...@sirca.org.au | www.sirca.org.au
 Tel: +61 2 9236 9118
  Level 9, 80 Clarence St, Sydney NSW 2000
 PO Box H58, Australia Square, Sydney NSW 1215


>>
>>
>> --
>> *Franc Carter* | Systems architect | Sirca Ltd
>>  
>> franc.car...@sirca.org.au | www.sirca.org.au
>> Tel: +61 2 9236 9118
>>  Level 9, 80 Clarence St, Sydney NSW 2000
>> PO Box H58, Australia Square, Sydney NSW 1215
>>
>>
>>
>
>
> --
>
> *Franc Carter* | Systems architect | Sirca Ltd
>  
>
> franc.car...@sirca.org.au | www.sirca.org.au
>
> Tel: +61 2 9236 9118
>
> Level 9, 80 Clarence St, Sydney NSW 2000
>
> PO Box H58, Australia Square, Sydney NSW 1215
>
>


-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: Please advise -- 750MB object possible?

2012-02-22 Thread Rafael Almeida
Keep them where?



>
> From: Mohit Anchlia 
>To: user@cassandra.apache.org 
>Cc: potek...@bnl.gov 
>Sent: Wednesday, February 22, 2012 3:44 PM
>Subject: Re: Please advise -- 750MB object possible?
> 
>
>In my opinion if you are busy site or application keep blobs out of the 
>database.
>
>
>On Wed, Feb 22, 2012 at 9:37 AM, Dan Retzlaff  wrote:
>
>Chunking is a good idea, but you'll have to do it yourself. A few of the 
>columns in our application got quite large (maybe ~150MB) and the failure mode 
>was RPC timeout exceptions. Nodes couldn't always move that much data across 
>our data center interconnect in the default 10 seconds. With enough heap and a 
>faster network you could probably get by without chunking, but it's not ideal. 
>>
>>
>>
>>On Wed, Feb 22, 2012 at 9:04 AM, Maxim Potekhin  wrote:
>>
>>Hello everybody,
>>>
>>>I'm being asked whether we can serve an "object", which I assume is a blob, 
>>>of 750MB size?
>>>I guess the real question is of how to chunk it and/or even it's possible to 
>>>chunk it.
>>>
>>>Thanks!
>>>
>>>Maxim
>>>
>>>
>>
>
>
>

Re: Please advise -- 750MB object possible?

2012-02-22 Thread R. Verlangen
I would suggest you chunk them down into small pieces (~ 10-50MB) and just
fetch all the parts you need. A problem might be that if fetching one chunk
fails, the whole blob is useless.
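
For example, a minimal sketch of that chunking scheme with pycassa (the
keyspace/CF names and the manifest layout are made up; the CF is assumed to
hold BytesType values, and thrift_framed_transport_size_in_mb has to be large
enough for one chunk):

import pycassa

CHUNK_SIZE = 10 * 1024 * 1024  # ~10MB per column, per the suggestion above

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
blobs = pycassa.ColumnFamily(pool, 'Blobs')

def store_blob(blob_id, data):
    chunks = [data[i:i + CHUNK_SIZE] for i in xrange(0, len(data), CHUNK_SIZE)]
    # small manifest column so readers know how many parts to fetch
    blobs.insert(blob_id, {'chunk_count': str(len(chunks))})
    for n, chunk in enumerate(chunks):
        blobs.insert(blob_id, {'chunk_%06d' % n: chunk})  # one column per chunk

def fetch_blob(blob_id):
    count = int(blobs.get(blob_id, columns=['chunk_count'])['chunk_count'])
    parts = [blobs.get(blob_id, columns=['chunk_%06d' % n])['chunk_%06d' % n]
             for n in xrange(count)]
    return ''.join(parts)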

2012/2/22 Rafael Almeida 

> Keep them where?
>
>   --
> *From:* Mohit Anchlia 
> *To:* user@cassandra.apache.org
> *Cc:* potek...@bnl.gov
> *Sent:* Wednesday, February 22, 2012 3:44 PM
> *Subject:* Re: Please advise -- 750MB object possible?
>
> In my opinion if you are busy site or application keep blobs out of the
> database.
>
> On Wed, Feb 22, 2012 at 9:37 AM, Dan Retzlaff  wrote:
>
> Chunking is a good idea, but you'll have to do it yourself. A few of the
> columns in our application got quite large (maybe ~150MB) and the failure
> mode was RPC timeout exceptions. Nodes couldn't always move that much data
> across our data center interconnect in the default 10 seconds. With enough
> heap and a faster network you could probably get by without chunking, but
> it's not ideal.
>
>
> On Wed, Feb 22, 2012 at 9:04 AM, Maxim Potekhin  wrote:
>
> Hello everybody,
>
> I'm being asked whether we can serve an "object", which I assume is a
> blob, of 750MB size?
> I guess the real question is of how to chunk it and/or even it's possible
> to chunk it.
>
> Thanks!
>
> Maxim
>
>
>
>
>
>


Re: Please advise -- 750MB object possible?

2012-02-22 Thread Mohit Anchlia
Outside on the file system and a pointer to it in C*

On Wed, Feb 22, 2012 at 10:03 AM, Rafael Almeida wrote:

>  Keep them where?
>
>   --
> *From:* Mohit Anchlia 
> *To:* user@cassandra.apache.org
> *Cc:* potek...@bnl.gov
> *Sent:* Wednesday, February 22, 2012 3:44 PM
> *Subject:* Re: Please advise -- 750MB object possible?
>
> In my opinion if you are busy site or application keep blobs out of the
> database.
>
> On Wed, Feb 22, 2012 at 9:37 AM, Dan Retzlaff  wrote:
>
> Chunking is a good idea, but you'll have to do it yourself. A few of the
> columns in our application got quite large (maybe ~150MB) and the failure
> mode was RPC timeout exceptions. Nodes couldn't always move that much data
> across our data center interconnect in the default 10 seconds. With enough
> heap and a faster network you could probably get by without chunking, but
> it's not ideal.
>
>
> On Wed, Feb 22, 2012 at 9:04 AM, Maxim Potekhin  wrote:
>
> Hello everybody,
>
> I'm being asked whether we can serve an "object", which I assume is a
> blob, of 750MB size?
> I guess the real question is of how to chunk it and/or even it's possible
> to chunk it.
>
> Thanks!
>
> Maxim
>
>
>
>
>
>


Re: Please advise -- 750MB object possible?

2012-02-22 Thread Maxim Potekhin

The idea was to provide redundancy, resilience, automatic load balancing
and automatic repairs. Going the way of the file system does not achieve 
any of that.


Maxim


On 2/22/2012 1:34 PM, Mohit Anchlia wrote:

Outside on the file system and a pointer to it in C*

On Wed, Feb 22, 2012 at 10:03 AM, Rafael Almeida wrote:


Keep them where?


*From:* Mohit Anchlia 
*To:* user@cassandra.apache.org

*Cc:* potek...@bnl.gov 
*Sent:* Wednesday, February 22, 2012 3:44 PM
*Subject:* Re: Please advise -- 750MB object possible?

In my opinion if you are busy site or application keep blobs
out of the database.

On Wed, Feb 22, 2012 at 9:37 AM, Dan Retzlaff  wrote:

Chunking is a good idea, but you'll have to do it
yourself. A few of the columns in our application got
quite large (maybe ~150MB) and the failure mode was RPC
timeout exceptions. Nodes couldn't always move that much
data across our data center interconnect in the default 10
seconds. With enough heap and a faster network you could
probably get by without chunking, but it's not ideal.


On Wed, Feb 22, 2012 at 9:04 AM, Maxim Potekhin  wrote:

Hello everybody,

I'm being asked whether we can serve an "object",
which I assume is a blob, of 750MB size?
I guess the real question is of how to chunk it and/or
even it's possible to chunk it.

Thanks!

Maxim










Re: Please advise -- 750MB object possible?

2012-02-22 Thread Mohit Anchlia
unless you use a distributed FS

On Wed, Feb 22, 2012 at 10:37 AM, Maxim Potekhin  wrote:

> The idea was to provide redundancy, resilience, automatic load balancing
> and automatic repairs. Going the way of the file system does not achieve
> any of that.
>
> Maxim
>
>
>
> On 2/22/2012 1:34 PM, Mohit Anchlia wrote:
>
> Outside on the file system and a pointer to it in C*
>
> On Wed, Feb 22, 2012 at 10:03 AM, Rafael Almeida wrote:
>
>>  Keep them where?
>>
>>   --
>> *From:* Mohit Anchlia 
>> *To:* user@cassandra.apache.org
>> *Cc:* potek...@bnl.gov
>> *Sent:* Wednesday, February 22, 2012 3:44 PM
>> *Subject:* Re: Please advise -- 750MB object possible?
>>
>> In my opinion if you are busy site or application keep blobs out of the
>> database.
>>
>> On Wed, Feb 22, 2012 at 9:37 AM, Dan Retzlaff wrote:
>>
>> Chunking is a good idea, but you'll have to do it yourself. A few of the
>> columns in our application got quite large (maybe ~150MB) and the failure
>> mode was RPC timeout exceptions. Nodes couldn't always move that much data
>> across our data center interconnect in the default 10 seconds. With enough
>> heap and a faster network you could probably get by without chunking, but
>> it's not ideal.
>>
>>
>> On Wed, Feb 22, 2012 at 9:04 AM, Maxim Potekhin  wrote:
>>
>> Hello everybody,
>>
>> I'm being asked whether we can serve an "object", which I assume is a
>> blob, of 750MB size?
>> I guess the real question is of how to chunk it and/or even it's possible
>> to chunk it.
>>
>> Thanks!
>>
>> Maxim
>>
>>
>>
>>
>>
>>
>
>


Re: Flume and Cassandra

2012-02-22 Thread aaron morton
Maybe Storm is what you are looking for (as well as flume to get the messages 
from the network)
http://www.datastax.com/events/cassandranyc2011/presentations/marz

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/02/2012, at 2:23 AM, Alain RODRIGUEZ wrote:

> Thanks for answering.
> 
> "This is a good starting point 
> https://github.com/thobbs/flume-cassandra-plugin "
> 
> I already saw that, but it only does a raw store of the logs. I would like 
> to store them in a "smart way", I mean I'd like to store the logs to be able to 
> use the information contained in them.
> 
> If I have rows like : (date action/event/id_ad/id_transac)
> 
> 1 - 2012-02-17 18:22:09 track/display/4/70
> 2 - 2012-02-17 18:22:09 track/display/2/70
> 3 - 2012-02-17 18:22:09 track/display/3/70
> 4 - 2012-02-17 18:22:29 track/start/3/70
> 5 - 2012-02-17 18:22:39 track/firstQuartile/3/70
> 6 - 2012-02-17 18:22:46 track/midpoint/3/70
> 7 - 2012-02-17 18:22:53 track/complete/3/70
> 8 - 2012-02-17 18:23:02 track/click/3/70
> 
> I would like to process this logs to store in cassandra :
> 
> 1 - increment the display counter for the ad 4, find the transac with id "70" 
> in my database to get the id_product (let's say it's 19) and then increment 
> the display counter for product 19. I would also store a raw data like 
> event1: (event => display, ad => 4, transac => 70 ...)
> 
> 2 - ...
> ...
> 
> 7 - ...
> 
> 8 - increment the click counter for the ad 3, find the transac with id "70" 
> in my database to get the id_product (let's say it's 19) and then increment 
> the  click counter for product 19. I would also store a raw data like event8 
> : (event => click, ad => 3, transac => 70 ...) and update the status of the 
> transaction to a "finish" state.
> 
> I want a really custom behaviour, so I guess I'll have to build a specific 
> flume sink (or is there a generic and configurable sink existing somewhere ?).
> 
> Maybe I should use the flume-cassandra-plugin and process the data once it is 
> already stored raw? In this case, how can I be sure that I have processed all 
> the data, and how can I be sure of doing it in real-time or near real-time? Is 
> this performant ?
> 
> I hope you'll understand what I just wrote, it's not very simple, and I'm not 
> fluent in English. Don't hesitate asking for more explanation.  
> 
> The final goal of all this is to have statistics in near real-time, on the 
> same cluster than the OLTP which is critical to us. The real-time statistics 
> have to be slowed (and become near real-time stats) when we are in rush hours 
> in order to be fully performant in the business part.
> 
> Alain
> 
> 2012/2/10 aaron morton 
>> How to do it ? Do I need to build a custom plugin/sink or can I configure an 
>> existing sink to write data in a custom way ?
> This is a good starting point https://github.com/thobbs/flume-cassandra-plugin
> 
>> 2 - My business process also use my Cassandra DB (without flume, directly 
>> via thrift), how to ensure that log writing won't overload my database and 
>> introduce latency in my business process ?
> Anytime you have a data stream you don't control it's a good idea to put some 
> sort of buffer in there between the outside world and the database. Flume has 
> a buffered sync, I think your can subclass it and aggregate the counters for 
> a minute or two 
> http://archive.cloudera.com/cdh/3/flume/UserGuide/#_buffered_sink_and_decorator_semantics
> 
> Hope that helps. 
> A
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 10/02/2012, at 4:27 AM, Alain RODRIGUEZ wrote:
> 
>> Hi,
>> 
>> 1 - I would like to generate some statistics and store some raw events from 
>> log files tailed with flume. I saw some plugins giving Cassandra sinks but I 
>> would like to store data in a custom way, storing raw data but also 
>> incrementing counters to get near real-time statistcis. How to do it ? Do I 
>> need to build a custom plugin/sink or can I configure an existing sink to 
>> write data in a custom way ?
>> 
>> 2 - My business process also use my Cassandra DB (without flume, directly 
>> via thrift), how to ensure that log writing won't overload my database and 
>> introduce latency in my business process ? I mean, is there a way to to 
>> manage the throughput sent by the flume's tails and slow them when my 
>> Cassandra cluster is overloaded ? I would like to avoid building 2 separated 
>> clusters.
>> 
>> Thank you,
>> 
>> Alain
>> 
> 
> 



Re: Issue regarding 'describe' keyword in 1.0.7 version.

2012-02-22 Thread aaron morton
Rishabh could you give an example ? Was this using the CLI ? 

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/02/2012, at 2:55 AM, Dave Brosius wrote:

> What it's saying is if you define a KeySpace Foo and under it a ColumnFamily 
> called Foo, you won't be able to use describe to describe the ColumnFamily 
> named Foo.
> 
> 
> 
> On 02/21/2012 07:26 AM, Rishabh Agrawal wrote:
>> 
>> Hello,
>>  
>> I am newbie to Cassandra. Please bear with my lame doubts.
>> I running Cassandra version on 1.0.7 on Ubuntu. I found following case with 
>> describe:
>>  
>> If there is Keyspace with name ‘x’ then describe x command will give desired 
>> results. But if there is also a Column Family named ‘x’ then describe will 
>> not be able to catch it. But if there is only column family ‘x’ and no 
>> keyspace with the same name then describe x command will give desired 
>> results i.e. it will be able to capture and display info regarding ‘x’ 
>> column family.
>>  
>> Kindly help me with that.
>>  
>>  
>> Thanks and Regards
>> Rishabh Agrawal
>> 
>> 
> 



Re: Logging 'write' operations

2012-02-22 Thread aaron morton
In theory you could grab the commit log, but you would then have to work with 
internal Cassandra structures to understand what is in there. 

Otherwise do it at the app level or design it into your data model. 
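
As a sketch of the app-level option (pycassa, with a made-up "Audit" column
family whose comparator is assumed to be TimeUUIDType): wrap your write path so
every mutation is recorded before it is applied.

import json
import time
import uuid
import pycassa

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
data_cf = pycassa.ColumnFamily(pool, 'Data')
audit_cf = pycassa.ColumnFamily(pool, 'Audit')  # assumed TimeUUIDType comparator

def audited_insert(row_key, columns):
    # one audit row per day; columns ordered by TimeUUID, value = what was written
    audit_row = 'insert-%s' % time.strftime('%Y%m%d')
    audit_cf.insert(audit_row, {uuid.uuid1(): json.dumps({'key': row_key,
                                                          'columns': columns})})
    data_cf.insert(row_key, columns)

audited_insert('user:42', {'status': 'active'})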

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/02/2012, at 7:18 AM, A J wrote:

> Hello,
> What is the best way to log write operations (insert,remove, counter
> add, batch operations) in Cassandra. I need to store the operations
> (with values being passed) in some fashion or the other for audit
> purposes (and possibly to undo some operation after inspection).
> 
> Thanks.



Re: Please advise -- 750MB object possible?

2012-02-22 Thread Rob Coli
On Wed, Feb 22, 2012 at 10:37 AM, Maxim Potekhin  wrote:

>  The idea was to provide redundancy, resilience, automatic load balancing
> and automatic repairs. Going the way of the file system does not achieve
> any of that.
>

(Apologies for continuing a slightly OT thread, but if people google and find
this thread, I'd like it to contain the below relevant suggestion.. :D)

With the caveat that you would have to ensure that your client code streams
instead of buffering the entire object, you probably want something like
MogileFS :

http://danga.com/mogilefs/

I have operated a sizable MogileFS cluster for Digg, and it was one of the
simplest, most comprehensible and least error prone parts of our
infrastructure. A++ would run again.

-- 
=Robert Coli
rc...@palominodb.com


Chicago Cassandra Meetup on 3/1 (Preview of my Pycon talk)

2012-02-22 Thread Jeremiah Jordan
I am going to be doing a trial run of my Pycon talk about setting up a 
development instance of Cassandra and accessing it from Python (Pycassa 
mostly, some thrift just to scare people off of using thrift) for a 
Chicago Cassandra Meetup.  Anyone in Chicago feel free to come by.  The 
talk is next Thursday, 3/1.  See the Meetup listing for full time/place/etc.


http://www.meetup.com/Cassandra-Chicago/events/53378712/

If you are going to be at Pycon, I will be presenting on Friday 3/9 @ 2:40.
https://us.pycon.org/2012/schedule/presentation/122/

If anyone is interested we could probably get some kind of Cassandra 
Open Space going as well.  I see DataStax is a Pycon sponsor, are you 
guys planning anything?


-Jeremiah


nodetool ring runs very slow

2012-02-22 Thread Feng Qu
We noticed that nodetool ring sometimes returns in 17-20 sec while it normally 
runs in less than a sec. There were some compactions running when it happened. 
Did compaction cause the nodetool slowness? Anything else I should check?
>>> time nodetool -h hostname ring

real 0m17.595s
user 0m0.339s
sys 0m0.054s
 
Feng Qu


Re: Please advise -- 750MB object possible?

2012-02-22 Thread Maxim Potekhin

Thank you so much, looks nice, I'll be looking into it.


On 2/22/2012 3:08 PM, Rob Coli wrote:



On Wed, Feb 22, 2012 at 10:37 AM, Maxim Potekhin wrote:


The idea was to provide redundancy, resilience, automatic load
balancing
and automatic repairs. Going the way of the file system does not
achieve any of that.


(Apologies for continuing slightly OT thread, but if people google and 
find this thread, I'd like to to contain the below relevant 
suggestion.. :D)


With the caveat that you would have to ensure that your client code 
streams instead of buffering the entire object, you probably want 
something like MogileFS :


http://danga.com/mogilefs/

I have operated a sizable MogileFS cluster for Digg, and it was one of 
the simplest, most comprehensible and least error prone parts of our 
infrastructure. A++ would run again.


--
=Robert Coli
rc...@palominodb.com 




Re: Please advise -- 750MB object possible?

2012-02-22 Thread Rustam Aliyev

Hi Maxim,

If you need to store Blobs, then BlobStores such as OpenStack Object 
Store (aka Swift) should be a better choice.


As far as I know, MogileFS (which is also a sort of BlobStore) has a 
scalability bottleneck: MySQL.


There are a few reasons why BlobStores are a better choice. In the 
following presentation, I summarised why we chose to store blobs for 
ElasticInbox on BlobStores, not Cassandra: 
http://www.elasticinbox.com/blog/slides-and-video-from-london-meetup/


Main downside of BlobStores in comparison to Cassandra is write speed. 
Cassandra writes to memtables, BlobStores to disk.


-
Rustam.


On Wed Feb 22 22:19:26 2012, Maxim Potekhin wrote:

Thank you so much, looks nice, I'll be looking into it.


On 2/22/2012 3:08 PM, Rob Coli wrote:



On Wed, Feb 22, 2012 at 10:37 AM, Maxim Potekhin wrote:


The idea was to provide redundancy, resilience, automatic load
balancing
and automatic repairs. Going the way of the file system does not
achieve any of that.


(Apologies for continuing slightly OT thread, but if people google 
and find this thread, I'd like to to contain the below relevant 
suggestion.. :D)


With the caveat that you would have to ensure that your client code 
streams instead of buffering the entire object, you probably want 
something like MogileFS :


http://danga.com/mogilefs/

I have operated a sizable MogileFS cluster for Digg, and it was one 
of the simplest, most comprehensible and least error prone parts of 
our infrastructure. A++ would run again.


--
=Robert Coli
rc...@palominodb.com 




Best suitable value for flush_largest_memtables_at

2012-02-22 Thread Roshan Pradeep
Hi Experts

Under massive write load, what would be the best value for Cassandra's
*flush_largest_memtables_at* setting? Yesterday I got an OOM exception on
one of our production Cassandra nodes under heavy write load, within a
5 minute window.

I changed the above setting to .45 and also changed
-XX:CMSInitiatingOccupancyFraction=45 in the cassandra-env.sh file.

Previously *flush_largest_memtables_at* was .75 and the commit logs were
flushed to SSTables at a size of around 40MB. But with the change (reducing
it to .45) the flushed SSTable size is 90MB.

Could someone please explain whether my configuration change will help under
heavy write load?

Thanks.


Re: Flume and Cassandra

2012-02-22 Thread Edward Capriolo
I have been working on IronCount
(https://github.com/edwardcapriolo/IronCount/) which is designed to do
what you are talking about. Kafka takes care of the distributed
producer/consumer message queues and IronCount sets up custom
consumers to process those messages.

It might be what you are looking for. It is not as fancy as
s4/storm/flume but that is supposed to be the charm of it.

On Wed, Feb 22, 2012 at 1:55 PM, aaron morton  wrote:
> Maybe Storm is what you are looking for (as well as flume to get the
> messages from the network)
> http://www.datastax.com/events/cassandranyc2011/presentations/marz
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/02/2012, at 2:23 AM, Alain RODRIGUEZ wrote:
>
> Thanks for answering.
>
> "This is a good starting point
> https://github.com/thobbs/flume-cassandra-plugin "
>
> I already saw that, but it only does a raw store of the logs. I would like
> too store them in a "smart way", I mean I'd like to store logs to be able to
> use information contained into them.
>
> If I have rows like : (date action/event/id_ad/id_transac)
>
> 1 - 2012-02-17 18:22:09 track/display/4/70
> 2 - 2012-02-17 18:22:09 track/display/2/70
> 3 - 2012-02-17 18:22:09 track/display/3/70
> 4 - 2012-02-17 18:22:29 track/start/3/70
> 5 - 2012-02-17 18:22:39 track/firstQuartile/3/70
> 6 - 2012-02-17 18:22:46 track/midpoint/3/70
> 7 - 2012-02-17 18:22:53 track/complete/3/70
> 8 - 2012-02-17 18:23:02 track/click/3/70
>
> I would like to process this logs to store in cassandra :
>
> 1 - increment the display counter for the ad 4, find the transac with id
> "70" in my database to get the id_product (let's say it's 19) and then
> increment the display counter for product 19. I would also store a raw data
> like event1: (event => display, ad => 4, transac => 70 ...)
>
> 2 - ...
> ...
>
> 7 - ...
>
> 8 - increment the click counter for the ad 3, find the transac with id "70"
> in my database to get the id_product (let's say it's 19) and then increment
> the  click counter for product 19. I would also store a raw data like event8
> : (event => click, ad => 3, transac => 70 ...) and update the status of the
> transaction to a "finish" state.
>
> I want a really custom behaviour, so I guess I'll have to build a specific
> flume sink (or is there a generic and configurable sink existing somewhere
> ?).
>
> Maybe should I use the flume-cassandra-plugin and process the data once
> already stored rawly ? In this case, how to be sure that I have proceed all
> the data and how to be sure doing it in real-time or near real-time ? Is
> this performant ?
>
> I hope you'll understand what I just wrote, it's not very simple, and I'm
> not fluent in English. Don't hesitate asking for more explanation.
>
> The final goal of all this is to have statistics in near real-time, on the
> same cluster than the OLTP which is critical to us. The real-time statistics
> have to be slowed (and become near real-time stats) when we are in rush
> hours in order to be fully performant in the business part.
>
> Alain
>
> 2012/2/10 aaron morton 
>>
>> How to do it ? Do I need to build a custom plugin/sink or can I configure
>> an existing sink to write data in a custom way ?
>>
>> This is a good starting
>> point https://github.com/thobbs/flume-cassandra-plugin
>>
>> 2 - My business process also use my Cassandra DB (without flume, directly
>> via thrift), how to ensure that log writing won't overload my database and
>> introduce latency in my business process ?
>>
>> Anytime you have a data stream you don't control it's a good idea to put
>> some sort of buffer in there between the outside world and the database.
>> Flume has a buffered sync, I think your can subclass it and aggregate the
>> counters for a minute or
>> two http://archive.cloudera.com/cdh/3/flume/UserGuide/#_buffered_sink_and_decorator_semantics
>>
>> Hope that helps.
>> A
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 10/02/2012, at 4:27 AM, Alain RODRIGUEZ wrote:
>>
>> Hi,
>>
>> 1 - I would like to generate some statistics and store some raw events
>> from log files tailed with flume. I saw some plugins giving Cassandra sinks
>> but I would like to store data in a custom way, storing raw data but also
>> incrementing counters to get near real-time statistcis. How to do it ? Do I
>> need to build a custom plugin/sink or can I configure an existing sink to
>> write data in a custom way ?
>>
>> 2 - My business process also use my Cassandra DB (without flume, directly
>> via thrift), how to ensure that log writing won't overload my database and
>> introduce latency in my business process ? I mean, is there a way to to
>> manage the throughput sent by the flume's tails and slow them when my
>> Cassandra cluster is overloaded ? I would like to avoid building
>> 2 separated clusters.
>>
>> Thank you,
>>
>> Alain
>>
>>
>
>


Re: Flume and Cassandra

2012-02-22 Thread Milind Parikh
Cool. www.countandra.org calls them cascaded counters, and it will also be
based on Kafka.

/***
sent from my android...please pardon occasional typos as I respond @ the
speed of thought
/

On Feb 22, 2012 7:22 PM, "Edward Capriolo"  wrote:

I have been working on IronCount
(https://github.com/edwardcapriolo/IronCount/) which is designed to do
what you are talking about. Kafka takes care of the distributed
producer/consumer message queues and IronCount sets up custom
consumers to process those messages.

It might be what your are looking for. It is not as fancy as
s4/storm/flume but that is supposed to be the charm of it.


On Wed, Feb 22, 2012 at 1:55 PM, aaron morton 
wrote:
> Maybe Storm is wha...


Re: Please advise -- 750MB object possible?

2012-02-22 Thread Edward Capriolo
Someone has back-ended Mongo's GridFS onto Cassandra, but I cannot find
it on GitHub atm.

On Wed, Feb 22, 2012 at 6:51 PM, Rustam Aliyev  wrote:
> Hi Maxim,
>
> If you need to store Blobs, then BlobStores such as OpenStack Object Store
> (aka Swift) should be better choise.
>
> As far as I know, MogileFS (which is also a sort of BlobStore) has
> scalability bottleneck - MySQL.
>
> There are few reasons why BlobStores are better choise. In the following
> presentation, I summarised why we chose to store blobs for ElasticInbox on
> BlobStores, not Cassandra:
> http://www.elasticinbox.com/blog/slides-and-video-from-london-meetup/
>
> Main downside of BlobStores in comparison to Cassandra is write speed.
> Cassandra writes to memtabes, BlobStores to disk.
>
> -
> Rustam.
>
>
> On Wed Feb 22 22:19:26 2012, Maxim Potekhin wrote:
>>
>> Thank you so much, looks nice, I'll be looking into it.
>>
>>
>> On 2/22/2012 3:08 PM, Rob Coli wrote:
>>>
>>>
>>>
>>> On Wed, Feb 22, 2012 at 10:37 AM, Maxim Potekhin wrote:
>>>
>>>    The idea was to provide redundancy, resilience, automatic load
>>>    balancing
>>>    and automatic repairs. Going the way of the file system does not
>>>    achieve any of that.
>>>
>>>
>>> (Apologies for continuing slightly OT thread, but if people google and
>>> find this thread, I'd like to to contain the below relevant suggestion.. :D)
>>>
>>> With the caveat that you would have to ensure that your client code
>>> streams instead of buffering the entire object, you probably want something
>>> like MogileFS :
>>>
>>> http://danga.com/mogilefs/
>>>
>>> I have operated a sizable MogileFS cluster for Digg, and it was one of
>>> the simplest, most comprehensible and least error prone parts of our
>>> infrastructure. A++ would run again.
>>>
>>> --
>>> =Robert Coli
>>> rc...@palominodb.com 
>>
>>
>


insert performance

2012-02-22 Thread Deno Vichas

all,

Would I be better off (I'm in Java land) spawning a bunch of
threads that each add a single item to a mutator, or with a single thread that
adds a bunch of items to a mutator?


thanks,
deno



Re: How to delete a range of columns using first N components of CompositeType Column?

2012-02-22 Thread Praveen Baratam
More precisely,

Let's say we have a CF with the following spec:

create column family Test
with comparator = 'CompositeType(UTF8Type,UTF8Type,UTF8Type)'
and key_validation_class = 'UTF8Type'
and default_validation_class = 'UTF8Type';

And I have columns such as:

Jack:Name:First - Jackson
Jack:Name:Last -  Samuel
Jack:Age - 50

Now, to delete all the columns related to Jack, as far as I can tell I need
to use:

Delete 'Jack:Name:First', 'Jack:Name:Last', 'Jack:Age' from Test where KEY
= "friends";

The problem is that we do not usually know what metadata is associated with a
user, as it may include timestamp-based columns.

such as: Jack:1234567890:Location - Chicago

Can something like -

Delete 'Jack' from Test where KEY = "friends";

be done using the First N components of the CompositeType?

Or should we read first and then delete?
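
If it comes down to read-then-delete, here is a minimal pycassa sketch (the
keyspace name is made up, and for brevity it pulls the whole row and filters
client-side; a column_start/column_finish prefix slice would avoid reading
everything):

import pycassa

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
test_cf = pycassa.ColumnFamily(pool, 'Test')  # CompositeType(UTF8,UTF8,UTF8)

def delete_prefix(row_key, first_component):
    # composite column names come back as tuples; match on the first component
    columns = test_cf.get(row_key, column_count=10000)
    doomed = [name for name in columns if name[0] == first_component]
    if doomed:
        test_cf.remove(row_key, columns=doomed)

delete_prefix('friends', 'Jack')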

Thank You.

On Thu, Feb 23, 2012 at 4:47 AM, Praveen Baratam
wrote:

> I am using CompositeType columns and its very convenient to query for a
> range of columns using the *First N *components but how do I delete a
> range of columns using the First N components of the CompositeType column.
>
> In order to specify the exact column names to delete, I would have to read
> first and then delete.
>
> Is there a better way?
>