Re: Prevent queries from OOM nodes

2012-10-02 Thread Sylvain Lebresne
> Could you create one ?
> https://issues.apache.org/jira/browse/CASSANDRA

There's one already. See
https://issues.apache.org/jira/browse/CASSANDRA-3702, which redirects to
https://issues.apache.org/jira/browse/CASSANDRA-4415.

--
Sylvain


Re: Why data tripled in size after repair?

2012-10-02 Thread Sylvain Lebresne
> It's in the 1.1 branch; I don't remember if it went into a release
> yet. If not, it'll be in the next 1.1.x release.

As the ticket says, this has been in since 1.1.1. I don't pretend this is
well documented, but it's in.

--
Sylvain


Re: Advice on correct storage configuration

2012-10-02 Thread Lewis John Mcgibbney
Hi Dean,

Thanks for the feedback.

On Mon, Oct 1, 2012 at 3:12 PM, Hiller, Dean  wrote:
> What is really going to matter is what is the applications trying to read?
>  That is really the critical piece of context.  Without knowing what the
> application needs to read, it is very hard to design.
>

OK, so as I suspected: my description of the data to be stored in Cassandra,
and of course the use cases that data will be subject to, was not detailed
enough to get more substantial feedback. The reason I didn't go into more
fine-grained detail regarding typical requirements for CF's, columns and
super columns is that webpage data can change quite substantially between
pages, hosts, etc.

Some context here. We recently introduced a whole series of new serializer
options in the Apache Gora gora-cassandra module 0.2.1 [0]; however, I seem
to be having problems populating Cassandra with certain super column fields
when mapping from webpages to super columns. I'm trying to determine whether
each field (for the webpage --> cassandra mapping) is correctly configured
to store and retrieve the data efficiently.

Thanks for your comments. I'll go away, have a more thorough think, and test
various configs in an attempt to find the best option.

Thanks

Lewis

[0] 
http://svn.apache.org/repos/asf/gora/trunk/gora-cassandra/src/main/java/org/apache/gora/cassandra/serializers/


Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread Niteesh kumar
While looking at the netstat table I observed that my cluster nodes are not
using persistent connections to talk among themselves on port 9160 when
redirecting requests. I also observed that local write latency is around
30-40 microseconds, while it takes around 0.5 milliseconds at 50K QPS if the
chosen node is not the node responsible for the key. I think this is
attributable to connection setup time between servers, as my servers are on
the same rack.


How can I configure my servers to use persistent connections on port 9160
and thus exclude connection setup time for each request that is redirected?


RE: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread Viktor Jevdokimov
9160 is a client port. Nodes use the messaging service on storage_port (7000)
for intra-node communication.
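
For reference, a minimal excerpt of the relevant cassandra.yaml settings,
shown with their 1.x defaults:

    # cassandra.yaml (defaults)
    storage_port: 7000   # inter-node messaging service, held open persistently
    rpc_port: 9160       # Thrift RPC port, used by clients only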


Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

> -----Original Message-----
> From: Niteesh kumar [mailto:nitees...@directi.com]
> Sent: Tuesday, October 02, 2012 12:32
> To: user@cassandra.apache.org
> Subject: Persistent connection among nodes to communicate and redirect
> request
>
> while looking at netstat table i observed that my cluster nodes not using
> persistent connection  to talk among themselves on port 9160 to redirect
> request. I also observed that local write latency is around
> 30-40 microsecond, while its takes around .5 miliseconds if the chosen node
> is not the node responsible for the key for 50K QPS. I think this attributes 
> to
> connection making time among servers as my servers are on same rack.
>
> how can i configure my servers to use persistent connection on port 9160
> thus exclude connection making time for each request that is redirected...


Re: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread rohit bhatia
I guess 7000 is only for the gossip protocol; Cassandra still uses 9160
for RPC even among nodes.
Also, I see connections over port 9160 among various Cassandra nodes
in my cluster.
Please correct me if I am wrong...

PS: mentioned here: http://wiki.apache.org/cassandra/CloudConfig

On Tue, Oct 2, 2012 at 4:56 PM, Viktor Jevdokimov
 wrote:
> 9160 is a client port. Nodes are using messaging service on storage_port 
> (7000) for intra-node communication.
>
>> -----Original Message-----
>> From: Niteesh kumar [mailto:nitees...@directi.com]
>> Sent: Tuesday, October 02, 2012 12:32
>> To: user@cassandra.apache.org
>> Subject: Persistent connection among nodes to communicate and redirect
>> request
>>
>> while looking at netstat table i observed that my cluster nodes not using
>> persistent connection  to talk among themselves on port 9160 to redirect
>> request. I also observed that local write latency is around
>> 30-40 microsecond, while its takes around .5 miliseconds if the chosen node
>> is not the node responsible for the key for 50K QPS. I think this attributes 
>> to
>> connection making time among servers as my servers are on same rack.
>>
>> how can i configure my servers to use persistent connection on port 9160
>> thus exclude connection making time for each request that is redirected...


Re: Why data tripled in size after repair?

2012-10-02 Thread Andrey Ilinykh
On Tue, Oct 2, 2012 at 12:05 AM, Sylvain Lebresne  wrote:
>> It's in the 1.1 branch; I don't remember if it went into a release
>> yet. If not, it'll be in the next 1.1.x release.
>
> As the ticket says, this is in since 1.1.1. I don't pretend this is
> well documented, but it's in.
>
Nope. It is in 1.1.1 only. I run 1.1.5, and it doesn't have it.


1000's of CF's. virtual CFs do NOT work….map/reduce

2012-10-02 Thread Hiller, Dean
So basically, with moving towards the 1000's of CF all being put in one
CF, our performance is going to tank on map/reduce, correct?  I mean, from
what I remember we could do map/reduce on a single CF, but by stuffing
1000's of virtual Cf's into one CF, our map/reduce will have to read in
all 999 virtual CF's rows that we don't want just to map/reduce the ONE CF.

Map/reduce VERY VERY SLOW when reading in 1000 times more rows :( :(.

Is this correct?  This really sounds like highly undesirable behavior.
There needs to be a way for people with 1000's of CF's to also run
map/reduce on any one CF.  Doing Map/reduce on 1000 times the number of
rows will be 1000 times slower….and of course, we will most likely get up
to 20,000 tables from my most recent projections….our last test load, we
ended up with 8k+ CF's.  Since I kept two other keyspaces, cassandra
started getting really REALLY slow when we got up to 15k+ CF's in the
system though I didn't look into why.

I don't mind having 1000's of virtual CF's in ONE CF, BUT I need to
map/reduce "just" the virtual CF!  Ugh.

Thanks,
Dean
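
To illustrate the waste: a minimal sketch, assuming a Hadoop mapper over the
shared CF and a hypothetical "tenantA:" row-key prefix marking the one
virtual CF we want; the key/value types are simplified for the example.

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper over the shared CF: rows from every tenant are
    // still streamed in; all we can do is skip the 999 tenants we don't want.
    public class VirtualCfMapper extends Mapper<Text, Text, Text, Text> {
        private static final String PREFIX = "tenantA:";

        @Override
        protected void map(Text rowKey, Text row, Context ctx)
                throws IOException, InterruptedException {
            if (!rowKey.toString().startsWith(PREFIX)) {
                return; // read from disk and shipped here, only to be dropped
            }
            ctx.write(rowKey, row);
        }
    }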

On 10/1/12 3:38 PM, "Ben Hood" <0x6e6...@gmail.com> wrote:

>On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill 
>wrote:
>> Its just a convenient way of prefixing:
>> 
>>http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html
>
>So given that it is possible to use a CF per tenant, should we assume
>that at sufficient scale there is less overhead to prefix keys than
>there is to manage multiple CFs?
>
>Ben



Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Ben,
  to address your question, read my last post, but to summarize: yes, there
is less overhead in memory to prefix keys than to manage multiple CF's,
EXCEPT when doing map/reduce.  Doing map/reduce, you will now have HUGE
overhead in reading a whole slew of rows you don't care about, as you can't
map/reduce a single virtual CF but must map/reduce the whole CF, wasting
TONS of resources.

Thanks,
Dean

On 10/1/12 3:38 PM, "Ben Hood" <0x6e6...@gmail.com> wrote:

>On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill 
>wrote:
>> Its just a convenient way of prefixing:
>> 
>>http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html
>
>So given that it is possible to use a CF per tenant, should we assume
>that at sufficient scale there is less overhead to prefix keys than
>there is to manage multiple CFs?
>
>Ben



Re: Read latency issue

2012-10-02 Thread Hiller, Dean
Interesting results.  With PlayOrm, we did a 6 node test of reading 100 rows
from 1,000,000 using PlayOrm Scalable SQL.  It only took 60ms.  Maybe we have
better hardware though???  We are using 7200 RPM drives, so nothing fancy on
the disk side of things.  More nodes gives a higher throughput, though, as
reading from more disks will be faster.  Anyways, you may want to play with
more nodes and re-run.  If you run a test with PlayOrm, I would love to know
the results there as well.

Later,
Dean

From: Arindam Barua <aba...@247-inc.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, October 1, 2012 4:57 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Read latency issue

Running a query like “select * from <CF> where atag=<value>”, where
‘atag’ is the first column of the composite key, from either JDBC or Hector
(equivalent code), results in read times of 200-300ms from a remote host on the
same network. The query returned around 800 results. Running the same query on
a Cassandra host results in a read time of ~110-130 ms.
Using read consistency of ONE reduces the read latency by ~20ms, compared to
using QUORUM.

Enabling row cache did not seem to change the performance much. Moreover, the
row cache ‘size’ according to nodetool was very tiny. Here is a snapshot of the
nodetool info after running a few read tests:
Key Cache: size 2448 (bytes), capacity 104857584 (bytes), 231 hits, 266
requests, 1.000 recent hit rate, 14400 save period in seconds
Row Cache: size 96 (bytes), capacity 4194304000 (bytes), 9 hits, 13
requests, NaN recent hit rate, 0 save period in seconds
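
As a side note on the consistency-level comparison above, a minimal Hector
sketch of reading at ONE instead of Hector's default quorum policy; the
cluster and keyspace names are hypothetical:

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    public class ReadAtOne {
        public static Keyspace keyspace() {
            // Override the default (quorum) policy for reads only.
            ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
            ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);

            Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "host1:9160");
            return HFactory.createKeyspace("MyKeyspace", cluster, ccl);
        }
    }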



Re: 1000's of column families

2012-10-02 Thread Ben Hood
Dean,

On Tue, Oct 2, 2012 at 1:37 PM, Hiller, Dean  wrote:
> Ben,
>   to address your question, read my last post but to summarize, yes, there
> is less overhead in memory to prefix keys than manage multiple Cfs EXCEPT
> when doing map/reduce.  Doing map/reduce, you will now have HUGE overhead
> in reading a whole slew of rows you don't care about as you can't
> map/reduce a single virtual CF but must map/reduce the whole CF wasting
> TONS of resources.

That's a good point that I hadn't considered beforehand, especially as
I'd like to run MR jobs against these CFs.

Is this limitation inherent in the way that Cassandra is modelled as
input for Hadoop or could you write a custom slice query to only feed
in one particular prefix into Hadoop?

Cheers,

Ben


RE: Read latency issue

2012-10-02 Thread Roshni Rajagopal

Arindam,
  Did you also try the cassandra stress tool & compare results?
I haven't done a performance test as yet; the only ones published on the
internet are of YCSB on an older version of Apache Cassandra, and it doesn't
seem to be actively supported or updated:
http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
The numbers you have sound very slow for a read of a row by key, which should
have been the fastest.  I hope someone can help investigate or share numbers
from their tests.

Regards,
Roshni
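
For anyone trying the stress tool, a sketch of a typical run, going from
memory of the 1.1-era flags; the tool path and node address are assumptions
worth double-checking against your distribution:

    # populate 1M keys against one node, then read them back
    tools/stress/bin/stress -d 10.0.0.1 -o insert -n 1000000
    tools/stress/bin/stress -d 10.0.0.1 -o read -n 1000000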

> From: dean.hil...@nrel.gov
> To: user@cassandra.apache.org
> Date: Tue, 2 Oct 2012 06:41:09 -0600
> Subject: Re: Read latency issue
> 
> Interesting results.  With PlayOrm, we did a 6 node test of reading 100 rows 
> from 1,000,000 using PlayOrm Scalable SQL.  It only took 60ms.  Maybe we have 
> better hardware though???  We are using 7200 RPM drives so nothing fancy on 
> the disk side of things.  More nodes puts at a higher throughput though as 
> reading from more disks will be faster.  Anyways, you may want to play with 
> more nodes and re-run.  If you run a test with PlayOrm, I would love to know 
> the results there as well.
> 
> Later,
> Dean
> 
> From: Arindam Barua <aba...@247-inc.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday, October 1, 2012 4:57 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Read latency issue
> 
> Running a query like “select * from <CF> where atag=<value>”, where 
> ‘atag’ is the first column of the composite key, from either JDBC or Hector 
> (equivalent code), results in read times of 200-300ms from a remote host on 
> the same network. The query returned around 800 results. Running the same 
> query on a Cassandra host results in a read time of ~110-130 ms.
> Using read consistency of ONE reduces the read latency by ~20ms, compared to 
> using QUORUM.
> 
> Enabling row cache did not seem to change the performance much. Moreover, the 
> row cache ‘size’ according to nodetool was very tiny. Here is a snapshot of 
> the nodetool info after running few read tests:
> Key Cache: size 2448 (bytes), capacity 104857584 (bytes), 231 hits, 
> 266 requests, 1.000 recent hit rate, 14400 save period in seconds
> Row Cache: size 96 (bytes), capacity 4194304000 (bytes), 9 hits, 13 
> requests, NaN recent hit rate, 0 save period in seconds
> 
  

Re: 1000's of column families

2012-10-02 Thread Brian O'Neill

Without putting too much thought into it...

Given the underlying architecture, I think you could/would have to write
your own partitioner, which would partition based on the prefix/virtual
keyspace.  

-brian


On 10/2/12 9:00 AM, "Ben Hood" <0x6e6...@gmail.com> wrote:

>Dean,
>
>On Tue, Oct 2, 2012 at 1:37 PM, Hiller, Dean  wrote:
>> Ben,
>>   to address your question, read my last post but to summarize, yes,
>>there
>> is less overhead in memory to prefix keys than manage multiple Cfs
>>EXCEPT
>> when doing map/reduce.  Doing map/reduce, you will now have HUGE
>>overhead
>> in reading a whole slew of rows you don't care about as you can't
>> map/reduce a single virtual CF but must map/reduce the whole CF wasting
>> TONS of resources.
>
>That's a good point that I hadn't considered beforehand, especially as
>I'd like to run MR jobs against these CFs.
>
>Is this limitation inherent in the way that Cassandra is modelled as
>input for Hadoop or could you write a custom slice query to only feed
>in one particular prefix into Hadoop?
>
>Cheers,
>
>Ben




Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Thanks for the idea, but…(please keep thinking on it)...

That's 100% what we don't want, since partitioned data resides on the same
node.  I want to map/reduce the column families and leverage the parallel
disks.

:( :(

I am sure others would want to do the same…..We almost need a feature of
virtual column families, and "column family" should really not be a column
family but should be called ReplicationGroup or something, where replication
is configured for all CF's in that group.

ANYONE have any other ideas???

Dean

On 10/2/12 7:20 AM, "Brian O'Neill"  wrote:

>
>Without putting too much thought into it...
>
>Given the underlying architecture, I think you could/would have to write
>your own partitioner, which would partition based on the prefix/virtual
>keyspace.  
>
>-brian
>
>On 10/2/12 9:00 AM, "Ben Hood" <0x6e6...@gmail.com> wrote:
>
>>Dean,
>>
>>On Tue, Oct 2, 2012 at 1:37 PM, Hiller, Dean 
>>wrote:
>>> Ben,
>>>   to address your question, read my last post but to summarize, yes,
>>>there
>>> is less overhead in memory to prefix keys than manage multiple Cfs
>>>EXCEPT
>>> when doing map/reduce.  Doing map/reduce, you will now have HUGE
>>>overhead
>>> in reading a whole slew of rows you don't care about as you can't
>>> map/reduce a single virtual CF but must map/reduce the whole CF wasting
>>> TONS of resources.
>>
>>That's a good point that I hadn't considered beforehand, especially as
>>I'd like to run MR jobs against these CFs.
>>
>>Is this limitation inherent in the way that Cassandra is modelled as
>>input for Hadoop or could you write a custom slice query to only feed
>>in one particular prefix into Hadoop?
>>
>>Cheers,
>>
>>Ben
>
>



Re: 1000's of CF's. virtual CFs do NOT work….map/reduce

2012-10-02 Thread Brian O'Neill
Dean,

Great point.  I hadn't considered that either.  Per my other email, I think
we would need a custom partitioner for this (a mix of
OrderPreservingPartitioner and RandomPartitioner: OPP for the prefix)?

-brian
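
A minimal sketch of the token scheme this implies (not a drop-in
IPartitioner): the leading bytes of the token preserve tenant order while the
remainder is an MD5 spread within the tenant. All names here are hypothetical.

    import java.math.BigInteger;
    import java.security.MessageDigest;
    import java.util.Arrays;

    public class PrefixToken {
        // Fixed-width token: 2 ordered tenant bytes + 16 MD5 bytes, so
        // BigInteger comparison groups rows by tenant (OPP for the prefix)
        // and spreads them pseudo-randomly within a tenant (RandomPartitioner
        // style). Only the first 2 bytes of the tenant id are order-preserved.
        static BigInteger token(String tenant, String key) throws Exception {
            byte[] prefix = Arrays.copyOf(tenant.getBytes("UTF-8"), 2);
            byte[] spread = MessageDigest.getInstance("MD5")
                                         .digest(key.getBytes("UTF-8"));
            byte[] tok = new byte[prefix.length + spread.length];
            System.arraycopy(prefix, 0, tok, 0, prefix.length);
            System.arraycopy(spread, 0, tok, prefix.length, spread.length);
            return new BigInteger(1, tok);
        }
    }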


On 10/2/12 8:35 AM, "Hiller, Dean"  wrote:

>So basically, with moving towards the 1000's of CF all being put in one
>CF, our performance is going to tank on map/reduce, correct?  I mean, from
>what I remember we could do map/reduce on a single CF, but by stuffing
>1000's of virtual Cf's into one CF, our map/reduce will have to read in
>all 999 virtual CF's rows that we don't want just to map/reduce the ONE
>CF.
>
>Map/reduce VERY VERY SLOW when reading in 1000 times more rows :( :(.
>
>Is this correct?  This really sounds like highly undesirable behavior.
>There needs to be a way for people with 1000's of CF's to also run
>map/reduce on any one CF.  Doing Map/reduce on 1000 times the number of
>rows will be 1000 times slower….and of course, we will most likely get up
>to 20,000 tables from my most recent projections….our last test load, we
>ended up with 8k+ CF's.  Since I kept two other keyspaces, cassandra
>started getting really REALLY slow when we got up to 15k+ CF's in the
>system though I didn't look into why.
>
>I don't mind having 1000's of virtual CF's in ONE CF, BUT I need to
>map/reduce "just" the virtual CF!  Ugh.
>
>Thanks,
>Dean
>
>On 10/1/12 3:38 PM, "Ben Hood" <0x6e6...@gmail.com> wrote:
>
>>On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill 
>>wrote:
>>> Its just a convenient way of prefixing:
>>> 
>>>http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html
>>
>>So given that it is possible to use a CF per tenant, should we assume
>>that at sufficient scale there is less overhead to prefix keys than
>>there is to manage multiple CFs?
>>
>>Ben
>




Re: 1000's of column families

2012-10-02 Thread Brian O'Neill

Agreed. 

Do we know yet what the overhead is for each column family?  What is the
limit?
If you have a SINGLE keyspace w/ 2+ CF's, what happens?  Anyone know?

-brian



On 10/2/12 9:28 AM, "Hiller, Dean"  wrote:

>Thanks for the idea but…(but please keep thinking on it)...
>
>100% what we don't want since partitioned data resides on the same node.
>I want to map/reduce the column families and leverage the parallel disks
>
>:( :(
>
>I am sure others would want to do the same…..We almost need a feature of
>virtual Column Families and column family should really not be column
>family but should be called ReplicationGroup or something where
>replication is configured for all CF's in that group.
>
>ANYONE have any other ideas???
>
>Dean
>
>On 10/2/12 7:20 AM, "Brian O'Neill"  wrote:
>
>>
>>Without putting too much thought into it...
>>
>>Given the underlying architecture, I think you could/would have to write
>>your own partitioner, which would partition based on the prefix/virtual
>>keyspace.  
>>
>>-brian
>>
>>On 10/2/12 9:00 AM, "Ben Hood" <0x6e6...@gmail.com> wrote:
>>
>>>Dean,
>>>
>>>On Tue, Oct 2, 2012 at 1:37 PM, Hiller, Dean 
>>>wrote:
>>>> Ben,
>>>>   to address your question, read my last post but to summarize, yes, there
>>>> is less overhead in memory to prefix keys than manage multiple Cfs EXCEPT
>>>> when doing map/reduce.  Doing map/reduce, you will now have HUGE overhead
>>>> in reading a whole slew of rows you don't care about as you can't
>>>> map/reduce a single virtual CF but must map/reduce the whole CF wasting
>>>> TONS of resources.
>>>
>>>That's a good point that I hadn't considered beforehand, especially as
>>>I'd like to run MR jobs against these CFs.
>>>
>>>Is this limitation inherent in the way that Cassandra is modelled as
>>>input for Hadoop or could you write a custom slice query to only feed
>>>in one particular prefix into Hadoop?
>>>
>>>Cheers,
>>>
>>>Ben
>>
>>
>




Re: 1000's of CF's. virtual CFs possible Map/Reduce SOLUTION...

2012-10-02 Thread Hiller, Dean
Well, I think I know the direction we may follow so we can
1. Have virtual CF's
2. Be able to map/reduce ONE virtual CF

Well, not map/reduce exactly, but really really close.  We use PlayOrm with
its partitioning, so I am now thinking what we will do is have a compute
grid where we can have each node doing a findAll query into the partitions
it is responsible for.  In this way, I think we can have 1000's of virtual
CF's inside ONE CF, and then PlayOrm does its query and retrieves the rows
for that partition of one virtual CF.

Anyone know of a compute grid we can dish out work to?  That would be my
only missing piece (well, that and the PlayOrm virtual CF feature, but I can
probably add that within a week, though I am on vacation this Thursday to
Monday).

Later,
Dean


On 10/2/12 6:35 AM, "Hiller, Dean"  wrote:

>So basically, with moving towards the 1000's of CF all being put in one
>CF, our performance is going to tank on map/reduce, correct?  I mean, from
>what I remember we could do map/reduce on a single CF, but by stuffing
>1000's of virtual Cf's into one CF, our map/reduce will have to read in
>all 999 virtual CF's rows that we don't want just to map/reduce the ONE
>CF.
>
>Map/reduce VERY VERY SLOW when reading in 1000 times more rows :( :(.
>
>Is this correct?  This really sounds like highly undesirable behavior.
>There needs to be a way for people with 1000's of CF's to also run
>map/reduce on any one CF.  Doing Map/reduce on 1000 times the number of
>rows will be 1000 times slower….and of course, we will most likely get up
>to 20,000 tables from my most recent projections….our last test load, we
>ended up with 8k+ CF's.  Since I kept two other keyspaces, cassandra
>started getting really REALLY slow when we got up to 15k+ CF's in the
>system though I didn't look into why.
>
>I don't mind having 1000's of virtual CF's in ONE CF, BUT I need to
>map/reduce "just" the virtual CF!  Ugh.
>
>Thanks,
>Dean
>
>On 10/1/12 3:38 PM, "Ben Hood" <0x6e6...@gmail.com> wrote:
>
>>On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill 
>>wrote:
>>> Its just a convenient way of prefixing:
>>> 
>>>http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html
>>
>>So given that it is possible to use a CF per tenant, should we assume
>>that at sufficient scale there is less overhead to prefix keys than
>>there is to manage multiple CFs?
>>
>>Ben
>



Re: 1000's of CF's. virtual CFs possible Map/Reduce SOLUTION...

2012-10-02 Thread Brian O'Neill

Dean,

We moved away from Hadoop and M/R, and instead we are using Storm as our
compute grid.  We queue keys in Kafka, then Storm distributes the work to
the grid.  It's working well so far, but we haven't taken it to prod yet.
Data is read from Cassandra using a Cassandra bolt.

If you end up using Storm, let me know.  We have an unreleased version of
the bolt that you probably want to use.  (We're waiting on Nathan/Storm to
fix some classpath loading issues.)

RE: a custom virtual keyspace partitioner: point well taken

-brian


On 10/2/12 9:33 AM, "Hiller, Dean"  wrote:

>Well, I think I know the direction we may follow so we can
>1. Have Virtual CF's
>2. Be able to map/reduce ONE Virtual CF
>
>Well, not map/reduce exactly but really really close.  We use PlayOrm with
>it's partitioning so I am now thinking what we will do is have a compute
>grid  where we can have each node doing a findAll query into the
>partitions it is responsible for.  In this way, I think we can 1000's of
>virtual CF's inside ONE CF and then PlayOrm does it's query and retrieves
>the rows for that partition of one virtual CF.
>
>Anyone know of a computer grid we can dish out work to?  That would be my
>only missing piece (well, that and the PlayOrm virtual CF feature but I
>can add that within a week probably though I am on vacation this Thursday
>to monday).
>
>Later,
>Dean
>
>
>On 10/2/12 6:35 AM, "Hiller, Dean"  wrote:
>
>>So basically, with moving towards the 1000's of CF all being put in one
>>CF, our performance is going to tank on map/reduce, correct?  I mean,
>>from
>>what I remember we could do map/reduce on a single CF, but by stuffing
>>1000's of virtual Cf's into one CF, our map/reduce will have to read in
>>all 999 virtual CF's rows that we don't want just to map/reduce the ONE
>>CF.
>>
>>Map/reduce VERY VERY SLOW when reading in 1000 times more rows :( :(.
>>
>>Is this correct?  This really sounds like highly undesirable behavior.
>>There needs to be a way for people with 1000's of CF's to also run
>>map/reduce on any one CF.  Doing Map/reduce on 1000 times the number of
>>rows will be 1000 times slower….and of course, we will most likely get up
>>to 20,000 tables from my most recent projections….our last test load, we
>>ended up with 8k+ CF's.  Since I kept two other keyspaces, cassandra
>>started getting really REALLY slow when we got up to 15k+ CF's in the
>>system though I didn't look into why.
>>
>>I don't mind having 1000's of virtual CF's in ONE CF, BUT I need to
>>map/reduce "just" the virtual CF!  Ugh.
>>
>>Thanks,
>>Dean
>>
>>On 10/1/12 3:38 PM, "Ben Hood" <0x6e6...@gmail.com> wrote:
>>
>>>On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill 
>>>wrote:
>>>> Its just a convenient way of prefixing:
>>>> http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html
>>>
>>>So given that it is possible to use a CF per tenant, should we assume
>>>that at sufficient scale there is less overhead to prefix keys than
>>>there is to manage multiple CFs?
>>>
>>>Ben
>>
>




Re: 1000's of column families

2012-10-02 Thread Ben Hood
Brian,

On Tue, Oct 2, 2012 at 2:20 PM, Brian O'Neill  wrote:
>
> Without putting too much thought into it...
>
> Given the underlying architecture, I think you could/would have to write
> your own partitioner, which would partition based on the prefix/virtual
> keyspace.

I might be barking up the wrong tree here, but looking at the source of
ColumnFamilyInputFormat, it seems that you can specify a KeyRange for
the input, but only when you use an order-preserving partitioner. So I
presume that if you are using the RandomPartitioner, you are
effectively doing a full CF scan (i.e. including all tenants in your
system).

Ben


Re: Getting serialized Rows from CommitLogSegment file

2012-10-02 Thread Felipe Schmidt
I found a way to do it, but now I have another issue.

I'm getting a problem when trying to get the ColumnFamily using the CfId.
This information is important to deserialize the stored ColumnFamily.
When I try to use the method Schema.instance.getCF(cfId) to get the
Pair it throws an
'UnknownColumnFamilyException'.

Seems like the information was dropped or, maybe, never existed in this
instance of the Schema. But, as far as I know, there's just one instance of
the schema in Cassandra, right?

If I'm mistaking/misunderstanding some structural concept of Cassandra
here, or if there is another way to get this information, please let me know.

Thanks in advance.

Regards,
Felipe Mathias Schmidt
*(Computer Science UFRGS, RS, Brazil)*





2012/10/1 Felipe Schmidt 

> Hello.
>
> I'm trying to catch the serialized RowMutations from a CommitLogSegment to
> capture the data change, but I don't have much idea about how to proceed.
> Does someone know a way to do it? I supposed that it would be kind of
> simple.
>
> Regards,
> Felipe Mathias Schmidt
> *(Computer Science UFRGS, RS, Brazil)*
>
>
>
>


Re: Getting serialized Rows from CommitLogSegment file

2012-10-02 Thread Ben Hood
Felipe,

On Tue, Oct 2, 2012 at 2:56 PM, Felipe Schmidt  wrote:
> Seems like the information was dropped or, maybe, not existent in this
> instance of the Schema. But, as soon as I know, it's just one instance of
> the schema in Cassandra, right?

If I understand you correctly, you are trying to process the commit
log to get a change list?

If so, then this question has been asked before, and the general consensus
is that, whilst possible, the commit log is an internal apparatus, subject
to change, that is not guaranteed to give you the information you think you
should get. Other suggested approaches include producing your event stream
of mutations using AOP, or multiplexing change events at the app layer as
they go into Cassandra.

HTH,

Ben
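
A minimal sketch of the app-layer multiplexing idea, under the assumption
that all writes already funnel through one place in your code; every name
here is hypothetical:

    import java.util.ArrayList;
    import java.util.List;

    interface MutationListener {
        void onMutation(String columnFamily, String rowKey);
    }

    class MultiplexingWriter {
        private final List<MutationListener> listeners =
                new ArrayList<MutationListener>();

        void addListener(MutationListener l) { listeners.add(l); }

        void write(String columnFamily, String rowKey /*, columns... */) {
            // 1. apply the mutation through your normal client (Hector, etc.)
            // 2. fan the same change event out to interested consumers
            for (MutationListener l : listeners) {
                l.onMutation(columnFamily, rowKey);
            }
        }
    }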


RE: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread Viktor Jevdokimov
Never seen connections between nodes on port 9160, only on 7000.

From the source code, for example, a thrift request goes to rpc port 9160
(org.apache.cassandra.thrift.CassandraDaemon,
org.apache.cassandra.thrift.CassandraServer), then to StorageProxy
(org.apache.cassandra.service.StorageProxy), which forwards the request (if
needed) to other endpoints via MessagingService
(org.apache.cassandra.net.MessagingService), which uses storage_port from
yaml, not a thrift port (rpc_port in yaml). What else could be wrong? Wiki or
source code?




Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

> -----Original Message-----
> From: rohit bhatia [mailto:rohit2...@gmail.com]
> Sent: Tuesday, October 02, 2012 14:35
> To: user@cassandra.apache.org
> Subject: Re: Persistent connection among nodes to communicate and
> redirect request
>
> i guess 7000 is only for gossip protocol. Cassandra still uses 9160 for RPC 
> even
> among nodes
> Also, I see Connections over port 9160 among various cassandra Nodes in my
> cluster.
> Please correct me if i am wrong..
>
> PS: mentioned Here http://wiki.apache.org/cassandra/CloudConfig
>
> On Tue, Oct 2, 2012 at 4:56 PM, Viktor Jevdokimov
>  wrote:
> > 9160 is a client port. Nodes are using messaging service on storage_port
> (7000) for intra-node communication.
> >
> >
> >> -----Original Message-----
> >> From: Niteesh kumar [mailto:nitees...@directi.com]
> >> Sent: Tuesday, October 02, 2012 12:32
> >> To: user@cassandra.apache.org
> >> Subject: Persistent connection among nodes to communicate and
> >> redirect request
> >>
> >> while looking at netstat table i observed that my cluster nodes not
> >> using persistent connection  to talk among themselves on port 9160 to
> >> redirect request. I also observed that local write latency is around
> >> 30-40 microsecond, while its takes around .5 miliseconds if the
> >> chosen node is not the node responsible for the key for 50K QPS. I
> >> think this attributes to connection making time among servers as my
> servers are on same rack.
> >>
> >> how can i configure my servers to use persistent connection on port
> >> 9160 thus exclude connection making time for each request that is
> redirected...



Re: 1000's of column families

2012-10-02 Thread Brian O'Neill
Exactly.


On 10/2/12 9:55 AM, "Ben Hood" <0x6e6...@gmail.com> wrote:

>Brian,
>
>On Tue, Oct 2, 2012 at 2:20 PM, Brian O'Neill  wrote:
>>
>> Without putting too much thought into it...
>>
>> Given the underlying architecture, I think you could/would have to write
>> your own partitioner, which would partition based on the prefix/virtual
>> keyspace.
>
>I might be barking up the wrong tree here, but looking at source of
>ColumnFamilyInputFormat, it seems that you can specify a KeyRange for
>the input, but only when you use an order preserving partitioner. So I
>presume that if you are using the RandomPartitioner, you are
>effectively doing a full CF scan (i.e. including all tenants in your
>system).
>
>Ben




easy repair questions on -pr

2012-10-02 Thread Hiller, Dean
If I understand -pr correctly…

 1.  -pr forces only the current node's sstables to be fixed (so I run it on
each node once)
 2.  Can I run nodetool repair -pr on just 1/RF of my nodes if I pick the
correct nodes?
 3.  Without -pr, it will fix all the stuff on the current node AND the nodes
with replicas?
 4.  But is it true replicas may have other data that still needs repair as
well, so I would still need to go to those nodes and run ./nodetool repair
-pr?
 5.  I ran nodetool repair -pr on EVERY node, but I still see rows that were
deleted…well, I see their keys only, so the columns appear to be deleted
correctly.  Why are the row keys still there, though? (cassandra-cli and
cqlsh both show the same results)
 6.  The below is very confusing where it says "only the first range returned
by the partitioner for a node is repaired".  This makes it sound like only a
range of keys is repaired on the current node and the other rows on the
current node are not touched.  How do I correctly read that sentence???
Would it be more accurate to say "All the rows that this node is responsible
for are repaired" vs. not using -pr meaning the whole cluster is repaired,
or is that incorrect?

Begins an anti-entropy node repair operation. If the -pr option is specified, 
only the first range returned by the partitioner for a node is repaired. This 
allows you to repair each node in the cluster in succession without duplicating 
work.

Without -pr, all replica ranges that the node is responsible for are repaired.

Optionally takes a list of column family names.




Re: 1000's of column families

2012-10-02 Thread Ben Hood
On Tue, Oct 2, 2012 at 3:37 PM, Brian O'Neill  wrote:
> Exactly.

So you're back to the deliberation between using multiple CFs
(potentially with some known working upper bound*) or feeding your map
reduce in some other way (as you decided to do with Storm). In my
particular scenario I'd like to be able to do a combination of some
batch processing on top of less frequently changing data (hence why I
was looking at Hadoop) and some real time analytics.

Cheers,

Ben

(*) Not sure whether this applies to an individual keyspace or an
entire cluster.


Re: easy repair questions on -pr

2012-10-02 Thread Sylvain Lebresne
The short version is: there are 2 use cases for nodetool repair:
  1) For periodic repair of the whole cluster
(http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair).
In that case, you should run repair *with* -pr and you should run it
on *every* node.
  2) When a node has been down for a long time (for instance, long
enough that hints may have been dropped), and you want to repair that
node specifically. In that case, you should run repair on that node
only, and you should use it *without* -pr.

As for the gory details, nodetool repair without -pr will repair all
the ranges of the node on which the repair is done. But when a range is
repaired, it is repaired on *all* replicas. In other words, a repair on
node A will also repair parts of other nodes that share a range with
A. That's why, in case 1) above, where you want to repair the whole
cluster, a repair without -pr is inefficient: if you repair A
and B and both are replicas for the same range, you will duplicate the
work. Hence repair -pr: on one node it repairs only its primary range
(but on all replicas of said range), and so if you do that on every node,
you will have effectively repaired the whole cluster without having
repaired the same range twice.
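
A minimal sketch of the two use cases, assuming shell access to each node;
the host names are hypothetical:

    # Use case 1: periodic whole-cluster repair -- with -pr, on every node.
    for host in node1 node2 node3; do
        ssh "$host" nodetool repair -pr
    done

    # Use case 2: a node that was down too long -- on that node, without -pr.
    nodetool repair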

> Can I run nodetool repair -pr on just 1/RF of my nodes if I pick the correct
> nodes?

As it's hopefully clear from the description above, no.

>  Why are the row keys still there though?

http://wiki.apache.org/cassandra/FAQ#range_ghosts


--
Sylvain


Re: easy repair questions on -pr

2012-10-02 Thread Hiller, Dean
GREAT answer, thanks, and one last question…

So, I suspect I can expect those rows to finally go away, when queried from
cassandra-cli, once GCGraceSeconds has passed then?

Or will they always be there forever and ever and ever (this can't be true,
right)?

Thanks,
Dean

On 10/2/12 9:34 AM, "Sylvain Lebresne"  wrote:

>The short version is: there is 2 use case for nodetool repair:
>  1) For periodic repair of the whole cluster
>(http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair)
>.
>In that case, you should run repair *with* -pr and you should run it
>on *every* node.
>  2) When a node has been down for a long time (for instance long
>enough that hints may have been dropped), and you want to repair that
>node specifically. In that case, you should run repair on that node
>only and you should use it *without* -pr.
>
>As for the gory details, nodetool repair without -pr will repair all
>the range of the node on which the repair is done. But when a range is
>repaired, it is repaired on *all* replica. In other words, a repair on
>node A will also repair parts of other nodes that share a range with
>A. That why, in the case 1) above, where you want to repair the whole
>cluster, a repair without -pr is inefficient, because if you repair A
>and B and both are replica for the same range, you will duplicate the
>work. Hence repair -pr: on one node it repair only its primary range
>(but all replica for said range), and so if you do that on every node,
>you will have effectively repair the whole cluster without having
>repaired the same range twice.
>
>> Can I run node tool -pr repair on just 1/RF of my nodes if I do the
>>correct nodes?
>
>As it's hopefully clear from the description above, no.
>
>>  Why are the row keys still there though?
>
>http://wiki.apache.org/cassandra/FAQ#range_ghosts
>
>
>--
>Sylvain



Re: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread Niteesh kumar
Not only does a node make connections to other nodes; I can also see nodes
making connections to themselves on port 9160.




On Tuesday 02 October 2012 07:42 PM, Viktor Jevdokimov wrote:

not a thrift por




Re: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread Nick Bailey
The comments here so far are correct. Cassandra itself will never open
a thrift connection. Thrift is only for clients. Not sure what exactly
you are seeing but I don't think it's cassandra.

On Tue, Oct 2, 2012 at 10:48 AM, Niteesh kumar  wrote:
> not only a node make connection to other nodes. i can also see nodes making
> connection to itself on port 9160.
>
>
>
> On Tuesday 02 October 2012 07:42 PM, Viktor Jevdokimov wrote:
>>
>> not a thrift por
>
>


Re: easy repair questions on -pr

2012-10-02 Thread Sylvain Lebresne
> So, I suspect I can expect those rows to finally go away when queried from
> cassandra-cli once GCGraceSeconds has passed then?

Yes.

--
Sylvain


Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Ben, Brian, 
   By the way, PlayOrm offers a NoSqlTypedSession that is different than
the ORM half of PlayOrm, dealing in raw stuff that does indexing (so you can
do Scalable SQL on data that has no ORM on top of it).  That is what we
use for our 1000's of CF's, as we don't know the format of any of those
tables ahead of time (in our world, users tell us the format and wire in
streams through an api we expose, AND they tell PlayOrm which columns to
index).  That layer deals with BigInteger, BigDecimal, String and I think
byte[].

So, I am going to add virtual CF's to PlayOrm in the coming week, and we
are going to feed in streams and partition the virtual CF's, which sit in a
single real CF, using PlayOrm partitioning, and then we can query into
each partition.

The only issue is really what partitions exist, and that is left to the
client to keep track of, but if your app knows all the partitions (and those
could be saved to some rows in the nosql store), then I will probably try
out storm after that.

Later,
Dean

On 10/2/12 9:09 AM, "Ben Hood" <0x6e6...@gmail.com> wrote:

>On Tue, Oct 2, 2012 at 3:37 PM, Brian O'Neill  wrote:
>> Exactly.
>
>So you're back to the deliberation between using multiple CFs
>(potentially with some known working upper bound*) or feeding your map
>reduce in some other way (as you decided to do with Storm). In my
>particular scenario I'd like to be able to do a combination of some
>batch processing on top of less frequently changing data (hence why I
>was looking at Hadoop) and some real time analytics.
>
>Cheers,
>
>Ben
>
>(*) Not sure whether this applies to an individual keyspace or an
>entire cluster.



Re: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread Hiller, Dean
Can you just use netstat, dig into the process id, and do a ps -ef |
grep  to clear up all the confusion?  Doing so you can tell which
process communicates with which process (I am assuming you are on linux….on
MAC or Windows the commands are different).
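
A minimal sketch of that, assuming Linux; the port and PID are examples:

    netstat -tnp | grep 9160    # TCP connections on 9160, with the owning PID
    ps -ef | grep 12345         # identify the process behind that PID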

Then, just paste all that in the email to this list so we can see it.

MY GUESS is someone is running a tool that is talking thrift on your
cassandra node.  Maybe another server, like a web server?

Later,
Dean

On 10/2/12 9:50 AM, "Nick Bailey"  wrote:

>The comments here so far are correct. Cassandra itself will never open
>a thrift connection. Thrift is only for clients. Not sure what exactly
>you are seeing but I don't think it's cassandra.
>
>On Tue, Oct 2, 2012 at 10:48 AM, Niteesh kumar 
>wrote:
>> not only a node make connection to other nodes. i can also see nodes
>>making
>> connection to itself on port 9160.
>>
>>
>>
>> On Tuesday 02 October 2012 07:42 PM, Viktor Jevdokimov wrote:
>>>
>>> not a thrift por
>>
>>



Re: 1000's of column families

2012-10-02 Thread Jeremy Hanna
Another option that may or may not work for you is the support in Cassandra
1.1+ for using a secondary index as an input to your mapreduce job.  What you
might do is add a field to the column family that represents which virtual
column family it is part of.  Then, when doing mapreduce jobs, you could
use that field as the secondary index limiter.  Secondary index mapreduce is
not as efficient, since you first get all of the keys and then do multigets to
get the data that you need for the mapreduce job.  However, it's another option
for not scanning the whole column family.
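
A minimal sketch of the job setup this implies, going from memory of the
Cassandra 1.1 Hadoop API; the keyspace, CF, and the "vcf" marker column are
hypothetical, and the setInputRange overload taking index expressions should
be double-checked against your version:

    import java.util.Arrays;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.IndexExpression;
    import org.apache.cassandra.thrift.IndexOperator;
    import org.apache.cassandra.utils.ByteBufferUtil;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class VcfJobSetup {
        // Feed the MR job only rows whose indexed "vcf" column is "tenant_a".
        static void configure(Job job) {
            Configuration conf = job.getConfiguration();
            ConfigHelper.setInputColumnFamily(conf, "MyKeyspace", "SharedCF");
            ConfigHelper.setInputRange(conf, Arrays.asList(
                    new IndexExpression(ByteBufferUtil.bytes("vcf"),
                                        IndexOperator.EQ,
                                        ByteBufferUtil.bytes("tenant_a"))));
        }
    }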

On Oct 2, 2012, at 10:09 AM, Ben Hood <0x6e6...@gmail.com> wrote:

> On Tue, Oct 2, 2012 at 3:37 PM, Brian O'Neill  wrote:
>> Exactly.
> 
> So you're back to the deliberation between using multiple CFs
> (potentially with some known working upper bound*) or feeding your map
> reduce in some other way (as you decided to do with Storm). In my
> particular scenario I'd like to be able to do a combination of some
> batch processing on top of less frequently changing data (hence why I
> was looking at Hadoop) and some real time analytics.
> 
> Cheers,
> 
> Ben
> 
> (*) Not sure whether this applies to an individual keyspace or an
> entire cluster.



Re: 1000's of column families

2012-10-02 Thread Ben Hood
Jeremy, 


On Tuesday, October 2, 2012 at 17:06, Jeremy Hanna wrote:

> Another option that may or may not work for you is the support in Cassandra 
> 1.1+ to use a secondary index as an input to your mapreduce job. What you 
> might do is add a field to the column family that represents which virtual 
> column family that it is part of. Then when doing mapreduce jobs, you could 
> use that field as the secondary index limiter. Secondary index mapreduce is 
> not as efficient since you first get all of the keys and then do multigets to 
> get the data that you need for the mapreduce job. However, it's another 
> option for not scanning the whole column family.
> 


Interesting. This is probably a stupid question, but why shouldn't you be able
to use the secondary index to go straight to the slices that belong to the
attribute you are searching by? Is this something to do with the way Cassandra
is exposed as an InputFormat for Hadoop, or is this a general property of
searching by secondary index?

Ben 



Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Because the data for an index is not all together (i.e., you need a multiget
to get the data).  It is not contiguous.

With the prefix in a partition, they keep the data together, so all the data
for a prefix, from what I understand, is contiguous.

QUESTION: What I don't get in the comment is, I assume you are referring to
CQL, in which case we would need to specify the partition (in addition to the
index), which means all that data is on one node, correct?  Or did I miss
something there?

Thanks,
Dean

From: Ben Hood <0x6e6...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, October 2, 2012 11:18 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: 1000's of column families

Jeremy,

On Tuesday, October 2, 2012 at 17:06, Jeremy Hanna wrote:

Another option that may or may not work for you is the support in Cassandra 
1.1+ to use a secondary index as an input to your mapreduce job. What you might 
do is add a field to the column family that represents which virtual column 
family that it is part of. Then when doing mapreduce jobs, you could use that 
field as the secondary index limiter. Secondary index mapreduce is not as 
efficient since you first get all of the keys and then do multigets to get the 
data that you need for the mapreduce job. However, it's another option for not 
scanning the whole column family.

Interesting. This is probably a stupid question but why shouldn't you be able 
to use the secondary index to go straight to the slices that belong to the 
attribute you are searching by? Is this something to do with the way Cassandra 
is exposed as an InputFormat for Hadoop or is this a general property for 
searching by secondary index?

Ben



Re: 1000's of column families

2012-10-02 Thread Ben Hood
Dean, 


On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote:

> Because the data for an index is not all together(ie. Need a multi get to get 
> the data). It is not contiguous.
> 
> The prefix in a partition they keep the data so all data for a prefix from 
> what I understand is contiguous.
> 

So you're saying that you can access the primary index with a key range, but
to access the secondary index, you first need to get all keys and follow up
with a multiget, which would use the secondary index to speed up the lookup of
the matching rows?

 
> 
> QUESTION: What I don't get in the comment is I assume you are referring to 
> CQL in which case we would need to specify the partition (in addition to the 
> index)which means all that data is on one node, correct? Or did I miss 
> something there.
> 
> 


Maybe my question was just silly - I wasn't referring to CQL.

As for the locality of the data, I was hoping to be able to fire off an MR job
to process all matching rows in the CF - I was assuming that this job would
get executed on the same node as the data.

But I think the real confusion in my question has to do with the way
ColumnFamilyInputFormat has been implemented, since it would appear that it
ingests the entire (non-OPP) CF into Hadoop, such that the predicate needs to
be applied in the job rather than up front in the Cassandra query.

Cheers,

Ben




Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
So you're saying that you can access the primary index with a key range, but to 
access the secondary index, you first need to get all keys and follow up with a 
multiget, which would use the secondary index to speed the lookup of the 
matching rows?

Yes, that is how I "believe" it works.  I am by no means an expert.

I also wanted to fire off an MR to process matching rows in the "virtual" CF,
ideally running on the nodes where it reads the data in.  In 0.7, I thought
the M/R jobs did not run locally with the data like hadoop does???  Anyone
know if that is still true, or does it run locally to the data now?

Thanks,
Dean

From: Ben Hood <0x6e6...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, October 2, 2012 1:01 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: 1000's of column families

Dean,

On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote:

Because the data for an index is not all together(ie. Need a multi get to get 
the data). It is not contiguous.

The prefix in a partition they keep the data so all data for a prefix from what 
I understand is contiguous.





QUESTION: What I don't get in the comment is I assume you are referring to CQL 
in which case we would need to specify the partition (in addition to the 
index)which means all that data is on one node, correct? Or did I miss 
something there.

Maybe my question was just silly - I wasn't referring to CQL.

As for the locality of the data, I was hoping to be able to fire off an MR job 
to process all matching rows in the CF - I was assuming that that this job 
would get executed on the same node as the data.

But I think the real confusion in my question has to do with the way the 
ColumnFamilyInputFormat has been implemented, since it would appear that it 
ingests the entire (non-OPP) CF into Hadoop, such that the predicate needs to 
be applied in the job rather than up front in the Cassandra query.

Cheers,

Ben



Re: 1000's of column families

2012-10-02 Thread Jeremy Hanna
It's always had data locality (since Hadoop support was added in 0.6).

You don't need to specify a partition; you specify the input predicate with
ConfigHelper or the cassandra.input.predicate property.

On Oct 2, 2012, at 2:26 PM, "Hiller, Dean"  wrote:

> So you're saying that you can access the primary index with a key range, but 
> to access the secondary index, you first need to get all keys and follow up 
> with a multiget, which would use the secondary index to speed the lookup of 
> the matching rows?
> 
> Yes, that is how I "believe" it works.  I am by no means an expert.
> 
> I also wanted to fire off a MR to process matching rows in the "virtual" CF 
> ideally running on the nodes where it reads data in.  In 0.7, I thought the 
> M/R jobs did not run locally with the data like hadoop does???  Anyone know 
> if that is still true or does it run locally to the data now?
> 
> Thanks,
> Dean
> 
> From: Ben Hood <0x6e6...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Tuesday, October 2, 2012 1:01 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: 1000's of column families
> 
> Dean,
> 
> On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote:
> 
> Because the data for an index is not all together(ie. Need a multi get to get 
> the data). It is not contiguous.
> 
> The prefix in a partition they keep the data so all data for a prefix from 
> what I understand is contiguous.
> 
> 
> 
> 
> 
> QUESTION: What I don't get in the comment is I assume you are referring to 
> CQL in which case we would need to specify the partition (in addition to the 
> index)which means all that data is on one node, correct? Or did I miss 
> something there.
> 
> Maybe my question was just silly - I wasn't referring to CQL.
> 
> As for the locality of the data, I was hoping to be able to fire off an MR 
> job to process all matching rows in the CF - I was assuming that that this 
> job would get executed on the same node as the data.
> 
> But I think the real confusion in my question has to do with the way the 
> ColumnFamilyInputFormat has been implemented, since it would appear that it 
> ingests the entire (non-OPP) CF into Hadoop, such that the predicate needs to 
> be applied in the job rather than up front in the Cassandra query.
> 
> Cheers,
> 
> Ben
> 



Re: Cassandra vs Couchbase benchmarks

2012-10-02 Thread aaron morton
A few notes:

* +1 for missing RF and CL cassandra stats.
* Using striped EBS for m1.xlarge is a bad choice, unless they are using
provisioned IOPS. Which they do not say.
* Cassandra JVM settings are *not* standard. It's a low new heap size and a
larger than default heap size.
* "memtable size", by which I assume they mean memtable_total_space_in_mb,
should default to 1/3 of the heap. They have doubled it.
* I would expect the above non-standard memory settings to result in increased
GC activity and increased latency / reduced throughput.

* They presented the facts and said "you decide who is a winner" LOLS

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/10/2012, at 4:48 AM, horschi  wrote:

> Hi Andy,
> 
> things I find odd:
> 
> - Replicacount=1 for mongo and couchdb. How is that a realistic benchmark? I 
> always want at least 2 replicas for my data. Maybe thats just me.
> - On the Mongo Config slide they said they disabled journaling. Why do you 
> disable all safety mechanisms that you would want in a production 
> environment? Maybe they should have added /dev/null to their benchmark ;-)
> - I dont see the replicacount for Cassandra in the slides. Also CL is not 
> specified. Imho the important stuff is missing in the cassandra configuration.
> - In the goals section it said "more data than RAM". But they only have 12GB 
> data per node, with 15GB of RAM per node!
> 
> I am very interested in a recent cassandra-benchmark, but I find this 
> benchmark very disappointing.
> 
> cheers,
> Christian
> 
> 
> On Mon, Oct 1, 2012 at 5:05 PM, Andy Cobley  
> wrote:
> There are some interesting results in the benchmarks below:
> 
> http://www.slideshare.net/renatko/couchbase-performance-benchmarking
> 
> Without starting a flame war etc, I'm interested if these results should
> be considered "Fair and Balanced" or if the methodology is flawed in some
> way ? (for instance is the use of Amazon EC2 sensible for Cassandra
> deployment) ?
> 
> Andy
> 
> 
> 
> 
> 
>