Re: PHP Client

2010-10-24 Thread Jeremy Hanna
Would you like feedback/questions on here or are you going to be using 
http://groups.google.com/group/phpcassa?

On Oct 23, 2010, at 8:26 PM, Tyler Hobbs wrote:

> Hello all,
> 
> I've been working for a while now on putting together a PHP client that works 
> with Cassandra 0.7.  It's in a decent state now, so I would like to start 
> getting feedback from PHP users out there.
> 
> It's available on github here: http://github.com/thobbs/phpcassa
> 
> and the API documentation can be found here: 
> http://thobbs.github.com/phpcassa/
> 
> It's compatible with the current trunk (or RC, as it so happens).  The client 
> itself is based heavily on pycassa.
> 
> I welcome any and all feedback, especially negative :)
> 
> - Tyler



Re: PHP Client

2010-10-24 Thread Tyler Hobbs
I would prefer to use http://groups.google.com/group/phpcassa to help keep
this list focused.

Thanks for the question.
- Tyler

On Sun, Oct 24, 2010 at 6:25 AM, Jeremy Hanna wrote:

> Would you like feedback/questions on here or are you going to be using
> http://groups.google.com/group/phpcassa?


Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-24 Thread Takayuki Tsunakawa
Hello, Jonathan,

Thank you for your kind reply. Could you give me some more
opinions/comments?


From: "Jonathan Ellis" 
> (b) Cassandra generates input splits from the sampling of keys each
> node has in memory.  So if a node does end up with no data for a
> keyspace (because of bad OOP balancing for instance) it will have no
> splits generated or mapped.

I understand you are referring to StorageService.getSplits(), which
appears to filter out the Cassandra nodes that have no data for the
target (keyspace, column family) pair.

[Q1]
My understanding is that ColumnFamilyInputFormat sends this
split-generation request to every node in the cluster. Is this correct?

[Q2]
If the answer to Q1 is yes, then adding nodes raises the MapReduce job
startup cost (for executing InputFormat.getSplits()). Do you have any
performance numbers for this startup time? I'd like to know how long it
takes when the cluster consists of hundreds of nodes.
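As a side note for readers following the thread, the behavior Jonathan
describes (a node with no sampled keys for the target column family
contributes no input splits, so no map tasks run against it) can be
sketched in Python. This is a toy illustration, not Cassandra's actual
getSplits() code; the node names, key samples, and split size are all
made up:

```python
def get_splits(node_key_samples, keys_per_split):
    """Toy model of input-split generation: each node reports the keys
    it has sampled for the target column family; a node whose sample is
    empty is filtered out and yields no splits."""
    splits = []
    for node, keys in sorted(node_key_samples.items()):
        if not keys:  # no data for this column family on this node
            continue
        for i in range(0, len(keys), keys_per_split):
            chunk = keys[i:i + keys_per_split]
            splits.append((node, chunk[0], chunk[-1]))  # (node, start key, end key)
    return splits

# A node with an empty sample (e.g. from bad OPP balancing) is skipped:
samples = {"node1": ["a", "b", "c", "d"], "node2": []}
print(get_splits(samples, 2))  # → [('node1', 'a', 'b'), ('node1', 'c', 'd')]
```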


[Q3]
Going back to my first mail, I'm wondering whether the current version
of Cassandra is suitable for analyzing petabytes of data.
[Q3-1]
What data volume is the 400-node cluster that Riptano is planning aimed
at? If each node has 4 TB of disk and the replication factor is 3, a
simple calculation gives 4 TB * 400 / 3 = 533 TB of unique data
(ignoring the commit log, OS overhead, etc.).
[Q3-2]
With the current architecture, what is the practical limit on the
number of nodes, and roughly how much data can a cluster hold?
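For what it's worth, the back-of-the-envelope arithmetic in Q3-1 can be
written out explicitly (the per-node disk size and node count are the
assumed figures from the question, not confirmed numbers for the
planned cluster):

```python
disk_tb_per_node = 4       # assumed usable disk per node, in TB
nodes = 400                # planned cluster size
replication_factor = 3     # each row stored on 3 nodes

raw_capacity_tb = disk_tb_per_node * nodes             # 1600 TB of raw disk
unique_data_tb = raw_capacity_tb / replication_factor  # unique (unreplicated) data
print(round(unique_data_tb))  # → 533
```

This ignores the commit log, OS overhead, and the free headroom
compaction needs, so the practical figure would be lower.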


Regards,
Takayuki Tsunakawa


remove

2010-10-24 Thread Lance Li
remove



Experiences with Cassandra hardware planning

2010-10-24 Thread Eric Rosenberry
All-

Over the past nine months I have been working to tune our hardware
configuration to optimally balance CPU/RAM/disk/iops/network per node for
our Cassandra workload.  Thanks much to those here who have provided helpful
advice.

I wanted to share back to the community some of the lessons we have
learned, including the hardware configuration we have been successful
with (YMMV).  This is naturally still a work in progress.

I have written up a detailed blog post about this here:
http://www.bitplumber.net/2010/10/a-cassandra-hardware-stack-dell-c1100s-ocz-vertex-2-ssds-with-sandforce-arista-7048s/

Here are the highlights:

   - Dell C1100 "cloud series" servers with 10x 2.5 inch drive bays
   - OCZ Technology Vertex 2 MLC SSDs with the Sandforce 1200 series
   controllers
   - Arista Networks 7048 1U Top of Rack switches running MLAG with LACP to
   the hosts

Let me know if you have any questions!

-Eric


remove

2010-10-24 Thread Marie-Anne
remove



RE: Experiences with Cassandra hardware planning

2010-10-24 Thread David Dabbs
Eric,

Thanks for the detailed post! Did you need to start your JVMs with numactl
in order to take advantage of NUMA?
I know the board, OS, and JVM must be configured properly, but it's not
clear whether the JVMs themselves must be started with numactl.



Thanks,

David






Re: Experiences with Cassandra hardware planning

2010-10-24 Thread Eric Rosenberry
We have not started our JVMs with numactl.

I am not sure what benefit, if any, there has been to turning on NUMA in
the BIOS; turning it on could in fact have reduced performance.  I suspect
that Java is only using memory attached to one of the processors (since
less than half of the physical memory is assigned to the JVM) and that the
other processor's memory is being used for file system cache.

There is probably some room for improvement here.  We have not invested
much time in this as of yet.

If anyone else has knowledge in this area please chime in!

-Eric

On Sun, Oct 24, 2010 at 10:36 PM, David Dabbs  wrote:

> Eric,
>
> Thanks for the detailed post! Did you need to start your JVMs with numactl
> in order to take advantage of NUMA?
> I know the board, OS and JVM must  be configured properly, but it's not
> clear if the JVMs must be started with numactl.