Re: PHP Client
Would you like feedback/questions on here or are you going to be using http://groups.google.com/group/phpcassa?

On Oct 23, 2010, at 8:26 PM, Tyler Hobbs wrote:

> Hello all,
>
> I've been working for a while now on putting together a PHP client that works with Cassandra 0.7. It's at a decent state now, so I would like to start getting feedback from PHP users out there.
>
> It's available on github here: http://github.com/thobbs/phpcassa
>
> and the API documentation can be found here: http://thobbs.github.com/phpcassa/
>
> It's compatible with the current trunk (or RC, as it so happens). The client itself is based heavily on pycassa.
>
> I welcome any and all feedback, especially negative :)
>
> - Tyler
Re: PHP Client
I would prefer to use http://groups.google.com/group/phpcassa to help keep this list focused. Thanks for the question.

- Tyler

On Sun, Oct 24, 2010 at 6:25 AM, Jeremy Hanna wrote:

> Would you like feedback/questions on here or are you going to be using http://groups.google.com/group/phpcassa?
Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data
Hello, Jonathan,

Thank you for your kind reply. Could you give me some more opinions/comments?

From: "Jonathan Ellis"
> (b) Cassandra generates input splits from the sampling of keys each node has in memory. So if a node does end up with no data for a keyspace (because of bad OOP balancing for instance) it will have no splits generated or mapped.

I understand you are referring to StorageService.getSplits(). This seems to filter out the Cassandra nodes which have no data for the target (keyspace, column family) pair.

[Q1] My understanding is that ColumnFamilyInputFormat requests the above node (or split) filtering from all nodes in the cluster. Is this correct?

[Q2] If Q1 is yes, more nodes result in a higher MapReduce job startup cost (for executing InputFormat.getSplits()). Do you have any performance numbers for this startup cost (time)? I'd like to know how high it is when the cluster consists of hundreds of nodes.

[Q3] Going back to my first mail, I'm wondering whether the present Cassandra is suitable for analyzing petabytes of data.

[Q3-1] How much data is the 400-node cluster Riptano is planning aimed at? If each node has 4 TB of disk and the replication factor is 3, a simple calculation gives 4 TB * 400 / 3 = 533 TB (ignoring the commit log, OS areas, etc.).

[Q3-2] Based on the current architecture, how many nodes is the practical limit, and roughly how much data?

Regards,
Takayuki Tsunakawa
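For context, the split filtering being asked about happens when a Hadoop job configured with ColumnFamilyInputFormat starts up. The sketch below shows the minimal job wiring involved, based loosely on the 0.7-era word_count example that ships with Cassandra; the keyspace, column family, column name, and contact host are placeholders, and some ConfigHelper method names were renamed in later releases, so treat this as a sketch rather than a drop-in.

import java.nio.ByteBuffer;
import java.util.Arrays;

import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SplitsWiringSketch
{
    public static Job configureJob() throws Exception
    {
        Job job = new Job(new Configuration(), "splits-wiring-sketch");
        Configuration conf = job.getConfiguration();

        // Contact point, RPC port, and partitioner the job client uses when it
        // asks the cluster to generate input splits (placeholder host).
        ConfigHelper.setInitialAddress(conf, "localhost");
        ConfigHelper.setRpcPort(conf, "9160");
        ConfigHelper.setPartitioner(conf, "org.apache.cassandra.dht.RandomPartitioner");

        // Keyspace / column family to scan (placeholder names); splits are only
        // produced for ranges that actually hold data for this pair.
        ConfigHelper.setInputColumnFamily(conf, "Keyspace1", "Standard1");

        // Columns each map task will receive for every row.
        SlicePredicate predicate = new SlicePredicate().setColumn_names(
                Arrays.asList(ByteBuffer.wrap("column1".getBytes("UTF-8"))));
        ConfigHelper.setInputSlicePredicate(conf, predicate);

        // getSplits() runs against this input format when the job starts;
        // mapper/reducer/output setup is omitted here.
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        return job;
    }
}

At job startup the input format asks the nodes owning data for the configured keyspace/column family to subdivide their local token ranges into splits, so the per-job startup cost raised in [Q2] grows with the number of ranges and nodes involved.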
Experiences with Cassandra hardware planning
All-

Over the past nine months I have been working to tune our hardware configuration to optimally balance CPU/RAM/disk/IOPS/network per node for our Cassandra workload. Thanks much to those here who have provided helpful advice.

I wanted to share back to the community some of the learnings we have come across, including the hardware configuration we have been successful with (YMMV). This is still a work in progress, naturally.

I have written up a detailed blog post about this here:
http://www.bitplumber.net/2010/10/a-cassandra-hardware-stack-dell-c1100s-ocz-vertex-2-ssds-with-sandforce-arista-7048s/

Here are the highlights:

- Dell C1100 "cloud series" servers with 10x 2.5 inch drive bays
- OCZ Technology Vertex 2 MLC SSDs with the SandForce 1200 series controllers
- Arista Networks 7048 1U top-of-rack switches running MLAG with LACP to the hosts

Let me know if you have any questions!

-Eric
RE: Experiences with Cassandra hardware planning
Eric,

Thanks for the detailed post! Did you need to start your JVMs with numactl in order to take advantage of NUMA? I know the board, OS, and JVM must be configured properly, but it's not clear whether the JVMs must be started with numactl.

Thanks,
David
Re: Experiences with Cassandra hardware planning
We have not started our JVMs with numactl. I am not sure what (if any) benefit there has been to turning on NUMA in the BIOS; turning it on could in fact have reduced performance.

I suspect that Java is only using memory from one of the processors (since less than half of the physical memory is assigned to the JVM) and the other processor's memory is being used for filesystem cache. Clearly there is probably some room for improvement here. We have not invested much time in this as of yet.

If anyone else has knowledge in this area, please chime in!

-Eric

On Sun, Oct 24, 2010 at 10:36 PM, David Dabbs wrote:

> Eric,
>
> Thanks for the detailed post! Did you need to start your JVMs with numactl in order to take advantage of NUMA? I know the board, OS and JVM must be configured properly, but it's not clear if the JVMs must be started with numactl.
>
> Thanks,
> David
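On the question of whether the heap ends up entirely on one socket: a simple read-only check on Linux is to dump the per-NUMA-node memory counters exposed through sysfs. The sketch below is only a diagnostic aid, not something from this thread, and it assumes the standard /sys/devices/system/node layout; a heavily skewed MemUsed across nodes would support the suspicion above.

import java.io.File;
import java.nio.file.Files;
import java.util.List;

public class NumaMemCheck
{
    public static void main(String[] args) throws Exception
    {
        // Standard Linux sysfs location for per-NUMA-node topology and memory counters.
        File nodeDir = new File("/sys/devices/system/node");
        File[] nodes = nodeDir.listFiles((dir, name) -> name.matches("node\\d+"));
        if (nodes == null || nodes.length == 0)
        {
            System.out.println("No NUMA node entries found (non-Linux, or NUMA not exposed).");
            return;
        }
        for (File node : nodes)
        {
            // Each meminfo file lists MemTotal/MemFree/MemUsed etc. for that node;
            // heavily uneven MemUsed suggests allocations are concentrated on one socket.
            System.out.println("=== " + node.getName() + " ===");
            List<String> lines = Files.readAllLines(new File(node, "meminfo").toPath());
            for (String line : lines)
                System.out.println(line);
        }
    }
}

If the imbalance turns out to matter, launching the JVM under numactl --interleave=all is one commonly used way to spread allocations across nodes, though whether that helps this workload would need to be measured.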