Re: moving data from single node cassandra

2011-03-17 Thread Komal Goyal
Thanks Maki :) I copied the existing var folder to the new hard disk and changed the path to the data directories in storage-config.xml. I was able to connect to Cassandra and read the data that was moved to the new location. On Fri, Mar 18, 2011 at 6:33 AM, Maki Watanabe

Re: moving data from single node cassandra

2011-03-17 Thread John Lewis
data_file_directories makes it seem as though Cassandra can use more than one location for sstable storage. Does anyone know how it splits up the data between partitions? I am trying to plan for just about every worst-case scenario I can right now, and I want to know if I can change the config

Re: moving data from single node cassandra

2011-03-17 Thread Maki Watanabe
Refer to: http://wiki.apache.org/cassandra/StorageConfiguration You can specify the data directories with the following parameters in storage-config.xml (or cassandra.yaml in 0.7+). commitlog_directory : where the commit log will be written data_file_directories : data files saved_caches_directory : saved

Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread mcasandra
Also when it comes to RAID controller there are other options like write policy, read policy, cache io/direct io. Is there any preference on which policies should be chosen? In our case: http://support.dell.com/support/edocs/software/svradmin/1.9/en/stormgmt/cntrls.html

Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Where and how do I choose it?

Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Thanks Peter, I can see it better now.

Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread Peter Schuller
> The reason for this is that you want to be able to saturate your > storage subsystem, and that means keeping all spindles working at all > times and efficiently. This is accomplished by ensuring you are able > to sustain a sufficient queue depth (number of outstanding commands) > on each device.

Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread Peter Schuller
> Thanks to all for replying, but frankly I didn't get the answer I wanted. > Does the "number of disks" apply to number of spindles in RAID0? Or > something else like a separate disk for commitlog and for data? The number of actual disks (spindles) in the device on which your sstables are on (not

Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Thanks to all for replying, but frankly I didn't get the answer I wanted. Does the "number of disks" apply to number of spindles in RAID0? Or something else like a separate disk for commitlog and for data?

Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread Peter Schuller
> The comment in the example config file next to that setting explains it more > fully, but something like 16 * number of drives is a reasonable setting for > readers. Writers should be a multiple of the number of cores. In addition, if you're running on Linux in a situation where you're trying to

Re: Replacing a dead seed

2011-03-17 Thread Jonathan Colby
Of course! why didn't i think of that? Thanks!! On Mar 17, 2011, at 3:11 PM, Edward Capriolo wrote: > On Thu, Mar 17, 2011 at 9:09 AM, Jonathan Colby > wrote: >> Hi - >> >> If a seed crashes (i.e., suddenly unavailable due to HW problem), what is >> the best way to replace the seed in the c

Re: Upgrade to a different version?

2011-03-17 Thread Dan Kuebrich
Do people have success stories with 0.7.4? It seems like the list only hears if there's a major problem with a release, which means that if you're trying to judge the stability of a release you're looking for silence. But maybe that means not many people have tried it yet. Is there a record of t

Re: Pauses of GC

2011-03-17 Thread ruslan usifov
At these moments Java hangs. Only one thread is working, and it runs mostly in the OS kernel, with the following trace: [pid 1953] 0.050157 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1) = 0 <0.22> [pid 1953] 0.59 futex(0x7fbc24023794, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {13002023

Re: Upgrade to a different version?

2011-03-17 Thread Thibaut Britz
Hi Paul, It's more of a scientific mining app. We crawl websites and extract information from these websites for our clients. For us, it doesn't really matter if one cassandra node replies after 1 second or a few ms, as long as the throughput over time stays high. And so far, this seems to be the

Re: Upgrade to a different version?

2011-03-17 Thread Thibaut Britz
As for the version, we will wait a few more days, and if nothing really bad shows up, move to 0.7.4. On Thu, Mar 17, 2011 at 10:40 PM, Thibaut Britz < thibaut.br...@trendiction.com> wrote: > Hi Paul, > > It's more of a scientific mining app. We crawl websites and extract > information from thes

Re: AW: problems while TimeUUIDType-index-querying with two expressions

2011-03-17 Thread Aaron Morton
Good work. Aaron On 17/03/2011, at 4:37 PM, Jonathan Ellis wrote: > Thanks for tracking that down, Roland. I've created > https://issues.apache.org/jira/browse/CASSANDRA-2347 to fix this. > > On Wed, Mar 16, 2011 at 10:37 AM, Roland Gude > wrote: >> I have applied the suggested changes in m

Re: super_column.name?

2011-03-17 Thread Michael Fortin
Thanks for the response, sorry if my initial question wasn't clear. When using Thrift, I call client.get_slice(keyBytes, columnParent, range, level) and I get a list of ColumnOrSuperColumns back. When I iterate over them and call: byte[] nameBytes = columnOrSuperColumn.getSuper_column().getNa

Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread Stu Hood
The comment in the example config file next to that setting explains it more fully, but something like 16 * number of drives is a reasonable setting for readers. Writers should be a multiple of the number of cores. On Thu, Mar 17, 2011 at 1:09 PM, buddhasystem wrote: > Hello, in the instructions
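The rule of thumb in this message can be sketched in a few lines of Python. This is only an illustration, not anything taken from the Cassandra source: the function name `suggest_concurrency` is hypothetical, and the `writes_per_core` default of 8 is an assumption, since the message only says writers should be "a multiple of the number of cores". Per the replies elsewhere in the thread, `data_drives` is meant as the number of physical spindles in the device holding the sstables.

```python
import os


def suggest_concurrency(data_drives, cores=None, reads_per_drive=16, writes_per_core=8):
    """Return rule-of-thumb values for concurrent_reads and concurrent_writes.

    data_drives     -- physical spindles behind the sstable volume (e.g. RAID0 members)
    cores           -- CPU cores; detected with os.cpu_count() when omitted
    reads_per_drive -- 16, as suggested in the thread
    writes_per_core -- assumed multiplier; the thread does not give a number
    """
    if data_drives < 1:
        raise ValueError("data_drives must be at least 1")
    if cores is None:
        cores = os.cpu_count() or 1
    return {
        "concurrent_reads": reads_per_drive * data_drives,
        "concurrent_writes": writes_per_core * cores,
    }


if __name__ == "__main__":
    # Example: a 4-disk RAID0 data volume on an 8-core machine.
    print(suggest_concurrency(data_drives=4, cores=8))
```

The results are starting points for the settings in the config file; the thread itself suggests checking them against the comment next to the setting in the example config.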

Re: Pauses of GC

2011-03-17 Thread Narendra Sharma
Depending on your memtable thresholds the heap may be too small for the deployment. At the same time I don't see any other log statements around that long pause that you have shown in the log snippet. It looks a little odd to me. All the ParNew collections collected almost the same amount of heap and did not take lo

Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Hello, in the instructions, I need to link "concurrent_reads" to number of drives. Is this related to number of physical drives that I have in my RAID0, or something else?

Re: Cassandra 0.6.3 ring not balance in terms of data size

2011-03-17 Thread Ching-Cheng Chen
From OrderPreservingPartitioner.java: public StringToken getToken(ByteBuffer key) { String skey; try { skey = ByteBufferUtil.string(key, Charsets.UTF_8); } catch (CharacterCodingException e) { throw new RuntimeException(

Re: hadoop streaming input

2011-03-17 Thread Jeremy Hanna
Cool - let me know if you have any questions if you do. I'm @jeromatron in irc and on twitter. On Mar 17, 2011, at 1:10 PM, Ethan Rowe wrote: > Thanks, Jeremy. I looked over the work that was done and it seemed like it > was mostly there, though some comments in the ticket indicated possible

Re: hadoop streaming input

2011-03-17 Thread Ethan Rowe
Thanks, Jeremy. I looked over the work that was done and it seemed like it was mostly there, though some comments in the ticket indicated possible problems. I may well need to take a crack at this sometime in the next few weeks, but if somebody beats me to it, I certainly won't complain. On Thu,

Re: hadoop streaming input

2011-03-17 Thread Jeremy Hanna
I started it and added the tentative patch at the end of October. It needs to be rebased with the current 0.7-branch and completed - it's mostly there. I just tried to abstract some things in the process. I have changed jobs since then and I just haven't had time with the things I've been doi

Re: Cassandra 0.6.3 ring not balance in terms of data size

2011-03-17 Thread Ali Ahsan
Can anyone please comment on this? On 03/17/2011 07:02 PM, Ali Ahsan wrote: Dear Aaron, We are a little confused about the OPP token. How do we calculate an OPP token? A few of our column families have a UUID as key and others have an integer as key.

Cassandra 0.7.* replication question

2011-03-17 Thread Oleg Tsvinev
I wonder what is the right way to configure replication in a Cassandra cluster. I need to have 3 copies of my data in a cluster consisting of 6 nodes. 3 of these nodes are in one datacenter - let's call it DC1 - and 3 in another, DC2. There is a significant latency between these datacenters and orig

hadoop streaming input

2011-03-17 Thread Ethan Rowe
Hello. What's the current thinking on input support for Hadoop streaming? It seems like the relevant Jira issue has been quiet for some time: https://issues.apache.org/jira/browse/CASSANDRA-1497 Thanks. - Ethan

Re: Upgrade to a different version?

2011-03-17 Thread Paul Pak
On 3/17/2011 1:06 PM, Thibaut Britz wrote: > If it helps you to sleep better, > > we use cassandra (0.7.2 with the flush fix) in production on > 100 > servers. > > Thibaut > Thanks Thibaut, believe it or not, it does. :) Is your use case a typical web app or something like a scientific/data mini

Re: nodetool repair on cluster

2011-03-17 Thread Huy Le
Thanks Jonathan, Aaron, Daniel! I have a related question. I would like to get a copy of the data from this 12-server cluster, which has manually assigned balanced server tokens, and set it up on a new cluster. I would like to minimize the number of servers on the new cluster without having to build

Re: Upgrade to a different version?

2011-03-17 Thread Thibaut Britz
If it helps you to sleep better, we use cassandra (0.7.2 with the flush fix) in production on > 100 servers. Thibaut On Thu, Mar 17, 2011 at 5:58 PM, Paul Pak wrote: > I'm at a crossroads right now. We built an application around .7 and > the features in .7, so going back to .6 wasn't an opt

Re: [RELEASE] 0.7.4

2011-03-17 Thread Jonathan Ellis
It is still there, but we took it out of the sample config because people think it affects normal writes which it does not. On Thu, Mar 17, 2011 at 11:48 AM, A J wrote: > I don't see binary_memtable_throughput_in_mb parameter in > cassandra.yaml anymore. > What is it replaced by ? > > thanks. > >

Re: Upgrade to a different version?

2011-03-17 Thread Paul Pak
I'm at a crossroads right now. We built an application around .7 and the features in .7, so going back to .6 wasn't an option for us. Now, we are in the middle of setting up dual mysql and cassandra support so that we can "fallback" to mysql if Cassandra can't handle the workload properly. It's

Re: InvalidRequestException: Mutation must have one ColumnOrSuperColumn or one Deletion

2011-03-17 Thread Anurag Gujral
Yes thanks I was able to see that . Now I am getting the following error OutboundTcpConnection.java (line 159) attempting to connect to astrix.com where astrix.com is the machine on which I have installed cassandra Any suggestions. Thanks Anurag On Thu, Mar 17, 2011 at 9:49 AM, Jonathan Ellis

Re: InvalidRequestException: Mutation must have one ColumnOrSuperColumn or one Deletion

2011-03-17 Thread Jonathan Ellis
Internal error means "there is a stacktrace in the server system.log" and in this case probably also means "you sent some kind of invalid request that our validation didn't catch." On Thu, Mar 17, 2011 at 11:29 AM, Anurag Gujral wrote: > Thanks for the reply. I added mutation.__isset.column_or_su

Re: [RELEASE] 0.7.4

2011-03-17 Thread A J
I don't see binary_memtable_throughput_in_mb parameter in cassandra.yaml anymore. What is it replaced by ? thanks. On Tue, Mar 15, 2011 at 11:32 PM, Eric Evans wrote: > On Tue, 2011-03-15 at 22:19 -0500, Eric Evans wrote: >> On Tue, 2011-03-15 at 14:26 -0700, Mark wrote: >> > Still not seeing 0.

Re: Pauses of GC

2011-03-17 Thread ruslan usifov
2011/3/17 Narendra Sharma > What heap size are you running with? and Which version of Cassandra? > > 4G with cassandra 0.7.4

Re: Pauses of GC

2011-03-17 Thread Narendra Sharma
What heap size are you running with? and Which version of Cassandra? Thanks, Naren On Thu, Mar 17, 2011 at 3:45 AM, ruslan usifov wrote: > Hello > > Some times i have very long GC pauses: > > > Total time for which application threads were stopped: 0.0303150 seconds > 2011-03-17T13:19:56.476+030

Re: InvalidRequestException: Mutation must have one ColumnOrSuperColumn or one Deletion

2011-03-17 Thread Anurag Gujral
Thanks for the reply. I added mutation.__isset.column_or_supercolumn=true; Now I am getting TApplicationException: Internal error processing batch_mutate Any suggestions? Thanks Anurag On Thu, Mar 17, 2011 at 8:13 AM, Anurag Gujral wrote: > Hi All, > I am using function batch_mutate o

Re: InvalidRequestException: Mutation must have one ColumnOrSuperColumn or one Deletion

2011-03-17 Thread Tyler Hobbs
You need to set the __isset on the Mutation object as well. On Thu, Mar 17, 2011 at 10:13 AM, Anurag Gujral wrote: > Hi All, > I am using function batch_mutate of cassandra 0.7 and I am > getting the error InvalidRequestException: Mutation must have one > ColumnOrSuperColumn or one Dele

InvalidRequestException: Mutation must have one ColumnOrSuperColumn or one Deletion

2011-03-17 Thread Anurag Gujral
Hi All, I am using function batch_mutate of cassandra 0.7 and I am getting the error InvalidRequestException: Mutation must have one ColumnOrSuperColumn or one Deletion. I have my own C++ cassandra client using thrift 0.0.5 api. Any Suggestions. Sample Code map > cfmap; vector

moving data from single node cassandra

2011-03-17 Thread Komal Goyal
Hi, I have a single-node Cassandra setup on a Windows machine. I soon ran out of space on this machine, so I have increased its hard disk capacity. Now I want to know how to configure Cassandra to start storing data on these larger partitions. Also how the existing data

RE: hadoop cassandra

2011-03-17 Thread Sagar Kohli
Thanks Jeremy, it's a good pointer to start with. Regards, Sagar From: Jeremy Hanna [jeremy.hanna1...@gmail.com] Sent: Thursday, March 17, 2011 7:34 PM To: user@cassandra.apache.org Subject: Re: hadoop cassandra You can start with a word count example that's on

Re: Replacing a dead seed

2011-03-17 Thread Edward Capriolo
On Thu, Mar 17, 2011 at 9:09 AM, Jonathan Colby wrote: > Hi - > > If a seed crashes (i.e., suddenly unavailable due to HW problem),   what is > the best way to replace the seed in the cluster? > > I've read that you should not bootstrap a seed.  Therefore I came up with > this procedure, but it

Re: hadoop cassandra

2011-03-17 Thread Jeremy Hanna
You can start with a word count example that's only for hdfs. Then you can replace the reducer in that with the ReducerToCassandra that's in the cassandra word_count example. You need to match up your Mapper's output to the Reducer's input and set a couple of configuration variables to tell it

Re: Cassandra 0.6.3 ring not balance in terms of data size

2011-03-17 Thread Ali Ahsan
Dear Aaron, We are a little confused about the OPP token. How do we calculate an OPP token? A few of our column families have a UUID as key and others have an integer as key. On 03/17/2011 04:22 PM, Ali Ahsan wrote: Below is the output of nodetool ring Address Status Load Range

Re: super_column.name?

2011-03-17 Thread Sylvain Lebresne
Are you sure you don't have a problem with handling ByteBuffers ? What do you mean by 'deserialized string' ? -- Sylvain On Thu, Mar 17, 2011 at 4:20 AM, Michael Fortin wrote: > Hi, > > I've been working on a scala based api for cassandra.  I've built it directly > on top of thrift.  I'm having

Re: insert during forced compaction

2011-03-17 Thread Jonathan Ellis
We're aware of the potential for races during schema change but it looks like we missed this one. Can you create a ticket? On Wed, Mar 16, 2011 at 11:55 PM, Jeffrey Wang wrote: > Hey all, > > > > I’m running 0.7.0 on a cluster of 5 machines. When I create a new column > family after I run nodeto

Re: getting exception when cassandra 0.7.3 is starting

2011-03-17 Thread Jonathan Ellis
Remove the cache file or upgrade to 0.7.4 On Thu, Mar 17, 2011 at 1:15 AM, Anurag Gujral wrote: > I am getting exception when starting cassandra 0.7.3 > > ERROR 01:10:48,321 Exception encountered during startup. > java.lang.NegativeArraySizeException >     at > org.apache.cassandra.db.ColumnFamil

Re: super_column.name?

2011-03-17 Thread Jonathan Ellis
I see super-column-0 in there. Not sure what the question is. On Wed, Mar 16, 2011 at 10:20 PM, Michael Fortin wrote: > Hi, > > I've been working on a scala based api for cassandra.  I've built it directly > on top of thrift.  I'm having a problem getting a slice of a superColumn.   > When I ge

Replacing a dead seed

2011-03-17 Thread Jonathan Colby
Hi - If a seed crashes (i.e., suddenly unavailable due to HW problem), what is the best way to replace the seed in the cluster? I've read that you should not bootstrap a seed. Therefore I came up with this procedure, but it seems pretty complicated. any better ideas? 1. update the seed l

Re: Cassandra 0.6.3 ring not balance in terms of data size

2011-03-17 Thread Ali Ahsan
Below is the output of nodetool ring Address Status Load Range Ring TuL8jLqs7uxLipP6 192.168.100.3 Up 89.91 GB JDtVOU0YVQ6MtBYA |<--| 192.168.100.4 Up 48

Pauses of GC

2011-03-17 Thread ruslan usifov
Hello. Sometimes I have very long GC pauses: Total time for which application threads were stopped: 0.0303150 seconds 2011-03-17T13:19:56.476+0300: 33295.671: [GC 33295.671: [ParNew: 678855K->20708K(737280K), 0.0271230 secs] 1457643K->806795K(4112384K), 0.0273050 secs] [Times: user=0.33 sys=0.0

Re: Cassandra 0.6.3 ring not balance in terms of data size

2011-03-17 Thread aaron morton
With the Order Preserving Partitioner you are responsible for balancing the rows around the cluster, http://wiki.apache.org/cassandra/Operations?highlight=%28partitioner%29#Token_selection Was there a reason for using the ordered partitioner rather than the random one? What does the output
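One way to think about "balancing the rows around the cluster" with an order-preserving partitioner is to pick tokens at even quantiles of a sample of the row keys. The Python sketch below illustrates that idea only; it is not a Cassandra tool. The function name `pick_opp_tokens` is hypothetical, and it assumes plain ASCII string keys, a representative key sample, and that each node owns the keys up to and including its token.

```python
def pick_opp_tokens(sample_keys, node_count):
    """Pick one token per node so each range covers a similar share of the sample.

    sample_keys -- iterable of row keys as strings (assumed representative)
    node_count  -- number of nodes in the ring
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    keys = sorted(set(sample_keys))  # lexicographic order, duplicates removed
    if len(keys) < node_count:
        raise ValueError("need at least one distinct key per node")
    tokens = []
    for i in range(1, node_count + 1):
        # Last key of the i-th equal-sized slice of the sorted sample.
        idx = (i * len(keys)) // node_count - 1
        tokens.append(keys[idx])
    return tokens


if __name__ == "__main__":
    print(pick_opp_tokens(["a", "b", "c", "d"], 2))
```

Note that integer keys stored as strings sort lexicographically ("10" comes before "9"), so a sample of numeric keys has to be sorted the same way the partitioner will compare them, as the sketch does.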

Cassandra 0.6.3 ring not balance in terms of data size

2011-03-17 Thread Ali Ahsan
Hi All, We are running Cassandra 0.6.3. We have two nodes with replication factor one and ordered partitioning. The problem we are facing at the moment is that all data is being sent to one Cassandra node, which is filling up quite rapidly, and we are short of disk space. Unfortunately we have hardware constr

hadoop cassandra

2011-03-17 Thread Sagar Kohli
Hi all, is there any example of Hadoop and Cassandra integration where the input is from HDFS and the output goes to Cassandra? NOTE: I have gone through the word count example provided with the source code, but it does not cover the above case. Regards, Sagar