json2sstable hanging on large sstable2json-generated JSON file

2011-03-12 Thread Jason Harvey
Trying to import a 3GB JSON file which was exported from sstable2json. I let it run for over an hour and saw zero IO activity. The last thing it logs is the following: DEBUG 23:19:32,638 collecting 0 of 2147483647: Avro/Schema:false:2042@1298067089267 DEBUG 23:19:32,638 collecting 1 of 2147483647:

Re: sstable2json not loading CLASSPATH properly?

2011-03-12 Thread Jason Harvey
nvm, I found the problem. Sstable2json and json2sstable require a log4j-tools properties file. I created one and all was well. I guess that should be added to the default install packages. Cheers, Jason On Sat, Mar 12, 2011 at 12:09 AM, Jason Harvey wrote: > Sstable2json always spits out the fol

Example usage of json2sstable?

2011-03-12 Thread Jason Harvey
Hey everyone, I ran into some severely broken SSTables which I ran through sstable2json to preserve all of the info I could. The scrub process deleted all of the screwed up rows, so I am now trying to reimport that data back into cassandra from JSON. I know I must specify an sstable for json2sstab

Re: memory utilization

2011-03-12 Thread Jonathan Ellis
Probably relevant: we only use mmap'd I/O for single-row reads. When we are paging through entire files like we do for compaction or AES we do buffered i/o to avoid the complexity of having to manage multiple mmap segments (Java limits us to 2GB per segment). On Sat, Mar 12, 2011 at 7:06 PM, Peter
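
As a rough illustration of the 2GB-per-segment limit mentioned above, here is a minimal sketch of how a large file would have to be split into several mapped segments. The function name segment_ranges and the constant MAX_SEGMENT are hypothetical, chosen for this example; this is not Cassandra's code. It only assumes that a single Java mapping cannot exceed Integer.MAX_VALUE bytes.

```python
# Hypothetical illustration, not Cassandra's implementation.
# A single Java mapping is capped at Integer.MAX_VALUE bytes (just under 2 GiB).
MAX_SEGMENT = 2**31 - 1


def segment_ranges(file_size, max_segment=MAX_SEGMENT):
    """Return (offset, length) pairs that cover file_size bytes,
    with no single segment longer than max_segment."""
    if file_size < 0 or max_segment <= 0:
        raise ValueError("file_size must be >= 0 and max_segment must be > 0")
    ranges = []
    offset = 0
    while offset < file_size:
        length = min(max_segment, file_size - offset)
        ranges.append((offset, length))
        offset += length
    return ranges


if __name__ == "__main__":
    # A 3 GiB file needs more than one segment under this limit.
    for offset, length in segment_ranges(3 * 2**30):
        print(offset, length)
```

Sequential scans would have to track which segment the current position falls in and remap at each boundary, which is the bookkeeping that buffered I/O avoids.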

Re: memory utilization

2011-03-12 Thread Peter Schuller
>> Nothing happens, because it _doesn't have to be resident_. >> > > Hm, but why in my case top show RSS 10g, when max HEAP_SIZE is 6G? The point is that it is a result of how the kernel manages memory and how it is reported in top. It is not reflective of actual memory "use", the way users normal

Re: Cassandra RPM

2011-03-12 Thread Nate McCall
Thanks for the kind words, but we simply build and host the artifacts defined by the specfile in the redhat directory of the Apache Cassandra source distribution. Any feature requests should go through jira: https://issues.apache.org/jira/browse/CASSANDRA On Sat, Mar 12, 2011 at 9:02 AM, Rober

Re: memory utilization

2011-03-12 Thread ruslan usifov
2011/3/12 Jonathan Ellis > Nothing happens, because it _doesn't have to be resident_. > > Hm, but why in my case top show RSS 10g, when max HEAP_SIZE is 6G?? PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 27650 cassandr 20 0 14.9g 10g 3.8g S 51 86.6 370:15.82 jsvc 205

Re: Cassandra RPM

2011-03-12 Thread Mohit Anchlia
I think that can be configured in cassandra.yaml On Sat, Mar 12, 2011 at 7:02 AM, Robert Zotter wrote: > Riptano's Cassandra RPM works beautifully... love it. > > However, I would like to make a suggestion. I believe the commit and > data directories should be configurable via alternatives since

Re: memory utilization

2011-03-12 Thread Jonathan Ellis
Nothing happens, because it _doesn't have to be resident_. On Sat, Mar 12, 2011 at 1:47 PM, ruslan usifov wrote: > > > 2011/3/11 Chris Burroughs >> >> Is there a more or less constant amount of resident memory, or is it >> growing over a period of days? > > As said in cassandra wiki: > The

Re: memory utilization

2011-03-12 Thread ruslan usifov
2011/3/11 Chris Burroughs > Is there a more or less constant amount of resident memory, or is it > growing over a period of days? > As said in cassandra wiki: >>>The main argument for using mmap() instead of standard I/O is the fact that reading entails just touching memory - in the case of th

Re: Poor performance on small data set

2011-03-12 Thread Tyler Hobbs
On Sat, Mar 12, 2011 at 6:45 AM, Vodnok wrote: > > THRIFT-638 : It seems to be a solution but I don't know how to patch this > on my environment. phpcassa has a C extension but it's hard for me to build > a php extension > The master branch of phpcassa includes the changes from THRIFT-638. --

Re: Poor performance on small data set

2011-03-12 Thread ruslan usifov
Here is a PHP Windows extension, but you must use the trunk version of Thrift 2011/3/12 Vodnok > Thank you all for your replies > > > "nagle + delayed ACK problem" : I found a way to solve this via regedit > but no impact on response time > > THRIFT-638 : It seems to be a solution but I don't know ho

Cassandra RPM

2011-03-12 Thread Robert Zotter
Riptano's Cassandra RPM works beautifully... love it. However, I would like to make a suggestion. I believe the commit and data directories should be configurable via alternatives since it is suggested that one split those on two separate drives anyway. Other than that, no complaints. Thanks for th

Re: Poor performance on small data set

2011-03-12 Thread Vodnok
Thank you all for your replies "nagle + delayed ACK problem" : I found a way to solve this via regedit but no impact on response time THRIFT-638 : It seems to be a solution but I don't know how to patch this on my environment. phpcassa has a C extension but it's hard for me to build a php exte

Re: Cassandra still won't start - in-use ports block it

2011-03-12 Thread Maki Watanabe
Hello Bob, 1. What does lsof say on TCP port 9160? $ lsof -i TCP:9160 2. Have you tried changing rpc_port in conf/cassandra.yaml? ex. rpc_port: 19160 maki 2011/3/12 Jeremy Hanna : > I don't know if others have asked this but do you have a firewall running > that would prevent access to those
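
Where lsof is not available, a similar check can be sketched in Python by trying to bind the port. This is only a sketch under assumptions: the helper name port_in_use and the default host 127.0.0.1 are made up for this example, and a listener bound to a different specific address would not be detected. Ports 9160 and 19160 are the ones mentioned in the message above.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding host:port fails, which usually means
    another process already holds the port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return True
        return False


if __name__ == "__main__":
    # 9160 is the port from the thread; 19160 is the suggested alternative.
    for port in (9160, 19160):
        print(port, "in use" if port_in_use(port) else "free")
```

A failed bind can also come from lingering connections or missing privileges, so lsof remains the better tool for finding out which process owns the port.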

sstable2json not loading CLASSPATH properly?

2011-03-12 Thread Jason Harvey
Sstable2json always spits out the following when I execute it: log4j:WARN No appenders could be found for logger (org.apache.cassandra.config.DatabaseDescriptor). log4j:WARN Please initialize the log4j system properly. I verified that the run script sets the CLASSPATH properly, and I even tried