Some questions about using Binary Memtable to import data.

2010-05-18 Thread Peng Guo
Hi All: I am trying to use Binary Memtable to import a large amount of data. But after I look at the wiki intro: http://wiki.apache.org/cassandra/BinaryMemtable I have some questions about using BinaryMemtable 1. Will the data be replicated automatically? 2. Can we modify the data that alread

Re: Hadoop over Cassandra

2010-05-18 Thread Maxim Grinev
On Tue, May 18, 2010 at 2:23 AM, Jonathan Ellis wrote: > On Mon, May 17, 2010 at 4:12 PM, Vick Khera wrote: > > On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis > wrote: > >> Moving to the user@ list. > >> > >> http://wiki.apache.org/cassandra/HadoopSupport should be useful. > > > > That documen

mapreduce from cassandra to cassandra

2010-05-18 Thread Ran Tavory
In the wordcount example the process reads from cassandra and the result is written to a local file at /tmp/word_count* Is it possible to read from cassandra and write the result back to cassandra to a specified cf/row/column? I see that there exists a ColumnFamilyInputFormat but not ColumnFamilyO

Re: mapreduce from cassandra to cassandra

2010-05-18 Thread Jeff Zhang
I believe it is possible to write result back to cassandra. If I remember correctly, HBase has both InputFormat and OutputFormat for hadoop. On Tue, May 18, 2010 at 5:08 PM, Ran Tavory wrote: > In the wordcount example the process reads from cassandra and the result is > written to a local fil

Re: mapreduce from cassandra to cassandra

2010-05-18 Thread Ran Tavory
hbase - yes. But is that reusable for cassandra? On Tue, May 18, 2010 at 12:17 PM, Jeff Zhang wrote: > I believe it is possible to write result back to cassandra. If I > remember correctly, HBase has both InputFormat and OutputFormat for > hadoop. > > > > > On Tue, May 18, 2010 at 5:08 PM, Ran T

Re: mapreduce from cassandra to cassandra

2010-05-18 Thread Jeff Zhang
Reuse is not possible, but I think it won't be hard for cassandra to implement a ColumnFamilyOutputFormat. In my opinion, the ColumnFamilyOutputFormat should be easier to implement than ColumnFamilyInputFormat, because you don't need to think about splits in the implementation of OutputFormat. On

Re: mapreduce from cassandra to cassandra

2010-05-18 Thread Jeff Zhang
BTW, maybe we should create an issue on jira for this problem if the Cassandra committee thinks it is necessary. On Tue, May 18, 2010 at 5:30 PM, Jeff Zhang wrote: > Reuse is not possible, but I think it won't be hard for cassandra to > implement a ColumnFamilyOutputFormat. In my opinion, the > Col

decommission and org.apache.thrift.TApplicationException: get_slice failed: unknown result

2010-05-18 Thread Ran Tavory
What's the correct way to remove a node from a cluster? According to this page http://wiki.apache.org/cassandra/Operations a decommission call should be enough. When decommissioning one of the nodes from my cluster I see an error in the client: org.apache.thrift.TApplicationException: get_slice f

is it possible to trace/debug cassandra?

2010-05-18 Thread S Ahmed
Would it be possible to put cassandra in debug mode, so I could actually step through, line by line, the execution flow of operations I execute against it? If yes, any help would be great.

Re: Hadoop over Cassandra

2010-05-18 Thread Ben Browning
Maxim, Check out the getLocation() method from this file: http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/hadoop/ColumnFamilyRecordReader.java Basically, it loops over the list of nodes containing this split of data and if any of them are the local node, it returns
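The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in ColumnFamilyRecordReader: the class LocalReplicaSketch, the method findLocal and the host names are all hypothetical, and it assumes the local node is what gets returned on a match. What happens when no replica is local is not stated above, so the sketch simply returns an empty Optional in that case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class LocalReplicaSketch {
    // Hypothetical helper: scan the nodes that hold a split and return the
    // one that matches the local host, if there is one.
    static Optional<String> findLocal(List<String> splitLocations, String localHost) {
        for (String location : splitLocations) {
            if (location.equals(localHost)) {
                return Optional.of(location);
            }
        }
        // No replica of this split lives on the local node; the fallback
        // behaviour is left open here because the message does not state it.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative host names only.
        List<String> replicas = Arrays.asList("node-a", "node-b", "node-c");
        System.out.println(findLocal(replicas, "node-b").isPresent());
        System.out.println(findLocal(replicas, "node-z").isPresent());
    }
}
```

Returning an Optional keeps the "no local replica" case explicit rather than guessing at a default.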

ConcurrentModificationException in gossiper while decommissioning another node

2010-05-18 Thread Ran Tavory
While the node 192.168.252.61 was in the process of decommissioning I see this error in two other nodes: INFO [Timer-1] 2010-05-18 06:01:12,048 Gossiper.java (line 179) InetAddress /192.168.252.62 is now dead. INFO [GMFD:1] 2010-05-18 06:04:00,189 Gossiper.java (line 568) InetAddress /192.168.25

Re: is it possible to trace/debug cassandra?

2010-05-18 Thread Ran Tavory
Add -Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n to the JVM_OPTS section of cassandra.in.sh. Then connect with jdb (http://java.sun.com/j2se/1.3/docs/tooldocs/solaris/jdb.html) or your IDE as a remote process. On Tue, May 18, 2010 at 1:18 PM, S Ahmed wrote: > Would it b

Re: ConcurrentModificationException in gossiper while decommissioning another node

2010-05-18 Thread roger schildmeijer
This is hopefully fixed in trunk (CASSANDRA-757 (revision 938597)); "Replace synchronization in Gossiper with concurrent data structures and volatile fields." // Roger Schildmeijer On Tue, May 18, 2010 at 1:55 PM, Ran Tavory wrote: > While the node 192.168.252.61 was in the process of decommiss

Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-18 Thread Lee Parker
How many different CFs do you have? If you only have a few, I would highly recommend increasing the MemtableThroughputInMB and MemtableOperationsInMillions. We only have two CFs and I have them set at 256MB and 2.5m. Since most of our columns are relatively small, these values are practically equiva

Re: decommission and org.apache.thrift.TApplicationException: get_slice failed: unknown result

2010-05-18 Thread Gary Dusbabek
If I had to guess, I'd say that something at the transport layer had trouble. Possibly some kind of thrift hiccup that we haven't seen before. Your description makes it sound as if the decommission is proceeding normally though. Gary. On Tue, May 18, 2010 at 04:42, Ran Tavory wrote: > What's t

Disk usage doubled after nodetool decommission and node still in ring

2010-05-18 Thread Maxim Kramarenko
Hi! After nodetool decommission, the data size on all nodes doubled, the node is still up and in the ring, and there is no streaming and there are no tmp SSTables now. BTW, I have an ssh connection to the server, so after running nodetool decommission I expected that the server had received the command, pressed Ctrl-C and closed the shell. It is correct

Scaling problems

2010-05-18 Thread Ian Soboroff
I hope this isn't too much of a newbie question. I am using Cassandra 0.6.1 on a small cluster of Linux boxes - 14 nodes, each with 8GB RAM and 5 data drives. The nodes are running HDFS to serve files within the cluster, but at the moment the rest of Hadoop is shut down. I'm trying to load a lar

cassandra.yaml not picked up?

2010-05-18 Thread Frank Du
Hey All, I tried to run from cassandra trunk source. The keyspace schema has changed to cassandra.yaml. However, the defined Keyspace1 doesn't seem to be picked up by cassandra. Is there any additional work needed to make cassandra use it? Thank you so much! Best Regards, Frank

Re: cassandra.yaml not picked up?

2010-05-18 Thread Nathan McCall
I came across this the other day as well. The following FAQ will get you going: http://wiki.apache.org/cassandra/FAQ#no_keyspaces -Nate On Tue, May 18, 2010 at 9:24 AM, Frank Du wrote: > Hey All, > > I tried to run from cassandra trunk source. The keyspace schema has changed > to cassandra.yaml.

Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-18 Thread Curt Bererton
We only have a few CFs (6 or 7). I've increased the MemtableThroughputInMB and MemtableOperationsInMillions as per your suggestions. Do we really need a swap file though? I suppose it can't hurt, but with my problem in particular we weren't maxing out main memory. We'll be running another test to

Re: Hadoop over Cassandra

2010-05-18 Thread Stu Hood
The Hadoop integration (as demonstrated by contrib/word_count) is locality aware: it begins by querying Cassandra to generate locality aware splits, and when the hostnames match up between the Hadoop and Cassandra clusters, the data can be mapped locally. -Original Message- From: "Maxim

Re: mapreduce from cassandra to cassandra

2010-05-18 Thread Stu Hood
A Cassandra OutputFormat was recently contributed, but I haven't had a chance to review it. Any feedback you can give would be awesome: https://issues.apache.org/jira/browse/CASSANDRA-1101 Thanks, Stu -Original Message- From: "Jeff Zhang" Sent: Tuesday, May 18, 2010 4:39am To: user@cass

Re: cassandra.yaml not picked up?

2010-05-18 Thread Stu Hood
Also, when you are testing trunk, please remember to read NEWS.txt, as things change frequently. -Original Message- From: "Nathan McCall" Sent: Tuesday, May 18, 2010 11:36am To: user@cassandra.apache.org Subject: Re: cassandra.yaml not picked up? I came across this the other day as well

Re: Hadoop over Cassandra

2010-05-18 Thread Joseph Stein
If anyone is interested there is a great talk from Jonathan Ellis on the topic of Hadoop & Cassandra (interviewed yesterday) http://wp.me/pTu1i-40 I never knew that Pig was supported and I must say it is pretty kewl that you can run Pig scripts against your Cassandra data. It is a podcast so grab

Help with UUID in C#.net

2010-05-18 Thread Sandeep
Hi all, I am new to Cassandra. I am trying to insert some values into the columnfamily. The definition of the columnfamily in the config file is as follows. Whenever I try to insert values I always get "InvalidRequestException(why: UUIDs must be exactly 16 bytes)". I am using batch_mutate()

Re: Help with UUID in C#.net

2010-05-18 Thread Roger Schildmeijer
Nick Berardi's blog post about Cassandra in conjunction with c#/.net and TimeUUID describes how to do it. http://www.coderjournal.com/2010/04/creating-a-time-uuid-guid-in-net/ // Roger Schildmeijer On 18 maj 2010, at 21.45em, Sandeep wrote: > Hi all, > > I am new to Cassandra. I am trying to

Re: mapreduce from cassandra to cassandra

2010-05-18 Thread Tobias Jungen
Also note that the BMT example in contrib is an example of a hadoop process writing to Cassandra. On Tue, May 18, 2010 at 12:52 PM, Stu Hood wrote: > A Cassandra OutputFormat was recently contributed, but I haven't had a > chance to review it. Any feedback you can give would be awesome: > https:

RE: Help with UUID in C#.net

2010-05-18 Thread Sandeep
Hi Roger, Thanks for your reply. Actually I copied the class(GUIDGenerator.) in my project. Guid guidTimeStamp = GuidGenerator.GenerateTimeBasedGuid(); And using the above statement to generate the UUID. But I have no idea how to insert the UUID into Cassandra. List listOfArrivalTimes = new L

RE: Help with UUID in C#.net

2010-05-18 Thread Sandeep
batch_mutate can only take a map<string, map<string, list<Mutation>>> as a parameter, not a GUID. How do I solve this problem? From: Sandeep [mailto:sand...@indatus.com] Sent: Tuesday, May 18, 2010 4:03 PM To: user@cassandra.apache.org Subject: RE: Help with UUID in C#.net Hi Roger, Thanks for your reply. Actually I copied the cl
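The "UUIDs must be exactly 16 bytes" error in this thread suggests the column name has to be sent as the UUID's raw 16-byte form rather than as a GUID object or its string representation. As a rough sketch of that packing, written in Java rather than the C# used in the thread: the class UuidBytesSketch and its methods are hypothetical names, and the UUID value is an arbitrary illustration, not a real time-based UUID. A C# port would need extra care, since .NET's Guid.ToByteArray() stores the first three fields in little-endian order.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class UuidBytesSketch {
    // Pack a UUID into 16 bytes in big-endian (network) order:
    // the most significant 8 bytes first, then the least significant 8.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Inverse of toBytes, handy for a round-trip check.
    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        // Arbitrary example value, chosen only so the byte order is easy to see.
        UUID id = UUID.fromString("00112233-4455-6677-8899-aabbccddeeff");
        byte[] raw = toBytes(id);
        System.out.println(raw.length);
        System.out.println(fromBytes(raw).equals(id));
    }
}
```

The resulting byte array is what would go into the column name of each mutation; the GUID itself never needs to be passed to batch_mutate().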

Re: Timeouts running batch_mutate

2010-05-18 Thread Sonny Heer
Yeah there are many writes happening at the same time to any given cass node. e.g. assume 10 machines, all running hadoop and cassandra. The hadoop nodes are randomly picking a cassandra node and writing directly using the batch mutate. After increasing the timeout even more, i don't get that ex

When will major compaction be triggered

2010-05-18 Thread Yi Mao
Hi, Nodetool is one way to trigger major compaction. Will major compaction be triggered automatically? Thanks. Yi

Re: Hadoop over Cassandra

2010-05-18 Thread Mark Schnitzius
> > If anyone has "war stories" on the topic of Cassandra & Hadoop (or > even just Hadoop in general) let me know. Don't know if it counts as a war story, but I was successful recently in implementing something I got advice on in an earlier thread, namely feeding both a Cassandra table and a Had

Re: nodetool causing OOM?

2010-05-18 Thread Jonathan Ellis
Yeah, there's really not a whole lot we can do about these Thrift problems other than get Avro ready as a replacement, which we are doing. :( On Mon, May 17, 2010 at 2:37 PM, Nahor wrote: > On 2010-05-17 12:51, Brandon Williams wrote: >> >> On Mon, May 17, 2010 at 2:44 PM, Ronald Park >

Re: Multiple hard disks configuration

2010-05-18 Thread Jonathan Ellis
Yes, you can rely on replication for this (run nodetool repair). On Mon, May 17, 2010 at 10:39 PM, Ma Xiao wrote: > Hi all, >     Recently we have 5 nodes running cassandra, 4 X 1.5TB drives for each, > I installed os(Ubuntu 9.10 Server Edition) on one of them, and make entire > disk as 1 parti

Re: Disk usage doubled after nodetool compact

2010-05-18 Thread Jonathan Ellis
Sounds like this: "SSTables that are obsoleted by a compaction are deleted asynchronously when the JVM performs a GC. You can force a GC from jconsole if necessary, but Cassandra will force one itself if it detects that it is low on space. A compaction marker is also added to obsolete sstables so

Re: Data migration from mysql to cassandra

2010-05-18 Thread Jonathan Ellis
Those are 2 of the 3 options (the other one being, continue to generate incrementing IDs either by continuing to use mysql solely for that purpose, or by using another system like redis for that). On Mon, May 17, 2010 at 10:48 PM, Beier Cai wrote: > I'm currently moving my existing mysql database

Re: Some questions about using Binary Memtable to import data.

2010-05-18 Thread Jonathan Ellis
1. yes 2. yes 3. compaction will slow down the load 4. it will flush the memtable On Tue, May 18, 2010 at 12:24 AM, Peng Guo wrote: > Hi All: > > I am trying to use Binary Memtable to import a large amount of data. > > But after I look at the wiki > intro:http://wiki.apache.org/cassandra/BinaryMe

Re: When will major compaction be triggered

2010-05-18 Thread Jonathan Ellis
No. (Except in the special case where a minor compaction also happens to be a major one.) On Tue, May 18, 2010 at 6:29 PM, Yi Mao wrote: > Hi, > Nodetool is one way to trigger major compaction. Will major compaction be > triggered automatically? Thanks. > > Yi > -- Jonathan Ellis Project Ch

Re: Hadoop over Cassandra

2010-05-18 Thread Jonathan Ellis
On Tue, May 18, 2010 at 9:40 PM, Mark Schnitzius wrote: >> If anyone has "war stories" on the topic of Cassandra & Hadoop (or >> even just Hadoop in general) let me know. > > Don't know if it counts as a war story, but I was successful recently in > implementing something I got advice on in an ear