Get cassandra SuperColumn only!
Hi,

I have a Cassandra datastore as follows:

key : { supercol (utf8) : { subcol (timeuuid) : data } }

Now, for a particular use case I want to slice on two levels: first on the supercolumns, and then, within the selected supercolumns, slice the subcolumns (mostly to restrict the number of items fetched into memory). I have tried various APIs and there doesn't seem to be a way to do this; when I slice supercolumns I get the subcolumns in the result too! Now, of course, I can add another index as follows:

key : { supercol (utf8) : } }

Haven't looked at Cassandra storage in too much detail, but hoping there is a better solution! Thanks in advance.
0.7 live schema updates
Hi! I like the new feature of making live schema updates. You can add, drop and rename column families and keyspaces via Thrift, but how do you modify column family attributes like key_cache_size or rows_cached? Thank you.
Re: 0.7 live schema updates
You can change these attributes using the JMX interface. Take a look at the setCacheCapacities method in org.apache.cassandra.tools.NodeProbe.
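For anyone who wants to script this, a rough sketch of driving that operation through NodeProbe; the constructor and the parameter order are from memory of the 0.6/0.7 source, so treat them as assumptions and check your tree:

import org.apache.cassandra.tools.NodeProbe;

public class CacheTuner
{
    public static void main(String[] args) throws Exception
    {
        // 8080 was the default JMX port in the 0.6.x era
        NodeProbe probe = new NodeProbe("localhost", 8080);
        // assumed order: keyspace, column family, key cache capacity, row cache capacity
        probe.setCacheCapacities("Keyspace1", "Standard1", 200000, 1000);
    }
}

nodetool's setcachecapacity command wraps the same call if you'd rather stay on the shell.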
busy thread on IncomingStreamReader
Hi - has anyone made any progress with this issue? We are having the same problem with our Cassandra nodes in production. At some point a node (and sometimes all 3) will jump to 100% CPU usage and stay there for hours until restarted. Stack traces reveal several threads in a seemingly endless loop doing this:

"Thread-21770" - Thread t...@25278
   java.lang.Thread.State: RUNNABLE
   at sun.nio.ch.FileChannelImpl.size0(Native Method)
   at sun.nio.ch.FileChannelImpl.size(Unknown Source)
   - locked java.lang.obj...@7a2c843d
   at sun.nio.ch.FileChannelImpl.transferFrom(Unknown Source)
   at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
   at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

My understanding from reading the code is that this trace shows a thread belonging to the StreamingService which is writing an incoming stream to disk. There seems to be some kind of bizarre problem which is causing the FileChannel.size() function to spin with high CPU. Also, this problem is not easy to replicate, so I would appreciate any information on how the StreamingService works and what triggers it to transfer these file streams.

Thanks,
Joseph Mermelstein
LivePerson
http://solutions.liveperson.com

> Hi all,
>
> We set up two nodes and simply set replication factor=2 for a test run.
> After both nodes, say node A and node B, serve for several hours, we found
> that node A always keeps 300% CPU usage (the other node is under 100% CPU,
> which is normal).
>
> A thread dump on node A shows that there are 3 busy threads related to
> IncomingStreamReader:
>
> ==
> "Thread-66" prio=10 tid=0x2aade4018800 nid=0x69e7 runnable [0x4030a000]
>    java.lang.Thread.State: RUNNABLE
>    at sun.misc.Unsafe.setMemory(Native Method)
>    at sun.nio.ch.Util.erase(Util.java:202)
>    at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
>    at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
>    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
>    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>
> "Thread-65" prio=10 tid=0x2aade4017000 nid=0x69e6 runnable [0x4d44b000]
>    java.lang.Thread.State: RUNNABLE
>    at sun.misc.Unsafe.setMemory(Native Method)
>    at sun.nio.ch.Util.erase(Util.java:202)
>    at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
>    at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
>    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
>    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>
> "Thread-62" prio=10 tid=0x2aade4014800 nid=0x4150 runnable [0x4d34a000]
>    java.lang.Thread.State: RUNNABLE
>    at sun.nio.ch.FileChannelImpl.size0(Native Method)
>    at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:309)
>    - locked <0x2aaac450dcd0> (a java.lang.Object)
>    at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:597)
>    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
>    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
> ===
>
> Has anyone experienced a similar issue?
> Environments:
>
> OS --- CentOS 5.4, Linux 2.6.18-164.15.1.el5 SMP x86_64 GNU/Linux
> Java --- build 1.6.0_16-b01, Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
> Cassandra --- 0.6.0
> Node configuration --- node A and node B; both nodes use node A as Seed
> Client --- Java Thrift clients pick one node randomly to do reads and writes.
>
> --
> Ingram Chen
> online share order: http://dinbendon.net
> blog: http://www.javaworld.com.tw/roller/page/ingramchen
Getting client only example to work
Hi,

I am using 0.7.0-beta1 and trying to get the contrib/client_only example to work. I am running Cassandra on host1 and trying to access it from host2. When using Thrift (via cassandra-cli) and in my application, I am able to connect and do all operations as expected. But I am not able to connect to Cassandra when using the code in client_only (or, for that matter, using contrib/bmt_example). Since my test requires bulk insertion of about 1.4 TB of data, I need to use a non-Thrift interface.

The error I am getting follows (the keyspace and the column family exist and can be used via Thrift):

10/09/16 12:35:31 INFO config.DatabaseDescriptor: DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
10/09/16 12:35:31 INFO service.StorageService: Starting up client gossip
Exception in thread "main" java.lang.IllegalArgumentException: Unknown ColumnFamily Standard1 in keyspace Keyspace1
    at org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:1009)
    at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:418)
    at gaia.cu7.cassandra.input.Ingestor.testWriting(Ingestor.java:103)
    at gaia.cu7.cassandra.input.Ingestor.main(Ingestor.java:187)

I am using the following code (from the client_only example), also passing the JVM parameter -Dstorage-config=path_2_cassandra.yaml:

public static void main(String[] args) throws Exception
{
    System.setProperty("storage-config", "cassandra.yaml");
    testWriting();
}

// from client_only example
private static void testWriting() throws Exception
{
    StorageService.instance.initClient();
    // sleep for a bit so that gossip can do its thing.
    try
    {
        Thread.sleep(1L);
    }
    catch (Exception ex)
    {
        throw new AssertionError(ex);
    }
    // do some writing.
    final AbstractType comp = ColumnFamily.getComparatorFor("Keyspace1", "Standard1", null);
    for (int i = 0; i < 100; i++)
    {
        RowMutation change = new RowMutation("Keyspace1", ("key" + i).getBytes());
        ColumnPath cp = new ColumnPath("Standard1").setColumn(("colb").getBytes());
        change.add(new QueryPath(cp), ("value" + i).getBytes(), new TimestampClock(0));
        // don't call change.apply(). The reason is that it makes a static call into Table,
        // which will perform local storage initialization, which creates local directories.
        // change.apply();
        StorageProxy.mutate(Arrays.asList(change));
        System.out.println("wrote key" + i);
    }
    System.out.println("Done writing.");
    StorageService.instance.stopClient();
}
RE: 0.7 live schema updates
But you'll lose these settings after a Cassandra restart.

-----Original Message-----
From: Oleg Anastasyev [mailto:olega...@gmail.com]
Sent: Thursday, September 16, 2010 11:21 AM
To: user@cassandra.apache.org
Subject: Re: 0.7 live schema updates

You can change these attributes using the JMX interface. Take a look at the setCacheCapacities method in org.apache.cassandra.tools.NodeProbe.
Indexing & Locking in Cassandra
Hello,

I have a few questions about indexing and locking in Cassandra:
- If I understood correctly, only row-level indexing exists prior to v0.7; I mean only the primary keys are indexed. Is that true?
- Is it possible to use composite primary keys? For instance, I have a user object User(name, birthday, gender, address) and I want to have the (name, birthday) columns as the PK. Can I do that? If yes, how?
- Does Cassandra support CF (table) level locking? Could someone explain / provide a link on how?

Thanks in advance,
Sandor
Re: Indexing & Locking in Cassandra
Hello,

> - If I understood correctly, only row-level indexing exists prior to v0.7;
> I mean only the primary keys are indexed. Is that true?

Yes and no. The row name is the key which you use to fetch the row from Cassandra. There are methods to iterate through rows, but that's not efficient and should be used only in batch operations. Columns inside rows are sorted by their names, so they are also indexed: you use the column name to fetch the contents of the column. If you want to index data in other ways you need to build your own application code which maintains such indexes, and the upcoming 0.7 version will bring some handy features which make the coder's job much easier.

> - Is it possible to use composite primary keys? For instance, I have a user
> object User(name, birthday, gender, address) and I want to have the
> (name, birthday) columns as the PK. Can I do that? If yes, how?

You can always create your row key as a string like "$name_$birthday". Does this answer your question?

> - Does Cassandra support CF (table) level locking? Could someone explain /
> provide a link on how?

No, Cassandra doesn't have any locking capabilities. You can always use some external locking mechanism like ZooKeeper [http://hadoop.apache.org/zookeeper/] or implement your own solution on top of Cassandra (not recommended, as it's quite hard to get right).

- Juho Mäkinen / Garo
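To make the concatenated-key suggestion concrete, a minimal sketch against the 0.6-style Thrift API; the keyspace, CF, and column names are made up for illustration:

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;

public class CompositeKeyExample
{
    // store the user's address under a "composite" row key built from two fields
    static void saveAddress(Cassandra.Client client, String name, String birthday,
                            String address) throws Exception
    {
        String rowKey = name + "_" + birthday; // e.g. "jsmith_1980-02-14"
        ColumnPath path = new ColumnPath("Users").setColumn("address".getBytes());
        client.insert("Keyspace1", rowKey, path, address.getBytes(),
                      System.currentTimeMillis(), ConsistencyLevel.QUORUM);
    }
}

Note the limitation: you can only fetch by the full (name, birthday) pair this way; a lookup by birthday alone would need a second, manually maintained index CF.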
RE: Indexing & Locking in Cassandra
Thanks for your fast answer. Regarding the composite keys: that's what I thought by default, I just needed some confirmation. Unfortunately I cannot use this approach in our application, so I will figure out something else. I will check out ZooKeeper to see if I can use it. Thanks again!

> You can always create your row key as a string like "$name_$birthday".
> [...]
> No, Cassandra doesn't have any locking capabilities. You can always use
> some external locking mechanism like ZooKeeper or implement your own
> solution on top of Cassandra.
>
> - Juho Mäkinen / Garo
Re: Getting client only example to work
I discovered some problems with the fat client earlier this week when I tried using it. It needs some fixes to keep up with all the 0.7 changes.

Gary.

On Thu, Sep 16, 2010 at 05:48, Asif Jan wrote:
> Hi
> I am using 0.7.0-beta1 and trying to get the contrib/client_only example
> to work. I am running Cassandra on host1 and trying to access it from
> host2. [...]
Re: 0.7 live schema updates
beta-2 will include the ability to set these values and others. Look for the system_update_column_family() and system_update_keyspace() methods.

Gary.

On Thu, Sep 16, 2010 at 02:38, Marc Canaleta wrote:
> Hi!
> I like the new feature of making live schema updates. You can add, drop and
> rename column families and keyspaces via Thrift, but how do you modify
> column family attributes like key_cache_size or rows_cached?
> Thank you.
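To make that concrete, a rough sketch against the 0.7 Thrift API. The CfDef field names (key_cache_size, row_cache_size) come from the 0.7 IDL, but the exact update semantics in the released beta-2 may differ, so treat this as an assumption to verify:

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.CfDef;
import org.apache.cassandra.thrift.KsDef;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class LiveCacheUpdate
{
    public static void main(String[] args) throws Exception
    {
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        transport.open();
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        client.set_keyspace("Keyspace1");

        // fetch the current definitions so that only the cache settings change
        KsDef ksDef = client.describe_keyspace("Keyspace1");
        for (CfDef cfDef : ksDef.getCf_defs())
        {
            if (cfDef.getName().equals("Standard1"))
            {
                cfDef.setKey_cache_size(200000); // number of keys to cache
                cfDef.setRow_cache_size(1000);   // number of rows to cache
                client.system_update_column_family(cfDef);
            }
        }
        transport.close();
    }
}

Unlike the JMX route above, a change made this way is part of the schema, so it survives a restart.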
Re: Build an index for a join query
Alvin - assuming I understand what you're after correctly, why not make a CF Name_Address(name, address)? Modifying the Cassandra methods to do the "join" you describe seems like overkill to me...

-Paul

On Sep 15, 2010, at 7:34 PM, Alvin UW wrote:
> Hello,
>
> I am going to build an index to join two CFs. First, we see this index as a
> CF/SCF; the difference is I don't materialise it. Assume we have two tables:
> ID_Address(Id, address) and Name_ID(name, id). Then the index is:
> Name_Address(name, address).
>
> When the application tries to query Name_Address, the value of "name" is
> given by the application. I want to direct the read operation to Name_ID to
> get the "Id" value, then go to ID_Address to get the "address" value by that
> "Id". So far, I consider only the read operation. This way, the join query
> is transparent to the user.
>
> So I think I should find out which methods or classes are in charge of the
> read operation described above. For example, exactly which server-side
> methods does the CLI operation "get Keyspace1.Standard2['jsmith']" call? I
> noted CassandraServer is used to listen to clients, and there are some
> methods such as get() and get_slice(). Is that the right place to modify to
> implement my idea?
>
> Thanks.
> Alvin
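To illustrate Paul's point, the "join" can also live entirely in client code as two reads, with no server changes. A minimal sketch against the 0.6-style Thrift API; the keyspace name and the helper are hypothetical, and error handling (NotFoundException etc.) is omitted:

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;

public class NameAddressJoin
{
    // look up name -> id in Name_ID, then id -> address in ID_Address
    static byte[] addressForName(Cassandra.Client client, String name) throws Exception
    {
        ColumnPath idPath = new ColumnPath("Name_ID").setColumn("id".getBytes());
        byte[] id = client.get("Keyspace1", name, idPath, ConsistencyLevel.QUORUM)
                          .getColumn().getValue();

        ColumnPath addrPath = new ColumnPath("ID_Address").setColumn("address".getBytes());
        return client.get("Keyspace1", new String(id), addrPath, ConsistencyLevel.QUORUM)
                     .getColumn().getValue();
    }
}

The second read's key is derived from the first read's value, which is exactly the redirect Alvin describes, just done by the caller instead of inside CassandraServer.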
Problem with memtable_throughput_in_mb?
Hi,

I am trying out the latest trunk version and I get an error when starting Cassandra with -Xmx3G:

Fatal error: memtable_operations_in_millions must be a positive double

I guess it is caused by line 76 in org/apache/cassandra/config/Config.java [0]:

    public Integer memtable_throughput_in_mb = (int) Runtime.getRuntime().maxMemory() / 8;

The cast to (int) is applied to maxMemory(), but that method returns a long, so for -Xmx3G the result overflows to a negative integer. Thus memtable_operations_in_millions becomes negative (Double memtable_operations_in_millions = memtable_throughput_in_mb / 64 * 0.3) and the exception is thrown. Also, maxMemory() is measured in bytes, but memtable_throughput_in_mb should be in MB (as its name implies), which is not the case here.

What do you think? Thanks for any input you have on this.

Cheers

[0] http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/config/Config.java
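A tiny standalone illustration of the precedence problem (plain Java, not from the Cassandra tree; the "fixed" line is one plausible way to express the intent, not necessarily the committed fix):

public class CastDemo
{
    public static void main(String[] args)
    {
        long max = 3L * 1024 * 1024 * 1024;        // maxMemory() with -Xmx3G, in bytes
        int buggy = (int) max / 8;                 // cast binds first: 3 GB overflows int, goes negative
        int fixed = (int) (max / 1024 / 1024 / 8); // bytes -> MB, divide, then cast
        System.out.println(buggy);                 // -134217728
        System.out.println(fixed);                 // 384
    }
}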
Re: Problem with memtable_throughput_in_mb?
On Thu, Sep 16, 2010 at 11:00 AM, Thomas Boucher wrote:
> Hi,
>
> I am trying out the latest trunk version and I get an error when
> starting Cassandra with -Xmx3G:
> Fatal error: memtable_operations_in_millions must be a positive double
>
> I guess it is caused by line 76 in org/apache/cassandra/config/Config.java [0]:
>
>     public Integer memtable_throughput_in_mb = (int) Runtime.getRuntime().maxMemory() / 8;
>
> The cast to (int) is applied to maxMemory(), but that method returns a
> long, so for -Xmx3G the result overflows to a negative integer.
> [...]

Oops, good catch. Fixed in r997841.

-Brandon
Building an Ubuntu / Debian package for Cassandra
Guys,

I am trying to build a Debian package in order to deploy Cassandra 0.6.5 on Ubuntu. I see that you have a ./debian directory in the source builds; do you have a bit more background on how it is used and built?

P.S. I am new to Ubuntu/Debian packaging, so any kind of pointer will help.

Thanks,

FR

Francois Richard
Re: Building an Ubuntu / Debian package for Cassandra
Hi Francois,

Any reason http://wiki.apache.org/cassandra/DebianPackaging isn't working for you?

Dave Viner

On Thu, Sep 16, 2010 at 10:30 AM, Francois Richard wrote:
> Guys,
>
> I am trying to build a Debian package in order to deploy Cassandra 0.6.5 on
> Ubuntu. I see that you have a ./debian directory in the source builds; do
> you have a bit more background on how it is used and built? [...]
Secondary Index Null Pointer Error
Hi,

I am using Cassandra 0.7 trunk (r997357) and am having issues with a secondary index. I have a ColumnFamily with a secondary index on column "X". Not every row of data has column X. It looks like when I write a row that does not have column X, Cassandra throws the following NPE when it writes the index:

ERROR 20:05:37,015 Uncaught exception in thread Thread[FLUSH-WRITER-POOL:1,5,main]
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:87)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:637)
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    ... 2 more
Caused by: java.lang.NullPointerException
    at org.apache.cassandra.io.sstable.IndexSummary.complete(IndexSummary.java:63)
    at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:383)
    at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:138)
    at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:132)
    at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:156)
    at org.apache.cassandra.db.Memtable.access$000(Memtable.java:44)
    at org.apache.cassandra.db.Memtable$1.runMayThrow(Memtable.java:168)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    ... 6 more

This doesn't necessarily happen as soon as the row is written; it happens after you write enough rows, or after a restart of the server when the commitlog is replayed. Is it the case that indexed columns must exist?

Thanks
Colin
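For context, a minimal sketch of how a secondary index like the one on "X" is declared through the 0.7 Thrift API; the CF and validator names here are illustrative, and this is not a confirmed reproduction of the bug:

import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.CfDef;
import org.apache.cassandra.thrift.ColumnDef;
import org.apache.cassandra.thrift.IndexType;

public class IndexedCfDef
{
    static CfDef indexedCf()
    {
        ColumnDef x = new ColumnDef(ByteBuffer.wrap("X".getBytes()),
                                    "org.apache.cassandra.db.marshal.BytesType");
        x.setIndex_type(IndexType.KEYS);  // secondary index on column X
        CfDef cf = new CfDef("Keyspace1", "Indexed1");
        cf.addToColumn_metadata(x);       // rows need not actually contain X
        return cf;
    }
}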
Re: Building an Ubuntu / Debian package for Cassandra
Hello Francois,

There are already .debs available here:
http://wiki.apache.org/cassandra/DebianPackaging

I've also set up a PPA to build the packages on Ubuntu here:
https://launchpad.net/~cassandra-ubuntu/+archive/stable

It's currently still at v0.6.4, but I am in the process of uploading 0.6.5 as I write this email. The .debs are nearly identical. The only difference is that I've packaged the jars necessary to build, so that you get the same exact versions of all libraries if you need to patch + repeat the build. Also, these are built specifically for Ubuntu releases, so if we find any incompatibilities between Debian/Ubuntu we can fix them for Ubuntu users.

I hope this helps!

On Sep 16, 2010, at 10:30 AM, Francois Richard wrote:
> Guys,
>
> I am trying to build a Debian package in order to deploy Cassandra 0.6.5 on
> Ubuntu. [...]
Re: Get cassandra SuperColumn only!
AFAIK there is no way to get a list of the super columns without also getting the sub columns. I do not know if there is a technical reason that would prevent this from being added. In general it's more efficient to make one request that pulls back more data than two or more that pull back just enough data. But you also want to design to answer the queries you need to make. Keeping an index of super column names in another CF does not sound too bad.

It might pay to take another look at why you are using a super CF. It may be better to use two standard CFs if, say, you want one sort of request that gets a list of things, and another sort of request that gets the details for a number of things.

Aaron

On 16 Sep, 2010, at 07:25 PM, Saurabh Raje wrote:
> Hi,
> I have a Cassandra datastore as follows:
> key : { supercol (utf8) : { subcol (timeuuid) : data } }
> Now, for a particular use case I want to slice on two levels: first on the
> supercolumns, and then, within the selected supercolumns, slice the
> subcolumns (mostly to restrict the number of items fetched into memory).
> [...]
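A minimal sketch of the index-CF idea against the 0.6-style Thrift API; "SuperColIndex" and "SuperCF" are hypothetical names, and paging and error handling are omitted:

import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

public class SuperColumnNameIndex
{
    static void sliceBothLevels(Cassandra.Client client, String key) throws Exception
    {
        // level 1: slice super column *names* out of the companion standard CF
        SlicePredicate firstHundred = new SlicePredicate()
            .setSlice_range(new SliceRange(new byte[0], new byte[0], false, 100));
        List<ColumnOrSuperColumn> names = client.get_slice(
            "Keyspace1", key, new ColumnParent("SuperColIndex"),
            firstHundred, ConsistencyLevel.ONE);

        // level 2: for each selected name, slice a bounded number of subcolumns
        for (ColumnOrSuperColumn cosc : names)
        {
            ColumnParent parent = new ColumnParent("SuperCF")
                .setSuper_column(cosc.getColumn().getName());
            SlicePredicate firstTen = new SlicePredicate()
                .setSlice_range(new SliceRange(new byte[0], new byte[0], false, 10));
            List<ColumnOrSuperColumn> subcols = client.get_slice(
                "Keyspace1", key, parent, firstTen, ConsistencyLevel.ONE);
            // process subcols...
        }
    }
}

The write path has to keep SuperColIndex in sync with SuperCF, e.g. by writing both in one batch_mutate.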
RE: Building an Ubuntu / Debian package for Cassandra
Thanks Clint,

I am going to look up the links below. I am pretty new to DEB packaging in general, and from what I have seen so far, a lot of the tutorials on the web are mostly based on the classic [ ./configure | make | make install ] flow of an application built in C. In this case I wanted to figure out DEB packaging in the context of a Java application. I'll read more and will stay in touch.

My goal, at the end of the day, is to install "the stock" package for Cassandra and then to create a special Cassandra-config package that would move and deploy my customized configuration files on the system.

Thanks,

FR

-----Original Message-----
From: Clint Byrum [mailto:cl...@ubuntu.com]
Sent: Thursday, September 16, 2010 10:54 AM
To: user@cassandra.apache.org
Subject: Re: Building an Ubuntu / Debian package for Cassandra

Hello Francois,

There are already .debs available here:
http://wiki.apache.org/cassandra/DebianPackaging [...]
Re: Getting client only example to work
OK, did something about the message service change in the initClient method? Essentially, now one cannot call initClient when a Cassandra instance is running on the same machine.

thanks

On Sep 16, 2010, at 3:48 PM, Gary Dusbabek wrote:
> I discovered some problems with the fat client earlier this week when I
> tried using it. It needs some fixes to keep up with all the 0.7 changes.
>
> Gary. [...]
Re: Bootstrapping stays stuck
Thanks to driftx from the cassandra IRC channel for helping out. This was resolved by increasing the RPC timeout for the bootstrap process (a config sketch follows the quoted logs below).

On Wed, Sep 15, 2010 at 11:43 AM, Gurpreet Singh wrote:
> This problem still stays unresolved despite numerous restarts of the
> cluster. I can't seem to find a way out of this one, and I am not really
> looking for a workaround; I kinda need this to work if I am to go to
> production.
>
> I turned on ALL logging in log4j, and now I see the following exception
> (EOFException) on the destination. After receiving each file, it seems to
> be throwing this exception. The transfer is successful except for this
> exception. The source successfully declares the transfer complete, but the
> destination does not move out of the bootstrapping mode, and just sits
> there.
>
> DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line 65) Receiving stream: finished reading chunk, awaiting more
> DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line 87) Removing stream context /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Index.db:522051369
> DEBUG [Thread-15] 2010-09-15 10:56:59,767 StreamCompletionHandler.java (line 73) Sending a streaming finished message with org.apache.cassandra.streaming.completedfilesta...@54828e7 to IP1
> TRACE [Thread-15] 2010-09-15 10:56:59,769 IncomingTcpConnection.java (line 82) eof reading from socket; closing
> java.io.EOFException
>     at java.io.DataInputStream.readInt(Unknown Source)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59)
> DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line 51) Receiving stream
> DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line 54) Creating file for /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db
> DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line 65) Receiving stream: finished reading chunk, awaiting more
> DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line 87) Removing stream context /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db:7489045
> DEBUG [Thread-16] 2010-09-15 10:56:59,876 StreamCompletionHandler.java (line 73) Sending a streaming finished message with org.apache.cassandra.streaming.completedfilesta...@7b41a32f to IP1
> TRACE [Thread-16] 2010-09-15 10:56:59,876 IncomingTcpConnection.java (line 82) eof reading from socket; closing
> java.io.EOFException
>     at java.io.DataInputStream.readInt(Unknown Source)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59)
>
> /G
>
> On Tue, Sep 14, 2010 at 11:40 AM, Gurpreet Singh wrote:
>
>> Hi Vineet,
>> I have tracked the nodetool streams to completion each time. Below are the
>> logs on the source and destination node. There are 3 sstables being
>> transferred, and the transfer seems to be successful. However, after the
>> streams finish, the source prints out messages about the dropped messages,
>> which may point to the problem. Ideas? I checked that port 7000 is open for
>> communication. 9160 is not up on the node being bootstrapped, but that
>> comes up after the node is bootstrapped, is that right?
>>
>> Thanks a ton,
>> /G
>>
>> Logs on the source node (IP2):
>>
>> INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 79) Flushing memtables for userdata...
>> INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 95) Performing anticompaction ...
>> INFO [COMPACTION-POOL:1] 2010-09-14 09:54:07,900 CompactionManager.java (line 339) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_list_items-5823-Data.db')]
>> INFO [GC inspection] 2010-09-14 09:56:54,712 GCInspector.java (line 129) GC for ParNew: 212 ms, 29033016 reclaimed leaving 579419360 used; max is 4415946752
>> INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,508 CompactionManager.java (line 396) AntiCompacted to /data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Data.db. 49074138589/36770836242 bytes for 5990912 keys. Time: 1438607ms.
>> INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,528 CompactionManager.java (line 339) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user-22-Data.db')]
>> INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,839 CompactionManager.java (line 396) AntiCompacted to /data/mysql/cassandrastorage/userdata/stream/user-24-Data.db. 28185244/21126422 bytes for 47722 keys. Time: 2310ms.
>> INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,840 CompactionManager.java (line 339) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_lists-502-Data.db')]
>> INFO [COMPACTION-POOL:1] 2010-09-14 1
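As mentioned at the top of this thread, the timeout that was raised is presumably the 0.6-era RpcTimeoutInMillis setting in storage-conf.xml; a sketch of the change (the value 30000 is illustrative, not the one actually used here):

    <!-- How long this node waits for replies to inter-node messages before
         timing out; raising it was the fix described above. -->
    <RpcTimeoutInMillis>30000</RpcTimeoutInMillis>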
Re: Building an Ubuntu / Debian package for Cassandra
On Sep 16, 2010, at 2:03 PM, Francois Richard wrote:
> Thanks Clint,
>
> I am going to look up the links below. I am pretty new to DEB packaging in
> general, and from what I have seen so far, a lot of the tutorials on the
> web are mostly based on the classic [ ./configure | make | make install ]
> flow of an application built in C. In this case I wanted to figure out DEB
> packaging in the context of a Java application. I'll read more and will
> stay in touch.

Actually there's a lot of ambiguity in packaging that arises given Java's unique properties as a compiled, architecture-independent platform. I wouldn't recommend *starting* your Debian packaging journey with Java. Maybe find a nice C program first. ;)

> My goal, at the end of the day, is to install "the stock" package for
> Cassandra and then to create a special Cassandra-config package that would
> move and deploy my customized configuration files on the system.

You're probably better off using a configuration management system such as puppet, chef, or cfengine to.. well.. manage your configuration.
questions on cassandra (repair and multi-datacenter)
Hi,

I have a few questions and was looking for answers. I have a cluster of 7 Cassandra 0.6.5 nodes in my test setup, RF=2. The original data size is about 100 gigs; with RF=2, I see the total load on the cluster is about 200 gigs, all good.

1. I was looking to increase the RF to 3. This process entails changing the config and calling repair on the keyspace one node at a time, right? So I started with one node at a time: changed the config file on the first node for the keyspace, restarted the node, and then called a nodetool repair on the node. I followed these same steps for every node after that, as I read somewhere that repair should be invoked one node at a time.
(a) What is the best way to ascertain that the repair is completed on a node?
(b) After the repair was finished, I was expecting the total data load to be 300 gigs. However, the ring command shows the total load to be 370 gigs. I double checked, and the config on all machines says RF=3. I am calling a cleanup on each node right now. Is cleanup required after calling a repair? Am I missing something?

2. This question is regarding multi-datacenter support. I plan to have a cluster of 6 machines across 2 datacenters, with the machines from the datacenters alternating on the ring, and RF=3. I already have the test setup described above, which has most of the data, but it's still configured with the default RackUnaware strategy. I was hoping to find the right steps to move it to the RackAware strategy with the PropertyFileEndpointSnitch that I read about somewhere (not sure if that's supported in 0.6.5, but the CustomEndPointSnitch is the same, right?), all without having to repopulate any data. Currently there is only 1 datacenter, but I was still planning to set the cluster up as it would be in a multi-datacenter deployment and run it like that in the one datacenter; when the second datacenter comes up, just copy all the files across to the new nodes in the second datacenter and bring the whole cluster up. Will this work? I have tried copying files to a new node, shutting down all nodes, and bringing everything back up, and it recognized the new IPs.

Thanks
Gurpreet
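On the PropertyFileEndpointSnitch point: that snitch reads a plain properties file mapping each node to datacenter:rack. A hypothetical layout for the 6-node, 2-DC plan described above (the file name and exact key syntax differ between the 0.6 contrib snitch and later releases, so check the contrib README before copying this):

    # ip = datacenter:rack
    192.168.1.1=DC1:RAC1
    192.168.1.2=DC2:RAC1
    192.168.1.3=DC1:RAC1
    192.168.1.4=DC2:RAC1
    192.168.1.5=DC1:RAC1
    192.168.1.6=DC2:RAC1
    default=DC1:RAC1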
What Thrift version does Cassandra 0.7 beta use?
What Thrift version does Cassandra 0.7 beta use?

--
Best regards,

Ivy Tang
Re: What Thrift version does Cassandra 0.7 beta use?
It doesn't use a specific released version - it uses a specific Subversion revision. The revision number is appended to the thrift jar in the cassandra lib folder (a name of the form libthrift-rNNNNNN.jar).

On Sep 16, 2010, at 9:10 PM, Ying Tang wrote:
> What Thrift version does Cassandra 0.7 beta use?
>
> --
> Best regards,
>
> Ivy Tang
Re: What Thrift version does Cassandra 0.7 beta use?
So the Thrift lib may change as Cassandra is updated?

On Thu, Sep 16, 2010 at 10:36 PM, Jeremy Hanna wrote:
> It doesn't use a specific released version - it uses a specific Subversion
> revision. The revision number is appended to the thrift jar in the
> cassandra lib folder.
> [...]

--
Best regards,

Ivy Tang
Re: questions on cassandra (repair and multi-datacenter)
On Thu, Sep 16, 2010 at 3:19 PM, Gurpreet Singh wrote:
> 1. I was looking to increase the RF to 3. This process entails changing the
> config and calling repair on the keyspace one node at a time, right?
> So I started with one node at a time: changed the config file on the first
> node for the keyspace, restarted the node, and then called a nodetool
> repair on the node.

You need to change the RF on _all_ nodes in the cluster _before_ running repair on _any_ of them. If nodes disagree on which nodes should have replicas for keys, repair will not work correctly. Different RF for the same keyspace creates that disagreement.

b
Re: questions on cassandra (repair and multi-datacenter)
Thanks Benjamin. I realised that; I have reverted using cleanup, got it back to the old state, and am testing the scenario exactly the way you put it.

On Thu, Sep 16, 2010 at 10:56 PM, Benjamin Black wrote:
> You need to change the RF on _all_ nodes in the cluster _before_
> running repair on _any_ of them. If nodes disagree on which nodes
> should have replicas for keys, repair will not work correctly.
> Different RF for the same keyspace creates that disagreement.
>
> b