Re: Cassandra search performance

2012-05-07 Thread David Jeske
On Sun, Apr 29, 2012 at 4:32 PM, Maxim Potekhin  wrote:

> Looking at your example, as I think you understand, you forgo indexes by
> combining two conditions in one query, thinking along the lines of what is
> often done in RDBMS. A scan is expected in this case, and there is no
> magic to avoid it.
>

This sounds like a misunderstanding of how RDBMSs work. If you combine two
conditions in a single SQL query, the SQL execution optimizer looks at the
cardinality of any indices. If it can successfully predict that one of the
conditions significantly reduces the set of rows that would be considered
(such as a status match having 200 hits vs 1M rows in the table), then it
selects this index for the first iteration, and each index hit causes a
record lookup which is then tested for the other conditions. (This is one
of several query-execution strategies RDBMS systems use.)

I'm no Cassandra expert, so I don't know what it does WRT index selection,
but from the page written on secondary indices, it seems like if you just
query on status, and do the other filtering yourself, it'll probably do what
you want...

http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
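For illustration, a minimal sketch of that approach over the Thrift API. The
keyspace, column family, and status value below are hypothetical placeholders;
9160 is the default Thrift port:

import java.nio.ByteBuffer;
import java.util.Arrays;
import org.apache.cassandra.thrift.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class StatusQuery
{
    public static void main(String[] args) throws Exception
    {
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("Keyspace1");

        // Single-clause index expression on the selective, indexed "status" column.
        IndexExpression expr = new IndexExpression(
            ByteBuffer.wrap("status".getBytes("UTF-8")),
            IndexOperator.EQ,
            ByteBuffer.wrap("pending".getBytes("UTF-8")));
        IndexClause clause = new IndexClause(Arrays.asList(expr), ByteBuffer.wrap(new byte[0]), 1000);

        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange(
            ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), false, 100));

        // The server walks only the status index; the second condition is
        // checked client-side against the (hopefully small) candidate set.
        for (KeySlice row : client.get_indexed_slices(
                new ColumnParent("usertable"), clause, predicate, ConsistencyLevel.ONE))
        {
            // ... test the other condition against row.getColumns() here ...
        }
        transport.close();
    }
}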


> However, if this query is important, you can easily index on two conditions,
> using a composite type (look it up), or string concatenation for a quick and
> easy solution.
>

This is not necessarily a good idea. Creating a composite index explodes
the index size unnecessarily. If a condition can reduce a query to 200
records, there is no need to have a composite index including another
condition.


Re: SSTableWriter and Bulk Loading life cycle enhancement

2012-05-07 Thread aaron morton
Can you copy the sstables as a task after the load operation? You should know 
where the files are. 

There are multiple files that may be created by the writer during the loading 
process, so running code that performs a long-running action will impact the 
time taken to pump data through the SSTableSimpleUnsortedWriter.
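If the proposed listener merely queues the work for a background thread, the
flush path stays fast. A minimal pure-JDK sketch of that idea (names are
illustrative):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// One background worker does the slow copies so the writer thread
// is never blocked inside the listener callback.
final ExecutorService copier = Executors.newSingleThreadExecutor();

// inside onSSTableWrittenAndClosed(tableName, columnFamilyName, filename):
copier.submit(new Runnable()
{
    public void run()
    {
        // long-running HDFS copy of `filename` goes here
    }
});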

WRT the patch, the best place to start the conversation for this is on 
https://issues.apache.org/jira/browse/CASSANDRA 

Thanks for taking the time to look into this. 

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 3/05/2012, at 11:40 PM, Benoit Perroud wrote:

> Hi All,
> 
> I'm bulk loading (a lot of) data from Hadoop into Cassandra 1.0.x. The
> provided CFOutputFormat is not the best fit here, so I wanted to use the
> bulk loading feature. I know 1.1 comes with a BulkOutputFormat, but I
> wanted to propose a simple enhancement to SSTableSimpleUnsortedWriter
> that could ease life:
> 
> When the table is flushed to disk, it could be interesting to
> have listeners that could be triggered to perform any action (copying
> my SSTable into HDFS for instance).
> 
> Please have a look at the patch below to get a better idea. Do you
> think it would be worthwhile opening a jira for this?
> 
> 
> Regarding 1.1 BulkOutputFormat and bulk loading in general, the work done to
> have a light client to stream into the cluster is really great. The
> issue now is that data is streamed at the end of the task only. This
> causes all the tasks to store the data locally and stream everything
> at the end. Lots of temporary space may be needed, and a lot of
> bandwidth to the nodes is used at the "same" time. With the listener,
> we would be able to start streaming as soon as the first table is
> created. That way the streaming bandwidth could be better balanced.
> Jira for this also?
> 
> Thanks
> 
> Benoit.
> 
> 
> 
> 
> --- a/src/java/org/apache/cassandra/io/sstable/SSTableSimpleUnsortedWriter.java
> +++ b/src/java/org/apache/cassandra/io/sstable/SSTableSimpleUnsortedWriter.java
> @@ -21,6 +21,8 @@ package org.apache.cassandra.io.sstable;
> import java.io.File;
> import java.io.IOException;
> import java.nio.ByteBuffer;
> +import java.util.LinkedList;
> +import java.util.List;
> import java.util.Map;
> import java.util.TreeMap;
> 
> @@ -47,6 +49,8 @@ public class SSTableSimpleUnsortedWriter extends AbstractSSTableSimpleWriter
> private final long bufferSize;
> private long currentSize;
> 
> +    private final List<SSTableWriterListener> sSTableWrittenListeners = new LinkedList<SSTableWriterListener>();
> +
> /**
>  * Create a new buffering writer.
>  * @param directory the directory where to write the sstables
> @@ -123,5 +127,16 @@ public class SSTableSimpleUnsortedWriter extends AbstractSSTableSimpleWriter
> }
> currentSize = 0;
> keys.clear();
> +
> +        // Notify the registered listeners
> +        for (SSTableWriterListener listener : sSTableWrittenListeners)
> +        {
> +            listener.onSSTableWrittenAndClosed(writer.getTableName(), writer.getColumnFamilyName(), writer.getFilename());
> +        }
> +    }
> +
> +    public void addSSTableWriterListener(SSTableWriterListener listener)
> +    {
> +        sSTableWrittenListeners.add(listener);
> }
> }
> diff --git a/src/java/org/apache/cassandra/io/sstable/SSTableWriterListener.java b/src/java/org/apache/cassandra/io/sstable/SSTableWriterListener.java
> new file mode 100644
> index 000..6628d20
> --- /dev/null
> +++ b/src/java/org/apache/cassandra/io/sstable/SSTableWriterListener.java
> @@ -0,0 +1,9 @@
> +package org.apache.cassandra.io.sstable;
> +
> +import java.io.IOException;
> +
> +public interface SSTableWriterListener {
> +
> +    void onSSTableWrittenAndClosed(final String tableName, final String columnFamilyName, final String filename) throws IOException;
> +
> +}
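For context, a minimal sketch of how the listener might be used once the patch
is applied. The writer constructor follows the 1.0 bulk-loading example;
directory, keyspace, and column family names are placeholders:

import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.cassandra.db.marshal.BytesType;
import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;
import org.apache.cassandra.io.sstable.SSTableWriterListener;

public class ListenerExample
{
    public static void main(String[] args) throws IOException
    {
        SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(
            new File("/tmp/sstables"), "Keyspace1", "Standard1",
            BytesType.instance, null, 64); // flush roughly every 64 MB

        writer.addSSTableWriterListener(new SSTableWriterListener()
        {
            public void onSSTableWrittenAndClosed(String tableName, String columnFamilyName, String filename) throws IOException
            {
                // Kick off the HDFS copy (or streaming) of this sstable here,
                // while the writer carries on buffering the next one.
                System.out.println("Flushed " + tableName + "/" + columnFamilyName + ": " + filename);
            }
        });

        writer.newRow(ByteBuffer.wrap("key1".getBytes("UTF-8")));
        writer.addColumn(ByteBuffer.wrap("col".getBytes("UTF-8")),
                         ByteBuffer.wrap("val".getBytes("UTF-8")),
                         System.currentTimeMillis() * 1000);
        writer.close();
    }
}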



Re: count after truncate NOT zero

2012-05-07 Thread aaron morton
I don't know the YCSB code, but one theory would be…

1) The cluster is overloaded by the test. 
2) A write at CL ALL fails because a node does not respond in time. 
3) The coordinator stores the hint and returns failure to the client. 
4) The client gets an UnavailableException and retries the operation. 

Did the nodes show any dropped messages? Either in nodetool tpstats or in the 
logs?

Truncate is a metadata operation, unlike deleting columns or rows. When a column 
is deleted a Tombstone column is written; when a row is deleted, information is 
associated with the key, in the context of the CF. Truncate snapshots and then 
deletes the SSTables on disk; it does not write to the SSTables. So it is 
possible for a write to be stored with a lower timestamp than the truncate, 
because truncate does not have a timestamp. 

cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/05/2012, at 1:28 AM, Peter Dijkshoorn wrote:

> Hi guys,
> 
> I got a weird thing popping up twice today. I run a test where I insert
> a million records via YCSB, which I edited to allow me to adjust the
> consistency level: the write operations are done with ConsistencyLevel.ALL.
> This is sent to a 4 (virtual) node cluster with a keyspace 'test' set up
> with replication factor 3.
> Now I expect that because of the ConsistencyLevel.ALL there is no hinted
> handoff active, since writes are to be accepted by all nodes before the
> operation returns to the client. The client gets only OK back, none fails.
> After the test I run a truncate, and a count which reveals still active
> records; time does not matter, I have to re-invoke the truncate to
> remove the records.
> 
> [cqlsh 2.0.0 | Cassandra 1.0.8 | CQL spec 2.0.0 | Thrift protocol 19.20.0]
> cqlsh> use test;
> cqlsh:test> truncate usertable;
> cqlsh:test> select count(*) from usertable ;
> count
> ---
> 3
> 
> 
> On the cassandra output (-f) I can see that there is some handoff-ing
> active, which I did not expect.
> 
> Has anyone an idea why the handoff is active while issuing operations
> with ConsistencyLevel.ALL?
> Why is the truncate not correctly put in sync, allowing subsequent
> handoffs to deliver records originally written before the truncate?
> 
> Thanks if you can clarify these things; I did not expect this at all.
> 
> Cheers,
> 
> Peter
> 
> -- 
> Peter Dijkshoorn
> Adyen - Payments Made Easy
> www.adyen.com
> 
> Visiting address: Simon Carmiggeltstraat 6-50, 1011 DJ Amsterdam, The Netherlands
> Mail address: P.O. Box 10095, 1001 EB Amsterdam, The Netherlands
> 
> Office +31.20.240.1240
> Email peter.dijksho...@adyen.com
> 



Re: Bulk loading and timestamps

2012-05-07 Thread aaron morton
Yes. See the example here http://www.datastax.com/dev/blog/bulk-loading
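For reference, a minimal sketch of why timestamps survive: the simple writer
takes each column's timestamp explicitly, and sstableloader streams the
resulting sstable as-is. The constructor follows the 1.0 bulk-loading example;
paths and names are placeholders:

import java.io.File;
import java.nio.ByteBuffer;
import org.apache.cassandra.db.marshal.AsciiType;
import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;

public class TimestampExample
{
    public static void main(String[] args) throws Exception
    {
        SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(
            new File("/tmp/sstables"), "Keyspace1", "Standard1",
            AsciiType.instance, null, 32);

        long originalTimestamp = 1336400000000000L; // microseconds, from the source data
        writer.newRow(ByteBuffer.wrap("key1".getBytes("UTF-8")));
        // The timestamp argument below is written into the sstable unchanged,
        // so the loader preserves it.
        writer.addColumn(ByteBuffer.wrap("col".getBytes("UTF-8")),
                         ByteBuffer.wrap("val".getBytes("UTF-8")),
                         originalTimestamp);
        writer.close();
    }
}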

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/05/2012, at 2:49 AM, Oleg Proudnikov wrote:

> Hello, group
> 
> Will the bulk loader preserve original column timestamps?
> 
> Thank you very much,
> Oleg
> 
> 



Re: count after truncate NOT zero

2012-05-07 Thread Peter Dijkshoorn
Check, I understand. Thanks!

The cluster certainly was overloaded and I did not realize that truncate
does not tombstone or have a timestamp. Some 'feature' for future
implementation, maybe?
It seems odd if you expect the same behaviour as "delete from usertable"
(in SQL, not yet in CQL, I presume), especially because the truncate is
synced over all nodes before it returns to the client, so a truncate may
rightfully discard its handoffs, right?
By the way, it was very hard to replicate this behaviour; it seems to be a rare
occurrence...

I am surprised, though, that you don't know YCSB. What do you use for
performance testing? Did you write your own tool, or do you use another one?
If the latter, I would like to know what you use :)

Ciao


Peter Dijkshoorn
Adyen - Payments Made Easy
www.adyen.com

Visiting address: Simon Carmiggeltstraat 6-50, 1011 DJ Amsterdam, The Netherlands
Mail address: P.O. Box 10095, 1001 EB Amsterdam, The Netherlands

Office +31.20.240.1240
Email peter.dijksho...@adyen.com


On 05/07/2012 12:59 PM, aaron morton wrote:
> I don't know the YCSB code, but one theory would be...
>
> 1) The cluster is overloaded by the test. 
> 2) A write at CL ALL fails because a node does not respond in time. 
> 3) The coordinator stores the hint and returns failure to the client. 
> 4) The client gets an UnavailableException and retries the operation. 
>
> Did the nodes show any dropped messages? Either in nodetool tpstats
> or in the logs?
>
> Truncate is a metadata operation, unlike deleting columns or rows. When
> a column is deleted a Tombstone column is written; when a row is deleted,
> information is associated with the key, in the context of the CF.
> Truncate snapshots and then deletes the SSTables on disk; it does not
> write to the SSTables. So it is possible for a write to be stored with
> a lower timestamp than the truncate, because truncate does not have a
> timestamp. 
>
> cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/05/2012, at 1:28 AM, Peter Dijkshoorn wrote:
>
>> Hi guys,
>>
>> I got a weird thing popping up twice today. I run a test where I insert
>> a million records via YCSB, which I edited to allow me to adjust the
>> consistency level: the write operations are done with
>> ConsistencyLevel.ALL.
>> This is sent to a 4 (virtual) node cluster with a keyspace 'test' set up
>> with replication factor 3.
>> Now I expect that because of the ConsistencyLevel.ALL there is no hinted
>> handoff active, since writes are to be accepted by all nodes before the
>> operation returns to the client. The client gets only OK back, none
>> fails.
>> After the test I run a truncate, and a count which reveals still active
>> records; time does not matter, I have to re-invoke the truncate to
>> remove the records.
>>
>> [cqlsh 2.0.0 | Cassandra 1.0.8 | CQL spec 2.0.0 | Thrift protocol
>> 19.20.0]
>> cqlsh> use test;
>> cqlsh:test> truncate usertable;
>> cqlsh:test> select count(*) from usertable ;
>> count
>> ---
>> 3
>>
>>
>> On the cassandra output (-f) I can see that there is some handoff-ing
>> active, which I did not expect.
>>
>> Has anyone an idea why the handoff is active while issuing operations
>> with ConsistencyLevel.ALL?
>> Why is the truncate not correctly put in sync, allowing subsequent
>> handoffs to deliver records originally written before the truncate?
>>
>> Thanks if you can clarify these things; I did not expect this at all.
>>
>> Cheers,
>>
>> Peter
>>
>> -- 
>> Peter Dijkshoorn
>> Adyen - Payments Made Easy
>> www.adyen.com 
>>
>> Visiting address: Simon Carmiggeltstraat 6-50, 1011 DJ Amsterdam, The Netherlands
>> Mail address: P.O. Box 10095, 1001 EB Amsterdam, The Netherlands
>>
>> Office +31.20.240.1240
>> Email peter.dijksho...@adyen.com
>>
>


sstableloader 1.1 won't stream

2012-05-07 Thread Pieter Callewaert
Hi,

I'm trying to upgrade our bulk load process in our testing env.
We use the SSTableSimpleUnsortedWriter to write tables, and use sstableloader 
to stream them into our cluster.
I've changed the writer program to fit the 1.1 API, but now I'm having 
trouble loading the tables into our cluster. The cluster consists of one 1.1 node 
and two 1.0.9 nodes.

I've enabled debug as a parameter and in the log4j conf.

[root@bms-app1 ~]# ./apache-cassandra/bin/sstableloader --debug -d 10.10.10.100 
/tmp/201205071234/MapData024/HOS/
INFO 16:25:40,735 Opening /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1 
(1588949 bytes)
INFO 16:25:40,755 JNA not found. Native methods will be disabled.
DEBUG 16:25:41,060 INDEX LOAD TIME for 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1: 327 ms.
Streaming revelant part of 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db to [/10.10.10.102, 
/10.10.10.100, /10.10.10.101]
INFO 16:25:41,083 Stream context metadata 
[/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 
progress=0/6557280 - 0%], 1 sstables.
DEBUG 16:25:41,084 Adding file 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db to be streamed.
INFO 16:25:41,087 Streaming to /10.10.10.102
DEBUG 16:25:41,092 Files are 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 
progress=0/6557280 - 0%
INFO 16:25:41,099 Stream context metadata 
[/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 
progress=0/6551840 - 0%], 1 sstables.
DEBUG 16:25:41,100 Adding file 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db to be streamed.
INFO 16:25:41,100 Streaming to /10.10.10.100
DEBUG 16:25:41,100 Files are 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 
progress=0/6551840 - 0%
INFO 16:25:41,102 Stream context metadata 
[/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=2 
progress=0/6566400 - 0%], 1 sstables.
DEBUG 16:25:41,102 Adding file 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db to be streamed.
INFO 16:25:41,102 Streaming to /10.10.10.101
DEBUG 16:25:41,102 Files are 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=2 
progress=0/6566400 - 0%

progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] [/10.10.10.101 0/1 
(0)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 16:25:41,107 Failed attempt 1 to 
connect to /10.10.10.101 to stream 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=2 
progress=0/6566400 - 0%. Retrying in 4000 ms. (java.net.SocketException: 
Invalid argument or cannot assign requested address)
WARN 16:25:41,108 Failed attempt 1 to connect to /10.10.10.102 to stream 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 
progress=0/6557280 - 0%. Retrying in 4000 ms. (java.net.SocketException: 
Invalid argument or cannot assign requested address)
WARN 16:25:41,108 Failed attempt 1 to connect to /10.10.10.100 to stream 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 
progress=0/6551840 - 0%. Retrying in 4000 ms. (java.net.SocketException: 
Invalid argument or cannot assign requested address)
progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] [/10.10.10.101 0/1 
(0)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 16:25:45,109 Failed attempt 2 to 
connect to /10.10.10.101 to stream 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=2 
progress=0/6566400 - 0%. Retrying in 8000 ms. (java.net.SocketException: 
Invalid argument or cannot assign requested address)
WARN 16:25:45,110 Failed attempt 2 to connect to /10.10.10.102 to stream 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 
progress=0/6557280 - 0%. Retrying in 8000 ms. (java.net.SocketException: 
Invalid argument or cannot assign requested address)
WARN 16:25:45,110 Failed attempt 2 to connect to /10.10.10.100 to stream 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 
progress=0/6551840 - 0%. Retrying in 8000 ms. (java.net.SocketException: 
Invalid argument or cannot assign requested address)
progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] [/10.10.10.101 0/1 
(0)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 16:25:53,113 Failed attempt 3 to 
connect to /10.10.10.101 to stream 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=2 
progress=0/6566400 - 0%. Retrying in 16000 ms. (java.net.SocketException: 
Invalid argument or cannot assign requested address)
WARN 16:25:53,114 Failed attempt 3 to connect to /10.10.10.102 to stream 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 
progress=0/6557280 - 0%. Retrying in 16000 ms. (java.net.SocketException: 
Invalid argument or cannot assign requested address)
WARN 16:25:53,115 Failed attempt 3 to connect to /10.10.10.100 to stream 
/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 
progress=0/6551840 - 0%. Retrying in 16000 ms. (java.net.SocketException: 
Invalid argument or cannot assign requested address)

Re: getting status of long running repair

2012-05-07 Thread Bill Au
I restarted the nodes and then restarted the repair.  It is still hanging
like before.  Do I keep repeating this until the repair actually finishes?

Bill

On Fri, May 4, 2012 at 2:18 PM, Rob Coli  wrote:

> On Fri, May 4, 2012 at 10:30 AM, Bill Au  wrote:
> > I know repair may take a long time to run.  I am running repair on a node
> > with about 15 GB of data and it is taking more than 24 hours.  Is that
> > normal?  Is there any way to get status of the repair?  tpstats does
> show 2
> > active and 2 pending AntiEntropySessions.  But netstats and
> compactionstats
> > show no activity.
>
> As indicated by various recent threads to this effect, many versions
> of cassandra (including current 1.0.x release) contain bugs which
> sometimes prevent repair from completing. The other threads suggest
> that some of these bugs result in the state you are in now, where you
> do not see anything that looks like appropriate activity.
> Unfortunately the only solution offered on these other threads is the
> one I will now offer, which is to restart the participating nodes and
> re-start the repair. I am unaware of any JIRA tickets tracking these
> bugs (which doesn't mean they don't exist, of course) so you might
> want to file one. :)
>
> =Rob
>
> --
> =Robert Coli
> AIM>ALK - rc...@palominodb.com
> YAHOO - rcoli.palominob
> SKYPE - rcoli_palominodb
>
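As an aside, one way to watch those AntiEntropySessions without repeatedly
shelling out to nodetool is to poll the same counters over JMX. A hedged
sketch follows; the MBean and attribute names are assumptions based on what
nodetool tpstats exposes, so verify them with jconsole first:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RepairWatch
{
    public static void main(String[] args) throws Exception
    {
        // Default Cassandra JMX port; host is a placeholder.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbs = connector.getMBeanServerConnection();

        // Assumed to be the MBean behind the AntiEntropySessions line in tpstats.
        ObjectName sessions = new ObjectName(
            "org.apache.cassandra.internal:type=AntiEntropySessions");
        System.out.println("active:  " + mbs.getAttribute(sessions, "ActiveCount"));
        System.out.println("pending: " + mbs.getAttribute(sessions, "PendingTasks"));
        connector.close();
    }
}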


cassandra1.1 can't start

2012-05-07 Thread cyril auburtin
The cassandra launch command worked the first time

then now I keep getting

INFO 18:18:39,354 Starting up server gossip
ERROR 18:18:39,357 Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
java.io.IOError: java.io.IOException: Map failed
    at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:127)
    at org.apache.cassandra.db.commitlog.CommitLogAllocator$3.run(CommitLogAllocator.java:191)
    at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.IOException: Map failed
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:803)
    at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:119)
    ... 4 more
Caused by: java.lang.OutOfMemoryError: Map failed
    at sun.nio.ch.FileChannelImpl.map0(Native Method)
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:800)
    ... 5 more
ERROR 18:18:39,361 Exception in thread Thread[StorageServiceShutdownHook,5,main]
java.lang.NullPointerException
    at org.apache.cassandra.gms.Gossiper.stop(Gossiper.java:1113)
    at org.apache.cassandra.service.StorageService$2.runMayThrow(StorageService.java:478)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.lang.Thread.run(Thread.java:636)

I have tried rebooting, starting it alone, clearing all /var/lib/cassandra
dir
but keep getting this error

any idea?


Re: cassandra1.1 can't start

2012-05-07 Thread cyril auburtin
Well, I uncommented lines 96 & 97 in cassandra-env.sh, with lower values:

MAX_HEAP_SIZE="500M"
HEAP_NEWSIZE="100M"

That seems to fix it.

2012/5/7 cyril auburtin 

> The cassandra launch command worked the first time
>
> then now I keep getting
>
> INFO 18:18:39,354 Starting up server gossip
> ERROR 18:18:39,357 Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
> java.io.IOError: java.io.IOException: Map failed
>     at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:127)
>     at org.apache.cassandra.db.commitlog.CommitLogAllocator$3.run(CommitLogAllocator.java:191)
>     at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     at java.lang.Thread.run(Thread.java:636)
> Caused by: java.io.IOException: Map failed
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:803)
>     at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:119)
>     ... 4 more
> Caused by: java.lang.OutOfMemoryError: Map failed
>     at sun.nio.ch.FileChannelImpl.map0(Native Method)
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:800)
>     ... 5 more
> ERROR 18:18:39,361 Exception in thread Thread[StorageServiceShutdownHook,5,main]
> java.lang.NullPointerException
>     at org.apache.cassandra.gms.Gossiper.stop(Gossiper.java:1113)
>     at org.apache.cassandra.service.StorageService$2.runMayThrow(StorageService.java:478)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     at java.lang.Thread.run(Thread.java:636)
>
> I have tried rebooting, starting it alone, clearing all /var/lib/cassandra
> dir
> but keep getting this error
>
> any idea?
>


Re: sstableloader 1.1 won't stream

2012-05-07 Thread Benoit Perroud
You may want to upgrade all your nodes to 1.1.

The streaming process connects to every living node of the cluster
(you can explicitly disable some nodes), so all nodes need to speak
1.1.



2012/5/7 Pieter Callewaert :
> Hi,
>
>
>
> I’m trying to upgrade our bulk load process in our testing env.
>
> We use the SSTableSimpleUnsortedWriter to write tables, and use
> sstableloader to stream them into our cluster.
>
> I’ve changed the writer program to fit the 1.1 API, but now I’m having
> trouble loading the tables into our cluster. The cluster consists of one 1.1 node
> and two 1.0.9 nodes.
>
>
>
> I’ve enabled debug as a parameter and in the log4j conf.
>
>
>
> [root@bms-app1 ~]# ./apache-cassandra/bin/sstableloader --debug -d
> 10.10.10.100 /tmp/201205071234/MapData024/HOS/
>
> INFO 16:25:40,735 Opening
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1 (1588949 bytes)
>
> INFO 16:25:40,755 JNA not found. Native methods will be disabled.
>
> DEBUG 16:25:41,060 INDEX LOAD TIME for
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1: 327 ms.
>
> Streaming revelant part of
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db to
> [/10.10.10.102, /10.10.10.100, /10.10.10.101]
>
> INFO 16:25:41,083 Stream context metadata
> [/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1
> progress=0/6557280 - 0%], 1 sstables.
>
> DEBUG 16:25:41,084 Adding file
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db to be streamed.
>
> INFO 16:25:41,087 Streaming to /10.10.10.102
>
> DEBUG 16:25:41,092 Files are
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1
> progress=0/6557280 - 0%
>
> INFO 16:25:41,099 Stream context metadata
> [/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1
> progress=0/6551840 - 0%], 1 sstables.
>
> DEBUG 16:25:41,100 Adding file
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db to be streamed.
>
> INFO 16:25:41,100 Streaming to /10.10.10.100
>
> DEBUG 16:25:41,100 Files are
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1
> progress=0/6551840 - 0%
>
> INFO 16:25:41,102 Stream context metadata
> [/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=2
> progress=0/6566400 - 0%], 1 sstables.
>
> DEBUG 16:25:41,102 Adding file
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db to be streamed.
>
> INFO 16:25:41,102 Streaming to /10.10.10.101
>
> DEBUG 16:25:41,102 Files are
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=2
> progress=0/6566400 - 0%
>
>
>
> progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] [/10.10.10.101 0/1
> (0)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 16:25:41,107 Failed attempt 1 to
> connect to /10.10.10.101 to stream
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=2
> progress=0/6566400 - 0%. Retrying in 4000 ms. (java.net.SocketException:
> Invalid argument or cannot assign requested address)
>
> WARN 16:25:41,108 Failed attempt 1 to connect to /10.10.10.102 to stream
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1
> progress=0/6557280 - 0%. Retrying in 4000 ms. (java.net.SocketException:
> Invalid argument or cannot assign requested address)
>
> WARN 16:25:41,108 Failed attempt 1 to connect to /10.10.10.100 to stream
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1
> progress=0/6551840 - 0%. Retrying in 4000 ms. (java.net.SocketException:
> Invalid argument or cannot assign requested address)
>
> progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] [/10.10.10.101 0/1
> (0)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 16:25:45,109 Failed attempt 2 to
> connect to /10.10.10.101 to stream
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=2
> progress=0/6566400 - 0%. Retrying in 8000 ms. (java.net.SocketException:
> Invalid argument or cannot assign requested address)
>
> WARN 16:25:45,110 Failed attempt 2 to connect to /10.10.10.102 to stream
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1
> progress=0/6557280 - 0%. Retrying in 8000 ms. (java.net.SocketException:
> Invalid argument or cannot assign requested address)
>
> WARN 16:25:45,110 Failed attempt 2 to connect to /10.10.10.100 to stream
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1
> progress=0/6551840 - 0%. Retrying in 8000 ms. (java.net.SocketException:
> Invalid argument or cannot assign requested address)
>
> progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] [/10.10.10.101 0/1
> (0)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 16:25:53,113 Failed attempt 3 to
> connect to /10.10.10.101 to stream
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=2
> progress=0/6566400 - 0%. Retrying in 16000 ms. (java.net.SocketException:
> Invalid argument or cannot assign requested address)
>
> WARN 16:25:53,114 Failed attempt 3 to connect to /10.10.10.102 to stream
> /tmp/201205071234/Ma

Error deleting column families with 1.1

2012-05-07 Thread André Cruz

Hello.

Since I upgraded to Cassandra 1.1, I get the following error when  
trying to delete a CF. After this happens the CF is not accessible  
anymore, but I cannot create another one with the same name until I  
restart the server.


INFO [MigrationStage:1] 2012-05-07 18:10:12,682 ColumnFamilyStore.java (line 634) Enqueuing flush of Memtable-schema_columnfamilies@1128094887(978/1222 serialized/live bytes, 21 ops)
INFO [FlushWriter:2] 2012-05-07 18:10:12,682 Memtable.java (line 266) Writing Memtable-schema_columnfamilies@1128094887(978/1222 serialized/live bytes, 21 ops)
INFO [FlushWriter:2] 2012-05-07 18:10:12,720 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-28-Data.db (1041 bytes)
INFO [MigrationStage:1] 2012-05-07 18:10:12,721 ColumnFamilyStore.java (line 634) Enqueuing flush of Memtable-schema_columns@1599271050(392/490 serialized/live bytes, 8 ops)
INFO [FlushWriter:2] 2012-05-07 18:10:12,722 Memtable.java (line 266) Writing Memtable-schema_columns@1599271050(392/490 serialized/live bytes, 8 ops)
INFO [CompactionExecutor:8] 2012-05-07 18:10:12,722 CompactionTask.java (line 114) Compacting [SSTableReader(path='/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-26-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-28-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-27-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-25-Data.db')]
INFO [FlushWriter:2] 2012-05-07 18:10:12,806 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/system/schema_columns/system-schema_columns-hc-23-Data.db (447 bytes)
INFO [CompactionExecutor:8] 2012-05-07 18:10:12,811 CompactionTask.java (line 225) Compacted to [/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-29-Data.db,].  24,797 to 21,431 (~86% of original) bytes for 2 keys at 0.232252MB/s.  Time: 88ms.
ERROR [MigrationStage:1] 2012-05-07 18:10:12,895 CLibrary.java (line 158) Unable to create hard link
com.sun.jna.LastErrorException: errno was 17
    at org.apache.cassandra.utils.CLibrary.link(Native Method)
    at org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:150)
    at org.apache.cassandra.db.Directories.snapshotLeveledManifest(Directories.java:343)
    at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1450)
    at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1483)
    at org.apache.cassandra.db.DefsTable.dropColumnFamily(DefsTable.java:512)
    at org.apache.cassandra.db.DefsTable.mergeColumnFamilies(DefsTable.java:403)
    at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:270)
    at org.apache.cassandra.service.MigrationManager$1.call(MigrationManager.java:214)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
ERROR [Thrift:17] 2012-05-07 18:10:12,898 CustomTThreadPoolServer.java (line 204) Error occurred during processing of message.
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.io.IOError: java.io.IOException: Unable to create hard link from /var/lib/cassandra/data/Disco/Client/Client.json to /var/lib/cassandra/data/Disco/Client/snapshots/1336410612893-Client/Client.json (errno 17)
    at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:372)
    at org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:191)
    at org.apache.cassandra.service.MigrationManager.announceColumnFamilyDrop(MigrationManager.java:182)
    at org.apache.cassandra.thrift.CassandraServer.system_drop_column_family(CassandraServer.java:948)
    at org.apache.cassandra.thrift.Cassandra$Processor$system_drop_column_family.getResult(Cassandra.java:3348)
    at org.apache.cassandra.thrift.Cassandra$Processor$system_drop_column_family.getResult(Cassandra.java:3336)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.util.concurrent.ExecutionException: java.io.IOError: java.io.IOException: Unable to create hard l

using the proxy on the cli or configHelper to connect to cassandra server

2012-05-07 Thread Shawna Qian
Hello:

In our Cassandra setup, we need to specify a proxy to access the
cluster; in Java code it looks like this:

Proxy proxy = new Proxy(Proxy.Type.SOCKS,
        new InetSocketAddress("socks.corp.yahoo.com", 1080));
Socket socket = new Socket(proxy);
socket.connect(new InetSocketAddress(cassieHostName, cassiePort));
TSocket tsocket = new TSocket(socket);
TTransport tr = new TFramedTransport(tsocket);


But I am not sure how to specify this in the cassandra-cli.  Also if I use
configHelper in hadoop jobs, how can I specify the proxy information?

Thx
Shawna
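(One option, offered here as an assumption rather than something confirmed in
this thread: cassandra-cli and Hadoop tasks are ordinary JVMs, so the standard
JVM-wide SOCKS properties may do the job, set either as -DsocksProxyHost=...
-DsocksProxyPort=... in JVM_OPTS or programmatically before any socket is
opened. A minimal Java sketch, reusing the host and port from the question:)

// Standard java.net SOCKS properties; Thrift's TSocket wraps plain
// java.net sockets, so connections should be routed through the proxy.
System.setProperty("socksProxyHost", "socks.corp.yahoo.com");
System.setProperty("socksProxyPort", "1080");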



[ANN] Mojo's Cassandra Maven Plugin 1.1.0-1 released

2012-05-07 Thread Stephen Connolly
The Mojo team is pleased to announce the release of Mojo's Cassandra
Maven Plugin version 1.1.0-1.

Mojo's Cassandra Plugin is used when you want to install and control a
test instance of Apache Cassandra from within your Apache Maven build.

The Cassandra Plugin has the following goals.

 * cassandra:start Starts up a test instance of Cassandra in the background.
 * cassandra:stop Stops the test instance of Cassandra that was
started using cassandra:start.
 * cassandra:start-cluster Starts up a test cluster of Cassandra in
the background bound to the local loopback IP addresses 127.0.0.1,
127.0.0.2, etc.
 * cassandra:stop-cluster Stops the test cluster of Cassandra that was
started using cassandra:start-cluster.
 * cassandra:run Starts up a test instance of Cassandra in the foreground.
 * cassandra:load Runs a cassandra-cli script against the test
instance of Cassandra.
 * cassandra:repair Runs nodetool repair against the test instance of
Cassandra.
 * cassandra:flush Runs nodetool flush against the test instance of Cassandra.
 * cassandra:compact Runs nodetool compact against the test instance
of Cassandra.
 * cassandra:cleanup Runs nodetool cleanup against the test instance
of Cassandra.
 * cassandra:delete Deletes the test instance of Cassandra.
 * cassandra:cql-exec Execute a CQL statement (directly or from a
file) against the test instance of Cassandra.

http://mojo.codehaus.org/cassandra-maven-plugin/

To use this version, simply specify the version in your project's
plugin configuration:


<plugin>
   <groupId>org.codehaus.mojo</groupId>
   <artifactId>cassandra-maven-plugin</artifactId>
   <version>1.1.0-1</version>
</plugin>



Release Notes - Mojo's Cassandra Maven Plugin - Version 1.1.0-1

** Bug
* [MCASSANDRA-15] - Whitespace in path breaks execution

** New Feature
* [MCASSANDRA-18] - Support Cassandra 1.1

Enjoy,

The Mojo team.

Apache, Apache Maven, Apache Cassandra, Maven and Cassandra are
trademarks of The Apache Software Foundation.


CQL 3.0 composite keys and secondary indexes

2012-05-07 Thread Roland Mechler
It seems as though secondary indexes are not supported in tables (column
families) that have composite keys. Is that true? If so, are there plans to
support that combination in the future?

-Roland


Re: Cassandra search performance

2012-05-07 Thread Maxim Potekhin

Thanks for the comments, much appreciated.

Maxim


On 5/7/2012 3:22 AM, David Jeske wrote:
On Sun, Apr 29, 2012 at 4:32 PM, Maxim Potekhin wrote:


Looking at your example, as I think you understand, you forgo indexes by
combining two conditions in one query, thinking along the lines of what is
often done in RDBMS. A scan is expected in this case, and there is no
magic to avoid it.


This sounds like a misunderstanding of how RDBMSs work. If you
combine two conditions in a single SQL query, the SQL execution
optimizer looks at the cardinality of any indices. If it can
successfully predict that one of the conditions significantly reduces
the set of rows that would be considered (such as a status match
having 200 hits vs 1M rows in the table), then it selects this index
for the first iteration, and each index hit causes a record lookup
which is then tested for the other conditions. (This is one of
several query-execution strategies RDBMS systems use.)


I'm no Cassandra expert, so I don't know what it does WRT
index selection, but from the page written on secondary indices, it
seems like if you just query on status, and do the other filtering
yourself, it'll probably do what you want...


http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes

However, if this query is important, you can easily index on two conditions,
using a composite type (look it up), or string concatenation for a quick and
easy solution.


This is not necessarily a good idea. Creating a composite index 
explodes the index size unnecessarily. If a condition can reduce a 
query to 200 records, there is no need to have a composite index 
including another condition.




Re: getting status of long running repair

2012-05-07 Thread Ben Coverston
Check the log files for warnings or errors. They may indicate why your
repair failed.

On Mon, May 7, 2012 at 10:09 AM, Bill Au  wrote:

> I restarted the nodes and then restarted the repair.  It is still hanging
> like before.  Do I keep repeating this until the repair actually finishes?
>
> Bill
>
>
> On Fri, May 4, 2012 at 2:18 PM, Rob Coli  wrote:
>
>> On Fri, May 4, 2012 at 10:30 AM, Bill Au  wrote:
>> > I know repair may take a long time to run.  I am running repair on a
>> node
>> > with about 15 GB of data and it is taking more than 24 hours.  Is that
>> > normal?  Is there any way to get status of the repair?  tpstats does
>> show 2
>> > active and 2 pending AntiEntropySessions.  But netstats and
>> compactionstats
>> > show no activity.
>>
>> As indicated by various recent threads to this effect, many versions
>> of cassandra (including current 1.0.x release) contain bugs which
>> sometimes prevent repair from completing. The other threads suggest
>> that some of these bugs result in the state you are in now, where you
>> do not see anything that looks like appropriate activity.
>> Unfortunately the only solution offered on these other threads is the
>> one I will now offer, which is to restart the participating nodes and
>> re-start the repair. I am unaware of any JIRA tickets tracking these
>> bugs (which doesn't mean they don't exist, of course) so you might
>> want to file one. :)
>>
>> =Rob
>>
>> --
>> =Robert Coli
>> AIM>ALK - rc...@palominodb.com
>> YAHOO - rcoli.palominob
>> SKYPE - rcoli_palominodb
>>
>
>


-- 
Ben Coverston
DataStax -- The Apache Cassandra Company


Re: cassandra1.1 can't start

2012-05-07 Thread Watanabe Maki
How much memory do you have on the box?
It seems you need more memory.

maki


On 2012/05/08, at 1:29, cyril auburtin  wrote:

> Well, I uncommented lines 96 & 97 in cassandra-env.sh, with lower values:
> 
> MAX_HEAP_SIZE="500M"
> HEAP_NEWSIZE="100M"
> 
> That seems to fix it.
> 
> 2012/5/7 cyril auburtin 
> The cassandra launch command worked the first time
> 
> then now I keep getting
> 
> INFO 18:18:39,354 Starting up server gossip
> ERROR 18:18:39,357 Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
> java.io.IOError: java.io.IOException: Map failed
>     at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:127)
>     at org.apache.cassandra.db.commitlog.CommitLogAllocator$3.run(CommitLogAllocator.java:191)
>     at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     at java.lang.Thread.run(Thread.java:636)
> Caused by: java.io.IOException: Map failed
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:803)
>     at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:119)
>     ... 4 more
> Caused by: java.lang.OutOfMemoryError: Map failed
>     at sun.nio.ch.FileChannelImpl.map0(Native Method)
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:800)
>     ... 5 more
> ERROR 18:18:39,361 Exception in thread Thread[StorageServiceShutdownHook,5,main]
> java.lang.NullPointerException
>     at org.apache.cassandra.gms.Gossiper.stop(Gossiper.java:1113)
>     at org.apache.cassandra.service.StorageService$2.runMayThrow(StorageService.java:478)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     at java.lang.Thread.run(Thread.java:636)
> 
> I have tried rebooting, starting it alone, clearing all /var/lib/cassandra dir
> but keep getting this error
> 
> any idea?
> 


Re: CQL 3.0 composite keys and secondary indexes

2012-05-07 Thread Sylvain Lebresne
On Tue, May 8, 2012 at 12:08 AM, Roland Mechler  wrote:
> It seems as though secondary indexes are not supported in tables (column
> families) that have composite keys. Is that true?

It is.

> If so, are there plans to suport that combination in the future?

There is: https://issues.apache.org/jira/browse/CASSANDRA-3680

--
Sylvain


Re: cassandra1.1 can't start

2012-05-07 Thread cyril auburtin
8G. By default the JVM was taking 2G, and I had this error;
even with 1G I had the error. Finally, 500M made it work.

(4 Intel Atoms, OS: Ubuntu 10.04)

2012/5/8 Watanabe Maki 

> How much memory do you have on the box?
> It seems you need more memory.
>
> maki
>
>
> On 2012/05/08, at 1:29, cyril auburtin  wrote:
>
> Well, I uncommented lines 96 & 97 in cassandra-env.sh, with lower values:
>
> MAX_HEAP_SIZE="500M"
> HEAP_NEWSIZE="100M"
>
> That seems to fix it.
>
> 2012/5/7 cyril auburtin 
>
>> The cassandra launch command worked the first time
>>
>> then now I keep getting
>>
>> INFO 18:18:39,354 Starting up server gossip
>> ERROR 18:18:39,357 Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
>> java.io.IOError: java.io.IOException: Map failed
>>     at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:127)
>>     at org.apache.cassandra.db.commitlog.CommitLogAllocator$3.run(CommitLogAllocator.java:191)
>>     at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
>>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>     at java.lang.Thread.run(Thread.java:636)
>> Caused by: java.io.IOException: Map failed
>>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:803)
>>     at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:119)
>>     ... 4 more
>> Caused by: java.lang.OutOfMemoryError: Map failed
>>     at sun.nio.ch.FileChannelImpl.map0(Native Method)
>>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:800)
>>     ... 5 more
>> ERROR 18:18:39,361 Exception in thread Thread[StorageServiceShutdownHook,5,main]
>> java.lang.NullPointerException
>>     at org.apache.cassandra.gms.Gossiper.stop(Gossiper.java:1113)
>>     at org.apache.cassandra.service.StorageService$2.runMayThrow(StorageService.java:478)
>>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>     at java.lang.Thread.run(Thread.java:636)
>>
>> I have tried rebooting, starting it alone, clearing all
>> /var/lib/cassandra dir
>> but keep getting this error
>>
>> any idea?
>>
>
>