Re: Missing non composite column

2012-10-17 Thread Sylvain Lebresne
On Wed, Oct 17, 2012 at 3:17 AM, Vivek Mishra  wrote:
> column name will be "2012-07-24:2:alliance_involvement" or
> "alliance_involvement"?

The former. Though let's clarify that
"2012-07-24:2:alliance_involvement" is the string representation of a
composite name (i.e. one compatible with CompositeType) for display by
the cli. What you will get is a composite containing 3 components, the
first will be the string '2012-07-24', the second one will be the int
2 and the last one will be the string 'alliance_involvement'.
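To make that concrete, here is a rough Python sketch of the CompositeType layout described above (each component serialized as a 2-byte big-endian length, the component bytes, and one end-of-component byte); the function names are illustrative, not Cassandra API:

```python
import struct

def encode_composite(components):
    """Serialize byte-string components in the CompositeType layout:
    2-byte big-endian length, component bytes, end-of-component byte."""
    out = b""
    for c in components:
        out += struct.pack(">H", len(c)) + c + b"\x00"
    return out

def decode_composite(data):
    """Walk the buffer and pull out each length-prefixed component."""
    components, i = [], 0
    while i < len(data):
        (length,) = struct.unpack_from(">H", data, i)
        i += 2
        components.append(data[i:i + length])
        i += length + 1  # skip the end-of-component byte
    return components

# The 3 components from the example: a string, an int, and a string.
name = encode_composite([b"2012-07-24", struct.pack(">i", 2),
                         b"alliance_involvement"])
parts = decode_composite(name)
```

So a client should read the three components individually rather than parse the ':'-joined display string, which only exists in the cli's rendering.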

--
Sylvain

>
> -Vivek
>
> On Tue, Oct 16, 2012 at 10:25 PM, Sylvain Lebresne 
> wrote:
>>
>> On Tue, Oct 16, 2012 at 12:31 PM, Vivek Mishra 
>> wrote:
>> > Thanks Sylvain. I missed it. If i try to access these via thrift API,
>> > what
>> > will be the column names?
>>
>> I'm not sure I understand the question. The cli output is pretty much
>> what you get via the thrift API.
>>
>> --
>> Sylvain
>
>


Re: Missing non composite column

2012-10-17 Thread Vivek Mishra
Yes, I understand that. The reason I am asking is that, with this, I need to
split them to get the actual column name, using ":" as a separator.
Though I have not tried it yet, I am wondering: if a column name is like
"alliance:movement", then how do I compute it?
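A small illustration of why splitting the display string is unsafe, assuming the rendered name simply joins components with ':' (the joining is an assumption about the cli's display format):

```python
def display(components):
    # Assumed cli-style rendering: components joined by ':'.
    return ":".join(components)

# Two different composites render to the same string...
a = display(["2012-07-24", "2", "alliance:movement"])
b = display(["2012-07-24", "2", "alliance", "movement"])
assert a == b == "2012-07-24:2:alliance:movement"

# ...so splitting the rendered string cannot recover the components:
assert a.split(":") == ["2012-07-24", "2", "alliance", "movement"]
```

Decoding the composite's components directly avoids the ambiguity entirely.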


On Wed, Oct 17, 2012 at 1:04 PM, Sylvain Lebresne wrote:

> On Wed, Oct 17, 2012 at 3:17 AM, Vivek Mishra 
> wrote:
> > column name will be "2012-07-24:2:alliance_involvement" or
> > "alliance_involvement"?
>
> The former. Though let's clarify that
> "2012-07-24:2:alliance_involvement" is the string representation of a
> composite name (i.e. one compatible with CompositeType) for display by
> the cli. What you will get is a composite containing 3 components, the
> first will be the string '2012-07-24', the second one will be the int
> 2 and the last one will be the string 'alliance_involvement'.
>
> --
> Sylvain
>
> >
> > -Vivek
> >
> > On Tue, Oct 16, 2012 at 10:25 PM, Sylvain Lebresne
> > wrote:
> >>
> >> On Tue, Oct 16, 2012 at 12:31 PM, Vivek Mishra 
> >> wrote:
> >> > Thanks Sylvain. I missed it. If i try to access these via thrift API,
> >> > what
> >> > will be the column names?
> >>
> >> I'm not sure I understand the question. The cli output is pretty much
> >> what you get via the thrift API.
> >>
> >> --
> >> Sylvain
> >
> >
>


Astyanax empty column check

2012-10-17 Thread Xu Renjie
hello guys,
   I am currently using Astyanax as a client (new to Astyanax), but I am not
clear on how to differentiate the following 2 situations:
a. A row which has only a key, without columns
b. No such row in the database.

When I use RowQuery to query Cassandra with a given key, both of the above
situations return a ColumnList of size 0, and I didn't find another API that
can handle this.
Do you have a better way to do this? Thanks in advance.
Cheers,
Xu


Re: Cassandra nodes loaded unequally

2012-10-17 Thread Alain RODRIGUEZ
I've got the same problem, and other people in the mailing list are
reporting the same issue.

I don't know what is happening here.

RF 2, 2 nodes :

10.59.21.241    eu-west    1b    Up    Normal    137.53 GB
50.00%    0
10.58.83.109    eu-west    1b    Up    Normal    102.46 GB
50.00%    85070591730234615865843651857942052864

I have no idea how to fix it.
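For what it's worth, the two tokens above are exactly what a balanced 2-node RandomPartitioner ring should use, so the 50%/50% ownership is correct; the difference in GB is more likely data or compaction skew than token placement. A sketch of how evenly spaced tokens are computed (token space 0..2**127):

```python
def balanced_tokens(n):
    """Evenly spaced RandomPartitioner tokens for an n-node ring
    (token space runs from 0 to 2**127)."""
    return [i * 2**127 // n for i in range(n)]

print(balanced_tokens(2))
# [0, 85070591730234615865843651857942052864]
```

Note that the second token printed matches the one in the ring output above.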

Alain

2012/10/17 Ben Kaehne 

> Nothing unusual.
>
> All servers are exactly the same. Nothing unusual in the log files. Is
> there any level of logging that I should be turning on?
>
> Regards,
>
>
> On Wed, Oct 17, 2012 at 9:51 AM, Andrey Ilinykh wrote:
>
>> With your environment (3 nodes, RF=3) it is very difficult to get
>> uneven load. Each node receives the same number of read/write
>> requests. Probably something is wrong on low level, OS or VM. Do you
>> see anything unusual in log files?
>>
>> Andrey
>>
>> On Tue, Oct 16, 2012 at 3:40 PM, Ben Kaehne 
>> wrote:
>> > Not connecting to the same node every time. Using Hector to ensure an
>> even
>> > distribution of connections accross the cluster.
>> >
>> > Regards,
>> >
>> > On Sat, Oct 13, 2012 at 4:15 AM, B. Todd Burruss 
>> wrote:
>> >>
>> >> are you connecting to the same node every time?  if so, spread out
>> >> your connections across the ring
>> >>
>> >> On Fri, Oct 12, 2012 at 1:22 AM, Alexey Zotov
>> >> wrote:
>> >> > Hi Ben,
>> >> >
>> >> > I suggest you compare the amount of queries for each node. Maybe the
>> >> > problem
>> >> > is on the client side.
>> >> > You can do that using JMX:
>> >> > "org.apache.cassandra.db:type=ColumnFamilies,keyspace=<KEYSPACE>,columnfamily=<CF>","ReadCount"
>> >> > "org.apache.cassandra.db:type=ColumnFamilies,keyspace=<KEYSPACE>,columnfamily=<CF>","WriteCount"
>> >> >
>> >> > Also I suggest to check output of "nodetool compactionstats".
>> >> >
>> >> > --
>> >> > Alexey
>> >> >
>> >> >
>> >
>> >
>> >
>> >
>> > --
>> > -Ben
>>
>
>
>
> --
> -Ben
>


Re: Astyanax empty column check

2012-10-17 Thread Xu Renjie
Sorry for the version, I am using 1.0.1 Astyanax.

On Wed, Oct 17, 2012 at 4:44 PM, Xu Renjie  wrote:

> hello guys,
>I am currently using Astyanax as a client(new to Astyanax). But I am
> not clear how to differentiate the following 2 situations:
> a. A row which has only key without columns
> b. No this row in database.
>
> Since when I use RowQuery to query Cassandra with given key, both the
> above two situations will return a ColumnList
> with size 0. And also I didn't find other api can handle this.
> Do you have any better way for this? Thanks in advance.
> Cheers,
> Xu
>


Re: Astyanax empty column check

2012-10-17 Thread rohit bhatia
See:
"If you attempt to retrieve an entire row and it returns a result with
no columns, it effectively means that row does not exist."
Essentially, a row without columns doesn't exist (except those with tombstones).
From here:
http://stackoverflow.com/questions/8072253/is-there-a-difference-between-an-empty-key-and-a-key-that-doesnt-exist


On Wed, Oct 17, 2012 at 2:17 PM, Xu Renjie  wrote:
> Sorry for the version, I am using 1.0.1 Astyanax.
>
>
> On Wed, Oct 17, 2012 at 4:44 PM, Xu Renjie  wrote:
>>
>> hello guys,
>>I am currently using Astyanax as a client(new to Astyanax). But I am
>> not clear how to differentiate the following 2 situations:
>> a. A row which has only key without columns
>> b. No this row in database.
>>
>> Since when I use RowQuery to query Cassandra with given key, both the
>> above two situations will return a ColumnList
>> with size 0. And also I didn't find other api can handle this.
>> Do you have any better way for this? Thanks in advance.
>> Cheers,
>> Xu
>
>


Re: run repair on each node or every R nodes?

2012-10-17 Thread Alain RODRIGUEZ
"I see. So if I don't use the '-pr' option, triggering repair on node-00 is
sufficient to repair the first 3 nodes.
No need to cron a repair on node-{01,02}.
correct?"

"forget it. this was nonsense."

In my mind it does make sense, and what you're saying is correct. But I
read that it was better to run repair in each node with a "-pr" option.

Alain

2012/10/16 Alexis Midon 

> forget it. this was nonsense.
>
>
> On Mon, Oct 15, 2012 at 10:05 PM, Alexis Midon wrote:
>
>> I see. So if I don't use the '-pr' option, triggering repair on node-00
>> is sufficient to repair the first 3 nodes.
>> No need to cron a repair on node-{01,02}.
>> correct?
>>
>> thanks for your answer.
>>
>>
>> On Mon, Oct 15, 2012 at 6:51 PM, Andrey Ilinykh wrote:
>>
>>> Only one region (node-00 is responsible for) will get repaired on all
>>> three nodes.
>>> Andrey
>>> On Mon, Oct 15, 2012 at 11:56 AM, Alexis Midon 
>>> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > I have a 9-node cluster with a replication factor R=3. When I run
>>> repair -pr
>>> > on node-00, I see the exact same load and activity on node-{01,02}.
>>> > Specifically, compactionstats shows the same Validation tasks.
>>> > Does this mean that all 3 nodes will be repaired when nodetool
>>> returns? or
>>> > do I still have to trigger a nodetool-repair on node-{01,02}?
>>> >
>>> > Thanks,
>>> >
>>> > Alexis
>>>
>>
>>
>


Re: run repair on each node or every R nodes?

2012-10-17 Thread Radim Kolar

What if the first node in the range is down? Then -pr would be ineffective.


Firebrand Object Mapper Library for Cassandra

2012-10-17 Thread Raul Raja Martinez
Hello All,

Just wanted to announce a new Open Source Object Mapping and Client library for 
Cassandra that we have recently released at 47 Degrees.
Firebrand OCM is a simple library for persisting and querying Java Objects to 
Cassandra.
Among other features it includes a CQL query builder.

The code is still in alpha and no stable releases are yet available but 
snapshots are periodically published via Maven to Sonatype.

For those of you interested in taking a look at it...

http://firebrandocm.org
https://github.com/47deg/firebrand

Best,


Raúl Raja Martínez
Co-Founder
47 Degrees, LLC
http://47deg.com



Re: Astyanax empty column check

2012-10-17 Thread Xu Renjie
So what you mean is that there is essentially *no* way to differentiate them,
because what they "appear" as is the same?

On Wed, Oct 17, 2012 at 5:58 PM, rohit bhatia  wrote:

> See
> "If you attempt to retrieve an entire row and it returns a result with
> no columns, it effectively means that row does not exist."
> Essentially a row without columns doesn't exist.. (except those with tombstones)
>  from here
> http://stackoverflow.com/questions/8072253/is-there-a-difference-between-an-empty-key-and-a-key-that-doesnt-exist
> On Wed, Oct 17, 2012 at 2:17 PM, Xu Renjie  wrote:
> > Sorry for the version, I am using 1.0.1 Astyanax.
> >
> >
> > On Wed, Oct 17, 2012 at 4:44 PM, Xu Renjie 
> wrote:
> >>
> >> hello guys,
> >>I am currently using Astyanax as a client(new to Astyanax). But I am
> >> not clear how to differentiate the following 2 situations:
> >> a. A row which has only key without columns
> >> b. No this row in database.
> >>
> >> Since when I use RowQuery to query Cassandra with given key, both the
> >> above two situations will return a ColumnList
> >> with size 0. And also I didn't find other api can handle this.
> >> Do you have any better way for this? Thanks in advance.
> >> Cheers,
> >> Xu
> >
> >
>


Re: RF update

2012-10-17 Thread Matthias Broecheler
Follow up question: Is it safe to abort the compactions happening after
node repair?

On Mon, Oct 15, 2012 at 6:32 PM, Will Martin wrote:

> +1   It doesn't make sense that the xfr compactions are heavy unless they
> are translating the file. This could be a protocol mismatch: however the
> requirements for node level compaction and wire compaction I would expect
> to be pretty different.
> On Oct 15, 2012, at 4:42 PM, Matthias Broecheler wrote:
>
> > Hey,
> >
> > we are writing a lot of data into a cassandra cluster for a batch
> loading use case. We cannot use the sstable batch loader, so in order to
> speed up the loading process we are using RF=1 while the data is loading.
> After the load is complete, we want to increase the RF. For that, we are
> updating the RF in the schema and then run the node repair tool on each
> cassandra instance to stream the data over. However, we are noticing that
> this process is slowed down by a lot of compactions (the actual streaming
> of data only takes a couple of minutes).
> >
> > Cassandra is already running a major compaction after the data loading
> process has completed. But then, there seem to be two more compactions (one
> on the sender and one on the receiver) happening and those take a very long
> time even on the aws high i/o instance with no compaction throttling.
> >
> > Question: These additional compactions seem redundant since there are no
> reads or writes on the cluster after the first major compaction
> (immediately after the data load), is that right? And if so, what can we do
> to avoid them? We are currently waiting multiple days.
> >
> > Thank you very much for your help,
> > Matthias
> >
>
>


-- 
Matthias Broecheler, PhD
http://www.matthiasb.com
E-Mail: m...@matthiasb.com


Re: Astyanax empty column check

2012-10-17 Thread Hiller, Dean
What specifically are you trying to achieve?  The business requirement might 
help as there are other ways of solving it such that you do not need to know 
the difference.

Dean

From: Xu Renjie <xrjxrjxrj...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, October 17, 2012 4:48 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Astyanax empty column check

So what you mean is essentially there is *no* way to differentiate it because 
what they "appear" is the same?

On Wed, Oct 17, 2012 at 5:58 PM, rohit bhatia <rohit2...@gmail.com> wrote:
See
"If you attempt to retrieve an entire row and it returns a result with
no columns, it effectively means that row does not exist."
Essentially a row without columns doesn't exist.. (except those with tombstones)
 from here
http://stackoverflow.com/questions/8072253/is-there-a-difference-between-an-empty-key-and-a-key-that-doesnt-exist
On Wed, Oct 17, 2012 at 2:17 PM, Xu Renjie <xrjxrjxrj...@gmail.com> wrote:
> Sorry for the version, I am using 1.0.1 Astyanax.
>
>
> On Wed, Oct 17, 2012 at 4:44 PM, Xu Renjie <xrjxrjxrj...@gmail.com> wrote:
>>
>> hello guys,
>>I am currently using Astyanax as a client(new to Astyanax). But I am
>> not clear how to differentiate the following 2 situations:
>> a. A row which has only key without columns
>> b. No this row in database.
>>
>> Since when I use RowQuery to query Cassandra with given key, both the
>> above two situations will return a ColumnList
>> with size 0. And also I didn't find other api can handle this.
>> Do you have any better way for this? Thanks in advance.
>> Cheers,
>> Xu
>
>



Re: Missing non composite column

2012-10-17 Thread Sylvain Lebresne
> Yes, I understand that. The reason I am asking is that, with this, I need to split
> them to get the actual column name, using ":" as a separator.
> Though I have not tried it yet, I am wondering: if a column name is like
> "alliance:movement", then how do I compute it?

You've lost me, sorry.

--
Sylvain

>
>
> On Wed, Oct 17, 2012 at 1:04 PM, Sylvain Lebresne 
> wrote:
>>
>> On Wed, Oct 17, 2012 at 3:17 AM, Vivek Mishra 
>> wrote:
>> > column name will be "2012-07-24:2:alliance_involvement" or
>> > "alliance_involvement"?
>>
>> The former. Though let's clarify that
>> "2012-07-24:2:alliance_involvement" is the string representation of a
>> composite name (i.e. one compatible with CompositeType) for display by
>> the cli. What you will get is a composite containing 3 components, the
>> first will be the string '2012-07-24', the second one will be the int
>> 2 and the last one will be the string 'alliance_involvement'.
>>
>> --
>> Sylvain
>>
>> >
>> > -Vivek
>> >
>> > On Tue, Oct 16, 2012 at 10:25 PM, Sylvain Lebresne
>> > 
>> > wrote:
>> >>
>> >> On Tue, Oct 16, 2012 at 12:31 PM, Vivek Mishra 
>> >> wrote:
>> >> > Thanks Sylvain. I missed it. If i try to access these via thrift API,
>> >> > what
>> >> > will be the column names?
>> >>
>> >> I'm not sure I understand the question. The cli output is pretty much
>> >> what you get via the thrift API.
>> >>
>> >> --
>> >> Sylvain
>> >
>> >
>
>


Re: run repair on each node or every R nodes?

2012-10-17 Thread Andrey Ilinykh
>
> In my mind it does make sense, and what you're saying is correct. But I read
> that it was better to run repair in each node with a "-pr" option.
>
> Alain
>
Yes, it's correct. By running repair -pr on each node you repair the whole
cluster without job duplication.

Andrey
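A toy model makes the trade-off visible. Assuming SimpleStrategy-style placement (node i holds its own primary range plus the RF-1 ranges before it on the ring, an assumption for illustration), full repair on every RF-th node covers the ring, while -pr on every node covers it with each range repaired exactly once:

```python
from collections import Counter

N, RF = 9, 3  # a 9-node ring with replication factor 3

def ranges_held(node):
    # SimpleStrategy-style placement: node i replicates the primary
    # ranges of itself and the RF-1 preceding nodes.
    return {(node - k) % N for k in range(RF)}

# Full (non -pr) repair on every RF-th node already touches every
# range in the ring:
covered = set()
for node in range(0, N, RF):
    covered |= ranges_held(node)
assert covered == set(range(N))

# Full repair on *all* nodes repairs every range RF times over:
full = Counter(r for node in range(N) for r in ranges_held(node))
assert all(count == RF for count in full.values())

# 'repair -pr' on all nodes repairs each primary range exactly once.
```

This is why the usual advice is -pr on every node: same coverage, no duplicated work.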


Re: Cassandra nodes loaded unequally

2012-10-17 Thread Andrey Ilinykh
Some of your column families are not fully compacted. But it is pretty
normal, I would not worry about it. Eventually it should happen.

On Wed, Oct 17, 2012 at 1:46 AM, Alain RODRIGUEZ  wrote:
> I've got the same problem, and other people in the mailing list are
> reporting the same issue.
>
> I don't know what is happening here.
>
> RF 2, 2 nodes :
>
> 10.59.21.241    eu-west    1b    Up    Normal    137.53 GB
> 50.00%    0
> 10.58.83.109    eu-west    1b    Up    Normal    102.46 GB
> 50.00%    85070591730234615865843651857942052864
>
> I have no idea how to fix it.
>
> Alain
>
> 2012/10/17 Ben Kaehne 
>>
>> Nothing unusual.
>>
>> All servers are exactly the same. Nothing unusual in the log files. Is
>> there any level of logging that I should be turning on?
>>
>> Regards,
>>
>>
>> On Wed, Oct 17, 2012 at 9:51 AM, Andrey Ilinykh 
>> wrote:
>>>
>>> With your environment (3 nodes, RF=3) it is very difficult to get
>>> uneven load. Each node receives the same number of read/write
>>> requests. Probably something is wrong on low level, OS or VM. Do you
>>> see anything unusual in log files?
>>>
>>> Andrey
>>>
>>> On Tue, Oct 16, 2012 at 3:40 PM, Ben Kaehne 
>>> wrote:
>>> > Not connecting to the same node every time. Using Hector to ensure an
>>> > even
>>> > distribution of connections accross the cluster.
>>> >
>>> > Regards,
>>> >
>>> > On Sat, Oct 13, 2012 at 4:15 AM, B. Todd Burruss 
>>> > wrote:
>>> >>
>>> >> are you connecting to the same node every time?  if so, spread out
>>> >> your connections across the ring
>>> >>
>>> >> On Fri, Oct 12, 2012 at 1:22 AM, Alexey Zotov
>>> >> 
>>> >> wrote:
>>> >> > Hi Ben,
>>> >> >
>>> >> > I suggest you compare the amount of queries for each node. Maybe the
>>> >> > problem
>>> >> > is on the client side.
>>> >> > You can do that using JMX:
>>> >> > "org.apache.cassandra.db:type=ColumnFamilies,keyspace=<KEYSPACE>,columnfamily=<CF>","ReadCount"
>>> >> > "org.apache.cassandra.db:type=ColumnFamilies,keyspace=<KEYSPACE>,columnfamily=<CF>","WriteCount"
>>> >> >
>>> >> > Also I suggest to check output of "nodetool compactionstats".
>>> >> >
>>> >> > --
>>> >> > Alexey
>>> >> >
>>> >> >
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > -Ben
>>
>>
>>
>>
>> --
>> -Ben
>
>


EOFException with BulkOutputFormat in 1.1.6

2012-10-17 Thread Michael Kjellman
I'm getting EOFExceptions with BulkOutputFormat

2012-10-17 12:23:01,182 ERROR 
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor: Error in 
ThreadPoolExecutor
java.lang.RuntimeException: java.io.EOFException
at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at 
org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:194)
at 
org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:104)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 3 more

Anyone else running into anything similar?

'Like' us on Facebook for exclusive content and other resources on all 
Barracuda Networks solutions.
Visit http://barracudanetworks.com/facebook




Re: EOFException with BulkOutputFormat in 1.1.6

2012-10-17 Thread Michael Kjellman
Apologies - looks like this is already being tracked in 
https://issues.apache.org/jira/browse/CASSANDRA-4813

From: Michael Kjellman <mkjell...@barracuda.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, October 17, 2012 12:25 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: EOFException with BulkOutputFormat in 1.1.6

I'm getting EOFExceptions with BulkOutputFormat

2012-10-17 12:23:01,182 ERROR 
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor: Error in 
ThreadPoolExecutor
java.lang.RuntimeException: java.io.EOFException
at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at 
org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:194)
at 
org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:104)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 3 more

Anyone else running into anything similar?





potential data loss in Cassandra 1.1.0 .. 1.1.4

2012-10-17 Thread Jonathan Ellis
I wanted to call out a particularly important bug for those who aren't
in the habit of reading CHANGES.

Summary: the bug was fixed in 1.1.5, with a follow-on fix in 1.1.6
that only affects users of 1.1.0 .. 1.1.4.  Thus, if you upgraded from
1.0.x or earlier directly to 1.1.5, you're okay as far as this is
concerned.  But if you used an earlier 1.1 release, you should upgrade
to 1.1.6.

Explanation:

A rewrite of the commitlog code for 1.1.0 used Java's nanotime api to
generate commitlog segment IDs.  This could cause data loss in the
event of a power failure, since we assume commitlog IDs are strictly
increasing in our replay logic.  Simplified, the replay logic looks like this:

1. Take the most recent flush time X for each columnfamily
2. Replay all activity in the commitlog that occurred after X

The problem is that nanotime gets effectively a new random seed after
a reboot.  If the new seed is substantially below the old one, any new
commitlog segments will never be "after" the pre-reboot flush
timestamps.  Subsequently, restarting Cassandra will not replay any
unflushed updates.

We fixed the nanotime problem in 1.1.5 (CASSANDRA-4601).  But, we
didn't realize the implications for replay timestamps until later
(CASSANDRA-4782).  To fix these retroactively, 1.1.6 sets the flush
time of pre-1.1.6 sstables to zero.  Thus, the first startup of 1.1.6
will result in replaying the entire commitlog, including data that may
have already been flushed.

Replaying already-flushed data a second time is harmless -- except for
counters.  So, to avoid replaying flushed counter data, we recommend
performing drain when shutting down the pre-1.1.6 C* prior to upgrade.
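The failure mode is easy to sketch (the IDs below are made up for illustration): replay keeps only segments whose ID exceeds the recorded flush point, so a post-reboot reseed below that point silently drops unflushed data, and the 1.1.6 fix of zeroing the flush time makes everything replayable again:

```python
def replayable(segments, flushed_through):
    """Simplified replay rule: keep segments newer than the flush point."""
    return [seg for seg in segments if seg > flushed_through]

# Before the power failure, a flush was recorded at a nanotime-based ID:
flushed_through = 9_000_000

# After reboot, nanotime reseeds much lower, so unflushed segments get
# IDs "before" the recorded flush point and replay drops them entirely:
post_reboot = [1_000, 2_000, 3_000]
assert replayable(post_reboot, flushed_through) == []  # data loss

# The 1.1.6 fix sets pre-1.1.6 flush times to zero, so everything in
# the commitlog is replayed (harmless except for counters):
assert replayable(post_reboot, 0) == post_reboot
```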

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com