Re: Volunteers needed - Wiki

2011-10-11 Thread Jérémy SEVELLEC
Hi Aaron,

I think the CommitLog section is outdated (
http://wiki.apache.org/cassandra/ArchitectureCommitLog) :

The CommitLogHeader no longer exists since this ticket:
https://issues.apache.org/jira/browse/CASSANDRA-2419

Regards,

Jérémy

2011/10/11 Sasha Dolgy 

> maybe that should be the first wiki update  the TODO
>
>
> On Tue, Oct 11, 2011 at 7:21 AM, Maki Watanabe wrote:
>
>> Hello aaron,
>> I raise my hand too.
>> If you have to-do list about the wiki, please let us know.
>>
>> maki
>>
>
>


-- 
Jérémy


Re: anyway to throttle nodetool repair?

2011-10-11 Thread Peter Schuller
> so how about disk I/O?  is there any way to use ionice to control it?
> I have tried to adjust the priority with "ionice -c3 -p [cassandra pid]",
> but it seems not to be working...

Compaction throttling (and in 1.0 internode streaming throttling) both
address disk I/O.
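
For anyone who wants to adjust the compaction throttle at runtime, here is a
minimal Java/JMX sketch. It assumes the StorageService MBean exposes a
CompactionThroughputMbPerSec attribute (the knob behind
compaction_throughput_mb_per_sec in cassandra.yaml); check the attribute name
against your version with jconsole before relying on it.

import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ThrottleCompaction {
    public static void main(String[] args) throws Exception {
        // Connect to Cassandra's JMX port (7199 by default in 0.8+).
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbs = jmxc.getMBeanServerConnection();

        // Assumed MBean/attribute names; verify against your version.
        ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
        mbs.setAttribute(ss, new Attribute("CompactionThroughputMbPerSec", 8));
        System.out.println("Compaction throughput now: "
                + mbs.getAttribute(ss, "CompactionThroughputMbPerSec"));
        jmxc.close();
    }
}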

-- 
/ Peter Schuller (@scode on twitter)


Re: Multi DC setup

2011-10-11 Thread Peter Schuller
> We already have two separate rings. Idea of bidirectional sync is, if one
> ring is down, we can still send the traffic to other ring. When original
> cluster comes back, it will pick up the data from available cluster. I'm not
> sure if it makes sense to have separate rings or combine these two rings
> into one.

Cassandra doesn't have support for synchronizing data between two
different rings. The multi-DC support in Cassandra amounts to having a
single ring containing all nodes from all data centers. Cassandra is
told (by configuring the snitch, such as through a property file)
which nodes are in which data center. Using the
NetworkTopologyStrategy, you then make sure to distribute replicas
across DCs as you see fit.

Cassandra will then prefer local nodes for read and write operations,
and you can use e.g. the LOCAL_QUORUM consistency level to get
quorum-like consistency within a DC.
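
As an illustration (not from the original mail), a keyspace with two replicas
in each of two data centers could be created through the 0.8-era Thrift API
roughly as below. The keyspace name, DC names and replica counts are made up,
and the DC names must match what the snitch reports:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.CfDef;
import org.apache.cassandra.thrift.KsDef;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class CreateMultiDcKeyspace {
    public static void main(String[] args) throws Exception {
        TFramedTransport transport =
                new TFramedTransport(new TSocket("127.0.0.1", 9160));
        Cassandra.Client client =
                new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();

        // Two replicas per data center; keys must match the snitch's DC names
        // (e.g. the entries in cassandra-topology.properties).
        Map<String, String> strategyOptions = new HashMap<String, String>();
        strategyOptions.put("DC1", "2");
        strategyOptions.put("DC2", "2");

        KsDef ks = new KsDef("MyKeyspace",
                "org.apache.cassandra.locator.NetworkTopologyStrategy",
                new ArrayList<CfDef>());
        ks.setStrategy_options(strategyOptions);
        client.system_add_keyspace(ks);
        transport.close();
    }
}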

Google/check wiki/read docs about NetworkTopologyStrategy and
PropertyFileSnitch. I don't have a good link to multi-dc off hand
(anyone got a good link to suggest that goes through this?).

-- 
/ Peter Schuller (@scode on twitter)


Re: Existing column(s) not readable

2011-10-11 Thread aaron morton
Nothing jumps out. The obvious answer is that the column has been deleted. Did 
you check all the SSTables ?

It looks like the query was answered from the row cache; otherwise you would see this as 
well…

DEBUG [ReadStage:34] 2011-10-11 21:11:11,484 SliceQueryFilter.java (line 123) 
collecting 0 of 2147483647: 1318294191654059:false:354@1318294191654861

Which would mean a version of the column was found. 

If you invalidate the cache with nodetool and run the query and the log message 
appears it will mean the column was read from (all of the) sstables. If you do 
not get a column returned I would say there is a tombstone in place. It's 
either a row level or a column level one.  

Hope that helps. 
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 11/10/2011, at 10:35 AM, Thomas Richter wrote:

> Hi Aaron,
> 
> Normally we use Hector to access Cassandra, but for debugging I switched
> to cassandra-cli.
> 
> The column cannot be read by a simple
> get CFName['rowkey']['colname'];
> 
> The response is "Value was not found".
> If I query another column, everything is just fine.
> 
> Serverlog for unsuccessful read (keyspace and CF names replaced):
> 
> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,739 CassandraServer.java
> (line 280) get
> 
> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,744 StorageProxy.java (line
> 320) Command/ConsistencyLevel is
> SliceByNamesReadCommand(table='Keyspace',
> key=61636162626139322d396638312d343562382d396637352d393162303337383030393762,
> columnParent='QueryPath(columnFamilyName='ColumnFamily',
> superColumnName='null', columnName='null')',
> columns=[574c303030375030,])/ONE
> 
> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,750 ReadCallback.java (line
> 86) Blockfor/repair is 1/true; setting up requests to localhost/127.0.0.1
> 
> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,750 StorageProxy.java (line
> 343) reading data locally
> 
> DEBUG [ReadStage:33] 2011-10-10 23:15:29,751 StorageProxy.java (line
> 448) LocalReadRunnable reading SliceByNamesReadCommand(table='Keyspace',
> key=61636162626139322d396638312d343562382d396637352d393162303337383030393762,
> columnParent='QueryPath(columnFamilyName='ColumnFamily',
> superColumnName='null', columnName='null')', columns=[574c303030375030,])
> 
> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,818 StorageProxy.java (line
> 393) Read: 67 ms.
> 
> Log looks fine to me, but no result is returned.
> 
> Best,
> 
> Thomas
> 
> On 10/10/2011 10:00 PM, aaron morton wrote:
>> How are they unreadable ? You need to go into some details about what is 
>> going wrong. 
>> 
>> What sort of read ? 
>> What client ? 
>> What is in the logging on client and server side ? 
>> 
>> 
>> Try turning the logging up to DEBUG on the server to watch what happens. 
>> 
>> Cheers
>> 
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 10/10/2011, at 9:23 PM, Thomas Richter wrote:
>> 
>>> Hi,
>>> 
>>> no errors in the server logs. The columns are unreadable on all nodes at
>>> any consistency level (ONE, QUORUM, ALL). We started with 0.7.3 and
>>> upgraded to 0.7.6-2 two months ago.
>>> 
>>> Best,
>>> 
>>> Thomas
>>> 
>>> On 10/10/2011 10:03 AM, aaron morton wrote:
 What error are you seeing  in the server logs ? Are the columns unreadable 
 at all Consistency Levels ? i.e. are the columns unreadable on all nodes.
 
 What is the upgrade history of the cluster ? What version did it start at 
 ? 
 
 Cheers
 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 10/10/2011, at 7:42 AM, Thomas Richter wrote:
 
> Hi,
> 
> here is some further information. Compaction did not help, but data is
> still there when I dump the row with sstable2json.
> 
> Best,
> 
> Thomas
> 
> On 10/08/2011 11:30 PM, Thomas Richter wrote:
>> Hi,
>> 
>> we are running a 3 node cassandra (0.7.6-2) cluster and some of our
>> column families contain quite large rows (400k+ columns, 4-6GB row size).
>> Replicaton factor is 3 for all keyspaces. The cluster is running fine
>> for several months now and we never experienced any serious trouble.
>> 
>> Some days ago we noticed, that some previously written columns could not
>> be read. This does not always happen, and only some dozen columns out of
>> 400k are affected.
>> 
>> After ruling out application logic as a cause I dumped the row in
>> question with sstable2json and the columns are there (and are not marked
>> for deletion).
>> 
>> Next thing was setting up a fresh single node cluster and copying the
>> column family data to that node. Columns could not be read either.
>> Right now I'm running a nodetool compact for the cf to see if data could
>> be read afterwards.
>> 
>> Is th

Re: Volunteers needed - Wiki

2011-10-11 Thread aaron morton
@maki thanks, 
Could you take a look at the cli page 
http://wiki.apache.org/cassandra/CassandraCli ? There are a lot of online docs 
in the tool, so we don't need to replicate them. Just a simple getting-started 
guide, some examples and a few tips about what to do if things don't 
work, e.g. often people have problems when using the bytes comparator. If you could 
use the sample schema that ships in conf/ that would be handy. 

You may want to snapshot the 0.7 CLI page in the same way the 0.6 one 
was and link back http://wiki.apache.org/cassandra/CassandraCli06

Just update the draft home page to say you are working on it 
http://wiki.apache.org/cassandra/FrontPage_draft_aaron

@sasha
I was going to use the draft home page as a todo list, (do every page 
listed on there, and sensibly follow links) and as a checkout system  
http://wiki.apache.org/cassandra/FrontPage_draft_aaron

@Jérémy
Thanks I'll keep that in mind. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 11/10/2011, at 8:12 PM, Jérémy SEVELLEC wrote:

> Hi Aaron,
> 
> I think the CommitLog section is outdated 
> (http://wiki.apache.org/cassandra/ArchitectureCommitLog) :
> 
> The CommitLogHeader no longer exists since this ticket: 
> https://issues.apache.org/jira/browse/CASSANDRA-2419
> 
> Regards,
> 
> Jérémy
> 
> 2011/10/11 Sasha Dolgy 
> maybe that should be the first wiki update  the TODO
> 
> 
> On Tue, Oct 11, 2011 at 7:21 AM, Maki Watanabe  
> wrote:
> Hello aaron,
> I raise my hand too.
> If you have to-do list about the wiki, please let us know.
> 
> maki
> 
> 
> 
> 
> -- 
> Jérémy



Hector Problem Basic one

2011-10-11 Thread CASSANDRA learner
Hi Every One,

Actually, I was using Cassandra a long time back, and when I tried today I ran
into a problem from Eclipse. When I try to run a basic Hector
(Java) example, I get an exception:
me.prettyprint.hector.api.exceptions.HectorException: All host pools marked
down. Retry burden pushed out to client. But my server is up, and nodetool
also shows that it is up. I don't know what is happening.

1.) Is it anything to do with the JMX port?
2.) What is the storage port in cassandra.yaml and the JMX port in
cassandra-env.sh?


Re: Existing column(s) not readable

2011-10-11 Thread Thomas Richter
Hi Aaron,

I invalidated the caches but nothing changed. I didn't get the mentioned
log line either, but as I read the code SliceByNamesReadCommand uses
NamesQueryFilter and not SliceQueryFilter.

Next, there is only one SSTable.

I can rule out that the row is deleted because I deleted all other rows
in that CF to reduce data size and speed up testing. I set
GCGraceSeconds to zero and ran a compaction. All other rows are gone,
but i can still access at least one column from the left row.
So as far as I understand it, there should not be a tombstone on row level.

To make it a list:

* One SSTable, one row
*
* Row is not deleted (other columns can be read, row survives compaction
with GCGraceSeconds=0)
* Most columns can be read by get['row']['col'] from cassandra-cli
* Some columns can not be read by get['row']['col'] from cassandra-cli
but can be found in output of sstable2json
* unreadable data survives compaction with GCGraceSeconds=0 (checked
with sstable2json)
* Invalidating the caches does not help
* Nothing in the logs

Does that point in any direction where I should look next?

Best,

Thomas

On 10/11/2011 10:30 AM, aaron morton wrote:
> Nothing jumps out. The obvious answer is that the column has been deleted. 
> Did you check all the SSTables ?
> 
> It looks like the query was answered from the row cache; otherwise you would see this as 
> well…
> 
> DEBUG [ReadStage:34] 2011-10-11 21:11:11,484 SliceQueryFilter.java (line 123) 
> collecting 0 of 2147483647: 1318294191654059:false:354@1318294191654861
> 
> Which would mean a version of the column was found. 
> 
> If you invalidate the cache with nodetool and run the query and the log 
> message appears it will mean the column was read from (all of the) sstables. 
> If you do not get a column returned I would say there is a tombstone in 
> place. It's either a row level or a column level one.  
> 
> Hope that helps. 
>  
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 11/10/2011, at 10:35 AM, Thomas Richter wrote:
> 
>> Hi Aaron,
>>
>> Normally we use Hector to access Cassandra, but for debugging I switched
>> to cassandra-cli.
>>
>> The column cannot be read by a simple
>> get CFName['rowkey']['colname'];
>>
>> The response is "Value was not found".
>> If I query another column, everything is just fine.
>>
>> Serverlog for unsuccessful read (keyspace and CF names replaced):
>>
>> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,739 CassandraServer.java
>> (line 280) get
>>
>> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,744 StorageProxy.java (line
>> 320) Command/ConsistencyLevel is
>> SliceByNamesReadCommand(table='Keyspace',
>> key=61636162626139322d396638312d343562382d396637352d393162303337383030393762,
>> columnParent='QueryPath(columnFamilyName='ColumnFamily',
>> superColumnName='null', columnName='null')',
>> columns=[574c303030375030,])/ONE
>>
>> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,750 ReadCallback.java (line
>> 86) Blockfor/repair is 1/true; setting up requests to localhost/127.0.0.1
>>
>> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,750 StorageProxy.java (line
>> 343) reading data locally
>>
>> DEBUG [ReadStage:33] 2011-10-10 23:15:29,751 StorageProxy.java (line
>> 448) LocalReadRunnable reading SliceByNamesReadCommand(table='Keyspace',
>> key=61636162626139322d396638312d343562382d396637352d393162303337383030393762,
>> columnParent='QueryPath(columnFamilyName='ColumnFamily',
>> superColumnName='null', columnName='null')', columns=[574c303030375030,])
>>
>> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,818 StorageProxy.java (line
>> 393) Read: 67 ms.
>>
>> Log looks fine to me, but no result is returned.
>>
>> Best,
>>
>> Thomas
>>
>> On 10/10/2011 10:00 PM, aaron morton wrote:
>>> How are they unreadable ? You need to go into some details about what is 
>>> going wrong. 
>>>
>>> What sort of read ? 
>>> What client ? 
>>> What is in the logging on client and server side ? 
>>>
>>>
>>> Try turning the logging up to DEBUG on the server to watch what happens. 
>>>
>>> Cheers
>>>
>>> -
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 10/10/2011, at 9:23 PM, Thomas Richter wrote:
>>>
 Hi,

 no errors in the server logs. The columns are unreadable on all nodes at
 any consistency level (ONE, QUORUM, ALL). We started with 0.7.3 and
 upgraded to 0.7.6-2 two months ago.

 Best,

 Thomas

 On 10/10/2011 10:03 AM, aaron morton wrote:
> What error are you seeing  in the server logs ? Are the columns 
> unreadable at all Consistency Levels ? i.e. are the columns unreadable on 
> all nodes.
>
> What is the upgrade history of the cluster ? What version did it start at 
> ? 
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 10/10/201

Re: Hector Problem Basic one

2011-10-11 Thread Ben Ashton
Hey,

We had this one too. Even though the Hector documentation says that it
retries failed servers (every 30 seconds by default), it doesn't.

Once we explicitly set it to X seconds, whenever there is a failure,
e.g. with the network (AWS), it will retry and add the host back into the pool.

Ben
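
For reference, a minimal Hector sketch of what Ben describes; the cluster
name, host and the 10-second delay are placeholders:

import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

public class HectorRetryExample {
    public static void main(String[] args) {
        // Note: 9160 is the Thrift port, not the JMX port.
        CassandraHostConfigurator hostConfig =
                new CassandraHostConfigurator("127.0.0.1:9160");
        // Explicitly enable retrying downed hosts and set the interval.
        hostConfig.setRetryDownedHosts(true);
        hostConfig.setRetryDownedHostsDelayInSeconds(10);

        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", hostConfig);
        System.out.println("Connected to " + cluster.describeClusterName());
    }
}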

On 11 October 2011 11:09, CASSANDRA learner  wrote:
> Hi Every One,
>
> Actually, I was using Cassandra a long time back, and when I tried today I ran
> into a problem from Eclipse. When I try to run a basic Hector
> (Java) example, I get an exception:
> me.prettyprint.hector.api.exceptions.HectorException: All host pools marked
> down. Retry burden pushed out to client. But my server is up, and nodetool
> also shows that it is up. I don't know what is happening.
>
> 1.) Is it anything to do with the JMX port?
> 2.) What is the storage port in cassandra.yaml and the JMX port in
> cassandra-env.sh?
>
>
>


Re: Existing column(s) not readable

2011-10-11 Thread aaron morton
kewl, 

> * Row is not deleted (other columns can be read, row survives compaction
> with GCGraceSeconds=0)

IIRC row tombstones can hang around for a while (until gc grace has passed), 
and they only have an effect on columns that have a lower timestamp. So it's 
possible to read columns from a row with a tombstone. 

Can you read the column using a slice range rather than specifying its name? 
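
For illustration, a rough Hector sketch of such a slice read (the column
family and row key are placeholders):

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;
import me.prettyprint.hector.api.query.SliceQuery;

public class SliceRead {
    // Read a range of columns instead of asking for one column by name.
    static void dumpSlice(Keyspace keyspace, String cf, String rowKey) {
        StringSerializer s = StringSerializer.get();
        SliceQuery<String, String, String> query =
                HFactory.createSliceQuery(keyspace, s, s, s);
        query.setColumnFamily(cf).setKey(rowKey);
        // Empty start/finish means "from the beginning"; fetch 100 columns.
        query.setRange("", "", false, 100);
        QueryResult<ColumnSlice<String, String>> result = query.execute();
        System.out.println(result.get().getColumns());
    }
}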

Aaron

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 11/10/2011, at 11:15 PM, Thomas Richter wrote:

> Hi Aaron,
> 
> I invalidated the caches but nothing changed. I didn't get the mentioned
> log line either, but as I read the code SliceByNamesReadCommand uses
> NamesQueryFilter and not SliceQueryFilter.
> 
> Next, there is only one SSTable.
> 
> I can rule out that the row is deleted because I deleted all other rows
> in that CF to reduce data size and speed up testing. I set
> GCGraceSeconds to zero and ran a compaction. All other rows are gone,
> but i can still access at least one column from the left row.
> So as far as I understand it, there should not be a tombstone on row level.
> 
> To make it a list:
> 
> * One SSTable, one row
> *
> * Row is not deleted (other columns can be read, row survives compaction
> with GCGraceSeconds=0)
> * Most columns can be read by get['row']['col'] from cassandra-cli
> * Some columns can not be read by get['row']['col'] from cassandra-cli
> but can be found in output of sstable2json
> * unreadable data survives compaction with GCGraceSeconds=0 (checked
> with sstable2json)
> * Invalidating the caches does not help
> * Nothing in the logs
> 
> Does that point in any direction where I should look next?
> 
> Best,
> 
> Thomas
> 
> On 10/11/2011 10:30 AM, aaron morton wrote:
>> Nothing jumps out. The obvious answer is that the column has been deleted. 
>> Did you check all the SSTables ?
>> 
>> It looks like the query was answered from the row cache; otherwise you would see this as 
>> well…
>> 
>> DEBUG [ReadStage:34] 2011-10-11 21:11:11,484 SliceQueryFilter.java (line 
>> 123) collecting 0 of 2147483647: 1318294191654059:false:354@1318294191654861
>> 
>> Which would mean a version of the column was found. 
>> 
>> If you invalidate the cache with nodetool and run the query and the log 
>> message appears it will mean the column was read from (all of the) sstables. 
>> If you do not get a column returned I would say there is a tombstone in 
>> place. It's either a row level or a column level one.  
>> 
>> Hope that helps. 
>> 
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 11/10/2011, at 10:35 AM, Thomas Richter wrote:
>> 
>>> Hi Aaron,
>>> 
>>> Normally we use Hector to access Cassandra, but for debugging I switched
>>> to cassandra-cli.
>>> 
>>> The column cannot be read by a simple
>>> get CFName['rowkey']['colname'];
>>> 
>>> The response is "Value was not found".
>>> If I query another column, everything is just fine.
>>> 
>>> Serverlog for unsuccessful read (keyspace and CF names replaced):
>>> 
>>> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,739 CassandraServer.java
>>> (line 280) get
>>> 
>>> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,744 StorageProxy.java (line
>>> 320) Command/ConsistencyLevel is
>>> SliceByNamesReadCommand(table='Keyspace',
>>> key=61636162626139322d396638312d343562382d396637352d393162303337383030393762,
>>> columnParent='QueryPath(columnFamilyName='ColumnFamily',
>>> superColumnName='null', columnName='null')',
>>> columns=[574c303030375030,])/ONE
>>> 
>>> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,750 ReadCallback.java (line
>>> 86) Blockfor/repair is 1/true; setting up requests to localhost/127.0.0.1
>>> 
>>> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,750 StorageProxy.java (line
>>> 343) reading data locally
>>> 
>>> DEBUG [ReadStage:33] 2011-10-10 23:15:29,751 StorageProxy.java (line
>>> 448) LocalReadRunnable reading SliceByNamesReadCommand(table='Keyspace',
>>> key=61636162626139322d396638312d343562382d396637352d393162303337383030393762,
>>> columnParent='QueryPath(columnFamilyName='ColumnFamily',
>>> superColumnName='null', columnName='null')', columns=[574c303030375030,])
>>> 
>>> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,818 StorageProxy.java (line
>>> 393) Read: 67 ms.
>>> 
>>> Log looks fine to me, but no result is returned.
>>> 
>>> Best,
>>> 
>>> Thomas
>>> 
>>> On 10/10/2011 10:00 PM, aaron morton wrote:
 How are they unreadable ? You need to go into some details about what is 
 going wrong. 
 
 What sort of read ? 
 What client ? 
 What is in the logging on client and server side ? 
 
 
 Try turning the logging up to DEBUG on the server to watch what happens. 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 10/10/2011, at 9:23 PM, Thomas Richter wrote:
 
>

Cassandra as session store under heavy load

2011-10-11 Thread Maciej Miklas
Hi *,

I would like to use Cassandra to store session-related information. I do
not have a real HTTP session - it's a different protocol, but the same concept.

Memcached would be fine, but I would like to additionally persist data.

Cassandra setup:

   - non replicated Key Space
   - single Column Family, where the key is the session ID and each column
   within the row stores a single key/value pair
   - column TTL = 10 minutes
   - write CL = ONE
   - read CL = ONE
   - 2,000 writes/s
   - 5,000 reads/s

Data example:

session1:{ // CF row key
   {prop1:val1, TTL:10 min},
   {prop2:val2, TTL:10 min},
.
   {propXXX:val3, TTL:10 min}
},
session2:{ // CF row key
   {prop1:val1, TTL:10 min},
   {prop2:val2, TTL:10 min},
},
..
session:{ // CF row key
   {prop1:val1, TTL:10 min},
   {prop2:val2, TTL:10 min},
}

In this case consistency is not a problem, but the performance could be,
especially disk IO.

Since the data in my session lives for a short time, I would like to avoid
storing it on the hard drive - except for the commit log.

I have some questions:

   1. If a column expires in the Memtable before it is flushed to an SSTable,
   will Cassandra still store such a column in the SSTable (flush it to HDD)?
   2. Replication is disabled for my Key Space; in this case storing such an
   expired column in an SSTable would not be necessary, right?
   3. Each CF has at most 10 columns. In that case I would enable the row cache
   and disable the key cache. But I expect my data to still be available in the
   Memtable, in which case I could disable the whole cache, right?
   4. Any Cassandra configuration hints for such session-store use case
   would be really appreciated :)

Thank you,

Maciej
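
As an illustration of the layout above, a single session property could be
written with a 10-minute TTL through Hector roughly like this (the "Sessions"
column family name is an assumption):

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class SessionWrite {
    static void writeSessionProperty(Keyspace keyspace, String sessionId,
                                     String prop, String value) {
        StringSerializer s = StringSerializer.get();
        HColumn<String, String> column = HFactory.createColumn(prop, value, s, s);
        column.setTtl(600); // expire after 10 minutes
        Mutator<String> mutator = HFactory.createMutator(keyspace, s);
        mutator.addInsertion(sessionId, "Sessions", column);
        mutator.execute(); // writes at the keyspace's configured CL, e.g. ONE
    }
}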


Re: Multi DC setup

2011-10-11 Thread Brandon Williams
On Tue, Oct 11, 2011 at 2:36 AM, Peter Schuller
 wrote:
> Google/check wiki/read docs about NetworkTopologyStrategy and
> PropertyFileSnitch. I don't have a good link to multi-dc off hand
> (anyone got a good link to suggest that goes through this?).

http://www.datastax.com/docs/0.8/cluster_architecture/replication is
pretty good imo.

-Brandon


Re: Multi DC setup

2011-10-11 Thread Eric Tamme



> We already have two separate rings. Idea of bidirectional sync is, if one
> ring is down, we can still send the traffic to other ring. When original
> cluster comes back, it will pick up the data from available cluster. I'm not
> sure if it makes sense to have separate rings or combine these two rings
> into one.

I am not sure you fully understand how Cassandra is supposed to work - 
you do not need two rings to have two complete sets of data that you can 
"hot cutover" between.



> Cassandra doesn't have support for synchronizing data between two
> different rings. The multi-DC support in Cassandra amounts to having a
> single ring containing all nodes from all data centers. Cassandra is
> told (by configuring the snitch, such as through a property file)
> which nodes are in which data center. Using the
> NetworkTopologyStrategy, you then make sure to distribute replicas
> across DCs as you see fit.

Using NTS you can configure a single ring into multiple "logical 
rings".  This is effectively what the property file snitch does in 
conjunction with NTS.


I gave a presentation on the NTS internals, and replicating data across 
geographically distributed data centers. You can find the slides here 
http://files.meetup.com/1794037/NTS_presentation.pdf


Also, Edward Capriolo's book "Cassandra High Performance Cookbook" has some 
recipes for using NTS.


I currently have 4 nodes in two data centers and I use NTS with property 
file snitch to write 1 copy of data to each DC (one node per DC) so that 
in the event of a total DC failure, we can still get to the data.  The 
first write is "local" and the replica is asynchronous if you set write 
consistency to 1 - so you get fast writes with distribution.


-Eric




CompletedTasks attribute exposed via JMX

2011-10-11 Thread Alexandru Dan Sicoe
Hello everyone,
 I was trying to get some cluster wide statistics of the total insertions
performed in my 3 node Cassandra 0.8.6 cluster. So I wrote a nice little
program that gets the CompletedTasks attribute of
org.apache.cassandra.db:type=Commitlog from every node, sums up the values
and records them in a .csv every 10 sec or so. Everything works and I get my
stats but later I found out that I am not really sure what this measure
means. I think it is the individual column insertions performed! Am I
correct?
 In the meantime I installed the trial version of the DataStax Operations
Center. The cluster wide dashboard, showing Writes performed as a function
of time, gives me much smaller values of the rates, compared to the
measurement I described before. The Datastax writes/sec are of the same
order of magnitude as the batch writes I perform on the cluster. But somehow
I cannot relate between this rate and the rate of my CompletedTasks
measurement.

How do people usually measure insertion rates for their clusters? Per batch,
per single column, or is the actual data rate more important to know?

Cheers,
Alexandru
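
For reference, the polling described above can be written against the standard
JMX remote API roughly as follows; the host list and port 7199 are assumptions
for a default 0.8 setup:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CommitlogPoller {
    // Sum the Commitlog CompletedTasks attribute across a list of nodes.
    static long totalCompletedTasks(String[] hosts) throws Exception {
        long total = 0;
        for (String host : hosts) {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                ObjectName commitlog =
                        new ObjectName("org.apache.cassandra.db:type=Commitlog");
                total += ((Number) mbs.getAttribute(
                        commitlog, "CompletedTasks")).longValue();
            } finally {
                jmxc.close();
            }
        }
        return total;
    }
}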


Request for Transactional Scenarios

2011-10-11 Thread Henrique Moniz
Hi,

From time to time, discussions pop up here regarding the transactional or
atomic capabilities of Cassandra (or lack thereof). There is at least one
project dedicated to solving this problem (i.e., Cages). Unfortunately, in
pretty much every discussion or blog post I’ve come across on this subject,
people have not been very clear on stating their own reasons for requiring
transactional semantics.

I would like to hear from people who, at some point, felt they needed
transactional semantics in Cassandra. What motivated this need, and what did
you do about it? Did you come up with your own solution or use Cages? And how
did that work out for you?

I ask this because I'm very interested in contributing a general
solution to this problem, but first it's important to understand its extent.

It would be awesome if you could share whatever you can from your experience
(e.g., the kind of data, the higher-level operations performed on this data,
level of concurrency, throughput requirements, etc). Even if you feel you
cannot provide a detailed description, it would also be very useful if you
could at least describe the transactional patterns involved (e.g., number of
servers accessed, ratio of read/write operations, conflict rate, data flows,
etc).

Take care,
Henrique


Re: ApacheCon meetup?

2011-10-11 Thread Eric Evans
On Tue, Oct 4, 2011 at 2:44 PM, Chris Burroughs
 wrote:
> ApacheCon NA is coming up next month.  I suspect there will be at least
> a few Cassandra users there (yeah new release!).  Would anyone be
> interested in getting together and sharing some stories?  This could
> either be a "official" [1] meetup.  Or grabbing food together sometime.

Let's do it.  We can organize an official one, and still grab food
together if that's not enough. :)

I created a wiki page to get things started:
http://wiki.apache.org/cassandra/Meetup_ApacheConNA2011

If you have ideas for what to do (aside from drinking beer and
chatting with fellow Cassandrites), please update the page (maybe a
short presentation on what's new in 1.0?).

> [1] http://wiki.apache.org/apachecon/ApacheMeetupsNa11

I'll update this one as well.

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


Re: Hector has a website

2011-10-11 Thread Aaron Turner
Just a FYI:

http://hector-client.org is requesting a username/pass
http://www.hector-client.org is working fine

On Fri, Oct 7, 2011 at 12:51 AM, aaron morton  wrote:
> Thanks, will be handy for new peeps.
> A
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> On 7/10/2011, at 12:00 PM, Patricio Echagüe wrote:
>
> Hi, I wanted to let you all know that Hector client has a website.
> http://hector-client.org
> There are links to documentation, Javadoc and resources from the community.
> If you have a personal blog and want us to include the link, let us know.
> Feedback is always welcome.
> Thanks!
> Hector Team.
>



-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"


Re: ApacheCon meetup?

2011-10-11 Thread Jake Luciani
Sounds good. I'll be giving a talk there about Cassandra 1.0

http://na11.apachecon.com/talks/19500

On Tue, Oct 11, 2011 at 12:05 PM, Eric Evans  wrote:

> On Tue, Oct 4, 2011 at 2:44 PM, Chris Burroughs
>  wrote:
> > ApacheCon NA is coming up next month.  I suspect there will be at least
> > a few Cassandra users there (yeah new release!).  Would anyone be
> > interested in getting together and sharing some stories?  This could
> > either be a "official" [1] meetup.  Or grabbing food together sometime.
>
> Let's do it.  We can organize an official one, and still grab food
> together if that's not enough. :)
>
> I created a wiki page to get things started:
> http://wiki.apache.org/cassandra/Meetup_ApacheConNA2011
>
> If you have ideas for what to do (aside from drinking beer and
> chatting with fellow Cassandrites), please update the page (maybe a
> short presentation on what's new in 1.0?).
>
> > [1] http://wiki.apache.org/apachecon/ApacheMeetupsNa11
>
> I'll update this one as well.
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu
>



-- 
http://twitter.com/tjake


add bloomfilter results to nodetool?

2011-10-11 Thread Yang
I find the info about bloomfilter very helpful, could we add that to NodeCmd ?

Thanks
Yang


Re: add bloomfilter results to nodetool?

2011-10-11 Thread Brandon Williams
On Tue, Oct 11, 2011 at 12:19 PM, Yang  wrote:
> I find the info about bloomfilter very helpful, could we add that to NodeCmd ?

Feel free to create a ticket and tag it 'lhf'

-Brandon


different size sstable on different nodes?

2011-10-11 Thread Yang
after I did a major compaction on both nodes in my test cluster,
I found that for the same CF, one node has a 100MB sstable file, while
the other has a 1GB one.

since GC_grace is set into schema, and both nodes have the same
config, how could this happen?

I'm still going through sstable2json to figure out, just want to see
if there are any
apparent things I missed

thanks
Yang


Re: different size sstable on different nodes?

2011-10-11 Thread Yang
"46e70d80": 
[["0132f3726cbb30303030303030303030303030303030303030303030303030303030303030303030316431636633","4e945b0e",1318344486784,"d"]

for the timestamp
perl -e 'print gmtime(1318344486)."\n" '
Tue Oct 11 14:48:06 2011

$ TZ=GMT date
Tue Oct 11 17:40:31 GMT 2011


So it's almost 3 hours old, but I just finished running the
compaction, and GC grace seconds is 7200, set short for testing purposes, so
this deletion column should have been thrown away during the
compaction.
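
The same conversion in Java, treating the sstable2json timestamp as
milliseconds (timestamps are client-supplied, so the unit is an assumption):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimestampCheck {
    public static void main(String[] args) {
        long ts = 1318344486784L; // deletion timestamp from sstable2json
        SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM d HH:mm:ss yyyy");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        System.out.println(fmt.format(new Date(ts))); // Tue Oct 11 14:48:06 2011
    }
}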






On Tue, Oct 11, 2011 at 10:33 AM, Yang  wrote:
> after I did a major compaction on both nodes in my test cluster,
> I found that for the same CF, one node has a 100MB sstable file, while
> the other has a 1GB one.
>
> since GC_grace is set into schema, and both nodes have the same
> config, how could this happen?
>
> I'm still going through sstable2json to figure out, just want to see
> if there are any
> apparent things I missed
>
> thanks
> Yang
>


Schema versions reflect schemas on unwanted nodes

2011-10-11 Thread Eric Czech
Hi, I'm having what I think is a fairly uncommon schema issue --

My situation is that I had a cluster with 10 nodes and a consistent schema.
 Then, in an experiment to set up a second cluster with the same information
(by copying the raw sstables), I left the LocationInfo* sstables in the
system keyspace in the new cluster and after starting the second cluster, I
realized that the two clusters were discovering each other when they
shouldn't have been.  Since then, I changed the cluster name for the second
cluster and made sure to delete the LocationInfo* sstables before starting
it and the two clusters are now operating independently of one another for the
most part.  The only remaining connection between the two seems to be that
the first cluster is still maintaining references to nodes in the second
cluster in the schema versions despite those nodes not actually being part
of the ring.

Here's what my "describe cluster" looks like on the original cluster:

Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
48971cb0-e9ff-11e0--eb9eab7d90bf: [, ,
..., ]
 848bcfc0-eddf-11e0--8a3bb58f08ff: [,
]

The second cluster, however, contains no schema versions involving nodes
from the first cluster.

My question then is, how can I remove those schema versions from the
original cluster that are associated with the unwanted nodes from the second
cluster?  Is there any way to remove or evict an IP from a cluster instead
of just a token?

Thanks in advance!

- Eric


Re: Volunteers needed - Wiki

2011-10-11 Thread Daria Hutchinson
DataStax would like to help with the wiki update effort. For example, we
have a start on updates for 1.0, such as the storage configuration.

http://www.datastax.com/docs/1.0/configuration/storage_configuration

Let me know how we can help.

Cheers,
Daria (DataStax Tech Writer)

Question - Are you planning on maintaining wiki docs by version going
forward (starting with 1.0)?

On Tue, Oct 11, 2011 at 1:55 AM, aaron morton wrote:

> @maki thanks,
> Could you take a look at the cli page
> http://wiki.apache.org/cassandra/CassandraCli ? There are a lot of online
> docs in the tool, so we don't need to replicate them. Just a simple getting-
> started guide, some examples and a few tips about what to do if things
> don't work, e.g. often people have problems when using the bytes comparator. If
> you could use the sample schema that ships in conf/ that would be handy.
>
> You may want to snapshot the 0.7 CLI page in the same way the 0.6 one was
> and link back http://wiki.apache.org/cassandra/CassandraCli06
>
> Just update the draft home page to say you are working on it
> http://wiki.apache.org/cassandra/FrontPage_draft_aaron
>
> @sasha
> I was going to use the draft home page as a todo list, (do every page
> listed on there, and sensibly follow links) and as a checkout system
> http://wiki.apache.org/cassandra/FrontPage_draft_aaron
>
> @Jérémy
> Thanks I'll keep that in mind.
>
> Cheers
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 11/10/2011, at 8:12 PM, Jérémy SEVELLEC wrote:
>
> Hi Aaron,
>
> I think the CommitLog section is outdated (
> http://wiki.apache.org/cassandra/ArchitectureCommitLog) :
>
> The CommitLogHeader no longer exists since this ticket:
> https://issues.apache.org/jira/browse/CASSANDRA-2419
>
> Regards,
>
> Jérémy
>
> 2011/10/11 Sasha Dolgy 
>
>> maybe that should be the first wiki update  the TODO
>>
>>
>> On Tue, Oct 11, 2011 at 7:21 AM, Maki Watanabe 
>> wrote:
>>
>>> Hello aaron,
>>> I raise my hand too.
>>> If you have to-do list about the wiki, please let us know.
>>>
>>> maki
>>>
>>
>>
>
>
> --
> Jérémy
>
>
>


Re: ApacheCon meetup?

2011-10-11 Thread Eric Evans
On Tue, Oct 11, 2011 at 11:05 AM, Eric Evans  wrote:
> On Tue, Oct 4, 2011 at 2:44 PM, Chris Burroughs
>  wrote:
>> ApacheCon NA is coming up next month.  I suspect there will be at least
>> a few Cassandra users there (yeah new release!).  Would anyone be
>> interested in getting together and sharing some stories?  This could
>> either be a "official" [1] meetup.  Or grabbing food together sometime.
>
> Let's do it.  We can organize an official one, and still grab food
> together if that's not enough. :)
>
> I created a wiki page to get things started:
> http://wiki.apache.org/cassandra/Meetup_ApacheConNA2011
>
> If you have ideas for what to do (aside from drinking beer and
> chatting with fellow Cassandrites), please update the page (maybe a
> short presentation on what's new in 1.0?).

To update this:

The meetup will be on November 10th from 8-10pm at The Westin
Bayshore Vancouver (1601 Bayshore Drive, Vancouver, Canada).

It is FREE to the public, and BEER will be served.

Please tell your friends. :)

See you there!

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


CassandraDaemon deactivate doesn't shutdown Cassandra

2011-10-11 Thread Shimi Kiviti
I am running an embedded Cassandra (0.8.7), and
calling CassandraDaemon.deactivate() after I write rows (at least one)
doesn't shut down Cassandra.
If I run only "reads" it does shut down even without
calling CassandraDaemon.deactivate().

Anyone have any idea what can cause this problem?

Shimi
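
One hedged guess: after a write, the commit log and flush threads (which are
not daemon threads) can keep the JVM alive. A sketch of a shutdown that drains
them first, assuming an 0.8-era embedded daemon; drain() is what nodetool
drain invokes:

import org.apache.cassandra.service.StorageService;
import org.apache.cassandra.thrift.CassandraDaemon;

public class EmbeddedShutdown {
    static void shutdown(CassandraDaemon daemon) throws Exception {
        // Flush memtables and shut down the commit log first, so its
        // non-daemon threads don't keep the JVM alive after writes.
        StorageService.instance.drain();
        daemon.deactivate();
    }
}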



Operator on secondary indexes in 0.8.x (GTE/LTE)

2011-10-11 Thread Sasha Dolgy
I was trying to get a range of rows based on a secondary_index that was
defined.  Any rows where age was greater than or equal to ... it didn't
work.  Is this a continued limitation?  Did a quick look in JIRA, couldn't
find anything.

The output from "help get;" on the cli contains the following, which led me
to believe it was a limitation on Cassandra 0.7.x and not on 0.8.x ...

get <cf> where <col> <operator> <value>
[and <col> <operator> <value> ...] [limit <limit>];
get <cf> where <col> <operator> <function>(<argument>)
[and <col> <operator> <value> ...] [limit <limit>];

- operator: Operator to test the column value with. Supported operators are
  =, >, >=, <, <= .

  In Cassandra 0.7 at least one = operator must be present.

[default@sdo]  get user where age >= 18;
No indexed columns present in index clause with operator EQ
[default@sdo]  get user where gender = 1 and age >= 18
(returns results)

Tested this behavior on 0.8.2, 0.8.6 and now 0.8.7 ...

create column family user
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'BytesType'
  and memtable_operations = 0.248437498
  and memtable_throughput = 53
  and memtable_flush_after = 1440
  and rows_cached = 0.0
  and row_cache_save_period = 0
  and keys_cached = 20.0
  and key_cache_save_period = 14400
  and read_repair_chance = 1.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and row_cache_provider = 'ConcurrentLinkedHashCacheProvider'
  and column_metadata = [
{column_name : 'gender',
validation_class : LongType,
index_name : 'user_gender_idx',
index_type : 0},
{column_name : 'year',
validation_class : LongType,
index_name : 'user_year_idx',
index_type : 0}];



-- 
Sasha Dolgy
sasha.do...@gmail.com


Re: Cassandra as session store under heavy load

2011-10-11 Thread aaron morton
Some thoughts…

> non replicated Key Space
Not sure what you mean here. Do you mean RF 1? I would consider using 3. 
Consider what happens when you want to roll an upgrade through the cluster. 

> single Column Family, where the key is the session ID and each column within 
> row stores a single key/value pair

Consider storing the session data as a single blob in a single column; it will 
reduce the memory and disk overhead and run a bit faster, assuming the blobs 
are not too big.
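
A hedged sketch of that single-blob approach, serializing the property map
into one column with Java serialization and Hector (the CF and column names
are made up):

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

import me.prettyprint.cassandra.serializers.BytesArraySerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class SessionBlobWrite {
    static void writeSession(Keyspace keyspace, String sessionId,
                             HashMap<String, String> props) throws Exception {
        // Serialize the whole property map into one column value.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(props);
        out.close();

        HColumn<String, byte[]> column = HFactory.createColumn(
                "data", bytes.toByteArray(),
                StringSerializer.get(), BytesArraySerializer.get());
        column.setTtl(600); // one TTL for the whole session blob
        Mutator<String> mutator =
                HFactory.createMutator(keyspace, StringSerializer.get());
        mutator.addInsertion(sessionId, "Sessions", column);
        mutator.execute();
    }
}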

> write CL = ONE
> read CL = ONE
Consider testing at QUORUM and then use ONE if you think it helps with your 
availability requirements. 

> 2,000 writes/s
> 5,000 reads/s
Fine and dandy. If you really want to squeeze the most out of the reads, go down 
the Netflix path and use the external Memcache row cache provider. So you can 
serve reads out of a very large cache outside of the JVM, and have Cassandra 
persist the data. 

With 3 reasonably spec'd machines I would guess this throughput is achievable 
without too much tuning. Depending on how big the working set is. 

> In this case consistency is not a problem, but the performance could be, 
> especially disk IO.
> 

Wait and see; but if you can, disable the commit log or use a longer periodic 
sync. Of course the simple solution is to add more machines. 

> If a column expires in the Memtable before it is flushed to an SSTable, will 
> Cassandra still store such a column in the SSTable (flush it to HDD)?
Yes, for technical reasons they need to hit the disk. Otherwise the column 
instance will not be used when reconciling against other copies of the column 
already on disk. 

> Each CF has at most 10 columns. In that case I would enable the row cache and 
> disable the key cache. But I expect my data to still be available in the 
> Memtable, in which case I could disable the whole cache, right?
Keep the row cache. These days (0.8 sort of, and 1.0 definitely) there is no way 
to control how long data stays in the memtable. This is a good thing, as 
you would get it wrong. 
 
> Any Cassandra configuration hints for such session-store use case would be 
> really appreciated :)
Inline above. It is important to understand how big the working set may be; 
basically estimate concurrent users * session size. Do some tests and don't 
bother tuning until they show you need to. 

Have fun. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 11/10/2011, at 11:49 PM, Maciej Miklas wrote:

> Hi *,
> 
> I would like to use Cassandra to store session-related information. I do not 
> have a real HTTP session - it's a different protocol, but the same concept.
> 
> Memcached would be fine, but I would like to additionally persist data.
> 
> Cassandra setup:
> 
> non replicated Key Space
> single Column Family, where the key is the session ID and each column within 
> row stores a single key/value pair
> column TTL = 10 minutes
> write CL = ONE
> read CL = ONE
> 2,000 writes/s
> 5,000 reads/s
> Data example:
> 
> session1:{ // CF row key
>{prop1:val1, TTL:10 min},
>{prop2:val2, TTL:10 min},
> .
>{propXXX:val3, TTL:10 min}
> },
> session2:{ // CF row key
>{prop1:val1, TTL:10 min},
>{prop2:val2, TTL:10 min},
> },
> ..
> session:{ // CF row key
>{prop1:val1, TTL:10 min},
>{prop2:val2, TTL:10 min},
> }
> In this case consistency is not a problem, but the performance could be, 
> especially disk IO.
> 
> Since the data in my session lives for a short time, I would like to avoid 
> storing it on the hard drive - except for the commit log.
> 
> I have some questions:
> 
> If a column expires in the Memtable before it is flushed to an SSTable, will 
> Cassandra still store such a column in the SSTable (flush it to HDD)?
> Replication is disabled for my Key Space; in this case storing such an expired 
> column in an SSTable would not be necessary, right?
> Each CF has at most 10 columns. In that case I would enable the row cache and 
> disable the key cache. But I expect my data to still be available in the 
> Memtable, in which case I could disable the whole cache, right?
> Any Cassandra configuration hints for such session-store use case would be 
> really appreciated :)
> Thank you, 
> 
> Maciej
> 



Re: CompletedTasks attribute exposed via JMX

2011-10-11 Thread aaron morton
It's the number of mutations; a mutation is a collection of changes for a single 
row across one or more column families. 

Take a look at nodetool cfstats; this is where I assume Ops Centre is 
getting its data from. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 12/10/2011, at 3:44 AM, Alexandru Dan Sicoe wrote:

> Hello everyone,
>  I was trying to get some cluster wide statistics of the total insertions 
> performed in my 3 node Cassandra 0.8.6 cluster. So I wrote a nice little 
> program that gets the CompletedTasks attribute of 
> org.apache.cassandra.db:type=Commitlog from every node, sums up the values 
> and records them in a .csv every 10 sec or so. Everything works and I get my 
> stats but later I found out that I am not really sure what this measure 
> means. I think it is the individual column insertions performed! Am I correct?
>  In the meantime I installed the trial version of the DataStax Operations 
> Center. The cluster wide dashboard, showing Writes performed as a function of 
> time, gives me much smaller values of the rates, compared to the measurement 
> I described before. The Datastax writes/sec are of the same order of 
> magnitude as the batch writes I perform on the cluster. But somehow I cannot 
> relate between this rate and the rate of my CompletedTasks measurement.
> 
> How do people usually measure insertion rates for their clusters? Per batch, 
> per single column, or is the actual data rate more important to know?
> 
> Cheers,
> Alexandru
> 



Re: Operator on secondary indexes in 0.8.x (GTE/LTE)

2011-10-11 Thread Jake Luciani
This hasn't changed AFAIK. In Brisk we had the same problem in CFS, so we
created a sentinel value that all rows share, and then it works. CASSANDRA-2915
should fix it.
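
A hedged Hector sketch of the sentinel workaround; the "sentinel" and "age"
columns are hypothetical and both need secondary indexes defined:

import me.prettyprint.cassandra.model.IndexedSlicesQuery;
import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class SentinelIndexQuery {
    static void findAdults(Keyspace keyspace) {
        StringSerializer ss = StringSerializer.get();
        IndexedSlicesQuery<String, String, Long> query =
                HFactory.createIndexedSlicesQuery(keyspace, ss, ss,
                        LongSerializer.get());
        query.setColumnFamily("user");
        // The sentinel column holds the same value on every row, purely to
        // satisfy the "at least one EQ expression" requirement.
        query.addEqualsExpression("sentinel", 1L);
        query.addGteExpression("age", 18L);
        query.setRange("", "", false, 100); // columns to return per row
        System.out.println(query.execute().get());
    }
}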

On Tue, Oct 11, 2011 at 4:48 PM, Sasha Dolgy  wrote:

> I was trying to get a range of rows based on a secondary_index that was
> defined.  Any rows where age was greater than or equal to ... it didn't
> work.  Is this a continued limitation?  Did a quick look in JIRA, couldn't
> find anything.
>
> The output from "help get;" on the cli contains the following, which led me
> to believe it was a limitation on Cassandra 0.7.x and not on 0.8.x ...
>
> get <cf> where <col> <operator> <value>
> [and <col> <operator> <value> ...] [limit <limit>];
> get <cf> where <col> <operator> <function>(<argument>)
> [and <col> <operator> <value> ...] [limit <limit>];
>
> - operator: Operator to test the column value with. Supported operators are
>   =, >, >=, <, <= .
>
>   In Cassandra 0.7 at least one = operator must be present.
>
> [default@sdo]  get user where age >= 18;
> No indexed columns present in index clause with operator EQ
> [default@sdo]  get user where gender = 1 and age >= 18
> (returns results)
>
> Tested this behavior on 0.8.2, 0.8.6 and now 0.8.7 ...
>
> create column family user
>   with column_type = 'Standard'
>   and comparator = 'UTF8Type'
>   and default_validation_class = 'BytesType'
>   and key_validation_class = 'BytesType'
>   and memtable_operations = 0.248437498
>   and memtable_throughput = 53
>   and memtable_flush_after = 1440
>   and rows_cached = 0.0
>   and row_cache_save_period = 0
>   and keys_cached = 20.0
>   and key_cache_save_period = 14400
>   and read_repair_chance = 1.0
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and row_cache_provider = 'ConcurrentLinkedHashCacheProvider'
>   and column_metadata = [
> {column_name : 'gender',
> validation_class : LongType,
> index_name : 'user_gender_idx',
> index_type : 0},
> {column_name : 'year',
> validation_class : LongType,
> index_name : 'user_year_idx',
> index_type : 0}];
>
>
>
> --
> Sasha Dolgy
> sasha.do...@gmail.com
>



-- 
http://twitter.com/tjake


Re: Operator on secondary indexes in 0.8.x (GTE/LTE)

2011-10-11 Thread Sasha Dolgy
ah, hadn't even thought of that.  simple.  elegant.

cheers.

On Tue, Oct 11, 2011 at 11:01 PM, Jake Luciani  wrote:

> This hasn't changed AFAIK. In Brisk we had the same problem in CFS, so
> we created a sentinel value that all rows share, and then it works.
> CASSANDRA-2915 should fix it.
>
> On Tue, Oct 11, 2011 at 4:48 PM, Sasha Dolgy  wrote:
>
>> I was trying to get a range of rows based on a secondary_index that was
>> defined.  Any rows where age was greater than or equal to ... it didn't
>> work.  Is this a continued limitation?  Did a quick look in JIRA, couldn't
>> find anything.
>>
>> The output from "help get;" on the cli contains the following, which led
>> me to believe it was a limitation on Cassandra 0.7.x and not on 0.8.x ...
>>
>> get <cf> where <col> <operator> <value>
>> [and <col> <operator> <value> ...] [limit <limit>];
>> get <cf> where <col> <operator> <function>(<argument>)
>> [and <col> <operator> <value> ...] [limit <limit>];
>>
>> - operator: Operator to test the column value with. Supported operators
>> are
>>   =, >, >=, <, <= .
>>
>>   In Cassandra 0.7 at least one = operator must be present.
>>
>> [default@sdo]  get user where age >= 18;
>> No indexed columns present in index clause with operator EQ
>> [default@sdo]  get user where gender = 1 and age >= 18
>> (returns results)
>>
>> Tested this behavior on 0.8.2, 0.8.6 and now 0.8.7 ...
>>
>> create column family user
>>   with column_type = 'Standard'
>>   and comparator = 'UTF8Type'
>>   and default_validation_class = 'BytesType'
>>   and key_validation_class = 'BytesType'
>>   and memtable_operations = 0.248437498
>>   and memtable_throughput = 53
>>   and memtable_flush_after = 1440
>>   and rows_cached = 0.0
>>   and row_cache_save_period = 0
>>   and keys_cached = 20.0
>>   and key_cache_save_period = 14400
>>   and read_repair_chance = 1.0
>>   and gc_grace = 864000
>>   and min_compaction_threshold = 4
>>   and max_compaction_threshold = 32
>>   and replicate_on_write = true
>>   and row_cache_provider = 'ConcurrentLinkedHashCacheProvider'
>>   and column_metadata = [
>> {column_name : 'gender',
>> validation_class : LongType,
>> index_name : 'user_gender_idx',
>> index_type : 0},
>> {column_name : 'year',
>> validation_class : LongType,
>> index_name : 'user_year_idx',
>> index_type : 0}];
>>
>>
>>
>> --
>> Sasha Dolgy
>> sasha.do...@gmail.com
>>
>
>
>
> --
> http://twitter.com/tjake
>



-- 
Sasha Dolgy
sasha.do...@gmail.com


Re: Volunteers needed - Wiki

2011-10-11 Thread aaron morton
Thanks Daria, I'll have a look at what's there and get in touch.

Right now I'm not thinking beyond getting the wiki complete (e.g. it lists all 
the command line tools) and correct for version 1.0. My main concern was people 
coming away from the site with incorrect information and having a bad out of 
the box experience.  

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 12/10/2011, at 7:42 AM, Daria Hutchinson wrote:

> DataStax would like to help with the wiki update effort. For example, we have 
> a start on updates for 1.0, such as the storage configuration.
> 
> http://www.datastax.com/docs/1.0/configuration/storage_configuration
> 
> Let me know how we can help.
> 
> Cheers, 
> Daria (DataStax Tech Writer)  
> 
> Question - Are you planning on maintaining wiki docs by version going forward 
> (starting with 1.0)?   
> 
> On Tue, Oct 11, 2011 at 1:55 AM, aaron morton  wrote:
> @maki thanks, 
>   Could you take a look at the cli page 
> http://wiki.apache.org/cassandra/CassandraCli ? There are a lot of online 
> docs in the tool, so we don't need to replicate them. Just a simple getting-
> started guide, some examples and a few tips about what to do if things 
> don't work, e.g. often people have problems when using the bytes comparator. If 
> you could use the sample schema that ships in conf/ that would be handy. 
> 
>   You may want to snapshot the 0.7 CLI page in the same way the 0.6 one 
> was and link back http://wiki.apache.org/cassandra/CassandraCli06
> 
>   Just update the draft home page to say you are working on it 
> http://wiki.apache.org/cassandra/FrontPage_draft_aaron
> 
> @sasha
>   I was going to use the draft home page as a todo list, (do every page 
> listed on there, and sensibly follow links) and as a checkout system  
> http://wiki.apache.org/cassandra/FrontPage_draft_aaron
> 
> @Jérémy
>   Thanks I'll keep that in mind. 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 11/10/2011, at 8:12 PM, Jérémy SEVELLEC wrote:
> 
>> Hi Aaron,
>> 
>> I think the CommitLog section is outdated 
>> (http://wiki.apache.org/cassandra/ArchitectureCommitLog) :
>> 
>> The CommitLogHeader no longer exists since this ticket: 
>> https://issues.apache.org/jira/browse/CASSANDRA-2419
>> 
>> Regards,
>> 
>> Jérémy
>> 
>> 2011/10/11 Sasha Dolgy 
>> maybe that should be the first wiki update  the TODO
>> 
>> 
>> On Tue, Oct 11, 2011 at 7:21 AM, Maki Watanabe  
>> wrote:
>> Hello aaron,
>> I raise my hand too.
>> If you have to-do list about the wiki, please let us know.
>> 
>> maki
>> 
>> 
>> 
>> 
>> -- 
>> Jérémy
> 
> 



pig_cassandra problem - "Incompatible field schema" error

2011-10-11 Thread Pete Warden
I'm trying to run the most basic example for pig_cassandra, counting the
number of rows in a column family, and I'm hitting the following error:

2011-10-11 14:13:32,321 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1031: Incompatable field schema: left is
"columns:bag{:tuple(name:bytearray,value:bytearray)}", right is
"columns:bag{:tuple(name:chararray,value:bytearray,time_last_ranked:chararray,value:bytearray)}"

I've tried it with various column families, with the same result, but here's
the definition of this one:

create column family FriendsAlreadyRanked with
  comparator = UTF8Type and
  column_metadata =
  [
{column_name: time_last_ranked, validation_class: UTF8Type},
  ];

Here's the command I'm running from within pig_cassandra:

rows = LOAD 'cassandra://Frap/FriendsAlreadyRanked' USING CassandraStorage()
AS (key, columns:bag{T: tuple(name, value)});

Here's my versions:

Apache Pig version 0.9.1 (r1177456)

Cassandra 0.8.1

Any thoughts on how to troubleshoot this? It's obviously connecting to
Cassandra since it pulls out the column family definition, so I'm guessing
it's a Pig type definition problem, but I haven't figured out what it
expects (and all the examples just use the form above).

cheers,

   Pete


Re: nodetool cfstats on 1.0.0-rc1 throws an exception

2011-10-11 Thread Günter Ladwig
Hi all,

I'm seeing the same problem on my 1.0.0-rc2 cluster. However, I do not have 
5000, but just three (compressed) CFs.

The exception does not happen for the Migrations CF, but for one of mine:

Keyspace: KeyspaceCumulus
Read Count: 816
Read Latency: 8.926029411764706 ms.
Write Count: 16808336
Write Latency: 0.03914435902518846 ms.
Pending Tasks: 0
Column Family: OSP
SSTable count: 22
Space used (live): 22319610951
Space used (total): 7585112
Number of Keys (estimate): 87322624
Memtable Columns Count: 56028
Memtable Data Size: 54362270
Memtable Switch Count: 154
Read Count: 277
Read Latency: NaN ms.
Write Count: 10913659
Write Latency: NaN ms.
Pending Tasks: 0
Key cache: disabled
Row cache: disabled
Compacted row minimum size: 125
Compacted row maximum size: 9223372036854775807
Exception in thread "main" java.lang.IllegalStateException: Unable to compute 
ceiling for max when histogram overflowed
at 
org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
at 
org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
at 
org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:275)
[...snip…]

I also had a look at the stats using JMX. The other CFs work fine, the only 
problem seems to be this one. In JMX it shows 'Unavailable' for the row mean 
size and also that ridiculous value for the max size.

The cluster consists of 15 nodes. The keyspace has three CFs (SPO, OSP and POS) 
of which only two contain any data (POS is empty), and uses replication factor 
2. In total, there are about 2 billion columns in each CF. The data 
distribution is different between the two CFs. The row sizes for SPO should be 
fairly evenly distributed whereas OSP will have a few very wide rows and a 
large number of small rows. 

Here is the output from describe:

Keyspace: KeyspaceCumulus:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:2]
  Column Families:
ColumnFamily: OSP
  Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds / keys to save : 0.0/0/all
  Key cache size / save period in seconds: 0.0/0
  GC grace seconds: 0
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.0
  Replicate on write: false
  Built indexes: []
  Compaction Strategy: 
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
  Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: POS
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds / keys to save : 0.0/0/all
  Key cache size / save period in seconds: 0.0/0
  GC grace seconds: 0
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.0
  Replicate on write: false
  Built indexes: [POS.index_p]
  Column Metadata:
Column Name: !o
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Column Name: !p
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Index Name: index_p
  Index Type: KEYS
  Compaction Strategy: 
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
  Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: SPO
  Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds / keys to save : 0.0/0/all
  Key cache size / save period in seconds: 0.0/0
  GC grace seconds: 0
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.0
  Replicate on write: false
  Built indexes: []
  Compaction Strategy: 
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
  Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

If you need additional information, let me know.

Cheer

Re: Volunteers needed - Wiki

2011-10-11 Thread Daria Hutchinson
Sounds like a good place to start!

Thanks for taking the lead and please let me know how I can help!

Daria

On Tue, Oct 11, 2011 at 2:20 PM, aaron morton wrote:

> Thanks Daria, I have a look at whats there and get in touch.
>
> Right now I'm not thinking beyond getting the wiki complete (e.g. it lists
> all the command line tools) and correct for version 1.0. My main concern was
> people coming away from the site with incorrect information and having a bad
> out of the box experience.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 12/10/2011, at 7:42 AM, Daria Hutchinson wrote:
>
> DataStax would like to help with the wiki update effort. For example, we
> have a start on updates for 1.0, such as the storage configuration.
>
> http://www.datastax.com/docs/1.0/configuration/storage_configuration
>
> Let me know how we can help.
>
> Cheers,
> Daria (DataStax Tech Writer)
>
> Question - Are you planning on maintaining wiki docs by version going
> forward (starting with 1.0)?
>
> On Tue, Oct 11, 2011 at 1:55 AM, aaron morton wrote:
>
>> @maki thanks,
>> Could you take a look at the cli page
>> http://wiki.apache.org/cassandra/CassandraCli ?. There is a lot of online
>> docs in the tool, so we dont need to replicate that. Just a simple getting
>> started guide, some examples and a few tips about about what to do if things
>> don't work. e.g. often people have problems when using bytes comparator. If
>> you could use the sample schema that ships in conf/ that would be handy.
>>
>> You may want to snapshot the 0.7 CLI page in the same way the 0.6 one was
>> and link back http://wiki.apache.org/cassandra/CassandraCli06
>>
>> Just update the draft home page to say you are working on it
>> http://wiki.apache.org/cassandra/FrontPage_draft_aaron
>>
>> @sasha
>> I was going to use the draft home page as a todo list, (do every page
>> listed on there, and sensibly follow links) and as a checkout system
>> http://wiki.apache.org/cassandra/FrontPage_draft_aaron
>>
>> @Jérémy
>> Thanks I'll keep that in mind.
>>
>> Cheers
>>
>>  -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 11/10/2011, at 8:12 PM, Jérémy SEVELLEC wrote:
>>
>> Hi Aaron,
>>
>> I think the CommitLog section is outdated (
>> http://wiki.apache.org/cassandra/ArchitectureCommitLog) :
>>
>> The CommitLogHeader is no longer exist since this ticket :
>> https://issues.apache.org/jira/browse/CASSANDRA-2419
>>
>> Regards,
>>
>> Jérémy
>>
>> 2011/10/11 Sasha Dolgy 
>>
>>> maybe that should be the first wiki update  the TODO
>>>
>>>
>>> On Tue, Oct 11, 2011 at 7:21 AM, Maki Watanabe 
>>> wrote:
>>>
 Hello aaron,
 I raise my hand too.
 If you have to-do list about the wiki, please let us know.

 maki

>>>
>>>
>>
>>
>> --
>> Jérémy
>>
>>
>>
>
>


Re: Volunteers needed - Wiki

2011-10-11 Thread Sasha Dolgy
while on the topic of the wiki ... it's not entirely pleasing to the senses
or at all user friendly ... hacking around on it earlier today, there aren't
that many options on how to give it some flare ... shame really that for
such a cool piece of software, the wiki doesn't scream the same level of
cool.

FWIW, Cassandra doesn't show up on http://wiki.apache.org/general/

On Wed, Oct 12, 2011 at 12:05 AM, Daria Hutchinson wrote:

> Sounds like a good place to start!
>
> Thanks for taking the lead and please let me know how I can help!
>
> Daria
>
>
> On Tue, Oct 11, 2011 at 2:20 PM, aaron morton wrote:
>
>> Thanks Daria, I have a look at whats there and get in touch.
>>
>> Right now I'm not thinking beyond getting the wiki complete (e.g. it lists
>> all the command line tools) and correct for version 1.0. My main concern was
>> people coming away from the site with incorrect information and having a bad
>> out of the box experience.
>>
>> Cheers
>>
>>  -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 12/10/2011, at 7:42 AM, Daria Hutchinson wrote:
>>
>> DataStax would like to help with the wiki update effort. For example, we
>> have a start on updates for 1.0, such as the storage configuration.
>>
>> http://www.datastax.com/docs/1.0/configuration/storage_configuration
>>
>> Let me know how we can help.
>>
>> Cheers,
>> Daria (DataStax Tech Writer)
>>
>> Question - Are you planning on maintaining wiki docs by version going
>> forward (starting with 1.0)?
>>
>> On Tue, Oct 11, 2011 at 1:55 AM, aaron morton wrote:
>>
>>> @maki thanks,
>>> Could you take a look at the cli page
>>> http://wiki.apache.org/cassandra/CassandraCli ?. There is a lot of
>>> online docs in the tool, so we dont need to replicate that. Just a simple
>>> getting started guide, some examples and a few tips about about what to do
>>> if things don't work. e.g. often people have problems when using bytes
>>> comparator. If you could use the sample schema that ships in conf/ that
>>> would be handy.
>>>
>>> You may want to snapshot the 0.7 CLI page in the same way the 0.6 one was
>>> and link back http://wiki.apache.org/cassandra/CassandraCli06
>>>
>>> Just update the draft home page to say you are working on it
>>> http://wiki.apache.org/cassandra/FrontPage_draft_aaron
>>>
>>> @sasha
>>> I was going to use the draft home page as a todo list, (do every page
>>> listed on there, and sensibly follow links) and as a checkout system
>>> http://wiki.apache.org/cassandra/FrontPage_draft_aaron
>>>
>>> @Jérémy
>>> Thanks I'll keep that in mind.
>>>
>>> Cheers
>>>
>>>  -
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 11/10/2011, at 8:12 PM, Jérémy SEVELLEC wrote:
>>>
>>> Hi Aaron,
>>>
>>> I think the CommitLog section is outdated (
>>> http://wiki.apache.org/cassandra/ArchitectureCommitLog) :
>>>
>>> The CommitLogHeader is no longer exist since this ticket :
>>> https://issues.apache.org/jira/browse/CASSANDRA-2419
>>>
>>> Regards,
>>>
>>> Jérémy
>>>
>>> 2011/10/11 Sasha Dolgy 
>>>
 maybe that should be the first wiki update  the TODO


 On Tue, Oct 11, 2011 at 7:21 AM, Maki Watanabe >>> > wrote:

> Hello aaron,
> I raise my hand too.
> If you have to-do list about the wiki, please let us know.
>
> maki
>


>>>
>>>
>>> --
>>> Jérémy
>>>
>>>
>>>
>>
>>
>


-- 
Sasha Dolgy
sasha.do...@gmail.com


Re: 0.7.9 RejectedExecutionException

2011-10-11 Thread Ashley Martens
So we created a script to check if Cassandra is alive and run it every two
minutes. Here are some results for today:

Tue Oct 11 18:28:09 UTC 2011 - F this Cassandra bullshit... it died again
Tue Oct 11 19:00:10 UTC 2011 - F this Cassandra bullshit... it died again
Tue Oct 11 19:30:10 UTC 2011 - F this Cassandra bullshit... it died again
Tue Oct 11 20:02:10 UTC 2011 - F this Cassandra bullshit... it died again
Tue Oct 11 21:34:10 UTC 2011 - F this Cassandra bullshit... it died again
Tue Oct 11 22:06:10 UTC 2011 - F this Cassandra bullshit... it died again


And here are some of the log tails:

 INFO [CompactionExecutor:1] 2011-10-11 18:58:14,909 CompactionManager.java
(line 395) Compacting []
 INFO [FlushWriter:10] 2011-10-11 18:58:14,951 Memtable.java (line 172)
Completed flushing /var/lib/cassandra/data/
system/HintsColumnFamily-f-568-Data.db (60 bytes)
 INFO [FlushWriter:10] 2011-10-11 18:58:14,951 Memtable.java (line 157)
Writing Memtable-HintsColumnFamily@1493400027(0 bytes, 1 operations)
 INFO [FlushWriter:10] 2011-10-11 18:58:14,991 Memtable.java (line 172)
Completed flushing
/var/lib/cassandra/data/system/HintsColumnFamily-f-569-Data.db (61 bytes)
 INFO [FlushWriter:10] 2011-10-11 18:58:14,991 Memtable.java (line 157)
Writing Memtable-HintsColumnFamily@1932871300(0 bytes, 1 operations)
 INFO [FlushWriter:10] 2011-10-11 18:58:15,031 Memtable.java (line 172)
Completed flushing
/var/lib/cassandra/data/system/HintsColumnFamily-f-570-Data.db (61 bytes)

INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,906 SSTable.java (line 147)
Deleted /var/lib/cassandra/data/
system/HintsColumnFamily-f-1066
 INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,906 SSTable.java (line 147)
Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1098
 INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,906 SSTable.java (line 147)
Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1040
 INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,906 SSTable.java (line 147)
Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1071
 INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,907 SSTable.java (line 147)
Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1093

INFO [FlushWriter:8] 2011-10-11 20:00:10,701 Memtable.java (line 157)
Writing Memtable-HintsColumnFamily@
1488536311(0 bytes, 1 operations)
 INFO [CompactionExecutor:1] 2011-10-11 20:00:10,701 CompactionManager.java
(line 395) Compacting
[SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-1687-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-1688-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-1689-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-1690-Data.db')]
 INFO [FlushWriter:8] 2011-10-11 20:00:10,741 Memtable.java (line 172)
Completed flushing
/var/lib/cassandra/data/system/HintsColumnFamily-f-1691-Data.db (61 bytes)

 INFO [NonPeriodicTasks:1] 2011-10-11 21:33:26,980 SSTable.java (line 147)
Deleted /var/lib/cassandra/data/
system/HintsColumnFamily-f-3349
ERROR [Thread-18] 2011-10-11 21:33:31,452 AbstractCassandraDaemon.java (line
132) Fatal exception in thread Thread[Thread-18,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut
down
   at
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:76)
   at
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:816)
   at
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1337)
   at
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:385)
   at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:114)

ERROR [Thread-19] 2011-10-11 22:04:39,195 AbstractCassandraDaemon.java (line
132) Fatal exception in thread Thread[Thread-19,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut
down
   at
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:76)
   at
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:816)
   at
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1337)
   at
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:385)
   at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:114)


I'm going to increase the logging level to DEBUG. Other than that I've got
to say that Cassandra 0.7.9 is F'ed in some way or another.


Re: pig_cassandra problem - "Incompatible field schema" error

2011-10-11 Thread Jeremy Hanna
Just for informational purposes, Pete and I tried to troubleshoot it via 
twitter.  I was able to do the following with Cassandra 0.8.1 and Pig 0.9.1.  
He's going to dig in to see if there's something else going on.

// Cassandra-cli stuff
// bin/cassandra-cli -h localhost -p 9160
create keyspace lala;
use lala;
create column family FriendsAlreadyRanked with
comparator = UTF8Type and
key_validation_class = UTF8Type and
column_metadata =
[
{column_name: time_last_ranked, validation_class: UTF8Type},
];
set FriendsAlreadyRanked['mykey']['time_last_ranked'] = '2011-10-10';

// Pig stuff
// bin/pig_cassandra -x local myscript.pig
rows = LOAD 'cassandra://lala/FriendsAlreadyRanked' USING CassandraStorage() AS 
(key, columns:bag{T: tuple(name, value)});
dump rows;

// Ouput
(mykey,{(time_last_ranked,2011-10-10)})

On Oct 11, 2011, at 4:24 PM, Pete Warden wrote:

> I'm trying to run the most basic example for pig_cassandra, counting the 
> number of rows in a column family, and I'm hitting the following error:
> 
> 2011-10-11 14:13:32,321 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1031: Incompatable field schema: left is 
> "columns:bag{:tuple(name:bytearray,value:bytearray)}", right is 
> "columns:bag{:tuple(name:chararray,value:bytearray,time_last_ranked:chararray,value:bytearray)}"
> 
> I've tried it with various column families, with the same result, but here's 
> the definition of this one:
> 
> create column family FriendsAlreadyRanked with
>   comparator = UTF8Type and
>   column_metadata =
>   [
> {column_name: time_last_ranked, validation_class: UTF8Type},
>   ];
> 
> Here's the command I'm running from within pig_cassandra:
> rows = LOAD 'cassandra://Frap/FriendsAlreadyRanked' USING CassandraStorage() 
> AS (key, columns:bag{T: tuple(name, value)});
> 
> Here's my versions:
> 
> Apache Pig version 0.9.1 (r1177456)
> 
> Cassandra 0.8.1
> 
> Any thoughts on how to troubleshoot this? It's obviously connecting to 
> Cassandra since it pulls out the column family definition, so I'm guessing 
> it's a Pig type definition problem, but I haven't figured out what it expects 
> (and all the examples just use the form above).
> 
> cheers,
> 
>Pete
> 



Re: pig_cassandra problem - "Incompatible field schema" error

2011-10-11 Thread Brandon Williams
On Tue, Oct 11, 2011 at 4:24 PM, Pete Warden  wrote:
> I'm trying to run the most basic example for pig_cassandra, counting the
> number of rows in a column family, and I'm hitting the following error:
> 2011-10-11 14:13:32,321 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1031: Incompatable field schema: left is
> "columns:bag{:tuple(name:bytearray,value:bytearray)}", right is
> "columns:bag{:tuple(name:chararray,value:bytearray,time_last_ranked:chararray,value:bytearray)}"

After https://issues.apache.org/jira/browse/CASSANDRA-2777 you need to
remove the 'AS' and everything after it; your schema definition
conflicts with what was inferred.

-Brandon


Re: Operator on secondary indexes in 0.8.x (GTE/LTE)

2011-10-11 Thread Jonathan Ellis
simple, elegant, and less performant than just doing a range scan
without the index. :)

On Tue, Oct 11, 2011 at 4:06 PM, Sasha Dolgy  wrote:
> ah, hadn't even thought of that.  simple.  elegant.
> cheers.
>
> On Tue, Oct 11, 2011 at 11:01 PM, Jake Luciani  wrote:
>>
>> This hasn't changed in AFAIK,  In Brisk we had the same problem in CFS so
>> we created a sentinel value that all rows shared then it works.
>> CASSANDRA-2915 should fix it.
>> On Tue, Oct 11, 2011 at 4:48 PM, Sasha Dolgy  wrote:
>>>
>>> I was trying to get a range of rows based on a secondary_index that was
>>> defined.  Any rows where age was greater than or equal to ... it didn't
>>> work.  Is this a continued limitation?  Did a quick look in JIRA, couldn't
>>> find anything.
>>> The output from "help get;" on the cli contains the following, which led
>>> me to believe it was a limitation on Cassandra 0.7.x and not on 0.8.x ...
>>> get  where[
>>>     andand ...] [limit ];
>>> get  where   () [
>>>     andand ...] [limit ];
>>> - operator: Operator to test the column value with. Supported operators
>>> are
>>>   =, >, >=, <, <= .
>>>   In Cassandra 0.7 at least one = operator must be present.
>>> [default@sdo]  get user where age >= 18;
>>> No indexed columns present in index clause with operator EQ
>>> [default@sdo]  get user where gender = 1 and age >= 18
>>> (returns results)
>>> Tested this behavior on 0.8.2, 0.8.6 and now 0.8.7 ...
>>> create column family user
>>>   with column_type = 'Standard'
>>>   and comparator = 'UTF8Type'
>>>   and default_validation_class = 'BytesType'
>>>   and key_validation_class = 'BytesType'
>>>   and memtable_operations = 0.248437498
>>>   and memtable_throughput = 53
>>>   and memtable_flush_after = 1440
>>>   and rows_cached = 0.0
>>>   and row_cache_save_period = 0
>>>   and keys_cached = 20.0
>>>   and key_cache_save_period = 14400
>>>   and read_repair_chance = 1.0
>>>   and gc_grace = 864000
>>>   and min_compaction_threshold = 4
>>>   and max_compaction_threshold = 32
>>>   and replicate_on_write = true
>>>   and row_cache_provider = 'ConcurrentLinkedHashCacheProvider'
>>>   and column_metadata = [
>>>     {column_name : 'gender',
>>>     validation_class : LongType,
>>>     index_name : 'user_gender_idx',
>>>     index_type : 0},
>>>     {column_name : 'year',
>>>     validation_class : LongType,
>>>     index_name : 'user_year_idx',
>>>     index_type : 0}];
>>>
>>>
>>> --
>>> Sasha Dolgy
>>> sasha.do...@gmail.com
>>
>>
>>
>> --
>> http://twitter.com/tjake
>
>
>
> --
> Sasha Dolgy
> sasha.do...@gmail.com
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Volunteers needed - Wiki

2011-10-11 Thread hani elabed
Hi Aaron,
I got an account to the wiki, logged in, and claimed the 'Configuration'
page a.k.a 'Storage Configuration' for now. I will let you know when done or
if I get stumped. Will also work on "Setting up Eclipse" page and put it
somewhere.
Hani

On Mon, Oct 10, 2011 at 4:24 PM, aaron morton wrote:

> Thanks, Hani.
> If you would like to update the storage config page that would be handy.
> Just update http://wiki.apache.org/cassandra/FrontPage_draft_aaron/  to
> say you are working on it. Just click the login link at the top to setup an
> account.
>
> wrt setting up eclipse, perhaps you could post your instructions on a blog
> somewhere and we can link to it.
>
> cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 11/10/2011, at 5:51 AM, hani elabed wrote:
>
> Hi Aaron,
>
> I can help with the documentation... I grabbed tons of screenshots as I was
> installing Cassandra source trunk(1.0.0.rc2?) on my Mac OS X Snow leopard on
> Eclipse Galileo and later Eclipse Indigo, I will be installing it on Eclipse
> for Ubuntu 10.04 soon. I took the sceenshots after I noticed the missing
> picts in here:
>
> http://wiki.apache.org/cassandra/RunningCassandraInEclipse
>
> so I did plan on helping with the update... I am glad you sent your email
> though to get me going.
>
> I am just not sure of the logistics, how to do it, and if I needed to be
> granted some write access to the wiki. Please educate...
>
> I can definitely help on the NodeTool and StorageConfiguration as soon as I
> can grok them myself, or any other documentation.
>
> Also you draft front page and focusing first on 1.0 first match my
> thinking.
>
> Hani Elabed
>
>
> On Mon, Oct 10, 2011 at 4:10 AM, aaron morton wrote:
>
>> Hi there,
>> The dev's have been very busy and Cassandra 1.0 is just around the corner
>> and full of new features. To celebrate I'm trying to give the wiki some
>> loving to make things a little more welcoming for new users.
>>
>> To keep things manageable I'd like to focus on completeness and
>> correctness for now, and worry about being super awesome later. For example
>> the nodetool page is incomplete http://wiki.apache.org/cassandra/NodeTool ,
>> we do not have anything about CQL and config page is from 0.7
>> http://wiki.apache.org/cassandra/StorageConfiguration
>>
>> As a starting point I've created a draft home page
>> http://wiki.apache.org/cassandra/FrontPage_draft_aaron/ . I also hope to
>> use this as a planning tool where we can mark off what's in progress or has
>> been completed.
>>
>> The guidelines I think we should follow are:
>> * ensure coverage of 1.0, a best effort for 0.8 and leave any content from
>> previous versions.
>>  * where appropriate include examples from CQL and RPC as both are still
>> supported.
>>
>> If you would like to contribute to this effort please let me know via the
>> email list. It's a great way to contribute to the project and learn how
>> Cassandra works, and I'll do my best to help with any questions you may
>> have. Or if you have something you've already written that you feel may be
>> of use let me know, and we'll see about linking to it.
>>
>> Thanks.
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>>
>
>


Re: 0.7.9 RejectedExecutionException

2011-10-11 Thread Jonathan Ellis
grep -i 'killed process' /var/log/messages

On Tue, Oct 11, 2011 at 5:25 PM, Ashley Martens  wrote:
> So we created a script to check if Cassandra is alive and run it every two
> minutes. Here are some results for today:
>
> Tue Oct 11 18:28:09 UTC 2011 - F this Cassandra bullshit... it died again
> Tue Oct 11 19:00:10 UTC 2011 - F this Cassandra bullshit... it died again
> Tue Oct 11 19:30:10 UTC 2011 - F this Cassandra bullshit... it died again
> Tue Oct 11 20:02:10 UTC 2011 - F this Cassandra bullshit... it died again
> Tue Oct 11 21:34:10 UTC 2011 - F this Cassandra bullshit... it died again
> Tue Oct 11 22:06:10 UTC 2011 - F this Cassandra bullshit... it died again
>
>
> And here are some of the log tails:
>
>  INFO [CompactionExecutor:1] 2011-10-11 18:58:14,909 CompactionManager.java
> (line 395) Compacting []
>  INFO [FlushWriter:10] 2011-10-11 18:58:14,951 Memtable.java (line 172)
> Completed flushing /var/lib/cassandra/data/
> system/HintsColumnFamily-f-568-Data.db (60 bytes)
>  INFO [FlushWriter:10] 2011-10-11 18:58:14,951 Memtable.java (line 157)
> Writing Memtable-HintsColumnFamily@1493400027(0 bytes, 1 operations)
>  INFO [FlushWriter:10] 2011-10-11 18:58:14,991 Memtable.java (line 172)
> Completed flushing
> /var/lib/cassandra/data/system/HintsColumnFamily-f-569-Data.db (61 bytes)
>  INFO [FlushWriter:10] 2011-10-11 18:58:14,991 Memtable.java (line 157)
> Writing Memtable-HintsColumnFamily@1932871300(0 bytes, 1 operations)
>  INFO [FlushWriter:10] 2011-10-11 18:58:15,031 Memtable.java (line 172)
> Completed flushing
> /var/lib/cassandra/data/system/HintsColumnFamily-f-570-Data.db (61 bytes)
>
> INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,906 SSTable.java (line 147)
> Deleted /var/lib/cassandra/data/
> system/HintsColumnFamily-f-1066
>  INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,906 SSTable.java (line 147)
> Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1098
>  INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,906 SSTable.java (line 147)
> Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1040
>  INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,906 SSTable.java (line 147)
> Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1071
>  INFO [NonPeriodicTasks:1] 2011-10-11 19:29:20,907 SSTable.java (line 147)
> Deleted /var/lib/cassandra/data/system/HintsColumnFamily-f-1093
>
> INFO [FlushWriter:8] 2011-10-11 20:00:10,701 Memtable.java (line 157)
> Writing Memtable-HintsColumnFamily@
> 1488536311(0 bytes, 1 operations)
>  INFO [CompactionExecutor:1] 2011-10-11 20:00:10,701 CompactionManager.java
> (line 395) Compacting
> [SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-1687-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-1688-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-1689-Data.db'),SSTableReader(path='/var/lib/cassandra/data/system/HintsColumnFamily-f-1690-Data.db')]
>  INFO [FlushWriter:8] 2011-10-11 20:00:10,741 Memtable.java (line 172)
> Completed flushing
> /var/lib/cassandra/data/system/HintsColumnFamily-f-1691-Data.db (61 bytes)
>  INFO [NonPeriodicTasks:1] 2011-10-11 21:33:26,980 SSTable.java (line 147)
> Deleted /var/lib/cassandra/data/
> system/HintsColumnFamily-f-3349
> ERROR [Thread-18] 2011-10-11 21:33:31,452 AbstractCassandraDaemon.java (line
> 132) Fatal exception in thread Thread[Thread-18,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut
> down
>        at
> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:76)
>        at
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:816)
>        at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1337)
>        at
> org.apache.cassandra.net.MessagingService.receive(MessagingService.java:385)
>        at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:114)
> ERROR [Thread-19] 2011-10-11 22:04:39,195 AbstractCassandraDaemon.java (line
> 132) Fatal exception in thread Thread[Thread-19,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut
> down
>        at
> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:76)
>        at
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:816)
>        at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1337)
>        at
> org.apache.cassandra.net.MessagingService.receive(MessagingService.java:385)
>        at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:114)
>
> I'm going to increase the logging level to DEBUG. Other than that I've got
> to say that Cassandra 0.7.9 is F'ed in some way or another.
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.dat

Re: nodetool cfstats on 1.0.0-rc1 throws an exception

2011-10-11 Thread Jonathan Ellis
Are all 3 CFs using compression?

On Tue, Oct 11, 2011 at 4:43 PM, Günter Ladwig  wrote:
> Hi all,
>
> I'm seeing the same problem on my 1.0.0-rc2 cluster. However, I do not have 
> 5000, but just three (compressed) CFs.
>
> The exception does not happen for the Migrations CF, but for one of mine:
>
> Keyspace: KeyspaceCumulus
>        Read Count: 816
>        Read Latency: 8.926029411764706 ms.
>        Write Count: 16808336
>        Write Latency: 0.03914435902518846 ms.
>        Pending Tasks: 0
>                Column Family: OSP
>                SSTable count: 22
>                Space used (live): 22319610951
>                Space used (total): 7585112
>                Number of Keys (estimate): 87322624
>                Memtable Columns Count: 56028
>                Memtable Data Size: 54362270
>                Memtable Switch Count: 154
>                Read Count: 277
>                Read Latency: NaN ms.
>                Write Count: 10913659
>                Write Latency: NaN ms.
>                Pending Tasks: 0
>                Key cache: disabled
>                Row cache: disabled
>                Compacted row minimum size: 125
>                Compacted row maximum size: 9223372036854775807
> Exception in thread "main" java.lang.IllegalStateException: Unable to compute 
> ceiling for max when histogram overflowed
>        at 
> org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
>        at 
> org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
>        at 
> org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:275)
>        [...snip…]
>
> I also had a look at the stats using JMX. The other CFs work fine, the only 
> problem seems to be this one. In JMX it shows 'Unavailable' for the row mean 
> size and also that ridiculous value for the max size.
>
> The cluster consists of 15 nodes. The keyspace has three CFs (SPO, OSP and 
> POS) of which only two contain any data (POS is empty), and uses replication 
> factor 2. In total, there are about 2 billion columns in each CF. The data 
> distribution is different between the two CFs. The row sizes for SPO should 
> be fairly evenly distributed whereas OSP will have a few very wide rows and a 
> large number of small rows.
>
> Here is the output from describe:
>
> Keyspace: KeyspaceCumulus:                                                    
>                                                                               
>                                  Replication Strategy: 
> org.apache.cassandra.locator.SimpleStrategy
>  Durable Writes: true
>    Options: [replication_factor:2]
>  Column Families:
>    ColumnFamily: OSP
>      Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>      Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>      Row cache size / save period in seconds / keys to save : 0.0/0/all
>      Key cache size / save period in seconds: 0.0/0
>      GC grace seconds: 0
>      Compaction min/max thresholds: 4/32
>      Read repair chance: 0.0
>      Replicate on write: false
>      Built indexes: []
>      Compaction Strategy: 
> org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
>      Compression Options:
>        sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
>    ColumnFamily: POS
>      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>      Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>      Row cache size / save period in seconds / keys to save : 0.0/0/all
>      Key cache size / save period in seconds: 0.0/0
>      GC grace seconds: 0
>      Compaction min/max thresholds: 4/32
>      Read repair chance: 0.0
>      Replicate on write: false
>      Built indexes: [POS.index_p]
>      Column Metadata:
>        Column Name: !o
>          Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>        Column Name: !p
>          Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>          Index Name: index_p
>          Index Type: KEYS
>      Compaction Strategy: 
> org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
>      Compression Options:
>        sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
>    ColumnFamily: SPO
>      Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>      Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>      Row cache size / save period in seconds / keys to save : 0.0/0/all
>      Key cache size / save period in seconds: 0.0/0
>      GC grace seconds: 0
>      Compaction min/max thresholds: 4/32
>      Read repair chance: 0.0
>      Replicate on write: false
>      Built indexes: []
>      Compaction Strategy: 
> o

Re: anyway to throttle nodetool repair?

2011-10-11 Thread Yan Chunlu
as I asked earlier:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-does-compaction-throughput-kb-per-sec-affect-disk-io-td6831711.html

might not directly throttle the disk I/O?

it would be easy if ionice could work with cassandra. not sure it is because
of jvm or something else, ionice works perfectly with "cp" or "mv" to make
the disk I/O very smooth.

thanks!

On Tue, Oct 11, 2011 at 3:15 PM, Peter Schuller  wrote:

> > so how about disk io?  is there anyway to use ionice to control it?
> > I have tried to adjust the priority by "ionice -c3 -p [cassandra pid].
> >  seems not working...
>
> Compaction throttling (and in 1.0 internode streaming throttling) both
> address disk I/O.
>
> --
> / Peter Schuller (@scode on twitter)
>


Using ttl to expire columns rather than using delete

2011-10-11 Thread Terry Cumaranatunge
Hello,

If you set a ttl and expire a column, I've read that this eventually turns
into a tombstone and will be cleaned out by the GC. Are expirations
considered a form of delete that still requires a node repair to be run in
gc_grace_period seconds? The operations guide says you have to run node
repair if you have deletes, so I'm trying to find out if we can upsert the
column with expirations using a ttl=1 to substitute deletes. The node repair
operations is very intensive in our environment and causes a
significant performance degradation on the system.

Thanks


Re: CompletedTasks attribute exposed via JMX

2011-10-11 Thread Tyler Hobbs
The OpsCenter graph you're referring to basically does the following:

1. For each node, find out how much the WriteOperations attribute of the
StorageProxy increased during the last minute.
2. Sum these values to get a total for the cluster.
3. Divide by 60 to get an average number of WriteOperations per second for
the cluster.

On Tue, Oct 11, 2011 at 3:55 PM, aaron morton wrote:

> Its the number of mutations, a mutation is a collection of changes for a
> single row across one or more column families.
>
> Take a look at the nodetool cfstats, this is where I assume Ops Centre is
> getting it's data from.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 12/10/2011, at 3:44 AM, Alexandru Dan Sicoe wrote:
>
> Hello everyone,
>  I was trying to get some cluster wide statistics of the total insertions
> performed in my 3 node Cassandra 0.8.6 cluster. So I wrote a nice little
> program that gets the CompletedTasks attribute of
> org.apache.cassandra.db:type=Commitlog from every node, sums up the values
> and records them in a .csv every 10 sec or so. Everything works and I get my
> stats but later I found out that I am not really sure what this measure
> means. I think it is the individual column insertions performed! Am I
> correct?
>  In the meantime I installed the trial version of the DataStax Operations
> Center. The cluster wide dashboard, showing Writes performed as a function
> of time, gives me much smaller values of the rates, compared to the
> measurement I described before. The Datastax writes/sec are of the same
> order of magnitude as the batch writes I perform on the cluster. But somehow
> I cannot relate between this rate and the rate of my CompletedTasks
> measurement.
>
> How do people usually measure insertion rates for their custers ? Per
> batch, per single columns or is actual data rate more important to know?
>
> Cheers,
> Alexandru
>
>
>


-- 
Tyler Hobbs
Software Engineer, DataStax 
Maintainer of the pycassa  Cassandra
Python client library


Re: pig_cassandra problem - "Incompatible field schema" error

2011-10-11 Thread Pete Warden
Thanks for all your help Brandon and Jeremy, that got me to the point where
I could load data.

I'm now hitting a new issue that seems like it could possibly be related.
When I try to access the data like this:

grunt> rows = LOAD 'cassandra://Frap/FriendsAlreadyRanked' USING
CassandraStorage();
grunt> parts = FOREACH rows GENERATE key,
FromCassandraBag('time_last_ranked', columns);

I see the following error:

2011-10-11 22:23:43,877 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1108:
 Duplicate schema alias: value in "columns"

At first I thought it might be related to the Pygmalion helper functions, so
I tried to strip it back to basics using this second line instead:

parts = FOREACH rows GENERATE key,$1;

and I still get an identical error.

Any further thoughts on how I can dig into this?

Thanks again,
Pete

On Tue, Oct 11, 2011 at 3:37 PM, Brandon Williams  wrote:

> On Tue, Oct 11, 2011 at 4:24 PM, Pete Warden  wrote:
> > I'm trying to run the most basic example for pig_cassandra, counting the
> > number of rows in a column family, and I'm hitting the following error:
> > 2011-10-11 14:13:32,321 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > ERROR 1031: Incompatable field schema: left is
> > "columns:bag{:tuple(name:bytearray,value:bytearray)}", right is
> >
> "columns:bag{:tuple(name:chararray,value:bytearray,time_last_ranked:chararray,value:bytearray)}"
>
> After https://issues.apache.org/jira/browse/CASSANDRA-2777 you need to
> remove the 'AS' and everything after it; your schema definition
> conflicts with what was inferred.
>
> -Brandon
>


Re: Cassandra as session store under heavy load

2011-10-11 Thread Maciej Miklas
- RF is 1. We have few KeySpaces, only this one is not replicated - this
data is not that very important. In case of error customer will have to
execute process again. But again, I would like to persist it.
- Serializing data is not an option, because I would like to have
possibility to access data using console
- I will keep row cache - you are right, there is no guarantee, that my data
is still in Memtable

I will get my hardware soon (3 servers) and we will see ;) In this worst
case I will switch my session storage to memcached, and leave all other data
in Cassandra (no TTL, or very long)

Another questions:
- Using Cassandra to build something like "HTTP session store" with short
TTL is not an anti-pattern ?
- There is really no way to tell Cassandra, that particular Key Space should
be stored "mostly" in RAM and only asynchronous backup on HDD (JMS has
something like that)?


Thanks,
Maciej


Re: Hector Problem Basic one

2011-10-11 Thread CASSANDRA learner
Thanks for the reply ben.

Actually The problem is, I could not able to run a basic hector example from
eclipse. Its throwing "me.prettyprint.hector.api.
exceptions.HectorException: All host pools marked
> down. Retry burden pushed out to client
"
Can you please let me know why i am getting this


On Tue, Oct 11, 2011 at 3:54 PM, Ben Ashton  wrote:

> Hey,
>
> We had this one, even tho in the hector documentation it says that it
> retry s failed servers even 30 by default, it doesn't.
>
> Once we explicitly set it to X seconds, when ever there is a failure,
> ie with network (AWS), it will retry and add it back into the pool.
>
> Ben
>
> On 11 October 2011 11:09, CASSANDRA learner 
> wrote:
> > Hi Every One,
> >
> > Actually I was using cassandra long time back and when i tried today, I
> am
> > getting a problem from eclipse. When i am trying to run a basic hector
> > (java) example, I am getting an exception
> > me.prettyprint.hector.api.exceptions.HectorException: All host pools
> marked
> > down. Retry burden pushed out to client. . But My server is up. Node tool
> > also whows that it is up. I donno what happens..
> >
> > 1.)Is it any thing to do with JMX port.
> > 2.) What is the storage port in casandra.yaml and jmx port in
> > cassandra-env.sh
> >
> >
> >
>


Re: pig_cassandra problem - "Incompatible field schema" error

2011-10-11 Thread Pete Warden
For posterity, I ended up hacking around this by renaming the repeated
'value' alias in CassandraStorage and rebuilding it. Here's the patch:

--- src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java.original
2011-10-11
23:42:19.0 -0700
+++ src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java 2011-10-11
23:44:26.0 -0700
@@ -357,7 +357,7 @@
 validator = validators.get(cdef.getName());
 if (validator == null)
 validator = marshallers.get(1);
-valSchema.setName("value");
+valSchema.setName("value_"+new String(cdef.getName()));
 valSchema.setType(getPigType(validator));
 tupleFields.add(valSchema);
 }

I'm not suggesting this is a correct fix, but it does allow me to move
forward. Another suggestion was to try Pig 0.8.1 instead, but I ran into
https://cwiki.apache.org/confluence/display/PIG/FAQ#FAQ-Q%3AWhatshallIdoifIsaw%22FailedtocreateDataStorage%22%3F

On Tue, Oct 11, 2011 at 10:34 PM, Pete Warden  wrote:

> Thanks for all your help Brandon and Jeremy, that got me to the point where
> I could load data.
>
> I'm now hitting a new issue that seems like it could possibly be related.
> When I try to access the data like this:
>
> grunt> rows = LOAD 'cassandra://Frap/FriendsAlreadyRanked' USING
> CassandraStorage();
> grunt> parts = FOREACH rows GENERATE key,
> FromCassandraBag('time_last_ranked', columns);
>
> I see the following error:
>
> 2011-10-11 22:23:43,877 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1108:
>  Duplicate schema alias: value in "columns"
>
> At first I thought it might be related to the Pygmalion helper functions,
> so I tried to strip it back to basics using this second line instead:
>
> parts = FOREACH rows GENERATE key,$1;
>
> and I still get an identical error.
>
> Any further thoughts on how I can dig into this?
>
> Thanks again,
> Pete
>
> On Tue, Oct 11, 2011 at 3:37 PM, Brandon Williams wrote:
>
>> On Tue, Oct 11, 2011 at 4:24 PM, Pete Warden  wrote:
>> > I'm trying to run the most basic example for pig_cassandra, counting the
>> > number of rows in a column family, and I'm hitting the following error:
>> > 2011-10-11 14:13:32,321 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> > ERROR 1031: Incompatable field schema: left is
>> > "columns:bag{:tuple(name:bytearray,value:bytearray)}", right is
>> >
>> "columns:bag{:tuple(name:chararray,value:bytearray,time_last_ranked:chararray,value:bytearray)}"
>>
>> After https://issues.apache.org/jira/browse/CASSANDRA-2777 you need to
>> remove the 'AS' and everything after it; your schema definition
>> conflicts with what was inferred.
>>
>> -Brandon
>>
>
>