Re: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-07 Thread AJ

On 6/6/2011 11:25 PM, Benjamin Coverston wrote:
Currently, my data dir has about 16 sets.  I thought that compaction 
(with nodetool) would clean-up these files, but it doesn't.  Neither 
does cleanup or repair.


You're not even talking about snapshots using nodetool snapshot yet. 
Also nodetool compact does compact all of the live files, however the 
compacted SSTables will not be cleaned up until a garbage collection 
is triggered, or a capacity threshold is met.


Ok, so after a compaction, Cass is still not done with the older sets of 
.db files and I should let Cass delete them?  But, I thought one of the 
main purposes of compaction was to reclaim disk storage resources.  I'm 
only playing around with a small data set so I can't tell how fast the 
data grows.  I'm trying to plan my storage requirements.  Is each 
newly-generated set as large in size as the previous?


The reason I ask is it seems a snapshot is...

Q1: Should the files with the lower index #'s (under the 
data/{keyspace} directory) be manually deleted?  Or, do ALL of the 
files in this directory need to be backed-up?
Do not ever delete files in your data directory if you care about data 
on that replica, unless they are from a column family that no longer 
exists on that server. There may be some duplicate data in the files, 
but if the files are in the data directory, as a general rule, they 
are there because they contain some set of data that is in none of the 
other SSTables.


... It seems a snapshot is implemented, unsurprisingly,  as just a link 
to the latest (highest indexed) set; not the previous sets.  So, 
obviously, only the latest *.db files will get backed-up.  Therefore, 
the previous sets must be worthless.




Re: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-07 Thread Maki Watanabe
You can find useful information in:
http://www.datastax.com/docs/0.8/operations/scheduled_tasks

sstables are immutable. Once written to disk, they won't be updated.
When you take a snapshot, the tool makes hard links to the sstable files.
After some time, you will have had a number of memtable flushes, so
your sstable files will be merged, and obsolete sstable files will be
removed. But the snapshot set will remain on your disk, for backup.

Assume you have sstables A B C D E F.
When you take a snapshot, you will have hard links to A B C D E F under
the snapshots subdirectory.
These hard-linked files will not be removed even after you run a
major/minor compaction.
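
A quick way to see this on disk, as a sketch (it assumes the default data
directory layout and a keyspace named MyKeyspace; the snapshot directory
naming varies by version):

  # Take a snapshot, then compare inode numbers: matching inodes show the
  # snapshot entries are hard links to the live SSTables, so the snapshot
  # costs no extra space until compaction replaces the originals.
  nodetool -h localhost snapshot
  ls -li /var/lib/cassandra/data/MyKeyspace/*.db
  ls -li /var/lib/cassandra/data/MyKeyspace/snapshots/*/*.db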

maki

2011/6/7 AJ :
> On 6/6/2011 11:25 PM, Benjamin Coverston wrote:
>>>
>>> Currently, my data dir has about 16 sets.  I thought that compaction
>>> (with nodetool) would clean-up these files, but it doesn't.  Neither does
>>> cleanup or repair.
>>
>> You're not even talking about snapshots using nodetool snapshot yet. Also
>> nodetool compact does compact all of the live files, however the compacted
>> SSTables will not be cleaned up until a garbage collection is triggered, or
>> a capacity threshold is met.
>
> Ok, so after a compaction, Cass is still not done with the older sets of .db
> files and I should let Cass delete them?  But, I thought one of the main
> purposes of compaction was to reclaim disk storage resources.  I'm only
> playing around with a small data set so I can't tell how fast the data
> grows.  I'm trying to plan my storage requirements.  Is each newly-generated
> set as large in size as the previous?
>
> The reason I ask is it seems a snapshot is...
>
>>> Q1: Should the files with the lower index #'s (under the data/{keyspace}
>>> directory) be manually deleted?  Or, do ALL of the files in this directory
>>> need to be backed-up?
>>
>> Do not ever delete files in your data directory if you care about data on
>> that replica, unless they are from a column family that no longer exists on
>> that server. There may be some duplicate data in the files, but if the files
>> are in the data directory, as a general rule, they are there because they
>> contain some set of data that is in none of the other SSTables.
>
> ... It seems a snapshot is implemented, unsurprisingly,  as just a link to
> the latest (highest indexed) set; not the previous sets.  So, obviously,
> only the latest *.db files will get backed-up.  Therefore, the previous sets
> must be worthless.
>
>



-- 
w3m


Re: Troubleshooting IO performance ?

2011-06-07 Thread Terje Marthinussen
If you run iostat with output every few seconds, is the I/O stable or do
you see very uneven I/O?

Regards,
Terje

On Tue, Jun 7, 2011 at 11:12 AM, aaron morton wrote:

> There is a big IO queue and reads are spending a lot of time in the queue.
>
> Some more questions:
> - what version are you on ?
> -  what is the concurrent_reads config setting ?
> - what is nodetool tpstats showing during the slow down ?
> - exactly how much data are you asking for ? how many rows and what sort of
> slice
> - have there been a lot of deletes or TTL columns used ?
>
> Hope that helps.
> Aaron
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 7 Jun 2011, at 10:09, Philippe wrote:
>
> Ok, here it goes again... No swapping at all...
>
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff   cache   si   so     bi    bo    in    cs us sy id wa
>  1 63  32044  88736  37996 7116524    0    0 227156     0 18314  5607 30  5 11 53
>  1 63  32044  90844  37996 7103904    0    0 233524   202 17418  4977 29  4  9 58
>  0 42  32044  91304  37996 7123884    0    0 249736     0 16197  5433 19  6  3 72
>  3 25  32044  89864  37996 7135980    0    0 223140    16 18135  7567 32  5 11 52
>  1  1  32044  88664  37996 7150728    0    0 229416   128 19168  7554 36  4 10 51
>  4  0  32044  89464  37996 7149428    0    0 213852    18 21041  8819 45  5 12 38
>  4  0  32044  90372  37996 7149432    0    0 233086   142 19909  7041 43  5 10 41
>  7  1  32044  89752  37996 7149520    0    0 206906     0 19350  6875 50  4 11 35
>
> Lots and lots of disk activity
> iostat -dmx 2
> Device:  rrqm/s  wrqm/s      r/s   w/s   rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> sda       52.50    0.00  7813.00  0.00  108.01   0.00    28.31   117.15  14.89   14.89    0.00   0.11  83.00
> sdb       56.00    0.00  7755.50  0.00  108.51   0.00    28.66   118.67  15.18   15.18    0.00   0.11  82.80
> md1        0.00    0.00     0.00  0.00    0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
> md5        0.00    0.00 15796.50  0.00  219.21   0.00    28.42     0.00   0.00    0.00    0.00   0.00   0.00
> dm-0       0.00    0.00 15796.50  0.00  219.21   0.00    28.42   273.42  17.03   17.03    0.00   0.05  83.40
> dm-1       0.00    0.00     0.00  0.00    0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
>
> More info :
> - the data directory containing the data I'm querying is about 9.7GB
> and this is a server with 16GB
> - I'm hitting the server with 6 concurrent multigetsuperslicequeries on
> multiple keys, some of them can bring back quite a number of data
> - I'm reading all the keys for one column, pretty much sequentially
>
> This is a query in a rollup table that was originally in MySQL and it
> doesn't look like the performance to query by key is better. So I'm betting
> I'm doing something wrong here... but what ?
>
> Any ideas ?
> Thanks
>
> 2011/6/6 Philippe 
>
>> hum..no, it wasn't swapping. cassandra was the only thing running on that
>> server
>> and i was querying the same keys over and over
>>
>> i restarted Cassandra and, doing the same thing, io is now down to zero
>> while cpu is up, which doesn't surprise me as much.
>>
>> I'll report if it happens again.
>> On 5 June 2011 16:55, "Jonathan Ellis" wrote:
>>
>> > You may be swapping.
>> >
>> > http://spyced.blogspot.com/2010/01/linux-performance-basics.html
>> > explains how to check this as well as how to see what threads are busy
>> > in the Java process.
>> >
>> > On Sat, Jun 4, 2011 at 5:34 PM, Philippe  wrote:
>> >> Hello,
>> >> I am evaluating using cassandra and I'm running into some strange IO
>> >> behavior that I can't explain, I'd like some help/ideas to troubleshoot
>> it.
>> >> I am running a 1 node cluster with a keyspace consisting of two column
>> >> families, one of which has dozens of supercolumns itself containing
>> dozens
>> >> of columns.
>> >> All in all, this is a couple gigabytes of data, 12GB on the hard drive.
>> >> The hardware is pretty good : 16GB memory + RAID-0 SSD drives with LVM
>> and
>> >> an i5 processor (4 cores).
>> >> Keyspace: xxx
>> >> Read Count: 460754852
>> >> Read Latency: 1.108205793092766 ms.
>> >> Write Count: 30620665
>> >> Write Latency: 0.01411020877567486 ms.
>> >> Pending Tasks: 0
>> >> Column Family: xx
>> >> SSTable count: 5
>> >> Space used (live): 548700725
>> >> Space used (total): 548700725
>> >> Memtable Columns Count: 0
>> >> Memtable Data Size: 0
>> >> Memtable Switch Count: 11
>> >> Read Count: 2891192
>> >> Read Lat

Re: Multiple large disks in server - setup considerations

2011-06-07 Thread Erik Forsberg
On Tue, 31 May 2011 13:23:36 -0500
Jonathan Ellis  wrote:

> Have you read http://wiki.apache.org/cassandra/CassandraHardware ?

I had, but it was a while ago so I guess I kind of deserved an RTFM! :-)

After re-reading it, I still want to know:

* If we disregard the performance hit caused by having the commitlog on
  the same physical device as parts of the data, are there any other
  grave effects on Cassandra's functionality with a setup like that?

* How does Cassandra handle a case where one of the disks in a striped
  RAID0 partition goes bad and is replaced? Is the only option to wipe
  everything from that node and reinit the node, or will it handle
  corrupt files? I.e, what's the recommended thing to do from an
  operations point of view when a disk dies on one of the nodes in a
  RAID0 Cassandra setup? What will cause the least risk for data loss?
  What will be the fastest way to get the node up to speed with the
  rest of the cluster?

Thanks,
\EF



> 
> On Tue, May 31, 2011 at 7:47 AM, Erik Forsberg 
> wrote:
> > Hi!
> >
> > I'm considering setting up a small (4-6 nodes) Cassandra cluster on
> > machines that each have 3x2TB disks. There's no hardware RAID in the
> > machine, and if there were, it could only stripe single disks
> > together, not parts of disks.
> >
> > I'm planning RF=2 (or higher).
> >
> > I'm pondering what the best disk configuration is. Two alternatives:
> >
> > 1) Make small partition on first disk for Linux installation and
> > commit log. Use Linux' software RAID0 to stripe the remaining space
> > on disk1
> >   + the two remaining disks into one large XFS partition.
> >
> > 2) Make small partition on first disk for Linux installation and
> > commit log. Mount rest of disk 1 as /var/cassandra1, then disk2
> >   as /var/cassandra2 and disk3 as /var/cassandra3.
> >
> > Is it unwise to put the commit log on the same physical disk as
> > some of the data? I guess it could impact write performance, but
> > maybe it's bad from a data consistency point of view?
> >
> > How does Cassandra handle replacement of a bad disk in the two
> > alternatives? With option 1) I guess there's risk of files being
> > corrupt. With option 2) they will simply be missing after replacing
> > the disk with a new one.
> >
> > With option 2) I guess I'm limiting the size of the total amount of
> > data in the largest CF at compaction to, hmm.. the free space on the
> > disk with most free space, correct?
> >
> > Comments welcome!
> >
> > Thanks,
> > \EF
> > --
> > Erik Forsberg 
> > Developer, Opera Software - http://www.opera.com/
> >
> 
> 
> 


-- 
Erik Forsberg 
Developer, Opera Software - http://www.opera.com/


Re: Replication-aware compaction

2011-06-07 Thread David Boxenhorn
Thanks! I'm actually on vacation now, so I hope to look into this next week.

On Mon, Jun 6, 2011 at 10:25 PM, aaron morton  wrote:
> You should consider upgrading to 0.7.6 to get a fix to Gossip. Earlier 0.7 
> releases were prone to marking nodes up and down when they should not have 
> been. See 
> https://github.com/apache/cassandra/blob/cassandra-0.7/CHANGES.txt#L22
>
> Are the TimedOutExceptions to the client for read or write requests ? During 
> the burst times which stages are backing up  nodetool tpstats ? Compaction 
> should not affect writes too much (assuming different log and data spindles).
>
> You could also take a look at the read and write latency stats for a 
> particular CF using nodetool cfstats or JConsole. These will give you the 
> stats for the local operations. You could also take a look at the iostats on 
> the box http://spyced.blogspot.com/2010/01/linux-performance-basics.html
>
> Hope that helps.
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 7 Jun 2011, at 00:30, David Boxenhorn wrote:
>
>> Version 0.7.3.
>>
>> Yes, I am talking about minor compactions. I have three nodes, RF=3.
>> 3G data (before replication). Not many users (yet). It seems like 3
>> nodes should be plenty. But when all 3 nodes are compacting, I
>> sometimes get timeouts on the client, and I see in my logs that each
>> one is full of notifications that the other nodes have died (and come
>> back to life after about a second). My cluster can tolerate one node
>> being out of commission, so I would rather have longer compactions one
>> at a time than shorter compactions all at the same time.
>>
>> I think that our usage pattern of bursty writes causes the three nodes
>> to decide to compact at the same time. These bursts are followed by
>> periods of relative quiet, so there should be time for the other two
>> nodes to compact one at a time.
>>
>>
>> On Mon, Jun 6, 2011 at 3:27 PM, David Boxenhorn  wrote:
>>>
>>> Version 0.7.3.
>>>
>>> Yes, I am talking about minor compactions. I have three nodes, RF=3. 3G 
>>> data (before replication). Not many users (yet). It seems like 3 nodes 
>>> should be plenty. But when all 3 nodes are compacting, I sometimes get 
>>> timeouts on the client, and I see in my logs that each one is full of 
>>> notifications that the other nodes have died (and come back to life after 
>>> about a second). My cluster can tolerate one node being out of commission, 
>>> so I would rather have longer compactions one at a time than shorter 
>>> compactions all at the same time.
>>>
>>> I think that our usage pattern of bursty writes causes the three nodes to 
>>> decide to compact at the same time. These bursts are followed by periods of 
>>> relative quiet, so there should be time for the other two nodes to compact 
>>> one at a time.
>>>
>>>
>>> On Mon, Jun 6, 2011 at 2:36 PM, aaron morton  
>>> wrote:

 Are you talking about minor (automatic) compactions ? Can you provide some 
 more information on what's happening to make the node unusable and what 
 version you are using? It's not lightweight process, but it should not 
 hurt the node that badly. It is considered an online operation.

 Delaying compaction will only make it run for longer and take more 
 resources.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 6 Jun 2011, at 20:14, David Boxenhorn wrote:

> Is there some deep architectural reason why compaction can't be
> replication-aware?
>
> What I mean is, if one node is doing compaction, its replicas
> shouldn't be doing compaction at the same time. Or, at least a quorum
> of nodes should be available at all times.
>
> For example, if RF=3, and one node is doing compaction, the nodes to
> its right and left in the ring should wait on compaction until that
> node is done.
>
> Of course, my real problem is that compaction makes a node pretty much
> unavailable. If we can fix that problem then this is not necessary.

>>>
>
>


Re: Installing Thrift with Solandra

2011-06-07 Thread Jake Luciani
Good point, it doesn't include the Cassandra.thrift file.

I suppose I should include it with the code but you can also grab it
from Cassandra.

Jake
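
As a sketch, assuming the Cassandra source or binary distribution is
unpacked locally and the Thrift compiler is already on the PATH (the paths
below are placeholders):

  # cassandra.thrift ships with Cassandra, not Solandra; generate the PHP
  # bindings from it, they land in ./gen-php/
  thrift --gen php /path/to/apache-cassandra/interface/cassandra.thrift
  ls gen-php/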

On Tuesday, June 7, 2011, Jean-Nicolas Boulay Desjardins
 wrote:
> Thanks again :)
> Ok... But in the tutorial it says that I need to build a Thrift interface for 
> Cassandra:
>
> ./compiler/cpp/thrift -gen php ../PATH-TO-CASSANDRA/interface/cassandra.thrift
> How do I do this?
> Where is the interface folder?
>
> Again, tjake, thanks a lot for your time and help.
> On Mon, Jun 6, 2011 at 11:13 PM, Jake Luciani  wrote:
> To access Cassandra in Solandra it's the same as regular cassandra.  To 
> access Solr you use one of the Php Solr 
> libraries http://wiki.apache.org/solr/SolPHP
>
>
>
>
> On Mon, Jun 6, 2011 at 11:04 PM, Jean-Nicolas Boulay Desjardins 
>  wrote:
>
>
>
>
> I am trying to install Thrift with Solandra.
>
>
> Normally when I just want to install Thrift with Cassandra, I followed this 
> tutorial:https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP
>
>
>
>
>
> But how can I do the same for Solandra?
>
>
> Thrift with PHP...
>
>
> Using Ubuntu Server.
>
>
> Thanks in advance!
>
>
> --
> http://twitter.com/tjake
>
>
>
>

-- 
http://twitter.com/tjake


Re: Installing Thrift with Solandra

2011-06-07 Thread Jake Luciani
This seems to be a common cause of confusion. Let me try again.

Solandra doesn't integrate your Cassandra data into Solr. It simply
provides a scalable backend for Solr by building on Cassandra. The
inverted index lives in its own Cassandra keyspace.

What you have in the end is two functionally different components
(Cassandra and Solr) in one logical service.

Jake

On Tuesday, June 7, 2011, Jean-Nicolas Boulay Desjardins
 wrote:
> I just saw a post you made on Stackoverflow, where you said:
> "The Solandra project which is replacing Lucandra no longer uses thrift, only 
> Solr."
>
> So I use Solr to access my data in Cassandra?
> Thanks again...
> On Tue, Jun 7, 2011 at 1:39 AM, Jean-Nicolas Boulay Desjardins 
>  wrote:
> Thanks again :)
> Ok... But in the tutorial it says that I need to build a Thrift interface for 
> Cassandra:
>
>
> ./compiler/cpp/thrift -gen php ../PATH-TO-CASSANDRA/interface/cassandra.thrift
> How do I do this?
> Where is the interface folder?
>
>
> Again, tjake, thanks a lot for your time and help.
> On Mon, Jun 6, 2011 at 11:13 PM, Jake Luciani  wrote:
> To access Cassandra in Solandra it's the same as regular cassandra.  To 
> access Solr you use one of the Php Solr 
> libraries http://wiki.apache.org/solr/SolPHP
>
>
>
>
>
> On Mon, Jun 6, 2011 at 11:04 PM, Jean-Nicolas Boulay Desjardins 
>  wrote:
>
>
>
>
>
> I am trying to install Thrift with Solandra.
>
>
>
> Normally when I just want to install Thrift with Cassandra, I followed this 
> tutorial:https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP
>
>
>
>
>
>
>
> But how can I do the same for Solandra?
>
>
>
> Thrift with PHP...--
> Name / Nom: Boulay Desjardins, Jean-Nicolas
> Website / Site Web: www.jeannicolas.com
>
>

-- 
http://twitter.com/tjake


upgrading to cassandra 0.8

2011-06-07 Thread Sasha Dolgy
Hi,

Good news on the 0.8 release.  So ... if I upgrade one node out of four, and
let it run for a bit ... I should have no issues, correct?  If I make schema
changes, specifically, adding a new column family for counters, how will
this behave with the other three nodes that aren't upgraded?  Or ... should
schema changes not be done until all nodes are upgraded?

-- 
Sasha Dolgy
sasha.do...@gmail.com


set up a cassandra cluster with ByteOrderedPartitioner using whirr?

2011-06-07 Thread Khanh Nguyen
Hi,

I'm struggling to set up a cassandra cluster with
ByteOrderedPartitioner using whirr. (I'm not sure if the issue is
caused by Cassandra or Whirr so I cc-ed both lists).

Here are the steps I took

- use whirr to lauch a cassandra (version 0.8) cluster
- ssh into each instance and do
1) kill cassandra
2) edit "partitioner" field in cassandra.yaml from
"org.apache.cassandra.dht.RandomPartitioner" to
"org.apache.cassandra.dht.ByteOrderedPartitioner"
3) edit "JMX_PORT" in cassandra-env.sh from 7199 to 8080 (whirr binds
JMX to port 8080)
4) delete "/var/lib/cassandra/data"
5) run cassandra again

in the end, I got my cluster back but when I do 'describe cluster'
inside cassandra-cli, it shows the cluster is still running
RandomPartitioner. What am I missing? Thanks.

Regards,

-k


Re: set up a cassandra cluster with ByteOrderedPartitioner using whirr?

2011-06-07 Thread Edward Capriolo
On Tue, Jun 7, 2011 at 10:57 AM, Khanh Nguyen wrote:

> Hi,
>
> I'm struggling to set up a cassandra cluster with
> ByteOrderedPartitioner using whirr. (I'm not sure if the issue is
> caused by Cassandra or Whirr so I cc-ed both lists).
>
> Here are the steps I took
>
> - use whirr to lauch a cassandra (version 0.8) cluster
> - ssh into each instances and do
> 1) kill cassandra
> 2) edit "partitioner" field in cassandra.yaml from
> "org.apache.cassandra.dht.RandomPartitioner" to
> "org.apache.cassandra.dht.ByteOrderedPartitioner"
> 3) edit 'JMX_PORT" in cassandra-env.sh from 7199 to 8080 (whirr bind
> JMX to port 8080)
> 3) delete "/var/lib/cassandra/data"
> 4) run cassandra again
>
> in the end, I got my cluster back but when I do 'describe cluster'
> inside cassandra-cli, it shows the cluster is still running
> RandomPartitioner. What am I missing? Thanks.
>
> Regards,
>
> -k
>

Changes to the YAML file after the node has started even once will not work.
You have to change the file before launching the cluster.

Edward
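
A sketch of what "change it before launching" can look like on a node; the
config path and init script are assumptions for a package-style install:

  # Set the partitioner before Cassandra ever starts on this node; once the
  # system keyspace has been written with one partitioner you have to wipe
  # the node's state to switch.
  sudo sed -i \
    's|^partitioner:.*|partitioner: org.apache.cassandra.dht.ByteOrderedPartitioner|' \
    /etc/cassandra/cassandra.yaml
  # If the node already ran with RandomPartitioner, clear the old state first:
  sudo rm -rf /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches
  sudo /etc/init.d/cassandra start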


Re: upgrading to cassandra 0.8

2011-06-07 Thread Edward Capriolo
On Tue, Jun 7, 2011 at 10:54 AM, Sasha Dolgy  wrote:

> Hi,
>
> Good news on the 0.8 release.  So ... if I upgrade one node out of four,
> and let it run for a bit ... I should have no issues, correct?  If I make
> schema changes, specifically, adding a new column family for counters, how
> will this behave with the other three nodes that aren't upgraded?  Or ...
> should schema changes not be done until all nodes are upgraded?
>
> --
> Sasha Dolgy
> sasha.do...@gmail.com
>

Do not make schema changes until the upgrade is complete. It will behave
very poorly I suspect, and you will be sad.


Re: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-07 Thread AJ

On 6/7/2011 2:29 AM, Maki Watanabe wrote:

You can find useful information in:
http://www.datastax.com/docs/0.8/operations/scheduled_tasks

sstables are immutable. Once it written to disk, it won't be updated.
When you take snapshot, the tool makes hard links to sstable files.
After certain time, you will have some times of memtable flushs, so
your sstable files will be merged, and obsolete sstable files will be
removed. But snapshot set will remains on your disk, for backup.



Thanks for the doc source.  I will be experimenting with 0.8.0 since it 
has many features I've been waiting for.


But, still, if the snapshots don't link to all of the previous sets of 
.db files, then those unlinked previous file sets MUST be safe to 
manually delete.  But, they aren't deleted until later after a GC.  It's 
a bit confusing why they are kept after compaction up until GC when they 
seem to not be needed.  We have Big Data plans... one node can have 10's 
of TBs, so I'm trying to get an idea of how much disk space will be 
required and whether or not I can free-up some disk space.


Hopefully someone can still elaborate on this.




Re: sync commitlog in batch mode lose data

2011-06-07 Thread Peter Schuller
> But I have another question: if I disable the disk cache but leave the
> cache write mode as write-back, how does sync work? Does it still write the data into the
> cache? This issue may not belong in the scope of discussion here.

I'm not sure, it depends on at what level of abstraction you changed
to write-back and how it's implemented. Generally, the contract of an
fsync() is that whatever was written up to that point must be
persistent (i.e., readable by subsequent reads, even in case of a
power outtage/crash) when the call returns. This usually means:

(1) the userland app must flush buffers and write data to kernel (this
is done prior to fsync())
(2) the OS file system code needs to write whatever is necessary to
underlying block device(s)
(3) the underlying block device(s) need to be told to insert a write
barrier or flush caches depending
(4) the underlying block device itself must handle this correctly
  (a) for a non-battery-backed disk it means flushing the cache and
you have to wait for that to happen - at minimum seek + rotational
delay
  (b) for a battery-backed RAID device it typically is a NOOP if the
battery backup unit is working, as the raid controller cache is
considered persistent
  (c) for a raid device with caching turned off or the BBU being
inoperable, it usually means asking individual real drives to flush
their caches

However in general, I advise care since all sorts of little details
can derail this from working. For example if you have the kernel
driver configured not to propagate write barriers to the raid
controller, but the raid controller has BBU turned off but is still
caching, an fsync() would not work for the power outage case. Using
LVM in certain configurations can break write barriers at the OS level
(at least until not very long ago; maybe fixed in newer kernels) - and
the list goes on.
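
A few quick checks along these lines, as a sketch (device names and mount
points are examples only):

  # Report whether the drive's volatile write cache is enabled
  sudo hdparm -W /dev/sda
  # Turn it off if there is no battery-backed controller cache behind it
  sudo hdparm -W0 /dev/sda
  # Confirm the data/commitlog filesystems are not mounted with barriers
  # disabled (look for "nobarrier" or "barrier=0")
  mount | grep -E 'commitlog|cassandra'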

--
/ Peter Schuller


RE: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-07 Thread Jeremiah Jordan
Don't manually delete things.  Let Cassandra do it.  Force a garbage
collection or restart your instance and Cassandra will delete the unused
files.
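
If you want to see the already-compacted files go away without a restart,
one option is to trigger a full GC over JMX, e.g. from jconsole or a
command-line JMX client. A sketch using the third-party jmxterm jar (the
jar name and the default JMX port 7199 are assumptions; adjust for your
setup):

  # Invoke gc() on the java.lang:type=Memory MBean; Cassandra unlinks
  # SSTables that already carry a .compacted marker once their in-memory
  # references are collected.
  echo "run -b java.lang:type=Memory gc" | \
    java -jar jmxterm-uber.jar -l localhost:7199 -n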

-Original Message-
From: AJ [mailto:a...@dude.podzone.net] 
Sent: Tuesday, June 07, 2011 10:15 AM
To: user@cassandra.apache.org
Subject: Re: Backups, Snapshots, SSTable Data Files, Compaction

On 6/7/2011 2:29 AM, Maki Watanabe wrote:
> You can find useful information in:
> http://www.datastax.com/docs/0.8/operations/scheduled_tasks
>
> sstables are immutable. Once it written to disk, it won't be updated.
> When you take snapshot, the tool makes hard links to sstable files.
> After certain time, you will have some times of memtable flushs, so 
> your sstable files will be merged, and obsolete sstable files will be 
> removed. But snapshot set will remains on your disk, for backup.
>

Thanks for the doc source.  I will be experimenting with 0.8.0 since it
has many features I've been waiting for.

But, still, if the snapshots don't link to all of the previous sets of
.db files, then those unlinked previous file sets MUST be safe to
manually delete.  But, they aren't deleted until later after a GC.  It's
a bit confusing why they are kept after compaction up until GC when they
seem to not be needed.  We have Big Data plans... one node can have 10's
of TBs, so I'm trying to get an idea of how much disk space will be
required and whether or not I can free-up some disk space.

Hopefully someone can still elaborate on this.




Re: upgrading to cassandra 0.8

2011-06-07 Thread Jonathan Ellis
Even schema changes *should* work, although to be safe, the less
"unusual" stuff you do with a mixed-version cluster, the better.

However, any kind of streaming (bootstrap, node movement,
decommission, nodetool repair) will not work.
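
A sketch of the per-node sequence for that kind of rolling upgrade (package
and init-script steps are assumptions; adjust for how the nodes are
installed):

  # One node at a time: flush memtables and stop accepting writes, stop the
  # process, swap in the 0.8 binaries, restart, and confirm the node has
  # rejoined the ring before moving on to the next one.
  nodetool -h <node> drain
  sudo /etc/init.d/cassandra stop
  # install or unpack the 0.8.0 release here
  sudo /etc/init.d/cassandra start
  nodetool -h <node> ring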

On Tue, Jun 7, 2011 at 10:07 AM, Edward Capriolo  wrote:
>
>
> On Tue, Jun 7, 2011 at 10:54 AM, Sasha Dolgy  wrote:
>>
>> Hi,
>>
>> Good news on the 0.8 release.  So ... if I upgrade one node out of four,
>> and let it run for a bit ... I should have no issues, correct?  If I make
>> schema changes, specifically, adding a new column family for counters, how
>> will this behave with the other three nodes that aren't upgraded?  Or ...
>> should schema changes not be done until all nodes are upgraded?
>>
>> --
>> Sasha Dolgy
>> sasha.do...@gmail.com
>
> Do not make schema changes until the upgrade is complete. It will behave
> very poorly I suspect, and you will be sad.
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: upgrading to cassandra 0.8

2011-06-07 Thread Sasha Dolgy
Thanks everyone ... upgrade to 0.8 on all nodes is a first priority then
...

On Tue, Jun 7, 2011 at 5:28 PM, Jonathan Ellis  wrote:

> Even schema changes *should* work, although to be safe, the less
> "unusual" stuff you do with a mixed-version cluster, the better.
>
> However, any kind of streaming (bootstrap, node movement,
> decommission, nodetool repair) will not work.
>
> On Tue, Jun 7, 2011 at 10:07 AM, Edward Capriolo 
> wrote:
> >
> >
> > On Tue, Jun 7, 2011 at 10:54 AM, Sasha Dolgy  wrote:
> >>
> >> Hi,
> >>
> >> Good news on the 0.8 release.  So ... if I upgrade one node out of four,
> >> and let it run for a bit ... I should have no issues, correct?  If I
> make
> >> schema changes, specifically, adding a new column family for counters,
> how
> >> will this behave with the other three nodes that aren't upgraded?  Or
> ...
> >> should schema changes not be done until all nodes are upgraded?
> >>
> >> --
> >> Sasha Dolgy
> >> sasha.do...@gmail.com
> >
> > Do not make schema changes until the upgrade is complete. It will behave
> > very poorly I suspect, and you will be sad.
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>



-- 
Sasha Dolgy
sasha.do...@gmail.com


Re: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-07 Thread Benjamin Coverston

Hi AJ,

Unfortunately, for storage capacity planning it's a bit of a guessing 
game. Until you run your load against it and profile the usage you just 
are not going to know for sure. I have seen cases where planning to have 
50% excess capacity/node was plenty, and I have seen other extreme cases 
where 3x planned capacity was not enough when replica counts and entropy 
levels are high.


Cassandra will _try_ to work within the resource restrictions that you 
give it, but keep in mind that if it has excess resources in terms of 
disk space it may be a bit more lazy than you would expect in getting 
rid of some of the extra files that are sitting around waiting to be 
deleted. You know if they are scheduled to be deleted have a .compacted 
marker. If you want to actually SEE this happen use the stress.java or 
stress.py tools and do several test runs with different workloads. I 
think actually watching it happen would be enlightening for you.
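
A sketch of that kind of test run, using the stress.py that ships in the
Cassandra source tree (contrib/py_stress in 0.7; paths and flags vary a bit
by version, so treat this as illustrative):

  # Write a million rows, read them back, and watch the keyspace's data
  # directory grow and shrink as flushes and compactions happen.
  python contrib/py_stress/stress.py -d 127.0.0.1 -o insert -n 1000000
  python contrib/py_stress/stress.py -d 127.0.0.1 -o read -n 1000000
  watch -n 5 'ls -lh /var/lib/cassandra/data/Keyspace1/'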


Lastly while I have seen a few instances where people have chosen to use 
node sizes with 10's of TB, it is an unusual case. Most node sizing I 
have seen falls in the range of 20-250GB. Not to say that there aren't 
workloads where having many TB/Node doesn't work, but if you're planning 
to read from the data you're writing you do want to ensure that your 
working set is stored in memory.


HTH,
Ben


On 6/7/11 9:14 AM, AJ wrote:

On 6/7/2011 2:29 AM, Maki Watanabe wrote:

You can find useful information in:
http://www.datastax.com/docs/0.8/operations/scheduled_tasks

sstables are immutable. Once it written to disk, it won't be updated.
When you take snapshot, the tool makes hard links to sstable files.
After certain time, you will have some times of memtable flushs, so
your sstable files will be merged, and obsolete sstable files will be
removed. But snapshot set will remains on your disk, for backup.



Thanks for the doc source.  I will be experimenting with 0.8.0 since 
it has many features I've been waiting for.


But, still, if the snapshots don't link to all of the previous sets of 
.db files, then those unlinked previous file sets MUST be safe to 
manually delete.  But, they aren't deleted until later after a GC.  
It's a bit confusing why they are kept after compaction up until GC 
when they seem to not be needed.  We have Big Data plans... one node 
can have 10's of TBs, so I'm trying to get an idea of how much disk 
space will be required and whether or not I can free-up some disk space.


Hopefully someone can still elaborate on this.




--
Ben Coverston
Director of Operations
DataStax -- The Apache Cassandra Company
http://www.datastax.com/



Re: Installing Thrift with Solandra

2011-06-07 Thread Jean-Nicolas Boulay Desjardins
Ok

So I have to install Thrift and Cassandra, then Solandra.

I am asking because I followed the instructions in your Git page but I get
this error:

# cd solandra-app; ./start-solandra.sh

-bash: ./start-solandra.sh: No such file or directory

Thanks again :)

On Tue, Jun 7, 2011 at 7:55 AM, Jake Luciani  wrote:

> This seems to be a common cause of confusion. Let me try again.
>
> Solandra doesn't integrate your Cassandra data into solr. It simply
> provides a scalable backend for solr by
> Building on Cassandra. The inverted index lives in it's own Cassandra
> keyspace.
>
> What you have in the end is two functionally different components
> (Cassandra and solr) in one logical service.
>
> Jake
>
> On Tuesday, June 7, 2011, Jean-Nicolas Boulay Desjardins
>  wrote:
> > I just saw a post you made on Stackoverflow, where you said:
> > "The Solandra project which is replacing Lucandra no longer uses thrift,
> only Solr."
> >
> > So I use Solr to access my data in Cassandra?
> > Thanks again...
> > On Tue, Jun 7, 2011 at 1:39 AM, Jean-Nicolas Boulay Desjardins <
> jnbdzjn...@gmail.com> wrote:
> > Thanks again :)
> > Ok... But in the tutorial it says that I need to build a Thrift interface
> for Cassandra:
> >
> >
> > ./compiler/cpp/thrift -gen php
> ../PATH-TO-CASSANDRA/interface/cassandra.thrift
> > How do I do this?
> > Where is the interface folder?
> >
> >
> > Again, tjake, thanks a lot for your time and help.
> > On Mon, Jun 6, 2011 at 11:13 PM, Jake Luciani  wrote:
> > To access Cassandra in Solandra it's the same as regular cassandra.  To
> access Solr you use one of the Php Solr libraries
> http://wiki.apache.org/solr/SolPHP
> >
> >
> >
> >
> >
> > On Mon, Jun 6, 2011 at 11:04 PM, Jean-Nicolas Boulay Desjardins <
> jnbdzjn...@gmail.com> wrote:
> >
> >
> >
> >
> >
> > I am trying to install Thrift with Solandra.
> >
> >
> >
> > Normally when I just want to install Thrift with Cassandra, I followed
> this tutorial:
> https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP
> >
> >
> >
> >
> >
> >
> >
> > But how can I do the same for Solandra?
> >
> >
> >
> > Thrift with PHP...--
> > Name / Nom: Boulay Desjardins, Jean-Nicolas
> > Website / Site Web: www.jeannicolas.com
> >
> >
>
> --
> http://twitter.com/tjake
>



-- 
Name / Nom: Boulay Desjardins, Jean-Nicolas
Website / Site Web: www.jeannicolas.com


Re: Troubleshooting IO performance ?

2011-06-07 Thread Philippe
very even
will answer aaron's email...

will upgrade to 0.8 too !
On 7 June 2011 13:09, "Terje Marthinussen" wrote:
> If you run iostat with output every few seconds, is the I/O stable or do
> you see very uneven I/O?
>
> Regards,
> Terje
>
> On Tue, Jun 7, 2011 at 11:12 AM, aaron morton wrote:
>
>> There is a big IO queue and reads are spending a lot of time in the
queue.
>>
>> Some more questions:
>> - what version are you on ?
>> - what is the concurrent_reads config setting ?
>> - what is nodetool tpstats showing during the slow down ?
>> - exactly how much data are you asking for ? how many rows and what sort
of
>> slice
>> - have there been a lot of deletes or TTL columns used ?
>>
>> Hope that helps.
>> Aaron
>>
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 7 Jun 2011, at 10:09, Philippe wrote:
>>
>> Ok, here it goes again... No swapping at all...
>>
>> procs ---memory-- ---swap-- -io -system--
>> cpu
>> r b swpd free buff cache si so bi bo in cs us sy id
>> wa
>> 1 63 32044 88736 37996 7116524 0 0 227156 0 18314 5607 30 5
>> 11 53
>> 1 63 32044 90844 37996 7103904 0 0 233524 202 17418 4977 29 4
>> 9 58
>> 0 42 32044 91304 37996 7123884 0 0 249736 0 16197 5433 19 6
>> 3 72
>> 3 25 32044 89864 37996 7135980 0 0 223140 16 18135 7567 32 5
>> 11 52
>> 1 1 32044 88664 37996 7150728 0 0 229416 128 19168 7554 36 4
>> 10 51
>> 4 0 32044 89464 37996 7149428 0 0 213852 18 21041 8819 45 5
>> 12 38
>> 4 0 32044 90372 37996 7149432 0 0 233086 142 19909 7041 43 5
>> 10 41
>> 7 1 32044 89752 37996 7149520 0 0 206906 0 19350 6875 50 4
>> 11 35
>>
>> Lots and lots of disk activity
>> iostat -dmx 2
>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz
>> avgqu-sz await r_await w_await svctm %util
>> sda 52.50 0.00 7813.00 0.00 108.01 0.00 28.31
>> 117.15 14.89 14.89 0.00 0.11 83.00
>> sdb 56.00 0.00 7755.50 0.00 108.51 0.00 28.66
>> 118.67 15.18 15.18 0.00 0.11 82.80
>> md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>> 0.00 0.00 0.00 0.00 0.00 0.00
>> md5 0.00 0.00 15796.50 0.00 219.21 0.00 28.42
>> 0.00 0.00 0.00 0.00 0.00 0.00
>> dm-0 0.00 0.00 15796.50 0.00 219.21 0.00 28.42
>> 273.42 17.03 17.03 0.00 0.05 83.40
>> dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>> 0.00 0.00 0.00 0.00 0.00 0.00
>>
>> More info :
>> - all the data directory containing the data I'm querying into is 9.7GB
>> and this is a server with 16GB
>> - I'm hitting the server with 6 concurrent multigetsuperslicequeries on
>> multiple keys, some of them can bring back quite a number of data
>> - I'm reading all the keys for one column, pretty much sequentially
>>
>> This is a query in a rollup table that was originally in MySQL and it
>> doesn't look like the performance to query by key is better. So I'm
betting
>> I'm doing something wrong here... but what ?
>>
>> Any ideas ?
>> Thanks
>>
>> 2011/6/6 Philippe 
>>
>>> hum..no, it wasn't swapping. cassandra was the only thing running on
that
>>> server
>>> and i was querying the same keys over and over
>>>
>>> i restarted Cassandra and doing the same thing, io is now down to zero
>>> while cpu is up which doesn't surprise me as much.
>>>
>>> I'll report if it happens again.
>>> On 5 June 2011 16:55, "Jonathan Ellis" wrote:
>>>
>>> > You may be swapping.
>>> >
>>> > http://spyced.blogspot.com/2010/01/linux-performance-basics.html
>>> > explains how to check this as well as how to see what threads are busy
>>> > in the Java process.
>>> >
>>> > On Sat, Jun 4, 2011 at 5:34 PM, Philippe  wrote:
>>> >> Hello,
>>> >> I am evaluating using cassandra and I'm running into some strange IO
>>> >> behavior that I can't explain, I'd like some help/ideas to
troubleshoot
>>> it.
>>> >> I am running a 1 node cluster with a keyspace consisting of two
columns
>>> >> families, one of which has dozens of supercolumns itself containing
>>> dozens
>>> >> of columns.
>>> >> All in all, this is a couple gigabytes of data, 12GB on the hard
drive.
>>> >> The hardware is pretty good : 16GB memory + RAID-0 SSD drives with
LVM
>>> and
>>> >> an i5 processor (4 cores).
>>> >> Keyspace: xxx
>>> >> Read Count: 460754852
>>> >> Read Latency: 1.108205793092766 ms.
>>> >> Write Count: 30620665
>>> >> Write Latency: 0.01411020877567486 ms.
>>> >> Pending Tasks: 0
>>> >> Column Family: xx
>>> >> SSTable count: 5
>>> >> Space used (live): 548700725
>>> >> Space used (total): 548700725
>>> >> Memtable Columns Count: 0
>>> >> Memtable Data Size: 0
>>> >> Memtable Switch Count: 11
>>> >> Read Count: 2891192
>>> >> Read Latency: NaN ms.
>>> >> Write Count: 3157547
>>> >> Write Latency: NaN ms.
>>> >> Pending Tasks: 0
>>> >> Key cache capacity: 367396
>>> >> Key cache size: 367396
>>> >> Key cache hit rate: NaN
>>> >> Row cache capacity: 112683
>>> >> Row cache size: 112683
>>> >> Row cache hit rate: NaN
>>> >> Compacted row minimum size: 125
>>> >> Compacted row maximum size: 924
>>> >> Compac

Re: Multiple large disks in server - setup considerations

2011-06-07 Thread Ryan King
On Tue, Jun 7, 2011 at 4:34 AM, Erik Forsberg  wrote:
> On Tue, 31 May 2011 13:23:36 -0500
> Jonathan Ellis  wrote:
>
>> Have you read http://wiki.apache.org/cassandra/CassandraHardware ?
>
> I had, but it was a while ago so I guess I kind of deserved an RTFM! :-)
>
> After re-reading it, I still want to know:
>
> * If we disregard the performance hit caused by having the commitlog on
>  the same physical device as parts of the data, are there any other
>  grave effects on Cassandra's functionality with a setup like that?

You'll take a performance hit if you have a high write load. I'd
recommend doing your own benchmarks (with an existing benchmark
framework like YCSB) against the configuration you'd like to use.

> * How does Cassandra handle a case where one of the disks in a striped
>  RAID0 partition goes bad and is replaced? Is the only option to wipe
>  everything from that node and reinit the node, or will it handle
>  corrupt files?

Don't plan on being able to recover any data on that node.

> I.e, what's the recommended thing to do from an
>  operations point of view when a disk dies on one of the nodes in a
>  RAID0 Cassandra setup? What will cause the least risk for data loss?
>  What will be the fastest way to get the node up to speed with the
>  rest of the cluster?

Decommission (or removetoken) on the dead node, replace the drive and
rebootstrap.
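
A sketch of that sequence (commands and paths are illustrative; adjust for
your install):

  # From a live node, remove the dead node's token so its ranges get
  # re-replicated:
  nodetool -h <live-node> removetoken <token-of-dead-node>
  # On the repaired node, after rebuilding the RAID0 volume, start from empty:
  rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/*
  # With auto_bootstrap enabled (and an appropriate initial_token) in
  # cassandra.yaml, the node streams its ranges back from the replicas:
  sudo /etc/init.d/cassandra start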

-ryan


Re: [RELEASE] 0.8.0

2011-06-07 Thread Ryan King
On Mon, Jun 6, 2011 at 7:00 PM, Terje Marthinussen
 wrote:
> Yes, I am aware of it but it was not an alternative for this project which
> will face production soon.
> The patch I have is fairly non-intrusive (especially vs. 674) so I think it
> can be interesting depending on how quickly 674 will be integrated into
> cassandra releases.
> I plan to take a closer look at 674 soon to see if I can add something
> there.

That's fair. 674 isn't ready yet so its earliest release will be in 1.0.

-ryan


Re: multiple clusters communicating

2011-06-07 Thread Ryan King
On Mon, Jun 6, 2011 at 5:01 PM, Jeffrey Wang  wrote:
> Hey all,
>
>
>
> We're seeing a strange issue with two completely separate clusters
> (0.7.3) on the same subnet (X.X.X.146 through X.X.X.150), one with 3 machines
> (146-148) and one with 2 machines (149-150). Both of them are seeded with the
> respective machines in their cluster, yet when we run them they end up
> gossiping with each other.

Are you sure you have the seeds set properly?

> They have different cluster names so they don’t
> merge, but this is quite annoying as schema changes don’t actually go
> through. Anyone have any ideas about this? Thanks.
>
>
>
> -Jeffrey
>
>


getIndexedSlices issue using Pelops

2011-06-07 Thread Tan Huynh
Hi,



I am using the Pelops client to query a Cassandra secondary index and I get the
exception listed below.

The code is pretty simple too. I can use Cassandra-cli to query the same 
secondary index, so there must be something wrong in my code. If you've seen 
this issue, would you please point me to what I am doing wrong.

Thanks.

Tan



org.scale7.cassandra.pelops.exceptions.ApplicationException: Internal error
processing get_indexed_slices

  at 
org.scale7.cassandra.pelops.exceptions.IExceptionTranslator$ExceptionTranslator.translate(IExceptionTranslator.java:49)

  at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:109)

  at 
org.scale7.cassandra.pelops.Selector.getIndexedColumns(Selector.java:1623)

  at 
org.scale7.cassandra.pelops.Selector.getIndexedColumns(Selector.java:1578)

  at TestSecondary.<init>(TestSecondary.java:67)

  at TestSecondary.main(TestSecondary.java:91)

Caused by: org.apache.thrift.TApplicationException: Internal error processing 
get_indexed_slices

  at 
org.apache.thrift.TApplicationException.read(TApplicationException.java:108)

  at 
org.apache.cassandra.thrift.Cassandra$Client.recv_get_indexed_slices(Cassandra.java:772)

  at 
org.apache.cassandra.thrift.Cassandra$Client.get_indexed_slices(Cassandra.java:752)

  at org.scale7.cassandra.pelops.Selector$15.execute(Selector.java:1613)

  at org.scale7.cassandra.pelops.Selector$15.execute(Selector.java:1610)

  at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:82)

  ... 4 more



This is the code:

try {
    String collectionName = "test";
    KsDef keyspaceDefinition = null;

    cluster = new Cluster("localhost", RPC_PORT);
    ClusterManager clusterManager = Pelops.createClusterManager(cluster);

    KeyspaceManager keyspaceManager = Pelops.createKeyspaceManager(cluster);
    keyspaceDefinition = keyspaceManager.getKeyspaceSchema(KEYSPACE);

    if (keyspaceDefinition != null) {
        Pelops.addPool(POOL, cluster, KEYSPACE);

        IndexClause indexClause = Selector.newIndexClause(
            Bytes.EMPTY,
            Integer.MAX_VALUE,
            Selector.newIndexExpression("birth_date", IndexOperator.EQ,
                Bytes.fromLong(1973)));

        SlicePredicate slicePredicate = Selector.newColumnsPredicateAll(false,
            Integer.MAX_VALUE);

        Selector selector = Pelops.createSelector(POOL);
        Map<Bytes, List<Column>> qResults = selector.getIndexedColumns(
            collectionName, indexClause, slicePredicate, ConsistencyLevel.ONE);
    }
} catch (PelopsException e) {
    e.printStackTrace();
} catch (Exception e1) {
    e1.printStackTrace();
}





Re: getIndexedSlices issue using Pelops

2011-06-07 Thread Jonathan Ellis
internal error means look at the cassandra server logs for the stacktrace.

On Tue, Jun 7, 2011 at 12:20 PM, Tan Huynh  wrote:
> Hi,
>
>
>
> I am using Pelops client to query Cassandra secondary index and I get the
> exception listed below.
>
> The code is pretty simple too. I can use Cassandra-cli to query the same
> secondary index, so there must be something wrong in my code. If you've seen
> this issue, would you please point me to what I am doing wrong.
>
> Thanks.
>
> Tan
>
>
>
> org.scale7.cassandra.pelops.exceptions.ApplicationException: Internal
> error processing get_indexed_slices
>
>   at
> org.scale7.cassandra.pelops.exceptions.IExceptionTranslator$ExceptionTranslator.translate(IExceptionTranslator.java:49)
>
>   at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:109)
>
>   at
> org.scale7.cassandra.pelops.Selector.getIndexedColumns(Selector.java:1623)
>
>   at
> org.scale7.cassandra.pelops.Selector.getIndexedColumns(Selector.java:1578)
>
>   at TestSecondary.(TestSecondary.java:67)
>
>   at TestSecondary.main(TestSecondary.java:91)
>
> Caused by: org.apache.thrift.TApplicationException: Internal error
> processing get_indexed_slices
>
>   at
> org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
>
>   at
> org.apache.cassandra.thrift.Cassandra$Client.recv_get_indexed_slices(Cassandra.java:772)
>
>   at
> org.apache.cassandra.thrift.Cassandra$Client.get_indexed_slices(Cassandra.java:752)
>
>   at org.scale7.cassandra.pelops.Selector$15.execute(Selector.java:1613)
>
>   at org.scale7.cassandra.pelops.Selector$15.execute(Selector.java:1610)
>
>   at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:82)
>
>   ... 4 more
>
>
>
> This is the code:
>
>    try {
>
>   String collectionName = "test";
>
>   KsDef keyspaceDefinition = null;
>
>
>
>   cluster = new Cluster("localhost", RPC_PORT);
>
>   ClusterManager clusterManager = Pelops.createClusterManager(cluster);
>
>
>
>
>
>   KeyspaceManager keyspaceManager =
> Pelops.createKeyspaceManager(cluster);
>
>
>
>   keyspaceDefinition = keyspaceManager.getKeyspaceSchema(KEYSPACE);
>
>
>
>   if (keyspaceDefinition != null) {
>
>     Pelops.addPool(POOL, cluster, KEYSPACE);
>
>
>
>     IndexClause indexClause;
>
>     indexClause = Selector.newIndexClause(
>
>     Bytes.EMPTY,
>
>     Integer.MAX_VALUE,
>
>     Selector.newIndexExpression("birth_date", IndexOperator.EQ,
>
>     Bytes.fromLong(1973)));
>
>
>
>     SlicePredicate slicePredicate =
> Selector.newColumnsPredicateAll(false,
>
>     Integer.MAX_VALUE);
>
>
>
>     Selector selector = Pelops.createSelector(POOL);
>
>     Map> qResults = selector.getIndexedColumns(
>
>     collectionName, indexClause, slicePredicate,
> ConsistencyLevel.ONE);
>
>     }
>
>   }
>
>     } catch (PelopsException e) {
>
>   e.printStackTrace();
>
>     } catch (Exception e1) {
>
>   e1.printStackTrace();
>
>     }
>
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: getIndexedSlices issue using Pelops

2011-06-07 Thread Jonathan Ellis
... also, are you on 0.7.6?  "works on cli but internal error w/
pelops" sounds like pelops is giving an invalid request, 0.7.6 is
better at catching those and giving a real error message.

On Tue, Jun 7, 2011 at 12:31 PM, Jonathan Ellis  wrote:
> internal error means look at the cassandra server logs for the stacktrace.
>
> On Tue, Jun 7, 2011 at 12:20 PM, Tan Huynh  wrote:
>> Hi,
>>
>>
>>
>> I am using Pelops client to query Cassandra secondary index and I get the
>> exception listed below.
>>
>> The code is pretty simple too. I can use Cassandra-cli to query the same
>> secondary index, so there must be something wrong in my code. If you've seen
>> this issue, would you please point me to what I am doing wrong.
>>
>> Thanks.
>>
>> Tan
>>
>>
>>
>> org.scale7.cassandra.pelops.exceptions.ApplicationException: Internal
>> error processing get_indexed_slices
>>
>>   at
>> org.scale7.cassandra.pelops.exceptions.IExceptionTranslator$ExceptionTranslator.translate(IExceptionTranslator.java:49)
>>
>>   at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:109)
>>
>>   at
>> org.scale7.cassandra.pelops.Selector.getIndexedColumns(Selector.java:1623)
>>
>>   at
>> org.scale7.cassandra.pelops.Selector.getIndexedColumns(Selector.java:1578)
>>
>>   at TestSecondary.(TestSecondary.java:67)
>>
>>   at TestSecondary.main(TestSecondary.java:91)
>>
>> Caused by: org.apache.thrift.TApplicationException: Internal error
>> processing get_indexed_slices
>>
>>   at
>> org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
>>
>>   at
>> org.apache.cassandra.thrift.Cassandra$Client.recv_get_indexed_slices(Cassandra.java:772)
>>
>>   at
>> org.apache.cassandra.thrift.Cassandra$Client.get_indexed_slices(Cassandra.java:752)
>>
>>   at org.scale7.cassandra.pelops.Selector$15.execute(Selector.java:1613)
>>
>>   at org.scale7.cassandra.pelops.Selector$15.execute(Selector.java:1610)
>>
>>   at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:82)
>>
>>   ... 4 more
>>
>>
>>
>> This is the code:
>>
>>    try {
>>
>>   String collectionName = "test";
>>
>>   KsDef keyspaceDefinition = null;
>>
>>
>>
>>   cluster = new Cluster("localhost", RPC_PORT);
>>
>>   ClusterManager clusterManager = Pelops.createClusterManager(cluster);
>>
>>
>>
>>
>>
>>   KeyspaceManager keyspaceManager =
>> Pelops.createKeyspaceManager(cluster);
>>
>>
>>
>>   keyspaceDefinition = keyspaceManager.getKeyspaceSchema(KEYSPACE);
>>
>>
>>
>>   if (keyspaceDefinition != null) {
>>
>>     Pelops.addPool(POOL, cluster, KEYSPACE);
>>
>>
>>
>>     IndexClause indexClause;
>>
>>     indexClause = Selector.newIndexClause(
>>
>>     Bytes.EMPTY,
>>
>>     Integer.MAX_VALUE,
>>
>>     Selector.newIndexExpression("birth_date", IndexOperator.EQ,
>>
>>     Bytes.fromLong(1973)));
>>
>>
>>
>>     SlicePredicate slicePredicate =
>> Selector.newColumnsPredicateAll(false,
>>
>>     Integer.MAX_VALUE);
>>
>>
>>
>>     Selector selector = Pelops.createSelector(POOL);
>>
>>     Map> qResults = selector.getIndexedColumns(
>>
>>     collectionName, indexClause, slicePredicate,
>> ConsistencyLevel.ONE);
>>
>>     }
>>
>>   }
>>
>>     } catch (PelopsException e) {
>>
>>   e.printStackTrace();
>>
>>     } catch (Exception e1) {
>>
>>   e1.printStackTrace();
>>
>>     }
>>
>>
>>
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


CLI set command returns null

2011-06-07 Thread AJ

Ver 0.8.0.

Please help.  I don't know what I'm doing wrong.  One simple keyspace 
with one simple CF with one simple column.  I've tried two simple 
tutorials.  Is there a common newbie mistake I could be making???


Thanks in advance!


[default@Keyspace1] describe keyspace;
Keyspace: Keyspace1:
  Replication Strategy: 
org.apache.cassandra.locator.NetworkTopologyStrategy

Options: [replication_factor:1]
  Column Families:
ColumnFamily: User
  Key Validation Class: org.apache.cassandra.db.marshal.LongType
  Default column value validator: 
org.apache.cassandra.db.marshal.UTF8Type

  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 0.2859375/61/1440 (millions of ops/MB/minutes)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: false
  Built indexes: []
  Column Metadata:
Column Name: name
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
[default@Keyspace1]
[default@Keyspace1] set User[long(1)][utf8('name')]=utf8('aaa');
null
[default@Keyspace1] set User[1]['name']='aaa';
null
[default@Keyspace1]
[default@Keyspace1] list User;
Using default limit of 100
null
[default@Keyspace1]





Re: CLI set command returns null

2011-06-07 Thread Dan Kuebrich
Null response may mean an error on the server side.  Have you checked your
cassandra server's logs?

On Tue, Jun 7, 2011 at 2:22 PM, AJ  wrote:

> Ver 0.8.0.
>
> Please help.  I don't know what I'm doing wrong.  One simple keyspace with
> one simple CF with one simple column.  I've tried two simple tutorials.  Is
> there a common newbie mistake I could be making???
>
> Thanks in advance!
>
>
> [default@Keyspace1] describe keyspace;
> Keyspace: Keyspace1:
>  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>Options: [replication_factor:1]
>  Column Families:
>ColumnFamily: User
>  Key Validation Class: org.apache.cassandra.db.marshal.LongType
>  Default column value validator:
> org.apache.cassandra.db.marshal.UTF8Type
>  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>  Row cache size / save period in seconds: 0.0/0
>  Key cache size / save period in seconds: 20.0/14400
>  Memtable thresholds: 0.2859375/61/1440 (millions of ops/MB/minutes)
>  GC grace seconds: 864000
>  Compaction min/max thresholds: 4/32
>  Read repair chance: 1.0
>  Replicate on write: false
>  Built indexes: []
>  Column Metadata:
>Column Name: name
>  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
> [default@Keyspace1]
> [default@Keyspace1] set User[long(1)][utf8('name')]=utf8('aaa');
> null
> [default@Keyspace1] set User[1]['name']='aaa';
> null
> [default@Keyspace1]
> [default@Keyspace1] list User;
> Using default limit of 100
> null
> [default@Keyspace1]
>
>
>
>


Re: CLI set command returns null

2011-06-07 Thread Jonathan Ellis
try running cli with --debug

On Tue, Jun 7, 2011 at 1:22 PM, AJ  wrote:
> Ver 0.8.0.
>
> Please help.  I don't know what I'm doing wrong.  One simple keyspace with
> one simple CF with one simple column.  I've tried two simple tutorials.  Is
> there a common newbie mistake I could be making???
>
> Thanks in advance!
>
>
> [default@Keyspace1] describe keyspace;
> Keyspace: Keyspace1:
>  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>    Options: [replication_factor:1]
>  Column Families:
>    ColumnFamily: User
>      Key Validation Class: org.apache.cassandra.db.marshal.LongType
>      Default column value validator:
> org.apache.cassandra.db.marshal.UTF8Type
>      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>      Row cache size / save period in seconds: 0.0/0
>      Key cache size / save period in seconds: 20.0/14400
>      Memtable thresholds: 0.2859375/61/1440 (millions of ops/MB/minutes)
>      GC grace seconds: 864000
>      Compaction min/max thresholds: 4/32
>      Read repair chance: 1.0
>      Replicate on write: false
>      Built indexes: []
>      Column Metadata:
>        Column Name: name
>          Validation Class: org.apache.cassandra.db.marshal.UTF8Type
> [default@Keyspace1]
> [default@Keyspace1] set User[long(1)][utf8('name')]=utf8('aaa');
> null
> [default@Keyspace1] set User[1]['name']='aaa';
> null
> [default@Keyspace1]
> [default@Keyspace1] list User;
> Using default limit of 100
> null
> [default@Keyspace1]
>
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-07 Thread AJ

Thanks to everyone who responded thus far.


On 6/7/2011 10:16 AM, Benjamin Coverston wrote:

Not to say that there aren't workloads where having many TB/Node 
doesn't work, but if you're planning to read from the data you're 
writing you do want to ensure that your working set is stored in memory.




Thank you Ben.  Can you elaborate some more on the above point?  Are you 
referring to the OS's working set or the Cassandra caches?  Why exactly 
do I need to ensure this?


I am also wondering if there is any reason I should segregate my 
frequently write/read smallish data set (such as usage statistics) from 
my bulk mostly read-only data set (static content) into separate CFs if 
the schema allows it.  Would this be of any benefit?


Re: CLI set command returns null

2011-06-07 Thread AJ

The log only shows INFO level messages about flushes, etc..

The debug mode of the CLI shows an exception after the set:

[al@mars ~]$ cassandra-cli -h 192.168.1.101 --debug
Connected to: "Test Cluster" on 192.168.1.101/9160
Welcome to the Cassandra CLI.

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] use Keyspace1;
Authenticated to keyspace: Keyspace1
[default@Keyspace1] set User[1]['name']='aaa';
null
java.lang.RuntimeException
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:292)
at 
org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)

at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)
[default@Keyspace1]



On 6/7/2011 12:27 PM, Jonathan Ellis wrote:

try running cli with --debug

On Tue, Jun 7, 2011 at 1:22 PM, AJ  wrote:

Ver 0.8.0.

Please help.  I don't know what I'm doing wrong.  One simple keyspace with
one simple CF with one simple column.  I've tried two simple tutorials.  Is
there a common newbie mistake I could be making???

Thanks in advance!


[default@Keyspace1] describe keyspace;
Keyspace: Keyspace1:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
Options: [replication_factor:1]
  Column Families:
ColumnFamily: User
  Key Validation Class: org.apache.cassandra.db.marshal.LongType
  Default column value validator:
org.apache.cassandra.db.marshal.UTF8Type
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 0.2859375/61/1440 (millions of ops/MB/minutes)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: false
  Built indexes: []
  Column Metadata:
Column Name: name
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
[default@Keyspace1]
[default@Keyspace1] set User[long(1)][utf8('name')]=utf8('aaa');
null
[default@Keyspace1] set User[1]['name']='aaa';
null
[default@Keyspace1]
[default@Keyspace1] list User;
Using default limit of 100
null
[default@Keyspace1]











About Brisk, Hadoop powered by Cassandra

2011-06-07 Thread Marcos Ortiz

Regards to all.
I was reading about this DataStax product called Brisk, and I think 
it's an amazing piece of technology.

A few questions:
- Is Brisk a proprietary technology?
- Can anyone participate in its development? (I'm very interested in 
this Hadoop-Cassandra integration.)

- Which Cassandra version does Brisk use?

Thanks a lot for your time

--
Marcos Luís Ortíz Valmaseda
 Software Engineer (UCI)
 http://marcosluis2186.posterous.com
 http://twitter.com/marcosluis2186
  



Re: Troubleshooting IO performance ?

2011-06-07 Thread Philippe
Aaron,

> - what version are you on ?
0.7.6-2

> - what is the concurrent_reads config setting ?
concurrent_reads: 64
concurrent_writes: 64

Given that I've got 4 cores and SSD drives, I doubled the recommended
concurrent_writes.
Given that I've RAID-0ed the SSD drives, I figured I could at least double
for SSD and double again for RAID-0 over the recommended value.
Wrong assumptions ?

BTW, cassandra is running on an XFS filesystem over LVM over RAID-0

> - what is nodetool tpstats showing during the slow down ?
The only value that changes is the ReadStage line. Here's values from a
sample every second
Pool NameActive   Pending  Completed
ReadStage64 99303  463085056
ReadStage64 88430  463095929
ReadStage64 91937  463107782

So basically, I'm flooding the system right ? For example 99303 means there
are 99303 key reads pending, possibly from just a couple MultiSlice gets ?


> - exactly how much data are you asking for ? how many rows and what sort of
> slice
According to some munin monitoring, the server is cranking out to the
client, over the network, 10Mbits/s = 1.25 Mbytes/s

The same munin monitoring shows me 200Mbytes/s read from the disks. This is
what is worrying me...

> - has there been a lot of deletes or TTL columns used ?
No deletes, only update, don't know if that counts as deletes though...

This is going to be a read-heavy, update-heavy cluster.
No TTL columns, no counter columns

One question : when nodetool cfstats says the average read latency is 5ms,
is that counted once the query is being executed or does that include the
time spent "pending" ?

Thanks
Philippe

>
> Hope that helps.
> Aaron
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 7 Jun 2011, at 10:09, Philippe wrote:
>
> Ok, here it goes again... No swapping at all...
>
> procs ---memory-- ---swap-- -io -system--
> cpu
>  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id
> wa
>  1 63  32044  88736  37996 711652400 227156 0 18314 5607 30  5
> 11 53
>  1 63  32044  90844  37996 710390400 233524   202 17418 4977 29  4
>  9 58
>  0 42  32044  91304  37996 712388400 249736 0 16197 5433 19  6
>  3 72
>  3 25  32044  89864  37996 713598000 22314016 18135 7567 32  5
> 11 52
>  1  1  32044  88664  37996 715072800 229416   128 19168 7554 36  4
> 10 51
>  4  0  32044  89464  37996 714942800 21385218 21041 8819 45  5
> 12 38
>  4  0  32044  90372  37996 714943200 233086   142 19909 7041 43  5
> 10 41
>  7  1  32044  89752  37996 714952000 206906 0 19350 6875 50  4
> 11 35
>
> Lots and lots of disk activity
> iostat -dmx 2
> Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> sda  52.50 0.00 7813.000.00   108.01 0.0028.31
>   117.15   14.89   14.890.00   0.11  83.00
> sdb  56.00 0.00 7755.500.00   108.51 0.0028.66
>   118.67   15.18   15.180.00   0.11  82.80
> md1   0.00 0.000.000.00 0.00 0.00 0.00
> 0.000.000.000.00   0.00   0.00
> md5   0.00 0.00 15796.500.00   219.21 0.0028.42
> 0.000.000.000.00   0.00   0.00
> dm-0  0.00 0.00 15796.500.00   219.21 0.0028.42
>   273.42   17.03   17.030.00   0.05  83.40
> dm-1  0.00 0.000.000.00 0.00 0.00 0.00
> 0.000.000.000.00   0.00   0.00
>
> More info :
> - all the data directory containing the data I'm querying into is  9.7GB
> and this is a server with 16GB
> - I'm hitting the server with 6 concurrent multigetsuperslicequeries on
> multiple keys, some of them can bring back quite a number of data
> - I'm reading all the keys for one column, pretty much sequentially
>
> This is a query in a rollup table that was originally in MySQL and it
> doesn't look like the performance to query by key is better. So I'm betting
> I'm doing something wrong here... but what ?
>
> Any ideas ?
> Thanks
>
> 2011/6/6 Philippe 
>
>> hum..no, it wasn't swapping. cassandra was the only thing running on that
>> server
>> and i was querying the same keys over and over
>>
>> i restarted Cassandra and doing the same thing, io is now down to zero
>> while cpu is up which dosen't surprise me as much.
>>
>> I'll report if it happens again.
>> On 5 Jun 2011 at 16:55, "Jonathan Ellis"  wrote:
>>
>> > You may be swapping.
>> >
>> > http://spyced.blogspot.com/2010/01/linux-performance-basics.html
>> > explains how to check this as well as how to see what threads are busy
>> > in the Java process.
>> >
>> > On Sat, Jun 4, 2011 at 5:34 PM, Philippe  wrote:
>> >> Hello,
>> >> I am evaluating using cas

Re: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-07 Thread aaron morton
I'd also say consider what happens during maintenance and failure scenarios. 
Moving 10's TB around takes a lot longer than 100's GB. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 Jun 2011, at 06:40, AJ wrote:

> Thanks to everyone who responded thus far.
> 
> 
> On 6/7/2011 10:16 AM, Benjamin Coverston wrote:
> 
>> Not to say that there aren't workloads where having many TB/Node doesn't 
>> work, but if you're planning to read from the data you're writing you do 
>> want to ensure that your working set is stored in memory.
>> 
> 
> Thank you Ben.  Can you elaborate some more on the above point?  Are you 
> referring to the OS's working set or the Cassandra caches?  Why exactly do I 
> need to ensure this?
> 
> I am also wondering if there is any reason I should segregate my frequently 
> write/read smallish data set (such as usage statistics) from my bulk mostly 
> read-only data set (static content) into separate CFs if the schema allows 
> it.  Would this be of any benefit?



Re: About Brisk, Hadoop powered by Cassandra

2011-06-07 Thread aaron morton
it's here https://github.com/riptano/brisk under the apache v2 licence

try the #datastax-brisk irc room on freenode

cheers
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 Jun 2011, at 07:03, Marcos Ortiz wrote:

> Regards to all.
> I was reading about this DataStax product called Brisk, and I think it's 
> an amazing piece of technology.
> A few questions:
> - Is Brisk a proprietary technology?
> - Can anyone participate in its development? (I'm very interested in this 
> Hadoop-Cassandra integration.)
> - Which Cassandra version does Brisk use?
> 
> Thanks a lot for your time
> 
> -- 
> Marcos Luís Ortíz Valmaseda
> Software Engineer (UCI)
> http://marcosluis2186.posterous.com
> http://twitter.com/marcosluis2186
>  



Re: Troubleshooting IO performance ?

2011-06-07 Thread aaron morton
> So basically, I'm flooding the system right ? For example 99303 means there 
> are 99303 key reads pending, possibly from just a couple MultiSlice gets ?

Yes and then some. Each row you ask for in a multiget turns into a single row 
request in the server. You are overloading the server.

> - exactly how much data are you asking for ? how many rows and what sort of 
> slice 
> According to some munin monitoring, the server is cranking out to the client, 
> over the network, 10Mbits/s = 1.25 Mbytes/s

I was thinking in terms of rows, but that's irrelevant now. The answer is "a lot".

> BTW, cassandra is running on an XFS filesystem over LVM 
Others know more about this than me.

> One question : when nodetool cfstats says the average read latency is 5ms, is 
> that counted once the query is being executed or does that include the time 
> spent "pending" ?

In the cf stats output the latency displayed under the Keyspace is the total 
latency for all CF's / the total read count. The latency displayed for the 
individual CF's is for the actual time taken getting the columns requested for 
a row. It's taking 5ms to read the data from disk and apply the filter. 

I'd check you are reading the data you expect then wind back the number of 
requests and rows / columns requested. Get to a stable baseline and then add 
pressure to see when / how things go wrong. 
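
If it helps, here is a rough sketch of what I mean by winding back the request
size. It is plain Java, and the MultiGet interface is just a stand-in for
whatever multiget call your client library exposes (it is not a real Pelops or
Thrift method), so treat it as an outline only:

import java.util.ArrayList;
import java.util.List;

// Split one huge multiget into small batches so each request only asks
// the cluster for a handful of rows at a time.
public final class BatchedReads {
    private static final int BATCH_SIZE = 64; // tune while watching Pending in nodetool tpstats

    // Minimal stand-in for your client's multiget call (Pelops, Hector, raw Thrift, ...).
    interface MultiGet<K> {
        void fetch(List<K> keys);
    }

    static <K> void readInBatches(List<K> allKeys, MultiGet<K> multiGet) {
        for (int i = 0; i < allKeys.size(); i += BATCH_SIZE) {
            List<K> batch = allKeys.subList(i, Math.min(i + BATCH_SIZE, allKeys.size()));
            multiGet.fetch(new ArrayList<K>(batch)); // copy, since subList is only a view
        }
    }
}

Start small, watch the Pending column in nodetool tpstats, and only raise the
batch size while it stays near zero.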

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 Jun 2011, at 08:00, Philippe wrote:

> Aaron,
> 
> - what version are you on ? 
> 0.7.6-2
> 
> -  what is the concurrent_reads config setting ? 
> concurrent_reads: 64
> concurrent_writes: 64
>  
> Given that I've got 4 cores and SSD drives, I doubled the concurrent writes 
> recommended.
> Given that I've RAID-0ed the SSD drive, I figured I could at least double for 
> SSD and double for RAID-0 the recommended value.
> Wrong assumptions ?
> 
> BTW, cassandra is running on an XFS filesystem over LVM over RAID-0
> 
> - what is nodetool tpstats showing during the slow down ? 
> The only value that changes is the ReadStage line. Here's values from a 
> sample every second
> Pool NameActive   Pending  Completed
> ReadStage64 99303  463085056
> ReadStage64 88430  463095929
> ReadStage64 91937  463107782
> 
> So basically, I'm flooding the system right ? For example 99303 means there 
> are 99303 key reads pending, possibly from just a couple MultiSlice gets ?
>  
> - exactly how much data are you asking for ? how many rows and what sort of 
> slice 
> According to some munin monitoring, the server is cranking out to the client, 
> over the network, 10Mbits/s = 1.25 Mbytes/s
> 
> The same munin monitoring shows me 200Mbytes/s read from the disks. This is 
> what is worrying me...
> 
> - has there been a lot of deletes or TTL columns used ? 
> No deletes, only update, don't know if that counts as deletes though...
>  
> This is going to be a read-heavy, update-heavy cluster.
> No TTL columns, no counter columns
> 
> One question : when nodetool cfstats says the average read latency is 5ms, is 
> that counted once the query is being executed or does that include the time 
> spent "pending" ?
> 
> Thanks
> Philippe
> 
> Hope that helps. 
> Aaron
>  
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 7 Jun 2011, at 10:09, Philippe wrote:
> 
>> Ok, here it goes again... No swapping at all...
>> 
>> procs ---memory-- ---swap-- -io -system-- cpu
>>  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
>>  1 63  32044  88736  37996 711652400 227156 0 18314 5607 30  5 
>> 11 53
>>  1 63  32044  90844  37996 710390400 233524   202 17418 4977 29  4  
>> 9 58
>>  0 42  32044  91304  37996 712388400 249736 0 16197 5433 19  6  
>> 3 72
>>  3 25  32044  89864  37996 713598000 22314016 18135 7567 32  5 
>> 11 52
>>  1  1  32044  88664  37996 715072800 229416   128 19168 7554 36  4 
>> 10 51
>>  4  0  32044  89464  37996 714942800 21385218 21041 8819 45  5 
>> 12 38
>>  4  0  32044  90372  37996 714943200 233086   142 19909 7041 43  5 
>> 10 41
>>  7  1  32044  89752  37996 714952000 206906 0 19350 6875 50  4 
>> 11 35
>> 
>> Lots and lots of disk activity
>> iostat -dmx 2
>> Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s avgrq-sz 
>> avgqu-sz   await r_await w_await  svctm  %util
>> sda  52.50 0.00 7813.000.00   108.01 0.0028.31   
>> 117.15   14.89   14.890.00   0.11  83.00
>> sdb  56.00 0.00 7755.500.00   108.51 0.0028.66   
>> 118.67   15.18   15.180.00   0.11  82.80
>> md1   0.00 0.000.000.00 0.00 0.00 0

Re: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-07 Thread Benjamin Coverston
Aaron makes a good point; the happiest customers, in my opinion, are the 
ones that choose nodes on the smaller side, and more of them.


Regarding the working set, I am referring to the OS cache. On Linux, 
with JNA, Cassandra makes very effective use of memory-mapped files, 
and this is where I would expect most of your working set to reside.


The smaller the data set on each node the higher the proportion of CPU 
cycles, disk IO, network bandwidth, and memory you can dedicate to 
working with that data and making it work within your use case.
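
As a rough illustration (numbers invented purely for the example): on a 
node with 16GB of RAM and an 8GB JVM heap, something like 7GB is left 
for the OS page cache after the OS itself takes its share. If the rows 
you actually read day to day add up to 5GB, they can sit in the page 
cache and most reads never hit the disk; if they add up to 500GB on a 
multi-TB node, almost every read becomes a disk seek no matter how well 
the caches and compaction are tuned.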


Ben

On 6/7/11 2:15 PM, aaron morton wrote:

I'd also say consider what happens during maintenance and failure scenarios. 
Moving 10's TB around takes a lot longer than 100's GB.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 Jun 2011, at 06:40, AJ wrote:


Thanks to everyone who responded thus far.


On 6/7/2011 10:16 AM, Benjamin Coverston wrote:


Not to say that there aren't workloads where having many TB/Node doesn't work, 
but if you're planning to read from the data you're writing you do want to 
ensure that your working set is stored in memory.


Thank you Ben.  Can you elaborate some more on the above point?  Are you 
referring to the OS's working set or the Cassandra caches?  Why exactly do I 
need to ensure this?

I am also wondering if there is any reason I should segregate my frequently 
write/read smallish data set (such as usage statistics) from my bulk mostly 
read-only data set (static content) into separate CFs if the schema allows it.  
Would this be of any benefit?


--
Ben Coverston
Director of Operations
DataStax -- The Apache Cassandra Company
http://www.datastax.com/



Re: how to know there are some columns in a row

2011-06-07 Thread Patrick de Torcy
But I want values in my columns... Imagine a cf with authors as keys. Each
author has written several books. So each row has columns with the title as
column names and the text of the book as value (ie a lot of data). If a user
wants to know the different books for an author, I'd like to be able to have
the column names without the values, then a user can pick a book name. In
this case I can retrieve the value from this column (and only for this one).
Of course, I could have an additional column which would manage the column
names (= titles), but it's not very efficient and could be a source of
errors...
If you have a method to retrieve the number of columns of a row (without
their values), I can't see why you couldn't retrieve the column names
(without their values). It's perhaps harder than I think... But it would be
rather useful!

Thanks !

On Mon, Jun 6, 2011 at 2:08 AM, aaron morton wrote:

> You can create columns without values.
>
> Are you talking about reading them back through the API ?
>
> I would suggest looking at your data model to see if there is a better way
> to support your read patterns.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6 Jun 2011, at 10:18, Patrick de Torcy wrote:
>
> It would be definitely useful to be able to have columns (or super columns)
> names WITHOUT their values. If these ones are pretty big or if there are a
> lot of columns, that would generate traffic not necessarily needed (if in
> the end you are just interested in some column).
> Moreover it doesn't seem to be a feature too difficult to implement (well,
> I think...)
>
> Patrick
>
>
>


RE: getIndexedSlices issue using Pelops

2011-06-07 Thread Tan Huynh
Thanks Jonathan for the pointer.  It turns out the issue has to do with the count 
that I specify in the index clause (Integer.MAX_VALUE).  The 
StorageProxy.scan() method allocates a list of this size, causing Cassandra 
to run out of heap space. Changing the count to a smaller value fixes the 
problem.
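
For anyone hitting the same thing, this is roughly what the fix looks like. The
page size of 1000 is arbitrary, the imports and the Map<Bytes, List<Column>>
return type are written from memory, and the method names are the ones from my
code quoted below; treat it as a sketch rather than exact Pelops usage:

import java.util.List;
import java.util.Map;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.IndexClause;
import org.apache.cassandra.thrift.IndexOperator;
import org.apache.cassandra.thrift.SlicePredicate;
import org.scale7.cassandra.pelops.Bytes;
import org.scale7.cassandra.pelops.Pelops;
import org.scale7.cassandra.pelops.Selector;

public class BoundedIndexQuery {
    // Bound the index clause instead of asking for Integer.MAX_VALUE rows at once.
    static final int PAGE_SIZE = 1000; // arbitrary: amortises round trips without blowing the heap

    static Map<Bytes, List<Column>> firstPage(String poolName, String collectionName) {
        IndexClause indexClause = Selector.newIndexClause(
                Bytes.EMPTY,   // start key: empty means "from the beginning"
                PAGE_SIZE,     // was Integer.MAX_VALUE, which made StorageProxy.scan() pre-allocate a huge list
                Selector.newIndexExpression("birth_date", IndexOperator.EQ,
                        Bytes.fromLong(1973)));

        SlicePredicate slicePredicate = Selector.newColumnsPredicateAll(false, Integer.MAX_VALUE);

        Selector selector = Pelops.createSelector(poolName);
        // To read more than one page, rebuild the IndexClause starting from the
        // last key returned and repeat until a page comes back short.
        return selector.getIndexedColumns(collectionName, indexClause, slicePredicate,
                ConsistencyLevel.ONE);
    }
}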
Tan

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Tuesday, June 07, 2011 10:32 AM
To: user@cassandra.apache.org
Subject: Re: getIndexedSlices issue using Pelops

... also, are you on 0.7.6?  "works on cli but internal error w/
pelops" sounds like pelops is giving an invalid request, 0.7.6 is
better at catching those and giving a real error message.

On Tue, Jun 7, 2011 at 12:31 PM, Jonathan Ellis  wrote:
> internal error means look at the cassandra server logs for the stacktrace.
>
> On Tue, Jun 7, 2011 at 12:20 PM, Tan Huynh  wrote:
>> Hi,
>>
>>
>>
>> I am using Pelops client to query Cassandra secondary index and I get the
>> exception listed below.
>>
>> The code is pretty simple too. I can use Cassandra-cli to query the same
>> secondary index, so there must be something wrong in my code. If you've seen
>> this issue, would you please point me to what I am doing wrong.
>>
>> Thanks.
>>
>> Tan
>>
>>
>>
>> org.scale7.cassandra.pelops.exceptions.ApplicationException: Internal
>> error processing get_indexed_slices
>>
>>   at
>> org.scale7.cassandra.pelops.exceptions.IExceptionTranslator$ExceptionTranslator.translate(IExceptionTranslator.java:49)
>>
>>   at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:109)
>>
>>   at
>> org.scale7.cassandra.pelops.Selector.getIndexedColumns(Selector.java:1623)
>>
>>   at
>> org.scale7.cassandra.pelops.Selector.getIndexedColumns(Selector.java:1578)
>>
>>   at TestSecondary.<init>(TestSecondary.java:67)
>>
>>   at TestSecondary.main(TestSecondary.java:91)
>>
>> Caused by: org.apache.thrift.TApplicationException: Internal error
>> processing get_indexed_slices
>>
>>   at
>> org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
>>
>>   at
>> org.apache.cassandra.thrift.Cassandra$Client.recv_get_indexed_slices(Cassandra.java:772)
>>
>>   at
>> org.apache.cassandra.thrift.Cassandra$Client.get_indexed_slices(Cassandra.java:752)
>>
>>   at org.scale7.cassandra.pelops.Selector$15.execute(Selector.java:1613)
>>
>>   at org.scale7.cassandra.pelops.Selector$15.execute(Selector.java:1610)
>>
>>   at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:82)
>>
>>   ... 4 more
>>
>>
>>
>> This is the code:
>>
>>    try {
>>
>>   String collectionName = "test";
>>
>>   KsDef keyspaceDefinition = null;
>>
>>
>>
>>   cluster = new Cluster("localhost", RPC_PORT);
>>
>>   ClusterManager clusterManager = Pelops.createClusterManager(cluster);
>>
>>
>>
>>
>>
>>   KeyspaceManager keyspaceManager =
>> Pelops.createKeyspaceManager(cluster);
>>
>>
>>
>>   keyspaceDefinition = keyspaceManager.getKeyspaceSchema(KEYSPACE);
>>
>>
>>
>>   if (keyspaceDefinition != null) {
>>
>>     Pelops.addPool(POOL, cluster, KEYSPACE);
>>
>>
>>
>>     IndexClause indexClause;
>>
>>     indexClause = Selector.newIndexClause(
>>
>>     Bytes.EMPTY,
>>
>>     Integer.MAX_VALUE,
>>
>>     Selector.newIndexExpression("birth_date", IndexOperator.EQ,
>>
>>     Bytes.fromLong(1973)));
>>
>>
>>
>>     SlicePredicate slicePredicate =
>> Selector.newColumnsPredicateAll(false,
>>
>>     Integer.MAX_VALUE);
>>
>>
>>
>>     Selector selector = Pelops.createSelector(POOL);
>>
>>     Map> qResults = selector.getIndexedColumns(
>>
>>     collectionName, indexClause, slicePredicate,
>> ConsistencyLevel.ONE);
>>
>>     }
>>
>>   }
>>
>>     } catch (PelopsException e) {
>>
>>   e.printStackTrace();
>>
>>     } catch (Exception e1) {
>>
>>   e1.printStackTrace();
>>
>>     }
>>
>>
>>
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: how to know there are some columns in a row

2011-06-07 Thread Dan Kuebrich
There might not be a built-in way to do this, but if you make two rows for
each author, eg:

nabokov_fulltext [ 'lolita' : 'Lolita, light of my life ...' , ...]
nabokov_bookindex [ 'lolita' : None , ... ]

you could query the bookindex for each author without cassandra having to
load the full texts.  This would make your cassandra row cache much more
effective for this type of query as well, and you might even consider
putting it in a separate CF.

I'd also recommend compressing the data for the full text column values.
 You can't query very well against them anyway, and it will make everything
(inserts, reads, compaction) so much better.
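
Something like this, with the actual client call hand-waved away (the
ColumnWriter interface below is a stand-in for whatever mutator your library
gives you, and the row-key suffixes are just the convention from the example
above), only to show the shape of it:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.zip.GZIPOutputStream;

public final class AuthorBooks {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Stand-in for your client's column write (Pelops Mutator, Hector, raw Thrift, ...).
    interface ColumnWriter {
        void writeColumn(String rowKey, String columnName, byte[] value);
    }

    // One insert per book updates both rows: the index row stays tiny and cheap to slice.
    static void addBook(ColumnWriter writer, String author, String title, String fullText)
            throws IOException {
        writer.writeColumn(author + "_bookindex", title, new byte[0]);   // name-only column
        writer.writeColumn(author + "_fulltext", title, gzip(fullText)); // compressed body
    }

    private static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(buf);
        gz.write(text.getBytes(UTF8));
        gz.close(); // finishes the gzip stream and flushes it into buf
        return buf.toByteArray();
    }
}

Listing an author's titles is then a plain column slice on the _bookindex row;
only when the user picks a title do you read (and gunzip) that single column
from the _fulltext row.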

dan

On Tue, Jun 7, 2011 at 6:30 PM, Patrick de Torcy  wrote:

> But I want values in my columns... Imagine a cf with authors as keys. Each
> author has written several books. So each row has columns with the title as
> column names and the text of the book as value (ie a lot of data). If a user
> wants to know the different books for an author, I'd like to be able to have
> the column names without the values, then a user can pick a book name. In
> this case I can retrieve the value from this column (and only for this one).
> Of course, I could have an additional column which would manage the column
> names (= titles), but it's not very efficient and could be a source of
> errors...
> If you have a method to retrieve the number of columns of a row (without
> their values), I can't see why you couldn't retrieve the column names
> (without their values). It's perhaps harder than I think... But it would be
> rather useful !
>
> Thanks !
>
> On Mon, Jun 6, 2011 at 2:08 AM, aaron morton wrote:
>
>> You can create columns without values.
>>
>> Are you talking about reading them back through the API ?
>>
>> I would suggest looking at your data model to see if there is a better way
>> to support your read patterns.
>>
>> Cheers
>>
>>  -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 6 Jun 2011, at 10:18, Patrick de Torcy wrote:
>>
>> It would be definitely useful to be able to have columns (or super
>> columns) names WITHOUT their values. If these ones are pretty big or if
>> there are a lot of columns, that would generate traffic not necessarily
>> needed (if in the end you are just interested in some column).
>> Moreover it doesn't seem to be a feature too difficult to implement (well,
>> I think...)
>>
>> Patrick
>>
>>
>>
>


Re: how to know there are some columns in a row

2011-06-07 Thread aaron morton
> If you have a method to retrieve the number of columns of a row (without 
> their values),  I can't see why you couldn't retrieve the column names 
> (without their values). It's perhaps harder than I think... But it would be 
> rather useful ! 

Internally this just gets the full columns and counts them. 

The main reason I was dismissive was the complication it brings when dealing 
with a Column. If a Column has no value would it be because there is no value 
associated with it or because only the column name was requested? For now when 
you have a Column you have all the information about the column. 

There may also be some modelling arguments to be made. 

it's not been a show stopper for people in the past, but that does not mean 
it's a bad idea.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 Jun 2011, at 10:30, Patrick de Torcy wrote:

> But I want values in my columns... Imagine a cf with authors as keys. Each 
> author has written several books. So each row has columns with the title as 
> column names and the text of the book as value (ie a lot of data). If a user 
> wants to know the different books for an author, I'd like to be able to have 
> the column names without the values, then a user can pick a book name. In 
> this case I can retrieve the value from this column (and only for this one).
> Of course, I could have an additional column which would manage the column 
> names (= titles), but it's not very efficient and could be a source of errors...
> If you have a method to retrieve the number of columns of a row (without 
> their values),  I can't see why you couldn't retrieve the column names 
> (without their values). It's perhaps harder than I think... But it would be 
> rather useful ! 
> 
> Thanks !
> 
> On Mon, Jun 6, 2011 at 2:08 AM, aaron morton  wrote:
> You can create columns without values. 
> 
> Are you talking about reading them back through the API ? 
> 
> I would suggest looking at your data model to see if there is a better way to 
> support your read patterns. 
> 
> Cheers
> 
> -
> Aaron Morton 
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 6 Jun 2011, at 10:18, Patrick de Torcy wrote:
> 
>> It would be definitely useful to be able to have columns (or super columns) 
>> names WITHOUT their values. If these ones are pretty big or if there are a 
>> lot of columns, that would generate traffic not necessarily needed (if in 
>> the end you are just interested in some column).
>> Moreover it doesn't seem to be a feature too difficult to implement (well, I 
>> think...)
>> 
>> Patrick
> 
> 



Re: Multiple large disks in server - setup considerations

2011-06-07 Thread Edward Capriolo
On Tue, Jun 7, 2011 at 12:43 PM, Ryan King  wrote:

> On Tue, Jun 7, 2011 at 4:34 AM, Erik Forsberg  wrote:
> > On Tue, 31 May 2011 13:23:36 -0500
> > Jonathan Ellis  wrote:
> >
> >> Have you read http://wiki.apache.org/cassandra/CassandraHardware ?
> >
> > I had, but it was a while ago so I guess I kind of deserved an RTFM! :-)
> >
> > After re-reading it, I still want to know:
> >
> > * If we disregard the performance hit caused by having the commitlog on
> >  the same physical device as parts of the data, are there any other
> >  grave effects on Cassandra's functionality with a setup like that?
>
> You'll take a performance hit if you have a high write load. I'd
> recommend doing your own benchmarks (with an existing benchmark
> framework like YCSB) against the configuration you'd like to use.
>
> > * How does Cassandra handle a case where one of the disks in a striped
> >  RAID0 partition goes bad and is replaced? Is the only option to wipe
> >  everything from that node and reinit the node, or will it handle
> >  corrupt files?
>
> Don't plan on being able to recover any data on that node.
>
> > I.e, what's the recommended thing to do from an
> >  operations point of view when a disk dies on one of the nodes in a
> >  RAID0 Cassandra setup? What will cause the least risk for data loss?
> >  What will be the fastest way to get the node up to speed with the
> >  rest of the cluster?
>
> Decommission (or removetoken) on the dead node, replace the drive and
> rebootstrap.
>
> -ryan
>

I do not like large disk set-ups. I think they end up not being economical.
Most low-latency use cases want a high RAM-to-disk ratio.  Two machines with
32GB RAM are usually less expensive than one machine with 64GB of RAM.

For a machine with 1TB drives (or multiple 1TB drives) it is going to be
difficult to get enough RAM to help with random read patterns.

Also, cluster operations like joining, decommissioning, or repair can take a
*VERY* long time, maybe a day. More, smaller servers (blade style) are more
agile.


Re: CLI set command returns null, ver 0.8.0

2011-06-07 Thread AJ
Can anyone help?  The CLI seems to be having issues.  The count command 
isn't working either:


[default@Keyspace1] count User[long(1)];
Expected 8 or 0 byte long (13)
java.lang.RuntimeException: Expected 8 or 0 byte long (13)
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:284)
at 
org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)

at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)
[default@Keyspace1]
[default@Keyspace1] count User[1];;
Expected 8 or 0 byte long (1)
java.lang.RuntimeException: Expected 8 or 0 byte long (1)
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:284)
at 
org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)

at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)
[default@Keyspace1] count User['1'];
Expected 8 or 0 byte long (1)
java.lang.RuntimeException: Expected 8 or 0 byte long (1)
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:284)
at 
org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)

at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)
[default@Keyspace1] count User['12345678'];
null
java.lang.RuntimeException
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:292)
at 
org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)

at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)
[default@Keyspace1]


Granted, there are no rows in the CF yet (see probs below), but this 
exception seems to be during the parsing stage.


I've checked everything else, AFAIK, so I'm at a loss.
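
Next I may try the same write through the raw Thrift API to rule out the CLI 
itself. Something along these lines (host and port are from my setup above, 
the rest is the stock Thrift client as best I remember it, so apologies if a 
detail is off):

import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class SetUserName {
    public static void main(String[] args) throws Exception {
        TFramedTransport transport = new TFramedTransport(new TSocket("192.168.1.101", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("Keyspace1");

        ByteBuffer key = ByteBuffer.allocate(8); // row key must be an 8-byte long (LongType)
        key.putLong(0, 1L);

        Column col = new Column();
        col.setName(ByteBuffer.wrap("name".getBytes("UTF-8")));
        col.setValue(ByteBuffer.wrap("aaa".getBytes("UTF-8")));
        col.setTimestamp(System.currentTimeMillis() * 1000); // microseconds, like the CLI uses

        client.insert(key, new ColumnParent("User"), col, ConsistencyLevel.ONE);
        transport.close();
    }
}

If that write succeeds, at least the server-side write path is fine and the 
problem really is in the CLI.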

Much obliged.

On 6/7/2011 12:44 PM, AJ wrote:

The log only shows INFO level messages about flushes, etc..

The debug mode of the CLI shows an exception after the set:

[al@mars ~]$ cassandra-cli -h 192.168.1.101 --debug
Connected to: "Test Cluster" on 192.168.1.101/9160
Welcome to the Cassandra CLI.

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] use Keyspace1;
Authenticated to keyspace: Keyspace1
[default@Keyspace1] set User[1]['name']='aaa';
null
java.lang.RuntimeException
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:292)
at 
org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217)

at org.apache.cassandra.cli.CliMain.main(CliMain.java:345)
[default@Keyspace1]






Re: Multiple large disks in server - setup considerations

2011-06-07 Thread AJ

On 6/7/2011 9:32 PM, Edward Capriolo wrote:



I do not like large disk set-ups. I think they end up not being 
economical. Most low-latency use cases want a high RAM-to-disk ratio.  
Two machines with 32GB RAM are usually less expensive than one machine 
with 64GB of RAM.


For a machine with 1TB drives (or multiple 1TB drives) it is going to 
be difficult to get enough RAM to help with random read patterns.


Also, cluster operations like joining, decommissioning, or repair can 
take a *VERY* long time, maybe a day. More, smaller servers (blade 
style) are more agile.




Is there some rule-of-thumb as to how much RAM is needed per GB of 
data?  I know it probably "depends", but if you could try to explain the 
best you can that would be great!  I too am projecting "big data" 
requirements.


Re: Installing Thrift with Solandra

2011-06-07 Thread Jean-Nicolas Boulay Desjardins
I found start-solandra.sh in the resources folder, but when I execute it I
still get an error.

http://dl.dropbox.com/u/20599297/Screen%20shot%202011-06-08%20at%201.27.26%20AM.png

Thanks
again.

On Tue, Jun 7, 2011 at 12:23 PM, Jean-Nicolas Boulay Desjardins <
jnbdzjn...@gmail.com> wrote:

> Ok
>
> So I have to install Thrift and Cassandra, then Solandra.
>
> I am asking because I followed the instructions in your Git page but I get
> this error:
>
> # cd solandra-app; ./start-solandra.sh
>
> -bash: ./start-solandra.sh: No such file or directory
>
> Thanks again :)
>
> On Tue, Jun 7, 2011 at 7:55 AM, Jake Luciani  wrote:
>
>> This seems to be a common cause of confusion. Let me try again.
>>
>> Solandra doesn't integrate your Cassandra data into Solr. It simply
>> provides a scalable backend for Solr by
>> building on Cassandra. The inverted index lives in its own Cassandra
>> keyspace.
>>
>> What you have in the end is two functionally different components
>> (Cassandra and solr) in one logical service.
>>
>> Jake
>>
>> On Tuesday, June 7, 2011, Jean-Nicolas Boulay Desjardins
>>  wrote:
>> > I just saw a post you made on Stackoverflow, where you said:
>> > "The Solandra project which is replacing Lucandra no longer uses thrift,
>> only Solr."
>> >
>> > So I use Solr to access my data in Cassandra?
>> > Thanks again...
>> > On Tue, Jun 7, 2011 at 1:39 AM, Jean-Nicolas Boulay Desjardins <
>> jnbdzjn...@gmail.com> wrote:
>> > Thanks again :)
>> > Ok... But in the tutorial it says that I need to build a Thrift
>> interface for Cassandra:
>> >
>> >
>> > ./compiler/cpp/thrift -gen php
>> ../PATH-TO-CASSANDRA/interface/cassandra.thrift
>> > How do I do this?
>> > Where is the interface folder?
>> >
>> >
>> > Again, tjake, thanks a lot for your time and help.
>> > On Mon, Jun 6, 2011 at 11:13 PM, Jake Luciani  wrote:
>> > To access Cassandra in Solandra it's the same as regular cassandra.  To
>> access Solr you use one of the Php Solr libraries
>> http://wiki.apache.org/solr/SolPHP
>> >
>> >
>> >
>> >
>> >
>> > On Mon, Jun 6, 2011 at 11:04 PM, Jean-Nicolas Boulay Desjardins <
>> jnbdzjn...@gmail.com> wrote:
>> >
>> >
>> >
>> >
>> >
>> > I am trying to install Thrift with Solandra.
>> >
>> >
>> >
>> > Normally when I just want to install Thrift with Cassandra, I followed
>> this tutorial:
>> https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > But how can I do the same for Solandra?
>> >
>> >
>> >
>> > Thrift with PHP...--
>> > Name / Nom: Boulay Desjardins, Jean-Nicolas
>> > Website / Site Web: www.jeannicolas.com
>> >
>> >
>>
>> --
>> http://twitter.com/tjake
>>
>
>
>
> --
> Name / Nom: Boulay Desjardins, Jean-Nicolas
> Website / Site Web: www.jeannicolas.com
>



-- 
Name / Nom: Boulay Desjardins, Jean-Nicolas
Website / Site Web: www.jeannicolas.com


Re: Installing Thrift with Solandra

2011-06-07 Thread Krish Pan
You are trying to run Solandra from the resources directory.

Follow these steps:

1) don't use root - use a regular user
2) cd /tmp/
3) git clone git://github.com/tjake/Solandra.git
4) cd Solandra
5) ant

once you get BUILD SUCCESSFUL

6) cd solandra-app
7) ./start-solandra.sh



On Tue, Jun 7, 2011 at 10:29 PM, Jean-Nicolas Boulay Desjardins <
jnbdzjn...@gmail.com> wrote:

> I found start-solandra.sh in resources folder. But when I execute it. I
> still get an error.
>
>
> http://dl.dropbox.com/u/20599297/Screen%20shot%202011-06-08%20at%201.27.26%20AM.png
>
>
> Thanks
> again.
>
> On Tue, Jun 7, 2011 at 12:23 PM, Jean-Nicolas Boulay Desjardins <
> jnbdzjn...@gmail.com> wrote:
>
>> Ok
>>
>> So I have to install Thrift and Cassandra, then Solandra.
>>
>> I am asking because I followed the instructions in your Git page but I get
>> this error:
>>
>> # cd solandra-app; ./start-solandra.sh
>>
>> -bash: ./start-solandra.sh: No such file or directory
>>
>> Thanks again :)
>>
>> On Tue, Jun 7, 2011 at 7:55 AM, Jake Luciani  wrote:
>>
>>> This seems to be a common cause of confusion. Let me try again.
>>>
>>> Solandra doesn't integrate your Cassandra data into Solr. It simply
>>> provides a scalable backend for Solr by
>>> building on Cassandra. The inverted index lives in its own Cassandra
>>> keyspace.
>>>
>>> What you have in the end is two functionally different components
>>> (Cassandra and solr) in one logical service.
>>>
>>> Jake
>>>
>>> On Tuesday, June 7, 2011, Jean-Nicolas Boulay Desjardins
>>>  wrote:
>>> > I just saw a post you made on Stackoverflow, where you said:
>>> > "The Solandra project which is replacing Lucandra no longer uses
>>> thrift, only Solr."
>>> >
>>> > So I use Solr to access my data in Cassandra?
>>> > Thanks again...
>>> > On Tue, Jun 7, 2011 at 1:39 AM, Jean-Nicolas Boulay Desjardins <
>>> jnbdzjn...@gmail.com> wrote:
>>> > Thanks again :)
>>> > Ok... But in the tutorial it says that I need to build a Thrift
>>> interface for Cassandra:
>>> >
>>> >
>>> > ./compiler/cpp/thrift -gen php
>>> ../PATH-TO-CASSANDRA/interface/cassandra.thrift
>>> > How do I do this?
>>> > Where is the interface folder?
>>> >
>>> >
>>> > Again, tjake, thanks a lot for your time and help.
>>> > On Mon, Jun 6, 2011 at 11:13 PM, Jake Luciani 
>>> wrote:
>>> > To access Cassandra in Solandra it's the same as regular cassandra.  To
>>> access Solr you use one of the Php Solr libraries
>>> http://wiki.apache.org/solr/SolPHP
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Mon, Jun 6, 2011 at 11:04 PM, Jean-Nicolas Boulay Desjardins <
>>> jnbdzjn...@gmail.com> wrote:
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > I am trying to install Thrift with Solandra.
>>> >
>>> >
>>> >
>>> > Normally when I just want to install Thrift with Cassandra, I followed
>>> this tutorial:
>>> https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > But how can I do the same for Solandra?
>>> >
>>> >
>>> >
>>> > Thrift with PHP...--
>>> > Name / Nom: Boulay Desjardins, Jean-Nicolas
>>> > Website / Site Web: www.jeannicolas.com
>>> >
>>> >
>>>
>>> --
>>> http://twitter.com/tjake
>>>
>>
>>
>>
>> --
>> Name / Nom: Boulay Desjardins, Jean-Nicolas
>> Website / Site Web: www.jeannicolas.com
>>
>
>
>
> --
> Name / Nom: Boulay Desjardins, Jean-Nicolas
> Website / Site Web: www.jeannicolas.com
>