Re: Slow Reads

2011-07-27 Thread CASSANDRA learner
Are you using the Hector client for Java?

On Tue, Jul 26, 2011 at 11:17 PM, Priyanka  wrote:

> this is how my data looks:
> "rowkey1": {
>    "supercol1": { "col1": T, "col2": C }
>    "supercol2": { "col1": C, "col2": T }
>    "supercol3": { "col1": C, "col2": T }
> }
> "rowkey2": {
>    "supercol1": { "col1": A, "col2": A }
>    "supercol2": { "col1": A, "col2": T }
>    "supercol3": { "col1": C, "col2": T }
> }
>
> Each row has 620901 super columns, with 2 columns per super column.
> The names of the super columns are the same for all rows, but the data in
> each super column is different.
> I am trying to get the data of a particular super column, which is spread
> across all the rows but with different data.
>
> So yes, it is getting data from all rows.
> Please suggest a better way to do this.
> Thank you.
>
> The output of my query will be (for example, querying supercol1):
> rowkey1,T,C
> rowkey2,A,A
>
>
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-Reads-tp6622680p6623091.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>


Is Cassandra Secured

2011-07-27 Thread CASSANDRA learner
Hi,

My question is about security. The data is written to disk as plain
strings, right? Then how is the data secured? Is it not secured at all?


Can we store java objects and images/files in cassandra

2011-07-27 Thread CASSANDRA learner
Hi,

Can we store images, Java objects, and files in Cassandra? If so, how?
Please let me know, as I need it urgently.


Read process

2011-07-27 Thread CASSANDRA learner
Hi,

I have a question about reads. Data is stored in the commit log, memtables,
and SSTables, right? While reading, the data may be available in all three,
so where does the read happen: from the commit log, the memtable, or the
SSTables? Please explain.

Thanks


Re: Can we store java objects and images/files in cassandra

2011-07-27 Thread Oliver Dungey
You can store anything you like in Cassandra. The type of data is not
relevant, as there are no types in Cassandra; everything is stored as byte
arrays. The only relevant limit is that a column value cannot exceed 2GB (see
http://wiki.apache.org/cassandra/CassandraLimitations).

In terms of how: you just write the data like any other field. If you are
new to NoSQL, I would go and look at the Hector API and examples:
https://github.com/rantav/hector
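Whatever client is used, Cassandra only ever sees byte arrays, so "storing an image or a Java object" reduces to serializing it to bytes before the insert. A minimal sketch of that serialization step (Python stdlib for illustration only; the insert itself would go through your client, e.g. Hector's mutator):

```python
import pickle

# Cassandra column values are opaque byte arrays, so anything you can
# serialize to bytes can be stored. The write itself would go through
# your client library (Hector, pycassa, ...); this only shows the
# object <-> bytes step.

def to_bytes(obj):
    """Serialize an arbitrary object to bytes (analogous to Java
    serialization for a POJO)."""
    return pickle.dumps(obj)

def from_bytes(blob):
    """Deserialize the stored bytes back into an object."""
    return pickle.loads(blob)

# An image is already bytes on disk: open(path, "rb").read() gives a
# value you could insert directly (subject to the 2GB column limit).
profile = {"name": "alice", "avatar": b"\x89PNG..."}  # hypothetical record
blob = to_bytes(profile)
assert isinstance(blob, bytes)
assert from_bytes(blob) == profile
```

The record name and fields above are made up; the point is only that a round-trip through bytes is all Cassandra requires.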


Re: Is Cassandra Secured

2011-07-27 Thread Oliver Dungey
Out of the box there is no security turned on but there are a number of
options: http://wiki.apache.org/cassandra/StorageConfiguration,
http://wiki.apache.org/cassandra/ExtensibleAuth

If you are worried about storing data in clear text then you should encrypt
it before storage.


Re: Recovering from a multi-node cluster failure caused by OOM on repairs

2011-07-27 Thread Maki Watanabe
This kind of information is very helpful.
Thank you for sharing your experience.

maki


2011/7/27 Teijo Holzer :
> Hi,
>
> I thought I share the following with this mailing list as a number of other
> users seem to have had similar problems.
>
> We have the following set-up:
>
> OS: CentOS 5.5
> RAM: 16GB
> JVM heap size: 8GB (also tested with 14GB)
> Cassandra version: 0.7.6-2 (also tested with 0.7.7)
> Oracle JDK version: 1.6.0_26
> Number of nodes: 5
> Load per node: ~40GB
> Replication factor: 3
> Number of requests/day: 2.5 Million (95% inserts)
> Total net insert data/day: 1GB
> Default TTL for most of the data: 10 days
>
> This set-up had been operating successfully for a few months; however,
> recently we started seeing multi-node failures, usually triggered by a
> repair, but occasionally also under normal operation. A repair on node 3, 4
> or 5 would always cause the cluster as a whole to fail, whereas nodes 1 & 2
> completed their repair cycles successfully.
>
> These failures would usually result in 2 or 3 nodes becoming unresponsive
> and dropping out of the cluster, causing client failure rates to spike up
> to ~10%. We normally operate with a failure rate of <0.1%.
>
> The relevant log entries showed a complete heap memory exhaustion within 1
> minute (see log lines below where we experimented with a larger heap size of
> 14GB). Also of interest was a number of huge SliceQueryFilter collections
> running concurrently on the nodes in question (see log lines below).
>
> The way we ended up recovering from this situation was as follows. Remember,
> these steps were taken to get an unstable cluster back under control, so you
> might want to revert some of the changes once the cluster is stable again.
>
> Set "disk_access_mode: standard" in cassandra.yaml
> This allowed us to prevent the JVM blowing out the hard limit of 8GB via
> large
> mmaps. Heap size was set to 8GB (RAM/2). That meant the JVM was never using
> more than 8GB total. mlockall didn't seem to make a difference for our
> particular problem.
>
> Turn off all row & key caches via cassandra-cli, e.g.
> update column family Example with rows_cached=0;
> update column family Example with keys_cached=0;
> We were seeing compacted row maximum sizes of ~800MB from cfstats, that's
> why
> we turned them all off. Again, we saw a significant drop in the actual
> memory
> used from the available maximum of 8GB. Obviously, this will affect reads,
> but
> as 95% of our requests are inserts, it didn't matter so much for us.
>
> Bootstrap problematic node:
> Kill Cassandra
> Change "auto_bootstrap: true" in cassandra.yaml, remove own IP address from
> list of seeds (important)
> Delete all data directories (i.e. commit-log, data, saved-caches)
> Start Cassandra
> Wait for bootstrap to finish (see log & nodetool)
> Change "auto_bootstrap: false"
> (Run repair)
>
> The first bootstrap completed very quickly, so we decided to bootstrap every
> node in the cluster (not just the problematic ones). This resulted in some
> data loss. Next time we will follow each bootstrap with a repair before
> bootstrapping & repairing the next node, to minimize data loss.
>
> After this procedure, the cluster was operating normally again.
>
> We now run a continuous rolling repair, followed by a (major) compaction
> and a manual garbage collection. As the repairs are required anyway, we
> decided to run them all the time in a continuous fashion, so potential
> problems can be identified earlier.
>
> The major compaction followed by a manual GC allows us to keep the disk
> usage low on each node. The manual GC is necessary as the unused files on
> disk are only really deleted when the reference is garbage collected inside
> the JVM (a restart would achieve the same).
>
> We also collected some statistics in regards to the duration of some of the
> operations:
>
> cleanup/compact: ~1 min/GB
> repair: ~2-3 min/GB
> bootstrap: ~1 min/GB
>
> This means that if you have a node with 60GB of data, it will take ~1hr to
> compact and ~2-3hrs to repair. Therefore, it is advisable to keep the data
> per
> node below ~120GB. We achieve this by using an aggressive TTL on most of our
> writes.
>
> Cheers,
>
>   Teijo
>
> Here are the relevant log entries showing the OOM conditions:
>
>
> [2011-07-21 11:12:11,059] INFO: GC for ParNew: 1141 ms, 509843976 reclaimed
> leaving 1469443752 used; max is 14675869696 (ScheduledTasks:1
> GCInspector.java:128)
> [2011-07-21 11:12:15,226] INFO: GC for ParNew: 1149 ms, 564409392 reclaimed
> leaving 2247228920 used; max is 14675869696 (ScheduledTasks:1
> GCInspector.java:128)
> ...
> [2011-07-21 11:12:55,062] INFO: GC for ParNew: 1110 ms, 564365792 reclaimed
> leaving 12901974704 used; max is 14675869696 (ScheduledTasks:1
> GCInspector.java:128)
>
> [2011-07-21 10:57:23,548] DEBUG: collecting 4354206 of 2147483647:
> 940657e5b3b0d759eb4a14a7228ae365:false:41@1311102443362542 (ReadStage:27
> SliceQueryFilter.java:123)
>



-- 
w3m

Too many open files

2011-07-27 Thread Donna Li
All:

What does the following error mean? One of my Cassandra servers prints
this error, and nodetool shows the state of the server as down. Netstat
shows that very few sockets are open.

 

WARN [main] 2011-07-27 16:14:04,872 CustomTThreadPoolServer.java (line 104)
Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
	at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124)
	at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
	at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
	at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:98)
	at org.apache.cassandra.thrift.CassandraDaemon.start(CassandraDaemon.java:183)
	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
Caused by: java.net.SocketException: Too many open files
	at java.net.PlainSocketImpl.socketAccept(Native Method)
	at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
	at java.net.ServerSocket.implAccept(ServerSocket.java:453)
	at java.net.ServerSocket.accept(ServerSocket.java:421)
	at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119)
	... 5 more

 

Best Regards

Donna li



Re: Can we store java objects and images/files in cassandra

2011-07-27 Thread CASSANDRA learner
Thanks for the response, Oliver.

Can you please let me know where I can find an example of storing images
and files? I could not find one at that link.

On Wed, Jul 27, 2011 at 1:48 PM, Oliver Dungey wrote:

> You can store anything you like in Cassandra. The type of data is not
> relevant as there are no types in Cassandra, they all get stored as byte
> arrays. The only relevant limit is a column value cannot exceed 2GB (see
> http://wiki.apache.org/cassandra/CassandraLimitations).
>
> In terms of how: you just write the data like any other field. If you are
> new to NOSQL I would go and look at the Hector API and examples:
> https://github.com/rantav/hector
>
>
>


Re: Cassandra 0.7.8 and 0.8.1 fail when major compaction on 37GB database

2011-07-27 Thread lebron james
I zipped the latest version of Cassandra together with the yaml file and my
37GB database. Can anybody download it and run a major compaction to check
what is wrong, and why Cassandra fails with an OutOfMemory exception? Thanks!
Here is the link to download:
http://213.186.117.181/apache-cassandra-0.8.2-bin.zip


Re: Slow Reads

2011-07-27 Thread Priyanka Ganuthula
Yes, I am using Hector for Java.

On Wed, Jul 27, 2011 at 3:35 AM, CASSANDRA learner <
cassandralear...@gmail.com> wrote:

> Are you using the Hector client for Java?
>
>
> On Tue, Jul 26, 2011 at 11:17 PM, Priyanka  wrote:
>
>> this is how my data looks
>> “rowkey1”:{
>>“supercol1”:{ “col1”:T,”col2”:C}
>>“supercol2”:{“col1”:C,”col2”:T }
>>“supercol3”:{ “col1”:C,”col2”:T}
>>}
>> "rowkey2”:{
>>   “supercol1”:{ “col1”:A,”col2”:A}
>>“supercol2”:{“col1”:A,”col2”:T }
>>“supercol3”:{ “col1”:C,”col2”:T}
>> }
>>
>> each row has 620901 super columns and 2 columns for each super column.
>> Name of the super columns remain same for all the rows but the data in
>> each
>> super column is different.
>> I am trying to get the data of a particular super col which is spread
>> across
>> all the rows but with different data.
>>
>> So  yes,its getting data from all rows.
>> Please suggest me a better way to do so.
>> Thank you.
>>
>> the output of my query will be (suppose if i do for supercol1)
>> rowkey1,T,C
>> rowkey2,A,A
>>
>>
>>
>> --
>> View this message in context:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-Reads-tp6622680p6623091.html
>> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
>> Nabble.com.
>>
>
>


Re: Too many open files

2011-07-27 Thread Peter Schuller
> What does the following error mean? One of my cassandra servers print this
> error, and nodetool shows the state of the server is down. Netstat result
> shows the socket number is very few.

The operating system enforced limits have been hit, so Cassandra is
unable to create additional file descriptors (so it can't open files,
TCP connections, etc).

The correct fix is to ensure that Cassandra is running with higher
operating system enforced limits (see ulimit,
/etc/security/limits.conf, etc).

Cassandra is not expected to deal with this type of error gracefully
and you will want to restart nodes that run into this.

-- 
/ Peter Schuller (@scode on twitter)
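For reference, the limit Peter mentions can be inspected (and, up to the hard limit, raised) from inside a process. A small Unix-only sketch using Python's stdlib `resource` module; for Cassandra itself you would raise the limit via `ulimit` or `/etc/security/limits.conf` before the JVM starts:

```python
import resource

# The "Too many open files" SocketException means the process hit its
# file-descriptor limit (RLIMIT_NOFILE). Inspect the current soft/hard
# limits for this process:
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)

# A process may raise its own soft limit, but never above the hard limit.
# Raise it toward 4096 where the hard limit allows (no-op otherwise):
if hard == resource.RLIM_INFINITY:
    new_soft = soft                      # leave unlimited hard caps alone
else:
    new_soft = min(max(soft, 4096), hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
```

The JVM cannot do this for you retroactively, which is why the fix belongs in the environment that launches Cassandra.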


Re: Cassandra 0.7.8 and 0.8.1 fail when major compaction on 37GB database

2011-07-27 Thread lebron james
Why, when I give the JVM a 1GB heap and try to launch Cassandra with a 37GB
database, does Cassandra start loading the database but then, once memory
usage is full, fail with an OutOfMemory exception? Can Cassandra not work
with low memory, or does it critically need more RAM as the database grows?
Here is the link to download the Cassandra 0.8.2 distribution with my 37GB
database: http://213.186.117.181/apache-cassandra-0.8.2-bin.zip


results of index slice query

2011-07-27 Thread Roland Gude
Hi,
I just noticed that when I do an IndexSliceQuery with the index column not
in the slice range, the index column is returned anyway. Is this behavior
intended, or is it a bug (and if so, is it a Cassandra bug or a Hector bug)?
I am using Cassandra 0.7.7 and Hector 0.7-26.

Greetings,
roland

--
YOOCHOOSE GmbH

Roland Gude
Software Engineer

Im Mediapark 8, 50670 Köln

+49 221 4544151 (Tel)
+49 221 4544159 (Fax)
+49 171 7894057 (Mobil)


Email: roland.g...@yoochoose.com
WWW: www.yoochoose.com

YOOCHOOSE GmbH
Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann
Handelsregister: Amtsgericht Köln HRB 65275
Ust-Ident-Nr: DE 264 773 520
Sitz der Gesellschaft: Köln



Re: Too many open files

2011-07-27 Thread Adil
You should take a look at this:

http://www.datastax.com/docs/0.7/troubleshooting/index

@dil

2011/7/27 Donna Li 

>  All:
>
> What does the following error mean? One of my cassandra servers print this
> error, and nodetool shows the state of the server is down. Netstat result
> shows the socket number is very few.
>
> WARN [main] 2011-07-27 16:14:04,872 CustomTThreadPoolServer.java (line 104)
> Transport error occurred during acceptance of message.
>
> org.apache.thrift.transport.TTransportException: java.net.SocketException:
> Too many open files
>
>  at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124)
>  at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
>  at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>  at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:98)
>  at org.apache.cassandra.thrift.CassandraDaemon.start(CassandraDaemon.java:183)
>  at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
>
> Caused by: java.net.SocketException: Too many open files
>
>  at java.net.PlainSocketImpl.socketAccept(Native Method)
>  at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
>  at java.net.ServerSocket.implAccept(ServerSocket.java:453)
>  at java.net.ServerSocket.accept(ServerSocket.java:421)
>  at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119)
>  ... 5 more
>
>
> Best Regards
>
> Donna li
>


Error when set memtable_troughput with Cassandra-CLI

2011-07-27 Thread lebron james
Hi!
I need to set memtable_throughput for Cassandra.
I tried to do this with cassandra-cli by running this command:

"update column family columnfamily2 memtable_troughput=155;"

but I get the error

"missing EOF at memtable_troughput"

Help! How can I set the memtable_throughput attribute?


Re: Cassandra 0.7.8 and 0.8.1 fail when major compaction on 37GB database

2011-07-27 Thread Edward Capriolo
On Wed, Jul 27, 2011 at 7:40 AM, lebron james  wrote:

> Why when i give jvm 1 GB heap size and try launch cassandra with 37GB
> database, cassandra start loading database, but after memory usage full,
> they fall with OutOfMemory Exception. Cassandra can`t work with low memory?
> or she critical need more RAM if database grows?
> here is link to download cassndra dist 0.8.2 with my 37GB database
> http://213.186.117.181/apache-cassandra-0.8.2-bin.zip
>
>
Cassandra's bloom filters must fit in memory. In addition each index file is
sampled (check the index_interval in cassandra.yaml)

# The Index Interval determines how large the sampling of row keys
#  is for a given SSTable. The larger the sampling, the more effective
#  the index is at the cost of space.
index_interval: 128

You can adjust index_interval to load large datasets with a not-so-large
JVM heap, but a 1GB heap for a 37GB database is not the best way to go.

How much ram does your system have?
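The sampling Edward describes can be modeled as: keep every index_interval-th row key in memory, binary-search that sample, then scan at most one interval of the on-disk index. A toy sketch (not Cassandra's implementation) showing the memory/scan trade-off:

```python
import bisect

INDEX_INTERVAL = 128  # the index_interval setting in cassandra.yaml

def build_sample(sorted_keys, interval=INDEX_INTERVAL):
    """Keep every interval-th key of an SSTable's index in memory.
    A larger interval means less memory but more scanning per lookup."""
    return sorted_keys[::interval]

def lookup(sorted_keys, sample, key, interval=INDEX_INTERVAL):
    """Binary-search the in-memory sample, then scan at most `interval`
    entries of the full (on-disk) index to locate the key."""
    i = bisect.bisect_right(sample, key) - 1
    start = max(i, 0) * interval
    for pos in range(start, min(start + interval, len(sorted_keys))):
        if sorted_keys[pos] == key:
            return pos
    return None

keys = ["key%06d" % n for n in range(10000)]
sample = build_sample(keys)   # ~10000/128 sampled keys held in memory
assert lookup(keys, sample, "key004242") == 4242
assert lookup(keys, sample, "nope") is None
```

Doubling the interval roughly halves the sample's memory footprint at the cost of scanning twice as many index entries per read, which is the trade-off the yaml comment describes.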


Java bits: Horizontal Scalibility and DAO layer design

2011-07-27 Thread Selcuk Bozdag
Hi,

My question is about the design of a data access object (DAO) layer on top
of a Cassandra datastore.

I had a look at the Kundera project, which basically takes the JPA approach
of creating POJOs and mapping them through annotations. It looks promising,
but what if I would like to add new columns to the data store (i.e. an
attribute of a POJO class) without changing the codebase? How could I
achieve such extensibility in the DAO layer? What would be the most
relevant design pattern?

Please do not take the question as being specifically about Kundera; it was
just an example. The eventual design should let me CRUD POJOs while
remaining open to extension.

Regards,

Selcuk


Re: Read process

2011-07-27 Thread samal
From the row cache (if enabled) --> key cache --> memtable + SSTables
(merged on read). The commit log is never consulted during normal reads; it
is only replayed at startup to rebuild the memtables.

On Wed, Jul 27, 2011 at 1:19 PM, CASSANDRA learner <
cassandralear...@gmail.com> wrote:

> Hi,
>
> I am having one doubt regarding reads. The data will be stored in
> commitlog,memtable,sstables right.. While reading the data may be available
> in all the three right, then from where the reads happens,, form commit log?
> or from Memtable ? or from SSTables.. Please explain friends
>
> Thnks
>
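A toy model of that order: the memtable and SSTables are merged at read time with the newest timestamp winning, while the commit log is only replayed at startup (illustrative sketch, not Cassandra code):

```python
# Each cell is (value, timestamp). A row fragment may live in the
# memtable and in several SSTables at once; a read merges all of them
# and returns the value with the newest timestamp.
memtable = {("row1", "col1"): ("new-value", 200)}
sstables = [
    {("row1", "col1"): ("old-value", 100), ("row1", "col2"): ("x", 50)},
    {("row1", "col2"): ("y", 150)},
]

def read(row, col):
    candidates = []
    for source in [memtable] + sstables:   # commit log is NOT consulted
        if (row, col) in source:
            candidates.append(source[(row, col)])
    if not candidates:
        return None
    return max(candidates, key=lambda vc: vc[1])[0]  # newest wins

assert read("row1", "col1") == "new-value"  # memtable shadows the SSTable
assert read("row1", "col2") == "y"          # newer SSTable wins
```

This is why a freshly flushed memtable and an old SSTable can disagree without corruption: timestamps, not storage location, decide the result.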


Re: Java bits: Horizontal Scalibility and DAO layer design

2011-07-27 Thread Ikeda Anthony
Speaking from personal experience (and believe me, it all comes down to this),
we attempted a DAO approach as well. However, we found that, based upon all
the mutator.addInsertion()s we were creating across column families,
everything ended up in the same DAO - we only have 3 main column families
that represent about 10 logical entities.

So we have now moved to a Command pattern where the mutations are
encapsulated within a command, which can be used with other commands if need
be - though we haven't found a need for that just yet.

Anthony




On 27/07/2011, at 08:15 AM, Selcuk Bozdag wrote:

> Hi,
> 
> The question I am asking is a bit about the design of a data access
> objects layer on top of a cassandra datastore.
> 
> I had a look at project Kundera which basically implies JPA approach
> by creating POJOs and mapping them through annotations. Looks
> promising but what if I would like to have new columns inside the data
> store(i.e. an attribute of a pojo class) without changing the
> codebase? How could I achieve such an extensiblity on the DAO layer?
> What would be the most relevant design pattern?
> 
> Please do not take the question regarding Kundera, it was just an
> example. The eventual design should let me CRUD POJOs while gaining an
> ability to be extended.
> 
> Regards,
> 
> Selcuk
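The Command approach Anthony describes can be sketched roughly as follows. All the names here (Mutator, CreateUserCommand, the column families) are illustrative, not Hector's API:

```python
class Mutator:
    """Stand-in for a client mutator that batches insertions."""
    def __init__(self):
        self.pending = []

    def add_insertion(self, row_key, column_family, column, value):
        self.pending.append((row_key, column_family, column, value))

    def execute(self):
        sent, self.pending = self.pending, []
        return sent  # a real client would send the batch to the cluster here

class CreateUserCommand:
    """One logical operation, encapsulating writes to several column
    families, instead of spreading addInsertion calls across DAOs."""
    def __init__(self, user_id, name):
        self.user_id, self.name = user_id, name

    def apply(self, mutator):
        mutator.add_insertion(self.user_id, "Users", "name", self.name)
        mutator.add_insertion(self.name, "UsersByName", "id", self.user_id)

def run(commands):
    m = Mutator()
    for c in commands:        # commands compose: several per batch
        c.apply(m)
    return m.execute()

batch = run([CreateUserCommand("u1", "alice")])
assert len(batch) == 2 and batch[0][1] == "Users"
```

The design point is that each command owns every mutation for one logical entity, so adding a new entity means adding a command, not growing a shared DAO.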



Re: Re: Cassandra start/stop scripts

2011-07-27 Thread Jonathan Ellis
The main advantage of that is that if you kill it without -9, it will make
sure to let the commitlog flush and fsync first, if you've been using
periodic sync (the default).

On Wed, Jul 27, 2011 at 1:31 AM,   wrote:
> A simple kill without -9 should work. Have you tried that?
>
> On , Jason Pell  wrote:
>> Check out the rpm packages from Cassandra they have init.d scripts that
>> work very nicely, there are debs as well for ubuntu
>>
>> Sent from my iPhone
>>
>> On Jul 27, 2011, at 3:19, Priyanka priya...@gmail.com> wrote:
>>
>>
>>
>> I do the same way...
>>
>>
>> On Tue, Jul 26, 2011 at 1:07 PM, mcasandra wrote:
>>
>>
>> I need to write cassandra start/stop script. Currently I run "cassandra"
>> to start and kill -9 to stop.
>>
>>
>> Is this the best way? kill -9 doesn't sound right :) Wondering how others
>> do it.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: results of index slice query

2011-07-27 Thread Jonathan Ellis
Sounds like a Cassandra bug to me.

On Wed, Jul 27, 2011 at 6:44 AM, Roland Gude  wrote:
> Hi,
>
> I was just experiencing that when i do an IndexSliceQuery with the index
> column not in the slicerange the index column will be returned anyways. Is
> this behavior intended or is it a bug (if so – is it a Cassandra bug or a
> hector bug)?
>
> I am using Cassandra 0.7.7 and hector 0.7-26
>
>
>
> Greetings,
>
> roland
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Java bits: Horizontal Scalibility and DAO layer design

2011-07-27 Thread Selcuk Bozdag
Thanks, Anthony. Following your comments, I searched for a similar approach
using the Command pattern; http://code.google.com/p/casemate seems to be
one way of doing that.

On 27 July 2011 18:23, Ikeda Anthony  wrote:
> Speaking from personal experience (and believe me it all comes down to this) 
> we attempted a DAO approach as well. However, we found that based upon all 
> the mutator.addInsertion()s we were creating across column families 
> everything ended up in the same DAO - we only have 3 main Column Families 
> that represent about 10 logical entities.
>
> So we have now moved to a Command pattern where the mutations are 
> encapsulated within a command which can be used with other commands if need 
> be - though we haven't found a need to just yet.
>
> Anthony
>
>
>
>
> On 27/07/2011, at 08:15 AM, Selcuk Bozdag wrote:
>
>> Hi,
>>
>> The question I am asking is a bit about the design of a data access
>> objects layer on top of a cassandra datastore.
>>
>> I had a look at project Kundera which basically implies JPA approach
>> by creating POJOs and mapping them through annotations. Looks
>> promising but what if I would like to have new columns inside the data
>> store(i.e. an attribute of a pojo class) without changing the
>> codebase? How could I achieve such an extensiblity on the DAO layer?
>> What would be the most relevant design pattern?
>>
>> Please do not take the question regarding Kundera, it was just an
>> example. The eventual design should let me CRUD POJOs while gaining an
>> ability to be extended.
>>
>> Regards,
>>
>> Selcuk
>
>


Re: Error when set memtable_troughput with Cassandra-CLI

2011-07-27 Thread Adi
There is a typo: "memtable_troughput" should be "memtable_throughput".

"update column family columnfamily2 memtable_troughput=155;"

should be

"update column family columnfamily2 memtable_throughput=155;"

On Wed, Jul 27, 2011 at 9:59 AM, lebron james  wrote:

> Hi!
> Need set memtable_troughput for cassandra
> I try do this with help cassandra-cli by write this command
>
> "update column family columnfamily2 memtable_troughput=155;"
>
> but i get error
>
> "missing EOF at memtable_troughput"
>
> help! how i can set attribute memtable_troughput?
>


Questions over the use of CQL

2011-07-27 Thread Anthony Ikeda
For our current project we have decided to use Hector as the client API,
however, with the introduction of CQL I need to understand a few things.

Firstly, CQL uses SQL-like constructs. Column names seem to be limited to
the same constraints as in SQL (restricted use of delimiters), and yet one
of the strengths of Cassandra lies in the fact that we can use delimiters
in column names for hierarchical purposes - if anything, this was
encouraged at the Cassandra SF 2011 conference.

Should I now be ensuring that I avoid delimiters such as ':' and '-' in
column names?

Does CQL Support (Dynamic)Composite column names? Row Keys?

What other limitations does CQL have that are not present in Hector?

Thanks,
Anthony


Re: Slow Reads

2011-07-27 Thread Indranath Ghosh
You might want to avoid super columns and denormalize your schema.
Since you are querying by the super columns, you can make them the row keys
and fold the current row keys into your column names, using composite column
names to get to the columns faster.
Something like this (using your representation):

"supercol1": {
    "rowkey1_col1": T
    "rowkey1_col2": C
    "rowkey2_col1": A
    "rowkey2_col2": A
}

"supercol2": {
    "rowkey1_col1": C
    "rowkey1_col2": T
    "rowkey2_col1": A
    "rowkey2_col2": A
}

"supercol3": {
    "rowkey1_col1": C
    "rowkey1_col2": T
    "rowkey2_col1": C
    "rowkey2_col2": T
}

-indra

On Tue, Jul 26, 2011 at 10:47 AM, Priyanka  wrote:

> this is how my data looks
> “rowkey1”:{
>“supercol1”:{ “col1”:T,”col2”:C}
>“supercol2”:{“col1”:C,”col2”:T }
>“supercol3”:{ “col1”:C,”col2”:T}
>}
> "rowkey2”:{
>   “supercol1”:{ “col1”:A,”col2”:A}
>“supercol2”:{“col1”:A,”col2”:T }
>“supercol3”:{ “col1”:C,”col2”:T}
> }
>
> each row has 620901 super columns and 2 columns for each super column.
> Name of the super columns remain same for all the rows but the data in each
> super column is different.
> I am trying to get the data of a particular super col which is spread
> across
> all the rows but with different data.
>
> So  yes,its getting data from all rows.
> Please suggest me a better way to do so.
> Thank you.
>
> the output of my query will be (suppose if i do for supercol1)
> rowkey1,T,C
> rowkey2,A,A
>
>
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-Reads-tp6622680p6623091.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>



-- 
Indranath Ghosh
Phone: 408-813-9207
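The inversion Indranath suggests (super column names become row keys; the old row key is folded into a composite column name) can be sketched as a transformation over the original layout (toy Python, not client code):

```python
# Invert rows/super-columns so that a query for one "supercol" reads a
# single row instead of touching every row in the column family.
original = {
    "rowkey1": {"supercol1": {"col1": "T", "col2": "C"},
                "supercol2": {"col1": "C", "col2": "T"}},
    "rowkey2": {"supercol1": {"col1": "A", "col2": "A"},
                "supercol2": {"col1": "A", "col2": "T"}},
}

def invert(rows):
    out = {}
    for row_key, supercols in rows.items():
        for sc_name, cols in supercols.items():
            for col, value in cols.items():
                # composite column name: "<old rowkey>_<old column>"
                out.setdefault(sc_name, {})["%s_%s" % (row_key, col)] = value
    return out

by_supercol = invert(original)
# One single-row read now answers "give me supercol1 across all rows":
assert by_supercol["supercol1"] == {
    "rowkey1_col1": "T", "rowkey1_col2": "C",
    "rowkey2_col1": "A", "rowkey2_col2": "A",
}
```

In Cassandra the inversion would be done at write time (writing to a second column family), since there is no server-side transformation.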


Re: Can we store java objects and images/files in cassandra

2011-07-27 Thread mcasandra

CASSANDRA learner wrote:
> 
> Hi,
> 
> Can we store images , java objects, files in cassandra, if so , how
> Please let me know this as i need it urgently...
> 

Look at http://goo.gl/S2E3C

It really depends on your workload. With heavy workloads, Cassandra is not
the right solution for storing images and other large objects. You will get
hit by compactions taking longer, slow reads, and wasted disk space, since
currently you need 50% of the disk space to be unused. But if you have low
throughput requirements, then you can probably go with Cassandra. It's best
to run a stress test with the bigger column size and the traffic projected
for the next several years.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-we-store-java-objects-and-images-files-in-cassandra-tp6625130p6626986.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Error when set memtable_troughput with Cassandra-CLI

2011-07-27 Thread lebron james
Sorry, I don't understand. What is the right syntax to set
memtable_throughput?


Re: Error when set memtable_troughput with Cassandra-CLI

2011-07-27 Thread Jeremy Hanna
Try help on the CLI for how to do it, specifically "help update column family;"
It looks like you're missing the "with."

update column family columnfamily2 memtable_throughput=155;

should be 

update column family columnfamily2 with memtable_throughput=155;

On Jul 27, 2011, at 12:49 PM, lebron james wrote:

> sorry, i dont understand, what right syntax to set memtable_throughput?
> 
> 
> 



Re: Slow Reads

2011-07-27 Thread Priyanka
Thank you, Indra, for your suggestion.
The thing is, apart from pulling data based on the super column as in the
example below, I also need to pull data based on a particular row key. If I
change the model as you described, that query becomes slow.
I need to do both retrievals efficiently.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-Reads-tp6622680p6627231.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Slow Reads

2011-07-27 Thread Jake Luciani
The philosophy in NoSQL is to store the data as you plan to access it. That
may mean duplicating the data many times. Disk is cheap; writes are fast.


On Wed, Jul 27, 2011 at 2:22 PM, Priyanka  wrote:

> Thank you Indra for your suggestion.
> But the thing is apart from pulling data based on supercol in the below
> example I also need to query to pull the data based on a particular
> rowkey.If I change the model as u mentioned this query becomes slow.
> I need to do both the retrievals efficiently.
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-Reads-tp6622680p6627231.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>



-- 
http://twitter.com/tjake
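The duplicate-per-query-path idea can be sketched as writing each fact once per access pattern, so both retrievals become single-row lookups (toy Python, not client code):

```python
# Writes are cheap, so write each fact twice -- once per query path.
# by_row serves "give me everything for rowkey X";
# by_supercol serves "give me supercol Y across all rows".
by_row, by_supercol = {}, {}

def insert(row_key, supercol, col, value):
    by_row.setdefault(row_key, {}).setdefault(supercol, {})[col] = value
    by_supercol.setdefault(supercol, {})["%s_%s" % (row_key, col)] = value

insert("rowkey1", "supercol1", "col1", "T")
insert("rowkey1", "supercol1", "col2", "C")

# Both retrievals are now a single lookup in their own structure:
assert by_row["rowkey1"]["supercol1"] == {"col1": "T", "col2": "C"}
assert by_supercol["supercol1"] == {"rowkey1_col1": "T",
                                    "rowkey1_col2": "C"}
```

In Cassandra terms, `by_row` and `by_supercol` would be two column families kept in sync by the application at write time.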


Re: Java bits: Horizontal Scalibility and DAO layer design

2011-07-27 Thread Peter Tillotson
I had a quick look at the DAO approach, and my common use case is row names
appearing as column IDs elsewhere (indexes etc.). I also wanted to track
deltas at the column level and have a reasonably sized client-side cache.

I ended up with two abstract DAOs, one for a column family and one for a
super column family, then subclasses with overridden methods to build and
deconstruct POJOs.

Managing a bag of columns in this way makes column-level locking easy,
and the delta approach minimizes network traffic.

On 27/07/11 16:15, Selcuk Bozdag wrote:
> Hi,
> 
> The question I am asking is a bit about the design of a data access
> objects layer on top of a cassandra datastore.
> 
> I had a look at project Kundera which basically implies JPA approach
> by creating POJOs and mapping them through annotations. Looks
> promising but what if I would like to have new columns inside the data
> store(i.e. an attribute of a pojo class) without changing the
> codebase? How could I achieve such an extensiblity on the DAO layer?
> What would be the most relevant design pattern?
> 
> Please do not take the question regarding Kundera, it was just an
> example. The eventual design should let me CRUD POJOs while gaining an
> ability to be extended.
> 
> Regards,
> 
> Selcuk
> 



Expanding 0.6.x cluster to multiple datacenters

2011-07-27 Thread Ashley Martens
I have a current 0.6.x cluster in a single datacenter with RackUnaware and
am looking to expand into a second datacenter. I know I need to change to
RackAwareStrategy; however, I'm not sure what will happen to my data when I
restart the nodes in the current cluster, before I even add the new DC. Will
the data need to move based on the rack each node is in, or will it stay on
the node it is currently on? Also, when I start adding nodes in the new DC
to the cluster, should they come in one at a time, like bootstrap, or should
I light up several at the same time to distribute the data?

For reference I have 19 nodes in my cluster.

Thanks.


Re: Expanding 0.6.x cluster to multiple datacenters

2011-07-27 Thread Jonathan Ellis
As you know, with 0.6 adding a datacenter is not as easy as 0.7 with
NetworkTopologyStrategy.  With 0.6 there is a right way that will work
with some manual effort, and a wrong way that can cause you major pain
and grief.

The right way:
- Switch to a DC-aware snitch but leave your cluster on RUS to start with.
- Bootstrap the 2nd datacenter nodes (halfway) in between your 1st
datacenter tokens, so your ring alternates DC1 DC2 DC1 DC2 etc.  Do
this one at a time for minimum disruption.  You should have equal node
counts in each DC because RAS will keep data in each DC about equal.
- Switch the cluster to RAS
- Start repair.  You will need to run repair on each node.  In 0.6 you
should only run repair against one node at a time.
- While repair is going on, you need to do reads at at least CL.QUORUM
or data may appear to be missing, since it's not yet in all the places
the new strategy will look.  (But by alternating DC around the ring, 2
of the 3 replicas are guaranteed to be the same for both RUS and RAS.)

The wrong way:
- Switch to RAS, then start adding nodes in the new DC.  As soon as
you add the first node in DC2, RAS will try to replicate ALL the rows
in DC1 to it.  Usually this overwhelms the DC2 node and it dies a
fiery death.
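The alternating ring layout above can be computed mechanically. A minimal sketch, assuming RandomPartitioner (token space 0 to 2^127): give position i of n total nodes the token i * 2^127 / n, and alternate the DC assignment by position. The class and method names here are illustrative, not part of Cassandra:

```java
import java.math.BigInteger;

// Sketch: evenly spaced RandomPartitioner tokens for a ring that
// alternates DC1 DC2 DC1 DC2 ... (illustrative, not a Cassandra API).
public class AlternatingTokens {
    static final BigInteger RING = BigInteger.valueOf(2).pow(127);

    // Token for position i of n total nodes: i * 2^127 / n
    static BigInteger token(int i, int n) {
        return RING.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n));
    }

    public static void main(String[] args) {
        int n = 4; // e.g. 2 nodes in each DC
        for (int i = 0; i < n; i++) {
            String dc = (i % 2 == 0) ? "DC1" : "DC2"; // alternate around the ring
            System.out.println(dc + " token=" + token(i, n));
        }
    }
}
```

If memory serves, in 0.6 each computed token goes into that node's storage-conf.xml as InitialToken before it bootstraps.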

On Wed, Jul 27, 2011 at 7:44 PM, Ashley Martens  wrote:
> I have a current 0.6.x cluster in a single datacenter with RackUnaware and
> am looking to expand into a second data center. I know I need to change to
> RackAwareStrategy however, I'm not sure what will happen to my data when I
> restart the nodes in the current cluster before I even add the new DC. Will
> the data need to move based on the rack each node is in or will it stay on
> the node it is currently on? Also, when I start adding nodes in the new DC
> to the cluster should they come in one at a time, like bootstrap, or should
> I light up several at the same time to distribute the data?
>
> For reference I have 19 nodes in my cluster.
>
> Thanks.
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Changing the CLI, not a great idea!

2011-07-27 Thread Edward Capriolo
Notice how users react to these things. At one point, for example, we
decided to add a ';' statement terminator to the CLI.

http://www.amazon.com/Cassandra-Definitive-Guide-Eben-Hewitt/dp/1449390412

The initial chapters on downloading and installing the product get the
reader started using Cassandra immediately. Then the bumps in the road
appear. The current version of Cassandra requires a semicolon to end each
statement in the command line interface (CLI) client - that's missing in
the book. This is noted in the errata on the O'Reilly site as "unconfirmed",
and if you're coming from a MySQL/Oracle background it's something you might
try, otherwise, it's frustrating.

It confuses users and here is the kicker:

[default@app] use app; use app;
Syntax error at position 9: missing EOF at 'use'

You cannot even put two statements on the same line, so the ';' is
semi-useless syntax.

What have we learned from this?

[default@app] set people['ecapriolo']['last']='capriolo';
org.apache.cassandra.db.marshal.MarshalException: cannot parse 'last' as hex
bytes

Oh dear... are you saying...
[default@app] set
people[ascii('ecapriolo')][ascii('last')]=ascii('capriolo');
Value inserted.

Is there a way to move things forward without breaking backwards
compatibility of the CLI?

Edward


Re: Changing the CLI, not a great idea!

2011-07-27 Thread Jonathan Ellis
On Wed, Jul 27, 2011 at 10:53 PM, Edward Capriolo  wrote:
> You can not even put two statements on the same line. So the ';' is semi
> useless syntax.

Nobody ever asked for that, but lots of people asked to allow
statements spanning multiple lines.

> Is there a way to move things forward without breaking backwards
> compatibility of the CLI?

Yes.  Create a new one based on CQL but leave the old one around.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Changing the CLI, not a great idea!

2011-07-27 Thread Edward Capriolo
On Thu, Jul 28, 2011 at 12:01 AM, Jonathan Ellis  wrote:

> On Wed, Jul 27, 2011 at 10:53 PM, Edward Capriolo 
> wrote:
> > You can not even put two statements on the same line. So the ';' is semi
> > useless syntax.
>
> Nobody ever asked for that, but lots of people asked to allow
> statements spanning multiple lines.
>
> > Is there a way to move things forward without breaking backwards
> > compatibility of the CLI?
>
> Yes.  Create a new one based on CQL but leave the old one around.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>

On a semi-related note: how can you update a column family and add an index?

[default@app] create column family people;
4e3310c0-b8d1-11e0--242d50cf1f9f
Waiting for schema agreement...
... schemas agree across the cluster
[default@app] update column family people with column_metadata = [{
column_name : ascii(inserted_at), validation_class : LongType , index_type :
0 , index_name : ins_idx}];
org.apache.cassandra.db.marshal.MarshalException: cannot parse
'FUNCTION_CALL' as hex bytes
[default@app] update column family people with column_metadata = [{
column_name : inserted_at, validation_class : LongType , index_type : 0 ,
index_name : ins_idx}];
org.apache.cassandra.db.marshal.MarshalException: cannot parse 'inserted_at'
as hex bytes

Edward
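The "cannot parse ... as hex bytes" errors here have the same root cause as the earlier ones: the column family was created with the default BytesType comparator, so the CLI must be given column names as hex (and function calls like ascii() are not accepted inside column_metadata). One workaround is to declare a textual comparator, and the index metadata with it, at creation time. A sketch using 0.7/0.8-era CLI syntax — verify the exact form against your version:

```
create column family people
    with comparator = UTF8Type
    and column_metadata = [{
        column_name : inserted_at,
        validation_class : LongType,
        index_type : KEYS,
        index_name : ins_idx }];
```

With a UTF8Type (or AsciiType) comparator, plain names like inserted_at parse without the ascii() wrapper, and later `update column family` statements can add indexed columns the same way.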