Re: OPP seems completely unsupported in Cassandra 1.2.5

2013-07-23 Thread Cyril Scetbon
AFAIK, OPP is no longer supported; you should use ByteOrderedPartitioner instead 
(it also supports keys that are not valid UTF-8):

see http://www.datastax.com/docs/1.2/cluster_architecture/partitioners
-- 
Cyril SCETBON

On Jul 22, 2013, at 4:10 PM, Vara Kumar  wrote:

> We were using version 0.7.6 and upgraded to 1.2.5 today. We were using OPP 
> (OrderPreservingPartitioner).
> 
> OPP throws an error when any node joins the cluster, so the cluster cannot be 
> brought up. After digging a little deeper, we realized that the "peers" 
> column family is defined with a key of type "inet". It looks like many other 
> column families in the system keyspace have the same issue.
> 
> - I know that OPP is deprecated. Is OPP completely unsupported? Is that 
> stated in the upgrade instructions or somewhere? Did we miss it?
> - I could not find any related discussion or jira records about similar issue.
> 
> 
> Exception trace:
> java.lang.RuntimeException: The provided key was not UTF8 encoded.
>   at 
> org.apache.cassandra.dht.OrderPreservingPartitioner.getToken(OrderPreservingPartitioner.java:172)
>   at 
> org.apache.cassandra.dht.OrderPreservingPartitioner.decorateKey(OrderPreservingPartitioner.java:44)
>   at org.apache.cassandra.db.Table.apply(Table.java:379)
>   at org.apache.cassandra.db.Table.apply(Table.java:353)
>   at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:258)
>   at 
> org.apache.cassandra.cql3.statements.ModificationStatement.executeInternal(ModificationStatement.java:117)
>   at 
> org.apache.cassandra.cql3.QueryProcessor.processInternal(QueryProcessor.java:172)
>   at 
> org.apache.cassandra.db.SystemTable.updatePeerInfo(SystemTable.java:258)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1231)
>   at 
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1948)
>   at 
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:823)
>   at 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:901)
>   at 
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:50)
>   at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> Caused by: java.nio.charset.MalformedInputException: Input length = 1
>   at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
>   at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:781)
>   at 
> org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:167)
>   at 
> org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:124)
>   at 
> org.apache.cassandra.dht.OrderPreservingPartitioner.getToken(OrderPreservingPartitioner.java:168)
> 



Re: Safely adding new nodes without losing data

2013-07-23 Thread aaron morton
I think you are correct. 

When the new node starts it randomly selects tokens, which result in a random 
set of token ranges being transferred from other nodes. 

For each pending range, the existing token ranges in the cluster are searched to 
find one that contains the range we want to transfer. A list of all replicas 
for this (existing) range is created and sorted by proximity. The first endpoint 
in the list will be used.

The bit I am unsure of is whether it's possible for the replicas of a row to move 
from A, B, C to D, E, F.

Can anyone else help out?

Cheers


-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/07/2013, at 9:21 AM, sankalp kohli  wrote:

> Interesting... I guess you have to add one node at a time and run repair on 
> it. 
> 
> 
> On Sat, Jul 20, 2013 at 7:30 AM, E S  wrote:
> I am trying to understand the best procedure for adding new nodes.  The one 
> that I see most often online seems to have a hole where there is a low 
> probability of permanently losing data.  I want to understand what I am 
> missing in my understanding.
> 
> Let's say I have a 3 node cluster (node A,B,C) with a RF of 3.  I want to 
> double the cluster size to 6 (node A,B,C,D,E,F) while keeping the replication 
> factor of 3.  Let's assume we use vnodes.
> 
> My understanding is to bootstrap the 3 nodes and then run repair then 
> cleanup.  Here is my failure case:
> 
> Before bootstrapping I have a row that is only replicated onto node A and B.  
> Assume I did a quorum write and there was some hiccup on C, hinted handoff 
> didn't work, and a repair has not yet been run.  Let's also assume that once 
> nodes D, E, F have been bootstrapped, this row's new replicas are D, E, and F.
> 
> My reading through the bootstrapping code shows that for a given range, it 
> streams it only from one node (unlike repair).  There is a 1/9 chance that 
> D,E,F will have streamed the range containing the row from C, which does not 
> have this row.
> 
> Now not even a consistency level read of ALL will return the row.  A repair 
> will not solve it, and when cleanup is run, the row is permanently deleted.
> 
> I don't think this problem would normally happen without vnodes, because when 
> doubling you would alternate the new nodes with the old nodes in the ring, so 
> while quorum might not work until the final repair, "all" would, and a repair 
> would solve the problem.  With vnodes though, some of the ranges will follow 
> the pattern above (range ownership moving from A,B,C to D,E,F).
> 
> Am I missing something here?  If I'm right, I think the only way to avoid 
> this is adding less than a quorum of new nodes (in this case 1) before doing 
> a repair.  That would be painful since repairs take a while.
> 
> Thanks for any help.
> 
> Eddie
> 
> 
> 



Re: funnel analytics, how to query for reports etc.

2013-07-23 Thread aaron morton
For background on rollup analytics:

Twitter Rainbird  
http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
Acunu http://www.acunu.com/

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/07/2013, at 1:03 AM, Vladimir Prudnikov  wrote:

> This can be done easily,
> 
> Use a normal column family to store the sequence of events, where the key is a 
> session ID identifying one user interaction with the website, the column names 
> are TimeUUID values, and the column value is the id of the event (do not write 
> something like "user added product to shopping cart"; use something shorter that 
> identifies the event).
> 
> Then you can use a counter column family to store counters; you can count 
> anything: the number of sessions, the total number of events, the number of 
> particular events, etc. One row per day, for example. Then you can retrieve 
> this row and calculate all the required percentages.
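A minimal CQL 3 sketch of the model described above; keyspace, table, and column names are illustrative, not from the original mail:

-- One row per session, columns clustered by event time
CREATE TABLE funnel_events (
    session_id uuid,
    event_time timeuuid,
    event_id   text,
    PRIMARY KEY (session_id, event_time)
);

-- One row per day, one counter per event type
CREATE TABLE funnel_counts (
    day      text,
    event_id text,
    hits     counter,
    PRIMARY KEY (day, event_id)
);

-- On each tracked event:
INSERT INTO funnel_events (session_id, event_time, event_id)
VALUES (f47ac10b-58cc-4372-a567-0e02b2c3d479, now(), 'added_to_cart');

UPDATE funnel_counts SET hits = hits + 1
WHERE day = '2013-07-23' AND event_id = 'added_to_cart';

Reading back the per-day row then gives all the event counts needed to compute the funnel percentages.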
> 
> 
> On Sun, Jul 21, 2013 at 1:05 AM, S Ahmed  wrote:
> Would cassandra be a good choice for creating a funnel analytics type product 
> similar to mixpanel?
> 
> e.g.  You create a set of events and store them in cassandra for things like:
> 
> event#1 user visited product page
> event#2 user added product to shopping cart
> event#3 user clicked on checkout page
> event#4 user filled out cc information
> event#5 user purchased product
> 
> Now in my web application I track each user and store the events somehow in 
> cassandra (in some column family etc)
> 
> Now how will I pull a report that produces results like:
> 
> 70% of people added to shopping cart
> 20% checkout page
> 10% filled out cc information
> 4% purchased the product
> 
> 
> And this is for a Saas, so this report would be for thousands of customers in 
> theory.
> 
> 
> 
> -- 
> Vladimir Prudnikov



Re: How to avoid inter-dc read requests

2013-07-23 Thread aaron morton
> All the read/write requests are issued with CL LOCAL_QUORUM, but there are still 
> a lot of inter-DC read requests.
> 
How are you measuring this ?

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/07/2013, at 8:41 AM, sankalp kohli  wrote:

> Slice queries do not trigger background read repair. 
> Implement Read Repair on Range Queries
> 
> 
> On Sun, Jul 21, 2013 at 1:40 PM, sankalp kohli  wrote:
> There can be multiple reasons for that:
> 1) Background read repairs.
> 2) Your data is not consistent, leading to read repairs.
> 3) For writes, irrespective of the consistency level used, a single write request 
> will go to the other DC.
> 4) You might be running other nodetool commands, like repair.
> 
> From the docs on read_repair_chance:
> 
> (Default: 0.1 or 1) Specifies the probability with which read repairs should 
> be invoked on non-quorum reads. The value must be between 0 and 1. For tables 
> created in versions of Cassandra before 1.0, it defaults to 1. For tables 
> created in versions of Cassandra 1.0 and higher, it defaults to 0.1. However, 
> for Cassandra 1.0, the default is 1.0 if you use the CLI or any Thrift client, 
> such as Hector or pycassa, and is 0.1 if you use CQL.
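If the cross-DC traffic turns out to be read repair, it can be tuned per table. A hedged CQL 3 sketch, with illustrative keyspace/table names, assuming a version where dclocal_read_repair_chance is available (1.1+):

-- Keep read repair inside the local DC (on 10% of reads) and stop the
-- global, cross-DC variant entirely.
ALTER TABLE mykeyspace.mytable
    WITH read_repair_chance = 0.0
    AND dclocal_read_repair_chance = 0.1;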
> 
> 
> 
> On Sun, Jul 21, 2013 at 10:26 AM, Omar Shibli  wrote:
> One more thing, I'm doing a lot of key slice read requests, is that supposed 
> to change anything? 
> 
> 
> On Sun, Jul 21, 2013 at 8:21 PM, Omar Shibli  wrote:
> I'm seeing a lot of inter-dc read requests, although I've followed DataStax 
> guidelines for multi-dc deployment 
> http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers
> 
> Here is my setup:
> 2 data centers within the same region (AWS)
> Targeting DC, RP 3, 6 nodes
> Analytic DC, RP 3, 11 nodes
> 
> All the read/write requests are issued with CL LOCAL_QUORUM, but there are still 
> a lot of inter-DC read requests.
> Any suggestions, or am I missing something?
> 
> Thanks in advance,
> 
> 
> 



Re: Are Writes disk-bound rather than CPU-bound?

2013-07-23 Thread aaron morton
> “Insert-heavy workloads will actually be CPU-bound in Cassandra before being
> memory-bound”
What is the source for that ? 

> This is because everything is
> *first* written to the commit log *on disk*. Any thoughts??
Pretty much. 

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/07/2013, at 8:58 AM, hajjat  wrote:

> “Insert-heavy workloads will actually be CPU-bound in Cassandra before being
> memory-bound”
> 
> However, from reading the documentation
> (http://www.datastax.com/docs/1.2/dml/about_writes) it seems the disk is the
> real bottleneck in writes rather than the CPU. This is because everything is
> *first* written to the commit log *on disk*. Any thoughts??
> 
> 
> 



Re: CPU Bound Writes

2013-07-23 Thread aaron morton
That's very old documentation; try using the current docs.

Although the statement is technically correct (Cassandra will become CPU bound 
before becoming memory bound), it says nothing about IO use. 

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/07/2013, at 9:00 AM, Mohammad Hajjat  wrote:

> Aaron, here is the source: 
> http://www.datastax.com/docs/0.8/cluster_architecture/cluster_planning
> 
> Thanks!
> 
> 
> On Sun, Jul 21, 2013 at 4:57 PM, aaron morton  wrote:
> > Wouldn't this make Writes disk-bound then? I think the documentation may 
> > have been a bit misleading then "Insert-heavy workloads will actually be 
> > CPU-bound in Cassandra before being memory-bound"?
> What is the source of the quote ?
> 
> Cheers
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 21/07/2013, at 4:27 AM, Mohammad Hajjat  wrote:
> 
> > Thanks/Shukran, Jon! :)
> >
> > Wouldn't this make Writes disk-bound then? I think the documentation may 
> > have been a bit misleading then "Insert-heavy workloads will actually be 
> > CPU-bound in Cassandra before being memory-bound"?
> >
> >
> >
> >
> > On Sat, Jul 20, 2013 at 12:12 PM, Jonathan Haddad 
> >  wrote:
> > Everything is written to the commit log. In the case of a crash, cassandra 
> > recovers by replaying the log.
> >
> >
> > On Sat, Jul 20, 2013 at 9:03 AM, Mohammad Hajjat  wrote:
> > Patricia,
> >
> > Thanks for the info. So are you saying that the *whole* data is being 
> > written on disk in the commit log, not just some sort of a summary/digest?
> > I'm writing 10MB objects and I'm noticing high latency (250 milliseconds 
> > even with ANY consistency), so I guess that explains my high delays?
> >
> > Thanks,
> > Mohammad
> >
> >
> > On Fri, Jul 19, 2013 at 2:17 PM, Patricia Gorla  
> > wrote:
> > Kanwar,
> >
> > This is because writes are appends to the commit log, which is stored on 
> > disk, not memory. The write is also applied to the memtable (in memory), 
> > which is later flushed to an sstable on disk.
> >
> > So, most of the actions in sending out a write are writing to disk.
> >
> > Also see: http://www.datastax.com/docs/1.2/dml/about_writes
> >
> > Patricia
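A side note on the commit-log cost discussed above: CQL 3 lets a keyspace skip the commit log entirely via durable_writes, trading crash safety for write latency. A hedged sketch, with an illustrative keyspace name:

-- Writes to tables in this keyspace bypass the commit log, so anything
-- not yet flushed from the memtables is lost if the node crashes.
CREATE KEYSPACE blobstore
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
    AND durable_writes = false;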
> >
> >
> > On Fri, Jul 19, 2013 at 1:05 PM, Kanwar Sangha  wrote:
> > “Insert-heavy workloads will actually be CPU-bound in Cassandra before 
> > being memory-bound”
> >
> >
> >
> > Can someone explain the internals of why writes are CPU bound?
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > --
> > Mohammad Hajjat
> > Ph.D. Student
> > Electrical and Computer Engineering
> > Purdue University
> >
> >
> >
> >
> > --
> > Mohammad Hajjat
> > Ph.D. Student
> > Electrical and Computer Engineering
> > Purdue University
> 
> 
> 
> 
> -- 
> Mohammad Hajjat
> Ph.D. Student
> Electrical and Computer Engineering
> Purdue University



Re: OPP seems completely unsupported in Cassandra 1.2.5

2013-07-23 Thread Sylvain Lebresne
> - I know that OPP is deprecated. Is it that OPP completely unsupported? Is
> it stated in upgrade instructions or some where? Did we miss it?
>

Basically yes, OPP is not going to work in 1.2 because of the system
tables. I don't think you'll find any upgrade instructions anywhere because,
to be honest, I don't think anyone realized that these new system tables
would break OPP; I think everyone had forgotten that OPP still
existed (these days, when we talk about OPP, we generally mean
ByteOrderedPartitioner). It's been deprecated for a long time.


> - I could not find any related discussion or jira records about similar
> issue.
>

I'd suggest opening a JIRA. It's too late to change the fact that OPP is
now unsupported, I'm afraid, but we should probably make it official by
removing it from the code and documenting that it's unsupported. It will
also serve as a reference if anyone else runs into the problem.

As for upgrade options, if you are *sure* that all your row keys are
actually ASCII, I think switching to ByteOrderedPartitioner would actually
be okay. Otherwise, I'm not sure there is much choice other than manually
migrating all the data to a brand new cluster :(

--
Sylvain


Re: Socket buffer size

2013-07-23 Thread aaron morton
>  Has anyone tried configuring the (internode_send_buff_size_in_bytes) 
> parameter?
> 
> Here is the Traceback (most recent call last):
Are you setting this on the client or the server ? 

It's a server side setting from the cassandra.yaml file. 

Cheers
-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/07/2013, at 12:15 PM, Mohammad Hajjat  wrote:

> For (rpc_send_buff_size_in_bytes), I was able to try many values of this 
> parameter. However, whenever I tried to configure 
> (internode_send_buff_size_in_bytes) Cassandra kept crashing. Has anyone tried 
> configuring the (internode_send_buff_size_in_bytes) parameter?
> 
> Here is the Traceback (most recent call last):
>   File "/home/ubuntu/Twissandra_direct/twissandra/manage.py", line 11, in 
> 
> execute_manager(settings)
>   File 
> "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", 
> line 438, in execute_manager
> utility.execute()
>   File 
> "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", 
> line 379, in execute
> self.fetch_command(subcommand).run_from_argv(self.argv)
>   File 
> "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 
> 191, in run_from_argv
> self.execute(*args, **options.__dict__)
>   File 
> "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 
> 220, in execute
> output = self.handle(*args, **options)
>   File 
> "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 
> 351, in handle
> return self.handle_noargs(**options)
>   File 
> "/home/ubuntu/Twissandra_direct/twissandra/tweets/management/commands/sync_cassandra.py",
>  line 9, in handle_noargs
> sys = SystemManager(server='localhost:9160')
>   File "/usr/local/lib/python2.7/dist-packages/pycassa/system_manager.py", 
> line 70, in __init__
> self._conn = Connection(None, server, framed_transport, timeout, 
> credentials)
>   File "/usr/local/lib/python2.7/dist-packages/pycassa/connection.py", line 
> 33, in __init__
> self.transport.open()
>   File 
> "/usr/local/lib/python2.7/dist-packages/thrift/transport/TTransport.py", line 
> 261, in open
> return self.__trans.open()
>   File "/usr/local/lib/python2.7/dist-packages/thrift/transport/TSocket.py", 
> line 99, in open
> message=message)
> 
> 
> 
> 
> On Sat, Jul 20, 2013 at 6:31 PM, Shahab Yunus  wrote:
> I think the former is for client communication to the nodes and the latter 
> for communication between the nodes themselves, as is evident from the name 
> of the property. Please feel free to correct me if I am wrong.
> 
> Regards,
> Shahab
> 
> 
> On Saturday, July 20, 2013, Mohammad Hajjat wrote:
> Hi,
> 
> What's the difference between: rpc_send_buff_size_in_bytes and 
> internode_send_buff_size_in_bytes?
> 
> I need to set my TCP socket buffer size (for both transmit/receive) to a 
> given value and I wasn't sure of the relation between these two 
> configurations. Is there any recommendation? Do they have to be equal, one 
> less than another, etc.?
> 
> The documentation here is not really helping much! 
> http://www.datastax.com/docs/1.2/configuration/node_configuration#rpc-send-buff-size-in-bytes
> 
> Thanks!
> -- 
> Mohammad Hajjat
> Ph.D. Student
> Electrical and Computer Engineering
> Purdue University
> 
> 
> 
> -- 
> Mohammad Hajjat
> Ph.D. Student
> Electrical and Computer Engineering
> Purdue University



Re: CL1 and CLQ with a 5 node cluster and 3 alive nodes

2013-07-23 Thread aaron morton
>> I really don't think I have more than 500 million rows ... any smart way to
>> count rows number inside the ks?
Use the output from nodetool cfstats; it has a row count and bloom filter size 
for each CF. 

You may also want to upgrade to 1.1 to get global cache management, which can 
make things easier to manage. 
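Once on a CQL 3 capable version, the bloom_filter_fp_chance setting mentioned in the quoted text below can be adjusted per table; a hedged sketch with illustrative names (existing sstables only pick up the new value after they are rewritten, e.g. by nodetool upgradesstables):

ALTER TABLE mykeyspace.mytable WITH bloom_filter_fp_chance = 0.01;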

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/07/2013, at 6:26 AM, Nate McCall  wrote:

> Do you have a copy of the specific stack trace? Given the version and
> CL behavior, one thing you may be experiencing is:
> https://issues.apache.org/jira/browse/CASSANDRA-4578
> 
> On Mon, Jul 22, 2013 at 7:15 AM, cbert...@libero.it  
> wrote:
>> Hi Aaron, thanks for your help.
>> 
>>> If you have more than 500Million rows you may want to check the
>> bloom_filter_fp_chance, the old default was 0.000744 and the new (post 1.)
>> number is > 0.01 for sized tiered.
>> 
>> I really don't think I have more than 500 million rows ... any smart way to
>> count rows number inside the ks?
>> 
 Now a question -- why with 2 nodes offline all my application stop
>> providing
 the service, even when a Consistency Level One read is invoked?
>> 
>>> What error did the client get and what client are you using ?
>>> it also depends on if/how the node fails. The later versions try to shut 
>>> down
>> when there is an OOM, not sure what 1.0 does.
>> 
>> The exception was a TTransportException -- I am using Pelops client.
>> 
>>> Is the node went into a zombie state the clients may have been timing out.
>> The should then move onto to another node.
>>> If it had started shutting down the client should have gotten some immediate
>> errors.
>> 
>> It didn't shut down, it was more like in a zombie state.
>> One more question: I'm experiencing some wrong counters (which are very
>> important in my platform since they are used to keep user points and generate
>> the TopX users) -- could this be related to the problem? The issue is that for
>> some users (not all) the counter column increased its value.
>> 
>> After such a crash in 1.0 is there any best-practice to follow? (nodetool or
>> something?)
>> 
>> Cheers,
>> Carlo
>> 
>>> 
>>> Cheers
>>> 
>>> 
>>> -
>>> Aaron Morton
>>> Cassandra Consultant
>>> New Zealand
>>> 
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 19/07/2013, at 5:02 PM, cbert...@libero.it wrote:
>>> 
 Hi all,
 I'm experiencing some problems after 3 years of cassandra in production
>> (from
 0.6 to 1.0.6) -- for 2 times in 3 weeks 2 nodes crashed with OutOfMemory
 Exception.
 In the log I can read the warn about the few heap available ... now I'm
 increasing a little bit my RAM, my Java Heap (1/4 of the RAM) and reducing
>> the
 size of rows and memtables thresholds. Other tips?
 
 Now a question -- why with 2 nodes offline all my application stop
>> providing
 the service, even when a Consistency Level One read is invoked?
 I'd expected this behaviour:
 
 CL1 operations keep working
 more than 80% of CLQ operations working (nodes offline where 2 and 5 in a
 clockwise key distribution only writes to fifth node should impact to node
>> 2)
 most of all CLALL operations (that I don't use) failing
 
 The situation instead was that I had ALL services stop responding throwing
>> a
 TTransportException ...
 
 Thanks in advance
 
 Carlo
>>> 
>>> 
>> 
>> 



Re: Cassandra Out of Memory on startup while reading cache

2013-07-23 Thread aaron morton
As a workaround, remove the saved key / row caches before startup. 
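If a single table's saved key cache is the culprit, a complementary step is to turn caching down or off for that table. A hedged CQL 3 sketch; the quoted identifiers assume the keyspace/table names seen in the log below are case sensitive:

-- 'rows_only' keeps the row cache but disables the key cache;
-- 'none' disables both.
ALTER TABLE "SyncCore"."CommEvents" WITH caching = 'none';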

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/07/2013, at 6:44 AM, Janne Jalkanen  wrote:

> 
> Sounds like this: https://issues.apache.org/jira/browse/CASSANDRA-5706, which 
> is fixed in 1.2.7.
> 
> /Janne
> 
> On 22 Jul 2013, at 20:40, Jason Tyler  wrote:
> 
>> Hello,
>> 
>> Since upgrading from 1.1.9 to 1.2.6 over the last week, we've had two 
>> instances where cassandra was unable to start, but kept trying to restart:
>> 
>> SNIP
>>  INFO [main] 2013-07-19 16:12:36,769 AutoSavingCache.java (line 140) reading 
>> saved cache /var/cassandra/caches/SyncCore-CommEvents-KeyCache-b.db
>> ERROR [main] 2013-07-19 16:12:36,966 CassandraDaemon.java (line 458) 
>> Exception encountered during startup
>> java.lang.OutOfMemoryError: Java heap space
>> at 
>> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:394)
>> at 
>> org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
>> at 
>> org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:379)
>> at 
>> org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:145)
>> at 
>> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:266)
>> at 
>> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:382)
>> at 
>> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:354)
>> at org.apache.cassandra.db.Table.initCf(Table.java:329)
>> at org.apache.cassandra.db.Table.(Table.java:272)
>> at org.apache.cassandra.db.Table.open(Table.java:109)
>> at org.apache.cassandra.db.Table.open(Table.java:87)
>> at 
>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:271)
>> at 
>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:441)
>> at 
>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:484)
>>  INFO [main] 2013-07-19 16:12:43,288 CassandraDaemon.java (line 118) Logging 
>> initialized
>> SNIP
>> 
>> This is new behavior with 1.2.6.  
>> 
>> Stopping cassandra, moving the offending file, then starting cassandra does 
>> succeed.  
>> 
>> Any config suggestions (key cache config?) to prevent this from happening?
>> 
>> THX
>> 
>> 
>> Cheers,
>> 
>> ~Jason
> 



Re: memtable overhead

2013-07-23 Thread aaron morton
An empty memtable would not take up much space, a few KB I would assume.

However they are considered in the calculations that control how frequently to 
flush to disk. The more CF's, even if they do not have data, the more 
frequently you will flush to disk. 

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/07/2013, at 6:06 PM, Michał Michalski  wrote:

> Not sure how up-to-date this info is, but from some discussions that happened 
> here a long time ago I remember that a minimum of 1 MB per memtable needs to be 
> allocated.
> 
> The other constraint here is memtable_total_space_in_mb setting in 
> cassandra.yaml, which you might wish to tune when having a lot of CFs.
> 
> M.
> 
> On 23.07.2013 at 07:12, Darren Smythe wrote:
>> The way we've gone about our data models has resulted in lots of column
>> families, and I'm just looking for guidelines about how much space each
>> column family adds.
>> 
>> TIA
>> 
>> 
>> On Sun, Jul 21, 2013 at 11:19 PM, Darren Smythe wrote:
>> 
>>> Hi,
>>> 
>>> How much overhead (in heap MB) does an empty memtable use? If I have many
>>> column families that aren't written to often, how much memory do these take
>>> up?
>>> 
>>> TIA
>>> 
>>> -- Darren
>>> 
>> 
> 



About column family

2013-07-23 Thread bjbylh
Hi all,
I have two questions:
1. How many column families can be created in a cluster? Is there a limit on the 
number?
2. It takes 2-5 seconds to create a new CF while the cluster contains about 
1 cfs (if the cluster is empty, it takes about 0.5s). Is that normal? How can we 
improve the efficiency of creating CFs?
BTW, C* is 1.2.4.
Thanks a lot.

Sent from Samsung Mobile

Re: cassandra 1.2.6 -> Start key's token sorts after end token

2013-07-23 Thread Hiller, Dean
Out of curiosity, what version of hadoop are you using with cassandra?  I think 
we are trying 0.20.2 if I remember (I have to ask my guy working on it to be 
sure).  I do remember him saying the cassandra maven dependency was odd in that 
it is on the older version and not a newer hadoop version.

We are using RandomPartitioner right now, which I have personally used in the 
past with success.  We are in the process of map/reducing to a cassandra cluster 
with Murmur3Partitioner (our real reason to map/reduce is some refactoring of 
our model; we just thought we would switch to murmur at the same time).

Has anyone else used map/reduce with the murmur partitioner?

Dean

From: Marcelo Elias Del Valle mailto:mvall...@gmail.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Monday, July 22, 2013 4:04 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: cassandra 1.2.6 -> Start key's token sorts after end token

Hello,

I am trying to figure out what might be causing this error. I am using Cassandra 
1.2.6 (tried with 1.2.3 as well) and I am trying to read data from cassandra on 
hadoop using the column family input format. I also got the same error using pure 
astyanax in a test.
I am using Murmur3Partitioner and I created the keyspace using Cassandra 
1.2.6; there is nothing from prior versions. I created the keyspace with 
SimpleStrategy and replication factor 1.
Here is the exception I am getting:
2013-07-22 21:53:05,824 WARN org.apache.hadoop.mapred.Child (main): Error 
running child
java.lang.RuntimeException: InvalidRequestException(why:Start key's token sorts 
after end token)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.maybeInit(ColumnFamilyRecordReader.java:453)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.computeNext(ColumnFamilyRecordReader.java:459)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.computeNext(ColumnFamilyRecordReader.java:406)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:522)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:547)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: InvalidRequestException(why:Start key's token sorts after end token)
at 
org.apache.cassandra.thrift.Cassandra$get_paged_slice_result.read(Cassandra.java:14168)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at 
org.apache.cassandra.thrift.Cassandra$Client.recv_get_paged_slice(Cassandra.java:769)
at 
org.apache.cassandra.thrift.Cassandra$Client.get_paged_slice(Cassandra.java:753)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.maybeInit(ColumnFamilyRecordReader.java:438)
... 16 more
2013-07-22 21:53:05,828 INFO org.apache.hadoop.mapred.Task (main): Runnning 
cleanup for the task

 Any hint?

Best regards,
--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr


Re: How to avoid inter-dc read requests

2013-07-23 Thread Omar Shibli
I simply monitor the load avg of the nodes using opscenter.
I started with idle nodes (by idle I mean the load avg of all nodes was < 1.0),
then started to run a lot of key slice read requests on the *"analytic" DC* with
CL LOCAL_QUORUM (I also made sure that the client worked only with the
analytic DC). After a few minutes I noticed that the load avg of all
the nodes increased dramatically (>10).

Thanks in Advance Aaron,

On Tue, Jul 23, 2013 at 12:02 PM, aaron morton wrote:

> > All the read/write request are issued with CL local quorum, but still
> there're a lot of inter-dc read request.
> >
> How are you measuring this ?
>
> Cheers
>
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/07/2013, at 8:41 AM, sankalp kohli  wrote:
>
> > Slice query does not trigger background read repair.
> > Implement Read Repair on Range Queries
> >
> >
> > On Sun, Jul 21, 2013 at 1:40 PM, sankalp kohli 
> wrote:
> > There can be multiple reasons for that
> > 1) Background read repairs.
> > 2) Your data is not consistent and leading to read repairs.
> > 3) For writes, irrespective of the consistency used, a single write
> request will goto other DC
> > 4) You might be running other nodetools commands like repair.
> > read_repair_chance¶
> >
> > (Default: 0.1 or 1) Specifies the probability with which read repairs
> should be invoked on non-quorum reads. The value must be between 0 and 1.
> For tables created in versions of Cassandra before 1.0, it defaults to 1.
> For tables created in versions of Cassandra 1.0 and higher, it defaults to
> 0.1. However, for Cassandra 1.0, the default is 1.0 if you use CLI or any
> Thrift client, such as Hector or pycassa, and is 0.1 if you use CQL.
> >
> >
> >
> > On Sun, Jul 21, 2013 at 10:26 AM, Omar Shibli 
> wrote:
> > One more thing, I'm doing a lot of key slice read requests, is that
> supposed to change anything?
> >
> >
> > On Sun, Jul 21, 2013 at 8:21 PM, Omar Shibli 
> wrote:
> > I'm seeing a lot of inter-dc read requests, although I've followed
> DataStax guidelines for multi-dc deployment
> http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers
> >
> > Here is my setup:
> > 2 data centers within the same region (AWS)
> > Targeting DC, RP 3, 6 nodes
> > Analytic DC, RP 3, 11 nodes
> >
> > All the read/write request are issued with CL local quorum, but still
> there're a lot of inter-dc read request.
> > Any suggestion, or am I missing something?
> >
> > Thanks in advance,
> >
> >
> >
>
>


Re: sstable size change

2013-07-23 Thread Keith Wright
Can you elaborate on what you mean by "let it take its own course organically"? 
 Will Cassandra force any newly compacted files to my new setting as 
compactions are naturally triggered?

From: sankalp kohli mailto:kohlisank...@gmail.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Monday, July 22, 2013 4:48 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: sstable size change

You can remove the json file, and that will be treated as if all sstables are now 
in L0. Since you have a lot of data, the compaction will take a very long time. 
See the comment below, taken directly from the Cassandra code. If you choose to do 
this, you might want to increase the rate of compaction by the usual means. If you 
are on spinning disks, it might be a very big problem.
During the compaction, read performance will be impacted.

Unless there is a very urgent need to change the sstable size, I would change 
the size and let it take its own course organically.



// LevelDB gives each level a score of how much data it contains vs its ideal 
amount, and
// compacts the level with the highest score. But this falls apart 
spectacularly once you
// get behind.  Consider this set of levels:
// L0: 988 [ideal: 4]
// L1: 117 [ideal: 10]
// L2: 12  [ideal: 100]
//
// The problem is that L0 has a much higher score (almost 250) than L1 
(11), so what we'll
// do is compact a batch of MAX_COMPACTING_L0 sstables with all 117 L1 
sstables, and put the
// result (say, 120 sstables) in L1. Then we'll compact the next batch 
of MAX_COMPACTING_L0,
// and so forth.  So we spend most of our i/o rewriting the L1 data 
with each batch.
//
// If we could just do *all* L0 a single time with L1, that would be 
ideal.  But we can't
// -- see the javadoc for MAX_COMPACTING_L0.
//
// LevelDB's way around this is to simply block writes if L0 compaction 
falls behind.
// We don't have that luxury.
//
// So instead, we
// 1) force compacting higher levels first, which minimizes the i/o 
needed to compact
//optimially which gives us a long term win, and
// 2) if L0 falls behind, we will size-tiered compact it to reduce read 
overhead until
//we can catch up on the higher levels.
//
// This isn't a magic wand -- if you are consistently writing too fast 
for LCS to keep
// up, you're still screwed.  But if instead you have intermittent 
bursts of activity,
// it can help a lot.
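For reference, the size change itself is just a table-level compaction option; a hedged CQL 3 sketch with an illustrative table name and size (existing sstables are only rewritten by the kind of forced recompaction discussed in this thread):

ALTER TABLE mykeyspace.mytable
    WITH compaction = {'class': 'LeveledCompactionStrategy',
                       'sstable_size_in_mb': 160};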


On Mon, Jul 22, 2013 at 12:51 PM, Andrew Bialecki 
mailto:andrew.biale...@gmail.com>> wrote:
My understanding is deleting the .json metadata file is the only way currently. 
If you search the user list archives, there are folks who are building tools to 
force compaction and rebuild sstables with the new size. I believe there's been 
a bit of talk of potentially including those tools as part of a future release.

Also, to answer your question about bloom filters, those are handled 
differently and if you run upgradesstables after altering the BF FP ratio, that 
will rebuild the BFs for each sstable.


On Mon, Jul 22, 2013 at 2:49 PM, Janne Jalkanen 
mailto:janne.jalka...@ecyrd.com>> wrote:

I don't think upgradesstables is enough, since it's more of a "change this file 
to a new format but don't try to merge sstables and compact" thing.

Deleting the .json file is probably the only way, but someone more familiar 
with Cassandra LCS might be able to tell whether manually editing the json file 
so that you drop all sstables down a level might work. Since they would overflow 
the new level, they would compact soon, but the impact might be less drastic than 
just deleting the .json file (which takes everything to L0)...

/Janne

On 22 Jul 2013, at 16:02, Keith Wright 
mailto:kwri...@nanigans.com>> wrote:

Hi all,

   I know there have been several threads recently on this but I wanted to make 
sure I got a clear answer:  we are looking to increase our SSTable size for a 
couple of our LCS tables as well as chunk size (to match the SSD block size).   
The largest table is at 500 GB across 6 nodes (RF 3, C* 1.2.4 VNodes).  I 
wanted to get feedback on the best way to make this change with minimal load 
impact on the cluster.  After I make the change, I understand that I need to 
force the nodes to re-compact the tables.

Can this be done via upgrade sstables or do I need to shutdown the node, delete 
the .json file, and restart as some have suggested?

I assume I can do this one node at a time?

If I change the bloom filter size, I assume I will need to force compaction 
again?  Using the same methodology?

Thank you





Re: NPE in CompactionExecutor

2013-07-23 Thread Paul Ingalls
I'm running the latest from the 1.2 branch as of a few days ago.  I needed one 
of the patches that will be in 1.2.7

There was no error stack, just that line in the log.

I wiped the database (deleted all the files in the lib dir) and restarted my 
data load, and am consistently running into  the incorrect row data size error, 
almost immediately…  It seems to be specific to compacting large rows.  I have 
been unsuccessful in getting a large row to compact…

Paul

On Jul 21, 2013, at 1:42 PM, aaron morton  wrote:

> What version are you running ? 
> 
>> ERROR [CompactionExecutor:38] 2013-07-19 17:01:34,494 CassandraDaemon.java 
>> (line 192) Exception in thread Thread[CompactionExecutor:38,1,main]
>> java.lang.NullPointerException
> What's the full error stack ? 
> 
>> Not sure if this is related or not, but I'm also getting a bunch of 
>> AssertionErrors as well, even after running a scrub…
>> 
>> ERROR [CompactionExecutor:38] 2013-07-19 17:01:06,192 CassandraDaemon.java 
>> (line 192) Exception in thread Thread[CompactionExecutor:38,1,main]
>> java.lang.AssertionError: incorrect row data size 29502477 written to 
>> /mnt/datadrive/lib/cassandra/data/fanzo/tweets_by_team/fanzo-tweets_by_team-tmp-ic-5262-Data.db;
>>  correct is 29725806
> Double check that the scrub was successful. 
> 
> If it's not detecting / fixing the problem look for previous log messages 
> from that thread  [CompactionExecutor:38] and see what sstables it was 
> compacting. Try removing those. But I would give scrub another chance to get it 
> sorted. 
> 
> Cheers
> 
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 20/07/2013, at 5:04 AM, Paul Ingalls  wrote:
> 
>> I'm seeing a number of NullPointerExceptions in the log of my cluster.  You 
>> can see the log line below.  I'm thinking this is probably bad.  Any ideas?
>> 
>> ERROR [CompactionExecutor:38] 2013-07-19 17:01:34,494 CassandraDaemon.java 
>> (line 192) Exception in thread Thread[CompactionExecutor:38,1,main]
>> java.lang.NullPointerException
>> 
>> Not sure if this is related or not, but I'm also getting a bunch of 
>> AssertionErrors as well, even after running a scrub…
>> 
>> ERROR [CompactionExecutor:38] 2013-07-19 17:01:06,192 CassandraDaemon.java 
>> (line 192) Exception in thread Thread[CompactionExecutor:38,1,main]
>> java.lang.AssertionError: incorrect row data size 29502477 written to 
>> /mnt/datadrive/lib/cassandra/data/fanzo/tweets_by_team/fanzo-tweets_by_team-tmp-ic-5262-Data.db;
>>  correct is 29725806
>>  at 
>> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
>>  at 
>> org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
>>  at 
>> org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>>  at 
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>>  at 
>> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>>  at 
>> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>>  at 
>> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
>>  at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>  at java.lang.Thread.run(Thread.java:724)
>> 
>> 
> 



Re: Are Writes disk-bound rather than CPU-bound?

2013-07-23 Thread hajjat
On Tue, Jul 23, 2013 at 5:05 AM, aaron morton wrote:

> “/Insert-heavy workloads will actually be CPU-bound in Cassandra before
> being
> memory-bound/”
>
> What is the source for that ?


http://www.datastax.com/docs/0.8/cluster_architecture/cluster_planning
​​

>
> This is because everything is
> *first *written to the commit log *on disk*. Any thoughts??
>
> Pretty much.
>
> Cheers
>
>-
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/07/2013, at 8:58 AM, hajjat wrote:
>
> “/Insert-heavy workloads will actually be CPU-bound in Cassandra before
> being
> memory-bound/”
>
> However, from reading the documentation
> (http://www.datastax.com/docs/1.2/dml/about_writes) it seems the disk is
> the
> real bottleneck in Writes rather than the CPU. This is because everything
> is
> *first *written to the commit log *on disk*. Any thoughts??
>
>
>
>
>
>
>
>



-- 
*Mohammad Hajjat*
*Ph.D. Student*
*Electrical and Computer Engineering*
*Purdue University*





Re: funnel analytics, how to query for reports etc.

2013-07-23 Thread S Ahmed
Thanks Aaron.

Too bad Rainbird isn't open sourced yet!


On Tue, Jul 23, 2013 at 4:48 AM, aaron morton wrote:

> For background on rollup analytics:
>
> Twitter Rainbird
> http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
> Acunu http://www.acunu.com/
>
> Cheers
>
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/07/2013, at 1:03 AM, Vladimir Prudnikov 
> wrote:
>
> > This can be done easily,
> >
> > Use normal column family to store the sequence of events where key is
> session #ID identifying one use interaction with a website, column names
> are TimeUUID values and column value id of the event (do not write
> something like "user added product to shopping cart", something shorter
> identifying this event).
> >
> > Then you can use counter column family to store counters, you can count
> anything, number of sessions, total number of events, number of particular
> events etc. One row per day for example. Then you can retrieve this row and
> calculate all required %.
> >
> >
> > On Sun, Jul 21, 2013 at 1:05 AM, S Ahmed  wrote:
> > Would cassandra be a good choice for creating a funnel analytics type
> product similar to mixpanel?
> >
> > e.g.  You create a set of events and store them in cassandra for things
> like:
> >
> > event#1 user visited product page
> > event#2 user added product to shopping cart
> > event#3 user clicked on checkout page
> > event#4 user filled out cc information
> > event#5 user purchased product
> >
> > Now in my web application I track each user and store the events somehow
> in cassandra (in some column family etc)
> >
> > Now how will I pull a report that produces results like:
> >
> > 70% of people added to shopping cart
> > 20% checkout page
> > 10% filled out cc information
> > 4% purchased the product
> >
> >
> > And this is for a Saas, so this report would be for thousands of
> customers in theory.
> >
> >
> >
> > --
> > Vladimir Prudnikov
>
>


high write load, with lots of updates, considerations? tombstoned data coming back to life

2013-07-23 Thread S Ahmed
I was watching some videos from the C* summit 2013 and I recall many people
saying that if you can come up with a design where you don't perform
updates on rows, that would make things easier (I believe it was because
there would be less compaction).

When building an Analytics (time series) app on top of C*, based on
Twitters Rainbird design (
http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011),
this means there will be lots and lots of counters.

With lots of counters (updates), admin wise, what are some things to
consider?

Could old tombstoned data somehow come back to life?  I forget what
scenario brings back old data (kinda scary!).
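For reference, a hedged CQL 3 sketch of the kind of Rainbird-style rollup counters described above, with illustrative names; each incoming event fans out into one counter increment per time granularity:

CREATE TABLE page_hits (
    bucket text,   -- 'minute', 'hour' or 'day'
    period text,   -- e.g. '2013-07-23 14:05', '2013-07-23 14', '2013-07-23'
    url    text,
    hits   counter,
    PRIMARY KEY ((bucket, period), url)
);

UPDATE page_hits SET hits = hits + 1
WHERE bucket = 'minute' AND period = '2013-07-23 14:05' AND url = '/checkout';
UPDATE page_hits SET hits = hits + 1
WHERE bucket = 'hour' AND period = '2013-07-23 14' AND url = '/checkout';
UPDATE page_hits SET hits = hits + 1
WHERE bucket = 'day' AND period = '2013-07-23' AND url = '/checkout';

Counter increments are not overwrites of whole rows and do not create tombstones by themselves; the classic way tombstoned data comes back to life is a delete that never reaches a replica combined with repair not being run within gc_grace_seconds.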


Re: sstable size change

2013-07-23 Thread Robert Coli
On Tue, Jul 23, 2013 at 6:48 AM, Keith Wright  wrote:

> Can you elaborate on what you mean by "let it take its own course
> organically"?  Will Cassandra force any newly compacted files to my new
> setting as compactions are naturally triggered?
>

You see, when two (or more!) SSTables love each other very much, they
sometimes decide they want to compact together..

But seriously, "yes." If you force all existing SSTables to level 0, it is
as if you just flushed them all. Level compaction then does a whole lot of
compaction, using the active table size.

=Rob


Re: About column family

2013-07-23 Thread Robert Coli
On Tue, Jul 23, 2013 at 3:23 AM, bjbylh  wrote:

> 1. How many column families can be created in a cluster? Is there a limit on
> the number?
>

A low number of hundreds is the highest practical. The limit in practice is the
amount of heap; each CF consumes heap.


> 2. It takes 2-5 seconds to create a new CF while the cluster contains
> about 1 cfs (if the cluster is empty, it takes about 0.5s). Is that
> normal? How can we improve the efficiency of creating CFs?
>
Yes, that doesn't seem surprising to me. See above.

=Rob


Re: Safely adding new nodes without losing data

2013-07-23 Thread Robert Coli
  On Sat, Jul 20, 2013 at 7:30 AM, E S  wrote:

> I am trying to understand the best procedure for adding new nodes.  The
> one that I see most often online seems to have a hole where there is a low
> probability of permanently losing data.  I want to understand what I am
> missing in my understanding.
>

You aren't missing anything. Congratulations, you have deduced the
following bug :

https://issues.apache.org/jira/browse/CASSANDRA-2434

"Range movements violate consistency"

> I don't think this problem would normally happen without vnodes, because
> when doubling you would alternate the new nodes with the old nodes in the
> ring, so while quorum might not work until the final repair, "all" would,
> and a repair would solve the problem.  With vnodes though, some of the
> ranges will follow the pattern above (range ownership moving from A,B,C to
> D,E,F).
>

I believe this is correct, though adding nodes "between" other nodes is
just convention; there is nothing actually keeping you from being exposed
to 2434 if you do not do this.


> Am I missing something here?  If I'm right, I think the only way to avoid
> this is adding less then a quorum of new nodes (in this case 1) before
> doing a repair.  That would be painful since repairs take a while.
>

If you read/write at QUORUM, this is true. If you read/write at ONE, my
reading of 2434 is that any range movement is capable of losing data
because your data may be only on the node which has lost the range at the
end of the movement, which is not necessarily the host streaming its keys
to the bootstrapping node. Even running a repair before the movement only
reduces the chance of loss, it does not eliminate it.

=Rob


Re: About column family

2013-07-23 Thread Hiller, Dean
We use PlayOrm to have 60,000 VIRTUAL column families such that the performance 
is just fine ;).  You may want to try something like that.
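The "virtual column family" idea can also be sketched in plain CQL 3: keep one physical table and put a discriminator in the partition key so many logical CFs share a few physical ones. A hedged sketch with illustrative names; this is the general pattern, not PlayOrm's actual schema:

-- One physical table hosting many logical column families
CREATE TABLE virtual_cfs (
    cf_name text,    -- which virtual CF this row belongs to
    row_key text,
    column1 text,
    value   text,
    PRIMARY KEY ((cf_name, row_key), column1)
);

-- A write to virtual CF 'user_events' for row 'bob'
INSERT INTO virtual_cfs (cf_name, row_key, column1, value)
VALUES ('user_events', 'bob', 'login', '2013-07-23 10:00');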

Dean

From: Robert Coli mailto:rc...@eventbrite.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Tuesday, July 23, 2013 10:40 AM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>, bjbylh 
mailto:bjb...@me.com>>
Subject: Re: About column family

On Tue, Jul 23, 2013 at 3:23 AM, bjbylh mailto:bjb...@me.com>> 
wrote:
1,how many column families can be created in a cluster?is there a limit to the 
number of it?

Low number of hundreds is highest practical. The limit in practice is amount of 
heap, each CF consumes heap.

2,it spents 2-5 seconds to create a new cf while the cluster contains about 
1 cfs(if the cluster is empty,it spents about 0.5s).is it normal?how to 
improve the efficiency of creating cf?

Yes, that doesn't seem surprising to me. See above.

=Rob


Re: Are Writes disk-bound rather than CPU-bound?

2013-07-23 Thread Alex Popescu
I see pretty much the same formulation in the 1.2 docs, so I'm wondering
what would be the best rewrite of that paragraph?


On Tue, Jul 23, 2013 at 9:00 AM, hajjat  wrote:

>
>
> On Tue, Jul 23, 2013 at 5:05 AM, aaron morton wrote:
>
>> “/Insert-heavy workloads will actually be CPU-bound in Cassandra before
>> being
>> memory-bound/”
>>
>> What is the source for that ?
>
>
> http://www.datastax.com/docs/0.8/cluster_architecture/cluster_planning
> ​​
>
>>
>> This is because everything is
>> *first *written to the commit log *on disk*. Any thoughts??
>>
>> Pretty much.
>>
>> Cheers
>>
>>-
>> Aaron Morton
>> Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 22/07/2013, at 8:58 AM, hajjat wrote:
>>
>> “/Insert-heavy workloads will actually be CPU-bound in Cassandra before
>> being
>> memory-bound/”
>>
>> However, from reading the documentation
>> (http://www.datastax.com/docs/1.2/dml/about_writes) it seems the disk is
>> the
>> real bottleneck in Writes rather than the CPU. This is because everything
>> is
>> *first *written to the commit log *on disk*. Any thoughts??
>>
>>
>>
>>
>>
>>
>>
>>
>
>
>
> --
> *Mohammad Hajjat*
> *Ph.D. Student*
> *Electrical and Computer Engineering*
> *Purdue University*
>
>



-- 

:- a)


Alex Popescu
@al3xandru


Re: Are Writes disk-bound rather than CPU-bound?

2013-07-23 Thread Hiller, Dean
Out of curiosity, isn't what is really happening this:

"As writes keep coming in, memory fills up, causing flushes of the whole memtable 
to disk. In a bursting scenario, writes are thus limited only by memory and CPU, 
in short bursts that tend to fit in memory. In a longer window of constant writes, 
the writes become limited by the constant flushing of the memtable to disk."

I.e. we kind of have two scenarios in my mind, if I got that right; my 
understanding is that cassandra writes to memory first and asynchronously 
writes to disk.

Later,
Dean

From: Alex Popescu mailto:al...@datastax.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Tuesday, July 23, 2013 12:07 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Cc: 
"cassandra-u...@incubator.apache.org"
 
mailto:cassandra-u...@incubator.apache.org>>
Subject: Re: Are Writes disk-bound rather than CPU-bound?

I see pretty much the same formulation in the 1.2 docs, so I'm wondering what 
would be the best rewrite of that paragraph?


On Tue, Jul 23, 2013 at 9:00 AM, hajjat 
mailto:haj...@purdue.edu>> wrote:


On Tue, Jul 23, 2013 at 5:05 AM, aaron morton wrote:
“/Insert-heavy workloads will actually be CPU-bound in Cassandra before being
memory-bound/”
What is the source for that ?

http://www.datastax.com/docs/0.8/cluster_architecture/cluster_planning
​​

This is because everything is
*first *written to the commit log *on disk*. Any thoughts??
Pretty much.

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/07/2013, at 8:58 AM, hajjat wrote:

“/Insert-heavy workloads will actually be CPU-bound in Cassandra before being
memory-bound/”

However, from reading the documentation
(http://www.datastax.com/docs/1.2/dml/about_writes) it seems the disk is the
real bottleneck in Writes rather than the CPU. This is because everything is
*first *written to the commit log *on disk*. Any thoughts??










--
Mohammad Hajjat
Ph.D. Student
Electrical and Computer Engineering
Purdue University





--

:- a)


Alex Popescu
@al3xandru


Re: Are Writes disk-bound rather than CPU-bound?

2013-07-23 Thread Hiller, Dean
Also, thinking about it more...

There are constant write loads where, if the cluster is big enough compared
to the write load, the flushes are not happening that often, causing a
profile like that to be CPU bound.

In my mind it really is about what configuration you have.  If I have a
high write load and I increase my cluster size 10 fold, it may be the case
that I am limited by CPU as the memtable flushes are not happening as
often since writes are thinned out across the cluster.

Dean

On 7/23/13 12:12 PM, "Hiller, Dean"  wrote:

>Out of curiosity, isn't what is really happening is this
>
>"As writes keep coming in, memory fills up causing flushes to the commit
>log disk of the whole memtable.  In a bursting scenario, writes are thus
>limited only by memory and cpu in short bursting cases that tend to fit
>in memory.  In a more long window of constant writes, the writes become
>limited by the constant flushing to disk of the memtable"
>
>Ie. We kind of have two scenarios in my mind if I got that right that is
>from my understanding that cassandra is write to memory first and
>asynchronously write to disk.
>
>Later,
>Dean
>
>From: Alex Popescu <al...@datastax.com>
>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Date: Tuesday, July 23, 2013 12:07 PM
>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Cc: "cassandra-u...@incubator.apache.org" <cassandra-u...@incubator.apache.org>
>Subject: Re: Are Writes disk-bound rather than CPU-bound?
>
>I see pretty much the same formulation in the 1.2 docs, so I'm wondering
>what would be the best rewrite of that paragraph?
>
>
>On Tue, Jul 23, 2013 at 9:00 AM, hajjat <haj...@purdue.edu> wrote:
>
>
>On Tue, Jul 23, 2013 at 5:05 AM, aaron morton [via [hidden email]] <[hidden email]> wrote:
>“/Insert-heavy workloads will actually be CPU-bound in Cassandra before
>being
>memory-bound/”
>What is the source for that ?
>
>http://www.datastax.com/docs/0.8/cluster_architecture/cluster_planning
>​​
>
>This is because everything is
>*first *written to the commit log *on disk*. Any thoughts??
>Pretty much.
>
>Cheers
>
>-
>Aaron Morton
>Cassandra Consultant
>New Zealand
>
>@aaronmorton
>http://www.thelastpickle.com
>
>On 22/07/2013, at 8:58 AM, hajjat <[hidden email]> wrote:
>
>“/Insert-heavy workloads will actually be CPU-bound in Cassandra before
>being
>memory-bound/”
>
>However, from reading the documentation
>(http://www.datastax.com/docs/1.2/dml/about_writes) it seems the disk is
>the
>real bottleneck in Writes rather than the CPU. This is because everything
>is
>*first *written to the commit log *on disk*. Any thoughts??
>
>
>
>
>
>
>
>
>
>
>--
>Mohammad Hajjat
>Ph.D. Student
>Electrical and Computer Engineering
>Purdue University
>
>
>
>
>
>--
>
>:- a)
>
>
>Alex Popescu
>@al3xandru



Unable to describe table in CQL 3

2013-07-23 Thread Rahul Gupta
I am using Cassandra ver 1.1.9.7
Created a Column Family using Cassandra-cli.

create column family events
with comparator = 'CompositeType(DateType,UTF8Type)'
and key_validation_class = 'UUIDType'
and default_validation_class = 'UTF8Type';

I can describe this CF using CQL2, but I get an error when trying the same 
describe with CQL 3:

cqlsh:CQI> desc table events;

/usr/lib/python2.6/site-packages/cqlshlib/cql3handling.py:852: 
UnexpectedTableStructure: Unexpected table structure; may not translate 
correctly to CQL. expected composite key CF to have column aliases, but found 
none
/usr/lib/python2.6/site-packages/cqlshlib/cql3handling.py:875: 
UnexpectedTableStructure: Unexpected table structure; may not translate 
correctly to CQL. expected [u'KEY'] length to be 2, but it's 1. 
comparator='org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.DateType,org.apache.cassandra.db.marshal.UTF8Type)'
CREATE TABLE events (
  "KEY" uuid PRIMARY KEY
) WITH
  comment='' AND
  caching='KEYS_ONLY' AND
  read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  replicate_on_write='true' AND
  compaction_strategy_class='SizeTieredCompactionStrategy' AND
  compression_parameters:sstable_compression='SnappyCompressor';

Any ideas why CQL3 won't display Composite columns? What should be done to make 
them compatible?

Thanks,
Rahul Gupta
DEKA Research & Development
340 Commercial St  Manchester, NH  03101
P: 603.666.3908 extn. 6504 | C: 603.718.9676

This e-mail and the information, including any attachments, it contains are 
intended to be a confidential communication only to the person or entity to 
whom it is addressed and may contain information that is privileged. If the 
reader of this message is not the intended recipient, you are hereby notified 
that any dissemination, distribution or copying of this communication is 
strictly prohibited. If you have received this communication in error, please 
immediately notify the sender and destroy the original message.





Re: sstable size change

2013-07-23 Thread sankalp kohli
"Will Cassandra force any newly compacted files to my new setting as
compactions are naturally triggered"
Yes. Let it compact and increase in size.
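
For reference, a minimal sketch of the kind of setting change being discussed here, assuming CQL 3 on 1.2; the keyspace/table names and the 160 MB target are illustrative, not taken from this thread:

ALTER TABLE mykeyspace.mytable
  WITH compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};

-- Newly written and newly compacted SSTables pick up the new target size;
-- existing SSTables only change as they get compacted over time.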


On Tue, Jul 23, 2013 at 9:38 AM, Robert Coli  wrote:

> On Tue, Jul 23, 2013 at 6:48 AM, Keith Wright wrote:
>
>> Can you elaborate on what you mean by "let it take its own course
>> organically"?  Will Cassandra force any newly compacted files to my new
>> setting as compactions are naturally triggered?
>>
>
> You see, when two (or more!) SSTables love each other very much, they
> sometimes decide they want to compact together..
>
> But seriously, "yes." If you force all existing SSTables to level 0, it is
> as if you just flushed them all. Level compaction then does a whole lot of
> compaction, using the active table size.
>
> =Rob
>


Decommission an entire DC

2013-07-23 Thread Lanny Ripple
Hi,

We have a multi-dc setup using DC1:2, DC2:2.  We want to get rid of DC1.
 We're in the position where we don't need to save any of the data on DC1.
 We know we'll lose a (tiny.  already checked) bit of data but our
processing is such that we'll recover over time.

How do we drop DC1 and just move forward with DC2?  Using nodetool
decommission or removetoken, it looks like we'll eventually end up with a single
DC1 node containing the entire DC's data, which would be slow and costly.

We've speculated that setting DC1:0 or removing it from the schema would do
the trick, but without finding any hits while searching on that idea I
hesitate to just do it.  We can drop DC1's data but have to keep a working
ring in DC2.


Re: cassandra 1.2.6 -> Start key's token sorts after end token

2013-07-23 Thread Marcelo Elias Del Valle
Dean,

I am using hadoop 1.0.3.
Indeed, using Cassandra 1.2.3 with Random partitioner, it worked.
However, that is the only reason for me to keep using RandomPartitioner; I really
would like to move forward. Besides, I tried to use Cassandra 1.2.6 with
RandomPartitioner and I got problems when inserting data, even after stopping
Cassandra, cleaning my entire data folder and then starting it again.
I am also really curious to know if there is anyone else having these
problems or if it is just me...

Best regards,
Marcelo.


2013/7/23 Hiller, Dean 

> Out of curiosity, what version of hadoop are you using with cassandra?  I
> think we are trying 0.20.2 if I remember(I have to ask my guy working on it
> to be sure).  I do remember him saying the cassandra maven dependency was
> odd in that it is in the older version and not a newer hadoop version.
>
> We are using RandomPartitioner though right now which I have personally
> used in the past with success.  We are in the process of map/reducing to a
> cassandra with MurmurPartitioner  (our real reason to map/reduce is some
> refactorings in our model though and we just thought we would switch to
> murmur).
>
> Has anyone else used map/reduce with murmur partitioner?
>
> Dean
>
> From: Marcelo Elias Del Valle <mvall...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday, July 22, 2013 4:04 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: cassandra 1.2.6 -> Start key's token sorts after end token
>
> Hello,
>
> I am trying to figure what might be cause this error. I am using
> Cassandra 1.2.6 (tried with 1.2.3 as well) and I am trying to read data
> from cassandra on hadoop using column family input format. I also got the
> same error using pure astyanax on a test.
> I am using Murmur3Partitioner and I created the keyspace using
> Cassandra 1.2.6, there is nothing from prior versions. I created the
> keyspace with SimpleStrategy and replication factor 1.
> Here is the exception I am getting:
> 2013-07-22 21:53:05,824 WARN org.apache.hadoop.mapred.Child (main): Error
> running child
> java.lang.RuntimeException: InvalidRequestException(why:Start key's token
> sorts after end token)
> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.maybeInit(ColumnFamilyRecordReader.java:453)
> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.computeNext(ColumnFamilyRecordReader.java:459)
> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.computeNext(ColumnFamilyRecordReader.java:406)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
> at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:522)
> at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:547)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: InvalidRequestException(why:Start key's token sorts after end
> token)
> at
> org.apache.cassandra.thrift.Cassandra$get_paged_slice_result.read(Cassandra.java:14168)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
> at
> org.apache.cassandra.thrift.Cassandra$Client.recv_get_paged_slice(Cassandra.java:769)
> at
> org.apache.cassandra.thrift.Cassandra$Client.get_paged_slice(Cassandra.java:753)
> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.maybeInit(ColumnFamilyRecordReader.java:438)
> ... 16 more
> 2013-07-22 21:53:05,828 INFO org.apache.hadoop.mapred.Task (main):
> Runnning cleanup for the task
>
>  Any hint?
>
> Best regards,
> --
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr
>



-- 
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr


Re: cassandra 1.2.6 -> Start key's token sorts after end token

2013-07-23 Thread Hiller, Dean
Perhaps try 0.20.2 as

 1.  The maven pom files have cassandra depending on 0.20.2
 2.  The 0.20.2 default was murmur and we had to change it to random 
partitioner or it wouldn't work for us

Ie. I suspect they will change the pom file to a more recent version of hadoop 
at some point but I wonder if test suites suck in 0.20.2 because the pom file 
points to that version….depends on if they actually have tests for map/reduce 
which is probably a bit hard.

Dean

From: Marcelo Elias Del Valle <mvall...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, July 23, 2013 1:54 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: cassandra 1.2.6 -> Start key's token sorts after end token

Dean,

I am using hadoop 1.0.3.
Indeed, using Cassandra 1.2.3 with Random partitioner, it worked. However, 
it's the only reason for me to use randompartitioner, I really would like to 
move forward. Besides, I tried to use Cassandra 1.2.6 with RandomPartitioner 
and I got problems when inserting data, even stopping Cassandra, cleaning my 
entire data folder and then starting it again.
I am also really curious to know if there is anyone else having these 
problems or if it is just me...

Best regards,
Marcelo.


2013/7/23 Hiller, Dean <dean.hil...@nrel.gov>
Out of curiosity, what version of hadoop are you using with cassandra?  I think 
we are trying 0.20.2 if I remember(I have to ask my guy working on it to be 
sure).  I do remember him saying the cassandra maven dependency was odd in that 
it is in the older version and not a newer hadoop version.

We are using RandomPartitioner though right now which I have personally used in 
the past with success.  We are in the process of map/reducing to a cassandra 
with MurmurPartitioner  (our real reason to map/reduce is some refactorings in 
our model though and we just thought we would switch to murmur).

Has anyone else used map/reduce with murmur partitioner?

Dean

From: Marcelo Elias Del Valle <mvall...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, July 22, 2013 4:04 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: cassandra 1.2.6 -> Start key's token sorts after end token

Hello,

I am trying to figure what might be cause this error. I am using Cassandra 
1.2.6 (tried with 1.2.3 as well) and I am trying to read data from cassandra on 
hadoop using column family input format. I also got the same error using pure 
astyanax on a test.
I am using Murmur3Partitioner and I created the keyspace using Cassandra 
1.2.6, there is nothing from prior versions. I created the keyspace with 
SimpleStrategy and replication factor 1.
Here is the exception I am getting:
2013-07-22 21:53:05,824 WARN org.apache.hadoop.mapred.Child (main): Error 
running child
java.lang.RuntimeException: InvalidRequestException(why:Start key's token sorts 
after end token)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.maybeInit(ColumnFamilyRecordReader.java:453)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.computeNext(ColumnFamilyRecordReader.java:459)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.computeNext(ColumnFamilyRecordReader.java:406)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:522)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:547)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: InvalidRequestException(why:Start key's token sorts after end token)
at 
org.apache.cassandra.thrift.Cassan

Re: cassandra 1.2.6 -> Start key's token sorts after end token

2013-07-23 Thread Hiller, Dean
Oh, and in the past 0.20.x has been pretty stable by the way….they
finally switched their numbering scheme, thank god.

Dean

On 7/23/13 2:13 PM, "Hiller, Dean"  wrote:

>Perhaps try 0.20.2 as
>
> 1.  The maven pom files have cassandra depending on 0.20.2
> 2.  The 0.20.2 default was murmur and we had to change it to random
>partitioner or it wouldn't work for us
>
>Ie. I suspect they will change the pom file to a more recent version of
>hadoop at some point but I wonder if test suites suck in 0.20.2 because
>the pom file points to that version….depends on if they actually have
>tests for map/reduce which is probably a bit hard.
>
>Dean
>
>From: Marcelo Elias Del Valle <mvall...@gmail.com>
>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Date: Tuesday, July 23, 2013 1:54 PM
>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Subject: Re: cassandra 1.2.6 -> Start key's token sorts after end token
>
>Dean,
>
>I am using hadoop 1.0.3.
>Indeed, using Cassandra 1.2.3 with Random partitioner, it worked.
>However, it's the only reason for me to use randompartitioner, I really
>would like to move forward. Besides, I tried to use Cassandra 1.2.6 with
>RandomPartitioner and I got problems when inserting data, even stopping
>Cassandra, cleaning my entire data folder and then starting it again.
>I am also really curious to know if there is anyone else having these
>problems or if it is just me...
>
>Best regards,
>Marcelo.
>
>
>2013/7/23 Hiller, Dean <dean.hil...@nrel.gov>
>Out of curiosity, what version of hadoop are you using with cassandra?  I
>think we are trying 0.20.2 if I remember(I have to ask my guy working on
>it to be sure).  I do remember him saying the cassandra maven dependency
>was odd in that it is in the older version and not a newer hadoop version.
>
>We are using RandomPartitioner though right now which I have personally
>used in the past with success.  We are in the process of map/reducing to
>a cassandra with MurmurPartitioner  (our real reason to map/reduce is
>some refactorings in our model though and we just thought we would switch
>to murmur).
>
>Has anyone else used map/reduce with murmur partitioner?
>
>Dean
>
>From: Marcelo Elias Del Valle <mvall...@gmail.com>
>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Date: Monday, July 22, 2013 4:04 PM
>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Subject: cassandra 1.2.6 -> Start key's token sorts after end token
>
>Hello,
>
>I am trying to figure what might be cause this error. I am using
>Cassandra 1.2.6 (tried with 1.2.3 as well) and I am trying to read data
>from cassandra on hadoop using column family input format. I also got the
>same error using pure astyanax on a test.
>I am using Murmur3Partitioner and I created the keyspace using
>Cassandra 1.2.6, there is nothing from prior versions. I created the
>keyspace with SimpleStrategy and replication factor 1.
>Here is the exception I am getting:
>2013-07-22 21:53:05,824 WARN org.apache.hadoop.mapred.Child (main): Error
>running child
>java.lang.RuntimeException: InvalidRequestException(why:Start key's token
>sorts after end token)
>at 
>org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.maybe
>Init(ColumnFamilyRecordReader.java:453)
>at 
>org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.compu
>teNext(ColumnFamilyRecordReader.java:459)
>at 
>org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.compu
>teNext(ColumnFamilyRecordReader.java:406)
>at 
>com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterat
>or.java:143)
>at 
>com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:1
>38)
>at 
>org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFam
>ilyRecordReader.java:103)
>at 
>org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTa
>sk.java:522)
>at 
>org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapT
>ask.java:547)
>at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
>at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
>at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:39

Re: Unable to describe table in CQL 3

2013-07-23 Thread Shahab Yunus
Rahul,

See this as it was discussed earlier:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Representation-of-dynamically-added-columns-in-table-column-family-schema-using-cqlsh-td7588997.html
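
For comparison, a rough sketch of how a column family like the one quoted below might be declared natively in CQL 3 (the column1/column2/value names are the defaults cqlsh generates for a Thrift CF, and DateType maps to timestamp; this is illustrative, not the exact output of any tool):

CREATE TABLE events (
  key uuid,
  column1 timestamp,
  column2 text,
  value text,
  PRIMARY KEY (key, column1, column2)
) WITH COMPACT STORAGE;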

Regards,
Shahab


On Tue, Jul 23, 2013 at 2:51 PM, Rahul Gupta wrote:

>  I am using Cassandra ver 1.1.9.7
>
> Created a Column Family using Cassandra-cli.
>
> create column family events
>
> with comparator = 'CompositeType(DateType,UTF8Type)'
>
> and key_validation_class = 'UUIDType'
>
> and default_validation_class = 'UTF8Type';
>
> I can describe this CF using CQL2 but getting error when trying the same
> describe with CQL 3
>
> cqlsh:CQI> desc table events;
>
> /usr/lib/python2.6/site-packages/cqlshlib/cql3handling.py:852:
> UnexpectedTableStructure: Unexpected table structure; may not translate
> correctly to CQL. expected composite key CF to have column aliases, but
> found none
>
> /usr/lib/python2.6/site-packages/cqlshlib/cql3handling.py:875:
> UnexpectedTableStructure: Unexpected table structure; may not translate
> correctly to CQL. expected [u'KEY'] length to be 2, but it's 1.
> comparator='org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.DateType,org.apache.cassandra.db.marshal.UTF8Type)'
> 
>
> CREATE TABLE events (
>
>   "KEY" uuid PRIMARY KEY
>
> ) WITH
>
>   comment='' AND
>
>   caching='KEYS_ONLY' AND
>
>   read_repair_chance=0.10 AND
>
>   gc_grace_seconds=864000 AND
>
>   replicate_on_write='true' AND
>
>   compaction_strategy_class='SizeTieredCompactionStrategy' AND
>
>   compression_parameters:sstable_compression='SnappyCompressor';
>
> Any ideas why CQL3 won’t display Composite columns? What should be done to
> make them compatible?
>
> Thanks,
>
> Rahul Gupta
> DEKA Research & Development
>
> 340 Commercial St  Manchester, NH  03101
>
> P: 603.666.3908 extn. 6504 | C: 603.718.9676
>
> This e-mail and the information, including any attachments, it contains
> are intended to be a confidential communication only to the person or
> entity to whom it is addressed and may contain information that is
> privileged. If the reader of this message is not the intended recipient,
> you are hereby notified that any dissemination, distribution or copying of
> this communication is strictly prohibited. If you have received this
> communication in error, please immediately notify the sender and destroy
> the original message.
>
> --
> This e-mail and the information, including any attachments, it contains
> are intended to be a confidential communication only to the person or
> entity to whom it is addressed and may contain information that is
> privileged. If the reader of this message is not the intended recipient,
> you are hereby notified that any dissemination, distribution or copying of
> this communication is strictly prohibited. If you have received this
> communication in error, please immediately notify the sender and destroy
> the original message.
>
> Thank you.
>
> Please consider the environment before printing this email.
>


Re: Representation of dynamically added columns in table (column family) schema using cqlsh

2013-07-23 Thread Shahab Yunus
See this, as it was discussed earlier:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Representation-of-dynamically-added-columns-in-table-column-family-schema-using-cqlsh-td7588997.html
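
In short, with that compact-storage layout each dynamically added column comes back as its own CQL 3 row rather than as a separate schema column. A minimal sketch (the row key value is made up):

SELECT column1, value FROM mytable WHERE key = 'some-row-key';

-- one result row per dynamically inserted column:
-- column1 holds the column name, value holds the column value.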

Regards,
Shahab


On Fri, Jul 12, 2013 at 11:13 AM, Shahab Yunus wrote:

> A basic question and it seems that I have a gap in my understanding.
>
> I have a simple table in Cassandra with multiple column families. I add
> new columns to each of these column families on the fly. When I view (using
> the 'DESCRIBE table' command) the schema of a particular column family, I
> see only one entry for column (bolded below). What is the reason for that?
> The column that I am adding have string names and byte values, written
> using Hector 1.1-3 (
> HFactory.createColumn(...) method).
>
> CREATE TABLE mytable (
>   key text,
>   *column1* ascii,
>   value blob,
>   PRIMARY KEY (key, column1)
> ) WITH COMPACT STORAGE AND
>   bloom_filter_fp_chance=0.01 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=1.00 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'SnappyCompressor'};
>
> cqlsh 3.0.2
> Cassandra 1.2.5
> CQL spec 3.0.0
> Thrift protocol 19.36.0
>
>
> Given this, I can also only query on this one column1 or value using the
> 'SELECT' statement.
>
> The OpsCenter on the other hand, displays multiple columns as
> expected. Basically the demarcation of multiple columns is clearer.
>
> Thanks a lot.
>
> Regards,
> Shahab
>


Re: Decommission an entire DC

2013-07-23 Thread Omar Shibli
All you need to do is to decrease the replication factor of DC1 to 0, and
then decommission the nodes one by one,
I've tried this before and it worked with no issues.
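
A minimal sketch of that first step, assuming CQL 3 and an illustrative keyspace name; dropping DC1 from the replication map (or setting it to 0) is what removes its replicas:

ALTER KEYSPACE mykeyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC2': 2};

-- Repeat for every keyspace that currently replicates to DC1, then run
-- nodetool decommission on each DC1 node in turn.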

Thanks,

On Tue, Jul 23, 2013 at 10:32 PM, Lanny Ripple  wrote:

> Hi,
>
> We have a multi-dc setup using DC1:2, DC2:2.  We want to get rid of DC1.
>  We're in the position where we don't need to save any of the data on DC1.
>  We know we'll lose a (tiny.  already checked) bit of data but our
> processing is such that we'll recover over time.
>
> How do we drop DC1 and just move forward with DC2?  Using nodetool
> decommision or removetoken looks like we'll eventually end up with a single
> DC1 node containing the entire dc's data which would be slow and costly.
>
> We've speculated that setting DC1:0 or removing it from the schema would
> do the trick but without finding any hits during searching on that idea I
> hesitate to just do it.  We can drop DC1s data but have to keep a working
> ring in DC2.
>
>


unable to compact large rows

2013-07-23 Thread Paul Ingalls
I'm getting constant exceptions during compaction of large rows.  In fact, I 
have not seen one work, even starting from an empty DB.  As soon as I start 
pushing in data, when a row hits the large threshold, it fails compaction with 
this type of stack trace:

 INFO [CompactionExecutor:6] 2013-07-24 01:17:53,592 CompactionController.java 
(line 156) Compacting large row fanzo/tweets_by_id:352567939972603904 
(153360688 bytes) incrementally
ERROR [CompactionExecutor:6] 2013-07-24 01:18:12,496 CassandraDaemon.java (line 
192) Exception in thread Thread[CompactionExecutor:6,1,main]
java.lang.AssertionError: incorrect row data size 5722610 written to 
/mnt/datadrive/lib/cassandra/data/fanzo/tweets_by_id/fanzo-tweets_by_id-tmp-ic-1453-Data.db;
 correct is 5767384
at 
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
at 
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

I'm not sure what to do or where to look.  Help…:)

Thanks,

Paul




disappointed

2013-07-23 Thread Paul Ingalls
I want to check in.  I'm sad, mad and afraid.  I've been trying to get a 1.2 
cluster up and working with my data set for three weeks with no success.  I've 
been running a 1.1 cluster for 8 months now with no hiccups, but for me at 
least 1.2 has been a disaster.  I had high hopes for leveraging the new 
features of 1.2, specifically vnodes and collections.   But at this point I 
can't release my system into production, and will probably need to find a new 
back end.  As a small startup, this could be catastrophic.  I'm mostly mad at 
myself.  I took a risk moving to the new tech.  I forgot sometimes when you 
gamble, you lose.

First, the performance of 1.2.6 was horrible when using collections.  I wasn't 
able to push through 500k rows before the cluster became unusable.  With a lot 
of digging, and way too much time, I discovered I was hitting a bug that had 
just been fixed, but was unreleased.  This scared me, because the release was 
already at 1.2.6 and I would have expected something like 
https://issues.apache.org/jira/browse/CASSANDRA-5677 to have been addressed 
long before.  But gamely I grabbed the latest code from the 1.2 branch, built 
it, and was finally able to get past half a million rows.  

But, then I hit ~4 million rows, and a multitude of problems.  Even with the 
fix above, I was still seeing a ton of compactions failing, specifically the 
ones for large rows.  Not a single large row will compact, they all assert with 
the wrong size.  Worse, and this is what kills the whole thing, I keep hitting 
a wall with open files, even after dumping the whole DB, dropping vnodes and 
trying again.  Seriously, 650k open file descriptors?  When it hits this limit, 
the whole DB craps out and is basically unusable.  This isn't that many rows.  
I have close to a half a billion in 1.1…

I'm now at a standstill.  I figure I have two options unless someone here can 
help me.  Neither of them involve 1.2.  I can either go back to 1.1 and remove 
the features that collections added to my service, or I find another data 
backend that has similar performance characteristics to cassandra but allows 
collections type behavior in a scalable manner.  Cause as far as I can tell, 
1.2 doesn't scale.  Which makes me sad, I was proud of what I accomplished with 
1.1….

Does anyone know why there are so many open file descriptors?  Any ideas on why 
a large row won't compact?

Paul

get all row keys of a table using CQL3

2013-07-23 Thread Jimmy Lin
hi,
I want to fetch all the row keys of a table using CQL3:

e.g
select id from mytable limit 999


#1
For this query, does the node need to wait for all rows to return from all
other nodes before returning the data to the client (I am using astyanax)?
In other words, will this operation create a lot of load on the initial
node receiving the request?


#2
if my table is big, I have to make sure the limit is set to a big enough
number, such that I can get all the results. Seems like I have to do a
count(*) to be sure.
Is there any alternative (always return all the rows)?

#3
if my id is a timeuuid, is it better to combine the results from a couple of
the following cql queries to obtain all keys?
e.g
select id from mytable where id < minTimeuuid('2013-02-02 10:00+')
limit 2
+
select id from mytable where id > maxTimeuuid('2013-02-02 10:00+')
limit 2

thanks


Re: get all row keys of a table using CQL3

2013-07-23 Thread Blake Eggleston
Hi Jimmy,

Check out the token function:

http://www.datastax.com/docs/1.1/dml/using_cql#paging-through-non-ordered-partitioner-results

You can use it to page through your rows.
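
Roughly, the paging loop looks like this (a sketch reusing the table and column names from the question; the page size is arbitrary):

SELECT id FROM mytable LIMIT 1000;

-- then, feeding the last id of the previous page back in each time:
SELECT id FROM mytable WHERE token(id) > token(<last id from previous page>) LIMIT 1000;

-- repeat until a page comes back with fewer than 1000 rows.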

Blake


On Jul 23, 2013, at 10:18 PM, Jimmy Lin wrote:

> hi,
> I want to fetch all the row keys of a table using CQL3:
>  
> e.g
> select id from mytable limit 999
>  
>  
> #1
> For this query, does the node need to wait for all rows return from all other 
> nodes before returning the data to the client(I am using astyanax) ?
> In other words, will this operation create a lot of load to the initial node 
> receiving the request?
>  
>  
> #2
> if my table is big, I have to make sure the limit is set to a big enough 
> number, such that I can get all the result. Seems like I have to do a 
> count(*) to be sure
> is there any alternative(always return all the rows)?
>  
> #3
> if my id is a timeuuid, is it better to  combine the result from couple of 
> the following cql to obtain all keys?
> e.g
> select id from mytable where id t < minTimeuuid('2013-02-02 10:00+') 
> limit 2
> +
> select id from mytable where id t > maxTimeuuid('2013-02-02 10:00+') 
> limit 2
>  
> thanks
> 
>  
>  
>  



Re: get all row keys of a table using CQL3

2013-07-23 Thread Jimmy Lin
hi Blake,
arh okay, token function is nice.

But I am still a bit confused by the phrase "page through all rows".
select id from mytable where token(id) > token(12345)
Will it return all rows whose partition key's corresponding token is greater
than the token of 12345?
I guess my question #1 is still there: does this query create a big load
on the initial node that receives such a request, because it still has to wait
for all the results coming back from other nodes before returning to the client?

thanks





On Tue, Jul 23, 2013 at 10:34 PM, Blake Eggleston wrote:

> Hi Jimmy,
>
> Check out the token function:
>
>
> http://www.datastax.com/docs/1.1/dml/using_cql#paging-through-non-ordered-partitioner-results
>
> You can use it to page through your rows.
>
> Blake
>
>
> On Jul 23, 2013, at 10:18 PM, Jimmy Lin wrote:
>
> hi,
> I want to fetch all the row keys of a table using CQL3:
>
> e.g
> select id from mytable limit 999
>
>
> #1
> For this query, does the node need to wait for all rows return from all
> other nodes before returning the data to the client(I am using astyanax) ?
> In other words, will this operation create a lot of load to the initial
> node receiving the request?
>
>
> #2
> if my table is big, I have to make sure the limit is set to a big enough
> number, such that I can get all the result. Seems like I have to do a
> count(*) to be sure
> is there any alternative(always return all the rows)?
>
> #3
> if my id is a timeuuid, is it better to  combine the result from couple of
> the following cql to obtain all keys?
> e.g
> select id from mytable where id t < minTimeuuid('2013-02-02 10:00+')
> limit 2
> +
> select id from mytable where id t > maxTimeuuid('2013-02-02 10:00+')
> limit 2
>
> thanks
>
>
>
>
>
>
>