RE: Direct IO with Spark and Hadoop over Cassandra

2014-09-16 Thread moshe.kranc
You will also have to read/resolve multiple row instances (if you update records) and tombstones (if you delete records) yourself. From: platon.tema [mailto:platon.t...@yandex.ru] Sent: Tuesday, September 16, 2014 1:51 PM To: user@cassandra.apache.org Subject: Re: Direct IO with Spark and Hadoop
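A minimal sketch of the reconciliation this implies, not taken from the thread: when Spark/Hadoop reads SSTable data directly instead of going through the coordinator, last-write-wins merging and tombstone filtering have to be redone by hand. The cell-tuple layout below is an assumption for illustration only.

    # Hypothetical cell layout: (row_key, column_name, value, timestamp, is_tombstone),
    # as it might come out of a direct SSTable reader.
    def reconcile(cells):
        """Keep only the newest version of each column; drop columns whose
        newest version is a tombstone (i.e. the column was deleted)."""
        latest = {}
        for row_key, column_name, value, timestamp, is_tombstone in cells:
            slot = (row_key, column_name)
            if slot not in latest or timestamp > latest[slot][0]:
                latest[slot] = (timestamp, value, is_tombstone)
        return {slot: value
                for slot, (_ts, value, dead) in latest.items()
                if not dead}

    # Example: an update and a delete, each seen twice across SSTables.
    cells = [
        ("user:1", "email", "old@example.com", 100, False),
        ("user:1", "email", "new@example.com", 200, False),  # later update wins
        ("user:1", "phone", "555-1234",        100, False),
        ("user:1", "phone", None,              300, True),   # deletion wins
    ]
    print(reconcile(cells))  # {('user:1', 'email'): 'new@example.com'}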

RE: Performance migrating from MySQL to C*

2014-06-02 Thread moshe.kranc
From your email, I understand your use case a bit better - I now see that you want to query not just by dataName, but also by sensorId. Still, it seems like the major filter for the query is the dataName (you search for a few dozen at a time). Within that, you want to filter on some (potentiall
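A sketch of one CQL3 schema matching this advice, written with the Python driver; table, column and host names are invented here, so treat it as an assumption rather than the poster's actual model.

    from datetime import datetime
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('demo')

    # Partition per dataName; rows inside a partition are sorted by time.
    session.execute("""
        CREATE TABLE IF NOT EXISTS sensor_data (
            data_name  text,
            event_time timestamp,
            sensor_id  text,
            value      double,
            PRIMARY KEY (data_name, event_time, sensor_id)
        )
    """)

    # One contiguous read per dataName and time slice...
    rows = session.execute(
        "SELECT event_time, sensor_id, value FROM sensor_data "
        "WHERE data_name = %s AND event_time >= %s AND event_time < %s",
        ('temperature', datetime(2014, 5, 1), datetime(2014, 6, 1)))

    # ...with the secondary sensorId filter applied client-side in this sketch.
    wanted = {'sensor-17', 'sensor-42'}
    matches = [r for r in rows if r.sensor_id in wanted]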

RE: Performance migrating from MySQL to C*

2014-05-28 Thread moshe.kranc
Just looking at the data modeling issue: Your queries seem to always be for a single dataName. So, that should be the main part of the row key. Within that, it seems you need to be able to select a range based on time. So, time should be the primary sort key for the column name. Based on those

RE: column with TTL of 10 seconds lives very long...

2013-05-23 Thread moshe.kranc
(Probably will not solve your problem, but worth mentioning): It's not enough to check that the clocks of all the servers are synchronized - I believe that the client node sets the timestamp for a record being written. So, you should also check the clocks on your Hector client nodes. From: T
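A hedged illustration of the client-timestamp point, written against the CQL interface rather than Hector (so an assumption, not the poster's setup); table and column names are invented.

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('demo')

    # WRITETIME() exposes the timestamp that was actually stored for the
    # column, which makes a skewed client clock easy to spot next to the
    # server's own time.
    row = session.execute(
        "SELECT WRITETIME(col_name) AS wt, TTL(col_name) AS ttl "
        "FROM col_family WHERE key = %s", ('some-key',)).one()
    print("stored write timestamp (microseconds):", row.wt)
    print("remaining TTL (seconds):", row.ttl)

    # Supplying the timestamp explicitly takes the client clock out of the
    # picture (the value is microseconds since the epoch):
    session.execute(
        "INSERT INTO col_family (key, col_name) VALUES (%s, %s) "
        "USING TTL 10 AND TIMESTAMP 1400000000000000",
        ('some-key', 'some-value'))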

RE: column with TTL of 10 seconds lives very long...

2013-05-23 Thread moshe.kranc
Maybe you didn't set the TTL correctly. Check the TTL of the column using CQL, e.g.: SELECT TTL(colName) FROM colFamilyName WHERE ; From: Felipe Sere [mailto:felipe.s...@1und1.de] Sent: Thursday, May 23, 2013 1:28 PM To: user@cassandra.apache.org Subject: AW: column with TTL of 10 seconds lives v
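Spelled out with concrete (invented) names, the suggested check might look like this from the Python driver; the 10-second TTL write is included only to show where the TTL gets set in the first place.

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('demo')

    # Write a column with a 10-second TTL...
    session.execute(
        "INSERT INTO col_family (key, col_name) VALUES (%s, %s) USING TTL 10",
        ('some-key', 'some-value'))

    # ...then confirm the TTL really took, as suggested above.
    row = session.execute(
        "SELECT TTL(col_name) FROM col_family WHERE key = %s",
        ('some-key',)).one()
    print("remaining TTL in seconds:", row[0])  # None means no TTL was set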

RE: Repair session failed

2013-05-01 Thread moshe.kranc
Sounds like a job for "nodetool scrub", which rewrites the SStable rows in the correct order. After the scrub, nodetool repair should succeed. From: Haithem Jarraya [mailto:haithem.jarr...@struq.com] Sent: Wednesday, May 01, 2013 5:46 PM To: user@cassandra.apache.org Subject: Repair session faile

RE: Exception when setting tokens for the cassandra nodes

2013-04-29 Thread moshe.kranc
For starters: If you are using the Murmur3 partitioner, which is the default in cassandra.yaml, then you need to calculate the tokens using: python -c 'print [str(((2**64 / 2) * i) - 2**63) for i in range(2)]' which gives the following values: ['-9223372036854775808', '0'] From: Rahul [mailto:r
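The same calculation generalized to an arbitrary cluster size (integer division written as // so it also runs under Python 3); this is just the formula above wrapped in a function.

    def murmur3_tokens(num_nodes):
        """Evenly spaced initial_token values for the Murmur3 partitioner,
        whose token range is [-2**63, 2**63 - 1]."""
        return [str((2**64 // num_nodes) * i - 2**63) for i in range(num_nodes)]

    print(murmur3_tokens(2))  # ['-9223372036854775808', '0']
    print(murmur3_tokens(4))  # adds '-4611686018427387904' and '4611686018427387904'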

RE: Secondary Index on table with a lot of data crashes Cassandra

2013-04-25 Thread moshe.kranc
IMHO: user_name is not a column, it is the row key. Therefore, according to http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ , the row does not contain a relevant column index, which causes the iterator to read each column (including value) of each row. I believe that instead of refer

RE: Advice on memory warning

2013-04-21 Thread moshe.kranc
My experience (running C* 1.2.2): 1. I also observe that this occurs during compaction. 2. I have never yet seen a node recover from this state. Once it starts complaining about heap, it starts a death spiral, i.e., futile attempts to fix the situation. Eventually the node starts running GC fo

RE: CorruptedBlockException

2013-04-11 Thread moshe.kranc
I have formulated the following theory regarding C* 1.2.2 which may be relevant: Whenever there is a disk error during compaction of an SSTable (e.g., bad block, out of disk space), that SSTable's files stick around forever after, and do not subsequently get deleted by normal compaction (minor

RE: Thrift message length exceeded

2013-04-10 Thread moshe.kranc
I also saw this when upgrading from C* 1.0 to 1.2.2, and from Hector 0.6 to 0.8. Turns out the Thrift message really was too long. The mystery to me: Why no complaints in previous versions? Were some checks added in Thrift or Hector? -Original Message- From: Lanny Ripple [mailto:la...@spot

RE: CQL vs. non-CQL data models

2013-04-02 Thread moshe.kranc
column family created using CQL is not visible via Cassandra CLI [default@test1] list employees1; employees1 not found in current keyspace. CQL3 automatically down-cases all unquoted identifiers. You need to enclose any name with mixed case in quotes - see http://stackoverflow
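A small, hedged demonstration of that quoting rule through the Python driver (keyspace and table names are made up):

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('test1')

    # Unquoted identifiers are folded to lower case, so this table is stored
    # as employees1 despite the mixed-case spelling in the statement:
    session.execute("CREATE TABLE Employees1 (id int PRIMARY KEY, name text)")

    # Double quotes preserve case and create a different, case-sensitive name:
    session.execute('CREATE TABLE "Employees2" (id int PRIMARY KEY, name text)')

    # Every later reference to the quoted table must keep the quotes:
    session.execute('SELECT * FROM "Employees2"')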

RE: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10

2013-03-19 Thread moshe.kranc
This obscure feature of Cassandra is called "haunted handoff". Happy (early) April Fools :) From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Monday, March 18, 2013 7:45 PM To: user@cassandra.apache.org Subject: Re: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10 As you see,

RE: Secondary Indexes

2013-03-17 Thread moshe.kranc
I do not think this is a good use case for Cassandra alone, assuming the queries can be any combination of the 18 columns. I would consider using some combination of Cassandra and Solr, where Solr provides the indexing/search, and Cassandra provides the bulk store. From: Andy Stec [mailto:andys.

RE: About the heap

2013-03-13 Thread moshe.kranc
Peaks may be occurring during compaction, when SSTable files are memory-mapped. If so: Upgrading to C* 1.2 may bring some relief: You can trigger minor compaction on an individual SSTable file when the percentage of tombstones in that SSTable crosses a user-defined threshold. (Aaron, can you confirm?
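For reference, the 1.2-era knob being described is the tombstone_threshold compaction subproperty; a hedged example of setting it via the Python driver (keyspace/table names invented, and 0.2 is simply the documented default):

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('demo')

    # Ask Cassandra to single-SSTable-compact any SSTable whose estimated
    # droppable-tombstone ratio exceeds 20%.
    session.execute("""
        ALTER TABLE events
        WITH compaction = {
            'class': 'SizeTieredCompactionStrategy',
            'tombstone_threshold': '0.2'
        }
    """)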

RE: data model to store large volume syslog

2013-03-07 Thread moshe.kranc
A row key based on hour will create hot spots for writes - for an entire hour, all the writes will be going to the same node, i.e., the node where the row resides. You need to come up with a row key that distributes writes evenly across all your C* nodes, e.g., time concatenated with a sequence cou
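A bucketed-key sketch of that suggestion (the bucket count and key layout are arbitrary choices here, not something from the thread):

    import random

    NUM_BUCKETS = 16  # at least as many buckets as nodes, tuned to taste

    def write_key(hour):
        """Row key for a new syslog record: the hour plus a random bucket,
        so one hour's writes land on several partitions/nodes."""
        return "%s:%02d" % (hour, random.randrange(NUM_BUCKETS))

    def read_keys(hour):
        """All row keys that must be queried to reassemble that hour."""
        return ["%s:%02d" % (hour, b) for b in range(NUM_BUCKETS)]

    print(write_key("2013030712"))   # e.g. '2013030712:07'
    print(read_keys("2013030712"))   # the 16 keys covering the whole hour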