> * In the design discussed it is perfectly reasonable for data not to be on the archive node.
>
> You mean when having the 2 DC setup I mentioned and using TTL? If I have the 2 DC setup but don't use TTL, I don't understand why data wouldn't be on the archive node.

Originally you were talking about taking the archive node down and then having HH replay the stored hints to it when it comes back. HH is not considered a reliable mechanism for achieving consistency; it is better in 1.0, but repair is AFAIK still considered the way to achieve consistency. For example, HH only collects hints for a down node for 1 hour. Also, a read operation will check consistency and may repair it; snapshots do not do that.
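To make the read repair point concrete, here is a minimal sketch assuming the Thrift-era pycassa client; the keyspace, column family, row key and host names are placeholders rather than anything from this thread:

    # Hypothetical sketch with pycassa (the Thrift client commonly used with
    # Cassandra 0.8/1.0). All names below are illustrative assumptions.
    from pycassa import ConnectionPool, ColumnFamily, ConsistencyLevel

    pool = ConnectionPool('monitoring', server_list=['node1:9160', 'node3:9160'])
    data_cf = ColumnFamily(pool, 'data')

    # A QUORUM read makes the coordinator compare replicas and write the newest
    # version back where they disagree (read repair). Taking a snapshot performs
    # no such comparison, so an unrepaired archive node can simply be missing data.
    row = data_cf.get('some_row_key',
                      read_consistency_level=ConsistencyLevel.QUORUM)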
Finally, if you write into the DC with 2 nodes at a CL other than QUORUM or EACH_QUORUM, there is no guarantee that the write will be committed in the other DC.

> So what data format should I use for historical archiving?

Plain text files, with documentation, so that anyone who follows you can work with the data.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/01/2012, at 12:31 AM, Alexandru Sicoe wrote:

> Hi,
>
> On Wed, Jan 4, 2012 at 9:54 PM, aaron morton <aa...@thelastpickle.com> wrote:
> Some thoughts on the plan:
>
> * You are monkeying around with things; do not be surprised when surprising things happen.
>
> I am just trying to explore different solutions for solving my problem.
>
> * Deliberately unbalancing the cluster may lead to Bad Things happening.
>
> I will take your advice on this. I would have liked to have an extra node so as to have 2 nodes in each DC.
>
> * In the design discussed it is perfectly reasonable for data not to be on the archive node.
>
> You mean when having the 2 DC setup I mentioned and using TTL? If I have the 2 DC setup but don't use TTL, I don't understand why data wouldn't be on the archive node.
>
> * Truncate is a cluster-wide operation and all nodes must be online before it will start.
> * Truncate will snapshot before deleting data; you could use this snapshot.
> * TTL for a column applies to that column no matter which node it is on.
>
> Thanks for clarifying these!
>
> * IMHO Cassandra data files (sstables or JSON dumps) are not a good format for a historical archive, nothing against Cassandra. You need the lowest common format.
>
> So what data format should I use for historical archiving?
>
>
> If you have the resources for a second cluster, could you put the two together and just have one cluster with a very large retention policy? One cluster is easier than two.
>
> I am constrained to have limited retention on the Cassandra cluster that is collecting the data. Once I archive the data for long term storage, I cannot bring it back into the same Cassandra cluster that collected it in the first place, because that cluster is in an enclosed network with strict rules. I have to load it into another cluster outside the enclosed network. It's not that I have the resources for a second cluster; I am forced to use a second cluster.
>
>
> Assuming there is no business case for this, consider either:
>
> * Dumping the historical data into a Hadoop (with or without HDFS) cluster with high compression. If needed you could then run Hive / Pig to fill a companion Cassandra cluster with data on demand. Or just query using Hadoop.
> * Dumping the historical data to files with high compression and a roll-your-own solution to fill a cluster.
>
> Ok, thanks for these suggestions, I will have to investigate further.
>
> Also consider talking to DataStax about DSE.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/01/2012, at 1:41 AM, Alexandru Sicoe wrote:
>
>
> Cheers,
> Alex
>> Hi,
>>
>> On Tue, Jan 3, 2012 at 8:19 PM, aaron morton <aa...@thelastpickle.com> wrote:
>> Running a time based rolling window of data can be done using TTL. Backing up the nodes for disaster recovery can be done using snapshots. Restoring any point in time will be tricky because it may restore columns where the TTL has expired.
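A minimal sketch of the TTL-based rolling window just described, again assuming pycassa; the keyspace, CF, host and retention period are illustrative assumptions:

    # Hypothetical sketch: every column is written with a TTL, so the online
    # cluster keeps only a rolling window and expired data is dropped during
    # compaction. Names and the retention period are placeholders.
    import time
    from pycassa import ConnectionPool, ColumnFamily

    RETENTION_SECONDS = 14 * 24 * 3600   # e.g. a two-week online window

    pool = ConnectionPool('monitoring', server_list=['node1:9160'])
    data_cf = ColumnFamily(pool, 'data')

    def store_sample(row_key, sample_ts, value):
        # The column expires RETENTION_SECONDS after it is written; a snapshot
        # taken before expiry is then the only durable copy, and restoring it
        # later can bring back columns whose TTL has since lapsed, as noted above.
        data_cf.insert(row_key, {str(sample_ts): str(value)}, ttl=RETENTION_SECONDS)

    store_sample('some_row_key', time.time(), 3.14)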
>>
>> Yeah, that's the thing... if I want to use the system as I explain further below, I cannot back up the data (for later restoration) if I'm using TTLs.
>>
>>
>>> Will I get a single copy of the data in the remote storage or will it be twice the data (data + replica)?
>> You will get RF copies of the data. (By the way, there is no original copy.)
>>
>> Well, if I organize the cluster as I mentioned in the first email, I will get one copy of each row at a certain point in time on node2 if I take it offline and perform a major compaction and GC, won't I? I don't want to send duplicated data to the mass storage!
>>
>>
>> Can you share a bit more about the use case? How much data and what sort of read patterns?
>>
>>
>> I have several applications that feed into Cassandra about 2 million different variables (each representing a different monitoring value/channel). The system receives updates for each of these monitoring values at different rates. For each new update, the timestamp and value are recorded in a Cassandra name-value pair. The Cassandra schema is built using one CF for data and 4 other CFs for metadata (the metadata CFs are static - they barely grow at all once they've been loaded). The data CF uses a row for each variable. Each row acts as a 4 hour time bin. I achieve this by creating the row key as a concatenation of the first 6 digits of the timestamp at which the data is inserted + the unique ID of the variable. After the time bin expires, a new row will be created for the same variable ID.
>>
>> The system can currently sustain the insertion load. Now I'm looking into organizing the flow of data out of the cluster and into retrieval performance for random queries:
>>
>> Why do I need to organize the data out? Well, my requirement is to keep all the data coming into the system at the highest granularity for the long term (several years). The 3 node cluster I mentioned is the online cluster, which is supposed to be able to absorb the input load for a relatively short period of time, a few weeks (I am constrained to do this). After this period the data has to be shipped out of the cluster into a mass storage facility and the cluster needs to be emptied to make room for more data. Also, the online cluster will serve reads while it takes in data. For older data I am planning to have another cluster that gets loaded with data from the storage facility on demand and will serve reads from there.
>>
>> Why random queries? There is no specific use case for them; that's why I want to rely only on the built-in Cassandra indexes for now. Generally the client will ask for sets of values within a time range up to 8-10 hours in the past. Apart from some sets of variables that will almost always be asked for together, any combination is possible, because this system will feed into a web dashboard used for debugging purposes - to correlate and aggregate streams of variables. Depending on the problem, different variable combinations could be investigated.
>>
>> Can you split the data stream into a permanent log record and also into Cassandra for a rolling window of queryable data?
>>
>> In the end, essentially that's what I've been meaning to do by organizing the cluster in a 2 DC setup: I wanted to have 2 nodes in DC1 taking the data and the reads (the rolling window) and replicating to the node in DC2 (the permanent log - a single copy of the data).
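A minimal sketch of the time-binned row key scheme described above, in plain Python; the ':' separator and the exact formatting are assumptions for illustration, not details given in the thread:

    # Illustrative reconstruction of the row key layout described above.
    import time

    def row_key(variable_id, ts=None):
        """Row key = first 6 digits of the epoch timestamp + the variable ID.

        With a 10-digit epoch-seconds timestamp the prefix changes roughly
        every 10,000 seconds, giving the coarse multi-hour time bins the
        thread describes; all samples for a variable land in the same row
        until the bin rolls over.
        """
        ts = int(ts if ts is not None else time.time())
        return '%s:%s' % (str(ts)[:6], variable_id)

    def column_for_sample(sample_ts, value):
        # Within a row, each update is a name-value pair keyed by its full
        # timestamp, so a time-range query becomes a column slice on one row.
        return {str(sample_ts): str(value)}

    now = time.time()
    key = row_key('variable_0042', now)
    cols = column_for_sample(now, 3.14)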
>> I was thinking of implementing the rolling window by emptying the nodes in DC1 using truncate, instead of what you propose now with the rolling window using TTL.
>>
>> Ok, so I can do what you are saying easily if Cassandra allows me to have a TTL only on the first copy of the data and have the second replica without a TTL. Is this possible? I think it would solve my problem, as long as I can back up and empty the node in DC2 before the TTLs expire on the other 2 nodes.
>>
>> Cheers,
>> Alex
>>
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 3/01/2012, at 11:41 PM, Alexandru Sicoe wrote:
>>
>>> Hi,
>>>
>>> I need to build a system that stores data for years, so yes, I am backing up data in another mass storage system from where it can be accessed later. The data that I successfully back up has to be deleted from my cluster to make space for new data coming in.
>>>
>>> I was aware of snapshotting, which I will use for getting the data out of node2: it creates hard links to the SSTables of a CF, and then I can copy the files pointed to by those hard links to another location. After that I get rid of the snapshot (the hard links) and then I can truncate my CFs. It's clear that snapshotting will give me a single copy of the data if I have a unique copy of the data on one node. It's not clear to me what happens if I have, let's say, a cluster with 3 nodes and RF=2 and I take a snapshot on every node and copy those snapshots to remote storage. Will I get a single copy of the data in the remote storage or will it be twice the data (data + replica)?
>>>
>>> I've started reading about TTL and I think I can use it, but it's not clear to me how it would work in conjunction with the snapshotting/backing up I need to do. I mean, it will impose a deadline by which I need to perform a backup in order not to miss any data. Also, I might duplicate data if some columns don't expire fully between 2 backups. Any clarifications on this?
>>>
>>> Cheers,
>>> Alex
>>>
>>> On Tue, Jan 3, 2012 at 9:44 AM, aaron morton <aa...@thelastpickle.com> wrote:
>>> That sounds a little complicated.
>>>
>>> Do you want to get the data out for an off-node backup, or is it for processing in another system?
>>>
>>> You may get by using:
>>>
>>> * TTL to expire data via compaction
>>> * snapshots for backups
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 3/01/2012, at 11:00 AM, Alexandru Sicoe wrote:
>>>
>>>> Hi everyone and Happy New Year!
>>>>
>>>> I need advice on organizing the flow of data out of my 3-node Cassandra 0.8.6 cluster. I am configuring my keyspace to use the NetworkTopologyStrategy. I have 2 data centers, each with a replication factor of 1 (i.e. DC1:1; DC2:1). The configuration of the PropertyFileSnitch is:
>>>>
>>>> ip_node1=DC1:RAC1
>>>> ip_node2=DC2:RAC1
>>>> ip_node3=DC1:RAC1
>>>>
>>>> I assign tokens like this:
>>>> node1 = 0
>>>> node2 = 1
>>>> node3 = 85070591730234615865843651857942052864
>>>>
>>>> My write consistency level is ANY.
>>>>
>>>> My data sources are only inserting data into node1 & node3. Essentially what happens is that a replica of every input value will end up on node2. Node2 thus has a copy of the entire data written to the cluster.
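For reference, a minimal sketch of creating a keyspace with this DC1:1 / DC2:1 layout, assuming pycassa's SystemManager; the keyspace name, CF name and host are placeholders:

    # Hypothetical sketch: a NetworkTopologyStrategy keyspace with one replica
    # per data centre, matching the layout described above. With only one
    # replica per DC, a write at CL ANY or ONE carries no guarantee that the
    # other DC has the data, which is the point made earlier in the thread.
    from pycassa.system_manager import SystemManager, NETWORK_TOPOLOGY_STRATEGY

    sys_mgr = SystemManager('node1:9160')
    sys_mgr.create_keyspace('monitoring',
                            replication_strategy=NETWORK_TOPOLOGY_STRATEGY,
                            strategy_options={'DC1': '1', 'DC2': '1'})
    sys_mgr.create_column_family('monitoring', 'data')
    sys_mgr.close()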
>>>> When Node2 starts getting full, I want to have a script which pulls it off-line and does a sequence of operations (compaction / snapshotting / exporting / truncating the CFs) in order to back up the data to a remote place and to free the node up so that it can take more data. When it comes back on-line it will take hints from the other 2 nodes.
>>>>
>>>> This is how I plan on shipping data out of my cluster without any downtime or any major performance penalty. The problem is that I also want to truncate the CFs on node1 & node3 to free them of data as well. I don't know whether I can do this without any downtime or without any serious performance penalties. Is anyone using truncate to free CFs of data? How efficient is it?
>>>>
>>>> Any observations or suggestions are much appreciated!
>>>>
>>>> Cheers,
>>>> Alex
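A rough sketch of the snapshot-and-archive cycle described above, assuming nodetool is on the PATH and pycassa is available for the truncate step; the host, keyspace, CF, tag and path names are placeholders, and both the nodetool argument syntax and the snapshot directory layout vary between Cassandra versions:

    # Hypothetical per-node archive cycle: flush, snapshot, copy the snapshot to
    # mass storage, clear it, then truncate. All names and paths are assumptions.
    import shutil
    import subprocess
    from pycassa import ConnectionPool, ColumnFamily

    NODE = 'node2'
    KEYSPACE = 'monitoring'
    CF = 'data'
    SNAPSHOT_TAG = 'archive_2012_01'
    DATA_DIR = '/var/lib/cassandra/data'
    ARCHIVE_DIR = '/mass_storage/cassandra_archive'

    def nodetool(*args):
        # Shell out to nodetool against the archive node.
        subprocess.check_call(['nodetool', '-h', NODE] + list(args))

    # 1. Flush memtables so the snapshot captures everything written so far.
    nodetool('flush', KEYSPACE)

    # 2. Snapshot: hard links to the current SSTables (cheap and fast). Older
    #    nodetool versions take the snapshot name positionally; newer ones use -t.
    nodetool('snapshot', '-t', SNAPSHOT_TAG, KEYSPACE)

    # 3. Copy the snapshot to mass storage. In 0.8-era layouts snapshots live
    #    under <data_dir>/<keyspace>/snapshots/<tag>; later versions nest them
    #    under each CF directory instead.
    snapshot_path = '%s/%s/snapshots/%s' % (DATA_DIR, KEYSPACE, SNAPSHOT_TAG)
    shutil.copytree(snapshot_path, '%s/%s' % (ARCHIVE_DIR, SNAPSHOT_TAG))

    # 4. Drop the local hard links once the copy is safely off the node.
    nodetool('clearsnapshot', KEYSPACE)

    # 5. Truncate the CF. Remember the caveats above: truncate is cluster-wide,
    #    needs every node online, and snapshots again before deleting.
    pool = ConnectionPool(KEYSPACE, server_list=['%s:9160' % NODE])
    ColumnFamily(pool, CF).truncate()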