Hi Alain,
The two requirements conflict: you are expected to have predictable, deterministic tests while you also need "recent data" (at most 1 week old). The reason: you cannot have a replicable result set when the data changes on a weekly basis.
To obtain replicable test results, I recommend the following:
a) Pin the data expectation to a known point in time.
b) Load some data into your cluster and take a snapshot. Reload this snapshot before every test for consistent results.
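For the record, the snapshot/reload cycle in b) maps onto standard nodetool operations. A minimal sketch, assuming a keyspace name, snapshot tag, and data directory that are placeholders (not taken from this thread), and the Cassandra 2.0 on-disk layout:

```shell
# Placeholder names; adjust to your installation.
KEYSPACE=test_ks
DATA_DIR=/var/lib/cassandra/data

# One-off: load the fixed data set, then take a named snapshot.
nodetool snapshot -t baseline "$KEYSPACE"

# Before each test run: clear live data (e.g. TRUNCATE the tables via
# cqlsh), copy the snapshot files back, and tell Cassandra to load them.
for table_dir in "$DATA_DIR/$KEYSPACE"/*/; do
  table=$(basename "$table_dir")
  cp "$table_dir"snapshots/baseline/* "$table_dir"
  nodetool refresh "$KEYSPACE" "$table"
done
```

Since the cluster in question is dockerized, an even simpler variant is to bake the loaded data directory into the image and start a fresh container per test run.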
Hope this helps.
Jan / C* Architect

     On Monday, January 26, 2015 10:43 AM, Eric Stevens <migh...@gmail.com> 
wrote:

 I don't have directly relevant advice, especially WRT getting a meaningful and 
coherent subset of your production data - that's probably too closely coupled 
with your business logic.  Perhaps you can run a testing cluster with a default 
TTL on all your tables of ~2 weeks, feeding it with real production data so 
that you have a rolling current snapshot of production.
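Alain mentions below that they are migrating to 2.0.11, and a table-level default TTL is a 2.0 feature, so the rolling-window idea above can be expressed directly in CQL. A sketch with placeholder keyspace and table names:

```sql
-- Placeholder names; 1209600 seconds = 14 days. Rows older than
-- ~2 weeks expire on their own, keeping the test cluster a rolling
-- window of recent production data.
ALTER TABLE my_keyspace.events WITH default_time_to_live = 1209600;
```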
We use this basic strategy to support integration tests with the rest of our platform. We have a data access service, with other internal teams acting as customers of that data. But it's hard to write strong tests against this, because it becomes challenging to predict the values you should expect to get back without rewriting the business logic directly into your tests (and then what exactly are you testing? Are you testing your tests?)
But our data interaction layer tests all focus on inserting the data under test immediately before the assertion portion of the given test. We use Specs2 as a testing framework, which gives us access to a very nice "eventually { ... }" syntax that retries the assertion portion several times with a backoff. This lets us account for the eventually consistent nature of Cassandra and reduce false failures without resorting to operations that slow down test execution, like sleeping before asserting.
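The "eventually" pattern Eric describes is not specific to Specs2. A minimal sketch of the same retry-with-backoff idea, in Python for illustration (the function name, timings, and the fake read below are assumptions, not the author's actual test code):

```python
import time

def eventually(assertion, retries=5, initial_delay=0.1, backoff=2.0):
    """Retry an assertion with exponential backoff, to absorb
    eventually-consistent reads. `assertion` is a zero-argument
    callable that raises AssertionError until the data is visible."""
    delay = initial_delay
    for attempt in range(retries):
        try:
            return assertion()
        except AssertionError:
            if attempt == retries - 1:
                raise  # out of retries: surface the real failure
            time.sleep(delay)
            delay *= backoff

# Usage sketch: a read that lags the write for a short while,
# mimicking Cassandra's eventual consistency.
written_at = time.monotonic()

def read_back():
    # Placeholder for a real Cassandra read; the "row" only becomes
    # visible 0.2 s after the write.
    assert time.monotonic() - written_at > 0.2, "row not visible yet"
    return "row"

assert eventually(read_back) == "row"
```

The key point is that the total wait adapts to how slow the cluster is on a given run, instead of a fixed sleep that is either too short (flaky) or too long (slow suite).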
Basically, our data access layer unit tests are strict and rely only on synthetic data (assert that the response is exact for every value), while integration tests from other systems use much softer assertions against real data (more like: is there data, and does that data seem to be in the right format and for the right time range?).
On Mon, Jan 26, 2015 at 3:26 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

Hi guys,
We currently use a CI with tests based on Docker containers.
We have a C* service "dockerized". Yet we have an issue, since we would like two things that are hard to achieve together:
- A fixed data set, to have predictable and deterministic tests (that we can repeat at any time with the same result)
- A recent data set, to perform smoke testing on services that need "recent data" (at most 1 week old)
As our data set is very big and data is not sorted by date in SSTables, it is hard to get a coherent extract of the production data. Has any of you achieved something like this?
For "static" data we could write queries by hand, but I find it more relevant to have a real production extract. Regarding dynamic data, we need a process that we could repeat every day / week to update the data, and something light enough to keep container startup fast.
How do you handle this kind of thing?
FWIW, we are migrating to 2.0.11 very soon, so solutions may use 2.0 features.
Any idea is welcome and if you need more info, please ask.
C*heers,
Alain
