Hi Alain,

The requirements are impossible to meet as stated: you are expected to have predictable, deterministic tests while you also need "recent data" (at most one week old). Reason: you cannot have a replicable result set when the data changes on a weekly basis.

To obtain replicable test results, I recommend the following:

a) Pin the 'data' expectation to a point in time that is a known quantity.
b) Load some data into your cluster and take a snapshot. Reload this snapshot before every test for consistent results.

Hope this helps.

Jan/C* Architect
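The snapshot-and-reload step Jan describes could be scripted roughly as below. This is a minimal sketch, assuming Python is used to drive the CI: `nodetool snapshot` and `nodetool refresh` are real Cassandra commands, but the keyspace, table, tag names, and the `run` wrapper are illustrative only.

```python
import subprocess

def snapshot_cmd(keyspace: str, tag: str) -> list:
    """Build the nodetool command that snapshots a keyspace under a tag."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]

def refresh_cmd(keyspace: str, table: str) -> list:
    """Build the nodetool command that loads copied-back SSTables for a table."""
    return ["nodetool", "refresh", keyspace, table]

def run(cmd: list) -> None:
    # Hypothetical wrapper: in CI this would execute inside the C* container.
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Once, after loading the known-good fixture data: take a snapshot.
    print(snapshot_cmd("my_keyspace", "ci_baseline"))
    # Before each test: copy the snapshot's SSTable files back into the
    # data directory, then tell Cassandra to pick them up.
    print(refresh_cmd("my_keyspace", "my_table"))
```

(Copying the snapshot files back into place between the two commands is left out here; it is a plain file copy from the `snapshots/` directory.)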
On Monday, January 26, 2015 10:43 AM, Eric Stevens <migh...@gmail.com> wrote:

I don't have directly relevant advice, especially WRT getting a meaningful and coherent subset of your production data - that's probably too closely coupled with your business logic. Perhaps you can run a testing cluster with a default TTL of ~2 weeks on all your tables, feeding it with real production data, so that you have a rolling current snapshot of production.

We use this basic strategy to support integration tests with the rest of our platform. We have a data access service, with other internal teams acting as customers of that data. But it's hard to write strong tests against this, because it becomes challenging to predict the values you should expect to get back without rewriting the business logic directly into your tests (and then what exactly are you testing - are you testing your tests?).

Our data interaction layer tests all focus on inserting the data under test immediately before the assertions portion of the given test. We use Specs2 as a testing framework, which gives us access to a very nice "eventually { ... }" syntax that retries the assertions portion several times with a backoff. This accounts for the eventually consistent nature of Cassandra and reduces the number of false failures, without resorting to operations that slow down test execution, like sleeping before asserting.

Basically, our data access layer unit tests are strict and rely only on synthetic data (assert that the response is exact for every value), while integration tests from other systems use much softer assertions against real data (more like: is there data, and does that data seem to be in the right format and for the right time range?).

On Mon, Jan 26, 2015 at 3:26 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

Hi guys,

We currently use a CI with tests based on Docker containers. We have a C* service "dockerized".
Yet we have an issue, since we would like two things that are hard to achieve at once:

- A fixed data set, to have predictable and deterministic tests (that we can repeat at any time with the same result)
- A recent data set, to perform smoke testing on services that need "recent data" (at most one week old)

As our dataset is very big and data is not sorted by date in SSTables, it is hard to produce a coherent extract of the production data. Has anyone here achieved something like this?

For "static" data, we could write queries by hand, but I find it more relevant to have a real production extract. Regarding dynamic data, we need a process that we could repeat every day / week to update the data, and something light enough to keep container startup fast.

How do you guys do this kind of thing? FWIW, we are migrating to 2.0.11 very soon, so solutions may use 2.0 features. Any idea is welcome, and if you need more info, please ask.

C*heers,

Alain
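As an aside, the retry-with-backoff idea behind the Specs2 "eventually { ... }" syntax Eric mentions above can be sketched in a few lines. This is a rough Python illustration of the pattern, not Specs2's actual API; the function name, retry count, and backoff parameters are assumptions.

```python
import time

def eventually(assertion, retries: int = 5,
               initial_delay: float = 0.1, factor: float = 2.0):
    """Retry a zero-argument assertion callable with exponential backoff.

    Against an eventually consistent store, instead of sleeping a fixed
    time before asserting (which slows every test), retry until the
    assertion passes or the attempts are exhausted.
    """
    delay = initial_delay
    for attempt in range(retries):
        try:
            return assertion()
        except AssertionError:
            if attempt == retries - 1:
                raise  # exhausted: surface the real failure
            time.sleep(delay)
            delay *= factor
```

A test would insert its data, then wrap its assertions in `eventually(...)`, so a read that briefly lags the write retries instead of failing outright.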