Re: Contributing cassandra-diff

Ahmed Eljami Thu, 22 Aug 2019 02:57:47 -0700

Great addition! Thanks Marcus.

+1 for cassandra-compare, as Jeremy suggested.


We could also consider other features, such as:

- Comparing only the row counts of two tables. In some cases, that is
enough to confirm that our copy is OK.

- Diffing only a subset of partitions. This avoids comparing the full
data set when volumes are large and a sample of the data is enough to be
confident in our copy.
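
As a rough illustration of these two ideas, here is a minimal Python sketch. It assumes each table has already been loaded into an in-memory dict mapping a partition key to its list of rows; the names counts_match and diff_sample are hypothetical and are not part of cassandra-diff.

```python
import random
from typing import Dict, List, Optional, Tuple

# Assumption: a table is modelled as partition key -> rows in clustering order.
Row = Tuple
Table = Dict[str, List[Row]]


def counts_match(source: Table, target: Table) -> bool:
    """Cheap check: same number of partitions and same total number of rows."""
    same_partitions = len(source) == len(target)
    same_rows = (sum(len(rows) for rows in source.values())
                 == sum(len(rows) for rows in target.values()))
    return same_partitions and same_rows


def diff_sample(source: Table, target: Table, sample_size: int,
                seed: Optional[int] = None) -> List[str]:
    """Compare a random subset of partitions; return the keys that differ."""
    rng = random.Random(seed)
    # Sample from the union of keys so a partition missing on one side is seen.
    keys = sorted(source.keys() | target.keys())
    sampled = rng.sample(keys, min(sample_size, len(keys)))
    return sorted(k for k in sampled if source.get(k) != target.get(k))


source = {"a": [(1, "x")], "b": [(2, "y"), (3, "z")], "c": [(4, "w")]}
target = {"a": [(1, "x")], "b": [(2, "y"), (3, "q")], "c": [(4, "w")]}
print(counts_match(source, target))
print(diff_sample(source, target, sample_size=3, seed=1))
```

In this example the counts agree even though one row differs, so a count check alone can only say that nothing is obviously missing; the sampled diff is what catches a content mismatch, and only in the partitions that happen to be sampled.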

Thanks

On Thu, 22 Aug 2019 at 09:49, Jeremy Hanna <jeremy.hanna1...@gmail.com>
wrote:

> It’s great to contribute such a tool. The change between 2.x and 3.0
> brought a translation layer from Thrift to CQL that is hard to validate on
> real clusters without something like this. Thank you.
>
> As for naming, perhaps cassandra-compare might be clearer as diff is an
> overloaded word but that’s a bikeshed sort of argument.
>
> > On Aug 22, 2019, at 12:32 AM, Vinay Chella <vinaykumar...@gmail.com> wrote:
> >
> > This is a great addition to our Cassandra validation framework/tools. I
> > can see many teams in the community benefiting from tooling like this.
> >
> > I like the idea of the generic repo (repos/asf/cassandra-contrib.git or
> > *whatever the name is*) for tools like this, for two main reasons:
> >
> >   1. It is easily accessible, reachable, and searchable
> >   2. It makes it easier for the Cassandra community to contribute
> >
> >
> >
> > Thanks,
> > Vinay Chella
> >
> >
> >> On Wed, Aug 21, 2019 at 11:39 PM Marcus Eriksson <marc...@apache.org> wrote:
> >>
> >> Hi, we are about to open source our tooling for comparing two Cassandra
> >> clusters and want to get some feedback on where to push it. I think the
> >> options are: (name bike-shedding welcome)
> >>
> >> 1. create repos/asf/cassandra-diff.git
> >> 2. create a generic repos/asf/cassandra-contrib.git where we can add
> >> more contributed tools in the future
> >>
> >> Temporary location: https://github.com/krummas/cassandra-diff
> >>
> >> Cassandra-diff is a Spark job that compares the data in two clusters: it
> >> pages through all partitions and reads all rows for those partitions in
> >> both clusters to make sure they are identical. Based on the configuration
> >> variable “reverse_read_probability”, the rows are read in either forward
> >> or reverse order.
> >>
> >> Our main use case for cassandra-diff has been to set up two identical
> >> clusters, transfer a snapshot from the cluster we want to test to these
> >> clusters and upgrade one side. When that is done we run this tool to make
> >> sure that 2.1 and 3.0 give the same results. A few examples of the bugs
> >> we have found using this tool:
> >>
> >> * CASSANDRA-14823: Legacy sstables with range tombstones spanning
> >> multiple index blocks create invalid bound sequences on 3.0+
> >> * CASSANDRA-14803: Rows that cross index block boundaries can cause
> >> incomplete reverse reads in some cases
> >> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse
> >> iteration of indexed partitions
> >>
> >> /Marcus
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>
> >>

-- 
Regards,

Ahmed ELJAMI
