Should we use Materialised Views or ditch them ?

2020-02-28 Thread Tobias Eriksson
Hi,

A debate has surfaced in my company about whether to keep or remove
Materialized Views (MVs).

The DataStax FAQ says sure thing, go ahead and use them:
https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/faqMV.html
But know the limitations:
https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/knownLimitationsMV.html
and the best practices:
https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/bestPracticesMV.html

What is the community's take on using MVs in production?

-Tobias



Deleting Compaction Strategy for Cassandra 3.0?

2020-02-28 Thread Oleksandr Shulgin
Hi,

We have a task where we would need to remove roughly 25% of the data from
the SSTables that shouldn't be there anymore.

The rows to be removed can be identified by the first component of their
partition key, e.g. remove all rows where type='A' or type='B'.  The 2nd
component can be anything and we don't have a way to enumerate all of
them.  Even if we could, there are millions of records to remove and we
would like to avoid creating any tombstones.  Our total storage for this
cluster is close to 100 TiB.

We found that DCS (Deleting Compaction Strategy) can be a perfect fit:
https://github.com/protectwise/cassandra-util#deleting-compaction-strategy

However, the version over at GitHub only compiles against Cassandra 2.1.
We've seen that TLP has a fork with a branch that can compile with version
2.2: https://github.com/thelastpickle/cassandra-util/tree/cassandra-2.2

Has anybody tried to get it running with 3.0?  It looks like some classes
were moved around again since 2.2 and compilation fails due to missing
symbols...

Cheers,
--
Alex


Re: Should we use Materialised Views or ditch them ?

2020-02-28 Thread Max C.
The general view of the community is that you should *NOT* use them in 
production, due to multiple serious outstanding issues (see Jira).  We used 
them quite a bit when they first came out and have since rolled back all uses 
except for the absolute most basic cases (ex:  a table with 30K rows that isn’t 
updated).  If we were to do it over, we would not use them at all.

- Max




Re: Should we use Materialised Views or ditch them ?

2020-02-28 Thread Jon Haddad
I also recommend avoiding them.  I've seen too many clusters fall over as a
result of their usage.



Re: Should we use Materialised Views or ditch them ?

2020-02-28 Thread Tobias Eriksson
It is interesting, because when they arrived we started using them, but then
some posts surfaced about issues, and even DataStax seemed to indicate that you
shouldn’t use them.
But lately, judging from the DataStax articles linked in my original post, that
no longer seems to be the case.
I wonder if significant work has been done to improve or even solve the
problems.
-Tobias




Re: Should we use Materialised Views or ditch them ?

2020-02-28 Thread Erick Ramirez
Personally, I think MVs are still experimental and not ready for prime time.
They work for some, but if you run into issues, fixing them has a huge
impact on your application. For example, if the view updates get too far
behind, there's no effective way to resolve that other than dropping the MV
and adding it back so it gets rebuilt from scratch, which means that the
view is not available to your application for some time.

There are some improvements coming in 4.0 (unreleased), but you've got to
test them at scale before deploying to production. For now, I'd stay away
from MVs. YMMV. Cheers!

GOT QUESTIONS? Apache Cassandra experts from the community and DataStax
have answers! Share your expertise on https://community.datastax.com/.



Re: Deleting Compaction Strategy for Cassandra 3.0?

2020-02-28 Thread Erick Ramirez
I'm not personally aware of anyone who is using it successfully other than
ProtectWise, where it was a good fit for their narrow use case. My limited
knowledge of it is that it has some sharp edges, which is the reason they
haven't pushed for it to be added to Cassandra (that's second-hand info, so
please don't quote me).

In any case, my initial reaction is that Spark would *probably* be a better
fit for what you're trying to achieve, particularly since you don't know the
PK up front. Cheers!



Re: Deleting Compaction Strategy for Cassandra 3.0?

2020-02-28 Thread Oleksandr Shulgin
On Fri, 28 Feb 2020, 23:02 Erick Ramirez wrote:

> I'm not personally aware of anyone who is using it successfully other
> than ProtectWise where it was a good fit for their narrow use case. My
> limited knowledge of it is that it has some sharp edges which is the reason
> they haven't pushed for it to be added to Cassandra (that's second hand
> info so please don't quote me).
>
> In any case, my initial reaction is that Spark would *probably* be a
> better fit for what you're trying to achieve particularly since you don't
> know the PK up front. Cheers!
>

I should have added that this is a one-time task: we do not intend to run
it constantly.

And we are using TWCS, so after issuing millions of deletes with Spark (or
by any other means) we would have a hard time compacting the tombstones.

--
Alex



Re: Deleting Compaction Strategy for Cassandra 3.0?

2020-02-28 Thread Jeff Jirsa
If you’re really, really advanced you MIGHT be able to use Spark +
CQLSSTableWriter to create a ton of SSTables with just tombstones in them
representing the deletes, then either nodetool refresh or sstableloader them
into the cluster.

If you create the SSTables on the right timestamp boundaries to match your TWCS
windows, each one will compact with the data files of the same window and delete
the data.

It will be a ton of compaction though, and not as efficient as the deleting
strategy. Also, I’m not sure whether the offline CQLSSTableWriter actually
supports deletes, because I’m on my phone and too lazy to check. If it doesn’t,
it probably wouldn’t be that hard to add.
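
The window-alignment idea above can be sketched as follows. This is a minimal
Python sketch, not Cassandra code: it assumes TWCS buckets data by rounding the
write timestamp down to a multiple of the window length (compaction_window_unit
times compaction_window_size) counted from the Unix epoch, which should be
checked against the Cassandra version in use. The function names are
hypothetical, and timestamps are taken in milliseconds for simplicity.

```python
# Seconds per TWCS compaction_window_unit (assumed set of supported units).
UNIT_SECONDS = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86400}


def twcs_window_bounds(timestamp_ms, unit="DAYS", size=1):
    """Return (first_ms, last_ms), both inclusive, of the window holding timestamp_ms.

    Assumption: windows are aligned to the Unix epoch, so the lower bound is
    the timestamp rounded down to a multiple of the window length.
    """
    window_ms = UNIT_SECONDS[unit] * size * 1000
    first_ms = timestamp_ms - (timestamp_ms % window_ms)
    return first_ms, first_ms + window_ms - 1


def group_by_window(timestamps_ms, unit="DAYS", size=1):
    """Group write timestamps by the start of the window they fall into.

    Each group would correspond to one batch of tombstones (one SSTable or
    set of SSTables) that should end up in the same TWCS window as the data.
    """
    groups = {}
    for ts in timestamps_ms:
        first_ms, _ = twcs_window_bounds(ts, unit, size)
        groups.setdefault(first_ms, []).append(ts)
    return groups


if __name__ == "__main__":
    # 2020-02-28 12:00:00 UTC with one-day windows.
    print(twcs_window_bounds(1582891200000, "DAYS", 1))
```

A tombstone only shadows cells whose write timestamp is not newer than its own,
so one plausible choice is to stamp the tombstones for a window with that
window's last timestamp; whether that is appropriate depends on the data model
and on how late-arriving writes are handled.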

