Whenever I have run into similar requirements from legislation or standards, 
the fact that SELECT no longer returns the data has been enough for every 
auditor I have worked with.
Otherwise you get into screwy requirements such as zeroing out all unused 
sectors on your disks to actually remove the data, making sure nothing has 
the drive sectors cached somewhere, and other such things.

-Jeremiah

> On Feb 9, 2018, at 1:54 PM, Jon Haddad <j...@jonhaddad.com> wrote:
> 
> A layer violation?  Seriously?  Technical solutions exist to solve business 
> problems and I’m 100% fine with introducing the former to solve the latter.
> 
> Look, if the goal is to purge information out of the DB as quickly as 
> possible from a lot of accounts, the fastest way to do it is to hijack the 
> fact that you’re constantly rewriting data through compaction and (ab)use it. 
>  It avoids the overhead of tombstones, and can be implemented in a way that 
> allows you to perform a single write / edit a text file / some other 
> trivial system and immediately start removing customer data.  It’s an 
> incredibly efficient way of bulk removing customer data.  
> 
> The wording around "The Right To Be Forgotten” is a little vague [1], and I 
> don’t know if "the right to be forgotten entitles the data subject to have 
> the data controller erase his/her personal data” means that tombstones are 
> OK.  If you tombstone some row using TWCS, it will literally *never* be 
> deleted off disk, as opposed to using DeletingCompactionStrategy where it 
> could easily be removed without leaving data lying around in SSTables.  I’ve 
> done this already for this *exact* use case and know it works and works very 
> well.
> 
> The debate around what is the “correct” way to solve the problem is a 
> dogmatic one and I don’t have any interest in pursuing it any further.  I’ve 
> simply offered a solution that I know works because I’ve done it, which is 
> what the OP asked for.
> 
> [1] https://www.eugdpr.org/key-changes.html
> 
>> On Feb 9, 2018, at 10:33 AM, Dor Laor <d...@scylladb.com> wrote:
>> 
>> I think you're introducing a layer violation. GDPR is a business requirement 
>> and compaction is an implementation detail. 
>> 
>> IMHO it's enough to delete the partition using regular CQL.
>> It's true that it won't be deleted immediately but it will be eventually 
>> deleted (welcome to eventual consistency ;).
>> 
>> Even with user-defined compaction, compaction may not run instantly, 
>> repair will be required, and there are other nodes in the cluster, maybe 
>> partitioned nodes with the data. There is also data in snapshots and 
>> backups.
>> 
>> The business idea is to delete the data in a fast, reasonable time for 
>> humans: first make it unreachable, and later delete it completely. 
>> 
>>> On Fri, Feb 9, 2018 at 8:51 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>> That might be fine for a one off but is totally impractical at scale or 
>>> when using TWCS. 
>>>> On Fri, Feb 9, 2018 at 8:39 AM DuyHai Doan <doanduy...@gmail.com> wrote:
>>>> Or use the new user-defined compaction option recently introduced, 
>>>> provided you can determine over which SSTables a partition is spread
>>>> 
>>>>> On Fri, Feb 9, 2018 at 5:23 PM, Jon Haddad <j...@jonhaddad.com> wrote:
>>>>> Give this a read through:
>>>>> 
>>>>> https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy
>>>>> 
>>>>> Basically you write your own logic for how stuff gets forgotten, then you 
>>>>> can recompact every sstable with upgradesstables -a.  
>>>>> 
>>>>> Jon
>>>>> 
>>>>> 
>>>>>> On Feb 9, 2018, at 8:10 AM, Nicolas Guyomar <nicolas.guyo...@gmail.com> 
>>>>>> wrote:
>>>>>> 
>>>>>> Hi everyone,
>>>>>> 
>>>>>> Because of GDPR we really face the need to support “Right to Be 
>>>>>> Forgotten” requests => https://gdpr-info.eu/art-17-gdpr/  stating that 
>>>>>> "the controller shall have the obligation to erase personal data without 
>>>>>> undue delay"
>>>>>> 
>>>>>> Because I usually meet customers that do not have that many clients, 
>>>>>> modeling one partition per client is almost always possible, easing 
>>>>>> deletion by partition key.
>>>>>> 
>>>>>> Then, apart from triggering a manual compaction on impacted tables 
>>>>>> using STCS, I do not see how I can be GDPR compliant.
>>>>>> 
>>>>>> I'm kind of surprised not to find any thread on that matter on the ML, 
>>>>>> do you guys have any modeling strategy that would make it easier to get 
>>>>>> rid of data? 
>>>>>> 
>>>>>> Thank you for any advice
>>>>>> 
>>>>>> Nicolas
>>>>> 
>>>> 
>> 
> 
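
For readers following the thread, the compaction-time purge Jon describes can 
be sketched roughly as below. This is only a toy in-memory model under stated 
assumptions, not the DeletingCompactionStrategy API: Row, compact, and 
purge_keys are hypothetical names, and real SSTables, tombstones, snapshots, 
and other replicas are not modeled. It shows only the core idea that rows on 
a purge list are simply not carried over when data is rewritten.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Row:
    """Hypothetical stand-in for one stored row."""
    partition_key: str
    value: str


def compact(sstables: Iterable[List[Row]], purge_keys: Set[str]) -> List[Row]:
    """Rewrite the input tables into one, dropping rows on the purge list.

    No tombstone is written: a purged row is just never copied to the output.
    """
    merged: List[Row] = []
    for sstable in sstables:
        for row in sstable:
            if row.partition_key in purge_keys:
                continue  # the "forgotten" customer is skipped during the rewrite
            merged.append(row)
    # Keep the output ordered by partition key, as a rewritten table would be.
    return sorted(merged, key=lambda r: r.partition_key)


if __name__ == "__main__":
    tables = [
        [Row("alice", "a1"), Row("bob", "b1")],
        [Row("alice", "a2"), Row("carol", "c1")],
    ]
    survivors = compact(tables, purge_keys={"alice"})
    print([r.partition_key for r in survivors])
```

Dor's caveats still apply to a sketch like this: the rewrite only affects the 
tables it is run over, so copies on other nodes, in snapshots, and in backups 
would have to be handled separately.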
