Well you could use a DOUBLE value to encode relative positions...
first item in list gets key 1.0
insert before first item -> key[first]/2.0;
append after last item -> key[last]*2.0;
insert after non-final item -> (key[n]+key[n+1])/2.0
Using double precision should give you quite a space to fit i
Hi David,
yes, what we are working on could be referenced as "encrypted database
service".
Thanks for your insights. We will continue to work on this topic!
Kind regards
Matthias
On 10/21/2011 02:31 AM, David Jeske wrote:
If I understand you correctly, you are saying that you will never have
>
>
> 2) If a single key, would adding a file/block/record-level encryption to
>> Cassandra solve this problem? If not, why not? Is there something
>> special about your encryption methods?
>>
>
> There is nothing special about our encryption methods but will never be
> able to encrypt or decrypt
David,
thanks for sharing these ideas. We basically implemented it by using
longs and dividing the long-namespace into multiple parts on each
insert. As you already described, the biggest problem with this approach
is, that we are not able to simply invoke get(x) because the indexes of
the lis
Hi David,
encrypting and Decrypting data in cassandra is not an option to us.
However, I read your elaboration on best practices for adding a custom
feature to cassandra with a lot of interest. Thank you!
Please see below for the answers to your questions.
Kind regards
Matthias
On 10/18/2011
On Tue, Oct 18, 2011 at 12:14 AM, Matthias Pfau wrote:
> we want to sort completely on the client-side (where the data is
> encrypted). But that requires an "insert at offset X" operation. We would
> always use CL QUORUM and client side synchronisation.
>
You can do "insert at offset X"... just
Aaron,
we want to sort completely on the client-side (where the data is
encrypted). But that requires an "insert at offset X" operation. We
would always use CL QUORUM and client side synchronisation.
However, it seems to be not be a good idea to add such a feature to
cassandra as everyone usi
On Mon, Oct 17, 2011 at 2:39 AM, Matthias Pfau wrote:
> We would be very happy if cassandra would give us an option to maintain the
> sort order on our own (application logic). That is why it would be
> interesting to hear from any of the developers if it would be easily
> possible to add such a
Sort order is determined by the Comparator, which is an implementation of the
o.a.c.db.marshal.AbstractType class.
If you wish to order column (names) in a row based on an opaque (to cassandra)
byte value you can create your own implementation. You would then need to
decrypt and compare colum
Thanks for that hint! However, it seems like soundex is a very language
specific algorithm (US English). We have to get into this topic further...
Kind regards
Matthias
On 10/13/2011 10:43 PM, Stephen Connolly wrote:
Then just use a soundex function on the first word in the text... that
will s
David,
thanks for your nice summary on this topic.
We would be very happy if cassandra would give us an option to maintain
the sort order on our own (application logic). That is why it would be
interesting to hear from any of the developers if it would be easily
possible to add such a feature
Logically, whether you use cassandra or not, there is some "physics" of
sorted order structures which you should understand and dictate what is
possible.
In order to keep data sorted, a database needs to be able to see the proper
sort-order of the data "all the time" not just at insertion or query
Then just use a soundex function on the first word in the text... that
will shrink it sufficiently and give nice buckets in near sequential
order (http://en.wikipedia.org/wiki/Soundex)
On 13 October 2011 21:21, Matthias Pfau wrote:
> Hi Stephen,
> we are hashing the first 8 byte (8 US-ASCII chara
Hi Stephen,
we are hashing the first 8 byte (8 US-ASCII characters) of text that has
been written by humans. Wouldn't it be easy for the attacker to do a
dictionary attack on this text, especially if he knows the language of
the text?
Kind regards
Matthias
On 10/13/2011 08:20 PM, Stephen Con
in theory, however they have less than 32 bits of entropy from which they
can do that, leaving them with at least 32 more bits of combinations to
try... that's 2 billion or so... must be a big dictionary
- Stephen
---
Sent from my Android phone, so random spelling mistakes, random nonsense
words
Hi Stephen,
this sounds very reasonable. But wouldn't this enable an attacker to
execute dictionary attacks in order to "decrypt" the first 8 bytes of
the plain text?
Kind regards
Matthias
On 10/13/2011 05:03 PM, Stephen Connolly wrote:
It wouldn't be unencrypted... which is the point
you u
Hi Zach,
thanks for your additional input. You are absolutely right: The long
namespace should be big enough. We are going to insert up to 2^32 values
into the list.
We only need support for get(index), insert(index) and remove(index)
while get and insert will be used very often. Remove is al
It wouldn't be unencrypted... which is the point
you use a one way linear hash function to take the first, say 8 bytes,
of unencrypted data and turn it into 4 bytes of a sort prefix.
You've used lost half the data in the process, so effectively each bit
is an OR of two bits and you can only infer
Matthias,
Answers below.
On Thu, Oct 13, 2011 at 9:03 AM, Matthias Pfau wrote:
> Hi Zach,
> thanks for that good idea. Unfortunately, our list needs to be rewritten
> often because our data is far away from being evenly distributed.
This shouldn't be a problem if you use long's. If you were to
Hi Zach,
thanks for that good idea. Unfortunately, our list needs to be rewritten
often because our data is far away from being evenly distributed.
However, we could get this under control but there is a more severe
problem: Random access is very hard to implement on a structure with
undefine
Matthias,
This is an interesting problem.
I would consider using long's as the column type, where your column
names are evenly distributed longs in sort order when you first write
your list out. So if you have items A and C with the long column
names 1000 and 2000, and then you have to insert B,
Hi Stephen,
this is a great idea but unfortunately doesn't work for us either as we
can not store the data in an unencrypted form.
Kind regards
Matthias
On 10/12/2011 07:42 PM, Stephen Connolly wrote:
could you prefix the data with 3-4 bytes of a linear hash of the
unencypted data? it wouldn'
could you prefix the data with 3-4 bytes of a linear hash of the unencypted
data? it wouldn't be a perfect sort, but you'd have less of a range to query
to get the sorted values?
- Stephen
---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a
Unfortunately, that is not an option as we have to store the data in an
compressed and encrypted and therefore binary and non-sortable form.
On 10/12/2011 06:39 PM, David McNelis wrote:
Is it an option to not convert the data to binary prior to inserting
into Cassandra? Also, how large are the
Is it an option to not convert the data to binary prior to inserting into
Cassandra? Also, how large are the strings you're sorting? If its viable
to not convert to binary before writing to Cassandra, and you use one of the
string based column ordering techniques (utf8, ascii, for example), then
Hi there,
we are currently building a prototype based on cassandra and came into
problems on implementing sorted lists containing millions of items.
The special thing about the items of our lists is, that cassandra is not
able to sort them as the data is stored in a binary format which is not
26 matches
Mail list logo