Hi Stephen,
we are hashing the first 8 byte (8 US-ASCII characters) of text that has
been written by humans. Wouldn't it be easy for the attacker to do a
dictionary attack on this text, especially if he knows the language of
the text?
Kind regards
Matthias
On 10/13/2011 08:20 PM, Stephen Connolly wrote:
in theory, however they have less than 32 bits of entropy from which
they can do that, leaving them with at least 32 more bits of
combinations to try... that's 2 billion or so... must be a big dictionary
- Stephen
---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on
the screen
On 13 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de <mailto:p...@l3s.de>>
wrote:
Hi Stephen,
this sounds very reasonable. But wouldn't this enable an attacker to
execute dictionary attacks in order to "decrypt" the first 8 bytes
of the plain text?
Kind regards
Matthias
On 10/13/2011 05:03 PM, Stephen Connolly wrote:
It wouldn't be unencrypted... which is the point
you use a one way linear hash function to take the first, say 8
bytes,
of unencrypted data and turn it into 4 bytes of a sort prefix.
You've used lost half the data in the process, so effectively
each bit
is an OR of two bits and you can only infer from 0 values... so data
is still encrypted, but you have an approximate sorting.
For example, if your data is US-ASCII text with no numbers, you
could
use Soundex to get the pre-key, so that worst case you have a bucket
of values in the range.
Using this technique, a random get will have to get the values
at the
desired prefix +/- a small amount rather than the whole row...
on the
client side you can then decrypt the data and sort that small bucket
to get the correct index position.
You could do a 1 byte prefix, but that only gives you at best 256
buckets and assumes that the first 2 bytes are uniformly
distributed... you've said your data is not uniformly
distributed, so
a linear hash function sounds like your best bet.
your hash function should have the property that hash(A)>=
hash(B) if
and only if A>= B
On 13 October 2011 08:47, Matthias Pfau<p...@l3s.de
<mailto:p...@l3s.de>> wrote:
Hi Stephen,
this is a great idea but unfortunately doesn't work for us
either as we can
not store the data in an unencrypted form.
Kind regards
Matthias
On 10/12/2011 07:42 PM, Stephen Connolly wrote:
could you prefix the data with 3-4 bytes of a linear
hash of the
unencypted data? it wouldn't be a perfect sort, but
you'd have less of a
range to query to get the sorted values?
- Stephen
---
Sent from my Android phone, so random spelling mistakes,
random nonsense
words and other nonsense are a direct result of using
swype to type on
the screen
On 12 Oct 2011 17:57, "Matthias Pfau"<p...@l3s.de
<mailto:p...@l3s.de><mailto:pfau@__l3s.de
<mailto:p...@l3s.de>>>
wrote:
Unfortunately, that is not an option as we have to
store the data in
an compressed and encrypted and therefore binary and
non-sortable form.
On 10/12/2011 06:39 PM, David McNelis wrote:
Is it an option to not convert the data to
binary prior to
inserting
into Cassandra? Also, how large are the strings
you're sorting?
If its
viable to not convert to binary before writing
to Cassandra, and
you use
one of the string based column ordering
techniques (utf8, ascii,
for
example), then the data would be sorted without
you needing to
specifically worry about that. Of course, if
the strings are
lengthy
you could run into additional issues.
On Wed, Oct 12, 2011 at 11:34 AM, Matthias
Pfau<p...@l3s.de <mailto:p...@l3s.de>
<mailto:p...@l3s.de <mailto:p...@l3s.de>>
<mailto:p...@l3s.de
<mailto:p...@l3s.de><mailto:pfa...@l3s.de
<mailto:p...@l3s.de>>>> wrote:
Hi there,
we are currently building a prototype based
on cassandra and
came
into problems on implementing sorted lists
containing
millions of items.
The special thing about the items of our
lists is, that
cassandra is
not able to sort them as the data is stored
in a binary
format which
is not sortable. However, we are able to
sort the data
before the
plain data gets encoded (our application is
responsible for
the order).
First Approach: Storing Lists in ColumnFamilies
***
We first tried to map the list to a single
row of a
ColumnFamily in
a way that the index of the list is mapped
to the column
names and
the items of the list to the column values.
The column names
are
increasing numbers which define the sort order.
This has the major drawback that big parts
of the list have
to be
rewritten on inserts (because the column
names are numbered
by their
index), which are quite common.
Second Approach: Storing the whole List as
Binary Data:
***
We tried to store the compressed list in a
single column.
However,
this is only feasible for smaller lists. Our
lists are far
to big
leading to multi megabyte reads and writes.
As we need to
read and
update the lists quite often, this would put
our Cassandra
cluster
under a lot of pressure.
Ideal Solution: Native support for storing lists
***
We would be very happy with a way to store a
list of sorted
values
without making improper use of column names
for the list
index. This
implies that we would need a possibility to
insert values at
defined
positions. We know that this could lead to
problems with
concurrent
inserts in a distributed environment, but
this is handled by
our
application logic.
What are your ideas on that?
Thanks
Matthias
--
*David McNelis*
Lead Software Engineer
Agentis Energy
www.agentisenergy.com
<http://www.agentisenergy.com><http://__www.agentisenergy.com
<http://www.agentisenergy.com>>
<http://www.agentisenergy.com>
c: 219.384.5143
<tel:219.384.5143><tel:219.384.5143 <tel:219.384.5143>>
/A Smart Grid technology company focused on
helping consumers of
energy
control an often under-managed resource./