in theory, however they have less than 32 bits of entropy from which they can do that, leaving them with at least 32 more bits of combinations to try... that's 2 billion or so... must be a big dictionary
- Stephen --- Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using swype to type on the screen On 13 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote: > Hi Stephen, > this sounds very reasonable. But wouldn't this enable an attacker to > execute dictionary attacks in order to "decrypt" the first 8 bytes of the > plain text? > > Kind regards > Matthias > > On 10/13/2011 05:03 PM, Stephen Connolly wrote: > >> It wouldn't be unencrypted... which is the point >> >> you use a one way linear hash function to take the first, say 8 bytes, >> of unencrypted data and turn it into 4 bytes of a sort prefix. >> >> You've used lost half the data in the process, so effectively each bit >> is an OR of two bits and you can only infer from 0 values... so data >> is still encrypted, but you have an approximate sorting. >> >> For example, if your data is US-ASCII text with no numbers, you could >> use Soundex to get the pre-key, so that worst case you have a bucket >> of values in the range. >> >> Using this technique, a random get will have to get the values at the >> desired prefix +/- a small amount rather than the whole row... on the >> client side you can then decrypt the data and sort that small bucket >> to get the correct index position. >> >> You could do a 1 byte prefix, but that only gives you at best 256 >> buckets and assumes that the first 2 bytes are uniformly >> distributed... you've said your data is not uniformly distributed, so >> a linear hash function sounds like your best bet. >> >> your hash function should have the property that hash(A)>= hash(B) if >> and only if A>= B >> >> On 13 October 2011 08:47, Matthias Pfau<p...@l3s.de> wrote: >> >>> Hi Stephen, >>> this is a great idea but unfortunately doesn't work for us either as we >>> can >>> not store the data in an unencrypted form. >>> >>> Kind regards >>> Matthias >>> >>> On 10/12/2011 07:42 PM, Stephen Connolly wrote: >>> >>>> >>>> could you prefix the data with 3-4 bytes of a linear hash of the >>>> unencypted data? it wouldn't be a perfect sort, but you'd have less of a >>>> range to query to get the sorted values? >>>> >>>> - Stephen >>>> >>>> --- >>>> Sent from my Android phone, so random spelling mistakes, random nonsense >>>> words and other nonsense are a direct result of using swype to type on >>>> the screen >>>> >>>> On 12 Oct 2011 17:57, "Matthias >>>> Pfau"<p...@l3s.de<mailto:pfau@**l3s.de<p...@l3s.de> >>>> >> >>>> wrote: >>>> >>>> Unfortunately, that is not an option as we have to store the data in >>>> an compressed and encrypted and therefore binary and non-sortable >>>> form. >>>> >>>> On 10/12/2011 06:39 PM, David McNelis wrote: >>>> >>>> Is it an option to not convert the data to binary prior to >>>> inserting >>>> into Cassandra? Also, how large are the strings you're sorting? >>>> If its >>>> viable to not convert to binary before writing to Cassandra, and >>>> you use >>>> one of the string based column ordering techniques (utf8, ascii, >>>> for >>>> example), then the data would be sorted without you needing to >>>> specifically worry about that. Of course, if the strings are >>>> lengthy >>>> you could run into additional issues. >>>> >>>> On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau<p...@l3s.de >>>> <mailto:p...@l3s.de> >>>> <mailto:p...@l3s.de<mailto:pfa**u...@l3s.de <p...@l3s.de>>>> >>>> wrote: >>>> >>>> Hi there, >>>> we are currently building a prototype based on cassandra and >>>> came >>>> into problems on implementing sorted lists containing >>>> millions of items. >>>> >>>> The special thing about the items of our lists is, that >>>> cassandra is >>>> not able to sort them as the data is stored in a binary >>>> format which >>>> is not sortable. However, we are able to sort the data >>>> before the >>>> plain data gets encoded (our application is responsible for >>>> the order). >>>> >>>> First Approach: Storing Lists in ColumnFamilies >>>> *** >>>> We first tried to map the list to a single row of a >>>> ColumnFamily in >>>> a way that the index of the list is mapped to the column >>>> names and >>>> the items of the list to the column values. The column names >>>> are >>>> increasing numbers which define the sort order. >>>> This has the major drawback that big parts of the list have >>>> to be >>>> rewritten on inserts (because the column names are numbered >>>> by their >>>> index), which are quite common. >>>> >>>> >>>> Second Approach: Storing the whole List as Binary Data: >>>> *** >>>> We tried to store the compressed list in a single column. >>>> However, >>>> this is only feasible for smaller lists. Our lists are far >>>> to big >>>> leading to multi megabyte reads and writes. As we need to >>>> read and >>>> update the lists quite often, this would put our Cassandra >>>> cluster >>>> under a lot of pressure. >>>> >>>> Ideal Solution: Native support for storing lists >>>> *** >>>> We would be very happy with a way to store a list of sorted >>>> values >>>> without making improper use of column names for the list >>>> index. This >>>> implies that we would need a possibility to insert values at >>>> defined >>>> positions. We know that this could lead to problems with >>>> concurrent >>>> inserts in a distributed environment, but this is handled by >>>> our >>>> application logic. >>>> >>>> >>>> What are your ideas on that? >>>> >>>> Thanks >>>> Matthias >>>> >>>> >>>> >>>> >>>> -- >>>> *David McNelis* >>>> Lead Software Engineer >>>> Agentis Energy >>>> >>>> www.agentisenergy.com<http://**www.agentisenergy.com<http://www.agentisenergy.com> >>>> > >>>> <http://www.agentisenergy.com> >>>> c: 219.384.5143<tel:219.384.5143> >>>> >>>> /A Smart Grid technology company focused on helping consumers of >>>> energy >>>> control an often under-managed resource./ >>>> >>>> >>>> >>>> >>> >>> >