On 14.11.2012 13:33, Dave Angel wrote:
> The birthday paradox could have been important had the OP stated his goal
> differently. What he said was:
>
> """Ideally I would want to avoid collisions altogether. But if that means
> significant extra CPU time then 1 collision in 10 million hashes would be tolerable."""
On 11/14/2012 06:29 AM, Johannes Bauer wrote:
>
> When doing these calculations, it's important to keep the birthday
> paradox in mind (this is kind of counter-intuitive): The chance of a
> collision rises tremendously when we're looking for *any* arbitrary
> two hashes colliding within a cert
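To make that concrete, here is a rough back-of-envelope in Python. The function name and the approximation 1 - exp(-n*(n-1)/2**(bits+1)) are my own illustration, not from the thread:

    import math

    def any_collision_probability(n, bits):
        # Birthday-paradox approximation: probability that at least two of
        # n uniformly random `bits`-bit hashes collide.
        # expm1 keeps tiny probabilities from rounding to zero.
        space = 2.0 ** bits
        return -math.expm1(-n * (n - 1) / (2.0 * space))

    for bits in (24, 32, 64, 128):
        print(bits, any_collision_probability(10 * 1000 * 1000, bits))

With ten million hashes, a 32-bit hash is virtually guaranteed to collide somewhere, while 64 bits gets the chance of any collision down to a few in a million.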
On 14.11.2012 02:39, Roy Smith wrote:
> The next step is to reduce the number of bits you are encoding. You
> said in another post that "1 collision in 10 million hashes would be
> tolerable". So you need:
>
> >>> math.log(10*1000*1000, 2)
> 23.25349666421154
>
> 24 bits worth of key.
Nope
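A quick illustration of why 24 bits is not enough (my own arithmetic, under the assumption that "1 collision in 10 million hashes" means roughly one collision in total across ten million stored URLs, which is what the birthday paradox penalises):

    import math

    n = 10 * 1000 * 1000                    # ten million URLs
    print(math.log(n, 2))                   # ~23.25: bits merely to enumerate them
    # Expected number of colliding pairs is about n*(n-1) / 2**(bits+1),
    # so keeping that near one collision overall needs about:
    print(math.log(n * (n - 1) / 2.0, 2))   # ~45.5: call it 46 bits

Anything much smaller than that and collisions become the norm rather than the exception.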
thanks for the perspective!
On 14.11.2012 01:41, Richard Baron Penman wrote:
> I found the MD5 and SHA hashes slow to calculate.
Slow? For URLs? Are you kidding? How many URLs per second do you want to
calculate?
> The builtin hash is fast but I was concerned about collisions. What
> rate of collisions could I expect?
MD5
yeah good point - I have gone with md5 for now.
On Wednesday, November 14, 2012 3:06:18 PM UTC+11, Chris Angelico wrote:
> On Wed, Nov 14, 2012 at 2:25 PM, Richard wrote:
>
> > So the use case - I'm storing webpages on disk and want a quick retrieval
> > system based on URL.
>
> > I can't store the files in a single directory because of OS limitations so
> > have been using a sub folder structure.
On Wed, Nov 14, 2012 at 2:25 PM, Richard wrote:
> So the use case - I'm storing webpages on disk and want a quick retrieval
> system based on URL.
> I can't store the files in a single directory because of OS limitations so
> have been using a sub folder structure.
> For example to store data at URL "abc": a/b/c/index.html
thanks for the pointer to Varnish.
I found MongoDB had a lot of size overhead, so it ended up using 4x the size of
the data stored.
In article <1ce88f36-bfc7-4a55-89f8-70d1645d2...@googlegroups.com>,
Richard wrote:
> So the use case - I'm storing webpages on disk and want a quick retrieval
> system based on URL.
> I can't store the files in a single directory because of OS limitations so
> have been using a sub folder structure.
> The next step is to reduce the number of bits you are encoding. You
> said in another post that "1 collision in 10 million hashes would be
> tolerable". So you need:
>
> >>> math.log(10*1000*1000, 2)
> 23.25349666421154
I think a difficulty would be finding a hash algorithm
So the use case - I'm storing webpages on disk and want a quick retrieval
system based on URL.
I can't store the files in a single directory because of OS limitations so have
been using a sub folder structure.
For example to store data at URL "abc": a/b/c/index.html
This data is also viewed locally
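For comparison with the per-character a/b/c scheme above, a hash-based layout along the lines discussed elsewhere in the thread might look like this. The helper name, folder depth and ".html" suffix are my own guesses, not Richard's actual code:

    import hashlib
    import os

    def cache_path(url, root="cache", levels=2, width=2):
        # Derive a stable, filesystem-safe path from the URL's md5 hex
        # digest, using the first few hex pairs as sub-folders so that no
        # single directory grows too large.
        digest = hashlib.md5(url.encode("utf-8")).hexdigest()
        parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
        return os.path.join(root, *parts, digest + ".html")

    print(cache_path("docs.python.org/library/uuid.html"))
    # -> cache/<two hex chars>/<two hex chars>/<32 hex chars>.html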
I am dealing with URLs rather than integers.
In article <0692e6a2-343c-4eb0-be57-fe5c815ef...@googlegroups.com>,
Richard wrote:
> Hello,
>
> I want to create a URL-safe unique ID for URLs.
> Currently I use:
> url_id = base64.urlsafe_b64encode(url)
>
> >>> base64.urlsafe_b64encode('docs.python.org/library/uuid.html')
> 'ZG9jcy5weXRob24ub3JnL2xpYnJhcnkvdXVpZC5odG1s'
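To see the trade-off being weighed here, something like the following contrasts the reversible base64 ID with an md5-based one (my sketch, written for Python 3, where the inputs must be bytes):

    import base64
    import hashlib

    url = "docs.python.org/library/uuid.html"

    reversible_id = base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")
    hashed_id = hashlib.md5(url.encode("utf-8")).hexdigest()

    print(len(reversible_id), reversible_id)   # grows with the URL, but decodes back to it
    print(len(hashed_id), hashed_id)           # always 32 hex chars, but one-way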
I found md5/sha 4-5 times slower than the builtin hash, and base64 a lot slower.
There is no database, or else I would just use its ID.
On Wednesday, November 14, 2012 11:59:55 AM UTC+11, Christian Heimes wrote:
> Am 14.11.2012 01:41, schrieb Richard Baron Penman:
>
> > I found the MD5 and SHA hashes slow to calculate.
Am 14.11.2012 01:50, schrieb Richard:
> These URL IDs would just be used internally for quick lookups, not exposed
> publicly in a web application.
>
> Ideally I would want to avoid collisions altogether. But if that means
> significant extra CPU time then 1 collision in 10 million hashes would be tolerable.
Am 14.11.2012 01:41, schrieb Richard Baron Penman:
> I found the MD5 and SHA hashes slow to calculate.
> The builtin hash is fast but I was concerned about collisions. What
> rate of collisions could I expect?
Seriously? It takes about 1-5 msec to sha1() one MB of data on a modern
CPU, 1.5 on my box.
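Anyone wanting to reproduce that figure on their own machine can do so with a couple of timeit calls like these (my snippet; the exact numbers will of course vary by CPU and Python version):

    import hashlib
    import timeit

    one_mb = b"x" * (1024 * 1024)
    url = b"docs.python.org/library/uuid.html"

    # Average seconds per call for hashing 1 MB, then for hashing one short URL.
    print(timeit.timeit(lambda: hashlib.sha1(one_mb).digest(), number=100) / 100)
    print(timeit.timeit(lambda: hashlib.sha1(url).digest(), number=100000) / 100000)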
These URL IDs would just be used internally for quick lookups, not exposed
publicly in a web application.
Ideally I would want to avoid collisions altogether. But if that means
significant extra CPU time then 1 collision in 10 million hashes would be
tolerable.
Am 14.11.2012 01:26, schrieb Chris Kaynor:
> One option would be using a hash. Python's built-in hash, a 32-bit
> CRC, 128-bit MD5, 256-bit SHA or one of the many others that exist,
> depending on the needs. Higher bit counts will reduce the odds of
> accidental collisions; cryptographically secure ones if outside attacks matter.
I found the MD5 and SHA hashes slow to calculate.
The builtin hash is fast but I was concerned about collisions. What
rate of collisions could I expect?
Outside attacks are not an issue, and multiple processes would be used.
On Wed, Nov 14, 2012 at 11:26 AM, Chris Kaynor wrote:
> One option would be using a hash.
One option would be using a hash. Python's built-in hash, a 32-bit
CRC, 128-bit MD5, 256-bit SHA or one of the many others that exist,
depending on the needs. Higher bit counts will reduce the odds of
accidental collisions; cryptographically secure ones if outside
attacks matter. In such a case, yo
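A quick side-by-side of the options Chris lists, roughly as one might try them (the example URL and comments are mine, not from the thread):

    import hashlib
    import zlib

    url = "docs.python.org/library/uuid.html"
    data = url.encode("utf-8")

    print(hash(url))                         # built-in hash
    print(zlib.crc32(data) & 0xffffffff)     # 32-bit CRC (masked for consistency
                                             # across Python versions)
    print(hashlib.md5(data).hexdigest())     # 128-bit MD5
    print(hashlib.sha256(data).hexdigest())  # 256-bit SHA-2

(One caveat worth noting for the "multiple processes" requirement above: on Python 3.3+ the built-in str hash is randomised per process, so it is only usable as a cross-process key if PYTHONHASHSEED is pinned.)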
Good point - one-way encoding would be fine.
Also, this is performed millions of times, so ideally it should be efficient.
On Wednesday, November 14, 2012 10:34:03 AM UTC+11, John Gordon wrote:
> In <0692e6a2-343c-4eb0-be57-fe5c815ef...@googlegroups.com> Richard
> writes:
>
> > I want to create a URL-safe unique ID for URLs.
In <0692e6a2-343c-4eb0-be57-fe5c815ef...@googlegroups.com> Richard
writes:
> I want to create a URL-safe unique ID for URLs.
> Currently I use:
> url_id = base64.urlsafe_b64encode(url)
> >>> base64.urlsafe_b64encode('docs.python.org/library/uuid.html')
> 'ZG9jcy5weXRob24ub3JnL2xpYnJhcnkvdXVpZC5odG1s'