> Adam Olsen (AO) wrote:
>AO> The Wayback Machine has 150 billion pages, so 2**37. Google's index
>AO> is a bit larger at over a trillion pages, so 2**40. A little closer
>AO> than I'd like, but that's still 56294995000 to 1 odds of having
>AO> *any* collisions between *any* of the file
On Fri, 17 Apr 2009 11:19:31 -0700, Adam Olsen wrote:
> Actually, *cryptographic* hashes handle that just fine. Even for files
> with just a 1 bit change the output is totally different. This is known
> as the Avalanche Effect. Otherwise they'd be vulnerable to attacks.
>
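The avalanche effect described above can be observed directly. The following is a minimal sketch, assuming SHA-256 from the standard `hashlib` module as the cryptographic hash (the thread discusses MD5 and SHA in general); the helper name `differing_bits` and the sample input are made up for illustration.

```python
import hashlib


def differing_bits(data_a, data_b):
    """Count how many bits differ between the SHA-256 digests of two inputs."""
    digest_a = hashlib.sha256(data_a).digest()
    digest_b = hashlib.sha256(data_b).digest()
    # XOR each pair of digest bytes and count the set bits in the result.
    return sum(bin(x ^ y).count("1") for x, y in zip(digest_a, digest_b))


# Hypothetical sample input; flip only the lowest bit of its first byte.
original = b"The quick brown fox"
flipped = bytes([original[0] ^ 0x01]) + original[1:]

print(differing_bits(original, flipped), "of 256 digest bits differ")
```

For a well-behaved hash, roughly half of the digest bits should change for a one-bit change in the input, although the exact count depends on the data.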
> Which isn't to say
Nigel Rantor wrote:
> Adam Olsen wrote:
>
>> The chance of *accidentally* producing a collision, although
>> technically possible, is so extraordinarily rare that it's completely
>> overshadowed by the risk of a hardware or software failure producing
>> an incorrect result.
>
> Not
On Apr 17, 9:59 am, SpreadTooThin wrote:
> You know this is just insane. I'd be satisfied with a CRC16 or
> something in the situation I'm in.
> I have two large files, one local and one remote. Transferring every
> byte across the internet to be sure that the two files are identical
> is just n
On Apr 17, 9:59 am, norseman wrote:
> The more complicated the math the harder it is to keep a higher form of
> math from checking (or improperly displacing) a lower one. Which, of
> course, breaks the rules. Commonly called improper thinking. A number
> of math teasers make use of that.
Of cou
On Apr 17, 5:30 am, Tim Wintle wrote:
> On Thu, 2009-04-16 at 21:44 -0700, Adam Olsen wrote:
> > The Wayback Machine has 150 billion pages, so 2**37. Google's index
> > is a bit larger at over a trillion pages, so 2**40. A little closer
> > than I'd like, but that's still 56294995000 to 1 od
On Apr 17, 4:54 am, Nigel Rantor wrote:
> Adam Olsen wrote:
> > On Apr 16, 11:15 am, SpreadTooThin wrote:
> >> And yes, he is right: CRCs and hashes all have some probability of
> >> saying that the files are identical when in fact they are not.
>
> > Here's the bottom line. It is either:
>
> > A) Sever
Adam Olsen wrote:
On Apr 16, 11:15 am, SpreadTooThin wrote:
And yes, he is right: CRCs and hashes all have some probability of
saying that the files are identical when in fact they are not.
Here's the bottom line. It is either:
A) Several hundred years of mathematics and cryptography are wrong.
The
On Thu, 2009-04-16 at 21:44 -0700, Adam Olsen wrote:
> The Wayback Machine has 150 billion pages, so 2**37. Google's index
> is a bit larger at over a trillion pages, so 2**40. A little closer
> than I'd like, but that's still 56294995000 to 1 odds of having
> *any* collisions between *any* o
Adam Olsen wrote:
On Apr 16, 4:27 pm, "Rhodri James" wrote:
On Thu, 16 Apr 2009 10:44:06 +0100, Adam Olsen wrote:
On Apr 16, 3:16 am, Nigel Rantor wrote:
Okay, before I tell you about the empirical, real-world evidence I have
could you please accept that hashes collide and that no matter ho
On Apr 16, 4:27 pm, "Rhodri James" wrote:
> On Thu, 16 Apr 2009 10:44:06 +0100, Adam Olsen wrote:
> > On Apr 16, 3:16 am, Nigel Rantor wrote:
> >> Okay, before I tell you about the empirical, real-world evidence I have
> >> could you please accept that hashes collide and that no matter how many
On Apr 16, 11:15 am, SpreadTooThin wrote:
> And yes, he is right: CRCs and hashes all have some probability of
> saying that the files are identical when in fact they are not.
Here's the bottom line. It is either:
A) Several hundred years of mathematics and cryptography are wrong.
The birthday problem
On Thu, 16 Apr 2009 10:44:06 +0100, Adam Olsen wrote:
On Apr 16, 3:16 am, Nigel Rantor wrote:
Okay, before I tell you about the empirical, real-world evidence I have
could you please accept that hashes collide and that no matter how many
samples you use the probability of finding two files th
On Apr 16, 8:59 am, Grant Edwards wrote:
> On 2009-04-16, Adam Olsen wrote:
> > I'm afraid you will need to back up your claims with real files.
> > Although MD5 is a smaller, older hash (128 bits, so you only need
> > 2**64 files to find collisions),
>
> You don't need quite that many to have a
On Apr 16, 3:16 am, Nigel Rantor wrote:
> Adam Olsen wrote:
> > On Apr 15, 12:56 pm, Nigel Rantor wrote:
> >> Adam Olsen wrote:
> >>> The chance of *accidentally* producing a collision, although
> >>> technically possible, is so extraordinarily rare that it's completely
> >>> overshadowed by the
On 2009-04-16, Adam Olsen wrote:
> The chance of *accidentally* producing a collision, although
> technically possible, is so extraordinarily rare that it's
> completely overshadowed by the risk of a hardware or software
> failure producing an incorrect result.
Not when
Adam Olsen wrote:
On Apr 16, 3:16 am, Nigel Rantor wrote:
Adam Olsen wrote:
On Apr 15, 12:56 pm, Nigel Rantor wrote:
Adam Olsen wrote:
The chance of *accidentally* producing a collision, although
technically possible, is so extraordinarily rare that it's completely
overshadowed by the risk
Adam Olsen wrote:
On Apr 15, 12:56 pm, Nigel Rantor wrote:
Adam Olsen wrote:
The chance of *accidentally* producing a collision, although
technically possible, is so extraordinarily rare that it's completely
overshadowed by the risk of a hardware or software failure producing
an incorrect resu
On Apr 15, 12:56 pm, Nigel Rantor wrote:
> Adam Olsen wrote:
> > The chance of *accidentally* producing a collision, although
> > technically possible, is so extraordinarily rare that it's completely
> > overshadowed by the risk of a hardware or software failure producing
> > an incorrect result.
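To put a number on how rare an accidental collision is, the usual birthday-problem approximation can be evaluated directly. This is a rough sketch under stated assumptions: the hash behaves like a uniformly random function, and the function name `collision_probability` is invented for this example. It uses p = 1 - exp(-n(n-1)/(2N)), where n is the number of files and N = 2**bits is the number of possible digests.

```python
import math


def collision_probability(n_files, digest_bits):
    """Approximate chance that any two of n_files share a digest.

    Assumes an idealized hash whose outputs are uniformly distributed
    over 2**digest_bits values (the birthday-problem approximation).
    """
    pairs = n_files * (n_files - 1) // 2   # number of distinct file pairs
    space = 2 ** digest_bits               # number of possible digests
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-pairs / space)


# 2**40 files with a 128-bit digest (the MD5 size mentioned in the thread).
print(collision_probability(2 ** 40, 128))
# 2**64 files with the same digest size, where collisions become likely.
print(collision_probability(2 ** 64, 128))
```

The estimate says nothing about deliberately constructed collisions, which is a separate question from the accidental case discussed here.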
Adam Olsen wrote:
The chance of *accidentally* producing a collision, although
technically possible, is so extraordinarily rare that it's completely
overshadowed by the risk of a hardware or software failure producing
an incorrect result.
Not when you're using them to compare lots of files.
Tr
On Apr 15, 11:04 am, Nigel Rantor wrote:
> The fact that two md5 hashes are equal does not mean that the sources
> they were generated from are equal. To do that you must still perform a
> byte-by-byte comparison which is much less work for the processor than
> generating an md5 or sha hash.
>
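A byte-by-byte comparison of two local files can be sketched in a few lines. This is only an illustration, not what `filecmp` does internally; the name `same_bytes` and the 64 KiB chunk size are arbitrary choices for the example. It stops at the first differing chunk, which a hash of each file cannot do.

```python
import os


def same_bytes(path_a, path_b, chunk_size=64 * 1024):
    """Return True if both files have identical contents."""
    # Files of different sizes cannot be equal, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                return False   # first mismatch: stop early
            if not chunk_a:
                return True    # both files ended without a mismatch
```

Usage would look like `same_bytes("a.bin", "b.bin")`, with both paths pointing at regular files on the local machine.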
> I
On Apr 15, 8:04 am, Grant Edwards wrote:
> On 2009-04-15, Martin wrote:
>
> > Hi,
>
> > On Mon, Apr 13, 2009 at 10:03 PM, Grant Edwards wrote:
> >> On 2009-04-13, SpreadTooThin wrote:
>
> >>> I want to compare two binary files and see if they are the same.
> >>> I see the filecmp.cmp functi
Grant Edwards wrote:
We all rail against premature optimization, but using a
checksum instead of a direct comparison is premature
unoptimization. ;)
And more than that, will provide false positives for some inputs.
So, basically it's a worse-than-useless approach for determining if two
files
Martin wrote:
On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano wrote:
The checksum does look at every byte in each file. Checksumming isn't a
way to avoid looking at each byte of the two files, it is a way of
mapping all the bytes to a single number.
My understanding of the original question
On 2009-04-15, Martin wrote:
> On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano
> I'd still say rather burn CPU cycles than development hours (if I got
> the question right),
_Hours_? Calling the file compare module takes _one_line_of_code_.
Implementing a file compare from scratch takes abo
On 2009-04-15, Martin wrote:
> Hi,
>
> On Mon, Apr 13, 2009 at 10:03 PM, Grant Edwards wrote:
>> On 2009-04-13, SpreadTooThin wrote:
>>
>>> I want to compare two binary files and see if they are the same.
>>> I see the filecmp.cmp function but I don't get a warm fuzzy feeling
>>> that it is doin
On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano wrote:
> The checksum does look at every byte in each file. Checksumming isn't a
> way to avoid looking at each byte of the two files, it is a way of
> mapping all the bytes to a single number.
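The point that a checksum still reads every byte is easy to see in code. Below is a minimal sketch, assuming SHA-256 from `hashlib`; the helper name `sha256_of_file` and the chunk size are illustrative only. Where one file is remote, as in the case raised elsewhere in the thread, each side could run something like this and only the short digests would need to be exchanged.

```python
import hashlib


def sha256_of_file(path, chunk_size=64 * 1024):
    """Hash a file in fixed-size chunks; every byte is still read once."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Equal digests then indicate equal contents only up to the collision probability debated in this thread, whereas unequal digests prove the files differ.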
My understanding of the original question was a way t
On Wed, 15 Apr 2009 07:54:20 +0200, Martin wrote:
>> Perhaps I'm being dim, but how else are you going to decide if two
>> files are the same unless you compare the bytes in the files?
>
> I'd say checksums, just about every download relies on checksums to
> verify you do have indeed the same fil
Hi,
On Mon, Apr 13, 2009 at 10:03 PM, Grant Edwards wrote:
> On 2009-04-13, SpreadTooThin wrote:
>
>> I want to compare two binary files and see if they are the same.
>> I see the filecmp.cmp function but I don't get a warm fuzzy feeling
>> that it is doing a byte by byte comparison of two files
On Apr 13, 8:39 pm, Grant Edwards wrote:
> On 2009-04-13, Peter Otten <__pete...@web.de> wrote:
>
> > But there's a cache. A change of file contents may go
> > undetected as long as the file stats don't change:
>
> Good point. You can fool it if you force the stats to their
> old values after you
I want to compare two binary files and see if they are the same.
I see the filecmp.cmp function but I don't get a warm fuzzy feeling
that it is doing a byte by byte comparison of two files to see if they
are the same.
What should I be using if not filecmp.cmp?
--
http://mail.python.org/mailman/li
On 2009-04-13, Peter Otten <__pete...@web.de> wrote:
> But there's a cache. A change of file contents may go
> undetected as long as the file stats don't change:
Good point. You can fool it if you force the stats to their
old values after you modify a file and you don't clear the
cache.
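For reference, a hedged sketch of how the cache issue might be avoided: `filecmp.cmp` accepts `shallow=False` to compare file contents rather than accepting matching `os.stat()` signatures, and `filecmp.clear_cache()` (available since Python 3.4, so newer than this thread) discards earlier results. The wrapper name `files_identical` is made up for this example.

```python
import filecmp


def files_identical(path_a, path_b):
    """Compare two files by content, ignoring any earlier cached result."""
    # Results are cached per (path, path, stat signature); drop them so a
    # file modified with unchanged stats is read again.
    filecmp.clear_cache()
    # shallow=False compares contents instead of trusting the stat signature.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Clearing the cache on every call is a conservative choice; it trades the speed of repeated comparisons for not returning a stale answer.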
--
Gra
SpreadTooThin wrote:
On Apr 13, 2:37 pm, Grant Edwards wrote:
On 2009-04-13, Grant Edwards wrote:
On 2009-04-13, SpreadTooThin wrote:
I want to compare two binary files and see if they are the same.
I see the filecmp.cmp function but I don't get a warm fuzzy feeling
that i
On Mon, 13 Apr 2009 15:03:32 -0500, Grant Edwards wrote:
> On 2009-04-13, SpreadTooThin wrote:
>
>> I want to compare two binary files and see if they are the same. I see
>> the filecmp.cmp function but I don't get a warm fuzzy feeling that it
>> is doing a byte by byte comparison of two files t
Grant Edwards wrote:
> On 2009-04-13, Grant Edwards wrote:
>> On 2009-04-13, SpreadTooThin wrote:
>>
>>> I want to compare two binary files and see if they are the same.
>>> I see the filecmp.cmp function but I don't get a warm fuzzy feeling
>>> that it is doing a byte by byte comparison of two
On Apr 13, 2:37 pm, Grant Edwards wrote:
> On 2009-04-13, Grant Edwards wrote:
>
> > On 2009-04-13, SpreadTooThin wrote:
>
> >> I want to compare two binary files and see if they are the same.
> >> I see the filecmp.cmp function but I don't get a warm fuzzy feeling
> >> that it is doing a by
On 2009-04-13, Grant Edwards wrote:
> On 2009-04-13, SpreadTooThin wrote:
>
>> I want to compare two binary files and see if they are the same.
>> I see the filecmp.cmp function but I don't get a warm fuzzy feeling
>> that it is doing a byte by byte comparison of two files to see if they
>> are t
On Apr 13, 2:03 pm, Grant Edwards wrote:
> On 2009-04-13, SpreadTooThin wrote:
>
> > I want to compare two binary files and see if they are the same.
> > I see the filecmp.cmp function but I don't get a warm fuzzy feeling
> > that it is doing a byte by byte comparison of two files to see if they
On 2009-04-13, SpreadTooThin wrote:
> I want to compare two binary files and see if they are the same.
> I see the filecmp.cmp function but I don't get a warm fuzzy feeling
> that it is doing a byte by byte comparison of two files to see if they
> are the same.
Perhaps I'm being dim, but how el
On Apr 13, 2:00 pm, Przemyslaw Kaminski wrote:
> SpreadTooThin wrote:
> > I want to compare two binary files and see if they are the same.
> > I see the filecmp.cmp function but I don't get a warm fuzzy feeling
> > that it is doing a byte by byte comparison of two files to see if they
> > are they
SpreadTooThin wrote:
> I want to compare two binary files and see if they are the same.
> I see the filecmp.cmp function but I don't get a warm fuzzy feeling
> that it is doing a byte by byte comparison of two files to see if they
> are the same.
>
> What should I be using if not filecmp.cmp?
W