Re: hash() yields different results for different platforms

2006-07-13 Thread Piet van Oostrum
> "Kerry, Richard" <[EMAIL PROTECTED]> (KR) wrote: >KR> The hash is not expected to be unique, it just provides a starting point >KR> for another search (usually linear ?). >KR> See http://en.wikipedia.org/wiki/Hash_function That only contains a definition of a hash function. I know what a

Re: hash() yields different results for different platforms

2006-07-12 Thread Grant Edwards
On 2006-07-12, Qiangning Hong <[EMAIL PROTECTED]> wrote: > Grant Edwards wrote: >> On 2006-07-11, Qiangning Hong <[EMAIL PROTECTED]> wrote: >> > However, when I come to Python's builtin hash() function, I >> > found it produces different values in my two computers! In a >> > pentium4, hash('a') ->

Re: hash() yields different results for different platforms

2006-07-12 Thread Grant Edwards
On 2006-07-12, Carl Banks <[EMAIL PROTECTED]> wrote: > Grant Edwards wrote: >> On 2006-07-11, Qiangning Hong <[EMAIL PROTECTED]> wrote: >> >> > I'm writing a spider. I have millions of urls in a table (mysql) to >> > check if a url has already been fetched. To check fast, I am >> > considering to a

Re: hash() yields different results for different platforms

2006-07-12 Thread Paul Rubin
"Kerry, Richard" <[EMAIL PROTECTED]> writes: > The hash is not expected to be unique, it just provides a starting point > for another search (usually linear ?). The database is good at organizing indexes and searching in them. Why not let the database do what it's good at. -- http://mail.pytho

RE: hash() yields different results for different platforms

2006-07-12 Thread Kerry, Richard
Oostrum Sent: 12 July 2006 10:56 To: python-list@python.org Subject: Re: hash() yields different results for different platforms >>>>> Grant Edwards <[EMAIL PROTECTED]> (GE) wrote: >GE> The low 32 bits match, so perhaps you should just use that >GE> portion of

Re: hash() yields different results for different platforms

2006-07-12 Thread Piet van Oostrum
> Grant Edwards <[EMAIL PROTECTED]> (GE) wrote: >GE> The low 32 bits match, so perhaps you should just use that >GE> portion of the returned hash? If the hashes should be unique, 32 bits is much too low if you have millions of entries.
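To put a rough number on that objection, here is a minimal sketch using the standard birthday approximation. It assumes hash values are uniformly distributed, which Python's string hash only approximates; the function name `collision_probability` is made up for this illustration.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance that at least two of n_items uniformly
    distributed hash values collide in a space of 2**bits values."""
    space = 2 ** bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * space)).
    exponent = n_items * (n_items - 1) / (2 * space)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)


for n in (10_000, 100_000, 1_000_000):
    print(n, collision_probability(n, 32), collision_probability(n, 64))
```

Under that assumption, a million entries in a 32-bit space are almost certain to contain at least one collision, so a unique key on a 32-bit hash column would reject some distinct URLs.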

Re: hash() yields different results for different platforms

2006-07-12 Thread Nick Vatamaniuc
Using Python's hash as a column in the table might not be a good idea. You just found out why. You could instead use the base URL and create an index on that, so that next time you can quickly get all URLs from the same base address and then do a linear search for a specific one, or even easier, impleme
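If a hash column is still wanted, one platform-independent alternative to the builtin hash() is a cryptographic digest from the standard library. This is only a sketch of that idea, not something proposed in the visible part of this message; the helper name `url_key` and the example URL are hypothetical.

```python
import hashlib


def url_key(url: str) -> str:
    """Return a hex digest of the URL that is the same on every platform."""
    # Encode explicitly so the bytes fed to the digest do not depend on
    # the platform's default encoding.
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


print(url_key("http://example.com/"))
```

The digest depends only on the bytes of the URL, not on the machine's word length. It is still not guaranteed to be unique, but with 160 bits an accidental collision among millions of URLs is negligible.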

Re: hash() yields different results for different platforms

2006-07-12 Thread Fredrik Lundh
Qiangning Hong wrote:
> /.../ add a "hash" column in the table, make it a unique key
at this point, you should have slapped yourself on the forehead, and gone back to the drawing board.

Re: hash() yields different results for different platforms

2006-07-12 Thread Tim Peters
[Grant Edwards]
>> ...
>> The low 32 bits match, so perhaps you should just use that
>> portion of the returned hash?
>>
>> >>> hex(12416037344)
>> '0x2E40DB1E0L'
>> >>> hex(-468864544 & 0x)
>> '0xE40DB1E0L'
>>
>> >>> hex(12416037344 & 0x)
>> '0xE40DB1E0L'
>> >>> hex
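The comparison quoted above can be reproduced with the two values reported in the thread. The mask is cut off in the archived text, so the 32-bit mask `0xFFFFFFFF` used here is an assumption inferred from "the low 32 bits match". The sketch uses the reported numbers rather than calling hash(), because current Python versions randomize string hashes per process.

```python
HASH_PENTIUM4 = -468864544  # hash('a') as reported on the 32-bit machine
HASH_AMD64 = 12416037344    # hash('a') as reported on the 64-bit machine
MASK_32 = 0xFFFFFFFF        # assumed mask: keeps only the low 32 bits

low_p4 = HASH_PENTIUM4 & MASK_32
low_amd64 = HASH_AMD64 & MASK_32

# Python 3 prints lowercase hex digits without the trailing 'L'.
print(hex(low_p4), hex(low_amd64))  # 0xe40db1e0 0xe40db1e0
```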

Re: hash() yields different results for different platforms

2006-07-11 Thread Qiangning Hong
Grant Edwards wrote: > On 2006-07-11, Qiangning Hong <[EMAIL PROTECTED]> wrote: > > However, when I come to Python's builtin hash() function, I > > found it produces different values in my two computers! In a > > pentium4, hash('a') -> -468864544; in a amd64, hash('a') -> > > 12416037344. Does ha

Re: hash() yields different results for different platforms

2006-07-11 Thread Carl Banks
Grant Edwards wrote: > On 2006-07-11, Qiangning Hong <[EMAIL PROTECTED]> wrote: > > > I'm writing a spider. I have millions of urls in a table (mysql) to > > check if a url has already been fetched. To check fast, I am > > considering to add a "hash" column in the table, make it a unique key, > > a

Re: hash() yields different results for different platforms

2006-07-11 Thread Grant Edwards
On 2006-07-11, Qiangning Hong <[EMAIL PROTECTED]> wrote: > I'm writing a spider. I have millions of urls in a table (mysql) to > check if a url has already been fetched. To check fast, I am > considering to add a "hash" column in the table, make it a unique key, > and use the following sql stateme

Re: hash() yields different results for different platforms

2006-07-11 Thread Paul Rubin
"Qiangning Hong" <[EMAIL PROTECTED]> writes: > However, when I come to Python's builtin hash() function, I found it > produces different values in my two computers! In a pentium4, > hash('a') -> -468864544; in a amd64, hash('a') -> 12416037344. Does > hash function depend on machine's word length