Re: Python and Math

2014-05-22 Thread Frank Millman

 wrote in message 
news:281f5806-8793-4fd2-877c-214927dda...@googlegroups.com...
>
> pip looked and saw that you already had it, so did nothing -- what did it 
> report? In this case:
>
> 'pip install -U ipython[notebook]'
>
> might have worked: -U means upgrade even if I already have it.
>

Indeed it did - thanks for the tip.

I used pip to uninstall jinja2. Afterwards, running 'ipython notebook' 
predictably failed.

Then I ran the above command to upgrade ipython notebook. It figured out 
that jinja2 was missing and re-installed it. Now it works again.

Very smooth.

Frank



-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tkinter errors out without clear message

2014-05-22 Thread Serhiy Storchaka

21.05.14 20:19, Terry Reedy wrote:

> There is also the issue that TkVersion == 8.5 is underspecified -- there
> are multiple bugfix releases.


root.call('info', 'patchlevel') returns more detailed info.




Fwd: hi, How much time can transition to python3

2014-05-22 Thread who2are2...@gmail.com
hi,
i learn python is 0.5 year,
i'm so much love python,

i come from non English speaking countries,
Python2 coding problem has been troubling me,
I started to learn the python3 now,
But many libraries do not support python3,
I know python3 publishing for many years.
Why do so many libraries or does not support python3,
Perhaps it is because of your home page, still in the striking position put 
python2 download link,

You can speed up the elimination of python2  ?

please

thank you



who2are2...@gmail.com


Re: Fwd: hi, How much time can transition to python3

2014-05-22 Thread Ben Finney
"who2are2...@gmail.com"  writes:

> i learn python is 0.5 year,
> i'm so much love python,

Welcome, you have found a very good programming language. I'm glad you
like it.

> i come from non English speaking countries,
> Python2 coding problem has been troubling me,
> I started to learn the python3 now,

This is good. Python 3 makes it much easier to do the right thing with
writing systems worldwide.

> But many libraries do not support python3,
> I know python3 publishing for many years.
> Why do so many libraries or does not support python3,

Because Python 2 has a lot of inertia. There is a great amount of
existing Python 2 code, and many other systems built on that code.
Change takes time.

Be glad that you are learning Python 3 now! There has been great
improvement in the Python 3 landscape in recent years.

> Perhaps it is because of your home page, still in the striking
> position put python2 download link,

You have that backward; the website reflects the current needs of the
community. While the Python 3 transition is still going through rapid
change, the safest choice is still Python 2 for *existing* uses.

But for newcomers like yourself, Python 3 is now the right choice and
has been for some years. Congratulations!

> You can speed up the elimination of python2  ?

Yes, much has already been done, and much is still being done now. But
there is still more work to do, as you observed.

You can help by contacting the specific projects you rely on which still
do not have Python 3 support, and ask those people kindly how you can
help. We all get there faster by helping each other!

-- 
 \ “Not to be absolutely certain is, I think, one of the essential |
  `\ things in rationality.” —Bertrand Russell |
_o__)  |
Ben Finney



Can Python do this? First steps, links to resources or complete software referrals appreciated.

2014-05-22 Thread ed . cottam
Hi, I'm an academic and I want to find/adapt/create a script that will grab 
abstracts (150-250 words of text) from Google Scholar search results and sort 
them by relevance (e.g. keywords, keyword combinations, any other way you 
can think of). 

Any of you guys know of a script that does this already? Preferably open 
source? If not, any resources you could bring to my attention? I'm a complete 
Newb!

Thanks for your help. 

Ed


Python is horribly slow compared to bash!!

2014-05-22 Thread Chris Angelico
Figure some of you folks might enjoy this. Look how horrible Python
performance is!

http://thedailywtf.com/Articles/Best-of-Email-Brains,-Security,-Robots,-and-a-Risky-Click.aspx

Actually, probably a lot of you folks already read TDWTF, but maybe
some don't (yet).

ChrisA


hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
I'm using Python 3.3 and the sqlite3 module in the standard library.
I'm processing a lot of strings from input files (among other things,
values of headers in e-mail & news messages) and suppressing
duplicates using a table of seen strings in the database.

It seems to me --- from past experience with other things, where
testing integers for equality is faster than testing strings, as well
as from reading the SQLite3 documentation about INTEGER PRIMARY KEY
--- that the SELECT tests should be faster if I am looking up an
INTEGER PRIMARY KEY value rather than TEXT PRIMARY KEY.  Is that
right?

If so, what sort of hashing function should I use?  The "maxint" for
SQLite3 is a lot smaller than the size of even MD5 hashes.  The only
thing I've thought of so far is to use MD5 or SHA-something modulo the
maxint value.  (Security isn't an issue --- i.e., I'm not worried
about someone trying to create a hash collision.)

Thanks,
Adam


-- 
"It is the role of librarians to keep government running in difficult
times," replied Dramoren.  "Librarians are the last line of defence
against chaos."   (McMullen 2001)


Re: Fwd: hi, How much time can transition to python3

2014-05-22 Thread lovePython999999
On Thursday, May 22, 2014 at 5:38:57 PM UTC+8, Ben Finney wrote:
> > i learn python is 0.5 year,
> > i'm so much love python,
>
> Welcome, you have found a very good programming language. I'm glad you
> like it.
>
> > i come from non English speaking countries,
> > Python2 coding problem has been troubling me,
> > I started to learn the python3 now,
>
> This is good. Python 3 makes it much easier to do the right thing with
> writing systems worldwide.
>
> > But many libraries do not support python3,
> > I know python3 publishing for many years.
> > Why do so many libraries or does not support python3,
>
> Because Python 2 has a lot of inertia. There is a great amount of
> existing Python 2 code, and many other systems built on that code.
> Change takes time.
>
> Be glad that you are learning Python 3 now! There has been great
> improvement in the Python 3 landscape in recent years.
>
> > Perhaps it is because of your home page, still in the striking
> > position put python2 download link,
>
> You have that backward; the website reflects the current needs of the
> community. While the Python 3 transition is still going through rapid
> change, the safest choice is still Python 2 for *existing* uses.
>
> But for newcomers like yourself, Python 3 is now the right choice and
> has been for some years. Congratulations!
>
> > You can speed up the elimination of python2  ?
>
> Yes, much has already been done, and much is still being done now. But
> there is still more work to do, as you observed.
>
> You can help by contacting the specific projects you rely on which still
> do not have Python 3 support, and ask those people kindly how you can
> help. We all get there faster by helping each other!
>
> -- 
>  \ “Not to be absolutely certain is, I think, one of the essential |
>   `\ things in rationality.” —Bertrand Russell |
> _o__)  |
> Ben Finney



thank you  so much


Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Peter Otten
Adam Funk wrote:

> I'm using Python 3.3 and the sqlite3 module in the standard library.
> I'm processing a lot of strings from input files (among other things,
> values of headers in e-mail & news messages) and suppressing
> duplicates using a table of seen strings in the database.
> 
> It seems to me --- from past experience with other things, where
> testing integers for equality is faster than testing strings, as well
> as from reading the SQLite3 documentation about INTEGER PRIMARY KEY
> --- that the SELECT tests should be faster if I am looking up an
> INTEGER PRIMARY KEY value rather than TEXT PRIMARY KEY.  Is that
> right?

My gut feeling tells me that this would matter more for join operations than 
lookup of a value. If you plan to do joins you could use an autoinc integer 
as the primary key and an additional string key for lookup.
 
> If so, what sort of hashing function should I use?  The "maxint" for
> SQLite3 is a lot smaller than the size of even MD5 hashes.  The only
> thing I've thought of so far is to use MD5 or SHA-something modulo the
> maxint value.  (Security isn't an issue --- i.e., I'm not worried
> about someone trying to create a hash collision.)

Start with the cheapest operation you can think of, 

md5(s) % MAXINT

or even

hash(s) % MAXINT # don't forget to set PYTHONHASHSEED

then compare performance with just

s

and only if you can demonstrate a significant speedup keep the complication 
in your code.

If you find such a speedup I'd like to see the numbers because this cries 
PREMATURE OPTIMIZATION...
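
A minimal sketch of how that comparison could be measured, in case it helps. The table names, the sample strings and the choice of 2**63 - 1 for MAXINT are assumptions made for illustration, not anything from the original setup, and an in-memory database will not behave exactly like one on disk.

```python
import hashlib
import sqlite3
import timeit

MAXINT = 2**63 - 1  # assumed limit: largest value an SQLite INTEGER holds


def md5_key(s):
    """Reduce the MD5 digest of s to a non-negative SQLite-sized integer."""
    digest = hashlib.md5(s.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % MAXINT


# Hypothetical sample data standing in for real header values.
strings = ["<message-%d@example.invalid>" % i for i in range(5000)]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE seen_text (s TEXT PRIMARY KEY)")
db.execute("CREATE TABLE seen_int (h INTEGER PRIMARY KEY)")
db.executemany("INSERT INTO seen_text VALUES (?)", ((s,) for s in strings))
db.executemany("INSERT OR IGNORE INTO seen_int VALUES (?)",
               ((md5_key(s),) for s in strings))


def lookup_text():
    # Look every string up directly by its text key.
    for s in strings:
        db.execute("SELECT 1 FROM seen_text WHERE s = ?", (s,)).fetchone()


def lookup_int():
    # Hash every string first, then look up the integer key.
    for s in strings:
        db.execute("SELECT 1 FROM seen_int WHERE h = ?",
                   (md5_key(s),)).fetchone()


text_time = min(timeit.repeat(lookup_text, number=1, repeat=3))
int_time = min(timeit.repeat(lookup_int, number=1, repeat=3))
print("TEXT key:    %.3f s" % text_time)
print("INTEGER key: %.3f s" % int_time)
```

Note that the integer timing includes the cost of computing the hash, which is the fair comparison for this use case.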



Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Chris Angelico
On Thu, May 22, 2014 at 9:47 PM, Adam Funk  wrote:
> I'm using Python 3.3 and the sqlite3 module in the standard library.
> I'm processing a lot of strings from input files (among other things,
> values of headers in e-mail & news messages) and suppressing
> duplicates using a table of seen strings in the database.
>
> It seems to me --- from past experience with other things, where
> testing integers for equality is faster than testing strings, as well
> as from reading the SQLite3 documentation about INTEGER PRIMARY KEY
> --- that the SELECT tests should be faster if I am looking up an
> INTEGER PRIMARY KEY value rather than TEXT PRIMARY KEY.  Is that
> right?

It might be faster to use an integer primary key, but the possibility
of even a single collision means you can't guarantee uniqueness
without a separate check. I don't know sqlite3 well enough to say, but
based on what I know of PostgreSQL, it's usually best to make your
schema mimic your logical structure, rather than warping it for the
sake of performance. With a good indexing function, the performance of
a textual PK won't be all that much worse than an integral one, and
everything you do will read correctly in the code - no fiddling around
with hashes and collision checks.

Stick with the TEXT PRIMARY KEY and let the database do the database's
job. If you're processing a really large number of strings, you might
want to consider moving from sqlite3 to PostgreSQL anyway (I've used
psycopg2 quite happily), as you'll get better concurrency; and that
might solve your performance problem as well, as Pg plays very nicely
with caches.

ChrisA


Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Tim Chase
On 2014-05-22 12:47, Adam Funk wrote:
> I'm using Python 3.3 and the sqlite3 module in the standard library.
> I'm processing a lot of strings from input files (among other
> things, values of headers in e-mail & news messages) and suppressing
> duplicates using a table of seen strings in the database.
> 
> It seems to me --- from past experience with other things, where
> testing integers for equality is faster than testing strings, as
> well as from reading the SQLite3 documentation about INTEGER
> PRIMARY KEY --- that the SELECT tests should be faster if I am
> looking up an INTEGER PRIMARY KEY value rather than TEXT PRIMARY
> KEY.  Is that right?

If sqlite can handle the absurd length of a Python long, you *can* do
it as ints:

  >>> from hashlib import sha1
  >>> s = "Hello world"
  >>> h = sha1(s)
  >>> h.hexdigest()
  '7b502c3a1f48c8609ae212cdfb639dee39673f5e'
  >>> int(h.hexdigest(), 16)
  703993777145756967576188115661016000849227759454L

That's a pretty honkin' huge int for a DB key, but you can use it.
And it's pretty capped on length regardless of the underlying
string's length.

> If so, what sort of hashing function should I use?  The "maxint" for
> SQLite3 is a lot smaller than the size of even MD5 hashes.  The only
> thing I've thought of so far is to use MD5 or SHA-something modulo
> the maxint value.  (Security isn't an issue --- i.e., I'm not
> worried about someone trying to create a hash collision.)

You could truncate that to something like

  >>> int(h.hexdigest()[-8:], 16)

which should give you something that would result in a 32-bit number
that should fit in sqlite's int.
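
For Python 3 (which the original question uses), the same idea might look roughly like this. It is only a sketch: the helper name short_key and the UTF-8 encoding are assumptions, and hashlib needs bytes rather than str there.

```python
from hashlib import sha1


def short_key(s, hex_digits=8):
    """Return the last hex_digits hex digits of SHA-1(s) as an int."""
    # hashlib wants bytes in Python 3, so encode the text first.
    h = sha1(s.encode("utf-8"))
    return int(h.hexdigest()[-hex_digits:], 16)


key32 = short_key("Hello world")       # 8 hex digits: at most 32 bits
key60 = short_key("Hello world", 15)   # 15 hex digits: at most 60 bits
print(key32, key60)
```

Fifteen hex digits stay below 2**60, so the second value would still fit a signed 64-bit column.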

-tkc





Re: Python is horribly slow compared to bash!!

2014-05-22 Thread wxjmfauth
On Thursday, May 22, 2014 at 12:54:22 UTC+2, Chris Angelico wrote:
> Figure some of you folks might enjoy this. Look how horrible Python
> performance is!
>
> http://thedailywtf.com/Articles/Best-of-Email-Brains,-Security,-Robots,-and-a-Risky-Click.aspx
>
> Actually, probably a lot of you folks already read TDWTF, but maybe
> some don't (yet).
>
> ChrisA


>>> import timeit
>>> timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = 'z'")
[1.4027834829454946, 1.38714224331963, 1.3822586635296261]
>>> timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = '\u0fce'")
[5.462776291480395, 5.4479432055423445, 5.447874284053398]

Na, na, na, I win.

But that's peanuts.

I can make an application running 100 times slower just
by replacing 'z' with Dutch characters. [*]. I win again.

I can take the same application and replace 'z' by ..., and
... No, I do not win :-( . Python fails.


[*] Unicode is fascinating, working with it is a little
bit travelling.

jmf



Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
On 2014-05-22, Peter Otten wrote:

> Adam Funk wrote:
>
>> I'm using Python 3.3 and the sqlite3 module in the standard library.
>> I'm processing a lot of strings from input files (among other things,
>> values of headers in e-mail & news messages) and suppressing
>> duplicates using a table of seen strings in the database.
>> 
>> It seems to me --- from past experience with other things, where
>> testing integers for equality is faster than testing strings, as well
>> as from reading the SQLite3 documentation about INTEGER PRIMARY KEY
>> --- that the SELECT tests should be faster if I am looking up an
>> INTEGER PRIMARY KEY value rather than TEXT PRIMARY KEY.  Is that
>> right?
>
> My gut feeling tells me that this would matter more for join operations than 
> lookup of a value. If you plan to do joins you could use an autoinc integer 
> as the primary key and an additional string key for lookup.

I'm not doing any join operations.  I'm using sqlite3 for storing big
piles of data & persistence between runs --- not really "proper
relational database use".  In this particular case, I'm getting header
values out of messages & doing this:

  for this_string in these_strings:
      if not already_seen(this_string):
          process(this_string)
      # ignore if already seen

...
> and only if you can demonstrate a significant speedup keep the complication 
> in your code.
>
> If you find such a speedup I'd like to see the numbers because this cries 
> PREMATURE OPTIMIZATION...

On further reflection, I think I asked for that.  In fact, the table
I'm using only has one column for the hashes --- I wasn't going to
store the strings at all in order to save disk space (maybe my mind is
stuck in the 1980s).


-- 
But the government always tries to coax well-known writers into the
Establishment; it makes them feel educated. [Robert Graves]


Re: Can Python do this? First steps, links to resources or complete software referrals appreciated.

2014-05-22 Thread William Ray Wing
On May 22, 2014, at 6:03 AM, ed.cot...@gmail.com wrote:

> Hi, I'm an academic and I want to find/adapt/create a script that will grab 
> abstracts (150-250 words of text) from Google Scholar search results and sort 
> them by relevance (e.g. keywords, keyword combinations, any other way 
> you can think of). 
> 
> Any of you guys know of a script that does this already? Preferably open 
> source? If not, any resources you could bring to my attention? I'm a complete 
> Newb!
> 
> Thanks for your help. 
> 
> Ed

Well, you might take a look at scholar.py, located here:  
http://www.icir.org/christian/scholar.html

Also, there is this at stackoverflow:  
http://stackoverflow.com/questions/13200709/extract-google-scholar-results-using-python-or-r

One of these may provide what you want, or serve as a jumping off point.
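
Once the abstracts are in hand, the "sort by relevance" half of the question could start as simply as counting keyword hits. This is only a rough sketch: the function names and the sample abstracts are made up for illustration, and nothing here talks to Google Scholar.

```python
def relevance(abstract, keywords):
    """Count how often the keywords occur in the abstract (case-insensitive)."""
    words = [w.strip(".,;:()\"'") for w in abstract.lower().split()]
    return sum(words.count(k.lower()) for k in keywords)


def rank(abstracts, keywords):
    """Return the abstracts sorted from most to least relevant."""
    return sorted(abstracts, key=lambda a: relevance(a, keywords),
                  reverse=True)


# Made-up abstracts, for illustration only.
abstracts = [
    "We study coral bleaching under thermal stress.",
    "Thermal stress and coral recovery: bleaching, bleaching everywhere.",
    "A survey of medieval trade routes.",
]
for a in rank(abstracts, ["coral", "bleaching"]):
    print(a)
```

Plain counts favour long abstracts, so a real script would probably want some normalisation, but it is a starting point.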

-Bill


Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
On 2014-05-22, Chris Angelico wrote:

> On Thu, May 22, 2014 at 9:47 PM, Adam Funk  wrote:
>> I'm using Python 3.3 and the sqlite3 module in the standard library.
>> I'm processing a lot of strings from input files (among other things,
>> values of headers in e-mail & news messages) and suppressing
>> duplicates using a table of seen strings in the database.
>>
>> It seems to me --- from past experience with other things, where
>> testing integers for equality is faster than testing strings, as well
>> as from reading the SQLite3 documentation about INTEGER PRIMARY KEY
>> --- that the SELECT tests should be faster if I am looking up an
>> INTEGER PRIMARY KEY value rather than TEXT PRIMARY KEY.  Is that
>> right?
>
> It might be faster to use an integer primary key, but the possibility
> of even a single collision means you can't guarantee uniqueness
> without a separate check. I don't know sqlite3 well enough to say, but
> based on what I know of PostgreSQL, it's usually best to make your
> schema mimic your logical structure, rather than warping it for the
> sake of performance. With a good indexing function, the performance of
> a textual PK won't be all that much worse than an integral one, and
> everything you do will read correctly in the code - no fiddling around
> with hashes and collision checks.
>
> Stick with the TEXT PRIMARY KEY and let the database do the database's
> job. If you're processing a really large number of strings, you might
> want to consider moving from sqlite3 to PostgreSQL anyway (I've used
> psycopg2 quite happily), as you'll get better concurrency; and that
> might solve your performance problem as well, as Pg plays very nicely
> with caches.

Well, actually I'm thinking about doing away with checking for
duplicates at this stage, since the substrings that I pick out of the
deduplicated header values go into another table as the TEXT PRIMARY
KEY anyway, with deduplication there.  So I think this stage reeks of
premature optimization.


-- 
The history of the world is the history of a privileged few.
--- Henry Miller


Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
On 2014-05-22, Tim Chase wrote:

> On 2014-05-22 12:47, Adam Funk wrote:
>> I'm using Python 3.3 and the sqlite3 module in the standard library.
>> I'm processing a lot of strings from input files (among other
>> things, values of headers in e-mail & news messages) and suppressing
>> duplicates using a table of seen strings in the database.
>> 
>> It seems to me --- from past experience with other things, where
>> testing integers for equality is faster than testing strings, as
>> well as from reading the SQLite3 documentation about INTEGER
>> PRIMARY KEY --- that the SELECT tests should be faster if I am
>> looking up an INTEGER PRIMARY KEY value rather than TEXT PRIMARY
>> KEY.  Is that right?
>
> If sqlite can handle the absurd length of a Python long, you *can* do
> it as ints:

It can't.  SQLite3 INTEGER is an 8-byte signed one.

https://www.sqlite.org/datatype3.html

But after reading the other replies to my question, I've concluded
that what I was trying to do is pointless.


>  >>> from hashlib import sha1
>  >>> s = "Hello world"
>  >>> h = sha1(s)
>  >>> h.hexdigest()
>   '7b502c3a1f48c8609ae212cdfb639dee39673f5e'
>  >>> int(h.hexdigest(), 16)
>   703993777145756967576188115661016000849227759454L

That ties in with a related question I've been wondering about lately
(using MD5s & SHAs for other things) --- getting a hash value (which
is internally numeric, rather than string, right?) out as a hex string
& then converting that to an int looks inefficient to me --- is there
any better way to get an int?  (I haven't seen any other way in the
API.)


-- 
A firm rule must be imposed upon our nation before it destroys
itself. The United States needs some theology and geometry, some taste
and decency. I suspect that we are teetering on the edge of the abyss.
 --- Ignatius J Reilly


Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Chris Angelico
On Thu, May 22, 2014 at 11:41 PM, Adam Funk  wrote:
> On further reflection, I think I asked for that.  In fact, the table
> I'm using only has one column for the hashes --- I wasn't going to
> store the strings at all in order to save disk space (maybe my mind is
> stuck in the 1980s).

That's a problem, then, because you will see hash collisions. Maybe
not often, but they definitely will occur if you have enough strings
(look up the birthday paradox - with a 32-bit arbitrarily selected
integer (such as a good crypto hash that you then truncate to 32
bits), you have a 50% chance of a collision at just 77,000 strings).
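
The 77,000 figure can be checked with the usual birthday-problem approximation, p = 1 - exp(-n(n-1) / 2N) for n values drawn from N possibilities. A small sketch (the function names are just for illustration):

```python
import math


def collision_probability(n, bits):
    """Approximate chance that n random bits-wide values contain a duplicate."""
    space = 2 ** bits
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))


def strings_for_probability(p, bits):
    """Approximate number of values needed to reach collision probability p."""
    space = 2 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


print(round(strings_for_probability(0.5, 32)))   # roughly 77,000
print(collision_probability(77000, 32))          # close to 0.5
print(round(strings_for_probability(0.5, 64)))   # roughly 5 billion
```

So a full 64-bit key pushes the 50% point out to about five billion strings, but the chance is still never zero.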

Do you have enough RAM to hold all the strings directly? Just load 'em
all up into a Python set. Set operations are fast, clean, and easy.
Your already_seen function becomes a simple 'in' check. These days you
can get 16GB or 32GB of RAM in a PC inexpensively enough; with an
average string size of 80 characters, and assuming Python 3.3+, that's
about 128 bytes each - close enough, and a nice figure. 16GB divided
by 128 gives 128M strings - obviously you won't get all of that, but
that's your ball-park. Anything less than, say, a hundred million
strings, and you can dump the lot into memory. Easy!
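
A minimal sketch of the set-based version, assuming everything fits in memory. The helper name process_unseen and the sample strings are made up; already_seen() from the earlier loop collapses into the 'in' test.

```python
import sys


def process_unseen(strings, process):
    """Call process(s) once for each distinct string, in first-seen order."""
    seen = set()
    for s in strings:
        if s not in seen:      # this is the whole already_seen() check
            seen.add(s)
            process(s)
    return seen


handled = []
seen = process_unseen(["a@x", "b@x", "a@x", "c@x", "b@x"], handled.append)
print(handled)                 # ['a@x', 'b@x', 'c@x']

# Rough per-string cost of an 80-character ASCII str on this interpreter;
# the set's own hash table adds more on top of this.
print(sys.getsizeof("x" * 80))
```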

ChrisA


Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Chris Angelico
On Thu, May 22, 2014 at 11:54 PM, Adam Funk  wrote:
>>  >>> from hashlib import sha1
>>  >>> s = "Hello world"
>>  >>> h = sha1(s)
>>  >>> h.hexdigest()
>>   '7b502c3a1f48c8609ae212cdfb639dee39673f5e'
>>  >>> int(h.hexdigest(), 16)
>>   703993777145756967576188115661016000849227759454L
>
> That ties in with a related question I've been wondering about lately
> (using MD5s & SHAs for other things) --- getting a hash value (which
> is internally numeric, rather than string, right?) out as a hex string
> & then converting that to an int looks inefficient to me --- is there
> any better way to get an int?  (I haven't seen any other way in the
> API.)

I don't know that there is, at least not with hashlib. You might be
able to use digest() followed by the struct module, but it's no less
convoluted. It's the same in several other languages' hashing
functions; the result is a string, not an integer.
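
For what it's worth, a sketch of the digest()-plus-struct route might look like this. Taking only the first 8 bytes and reading them as a big-endian signed value is an assumption made here so that the result fits an 8-byte integer column.

```python
import hashlib
import struct

digest = hashlib.sha1(b"Hello world").digest()   # 20 raw bytes

# ">q" means one big-endian signed 64-bit integer.
(key,) = struct.unpack(">q", digest[:8])
print(key)
```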

ChrisA


Re: daemon.DaemonContext

2014-05-22 Thread wonko
I know it's 4 years later, but I'm currently battling this myself. I do exactly 
this and yet it doesn't appear to be keeping the filehandler open. Nothing ever 
gets written to logs after I daemonize!


Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
On 2014-05-22, Chris Angelico wrote:

> On Thu, May 22, 2014 at 11:41 PM, Adam Funk  wrote:
>> On further reflection, I think I asked for that.  In fact, the table
>> I'm using only has one column for the hashes --- I wasn't going to
>> store the strings at all in order to save disk space (maybe my mind is
>> stuck in the 1980s).
>
> That's a problem, then, because you will see hash collisions. Maybe
> not often, but they definitely will occur if you have enough strings
> (look up the birthday paradox - with a 32-bit arbitrarily selected
> integer (such as a good crypto hash that you then truncate to 32
> bits), you have a 50% chance of a collision at just 77,000 strings).

Ah yes, there's a handy table for that:

https://en.wikipedia.org/wiki/Birthday_attack#Mathematics


> Do you have enough RAM to hold all the strings directly? Just load 'em
> all up into a Python set. Set operations are fast, clean, and easy.
> Your already_seen function becomes a simple 'in' check. These days you
> can get 16GB or 32GB of RAM in a PC inexpensively enough; with an
> average string size of 80 characters, and assuming Python 3.3+, that's
> about 128 bytes each - close enough, and a nice figure. 16GB divided
> by 128 gives 128M strings - obviously you won't get all of that, but
> that's your ball-park. Anything less than, say, a hundred million
> strings, and you can dump the lot into memory. Easy!

Good point, & since (as I explained in my other post) the substrings
are being deduplicated in their own table anyway it's probably not
worth bothering with persistence between runs for this bit.


-- 
Some say the world will end in fire; some say in segfaults.
 [XKCD 312]


Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread alister
On Thu, 22 May 2014 12:47:31 +0100, Adam Funk wrote:

> I'm using Python 3.3 and the sqlite3 module in the standard library. I'm
> processing a lot of strings from input files (among other things, values
> of headers in e-mail & news messages) and suppressing duplicates using a
> table of seen strings in the database.
> 
> It seems to me --- from past experience with other things, where testing
> integers for equality is faster than testing strings, as well as from
> reading the SQLite3 documentation about INTEGER PRIMARY KEY --- that the
> SELECT tests should be faster if I am looking up an INTEGER PRIMARY KEY
> value rather than TEXT PRIMARY KEY.  Is that right?
> 
> If so, what sort of hashing function should I use?  The "maxint" for
> SQLite3 is a lot smaller than the size of even MD5 hashes.  The only
> thing I've thought of so far is to use MD5 or SHA-something modulo the
> maxint value.  (Security isn't an issue --- i.e., I'm not worried about
> someone trying to create a hash collision.)
> 
> Thanks,
> Adam

why not just set the field in the DB to be unique & then catch the error 
when you try to write a duplicate?

let the DB engine handle the task
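
A minimal sketch of that approach with the sqlite3 module. The table name, column name and the first_time() helper are made up for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE seen (s TEXT PRIMARY KEY)")


def first_time(s):
    """Record s; return True if it was new, False if it was a duplicate."""
    try:
        with db:    # commits on success, rolls back on error
            db.execute("INSERT INTO seen (s) VALUES (?)", (s,))
    except sqlite3.IntegrityError:
        # The PRIMARY KEY constraint rejected a string already in the table.
        return False
    return True


results = [first_time(s) for s in ["spam", "eggs", "spam"]]
print(results)    # [True, True, False]
```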


-- 
Your step will soil many countries.


Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
On 2014-05-22, Chris Angelico wrote:

> On Thu, May 22, 2014 at 11:54 PM, Adam Funk  wrote:

>> That ties in with a related question I've been wondering about lately
>> (using MD5s & SHAs for other things) --- getting a hash value (which
>> is internally numeric, rather than string, right?) out as a hex string
>> & then converting that to an int looks inefficient to me --- is there
>> any better way to get an int?  (I haven't seen any other way in the
>> API.)
>
> I don't know that there is, at least not with hashlib. You might be
> able to use digest() followed by the struct module, but it's no less
> convoluted. It's the same in several other languages' hashing
> functions; the result is a string, not an integer.

Well, J*v* returns a byte array, so I used to do this:

digester = MessageDigest.getInstance("MD5");
...
digester.reset();
byte[] digest = digester.digest(bytes);
return new BigInteger(+1, digest);

I dunno why language designers don't make it easy to get a single big
number directly out of these things.


I just had a look at the struct module's fearsome documentation &
think it would present a good shoot(self, foot) opportunity.


-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?


Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Chris Angelico
On Fri, May 23, 2014 at 12:47 AM, Adam Funk  wrote:
>> I don't know that there is, at least not with hashlib. You might be
>> able to use digest() followed by the struct module, but it's no less
>> convoluted. It's the same in several other languages' hashing
>> functions; the result is a string, not an integer.
>
> Well, J*v* returns a byte array...

I counted byte arrays along with strings. Whether it's notionally a
string of bytes or characters makes no difference - it's not an
integer.

> I dunno why language designers don't make it easy to get a single big
> number directly out of these things.

It's probably because these sorts of hashes are usually done on large
puddles of memory, to create a smaller puddle of memory. How you
interpret the resulting puddle is up to you; maybe you want to think
of it as a number, maybe as a string, but really it's just a sequence
of bytes.

ChrisA


Re: daemon.DaemonContext and logging

2014-05-22 Thread wonko
On Saturday, April 10, 2010 11:52:41 PM UTC-4, Ben Finney wrote:
> pid = daemon.pidlockfile.TimeoutPIDLockFile(
> "/tmp/dizazzo-daemontest.pid", 10)

Has pidlockfile been removed? (1.6)

-brian


Re: daemon.DaemonContext

2014-05-22 Thread wonko
On Thursday, May 22, 2014 10:31:11 AM UTC-4, wo...@4amlunch.net wrote:
> I know it's 4 years later, but I'm currently battling this myself. I do 
> exactly this and yet it doesn't appear to be keeping the filehandler open. 
> Nothing ever gets written to logs after I daemonize!

Ok, made it work, although I think this goes against the documentation as well 
as what's here.

I changed:

context = daemon.DaemonContext(
  # Stuff here
)
context.files_preserve = [fh.stream]

to:

context = daemon.DaemonContext(
  # Stuff here
  files_preserve=[fh.stream]
)

And now it works.


Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Peter Otten
Adam Funk wrote:

> On 2014-05-22, Chris Angelico wrote:
> 
>> On Thu, May 22, 2014 at 11:54 PM, Adam Funk  wrote:
> 
>>> That ties in with a related question I've been wondering about lately
>>> (using MD5s & SHAs for other things) --- getting a hash value (which
>>> is internally numeric, rather than string, right?) out as a hex string
>>> & then converting that to an int looks inefficient to me --- is there
>>> any better way to get an int?  (I haven't seen any other way in the
>>> API.)
>>
>> I don't know that there is, at least not with hashlib. You might be
>> able to use digest() followed by the struct module, but it's no less
>> convoluted. It's the same in several other languages' hashing
>> functions; the result is a string, not an integer.
> 
> Well, J*v* returns a byte array, so I used to do this:
> 
> digester = MessageDigest.getInstance("MD5");
> ...
> digester.reset();
> byte[] digest = digester.digest(bytes);
> return new BigInteger(+1, digest);

In Python 3 there's int.from_bytes()

>>> import hashlib
>>> h = hashlib.sha1(b"Hello world")
>>> int.from_bytes(h.digest(), "little")
538059071683667711846616050503420899184350089339

> I dunno why language designers don't make it easy to get a single big
> number directly out of these things.
 
You hardly ever need to manipulate the numerical value of the digest. And on 
its way into the database it will be re-serialized anyway.
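
For the sqlite3 use case in the subject line, a minimal Python 3 sketch, assuming the key goes into an INTEGER column (signed 64-bit in SQLite). The function name sqlite_key and the choice of the first 8 digest bytes are illustrative assumptions. On Python 2, int(h.hexdigest(), 16) gives the same number as the big-endian int.from_bytes() call.

```python
import hashlib


def sqlite_key(text):
    """Map a string to a signed 64-bit integer derived from its SHA-1 digest."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    # SQLite INTEGER columns hold signed 64-bit values, so keep 8 bytes only.
    return int.from_bytes(digest[:8], "big", signed=True)
```

Truncating to 64 bits makes collisions more likely than with the full 160-bit digest, so the table would still need to tolerate two strings mapping to the same key.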
 
> I just had a look at the struct module's fearsome documentation &
> think it would present a good shoot(self, foot) opportunity.



-- 
https://mail.python.org/mailman/listinfo/python-list


Re: daemon.DaemonContext

2014-05-22 Thread Ethan Furman

On 05/22/2014 07:31 AM, wo...@4amlunch.net wrote:
> I know it's 4 years later, but I'm currently battling this myself. I do exactly
> this and yet it doesn't appear to be keeping the filehandler open. Nothing ever
> gets written to logs after I daemonize!

You didn't include any context (important after four years!), so what are you
talking about?  And did you target the correct list?


--
~Ethan~
--
https://mail.python.org/mailman/listinfo/python-list


shebang & windows: call an extensionless git hook

2014-05-22 Thread Albert-Jan Roskam
Hi,

I wrote the git pre-commit hook below. It is supposed to reject commits that 
contain large files (e.g. accidental commits by inexperienced users, think of 
"git add .")


Anyway, I tried this under Linux, but the target platform is Windows. As per 
Git design the hook name *must* be "pre-commit" (no .py extension). How will 
Windows know that Python should be run? And (should it be relevant): how does 
Windows know which Python version to invoke? I read about custom shebangs with 
Pylauncher. Is that my only option? (see: 
https://bitbucket.org/vinay.sajip/pylauncher, 
http://legacy.python.org/dev/peps/pep-0397/)


In addition, I would really appreciate general feedback on the hook script 
below. 

Thanks!

Albert-Jan

albertjan@debian ~/Desktop/test_repo $ git config --global init.templatedir 
~/Desktop/git_template_dir
albertjan@debian ~/Desktop/test_repo $ cd ~/Desktop/git_template_dir
albertjan@debian ~/Desktop/git_template_dir $ cat hooks/pre-commit
#!/usr/bin/python
#-*- mode: python -*-

"""Git pre-commit hook: reject large files"""

import sys
import os
import re
from subprocess import Popen, PIPE

def git_filesize_hook(megabytes_cutoff=5, verbose=False):
    """Git pre-commit hook: Return error if the maximum file size in the HEAD
    revision exceeds the cutoff, success (0) otherwise. You can bypass
    this hook by specifying '--no-verify' as an option in 'git commit'."""
    if verbose: print(os.getcwd())
    cmd = "git ls-tree --full-tree -r -l HEAD"
    # universal_newlines=True yields text lines on both Python 2 and 3
    git = Popen(cmd, shell=True, stdout=PIPE, cwd=os.getcwd(),
                universal_newlines=True)
    # each line reads "<mode> <type> <sha> <size>\t<path>"
    get_size = lambda item: int(re.split(" +", item)[3].split("\t")[0])
    sizes = [get_size(line) for line in git.stdout.readlines()]
    cut_off_bytes = megabytes_cutoff * 2 ** 20
    # sizes is empty before the first commit, when HEAD does not exist yet
    if sizes and max(sizes) > cut_off_bytes:
        return ("ERROR: your commit contains at least one file "
                "that is larger than %d bytes" % cut_off_bytes)
    return 0

if __name__ == "__main__":
    sys.exit(git_filesize_hook(0.01, True))

albertjan@debian ~/Desktop/git_template_dir $ cd -
/home/antonia/Desktop/test_repo
albertjan@debian ~/Desktop/test_repo $ git init  ## this also fetches my own 
pre-commit hook from template_dir
Initialized empty Git repository in /home/antonia/Desktop/test_repo/.git/
albertjan@debian ~/Desktop/test_repo $ touch foo.txt
albertjan@debian ~/Desktop/test_repo $ git add foo.txt
albertjan@debian ~/Desktop/test_repo $ ls -l .git/hooks
total 4
-rw-r--r-- 1 albertjan albertjan 1468 May 22 14:49 pre-commit
albertjan@debian ~/Desktop/test_repo $ git commit -a -m "commit"   # hook 
does not yet work
[master (root-commit) dc82f3d] commit
 0 files changed
 create mode 100644 foo.txt
albertjan@debian ~/Desktop/test_repo $ chmod +x .git/hooks/pre-commit   ## can I avoid this 
in Linux? What should I do in Windows?
albertjan@debian ~/Desktop/test_repo $ echo "blaah\n" >> foo.txt
albertjan@debian ~/Desktop/test_repo $ git commit -a -m "commit"  # now the 
hook does its job
/home/antonia/Desktop/test_repo
ERROR: your commit contains at least one file that is larger than 1 bytes
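
A minimal sketch of a more defensive way to parse the 'git ls-tree -r -l' output, assuming the documented line layout of mode, type, object name and size, followed by a tab and the path. The function name oversized_files is hypothetical. It returns the offending paths rather than only the maximum, and it skips entries whose size column is '-' (submodules), which would make int() fail.

```python
def oversized_files(ls_tree_output, limit_bytes):
    """Return (path, size) pairs from `git ls-tree -r -l` output above the limit."""
    too_big = []
    for line in ls_tree_output.splitlines():
        if "\t" not in line:
            continue  # skip blank or unexpected lines
        meta, path = line.split("\t", 1)
        size = meta.split()[-1]  # last whitespace-separated field before the tab
        if size.isdigit() and int(size) > limit_bytes:
            too_big.append((path, int(size)))
    return too_big
```

Splitting on the first tab keeps paths that contain spaces intact, and empty output (for example before the first commit) simply yields an empty list.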

Regards,

Albert-Jan




~~

All right, but apart from the sanitation, the medicine, education, wine, public 
order, irrigation, roads, a 

fresh water system, and public health, what have the Romans ever done for us?

 ~~ 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: All-numeric script names and import

2014-05-22 Thread Xavier de Gaye

On 05/21/2014 03:46 PM, Chris Angelico wrote:
> If I have a file called 1.py, is there a way to import it? Obviously I
> can't import it as itself, but in theory, it should be possible to
> import something from it. I can manage it with __import__ (this is
> Python 2.7 I'm working on, at least for the moment), but not with the
> statement form.
>
> # from 1 import app as application # Doesn't work with a numeric name
> application = __import__("1").app
>
> Is there a way to tell Python that, syntactically, this thing that
> looks like a number is really a name? Or am I just being dumb?
>
> (Don't hold back on that last question. "Yes" is a perfectly
> acceptable answer. But please explain which of the several
> possibilities is the way I'm being dumb. Thanks!)
>
> ChrisA
>

import 1.py as module_1 on Python 2.7 (module_1 is not inserted in sys.modules):

>>> import imp
>>> module_1 = imp.new_module('module_1')
>>> execfile('1.py', module_1.__dict__)
>>> del module_1.__dict__['__builtins__']

Xavier

--
https://mail.python.org/mailman/listinfo/python-list


Re: All-numeric script names and import

2014-05-22 Thread Xavier de Gaye

On 05/22/2014 12:32 PM, Xavier de Gaye wrote:
> import 1.py as module_1 on Python 2.7 (module_1 is not inserted in
> sys.modules):
>
>  >>> import imp
>  >>> module_1 = imp.new_module('module_1')
>  >>> execfile('1.py', module_1.__dict__)
>  >>> del module_1.__dict__['__builtins__']


Oops.. should not remove the builtins and should add __file__.
With corrections:

>>> import imp
>>> module_1 = imp.new_module('module_1')
>>> execfile('1.py', module_1.__dict__)
>>> module_1.__file__ = '1.py'
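
For Python 3, where imp.new_module() and execfile() are not the usual route, a rough equivalent can be sketched with importlib.util (available from Python 3.5). The helper name load_by_path is hypothetical. As in the recipe above, the module is not inserted in sys.modules.

```python
import importlib.util


def load_by_path(module_name, path):
    """Load the file at `path` as a module called `module_name`."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's code in the new module
    return module
```

With this, module_1 = load_by_path('module_1', '1.py') gives a module whose __file__ is already set by the import machinery.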

Xavier
--
https://mail.python.org/mailman/listinfo/python-list


Re: All-numeric script names and import

2014-05-22 Thread Chris Angelico
On Thu, May 22, 2014 at 8:32 PM, Xavier de Gaye  wrote:
> import 1.py as module_1 on Python 2.7 (module_1 is not inserted in
> sys.modules):
>
> >>> import imp
> >>> module_1 = imp.new_module('module_1')
> >>> execfile('1.py', module_1.__dict__)
> >>> del module_1.__dict__['__builtins__']

Heh, I think __import__() is simpler than that :)
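
The same idea can also be written with importlib.import_module(), which exists in both Python 2.7 and 3 and takes the module name as an ordinary string. A small sketch, assuming the directory holding 1.py is on sys.path; the helper name get_app and its default attribute name are illustrative:

```python
import importlib


def get_app(module_name, attr="app"):
    """Import a module by string name and return one attribute from it."""
    # "1" is not a valid identifier, so `import 1` is a syntax error,
    # but the function form never goes through the parser.
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

So application = get_app("1") would do the same job as __import__("1").app.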

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Advice for choosing correct architecture/tech for a hobby project

2014-05-22 Thread Aseem Bansal
I am working on a hobby project - a Bookmarker 
https://github.com/anshbansal/Bookmarker. 

Basically, it stores bookmarks like those in a web browser, but in an app. The 
twist is storage by categories. I have spent some time choosing the correct 
tech for this project, but after going through this discussion on the Django 
forums it seems better to ask for some advice: 
https://groups.google.com/forum/#!topic/django-users/rSqSftkl5mg


I want to be able to add bookmarks to the app through browser. I want a 
front-end from which I am able to browse the bookmarks. The browsing front-end 
should have a search option(search for category) for filtering the bookmarks.

As per these requirements that I have framed so far I thought that a web 
framework would be a good choice and so I chose Django. The reason being the 
capability to add bookmarks through browser can be done easily through 
JavaScript. But I hit a snag today that webbrowser's won't allow client to open 
hyperlinks with file protocol. I have both offline and online bookmarks so that 
was a problem for me.

Now I am at the limit of my experience. I have spent 15-20 days' spare time 
trying to decide the technology, and now this snag. Can someone advise on this? 
Am I using the correct technology?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: daemon.DaemonContext and logging

2014-05-22 Thread Mark H Harris

On 5/22/14 10:28 AM, wo...@4amlunch.net wrote:
> On Saturday, April 10, 2010 11:52:41 PM UTC-4, Ben Finney wrote:
>> pid = daemon.pidlockfile.TimeoutPIDLockFile(
>> "/tmp/dizazzo-daemontest.pid", 10)
>
> Has pidlockfile been removed? (1.6)
>
> -brian

"Have you released the inertial dampener?"

:)

--
https://mail.python.org/mailman/listinfo/python-list


Re: Python is horribly slow compared to bash!!

2014-05-22 Thread Mark H Harris

On 5/22/14 5:54 AM, Chris Angelico wrote:
> Figure some of you folks might enjoy this. Look how horrible Python
> performance is!
>
> http://thedailywtf.com/Articles/Best-of-Email-Brains,-Security,-Robots,-and-a-Risky-Click.aspx

From TDWTF:

> Most of the interesting physics analysis code here is based
> on a framework using Python scripts for setup and configuration
> which then calls native analysis code, that usually is implemented in C++.

This goes back to a previous discussion about Julia (a couple of weeks 
back) and IPython. What these guys at CERN need is the dynamic duo of 
IPython and Julia. (It's gonna be fabulous, seriously.)

Or, Julia by itself. The whole point of the Julia project was to bring 
the whole dynamic scripting, glue, lightning fast FORTRAN or C++ 
specialty code, into one screaming fast package that "does it all".

Of course that's a pipe dream, but they are getting very close. And, if 
they pull off the IPython | Julia match-up thing, man, it's going to 
change the way technical computation is handled for decades to come.

Back to the TDWTF post, what a hoot. OK, you heard it there first, 
people: Python is dead, everyone learn BASH.  :-p  heh


marcus


--
https://mail.python.org/mailman/listinfo/python-list


Re: Advice for choosing correct architecture/tech for a hobby project

2014-05-22 Thread John Gordon
In <6a3c5b20-bce5-4c95-b27f-3840e9cc7...@googlegroups.com> Aseem Bansal 
 writes:

> But I hit a snag today that webbrowser's won't allow client to open
> hyperlinks with file protocol. I have both offline and online bookmarks
> so that was a problem for me.

What do you mean by saying "webbrowser's won't allow client to open
hyperlinks with file protocol"?  Of course they do.

My web browser works just fine with links such as this:

foo.html

-- 
John Gordon         Imagine what it must be like for a real medical doctor to
gor...@panix.com    watch 'House', or a real serial killer to watch 'Dexter'.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Advice for choosing correct architecture/tech for a hobby project

2014-05-22 Thread Mark H Harris

On 5/22/14 1:54 PM, Aseem Bansal wrote:
> I am working on a hobby project - a Bookmarker{snip}

Hi, no, Django is not really the correct tool-set. Django is for 
server-side content management, but who knows, you might come up with a 
great hack (I don't want to discourage you).  But a straight, trimmed-down 
Python app would probably be better...  what led you to Django?

It seems from your descriptions, which don't make sense by the way, that 
you are attempting to create your own 'browser' within your app (web 
API) and you want to use a standard browser (like Firefox or Chrome) to 
'front-end' the app's bookmarks. So, your app needs to be able to read 
your browser's bookmarks file.

Browsers most certainly can read http:// https:// file:// etc. (and many 
more). Your API may not be able to read local file:// URLs, but I'm 
skeptical about that (most web APIs have no trouble with file:// either).


Provide some more info, somebody will help.


marcus

--
https://mail.python.org/mailman/listinfo/python-list


Re: Advice for choosing correct architecture/tech for a hobby project

2014-05-22 Thread Ian Kelly
On Thu, May 22, 2014 at 1:28 PM, John Gordon  wrote:
> In <6a3c5b20-bce5-4c95-b27f-3840e9cc7...@googlegroups.com> Aseem Bansal 
>  writes:
>
>> But I hit a snag today that webbrowser's won't allow client to open
>> hyperlinks with file protocol. I have both offline and online bookmarks
>> so that was a problem for me.
>
> What do you mean by saying "webbrowser's won't allow client to open
> hyperlinks with file protocol"?  Of course they do.
>
> My web browser works just fine with links such as this:
>
> foo.html

It works if the document that contains the link is also opened from
the local filesystem, but browsers will refuse to follow the link if
it was served over http.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Advice for choosing correct architecture/tech for a hobby project

2014-05-22 Thread John Gordon
In  Ian Kelly 
 writes:

> > My web browser works just fine with links such as this:
> >
> > foo.html

> It works if the document that contains the link is also opened from
> the local filesystem, but browsers will refuse to follow the link if
> it was served over http.

Aha!  I didn't know that.  Now that I think about it, I suppose it makes
sense.

Perhaps the OP could write a separate application for handling local
files, something like:
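
A minimal Python 3 sketch of such a helper follows. It is not the example from the original message, and every name in it (local_target, OpenLocalHandler, serve, the /open?path=... URL shape, port 8765) is an assumption for illustration. The idea is that a page served over http links to http://127.0.0.1:8765/open?path=/some/file instead of a file:// URL, and a tiny local server asks the desktop to open that file.

```python
import webbrowser
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
from urllib.parse import parse_qs, urlparse


def local_target(request_path):
    """Turn '/open?path=/abs/file' into a file:// URI, or None if unusable."""
    values = parse_qs(urlparse(request_path).query).get("path")
    if not values:
        return None
    path = Path(values[0])
    if not path.is_absolute():
        return None  # Path.as_uri() only accepts absolute paths
    return path.as_uri()


class OpenLocalHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = local_target(self.path)
        if target is None:
            self.send_error(400, "expected ?path=<absolute path>")
            return
        webbrowser.open(target)  # ask the default browser to open the file
        self.send_response(204)  # nothing to display in the calling page
        self.end_headers()


def serve(port=8765):
    """Listen on the loopback interface only; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), OpenLocalHandler).serve_forever()
```

Calling serve() starts the helper. Any local process or web page that can reach the port could ask it to open files, so a real version would probably want to restrict the paths it accepts.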




-- 
John Gordon         Imagine what it must be like for a real medical doctor to
gor...@panix.com    watch 'House', or a real serial killer to watch 'Dexter'.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Advice for choosing correct architecture/tech for a hobby project

2014-05-22 Thread Ethan Furman

On 05/22/2014 11:54 AM, Aseem Bansal wrote:
> I am working on a hobby project - a Bookmarker
> https://github.com/anshbansal/Bookmarker.

Take a look at del.icio.us -- it seems to be a similar type of experience.

--
~Ethan~
--
https://mail.python.org/mailman/listinfo/python-list


Re: All-numeric script names and import

2014-05-22 Thread Rustom Mody
On Wednesday, May 21, 2014 7:16:46 PM UTC+5:30, Chris Angelico wrote:
> If I have a file called 1.py, is there a way to import it? Obviously I
> can't import it as itself, but in theory, it should be possible to
> import something from it. I can manage it with __import__ (this is
> Python 2.7 I'm working on, at least for the moment), but not with the
> statement form.


$ cat ا.py
x = 1
def foo(x): print("Hi %s!!" % x)



$ python3
Python 3.3.5 (default, Mar 22 2014, 13:24:53) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ا
>>> ا.foo('Chris')
Hi Chris!!
>>> ا.x
1
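
Why this works while 'import 1' does not can be checked directly in Python 3: the name after 'import' has to be a valid identifier, and the Arabic letter alef is a letter, not a digit. A small sketch; the helper name importable_name is hypothetical and it only covers simple, undotted names:

```python
import keyword


def importable_name(name):
    """True if `name` can appear after the `import` keyword as a plain name."""
    return name.isidentifier() and not keyword.iskeyword(name)
```

Here importable_name("1") is False, while importable_name("ا") and importable_name("module_1") are both True.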
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: All-numeric script names and import

2014-05-22 Thread Chris Angelico
On Fri, May 23, 2014 at 12:08 PM, Rustom Mody  wrote:
> $ cat ا.py
> x = 1
> def foo(x): print("Hi %s!!" % x)

Yeah, no thanks. I am not naming my scripts in Arabic. :)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list