Re: For-each behavior while modifying a collection

2013-11-29 Thread Valentin Zahnd
2013/11/28 Ned Batchelder :
> On 11/28/13 10:49 AM, Valentin Zahnd wrote:
>>
>> Hello
>>
>> For-each does not iterate over all entries of a collection if one
>> removes elements during the iteration.
>>
>> Example code:
>>
>> def keepByValue(self, key=None, value=[]):
>>     for row in self.flows:
>>         if not row[key] in value:
>>             self.flows.remove(row)
>>
>> It is clear why it behaves that way. Every time one removes an
>> element, the length of the collection decreases by one while the
>> counter of the for-each statement does not.
>> The questions are:
>> 1. Why does the interpreter not use a copy of the collection to
>> iterate over it? Are there performance reasons?
>
>
> Because implicit copying would be pointless in most cases.  Most loops don't
> even want to modify the collection, why copy all iterables just in case your
> loop might be one of the tiny few that might change the collection?
>

Okay, I get this point.
But how is the for-each parsed? Is it realized with an iterator, or is it
done with a for loop where a counter is incremented while it is less
than the length of the list, reading the element at the index of the
counter?
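The skipping described above can be reproduced in a few lines. This is a sketch with hypothetical rows (dicts with an `id` and a `proto` field; neither is from the thread). Each row gets a distinct `id` because `list.remove()` deletes the first *equal* element, and identical dicts would blur which row was removed. The copy-based variant is the one suggested in the reply below.

```python
def make_rows():
    # Hypothetical sample data: only the "tcp" row should survive the filter.
    return [
        {"id": 1, "proto": "udp"},
        {"id": 2, "proto": "udp"},
        {"id": 3, "proto": "tcp"},
        {"id": 4, "proto": "udp"},
    ]


def keep_by_value_buggy(flows, key, value):
    # Same logic as keepByValue: remove from the list while iterating over it.
    for row in flows:
        if row[key] not in value:
            flows.remove(row)  # later rows shift left; the loop's position does not


def keep_by_value_copy(flows, key, value):
    # Iterate over a shallow copy, remove from the original list.
    for row in list(flows):
        if row[key] not in value:
            flows.remove(row)


buggy = make_rows()
keep_by_value_buggy(buggy, "proto", ["tcp"])
print([r["id"] for r in buggy])  # [2, 3]: row 2 was never examined

fixed = make_rows()
keep_by_value_copy(fixed, "proto", ["tcp"])
print([r["id"] for r in fixed])  # [3]
```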

> Of course, if that price is acceptable to you, you could do the copy
> yourself:
>
>     for row in list(self.flows):
>         if row[key] not in value:
>             self.flows.remove(row)
>
>
>> 2. Why is the counter for the iteration not modified?
>
>
> Because the list and the iterator over the list are different objects. I
> suppose the list and the iterator could have been written to update when the
> list is modified, but it could get pretty complicated, even more so if you
> want to do the same for other collections like dictionaries.
>
> The best advice is: don't modify the list, instead make a new list:
>
> self.flows = [r for r in self.flows if r[key] not in value]
>
How does the interpreter evaluate the list comprehension?
The list I have to work with is not that small, so to avoid
duplicating it in memory the function currently looks like this:

    def keepByValue(self, key=None, value=[]):
        tmpFlows = []
        while len(self.flows) > 0:
            row = self.flows.pop()
            if row[key] in value:
                tmpFlows.append(row)
        self.flows = tmpFlows

If there is no duplication in memory, the list comprehension would be
much more elegant.

> Be careful though, since there might be other references to the list, and
> now you have two.
>
> --Ned.
>
>>
>> Valentin
>>
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list



Re: For-each behavior while modifying a collection

2013-11-29 Thread Chris Angelico
On Fri, Nov 29, 2013 at 9:14 AM, Valentin Zahnd  wrote:
>     def keepByValue(self, key=None, value=[]):
>         tmpFlows = []
>         while len(self.flows) > 0:
>             row = self.flows.pop()
>             if row[key] in value:
>                 tmpFlows.append(row)
>         self.flows = tmpFlows

This is almost certainly going to be less efficient than the
comprehension - it churns the length of the list terribly. There's
very little duplication, as the actual _contents_ will not be
duplicated, only references to them; so the list comp is both clean
and efficient.
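A small check of the point that only references are duplicated. It assumes, hypothetically, that each row is a dict carrying a large payload; `sys.getsizeof` reports only the size of the list object itself (its array of pointers), not the rows it refers to.

```python
import sys

# Hypothetical rows standing in for self.flows: odd ids are "tcp", even are "udp".
flows = [{"id": i, "proto": "tcp" if i % 2 else "udp", "payload": "x" * 1000}
         for i in range(1000)]

kept = [r for r in flows if r["proto"] in ("tcp",)]

print(kept is flows)        # False: the comprehension builds a new list object...
print(kept[0] is flows[1])  # True: ...but its items are the same dicts, not copies

# getsizeof() measures only the list object (roughly one pointer per item),
# not the dicts or the 1000-character strings they hold.
print(len(kept), sys.getsizeof(kept))
```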

More importantly, the way you write a Python program should be:

1) Make it work and look clean.
2) See if it's fast enough. If so - and it usually will be - you're done.
3) Profile it and find out where the slow bits really are.
4) Try to speed up some of the slow bits.

You'll seldom get past step 2. The difference between "fast" and "fast
enough" is usually not visible to a human.
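Step 2 ("see if it's fast enough") can be as simple as timing the candidates on realistic data. A minimal sketch, assuming rows are dicts; the function names and the data are hypothetical. Each run gets a fresh copy because the `pop()` version empties its input, and that copy is kept outside the timed region. No timings are claimed here; they depend on the machine and the data.

```python
import time


def keep_comp(rows, key, value):
    # List-comprehension version: one pass, keeps the original order.
    return [r for r in rows if r[key] in value]


def keep_pop(rows, key, value):
    # The pop() loop from the thread: empties `rows`, returns kept rows REVERSED.
    kept = []
    while len(rows) > 0:
        row = rows.pop()
        if row[key] in value:
            kept.append(row)
    return kept


def best_time(fn, data, key, value, repeat=5):
    """Return the fastest of `repeat` runs; copying `data` is not timed."""
    best = float("inf")
    for _ in range(repeat):
        rows = list(data)  # fresh shallow copy per run
        start = time.perf_counter()
        fn(rows, key, value)
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical data: 200,000 rows, half of which are kept.
DATA = [{"proto": "tcp" if i % 2 else "udp"} for i in range(200_000)]

for fn in (keep_comp, keep_pop):
    print(f"{fn.__name__}: {best_time(fn, DATA, 'proto', ('tcp',)):.4f} s")
```

Only if the numbers turn out to matter is it worth moving on to step 3 and profiling.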

ChrisA


Re: For-each behavior while modifying a collection

2013-11-29 Thread Ned Batchelder

On 11/28/13 5:14 PM, Valentin Zahnd wrote:
> 2013/11/28 Ned Batchelder :
>> On 11/28/13 10:49 AM, Valentin Zahnd wrote:
>>>
>>> Hello
>>>
>>> For-each does not iterate over all entries of a collection if one
>>> removes elements during the iteration.
>>>
>>> Example code:
>>>
>>> def keepByValue(self, key=None, value=[]):
>>>     for row in self.flows:
>>>         if not row[key] in value:
>>>             self.flows.remove(row)
>>>
>>> It is clear why it behaves that way. Every time one removes an
>>> element, the length of the collection decreases by one while the
>>> counter of the for-each statement does not.
>>> The questions are:
>>> 1. Why does the interpreter not use a copy of the collection to
>>> iterate over it? Are there performance reasons?
>>
>>
>> Because implicit copying would be pointless in most cases.  Most loops don't
>> even want to modify the collection, why copy all iterables just in case your
>> loop might be one of the tiny few that might change the collection?
>>
>
> Okay, I get this point.
> But how is the for-each parsed? Is it realized with an iterator, or is it
> done with a for loop where a counter is incremented while it is less
> than the length of the list, reading the element at the index of the
> counter?


The for loop creates an iterator over the expression, and pulls values 
from the iterator, running the body once for each value.
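In other words, `for x in seq: body` behaves roughly like the explicit loop below. This is a simplified sketch that ignores `break`/`else` handling; `manual_for` and `visit_and_remove` are illustrative names, not anything provided by the language.

```python
def manual_for(iterable, body):
    """Roughly what `for x in iterable: body(x)` does (no break/else handling)."""
    it = iter(iterable)        # ask the iterable for an iterator, once
    while True:
        try:
            x = next(it)       # pull the next value
        except StopIteration:  # the iterator is exhausted
            break
        body(x)                # run the loop body with that value


seen = []
manual_for([10, 20, 30], seen.append)
print(seen)                    # [10, 20, 30]

# The list's iterator is a separate object that only remembers its position.
# Removing items shifts the remaining ones left, so the next position skips one.
items = [1, 2, 3, 4]
visited = []


def visit_and_remove(x):
    visited.append(x)
    items.remove(x)


manual_for(items, visit_and_remove)
print(visited, items)          # [1, 3] [2, 4]
```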


This presentation explains the mechanics of loops, along with some 
examples of more exotic uses, and (at least) one bad joke: 
http://nedbatchelder.com/text/iter.html





>> Of course, if that price is acceptable to you, you could do the copy
>> yourself:
>>
>>     for row in list(self.flows):
>>         if row[key] not in value:
>>             self.flows.remove(row)
>>
>>
>>> 2. Why is the counter for the iteration not modified?
>>
>>
>> Because the list and the iterator over the list are different objects. I
>> suppose the list and the iterator could have been written to update when the
>> list is modified, but it could get pretty complicated, even more so if you
>> want to do the same for other collections like dictionaries.
>>
>> The best advice is: don't modify the list, instead make a new list:
>>
>>     self.flows = [r for r in self.flows if r[key] not in value]
>>
> How does the interpreter evaluate the list comprehension?
> The list I have to work with is not that small, so to avoid
> duplicating it in memory the function currently looks like this:
>
>     def keepByValue(self, key=None, value=[]):
>         tmpFlows = []
>         while len(self.flows) > 0:
>             row = self.flows.pop()
>             if row[key] in value:
>                 tmpFlows.append(row)
>         self.flows = tmpFlows
>
> If there is no duplication in memory, the list comprehension would be
> much more elegant.



The list comprehension won't copy the elements of the list, if that is 
what you're worried about.  And it won't reverse the order of your list 
the way this loop does.  BTW, my list comprehension got the sense of the 
condition wrong; it should be:

    self.flows = [r for r in self.flows if r[key] in value]
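The caveat about other references to the list can be handled by replacing the list's contents in place with slice assignment instead of rebinding `self.flows`. This technique is not proposed in the thread; the sketch below uses a hypothetical `FlowTable` class standing in for the original one, with the corrected condition. Like the plain comprehension, it still builds a temporary list of references, and it preserves the original order.

```python
class FlowTable:
    """Hypothetical stand-in for the class that owns self.flows."""

    def __init__(self, flows):
        self.flows = flows

    def keep_by_value(self, key, value):
        # Keep rows whose key IS in value (the corrected condition).
        # Assigning to self.flows[:] mutates the existing list object, so every
        # other reference to that list sees the filtered contents too.
        self.flows[:] = [r for r in self.flows if r[key] in value]


rows = [{"id": 1, "proto": "udp"}, {"id": 2, "proto": "tcp"}, {"id": 3, "proto": "tcp"}]
table = FlowTable(rows)
alias = table.flows                  # some other code holding the same list
table.keep_by_value("proto", ["tcp"])
print([r["id"] for r in alias])      # [2, 3]: same list object, order preserved
```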

--Ned.


>> Be careful though, since there might be other references to the list, and
>> now you have two.
>>
>> --Ned.
>>
>>>
>>> Valentin






Re: how to implement a queue-like container with sort function

2013-11-29 Thread iMath
It seems PriorityQueue satisfies my requirement here.

BTW, the Queue object has an attribute 'queue', but I cannot find it described
in the docs. What does it mean?
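For reference, a minimal PriorityQueue sketch (Python 3 module name `queue`; in Python 2 the module is called `Queue`). The `queue` attribute is created on each *instance* when it is constructed, so `queue.Queue().queue` exists while looking it up on the class itself raises AttributeError. It is the container used internally (a `collections.deque` for `Queue`, a heap-ordered list for `PriorityQueue`). It is not part of the documented API, and reading it directly bypasses the queue's lock, so treat it as an implementation detail.

```python
import collections
import queue  # Python 3 name; the Python 2 module is called Queue

pq = queue.PriorityQueue()
for item in [(3, "low"), (1, "urgent"), (2, "normal")]:
    pq.put(item)                 # tuples compare by their first field

order = []
while not pq.empty():
    order.append(pq.get()[1])    # smallest priority number comes out first
print(order)                     # ['urgent', 'normal', 'low']

# The undocumented `queue` attribute exists on instances, not on the class.
print(hasattr(queue.Queue, "queue"))                       # False
print(isinstance(queue.Queue().queue, collections.deque))  # True
print(type(queue.PriorityQueue().queue))                   # <class 'list'>
```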


Re: Python application for rpm creation

2013-11-29 Thread Unix SA
Hello,

Thanks Runge,

I am not looking for a tool that helps create packages for Python code;
I am looking for a tool that creates a package or SPEC file for any kind
of library, whether it's C, .NET, Java or anything else.

Regards,
DJ


On Wed, Nov 27, 2013 at 2:41 PM, Matthias Runge wrote:

> On 11/27/2013 03:28 AM, Amit Saha wrote:
> > On Wed, Nov 27, 2013 at 1:39 AM, Unix SA  wrote:
> >>
> >>> Sounds to me more like he is looking to package some other in house
> >>> software, as opposed to packaging python specific libraries, etc..
> >>
> >> - Yes, This is exactly i am looking at
> >>
> >>
> >>> Doing an apt-cache search on my Ubuntu desktop results with a project,
> >>> Spectacle, coincidentally written in Python. (I haven't really looked
> >>> into it):
> >>> http://meego.gitorious.org/meego-developer-tools/spectacle
> >>
> >> This looks useful; I shall look into this... or maybe try writing
> >> something myself.
> >>
> >> If you guys (others) have something else for Red Hat Linux RPM creation,
> >> do let me know.
> >
> > I played with creating an RPM SPEC file "generator" for Sphinx
> > documentation:
> > https://github.com/amitsaha/sphinx_doc_packaging
> >
> > It's written in Python, so perhaps it may help you as a starting point.
> >
> > Best,
> > Amit.
> >
> >
> In Fedora (and IMHO in EPEL, too) there is a package named pyp2rpm. This
> is quite handy. It fetches sources from pypi, writes a basic SPEC file,
> which might need minor tweaks, but in general, it really saves you time.
>
> Matthias


Re: Managing Google Groups headaches

2013-11-29 Thread Mark Lawrence

On 28/11/2013 16:29, Zero Piraeus wrote:
> :
>
> On Thu, Nov 28, 2013 at 08:40:47AM -0700, Michael Torrie wrote:
>> My opinion is that the Python list should dump the Usenet tie-in and
>> just go straight e-mail.
>
> +1 Hell yes.



I'd happily use semaphore but given time you're bound to find someone 
who could screw that up.  So I'll stick with Thunderbird and gmane, 
reading some 40-ish Python lists and blogs.  Well, I think they're blogs :)


--
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence



Re: Managing Google Groups headaches

2013-11-29 Thread Mark Lawrence

On 29/11/2013 00:46, Arif Khokar wrote:
> On 11/28/2013 1:50 PM, Michael Torrie wrote:
>> On 11/28/2013 11:37 AM, rusi wrote:
>
>>> 2. All kinds of people hop onto the list. In addition to genuine ones
>>> there are spammers, trolls, dicks, nuts, philosophers, help-vampires
>>> etc etc.
>>
>> What they have in common is usenet.  Ditching usenet would solve both
>> problems.
>
> The problem could also be solved through client side filtering (i. e.,
> killfiles).  I usually killfile posters who crosspost to unrelated
> groups (which filters 99% of the spam that comes through).  I'm sure
> that the usenet/email gateway could be configured to filter such posts
> on the server side so those who read this list via email won't have
> those problems.



Read through gmane, it's effectively spam free.

--
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence



Re: how to implement a queue-like container with sort function

2013-11-29 Thread Mark Lawrence

On 29/11/2013 12:33, iMath wrote:
>
> BTW ,the Queue object has an attribute 'queue' ,but I cannot find it described
> in the DOC ,what it means ?



Really? AttributeError: type object 'Queue' has no attribute 'queue'

--
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence



Re: Managing Google Groups headaches

2013-11-29 Thread Grant Edwards
On 2013-11-28, Zero Piraeus  wrote:
>:
>
> On Thu, Nov 28, 2013 at 08:40:47AM -0700, Michael Torrie wrote:
>> My opinion is that the Python list should dump the Usenet tie-in and
>> just go straight e-mail.
>
> +1 Hell yes.

I'd have to reluctantly agree.  I've been using Usenet for 25 years,
and I still read this as comp.lang.python, but this is practically the
only Usenet group left that I follow.  There are a number of mailing
lists I follow via gmane's NNTP server, and I can certainly do the
same for this one.

I've been filtering out all postings from GG for years, so it doesn't
really matter to me, but there are apparently a lot of people with
defective mail/news clients for whom that's not possible?
[Otherwise, I don't understand what all the complaining is about.]

-- 
Grant





Re: Jython - Can't access enumerations?

2013-11-29 Thread Eamonn Rea
Ok, here's the code:

body_def = BodyDef()  # Imports the BodyDef class fine.

body_def.type = BodyDef.DynamicBody  # Says it can't find a module from LibGDX called BodyDef.

All my code:

from com.badlogic.gdx import Game, Gdx, Screen
from com.badlogic.gdx.backends.lwjgl import LwjglApplicationConfiguration, \
    LwjglApplication
from com.badlogic.gdx.graphics import OrthographicCamera, GL20
from com.badlogic.gdx.graphics.g2d import SpriteBatch
from com.badlogic.gdx.math import Vector2
from com.badlogic.gdx.physics.box2d import Box2DDebugRenderer, World, BodyDef, \
    FixtureDef, PolygonShape

import Player


class Playing(Screen):
    def __init__(self):
        self.batch = None
        self.renderer = None
        self.world = None
        self.camera = None
        self.player = None
        self.player_body_def = None
        self.player_fixture_def = None
        self.player_shape = None


    def show(self):
        self.batch = SpriteBatch()
        self.renderer = Box2DDebugRenderer()
        self.camera = OrthographicCamera()
        self.player_body_def = BodyDef()
        self.player_fixture_def = FixtureDef()
        self.player_shape = PolygonShape()

        self.world = World(Vector2(0, -9.81), True)

        self.player_shape.setAsBox(0.5, 1)

        self.player_body_def.fixedRotation = True
        self.player_body_def.position.set(0, 0)
        self.player_body_def.type = BodyDef.DynamicBody

        self.player_fixture_def.density = 1
        self.player_fixture_def.friction = 0.5
        self.player_fixture_def.shape = self.player_shape
        self.player_fixture_def.restitution = 0.01

        self.player = Player(self.world, self.player_body_def,
                             self.player_fixture_def)


    def resize(self, width, height):
        pass


    def pause(self):
        pass


    def resume(self):
        pass


    def dispose(self):
        pass


    def render(self, delta):
        Gdx.gl.glClearColor(0, 0, 0, 1)
        Gdx.gl.glClear(GL20.GL_COLOR_BUFFER_BIT)

        self.renderer.render(self.camera.combined, self.world)

BodyDef.DynamicBody is definitely in the BodyType enumeration. I've tested it in
Java.


Re: Managing Google Groups headaches

2013-11-29 Thread Grant Edwards
On 2013-11-29, Arif Khokar  wrote:
> On 11/28/2013 1:50 PM, Michael Torrie wrote:
>> On 11/28/2013 11:37 AM, rusi wrote:
>
>>> 2. All kinds of people hop onto the list. In addition to genuine ones there 
>>> are
>>> spammers, trolls, dicks, nuts, philosophers, help-vampires etc etc.
>>
>> What they have in common is usenet.  Ditching usenet would solve both
>> problems.
>
> The problem could also be solved through client side filtering (i. e., 
> killfiles).  I usually killfile posters who crosspost to unrelated 
> groups (which filters 99% of the spam that comes through).  I'm sure 
> that the usenet/email gateway could be configured to filter such posts 
> on the server side so those who read this list via email won't have 
> those problems.
>
> The problem with just using email is that it's a bit more difficult to 
> browse archived posts to this group.  After I subscribed to this group 
> (comp.lang.python) using my news client, I could immediately browse 
> posts made as far back as April.

You're assuming that Usenet === NNTP.  You can point your news client
at gmane.org's NNTP server and get all the benefits of "news" for
regular mailing lists.

-- 
Grant


strip away html tags from extracted links

2013-11-29 Thread Max Cuban
I have the following code to extract certain links from a webpage:

from bs4 import BeautifulSoup
import urllib2, sys
import re

def tonaton():
    site = "http://tonaton.com/en/job-vacancies-in-ghana"
    hdr = {'User-Agent' : 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    jobpass = urllib2.urlopen(req)
    invalid_tag = ('h2')
    soup = BeautifulSoup(jobpass)
    print soup.find_all('h2')

The links are contained in the 'h2' tags so I get the links as follows:

cashiers 
Cake baker
Automobile Technician
Marketing Officer

But I'm interested in getting rid of all the 'h2' tags so that I have links 
only in this manner:

cashiers 
Cake baker
Automobile Technician
Marketing Officer
 

I therefore updated my code to look like this:

def tonaton():
    site = "http://tonaton.com/en/job-vacancies-in-ghana"
    hdr = {'User-Agent' : 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    jobpass = urllib2.urlopen(req)
    invalid_tag = ('h2')
    soup = BeautifulSoup(jobpass)
    jobs = soup.find_all('h2')
    for tag in invalid_tag:
        for match in jobs(tag):
            match.replaceWithChildren()
    print jobs

But I couldn't get it to work, even though I thought that was the best logic I
could come up with. I'm a newbie, though, so I know there is something better that
could be done.
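One problem in the second version is easy to check in isolation: parentheses alone do not make a tuple, so `invalid_tag = ('h2')` is just the string `'h2'`, and looping over it yields the characters `'h'` and `'2'`. A one-element tuple needs a trailing comma. (Separately, `find_all()` returns a list-like `ResultSet`, which cannot be called the way `jobs(tag)` does.)

```python
invalid_tag = ('h2')   # parentheses only group; this is the str 'h2'
print(type(invalid_tag).__name__, list(invalid_tag))  # str ['h', '2']

tags = ('h2',)         # the trailing comma makes a one-element tuple
print(type(tags).__name__, list(tags))                # tuple ['h2']

for tag in tags:
    print(tag)         # h2 (once), instead of 'h' and then '2'
```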

Any help will be gratefully appreciated.

Thanks


Re: strip away html tags from extracted links

2013-11-29 Thread Mark Lawrence

On 29/11/2013 16:56, Max Cuban wrote:
> I have the following code to extract certain links from a webpage:
>
> from bs4 import BeautifulSoup
> import urllib2, sys
> import re
>
> def tonaton():
>     site = "http://tonaton.com/en/job-vacancies-in-ghana"
>     hdr = {'User-Agent' : 'Mozilla/5.0'}
>     req = urllib2.Request(site, headers=hdr)
>     jobpass = urllib2.urlopen(req)
>     invalid_tag = ('h2')
>     soup = BeautifulSoup(jobpass)
>     print soup.find_all('h2')
>
> The links are contained in the 'h2' tags so I get the links as follows:
>
> cashiers
> Cake baker
> Automobile Technician
> Marketing Officer
>
> But I'm interested in getting rid of all the 'h2' tags so that I have links
> only in this manner:
>
> cashiers
> Cake baker
> Automobile Technician
> Marketing Officer
>
>
> I therefore updated my code to look like this:
>
> def tonaton():
>     site = "http://tonaton.com/en/job-vacancies-in-ghana"
>     hdr = {'User-Agent' : 'Mozilla/5.0'}
>     req = urllib2.Request(site, headers=hdr)
>     jobpass = urllib2.urlopen(req)
>     invalid_tag = ('h2')
>     soup = BeautifulSoup(jobpass)
>     jobs = soup.find_all('h2')
>     for tag in invalid_tag:
>         for match in jobs(tag):
>             match.replaceWithChildren()
>     print jobs
>
> But I couldn't get it to work, even though  I thought that was the best logic i
> could come up with.I'm a newbie though so I know there is something better that
> could be done.
>
> Any help will be gracefully appreciated
>
> Thanks



Please help us to help you.  A good starter is your versions of Python 
and OS.  But more importantly here, what does "couldn't get it to work" 
mean?  The output you get isn't what you expected?  You get a traceback, 
in which case please give us the whole of the output, not just the last 
line?


One last thing, I observe that you've a gmail address.  This is 
currently guaranteed to send shivers down my spine.  So if you're using 
google groups, would you be kind enough to read and action this, 
https://wiki.python.org/moin/GoogleGroupsPython, thanks.


--
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence



Re: strip away html tags from extracted links

2013-11-29 Thread Chris Angelico
On Sat, Nov 30, 2013 at 4:33 AM, Mark Lawrence  wrote:
> One last thing, I observe that you've a gmail address.  This is currently
> guaranteed to send shivers down my spine.  So if you're using google groups,
> would you be kind enough to read and action this,
> https://wiki.python.org/moin/GoogleGroupsPython, thanks.

Don't blame all gmail users, some of us are using the mailing list. :)
You should be able to check the headers - with the email posts,
there's an Injection-Info header which cites Google Groups. Presumably
you get the same or similar if you read as a newsgroup.

And the OP was, indeed, using GG. Why is it suddenly so popular?

ChrisA


Re: strip away html tags from extracted links

2013-11-29 Thread Joel Goldstick
On Fri, Nov 29, 2013 at 12:33 PM, Mark Lawrence wrote:

> On 29/11/2013 16:56, Max Cuban wrote:
>
>> I have the following code to extract certain links from a webpage:
>>
>> from bs4 import BeautifulSoup
>> import urllib2, sys
>> import re
>>
>> def tonaton():
>>     site = "http://tonaton.com/en/job-vacancies-in-ghana"
>>     hdr = {'User-Agent' : 'Mozilla/5.0'}
>>     req = urllib2.Request(site, headers=hdr)
>>     jobpass = urllib2.urlopen(req)
>>     invalid_tag = ('h2')
>>     soup = BeautifulSoup(jobpass)
>>     print soup.find_all('h2')
>>
>> The links are contained in the 'h2' tags so I get the links as follows:
>>
>> cashiers
>> Cake baker
>> Automobile Technician
>> Marketing Officer
>>
>> But I'm interested in getting rid of all the 'h2' tags so that I have
>> links only in this manner:
>>
>> cashiers
>> Cake baker
>> Automobile Technician
>> Marketing Officer
>>
>
This is more a Beautiful Soup question than a Python one.  Have you gone
through their tutorial?  They have an example that looks close here:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/

One common task is extracting all the URLs found within a page’s <a> tags:

for link in soup.find_all('a'):
    print(link.get('href'))
# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie

In your case, you want the href values for the <a> children of the h2 elements.

So this might be close (untested):

for link in soup.find_all('a'):
    print (link.a.get('href'))
# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie
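A sketch of the approach described above (find each <h2>, then take the <a> inside it), written for Python 3 and run against a small hypothetical HTML snippet rather than the real page, whose markup is not shown in the thread. It uses Beautiful Soup's built-in "html.parser" backend, so no extra parser needs to be installed; `links_inside_h2` is an illustrative name.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real job-listing page.
SAMPLE_HTML = """
<h2><a href="/job/1">cashiers</a></h2>
<h2><a href="/job/2">Cake baker</a></h2>
<h2>No link here</h2>
"""


def links_inside_h2(html):
    """Return the <a> tags nested in <h2> elements, without the <h2> wrapper."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for h2 in soup.find_all("h2"):
        a = h2.find("a")   # first <a> inside this <h2>, or None
        if a is not None:  # skip headings that contain no link
            links.append(a)
    return links


for a in links_inside_h2(SAMPLE_HTML):
    print(a)               # e.g. <a href="/job/1">cashiers</a>
    print(a.get("href"))   # just the URL
```

If the goal is instead to strip the <h2> wrappers inside the parsed document itself, bs4's `Tag.unwrap()` (the newer name for `replaceWithChildren()`) does that one tag at a time, e.g. `for h2 in soup.find_all('h2'): h2.unwrap()`.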






-- 
Joel Goldstick
http://joelgoldstick.com


Re: strip away html tags from extracted links

2013-11-29 Thread Joel Goldstick
On Fri, Nov 29, 2013 at 12:44 PM, Joel Goldstick
wrote:

>
>
>
> On Fri, Nov 29, 2013 at 12:33 PM, Mark Lawrence 
> wrote:
>
>> On 29/11/2013 16:56, Max Cuban wrote:
>>
>>> I have the following code to extract certain links from a webpage:
>>>
>>> from bs4 import BeautifulSoup
>>> import urllib2, sys
>>> import re
>>>
>>> def tonaton():
>>>     site = "http://tonaton.com/en/job-vacancies-in-ghana"
>>>     hdr = {'User-Agent' : 'Mozilla/5.0'}
>>>     req = urllib2.Request(site, headers=hdr)
>>>     jobpass = urllib2.urlopen(req)
>>>     invalid_tag = ('h2')
>>>     soup = BeautifulSoup(jobpass)
>>>     print soup.find_all('h2')
>>>
>>> The links are contained in the 'h2' tags so I get the links as follows:
>>>
>>> cashiers
>>> Cake baker
>>> Automobile Technician
>>> Marketing Officer
>>>
>>> But I'm interested in getting rid of all the 'h2' tags so that I have
>>> links only in this manner:
>>>
>>> cashiers
>>> Cake baker
>>> Automobile Technician
>>> Marketing Officer
>>>
>>
> This is more a beautiful soup question than python.  Have you gone
> through their tutorial.  Check here:
>
> They have an example that looks close here:
> http://www.crummy.com/software/BeautifulSoup/bs4/doc/
>
> One common task is extracting all the URLs found within a page’s <a> tags:
>
> for link in soup.find_all('a'):
>     print(link.get('href'))
> # http://example.com/elsie
> # http://example.com/lacie
> # http://example.com/tillie
>
> In your case, you want the href values for the <a> children of the h2 elements.
>
> So this might be close (untested)
>

Pardon my typo.  Try this:

>
> for link in soup.find_all('h2'):
>     print (link.a.get('href'))
> # http://example.com/elsie
> # http://example.com/lacie
> # http://example.com/tillie
>
>
>
>
>
>
> --
> Joel Goldstick
> http://joelgoldstick.com
>



-- 
Joel Goldstick
http://joelgoldstick.com


Re: strip away html tags from extracted links

2013-11-29 Thread Gene Heskett
On Friday 29 November 2013 13:44:57 Chris Angelico did opine:

> On Sat, Nov 30, 2013 at 4:33 AM, Mark Lawrence wrote:
> > One last thing, I observe that you've a gmail address.  This is
> > currently guaranteed to send shivers down my spine.  So if you're
> > using google groups, would you be kind enough to read and action
> > this,
> > https://wiki.python.org/moin/GoogleGroupsPython, thanks.
> 
> Don't blame all gmail users, some of us are using the mailing list. :)
> You should be able to check the headers - with the email posts,
> there's an Injection-Info header which cites Google Groups. Presumably
> you get the same or similar if you read as a newsgroup.
> 
> And the OP was, indeed, using GG. Why is it so suddenly so popular?
> 
> ChrisA

Thank you for that hint Chris, it should enhance my enjoyment of this list.

Cheers, Gene
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page 

There is a 20% chance of tomorrow.
A pen in the hand of this president is far more
dangerous than 200 million guns in the hands of
 law-abiding citizens.


Re: Jython - Can't access enumerations?

2013-11-29 Thread Tim Delaney
On 30 November 2013 03:15, Eamonn Rea  wrote:

> Ok, here's the code:
> [elided]
>

As I said, please also show the *exact* error - copy and paste the stack
trace.

Tim Delaney


Need help with programming in python for class (beginner level)

2013-11-29 Thread farhanken
It's for a school assignment. Basically, I need to roll 5 dice with 6 sides
each, so basically 5 random numbers. That part is easy. Then I need to add them
up. OK, done that. However, I also need to say something along the lines of
"your total number was X". That's what I'm having trouble with. I added the
dice rolls together and put them into a variable I called "number", but it seems
to glitch out whenever that variable is used in any command other than
"print number". For example, if I try to write:

print "your total number was:" number "" 

It just doesn't work. 

Here is my code so far:


import cgi

form = cgi.FieldStorage()
name = form.getvalue("name")
value = form.getvalue("value")


print """Content-type: text/html
http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";>

A CGI Script

"""
import random

die1 = random.randint(1,6)
die2 = random.randint(1,6)
die3 = random.randint(1,6)
die4 = random.randint(1,6)
die5 = random.randint(1,6)
print die1, die2, die3, die4, die5
number = die1 + die2 + die3 + die4 + die5
print "The total rolled was: "number" "

print "Thanks for playing, "  + name +  "."
print "You bet the total would be at least " + value + "."
print ""


Re: Need help with programming in python for class (beginner level)

2013-11-29 Thread Johannes Findeisen
On Fri, 29 Nov 2013 16:31:21 -0800 (PST)
farhan...@gmail.com wrote:

> print "The total rolled was: "number" "

The above line is wrong. You did it right below:

> print "Thanks for playing, "  + name +  "."
> print "You bet the total would be at least " + value + "."

Do this (number is an int, so convert it with str() before joining it to a string):

print "The total rolled was: " + str(number) + " "
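Because `number` is an int, `"..." + number` raises a TypeError; the value has to be converted first or passed through string formatting. A minimal sketch in Python 3 syntax (the thread's code is Python 2, where `print` is a statement), reusing the assignment's five-dice setup:

```python
import random

# Roll five six-sided dice and total them.
dice = [random.randint(1, 6) for _ in range(5)]
number = sum(dice)

try:
    "The total rolled was: " + number          # str + int
except TypeError as exc:
    print("TypeError:", exc)

# Three equivalent fixes:
line1 = "The total rolled was: " + str(number)  # explicit conversion
line2 = "The total rolled was: {}".format(number)
line3 = f"The total rolled was: {number}"       # Python 3.6+
print(*dice)
print(line1)
```

In Python 2, `print "The total rolled was:", number` also works, because the comma form converts each item to a string itself.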

Regards,
Johannes


Python Unicode handling wins again -- mostly

2013-11-29 Thread Steven D'Aprano
There's a recent blog post complaining about the lousy support for 
Unicode text in most programming languages:

http://mortoray.com/2013/11/27/the-string-type-is-broken/

The author, Mortoray, gives nine basic tests to understand how well the 
string type in a language works. The first four involve "user-perceived 
characters", also known as grapheme clusters.


(1) Does the decomposed string "noe\u0308l" print correctly? Notice that 
the accented letter ë has been decomposed into a pair of code points, 
U+0065 (LATIN SMALL LETTER E) and U+0308 (COMBINING DIAERESIS).

Python 3.3 passes this test:

py> print("noe\u0308l")
noël

although I expect that depends on the terminal you are running in.


(2) If you reverse that string, does it give "lëon"? The implication of 
this question is that strings should operate on grapheme clusters rather 
than code points. Python fails this test:

py> print("noe\u0308l"[::-1])
l̈eon

Some terminals may display the umlaut over the l, or following the l.

I'm not completely sure it is fair to expect a string type to operate on 
grapheme clusters (collections of decomposed characters) as the author 
expects. I think that is going above and beyond what a basic string type 
should be expected to do. I would expect a solid Unicode implementation 
to include support for grapheme clusters, and in that regard Python is 
lacking functionality.


(3) What are the first three characters? The author suggests that the 
answer should be "noë", in which case Python fails again:

py> print("noe\u0308l"[:3])
noe

but again I'm not convinced that slicing should operate across decomposed 
strings in this way. Surely the point of decomposing the string like that 
is in order to count the base character e and the accent "\u0308" 
separately?


(4) Likewise, what is the length of the decomposed string? The author 
expects 4, but Python gives 5:

py> len("noe\u0308l")
5

So far, Python passes only one of the four tests, but I'm not convinced 
that the three failed tests are fair for a string type. If strings 
operated on grapheme clusters, these would be good tests, but it is not a 
given that strings should.
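Python's standard library has no grapheme-cluster segmentation, but for this particular example Unicode normalization produces the "expected" answers: NFC composition turns e + U+0308 into the single precomposed code point U+00EB. This is a narrower technique than true grapheme handling, since it only helps where a precomposed character exists. A sketch using `unicodedata.normalize`:

```python
import unicodedata

decomposed = "noe\u0308l"
composed = unicodedata.normalize("NFC", decomposed)  # e + U+0308 -> U+00EB

print(len(decomposed), len(composed))  # 5 4
print(composed[::-1])                  # lëon
print(composed[:3])                    # noë

# Normalizing both sides is also how to compare composed and decomposed text.
print(unicodedata.normalize("NFC", "no\u00ebl") == composed)  # True
print(decomposed == composed)                                 # False
```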

The next few tests have to do with characters in the Supplementary 
Multilingual Planes, and this is where Python 3.3 shines. (In older 
versions, wide builds would also pass, but narrow builds would fail.)

(5) What is the length of "😸😾"?

Both characters U+1F638 (GRINNING CAT FACE WITH SMILING EYES) and U+1F63E 
(POUTING CAT FACE) are outside the Basic Multilingual Plane, which means 
they require more than two bytes each. Most programming languages using 
UTF-16 encodings internally (including Javascript and Java) fail this 
test. Python 3.3 passes:

py> s = '😸😾'
py> len(s)
2

(Older versions of Python distinguished between *narrow builds*, which
used UTF-16 internally, and *wide builds*, which used UTF-32. Narrow
builds would also fail this test.)

This makes Python one of very few programming languages that can
easily handle so-called "astral characters" from the Supplementary
Multilingual Planes while still having O(1) indexing operations.
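To see where the UTF-16 answer of 4 comes from, count UTF-16 code units instead of code points. This sketch uses Python's own `utf-16-le` codec as a stand-in for a UTF-16 language's internal storage (an illustrative assumption, not how Java or JavaScript are actually implemented):

```python
import sys

s = "\U0001F638\U0001F63E"  # 😸😾 written as escapes

code_points = len(s)
# Each UTF-16 code unit is two bytes; astral characters need two units each.
utf16_units = len(s.encode("utf-16-le")) // 2

print(code_points, utf16_units)  # 2 4
print(hex(sys.maxunicode))       # 0x10ffff on Python 3.3 and later
```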


(6) What is the substring after the first character? The right answer is 
a single character POUTING CAT FACE, and Python gets that correct:

py> unicodedata.name(s[1:])
'POUTING CAT FACE'

UTF-16 languages invariably end up with broken, invalid strings
containing half of a surrogate pair.
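The broken result can be simulated by slicing 16-bit code units rather than code points. A sketch, again assuming the `utf-16-le` encoding is a fair model of a UTF-16 language's storage:

```python
import unicodedata

s = "\U0001F638\U0001F63E"

# Python slices by code point, so the tail is exactly one whole character.
tail = s[1:]
print(unicodedata.name(tail))  # POUTING CAT FACE

# Simulate a UTF-16 language: split the string into 16-bit code units ...
data = s.encode("utf-16-le")
units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]
# ... and drop the first *unit*, as substring(1) would in Java or JavaScript.
utf16_tail = units[1:]
print([hex(u) for u in utf16_tail])  # ['0xde38', '0xd83d', '0xde3e']

# The first remaining unit is a low surrogate with no high surrogate before it.
orphan = utf16_tail[0]
print(0xDC00 <= orphan <= 0xDFFF)  # True
```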


(7) What is the reverse of the string? 

Python passes this test too:

py> print(s[::-1])
😾😸
py> for c in s[::-1]:
... unicodedata.name(c)
...
'POUTING CAT FACE'
'GRINNING CAT FACE WITH SMILING EYES'

UTF-16 based languages typically break, again getting invalid strings 
containing surrogate pairs in the wrong order.
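Reversing code units instead of code points shows the "wrong order" failure. In this sketch the reversed units no longer form valid UTF-16, which Python's strict decoder rejects:

```python
s = "\U0001F638\U0001F63E"

# Reversing by code point keeps both cats intact.
print(s[::-1] == "\U0001F63E\U0001F638")  # True

# Reversing 16-bit code units instead, as a naive UTF-16 reverse would,
# puts each low surrogate in front of its high surrogate.
data = s.encode("utf-16-le")
units = [data[i:i + 2] for i in range(0, len(data), 2)]
reversed_units = b"".join(reversed(units))

try:
    reversed_units.decode("utf-16-le")
    valid = True
except UnicodeDecodeError:
    valid = False
print(valid)  # False
```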


The next test involves ligatures. Ligatures are pairs, or triples, of
characters which have been moved closer together in order to look better.
Normally you would expect the type-setter to handle ligatures by
adjusting the spacing between characters, but there are a few pairs (such
as "fi" <=> "ﬁ") where type designers provided them as custom-designed
single characters, and Unicode includes them as legacy characters.

(8) What's the uppercase of "baffle" spelled with an ffl ligature?

Like most other languages, Python 3.2 fails:

py> 'baﬄe'.upper()
'BAﬄE'

but Python 3.3 passes:

py> 'baﬄe'.upper()
'BAFFLE'
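Spelling the ligature as a `\N{...}` escape avoids the risk of software silently replacing it with ASCII. This sketch shows the Python 3.3+ behaviour and its side effect on length:

```python
word = "ba\N{LATIN SMALL LIGATURE FFL}e"  # "baﬄe", four code points
upper = word.upper()

print(upper)                  # BAFFLE
print(len(word), len(upper))  # 4 6
# The ligature expands when uppercased, so lowercasing does not restore it.
print(upper.lower() == word)  # False
```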


Lastly, Mortoray returns to noël, and compares the composed and 
decomposed versions of the string:

(9) Does "noël" equal "noe\u0308l"?

Python (correctly, in my opinion) reports that they do not:

py> "noël" == "noe\u0308l"
False

Again, one might argue whether a string type should report these as equal
or not; I believe Python is doing the right thing here. As the author
points out, any decent Unicode-aware language should at least offer the
ability to convert between normalisation forms, and Python passes this
test:

py> unicodedata.normalize("NFD", "noël") == "noe\u0308l"
True
py> "noël" == unicodedata.normalize("NFC

Re: Need help with programming in python for class (beginner level)

2013-11-29 Thread Eduardo A . Bustamante López
On Fri, Nov 29, 2013 at 04:31:21PM -0800, farhan...@gmail.com wrote:
> It's for a school assignment. Basically, I need to roll 5 dies with 6 sides 
> each. So basically, 6 random numbers. That part is easy. Then I need to add 
> it up. Ok, done that. However, I also need to say something along the lines 
> of "your total number was X". That's what I'm having trouble with. I added 
> the dice rolls together and put them into a variable I called "number" but it 
> seems to glitch out that variable is in any command other than "print 
> number". Like, if I try to write:
> 
> print "your total number was:" number "" 
> 
> It just doesn't work. 
That would be: print "your total number was: " + str(number) + ""

Notice two differences:

- Used the + operator to concatenate two strings: "foo" + "bar" == "foobar"
- Converted the number from integer to string; you can't do <string> +
  <integer> directly. You have to do either int(<string>) + <integer> if
  you want integer addition, or str(<integer>) + <string> if you want
  string concatenation.

You didn't use an operator, and «"string" variable "string"» is not
valid Python.
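Both conversions can be seen in a short sketch, written in Python 3 syntax (the `+` rules shown are the same in Python 2). The value 17 is a hypothetical stand-in for the dice total:

```python
number = 17  # hypothetical stand-in for the sum of the dice rolls

try:
    message = "your total number was: " + number
except TypeError as exc:
    # str + int is rejected: Python will not guess which conversion you meant.
    error = exc

# String concatenation: convert the number to a string first.
message = "your total number was: " + str(number)
# Integer addition: convert the string to an int first.
total = int("5") + 12

print(message)  # your total number was: 17
print(total)    # 17
```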

-- 
Eduardo Alan Bustamante López
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Need help with programming in python for class (beginner level)

2013-11-29 Thread Johannes Findeisen
On Sat, 30 Nov 2013 01:38:36 +0100
Johannes Findeisen wrote:

> On Fri, 29 Nov 2013 16:31:21 -0800 (PST)
> farhan...@gmail.com wrote:
> 
> > print "The total rolled was: "number" "
> 
> The above line is wrong. You did it right below:
> 
> > print "Thanks for playing, "  + name +  "."
> > print "You bet the total would be at least " + value + "."
> 
> Do this:
> 
> print "The total rolled was: " + number + " "

Sorry, that was wrong! You need to convert to a string when
concatenating...

print "The total rolled was: " + str(number) + " "


Re: Need help with programming in python for class (beginner level)

2013-11-29 Thread Tim Chase
On 2013-11-29 16:31, farhan...@gmail.com wrote:
> It's for a school assignment.

Thanks for the honesty--you'll get far more helpful & useful replies
because of that. :-)

> put them into a variable I called "number" but it seems to glitch
> out that variable is in any command other than "print number".
> Like, if I try to write:
> 
> print "your total number was:" number "" 
> 
> It just doesn't work. 

You're so close.  Python doesn't let you directly combine strings and
numbers like that, especially without any operator.  However it does
offer several ways to combine strings:

  print "convert to string and use " + str(number) + " a '+' operator"
  print "use %s classic C-style formatting" % number
  print "use new {} formatting style".format(number)
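Putting the assignment together, here is a small Python 3 sketch that rolls five six-sided dice and builds the message all three ways. The `roll_dice` helper is hypothetical, not from the original post:

```python
import random


def roll_dice(count=5, sides=6):
    """Return a list of `count` results, each between 1 and `sides`."""
    return [random.randint(1, sides) for _ in range(count)]


rolls = roll_dice()
number = sum(rolls)

# The three styles above, written for Python 3's print() function.
msg_plus = "The total rolled was: " + str(number)
msg_percent = "The total rolled was: %s" % number
msg_format = "The total rolled was: {}".format(number)

print(rolls)
print(msg_format)
```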

-tkc





Re: Need help with programming in python for class (beginner level)

2013-11-29 Thread Mark Lawrence

On 30/11/2013 00:49, Johannes Findeisen wrote:
> On Sat, 30 Nov 2013 01:38:36 +0100
> Johannes Findeisen wrote:
>
>> On Fri, 29 Nov 2013 16:31:21 -0800 (PST)
>> farhan...@gmail.com wrote:
>>
>>> print "The total rolled was: "number" "
>>
>> The above line is wrong. You did it right below:
>>
>>> print "Thanks for playing, "  + name +  "."
>>> print "You bet the total would be at least " + value + "."
>>
>> Do this:
>>
>> print "The total rolled was: " + number + " "
>
> Sorry, that was wrong! You need to convert to a string when
> concatenating...
>
> print "The total rolled was: " + str(number) + " "

Wrong again, or at least overengineered.

print "The total rolled was:", number, "

You don't even need the spaces as print kindly does it for you :)
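In Python 3, print is a function, and the space Mark mentions comes from its `sep` argument, which defaults to a single space. This sketch captures the output in an `io.StringIO` so it can be checked; the value 17 is a hypothetical dice total:

```python
import io

number = 17  # hypothetical stand-in for the dice total

buffer = io.StringIO()
# print() separates its arguments with sep=" " by default ...
print("The total rolled was:", number, file=buffer)
default_sep = buffer.getvalue()

buffer = io.StringIO()
# ... and sep can be changed if a different separator is wanted.
print("The total rolled was:", number, sep="\t", file=buffer)
tab_sep = buffer.getvalue()

print(repr(default_sep))  # 'The total rolled was: 17\n'
```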

--
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence



Re: Need help with programming in python for class (beginner level)

2013-11-29 Thread Tim Chase
On 2013-11-30 00:59, Mark Lawrence wrote:
> Wrong again, or at least overengineered.
> 
> print "The total rolled was:", number, "
                                         ^
> 
> You don't even need the spaces as print kindly does it for you :)

but you could at least include the missing quotation mark ;-)

-tkc





Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Mark Lawrence

On 30/11/2013 00:44, Steven D'Aprano wrote:
> (5) What is the length of "😸😾"?
>
> Both characters U+1F638 (GRINNING CAT FACE WITH SMILING EYES) and U+1F63E
> (POUTING CAT FACE) are outside the Basic Multilingual Plane, which means
> they require more than two bytes each. Most programming languages using
> UTF-16 encodings internally (including JavaScript and Java) fail this
> test. Python 3.3 passes:
>
> py> s = '😸😾'
> py> len(s)
> 2

I couldn't care less if it passes, it's too slow and uses too much 
memory[1], so please get the completely bug ridden Python 2 unicode 
implementation restored at the earliest possible opportunity :)


[1]because I say so although I don't actually have any evidence to 
support my case. :) :)


--
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence



Re: Need help with programming in python for class (beginner level)

2013-11-29 Thread Mark Lawrence

On 30/11/2013 01:06, Tim Chase wrote:
> On 2013-11-30 00:59, Mark Lawrence wrote:
>> Wrong again, or at least overengineered.
>>
>> print "The total rolled was:", number, "
>                                         ^
>>
>> You don't even need the spaces as print kindly does it for you :)
>
> but you could at least include the missing quotation mark ;-)
>
> -tkc

It's way past my bedtime :)

--
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence



Re: Need help with programming in python for class (beginner level)

2013-11-29 Thread Johannes Findeisen
On Sat, 30 Nov 2013 01:08:28 +
Mark Lawrence wrote:

> On 30/11/2013 01:06, Tim Chase wrote:
> > On 2013-11-30 00:59, Mark Lawrence wrote:
> >> Wrong again, or at least overengineered.
> >>
> >> print "The total rolled was:", number, "
> >                                         ^
> >>
> >> You don't even need the spaces as print kindly does it for you :)
> >
> > but you could at least include the missing quotation mark ;-)

:)

> It's way past my bedtime :)

Same to me... I should get more sleep before answering.

Anyway, Thank you for the explanation of this.

Sleep well... ;)

Johannes


Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Roy Smith
In article <529934dc$0$29993$c3e8da3$54964...@news.astraweb.com>,
 Steven D'Aprano  wrote:

> (8) What's the uppercase of "baffle" spelled with an ffl ligature?
> 
> Like most other languages, Python 3.2 fails:
> 
> py> 'baffle'.upper()
> 'BAfflE'
> 
> but Python 3.3 passes:
> 
> py> 'baffle'.upper()
> 'BAFFLE'

I disagree.

The whole idea of ligatures like fi is purely typographic.  The crossbar 
on the "f" (at least in some fonts) runs into the dot on the "i".  
Likewise, the top curl on an "f" runs into the serif on top of the "l"
(and similarly for ffl).

There is no such thing as a "FFL" ligature, because the upper case 
letterforms don't run into each other like the lower case ones do.  
Thus, I would argue that it's wrong to say that calling upper() on an 
ffl ligature should yield FFL.

I would certainly expect, x.lower() == x.upper().lower(), to be True for 
all values of x over the set of valid unicode codepoints.  Having 
u"\uFB04".upper() ==> "FFL" breaks that.  I would also expect len(x) == 
len(x.upper()) to be True.
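Roy's two expectations are easy to check exhaustively. This sketch tests every code point on the running interpreter; the exact failure counts depend on the Unicode version Python ships, so none are claimed here. The helper names are illustrative only:

```python
import sys


def roundtrips(ch):
    """Roy's first expectation: x.lower() == x.upper().lower()."""
    return ch.lower() == ch.upper().lower()


def keeps_length(ch):
    """Roy's second expectation: len(x) == len(x.upper())."""
    return len(ch) == len(ch.upper())


code_points = range(sys.maxunicode + 1)
roundtrip_failures = [chr(cp) for cp in code_points if not roundtrips(chr(cp))]
length_failures = [chr(cp) for cp in code_points if not keeps_length(chr(cp))]

print(len(roundtrip_failures), len(length_failures))
print(repr("\uFB04".upper()))  # 'FFL'
```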


Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Chris Angelico
On Sat, Nov 30, 2013 at 1:08 PM, Roy Smith  wrote:
> I would certainly expect, x.lower() == x.upper().lower(), to be True for
> all values of x over the set of valid unicode codepoints.  Having
> u"\uFB04".upper() ==> "FFL" breaks that.  I would also expect len(x) ==
> len(x.upper()) to be True.

That's a nice theory, but the Unicode consortium disagrees with you on
both points.

ChrisA


Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Roy Smith
In article ,
 Chris Angelico  wrote:

> On Sat, Nov 30, 2013 at 1:08 PM, Roy Smith  wrote:
> > I would certainly expect, x.lower() == x.upper().lower(), to be True for
> > all values of x over the set of valid unicode codepoints.  Having
> > u"\uFB04".upper() ==> "FFL" breaks that.  I would also expect len(x) ==
> > len(x.upper()) to be True.
> 
> That's a nice theory, but the Unicode consortium disagrees with you on
> both points.
> 
> ChrisA

Harumph.


Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Dave Angel

On Fri, 29 Nov 2013 21:28:47 -0500, Roy Smith  wrote:

> In article ,
>  Chris Angelico  wrote:
> > On Sat, Nov 30, 2013 at 1:08 PM, Roy Smith  wrote:
> > > I would certainly expect, x.lower() == x.upper().lower(), to be True for
> > > all values of x over the set of valid unicode codepoints.  Having
> > > u"\uFB04".upper() ==> "FFL" breaks that.  I would also expect len(x) ==
> > > len(x.upper()) to be True.
> >
> > That's a nice theory, but the Unicode consortium disagrees with you on
> > both points.


And they were already false long before Unicode.  I don’t know 
specifics but there are many cases where there are no uppercase 
equivalents for a particular lowercase character.  And others where 
the uppercase equivalent takes multiple characters.


--
DaveA



Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Steven D'Aprano
On Fri, 29 Nov 2013 21:08:49 -0500, Roy Smith wrote:

> In article <529934dc$0$29993$c3e8da3$54964...@news.astraweb.com>,
>  Steven D'Aprano  wrote:
> 
>> (8) What's the uppercase of "baffle" spelled with an ffl ligature?
>> 
>> Like most other languages, Python 3.2 fails:
>> 
>> py> 'baffle'.upper()
>> 'BAfflE'

You edited my text to remove the ligature? That's... unfortunate.



>> but Python 3.3 passes:
>> 
>> py> 'baffle'.upper()
>> 'BAFFLE'
> 
> I disagree.
> 
> The whole idea of ligatures like fi is purely typographic.

In English, that's correct. I'm not sure if we can generalise that to all 
languages that have ligatures. It also partly depends on how you define 
ligatures. For example, would you consider that ampersand & to be a 
ligature? These days, I would consider & to be a distinct character, but 
originally it began as a ligature for "et" (Latin for "and").

But let's skip such corner cases, as they provide much heat but no 
illumination, and I'll agree that when it comes to ligatures like fl, fi 
and ffl, they are purely typographic.


> The crossbar
> on the "f" (at least in some fonts) runs into the dot on the "i".
> Likewise, the top curl on an "f" runs into the serif on top of the "l"
> (and similarly for ffl).
> 
> There is no such thing as a "FFL" ligature, because the upper case
> letterforms don't run into each other like the lower case ones do. Thus,
> I would argue that it's wrong to say that calling upper() on an ffl
> ligature should yield FFL.

Your conclusion doesn't follow from the argument you are making. Since
the ffl ligature ﬄ is purely a typographical feature, it should
uppercase to FFL (there being no uppercase FFL ligature).

Consider the examples shown above, where you or your software 
unfortunately edited out the ligature and replaced it with ASCII "ffl". 
Or perhaps I should say *fortunately*, since it demonstrates the problem.

Since we agree that the ffl ligature is merely a typographic artifact of
some type-designer's whimsy, we can expect that the word "baﬄe" is
semantically exactly the same as the word "baffle". How foolish Python
would look if it did this:

py> 'baffle'.upper()
'BAfflE'


Replace the 'ffl' with the ligature, and the conclusion remains:

py> 'baﬄe'.upper()
'BAﬄE'

would be equally wrong.

Now, I accept that this picture isn't entirely black and white. For
example, we might argue that if ﬄ is purely typographical in nature,
surely we would also want 'baﬄe' == 'baffle' too? Or maybe not. This
indicates that capturing *all* the rules for text across the many
languages, writing systems and conventions is impossible.

There are some circumstances where we would want 'baﬄe' and 'baffle' to
compare equal, and others where we would want them to compare unequal.
Python gives us both:

py> "baffle" == "baﬄe"
False
py> "baffle" == unicodedata.normalize("NFKC", "baﬄe")
True


but frankly I'm baffled *wink* that you think there are any circumstances
where you would want the uppercase of ﬄ to be anything but FFL.
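The two behaviours correspond to strict comparison versus comparison after compatibility normalisation. A short sketch, with `str.casefold()` (Python 3.3+) added as a third option for case-insensitive matching; the variable names are illustrative:

```python
import unicodedata

ligature = "ba\N{LATIN SMALL LIGATURE FFL}e"
plain = "baffle"

# Strict comparison: different code points, so not equal.
strict_equal = ligature == plain
# Compatibility normalisation (NFKC) folds the ligature into ASCII letters.
nfkc_equal = unicodedata.normalize("NFKC", ligature) == plain
# Canonical normalisation (NFC) leaves compatibility ligatures alone.
nfc_equal = unicodedata.normalize("NFC", ligature) == plain
# casefold() is meant for caseless matching and also expands the ligature.
casefold_equal = ligature.casefold() == plain.casefold()

print(strict_equal, nfkc_equal, nfc_equal, casefold_equal)  # False True False True
```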


> I would certainly expect, x.lower() == x.upper().lower(), to be True for
> all values of x over the set of valid unicode codepoints.

You would expect wrongly. You are over-generalising from English, and if 
you include ligatures and other special cases, not even all of English.

See, for example:

http://www.unicode.org/faq/casemap_charprop.html#7a

Apart from ligatures, some examples of troublesome characters with regard 
to case are:

* German Eszett (sharp-S) ß can be uppercased to SS, SZ or ẞ depending
  on context, particularly when dealing with placenames and family names.

  (That last character, LATIN CAPITAL LETTER SHARP S, goes back to at
  least the 1930s, although the official rules of German orthography
  still insist on uppercasing ß to SS.)

* The English long-s ſ is uppercased to regular S.

* Turkish dotted and dotless I (İ and i, I and ı) use the same Latin
  letters I and i, but the case conversion rules are different.

* Both the Greek sigma σ and final sigma ς uppercase to Σ.


That last one is especially interesting: Python 3.3 gets it right, while 
older Pythons do not. In Python 3.2:

py> 'Ὀδυσσεύς (Odysseus)'.upper().title()
'Ὀδυσσεύσ (Odysseus)'

while in 3.3 it roundtrips correctly:

py> 'Ὀδυσσεύς (Odysseus)'.upper().title()
'Ὀδυσσεύς (Odysseus)'


So... case conversions are not as simple as they appear at first glance. 
They aren't always reversible, nor do they always roundtrip. Titlecase is 
not necessarily the same as "uppercase the first letter and lowercase the 
rest". Case conversions can be context or locale sensitive.
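Several of the cases above can be checked directly. Python's `str` methods use the locale-independent default Unicode mappings, so Turkish-specific rules are not applied (`"I".lower()` is always `"i"`). A sketch, with results as given by Python 3.3 and later:

```python
examples = {
    "sharp s upper": "\u00DF".upper(),      # 'SS': one character becomes two
    "long s upper": "\u017F".upper(),       # 'S'
    "sigma upper": "\u03C3".upper(),        # 'Σ'
    "final sigma upper": "\u03C2".upper(),  # 'Σ'
    "dotted I lower": "\u0130".lower(),     # 'i' + COMBINING DOT ABOVE
    "plain I lower": "I".lower(),           # 'i', no Turkish rule applied
    # "ΟΔΥΣΣΕΥΣ".lower(): the last sigma becomes final sigma ς
    "ODYSSEUS lower": "\u039F\u0394\u03A5\u03A3\u03A3\u0395\u03A5\u03A3".lower(),
}

for label, result in examples.items():
    print(label, ascii(result))
```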

Anyway... even if you disagree with everything I have said, it is a fact 
that Python has committed to following the Unicode standard, and the 
Unicode standard requires that certain ligatures, including FFL, FL and 
FI, are decomposed when converted to uppercase.



-- 
Steven


Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Roy Smith
In article <529967dc$0$29993$c3e8da3$54964...@news.astraweb.com>,
 Steven D'Aprano  wrote:

> You edited my text to remove the ligature? That's... unfortunate.

It was un-ligated by the time it reached me.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Zero Piraeus
:

On Sat, Nov 30, 2013 at 04:21:49AM +, Steven D'Aprano wrote:
> On Fri, 29 Nov 2013 21:08:49 -0500, Roy Smith wrote:
> > The whole idea of ligatures like fi is purely typographic.
> 
> In English, that's correct. I'm not sure if we can generalise that to
> all languages that have ligatures. It also partly depends on how you
> define ligatures. For example, would you consider that ampersand & to
> be a ligature? These days, I would consider & to be a distinct
> character, but originally it began as a ligature for "et" (Latin for
> "and").
> 
> But let's skip such corner cases, as they provide much heat but no 
> illumination, [...]

In the interest of warmth (I know it's winter in some parts of the
world) ...

As I understand it, "&" has always been used to replace the word "et"
specifically, rather than the letter-pair e,t (no-one has ever written
"k&tle" other than ironically), which makes it a logogram rather than a
ligature (like "@").

(I happen to think the presence of ligatures in Unicode is insane, but
my dictator-of-the-world certificate appears to have gotten lost in the
post, so fixing that will have to wait).

 -[]z.

-- 
Zero Piraeus: inter caetera
http://etiol.net/pubkey.asc


Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Gene Heskett
On Saturday 30 November 2013 00:23:22 Zero Piraeus did opine:

> On Sat, Nov 30, 2013 at 04:21:49AM +, Steven D'Aprano wrote:
> > On Fri, 29 Nov 2013 21:08:49 -0500, Roy Smith wrote:
> > > The whole idea of ligatures like fi is purely typographic.
> > 
> > In English, that's correct. I'm not sure if we can generalise that to
> > all languages that have ligatures. It also partly depends on how you
> > define ligatures. For example, would you consider that ampersand & to
> > be a ligature? These days, I would consider & to be a distinct
> > character, but originally it began as a ligature for "et" (Latin for
> > "and").
> > 
> > But let's skip such corner cases, as they provide much heat but no
> > illumination, [...]
> 
> In the interest of warmth (I know it's winter in some parts of the
> world) ...
> 
> As I understand it, "&" has always been used to replace the word "et"
> specifically, rather than the letter-pair e,t (no-one has ever written
> "k&tle" other than ironically), which makes it a logogram rather than a
> ligature (like "@").

Whereas in these here parts, the "&" has always been read as a single 
character shortcut for the word "and".
> 
> (I happen to think the presence of ligatures in Unicode is insane, but
> my dictator-of-the-world certificate appears to have gotten lost in the
> post, so fixing that will have to wait).
> 
>  -[]z.


Cheers, Gene
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page 

"I remember when I was a kid I used to come home from Sunday School and
 my mother would get drunk and try to make pancakes."
-- George Carlin
A pen in the hand of this president is far more
dangerous than 200 million guns in the hands of
 law-abiding citizens.


Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Roy Smith
In article <529967dc$0$29993$c3e8da3$54964...@news.astraweb.com>,
 Steven D'Aprano  wrote:

> > The whole idea of ligatures like fi is purely typographic.
> 
> In English, that's correct. I'm not sure if we can generalise that to all 
> languages that have ligatures. It also partly depends on how you define 
> ligatures.

I was speaking specifically of "ligatures like fi" (or, if you prefer, 
"ligatures like ό".  By which I mean those things printers invented 
because some letter combinations look funny when typeset as two distinct 
letters.

There are other kinds of ligatures.  For example, œ is a diphthong.  It
makes sense (well, to me, anyway) that upper case œ is Έ.

Well, anyway, that's the truth according to me.  Apparently the Unicode 
Consortium disagrees.  So, who am I to argue with the people who decided 
that I needed to be able to type a "PILE OF POO" character.  Which, by 
the way, I can find in my "Character Viewer" input helper, but which MT 
Newswatcher doesn't appear to be willing to insert into text.  I guess 
Basic Multilingual Poo would have been OK but Astral Poo is too much for 
it.


restype is None( something about ctypes)

2013-11-29 Thread april122409
There are two functions in the DLL, and the C code for both looks like this:

char *Login(char *host, int port, char *user, char *passwd, int loginSec)
{
    /*……*/
    return host;
}

but when I call them, the results differ:
for func1, the type of the result is bytes;
for func2, the type of the result is None.
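The C code alone can't confirm it, but a likely cause is the `restype` attribute: ctypes assumes a C `int` return unless told otherwise, returns bytes when `restype` is `c_char_p`, and returns None when `restype` is None (a void function). The guess that func2 had `restype = None` is an assumption. Since the poster's DLL isn't available, this sketch assumes CPython and uses `Py_GetVersion()` from the C API, reached through `ctypes.pythonapi`, as a stand-in function that also returns a `char *`:

```python
import ctypes

# Stand-in for Login(): Py_GetVersion() from the CPython C API also returns a
# char *. (The poster's DLL functions are not available here.)
func = ctypes.pythonapi.Py_GetVersion
func.argtypes = []

func.restype = ctypes.c_char_p  # interpret the pointer as a NUL-terminated string
as_bytes = func()               # bytes, e.g. b'3.x.y (...)'

func.restype = None             # declare the function as returning void
as_none = func()                # the returned pointer is discarded: None

print(type(as_bytes).__name__, as_none)
```

For the real DLL, the equivalent would be something like `dll.Login.restype = ctypes.c_char_p` (with `dll` a hypothetical name for the loaded library), set on both functions. Leaving the default `c_int` in place is also risky on 64-bit systems, where it can truncate the returned pointer.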


Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Ian Kelly
On Fri, Nov 29, 2013 at 10:37 PM, Roy Smith  wrote:
> I was speaking specifically of "ligatures like fi" (or, if you prefer,
> "ligatures like ό".  By which I mean those things printers invented
> because some letter combinations look funny when typeset as two distinct
> letters.

I think the encoding of your email is incorrect, because GREEK SMALL
LETTER OMICRON WITH TONOS is not a ligature.

> There are other kinds of ligatures.  For example, oe is a diphthong.  It
> makes sense (well, to me, anyway) that upper case oe is Έ.

As above. I can't fathom why would it make sense for the upper case of
LATIN SMALL LIGATURE OE to be GREEK CAPITAL LETTER EPSILON WITH TONOS.


Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Steven D'Aprano
On Sat, 30 Nov 2013 02:05:59 -0300, Zero Piraeus wrote:

> (I happen to think the presence of ligatures in Unicode is insane, but
> my dictator-of-the-world certificate appears to have gotten lost in the
> post, so fixing that will have to wait).

You're probably right, but we live in an insane world of dozens of insane 
legacy encodings, and part of the mission of Unicode is to include every 
single character that those legacy encodings did. Since some of them 
included ligatures, so must Unicode. Sad but true.

(Unicode is intended as a replacement for the insanity of dozens of 
multiply incompatible character sets. It cannot hope to replace them if 
it cannot represent every distinct character they represent.)


-- 
Steven


Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Steven D'Aprano
On Fri, 29 Nov 2013 23:00:27 -0700, Ian Kelly wrote:

> On Fri, Nov 29, 2013 at 10:37 PM, Roy Smith  wrote:
>> I was speaking specifically of "ligatures like fi" (or, if you prefer,
>> "ligatures like ό".  By which I mean those things printers invented
>> because some letter combinations look funny when typeset as two
>> distinct letters.
> 
> I think the encoding of your email is incorrect, because GREEK SMALL
> LETTER OMICRON WITH TONOS is not a ligature.

Roy's post, which is sent via Usenet not email, doesn't have an encoding 
set. Since he's sending from a Mac, his software may believe that the 
entire universe understands the Mac Roman encoding, which makes a certain 
amount of sense since if I recall correctly the fi and fl ligatures 
originally appeared in early Mac fonts. 

I'm going to give Roy the benefit of the doubt and assume he actually 
entered the fi ligature at his end. If his software was using Mac Roman, 
it would insert a single byte DE into the message:

py> '\N{LATIN SMALL LIGATURE FI}'.encode('macroman')
b'\xde'


But that's not what his post includes. The message actually includes two 
bytes CF8C, in other words:

'\N{LATIN SMALL LIGATURE FI}'.encode('who the hell knows')
=> b'\xCF\x8C'


Since nearly all of his post is in single bytes, it's some variable-width 
encoding, but not UTF-8.

With no encoding set, our newsreader software starts off assuming that
the post uses UTF-8 ('cos that's the only sensible default), and those
two bytes happen to decode to ό GREEK SMALL LETTER OMICRON WITH TONOS.

I'm not surprised that Roy has a somewhat jaundiced view of Unicode, when
the tools he uses are apparently so broken. But it isn't Unicode's fault,
it's the tools.

The really bizarre thing is that apparently Roy's software, MT-
NewsWatcher, knows enough Unicode to normalise ﬄ LATIN SMALL LIGATURE FFL
(sent in UTF-8 and therefore appearing as bytes b'\xef\xac\x84') to the
ASCII letters "ffl". That's astonishingly weird.

That is really a bizarre error. I suppose it is not entirely impossible 
that the software is actually being clever rather than dumb. Having 
correctly decoded the UTF-8 bytes, perhaps it realised that there was no 
glyph for the ligature, and rather than display a MISSING CHAR glyph 
(usually one of those empty boxes you sometimes see), it normalized it to 
ASCII. But if it's that clever, why the hell doesn't it set an encoding 
line in posts?
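The byte-level detective work can be replayed with Python's standard codecs. A sketch with illustrative variable names; NFKC is shown only as one way to turn ﬄ into "ffl", and whether MT-NewsWatcher actually does this is unknown:

```python
import unicodedata

fi = "\N{LATIN SMALL LIGATURE FI}"
ffl = "\N{LATIN SMALL LIGATURE FFL}"

# In Mac Roman the fi ligature is the single byte 0xDE.
mac_bytes = fi.encode("mac_roman")
# The two bytes actually found in Roy's post, read as UTF-8.
mystery = b"\xcf\x8c".decode("utf-8")
# The ffl ligature in UTF-8, and what compatibility normalisation makes of it.
ffl_utf8 = ffl.encode("utf-8")
ffl_nfkc = unicodedata.normalize("NFKC", ffl)

print(mac_bytes)                  # b'\xde'
print(unicodedata.name(mystery))  # GREEK SMALL LETTER OMICRON WITH TONOS
print(ffl_utf8)                   # b'\xef\xac\x84'
print(ffl_nfkc)                   # ffl
```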


-- 
Steven


Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Steven D'Aprano
On Sat, 30 Nov 2013 00:37:17 -0500, Roy Smith wrote:

> So, who am I to argue with the people who decided that I needed to be
> able to type a "PILE OF POO" character.

Blame the Japanese for that. Apparently some of the biggest users of 
Unicode are the various Japanese mobile phone manufacturers, TV stations, 
map makers and similar. So there's a large number of symbols and emoji 
(emoticons) specifically added for them, presumably because they pay big 
dollars to the Unicode Consortium and therefore get a lot of influence in 
what gets added.


-- 
Steven