Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-08 Thread Nobody
On Wed, 05 Oct 2011 21:39:17 -0700, Greg wrote:
> Here is the final code for those who are struggling with similar
> problems:
>
> ## open and decode file
> # In this case, the encoding comes from the charset argument in a meta tag
> # e.g.
> fileObj = open(filePath,"r").read()
> fileContent =

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-06 Thread John Gordon
xDog Walker writes:
> What is this io of which you speak?

It was introduced in Python 2.6.

--
John Gordon        A is for Amy, who fell down the stairs
gor...@panix.com   B is for Basil, assaulted by bears
                   -- Edward Gorey, "The Gashl

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-06 Thread xDog Walker
On Thursday 2011 October 06 10:41, jmfauth wrote:
> or (Python2/Python3)
>
> >>> import io
> >>> with io.open('abc.txt', 'r', encoding='iso-8859-2') as f:
> ...     r = f.read()
> ...
> >>> repr(r)
> u'a\nb\nc\n'
> >>> with io.open('def.txt', 'w', encoding='utf-8-sig') as f:
> ...     t
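jmfauth's io.open approach can be sketched as a self-contained round trip. This is a minimal sketch that assumes Python 3, where io.open is the same function as the built-in open. The file names abc.txt and def.txt come from the snippet. The Polish sample text and the temporary directory are illustrative assumptions. The input file is decoded as ISO-8859-2 on read and re-encoded as UTF-8 with a byte order mark on write.

```python
import io
import os
import tempfile

# Illustrative sample (assumption): Polish text, "Branie zakładników".
SAMPLE = "Branie zak\u0142adnik\u00f3w"

def round_trip(workdir):
    """Write SAMPLE as ISO-8859-2, read it back, re-save it as UTF-8 with a BOM."""
    src = os.path.join(workdir, "abc.txt")
    dst = os.path.join(workdir, "def.txt")
    # Create an ISO-8859-2 input file so there is something to read.
    with io.open(src, "w", encoding="iso-8859-2") as f:
        f.write(SAMPLE)
    # Reading: io.open decodes the file's bytes to str with the named codec.
    with io.open(src, "r", encoding="iso-8859-2") as f:
        text = f.read()
    # Writing: 'utf-8-sig' emits a UTF-8 byte order mark before the text.
    with io.open(dst, "w", encoding="utf-8-sig") as f:
        f.write(text)
    # Return the decoded text and the raw bytes that ended up on disk.
    with open(dst, "rb") as f:
        return text, f.read()

with tempfile.TemporaryDirectory() as d:
    text, raw = round_trip(d)

print(raw[:3])  # b'\xef\xbb\xbf'
```

The 'utf-8-sig' codec is useful mainly when the output will be opened by tools that look for a BOM to detect UTF-8. Plain 'utf-8' writes the same text without those three leading bytes.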

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-06 Thread jmfauth
On 6 Oct, 06:39, Greg wrote:
> Brilliant! It worked. Thanks!
>
> Here is the final code for those who are struggling with similar
> problems:
>
> ## open and decode file
> # In this case, the encoding comes from the charset argument in a meta tag
> # e.g.
> fileObj = open(filePath,"r").read()
>

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-06 Thread Chris Angelico
On Thu, Oct 6, 2011 at 8:29 PM, Ulrich Eckhardt wrote:
> Just wondering, why do you split the latter two parts? I would have used
> codecs.open() to open the file and define the encoding in a single step. Is
> there a downside to this approach?
>
Those two steps still happen, even if you achieve
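Chris's point is that encoding still happens as a separate step inside codecs.open(). A small sketch can check this. It assumes Python 3, and the file names and sample text are illustrative. Writing through codecs.open() and writing text.encode("utf-8") to a binary file should leave identical bytes on disk. In current Python 3 code, the built-in open(..., encoding=...) is the more usual way to get the one-step form.

```python
import codecs
import os
import tempfile

text = "Branie zak\u0142adnik\u00f3w"  # illustrative sample (assumption)

with tempfile.TemporaryDirectory() as d:
    one_step = os.path.join(d, "one_step.txt")
    two_step = os.path.join(d, "two_step.txt")

    # One step: codecs.open wraps the file and encodes on every write().
    with codecs.open(one_step, "w", encoding="utf-8") as f:
        f.write(text)

    # Two steps: encode explicitly, then write the raw bytes.
    with open(two_step, "wb") as f:
        f.write(text.encode("utf-8"))

    with open(one_step, "rb") as a, open(two_step, "rb") as b:
        same = a.read() == b.read()

print(same)  # True
```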

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-06 Thread Ulrich Eckhardt
On 06.10.2011 05:40, Steven D'Aprano wrote:
> (4) Do all your processing in Unicode, not bytes.
> (5) Encode the text into bytes using UTF-8 encoding.
> (6) Write the bytes to a file.

Just wondering, why do you split the latter two parts? I would have used codecs.open() to open the file and defi
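Steven's steps (4) to (6) can be sketched in Python 3, where str is Unicode text and bytes is a separate type. This sketch assumes the page has already been decoded from its declared charset, because the earlier steps are cut off in this snippet. The upper-casing in process() is only a placeholder for real processing, and the helper names are hypothetical.

```python
import os
import tempfile

def process(text):
    # Step (4): operate on str (Unicode) only; upper() is a placeholder task.
    return text.upper()

def save_utf8(text, path):
    data = text.encode("utf-8")   # Step (5): encode the text to UTF-8 bytes.
    with open(path, "wb") as f:   # Step (6): write those bytes in binary mode.
        f.write(data)
    return data

# Assumed input: bytes already decoded using the page's declared charset.
page_bytes = "Branie zak\u0142adnik\u00f3w".encode("iso-8859-2")
text = page_bytes.decode("iso-8859-2")

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "out.txt")
    written = save_utf8(process(text), path)
    with open(path, "rb") as f:
        on_disk = f.read()

print(on_disk == written)  # True
```

Keeping step (5) explicit makes the text/bytes boundary visible in the code. That is the trade-off against the single-step codecs.open() form Ulrich asks about.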

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-05 Thread Chris Angelico
On Thu, Oct 6, 2011 at 3:39 PM, Greg wrote:
> Brilliant! It worked. Thanks!
>
> Here is the final code for those who are struggling with similar
> problems:
>
> ## open and decode file
> # In this case, the encoding comes from the charset argument in a meta tag
> # e.g.
> fileContent = fileObj.

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-05 Thread Greg
Brilliant! It worked. Thanks!

Here is the final code for those who are struggling with similar problems:

## open and decode file
# In this case, the encoding comes from the charset argument in a meta tag
# e.g.
fileObj = open(filePath,"r").read()
fileContent = fileObj.decode("iso-8859-2")
fileSo
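Greg's approach is to take the encoding from the charset in a meta tag and decode the raw bytes with it before parsing. A rough Python 3 sketch of that follows. The helper decode_html and the META_CHARSET pattern are hypothetical. A regex over bytes is only a heuristic, not a real HTML parser. The sample page is invented for illustration, and the BeautifulSoup step is left out so the sketch needs no third-party package. An unknown charset name in the page would make bytes.decode raise LookupError.

```python
import re

# Hypothetical pattern: captures the value of charset=... inside a <meta> tag.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)

def decode_html(raw, default="utf-8"):
    """Decode raw HTML bytes with the meta-tag charset, else with `default`."""
    match = META_CHARSET.search(raw)
    encoding = match.group(1).decode("ascii") if match else default
    return raw.decode(encoding), encoding

# Illustrative page (assumption): ISO-8859-2 bytes with a matching meta tag.
page = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=iso-8859-2"></head>'
        '<body>Branie zak\u0142adnik\u00f3w</body></html>').encode("iso-8859-2")

text, enc = decode_html(page)
print(enc)  # iso-8859-2
```

Reading the file in binary mode ("rb") keeps the raw bytes intact until the charset is known. The resulting str can then be handed to a parser.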

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-05 Thread Steven D'Aprano
On Wed, 05 Oct 2011 16:35:59 -0700, Greg wrote:
> Hi, I am having some encoding problems when I first parse stuff from a
> non-english website using BeautifulSoup and then write the results to a
> txt file.

If you haven't already read this, you should do so:
http://www.joelonsoftware.com/article

encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-05 Thread Greg
Hi, I am having some encoding problems when I first parse stuff from a non-English website using BeautifulSoup and then write the results to a txt file. I have the text both as a normal string (text) and as a Unicode string (utext):

print repr(text)
'Branie zak\xc2\xb3adnik\xc3\xb3w'
print repr(utext)
u
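The bytes 'Branie zak\xc2\xb3adnik\xc3\xb3w' fit a common failure. Presumably the page text is "Branie zakładników". If its ISO-8859-2 bytes were decoded as Latin-1 (cp1252 behaves the same for these two bytes) and then re-encoded as UTF-8, 'ł' (0xB3 in ISO-8859-2) would turn into '³' (U+00B3). This is a reconstruction consistent with the printed bytes, not a confirmed diagnosis of Greg's code. The Python 3 sketch below reproduces those exact bytes and then reverses the steps.

```python
# Presumed original phrase (assumption inferred from the printed bytes).
intended = "Branie zak\u0142adnik\u00f3w"    # 'Branie zakładników'

page_bytes = intended.encode("iso-8859-2")  # 'ł' -> 0xB3, 'ó' -> 0xF3
wrong_text = page_bytes.decode("latin-1")   # wrong codec: 0xB3 becomes '³'
garbled = wrong_text.encode("utf-8")        # '³' -> b'\xc2\xb3', 'ó' -> b'\xc3\xb3'

print(garbled)  # b'Branie zak\xc2\xb3adnik\xc3\xb3w'

# Repair: undo the bad steps in reverse order, then use the right charset.
repaired = garbled.decode("utf-8").encode("latin-1").decode("iso-8859-2")
print(repaired == intended)  # True
```

The repair only works when the wrong codec is known and no bytes were lost. Decoding with the correct charset in the first place, as the later replies in this thread suggest, avoids the problem.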