Re: Can somebody tell me what's wrong wrong with my code? I don't understand

2016-11-22 Thread Gene Heskett
On Monday 21 November 2016 22:20:40 Chris Angelico wrote:

> On Tue, Nov 22, 2016 at 2:10 PM,   wrote:
> > Hi! This is my first post! I'm having trouble understanding my code.
> > I get "SyntaxError:invalid syntax" on line 49. I'm trying to code a
> > simple text-based rpg on repl.it. Thank you for reading.
> >
> >
> > elif raceNum==3:
> >   print("Nice fur. I don't see too many of your kind 'round here.
> > Maybe that's a good thing...") print('')
> >   classNum=int(input("What's your profession mate?")
> >
> > elif raceNum==4: #this line has an error for some reason
> >   print("Your a 'Mongo eh? I thought you lads were extinct...Just
> > keep your tongue in ya mouth and we'll get along fine mate.")
> > classNum=int(input("What's your profession?"))
>
> Welcome to the community! I've trimmed your code to highlight the part
> I'm about to refer to.
>
> One of the tricks to understanding these kinds of errors is knowing
> how the code is read, which is: top to bottom, left to right, exactly
> the same as in English. Sometimes, a problem with one line of code is
> actually discovered on the next line of code. (Occasionally further
> down.) When you get a syntax error at the beginning of a line, it's
> worth checking the previous line to see if it's somehow unfinished.
>
> Have a look at your two blocks of code here. See if you can spot a
> difference. There is one, and it's causing your error.
>
> I'm hinting rather than overtly pointing it out, so you get a chance
> to try this for yourself. Have at it!
>
> ChrisA

I'm a fading 82 yo, and python dummy, but I think I see it.  In fact, I 
wrote a program to check for that and similar errors in my C code back 
in the late '80's. I called it cntx at the time.
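
For the curious, a check of that kind can be sketched in a few lines of Python. This is only an illustration, not the original cntx (whose source is not shown here); the name first_bracket_problem is made up, and the scan is naive: brackets inside strings and comments are counted too.

```python
def first_bracket_problem(source):
    """Return (line_number, bracket) for the first bracket problem, or None.

    Naive sketch: brackets inside strings and comments are counted too.
    """
    closers = {')': '(', ']': '[', '}': '{'}
    stack = []  # (line_number, opening_bracket) pairs still waiting to be closed
    for lineno, line in enumerate(source.splitlines(), start=1):
        for ch in line:
            if ch in '([{':
                stack.append((lineno, ch))
            elif ch in closers:
                if not stack or stack[-1][1] != closers[ch]:
                    return (lineno, ch)  # closer with no matching opener
                stack.pop()
    # Anything left on the stack was opened but never closed.
    return stack[-1] if stack else None


snippet = 'n = int(input("How many?")\nprint(n)\n'
print(first_bracket_problem(snippet))
```

The point of reporting the line of the unclosed opener is that the interpreter itself usually complains one line later, where the statement finally stops making sense.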

Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page 
-- 
https://mail.python.org/mailman/listinfo/python-list


Result is not Displayed

2016-11-22 Thread prihantoro2001
Dear all, 

i am new to Python and have this problem

=
import nltk
puzzle_letters = nltk.FreqDist('egivrvonl')
obligatory = 'r'
wordlist = nltk.corpus.words.words()
[w for w in wordlist if len(w) >= 6
and obligatory in w
and nltk.FreqDist(w) <= puzzle_letters]
print puzzle_letters
==

this gives me 
while the expected outcome is ['glover', 'govern', ...]
did i miss something?

Thank you

Pri


Re: Result is not Displayed

2016-11-22 Thread Irmen de Jong
On 22-11-2016 9:18, prihantoro2...@gmail.com wrote:
> Dear all, 
> 
> i am new to Python and have this problem
> 
> =
> import nltk
> puzzle_letters = nltk.FreqDist('egivrvonl')
> obligatory = 'r'
> wordlist = nltk.corpus.words.words()
> [w for w in wordlist if len(w) >= 6
> and obligatory in w
> and nltk.FreqDist(w) <= puzzle_letters]
> print puzzle_letters
> ==
> 
> this gives me 
> while the expected outcome is ['glover', 'govern', ...]
> did i miss something?


Review your code carefully. You're printing the wrong thing at the end, and
you're doing nothing with the list comprehension that comes before it.

-irmen



Re: Result is not Displayed

2016-11-22 Thread Peter Otten
prihantoro2...@gmail.com wrote:

> Dear all,
> 
> i am new to Python and have this problem
> 
> =
> import nltk
> puzzle_letters = nltk.FreqDist('egivrvonl')
> obligatory = 'r'
> wordlist = nltk.corpus.words.words()
> [w for w in wordlist if len(w) >= 6
> and obligatory in w
> and nltk.FreqDist(w) <= puzzle_letters]
> print puzzle_letters
> ==
> 
> this gives me 
> while the expected outcome is ['glover', 'govern', ...]
> did i miss something?

You asked for letters when you wanted to see the words. You actually build 
the list of words with

> [w for w in wordlist if len(w) >= 6
> and obligatory in w
> and nltk.FreqDist(w) <= puzzle_letters]

but don't even bind it to a name. Try

words = [
    w for w in wordlist if len(w) >= 6
    and obligatory in w
    and nltk.FreqDist(w) <= puzzle_letters
]
print words

to get what you want.
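
The same idea can be tried without NLTK. The sketch below rests on a few assumptions that are not in the original post: it uses collections.Counter in place of nltk.FreqDist, a tiny made-up word list in place of nltk.corpus.words.words(), and Python 3's print().

```python
from collections import Counter

puzzle_letters = Counter('egivrvonl')
obligatory = 'r'
# Made-up stand-in for nltk.corpus.words.words()
wordlist = ['glover', 'govern', 'violin', 'love', 'grovel']

# Counter subtraction drops non-positive counts, so an empty result means
# the word uses no letter more often than the puzzle provides.
words = [
    w for w in wordlist
    if len(w) >= 6
    and obligatory in w
    and not (Counter(w) - puzzle_letters)
]
print(words)
```

The important part is the same as above: the list is bound to a name, and that name is what gets printed.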



Re: Numpy slow at vector cross product?

2016-11-22 Thread BartC

On 22/11/2016 02:44, Steve D'Aprano wrote:
> On Tue, 22 Nov 2016 05:43 am, BartC wrote:
>
>> The fastest I can get compiled, native code to do this is at 250 million
>> cross-products per second.

(Actually 300 million using 64-bit code.)

> Yes, yes, you're awfully clever, and your secret private language is so much
> more efficient than even C that the entire IT industry ought to hang their
> head in shame.

The 250M/300M timing was using C and gcc -O3.

This gives an indication of the upper limit of what is possible.

The suggestion was that the numpy solution, which was *one thousand times
slower* than these figures, wouldn't be any faster if written in C.


I'm simply suggesting there is plenty of room for improvement. I even 
showed a version that did *exactly* what numpy does (AFAIK) that was 
three times the speed of numpy even executed by CPython. So there is 
some mystery there.


(FWIW my own compiled language manages 70M, and my interpreted language 
up to 3M. Poor, so I might look at them again next time I have to do 
loads of vector-products.)



> I'm only being *half* sarcastic here, for what it's worth. I remember the
> days when I could fit an entire operating system, plus applications, on a
> 400K floppy disk, and they would run at acceptable speed on something like
> an 8 MHz CPU. Code used to be more efficient, with less overhead. But given
> that your magic compiler runs only on one person's PC in the entire world,
> it is completely irrelevant.


So you're saying that numpy's speed is perfectly reasonable and we 
should do nothing about it? Because, after all, it does a few extra 
clever things (whether the programmer wants them or not!).


And the way it is written is completely beyond reproach:

msg = "incompatible dimensions for cross product\n"\
      "(dimension must be 2 or 3)"
if (a.shape[0] not in [2, 3]) or (b.shape[0] not in [2, 3]):
    raise ValueError(msg)

So it is fine for msg to be bound to a string value here (and having to 
unbind it again when it returns) even though it will never be used again 
in a normal call. (Or having to test both a.shape[0] and b.shape[0] for 
inclusion in a list, expensive looking operations, when they are tested 
again anyway in the next few lines.)


--
Bartc


Re: Numpy slow at vector cross product?

2016-11-22 Thread Skip Montanaro
> I'm simply suggesting there is plenty of room for improvement. I even
> showed a version that did *exactly* what numpy does (AFAIK) that was three
> times the speed of numpy even executed by CPython. So there is some mystery
> there.

As I indicated in my earlier response, your version doesn't pass all of
numpy's cross product unit tests. Fix that and submit a patch to the numpy
maintainers. I suspect it would be accepted.

Skip


Re: Numpy slow at vector cross product?

2016-11-22 Thread BartC

On 22/11/2016 12:34, Skip Montanaro wrote:
>> I'm simply suggesting there is plenty of room for improvement. I even
>> showed a version that did *exactly* what numpy does (AFAIK) that was three
>> times the speed of numpy even executed by CPython. So there is some mystery
>> there.
>
> As I indicated in my earlier response, your version doesn't pass all of
> numpy's cross product unit tests. Fix that and submit a patch to the numpy
> maintainers. I suspect it would be accepted.


I saw your response but didn't understand it. My code was based around 
what Peter Otten posted from numpy sources.


I will have a look. Don't forget however that all someone is trying to 
do is to multiply two vectors. They're not interested in axes 
transformation or making them broadcastable, whatever that means.


So making numpy.cross do all that may simply be too big a cost.

--
bartc



Re: Numpy slow at vector cross product?

2016-11-22 Thread BartC

On 22/11/2016 03:00, Steve D'Aprano wrote:
> On Tue, 22 Nov 2016 12:45 pm, BartC wrote:
>
>> You get to know after a while what kinds of processes affect timings. For
>> example, streaming a movie at the same time.
>
> Really, no.
>
> py> with Stopwatch():
> ...     x = math.sin(1.234)
> ...
> elapsed time is very small; consider using the timeit module for
> micro-timings of small code snippets
> time taken: 0.007164 seconds


I tried 'Stopwatch()' and it didn't understand what it was.


> And again:
>
> py> with Stopwatch():
> ...     x = math.sin(1.234)
> ...
> elapsed time is very small; consider using the timeit module for
> micro-timings of small code snippets
> time taken: 0.000014 seconds
>
> Look at the variation in the timing: 0.007164 versus 0.000014 second. That's
> the influence of a cache, or more than one cache, somewhere. But if I run
> it again:
>
> py> with Stopwatch():
> ...     x = math.sin(1.234)
> ...
> elapsed time is very small; consider using the timeit module for
> micro-timings of small code snippets
> time taken: 0.000013 seconds
>
> there's a smaller variation, this time "only" 7%, for code which hasn't
> changed. That's what you're up against.


I tried this:

import math

def fn():
    for i in xrange(3000000):
        x=math.sin(1.234)
    print x

fn()

If I run this, my IDE tells me the whole thing took 0.93 seconds. That's 
including the overheads of the IDE, invoking python.exe, Python's 
start-up time, the function call, the loop overheads, and whatever is 
involved in shutting it all down again.


Now I replace the x=math.sin() line with pass. The IDE now says 0.21 
seconds. The only thing that's changed is the sin() call. I can probably 
deduce that executing x=math.sin(1.2340) three million times took 0.72 
seconds (with some expected variation).
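
The subtract-the-baseline approach described above can also be done inside the program rather than by reading the IDE's figures. A minimal sketch, assuming Python 3 (range rather than xrange) and a smaller loop count to keep the run short; the helper names are made up for illustration:

```python
import math
from timeit import default_timer

N = 300000  # fewer iterations than the three million above, to keep this quick


def with_sin():
    for i in range(N):
        x = math.sin(1.234)


def empty_loop():
    for i in range(N):
        pass


def elapsed(func):
    """Wall-clock seconds taken by one call of func."""
    start = default_timer()
    func()
    return default_timer() - start


total = elapsed(with_sin)
baseline = elapsed(empty_loop)
# Whatever remains after removing the loop overhead is attributed to sin().
print("sin() cost: about %.3f seconds for %d calls" % (total - baseline, N))
```

This leaves out interpreter start-up and shutdown, which the whole-program figures include, so the two methods will not agree exactly.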


So over 4 million times per second. Which is interesting because your 
timings with Stopwatch varied from 140 to 71000 per second (and I doubt 
your machine is that much slower than mine).




> [steve@ando ~]$ python2.7 -m timeit -s "x = 257" "3*x"
> 10000000 loops, best of 3: 0.106 usec per loop
> [steve@ando ~]$ python3.5 -m timeit -s "x = 257" "3*x"
> 10000000 loops, best of 3: 0.137 usec per loop
>
> That's *brilliant* and much simpler than anything you are doing with loops
> and clocks and whatnot.


But you lose all context. And there is a limit to what you can do with 
such micro-benchmarks.


Sometimes you want to time a real task, but with one line substituted 
for another, or some other minor rearrangement.


Or perhaps the overall time of a code fragment depends on the mix of 
data it has to deal with.



>> Code will normally exist as a proper part of a module, not on the
>> command line, in a command history, or in a string, so why not test it
>> running inside a module?
>
> Sure, you can do that, if you want potentially inaccurate results.


When your customer runs your application (and complains about how long 
it takes), they will be running the code as normal not inside a string 
or on the command line!



> In this specific example, the OP is comparing two radically different pieces
> of code that clearly and obviously perform differently. He's doing the
> equivalent of timing the code with his heartbeat, and getting 50 beats for
> one and 150 beats for the other. That's good enough to show gross
> differences in performance.


No, he's using time.clock() with, presumably, a consistent number of 
ticks per second.



> But often you're comparing two code snippets which are very nearly the same,
> and trying to tease out a real difference of (say) 3% out of a noisy signal
> where each run may differ by 10% just from randomness. Using your heartbeat
> to time code is not going to do it.


Yes, exactly, that's why typing a code fragment on the command line is a 
waste of time. You need to design a test where the differences are going 
to be more obvious.


--
Bartc


Re: Numpy slow at vector cross product?

2016-11-22 Thread eryk sun
On Tue, Nov 22, 2016 at 1:06 PM, BartC  wrote:
>> In this specific example, the OP is comparing two radically different
>> pieces of code that clearly and obviously perform differently. He's doing
>> the equivalent of timing the code with his heartbeat, and getting 50 beats
>> for one and 150 beats for the other. That's good enough to show gross
>> differences in performance.
>
> No, he's using time.clock() with, presumably, a consistent number of ticks
> per second.

Note that some people in this discussion use Unix systems, on which
using time.clock for wall-clock timing is completely wrong. Please use
timeit.default_timer to ensure that examples are portable.
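
A minimal sketch of portable wall-clock timing along those lines (the helper name time_call, the workload and the repeat count are made up for illustration):

```python
import math
from timeit import default_timer


def time_call(func, repeat=5):
    """Return the best wall-clock time, in seconds, over `repeat` calls."""
    best = float('inf')
    for _ in range(repeat):
        start = default_timer()
        func()
        best = min(best, default_timer() - start)
    return best


def work():
    for _ in range(100000):
        math.sin(1.234)


best_time = time_call(work)
print("best of 5: %.6f seconds" % best_time)
```

Taking the minimum of several runs is one common way to reduce the effect of other processes; it is a choice made for this sketch, not the only reasonable one.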


Re: Numpy slow at vector cross product?

2016-11-22 Thread BartC

On 22/11/2016 12:45, BartC wrote:
> On 22/11/2016 12:34, Skip Montanaro wrote:
>>> I'm simply suggesting there is plenty of room for improvement. I even
>>> showed a version that did *exactly* what numpy does (AFAIK) that was
>>> three times the speed of numpy even executed by CPython. So there is
>>> some mystery there.
>>
>> As I indicated in my earlier response, your version doesn't pass all of
>> numpy's cross product unit tests. Fix that and submit a patch to the
>> numpy maintainers. I suspect it would be accepted.
>
> I saw your response but didn't understand it. My code was based around
> what Peter Otten posted from numpy sources.
>
> I will have a look. Don't forget however that all someone is trying to
> do is to multiply two vectors. They're not interested in axes
> transformation or making them broadcastable, whatever that means.


It seems the posted code of numpy.cross wasn't complete. The full code 
is below (I've removed the doc string, but have had to add some 'np.' 
prefixes as it is now out of context).


If anyone is still wondering why numpy.cross is slow, then no further 
comments are necessary!


--
def cross(a, b, axisa=-1, axisb=-1, axisc=-1, axis=None):

    if axis is not None:
        axisa, axisb, axisc = (axis,) * 3
    a = np.asarray(a)
    b = np.asarray(b)
    # Check axisa and axisb are within bounds
    axis_msg = "'axis{0}' out of bounds"
    if axisa < -a.ndim or axisa >= a.ndim:
        raise ValueError(axis_msg.format('a'))
    if axisb < -b.ndim or axisb >= b.ndim:
        raise ValueError(axis_msg.format('b'))
    # Move working axis to the end of the shape
    a = np.rollaxis(a, axisa, a.ndim)
    b = np.rollaxis(b, axisb, b.ndim)
    msg = ("incompatible dimensions for cross product\n"
           "(dimension must be 2 or 3)")
    if a.shape[-1] not in (2, 3) or b.shape[-1] not in (2, 3):
        raise ValueError(msg)

    # Create the output array
    shape = np.broadcast(a[..., 0], b[..., 0]).shape
    if a.shape[-1] == 3 or b.shape[-1] == 3:
        shape += (3,)
        # Check axisc is within bounds
        if axisc < -len(shape) or axisc >= len(shape):
            raise ValueError(axis_msg.format('c'))
    dtype = np.promote_types(a.dtype, b.dtype)
    cp = np.empty(shape, dtype)

    # create local aliases for readability
    a0 = a[..., 0]
    a1 = a[..., 1]
    if a.shape[-1] == 3:
        a2 = a[..., 2]
    b0 = b[..., 0]
    b1 = b[..., 1]
    if b.shape[-1] == 3:
        b2 = b[..., 2]
    if cp.ndim != 0 and cp.shape[-1] == 3:
        cp0 = cp[..., 0]
        cp1 = cp[..., 1]
        cp2 = cp[..., 2]

    if a.shape[-1] == 2:
        if b.shape[-1] == 2:
            # a0 * b1 - a1 * b0
            np.multiply(a0, b1, out=cp)
            cp -= a1 * b0
            return cp
        else:
            assert b.shape[-1] == 3
            # cp0 = a1 * b2 - 0  (a2 = 0)
            # cp1 = 0 - a0 * b2  (a2 = 0)
            # cp2 = a0 * b1 - a1 * b0
            np.multiply(a1, b2, out=cp0)
            np.multiply(a0, b2, out=cp1)
            np.negative(cp1, out=cp1)
            np.multiply(a0, b1, out=cp2)
            cp2 -= a1 * b0
    else:
        assert a.shape[-1] == 3
        if b.shape[-1] == 3:
            # cp0 = a1 * b2 - a2 * b1
            # cp1 = a2 * b0 - a0 * b2
            # cp2 = a0 * b1 - a1 * b0
            np.multiply(a1, b2, out=cp0)
            tmp = np.array(a2 * b1)
            cp0 -= tmp
            np.multiply(a2, b0, out=cp1)
            np.multiply(a0, b2, out=tmp)
            cp1 -= tmp
            np.multiply(a0, b1, out=cp2)
            np.multiply(a1, b0, out=tmp)
            cp2 -= tmp
        else:
            assert b.shape[-1] == 2
            # cp0 = 0 - a2 * b1  (b2 = 0)
            # cp1 = a2 * b0 - 0  (b2 = 0)
            # cp2 = a0 * b1 - a1 * b0
            np.multiply(a2, b1, out=cp0)
            np.negative(cp0, out=cp0)
            np.multiply(a2, b0, out=cp1)
            np.multiply(a0, b1, out=cp2)
            cp2 -= a1 * b0

    # This works because we are moving the last axis
    return np.rollaxis(cp, -1, axisc)


--
Bartc


How to you convert list of tuples to string

2016-11-22 Thread Ganesh Pal
Dear friends ,


I am using Fedora 18 with Python 2.7.


I have a list of tuples as shown below

>>> list

[(1, 1, 373891072L, 8192), (1, 3, 390348800L, 8192), (1, 4, 372719616L,
8192), (2, 3, 382140416L, 8192), (2, 5, 398721024L, 8192), (3, 1,
374030336L, 8192), (3, 3, 374079488L, 8192), (3, 5, 340058112L, 8192)]

(a) I need to select an element at random from the list, say (x, y, xL,
8192)

  >>> list
 [(1, 1, 373891072L, 8192), (1, 3, 390348800L, 8192), (1, 4,
372719616L, 8192), (2, 3, 382140416L, 8192), (2, 5, 398721024L, 8192), (3,
1, 374030336L, 8192), (3, 3, 374079488L, 8192), (3, 5, 340058112L, 8192)]


  >>> import random
  >>> i = random.randrange(len(list))
  >>> sel_item = list[i]
  >>> sel_item
  (3, 5, 340058112L, 8192)



(b) Then convert the selected item to the format below, i.e.
 1,1,373891072:8192 (strip the L and add a ':')

  >>> sel_item
  (3, 5, 340058112L, 8192)
  >>> c1 = ','.join(map(str, sel_item))

# What happened to the 'L'? It got stripped automatically. Will this be a
# problem?
  >>> c1
  '3,5,340058112,8192'
# The last four characters are always 8192
  >>> c1 = c1[0:-5] + ':8192'
  >>> c1
  '3,5,340058112:8192'
  >>>


Any suggestions to improve this piece of code and make it look more
Pythonic?


Regards,
Ganesh Pal


Re: Numpy slow at vector cross product?

2016-11-22 Thread Steve D'Aprano
On Tue, 22 Nov 2016 11:45 pm, BartC wrote:


> I will have a look. Don't forget however that all someone is trying to
> do is to multiply two vectors. They're not interested in axes
> transformation or making them broadcastable, whatever that means.

You don't know that.

Bart, you have a rather disagreeable tendency towards assuming that if you
don't want to do something, then nobody else could possibly want to do it.
You should try to keep an open mind to the possibility that perhaps there
are use-cases that you didn't think of. numpy is not *the* most heavily
used third-party library in the Python ecosystem because it's slow.

numpy is a library for doing vectorised operations over massive arrays.
You're thinking of numpy users doing the cross product of two vectors, but
you should be thinking of numpy users doing the cross product of a million
pairs of vectors.

Could numpy optimize the single pair of vectors case a bit better? Perhaps
they could. It looks like there's a bunch of minor improvements which could
be made to the code, a few micro-optimizations that shave a microsecond or
two off the execution time. And maybe they could even detect the case where
the arguments are a single pair of vectors, and optimize that. Even
replacing it with a naive pure-Python cross product would be a big win.

But for the big array of vectors case, you absolutely have to support doing
fast vectorized cross-products over a huge number of vectors.

py> a = np.array([
... [1, 2, 3],
... [4, 5, 6],
... [7, 8, 9],
... ]*1000
... )
py> b = np.array([
... [9, 8, 7],
... [6, 5, 4],
... [3, 2, 1],
... ]*1000
... )
py> a.shape
(3000, 3)
py> result = np.cross(a, b)
py> result.shape
(3000, 3)


On my computer, numpy took only 10 times longer to cross-multiply 3000 pairs
of vectors than it took to cross-multiply a single pair of vectors. If I
did that in pure Python, it would take 3000 times longer, or more, so numpy
wins here by a factor of 300.




-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Re: How to you convert list of tuples to string

2016-11-22 Thread Michiel Overtoom
Hi Ganesh,

> Any better suggestion to improve this piece of code and make it look more 
> pythonic?


import random

# A list of tuples. Note that the L behind a number means that the number
# is a 'long'.

data = [(1, 1, 373891072L, 8192), (1, 3, 390348800L, 8192), (1, 4, 372719616L,
8192), (2, 3, 382140416L, 8192), (2, 5, 398721024L, 8192), (3, 1,
374030336L, 8192), (3, 3, 374079488L, 8192), (3, 5, 340058112L, 8192)]

item = random.choice(data)  # Select a random item from the 'data' list.

msg = "%d,%d,%d:%d" % item  # Format it in the way you like.

print msg
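
On Python 3 there is no separate 'long' type and no L suffix, so the same approach might look like this (a sketch using a shortened copy of the list; str.format is just one of several ways to do the formatting):

```python
import random

data = [(1, 1, 373891072, 8192), (1, 3, 390348800, 8192),
        (2, 5, 398721024, 8192), (3, 5, 340058112, 8192)]

item = random.choice(data)         # pick one tuple at random
msg = "{},{},{}:{}".format(*item)  # unpack the four fields into the template
print(msg)
```

Formatting the fields directly avoids slicing the joined string and hard-coding the last value.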


Greetings,



Re: Guido? Where are you?

2016-11-22 Thread Jon Ribbens
On 2016-11-22, Gilmeh Serda  wrote:
> On Mon, 21 Nov 2016 00:53:33 -0800, Ethan Furman wrote:
>> Unfortunately, we do not have any control over the comp.lang.python
>> newsgroup
>
> Gee, "unfortunately"? Really!? Gosh! I'm glad I don't have to live 
> anywhere close to you. 8·[
>
> NOBODY "owns" the groups, and rightfully so. If you want the groups to be 
> private, run your own server!

... or make them moderated, which I think they should do to this group,
so that the list mirroring works properly.


Re: Numpy slow at vector cross product?

2016-11-22 Thread BartC

On 22/11/2016 16:48, Steve D'Aprano wrote:
> On Tue, 22 Nov 2016 11:45 pm, BartC wrote:
>
>> I will have a look. Don't forget however that all someone is trying to
>> do is to multiply two vectors. They're not interested in axes
>> transformation or making them broadcastable, whatever that means.
>
> You don't know that.


When I did vector arithmetic at school, the formula for the 
cross-product of (x1,y1,z1) with (x2,y2,z2) was quite straightforward. 
This is what you expect given a library function to do the job. 
Especially in an implementation that could be crippled by running as 
unoptimised byte-code.


If someone was given the task of cross-multiplying two three-element 
vectors, the formula in the text-book (or Wikipedia) is what they would 
write.
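
For reference, that textbook formula written out as a plain Python function (a sketch with no input checking; the name cross3 is made up here):

```python
def cross3(a, b):
    """Cross product of two three-element sequences, returned as a tuple."""
    x1, y1, z1 = a
    x2, y2, z2 = b
    return (y1 * z2 - z1 * y2,
            z1 * x2 - x1 * z2,
            x1 * y2 - y1 * x2)


print(cross3((1, 2, 3), (9, 8, 7)))
```

It handles exactly one pair of three-element vectors, which is the case under discussion; it does none of the axis handling or broadcasting that numpy.cross provides.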



> Could numpy optimize the single pair of vectors case a bit better? Perhaps
> they could. It looks like there's a bunch of minor improvements which could
> be made to the code, a few micro-optimizations that shave a microsecond or
> two off the execution time. And maybe they could even detect the case where
> the arguments are a single pair of vectors, and optimize that. Even
> replacing it with a naive pure-Python cross product would be a big win.


The code it ends up executing for individual pairs of vectors is a joke. 
97% of what it does is nothing to do with the cross-product!



> But for the big array of vectors case, you absolutely have to support doing
> fast vectorized cross-products over a huge number of vectors.


That's how I expected numpy to work. I said something along those lines 
in my first post:


BC:
> Maybe numpy has extra overheads, and the arrays being operated on are
> very small, but even so, 30 times slower than CPython? (2.5 to 0.083
> seconds.)



> py> a = np.array([
> ... [1, 2, 3],
> ... [4, 5, 6],
> ... [7, 8, 9],
> ... ]*1000
> ... )
> py> b = np.array([
> ... [9, 8, 7],
> ... [6, 5, 4],
> ... [3, 2, 1],
> ... ]*1000
> ... )
> py> a.shape
> (3000, 3)
> py> result = np.cross(a, b)
> py> result.shape
> (3000, 3)
>
> On my computer, numpy took only 10 times longer to cross-multiply 3000 pairs
> of vectors than it took to cross-multiply a single pair of vectors. If I
> did that in pure Python, it would take 3000 times longer, or more, so numpy
> wins here by a factor of 300.


I tested this with 3 million pairs on my machine. It managed 8 million 
products per second (about the same as the simplest Python code, but run 
with pypy), taking nearly 0.4 seconds. Except it took over 4 seconds to 
create the special numpy arrays!


Setting up ordinary arrays was much faster, but then the cross-product 
calculation was slower.


--
bartc


Question about working with html entities in python 2 to use them as filenames

2016-11-22 Thread Steven Truppe

Hi all,


I'm using Linux and Python 2 and want to process a file line by line,
executing a command with each line (with os.system).


My problem now is that I'm opening the file and parsing the title, but I'm
not able to turn it into a normal filename:



import os,sys
import urlib,re,cgi
import HTMLParser, uincodedata
import htmlentiytdefs
imort chardet

for ULR in open('list.txt', "r").readlines():
    teste_egex="(.+?)
    patter = re.compile(these_regex)
    htmlfile=urlib.urlopen(URL)
    htmltext=htmlfile.read()
    title=re.aindall(pater, htmltext)[0]
    title = HTMLParser.HTMLParser.unescape(title)
    print "title = ", title
    # here i would like to create a directory named after the content of the title


I always get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2



I've played around with .ecode('latin-1') or ('utf8') but I was not yet
able to solve this simple issue.



Thanks in advance,

Truppe Steven



Re: Question about working with html entities in python 2 to use them as filenames

2016-11-22 Thread Steven Truppe

I've made a pastebin with a few examples: http://pastebin.com/QQQFhkRg







new python package: financial_life

2016-11-22 Thread Martin Pyka

Hi folks,

for all data scientists and especially financial analysts out there, 
this python package might be a useful resource:


https://github.com/MartinPyka/financial_life

With financial_life, monetary flows between different bank accounts can 
be simulated with a few lines of code. These simulations can help to get 
a deeper understanding of financial plans and a better comparison of 
financial products (in particular loan conditions) for personal 
circumstances.


You can
- analyse loan conditions and payment strategies
- create dynamic monetary flows between accounts for modeling more 
realistic scenarios

- extend the code by controller functions (e.g. for modeling tax payments)

I wrote it to analyse my own financial plans. Maybe, this package is 
also helpful for some of you.


Best,
Martin




Anyone needs a graphdb written in Python?

2016-11-22 Thread Amirouche Boubekki
Héllo,


I am working on a graphdb written in Python on top of wiredtiger.

Does anyone want to share ideas about where this could be made
useful?

TIA!


Re: Question about working with html entities in python 2 to use them as filenames

2016-11-22 Thread Lew Pitcher
On Tuesday November 22 2016 15:54, in comp.lang.python, "Steven Truppe"
 wrote:

> I've made a pastebin with a few examples: http://pastebin.com/QQQFhkRg
> 
> 
> 
> On 2016-11-22 21:33, Steven Truppe wrote:
>> I all,
>>
>>
>> i'm using linux and python 2 and want to parse a file line by line by
>> executing a command with the line (with os.system).
>>
>> My problem now is that i'm opening the file and parse the title but
>> i'm not able to get it into a normal filename:
>>
>>
>> import os,sys
>>
>> import urlib,re,cgi
>>
>> import HTMLParser, uincodedata
>>
>> import htmlentiytdefs
>>
>> imort chardet
>>
>> for ULR in open('list.txt', "r").readlines():
>>
>> teste_egex="(.+?)
>>
>> patter = re.compile(these_regex)
>>
>> htmlfile=urlib.urlopen(URL)
>>
>> htmltext=htmlfile.read()
>>
>> title=re.aindall(pater, htmltext)[0]
>>
>> title = HTMLParser.HTMLParser.unescape(title)
>>
>> print "title = ", title
>>
>> # here i would like to create a directory named after the content of
>> the title
>>
>>
>> I allways get this error:
>>
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2
>>
>>
>>
>> i've played around with .ecode('latin-1') or ('utf8') but i was not
>> yet able to sove this simple issue.

I'm no python programmer, but I do have a couple of observations.

First, though, here's an extract from that pastebin posting of yours:
> print "Title = ", title.decode()
>  
> - RESULT 
> Title =  Wizo - Anderster Full Album - YouTube
> Title =  Wizo - Bleib Tapfer / für'n Arsch Full Album - YouTube
> Title =  WIZO - Uuaarrgh Full Album - YouTube
> Title =  WIZO - Full Album - "Punk gibt's nicht umsonst! (Teill
> III)" - YouTube
> Title =  WIZO - Full Album - "DER" - YouTube
> Title =  Alarmsignal -  Wir leben - YouTube
> Title =  the Pogues - Body of an american - YouTube
> Title =  The Pogues -  The band played waltzing matilda - YouTube
> Title =  Hey Rote Zora - Heiter bis Wolkig - YouTube
> Title =  Für immer Punk - die goldenen Zitronen - YouTube
> Title =  Fuckin' Faces - Krieg und Frieden - YouTube
> Title =  Sluts - Anders - YouTube
> Title =  Absturz - Es ist schön ein Punk zu sein - YouTube
> Title =  Broilers - Ruby Light & Dark - YouTube
> Title =  Less Than Jake 02 - My Very Own Flag - YouTube
> Title =  The Mighty Mighty Bosstones - The Impression That I Get - YouTube
> Title =  Streetlight Manifesto - Failing Flailing (lyrics) - YouTube
> Title =  Mustard Plug - Mr. Smiley - YouTube
>  
> But when i try:
> os.mkdir(title)
> i get the following:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23:
> ordinal not in range(128) 

Now for the observations

1) some of your titles contain the '/' character, which on your platform
(Linux) is taken as a path separator character. The os.mkdir() function
apparently expects its "path" argument to name a directory to be created
inside an already existing directory. That is to say, if path is "/a/b/c",
then os.mkdir() expects that the directory /a/b will already exist. Those
titles that contain the path separator character will cause os.mkdir() to
attempt to create a directory in a subdirectory of the current directory,
and that subdirectory doesn't exist yet. You either have to sanitize your
input to remove the path separators, and use os.mkdir() to create a
directory named with the sanitized title, /or/ use os.makedirs(), which
will create all the subdirectories required by your given path.

2) Apparently os.mkdir() (at least) defaults to requiring an ASCII pathname.
Those of your titles that contain Unicode characters cannot be stored
verbatim without either
  a) re-encoding the title in ASCII, or
  b) flagging to os.mkdir() that Unicode is acceptable.
Apparently, this is a common problem; a google search brought up several pages
dedicated to answering this question, including one extensive paper on the
topic (http://nedbatchelder.com/text/unipain.html). There apparently are ways
to cause os.mkdir() to accept Unicode inputs; their effectiveness and
side-effects are beyond me.
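
As a rough sketch of the first option (sanitizing the title before creating the directory), written for Python 3, where str is Unicode and the ASCII-default question does not arise in the same way. The helper name title_to_dirname, the sample title with its entities, and the choice of '_' as the replacement character are assumptions for illustration, not something taken from the original post:

```python
import html
import os
import tempfile


def title_to_dirname(title):
    """Unescape HTML entities and replace path separators in a page title."""
    name = html.unescape(title).strip()
    # '/' would be read as a path separator, so swap it for a harmless character.
    return name.replace(os.sep, '_').replace('/', '_')


title = "Wizo - Bleib Tapfer / f&uuml;r&#39;n Arsch Full Album - YouTube"
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, title_to_dirname(title))
    os.mkdir(target)  # one new directory directly under an existing parent
    print(os.path.isdir(target))
```

The temporary directory is only there so the sketch cleans up after itself; real code would pick its own parent directory.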

HTH
-- 
Lew Pitcher
"In Skills, We Trust"
PGP public key available upon request



Re: Question about working with html entities in python 2 to use them as filenames

2016-11-22 Thread Steve D'Aprano
On Wed, 23 Nov 2016 07:33 am, Steven Truppe wrote:

> imort chardet

That's not working Python code.

Steven, you have asked us to help you with some code. For us to do that, we
need to see the ACTUAL code you are running, not some other code which is
full of typos and may be very different from what is actually being run.

Don't re-type your program, copy and paste it. And make sure that it is the
code that does what you say it does. We're volunteers, and it isn't very
nice to have us waste our time trying to fix your code only for you to then
later say "oh sorry, that was the wrong code".


> # here i would like to create a directory named after the content of the
> # title

This appears to be the critical code: you're saying you would like to create
a directory, but don't show us the code that creates the directory!

We're very clever, but we cannot read your mind.


> I allways get this error:
> 
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2

Are we supposed to guess where you get that error? Python gives you lots of
excellent debugging information: the traceback. The traceback shows you
which line of code fails with that error, and the full list of lines of
code calling it.

Please COPY and PASTE (don't re-type, don't summarise, don't simplify, and
especially don't take a screen shot) the entire traceback, starting from
the line beginning "Traceback" and going to the final error message.





-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Re: Question about working with html entities in python 2 to use them as filenames

2016-11-22 Thread Steve D'Aprano
On Wed, 23 Nov 2016 07:54 am, Steven Truppe wrote:

> I've made a pastebin with a few examples: http://pastebin.com/QQQFhkRg

Your pastebin appears to be the same code as you've shown here. And, again,
it doesn't seem to be the actual code you are really running.

The only new or helpful information is that you're trying to call

os.mkdir(title)

where title is, well, we have to guess, because you don't tell us.

My *guess* is that the failure happens when processing the title:

"Wizo - Bleib Tapfer / für'n Arsch Full Album - YouTube"

but since we really don't know the actual code you are using, we have to
guess.

My guess is that you have the byte string:

title = "Wizo - Bleib Tapfer / f\xc3\xbcr'n Arsch Full Album - YouTube"

Notice the \xc3 byte? You can check this by printing the repr() of the
title:

py> print repr(title)
"Wizo - Bleib Tapfer / f\xc3\xbcr'n Arsch Full Album - YouTube"



When you try title.decode(), it fails:

py> print title.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23:
ordinal not in range(128)



But if you tell it to use UTF-8, it succeeds:

py> print title.decode('utf-8')
Wizo - Bleib Tapfer / für'n Arsch Full Album - YouTube



So my guess is that you're trying to create the directory like this:


os.mkdir(title.decode())


and getting the same error. You should try:

os.mkdir(title.decode('utf-8'))

which will at least give you a new error: you cannot use '/' inside a
directory name. So you can start by doing this:


os.mkdir(title.replace('/', '-').decode('utf-8'))


and see what happens.


Beyond that, I cannot guess what you need to do to fix the code I haven't
seen.
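
To make the guess above concrete, here is a small sketch that reproduces
the failing and the working decode without touching the file system. The
byte string is the guessed title from above; the names raw, failed_at and
safe_name are only for illustration. The b"" prefix makes it behave the
same way under Python 2.7 and Python 3:

```python
# The guessed title as raw bytes (UTF-8 encoded).
raw = b"Wizo - Bleib Tapfer / f\xc3\xbcr'n Arsch Full Album - YouTube"

failed_at = None
try:
    raw.decode('ascii')          # ascii cannot represent byte 0xc3
except UnicodeDecodeError as exc:
    failed_at = exc.start        # index of the first undecodable byte

text = raw.decode('utf-8')       # decoding with the right codec works
safe_name = text.replace('/', '-')   # no path separator left in the name
```

The value of failed_at matches the "position 23" in the traceback, which
is a hint that this title (or one very like it) is the one that fails.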



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Re: Question about working with html entities in python 2 to use them as filenames

2016-11-22 Thread Steve D'Aprano
On Wed, 23 Nov 2016 09:00 am, Lew Pitcher wrote:

> 2) Apparently os.mkdir() (at least) defaults to requiring an ASCII
> pathname. 

No, you have misinterpreted what you have seen.

Even in Python 2, os.mkdir will accept a Unicode argument. You just have to
make sure it is given as unicode:

os.mkdir(u'/tmp/für')

Notice the u' delimiter instead of the ordinary ' delimiter? That tells
Python to use a unicode (text) string instead of an ascii byte-string.

If you don't remember the u' delimiter, and write an ordinary byte-string '
delimiter, then the result you get will depend on some combination of your
operating system, the source code encoding, and Python's best guess of what
you mean.

os.mkdir('/tmp/für')  # don't do this!

*might* work, if all the factors align correctly, but often won't. And when
it doesn't, the failure can be extremely mysterious, usually involving a
spurious 

UnicodeDecodeError: 'ascii' codec

error.

Dealing with Unicode text is much simpler in Python 3. Dealing with
*unknown* encodings is never easy, but so long as you can stick with
Unicode and UTF-8, Python 3 makes it easy. 
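
As a rough illustration of the Python 3 case, here is a sketch that assumes
the file system encoding is UTF-8 (the usual situation on a current Linux
or Mac). The directory is created inside a temporary directory rather than
/tmp directly, and the variable names are only for the example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    name = 'für'                     # an ordinary str is Unicode text in Python 3
    path = os.path.join(base, name)
    os.mkdir(path)                   # no u'' prefix or decode() needed
    created = os.path.isdir(path)

# The bytes only appear when you ask for them explicitly.
utf8_bytes = name.encode('utf-8')
```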




-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Unexpected PendingDeprecationWarning

2016-11-22 Thread Nathan Ernst
I'm using Python 3.5.2, and the following code (when invoked) causes a
PendingDeprecationWarning when used in a unit test:

def identity(x):
  return x

def adjacent_difference(seq, selector=identity):
  i = iter(seq)
  l = selector(next(i))
  while True:
    r = selector(next(i))
    yield r - l
    l = r

I wrote this to mimic the C++ std algorithm (defined here:
http://en.cppreference.com/w/cpp/algorithm/adjacent_difference).

What I don't understand is why I get this warning.

The exact error message I get from unittest is:
PendingDeprecationWarning: generator 'adjacent_difference' raised
StopIteration

I'd appreciate any insight into what is causing this deprecation warning,
as I am stumped.

Regards,
Nate


Re: Unexpected PendingDeprecationWarning

2016-11-22 Thread MRAB

On 2016-11-23 02:50, Nathan Ernst wrote:
> I'm using Python 3.5.2, and the following code (when invoked) causes a
> PendingDeprecationWarning when used in a unit test:
>
> def identity(x):
>   return x
>
> def adjacent_difference(seq, selector=identity):
>   i = iter(seq)
>   l = selector(next(i))
>   while True:
>     r = selector(next(i))
>     yield r - l
>     l = r
>
> I wrote this to mimic the C++ std algorithm (defined here:
> http://en.cppreference.com/w/cpp/algorithm/adjacent_difference).
>
> What I don't understand is why I get this warning.
>
> The exact error message I get from unittest is:
> PendingDeprecationWarning: generator 'adjacent_difference' raised
> StopIteration
>
> I'd appreciate any insight into what is causing this deprecation warning,
> as I am stumped.

The 'while' loop keeps calling next(i) until it raises StopIteration, 
and that kind of behaviour can hide obscure bugs.


If you want to know the details, you can read the rationale behind the 
change in the relevant PEP:


PEP 479 -- Change StopIteration handling inside generators
https://www.python.org/dev/peps/pep-0479/
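
As a sketch of what the PEP means in practice: the code below is the
generator from the original post, plus a made-up consume() helper that
reports what happens when it runs off the end of the input. On Python 3.5
and 3.6 you get the differences (and the warning); once the PEP 479
behaviour is the default (Python 3.7 and later) the leaked StopIteration is
turned into a RuntimeError instead.

```python
def identity(x):
    return x

def adjacent_difference(seq, selector=identity):
    i = iter(seq)
    l = selector(next(i))
    while True:
        r = selector(next(i))   # StopIteration leaks out of here at the end
        yield r - l
        l = r

def consume(seq):
    # Hypothetical helper: collect the results, or describe the failure.
    try:
        return list(adjacent_difference(seq))
    except RuntimeError as exc:
        return "RuntimeError: %s" % exc

outcome = consume([1, 3, 6])
```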



Re: Unexpected PendingDeprecationWarning

2016-11-22 Thread Nathan Ernst
Thanks,

I was not aware of that PEP.

The logic in my function is exactly as desired, so to squelch the warning,
I merely wrapped the iteration in a try/except:

def adjacent_difference(seq, selector=identity):
  i = iter(seq)
  l = selector(next(i))
  try:
    while True:
      r = selector(next(i))
      yield r - l
      l = r
  except StopIteration:
    return


On Tue, Nov 22, 2016 at 9:02 PM, MRAB  wrote:

> On 2016-11-23 02:50, Nathan Ernst wrote:
>
>> I'm using Python 3.5.2, and the following code (when invoked) causes a
>> PendingDeprecationWarning when used in a unit test:
>>
>> def identity(x):
>>   return x
>>
>> def adjacent_difference(seq, selector=identity):
>>   i = iter(seq)
>>   l = selector(next(i))
>>   while True:
>>     r = selector(next(i))
>>     yield r - l
>>     l = r
>>
>> I wrote this to mimic the C++ std algorithm (defined here:
>> http://en.cppreference.com/w/cpp/algorithm/adjacent_difference).
>>
>> What I don't understand is why I get this warning.
>>
>> The exact error message I get from unittest is:
>> PendingDeprecationWarning: generator 'adjacent_difference' raised
>> StopIteration
>>
>> I'd appreciate any insight into what is causing this deprecation warning,
>> as I am stumped.
>>
> The 'while' loop keeps calling next(i) until it raises StopIteration, and
> that kind of behaviour can hide obscure bugs.
>
> If you want to know the details, you can read the rationale behind the
> change in the relevant PEP:
>
> PEP 479 -- Change StopIteration handling inside generators
> https://www.python.org/dev/peps/pep-0479/
>


Re: Unexpected PendingDeprecationWarning

2016-11-22 Thread Chris Angelico
On Wed, Nov 23, 2016 at 1:50 PM, Nathan Ernst  wrote:
> I'm using Python 3.5.2, and the following code (when invoked) causes a
> PendingDeprecationWarning when used in a unit test:
>
> def identity(x):
>   return x
>
> def adjacent_difference(seq, selector=identity):
>   i = iter(seq)
>   l = selector(next(i))
>   while True:
>     r = selector(next(i))
>     yield r - l
>     l = r
>
> I wrote this to mimic the C++ std algorithm (defined here:
> http://en.cppreference.com/w/cpp/algorithm/adjacent_difference).
>
> What I don't understand is why I get this warning.
>
> The exact error message I get from unittest is:
> PendingDeprecationWarning: generator 'adjacent_difference' raised
> StopIteration
>
> I'd appreciate any insight into what is causing this deprecation warning,
> as I am stumped.

It's because there are some extremely confusing possibilities when a
generator raises StopIteration. In the example you post above, you're
probably expecting the behaviour you do indeed get, but there are
other ways of writing generators that would be a lot more surprising,
particularly when you refactor the generator a bit. The future
behaviour is much simpler to explain: *any* exception is an error, and
the way to cleanly terminate the generator is to return from it.

Here's how you'd write that generator for a post-3.6 world:

def adjacent_difference(seq, selector=identity):
  i = iter(seq)
  try: l = selector(next(i))
  except StopIteration: return
  for r in i:
    r = selector(r)
    yield r - l
    l = r

This is more explicit about the behaviour in the face of an empty
iterable (it will return without yielding any results), and uses the
most obvious way of stepping an iterator, namely a 'for' loop. You're
now iterating through 'i' and yielding stuff.

As a general rule, you should be able to replace 'yield x' with
'print(x)' and the code will do the same thing, only printing to the
console instead of being iterated over. Deliberately allowing
StopIteration to leak breaks that.

Here's the doc that set it all out:

https://www.python.org/dev/peps/pep-0479/

ChrisA


Re: Unexpected PendingDeprecationWarning

2016-11-22 Thread Chris Angelico
On Wed, Nov 23, 2016 at 2:14 PM, Nathan Ernst  wrote:
> I was not aware of that PEP.
>
> The logic in my function is exactly as desired, so to squelch the warning,
> I merely wrapped the iteration in a try/except:
>
> def adjacent_difference(seq, selector=identity):
>   i = iter(seq)
>   l = selector(next(i))
>   try:
>     while True:
>       r = selector(next(i))
>       yield r - l
>       l = r
>   except StopIteration:
>     return

You'll probably want to move the 'try' up one line - unless you want
an exception if the sequence is empty. And it's exactly this kind of
ambiguity that this change helps to catch.
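
For illustration, here is a sketch of the quoted function with the 'try'
moved up one line, so that the first next(i) is covered as well. This
assumes you want an empty (or one-element) input to produce no results
rather than an error:

```python
def identity(x):
    return x

def adjacent_difference(seq, selector=identity):
    i = iter(seq)
    try:
        l = selector(next(i))       # now inside the try: empty input is fine
        while True:
            r = selector(next(i))
            yield r - l
            l = r
    except StopIteration:
        return                      # the clean way to end a generator

empty = list(adjacent_difference([]))
single = list(adjacent_difference([5]))
several = list(adjacent_difference([1, 4, 9, 16]))
```

One thing to keep in mind with this layout is that the try block also
covers the selector() calls, so a StopIteration raised by the selector
itself would quietly end the generator too.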

ChrisA


Re: Unexpected PendingDeprecationWarning

2016-11-22 Thread Nathan Ernst
Thanks, ChrisA

On Tue, Nov 22, 2016 at 9:24 PM, Chris Angelico  wrote:

> On Wed, Nov 23, 2016 at 2:14 PM, Nathan Ernst 
> wrote:
> > I was not aware of that PEP.
> >
> > The logic in my function is exactly as desired, so to squelch the
> warning,
> > I merely wrapped the iteration in a try/except:
> >
> > def adjacent_difference(seq, selector=identity):
> >   i = iter(seq)
> >   l = selector(next(i))
> >   try:
> >     while True:
> >       r = selector(next(i))
> >       yield r - l
> >       l = r
> >   except StopIteration:
> >     return
>
> You'll probably want to move the 'try' up one line - unless you want
> an exception if the sequence is empty. And it's exactly this kind of
> ambiguity that this change helps to catch.
>
> ChrisA


Re: Question about working with html entities in python 2 to use them as filenames

2016-11-22 Thread Paul Rubin
Steven Truppe  writes:

> # here i would like to create a directory named after the content of
> # the title... I allways get this error:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2

The title has a 0xc3 byte in it, which would be à (capital A with tilde)
in Latin-1 but is more likely the first byte of a UTF-8 encoded character
such as ü.  Either way there is no corresponding ascii character, so you
can't decode that string as ascii.  You can decode it as utf8 or whatever,
but are you on an OS that recognizes utf8 in filenames?  Maybe you want to
transcode it somehow.  Otherwise you may be asking for trouble if some of
those html strings have control characters and stuff in them.
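
One possible way to do that transcoding, as a rough Python 3 sketch: the
ascii_filename() helper is made up for this example, and dropping accents
this way loses information, so it may or may not be what you want.

```python
import unicodedata

def ascii_filename(title):
    # Hypothetical helper: reduce a Unicode title to a plain ASCII name.
    # NFKD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize('NFKD', title)
    # Dropping everything non-ASCII removes the combining marks.
    ascii_only = decomposed.encode('ascii', 'ignore').decode('ascii')
    # Replace the path separator and drop control characters.
    cleaned = ''.join('-' if ch == '/' else ch
                      for ch in ascii_only if ch.isprintable())
    return cleaned.strip()

name = ascii_filename("Wizo - Bleib Tapfer / für'n Arsch\x07")
```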

Also, if you're scraping web pages, you may have an easier time with
BeautifulSoup (search web for it) than HTMLParser.