Re: New to PSF

2014-12-28 Thread prateek pandey
Yeah, I mean the Python Software Foundation. I am a developer and I want to
contribute. So, can you please help me get started?

Thanks

On Sunday, December 28, 2014 4:27:54 AM UTC+5:30, Steven D'Aprano wrote:
> prateek pandey wrote:
> 
> > Hey, I'm new to PSF. Can someone please help me in getting started.
> 
> 
> Can we have some context? What do you mean by PSF? The Python Software
> Foundation? Something else?
> 
> 
> -- 
> Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: New to PSF

2014-12-28 Thread Michiel Overtoom

On Dec 28, 2014, at 09:54, prateek pandey wrote:

> Yeah, I mean Python Software Foundation. I am a developer and I want to 
> contribute. So, Can you please help me in getting started ? 

https://www.python.org/psf/volunteer/

-- 
"You can't actually make computers run faster, you can only make them do less." 
- RiderOfGiraffes

-- 
https://mail.python.org/mailman/listinfo/python-list


CSV Error

2014-12-28 Thread JC
Hello,

I am trying to read a csv file using DictReader. I am getting error -

Traceback (most recent call last):
  File "", line 1, in
    r.fieldnames
  File "/usr/lib/python2.7/csv.py", line 90, in fieldnames
    self._fieldnames = self.reader.next()
ValueError: I/O operation on closed file

Here is my code in a Python shell -

>>> with open('x.csv','rb') as f:
...     r = csv.DictReader(f,delimiter=",")
>>> r.fieldnames

I have also tried opening the file in 'rU' and 'r' modes, but I still get
the above error.

Please help.
Thanks.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: CSV Error

2014-12-28 Thread Skip Montanaro
> ValueError: I/O operation on closed file
>
> Here is my code in a Python shell -
>
> >>> with open('x.csv','rb') as f:
> ...     r = csv.DictReader(f,delimiter=",")
> >>> r.fieldnames

The file is only open during the context of the with statement. Indent the
last line to match the assignment to r and you should be fine.
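
A minimal sketch of the point above, written for Python 3 (hence
newline="" rather than 'rb'). The sample file is a hypothetical stand-in
for x.csv. DictReader reads the header lazily, so fieldnames has to be
touched while the file is still open:

```python
import csv
import os
import tempfile

# Hypothetical stand-in for x.csv so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "x.csv")
with open(path, "w", newline="") as f:
    f.write("Foo,Bar\n1,2\n")

# Touching fieldnames inside the block reads the header while f is open.
with open(path, newline="") as f:
    r = csv.DictReader(f, delimiter=",")
    names = r.fieldnames

# If nothing was read inside the block, the first access happens after
# f has been closed, and the read fails.
with open(path, newline="") as f:
    lazy = csv.DictReader(f, delimiter=",")
try:
    lazy.fieldnames
    error = None
except ValueError as exc:
    error = str(exc)

print(names)
print(error)
```

Once the header has been read inside the block, fieldnames stays cached
on the reader and can still be inspected after the file is closed.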

Skip
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: CSV Error

2014-12-28 Thread Jussi Piitulainen
Skip Montanaro writes:

> > ValueError: I/O operation on closed file
> >
> > Here is my code in a Python shell -
> >
> > >>> with open('x.csv','rb') as f:
> > ...     r = csv.DictReader(f,delimiter=",")
> > >>> r.fieldnames
> 
> The file is only open during the context of the with statement.
> Indent the last line to match the assignment to r and you should be
> fine.

Or, don't use "with" when experimenting in the shell.

   >>> import csv
   >>> f = open('x.csv')
   >>> r = csv.DictReader(f, delimiter = ',')
   >>> r.fieldnames
   ['Foo', 'Bar']
   >>> 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: CSV Error

2014-12-28 Thread JC
On Sun, 28 Dec 2014 06:19:58 -0600, Skip Montanaro wrote:

>> ValueError: I/O operation on closed file
>>
>> Here is my code in a Python shell -
>>
>> >>> with open('x.csv','rb') as f:
>> ...     r = csv.DictReader(f,delimiter=",")
>> >>> r.fieldnames
> 
> The file is only open during the context of the with statement. Indent
> the last line to match the assignment to r and you should be fine.
> 
> Skip

I have indented the line. I am working in the shell. The error is still 
there.

Thanks.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: CSV Error

2014-12-28 Thread JC
On Sun, 28 Dec 2014 14:41:55 +0200, Jussi Piitulainen wrote:

> Skip Montanaro writes:
> 
>> > ValueError: I/O operation on closed file
>> >
>> > Here is my code in a Python shell -
>> >
>> > >>> with open('x.csv','rb') as f:
>> > ...     r = csv.DictReader(f,delimiter=",")
>> > >>> r.fieldnames
>> 
>> The file is only open during the context of the with statement. Indent
>> the last line to match the assignment to r and you should be fine.
> 
> Or, don't use "with" when experimenting in the shell.
> 
>    >>> import csv
>    >>> f = open('x.csv')
>    >>> r = csv.DictReader(f, delimiter = ',')
>    >>> r.fieldnames
>    ['Foo', 'Bar']
>    >>>

Yes, thanks. It's fixed.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: CSV Error

2014-12-28 Thread Skip Montanaro
Hmmm... Works for me.

% python
Python 2.7.6+ (2.7:db842f730432, May  9 2014, 23:53:26)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> with open("coconutBattery.csv", "rb") as f:
...     r = csv.DictReader(f)
...     x = r.fieldnames
...
autoloading csv
>>> x
['date', 'capacity', 'loadcycles']

(Ignore the "autoloading" message. I use an autoloader in interactive
mode which comes in handy when I forget to import a module, as I did
here.)

It also works without assigning r.fieldnames to a new variable:

>>> with open("coconutBattery.csv", "rb") as f:
...     r = csv.DictReader(f)
...     r.fieldnames
...
['date', 'capacity', 'loadcycles']
>>> r.fieldnames
['date', 'capacity', 'loadcycles']

I think you're going to have to paste another example session to show
us what you might have done differently.

Skip
-- 
https://mail.python.org/mailman/listinfo/python-list


Autoloader (was Re: CSV Error)

2014-12-28 Thread Chris Angelico
On Mon, Dec 29, 2014 at 12:58 AM, Skip Montanaro
 wrote:
> (Ignore the "autoloading" message. I use an autoloader in interactive
> mode which comes in handy when I forget to import a module, as I did
> here.)

We were discussing something along these lines a while ago, and I
never saw anything truly satisfactory - there's no easy way to handle
a missing name by returning a value (comparably to __getattr__), you
have to catch it and then try to re-execute the failing code, which
isn't perfect. How does yours work? Or was it one of the ones that was
mentioned last time?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Autoloader (was Re: CSV Error)

2014-12-28 Thread Skip Montanaro
> We were discussing something along these lines a while ago, and I
> never saw anything truly satisfactory - there's no easy way to handle
> a missing name by returning a value (comparably to __getattr__), you
> have to catch it and then try to re-execute the failing code, which
> isn't perfect. How does yours work? Or was it one of the ones that was
> mentioned last time?

Just like that. I've attached a copy. As you said, I'm sure it's not
perfect, but it's handy in precisely those interactive interpreter
cases when *dope slap* you forgot to import a standard module before
launching into a block of code.

Skip
"""
autoload - load common symbols automatically on demand

When a NameError is raised attempt to find the name in a couple places.
Check to see if it's a name in a list of commonly used modules.  If it's
found, import the name.  If it's not in the common names try importing it.
In either case (assuming the imports succeed), reexecute the code in the
original context.
"""

import sys, traceback, re

_common = {}
# order important - most important needs to be last - os.path is chosen over
# sys.path for example
for mod in "sys os math xmlrpclib".split():
    m = __import__(mod)
    try:
        names = m.__all__
    except AttributeError:
        names = dir(m)
    names = [n for n in names if not n.startswith("_") and n.upper() != n]
    for n in names:
        _common[n] = mod

def _exec(import_stmt, tb):
    f_locals = tb.tb_frame.f_locals
    f_globals = tb.tb_frame.f_globals
    sys.excepthook = _eh
    try:
        exec import_stmt in f_locals, f_globals
        exec tb.tb_frame.f_code in f_locals, f_globals
    finally:
        sys.excepthook = _autoload_exc

def _autoload_exc(ty, va, tb):
    ##if ty != ImportError:
    ##    traceback.print_exception(ty, va, tb)
    ##    return
    mat = re.search("name '([^']*)' is not defined", va.args[0])
    if mat is not None:
        name = mat.group(1)
        if name in _common:
            mod = _common[name]
            print >> sys.stderr, "found", name, "in", mod, "module"
            _exec("from %s import %s" % (mod, name), tb)
        else:
            print >> sys.stderr, "autoloading", name
            _exec("import %s" % name, tb)
    else:
        traceback.print_exception(ty, va, tb)

_eh = sys.excepthook
sys.excepthook = _autoload_exc
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Autoloader (was Re: CSV Error)

2014-12-28 Thread Chris Angelico
On Mon, Dec 29, 2014 at 1:15 AM, Skip Montanaro
 wrote:
>> We were discussing something along these lines a while ago, and I
>> never saw anything truly satisfactory - there's no easy way to handle
>> a missing name by returning a value (comparably to __getattr__), you
>> have to catch it and then try to re-execute the failing code, which
>> isn't perfect. How does yours work? Or was it one of the ones that was
>> mentioned last time?
>
> Just like that. I've attached a copy. As you said, I'm sure it's not
> perfect, but it's handy in precisely those interactive interpreter
> cases when *dope slap* you forgot to import a standard module before
> launching into a block of code.

Right, so its primary imperfection is that it potentially re-executes
a block of code that had partially succeeded. Still of value, but
definitely has its dangers.
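
A rough Python 3 sketch of that danger. This is not Skip's hook: the
helper run_with_autoload is a hypothetical stand-in that catches the
NameError directly instead of going through sys.excepthook, but it uses
the same catch, import, rerun strategy:

```python
import importlib
import re


def run_with_autoload(source, namespace):
    """Run source; on NameError, import the missing name and rerun it all."""
    while True:
        try:
            exec(source, namespace)
            return
        except NameError as exc:
            mat = re.search(r"name '([^']*)' is not defined", str(exc))
            if mat is None:
                raise
            name = mat.group(1)
            try:
                namespace[name] = importlib.import_module(name)
            except ImportError:
                # Not a module either: report the original NameError.
                raise exc


ns = {"log": []}
run_with_autoload("log.append('ran')\nroot = math.sqrt(16)", ns)
print(ns["root"])
print(ns["log"])
```

The append runs twice, once before the NameError and once on the rerun,
so any side effect that happened before the failing line is repeated.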

I wonder how hard it would be to tinker at the C level and add a
__getattr__ style of hook...

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Autoloader (was Re: CSV Error)

2014-12-28 Thread Chris Angelico
On Mon, Dec 29, 2014 at 1:22 AM, Chris Angelico  wrote:
> I wonder how hard it would be to tinker at the C level and add a
> __getattr__ style of hook...

You know what, it's not that hard. It looks largeish as there are four
places where NameError (not counting UnboundLocalError, which I'm not
touching) can be raised - LOAD_GLOBAL and LOAD_NAME, both of which
have a fast path for the normal case and a fall-back for when
globals/builtins isn't a dict; but refactoring it into a helper
function keeps it looking reasonable.

Once that's coded in, all you need is:

def try_import(n):
    try: return __import__(n)
    except ImportError: raise NameError("Name %r is not defined"%n)
import sys
sys.__getglobal__ = try_import

and then any unknown name will be imported, if available, and
returned. It's just like __getattr__: if it returns something, it's as
if the name pointed to that thing, otherwise it raises NameError.

Is anyone else interested in the patch? Should I create a tracker
issue and upload it?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Autoloader (was Re: CSV Error)

2014-12-28 Thread Chris Angelico
On Mon, Dec 29, 2014 at 2:38 AM, Chris Angelico  wrote:
> It's just like __getattr__: if it returns something, it's as
> if the name pointed to that thing, otherwise it raises NameError.

To clarify: The C-level patch has nothing about imports. What it does
is add a hook at the point where NameError is about to be raised,
allowing a Python function (stuffed into sys.__getglobal__) to control
what happens.

I do *not* recommend this for application code, and I would strongly
discourage it for library code, but it's handy for interactive work.
Like with Skip's hook, you could have a specific set of "from" imports
supported as well - here's a port of that script that uses this hook
instead:

"""
autoload - load common symbols automatically on demand

When a NameError is raised attempt to find the name in a couple places.
Check to see if it's a name in a list of commonly used modules.  If it's
found, import the name.  If it's not in the common names try importing it.
In either case (assuming the imports succeed), reexecute the code in the
original context.
"""

import sys

_common = {}
# order important - most important needs to be last - os.path is chosen over
# sys.path for example
for mod in "sys os math xmlrpclib".split():
    m = __import__(mod)
    try:
        names = m.__all__
    except AttributeError:
        names = dir(m)
    names = [n for n in names if not n.startswith("_") and n.upper() != n]
    for n in names:
        _common[n] = m

def _autoload_exc(name):
    if name in _common:
        return getattr(_common[name], name)
    else:
        return __import__(name)

sys.__getglobal__ = _autoload_exc

-- cut --

Note that I've removed the print-to-stderr when something gets
auto-imported. This is because the original hook inserted something
into the namespace, but this one doesn't; every time you reference
"exp", it'll look it up afresh from the math module, so it'd keep
spamming you with messages.
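
sys.__getglobal__ exists only with the patch applied, so the script above
cannot be tried on a stock interpreter. As a loose analogue (an
assumption for illustration, not what the patch does internally), a dict
subclass with __missing__ passed as the locals of exec shows the same
"look it up afresh every time" behaviour in Python 3. AutoImportNamespace
is a hypothetical name:

```python
import importlib


class AutoImportNamespace(dict):
    """Namespace that resolves unknown names by importing a module."""

    def __missing__(self, name):
        try:
            return importlib.import_module(name)
        except ImportError:
            # Let the lookup fall through to globals and builtins.
            raise KeyError(name)


ns = AutoImportNamespace()
exec("area = math.floor(2.5)\nsize = len('abc')", {}, ns)
print(ns["area"], ns["size"])
print("math" in ns)
```

Nothing is stored under "math", so every reference goes through
__missing__ again. This only affects module-level code run through exec;
names used inside functions defined in that code are looked up as
globals and bypass the mapping.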

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Autoloader (was Re: CSV Error)

2014-12-28 Thread Mark Lawrence

On 28/12/2014 15:38, Chris Angelico wrote:
> On Mon, Dec 29, 2014 at 1:22 AM, Chris Angelico  wrote:
>> I wonder how hard it would be to tinker at the C level and add a
>> __getattr__ style of hook...
>
> You know what, it's not that hard. It looks largeish as there are four
> places where NameError (not counting UnboundLocalError, which I'm not
> touching) can be raised - LOAD_GLOBAL and LOAD_NAME, both of which
> have a fast path for the normal case and a fall-back for when
> globals/builtins isn't a dict; but refactoring it into a helper
> function keeps it looking reasonable.
>
> Once that's coded in, all you need is:
>
> def try_import(n):
>     try: return __import__(n)
>     except ImportError: raise NameError("Name %r is not defined"%n)
> import sys
> sys.__getglobal__ = try_import
>
> and then any unknown name will be imported, if available, and
> returned. It's just like __getattr__: if it returns something, it's as
> if the name pointed to that thing, otherwise it raises NameError.
>
> Is anyone else interested in the patch? Should I create a tracker
> issue and upload it?
>
> ChrisA

I'd raise a tracker issue so it's easier to find in the future.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Re: Autoloader (was Re: CSV Error)

2014-12-28 Thread Chris Angelico
On Mon, Dec 29, 2014 at 3:14 AM, Mark Lawrence  wrote:
>> Is anyone else interested in the patch? Should I create a tracker
>> issue and upload it?
>
> I'd raise a tracker issue so it's easier to find in the future.

http://bugs.python.org/issue23126

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Searching through more than one file.

2014-12-28 Thread Seymore4Head
I need to search through a directory of text files for a string.
Here is a short program I made in the past to search through a single
text file for a line of text.

How can I modify the code to search through a directory of files that
have different filenames, but the same extension?

fname = raw_input("Enter file name: ")  #"*.txt"
fh = open(fname)
lst = list()
biglst=[]
for line in fh:
    line=line.rstrip()
    line=line.split()
    biglst+=line
final=[]
for out in biglst:
    if out not in final:
        final.append(out)
final.sort()
print (final)
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Searching through more than one file.

2014-12-28 Thread Mark Lawrence

On 28/12/2014 17:27, Seymore4Head wrote:
> I need to search through a directory of text files for a string.
> Here is a short program I made in the past to search through a single
> text file for a line of text.
>
> How can I modify the code to search through a directory of files that
> have different filenames, but the same extension?
>
> fname = raw_input("Enter file name: ")  #"*.txt"
> fh = open(fname)
> lst = list()
> biglst=[]
> for line in fh:
>     line=line.rstrip()
>     line=line.split()
>     biglst+=line
> final=[]
> for out in biglst:
>     if out not in final:
>         final.append(out)
> final.sort()
> print (final)

See the glob function in the glob module:
https://docs.python.org/3/library/glob.html#module-glob

Similar functionality is available in the pathlib module
(https://docs.python.org/3/library/pathlib.html#module-pathlib), but that
is only available in Python 3.4 and later.
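
A minimal Python 3 sketch of the glob approach, assuming the goal is to
report which lines contain a given string. The function name
search_directory and the sample files are made up for illustration:

```python
import glob
import os
import tempfile


def search_directory(directory, needle, pattern="*.txt"):
    """Return (filename, line number, line) for each line containing needle."""
    hits = []
    # glob.escape keeps special characters in the directory name literal.
    full_pattern = os.path.join(glob.escape(directory), pattern)
    for path in sorted(glob.glob(full_pattern)):
        with open(path) as fh:
            for lineno, line in enumerate(fh, 1):
                if needle in line:
                    name = os.path.basename(path)
                    hits.append((name, lineno, line.rstrip("\n")))
    return hits


# Hypothetical sample data so the sketch runs on its own.
demo_dir = tempfile.mkdtemp()
samples = [
    ("a.txt", "spam\neggs\n"),
    ("b.txt", "ham and eggs\n"),
    ("c.log", "eggs\n"),
]
for name, text in samples:
    with open(os.path.join(demo_dir, name), "w") as fh:
        fh.write(text)

hits = search_directory(demo_dir, "eggs")
print(hits)
```

Only a.txt and b.txt are searched here; c.log does not match the pattern.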


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Re: Searching through more than one file.

2014-12-28 Thread Paul Rubin
Seymore4Head  writes:
> How can I modify the code to search through a directory of files that
> have different filenames, but the same extension?

Use the os.listdir function to read the directory.  It gives you a list
of filenames that you can filter for the extension you want.

Per Mark Lawrence, there's also a glob function.
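
A small Python 3 sketch of the os.listdir route, assuming a plain suffix
test is enough. The helper name files_with_extension and the sample
directory are hypothetical:

```python
import os
import tempfile


def files_with_extension(directory, extension=".txt"):
    """Return the sorted names of regular files ending in extension."""
    return sorted(
        name
        for name in os.listdir(directory)
        if name.endswith(extension)
        and os.path.isfile(os.path.join(directory, name))
    )


# Hypothetical sample directory so the sketch runs on its own.
demo_dir = tempfile.mkdtemp()
for name in ("notes.txt", "todo.txt", "data.csv"):
    open(os.path.join(demo_dir, name), "w").close()
os.mkdir(os.path.join(demo_dir, "archive.txt"))  # a directory, skipped

txt_files = files_with_extension(demo_dir)
print(txt_files)
```

os.listdir returns bare names, so they need to be joined with the
directory again before being passed to open().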
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Searching through more than one file.

2014-12-28 Thread Dave Angel

On 12/28/2014 12:27 PM, Seymore4Head wrote:
> I need to search through a directory of text files for a string.
> Here is a short program I made in the past to search through a single
> text file for a line of text.
>
> How can I modify the code to search through a directory of files that
> have different filenames, but the same extension?



You have two other replies to your specific question, glob and 
os.listdir.  I would also mention the module fileinput:


https://docs.python.org/2/library/fileinput.html

import fileinput
from glob import glob

fnames = glob('*.txt')
for line in fileinput.input(fnames):
    pass # do whatever

If you're not on Windows, I'd mention that the shell will expand the 
wildcards for you, so you could get the filenames from argv even 
simpler.  See first example on the above web page.



I'm more concerned that you think the following code you supplied does a 
search for a string.  It does something entirely different, involving 
making a crude dictionary.  But it could be reduced to just a few lines, 
and probably take much less memory, if this is really the code you're 
working on.



> fname = raw_input("Enter file name: ")  #"*.txt"
> fh = open(fname)
> lst = list()
> biglst=[]
> for line in fh:
>     line=line.rstrip()
>     line=line.split()
>     biglst+=line
> final=[]
> for out in biglst:
>     if out not in final:
>         final.append(out)
> final.sort()
> print (final)



Something like the following:

import fileinput
from glob import glob

res = set()
fnames = glob('*.txt')
for line in fileinput.input(fnames):
    res.update(line.rstrip().split())
print sorted(res)
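
The same idea as a Python 3 sketch. The function name unique_words and
the sample files are assumptions for illustration. One thing to watch:
an empty file list makes fileinput read standard input, so the sketch
guards against a pattern that matches nothing:

```python
import fileinput
import glob
import os
import tempfile


def unique_words(pattern):
    """Return the sorted unique words from all files matching pattern."""
    fnames = sorted(glob.glob(pattern))
    if not fnames:
        # fileinput.input([]) would read standard input instead.
        return []
    words = set()
    with fileinput.input(files=fnames) as lines:
        for line in lines:
            words.update(line.split())
    return sorted(words)


# Hypothetical sample data so the sketch runs on its own.
demo_dir = tempfile.mkdtemp()
samples = [("a.txt", "the quick fox\n"), ("b.txt", "the lazy dog\n")]
for name, text in samples:
    with open(os.path.join(demo_dir, name), "w") as fh:
        fh.write(text)

words = unique_words(os.path.join(demo_dir, "*.txt"))
nothing = unique_words(os.path.join(demo_dir, "*.none"))
print(words)
```

Collecting into a set replaces the "if out not in final" scan of a list,
which is where most of the speed-up comes from.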




--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list


Re: Searching through more than one file.

2014-12-28 Thread Dave Angel

On 12/28/2014 02:12 PM, Dave Angel wrote:
> On 12/28/2014 12:27 PM, Seymore4Head wrote:
>> I need to search through a directory of text files for a string.
>> Here is a short program I made in the past to search through a single
>> text file for a line of text.
>>
>> How can I modify the code to search through a directory of files that
>> have different filenames, but the same extension?
>
> You have two other replies to your specific question, glob and
> os.listdir.  I would also mention the module fileinput:
>
> https://docs.python.org/2/library/fileinput.html
>
> import fileinput
> from glob import glob
>
> fnames = glob('*.txt')
> for line in fileinput.input(fnames):
>     pass # do whatever
>
> If you're not on Windows, I'd mention that the shell will expand the
> wildcards for you, so you could get the filenames from argv even
> simpler.  See first example on the above web page.
>
> I'm more concerned that you think the following code you supplied does a
> search for a string.  It does something entirely different, involving
> making a crude dictionary.  But it could be reduced to just a few lines,
> and probably take much less memory, if this is really the code you're
> working on.

Note:  the changes I suggest also should be tons faster, if you have
very many words you're parsing this way.

>> fname = raw_input("Enter file name: ")  #"*.txt"
>> fh = open(fname)
>> lst = list()
>> biglst=[]
>> for line in fh:
>>     line=line.rstrip()
>>     line=line.split()
>>     biglst+=line
>> final=[]
>> for out in biglst:
>>     if out not in final:
>>         final.append(out)
>> final.sort()
>> print (final)
>
> Something like the following:

Untested, I should have said.

> import fileinput
> from glob import glob
>
> res = set()
> fnames = glob('*.txt')
> for line in fileinput.input(fnames):
>     res.update(line.rstrip().split())

And I should have omitted the rstrip(), which does nothing that split()
isn't already going to do.

> print sorted(res)

--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list


Re: Searching through more than one file.

2014-12-28 Thread Paul Rubin
Dave Angel  writes:
> res = set()
> fnames = glob('*.txt')
> for line in fileinput.input(fnames):
>     res.update(line.rstrip().split())
> print sorted(res)

Untested:

print sorted(set(word for line in fileinput.input(fnames) for word in line.split()))
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Searching through more than one file.

2014-12-28 Thread Terry Reedy

On 12/28/2014 12:27 PM, Seymore4Head wrote:
> I need to search through a directory of text files for a string.
> Here is a short program I made in the past to search through a single
> text file for a line of text.
>
> How can I modify the code to search through a directory of files that
> have different filenames, but the same extension?


You could simplify the relevant parts of idlelib/grep.py

--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list


Re: suggestions for VIN parsing

2014-12-28 Thread Vincent Davis
On Fri, Dec 26, 2014 at 12:15 PM, Denis McMahon 
wrote:

> Note, I think the 1981 model year ran KCA - DCA prefixes, not as shown on
> the website you quoted.
>

Denis,
Regarding the KCA - DCA prefixes, do you have a source as to why you think
this?

Here is what I have so far, with a simple test at the end. What I don't show
is a dict that contains more information about the year/model; it's not
relevant. I am happy with how it is working. I hope to be able to decode
BSA and other British, or more generally vintage, motorcycle frame and
engine numbers. BSA looks like a mess.

def vin_to_year2(vin):
    vin = vin.lower()
    alpha_digit_alpha = re.match(r'^(\D+)(\d+)(\D+)$', vin)
    digit_alpha = re.match(r'^(\d+)(\D+)$', vin)
    alpha_digit = re.match(r'^(\D+)+(\d+)$', vin)
    alpha = re.match(r'^(\d+)$', vin)

    if alpha_digit_alpha:
        alpha_digit_alpha.groups()
    elif digit_alpha:
        g = digit_alpha.groups()
        if 100<=int(g[0]) and g[-1]=='n': # Triumph 1950: From 100N
            return 't1950'
        elif 101<=int(g[0])<=15808 and g[-1]=='na':   # Triumph 1951: 101NA - 15808NA
            return 't1951'
        elif 15809<=int(g[0])<=25000 and g[-1]=='na': # Triumph 1952: 15809NA - 25000NA, see also alpha only vin for 1952
            # Triumph 1952: 15809NA - 25000NA
            return 't1952'
        else:
            return None
    elif alpha_digit:
        g = alpha_digit.groups()
        if g[0] == 'h' and 101 <= int(g[1]) <= 760:   # tu1957: H101 - H760
            return 'tu1957'
        elif g[0] == 'h' and 761 <= int(g[1]) <= 5484:# tu1958: H761 - H5484
            return 'tu1958'
        elif g[0] == 'h' and 5485 <= int(g[1]) <= 11511:  # tu1959: H5485 - H11511
            return 'tu1959'
        elif g[0] == 'h' and 11512 <= int(g[1]) <= 18611: # tu1960: H11512 - H18611
            return 'tu1960'
        elif g[0] == 'h' and 18612 <= int(g[1]) <= 25251: # tu1961: H18612 - H25251
            return 'tu1961'
        elif g[0] == 'h' and 25252 <= int(g[1]) <= 29732: # tu1962: H25252 - H29732
            return 'tu1962'
        elif g[0] == 'h' and 29733 <= int(g[1]) <= 32464: # tu1963: H29733 - H32464
            return 'tu1963'
        elif g[0] == 'h' and 32465 <= int(g[1]) <= 35986: # tu1964: H32465 - H35986
            return 'tu1964'
        elif g[0] == 'h' and 35987 <= int(g[1]) <= 40527: # tu1965: H35987 - H40527
            return 'tu1965'
        elif g[0] == 'h' and 40528 <= int(g[1]) <= 49832: # tu1966: H40528 - H49832
            return 'tu1966'
        elif g[0] == 'h' and 49833 <= int(g[1]) <= 57082: # tu1967: H49833 - H57082
            return 'tu1967'
        elif g[0] == 'h' and 57083 <= int(g[1]) <= 65572: # tu1968: H57083 - H65572
            return 'tu1968'
        elif g[0] == 'h' and 65573 <= int(g[1]) <= 67331: # tu1969: H65573 - H67331
            return 'tu1969'
        elif g[0] == 'd' and 101 <= int(g[1]) <= 7726: # tp1960: D101 - D7726
            return 'tp1960'
        elif g[0] == 'd' and 7727 <= int(g[1]) <= 15788: # tp1961: D7727 - D15788
            return 'tp1961'
        elif g[0] == 'd' and 15789 <= int(g[1]): # tp1962: D15789 - onward
            return 'tp1962'
        elif g[0] == 'du' and 101 <= int(g[1]) <= 5824: # 650 t65u1963: DU101 - DU5824
            return 't65u1963'
        elif g[0] == 'du' and 5825 <= int(g[1]) <= 13374: # 650 t65u1964: DU5825 - DU13374
            return 't65u1964'
        elif g[0] == 'du' and 5825 <= int(g[1]) <= 13374: # 650 t65u1965: DU5825 - DU13374
            return 't65u1965'
        elif g[0] == 'du' and 24875 <= int(g[1]) <= 44393: # 650 t65u1966: DU24875 - DU44393
            return 't65u1966'
        elif g[0] == 'du' and 44394 <= int(g[1]) <= 66245: # 650 t65u1967: DU44394 - DU66245
            return 't65u1967'
        elif g[0] == 'du' and 66246 <= int(g[1]) <= 85903: # 650 t65u1968: DU66246 - DU85903
            return 't65u1968'
        elif g[0] == 'du' and 85904 <= int(g[1]) <= 90282: # 650 t65u1969: DU85904 - DU90282
            return 't65u1969'
        else:
            return None
    elif alpha:
        g = alpha.groups()
        if 25000 <= int(g[0]) <= 32302:   # t1952: 25000 - 32302
            return 't1952'
        elif 32303 <= int(g[0]) <= 44134: # t1953: 32303 - 44134
            return 't1953'
        elif 44135 <= int(g[0]) <= 56699: # t1954: 44135 - 56699
            return 't1954'
        elif 56700 <= int(g[0]) <= 70929: # t1955: 56700 - 70929
            return 't1955'
        elif 70930 <= int(g[0]) <= 82799: # t1956: 70930 - 82799
            return 't1956'
        elif 100 <= int(g[0]) <= 944 and g[0][0]=='0': # t1956: 0100 - 0944
            return 't1956'
        elif g[0][0] == '0' and 945 <= int(g[0]) <= 5: # tp1957: 0945 - 05
            return 'tp1957'
        elif g[0][0] == '0' and 6 <= int(g[0]) <= 20075: # tp1958: 06 - 020075
            return 'tp1958'
        elif g[0][0] == '0' and 20076 <= int(g[0]) <= 29363: # tp1

Re: suggestions for VIN parsing

2014-12-28 Thread Rick Johnson
On Sunday, December 28, 2014 5:34:11 PM UTC-6, Vincent Davis wrote:
> 
> [snip: code sample with Unicode spaces! Yes, *UNICODE SPACES*!]


Oh my! Might i offer some suggestions to improve the
readability of this code?

1. Indexing is syntactically noisy, so if you find yourself
fetching the same index more than once, then that is a good
time to store the indexed value into a local variable.

2. The only thing worse than duplicating code which fetches
the same index over and over again, is wrapping the fetch in
casting function (in this case: "int()") OVER and OVER again!

3. I see that you are utilizing regexps to aid in the logic,
and although i agree that regexps are overkill for this
problem (since it could "technically" be solved with string
methods) if *I* had to solve this problem, i would use the
power of regexps -- although i would use them more wisely ;-)

I have not studied the data thoroughly, but just by "grazing
over" the code you posted i can see a few distinct patterns
that emerge from the VIN data-set. Here is a description of
the patterns:

"\d+n"
"\d+na"
"d\d+"
"du\d+"

and the last pattern being all digits:

"\d+"

Even though your "verbose-run-on-conditional" would most
likely execute faster, i prefer to write code (when
performance is not mission critical!) in the most readable
and maintainable fashion. And in order to achieve that goal,
you always want to keep the main logic as succinct as
possible whilst encapsulating the difficult bits in "suitably
abstracted structures".

DIVIDE AND CONQUER!


 My approach would be as follows:


1. Create a map for each distinct set of VIN patterns with
the keys being a two-tuple that represents the low and high
limits of the serial number, and the values being the year
of that range.

database = {
    'map_NA':{
        (101, 15808): "Triumph 1951",
        (15809, 25000): "Triumph 1952",
        ...,
        },

    'map_N':{
        ...,
        },

    'map_H':{
        ...,
        },

    'map_D':{
        ...,
        },

    'map_DU':{
        ...,
        },
    }

2. Create a regexp pattern for each "distinct VIN pattern".
The group captures will be used to strip-out *ONLY* the
numeric parts! Then concatenate all the regexp patterns into
a single monolithic program utilizing "named groups". (The
group names will be the corresponding "map_*" for which to
search)

[code stub here] :-P

3. Now you can write some fairly simple logic.

prog = re.compile("pat1|pat2|pat3...")
def parse_vin(vin):
    match = prog.search(vin)
    if match:
        gname = # Fetch the groupname from the match object.
        number = # Fetch the digits from the group capture.
        d = database[gname]
        for k in d:
            low, high = d[k]
            if low <= number <= high:
                return d[k]
    return None

While this approach could be "heavy handed", i feel it will
be much easier to maintain and expand. I'd argue that if
you're going to utilize re's, then you should wield the full
power they provide, else, use some other method.
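
One way the stubs might be filled in, as a runnable Python 3 sketch.
Only three of the maps are shown, with ranges copied from the code
earlier in the thread. The exact patterns, the use of match.lastgroup,
and the upper-case names are assumptions for illustration, not a
finished decoder:

```python
import re

# Each map: (low, high) serial range -> label. Sample ranges only.
DATABASE = {
    "map_NA": {
        (101, 15808): "Triumph 1951",
        (15809, 25000): "Triumph 1952",
    },
    "map_H": {
        (101, 760): "tu1957",
        (761, 5484): "tu1958",
    },
    "map_DU": {
        (101, 5824): "t65u1963",
        (5825, 13374): "t65u1964",
    },
}

# One named group per VIN pattern; each group captures only the digits.
PROG = re.compile(
    r"^(?:(?P<map_NA>\d+)na|h(?P<map_H>\d+)|du(?P<map_DU>\d+))$",
    re.IGNORECASE,
)


def parse_vin(vin):
    """Return the label for vin, or None if no pattern or range matches."""
    match = PROG.match(vin.strip())
    if match is None:
        return None
    gname = match.lastgroup  # name of the one group that matched
    number = int(match.group(gname))
    for (low, high), label in DATABASE[gname].items():
        if low <= number <= high:
            return label
    return None


print(parse_vin("15809NA"))
print(parse_vin("H761"))
print(parse_vin("XYZ"))
```

The alternation sits in a non-capturing group so that lastgroup reports
the named group rather than an enclosing one. Adding a new prefix then
means one more named group and one more map, with parse_vin unchanged.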

PS: You know you have a Unicode monkey on your back when you
use tools that insert Unicode spaces!

PPS: Hopefully i did not make any stupid mistakes, it's past my
bedtime! 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: suggestions for VIN parsing

2014-12-28 Thread Rick Johnson
On Monday, December 29, 2014 12:50:39 AM UTC-6, Rick Johnson wrote:

[EDIT]

> 3. Now you can write some fairly simple logic.
> 
> prog = re.compile("pat1|pat2|pat3...")
> def parse_vin(vin):
>     match = prog.search(vin)
>     if match:
>         gname = # Fetch the groupname from the match object.
>         number = # Fetch the digits from the group capture.
>         d = database[gname]
>         for k in d:
>             low, high = d[k]

Dammit! That last line should have been:

            low, high = k

But even better would be:

        d = database[gname]
        for low,high in d:
            if low <= number <= high:
                ...

I knew something was tickling my subconscious as i sent
that reply, i should have known better!

PS: Hey, I said it was "fairly simple" logic, not "perfect" 
logic!
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Searching through more than one file.

2014-12-28 Thread Rick Johnson
On Sunday, December 28, 2014 11:29:48 AM UTC-6, Seymore4Head wrote:
> I need to search through a directory of text files for a string.
> Here is a short program I made in the past to search through a single
> text file for a line of text.

Step1: Search through a single file. 
# Just a few more brush strokes...

Step2: Search through all files in a directory. 
# Time to go exploring! 

Step3: Option to filter by file extension. 
# Waste not, want not!

Step4: Option for recursing down sub-directories. 
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
 [Oops, fell into a recursive black hole!]
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
 [BREAK]
# Whew, no worries, MaximumRecursionError is my best friend! 

;-)

In addition to the other advice, you might want to check out os.walk().
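
A short Python 3 sketch of Step3 and Step4 using os.walk, which handles
the descent into sub-directories itself. The generator name
iter_matching_files and the sample tree are hypothetical:

```python
import fnmatch
import os
import tempfile


def iter_matching_files(top, pattern="*.txt"):
    """Yield paths under top, at any depth, whose names match pattern."""
    for dirpath, dirnames, filenames in os.walk(top):
        for name in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, name)


# Hypothetical sample tree so the sketch runs on its own.
demo_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_dir, "sub", "deeper"))
samples = [
    "top.txt",
    os.path.join("sub", "mid.txt"),
    os.path.join("sub", "deeper", "low.txt"),
    os.path.join("sub", "skip.log"),
]
for rel in samples:
    open(os.path.join(demo_dir, rel), "w").close()

found = sorted(os.path.basename(p) for p in iter_matching_files(demo_dir))
print(found)
```

Each yielded path can be opened and scanned line by line exactly as in
the single-file case.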
-- 
https://mail.python.org/mailman/listinfo/python-list