Re: New to PSF
Yeah, I mean Python Software Foundation. I am a developer and I want to contribute. So, can you please help me get started? Thanks

On Sunday, December 28, 2014 4:27:54 AM UTC+5:30, Steven D'Aprano wrote:
> prateek pandey wrote:
>
>> Hey, I'm new to PSF. Can someone please help me in getting started.
>
> Can we have some context? What do you mean by PSF? The Python Software
> Foundation? Something else?
>
> --
> Steven
-- https://mail.python.org/mailman/listinfo/python-list
Re: New to PSF
On Dec 28, 2014, at 09:54, prateek pandey wrote:
> Yeah, I mean Python Software Foundation. I am a developer and I want to
> contribute. So, Can you please help me in getting started ?

https://www.python.org/psf/volunteer/

--
"You can't actually make computers run faster, you can only make them do less." - RiderOfGiraffes
-- https://mail.python.org/mailman/listinfo/python-list
CSV Error
Hello, I am trying to read a csv file using DictReader, and I am getting this error:

Traceback (most recent call last):
  File "", line 1, in
    r.fieldnames
  File "/usr/lib/python2.7/csv.py", line 90, in fieldnames
    self._fieldnames = self.reader.next()
ValueError: I/O operation on closed file

Here is my code in a Python shell:

>>> with open('x.csv','rb') as f:
...     r = csv.DictReader(f,delimiter=",")
>>> r.fieldnames

I have tried opening the file in 'rU' and 'r' modes, but I still get the above error. Please help. Thanks.
-- https://mail.python.org/mailman/listinfo/python-list
Re: CSV Error
> ValueError: I/O operation on closed file
>
> Here is my code in a Python shell -
>
> >>> with open('x.csv','rb') as f:
> ...     r = csv.DictReader(f,delimiter=",")
> >>> r.fieldnames

The file is only open during the context of the with statement. Indent the last line to match the assignment to r and you should be fine.

Skip
-- https://mail.python.org/mailman/listinfo/python-list
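Skip's point, that `fieldnames` must be read while the file is still open inside the `with` block, can be sketched like this. (A Python 3 sketch: an in-memory `StringIO` stands in for the poster's `x.csv`, and in Python 3 a real file would be opened with `newline=''` rather than `'rb'`.)

```python
import csv
import io

# Stand-in for x.csv: a two-column file with a header row.
f = io.StringIO("Foo,Bar\n1,2\n")
with f:
    r = csv.DictReader(f, delimiter=",")
    fieldnames = r.fieldnames  # read while the file is still open
print(fieldnames)
```

Moving the `r.fieldnames` access outside the `with` block would reproduce the original `ValueError`, because the context manager has already closed the file by then.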
Re: CSV Error
Skip Montanaro writes:

>> ValueError: I/O operation on closed file
>>
>> Here is my code in a Python shell -
>>
>> >>> with open('x.csv','rb') as f:
>> ...     r = csv.DictReader(f,delimiter=",")
>> >>> r.fieldnames
>
> The file is only open during the context of the with statement.
> Indent the last line to match the assignment to r and you should be
> fine.

Or, don't use "with" when experimenting in the shell.

>>> import csv
>>> f = open('x.csv')
>>> r = csv.DictReader(f, delimiter = ',')
>>> r.fieldnames
['Foo', 'Bar']
>>>
-- https://mail.python.org/mailman/listinfo/python-list
Re: CSV Error
On Sun, 28 Dec 2014 06:19:58 -0600, Skip Montanaro wrote:
>> ValueError: I/O operation on closed file
>>
>> Here is my code in a Python shell -
>>
>> >>> with open('x.csv','rb') as f:
>> ...     r = csv.DictReader(f,delimiter=",")
>> >>> r.fieldnames
>
> The file is only open during the context of the with statement. Indent
> the last line to match the assignment to r and you should be fine.
>
> Skip

I have indented the line. I am working in the shell. The error is still there. Thanks.
-- https://mail.python.org/mailman/listinfo/python-list
Re: CSV Error
On Sun, 28 Dec 2014 14:41:55 +0200, Jussi Piitulainen wrote:
> Skip Montanaro writes:
>
>> > ValueError: I/O operation on closed file
>> >
>> > Here is my code in a Python shell -
>> >
>> > >>> with open('x.csv','rb') as f:
>> > ...     r = csv.DictReader(f,delimiter=",")
>> > >>> r.fieldnames
>>
>> The file is only open during the context of the with statement. Indent
>> the last line to match the assignment to r and you should be fine.
>
> Or, don't use "with" when experimenting in the shell.
>
> >>> import csv
> >>> f = open('x.csv')
> >>> r = csv.DictReader(f, delimiter = ',')
> >>> r.fieldnames
> ['Foo', 'Bar']
> >>>

Yes, it's fixed. Thanks.
-- https://mail.python.org/mailman/listinfo/python-list
Re: CSV Error
Hmmm... Works for me.

% python
Python 2.7.6+ (2.7:db842f730432, May 9 2014, 23:53:26)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> with open("coconutBattery.csv", "rb") as f:
...     r = csv.DictReader(f)
...     x = r.fieldnames
...
autoloading csv
>>> x
['date', 'capacity', 'loadcycles']

(Ignore the "autoloading" message. I use an autoloader in interactive mode which comes in handy when I forget to import a module, as I did here.)

It also works without assigning r.fieldnames to a new variable:

>>> with open("coconutBattery.csv", "rb") as f:
...     r = csv.DictReader(f)
...     r.fieldnames
...
['date', 'capacity', 'loadcycles']
>>> r.fieldnames
['date', 'capacity', 'loadcycles']

I think you're going to have to paste another example session to show us what you might have done differently.

Skip
-- https://mail.python.org/mailman/listinfo/python-list
Autoloader (was Re: CSV Error)
On Mon, Dec 29, 2014 at 12:58 AM, Skip Montanaro wrote: > (Ignore the "autoloading" message. I use an autoloader in interactive > mode which comes in handy when I forget to import a module, as I did > here.) We were discussing something along these lines a while ago, and I never saw anything truly satisfactory - there's no easy way to handle a missing name by returning a value (comparably to __getattr__), you have to catch it and then try to re-execute the failing code, which isn't perfect. How does yours work? Or was it one of the ones that was mentioned last time? ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Autoloader (was Re: CSV Error)
> We were discussing something along these lines a while ago, and I
> never saw anything truly satisfactory - there's no easy way to handle
> a missing name by returning a value (comparably to __getattr__), you
> have to catch it and then try to re-execute the failing code, which
> isn't perfect. How does yours work? Or was it one of the ones that was
> mentioned last time?

Just like that. I've attached a copy. As you said, I'm sure it's not perfect, but it's handy in precisely those interactive interpreter cases when *dope slap* you forgot to import a standard module before launching into a block of code.

Skip

"""
autoload - load common symbols automatically on demand

When a NameError is raised attempt to find the name in a couple places.
Check to see if it's a name in a list of commonly used modules. If it's
found, import the name. If it's not in the common names try importing it.
In either case (assuming the imports succeed), reexecute the code in the
original context.
"""

import sys, traceback, re

_common = {}

# order important - most important needs to be last - os.path is chosen over
# sys.path for example
for mod in "sys os math xmlrpclib".split():
    m = __import__(mod)
    try:
        names = m.__all__
    except AttributeError:
        names = dir(m)
    names = [n for n in names if not n.startswith("_") and n.upper() != n]
    for n in names:
        _common[n] = mod

def _exec(import_stmt, tb):
    f_locals = tb.tb_frame.f_locals
    f_globals = tb.tb_frame.f_globals
    sys.excepthook = _eh
    try:
        exec import_stmt in f_locals, f_globals
        exec tb.tb_frame.f_code in f_locals, f_globals
    finally:
        sys.excepthook = _autoload_exc

def _autoload_exc(ty, va, tb):
    ##if ty != ImportError:
    ##    traceback.print_exception(ty, va, tb)
    ##    return
    mat = re.search("name '([^']*)' is not defined", va.args[0])
    if mat is not None:
        name = mat.group(1)
        if name in _common:
            mod = _common[name]
            print >> sys.stderr, "found", name, "in", mod, "module"
            _exec("from %s import %s" % (mod, name), tb)
        else:
            print >> sys.stderr, "autoloading", name
            _exec("import %s" % name, tb)
    else:
        traceback.print_exception(ty, va, tb)

_eh = sys.excepthook
sys.excepthook = _autoload_exc

-- https://mail.python.org/mailman/listinfo/python-list
Re: Autoloader (was Re: CSV Error)
On Mon, Dec 29, 2014 at 1:15 AM, Skip Montanaro wrote: >> We were discussing something along these lines a while ago, and I >> never saw anything truly satisfactory - there's no easy way to handle >> a missing name by returning a value (comparably to __getattr__), you >> have to catch it and then try to re-execute the failing code, which >> isn't perfect. How does yours work? Or was it one of the ones that was >> mentioned last time? > > Just like that. I've attached a copy. As you said, I'm sure it's not > perfect, but it's handy in precisely those interactive interpreter > cases when *dope slap* you forgot to import a standard module before > launching into a block of code. Right, so its primary imperfection is that it potentially re-executes a block of code that had partially succeeded. Still of value, but definitely has its dangers. I wonder how hard it would be to tinker at the C level and add a __getattr__ style of hook... ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Autoloader (was Re: CSV Error)
On Mon, Dec 29, 2014 at 1:22 AM, Chris Angelico wrote:
> I wonder how hard it would be to tinker at the C level and add a
> __getattr__ style of hook...

You know what, it's not that hard. It looks largeish as there are four places where NameError (not counting UnboundLocalError, which I'm not touching) can be raised - LOAD_GLOBAL and LOAD_NAME, both of which have a fast path for the normal case and a fall-back for when globals/builtins isn't a dict; but refactoring it into a helper function keeps it looking reasonable. Once that's coded in, all you need is:

def try_import(n):
    try:
        return __import__(n)
    except ImportError:
        raise NameError("Name %r is not defined" % n)

import sys
sys.__getglobal__ = try_import

and then any unknown name will be imported, if available, and returned. It's just like __getattr__: if it returns something, it's as if the name pointed to that thing, otherwise it raises NameError.

Is anyone else interested in the patch? Should I create a tracker issue and upload it?

ChrisA
-- https://mail.python.org/mailman/listinfo/python-list
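A rough pure-Python analogue of this hook is possible today without any C patch, by exploiting a CPython detail: when code is exec'd with a dict *subclass* as its namespace, failed name lookups fall back to `__missing__`. This is a sketch of the idea, not the `sys.__getglobal__` patch being discussed, and it only affects the exec'd code, not the interactive prompt itself:

```python
import importlib

class AutoImportDict(dict):
    # Unknown names trigger an import attempt; a failed import
    # re-raises as KeyError so lookup can fall through to builtins.
    def __missing__(self, name):
        try:
            return importlib.import_module(name)
        except ImportError:
            raise KeyError(name)

ns = AutoImportDict()
# 'math' is never imported explicitly; __missing__ supplies it.
exec("result = math.sqrt(16)", ns)
print(ns["result"])
```

Like `__getattr__`, returning a value makes the name resolve to that value, and raising makes the lookup fail as usual.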
Re: Autoloader (was Re: CSV Error)
On Mon, Dec 29, 2014 at 2:38 AM, Chris Angelico wrote:
> It's just like __getattr__: if it returns something, it's as
> if the name pointed to that thing, otherwise it raises NameError.

To clarify: The C-level patch has nothing about imports. What it does is add a hook at the point where NameError is about to be raised, allowing a Python function (stuffed into sys.__getglobal__) to control what happens. I do *not* recommend this for application code, and I would strongly discourage it for library code, but it's handy for interactive work. Like with Skip's hook, you could have a specific set of "from" imports supported as well - here's a port of that script that uses this hook instead:

"""
autoload - load common symbols automatically on demand

When a NameError is raised attempt to find the name in a couple places.
Check to see if it's a name in a list of commonly used modules. If it's
found, import the name. If it's not in the common names try importing it.
In either case (assuming the imports succeed), reexecute the code in the
original context.
"""

import sys

_common = {}

# order important - most important needs to be last - os.path is chosen over
# sys.path for example
for mod in "sys os math xmlrpclib".split():
    m = __import__(mod)
    try:
        names = m.__all__
    except AttributeError:
        names = dir(m)
    names = [n for n in names if not n.startswith("_") and n.upper() != n]
    for n in names:
        _common[n] = m

def _autoload_exc(name):
    if name in _common:
        return getattr(_common[name], name)
    else:
        return __import__(name)

sys.__getglobal__ = _autoload_exc
-- cut --

Note that I've removed the print-to-stderr when something gets auto-imported. This is because the original hook inserted something into the namespace, but this one doesn't; every time you reference "exp", it'll look it up afresh from the math module, so it'd keep spamming you with messages.

ChrisA
-- https://mail.python.org/mailman/listinfo/python-list
Re: Autoloader (was Re: CSV Error)
On 28/12/2014 15:38, Chris Angelico wrote:
> On Mon, Dec 29, 2014 at 1:22 AM, Chris Angelico wrote:
>> I wonder how hard it would be to tinker at the C level and add a
>> __getattr__ style of hook...
>
> You know what, it's not that hard. [...]
>
> Is anyone else interested in the patch? Should I create a tracker
> issue and upload it?
>
> ChrisA

I'd raise a tracker issue so it's easier to find in the future.

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence
-- https://mail.python.org/mailman/listinfo/python-list
Re: Autoloader (was Re: CSV Error)
On Mon, Dec 29, 2014 at 3:14 AM, Mark Lawrence wrote: >> Is anyone else interested in the patch? Should I create a tracker >> issue and upload it? > > I'd raise a tracker issue so it's easier to find in the future. http://bugs.python.org/issue23126 ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Searching through more than one file.
I need to search through a directory of text files for a string. Here is a short program I made in the past to search through a single text file for a line of text. How can I modify the code to search through a directory of files that have different filenames, but the same extension?

fname = raw_input("Enter file name: ")  #"*.txt"
fh = open(fname)
lst = list()
biglst = []
for line in fh:
    line = line.rstrip()
    line = line.split()
    biglst += line
final = []
for out in biglst:
    if out not in final:
        final.append(out)
final.sort()
print (final)
-- https://mail.python.org/mailman/listinfo/python-list
Re: Searching through more than one file.
On 28/12/2014 17:27, Seymore4Head wrote:
> I need to search through a directory of text files for a string. Here
> is a short program I made in the past to search through a single text
> file for a line of text. How can I modify the code to search through a
> directory of files that have different filenames, but the same
> extension?

See the glob function in the glob module here:
https://docs.python.org/3/library/glob.html#module-glob

Similar functionality is available in the pathlib module
https://docs.python.org/3/library/pathlib.html#module-pathlib
but this is only available with Python 3.4

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence
-- https://mail.python.org/mailman/listinfo/python-list
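A small self-contained sketch of the glob approach suggested above. (It builds a temporary directory with made-up filenames so it runs anywhere; in the OP's case the pattern would just be `'*.txt'` in the current directory.)

```python
import glob
import os
import tempfile

# Hypothetical directory with a mix of extensions.
d = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "notes.md"):
    open(os.path.join(d, name), "w").close()

# Only the .txt files match the pattern.
matches = sorted(os.path.basename(p)
                 for p in glob.glob(os.path.join(d, "*.txt")))
print(matches)
```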
Re: Searching through more than one file.
Seymore4Head writes: > How can I modify the code to search through a directory of files that > have different filenames, but the same extension? Use the os.listdir function to read the directory. It gives you a list of filenames that you can filter for the extension you want. Per Mark Lawrence, there's also a glob function. -- https://mail.python.org/mailman/listinfo/python-list
Re: Searching through more than one file.
On 12/28/2014 12:27 PM, Seymore4Head wrote:
> I need to search through a directory of text files for a string. Here
> is a short program I made in the past to search through a single text
> file for a line of text. How can I modify the code to search through a
> directory of files that have different filenames, but the same
> extension?

You have two other replies to your specific question, glob and os.listdir. I would also mention the module fileinput:
https://docs.python.org/2/library/fileinput.html

import fileinput
from glob import glob

fnames = glob('*.txt')
for line in fileinput.input(fnames):
    pass  # do whatever

If you're not on Windows, I'd mention that the shell will expand the wildcards for you, so you could get the filenames from argv even simpler. See first example on the above web page.

I'm more concerned that you think the following code you supplied does a search for a string. It does something entirely different, involving making a crude dictionary. But it could be reduced to just a few lines, and probably take much less memory, if this is really the code you're working on.

> fname = raw_input("Enter file name: ")  #"*.txt"
> fh = open(fname)
> lst = list()
> biglst = []
> for line in fh:
>     line = line.rstrip()
>     line = line.split()
>     biglst += line
> final = []
> for out in biglst:
>     if out not in final:
>         final.append(out)
> final.sort()
> print (final)

Something like the following:

import fileinput
from glob import glob

res = set()
fnames = glob('*.txt')
for line in fileinput.input(fnames):
    res.update(line.rstrip().split())
print sorted(res)

-- DaveA
-- https://mail.python.org/mailman/listinfo/python-list
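Since the OP's stated goal was searching for a string, the fileinput approach above extends naturally to that: fileinput tracks which file each line came from, so matches can be reported per file. A Python 3 sketch, with a made-up temporary corpus and a hypothetical search string "needle":

```python
import fileinput
import glob
import os
import tempfile

# Hypothetical corpus: two .txt files, one containing the search string.
d = tempfile.mkdtemp()
with open(os.path.join(d, "one.txt"), "w") as f:
    f.write("nothing here\n")
with open(os.path.join(d, "two.txt"), "w") as f:
    f.write("the needle is here\n")

fnames = sorted(glob.glob(os.path.join(d, "*.txt")))
# fileinput.filename() reports the file the current line came from.
hits = [(os.path.basename(fileinput.filename()), line.rstrip())
        for line in fileinput.input(fnames)
        if "needle" in line]
print(hits)
```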
Re: Searching through more than one file.
On 12/28/2014 02:12 PM, Dave Angel wrote:
> On 12/28/2014 12:27 PM, Seymore4Head wrote:
>> I need to search through a directory of text files for a string. [...]
>
> You have two other replies to your specific question, glob and
> os.listdir. I would also mention the module fileinput:
> https://docs.python.org/2/library/fileinput.html
> [...]

Note: the changes I suggest also should be tons faster, if you have very many words you're parsing this way.

> Something like the following:

Untested, I should have said.

> import fileinput
> from glob import glob
>
> res = set()
> fnames = glob('*.txt')
> for line in fileinput.input(fnames):
>     res.update(line.rstrip().split())

And I should have omitted the rstrip(), which does nothing that split() isn't already going to do.

> print sorted(res)

-- DaveA
-- https://mail.python.org/mailman/listinfo/python-list
Re: Searching through more than one file.
Dave Angel writes:

> res = set()
> fnames = glob('*.txt')
> for line in fileinput.input(fnames):
>     res.update(line.rstrip().split())
> print sorted(res)

Untested:

print sorted(set(word for line in fileinput.input(fnames) for word in line.split()))
-- https://mail.python.org/mailman/listinfo/python-list
Re: Searching through more than one file.
On 12/28/2014 12:27 PM, Seymore4Head wrote: I need to search through a directory of text files for a string. Here is a short program I made in the past to search through a single text file for a line of text. How can I modify the code to search through a directory of files that have different filenames, but the same extension? You could simplify the relevant parts of idlelib/grep.py -- Terry Jan Reedy -- https://mail.python.org/mailman/listinfo/python-list
Re: suggestions for VIN parsing
On Fri, Dec 26, 2014 at 12:15 PM, Denis McMahon wrote:
> Note, I think the 1981 model year ran KCA - DCA prefixes, not as shown on
> the website you quoted.

Denis, regarding the KCA - DCA prefixes, do you have a source as to why you think this?

Here is what I have so far, with a simple test at the end. Not shown is a dict which contains more information about the year/model; it's not relevant. I am happy with how it is working. I hope to be able to decode BSA and other British (or more generally, vintage) motorcycle frame and engine numbers. BSA looks like a mess.

import re

def vin_to_year2(vin):
    vin = vin.lower()
    alpha_digit_alpha = re.match(r'^(\D+)(\d+)(\D+)$', vin)
    digit_alpha = re.match(r'^(\d+)(\D+)$', vin)
    alpha_digit = re.match(r'^(\D+)(\d+)$', vin)
    alpha = re.match(r'^(\d+)$', vin)
    if alpha_digit_alpha:
        alpha_digit_alpha.groups()
    elif digit_alpha:
        g = digit_alpha.groups()
        if 100 <= int(g[0]) and g[-1] == 'n':
            return 't1950'    # Triumph 1950: From 100N
        elif 101 <= int(g[0]) <= 15808 and g[-1] == 'na':
            return 't1951'    # Triumph 1951: 101NA - 15808NA
        elif 15809 <= int(g[0]) <= 25000 and g[-1] == 'na':
            # Triumph 1952: 15809NA - 25000NA, see also alpha only vin for 1952
            return 't1952'
        else:
            return None
    elif alpha_digit:
        g = alpha_digit.groups()
        if g[0] == 'h' and 101 <= int(g[1]) <= 760:
            return 'tu1957'   # tu1957: H101 - H760
        elif g[0] == 'h' and 761 <= int(g[1]) <= 5484:
            return 'tu1958'   # tu1958: H761 - H5484
        elif g[0] == 'h' and 5485 <= int(g[1]) <= 11511:
            return 'tu1959'   # tu1959: H5485 - H11511
        elif g[0] == 'h' and 11512 <= int(g[1]) <= 18611:
            return 'tu1960'   # tu1960: H11512 - H18611
        elif g[0] == 'h' and 18612 <= int(g[1]) <= 25251:
            return 'tu1961'   # tu1961: H18612 - H25251
        elif g[0] == 'h' and 25252 <= int(g[1]) <= 29732:
            return 'tu1962'   # tu1962: H25252 - H29732
        elif g[0] == 'h' and 29733 <= int(g[1]) <= 32464:
            return 'tu1963'   # tu1963: H29733 - H32464
        elif g[0] == 'h' and 32465 <= int(g[1]) <= 35986:
            return 'tu1964'   # tu1964: H32465 - H35986
        elif g[0] == 'h' and 35987 <= int(g[1]) <= 40527:
            return 'tu1965'   # tu1965: H35987 - H40527
        elif g[0] == 'h' and 40528 <= int(g[1]) <= 49832:
            return 'tu1966'   # tu1966: H40528 - H49832
        elif g[0] == 'h' and 49833 <= int(g[1]) <= 57082:
            return 'tu1967'   # tu1967: H49833 - H57082
        elif g[0] == 'h' and 57083 <= int(g[1]) <= 65572:
            return 'tu1968'   # tu1968: H57083 - H65572
        elif g[0] == 'h' and 65573 <= int(g[1]) <= 67331:
            return 'tu1969'   # tu1969: H65573 - H67331
        elif g[0] == 'd' and 101 <= int(g[1]) <= 7726:
            return 'tp1960'   # tp1960: D101 - D7726
        elif g[0] == 'd' and 7727 <= int(g[1]) <= 15788:
            return 'tp1961'   # tp1961: D7727 - D15788
        elif g[0] == 'd' and 15789 <= int(g[1]):
            return 'tp1962'   # tp1962: D15789 - onward
        elif g[0] == 'du' and 101 <= int(g[1]) <= 5824:
            return 't65u1963'  # 650 t65u1963: DU101 - DU5824
        elif g[0] == 'du' and 5825 <= int(g[1]) <= 13374:
            return 't65u1964'  # 650 t65u1964: DU5825 - DU13374
        elif g[0] == 'du' and 5825 <= int(g[1]) <= 13374:
            return 't65u1965'  # 650 t65u1965: DU5825 - DU13374
        elif g[0] == 'du' and 24875 <= int(g[1]) <= 44393:
            return 't65u1966'  # 650 t65u1966: DU24875 - DU44393
        elif g[0] == 'du' and 44394 <= int(g[1]) <= 66245:
            return 't65u1967'  # 650 t65u1967: DU44394 - DU66245
        elif g[0] == 'du' and 66246 <= int(g[1]) <= 85903:
            return 't65u1968'  # 650 t65u1968: DU66246 - DU85903
        elif g[0] == 'du' and 85904 <= int(g[1]) <= 90282:
            return 't65u1969'  # 650 t65u1969: DU85904 - DU90282
        else:
            return None
    elif alpha:
        g = alpha.groups()
        if 25000 <= int(g[0]) <= 32302:
            return 't1952'    # t1952: 25000 - 32302
        elif 32303 <= int(g[0]) <= 44134:
            return 't1953'    # t1953: 32303 - 44134
        elif 44135 <= int(g[0]) <= 56699:
            return 't1954'    # t1954: 44135 - 56699
        elif 56700 <= int(g[0]) <= 70929:
            return 't1955'    # t1955: 56700 - 70929
        elif 70930 <= int(g[0]) <= 82799:
            return 't1956'    # t1956: 70930 - 82799
        elif 100 <= int(g[0]) <= 944 and g[0][0] == '0':
            return 't1956'    # t1956: 0100 - 0944
        elif g[0][0] == '0' and 945 <= int(g[0]) <= 5:
            return 'tp1957'   # tp1957: 0945 - 05
        elif g[0][0] == '0' and 6 <= int(g[0]) <= 20075:
            return 'tp1958'   # tp1958: 06 - 020075
        elif g[0][0] == '0' and 20076 <= int(g[0]) <= 29363:  # tp1
Re: suggestions for VIN parsing
On Sunday, December 28, 2014 5:34:11 PM UTC-6, Vincent Davis wrote:
> [snip: code sample with Unicode spaces! Yes, *UNICODE SPACES*!]

Oh my! Might i offer some suggestions to improve the readability of this code?

1. Indexing is syntactically noisy, so if you find yourself fetching the same index more than once, then that is a good time to store the indexed value into a local variable.

2. The only thing worse than duplicating code which fetches the same index over and over again, is wrapping the fetch in casting function (in this case: "int()") OVER and OVER again!

3. I see that you are utilizing regexps to aid in the logic, and although i agree that regexps are overkill for this problem (since it could "technically" be solved with string methods) if *I* had to solve this problem, i would use the power of regexps -- although i would use them more wisely ;-)

I have not studied the data thoroughly, but just by "grazing over" the code you posted i can see a few distinct patterns that emerge from the VIN data-set. Here is a description of the patterns:

    "\d+n"
    "\d+na"
    "d\d+"
    "du\d+"

and the last pattern being all digits:

    "\d+"

Even though your "verbose-run-on-conditional" would most likely execute faster, i prefer to write code (when performance is not mission critical!) in the most readable and maintainable fashion. And in order to achieve that goal, you always want to keep the main logic as succinct as possible whist encapsulating the difficult bits in "suitably abstracted structures". DIVIDE AND CONQUER!

My approach would be as follows:

1. Create a map for each distinct set of VIN patterns with the keys being a two-tuple that represents the low and high limits of the serial number, and the values being the year of that range.

database = {
    'map_NA': {
        (101, 15808): "Triumph 1951",
        (15809, 25000): "Triumph 1952",
        ...,
    },
    'map_N': {...},
    'map_H': {...},
    'map_D': {...},
    'map_DU': {...},
}

2. Create a regexp pattern for each "distinct VIN pattern". The group captures will be used to strip-out *ONLY* the numeric parts! Then concatenate all the regexp patterns into a single monolithic program utilizing "named groups". (The group names will be the corresponding "map_*" for which to search)

    [code stub here] :-P

3. Now you can write some fairly simple logic.

prog = re.compile("pat1|pat2|pat3...")

def parse_vin(vin):
    match = prog.search(vin)
    if match:
        gname = ...   # Fetch the groupname from the match object.
        number = ...  # Fetch the digits from the group capture.
        d = database[gname]
        for k in d:
            low, high = d[k]
            if low <= number <= high:
                return d[k]
    return None

While this approach could be "heavy handed", i feel it will be much easier to maintain and expand. I'd argue that if you're going to utilize re's, then you should wield the full power they provide, else, use some other method.

PS: You know you have a Unicode monkey on your back when you use tools that insert Unicode spaces!

PPS: Hopefully i did not make any stupid mistakes, it's past my bedtime!
-- https://mail.python.org/mailman/listinfo/python-list
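The "[code stub here]" above can be filled in along these lines. This is a hypothetical miniature with only two of the pattern families populated; the (low, high) ranges come from the thread, but the group names and the two-alternative regexp are illustrative only:

```python
import re

# Two of the suggested "map_*" tables, keyed by (low, high) serial ranges.
database = {
    "map_NA": {(101, 15808): "Triumph 1951", (15809, 25000): "Triumph 1952"},
    "map_D": {(101, 7726): "tp1960"},
}

# One alternative per pattern family; the named group doubles as the
# database key, and captures only the numeric part of the VIN.
prog = re.compile(r"^(?P<map_NA>\d+)na$|^d(?P<map_D>\d+)$")

def parse_vin(vin):
    match = prog.search(vin.lower())
    if not match:
        return None
    gname = match.lastgroup           # which family matched
    number = int(match.group(gname))  # the digits from the group capture
    for (low, high), year in database[gname].items():
        if low <= number <= high:
            return year
    return None

print(parse_vin("15000NA"))
print(parse_vin("D500"))
```

Note that `match.lastgroup` gives the name of the group that actually matched, which is what lets one compiled pattern dispatch to the right table.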
Re: suggestions for VIN parsing
On Monday, December 29, 2014 12:50:39 AM UTC-6, Rick Johnson wrote:

[EDIT]
> 3. Now you can write some fairly simple logic.
>
> prog = re.compile("pat1|pat2|pat3...")
>
> def parse_vin(vin):
>     match = prog.search(vin)
>     if match:
>         gname = ...   # Fetch the groupname from the match object.
>         number = ...  # Fetch the digits from the group capture.
>         d = database[gname]
>         for k in d:
>             low, high = d[k]

Dammit! That last line should have been:

    low, high = k

But even better would be:

    d = database[gname]
    for low, high in d:
        if low <= number <= high:
            ...

I knew something was tickling my subconscious as i sent that reply, i should have known better!

PS: Hey, I said it was "fairly simple" logic, not "perfect" logic!
-- https://mail.python.org/mailman/listinfo/python-list
Re: Searching through more than one file.
On Sunday, December 28, 2014 11:29:48 AM UTC-6, Seymore4Head wrote:
> I need to search through a directory of text files for a string.
> Here is a short program I made in the past to search through a single
> text file for a line of text.

Step 1: Search through a single file.  # Just a few more brush strokes...
Step 2: Search through all files in a directory.  # Time to go exploring!
Step 3: Option to filter by file extension.  # Waste not, want not!
Step 4: Option for recursing down sub-directories.
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
[Oops, fell into a recursive black hole!]
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
[BREAK]
# Whew, no worries, MaximumRecursionError is my best friend! ;-)

In addition to the other advice, you might want to check out os.walk().
-- https://mail.python.org/mailman/listinfo/python-list
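The os.walk() suggestion looks like this in practice: it handles the recursion into sub-directories for you, yielding (root, dirs, files) for each directory. A sketch using a made-up temporary tree:

```python
import os
import tempfile

# Hypothetical tree: one .txt at the top, one in a subdirectory,
# plus a file that should be filtered out by extension.
d = tempfile.mkdtemp()
os.makedirs(os.path.join(d, "sub"))
for path in ("top.txt", os.path.join("sub", "deep.txt"), "skip.csv"):
    open(os.path.join(d, path), "w").close()

# os.walk visits every directory under d, however deeply nested.
found = sorted(name
               for root, dirs, files in os.walk(d)
               for name in files
               if name.endswith(".txt"))
print(found)
```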