2010/11/27 Dax Bloom <bloom....@gmail.com>:
> On Nov 6, 6:41 am, Vlastimil Brom <vlastimil.b...@gmail.com> wrote:
>> 2010/11/6 Dax Bloom <bloom....@gmail.com>:
>> ...
>> Rask_Grimm_re = ur"[bdgptk]ʰ?"
>> Rask_Grimm_dct = {u"b":u"p", u"bʰ": u"b", u"t": u"þ", } # ...
>>
>> def repl_fn(m):
>>     return Rask_Grimm_dct.get(m.group(), m.group())
>>
>> ie_txt = u" bʰrāter ... "
>> almost_germ_txt = re.sub(Rask_Grimm_re, repl_fn, ie_txt)
>> print u"%s >> %s" % (ie_txt, almost_germ_txt) # vowel changes etc. TBD
>>
>> ########################################
>>
>> bʰrāter ... >> brāþer ...
>>
>> hth,
>> vbr
> ...
> Hello Vlastimil,
>
> Could you please explain what the variables %s and % mean and how to
> implement this part of the code in a working python program? I can't
> fully appreciate Peter's quote on rules
>
>
> Best regards,
>
> Dax Bloom
>

Hi,
the mentioned part is called string interpolation; the last line

print u"%s >> %s" % (ie_txt, almost_germ_txt) # vowel changes etc. TBD

is equivalent to the simple string concatenation:

print ie_txt + u" >> " + almost_germ_txt

see:
http://docs.python.org/library/stdtypes.html#string-formatting-operations
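As a rough illustration, the quoted sample can be sketched as a complete script. This assumes Python 3, where plain string literals are already unicode and print is a function, so the u"" / ur"" prefixes and the print statement of the Python 2 code are spelled differently. The names interpolated and concatenated are only illustrative and are not part of the original sample; the dict is still the incomplete one from the quote.

```python
# -*- coding: utf-8 -*-
# Sketch only: Python 3 spelling of the quoted Python 2 sample.
import re

# Pattern and (incomplete) mapping taken from the quoted sample.
Rask_Grimm_re = r"[bdgptk]ʰ?"
Rask_Grimm_dct = {"b": "p", "bʰ": "b", "t": "þ"}  # ... more pairs to be added


def repl_fn(m):
    # Look up the matched substring; keep it unchanged if no rule exists.
    return Rask_Grimm_dct.get(m.group(), m.group())


ie_txt = " bʰrāter ... "
almost_germ_txt = re.sub(Rask_Grimm_re, repl_fn, ie_txt)

# String interpolation: each %s is filled from the tuple, in order.
interpolated = "%s >> %s" % (ie_txt, almost_germ_txt)

# The same result built by plain concatenation.
concatenated = ie_txt + " >> " + almost_germ_txt

print(interpolated)
print(interpolated == concatenated)
```

Both variables hold the same string; the % form just keeps the layout of the output in one place, which becomes handier as more values are inserted.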
The values of the tuple (or alternatively a dict or another mapping) given after the modulo operator % are inserted at the respective positions (here %s) of the preceding string (or unicode); more advanced adjustments and conversions are also possible here, but they aren't needed in this simple case.
(There is also another string formatting mechanism in the newer versions of Python
http://docs.python.org/library/string.html#formatstrings
which may be more suitable for more complex tasks.)

The implementation depends on the rest of your program and on the input/output of the data you wish to have. To be able to print the output with these rather non-trivial characters, you will need a unicode-enabled console (Idle is a basic one available with Python).
Otherwise the sample is self-contained and should be runnable as is; you can add other needed items to Rask_Grimm_dct, and all substrings matching Rask_Grimm_re will be replaced in one pass.
You can also add a series of such replacements (each an re pattern and a dict of ie: germ pairs), of course only for context-free changes.
On the other hand, I have no simple idea how to deal with Verner's Law and the like (even if you passed the accents in the PIE forms), besides a lexicographic approach, where you would have to identify the word stems to decide which changes to apply.

hth,
vbr
--
http://mail.python.org/mailman/listinfo/python-list