This is a first try, and there are probably better ways to do it. It's far from resilient: it breaks in lots of different ways (for example, more than one number on one line, or a number with text on both sides of it). I have split the data munging into many small steps so I can see what's happening, and so you can fix or modify the code quickly.
Bye,
bearophile

data1 = """
Some text that can
span some lines.
More text
Apples 34
56 Ducks
Some more text.

0.5 g butter
"""

import re

# Separate lines in a list
data2 = data1.split("\n")
print data2, "\n"

# strip leading and trailing spaces, newlines, etc. from each line
data3 = map(str.strip, data2)
print data3, "\n"

# remove the lines that are blank after the stripping
data4 = filter(None, data3)
print data4, "\n"

# build a list of (line, number) pairs, only for lines that contain a number
patt1 = re.compile(r"\d+\.?\d*") # No scientific notation
data5 = [(line, n) for line in data4 for n in patt1.findall(line)]
print data5, "\n"

# remove the number from each line, and strip the resulting lines
data6 = [(line.replace(num, "").strip(), num) for line, num in data5]
print data6, "\n"

def nconv(num):
    "Convert a number string to an int, or to a float if that isn't possible"
    try:
        result = int(num)
    except ValueError:
        result = float(num)
    return result

# convert the number strings into ints or floats
data7 = [(line, nconv(num)) for line, num in data6]
print data7, "\n"

# build the final {line: number} dict
result = dict(data7)
print result, "\n"
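For Python 3 (where print is a function and map/filter return iterators), here is a minimal sketch of the same idea that also tries to cope with two of the failure cases mentioned at the top: more than one number on a line, and a number with text on both sides. How those cases should be reported is an assumption: here each key is the line with all its numbers removed and the whitespace collapsed, and each value is a list of all the numbers found, so the output shape differs from the original dict of single numbers. `extract` is a hypothetical helper name; the regex is the same one used in the post, so scientific notation is still not handled.

```python
import re

# Same pattern as the original code: integers and simple decimals,
# no sign and no scientific notation.
NUMBER = re.compile(r"\d+\.?\d*")


def nconv(num):
    """Convert a number string to an int, or to a float if that fails."""
    try:
        return int(num)
    except ValueError:
        return float(num)


def extract(text):
    """Map each line (with its numbers removed) to the list of its numbers.

    Hypothetical helper: lines without numbers are skipped, and if two
    lines reduce to the same text, the later one wins (as with dict()).
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        nums = [nconv(m) for m in NUMBER.findall(line)]
        if not nums:
            continue  # blank lines and lines without a number
        # Replace only the matched spans (str.replace would also hit equal
        # substrings inside other numbers), then collapse leftover spaces.
        rest = " ".join(NUMBER.sub(" ", line).split())
        result[rest] = nums
    return result


data1 = """
Some text that can
span some lines.
More text
Apples 34
56 Ducks
Some more text.

0.5 g butter
"""

print(extract(data1))
# {'Apples': [34], 'Ducks': [56], 'g butter': [0.5]}
print(extract("Apples 34 56 Ducks\n34 apples and 134 pears"))
# {'Apples Ducks': [34, 56], 'apples and pears': [34, 134]}
```

With the original code, that last line would instead give two entries, 'apples and 1 pears': 34 and '34 apples and  pears': 134, because `line.replace("34", "")` also removes the "34" inside "134".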