Chris wrote:
> Dear All,
>
> I'm trying to read ten 200 MB textfiles into a MySQL MyISAM database
> (Linux, ext4). The script output is suddenly stopping, while the Python
> process is still running (or should I say sleeping?). It's not in top,
> but in ps visible.
>
> Why is it stopping? Is there a way to make it continue, without calling
> "kill -9", deleting the processed lines and starting it again?
>
> Thank you in advance.
>
> [1] http://pastebin.com/CxHCA9eB
>
> #!/usr/bin/python
>
> import MySQLdb, pprint, re
> db = None
> daten = "/home/chris/temp/data/data/"
> host = "localhost"
> user = "data"
> passwd = "data"
> database = "data"
> table = "data"
>
> def connect_mysql():
>     global db, host, user, passwd, database
>     db = MySQLdb.connect(host, user, passwd, database)
>     return(db)
>
>
> def read_file(srcfile):
>     lines = []
>     f = open(srcfile, 'r')
>     while True:
>         line = f.readline()
>         #print line
>         lines.append(line)
>         if len(line) == 0:
>             break
>     return(lines)

The read_file() function looks suspicious. It uses a round-about way to
read the whole file into memory. Maybe your system is just swapping?

Throw read_file() away and instead iterate over the file directly (see
below).

> def write_db(anonid, query, querytime, itemrank, clickurl):
>     global db, table
>     print "write_db aufgerufen."
>     cur = db.cursor()
>     try:
>         cur.execute("""INSERT INTO data (anonid,query,querytime,itemrank,clickurl) VALUES (%s,%s,%s,%s,%s)""", (anonid,query,querytime,itemrank,clickurl))
>         db.commit()
>     except:
>         db.rollback()
>
>
> def split_line(line):
>     print "split_line called."
>     print "line is:", line
>     searchObj = re.split(r'(\d*)\t(.*)\t([0-9: -]+)\t(\d*)\t([A-Za-z0-9._:/ -]*)',line, re.I|re.U)
>     return(searchObj)
>
>
>
> db = connect_mysql()
> pprint.pprint(db)

with open(daten + "test-07b.txt") as lines:
    for line in lines:
        result = split_line(line)
        write_db(result[1], result[2], result[3], result[4], result[5])

> db.close()

Random remarks:

- A bare except is evil. You lose valuable information.
- A 'global' statement is only needed to rebind a module-global variable,
  not to access such a variable. At first glance all your 'global'
  declarations seem superfluous.
- You could change the signature of write_db() to accept result[1:6].
- Do you really need a new cursor for every write? Keep one around as a
  global.
- You might try cur.executemany() to speed things up a bit.
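
To make the remarks about the cursor, the bare except and executemany() a bit more concrete, here is a rough sketch rather than a drop-in fix. Assumptions on my part: the stdlib sqlite3 module stands in for MySQLdb so the example runs without a server (with MySQLdb the placeholders would be %s and the exception class MySQLdb.Error), the fields are plain tab-separated so str.split() replaces the regex, and the names parse_rows() and load(), the batch size and the sample lines are made up for illustration.

```python
import sqlite3
from itertools import islice

INSERT = ("INSERT INTO data (anonid, query, querytime, itemrank, clickurl) "
          "VALUES (?, ?, ?, ?, ?)")  # MySQLdb would use %s instead of ?


def parse_rows(lines):
    """Yield a 5-tuple per line; assumes plain tab-separated fields."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 5:  # silently skip malformed lines
            yield tuple(fields)


def load(db, lines, batch_size=1000):
    """Insert rows batch by batch with one cursor; return rows written."""
    cur = db.cursor()          # one cursor for the whole run
    rows = parse_rows(lines)   # lazy: never holds the whole file in memory
    written = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return written
        try:
            cur.executemany(INSERT, batch)
            db.commit()
            written += len(batch)
        except sqlite3.Error as exc:  # MySQLdb.Error in the real script
            db.rollback()
            print("batch failed:", exc)  # keep the reason instead of hiding it


# Stand-in database and made-up sample lines, for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE data (anonid, query, querytime, itemrank, clickurl)")
sample = [
    "1\tpython mysql\t2006-03-01 12:00:00\t1\thttp://example.com\n",
    "malformed line\n",
    "2\tlinux\t2006-03-01 12:00:05\t\t\n",
]
count = load(db, sample, batch_size=2)
```

In the real script you would pass the open file object instead of the sample list, e.g. load(db, open(daten + "test-07b.txt")). Committing once per batch rather than once per row is a trade-off: a failed batch is rolled back as a whole, so pick a batch size you are happy to lose and retry.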