Formatting question.
Hi all, My input file looks like this : ( the data is separated by tabs ) 11/26/2007 56.366 898.90 -10.086 23.11 1212.3 11/26/2007 52.25 897.6 -12.5 12.61.5 11/26/2007 52.25 897.6 -12.5 12.6133.5 The output I'm trying to get is as follows : ( Insert NBUSER.Grochamber Values '11/26/2007','56.366','898.90','-10.086','23.11','1212.3', ) ( Insert NBUSER.Grochamber Values '11/26/2007','52.25','897.6','-12.5','12.6','1.5', ) ( Insert NBUSER.Grochamber Values '11/26/2007','52.25','897.6','-12.5','12.6','133.5', ) The following is the program i have written so far : LoL = [] for line in open('mydata.txt'): LoL.append(line.split("\t")) print "read from a file: ", LoL, outfile = open("out.dat", "w") lilength = len(LoL) liwidelength = len(LoL[1]) print "length of list is " , lilength, "long" print "length of list is " , liwidelength, "long" for x in range(lilength): outfile.write(" ( ") outfile.write('Insert NBUSER.Grochamber Values ') for y in range(liwidelength): outfile.write( "'%s'," % (LoL[x][y])) outfile.write(" ) \n") outfile.close() I have 3 questions : 1. The formatting in the first line comes out wrong all the time. I m using windows python 2.5.1. The last part of the first line is always on the second line. 2. How do I avoid the "," symbol after the last entry in the line? (this are supposed to be sql-queries - importing excel based tabbed data to sql database) 3. What do I do when the data is missing? Like missing data? Thanks for all your help! Mike -- http://mail.python.org/mailman/listinfo/python-list
Re: Fwd: Formatting question.
On Nov 20, 9:13 pm, "Sergio Correia" <[EMAIL PROTECTED]> wrote: > Hey Mike, > Welcome to Python! > > About your first issue, just change the line > outfile.write( "'%s'," % (LoL[x][y])) > With > outfile.write( "'%s'," % (LoL[x][y][:-1])) > > Why? Because when you do the line.split, you are including the '\n' at > the end, so a new line is created. > > Now, what you are doing is not very pythonic (batteries are included > in python, so you could just use the CSV module). Also, the for x in > range(len(somelist)) is not recommended, you can just do something > like: > > > import csv > > infile = open("mydata.txt", "rb") > outfile = open("out.txt", "wb") > > reader = csv.reader(infile, delimiter='\t') > writer = csv.writer(outfile, quotechar=None, delimiter = "\\") > > for row in reader: > data = "'" + "', '".join(row) + "'" > base = " ( Insert NBUSER.Grochamber Values %s, )" > writer.writerow([base % data]) > > infile.close() > outfile.close() > > The above lines works like your program, writing exactly what you asked. > Again, all lists are iterable, you don't need to iterate an integer > from 1 to len(list). (isn't python wonderful?) > > HTH, > Sergio > > On Nov 20, 2007 6:11 PM, mike5160 <[EMAIL PROTECTED]> wrote: > > > Hi all, > > > My input file looks like this : ( the data is separated by tabs ) > > > 11/26/2007 56.366 898.90 -10.086 23.11 1212.3 > > 11/26/2007 52.25 897.6 -12.5 12.61.5 > > 11/26/2007 52.25 897.6 -12.5 12.6133.5 > > > The output I'm trying to get is as follows : > > > ( Insert NBUSER.Grochamber Values > > '11/26/2007','56.366','898.90','-10.086','23.11','1212.3', ) > > ( Insert NBUSER.Grochamber Values > > '11/26/2007','52.25','897.6','-12.5','12.6','1.5', ) > > ( Insert NBUSER.Grochamber Values > > '11/26/2007','52.25','897.6','-12.5','12.6','133.5', ) > > > The following is the program i have written so far : > > > LoL = [] > > > for line in open('mydata.txt'): > > LoL.append(line.split("\t")) > > > print "read from a file: ", LoL, > > > outfile = open("out.dat", "w") > > > lilength = len(LoL) > > liwidelength = len(LoL[1]) > > > print "length of list is " , lilength, "long" > > print "length of list is " , liwidelength, "long" > > > for x in range(lilength): > > outfile.write(" ( ") > > outfile.write('Insert NBUSER.Grochamber Values ') > > for y in range(liwidelength): > > outfile.write( "'%s'," % (LoL[x][y])) > > outfile.write(" ) \n") > > > outfile.close() > > > I have 3 questions : > > > 1. The formatting in the first line comes out wrong all the time. I m > > using windows python 2.5.1. The last part of the first line is always > > on the second line. > > > 2. How do I avoid the "," symbol after the last entry in the line? > > (this are supposed to be sql-queries - importing excel based tabbed > > data to sql database) > > > 3. What do I do when the data is missing? Like missing data? > > > Thanks for all your help! > > > Mike > > > -- > >http://mail.python.org/mailman/listinfo/python-list76 HI Sergio, First of all, thanks for your reply and yes I'm new to Python. Did a google on CSV and I am reading the documentation about it right now. In the post I mentioned I was using Windows. I also have a laptop with linux installed on it. When I ran the same program on my linux laptop I did see the \n included in the list. Somehow, I did not see it on windows, or missed it. So that cleared up the first problem. Also, I will be doing a lot of this data importing from excel etc. can you point me to a tutorial/document/book etc. where I can find snippets of using various python utilities. For eg. something which has the sample for using "line.split("\t") " or " outfile.write( "'%s'," % (LoL[x][y][:-1])) " , explaining the various options available. The default "Idle gui help" is not too informative to a newbie like me. Thanks again for your reply, Mike. -- http://mail.python.org/mailman/listinfo/python-list
Re: Formatting question.
On Nov 21, 1:22 am, Dennis Lee Bieber <[EMAIL PROTECTED]> wrote: > On Tue, 20 Nov 2007 15:11:38 -0800 (PST), mike5160 <[EMAIL PROTECTED]> > declaimed the following in comp.lang.python: > > > > > Hi all, > > > My input file looks like this : ( the data is separated by tabs ) > > > 11/26/2007 56.366 898.90 -10.086 23.11 1212.3 > > 11/26/2007 52.25 897.6 -12.5 12.61.5 > > 11/26/2007 52.25 897.6 -12.5 12.6133.5 > > > The output I'm trying to get is as follows : > > > ( Insert NBUSER.Grochamber Values > > '11/26/2007','56.366','898.90','-10.086','23.11','1212.3', ) > > ( Insert NBUSER.Grochamber Values > > '11/26/2007','52.25','897.6','-12.5','12.6','1.5', ) > > ( Insert NBUSER.Grochamber Values > > '11/26/2007','52.25','897.6','-12.5','12.6','133.5', ) > > > > > 2. How do I avoid the "," symbol after the last entry in the line? > > (this are supposed to be sql-queries - importing excel based tabbed > > data to sql database) > > If those are SQL inserts, the ( is in the wrong place... > > insert into NBUSER.Grochamber values (v1, v2, ... , vx) > > > 3. What do I do when the data is missing? Like missing data? > > First, for reading the file, recommend you look at the CSV module, > which can be configured to use TABS rather than COMMAS. > > For SQL -- if you are going to be writing raw text strings to an > output file for later batching, YOU are going to have to supply some > means to properly escape the data. The better way is to have the program > connect to the database, using the applicable database adapter: MySQLdb > for MySQL, pysqlite2 (or some variant) for SQLite3, some generic ODBC > adapter if going that route... Let IT do the escaping. > > Now, since MySQLdb just happens to expose the escaping function, AND > just uses %s formatting of the results, one could easily get stuff to > write to a file. > > >>> import MySQLdb > >>> con = MySQLdb.connect(host="localhost", user="test", passwd="test", > >>> db="test") > >>> data = [ "11/26/200756.366 898.90 -10.086 23.11 1212.3", > > ... "11/26/2007897.6 O'Reilly12.6 > 1.5", > ... "11/26/200752.25 897.6 -12.5 12.6133.5" > ] > > Note how I left out a field (two tabs, nothing between), and how I > put in a data item with a ' in it. > > >>> for ln in data: > > ... flds = ln.split("\t") > ... placeholders = ", ".join(["%s"] * len(flds)) > ... sql = BASE % placeholders > ... sql = sql % con.literal(flds) > ... print sql > ... > insert into NBUSER.Grochamber values ('11/26/2007', '56.366', '898.90', > '-10.086', '23.11', '1212.3') > insert into NBUSER.Grochamber values ('11/26/2007', '', '897.6', > 'O\'Reilly', '12.6', '1.5') > insert into NBUSER.Grochamber values ('11/26/2007', '52.25', '897.6', > '-12.5', '12.6', '133.5') > > > > Note how the empty field is just '' (If you really need a NULL, > you'll have to do some games to put a Python None entity into that empty > string field). Also note how the single quote string value has been > escaped. > > Something like this for NULL in STRING DATA -- if a field were > numeric 0 it would get substituted with a NULL too... > > >>> for ln in data: > > ... flds = ln.split("\t") > ... placeholders = ", ".join(["%s"] * len(flds)) > ... sql = BASE % placeholders > ... flds = [(fld or None) for fld in flds] > ... sql = sql % con.literal(flds) > ... print sql > ... > insert into NBUSER.Grochamber values ('11/26/2007', '56.366', '898.90', > '-10.086', '23.11', '1212.3') > insert into NBUSER.Grochamber values ('11/26/2007', NULL, '897.6', > 'O\'Reilly', '12.6', '1.5') > insert into NBUSER.Grochamber values ('11/26/2007', '52.25', '897.6', > '-12.5', '12.6', '133.5') >
Re: Formatting question.
On Nov 21, 11:36 am, mike5160 <[EMAIL PROTECTED]> wrote: > On Nov 21, 1:22 am, Dennis Lee Bieber <[EMAIL PROTECTED]> wrote: > > > > > On Tue, 20 Nov 2007 15:11:38 -0800 (PST), mike5160 <[EMAIL PROTECTED]> > > declaimed the following in comp.lang.python: > > > > Hi all, > > > > My input file looks like this : ( the data is separated by tabs ) > > > > 11/26/2007 56.366 898.90 -10.086 23.11 1212.3 > > > 11/26/2007 52.25 897.6 -12.5 12.61.5 > > > 11/26/2007 52.25 897.6 -12.5 12.6133.5 > > > > The output I'm trying to get is as follows : > > > > ( Insert NBUSER.Grochamber Values > > > '11/26/2007','56.366','898.90','-10.086','23.11','1212.3', ) > > > ( Insert NBUSER.Grochamber Values > > > '11/26/2007','52.25','897.6','-12.5','12.6','1.5', ) > > > ( Insert NBUSER.Grochamber Values > > > '11/26/2007','52.25','897.6','-12.5','12.6','133.5', ) > > > > > > > 2. How do I avoid the "," symbol after the last entry in the line? > > > (this are supposed to be sql-queries - importing excel based tabbed > > > data to sql database) > > > If those are SQL inserts, the ( is in the wrong place... > > > insert into NBUSER.Grochamber values (v1, v2, ... , vx) > > > > 3. What do I do when the data is missing? Like missing data? > > > First, for reading the file, recommend you look at the CSV module, > > which can be configured to use TABS rather than COMMAS. > > > For SQL -- if you are going to be writing raw text strings to an > > output file for later batching, YOU are going to have to supply some > > means to properly escape the data. The better way is to have the program > > connect to the database, using the applicable database adapter: MySQLdb > > for MySQL, pysqlite2 (or some variant) for SQLite3, some generic ODBC > > adapter if going that route... Let IT do the escaping. > > > Now, since MySQLdb just happens to expose the escaping function, AND > > just uses %s formatting of the results, one could easily get stuff to > > write to a file. > > > >>> import MySQLdb > > >>> con = MySQLdb.connect(host="localhost", user="test", passwd="test", > > >>> db="test") > > >>> data = [ "11/26/200756.366 898.90 -10.086 23.11 1212.3", > > > ... "11/26/2007897.6 O'Reilly12.6 > > 1.5", > > ... "11/26/200752.25 897.6 -12.5 12.6133.5" > > ] > > > Note how I left out a field (two tabs, nothing between), and how I > > put in a data item with a ' in it. > > > >>> for ln in data: > > > ... flds = ln.split("\t") > > ... placeholders = ", ".join(["%s"] * len(flds)) > > ... sql = BASE % placeholders > > ... sql = sql % con.literal(flds) > > ... print sql > > ... > > insert into NBUSER.Grochamber values ('11/26/2007', '56.366', '898.90', > > '-10.086', '23.11', '1212.3') > > insert into NBUSER.Grochamber values ('11/26/2007', '', '897.6', > > 'O\'Reilly', '12.6', '1.5') > > insert into NBUSER.Grochamber values ('11/26/2007', '52.25', '897.6', > > '-12.5', '12.6', '133.5') > > > Note how the empty field is just '' (If you really need a NULL, > > you'll have to do some games to put a Python None entity into that empty > > string field). Also note how the single quote string value has been > > escaped. > > > Something like this for NULL in STRING DATA -- if a field were > > numeric 0 it would get substituted with a NULL too... > > > >>> for ln in data: > > > ... flds = ln.split("\t") > > ... placeholders = ", ".join(["%s"] * len(flds)) > > ... sql = BASE % placeholders > > ... flds = [(fld or None) for fld in flds] > > ... sql = sql % con.literal(flds) > > ... print sql > > ... > > insert into NBUSER.Grochamber values ('11/26/2007', '56.366',