[EMAIL PROTECTED] wrote:
> In a message dated 8/4/2007 11:50:05 PM Central Daylight Time,
> [EMAIL PROTECTED] writes:
>
> [EMAIL PROTECTED] wrote:
> > On Aug 4, 6:35 pm, SMERSH009 <[EMAIL PROTECTED]> wrote:
> >> Hi All.
> >> Let's say I have some badly formatted text called doc:
> >>
> >> doc=
> >> """
> >> friendid
> >> Female
> >>
> >> 23 years old
> >>
> >> Los Gatos
> >>
> >> United States
> >> friendid
> >> Male
> >>
> >> 24 years old
> >>
> >> San Francisco, California
> >>
> >> United States
> >> """
> >>
> >> How would I get these results to be displayed in a format similar to:
> >> friendid;Female;23 years old;Los Gatos;United States
> >> friendid;Male; 24 years old;San Francisco, California;United States
> >>
> >> The latter is a lot easier to organize and can be quickly imported
> >> into Excel's column format.
> >>
> >> Thanks Much,
> >> Sam
> >
> > d = doc.split('\n')
> >
> > f = [i.split() for i in d if i]
> >
> > g = [' '.join(i) for i in f]
> >
> > rec = []
> > temprec = []
> > for i in g:
> >     if i:
> >         if i == 'friendid':
> >             rec.append(temprec)
> >             temprec = [i]
> >         else:
> >             temprec.append(i)
> > rec.append(temprec)
> >
> > output = [';'.join(i) for i in rec if i]
> >
> > for i in output: print i
> >
> > ## friendid;Female;23 years old;Los Gatos;United States
> > ## friendid;Male;24 years old;San Francisco, California;United States
> >
>
> Also :
>
> docList = [ i.strip() for i in doc.split('\n') if i.strip()]
>
> lines = [i for i in xrange(len(docList)) if docList[i] == 'friendid']+[len(docList)]
>
> docOut = ''
> for k in [docList[lines[j]:lines[j+1]] for j in xrange(len(lines)-1)]:
>     docOut += '\n' + ';'.join(k)
>
> docOut = docOut[1:] # Get rid of initial '\n'
>
> Aren't you making an unwarranted assumption here?
> That doc ALWAYS starts with EXACTLY one blank line?

I guess you are referring to:

docList = [ i.strip() for i in doc.split('\n') if i.strip()]

Actually no. "doc.split('\n')" splits the text into one list member per line, whether there are blank lines or not. Then "i.strip()" takes care of redundant whitespace, and "if i.strip()" filters out the blank lines. Try it in your shell and play with "doc", and you'll see (I tried it without the first blank line, and with a lot of blank lines, and it's OK).

> That's why I didn't use a list comprehension in that
> one section, to cover the possibility of any number
> (including none) of blank lines.

It works. I didn't test it thoroughly, but the cases you suggest were tested.

> By blindly coding [1:] you run the risk of data loss,
> and that's a poor example for the OP.

It is not "blindly". I guarantee the first character will ALWAYS be a '\n', because I'm putting it there with "docOut += '\n' + ';'.join(k)" (note the '\n' added at the beginning of each cycle), and I need to strip that leading '\n', which is not wanted.

Cheers
Ricardo
--
http://mail.python.org/mailman/listinfo/python-list
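
For anyone who wants to check the blank-line claim without a shell session, here is a minimal sketch of the same strip-and-filter idea. It is written for Python 3 (so no xrange or print statement), and the names to_records, doc_list and starts are made up for illustration, not taken from the thread. It also swaps the "docOut += ..." loop for '\n'.join, which is one possible way to sidestep the [1:] question altogether, since the separator then only ever appears between records.

```python
def to_records(doc):
    """Join each 'friendid' block of doc into one ';'-separated line."""
    # Strip every line and drop the ones that are empty afterwards, so
    # leading, trailing and repeated blank lines all disappear here.
    doc_list = [line.strip() for line in doc.split('\n') if line.strip()]
    # Index of every 'friendid' marker, plus an end sentinel.
    starts = [i for i, line in enumerate(doc_list) if line == 'friendid']
    starts.append(len(doc_list))
    # '\n'.join puts the separator only between records, so there is
    # no leading '\n' to slice off afterwards.
    return '\n'.join(';'.join(doc_list[a:b])
                     for a, b in zip(starts, starts[1:]))


body = ("friendid\nFemale\n\n23 years old\n\nLos Gatos\n\nUnited States\n"
        "friendid\nMale\n\n24 years old\n\nSan Francisco, California\n\n"
        "United States\n")

expected = ("friendid;Female;23 years old;Los Gatos;United States\n"
            "friendid;Male;24 years old;San Francisco, California;United States")

# Same result with no leading blank line, exactly one, or several.
for prefix in ("", "\n", "\n\n\n   \n"):
    assert to_records(prefix + body) == expected

print(to_records("\n" + body))
```

Note that any lines before the first 'friendid' are ignored in this sketch, which matches how the slicing on the friendid indexes behaves in the version quoted above.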