XML -> Tab-delimited text file (using lxml)
I'm attempting to do the following:

A) Read/scan/iterate/etc. through a semi-large XML file (about 135 MB)
B) Grab specific fields and output them to a tab-delimited text file

The only problem I'm having is that the tab-delimited text file requires the values in a different order than the one in which they appear in the XML file. Example below.

(Note: The last item "3456cdef" shows the description value as being before the name, whereas in previous items it comes after. This is to simulate the XML data with which I am working.)

And the tab-delimited text file should appear as follows (tabs are shown as 2 spaces, for the sake of readability here):

(ID,name,description,image)
1234abcd  My Wonderful Product 1  My Wonderful Product 1 is a wonderful product, indeed.  image.jpg
2345bcde  My Wonderful Product 2  My Wonderful Product 2 is a wonderful product, indeed.  image2.jpg
3456cdef  My Wonderful Product 3  My Wonderful Product 3 is a wonderful product, indeed.  image3.jpg

Currently, I'm working with the lxml library for iteration and parsing, though this is proving to be a bit of a challenge for data that needs to be reorganized (such as mine). Sample below.

''' Start code '''
from lxml import etree

def main():
    # Far too much room would be taken up if I were to paste my
    # real code here, so I will give a smaller example of what
    # I'm doing. Also, I do realize this is a very naive way to do
    # what it is I'm trying to accomplish... besides the fact
    # that it doesn't work as intended in the first place.
    out = open('output.txt','w')
    cat = etree.parse('catalog.xml')
    for el in cat.iter():
        # Search for the first item, make a new line for it
        # and output the ID
        if el.tag == "Item":
            out.write("\n%s\t" % (el.attrib['ID']))
        elif el.tag == "ItemVal":
            if el.attrib['ValueID'] == "name":
                out.write("%s\t" % (el.attrib['value']))
            elif el.attrib['ValueID'] == "description":
                out.write("%s\t" % (el.attrib['value']))
            elif el.attrib['ValueID'] == "image":
                out.write("%s\t" % (el.attrib['value']))
    out.close()

if __name__ == '__main__':
    main()
''' End code '''

I now realize that etree.iter() is meant to be used in an entirely different fashion, but my brain is stuck on this naive way of coding. If someone could give me a push in any correct direction I would be most grateful.

-- 
http://mail.python.org/mailman/listinfo/python-list
Re: XML -> Tab-delimited text file (using lxml)
On Nov 19, 11:03 am, Stefan Behnel <[EMAIL PROTECTED]> wrote:
>
> Use iterparse() instead of parsing the file into memory completely.
>
> *stuff*
>
> Stefan

That worked wonders. Thanks a lot, Stefan.

So, iterparse() uses an iterate -> parse method instead of parse() and iter()'s parse -> iterate method (if that makes any sense)?
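For readers following the thread, here is a minimal sketch of the iterparse() approach, not Stefan's actual code (which is elided as *stuff* above). It assumes each ItemVal is a child of its Item, uses a hypothetical Catalog root element and made-up sample data, and relies on the standard library's xml.etree.ElementTree so that it runs without extra installs; lxml.etree.iterparse() can be called the same way for this usage. Only the Item, ItemVal, ID, ValueID and value names come from the original post.

```python
import io
import xml.etree.ElementTree as etree  # stdlib stand-in for lxml.etree

# Output column order required by the tab-delimited file (after the ID).
FIELDS = ("name", "description", "image")


def xml_to_tsv(source, out):
    """Stream Item elements from source and write one tab-delimited row each."""
    for _event, el in etree.iterparse(source, events=("end",)):
        if el.tag != "Item":
            continue
        # An "end" event fires once the whole Item has been parsed, so all of
        # its ItemVal children are available. Collect them by ValueID so the
        # order in which they appear in the XML no longer matters.
        values = {v.get("ValueID"): v.get("value", "") for v in el.iter("ItemVal")}
        row = [el.get("ID", "")] + [values.get(field, "") for field in FIELDS]
        out.write("\t".join(row) + "\n")
        el.clear()  # drop the finished item's children to keep memory flat


# Hypothetical sample data; the second item lists description before name.
SAMPLE = b"""<Catalog>
  <Item ID="1234abcd">
    <ItemVal ValueID="name" value="My Wonderful Product 1"/>
    <ItemVal ValueID="description" value="My Wonderful Product 1 is a wonderful product, indeed."/>
    <ItemVal ValueID="image" value="image.jpg"/>
  </Item>
  <Item ID="3456cdef">
    <ItemVal ValueID="description" value="My Wonderful Product 3 is a wonderful product, indeed."/>
    <ItemVal ValueID="name" value="My Wonderful Product 3"/>
    <ItemVal ValueID="image" value="image3.jpg"/>
  </Item>
</Catalog>"""

out = io.StringIO()
xml_to_tsv(io.BytesIO(SAMPLE), out)
tsv_text = out.getvalue()
```

So yes, roughly: parse() builds the whole tree first and iter() then walks it, whereas iterparse() hands over each element as soon as the parser has finished it. Gathering an item's values into a dict before writing is what makes the reordering straightforward.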
Re: [OT] Usage of U+00B6 PILCROW SIGN
In article , Michael Torrie wrote:

> On 02/04/2014 08:21 AM, wxjmfa...@gmail.com wrote:
> >
> > Useless and really ugly.
>
> How do you recommend we discover the anchor links for linking to?

Use the Table Of Contents panel on the left?

--
Jim Gibson
Re: querry on queue ( thread safe ) multithreading
In article , Jaiprakash Singh wrote:

> hey i am working on scraping a site , so i am using multi-threading concept.
> i wrote a code based on queue (thread safe) but still my code block out after
> sometime, please help , i have searched a lot but unable to resolve it.
> please help i stuck here.

Do you really want to subject the web server to 150 simultaneous requests? Some would consider that a denial-of-service attack.

When I scrape a site, and I have been doing that occasionally of late, I put a 10-second sleep after each HTTP request. That makes my program more considerate of other people's resources and a better web citizen. It is also much easier to program.

--
Jim Gibson
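The sequential approach described above might look something like the following. This is only a sketch: the function names fetch_politely and fetch_with_urllib are made up for illustration, and the fetch and sleep parameters are an assumption added so the loop can be tried without touching a real server.

```python
import time
import urllib.request


def fetch_with_urllib(url):
    """Download one page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_politely(urls, fetch=fetch_with_urllib, delay_seconds=10.0, sleep=time.sleep):
    """Fetch URLs one at a time, pausing after every request."""
    pages = {}
    for url in urls:
        pages[url] = fetch(url)
        sleep(delay_seconds)  # give the server a rest before the next request
    return pages
```

With no threads and no queue there is nothing to deadlock, which is much of why this version is easier to program.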
Re: finding data from two different files.
In article , <"torque.in...@gmail.com"> wrote:

> Hi all,
>
> I am new to python, just was looking for logic to understand to write code in
> the below scenario.
>
> I am having a file (filea) with multiple columns, and another file(fileb)
> with again multiple columns, but say i want to use column2 of fileb as a
> search expression to search for similar value in column3 of filea. and print
> it with value of rows of filea.
>
> filea:
> a 1 ab
> b 2 bc
> d 3 de
> e 4 ef
> .
> .
> .
>
> fileb
> z ab 24
> y bc 85
> x ef 123
> w de 33
>
> Regards../ omps

Interestingly, somebody named "Om Prakash Singh" asked the identical question on the perl beginners list, except with the word "perl" substituted for "python".

Is this a homework problem? Are you unsure about which language to use? Are you comparison shopping?

--
Jim Gibson
Re: Python on a MacBook Pro (not my machine)
In article <0799708c-59d5-41c2-9fcc-24b7ca873...@googlegroups.com>, John Ladasky wrote:

>
> So, what other free and lightweight editing options do I have for a Mac? I
> have found a few (fairly old) discussions on comp.lang.python which suggest
> Eric (http://eric-ide.python-projects.org/) and Editra (http://editra.org/).
> Opinions on these and other choices are appreciated.

I use BBEdit (paid) and MacVim (free) for Mac editing. Bare Bones Software has a free version of BBEdit called TextWrangler that a lot of people use.

<http://www.barebones.com/products/bbedit/>
<http://www.barebones.com/products/textwrangler/>
<http://code.google.com/p/macvim/>

--
Jim Gibson
Re: Basic Python Questions - Oct. 31, 2013
In article , E.D.G. wrote:

> My main, complex programs won't be run at Web sites. They will
> instead continue to be available as downloadable exe programs. The CGI (or
> whatever) programming work would involve relatively simple programs. But
> they would need to be able to generate charts that would be displayed on Web
> pages. That sounds like it is probably fairly easy to do using Python. A
> Perl - Gnuplot combination is also supposed to be able to do that. But so
> far I have not seen any good explanations for how to actually get Gnuplot to
> run as a callable CGI program. So other programs such as Python are being
> considered.

One way to generate a plot within a CGI program is this:

1. Write a file with gnuplot commands (e.g., 'gnuplot.cmd') that set the output device to a graphics file of some format (e.g., PNG), generate a plot, and quit gnuplot.

2. Run gnuplot and point it to the file of commands (e.g., 'gnuplot gnuplot.cmd'). How this is done depends upon the CGI program language (see below).

3. Generate HTML that uses the generated graphics file as an embedded image (using the <img> tag).

I have done this in the past, but not recently. This should work for Python (os.system("gnuplot gnuplot.cmd")) or Perl (system("gnuplot gnuplot.cmd")) with suitable commands to execute external programs.

--
Jim Gibson
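The three steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a tested CGI program: the function names, the data file layout (two columns, x and y) and the file paths are all hypothetical, and subprocess is used here in place of os.system. It also assumes gnuplot is installed and on the PATH.

```python
import html
import subprocess


def build_gnuplot_script(data_file, png_file):
    """Step 1: return gnuplot commands that plot data_file into a PNG and quit."""
    return "\n".join([
        "set terminal png",
        "set output '%s'" % png_file,
        "plot '%s' using 1:2 with lines" % data_file,  # assumes x, y columns
        "quit",
        "",
    ])


def render_plot(data_file, png_file, script_file="gnuplot.cmd"):
    """Steps 1 and 2: write the command file, then run gnuplot on it."""
    with open(script_file, "w") as handle:
        handle.write(build_gnuplot_script(data_file, png_file))
    subprocess.check_call(["gnuplot", script_file])


def image_tag(png_url):
    """Step 3: return the HTML that embeds the generated image."""
    return '<img src="%s" alt="plot">' % html.escape(png_url, quote=True)
```

A CGI script would call render_plot() with a PNG path inside the web server's document tree and then print image_tag() with the matching URL as part of its HTML output.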
Re: python newbie
In article , Maura E Monville wrote:

> My supervisor has palmed me off with a python code, written by a
> collaborator, which implements an algorithm aimed at denoising the dose
> distribution (energy per unit mass) output from a radiation transport Monte
> Carlo code.
> My task is to translate the python code into a MatLab code.

Interestingly, I was recently given the task of translating a Matlab program into Python. Just the opposite of your problem. And I knew neither!

> A colleague of mine and I stared at the python code for a while without
> understanding the logic.
> To me that code looks like Chinese or Arabian.
> I don't have clear ideas about the syntax and python variables. For instance,
> in MatLab there is the cell array storage type to host data of different
> type. I have no idea how a 3D matrix can be loaded through python.

Does the Python program use scipy and numpy modules? Those, in my short experience, provide the closest equivalent to Matlab matrices and functions. They allowed me to do almost a line-by-line translation of Matlab code into Python. The h5py module allows you to read and write Matlab "save" files in Python, too (those saved in the HDF5-based v7.3 format; scipy.io handles the older formats).

Some of the differences between Matlab and Python:

1. Matlab array indexing starts at 1; Python starts at 0.
2. Matlab uses parentheses () for sequence indexing; Python uses brackets [].
3. Matlab arrays are column-major order; Numpy arrays are row-major order.

Post Python code here or provide a link to code posted elsewhere for additional help.

Good luck.

--
Jim Gibson
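The three differences listed above can be seen in a few lines of plain Python. This is only an illustrative sketch with made-up names; it uses nested lists rather than NumPy so that it runs anywhere, and the two offset helpers are hypothetical functions written to show where element (i, j) lands in each storage order.

```python
# A 2x3 matrix. In Matlab this would be written [1 2 3; 4 5 6].
matrix = [[1, 2, 3],
          [4, 5, 6]]
n_rows, n_cols = 2, 3

# Differences 1 and 2: indexing starts at 0 and uses brackets,
# so Matlab's matrix(1, 1) is matrix[0][0] here.
first = matrix[0][0]

# Difference 3: the same elements laid out in memory in each order.
row_major = [matrix[i][j] for i in range(n_rows) for j in range(n_cols)]
column_major = [matrix[i][j] for j in range(n_cols) for i in range(n_rows)]


def row_major_offset(i, j, n_cols):
    """Position of element (i, j) when rows are stored one after another."""
    return i * n_cols + j


def column_major_offset(i, j, n_rows):
    """Position of element (i, j) when columns are stored one after another."""
    return j * n_rows + i
```

The storage order mostly matters when flattening or reshaping an array, or when choosing which index to loop over innermost. NumPy accepts order='F' in calls such as reshape() and flatten() for code that needs Matlab's column-major behaviour.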
Re: A Sort Optimization Technique: decorate-sort-dedecorate
In article <[EMAIL PROTECTED]>, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:

> In <[EMAIL PROTECTED]>, Tom Cole wrote:
>
> > In Java, classes can implement the Comparable interface. This interface
> > contains only one method, a compareTo(Object o) method, and it is
> > defined to return a value < 0 if the Object is considered less than the
> > one being passed as an argument, it returns a value > 0 if considered
> > greater than, and 0 if they are considered equal.
> >
> > The object implementing this interface can use any of the variables
> > available to it (AKA address, zip code, longitude, latitude, first
> > name, whatever) to return this -1, 0 or 1. This is slightly different
> > than what you mention as we don't have to "decorate" the object. These
> > are all variables that already exist in the Object, and if fact make it
> > what it is. So, of course, there is no need to un-decorate at the end.
>
> Python has such a mechanism too, the special `__cmp__()` method
> has basically the same signature. The problem the decorate, sort,
> un-decorate pattern solves is that this object specific compare operations
> only use *one* criteria.

I can't believe I am getting drawn into a thread started by xahlee, but here goes anyway:

The problem addressed by what is known in Perl as the 'Schwartzian Transform' is that the compare operation can be an expensive one, regardless of whether the comparison uses multiple keys. Since in comparison sorts the compare operation will be executed on the order of N log N times, it is more efficient to pre-compute a set of keys, one for each object to be sorted. That need be done only N times. The sort can then use these pre-computed keys to sort the objects.

See, for example:

http://en.wikipedia.org/wiki/Schwartzian_transform

--
Jim Gibson
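As a minimal sketch of the pattern in Python (the names dsu_sort and counted_length are made up for illustration, and len() merely stands in for a key that is expensive to compute): each key is computed once, the decorated tuples are sorted, and the keys are stripped off again. The original position is included in each tuple so that ties are broken without ever comparing the objects themselves.

```python
def dsu_sort(items, expensive_key):
    """Sort items by expensive_key, computing the key once per item."""
    # Decorate: pair each item with its pre-computed key and original position.
    decorated = [(expensive_key(item), index, item)
                 for index, item in enumerate(items)]
    # Sort: tuples compare by key first, then by position.
    decorated.sort()
    # Undecorate: throw the keys away again.
    return [item for _key, _index, item in decorated]


key_calls = []


def counted_length(word):
    """Stand-in for an expensive key; records every call it receives."""
    key_calls.append(word)
    return len(word)


words = ["pear", "fig", "banana", "kiwi"]
by_length = dsu_sort(words, counted_length)
```

The key function runs once per item (N calls), however many comparisons the sort performs. Python's built-in sorted(items, key=...) and list.sort(key=...) do this key pre-computation internally, so the explicit form is mainly useful for seeing what happens.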
Re: Fortran vs Python - Newbie Question
"Beliavsky" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > On Mar 26, 10:16 am, [EMAIL PROTECTED] (Cameron Laird) wrote: >> In article >> <[EMAIL PROTECTED]>,[EMAIL PROTECTED] >> <[EMAIL PROTECTED]> wrote: > >> >Is there a mac version?? >> >Thanks >> >Chris >> >> Yes. >> >> Several, in fact--all available at no charge. The Python >> world is different from what experience with Fortran might >> lead you to expect. > > Your experience with Fortran is dated -- see below. > >> >> I'll be more clear: Fortran itself is a distinguished >> language with many meritorious implementations. It can be >> costly, though, finding the implementation you want/need >> for any specific environment. > > Gfortran, which supports Fortran 95 and a little of Fortran 2003, is > part of GCC and is thus widely available. Binaries for g95, also based > on GCC, are available for more than a dozen platforms, including > Windows, Mac OS X, and Linux. I use both and consider only g95 mature, > but gfortran does produce faster programs. Intel's Fortran compilers > cost about $500 on Windows and Mac OS and $700 on Linux. It's not > free, but I would not call it costly for professional developers. > > Speaking of money, gfortran and g95 have free manuals, the latter > available in six languages > http://ftp.g95.org/ . Final drafts of Fortran standards, identical to > the official ISO standards, are freely available. The manual for Numpy > costs $40 per copy. Sun also provides its sun studio ide and compilers(c , c++ and fortran) free of charge on x86 linux just have to register for free. http://developers.sun.com/sunstudio/downloads/ Alex -- http://mail.python.org/mailman/listinfo/python-list
Re: Mathematica 7 compares to other languages
In article <5ebe5a7d-cbdf-4d66-a816-a7d2a0a27...@40g2000prx.googlegroups.com>, Xah Lee wrote:

> On Dec 10, 2:47 pm, John W Kennedy wrote:
> > Xah Lee wrote:
> > > In lisp, python, perl, etc, you'll have 10 or so lines. In C or Java,
> > > you'll have 50 or hundreds lines.
> >
> > C:
> >
> > #include
> > #include
> >
> > void normal(int dim, float* x, float* a) {
> >     float sum = 0.0f;
> >     int i;
> >     float divisor;
> >     for (i = 0; i < dim; ++i) sum += x[i] * x[i];
> >     divisor = sqrt(sum);
> >     for (i = 0; i < dim; ++i) a[i] = x[i]/divisor;
> >
> > }
> >
> > Java:
> >
> > static float[] normal(final float[] x) {
> >     float sum = 0.0f;
> >     for (int i = 0; i < x.length; ++i) sum += x[i] * x[i];
> >     final float divisor = (float) Math.sqrt(sum);
> >     float[] a = new float[x.length];
> >     for (int i = 0; i < x.length; ++i) a[i] = x[i]/divisor;
> >     return a;
> >
> > }
>
> Thanks to various replies.
>
> I've now gather code solutions in ruby, python, C, Java, here:
>
> A Example of Mathematica's Expressiveness
> http://xahlee.org/UnixResource_dir/writ/Mathematica_expressiveness.html
>
> now lacking is perl, elisp, which i can do well in a condensed way.
> It'd be interesting also to have javascript... and perhaps erlang,
> OCaml/F#, Haskell too.

Perl:

sub normal {
  my $sum = 0;
  $sum += $_ ** 2 for @_;
  my $length = sqrt($sum);
  return map { $_/$length } @_;
}

--
Jim Gibson
determine file type
Is there an equivalent to the unix 'file' command?

[mark tmp]$ file min.txt
min.txt: ASCII text
[mark tmp]$ file trunk
trunk: directory
[mark tmp]$ file compliance.tgz
compliance.tgz: gzip compressed data, from Unix

What I really want to do is determine if a file is 1) a directory, 2) a text file 3) a binary file.

Is there a way to do this?

Mark
Re: determine file type
>
> import os
>
> def test_file(filename, maxread=1024):
>     # Classify a path as 'directory', 'binary' or 'text'.
>     if os.path.isdir(filename):
>         return 'directory'
>     # Read raw bytes so that non-text data cannot raise a decoding error.
>     with open(filename, 'rb') as afile:
>         for abyte in bytearray(afile.read(maxread)):
>             if abyte > 127:  # any non-ASCII byte counts as binary
>                 return 'binary'
>     return 'text'
>

Perfect, thanks!