XML -> Tab-delimited text file (using lxml)

2008-11-19 Thread Gibson
I'm attempting to do the following:
A) Read/scan/iterate/etc. through a semi-large XML file (about 135 MB)
B) Grab specific fields and output them to a tab-delimited text file

The only problem I'm having is that the tab-delimited text file
requires a different order of values than which appear in the XML
file. Example below.


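<Item ID="1234abcd">
  <ItemVal ValueID="name" value="My Wonderful Product 1"/>
  <ItemVal ValueID="description" value="My Wonderful Product 1 is a wonderful product, indeed."/>
  <ItemVal ValueID="image" value="image.jpg"/>
</Item>
<Item ID="2345bcde">
  <ItemVal ValueID="name" value="My Wonderful Product 2"/>
  <ItemVal ValueID="description" value="My Wonderful Product 2 is a wonderful product, indeed."/>
  <ItemVal ValueID="image" value="image2.jpg"/>
</Item>
<Item ID="3456cdef">
  <ItemVal ValueID="description" value="My Wonderful Product 3 is a wonderful product, indeed."/>
  <ItemVal ValueID="name" value="My Wonderful Product 3"/>
  <ItemVal ValueID="image" value="image3.jpg"/>
</Item>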


(Note: The last item "3456cdef" lists the description value before the
name, whereas in the previous items it comes after. This is to simulate
the XML data I am working with.)
The tab-delimited text file should look like this (tabs are shown as
2 spaces here for readability):

(ID, name, description, image)
1234abcd  My Wonderful Product 1  My Wonderful Product 1 is a wonderful product, indeed.  image.jpg
2345bcde  My Wonderful Product 2  My Wonderful Product 2 is a wonderful product, indeed.  image2.jpg
3456cdef  My Wonderful Product 3  My Wonderful Product 3 is a wonderful product, indeed.  image3.jpg

Currently, I'm working with the lxml library for iteration and
parsing, though this is proving to be a bit of a challenge for data
that needs to be reorganized (such as mine). Sample below.

''' Start code '''

from lxml import etree

def main():
    # Far too much room would be taken up if I were to paste my
    # real code here, so I will give a smaller example of what
    # I'm doing. Also, I do realize this is a very naive way to do
    # what it is I'm trying to accomplish... besides the fact
    # that it doesn't work as intended in the first place.

    out = open('output.txt', 'w')
    cat = etree.parse('catalog.xml')
    for el in cat.iter():
        # Search for the first item, make a new line for it
        # and output the ID
        if el.tag == "Item":
            out.write("\n%s\t" % (el.attrib['ID']))
        elif el.tag == "ItemVal":
            if el.attrib['ValueID'] == "name":
                out.write("%s\t" % (el.attrib['value']))
            elif el.attrib['ValueID'] == "description":
                out.write("%s\t" % (el.attrib['value']))
            elif el.attrib['ValueID'] == "image":
                out.write("%s\t" % (el.attrib['value']))
    out.close()

if __name__ == '__main__':
    main()

''' End code '''

I now realize that etree.iter() is meant to be used in an entirely
different fashion, but my brain is stuck on this naive way of coding.
If someone could give me a push in any correct direction I would be
most grateful.
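One way to handle the reordering (a sketch, not tested against the real 135 MB file): instead of writing each value as it is encountered, collect each Item's ItemVal values into a dictionary keyed by ValueID, then write them out in a fixed column order. This assumes the Item/ItemVal/ValueID/value layout used in the code above, with ItemVal elements as direct children of Item; COLUMNS, item_rows and convert are hypothetical names. It uses the standard library's xml.etree.ElementTree, but lxml.etree's parse(), iter(), findall() and get() should behave the same way here.

```python
import xml.etree.ElementTree as ET

# Output column order after the ID (taken from the desired output above).
COLUMNS = ("name", "description", "image")


def item_rows(tree):
    """Yield one [ID, name, description, image] list per <Item> element."""
    for item in tree.iter("Item"):
        # Gather this item's ItemVal children into a dict keyed by ValueID,
        # so the order they appear in within the XML no longer matters.
        values = {v.get("ValueID"): v.get("value", "")
                  for v in item.findall("ItemVal")}
        # Emit the values in the fixed column order ("" if one is missing).
        yield [item.get("ID", "")] + [values.get(col, "") for col in COLUMNS]


def convert(xml_file, out):
    """Parse xml_file (a path or binary file object) and write tab-delimited rows to out."""
    tree = ET.parse(xml_file)
    for row in item_rows(tree):
        out.write("\t".join(row) + "\n")
```

To produce the file from the original example, something like: with open('output.txt', 'w') as out: convert('catalog.xml', out). Note that this still loads the whole document into memory before iterating.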
--
http://mail.python.org/mailman/listinfo/python-list


Re: XML -> Tab-delimited text file (using lxml)

2008-11-19 Thread Gibson
On Nov 19, 11:03 am, Stefan Behnel <[EMAIL PROTECTED]> wrote:
>
> Use iterparse() instead of parsing the file into memory completely.
>
> *stuff*
>
> Stefan

That worked wonders. Thanks a lot, Stefan.

So, iterparse() uses an iterate -> parse method instead of parse() and
iter()'s parse -> iterate method (if that makes any sense)?
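A minimal sketch of the iterparse() approach Stefan suggested, applied to the same reordering problem. It assumes the same Item/ItemVal layout as above; convert_streaming and COLUMNS are hypothetical names, and it uses xml.etree.ElementTree.iterparse(), whose events argument lxml.etree.iterparse() also accepts. Each Item is handled as soon as its end tag has been parsed and is then cleared, instead of the whole tree being built in memory first.

```python
import xml.etree.ElementTree as ET

# Output column order after the ID.
COLUMNS = ("name", "description", "image")


def convert_streaming(xml_file, out):
    """Stream xml_file with iterparse() and write one tab-delimited row per <Item>."""
    # With events=("end",), each element is reported once its closing tag
    # has been parsed, i.e. when all of its children are available.
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag != "Item":
            continue
        values = {v.get("ValueID"): v.get("value", "")
                  for v in elem.findall("ItemVal")}
        row = [elem.get("ID", "")] + [values.get(col, "") for col in COLUMNS]
        out.write("\t".join(row) + "\n")
        # Drop the finished item's children and attributes to free memory.
        elem.clear()
```

elem.clear() discards the item's contents, but the emptied Item elements stay attached to the root, so for very large files it may also be worth removing them from their parent.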


Re: [OT] Usage of U+00B6 PILCROW SIGN

2014-02-04 Thread Jim Gibson
In article ,
Michael Torrie  wrote:

> On 02/04/2014 08:21 AM, wxjmfa...@gmail.com wrote:
> > 
> > Useless and really ugly.
> 
> How do you recommend we discover the anchor links for linking to?

Use the Table Of Contents panel on the left?

-- 
Jim Gibson


Re: query on queue ( thread safe ) multithreading

2014-03-11 Thread Jim Gibson
In article ,
Jaiprakash Singh  wrote:

> Hey, I am working on scraping a site, so I am using multi-threading.
> I wrote code based on a queue (thread-safe), but my code still blocks after
> some time. Please help; I have searched a lot but was unable to resolve it.
> Please help, I am stuck here.

Do you really want to subject the web server to 150 simultaneous
requests? Some would consider that a denial-of-service attack.

When I scrape a site, and I have been doing that occasionally of late,
I put a 10-second sleep after each HTTP request. That makes my program
more considerate of other people's resources and a better web citizen.
It is also much easier to program.
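A minimal single-threaded sketch of that approach: fetch one URL at a time and sleep after each request. scrape_politely and fetch_url are hypothetical names, the 10-second default mirrors the delay described above, and the fetch parameter exists only so the loop can be tried without contacting a real server.

```python
import time
import urllib.request


def fetch_url(url):
    """Download one page and return its raw bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def scrape_politely(urls, delay=10.0, fetch=fetch_url):
    """Fetch URLs sequentially, sleeping `delay` seconds after each request."""
    results = {}
    for url in urls:
        results[url] = fetch(url)
        time.sleep(delay)  # give the server a break before the next request
    return results
```

With one request in flight at a time there is no queue, no worker threads and nothing to block on.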

-- 
Jim Gibson


Re: finding data from two different files.

2013-10-18 Thread Jim Gibson
In article ,
<"torque.in...@gmail.com"> wrote:

> Hi all,
> 
> I am new to Python and am looking for the logic needed to write code for
> the scenario below.
> 
> I have a file (filea) with multiple columns, and another file (fileb),
> also with multiple columns. I want to use column 2 of fileb as a search
> expression to find similar values in column 3 of filea, and print each
> match together with the values from that row of filea.
> 
> filea:
> a 1 ab
> b 2 bc
> d 3 de
> e 4 ef
> .
> .
> .
> 
> fileb
> z ab 24
> y bc 85
> x ef 123
> w de 33 
> 
> Regards../ omps

Interestingly, somebody named "Om Prakash Singh" asked the identical
question on the perl beginners list, except with the word "perl"
substituted for "python". Is this a homework problem? Are you unsure
about which language to use? Are you comparison shopping?
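For reference, the lookup described in the question can be sketched with a dictionary: index filea once by its third column, then look up column 2 of each fileb row. This assumes whitespace-separated columns and treats "similar" as an exact match; join_rows is a hypothetical name.

```python
def join_rows(filea_lines, fileb_lines):
    """Yield filea fields + fileb fields for each fileb row whose column 2
    equals column 3 of some filea row."""
    # Index filea once: column-3 value -> list of filea rows with that value.
    index = {}
    for line in filea_lines:
        fields = line.split()
        if len(fields) >= 3:
            index.setdefault(fields[2], []).append(fields)
    # Scan fileb and look up its column 2 in the index (one dict lookup per row).
    for line in fileb_lines:
        fields = line.split()
        if len(fields) >= 2:
            for match in index.get(fields[1], []):
                yield match + fields
```

Open file objects can be passed directly, since iterating over a file yields its lines: with open("filea") as fa, open("fileb") as fb, then print(" ".join(row)) for each row in join_rows(fa, fb).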

-- 
Jim Gibson


Re: Python on a MacBook Pro (not my machine)

2013-10-28 Thread Jim Gibson
In article <0799708c-59d5-41c2-9fcc-24b7ca873...@googlegroups.com>,
John Ladasky  wrote:


> 
> So, what other free and lightweight editing options do I have for a Mac?  I
> have found a few (fairly old) discussions on comp.lang.python which suggest
> Eric (http://eric-ide.python-projects.org/) and Editra (http://editra.org/). 
> Opinions on these and other choices are appreciated.

I use BBEdit (paid) and MacVim (free) for Mac editing. Bare Bones
Software has a free version of BBEdit called TextWrangler that a lot of
people use.

<http://www.barebones.com/products/bbedit/>
<http://www.barebones.com/products/textwrangler/>
<http://code.google.com/p/macvim/>

-- 
Jim Gibson


Re: Basic Python Questions - Oct. 31, 2013

2013-11-03 Thread Jim Gibson
In article , E.D.G.
 wrote:

>My main, complex programs won't be run at Web sites. They will 
> instead continue to be available as downloadable exe programs.  The CGI (or 
> whatever) programming work would involve relatively simple programs. But 
> they would need to be able to generate charts that would be displayed on Web 
> pages. That sounds like it is probably fairly easy to do using Python. A 
> Perl - Gnuplot combination is also supposed to be able to do that. But so 
> far I have not seen any good explanations for how to actually get Gnuplot to 
> run as a callable CGI program. So other programs such as Python are being 
> considered.

One way to generate a plot within a CGI program is this:

1. Write a file of gnuplot commands (e.g., 'gnuplot.cmd') that sets
the output device to a graphics file of some format (e.g., PNG),
generates a plot, and quits gnuplot.

2. Run gnuplot and point it to the command file (e.g., 'gnuplot
gnuplot.cmd'). How this is done depends upon the language the CGI
program is written in (see below).

3. Generate HTML that uses the generated graphics file as an embedded
image (using the <img> tag).

I have done this in the past, but not recently. This should work for
Python (os.system("gnuplot gnuplot.cmd")) or Perl (system("gnuplot
gnuplot.cmd")), or any language with a suitable command for executing
external programs.
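The three steps above might look like this in Python, as a sketch for a CGI script. It assumes gnuplot is on the PATH and was built with PNG support; the file names, the sin(x) plot and the helper names (write_gnuplot_script, run_gnuplot, html_page) are all hypothetical. subprocess.run() is used in place of os.system() so that a failing gnuplot raises an exception.

```python
import subprocess


def write_gnuplot_script(cmd_path, png_path, plot_expr="sin(x)"):
    """Step 1: write a gnuplot command file that renders one plot to a PNG and exits."""
    lines = [
        "set terminal png",            # output device: PNG image
        f"set output '{png_path}'",    # assumes png_path contains no single quotes
        f"plot {plot_expr}",
        "quit",
    ]
    with open(cmd_path, "w") as f:
        f.write("\n".join(lines) + "\n")


def run_gnuplot(cmd_path):
    """Step 2: run gnuplot on the command file (like os.system, but raises on failure)."""
    subprocess.run(["gnuplot", cmd_path], check=True)


def html_page(png_url, title="Plot"):
    """Step 3: CGI header plus a minimal HTML page embedding the generated image."""
    return (
        "Content-Type: text/html\n\n"
        f"<html><head><title>{title}</title></head>"
        f"<body><img src=\"{png_url}\" alt=\"{title}\"></body></html>\n"
    )
```

A CGI script would call write_gnuplot_script('gnuplot.cmd', 'plot.png'), then run_gnuplot('gnuplot.cmd'), then print the result of html_page('plot.png'); the PNG has to be written somewhere the web server can serve it.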

-- 
Jim Gibson


Re: python newbie

2014-06-18 Thread Jim Gibson
In article ,
Maura E Monville  wrote:

> My supervisor has palmed me off with a python code, written by a
> collaborator, which implements an algorithm aimed at denoising the dose
> distribution (energy per unit mass) output from a radiation transport Monte
> Carlo code.
> My task is to translate the python code into a MatLab code.

Interestingly, I was recently given the task of translating a Matlab
program into Python. Just the opposite of your problem. And I knew
neither!

> A colleague of mine and I stared at the python code for a while without
> understanding the logic. 
> To me that code looks like Chinese or Arabian.
> I don't have clear ideas about the syntax and python variables. For instance,
> in MatLab there is the cell array storage type to host data of different
> type. I have no idea how a 3D matrix can be loaded through python.

Does the Python program use scipy and numpy modules? Those, in my short
experience, provide the closest equivalent to Matlab matrices and
functions. They allowed me to do almost a line-by-line translation of
Matlab code into Python. The h5py module allows you to read and write
Matlab "save" files in Python, too, as long as they use the HDF5-based
v7.3 format.

Some of the differences between Matlab and Python:

1. Matlab array indexing starts at 1; Python starts at 0.
2. Matlab uses parentheses () for sequence indexing; Python uses
brackets [].
3. Matlab arrays are stored in column-major order; NumPy arrays are
row-major by default.
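The three differences can be seen in a few lines of NumPy (a sketch assuming numpy is installed; the MATLAB equivalents are given in comments):

```python
import numpy as np

# MATLAB:  A = [1 2 3; 4 5 6];
A = np.array([[1, 2, 3],
              [4, 5, 6]])

# 1. and 2. MATLAB's A(1, 2) (1-based, parentheses) is A[0, 1] here
#    (0-based, square brackets); both give 2.
elem = A[0, 1]

# 3. MATLAB's A(:) lists elements column by column (column-major order).
#    NumPy can reproduce that with order="F" (Fortran order) ...
matlab_order = A.flatten(order="F")   # [1, 4, 2, 5, 3, 6]
# ... while its default is row-major (C order).
numpy_order = A.flatten()             # [1, 2, 3, 4, 5, 6]
```

The order difference matters mostly when reshaping or when reading raw data written by Matlab; passing order="F" to reshape() or flatten() reproduces Matlab's layout.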

Post Python code here or provide a link to code posted elsewhere for
additional help.

Good luck.

-- 
Jim Gibson


Re: A Sort Optimization Technique: decorate-sort-dedecorate

2006-08-28 Thread Jim Gibson
In article <[EMAIL PROTECTED]>, Marc 'BlackJack'
Rintsch <[EMAIL PROTECTED]> wrote:

> In <[EMAIL PROTECTED]>, Tom Cole wrote:
> 
> > In Java, classes can implement the Comparable interface. This interface
> > contains only one method, a compareTo(Object o) method, and it is
> > defined to return a value < 0 if the Object is considered less than the
> > one being passed as an argument, it returns a value > 0 if considered
> > greater than, and 0 if they are considered equal.
> > 
> > The object implementing this interface can use any of the variables
> > available to it (AKA address, zip code, longitude, latitude, first
> > name, whatever) to return this -1, 0 or 1. This is slightly different
> > than what you mention as we don't have to "decorate" the object. These
> > are all variables that already exist in the Object, and in fact make it
> > what it is. So, of course, there is no need to un-decorate at the end.
> 
> Python has such a mechanism too: the special `__cmp__()` method
> has basically the same signature.  The problem the decorate-sort-undecorate
> pattern solves is that these object-specific compare operations
> use only *one* criterion.

I can't believe I am getting drawn into a thread started by xahlee, but
here goes anyway:

The problem addressed by what is known in Perl as the 'Schwartzian
Transform' is that the compare operation can be an expensive one,
regardless of whether the comparison uses multiple keys. Since a
comparison sort executes the compare operation O(N log N) times,
it is more efficient to pre-compute a set of keys, one for each object
to be sorted. That need be done only N times. The sort can then use
these pre-computed keys to sort the objects. See, for example:

http://en.wikipedia.org/wiki/Schwartzian_transform
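The same idea in Python, as a sketch (dsu_sort is a hypothetical name): each key is computed once up front, the sort compares only the precomputed keys, and the keys are stripped off afterward. The original index is included in each tuple so that ties never fall through to comparing the objects themselves.

```python
def dsu_sort(items, key):
    """Decorate-sort-undecorate: call `key` once per item, not once per comparison."""
    # Decorate: pair each precomputed key with the item's original position.
    decorated = [(key(item), i, item) for i, item in enumerate(items)]
    # Sort: only the (key, index) prefixes are ever compared, since indices are unique.
    decorated.sort()
    # Undecorate: strip the keys back off.
    return [item for _, _, item in decorated]
```

Python's built-in sorted(items, key=...) does this internally: the key function is called exactly once per item, and the sort is stable.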

-- 
Jim Gibson



Re: Fortran vs Python - Newbie Question

2007-03-27 Thread Alex Gibson

"Beliavsky" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> On Mar 26, 10:16 am, [EMAIL PROTECTED] (Cameron Laird) wrote:
>> In article 
>> <[EMAIL PROTECTED]>,[EMAIL PROTECTED] 
>> <[EMAIL PROTECTED]> wrote:
>
>> >Is there a mac version??
>> >Thanks
>> >Chris
>>
>> Yes.
>>
>> Several, in fact--all available at no charge.  The Python
>> world is different from what experience with Fortran might
>> lead you to expect.
>
> Your experience with Fortran is dated -- see below.
>
>>
>> I'll be more clear:  Fortran itself is a distinguished
>> language with many meritorious implementations.  It can be
>> costly, though, finding the implementation you want/need
>> for any specific environment.
>
> Gfortran, which supports Fortran 95 and a little of Fortran 2003, is
> part of GCC and is thus widely available. Binaries for g95, also based
> on GCC, are available for more than a dozen platforms, including
> Windows, Mac OS X, and Linux. I use both and consider only g95 mature,
> but gfortran does produce faster programs. Intel's Fortran compilers
> cost about $500 on Windows and Mac OS and $700 on Linux. It's not
> free, but I would not call it costly for professional developers.
>
> Speaking of money, gfortran and g95 have free manuals, the latter
> available in six languages
> http://ftp.g95.org/ . Final drafts of Fortran standards, identical to
> the official ISO standards, are freely available. The manual for Numpy
> costs $40 per copy.

Sun also provides its Sun Studio IDE and compilers (C, C++, and Fortran)
free of charge on x86 Linux; you just have to register, which is also free.
http://developers.sun.com/sunstudio/downloads/

Alex 




Re: Mathematica 7 compares to other languages

2008-12-11 Thread Jim Gibson
In article
<5ebe5a7d-cbdf-4d66-a816-a7d2a0a27...@40g2000prx.googlegroups.com>, Xah
Lee  wrote:

> On Dec 10, 2:47 pm, John W Kennedy  wrote:
> > Xah Lee wrote:
> > > In lisp, python, perl, etc, you'll have 10 or so lines. In C or Java,
> > > you'll have 50 or hundreds lines.
> >
> > C:
> >
> > #include <math.h>
> >
> > void normal(int dim, float* x, float* a) {
> >     float sum = 0.0f;
> >     int i;
> >     float divisor;
> >     for (i = 0; i < dim; ++i) sum += x[i] * x[i];
> >     divisor = sqrt(sum);
> >     for (i = 0; i < dim; ++i) a[i] = x[i]/divisor;
> >
> > }
> >
> > Java:
> >
> > static float[] normal(final float[] x) {
> >     float sum = 0.0f;
> >     for (int i = 0; i < x.length; ++i) sum += x[i] * x[i];
> >     final float divisor = (float) Math.sqrt(sum);
> >     float[] a = new float[x.length];
> >     for (int i = 0; i < x.length; ++i) a[i] = x[i]/divisor;
> >     return a;
> >
> > }
> 
> Thanks to various replies.
> 
> I've now gathered code solutions in ruby, python, C, Java, here:
> 
> A Example of Mathematica's Expressiveness
>   http://xahlee.org/UnixResource_dir/writ/Mathematica_expressiveness.html
> 
> now lacking is perl, elisp, which i can do well in a condensed way.
> It'd be interesting also to have javascript... and perhaps erlang,
> OCaml/F#, Haskell too.

Perl:

sub normal
{
  my $sum = 0;
  $sum += $_ ** 2 for @_;
  my $length = sqrt($sum);
  return map { $_/$length } @_;
}
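For comparison, a Python version of the same normalization (a sketch; like the Perl sub, it takes the vector's components as arguments and does not guard against a zero-length vector, which would raise ZeroDivisionError):

```python
import math


def normal(*xs):
    """Return the components xs scaled to unit Euclidean length."""
    length = math.sqrt(sum(x * x for x in xs))
    return [x / length for x in xs]
```

For example, normal(3, 4) gives [0.6, 0.8].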

-- 
Jim Gibson


determine file type

2006-03-26 Thread Mark Gibson
Is there an equivalent to the unix 'file' command?

[mark tmp]$ file min.txt
min.txt: ASCII text
[mark tmp]$ file trunk
trunk: directory
[mark tmp]$ file compliance.tgz
compliance.tgz: gzip compressed data, from Unix

What I really want to do is determine whether a file is 1) a directory,
2) a text file, or 3) a binary file.

Is there a way to do this?

Mark


Re: determine file type

2006-03-26 Thread Mark Gibson

> 
> 
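> import os
> 
> def test_file(filename, maxread=1024):
>     if os.path.isdir(filename):
>         return 'directory'
>     # Read raw bytes so binary data cannot raise a decoding error.
>     with open(filename, 'rb') as afile:
>         chunk = afile.read(maxread)
>     # Heuristic: any byte outside 7-bit ASCII means "binary".
>     if any(byte > 127 for byte in chunk):
>         return 'binary'
>     return 'text'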
> 
> 

Perfect, thanks!
