On Wed, Dec 19, 2001 at 05:46:15PM -0500, McCollum, Frank wrote:
> I do not understand what is meant by 'depth' in this module (i've read the
> accompanying documentation, but I didn't follow it well). Does anyone know
> where a good description is?

The HTML::Element documentation has an introduction to tree data
structures. _Mastering Algorithms with Perl_ also has some good
information on data structures.

It helps if you understand how HTML::TreeBuilder deals with HTML: it
creates a tree data structure out of the elements. So, for example, the
HTML code:

    <HTML>
      <HEAD>
        <TITLE>test</TITLE>
      </HEAD>
      <BODY>
        <H1>test</H1>
      </BODY>
    </HTML>

is, conceptually, turned into a tree something like this:

    TITLE        H1
        \        /
        HEAD  BODY
           \  /
           HTML

As you should be able to glean from my crude ASCII drawings, elements
that are nested within other elements are further up the tree. For
example, the H1 is nested within the BODY tag, which is nested within the
HTML tag.

Now, I drew the tree that way because that's how people expect trees to
look: branches going up. Tree data structures are rarely drawn that way
outside introductory texts. They are normally drawn upside-down:

           HTML
           /  \
        BODY  HEAD
        /        \
      H1        TITLE

Think of it as the root system of a tree.

Now, given all that, depth is easy. The depth of an element is its
distance from the top, counted in levels:

    depth 0        HTML
                   /  \
    depth 1     BODY  HEAD
                /        \
    depth 2   H1        TITLE

> I basically want to go to a website and figure out what the 'depth' is of
> a given table on that site, so that I can grab this table for later use.

That's fairly simple. Just count how deeply nested the table is. For
example:

    <HTML>
      <BODY>
        <TABLE><TR><TD></TD></TR></TABLE>
      </BODY>
    </HTML>

Assuming a starting depth of 0, the TABLE tag is at a depth of 2: the
HTML tag is at depth 0, the BODY at 1, and thus the TABLE at 2. Just make
sure to count every element the table is nested within, and only those;
P tags count too. You should, naturally, test with HTML::TreeBuilder to
make sure you have the right element.

Also, if you're only dealing with HTML table data, you may want to look
at HTML::TableExtract. It hides a lot of the gory details for you.

Hopefully I haven't butchered the subject to the point where you can't
glean some information from it.
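
In case it helps to see the counting written out, here is a minimal
sketch in Python rather than Perl. It is not how HTML::TreeBuilder works
internally; it only keeps a stack of the currently open elements and
reports how many are open when a TABLE starts. It assumes well-formed
markup in which every non-void element is explicitly closed, and the
names DepthFinder, table_depths and page are made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DepthFinder(HTMLParser):
    """Record the nesting depth of every start tag with a given name."""

    def __init__(self, target):
        super().__init__()
        self.target = target   # tag name to look for, in lowercase
        self.open_tags = []    # stack of currently open elements
        self.depths = []       # depth of each match, in document order

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so <TABLE> arrives as "table".
        if tag == self.target:
            # Depth = number of elements this tag is nested within.
            self.depths.append(len(self.open_tags))
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Assumption: input is well formed. Pop back to the matching open tag.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def table_depths(html, target="table"):
    """Return the depth of each `target` element, with the root at depth 0."""
    finder = DepthFinder(target)
    finder.feed(html)
    finder.close()
    return finder.depths


# Hypothetical sample page: one table, with a second table nested in a cell.
page = """
<HTML>
<BODY>
<TABLE><TR><TD>
  <TABLE><TR><TD>inner</TD></TR></TABLE>
</TD></TR></TABLE>
</BODY>
</HTML>
"""

print(table_depths(page))  # prints [2, 5]
```

The outer table sits inside HTML and BODY, so its depth is 2. The inner
one also sits inside TABLE, TR and TD, which puts it at 5. A real parser
may add or close elements on its own, so the numbers for a live page can
differ from a hand count.
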
You may still want to take a look at the sources of information I
mentioned.

Michael
--
Administrator                      www.shoebox.net
Programmer, System Administrator   www.gallanttech.com