On Wed, Dec 19, 2001 at 05:46:15PM -0500, McCollum, Frank wrote:
> I do not understand what is meant by 'depth' in this module (i've read the
> accompanying documentation, but I didn't follow it well).  Does anyone know
> where a good description is?  

The HTML::Element documentation has an introduction to tree data structures. 
_Mastering Algorithms with Perl_ has some good information on data
structures.

It helps if you understand how HTML::TreeBuilder deals with HTML: it creates
a tree data structure out of the elements.  So, for example, the HTML code:

    <HTML>
        <HEAD>
            <TITLE>test</TITLE>
        </HEAD>

        <BODY>
            <H1>test</H1>
        </BODY>
    </HTML>

is, conceptually, turned into a tree something like this:

    TITLE            H1
       \            /
        HEAD    BODY
           \    /
            HTML

As you should be able to glean from my crude ASCII drawing, elements that
are nested within other elements are further up the tree.  For example, the
H1 is nested within the BODY tag, which is nested within the HTML tag.
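For anyone who wants to see the idea in code, here is a rough Python analogue of what HTML::TreeBuilder does with that page. It is not HTML::TreeBuilder. It uses Python's standard html.parser module, the Node and SimpleTreeBuilder names are made up for illustration, and it assumes every element is explicitly closed (no implied tags, no void elements like <BR>). Note that html.parser lowercases tag names.

```python
from html.parser import HTMLParser


class Node:
    """One element in the tree: a tag name, its parent, and its children."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class SimpleTreeBuilder(HTMLParser):
    """Builds a Node tree from well-formed HTML (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.current = None  # the element we are currently inside

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        if self.current is None:
            self.root = node  # first element becomes the root
        else:
            self.current.children.append(node)
        self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # The element is closed, so climb back up to its parent.
        if self.current is not None:
            self.current = self.current.parent


page = """
<HTML>
    <HEAD><TITLE>test</TITLE></HEAD>
    <BODY><H1>test</H1></BODY>
</HTML>
"""

builder = SimpleTreeBuilder()
builder.feed(page)
root = builder.root
print(root.tag, [child.tag for child in root.children])  # html ['head', 'body']
```

Each element's parent is whatever element was still open when it started, which is exactly the "nested within" relationship in the drawing.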

Now, I drew the tree that way because that's how people expect trees to be
structured: branches going up.  Outside of introductory texts, tree data
structures are rarely drawn that way; they are almost always drawn
upside-down, with the root at the top.

            HTML
           /    \
        HEAD    BODY
       /           \
    TITLE          H1

Think of it as the root system of a tree.  Now, given all that, depth is
easy: the depth of an element is its distance from the top, i.e. the number
of steps down from the root (HTML) it takes to reach it.

    depth 0         HTML
                   /    \
    depth 1     HEAD    BODY
               /           \
    depth 2   TITLE        H1
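Put another way, an element's depth is the number of ancestors it has.  A minimal Python sketch, assuming each element keeps a link to its parent (the Element class here is hypothetical, not HTML::Element):

```python
class Element:
    """Minimal tree node with a parent link (illustrative only)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent

    def depth(self):
        # Walk up to the root, counting one step per ancestor.
        steps = 0
        node = self.parent
        while node is not None:
            steps += 1
            node = node.parent
        return steps


html = Element("HTML")
head = Element("HEAD", html)
body = Element("BODY", html)
title = Element("TITLE", head)
h1 = Element("H1", body)

for element in (html, head, body, title, h1):
    print(element.depth(), element.tag)
```

This prints 0 for HTML, 1 for HEAD and BODY, and 2 for TITLE and H1, matching the diagram.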



> I basically want to go to a website and figure out what the 'depth' is of
> a given table on that site, so that I can grab this table for later use.

That's fairly simple.  Just count how deeply nested the table is.  For
example:

    <HTML>
        <BODY>
            <TABLE><TR><TD></TD></TR></TABLE>
        </BODY>
    </HTML>

Assuming a starting depth of 0, the HTML tag is at depth 0, the BODY at 1,
and thus the TABLE at 2.
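The same count can be taken while parsing, without building a tree at all: keep a stack of the elements that are currently open, and when a TABLE starts, its depth is the size of that stack.  A Python sketch using the standard html.parser module (TableDepthFinder is a made-up name).  It assumes well-formed HTML with every element closed, and unlike HTML::TreeBuilder it does not add implied HTML/HEAD/BODY elements.

```python
from html.parser import HTMLParser


class TableDepthFinder(HTMLParser):
    """Record the depth of every <table> start tag, counting <html> as 0.

    Assumes every element is explicitly closed; void elements such as <br>
    would need special handling in real-world HTML.
    """

    def __init__(self):
        super().__init__()
        self.open_tags = []     # stack of elements we are currently inside
        self.table_depths = []  # depth of each table, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Depth = number of enclosing elements that are still open.
            self.table_depths.append(len(self.open_tags))
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open element.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()


page = """
<HTML>
    <BODY>
        <TABLE><TR><TD></TD></TR></TABLE>
    </BODY>
</HTML>
"""

finder = TableDepthFinder()
finder.feed(page)
print(finder.table_depths)  # [2]
```

Because only open (enclosing) elements are on the stack, tags that merely appear earlier on the page and have already closed don't affect the count.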

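Just make sure to count only the elements the table is nested within (its
ancestors), P tags included; elements that merely come before it on the page
don't count.  You should, naturally, test with HTML::TreeBuilder to make sure
you have the right element, since it supplies elements the source may omit,
such as HTML, HEAD and BODY.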

Also, if you're only dealing with HTML table data, you may want to look at
HTML::TableExtract.  It hides a lot of the gory details for you.


Hopefully I haven't butchered the subject to the point where you can't glean
some information from it.  You may still want to take a look at the sources
of information I mentioned.


Michael
--
Administrator                      www.shoebox.net
Programmer, System Administrator   www.gallanttech.com
--
