Re: A Bit of a Strange Situation

RiverWind Fri, 26 Aug 2011 05:47:42 -0700


Hey There,


I wish to sincerely thank everyone who has responded to my queries
concerning the Linux Cookbook. Very few if any of your instructions
and/or descriptions have interfered with my screen-reader. I would
have disclosed initially the fact that I was a speech user, but I
honestly thought that everyone in the group was blind. This is
chiefly because I tend to subscribe to blind user groups for
technical issues. I have been disabused of the misguided notion
that all sighted people use Windows and to hell with everything
else. Many blind computer users turn to Linux in order to escape
the necessity of having to use MS Windows.

If I am not mistaken, in order to run any script, it must be saved
in a certain format. How is that done?

cheerio,
Riv

Feel free to visit my website and my blog and learn more about me
and what I stand for.
My Website @ http://riverwind.shellworld.net
My Blog http://windraven13.livejournal.com/

On Thu, 25 Aug 2011, Bob Proulx wrote:

RiverWind wrote:

You see, the files have a bit of an unconventional extension, to
wit "cookbook3.html#SEC1 or cookbook14.html#SEC2" and so on. You
see, the first number before the ".html" I believe designates the
part, and the number following the "#SEC" indicates the different
sections in the respective parts of the book.


I see your problem now.  You are being confused by the types of links
used in the document.  This is simply a misunderstanding.  I think I
can clear this up for you.  When I look at this next URL:

 http://www.dsl.org/cookbook/cookbook_toc.html

I see a table of contents in 45 parts.  But each link has an anchor
name attached to it so that it jumps into the middle of the part.
Each link jumps to a sub-section of the chapter.  This makes it
convenient for readers to jump to a sub-part of the document.  But
you should ignore those.  They are not separate files.  They are
anchor tags in the middle of the section.

Let me dive into a little detail of the anchors.  But do keep reading
because after this I will show you how to solve your problem.

Let me repeat the html of the very first link on the page.  This might
confuse your screen reader and if so give me a hint on how I should
represent verbatim html text and I will be happy to do so.

 <A NAME="TOC1" HREF="cookbook_1.html#SEC1">Preface</A>

That generates a link to cookbook_1.html#SEC1 as you already know.
But that "#SEC1" part is simply the name of an anchor, used to jump
into the middle of a page.  It is not part of the filename.

Here let me repeat the html of the part it jumps to:

 <H1><A NAME="SEC1" HREF="cookbook_toc.html#TOC1">Preface</A></H1>

Each sub-section is referenced in this way.
You can read and learn more about these here at this URL to the World
Wide Web Consortium reference documentation page.  It itself uses an
id anchor to jump to the particular part of the document that
references these.

 http://www.w3.org/TR/html4/struct/links.html#h-12.2.3
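
As a small illustration of the point (a sketch only; the variable
names link, file and anchor are mine and are not part of the
Cookbook), shell parameter expansion can split such a link at the
hash mark.  It shows that the file on the server is just
cookbook_1.html and that SEC1 is only a position inside it:

```shell
# A link as it appears in the table of contents.
link="cookbook_1.html#SEC1"

# Remove the longest suffix that starts at the hash mark: the filename.
file="${link%%#*}"

# Remove the shortest prefix that ends at the hash mark: the anchor name.
anchor="${link#*#}"

echo "file: $file"
echo "anchor: $anchor"
```

Only the file part is ever requested from the web server.  The
anchor part is used by the browser after the page has arrived.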

This would tend to make the use of wild cards a bit ticklish.


Actually, no.  Even if those were the filenames you could simply match
them with a wildcard with no problem.  But let's not talk about that
for a moment since it isn't important.  Let's help get you going in
the direction of solving your actual problem and not the side tracking
problem.

If I could just figure around this problem however, I would be in
business, because html2txt conversions would be easy, and the
concatenation even easier.


There are 45 links on the page.  They are named and numbered very
regularly.  You can simply write a for-loop to walk over all 45 of
them.  Here is a three line shell script snippet that will do this
for you.

 for chapternum in $(seq 1 45); do
   wget http://www.dsl.org/cookbook/cookbook_$chapternum.html
 done

Let me describe it with some verbosity hoping that it will make it
easier for your reader.  The 'seq' command generates a sequence of
numbers.  Here I am calling "seq 1 45" to generate the numbers from 1
through 45 inclusive.  That runs inside a dollar-parenthesis
command substitution to place those 45 numbers on the command line for
the for-loop to iterate over.  Then the for-loop walks through each in
turn setting the variable named "chapternum" to the current index
value.  Then the wget command uses that dollar chapter num variable to
create the URL to pull each chapter in turn.  The "#SEC" parts are not
really in the filename nor should they be in the filename.

Running that three line shell script snippet should produce 45 files
called cookbook_1.html through cookbook_45.html in the current
directory.  I think at that point you should be okay to convert each
in turn to plain text.
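
Once each chapter has been converted, the pieces can be joined in
order.  Here is a rough sketch, assuming the converted files are
named cookbook_1.txt through cookbook_45.txt.  That naming, the
join_chapters function and the book.txt output name are my own
assumptions, not something from the Cookbook site.  It loops with
seq rather than using a wildcard, because a wildcard sorts names as
text and would put chapter 10 ahead of chapter 2.

```shell
# Join converted chapters in numeric order.
# First argument: the last chapter number.
# Second argument: the name of the output file.
join_chapters() {
  # Start with an empty output file.
  : > "$2"
  for chapternum in $(seq 1 "$1"); do
    # Skip any chapter that has not been converted yet.
    if [ -f "cookbook_$chapternum.txt" ]; then
      cat "cookbook_$chapternum.txt" >> "$2"
    fi
  done
}
```

To use it, run "join_chapters 45 book.txt" in the directory that
holds the converted files.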

Hope that helps,
Bob

