On Wed, 11 Jan 2017 14:46:29 -0800 (PST)
Nako Zeta <nakotoff...@gmail.com> wrote:

> Is there any package to read http directories?
> 
> For example to read an Apache index of /files/
> 
> I've been doing this using goquery and parsing the HTML to tell whether it's a
> folder or a file, but I am looking for better alternatives

Do you control that Apache instance?

I mean, the path "/files/" in a URL served by a server is just the
location of a resource, and it can be served in several different ways
depending on multiple conditions -- for instance, whatever a client
sent in the Accept header of its HTTP request.

What your client receives now when requesting the resource by that URL
is an HTML document generated by Apache; such documents do not follow
any standard structure and differ between server implementations.
In other words, they are meant for humans to read, rendered in their
browsers.

What I'm getting at is that if you control the server, you can
attach your own handler to serve that resource programmatically
and then teach that handler to recognize, say, "application/json" in the
client's Accept header and generate a JSON document in response, which
is trivial to parse programmatically.
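
To make the idea concrete, here is a minimal sketch of such a handler in
Go rather than in Apache. Everything specific in it is an assumption made
for illustration: the Entry type and its JSON field names, the ./files
directory, the localhost:8080 address, and the simplified Accept parsing
(it ignores q-values and wildcards). It only answers with JSON for the
top-level "/files/" path and falls back to the ordinary file server for
everything else.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"os"
	"strings"
)

// Entry is a hypothetical JSON shape for one directory item.
type Entry struct {
	Name  string `json:"name"`
	IsDir bool   `json:"is_dir"`
	Size  int64  `json:"size"`
}

// wantsJSON reports whether an Accept header value lists application/json.
// Simplification: q-values and wildcards such as */* are ignored.
func wantsJSON(accept string) bool {
	for _, part := range strings.Split(accept, ",") {
		mediaType, _, _ := strings.Cut(part, ";")
		if strings.EqualFold(strings.TrimSpace(mediaType), "application/json") {
			return true
		}
	}
	return false
}

// listDir reads dir and describes each item in it.
func listDir(dir string) ([]Entry, error) {
	items, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	entries := make([]Entry, 0, len(items))
	for _, item := range items {
		info, err := item.Info()
		if err != nil {
			return nil, err
		}
		entries = append(entries, Entry{
			Name:  item.Name(),
			IsDir: item.IsDir(),
			Size:  info.Size(),
		})
	}
	return entries, nil
}

// filesHandler serves root under /files/. A request for /files/ itself that
// accepts application/json gets a JSON listing; anything else is passed to
// the standard file server, which produces the usual HTML index.
func filesHandler(root string) http.Handler {
	fallback := http.StripPrefix("/files/", http.FileServer(http.Dir(root)))
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/files/" || !wantsJSON(r.Header.Get("Accept")) {
			fallback.ServeHTTP(w, r)
			return
		}
		entries, err := listDir(root)
		if err != nil {
			http.Error(w, "cannot read directory", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Header().Set("Vary", "Accept") // the response depends on Accept
		if err := json.NewEncoder(w).Encode(entries); err != nil {
			log.Printf("encoding listing: %v", err)
		}
	})
}

func main() {
	http.Handle("/files/", filesHandler("./files"))
	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
```

A client would then set "Accept: application/json" on its request and
decode the body with encoding/json into a []Entry, instead of parsing
HTML.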

With Apache, it may even be possible to use a combination of
mod_mime and mod_negotiation settings so that Apache serves the
directory index by itself unless the client explicitly asks for
something more interesting, such as that index returned as JSON, in which
case your custom handler would be called.

If you do not control the server, I'm afraid the situation is no
different from any common "web scraping" task.  Packages such as [1]
can help with that, or you can use [2] directly.

1. https://godoc.org/github.com/PuerkitoBio/goquery
2. http://golang.org/x/net/html
