Mark's message about LIA was very nice to see, but I want to reply and second Eric's comments about the Lucene distributable. I just downloaded and unpacked the 2.0 .zip to test out the experience of someone new to Lucene but Java-savvy (our target audience).

I opened docs/index.html and the first thing I noticed was a broken image link to the ASF logo in the upper left corner.

Eric has some great points that I'll reply to below...

On Dec 26, 2006, at 4:36 PM, Haszlakiewicz, Eric wrote:
I'm sorry you are not finding what you need. The snowball analyzers
come in a separate jar, in the release zip, under the
contrib/snowball directory. You may also want/need the analyzers in
contrib/analyzers for other languages. The README delivered w/ the release

uh.. maybe I'm being dense, but where exactly would I find this directory?

In the unpacked lucene-2.0.0.zip file, there is a contrib directory with lots of goodies hidden in plain sight. You're certainly right that there is very little documentation available for this stuff, even in javadocs. We should leverage the Java Lucene wiki (which needs to be moved from the jakarta-lucene URL structure) a lot more, a la Solr, to let the community contribute much more freely to the documentation on all these pieces.

It seems that all of the mirrors I look at don't have it, nor even does the
main(?) url  (i.e. http://www.apache.org/dist/lucene/java/)

        <http://www.apache.org/dyn/closer.cgi/lucene/java/>

which is "free download" from here <http://lucene.apache.org/java>

Have you gone through the demo and the "Getting Started" section:
http://lucene.apache.org/java/docs/gettingstarted.html ?

yeah, I did, but all I found there was some info about a demo application,
a link to the aforementioned download directory, and some links to the
online sources through svn.

Sadly, our demo application is pretty pathetic. A better index/search demo could fairly easily be whipped up that actually was real-world usable without even writing code for basic file system document searching. The demo is barely usable in this capacity. Again, we should look to Solr to shed some light (heh) on this path.

By real-world usable I mean things like command-line switches to control the indexer's parameters a lot more, customizable output from the command-line searcher (which fields, CSV/XML output), and a web application that provides some machine-usable response (it doesn't have to be nearly as fancy as Solr here; even *gasp* just XML would do the trick).

There are a number of articles, presentations and books available,
many of which are listed at http://wiki.apache.org/jakarta-lucene/Resources

I'll take a closer look. I just figured the best documentation would be on
the actual lucene site.

Point well taken, Eric.  I concur.

I was hoping to find an online overview of how things are supposed to work, i.e. something that explains what the important classes are, how to use
them, etc..  Also a definition of vocabulary would be nice.
Here's just a brief selection of questions that I had (have):
What is javacc? Why do I care? What is snowball? What is a stemmer, and
how/why would I use one?  What's a term vector?  What happens when you
add the same field to a document twice? How do I combine two queries?
(I figured some of these out, don't answer them now)

All very poignant points. We do have javadocs, which is ultimately where the low-level API stuff should go, and with decent summary pages we can guide users to the important classes. We have good stuff at the core Lucene API, but now that we are blending the contrib pieces in (as we'll see below) we've lowered our overall published documentation quality. We need to hold the contrib pieces to at least core API documentation standards for acceptance into the codebase.

A definition of vocabulary fits perfectly on the wiki.

javacc should be fairly well hidden in the documentation as it is Lucene developer related, not end-user related (at least not for an initial user of Lucene, only after getting familiar with QueryParser should one really be venturing into javacc land).

Snowball would fit well into a glossary wiki area, as would term vector - with of course links to the appropriate API documentation.

Adding the same field multiple times ought to be on our FAQ wiki page (I just looked, it's not, and "searching" comes before "indexing" in the FAQ sections, awkwardly). Those FAQ URLs, *ugh*, we really need an overhaul of that area structurally.

 I was able to pick out some info from the faq, but a lot of
that seems to assume you already know what you're doing.

Mea culpa.

  I ended up
doing a lot of trial and error to get things going.

As we all did. And once most of us have it figured out it seems so easy and obvious that going back and writing documentation has little appeal. Your bringing these issues up to the list is an atypical and welcome step. It brings our weaknesses to the forefront and is likely to spark positive changes.

For instance, it took me forever just to figure out how to combine a couple of queries together. The apparently appropriately named AndQuery class isn't what it seems, and the javadocs don't say anything that would point me
towards the correct class (which seems to be BooleanQuery)

Now that _is_ confusing. Ouch. This comes from blending in the contrib javadocs, and sure enough, AndQuery is exactly what I would have expected to use myself. That change to the API docs, having the contrib blended in, is new to 2.0, and our contrib pieces are not as well javadoc'd as the core. I can see having the contrib stuff javadoc'd separately, but it is also nice to see it all blended. I'd love to see others' thoughts on how to make this a better Newcene experience.

AndQuery is part of the surround query language, which has likely not gotten much usage in the field; only niche environments would use it, I think. Having it named that way, and near the top of the API list, is way misleading.

I know much of what I just said is basically just complaining about the level of documentation, and I'd be happy to help, but I'm still feeling
a bit overwhelmed with the amount of implied knowledge that seems
to be necessary, so picking out specific places is a bit difficult.

I suppose the most useful thing would be a better getting started guide that actually explains how things work, rather than just saying "look at
this app".



We will learn from your experience, thanks to your forwardness as well as the specific details of where things are lacking. There are some easy steps we can take to get things improved for our next release:

* Leverage the wiki lots more for a glossary, quick start user guides, and FAQs (revamping the wiki structure and renaming the top-level URL would go a long way toward encouraging its use, as would learning from Solr's great lead).

* Tighten up our API docs, specifically on the contrib pieces.

* Tidy up and generalize the demo application, ship Luke too (if possible, licensing-wise).

        Erik


