Re: best strategy to deal with large index file

2005-12-16 Thread Dan Funk
Are there specific queries that cause the out-of-memory problem? Or will any query do it? How large is the index? MultiSearcher allows you to search over multiple indexes, and is well supported throughout the API. How you split your indexes depends on what you want to achieve. There are many

Re: all stop words in exact phrase get 0 hits

2005-12-15 Thread Dan Funk
The latest binary "stable" release is 1.4.3. Though not officially released, Lucene 1.9 is available from the source code repository and, IMHO, is more than ready for day-to-day use. You will need to check the code out with Subversion or CVS via the Apache code repository and build it yourself.

Re: all stop words in exact phrase get 0 hits

2005-12-15 Thread Dan Funk
That is certainly the behaviour I would expect. The "+" means the term or phrase is required - you are requiring words that are not stored in your index. Why not remove the "+"? Alternatively, you could run the search and, if no matches are found, run it again without the second argument. I've fo
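The retry idea in the reply above can be sketched roughly as follows. This is plain Java and not a Lucene API: `search` is a hypothetical stand-in for whatever runs a query string against your index and returns matching document ids, and both query strings are supplied by the caller.

```java
import java.util.List;
import java.util.function.Function;

class FallbackSearch {
    // Hypothetical helper: `search` stands in for the real index lookup.
    static List<String> searchWithFallback(Function<String, List<String>> search,
                                           String strictQuery,
                                           String relaxedQuery) {
        List<String> hits = search.apply(strictQuery);
        // Relax the query only when the strict form matched nothing.
        return hits.isEmpty() ? search.apply(relaxedQuery) : hits;
    }

    public static void main(String[] args) {
        // Fake search for illustration: any query with a required ("+") clause finds nothing.
        Function<String, List<String>> fake =
                q -> q.contains("+") ? List.<String>of() : List.of("doc1");
        System.out.println(searchWithFallback(fake, "+\"to be\" hamlet", "hamlet"));
    }
}
```

The cost is a second search on every miss, which is usually acceptable when misses are rare.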

Re: where to store the index

2005-12-11 Thread Dan Funk
'll be around 5.000 database records with three indexed fields: > id, title (1 line) and description (around three lines). I was even > considering using the in memory feature for faster access but I'm new to > lucene and I don't know if that I'll cause my problems in the f

Re: where to store the index

2005-12-10 Thread Dan Funk
If this is a small index and it won't change after install (you are just using it to search, not to index), place it in a sub-directory of WEB-INF. If it is a larger index (something you don't want to copy frequently), or it will change after install, then you shouldn't keep it inside your web app

exporting and importing Lucene documents

2005-12-09 Thread Dan Funk
We build indexes, then share those indexes (along with files and database records) with our client installations. We now have multiple clients, and they are beginning to say things like, "I'd like this group of documents here, and this little bit over here, and ah yea that document there too

Broken Link to WordNet on Sandbox

2005-12-06 Thread Dan Funk
In the sandbox at http://lucene.apache.org/java/docs/lucene-sandbox/ there is a link to the WordNet repository: http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/WordNet It should be: http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/wordnet where "wordnet" is not capitalized. J

Re: I Need System Design Suggestion. Please.

2005-11-07 Thread Dan Funk
I've run through exactly the same train of thought. PHP is an efficient and effective web development language - Java provides excellent libraries for developing a powerful business logic layer. Wouldn't it be nice to couple the two together? The answer is no, it would suck. You end up with some clus

Re: Folksonomies

2005-10-04 Thread Dan Funk
Thanks Erik, seems Otis keeps a very nice blog about Simpy (http://blog.simpy.com/blojsom/blog/) that's full of helpful advice on the topic. On 10/4/05, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > > On Oct 4, 2005, at 11:52 AM, mark harwood wrote: > > >> Is anyone out there incorporating folksonom

Folksonomies

2005-10-04 Thread Dan Funk
I've been reading about Folksonomies (http://en.wikipedia.org/wiki/Folksonomy), and I would like to incorporate them into a project I'm developing with Lucene. The concept is pretty simple: a targeted community of users add labels of their choosing (just off the top of their head, not from a list

Re: Displaying search context

2005-09-23 Thread Dan Funk
aka more simplistic) and therefore probably more scalable? This is probably a question for the Nutch user list, but why doesn't Nutch use the Lucene Summarizer? Thoughts, comments? - Jeff -Original Message- From: Dan Funk [mailto:[EMAIL PROTECTED] Sent: Friday, September 23, 2005

Re: Displaying search context

2005-09-23 Thread Dan Funk
What you are doing is a good, scalable practice. You need to store those email messages somewhere outside of Lucene, and use a unique id to correlate the two. When you want to display relevant text for a search result, find the file on disk, and pass it through the Lucene Highlighter (see th

Re: Blackberry

2005-09-14 Thread Dan Funk
Yep, runs great on the Zaurus, and we got Lucene running on an iPAQ 3970 as well (we used the Creme JVM). Not sure what you would need to do for the Blackberry; PDAs are so different, but I'd love to hear if you get it working. christopher may wrote: Well it is being run on the Sharp Zaurus

Re: Index files in jar

2005-08-29 Thread Dan Funk
r any help. -Tom -- Dan Funk Software Engineer Information Technology Solutions Battelle Charlottesville Operations 1000 Research Park Boulevard, Suite 105 Charlottesville, Virginia 22911 434.984.0951 x244 434.984.0947 (fax) [EMAIL PROT

Re: Hierarchical Documents

2005-08-23 Thread Dan Funk
People indexing XML documents tend to deal with the same kind of problem; there is an excellent article at the URL below showing how they handled some fairly complex hierarchical queries. http://www.idealliance.org/papers/xmle02/dx_xmle02/papers/03-02-08/03-02-08.html Rohit Lodha wrote: Hi A

to filter or not to filter

2005-08-17 Thread Dan Funk
Currently I'm working with a single index where content is indexed by its original printed page. I have to show the total number of matching documents, so I end up running through all the hits and taking an order of magnitude hit on performance as I calculate the number of unique documents. I
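For readers unfamiliar with the setup, the counting step described above amounts to something like the sketch below. It is plain Java and not Lucene code; it assumes each page-level hit carries the id of the document it came from, and the list of ids is a hypothetical input.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UniqueDocCount {
    // Each element is the (assumed) parent-document id stored with one page hit.
    static int countUniqueDocuments(List<String> docIdPerPageHit) {
        // A set keeps one entry per distinct document id.
        Set<String> seen = new HashSet<>(docIdPerPageHit);
        return seen.size();
    }

    public static void main(String[] args) {
        // Three page hits, two of them from the same document.
        List<String> pageHits = List.of("report-A", "report-A", "report-B");
        System.out.println(countUniqueDocuments(pageHits));
    }
}
```

The set operation itself is cheap; the expense the post describes comes from having to visit every hit to read its document id.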

Distributable CD

2005-08-04 Thread Dan Funk
We deliver HTML web sites to our clients on a CD. It often remains on that CD, and they pass the CD around, and use it when they need to do research on some topic. We would like to offer them the ability to search the contents of the CD. We cannot install any software on their Windows mach

Re: Indexing Forums (Document & Field Paradigm)

2005-06-27 Thread Dan Funk
You could have a parentId field in each document - which will give you a nice hierarchy. You could also create a topicId (Linux, Microsoft, etc...) and a storyId. At that point you can quickly identify the topic and story for the message - and you can also search within a specific thread (AND
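A rough sketch of the field layout suggested above, in plain Java rather than Lucene documents. The field names parentId, topicId and storyId come from the post; the class, the `id` field, and the helper method are hypothetical and only illustrate how a parentId gives you the reply hierarchy.

```java
import java.util.ArrayList;
import java.util.List;

class ForumMessage {
    final String id;
    final String parentId; // null for the first message in a thread
    final String topicId;  // e.g. "linux"
    final String storyId;

    ForumMessage(String id, String parentId, String topicId, String storyId) {
        this.id = id;
        this.parentId = parentId;
        this.topicId = topicId;
        this.storyId = storyId;
    }

    // Direct replies to the message with the given id.
    static List<ForumMessage> repliesTo(String id, List<ForumMessage> all) {
        List<ForumMessage> replies = new ArrayList<>();
        for (ForumMessage m : all) {
            if (id.equals(m.parentId)) {
                replies.add(m);
            }
        }
        return replies;
    }

    public static void main(String[] args) {
        List<ForumMessage> all = List.of(
                new ForumMessage("1", null, "linux", "s1"),
                new ForumMessage("2", "1", "linux", "s1"),
                new ForumMessage("3", "1", "linux", "s1"),
                new ForumMessage("4", "2", "linux", "s1"));
        System.out.println(repliesTo("1", all).size());
    }
}
```

In an index, the same lookup would be a term query on the parentId field, combined with a topicId or storyId clause when you want to stay inside one topic or story.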

Re: IndexSearcher

2005-06-24 Thread Dan Funk
Lucene uses a lock file to prevent simultaneous writes to the index. You can just delete the file at C:\DOCUME~1\tom\LOCALS~1\Temp\Lucene-81022e186820264e5b78801c219b8e8b-commit.lock and be on your way. avrootshell wrote: Hi, I'm using lucene for full text search. It worked gr8. But now

Re: Mobile Lucene

2005-06-13 Thread Dan Funk
difficult to port - all I had was a web service - and that moved over without a hitch. christopher may wrote: What are you running as far as the OS? And thanks for the response. From: Dan Funk <[EMAIL PROTECTED]> Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache

Re: Mobile Lucene

2005-06-13 Thread Dan Funk
code in the J2me wireless toolkit ? Any help would be appreciated, Thanks - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- Dan Funk Software Engineer Information Technology

Re: Using Highlighter to highlight entire HTML documents?

2005-05-25 Thread Dan Funk
r this, it seems that the term positions could still be useful? Any suggestions would be appreciated. Thanks, Fred Toth - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- Dan Fun

Re: Index Sizes

2005-05-17 Thread Dan Funk
2660 http://www.taluskie.com - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- Dan Funk Software Engineer Information Technology Solutions Battelle Charlottesville Operations 1000 Research Park Boulevar

Re: Alert function (aka "profiled alerting")

2005-03-16 Thread Dan Funk
I don't understand - this is all happening in the background, right? Why not just add the document to the index, then execute all the queries (with an extra clause to restrict results to that document) and see what hits? Robert Watkins wrote: Okay, I only bought your book a few days ago, so I ha
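The approach proposed above (add the document, then run every saved query restricted to that one document) could look roughly like this. It is a toy sketch in plain Java: the map standing in for the index, the single-term "queries", and all names are assumptions for illustration, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AlertSketch {
    // Toy stand-in for an index: document id -> text.
    static final Map<String, String> index = new LinkedHashMap<>();

    // Toy "query": one required term, restricted to a single document id.
    static boolean matches(String term, String restrictToId) {
        String text = index.get(restrictToId);
        return text != null && text.toLowerCase().contains(term.toLowerCase());
    }

    // Add the document, then run every saved query against just that document.
    // savedQueries maps a subscriber name to that subscriber's query term.
    static List<String> addAndFindAlerts(String id, String text,
                                         Map<String, String> savedQueries) {
        index.put(id, text);
        List<String> fired = new ArrayList<>();
        for (Map.Entry<String, String> q : savedQueries.entrySet()) {
            if (matches(q.getValue(), id)) {
                fired.add(q.getKey());
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        Map<String, String> saved = new LinkedHashMap<>();
        saved.put("alice", "lucene");
        saved.put("bob", "hadoop");
        System.out.println(addAndFindAlerts("d1", "Lucene 1.9 notes", saved));
    }
}
```

Restricting each query to the new document keeps the work proportional to the number of saved queries rather than to the size of the index.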