Re: Lucene's Mean Average Precision

2008-05-15 Thread Dave Kor
e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > -- Regards, Dave Kor

Re: word frequency list?

2006-09-04 Thread Dave Kor
If you scroll down the page, there is a download link to the data files. There's no need to use the search form. On 9/4/06, Dejan Nenov <[EMAIL PROTECTED]> wrote: Unfortunately the term search at the site is down - gives 500 internal server error. -Original Message- Fro

Re: word frequency list?

2006-09-03 Thread Dave Kor
nglish language? Obviously it would differ by corpus but I would like to see what's already available. -- Dave Kor, PhD Candidate

Re: Frequency of phrase

2006-02-23 Thread Dave Kor
Not sure if this is what you want, but what I have done is issue exact phrase queries to Lucene and count the number of hits found. On 2/23/06, Eric Jain <[EMAIL PROTECTED]> wrote: > This is somewhat related to a question sent to this list a while ago: Is > there an efficient way to count the

Re: TREC,INEX and Lucene

2006-02-22 Thread Dave Kor
-- Dave Kor, Research Assistant, Center for Information Mining and Extraction, School of Computing, National University of Singapore.

Re: Relevance Feedback Lucene+Algorithms

2006-02-15 Thread Dave Kor
thms and papers which can help me in > > building an effective Relevance Feedback system? > > > > Thanks in advance. > > > > Dexter.

Re: Related searches

2006-02-01 Thread Dave Kor
is purpose by creating a second index that stores all unique queries and their set of relevant docids as Lucene Documents. Instead of indexing text terms, we index docids. Finding queries similar to the original query, Q, is a simple matter of querying this second index with the set of docids relevant
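The idea in this snippet can be sketched without Lucene at all. The following is a minimal illustration in plain Java, not the poster's implementation: the class name `RelatedQueryIndex`, its methods, and the ranking by raw overlap count are all assumptions made for the example. Each stored query plays the role of a document, and its relevant docids play the role of terms.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the "second index": docids are the indexed terms,
// stored queries are the documents.
class RelatedQueryIndex {
    // docid -> stored queries whose relevant set contains that docid
    private final Map<Integer, Set<String>> postings = new HashMap<>();

    void add(String query, Collection<Integer> relevantDocIds) {
        for (int docId : relevantDocIds) {
            postings.computeIfAbsent(docId, k -> new HashSet<>()).add(query);
        }
    }

    // Rank stored queries by how many docids they share with the given set
    // (assumed scoring; ties are broken alphabetically).
    List<String> similar(Collection<Integer> docIds) {
        Map<String, Integer> overlap = new HashMap<>();
        for (int docId : new HashSet<>(docIds)) {
            for (String q : postings.getOrDefault(docId, Collections.emptySet())) {
                overlap.merge(q, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(overlap.keySet());
        ranked.sort((a, b) -> {
            int byOverlap = Integer.compare(overlap.get(b), overlap.get(a));
            return byOverlap != 0 ? byOverlap : a.compareTo(b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        RelatedQueryIndex index = new RelatedQueryIndex();
        index.add("lucene tutorial", List.of(1, 2, 3));
        index.add("lucene guide", List.of(2, 3, 4));
        // Docids returned for the original query Q (made-up values).
        System.out.println(index.similar(List.of(2, 3, 4)));
    }
}
```

In a real Lucene index the overlap count would be replaced by Lucene's own scoring of the docid terms.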

Re: Keyword fields, Porter stemming, and QueryParser

2006-01-24 Thread Dave Kor
If reindexing doesn't take too much time and effort, you can reindex using the PerFieldAnalyzerWrapper to have different analyzers for each field.

Re: performance implications for an index with large number of documents.

2006-01-23 Thread Dave Kor
this nature and what kind of > request time should be expected from Lucene? > > thanks > ori

Re: :intersection of two hits objects:

2006-01-18 Thread Dave Kor
>Hits2 contains records numbers 3,6,8,9 > > Now I need a solution which can give the hits object which contains 3,6 > records > You can iterate through the Hits objects, flagging the document numbers in a java.util.BitSet. To compare hits between different queries, all you hav
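The reply is cut off, but one plausible way to finish the BitSet approach is to AND the two bit sets. The sketch below uses plain int arrays in place of Lucene Hits objects; the class name `HitIntersection` and the contents of the first hit list are assumptions, since only the second list (3, 6, 8, 9) and the desired result (3, 6) appear in the snippet.

```java
import java.util.BitSet;

class HitIntersection {
    // Flag every document number from one result set in a BitSet.
    static BitSet toBitSet(int[] docNumbers) {
        BitSet bits = new BitSet();
        for (int doc : docNumbers) {
            bits.set(doc);
        }
        return bits;
    }

    // Document numbers present in both result sets.
    static BitSet intersect(int[] first, int[] second) {
        BitSet result = toBitSet(first);
        result.and(toBitSet(second)); // in-place intersection
        return result;
    }

    public static void main(String[] args) {
        int[] hits1 = {1, 3, 6};    // assumed contents of the first result set
        int[] hits2 = {3, 6, 8, 9}; // from the quoted question
        System.out.println(intersect(hits1, hits2)); // prints {3, 6}
    }
}
```

With real Hits objects, the arrays would be filled by iterating over each result and reading its document number.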

Re: How to check, whether Index is optimized or not?

2006-01-12 Thread Dave Kor
Do we need to check if any documents are marked for deletion? On 1/12/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > I don't think we have a public API for that, but the index is considered > optimized when it contains only a single segment. > Then, we could add the following to IndexReader: >

Good representation for part-of-speech, chunk, sentence boundary tags?

2006-01-03 Thread Dave Kor
Hi, I would like to associate information (or labels) with each word or a range of words in a document. Information such as this word is a noun, that word is a verb, this period marks the end of a sentence, "kick the bucket" is a contiguous phrase, "white house" is a location and so on. I am see

Re: Indexing and deleting simultaneously..

2005-12-27 Thread Dave Kor
On 12/27/05, K.A.Hussain Ali <[EMAIL PROTECTED]> wrote: > HI all. > > I am a newbie to Lucene.. > Could we do indexing and deleting a document on the same file simultaneously ? At any one time, there can only be a single Lucene index writer and any number of index readers. You cannot have two diff

Re: ApacheCon next week

2005-12-27 Thread Dave Kor
topic (e.g., "Tell me all there is to know about the Grand Canyon"). Again, a set of documents might each describe a single aspect of the Grand Canyon. To build a complete picture, we may need to sample most documents that mention the Grand Canyon. I hope this helps. Regards, Dave Kor.

Re: Index Question

2005-12-18 Thread Dave Kor
On 12/19/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > Hi, > > I know that lucene index takes a directory of files to be indexed and > builds the index. Now is there a way to specify the number of files from > the directory to be indexed? > > I mean if I have a directory of 10,000 files and I

Re: best strategy to deal with large index file

2005-12-16 Thread Dave Kor
On 12/17/05, Jeff Liang <[EMAIL PROTECTED]> wrote: > thanks for the reply. > I'm indexing emails. Fields are the common attribute on emails: > subject, content, attachment, message size, date, sender, recipients, > etc. The index is a few GB. Is there a good practice to keep the index > file siz

Re: Lucene + LSI

2005-12-13 Thread Dave Kor
On 12/13/05, Dave Kor <[EMAIL PROTECTED]> wrote: > On 12/13/05, Ian Soboroff <[EMAIL PROTECTED]> wrote: > > Paul Libbrecht <[EMAIL PROTECTED]> writes: > > > > > We're also thinking about implementing something similar to LSI within > > > Ac

Re: Lucene + LSI

2005-12-12 Thread Dave Kor
On 12/13/05, Ian Soboroff <[EMAIL PROTECTED]> wrote: > Paul Libbrecht <[EMAIL PROTECTED]> writes: > > > We're also thinking about implementing something similar to LSI within > > ActiveMath which is lucene-powered where both formulae and text > > searching would benefit of the latent-semantic-simil

Re: Did you mean?

2005-08-29 Thread Dave Kor
Quoting Martin Rode <[EMAIL PROTECTED]>: > Hi everybody, > > Has anyone tried to code a solution like Google's "Did you mean?" in > Lucene? > > I would be very happy to hear your ideas, approaches, suggestions. I know that what Google does is look at consecutive queries by the same user that are

Re: Lucene does NOT use UTF-8.

2005-08-28 Thread Dave Kor
http://java.sun.com/docs/books/tutorial/i18n/text/stream.html Yes, it's confusing. Sun calls its own encoding format "Unicode" and the above webpage talks about how to convert between Java's Unicode format and the UTF-8 format. It's just a matter of specifying "UTF-8" when creating output strea

Re: Lucene in IR Research

2005-08-26 Thread Dave Kor
Quoting Karl Koch <[EMAIL PROTECTED]>: > Hello all, > > I would like to know about papers that where written and used Lucene as the > unerlying search engine. E.g. Lucene as baseline search engine and some > modifications to compare it with baseline Lucene system etc. > > Please provide links to p

Re: Lucene vs Derby (vs MySQL) for spatial indexing

2005-07-28 Thread Dave Kor
Quoting Andrew Boyd <[EMAIL PROTECTED]>: > I did a small demonstration application using lucene's range query and it > worked fine. > I didn't use a DB at all > > > "Mosul_Iraq.html", "E043.13535" > "Mosul_Iraq.html", "N36.33608" > > Having the directional (E, W, N, S) worked out well > > Andrew

RE: n-gram indexing

2005-07-24 Thread Dave Kor
Quoting Rajesh Munavalli <[EMAIL PROTECTED]>: > Let me explain a scenario where I would need to add the n-grams at > indexing time. I see your point and I do agree. As it stands, Lucene does not innately support n-gram indexing. However it is not impossible to adapt Lucene to serve as an n-gram i
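The reply is truncated before it says how the adaptation would work. One common way to get n-gram behaviour from a term index, shown here purely as an assumption and not as the poster's method, is to emit each word n-gram as a single token at indexing time. The class name `WordNGrams` and the underscore separator are made up for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordNGrams {
    // Turn text into overlapping word n-grams, each joined into one token
    // that could then be indexed like any ordinary term.
    static List<String> ngrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> tokens = new ArrayList<>();
        String trimmed = text.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return tokens;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + n <= words.length; i++) {
            tokens.add(String.join("_", Arrays.copyOfRange(words, i, i + n)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("The quick brown fox", 2));
        // prints [the_quick, quick_brown, brown_fox]
    }
}
```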

Boosting SpanQueries

2005-07-06 Thread Dave Kor
I was just wondering, if I set the boost factor in SpanQueries such as the SpanNearQuery or SpanOrQuery, does it get used?

Re: Retrieval model used by Lucene

2005-07-04 Thread Dave Kor
Quoting [EMAIL PROTECTED]: > Hi everybody, > > which kind of retrieval model is lucene using? Is it a simple vector model, > a extended boolean model or another model? A reliable source with > information about it would be fine, cause every source i found is telling > something different. :) > Lu

Unexpected: ordered

2005-07-03 Thread Dave Kor
I have a system that automatically generates span queries for Lucene. Sometimes, the system generates a query like this one which always throws a RuntimeException: spanNear([spanNear([text:interesting], 3, true), spanNear([text:interesting, text:john, text:said], 8, true)], 2, true) Basically, the

Re: Sentence and Paragraph searching

2005-07-01 Thread Dave Kor
Quoting Peter Laurinc <[EMAIL PROTECTED]>: > Hi, > > I'm newbie to lucene. > I wan to ask, how to implement search for phrase that must be in > sentence/paragraph. > I did see som examples, that uses term position changing, but I think > that this is not the way, because it breaks classic proximit

Re: Question for Wildcard Search:

2005-06-23 Thread Dave Kor
Quoting Dave Kor <[EMAIL PROTECTED]>: > Quoting Erik Hatcher <[EMAIL PROTECTED]>: > > > Anyone tried this technique with Lucene? > > Actually, the problem is that the wildcard code has to search over a large > subset of terms because the list of terms is, well

Re: Question for Wildcard Search:

2005-06-23 Thread Dave Kor
Quoting Erik Hatcher <[EMAIL PROTECTED]>: > Anyone tried this technique with Lucene? Actually, the problem is that the wildcard code has to search over a large subset of terms because the list of terms is, well, a linear structure. If, for example, all terms in the index are arranged as a suffix

Re: Ideas Needed - Finding Duplicate Documents

2005-06-12 Thread Dave Kor
r example), providing a fast way to find duplicates at > search time. > > If you can give more details on your requirements, people in this list > can probably come up with some pretty good solutions. > > -chris > > On 6/12/05, Dave Kor <[EMAIL PROTECTED]> wrote: > > Hi

Ideas Needed - Finding Duplicate Documents

2005-06-12 Thread Dave Kor
d grouping sentences using their hashCode() and then do a pairwise compare between sentences that have the same hashCode, but even with a 1GB heap I ran out of memory after comparing 200k sentences. Any other ideas? Regards Dave Kor. --
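For readers unfamiliar with the approach described here, this is roughly what bucketing by hashCode() followed by a pairwise comparison looks like. It is a sketch of the technique as described, not the poster's code, and it does nothing to address the memory problem raised in the message. The class name `DuplicateSentences` is made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DuplicateSentences {
    // Group exact duplicates: bucket by hashCode first, then confirm with
    // equals() inside each bucket, because different strings can collide.
    static List<List<String>> duplicates(List<String> sentences) {
        Map<Integer, List<String>> buckets = new HashMap<>();
        for (String s : sentences) {
            buckets.computeIfAbsent(s.hashCode(), k -> new ArrayList<>()).add(s);
        }
        List<List<String>> groups = new ArrayList<>();
        for (List<String> bucket : buckets.values()) {
            boolean[] used = new boolean[bucket.size()];
            for (int i = 0; i < bucket.size(); i++) {
                if (used[i]) {
                    continue;
                }
                List<String> group = new ArrayList<>();
                group.add(bucket.get(i));
                for (int j = i + 1; j < bucket.size(); j++) {
                    if (!used[j] && bucket.get(i).equals(bucket.get(j))) {
                        used[j] = true;
                        group.add(bucket.get(j));
                    }
                }
                if (group.size() > 1) {
                    groups.add(group);
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("a b", "c d", "a b", "e");
        System.out.println(duplicates(sentences)); // prints [[a b, a b]]
    }
}
```

The equals() check matters: "Aa" and "BB" share a hashCode in Java but are different strings, so they land in the same bucket without being reported as duplicates.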

Re: A special PhraseQuery

2005-05-21 Thread Dave Kor
Quoting Chris Hostetter <[EMAIL PROTECTED]>: > : I'm in need of a special version of the phrase query. For example, given a > : search phrase "alpha beta gamma", I'ld like a to score documents something > like > : the following manner. > > it sounds like what you want isn't really a special type o

A special PhraseQuery

2005-05-20 Thread Dave Kor
document contains "alpha gamma" score = 0.666 If document contains "alpha" score = 0.333 If document contains "beta" score = 0.333 If document contains "gamma" score = 0.333 Has anyone done something l
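The start of this message is cut off, so the exact rule behind the scores is not visible. One reading that matches the listed values is that the score is the fraction of the query phrase's terms found in the document (2 of 3 for "alpha gamma", 1 of 3 for a single term). The sketch below implements only that reading; it ignores term order and adjacency, which the original request may well have cared about, and the class name `PartialPhraseScore` is made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class PartialPhraseScore {
    // Assumed rule: score = (query terms present in the document) / (query terms).
    static double score(String phrase, String document) {
        String[] terms = phrase.trim().toLowerCase().split("\\s+");
        Set<String> docTerms =
                new HashSet<>(Arrays.asList(document.trim().toLowerCase().split("\\s+")));
        int matched = 0;
        for (String term : terms) {
            if (docTerms.contains(term)) {
                matched++;
            }
        }
        return (double) matched / terms.length;
    }

    public static void main(String[] args) {
        String phrase = "alpha beta gamma";
        System.out.println(score(phrase, "alpha gamma")); // two of three terms
        System.out.println(score(phrase, "beta"));        // one of three terms
    }
}
```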

Re: MultiSearcher GUI? Before/After query?

2005-05-18 Thread Dave Kor
Quoting Andrzej Bialecki <[EMAIL PROTECTED]>: > Regarding Luke - actually, it would not be so difficult to implement > this (at least for me ;-) ). Save for some minor exceptions, Luke opens > an IndexReader once, and I could add another version of the Open dialog > to use open multiple indexes. >