e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
--
Regards,
Dave Kor
If you scroll down the page, you will find a download link to the data
files. There's no need to use the search form.
On 9/4/06, Dejan Nenov <[EMAIL PROTECTED]> wrote:
Unfortunately, the term search on the site is down; it gives a 500 Internal
Server Error.
-----Original Message-----
Fro
nglish
language? Obviously it would differ by corpus but I would like to see
what's already available.
--
Dave Kor, PhD Candidate
Not sure if this is what you want, but what I have done is issue
exact phrase queries to Lucene and count the number of hits found.
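The hit count of an exact phrase query is the number of matching documents, not the number of phrase occurrences: a document that contains the phrase twice still counts once. Below is a minimal in-memory stand-in for that count. It does not use Lucene; documents are assumed to be already tokenized into lowercase word lists, and the class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PhraseHitCount {
    // True if `phrase` occurs in `doc` as a contiguous run of tokens.
    static boolean containsPhrase(List<String> doc, List<String> phrase) {
        return !phrase.isEmpty() && Collections.indexOfSubList(doc, phrase) >= 0;
    }

    // Number of documents matching the phrase: the analogue of a phrase query's hit count.
    static int hitCount(List<List<String>> docs, List<String> phrase) {
        int hits = 0;
        for (List<String> doc : docs) {
            if (containsPhrase(doc, phrase)) {
                hits++; // each matching document counts once, however often the phrase occurs
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("new", "york", "is", "in", "new", "york", "state"),
                Arrays.asList("york", "new"),
                Arrays.asList("i", "love", "new", "york"));
        System.out.println(hitCount(docs, Arrays.asList("new", "york"))); // 2
    }
}
```

If the total number of phrase occurrences is needed rather than a document count, the occurrences within each document would have to be summed instead.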
On 2/23/06, Eric Jain <[EMAIL PROTECTED]> wrote:
> This is somewhat related to a question sent to this list a while ago: Is
> there an efficient way to count the
--
Dave Kor, Research Assistant
Center for Information Mining and Extraction
School of Computing
National University of Singapore.
thms and papers which can help me in
> > building an effective Relevance Feedback system?
> >
> > Thanks in advance.
> >
> > Dexter.
> >
>
>
is purpose by creating a second index that
stores all unique queries and their set of relevant docids as Lucene
Documents. Instead of indexing text terms, we index docids. Finding
queries similar to the original query, Q, is a simple matter of
querying this second index with the set of docids relevant
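The idea of indexing past queries by their relevant docids can be sketched without Lucene as a small inverted index from docid to queries. This is a minimal in-memory sketch under stated assumptions: the class and method names are hypothetical, and candidate queries are ranked by a plain overlap count (how many docids they share with Q) rather than by the Lucene scoring a real second index would apply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class QueryByDocIds {
    // docid -> queries whose relevant set contains it: the postings of the "second index",
    // where each stored query is a document and its relevant docids act as its terms.
    private final Map<Integer, Set<String>> postings = new HashMap<>();

    void addQuery(String query, Collection<Integer> relevantDocIds) {
        for (int docId : relevantDocIds) {
            postings.computeIfAbsent(docId, k -> new HashSet<>()).add(query);
        }
    }

    // Stored queries sharing at least one docid with Q's relevant set, most shared docids first.
    // Overlap counting is an assumption standing in for Lucene's own relevance scoring.
    List<String> similarQueries(String original, Collection<Integer> relevantDocIds) {
        Map<String, Integer> overlap = new HashMap<>();
        for (int docId : relevantDocIds) {
            for (String q : postings.getOrDefault(docId, Collections.emptySet())) {
                if (!q.equals(original)) {
                    overlap.merge(q, 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(overlap.keySet());
        ranked.sort((a, b) -> {
            int byOverlap = Integer.compare(overlap.get(b), overlap.get(a));
            return byOverlap != 0 ? byOverlap : a.compareTo(b); // tie-break alphabetically
        });
        return ranked;
    }

    public static void main(String[] args) {
        QueryByDocIds index = new QueryByDocIds();
        index.addQuery("grand canyon", Arrays.asList(1, 2, 3));
        index.addQuery("grand canyon hiking", Arrays.asList(2, 3, 4));
        index.addQuery("colorado river", Arrays.asList(3, 7));
        index.addQuery("lucene", Arrays.asList(9));
        System.out.println(index.similarQueries("grand canyon", Arrays.asList(1, 2, 3)));
        // [grand canyon hiking, colorado river]
    }
}
```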
If reindexing doesn't take too much time and effort, you can reindex
using the PerFieldAnalyzerWrapper so that each field gets its own
analyzer.
this nature and what kind of
> request time should be expected from Lucene?
>
> thanks
> ori
>
--
> Hits2 contains record numbers 3,6,8,9
>
> Now I need a solution which can give me a Hits object containing
> records 3 and 6
>
You can iterate through the Hits objects, flagging the document
numbers in a java.util.BitSet. To compare hits between different
queries, all you hav
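The BitSet approach can be sketched as follows. In a real program the document numbers would come from iterating each Hits object; here they are passed in directly, and Hits1's contents are hypothetical because that part of the question is cut off. Intersecting the two sets with `BitSet.and` leaves only the documents matched by both queries.

```java
import java.util.BitSet;

class HitsIntersection {
    // Flag each document number in a BitSet (stand-in for iterating a Hits object).
    static BitSet toBitSet(int... docNumbers) {
        BitSet bits = new BitSet();
        for (int doc : docNumbers) {
            bits.set(doc);
        }
        return bits;
    }

    // Documents present in both result sets; the inputs are left unchanged.
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet hits1 = toBitSet(1, 3, 5, 6); // hypothetical: Hits1 is not shown in the thread
        BitSet hits2 = toBitSet(3, 6, 8, 9); // Hits2 from the question
        System.out.println(intersect(hits1, hits2)); // {3, 6}
    }
}
```

The same pattern gives a union with `or` or a difference with `andNot`.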
Do we need to check if any documents are marked for deletion?
On 1/12/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> I don't think we have a public API for that, but the index is considered
> optimized when it contains only a single segment.
> Then, we could add the following to IndexReader:
>
Hi,
I would like to associate information (or labels) with each word or
range of words in a document: for example, that this word is a noun, that
word is a verb, this period marks the end of a sentence, "kick the bucket"
is a contiguous phrase, "white house" is a location, and so on. I am see
On 12/27/05, K.A.Hussain Ali <[EMAIL PROTECTED]> wrote:
> HI all.
>
> I am a newbie to Lucene.
> Could we do indexing and deleting a document on the same file simultaneously?
At any one time, there can only be a single Lucene index writer and
any number of index readers. You cannot have two diff
topic (e.g., "Tell me all
there is to know about the Grand Canyon"). Again, a set of documents
might each describe a single aspect of the Grand Canyon. To build a
complete picture, we may need to sample most documents that mention
the Grand Canyon.
I hope this helps.
Regards,
Dave Kor.
On 12/19/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I know that lucene index takes a directory of files to be indexed and
> builds the index. Now is there a way to specify the number of files from
> the directory to be indexed?
>
> I mean if I have a directory of 10,000 files and I
On 12/17/05, Jeff Liang <[EMAIL PROTECTED]> wrote:
> thanks for the reply.
> I'm indexing emails. The fields are the usual email attributes:
> subject, content, attachment, message size, date, sender, recipients,
> etc. The index is a few GB. Is there a good practice to keep the index
> file siz
On 12/13/05, Dave Kor <[EMAIL PROTECTED]> wrote:
> On 12/13/05, Ian Soboroff <[EMAIL PROTECTED]> wrote:
> > Paul Libbrecht <[EMAIL PROTECTED]> writes:
> >
> > > We're also thinking about implementing something similar to LSI within
> > > Ac
On 12/13/05, Ian Soboroff <[EMAIL PROTECTED]> wrote:
> Paul Libbrecht <[EMAIL PROTECTED]> writes:
>
> > We're also thinking about implementing something similar to LSI within
> > ActiveMath which is lucene-powered where both formulae and text
> > searching would benefit from the latent-semantic-simil
Quoting Martin Rode <[EMAIL PROTECTED]>:
> Hi everybody,
>
> Has anyone tried to code a solution like Google's "Did you mean?" in
> Lucene?
>
> I would be very happy to hear your ideas, approaches, suggestions.
I know that what Google does is look at consecutive queries by the same user
that are
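A minimal sketch of mining such query reformulations from a log, assuming (since the message above is cut off) that a follow-up query counts as similar when it is within a small Levenshtein edit distance of the previous query. The session format, the distance threshold, and all names are illustrative assumptions, not a description of what Google actually does.

```java
import java.util.*;

class ReformulationMiner {
    // Classic dynamic-programming Levenshtein distance, keeping only two rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // Count "from -> to" pairs where a user's next query was a near-identical rewrite.
    static Map<String, Map<String, Integer>> mine(List<List<String>> sessions, int maxDistance) {
        Map<String, Map<String, Integer>> suggestions = new HashMap<>();
        for (List<String> session : sessions) {
            for (int i = 0; i + 1 < session.size(); i++) {
                String from = session.get(i);
                String to = session.get(i + 1);
                if (!from.equals(to) && editDistance(from, to) <= maxDistance) {
                    suggestions.computeIfAbsent(from, k -> new HashMap<>()).merge(to, 1, Integer::sum);
                }
            }
        }
        return suggestions;
    }

    // Most frequent follow-up for a query, or null if none was observed.
    static String didYouMean(Map<String, Map<String, Integer>> suggestions, String query) {
        Map<String, Integer> candidates = suggestions.get(query);
        if (candidates == null) return null;
        return Collections.max(candidates.entrySet(),
                Map.Entry.<String, Integer>comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        List<List<String>> sessions = Arrays.asList(
                Arrays.asList("lucnee", "lucene"),
                Arrays.asList("lucnee", "lucene"),
                Arrays.asList("lucnee", "solr"));
        System.out.println(didYouMean(mine(sessions, 2), "lucnee")); // lucene
    }
}
```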
http://java.sun.com/docs/books/tutorial/i18n/text/stream.html
Yes, it's confusing. Sun calls its own encoding format "Unicode", and the above
webpage talks about how to convert between Java's Unicode format and the UTF-8
format.
It's just a matter of specifying "UTF-8" when creating output strea
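For example, wrapping an output stream in an `OutputStreamWriter` with an explicitly named charset makes the writer emit UTF-8 regardless of the JVM's default encoding. A minimal sketch (the class and method names are illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

class Utf8Output {
    // Encode text as UTF-8 by naming the charset when the writer is created.
    static byte[] toUtf8(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Without the explicit charset, OutputStreamWriter would use the default charset.
        try (Writer out = new OutputStreamWriter(bytes, Charset.forName("UTF-8"))) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] encoded = toUtf8("caf\u00e9");
        System.out.println(encoded.length); // 5: the accented e takes two bytes in UTF-8
        System.out.println(new String(encoded, Charset.forName("UTF-8")));
    }
}
```

`OutputStreamWriter` also has a constructor taking the charset name as a plain string (`"UTF-8"`), which throws `UnsupportedEncodingException` if the name is unknown.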
Quoting Karl Koch <[EMAIL PROTECTED]>:
> Hello all,
>
> I would like to know about papers that were written using Lucene as the
> underlying search engine, e.g. Lucene as a baseline search engine, with some
> modifications compared against the baseline Lucene system, etc.
>
> Please provide links to p
Quoting Andrew Boyd <[EMAIL PROTECTED]>:
> I did a small demonstration application using lucene's range query and it
> worked fine.
> I didn't use a DB at all
>
>
> "Mosul_Iraq.html", "E043.13535"
> "Mosul_Iraq.html", "N36.33608"
>
> Having the directional (E, W, N, S) worked out well
>
> Andrew
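Lucene compares terms as strings, so a range query over coordinates only behaves numerically if string order matches numeric order. The zero padding in the values above achieves that within one hemisphere: "E005.00000" sorts before "E043.13535", whereas an unpadded "E5.0" would sort after "E43.1". A minimal sketch of such an encoder, assuming three integer digits for longitude, two for latitude, and five decimal places, as in the example values (names are illustrative):

```java
import java.util.Locale;

class CoordinateTerms {
    // Longitude: E/W prefix, 3 zero-padded integer digits, 5 decimals (43.13535 -> "E043.13535").
    static String encodeLongitude(double degrees) {
        char dir = degrees < 0 ? 'W' : 'E';
        return dir + String.format(Locale.ROOT, "%09.5f", Math.abs(degrees));
    }

    // Latitude: N/S prefix, 2 zero-padded integer digits, 5 decimals (36.33608 -> "N36.33608").
    static String encodeLatitude(double degrees) {
        char dir = degrees < 0 ? 'S' : 'N';
        return dir + String.format(Locale.ROOT, "%08.5f", Math.abs(degrees));
    }

    public static void main(String[] args) {
        System.out.println(encodeLongitude(43.13535)); // E043.13535
        System.out.println(encodeLatitude(36.33608));  // N36.33608
    }
}
```

Order is only numeric within a hemisphere, so a range that crosses the equator or the prime meridian would need to be split into two range queries. `Locale.ROOT` keeps the decimal separator a period on every platform.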
Quoting Rajesh Munavalli <[EMAIL PROTECTED]>:
> Let me explain a scenario where I would need to add the n-grams at
> indexing time.
I see your point and I do agree. As it stands, Lucene does not innately support
n-gram indexing. However, it is not impossible to adapt Lucene to serve as an
n-gram i
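One way to adapt Lucene for n-gram indexing is to generate word n-grams at analysis time and index each one as an ordinary term. The generation step itself can be sketched in plain Java; wiring it into an analyzer or token filter is left out, and the `ngrams` helper is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordNGrams {
    // Every contiguous run of minN..maxN words, joined by single spaces, in text order.
    // Assumes 1 <= minN <= maxN.
    static List<String> ngrams(List<String> words, int minN, int maxN) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < words.size(); start++) {
            for (int n = minN; n <= maxN && start + n <= words.size(); n++) {
                grams.add(String.join(" ", words.subList(start, start + n)));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Each gram would be indexed as one term, so "kick the" becomes searchable as a unit.
        System.out.println(ngrams(Arrays.asList("kick", "the", "bucket"), 1, 2));
        // [kick, kick the, the, the bucket, bucket]
    }
}
```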
I was just wondering: if I set the boost factor on SpanQueries such as
SpanNearQuery or SpanOrQuery, does it get used?
Quoting [EMAIL PROTECTED]:
> Hi everybody,
>
> which kind of retrieval model is Lucene using? Is it a simple vector model,
> an extended boolean model, or another model? A reliable source with
> information about it would be fine, because every source I found is telling
> something different. :)
>
Lu
I have a system that automatically generates span queries to Lucene. Sometimes,
the system generates a query like this one, which always throws a
RuntimeException:
spanNear([spanNear([text:interesting], 3, true), spanNear([text:interesting,
text:john, text:said], 8, true)], 2, true)
Basically, the
Quoting Peter Laurinc <[EMAIL PROTECTED]>:
> Hi,
>
> I'm a newbie to Lucene.
> I want to ask how to implement a search for a phrase that must be within a
> sentence/paragraph.
> I did see some examples that use term position changing, but I think
> that this is not the way, because it breaks classic proximit
Quoting Dave Kor <[EMAIL PROTECTED]>:
> Quoting Erik Hatcher <[EMAIL PROTECTED]>:
>
> > Anyone tried this technique with Lucene?
>
> Actually, the problem is that the wildcard code has to search over a large
> subset of terms because the list of terms is, well
Quoting Erik Hatcher <[EMAIL PROTECTED]>:
> Anyone tried this technique with Lucene?
Actually, the problem is that the wildcard code has to search over a large
subset of terms because the list of terms is, well, a linear structure.
If, for example, all terms in the index were arranged as a suffix
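The suffix-array idea can be sketched independently of Lucene: store every suffix of every term in sorted order, and a substring pattern such as `*arch*` becomes a binary search for the block of suffixes that start with `arch`, instead of a linear scan over all terms. This is an illustrative in-memory sketch (class and method names are hypothetical), not how Lucene's wildcard query is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class TermSuffixArray {
    // One suffix of one term, remembering which term it came from.
    private static final class Entry {
        final String suffix;
        final int term;
        Entry(String suffix, int term) { this.suffix = suffix; this.term = term; }
    }

    private final List<String> terms;
    private final List<Entry> entries = new ArrayList<>(); // all suffixes, sorted lexicographically

    TermSuffixArray(List<String> terms) {
        this.terms = new ArrayList<>(terms);
        for (int t = 0; t < this.terms.size(); t++) {
            String term = this.terms.get(t);
            for (int offset = 0; offset < term.length(); offset++) {
                entries.add(new Entry(term.substring(offset), t));
            }
        }
        entries.sort(Comparator.comparing((Entry e) -> e.suffix));
    }

    // Terms containing `fragment` anywhere, i.e. what a "*fragment*" wildcard would match.
    SortedSet<String> containing(String fragment) {
        // Find the first suffix >= fragment; all suffixes starting with `fragment`
        // form one contiguous run beginning at that position.
        int lo = 0, hi = entries.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (entries.get(mid).suffix.compareTo(fragment) < 0) lo = mid + 1; else hi = mid;
        }
        SortedSet<String> matches = new TreeSet<>();
        for (int i = lo; i < entries.size() && entries.get(i).suffix.startsWith(fragment); i++) {
            matches.add(terms.get(entries.get(i).term));
        }
        return matches;
    }

    public static void main(String[] args) {
        TermSuffixArray index = new TermSuffixArray(Arrays.asList("lucene", "search", "research", "index"));
        System.out.println(index.containing("search")); // [research, search]
    }
}
```

For brevity this stores substring copies; a space-conscious version would store (term, offset) pairs and compare in place.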
r example), providing a fast way to find duplicates at
> search time.
>
> If you can give more details on your requirements, people in this list
> can probably come up with some pretty good solutions.
>
> -chris
>
> On 6/12/05, Dave Kor <[EMAIL PROTECTED]> wrote:
> > Hi
d grouping sentences using their hashCode() values and then doing a pairwise
comparison between sentences that have the same hashCode, but even with a 1GB
heap I ran out of memory after comparing 200k sentences.
Any other ideas?
Regards
Dave Kor.
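A different technique that avoids the pairwise comparison altogether, if only exact duplicates (after simple normalization) need to be found: reduce each sentence to a fixed-size SHA-1 digest and keep only the digests in a hash map. Unlike the 32-bit `String.hashCode()`, which can collide by accident at this scale, a 160-bit digest makes accidental collisions negligible, so the sentence texts never need to be compared. This does not catch near-duplicates, and the normalization rules and names below are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class SentenceDigestDedup {
    // Normalize case and whitespace, then reduce the sentence to a fixed-size SHA-1 digest.
    static String digest(String sentence) {
        String normalized = sentence.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] hash = sha1.digest(normalized.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    // Maps the index of each repeated sentence to the index of its first occurrence.
    // The map holds digests rather than sentence texts, so input could be streamed from disk.
    static Map<Integer, Integer> findDuplicates(List<String> sentences) {
        Map<String, Integer> firstSeen = new HashMap<>();
        Map<Integer, Integer> duplicates = new LinkedHashMap<>();
        for (int i = 0; i < sentences.size(); i++) {
            Integer first = firstSeen.putIfAbsent(digest(sentences.get(i)), i);
            if (first != null) {
                duplicates.put(i, first);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("The cat sat.", "A dog barked.", "the  cat sat.");
        System.out.println(findDuplicates(sentences)); // {2=0}
    }
}
```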
--
Quoting Chris Hostetter <[EMAIL PROTECTED]>:
> : I'm in need of a special version of the phrase query. For example, given a
> : search phrase "alpha beta gamma", I'd like to score documents in something
> : like the following manner.
>
> it sounds like what you want isn't really a special type o
document contains "alpha gamma" score = 0.666
If document contains "alpha" score = 0.333
If document contains "beta" score = 0.333
If document contains "gamma" score = 0.333
Has anyone done something l
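Assuming the rule is simply the fraction of query terms a document contains (this is consistent with every case listed above, though part of the list is cut off), the scoring can be sketched in plain Java. In Lucene terms it resembles a BooleanQuery of optional term clauses, where the classic coord factor (matching clauses divided by total clauses) rewards documents that match more of the terms, alongside the other scoring factors. The class below is a hypothetical illustration, not a Lucene Query.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TermCoverageScore {
    // Fraction of query terms present in the document; term order and adjacency are ignored.
    static double score(List<String> queryTerms, Set<String> docTerms) {
        if (queryTerms.isEmpty()) {
            return 0.0;
        }
        long matched = queryTerms.stream().filter(docTerms::contains).count();
        return (double) matched / queryTerms.size();
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("alpha", "beta", "gamma");
        Set<String> doc = new HashSet<>(Arrays.asList("alpha", "gamma", "canyon"));
        System.out.println(score(query, doc)); // 0.6666666666666666
    }
}
```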
Quoting Andrzej Bialecki <[EMAIL PROTECTED]>:
> Regarding Luke - actually, it would not be so difficult to implement
> this (at least for me ;-) ). Save for some minor exceptions, Luke opens
> an IndexReader once, and I could add another version of the Open dialog
> to open multiple indexes.
>