On Tuesday 12 Apr 2005 00:53, Eric Chow wrote:
> But what about a document that contains more than two different languages?
>
>
> Eric
If you're indexing many documents which contain multiple languages then it's
probably better to use a SimpleAnalyzer, rather than one that does any
language-specific analysis.
: You'll need some kind of lookup to know how to split a token like
: "cybercafe" into two words - once you've done that it will be easy to
: set the position increment of them to zero so that they overlay the
: original term.
but how would you set the position increment of a multi-word synonym?
Hi,
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> > My problem is, however, that some words need to have alternatives
> > where the word is decomposed / decompounded into two or more words:
> >
> > "FooBar Corp" or "cybercafe"
> >
> > should be found when searching for
> >
> > "Foo Ba*" or
But what about a document that contains more than two different languages?
Eric
On Apr 12, 2005 12:13 AM, Andy Roberts <[EMAIL PROTECTED]> wrote:
> On Monday 11 Apr 2005 14:55, Mike Baranczak wrote:
> > Your example with Arabic wouldn't work reliably either - there are
> > several other languages that use the Arabic script (Persian for example).
In my application, by default I display all documents that are in the
index. I sort them using either a "time modified" or a "time created"
field. If I have a newly created empty index, I find I get an error if I
sort by "time modified" but not by "time created". In either case there
are actually no documents in the index.
Bill Tschumy wrote:
So, did this happen because he copied the data while in an inconsistent
state? I'm a bit surprised that an inconsistent index is ever left on
disk (except for temporarily while something is being written). Would
this happen if there was a Writer that was not closed?
An inde
Daniel Naber wrote:
Yes, the *.cfs shows that this is a compound index which has *.fnm files
only when it's being modified.
When creating a compound segment, a "segments" file is never written
that refers to the segment until the .cfs file is created and the .fnm
files are removed.
The real pro
cerberus yao wrote:
Does anyone know how to annotate Lucene search results with line
numbers from the original source content?
When you display each hit, first scan the text and build an array
containing the positions of each newline. Then use the highlighter (in
contrib/highlighter) to find fragments.
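A minimal sketch of the newline-scan idea above (plain Java; the class and method names are illustrative, not part of any Lucene API): record the offset of every newline once per document, then map any match offset from the highlighter to a line number with a binary search.

```java
import java.util.ArrayList;
import java.util.List;

public class LineNumbers {
    // Offsets of every '\n' in the text, in ascending order.
    private final List<Integer> newlines = new ArrayList<>();

    public LineNumbers(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') newlines.add(i);
        }
    }

    // 1-based line number of the character at the given offset:
    // count how many newlines occur before it, via binary search.
    public int lineAt(int offset) {
        int lo = 0, hi = newlines.size();
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (newlines.get(mid) < offset) lo = mid + 1; else hi = mid;
        }
        return lo + 1;
    }

    public static void main(String[] args) {
        LineNumbers ln = new LineNumbers("first\nsecond\nthird\n");
        System.out.println(ln.lineAt(0));   // offset inside "first"  -> 1
        System.out.println(ln.lineAt(7));   // offset inside "second" -> 2
        System.out.println(ln.lineAt(14));  // offset inside "third"  -> 3
    }
}
```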
Hi everybody,
I have some questions concerning using Lucene for Geo-searching. I have
a bunch of documents (> 100,000) in the index that all have a latitude
and longitude associated with them.
I wanted to be able to search within a certain radius of a point of
origin, which I accomplished by app
On Apr 11, 2005, at 9:36 AM, Peter Hotm. Nørregaard wrote:
According to "Lucene in Action" it is possible to get synonyms indexed
together with a word by putting multiple words with the same
position-id in the term vector.
My problem is, however, that some words need to have alternatives
where the word is decomposed / decompounded into two or more words:
On Monday 11 Apr 2005 14:55, Mike Baranczak wrote:
> Your example with Arabic wouldn't work reliably either - there are
> several other languages that use the Arabic script (Persian for
> example).
Good point. Although you could try a simple approach to test for the
additional characters that exi
Oh, I forgot your last question: that's why the field "line" has to be
stored. Upon query you have to get the "line" number from the document
that represents the line, and in "forward" / "back" actions you will
have to sort the result set by line value and print only chunks of that
result.
Mvh Karl Øie
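A rough sketch of that "sort the result set by line value and print only chunks" step (plain Java; the data shapes are illustrative, not a Lucene API):

```java
import java.util.*;

public class LineWindow {
    // Given the full file as lines and a 1-based matching line number,
    // return that hit plus n lines of context before and after it.
    public static List<String> window(List<String> fileLines, int hitLine, int n) {
        int from = Math.max(1, hitLine - n);
        int to = Math.min(fileLines.size(), hitLine + n);
        List<String> out = new ArrayList<>();
        for (int i = from; i <= to; i++) {
            out.add(i + ": " + fileLines.get(i - 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("a", "b", "c", "d", "e");
        List<Integer> hits = new ArrayList<>(Arrays.asList(4, 2));
        Collections.sort(hits);              // sort the result set by line value
        for (int h : hits) {
            System.out.println(window(file, h, 1));
        }
    }
}
```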
Your example with Arabic wouldn't work reliably either - there are
several other languages that use the Arabic script (Persian for
example).
You could also try to pick out characters that are unique to a
particular language - for example, Å only occurs in a few languages such
as Swedish, Danish and Norwegian (as far as I know...). Of c
Yes, the biggest drawback is text spanning lines:
L1 - it was the best of times,
L2 - it was the worst of times
will return no hits for the search "it was the best of times, it was
the worst of times" (with quotes), because no single Lucene document
contains the whole text.
I would be inte
But "crash.java" is physically just a single document.
Is there any drawback if we treat each line in "crash.java" as a document?
Another question:
If we need to present the search result with the hit lines plus n
lines forward and backward, how can I do this if the lines are
separated into different documents?
According to "Lucene in Action" it is possible to get synonyms indexed
together with a word by putting multiple words with the same position-id in
the term vector.
My problem is, however, that some words need to have alternatives where the
word is decomposed / decompounded into two or more words
Daniel,
Thanks for responding on this thread. I doubt the copy was made while
the index was being updated and I don't see any indication of a crash.
Just for my clarification, if I update the index, but don't close the
IndexWriter (because I may need it again soon), can the index on disk
be left in an inconsistent state?
> Now, I would like to obtain the List of all Terms (and their corresponding
> position) from each document (hits.doc(i)).
Try IndexReader.getTermFreqVector(), which will return an instance of
TermPositionVector when the corresponding field has been indexed with
storeTermVector==true.
--
Maik Sc
I've managed something like this from a slightly different perspective.
IndexReader ir = IndexReader.open(yourIndex);
String searchTerm = "word";
TermPositions tp = ir.termPositions(new Term("contents", searchTerm));
tp.next();
int termFreq = tp.freq();
System.out.println(searchTerm + ": " + termFreq);
On Apr 10, 2005, at 11:52 AM, Patricio Galeas wrote:
Hello,
I am new to Lucene. I have the following problem.
When I execute a search I receive the list of document Hits.
I get without problem the content of the documents too:
for (int i = 0; i < hits.length(); i++) {
Document doc = hits.doc(i);
On Apr 11, 2005, at 4:48 AM, Chris Lamprecht wrote:
I was attempting to cache QueryFilters in a Map using the Query as the
key (a BooleanQuery instance containing two RangeQueries), and I
discovered that my BooleanQueries' equals() methods would always
return false, even when the queries were equivalent.
Most indexing creates a Lucene document for each source document. What
you would need is to create a Lucene document for each line:
String src_doc = "crash.java";
BufferedReader reader = new BufferedReader(new FileReader(src_doc));
int line_number = 0;
String line;
while ((line = reader.readLine()) != null) {
    line_number++;
    Document ld = new Document();
    ld.add(Field.Text("contents", line));
    ld.add(Field.Keyword("line", String.valueOf(line_number)));
    writer.addDocument(ld);
}
Can you not provide the user with an option list to specify their input
language?
Language identification can be a pretty tricky field. There are some tricks
you can do with Unicode to identify a language, e.g., \u0600 - \u06FF contains
the Arabic characters, so if your input contains lots of characters in that
range it is probably Arabic script.
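As a rough sketch of that heuristic (plain Java; a guess, not a reliable language identifier, and as noted later in the thread it cannot tell Arabic from Persian, which shares the script):

```java
public class ScriptGuess {
    // Fraction of characters falling in the Arabic Unicode block U+0600-U+06FF.
    public static double arabicRatio(String text) {
        if (text.isEmpty()) return 0.0;
        int hits = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= '\u0600' && c <= '\u06FF') hits++;
        }
        return (double) hits / text.length();
    }

    // Arbitrary threshold: treat mostly-Arabic-block input as Arabic script.
    public static boolean looksArabicScript(String text) {
        return arabicRatio(text) > 0.5;
    }

    public static void main(String[] args) {
        System.out.println(looksArabicScript("مرحبا"));  // Arabic greeting -> true
        System.out.println(looksArabicScript("hello"));  // -> false
    }
}
```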
Hello,
I am new to Lucene. I have the following problem.
When I execute a search I receive the list of document Hits.
I get without problem the content of the documents too:
for (int i = 0; i < hits.length(); i++) {
Document doc = hits.doc(i);
System.out.println(doc.get("content"));
}
N
I don't think you can figure out the language from the input box value
alone; I can't see any way to select the correct language analyzer at
this point. What you can do is to put Chinese, Japanese, English and
Dutch content in separate indexes and use a MultiSearcher to search in
all of them, and
For instance look at http://www.zilverline.org/zilverlineweb/space/faq
Michael
Karl Øie wrote:
If you use a servlet and an HTML form to feed queries to the
QueryParser, take good care of all configurations around the servlet
container. If you, like me, use Tomcat you might have to recode the
query
If you use a servlet and an HTML form to feed queries to the QueryParser,
take good care of all configurations around the servlet container. If
you, like me, use Tomcat you might have to recode the query into
internal Java form (UTF-8) before you pass it to Lucene.
read this:
http://www.crazysqui
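The recoding trick described above can be sketched like this (plain Java; this simulates what happens when a container decodes UTF-8 form bytes as ISO-8859-1, and the method name is illustrative):

```java
import java.io.UnsupportedEncodingException;

public class QueryRecode {
    // The container decoded the request bytes as ISO-8859-1, so a UTF-8
    // query arrives garbled. Re-encode back to the original bytes and
    // decode them as UTF-8 to recover the real query string.
    public static String recode(String fromServlet) throws UnsupportedEncodingException {
        return new String(fromServlet.getBytes("ISO-8859-1"), "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String typed = "café";                            // what the user typed
        byte[] wire = typed.getBytes("UTF-8");            // bytes the browser sends
        String garbled = new String(wire, "ISO-8859-1");  // what getParameter() returns
        System.out.println(recode(garbled).equals(typed)); // true
    }
}
```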
Hello,
I am a beginner in using Lucene.
My files contain different languages (English, Chinese,
Portuguese, Japanese and some other Asian, non-Latin languages).
They are always contained in one file.
Therefore, I have to use UTF-8 to save the contents.
I am now developing a web-based search engine
Hi, Lucene users:
Does anyone know how to annotate Lucene search results with line
numbers from the original source content?
for example:
I have a file "Test.java" which is indexed by Lucene.
When I search inside the index, how can I enhance the search result
with line numbers from Test.java?
I was attempting to cache QueryFilters in a Map using the Query as the
key (a BooleanQuery instance containing two RangeQueries), and I
discovered that my BooleanQueries' equals() methods would always
return false, even when the queries were equivalent. The culprit was
RangeQuery - it doesn't implement equals().