Hello All,
I am looking for a Japanese/Chinese stemmer. Does one exist? Do we
require it?
(Analyzers are already present in Lucene.)
I did a Google search and did not find any conclusive answer.
Thanks in advance
vinaya
Hi all.
I'm trying to parallelise writing documents into an index. Let's set
aside the fact that 3.1 is much better at this than 3.0.x... but I'm
using 3.0.3.
One of the things I need to know is the doc ID of each document added
so that we can add them into auxiliary database tables which are ke
: I see, well if you say the norm isn't a problem for my case, I will just
: disable the coord factor by initializing BooleanQuery(true); and I should be
: done.
queryNorm shouldn't be a problem (since your BooleanQueries all have the
same structure, and don't use query boosts ... I assume) but
I see, well if you say the norm isn't a problem for my case, I will just
disable the coord factor by initializing BooleanQuery(true); and I should be
done.
If this is not correct, please anybody let me know.
On 28 March 2011 11:44, Uwe Schindler wrote:
> Hi,
>
> As you seem to want to do very s
Hi,
As you seem to want to do very specific things, it might still be
interesting to provide a modified Similarity (by subclassing
DefaultSimilarity). You could then e.g. also return 1.0 to disable the
queryNorm() which may also be a problem (but it isn't for your queries).
Theoretically, you can c
ok thanks, I will pass true. Well, I dunno how to verify it. Even if I try, I
get some scores, but I dunno if comparing them is reliable.
On 28 March 2011 11:36, Uwe Schindler wrote:
> Hi,
>
> You don't need to extend BooleanQuery, you can just pass "true" in its
> ctor,
> see: http://s.apache.org
Hi,
You don't need to extend BooleanQuery, you can just pass "true" in its ctor,
see: http://s.apache.org/QvK
Of course you can also subclass DefaultSimilarity and return 1 as coord, but
that is more work than passing true to a ctor.
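To make the coord discussion concrete, here is a minimal plain-Java sketch of what the factor does to a score. It is not Lucene code: the class and method names are hypothetical, and it only mirrors the overlap / maxOverlap fraction that DefaultSimilarity uses for coord, next to the "disabled" case where the factor is always 1.

```java
public class CoordSketch {

    // Mirrors the default coord idea: the fraction of query clauses that matched.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // "Disabled" coord: the factor is always 1, so it no longer changes the score.
    static float disabledCoord(int overlap, int maxOverlap) {
        return 1.0f;
    }

    public static void main(String[] args) {
        // Hypothetical sum of the scores of the clauses that matched.
        float rawSum = 2.0f;

        // A document matching 2 of 3 clauses is scaled down by the default coord...
        System.out.println("with coord:    " + rawSum * defaultCoord(2, 3));
        // ...but keeps its raw sum when coord is disabled.
        System.out.println("without coord: " + rawSum * disabledCoord(2, 3));
    }
}
```

With coord disabled, a document's score no longer depends on how many of the query's clauses it matched, only on the clause scores themselves.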
For your type of queries, disabling coord should be enough, bu
One more thing: instead of extending the BooleanQuery class to remove the
coord factor, can I also extend the Similarity class to do it?
Still, the other question is open: just to be sure, if I disable the coord
factor, can I finally compare my BooleanQuery results?
thanks
>
>
>
> On 28 March 20
Cool, so just to be sure, if I disable the coord factor, can I finally
compare my BooleanQuery results?
On 28 March 2011 10:11, Uwe Schindler wrote:
> Hi Patrick,
>
> You can disable the coord factor in the constructor of BooleanQuery.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63,
Hi Patrick,
You can disable the coord factor in the constructor of BooleanQuery.
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Patrick Diviacco [mailto:patrick.divia...@gmail.com]
> Sent: Monday,
Hi, thanks for the reply.
Yeah, I've read the Similarity class documentation several times, but I need
some tips.
My queries are BooleanQueries but they always have the same structure (the
same structure as the docs; they are actually docs from the collection): 3
fields.
What if I simplify the similarity
No, scores are in general not comparable between different queries. The
problem lies in many things:
- Each query has a norm factor that makes it more comparable if they are
sub clauses of a BooleanQuery. But you are right, this norm factor should be
the same.
- Some queries like FuzzyQuery rely
Hi,
Sorry, I already asked this a few days ago, but I got no reply and I really need
some help on this.
I'm running several queries against a doc collection. The queries are
documents of the collection itself; I need to measure how similar each
document is to the rest of the collection.
Now, Lucene
thanks, solved
On 28 March 2011 09:30, Uwe Schindler wrote:
> Hi,
>
> Replace the "stupid":
> writer = new BufferedWriter(new FileWriter(fileOutput));
>
> by:
> writer = new BufferedWriter(new OutputStreamWriter(new
> FileOutputStream(fileOutput), "UTF-8"));
>
> Unfortunately, you cannot give a
Hi,
Replace the "stupid":
writer = new BufferedWriter(new FileWriter(fileOutput));
by:
writer = new BufferedWriter(new OutputStreamWriter(new
FileOutputStream(fileOutput), "UTF-8"));
Unfortunately, you cannot give a charset to FileWriter itself.
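For reference, a self-contained sketch of the replacement above. The class name, method name, and temp-file usage are illustrative assumptions, and it assumes Java 7 or later (for try-with-resources and StandardCharsets), which is newer than what this thread was likely written against.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class Utf8WriteSketch {

    // Writes text to the file as UTF-8, independent of the platform default charset.
    static void writeUtf8(File file, String text) throws IOException {
        // try-with-resources closes (and therefore flushes) the writer even if write() throws.
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        File out = File.createTempFile("utf8-sketch", ".txt");
        out.deleteOnExit();
        writeUtf8(out, "caf\u00e9");
        // "é" takes two bytes in UTF-8, so the 4-character string occupies 5 bytes on disk.
        System.out.println(Files.readAllBytes(out.toPath()).length);
    }
}
```

Closing the writer matters as much as the charset here: a BufferedWriter that is never closed or flushed can leave the file empty or cut short.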
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213
hi, I'm using my own code:
Writer writer = null;
try {
//File fileOutput = new File("output.trectext");
File fileOutput = new File(args[1]);
writer = new BufferedWriter(new FileWriter(fileOutput));
writer.write(contents.toString());
} catch (FileNotFoundException e) {
e.printStackTrace();
} cat
Hi,
You have to give the Charset when creating the Writer. If you give no
charset, Java uses the platform default. This question has nothing to do
with Lucene; it is better suited to a general XML or Java forum.
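A small sketch of why the charset matters, assuming Java 7 or later; the class and helper names are hypothetical. The same character turns into different bytes under different charsets, so a file written with one default and read with another comes out garbled.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetSketch {

    // Encodes the string with an explicitly chosen charset.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        // The platform default is what a Writer falls back to when no charset is given.
        System.out.println("platform default: " + Charset.defaultCharset());

        String eAcute = "\u00e9"; // the character "é"
        // The same character occupies a different number of bytes per charset.
        System.out.println("UTF-8 bytes:      " + encode(eAcute, StandardCharsets.UTF_8).length);
        System.out.println("ISO-8859-1 bytes: " + encode(eAcute, StandardCharsets.ISO_8859_1).length);
    }
}
```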
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
java -Dfile.encoding=utf-8
should do the trick.
Or... which java app are you using?
paul
Le 28 mars 2011 à 09:03, Patrick Diviacco a écrit :
> When I run my Lucene app and parse an XML file I get the following error
> due to some characters such as "é" written in the text file.
>
> If I save the
When I run my Lucene app and parse an XML file I get the following error
due to some characters such as "é" written in the text file.
If I save the text file as UTF-8 with my text editor I don't have this
issue, but when I create it with a java app, it is saved as MacRoman.
How can I specify a differ