On Fri, Aug 21, 2009 at 12:49 AM, Chris Hostetter wrote:
>
> : But in that case, I assume Solr does a commit per document added.
>
> not at all ... it computes a signature and then uses that as a unique key.
> IndexWriter.updateDocument does all the hard work.
Right - Solr used to do that hard work ...
: But in that case, I assume Solr does a commit per document added.
not at all ... it computes a signature and then uses that as a unique key.
IndexWriter.updateDocument does all the hard work.
-Hoss
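
A minimal sketch of the mechanism Hoss describes, assuming a "signature"
field name and a stand-in hash (a real setup would use something like an
MD5 of the content):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    void addWithDedupe(IndexWriter writer, Document doc, String content)
        throws Exception {
      // Stand-in for a real content hash (e.g. MD5/SHA-1 of the raw text).
      String sig = Integer.toHexString(content.hashCode());
      doc.add(new Field("signature", sig, Field.Store.YES,
                        Field.Index.NOT_ANALYZED));
      // Atomically replaces any existing document with the same signature;
      // no commit per added document is required.
      writer.updateDocument(new Term("signature", sig), doc);
    }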
Paul Cowan wrote:
> oh...@cox.net wrote:
> > - I'd have to create a (very small) index, for each sub-document, where I
> > do the Document.add() with just the (for example) two terms, then
> > - Run a query against the 1-entry index, which
> > - Would either give me a "yes" or "no" (for that sub-document) ...
oh...@cox.net wrote:
- I'd have to create a (very small) index, for each sub-document, where I do
the Document.add() with just the (for example) two terms, then
- Run a query against the 1-entry index, which
- Would either give me a "yes" or "no" (for that sub-document)
As I said, I'm concerned ...
Paul Cowan wrote:
> oh...@cox.net wrote:
> > Document1  subdoc1  term1   term2
> >            subdoc2  term1a  term2a
> >            subdoc3  term1b  term2b
> >
> > However, I've now been asked to implement the ability to query the sub-documents.
oh...@cox.net wrote:
Document1  subdoc1  term1   term2
           subdoc2  term1a  term2a
           subdoc3  term1b  term2b
However, I've now been asked to implement the ability to query the sub-documents.
In other words, rather than ...
Hi,
I guess, that, in short, what I'm really trying to find out is:
If I construct a Lucene query, can I (somehow) use that to query a String
object that I have, rather than querying against a Lucene index?
Thanks,
Jim
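
One way to do this is contrib's MemoryIndex, which holds a single document
in RAM and scores a query against it directly, with no index on disk. A
minimal sketch (the field name, analyzer choice, and boolean wrapper are
illustrative):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.memory.MemoryIndex;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    static boolean matches(String text, String queryString) throws Exception {
      MemoryIndex mem = new MemoryIndex();
      mem.addField("content", text, new StandardAnalyzer());
      Query query = new QueryParser("content", new StandardAnalyzer())
          .parse(queryString);
      return mem.search(query) > 0.0f;  // score > 0 means the string matched
    }

That gives the per-sub-document "yes"/"no" described earlier in the thread
without creating a 1-entry index for each check.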
oh...@cox.net wrote:
> Hi,
>
> This question is going to be a little complicated to explain, but let me try. ...
Hi,
This question is going to be a little complicated to explain, but let me try.
I have implemented an indexer app based on the demo IndexFiles app, and a web
app based on the luceneweb web app for the searching.
In my case, the "Documents" that I'm indexing are a proprietary file type, and e...
Valery, have you tried using WhitespaceTokenizer / CharTokenizer and
doing any further processing in a custom TokenFilter?!
simon
On Thu, Aug 20, 2009 at 8:48 PM, Robert Muir wrote:
> Valery, I think it all depends on how you want your search to work.
>
> when I say this, I mean for example: if a document only contains "C++" ...
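
A sketch of Simon's WhitespaceTokenizer-plus-custom-TokenFilter suggestion,
written against the later 3.x-style attribute API (the class name, the
protected-terms set, and the trailing-punctuation rule are all illustrative):
the tokenizer emits whitespace-separated chunks, and the filter leaves
whitelisted terms such as "C++" intact while stripping trailing punctuation
from everything else.

    import java.io.IOException;
    import java.util.Set;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public final class ProtectedTermFilter extends TokenFilter {
      private final CharTermAttribute termAtt =
          addAttribute(CharTermAttribute.class);
      private final Set<String> protectedTerms;  // e.g. "C++", "C#", ".NET"

      public ProtectedTermFilter(TokenStream in, Set<String> protectedTerms) {
        super(in);
        this.protectedTerms = protectedTerms;
      }

      @Override
      public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) return false;
        String term = termAtt.toString();
        if (!protectedTerms.contains(term)) {
          // Strip trailing non-alphanumerics from ordinary tokens.
          int end = term.length();
          while (end > 0 && !Character.isLetterOrDigit(term.charAt(end - 1))) {
            end--;
          }
          termAtt.setLength(end);
        }
        return true;
      }
    }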
Valery, I think it all depends on how you want your search to work.
when I say this, I mean for example: if a document only contains "C++"
do you want searches on just "C" to match or not?
another thing I would suggest is to take a look at the capabilities of
Solr: it has some analysis stuff that ...
Hi Robert,
so, would you expect a Tokenizer to consider '/' in both cases as a separate
Token?
Personally, I see no problem if Tokenzer would do the following job:
"C/C++" ==> TokenStream of { "C", "/", "C", "+", "+"}
and come up with "C" and "C++" tokens after processing through the
downstream TokenFilters.
Hi Ken,
thanks for the comments. Well, Terence's ANTLR was and is a good piece of
work.
Do you mean that you use ANTLR to generate a Tokenizer (lexeme parser)
or
did you even proceed further and use ANTLR to generate higher-level parsers
to overrule Lucene's TokenFilters?
or maybe even both?
Hi Valery,
From our experience at Krugle, we wound up having to create our own
tokenizers (actually a kind of specialized parser) for the different
languages. It didn't seem like a good option to try to twist one of
the existing tokenizers into something that would work well enough. We
wou...
Valery, oh I think there might be other ways to solve this.
But you provided some examples such as C/C++ and SAP R/3.
In these two examples you want the "/" to behave differently depending
upon context, so my first thought was that a grammar might be a good
way to ensure it does what you want.
Hi Robert,
thanks for the hint.
Indeed, a natural way to go. Especially if one builds a Tokenizer with the
same level of quality as StandardTokenizer.
OTOH, do you mean that the out-of-the-box stuff is indeed not customizable
for this task?
regards
Valery
Robert Muir wrote:
>
> Valery,
Valery,
One thing you could try would be to create a JFlex-based tokenizer,
specifying a grammar with the rules you want.
You could use the source code & grammar of StandardTokenizer as a
starting point.
On Thu, Aug 20, 2009 at 10:28 AM, Valery wrote:
>
> Hi all,
>
> I am trying to tune Lucene to respect tokens like C++, C#, and .NET ...
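
For a flavor of what such a grammar might look like, here is an illustrative
JFlex fragment (made up for this example, not taken from StandardTokenizer's
actual grammar); SPECIAL_TYPE and ALPHANUM_TYPE stand for int constants
defined in the user-code section:

    SPECIAL  = "C++" | "C#" | ".NET"
    ALPHANUM = [:letter:] ([:letter:] | [:digit:])*

    %%

    {SPECIAL}   { return SPECIAL_TYPE; }
    {ALPHANUM}  { return ALPHANUM_TYPE; }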
Hi all,
I am trying to tune Lucene to respect tokens like C++, C#, and .NET.
The task is known to the Lucene community, but surprisingly I can't google
up much good info on it.
Of course, I tried to re-use Lucene's building blocks for Tokenizer. Here
we go:
1) StandardTokenizer -- oh, th...
You could simply set Similarity.setDefault(yourSimilarity) to make
sure it is used all over the place.
Simon
On Thu, Aug 20, 2009 at 3:25 PM, Chris Salem wrote:
> No. I take it I have to use it for both? Is there anything else I need
> to do?
> Sincerely,
> Chris Salem
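
A sketch of Simon's suggestion (the subclass name and the overridden method
are illustrative, not from the thread):

    import org.apache.lucene.search.DefaultSimilarity;
    import org.apache.lucene.search.Similarity;

    public class MySimilarity extends DefaultSimilarity {
      @Override
      public float tf(float freq) {
        return 1.0f;  // example: ignore term frequency entirely
      }
    }

    // Set once at startup, before both indexing and searching, so that the
    // norms written at index time agree with the scoring at query time:
    Similarity.setDefault(new MySimilarity());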
No. I take it I have to use it for both? Is there anything else I need
to do?
Sincerely,
Chris Salem
- Original Message -
To: java-user@lucene.apache.org
From: Grant Ingersoll
Sent: 8/19/2009 7:17:45 PM
Subject: Re: custom scorer
Are you setting the Similarity before indexing ...
Hi
I'd like to extend Lucene's FieldCache such that it will read native values
from a different place (in my case, payloads). That is, instead of iterating
on a field's terms and parsing each String to long (for example), I'd like
to iterate over one term (sort:long, again - an example) and decode ...
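
For reference, a sketch of the payload-iteration idea using the 2.4-era
TermPositions API; the sort:long term and the 8-byte big-endian encoding
are assumptions based on the example in the message:

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermPositions;

    static long[] readLongPayloads(IndexReader reader) throws IOException {
      long[] values = new long[reader.maxDoc()];
      TermPositions tp = reader.termPositions(new Term("sort", "long"));
      byte[] buf = new byte[8];
      while (tp.next()) {
        tp.nextPosition();  // the position must be consumed before its payload
        if (tp.isPayloadAvailable()) {
          tp.getPayload(buf, 0);
          long value = 0;
          for (int i = 0; i < 8; i++) {
            value = (value << 8) | (buf[i] & 0xFF);
          }
          values[tp.doc()] = value;
        }
      }
      tp.close();
      return values;
    }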
You should definitely upgrade to the latest JDK 1.6 to get the fix for
the JRE bug in LUCENE-1282, but, I don't think you are hitting that
bug (read past EOF during merge is a different exception).
Can you describe more detail on how you merge 6 IndexWriters?
Mike
On Thu, Aug 20, 2009 at 5:21 AM ...
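
For reference, the usual 2.4 way to combine several on-disk indexes is
IndexWriter.addIndexesNoOptimize; a sketch (paths and writer configuration
are illustrative, and whether this matches the poster's setup is exactly
what Mike is asking):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    IndexWriter writer = new IndexWriter(
        FSDirectory.getDirectory("/path/to/merged"), new StandardAnalyzer(),
        true, IndexWriter.MaxFieldLength.UNLIMITED);
    Directory[] parts = new Directory[6];
    for (int i = 0; i < 6; i++) {
      parts[i] = FSDirectory.getDirectory("/path/to/part" + i);
    }
    writer.addIndexesNoOptimize(parts);  // merges without optimizing first
    writer.close();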
I checked at http://issues.apache.org/jira/browse/LUCENE-1282
SegmentMerger.java has this code
TermFreqVector[] vectors = reader.getTermFreqVectors(docNum);
termVectorsWriter.addAllDocVectors(vectors);
so this issue appears in spite of this fix.
I am using java version "1.6.0_07". Is it fixed in ...
Hi
I am getting this issue in Lucene 2.4 when I try to merge multiple
IndexWriters (generally 6):
sh-3.2# Exception in thread "Lucene Merge Thread #5"
org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: read past EOF
        at org.apache.lucene.index.ConcurrentMergeSche...