" in the query under one of the following
conditions: preceded either by "(" or " ", or at the beginning of the string, e.g. using a regex
like /(?:^|[\s(])[+-]/, and, if you find a match, use the default OR operator. Or would it be
better to extend the queryparser contrib myself ?
[1] http://lucidworks.lucidimagination.com/display/LWEUG/Boolean+Operators
Thanks
--
Renaud Delbru
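[Editor's sketch] A minimal illustration of the pre-processing described above, using plain java.util.regex; the regex is the one quoted in the message, and the class and method names below are made up for illustration.

import java.util.regex.Pattern;

public class OperatorSniffer {
    // The regex from the message: '+' or '-' preceded by '(' or whitespace, or at the start.
    private static final Pattern UNARY_OPERATOR = Pattern.compile("(?:^|[\\s(])[+-]");

    /** Returns true if the raw query string seems to use explicit +/- operators. */
    public static boolean usesUnaryOperators(String query) {
        return UNARY_OPERATOR.matcher(query).find();
    }

    public static void main(String[] args) {
        System.out.println(usesUnaryOperators("+lucene solr"));   // true
        System.out.println(usesUnaryOperators("lucene (-solr)")); // true
        System.out.println(usesUnaryOperators("high-end audio")); // false: '-' is inside a word
    }
}

If usesUnaryOperators() returns true, the idea in the message would be to switch the parser's default operator to OR before parsing.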
On 20/05/11 13:21, Steven A Rowe wrote:
Hi Renaud,
That's normal behavior, since you have AND as default operator. This is equivalent to placing a
"+&q
normal behaviour ? A Bug ? Am I doing something wrong ?
Thanks in advance for your help,
--
Renaud Delbru
rare cases.
I'll start the query benchmark this weekend. Let's hope I'll have
something to share next week.
Cheers
--
Renaud Delbru
ty
segment, and the faulty term (or even better, the index of the faulty
block), I will be able to display the content of the blocks, and see if
there are any problems in the PFor encoding.
Cheers,
--
Renaud Delbru
easy way to get this
information, so I will be able to check these segments and their encoded
blocks in order to find and understand the problem ?
Thanks in advance,
--
Renaud Delbru
Maybe SIREn will be more suitable.
[1] http://siren.sindice.com/
--
Renaud Delbru
On 12/03/10 13:43, Erick Erickson wrote:
There's no requirement that all documents have the same
fields; Lucene is fine with different docs having different
fields.
There's no limit on the number of diff
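[Editor's sketch] To illustrate Erick's point, a small hedged example written against a recent Lucene API (StringField, TextField, ByteBuffersDirectory; these classes postdate the version discussed in this thread): two documents with entirely different fields live in the same index.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class MixedFieldsExample {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document book = new Document();
            book.add(new StringField("isbn", "978-0321356680", Field.Store.YES));
            book.add(new TextField("title", "Effective Java", Field.Store.YES));
            writer.addDocument(book);

            Document person = new Document();   // completely different fields, same index
            person.add(new TextField("name", "Renaud Delbru", Field.Store.YES));
            person.add(new StringField("affiliation", "DERI", Field.Store.YES));
            writer.addDocument(person);
        }
    }
}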
Codec interface ? How is it working currently ?
Are there restrictions on how segments can be merged ?
Is there a way to easily extend the mechanism by which segments are merged ?
Cheers,
--
Renaud Delbru
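[Editor's sketch] On the merging questions above: in current Lucene, segment merging is pluggable through a MergePolicy set on the IndexWriterConfig. A hedged sketch assuming the present-day API rather than the flex branch discussed in this thread; the 512 MB cap is just an example value.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.ByteBuffersDirectory;

public class MergePolicyExample {
    public static void main(String[] args) throws Exception {
        TieredMergePolicy mp = new TieredMergePolicy();
        mp.setMaxMergedSegmentMB(512);                 // cap the size of merged segments
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        cfg.setMergePolicy(mp);                        // a custom MergePolicy subclass works too
        try (IndexWriter writer = new IndexWriter(new ByteBuffersDirectory(), cfg)) {
            // index documents as usual; merges follow the configured policy
        }
    }
}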
way I am testing the postings (using termPositionsEnum on the top-level
reader) was not really the proper way to test it, and that the correct
way is instead to use a TermQuery directly.
Thanks for the clarification.
--
Renaud Delbru
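[Editor's sketch] A minimal illustration of the approach mentioned above: exercising a term through a TermQuery and an IndexSearcher, so that postings are read per segment as in a real search. Recent Lucene API; the class names differ from the flex branch.

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;

public class TermQueryCheck {
    /** Runs a TermQuery so that postings are consumed the way a real search consumes them. */
    static void check(Directory dir, String field, String text) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term(field, text)), 10);
            for (ScoreDoc sd : hits.scoreDocs) {
                System.out.println("doc=" + sd.doc + " score=" + sd.score);
            }
        }
    }
}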
using the
DocsEnum interface, and therefore do not know if it manipulates a
segment-level enum or a Multi*Enum. Which search (or query) operators in
Lucene use segment-level enums ?
Cheers
--
Renaud Delbru
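[Editor's sketch] For reference, a hedged sketch of reading postings one segment (leaf reader) at a time instead of through a top-level Multi*Enum. It assumes the current Lucene API (Terms/TermsEnum/PostingsEnum), whose names differ from the DocsEnum-era flex branch discussed here.

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

public class PerSegmentPostings {
    static void dump(Directory dir, String field, String term) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            for (LeafReaderContext leaf : reader.leaves()) {        // one entry per segment
                Terms terms = leaf.reader().terms(field);
                if (terms == null) continue;                        // segment has no such field
                TermsEnum te = terms.iterator();
                if (!te.seekExact(new BytesRef(term))) continue;    // term absent in this segment
                PostingsEnum postings = te.postings(null);
                for (int doc = postings.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = postings.nextDoc()) {
                    System.out.println("segment docBase=" + leaf.docBase + " doc=" + doc);
                }
            }
        }
    }
}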
But what you were suggesting is to create my own "MultiReader" that is
optimised for my codec. Is that right ? A MultiReader that just iterates
over the subreaders, checks if they are using my codec (and therefore
associated fields), and uses them to iterate
On 09/02/10 16:04, Michael McCandless wrote:
On Tue, Feb 9, 2010 at 9:08 AM, Renaud Delbru wrote:
So, does it mean that the codec interface is likely to change ? Do I need to
be prepared to change all my code again ;o) ?
This particular patch doesn't change the Codecs API
the information
that has been stored in the new index data structure is correctly
retrieved.
In that case, I got the previous errors (a MultiDocsAndPositionsEnum is
returned). However, when I am indexing only one or two documents, the
original DocsAndPositionsEnum is returne
all.
Ok, it works like a charm, except for the problem related to MultiReaders.
Thanks
--
Renaud Delbru
her way to extend
StandardCodec without having to deal with these classes ?
Cheers
--
Renaud Delbru
Hi Michael,
I have started to look at the PFOR codec. However, when I include the
codec files inside the flex_1458 branch, it is missing the
org.apache.lucene.util.pfor.PFor class, which is the core of the codec.
Where can I find this class ?
Thanks,
Regards
--
Renaud Delbru
On 16/11/09 14:01
where (correct me
if I am wrong) this new version includes some optimisations for
dictionary lookups, which should minimize the overhead.
--
Renaud Delbru
On 30/12/09 16:18, Jason Tesser wrote:
I have a situation where I might have 1000 different types of Lucene
Documents each with 10 or so f
a medium
term period. I will continue to follow the advancement of 1458, test it,
and continue to report my feedback and experiences with it.
Thanks,
Best Regards
[1] http://siren.sindice.com
--
Renaud Delbru
On 16/11/09 13:01, Michael McCandless wrote:
Yes, the branch is he
e the
experience!
I will.
--
Renaud Delbru
,
in order to be able to plug my own chain, but I have the impression that
you've done something similar already (with the codec abstraction).
It would be a pity to waste my time doing something less convenient than
your approach.
Thanks.
--
Renaud Delbru
On 14/11/09 13:22, Michael McCan
Hi,
there is also the SIREn plugin [1] that allows indexing multi-valued
fields, with values of variable length, and querying them individually.
[1] http://siren.sindice.com
--
Renaud Delbru
On 12/10/09 21:31, Angel, Eric wrote:
I need to analyze these values since I also want the benefits
If you need some
help, feel free to ask your questions in our mailing list.
[1] http://siren.sindice.com
[2]
https://dev.deri.ie/confluence/display/SIREn/Indexing+and+Searching+Tabular+Data
Best Regards,
--
Renaud Delbru
Donal Murtagh wrote:
Hi,
I'm trying to use Lucene to qu
nputs to
make this project happen ... but also to the Data Intensive
Infrastructure Group and DERI.
[1] http://di2.deri.ie/
--
Renaud Delbru
rowse/LUCENE-1410
[2] http://videolectures.net/wsdm09_dean_cblirs/
--
Renaud Delbru
define how to serialise
positions and payloads.
I think other parts of the FreqProxTermsWriter can stay generic. What do
you think ?
Regards.
--
Renaud Delbru
so bad predictor in general.
Regards.
--
Renaud Delbru
Andrzej Bialecki wrote:
Renaud Delbru wrote:
Hi Andrzej,
sorry for the late reply.
I have looked at the code. As far as I understand, you sort the
posting lists based on the first doc skip. The first posting list
will be the one with the biggest first document skip.
Does the sparseness of
ou could then create your own indexing chain for
indexing? If you take that approach, please report back so we can
learn how to improve Lucene for these very advanced customizations!
Ok, thanks for the reference. I will try this solution, and will report
you any proble
ConjunctiveScorer). This will require a call to
IndexReader.docFreq(term) for each of the term queries. Does a docFreq
call mean another IO access ?
Thanks for the clarification,
Regards.
--
Renaud Delbru
Andrzej Bialecki wrote:
Renaud Delbru wrote:
> Hi all,
>
> I am wondering if Lucene implements
modifications ? Make a branch of
Lucene, and add my new classes to the Lucene package
org.apache.lucene.index ? Or is a more elegant solution possible ?
Thanks in advance,
Regards.
--
Renaud Delbru
Yes, I know of two research projects that have implemented a triple store on
top of Lucene:
- Semplore [1]
- Sindice [2]
[1] http://apex.sjtu.edu.cn/apex_wiki/Demos/Semplore
[2] http://www.sindice.com
--
Renaud Delbru
Cam Bazz wrote:
Has anyone tried to implement a triplet store with lucene?
Best
Hi all,
I am wondering if Lucene implements the query optimisation that consists
of ordering the posting lists based on the term frequency before
intersection ?
If yes, could somebody point me to the Java class / method that
implements such a strategy ?
Thanks in advance,
Regards.
--
Renaud
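[Editor's sketch] A hedged illustration of the optimisation being asked about: look up each term's document frequency, sort the posting lists so the rarest term leads, and drive the intersection from it with advance(). It is written against a recent Lucene API and only illustrates the technique; it is not a claim about what Lucene's own scorers did at the time.

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DocIdSetIterator;

public class OrderedIntersection {

    /** Returns the (segment-local) doc ids containing all terms, driven by the rarest term. */
    static List<Integer> intersect(LeafReader reader, List<Term> terms) throws IOException {
        // Order the posting lists by document frequency, shortest list first.
        List<Term> ordered = new ArrayList<>(terms);
        ordered.sort(Comparator.comparingInt((Term t) -> {
            try {
                return reader.docFreq(t);          // one term-dictionary lookup per term
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }));

        List<PostingsEnum> postings = new ArrayList<>();
        for (Term t : ordered) {
            PostingsEnum pe = reader.postings(t);
            if (pe == null) {
                return Collections.emptyList();    // one term is absent: empty intersection
            }
            postings.add(pe);
        }

        List<Integer> result = new ArrayList<>();
        int doc = postings.get(0).nextDoc();       // the rarest term leads the intersection
        while (doc != DocIdSetIterator.NO_MORE_DOCS) {
            boolean match = true;
            for (int i = 1; i < postings.size(); i++) {
                PostingsEnum pe = postings.get(i);
                int other = pe.docID() < doc ? pe.advance(doc) : pe.docID();
                if (other != doc) {                // mismatch: catch the lead list up and retry
                    doc = postings.get(0).advance(other);
                    match = false;
                    break;
                }
            }
            if (match) {
                result.add(doc);
                doc = postings.get(0).nextDoc();
            }
        }
        return result;
    }
}

Recent Lucene conjunction scoring sorts its clause iterators by cost (roughly document frequency) in the same spirit.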
the field data (.fdt) file.
Then, would it be possible to overwrite the old float value with a new
float value ?
Thanks,
--
Renaud Delbru
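[Editor's sketch] Overwriting values inside the .fdt (stored fields) file in place is not something I can vouch for. As a different technique, current Lucene (well after this thread) can update a numeric doc-values field per document without reindexing. A hedged sketch; the "id" and "score" field names are illustrative, the field must have been indexed as a NumericDocValuesField, and the float is packed into the long that NumericDocValues stores.

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class FloatUpdate {
    /** Replaces the per-document "score" value for the document whose "id" field matches. */
    static void updateScore(IndexWriter writer, String id, float newValue) throws Exception {
        writer.updateNumericDocValue(new Term("id", id), "score", (long) Float.floatToIntBits(newValue));
    }
}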
r can know (because tokenStream doesn't
return them). It's as if we need the ability to query a tokenStream
for its "final" offset or something.
One workaround might be to insert an "end marker" token, with the true
end offset, which is a term you would never search on.
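[Editor's sketch] Current Lucene (which postdates this exchange) provides exactly the missing piece described above: after the stream is exhausted, TokenStream.end() exposes the true final offset of the input, including trailing characters no token covers, through the OffsetAttribute. A hedged sketch assuming a recent Lucene API.

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

public class FinalOffsetExample {
    public static void main(String[] args) throws Exception {
        try (Analyzer analyzer = new StandardAnalyzer();
             TokenStream ts = analyzer.tokenStream("f", "hello world   ")) {
            CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
            OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(termAtt + " [" + offsetAtt.startOffset() + "," + offsetAtt.endOffset() + "]");
            }
            ts.end();  // last token ends at 11, but the final offset is 14 (trailing spaces)
            System.out.println("final offset = " + offsetAtt.endOffset());
        }
    }
}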
instances will have their offset shifted back.
Is it a bug ? Or is it a desired behaviour (and in that case, why) ?
Regards.
--
Renaud Delbru,
E.C.S., Ph.D. Student,
Semantic Information Systems and
Language Engineering Group (SmILE),
Digital Enterprise Research Institute,
National University of