Hi!
I cannot open indexes created by Lucene 8.5 with Lucene master. I get an
error:
Exception in thread "main" org.apache.lucene.index.CorruptIndexException:
codec mismatch: actual codec=Lucene84PostingsWriterDoc vs expected
codec=Lucene90PostingsWriterDoc
(resource=MMapIndexInput(path="C:\data\luc
ck up and ask what the use-case
> is. Returning 6.5M docs to a user is useless, so are you doing
> some kind of analytics maybe? In which case, and again
> assuming you’re using Solr, Streaming Aggregation might
> be a better option.
>
> This really sounds like an XY problem. You’re trying to solve problem X
> and asking how to accomplish it with Y. What I’m questioning
> is whether Y (grouping) is a good approach or not. Perhaps if
> you explained X there’d be a better suggestion.
>
> Best,
> Erick
>
> > On Oct 9, 2020, at 8:19 AM, Dmitry Emets wrote:
> >
>
I have 12_000_000 documents, 6_500_000 groups
With sort: It takes around 1 sec without grouping, 2 sec with grouping and
12 sec with setAllGroups(true)
Without sort: It takes around 0.2 sec without grouping, 0.6 sec with
grouping and 10 sec with setAllGroups(true)
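For readers following the deduplication discussion, here is a minimal plain-Java sketch of what "keep one hit per field value" means when applied to an already ranked hit list. It does not use Lucene and is not how the grouping module or DiversifiedTopDocsCollector works internally; the `Hit` class, its `key` field, and `firstPerKey` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DedupeHits {
    /** Hypothetical search hit: a document ID plus the value of the dedupe field. */
    static final class Hit {
        final int docId;
        final String key;

        Hit(int docId, String key) {
            this.docId = docId;
            this.key = key;
        }
    }

    /**
     * Walks hits in rank order and keeps only the first hit seen for each key,
     * stopping once topN distinct keys have been collected.
     */
    static List<Hit> firstPerKey(List<Hit> rankedHits, int topN) {
        Set<String> seenKeys = new HashSet<>();
        List<Hit> result = new ArrayList<>();
        for (Hit hit : rankedHits) {
            if (result.size() == topN) {
                break;
            }
            // Set.add returns false when the key was already present.
            if (seenKeys.add(hit.key)) {
                result.add(hit);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Hit> ranked = new ArrayList<>();
        ranked.add(new Hit(1, "a"));
        ranked.add(new Hit(2, "a"));
        ranked.add(new Hit(3, "b"));
        ranked.add(new Hit(4, "c"));
        for (Hit hit : firstPerKey(ranked, 2)) {
            System.out.println(hit.docId + " " + hit.key);
        }
    }
}
```

The sketch only needs to look at hits until it has topN distinct keys, whereas counting all groups, as setAllGroups(true) asks for, requires visiting every matching document.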
Thank you, Erick, I will look in
Yes, it is
Fri, Oct 9, 2020 at 14:25, Diego Ceccarelli (BLOOMBERG/ LONDON) <
dceccarel...@bloomberg.net>:
> Is the field that you are using to dedupe stored as a docvalue?
>
> From: java-user@lucene.apache.org At: 10/09/20 12:18:04 To:
> java-user@lucene.apache.org
> Subject: Deduplication of sea
Hi,
I need to deduplicate search results by a specific field and I have no idea
how to implement this properly.
I have tried grouping with setGroupDocsLimit(1) and it gives me the expected
results, but performance is not very good.
I think that I need something like DiversifiedTopDocsCollector, but
suit
fi/2013/11/lucene-revolution-eu-2013-in-dublin-day_13.html
Dmitry
On Mon, Nov 25, 2013 at 8:42 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:
> I just posted a writeup of the Lucene/Solr Revolution Dublin conference.
> I've been waiting for videos to become ava
}
} while (termEnum.next());
// adding last term variations
mq.add(qTail.toArray(new Term[] {}));
// mq is now the query you need
Best regards,
Dmitry.
- Original Message -
From: "Ralf Heyde"
To: java-user@lucene.apache.org
Sent: Thursday, October 13, 2011 5:07:20 PM
Subject
"offset" is. But I don't really need sophisticated queries. I
just need simple substring search. Maybe Lucene is not supposed to be used
that way. But I also need to manage a number of big files and be able to search
in multiple files at once and produce results quickly - th
Iterate over all ints from 0 .. IndexReader.maxDoc() (exclusive) and
call IndexReader.isDeleted?
Excellent, works perfectly for us!
Michael, thank you very much for your help!
Best regards,
Dmitry
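The suggestion quoted above (walk every integer from 0 up to, but not including, maxDoc() and skip deleted documents) can be sketched as follows. To keep the example runnable without a Lucene jar, `DocIdSource` is a hypothetical stand-in interface exposing only the two methods the message names; with a Lucene version of that era, an IndexReader would play this role.

```java
import java.util.ArrayList;
import java.util.List;

class LiveDocIds {
    /** Hypothetical stand-in for the two IndexReader methods named in the message. */
    interface DocIdSource {
        int maxDoc();

        boolean isDeleted(int docId);
    }

    /** Collects every internal document ID in [0, maxDoc) that is not deleted. */
    static List<Integer> collect(DocIdSource reader) {
        List<Integer> ids = new ArrayList<>();
        for (int docId = 0; docId < reader.maxDoc(); docId++) {
            if (!reader.isDeleted(docId)) {
                ids.add(docId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Toy source: five document slots, with slots 1 and 3 deleted.
        DocIdSource toy = new DocIdSource() {
            public int maxDoc() {
                return 5;
            }

            public boolean isDeleted(int docId) {
                return docId == 1 || docId == 3;
            }
        };
        System.out.println(collect(toy)); // prints [0, 2, 4]
    }
}
```

Internal IDs are only meaningful for the reader they came from, so a list collected this way should not be kept across index changes.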
-
To unsubscribe, e-mail
g?
Thank you for your prompt reply
Dmitry
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Hello!
What is the appropriate way to obtain Lucene internal IDs for _all_ the
tuples stored in a Lucene index?
Thank you for your help
Dmitry
Lucas,
Probably one solution would be to use a database such as MySQL and
set up Lucene against MySQL. In this case you don't need to think as much
about an implementation based on the content storage. Also, you need to
create a middle tier to catch all events concerning user search / history
What is the advantage of using a term
frequency vector?
thanks,
DT
www.ejinz.com Search News
- Original Message -
From: "Kai Hu" <[EMAIL PROTECTED]>
To:
Sent: Sunday, August 05, 2007 8:40 PM
Subject: Re: Get the terms and frequency vector of an indexed but unstored
field
you use the flag
I'm not sure exactly how to understand "corrupted indexes": in the sense that
the indexes could not be read / used, or something else?
thanks
DT
www.ejinz.com
EjinZ Search Engine
- Original Message -
From: "Doron Cohen" <[EMAIL PROTECTED]>
To:
Sent: Friday, August 03, 2007 1:03 AM
Subject: Re: How do
We are trying to find out whether there is any implementation for Lucene that
detects index duplicates.
Assume we have a set of documents, and a document is a bunch of words.
After we have created indexes for the same document, we need to know that all
indexes will be unique for a specific document (lexical equivalence).
- Original Message -
From: "Dmitry" <[EMAIL PROTECTED]>
To:
Sent: Saturday, July 28, 2007 6:56 PM
Subject: Re: lucene integration with PDM Windchill (Product Data Management
System)
Karl,
thanks for help.
I will try to explain the requirements. There is a PDM system - Product Data
M
Document
classes.
This was just a short description of the architecture of PDMLink - Windchill.
So we need to create some Lucene services (processors) embedded in the system,
using extended interfaces for creating indexes and searching all documents by
attributes.
thanks,
Dmitry
www.ejinz.com
Search Eng
What conditions are you running Lucene under, such as configuration and
parameters? Can you describe more?
thanks,
dt,
www.ejinz.com
Search Engine News
- Original Message -
From: "testn" <[EMAIL PROTECTED]>
To:
Sent: Friday, July 27, 2007 7:50 PM
Subject: NPE in MultiReader
26, 2007 3:49 PM
Subject: Re: Linear Hashing in Lucene?
26 jul 2007 kl. 05.56 skrev Dmitry:
1. Does an ontology wrapper exist in the Lucene implementation?
Not publicly available as far as I know. There has been some
discussion on the forums though; you could try to search for OWL, RDF or
something
Hey,
Some common questions about Lucene.
1. Does an ontology wrapper exist in the Lucene implementation?
2. Does Lucene use linear hashing?
thanks,
DT,
www.ejinz.com
Search news
What kind of highlighter strategy does Lucene use?
thanks,
Dt
www.ejinz.com
Search Engine for News
Is there a way to update a document in the Index without causing any change
to the order in which it comes up in searches?
thanks,
DT,
www.ejinz.com
Search everything
news, tech, movies, music
Askar,
why do you need to add +id:?
thanks,
dt,
www.ejinz.com
search engine news forms
- Original Message -
From: "Askar Zaidi" <[EMAIL PROTECTED]>
To: ; <[EMAIL PROTECTED]>
Sent: Wednesday, July 25, 2007 12:39 AM
Subject: Re: Fine Tuning Lucene implementation
Hey Hira ,
Thanks so mu
Nina,
can you point me to a link where I can get documentation about
CJKAnalyzer?
thanks,
DT
www.ejinz.com
Search Engine Economy
- Original Message -
From: "Nina Khosravi" <[EMAIL PROTECTED]>
To:
Sent: Sunday, July 22, 2007 11:38 PM
Subject: CJKAnalyzer - Issues with scoring
He
carme" <[EMAIL PROTECTED]>
To:
Sent: Sunday, July 22, 2007 4:16 AM
Subject: Re: Lucene indexing for PDM system like Windchill
If you want to index Hibernate persisted data, just use Compass.
M.
Le 22 juil. 07 à 04:19, Dmitry a écrit :
Folks,
Trying to integrate PDM system : WTPart ob
Folks,
I'm trying to integrate a PDM system (WTPart object) with the Lucene indexing
and search framework.
Part of the work is integration with the persistence layer + index
storage + MySQL.
I could not find a good solution ...
Please advise.
thanks, DT
www.ejinz.com
search engine
-
Andreas,
Thanks, I get it now.
www.ejinz.com
- Original Message -
From: "Andreas Knecht" <[EMAIL PROTECTED]>
To:
Sent: Friday, July 20, 2007 2:23 AM
Subject: Re: Upgrade of Lucene from 1.9.1 to 2.2.0 in JIRA
Hi Dmitry,
The full re-index is necessary, because DateTo
Folks,
why do we need a full re-indexing when switching from DateField to
DateTools? Might I have the same issue in the future?
Thanks,
Dm
www.ejinz.com
- Original Message -
From: "Andreas Knecht" <[EMAIL PROTECTED]>
To:
Sent: Friday, July 20, 2007 1:32 AM
Subject: Upgrade of Luce
Hello,
What are the best practices for document classification / categorization using
Lucene? Any recommendations as far as manual vs. automatic, which products to
use or not to use? Does Lucene offer anything out of the box?
Thanks,
- Dmitry
Simon,
I wonder if using Zoe might do the trick - http://guests.evectors.it/zoe/
Have you tried it?
- Dmitry
From: Fisheye [mailto:[EMAIL PROTECTED]
Sent: Fri 4/21/2006 7:23 AM
To: java-user@lucene.apache.org
Subject: Lucene - FileFormat
Im trying to
Agreed, an inverted index cannot be efficiently maintained in a
B-tree (hence RDBMS). But I think we can (or should) have the option of
a B-tree based storage for unindexed fields, whereas for indexed fields
we can use Lucene's existing architecture.
prasen
[EMAIL PROTECTED] wrote:
>
r
storing the actual "documents"? This way you're using lucene for what
lucene is best at, and using the database for what it's good at. At
least up to a point -- RDBMSs have their limits too. OR maybe if you
have a huge dataset, you might want to check out Nutch.
On 4/6/06, Dmitry G
xing structures at a single byte level are just way too much trouble to
deal with for an application integrator like myself.
- Dmitry
From: Samuru Jackson [mailto:[EMAIL PROTECTED]
Sent: Mon 3/6/2006 10:05 AM
To: java-user@lucene.apache.org
Subject: Re: Distributed
Ideally, I'd love to see an article explaining both in detail: the index
structure as well as the merge algorithm...
From: Prasenjit Mukherjee [mailto:[EMAIL PROTECTED]
Sent: Tue 3/28/2006 11:57 PM
To: java-user@lucene.apache.org
Subject: Data structure of a Luce
rmance, so maybe we could also make this more common
setting the default?
Erik
On Feb 8, 2006, at 2:17 PM, Dmitry Goldenberg wrote:
> Duh! Bingo! Mystery solved. I should have thought of this :)
> The discrepancies come in with larger documents, definitely > 10K
> terms whi
Chris,
Awesome stuff. A few questions: is your Excel extractor somehow better than
POI's? and, what do you see as the timeframe for adding WordPerfect support?
Are you considering supporting any other sources such as MS Project,
Framemaker, etc?
Thanx,
- D
Duh! Bingo! Mystery solved. I should have thought of this :)
The discrepancies come in with larger documents, definitely > 10K terms which
is Lucene's default maxFieldLength.
Thanks for your help, Chris
- Dmitry
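To make the cause of the discrepancy concrete, here is a small plain-Java sketch of how a cap on the number of indexed terms per field changes a term count. It does not call Lucene; the class and method names are hypothetical, and the 10,000 figure is simply the default maxFieldLength mentioned in the message above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FieldLengthCap {
    /**
     * Counts occurrences of term among the first maxFieldLength tokens only,
     * mimicking an index that ignores everything past the cap.
     */
    static int countWithCap(List<String> tokens, String term, int maxFieldLength) {
        int limit = Math.min(tokens.size(), maxFieldLength);
        int count = 0;
        for (int i = 0; i < limit; i++) {
            if (tokens.get(i).equals(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // A 12,000-token document with the term at positions 5 and 11,000.
        List<String> tokens = new ArrayList<>(Collections.nCopies(12000, "filler"));
        tokens.set(5, "lucene");
        tokens.set(11000, "lucene");

        // With the cap, the second occurrence is never seen.
        System.out.println(countWithCap(tokens, "lucene", 10000));             // prints 1
        System.out.println(countWithCap(tokens, "lucene", Integer.MAX_VALUE)); // prints 2
    }
}
```

Documents shorter than the cap give identical counts either way, which matches the observation that only larger documents showed discrepancies.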
From: Chris Hostetter [mailto:[EMAIL P
manually, or
by QueryParser). the direct equals comparisons you are doing should be
fine.
have you tried adding logging of the raw term field/text and the freq
counts you get back to see if that helps you spot the problem?
: Date: Mon, 6 Feb 2006 14:34:05 -0800
: From: Dmitry Goldenberg <[EMA
Given a query, I want to be able to, for each query term, get the number of
occurrences of the term. I have tried what I'm including below and it does not
seem to provide reliable results. Seems to work fine with exact matching but
as soon as stemming kicks in, all bets are off as to value of
d fashion, e.g. function\() -- or is function() ok?
Thanks,
- Dmitry
From: Michael D. Curtin [mailto:[EMAIL PROTECTED]
Sent: Fri 1/27/2006 2:14 PM
To: java-user@lucene.apache.org
Subject: Re: How to find "function()" - ?
Dmitry Goldenberg wrote:
>
"function()" - the query succeeds but what it finds is actually "function", not
"function()". If I run the same query against "function { statement1,
statement2 }" it still succeeds and I get "function" in best fragments.
How can I enforce () to be included?
Thanks,
- Dmitry
Dave,
Thanks for the pointer. The Wrapper worked marvellously! This was exactly the
situation - wanting to treat the standard fields and keyword fields differently
as far as stemming is concerned (no stemming for the latter).
- Dmitry
From: Dave Kor
clues?
From: Dmitry Goldenberg [mailto:[EMAIL PROTECTED]
Sent: Tue 1/24/2006 3:52 PM
To: java-user@lucene.apache.org
Cc: java-dev@lucene.apache.org
Subject: java.io.IOException: read past EOF in BufferedIndexInput.refill
Has anyone seen this exception and been able to resolve the
Has anyone seen this exception and been able to resolve the cause? I have seen
numerous mentions of it in the Lucene lists archives but no resolutions, looks
like. Anyone? Thanks.
java.io.IOException: read past EOF
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java
ly DocType:xls. So, the query does not bring
back the expected results.
Has anyone run into this issue? How do I work around it?
Thanks,
- Dmitry
Hi,
Can someone provide a quick summary of the Regex capabilities in Lucene? I see
there's a RegexQuery and a SpanRegexQuery - what are they intended for and how
do I use them?
Thanks,
- Dmitry
Lucene's findings in the actual documents.
I am probably "all wet" but if this makes any sense, please give me a shout.
Thanks,
- Dmitry
back end, can the back
end always call rewrite or not?
By looking at the code, it seems that things just get reinterpreted as
BooleanQuery's and such. I just want to know if there are any pitfalls to
watch out for.
Thanks again,
- Dmitry
From: Er
Erik,
What do you mean by _rewriting_ the query? I checked all the classes in the
highlighter package and did not see any mention of having to rewrite.
Sorry for the hijacking, didn't mean to be a terrorist :)
- Dmitry
From: Erik Hatcher [mailto:[
first escaping it, as
just item+with+pluses. In this case, Keyword.Name:item\+with\+pluses still
brought back no results.
Has anyone run into this issue before? Any recommendations?
Thanks,
- Dmitry
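For reference, the escaped form shown in this message can be produced mechanically. The sketch below is a hypothetical plain-Java helper, not Lucene's own code; the character set is an assumption based on the single-character specials of the classic query syntax. Lucene's QueryParser also provides an escape method for this purpose, which is preferable in real code.

```java
class QueryEscapeSketch {
    // Assumed set of single characters that the classic query syntax treats as special.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    /** Prefixes every special character in raw with a backslash. */
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() * 2);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Only the value is escaped; the field name and its colon are left alone.
        System.out.println("Keyword.Name:" + escape("item+with+pluses"));
        // prints Keyword.Name:item\+with\+pluses
    }
}
```

Escaping only affects how the query string is parsed. Whether the resulting term matches still depends on how the field was tokenized at index time, which this sketch does not address.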
with any other query and highlight any matching token
sequences? E.g. if I'm searching for lava~, I'd expect it to highlight words
like lava, java, etc. This is the whole point of highlighting, is it not?
Thanks,
- Dmitry
Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tue 12/27/2005 10:56 AM
To: java-user@lucene.apache.org
Subject: Re: Proximity searches and Porter stemming - ??
On Dec 27, 2005, at 1:45 PM, Dmitry Goldenberg wrote:
> I tried using Porter stemming in our application and it worked
> great exc
ts than no results at all, the
latter being the case I've observed.
Any recommendations?
Thanks,
- Dmitry
mized enough
performance-wise so as not to clog up the results filtering process...
Hope this helps,
- Dmitry
From: Murali [mailto:[EMAIL PROTECTED]
Sent: Wed 12/21/2005 9:32 AM
To: java-user@lucene.apache.org
Subject: searching portions of an index
Hi,
I a