Backward compatibility of FST50 and UniformSplit formats

2021-04-18 Thread Dmitry Emets
Hi! I cannot open indexes created by Lucene 8.5 with Lucene master. I get the error: Exception in thread "main" org.apache.lucene.index.CorruptIndexException: codec mismatch: actual codec=Lucene84PostingsWriterDoc vs expected codec=Lucene90PostingsWriterDoc (resource=MMapIndexInput(path="C:\data\luc

Re: Deduplication of search result with custom sort

2020-10-13 Thread Dmitry Emets
ck up and ask what the use-case > > > > is. Returning 6.5M docs to a user is useless, so are you doing > > > > some kind of analytics maybe? In which case, and again > > > > assuming you’re using Solr, Streaming Aggregation might > > > > be a better

Re: Deduplication of search result with custom sort

2020-10-12 Thread Dmitry Emets
e you’re doing > > > some kind of analytics maybe? In which case, and again > > > assuming you’re using Solr, Streaming Aggregation might > > > be a better option. > > > > > > This really sounds like an XY problem. You’re trying to solve problem X >

Re: Deduplication of search result with custom sort

2020-10-09 Thread Dmitry Emets
problem X > and asking how to accomplish it with Y. What I’m questioning > is whether Y (grouping) is a good approach or not. Perhaps if > you explained X there’d be a better suggestion. > > Best, > Erick > > > On Oct 9, 2020, at 8:19 AM, Dmitry Emets wrote: > > >

Re: Deduplication of search result with custom sort

2020-10-09 Thread Dmitry Emets
I have 12,000,000 documents in 6,500,000 groups. With sort, it takes around 1 s without grouping, 2 s with grouping, and 12 s with setAllGroups(true). Without sort, it takes around 0.2 s without grouping, 0.6 s with grouping, and 10 s with setAllGroups(true). Thank you, Erick, I will look in

Re: Deduplication of search result with custom sort

2020-10-09 Thread Dmitry Emets
Yes, it is. On Fri, 9 Oct 2020 at 14:25, Diego Ceccarelli (BLOOMBERG/ LONDON) <dceccarel...@bloomberg.net> wrote: > Is the field that you are using to dedupe stored as a docvalue? > > From: java-user@lucene.apache.org At: 10/09/20 12:18:04 To: > java-user@lucene.apache.org > Subject: Deduplication of sea

Deduplication of search result with custom sort

2020-10-09 Thread Dmitry Emets
Hi, I need to deduplicate search results by a specific field, and I have no idea how to implement this properly. I have tried grouping with setGroupDocsLimit(1); it gives me the expected results, but the performance is not very good. I think that I need something like DiversifiedTopDocsCollector, but suit
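Grouping with setGroupDocsLimit(1) amounts to keeping only the best hit per distinct field value. A minimal stdlib sketch of that idea, assuming the hits already arrive in the final sort order; Hit and FirstPerGroup are hypothetical names, not Lucene classes, and this is not how Lucene's grouping module is implemented:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical hit type: a doc id, the value of the dedup field, and the sort value.
final class Hit {
    final int docId;
    final String groupKey;
    final float sortValue;

    Hit(int docId, String groupKey, float sortValue) {
        this.docId = docId;
        this.groupKey = groupKey;
        this.sortValue = sortValue;
    }
}

final class FirstPerGroup {
    /**
     * Walks hits that are already in final sort order and keeps only the first
     * hit seen for each group key, stopping once topN distinct groups are found.
     */
    static List<Hit> dedupe(List<Hit> sortedHits, int topN) {
        Set<String> seenKeys = new HashSet<>();
        List<Hit> result = new ArrayList<>();
        for (Hit hit : sortedHits) {
            if (result.size() == topN) {
                break; // early exit: the remaining hits cannot enter the top N
            }
            if (seenKeys.add(hit.groupKey)) { // add() returns false for a repeated key
                result.add(hit);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit(7, "a", 9.0f), new Hit(3, "a", 8.0f),
            new Hit(5, "b", 7.5f), new Hit(1, "c", 6.0f));
        for (Hit h : dedupe(hits, 2)) {
            System.out.println(h.docId + " " + h.groupKey); // 7 a, then 5 b
        }
    }
}
```

Because the walk stops after topN distinct keys, it never learns the total number of groups, which is the extra information setAllGroups(true) asks for.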

Re: Revolution writeup

2013-11-26 Thread Dmitry Kan
fi/2013/11/lucene-revolution-eu-2013-in-dublin-day_13.html Dmitry On Mon, Nov 25, 2013 at 8:42 PM, Michael Sokolov < msoko...@safaribooksonline.com> wrote: > I just posted a writeup of the Lucene/Solr Revolution Dublin conference. > I've been waiting for videos to become ava

Re: Is it possible to combine Wildcard and Phrasequery for the Queryparser

2011-10-13 Thread Dmitry Savenko
} } while (termEnum.next()); // adding last term variations mq.add(qTail.toArray(new Term[] {})); // mq is now the query you need Best regards, Dmitry. - Original Message - From: "Ralf Heyde" To: java-user@lucene.apache.org Sent: Thursday, October 13, 2011 5:07:20 PM Subject
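The fragment above follows a familiar pattern: walk the term enumeration from the prefix of the wildcard word, collect every matching term, and add the whole array as the last position of what is presumably a MultiPhraseQuery. A stdlib sketch of just the expansion step, with a TreeSet standing in (as an assumption) for the index term dictionary; PrefixExpansion is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class PrefixExpansion {
    /** Returns every term in the sorted dictionary that starts with prefix, in term order. */
    static List<String> expand(NavigableSet<String> sortedTerms, String prefix) {
        List<String> variations = new ArrayList<>();
        // tailSet starts at the first term >= prefix, much like seeking a term enumerator
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: once the prefix stops matching, no later term can match
            }
            variations.add(term);
        }
        return variations;
    }

    public static void main(String[] args) {
        NavigableSet<String> dict = new TreeSet<>(List.of("lucene", "lucid", "luck", "lunar", "solr"));
        System.out.println(expand(dict, "luc")); // [lucene, lucid, luck]
    }
}
```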

How to get hit offsets?

2011-09-12 Thread Dmitry Savenko
"offset" is. But I don't really need sophisticated queries. I just need simple substring search. Maybe Lucene is not supposed to be used that way. But I also need to manage a number of big files and be able to search in multiple files at once and produce results quickly - th

Re: getting all Lucene internal IDs

2009-06-19 Thread Dmitry Lizorkin
Iterate over all ints from 0 .. IndexReader.maxDoc() (exclusive) and call IndexReader.isDeleted? Excellent, works perfectly for us! Michael, thank you very much for your help! Best regards, Dmitry - To unsubscribe, e-mail
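The suggestion above is a plain loop over document numbers. A stdlib sketch of it, where DocSource is a hypothetical stand-in for the two IndexReader methods named in the thread (the 2009-era API; later Lucene versions expose deletions differently), so no real reader is needed to run it:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class AllLiveDocIds {
    /** Hypothetical stand-in for the two reader methods named in the thread. */
    interface DocSource {
        int maxDoc();                 // one greater than the largest document number
        boolean isDeleted(int docId);
    }

    /** Collects every document number in [0, maxDoc) that is not marked deleted. */
    static List<Integer> liveDocIds(DocSource reader) {
        List<Integer> ids = new ArrayList<>();
        for (int docId = 0; docId < reader.maxDoc(); docId++) { // upper bound is exclusive
            if (!reader.isDeleted(docId)) {
                ids.add(docId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(1);
        deleted.set(3);
        DocSource fake = new DocSource() {
            public int maxDoc() { return 5; }
            public boolean isDeleted(int docId) { return deleted.get(docId); }
        };
        System.out.println(liveDocIds(fake)); // [0, 2, 4]
    }
}
```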

Re: getting all Lucene internal IDs

2009-06-19 Thread Dmitry Lizorkin
g? Thank you for your prompt reply Dmitry

getting all Lucene internal IDs

2009-06-19 Thread Dmitry Lizorkin
Hello! What is the appropriate way to obtain Lucene internal IDs for _all_ the tuples stored in a Lucene index? Thank you for your help Dmitry

Re: How to keep user search history and how to turn it into information?

2007-08-10 Thread Dmitry
Lucas, probably one solution would be to use a database such as MySQL and set up Lucene against it; in this case you don't need to worry as much about an implementation based on the content storage. Also, you need to create a middle tier to catch all events concerning Users Search / History

Re: Get the terms and frequency vector of an indexed but unstored field

2007-08-05 Thread Dmitry
What is the advantage of using a term frequency vector? thanks, DT www.ejinz.com Search News - Original Message - From: "Kai Hu" <[EMAIL PROTECTED]> To: Sent: Sunday, August 05, 2007 8:40 PM Subject: Re: Get the terms and frequency vector of an indexed but unstored field you use the flag

Re: How do YOU detect corrupt indexes?

2007-08-02 Thread Dmitry
Not sure exactly how to understand "corrupted indexes": in the sense that the indexes could not be read or used, or something else? thanks DT www.ejinz.com EjinZ Search Engine - Original Message - From: "Doron Cohen" <[EMAIL PROTECTED]> To: Sent: Friday, August 03, 2007 1:03 AM Subject: Re: How do

Detection of index duplicates in Lucene

2007-07-28 Thread Dmitry
We are trying to find out whether there is any implementation for Lucene that detects index duplicates. Assume we have a set of documents, and a document is a bunch of words. After we create indexes for the same document, we need to know that all indexes will be unique for a specific document (lexical equivalence).
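This is not a Lucene feature, only one simple way to test "lexical equivalence" as described above: reduce each document to an order-insensitive fingerprint of its words, so two documents with the same bag of words get the same string. A hedged stdlib sketch; LexicalFingerprint is a hypothetical name, and the tokenization rule (split on anything that is not a letter or digit) is an assumption. In practice you might store the fingerprint, or a hash of it, in its own untokenized field and check it before adding a document.

```java
import java.util.Arrays;
import java.util.Locale;

final class LexicalFingerprint {
    /**
     * Order-insensitive fingerprint: lowercase the text, split it into word tokens,
     * sort the tokens and join them. Two documents with the same multiset of words
     * get the same fingerprint regardless of word order.
     */
    static String fingerprint(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        String[] words = Arrays.stream(tokens)
            .filter(t -> !t.isEmpty()) // split() can yield a leading empty token
            .sorted()
            .toArray(String[]::new);
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        String a = fingerprint("The quick brown fox");
        String b = fingerprint("fox, brown, quick: the");
        System.out.println(a.equals(b)); // true
    }
}
```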

Re: lucene integration with PDM Windchill (Product Data Management System)

2007-07-28 Thread Dmitry
- Original Message - From: "Dmitry" <[EMAIL PROTECTED]> To: Sent: Saturday, July 28, 2007 6:56 PM Subject: Re: lucene integration with PDM Windchill (Product Data Management System) Karl, thanks for the help. I will try to explain the requirements. There is a PDM system - product Data M

Re: lucene integration with PDM Windchill (Product Data Management System)

2007-07-28 Thread Dmitry
Document classes. This was just a short description of the architecture of PDMLink - Windchill. So we need to create some Lucene services (processors) embedded in the system, using extended interfaces for creating indexes and searching all documents by attributes. thanks, Dmitry www.ejinz.com Search Eng

Re: NPE in MultiReader

2007-07-27 Thread Dmitry
What conditions are you running Lucene under, such as configuration and parameters? Can you describe them in more detail? thanks, dt, www.ejinz.com Search Engine News - Original Message - From: "testn" <[EMAIL PROTECTED]> To: Sent: Friday, July 27, 2007 7:50 PM Subject: NPE in MultiReader

Re: Linear Hashing in Lucene?

2007-07-26 Thread Dmitry
26, 2007 3:49 PM Subject: Re: Linear Hashing in Lucene? On 26 Jul 2007, at 05:56, Dmitry wrote: 1. Is there an ontology wrapper in the Lucene implementation? Not publicly available as far as I know. There have been some discussions on the forums, though; you could try to search for OWL, RDF or something

Linear Hashing in Lucene?

2007-07-25 Thread Dmitry
Hey, some common questions about Lucene. 1. Is there an ontology wrapper in the Lucene implementation? 2. Does Lucene use linear hashing? thanks, DT, www.ejinz.com Search news

Highlighter strategy in Lucene

2007-07-25 Thread Dmitry
What kind of highlighter strategy does Lucene use? thanks, Dt www.ejinz.com Search Engine for News

Displaying results in the order

2007-07-25 Thread Dmitry
Is there a way to update a document in the Index without causing any change to the order in which it comes up in searches? thanks, DT, www.ejinz.com Search everything news, tech, movies, music

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Dmitry
Askar, why do you need to add +id:? thanks, dt, www.ejinz.com search engine news forms - Original Message - From: "Askar Zaidi" <[EMAIL PROTECTED]> To: ; <[EMAIL PROTECTED]> Sent: Wednesday, July 25, 2007 12:39 AM Subject: Re: Fine Tuning Lucene implementation Hey Hira , Thanks so mu

Re: CJKAnalyzer - Issues with scoring

2007-07-22 Thread Dmitry
Nina, can you point me to the link where I can get documentation about CJKAnalyzer. thanks, DT www.ejinz.com Search Engine Economy - Original Message - From: "Nina Khosravi" <[EMAIL PROTECTED]> To: Sent: Sunday, July 22, 2007 11:38 PM Subject: CJKAnalyzer - Issues with scoring He

Re: Lucene indexing for PDM system like Windchill

2007-07-22 Thread Dmitry
carme" <[EMAIL PROTECTED]> To: Sent: Sunday, July 22, 2007 4:16 AM Subject: Re: Lucene indexing for PDM system like Windchill If you want to index Hibernate-persisted data, just use Compass. M. On 22 Jul 07, at 04:19, Dmitry wrote: Folks, Trying to integrate PDM system : WTPart ob

Lucene indexing for PDM system like Windchill

2007-07-21 Thread Dmitry
Folks, I am trying to integrate the PDM system's WTPart object with the Lucene indexing and search framework. Part of the work is integration with the persistence layer, index storage, and MySQL. I could not find a good solution... please advise. thanks, DT www.ejinz.com search engine -

Re: Upgrade of Lucene from 1.9.1 to 2.2.0 in JIRA

2007-07-20 Thread Dmitry
Andreas, Thanks, I get it now. www.ejinz.com - Original Message - From: "Andreas Knecht" <[EMAIL PROTECTED]> To: Sent: Friday, July 20, 2007 2:23 AM Subject: Re: Upgrade of Lucene from 1.9.1 to 2.2.0 in JIRA Hi Dmitry, The full re-index is necessary, because DateTo

Re: Upgrade of Lucene from 1.9.1 to 2.2.0 in JIRA

2007-07-19 Thread Dmitry
Folks, why do we need full re-indexing when switching from DateFields to DateTools? Might I have the same issue in the future? Thanks, Dm www.ejinz.com - Original Message - From: "Andreas Knecht" <[EMAIL PROTECTED]> To: Sent: Friday, July 20, 2007 1:32 AM Subject: Upgrade of Luce
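Andreas's answer is truncated in this listing, so what follows is only a hedged illustration of why a date-format switch usually forces a re-index: indexed terms are matched as exact strings, and the two schemes produce different strings for the same instant, so terms written in the old format never match queries built in the new one. The encodings below are assumptions that approximate DateField (zero-padded base-36 milliseconds) and DateTools at second resolution (a UTC yyyyMMddHHmmss digit string); check the Lucene source for your version. DateEncodings is a hypothetical name.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class DateEncodings {
    // Assumption: pad width chosen, as in the old DateField, to fit ~1000 years of millis in base 36.
    private static final int DATE_LEN =
        Long.toString(1000L * 365 * 24 * 60 * 60 * 1000, Character.MAX_RADIX).length();

    /** Approximates the DateField-style term: base-36 milliseconds, left-padded with zeros. */
    static String base36Millis(long time) {
        StringBuilder s = new StringBuilder(Long.toString(time, Character.MAX_RADIX));
        while (s.length() < DATE_LEN) {
            s.insert(0, '0'); // padding keeps lexicographic order equal to numeric order
        }
        return s.toString();
    }

    /** Approximates the DateTools-style term at second resolution, in UTC. */
    static String readableSeconds(long time) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(new Date(time));
    }

    public static void main(String[] args) {
        long t = 0L; // 1970-01-01T00:00:00Z
        System.out.println(base36Millis(t));    // 000000000
        System.out.println(readableSeconds(t)); // 19700101000000
    }
}
```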

Doc classification / categorization with Lucene ?

2006-11-06 Thread Dmitry Goldenberg
Hello, What are the best practices for document classification / categorization using Lucene? Any recommendations as far as manual vs. automatic, which products to use or not to use? Does Lucene offer anything out of the box? Thanks, - Dmitry

RE: Lucene - FileFormat

2006-04-21 Thread Dmitry Goldenberg
Simon, I wonder if using Zoe might do the trick - http://guests.evectors.it/zoe/ Have you tried it? - Dmitry From: Fisheye [mailto:[EMAIL PROTECTED] Sent: Fri 4/21/2006 7:23 AM To: java-user@lucene.apache.org Subject: Lucene - FileFormat Im trying to

RE: Distributed Lucene.. - clustering as a requirement

2006-04-11 Thread Dmitry Goldenberg
Agreed, an inverted index cannot be efficiently maintained in a B-tree (hence RDBMS). But I think we can (or should) have the option of B-tree-based storage for unindexed fields, whereas for indexed fields we can use Lucene's existing architecture. prasen [EMAIL PROTECTED] wrote: >

RE: Distributed Lucene.. - clustering as a requirement

2006-04-06 Thread Dmitry Goldenberg
r storing the actual "documents"? This way you're using lucene for what lucene is best at, and using the database for what it's good at. At least up to a point -- RDBMSs have their limits too. OR maybe if you have a huge dataset, you might want to check out Nutch. On 4/6/06, Dmitry G

RE: Distributed Lucene.. - clustering as a requirement

2006-04-06 Thread Dmitry Goldenberg
xing structures at a single byte level are just way too much trouble to deal with for an application integrator like myself. - Dmitry From: Samuru Jackson [mailto:[EMAIL PROTECTED] Sent: Mon 3/6/2006 10:05 AM To: java-user@lucene.apache.org Subject: Re: Distributed

RE: Data structure of a Lucene Index

2006-04-06 Thread Dmitry Goldenberg
Ideally, I'd love to see an article explaining both in detail: the index structure as well as the merge algorithm... From: Prasenjit Mukherjee [mailto:[EMAIL PROTECTED] Sent: Tue 3/28/2006 11:57 PM To: java-user@lucene.apache.org Subject: Data structure of a Luce

RE: How to get mapping of query terms to number of their occurrences in a doc?

2006-02-09 Thread Dmitry Goldenberg
rmance, so maybe we could also make this more common setting the default also? Erik On Feb 8, 2006, at 2:17 PM, Dmitry Goldenberg wrote: > Duh! Bingo! Mystery solved. I should have thought of this :) > The discrepancies come in with larger documents, definitely > 10K > terms whi

RE: Word files & Build vs. Buy?

2006-02-09 Thread Dmitry Goldenberg
Chris, Awesome stuff. A few questions: is your Excel extractor somehow better than POI's? and, what do you see as the timeframe for adding WordPerfect support? Are you considering supporting any other sources such as MS Project, Framemaker, etc? Thanx, - D

RE: How to get mapping of query terms to number of their occurrences in a doc?

2006-02-08 Thread Dmitry Goldenberg
Duh! Bingo! Mystery solved. I should have thought of this :) The discrepancies come in with larger documents, definitely > 10K terms, which is Lucene's default maxFieldLength. Thanks for your help, Chris - Dmitry From: Chris Hostetter [mailto:[EMAIL P
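The discrepancy comes from truncation: with the old default maxFieldLength of 10,000, tokens past that point were never indexed, so index-side frequencies undercount long documents. A toy stdlib sketch of the effect; TruncatedCount is a hypothetical name, not Lucene code. As far as I recall, the 2.x-era IndexWriter exposed a setMaxFieldLength setter for raising the limit, but verify this against the version you use.

```java
import java.util.Arrays;

final class TruncatedCount {
    /**
     * Counts occurrences of term among the first maxFieldLength tokens only,
     * mimicking an indexer that silently stops indexing a field after that many tokens.
     */
    static int indexedFrequency(String[] tokens, String term, int maxFieldLength) {
        int limit = Math.min(tokens.length, maxFieldLength);
        int count = 0;
        for (int i = 0; i < limit; i++) {
            if (tokens[i].equals(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] tokens = new String[15_000];
        Arrays.fill(tokens, "lucene");
        System.out.println(indexedFrequency(tokens, "lucene", 10_000)); // 10000, not 15000
    }
}
```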

RE: How to get mapping of query terms to number of their occurrences in a doc?

2006-02-08 Thread Dmitry Goldenberg
manually, or by QueryParser). the direct equals comparisons you are doing should be fine. have you tried adding logging of the raw term field/text and the freq counts you get back to see if that helps you spot the problem? : Date: Mon, 6 Feb 2006 14:34:05 -0800 : From: Dmitry Goldenberg <[EMA

How to get mapping of query terms to number of their occurrences in a doc?

2006-02-06 Thread Dmitry Goldenberg
Given a query, I want to be able, for each query term, to get the number of occurrences of that term in a document. I have tried what I'm including below, and it does not seem to provide reliable results. It seems to work fine with exact matching, but as soon as stemming kicks in, all bets are off as to the value of
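The code Dmitry refers to is not included in this listing, so this is only a sketch of the general principle involved: per-term counts line up with what the index stores only if the query terms and the document tokens pass through the same normalization. QueryTermCounts is a hypothetical name, and toyStem is a deliberately crude stand-in (lowercase and strip a trailing "s"), NOT the Porter stemmer.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

final class QueryTermCounts {
    /**
     * Counts how often each query term occurs in a document, after passing BOTH the
     * query terms and the document tokens through the same normalizer.
     */
    static Map<String, Integer> counts(List<String> queryTerms, List<String> docTokens,
                                       UnaryOperator<String> normalize) {
        Map<String, Integer> docFreqs = new HashMap<>();
        for (String token : docTokens) {
            docFreqs.merge(normalize.apply(token), 1, Integer::sum);
        }
        Map<String, Integer> result = new LinkedHashMap<>(); // keeps query-term order
        for (String q : queryTerms) {
            result.put(q, docFreqs.getOrDefault(normalize.apply(q), 0));
        }
        return result;
    }

    /** Crude stand-in for a stemmer: lowercase, then drop one trailing "s" from longer words. */
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        return (w.length() > 3 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("Documents", "document", "search", "index");
        List<String> query = List.of("documents", "search", "lucene");
        System.out.println(counts(query, doc, QueryTermCounts::toyStem)); // {documents=2, search=1, lucene=0}
    }
}
```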

RE: How to find "function()" - ?

2006-01-30 Thread Dmitry Goldenberg
d fashion, e.g. function\() -- or is function() ok? Thanks, - Dmitry From: Michael D. Curtin [mailto:[EMAIL PROTECTED] Sent: Fri 1/27/2006 2:14 PM To: java-user@lucene.apache.org Subject: Re: How to find "function()" - ? Dmitry Goldenberg wrote: >

How to find "function()" - ?

2006-01-27 Thread Dmitry Goldenberg
"function()" - the query succeeds but what it finds is actually "function", not "function()". If I run the same query against "function { statement1, statement2 } it still succeeds and I get "function" in best fragments. How can I force the () to be included? Thanks, - Dmitry

RE: Keyword fields, Porter stemming, and QueryParser

2006-01-25 Thread Dmitry Goldenberg
Dave, Thanks for the pointer. The Wrapper worked marvellously! This was exactly the situation - wanting to treat the standard fields and keyword fields differently as far as stemming is concerned (no stemming for the latter). - Dmitry From: Dave Kor

RE: java.io.IOException: read past EOF in BufferedIndexInput.refill

2006-01-24 Thread Dmitry Goldenberg
clues? From: Dmitry Goldenberg [mailto:[EMAIL PROTECTED] Sent: Tue 1/24/2006 3:52 PM To: java-user@lucene.apache.org Cc: java-dev@lucene.apache.org Subject: java.io.IOException: read past EOF in BufferedIndexInput.refill Has anyone seen this exception and been able to resolve the

java.io.IOException: read past EOF in BufferedIndexInput.refill

2006-01-24 Thread Dmitry Goldenberg
Has anyone seen this exception and been able to resolve the cause? I have seen numerous mentions of it in the Lucene lists archives but no resolutions, looks like. Anyone? Thanks. java.io.IOException: read past EOF at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java

Keyword fields, Porter stemming, and QueryParser

2006-01-24 Thread Dmitry Goldenberg
ly DocType:xls. So, the query does not bring back the expected results. Has anyone run into this issue? How do I work around it? Thanks, - Dmitry

Lucene and Regex - ?

2006-01-04 Thread Dmitry Goldenberg
Hi, Can someone provide a quick summary of the Regex capabilities in Lucene? I see there's a RegexQuery and a SpanRegexQuery - what are they intended for and how do I use them? Thanks, - Dmitry

Correlating best fragments back to native documents - ?

2005-12-29 Thread Dmitry Goldenberg
Lucene's findings in the actual documents. I am probably "all wet" but if this makes any sense, please give me a shout. Thanks, - Dmitry

RE: Wildcard and Fuzzy queries - no best fragments generated - ??

2005-12-27 Thread Dmitry Goldenberg
back end, can the back end always call rewrite or not? By looking at the code, it seems that things just get reinterpreted as BooleanQuery's and such. I just want to know if there are any pitfalls to watch out for. Thanks again, - Dmitry From: Er

RE: Wildcard and Fuzzy queries - no best fragments generated - ??

2005-12-27 Thread Dmitry Goldenberg
Erik, What do you mean by _rewriting_ the query? I checked all the classes in the highlighter package and did not see any mention of having to rewrite. Sorry for the hijacking, didn't mean to be a terrorist :) - Dmitry From: Erik Hatcher [mailto:[

Field searches and special characters - ??

2005-12-27 Thread Dmitry Goldenberg
first escaping it, as just item+with+pluses. In this case, Keyword.Name:item\+with\+pluses still brought back no results. Has anyone run into this issue before? Any recommendations? Thanks, - Dmitry
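Escaping tells the query parser to treat a character literally. A stdlib sketch of that escaping step; QueryEscaper is a hypothetical name, and the character set is an assumption modeled on later versions of QueryParser.escape, so check it against your Lucene version. Escaping alone may not be enough: one common cause worth checking is that the query text still goes through the analyzer, which can split "item+with+pluses" into separate tokens that never match an untokenized Keyword field.

```java
final class QueryEscaper {
    // Assumption: query-syntax characters escaped by later QueryParser.escape versions.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Prefixes every query-syntax character with a backslash. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("item+with+pluses")); // item\+with\+pluses
    }
}
```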

Wildcard and Fuzzy queries - no best fragments generated - ??

2005-12-27 Thread Dmitry Goldenberg
with any other query and highlight any matching token sequences? E.g. if I'm searching for lava~, I'd expect it to highlight words like lava, java, etc. This is the whole point of highlighting, is it not? Thanks, - Dmitry
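Erik's reply in this thread points at rewriting the query: a fuzzy or wildcard query is only a pattern until it is expanded into concrete index terms, and the highlighter needs those terms. A stdlib sketch of that expansion for lava~, using a plain Levenshtein cutoff of one edit as a simplifying assumption (FuzzyQuery of that era used a relative similarity threshold instead); FuzzyExpansion is a hypothetical name, not Lucene's rewrite code.

```java
import java.util.ArrayList;
import java.util.List;

final class FuzzyExpansion {
    /** Dynamic-programming Levenshtein distance (insert, delete and substitute each cost 1). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Expands a fuzzy term into the concrete dictionary terms within maxEdits. */
    static List<String> expand(List<String> dictionary, String term, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String candidate : dictionary) {
            if (editDistance(term, candidate) <= maxEdits) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> dict = List.of("java", "lava", "larva", "solr");
        System.out.println(expand(dict, "lava", 1)); // [java, lava, larva]
    }
}
```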

RE: Proximity searches and Porter stemming - ??

2005-12-27 Thread Dmitry Goldenberg
Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Tue 12/27/2005 10:56 AM To: java-user@lucene.apache.org Subject: Re: Proximity searches and Porter stemming - ?? On Dec 27, 2005, at 1:45 PM, Dmitry Goldenberg wrote: > I tried using Porter stemming in our application and it worked > great exc

Proximity searches and Porter stemming - ??

2005-12-27 Thread Dmitry Goldenberg
ts than no results at all, the latter being the case I've observed. Any recommendations? Thanks, - Dmitry

RE: searching portions of an index

2005-12-25 Thread Dmitry Goldenberg
mized enough performance-wise so as not to clog up the results filtering process... Hope this helps, - Dmitry From: Murali [mailto:[EMAIL PROTECTED] Sent: Wed 12/21/2005 9:32 AM To: java-user@lucene.apache.org Subject: searching portions of an index Hi, I a