Re: Getting LinkageError due to Panama APIs

2023-06-30 Thread Uwe Schindler
Hi, It is not obvious what you have done, but the issue may come from custom builds, e.g., if you are not using the original Lucene JAR file but a modified one. Another reason may be Maven Shade plugin or other assemblies like Uber-JARs! Make sure that all class files and module information

Re: Getting LinkageError due to Panama APIs

2023-06-29 Thread Shubham Chaudhary
This was an internal build issue that is now fixed. Sorry for the confusion. Thanks, Shubham On Tue, Jun 27, 2023 at 12:48 AM Shubham Chaudhary wrote: > Hi everyone, > > I’m trying to build and run my software using JDK 19 which has a direct > dependency on Apache Lucene 9.6 built with JDK 17 a

Re: Getting all values for a specific dimension for SortedSetDocValues per document

2022-07-01 Thread Harald Braumann
Hi! Thanks a lot for your help. I will try both of your suggestions (taxo index and per-segment ord ranges). Thanks for clarifying that I have to iterate the ords. I wasn't sure if I hadn't just overlooked something obvious, like some way to do an advanceExact on ords. Regards harry On 01.

Re: Getting all values for a specific dimension for SortedSetDocValues per document

2022-07-01 Thread Greg Miller
To address the last topic (building up ordinal ranges per-segment), what I'm thinking is that you'd iterate all unique ordinals in the SSDV field and "memorize" the ordinal range for each dimension up-front, but on a per-segment basis. This would be very similar to what DefaultSortedSetDocValuesRea

Re: Getting all values for a specific dimension for SortedSetDocValues per document

2022-07-01 Thread Harald Braumann
Hi! On 01.07.22 00:46, Greg Miller wrote: Have you considered taxonomy faceting for your use-case? Because the taxonomy structure is maintained in a separate index, it's (relatively) trivial to iterate all direct child ordinals of a given dimension. The cost of mapping to a global ordinal space

Re: Getting all values for a specific dimension for SortedSetDocValues per document

2022-06-30 Thread Greg Miller
Hi Harry- Have you considered taxonomy faceting for your use-case? Because the taxonomy structure is maintained in a separate index, it's (relatively) trivial to iterate all direct child ordinals of a given dimension. The cost of mapping to a global ordinal space is done when the index is merged.

Re: Getting a MaxBytesLengthExceededException for a TextField

2019-10-25 Thread Erick Erickson
Text-based fields indeed do not have that limit for the _entire_ field. They _do_ have that limit for any single token produced. So if your field contains, say, a base-64 encoded image that is not broken up into smaller tokens, you’ll still get this error. Best, Erick > On Oct 25, 2019, at 4:2
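The per-token byte limit Erick describes (roughly 32766 bytes, `IndexWriter.MAX_TERM_LENGTH`) can be avoided by capping token length in the analysis chain. A minimal sketch, assuming a Lucene 5+/6+ style analysis API (exact packages vary slightly by version); the 255-character cap is an illustrative choice, not a Lucene default:

```java
// Sketch: drop any single token longer than a chosen cap, so a field that
// happens to contain an unbroken blob (e.g. base-64 image data) does not
// trip the ~32766-byte per-term limit at index time.
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.miscellaneous.LengthFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public final class LengthCappedAnalyzer extends Analyzer {
  private static final int MAX_TOKEN_CHARS = 255; // well under the byte limit

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    StandardTokenizer src = new StandardTokenizer();
    // Keep only tokens whose length is in [1, MAX_TOKEN_CHARS]; longer ones are dropped.
    TokenStream tok = new LengthFilter(src, 1, MAX_TOKEN_CHARS);
    return new TokenStreamComponents(src, tok);
  }
}
```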

Re: Getting Exception : java.nio.channels.ClosedByInterruptException

2019-04-01 Thread Robert Muir
Some code interrupted (Thread.interrupt) a java thread while it was blocked on I/O. This is not safe to do with lucene, because unfortunately in this situation java's NIO code closes file descriptors and releases locks. The second exception is because the indexwriter tried to write when it no long

Re: getting Lucene Docid from inside score()

2018-03-10 Thread Erick Erickson
I was thinking this was a Solr question rather than a Lucene one so the [docid] bit doesn't apply if you're in the lucene code. If you _are_ really going from solr, just put [docid] in your Solr "fl" list. Look in the Solr ref guide for an explanation: https://lucene.apache.org/solr/guide/6_6/trans

Re: getting Lucene Docid from inside score()

2018-03-10 Thread dwaipayan . roy
Hi Erick, Many thanks for your reply and explanation. I really want this to work. The good news for me is, the index is static, there is no chance of any modification of the index. > Luke and the like are using a point-in-time snapshot of the index. I want to get that lucene-assigned docid, th

Re: getting Lucene Docid from inside score()

2018-03-09 Thread Erick Erickson
You almost certainly do _not_ want this unless you are absolutely and totally sure that your index does not change between the time you ask for the internal Lucene doc ID and the time you use it. No docs may be added. No forceMerges are done. In fact, I'd go so far as to say you shouldn't open

Re: getting Lucene Docid from inside score()

2018-03-09 Thread dwaipayan . roy
Thank you very much for your reply. Yes, I really want this (for implementing a retrieval function that extends the LMDir function). Precisely, I want the document numbering same as that we see in Lucene-Index-Viewers like Luke. I am not sure what you meant by "segment offset, held by a leaf reade

Re: getting Lucene Docid from inside score()

2018-03-09 Thread Michael Sokolov
Are you sure you want this? Lucene docids aren't generally useful outside a narrow internal context. They can change over time for example. But if you do, it sounds like maybe what you are seeing is the per segment docid. To get a global one you have to add the segment offset, held by a leaf reade
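The "segment offset" Michael mentions is the `docBase` field on the leaf (per-segment) reader context. A minimal sketch of the conversion, written as plain arithmetic so it is self-contained; in real code `docBase` would come from the `LeafReaderContext` passed to your scorer or collector:

```java
// Sketch: a scorer/collector sees per-segment docids. Adding the segment's
// docBase gives the top-level (global) docid -- the numbering tools like Luke
// display -- which is only stable for this point-in-time reader.
public final class DocIds {
  /** globalDocId = leaf.docBase + segment-local docid. */
  public static int toGlobal(int docBase, int segmentLocalDocId) {
    return docBase + segmentLocalDocId;
  }
}
```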

Re: Getting list of committed documents

2016-11-13 Thread lukes
Thanks Mike. Yeah, i saw the changelist you mentioned. Unfortunately i can't upgrade to 6.2 because of stack limitations :( . Regards. -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-list-of-committed-documents-tp4305258p4305728.html Sent from the Lucene - Java User

Re: Getting list of committed documents

2016-11-13 Thread Michael McCandless
Hi lukes, Sorry, this was a recent change in Lucene: https://issues.apache.org/jira/browse/LUCENE-7302 You need to upgrade to at least 6.2 to see it. And the long value that is returned is just an incrementing number, incremented for every op (add, update, delete) that changes the index. Mike M
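A minimal sketch of how the sequence numbers from LUCENE-7302 can be used, assuming Lucene >= 6.2 (earlier versions declare these methods `void`); the helper method is illustrative:

```java
// Sketch: every indexing op (add, update, delete) returns a monotonically
// increasing sequence number, and commit() returns the seqno of the last
// operation included in the commit -- so you can tell whether a given op
// made it into the committed index.
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

public final class CommitCheck {
  static boolean wasCommitted(IndexWriter writer, Document doc) throws IOException {
    long opSeqNo = writer.addDocument(doc); // seqno assigned to this add
    long commitSeqNo = writer.commit();     // ops with seqno <= this are durable
    return opSeqNo <= commitSeqNo;
  }
}
```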

Re: Getting list of committed documents

2016-11-12 Thread lukes
Hi Michael, Thanks for the reply. Regarding IW(IndexWriter) returning long sequence number, i looked at the signature of commit and it seems to be void. Can you please point me in the direction ? I am using Lucene 5.5.2. Also is this number aggregation of deletes, updates and new documents ? Is

Re: Getting list of committed documents

2016-11-11 Thread Michael McCandless
Hi lukes, First, IW never "auto commits". The maxBufferedDocs/RAMBufferSizeMB settings control when IW moves the recently indexed documents from RAM to disk, but that moving, which writes new segments files, does not commit them. It just writes them to disk, not visible yet to an external reader

Re: Getting list of committed documents

2016-11-10 Thread lukes
Hi, Can anyone please suggest or point in some directions. Regards. -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-list-of-committed-documents-tp4305258p4305503.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --

Re: getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Wayne Xin
e >- >Uwe Schindler >H.-H.-Meier-Allee 63, D-28213 Bremen >http://www.thetaphi.de >eMail: u...@thetaphi.de > > >> -Original Message- >> From: Wayne Xin [mailto:wayne_...@hotmail.com] >> Sent: Friday, August 14, 2015 8:44 PM >

RE: getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Uwe Schindler
; Sent: Friday, August 14, 2015 8:44 PM > To: java-user@lucene.apache.org > Subject: Re: getting full english word from tokenizing with > SmartChineseAnalyzer > > Thanks Michael. That works well. Not sure why SmartChineseAnalyzer is > final, otherwise we could overwrite createCompone

Re: getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Wayne Xin
Thanks Michael. That works well. Not sure why SmartChineseAnalyzer is final, otherwise we could override createComponents(). New output: 女 单 方面 王 适 娴 second seed 和 头号 种子 卫冕 冠军 西班牙 选手 马 林 first seed 同 处 1 4 区 3 号 种子 李 雪 芮 和 韩国 选手 korean player 成 池 铉 处在 2 4 区 不过 成 池 铉 先 要 过 日本 小将 japanese player

Re: getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Michael Mastroianni
The easiest thing to do is to create your own analyzer, cut and paste the code from org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer into it, and get rid of the line in createComponents(String fieldName, Reader reader) that says result = new PorterStemFilter(result); On Fri, Aug 14,
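A minimal sketch of Michael's suggestion, assuming the lucene-analyzers-smartcn module with a Lucene 5.x-era API (component classes differ in older versions, where SentenceTokenizer/WordTokenFilter were used instead):

```java
// Sketch: SmartChineseAnalyzer's createComponents() minus the
// PorterStemFilter line, so embedded English words come through unstemmed.
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.core.StopFilter;

public final class UnstemmedSmartChineseAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer tokenizer = new HMMChineseTokenizer();
    // SmartChineseAnalyzer would wrap `result = new PorterStemFilter(result);`
    // here; we deliberately leave it out.
    TokenStream result = new StopFilter(tokenizer, SmartChineseAnalyzer.getDefaultStopSet());
    return new TokenStreamComponents(tokenizer, result);
  }
}
```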

Re: Getting a proper ID value into every document

2015-06-05 Thread Chris Hostetter
: If you cannot do this for whatever reason, I vaguely remember someone : posting a link to a program they'd put together to do this for a : docValues field, you'd have to search the archives to find it. It was Toke - he generated DocValues for an existing index by writing an IndexReader Filter

Re: Getting a proper ID value into every document

2015-06-05 Thread Erick Erickson
My first recommendation, of course, would be to re-index the corpus with a new field. If possible, frankly, that would probably be less effort than trying to hack in an ID after the fact as well as not as error-prone. If you cannot do this for whatever reason, I vaguely remember someone posting a

RE: getting exception in lucene 4.0

2015-04-30 Thread Uwe Schindler
Hi, This generally happens if you don't deploy the original Lucene JAR files and instead create so-called super-jars (one large JAR file with all classes merged together). Unfortunately this approach fails to copy/merge relevant metadata in the META-INF folder of the original JARs. Without the

Re: Getting most occurring words in lucene

2015-02-22 Thread Michael McCandless
Use TermsEnum.totalTermFreq(), which is the total number of occurrences of the term, not TermsEnum.docFreq(), which is the number of documents that contain at least one occurrence of the term. Mike McCandless http://blog.mikemccandless.com On Sun, Feb 22, 2015 at 6:47 AM, Maisnam Ns wrote: > H
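A minimal sketch of the distinction Mike draws, assuming a Lucene 5-7 style API (in 8+ use `MultiTerms.getTerms` instead of `MultiFields.getTerms`); the method name is illustrative:

```java
// Sketch: walk all terms of one field and track the term with the highest
// totalTermFreq() (total occurrences across all docs), which is the "most
// occurring word" -- not docFreq(), which only counts containing documents.
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

public final class TopTerm {
  static String mostFrequentTerm(IndexReader reader, String field) throws IOException {
    Terms terms = MultiFields.getTerms(reader, field);
    if (terms == null) return null; // field not indexed
    TermsEnum te = terms.iterator();
    String best = null;
    long bestFreq = -1;
    BytesRef term;
    while ((term = te.next()) != null) {
      long ttf = te.totalTermFreq(); // total occurrences, not doc count
      if (ttf > bestFreq) {
        bestFreq = ttf;
        best = term.utf8ToString();
      }
    }
    return best;
  }
}
```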

Re: getting number of terms in a document/field

2015-02-08 Thread Ahmet Arslan
Hi, Sorry for my ignorance, how do I obtain AtomicReader from a IndexReader? I figured above code but it gives me a list of atomic readers. for (AtomicReaderContext context : reader.leaves()) { NumericDocValues docValues = context.reader().getNormValues(field); if (docValues != null) normValu

Re: getting number of terms in a document/field

2015-02-06 Thread Michael McCandless
On Fri, Feb 6, 2015 at 8:51 AM, Ahmet Arslan wrote: > Hi Michael, > > Thanks for the explanation. I am working with a TREC dataset, > since it is static, I set size of that array experimentally. > > I followed the DefaultSimilarity#lengthNorm method a bit. > > If default similarity and no index ti

Re: getting number of terms in a document/field

2015-02-06 Thread Ahmet Arslan
Hi Michael, Thanks for the explanation. I am working with a TREC dataset, since it is static, I set size of that array experimentally. I followed the DefaultSimilarity#lengthNorm method a bit. If default similarity and no index time boost is used, I assume that norm equals to 1.0 / Math.sqrt

Re: getting number of terms in a document/field

2015-02-06 Thread Michael McCandless
How will you know how large to allocate that array? The within-doc term freq can in general be arbitrarily large... Lucene does not directly store the total number of terms in a document, but it does store it approximately in the doc's norm value. Maybe you can use that? Alternatively, you can s

Re: Getting min/max of numeric doc-values facets

2014-10-09 Thread Chris Hostetter
: Is there some way when faceted search is executed, we can retrieve the : possible min/max values of numeric doc-values field with supplied custom : ranges in (LongRangeFacetCounts) or some other way to do it ? : : As i believe this can give application hint, and next search request can be : muc

RE: getting exception while deploying on axis 2

2014-09-25 Thread Uwe Schindler
e- > From: Rajendra Rao [mailto:rajendra@launchship.com] > Sent: Thursday, September 25, 2014 11:28 AM > To: java-user@lucene.apache.org > Subject: Re: getting exception while deploying on axis 2 > > Hello Uwe, > > My project Is java project built in eclipse and I

Re: getting exception while deploying on axis 2

2014-09-25 Thread Rajendra Rao
ka.tuf...@launchship.com] > > Sent: Thursday, September 25, 2014 9:22 AM > > To: java-user@lucene.apache.org > > Subject: Re: getting exception while deploying on axis 2 > > > > thanks Uwe for your reply, > > > > Can you explain what you mean by *original* JAR files of

RE: getting exception while deploying on axis 2

2014-09-25 Thread Uwe Schindler
remen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Priyanka Tufchi [mailto:priyanka.tuf...@launchship.com] > Sent: Thursday, September 25, 2014 9:22 AM > To: java-user@lucene.apache.org > Subject: Re: getting exception while deploying on axis 2

Re: getting exception while deploying on axis 2

2014-09-25 Thread Priyanka Tufchi
thanks Uwe for your reply, Can you explain what you mean by *original* JAR files of Lucene. And if I did not use original Jar, from where i can get it? As my project is java project and i have no idea how to use maven .Can you give some idea how to add and use maven shade plugin in my project a

RE: getting exception while deploying on axis 2

2014-09-24 Thread Uwe Schindler
Hi, this happens if you don't use the *original* JAR files of Lucene. If you repackage them, be sure to include the META-INF/services folders, and if multiple Lucene JAR files are included, merge the entries in the services files from all of them. You can do this with the Maven Shade Plugin and
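A sketch of the Maven Shade configuration Uwe refers to; the `ServicesResourceTransformer` concatenates identically named `META-INF/services` entries from all shaded JARs instead of letting one overwrite the other (plugin version and surrounding POM context omitted):

```xml
<!-- Sketch: merge SPI service files when building an uber-JAR from
     multiple Lucene JARs. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <transformers>
      <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
    </transformers>
  </configuration>
</plugin>
```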

Re: Getting multi-values to use in filter?

2014-04-29 Thread Shai Erera
Hi Rob, While the demo code uses a fixed number of 3 values, you don't need to encode the number of values up front. Since your read the byte[] of a document up front, you can read in a while loop as long as in.position() < in.length(). Shai On Tue, Apr 29, 2014 at 10:04 AM, Rob Audenaerde wrot

Re: Getting multi-values to use in filter?

2014-04-29 Thread Rob Audenaerde
Hi Shai, I read the article on your blog, thanks for it! It seems to be a natural fit to do multi-values like this, and it is helpful indeed. For my specific problem, I have multiple values that do not have a fixed number, so it can be either 0 or 10 values. I think the best way to solve this i

Re: Getting multi-values to use in filter?

2014-04-27 Thread Shai Erera
Hi Rob, Your question got me interested, so I wrote a quick prototype of what I think solves your problem (and if not, I hope it solves someone else's! :)). The idea is to write a special ValueSource, e.g. MaxValueSource which reads a BinadyDocValues, decodes the values and returns the maximum one

Re: Getting multi-values to use in filter?

2014-04-24 Thread Shai Erera
I don't think that you should use the facet module. If all you want is to encode a bunch of numbers under a 'foo' field, you can encode them into a byte[] and index them as a BDV. Then at search time you get the BDV and decode the numbers back. The facet module adds complexity here: yes, you get th
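The encode/decode round trip Shai describes can be sketched with plain fixed-width encoding (Lucene offers more compact variable-length encodings; this just keeps the idea visible). At index time the `byte[]` would go into a `BinaryDocValuesField`; at search time a `MaxValueSource`-style function would read it back and decode. Class and method names are illustrative:

```java
// Sketch: pack a variable number of longs into a count-prefixed byte[] and
// decode the maximum back out, as a ValueSource over a BDV field would.
import java.nio.ByteBuffer;

public final class MultiValueCodec {
  /** Encode: 4-byte count, then 8 bytes per value. */
  public static byte[] encode(long[] values) {
    ByteBuffer buf = ByteBuffer.allocate(4 + 8 * values.length);
    buf.putInt(values.length);
    for (long v : values) {
      buf.putLong(v);
    }
    return buf.array();
  }

  /** Decode and return the maximum encoded value. */
  public static long decodeMax(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    int n = buf.getInt();
    long max = Long.MIN_VALUE;
    for (int i = 0; i < n; i++) {
      max = Math.max(max, buf.getLong());
    }
    return max;
  }
}
```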

Re: Getting multi-values to use in filter?

2014-04-23 Thread Rob Audenaerde
Thanks for all the questions, gives me an opportunity to clarify it :) I want the user to be able to give a (simple) formula (so I don't know it on beforehand) and use that formula in the search. The Javascript expressions are really powerful in this use case, but have the single-value limitation.

Re: Getting multi-values to use in filter?

2014-04-23 Thread Shai Erera
A NumericDocValues field can only hold one value. Have you thought about encoding the values in a BinaryDocValues field? Or are you talking about multiple fields (different names), each has its own single value, and at search time you sum the values from a different set of fields? If it's one fiel

Re: Getting multi-values to use in filter?

2014-04-23 Thread Rob Audenaerde
Hi Shai, all, I am trying to write that Filter :). But I'm a bit at a loss as to how to efficiently grab the multi-values. I can access the context.reader().document() that accesses the storedfields, but that seems slow. For single-value fields I use a compiled JavaScript Expression with simplebinding

Re: Getting multi-values to use in filter?

2014-04-23 Thread Shai Erera
You can do that by writing a Filter which returns matching documents based on a sum of the field's value. However I suspect that is going to be slow, unless you know that you will need several such filters and can cache them. Another approach would be to write a Collector which serves as a Filter,

Re: Getting multi-values to use in filter?

2014-04-23 Thread Rob Audenaerde
Hi Mike, Thanks for your reply. I think it is not-so-much an invalid use case for Lucene. Lucene already has (experimental) support for Dynamic Range Facets, expressions (javascript expressions, geospatial haversin etc. etc). These are all computed on the fly and work really well. They just depe

Re: Getting multi-values to use in filter?

2014-04-23 Thread Michael Sokolov
This isn't really a good use case for an index like Lucene. The most essential property of an index is that it lets you look up documents very quickly based on *precomputed* values. -Mike On 04/23/2014 06:56 AM, Rob Audenaerde wrote: Hi all, I'm looking for a way to use multi-values in a f

Re: Getting IndexWriterConfig details for a closed index

2014-04-22 Thread Jose Carlos Canova
You can persist the IndexWriterConfig settings yourself: serialize them as a Serializable object to a file with an ObjectOutputStream, store them in a persistent mechanism like a database or a JSON store, or keep them in an XML file the way Solr does. I

RE: Getting term ords during collect

2014-02-13 Thread Kyle Judson
The SortedSetDocValuesField worked great. Thanks. Kyle > From: luc...@mikemccandless.com > Date: Wed, 12 Feb 2014 05:39:24 -0500 > Subject: Re: Getting term ords during collect > To: java-user@lucene.apache.org > > It sounds like you are just indexing at TextFiel

Re: Getting term ords during collect

2014-02-12 Thread Michael McCandless
> Kyle > >> From: luc...@mikemccandless.com >> Date: Tue, 11 Feb 2014 19:59:03 -0500 >> Subject: Re: Getting term ords during collect >> To: java-user@lucene.apache.org >> >> SortedSetDV is probably the best way to do so. You could also encode >>

RE: Getting term ords during collect

2014-02-11 Thread Kyle Judson
gt; From: luc...@mikemccandless.com > Date: Tue, 11 Feb 2014 19:59:03 -0500 > Subject: Re: Getting term ords during collect > To: java-user@lucene.apache.org > > SortedSetDV is probably the best way to do so. You could also encode > the ords yourself into a byte[] and use binary DV. > > Bu

Re: Getting term ords during collect

2014-02-11 Thread Michael McCandless
SortedSetDV is probably the best way to do so. You could also encode the ords yourself into a byte[] and use binary DV. But why are you seeing it take too long to load? You can switch to different DV formats to tradeoff RAM usage and lookup speed.. Mike McCandless http://blog.mikemccandless.co
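A minimal sketch of reading SortedSet ords for the current document inside a collector, assuming a Lucene 7/8 style API (`advanceExact` exists from 7.0; the `"category"` field name is illustrative):

```java
// Sketch: fetch the per-segment SortedSetDocValues once per leaf, then in
// collect() advance to the doc and iterate its ords. Ords are segment-local;
// lookupOrd(ord) resolves an ord back to its term if needed.
import java.io.IOException;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.util.BytesRef;

public final class OrdReader {
  static void collectOrds(LeafReaderContext context, int doc) throws IOException {
    SortedSetDocValues dv = DocValues.getSortedSet(context.reader(), "category");
    if (dv.advanceExact(doc)) {          // true iff this doc has values
      long ord;
      while ((ord = dv.nextOrd()) != SortedSetDocValues.NO_MORE_ORDS) {
        BytesRef term = dv.lookupOrd(ord); // optional: resolve ord -> term
      }
    }
  }
}
```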

Re: Getting integer value from BytesRef

2013-10-07 Thread 장용석
Peter. Thanks for the reply. This code is just a sample for the question. Actually, I have indexed many documents, and the reason for trying this is that I want to get statistics of the index file. Thanks and Regards. 2013/10/8 Peter Chang > Your doc freq is always 1. It's useless. > I don't know why you try to inde

Re: Getting integer value from BytesRef

2013-10-07 Thread 장용석
Thanks very much, Uwe. I got the right value using NumericUtils. And as you said, there were more terms than I had indexed. Thanks and Regards. 2013/10/8 Uwe Schindler > Hi, > > Use NumericUtils to convert the BytesRef back to a number: > http://goo.gl/3KG9Pd > But be careful, the term

Re: Getting integer value from BytesRef

2013-10-07 Thread Peter Chang
Your doc freq is always 1. It's useless. I don't know why you try to index and search a binary field except for range searching. On Mon, Oct 7, 2013 at 11:23 PM, 장용석 wrote: > Dear, > > I have indexing integer field like this > > - > Document doc = new Document(); > FieldType fieldType = new

RE: Getting integer value from BytesRef

2013-10-07 Thread Uwe Schindler
Hi, Use NumericUtils to convert the BytesRef back to a number: http://goo.gl/3KG9Pd But be careful, the terms index contains more terms with lower precisions (bits stripped off), unless you use infinite precisionStep! Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.theta
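A sketch of the decoding Uwe describes, assuming Lucene 4.x/5.x trie-encoded numeric int fields (these prefix-coded helpers were removed in later versions). The extra lower-precision terms he warns about have a non-zero shift and should be skipped:

```java
// Sketch: decode a term of a trie-encoded int field back to its value,
// ignoring the lower-precision helper terms (shift != 0) that the trie
// encoding adds unless an infinite precisionStep is used.
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.NumericUtils;

public final class TrieIntDecoder {
  static Integer decodeIfFullPrecision(BytesRef term) {
    if (NumericUtils.getPrefixCodedIntShift(term) == 0) {
      return NumericUtils.prefixCodedToInt(term); // full-precision value
    }
    return null; // lower-precision helper term; skip it
  }
}
```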

Re: Getting position increments directly from the the index

2013-05-23 Thread Jack Krupansky
k Krupansky -Original Message- From: Michael McCandless Sent: Thursday, May 23, 2013 10:39 AM To: Lucene Users Subject: Re: Getting position increments directly from the the index On Thu, May 23, 2013 at 9:54 AM, Igor Shalyminov wrote: But, just to clarify, is there a way to get, let

Re: Getting position increments directly from the the index

2013-05-23 Thread Michael McCandless
On Thu, May 23, 2013 at 9:54 AM, Igor Shalyminov wrote: > But, just to clarify, is there a way to get, let's say, a vector of position > increments directly from the index, without re-parsing document contents? Term vectors (as Jack suggested) are one option, but they are very heavy (slows down

Re: Getting position increments directly from the the index

2013-05-23 Thread Jack Krupansky
Take a look at the Term Vectors Component: http://wiki.apache.org/solr/TermVectorComponent -- Jack Krupansky -Original Message- From: Igor Shalyminov Sent: Thursday, May 23, 2013 9:54 AM To: java-user@lucene.apache.org Subject: Re: Getting position increments directly from the the

Re: Getting position increments directly from the the index

2013-05-23 Thread Igor Shalyminov
upansky > > -Original Message- > From: Michael McCandless > Sent: Thursday, May 23, 2013 6:28 AM > To: Lucene Users > Subject: Re: Getting position increments directly from the the index > > Do you actually index the sentence boundary as a token?  If so, you > could j

Re: Getting position increments directly from the the index

2013-05-23 Thread Jack Krupansky
-Original Message- From: Michael McCandless Sent: Thursday, May 23, 2013 6:28 AM To: Lucene Users Subject: Re: Getting position increments directly from the the index Do you actually index the sentence boundary as a token? If so, you could just get the totalTermFreq of that token? Mike

Re: Getting position increments directly from the the index

2013-05-23 Thread Michael McCandless
Do you actually index the sentence boundary as a token? If so, you could just get the totalTermFreq of that token? Mike McCandless http://blog.mikemccandless.com On Wed, May 22, 2013 at 10:11 AM, Igor Shalyminov wrote: > Hello! > > I'm storing sentence bounds in the index as position increme

Re: Getting documents from suggestions

2013-03-22 Thread Bratislav Stojanovic
OK, I've played with all these solutions and basically only one gave me satisfying results. Using build() with the TermFreqPayload argument gave me horrible performance, because it takes more than 5 mins to iterate through all Terms in the index and to filter them based on the doc id. Not sure if this n

Re: Getting documents from suggestions

2013-03-16 Thread Michael McCandless
On Sat, Mar 16, 2013 at 7:47 AM, Bratislav Stojanovic wrote: > Hey Mike, > > Is this what I should be looking at? > https://builds.apache.org/job/Lucene-Artifacts-trunk/javadoc/suggest/org/apache/lucene/search/suggest/analyzing/package-summary.html > > Not sure how to call build(), i.e. what to pa

Re: Getting documents from suggestions

2013-03-16 Thread Jack Krupansky
013 7:29 AM To: java-user@lucene.apache.org Subject: Re: Getting documents from suggestions Hey Jack, I've tried MoreLikeTHis, but it always returns me 0 hits. Here's the code, it's very simple : // test2 Index lucene = null; try { lucene = new Index(); MoreLikeThis mlt = new More

Re: Getting documents from suggestions

2013-03-16 Thread Bratislav Stojanovic
Hey Mike, Is this what I should be looking at? https://builds.apache.org/job/Lucene-Artifacts-trunk/javadoc/suggest/org/apache/lucene/search/suggest/analyzing/package-summary.html Not sure how to call build(), i.e. what to pass as a parameter...Any examples? Where to specify my payload (which is

Re: Getting documents from suggestions

2013-03-16 Thread Bratislav Stojanovic
Hey Jack, I've tried MoreLikeTHis, but it always returns me 0 hits. Here's the code, it's very simple : // test2 Index lucene = null; try { lucene = new Index(); MoreLikeThis mlt = new MoreLikeThis(lucene.reader); mlt.setAnalyzer(lucene.analyzer); Reader target = new StringReader("apache"); Quer

Re: Getting documents from suggestions

2013-03-15 Thread Bratislav Stojanovic
gestion is X, do you simply want to know a few of the > >> documents which have the highest term frequency for X? > >> > >> Or is there some other term-oriented metric you might propose? > >> > >> > >> -- Jack Krupansky > >> > >> -Or

Re: Getting documents from suggestions

2013-03-14 Thread Steve Rowe
y want to know a few of the >> documents which have the highest term frequency for X? >> >> Or is there some other term-oriented metric you might propose? >> >> >> -- Jack Krupansky >> >> -Original Message- From: Bratislav Stojanovic

Re: Getting documents from suggestions

2013-03-14 Thread Bratislav Stojanovic
you simply want to know a few of the > documents which have the highest term frequency for X? > > Or is there some other term-oriented metric you might propose? > > > -- Jack Krupansky > > -Original Message- From: Bratislav Stojanovic > Sent: Thursday, March 14, 2013

Re: Getting documents from suggestions

2013-03-14 Thread Jack Krupansky
Sent: Thursday, March 14, 2013 6:14 PM To: java-user@lucene.apache.org Subject: Re: Getting documents from suggestions Wow that was fast :) I have implemented a simple search box with auto-suggestions, so whenever user types in something, ajax call is fired to the SuggestServlet and in return 10 sugges

Re: Getting documents from suggestions

2013-03-14 Thread Bratislav Stojanovic
Wow that was fast :) I have implemented a simple search box with auto-suggestions, so whenever user types in something, ajax call is fired to the SuggestServlet and in return 10 suggestions are shown. It's working fine with the SpellChecker class, but I only get array of Strings. What I want is t

Re: Getting documents from suggestions

2013-03-14 Thread Michael McCandless
If you are using AnalyzingSuggester or FuzzySuggester than you can use its new payloads feature to store an arbitrary byte[] with each suggestion: https://issues.apache.org/jira/browse/LUCENE-4820 But this won't help if you're using spell checker ... Mike McCandless http://blog.mikemccandle

Re: Getting documents from suggestions

2013-03-14 Thread Jack Krupansky
Could you give us some examples of what you expect? I mean, how is your suggested set of documents any different from simply executing a query with the list of suggested terms (using q.op=OR)? Or, maybe you want something like MoreLikeThis? -- Jack Krupansky -Original Message- From:

Re: Getting a similarity score for an arbitrary pair of documents or a query and a document

2013-03-06 Thread Emmanuel Espina
Have you already checked Solr's more like this? http://wiki.apache.org/solr/MoreLikeThisHandler and http://wiki.apache.org/solr/MoreLikeThis Your describe a problem similar to the use case of that component and if there is something to hack is solr's more like this. Lucene's similarity is a low le

Re: Getting the number of all hits for the SpanQuery

2013-02-01 Thread Igor Shalyminov
Hi again! So far I think that the easiest way to get all span matches is indeed this method (Lucene v 4.1 code): public Spans getSpans(final AtomicReaderContext context, Bits acceptDocs, Map termContexts) But there is no annotation for this code except 'for internal use only', and the input pa

Re: getting the token position

2013-01-10 Thread Igal @ getRailo.org
hi Denis, thanks for your reply. OffsetAttribute gives the character position whereas I was looking for the Token Position. I ended up adding the attached PositionAttribute/PositionAttributeImpl/PositionFilter. as it turned out though I didn't need that attribute as there was an easier way

Re: getting the token position

2013-01-10 Thread Denis Bazhenov
What you are looking for is OffsetAttribute. Also consider the possibility of using ShingleFilter with position increment > 1 and then filtering tokens containing "_" (underscore). This will be easier, I guess. On Jan 11, 2013, at 7:14 AM, Igal @ getRailo.org wrote: > hi all, > > how can I ge

Re: getting the offset of hits in a search

2013-01-09 Thread Itai Peleg
Great! I'll look into that. Thanks! 2013/1/9 김한규 > Try SpanTermQuery, getSpans() function. It returns Spans object which you > can iterate through to find position of every hits in every documents. > > http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/spans/SpanTermQuery.html >

Re: getting the offset of hits in a search

2013-01-09 Thread 김한규
Try SpanTermQuery's getSpans() function. It returns a Spans object which you can iterate through to find the position of every hit in every document. http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/spans/SpanTermQuery.html 2013/1/9 Itai Peleg > Hi, > > I'm new to Lucene, and I'm hav

Re: Getting terms from unstored fields, doc-wise

2012-07-27 Thread Phanindra R
Thanks a lot Aditya and Andrzej .. Your responses were really helpful. On Fri, Jul 27, 2012 at 6:15 AM, Andrzej Bialecki wrote: > On 26/07/2012 22:04, Phanindra R wrote: > >> Thanks for the reply Abdul. >> >> I was exploring the API and I think we can retrieve all those words by >> using a brute

Re: Getting terms from unstored fields, doc-wise

2012-07-27 Thread Andrzej Bialecki
On 26/07/2012 22:04, Phanindra R wrote: Thanks for the reply Abdul. I was exploring the API and I think we can retrieve all those words by using a brute-force approach. 1) Get all the terms using indexReader.terms() 2) Process the term only if it belongs to the target field. 3) Get all the do

Re: Getting terms from unstored fields, doc-wise

2012-07-26 Thread Aditya
Hi If the data is not stored then it cannot be retrieved in the same format. Using IndexReader as you listed you could retrieve the list of the terms available in the doc. It may be analyzed. You may not be getting exact data. Regards Aditya www.findbestopensource.com On Fri, Jul 27, 2012 at 1:3

Re: Getting terms from unstored fields, doc-wise

2012-07-26 Thread Phanindra R
Thanks for the reply Abdul. I was exploring the API and I think we can retrieve all those words by using a brute-force approach. 1) Get all the terms using indexReader.terms() 2) Process the term only if it belongs to the target field. 3) Get all the docs using indexReader.termDocs(term); 4) S

Re: Getting terms from unstored fields, doc-wise

2012-07-26 Thread in.abdul
No, it's not possible to get data that was not stored. On Jul 26, 2012 10:27 PM, "Phanindra R [via Lucene]" > Hi, > I've an index to analyze (manually). Unfortunately, I cannot rebuild > the index. Some of the fields are 'unstored'. I was wondering whether > there's any way to get the ter

Re: Getting DF & IDF

2012-05-20 Thread yura.minsk
int numDocs = filterIndexReader.numDocs(); ... idf = Math.log10((double) numDocs / docFreq); Sethu_424 wrote > > wrong formula. numDoc should not be a count of documents in index - but documents containing searching term. We need something like IndexReader.docFreq( term ); -- View this messa
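The corrected computation from this exchange can be sketched as plain arithmetic; `docFreq` must come from `IndexReader.docFreq(term)` (documents containing the term), while `numDocs` is the total document count:

```java
// Sketch: classic idf = log10(N / df), with N = IndexReader.numDocs() and
// df = IndexReader.docFreq(term) -- not N in both places, which was the
// bug discussed above.
public final class IdfCalc {
  public static double idf(int numDocs, int docFreq) {
    return Math.log10((double) numDocs / docFreq);
  }
}
```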

Re: Getting the frequencies by corresponding order of documents were indexed

2012-05-14 Thread Erick Erickson
In general you can't rely on anything like this. I admit the merge stuff isn't my area of expertise, but when segments are merged, there's no guarantee that they're merged in order. In general the internal Lucene doc ID should be treated as predictable only for closed segments. Your solution of us

Re: Getting the frequencies by corresponding order of documents were indexed

2012-05-11 Thread Ian Lea
What version of lucene are you using? If not the latest, try that. If you really think there is a lucene bug post a small self-contained test case that demonstrates the problem. -- Ian. On Fri, May 11, 2012 at 12:35 PM, Kasun Perera wrote: > On Fri, May 11, 2012 at 4:52 PM, Ian Lea wrote: >

Re: Getting the frequencies by corresponding order of documents were indexed

2012-05-11 Thread Kasun Perera
On Fri, May 11, 2012 at 4:52 PM, Ian Lea wrote: > Can't spot anything obviously wrong in your code and what you are > trying to do should work. Are you positive that what you think is the > second doc is really being added second? You only show one doc being > added. Are there already 7 docs i

Re: Getting the frequencies by corresponding order of documents were indexed

2012-05-11 Thread Ian Lea
Can't spot anything obviously wrong in your code and what you are trying to do should work. Are you positive that what you think is the second doc is really being added second? You only show one doc being added. Are there already 7 docs in the index before you start? -- Ian. On Fri, May 11,

Re: Getting RuntimeException: after flush: fdx size mismatch while Indexing

2011-12-09 Thread Michael McCandless
Hmm... it looks like File.length() is somehow, sometimes lying, on your NFS filesystem. What's happening is Lucene is writing out a file, and it wrote 59540 bytes, closed the file (all with no exceptions), and then tried to verify the length was 59540 but in fact the filesystem reported 32768 byte
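The write-then-verify-length check Mike describes can be reproduced with plain `java.io` (a standalone illustration of the principle, not Lucene's actual code; the 59540-byte size mirrors the report above):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Standalone illustration: write N bytes, close, then verify File.length()
// reports N. On the NFS setup above, the reported length disagreed.
public class LengthCheckDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("fdx-demo", ".bin");
        f.deleteOnExit();
        byte[] data = new byte[59540];
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(data);
        }
        long expected = data.length;
        long actual = f.length(); // on a healthy filesystem these agree
        if (actual != expected) {
            throw new RuntimeException("after flush: fdx size mismatch: "
                + actual + " vs " + expected);
        }
        System.out.println("length ok: " + actual);
    }
}
```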

Re: Getting RuntimeException: after flush: fdx size mismatch while Indexing

2011-12-09 Thread Jamir Shaikh
OS : RHEL 5.5 64 bit. Filesystem: NFS Thanks for the reply. Thanks, Jamir On Fri, Dec 9, 2011 at 10:22 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > Which OS/filesystem? > > Mike McCandless > > http://blog.mikemccandless.com > > On Thu, Dec 8, 2011 at 9:46 PM, Jamir Shaikh > wro

Re: Getting RuntimeException: after flush: fdx size mismatch while Indexing

2011-12-09 Thread Michael McCandless
Which OS/filesystem? Mike McCandless http://blog.mikemccandless.com On Thu, Dec 8, 2011 at 9:46 PM, Jamir Shaikh wrote: > I am using Lucene 3.5. I want to create around 30 million documents. > While doing Indexing I am getting the following Exception: > > Caused by: java.lang.RuntimeException:

Re: getting OutOfMemoryError

2011-06-21 Thread Ian Lea
Complicated with all those indexes. 3 suggestions: 1. Just give it more memory. 2. Profile it to find out what is actually using the memory. 3. Cut down the number of indexes. See recent threads on pros and cons of multiple indexes vs one larger index. -- Ian. On Mon, Jun 20, 2011 at 2:
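For suggestion 1 (and to feed a profiler for suggestion 2), heap size and an on-OOM heap dump can be set with standard HotSpot flags; the values and jar name below are example placeholders, not from the thread:

```shell
# Raise the maximum heap and capture a dump for offline profiling on OOM.
java -Xmx4g \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/tmp/searcher.hprof \
     -jar my-search-app.jar   # hypothetical application jar
```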

Re: getting OutOfMemoryError

2011-06-20 Thread harsh srivastava
Hi Erick, In continuation to my below mails, I have a socket based multithreaded server that serves in average 1 request per second. The index size is 31GB and document count is about 22 millions. The index directories are first divided in 4 directories and then each subdivided to 21 directories.

Re: getting OutOfMemoryError

2011-06-17 Thread harsh srivastava
Hi Erick, I will gather the info and let you know. thanks harsh On 6/17/11, Erick Erickson wrote: > Please review: > http://wiki.apache.org/solr/UsingMailingLists > > You've given us no information to go on here, what are you > trying to do when this happens? What have you tried? What > is the quer

Re: getting OutOfMemoryError

2011-06-17 Thread Erick Erickson
Please review: http://wiki.apache.org/solr/UsingMailingLists You've given us no information to go on here, what are you trying to do when this happens? What have you tried? What is the query you're running when this happens? How much memory are you allocating to the JVM? You're apparently sorting

Re: getting the number of updated documents

2011-03-10 Thread Koji Sekiguchi
Does IndexWriter (or somewhere else) have the method such that it gets the number of updated documents before commit? you have maxDocs which gives you the maxdocid-1 but this might not be super accurate since there might have been merges going on in the background. I am not sure if this number yo

Re: getting the number of updated documents

2011-03-10 Thread Simon Willnauer
hey Koji, 2011/3/10 Koji Sekiguchi : > Hello, > > Does IndexWriter (or somewhere else) have the method such that > it gets the number of updated documents before commit? you have maxDocs which gives you the maxdocid-1 but this might not be super accurate since there might have been merges going on
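A sketch of the approach Simon describes, against the Lucene 3.x `IndexWriter` API (untested; `writer` is an open `IndexWriter`, and the caveat about background merges applies):

```java
// Untested sketch (Lucene 3.x API): approximate the number of documents
// added/updated since a snapshot by diffing IndexWriter.maxDoc().
int before = writer.maxDoc();
// ... addDocument()/updateDocument() calls happen here ...
int added = writer.maxDoc() - before; // approximate only: background merges
                                      // may reclaim deleted docs meanwhile
```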

Re: Getting

2010-12-24 Thread Erick Erickson
From Hossman's Apache Page: When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in th
