Re: Help running the demo program

2024-04-22 Thread Dawid Weiss
If you download the binary distribution, try this: Windows: java --module-path modules;modules-thirdparty --module org.apache.lucene.demo/org.apache.lucene.demo.IndexFiles Linux/Unix/Mac: java --module-path modules:modules-thirdparty --module org.apache.lucene.demo/org.apache.lucene.demo.IndexFil
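The two commands above differ only in the path-list separator (";" on Windows, ":" on Linux/Unix/Mac). Java exposes that separator as File.pathSeparator, so a launcher script written in Java could build the --module-path value portably. This is a minimal sketch; ModulePathSketch and modulePath are hypothetical names, not part of Lucene.

```java
import java.io.File;

// Hypothetical helper, not part of Lucene: joins module directories with the
// platform's path-list separator (";" on Windows, ":" on Linux/Unix/Mac).
class ModulePathSketch {
    static String modulePath(String... dirs) {
        return String.join(File.pathSeparator, dirs);
    }
}
```

Calling ModulePathSketch.modulePath("modules", "modules-thirdparty") yields the value passed to --module-path in the commands above for whichever platform the JVM runs on.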

Re: Help running the demo program

2024-04-22 Thread Siddharth Jain
Thanks Michael. I did add all the 4 modules to my classpath like this: java \ -cp $CWD/demo/build/classes/java/main:$CWD/analysis/common/build/classes/java/main:$CWD/queryparser/build/classes/java/main:$CWD/core/build/classes/java/main \ org.apache.lucene.demo.IndexFiles \ -docs $DOCS_DIR but get

Re: Help running the demo program

2024-04-22 Thread Michael Sokolov
I also found this helpful documentation by looking in the source code of SearchFiles.java: https://lucene.apache.org/core/9_10_0/demo/ On Mon, Apr 22, 2024 at 4:40 AM Stefan Vodita wrote: > > Hi Siddharth, > > If you happen to be using IntelliJ, you can run a demo class from the IDE. > It probabl

Re: Help running the demo program

2024-04-22 Thread Stefan Vodita
Hi Siddharth, If you happen to be using IntelliJ, you can run a demo class from the IDE. It probably works with other IDEs too, though I haven't tried it. Stefan On Sun, 21 Apr 2024 at 23:59, Siddharth Jain wrote: > Hello, > > I am a new user to Lucene. I checked out the Lucene repo >

Re: Help to understand the per-field formats in Lucene

2022-10-25 Thread Mikhail Khludnev
Hello McCoy. "DocValues", "KnnVectors" and "Postings" are three fundamentally different core APIs/data structures, i.e. DocValues is a columnar data store and postings are the inverted index. By default, the codec defines these three formats for all fields, and per-field wrappers allow configuring separate formats for a p

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-08 Thread Aditya Varun Chadha
is not fixed yet). > > > > > > Please let us know your Jira/Github usernames if you don't see > > > > mapping(s) > > > > > > for your account in this file: > > > > > > > > > > > > > > >

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-08 Thread Vincenzo D'Amore
Hi I would like to be added, thanks. My Github account: freedev Jira: v.dam...@gmail.com On Sun, 31 Jul 2022 at 12:09, Michael McCandless wrote: > Hello Lucene users, contributors and developers, > > If you have used Lucene's Jira and you have a GitHub account as well, > please check whether y

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-07 Thread Glen Newton
> > > > > > Thank You Thank You > > > > Best regards > > > > > > > > From: Michael McCandless > > > > Sent: Saturday, August 6, 2022 11:29:25 AM > > > > To: Baris Kazar

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-06 Thread Tomoko Uchida
hub usernames if you don't see > > > mapping(s) > > > > > for your account in this file: > > > > > > > > > > > > > > > > > > > > https://github.com/apache/lucene-jira-archive/blob/main/migration/mappings-data/accou

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-06 Thread Aditya Varun Chadha
> > > > > > > > Mike McCandless > > > > > > > > > > http://blog.mikemccandless.com

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-06 Thread Tomoko Uchida
> > > > > > > > https://github.com/apache/lucene-jira-archive/blob/main/migration/mappings-data/account-map.csv.20220722.verified > > > > > > Tomoko > > > > > > > > > On Sun, Aug 7, 2022 at 1:36 Baris Kazar wrote: > > > > > >

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-06 Thread Tomoko Uchida
2年8月7日(日) 1:36 Baris Kazar : > > > > > Thank You Thank You > > > Best regards > > > ____ > > > From: Michael McCandless > > > Sent: Saturday, August 6, 2022 11:29:25 AM > > > To: Baris Kazar > >

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-06 Thread Aditya Varun Chadha
> > > From: Michael McCandless > > > Sent: Saturday, August 6, 2022 11:29:25 AM > > > To: Baris Kazar > > > Cc: java-user@lucene.apache.org > > > Subject: Re: [HELP] Link your Apache Lucene Jira and G

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-06 Thread Glen Newton
azar : > > > Thank You Thank You > > Best regards > > > > From: Michael McCandless > > Sent: Saturday, August 6, 2022 11:29:25 AM > > To: Baris Kazar > > Cc: java-user@lucene.apache.org > > Subject: Re: [HELP] Link your

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-06 Thread Tomoko Uchida
Cc: java-user@lucene.apache.org > Subject: Re: [HELP] Link your Apache Lucene Jira and GitHub account ids > before Thursday August 4 midnight (in your local time) > > OK done: > https://github.com/apache/lucene-jira-archive/commit/13fa4cb46a1a6d609448240e4f66c263da8b3fd1

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-06 Thread Baris Kazar
Thank You Thank You Best regards From: Michael McCandless Sent: Saturday, August 6, 2022 11:29:25 AM To: Baris Kazar Cc: java-user@lucene.apache.org Subject: Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-06 Thread Michael McCandless
l McCandless > *Sent:* Saturday, August 6, 2022 10:12 AM > *To:* java-user@lucene.apache.org > *Cc:* Baris Kazar > *Subject:* Re: [HELP] Link your Apache Lucene Jira and GitHub account ids > before Thursday August 4 midnight (in your local time) > > Thanks Baris, > > And

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-06 Thread Baris Kazar
I think so. Best regards From: Michael McCandless Sent: Saturday, August 6, 2022 10:12 AM To: java-user@lucene.apache.org Cc: Baris Kazar Subject: Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-06 Thread Michael McCandless
22 6:05:51 AM > To: d...@lucene.apache.org > Cc: Lucene Users ; java-dev < > java-...@lucene.apache.org> > Subject: Re: [HELP] Link your Apache Lucene Jira and GitHub account ids > before Thursday August 4 midnight (in your local time) > > Hi Adam, I added your linked

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-06 Thread Baris Kazar
My github username is bmkazar can You please register me? Best regards From: Michael McCandless Sent: Saturday, August 6, 2022 6:05:51 AM To: d...@lucene.apache.org Cc: Lucene Users ; java-dev Subject: Re: [HELP] Link your Apache Lucene Jira and GitHub account

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-06 Thread Michael McCandless
Hi Adam, I added your linked accounts here: https://github.com/apache/lucene-jira-archive/commit/c228cb184c073f4b96cd68d45a000cf390455b7c And Tomoko added Rushabh's linked accounts here: https://github.com/apache/lucene-jira-archive/commit/6f9501ec68792c1b287e93770f7a9dfd351b86fb Keep the linked

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-04 Thread Rushabh Shah
Hi, My mapping is: JiraName,GitHubAccount,JiraDispName shahrs87, shahrs87, Rushabh Shah Thank you Tomoko and Mike for all of your hard work. On Sun, Jul 31, 2022 at 3:08 AM Michael McCandless < luc...@mikemccandless.com> wrote: > Hello Lucene users, contributors and developers, > > If you hav

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-01 Thread Tomoko Uchida
Hi Atri and Christian, thanks for your reply, we already have your accounts in - https://github.com/apache/lucene-jira-archive/blob/7654c0168a86fb05e942666d4514d48966d223bb/migration/mappings-data/account-map.csv.20220722.verified#L42 - https://github.com/apache/lucene-jira-archive/blob/7654c0168a

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-01 Thread Christian Moen
Thanks. My mapping is: cm,cmoen,Christian Moen On Sun, Jul 31, 2022 at 12:08 PM Michael McCandless < luc...@mikemccandless.com> wrote: > Hello Lucene users, contributors and developers, > > If you have used Lucene's Jira and you have a GitHub account as well, > please check whether your user i

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-01 Thread Atri Sharma
Mine is atris for github, atri for JIRA On Mon, Aug 1, 2022 at 4:03 PM Tomoko Uchida wrote: > > Hi Mike, Marcus, and Praveen: > > I verified the added two mappings - these Jira users have activity on > Lucene Jira, also corresponding GitHub accounts are valid. > - marcussorealheis > - pru30 > > T

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-01 Thread Tomoko Uchida
Hi Mike, Marcus, and Praveen: I verified the two added mappings: these Jira users have activity on Lucene Jira, and the corresponding GitHub accounts are valid. - marcussorealheis - pru30 Tomoko On Mon, Aug 1, 2022 at 18:40 Michael McCandless wrote: > Thanks Praveen, > > I added your mapping here: > https://gi

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-07-31 Thread Marcus Eagan
marcussorealheis, marcussorealheis, Marcus Eagan On Sun, Jul 31, 2022 at 7:39 AM Michael McCandless < luc...@mikemccandless.com> wrote: > Thanks, added here: > > https://github.com/apache/lucene-jira-archive/commit/d91534e67b35004f212100d73008283327f2f1e7 > > Please confirm it's right ;) > > Mike

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-07-31 Thread Michael McCandless
Thanks, added here: https://github.com/apache/lucene-jira-archive/commit/d91534e67b35004f212100d73008283327f2f1e7 Please confirm it's right ;) Mike On Sun, Jul 31, 2022 at 7:27 AM 翁剑平 wrote: > Hi, could you help to add my info, thanks a lot > jira full name: jianping weng > github id: wjp719 >

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-07-31 Thread 翁剑平
Hi, could you help to add my info, thanks a lot jira full name: jianping weng github id: wjp719 the jira issue I create before: https://issues.apache.org/jira/browse/LUCENE-10425 the github pr I submit before: https://github.com/apache/lucene/pull/780 Best Regards, jianping weng Michael McCan

Re: Help! - Max Segment name reached

2018-04-21 Thread Michael McCandless
> -Original Message- > From: Uwe Schindler > Sent: Tuesday, April 17, 2018 4:02 PM > To: java-user@lucene.apache.org > Subject: Re: Help! - Max Segment name reached > > Hi, > > Crea

RE: Help! - Max Segment name reached

2018-04-17 Thread Stuart Goldberg
-Original Message- From: Uwe Schindler Sent: Tuesday, April 17, 2018 4:02 PM To: java-user@lucene.apache.org Subjec

Re: Help! - Max Segment name reached

2018-04-17 Thread Uwe Schindler
Hi, Create a new empty index in a new directory and call IndexWriter.addIndexes(), passing the directory that holds the broken index. This will copy all segments but renumber them. Uwe On April 17, 2018 3:52:27 PM UTC, Stuart Goldberg wrote: >We have an index that has run into this bug: >https://issues.apach

Re: Help Regarding token filter

2018-03-15 Thread Michael Sokolov
Since you are writing a custom token filter, it's up to you to return successive tokens by setting the appropriate attributes when nextToken is called. Have you read the tokenstream javadocs? On Mar 15, 2018 10:35 AM, "deepu srinivasan" wrote: > Hi . > How do i split a single token and index the

Re: Help with huge index

2018-03-04 Thread Michael Sokolov
I wonder if you might not get better performance in a case like this if you were ok taking your index off line, disabling merges, performing deletions and only then enabling merges? This could be done on a copy of the index if updates can be turned off or held in a queue, so that queries could stil

Re: Help with huge index

2018-02-28 Thread Stuart Goldberg
Thanks so much. I actually found that my purging routine finished after about 35 minutes which is really acceptable given that this routine is supposed to run during the overnight period. On Feb 28, 2018 8:34 PM, "Adrien Grand" wrote: > Thanks. Deleting lots of documents can indeed trigger a lot

Re: Help with huge index

2018-02-28 Thread Adrien Grand
Thanks. Deleting lots of documents can indeed trigger a lot of work in the Lucene side. First Lucene likely needs to rewrite the live docs of all your segments and then this might trigger significant merging activity due to the fact that Lucene tries to keep the number of deleted docs reasonable so

Re: Help with huge index

2018-02-28 Thread Stuart Goldberg
I call deleteDocuments. On Feb 28, 2018 8:16 PM, "Adrien Grand" wrote: > What do you mean by purging? What methods do you call? > > On Wed, Feb 28, 2018 at 19:34, Stuart Goldberg > wrote: > > > I have huge lucene index. On disk it's about 24Gb. > > > > > > > > I have a purging routine that is

Re: Help with huge index

2018-02-28 Thread Adrien Grand
What do you mean by purging? What methods do you call? On Wed, Feb 28, 2018 at 19:34, Stuart Goldberg wrote: > I have huge lucene index. On disk it's about 24Gb. > > > > I have a purging routine that is supposed to run and purge old docs. > > > > There are about 650 million docs in there and

Re: Help required in Ways to compress index size.

2018-02-28 Thread benafia salem
Hi Prathap, I think the first thing to check, if you want to reduce your index size, is whether you can avoid "storing" fields. Run a first test and check whether your search performance still meets your expectations. 2018-02-28 12:46 GMT+01:00 prathap simha : > Hi Lucene Team, > > Greetings of the day. Thanks for

Re: Help regarding BM25Similarity

2018-01-05 Thread Parit Bansal
Thanks Adrien. I'll try this approach too. - Best Parit Bansal On 01/05/2018 10:43 AM, Adrien Grand wrote: You can use PerFieldSimilarityWrapper to have different BM25 settings per field. On Fri, Jan 5, 2018 at 10:37, Parit Bansal wrote: Hi Robert, passing b = 0 will influence the simil

Re: Help regarding BM25Similarity

2018-01-05 Thread Adrien Grand
You can use PerFieldSimilarityWrapper to have different BM25 settings per field. On Fri, Jan 5, 2018 at 10:37, Parit Bansal wrote: > Hi Robert, > > passing b = 0 will influence the similarity across all the fields (no?) > . I wanted it to be field specific. I think Uwe's suggestion of not > i

Re: Help regarding BM25Similarity

2018-01-05 Thread Parit Bansal
Hi Robert, passing b = 0 will influence the similarity across all the fields (no?) . I wanted it to be field specific. I think Uwe's suggestion of not indexing norms for specific fields should work better. Thankx again. - Best Parit Bansal On 01/04/2018 08:34 PM, Robert Muir wrote: You do

Re: Help regarding BM25Similarity

2018-01-05 Thread Parit Bansal
Hi Robert, passing b = 0 will influence the similarity across all the fields (no?) . I wanted it to be field specific. I think Uwe's suggestion of not indexing norms for specific fields should work better. - Best Parit Bansal On 01/04/2018 08:34 PM, Robert Muir wrote: You don't need to do

Re: Help regarding BM25Similarity

2018-01-05 Thread Parit Bansal
Hi Uwe, You are right. Thanks! :) - Best Parit Bansal On 01/04/2018 05:02 PM, Uwe Schindler wrote: How about just indexing the field without norms? Uwe On January 4, 2018 3:58:27 PM UTC, Parit Bansal wrote: Hi, I am trying to tweak BM25Similarity for my use case wherein, I want to avoid

Re: Help regarding BM25Similarity

2018-01-04 Thread Robert Muir
You don't need to do any subclassing for this: just pass parameter b=0 to the constructor. On Thu, Jan 4, 2018 at 10:58 AM, Parit Bansal wrote: > Hi, > > I am trying to tweak BM25Similarity for my use case wherein, I want to avoid > the effects of field-length normalization for certain fields (re
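For reference, the effect of b = 0 can be read off the textbook BM25 term weight. This is a sketch of the standard formula, not a statement of Lucene's exact implementation, which encodes document length approximately and may differ in constant factors between versions. Here f_{t,d} is the term frequency, |d| the field length, and avgdl the average field length.

```latex
\[
\mathrm{score}(t, d) = \mathrm{idf}(t)\,
  \frac{f_{t,d}\,(k_1 + 1)}
       {f_{t,d} + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}
\]
\[
b = 0 \;\Longrightarrow\;
\mathrm{score}(t, d) = \mathrm{idf}(t)\,
  \frac{f_{t,d}\,(k_1 + 1)}{f_{t,d} + k_1}
\]
```

With b = 0 the length term |d| / avgdl drops out, so the score no longer depends on field length.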

Re: Help regarding BM25Similarity

2018-01-04 Thread Uwe Schindler
How about just indexing the field without norms? Uwe On January 4, 2018 3:58:27 PM UTC, Parit Bansal wrote: >Hi, > >I am trying to tweak BM25Similarity for my use case wherein, I want to >avoid the effects of field-length normalization for certain fields >(return a constant value irrespective

Re: help for apache lucene

2016-12-26 Thread Uwe Schindler
Hi, It looks like you have different Lucene versions on your classpath. Make sure that you have every artifact JAR file in only one single version. Also make sure all the different Lucene artifacts have the same version. Uwe On December 26, 2016 10:57:30 CET, "RAGOT, vincent" wrote: >Hello, >I try

Re: help for a migration error to 6.1 version

2016-08-18 Thread Cristian Lorenzetto
Using TYPE.setDocValuesType(DocValuesType.SORTED); it works. I didn't understand the reasons. Maybe fast grouping requires sorting, so that the algorithm can find distinct groups. 2016-08-18 17:40 GMT+02:00 Cristian Lorenzetto < cristian.lorenze...@gmail.com>: > in my old code > > i create

Re: Help Relevance Feedback (Rocchio) with lucene

2016-06-28 Thread Ahmet Arslan
Hi Andres, While there can be other ways, in general term vectors are used to extract "important terms" from top-k documents returned by the initial query. Please see getTopTerms() method in http://www.cortecostituzionale.it/documenti/news/advancedluceneeu_69.pdf Ahmet On Tuesday, June 28, 20

Re: Help with a fieldcomparator!

2015-01-17 Thread Victor Podberezski
Erick: (sorry, I misspelled your name in my last email ) I tried a bunch of solutions none worked as I expected. Basically none of them sorts the documents using the pattern as I expect. This is my simplified code: public class PatternFieldComparatorSource extends FieldComparatorSource {

Re: Help with a fieldcomparator!

2015-01-17 Thread Erick Erickson
Ah, OK. H.L. Mencken wrote something like: "For every complex problem there is a solution that is simple, elegant, and wrong". I specialize in these... I don't have a good answer for your question then. How is what you're trying failing? Best, Erick On Fri, Jan 16, 2015 at 4:59 PM, Victor Podber

Re: Help with a fieldcomparator!

2015-01-16 Thread Victor Podberezski
Erik, Thanks for your reply. I wrote a simplification of the problem. The values in the field to be sorted are not only "val1, val2,..."; they can also be "patternX1, patternX2", etc., and in that case I need to sort according to different criteria. There are a lot of different patterns but

Re: Help with a fieldcomparator!

2015-01-16 Thread Erick Erickson
Personally I would do this on the ingestion side with a new field. That is, analyze the input field when you were indexing the doc, extract the min value from any numbers, and put that in a new field. Then it's simply sorting by the new field. This is likely to be much more performant than reproces

Re: Help using ShingleFilter/NGramTokenizer: Could not find implementing class for org.apache.lucene.analysis.tokenattributes.OffsetAttribute

2014-01-24 Thread Koji Sekiguchi
Hi Russell, It seems the error message says that the implementing class for OffsetAttribute cannot be found on your classpath in the (Pig?) environment. According to the Javadoc, there seem to be two implementing classes, OffsetAttributeImpl and Token: http://lucene.apache.org/core/4_6_0/core/org/a

Re: Help in Lucene Postings Highlighter..

2013-11-25 Thread VIGNESH S
Hi Mike, I indexed a 1 GB document with PostingsHighlighter and FastVectorHighlighter. To my surprise, PostingsHighlighter took almost 1.6 times as much as FastVectorHighlighter. I thought storing document offsets would take less space than storing term vectors. On Mon, Nov 25, 2013 at 7:04 PM, M

Re: Help in Lucene Postings Highlighter..

2013-11-25 Thread Michael McCandless
Yes, you need to store it; this is where PH gets the "original" content from for highlighting. Alternatively you can store/retrieve this content yourself and pass it to PH. But, what NPE did you hit? We should improve that if we can... Mike McCandless http://blog.mikemccandless.com On Mon, N

Re: Help needed Regarding classification of Text Data using Lucene..

2013-01-09 Thread Tommaso Teofili
Hi, you can have a look at the (early stage) Lucene classification module on trunk [1], see also a brief introduction given at last ApacheCon EU [2]. Hope this helps, Tommaso [1] : http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/classification/ [2] : http://www.slideshare.net/teofili/tex

Re: Help needed Regarding classification of Text Data using Lucene..

2013-01-09 Thread Shashi Kant
http://www.slideshare.net/teofili/text-categorization-with-lucene-and-solr On Wed, Jan 9, 2013 at 5:46 AM, VIGNESH S wrote: > Hi, > > can anyone suggest me how can i use lucene for text classification. > > -- > Thanks and Regards > Vignesh Srinivasan > > -

Re: Help needed: search is returning no results

2012-12-18 Thread Ramon Casha
I verified that the index was correct using the app Luke, tested some queries using it then replicated the results via code. It seems I need to refine the token parsing but at least I have something now. Ramon Casha On 18 December 2012 15:50, Ramon Casha wrote: > Hmm ok I got something. > > > R

Re: Help needed: search is returning no results

2012-12-18 Thread Ramon Casha
Hmm ok I got something. Ramon Casha On 18 December 2012 15:44, Ramon Casha wrote: > I converted them to TextField but the result is the same. > > doc.add(new TextField("text", text.toString(), Store.YES)); > > The search always returns an empty array. > > Ramon Casha > > > On 18 December 201

Re: Help needed: search is returning no results

2012-12-18 Thread Ramon Casha
I converted them to TextField but the result is the same. doc.add(new TextField("text", text.toString(), Store.YES)); The search always returns an empty array. Ramon Casha On 18 December 2012 15:35, Jack Krupansky wrote: > Maybe you wanted "text" fields that are analyzed and tokenized, as o

Re: Help needed: search is returning no results

2012-12-18 Thread Jack Krupansky
Maybe you wanted "text" fields that are analyzed and tokenized, as opposed to string fields which are not analyzed and stored and queried exactly as-is. See: http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/document/TextField.html But, show us some of your indexed data and queries th

Re: Help needed: search is returning no results

2012-12-18 Thread Ian Lea
I think you need TextField rather than StringField. See also http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F -- Ian. On Tue, Dec 18, 2012 at 2:14 PM, Ramon Casha wrote: > I have just downloaded and set up Lucene 4.0.0 to implement a search > faci

Re: Help for multi-language support

2012-12-04 Thread parnab kumar
Hi Deepak, Lucene already has multi-language support. For any language you just need to write a custom Analyzer for that language. While indexing you can configure the indexer to use the custom analyzer as and when needed. The same applies during searching. You just need to provide the

Re: Help on DOCX and XLSX

2012-03-07 Thread Ian Lea
earch text is being performed on indexing, then we are > filtering the documents by reading document record from database table > for the above key values. > > Thanks > Prasad > > > > -Original Message- > From: Ian Lea [mailto:ian@gmail.com] > Sent: Wed

RE: Help on DOCX and XLSX

2012-03-07 Thread Prasad KVSH
Prasad -Original Message- From: Ian Lea [mailto:ian@gmail.com] Sent: Wednesday, March 07, 2012 4:03 PM To: java-user@lucene.apache.org Subject: Re: Help on DOCX and XLSX You'll have to find something that parses the formats you are interested in and extracts the text you

Re: Help on DOCX and XLSX

2012-03-07 Thread Ian Lea
You'll have to find something that parses the formats you are interested in and extracts the text you want. Apache Tika comes to mind. Why are you using such an old version of Lucene? Why aren't you using Solr? That might just work for you out of the box. See also http://www.lucidimagination.c

Re: Help running out of files

2012-01-09 Thread Charlie Hubbard
Ian, Thanks for your help and patience with me. I started to look at porting this to a simple self-contained example, and I finally found the error in my code. IndexSearcher.close() doesn't close the underlying IndexReader when using the constructor new IndexSearcher( IndexReader ). Went back an

Re: Help running out of files

2012-01-09 Thread Ian Lea
Charlie, From the FAQ http://wiki.apache.org/lucene-java/LuceneFAQ#Does_Lucene_allow_searching_and_indexing_simultaneously.3F "... an IndexReader only searches the index as of the "point in time" that it was opened. Any updates to the index, either added or deleted documents, will not be visi

Re: Help running out of files

2012-01-09 Thread Charlie Hubbard
Ian, From reading the docs it seems clear that all I need to do is call IndexWriter.commit() in order for the changes to my single IndexWriter to be visible to the IndexReader and hence my single IndexSearcher. When you say "you need to close old readers, you need to reopen readers to pick up chang

Re: Help running out of files

2012-01-09 Thread Ian Lea
It's hard, impossible for me, to figure out from this what your problem might be. Multiple indexes, MultiReader, multiple writers (?), multiple threads? However I can make some statements: Lucene doesn't leak files, you need to close old readers, you need to reopen readers to pick up changes. Ha

Re: Help running out of files

2012-01-07 Thread Charlie Hubbard
Ok I think I've fixed my original problem by converting everything to use commit() and never call close() except when the server shuts down. This means I'm not closing my IndexWriter or IndexSearcher after opening them. I periodically call commit() on the IndexWriter after indexing my documents.

Re: Help running out of files

2012-01-06 Thread Ian Lea
Something that did change at some point, can't remember when, was the way that discarded but not explicitly closed searchers/readers are handled. I think that they used to get garbage collected, causing open files to be closed, but now need to be explicitly closed. Sounds to me like you are openi

Re: Help running out of files

2012-01-06 Thread Erick Erickson
Can you show the code? In particular are you re-opening the index writer? Bottom line: This isn't a problem anyone expects in 3.1 absent some programming error on your part, so it's hard to know what to say without more information. 3.1 has other problems if you use spellcheck.collate, you might

Re: Help running out of files

2012-01-06 Thread Charlie Hubbard
Thanks for the reply. I'm still having trouble. I've made some changes to use commit over close, but I'm not seeing much in terms of changes on what seems like ever increasing open file handles. I'm developing on Mac OS X 10.6 and testing on Linux CentOS 4.5. My biggest problem is I can't tell

Re: Help running out of files

2012-01-02 Thread Simon Willnauer
hey charlie, there are a couple of wrong assumptions in your last email mostly related to merging. mergefactor = 10 doesn't mean that you are ending up with one file neither is it related to files. Yet, my first guess is that you are using CompoundFileSystem (CFS) so each segment corresponds to a

Re: Help running out of files

2012-01-02 Thread Charlie Hubbard
I'm beginning to think there is an issue with 3.1 that's causing this. After looking over my code again I forgot that the mechanism that does the indexing hasn't changed, and the index IS being closed between cycles. Even when using push vs pull. This code used to work on 2.x lucene, but I had t

Re: Help: About performance of search with sorting.

2011-12-20 Thread Erick Erickson
What are you specifying for your sort criteria? And what kind of field is it we're talking about here? Best Erick On Tue, Dec 20, 2011 at 8:45 AM, Qiurun wrote: > Dear all, > > I select some of docs that meet some criteria by using TopDocs search(Query > query, int n). Also It's easy to select

RE: [Help Wanted] Graphics and other help for new Lucene/Solr website

2011-08-10 Thread karl.wright
The site looks great. And thank you for including the ManifoldCF link. ;-) Karl -Original Message- From: ext Grant Ingersoll [mailto:gsing...@apache.org] Sent: Wednesday, August 10, 2011 10:09 AM To: solr-u...@lucene.apache.org; java-user@lucene.apache.org Subject: [Help Wanted] Graphic

Re: Help needed on Ant build script for creating Lucene index

2011-05-12 Thread Erik Hatcher
There's an example build file, see It's pretty outdated stuff there though. It has some flexibility for a custom document handler in order to allow full control over how a File gets turned into a Lucene Document

Re: Help with delimited text

2011-04-07 Thread Mark Wiltshire
Thanks Ian, you're a star :-) Mark On 7 Apr 2011, at 11:18, Ian Lea wrote: Mark - I've uploaded some code to http://pastebin.com/mqSVcWUi that indexes and searches file system paths. It demonstrates what I've been trying to suggest and may help you get your search up and running. -- Ian. On Thu, Apr 7, 2011

Re: Help with delimited text

2011-04-07 Thread Ian Lea
Mark - I've uploaded some code to http://pastebin.com/mqSVcWUi that indexes and searches file system paths. It demonstrates what I've been trying to suggest and may help you get your search up and running. -- Ian. On Thu, Apr 7, 2011 at 8:18 AM, Mark Wiltshire wrote: > Hi Thanks Ian for you hel

Re: Help with delimited text

2011-04-07 Thread Mark Wiltshire
Hi, thanks Ian for your help on this, it's driving me nuts :-) The StandardAnalyzer is also only used on the search query term being passed. But in this case I am just adding a filter to the search. The actual category may be /Top/Books/Accountancy/10_Compliance/Internatio

Re: Help with delimited text

2011-04-06 Thread Erick Erickson
A TermQuery is really dumb. It doesn't do anything at all to the input; it assumes you've done all that up front. Try parsing a query rather than using TermQuery. And I suspect you'll have problems with casing, but that's another story. Best Erick On Wed, Apr 6, 2011 at 6:33 AM, Mark Wilts

Re: Help with delimited text

2011-04-06 Thread Mark Wiltshire
Thanks Ian, I have managed to do that, and through Luke I get my expected results. Here is my index code now: StringTokenizer st = buildSubjectArea(dbConnection, oid); int tokenCount = 0; while (st.hasMoreTokens()) { tokenCount++;

Re: Help with delimited text

2011-04-06 Thread Ian Lea
You can add multiple values for a field to a single document. Document doc = new Document(); String[] paths = whatever.split(","); for (String p : paths) { doc.add(new Field("path", p, whatever ...)); } For searching, assuming you only want to be able to wildcard on path delimiters, you could i
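The splitting step in Ian's snippet can be sketched with the JDK alone, so it runs without Lucene on the classpath. This is only an illustration under assumptions: the class name `CategoryPaths`, the method `split`, and the sample category string are hypothetical, and the Lucene `doc.add(...)` call appears only as a comment because the field options are elided in the original message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a comma-delimited category string into the
// individual path values that would each be added to the same field.
class CategoryPaths {
    static List<String> split(String delimited) {
        List<String> paths = new ArrayList<>();
        for (String p : delimited.split(",")) {
            String trimmed = p.trim();
            // Skip empty entries produced by stray or doubled commas.
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // Sample input is made up for illustration.
        for (String p : split("/Top/Books/Accountancy, /Top/My Prods/Widgets")) {
            // In the indexing code, this is where each value would be added
            // to the document, e.g. doc.add(new Field("path", p, ...)).
            System.out.println(p);
        }
    }
}
```

Trimming each value before adding it avoids indexing paths with a leading space, which would otherwise not match an exact or prefix lookup on the path.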

Re: Help with delimited text

2011-04-05 Thread Mark Wiltshire
To add more information I am then wanting to search this field using part or all of the path using wildcards i.e. Search category_path with /Top/My Prods* Hi java-users I need some help. I am indexing categories into a single field category_path

Re: Help!

2011-03-01 Thread Lance Norskog
Check out the Mahout project: mahout.apache.org -> there is a Lucene-based text classifier project in there. Lance On Tue, Mar 1, 2011 at 9:25 PM, Sundus Hassan wrote: > I am doing MS-Thesis on content-based text categorization. > For this purpose I intend to use Lucene. I need some > help/tutori

RE: Help Overriding behavior in BooleanQuery scorer

2010-12-07 Thread Ryan Aylward
way to do that to make a copy of the classes? -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, December 07, 2010 4:57 AM To: java-user@lucene.apache.org Subject: Re: Help Overriding behavior in BooleanQuery scorer I haven't a clue about the pac

Re: Help Overriding behavior in BooleanQuery scorer

2010-12-07 Thread Erick Erickson
I haven't a clue about the package protected thing, but you may not need to go there. This sounds a lot like DisjunctionMaxQuery, have you looked at it? http://lucene.apache.org/java/3_0_2/api/all/index.html Best Erick On Tue, Dec 7, 2010 a

RE: Help with Numeric Range

2010-06-24 Thread Uwe Schindler
ilto:t...@spidertracks.co.nz] Sent: Thursday, June 24, 2010 8:26 AM To: Uwe Schindler Subject: RE: Help with Numeric Range Hey Uwe. I've implemented the same test with a RAM store, and it doesn't work. Maybe I'm doing something wrong, but the tests all appear to be in order and work the way I w

RE: Help with Numeric Range

2010-06-23 Thread Uwe Schindler
; http://www.thetaphi.de eMail: u...@thetaphi.de From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Thursday, June 24, 2010 7:36 AM To: 't...@spidertracks.co.nz' Subject: RE: Help with Numeric Range Are you sure that the term enum return the terms in correct order? For all types of R

RE: Help with Numeric Range

2010-06-23 Thread Todd Nine
Hi Uwe, Thank you for your help, it is greatly appreciated. Unfortunately, my tests all fail except for RangeInclusive. I've changed the step to be 6 as per your recommendation. I had it at max to eliminate step precision as the cause of the test failure. Essentially, all keys in Cassandra a

RE: Help with Numeric Range

2010-06-22 Thread Uwe Schindler
Hi Todd, I am not sure if I understand your problem correctly. I am not familiar with Lucandra/Cassandra at all, but if Lucandra implements the IndexWriter and IndexReader according to the documentation, numeric queries should work. A NumericField internally creates a TokenStream and "analyzes"

Re: Help wanted with Indexing PDF Documents

2010-03-02 Thread Ian Lea
Sounds like a question for the PDFBox mailing list. Once you've got the relevant info out of the PDF you can index it however you like. -- Ian. On Tue, Mar 2, 2010 at 4:11 PM, Ching Zheng wrote: > Hi, > I have about 50 PDF documents, each around 10MB in size. I am using > PDFBox for pars

RE: help customfilter with incrementToken() and AttributeSource APIs

2009-12-24 Thread Digy
The source code for LowerCaseFilter or StopFilter can be a good starting point. DIGY -Original Message- From: maxSchlein [mailto:m_schl...@hotmail.com] Sent: Thursday, December 24, 2009 7:10 PM To: java-user@lucene.apache.org Subject: help customfilter with incrementToken() and AttributeS

Re: Help me with this error on indexing

2009-11-21 Thread Michael McCandless
Are you absolutely certain you are closing the IndexReader after each iteration? That exception looks like there is still a file open... Committing after every added document is hideously inefficient (though, should not cause exceptions like what you're seeing). It's best to commit (or, simply c

Re: Help me with this error on indexing

2009-11-21 Thread Simon Willnauer
Try switching your antivirus software off if you have any. simon 2009/11/21 Fabrício Raphael : > This happened only on Windows; on Ubuntu it doesn't happen. > > And I corrected this problem by removing the commit and putting it at the end > of the addition of all documents. > > On Fri, Nov 20, 2009 at 9:14 PM,

Re: Help me with this error on indexing

2009-11-20 Thread Fabrício Raphael
This happened only on Windows; on Ubuntu it doesn't happen. And I corrected this problem by removing the commit and putting it at the end of the addition of all documents. On Fri, Nov 20, 2009 at 9:14 PM, Erick Erickson wrote: > What operating system are you running on? This sounds like Windows behavi

Re: Help me with this error on indexing

2009-11-20 Thread Erick Erickson
What operating system are you running on? This sounds like Windows behavior when some other process is holding the file open. Erick 2009/11/20 Fabrício Raphael > Hi, > > I am evaluating several search algorithms, and I iterate on each. In each > iteration I delete the index directory, index
