Fwd: How to retain % sign against numbers in lucene indexing/ search

2023-07-13 Thread Amitesh Kumar
*Warm Regards,* *Amitesh K* -- Forwarded message - From: Amitesh Kumar Date: Wed, Jul 12, 2023 at 7:03 AM Subject: How to retain % sign against numbers in lucene indexing/ search To: Hi Group, I am facing a requirement change to get % sign retained in searches. e.g. Sample
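The forwarded question asks how to keep the % sign attached to numbers. The block below is a library-free sketch of the tokenization rule an analyzer would need, not Lucene code. `PercentTokenizer` is a hypothetical helper. It splits on whitespace, lowercases, and trims surrounding punctuation while treating '%' as a token character, so "25%" stays distinct from "25". For context, Lucene's StandardTokenizer generally drops the '%', whereas WhitespaceTokenizer keeps "25%" intact but also keeps other punctuation such as "(approx.)". That trade-off is why the sketch trims everything except letters, digits and '%'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not a Lucene API.
class PercentTokenizer {
    // Split on whitespace, lowercase, and trim leading/trailing punctuation,
    // but treat '%' as part of the token so "25%" is not reduced to "25".
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            int start = 0;
            int end = raw.length();
            while (start < end && !keep(raw.charAt(start))) start++;
            while (end > start && !keep(raw.charAt(end - 1))) end--;
            if (start < end) {
                tokens.add(raw.substring(start, end).toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Characters that may appear at the edges of a token.
    private static boolean keep(char c) {
        return Character.isLetterOrDigit(c) || c == '%';
    }
}
```

The same rule must be applied at both index time and query time. Otherwise a query for "25%" will not match what was indexed.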

lucene indexing stuck with NFS storage mount

2021-05-10 Thread peterbasut...@gmail.com
Hi all, We are indexing documents with Apache Lucene using several parallel indexing pipelines (Java processes) writing to an NFS-mounted directory. All of them follow the same code and workflow. Most of the pipelines succeed without any issue, but a few indexing pipelines remain idle while in the RUN state

Deleting document from Lucene indexing not working in version 34

2017-02-22 Thread har...@oneit.com.au
View this message in context: http://lucene.472066.n3.nabble.com/Deleting-document-from-Lucene-indexing-not-working-in-version-34-tp4321911.html

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-05-23 Thread Michael McCandless
I finally dug into this, and it turns out the nightly benchmark I run had bad bottlenecks such that it couldn't feed documents quickly enough to Lucene to take advantage of the concurrent hardware in beast2. I fixed that and just re-ran the nightly run and it shows good gains: https://plus.google.

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-15 Thread Robert Muir
you won't see indexing improvements there because the dataset in question is wikipedia and mostly indexing full text. I think it may have one measly numeric field. On Thu, Apr 14, 2016 at 6:25 PM, Otis Gospodnetić wrote: (replying to my original email because I didn't get people's replies, even

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-14 Thread Otis Gospodnetić
(replying to my original email because I didn't get people's replies, even though I see in the archives people replied) Re BJ and beast2 upgrade. Yeah, I saw that, but * if there is no indexing throughput improvement after that, does that mean that those particular indexing tests happen to be

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-14 Thread Stephen Green
As someone who runs Lucene on big hardware, I'd be very interested to see the tuning parameters when you do get a chance. On Thu, Apr 14, 2016 at 3:41 PM, Michael McCandless <luc...@mikemccandless.com> wrote: Yes, dual 2699 v3, with 256 GB of RAM, yet indexing throughput somehow got slower

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-14 Thread Michael McCandless
Yes, dual 2699 v3, with 256 GB of RAM, yet indexing throughput somehow got slower :) I haven't re-tuned indexing threads, IW buffer size yet for this new hardware ... Mike McCandless http://blog.mikemccandless.com On Thu, Apr 14, 2016 at 2:09 PM, Ishan Chattopadhyaya wrote: Wow, 72 cores? T

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-14 Thread Ishan Chattopadhyaya
Wow, 72 cores? That sounds astounding. Are they dual Xeon E5 2699 v3 CPUs with 18 cores each, with hyperthreading = 18*2*2=72 threads? On Thu, Apr 14, 2016 at 11:33 PM, Dawid Weiss wrote: The GC change is after this: BJ (2015-12-02): Upgrade to beast2 (72 cores, 256 GB RAM) which leads

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-14 Thread Dawid Weiss
The GC change is after this: BJ (2015-12-02): Upgrade to beast2 (72 cores, 256 GB RAM) which leads me to believe these results are not comparable (different machines, architectures, disks, CPUs perhaps?). Dawid On Thu, Apr 14, 2016 at 7:13 PM, Otis Gospodnetić wrote: Hi, I was looking a

Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-14 Thread Otis Gospodnetić
Hi, I was looking at Mike's http://home.apache.org/~mikemccand/lucenebench/indexing.html secretly hoping to spot some recent improvements in indexing throughput but instead it looks like: * indexing throughput hasn't really gone up in the last ~5 years * indexing was faster in 2014, but then

Re: Lucene indexing speed on NVMe drive

2015-05-01 Thread Michael McCandless
Hyper-threading should help Lucene indexing go faster, when it's not IO bound ... I found 20 threads (on 12 real cores, 24 with HT) to be fastest in the nightly benchmark (http://people.apache.org/~mikemccand/lucenebench/indexing.html). But it's curious you're unable to saturate o

RE: Lucene indexing speed on NVMe drive

2015-04-30 Thread Anahita Shayesteh-SSI
AM To: java-user@lucene.apache.org Cc: Anahita Shayesteh-SSI Subject: Re: Lucene indexing speed on NVMe drive Hi. I am studying Lucene performance and in particular how it benefits from faster I/O such as SSD and NVMe. ... parameters as used in nightlyBench. (Hardware: Intel Xeon, 2.5GHz, 20 proc

Re: Lucene indexing speed on NVMe drive

2015-04-30 Thread Chris Hostetter
Hi. I am studying Lucene performance and in particular how it benefits from faster I/O such as SSD and NVMe. ... parameters as used in nightlyBench. (Hardware: Intel Xeon, 2.5GHz, 20 processors, 40 with hyperthreading, 64G Memory) and study indexing speed ... I get best performance

Lucene indexing speed on NVMe drive

2015-04-30 Thread Anahita Shayesteh-SSI
Hi. I am studying Lucene performance and in particular how it benefits from faster I/O such as SSD and NVMe. I am using nightlybench for indexing wiki (1K docs) with similar parameters as used in nightlyBench. (Hardware: Intel Xeon, 2.5GHz, 20 processors, 40 with hyperthreading, 64G Memory) and s

Re: Making lucene indexing multi threaded

2014-10-28 Thread Erick Erickson
PM, Jason Wu wrote: Hi Gary, Thanks for your response. I only call the commit when all my docs are added. Here is the procedure of my Lucene indexing and re-indexing: 1. If index data exists inside index directory, remove all the index data.

Re: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
Hi Gary, Thanks for your response. I only call the commit when all my docs are added. Here is the procedure of my Lucene indexing and re-indexing: 1. If index data exists inside index directory, remove all the index data. 2. Create IndexWriter with 256MB RAMBUFFERSIZE 3. Process

Re: Making lucene indexing multi threaded

2014-10-27 Thread G.Long
View this message in context: http://lucene.472066.n3.nabble.com/Making-lucene-indexing-multi-threaded-tp4087830p4166116.html

RE: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
n and take 22 mins. Did you have any similar experience like the above before? Thank you, Jason

RE: Making lucene indexing multi threaded

2014-10-27 Thread Fuad Efendi
@lucene.apache.org Subject: Re: Making lucene indexing multi threaded Hi Nischal, I had a similar indexing issue. My Lucene indexing took 22 mins for 70 MB of docs. When I debugged the problem, I found that indexWriter.addDocument(doc) was taking a really long time. Have you already found the solution to it

Re: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
Hi Nischal, I had a similar indexing issue. My Lucene indexing took 22 mins for 70 MB of docs. When I debugged the problem, I found that indexWriter.addDocument(doc) was taking a really long time. Have you already found the solution to it? Thank you, Jason

Lucene Indexing performance issue

2014-10-22 Thread Jason Wu
Hi Team, I am a new user of Lucene 4.8.1. I encountered a Lucene indexing performance issue which slows down my application greatly. I tried several approaches from Google searches but still couldn't resolve it. Any suggestions from you experts would help me a lot. One of my applications uses the l

[ALFRESCO] - lucene indexing

2014-08-04 Thread Tristan
Not sure if I'm in the right place, but I'm looking for help with Lucene indexing in Alfresco. It looks like indexing is turned on; however, I'm specifically having issues with not being able to query values on a custom property in a custom model. I enabled indexing on the field but

Re: query regarding Lucene Indexing and searching

2014-03-02 Thread Jack Krupansky
-user@lucene.apache.org Subject: query regarding Lucene Indexing and searching Sir, I am a PG student; my research topic is to optimize the index files [reduce index file size, RAM usage, CPU utilization, and create an index with payloads to improve search speed]. My current working scope is a Desktop

query regarding Lucene Indexing and searching

2014-03-02 Thread Mrugendra
Sir, I am a PG student; my research topic is to optimize the index files [reduce index file size, RAM usage, CPU utilization, and create an index with payloads to improve search speed]. My current working scope is a Desktop search engine. 1. I am using Lucene for indexing the PDF files [indexing file nam

Re: Making lucene indexing multi threaded

2013-09-02 Thread Danil ŢORIN
takes. 9 times out of 10, the bottleneck is here. As a comparison, I can index 3-4K docs/second on my laptop. This is using Solr and is the Wikipedia dump, so the docs are several K each. So, if you're going to multi-thread, you'll probably want

Re: Making lucene indexing multi threaded

2013-09-02 Thread nischal reddy
e data and feed that through a separate thread that actually does the indexing; you don't want multiple IndexWriters active at once. FWIW, Erick On Mon, Sep 2, 2013 at 10:13 AM, nischal reddy wrote: Hi, I am thi

Re: Making lucene indexing multi threaded

2013-09-02 Thread nischal reddy
Erick On Mon, Sep 2, 2013 at 10:13 AM, nischal reddy wrote: Hi, I am thinking of making my Lucene indexing multi-threaded. Can someone throw some light on the best approach to be followed for achieving this? I

Re: Making lucene indexing multi threaded

2013-09-02 Thread Adrien Grand
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed which gives good advice on how to improve Lucene indexing speed. -- Adrien

Re: Making lucene indexing multi threaded

2013-09-02 Thread Erick Erickson
'll probably want to multi-thread the acquisition of the data and feed that through a separate thread that actually does the indexing; you don't want multiple IndexWriters active at once. FWIW, Erick On Mon, Sep 2, 2013 at 10:13 AM, nischal reddy wrote: Hi, I am thinki
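Erick's advice is to parallelize data acquisition and feed one indexing thread, rather than opening several IndexWriters. The block below is a stdlib-only sketch of that producer/consumer shape. `SingleWriterPipeline` and `fetchAndParse` are hypothetical names. The `List` stands in for the single IndexWriter, and the comment marks where `writer.addDocument(...)` would go. A bounded queue makes the fetch threads block when the indexer falls behind. Lucene's IndexWriter is documented as thread-safe, so several threads calling `addDocument` on one shared writer is also an option. What the thread warns against is multiple writers on the same index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SingleWriterPipeline {
    // Sentinel telling the indexing thread that no more documents are coming.
    private static final String END = "\u0000END";

    // Hypothetical stand-in for slow acquisition (reading and parsing a file).
    static String fetchAndParse(String source) {
        return "doc:" + source;
    }

    static List<String> run(List<String> sources, int fetchThreads) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(64);
        // Stand-in for the one IndexWriter; only the indexer thread touches it.
        List<String> indexed = new ArrayList<>();

        Thread indexer = new Thread(() -> {
            try {
                for (String doc = queue.take(); !doc.equals(END); doc = queue.take()) {
                    indexed.add(doc); // real code: writer.addDocument(...)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        indexer.start();

        ExecutorService fetchers = Executors.newFixedThreadPool(fetchThreads);
        try {
            for (String src : sources) {
                fetchers.submit(() -> {
                    queue.put(fetchAndParse(src)); // blocks while the queue is full
                    return null;
                });
            }
            fetchers.shutdown();
            if (!fetchers.awaitTermination(1, TimeUnit.MINUTES)) {
                fetchers.shutdownNow();
                indexer.interrupt();
                throw new IllegalStateException("fetching timed out");
            }
            queue.put(END);  // all producers are done
            indexer.join();  // join() also makes 'indexed' safely visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            fetchers.shutdownNow();
            indexer.interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return indexed;
    }
}
```

Documents arrive in whatever order the fetch threads finish. That ordering is harmless for indexing but worth knowing if anything downstream assumes input order.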

Making lucene indexing multi threaded

2013-09-02 Thread nischal reddy
Hi, I am thinking of making my Lucene indexing multi-threaded. Can someone throw some light on the best approach to be followed for achieving this? I will give a short gist of what I am trying to do; please suggest the best way to tackle this. What am I trying to do? I am building an index

Re: Lucene Indexing on NFS

2012-12-19 Thread Ian Lea
Use SimpleFSLockFactory. See the javadocs about locks being left behind on abnormal JVM termination. There was a thread on this list a while ago about some pros and cons of using lucene on NFS. 2-Oct-2012 in fact. http://mail-archives.apache.org/mod_mbox/lucene-java-user/201210.mbox/thread -- I

Lucene Indexing on NFS

2012-12-19 Thread Bowden Wise
Hello, I have been getting the following lock error when attempting to open an index writer to add new documents to an index. org.apache.lucene.store.LockObtainFailedException Lock obtain timed out: NativeFSLock@/opt/shared/data/CTXTMNG/PAC_INDEX/lucene/aero/prod/index/write.lock I believe this i

Re: Many File Descriptors Which Showing As Deleted Related To Lucene Indexing, But Not Emptied

2011-07-21 Thread Michael McCandless
This is expected, when you have a reader still open on a point-in-time snapshot of the index, yet the writer is still indexing/merging. The writer will delete old files, but the reader still has them open, so you see those "(deleted)" entries in the lsof output. Mike McCandless http://blog.mikem

Re: Lucene indexing & Searching

2011-06-08 Thread Pranav goyal
Oh sorry, I found my error and it worked. Thanks On Wed, Jun 8, 2011 at 3:57 PM, Pranav goyal wrote: import java.io.File; import java.io.IOException; import java.util.Collection; import java.util.Iterator; import java.util.List; import java.util.Map; import org.apache.lucene.analysi

Lucene indexing & Searching

2011-06-08 Thread Pranav goyal
import java.io.File; import java.io.IOException; import java.util.Collection; import java.util.Iterator; import java.util.List; import java.util.Map; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.document.Document;

Re: Lucene Indexing

2011-06-07 Thread bmdakshinamur...@gmail.com
If I understand the requirement correctly, a contract is one document in your system, which in turn contains some *n* fields. Contract_ID is the key in all the documents (structures). Contract_ID is the only field you want to retrieve, no matter which field you search on. If this is the case, store

Lucene Indexing

2011-06-06 Thread Pranav goyal
Hi all, I got stuck and can't figure out what I should do. I have one structure which I have to index. Let's say the structure name is Contract, which has a unique Contract_ID. Let's say I have 50 contracts which I have to index. Now each contract has, let's say, 100 different keys with their va

Re: Lucene Indexing

2011-06-06 Thread Anshum
Yes, you'd need to delete the document and then re-add a newly created document object. You may use the key and delete the doc using the Term(key, value). -- Anshum Gupta http://ai-cafe.blogspot.com On Mon, Jun 6, 2011 at 4:45 PM, Pranav goyal wrote: Hi Anshum, Thanks for answering my que

Re: Lucene Indexing

2011-06-06 Thread Pranav goyal
Hi Anshum, Thanks for answering my question. By this I got to know that I cannot update without deleting my document. So whenever I am indexing the documents first I need to check whether the particular key exists in the document or not and if it exists I need to delete it and add the updated one

Re: Lucene Indexing

2011-06-06 Thread Anshum
Hi Pranav, From what you've mentioned, it looks like you want to modify a particular document (or all docs) by adding a particular field to the document(s). As of right now, it's not possible to modify a document inside a Lucene index. That is due to the way the index is structured. The only way as

Lucene Indexing

2011-06-05 Thread Pranav goyal
Hi all, I am a newbie to Lucene. I have successfully created my Lucene index, but I don't understand how to invalidate previous indexes whenever I add/delete/update any field in my Lucene index. Please help me out. For better understanding, here is my indexing function: StandardAnalyzer analy

Re: lucene indexing configuration

2010-08-20 Thread Shuai Weng
- Original Message From: Shuai Weng To: java-user@lucene.apache.org Sent: Fri, August 20, 2010 5:47:31 PM Subject: Re: lucene indexing configuration Hey, Currently we have indexed some biological

Re: lucene indexing configuration

2010-08-20 Thread Otis Gospodnetic
- Original Message From: Shuai Weng To: java-user@lucene.apache.org Sent: Fri, August 20, 2010 5:47:31 PM Subject: Re: lucene indexing configuration Hey, Currently we have indexed some biological full text pages, I wa

Re: lucene indexing configuration

2010-08-20 Thread Shuai Weng
Hey, Currently we have indexed some biological full-text pages. I was wondering how to configure the schema.xml such that the gene names 'met1', 'met2', 'met3' will be treated as different words. Currently they are all mapped to 'met'. Thanks, Shuai

Re: Lucene Indexing out of memory

2010-03-15 Thread Michael McCandless
at 8:27 AM, ajay_gupta wrote: Hi, It might be a general question, but I couldn't find the answer yet. I

Re: Lucene Indexing out of memory

2010-03-14 Thread ajay_gupta
ve around 90k documents sizing around 350 MB. Each document contains a record which has some text content. For each word in this text I want to store context for that word a

Re: Lucene Indexing out of memory

2010-03-14 Thread ajay_gupta
n of this method and I observed that after each call of update_context, memory increases, and when it reaches around 65-70k it goes out of memory, so somewhere memory is increasing

Re: Lucene Indexing out of memory

2010-03-04 Thread Michael McCandless
ystem.gc() to release memory, and I also tried various other parameters like context_writer.setMaxBufferedDocs(), context_writer.setMaxMergeDocs(), context_writer.setRAMBufferSizeMB(). I set these parame

Re: Lucene Indexing out of memory

2010-03-04 Thread Ian Lea
rameters to smaller values as well, but nothing worked. Any hint will be very helpful. Thanks Ajay Michael McCandless-2 wrote: The worst case RAM usage for Lucene i

Re: Lucene Indexing out of memory

2010-03-03 Thread ajay_gupta
ument -- it must flush after the doc has been fully indexed. This past thread (also from Paul) delves into some of the details: http://lucene.markmail.org/thread/pbeidtepentm6mdn But it's not clear whe

Re: Lucene Indexing out of memory

2010-03-03 Thread Erick Erickson
more details about the docs, or some code fragments, could help shed light. Mike On Tue, Mar 2, 2010 at 8:47 AM, Murdoch, Paul wrote: Ajay, Here is another thread I started on the same issue.

Re: Lucene Indexing out of memory

2010-03-03 Thread ajay_gupta
e. http://stackoverflow.com/questions/1362460/why-does-lucene-cause-oom-when-indexing-large-files Paul -Original Message- From: java-user-return-45254-paul.b.murdoch=saic@lucene.apache.org [mailto

Re: Lucene Indexing out of memory

2010-03-03 Thread Michael McCandless
5254-paul.b.murdoch=saic@lucene.apache.org [mailto:java-user-return-45254-paul.b.murdoch=saic@lucene.apache.org] On Behalf Of ajay_gupta Sent: Tuesday, March 02, 2010 8:28 AM To: java-user@lucene.apache.org Subject: Lucene Indexing out of memory Hi,

Re: Lucene Indexing out of memory

2010-03-03 Thread Erick Erickson
and for each word in that document I am appending a fixed number of surrounding words. To do that, first I search in existing indices if this word already

Re: Lucene Indexing out of memory

2010-03-03 Thread Ian Lea
d the new context and update the document. In case no context exists, I create a document with fields "word" and "context" and add these two fields with values as word value and

Re: Lucene Indexing out of memory

2010-03-03 Thread ajay_gupta
fields "word" and "context" and add these two fields with values as word value and context value. I tried this in RAM but after a certain number of docs it gave an out of memory

Re: Lucene Indexing out of memory

2010-03-02 Thread Erick Erickson
error, so I thought to use the FSDirectory method, but surprisingly after 70k documents it also gave an OOM error. I have enough disk space but still I am getting this error. I am not sure why, even for disk b

Re: Lucene Indexing out of memory

2010-03-02 Thread Ian Lea
documents it also gave an OOM error. I have enough disk space but still I am getting this error. I am not sure why, even for disk-based indexing, it gives this error. I thought disk-based indexing would be slow but at least it would be

Re: Lucene Indexing out of memory

2010-03-02 Thread ajay_gupta
disk space but still I am getting this error. I am not sure why, even for disk-based indexing, it gives this error. I thought disk-based indexing would be slow but at least it would be scalable. Could so

RE: Lucene Indexing out of memory

2010-03-02 Thread Murdoch, Paul
-paul.b.murdoch=saic@lucene.apache.org ] On Behalf Of ajay_gupta Sent: Tuesday, March 02, 2010 8:28 AM To: java-user@lucene.apache.org Subject: Lucene Indexing out of memory Hi, It might be a general question, but I couldn't find the answer yet. I have around 90k documents sizing around 3

RE: Lucene Indexing out of memory

2010-03-02 Thread Murdoch, Paul
cene.apache.org ] On Behalf Of ajay_gupta Sent: Tuesday, March 02, 2010 8:28 AM To: java-user@lucene.apache.org Subject: Lucene Indexing out of memory Hi, It might be a general question, but I couldn't find the answer yet. I have around 90k documents sizing around 350 MB. Each document con

Re: Lucene Indexing out of memory

2010-03-02 Thread Erick Erickson
0k documents it also gave an OOM error. I have enough disk space but still I am getting this error. I am not sure why, even for disk-based indexing, it gives this error. I thought disk-based indexing would be slow but at least it would be scalable. Could someone suggest what could b

Lucene Indexing out of memory

2010-03-02 Thread ajay_gupta
ave enough disk space but still I am getting this error. I am not sure why, even for disk-based indexing, it gives this error. I thought disk-based indexing would be slow but at least it would be scalable. Could someone suggest what could be the issue? Thanks Ajay

Re: What is the best way to handle the primary key case during lucene indexing

2009-11-16 Thread Erick Erickson
nd field2? Thanks Date: Mon, 16 Nov 2009 09:44:35 -0800 Subject: Re: What is the best way to handle the primary key case during lucene indexing From: jake.man...@gmail.com To: java-user@lucene.apache

Re: What is the best way to handle the primary key case during lucene indexing

2009-11-16 Thread Jake Mannix
09 09:44:35 -0800 Subject: Re: What is the best way to handle the primary key case during lucene indexing From: jake.man...@gmail.com To: java-user@lucene.apache.org The usual way to do this is to use: IndexWriter.upd

RE: What is the best way to handle the primary key case during lucene indexing

2009-11-16 Thread java8964 java8964
ry key case during lucene indexing From: jake.man...@gmail.com To: java-user@lucene.apache.org The usual way to do this is to use: IndexWriter.updateDocument(Term, Document) This method deletes all documents with the given Term in it (this

RE: What is the best way to handle the primary key case during lucene indexing

2009-11-16 Thread java8964 java8964
What I mean is that, for one index, the client can define multiple fields in the index as the primary key (composite key). Date: Mon, 16 Nov 2009 12:45:40 -0500 Subject: Re: What is the best way to handle the primary key case during lucene indexing From: erickerick...@gm
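The thread recommends `IndexWriter.updateDocument(Term, Document)`, which takes a single Term. A composite key therefore has to be reduced to one value somehow. One common approach, assumed here rather than taken from the thread, is to fold the key fields into one extra untokenized field. The sketch below builds that value. `CompositeKey` and the field names are hypothetical. The separator and escape character are escaped so that different field values can never produce the same key.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not a Lucene API.
class CompositeKey {
    // Joins the primary-key field values, in a fixed order, into one string.
    // '\' and '|' inside values are escaped, so ("eu|west", "7") and
    // ("eu", "west|7") produce different keys.
    static String of(Map<String, String> doc, String... keyFields) {
        StringBuilder sb = new StringBuilder();
        for (String field : keyFields) {
            String value = doc.get(field);
            if (value == null) {
                throw new IllegalArgumentException("missing key field: " + field);
            }
            sb.append(value.replace("\\", "\\\\").replace("|", "\\|")).append('|');
        }
        return sb.toString();
    }
}
```

With Lucene, the resulting string would be indexed untokenized (`Field.Index.NOT_ANALYZED` in 2.9, `StringField` from 4.0 on) under a hypothetical field such as `_pk`. It would then be passed as `new Term("_pk", key)` to `updateDocument`, so the Term matches the stored value exactly.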

Re: What is the best way to handle the primary key case during lucene indexing

2009-11-16 Thread Erick Erickson
Sorry, forgot to add "then re-add the documents in question". On Mon, Nov 16, 2009 at 12:45 PM, Erick Erickson wrote: What is the form of the unique key? I'm a bit confused here by your comment: "which can contain one or multi fields". But it seems like IndexWriter.deleteDocuments shoul

Re: What is the best way to handle the primary key case during lucene indexing

2009-11-16 Thread Erick Erickson
What is the form of the unique key? I'm a bit confused here by your comment: "which can contain one or multi fields". But it seems like IndexWriter.deleteDocuments should work here. It's easy if your PKs are single terms, there's even a deleteDocuments(Term[]) form. But this really *requires* that

Re: What is the best way to handle the primary key case during lucene indexing

2009-11-16 Thread Jake Mannix
The usual way to do this is to use: IndexWriter.updateDocument(Term, Document) This method deletes all documents with the given Term in it (this would be your primary key), and then adds the Document you want to add. This is the traditional way to do updates, and it is fast. -jake On Mo
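Jake's description of `updateDocument` (delete every document containing the Term, then add the new one) can be illustrated with the toy in-memory model below. It is not Lucene: `ToyIndex` is a hypothetical class that only mirrors the documented contract. The model shows that *all* documents carrying the key are removed, not just the first. In real Lucene, the delete and add are also atomic as seen by readers on the same index, and this toy does not model that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy in-memory model, not Lucene; it only illustrates delete-then-add.
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    // Mirrors the contract of IndexWriter.updateDocument(Term, Document):
    // remove every document whose 'field' equals 'value', then add 'doc'.
    void updateDocument(String field, String value, Map<String, String> doc) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.add(doc);
    }

    long count(String field, String value) {
        return docs.stream().filter(d -> value.equals(d.get(field))).count();
    }
}
```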

What is the best way to handle the primary key case during lucene indexing

2009-11-16 Thread java8964 java8964
Hi, In our application, we will allow the user to create a primary key defined in the document. We are using lucene 2.9. In this case, when we index the data coming from the client, if the metadata contains the primary key defined, we have to do the search/update for every row based on the pr

How to make wordDelimiterFilter[pulled from Solr nighly] to not break non-english words in a wrong way in lucene indexing/searching?

2009-06-08 Thread KK
Hi All, I'm trying to index some Indian web page content which is basically a mix of Indian-language text and, say, 5% English content on the same page. For this I cannot use the standard or simple analyzer, as they break the non-English words in the wrong places, say [because the isLetter(ch) happens to be

Re: Lucene Indexing and Search Policy

2009-01-21 Thread Anshum
It's about building a custom Similarity class that scores using your own normalization factors, etc. This might help in that case: http://www.gossamer-threads.com/lists/lucene/java-user/69553 -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com

Re: Lucene Indexing and Search Policy

2009-01-21 Thread M Seetha Ramaiah
Hi Anshum, Even that document says that higher frequency implies a higher score. My question is: if the score is based only on frequency, won't it be inappropriate for Internet-based search? For example, if Google did the same thing, when I search for "Microsoft", there is a chance that Google

Re: Lucene Indexing and Search Policy

2009-01-21 Thread Anshum
Hi msr, Perhaps this could be useful for you. In short, Lucene implements a modified vector space model. http://jayant7k.blogspot.com/2006/07/document-scoringcalculating-relevance_08.html -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com
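Here is a worked reference for the question of whether Lucene scores on frequency alone. The classic "practical scoring function" documented for Lucene's TF-IDF similarity (DefaultSimilarity, the default up to Lucene 5.x; Lucene 6.0 switched the default to BM25) is reproduced below. Raw term frequency is damped by a square root and combined with inverse document frequency (rare terms weigh more), a length norm (short fields weigh more), query and index-time boosts, and a coordination factor that rewards matching more query terms. Nothing in the formula measures inbound links. A link-based signal such as the reference count in the question would have to be supplied by the application, for example as a boost. That is an assumption about one possible approach, not something the thread specifies.

```latex
\begin{aligned}
\mathrm{score}(q,d) &= \mathrm{coord}(q,d)\cdot \mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \Big( \mathrm{tf}(t \in d)\cdot \mathrm{idf}(t)^{2}\cdot \mathrm{boost}(t)\cdot \mathrm{norm}(t,d) \Big) \\
\mathrm{tf}(t \in d) &= \sqrt{\mathrm{freq}(t,d)} \\
\mathrm{idf}(t) &= 1 + \log\!\left(\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t) + 1}\right) \\
\mathrm{lengthNorm}(f) &= \frac{1}{\sqrt{\mathrm{numTerms}(f)}}
\end{aligned}
```

Here norm(t,d) folds the field's lengthNorm together with any document and field boosts applied at index time.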

Lucene Indexing and Search Policy

2009-01-21 Thread MSR
Hi, Does Lucene take into consideration anything other than the frequency of the query words in a document? If it does, what are the other considerations? If it is purely based on word frequency, is it appropriate for Internet-based search (where we also need to consider the reference count)? Th

Software Announcement: LuSql: Database to Lucene indexing

2008-11-17 Thread Glen Newton
LuSql is a simple but powerful tool for building Lucene indexes from relational databases. It is a command-line Java application for the construction of a Lucene index from an arbitrary SQL query of a JDBC-accessible SQL database. It allows a user to control a number of parameters, including the SQ

RE: Lucene Indexing DB records?

2008-08-22 Thread John Griffin
Try Hibernate Search - http://www.hibernate.org/410.html John G. -Original Message- From: ??? [mailto:[EMAIL PROTECTED] Sent: Friday, August 22, 2008 3:27 AM To: java-user@lucene.apache.org Subject: Lucene Indexing DB records? Guess I don't quite understand why there are so few

Re: Lucene Indexing DB records?

2008-08-22 Thread Marcelo Ochoa
Actually there are many projects for Lucene + Database. Here is a list I know: * Hibernate Search * Compass (also Hibernate + Lucene) * Solr + DataImportHandler (Searching + Crawler) * DBSight (specific for databases, closed source, but very customizable, easy to set up) * Browse

Re: Lucene Indexing DB records?

2008-08-22 Thread Chris Lu
shopping comparison site, (anonymous per request) got 2.6 Million Euro funding! On Fri, Aug 22, 2008 at 2:26 AM, ??? <[EMAIL PROTECTED]> wrote: Guess I don't quite understand why there are so few posts about Lucene indexing DB records. Searched Markmail, but most of the Lucene+DB p

Re: Lucene Indexing DB records?

2008-08-22 Thread Shalin Shekhar Mangar
You might also want to look at Solr and DataImportHandler. http://lucene.apache.org/solr http://wiki.apache.org/solr/DataImportHandler On Fri, Aug 22, 2008 at 2:56 PM, ??? <[EMAIL PROTECTED]> wrote: Guess I don't quite understand why there are so few posts about Lucene in

Lucene Indexing DB records?

2008-08-22 Thread ???
Guess I don't quite understand why there are so few posts about Lucene indexing DB records. Searched Markmail, but most of the Lucene+DB posts have to do with lucene index management. The only thing I found so far is the following, if you have a minute or two: http://kalanir.blogspot.com

Re: Lucene Indexing structure

2008-05-04 Thread Grant Ingersoll
Would a Function Query (ValueSourceQuery, see the org.apache.lucene.search.function package) work in this case? -Grant On May 4, 2008, at 9:35 AM, Vaijanath N. Rao wrote: Hi Chris, Sorry for the cross-posting and also for not making the problem clear. Let me try to explain the problem at

Re: Lucene Indexing structure

2008-05-04 Thread Vaijanath N. Rao
Hi Chris, Sorry for the cross-posting and also for not making the problem clear. Let me try to explain the problem at hand. I am trying to write a CBIR (Content Based Image Retrieval) framework using Lucene. Each document has entities such as title, description, author and so on. I a

Re: Lucene Indexing structure

2008-05-02 Thread Glen Newton
Vaijanath, I think I would do things in a different fashion: Lucene's default distance metric is based on tf/idf and the cosine model, i.e. the frequencies of items. I believe the values that you are adding as Fields are the values in n-space for each of these image-based attributes. I don't believe

Re: Lucene Indexing structure

2008-05-02 Thread Chris Hostetter
Hi Lucene-user and Lucene-dev, Please do not cross post -- java-user is the suitable place for your question. Obviously there is something wrong with the above approach (as to get the correct document we need to get all the documents and then do the required distance calculation), but t

Lucene Indexing structure

2008-04-26 Thread Vaijanath N. Rao
Hi Lucene-user and Lucene-dev, I want to use Lucene as a backend for image search (content-based image retrieval). Indexing Mechanism: a) Get the Image properties such as Texture Tamura (TT), Texture Edge Histogram (TE), Color Coherence Vector (CCV) and Color Histogram (CH) and Color Co

MapReduce usage with Lucene Indexing

2008-01-24 Thread roger dimitri
Hi, I am very new to Lucene & Hadoop, and I have a project where I need to use Lucene to index some input given either as a huge collection of Java objects or one huge Java object. I read about Hadoop's MapReduce utilities and I want to leverage that feature in my case described above.

Re: lucene indexing doubts

2007-10-26 Thread mark harwood
TED]> To: java-user@lucene.apache.org Sent: Friday, 26 October, 2007 5:31:36 AM Subject: Re: lucene indexing doubts Hi, thanks for your response. I think you haven't understood my question. I will explain with an example. I have a folder which contains the indexed files. So, suppose i

Re: lucene indexing doubts

2007-10-26 Thread Karl Wettin
On 26 Oct 2007, at 06:31, poojasreejith wrote: I have a folder which contains the indexed files. So, suppose I want to add one more file's index data to it without deleting the whole folder and re-indexing all the files. I want it to index only that one file and add the i

Re: lucene indexing doubts

2007-10-25 Thread poojasreejith
View this message in context: http://www.nabble.com/lucene-indexing-doubts-tf4692435.html#a13420712

Re: lucene indexing doubts

2007-10-25 Thread Karl Wettin
On 25 Oct 2007, at 19:35, poojasreejith wrote: Can any of you guide me on how to index into an already indexed folder? Right now, I am deleting the indexed info and running the indexer again. I don't want to do that. I want a way to append to the same folder when new files are ind

lucene indexing doubts

2007-10-25 Thread poojasreejith
d any solution for it. Pooja

Re: Lucene indexing error

2007-10-08 Thread Narendra yadala
Thanks very much for the information. I did not include the other portion of the stack trace because it belonged entirely to the Jackrabbit library. Now I guess the problem is due to the fact that Jackrabbit's latest version is using Lucene 2.0 for its indexing purposes. So I will search for some patch

Re: Lucene indexing error

2007-10-08 Thread Chris Hostetter
I think this bug is related to the one posted on Lucene JIRA: http://issues.apache.org/jira/browse/LUCENE-665 Please let me know if there is any solution to this bug of Lucene. Note that the issue is "Closed, Resolution: Won't Fix"; it was determined that ultimately there was no bug in Luce

Re: Lucene indexing error

2007-10-08 Thread Narendra yadala
But then the core problem is that the index that is created is in a totally corrupted state. So deleting or keeping the lock does not make a difference as the Index itself is not created properly. The problem arises when the index is getting created itself. Regards Narendra On 10/8/07, saikrishn

Re: Lucene indexing error

2007-10-08 Thread saikrishna venkata pendyala
Lucene creates a lock on the index before using it and then unlocks the index after using it. If Lucene is interrupted and closed by force, the index remains in a locked state and cannot be used. Generally, on Linux, the Lucene lock information file is created in the /tmp directory. Delete the lock

Re: Lucene indexing error

2007-10-08 Thread Narendra yadala
I think this bug is related to the one posted on Lucene JIRA: http://issues.apache.org/jira/browse/LUCENE-665 Please let me know if there is any solution to this bug of Lucene. Thanks Narendra On 10/8/07, Joe Attardi <[EMAIL PROTECTED]> wrote: On 10/8/07, Narendra yadala <[EMAIL PROTECTED]>

Re: Lucene indexing error

2007-10-08 Thread Joe Attardi
On 10/8/07, Narendra yadala <[EMAIL PROTECTED]> wrote: I do have permission to access Lucene files. They reside on my local machine. But still this is giving the error. I am using the Windows XP operating system. Well, since you are opening an IndexReader (as evidenced by your stack trace)
