Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-05-23 Thread Michael McCandless
I finally dug into this, and it turns out the nightly benchmark I run had bad bottlenecks such that it couldn't feed documents quickly enough to Lucene to take advantage of the concurrent hardware in beast2. I fixed that and just re-ran the nightly run and it shows good gains: https://plus.google.

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-15 Thread Robert Muir
you won't see indexing improvements there because the dataset in question is wikipedia and mostly indexing full text. I think it may have one measly numeric field. On Thu, Apr 14, 2016 at 6:25 PM, Otis Gospodnetić wrote: > (replying to my original email because I didn't get people's replies, even

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-14 Thread Otis Gospodnetić
(replying to my original email because I didn't get people's replies, even though I see in the archives people replied) Re BJ and beast2 upgrade. Yeah, I saw that, but * if there is no indexing throughput improvement after that, does that mean that those particular indexing tests happen to be

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-14 Thread Stephen Green
As someone who runs Lucene on big hardware, I'd be very interested to see the tuning parameters when you do get a chance. On Thu, Apr 14, 2016 at 3:41 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > Yes, dual 2699 v3, with 256 GB of RAM, yet indexing throughput somehow > got slower

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-14 Thread Michael McCandless
Yes, dual 2699 v3, with 256 GB of RAM, yet indexing throughput somehow got slower :) I haven't re-tuned indexing threads, IW buffer size yet for this new hardware ... Mike McCandless http://blog.mikemccandless.com On Thu, Apr 14, 2016 at 2:09 PM, Ishan Chattopadhyaya wrote: > Wow, 72 cores? T

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-14 Thread Ishan Chattopadhyaya
Wow, 72 cores? That sounds astounding. Are they dual Xeon E5 2699 v3 CPUs with 18 cores each, with hyperthreading = 18*2*2=72 threads? On Thu, Apr 14, 2016 at 11:33 PM, Dawid Weiss wrote: > The GC change is after this: > > BJ (2015-12-02): Upgrade to beast2 (72 cores, 256 GB RAM) > > which leads

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-14 Thread Dawid Weiss
The GC change is after this: BJ (2015-12-02): Upgrade to beast2 (72 cores, 256 GB RAM) which leads me to believe these results are not comparable (different machines, architectures, disks, CPUs perhaps?). Dawid On Thu, Apr 14, 2016 at 7:13 PM, Otis Gospodnetić wrote: > Hi, > > I was looking a

Re: Lucene indexing speed on NVMe drive

2015-05-01 Thread Michael McCandless
Hyper-threading should help Lucene indexing go faster, when it's not IO bound ... I found 20 threads (on 12 real cores, 24 with HT) to be fastest in the nightly benchmark (http://people.apache.org/~mikemccand/lucenebench/indexing.html). But it's curious you're unable to saturate one of CPU or IO,

RE: Lucene indexing speed on NVMe drive

2015-04-30 Thread Anahita Shayesteh-SSI
AM To: java-user@lucene.apache.org Cc: Anahita Shayesteh-SSI Subject: Re: Lucene indexing speed on NVMe drive : Hi. I am studying Lucene performance and in particular how it benefits from faster I/O such as SSD and NVMe. : parameters as used in nightlyBench. (Hardware: Intel Xeon, 2.5GHz, 20 : proc

Re: Lucene indexing speed on NVMe drive

2015-04-30 Thread Chris Hostetter
: Hi. I am studying Lucene performance and in particular how it benefits from faster I/O such as SSD and NVMe. : parameters as used in nightlyBench. (Hardware: Intel Xeon, 2.5GHz, 20 : processor ,40 with hyperthreading, 64G Memory) and study indexing speed ... : I get best performance

Re: Lucene Indexing on NFS

2012-12-19 Thread Ian Lea
Use SimpleFSLockFactory. See the javadocs about locks being left behind on abnormal JVM termination. There was a thread on this list a while ago about some pros and cons of using lucene on NFS. 2-Oct-2012 in fact. http://mail-archives.apache.org/mod_mbox/lucene-java-user/201210.mbox/thread -- I

Re: Lucene indexing & Searching

2011-06-08 Thread Pranav goyal
Oh sorry, I found my error and it works now. Thanks. On Wed, Jun 8, 2011 at 3:57 PM, Pranav goyal wrote: > import java.io.File; > import java.io.IOException; > import java.util.Collection; > import java.util.Iterator; > import java.util.List; > import java.util.Map; > > import org.apache.lucene.analysi

Re: Lucene Indexing

2011-06-07 Thread bmdakshinamur...@gmail.com
If I understand the requirement correctly, a contract is one document in your system, which in turn contains some 'n' fields. Contract_ID is the key in all the documents (structures). Contract_ID is the only field you want to retrieve, no matter which field you search on. If this is the case, store

Re: Lucene Indexing

2011-06-06 Thread Anshum
Yes, You'd need to delete the document and then re-add a newly created document object. You may use the key and delete the doc using the Term(key, value). -- Anshum Gupta http://ai-cafe.blogspot.com On Mon, Jun 6, 2011 at 4:45 PM, Pranav goyal wrote: > Hi Anshum, > > Thanks for answering my que
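
The delete-then-re-add approach described in this reply can be illustrated with a toy model. This is only a sketch: `ToyIndex` and its methods are hypothetical stand-ins and not Lucene's API, and the field name `Contract_ID` is borrowed from the neighbouring message. It assumes documents cannot be edited in place, so an "update" removes every document matching the key and appends the new version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy model only: this is NOT Lucene's API. It mimics an index in which a
// stored document cannot be modified in place.
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Append a copy of the document to the index.
    void addDocument(Map<String, String> doc) {
        docs.add(new LinkedHashMap<>(doc));
    }

    // Remove every document whose field equals value; returns how many were removed.
    int deleteDocuments(String field, String value) {
        Objects.requireNonNull(value, "value");
        int before = docs.size();
        docs.removeIf(d -> value.equals(d.get(field)));
        return before - docs.size();
    }

    // "Update" = delete by key, then add the newly built document.
    void updateDocument(String keyField, Map<String, String> newDoc) {
        deleteDocuments(keyField, newDoc.get(keyField));
        addDocument(newDoc);
    }

    int size() {
        return docs.size();
    }

    // Look up one field of the first document whose key field matches keyValue.
    String get(String keyField, String keyValue, String field) {
        for (Map<String, String> d : docs) {
            if (keyValue.equals(d.get(keyField))) {
                return d.get(field);
            }
        }
        return null;
    }
}
```

In Lucene itself, `IndexWriter.updateDocument(Term, Document)` performs the same delete-by-term and add as a single call, so there is no need to check first whether the key already exists.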

Re: Lucene Indexing

2011-06-06 Thread Pranav goyal
Hi Anshum, Thanks for answering my question. By this I got to know that I cannot update without deleting my document. So whenever I am indexing the documents first I need to check whether the particular key exists in the document or not and if it exists I need to delete it and add the updated one

Re: Lucene Indexing

2011-06-06 Thread Anshum
Hi Pranav, From what you've mentioned, it looks like you want to modify a particular document (or all docs) by adding a particular field to the document(s). As of right now, it's not possible to modify a document inside a Lucene index. That is due to the way the index is structured. The only way as

Re: lucene indexing configuration

2010-08-20 Thread Shuai Weng
://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Shuai Weng To: java-user@lucene.apache.org Sent: Fri, August 20, 2010 5:47:31 PM Subject: Re: lucene indexing configuration Hey, Currently we have indexed some biological

Re: lucene indexing configuration

2010-08-20 Thread Otis Gospodnetic
://search-lucene.com/ - Original Message > From: Shuai Weng > To: java-user@lucene.apache.org > Sent: Fri, August 20, 2010 5:47:31 PM > Subject: Re: lucene indexing configuration > > > Hey, > > Currently we have indexed some biological full text pages, I wa

Re: lucene indexing configuration

2010-08-20 Thread Shuai Weng
Hey, Currently we have indexed some biological full text pages, I was wondering how to config the schema.xml such that the gene names 'met1', 'met2', 'met3' will be treated as different words. Currently they are all mapped to 'met'. Thanks, Shuai

Re: Lucene Indexing out of memory

2010-03-15 Thread Michael McCandless
Try the ideas here? http://wiki.apache.org/lucene-java/ImproveIndexingSpeed Mike On Mon, Mar 15, 2010 at 1:51 AM, ajay_gupta wrote: > > Erick, > I did get some hint for my problem. There was a bug in the code which was > eating up the memory which I figured out after lot of effort. > Thanks

Re: Lucene Indexing out of memory

2010-03-14 Thread ajay_gupta
Erick, I did get some hints for my problem. There was a bug in the code that was eating up the memory, which I figured out after a lot of effort. Thanks to all of you for your suggestions. But I still feel it takes a lot of time to index documents. It's taking around an hour or more to index 330 MB f

Re: Lucene Indexing out of memory

2010-03-14 Thread ajay_gupta
Hi Michael and others, I did get some hints for my problem. There was a bug in the code that was eating up the memory, which I figured out after a lot of effort. Thanks to all of you for your suggestions. Regards Ajay Michael McCandless-2 wrote: > > I agree, memory profiler or heap dump or small

Re: Lucene Indexing out of memory

2010-03-04 Thread Michael McCandless
I agree, memory profiler or heap dump or small test case is the next step... the code looks fine. This is always a single thread adding docs? Are you really certain that the iterator only iterates over 2500 docs? What analyzer are you using? Mike On Thu, Mar 4, 2010 at 4:50 AM, Ian Lea wrote:

Re: Lucene Indexing out of memory

2010-03-04 Thread Ian Lea
Have you run it through a memory profiler yet? Seems the obvious next step. If that doesn't help, cut it down to the simplest possible self-contained program that demonstrates the problem and post it here. -- Ian. On Thu, Mar 4, 2010 at 6:04 AM, ajay_gupta wrote: > > Erick, > w_context and c

Re: Lucene Indexing out of memory

2010-03-03 Thread ajay_gupta
Erick, w_context and context_str are local to this method and are used only for 2500 K documents not entire 70 k. I am clearing the hashmap after each 2500k doc processing and also I printed memory consumed by hashmap which is kind of constant for each chunk processing. For each invocation of up

Re: Lucene Indexing out of memory

2010-03-03 Thread Erick Erickson
The first place I'd look is how big your strings got. w_context and context_str come to mind. My first suspicion is that you're building ever-longer strings and around 70K documents your strings are large enough to produce OOMs. FWIW Erick On Wed, Mar 3, 2010 at 1:09 PM, ajay_gupta wrote: >

Re: Lucene Indexing out of memory

2010-03-03 Thread ajay_gupta
Mike, Actually my documents are very small in size. We have csv files where each record represents a document which is not very large so I don't think document size is an issue. For each record I am tokenizing it and for each token I am keeping 3 neighbouring tokens in a Hashtable. After X number

Re: Lucene Indexing out of memory

2010-03-03 Thread Michael McCandless
The worst case RAM usage for Lucene is a single doc with many unique terms. Lucene allocates ~60 bytes per unique term (plus space to hold that term's characters = 2 bytes per char). And, Lucene cannot flush within one document -- it must flush after the doc has been fully indexed. This past thr
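
The rule of thumb in this message (about 60 bytes per unique term, plus 2 bytes per character of the term text) can be turned into a quick back-of-the-envelope estimate. The sketch below assumes exactly those two constants from the post; the class and method names are made up for illustration, and the real overhead will differ between Lucene versions.

```java
// Rough worst-case estimate of indexing RAM for one document, using the
// figures quoted in the post. Names here are illustrative, not Lucene API.
class TermRamEstimate {
    static final long BYTES_PER_UNIQUE_TERM = 60; // "~60 bytes per unique term"
    static final long BYTES_PER_CHAR = 2;         // "2 bytes per char"

    // uniqueTerms: number of distinct terms in the document
    // totalTermChars: sum of the lengths of those distinct terms
    static long estimateBytes(long uniqueTerms, long totalTermChars) {
        return uniqueTerms * BYTES_PER_UNIQUE_TERM + totalTermChars * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        long uniqueTerms = 1_000_000L; // hypothetical document with a million distinct terms
        long averageLength = 8;        // hypothetical average term length in chars
        long bytes = estimateBytes(uniqueTerms, uniqueTerms * averageLength);
        System.out.println(bytes + " bytes"); // 76000000 bytes
    }
}
```

Under these assumptions a single document with a million distinct 8-character terms needs roughly 76 MB before Lucene can flush, because a flush cannot happen in the middle of a document.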

Re: Lucene Indexing out of memory

2010-03-03 Thread Erick Erickson
Interpolating from your data (and, by the way, some code examples would help a lot), if you're reopening the index reader to pick up recent additions but not closing it if a different one is returned from reopen, you'll consume resources. From the JavaDocs... IndexReader new = r.reopen(); if (ne
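
The resource leak described here (a refreshed reader is returned, but the old one is never closed) can be shown with a small stand-in. Everything in this sketch is hypothetical: `ToyReader` is not Lucene's `IndexReader`, and the open-reader counter exists only to make the leak visible. It assumes, as the message says, that a reopen call returns either the same instance or a different, newer one.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an index reader; NOT Lucene's IndexReader.
class ToyReader {
    static final AtomicInteger OPEN = new AtomicInteger();          // readers currently open
    static final AtomicInteger INDEX_VERSION = new AtomicInteger(); // bumped on each "commit"

    private final int version;
    private boolean closed;

    ToyReader() {
        this.version = INDEX_VERSION.get();
        OPEN.incrementAndGet();
    }

    // Returns this reader if the index is unchanged, otherwise a new reader.
    ToyReader reopen() {
        return version == INDEX_VERSION.get() ? this : new ToyReader();
    }

    // Releases the reader's resources; safe to call more than once.
    void close() {
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

class ReaderRefresher {
    // Swap in the refreshed reader and release the old one only if it was replaced.
    static ToyReader refresh(ToyReader current) {
        ToyReader latest = current.reopen();
        if (latest != current) {
            current.close();
        }
        return latest;
    }
}
```

Skipping the `close()` call inside `refresh` would leave one more reader open after every index change, which is the kind of slow growth that can end in an OOM.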

Re: Lucene Indexing out of memory

2010-03-03 Thread Ian Lea
Lucene doesn't load everything into memory and can carry on running consecutive searches or loading documents for ever without hitting OOM exceptions. So if it isn't failing on a specific document the most likely cause is that your program is hanging on to something it shouldn't. Previous docs? Fi

Re: Lucene Indexing out of memory

2010-03-03 Thread ajay_gupta
Ian, the point at which the OOM exception occurs varies; it is not fixed. It can occur anywhere once memory exceeds a certain point. I have allocated 1 GB of memory to the JVM. I haven't used a profiler. When I said it fails after 70K docs I meant approximately 70K documents, but if I reduce memory it will OOM before 70K, so it's not sp

Re: Lucene Indexing out of memory

2010-03-02 Thread Erick Erickson
It's not searching that I'm wondering about. The memory size, as far as I understand, really only has document resolution. That is, you can't index a part of a document, flush to disk, then index the rest of the document. The entire document is parsed into memory, and only then flushed to disk if R

Re: Lucene Indexing out of memory

2010-03-02 Thread Ian Lea
Where exactly are you hitting the OOM exception? Have you got a stack trace? How much memory are you allocating to the JVM? Have you run a profiler to find out what is using the memory? If it runs OK for 70K docs then fails, 2 possibilities come to mind: either the 70K + 1 doc is particularly la

Re: Lucene Indexing out of memory

2010-03-02 Thread ajay_gupta
Hi Erick, I tried setting setRAMBufferSizeMB to 200-500 MB as well, but it still fails with an OOM error. I thought that since it's file-based indexing, memory shouldn't be an issue, but you might be right that searching might be using a lot of memory. Is there a way to load documents in chunks or some other way

RE: Lucene Indexing out of memory

2010-03-02 Thread Murdoch, Paul
Ajay, Here is another thread I started on the same issue. http://stackoverflow.com/questions/1362460/why-does-lucene-cause-oom-when-indexing-large-files Paul -Original Message- From: java-user-return-45254-paul.b.murdoch=saic@lucene.apache.org [mailto:java-user-return-45254-paul.

RE: Lucene Indexing out of memory

2010-03-02 Thread Murdoch, Paul
Ajay, I've posted a few times on OOM issues. Here is one thread. http://mail-archives.apache.org/mod_mbox//lucene-java-user/200909.mbox/% 3c5b20def02611534db08854076ce825d803626...@sc1exc2.corp.emainc.com%3e I'll try and get some more links to you from some other threads I started for OOM issue

Re: Lucene Indexing out of memory

2010-03-02 Thread Erick Erickson
I'm not following this entirely, but these docs may be huge by the time you add context for every word in them. You say that you "search the existing indices then I get the content and append". So is it possible that after 70K documents your additions become so huge that you're blowing up? Have

Re: Lucene Indexing and Search Policy

2009-01-21 Thread Anshum
It's about building a custom similarity class that scores using your normalization factors etc. This might help in that case, http://www.gossamer-threads.com/lists/lucene/java-user/69553 -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opini

Re: Lucene Indexing and Search Policy

2009-01-21 Thread M Seetha Ramaiah
Hi Anshum, Even that document says that higher frequency implied higher score. My doubt is if the score is based only on the frequency, won't it be inappropriate for Internet based search? For example, if Google did the same thing, when I search for "Microsoft", there is a chance that Google

Re: Lucene Indexing and Search Policy

2009-01-21 Thread Anshum
Hi msr, Perhaps this could be useful for you. Lucene implements a modified vector space model in short. http://jayant7k.blogspot.com/2006/07/document-scoringcalculating-relevance_08.html -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the op

RE: Lucene Indexing DB records?

2008-08-22 Thread John Griffin
Try Hibernate Search - http://www.hibernate.org/410.html John G. -Original Message- From: ??? [mailto:[EMAIL PROTECTED] Sent: Friday, August 22, 2008 3:27 AM To: java-user@lucene.apache.org Subject: Lucene Indexing DB records? Guess I don't quite understand why there are so few posts ab

Re: Lucene Indexing DB records?

2008-08-22 Thread Marcelo Ochoa
> Actually there are many projects for Lucene + Database. Here is a list I > know: > > * Hibernate Search > * Compass, (also Hibernate + Lucene) > * Solr + DataImportHandler (Searching + Crawler) > * DBSight, (Specific for database, closed source, but very customizable, > easy to setup) > * Browse

Re: Lucene Indexing DB records?

2008-08-22 Thread Chris Lu
Actually there are many projects for Lucene + Database. Here is a list I know: * Hibernate Search * Compass, (also Hibernate + Lucene) * Solr + DataImportHandler (Searching + Crawler) * DBSight, (Specific for database, closed source, but very customizable, easy to setup) * Browse Engine -- Chris

Re: Lucene Indexing DB records?

2008-08-22 Thread Shalin Shekhar Mangar
You might also want to look at Solr and DataImportHandler. http://lucene.apache.org/solr http://wiki.apache.org/solr/DataImportHandler On Fri, Aug 22, 2008 at 2:56 PM, ??? <[EMAIL PROTECTED]> wrote: > Guess I don't quite understand why there are so few posts about Lucene > indexing DB records. S

Re: Lucene Indexing structure

2008-05-04 Thread Grant Ingersoll
Would a Function Query (ValueSourceQuery, see the org.apache.lucene.search.function package) work in this case? -Grant On May 4, 2008, at 9:35 AM, Vaijanath N. Rao wrote: Hi Chris, Sorry for the cross-posting and also for not making clear the problem. Let me try to explain the problem at

Re: Lucene Indexing structure

2008-05-04 Thread Vaijanath N. Rao
Hi Chris, Sorry for the cross-posting and also for not making the problem clear. Let me try to explain the problem at hand. I am trying to write a CBIR (Content-Based Image Retrieval) framework using Lucene. Each document has entities such as title, description, author and so on. I a

Re: Lucene Indexing structure

2008-05-02 Thread Glen Newton
Vaijanath, I think I would do things in a different fashion: Lucene default distance metric is based on tf/idf and the cosine model, i.e. the frequencies of items. I believe the values that you are adding as Fields are the values in n-space for each of these image-based attributes. I don't believe
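
To make the contrast in this reply concrete, here is a minimal sketch comparing the cosine measure with a plain Euclidean distance on points in n-space. It is not Lucene's scoring code, only the textbook formulas, and the class and method names are invented for the example.

```java
// Textbook cosine similarity and Euclidean distance; not Lucene's scoring code.
class VectorMeasures {
    // Cosine of the angle between a and b; 0.0 if either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        requireSameLength(a, b);
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Straight-line distance between the two points.
    static double euclidean(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have equal length");
        }
    }
}
```

Cosine compares direction only: the points (1, 0) and (2, 0) have cosine similarity 1.0 although they are 1.0 apart in Euclidean terms. That difference matters when field values are coordinates of image features rather than term frequencies.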

Re: Lucene Indexing structure

2008-05-02 Thread Chris Hostetter
: Hi Lucene-user and Lucene-dev, Please do not cross post -- java-user is the suitable place for your question. : Obviously there is something wrong with the above approach (as to get the : correct document we need to get all the documents and than do the required : distance calculation), but t

Re: lucene indexing doubts

2007-10-26 Thread mark harwood
TED]> To: java-user@lucene.apache.org Sent: Friday, 26 October, 2007 5:31:36 AM Subject: Re: lucene indexing doubts hi, thanks for your response. I think you hanven't got what my question is? I will explain with an example. I have a folder which contains the indexed files. so, suppose i

Re: lucene indexing doubts

2007-10-26 Thread Karl Wettin
On 26 Oct 2007, at 06:31, poojasreejith wrote: I have a folder which contains the indexed files. so, suppose if i want to add one more indexed data into it, without deleting the whole folder and performing the indexing for all the files again. I want it to do only that one file and add the i

Re: lucene indexing doubts

2007-10-25 Thread poojasreejith
Hi, thanks for your response. I think you haven't understood my question. I will explain with an example. I have a folder which contains the indexed files. Suppose I want to add one more file's indexed data to it, without deleting the whole folder and performing the indexing for all the fil

Re: lucene indexing doubts

2007-10-25 Thread Karl Wettin
On 25 Oct 2007, at 19:35, poojasreejith wrote: Can anyone of you guide me, how to index into an already indexed folder. Right now, I am deleting the indexed info and running the indexer again. I dont want to do that. I want a method, how to append into the same folder when new files are ind

Re: Lucene indexing error

2007-10-08 Thread Narendra yadala
Thanks very much for the information. I did not include the other portion of the stack trace because it was totally belonging to Jackrabbit library. Now I guess the problem is due to the fact that Jackrabbit's latest version is using Lucene 2.0 for its indexing purposes. So I will search some patch

Re: Lucene indexing error

2007-10-08 Thread Chris Hostetter
: I think this bug is related to the one posted on Lucene JIRA: : http://issues.apache.org/jira/browse/LUCENE-665 : Please let me know if there is any solution to this bug of Lucene. note that the issue is "Closed, Resolution: Won't Fix" it was determined that ultimately there was no bug in Luce

Re: Lucene indexing error

2007-10-08 Thread Narendra yadala
But then the core problem is that the index that is created is in a totally corrupted state. So deleting or keeping the lock does not make a difference as the Index itself is not created properly. The problem arises when the index is getting created itself. Regards Narendra On 10/8/07, saikrishn

Re: Lucene indexing error

2007-10-08 Thread saikrishna venkata pendyala
Lucene creates a lock on the index before using it and then unlocks the index after using it. If Lucene is interrupted and closed by force, the index remains in a locked state and cannot be used. Generally, on Linux, the Lucene lock file is created in the /tmp directory. Delete the lock
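
The behaviour described in this message, where a lock outlives a process that was killed, follows from how a file-based lock works. The sketch below is an assumption-laden illustration using only `java.io.File`; the class name, the `write.lock` file name, and the directory are hypothetical, and it is not Lucene's lock implementation.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Minimal lock-file scheme; illustrative only, not Lucene's lock classes.
class SimpleLockFile {
    private final File lockFile;

    SimpleLockFile(File directory, String name) {
        this.lockFile = new File(directory, name);
    }

    // Atomically creates the lock file. Returns false if it already exists,
    // for example because another process holds it or died without releasing it.
    boolean obtain() {
        try {
            return lockFile.createNewFile();
        } catch (IOException e) {
            // e.g. "Access is denied" when the directory is not writable
            throw new UncheckedIOException(e);
        }
    }

    // Deletes the lock file. This never runs if the JVM is killed first.
    void release() {
        lockFile.delete();
    }

    boolean isLocked() {
        return lockFile.exists();
    }
}
```

Because `release()` only runs when the process exits cleanly, a forced shutdown leaves the file behind and every later `obtain()` returns false until someone deletes it by hand. Only do that after making sure no other process is still writing to the index.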

Re: Lucene indexing error

2007-10-08 Thread Narendra yadala
I think this bug is related to the one posted on Lucene JIRA: http://issues.apache.org/jira/browse/LUCENE-665 Please let me know if there is any solution to this bug of Lucene. Thanks Narendra On 10/8/07, Joe Attardi <[EMAIL PROTECTED]> wrote: > > On 10/8/07, Narendra yadala <[EMAIL PROTECTED]>

Re: Lucene indexing error

2007-10-08 Thread Joe Attardi
On 10/8/07, Narendra yadala <[EMAIL PROTECTED]> wrote: > > I do have permission to access Lucene files. They reside on my local > machine. > But still this is giving the error.I am using Windows XP operationg > system. > Well, since you are opening an IndexReader (as evidenced by your stack trace)

Re: Lucene indexing error

2007-10-08 Thread Narendra yadala
I do have permission to access the Lucene files. They reside on my local machine. But this is still giving the error. I am using the Windows XP operating system. Regards Narendra On 10/8/07, Joe Attardi <[EMAIL PROTECTED]> wrote: > > On 10/8/07, Narendra yadala <[EMAIL PROTECTED]> wrote: > > > > This is

Re: Lucene indexing error

2007-10-08 Thread Joe Attardi
On 10/8/07, Narendra yadala <[EMAIL PROTECTED]> wrote: > > This is the relevant portion of the stack trace: > > Caused by: java.io.IOException: Access is denied > at java.io.WinNTFileSystem.createFileExclusively(Native Method) > at java.io.File.createNewFile(File.java:850) > at or

Re: Lucene indexing error

2007-10-08 Thread Narendra yadala
This is the relevant portion of the stack trace: Caused by: java.io.IOException: Access is denied at java.io.WinNTFileSystem.createFileExclusively(Native Method) at java.io.File.createNewFile(File.java:850) at org.apache.jackrabbit.core.query.lucene.FSDirectory$1.obtain( FSDirect

Re: Lucene indexing error

2007-10-08 Thread Karl Wettin
On 8 Oct 2007, at 15:58, Narendra yadala wrote: Hi All I am getting this error when I am doing Indexing using Lucene. java.io.IOException: Access is denied on java.io.WinNTFileSystem.createFileExclusively Please let me know if there is any fix for this bug. Please supply the complete stack trace

Re: Lucene indexing for pdf files

2007-08-31 Thread Steven Rowe
Hi Madhu, Madhu wrote: > i am indexing pdf document using pdfbox 7.4, its working fine for some pdf > files. for japanese pdf files its giving the below exception. > > caught a class java.io.IOException > with message: Unknown encoding for 'UniJIS-UCS2-H' > > Can any one help me , how to set th

Re: Lucene indexing

2007-08-30 Thread Karl Wettin
On 30 Aug 2007, at 11:24, Madhu wrote: Hi all. I am trying to index a 5 MB Excel file, but while indexing using POI 3 it's giving me an out of memory exception. Does anyone know how to index large Excel files? Increase the maximum VM heap size? http://blogs.sun.com/watt/resource/jvm-

Re: Lucene indexing for PDM system like Windchill

2007-07-23 Thread Mathieu Lecarme
On Sunday, 22 July 2007 at 13:17 -0500, Dmitry wrote: > Mathieru, > I never used Compass, i know that there is integration Shards /Search with > Hibernate, but it absolutely different what actually I need, probably I can > take a look on it. any way thanks > thanks, > DT Not only hibernate

Re: Lucene indexing for PDM system like Windchill

2007-07-22 Thread Dmitry
carme" <[EMAIL PROTECTED]> To: Sent: Sunday, July 22, 2007 4:16 AM Subject: Re: Lucene indexing for PDM system like Windchill If you want to index Hibernate persisted data, just use Compass. M. On 22 Jul 2007, at 04:19, Dmitry wrote: Folks, Trying to integrate PDM system : WTPart ob

Re: Lucene indexing for PDM system like Windchill

2007-07-22 Thread Mathieu Lecarme
If you want to index Hibernate persisted data, just use Compass. M. On 22 Jul 2007, at 04:19, Dmitry wrote: Folks, Trying to integrate PDM system : WTPart object with Lucene indexing search framework. Part of the work is integration with persistent layer + indices storage + mysql Could no

Re: Lucene indexing for PDM system like Windchill

2007-07-21 Thread karl wettin
On 22 Jul 2007, at 04:19, Dmitry wrote: Trying to integrate PDM system : WTPart object with Lucene indexing search framework. Part of the work is integration with persistent layer + indices storage + mysql You have a product data management software of some sort that uses MySQL via Hibernate

Re: Lucene Indexing and searching - help

2007-07-05 Thread emmettwalsh
ok heres the deal with my application... I have got an xml file with about 8000 of these properties... Dighton Rock The Rock Across the Taunton River from Dighton in Dighton Rock State Park Dighton MA I parse t

Re: Lucene Indexing and searching - help

2007-07-04 Thread Erick Erickson
A couple of things would help us help you 1> tell us what you're trying to do. What's the point of your code? Offhand, I can't tell what it is you're really after. 2> post an example of query.toString(); along with your sample for one of the offending queries. 3> Post the query stri

Re: Lucene Indexing - Getting Hited words in a query

2007-03-12 Thread mark harwood
- Original Message From: Chaminda Amarasinghe <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Monday, 12 March, 2007 9:30:22 AM Subject: Re: Lucene Indexing - Getting Hited words in a query Thanks mark harwood , I want something like Highlighter thing where ca

Re: Lucene Indexing - Getting Hited words in a query

2007-03-12 Thread Chaminda Amarasinghe
istribution. The Junit test rig gives some example uses. - Original Message From: karl wettin To: java-user@lucene.apache.org Sent: Monday, 12 March, 2007 7:40:34 AM Subject: Re: Lucene Indexing - Getting Hited words in a query On 12 Mar 2007, at 08:35, Chaminda Amarasinghe wrote: >

Re: Lucene Indexing - Getting Hited words in a query

2007-03-12 Thread mark harwood
12 March, 2007 7:40:34 AM Subject: Re: Lucene Indexing - Getting Hited words in a query On 12 Mar 2007, at 08:35, Chaminda Amarasinghe wrote: > Why nobody is anwering me? > Pls help me. It might take some time until someone that knows the answer reads your question. > > Chaminda Ama

Re: Lucene Indexing - Getting Hited words in a query

2007-03-12 Thread Chaminda Amarasinghe
Many thanks Vipin, I'll check. Vipin <[EMAIL PROTECTED]> wrote: Hi chaminda, you just go through this link http://today.java.net/pub/a/today/2005/08/09/didyoumean.html?page=1 in this articles last portion(page 3) the author has suggested a way to handle such kind of things(Composit

Re: Lucene Indexing - Getting Hited words in a query

2007-03-12 Thread Vipin
Hi Chaminda, have a look at this link: http://today.java.net/pub/a/today/2005/08/09/didyoumean.html?page=1 In the last part of the article (page 3) the author suggests a way to handle this kind of thing (the composite did-you-mean parser). I think it will open up a way... Regard

Re: Lucene Indexing - Getting Hited words in a query

2007-03-11 Thread Chaminda Amarasinghe
Thanks Karl. karl wettin <[EMAIL PROTECTED]> wrote: On 12 Mar 2007, at 08:35, Chaminda Amarasinghe wrote: > Why nobody is anwering me? > Pls help me. It might take some time until someone that knows the answer reads your question. > > Chaminda Amarasinghe wrote: > Hi all, > > I'm new to this grou

Re: Lucene Indexing - Getting Hited words in a query

2007-03-11 Thread karl wettin
On 12 Mar 2007, at 08:35, Chaminda Amarasinghe wrote: Why nobody is anwering me? Pls help me. It might take some time until someone that knows the answer reads your question. Chaminda Amarasinghe <[EMAIL PROTECTED]> wrote: Hi all, I'm new to this group, I'm using lucene for indexing. I

Re: Lucene Indexing - Getting Hited words in a query

2007-03-11 Thread Chaminda Amarasinghe
Why is nobody answering me? Please help me. Chaminda Amarasinghe <[EMAIL PROTECTED]> wrote: Hi all, I'm new to this group. I'm using Lucene for indexing. I have a problem. Any help is greatly appreciated. Please see the following code // three fields MultiFieldQueryParser parser = new MultiFieldQuer

Re: Lucene Indexing

2007-01-26 Thread Grant Ingersoll
te: This document should contain the information you need : http://lucene.sourceforge.net/talks/inktomi/ Damien. -Original Message- From: Sairaj Sunil [mailto:[EMAIL PROTECTED] Sent: 26 January 2007 03:22 To: java-user@lucene.apache.org Subject: Re: Lucene Indexing Hi I was asking wh

Re: Lucene Indexing

2007-01-26 Thread Sairaj Sunil
]> wrote: This document should contain the information you need : http://lucene.sourceforge.net/talks/inktomi/ Damien. -Original Message- From: Sairaj Sunil [mailto:[EMAIL PROTECTED] Sent: 26 January 2007 03:22 To: java-user@lucene.apache.org Subject: Re: Lucene Indexing Hi I was asking w

RE: Lucene Indexing

2007-01-26 Thread Damien McCarthy
This document should contain the information you need : http://lucene.sourceforge.net/talks/inktomi/ Damien. -Original Message- From: Sairaj Sunil [mailto:[EMAIL PROTECTED] Sent: 26 January 2007 03:22 To: java-user@lucene.apache.org Subject: Re: Lucene Indexing Hi I was asking what

Re: Lucene Indexing

2007-01-25 Thread Sairaj Sunil
Hi I was asking what exactly is the inverted indexing strategy used for storing the index. Is it batch-based index/b-tree based/segment-based data structure that is used as an index data structure. On 1/25/07, Rajiv Roopan <[EMAIL PROTECTED]> wrote: http://lucene.apache.org/java/docs/api/org/

Re: Lucene Indexing

2007-01-24 Thread Rajiv Roopan
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html On 1/24/07, Sairaj Sunil <[EMAIL PROTECTED]> wrote: Hi all, Can you tell me the exact indexing algorithm used by Lucene. or give some links to the documents that describe the algorithm used by lucene Thanks in adva

RE: Lucene indexing PPT

2006-06-30 Thread mcarcelen
Hello Nick! Thanks for your help, it's useful for me. Bye -Original Message- From: Nick Burch [mailto:[EMAIL PROTECTED] Sent: Friday, 30 June 2006 12:19 To: java-user@lucene.apache.org Subject: Re: Lucene indexing PPT On Fri, 30 Jun 2006, mcarcelen wrote: > I'm trying

Re: Lucene indexing PPT

2006-06-30 Thread Nick Burch
On Fri, 30 Jun 2006, mcarcelen wrote: > I'm trying to build a index with PPT files. I have downloaded the api > POI, "poi.bin.3.0" and "poi.src.3.0", but I don't know where may I have > to unzip them. I'd like to build the index by the command line, the same > way as I don't know about the lucene

Re: Lucene indexing RDF

2006-06-29 Thread adasal
Hi Chris, I find this incredibly interesting! Thank you for your full explanation. I was aware of the components, but not the implementation. ... to provide a means to query both document full-text and metadata using an RDF model Is there any thing I can read about how you have some to this ap

Re: Lucene indexing RDF

2006-06-28 Thread Christiaan Fluit
adasal wrote: As far as i have researched this I know that the gnowsis project uses both rdf and lucene, but I have not had time to determine their relationship. www.gnowsis.org/ I can tell you a bit about Gnowsis, as we (Aduna) are cooperating with the Gnowsis people on RDF creation, storage

Re: Lucene indexing RDF

2006-06-28 Thread adasal
AIL PROTECTED] > Sent: Tuesday, 27 June 2006 17:38 > To: java-user@lucene.apache.org > Subject: Re: Lucene indexing pdf > > I used PDFBox library as mentioned in Lucene in Action. It works for me. > You can access it from www.pdfbox.org > > suba suresh > >

Re: Lucene indexing RDF

2006-06-27 Thread Suba Suresh
nsaje original- From: Suba Suresh [mailto:[EMAIL PROTECTED] Sent: Tuesday, 27 June 2006 17:38 To: java-user@lucene.apache.org Subject: Re: Lucene indexing pdf I used PDFBox library as mentioned in Lucene in Action. It works for me. You can access it from www.pdfbox.org suba s

Re: Lucene indexing pdf

2006-06-27 Thread Suba Suresh
I used the PDFBox library as mentioned in Lucene in Action. It works for me. You can access it from www.pdfbox.org suba suresh mcarcelen wrote: Hi, I'm new with Lucene and I'm trying to index a pdf but when I query everything it returns nothing. Can anyone help me? Thanks a lot Teresa ---

Re: Lucene indexing pdf

2006-06-27 Thread Patrick Kimber
Hi Teresa You need to convert the pdf file into text format before adding the text to the Lucene index. You may like to look at http://www.pdfbox.org/ for a library to convert pdf files to text format. Patrick On 27/06/06, mcarcelen <[EMAIL PROTECTED]> wrote: Hi, I'm new with Lucene and I'm t

Re: lucene indexing

2006-04-07 Thread Grant Ingersoll
Lucene does not provide this out of the box. You will have to write a program to do it and feed the results to Lucene. If I remember right, these files are in XML, so you can probably use SAX or a pull parser. I think a number of TREC participants, in the past, have used Lucene, so you may

Re: Lucene indexing on Hadoop distributed file system

2006-03-27 Thread Doug Cutting
Igor Bolotin wrote: Does it make sense to change TermInfosWriter.FORMAT in the patch? Yes. This should be updated for any change to the format of the file, and this certainly constitutes a format change. This discussion should move to [EMAIL PROTECTED] Doug --

Re: Lucene indexing on Hadoop distributed file system

2006-03-27 Thread Igor Bolotin
Does it make sense to change TermInfosWriter.FORMAT in the patch? Igor On 3/27/06, Doug Cutting <[EMAIL PROTECTED]> wrote: > > Igor Bolotin wrote: > > If somebody is interested - I can post our changes in TermInfosWriter > and > > SegmentTermEnum code, although they are pretty trivial. > > Pleas

Re: Lucene indexing on Hadoop distributed file system

2006-03-27 Thread Andrzej Bialecki
Doug Cutting wrote: Igor Bolotin wrote: If somebody is interested - I can post our changes in TermInfosWriter and SegmentTermEnum code, although they are pretty trivial. Please submit this as a patch attached to a bug report. I contemplated making this change to Lucene myself, when writing

Re: Lucene indexing on Hadoop distributed file system

2006-03-27 Thread Doug Cutting
Igor Bolotin wrote: If somebody is interested - I can post our changes in TermInfosWriter and SegmentTermEnum code, although they are pretty trivial. Please submit this as a patch attached to a bug report. I contemplated making this change to Lucene myself, when writing Nutch's FsDirectory, b

Re: Lucene indexing on Hadoop distributed file system

2006-03-26 Thread Raghavendra Prabhu
I would like to see Lucene operate with Hadoop. As you rightly pointed out, writing to DFS using FSDirectory would be a performance issue. I am interested in the idea, but I do not know how much time I can contribute to this, given the little time I can spare. If anyone else is interest

RE: lucene indexing performance

2005-05-16 Thread Jayakumar.V
2005 1:58 AM To: java-user@lucene.apache.org Subject: Re: lucene indexing performance One immediate optimization would be to only close the writer and open the reader if the document is present. You can have a reader open and do searches while indexing (and optimization) are underway. It'

Re: lucene indexing performance

2005-04-23 Thread Chuck Williams
One immediate optimization would be to only close the writer and open the reader if the document is present. You can have a reader open and do searches while indexing (and optimization) are underway. It's just the delete operation that requires you to close the writer (so you don't have two d