I finally dug into this, and it turns out the nightly benchmark I run had
bottlenecks that kept it from feeding documents to Lucene quickly enough
to take advantage of the concurrent hardware in beast2.
I fixed that, re-ran the nightly run, and it shows good gains:
https://plus.google.
You won't see indexing improvements there because the dataset in
question is Wikipedia, which is mostly full-text indexing. I think it may
have one measly numeric field.
On Thu, Apr 14, 2016 at 6:25 PM, Otis Gospodnetić
wrote:
(replying to my original email because I didn't get people's replies, even
though I see in the archives people replied)
Re BJ and beast2 upgrade. Yeah, I saw that, but
* if there is no indexing throughput improvement after that, does that mean
that those particular indexing tests happen to be
As someone who runs Lucene on big hardware, I'd be very interested to see
the tuning parameters when you do get a chance ...
On Thu, Apr 14, 2016 at 3:41 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
Yes, dual 2699 v3, with 256 GB of RAM, yet indexing throughput somehow
got slower :) I haven't yet re-tuned indexing threads or the IW buffer
size for this new hardware ...
Mike McCandless
http://blog.mikemccandless.com
On Thu, Apr 14, 2016 at 2:09 PM, Ishan Chattopadhyaya
wrote:
Wow, 72 cores? That sounds astounding. Are they dual Xeon E5 2699 v3 CPUs
with 18 cores each, with hyperthreading = 18*2*2=72 threads?
On Thu, Apr 14, 2016 at 11:33 PM, Dawid Weiss wrote:
The GC change is after this:
BJ (2015-12-02): Upgrade to beast2 (72 cores, 256 GB RAM)
which leads me to believe these results are not comparable (different
machines, architectures, disks, CPUs perhaps?).
Dawid
On Thu, Apr 14, 2016 at 7:13 PM, Otis Gospodnetić
wrote:
> Hi,
>
> I was looking a
Hyper-threading should help Lucene indexing go faster, when it's not
IO bound ... I found 20 threads (on 12 real cores, 24 with HT) to be
fastest in the nightly benchmark
(http://people.apache.org/~mikemccand/lucenebench/indexing.html).
But it's curious that you're unable to saturate either CPU or I/O ...
To: java-user@lucene.apache.org
Cc: Anahita Shayesteh-SSI
Subject: Re: Lucene indexing speed on NVMe drive
: Hi. I am studying Lucene performance and in particular how it benefits from
faster I/O such as SSD and NVMe.
: parameters as used in nightlyBench. (Hardware: Intel Xeon, 2.5 GHz, 20
: processors, 40 with hyperthreading, 64 GB memory) and study indexing speed
...
: I get best performance
Use SimpleFSLockFactory. See the javadocs about locks being left
behind on abnormal JVM termination.
There was a thread on this list a while ago about some pros and cons
of using lucene on NFS. 2-Oct-2012 in fact.
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201210.mbox/thread
Oh sorry,
I found my error and it worked.
Thanks
On Wed, Jun 8, 2011 at 3:57 PM, Pranav goyal wrote:
> import java.io.File;
> import java.io.IOException;
> import java.util.Collection;
> import java.util.Iterator;
> import java.util.List;
> import java.util.Map;
>
> import org.apache.lucene.analysi
If I understand the requirement correctly, a contract is one document in your
system, which in turn contains some 'n' fields. Contract_ID is the key in
all the documents (structures). Contract_ID is the only field you want to
retrieve no matter which field you search on. If this is the case, store
Yes,
You'd need to delete the document and then re-add a newly created document
object. You can use the key to delete the doc via a Term(key, value).
--
Anshum Gupta
http://ai-cafe.blogspot.com
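For the archives: the delete-then-re-add dance described above can be done in one call with IndexWriter.updateDocument. A minimal sketch against the Lucene 2.x/3.x-era API; the "id" field name, its value, and the open `writer` are assumptions for illustration, not from the thread:

```java
// Sketch only -- assumes an open IndexWriter `writer` and a field "id"
// that uniquely keys each document (both hypothetical here).
Term key = new Term("id", "contract-42");
Document updated = new Document();
updated.add(new Field("id", "contract-42", Field.Store.YES, Field.Index.NOT_ANALYZED));
// ... re-add every other field too: Lucene cannot patch a stored doc in place ...
// updateDocument = delete-by-term + add, performed atomically by the writer
writer.updateDocument(key, updated);
```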
On Mon, Jun 6, 2011 at 4:45 PM, Pranav goyal wrote:
Hi Anshum,
Thanks for answering my question. From this I got to know that I cannot update
without deleting my document.
So whenever I index documents, I first need to check whether the
particular key already exists in the index, and if it does, I need to
delete it and add the updated one
Hi Pranav,
From what you've mentioned, it looks like you want to modify a particular
document (or all docs) by adding a particular field to the document(s). As
of right now, it's not possible to modify a document inside a Lucene index.
That is due to the way the index is structured. The only way as
- Original Message
From: Shuai Weng
To: java-user@lucene.apache.org
Sent: Fri, August 20, 2010 5:47:31 PM
Subject: Re: lucene indexing configuration
Hey,
Currently we have indexed some biological full-text pages, and I was wondering
how to configure the schema.xml so that
the gene names 'met1', 'met2', 'met3' will be treated as different words.
Currently they are all mapped to 'met'.
Thanks,
Shuai
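A hedged sketch of the kind of schema.xml change being asked about. The fieldType name here is made up, and which filter is collapsing 'met1'/'met2'/'met3' into 'met' depends on the actual analyzer chain; the point is to index gene names through a chain with no stemmer:

```xml
<!-- Hypothetical field type: tokenize on whitespace, lowercase, and do NOT
     stem, so 'met1', 'met2', 'met3' stay distinct terms in the index. -->
<fieldType name="text_gene" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- deliberately no solr.PorterStemFilterFactory, and no
         WordDelimiterFilterFactory, which would split 'met1' into 'met' + '1' -->
  </analyzer>
</fieldType>
```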
Try the ideas here?
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
Mike
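A sketch of the usual knobs from that wiki page, using the Lucene 2.x/3.x-era setters (the values are illustrative assumptions, not recommendations):

```java
// Sketch only -- assumes an existing Directory `dir` and Analyzer `analyzer`.
IndexWriter writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
writer.setRAMBufferSizeMB(128);      // flush by RAM used, not by doc count
writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);
writer.setUseCompoundFile(false);    // trades file handles for indexing speed
// feed documents from several threads (IndexWriter is thread-safe),
// reuse Document/Field instances, and call optimize() only at the very end
```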
On Mon, Mar 15, 2010 at 1:51 AM, ajay_gupta wrote:
Erick,
I did get some hint about my problem. There was a bug in the code that was
eating up memory, which I figured out after a lot of effort.
Thanks to all of you for your suggestions.
But I still feel it takes a lot of time to index documents. It's taking around
an hour or more for indexing 330 MB f
Hi Michael and others,
I did get some hint about my problem. There was a bug in the code that was
eating up memory, which I figured out after a lot of effort.
Thanks to all of you for your suggestions.
Regards
Ajay
Michael McCandless-2 wrote:
I agree, memory profiler or heap dump or small test case is the next
step... the code looks fine.
This is always a single thread adding docs?
Are you really certain that the iterator only iterates over 2500 docs?
What analyzer are you using?
Mike
On Thu, Mar 4, 2010 at 4:50 AM, Ian Lea wrote:
Have you run it through a memory profiler yet? Seems the obvious next step.
If that doesn't help, cut it down to the simplest possible
self-contained program that demonstrates the problem and post it here.
--
Ian.
On Thu, Mar 4, 2010 at 6:04 AM, ajay_gupta wrote:
Erick,
w_context and context_str are local to this method and are used only for
2500 documents, not the entire 70K. I am clearing the hashmap after each
2500-doc chunk, and I also printed the memory consumed by the hashmap,
which is roughly constant for each chunk. For each invocation of
The first place I'd look is how big your strings
got. w_context and context_str come to mind. My
first suspicion is that you're building ever-longer
strings, and around 70K documents your strings
are large enough to produce OOMs.
FWIW
Erick
On Wed, Mar 3, 2010 at 1:09 PM, ajay_gupta wrote:
Mike,
Actually my documents are very small. We have CSV files where each
record represents a document, and records are not very large, so I don't
think document size is an issue.
For each record I tokenize it, and for each token I keep the 3
neighbouring tokens in a Hashtable. After X number
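A self-contained reconstruction of the structure being described (the names and the window size are assumptions from the description); the practical point is that the map must be cleared after each chunk, or it grows with the whole corpus:

```java
import java.util.*;

// Reconstruction of the approach described above (hypothetical names):
// for each token, collect the tokens within `window` positions of it.
public class ContextWindow {
    // Map token -> neighbouring tokens seen within `window` positions of it.
    static Map<String, List<String>> neighbours(String[] tokens, int window) {
        Map<String, List<String>> ctx = new HashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            List<String> near = ctx.computeIfAbsent(tokens[i], k -> new ArrayList<>());
            for (int j = Math.max(0, i - window); j <= Math.min(tokens.length - 1, i + window); j++) {
                if (j != i) near.add(tokens[j]);
            }
        }
        // NOTE: in a chunked indexing loop, call ctx.clear() after flushing each
        // chunk -- otherwise the map grows with the whole corpus and OOMs.
        return ctx;
    }

    public static void main(String[] args) {
        Map<String, List<String>> ctx = neighbours("a b c b".split(" "), 1);
        System.out.println(ctx.get("c"));  // prints [b, b]
    }
}
```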
The worst case RAM usage for Lucene is a single doc with many unique
terms. Lucene allocates ~60 bytes per unique term (plus space to hold
that term's characters = 2 bytes per char). And, Lucene cannot flush
within one document -- it must flush after the doc has been fully
indexed.
This past thr
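Those two numbers give a quick back-of-the-envelope bound. A self-contained sketch; the 60-bytes and 2-bytes-per-char constants are the approximations quoted above, not exact Lucene internals:

```java
// Rough worst-case RAM for buffering one document, per the figures above:
// ~60 bytes of bookkeeping per unique term, plus 2 bytes per term character.
public class RamEstimate {
    static long estimateBytes(long uniqueTerms, double avgTermChars) {
        return (long) (uniqueTerms * (60 + 2 * avgTermChars));
    }

    public static void main(String[] args) {
        // e.g. one pathological doc with 5 million unique 8-char terms
        long bytes = estimateBytes(5_000_000L, 8);
        System.out.println(bytes / (1024 * 1024) + " MB");  // ~362 MB
    }
}
```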
Interpolating from your data (and, by the way, some code
examples would help a lot), if you're reopening the index
reader to pick up recent additions but not closing it if a
different one is returned from reopen, you'll consume
resources. From the JavaDocs...
IndexReader newReader = r.reopen();
if (newReader != r) {
  r.close();
}
r = newReader;
Lucene doesn't load everything into memory and can carry on running
consecutive searches or loading documents for ever without hitting OOM
exceptions. So if it isn't failing on a specific document the most
likely cause is that your program is hanging on to something it
shouldn't. Previous docs? Fi
Ian,
The OOM exception point varies; it is not fixed. It can come anywhere once
memory use exceeds a certain point.
I have allocated 1 GB of memory to the JVM. I haven't used a profiler.
When I said it fails after 70K docs, I meant approximately 70K documents; if I
reduce memory then it will OOM before 70K, so it's not sp
It's not searching that I'm wondering about. The memory size, as
far as I understand, really only has document resolution. That is, you
can't index part of a document, flush to disk, then index the rest of
the document. The entire document is parsed into memory, and only
then flushed to disk if R
Where exactly are you hitting the OOM exception? Have you got a stack
trace? How much memory are you allocating to the JVM? Have you run a
profiler to find out what is using the memory?
If it runs OK for 70K docs then fails, 2 possibilities come to mind:
either the 70K + 1 doc is particularly la
Hi Erick,
I tried setting setRAMBufferSizeMB to 200-500 MB as well, but it still hits the
OOM error.
I thought that since it's file-based indexing, memory shouldn't be an issue, but
you might be right that searching uses a lot of memory? Is there a way to
load documents in chunks or some other way
Ajay,
Here is another thread I started on the same issue.
http://stackoverflow.com/questions/1362460/why-does-lucene-cause-oom-when-indexing-large-files
Paul
Ajay,
I've posted a few times on OOM issues. Here is one thread.
http://mail-archives.apache.org/mod_mbox//lucene-java-user/200909.mbox/%3c5b20def02611534db08854076ce825d803626...@sc1exc2.corp.emainc.com%3e
I'll try and get some more links to you from some other threads I
started for OOM issue
I'm not following this entirely, but these docs may be huge by the
time you add context for every word in them. You say that you
"search the existing indices then I get the content and append".
So is it possible that after 70K documents your additions become
so huge that you're blowing up? Have
It's about building a custom Similarity class that scores using your
normalization factors etc.
This might help in that case,
http://www.gossamer-threads.com/lists/lucene/java-user/69553
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opini
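As a concrete illustration of the custom-Similarity route (a sketch against the Lucene 2.x-era API, not the exact class from the linked thread): flattening tf() makes scoring depend on term presence rather than raw frequency.

```java
// Sketch only (extends the 2.x-era DefaultSimilarity; needs the Lucene jar).
// Flatten term frequency so repeating a word does not inflate the score.
public class FlatTfSimilarity extends DefaultSimilarity {
    @Override
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;  // presence/absence only
    }
}
// install it with: searcher.setSimilarity(new FlatTfSimilarity());
```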
Hi Anshum,
Even that document says that higher frequency implies a higher score. My
doubt is: if the score is based only on frequency, won't it be
inappropriate for Internet-scale search? For example, if Google did the
same thing, when I search for "Microsoft", there is a chance that Google
Hi msr,
Perhaps this could be useful for you; in short, Lucene implements a modified
vector-space model.
http://jayant7k.blogspot.com/2006/07/document-scoringcalculating-relevance_08.html
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the op
Try Hibernate Search - http://www.hibernate.org/410.html
John G.
-Original Message-
From: ??? [mailto:[EMAIL PROTECTED]
Sent: Friday, August 22, 2008 3:27 AM
To: java-user@lucene.apache.org
Subject: Lucene Indexing DB records?
Guess I don't quite understand why there are so few posts ab
Actually there are many projects for Lucene + Database. Here is a list I
know:
* Hibernate Search
* Compass, (also Hibernate + Lucene)
* Solr + DataImportHandler (Searching + Crawler)
* DBSight (specific to databases; closed source, but very customizable and
easy to set up)
* Browse Engine
--
Chris
You might also want to look at Solr and DataImportHandler.
http://lucene.apache.org/solr
http://wiki.apache.org/solr/DataImportHandler
On Fri, Aug 22, 2008 at 2:56 PM, ??? <[EMAIL PROTECTED]> wrote:
> Guess I don't quite understand why there are so few posts about Lucene
> indexing DB records. S
Would a Function Query (ValueSourceQuery, see the
org.apache.lucene.search.function package) work in this case?
-Grant
On May 4, 2008, at 9:35 AM, Vaijanath N. Rao wrote:
Hi Chris,
Sorry for the cross-posting and also for not making clear the problem.
Let me try to explain the problem at my hand.
I am trying to write a CBIR (Content-Based Image Retrieval) framework
using Lucene. Each document has entities such as title, description,
author and so on. I a
Vaijanath,
I think I would do things in a different fashion:
Lucene's default distance metric is based on tf/idf and the cosine
model, i.e. the frequencies of items. I believe the values that you
are adding as Fields are the values in n-space for each of these
image-based attributes. I don't believe
: Hi Lucene-user and Lucene-dev,
Please do not cross post -- java-user is the suitable place for your
question.
: Obviously there is something wrong with the above approach (as to get the
: correct document we need to get all the documents and than do the required
: distance calculation), but t
To: java-user@lucene.apache.org
Sent: Friday, 26 October, 2007 5:31:36 AM
Subject: Re: lucene indexing doubts
On 26 Oct 2007 at 06.31, poojasreejith wrote:
I have a folder which contains the indexed files. So, suppose I
want to add one more indexed document into it, without deleting the
whole folder and performing the indexing for all the files again,
I want it to do only that one file and add the i
Hi,
thanks for your response. I think you haven't understood what my question
is, so I will explain with an example. I have a folder which contains the
indexed files. So, suppose I want to add one more indexed document into it,
without deleting the whole folder and performing the indexing for all the fil
On 25 Oct 2007 at 19.35, poojasreejith wrote:
Can any one of you guide me how to index into an already indexed
folder?
Right now, I am deleting the indexed info and running the indexer
again. I don't want to do that. I want a method to append into the same
folder
when new files are ind
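What the question is after is IndexWriter's append mode: open the writer over the existing index directory with create=false and just add the new document. A sketch against the Lucene 2.x-era API; the path, analyzer, and document variable are assumptions for illustration:

```java
// Sketch only (needs the Lucene jars of that era; not self-contained).
// `false` = open for append: do not create/overwrite the existing index.
IndexWriter writer = new IndexWriter("/path/to/index",   // existing index folder (assumed)
                                     new StandardAnalyzer(),
                                     false);
writer.addDocument(newDoc);   // hypothetical Document for the one new file
writer.close();
```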
Thanks very much for the information. I did not include the other portion of
the stack trace because it belonged entirely to the Jackrabbit library. Now
I guess the problem is due to the fact that Jackrabbit's latest version
uses Lucene 2.0 for its indexing. So I will search for some patch
: I think this bug is related to the one posted on Lucene JIRA:
: http://issues.apache.org/jira/browse/LUCENE-665
: Please let me know if there is any solution to this bug of Lucene.
note that the issue is "Closed, Resolution: Won't Fix"; it was determined
that ultimately there was no bug in Luce
But then the core problem is that the index that is created is in a totally
corrupted state.
So deleting or keeping the lock does not make a difference, as the index
itself is not created properly.
The problem arises while the index is being created.
Regards
Narendra
On 10/8/07, saikrishn
Lucene creates a lock on the index before using it and then unlocks the
index after using it. If Lucene is interrupted and closed by force,
the index remains in a locked state and cannot be used.
Generally on Linux the Lucene lock file is created in the /tmp directory.
Delete the lock
I think this bug is related to the one posted on Lucene JIRA:
http://issues.apache.org/jira/browse/LUCENE-665
Please let me know if there is any solution to this bug of Lucene.
Thanks
Narendra
On 10/8/07, Joe Attardi <[EMAIL PROTECTED]> wrote:
>
> On 10/8/07, Narendra yadala <[EMAIL PROTECTED]>
On 10/8/07, Narendra yadala <[EMAIL PROTECTED]> wrote:
>
> I do have permission to access Lucene files. They reside on my local
> machine.
> But still this is giving the error.I am using Windows XP operationg
> system.
>
Well, since you are opening an IndexReader (as evidenced by your stack
trace)
I do have permission to access the Lucene files. They reside on my local
machine,
but it is still giving the error. I am using the Windows XP operating system.
Regards
Narendra
On 10/8/07, Joe Attardi <[EMAIL PROTECTED]> wrote:
>
> On 10/8/07, Narendra yadala <[EMAIL PROTECTED]> wrote:
> >
> > This is
On 10/8/07, Narendra yadala <[EMAIL PROTECTED]> wrote:
>
> This is the relevant portion of the stack trace:
>
> Caused by: java.io.IOException: Access is denied
> at java.io.WinNTFileSystem.createFileExclusively(Native Method)
> at java.io.File.createNewFile(File.java:850)
> at or
This is the relevant portion of the stack trace:
Caused by: java.io.IOException: Access is denied
at java.io.WinNTFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:850)
at org.apache.jackrabbit.core.query.lucene.FSDirectory$1.obtain(
FSDirect
On 8 Oct 2007 at 15.58, Narendra yadala wrote:
Hi All
I am getting this error when I am doing Indexing using Lucene.
java.io.IOException: Access is denied on
java.io.WinNTFileSystem.createFileExclusively
Please let me know if there is any fix for this bug.
Please supply the complete stack trace
Hi Madhu,
Madhu wrote:
> I am indexing pdf documents using pdfbox 7.4; it's working fine for some pdf
> files. For Japanese pdf files it gives the below exception.
>
> caught a class java.io.IOException
> with message: Unknown encoding for 'UniJIS-UCS2-H'
>
> Can any one help me , how to set th
On 30 Aug 2007 at 11.24, Madhu wrote:
Hi all..
I am trying to index a 5 MB Excel file, but while indexing using POI 3.x it
gives me an out-of-memory exception.
Does anyone know how to index large Excel files?
Increase the maximum VM heap size?
http://blogs.sun.com/watt/resource/jvm-
On Sunday 22 July 2007 at 13:17 -0500, Dmitry wrote:
> Mathieu,
> I never used Compass. I know there is integration of Shards/Search with
> Hibernate, but it is absolutely different from what I actually need;
> probably I can take a look at it. Anyway, thanks.
> thanks,
> DT
Not only hibernate
If you want to index Hibernate-persisted data, just use Compass.
M.
On 22 Jul 2007 at 04:19, Dmitry wrote:
Folks,
Trying to integrate a PDM system (WTPart objects) with the Lucene
indexing/search framework.
Part of the work is integration with the persistence layer +
index storage + MySQL.
Could no
On 22 Jul 2007 at 04.19, Dmitry wrote:
Trying to integrate PDM system : WTPart obejct with Lucene
indexing search framework.
Part of the work is integration with persistent layer +
indeces storage+ mysql
You have product data management software of some sort that uses
MySQL via Hibernate
OK, here's the deal with my application ...
I have got an xml file with about 8000 of these properties...
Dighton Rock
The Rock
Across the Taunton River from Dighton in Dighton Rock
State
Park
Dighton
MA
I parse t
A couple of things would help us help you
1> tell us what you're trying to do. What's the point of your code?
Offhand, I can't tell what it is you're really after.
2> post an example of query.toString(); along with your sample
for one of the offending queries.
3> Post the query stri
- Original Message
From: Chaminda Amarasinghe <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, 12 March, 2007 9:30:22 AM
Subject: Re: Lucene Indexing - Getting Hited words in a query
Thanks Mark Harwood,
I want something like the Highlighter
where ca
istribution.
The JUnit test rig gives some example uses.
- Original Message
From: karl wettin
To: java-user@lucene.apache.org
Sent: Monday, 12 March, 2007 7:40:34 AM
Subject: Re: Lucene Indexing - Getting Hited words in a query
On 12 Mar 2007 at 08.35, Chaminda Amarasinghe wrote:
>
On 12 Mar 2007 at 08.35, Chaminda Amarasinghe wrote:
> Why nobody is anwering me?
> Pls help me.
It might take some time until someone that knows the answer reads you
question.
>
> Chaminda Ama
Many thanks Vipin,
I'll check
Vipin <[EMAIL PROTECTED]> wrote:
Hi Chaminda,
Just go through this link:
http://today.java.net/pub/a/today/2005/08/09/didyoumean.html?page=1
In the article's last portion (page 3) the author suggests a way to
handle this kind of thing (a composite did-you-mean parser)...
I think it will open up a way...
Regard
thanks karl,
karl wettin <[EMAIL PROTECTED]> wrote:
On 12 Mar 2007 at 08.35, Chaminda Amarasinghe wrote:
> Why nobody is anwering me?
> Pls help me.
It might take some time until someone that knows the answer reads you
question.
>
> Chaminda Amarasinghe wrote:
> Hi all,
>
> I'm new to this grou
On 12 Mar 2007 at 08.35, Chaminda Amarasinghe wrote:
Why is nobody answering me?
Please help me.
It might take some time until someone who knows the answer reads your
question.
Chaminda Amarasinghe <[EMAIL PROTECTED]> wrote:
Hi all,
I'm new to this group,
I'm using lucene for indexing. I
Why is nobody answering me?
Please help me.
Chaminda Amarasinghe <[EMAIL PROTECTED]> wrote:
Hi all,
I'm new to this group,
I'm using Lucene for indexing. I have a problem; any help greatly appreciated.
Please see the following code
// three fields
MultiFieldQueryParser parser = new MultiFieldQuer
This document should contain the information you need :
http://lucene.sourceforge.net/talks/inktomi/
Damien.
-Original Message-
From: Sairaj Sunil [mailto:[EMAIL PROTECTED]
Sent: 26 January 2007 03:22
To: java-user@lucene.apache.org
Subject: Re: Lucene Indexing
Hi
I was asking what exactly the inverted-indexing strategy used for storing
the index is. Is it a batch-based index, a B-tree, or a segment-based data
structure that is used as the index data structure?
On 1/25/07, Rajiv Roopan <[EMAIL PROTECTED]> wrote:
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html
On 1/24/07, Sairaj Sunil <[EMAIL PROTECTED]> wrote:
Hi all,
Can you tell me the exact indexing algorithm used by Lucene, or give some
links to documents that describe the algorithm used by Lucene?
Thanks in adva
Hello Nick!
Thanks for your help, it's useful for me.
Bye
-Original Message-
From: Nick Burch [mailto:[EMAIL PROTECTED]
Sent: Friday, 30 June 2006 12:19
To: java-user@lucene.apache.org
Subject: Re: Lucene indexing PPT
On Fri, 30 Jun 2006, mcarcelen wrote:
> I'm trying to build an index with PPT files. I have downloaded the POI
> API, "poi.bin.3.0" and "poi.src.3.0", but I don't know where I have
> to unzip them. I'd like to build the index from the command line, the same
> way as
I don't know about the lucene
Hi Chris,
I find this incredibly interesting!
Thank you for your full explanation. I was aware of the components, but not
the implementation.
... to provide a means to query both document full-text and metadata using
an RDF model
Is there anything I can read about how you have come to this ap
adasal wrote:
As far as I have researched this, I know that the Gnowsis project uses both
RDF and Lucene, but I have not had time to determine their relationship.
www.gnowsis.org/
I can tell you a bit about Gnowsis, as we (Aduna) are cooperating with
the Gnowsis people on RDF creation, storage
I used PDFBox library as mentioned in Lucene in Action. It works for me.
You can access it from www.pdfbox.org
suba suresh
mcarcelen wrote:
Hi,
I'm new to Lucene and I'm trying to index a PDF, but when I query it,
nothing is returned. Can anyone help me?
Thanks a lot
Teresa
---
Hi Teresa
You need to convert the pdf file into text format before adding the
text to the Lucene index.
You may like to look at http://www.pdfbox.org/ for a library to
convert pdf files to text format.
Patrick
On 27/06/06, mcarcelen <[EMAIL PROTECTED]> wrote:
Hi,
I´m new with Lucene and I´m t
Lucene does not provide this out of the box. You will have to write a
program to do it and feed the results to Lucene.
If I remember right, these files are in XML, so you can probably use SAX
or a pull parser.
I think a number of TREC participants, in the past, have used Lucene, so
you may
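A self-contained sketch of the SAX route using only the JDK; the DOC/DOCNO/TEXT element names follow the common TREC layout and are an assumption about the actual files. Each parsed map of fields would then become one Lucene Document:

```java
import java.io.StringReader;
import java.util.*;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;

// Pull fields out of a TREC-style document with a SAX handler; each completed
// DOC would be handed to Lucene as one Document (element names are assumed).
public class TrecSaxSketch {
    public static Map<String, String> parseDoc(String xml) throws Exception {
        Map<String, String> fields = new HashMap<>();
        DefaultHandler h = new DefaultHandler() {
            StringBuilder buf = new StringBuilder();
            public void startElement(String u, String l, String q, Attributes a) {
                buf.setLength(0);  // start collecting text for this element
            }
            public void characters(char[] ch, int s, int len) {
                buf.append(ch, s, len);
            }
            public void endElement(String u, String l, String q) {
                if (!q.equals("DOC")) fields.put(q, buf.toString().trim());
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), h);
        return fields;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> f =
            parseDoc("<DOC><DOCNO>FT911-1</DOCNO><TEXT>some text</TEXT></DOC>");
        System.out.println(f.get("DOCNO") + " / " + f.get("TEXT"));
    }
}
```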
Igor Bolotin wrote:
Does it make sense to change TermInfosWriter.FORMAT in the patch?
Yes. This should be updated for any change to the format of the file,
and this certainly constitutes a format change. This discussion should
move to [EMAIL PROTECTED]
Doug
--
Does it make sense to change TermInfosWriter.FORMAT in the patch?
Igor
On 3/27/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> Igor Bolotin wrote:
> > If somebody is interested - I can post our changes in TermInfosWriter
> and
> > SegmentTermEnum code, although they are pretty trivial.
>
> Pleas
Igor Bolotin wrote:
If somebody is interested - I can post our changes in TermInfosWriter and
SegmentTermEnum code, although they are pretty trivial.
Please submit this as a patch attached to a bug report.
I contemplated making this change to Lucene myself, when writing Nutch's
FsDirectory, b
I would like to see Lucene operate with Hadoop.
As you rightly pointed out, writing via FSDirectory to DFS would be a
performance issue.
I am interested in the idea, but I do not know how much time I can
contribute because of the little spare time I have.
If anyone else is interest
2005 1:58 AM
To: java-user@lucene.apache.org
Subject: Re: lucene indexing performance
One immediate optimization would be to only close the writer and open
the reader if the document is present. You can have a reader open and
do searches while indexing (and optimization) are underway. It's just
the delete operation that requires you to close the writer (so you don't
have two d