The cause is that my app adds indexes to those already in RAM rather than updating them,
so the size doubled. It seems unrelated to the OpenMode.CREATE option.
-- Original --
From: "Ian Lea";
Date: Wed, Jan 11, 2012 05:20 PM
To: "java-user";
Subject: Re: Build RAMDire
Thanks to everyone who replied to my question.
Regards,
Reyna
2012/1/11 Michael Wechner
> Maybe Tika is also of help to you
>
> http://tika.apache.org/
>
> HTH
>
> Michael
>
> On 11.01.12 20:13, Reyna Melara wrote:
>
>> Hi, my name is Reyna Melara. I'm a PhD student from Mexico, an
If your e-mail client sends things in anything but plain text, you might try
switching the format to plain text. I've had the spam filter
reject formatted e-mail before...
May not be relevant, but it's worth a try.
Best
Erick
On Wed, Jan 11, 2012 at 12:44 PM, Bennett, Tony
wrote:
> I tried to u
I am currently using the following statement at the end of each index
write, although I don't know whether the write modifies the index or not:
is = new IndexSearcher(IndexReader.openIfChanged(ir));
# is -> IndexSearcher, ir-> IndexReader
My question is how expensive it is to create a searcher insta
Maybe Tika is also of help to you
http://tika.apache.org/
HTH
Michael
On 11.01.12 20:13, Reyna Melara wrote:
Hi, my name is Reyna Melara. I'm a PhD student from Mexico, and I have a set
of 11,051,447 files with a .txt extension, but the content of each file is in
fact in wiki format. I want and
Will do, thanks.
On Wed, Jan 11, 2012 at 3:37 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> Yes, it's best to share one IndexSearcher/IndexReader across all
> threads... and if you ever find evidence this hurts concurrency then
> please post back :)
>
> Mike McCandless
>
> http://blo
Yes, it's best to share one IndexSearcher/IndexReader across all
threads... and if you ever find evidence this hurts concurrency then
please post back :)
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jan 11, 2012 at 3:29 PM, Cheng wrote:
> Will do if I see a perf gain.
>
> The other is
Will do if I see a perf gain.
The other issue is that in each thread my app will do not only indexing
but also searching. That means I will have to pass the RAM directory
instance, along with the writer instance, to every thread so that the
searcher can be built on it.
Should I create a same rea
Yes, that would work fine, but you should see a net perf loss by
doing so (once you include time to flush/sync the RAMDir to an FSDir).
If you see a perf gain then please report back!
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jan 11, 2012 at 3:09 PM, Cheng wrote:
> Can I create
Can I create a RAMDirectory-based writer and have it work across all
threads? That is, I would like to use RAMDirectory everywhere and
have the RAMDirectory written to an FSDirectory at the end.
I suppose that should work, right?
On Wed, Jan 11, 2012 at 2:31 PM, Michael McCandless <
luc...@mik
You might be interested in looking at ManifoldCF for getting your documents
into Solr. See http://incubator.apache.org/connectors for more details.
Karl
-Original Message-
From: ext Reyna Melara [mailto:reynamel...@gmail.com]
Sent: Wednesday, January 11, 2012 2:13 PM
To: java-user@luc
On Wed, Jan 11, 2012 at 1:32 PM, dyzc2010 wrote:
> Mike, do you mean that if I create an FSDirectory-based writer in the first place, then
> that writer should be used in every thread rather than creating a new
> RAMDirectory-based writer in each thread?
Right.
> What about I do want to use RAMDirectory t
Hi Reyna,
I have never used it, but there is a WikipediaTokenizer defined in the
analyzer contrib:
http://lucene.apache.org/java/3_5_0/api/contrib-analyzers/org/apache/lucene/analysis/wikipedia/WikipediaTokenizer.html
You can find a test case for this tokenizer in the source code.
Hopefully othe
Hi, my name is Reyna Melara. I'm a PhD student from Mexico, and I have a set
of 11,051,447 files with a .txt extension, but the content of each file is in
fact in wiki format. I want and need them to be indexed, but I don't know
if I have to convert this content to flat text. I have been reading and I
Mike, do you mean that if I create an FSDirectory-based writer in the first place, then
that writer should be used in every thread rather than creating a new RAMDirectory-based
writer in each thread?
What if I do want to use RAMDirectory to speed up the indexing and search
processes?
--
You shouldn't have to write first to intermediate RAMDirectories
anymore; just share a single IndexWriter instance across all of
your threads.
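
The single-shared-writer pattern recommended above can be sketched roughly as follows. This is only an illustration and not code from the thread: `DocSink` and `CountingSink` are hypothetical stand-ins for a thread-safe writer such as Lucene's IndexWriter, used so that the example runs without Lucene on the classpath. The thread count and document strings are likewise made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a thread-safe writer (the role IndexWriter plays).
interface DocSink {
    void addDocument(String doc);
}

// Hypothetical sink that only counts documents; safe to call from many threads.
final class CountingSink implements DocSink {
    private final AtomicInteger count = new AtomicInteger();

    @Override
    public void addDocument(String doc) {
        count.incrementAndGet();
    }

    int size() {
        return count.get();
    }
}

final class SharedWriterSketch {
    // Every worker receives the SAME sink instance; no per-thread writer and
    // no intermediate directory is created.
    static int indexAll(int threads, int docsPerThread) {
        CountingSink shared = new CountingSink();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                pending.add(pool.submit(() -> {
                    for (int d = 0; d < docsPerThread; d++) {
                        shared.addDocument("thread-" + id + "-doc-" + d);
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the worker and surface any failure
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("indexing worker failed", e);
        } finally {
            pool.shutdown();
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(indexAll(4, 1000));
    }
}
```

With Lucene itself, the shared object would be the one IndexWriter opened on the target directory; IndexWriter is documented as thread-safe, so the workers need no extra locking around addDocument.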
Mike McCandless
http://blog.mikemccandless.com
On Wed, Jan 11, 2012 at 12:19 PM, Cheng wrote:
> I have read a lot about IndexWriter and multi-threadin
I tried to unsubscribe from this list, without success.
I sent an email to 'java-user-unsubscr...@lucene.apache.org',
I received the "please confirm" response, requesting that I send
an email to:
java-user-uc.1326295748.hcdpoljefehgobokinbd-Bennett.Tony=con-way@lucene.apache.org
I did so, a
I have read a lot about IndexWriter and multi-threading over the Internet.
It seems to me that the normal practice is:
1) use the same IndexWriter instance for multiple threads;
2) create an individual RAMDirectory per thread;
3) use the addIndexes(Directory[]) method to add to a local drive folder al
Call for Submission Berlin Buzzwords 2012 - Search, Store, Scale --
June 4 / 5. 2012
The event will comprise presentations on scalable data processing. We
invite you to submit talks on the topics:
* IR / Search - Lucene, Solr, katta, ElasticSearch or comparable solutions
* NoSQL - like CouchDB,
I think it's hard to compare the results here?
In test 1 (single IW shared across threads) you end up with one index.
In test 2 (private IW per thread) you end up with N indexes, which to
be "fair" need to be merged down into one index (eg with .addIndexes)?
Or seen another way, test 1 should ha
Hello all,
Recently I saw a couple of discussions in a LinkedIn group about generating
large data sets or data corpora. I have compiled them into an article.
I hope it will be helpful. If you have any other links where we could get
large data sets for free, please reply to this mail thread, I will up
No clue, I am not a hardware expert. Removing memory extensions one by
one (or binary-searching for the faulty one)?
Dawid
On Wed, Jan 11, 2012 at 10:47 AM, Frank Moss wrote:
> The same with IBM J9. The dump file is attached.
>
> It seems to be HW related. Recently, we have added more RAM. We ac
> I tried IndexWriterConfig.OpenMode CREATE, and the size is doubled.
Prove it.
--
Ian.
> The only way that is effective is the writer's deleteAll() method.
>
> On Mon, Jan 9, 2012 at 5:23 AM, Ian Lea wrote:
>
>> If you load an existing disk index into a RAMDirectory, make some
>> changes in
Contention. There is always a limit somewhere, I/O, CPU, memory, locks, ...
Use your OS tools or java profiling/logging/debugging to find out what
is going on - or just go with what works for you.
If you're doing something like loading data read from a database, it
is my experience that the bott
Oops, yes, sorry -- I only quickly looked at the invocation line on
stack overflow and overlooked it. -Xms4g shouldn't make any
difference.
Dawid
On Wed, Jan 11, 2012 at 10:02 AM, Frank Moss wrote:
> 4gb is the initial heap size. Are you thinking about Xss? I will try it
> as well as the rest
4gb is the initial heap size. Are you thinking about Xss? I will try it
as well as the rest of your suggestions and post back the results.
Thanks.
On Wed, Jan 11, 2012 at 9:56 AM, Dawid Weiss wrote:
> The dump you're getting indicates a SIGSEGV in a garbage collection.
> This isn't unlikely
The dump you're getting indicates a SIGSEGV in a garbage collection.
This isn't unlikely (there are bugs in there as well), but less likely
than a hardware error on your side... at least in my opinion. I would
experiment with the following:
1) do you really need a 4gb max stack? Seems weird to me.