Can IndexSort help here?
-----Original Message-----
From: "Erick Erickson"
Sent: 11/16/2016 9:29
To: "java-user"
Subject: Re: Possible to cause documents to be contiguous after forceMerge?
Well, codecs are pluggable, so if you can show that you'd get
an improvement (however you measure it) and that whatever
you have in mind wouldn't penalize the general case, you could
submit it as a proposal/patch.
Best,
Erick
On Tue, Nov 15, 2016 at 6:21 PM, Kevin Burton wrote:
> On Tue, Nov 15, 2016 at 6:16 PM, Erick Erickson wrote:
> You can make no assumptions about locality in terms of where separate
> documents land on disk. I suppose if you have the whole corpus at index
> time you
> could index these "similar" documents contiguously. T
>
Wow.. that's shockingly fr
You can make no assumptions about locality in terms of where separate
documents land on disk. I suppose if you have the whole corpus at index
time you could index these "similar" documents contiguously. Then, assuming
there were absolutely never any updates/deletes, I _think_ the doc might tend to be
I have a large index (say 500GB) with a large percentage of near-duplicate
documents.
I have to keep the documents there (can't delete them) as the metadata is
important.
Is it possible to get the documents to be contiguous somehow?
Once they are contiguous, they will compress very well.
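
To put a rough number on that last claim, here is a toy sketch in Python. It is not Lucene code: zlib stands in for whatever block compression the stored-fields codec applies, and the group count, document size, block size and helper names (make_corpus, compressed_size) are all made up for illustration. It compresses the same synthetic near-duplicate documents in fixed-size blocks, once in shuffled order and once sorted so that near-duplicates sit next to each other.

```python
import random
import zlib


def make_corpus(groups=20, copies=8, doc_size=2000, seed=0):
    """Return shuffled (group_id, payload) pairs.

    Documents in the same group share a random base payload and differ
    in only a handful of bytes, i.e. they are near duplicates.
    """
    rng = random.Random(seed)
    docs = []
    for group_id in range(groups):
        base = bytearray(rng.getrandbits(8) for _ in range(doc_size))
        for _ in range(copies):
            doc = bytearray(base)
            for _ in range(5):  # mutate a few bytes per copy
                doc[rng.randrange(doc_size)] = rng.getrandbits(8)
            docs.append((group_id, bytes(doc)))
    rng.shuffle(docs)  # simulate near duplicates arriving interleaved
    return docs


def compressed_size(docs, docs_per_block=8):
    """Compress documents in independent blocks and return total bytes."""
    total = 0
    for start in range(0, len(docs), docs_per_block):
        block = b"".join(payload for _, payload in docs[start:start + docs_per_block])
        total += len(zlib.compress(block))
    return total


docs = make_corpus()
scattered = compressed_size(docs)
# Sorting by group id makes each group's copies contiguous.
contiguous = compressed_size(sorted(docs, key=lambda d: d[0]))
print("scattered: ", scattered, "bytes")
print("contiguous:", contiguous, "bytes")
```

The only point of the sketch is that a block compressor can exploit redundancy only among documents that land in the same block, so ordering matters. How much of that carries over to a real index depends on the codec, block size and merge behaviour, none of which are modelled here.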