[jira] [Updated] (LUCENE-5914) More options for stored fields compression

Robert Muir (JIRA) Sat, 29 Nov 2014 22:13:32 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Robert Muir updated LUCENE-5914:
--------------------------------
    Attachment: LUCENE-5914.patch

Some updates:
* port to trunk APIs (I guess this was outdated?)
* fix some javadoc bugs
* nuke lots of now-unused stuff in .compressing, which is now only used for term 
vectors
* improve float/double compression: it was wasteful and not very effective. 
Floats now write 1..5 bytes and doubles 1..9 bytes.
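
As a rough illustration of how a float can fit in 1..5 bytes, here is a minimal sketch. The class name VarFloatSketch and the exact byte layout are assumptions made for illustration and may differ from what the patch writes. Small integer values take one byte, other non-negative floats take their raw four bytes, and everything else takes a marker byte plus four bytes. Doubles could follow the same idea over a 1..9 byte range.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a variable-length float codec (not the patch's actual code).
final class VarFloatSketch {
  private static final int NEGATIVE_ZERO_BITS = Float.floatToIntBits(-0f);

  // Encodes f in 1, 4, or 5 bytes (big-endian).
  static byte[] encode(float f) {
    int bits = Float.floatToIntBits(f);
    int intVal = (int) f;
    if (f == intVal && intVal >= -1 && intVal <= 125 && bits != NEGATIVE_ZERO_BITS) {
      // Small integer in [-1, 125]: high bit set, value + 1 in the low 7 bits.
      // This yields 0x80..0xFE, so 0xFF stays free as a marker.
      return new byte[] {(byte) (0x80 | (intVal + 1))};
    }
    if (bits >= 0) {
      // Sign bit is 0, so the first raw byte is below 0x80 and cannot be
      // confused with the one-byte form or the marker.
      return ByteBuffer.allocate(4).putInt(bits).array();
    }
    // Negative values (including -0.0f): 0xFF marker followed by the raw bits.
    return ByteBuffer.allocate(5).put((byte) 0xFF).putInt(bits).array();
  }

  // Decodes a value produced by encode().
  static float decode(byte[] data) {
    int first = data[0] & 0xFF;
    if (first == 0xFF) {
      return Float.intBitsToFloat(ByteBuffer.wrap(data, 1, 4).getInt());
    }
    if ((first & 0x80) != 0) {
      return (first & 0x7F) - 1;
    }
    return Float.intBitsToFloat(ByteBuffer.wrap(data, 0, 4).getInt());
  }
}
```

The saving depends entirely on the data: fields that mostly hold small whole numbers shrink to a quarter of their size, while negative fractional values cost one extra byte.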

We should try to do more cleanup:
* I don't like the delegator. Maybe it's the best solution, but at least it 
should not write its own file. I think we should revive SI.attributes 
(properly: so it rejects any attribute puts on dv updates) and use that.
* The delegator shouldn't actually need to delegate the writer? If I add this 
code, all tests pass:
{code}
    final StoredFieldsWriter in = format.fieldsWriter(directory, si, context);
    if (true) return in; // wrapper below is useless
    return new StoredFieldsWriter() {
{code}
This seems to be all about delegating some manual file deletion on abort()? Do 
we really need to do this? If we have some bugs around IndexFileDeleter where 
it doesn't do the right thing, enough to warrant such APIs, then we should have 
tests for it. Such tests would also show the current code deletes the wrong 
filename:
{code}
// formatName is NOT the file the delegator writes
IOUtils.deleteFilesIgnoringExceptions(directory, formatName);
{code}
But this is obsolete if we add back SI.attributes.
* The header check logic should be improved. I don't know why we need the 
Reader.checkHeader method; why can't we just check it with the other files?
* We should try to use checkFooter(Input, Throwable) for better corruption 
messages, with this type of logic. It does more and appends suppressed 
exceptions when things go wrong:
{code}
try (ChecksumIndexInput input = ...) {
  Throwable priorE = null;
  try {
    // ... read a bunch of stuff ... 
  } catch (Throwable exception) {
    priorE = exception;
  } finally {
    CodecUtil.checkFooter(input, priorE);
  }
}
{code}
* Any getChildResources() should return an immutable list: that doesn't seem to 
always be the case. Maybe AssertingCodec can be improved to actually test this 
automatically.
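
To illustrate the last point, here is a minimal sketch of returning a read-only view and checking for it automatically. The names ChildResourcesSketch, Holder and rejectsMutation are hypothetical and use plain strings instead of Lucene types; this is not how AssertingCodec works today, only one way such a check could look.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: plain JDK types stand in for Lucene's resource classes.
final class ChildResourcesSketch {

  // Stand-in for a reader that reports its child resources.
  static final class Holder {
    private final List<String> children = new ArrayList<>();

    Holder(List<String> initial) {
      children.addAll(initial);
    }

    // Returns a read-only view so callers cannot alter internal state.
    List<String> getChildResources() {
      return Collections.unmodifiableList(children);
    }
  }

  // The kind of check an asserting wrapper could run: returns true only if
  // the list refuses to be modified. Note that a mutable list passed in here
  // really does receive the probe element.
  static boolean rejectsMutation(List<String> list) {
    try {
      list.add("probe");
      return false;
    } catch (UnsupportedOperationException expected) {
      return true;
    }
  }
}
```

A check like this only proves that the returned list cannot be modified through that reference; it says nothing about whether the owner later mutates the backing list.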

I will look more tomorrow.

> More options for stored fields compression
> ------------------------------------------
>
>                 Key: LUCENE-5914
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5914
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>             Fix For: 5.0
>
>         Attachments: LUCENE-5914.patch, LUCENE-5914.patch, LUCENE-5914.patch
>
>
> Since we added codec-level compression in Lucene 4.1, I think I have heard 
> from about as many users complaining that compression was too aggressive as 
> users complaining that it was too light.
> I think it is due to the fact that we have users that are doing very 
> different things with Lucene. For example, if you have a small index that 
> fits in the filesystem cache (or is close to fitting), then you might never 
> pay for actual disk seeks, and in such a case the fact that the current stored 
> fields format needs to over-decompress data can noticeably slow down search 
> on cheap queries.
> On the other hand, it is more and more common to use Lucene for things like 
> log analytics, and in that case you have huge amounts of data for which you 
> don't care much about stored fields performance. However it is very 
> frustrating to notice that the data that you store takes several times less 
> space when you gzip it compared to your index although Lucene claims to 
> compress stored fields.
> For that reason, I think it would be nice to have some kind of option that 
> would allow trading speed for compression in the default codec.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

