[jira] [Commented] (LUCENE-4225) New FixedPostingsFormat for less overhead than SepPostingsFormat

Robert Muir (JIRA) Mon, 16 Jul 2012 06:56:42 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415126#comment-13415126 ]


Robert Muir commented on LUCENE-4225:
-------------------------------------

Looks good, Mike. I think the slower cases are all explained: the skip interval 
is crazy, and lazy-loading the freq blocks should fix IntNRQ. (Though I don't 
know how you get away with AndHighHigh currently.)

Still, the second benchmark could be confusing: we are mixing concerns, 
benchmarking FOR vs vInt and also different index layouts :)
Maybe we can benchmark this layout with bulkvint vs Lucene40 to get a better 
idea of just how the index layout is doing?

I like how clean it is without the payloads crap: I still think we probably 
need to know up-front if the consumer is going to consume a payload off the 
enum for positional queries; without that, it's going to make things like this 
really hairy and messy.

Do you think it's worth it that even for "big terms" we write the last partial 
block as vInts the way we do? 
Since these terms are going to be biggish anyway (at least enough to fill a 
block), this seems not worth the trouble?

Instead, if we only did this for low-freq terms, the code might even be 
clearer/faster, but I guess there would be a downside of not being able to 
reuse these enums as much, which would hurt e.g. NIOFSDirectory?

Thanks for bringing all this back to life... and the new test looks awesome! I 
think it will really make our lives a lot easier...

                
> New FixedPostingsFormat for less overhead than SepPostingsFormat
> ----------------------------------------------------------------
>
>                 Key: LUCENE-4225
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4225
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4225.patch
>
>
> I've worked out a start on a new postings format that should have
> less overhead for fixed-int[] encoders (For, PFor)... using ideas from
> the old bulk branch, and new ideas from Robert.
>
> It's only a start: there's no payloads support yet, and I haven't run
> Lucene's tests with it, except for one new test I added that tries to
> be a thorough PostingsFormat tester (to make it easier to create new
> postings formats).  It does pass luceneutil's performance test, so
> it's at least able to run those queries correctly...
>
> Like Lucene40, it uses two files (though once we add payloads it may
> be 3).  The .doc file interleaves doc delta and freq blocks, and .pos
> has position delta blocks.  Unlike sep, blocks are NOT shared across
> terms; instead, it uses block encoding if there are enough ints to
> encode, else the same Lucene40 vInt format.  This means low-freq terms
> (< 128 = current default block size) are always vInts, and high-freq
> terms will have some number of blocks, with a vInt final block.
>
> Skip points are only recorded at block starts.
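
The per-term layout described above (full blocks of 128 values, with whatever is left over written as vInts) can be illustrated with a small sketch. This is not code from the attached patch: the class and method names (FixedBlockLayoutSketch, fullBlocks, vIntTail, writeVInt) are hypothetical, and only the block size of 128 and the vInt fallback are taken from the description. The vInt writer follows the usual 7-bits-per-byte scheme, low-order bits first, with the high bit marking a continuation.

```java
import java.io.ByteArrayOutputStream;

public class FixedBlockLayoutSketch {
    // Default block size quoted in the issue description.
    static final int BLOCK_SIZE = 128;

    /** Number of full blocks that would be block-encoded for a term. */
    static int fullBlocks(int docFreq) {
        return docFreq / BLOCK_SIZE;
    }

    /** Number of trailing values that would fall back to vInt encoding. */
    static int vIntTail(int docFreq) {
        return docFreq % BLOCK_SIZE;
    }

    /** Writes value as a vInt: 7 data bits per byte, high bit set on every byte but the last. */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Encodes deltas[from..] as vInts, standing in for the final partial block. */
    static byte[] encodeTail(int[] deltas, int from) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = from; i < deltas.length; i++) {
            writeVInt(out, deltas[i]);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Low-freq terms (< 128) end up entirely in the vInt tail.
        for (int docFreq : new int[] {5, 127, 128, 300}) {
            System.out.println("docFreq=" + docFreq
                + " full blocks=" + fullBlocks(docFreq)
                + " vInt values=" + vIntTail(docFreq));
        }
    }
}
```

Under these assumptions, a term with fewer than 128 postings never produces a block, which is the case Robert suggests might be the only one worth handling with vInts.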
