Yes, as you suggested, simply wrapping postings with LZ4 may not be the best fit for all cases. Byte pair encoding looks very promising.
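For anyone following along, here is a minimal sketch of byte pair encoding in plain Java. It is not Lucene code and not what the JIRA patch does; the class name BytePairSketch and the maxRules parameter are made up for illustration. It only shows the basic idea: replace the most frequent adjacent pair with a new symbol, and repeat.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: BytePairSketch and maxRules are hypothetical names, not Lucene API.
class BytePairSketch {
    // rules.get(i) holds the pair {left, right} that symbol 256 + i stands for.
    final List<int[]> rules = new ArrayList<>();

    // Repeatedly replaces the most frequent adjacent pair with a fresh symbol.
    int[] encode(byte[] input, int maxRules) {
        List<Integer> syms = new ArrayList<>();
        for (byte b : input) syms.add(b & 0xFF);
        while (rules.size() < maxRules) {
            Map<Long, Integer> counts = new HashMap<>();
            long best = -1;
            int bestCount = 1; // a pair must occur at least twice to earn a rule
            for (int i = 0; i + 1 < syms.size(); i++) {
                int a = syms.get(i);
                int b = syms.get(i + 1);
                long key = ((long) a << 32) | b;
                int count = counts.merge(key, 1, Integer::sum);
                if (count > bestCount) { bestCount = count; best = key; }
            }
            if (best < 0) break; // nothing repeats any more
            int left = (int) (best >>> 32);
            int right = (int) best;
            int fresh = 256 + rules.size();
            rules.add(new int[] {left, right});
            List<Integer> next = new ArrayList<>();
            for (int i = 0; i < syms.size(); i++) {
                boolean match = i + 1 < syms.size()
                        && syms.get(i) == left && syms.get(i + 1) == right;
                if (match) { next.add(fresh); i++; } else { next.add(syms.get(i)); }
            }
            syms = next;
        }
        return syms.stream().mapToInt(Integer::intValue).toArray();
    }

    // Expands one symbol on its own, without touching its neighbours.
    void expand(int symbol, ByteArrayOutputStream out) {
        if (symbol < 256) { out.write(symbol); return; }
        int[] pair = rules.get(symbol - 256);
        expand(pair[0], out);
        expand(pair[1], out);
    }

    byte[] decode(int[] symbols) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int s : symbols) expand(s, out);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        BytePairSketch bpe = new BytePairSketch();
        byte[] input = "abababab".getBytes(StandardCharsets.UTF_8);
        int[] encoded = bpe.encode(input, 16);
        System.out.println(input.length + " bytes -> " + encoded.length
                + " symbols, " + bpe.rules.size() + " rules");
        System.out.println(new String(bpe.decode(encoded), StandardCharsets.UTF_8));
    }
}
```

Since expand() works one symbol at a time, a reader could in principle decode just the bytes it needs, which is the fine-grained property being discussed. The sketch says nothing about the size of the rule table or about the real cost for multi-term queries; that would have to be measured.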
I accidentally stumbled upon this JIRA and found it was abandoned mid-way. Thanks for sharing the details.

--
Ravi

On Fri, Jul 3, 2015 at 5:46 PM, Adrien Grand <jpou...@gmail.com> wrote:

> We try to make the default postings format a good default for most
> use-cases and it's unclear to me whether trading speed of multi-term
> queries for compression of the terms dictionary would be a better
> trade-off for most users. I think this idea needs more iterations. For
> instance, on this issue I experimented with lz4, which works with blocks
> of data, so in order to read a single byte, you need to decompress
> everything. Robert suggested that we could use something more
> fine-grained like byte pair encoding[1]. I think this is a nice idea
> and it would be interesting to see how it would affect multi-term
> queries compared to lz4 blocks.
>
> [1] https://en.wikipedia.org/wiki/Byte_pair_encoding
>
> On Fri, Jul 3, 2015 at 12:09 PM, Ravikumar Govindarajan
> <ravikumar.govindara...@gmail.com> wrote:
> > An unrelated question…
> >
> > I came across a JIRA issue where you tried compressing the Terms-Dictionary
> > just before writing and achieved a reduction in storage space…
> >
> > https://issues.apache.org/jira/browse/LUCENE-4702
> >
> > Was it abandoned because Terms-Dict intensive queries like Fuzzy etc.
> > didn't behave well?
> >
> > Currently we don't have plans of providing queries like Fuzzy/Re-spell
> > etc. and thought we could benefit from it
> >
> >
> > On Thu, Jul 2, 2015 at 6:02 PM, Ravikumar Govindarajan <
> > ravikumar.govindara...@gmail.com> wrote:
> >
> >> Thanks Adrien…
> >>
> >> Works like a charm!!!
> >>
> >> On Wed, Jul 1, 2015 at 10:22 PM, Adrien Grand <jpou...@gmail.com> wrote:
> >>
> >>> Hi Ravikumar,
> >>>
> >>> You need to run a BooleanQuery with two clauses:
> >>> - a must clause that matches all parent documents
> >>> - a must_not clause that matches all parents that have children
> >>>
> >>> Building this second clause can be done easily with a
> >>> ToParentBlockJoinQuery around a child query that matches all your
> >>> children documents.
> >>>
> >>>
> >>> --
> >>> Adrien
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>>
> >>
> >
>
> --
> Adrien
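The must / must_not combination quoted above boils down to a set difference: all parents, minus the parents that have children. The sketch below models only that logic in plain Java, without any Lucene classes. It assumes the usual block-join layout where children are indexed immediately before their parent; Doc, isParent and parentsWithoutChildren are hypothetical names for illustration, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of "parents without children"; no Lucene types involved.
class ChildlessParents {
    // Hypothetical stand-in for an indexed document.
    static final class Doc {
        final String id;
        final boolean isParent;
        Doc(String id, boolean isParent) {
            this.id = id;
            this.isParent = isParent;
        }
    }

    // Assumes each block is laid out as child, child, ..., parent.
    static List<String> parentsWithoutChildren(List<Doc> index) {
        List<String> result = new ArrayList<>();
        boolean sawChild = false;
        for (Doc d : index) {
            if (!d.isParent) {
                sawChild = true; // this block has at least one child
                continue;
            }
            // "must": d is a parent; "must_not": d has children
            if (!sawChild) result.add(d.id);
            sawChild = false; // the next document starts a new block
        }
        return result;
    }

    public static void main(String[] args) {
        List<Doc> index = Arrays.asList(
                new Doc("c1", false), new Doc("c2", false), new Doc("P1", true),
                new Doc("P2", true),
                new Doc("c3", false), new Doc("P3", true));
        System.out.println(parentsWithoutChildren(index));
    }
}
```

In Lucene itself the second clause would come from the ToParentBlockJoinQuery described in the quoted reply, not from a scan like this; the scan is only meant to make the intended result set easy to see.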