bf.getBits() * Long.BYTES  may be as long as Math.Ceil(
Shape.getNumberOfBits() / 8.0 ) or it may be shorter.

I am building byte buffers of fixed length that is the maximum size that
any valid bf.getBits() * Long.BYTES I need to know
Shape.getNumberOfBytes().
The  conversion is required for some Bloom filter indexing techniques.

And while serialization is outside the scope of the library, it is only
reasonable that we provide enough information to allow developers to
serialize/deserialse the data.  For example BloomFilter allows you to get
either the long[] representation or the list of bit indexes (via OfInt) and
there are ways to reconstruct a BloomFilter if you were to write that out
and read it back.



On Wed, Mar 18, 2020 at 4:07 PM Alex Herbert <alex.d.herb...@gmail.com>
wrote:

>
>
> > On 18 Mar 2020, at 14:39, Claude Warren <cla...@xenei.com> wrote:
> >
> >>> Shape Discussion:
> >>>
> >>> as for getNumberOfBytes() it should return the maximum number of bytes
> >>> returned by a getBits() call to a filter with this shape.  So yes, if
> > there
> >>> is a compressed internal representation, no it won't be that.  It is a
> >>> method on Shape so it should literally be Math.ceil( getNumberOfBits()
> /
> >>> 8.0 )
> >>>
> >>> Basically, if you want to create an array that will fit all the bits
> >>> returned by BloomFilter.iterator() you need an array of
> >>> Shape.getNumberOfBytes().  And that is actually what I use it for.
> >
> >> Then you are also mapping the index to a byte index and a bit within the
> > byte. So if you are doing these two actions then this is something that
> you
> > should control.
> >
> > BloomFilter.getBits returns a long[].  that long[] may be shorter than
> the
> > absolute number of bytes specified by Shape.  It also may be longer.
> >
> > If you want to create a copy of the byte[] you have to know how long it
> > should be.  The only way to determine that is from Shape, and currently
> > only if you do the Ceil() method noted above.  There is a convenience in
> > knowing how long (in bytes) the buffer can be.
>
> Copy of what byte[]?
>
> There is no method to create a byte[] for a BloomFilter. So no need for
> getNumberOfBytes().
>
> Are you talking about compressing the long[] to a byte[] by truncating the
> final long into 1-8 bytes?
>
>     BloomFilter bf;
>     long[] bits = bf.getBits();
>     ByteBuffer bb = ByteBuffer.allocate(bits.length *
> Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
>     Arrays.stream(bits).forEachOrdered(bb::putLong);
>     byte[] bytes = bb.array();
>     int expected = (int) Math.ceil(bf.getShape().getNumberOfBits() / 8.0);
>     if (bytes.length != expected) {
>         bytes = Arrays.copyOf(bytes, expected);
>     }
>
> For a BloomFilter of any reasonable number of bits the storage saving will
> be small.
>
> Is this for serialisation? This is outside of the scope of the library.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>

-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Reply via email to