I think that having a set of options in the ledger metadata would be a good
enhancement, and I am sure we will add it as soon as it is needed; maybe
we do not need it now.

Actually, I think we will need to declare this durability level at the
entry level to support some use cases in the BP-14 document. Let me explain
two of my use cases for which I need it.

At a high level we have two choices:

A) per-ledger durability options (JV proposal)
all addEntry operations are durable or non-durable and there is an explicit
'sync' API (+ forced sync at close)

B) per-entry durability options (original BP-14 proposal)
every addEntry has its own durable/non-durable option (sync/no-sync), with
the ability to call 'sync' without addEntry (+ forced sync at close)

The first case is the database WAL: I am using a ledger as a segment of the
WAL of a database. I write all data changes in the scope of a
'transaction' with the relaxed-durability flag, then I write the
'transaction committed' entry with the "strict durability" requirement.
This in fact requires that all previous entries are persisted durably, so
that the transaction will never be lost.

In this scenario we would in fact need an addEntry + sync API:

Using option A) the WAL will look like:
- open ledger no-sync = true
- addEntry (set foo=bar)  (this will be no-sync)
- addEntry (set foo=bar2) (this will be no-sync)
- addEntry (commit)
- sync

Using option B) the WAL will look like:
- open ledger
- addEntry (set foo=bar), no-sync
- addEntry (set foo=bar2), no-sync
- addEntry (commit), sync

In case B) we are "saving" one RPC call to every bookie (the 'sync' one).
The same holds for single data change entries, like updating a single
record in the database: with BK 4.5 this "costs" only a single RPC to every
bookie.
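
To make the comparison concrete, here is a toy sketch in Java that counts the
requests sent to one bookie for the transaction above. It is not the
BookKeeper client API: FakeLedger, addEntry(payload, sync) and sync() are
hypothetical names used only for illustration, and the model assumes that a
durable add also persists every earlier entry.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (NOT the BookKeeper API): counts requests to one bookie. */
public class WalDurabilitySketch {

    /** Hypothetical in-memory stand-in for a ledger on a single bookie. */
    static final class FakeLedger {
        private final List<String> entries = new ArrayList<>();
        private int durableUpTo = 0; // number of entries assumed fsynced
        private int rpcCount = 0;    // requests sent to the bookie

        /** Each add is one request; the sync flag rides on the same request. */
        long addEntry(String payload, boolean sync) {
            entries.add(payload);
            rpcCount++;
            if (sync) {
                // Assumption: a durable add persists all previous entries too.
                durableUpTo = entries.size();
            }
            return entries.size() - 1;
        }

        /** Explicit sync: a separate request with no payload. */
        void sync() {
            rpcCount++;
            durableUpTo = entries.size();
        }

        int rpcCount() { return rpcCount; }
        int durableEntries() { return durableUpTo; }
    }

    /** Option A: per-ledger no-sync, explicit sync after the commit entry. */
    static FakeLedger writeTransactionOptionA() {
        FakeLedger ledger = new FakeLedger();
        ledger.addEntry("set foo=bar", false);
        ledger.addEntry("set foo=bar2", false);
        ledger.addEntry("commit", false);
        ledger.sync();
        return ledger;
    }

    /** Option B: per-entry flag, only the commit entry asks for sync. */
    static FakeLedger writeTransactionOptionB() {
        FakeLedger ledger = new FakeLedger();
        ledger.addEntry("set foo=bar", false);
        ledger.addEntry("set foo=bar2", false);
        ledger.addEntry("commit", true);
        return ledger;
    }

    public static void main(String[] args) {
        FakeLedger a = writeTransactionOptionA();
        FakeLedger b = writeTransactionOptionB();
        System.out.println("option A requests: " + a.rpcCount()); // 4
        System.out.println("option B requests: " + b.rpcCount()); // 3
    }
}
```

In both cases all three entries end up durable; the only difference in this
model is the extra request for the explicit sync in option A.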

Second case:
I am using BookKeeper to store binary objects, so I am packing multiple
'objects' (named sequences of bytes) into a single ledger, like you do when
you write many records to a file in a streaming fashion and keep track of
the offset of the beginning of every record (LedgerHandleAdv is perfect for
this case).
I am not using a single ledger per 'file' because creating many ledgers
very fast kills ZooKeeper. In my systems I have big bursts of writes, which
need to be really "fast", so I am writing multiple 'files' to every single
ledger. So close-to-open consistency at the ledger level is not suitable
for this case.
I have to write as fast as possible to this 'ledger-backed' stream, and as
with a 'traditional' filesystem I am writing parts of each file and then
requiring 'sync' at the end of each file.
Using BookKeeper you need to split big 'files' into "little" parts; you
cannot transmit the contents as a "real" stream over the network.
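
A rough sketch of this second pattern in Java, under the same caveats as
before: FakeLedger, Extent and writeObject are hypothetical names, not the
BookKeeper or LedgerHandleAdv API. Each 'file' is split into small entries,
every part is written with no-sync, and only the last part of each file
requires sync. The index keeps the first entry id of every object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (NOT the BookKeeper API): many 'files' packed in one ledger. */
public class ObjectPackingSketch {

    /** Hypothetical in-memory ledger with a per-entry sync flag. */
    static final class FakeLedger {
        final List<byte[]> entries = new ArrayList<>();
        int durableUpTo = 0; // number of entries assumed fsynced

        long addEntry(byte[] data, boolean sync) {
            entries.add(data);
            if (sync) {
                // Assumption: a durable add persists all previous entries too.
                durableUpTo = entries.size();
            }
            return entries.size() - 1;
        }
    }

    /** Where one object lives in the ledger: first entry id and entry count. */
    static final class Extent {
        final long firstEntryId;
        final int entryCount;

        Extent(long firstEntryId, int entryCount) {
            this.firstEntryId = firstEntryId;
            this.entryCount = entryCount;
        }
    }

    /** Splits content into chunks; only the last chunk is written with sync. */
    static Extent writeObject(FakeLedger ledger, byte[] content, int chunkSize) {
        if (content.length == 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("need non-empty content and chunkSize > 0");
        }
        long first = -1;
        int count = 0;
        for (int off = 0; off < content.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, content.length);
            boolean lastChunk = (end == content.length);
            long id = ledger.addEntry(Arrays.copyOfRange(content, off, end), lastChunk);
            if (first < 0) {
                first = id;
            }
            count++;
        }
        return new Extent(first, count);
    }

    public static void main(String[] args) {
        FakeLedger ledger = new FakeLedger();
        Map<String, Extent> index = new LinkedHashMap<>();
        index.put("a.bin", writeObject(ledger, new byte[10], 4)); // entries 0..2
        index.put("b.bin", writeObject(ledger, new byte[5], 4));  // entries 3..4
        for (Map.Entry<String, Extent> e : index.entrySet()) {
            System.out.println(e.getKey() + " starts at entry " + e.getValue().firstEntryId
                    + ", " + e.getValue().entryCount + " entries");
        }
    }
}
```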

I am not talking about bookie-level implementation details. I would like to
define the high-level API so that it supports all the relevant known use
cases and leaves room for the future.
At this moment adding a per-entry 'durability option' seems to be very
flexible and simple to implement, and it does not prevent us from making
further improvements, such as skipping the journal.

Enrico



2017-08-26 19:55 GMT+02:00 Enrico Olivelli <eolive...@gmail.com>:

>
>
> On Sat, Aug 26, 2017, 19:19 Venkateswara Rao Jujjuri <jujj...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> As promised during Thursday call, here is my proposal.
>>
>> *NOTE*: Major difference in this proposal compared to Enrico’s
>> <https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-
>> NW8VOUUgUWVBmswCUOG158/edit#heading=h.q2rewiqndr5v>
>> is
>> making the durability a property of the ledger(type) as opposed to
>> addEntry(). Rest of the technical details have a lot of similarities.
>>
>
> Thank you JV. I have just quickly read the doc and your view is certainly
> broader.
> I will dig into the doc as soon as possible on Monday.
> For me it is ok to have a ledger-wide configuration. I think that the most
> important decision is about the API we will provide, as in the future it
> will be difficult to change it.
>
>
> Cheers
> Enrico
>
>
>
>> https://docs.google.com/document/d/1g1eBcVVCZrTG8YZliZP0LVqvWpq43
>> 2ODEghrGVQ4d4Q/edit?usp=sharing
>>
>> On Thu, Aug 24, 2017 at 1:14 AM, Enrico Olivelli <eolive...@gmail.com>
>> wrote:
>>
>> > Thank you all for the comments and for taking a look at the document so
>> > soon.
>> > I have updated the doc; we will discuss the document at the meeting.
>> >
>> >
>> > Enrico
>> >
>> > 2017-08-24 2:27 GMT+02:00 Sijie Guo <guosi...@gmail.com>:
>> >
>> > > Enrico,
>> > >
>> > > Thank you so much! It is a great effort for putting this up. Overall
>> > looks
>> > > good. I made some comments, we can discuss at tomorrow's community
>> > meeting.
>> > >
>> > > - Sijie
>> > >
>> > > On Wed, Aug 23, 2017 at 8:25 AM, Enrico Olivelli <eolive...@gmail.com
>> >
>> > > wrote:
>> > >
>> > > > Hi all,
>> > > > I have drafted a first proposal for BP-14 - Relax Durability
>> > > >
>> > > > We are talking about limiting the number of fsync to the journal
>> while
>> > > > preserving the correctness of the LAC protocol.
>> > > >
>> > > > This is the link to the wiki page, but as the issue is huge we
>> prefer
>> > to
>> > > > use Google Documents for sharing comments
>> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/
>> > > > BP+-+14+Relax+durability
>> > > >
>> > > > This is the document
>> > > > https://docs.google.com/document/d/1JLYO3K3tZ5PJGmyS0YK_-
>> > > > NW8VOUUgUWVBmswCUOG158/edit?usp=sharing
>> > > >
>> > > > All comments are welcome
>> > > >
>> > > > I have added DL dev list in cc as the discussion is interesting for
>> > both
>> > > > groups
>> > > >
>> > > > Enrico Olivelli
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> Jvrao
>> ---
>> First they ignore you, then they laugh at you, then they fight you, then
>> you win. - Mahatma Gandhi
>>
> --
>
>
> -- Enrico Olivelli
>
