+1
Penghui
On Fri, Jun 17, 2022 at 11:40 AM Yubiao Feng
wrote:
> Hi Enrico
>
> > I am not sure I understand the part of making it configurable via a
> classname.
> I believe it is better to simply have a flag
> "transactionEnableBatchWrites".
> Otherwise the matrix of possible implementations will grow without limits.
>
> Good idea, I've modified the design and added a switch in the Configure
> Changes section. Could you take a look again.
>
> On Fri, Jun 10, 2022 at 7:14 PM Enrico Olivelli
> wrote:
>
> > I have read the PIP, and overall I agree with the design.
> > Good work !
> >
> > I am not sure I understand the part of making it configurable via a
> > classname.
> > I believe it is better to simply have a flag
> > "transactionEnableBatchWrites".
> > Otherwise the matrix of possible implementations will grow without
> limits.
> >
> > Enrico
> >
> > Il giorno ven 10 giu 2022 alle ore 11:35 Yubiao Feng
> > ha scritto:
> > >
> > > Hi Pulsar community:
> > >
> > > I open a pip to discuss "Batch writing ledger for transaction
> operation"
> > >
> > > Proposal Link: https://github.com/apache/pulsar/issues/15370
> > >
> > > ## Motivation
> > >
> > > Before reading the background, I suggest you read section “Transaction
> > > Flow” of [PIP-31: Transactional Streaming](
> > >
> >
> https://docs.google.com/document/d/145VYp09JKTw9jAT-7yNyFU255FptB2_B2Fye100ZXDI/edit#heading=h.bm5ainqxosrx
> > > )
> > >
> > > ### Normal Flow vs. Transaction Flow
> >
> > > 
> > > In *Figure 1. Normal Flow vs. Transaction Flow*:
> > > - The gray square boxes represent logical components.
> > > - All the blue boxes represent logs. The logs are usually Managed
> ledger
> > > - Each arrow represents the request flow or message flow. These
> > operations
> > > occur in sequence indicated by the numbers next to each arrow.
> > > - The black arrows indicate those shared by transaction and normal
> flow.
> > > - The blue arrows represent normal-message-specific flow.
> > > - The orange arrows represent transaction-message-specific flow.
> > > - The sections below are numbered to match the operations showed in the
> > > diagram(differ from [PIP-31: Transactional Streaming](
> > >
> >
> https://docs.google.com/document/d/145VYp09JKTw9jAT-7yNyFU255FptB2_B2Fye100ZXDI/edit#heading=h.bm5ainqxosrx
> > > ))
> > >
> > >
> > > 2.4a Write logs to ledger which Acknowledgement State is
> PENDING_ACK
> > > [Acknowledgement State Machine](
> > >
> >
> https://docs.google.com/document/d/145VYp09JKTw9jAT-7yNyFU255FptB2_B2Fye100ZXDI/edit#bookmark=id.4bikq6sjiy8u
> > )
> > > tells about the changes of the Acknowledge State and why we need
> > persistent
> > > “The Log which the Acknowledgement State is PENDING_ACK”.
> > > 2.4a’ Mark messages is no longer useful with current subscription
> > > Update `Cursor` to mark the messages as DELETED. So they can be
> deleted.
> > > 3.2b Mark messages is no longer useful with current subscription
> > > The implementation here is exactly the same as 2.4a’, except that the
> > > execution is triggered later, after the Transaction has been committed.
> > >
> > >
> > > ### Analyze the performance cost of transaction
> > > As you can see Figure 1. Normal Flow
> > vs.
> > > Transaction Flow]: 2.4a 'and 3.2b are exactly the same logic, so
> the
> > > remaining orange arrows are the additional performance overhead of all
> > > transactions.
> > > In terms of whether or not each transaction is executed multiple times,
> > we
> > > can split the flow into two classes(Optimizing a process that is
> executed
> > > multiple times will yield more benefits):
> > > - Executed once each transaction: flow-1.x and flow-3.x
> > > - Executed multiple times each transaction: flow-2.x
> > >
> > > So optimizing the flow 2.x with a lot of execution is a good choice.
> > Let's
> > > split flow-2.x into two groups: those that cost more and those that
> cost
> > > less:
> > > - No disk written: flow-2.1 and fow-2.3
> > > - Disk written: fow-2.1a, fow-2.3a, flow-2.4a
> > >
> > > From the previous analysis, we found that optimizing flow-2.1a,
> > flow-2.3a,
> > > flow-2.4a would bring the most benefits, and batch writes would be an
> > > excellent solution for multiple disk writes. Flow-2.1a and Flow-2.3a
> are
> > > both manipulations written into the transaction log, we can combine
> them
> > in
> > > one batch; 2.4a is the operation of writing pending ACK log, we combine
> > > multiple 2.4a's into one batch for processing.
> > > As we can see from “Transaction Flow” of [PIP-31: Transactional
> > Streaming](
> > >
> >
> https://docs.google.com/document/d/145VYp09JKTw9jAT-7yNyFU255FptB2_B2Fye100ZXDI/edit#heading=h.bm5ainqxosrx
> > ),
> > > these instructions are strictly sequential (guaranteed by the client):
> > > - flow-1.x end before flow-2.x start
> > > - flow-2.x end before flow-3.x start
>