On Mon, Jul 07, 2025 at 09:58:30PM +0530, Jyothi Kumar Seerapu wrote: > > > On 7/4/2025 1:11 AM, Dmitry Baryshkov wrote: > > On Thu, 3 Jul 2025 at 15:51, Jyothi Kumar Seerapu > > <quic_jseer...@quicinc.com> wrote: > > > > > > > > > > > > On 6/19/2025 9:46 PM, Jyothi Kumar Seerapu wrote: > > > > > > > > > > > > On 6/18/2025 1:02 AM, Dmitry Baryshkov wrote: > > > > > On Tue, 17 Jun 2025 at 17:11, Jyothi Kumar Seerapu > > > > > <quic_jseer...@quicinc.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > On 5/30/2025 10:12 PM, Dmitry Baryshkov wrote: > > > > > > > On Fri, May 30, 2025 at 07:36:05PM +0530, Jyothi Kumar Seerapu > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > On 5/21/2025 6:15 PM, Dmitry Baryshkov wrote: > > > > > > > > > On Wed, May 21, 2025 at 03:58:48PM +0530, Jyothi Kumar > > > > > > > > > Seerapu wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 5/9/2025 9:31 PM, Dmitry Baryshkov wrote: > > > > > > > > > > > On 09/05/2025 09:18, Jyothi Kumar Seerapu wrote: > > > > > > > > > > > > Hi Dimitry, Thanks for providing the review comments. > > > > > > > > > > > > > > > > > > > > > > > > On 5/6/2025 5:16 PM, Dmitry Baryshkov wrote: > > > > > > > > > > > > > On Tue, May 06, 2025 at 04:48:44PM +0530, Jyothi > > > > > > > > > > > > > Kumar Seerapu > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > The I2C driver gets an interrupt upon transfer > > > > > > > > > > > > > > completion. > > > > > > > > > > > > > > When handling multiple messages in a single > > > > > > > > > > > > > > transfer, this > > > > > > > > > > > > > > results in N interrupts for N messages, leading to > > > > > > > > > > > > > > significant > > > > > > > > > > > > > > software interrupt latency. > > > > > > > > > > > > > > > > > > > > > > > > > > > > To mitigate this latency, utilize Block Event > > > > > > > > > > > > > > Interrupt (BEI) > > > > > > > > > > > > > > mechanism. Enabling BEI instructs the hardware to > > > > > > > > > > > > > > prevent > > > > > > > > > > > > > > interrupt > > > > > > > > > > > > > > generation and BEI is disabled when an interrupt is > > > > > > > > > > > > > > necessary. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Large I2C transfer can be divided into chunks of 8 > > > > > > > > > > > > > > messages > > > > > > > > > > > > > > internally. > > > > > > > > > > > > > > Interrupts are not expected for the first 7 message > > > > > > > > > > > > > > completions, only > > > > > > > > > > > > > > the last message triggers an interrupt, indicating > > > > > > > > > > > > > > the > > > > > > > > > > > > > > completion of > > > > > > > > > > > > > > 8 messages. This BEI mechanism enhances overall > > > > > > > > > > > > > > transfer > > > > > > > > > > > > > > efficiency. > > > > > > > > > > > > > > > > > > > > > > > > > > Why do you need this complexity? Is it possible to > > > > > > > > > > > > > set the > > > > > > > > > > > > > DMA_PREP_INTERRUPT flag on the last message in the > > > > > > > > > > > > > transfer? > > > > > > > > > > > > > > > > > > > > > > > > If i undertsand correctly, the suggestion is to get the > > > > > > > > > > > > single > > > > > > > > > > > > intetrrupt for last i2c message only. > > > > > > > > > > > > > > > > > > > > > > > > But With this approach, we can't handle large number of > > > > > > > > > > > > i2c > > > > > > > > > > > > messages > > > > > > > > > > > > in the transfer. > > > > > > > > > > > > > > > > > > > > > > > > In GPI driver, number of max TREs support is harcoded > > > > > > > > > > > > to 64 > > > > > > > > > > > > (#define > > > > > > > > > > > > CHAN_TRES 64) and for I2C message, we need Config > > > > > > > > > > > > TRE, GO TRE > > > > > > > > > > > > and > > > > > > > > > > > > DMA TREs. So, the avilable TREs are not sufficient to > > > > > > > > > > > > handle > > > > > > > > > > > > all the > > > > > > > > > > > > N messages. > > > > > > > > > > > > > > > > > > > > > > It sounds like a DMA driver issue. In other words, the DMA > > > > > > > > > > > driver can > > > > > > > > > > > know that it must issue an interrupt before exausting 64 > > > > > > > > > > > TREs in > > > > > > > > > > > order > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Here, the plan is to queue i2c messages > > > > > > > > > > > > (QCOM_I2C_GPI_MAX_NUM_MSGS > > > > > > > > > > > > or 'num' incase for less messsages), process and > > > > > > > > > > > > unmap/free > > > > > > > > > > > > upon the > > > > > > > > > > > > interrupt based on QCOM_I2C_GPI_NUM_MSGS_PER_IRQ. > > > > > > > > > > > > > > > > > > > > > > Why? This is some random value which has no connection > > > > > > > > > > > with > > > > > > > > > > > CHAN_TREs. > > > > > > > > > > > Also, what if one of the platforms get a 'liter' GPI which > > > > > > > > > > > supports less > > > > > > > > > > > TREs in a single run? Or a super-premium platform which > > > > > > > > > > > can use 256 > > > > > > > > > > > TREs? Please don't workaround issues from one driver in > > > > > > > > > > > another > > > > > > > > > > > one. > > > > > > > > > > > > > > > > > > > > We are trying to utilize the existing CHAN_TRES mentioned > > > > > > > > > > in the > > > > > > > > > > GPI driver. > > > > > > > > > > With the following approach, the GPI hardware can process N > > > > > > > > > > number of I2C > > > > > > > > > > messages, thereby improving throughput and transfer > > > > > > > > > > efficiency. > > > > > > > > > > > > > > > > > > > > The main design consideration for using the block event > > > > > > > > > > interrupt > > > > > > > > > > is as > > > > > > > > > > follows: > > > > > > > > > > > > > > > > > > > > Allow the hardware to process the TREs (I2C messages), > > > > > > > > > > while the > > > > > > > > > > software > > > > > > > > > > concurrently prepares the next set of TREs to be submitted > > > > > > > > > > to the > > > > > > > > > > hardware. > > > > > > > > > > Once the TREs are processed, they can be freed, enabling the > > > > > > > > > > software to > > > > > > > > > > queue new TREs. This approach enhances overall optimization. > > > > > > > > > > > > > > > > > > > > Please let me know if you have any questions, concerns, or > > > > > > > > > > suggestions. > > > > > > > > > > > > > > > > > > The question was why do you limit that to > > > > > > > > > QCOM_I2C_GPI_NUM_MSGS_PER_IRQ. > > > > > > > > > What is the reason for that limit, etc. If you think about > > > > > > > > > it, The > > > > > > > > > GENI > > > > > > > > > / I2C doesn't impose any limit on the number of messages > > > > > > > > > processed in > > > > > > > > > one go (if I understand it correctly). Instead the limit comes > > > > > > > > > from the > > > > > > > > > GPI DMA driver. As such, please don't add extra 'handling' to > > > > > > > > > the I2C > > > > > > > > > driver. Make GPI DMA driver responsible for saying 'no more > > > > > > > > > for now', > > > > > > > > > then I2C driver can setup add an interrupt flag and proceed > > > > > > > > > with > > > > > > > > > submitting next messages, etc. > > > > > > > > > > > > > > > > > > > > > > > > > For I2C messages, we need to prepare TREs for Config, Go and > > > > > > > > DMAs. > > > > > > > > However, > > > > > > > > if a large number of I2C messages are submitted then may may run > > > > > > > > out of > > > > > > > > memory for serving the TREs. The GPI channel supports a maximum > > > > > > > > of > > > > > > > > 64 TREs, > > > > > > > > which is insufficient to serve 32 or even 16 I2C messages > > > > > > > > concurrently, > > > > > > > > given the multiple TREs required per message. > > > > > > > > > > > > > > > > To address this limitation, a strategy has been implemented to > > > > > > > > manage how > > > > > > > > many messages can be queued and how memory is recycled. The > > > > > > > > constant > > > > > > > > QCOM_I2C_GPI_MAX_NUM_MSGS is set to 16, defining the upper > > > > > > > > limit of > > > > > > > > messages that can be queued at once. Additionally, > > > > > > > > QCOM_I2C_GPI_NUM_MSGS_PER_IRQ is set to 8, meaning that > > > > > > > > half of the queued messages are expected to be freed or > > > > > > > > deallocated > > > > > > > > per > > > > > > > > interrupt. > > > > > > > > This approach ensures that the driver can efficiently manage TRE > > > > > > > > resources > > > > > > > > and continue queuing new I2C messages without exhausting memory. > > > > > > > > > I really don't see a reason for additional complicated > > > > > > > > > handling in > > > > > > > > > the > > > > > > > > > geni driver that you've implemented. Maybe I misunderstand > > > > > > > > > something. In > > > > > > > > > such a case it usually means that you have to explain the > > > > > > > > > design > > > > > > > > > in the > > > > > > > > > commit message / in-code comments. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The I2C Geni driver is designed to prepare and submit > > > > > > > > descriptors > > > > > > > > to the GPI > > > > > > > > driver one message at a time. > > > > > > > > As a result, the GPI driver does not have visibility into the > > > > > > > > current > > > > > > > > message index or the total number of I2C messages in a transfer. > > > > > > > > This lack > > > > > > > > of context makes it challenging to determine when to set the > > > > > > > > block > > > > > > > > event > > > > > > > > interrupt, which is typically used to signal the completion of a > > > > > > > > batch of > > > > > > > > messages. > > > > > > > > > > > > > > > > So, the responsibility for deciding when to set the BEI should > > > > > > > > lie > > > > > > > > with the > > > > > > > > I2C driver. > > > > > > > > > > > > > > > > If this approach is acceptable, I will proceed with updating the > > > > > > > > relevant > > > > > > > > details in the commit message. > > > > > > > > > > > > > > > > Please let me know if you have any concerns or suggestions. > > > > > > > > > > > > > Hi Dmitry, Sorry for the delayed response, and thank you for the > > > > > > suggestions. > > > > > > > > > > > > > - Make gpi_prep_slave_sg() return NULL if flags don't have > > > > > > > DMA_PREP_INTERRUPT flag and there are no 3 empty TREs for the > > > > > > > interrupt-enabled transfer. > > > > > > "there are no 3 empty TREs for the interrupt-enabled transfer." > > > > > > Could you please help me understand this a bit better? > > > > > > > > > > In the GPI driver you know how many TREs are available. In > > > > > gpi_prep_slave_sg() you can check that and return an error if there > > > > > are not enough TREs available. > > > > > > > > > > > > > > > > > > > - If I2C driver gets NULL from dmaengine_prep_slave_single(), > > > > > > > retry > > > > > > > again, adding DMA_PREP_INTERRUPT. Make sure that the last one > > > > > > > always > > > > > > > gets DMA_PREP_INTERRUPT. > > > > > > Does this mean we need to proceed to the next I2C message and ensure > > > > > > that the DMA_PREP_INTERRUPT flag is set for the last I2C message in > > > > > > each > > > > > > chunk? And then, should we submit the chunk of messages to the GSI > > > > > > hardware for processing? > > > > > > > > > > No. You don't have to peek at the next I2C message. This all concerns > > > > > the current I2C message. The only point where you have to worry is to > > > > > explicitly set the flag for the last message. > > > > > > > > > > > > > > > > > > > > > > > > > - In geni_i2c_gpi_xfer() split the loop to submit messages until > > > > > > > you > > > > > > > can, then call wait_for_completion_timeout() and then > > > > > > > geni_i2c_gpi_unmap() for submitted messages, then continue > > > > > > > with > > > > > > > a new > > > > > > > portion of messages. > > > > > > Since the GPI channel supports a maximum of 64 TREs, should we > > > > > > consider > > > > > > submitting a smaller number of predefined messages — perhaps fewer > > > > > > than > > > > > > 32, such as 16? > > > > > > > > > > Why? Just submit messages until they fit, then flush the DMA async > > > > > channel. > > > > > > > > > > > This is because handling 32 messages would require one TRE for > > > > > > config > > > > > > and 64 TREs for the Go and DMA preparation steps, which exceeds the > > > > > > channel's TRE capacity of 64. > > > > > > > > > > > > We designed the approach to submit a portion of the messages — for > > > > > > example, 16 at a time. Once 8 messages are processed and freed, the > > > > > > hardware can continue processing the TREs, while the software > > > > > > simultaneously prepares the next set of TREs. This parallelism > > > > > > helps in > > > > > > efficiently utilizing the hardware and enhances overall system > > > > > > optimization. > > > > > > > > > > > > > > > And this overcomplicates the driver and introduces artificial > > > > > limitations which need explanation. Please fix it in a simple way > > > > > first. Then you can e.g. implement the watermark at the half of the > > > > > GPI channel depth and request DMA_PREP_INTERRUPT to be set in the > > > > > middle of the full sequence, allowing it to be used asynchronously in > > > > > the background. > > > > > > > > > > > > > Okay, will review it. Thanks. > > > > > > > > > > > > > > Hi Dmitry, > > > > > > Can you please check and confirm the approach to follow is something > > > like the pseudo code mentioned below: > > > > Yes, this is what I've had in mind. > > So, Apart from the changes related to "submitting I2C messages until they > fit" and "unmapping all processed I2C messages together", the rest of the > code looks remains the same as in the v6 patch ? > Also, in the GPI driver, we need to add logic to retrieve the number of > available TREs. > > I have a concern regarding throughput and achieving parallelism between > software and hardware processing with this new approach. Since we need to > unmap all processed messages together, the software cannot queue the next > set of TREs while the hardware is still processing the current ones.
Does that warrant the over-complexity of the driver or close-coupling of I2C and GPE drivers? The I2C is a slow bus and it is not expected to be used for high-throughput data. > > As I mentioned earlier, the previous approach allowed partial unmapping > where half of the messages processed by the hardware could be > freed/unmapped. This enabled the hardware to continue processing the > remaining TREs while the software simultaneously prepared the next batch. > This parallelism helped in better hardware utilization and improved overall > system performance. Measurements / values / impact? > > Could you please confirm if can go with the similar approach of unmap the > processed TREs based on a fixed threshold or constant value, instead of > unmapping them all at once? I'd still say, that's a bad idea. Please stay within the boundaries of the DMA API. > > > > > > > > GPI driver: > > > In gpi_prep_slave_sg() function, > > > > > > if (!(flags & DMA_PREP_INTERRUPT) && !gpi_available_tres(chan)) > > > return NULL; > > > > > > > > > I2C GENI driver: > > > > > > for (i = 0; i < num; i++) > > > { > > > /* Always set interrupt for the last message */ > > > if (i == num_msgs - 1) > > > flags |= DMA_PREP_INTERRUPT; > > > > > > > > > desc = dmaengine_prep_slave_single(chan, dma_addr, len, dir, flags); > > > if (!desc && !(flags & DMA_PREP_INTERRUPT)) { > > > /* Retry with interrupt if not enough TREs */ > > > flags |= DMA_PREP_INTERRUPT; > > > desc = dmaengine_prep_slave_single(chan, dma_addr, len, dir, > > > flags); > > > } > > > > > > > > > if (!desc) > > > break; > > > > > > > > > dmaengine_submit(desc); > > > msg_idx++; > > > } > > > > > > dma_async_issue_pending(chan)); > > > > > > time_left = wait_for_completion_timeout(&gi2c->done, XFER_TIMEOUT); > > > if (!time_left) > > > return -ETIMEDOUT; > > > > > > Now Invoke "geni_i2c_gpi_unmap" for unmapping all submitted I2C messages. > > > > > > > > > Thanks, > > > JyothiKumar > > > > > > > > > > > > > > -- With best wishes Dmitry