Am 23.07.2012 19:15, schrieb Benoît Thébaudeau:
On Monday 23 July 2012 15:35:25 Stefan Herbrechtsmeier wrote:
Am 20.07.2012 17:35, schrieb Benoît Thébaudeau:
On Friday 20 July 2012 17:15:13 Stefan Herbrechtsmeier wrote:
Am 20.07.2012 17:03, schrieb Benoît Thébaudeau:
On Friday 20 July 2012 16:51:33 Stefan Herbrechtsmeier wrote:
Am 20.07.2012 15:56, schrieb Benoît Thébaudeau:
Dear Marek Vasut,

On Friday 20 July 2012 15:44:01 Marek Vasut wrote:
On Friday 20 July 2012 13:37:37 Stefan Herbrechtsmeier wrote:
Am 20.07.2012 13:26, schrieb Benoît Thébaudeau:
+                       int xfr_bytes = min(left_length,
+                                           (QT_BUFFER_CNT * 4096 -
+                                            ((uint32_t)buf_ptr & 4095)) &
+                                           ~4095);
Why you align the length to 4096?
It's to guarantee that each transfer length is a multiple of
the
max packet
length. Otherwise, early short packets are issued, which
breaks
the
transfer and results in time-out error messages.
Early short packets ? What do you mean?
During a USB transfer, all packets must have a length of max
packet
length for
the pipe/endpoint, except the final one that can be a short
packet.
Without the
alignment I make for xfr_bytes, short packets can occur within
a
transfer,
because the hardware starts a new packet for each new queued
qTD
it
handles.
But if I am right, the max packet length is 512 for bulk and
1024
for
Interrupt transfer.
There are indeed different max packet lengths for different
transfer types, but
it does not matter since the chosen alignment guarantees a
multiple
of all these
possible max packet lengths.
But thereby you limit the transfer to 4 qT buffers for unaligned
transfers.
Not exactly. The 5 qt_buffers are used for page-unaligned buffers,
but that
results in only 4 full pages of unaligned data, requiring 5 aligned
pages.
Sorry I mean 4 full pages of unaligned data.
For page-aligned buffers, the 5 qt_buffers result in 5 full pages
of aligned
data.
Sure.
The unaligned case could be a little bit improved to always use as
many packets
as possible per qTD, but that would over-complicate things for a
very negligible
speed and memory gain.
In my use case (fragmented file on usb storage)  the gain would be
nearly 20%. The reason is that the data are block aligned (512) and
could be aligned to 4096 with the first transfer (5 qt_buffers).
Can you explain where this gain would come from? In both cases, the data in USB
transfers would be organized in the same way, and it would be accessed in memory
also in the same way (regarding bursts). The only difference would be the fetch
time of a little bit more qTDs, which is extremely fast and insignificant
compared to the transfer time of the payload, which remains unchanged.
You are right, the speed different will be minimal, only the memory usage will be lower.

Moreover, in your use case, if you are e.g. using FAT, on the one hand, the
buffers in fat.c are never aligned to more than the DMA min alignment, and on
the other hand, if you can align your user buffers to 512 bytes, you can also
align them directly to 4 kB.
The user buffer is aligned to 4kB, but this doesn't matter as a file load from a storage device (ex. fatload) can be segmented in partial USB transfers. This can lead to any block aligned buffer for a partial transfer.


My suggestion would be to truncate the xfr_bytes with the max
wMaxPacketSize (1024) and for the qtd_count use:

if ((uint32_t)buffer & 1023)    /* wMaxPacketSize unaligned */
      qtd_count += DIV_ROUND_UP(((uint32_t)buffer & 4095) +
              length, (QT_BUFFER_CNT - 1) * 4096);
else                /* wMaxPacketSize aligned */
      qtd_count += DIV_ROUND_UP(((uint32_t)buffer & 4095) +
              length, QT_BUFFER_CNT * 4096);

This allows 50% of unaligned block data (512) to be transferred with
min
qTDs.
That would also require a realignment-to-page stage. This is specific code for
specific buffer alignment from the upper layers. We could also skip the
realignment to page and always keep the same qTD transfer size except for the
last one, by adding as many packets as possible for the buffer alignment.
What you mean by realignment-to-page stage?

But I still don't see a significant reason to complicate code to do that.
I don't understand where you expect to complicate the code.

You limit the size of one transfer (xfr_bytes) to (QT_BUFFER_CNT - 1) * 4kB for unaligned buffers. But you only need to limit it to a multiple of the maximal possible wMaxPacketSize (1kB) to make sure that you always send full packages.

I only suggest to replace the causeless 4kB alignment with a reasonable 1kB alignment and adapte the qtd_count caluclation.

                        int xfr_bytes = min(left_length,
                                            (QT_BUFFER_CNT * 4096 -
                                             ((uint32_t)buf_ptr & 4095)) &
-                                           ~4095);
+                                           ~1023);

BTW, the 15x speed gain that I gave in my patch description was compared to an
older version of the original code that used 20 blocks per transfer in
usb_storage.c. This is now 40 blocks per transfer with a page-aligned buffer, so
the speed gain compared to the current code should be rather about 7x. I should
update that.
I'm sure that there is a significant speed gain but you shouldn't miss the heap usage as the main CONFIG_SYS_MALLOC_LEN is 128kB.

Maybe you should also add a worst case heap usage and I'm not sure, if your calculation are right, as the size of struct qTD is allays 32B and thereby I get 50kB or 64kB.

Best regards,
    Stefan

_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot

Reply via email to