ST-E STM Driver Review

Deao, Douglas Wed, 25 May 2011 14:55:29 -0700

Sorry it took a while to get back to you guys. I was visiting customers last 
week. Most of my comments are just highlighting the differences between TI's 
STM 1.0 driver and ST-E's STM 1.0 driver, but there are a few questions, 
observations and suggestions. At the end I included some discussion on TI's 
meta data and OST header requirements.


I have not had a chance to look at your actual implementation yet. Did you do 
anything to abstract the actual HW transport ports and control registers from 
the higher level driver functions?

I realize there is a lot here to work through so if you would rather schedule a 
conference call to talk through the differences I can do that. I would like to 
start work on a Linaro (Unified) STM Spec next week if I can get feedback from 
everybody over the next few days. I will be out of the office on 5/27 and 5/31.

I am especially interested in details of what you guys have in mind for a 
"common trace framework to receive STM drivers". If by framework you mean well 
defined APIs that are implemented for specific devices, then I think we are in 
agreement. What Michael and I have talked about is a common STM user mode 
experience across all Linaro supported devices, making Linux user mode code 
100% portable between our devices.  

ST-E STM Driver stm-trace.txt review:

1. Software Overview

In your "Software Overview" it states:

"The end of data packet is marked by a time stamp on latest byte(s) only."

I assume that user messages can be made up of any number of bytes, half-words, 
words or longs (whatever is most efficient) and you simply terminate the last 
element of the message with a time-stamp, right? 

In the TI STM implementation a message can be any number and combination of 
bytes, half-words, or word transfers terminated with a time-stamp on the last 
element. In addition to that we also add an OST header to a message. (See below 
for discussion on OST header).


2. Lossless/Lossy modes.

TI supports only lossless mode for SW-generated messages, and this is enforced 
in our HW implementation. Lossy mode is reserved for true HW messages.

I did not notice that you documented a way to modify this through the debugfs 
API or IOCTLS.

I am kind of thinking that may be ok since this is really a hw configuration 
choice in your case, but in the TI case the user does not get to make that 
choice.

3. Channel Assignment

TI makes the assignment with mknod using the minor number to assign a fixed 
channel. This allows the user mode application to overload the channel usage 
for categorizing data (not my idea). I think we see the error of our ways here 
and will be ok with a dynamic channel allocation.  

I am thinking that for each unique pid a channel should be assigned when the 
device is opened. I would guess you are keeping a channel table around and 
write() just checks the table for a pid assignment, using the first free 
channel if none is found (no time to look at your implementation yet). If you 
moved this function back to open() then you could do the IOCTL 
STM_GET_CHANNEL_NO anytime, not just after the first write.

In write how do you flag an error if you exhaust the number of available 
channels?
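
To make the open-time assignment concrete, here is a minimal user-space sketch 
of a pid-indexed channel table. The names (stm_chan_get, stm_chan_put, 
STM_NUM_CHANNELS) and the choice of -ENOSPC for exhaustion are my assumptions 
for illustration, not taken from either driver, and a real driver would also 
need locking around the table.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

#define STM_NUM_CHANNELS 4   /* hypothetical; the real count is HW specific */

/* owner[i] is the pid holding channel i, or 0 when the channel is free. */
static pid_t owner[STM_NUM_CHANNELS];

/* Would be called from open(): return the channel already assigned to pid,
 * else claim the first free one. Returns -ENOSPC when all are in use. */
static int stm_chan_get(pid_t pid)
{
    int free_ch = -1;

    assert(pid > 0);
    for (int i = 0; i < STM_NUM_CHANNELS; i++) {
        if (owner[i] == pid)
            return i;
        if (owner[i] == 0 && free_ch < 0)
            free_ch = i;
    }
    if (free_ch < 0)
        return -ENOSPC;
    owner[free_ch] = pid;
    return free_ch;
}

/* Would be called from release(): free the channel held by pid, if any. */
static void stm_chan_put(pid_t pid)
{
    for (int i = 0; i < STM_NUM_CHANNELS; i++)
        if (owner[i] == pid)
            owner[i] = 0;
}
```

With the assignment done in open(), the error surfaces there as a failed open 
rather than in the middle of a write.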


4. Kernel API

TI does not support a Kernel API (yet). I can see that the Alloc/Free and File 
IO type functions are useful and should be standard. 

Not sure what you mean by "lockless" trace functions? 

It looks like your "low level atomic trace functions for 1, 2, 4 or 8 bytes" 
are similar to TI's binary library functions (not supported by the TI STM 
Driver). This is what we use the OST header for: it allows our tool chain to 
differentiate between different message formats, rather than just assuming the 
data is a simple stream of bytes. 

5. Debugfs APIs.

TI used a different approach. The tool-chain on the host provides all the 
transport setup through JTAG, so our driver does not support setting up the 
actual STM data export (number of pins and clock rate). In our case device 
transport parameters must match the host receiver's collection setup. 

With your approach the user can change the clock rate and export pin width 
effectively at any time. Our tools actually go through a calibration process 
during initialization so any changes to the device's transport setup (clock 
rate, number of pins data exported on) would cause the TI tool chain a lot of 
grief. 

There are some parameters we know we need to add (like master enables). These 
are currently also handled by the host tools. TI's STM module allows up to 4 SW 
masters to be enabled (with id masks that can be used to enable multiple 
masters from the same group) and 4 HW masters that can be enabled at the same 
time as the SW masters. If the user tries to enable more than the HW allows do 
you have a mechanism to flag an error?

I don't have a lot of experience with debugfs but I am assuming it's primarily 
used for allowing scripts to configure a driver (like in your example) or 
extract information. 

We may want to define a standard set of debugfs options whose implementation is 
vendor specific. But that raises some questions:

- How do we deal with options that don't make sense for a specific vendor?
  Maybe just doing nothing is acceptable, or do we want to provide a
  discovery mechanism?
- Would user scripts then also be vendor specific?
  We should probably make an effort to avoid this. A discovery mechanism may
  allow user mode scripts to be generic.
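
One possible shape for such a discovery mechanism is a capability bitmask that 
each vendor driver advertises. The sketch below is only an illustration: the 
option names, struct stm_vendor and the use of -EOPNOTSUPP are all 
hypothetical, not part of any existing driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical option identifiers for a common debugfs option set. */
enum stm_opt {
    STM_OPT_CLOCK_RATE = 0,
    STM_OPT_PIN_WIDTH  = 1,
    STM_OPT_MASTER_EN  = 2,
    STM_OPT_COUNT
};

/* Each vendor driver advertises the options it implements as a bitmask. */
struct stm_vendor {
    uint32_t caps;
};

/* Discovery: 1 if the vendor implements opt, 0 otherwise. */
static int stm_opt_supported(const struct stm_vendor *v, enum stm_opt opt)
{
    assert(opt < STM_OPT_COUNT);
    return (int)((v->caps >> opt) & 1u);
}

/* Generic entry point: unsupported options fail with -EOPNOTSUPP instead of
 * being silently ignored, so a script can tell the difference. */
static int stm_opt_set(const struct stm_vendor *v, enum stm_opt opt,
                       uint32_t val)
{
    (void)val;   /* a vendor hook would consume the value here */
    if (!stm_opt_supported(v, opt))
        return -EOPNOTSUPP;
    return 0;
}
```

A generic script could then query what is supported first and skip the options 
a given device does not implement.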

6. Mapped Channels

I believe the TI hw transport channel mapping is compatible. In the TI case a 
channel is mapped into two spaces, the first half is for non-timestamp 
transfers and the second half is for time-stamped transfers. When we write a 
message (from a user mode write call for example) we simply write all the data 
except the last element through the non-timestamp port, and then the last 
element is written to the time-stamped port. So I think we could be compatible 
here. 
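
As a rough sketch of that write sequence (struct stm_chan and stm_write_words 
are hypothetical names, and the two ports are modelled as single registers 
rather than the real mapped address ranges):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one mapped channel: the first half takes plain
 * transfers, the second half takes time-stamped transfers. */
struct stm_chan {
    volatile uint32_t plain;   /* non-timestamp port */
    volatile uint32_t ts;      /* time-stamped port  */
};

/* Write a message of n 32-bit words: every element except the last goes
 * through the plain port, and the last one closes the message with a
 * time-stamp. Returns the number of time-stamped writes (0 or 1). */
static int stm_write_words(struct stm_chan *ch, const uint32_t *msg, size_t n)
{
    assert(ch != NULL);
    if (n == 0)
        return 0;              /* nothing to send, no time-stamp */
    for (size_t i = 0; i + 1 < n; i++)
        ch->plain = msg[i];
    ch->ts = msg[n - 1];
    return 1;
}
```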

With that said I am not sure about exposing all channels to a user mode 
library. You are relying on the library to use the convention of getting a free 
channel from the driver to make sure there are no conflicts. If the channel 
assignment is made when you open the device, you could conceivably map just the 
address space needed for the single channel, thus eliminating the need to get a 
free channel from the driver. In the TI case a single channel's transport 
mapping is 4K bytes, which matches the typical PAGE_SIZE. I realize not all hw 
implementations will match up with the PAGE_SIZE, which may be why you simply 
map all the channels back to user space.

Since free channels can become busy rapidly, maybe a better convention would be 
to simply use another device node if the user wants the library STM data to be 
transmitted on a different STM channel than the current process. This may be a 
case where providing a mechanism (see meta data discussion below) to allow 
channels to be named for the toolchain may be a good idea (provide task name 
and process id). 

7. 8-byte Writes

TI does not support 64-bit writes with our STM 1.0 module. We may need an IOCTL 
to get the largest transfer supported for the mmap case. For all other cases 
this should just be hidden in the device dependent code.
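
For the mmap case, a library could use the reported maximum to split a buffer 
into transfers. The sketch below assumes a hypothetical "get max transfer" 
IOCTL that returns 1, 2, 4 or 8 (for example 4 where 64-bit writes are not 
available); the function names are mine and alignment handling is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Pick the size of the next transfer: the largest power-of-two width that
 * fits in the remaining length and does not exceed max_xfer. */
static size_t stm_next_xfer(size_t remaining, size_t max_xfer)
{
    assert(max_xfer == 1 || max_xfer == 2 || max_xfer == 4 || max_xfer == 8);
    size_t w = max_xfer;
    while (w > remaining)
        w >>= 1;
    return w;   /* 0 only when remaining == 0 */
}

/* Count how many transfers a buffer of len bytes needs. */
static size_t stm_count_xfers(size_t len, size_t max_xfer)
{
    size_t n = 0;
    while (len > 0) {
        len -= stm_next_xfer(len, max_xfer);
        n++;
    }
    return n;
}
```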

8. Kernel Internal Usage

I like the idea of having dedicated support in the driver for common kernel 
logging. Any ideas on how you would support kernel STM channel assignments 
without hard-coding?

We may need a mechanism to communicate the definition of each hard-coded 
channel to our tools.


The following are TI specific:

9. Data protection

In SMP systems if the processor is switched a new master is generated (in some 
TI devices). So we protect the data with a mutex to guarantee a complete 
message is generated by the same master.

10. Meta Data

Our user mode HW libraries use meta data to transport data needed to process 
the HW profiling STM messages. Items like processor speed, sampling rate, 
processing options, ... (just a predefined byte buffer our tool-chain 
understands). The meta data is currently broadcast on a dedicated channel 
(255), which conflicts with your hard-coded channel for logging printk output. 
So we will need to resolve hard-coded conflicts.

We need the driver to support registration and transport of the meta data on 
demand from the library (when the HW master is disabled, in case the collection 
buffer is small and circular).

I am thinking an IOCTL could be used to register meta data and then the data 
simply broadcast on a STM channel (will need to figure out which one) when the 
HW master is enabled and disabled.
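
A minimal sketch of that idea, with hypothetical names throughout 
(stm_meta_register, stm_hw_master_changed, STM_META_MAX) and a counter standing 
in for the actual write to an STM channel:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define STM_META_MAX 32          /* hypothetical size limit */

static unsigned char meta[STM_META_MAX];
static size_t meta_len;
static int meta_sent;            /* counts broadcasts, stands in for the port */

/* What a "register meta data" IOCTL handler might do: keep a copy. */
static int stm_meta_register(const void *buf, size_t len)
{
    if (buf == NULL || len == 0 || len > STM_META_MAX)
        return -EINVAL;
    memcpy(meta, buf, len);
    meta_len = len;
    return 0;
}

/* Would be called on every HW master enable or disable; rebroadcasts the
 * registered copy if there is one. */
static void stm_hw_master_changed(void)
{
    assert(meta_len <= STM_META_MAX);
    if (meta_len > 0)
        meta_sent++;             /* real code would write meta[] to a channel */
}
```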

Meta data transmission is problematic for circular buffers (like ETBs), which 
is the reason for also sending meta data when a HW master is disabled. SW 
masters are not typically disabled, and our HW does not provide a transmission 
byte count (remember there are HW messages also being generated in the TI 
case). So there is no way for the driver to tell when the recording buffer will 
wrap, even if the user told us the buffer size. I am thinking the best solution 
would be to force the user to gracefully disable the channel to get any SW 
channel meta data provided by the driver. 

TI supports three cases of data capture:
- DTC/Host collection (stop on buffer full)
- DTC/Host collection (circular buffer)
- ETB/on-chip collection (circular buffer)

Or, if the user is at a point in their code where they know they will stop 
recording on the host or ETB, we provide an IOCTL that simply disables all 
channels.

In the ETB case we may want to simply disable any open STM channels when the 
user decides to stop recording as a fail safe mechanism. 

Note: Periodic transmission of meta data into a small circular buffer will not 
work well. In cases where the data is sparse the buffer will simply be filled 
with meta data rather than useful data.
      

11. OST Headers

Adding an OST header to each message is a requirement for compatibility with 
TI's toolchain. There are a couple of ways to approach this:

Completely hidden from the user - The device specific code will know if the 
header is necessary. On a write, prior to the copy from user space, the device 
independent code would have to make a call to get a properly sized memory 
buffer from the device dependent code that would include the header.

User enabled - Provide an IOCTL that allows the user to put the driver in a 
tool-chain specific mode (like add OST headers).
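
To illustrate the "completely hidden" option, the device independent write 
path could ask the device dependent code how much header space it needs. 
Everything in this sketch is an assumption: struct stm_dev_ops, 
stm_build_msg, and in particular the 4-byte placeholder header, which is not 
the real OST header layout.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Device-dependent hooks. hdr_len is 0 for devices that need no header. */
struct stm_dev_ops {
    size_t hdr_len;
    void (*fill_hdr)(unsigned char *hdr, size_t payload_len);
};

/* Placeholder header: 4 bytes, last byte carries the payload length.
 * This is NOT the real OST format, only a stand-in for the sketch. */
static void fake_ost_fill(unsigned char *hdr, size_t payload_len)
{
    memset(hdr, 0, 4);
    hdr[3] = (unsigned char)payload_len;
}

/* Device-independent write path: size the buffer for header plus payload,
 * let the device code fill the header, then copy the user data after it.
 * Returns a malloc'd buffer (caller frees) and stores its total length. */
static unsigned char *stm_build_msg(const struct stm_dev_ops *ops,
                                    const void *data, size_t len,
                                    size_t *total)
{
    assert(ops != NULL && total != NULL);
    unsigned char *buf = malloc(ops->hdr_len + len);
    if (buf == NULL)
        return NULL;
    if (ops->hdr_len > 0)
        ops->fill_hdr(buf, len);
    memcpy(buf + ops->hdr_len, data, len);
    *total = ops->hdr_len + len;
    return buf;
}
```

The "user enabled" option would then amount to an IOCTL that switches between 
an ops table with a header and one without.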


Regards,
Doug Deao


________________________________________
From: Philippe Langlais [mailto:philippe.langl...@linaro.org] 
Sent: Wednesday, May 04, 2011 3:08 AM
To: Deao, Douglas
Cc: Linus Walleij
Subject: Re: STM at UDS-Budapest

Hi Doug,

On STE ux500 platforms we have the same STM module (following MIPI STP 1.0). I 
have already posted our current implementation to the LKML and Linaro ML; it's 
very similar to your proposal.
I can't be present at the Linaro summit but Linus Walleij can replace me for 
this topic. He proposes to write a common trace framework to receive STM 
drivers.
Attached is all our current proposal and work around STM.

Regards
Philippe Langlais
ST-Ericsson

On 3 May 2011 00:42, Deao, Douglas <d-d...@ti.com> wrote:
I am hosting an introductory session on System Trace at the summit. TI's System 
Trace Module (STM) provides a common protocol for instrumentation messages 
across multiple cores and system level hardware profiling in complex SoCs. 
Attached is a whitepaper for background reading. 
 
Looking forward to meeting you at the summit.
 
Regards,
Doug Deao
Texas Instruments
 

_______________________________________________
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

