Re: d_len/d_buf arbitration for s32k1xx_flexcan

2023-01-04 Thread raiden00pl
Related issue: https://github.com/apache/nuttx/issues/5142.
Not using a work queue to handle CAN RX is basically wrong. Unfortunately
all NXP SocketCAN drivers are affected.

On Tue, 3 Jan 2023 at 19:38, Xiang Xiao wrote:

> Sorry, "you must do..." may have confused you. What I meant is that the
> CAN driver must do it.
>
> On Wed, Jan 4, 2023 at 2:30 AM Carlos Sanchez wrote:
>
> > Hi Xiang,
> >
> > Please note what I describe is not caused by my code using multiple
> > threads, but is happening on NuttX upstream. My code is single threaded,
> > but the s32k1xx_flexcan driver (and several other SocketCAN drivers, as
> > they all seem to be derived from the same code base) does some things on
> > the thread I call write() from, and some other things on the CANWORK
> > work queue thread.
> > My understanding is that net code is structured so work queue threads
> > are used, generally, but in SocketCAN drivers this was "waived" to avoid
> > data loss, causing the problem I describe.
> >
> > Thanks,
> >
> > Carlos
> >
> > On Tue, Jan 3, 2023 at 6:56 PM Xiang Xiao wrote:
> >
> > > Since tx/rx share the same d_len/d_buf, you must do send/recv in one
> > > and only one thread (either the system work thread or a
> > > driver-dedicated thread) to avoid the race condition you describe
> > > below.
> > >
> > > On Wed, Jan 4, 2023 at 1:45 AM Carlos Sanchez wrote:
> > >
> > > > > Hi all,
> > > > >
> > > > > I am observing a strange behavior: under a heavy-error CAN TX
> > > > > scenario (no acks, so TX always fails), usually after the second
> > > > > call to write() my writes fail. This is expected, as
> > > > > s32k1xx_flexcan has two TX mailboxes and, from my understanding of
> > > > > the code, there is no other buffering (on this specific CAN driver
> > > > > at least).
> > > > >
> > > > > However, if I enable CAN errors, and depending on runtime sync
> > > > > conditions (basically, if I put a breakpoint on s32k1xx_txpoll),
> > > > > then all the writes after the first silently fail, without really
> > > > > trying to send anything (I see in the ESR2 register that the second
> > > > > TX mailbox does not really become active).
> > > > >
> > > > > After some debugging, I have seen that the CANWORK=LPWORK thread
> > > > > has scheduled calls to s32k1xx_error_work, which overwrites
> > > > > d_buf/d_len to send the error frames in. But TX polling does not
> > > > > always happen in the CANWORK thread: it does when it comes from
> > > > > s32k1xx_txdone_work, but not when it comes from
> > > > > s32k1xx_txavail_work, which, despite the name, is called directly
> > > > > in the s32k1xx_txavail context (which is the application context).
> > > > > What is happening in my case is that
> > > > > txavail_work->...->devif_poll->can_poll sets d_buf/d_len to the
> > > > > packet to be sent, but before s32k1xx_txpoll checks it (due to my
> > > > > breakpoint), s32k1xx_error kicks in, "steals" d_buf/d_len to set up
> > > > > the error frame and calls can_input. The frame which was set up by
> > > > > the polling sequence gets silently discarded.
> > > > >
> > > > > I have tried scheduling s32k1xx_txavail_work on CANWORK, but this
> > > > > fails because can_sendmsg checks immediately for non-blocking
> > > > > writes; placing the polling in the work queue would cause all
> > > > > non-blocking writes to fail.
> > > > >
> > > > > Related to this, I also fail to see the arbitration between TX/RX
> > > > > for d_buf/d_len. From what I see, the same problem I am describing
> > > > > could be caused by s32k1xx_receive "stealing" d_buf/d_len, same as
> > > > > s32k1xx_error is doing in my case. But this is only a thought, I
> > > > > have not observed it.
> > > > >
> > > > > A possible clean solution is to use another buffer, but it is
> > > > > complex and would mean losing the direct connection between the
> > > > > write and the HW TX (which might be useful in general and it is for
> > > > > my use case). A quicker solution would be for s32k1xx_error to lock
> > > > > the network, forcing it to wait until txavail_work is done. This
> > > > > would solve my case. My second concern is more difficult to solve,
> > > > > as comments in the code explicitly say RX cannot be deferred to the
> > > > > work queue or CAN frames would be lost.
> > > > >
> > > > > Any ideas or anything I might be missing here?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Carlos
> > > >
> > > > --
> > > >
> > > > Carlos Sanchez (he, him, his)
> > > > Geotab
> > > >
> > > > Embedded Systems Developer Team Lead | Europe
> > > >
> > > > Visit
> > > >
> > > > www.geotab.com
> > > >
> > > > Twitter  | Facebook
> > > >  | YouTube
> > > >  | LinkedIn
> > > > 
> > > >
> > >
> >
> >
> > --
> >
> > Carlos Sanchez (he, him, his)
> > Geotab
> >
> > Embedded Systems Developer Team Lead | Europe
> >
> > Visit
> >
> > www.geotab.com
> >
> > Twitter  | Facebook
> >  

Re: Version when building the latest NuttX

2023-01-04 Thread Alan C. Assis
Hi Jernej,

I faced a similar issue some time ago. I don't remember all the details, but
in some special cases the script gets the wrong information. I think it
happens when we do git reset --hard to some commit from an old release.

I cloned again and that fixed it.

BR,

Alan

On Wednesday, January 4, 2023, Jernej Turnsek wrote:

> Hi,
>
> I have used the --prune and --prune-tags options again and now it seems to
> work OK. The .version file gets the proper version, although the git
> describe command still gives me version 8.2.
>
> On Tue, Jan 3, 2023 at 2:50 PM Gregory Nutt wrote:
>
> > Look at the hidden file .version. You will need to modify that file as
> > you see fit. It is not controlled under Git (but is provided with each
> > new release package).
> >
> > On 1/3/2023 2:38 AM, Jernej Turnsek wrote:
> > > Hi All,
> > >
> > > when building the latest NuttX OS from git, I am having problems with
> > > the .version file, where the wrong version is used. Running git
> > > describe, I get nuttx-8.2-12100-g56c6943311, although I have the latest
> > > code. I have updated the local repo from upstream with the --prune and
> > > --prune-tags options to get the latest tagging, but all the tags after
> > > 8.2 are on the release branches and not on the master branch, thus the
> > > describe command is not seeing them. I was also trying to manually run
> > > the version.sh script, but the make command always overrides the
> > > version. How can I get around the problem without using the release?
> > >
> > > Thanks, Jernej
> > >
> >
> >
>


RE: Re: d_len/d_buf arbitration for s32k1xx_flexcan

2023-01-04 Thread Peter van der Perk
Hi,

It seems that calling can_input directly from the IRQ has been broken since
the IOB rewrite.
Before, can_input only used dev->d_appdata, but now can_input overwrites the
dev->d_buf pointer as well.
https://github.com/apache/nuttx/blob/779a610ca3ba495640b49d6c36bce89784955e0d/net/can/can_input.c#L231
dev->d_len has always been fixed to sizeof(struct can_frame) or
sizeof(struct canfd_frame), depending on the Kconfig setting.

Once the whole IOB rewrite gets in, is mature and is fully documented, I can
take a look at the mechanism again.

Short term you could either go back to an older version of NuttX, or try to
schedule the work queue for can_input and see if you can get enough
throughput.
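
A minimal sketch of the work-queue option, written as plain C so it can be tried off-target. It assumes the interrupt handler only copies the received frame into a small ring buffer, and that a worker later pops the frames and hands them to the network stack. The names demo_frame, demo_ring, demo_isr_push and demo_worker_pop are hypothetical and not part of NuttX; in a real driver the pop side would run from the CANWORK work queue and call can_input, presumably with the network locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8u  /* Must be a power of two for the index math */

/* Hypothetical stand-in for a CAN frame; not the NuttX struct can_frame. */

struct demo_frame
{
  uint32_t id;
  uint8_t  dlc;
  uint8_t  data[8];
};

/* Single-producer (ISR) / single-consumer (worker) ring buffer.
 * Assumption: a real driver would add the barriers or critical sections
 * its platform needs; this sketch is only exercised single-threaded.
 */

struct demo_ring
{
  struct demo_frame slots[DEMO_RING_SIZE];
  unsigned int head;     /* Advanced only by the ISR side */
  unsigned int tail;     /* Advanced only by the worker side */
  unsigned int dropped;  /* Frames lost because the ring was full */
};

/* ISR side: copy the frame out of the mailbox. d_buf/d_len are not
 * touched here, so the interrupt cannot clobber a TX poll in progress.
 */

bool demo_isr_push(struct demo_ring *r, const struct demo_frame *f)
{
  assert(r != NULL && f != NULL);

  if (r->head - r->tail == DEMO_RING_SIZE)
    {
      r->dropped++;
      return false;
    }

  r->slots[r->head % DEMO_RING_SIZE] = *f;
  r->head++;
  return true;
}

/* Worker side: take the oldest frame, if any. The caller would then set
 * up the device buffer and pass the frame up the stack.
 */

bool demo_worker_pop(struct demo_ring *r, struct demo_frame *out)
{
  assert(r != NULL && out != NULL);

  if (r->tail == r->head)
    {
      return false;
    }

  *out = r->slots[r->tail % DEMO_RING_SIZE];
  r->tail++;
  return true;
}
```

If the ring fills before the worker runs, the frame is counted in dropped, which gives a direct way to check whether the throughput is sufficient.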

Yours sincerely,

Peter van der Perk


Re: Re: d_len/d_buf arbitration for s32k1xx_flexcan

2023-01-04 Thread Carlos Sanchez
Hi Peter,


> It seems that calling can_input directly from the IRQ has been broken
> since the IOB rewrite.
> Before, can_input only used dev->d_appdata, but now can_input overwrites
> the dev->d_buf pointer as well.
>
> https://github.com/apache/nuttx/blob/779a610ca3ba495640b49d6c36bce89784955e0d/net/can/can_input.c#L231
> dev->d_len has always been fixed to sizeof(struct can_frame) or
> sizeof(struct canfd_frame), depending on the Kconfig setting.

Please note dev->d_len might change depending on the timestamping setting
for each socket.

> Short term you could either go back to an older version of NuttX, or try
> to schedule the work queue for can_input and see if you can get enough
> throughput.

I have added net_lock()/net_unlock() around the s32k1xx_error() call, and
this solves the interaction between the application thread and the CANWORK
thread for error frame injection. The other _work handlers do lock the
network, so I think this just slipped by and it should have been there since
the beginning. I have found a couple of other problems in the driver; I
will create a PR with all this.
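
To illustrate the idea off-target, here is a minimal sketch in plain C. It is not the driver code: demo_dev and every demo_* function are hypothetical, and a C11 atomic_flag stands in for the network lock. The real net_lock() blocks until the lock is free; the sketch uses a try-lock instead so the exclusion can be observed from a single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared d_buf/d_len pair of the device. */

struct demo_dev
{
  const uint8_t *d_buf;
  size_t         d_len;
  atomic_flag    lock;   /* Stand-in for the network lock */
};

void demo_dev_init(struct demo_dev *dev)
{
  assert(dev != NULL);
  dev->d_buf = NULL;
  dev->d_len = 0;
  atomic_flag_clear(&dev->lock);
}

/* Returns true if the lock was free and is now held by the caller. */

bool demo_net_trylock(struct demo_dev *dev)
{
  return !atomic_flag_test_and_set(&dev->lock);
}

void demo_net_unlock(struct demo_dev *dev)
{
  atomic_flag_clear(&dev->lock);
}

/* TX path, first half: the poll sets up d_buf/d_len under the lock. */

bool demo_tx_begin(struct demo_dev *dev, const uint8_t *frame, size_t len)
{
  if (!demo_net_trylock(dev))
    {
      return false;
    }

  dev->d_buf = frame;
  dev->d_len = len;
  return true;
}

/* TX path, second half: the txpoll step consumes d_buf/d_len and only
 * then releases the lock. Returns the length that would be sent.
 */

size_t demo_tx_finish(struct demo_dev *dev)
{
  size_t len = dev->d_len;

  dev->d_buf = NULL;
  dev->d_len = 0;
  demo_net_unlock(dev);
  return len;
}

/* Error path: may reuse d_buf/d_len only while holding the lock. In this
 * sketch it gives up when the lock is taken; with a blocking lock it
 * would wait here instead.
 */

bool demo_error_work(struct demo_dev *dev, const uint8_t *errframe,
                     size_t len)
{
  if (!demo_net_trylock(dev))
    {
      return false;
    }

  dev->d_buf = errframe;
  dev->d_len = len;

  /* A real driver would pass the error frame up the stack here. */

  dev->d_buf = NULL;
  dev->d_len = 0;
  demo_net_unlock(dev);
  return true;
}
```

Calling demo_error_work() between demo_tx_begin() and demo_tx_finish() returns false and leaves the pending TX frame untouched, which is the property the lock is meant to provide.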

The interactions with interrupts are a little bit more problematic, though.

Best regards,

Carlos

-- 

Carlos Sanchez (he, him, his)
Geotab

Embedded Systems Developer Team Lead | Europe



Problem with exported lib

2023-01-04 Thread Roberto Bucher

Hi

I have a new configuration for an Olimex-ESP32-PoE board. I can compile 
and build the nuttx flash without problems.


I did a

make export

and put the generated nuttx-export folder into another project
(pysimCoder). The Makefile of the project, which works correctly with my
STM32 boards, fails to build the image for the ESP32 with this strange
message:



xtensa-esp32-elf-ld -nostdlib --gc-sections --cref 
-Map=/home/bucher/sviluppo/NUTTX/nuttx/nuttx.map -L 
/home/bucher/CACSD/pysimCoder/CodeGen/nuttx/nuttx-export/libs 
--entry=__start  -T 
/home/bucher/CACSD/pysimCoder/CodeGen/nuttx/nuttx-export/scripts/esp32_rom.ld 
-T 
/home/bucher/CACSD/pysimCoder/CodeGen/nuttx/nuttx-export/scripts/flat_memory.ld 
-T 
/home/bucher/CACSD/pysimCoder/CodeGen/nuttx/nuttx-export/scripts/legacy_sections.ld 
\

  -o ../test  \
  nuttx_main.o test.o  nuttx_main-builtintab.o 
/home/bucher/CACSD/pysimCoder/CodeGen/nuttx/lib/libpyblk.a --start-group 
-lsched -ldrivers -lboards -lc -lmm -larch -lxx -lapps -lnet -lfs 
-lbinfmt -lwireless -lboard -lboard 
/home/bucher/sviluppo/GITHUB/xtensa-esp32-elf/bin/../lib/gcc/xtensa-esp32-elf/11.2.0/libgcc.a 
--end-group
xtensa-esp32-elf-ld:/home/bucher/CACSD/pysimCoder/CodeGen/nuttx/nuttx-export/scripts/flat_memory.ld:32:
ignoring invalid character `#' in expression
xtensa-esp32-elf-ld:/home/bucher/CACSD/pysimCoder/CodeGen/nuttx/nuttx-export/scripts/flat_memory.ld:32:
syntax error

make: *** [Makefile:133: ../test] Error 1

Line 32 of the script "flat_memory.ld" simply contains the line

#include 

Hints are welcome!

Thanks in advance

Roberto