> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of Guo,
> Yejun
> Sent: February 16, 2021 18:37
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH V2 08/10] libavutil: add side data
> AVDnnBoundingBox for dnn based detect/classify filters
> 
> 
> 
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of Mark
> > Thompson
> > Sent: February 16, 2021 7:48
> > To: ffmpeg-devel@ffmpeg.org
> > Subject: Re: [FFmpeg-devel] [PATCH V2 08/10] libavutil: add side data
> > AVDnnBoundingBox for dnn based detect/classify filters
> >
> > On 11/02/2021 08:15, Guo, Yejun wrote:
> > >> -----Original Message-----
> > >> From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of
> > >> Mark Thompson
> > >> Sent: February 11, 2021 6:19
> > >> To: ffmpeg-devel@ffmpeg.org
> > >> Subject: Re: [FFmpeg-devel] [PATCH V2 08/10] libavutil: add side
> > >> data AVDnnBoundingBox for dnn based detect/classify filters
> > >>
> > >> On 10/02/2021 09:34, Guo, Yejun wrote:
> > >>> Signed-off-by: Guo, Yejun <yejun....@intel.com>
> > >>> ---
> > >>>    doc/APIchanges       |  2 ++
> > >>>    libavutil/Makefile   |  1 +
> > >>>    libavutil/dnn_bbox.h | 68 ++++++++++++++++++++++++++++++++++++++++++++
> > >>>    libavutil/frame.c    |  1 +
> > >>>    libavutil/frame.h    |  7 +++++
> > >>>    libavutil/version.h  |  2 +-
> > >>>    6 files changed, 80 insertions(+), 1 deletion(-)
> > >>>    create mode 100644 libavutil/dnn_bbox.h
> > >>
> > >> What is the intended consumer of this box information?  (Is there
> > >> some other filter which will read these and do something with them,
> > >> or some sort of user program?)
> > >>
> > >> If there is no use in ffmpeg outside libavfilter then the header
> > >> should probably be in libavfilter.
> > >
> > >
> > > Thanks for the feedback.
> > >
> > > In most cases, other filters will use this box information. For
> > > example, a classify filter can recognize the car number after the
> > > car plate is detected, another filter can apply 'beauty' if a face
> > > is detected, an updated drawbox filter (planned) can draw the box
> > > for visualization, and a new filter such as bbox_to_roi can be added
> > > to apply ROI encoding to the detected result.
> > >
> > > Others might use it too. For example, the new codec is adding AI
> > > labels, so libavcodec might need it in the future, and a user
> > > program might do something special such as:
> > > 1. use libavcodec to decode
> > > 2. use the detect filter
> > > 3. run its own code to handle the detection result
> > >
> > > As a first step, how about putting it in libavfilter (so it is not
> > > exposed at the API level and we are free to change it when needed)?
> > > We can move it to libavutil once that is required.
> >
> > Sure.
> >
> > >> How tied is this to the DNN implementation, and hence the DNN name?
> > >> If someone made a standalone filter doing object detection by some
> > >> other method, would it make sense for them to reuse this structure?
> > >
> > > Yes, this structure is general. I added the dnn prefix for two reasons:
> > > 1. There's already a bounding box in libavfilter/bbox.h (see below).
> > > It's simple and we could not reuse it, so we need a new name.
> > > typedef struct FFBoundingBox {
> > >      int x1, x2, y1, y2;
> > > } FFBoundingBox;
> >
> > Right, really this is just the return type for the internal
> > ff_calculate_bounding_box() function - if you want to reuse the name
> > externally then it would be fine to rename the existing stuff to get
> > it out of the way.
> 
> Yes, I'll consider renaming it after these patches are done, since the
> two names do not currently conflict from the compiler's perspective.
> 
> >
> > > 2. DNN is currently the dominant method for object detection.
> >
> > Unless your ID values or something else about the output are
> > DNN-specific then I'm not really seeing the attraction of associating
> > them with the DNN name for external use.  If a user wants to detect
> > some objects in an image and then do something with the result then
> > maybe they know they are using DNN for first step, but they won't care
> > about where the result came from after that.
> 
> That reminds me that we might need to save some information such as the
> model name, the name of the training data set, the name/parameters of
> other non-DNN implementations, etc., so that the user knows more about
> the bbox. For example, we can add 'char source[128]' as a box header
> for all the bboxes.
> 
> I'll think about it and send new patches for the side data and detect filter.

Hi, I'll push the first 7 patches of this patch set tomorrow if there are no
other comments on them, and then send a new patch set for the side data and
detect filter. Thanks.
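
To make the 'char source[128]' box header idea quoted above more concrete,
here is a minimal sketch of one possible layout: a single header that carries
the source description and a count, followed by the boxes in the same
allocation. Everything except the source[128] field is an assumption for
illustration. The struct names, the per-box fields and the helper functions
are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-box record; the field set is an assumption. */
typedef struct BBoxSketch {
    int x, y, w, h;     /* position and size in pixels */
    int label_id;       /* hypothetical class identifier */
    float confidence;   /* hypothetical detection score */
} BBoxSketch;

/* Hypothetical header shared by all boxes of one frame. */
typedef struct BBoxHeaderSketch {
    char source[128];   /* model name, data set, or non-DNN method */
    uint32_t nb_bboxes; /* number of entries in boxes[] */
    BBoxSketch boxes[]; /* C99 flexible array member */
} BBoxHeaderSketch;

/* Allocate a zeroed header plus nb_bboxes boxes in one block. */
static BBoxHeaderSketch *bbox_sketch_alloc(const char *source,
                                           uint32_t nb_bboxes)
{
    BBoxHeaderSketch *hdr;

    /* Reject counts whose total size would overflow size_t. */
    if (nb_bboxes > (SIZE_MAX - sizeof(*hdr)) / sizeof(BBoxSketch))
        return NULL;

    hdr = calloc(1, sizeof(*hdr) + (size_t)nb_bboxes * sizeof(BBoxSketch));
    if (!hdr)
        return NULL;

    /* snprintf truncates long names and always NUL-terminates. */
    snprintf(hdr->source, sizeof(hdr->source), "%s", source);
    hdr->nb_bboxes = nb_bboxes;
    return hdr;
}

/* Return box idx, or NULL if idx is out of range. */
static BBoxSketch *bbox_sketch_get(BBoxHeaderSketch *hdr, uint32_t idx)
{
    assert(hdr != NULL);
    if (idx >= hdr->nb_bboxes)
        return NULL;
    return &hdr->boxes[idx];
}
```

A consumer would read source once per frame to learn where the boxes came
from and then walk the boxes with bbox_sketch_get(). Keeping the header and
boxes in one allocation means the whole result can be freed with a single
free() call.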

