Re: [FFmpeg-devel] [PATCH V2 08/10] libavutil: add side data AVDnnBoundingBox for dnn based detect/classify filters

Mark Thompson Wed, 10 Feb 2021 14:19:39 -0800

On 10/02/2021 09:34, Guo, Yejun wrote:

Signed-off-by: Guo, Yejun <yejun....@intel.com>
---
  doc/APIchanges       |  2 ++
  libavutil/Makefile   |  1 +
  libavutil/dnn_bbox.h | 68 ++++++++++++++++++++++++++++++++++++++++++++
  libavutil/frame.c    |  1 +
  libavutil/frame.h    |  7 +++++
  libavutil/version.h  |  2 +-
  6 files changed, 80 insertions(+), 1 deletion(-)
  create mode 100644 libavutil/dnn_bbox.h


What is the intended consumer of this box information?  (Is there some other 
filter which will read these and do something with them, or some sort of user 
program?)

If there is no use in ffmpeg outside libavfilter then the header should 
probably be in libavfilter.

How tied is this to the DNN implementation, and hence the DNN name?  If someone 
made a standalone filter doing object detection by some other method, would it 
make sense for them to reuse this structure?

diff --git a/libavutil/dnn_bbox.h b/libavutil/dnn_bbox.h
new file mode 100644
index 0000000000..50899c4486
--- /dev/null
+++ b/libavutil/dnn_bbox.h
@@ -0,0 +1,68 @@
+/*
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVUTIL_DNN_BBOX_H
+#define AVUTIL_DNN_BBOX_H
+
+#include "rational.h"
+
+typedef struct AVDnnBoundingBox {
+    /**
+     * Must be set to the size of this data structure (that is,
+     * sizeof(AVDnnBoundingBox)).
+     */
+    uint32_t self_size;
+
+    /**
+     * Object detection is usually applied to a smaller image that
+     * is scaled down from the original frame.
+     * width and height are attributes of the scaled image, in pixel.
+     */
+    int model_input_width;
+    int model_input_height;


Other than to interpret the distances below, what will the user do with this 
information?  (Alternatively: why not map the distances back onto the original 
frame size?)
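
To illustrate the alternative, here is a minimal sketch of mapping a box from model-input coordinates back onto the original frame, so that a consumer would not need model_input_width/height at all.  Box, rescale() and map_box_to_frame() are hypothetical names for illustration only, not part of the patch, and the sketch assumes the model input is a plain stretch of the frame (no letterboxing or cropping).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the four edge fields in the patch. */
typedef struct Box {
    int top, left, bottom, right;
} Box;

/* Rescale one coordinate from the model-input axis to the frame axis,
 * rounding to the nearest pixel. 64-bit intermediate avoids overflow. */
static int rescale(int v, int model_dim, int frame_dim)
{
    assert(model_dim > 0);
    return (int)(((int64_t)v * frame_dim + model_dim / 2) / model_dim);
}

/* Map a box expressed in model-input pixels onto the original frame. */
static Box map_box_to_frame(Box b, int model_w, int model_h,
                            int frame_w, int frame_h)
{
    Box out;
    out.top    = rescale(b.top,    model_h, frame_h);
    out.bottom = rescale(b.bottom, model_h, frame_h);
    out.left   = rescale(b.left,   model_w, frame_w);
    out.right  = rescale(b.right,  model_w, frame_w);
    return out;
}
```

If the filter did this once when attaching the side data, every downstream reader would get frame coordinates directly instead of repeating the conversion.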

+
+    /**
+     * Distance in pixels from the top edge of the scaled image to top
+     * and bottom, and from the left edge of the scaled image to left and
+     * right, defining the bounding box.
+     */
+    int top;
+    int left;
+    int bottom;
+    int right;
+
+    /**
+     * Detect result
+     */
+    int detect_label;


How does a user interpret this label?  Is it from some known enum?

+    AVRational detect_conf;


"conf"... idence?  A longer name and a descriptive comment might help.

+
+    /**
+     * At most 4 classifications based on the detected bounding box.
+     * For example, we can get max 4 different attributes with 4 different
+     * DNN models on one bounding box.
+     * classify_count is zero if no classification.
+     */
+#define AV_NUM_BBOX_CLASSIFY 4
+    uint32_t classify_count;
+    int classify_labels[AV_NUM_BBOX_CLASSIFY];
+    AVRational classify_confs[AV_NUM_BBOX_CLASSIFY];


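The same comments apply to these: how are the labels interpreted, and "confs" needs a longer name and a descriptive comment.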

+} AVDnnBoundingBox;
+
+#endif
diff --git a/libavutil/frame.c b/libavutil/frame.c
index eab51b6a32..4308507827 100644
--- a/libavutil/frame.c
+++ b/libavutil/frame.c
@@ -852,6 +852,7 @@ const char *av_frame_side_data_name(enum AVFrameSideDataType type)
      case AV_FRAME_DATA_VIDEO_ENC_PARAMS:            return "Video encoding parameters";
      case AV_FRAME_DATA_SEI_UNREGISTERED:            return "H.26[45] User Data Unregistered SEI message";
      case AV_FRAME_DATA_FILM_GRAIN_PARAMS:           return "Film grain parameters";
+    case AV_FRAME_DATA_DNN_BBOXES:                  return "DNN bounding boxes";
      }
      return NULL;
  }
diff --git a/libavutil/frame.h b/libavutil/frame.h
index 1aeafef6de..a4dcfd27c9 100644
--- a/libavutil/frame.h
+++ b/libavutil/frame.h
@@ -198,6 +198,13 @@ enum AVFrameSideDataType {
       * Must be present for every frame which should have film grain applied.
       */
      AV_FRAME_DATA_FILM_GRAIN_PARAMS,
+
+    /**
+     * Bounding box generated by dnn based filters for object detection and classification,
+     * the data is an array of AVDnnBoundingBox, the number of array element is implied by
+     * AVFrameSideData.size / AVDnnBoundingBox.self_size.
+     */
+    AV_FRAME_DATA_DNN_BBOXES,
  };

enum AVActiveFormatDescription {
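
For reference, this is roughly how I would expect a consumer to walk the array described in that comment, using self_size as the stride rather than sizeof() so that a reader built against an older, smaller struct still works.  It is only a sketch: BoxV1, bbox_count() and bbox_get() are hypothetical names, BoxV1 is a cut-down stand-in for AVDnnBoundingBox, and it assumes new fields are only ever appended at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for AVDnnBoundingBox: only the fields
 * this consumer knows about. A newer producer may have appended more. */
typedef struct BoxV1 {
    uint32_t self_size;
    int top, left, bottom, right;
} BoxV1;

/* Number of elements in a side-data buffer, or 0 if it looks malformed.
 * Mirrors the "size / self_size" rule from the patch comment. */
static size_t bbox_count(const uint8_t *data, size_t size)
{
    uint32_t self_size;

    if (!data || size < sizeof(self_size))
        return 0;
    memcpy(&self_size, data, sizeof(self_size));
    if (self_size < sizeof(BoxV1) || size % self_size)
        return 0;
    return size / self_size;
}

/* Copy the known prefix of element idx into *out.
 * Returns 0 on success, -1 if idx is out of range or the buffer is bad. */
static int bbox_get(const uint8_t *data, size_t size, size_t idx, BoxV1 *out)
{
    uint32_t self_size;

    assert(out != NULL);
    if (idx >= bbox_count(data, size))
        return -1;
    memcpy(&self_size, data, sizeof(self_size));
    /* Stride by the producer's element size, not by sizeof(BoxV1). */
    memcpy(out, data + idx * self_size, sizeof(*out));
    return 0;
}
```

Every reader has to get this stride logic right, which is part of why the intended consumer matters.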

- Mark

Re: [FFmpeg-devel] [PATCH V2 08/10] libavutil: add side data AVDnnBoundingBox for dnn based detect/classify filters
