On 10/02/2021 09:34, Guo, Yejun wrote:
Signed-off-by: Guo, Yejun <yejun....@intel.com>
---
doc/APIchanges | 2 ++
libavutil/Makefile | 1 +
libavutil/dnn_bbox.h | 68 ++++++++++++++++++++++++++++++++++++++++++++
libavutil/frame.c | 1 +
libavutil/frame.h | 7 +++++
libavutil/version.h | 2 +-
6 files changed, 80 insertions(+), 1 deletion(-)
create mode 100644 libavutil/dnn_bbox.h
What is the intended consumer of this box information? (Is there some other
filter which will read these and do something with them, or some sort of user
program?)
If there is no use in ffmpeg outside libavfilter then the header should
probably be in libavfilter.
How tied is this to the DNN implementation, and hence the DNN name? If someone
made a standalone filter doing object detection by some other method, would it
make sense for them to reuse this structure?
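To make the question concrete: whichever consumer it is, it would have to walk the side-data array using the counting rule from the frame.h comment in this patch (AVFrameSideData.size / self_size). Below is a minimal sketch in C that assumes the struct layout proposed below. The local AVRational and AVDnnBoundingBox definitions only mirror the real ones so the example compiles standalone, and bbox_count()/bbox_get() are hypothetical helper names, not existing FFmpeg API. Stepping by self_size rather than sizeof() is presumably what would let an older consumer read entries written by a producer built against a larger, newer struct.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins so this compiles on its own; real code would include
 * libavutil/rational.h and the proposed libavutil/dnn_bbox.h instead. */
typedef struct AVRational { int num, den; } AVRational;

#define AV_NUM_BBOX_CLASSIFY 4
typedef struct AVDnnBoundingBox {
    uint32_t self_size;
    int model_input_width;
    int model_input_height;
    int top, left, bottom, right;
    int detect_label;
    AVRational detect_conf;
    uint32_t classify_count;
    int classify_labels[AV_NUM_BBOX_CLASSIFY];
    AVRational classify_confs[AV_NUM_BBOX_CLASSIFY];
} AVDnnBoundingBox;

/* Hypothetical helper: number of boxes in a side-data buffer of `size`
 * bytes, computed as size / self_size. Returns 0 if the buffer is too
 * short, or if self_size is 0 or larger than the buffer. */
static size_t bbox_count(const uint8_t *data, size_t size)
{
    uint32_t self_size;

    if (size < sizeof(self_size))
        return 0;
    memcpy(&self_size, data, sizeof(self_size)); /* avoids unaligned reads */
    if (self_size == 0 || self_size > size)
        return 0;
    return size / self_size;
}

/* Hypothetical helper: copy entry `i` into *out. Entries are self_size
 * bytes apart; only min(self_size, sizeof(*out)) bytes are copied, and any
 * fields the producer did not write are left zeroed. Returns 0 or -1. */
static int bbox_get(const uint8_t *data, size_t size, size_t i,
                    AVDnnBoundingBox *out)
{
    uint32_t self_size;
    size_t n;

    if (i >= bbox_count(data, size))
        return -1;
    memcpy(&self_size, data, sizeof(self_size));
    n = self_size < sizeof(*out) ? self_size : sizeof(*out);
    memset(out, 0, sizeof(*out));
    memcpy(out, data + i * self_size, n);
    return 0;
}
```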
diff --git a/libavutil/dnn_bbox.h b/libavutil/dnn_bbox.h
new file mode 100644
index 0000000000..50899c4486
--- /dev/null
+++ b/libavutil/dnn_bbox.h
@@ -0,0 +1,68 @@
+/*
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVUTIL_DNN_BBOX_H
+#define AVUTIL_DNN_BBOX_H
+
+#include "rational.h"
+
+typedef struct AVDnnBoundingBox {
+ /**
+ * Must be set to the size of this data structure (that is,
+ * sizeof(AVDnnBoundingBox)).
+ */
+ uint32_t self_size;
+
+ /**
+ * Object detection is usually applied to a smaller image that
+ * is scaled down from the original frame.
+ * model_input_width and model_input_height are the dimensions of the scaled image, in pixels.
+ */
+ int model_input_width;
+ int model_input_height;
Other than to interpret the distances below, what will the user do with this
information? (Alternatively: why not map the distances back onto the original
frame size?)
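For comparison, mapping the box back onto the original frame would be a per-axis rescale that the producer could do once. Below is a minimal sketch that assumes the model input was a plain resize of the whole frame; cropping or letterboxing would need an extra offset. Rect, bbox_rescale() and bbox_to_frame() are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Edges of a box, in the same order as the patch's top/left/bottom/right. */
typedef struct Rect {
    int top, left, bottom, right;
} Rect;

/* Hypothetical helper: rescale one coordinate from an axis of `model_dim`
 * pixels to an axis of `frame_dim` pixels, rounding to nearest (assumes
 * non-negative coordinates). The 64-bit intermediate avoids overflow. */
static int bbox_rescale(int coord, int model_dim, int frame_dim)
{
    if (model_dim <= 0)
        return 0;
    return (int)(((int64_t)coord * frame_dim + model_dim / 2) / model_dim);
}

/* Hypothetical helper: map a box measured on the model input image onto
 * the original frame. top/bottom scale with height, left/right with width. */
static Rect bbox_to_frame(Rect r, int model_w, int model_h,
                          int frame_w, int frame_h)
{
    Rect out;

    out.top    = bbox_rescale(r.top,    model_h, frame_h);
    out.bottom = bbox_rescale(r.bottom, model_h, frame_h);
    out.left   = bbox_rescale(r.left,   model_w, frame_w);
    out.right  = bbox_rescale(r.right,  model_w, frame_w);
    return out;
}
```

For example, a box with top=150, left=75, bottom=300, right=150 on a 300x300 model input maps to top=540, left=480, bottom=1080, right=960 on a 1920x1080 frame.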
+
+ /**
+ * Distance in pixels from the top edge of the scaled image to top
+ * and bottom, and from the left edge of the scaled image to left and
+ * right, defining the bounding box.
+ */
+ int top;
+ int left;
+ int bottom;
+ int right;
+
+ /**
+ * Detection result
+ */
+ int detect_label;
How does a user interpret this label? Is it from some known enum?
+ AVRational detect_conf;
"conf"... idence? A longer name and a descriptive comment might help.
+
+ /**
+ * At most 4 classifications based on the detected bounding box.
+ * For example, we can get at most 4 different attributes with 4 different
+ * DNN models on one bounding box.
+ * classify_count is zero if there is no classification.
+ */
+#define AV_NUM_BBOX_CLASSIFY 4
+ uint32_t classify_count;
+ int classify_labels[AV_NUM_BBOX_CLASSIFY];
+ AVRational classify_confs[AV_NUM_BBOX_CLASSIFY];
Same comment on these.
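On the confidence fields: if they stay as AVRational, a consumer will mostly compare them against a threshold or pick the best classification. Below is a minimal sketch of that. It assumes confidences are values in [0, 1] with positive denominators, which the patch does not actually state. conf_at_least() and best_classify_index() are hypothetical names. In real code libavutil's av_cmp_q() would normally do the comparison; it is reimplemented here in simplified form so the sketch is standalone. classify_count is clamped to AV_NUM_BBOX_CLASSIFY so that a bad count cannot index past the arrays.

```c
#include <stdint.h>

/* Local stand-ins so this compiles on its own (see libavutil/rational.h
 * and the proposed libavutil/dnn_bbox.h for the real definitions). */
typedef struct AVRational { int num, den; } AVRational;
#define AV_NUM_BBOX_CLASSIFY 4

/* Only the classification fields of the proposed struct, for brevity. */
typedef struct BBoxClassify {
    uint32_t classify_count;
    int classify_labels[AV_NUM_BBOX_CLASSIFY];
    AVRational classify_confs[AV_NUM_BBOX_CLASSIFY];
} BBoxClassify;

/* Returns nonzero if a >= b. Assumes positive denominators; the 64-bit
 * products cannot overflow for int numerators and denominators. */
static int conf_at_least(AVRational a, AVRational b)
{
    return (int64_t)a.num * b.den >= (int64_t)b.num * a.den;
}

/* Index of the most confident classification, or -1 if there is none.
 * Ties keep the earlier entry. */
static int best_classify_index(const BBoxClassify *c)
{
    uint32_t n = c->classify_count < AV_NUM_BBOX_CLASSIFY
                 ? c->classify_count : AV_NUM_BBOX_CLASSIFY;
    int best = -1;

    for (uint32_t i = 0; i < n; i++) {
        /* Replace the current best only if entry i is strictly larger. */
        if (best < 0 || !conf_at_least(c->classify_confs[best],
                                       c->classify_confs[i]))
            best = (int)i;
    }
    return best;
}
```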
+} AVDnnBoundingBox;
+
+#endif
diff --git a/libavutil/frame.c b/libavutil/frame.c
index eab51b6a32..4308507827 100644
--- a/libavutil/frame.c
+++ b/libavutil/frame.c
@@ -852,6 +852,7 @@ const char *av_frame_side_data_name(enum AVFrameSideDataType type)
case AV_FRAME_DATA_VIDEO_ENC_PARAMS: return "Video encoding parameters";
case AV_FRAME_DATA_SEI_UNREGISTERED: return "H.26[45] User Data Unregistered SEI message";
case AV_FRAME_DATA_FILM_GRAIN_PARAMS: return "Film grain parameters";
+ case AV_FRAME_DATA_DNN_BBOXES: return "DNN bounding boxes";
}
return NULL;
}
diff --git a/libavutil/frame.h b/libavutil/frame.h
index 1aeafef6de..a4dcfd27c9 100644
--- a/libavutil/frame.h
+++ b/libavutil/frame.h
@@ -198,6 +198,13 @@ enum AVFrameSideDataType {
* Must be present for every frame which should have film grain applied.
*/
AV_FRAME_DATA_FILM_GRAIN_PARAMS,
+
+ /**
+ * Bounding boxes generated by DNN-based filters for object detection and classification.
+ * The data is an array of AVDnnBoundingBox; the number of array elements is implied by
+ * AVFrameSideData.size / AVDnnBoundingBox.self_size.
+ */
+ AV_FRAME_DATA_DNN_BBOXES,
};
enum AVActiveFormatDescription {
- Mark
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel