Apr 9, 2021, 17:10 by yejun....@intel.com: > > >> -----Original Message----- >> From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of >> Guo, Yejun >> Sent: 2021年4月9日 20:57 >> To: FFmpeg development discussions and patches >> <ffmpeg-devel@ffmpeg.org> >> Subject: Re: [FFmpeg-devel] [PATCH V7 4/6] lavu: add side data >> AV_FRAME_DATA_BOUNDING_BOXES >> >> >> >> > -----Original Message----- >> > From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of >> > Lynne >> > Sent: 2021年4月9日 18:03 >> > To: FFmpeg development discussions and patches >> > <ffmpeg-devel@ffmpeg.org> >> > Subject: Re: [FFmpeg-devel] [PATCH V7 4/6] lavu: add side data >> > AV_FRAME_DATA_BOUNDING_BOXES >> > >> > Apr 9, 2021, 06:12 by yejun....@intel.com: >> > >> > > >> > > >> > >> -----Original Message----- >> > >> From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf >> Of >> > Lynne >> > >> Sent: 2021年4月9日 0:57 >> > >> To: FFmpeg development discussions and patches >> > <ffmpeg-devel@ffmpeg.org> >> > >> Subject: Re: [FFmpeg-devel] [PATCH V7 4/6] lavu: add side data >> > >> AV_FRAME_DATA_BOUNDING_BOXES >> > >> >> > > >> > > First of all, thanks for the quick replies, I see, all the >> > discussions/comments are to >> > > make this patch better, thank you. >> > > >> > >> >> > >> > >> >> >> >> > + >> > >> >> >> >> > +typedef struct AVBoundingBoxHeader { >> > >> >> >> >> > + /** >> > >> >> >> >> > + * Information about how the bounding box is >> > generated. >> > >> >> >> >> > + * for example, the DNN model name. >> > >> >> >> >> > + */ >> > >> >> >> >> > + char source[128]; >> > >> >> >> >> > + >> > >> >> >> >> > + /** >> > >> >> >> >> > + * The size of frame when it is detected. >> > >> >> >> >> > + */ >> > >> >> >> >> > + int frame_width; >> > >> >> >> >> > + int frame_height; >> > >> >> >> >> > >> > >> >> >> >> >> > >> >> >> >> Why? This side data is attached to AVFrames only, where we >> > >> >> >> >> already have width and height. >> > >> >> >> >> >> > >> >> >> > >> > >> >> >> > The detection result will be used by other filters, for example, >> > >> >> >> > dnn_classify (see >> > https://github.com/guoyejun/ffmpeg/tree/classify). >> > >> >> >> > >> > >> >> >> > The filter dnn_detect detects all the objects (cat, dog, >> person ...) >> > in a >> > >> >> >> > frame, while dnn_classify classifies one detected object (for >> > example, >> > >> >> person) >> > >> >> >> > for its attribute (for example, emotion, etc.) >> > >> >> >> > >> > >> >> >> > The filter dnn_classify have to check if the frame size is >> changed >> > after >> > >> >> >> > it is detected, to handle the below filter chain: >> > >> >> >> > dnn_detect -> scale -> dnn_classify >> > >> >> >> > >> > >> >> >> >> > >> >> >> This doesn't look good. Why is dnn_classify needing to know >> > >> >> >> the original frame size at all? >> > >> >> >> >> > >> >> > >> > >> >> > For example, the original size of the frame is 100*100, and >> > dnn_detect >> > >> >> > detects a face at place (10, 10) -> (30, 40), such data will be >> > >> >> > saved >> in >> > >> >> > AVBoundingBox.top/left/right/bottom. >> > >> >> > >> > >> >> > Then, the frame is scaled into 50*50. >> > >> >> > >> > >> >> > Then, dnn_classify is used to analyze the emotion of the face, it >> > needs to >> > >> >> > know the frame size (100*100) when it is detected, otherwise, it >> > does not >> > >> >> > work with just (10,10), (30,40) and 50*50. >> > >> >> > >> > >> >> >> > >> >> Why can't the scale filter also rescale this side data as well? >> > >> >> >> > >> > >> > >> > I'm afraid that we could not make sure all such filters (including >> filters >> > in the >> > >> > future) to do the rescale. And in the previous discussion, I got to >> > know that >> > >> > 'many other existing side-data types are invalidated by scaling'. >> > >> > >> > >> > So, we need frame_width and frame_height here. >> > >> > >> > >> >> > >> No, you don't. You just need to make sure filters which change >> > resolution >> > >> or do cropping also change the side data parameters. >> > >> It's called maintainership. As-is, this won't even work with cropping, >> > >> only with basic aspect ratio preserving scaling. >> > >> For the lack of a better term, this is a hack. >> > >> >> > > >> > > As discussed in previous email, for the frame size change case, >> > dnn_classify >> > > (and other filters which use the detection result, for example drawbox) >> > can >> > > just output a warning message to tell user what happens, and don't do >> > the >> > > classification, otherwise, it will give a wrong/weird result which makes >> > the >> > > user confused. >> > > >> > >> >> > >> I would accept just specifying that if the frame dimensions are >> > >> altered in any way, the side-data is no longer valid and it's up >> > >> to users to figure that out by out of bound coordinates. >> > >> This is what we currently do with video_enc_params. >> > >> >> > > >> > > frame_width/frame_height is not perfect (for the cases such as: scale >> > down >> > > + crop + scale up to the same size), but it provides more info than the >> > checking >> > > of 'out of bound coordinates'. There are many other possible issues >> > when the >> > > coordinates are within the frame. >> > > >> > > If we think we'd better not let user get more info from the warning >> > message, >> > > I'm ok to remove them. >> > > >> > > I'll remove them if there's another comment supporting the removal, >> and >> > > there's no objection. >> > > >> > >> > We definitely shouldn't include variables in public API structs >> > that only serve to print a warning if they don't match. >> >> Not just 'print a warning', it also impacts the behavior of dnn_classify. >> >> > Especially one that's fragile and flawed like this one. >> > Users should know that scaling or altering a frame could break >> > this side data, and filters could detect obvious out of bounds >> > results and report them. >> >> I'll remove them since it is user's responsibility. >> >> > >> > In the meantime, the main scaling and cropping filters could >> > receive separate patches to preserve metadata at a later data. >> > This is how the avframe cropping field work - they're just metadata, >> > and cropping/scaling filters update those. >> > >> > >> > >> >> >> >> > diff --git a/libavutil/frame.h b/libavutil/frame.h >> > >> >> >> >> > index a5ed91b20a..41e22de02a 100644 >> > >> >> >> >> > --- a/libavutil/frame.h >> > >> >> >> >> > +++ b/libavutil/frame.h >> > >> >> >> >> > @@ -198,6 +198,13 @@ enum AVFrameSideDataType { >> > >> >> >> >> > * Must be present for every frame which should have >> film >> > grain >> > >> >> applied. >> > >> >> >> >> > */ >> > >> >> >> >> > AV_FRAME_DATA_FILM_GRAIN_PARAMS, >> > >> >> >> >> > + >> > >> >> >> >> > + /** >> > >> >> >> >> > + * Bounding boxes for object detection and >> > classification, the >> > >> >> data is >> > >> >> >> a >> > >> >> >> >> AVBoundingBoxHeader >> > >> >> >> >> > + * followed with an array of AVBoudingBox, and >> > >> >> >> >> AVBoundingBoxHeader.nb_bboxes is the number >> > >> >> >> >> > + * of array element. >> > >> >> >> >> > + */ >> > >> >> >> >> > + AV_FRAME_DATA_BOUNDING_BOXES, >> > >> >> >> >> > }; >> > >> >> >> >> > >> > >> >> >> >> >> > >> >> >> >> Finally, why call it a Bounding Box? It's not descriptive at >> > >> >> >> >> all. >> > >> >> >> >> How about "Object Classification"? It makes much more >> sense, >> > it's >> > >> >> >> >> exactly what this is. So AVObjectClassification, >> > AVObjectClassification, >> > >> >> >> >> AV_FRAME_DATA_OBJECT_CLASSIFICATION and so on. >> > >> >> >> >> >> > >> >> >> > >> > >> >> >> > In object detection papers, bounding box is usually used. >> > >> >> >> > We'd better use the same term, imho, thanks. >> > >> >> >> > >> > >> >> >> >> > >> >> >> Not in this case, API users won't have any idea what this is or >> > what >> > >> >> >> it's for. This is user-facing code after all. >> > >> >> >> Papers in fields can get away with overloading language, but >> > we're >> > >> >> >> trying to make a concise API. Object classification makes sense >> > >> >> >> because this is exactly what this is. >> > >> >> >> >> > >> >> > >> > >> >> > The term bounding box is dominating the domain, for example, >> > even >> > >> >> > HEVC spec uses this term, see page 317 of >> > >> >> > >> > >> >> >> > >> >> > >> https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-H.265-201911-I >> > !!P >> > >> >> DF-E&type=items >> > >> >> > >> > >> >> > also copy some here for your convenient. >> > >> >> > ar_bounding_box_top[ ar_object_idx[ i ] ] u(16) >> > >> >> > ar_bounding_box_left[ ar_object_idx[ i ] ] u(16) >> > >> >> > ar_bounding_box_width[ ar_object_idx[ i ] ] u(16) >> > >> >> > ar_bounding_box_height[ ar_object_idx[ i ] ] u(16) >> > >> >> > >> > >> >> > I would prefer to use bounding box. >> > >> >> > >> > >> >> >> > >> >> It's for an entirely different thing, and like I said, it's just an >> > overloaded >> > >> >> language because they can get away. We're trying to be generic. >> > >> >> This side data is for detecting _and_ classifying objects. In fact, >> most >> > of >> > >> >> the structure is dedicated towards classifying. If you'd like users >> > >> >> to >> > actually >> > >> >> use this, give it a good name and don't leave them guessing what >> > this >> > >> >> structure is by throwing vague jargon some other paper or spec >> has >> > >> >> because it's close enough. >> > >> >> >> > >> > >> > >> > all the people in the domain accepts bounding box, they can >> > understand this >> > >> > struct name easily and clearly, they might be bothered if we use >> > another >> > >> name. >> > >> > >> > >> > btw, AVObjectClassification confuses people who just want object >> > detection. >> > >> > >> > >> >> > >> As I said, the name "bounding box" makes no sense once it gets >> > overloaded >> > >> with object classification. >> > >> >> > > >> > > dnn_detect creates an array of 'bounding box' for all detected objects, >> > and >> > > dnn_classify assigns attributes for a set of bounding boxes (with same >> > object >> > > id). 'bounding box' serves both detection and classification properly. >> > > >> > > >> > >> Object classification is still the main use of the filters, >> > >> because the original proposal was to have all of this info be >> > ffmpeg-private, >> > >> which would forbid simple object detection. >> > >> >> > > >> > > The original proposal is to add it as side data which is ffmpeg-public, >> and >> > then, >> > > we spent much time discussing/trying with ffmpeg-private as an >> > temporary >> > > method, and since it is not good to be temporary, we now switch back >> to >> > > ffmpeg-public. >> > > >> > > During the whole period, we don't have any intention to >> > > 'forbid simple object detection', not quite understand your point here. >> > > >> > > >> > >> So I still maintain this should be called "Object classification". I'd >> accept >> > >> "Object detection" as well, but definitely not "bounding box". >> > >> >> > > >> > > imho, ' Object detection' and ' Object classification' are worse, they >> > > just >> > > describe one aspect of the struct. The users might just use filter >> > dnn_detect, >> > > they might use filters dnn_detect + dnn_classify. >> > > >> > >> > The whole reason why we have this side data is to both detect >> > _and_ classify. Keyword being _both_. Hence object detection >> > and object classification are much better names. >> > I am opposed to merging this without a name change. >> >> I understand it is not a good name with codec background, but it is a good >> (possible best) name with object detection background. >> >> To make things move forward, I'll change the name to 'object detection' >> for both dnn_detect and dnn_classify, they would be: >> AV_FRAME_DATA_OJBECT_DETECTION >> AVObjectDetection >> AVObjectDetectionHeader >> > > I tried to change the code with this new name, but feel like it is not > straight-forward during the coding. > > Pedro, who initialized DNN module, just commented on the naming, maybe > I can just wait for more several days to get more comments on the naming, > thanks. >
If you want to move forward with this patch, please send a new one with the field and name changes. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".