On 08/23/2016 08:49 PM, Nicolas George wrote:
+ int speaker_position; /** an OutputChannelPosition from ISO/IEC 23001-8 */
I think most our API users and even developers have better use for 158 Swiss
Francs than feeding the ISO. Please make the API self-sufficient; I suggest
an enum.
I'll introduce an enumeration AVSpeakerPosition. However, the spec is
available publicly at
http://standards.iso.org/ittf/PubliclyAvailableStandards/c062088_ISO_IEC_23001-8_2013.zip
. (And the table of output channel positions starts at page 24.)
Le septidi 7 fructidor, an CCXXIV, erkki.seppala....@nokia.com a écrit :
From: Erkki Seppälä <erkki.seppala....@nokia.com>
Added support for passing complex channel layout configuration as side
packet data (AV_PKT_DATA_AUDIO_CHANNEL_LAYOUT,
AV_PKT_DATA_AUDIO_CHANNEL_PREDEFINED_LAYOUT,
AV_PKT_DATA_AUDIO_CHANNEL_LAYOUT_OBJECT_STRUCTURED) to ISO media files
as specified by ISO/IEC 14496-12.
This information isn't integrated into the existing channel layout
system though,
I think it needs to be.
Unfortunately properly integrating this information into FFmpeg sounds
larger than the scope of my project. I imagine this still would be very
helpful base work should such information be actually supported by
various parts of FFmpeg.
A special bit in the AV_CH_LAYOUT_* could be used for indicating that a
special layout exists, but that information by itself is nowhere enough
to work with this data with ie. in the functions available in
channel_layout.c. And if codec or client code is able to deal with the
advanced channel layout configurations, they also have access to this
side data.
I also believe is is easier to study more deeper integration once a
non-intrusive possibility to work with advanced ISOBMFF channel layouts
exists.
Here is how the mapping on speaker position level might be done:
/** Speaker positions according to ISO/IEC 23001-8 */
enum AVSpeakerPosition {
! AV_SPEAKER_POSITION_L = 0, /// Left front
! AV_SPEAKER_POSITION_R = 1, /// Right front
AV_SPEAKER_POSITION_C = 2, /// Centre front
AV_SPEAKER_POSITION_LFE = 3, /// Low frequency enhancement
AV_SPEAKER_POSITION_LS = 4, /// Left surround
AV_SPEAKER_POSITION_RS = 5, /// Right surround
! AV_SPEAKER_POSITION_LC = 6, /// Left front centre
! AV_SPEAKER_POSITION_RC = 7, /// Right front centre
AV_SPEAKER_POSITION_LSR = 8, /// Rear surround left
AV_SPEAKER_POSITION_RSR = 9, /// Rear surround right
AV_SPEAKER_POSITION_CS = 10, /// Rear centre
AV_SPEAKER_POSITION_LSD = 11, /// Left surround direct
AV_SPEAKER_POSITION_RSD = 12, /// Right surround direct
AV_SPEAKER_POSITION_LSS = 13, /// Left side surround
AV_SPEAKER_POSITION_RSS = 14, /// Right side surround
! AV_SPEAKER_POSITION_LW = 15, /// Left wide front
! AV_SPEAKER_POSITION_RW = 16, /// Right wide front
AV_SPEAKER_POSITION_LV = 17, /// Left front vertical height
AV_SPEAKER_POSITION_RV = 18, /// Right front vertical height
AV_SPEAKER_POSITION_CV = 19, /// Centre front vertical height
AV_SPEAKER_POSITION_LVR = 20, /// Left surround vertical height
rear
AV_SPEAKER_POSITION_RVR = 21, /// Right surround vertical
height rear
AV_SPEAKER_POSITION_CVR = 22, /// Centre vertical height rear
AV_SPEAKER_POSITION_LVSS = 23, /// Left vertical height side
surround
AV_SPEAKER_POSITION_RVSS = 24, /// Right vertical height side
surround
AV_SPEAKER_POSITION_TS = 25, /// Top centre surround
AV_SPEAKER_POSITION_LFE2 = 26, /// E2 Low frequency enhancement 2
AV_SPEAKER_POSITION_LB = 27, /// Left front vertical bottom
AV_SPEAKER_POSITION_RB = 28, /// Right front vertical bottom
AV_SPEAKER_POSITION_CB = 29, /// Centre front vertical bottom
AV_SPEAKER_POSITION_LVS = 30, /// Left vertical height surround
AV_SPEAKER_POSITION_RVS = 31, /// Right vertical height surround
/// 32-45 Reserved
AV_SPEAKER_POSITION_LFE3 = 36, /// E3 Low frequency enhancement 3
AV_SPEAKER_POSITION_LEOS = 37, /// Left edge of screen
AV_SPEAKER_POSITION_REOS = 38, /// Right edge of screen
AV_SPEAKER_POSITION_HWBCAL = 39, /// half-way between centre of
screen and left edge of screen
AV_SPEAKER_POSITION_HWBCAR = 40, /// half-way between centre of
screen and right edge of screen
AV_SPEAKER_POSITION_LBS = 41, /// Left back surround
AV_SPEAKER_POSITION_RBS = 42, /// Right back surround
/// 43–125 Reserved
AV_SPEAKER_POSITION_EXPL = 126, /// Explicit position (see text)
/// 127 Unknown / undefined
};
I suppose if this is preferred, then one for 23001-8 channel layouts
would be preferred as well..
It does seem confusing though that similar #defines are available in
channel_layout.h and I am a bit uncertain of a 1:1 mapping between the
two. I've marked in above with ! the ones that I found a match from the
defines:
AV_CH_FRONT_LEFT AV_SPEAKER_POSITION_L
AV_CH_FRONT_RIGHT AV_SPEAKER_POSITION_R
AV_CH_FRONT_CENTER AV_SPEAKER_POSITION_LC
AV_CH_LOW_FREQUENCY AV_SPEAKER_POSITION_LFE
AV_CH_BACK_LEFT AV_SPEAKER_POSITION_LBS
AV_CH_BACK_RIGHT AV_SPEAKER_POSITION_RBS
AV_CH_FRONT_LEFT_OF_CENTER AV_SPEAKER_POSITION_LC
AV_CH_FRONT_RIGHT_OF_CENTER AV_SPEAKER_POSITION_RC
AV_CH_BACK_CENTER
AV_CH_SIDE_LEFT AV_SPEAKER_POSITION_LSS
AV_CH_SIDE_RIGHT AV_SPEAKER_POSITION_RSS
AV_CH_TOP_CENTER AV_SPEAKER_POSITION_TS
AV_CH_TOP_FRONT_LEFT
AV_CH_TOP_FRONT_CENTER
AV_CH_TOP_FRONT_RIGHT
AV_CH_TOP_BACK_LEFT
AV_CH_TOP_BACK_CENTER
AV_CH_TOP_BACK_RIGHT
AV_CH_STEREO_LEFT
AV_CH_STEREO_RIGHT
AV_CH_WIDE_LEFT AV_SPEAKER_POSITION_LW
AV_CH_WIDE_RIGHT AV_SPEAKER_POSITION_RW
AV_CH_SURROUND_DIRECT_LEFT AV_SPEAKER_POSITION_LSD
AV_CH_SURROUND_DIRECT_RIGHT AV_SPEAKER_POSITION_RSD
AV_CH_LOW_FREQUENCY_2 AV_SPEAKER_POSITION_LFE2
If this approach were taken, then the missing speakers positions
(channels?) would need to be added, conversion maps for 23001-8 would
need to be implemented (and probably refusing if there are non-mappable
positions). Then layout bitmaps would be mapped into 23001-8 ones or if
the layout cannot be found, a custom one generated.. Personally I don't
mind how separate&simple this system still is :).
which is much more restricted compared to what the
standard permits.
Certainly, but the bitfield channel layout system is enough for most cases
and well tested. Any new system must work with it, not around it.
But as you may agree, such a modification would not be small. Certainly
not one I would try as my first feature to FFmpeg.
Please follow the surrounding style for Doxygen comments: /** and */ on a
separate line, * at the beginning of each line.
Oops. I'll fix those.
+ /** The following are used if speaker_position == 126 */
+ int azimuth;
+ int elevation;
Please document the units and references.
Done.
Also, since it is repeated and packed, using a small type would make sense
here if possible.
I'll use the same types as in the headers: int16_t and int8_t.
+} AVAudioTrackChannelPosition;
+
+/** Describes the channel layout (ie. speaker position) of a single audio
track */
+typedef struct AVAudioTrackChannelLayout {
+ int nb_positions;
+ /** followed by: AVAudioTrackChannelPosition positions[nb_positions]; */
AVAudioTrackChannelPosition positions[0];
That way, computing the address is taken care of by the compiler without
tricky pointer arithmetic, including alignment requirements.
Actually this is the way I had it in the first place, but compiler
warnings convinced me not to do it. If I'll use positions[1] the
allocations will end up slightly too large unless without slightly
convoluted code.
Is there a preferred way? It doesn't seem like foo[1] is used anywhere.
(We already use 0-sized final arrays once in vaapi_encode.h; since VA API is
for Linux and BSD, it does not probe the crappy proprietary compilers
support it. In that case, using 1 instead of 0 is fine.)
In particular I don't want to do it conditionally :).
+} AVAudioTrackChannelLayout;
+
+/** Describes the channel layout based on predefined layout of a single track
+ by providing the layout and the list of channels are are omitted */
Could you make that clearer?
+typedef struct AVAudioTrackChannelPredefinedLayout {
+ int layout; /** ChannelConfiguration from ISO/IEC 23001-8 */
+ int nb_omitted_channels;
+ /** followed by: char omitted_channels[nb_omitted_channels]; - non-zero
indicates the channel is omitted */
Is there a reasonable upper bound to nb_omitted_channels?
Yes, ISOBMFF supports only 64 channels for this mask. But perhaps even a
higher number is usable for some special case scenarios, not involving
ISOBMFF?
Also, I think the naming is inconsistent: if nb_omitted_channels = 3 and
omitted_channels[] = { 1, 0, 1 }, there are only two omitted channels, not
three.
I agree that it seems a bit inconsistent.. Would an integer array
enumerating the omitted channels be better? Of course this whole problem
goes away if the structure is fixed-size (ie. uint8_t
omitted_channels_bitmask[64 / 8] or the uint64_t you suggested).
enum AVPacketSideDataType {
AV_PKT_DATA_PALETTE,
@@ -1556,7 +1585,27 @@ enum AVPacketSideDataType {
* Configured the timed metadata parameters, such as the uri and
* meta data configuration. The key is of type AVTimedMetadata.
*/
- AV_PKT_DATA_TIMED_METADATA_INFO
+ AV_PKT_DATA_TIMED_METADATA_INFO,
+
+ /**
+ * Channel layout, describing the position of spakers for the
+ * channels of a track, following the structure
+ * AVAudioTrackChannelLayout.
+ */
+ AV_PKT_DATA_AUDIO_CHANNEL_LAYOUT,
I think it would be a good idea if the name of the side data matches the
name of the structure as much as possible: Track -> _TRACK_. Ditto for the
others.
Would it make sense to have a single side data type and an integer field at
the beginning announcing the type of the rest:
typedef struct AVComplexChannelLayout {
enum AVComplexChannelLayoutType type;
union {
uint64_t bitmask;
AVAudioTrackChannelLayout complete;
AVAudioTrackChannelPredefinedLayout predefined;
AVAudioTrackChannelLayoutObjectStructured object;
} layout;
};
It does seem nice. The actual structs/enums might look like this:
typedef struct AVAudioTrackChannelLayout {
int nb_positions;
/** Contains nb_positions entries*/
AVAudioTrackChannelPosition positions[1];
} AVAudioTrackChannelLayout;
typedef struct AVAudioTrackChannelPredefinedLayout {
int layout; /** ChannelConfiguration from ISO/IEC 23001-8 */
uint64_t omitted_channels;
} AVAudioTrackChannelPredefinedLayout;
typedef enum AVComplexAudioTrackChannelLayoutType {
AV_COMPLEX_CHANNEL_LAYOUT_PREDEFINED,
AV_COMPLEX_CHANNEL_LAYOUT_CUSTOM
} AVComplexAudioTrackChannelLayoutType;
typedef struct AVComplexAudioChannelLayout {
enum AVComplexAudioTrackChannelLayoutType type;
union {
AVAudioTrackChannelPredefinedLayout predefined;
AVAudioTrackChannelLayout complete;
};
} AVComplexAudioChannelLayout;
enum AVPacketSideDataType {
...
/**
* Channel layout, describing the position of speakers for the
* channels of a track, following the structure
* AVComplexAudioChannelLayout.
*/
AV_PKT_DATA_COMPLEX_AUDIO_CHANNEL_LAYOUT,
/**
* The channel layout is object structured with the number of
objects in an int (may accompany AV_PKT_DATA_COMPLEX_AUDIO_CHANNEL_LAYOUT)
*/
AV_PKT_DATA_COMPLEX_AUDIO_TRACK_CHANNEL_LAYOUT_OBJECT_STRUCTURED,
} AVPacketSideDataType;
I'm not super happy about the long names (though tempted to
s/AVAudio/AVComplexAudio/), but what is the alternative :).
Thanks for reviewing and feedback!
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel