Hi everyone,
I’m excited to be a GSoC student this year, working on support for the WHIP 
feature in FFmpeg with my mentors Steven Liu and Zhao Jun.
Two years ago, my mentor Steven Liu sent a WHIP patch 
(https://patchwork.ffmpeg.org/project/ffmpeg/patch/20230529115039.17409-1...@chinaffmpeg.org/)
which received a number of reviews. Most of those review comments have since 
been addressed, except for the DTLS part, so I refactored the DTLS 
implementation and merged it into tls_openssl.c. 
Before I send out the latest patch, I plan to respond to the previous reviews 
one by one. Note that most of the responses below are from Winlin 
(https://github.com/ossrs/ffmpeg-webrtc/discussions).

On Mon May 29 16:00:33 EEST 2023, Michael Niedermayer wrote:
> 
> On Mon, May 29, 2023 at 07:50:39PM +0800, Steven Liu wrote:
> > Co-authored-by: winlin <winlinvip at gmail.com>
> > Co-authored-by: yangrtc <yangrtc at aliyun.com>
> > Co-authored-by: cloudwebrtc <duanweiwei1982 at gmail.com>
> > Co-authored-by: Haibo Chen <495810242 at qq.com>
> > Signed-off-by: Steven Liu <lq at chinaffmpeg.org>
> > ---
> > configure | 1 +
> > doc/muxers.texi | 50 +
> > libavformat/Makefile | 1 +
> > libavformat/allformats.c | 1 +
> > libavformat/http.c | 6 +
> > libavformat/http.h | 2 +
> > libavformat/rtcenc.c | 2208 ++++++++++++++++++++++++++++++++++++++
> > 7 files changed, 2269 insertions(+)
> > create mode 100644 libavformat/rtcenc.c
> 
> breaks build:
> 
> CC libavformat/rtcenc.o
> libavformat/rtcenc.c:30:2: error: #error "OpenSSL version 1.1.1b or newer is 
> required"
> #error "OpenSSL version 1.1.1b or newer is required"
> ^~~~~
> libavformat/rtcenc.c: In function ‘dtls_context_init’:
> libavformat/rtcenc.c:210:34: error: implicit declaration of function 
> ‘EVP_EC_gen’; did you mean ‘EVP_PBE_get’? 
> [-Werror=implicit-function-declaration]
> ctx->dtls_pkey = dtls_pkey = EVP_EC_gen(curve);
> ^~~~~~~~~~
> EVP_PBE_get
Winlin: This issue was caused by a combination of the OpenSSL version check and 
API usage. I tested a range of OpenSSL versions and fixed the issue; OpenSSL 
1.0.1k and newer versions should work now.
Fixed.
> libavformat/rtcenc.c:210:32: warning: assignment makes pointer from integer 
> without a cast [-Wint-conversion]
> ctx->dtls_pkey = dtls_pkey = EVP_EC_gen(curve);
> ^
> cc1: some warnings being treated as errors
> ffbuild/common.mak:81: recipe for target 'libavformat/rtcenc.o' failed
> make: *** [libavformat/rtcenc.o] Error 1
> 
> [...]
> -- 
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB




On Tue May 30 11:44:06 EEST 2023, Kieran Kunhya wrote:


> On Mon, 29 May 2023 at 12:51, Steven Liu <lq at chinaffmpeg.org> wrote:
> 
> > Co-authored-by: winlin <winlinvip at gmail.com>
> > Co-authored-by: yangrtc <yangrtc at aliyun.com>
> > Co-authored-by: cloudwebrtc <duanweiwei1982 at gmail.com>
> > Co-authored-by: Haibo Chen <495810242 at qq.com>
> > Signed-off-by: Steven Liu <lq at chinaffmpeg.org>
> > ---
> > configure | 1 +
> > doc/muxers.texi | 50 +
> > libavformat/Makefile | 1 +
> > libavformat/allformats.c | 1 +
> > libavformat/http.c | 6 +
> > libavformat/http.h | 2 +
> > libavformat/rtcenc.c | 2208 ++++++++++++++++++++++++++++++++++++++
> > 7 files changed, 2269 insertions(+)
> > create mode 100644 libavformat/rtcenc.c
> >
> > diff --git a/configure b/configure
> > index 495493aa0e..526ddc0bd6 100755
> > --- a/configure
> > +++ b/configure
> > @@ -3483,6 +3483,7 @@ ogg_demuxer_select="dirac_parse"
> > ogv_muxer_select="ogg_muxer"
> > opus_muxer_select="ogg_muxer"
> > psp_muxer_select="mov_muxer"
> > +rtc_muxer_deps_any="openssl"
> > rtp_demuxer_select="sdp_demuxer"
> > rtp_mpegts_muxer_select="mpegts_muxer rtp_muxer"
> > rtpdec_select="asf_demuxer mov_demuxer mpegts_demuxer rm_demuxer
> > rtp_protocol srtp"
> > diff --git a/doc/muxers.texi b/doc/muxers.texi
> > index 31fca17dd6..00052f51b4 100644
> > --- a/doc/muxers.texi
> > +++ b/doc/muxers.texi
> > @@ -1333,6 +1333,56 @@ Set custom HTTP headers, can override built in
> > default headers. Applicable only
> >
> > @end table
> >
> > + at anchor{rtc}
> > + at section rtc
> > +
> > +WebRTC (Real-Time Communication) muxer that supports sub-second latency
> > streaming according to
> > +the WHIP (WebRTC-HTTP ingestion protocol) specification.
> > +
> > +It uses HTTP as a signaling protocol to exchange SDP capabilities and ICE
> > lite candidates. Then,
> > +it uses STUN binding requests and responses to establish a session over
> > UDP. Subsequently, it
> > +initiates a DTLS handshake to exchange the SRTP encryption keys. Lastly,
> > it splits video and
> > +audio frames into RTP packets and encrypts them using SRTP.
> > +
> > +Ensure that you use H.264 without B frames and Opus for the audio codec.
> > For example, to convert
> > +an input file with @command{ffmpeg} to WebRTC:
> > + at example
> > +ffmpeg -re -i input.mp4 -acodec libopus -ar 48000 -ac 2 \
> > + -vcodec libx264 -profile:v baseline -tune zerolatency -threads 1 -bf 0 \
> > + -f rtc "http://localhost:1985/rtc/v1/whip/?app=live&stream=livestream";
> > + at end example
> > +
> > +For this example, we have employed low latency options, resulting in an
> > end-to-end latency of
> > +approximately 150ms.
> > +
> > + at subsection Options
> > +
> > +This muxer supports the following options:
> > +
> > + at table @option
> > +
> > + at item ice_arq_max @var{size}
> > +Set the maximum number of retransmissions for the ICE ARQ mechanism.
> > +Default value is 5.
> > +
> > + at item ice_arq_timeout @var{size}
> > +Set the start timeout in milliseconds for the ICE ARQ mechanism.
> > +Default value is 30.
> > +
> > + at item dtls_arq_max @var{size}
> > +Set the maximum number of retransmissions for the DTLS ARQ mechanism.
> > +Default value is 5.
> > +
> > + at item dtls_arq_timeout @var{size}
> > +Set the start timeout in milliseconds for the DTLS ARQ mechanism.
> > +Default value is 50.
> > +
> > + at item pkt_size @var{size}
> > +Set the maximum size, in bytes, of RTP packets that send out.
> > +Default value is 1500.
> > +
> > + at end table
> > +
> > @anchor{ico}
> > @section ico
> >
> > diff --git a/libavformat/Makefile b/libavformat/Makefile
> > index f8ad7c6a11..2a76e613b0 100644
> > --- a/libavformat/Makefile
> > +++ b/libavformat/Makefile
> > @@ -493,6 +493,7 @@ OBJS-$(CONFIG_RSD_DEMUXER) += rsd.o
> > OBJS-$(CONFIG_RPL_DEMUXER) += rpl.o
> > OBJS-$(CONFIG_RSO_DEMUXER) += rsodec.o rso.o pcm.o
> > OBJS-$(CONFIG_RSO_MUXER) += rsoenc.o rso.o rawenc.o
> > +OBJS-$(CONFIG_RTC_MUXER) += rtcenc.o avc.o http.o srtp.o
> > OBJS-$(CONFIG_RTP_MPEGTS_MUXER) += rtpenc_mpegts.o
> > OBJS-$(CONFIG_RTP_MUXER) += rtp.o \
> > rtpenc_aac.o \
> > diff --git a/libavformat/allformats.c b/libavformat/allformats.c
> > index efdb34e29d..56e852e5ea 100644
> > --- a/libavformat/allformats.c
> > +++ b/libavformat/allformats.c
> > @@ -392,6 +392,7 @@ extern const AVInputFormat ff_rpl_demuxer;
> > extern const AVInputFormat ff_rsd_demuxer;
> > extern const AVInputFormat ff_rso_demuxer;
> > extern const FFOutputFormat ff_rso_muxer;
> > +extern const FFOutputFormat ff_rtc_muxer;
> > extern const AVInputFormat ff_rtp_demuxer;
> > extern const FFOutputFormat ff_rtp_muxer;
> > extern const FFOutputFormat ff_rtp_mpegts_muxer;
> > diff --git a/libavformat/http.c b/libavformat/http.c
> > index 0817aafb5b..6f509656b5 100644
> > --- a/libavformat/http.c
> > +++ b/libavformat/http.c
> > @@ -530,6 +530,12 @@ int ff_http_averror(int status_code, int
> > default_averror)
> > return default_averror;
> > }
> >
> > +const char* ff_http_get_new_location(URLContext *h)
> > +{
> > + HTTPContext *s = h->priv_data;
> > + return s->new_location;
> > +}
> > +
> > static int http_write_reply(URLContext* h, int status_code)
> > {
> > int ret, body = 0, reply_code, message_len;
> > diff --git a/libavformat/http.h b/libavformat/http.h
> > index 5f650ef143..d1b691826b 100644
> > --- a/libavformat/http.h
> > +++ b/libavformat/http.h
> > @@ -62,4 +62,6 @@ int ff_http_do_new_request2(URLContext *h, const char
> > *uri, AVDictionary **optio
> >
> > int ff_http_averror(int status_code, int default_averror);
> >
> > +const char* ff_http_get_new_location(URLContext *h);
> > +
> > #endif /* AVFORMAT_HTTP_H */
> > diff --git a/libavformat/rtcenc.c b/libavformat/rtcenc.c
> > new file mode 100644
> > index 0000000000..65173d0073
> > --- /dev/null
> > +++ b/libavformat/rtcenc.c
> > @@ -0,0 +1,2208 @@
> > +/*
> > + * WebRTC-HTTP ingestion protocol (WHIP) muxer
> > + * Copyright (c) 2023 The FFmpeg Project
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> > 02110-1301 USA
> > + */
> > +
> > +#include "config.h"
> > +
> > +#ifndef CONFIG_OPENSSL
> > +#error "DTLS is not supported, please enable openssl"
> > +#else
> > +#include <openssl/ssl.h>
> > +#include <openssl/err.h>
> > +#if OPENSSL_VERSION_NUMBER < 0x1010102fL
> > +#error "OpenSSL version 1.1.1b or newer is required"
> > +#endif
> > +#endif
> > +
> > +#include "libavcodec/avcodec.h"
> > +#include "libavutil/base64.h"
> > +#include "libavutil/bprint.h"
> > +#include "libavutil/crc.h"
> > +#include "libavutil/hmac.h"
> > +#include "libavutil/opt.h"
> > +#include "libavutil/random_seed.h"
> > +#include "libavutil/time.h"
> > +#include "avc.h"
> > +#include "avio_internal.h"
> > +#include "http.h"
> > +#include "internal.h"
> > +#include "mux.h"
> > +#include "network.h"
> > +#include "srtp.h"
> > +
> > +/**
> > + * Maximum size limit of a Session Description Protocol (SDP),
> > + * be it an offer or answer.
> > + */
> > +#define MAX_SDP_SIZE 8192
> >
> 
> ?
> 
> 
Winlin: This establishes an upper bound on the SDP length when generating an 
SDP offer or parsing an SDP answer.
I don't really understand what the question is about. Won't fix.
> 
> 
> 
> 
> > +/**
> > + * Maximum size of the buffer for sending and receiving UDP packets.
> > + * Please note that this size does not limit the size of the UDP packet
> > that can be sent.
> > + * To set the limit for packet size, modify the `pkt_size` parameter.
> > + * For instance, it is possible to set the UDP buffer to 4096 to send or
> > receive packets,
> > + * but please keep in mind that the `pkt_size` option limits the packet
> > size to 1400.
> > + */
> > +#define MAX_UDP_BUFFER_SIZE 4096
> > +/*
> >
> >
> 
> 
> > +
> > +/**
> > + * Parses the ISOM AVCC format of extradata and extracts SPS/PPS.
> > + *
> > + * This function is used to parse SPS/PPS from the extradata in ISOM AVCC
> > format.
> > + * It can handle both ISOM and annexb formats but only parses data in
> > ISOM format.
> > + * If the extradata is in annexb format, this function ignores it, and
> > uses the entire
> > + * extradata as a sequence header with SPS/PPS. Refer to
> > ff_isom_write_avcc.
> > + *
> > + * @param s Pointer to the AVFormatContext
> > + * @param extradata Pointer to the extradata
> > + * @param extradata_size Size of the extradata
> > + * @returns Returns 0 if successful or AVERROR_xxx in case
> > of an error.
> > + */
> > +static int isom_read_avcc(AVFormatContext *s, uint8_t *extradata, int
> > extradata_size)
> > +{
> > + int ret = 0;
> > + uint8_t version, nal_length_size, nb_sps, nb_pps;
> > + AVIOContext *pb;
> > + RTCContext *rtc = s->priv_data;
> > +
> > + if (!extradata || !extradata_size)
> > + return 0;
> > +
> > + /* Not H.264 ISOM format, may be annexb etc. */
> > + if (extradata_size < 4 || extradata[0] != 1) {
> > + if (!ff_avc_find_startcode(extradata, extradata +
> > extradata_size)) {
> > + av_log(s, AV_LOG_ERROR, "Format must be ISOM or annexb\n");
> > + return AVERROR_INVALIDDATA;
> > + }
> > + return 0;
> > + }
> > +
> > + /* Parse the SPS/PPS in ISOM format in extradata. */
> > + pb = avio_alloc_context(extradata, extradata_size, 0, NULL, NULL,
> > NULL, NULL);
> > + if (!pb)
> > + return AVERROR(ENOMEM);
> > +
> > + version = avio_r8(pb); /* version */
> > + avio_r8(pb); /* avc profile */
> > + avio_r8(pb); /* avc profile compat */
> > + avio_r8(pb); /* avc level */
> > + nal_length_size = avio_r8(pb); /* 6 bits reserved (111111) + 2 bits
> > nal size length - 1 (11) */
> > + nb_sps = avio_r8(pb); /* 3 bits reserved (111) + 5 bits number of sps
> > */
> > +
> > + if (version != 1) {
> > + av_log(s, AV_LOG_ERROR, "Invalid version=%d\n", version);
> > + ret = AVERROR_INVALIDDATA;
> > + goto end;
> > + }
> > +
> > + rtc->avc_nal_length_size = (nal_length_size & 0x03) + 1;
> > + if (rtc->avc_nal_length_size == 3) {
> > + av_log(s, AV_LOG_ERROR, "Invalid nal length size=%d\n",
> > rtc->avc_nal_length_size);
> > + ret = AVERROR_INVALIDDATA;
> > + goto end;
> > + }
> > +
> > + /* Read SPS */
> > + nb_sps &= 0x1f;
> > + if (nb_sps != 1 || avio_feof(pb)) {
> > + av_log(s, AV_LOG_ERROR, "Invalid number of sps=%d, eof=%d\n",
> > nb_sps, avio_feof(pb));
> > + ret = AVERROR_INVALIDDATA;
> > + goto end;
> > + }
> > +
> > + rtc->avc_sps_size = avio_rb16(pb); /* sps size */
> > + if (rtc->avc_sps_size <= 0 || avio_feof(pb)) {
> > + av_log(s, AV_LOG_ERROR, "Invalid sps size=%d, eof=%d\n",
> > rtc->avc_sps_size, avio_feof(pb));
> > + ret = AVERROR_INVALIDDATA;
> > + goto end;
> > + }
> > +
> > + rtc->avc_sps = av_malloc(rtc->avc_sps_size);
> > + if (!rtc->avc_sps) {
> > + ret = AVERROR(ENOMEM);
> > + goto end;
> > + }
> > +
> > + ret = avio_read(pb, rtc->avc_sps, rtc->avc_sps_size); /* sps */
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to read sps, size=%d\n",
> > rtc->avc_sps_size);
> > + goto end;
> > + }
> > +
> > + /* Read PPS */
> > + nb_pps = avio_r8(pb); /* number of pps */
> > + if (nb_pps != 1 || avio_feof(pb)) {
> > + av_log(s, AV_LOG_ERROR, "Invalid number of pps=%d, eof=%d\n",
> > nb_pps, avio_feof(pb));
> > + ret = AVERROR_INVALIDDATA;
> > + goto end;
> > + }
> > +
> > + rtc->avc_pps_size = avio_rb16(pb); /* pps size */
> > + if (rtc->avc_pps_size <= 0 || avio_feof(pb)) {
> > + av_log(s, AV_LOG_ERROR, "Invalid pps size=%d, eof=%d\n",
> > rtc->avc_pps_size, avio_feof(pb));
> > + ret = AVERROR_INVALIDDATA;
> > + goto end;
> > + }
> > +
> > + rtc->avc_pps = av_malloc(rtc->avc_pps_size);
> > + if (!rtc->avc_pps) {
> > + ret = AVERROR(ENOMEM);
> > + goto end;
> > + }
> > +
> > + ret = avio_read(pb, rtc->avc_pps, rtc->avc_pps_size); /* pps */
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to read pps, size=%d\n",
> > rtc->avc_pps_size);
> > + goto end;
> > + }
> > +
> > +end:
> > + avio_context_free(&pb);
> > + return ret;
> > +}
> >
> 
> This code is duplicated from other parts of FFmpeg
Winlin: Fixed by using the h264_mp4toannexb BSF and removing this function.
> 
> +/**
> 
> > + * Generate SDP offer according to the codec parameters, DTLS and ICE
> > information.
> > + * The below is an example of SDP offer:
> > + *
> > + * v=0
> > + * o=FFmpeg 4489045141692799359 2 IN IP4 127.0.0.1
> > + * s=FFmpegPublishSession
> > + * t=0 0
> > + * a=group:BUNDLE 0 1
> > + * a=extmap-allow-mixed
> > + * a=msid-semantic: WMS
> > + *
> > + * m=audio 9 UDP/TLS/RTP/SAVPF 111
> > + * c=IN IP4 0.0.0.0
> > + * a=ice-ufrag:a174B
> > + * a=ice-pwd:wY8rJ3gNLxL3eWZs6UPOxy
> > + * a=fingerprint:sha-256
> > EE:FE:A2:E5:6A:21:78:60:71:2C:21:DC:1A:2C:98:12:0C:E8:AD:68:07:61:1B:0E:FC:46:97:1E:BC:97:4A:54
> > + * a=setup:actpass
> > + * a=mid:0
> > + * a=sendonly
> > + * a=msid:FFmpeg audio
> > + * a=rtcp-mux
> > + * a=rtpmap:111 opus/48000/2
> > + * a=ssrc:4267647086 cname:FFmpeg
> > + * a=ssrc:4267647086 msid:FFmpeg audio
> > + *
> > + * m=video 9 UDP/TLS/RTP/SAVPF 106
> > + * c=IN IP4 0.0.0.0
> > + * a=ice-ufrag:a174B
> > + * a=ice-pwd:wY8rJ3gNLxL3eWZs6UPOxy
> > + * a=fingerprint:sha-256
> > EE:FE:A2:E5:6A:21:78:60:71:2C:21:DC:1A:2C:98:12:0C:E8:AD:68:07:61:1B:0E:FC:46:97:1E:BC:97:4A:54
> > + * a=setup:actpass
> > + * a=mid:1
> > + * a=sendonly
> > + * a=msid:FFmpeg video
> > + * a=rtcp-mux
> > + * a=rtcp-rsize
> > + * a=rtpmap:106 H264/90000
> > + * a=fmtp:106
> > level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
> > + * a=ssrc:107169110 cname:FFmpeg
> > + * a=ssrc:107169110 msid:FFmpeg video
> > + *
> >
> 
> This is not a legal SDP, each line must end only with one space (yes, this
> is regularly wrong).
Winlin: I'm sorry, this SDP example is not exact SDP data captured from the 
real world. I added an empty line between the m= lines to make it easier for 
other users to read. I have also removed unnecessary SDP examples from the 
comments.
> 
> 
> >
> >
> > +
> > +/**
> > + * Opens the UDP transport and completes the ICE handshake, using fast
> > retransmit to
> > + * handle packet loss for the binding request.
> > + *
> > + * To initiate a fast retransmission of the STUN binding request during
> > ICE, we wait only
> > + * for a successful local ICE process i.e., when a binding response is
> > received from the
> > + * server. Since the server's binding request may not arrive, we do not
> > always wait for it.
> > + * However, we will always respond to the server's binding request during
> > ICE, DTLS or
> > + * RTP streaming.
> > + *
> > + * @param s Pointer to the AVFormatContext
> > + * @return Returns 0 if the handshake was successful or AVERROR_xxx in
> > case of an error
> > + */
> > +static int ice_handshake(AVFormatContext *s)
> > +{
> > + int ret, size;
> > + char url[256], tmp[16];
> > + char req_buf[MAX_UDP_BUFFER_SIZE], res_buf[MAX_UDP_BUFFER_SIZE];
> > + RTCContext *rtc = s->priv_data;
> > + int fast_retries = rtc->ice_arq_max, timeout = rtc->ice_arq_timeout;
> > +
> > + /* Build UDP URL and create the UDP context as transport. */
> > + ff_url_join(url, sizeof(url), "udp", NULL, rtc->ice_host,
> > rtc->ice_port, NULL);
> > + ret = ffurl_alloc(&rtc->udp_uc, url, AVIO_FLAG_WRITE,
> > &s->interrupt_callback);
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to open udp://%s:%d\n",
> > rtc->ice_host, rtc->ice_port);
> > + goto end;
> > + }
> > +
> > + av_opt_set(rtc->udp_uc->priv_data, "connect", "1", 0);
> > + av_opt_set(rtc->udp_uc->priv_data, "fifo_size", "0", 0);
> > + /* Set the max packet size to the buffer size. */
> > + snprintf(tmp, sizeof(tmp), "%d", rtc->pkt_size);
> > + av_opt_set(rtc->udp_uc->priv_data, "pkt_size", tmp, 0);
> > +
> > + ret = ffurl_connect(rtc->udp_uc, NULL);
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to connect udp://%s:%d\n",
> > rtc->ice_host, rtc->ice_port);
> > + goto end;
> > + }
> > +
> > + /* Make the socket non-blocking, set to READ and WRITE mode after
> > connected */
> > + ff_socket_nonblock(ffurl_get_file_handle(rtc->udp_uc), 1);
> > + rtc->udp_uc->flags |= AVIO_FLAG_READ | AVIO_FLAG_NONBLOCK;
> > +
> > + /* Build the STUN binding request. */
> > + ret = ice_create_request(s, req_buf, sizeof(req_buf), &size);
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to create STUN binding request,
> > size=%d\n", size);
> > + goto end;
> > + }
> > +
> > + /* Fast retransmit the STUN binding request. */
> > + while (1) {
> > + ret = ffurl_write(rtc->udp_uc, req_buf, size);
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to send STUN binding request,
> > size=%d\n", size);
> > + goto end;
> > + }
> > +
> > + /* Wait so that the server can process the request and no need
> > ARQ then. */
> > +#if ICE_PROCESSING_TIMEOUT > 0
> > + av_usleep(ICE_PROCESSING_TIMEOUT * 10000);
> > +#endif
> > +
> > + /* Read the STUN binding response. */
> > + ret = ffurl_read(rtc->udp_uc, res_buf, sizeof(res_buf));
> > + if (ret < 0) {
> > + /* If max retries is 6 and start timeout is 21ms, the total
> > timeout
> > + * is about 21 + 42 + 84 + 168 + 336 + 672 = 1263ms. */
> > + av_usleep(timeout * 1000);
> > + timeout *= 2;
> > +
> > + if (ret == AVERROR(EAGAIN) && fast_retries) {
> > + fast_retries--;
> > + continue;
> > + }
> > +
> > + av_log(s, AV_LOG_ERROR, "Failed to read STUN binding
> > response, retries=%d\n", rtc->ice_arq_max);
> > + goto end;
> > + }
> > +
> > + /* If got any binding response, the fast retransmission is done.
> > */
> > + if (ice_is_binding_response(res_buf, ret))
> > + break;
> > +
> > + /* When a binding request is received, it is necessary to respond
> > immediately. */
> > + if (ice_is_binding_request(res_buf, ret) && (ret =
> > ice_handle_binding_request(s, res_buf, ret)) < 0)
> > + goto end;
> > + }
> > +
> > + /* Wait just for a small while to get the possible binding request
> > from server. */
> > + fast_retries = rtc->ice_arq_max / 2;
> > + timeout = rtc->ice_arq_timeout;
> > + while (fast_retries) {
> > + ret = ffurl_read(rtc->udp_uc, res_buf, sizeof(res_buf));
> > + if (ret < 0) {
> > + /* If max retries is 6 and start timeout is 21ms, the total
> > timeout
> > + * is about 21 + 42 + 84 = 147ms. */
> > + av_usleep(timeout * 1000);
> > + timeout *= 2;
> > +
> > + if (ret == AVERROR(EAGAIN)) {
> > + fast_retries--;
> > + continue;
> > + }
> > +
> > + av_log(s, AV_LOG_ERROR, "Failed to read STUN binding request,
> > retries=%d\n", rtc->ice_arq_max);
> > + goto end;
> > + }
> > +
> > + /* When a binding request is received, it is necessary to respond
> > immediately. */
> > + if (ice_is_binding_request(res_buf, ret)) {
> > + if ((ret = ice_handle_binding_request(s, res_buf, ret)) < 0)
> > + goto end;
> > + break;
> > + }
> > + }
> > +
> > + av_log(s, AV_LOG_INFO, "WHIP: ICE STUN ok, url=udp://%s:%d,
> > location=%s, username=%s:%s, req=%dB, res=%dB, arq=%d\n",
> > + rtc->ice_host, rtc->ice_port, rtc->whip_resource_url ?
> > rtc->whip_resource_url : "",
> > + rtc->ice_ufrag_remote, rtc->ice_ufrag_local, size, ret,
> > + rtc->ice_arq_max - fast_retries);
> > + ret = 0;
> > +
> > +end:
> > + return ret;
> > +}
> > +
> > +/**
> > + * Establish the SRTP context using the keying material exported from
> > DTLS.
> > + *
> > + * Create separate SRTP contexts for sending video and audio, as their
> > sequences differ
> > + * and should not share a single context. Generate a single SRTP context
> > for receiving
> > + * RTCP only.
> > + *
> > + * @return 0 if OK, AVERROR_xxx on error
> > + */
> > +static int setup_srtp(AVFormatContext *s)
> > +{
> > + int ret;
> > + char recv_key[DTLS_SRTP_MASTER_KEY_LEN],
> > send_key[DTLS_SRTP_MASTER_KEY_LEN];
> > + char buf[AV_BASE64_SIZE(DTLS_SRTP_MASTER_KEY_LEN)];
> > + const char* suite = "AES_CM_128_HMAC_SHA1_80";
> > + RTCContext *rtc = s->priv_data;
> > +
> > + /* As DTLS client, the send key is client master key plus salt. */
> > + memcpy(send_key, rtc->dtls_ctx.dtls_srtp_material, 16);
> > + memcpy(send_key + 16, rtc->dtls_ctx.dtls_srtp_material + 32, 14);
> > +
> > + /* As DTLS client, the recv key is server master key plus salt. */
> > + memcpy(recv_key, rtc->dtls_ctx.dtls_srtp_material + 16, 16);
> > + memcpy(recv_key + 16, rtc->dtls_ctx.dtls_srtp_material + 46, 14);
> > +
> > + /* Setup SRTP context for outgoing packets */
> > + if (!av_base64_encode(buf, sizeof(buf), send_key, sizeof(send_key))) {
> > + av_log(s, AV_LOG_ERROR, "Failed to encode send key\n");
> > + ret = AVERROR(EIO);
> > + goto end;
> > + }
> > +
> > + ret = ff_srtp_set_crypto(&rtc->srtp_audio_send, suite, buf);
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to set crypto for audio send\n");
> > + goto end;
> > + }
> > +
> > + ret = ff_srtp_set_crypto(&rtc->srtp_video_send, suite, buf);
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to set crypto for video send\n");
> > + goto end;
> > + }
> > +
> > + ret = ff_srtp_set_crypto(&rtc->srtp_rtcp_send, suite, buf);
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to set crypto for rtcp send\n");
> > + goto end;
> > + }
> > +
> > + /* Setup SRTP context for incoming packets */
> > + if (!av_base64_encode(buf, sizeof(buf), recv_key, sizeof(recv_key))) {
> > + av_log(s, AV_LOG_ERROR, "Failed to encode recv key\n");
> > + ret = AVERROR(EIO);
> > + goto end;
> > + }
> > +
> > + ret = ff_srtp_set_crypto(&rtc->srtp_recv, suite, buf);
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to set crypto for recv\n");
> > + goto end;
> > + }
> > +
> > + av_log(s, AV_LOG_INFO, "WHIP: SRTP setup done, suite=%s, key=%luB\n",
> > suite, sizeof(send_key));
> > +
> > +end:
> > + return ret;
> > +}
> > +
> > +/**
> > + * Creates dedicated RTP muxers for each stream in the AVFormatContext to
> > build RTP
> > + * packets from the encoded frames.
> > + *
> > + * The corresponding SRTP context is utilized to encrypt each stream's
> > RTP packets. For
> > + * example, a video SRTP context is used for the video stream.
> > Additionally, the
> > + * "on_rtp_write_packet" callback function is set as the write function
> > for each RTP
> > + * muxer to send out encrypted RTP packets.
> > + *
> > + * @return 0 if OK, AVERROR_xxx on error
> > + */
> > +static int create_rtp_muxer(AVFormatContext *s)
> > +{
> > + int ret, i, is_video, buffer_size, max_packet_size;
> > + AVFormatContext *rtp_ctx = NULL;
> > + AVDictionary *opts = NULL;
> > + uint8_t *buffer = NULL;
> > + char buf[64];
> > + RTCContext *rtc = s->priv_data;
> > +
> > + const AVOutputFormat *rtp_format = av_guess_format("rtp", NULL, NULL);
> > + if (!rtp_format) {
> > + av_log(s, AV_LOG_ERROR, "Failed to guess rtp muxer\n");
> > + ret = AVERROR(ENOSYS);
> > + goto end;
> > + }
> > +
> > + /* The UDP buffer size, may greater than MTU. */
> > + buffer_size = MAX_UDP_BUFFER_SIZE;
> > + /* The RTP payload max size. Reserved some bytes for SRTP checksum
> > and padding. */
> > + max_packet_size = rtc->pkt_size - DTLS_SRTP_CHECKSUM_LEN;
> > +
> > + for (i = 0; i < s->nb_streams; i++) {
> > + rtp_ctx = avformat_alloc_context();
> > + if (!rtp_ctx) {
> > + ret = AVERROR(ENOMEM);
> > + goto end;
> > + }
> > +
> > + rtp_ctx->oformat = rtp_format;
> > + if (!avformat_new_stream(rtp_ctx, NULL)) {
> > + ret = AVERROR(ENOMEM);
> > + goto end;
> > + }
> > + /* Pass the interrupt callback on */
> > + rtp_ctx->interrupt_callback = s->interrupt_callback;
> > + /* Copy the max delay setting; the rtp muxer reads this. */
> > + rtp_ctx->max_delay = s->max_delay;
> > + /* Copy other stream parameters. */
> > + rtp_ctx->streams[0]->sample_aspect_ratio =
> > s->streams[i]->sample_aspect_ratio;
> > + rtp_ctx->flags |= s->flags & AVFMT_FLAG_BITEXACT;
> > + rtp_ctx->strict_std_compliance = s->strict_std_compliance;
> > +
> > + /* Set the synchronized start time. */
> > + rtp_ctx->start_time_realtime = s->start_time_realtime;
> > +
> > + avcodec_parameters_copy(rtp_ctx->streams[0]->codecpar,
> > s->streams[i]->codecpar);
> > + rtp_ctx->streams[0]->time_base = s->streams[i]->time_base;
> > +
> > + buffer = av_malloc(buffer_size);
> > + if (!buffer) {
> > + ret = AVERROR(ENOMEM);
> > + goto end;
> > + }
> > +
> > + rtp_ctx->pb = avio_alloc_context(buffer, buffer_size, 1, s, NULL,
> > on_rtp_write_packet, NULL);
> > + if (!rtp_ctx->pb) {
> > + ret = AVERROR(ENOMEM);
> > + goto end;
> > + }
> > + rtp_ctx->pb->max_packet_size = max_packet_size;
> > + rtp_ctx->pb->av_class = &ff_avio_class;
> > +
> > + is_video = s->streams[i]->codecpar->codec_type ==
> > AVMEDIA_TYPE_VIDEO;
> > + snprintf(buf, sizeof(buf), "%d", is_video?
> > rtc->video_payload_type : rtc->audio_payload_type);
> > + av_dict_set(&opts, "payload_type", buf, 0);
> > + snprintf(buf, sizeof(buf), "%d", is_video? rtc->video_ssrc :
> > rtc->audio_ssrc);
> > + av_dict_set(&opts, "ssrc", buf, 0);
> > +
> > + ret = avformat_write_header(rtp_ctx, &opts);
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to write rtp header\n");
> > + goto end;
> > + }
> > +
> > + ff_format_set_url(rtp_ctx, av_strdup(s->url));
> > + s->streams[i]->time_base = rtp_ctx->streams[0]->time_base;
> > + s->streams[i]->priv_data = rtp_ctx;
> > + rtp_ctx = NULL;
> > + }
> > +
> > + av_log(s, AV_LOG_INFO, "WHIP: Create RTP muxer OK, buffer_size=%d,
> > max_packet_size=%d\n",
> > + buffer_size, max_packet_size);
> > +
> > +end:
> > + if (rtp_ctx)
> > + avio_context_free(&rtp_ctx->pb);
> > + avformat_free_context(rtp_ctx);
> > + av_dict_free(&opts);
> > + return ret;
> > +}
> > +
> > +/**
> > + * Callback triggered by the RTP muxer when it creates and sends out an
> > RTP packet.
> > + *
> > + * This function modifies the video STAP packet, removing the markers,
> > and updating the
> > + * NRI of the first NALU. Additionally, it uses the corresponding SRTP
> > context to encrypt
> > + * the RTP packet, where the video packet is handled by the video SRTP
> > context.
> > + */
> > +static int on_rtp_write_packet(void *opaque, uint8_t *buf, int buf_size)
> > +{
> > + int ret, cipher_size, is_rtcp, is_video;
> > + uint8_t payload_type, nalu_header;
> > + char cipher[MAX_UDP_BUFFER_SIZE];
> > + AVFormatContext *s = opaque;
> > + RTCContext *rtc = s->priv_data;
> > + struct SRTPContext *srtp;
> > +
> > + /* Ignore if not RTP or RTCP packet. */
> > + if (buf_size < 12 || (buf[0] & 0xC0) != 0x80)
> > + return 0;
> > +
> > + /* Only support audio, video and rtcp. */
> > + is_rtcp = buf[1] >= 192 && buf[1] <= 223;
> > + payload_type = buf[1] & 0x7f;
> > + is_video = payload_type == rtc->video_payload_type;
> > + if (!is_rtcp && payload_type != rtc->video_payload_type &&
> > payload_type != rtc->audio_payload_type)
> > + return 0;
> > +
> > + /**
> > + * For video, the STAP-A with SPS/PPS should:
> > + * 1. The marker bit should be 0, never be 1.
> > + * 2. The NRI should equal to the first NALU's.
> > + */
> > + if (is_video && buf_size > 12) {
> > + nalu_header = buf[12] & 0x1f;
> > + if (nalu_header == NALU_TYPE_STAP_A) {
> > + /* Reset the marker bit to 0. */
> > + if (buf[1] & 0x80)
> > + buf[1] &= 0x7f;
> > +
> > + /* Reset the NRI to the first NALU's NRI. */
> > + if (buf_size > 15 && (buf[15]&0x60) != (buf[12]&0x60))
> > + buf[12] = (buf[12]&0x80) | (buf[15]&0x60) |
> > (buf[12]&0x1f);
> > + }
> > + }
> > +
> > + /* Get the corresponding SRTP context. */
> > + srtp = is_rtcp ? &rtc->srtp_rtcp_send : (is_video?
> > &rtc->srtp_video_send : &rtc->srtp_audio_send);
> > +
> > + /* Encrypt by SRTP and send out. */
> > + cipher_size = ff_srtp_encrypt(srtp, buf, buf_size, cipher,
> > sizeof(cipher));
> > + if (cipher_size <= 0 || cipher_size < buf_size) {
> > + av_log(s, AV_LOG_WARNING, "Failed to encrypt packet=%dB,
> > cipher=%dB\n", buf_size, cipher_size);
> > + return 0;
> > + }
> > +
> > + ret = ffurl_write(rtc->udp_uc, cipher, cipher_size);
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to write packet=%dB, ret=%d\n",
> > cipher_size, ret);
> > + return ret;
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +/**
> > + * Inserts the SPS/PPS data before each IDR (Instantaneous Decoder
> > Refresh) frame.
> > + *
> > + * The SPS/PPS is parsed from the extradata. If it's in ISOM format, the
> > SPS/PPS is
> > + * multiplexed to the data field of the packet. If it's in annexb format,
> > then the entire
> > + * extradata is set to the data field of the packet.
> > + */
> > +static int insert_sps_pps_packet(AVFormatContext *s, AVPacket *pkt)
> > +{
> > + int ret, is_idr, size, i;
> > + uint8_t *p;
> > + AVPacket* extra = NULL;
> > + AVStream *st = s->streams[pkt->stream_index];
> > + AVFormatContext *rtp_ctx = st->priv_data;
> > + RTCContext *rtc = s->priv_data;
> > +
> > + is_idr = (pkt->flags & AV_PKT_FLAG_KEY) && st->codecpar->codec_type
> > == AVMEDIA_TYPE_VIDEO;
> > + if (!is_idr || !st->codecpar->extradata)
> > + return 0;
> > +
> > + extra = av_packet_alloc();
> > + if (!extra)
> > + return AVERROR(ENOMEM);
> > +
> > + size = !rtc->avc_nal_length_size ? st->codecpar->extradata_size :
> > + rtc->avc_nal_length_size * 2 + rtc->avc_sps_size +
> > rtc->avc_pps_size;
> > + ret = av_new_packet(extra, size);
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to allocate extra packet\n");
> > + goto end;
> > + }
> > +
> > + /* Encode SPS/PPS in annexb format. */
> > + if (!rtc->avc_nal_length_size) {
> > + memcpy(extra->data, st->codecpar->extradata, size);
> > + } else {
> > + /* Encode SPS/PPS in ISOM format. */
> > + p = extra->data;
> > + for (i = 0; i < rtc->avc_nal_length_size; i++) {
> > + *p++ = rtc->avc_sps_size >> (8 * (rtc->avc_nal_length_size -
> > i - 1));
> > + }
> > + memcpy(p, rtc->avc_sps, rtc->avc_sps_size);
> > + p += rtc->avc_sps_size;
> > +
> > + /* Encode PPS in ISOM format. */
> > + for (i = 0; i < rtc->avc_nal_length_size; i++) {
> > + *p++ = rtc->avc_pps_size >> (8 * (rtc->avc_nal_length_size -
> > i - 1));
> > + }
> > + memcpy(p, rtc->avc_pps, rtc->avc_pps_size);
> > + p += rtc->avc_pps_size;
> > + }
> > +
> > + /* Setup packet and feed it to chain. */
> > + extra->pts = pkt->pts;
> > + extra->dts = pkt->dts;
> > + extra->stream_index = pkt->stream_index;
> > + extra->time_base = pkt->time_base;
> > +
> > + ret = ff_write_chained(rtp_ctx, 0, extra, s, 0);
> > + if (ret < 0)
> > + goto end;
> > +
> > +end:
> > + av_packet_free(&extra);
> > + return ret;
> > +}
> > +
> > +/**
> > + * RTC is connectionless, for it's based on UDP, so it check whether
> > sesison is
> > + * timeout. In such case, publishers can't republish the stream util the
> > session
> > + * is timeout.
> > + * This function is called to notify the server that the stream is ended,
> > server
> > + * should expire and close the session immediately, so that publishers
> > can republish
> > + * the stream quickly.
> > + */
> > +static int whip_dispose(AVFormatContext *s)
> > +{
> > + int ret;
> > + char buf[MAX_URL_SIZE];
> > + URLContext *whip_uc = NULL;
> > + RTCContext *rtc = s->priv_data;
> > +
> > + if (!rtc->whip_resource_url)
> > + return 0;
> > +
> > + ret = ffurl_alloc(&whip_uc, rtc->whip_resource_url,
> > AVIO_FLAG_READ_WRITE, &s->interrupt_callback);
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to alloc WHIP delete context:
> > %s\n", s->url);
> > + goto end;
> > + }
> > +
> > + av_opt_set(whip_uc->priv_data, "chunked_post", "0", 0);
> > + av_opt_set(whip_uc->priv_data, "method", "DELETE", 0);
> > + ret = ffurl_connect(whip_uc, NULL);
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to DELETE url=%s\n",
> > rtc->whip_resource_url);
> > + goto end;
> > + }
> > +
> > + while (1) {
> > + ret = ffurl_read(whip_uc, buf, sizeof(buf));
> > + if (ret == AVERROR_EOF) {
> > + ret = 0;
> > + break;
> > + }
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to read response from DELETE
> > url=%s\n", rtc->whip_resource_url);
> > + goto end;
> > + }
> > + }
> > +
> > + av_log(s, AV_LOG_INFO, "WHIP: Dispose resource %s ok\n",
> > rtc->whip_resource_url);
> > +
> > +end:
> > + ffurl_closep(&whip_uc);
> > + return ret;
> > +}
> > +
> > +static av_cold int rtc_init(AVFormatContext *s)
> > +{
> > + int ret;
> > + RTCContext *rtc = s->priv_data;
> > +
> > + if ((ret = whip_init(s)) < 0)
> > + return ret;
> > +
> > + if ((ret = parse_codec(s)) < 0)
> > + return ret;
> > +
> > + if ((ret = generate_sdp_offer(s)) < 0)
> > + return ret;
> > +
> > + if ((ret = exchange_sdp(s)) < 0)
> > + return ret;
> > +
> > + if ((ret = parse_answer(s)) < 0)
> > + return ret;
> > +
> > + if ((ret = ice_handshake(s)) < 0)
> > + return ret;
> > +
> > + /* Now UDP URL context is ready, setup the DTLS transport. */
> > + rtc->dtls_ctx.udp_uc = rtc->udp_uc;
> > +
> > + if ((ret = dtls_context_handshake(&rtc->dtls_ctx)) < 0)
> > + return ret;
> > +
> > + if ((ret = setup_srtp(s)) < 0)
> > + return ret;
> > +
> > + if ((ret = create_rtp_muxer(s)) < 0)
> > + return ret;
> > +
> > + return ret;
> > +}
> > +
> > +static int rtc_write_packet(AVFormatContext *s, AVPacket *pkt)
> > +{
> > + int ret;
> > + RTCContext *rtc = s->priv_data;
> > + AVStream *st = s->streams[pkt->stream_index];
> > + AVFormatContext *rtp_ctx = st->priv_data;
> > +
> > + /* TODO: Send binding request every 1s as WebRTC heartbeat. */
> > + /* TODO: Receive packets from the server such as ICE binding
> > requests, DTLS messages,
> > + * and RTCP like PLI requests, then respond to them.*/
> > +
> > + /* For audio OPUS stream, correct the timestamp. */
> > + if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
> > + pkt->dts = pkt->pts = rtc->audio_jitter_base;
> > + rtc->audio_jitter_base += 960;
> > + }
> >
> 
> Correct how?
Winlin: Occasionally, the timestamp of OPUS audio may be inaccurate, 
particularly when the input is an MP4 file.

Summary of the issues:

1. When playing an MP4 file (264+Opus) directly in Chrome, there are no audio 
issues.
2. When copying the MP4 file using `-c copy` and pushing it to WHIP, playing it 
in Chrome results in audio noise issues.
3. When copying the MP4 file using `-re -c copy` and pushing it to WHIP, there 
are no audio issues.
4. After re-encoding the MP4 file with `-acodec libopus`, pushing it to WHIP, 
and playing it in Chrome, there are no audio issues.
5. When copying an MKV file using `-c copy` and pushing it to WHIP, playing it 
in Chrome results in no audio issues.

For Opus at 48 kHz, each frame is 20 ms, which is 48000*20/1000 = 960 samples. 
It appears that there is a timestamp bug introduced by libopus: instead of the 
delta being exactly 960, there is a slight deviation, such as 956 or 970. This 
deviation can cause Chrome to play the audio stream with noise.

To replicate the issue, start by transcoding the 
[file](https://github.com/ossrs/srs/blob/develop/trunk/doc/source.200kbps.768x320.flv)
 into MP4 format, then publish it with the WHIP muxer:

```bash
ffmpeg -i ~/git/srs/trunk/doc/source.200kbps.768x320.flv \
  -vcodec libx264 -profile:v baseline -r 25 -g 50 \
  -acodec libopus -ar 48000 -ac 2 \
  -y t.mp4

ffmpeg -re -i t.mp4 -c copy \
  -f rtc "http://localhost:1985/rtc/v1/whip/?app=live&stream=livestream";
```

When adding `-re`, the timestamp is accurate, and there is no such issue:

```bash
ffmpeg -re -i ~/git/srs/trunk/doc/source.200kbps.768x320.flv \
  -vcodec libx264 -profile:v baseline -r 25 -g 50 \
  -acodec libopus -ar 48000 -ac 2 \
  -y t.mp4

ffmpeg -re -i t.mp4 -c copy \
  -f rtc "http://localhost:1985/rtc/v1/whip/?app=live&stream=livestream";
```

The process also functions smoothly without any issues when converting to MKV 
format and then publishing as a WHIP stream:

```bash
ffmpeg -i ~/git/srs/trunk/doc/source.200kbps.768x320.flv \
  -vcodec libx264 -profile:v baseline -r 25 -g 50 \
  -acodec libopus -ar 48000 -ac 2 \
  -y t.mkv

ffmpeg -re -i t.mkv -c copy \
  -f rtc "http://localhost:1985/rtc/v1/whip/?app=live&stream=livestream";
```

Nonetheless, when directly transcoding and publishing through the WHIP muxer, 
this problem is absent, and the difference in audio pts consistently remains at 
960:

```bash
~/git/FFmpeg/ffmpeg_g -re -i ~/git/srs/trunk/doc/source.200kbps.768x320.flv \
  -vcodec libx264 -profile:v baseline -r 25 -g 50 \
  -acodec libopus -ar 48000 -ac 2 \
  -f rtc "http://localhost:1985/rtc/v1/whip/?app=live&stream=livestream";
```

Although we are unsure of the root cause, I have included a "TODO" comment 
regarding this issue. Further research is needed before this line can be removed.

```c
/**
 * TODO: FIXME: There is an issue with the timestamp of OPUS audio, especially
 *  when the input is an MP4 file. The timestamp deviates from the expected
 *  value of 960, causing Chrome to play the audio stream with noise. This
 *  problem can be replicated by transcoding a specific file into MP4 format
 *  and publishing it using the WHIP muxer. However, when directly transcoding
 *  and publishing through the WHIP muxer, the issue is not present, and the
 *  audio timestamp remains consistent. The root cause is still unknown, and
 *  this comment has been added to address this issue in the future. Further
 *  research is needed to resolve the problem.
 */
static int parse_codec(AVFormatContext *s)
```

The log indicates that the pts delta does not always equal 960, but rather deviates:

```text
>>> audio diff=960
>>> audio diff=954
>>> audio diff=973
>>> audio diff=960
>>> audio diff=960
>>> audio diff=960
>>> audio diff=952
```

See https://github.com/winlinvip/ffmpeg-webrtc/commit/961c95cc4f

Since this problem is unrelated to WHIP (it appears to be a bug when converting 
FLV to MP4 without the -re option, producing a mismatch between the timestamps 
and the Opus frame size that causes audio distortion), we will not fix it here.
> 
> 
> 
> > + ret = insert_sps_pps_packet(s, pkt);
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to insert SPS/PPS packet\n");
> > + return ret;
> > + }
> > +
> > + ret = ff_write_chained(rtp_ctx, 0, pkt, s, 0);
> > + if (ret < 0) {
> > + if (ret == AVERROR(EINVAL)) {
> > + av_log(s, AV_LOG_WARNING, "Ignore failed to write packet=%dB,
> > ret=%d\n", pkt->size, ret);
> > + ret = 0;
> > + } else
> > + av_log(s, AV_LOG_ERROR, "Failed to write packet, size=%d\n",
> > pkt->size);
> > + return ret;
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +static av_cold void rtc_deinit(AVFormatContext *s)
> > +{
> > + int i, ret;
> > + RTCContext *rtc = s->priv_data;
> > +
> > + ret = whip_dispose(s);
> > + if (ret < 0)
> > + av_log(s, AV_LOG_WARNING, "Failed to dispose resource, ret=%d\n",
> > ret);
> > +
> > + for (i = 0; i < s->nb_streams; i++) {
> > + AVFormatContext* rtp_ctx = s->streams[i]->priv_data;
> > + if (!rtp_ctx)
> > + continue;
> > +
> > + av_write_trailer(rtp_ctx);
> > + avio_context_free(&rtp_ctx->pb);
> > + avformat_free_context(rtp_ctx);
> > + s->streams[i]->priv_data = NULL;
> > + }
> > +
> > + av_freep(&rtc->avc_sps);
> > + av_freep(&rtc->avc_pps);
> > + av_freep(&rtc->sdp_offer);
> > + av_freep(&rtc->sdp_answer);
> > + av_freep(&rtc->whip_resource_url);
> > + av_freep(&rtc->ice_ufrag_remote);
> > + av_freep(&rtc->ice_pwd_remote);
> > + av_freep(&rtc->ice_protocol);
> > + av_freep(&rtc->ice_host);
> > + ffurl_closep(&rtc->udp_uc);
> > + ff_srtp_free(&rtc->srtp_audio_send);
> > + ff_srtp_free(&rtc->srtp_video_send);
> > + ff_srtp_free(&rtc->srtp_rtcp_send);
> > + ff_srtp_free(&rtc->srtp_recv);
> > + dtls_context_deinit(&rtc->dtls_ctx);
> > +}
> > +
> > +#define OFFSET(x) offsetof(RTCContext, x)
> > +#define DEC AV_OPT_FLAG_DECODING_PARAM
> > +static const AVOption options[] = {
> > + { "ice_arq_max", "Maximum number of retransmissions for the
> > ICE ARQ mechanism", OFFSET(ice_arq_max), AV_OPT_TYPE_INT, {
> > .i64 = 5 }, -1, INT_MAX, DEC },
> > + { "ice_arq_timeout", "Start timeout in milliseconds for the ICE
> > ARQ mechanism", OFFSET(ice_arq_timeout), AV_OPT_TYPE_INT, {
> > .i64 = 30 }, -1, INT_MAX, DEC },
> > + { "dtls_arq_max", "Maximum number of retransmissions for the
> > DTLS ARQ mechanism", OFFSET(dtls_arq_max), AV_OPT_TYPE_INT, {
> > .i64 = 5 }, -1, INT_MAX, DEC },
> > + { "dtls_arq_timeout", "Start timeout in milliseconds for the DTLS
> > ARQ mechanism", OFFSET(dtls_arq_timeout), AV_OPT_TYPE_INT, {
> > .i64 = 50 }, -1, INT_MAX, DEC },
> > + { "pkt_size", "The maximum size, in bytes, of RTP packets
> > that send out", OFFSET(pkt_size), AV_OPT_TYPE_INT, {
> > .i64 = 1500 }, -1, INT_MAX, DEC },
> > + { NULL },
> > +};
> > +
> > +static const AVClass rtc_muxer_class = {
> > + .class_name = "WebRTC muxer",
> > + .item_name = av_default_item_name,
> > + .option = options,
> > + .version = LIBAVUTIL_VERSION_INT,
> > +};
> > +
> > +const FFOutputFormat ff_rtc_muxer = {
> > + .p.name = "rtc",
> > + .p.long_name = NULL_IF_CONFIG_SMALL("WHIP WebRTC muxer"),
> > + .p.audio_codec = AV_CODEC_ID_OPUS,
> > + .p.video_codec = AV_CODEC_ID_H264,
> > + .p.flags = AVFMT_GLOBALHEADER | AVFMT_NOFILE,
> > + .p.priv_class = &rtc_muxer_class,
> > + .priv_data_size = sizeof(RTCContext),
> > + .init = rtc_init,
> > + .write_packet = rtc_write_packet,
> > + .deinit = rtc_deinit,
> > +};
> > --
> > 2.40.0
> >
> 
> This patchset is massive and unmaintainable. Not to mention only implements
> a small portion of webrtc, that will inevitably get patched to add more
> features and become a giant sprawl.
Winlin: If we aimed to support the full feature set of WebRTC, the patch would 
become a sprawling giant and difficult to maintain. However, WHIP is only a 
small subset of WebRTC, and there are many features that we do not need to 
support. For example:

* P2P and full ICE would require numerous transport algorithms and libraries. 
However, we only need to support ICE Lite, which functions as a UDP client and 
server, similar to other protocols supported by FFmpeg such as SRT, RTMP, or 
RTSP.
* Simulcast and SVC are also large features, and I do not recommend supporting 
them immediately. These advanced features are not necessary and are not 
commonly used in WebRTC publishing scenarios.
* Congestion control and recovery algorithms, such as ARQ and FEC, can be 
straightforward if we only support NACK for ARQ and a simple FEC like RED. 
However, supporting GCC or RSFEC, etc., can be complex. I also suggest not 
supporting them immediately, because in some usage scenarios the network 
conditions are good.
* Signaling and room management do not need to be touched, as WHIP is a 
straightforward signaling method for WebRTC. FFmpeg only needs to publish a 
WebRTC stream for use cases such as robot control or similar scenarios, so 
there is no need to introduce rooms or multiple streams in a WebRTC session.
* Audio processing, such as the 3A algorithms, is not required for FFmpeg, as 
it is a feature for players like Chrome.

This patch is simple not only because WHIP is a subset of WebRTC, but also 
because it reuses many of FFmpeg's existing capabilities, such as codecs and 
formats, SRTP, HTTP, HTTPS, OpenSSL, and more.
> 
> 
> 
> Kieran
> 
> 
> 


On Wed May 31 23:46:07 EEST 2023, Derek Buitenhuis wrote
> 
> Hello, quick skim below, followed by some waffling at the end.
> 
> I have not bothered to mention any code style nits, to keep it technical.
> 
> On 5/29/2023 12:50 PM, Steven Liu wrote:
> > @@ -3483,6 +3483,7 @@ ogg_demuxer_select="dirac_parse"
> > ogv_muxer_select="ogg_muxer"
> > opus_muxer_select="ogg_muxer"
> > psp_muxer_select="mov_muxer"
> > +rtc_muxer_deps_any="openssl"
> 
> Given we support a bunch of SSL libraries, how exactly does this work with 
> that? Can
> it be abstracted out? OpenSSL poses licensing issues for many.
> 
Winlin: Currently, we only support OpenSSL, but after merging this patch, we 
intend to add support for other SSL libraries. This is because including 
support for all SSL libraries in a single patch would make it overly complex 
and difficult for others to follow. Is this plan acceptable, or do we need to 
support all SSL libraries within this patch?
I added a TODO to configure:

```bash
+# TODO: Support libtls, mbedtls, and gnutls.
 rtc_muxer_deps_any="openssl"
```

> Further, I think this is missing a bunch of dependencies like HTTP and AVC?
Winlin: In the Makefile, we add dependencies at this line: 

```
OBJS-$(CONFIG_RTC_MUXER) += rtcenc.o avc.o http.o srtp.o
```

I'm not certain whether we should also include avc and http as dependencies, 
like this: 

```
rtc_muxer_deps_any="openssl avc http"
```
> 
> 
> > +#ifndef CONFIG_OPENSSL
> > +#error "DTLS is not supported, please enable openssl"
> > +#else
> > +#include <openssl/ssl.h>
> > +#include <openssl/err.h>
> > +#if OPENSSL_VERSION_NUMBER < 0x1010102fL
> > +#error "OpenSSL version 1.1.1b or newer is required"
> > +#endif
> > +#endif
> 
> This should be in configure.
Winlin: I eliminated the macro-based version checks from the code and instead 
perform the check only in the configure script, like this:

```bash
enabled openssl            && {
    enabled rtc_muxer && {
        $pkg_config --exists --print-errors "openssl >= 1.0.1k" ||
        require_pkg_config openssl "openssl >= 1.0.1k" openssl/ssl.h SSL_library_init ||
        require_pkg_config openssl "openssl >= 1.0.1k" openssl/ssl.h OPENSSL_init_ssl
    }
}
```

Fixed.
> 
> 
> 
> > +/*
> > + * Supported DTLS cipher suites for FFmpeg as a DTLS client.
> > + * These cipher suites are used to negotiate with DTLS servers.
> > + *
> > + * It is advisable to use a limited number of cipher suites to reduce
> > + * the size of DTLS UDP packets.
> > + */
> > +#define DTLS_CIPHER_SUTES "ECDHE-ECDSA-AES128-GCM-SHA256"\
> > + ":ECDHE-RSA-AES128-GCM-SHA256"\
> > + ":ECDHE-ECDSA-AES128-SHA"\
> > + ":ECDHE-RSA-AES128-SHA"\
> > + ":ECDHE-ECDSA-AES256-SHA"\
> > + ":ECDHE-RSA-AES256-SHA"
> 
> The reason for this specific subset should be documented.
Winlin: Resolved by employing the "ALL" cipher suite to attain maximum 
compatibility.

```c
    /**
     * We activate "ALL" cipher suites to align with the peer's capabilities,
     * ensuring maximum compatibility.
     */
    if (SSL_CTX_set_cipher_list(dtls_ctx, "ALL") != 1) {
```

Fixed.
> 
> 
> 
> > +/**
> > + * STAP-A stands for Single-Time Aggregation Packet.
> > + * The NALU type for STAP-A is 24 (0x18).
> > + */
> > +#define NALU_TYPE_STAP_A 24
> 
> Why is a NALU type being defined in a protocol implementation file?
Winlin: There might be an issue in rtpenc.c, where the RTP marker bit for 
STAP-A should be 0 but is currently set to 1. This problem prevents Chrome from 
decoding the video stream. Note that the STAP-A contains the SPS/PPS, so we 
need this macro to parse the RTP packet and work around the issue.

```c
    /**
     * For video, the STAP-A with SPS/PPS should:
     * 1. The marker bit should be 0, never be 1.
     * 2. The NRI should equal to the first NALU's.
     * TODO: FIXME: Should fix it in rtpenc.c
     */
    if (is_video && buf_size > 12) {
        nalu_header = buf[12] & 0x1f;
        if (nalu_header == NALU_TYPE_STAP_A) {
            /* Reset the marker bit to 0. */
            if (buf[1] & 0x80)
                buf[1] &= 0x7f;

            /* Reset the NRI to the first NALU's NRI. */
            if (buf_size > 15 && (buf[15]&0x60) != (buf[12]&0x60))
                buf[12] = (buf[12]&0x80) | (buf[15]&0x60) | (buf[12]&0x1f);
        }
    }
```

After switching to the BSF, the marker is correctly set to false, so there is 
no need to reset it. The NRI also appears to be correct and does not require 
resetting at this time. Thus, I eliminated this code and the macro.
> 
> 
> 
> > +/**
> > + * Wait for a small timeout in milliseconds to allow for the server to 
> > process
> > + * the Interactive Connectivity Establishment (ICE) request. If we 
> > immediately
> > + * read the response after sending the request, we may receive nothing and 
> > need
> > + * to immediately retry. To lessen the likelihood of retries, we can send 
> > the
> > + * request and wait for a small amount of time for the server to process it
> > + * before reading the response.
> > + */
> > +#define ICE_PROCESSING_TIMEOUT 10
> 
> Are we really hardcoding sleeps? Surely a better way exists. What do other 
> implementations
> do here?
Winlin: We have improved the ICE and DTLS code, and removed this macro as it is 
no longer necessary. Currently, only a handshake_timeout option is utilized, 
with a default value of 5 seconds, to restrict the ICE and DTLS timeout 
duration.
> 
> > +/**
> > + * Wait for a short timeout in milliseconds to allow the server to process
> > + * the Datagram Transport Layer Security (DTLS) request. If we immediately
> > + * read the response after sending the request, we may receive nothing and
> > + * need to immediately retry. To reduce the likelihood of retries, we can
> > + * send the request and wait a short amount of time for the server to
> > + * process it before attempting to read the response.
> > + */
> > +#define DTLS_PROCESSING_TIMEOUT 30
> 
> Same comment as above.
> 
> > +/**
> > + * The maximum number of retries for Datagram Transport Layer Security 
> > (DTLS) EAGAIN errors.
> > + * When we send a DTLS request and receive no response, we may encounter 
> > an EAGAIN error.
> > + * In this situation, we wait briefly and attempt to read the response 
> > again.
> > + * We limit the maximum number of times we retry this loop.
> > + */
> > +#define DTLS_EAGAIN_RETRIES_MAX 5
> 
> Should this be an option?
Winlin: We have improved the ICE and DTLS code, and removed this macro as it is 
no longer necessary. Currently, only a handshake_timeout option is utilized, 
with a default value of 5 seconds, to restrict the ICE and DTLS timeout 
duration.
> 
> > +/**
> > + * The DTLS timer's base timeout in microseconds. Its purpose is to 
> > minimize the unnecessary
> > + * retransmission of ClientHello.
> > + */
> > +#define DTLS_SSL_TIMER_BASE 400 * 1000
> 
> Is this from a spec or arbitrary?
Winlin: We have improved the ICE and DTLS code, and removed this macro as it is 
no longer necessary. Currently, only a handshake_timeout option is utilized, 
with a default value of 5 seconds, to restrict the ICE and DTLS timeout 
duration.
> 
> > +typedef struct DTLSContext {
> > + /* For av_log to write log to this category. */
> > + void *log_avcl;
> 
> Why is this void *?
Winlin: The `log_avcl` is passed to `av_log(void *avcl)`, whose first argument 
is of type `void*`. We have changed the `void*` to `AVClass*`.

```c
    RTCContext *rtc = s->priv_data;
    rtc->dtls_ctx.av_class = rtc->av_class;

    DTLSContext *ctx;
    av_log(ctx, AV_LOG_ERROR, "DTLS: Unable to generate private key, EC_KEY_set_group failure\n");
```

Jack Lau: In the latest code, log_avcl has been removed, because DTLS now has 
its own class.
> 
> 
> 
> > + /**
> > + * This represents the material used to build the SRTP master key. It is
> > + * generated by DTLS and has the following layout:
> > + * 16B 16B 14B 14B
> > + * client_key | server_key | client_salt | server_salt
> > + */
> > + uint8_t dtls_srtp_material[DTLS_SRTP_MASTER_KEY_LEN * 2];
> 
> *2?
Winlin: The dtls_srtp_materials consist of key/salt pairs: a client key and 
salt, and a server key and salt. The layout is as follows:

```
     16B         16B         14B             14B
client_key | server_key | client_salt | server_salt
```

That is the reason for multiplying by 2. However, we have refined it to make 
this clear.

```c
// This material consists of a key of 16 bytes and a salt of 14 bytes.
#define DTLS_SRTP_MASTER_KEY_LEN 30
uint8_t dtls_srtp_materials[DTLS_SRTP_MASTER_KEY_LEN];
```

Refined to:

```c
#define DTLS_SRTP_KEY_LEN 16
#define DTLS_SRTP_SALT_LEN 14
/**
 * This represents the material used to build the SRTP master key. It is
 * generated by DTLS and has the following layout:
 *          16B         16B         14B             14B
 *      client_key | server_key | client_salt | server_salt
 */
uint8_t dtls_srtp_materials[(DTLS_SRTP_KEY_LEN + DTLS_SRTP_SALT_LEN) * 2];
```

Fixed.
> 
> 
> 
> > +/**
> > + * Generate a self-signed certificate and private key for DTLS.
> > + */
> 
> Does WHIP allow use of arbitrary certs (i.e. non-self-signed)?
Winlin: Supported both self-signed and non-self-signed certificates now.
> 
> > +static av_cold int dtls_context_init(DTLSContext *ctx)
> > +{
> > + int ret = 0, serial, expire_day, i, n = 0;
> > + AVBPrint fingerprint;
> > + unsigned char md[EVP_MAX_MD_SIZE];
> > + const char *aor = "ffmpeg.org", *curve = NULL;
> 
> I am not sure the FFmpeg community wants to advertise itself as a CN.
> 
> Maybe lavf?
Changed the CommonName from ffmpeg.org to lavf.
> 
> > + curve = "prime256v1";
> 
> For this and all other occurrences: They should be initialized at declaration
> time, like the rest of the codebase.
Fixed
> 
> 
> > + ctx->dtls_pkey = dtls_pkey = EVP_EC_gen(curve);
> > + if (!dtls_pkey) {
> > + av_log(s1, AV_LOG_ERROR, "DTLS: EVP_EC_gen curve=%s failed\n", curve);
> 
> For this and all other logs: Something more descriptive than "function_name 
> failed:" should be used.
Winlin: I have included additional information in the log message, beyond just 
the failed function name.
> 
> > + serial = (int)av_get_random_seed();
> 
> For all uses of av_get_random_seed: Why are we using a seed and not a PRNG?
Got it! Replaced av_get_random_seed with AVLFG.
> 
> > + if (!av_bprint_is_complete(&fingerprint)) {
> > + av_log(s1, AV_LOG_ERROR, "Fingerprint %d exceed max %d, %s\n", ret, 
> > MAX_SDP_SIZE, fingerprint.str);
> > + ret = AVERROR(EIO);
> > + goto end;
> > + }
> 
> Is this actually possible to trigger?
Winlin: Remove this unnecessary code.
> 
> > + if (!fingerprint.str || !strlen(fingerprint.str)) {
> > + av_log(s1, AV_LOG_ERROR, "Fingerprint is empty\n");
> > + ret = AVERROR(EINVAL);
> > + goto end;
> > + }
> 
> Same as above.
> 
> > + /* never exceed the max timeout. */
> > + timeout_us = FFMIN(timeout_us, 30 * 1000 * 1000); /* in us */
> 
> Is this max based on anything?
Winlin: Remove this unnecessary code.
> 
> > + /* Change_cipher_spec(20), alert(21), handshake(22), application_data(23) 
> > */
> > + if (length >= 1)
> > + content_type = (uint8_t)data[0];
> > + if (length >= 13)
> > + size = (uint16_t)(data[11])<<8 | (uint16_t)data[12];
> 
> nit: AV_WB16?
Winlin: Use AV_RB16 to parse the data.
```c
size = (uint16_t)(data[11])<<8 | (uint16_t)data[12];
/* Change to: */
size = AV_RB16(&data[11]);
```

Fixed.
> 
> 
> 
> > + if (length >= 14)
> > + handshake_type = (uint8_t)data[13];
> 
> Casting uint8_t to uint8_t? Above, also.
Winlin: Use AV_RB instead.
> 
> > + av_log(s1, AV_LOG_VERBOSE, "WHIP: DTLS state %s %s, done=%u, arq=%u, 
> > r0=%d, r1=%d, len=%u, cnt=%u, size=%u, hs=%u\n",
> > + "Active", (incoming? "RECV":"SEND"), ctx->dtls_done_for_us, 
> > ctx->dtls_arq_packets, r0, r1, length,
> > + content_type, size, handshake_type);
> 
> Does the use of this trace callback affect perf in any way? I have to assume 
> not, but thought I would check.
Winlin: No, it doesn't negatively impact performance, as the DTLS handshake 
consists of only 4 messages in 2 round trips. It logs important DTLS state 
changes with only a minimal number of messages.
Jack Lau: This callback was already removed, because the previous 
implementation was too tightly coupled; since DTLS has been abstracted away, 
reimplementing it would require more work, which I felt was unnecessary.
> 
> > + /* For ECDSA, we could set the curves list. */
> > + if (SSL_CTX_set1_curves_list(dtls_ctx, "P-521:P-384:P-256") != 1) {
> 
> Where did this list come from?
Winlin: The WebRTC test cases include additional curves, such as "X25519", 
"P-256", "P-384", and "P-521", for evaluation.

```
// webrtc-checkout/src/rtc_base/ssl_adapter_unittest.cc
// Test transfer with TLS Elliptic curves set to "X25519:P-256:P-384:P-521"
TEST_F(SSLAdapterTestTLS_ECDSA, TestTLSEllipticCurves) {
```

It is possible to include these additional curves, as they will be negotiated 
during the DTLS handshake. Note that X25519 is only supported by OpenSSL 
version 1.1.0 and later.

```c
    /* Refer to the test cases regarding these curves in the WebRTC code. */
#if OPENSSL_VERSION_NUMBER >= 0x10100000L /* OpenSSL 1.1.0 */
    const char* curves = "X25519:P-256:P-384:P-521";
#elif OPENSSL_VERSION_NUMBER >= 0x10002000L /* OpenSSL 1.0.2 */
    const char* curves = "P-256:P-384:P-521";
#endif
```

Fixed.
> 
> 
> 
> 
> > + /* Only support SRTP_AES128_CM_SHA1_80, please read ssl/d1_srtp.c */
> > + if (SSL_CTX_set_tlsext_use_srtp(dtls_ctx, "SRTP_AES128_CM_SHA1_80")) {
> 
> What is ssl/d1_srtp.c? It is not from FFmpeg.
Winlin: Refined the comments.

```c
    /**
     * The profile for OpenSSL's SRTP is SRTP_AES128_CM_SHA1_80, see ssl/d1_srtp.c.
     * The profile for FFmpeg's SRTP is SRTP_AES128_CM_HMAC_SHA1_80, see libavformat/srtp.c.
     */
    const char* profiles = "SRTP_AES128_CM_SHA1_80";
```

Fixed.
> 
> 
> 
> > + /* Wait so that the server can process the request and no need ARQ then. 
> > */
> > +#if DTLS_PROCESSING_TIMEOUT > 0
> > + av_usleep(DTLS_PROCESSING_TIMEOUT * 10000);
> > +#endif
> 
> This #if is a no-op.
Winlin: We have improved the ICE and DTLS code, and removed this code as it is 
no longer necessary. Currently, only a handshake_timeout option is utilized, 
with a default value of 5 seconds, to restrict the ICE and DTLS timeout 
duration.
> 
> > +
> > + for (j = 0; j <= DTLS_EAGAIN_RETRIES_MAX && !res_size; j++) {
> 
> Does this not make DTLS_EAGAIN_RETRIES_MAX a misnomer (due to use of <=)?
Winlin: We have improved the ICE and DTLS code, and removed this code as it is 
no longer necessary. Currently, only a handshake_timeout option is utilized, 
with a default value of 5 seconds, to restrict the ICE and DTLS timeout 
duration.
> 
> > + /* Ignore other packets, such as ICE indication, except DTLS. */
> > + if (ret > 0 && (ret < 13 || buf[0] <= 19 || buf[0] >= 64))
> > + continue;
> 
> Magic numbers?
Winlin: Replaced the magic numbers with macros and extracted the check into a 
function, is_dtls_packet.

```c
/**
 * The DTLS content type.
 * See https://tools.ietf.org/html/rfc2246#section-6.2.1
 * change_cipher_spec(20), alert(21), handshake(22), application_data(23)
 */
#define DTLS_CONTENT_TYPE_CHANGE_CIPHER_SPEC 20

/**
 * The DTLS record layer header has a total size of 13 bytes, consisting of
 * ContentType (1 byte), ProtocolVersion (2 bytes), Epoch (2 bytes),
 * SequenceNumber (6 bytes), and Length (2 bytes).
 * See https://datatracker.ietf.org/doc/html/rfc9147#section-4
 */
#define DTLS_RECORD_LAYER_HEADER_LEN 13

/**
 * The DTLS version number, which is 0xfeff for DTLS 1.0, or 0xfefd for DTLS 1.2.
 * See https://datatracker.ietf.org/doc/html/rfc9147#name-the-dtls-record-layer
 */
#define DTLS_VERSION_10 0xfeff
#define DTLS_VERSION_12 0xfefd

/**
 * Whether the packet is a DTLS packet.
 */
static int is_dtls_packet(char *b, int size) {
    uint16_t version = AV_RB16(&b[1]);
    return size > DTLS_RECORD_LAYER_HEADER_LEN &&
        b[0] >= DTLS_CONTENT_TYPE_CHANGE_CIPHER_SPEC &&
        (version == DTLS_VERSION_10 || version == DTLS_VERSION_12);
}
```

Fixed.

> 
> > + if (!ctx->udp_uc) {
> > + av_log(s1, AV_LOG_ERROR, "DTLS: No UDP context\n");
> 
> Kind of odd this would be possible?
Winlin: Eliminated the unused code.
> 
> > + av_log(s1, AV_LOG_INFO, "WHIP: DTLS handshake done=%d, arq=%d, 
> > srtp_material=%luB, cost=%dms\n",
> > + ctx->dtls_done_for_us, ctx->dtls_arq_packets, 
> > sizeof(ctx->dtls_srtp_material),
> > + (int)(av_gettime() - starttime) / 1000);
> 
> Bit verbose (here and other uses of INFO)?
> 
> Why are we logging the cost/time diff?
> 
> Also we do not use %lu, I think.
Winlin: I have changed all log levels from INFO to VERBOSE, except for the 
final one, which gathers the time cost of each step. Since there are numerous 
steps that may consume a significant amount of time, potentially causing a slow 
handshake, it is important to track and optimize them. For instance, the 
following log shows a total cost of 47ms:

```text
WHIP: Muxer state=10, buffer_size=4096, max_packet_size=1184,
elapsed=47ms(init:7,offer:0,answer:7,udp:0,ice:6,dtls:25,srtp:0)
```
> 
> 
> > + /* The SPS/PPS of AVC video */
> > + uint8_t *avc_sps;
> > + int avc_sps_size;
> > + uint8_t *avc_pps;
> > + int avc_pps_size;
> > + /* The size of NALU in ISOM format. */
> > + int avc_nal_length_size;
> > +
> 
> Will get into this in my waffling at the end...
> 
> > + /* The SRTP send context, to encrypt outgoing packets. */
> > + struct SRTPContext srtp_audio_send;
> > + struct SRTPContext srtp_video_send;
> > + struct SRTPContext srtp_rtcp_send;
> > + /* The SRTP receive context, to decrypt incoming packets. */
> > + struct SRTPContext srtp_recv;
> 
> Not typedefed?
> 
> 
> 
> > +static int on_rtp_write_packet(void *opaque, uint8_t *buf, int buf_size);
> 
> Why not reorder the file instead of a same-file proto?
Winlin: Moved the on_rtp_write_packet function earlier in the file so that the 
same-file prototype is no longer needed.

Fixed.
> 
> 
> 
> > + rtc->dtls_ctx.dtls_arq_max = rtc->dtls_arq_max;
> > + rtc->dtls_ctx.dtls_arq_timeout = rtc->dtls_arq_timeout;
> > + rtc->dtls_ctx.pkt_size = rtc->pkt_size;
> 
> Why are these duplicated instead of the struct pointer embedded?
Winlin: Eliminate the dtls_arq_max as it's not used. Introduce a new field 
called mtu in the DTLSContext structure. The mtu is generally equivalent to 
pkt_size, but since pkt_size is utilized not only for DTLS but also for the UDP 
protocol and SRTP packet restrictions, it will be retained in the RTCContext 
structure and passed to the DTLSContext.

```c
typedef struct RTCContext {
    int pkt_size;
} RTCContext;

typedef struct DTLSContext {
    int mtu;
} DTLSContext;

static av_cold int whip_init(AVFormatContext *s)
    RTCContext *rtc = s->priv_data;
    rtc->dtls_ctx.mtu = rtc->pkt_size;
```

Fixed.
> 
> 
> 
> 
> 
> > +static int isom_read_avcc(AVFormatContext *s, uint8_t *extradata, int 
> > extradata_size)
> 
> AS Kieran noted, duplicated code.
> > +/**
> > + * Generate SDP offer according to the codec parameters, DTLS and ICE 
> > information.
> > + * The below is an example of SDP offer:
> 
> This comment and function need some serious work:
> * The example and comment explain exactly nothing about the SDP text being 
> generated. It
> is an opaque blob.
> * It is illegal SDP.
> * We already have SDP facilties you should use.
Winlin: Dup to 
https://github.com/ossrs/ffmpeg-webrtc/discussions/5#discussioncomment-6096202
> 
> 
> > + snprintf(rtc->ice_ufrag_local, sizeof(rtc->ice_ufrag_local), "%08x",
> > + av_get_random_seed());
> > + snprintf(rtc->ice_pwd_local, sizeof(rtc->ice_pwd_local), 
> > "%08x%08x%08x%08x",
> > + av_get_random_seed(), av_get_random_seed(), av_get_random_seed(),
> > + av_get_random_seed());
> 
> Use the PRNG... calling av_get_random_seed 5 times is a bit whack.
> 
> Also, is it limited to hex characters?
Winlin: Dup to 
https://github.com/ossrs/ffmpeg-webrtc/discussions/5#discussioncomment-6096202
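For illustration, the suggested single-PRNG approach could look like the sketch below; a tiny xorshift32 stands in here for FFmpeg's AVLFG, and `gen_hex_token` is a hypothetical helper, not the patch's code:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: derive the ICE ufrag/pwd from one seeded PRNG
 * stream instead of calling av_get_random_seed() five times. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Fill dst with `words` 8-hex-digit groups, e.g. 1 for ufrag, 4 for pwd. */
static void gen_hex_token(char *dst, size_t dst_size, int words, uint32_t *state)
{
    size_t off = 0;
    for (int i = 0; i < words && off + 9 <= dst_size; i++)
        off += snprintf(dst + off, dst_size - off, "%08x", xorshift32(state));
}
```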
> 
> > + rtc->audio_ssrc = av_get_random_seed();
> > + rtc->video_ssrc = av_get_random_seed();
> 
> Same as above.
> 
> > + rtc->audio_payload_type = 111;
> > + rtc->video_payload_type = 106;
> 
> Magic numbers. I assume this is H.264 and Opus? Do not hardcode.
Winlin: Replace by macros.

```c
/* Referring to Chrome's definition of RTP payload types. */
#define RTC_RTP_PAYLOAD_TYPE_H264 106
#define RTC_RTP_PAYLOAD_TYPE_OPUS 111
```
> 
> 
> 
> > +
> > + av_bprintf(&bp, ""
> > + "v=0\r\n"
> > + "o=FFmpeg 4489045141692799359 2 IN IP4 127.0.0.1\r\n"
> 
> Hardcoding 127.0.0.1?
> 
> 4489045141692799359?
Winlin: In the case of ICE-LITE, these fields are not used; instead, they are 
defined as constant values.

```c
/**
 * In the case of ICE-LITE, these fields are not used; instead, they are defined
 * as constant values.
 */
#define RTC_SDP_SESSION_ID "4489045141692799359"
#define RTC_SDP_CREATOR_IP "127.0.0.1"
```

> 
> > + if (rtc->video_par) {
> > + profile = rtc->video_par->profile < 0 ? 0x42 : rtc->video_par->profile;
> > + level = rtc->video_par->level < 0 ? 30 : rtc->video_par->level;
> > + profile_iop = profile & FF_PROFILE_H264_CONSTRAINED;
> 
> Do not hardcode H264.
Winlin: Fixed.
> 
> > + av_bprintf(&bp, ""
> > + "m=video 9 UDP/TLS/RTP/SAVPF %u\r\n"
> > + "c=IN IP4 0.0.0.0\r\n"
> > + "a=ice-ufrag:%s\r\n"
> > + "a=ice-pwd:%s\r\n"
> > + "a=fingerprint:sha-256 %s\r\n"
> > + "a=setup:active\r\n"
> > + "a=mid:1\r\n"
> > + "a=sendonly\r\n"
> > + "a=msid:FFmpeg video\r\n"
> > + "a=rtcp-mux\r\n"
> > + "a=rtcp-rsize\r\n"
> > + "a=rtpmap:%u H264/90000\r\n"
> 
> Same.
> 
> > + "a=fmtp:%u 
> > level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=%02x%02x%02x\r\n"
> 
> Same.
> 
> > + "a=ssrc:%u cname:FFmpeg\r\n"
> > + "a=ssrc:%u msid:FFmpeg video\r\n",
> > + rtc->video_payload_type,
> > + rtc->ice_ufrag_local,
> > + rtc->ice_pwd_local,
> > + rtc->dtls_ctx.dtls_fingerprint,
> > + rtc->video_payload_type,
> > + rtc->video_payload_type,
> > + profile & (~FF_PROFILE_H264_CONSTRAINED),
> 
> Same.
> 
> > + * Exchange SDP offer with WebRTC peer to get the answer.
> > + * The below is an example of SDP answer:
> 
> Same comment as my before: This is an opaque blob. 
> 
> > +/**
> > + * Parses the ICE ufrag, pwd, and candidates from the SDP answer.
> 
> We have SDP facilities already.
Winlin: I have experimented with the SDP functionality in FFmpeg by utilizing 
the av_sdp_create function. I discovered that the SDP generated by 
av_sdp_create is tailored for RTSP, which significantly differs from WebRTC. 
Below is a comparison of the test code and the SDP.

```c
    int i;
    char url[MAX_URL_SIZE];
    char sdp[MAX_SDP_SIZE];
    AVFormatContext sdp_ctx, *ctx_array[1];
    sdp_ctx = *s;
    sdp_ctx.url = url;
    for (i = 0; i < sdp_ctx.nb_streams; i++) {
        /* Exclude the stream's bitrate information to prevent writing a line 
in the SDP like b=AS:96. */
        sdp_ctx.streams[i]->codecpar->bit_rate = 0;
        /* Do not write a sprop-stereo=1 line. */
        if (sdp_ctx.streams[i]->codecpar->codec_id == AV_CODEC_ID_OPUS)
            sdp_ctx.streams[i]->codecpar->ch_layout.nb_channels = 1;
    }
    ff_url_join(url, sizeof(url), "rtp", NULL, "0.0.0.0", 9, NULL);
    ctx_array[0] = &sdp_ctx;
    /* Note: sdp is a stack buffer, so there is nothing to free on error. */
    if (av_sdp_create(ctx_array, 1, sdp, MAX_SDP_SIZE))
        return AVERROR_INVALIDDATA;
    av_log(s, AV_LOG_INFO, "SDP:\n%s\n", sdp);
```

Below is the SDP produced by av_sdp_create for RTSP.

```sdp
v=0
o=- 0 0 IN IP4 127.0.0.1
s=No Name
c=IN IP4 0.0.0.0
t=0 0
a=tool:libavformat 60.5.100
m=video 9 RTP/AVP 96
a=rtpmap:96 H264/90000
a=fmtp:96 packetization-mode=1; sprop-parameter-sets=Z0LAHtkAwCmwEQAAAwABAAADADIPFi5I,aMuDyyA=; profile-level-id=42C01E
m=audio 11 RTP/AVP 97
a=rtpmap:97 opus/48000/2
```

Below is the SDP produced by WHIP muxer for WebRTC.

```sdp
v=0
o=FFmpeg 4489045141692799359 2 IN IP4 127.0.0.1
s=FFmpegPublishSession
t=0 0
a=group:BUNDLE 0 1
a=extmap-allow-mixed
a=msid-semantic: WMS
m=audio 9 UDP/TLS/RTP/SAVPF 111
c=IN IP4 0.0.0.0
a=ice-ufrag:f07a1195
a=ice-pwd:5c4b3fe3a7ffed9262c712fdc77fe232
a=fingerprint:sha-256 CD:EE:CE:45:4C:D7:3C:57:7B:79:38:7F:83:3E:3D:2F:3E:38:92:87:EC:57:4C:6D:21:26:97:B8:FA:58:EC:E1
a=setup:passive
a=mid:0
a=sendonly
a=msid:FFmpeg audio
a=rtcp-mux
a=rtpmap:111 opus/48000/1
a=ssrc:3591919697 cname:FFmpeg
a=ssrc:3591919697 msid:FFmpeg audio
m=video 9 UDP/TLS/RTP/SAVPF 106
c=IN IP4 0.0.0.0
a=ice-ufrag:f07a1195
a=ice-pwd:5c4b3fe3a7ffed9262c712fdc77fe232
a=fingerprint:sha-256 CD:EE:CE:45:4C:D7:3C:57:7B:79:38:7F:83:3E:3D:2F:3E:38:92:87:EC:57:4C:6D:21:26:97:B8:FA:58:EC:E1
a=setup:passive
a=mid:1
a=sendonly
a=msid:FFmpeg video
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:106 H264/90000
a=fmtp:106 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001e
a=ssrc:1300609085 cname:FFmpeg
a=ssrc:1300609085 msid:FFmpeg video
```

The primary distinctions are outlined below:

| - | RTSP | WHIP Muxer | Description |
|---|---|---|---|
| connection | c=IN IP4 0.0.0.0 | c=IN IP4 0.0.0.0 | RTSP uses a global connection; WHIP repeats it for each m line. |
| m line | m=video 9 RTP/AVP 96 | m=video 9 UDP/TLS/RTP/SAVPF 106 | The protocols are different. |
| m line | m=audio 11 RTP/AVP 97 | m=audio 9 UDP/TLS/RTP/SAVPF 111 | RTSP uses port 11, while WHIP ignores the port. |
| ssrc | - | a=ssrc:3591919697 cname:FFmpeg | No SSRC in RTSP. |
| dtls | - | a=setup:passive | No DTLS role in RTSP. |
| stream id | - | a=mid:1 | No stream id in RTSP. |
| direction | - | a=sendonly | No stream direction in RTSP. |
| rtcp | - | a=rtcp-mux | No RTCP parameters in RTSP. |
| ice | - | a=ice-ufrag:f07a1195 | No ICE config in RTSP. |

The difference is significant, and it is highly recommended to improve the SDP 
functionality in the future once this patch has been merged.

Winlin: I attempted to utilize av_sdp_create for SDP creation, but it required 
extensive modification of the existing code, making it difficult to maintain 
due to the significant differences. Therefore, it is recommended to improve 
this approach in the future. See [WHIP: Use av_sdp_create to create 
SDP](https://github.com/winlinvip/ffmpeg-webrtc/commit/0f0259bf39208aca79e90709e4b7ad45a85de835)
 for details.
> 
> 
> 
> > + /* Write 20 bytes header */
> > + avio_wb16(pb, 0x0001); /* STUN binding request */
> > + avio_wb16(pb, 0); /* length */
> > + avio_wb32(pb, STUN_MAGIC_COOKIE); /* magic cookie */
> > + avio_wb32(pb, av_get_random_seed()); /* transaction ID */
> > + avio_wb32(pb, av_get_random_seed()); /* transaction ID */
> > + avio_wb32(pb, av_get_random_seed()); /* transaction ID */
> 
> Use the PRNG.
Winlin: Dup to 
https://github.com/ossrs/ffmpeg-webrtc/discussions/5#discussioncomment-6096173
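For reference, the quoted 20-byte header write can be sketched in a self-contained form, with the 96-bit transaction ID supplied by the caller so that a single PRNG stream can feed it, as the reviewer suggests. `write_stun_header` and the byte-writing helpers are hypothetical names:

```c
#include <stdint.h>

#define STUN_MAGIC_COOKIE    0x2112A442
#define STUN_BINDING_REQUEST 0x0001

static void wb16(uint8_t *p, uint16_t v) { p[0] = v >> 8; p[1] = v & 0xff; }
static void wb32(uint8_t *p, uint32_t v) { wb16(p, v >> 16); wb16(p + 2, v & 0xffff); }

/* Sketch of the 20-byte STUN binding-request header. msg_len is the
 * length of the attributes following the header, per RFC 5389. */
static void write_stun_header(uint8_t buf[20], uint16_t msg_len,
                              const uint32_t tid[3])
{
    wb16(buf,     STUN_BINDING_REQUEST);
    wb16(buf + 2, msg_len);
    wb32(buf + 4, STUN_MAGIC_COOKIE);
    for (int i = 0; i < 3; i++)
        wb32(buf + 8 + 4 * i, tid[i]);   /* 96-bit transaction ID */
}
```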
> > + av_opt_set(rtc->udp_uc->priv_data, "connect", "1", 0);
> > + av_opt_set(rtc->udp_uc->priv_data, "fifo_size", "0", 0);
> > + /* Set the max packet size to the buffer size. */
> > + snprintf(tmp, sizeof(tmp), "%d", rtc->pkt_size);
> > + av_opt_set(rtc->udp_uc->priv_data, "pkt_size", tmp, 0);
> 
> This is as good of a place as any to ask: Do we plan to handle passing
> AVOptions down to subcontexts?
Winlin: Fixed 
> 
> > +static int setup_srtp(AVFormatContext *s)
> 
> > + const char* suite = "AES_CM_128_HMAC_SHA1_80";
> 
> Source?
> 
> Same as before, re: subcontext AVOptions?
Winlin: Dup to 
https://github.com/ossrs/ffmpeg-webrtc/discussions/5#discussioncomment-6096182

Fixed.
> 
> 
> > + /* Ignore if not RTP or RTCP packet. */
> > + if (buf_size < 12 || (buf[0] & 0xC0) != 0x80)
> > + return 0;
> > +
> > + /* Only support audio, video and rtcp. */
> > + is_rtcp = buf[1] >= 192 && buf[1] <= 223;
> 
> Magic numbers.
Winlin: Replace the magic number with a macro for both RTP and RTCP payload 
types.

```c
/**
 * For RTCP, PT is [128, 223] (or without marker [0, 95]). Literally, RTCP starts
 * from 64 not 0, so PT is [192, 223] (or without marker [64, 95]), see "RTCP Control
 * Packet Types (PT)" at
 * https://www.iana.org/assignments/rtp-parameters/rtp-parameters.xhtml#rtp-parameters-4
 *
 * For RTP, the PT is [96, 127], or [224, 255] with marker. See "RTP Payload Types (PT)
 * for standard audio and video encodings" at
 * https://www.iana.org/assignments/rtp-parameters/rtp-parameters.xhtml#rtp-parameters-1
 */
#define RTC_RTCP_PT_START 192
#define RTC_RTCP_PT_END   223
```

Fixed.
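As a sketch, the resulting RTP/RTCP classification could read as below; `media_packet_type` is a hypothetical helper name, not the patch's function:

```c
#include <stdint.h>

#define RTC_RTCP_PT_START 192
#define RTC_RTCP_PT_END   223

/* Returns 1 if the packet looks like RTCP, 0 if RTP-like, -1 if neither.
 * RTP and RTCP share a 2-bit version field (2) in the first byte, and a
 * 12-byte minimum size is assumed here, as in the quoted check. */
static int media_packet_type(const uint8_t *buf, int buf_size)
{
    if (buf_size < 12 || (buf[0] & 0xC0) != 0x80)
        return -1;
    return (buf[1] >= RTC_RTCP_PT_START && buf[1] <= RTC_RTCP_PT_END) ? 1 : 0;
}
```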
> 
> > + /**
> > + * For video, the STAP-A with SPS/PPS should:
> > + * 1. The marker bit should be 0, never be 1.
> > + * 2. The NRI should equal to the first NALU's.
> > + */
> > + if (is_video && buf_size > 12) {
> > + nalu_header = buf[12] & 0x1f;
> > + if (nalu_header == NALU_TYPE_STAP_A) {
> > + /* Reset the marker bit to 0. */
> > + if (buf[1] & 0x80)
> > + buf[1] &= 0x7f;
> > +
> > + /* Reset the NRI to the first NALU's NRI. */
> > + if (buf_size > 15 && (buf[15]&0x60) != (buf[12]&0x60))
> > + buf[12] = (buf[12]&0x80) | (buf[15]&0x60) | (buf[12]&0x1f);
> > + }
> > + }
> 
> This seems hardcoded to H.264.
Winlin: Dup to 
https://github.com/ossrs/ffmpeg-webrtc/discussions/5#discussioncomment-6096157

Fixed.
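A self-contained sketch of the quoted STAP-A normalization, with a hypothetical function name and a fixed 12-byte RTP header offset as in the quoted code:

```c
#include <stdint.h>

#define NALU_TYPE_STAP_A 24

/* For a STAP-A packet: clear the RTP marker bit and copy the NRI bits
 * of the first aggregated NALU into the STAP-A header byte. */
static void fixup_stap_a(uint8_t *buf, int buf_size)
{
    if (buf_size <= 12 || (buf[12] & 0x1f) != NALU_TYPE_STAP_A)
        return;
    /* Reset the marker bit to 0. */
    buf[1] &= 0x7f;
    /* Reset the NRI to the first NALU's NRI. */
    if (buf_size > 15 && (buf[15] & 0x60) != (buf[12] & 0x60))
        buf[12] = (buf[12] & 0x80) | (buf[15] & 0x60) | (buf[12] & 0x1f);
}
```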
> 
> 
> > +/**
> > + * Inserts the SPS/PPS data before each IDR (Instantaneous Decoder 
> > Refresh) frame.
> > + *
> > + * The SPS/PPS is parsed from the extradata. If it's in ISOM format, the 
> > SPS/PPS is
> > + * multiplexed to the data field of the packet. If it's in annexb format, 
> > then the entire
> > + * extradata is set to the data field of the packet.
> 
> Why do we do it different ways instea of always munging it into the same way?
> 
> Also, do we have a BSF for this?
Winlin: It varies depending on the situation. When stream copying, the packet 
is in the MP4/ISOM format; in that case, use the h264_mp4toannexb BSF and 
eliminate this function. However, when encoding, the packet is in the annexb 
format, and we must manually insert the encoder metadata (e.g. SPS/PPS) before 
each IDR frame.

## ISOM

When first transcoding to an MP4 file and then stream copying from it, the 
packet is in the MP4/ISOM format.

```bash
~/git/FFmpeg/ffmpeg_g -re -i ~/git/srs/trunk/doc/source.flv \
    -vcodec libx264 -profile:v baseline -r 25 -g 50 -acodec libopus -ar 48000 -ac 2 \
    -y t.mp4

~/git/FFmpeg/ffmpeg_g -re -i t.mp4 \
    -f rtc "http://localhost:1985/rtc/v1/whip/?app=live&stream=livestream"
```

Utilize the h264_mp4toannexb BSF and remove this function.

```c
const FFOutputFormat ff_rtc_muxer = {
    .p.name             = "rtc",
    /**
     * Avoid using AVFMT_GLOBALHEADER, for annexb format, it's necessary for the
     * encoder to insert metadata headers (e.g., SPS/PPS for H.264) before each
     * IDR frame.
     */
    .p.flags            = AVFMT_NOFILE,
    .check_bitstream    = rtc_check_bitstream,

static int rtc_check_bitstream(AVFormatContext *s, AVStream *st, const AVPacket *pkt) {
    if (st->codecpar->codec_id == AV_CODEC_ID_H264) {
        if (pkt->size >= 5 && AV_RB32(pkt->data) != 0x0000001 &&
                             (AV_RB24(pkt->data) != 0x000001 ||
                              (st->codecpar->extradata_size > 0 &&
                               st->codecpar->extradata[0] == 1)))
            ret = ff_stream_add_bitstream_filter(st, "h264_mp4toannexb", NULL);
```

Fixed by 
https://github.com/winlinvip/ffmpeg-webrtc/commit/6598ffce669a403cc39fb7394854848b513a3683

Keep in mind that if the packet is already in annexb format, this BSF will 
bypass the packet:

```c
static int h264_mp4toannexb_init(AVBSFContext *ctx)
    if (!extra_size                                               ||
        (extra_size >= 3 && AV_RB24(ctx->par_in->extradata) == 1) ||
        (extra_size >= 4 && AV_RB32(ctx->par_in->extradata) == 1)) {
        av_log(ctx, AV_LOG_VERBOSE,
               "The input looks like it is Annex B already\n");

// s->extradata_parsed is 0.
static int h264_mp4toannexb_filter(AVBSFContext *ctx, AVPacket *opkt)
    /* nothing to filter */
    if (!s->extradata_parsed) {
        av_packet_move_ref(opkt, in);
        av_packet_free(&in);
        return 0;
    }
```

In this scenario, we must manually insert the SPS/PPS.

## ANNEXB

When encoding, the packet is in the ANNEXB format.

```bash
~/git/FFmpeg/ffmpeg_g -re -i ~/git/srs/trunk/doc/source.flv \
    -vcodec libx264 -profile:v baseline -r 25 -g 50 -acodec libopus -ar 48000 -ac 2 \
    -f rtc "http://localhost:1985/rtc/v1/whip/?app=live&stream=livestream"
```

Since the h264_mp4toannexb filter only processes the MP4 ISOM format and 
bypasses the annexb format, it is necessary to manually insert encoder metadata 
before each IDR when dealing with annexb format packets. For instance, in the 
case of H.264, we must insert SPS and PPS before the IDR frame.

```c
static int rtc_check_bitstream(AVFormatContext *s, AVStream *st, const AVPacket *pkt) {
    if (st->codecpar->codec_id == AV_CODEC_ID_H264) {
        extradata_isom = st->codecpar->extradata_size > 0 && st->codecpar->extradata[0] == 1;
        if (pkt->size >= 5 && AV_RB32(b) != 0x0000001 && (AV_RB24(b) != 0x000001 || extradata_isom))
            ret = ff_stream_add_bitstream_filter(st, "h264_mp4toannexb", NULL);
        else
            rtc->h264_annexb_insert_sps_pps = 1;
    }

static int rtc_write_packet(AVFormatContext *s, AVPacket *pkt) {
    if (rtc->h264_annexb_insert_sps_pps && st->codecpar->codec_id == AV_CODEC_ID_H264) {
        if ((ret = h264_annexb_insert_sps_pps(s, pkt)) < 0) {

/**
 * Since the h264_mp4toannexb filter only processes the MP4 ISOM format and bypasses
 * the annexb format, it is necessary to manually insert encoder metadata before each
 * IDR when dealing with annexb format packets. For instance, in the case of H.264,
 * we must insert SPS and PPS before the IDR frame.
 */
static int h264_annexb_insert_sps_pps(AVFormatContext *s, AVPacket *pkt)
```

See 
https://github.com/ossrs/ffmpeg-webrtc/pull/1/commits/981a09876c07986612a94617751dd56bfd95089e

Fixed.
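The start-code probe underlying this bitstream check can be sketched in isolation; `is_annexb` is a hypothetical helper, not a function from the patch:

```c
#include <stdint.h>

/* A packet is treated as Annex B when it begins with a 3- or 4-byte
 * start code (00 00 01 or 00 00 00 01); otherwise it is assumed to
 * carry length-prefixed (ISOM) NALUs. */
static int is_annexb(const uint8_t *data, int size)
{
    if (size >= 4 && data[0] == 0 && data[1] == 0 && data[2] == 0 && data[3] == 1)
        return 1;
    if (size >= 3 && data[0] == 0 && data[1] == 0 && data[2] == 1)
        return 1;
    return 0;
}
```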
> 
> > + /* For audio OPUS stream, correct the timestamp. */
> > + if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
> > + pkt->dts = pkt->pts = rtc->audio_jitter_base;
> > + rtc->audio_jitter_base += 960;
> > + }
> 
> 1. Do not hardcode opus.
> 2. 960 is not always true for opus, it just happens to be the libopus default.
Winlin: Dup to 
https://github.com/ossrs/ffmpeg-webrtc/discussions/4#discussioncomment-6096129

Fixed.
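A minimal sketch of the suggested fix, deriving the per-packet timestamp increment from the frame duration and sample rate instead of hardcoding 960 (`samples_per_frame` is a hypothetical helper):

```c
#include <stdint.h>

/* 960 is only the 20 ms libopus default at 48 kHz; compute the RTP
 * timestamp increment from the actual frame duration instead. */
static int samples_per_frame(int sample_rate, int frame_duration_us)
{
    return (int)((int64_t)sample_rate * frame_duration_us / 1000000);
}
```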
> 
> > + ret = insert_sps_pps_packet(s, pkt);
> > + if (ret < 0) {
> > + av_log(s, AV_LOG_ERROR, "Failed to insert SPS/PPS packet\n");
> > + return ret;
> > + }
> 
> Do not hardcode H264.
Winlin: Refine the log message to:

```c
        av_log(rtc, AV_LOG_ERROR, "WHIP: Failed to insert codec metadata packet\n");
```

Fixed.
> 
> > +static const AVClass rtc_muxer_class = {
> > + .class_name = "WebRTC muxer",
> 
> WHIP or WebRTC?
Winlin: Rename to `WHIP muxer`

```c
static const AVClass rtc_muxer_class = {
    .class_name = "WHIP muxer",

const FFOutputFormat ff_rtc_muxer = {
    .p.long_name        = NULL_IF_CONFIG_SMALL("WHIP (WebRTC-HTTP ingestion protocol) muxer"),
```

Fixed.
> 
> [...]
> 
> OK. So now on to my waffling.
> 
> This patch, to me, seems almost like a proof of concept rather than a fully 
> fleshed out
> (read: pushable) patch. I have few concerns:
> 
> 1. If it is pushed in such a state, lacking so many things, my hunch, based 
> on experience,
> is that it will stay that way: incomplete, just being maintained, rather than 
> finished.
> Users will rightfully ask why it is like this, and will ask for it to be 
> completed, but
> I expect nobody will. But there will be a blog post about how "FFmpeg has 
> WebRTC now",
> like we saw with IPFS.
> 2. It is monolithic - WHIP, DTLS, ICE, STUN, SRTP, RTP, and SDP should really 
> all be separate
> layers. Typing that out really does make one realize WebRTC suffers from deep 
> IETF
> committee-ness...
> 3. The codec parts should probably be separated out like they are for RTP.
> 4. If our SDP facilities are not good enough for WHIP, they should be 
> improved so that they
> are.
Winlin: This patch goes beyond being a mere prototype for proof of concept; it 
is a fully functional patch. Since WHIP is significantly simpler and only a 
subset of WebRTC, there is no need to implement the entire WebRTC stack, making 
this patch truly workable.

At present, we only have support for OpenSSL, but adding support for other TLS 
libraries like GnuTLS is not difficult. To maintain the simplicity of this 
patch, we suggest incorporating this additional support in the future.

I believe that extracting WHIP, DTLS, ICE, and STUN into separate layers may 
not be necessary, as it represents a complete WebRTC stack. WHIP involves a 
single HTTP request and response, so creating an additional API layer is not 
required. Similarly, for ICE and STUN, we utilize ICE-LITE mode instead of 
ICE-FULL mode, which consists of a straightforward UDP binding request and 
response, making an extra API layer redundant. However, for DTLS, it might be 
beneficial to introduce an API layer. Regarding SRTP and RTP, FFmpeg already 
provides an API layer, which only needs some refinement.

The issue with the SDP is somewhat complicated, as the WHIP's SDP is quite 
straightforward, requiring only the generation of a single SDP. However, if we 
want to utilize ffmpeg's SDP API, it becomes more complex and necessitates code 
modifications to function properly. I suggest addressing this in a future 
update rather than in the current patch.

Currently, we only support H.264 and OPUS codecs, but we may need to add AV1 
support in the future. Incorporating a new codec is straightforward; it simply 
requires updating the SDP and codec metadata, such as the H.264's SPS/PPS.

Jack Lau: Since I refactored the DTLS implementation and merged it into 
tls_openssl.c, we can easily add support for other TLS libraries.
> 
> - Derek



_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
