On Tue, Mar 08, 2016 at 06:21:12PM +0100, Gerion Entrup wrote:
> Hello,
Hi,

> my own ideas seems not to be suitable for GSoC, so I looked again on the
> ideas page, because I have high interest to do something for FFmpeg this
> summer.
>
> The project, that I find most interesting, unfortunately is a unmentored
> one, the subtitle support. Is someone willing to mentor this?

I added this task for a previous OPW (and maybe GSoC, I can't remember).
I'm unfortunately not available for mentoring (it takes too much time,
energy and responsibility), though I can provide standard help as a
developer.

The main issue with this task is that it involves API redesign, which is
often not a good idea for a GSoC task. That said, a bunch of core
limitations have been solved in the past, so it's starting to be
comfortable to work on top of the current stack. I'm summarizing the
current state at the end of this mail, which can be useful for any
potential mentor and eventual student.

> On the ideas page the mentioned subtitle for the qualification task is
> Spruce subtitle. It seems, it is already supported, so I would try to
> implement the corepart of usf. I know it is not widely used, but very
> powerful with similar features as SSA, if I get it right. Do you think,
> that is suitable?

Spruce was indeed added during the last OPW as a qualification task. USF
is more painful, but basic support could indeed be a potential
qualification task. You might be able to figure out something by playing
with the ff_smil_* functions for the demuxing part.

So basically you would have to:
- write a USF demuxer which extracts the timing and text (with its
  markup) of every event and puts them into AVPackets
- introduce a USF codec and write a decoder that transforms the XML-like
  markup into ASS markup (see below)

Again, I'm not a mentor, so you need confirmation from someone else.

> And then another question. You mentioned as ultimate goal the libavfilter
> integration. If I get it right, ATM no rendering is implemented, and
> libavfilter would allow an (automatic) rendering from SSA to e.g. dvdsub.
> Would the rendering itself part of the project (because this is very
> extensive, I think)?

So, yeah, currently the subtitles are decoded into an AVSubtitle
structure, which holds one or several AVSubtitleRect
(AVSubtitle.rects[N]). For graphic subtitles, each rectangle contains a
paletted buffer along with its position, size, etc. For text subtitles,
the ass field contains the text in ASS markup: we consider the ASS markup
to be the best (or least bad) superset of the styling features found in
every other subtitle format, so it is used as the "decoded" form for all
text subtitles. For example, the SubRip decoder (SubRip being the
"codec", that is, the markup you find in SRT files) will transform
"<i>foo</i>" into "{\i1}foo{\i0}" (a small sketch of such a conversion
follows the list below).

So far so good. Unfortunately, this is not sufficient, because the
AVSubtitle* structs are old and inconvenient for several reasons:

- they are allocated on the stack by the users, so we can't extend them
  (add fields) without breaking the ABI (= angry users)
- they are defined in libavcodec, and we do not want libavfilter to
  depend on libavcodec for a core feature (we have a few filters
  depending on it, but that's optional); libavutil, which already
  contains AVFrame, is a much better place for this
- the graphic subtitles are kind of limited (palette only; they can't
  hold YUV or RGB32 pixel formats for instance)
- the handling of timing is inconsistent: pts is in AV_TIME_BASE units,
  while the start/end display times are relative and in milliseconds
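To illustrate the markup conversion mentioned above, here is a minimal,
self-contained sketch of the kind of translation a SubRip (or USF)
decoder performs. None of this is FFmpeg API: a real decoder would handle
nesting, attributes and escaping, and would build its output with
libavutil's AVBPrint helpers rather than a fixed-size buffer.

    #include <stdio.h>
    #include <string.h>

    /* Translate a tiny subset of XML-like markup into ASS override
     * tags, e.g. "<i>foo</i>" -> "{\i1}foo{\i0}". */
    static void markup_to_ass(const char *in, char *out, size_t out_size)
    {
        static const struct { const char *tag, *ass; } map[] = {
            { "<i>", "{\\i1}" }, { "</i>", "{\\i0}" },
            { "<b>", "{\\b1}" }, { "</b>", "{\\b0}" },
        };
        size_t len = 0;

        while (*in && len + 1 < out_size) {
            int matched = 0;
            for (size_t i = 0; i < sizeof(map) / sizeof(*map); i++) {
                size_t tag_len = strlen(map[i].tag);
                if (strncmp(in, map[i].tag, tag_len))
                    continue;
                size_t ass_len = strlen(map[i].ass);
                if (len + ass_len >= out_size) { /* out of room */
                    out[len] = '\0';
                    return;
                }
                memcpy(out + len, map[i].ass, ass_len);
                len += ass_len;
                in  += tag_len;
                matched = 1;
                break;
            }
            if (!matched)            /* plain character, copy as-is */
                out[len++] = *in++;
        }
        out[len] = '\0';
    }

    int main(void)
    {
        char buf[256];
        markup_to_ass("<i>foo</i> and <b>bar</b>", buf, sizeof(buf));
        printf("%s\n", buf); /* {\i1}foo{\i0} and {\b1}bar{\b0} */
        return 0;
    }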
When these issues are sorted out, we can finally work on the integration
within libavfilter, which is yet another topic where other developers
might want to comment. Typically, I'm not sure what the state of dealing
with the sparse nature of subtitle streams is. Nicolas may know :)

Anyway, there are multiple ways of dealing with the previously mentioned
issues. The first one is to create an AVSubtitle2 or something in
libavutil, copying most of the current AVSubtitle layout but making sure
the user allocates it with av_subtitle_alloc() or whatever, so that we
can add fields and extend it (mostly) at will.

The second one, which I'm currently pondering, is to try to hold the
subtitle data in the existing AVFrame structure. We would for example
have frame->extended_data[N] (currently used by audio frames to hold the
channels) point to instances of a newly defined rectangle structure.
Having the subtitles in AVFrame might greatly simplify the future
integration within libavfilter, since frames are already how audio and
video are passed around there. This needs careful thinking, but it might
be doable.
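To make that second idea a bit more concrete, here is a rough sketch of
what such a rectangle structure could look like. This is purely
hypothetical: neither SubtitleRect nor any of its fields exist in FFmpeg,
and the actual layout would be part of the discussion.

    #include <stdint.h>

    /*
     * Hypothetical only -- nothing here is existing FFmpeg API. A newly
     * defined rectangle structure that frame->extended_data[n] could
     * point to, mirroring how audio frames expose one channel per entry.
     */
    typedef struct SubtitleRect {
        int x, y;            /* position of the rectangle on the video  */
        int w, h;            /* size of the rectangle                   */
        uint8_t *data[4];    /* pixel planes: no longer palette-only,   */
        int linesize[4];     /* so YUV or RGB32 bitmaps become possible */
        char *ass;           /* ASS markup, for text subtitles          */
    } SubtitleRect;

Since the frame would be allocated with av_frame_alloc() rather than on
the user's stack, fields could be added later without breaking the ABI,
and the timing could be carried consistently by frame->pts alone.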
But again, these are just ideas, which need to be discussed and
experimented with. I don't know if it's a good idea for a GSoC, and I
don't know who would be up for mentoring. It's nice to finally see some
interest in this topic, though.

> regards,

Regards,

> Gerion

--
Clément B.