On Tuesday, 8 March 2016 20:42:39 CET Clément Bœsch wrote:
> On Tue, Mar 08, 2016 at 06:21:12PM +0100, Gerion Entrup wrote:
> > Hello,
> 
> Hi,
> 
> > My own ideas don't seem suitable for GSoC, so I looked at the ideas page
> > again, because I have a strong interest in doing something for FFmpeg
> > this summer.
> > 
> > The project I find most interesting is unfortunately an unmentored one:
> > subtitle support. Is someone willing to mentor this?
> 
> I added this task for a previous OPW (and maybe GSoC, I can't remember).
> I'm unfortunately not available for mentoring (too much time, energy and
> responsibility). I can, however, provide standard help as a developer.
> 
> The main issue with this task is that it involves API redesign, which is
> often not a good idea for a GSoC task.
> 
> That said, a bunch of core limitations have been solved in the past, so
> it's starting to be comfortable to work on top of the current stack.
> 
> I'm summarizing the current state at the end of this mail, which may be
> useful for any potential mentor and, eventually, a student.
> 
> > The subtitle format mentioned on the ideas page for the qualification
> > task is Spruce subtitle. It seems to be supported already, so I would
> > try to implement the core part of USF instead. I know it is not widely
> > used, but it is very powerful, with features similar to SSA, if I
> > understand it correctly. Do you think that is suitable?
> 
> Spruce was indeed added during the last OPW as a qualification task. USF
> is more painful, but basic support could indeed be a potential
> qualification task. You might be able to figure out something by playing
> with the ff_smil_* functions for the demuxing part.
> 
> So basically you would have to:
> 
> - write a USF demuxer which extracts the timing and text (with its
>   markup) of every event and puts them into an AVPacket
> 
> - introduce a USF codec and write a decoder that transforms the XML-like
>   markup into ASS markup (see below)

I've implemented such a demuxer and decoder, based on SAMI (see my other
mail). But XML parsing with the built-in tools is a real pain and hard to
extend later. If the GSoC project comes about, please let me change this
(and maybe the SAMI code as well) into code based on an XML library.
Header parsing should then be doable too.
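To make the markup-conversion step concrete, here is a minimal standalone
sketch of the idea, using only the C standard library (this is not my
actual patch and not the real USF syntax; the tag set and the mapping are
simplified assumptions, and a real decoder would build the string with
AVBPrint and handle attributes and nesting):

    #include <stdio.h>
    #include <string.h>

    /* Toy illustration: map a few inline tags to ASS override codes and
     * copy everything else through unchanged. */
    static void usf_markup_to_ass(const char *in, char *out, size_t outsize)
    {
        static const struct { const char *tag; const char *ass; } map[] = {
            { "<b>", "{\\b1}" }, { "</b>", "{\\b0}" },
            { "<i>", "{\\i1}" }, { "</i>", "{\\i0}" },
            { "<u>", "{\\u1}" }, { "</u>", "{\\u0}" },
        };
        size_t len = 0;

        /* keep at least 8 spare bytes so one override tag always fits */
        while (*in && len + 8 < outsize) {
            int matched = 0;
            for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
                size_t tlen = strlen(map[i].tag);
                if (!strncmp(in, map[i].tag, tlen)) {
                    len += snprintf(out + len, outsize - len, "%s", map[i].ass);
                    in  += tlen;
                    matched = 1;
                    break;
                }
            }
            if (!matched)
                out[len++] = *in++;
        }
        out[len] = 0;
    }

    int main(void)
    {
        char buf[256];
        usf_markup_to_ass("A <i>USF</i> <b>event</b>", buf, sizeof(buf));
        printf("%s\n", buf);   /* prints: A {\i1}USF{\i0} {\b1}event{\b0} */
        return 0;
    }

The real complexity is in the attributes (fonts, colors, positioning) and
in nested elements, which is exactly where a proper XML library would help.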
> Again, I'm not a mentor, so you need confirmation from someone else.
> 
> > And another question: you mentioned the libavfilter integration as the
> > ultimate goal. If I understand correctly, no rendering is implemented at
> > the moment, and libavfilter would allow (automatic) rendering from SSA
> > to e.g. dvdsub. Would the rendering itself be part of the project
> > (because that is very extensive, I think)?
> 
> So, yeah, currently the subtitles are decoded into an AVSubtitle
> structure, which holds one or several AVSubtitleRect (AVSubtitle.rects[N]).
> 
> For graphic subtitles, each rectangle contains a paletted buffer and its
> position, size, ...
> 
> For text subtitles, the ass field contains the text in ASS markup: indeed,
> we consider the ASS markup to be the best (or least bad) superset of
> almost every style the other subtitle formats support, so it is used as
> the "decoded" form for all text subtitles. For example, the SubRip decoder
> (the "codec", or markup, you find in SRT files) will transform
> "<i>foo</i>" into "{\i1}foo{\i0}".
> 
> So far so good. Unfortunately, this is not sufficient, because the
> AVSubtitle* structs are old and inconvenient for several reasons:
> 
> - they are allocated on the stack by the users, so we can't extend them
>   (add fields) without breaking the ABI (= angry users).
> 
> - they are defined in libavcodec, and we do not want libavfilter to depend
>   on libavcodec for a core feature (we have a few filters depending on it,
>   but that's optional). As such, libavutil, which already contains
>   AVFrame, is a much better place for this.
> 
> - the graphic subtitles are rather limited (palette only; they can't hold
>   YUV or RGB32 pixel formats, for instance).
> 
> - the handling of the timing is inconsistent: pts is in AV_TIME_BASE,
>   while the start/end display times are relative and in ms.
> 
> When these issues are sorted out, we can finally work on the integration
> within libavfilter, which is yet another topic where other developers
> might want to comment. Typically, I'm not sure what the state of dealing
> with the sparse property of subtitles is. Nicolas may know :)
> 
> Anyway, there are multiple ways of dealing with the previously mentioned
> issues.
> 
> The first one is to create an AVSubtitle2 or something in libavutil,
> copying most of the current AVSubtitle layout but making sure the user
> allocates it with av_subtitle_alloc() or whatever, so we can add fields
> and extend it (mostly) at will.
> 
> The second one, which I'm currently wondering about these days, is to try
> to hold the subtitle data in the existing AVFrame structure. We would, for
> example, have frame->extended_data[N] (currently used by audio frames to
> hold the channels) point to instances of a newly defined rectangle
> structure. Having the subtitles in AVFrame might simplify the future
> integration within libavfilter a lot, since AVFrame is already supported
> for audio and video. This needs careful thinking, but it might be doable.
> 
> But again, these are ideas which need to be discussed and experimented
> with. I don't know if it's a good idea for GSoC, and I don't know who
> would be up for mentoring.
> 
> It's nice to finally see some interest in this topic, though.
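One point I want to make sure I understand is the timing inconsistency you
list. If I read avcodec.h correctly, getting absolute display times today
means mixing the two units by hand, roughly like this (my own reading of
the current API, not code from the tree):

    #include <libavcodec/avcodec.h>
    #include <libavutil/mathematics.h>

    /* AVSubtitle.pts is in AV_TIME_BASE units, while start/end_display_time
     * are in milliseconds relative to pts, so both have to be combined to
     * get absolute times in a single time base. */
    static void subtitle_abs_times(const AVSubtitle *sub,
                                   int64_t *start, int64_t *end)
    {
        *start = sub->pts + av_rescale_q(sub->start_display_time,
                                         (AVRational){1, 1000}, AV_TIME_BASE_Q);
        *end   = sub->pts + av_rescale_q(sub->end_display_time,
                                         (AVRational){1, 1000}, AV_TIME_BASE_Q);
    }

I guess that with rectangles held in AVFrame (or in an allocated
AVSubtitle2) this could collapse into a single pts/duration pair in one
time base.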
> > regards,
> 
> Regards,
> 
> > Gerion

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel