Thank you for the suggestion.. It took me a while to get it but I think it looks much better now.
Please let me know what you think! Cheers, On Sat, Aug 15, 2015 at 11:24 AM, Clément Bœsch <u...@pkh.me> wrote: > On Sat, Aug 08, 2015 at 09:24:03PM -0700, Yayoi Ukai wrote: >> On Sat, Aug 8, 2015 at 8:23 AM, Clément Bœsch <u...@pkh.me> wrote: >> > On Fri, Aug 07, 2015 at 11:03:29PM -0700, Yayoi wrote: >> >> --- >> >> libavcodec/samidec.c | 59 >> >> +++++++++++++++++++++++++++++-------------------- >> >> tests/ref/fate/sub-sami | 18 +++++++-------- >> >> 2 files changed, 44 insertions(+), 33 deletions(-) >> >> >> >> diff --git a/libavcodec/samidec.c b/libavcodec/samidec.c >> >> index 47850e2..41826a7 100644 >> >> --- a/libavcodec/samidec.c >> >> +++ b/libavcodec/samidec.c >> >> @@ -27,6 +27,7 @@ >> >> #include "ass.h" >> >> #include "libavutil/avstring.h" >> >> #include "libavutil/bprint.h" >> >> +#include "htmlsubtitles.h" >> >> >> >> typedef struct { >> >> AVBPrint source; >> >> @@ -41,11 +42,12 @@ static int sami_paragraph_to_ass(AVCodecContext >> >> *avctx, const char *src) >> >> char *tag = NULL; >> >> char *dupsrc = av_strdup(src); >> >> char *p = dupsrc; >> >> + char *pcopy = NULL; >> >> + int index = 0; >> >> + int second_paragraph = 0; >> >> >> >> - av_bprint_clear(&sami->content); >> >> for (;;) { >> >> char *saveptr = NULL; >> >> - int prev_chr_is_space = 0; >> >> AVBPrint *dst = &sami->content; >> >> >> >> /* parse & extract paragraph tag */ >> >> @@ -77,37 +79,46 @@ static int sami_paragraph_to_ass(AVCodecContext >> >> *avctx, const char *src) >> >> goto end; >> >> } >> >> >> > >> >> - /* extract the text, stripping most of the tags */ >> >> + /* check for the second paragrph */ >> > >> > Why change the comment? What does "check" mean here? What is the "second >> > paragraph"? >> >> >> I answer it below with the other questions you have because you are >> basically asking the same things. >> And I will document better too. >> >> > >> >> + pcopy = av_strdup(p); >> >> while (*p) { >> >> if (*p == '<') { >> >> - if (!av_strncasecmp(p, "<P", 2) && (p[2] == '>' || >> >> av_isspace(p[2]))) >> >> + if (!av_strncasecmp(p, "<p", 2) && (p[2] == '>' || >> >> av_isspace(p[2]))) { >> >> + second_paragraph = 1; >> >> break; >> >> - if (!av_strncasecmp(p, "<BR", 3)) >> >> - av_bprintf(dst, "\\N"); >> >> - p++; >> >> - while (*p && *p != '>') >> >> - p++; >> >> - if (!*p) >> >> - break; >> >> - if (*p == '>') >> >> - p++; >> >> - continue; >> >> + } >> >> } >> >> - if (!av_isspace(*p)) >> >> - av_bprint_chars(dst, *p, 1); >> >> - else if (!prev_chr_is_space) >> >> - av_bprint_chars(dst, ' ', 1); >> >> - prev_chr_is_space = av_isspace(*p); >> >> p++; >> >> + index++; >> >> + } >> >> + p = p - index; >> >> + if (second_paragraph) { >> >> + p[index] = 0; >> >> } >> >> - } >> >> >> >> - av_bprint_clear(&sami->full); >> >> - if (sami->source.len) >> >> - av_bprintf(&sami->full, "{\\i1}%s{\\i0}\\N", sami->source.str); >> >> - av_bprintf(&sami->full, "%s", sami->content.str); >> >> + ff_htmlmarkup_to_ass(avctx, dst, p); >> >> + >> >> + /* add the source if there are any. */ >> >> + av_bprint_clear(&sami->full); >> >> + if (sami->source.len) { >> >> + av_bprintf(&sami->full, "{\\i1}%s{\\i0}\\N", >> >> sami->source.str); >> >> + av_bprintf(&sami->full, "%s", sami->content.str); >> >> + if (second_paragraph) { >> >> + second_paragraph = 0; >> >> + p = pcopy; >> >> + p += index; >> >> + index = 0; >> >> + continue; >> >> + } >> >> + } else { >> >> + av_bprintf(&sami->full, "%s", sami->content.str); >> >> + } >> >> + av_bprint_clear(&sami->content); >> >> + >> >> + } >> >> >> > >> > This looks clumsy at best: you are finalizing the subtitle event inside >> > the paragraph loop when it should be outside. It also seems there is a >> > duplicating "second paragraph" logic even though the loop is supposed to >> > handle one paragraph at a time. >> >> I know. It is clumsy.. I explain it below as well. >> >> > >> > If you are uncomfortable with the current logic or believe it's badly >> > designed for the logic you're trying to mangle, feel free to rewrite it; >> > it's better than trying to inhibit the old behaviour with hacks. >> >> It's not that your code was badly designed to begin with. I feel like >> it is the nature >> of this format. >> >> So let me explain what was going on your old code and what I had changed. >> Here is the example: (This one is from current fate sample.) >> >> <SYNC Start=73000> >> <P Class=ENUSCC ID=Source>End of: >> <P Class=ENUSCC>President John F. Kennedy Speech >> >> So this is one subtitle event and there are two paragraphs. One >> paragraph has a new source and >> the other paragraph has its content. >> Since it is decided that this code uses html >> parser(ff_htmlmarkup_to_ass) and use it to parse the contents of the >> paragraph. Each paragraph content needs to go to to html parser, >> regardless whether it is source or subtitle content. >> >> So I need to tell html_parser, ff_htmlmarkup_to_ass that where the >> destination is.. (whether it is source or content). >> I can't no longer specify the destination which I was able to do in >> your old code. I mean maybe I can change the function header and add >> additional flag to tell our parser where to put in the dst but I don't >> want to because it may require to change srtdec and I do not want to >> do that.. >> >> Anyways, so that's the basic idea. I honestly think it is good that >> subtitle event handle is also in loop because one loop iteration per >> one subtitle event and subtitle event does not guarantee that it >> contains only one paragraph to begin with. It was a possibility either >> adding additional loop or having goto to honor your original logic but >> after I noticed that subtitle event contains more than one paragraph, >> I gave up on the idea to keep your logic. The comment says "check >> second paragraph" but it should be able to handle multiple paragraphs >> more than 2 in one subtitle event. (I definitely tested while I was >> working on this part.. I can modify sample or attach test file if that >> makes you feel happier. And I make a better comment then.) >> >> So I want to leave it as it is if you do not see major flaws or you do >> not have very very strong opinion about how it needs to be modified. >> Well, given the information, if you have some ideas, you can tell me >> too. Besides, I can't cross out the possibility if you or me or >> someone wants to add more enhancement (maybe one day you decided to >> add support for img tag? who knows..) , and it may need to change code >> structure again. And as it is now, it is still enhancement that works. >> ;-). >> > > I think it's breakable one way or another without that much effort. > > Here is a suggestion based on the original code: > > - make sure you have 5 bprint buffers: source_sami, content_sami, > source_encoded, content_encoded, full (you already have 3 of them in > the context) > > - remove the current markup processing from the loop and make sure the > loop just adds the content of each paragraph where it belongs (that is, > in either source_sami or content_sami). The code already does that, so > there is probably no change to do (code already concats in each bprint > buffer if more than 2 paragraphs happen). > > - after the loop ends, you now have the SAMI data of your packet > split into content_sami and eventually in source_sami, without the SAMI > crap (such as <P ...>). So what you need to do is just call the markup > on content_sami (and eventually source_sami if there is data) to create > content_encoded (and eventually source_encoded). > > - the existing code will then finish things up by concatenating source & > content as it is already doing > > This is probably way more solid than trying to interlace another partial > sub-loop in a loop of the same purpose like you proposed. > > If you do not want to do that, I will have to push your version and do the > above or similar before pushing the whole set. > > [...] > > -- > Clément B. > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel > _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel