Dear Jing, 

I was confused when I started out with regular expressions. Many thanks for 
the kind and detailed explanation. 
After reading more on perl regex, I think I have a better grasp of the 
greedy/non-greedy concept now. 
Your code also worked well for my task. 

Regards, 
Viet-Duc

-----------------------Original Message-----------------------
From: Jing Yu <logus...@googlemail.com>
To: Viet-Duc Le <leviet...@kaist.ac.kr>
Sent date: 2014-09-17 12:20:29 GMT +0900 (Asia/Seoul)
Subject: Re: Regular expression: option match after a greedy/non-greedy match

Hi Viet-Duc Le,
On 17 Sep 2014, at 10:23, Viet-Duc Le <leviet...@kaist.ac.kr> wrote:
Greetings from S. Korea! 

I am parsing the output of ffmpeg with Perl. In particular, I want to print 
only these lines of the output and capture the resolution, i.e. 1280x720. 
....
Stream #0:0: Video: h264 (High), yuv420p, 1280x720, SAR 1:1 DAR 16:9, 23.98 
fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)
Stream #0:1(jpn): Audio: ac3, 48000 Hz, stereo, fltp, 192 kb/s (default)
Stream #0:2(eng): Subtitle: ass (default) 
.....
My code is as follows: 
# INFO is pipe to ffmpeg 
# Here, the <print "$1 $2 $3 $4\n"> is for debugging .  
while ( <INFO> ) { 
        if ( <regular expression>  ) { 
            print "$1 $2 $3 $4\n"; 
        }
}
Desired output: 
-> Video 1280 720 
     Audio 
     Subtitle 

Regarding the <regular expression>: 
1. /Stream #\d:\d.*(Video|Audio|Subtitle).*(\d+)x(\d+)/ (greedy)
-> Video 0 720 
Q: Why does $2 give 0? I remember that .* matches backward, starting from the 
end of the string. Then the output should be "Video 1280 720".

That '0' is the last digit of 1280: the second '.*' has consumed everything up 
to and including '128'. Under the hood, .* first runs to the end of the target 
string and then backtracks, giving back one character at a time, until the 
rest of the regex can match. As soon as the whole regex is satisfied it stops 
backtracking, even though backtracking further would also satisfy the regex.
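
The difference between patterns (1) and (2) can be checked with a small 
script. This is only a sketch: the sample line is copied from the ffmpeg 
output quoted above, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Sample line copied from the ffmpeg output quoted in the question.
my $line = 'Stream #0:0: Video: h264 (High), yuv420p, 1280x720, SAR 1:1 DAR 16:9, '
         . '23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)';

# Greedy: the second .* grabs the rest of the line, then gives characters
# back one at a time until (\d+)x(\d+) can match. Coming from the right,
# the first place that works is "0x720".
my @greedy = $line =~ /Stream #\d:\d.*(Video|Audio|Subtitle).*(\d+)x(\d+)/;

# Non-greedy: .*? grows one character at a time from the left, so the first
# place (\d+)x(\d+) can match is the start of "1280x720".
my @lazy = $line =~ /Stream #\d:\d.*(Video|Audio|Subtitle).*?(\d+)x(\d+)/;

print "greedy: @greedy\n";   # greedy: Video 0 720
print "lazy:   @lazy\n";     # lazy:   Video 1280 720
```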

2. /Stream #\d:\d.*(Video|Audio|Subtitle).*?(\d+)x(\d+)/ (non greedy) 
-> Video 1280 720 
Q: I can understand this, but again I think (1) should work too. 

3. /Stream #\d:\d.*(Video|Audio|Subtitle).*?(?:(\d+)x(\d+))?/ ( non-capturing 
optional group ) 
-> Video 
    Audio 
    Subtitle 
Q: It seems that the resolution part is ignored because it is optional. 
Otherwise, the output would contain "Video" only, as in (1) and (2). How can I 
circumvent this? 

The ?: only keeps the outer group from capturing; the inner (\d+) groups still 
capture into $2 and $3, so it is not the cause and you can keep it or drop it. 
The trailing ? already makes the group optional; it is equivalent to {0,1}. 
The big problem is the middle .*?. Since the last part is optional, .*? 
matches the fewest characters possible, which is none, and the optional group 
then fails to find a digit at ':' and matches nothing.
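
To see where pattern (3) actually stops, one can print the matched text. The 
following is a rough sketch rather than anything from the original code: the 
sample line is shortened, and $matched, $type, $w and $h are illustrative 
names.

```perl
use strict;
use warnings;

# Shortened copy of the Video line from the question.
my $line = 'Stream #0:0: Video: h264 (High), yuv420p, 1280x720, SAR 1:1 DAR 16:9';

# Capture groups nested inside (?:...) still capture whenever the outer
# group takes part in the match.
my @inner = '1280x720' =~ /(?:(\d+)x(\d+))?/;
print "inner groups: @inner\n";    # inner groups: 1280 720

# Pattern (3): .*? matches zero characters, the optional group cannot match
# at ":" and is skipped, so the overall match ends right after "Video".
my $matched = '';
my ($type, $w, $h);
if ($line =~ /Stream #\d:\d.*(Video|Audio|Subtitle).*?(?:(\d+)x(\d+))?/) {
    ($type, $w, $h) = ($1, $2, $3);
    $matched = $&;
}
print "matched text: '$matched'\n";                    # 'Stream #0:0: Video'
print "width: ", (defined $w ? $w : 'undef'), "\n";    # width: undef
```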

4. /Stream #\d:\d.*(Video|Audio|Subtitle).+?(?:(\d+)x(\d+))?.*?$/ 
-> Video 
     Audio 
     Subtitle 
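Q: I tried to match things after the resolution, hoping that it would be 
captured. 

Again, the ?: is not what keeps $2 and $3 empty. .+? in the middle is better, 
since it now matches ':', but the optional group is then tried at the 
following space, matches nothing, and .*?$ takes the rest of the line.
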
5. /Stream #\d:\d.*(Video|Audio|Subtitle).+?(?:(\d+)x(\d+))?(.*?)$/ ( let's 
capture the last part) 
-> Video    h264 (High), yuv420p, 1280x720, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 
tbr, 1k tbn, 47.95 tbc (default)
    Audio    ac3, 48000 Hz, stereo, fltp, 192 kb/s (default)
    Subtitle    ass (default)
Q: Now $2 and $3 are undef, and the rest of the string went to $4. Again, I am 
quite puzzled by the output. 

The optional group is tried only where .+? stops, right after the ':'. It 
cannot match there, so it matches zero times and everything else goes to 
(.*?)$.

Please pardon my long email. I hope someone can point out the flaws in my 
logic. Here, I can match and print Video/Audio/Subtitle separately. 
But I wish for one expression to match them all, one expression to print them. 

In general, it is better practice to add the 'x' modifier to your regex to 
make it more readable. My regex might not be the best, but it works as 
expected.

use strict;
use warnings;
use 5.16.0;

while (<DATA>) {
    # Try the alternative with the resolution first, then fall back to .+
    / (Video|Audio|Subtitle) (?: .+? (\d+x\d+) | .+ ) /x
        and say defined $2 ? "$1 $2" : $1;   # $2 is undef without a resolution
}

__DATA__
Stream #0:0: Video: h264 (High), yuv420p, 1280x720, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)
Stream #0:1(jpn): Audio: ac3, 48000 Hz, stereo, fltp, 192 kb/s (default)
Stream #0:2(eng): Subtitle: ass (default)

The '|' alternation operator first tries the alternative before it. It only 
looks at the other alternative if the first one fails. This makes matching 
your resolution a priority, but not a necessity.
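
If the width and height are wanted as separate captures, as in the desired 
output of the question, the same alternation idea might be sketched as 
follows. Note that parse_stream is a hypothetical helper invented for this 
example, and the \b anchors are an added assumption to avoid matching digits 
inside words such as "h264".

```perl
use strict;
use warnings;

# parse_stream is a hypothetical helper, not part of the original thread.
# It returns the stream type, plus width and height when a resolution exists.
sub parse_stream {
    my ($line) = @_;
    return unless $line =~ /
        Stream \s \#\d+:\d+ .*?          # stream header such as "Stream 0:1(jpn)"
        (Video|Audio|Subtitle)           # capture 1: stream type
        (?: .+? \b (\d+) x (\d+) \b      # preferred: captures 2 and 3, width and height
          | .*                           # fallback: no resolution on this line
        )
    /x;
    return grep { defined } ($1, $2, $3);
}

my @lines = (
    'Stream #0:0: Video: h264 (High), yuv420p, 1280x720, SAR 1:1 DAR 16:9, 23.98 fps',
    'Stream #0:1(jpn): Audio: ac3, 48000 Hz, stereo, fltp, 192 kb/s (default)',
    'Stream #0:2(eng): Subtitle: ass (default)',
);

# One report line per stream: "Video 1280 720", "Audio", "Subtitle".
my @report = map { join ' ', parse_stream($_) } @lines;
print "$_\n" for @report;
```

Returning a list from the helper keeps the printing code free of undef checks, 
at the cost of losing the fixed positions of width and height.
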
Hope this helps.
Jing
