Re: [Mjpeg-users] -M 2/3 on SMP is slower than -M 0

2003-12-16 Thread Slepp Lukwai
On Mon, 2003-12-15 at 21:08, Richard Ellis wrote:
> What program are you using to monitor CPU usage while mpeg2enc runs? 
> Some versions of top (if you are using top) report percentages as a
> roll-up of the whole SMP machine, so that 3x33% usage really means
> 99% utilization of the machine, where "the machine" means both
> processors combined.  Other versions report a per-cpu percentage
> instead of rolling everything together.

I hate the combined ratings, so I already set up top to report per-CPU
usage. That way I can see 200% usage, instead of it showing 50% for what
is really 100% of one CPU (it's misleading when you deal with single
CPUs almost all day for work).
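
To make the two reporting conventions concrete, here is a minimal sketch
that converts between the rolled-up scale (100% = the whole machine) and
the per-CPU scale (100% = one CPU). The function names are made up for
illustration and are not part of top.

```python
def rolled_up_to_per_cpu(percent, ncpus):
    """Convert a whole-machine percentage (100% = all CPUs busy)
    to the per-CPU scale (100% = one CPU busy)."""
    return percent * ncpus


def per_cpu_to_rolled_up(percent, ncpus):
    """Inverse conversion: per-CPU scale back to whole-machine scale."""
    return percent / ncpus


# On a dual-CPU box, a process shown at 50% on the rolled-up scale
# is keeping one full CPU busy.
print(rolled_up_to_per_cpu(50.0, 2))
```

On this reading, three threads at 33% each on the rolled-up scale add up
to roughly 198% on the per-CPU scale of a dual machine.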

> Additionally, what kind of memory do you have attached to the cpu's? 
> Mpeg encoding is very memory bandwidth hungry to begin with, and with
> two cpu's trying to eat at the same trough, a not quite as fast as it
> should be memory subsystem can produce results like what you are
> seeing.  It's because with the two cpu's trying to run mpeg2enc, they
> together oversaturate the memory bus, causing both to wait.  But with
> only one mpeg2enc thread running, the entire memory bus bandwidth is
> available to that one cpu alone.

I've noticed. I never really saw how much memory it used until I used the
buffer program with -t. It was moving gigs of data for a short period of
frames (perhaps 10,000 frames). It's a dual Athlon, which inherently
means 266FSB (DDR 266), though the memory is actually Hynix PC3200 w/
timings set as low as they go on this board (2-2-2), which gives me
about 550MB/s memory bandwidth according to memtest, with a 13GB/s L1
and something like 6 or 8GB/s L2. The cache size is 256k/CPU, 64k L1.

At 550MB/s, it SHOULD be able to push enough to keep the frames encoding
at 100% CPU, in theory. I don't think there's enough overhead on this
machine to qualify as keeping it even half saturated. This is why I want
the Corsair XMS Pro memory with load meters on them. (Per bank load
meters, even).
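
As a rough check on that claim, the sketch below works out the raw input
data rate for DVD-sized frames and compares it with the 550MB/s figure.
It assumes 8-bit 4:2:0 frames (1.5 bytes per pixel), 1 MB = 10^6 bytes,
and the roughly 14fps encoding rate mentioned elsewhere in this thread.
It counts only one pass over each input frame; the encoder also reads and
writes reference frames during motion search, so real memory traffic is
some multiple of this that the thread does not measure.

```python
def frame_bytes_420(width, height):
    """Bytes in one 8-bit 4:2:0 frame: full-size luma plus two
    quarter-size chroma planes."""
    return width * height * 3 // 2


def raw_rate_mb_per_s(width, height, fps):
    """Raw frame data rate in MB/s (1 MB = 10**6 bytes)."""
    return frame_bytes_420(width, height) * fps / 1e6


rate = raw_rate_mb_per_s(720, 480, 14)
passes = 550 / rate  # how many times each frame could cross the bus
print(f"{frame_bytes_420(720, 480)} bytes/frame, "
      f"{rate:.2f} MB/s raw, about {passes:.0f} passes per frame")
```

Under these assumptions the bus could move each frame's worth of data a
few dozen times per frame interval, so whether memory is the bottleneck
depends on how many passes the encoder really makes.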

> FWIW, when my desktop machine was a dual PII-400Mhz box, I almost
> always had two mpeg2enc threads eating up 97-98%cpu on both PII
> chips.  The few times both cpu's were not fully saturated at mpeg
> encoding was when I'd bother them with something silly like browsing
> the web with mozilla. :)

Now that's just silly. Why would you hurt the CPUs by running such bloat
as Mozilla? I can't think of how many times Mozilla has gone nuts on me
and used 100% CPU without reason, and you can't kill it any normal UI
way.. Good ol' killall. However, I love it. It's a great browser. Just
rather hungry at times. I suppose there's a reason the logo is a
dinosaur. :>



---
This SF.net email is sponsored by: IBM Linux Tutorials.
Become an expert in LINUX or just sharpen your skills.  Sign up for IBM's
Free Linux Tutorials.  Learn everything from the bash shell to sys admin.
Click now! http://ads.osdn.com/?ad_id=1278&alloc_id=3371&op=click
___
Mjpeg-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/mjpeg-users


Re: [Mjpeg-users] -M 2/3 on SMP is slower than -M 0

2003-12-16 Thread Slepp Lukwai
On Mon, 2003-12-15 at 20:27, Steven M. Schultz wrote:
> On Mon, 15 Dec 2003, Slepp Lukwai wrote:
> 
> > faster to begin with. However, in both cases, after multiple tests and
> > trying different things, I can't get the SMP modes to be fast at all. In
> > fact, they're slower than the non-SMP modes.
> 
>   I think I see what you're doing that could cause that.   I've never
>   seen the problem - using "-M 2" is not going to be 2x as fast though
>   if that was the expectation.   ~40% speedup or so is what I see
>   (from about 10fps to 14fps) typically.

Tried it without any options, same effect. I'm definitely seeing nowhere
near 40% speedup, which is what boggles me. I expected at least
reasonable gains of 25%.

> > When encoding with the -M 0 with .92, I get around 19fps. When I use -M
> 
>   That's full sized (720x480) is it?   Sounds more like a SVCD 
>   or perhaps "1/2 D1" (bit of a misnomer - D1 is actually a digital
>   video tape deck) at 352x480.  At 1/2 size yes, around 20fps or a bit
>   more I've seen.   But I'm usually tossing in a bit of filtering so
>   the process is a bit slower.

Sorry, upon further testing, I actually average around 14fps at DVD
quality (720x480, 9800kbit/s). (see all the details of my command lines
in the post I sent in response to Bernhard).

> > I installed 'buffer', set it up with a 32MB buffer and put it in the
> 
>   10MB is about all I use - it's just a cushion to prevent the encoder
>   from having to wait (-M 1 is the default - there's I/O readahead
>   going on) for input.

Yeh, I tried 20 first, then 32, but in the end, it made no difference at
all.

> > Has anyone found a way around this, or is it time to look at the source
> > and see what's up?
>   
> > And for reference, it's a dual Athlon MP 2100+, which is below the
> > '2600' that the Howto references as fast.
>   
>   I'm using dual 2800s and around 14-15fps for DVD encoding is what I
>   usually get.

It's interesting that I'm faster with dual 2100s than the dual 2800 (or
at least on par). I suppose it really comes down to command line
options, but you would need to compare those yourself (since I haven't
seen yours).

> > The actual command line is:
> > mpeg2enc -v 0 -I 0 -f 8 -b 9800 -F 1 -n n -p -a 3 -o test.m2v -S  -M
> > 3 -4 2 -2 1 -r 32 -q 5 -Q 3.0 -K kvcd
> 
>   You have progressive non-interlaced source?   If not then "-I 0" is
>   not the right option. 

According to the docs -I 1 turns on interlacing support, and causes
un-needed overhead if it is known progressive material. Hence the -I 0
(plus transcode sets that, though I could override it).

>   The speed up from multiple processors comes, I believe (but if I'm
>   wrong I'm sure someone will tactfully point that out ;)) the speedup
>   comes from the motion estimation of the 2 fields/frame being done in
>   parallel.

Oh. Son of a... If that's all it is...

>   Try "-I 1" (or just leave out the "-I" and let it default).
> 
>   Oh, and there's no real benefit from going above -M 2.   I had a 4
>   cpu box and tried "-M 4" and saw no gain over -M 3 (which in turn
>   was a very minimal increase over -M 2).

I've never even bothered with -M 4 (well, not for a real run, anyway,
just as a quick test).

>   If you want to speed things up by a good percentage try encoding
>   without B frames.   Those are computationally a lot more expensive
>   than I or P frames.   "-R 0" will disable B frames.

I just enabled that, and that's how I'm hitting 15fps instead of 8, and
the quality is good and the size is just fine.

>   And do you realize that increasing the search radius (-r) slows
>   things down?  Leave the -r value defaulted to 16 and you should
>   see encoding speed up.

Yup, entirely aware. I do like the minor difference it makes, though.
I'm not in it for speed, really, I just want to see both CPUs get used
to their potentials and give me the equivalent of a 4200+ ;> If it takes
6 hours to transcode a movie because I set -r32 (I noticed a larger
difference with -4 -2 options, btw, than -r16 vs -r32), that's fine, but
I feel it could be faster.

>   All in all - the defaults are fairly sane so if you're not certain
>   about an option, well, let it default.
> 
>   And drop the -Q unless you want artifacting - especially values over 2.   
>   Under some conditions (it's partly material dependent) the -Q can
>   generate really obnoxious color blocks and similar artifacts.  Much
>   better results (especially with clean source material) can be obtained
>   with "-E -8" or perhaps "-E -10".

Until I upgraded to .92, I didn't have those options. I'm using them
now, in combination with -Q, but I find the artifacts are almost never
there (I used to do -q 4 and -Q 4.0, and it looked about the same as the
5/3.0).

> > Of course the -M 3 changes to 2 and 0 in testing. I also tested it 

Re: [Mjpeg-users] -M 2/3 on SMP is slower than -M 0

2003-12-16 Thread Slepp Lukwai
On Mon, 2003-12-15 at 22:44, Bernhard Praschinger wrote:
> Hello
> 
> > I was doing some testing of both the older version (1.6.1.90) and the
> > newer version of mpeg2enc (1.6.1.92). First off, the .92 was somewhat
> > faster to begin with. However, in both cases, after multiple tests and
> > trying different things, I can't get the SMP modes to be fast at all. In
> > fact, they're slower than the non-SMP modes.
> By slower, I hope you mean "mpeg2enc needs more time to encode the
> movie", and not the time the encoding needs in "realtime".

Slower by wall clock. It took less time to re-encode the entire thing
with -M 0 than when I used -M 3. (I didn't let it run through with -M 2,
since it takes over 4 hours as is.) (OK, after all these tests, the dual
stuff is running faster, but not fast enough over a full movie to even
warrant the extra threads.)

Top output of the 3 running mpeg2enc with mjpegtools 1.6.1.92 on the
Dual Athlon MP 2100+. That's with -M3. Top usage is 2% and the decoder
is only about 10% intermittent. So, I'm neglecting those for the moment.
I'm using transcode, by the way (though I found the same results when
not using transcode and doing a straight pipe from decoded MPEG2
frames). Note the top dumps below ignore the memory usage (which has
approximately 640MB of free RAM (really free, not cache or anything,
it's a clean boot, 127 processes running in all cases)).

 Cpu0 :  50.0% user,   8.6% system,   0.0% nice,  41.4% idle
 Cpu1 :  53.4% user,   4.3% system,   0.0% nice,  42.2% idle
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
11234 slepp 16   0 43436  42m  968 S 38.2  4.2   0:16.96 mpeg2enc
12422 slepp 16   0 43436  42m  968 S 34.5  4.2   0:16.86 mpeg2enc
  623 slepp 16   0 43436  42m  968 R 33.6  4.2   0:17.14 mpeg2enc

Command line:
time /usr/bin/transcode -u 120,2 -M 0 -V -q 1 -f 24,1 --color 1 -x
mpeg2,null -y mpeg2enc,null -e 48000,16 -A -N 0x2000 -F 8,'-S  -M 3
-g 9 -G 18 -4 2 -2 1 -r 32 -q 4 -Q 3.0 -K kvcd -R 0' --pulldown -w 9800
-i 28DaysLater.m2v -o test3 --print_status 50 -c 0-1000

Results:
[import_mpeg2.so] tcextract -x mpeg2 -i "28DaysLater.m2v" -d 1 |
tcdecode -x mpeg2 -d 1 -y yv12
[export_mpeg2enc.so] *** init-v *** !
[export_mpeg2enc.so] cmd=mpeg2enc -v 0 -I 0 -f 8 -b 9800 -F 1 -n n -p -a
3 -o "test3".m2v -S  -M 3 -g 9 -G 18 -4 2 -2 1 -r 32 -q 4 -Q 3.0 -K
kvcd -R 0
++ WARN: [mpeg2enc] 3:2 movie pulldown with frame rate set to decode
rate not display rate
++ WARN: [mpeg2enc] 3:2 Setting frame rate code to display rate = 4
(29.970 fps)
encoding frame [950],  14.93 fps, 95.2%, ETA: 0:00:03, ( 0| 0|116)
clean up | frame threads | unload modules | cancel signal | internal
threads | done
[transcode] encoded 999 frames (0 dropped, 0 cloned), clip length 41.67s

73.56user 7.76system 1:09.29elapsed 117%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (2055major+31007minor)pagefaults 0swaps

(I can't find how to turn off line wrap. Sorry...)
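
For reading the time output, the %CPU field is just (user + system)
divided by elapsed wall-clock time. The sketch below redoes that
arithmetic for the -M 3 run above and for the -M 2 run (74.89user
7.68system 1:11.95elapsed). The helper names are made up for
illustration, and truncating to an integer is an assumption about how
time rounds; it happens to reproduce both reported figures.

```python
def elapsed_seconds(stamp):
    """Parse an elapsed field such as '1:09.29' (m:ss.ss, or h:mm:ss)
    into seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def cpu_percent(user, system, elapsed):
    """CPU time consumed as a percentage of wall-clock time.
    100% means one CPU fully busy; 200% is the ceiling on a dual box."""
    return 100.0 * (user + system) / elapsed


m3 = cpu_percent(73.56, 7.76, elapsed_seconds("1:09.29"))
m2 = cpu_percent(74.89, 7.68, elapsed_seconds("1:11.95"))
print(int(m3), int(m2))  # truncated, matching the 117% and 114% reported
```

Read this way, 117% means the second CPU contributed only about a sixth
of a CPU's worth of work over the whole run.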

Note I used 120 incoming frame buffers with 2 threads decoding the
video. The buffer usage of transcode never dropped below 90 frames
buffered, so the buffering was keeping pace.

Here's the identical command, the only thing changed is -M 3 to -M 2
(this time I included a snapshot of tcdecode, but note that it isn't
always in the top 3 of the list, it comes and goes quite frequently, and
the transcode buffers stay right around 110 to 116 frames):

 Cpu0 :  61.8% user,   7.3% system,   0.0% nice,  30.9% idle
 Cpu1 :  50.5% user,  12.8% system,   0.0% nice,  36.7% idle
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
20631 slepp 19   0 39824  38m  984 R 51.1  3.9   0:03.79 mpeg2enc
14434 slepp 17   0 39824  38m  984 R 45.7  3.9   0:03.94 mpeg2enc
29969 slepp 16   0  2644 2644  668 S 13.7  0.3   0:01.95 tcdecode

And the output of time (and the end of transcode):
encoding frame [950],  14.33 fps, 95.2%, ETA: 0:00:03, ( 0| 0|116)
clean up | frame threads | unload modules | cancel signal | internal
threads | done
[transcode] encoded 999 frames (0 dropped, 0 cloned), clip length 41.67s

74.89user 7.68system 1:11.95elapsed 114%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (1979major+26920minor)pagefaults 0swaps


And with -M 1 instead of -M 2:

 Cpu0 :  87.0% user,  13.0% system,   0.0% nice,   0.0% idle
 Cpu1 :  22.2% user,   5.6% system,   0.0% nice,  72.2% idle
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
31916 slepp 25   0 36192  35m  984 R 90.3  3.5   0:07.58 mpeg2enc
 3690 slepp 16   0  2644 2644  668 S 14.7  0.3   0:01.91 tcdecode

Note that it's now using an entire CPU (other processes keep sharing,
but it's still using a full CPU).

And the transcode/time results:

encoding frame [950],  14.19 fps, 95.2%, ETA: 0:00:03, ( 0| 0|117)
clean up | frame threads | unload modules | cancel signal | internal
threads | done
[transcode] encoded 999 frames (0 dropped, 0 cloned), clip length 41.67s

73.98user 7.51system 1:12

Re: [Mjpeg-users] (S)VCD to DVD w/o re-encoding?

2003-12-16 Thread Al Bogner
On Tuesday, 16 December 2003 01:53, Steven M. Schultz wrote:

>   demuxing tool (I use mpgtx but transcode might have a tool for that
>   also).

tcextract -i test.mpg -x mpeg2 -d 1 > test.m2v
tcextract -i test.mpg -x mp2 -d 1 > test.mp2

Al




[Mjpeg-users] max bitrate never reached

2003-12-16 Thread Daniel Silva




Hello,

I’m trying to encode a bit stream using the command: mpeg2enc … -b 7500
-q5 outputfile. I’d like to know why the maximum bit rate obtained never
reaches the desired 7500 kbit/s. The maximum reached was 1000 kbit/s.

Is there anything I could do to make mjpegtools reach a bit rate near
the desired one?

Thank you,

Daniel Silva


Re: [Mjpeg-users] (S)VCD to DVD w/o re-encoding?

2003-12-16 Thread Matto Marjanovic

 >> Does anybody have scripts/tools they use to do this?  It is possible,
 >> right?
 ...
 >  Off the top of my head ("finger memory") you'll want the latest
 >  y4mscaler (0.6.1), a recent (preferably cvs) mpeg2dec (decoder) and
 ...
 >  I'd recommend scaling down from 480x480 to 352x480 rather than up
 >  to 704x480.
 >
 >  It goes something like this:
 >
 >mpeg2dec -s -o pgmpipe input.mpg | pgmtoy4m -i t -a 15:11 | \
 >   y4mscaler -S option=sinc:8 -O sar=20:11 -O size=352x480 | \
 >   mpeg2enc -f 8 -E -8 -K tmpgenc -4 2 -2 1 -o output.m2v
 >
 >  I think I got it right - SVCDs have a SAR of 15:11 but the 1/2 DVD
 >  size would have a SAR of 20:11.   Matt? ;)

Yes, and the new "-O preset=CVD" will take care of all of that, i.e.:

y4mscaler -S option=sinc:8 -O preset=CVD


-matt m.


ps:  mpeg2dec doesn't dump directly to a YUV4MPEG2 stream?

 That is unfortunate --- the chroma subsampling in pgmtoy4m is a simple
 linear kernel, and who knows what kind of chroma supersampling is used
 in mpeg2dec.  (And, this process adds an unnecessary roundtrip step of
 Y'CrCb -> R'G'B'-> Y'CrCb.)




Re: [Mjpeg-users] max bitrate never reached

2003-12-16 Thread Steven M. Schultz

On Tue, 16 Dec 2003, Daniel Silva wrote:

> Hello,
> 
> I'm trying to encode a bit stream using the command: 
> mpeg2enc ... -b 7500 -q5 outputfile; I'd like to know why the maximum bit rate
> obtained never reaches the desired 7500 kbit/s. The maximum reached was 
> 1000kbit/s.

Not enough information to answer the question.   What frame size are
you encoding for?   DVD, VCD, SVCD, CVD, custom size ?

If 1000 is all the encoder needed then you have very clean source
and are probably encoding a 352x288 or so size.

I don't see where you specified the output format with a "-f" option.
"-f 8" is for DVD, "-f 1" for VCD and so on - see the usage() summary
for mpeg2enc (or the manpage).

> Is there anything I could do to make mjpegtools reach a bit rate near
> the desired one?

Try giving "-f" and setting the format and I think you'll see a 
difference.

Cheers,
Steven Schultz





Re: [Mjpeg-users] (S)VCD to DVD w/o re-encoding?

2003-12-16 Thread Steven M. Schultz

On Tue, 16 Dec 2003, Matto Marjanovic wrote:

>  >mpeg2dec -s -o pgmpipe input.mpg | pgmtoy4m -i t -a 15:11 | \
>  >   y4mscaler -S option=sinc:8 -O sar=20:11 -O size=352x480 | \
>  >   mpeg2enc -f 8 -E -8 -K tmpgenc -4 2 -2 1 -o output.m2v
> 
> Yes, and the new "-O preset=CVD" will take care of all of that, i.e.:
> 
> y4mscaler -S option=sinc:8 -O preset=CVD

Ah yes - that entered my mind just as I was hitting the send
command.   Very useful preset.

> ps:  mpeg2dec doesn't dump directly to a YUV4MPEG2 stream?

Alas, no it does not ;(

It's why I came up with the pgmtoy4m program.   The YUV4MPEG output
from mpeg2dec was a local hack (I think Andrew did it eons ago to
an old version of mpeg2dec).

>  That is unfortunate --- the chroma subsampling in pgmtoy4m is a simple
>  linear kernel, and who knows what kind of chroma supersampling is used

Huh?  All that pgmtoy4m  does is unpack the data from "mpeg2dec -o
pgmpipe" and slap a YUV4MPEG2 header on it (and FRAME markers).  No
conversion done at all.  

What comes out of mpeg2dec is 4:2:0 (at the moment, 422 is on the
TODO list) but oddly packed (not planar).  The data's read, unpacked 
and output as 420p with header/frame added.

Cheers,
Steven Schultz





Re: [Mjpeg-users] (S)VCD to DVD w/o re-encoding?

2003-12-16 Thread Matto Marjanovic

 >  Huh?  All that pgmtoy4m  does is unpack the data from "mpeg2dec -o
 >  pgmpipe" and slap a YUV4MPEG2 header on it (and FRAME markers).  No
 >  conversion done at all.  

Ooops... I misread it as "ppmtoy4m".  Disregard everything I wrote.

This reminds me, maybe a better name for "pgmtoy4m" is "pgmpipetoy4m"
 --- because the current name makes it sound like it takes a PGM (portable
 gray map, a la NetPBM) as input, which it does not. 

It's not clear to me why it's called 'pgm' in mpeg2dec, either, but why
 propagate a mistake?

-matt m.





Re: [Mjpeg-users] (S)VCD to DVD w/o re-encoding?

2003-12-16 Thread Steven M. Schultz

On Tue, 16 Dec 2003, Matto Marjanovic wrote:

> Ooops... I misread it as "ppmtoy4m".  Disregard everything I wrote.

Ok - consider it disregarded ;)

> This reminds me, maybe a better name for "pgmtoy4m" is "pgmpipetoy4m"
>  --- because the current name makes it sound like it takes a PGM (portable
>  gray map, a la NetPBM) as input, which it does not. 

Technically it is a gray map - you can take the output of '-o pgm'
and view it with xv/gimp/whatever and it will display as the luma
in a big rectangle with the U and V as smaller rectangles below it.

Actually fairly useful when looking for shifts/misalignment.
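
A minimal sketch of the unpacking step, going by the description above
(luma in a big rectangle, U and V as smaller rectangles below it). It
assumes the header has already been stripped, that width and height are
even, and that each chroma row holds a U row and a V row side by side.
That layout and the function name are assumptions for illustration, not
taken from the pgmtoy4m source.

```python
def split_pgm_yuv420(payload, width, height):
    """Split one frame's gray-map payload into planar Y, U, V bytes.

    Assumed layout (hypothetical, inferred from the description above):
    `height` rows of luma, `width` bytes each, followed by `height // 2`
    rows holding a U row and a V row side by side, `width // 2` bytes each.
    """
    cw, ch = width // 2, height // 2
    expected = width * height + 2 * cw * ch
    if len(payload) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(payload)}")
    y = payload[:width * height]
    u, v = bytearray(), bytearray()
    offset = width * height
    for _ in range(ch):
        u += payload[offset:offset + cw]           # left half: U samples
        v += payload[offset + cw:offset + 2 * cw]  # right half: V samples
        offset += 2 * cw
    return bytes(y), bytes(u), bytes(v)


# Tiny synthetic 4x2 frame: 8 luma bytes, then one chroma row "UUVV".
frame = bytes(range(8)) + b"\x10\x11\x20\x21"
y, u, v = split_pgm_yuv420(frame, 4, 2)
```

Writing the three planes out one after another behind a YUV4MPEG2 stream
header and a FRAME marker is then all that remains; no sample values are
changed along the way.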

> It's not clear to me why it's called 'pgm' in mpeg2dec, either, but why
>  propagate a mistake?

Short is good?   Didn't feel like typing out the extra characters.
Anyone using it will (or should) know what they're doing and the source
has a big block of comments at the top detailing the intended use and
a description of the data layout (P5 "raw"), etc.

Cheers,
Steven Schultz





Re: [Mjpeg-users] -M 2/3 on SMP is slower than -M 0

2003-12-16 Thread Steven M. Schultz

On Tue, 16 Dec 2003, Slepp Lukwai wrote:

> Tried it without any options, same effect. I'm definitely seeing nowhere
> near 40% speedup, which is what boggles me. I expected at least
> reasonable gains of 25%.

I think that has to do with the -I setting...

> Sorry, upon further testing, I actually average around 14fps at DVD
> quality (720x480, 9800kbit/s). (see all the details of my command lines

Ah, that's more like it then.   

> It's interesting that I'm faster with dual 2100s than the dual 2800 (or
> at least on par). I suppose it really comes down to command line
> options, but you would need to compare those yourself (since I haven't

Friend of mine has dual 2400s and my setup is ~10-15% faster as I
recall - he's getting around 11fps as a rule where I see 14 or so.

I'm usually adding a bit of overhead with the chroma conversion.  I
build smilutils with ffmpeg/libavcodec (to use ffmpeg's DV codec)
and then run the data thru something like:
"smil2yuv -i 2 file.dv | filters | y4mscaler -O chromass=420_MPEG2 |..."

Produces better output than the default which uses libdv but does
cost a bit in cpu use.

> According to the docs -I 1 turns on interlacing support, and causes
> un-needed overhead if it is known progressive material. Hence the -I 0
> (plus transcode sets that, though I could override it).

But unless you have the raw 23.976fps progressive data (with the 3:2
pulldown undone) then I think '-I 1' is the option to use.   But then 
I might be confused (wouldn't be the first time ;)).

That would explain why the encoding rate I see is lower since I'm
using -I 1.

> > wrong I'm sure someone will tactfully point that out ;)) the speedup
> > comes from the motion estimation of the 2 fields/frame being done in
> > parallel.
> 
> Oh. Son of a... If that's all it is...

Yep - I'm fairly sure that is why you're not seeing any improvement
when using "-M 2".

> > without B frames.   Those are computationally a lot more expensive
> > than I or P frames.   "-R 0" will disable B frames.
> 
> I just enabled that, and that's how I'm hitting 15fps instead of 8, and
> the quality is good and the size is just fine.

Great!   It takes, from what I've seen, extraordinarily clean sources
before -R 0 has no or little effect.

> to their potentials and give me the equivalent of a 4200+ ;> If it takes
> 6 hours to transcode a movie because I set -r32 (I noticed a larger
> difference with -4 -2 options, btw, than -r16 vs -r32), that's fine, but

Yep - "-4 1" will close to double the time over "-4 2" and the 
difference in bitrate/filesize is measured in tenths of a percent. 
Hardly worth it.   Not all that much difference between "-4 2" and
"-4 3" though.

> > better results (especially with clean source material) can be obtained
> > with "-E -8" or perhaps "-E -10".
> 
> Until I upgraded to .92, I didn't have those options. I'm using them

On noisy source material the -E option has almost no effect, but the
cleaner the input the more effect even modest values of -E have.

> now, in combination with -Q, but I find the artifacts are almost never
> there (I used to do -q 4 and -Q 4.0, and it looked about the same as the
> 5/3.0).

Perhaps Richard Ellis could chime in with his experiences with -Q ;)

> > Right, with -I 0 the cpus take turns but there's little parallelism.
> 
> And again, son of I didn't realize the parallelization was done
> based on interlacing settings.

Looking back on it that makes sense though.   A P frame depends on the
preceding P frame - rather sequential in nature since you can't
move on to the next one without completing the first one...

> The MPEG decoding doesn't take much, and the pipe overhead is negligible,

Pipe overhead sneaks up on you though.   One pipe?  Not a real problem,
two?  Begins to be noticed but isn't too bad.   Four or five?   Yeah,
it starts to take a hit on the overall speed of the system - the data
has to go up/down thru the kernel all those times and that's not "free".

> (As I write this, I'm still waiting for the -M 2 run to finish, so it'll
> arrive before the tests results to Bernhard make it out).

You might try, for timing purposes, without -I 0 and see what effect, if
any, that has.   Might be a useful data point.

Cheers,
Steven Schultz




Re: [Mjpeg-users] (S)VCD to DVD w/o re-encoding?

2003-12-16 Thread Matto Marjanovic

 >> This reminds me, maybe a better name for "pgmtoy4m" is "pgmpipetoy4m"
 >>  --- because the current name makes it sound like it takes a PGM (portable
 >>  gray map, a la NetPBM) as input, which it does not. 
 >
 >  Technically it is a gray map - you can take the output of '-o pgm'
 >  and view it with xv/gimp/whatever and it will display as the luma
 >  in a big rectangle with the U and V as smaller rectangles below it.

Well, whaddayaknow... that's clever (but still kinda disturbing).

 ...
 >  Anyone using it will (or should) know what they're doing and the source
 >  has a big block of comments at the top detailing the intended use and
 >  a description of the data layout (P5 "raw"), etc.

Ok, I'll settle for a nice manpage.  ;^)

-matt m.





Re: [Mjpeg-users] -M 2/3 on SMP is slower than -M 0

2003-12-16 Thread Andrew Stevens
Hi all,

First, a bit of background on the multi-threading in the current stable 
branch:

- Parallelism is primarily frame-by-frame.  This means that the final phases 
of the encoding lock on completion of the reference frame (prediction and DCT 
transform) and the predecessor (bit allocation).   If you have a really fast 
CPU that motion estimates and DCT's very fast you will get lower 
parallelisation.  If you use -R 0 you will get very little parallelism *at 
all*.   Certainly not enough to make -M 3 sensible.

- There is also a parallel read-ahead thread but this rarely soaks much CPU on 
modern CPUs.
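
One way to put numbers on the locking described above is Amdahl's law,
which treats a fixed fraction of the work as parallelisable. It is a
generic model brought in here for illustration, not something the encoder
implements, and the 10fps to 14fps figures are the ones Steven reported
for "-M 2" earlier in the thread.

```python
def amdahl_speedup(parallel_fraction, ncpus):
    """Amdahl's law: overall speedup when only part of the work scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / ncpus)


def parallel_fraction_from_speedup(speedup, ncpus):
    """Invert Amdahl's law: the parallel fraction implied by a
    measured speedup on ncpus processors."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / ncpus)


# 10fps single-threaded vs 14fps with -M 2 on two CPUs.
p = parallel_fraction_from_speedup(14 / 10, 2)
print(f"implied parallel fraction: {p:.2f}")
print(f"model speedup on 4 CPUs:   {amdahl_speedup(p, 4):.2f}x")
```

A fixed-fraction model like this is optimistic for the stable branch,
since the frame-to-frame dependencies limit how many threads can be busy
at once regardless of CPU count.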

The MPEG_DEVEL branch encoder stripes all encoding phases to allow much more 
scalable parallelisation.  You might want to give it a go - I'd be interested 
in the results!

N.b. in a 'realistic' scenario you're running the multiplexer and audio 
encoding in parallel with the encoder and video filters communicating via 
pipes and named FIFO's.   This setup usually saturates a modern dual machine.

cheers,

Andrew
PS
I'm away on vacation for a couple of weeks from Friday, so there'll be a bit of 
a pause in answering emails / posts from then ;-)







Re: [Mjpeg-users] -M 2/3 on SMP is slower than -M 0

2003-12-16 Thread Bernhard Praschinger
Hello

> Top output of the 3 running mpeg2enc with mjpegtools 1.6.1.92 on the
> Dual Athlon MP 2100+. That's with -M3. Top usage is 2% and the decoder
> is only about 10% intermittent. So, I'm neglecting those for the moment.
> I'm using transcode, by the way (though I found the same results when
> not using transcode and doing a straight pipe from decoded MPEG2
> frames). Note the top dumps below ignore the memory usage (which has
> approximately 640MB of free RAM (really free, not cache or anything,
> it's a clean boot, 127 processes running in all cases)).
> 
>  Cpu0 :  50.0% user,   8.6% system,   0.0% nice,  41.4% idle
>  Cpu1 :  53.4% user,   4.3% system,   0.0% nice,  42.2% idle
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
> 11234 slepp 16   0 43436  42m  968 S 38.2  4.2   0:16.96 mpeg2enc
> 12422 slepp 16   0 43436  42m  968 S 34.5  4.2   0:16.86 mpeg2enc
>   623 slepp 16   0 43436  42m  968 R 33.6  4.2   0:17.14 mpeg2enc
> 
> Command line:
> time /usr/bin/transcode -u 120,2 -M 0 -V -q 1 -f 24,1 --color 1 -x
> mpeg2,null -y mpeg2enc,null -e 48000,16 -A -N 0x2000 -F 8,'-S  -M 3
> -g 9 -G 18 -4 2 -2 1 -r 32 -q 4 -Q 3.0 -K kvcd -R 0' --pulldown -w 9800
> -i 28DaysLater.m2v -o test3 --print_status 50 -c 0-1000
Could you run a few tests (please)?  Get some frames (100-1000) in yuv
format. I guess that should be possible even with transcode. ;)
(I do not use transcode so I can't help there; alternatively, get the test
streams on mjpeg.sf.net.)

Afterwards, do something like this, once for each -M value from 0 to 3:
cat stream.yuv | mpeg2enc -f8 -M 0-3 -o test.m2v 
or 
lav2yuv stream.avi | mpeg2enc -f 8 -M 0-3 -o test.m2v

That way you can be sure that nothing else is causing trouble. Then check
how it goes. That should not take too long. After that you can add
the options you used, to see if any of them causes the problem of the
framerate not increasing.
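
The test series above could be scripted along these lines. This is only a
sketch: the -f 8, -M and -o options are the ones shown in the commands
above, while the per-run output names (test_M0.m2v and so on) and the
helper names are made up so that the runs do not overwrite each other.

```python
import shlex
import subprocess
import time


def encoder_commands(source="stream.yuv", levels=(0, 1, 2, 3)):
    """Build one shell pipeline per -M value, mirroring the commands above.
    Output file names are hypothetical, one per run."""
    return [
        f"cat {shlex.quote(source)} | mpeg2enc -f 8 -M {m} -o test_M{m}.m2v"
        for m in levels
    ]


def time_command(command):
    """Run a shell pipeline and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(command, shell=True, check=True)
    return time.perf_counter() - start


for command in encoder_commands():
    # On a machine with mjpegtools installed, pass each command to
    # time_command() and compare the durations across -M values.
    print(command)
```

Keeping the input file and every other option fixed means any difference
in wall-clock time comes from the -M setting alone.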

> > I use the 2.6.0-test8 kernel. Maybe that changes the situation.
> I used to be using 2.5.63 or similar, but have rebuilt the machine with
> 2.4.20 with scheduling optimizations and other goodies (gentoo). I
> noticed a number of speed ups in most other parallel processes
> (cinelerra, MPI povray, gcc). Of course, most of the patches in the
> gentoo 2.4.20 kernel are stock in 2.5+ (I also used 2.6.0-test8, but
> this Asus board doesn't behave under that kernel, and it crashed
> whenever i'd load the CPUs or IDE buses :<)
Bad. Which board do you have? (Mine is a Tyan Tiger MPX) 

> > Sorry if the mail is a bit confusing,
[...]
> Hopefully this one didn't ramble on TOO long.
My brain had given up by the time I started my computer that evening ;)

But I really don't know why the situation is that bad.


Hopefully see you soon,

Berni the Chaos of Woodquarter

Email: [EMAIL PROTECTED]
www: http://www.lysator.liu.se/~gz/bernhard


---
This SF.net email is sponsored by: IBM Linux Tutorials.
Become an expert in LINUX or just sharpen your skills.  Sign up for IBM's
Free Linux Tutorials.  Learn everything from the bash shell to sys admin.
Click now! http://ads.osdn.com/?ad_id=1278&alloc_id=3371&op=click
___
Mjpeg-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/mjpeg-users


Re: [Mjpeg-users] -M 2/3 on SMP is slower than -M 0

2003-12-16 Thread Richard Ellis
On Tue, Dec 16, 2003 at 09:27:53AM -0800, Steven M. Schultz wrote:
> 
> Perhaps Richard Ellis could chime in with his experiences with -Q
> ;)

It seems that with the right set of options, and the right set of
input data, -Q can help to create some really nasty looking
artifacts.  

> > And again, son of I didn't realize the parallelization was
> > done based on interlacing settings.
>   
> Looking back on it that makes sense though.   A P frame depends on
> the preceeding P frame - rather sequential in nature since you
> can't move on to the next one without completing the first one...

Based on Andrew's explanation, the P frame dependency chain is how the
artifacts come about: round-off error in the iDCT routines accumulates
from one P frame to the next.  It is made worse by -Q as well as -R 0
and a few other options.





Re: [Mjpeg-users] -M 2/3 on SMP is slower than -M 0

2003-12-16 Thread Richard Ellis
On Tue, Dec 16, 2003 at 12:33:52AM -0700, Slepp Lukwai wrote:
> On Mon, 2003-12-15 at 21:08, Richard Ellis wrote:
> > Additionally, what kind of memory do you have attached to the cpu's? 
> > Mpeg encoding is very memory bandwidth hungry to begin with, and with
> > two cpu's trying to eat at the same trough, a not quite as fast as it
> > should be memory subsystem can produce results like what you are
> > seeing. ...

> ... It's a dual Athlon, which inherently means 266FSB (DDR 266),
> though the memory is actually Hynix PC3200 w/ timings set as low as
> they go on this board (2-2-2), which gives me about 550MB/s memory
> bandwidth according to memtest, with a 13GB/s L1 and something like
> 6 or 8GB/s L2. The cache size is 256k/CPU, 64k L1.  At 550MB/s, it
> SHOULD be able to push enough to keep the frames encoding at 100%
> CPU, in theory.

Yes, but just one 720x480 DVD quality frame is larger than 256k in
size, so a 256k cache per CPU isn't helping too much overall
considering how many frames there are in a typical video to be
encoded.  Plus, my experience with Athlons is that they are actually
faster at mpeg2enc encoding than Intel chips of equivalent speed
ratings (the Athlon's 3dnow/mmx implementation is faster) and so they
put a heavier stress on one's memory bandwidth than an equivalent
speed Intel chip would.  It's possible that 275MB/s per CPU (half of
the 550MB/s total) just isn't fast enough to keep up with the rate
that mpeg2enc can consume data on an Athlon.

Of course, Andrew would be much better suited to discuss mpeg2enc's
memory access patterns during encoding, which depending on how it
does go about accessing memory can better make use of the 256k of
cache, or cause the 256k of cache to be constantly thrashed in and
out.

> > FWIW, when my desktop machine was a dual PII-400Mhz box, I almost
> > always had two mpeg2enc threads eating up 97-98%cpu on both PII
> > chips.  The few times both cpu's were not fully saturated at mpeg
> > encoding was when I'd bother them with something silly like browsing
> > the web with mozilla. :)
> 
> Now that's just silly. Why would you hurt the CPUs by running such bloat
> as Mozilla? I can't think of how many times Mozilla has gone nuts on me
> and used 100% CPU without reason, and you can't kill it any normal UI
> way.. Good ol' killall. However, I love it. It's a great browser. Just
> rather hungry at times. I suppose there's a reason the logo is a
> dinosaur. :>

Hmm... Interesting.  I've had it sometimes just stop but never go
nuts with 100% CPU, and although I usually do CLI kill it if need be,
FVWM2's "destroy" window command has never failed to get rid of it if
I don't bother to go CLI to do so.  In fact, FVWM2's "destroy" has
never failed to get rid of anything that went wonky.  It's the X
windows equivalent to a "kill -9" from the CLI.





Re: [Mjpeg-users] -M 2/3 on SMP is slower than -M 0

2003-12-16 Thread Steven M. Schultz

On Tue, 16 Dec 2003, Andrew Stevens wrote:

> Hi all,
> 
> First off a bit of background to the multi-threading in the current stable 
> branch.  First off:
> 
> - Parallelism is primarily frame-by-frame.  This means that the final phases 
> of the encoding lock on completion of the reference frame (prediction and DCT 

If one were using closed and fixed length GOPs would it make
sense to parallelize the encoding of complete GOPs?   Each cpu
could be dispatched a set of N frames that comprise a closed GOP and
a master thread could write the GOPs out in the correct order.
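
A minimal sketch of the GOP-level idea described above, assuming closed, fixed-length GOPs. The names `GOP_SIZE`, `split_into_gops`, `encode_gop` and `encode_parallel` are hypothetical and are not part of mpeg2enc; `encode_gop` is only a stand-in for a real encoder.

```python
from concurrent.futures import ThreadPoolExecutor

GOP_SIZE = 9  # hypothetical fixed, closed GOP length in frames


def split_into_gops(frames, gop_size=GOP_SIZE):
    """Cut the frame list into consecutive fixed-length GOPs."""
    return [frames[i:i + gop_size] for i in range(0, len(frames), gop_size)]


def encode_gop(gop):
    """Hypothetical stand-in for a real encoder: a closed GOP needs no
    frames from outside itself, so it can be encoded in isolation."""
    return ("GOP[%d..%d]" % (gop[0], gop[-1])).encode("ascii")


def encode_parallel(frames, workers=2):
    gops = split_into_gops(frames)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so the "master" writes
        # GOPs in the correct order even if a later one finishes first.
        return b"".join(pool.map(encode_gop, gops))


stream = encode_parallel(list(range(27)))
print(stream)
```

Threads are used here only to keep the sketch short. In CPython, CPU-bound work would need processes or native threads to actually occupy both CPUs.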

But as Andrew mentioned - by the time filters and other processing
are added in, a dual cpu system is pretty well saturated.   Quad cpu
systems are very much a niche (and expensive) item (not to mention 
the noise they make;))

Cheers,
Steven Schultz





Re: [Mjpeg-users] -M 2/3 on SMP is slower than -M 0

2003-12-16 Thread Trent Piepho
On Tue, 16 Dec 2003, Richard Ellis wrote:
> > 6 or 8GB/s L2. The cache size is 256k/CPU, 64k L1.  At 550MB/s, it
> > SHOULD be able to push enough to keep the frames encoding at 100%
> > CPU, in theory.
> 
> Yes, but just one 720x480 DVD quality frame is larger than 256k in
> size, so a 256k cache per CPU isn't helping too much overall
> considering how many frames there are in a typical video to be

A 720x480 4:2:0 frame is about 512KB, at 550MB/sec there is enough memory
bandwidth to encode at about 1000 frames/sec if all you had to do was read the
data.  Obviously the encoder runs somewhat slower than that, so each byte of
data must be accessed multiple times.  That's where the cache helps.
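
The arithmetic behind that estimate can be checked with a short sketch. It assumes 8-bit samples and takes the 550MB/s memtest figure as 550 x 10^6 bytes per second; both are assumptions, not measurements from this thread.

```python
WIDTH, HEIGHT = 720, 480
BYTES_PER_PIXEL_420 = 1.5      # 8-bit luma plus two quarter-size chroma planes
MEM_BANDWIDTH = 550 * 10**6    # assumed: 550MB/s taken as 10^6-byte megabytes

frame_bytes = int(WIDTH * HEIGHT * BYTES_PER_PIXEL_420)
frames_per_sec_read_only = MEM_BANDWIDTH / frame_bytes

print(frame_bytes)                      # bytes in one 4:2:0 frame
print(frame_bytes / 1024)               # the same in KiB
print(round(frames_per_sec_read_only))  # frames/sec if each byte were read once
```

The exact frame size comes out slightly under 512KB, and the read-once rate slightly above 1000 frames/sec, which agrees with the rough figures quoted.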

> Of course, Andrew would be much better suited to discuss mpeg2enc's
> memory access patterns during encoding, which depending on how it
> does go about accessing memory can better make use of the 256k of
> cache, or cause the 256k of cache to be constantly thrashed in and
> out.

I seem to recall that one of the biggest performance bottlenecks of mpeg2enc
is the way it accesses memory.  It runs each step of the encoding process
on an entire frame at a time.  It's much more cache friendly to run every
stage of the encoding process on a single macroblock before moving on to the
next macroblock.
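
The two loop orders can be sketched as follows. This is only an illustration of the ordering, not mpeg2enc code: the three stages are hypothetical stand-ins, and the equivalence holds only when each stage touches a single macroblock (real motion estimation also reads reference-frame data).

```python
def run_frame_at_a_time(macroblocks, stages):
    """Each stage sweeps the whole frame before the next stage starts."""
    data = list(macroblocks)
    for stage in stages:
        data = [stage(mb) for mb in data]
    return data


def run_macroblock_at_a_time(macroblocks, stages):
    """All stages run on one macroblock while it is still hot in cache."""
    out = []
    for mb in macroblocks:
        for stage in stages:
            mb = stage(mb)
        out.append(mb)
    return out


# Hypothetical stand-ins for the real stages (prediction, DCT, quantisation).
stages = [lambda mb: mb - 128, lambda mb: mb * 2, lambda mb: mb // 4]
frame = list(range(1350))  # 45 x 30 macroblocks in a 720x480 frame

same = run_frame_at_a_time(frame, stages) == run_macroblock_at_a_time(frame, stages)
print(same)
```

Both orders produce the same output; the difference is only in how often each macroblock has to be fetched from main memory again.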






Re: [Mjpeg-users] -M 2/3 on SMP is slower than -M 0

2003-12-16 Thread Trent Piepho
On Tue, 16 Dec 2003, Steven M. Schultz wrote:
> > First off a bit of background to the multi-threading in the current stable 
> > branch.  First off:
> > 
> > - Parallelism is primarily frame-by-frame.  This means that the final phases 
> > of the encoding lock on completion of the reference frame (prediction and DCT 
> 
>   If one were using closed and fixed length GOPs would it make
>   sense to parallelize the encoding of complete GOPs?   Each cpu
>   could be dispatched a set of N frames that comprise a closed GOP and
>   a master thread could write the GOPs out in the correct order.

But what about bit allocation?  You need to know how big the last GOP was to
figure out how many bits you can use for the next GOP.






Re: [Mjpeg-users] -M 2/3 on SMP is slower than -M 0

2003-12-16 Thread Steven M. Schultz

On Tue, 16 Dec 2003, Trent Piepho wrote:

> But what about bit allocation?  You need to know how big the last GOP was to
> figure out how many bits you can use for the next GOP.

Well, you know the maximum bitrate allowed (via the -b option) - could
encode each GOP with that limit in mind.   I'm not sure how bits
"carry over" from GOP to GOP.

Nice self-contained chunks of data should parallelize nicely - perhaps 
not that hard to extend to a "cluster".   That'd be fast I'd think.

Cheers,
Steven Schultz





[Mjpeg-users] Re: (S)VCD to DVD w/o re-encoding?

2003-12-16 Thread Matti Haveri
Steven M. Schultz:

 the encoded frame size of a standard SVCD is 480x480.   Legal sizes for
 DVDs are 720x480, 704x480, 352x480 MPEG-2, and 352x240 MPEG-1
 I'd recommend scaling down from 480x480 to 352x480 rather than up to
 704x480.
 I think I got it right - SVCDs have a SAR of 15:11 but the 1/2 DVD size
 would have a SAR of 20:11.
Going from SVCD to CVD/Half-D1 is like going from DV to CVD/Half-D1 
(VCD is similar horizontally).



For example:

PAL SVCD source to CVD/Half-D1 target:

PAL_SVCD_sampling_matrix (width x height) = 480x576
PAL_SVCD_actual_active_picture_size       = 468x576
PAL_SVCD_pixel_aspect_ratio               = (4:3):(468:576) = 128:78
PAL_CVD_sampling_matrix (width x height)  = 352x576
PAL_CVD_actual_active_picture_size        = 351x576
PAL_CVD_aspect_ratio                      = (4:3):(351:576) = 256:117

vertical_conversion_factor
  = target_active_picture_height : source_active_picture_height
  = 576:576 = 1

horizontal_conversion_factor
  = (source_aspect_ratio) : (destination_aspect_ratio)
    x vertical_conversion_factor
  = (128:78) : (256:117) x 1 = 3:4

target_sampling_matrix_width
  = horizontal_conversion_factor x source_sampling_matrix_width
  = (3:4) x 480 = 360

target_sampling_matrix_height
  = vertical_conversion_factor x source_sampling_matrix_height
  = 1 x 576 = 576

The source image must be resampled to 360x576. However, the original 
target resolution was 352x576. Now we need to crop 4 pixels from each 
side of the 360x576 image so that the frame width becomes 352 
pixels (frame height stays at 576 pixels).

--

NTSC SVCD source to CVD/Half-D1 target:

NTSC_SVCD_sampling_matrix (width x height) = 480x480
NTSC_SVCD_actual_active_picture_size       = 474x486
NTSC_SVCD_pixel_aspect_ratio               = (4:3):(474:486) = 108:79
NTSC_CVD_sampling_matrix (width x height)  = 352x480
NTSC_CVD_actual_active_picture_size        = 355.5x486
NTSC_CVD_aspect_ratio                      = (4:3):(355.5:486) = 144:79

vertical_conversion_factor
  = target_active_picture_height : source_active_picture_height
  = 486:486 = 1:1

horizontal_conversion_factor
  = (source_aspect_ratio) : (destination_aspect_ratio)
    x vertical_conversion_factor
  = (108:79) : (144:79) x 1 = 3:4

target_sampling_matrix_width
  = horizontal_conversion_factor x source_sampling_matrix_width
  = (3:4) x 480 = 360

target_sampling_matrix_height
  = vertical_conversion_factor x source_sampling_matrix_height
  = (1:1) x 480 = 480

The source image must be resampled to 360x480. However, the original 
target resolution was 352x480. Now we need to crop 4 pixels from each 
side of the 360x480 image so that the frame width becomes 352 
pixels (frame height stays at 480 pixels).
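
Both calculations can be reproduced with exact fractions. This is a small sketch of the same formulas; the function names `pixel_aspect` and `target_width` are made up for illustration, and the active picture sizes are the ones given above.

```python
from fractions import Fraction


def pixel_aspect(active_width, active_height, display_aspect=Fraction(4, 3)):
    """Display aspect divided by the active picture's width:height ratio."""
    return display_aspect / (Fraction(active_width) / Fraction(active_height))


def target_width(src_active, dst_active, src_matrix_width):
    """Apply the horizontal conversion factor to the source matrix width."""
    vertical = Fraction(dst_active[1]) / Fraction(src_active[1])
    horizontal = pixel_aspect(*src_active) / pixel_aspect(*dst_active) * vertical
    return horizontal * src_matrix_width


# PAL: SVCD active picture 468x576, CVD active picture 351x576
pal_width = target_width((468, 576), (351, 576), 480)
# NTSC: SVCD active picture 474x486, CVD active picture 355.5x486
ntsc_width = target_width((474, 486), (Fraction(711, 2), 486), 480)

crop_each_side = (pal_width - 352) / 2
print(pal_width, ntsc_width, crop_each_side)
```

Using `Fraction` avoids rounding, so the 3:4 factor and the 360-pixel width come out exactly for both PAL and NTSC.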

--
Matti Haveri <[EMAIL PROTECTED]> 


Re: [Mjpeg-users] -M 2/3 on SMP is slower than -M 0

2003-12-16 Thread Andrew Stevens


> Produces this (approximately 1010 frames), encoding times (real time /
> user time, gives a bit of a view as to how busy the CPUs were during the
> real time, optimal should be 1m realtime, 2m user time, right? and
> average system time was 3.0s, with +/- 0.2s for all tests):
...

Yep.   You should (in theory) get a lot closer to that with the current 
MPEG_DEVEL branch mpeg2enc.   However, your scaling is really remarkably bad 
as even the -R 2 values where two CPUs should be fairly busy are unusually 
bad.  I've never heard of worse than 70% utilisation on dual CPU machines.
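
For reference, the utilisation figure implied by a `time` result can be computed as below. This is a rough sketch: the function name is made up, and the second example is a hypothetical case rather than one of the timings reported in this thread.

```python
def cpu_utilisation(real_s, user_s, sys_s=0.0, cpus=2):
    """Fraction of the available CPU time actually spent working."""
    return (user_s + sys_s) / (real_s * cpus)


# The "1m real time, 2m user time" optimum on a dual-CPU machine.
ideal = cpu_utilisation(real_s=60.0, user_s=120.0)
# Hypothetical: the same work done entirely on one CPU.
single = cpu_utilisation(real_s=120.0, user_s=120.0)

print(ideal, single)
```

A result near 0.5 on a dual-CPU box means the second CPU is contributing essentially nothing.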

Here's a fairly typical snapshot of mpeg2enc -M 2 -I 1 -R 2 in action on my 
dual P-III machine...

  PID USER PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
12620 as    18   0 46464  45M   768 R    80.9 24.3   0:18 lt-mpeg2enc
12621 as    18   0 46464  45M   768 R    70.8 24.3   0:18 lt-mpeg2enc
12619 as     9   0 46464  45M   768 S     3.9 24.3   0:01 lt-mpeg2enc

You're getting very very symmetrical CPU loads and very very poor utilisation.  
What kernel are you using... I vaguely recall the 2.6.x series radically changed 
the threading libs.  It could be something pathological is happening in the 
scheduling.  

The 2100+ is of course a lot faster than the P-III, but I doubt the balance 
between the motion estimation and the rest of the code is hugely shifted.  
Certainly, the approximate proportions of time spent in each are quite similar 
on my 2100+ single-CPU machine and a P-III.


> Also, encoding with one B frame is a touch faster in -I 1 mode than
> encoding without them, but it is slower when you encode two B frames
> instead of just one. I find this interesting.. I would have expected a
> single B frame to take a bit longer than none at all, and that is the
> case when -I 0 is on, but not when it's -I 1. Any ideas on that one?

Not really. However, I would expect going to two B frames to greatly increase 
your CPU utilisation without much wall-clock time increase, due to the increased 
scope for parallel computation.

> In the end -M 3 is not reasonably faster in -I 0 -R 0, but flys along at
> -I 0 -R 2 compared to baseline, and gets fair gains at -I 0 -R 1, while
> dropping encoding time by another 14 seconds for the same frameset.
This is what you'd expect: -R 2 offers much more scope for the 3 worker 
threads of -M 3 to do something useful.

> The numbers on -M 3 -I 1 -R 2 show a 54 second improvement over the
> tests with -M 0, but it takes almost 50% longer than -M 3 -I 0 -R 1. The
> file size of 3-1-2 is 13,807,067 and the file size of 3-0-1 is
> 13,402,673. The file is smaller, and is encoded faster, and viewing them
> now, the quality is at least on par (3-0-1 looked a tad better).

The usefulness of B frames depends a *lot* on the type of material.  For 
captured stuff they rarely buy you much apart from free room heating from 
your CPU. Hence the provision of -R 0 ;-).  They should get a little more 
useful when I add dynamic frame type selection to mpeg2enc in the new year.


> > - There is also a parallel read-ahead thread but this rarely soaks much
> > CPU on modern CPUs.

Weirdly enough, on your machine the reader thread is exceedingly busy.

> > The MPEG_DEVEL branch encoder stripes all encoding phases to allow much
> > more scalable parallelisation.  You might want to give it a go - I'd be
> > interested in the results!
>
> I'd love to, but I couldn't find it in CVS. I found everything else in
> the SF CVS branch, but not mjpegtools itself.

cvs co -d :ext:[EMAIL PROTECTED]:/cvsroot/mjpeg mjpeg_play
cd mjpeg_play
cvs update -r MPEG_DEVEL mpeg2enc

The 'mjpeg_play' is a bit of a historical oddity but it is monumentally 
painful to change directory names in CVS...

Andrew






Re: [Mjpeg-users] -M 2/3 on SMP is slower than -M 0

2003-12-16 Thread Andrew Stevens
Hi Steven,  Trent,

> But what about bit allocation?  You need to know how big the last GOP was
> to figure out how many bits you can use for the next GOP.

Actually, this is not such a big deal provided the GOPs are well separated.  
Simplifying a little, you just need to ensure that the decoder buffer 
fullness at the end of each 'chunk' is >= the fullness you assumed when 
you started to encode its successor.
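
That condition can be written as a simple check over independently encoded chunks. This is a sketch only: `chunks_join_safely` is a hypothetical helper and the fullness numbers are invented for illustration, not taken from mpeg2enc.

```python
def chunks_join_safely(chunks):
    """chunks: list of (assumed_start_fullness, actual_end_fullness) in bits,
    in stream order. Each chunk was encoded on its own, assuming the decoder
    buffer held `assumed_start_fullness` bits when the chunk began."""
    for (_, end_fullness), (next_start, _) in zip(chunks, chunks[1:]):
        if end_fullness < next_start:
            return False  # successor assumed more buffered data than exists
    return True


ok = chunks_join_safely([(1000000, 1200000), (1000000, 1000000), (900000, 950000)])
bad = chunks_join_safely([(1000000, 800000), (1000000, 1100000)])
print(ok, bad)
```

In the second list the first chunk ends with less in the buffer than its successor assumed, so the joined stream could underflow the decoder buffer.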

However, this idea came to mind more as a sneaky way of doing accurately sized 
single-pass encoding: work on multiple 'segments' spread across the video 
sequence so you get a good statistical sample of how your total 
bit-consumption is going relative to your target.  This is rotten for 
parallelism though, because you have two more or less totally uncorrelated 
memory footprints.  For DVD, 'segments' would kind of naturally correlate with 
'chapters' at the authoring level.

In the MPEG_DEVEL branch encoding of each frame (apart from the bit-packed 
coding and bit allocation which is only a small fraction of the CPU load) is 
simply striped across the available CPUs.  This has a nice side effect of 
reducing each CPU's working set too as it only deals with a fraction of a 
frame.

Having said all that I'll probably simply do a simple two-pass encoding mode 
first (much simpler frame feeding!).


> > Of course, Andrew would be much better suited to discuss mpeg2enc's
> > memory access patterns during encoding, which depending on how it
> > does go about accessing memory can better make use of the 256k of
> > cache, or cause the 256k of cache to be constantly thrashed in and
> > out.
>
> I seem to recall that one of the biggest performance bottlenecks of
> mpeg2enc is the way it accesses memory.  It runs each step of the encoding
> process on an entire frame at a time.  It's much more cache friendly to run
> every stage of the encoding process on a single macroblock before moving on
> to the next macroblock.

The single-macroblock approach has been implemented for quite some time now 
(since the move to C++ roughly).  In rather basic English speed improved 
by... bugger all.  I was *most* surprised.  It could well be that the story is 
rather different on multi-CPU machines.  At least I like to hope the work 
wasn't wasted ;-)

Actually, the memory footprint of encoding is much larger than you'd think.  
Remember each 16x16 int16_t difference macroblock gets generated from nastily 
unaligned 16x16 or 16x8 uint8_t predictors and a 16x16 uint8_t picture 
macroblock.  The difference is then DCT-ed in place into 4 8x8 int16_t DCT 
blocks which are then quantised into 4 8x8 int16_t quantised DCT blocks.
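
A back-of-the-envelope count of the buffers listed above, as a sketch. It counts luma only and a single 16x16 predictor per macroblock, which are simplifying assumptions; the real footprint is larger.

```python
INT16, UINT8 = 2, 1  # bytes per sample

predictor = 16 * 16 * UINT8       # one 16x16 predictor (more if several are tried)
picture_mb = 16 * 16 * UINT8      # the source picture macroblock
difference = 16 * 16 * INT16      # DCT-ed in place, so no separate DCT buffer
quantised = 4 * 8 * 8 * INT16     # separately stored quantised blocks

per_macroblock = predictor + picture_mb + difference + quantised
macroblocks_per_frame = (720 // 16) * (480 // 16)
per_frame = per_macroblock * macroblocks_per_frame

print(per_macroblock, macroblocks_per_frame, per_frame)
```

Even under these assumptions the data touched per frame is several times the 345,600 bytes of luma in a 720x480 picture, and far more than a 256k cache.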

Where mpeg2enc could speed up is:

- DCT blocks are in 'correct' and not transposed form.  This is simply a waste 
as by transposing quantiser matrices and the scan sequence you can simply 
skip this.

- Each quantised DCT block is separately stored.  Nice for debugging, poor for 
memory performance ;-)

- DCT is not combined with quantisation when this is possible.

- Motion estimation (probably wastefully) computes a lot of variances that 
could probably better be replaced by SAD for fast encoding modes.

- The current GOP sizing approach is wasteful.   Frame type should only be 
decided once the best encoding mode (Intra, various inter motion prediction 
modes) is known.  Basically, you turn a B/P frame into an I frame if you've 
reached your GOP length limit or it has enough Intra coded blocks that it is 
more compact that way.   Unfortunately, the current allocation algorithm 
still has a few 'left over' elements that need to know GOP size in advance 
that need to be replaced before this can be fixed.   I'm currently working on 
bit-allocation (basically, a two-pass / look-ahead mode plus the above 
improvement).
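
To illustrate the first point in the list above: if the DCT output is left transposed, transposing the quantiser matrix and the scan sequence once gives the same coefficient sequence. This is a toy sketch with made-up coefficient and quantiser values and a simple truncating division, not mpeg2enc's actual quantiser.

```python
N = 8


def transpose(m):
    return [list(row) for row in zip(*m)]


# Zigzag scan: walk the anti-diagonals, alternating direction.
zigzag = sorted(((r, c) for r in range(N) for c in range(N)),
                key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))

# Hypothetical coefficient block and quantiser matrix, for illustration only.
coeffs = [[(r * 31 + c * 17) % 200 - 100 for c in range(N)] for r in range(N)]
quant = [[16 + r + 2 * c for c in range(N)] for r in range(N)]


def quantise_and_scan(block, matrix, scan):
    return [int(block[r][c] / matrix[r][c]) for r, c in scan]


reference = quantise_and_scan(coeffs, quant, zigzag)

# DCT output left transposed: transpose the matrix and the scan once, up front.
coeffs_t = transpose(coeffs)
quant_t = transpose(quant)
zigzag_t = [(c, r) for r, c in zigzag]
from_transposed = quantise_and_scan(coeffs_t, quant_t, zigzag_t)

print(reference == from_transposed)
```

The matrix and scan tables are transposed once at start-up, whereas un-transposing the DCT output would cost work on every block.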

A similar approach can be used for deciding B/P frame selection but this is 
expensive in CPU as you basically have to encode each potential B frame's 
reference frame twice.  I'm playing around with ideas for trying B frames out 
and if they don't seem worthwhile turning them off and then periodically 
checking if it might make sense to turn them on again.


Andrew





Re: [Mjpeg-users] -M 2/3 on SMP is slower than -M 0

2003-12-16 Thread Richard Ellis
On Tue, Dec 16, 2003 at 12:45:48PM -0800, Trent Piepho wrote:
> On Tue, 16 Dec 2003, Richard Ellis wrote:
> > > 6 or 8GB/s L2. The cache size is 256k/CPU, 64k L1.  At 550MB/s,
> > > it SHOULD be able to push enough to keep the frames encoding at
> > > 100% CPU, in theory.
> > 
> > Yes, but just one 720x480 DVD quality frame is larger than 256k
> > in size, so a 256k cache per CPU isn't helping too much overall
> > considering how many frames there are in a typical video to be
> 
> A 720x480 4:2:0 frame is about 512KB, at 550MB/sec there is enough
> memory bandwidth to encode at about 1000 frames/sec if all you had
> to do was read the data.  Obviously the encoder runs somewhat
> slower than that, so each byte of data must be accessed multiple
> times.  That's where the cache helps.

With motion estimation each byte would end up being accessed more
than once for each new "radius" that was examined.  Plus motion
estimation is between at least two frames, so we are dealing with at
least about 1M of data to be accessed eventually in the course of
encoding one frame.

> > Of course, Andrew would be much better suited to discuss
> > mpeg2enc's memory access patterns during encoding, which
> > depending on how it does go about accessing memory can better
> > make use of the 256k of cache, or cause the 256k of cache to be
> > constantly thrashed in and out.
> 
> I seem to recall that one of the biggest performance bottlenecks of
> mpeg2enc is the way it accesses memory.  It runs each step of the
> encoding process on an entire frame at a time.  It's much more
> cache friendly to run every stage of the encoding process on a single
> macroblock before moving on to the next macroblock.

In that case it will kill the majority of the performance benefit
provided by the caches, because there's very little locality of
reference for the cache to exploit.  It moves through at least
512k for pass one, then through the same 512k again for pass two, but
the data in the cache is from the end of the frame, and we are
starting over at the beginning of the frame.  Massive cache thrash in
that case.  Memory bandwidth becomes a much more limiting factor.





[Mjpeg-users] Which matrix for (S)VHS

2003-12-16 Thread Al Bogner
Which matrix would you use for (S)VHS-sources?


# TMPEGEnc NON-INTRA table
16,17,18,19,20,21,22,23
17,18,19,20,21,22,23,24
18,19,20,21,22,23,24,25
19,20,21,22,23,24,26,27
20,21,22,23,25,26,27,28
21,22,23,24,26,27,28,30
22,23,24,26,27,28,30,31
23,24,25,27,28,30,31,33

-K tmpgenc?

Al




Re: [Mjpeg-users] Which matrix for (S)VHS

2003-12-16 Thread Steven M. Schultz

On Wed, 17 Dec 2003, Al Bogner wrote:

> Which matrix would you use for (S)VHS-sources?
> 
> -K tmpgenc?

That'll work fine.   That is a good middle ground between the
default tables and the kvcd (-K kvcd) tables.

Which one to use is a matter of playtime and quality.   The kvcd
matrices are more aggressive about discarding high frequency
information, which might result in softer images (less
detail on edges for example) but will give more playing time.   For
casual viewing it might be entirely adequate.

On average -K tmpgenc lowered the bitrate about 10% from the
default and -K kvcd about 16%.   It depends on how low you're
setting -q (how  close to the limit the encoder is being pushed)
of course.
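
As a rough illustration of what those percentages mean for playing time on a fixed-size disc (a sketch that assumes playing time is simply inversely proportional to average bitrate; the function name is made up):

```python
def playtime_gain(bitrate_reduction):
    """Extra playing time on a fixed-size disc for a fractional bitrate cut."""
    return 1.0 / (1.0 - bitrate_reduction) - 1.0


tmpgenc_gain = playtime_gain(0.10)  # -K tmpgenc, about 10% lower bitrate
kvcd_gain = playtime_gain(0.16)     # -K kvcd, about 16% lower bitrate

print("%.1f%% %.1f%%" % (tmpgenc_gain * 100, kvcd_gain * 100))
```

So a 10% bitrate cut buys roughly 11% more playing time, and a 16% cut roughly 19%.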

For noisier material the thing to do is turn off B frames with
-R 0.   That is a big win - almost "for free" in that it lowers
the bitrate without a tradeoff for quality.

Less effective, but doesn't cost any cpu time extra really, is 
"-E".   Try a modest value of "-E -8" or -10.  

Cheers,
Steven Schultz





Re: [Mjpeg-users] -M 2/3 on SMP is slower than -M 0

2003-12-16 Thread Bernhard Praschinger
Hello

> On Tue, 2003-12-16 at 12:57, Bernhard Praschinger wrote:
> > Could you run a few tests, please?  Get some frames (100-1000) in YUV
> > format. I guess that should be possible even with transcode. ;)
> > (I do not use transcode so I can't help there; alternatively, get the
> > test streams from mjpeg.sf.net)
> 
> With about 1010 frames of YUV using < to dump it in (instead of cat), I
> get these:
> 
> -M 0: 2m 11.9s
> -M 1: 2m 10.6s, -1.3s
> -M 2: 1m 27.7s, -44.2s
> -M 3: 1m 26.5s, -45.4s
Those values look much better.  :-)
Now you have seen that mpeg2enc can go faster.
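
The quoted timings can be turned into speedup factors with a few lines. The timings are the ones reported above; the variable names are only for illustration.

```python
timings = {0: 131.9, 1: 130.6, 2: 87.7, 3: 86.5}  # -M value -> wall-clock seconds

baseline = timings[0]
for m, secs in sorted(timings.items()):
    print("-M %d: %5.1fs  saved %4.1fs  speedup %.2fx"
          % (m, secs, baseline - secs, baseline / secs))

speedup_m3 = baseline / timings[3]
```

A speedup of about 1.5x on two CPUs corresponds to roughly 75% of the ideal 2x.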

I have tried the command you used on my machine, and I have seen the 
same "problem": also 3 processes, each at only 33%.

(time lav2yuv n1000.eli | mpeg2enc -I 0 -f 8 -b 9800 -p -a 3 -o test.m2v
-S  -M 3 -g 9 -G 18 -4 2 -2 1 -r 32 -q 4 -Q 3.0 -K kvcd -R 0)


> Note that I responded in an earlier message with a total of 24 timings
> across -M 0-3 -I 0-1 -R 0-2 settings, which turned up some interesting
> results that -M 3 -I 0 -R 1 worked fastest of all of them (same source
> material I used for the above, and it took 51 seconds). So, I think the
> -I 1 is on, which makes a huge boost in -M ratings from 0 to 3, but it
> is still quite a bit slower than -I 0 (which I use since the input is
> Progressive 23.976fps)
That's strange.

> > Afterwards, do something like this:
> > cat stream.yuv | mpeg2enc -f8 -M 0-3 -o test.m2v
> > or
> > lav2yuv stream.avi | mpeg2enc -f 8 -M 0-3 -o test.m2v
> >
> > That way you can be sure that nothing else is causing trouble. Then check
> > how it is going. That should not take too long. After that you can add
> > the options you used, one at a time, to see if any of them causes the
> > problem of the framerate not increasing.
> Compared to the run with my long options line, these are 
I'm just running some encodings to see which option causes the problem. 

On my machine the -R 0 caused the problem. If I used -R 1 or 2, or no -R
option, I got 3 processes each using about 45-50%. 

> > Bad. Which board do you have? (Mine is a Tyan Tiger MPX.)
> Nice board, that one. Asus A7M-266D.. I should've grabbed the MSI K7D
> Master for the same price, I hear much nicer things about it.


> > My brain had given up the time I started my computer that evening ;)
> Mine usually does that at about 8am. :>
Just as you get to work? ;)

> > But I'm not really knowing why the situation is that bad.
> 
> I'm just not seeing the dual CPU usage that would warrant even running
> in multiple threads, when I could instead transcode two entirely
> separate items as though I had two machines, which makes some sense (I
> did that the other day, worked rather well). But, if I can make a single
> copy work by flooding both CPUs with activity, then I'll be happier,
> since it should take quite a bit less time to encode a full movie.
Encoding without the -R 0 seems to solve the problem, for now.


Hopefully see you soon,

Berni the Chaos of Woodquarter

Email: [EMAIL PROTECTED]
www: http://www.lysator.liu.se/~gz/bernhard

