> In floating point, all you have to do is clear the sign bit. But with
> integers, it's not so easy. There is no instruction for absolute value in
> MMX; you have to use a four-instruction sequence and two registers. That is
> slower than squaring a value, which only takes two instructions.
> I finally
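For integers, one common branch-free absolute value copies the value, arithmetic-shifts the copy right to turn the sign bit into an all-zeros or all-ones mask, XORs with the mask, and subtracts the mask. In MMX that maps roughly onto movq, psraw, pxor and psubw over two registers, which may be the four-instruction sequence meant above, though the message does not say which one. The C sketch below emulates a single 16-bit lane. The function names are hypothetical, and the code assumes the compiler's right shift of a negative value is arithmetic, as it is on GCC and Clang.

```c
#include <stdint.h>

/* Branch-free absolute value of one signed 16-bit lane, mirroring the
 * usual four-step MMX idiom (copy, psraw 15, pxor, psubw).
 * Assumes arithmetic right shift of negative values (GCC/Clang behaviour).
 * abs16(INT16_MIN) wraps back to INT16_MIN, just as psubw would. */
static int16_t abs16(int16_t x)
{
    int16_t mask = (int16_t)(x >> 15);    /* 0 if x >= 0, -1 (all ones) if x < 0 */
    return (int16_t)((x ^ mask) - mask);  /* negative: invert bits, then add 1 */
}

/* Squaring, by contrast, is one multiply once the operand has been copied
 * (roughly movq + pmullw or pmaddwd in MMX). */
static int32_t square16(int16_t x)
{
    return (int32_t)x * x;
}
```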
On 19 Dec 2003, Florin Andrei wrote:
> On Fri, 2003-12-19 at 01:49, Steven M. Schultz wrote:
>
> > At any rate I checked out ffmpeg's mpeg2 encoding vs mpeg2enc on
> > my G4 Powerbook. Yes, ffmpeg has a big speed advantage (~2x) but
The difference is even larger than I thought.
On Fri, 2003-12-19 at 01:49, Steven M. Schultz wrote:
> At any rate I checked out ffmpeg's mpeg2 encoding vs mpeg2enc on
> my G4 Powerbook. Yes, ffmpeg has a big speed advantage (~2x) but
> the resulting output is 'grainy' (same bitrate, no B frames) (and the
> rate contro
On Fri, Dec 19, 2003 at 01:34:38AM -0800, Trent Piepho wrote:
> On Fri, 19 Dec 2003, Andrew Stevens wrote:
> > The next bottlenecks would be the run-length coding and the use
> > of variance instead of SAD in motion compensation mode and DCT
> > mode selection. Sadly
>
> Is SAD really any faster
On Fri, 19 Dec 2003, Steven M. Schultz wrote:
> On Fri, 19 Dec 2003, Trent Piepho wrote:
>
> > On Fri, 19 Dec 2003, Andrew Stevens wrote:
> >
> > Is SAD really any faster to calculate than variance? SAD uses an absolute
> > value-add operation while variance is multiply-add. Multiply-add is usu
On Fri, 19 Dec 2003, Trent Piepho wrote:
> On Fri, 19 Dec 2003, Andrew Stevens wrote:
>
> Is SAD really any faster to calculate than variance? SAD uses an absolute
> value-add operation while variance is multiply-add. Multiply-add is usually
> the most heavily optimized operation a cpu can per
On Fri, 19 Dec 2003, Andrew Stevens wrote:
> The next bottlenecks would be the run-length coding and the use of variance
> instead of SAD in motion compensation mode and DCT mode selection. Sadly
Is SAD really any faster to calculate than variance? SAD uses an absolute
value-add operation whil
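To make the comparison concrete, here is a sketch of the two block metrics under discussion, computed over a 16x16 block of 8-bit samples against its prediction. The SAD loop needs an absolute value per pixel; the variance loop needs a multiply-add per pixel plus a running sum for the mean. The block size, sample depth and function names are assumptions for illustration, not mpeg2enc's actual code.

```c
#include <stdint.h>
#include <stdlib.h>

#define MB 16  /* assumed block size: 16x16 samples */

/* Sum of absolute differences between a block and its prediction. */
static int32_t block_sad(const uint8_t *cur, const uint8_t *pred, int stride)
{
    int32_t sad = 0;
    for (int y = 0; y < MB; y++)
        for (int x = 0; x < MB; x++)
            sad += abs(cur[y * stride + x] - pred[y * stride + x]);
    return sad;
}

/* Variance of the prediction error, scaled by the pixel count N = 256:
 * N*var = sum(d^2) - (sum d)^2 / N.  The inner loop is a multiply-add. */
static int32_t block_error_variance(const uint8_t *cur, const uint8_t *pred, int stride)
{
    int32_t sum = 0, sumsq = 0;
    for (int y = 0; y < MB; y++)
        for (int x = 0; x < MB; x++) {
            int d = cur[y * stride + x] - pred[y * stride + x];
            sum += d;
            sumsq += d * d;
        }
    /* (sum d)^2 can exceed 32 bits, so square in 64-bit arithmetic. */
    return sumsq - (int32_t)(((int64_t)sum * sum) / (MB * MB));
}
```

The sketch also shows that the two measures are not interchangeable: a uniform brightness offset between block and prediction gives a large SAD but zero variance, so they can rank candidate modes differently, not just at different cost.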
On Tuesday 16 December 2003 23:35, Richard Ellis wrote:
Hi Richard,
> In that case it will kill the majority of the performance benefit
> provided by the caches, because there's very little locality of
> reference for the cache to compensate for. It moves through at least
> 512k for pass one, the
On Tue, Dec 16, 2003 at 06:54:22PM -0700, Slepp Lukwai wrote:
> As a side note, I'm also using a 200Hz timer, instead of the standard
> 100Hz. Though I don't see this doing anything but making it quicker, as
> it reduces latency on scheduling, while slightly increasing scheduler
> overhead and cont
On Tue, 2003-12-16 at 23:17, Bernhard Praschinger wrote:
> > -M 0: 2m 11.9s
> > -M 1: 2m 10.6s, -1.3s
> > -M 2: 1m 27.7s, -44.2s
> > -M 3: 1m 26.5s, -45.4s
> Those values look much better. :-)
> Now you have seen that mpeg2enc can go faster.
It's like it used to be. :> I'm going to try it on a full
Just a side note: I find it interesting that your name is Andrew Stevens,
whereas mine is Stephen Andrew (middle name).
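The -M timings quoted above can be turned into speedups relative to the single-threaded -M 0 run. The sketch below only does that arithmetic; the helper names are hypothetical.

```c
/* Convert an "XmY.Ys" wall-clock time into seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Speedup of a run relative to a baseline run (> 1 means faster). */
static double speedup(double baseline_s, double run_s)
{
    return baseline_s / run_s;
}
```

With the figures above, -M 2 comes out at about 1.50x (131.9 s / 87.7 s) and -M 3 at about 1.52x (131.9 s / 86.5 s), against an ideal of 2x on a dual-CPU machine.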
On Tue, 2003-12-16 at 14:41, Andrew Stevens wrote:
> Yep. You should (in theory) get a lot closer to that with the current
> MPEG_DEVEL branch mpeg2enc. However, your scaling is
On Tue, 2003-12-16 at 10:27, Steven M. Schultz wrote:
> On Tue, 16 Dec 2003, Slepp Lukwai wrote:
>
> > Tried it without any options, same effect. I'm definitely seeing nowhere
> > near 40% speedup, which is what boggles me. I expected at least
> > reasonable gains of 25%.
>
> I think that h
On Tue, 2003-12-16 at 13:15, Richard Ellis wrote:
> On Tue, Dec 16, 2003 at 12:33:52AM -0700, Slepp Lukwai wrote:
> >.. It's a dual Athlon, which inherently means 266FSB (DDR 266),
> > though the memory is actually Hynix PC3200 w/ timings set as low as
> > they go on this board (2-2-2), which gives
On Tue, 2003-12-16 at 12:57, Bernhard Praschinger wrote:
> Could you run a few tests (please)? Get some frames (100-1000) in yuv
> format. I guess that should be possible even with transcode. ;)
> (I do not use transcode, so I can't help there; alternatively, get the
> test streams from mjpeg.sf.net)
With about 1010 fr
On Tue, 2003-12-16 at 12:33, Andrew Stevens wrote:
> Hi all,
>
> First, a bit of background on the multi-threading in the current stable
> branch:
>
> - Parallelism is primarily frame-by-frame. This means that the final phases
> of the encoding lock on completion of the referenc
Hello
> On Tue, 2003-12-16 at 12:57, Bernhard Praschinger wrote:
> > Could you run a few tests (please)? Get some frames (100-1000) in yuv
> > format. I guess that should be possible even with transcode. ;)
> > (I do not use transcode, so I can't help there; alternatively, get the
> > test streams from mjpeg.sf.net)
>
On Tue, Dec 16, 2003 at 12:45:48PM -0800, Trent Piepho wrote:
> On Tue, 16 Dec 2003, Richard Ellis wrote:
> > > 6 or 8GB/s L2. The cache size is 256k/CPU, 64k L1. At 550MB/s,
> > > it SHOULD be able to push enough to keep the frames encoding at
> > > 100% CPU, in theory.
> >
> > Yes, but just one
Hi Steven, Trent,
> But what about bit allocation? You need to know how big the last GOP was
> to figure out how many bits you can use for the next GOP.
Actually, this is not such a big deal provided the GOPs are well separated.
Simplifying a little, you just need to ensure that you have >= t
> Produces this (approximately 1010 frames), encoding times (real time /
> user time, gives a bit of a view as to how busy the CPUs were during the
> real time, optimal should be 1m realtime, 2m user time, right? and
> average system time was 3.0s, with +/- 0.2s for all tests):
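The quoted rule of thumb (on two CPUs, an ideal run shows twice as much user time as wall-clock time) amounts to a utilization ratio: user CPU time divided by wall-clock time multiplied by the CPU count. A minimal sketch with a hypothetical helper name; like the quoted figures, it leaves system time out.

```c
/* Fraction of the available CPU capacity spent in user mode:
 * 1.0 means every CPU was busy for the whole wall-clock interval. */
static double cpu_utilization(double real_s, double user_s, int ncpus)
{
    return user_s / (real_s * ncpus);
}
```

For the ideal case in the question, 1 minute real and 2 minutes user on two CPUs gives a utilization of 1.0.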
...
Yep. You sh
On Tue, 16 Dec 2003, Trent Piepho wrote:
> But what about bit allocation? You need to know how big the last GOP was to
> figure out how many bits you can use for the next GOP.
Well, you know the maximum bitrate allowed (via the -b option) - could
encode each GOP with tha
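As a rough illustration of budgeting each GOP independently from the maximum bitrate, the per-GOP budget is bitrate × GOP length / frame rate. This is a simplification that ignores VBV buffer occupancy; the function name and the numbers in the usage note are assumptions. The function takes bits per second, so a -b value given in kbit/s would need multiplying by 1000 first.

```c
#include <stdint.h>

/* Upper bound on the bits for one GOP if every GOP is budgeted
 * independently at the maximum bitrate. Ignores VBV buffer state. */
static int64_t gop_bit_budget(int64_t bits_per_second, int gop_frames,
                              double frames_per_second)
{
    return (int64_t)((double)bits_per_second * gop_frames / frames_per_second);
}
```

For example, at 8,000,000 bit/s, a 15-frame GOP at 30 frames/s gets 4,000,000 bits.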
On Tue, 16 Dec 2003, Steven M. Schultz wrote:
> > First, a bit of background on the multi-threading in the current stable
> > branch:
> >
> > - Parallelism is primarily frame-by-frame. This means that the final phases
> > of the encoding lock on completion of the reference frame
On Tue, 16 Dec 2003, Richard Ellis wrote:
> > 6 or 8GB/s L2. The cache size is 256k/CPU, 64k L1. At 550MB/s, it
> > SHOULD be able to push enough to keep the frames encoding at 100%
> > CPU, in theory.
>
> Yes, but just one 720x480 DVD quality frame is larger than 256k in
> size, so a 256k cache
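The arithmetic behind that point: a 720x480 frame in 8-bit 4:2:0 (a full-size luma plane plus two chroma planes subsampled 2x in each direction) is 518,400 bytes, and the luma plane alone is 345,600 bytes, both well over a 256 KiB L2. The 8-bit 4:2:0 layout is an assumption here, though it is what DVD MPEG-2 video uses; the helper names are hypothetical.

```c
#include <stddef.h>

/* Bytes in one 8-bit 4:2:0 frame: a full-size luma plane plus
 * two chroma planes subsampled 2x horizontally and vertically. */
static size_t frame_bytes_420(size_t width, size_t height)
{
    size_t luma = width * height;
    size_t chroma = 2 * (width / 2) * (height / 2);
    return luma + chroma;
}

/* Nonzero if a buffer of this size cannot fit in the given cache. */
static int exceeds_cache(size_t bytes, size_t cache_bytes)
{
    return bytes > cache_bytes;
}
```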
On Tue, 16 Dec 2003, Andrew Stevens wrote:
> Hi all,
>
> First, a bit of background on the multi-threading in the current stable
> branch:
>
> - Parallelism is primarily frame-by-frame. This means that the final phases
> of the encoding lock on completion of the reference fram
On Tue, Dec 16, 2003 at 12:33:52AM -0700, Slepp Lukwai wrote:
> On Mon, 2003-12-15 at 21:08, Richard Ellis wrote:
> > Additionally, what kind of memory do you have attached to the CPUs?
> > Mpeg encoding is very memory bandwidth hungry to begin with, and with
> > two CPUs trying to eat at the sam
On Tue, Dec 16, 2003 at 09:27:53AM -0800, Steven M. Schultz wrote:
>
> Perhaps Richard Ellis could chime in with his experiences with -Q
> ;)
It seems that with the right set of options, and the right set of
input data, -Q can help to create some really nasty looking
artifacts.
> > And again,
Hello
> Top output of the 3 running mpeg2enc with mjpegtools 1.6.1.92 on the
> Dual Athlon MP 2100+. That's with -M3. Top usage is 2% and the decoder
> is only about 10% intermittent. So, I'm neglecting those for the moment.
> I'm using transcode, by the way (though I found the same results when
>
Hi all,
First, a bit of background on the multi-threading in the current stable
branch:
- Parallelism is primarily frame-by-frame. This means that the final phases
of the encoding lock on completion of the reference frame (prediction and DCT
transform) and the predecessor (bit
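A minimal pthreads sketch of frame-by-frame parallelism along the lines described above, under simplifying assumptions of its own: no B frames, so each frame's reference is just the previous frame; workers take frames in order; the prediction/DCT stage of frame f waits for frame f-1's reconstruction, and the bitstream stage waits for frame f-1's bits, so output stays in order. The phase bodies are placeholders and every name is hypothetical; this is not mpeg2enc's code.

```c
#include <pthread.h>

#define NFRAMES 8

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t progress = PTHREAD_COND_INITIALIZER;
static int recon_done[NFRAMES]; /* prediction + DCT + reconstruction finished */
static int bits_done[NFRAMES];  /* bitstream for the frame written */
static int next_frame;          /* next frame to hand to a worker */
static int output_order[NFRAMES];
static int output_count;

/* With the lock held, wait until *flag becomes nonzero. */
static void wait_flag(const int *flag)
{
    while (!*flag)
        pthread_cond_wait(&progress, &lock);
}

/* With the lock held, set *flag and wake all waiting workers. */
static void set_flag(int *flag)
{
    *flag = 1;
    pthread_cond_broadcast(&progress);
}

static void *worker(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&lock);
        int f = next_frame < NFRAMES ? next_frame++ : -1;
        pthread_mutex_unlock(&lock);
        if (f < 0)
            return NULL;

        /* Motion estimation against original frames could run here unlocked. */

        pthread_mutex_lock(&lock);
        if (f > 0)
            wait_flag(&recon_done[f - 1]);  /* need the reconstructed reference */
        pthread_mutex_unlock(&lock);

        /* ... prediction + DCT + reconstruction of frame f (placeholder) ... */

        pthread_mutex_lock(&lock);
        set_flag(&recon_done[f]);
        if (f > 0)
            wait_flag(&bits_done[f - 1]);   /* bitstream must stay in frame order */
        output_order[output_count++] = f;   /* stands in for the bit output; a real
                                               encoder would code outside the lock */
        set_flag(&bits_done[f]);
        pthread_mutex_unlock(&lock);
    }
}

/* Encode NFRAMES frames with up to 16 worker threads. */
static void run_pipeline(int nthreads)
{
    pthread_t tid[16];
    if (nthreads > 16)
        nthreads = 16;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
}
```

Because frames are handed out in increasing order and each frame only waits on its predecessor, the chain of waits always bottoms out at frame 0, so the sketch cannot deadlock; the cost is that a worker stalls whenever its reference frame is still being processed, which limits how far two CPUs can pull ahead of each other.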
On Tue, 16 Dec 2003, Slepp Lukwai wrote:
> Tried it without any options, same effect. I'm definitely seeing nowhere
> near 40% speedup, which is what boggles me. I expected at least
> reasonable gains of 25%.
I think that has to do with the -I setting...
> Sorry, upon further testing, I
On Mon, 2003-12-15 at 22:44, Bernhard Praschinger wrote:
> Hello
>
> > I was doing some testing of both the older version (1.6.1.90) and the
> > newer version of mpeg2enc (1.6.1.92). First off, the .92 was somewhat
> > faster to begin with. However, in both cases, after multiple tests and
> > tryi
On Mon, 2003-12-15 at 20:27, Steven M. Schultz wrote:
> On Mon, 15 Dec 2003, Slepp Lukwai wrote:
>
> > faster to begin with. However, in both cases, after multiple tests and
> > trying different things, I can't get the SMP modes to be fast at all. In
> > fact, they're slower than the non-SMP modes
On Mon, 2003-12-15 at 21:08, Richard Ellis wrote:
> What program are you using to monitor CPU usage while mpeg2enc runs?
> Some versions of top (if you are using top) report percentages as a
> roll-up of the whole SMP machine, so that 3x33% usage really means
> 99% utilization of the machine, wher
Hello
> I was doing some testing of both the older version (1.6.1.90) and the
> newer version of mpeg2enc (1.6.1.92). First off, the .92 was somewhat
> faster to begin with. However, in both cases, after multiple tests and
> trying different things, I can't get the SMP modes to be fast at all. In
On Mon, Dec 15, 2003 at 01:46:32AM -0700, Slepp Lukwai wrote:
> ...
>
> Of course the -M 3 changes to 2 and 0 in testing. I also tested it
> with and without the buffer program in the list. Another notable
> thing is that with the newest version .92, -M3 causes three 33%
> usage processes to exis
On Mon, 15 Dec 2003, Slepp Lukwai wrote:
> faster to begin with. However, in both cases, after multiple tests and
> trying different things, I can't get the SMP modes to be fast at all. In
> fact, they're slower than the non-SMP modes.
I think I see what you're doing that could cause tha
I was doing some testing of both the older version (1.6.1.90) and the
newer version of mpeg2enc (1.6.1.92). First off, the .92 was somewhat
faster to begin with. However, in both cases, after multiple tests and
trying different things, I can't get the SMP modes to be fast at all. In
fact, they're s