Tag Archive for 'optimization'

Tweening and object pools

Some days ago I was writing a tween engine for our library here at Hobnox and when I was comparing it with the other ones out there which are quiet popular (Tweener, TweenLite) I was pretty surprised that my engine was performing about 10fps faster than TweenLite when tweening 1000 objects. It was 40fps for me, 28fps for TweenLite and 14fps for Tweener. It is not only faster but the memory usage is constant and small — on a Mac it was constant 11mb for my engine versus up to 22mb for TweenLite for instance.

Since I am currently not allowed to post the source-codes I want to talk a little bit about the ideas behind the engine.

Continue reading ‘Tweening and object pools’

ActionScript 3 optimization techniques

If you attended my session about bytecode inlining at FITC Toronto you might remember that I said I have a document covering a lot of common optimization techniques.

After talking to so many of you and getting nice feedback I decided to release it in a state which is a work in progress.

My main idea is still to put this into a Wiki so everybody can share his or her knowledge. Since I am pretty busy the next weeks I do not know when this will happen so for now I hope you get some use out of the PDF.

Converting a Number to 4 bytes and vice versa

For the hell of it I wanted to implement the fast inverse square-root that has been flying around in the Quake3 source-code for instance. Fantastic game by the way. There has been quiet a lot of talk about this algorithm. So can we do that with Flash? No we can not. It would be nice but it is simply not possible because you can not just map a Number into an int. With languages like C++ you can. So first of all: I found already an approach that is very fast but not as nearly as good as the original algorithm. It is also not the same “magic”. It is just some newton iterations and faster but worse then 1/Math.sqrt(x).

So what do you need to implement the original inverse square root algorithm? A way to convert any number to its four byte representation aka the way it is stored in memory. The original algorithm is freakin fast because it does not have to calculate the four bytes. It can just juggle with them. Since we have to insert some heavy math here our approach will always be slower but it helped me understanding IEEE single precision numbers a lot. Maybe anyone has a use for this?

Remember the isqrt function is WAY slower than doing 1/Math.sqrt(x)!

[as]private function isqrt( x: Number ): Number
{
var xhalf: Number = x * 0.5;
var i: int = toFloat( x );
i = 0×5f3759df - ( i >> 1 ); // the magic first guess
x = toNumber( uint(i) );
x = x * ( 1.5 - xhalf * x * x );
return x;
}

private function toNumber( x: uint ): Number
{
//– precheck here
if ( 0xffc00000 == x )
{
return NaN;
}
else
if ( 0×7f800000 == x )
{
return Number.POSITIVE_INFINITY;
}
else
if ( 0xff800000 == x )
{
return Number.NEGATIVE_INFINITY;
}

//– can not shift because flash doesnt like it…
var v: Boolean = uint( x & uint(0×80000000) ) == uint(0×80000000);
var e: int = ( uint( x >> uint(23) ) & 0xff ) - 127;
var m: int = x & 0×7fffff;
var n: Number;

//– if e > 0 we can do 2^e very easy by shifting it
//– otherwise we do 1/(2^e) and 2^e can be achieved by shifting it
if ( e > 0 )
{
n = ( 1 + m / ( 1 << 23 ) ) * ( 1 << e );
}
else
{
n = ( 1 + m / ( 1 << 23 ) ) * ( 1 / ( 1 << -e ) );
}

if ( v )
{
return -n;
}

return n;
}

private function toFloat( x: Number ): uint
{
//-- do a quick precheck
if ( Number.POSITIVE_INFINITY == x )
{
return 0x7f800000;
}
else
if ( Number.NEGATIVE_INFINITY == x )
{
return 0xff800000;
}
else
if ( isNaN( x ) )
{
return 0xffc00000;
}

//-- calculate ieee single precision
var v: Boolean, e0: int, e1: int, m: int;
var f: Boolean = x > 0;

if ( x > 0 )
{
e0 = int( Math.log( x ) / Math.LN2 );
m = ( ( x / ( 1 << e0 ) ) - 1 ) * ( 1 << 23 );
}
else
{
v = true;
e0 = int( Math.log( -x ) / Math.LN2 );
m = ( ( -x / ( 1 << e0 ) ) - 1 ) * ( 1 << 23 );
}

e1 = e0 + 127;

//– usually here would go the test for e1 = 0×00 or e1 = 0xff
//– BUT we tested already and that’s why we do not do it here

//– be save
e1 &= 0xff;
m &= 0×7fffff;

//– merge into 4bytes
var float: uint;

if ( v )
{
float = 0×80000000;
}

float |= e1 << 23;
float |= m;

return float;
}[/as]

Analyzing bitmaps bigger than 512×512

Remember I promised to release some paper on performance optimizations for the Flash Player? Well currently there is just to much to do in order to get it done. But I do not want to hide this from you any longer.

Analyzing big BitmapData objects is a pain. For a nice contrast correction, world luminance or any other histogram based filter you need to know about all the pixels (if you want to make it accurate).

On my way from Germany to France a month ago I was thinking about this problem and I found something that is strangely enough faster to read a whole picture — but only for big pictures. For smaller ones fortunately it does not matter that much since the time you have to spend on reading them is around 30ms.

So basically I read through a BitmapData like this:
[code]var x: int;
var y: int;

for (;x < width; ++x)
for (y = 0; y < height; ++y)
color = bmp.getPixel( x, y );[/code]

I guess it is very common to do that. Now this goes row by row (or column by column) through your BitmapData. Now just for fun I was testing what happens if you cycle through your BitmapData without ever resetting x and y. And voilá it is faster. On a 2048x2048 BitmapData it is 336ms vs 739ms. That is a nice performance increase. But funny enough it is slower for smaller bitmaps.

Here are the results (cycling vs. for-loop):

Iterations Cycle[ms] Simple for[ms]
128×128 2.7 2.7
128×128 11.1 10.3
512×512 29.7 28.3
1024×1024 95.4 106.3
2048×2048 355.8 738.6

And here is the code for the faster version:
[code]var y: int;
var x: int;
var m: Boolean = true;

while (true)
{
color = bitmapData.getPixel(m ? x++ : –x, y);

if (x == width || x == 0)
{
if (y++ == height)
break;
m = !m;
}
}[/code]

Lists faster than Array

Since we are planing to release a detailed paper about AS3 optimizations and techniques I was thinking about another optimization for one of the daily problems. Looping through an array that contains objects. The fastest code to do this is

[as]for (;i {
v0 = array[i];
l = v0.length;
}[/as]

v0 is a local variable for my custom class which looks like this

[as]class Vector3D
{
public var x: Number;
public var y: Number;
public var z: Number;

public function Vector3D( x: Number, y: Number, z: Number )
{
this.x = x;
this.y = y;
this.z = z;
}

public function get length(): Number
{
return Math.sqrt( x * x + y * y + z * z );
}
}[/as]

Looks like from a real scenario. You can replace the vector class also with a particle class or whatever. It does not matter. The point is this loop takes 272ms for 1000000 iterations. It is pretty good already but if you add one more parameter to the Vector3D class you can get even faster. This is my modified Vector3D called LVector3D

[as]class LVector3D
{
public var x: Number;
public var y: Number;
public var z: Number;

public var next: LVector3D;

public function LVector3D( x: Number, y: Number, z: Number )
{
this.x = x;
this.y = y;
this.z = z;
}

public function get length(): Number
{
return Math.sqrt( x * x + y * y + z * z );
}
}[/as]

There is a public variable now called next. This variable will always point to the next LVector3D object. The exact same loop takes about 260ms.

[as]var v0: LVector3D = start;
for (;i {
l = v0.length;
v0 = v0.next;
}[/as]

You see how nice this looks? No ugly [] etc. Our variable is directly typed. And we can speed it up even more! If we replace the for(...) part with while( v0 ) it goes down to 249ms. What is also nice about a linked list is that you could add an easy splice functionality or functions to remove elemts from a list. You just have to change the next reference. If you also add a prev variable it becomes even more easy and you could loop backwards.

Building that list of vectors is also very easy. And it looks beautiful. Compare this

[as]for ( var i: int = 0; i < n; ++i )
array[ i ] = new Vector3D( i, i, i );

var v3d: LVector3D = start;

for ( var j: int = 1; j < n; j++ )
v3d = v3d.next = new LVector3D( j, j, j );[/as]

The start variable is defined in the class as private const vector3d: LVector3D = new LVector3D( 0, 0, 0 );. Very nice. Of course not useful in every case but sometimes if you have to squeeze out what is possible here is another method. So stay tuned for our paper.

How high can you get?

:o)Actually this frame-rate is not faked. It is not the actual screen refresh rate. This means you do not have 65.000 new screens during one second, but, and that is the funny thing: you have 65.000 executions of a function in one second. This is an evil hack and slows down the whole display stuff of Flash. So you can not use this in any of your projects but it is funny to watch. I think you can get higher, I was just lazy and got tired searching for the best settings.

So there is no use for this. Just fun to watch :)

By the way the initial idea came from André Michelle. The competition is open, ha! How high can you get?

The framerate hack

976 fpsDo you worry about your frames-per-second? At least I do. So in general you can do something like this:

[SWF(frameRate='1000',width='800',height='600',backgroundColor='0xCCCCCC')]

This results for me in a constant framerate of 250. It seems to me that there is a limit that Flash puts on the framerate. But 250 is lame to me. Why not go for the 1000? It was a lucky find because I used stage.frameRate = 25; at one point and then wanted to switch back to 1000. But the result was not a constant framerate of 250 but… 976. Seems to me that Flash is not using a limit at this point. Sweet :o)

Update:
You can try using [SWF(frameRate='1000')] and -default-frame-rate 1000 as a compiler argument. Both will result in the same if you do trace( stage.frameRate ). The maximum that you can set is 0xff so Adobe is using a byte here. But inside the Flash player you are allowed to use more than a byte.




Close
E-mail It