Remember I promised to release a paper on performance optimizations for the Flash Player? Well, currently there is just too much to do to get it finished. But I do not want to keep this from you any longer.
Analyzing big BitmapData objects is a pain. For a nice contrast correction, world luminance or any other histogram-based filter you need to look at all the pixels (if you want the result to be accurate).
On my way from Germany to France a month ago I was thinking about this problem and found a way to read a whole picture that is, strangely enough, faster, but only for big pictures. For smaller ones it fortunately does not matter that much, since reading them takes around 30ms.
So basically I read through a BitmapData like this:
[code]var x: int;
var y: int;
// x is the outer loop, so the bitmap is read column by column
for (; x < width; ++x)
    for (y = 0; y < height; ++y)
        color = bmp.getPixel(x, y);[/code]
I guess it is very common to do that. With x in the outer loop this goes column by column through your BitmapData (swap the loops and it goes row by row). Just for fun I tested what happens if you cycle through your BitmapData without ever resetting x and y. And voilà, it is faster. On a 2048x2048 BitmapData it is 336ms vs 739ms. That is a nice performance increase. But funnily enough, it is slightly slower for smaller bitmaps.
Here are the results (cycling vs. for-loop):
| Bitmap size | Cycle [ms] | Simple for [ms] |
| 128×128 | 2.7 | 2.7 |
| 256×256 | 11.1 | 10.3 |
| 512×512 | 29.7 | 28.3 |
| 1024×1024 | 95.4 | 106.3 |
| 2048×2048 | 355.8 | 738.6 |
And here is the code for the faster version:
[code]var y: int;
var x: int;
var m: Boolean = true; // true: walk right, false: walk left
while (true)
{
    color = bitmapData.getPixel(m ? x++ : --x, y);
    if (x == width || x == 0)
    {
        // end of row reached: go to the next row and turn around
        if (++y == height)
            break;
        m = !m;
    }
}[/code]




hey joa, i copy & pasted your code and strangely enough my results are: default for loop 816ms and cycle 1757ms (2048×2048). is there perhaps something wrong or missing in your code snippet ?
Oh, interesting! Thanks for sharing!
Does anybody know why this is happening? I mean, the difference according to your tests is quite big.
Sounds amazing. I'm looking forward to copy'n'pasting your little snippet into my “oldschool tunnel efx”. Maybe it'll boost the performance a bit…
Just have to find some spare time! ;)
Cheers
pwd
i found the mistake - i copied the snippet into a class which extends a Sprite and since width and height were never defined locally it took the properties of the display object :)
Well yes ;) You have to keep in mind that this is the width and height of the BitmapData. Is it now faster for you too?
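A minimal sketch of the pitfall described above, assuming the loop lives in a class that extends Sprite. The class name PixelReader and the method readAll are hypothetical. Copying the bitmap's dimensions into local variables keeps the loop from picking up the Sprite's own width and height:

[code]package
{
    import flash.display.BitmapData;
    import flash.display.Sprite;

    // Hypothetical class, for illustration only.
    public class PixelReader extends Sprite
    {
        public function readAll(bitmapData:BitmapData):uint
        {
            // Take the dimensions from the BitmapData itself.
            // Sprite.width / Sprite.height describe the display object,
            // not the bitmap that is being read.
            var w:int = bitmapData.width;
            var h:int = bitmapData.height;
            var color:uint;

            for (var y:int = 0; y < h; ++y)
                for (var x:int = 0; x < w; ++x)
                    color = bitmapData.getPixel(x, y);

            return color; // last pixel read
        }
    }
}[/code]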
I run each test 15 times and take the average of that. w and h are defined as 2048. bitmapData is a BitmapData object with width and height 2048 and no transparency.
The code:
[code]// Simple nested for loop, x in the outer loop.
private function test19(): int
{
    var t0: int;
    var t1: int;
    var c0: int;
    var x: int;
    var y: int;
    t0 = getTimer();
    for ( ; x < w; ++x )
        for ( y = 0; y < h; ++y )
            c0 = bitmapData.getPixel( x, y );
    t1 = getTimer();
    return ( t1 - t0 );
}

// Cycling version: x is never reset, the direction flips after every row.
private function test20(): int
{
    var t0: int;
    var t1: int;
    var c0: int;
    var y: int;
    var x: int;
    var m: Boolean = true;
    t0 = getTimer();
    while ( true )
    {
        c0 = bitmapData.getPixel( m ? x++ : --x, y );
        if ( x == w || x == 0 )
        {
            // end of row reached: go to the next row and turn around
            if ( ++y == h )
                break;
            m = !m;
        }
    }
    t1 = getTimer();
    return ( t1 - t0 );
}[/code]
My wild guess is that those speed differences are caused by the CPU’s prefetch cache. On my notebook the difference is not as big as you have measured - it’s only 1660ms (cycle) vs. 1850ms (simple) for a 2048×2048 BitmapData.
Usually it should be faster to loop through a bitmap in the forward direction (i.e. x from 0 to width and y from 0 to height), since that is how a bitmap lies in memory and the processor can prefetch the adjacent memory locations into the cache.
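The forward traversal described here can be sketched as a plain nested loop with y outside and x inside, so that consecutive reads touch neighbouring pixels of one row. This assumes BitmapData stores its pixels row by row, which the thread suspects but does not confirm. The function name readRowMajor is made up for the example.

[code]import flash.display.BitmapData;
import flash.utils.getTimer;

// Hypothetical helper: reads every pixel row by row
// and returns the elapsed time in milliseconds.
function readRowMajor(bitmapData:BitmapData):int
{
    var w:int = bitmapData.width;
    var h:int = bitmapData.height;
    var c0:uint;
    var t0:int = getTimer();

    for (var y:int = 0; y < h; ++y)      // outer loop: rows
        for (var x:int = 0; x < w; ++x)  // inner loop: pixels within one row
            c0 = bitmapData.getPixel(x, y);

    return getTimer() - t0;
}[/code]

Note that the simple for-loop test earlier in the thread puts x in the outer loop, so it walks down columns rather than along rows.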
Could you maybe publish the whole class that you used for testing? That way we could get comparable results.
BTW - I noticed that for non-transparent bitmaps it is actually faster to read them with getPixel32().
Well… I’ve modified your cycling code so it does not meander through the bitmap but just goes left->right, top->down and at least for me it is faster. Could you compare that on your machine?
[code]private function test22(): int
{
    var t0: int;
    var t1: int;
    var c0: int;
    var y: int;
    var x: int;
    t0 = getTimer();
    while ( true )
    {
        c0 = bitmapData.getPixel32( x++, y );
        if ( x == w )
        {
            // end of row: stop after the last row, otherwise start the next one
            if ( ++y == h )
                break;
            x = 0;
        }
    }
    t1 = getTimer();
    return ( t1 - t0 );
}[/code]
On my machine this is slower: an average of 370ms for this one compared to 350ms for the other one. But I have some programs running right now, so the test is not very accurate.
I will do some more tests when I am done with work. And I will send you the test environment later as well.
But thinking about the memory structure: I think they store the BitmapData data (without the overhead) like rgb *bmp = new rgb[width*height]; (maybe not with the custom typedef, I'm no C expert). So it is a 1D array. The fastest way to loop through that would of course be a pointer. And if you cannot use that, it should be the loop you mentioned.
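To illustrate the one-dimensional layout guessed at above: under that assumption, pixel (x, y) sits at index y * width + x. ActionScript has no pointers, but BitmapData.getPixels() copies a rectangle into a ByteArray of 32-bit pixel values that can be read front to back, which is probably the closest thing to walking the buffer linearly. A sketch follows; readFlat is a hypothetical name, and whether this is faster than calling getPixel() is not measured here.

[code]import flash.display.BitmapData;
import flash.utils.ByteArray;

// Hypothetical helper: visits every pixel once through a flat copy.
function readFlat(bitmapData:BitmapData):uint
{
    // One unsigned 32-bit value per pixel of the whole bitmap.
    var bytes:ByteArray = bitmapData.getPixels(bitmapData.rect);
    var n:int = bitmapData.width * bitmapData.height;
    var c0:uint;

    bytes.position = 0; // make sure reading starts at the first pixel
    for (var i:int = 0; i < n; ++i)
    {
        // Assumed mapping for a row-by-row copy:
        // x = i % width, y = int(i / width)
        c0 = bytes.readUnsignedInt();
    }

    return c0; // last pixel read
}[/code]

The copy itself costs time and memory, so a fair comparison would have to include the getPixels() call in the measurement.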
But the fact that the version with the while is slower for me now, or at least not faster, leaves me a little perplexed. I will do some more tests when I have the time :)
Actually I would also like to know why it is slower for smaller bitmaps. Maybe yours will be as fast as the simple for even if the size is below 1024.
I’m also getting big variations for certain tests. For example, I tested exactly the same code in two different functions (test1() and test2()) and they took different times to run. Seems like this depends on the room temperature. Also, my getPixel() vs. getPixel32() test suddenly came out slower for getPixel32() again.
It would be really good to have a reliable setup for making these kinds of tests.
Personally I have a test environment where I run each test multiple times and take the average. Parameters I can change include the number of tests, the test iterations and some other things. A test is started after a delay of 10sec so it does not coincide with the launch of the Flash Player. And tests have to be done in the release player because the debug player sometimes gives different results.
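A setup like the one described could look roughly like the following sketch. The names averageTime, startBenchmarks, RUNS and START_DELAY_MS are invented for illustration. Each test is assumed to be a function that returns its own elapsed milliseconds, like test19() and test20() earlier in the thread.

[code]import flash.utils.setTimeout;

const RUNS:int = 15;               // repetitions per test
const START_DELAY_MS:int = 10000;  // wait 10 s so player start-up does not skew results

// Runs one test several times and returns the mean duration in ms.
function averageTime(test:Function, runs:int):Number
{
    var total:int = 0;
    for (var i:int = 0; i < runs; ++i)
        total += test();
    return total / runs;
}

// Hypothetical entry point: pass the test functions to compare.
function startBenchmarks(tests:Array):void
{
    setTimeout(function():void
    {
        for each (var test:Function in tests)
            trace(averageTime(test, RUNS));
    }, START_DELAY_MS);
}[/code]

It would be called as startBenchmarks([test19, test20]). Since trace() output is only visible in the debug player, a run in the release player would have to show the results some other way, for example in a TextField.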
But even that does not always tell the truth. André is using a Float class inside an array to store Numbers. A test showed that doing so is not faster. Then he tested it in an audio filter and the Float was much faster. I tried to figure it out, but in the end the only explanation I found is that reading from an array is not that heavy while writing to an array is, and the test was only reading.