As a response to Ralph Hauwert’s article I created a little example of what can be achieved using plain ActionScript 3 syntax. Ralph has put up a great example of how you can wire things like Alchemy, ActionScript and PixelBender together to achieve an astonishing result.
However, I asked myself whether it is possible to achieve the same result without making use of Alchemy, PixelBender or bytecode manipulation. I asked the guys sitting with me in the office to compare the two versions, and unfortunately the results differed a lot from machine to machine. Sometimes my version is faster, sometimes the version from Ralph is faster and sometimes they are about the same.
Now there are some very important things to note here, and I am surprised that I got so close. Since Ralph is making use of PixelBender, the number crunching is done on multiple cores. That is not possible in the ActionScript version, which makes it the real bottleneck. And there is another big difference: Ralph’s calculations are done in 32-bit precision while I am using 64-bit precision. Therefore I am happy with the result, and it shows that using pure ActionScript is still a good choice.
In order to optimize the code I used a linked list for the particles and minimized the comparisons between different data types. Here is the result.
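The linked-list idea can be sketched roughly as follows. This is only an illustration, written in TypeScript rather than ActionScript 3 so that it runs outside the Flash Player; it is not the code from the sources. The `Particle` class, `buildParticles`, `plot` and the particle positions are hypothetical names and values chosen for the example, and a `Uint32Array` stands in for the `Vector.<uint>` pixel buffer.

```typescript
// Hypothetical particle node: each particle points at the next one,
// so iterating needs no index lookup into a container.
class Particle {
  next: Particle | null = null;
  constructor(public x: number, public y: number, public z: number) {}
}

// Build a singly linked list of `count` particles and return its head.
// The positions are arbitrary demo values, not the ones from the original effect.
function buildParticles(count: number): Particle | null {
  let head: Particle | null = null;
  for (let i = 0; i < count; i++) {
    const p = new Particle(i, i * 2, i * 3);
    p.next = head; // prepend: the newest particle becomes the head
    head = p;
  }
  return head;
}

// Walk the list once and plot every on-screen particle into a flat pixel buffer.
// Returns the number of particles that were drawn.
function plot(
  head: Particle | null,
  buffer: Uint32Array,
  width: number,
  height: number
): number {
  let drawn = 0;
  for (let p = head; p !== null; p = p.next) {
    const px = p.x | 0; // truncate to an integer pixel coordinate
    const py = p.y | 0;
    if (px >= 0 && px < width && py >= 0 && py < height) {
      buffer[py * width + px] = 0xffffffff;
      drawn++;
    }
  }
  return drawn;
}

const width = 550;
const height = 400;
const buffer = new Uint32Array(width * height);
const head = buildParticles(100);
const drawn = plot(head, buffer, width, height);
```

The loop in `plot` only follows `next` references and compares against `null`, which is the kind of traversal the linked list is meant to make cheap. Whether it beats an indexed loop depends on the runtime, so it is worth measuring both.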
Sources:
Very impressive! How exactly did you manage this without the new ByteArray opcodes? I thought the performance cost of 300k writeFloat() was too much as demonstrated in Ralph’s code?
I’m working on a haXe version to get around the fiddly Alchemy bit, hope to have something to show and tell soon.
Sorry, make that a writeUnsignedInt()
Sorry, I forgot to add the sources which are now available. I am not using a ByteArray at all. Since there is no PixelBender and Alchemy involved I used a Vector.<uint> (which sucks) and a linked list of particles.
Great example. I hoped my previous post on Alchemy (the heresy one) already made it clear enough how I felt about this. It’s merely a tool to reach that speed more easily on the same VM. I wish the native AS3 compiler was just better.
And… how are you doing it? What’s the difference between Ralph’s AS3 version of the effect and yours?
Very fast on my P4 here (Northwood 2.4GHz). Ralph’s example freezes Firefox, while yours is very smooth and ‘light’ on the cpu. How odd?
Woh, lots of comments while I posted mine. Ignore mine :)
On a macmini core2duo 2ghz + linux yours is 2fps faster.
Yes it is faster for me too — and I am using Linux as well :)
At first I was amazed by the beauty of the pattern. Then I thought, ‘It’s easy to display a bunch of particles. The trick is moving them.’ Then I went to close the tab and dragged my mouse across the SWF. Twirl…
Now I’m impressed. The particles spin with zero lag. Nice work and thanks for the code.
Very interesting dude, thanks ;)
Cool! No FPS change between low and high quality, I got 22/32 here ;D
on my machine core 2 duo 1.7Ghz(Win XP, Firefox 3) your version runs at 23fps, while Ralph’s alchemy/pixelbender version runs at 13fps.
great experiment dude!!!
On my machine, both versions run with the same FPS. Great job!
20-21 fps running on battery on macbook with 2.1 GHz core 2 duo. Damn!
Perfect performance on my MacBook Pro 2.2 GHz Intel Core 2Duo, OSX 10.5
I’m getting 30/32 FPS on my LG Intel Core 2 Duo 2.53 GHz – sweet!
Yours is 14/15 FPS running on an AMD Sempron 1.8GHz.
Hmm I am getting nothing less than solid 30/32 fps in chrome on a T61p with the centrino cPro. Maybe it’s chrome?
I’m going to check out these sources, did you use papervision at all for the rendering? I’ll probably post an entry on my blog on this one, keep up the great work!
DjacK
There is no use of PV3D in this example. Most of the techniques used can be found at http://wiki.joa-ebert.com/
~24/32fps …and it’s kinda cold-freezing my browser when running for a while… (Intel Core 2 CPU 2.00 GHz)
I do get better framerate results with the pure AS3 version. This one runs very stable at 20/30 while the PB+alchemy one nearly crashes my browser.
Is PB causing the trouble then? Maybe different also when using PB async instead of synchronous?
The main benefit of the AS3 version is that data does not have to be pushed back and forth between the AVM and PixelBender all the time.
Yeah this one is much faster and smoother on my machine. Nice example!
I have 6 to 10 fps here, which is far faster than Ralph’s.
Yours is definitely faster(25 fps vs 17fps) and it just feels nicer/smoother too. I’m using Opera on Windows using Flash Player 10,0,12,36
Very interesting results … on a Core2Duo @3GHz …
Ralph’s version: 22-25/60 fps, CPU usage 60-70% (because he uses multi core).
Your version: 32/32 fps, CPU usage: 30-50%.
Overall your version runs more smooth here.
Great job, Joa.
Ralph’s: 14fps
Yours: 22fps
on an AMD 2.0 GHz Dual Core.
How can it be that your version runs smoothly (21fps), but when I compile exactly your sources (Flex 4) I only reach 3fps? How did you compile?
500.000 Particle experiment:
http://www.yagizgurgul.com/blog/2009/04/24/500000-particle/
Very nice. I really like the way you loop through particles.
It makes me want to do benchmark speed tests against standard for(var i:int = 0; i < particles.length; ++i){} loops.
Very nice!
Pure AS3 is running faster for me. The Alchemy version crashed FireFox the first time it loaded.
This was a fascinating read, thanks!
I thought this might make a good example of how much faster linked lists are than Arrays, but I did a version that uses an array instead of the list and it was only about 1 fps slower (31 instead of 32 fps on my machine) and I’m not even typing the Array values…
Does that sound odd to you? I know haXe has a fastList which uses a linked list structure because it is supposed to be much faster… kind of puts a crimp in a talk I was going to give on optimization… :-/
I’d like to hear any insights you might have about this.
here it is in haxe for those interested :)
http://webr3.org/blog/haxe/flash-10-massive-amounts-of-3d-particles-with-haxe/
I tested both movies in an old pc. The AS3 version was much faster, around 17fps while Ralph’s one was struggling to reach 9fps. PC specs: AMD Athlon 64 2400MHz, 1536 MB (DDR SDRAM)
If you replace
while( --n > -1 ) buffer[ n ] = 0x000000;
with a simple BitmapData.fillRect and another BitmapData.getVector() it’s actually faster. What’s even better is that performance won’t depend that much on the size of the BitmapData anymore but (almost) only on the number of particles.
Patrick: You will create a new Vector each frame in that case with the size of the BitmapData. This means every so-often the GC will kick in and has to clean up a lot …
That’s what I thought too. And to my surprise, the framerate really is about 2-3fps higher and stays very stable. The GC doesn’t have that much to do in the rest of my stuff so maybe that’s why it’s okay?
I found that using two vectors shaves off a few milliseconds:
private const _empty: Vector.<uint> = new Vector.<uint>( 550 * 400, true );
I fill this with the 0xFFFFFF when the class is instantiated. Then in the enter frame method I clone the vector:
var buffer: Vector.<uint> = _empty.slice();
I’ve left it running for some time, and don’t seem to see any issue with GC.
Sorry, that should have been “0x000000” not “0xFFFFFF”.
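For anyone who wants to compare the two ways of clearing the buffer discussed above, here is a minimal sketch. It is written in TypeScript rather than ActionScript 3 so that it runs outside the Flash Player, with a `Uint32Array` standing in for `Vector.<uint>`. The function names `clearInPlace` and `freshBuffer` are made up for the illustration, and nothing here says which variant is faster in the Flash Player.

```typescript
const WIDTH = 550;
const HEIGHT = 400;

// Pre-zeroed template, allocated once (the role `_empty` plays in the comment above).
const empty = new Uint32Array(WIDTH * HEIGHT);

// Variant 1: clear an existing buffer in place with a countdown loop.
// No allocation per frame, but every element is written from script code.
function clearInPlace(buffer: Uint32Array): void {
  let n = buffer.length;
  while (--n > -1) buffer[n] = 0x000000;
}

// Variant 2: obtain a cleared buffer by copying the template.
// This allocates a new array on every call, so the garbage collector
// has to reclaim the previous frame's buffer at some point.
function freshBuffer(): Uint32Array {
  return empty.slice();
}

// Demo: a dirty buffer cleared in place, and a fresh copy of the template.
const dirty = new Uint32Array(WIDTH * HEIGHT).fill(0xffffff);
clearInPlace(dirty);

const copy = freshBuffer();
copy[0] = 0xff0000; // writing to the copy must not touch the template
```

The trade-off is the one raised earlier in the thread: the copy hands the clearing to native code but creates garbage every frame, while the loop reuses one buffer.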
13/32 on your version, very smooth I like it.
10/60 on Ralph’s. A bit of a lag. Nearly froze my Firefox – everything but that tab was not working. Thankfully I had Fire Gestures to close the tab, phew!
This is a real bummer for Pixel Bender, I’d really thought we could play with shaders in Flash. =(