Monthly Archive for August, 2009

UML generation using Dump

playerglobal.abc UML

Dump now has another cool feature: UML diagram generation. The UML diagram is exported in DOT format. I think this is a really cool feature because the graph is built by analyzing a SWF file and you get it for free. I will probably create a different tool to make full use of the UML generation since you could link multiple files together for complete coverage. Since Graphviz is not able to underline text I have chosen to use a dollar sign for static methods. You will also get proper parameter names if you compile your SWF file in debug mode.

This is an example UML diagram for the playerglobal.swc. In order to create it I took the playerglobal.abc from the Tamarin sources and the command was java -jar dump.jar -input playerglobal.abc -uml.

Here are two example representations, but be careful. The PNG size is 30831×6232 pixels and might crash your browser. Chrome can display the PNG for me but is not able to show the SVG correctly. Firefox displays the SVG very well. You can download the PNG and open it in IrfanView or Photoshop to be safe.

Inheritance Graphs

The Dump tool is now able to export an inheritance graph for a given ABC/SWC/SWF file. This is a very easy and nice way to look at the classes and their relationships. The small image shows the graph for one ABC file of the AudioTool. I think this is pretty neat.

You will need a program like Graphviz to visualize the exported DOT file. If you want to export the inheritance graph you basically write java -jar dump.jar -input file.swf -ig. I think this shows a really cool feature of Apparat. It is also very easy to reverse engineer a UML diagram. I am not interested in such a feature but maybe someone else is.
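To give an idea of what such a DOT file looks like, here is a minimal Python sketch that writes an inheritance graph for a hand-written class table. This is not how Dump is implemented: the function name and the sample classes are made up for illustration, and only the DOT syntax itself is what Graphviz expects.

```python
def inheritance_dot(superclasses):
    """Build a DOT digraph with one edge from each class to its superclass.

    `superclasses` maps a class name to its superclass name (or None).
    """
    lines = ["digraph inheritance {", "  node [shape=box];"]
    for name in sorted(superclasses):
        parent = superclasses[name]
        lines.append(f'  "{name}";')
        if parent is not None:
            lines.append(f'  "{name}" -> "{parent}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample data, not taken from a real SWF.
sample = {
    "Object": None,
    "EventDispatcher": "Object",
    "DisplayObject": "EventDispatcher",
}
dot = inheritance_dot(sample)
print(dot)
```

Saved as graph.dot, the output can be rendered with Graphviz, for example with dot -Tpng graph.dot -o graph.png.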

Update: Here is a full-size example for the inheritance graph of the unfinished ImageProcessing library I am currently working on.

ImageProcessing Library Inheritance Graph

Dump Disassembler

Very important for SWF manipulation is a way to debug the files and to have a look at the bytecode. Dump is a tool that does nothing but list all SWF tags and ABC files the way Apparat represents them internally.

I know that there are a lot of other tools out there. swfdump and abcdump are great already. I have used swfdump from the Flex SDK before. But I dislike how they represent namespaces and the fact that not all properties are always shown. Dump is simple. It lists everything and uses the naming from the avm2overview.pdf.

Apparat is now Open Source

The full source code of Apparat is now available at GoogleCode. It is the whole framework behind TDSI and Reducer.

Apparat is released under the GNU Lesser General Public License. Please contact me if you want to contribute to this project. Maybe someone is interested in writing an Ant task for Reducer? I am also happy to receive feedback if you have used the framework to build something cool with it.

Please join the Apparat Discussion Group if you are interested in collaboration.

Flirting With Silverlight

Lately a lot of discussion had the ActionScript language as its focus. People start complaining about complex language features but I think they are great because the end user will benefit from that. Yesterday evening I had my very first test with Silverlight — and I am really impressed.

It took me a short amount of time to port the strange attractor to Silverlight. I agree that this may not be a fair comparison because I know C# already, but have a look at the source code. I make heavy use of type inference, and the Matrix4x4 class has the plus, multiply and array-index operators overloaded. The code is more readable. And besides: it executes really fast. Faster than my heavily optimized ActionScript version. Imagine I wrote var bleh = 1.0 in ActionScript. Framerate would drop to something like zero. But this is sad since there is no reason for me to write var bleh: Number = 1.0. A modern compiler should be able to figure this out. haXe can do it, C# can do it, OCaml can do it and lots of others as well.

Remember: This was my first time using Silverlight. To achieve the same result with Flash you have to be kind of an expert in strange player and language “features”. Now tell me again that the end user will not benefit.

Reducer

Another spinoff from my current library to optimize SWF/SWC files. Reducer is a tool that will make SWF and SWC files significantly smaller. There is currently a huge problem with file sizes. If you use the [Embed] tag with PNG images they are not compressed at all. For the Hobnox AudioTool we have been using the Flash IDE to export all graphics so that they are smaller, which was a pain.

Now with Reducer you are safe to use [Embed] and then run the tool afterwards. It will compress all lossless images and make them lossy. But usually a PNG can be compressed at 100% JPEG quality and you will still save a lot of data.

Note: You will not lose alpha transparency when using Reducer. The SWF file format allows us to use a special compression where a PNG gets split up into its color and alpha channels. The color channels are encoded using the traditional JPEG algorithm with adjustable quality while the alpha channel is handled separately. Transparency is always stored in a lossless fashion, which means even with a low JPEG quality you will not get any compression artifacts for the alpha channel.
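A rough sketch of the channel split, in Python rather than anything Reducer actually runs. The function name and the two sample pixels are made up, and zlib is only assumed here as the lossless step; the JPEG encoding of the color plane is left out because it needs a third-party library.

```python
import zlib


def split_rgba(rgba):
    """Split interleaved RGBA bytes into an RGB plane and an alpha plane."""
    if len(rgba) % 4 != 0:
        raise ValueError("expected 4 bytes per pixel")
    rgb = bytearray()
    alpha = bytearray()
    for i in range(0, len(rgba), 4):
        rgb += rgba[i:i + 3]       # color: would go to a lossy JPEG encoder
        alpha.append(rgba[i + 3])  # transparency: kept lossless
    return bytes(rgb), bytes(alpha)


# Two hypothetical pixels: opaque red and half-transparent blue.
pixels = bytes([255, 0, 0, 255, 0, 0, 255, 128])
rgb, alpha = split_rgba(pixels)

# zlib is lossless, so the alpha values survive a round trip unchanged,
# no matter how aggressively the color plane is compressed.
packed_alpha = zlib.compress(alpha)
restored_alpha = zlib.decompress(packed_alpha)
```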

Update: Reducer is now open source!

This is an outrage!

First of all I think I have to clarify at least one thing. I have criticised Adobe in the past for a lot of reasons. Not because I do not like them or the technologies they produce but because I want to improve the Flash Platform. This is of course pure self-interest since Flash is a key technology for the Hobnox AudioTool.

TDSI Examples

Everyone likes examples. So here are three examples using TDSI. The archive includes a ready-to-go FDT project with post-compile ANT tasks configured.

Example01

This is the old code of the already optimized attractor using the Memory API instead of a Vector.<uint>.

Example02

In this case there exists no Particle class at all and no linked list. The particle information is stored inside the memory as well. Particles are extended to a fourth value so indexing a particle can be done with a simple bitshift which is very fast.
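The address calculation can be sketched like this in Python. The constant and function names are illustrative, and the layout (four doubles per particle, starting at offset 0) is an assumption based on the description above rather than the actual example code.

```python
DOUBLE_SIZE = 8          # bytes per double
VALUES_PER_PARTICLE = 4  # x, y, z plus one padding value
STRIDE_SHIFT = 5         # 4 * 8 = 32 bytes = 1 << 5


def particle_address(index):
    """Byte offset of particle `index` in a flat memory block."""
    return index << STRIDE_SHIFT


def component_address(index, component):
    """Byte offset of one component (0 = x, 1 = y, 2 = z) of a particle."""
    return (index << STRIDE_SHIFT) + component * DOUBLE_SIZE


print(particle_address(3))      # 96
print(component_address(3, 2))  # 112
```

With only three values per particle the stride would be 24 bytes, which is not a power of two, so the offset would need a multiplication instead of a shift.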

Example03

The last example uses float instead of double values for the particles. The framerate stays the same, which is really cool because the memory usage drops. Before, a particle consisted of four doubles, which is a total of 4 * 8 bytes = 32 bytes. In this example each particle takes up only 16 bytes. Therefore the memory difference is 0x4B0000 bytes, which is about 4.7 MB in total. The first version also needs about 20 MB on my machine, which means about 12 MB of RAM are not wasted. Pretty cool when thinking about devices with less memory.
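The numbers can be checked with a few lines of Python. The particle count is not stated here; it is worked out backwards from the quoted difference, so treat it as a derived value.

```python
DOUBLE_SIZE = 8  # bytes
FLOAT_SIZE = 4   # bytes
VALUES_PER_PARTICLE = 4

bytes_per_particle_double = VALUES_PER_PARTICLE * DOUBLE_SIZE  # 32
bytes_per_particle_float = VALUES_PER_PARTICLE * FLOAT_SIZE    # 16
saved_per_particle = bytes_per_particle_double - bytes_per_particle_float

total_saved = 0x4B0000  # bytes, the difference quoted above
particle_count = total_saved // saved_per_particle
saved_mib = total_saved / (1024 * 1024)

print(particle_count)       # 307200
print(round(saved_mib, 2))  # 4.69
```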

By the way I just stumbled across a bug when using [Embed]. Hopefully it will be easy to fix.

TurboDieselSportInjection

I am definitely not good at choosing names for software projects. However TurboDieselSportInjection is a release of my experiments from yesterday. It is a spinoff from the whole framework and allows you to inline __bytecode and of course to use the new Memory API.

Hopefully you are kind enough to provide me with some feedback. I am especially interested in Exceptions that occur when reading or writing ABC files. Have fun!

Update: TDSI is now open source!

Alchemy for ActionScript

Today I had to do something other than backend development, and since FOTB is getting closer and I could not really continue working on TAAS I decided to add something which is easy to implement and has a huge benefit: Alchemy support in ActionScript.

So what is the idea? TAAS is part of a framework I developed to manipulate SWF, SWC and ABC files. The main focus is of course ABC files since they contain the bytecode which gets executed.
Part of the framework are tools for control flow analysis, various bytecode analyzers and also a search-and-replace system which works on a bytecode level. There are for instance pattern matchers that search for bad code produced by the ASC and replace the match with a more performant set of instructions.

With all those weapons in my arsenal I thought it should be a walk in the park to implement the Alchemy features in a way that makes sense. So the first idea is to have the old functionality AS3C had but more robust. AS3C had the __asm function which allowed you to inline instructions. The new framework comes with the old __asm and also another cool method: __bytecode! This will inline raw bytes. This also means you would have to know all the indices for variables you want to use from the constant pool in advance so __asm will still be your friend.

With the __bytecode method it is already possible to use all Alchemy features again. It would also be possible with the __asm method but writing plain bytes is simply more elitist. In order to make it easy for the developer I want a high-level API. Having a class with some static methods is nice of course but also slow. Alchemy is fast because those opcodes that write and read from a ByteArray are no method calls. They are low-level FlashPlayer features.

The first attempt was to write a Memory class that allows you to use the Alchemy features. This class contains raw bytecode implementations and ActionScript code. This means if you do not use the optimizer everything will still work, only 1000 times slower. When looking at the Memory class there is another tool of the framework that becomes very helpful. The __bytecode and ActionScript stuff should not co-exist with each other. So when we inline the bytecode a dead-code-elimination will simply clean up afterwards. Since the 0x47 byte for instance is “ReturnVoid” the ActionScript code which would follow afterwards can be dropped. That code is now unreachable.
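For straight-line code the clean-up step boils down to cutting everything after the first return. The Python sketch below shows only that simplified case; the function name is made up, the opcode names follow Dump's output, and a real dead-code-elimination has to run a reachability analysis over the control flow graph as soon as jumps are involved.

```python
RETURNS = {"ReturnVoid", "ReturnValue"}


def drop_code_after_return(instructions):
    """Keep instructions up to and including the first return.

    Only valid for straight-line code without jumps.
    """
    kept = []
    for op in instructions:
        kept.append(op)
        if op in RETURNS:
            break
    return kept


# Memory.readByte() after the raw bytes were inlined, followed by the
# ActionScript fallback (abbreviated to opcode names).
method = [
    "GetLocal0", "PushScope",
    "GetLocal1", "GetByte", "ReturnValue",   # inlined bytes 0xd1 0x35 0x48
    "GetLex", "GetProperty", "GetProperty",  # fallback, now unreachable
    "GetLocal1", "SetProperty",
    "GetLex", "GetProperty", "GetProperty",
    "CallProperty", "ReturnValue",
]
optimized = drop_code_after_return(method)
print(optimized)
```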

Step two is to replace all calls to the Memory class with the correct Alchemy opcode. This was really simple and the result is a really really fast way to access a ByteArray while still maintaining a high level of comfort. Of course one might think now that the __bytecode method becomes useless since no methods of the Memory class are called at all. But if anyone is crazy enough to access the Memory class untyped with a runtime namespace for instance you will still be happy to have the code optimized inside. In some circumstances it is simply impossible to figure out that someone called Memory.writeByte(). End of the story: your calls to a ByteArray are always optimized in the best way possible.
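A simplified picture of that replacement step, written in Python instead of the framework's own code. Instructions are plain (opcode, argument) tuples, the function name is invented, and the sketch assumes that every call with one of the listed names really targets the Memory class, which is exactly the part a real implementation has to verify.

```python
# Memory method name -> opcode that replaces the call.
MEMORY_OPCODES = {
    "readByte": "GetByte",
    "readInt": "GetInt",
    "readFloat": "GetFloat",
    "writeInt": "SetInt",
    "writeFloat": "SetFloat",
}
CALLS = {"CallProperty", "CallPropVoid"}


def inline_memory_calls(instructions):
    """Replace Memory.* calls with opcodes and drop the matching GetLex."""
    result = []
    pending = []  # positions in `result` of GetLex Memory not yet matched
    for op, arg in instructions:
        if op == "GetLex" and arg == "Memory":
            pending.append(len(result))
            result.append((op, arg))
        elif op in CALLS and arg in MEMORY_OPCODES and pending:
            result[pending.pop()] = None  # receiver is no longer needed
            result.append((MEMORY_OPCODES[arg], None))
        else:
            result.append((op, arg))
    return [ins for ins in result if ins is not None]


# Memory.writeInt( 0x5f3759df - ( Memory.readInt( 0 ) >> 1 ), 0 )
before = [
    ("GetLex", "Memory"),
    ("PushInt", 0x5F3759DF),
    ("GetLex", "Memory"),
    ("PushByte", 0),
    ("CallProperty", "readInt"),
    ("PushByte", 1),
    ("ShiftRight", None),
    ("Subtract", None),
    ("PushByte", 0),
    ("CallPropVoid", "writeInt"),
]
after = inline_memory_calls(before)
for op, arg in after:
    print(op, "" if arg is None else hex(arg))
```

The stack of pending GetLex positions is what makes nested calls work: the inner readInt consumes the most recent receiver, the outer writeInt the one before it.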

This is an example of the Memory.readByte() method before applying optimizations:

0x000000       GetLocal0
0x000001       PushScope
0x000002       FindPropStrict       QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "__bytecode")
0x000004       PushShort            0xd1
0x000007       PushByte             0x35
0x000009       PushByte             0x48
0x00000b       CallPropVoid         QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "__bytecode"), 3
0x00000e       GetLex               QName(PackageNamespace("flash.system"), "ApplicationDomain")
0x000010       GetProperty          QName(PackageNamespace(""), "currentDomain")
0x000012       GetProperty          QName(PackageNamespace(""), "domainMemory")
0x000014       GetLocal1
0x000015       SetProperty          QName(PackageNamespace(""), "position")
0x000017       GetLex               QName(PackageNamespace("flash.system"), "ApplicationDomain")
0x000019       GetProperty          QName(PackageNamespace(""), "currentDomain")
0x00001b       GetProperty          QName(PackageNamespace(""), "domainMemory")
0x00001d       CallProperty         QName(PackageNamespace(""), "readUnsignedByte"), 0
0x000020       ReturnValue

The same method after inlining the bytes and applying various other analyses like dead-code-elimination:

0x000000       GetLocal0
0x000001       PushScope
0x000000       GetLocal1
0x000001       GetByte
0x000002       ReturnValue

This is an example of the famous inverse square root using the Memory API:

private function invSqrt( value: Number ): Number
{
	var half: Number = 0.5 * value;
	Memory.writeFloat( value, 0 );
	Memory.writeInt( 0x5f3759df - ( Memory.readInt( 0 ) >> 1 ), 0 );
	value = Memory.readFloat( 0 );
	value = value * ( 1.5 - half * value * value );
	return value;
}
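To try the bit trick outside of Flash, the same steps can be reproduced in Python with the struct module. This is only an emulation: a four-byte bytearray stands in for the domain memory, little-endian byte order is assumed, and the input has to be a positive, finite value that fits a 32-bit float.

```python
import struct


def inv_sqrt(value):
    """Approximate 1 / sqrt(value) for positive, finite 32-bit float inputs."""
    half = 0.5 * value
    memory = bytearray(4)  # stands in for domainMemory

    struct.pack_into("<f", memory, 0, value)       # Memory.writeFloat(value, 0)
    bits = struct.unpack_from("<i", memory, 0)[0]  # Memory.readInt(0)
    struct.pack_into("<i", memory, 0, 0x5F3759DF - (bits >> 1))  # Memory.writeInt
    value = struct.unpack_from("<f", memory, 0)[0]  # Memory.readFloat(0)

    return value * (1.5 - half * value * value)  # one Newton-Raphson step


print(inv_sqrt(4.0))  # close to 0.5
```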

The same method before optimization in bytecode representation:

0x000000       GetLocal0
0x000001       PushScope
0x000002       PushDouble           0.5
0x000004       GetLocal1
0x000005       Multiply
0x000006       ConvertDouble
0x000007       SetLocal2
0x000008       GetLex               QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "Memory")
0x00000a       GetLocal1
0x00000b       PushByte             0x0
0x00000d       CallPropVoid         QName(PackageNamespace(""), "writeFloat"), 2
0x000010       GetLex               QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "Memory")
0x000012       PushInt              0x5f3759df
0x000014       GetLex               QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "Memory")
0x000016       PushByte             0x0
0x000018       CallProperty         QName(PackageNamespace(""), "readInt"), 1
0x00001b       PushByte             0x1
0x00001d       ShiftRight
0x00001e       Subtract
0x00001f       PushByte             0x0
0x000021       CallPropVoid         QName(PackageNamespace(""), "writeInt"), 2
0x000024       GetLex               QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "Memory")
0x000026       PushByte             0x0
0x000028       CallProperty         QName(PackageNamespace(""), "readFloat"), 1
0x00002b       ConvertDouble
0x00002c       SetLocal1
0x00002d       GetLocal1
0x00002e       PushDouble           1.5
0x000030       GetLocal2
0x000031       GetLocal1
0x000032       Multiply
0x000033       GetLocal1
0x000034       Multiply
0x000035       Subtract
0x000036       Multiply
0x000037       ConvertDouble
0x000038       SetLocal1
0x000039       GetLocal1
0x00003a       ReturnValue

The same method after inlining the Memory API:

0x000000       GetLocal0
0x000001       PushScope
0x000002       PushDouble           0.5
0x000004       GetLocal1
0x000005       Multiply
0x000006       ConvertDouble
0x000007       SetLocal2
0x00000a       GetLocal1
0x00000b       PushByte             0x0
0x000000       SetFloat
0x000012       PushInt              0x5f3759df
0x000016       PushByte             0x0
0x000000       GetInt
0x00001b       PushByte             0x1
0x00001d       ShiftRight
0x00001e       Subtract
0x00001f       PushByte             0x0
0x000000       SetInt
0x000026       PushByte             0x0
0x000000       GetFloat
0x00002b       ConvertDouble
0x00002c       SetLocal1
0x00002d       GetLocal1
0x00002e       PushDouble           1.5
0x000030       GetLocal2
0x000031       GetLocal1
0x000032       Multiply
0x000033       GetLocal1
0x000034       Multiply
0x000035       Subtract
0x000036       Multiply
0x000037       ConvertDouble
0x000038       SetLocal1
0x000039       GetLocal1
0x00003a       ReturnValue

As you can see this is blazing fast. Now the next job is to finish TAAS. Once TAAS is complete even a method like the inverse square root might be inlined and optimized much better. I did a simple test using the Lorenz attractor from before and replacing the Vector.<uint> buffer with a ByteArray gave a performance boost of about 5fps. Afterwards I tried getting rid of the Particle class completely and the framerate dropped a little bit. But imagine having 300,000 particles’ x, y and z coordinates stored in an Array. It was still faster than the old version but not as fast as combining the power of Alchemy with simple ActionScript optimizations like linked lists.