Tag Archive for 'as3c'

Apparat RC6: Say Hello To An Old Friend

Patrick Le Clec’h is an active committer to the Apparat project, and we recently merged his work into the main branch. Together with a couple of other changes, this makes it a good time for another release candidate. Patrick added a good old friend to Apparat: the __asm function. You might remember __asm from my work on the now-deprecated AS3C project.

However, the Apparat version is much better. First of all, we have put a lot of work into Apparat to make such transformations rock-solid, whereas AS3C had its issues and was never a reliable tool. Patrick also implemented a lot of great new features. You can mix AS3 into your bytecode as well: __as3 is the best friend of __asm, because writing pure bytecode is sometimes very verbose and not necessary.

A simple trace('Hello World!') in pure bytecode would look like this. Note the FindPropStrict and CallPropVoid operations, which both reference trace.

__asm(
  FindPropStrict(AbcQName('trace', AbcNamespace(NamespaceKind.PACKAGE, ''))),
  PushString('Hello World!'),
  CallPropVoid(AbcQName('trace', AbcNamespace(NamespaceKind.PACKAGE, '')), 1)
);

Finding the object in the correct namespace is often a very cumbersome task. Thanks to __as3 we can also write this in a much more concise way.

__asm(
  FindPropStrict(__as3(trace)),
  PushString('Hello World!'),
  CallPropVoid(__as3(trace), 1)
);

Note that the ASM compiler will try to guess the required name once an operation requests it. You can also use __as3 for other tasks.

var x: int = 1;
__asm(
  FindPropStrict(__as3(trace)),
  __as3(x < 10),
  CallPropVoid(__as3(trace), 1)
);

This would trace "true", for instance. If you are curious about the ASM syntax, I recommend using the dump tool. It produces code which is nearly __asm-ready. We will probably write another output format so you can directly transform existing code into __asm calls.

If you are interested in more examples, the Apparat Math replacements now make use of __asm as well. IntMath is a good example of an inlined class: you call a simple method like IntMath.abs, and the heavy lifting is done behind the scenes using inline assembler. To use the ASM expansion you have to process your SWF file with TDSI. The expansion is turned on by default.

Alchemy for ActionScript

Today I had to do something other than backend development. Since FOTB is getting closer and I could not really continue working on TAAS, I decided to add something which is easy to implement and has a huge benefit: Alchemy support in ActionScript.

So what is the idea? TAAS is part of a framework I developed to manipulate SWF, SWC and ABC files. The main focus is of course on ABC files, since they contain the bytecode which gets executed.
The framework includes tools for control flow analysis, various bytecode analyzers and a search-and-replace system which works at the bytecode level. There are, for instance, pattern matchers that search for bad code produced by the ASC and replace each match with a more performant set of instructions.

With all those weapons in my arsenal I thought it should be a walk in the park to implement the Alchemy features in a way that makes sense. So the first idea is to have the old functionality AS3C had, but more robust. AS3C had the __asm function, which allowed you to inline instructions. The new framework comes with the old __asm and also another cool method: __bytecode! This will inline raw bytes. This also means you have to know, in advance, the constant pool indices of everything you want to reference, so __asm will still be your friend.

With the __bytecode method it is already possible to use all Alchemy features again. It would also be possible with the __asm method but writing plain bytes is simply more elitist. In order to make it easy for the developer I want a high-level API. Having a class with some static methods is nice of course but also slow. Alchemy is fast because those opcodes that write and read from a ByteArray are no method calls. They are low-level FlashPlayer features.

The first attempt was to write a Memory class that allows you to use the Alchemy features. This class contains both raw bytecode implementations and plain ActionScript code, so if you do not use the optimizer everything will still work, only 1000 times slower. Looking at the Memory class, another tool of the framework becomes very helpful. The __bytecode and ActionScript implementations should not co-exist in the final method, so once the bytecode is inlined a dead-code elimination pass simply cleans up afterwards. Since the 0x47 byte, for instance, is “ReturnVoid”, the ActionScript code which follows it is now unreachable and can be dropped.
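To make the cleanup step concrete, here is a minimal sketch of dead-code elimination, written in Python so it can be run on its own. The `(opcode, operand)` tuples, the `Jump` and `IfTrue` opcodes with index targets, and the function name are assumptions made for illustration, not the framework's actual data structures.

```python
RETURNS = {"ReturnVoid", "ReturnValue"}
BRANCHES = {"Jump", "IfTrue"}  # hypothetical opcodes; operand = target index


def eliminate_dead_code(code):
    """Keep only the instructions reachable from index 0.

    `code` is a list of (opcode, operand) tuples and is assumed to be
    well-formed (every branch target is a valid index).
    """
    reachable = set()
    worklist = [0]
    while worklist:
        i = worklist.pop()
        if i in reachable or i >= len(code):
            continue
        reachable.add(i)
        op, operand = code[i]
        if op in RETURNS:
            continue                    # nothing falls through a return
        if op in BRANCHES:
            worklist.append(operand)    # the branch target is reachable
        if op != "Jump":
            worklist.append(i + 1)      # fall through to the next instruction

    # Re-index branch targets, since removing instructions shifts positions.
    kept = sorted(reachable)
    new_index = {old: new for new, old in enumerate(kept)}
    result = []
    for old in kept:
        op, operand = code[old]
        if op in BRANCHES:
            operand = new_index[operand]
        result.append((op, operand))
    return result


method = [
    ("GetLocal1", None),
    ("GetByte", None),
    ("ReturnValue", None),
    ("GetLex", "ApplicationDomain"),    # unreachable ActionScript fallback
    ("GetProperty", "currentDomain"),
    ("ReturnValue", None),
]
cleaned = eliminate_dead_code(method)
```

With the inlined bytes ending in a return, everything after it is never marked reachable and disappears from `cleaned`.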

Step two is to replace all calls to the Memory class with the correct Alchemy opcode. This was really simple, and the result is a really, really fast way to access a ByteArray while still maintaining a high level of comfort. Of course one might now think that the __bytecode method becomes useless, since no methods of the Memory class are called at all. But if anyone is crazy enough to access the Memory class untyped, with a runtime namespace for instance, you will still be happy to have the optimized code inside the method. In some circumstances it is simply impossible to figure out that someone called Memory.writeByte(). End of the story: your calls to a ByteArray are always optimized in the best way possible.
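A rough sketch of this replacement step, again in Python for the sake of a runnable example. It assumes the names have already been resolved to plain strings and that the receiver is always pushed by a direct GetLex of Memory; the tuple format and the function name are made up for illustration. The opcode names are the ones the dump tool prints in the listings of this post.

```python
# Memory method name -> Alchemy opcode name as printed in the dumps.
MEMORY_OPS = {
    "readByte": "GetByte",
    "readInt": "GetInt",
    "readFloat": "GetFloat",
    "writeInt": "SetInt",
    "writeFloat": "SetFloat",
}
CALLS = {"CallProperty", "CallPropVoid"}


def inline_memory_calls(code):
    """Replace `GetLex Memory ... Call <method>` with the matching opcode."""
    out = []
    receivers = []  # positions in `out` of pending `GetLex Memory` pushes
    for op, operand in code:
        if op == "GetLex" and operand == "Memory":
            receivers.append(len(out))
            out.append((op, operand))
        elif op in CALLS and operand in MEMORY_OPS and receivers:
            # Calls nest like parentheses, so the innermost pending
            # receiver belongs to this call. Drop it and emit the opcode.
            out[receivers.pop()] = None
            out.append((MEMORY_OPS[operand], None))
        else:
            out.append((op, operand))
    return [ins for ins in out if ins is not None]


before = [
    ("GetLex", "Memory"), ("GetLocal1", None), ("PushByte", 0),
    ("CallPropVoid", "writeFloat"),
    ("GetLex", "Memory"), ("PushInt", 0x5F3759DF),
    ("GetLex", "Memory"), ("PushByte", 0), ("CallProperty", "readInt"),
    ("PushByte", 1), ("ShiftRight", None), ("Subtract", None),
    ("PushByte", 0), ("CallPropVoid", "writeInt"),
    ("GetLex", "Memory"), ("PushByte", 0), ("CallProperty", "readFloat"),
]
after = inline_memory_calls(before)
```

The sketch only recognizes calls whose receiver and method name are statically visible, which is exactly why an untyped access through a runtime namespace would slip past it.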

This is an example of the Memory.readByte() method before applying optimizations:

0x000000       GetLocal0
0x000001       PushScope
0x000002       FindPropStrict       QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "__bytecode")
0x000004       PushShort            0xd1
0x000007       PushByte             0x35
0x000009       PushByte             0x48
0x00000b       CallPropVoid         QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "__bytecode"), 3
0x00000e       GetLex               QName(PackageNamespace("flash.system"), "ApplicationDomain")
0x000010       GetProperty          QName(PackageNamespace(""), "currentDomain")
0x000012       GetProperty          QName(PackageNamespace(""), "domainMemory")
0x000014       GetLocal1
0x000015       SetProperty          QName(PackageNamespace(""), "position")
0x000017       GetLex               QName(PackageNamespace("flash.system"), "ApplicationDomain")
0x000019       GetProperty          QName(PackageNamespace(""), "currentDomain")
0x00001b       GetProperty          QName(PackageNamespace(""), "domainMemory")
0x00001d       CallProperty         QName(PackageNamespace(""), "readUnsignedByte"), 0
0x000020       ReturnValue

The same method after inlining the bytes and applying various other analyses such as dead-code elimination:

0x000000       GetLocal0
0x000001       PushScope
0x000000       GetLocal1
0x000001       GetByte
0x000002       ReturnValue
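The step from the first listing to the second can be sketched roughly like this. The Python below is only an illustration: the tuple format and function names are assumptions, the opcode table contains just the bytes that occur in this example, and bytes that carry their own operands are not handled.

```python
# Raw byte -> opcode name, limited to the bytes used in this post.
OPCODES = {0xD1: "GetLocal1", 0x35: "GetByte",
           0x47: "ReturnVoid", 0x48: "ReturnValue"}
PUSHES = {"PushByte", "PushShort", "PushInt"}
RETURNS = {"ReturnVoid", "ReturnValue"}


def inline_bytecode_calls(code):
    """Replace each `__bytecode(b0, b1, ...)` call with the ops it encodes."""
    out = []
    i = 0
    while i < len(code):
        op, operand = code[i]
        if op == "FindPropStrict" and operand == "__bytecode":
            i += 1
            raw = []
            while i < len(code) and code[i][0] in PUSHES:
                raw.append(code[i][1])      # collect the pushed bytes
                i += 1
            if i >= len(code) or code[i] != ("CallPropVoid", "__bytecode"):
                raise ValueError("unsupported __bytecode call shape")
            out.extend((OPCODES[b], None) for b in raw)
        else:
            out.append((op, operand))
        i += 1
    return out


def drop_after_first_return(code):
    """Dead-code elimination for straight-line code only (no branches)."""
    for i, (op, _) in enumerate(code):
        if op in RETURNS:
            return code[: i + 1]
    return code


read_byte = [
    ("GetLocal0", None), ("PushScope", None),
    ("FindPropStrict", "__bytecode"),
    ("PushShort", 0xD1), ("PushByte", 0x35), ("PushByte", 0x48),
    ("CallPropVoid", "__bytecode"),
    ("GetLex", "ApplicationDomain"), ("GetProperty", "currentDomain"),
    ("GetProperty", "domainMemory"), ("GetLocal1", None),
    ("SetProperty", "position"),
    ("GetLex", "ApplicationDomain"), ("GetProperty", "currentDomain"),
    ("GetProperty", "domainMemory"),
    ("CallProperty", "readUnsignedByte"), ("ReturnValue", None),
]
optimized = drop_after_first_return(inline_bytecode_calls(read_byte))
```

The three pushed values decode to GetLocal1, GetByte and ReturnValue, and the ActionScript fallback behind the return is dropped.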

This is an example of the famous inverse square root using the Memory API:

private function invSqrt( value: Number ): Number
{
	var half: Number = 0.5 * value;
	Memory.writeFloat( value, 0 );
	Memory.writeInt( 0x5f3759df - ( Memory.readInt( 0 ) >> 1 ), 0 );
	value = Memory.readFloat( 0 );
	value = value * ( 1.5 - half * value * value );
	return value;
}
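For readers without the Memory API at hand, the same bit trick can be tried in plain Python. This is only a sketch under a few assumptions: a 4-byte `bytearray` stands in for the ByteArray behind Memory, values are stored little-endian, and the input is positive.

```python
import struct

memory = bytearray(4)  # stand-in for the ByteArray behind the Memory API


def inv_sqrt(value):
    """Approximate 1 / sqrt(value) for positive `value`."""
    half = 0.5 * value
    struct.pack_into("<f", memory, 0, value)        # Memory.writeFloat(value, 0)
    bits = struct.unpack_from("<i", memory, 0)[0]   # Memory.readInt(0)
    struct.pack_into("<i", memory, 0,
                     0x5F3759DF - (bits >> 1))      # Memory.writeInt(..., 0)
    value = struct.unpack_from("<f", memory, 0)[0]  # Memory.readFloat(0)
    # One Newton-Raphson step refines the initial guess.
    return value * (1.5 - half * value * value)


approx = inv_sqrt(4.0)  # close to 0.5, not exact
```

The listings that follow belong to the ActionScript invSqrt above, not to this sketch.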

The same method before optimization in bytecode representation:

0x000000       GetLocal0
0x000001       PushScope
0x000002       PushDouble           0.5
0x000004       GetLocal1
0x000005       Multiply
0x000006       ConvertDouble
0x000007       SetLocal2
0x000008       GetLex               QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "Memory")
0x00000a       GetLocal1
0x00000b       PushByte             0x0
0x00000d       CallPropVoid         QName(PackageNamespace(""), "writeFloat"), 2
0x000010       GetLex               QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "Memory")
0x000012       PushInt              0x5f3759df
0x000014       GetLex               QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "Memory")
0x000016       PushByte             0x0
0x000018       CallProperty         QName(PackageNamespace(""), "readInt"), 1
0x00001b       PushByte             0x1
0x00001d       ShiftRight
0x00001e       Subtract
0x00001f       PushByte             0x0
0x000021       CallPropVoid         QName(PackageNamespace(""), "writeInt"), 2
0x000024       GetLex               QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "Memory")
0x000026       PushByte             0x0
0x000028       CallProperty         QName(PackageNamespace(""), "readFloat"), 1
0x00002b       ConvertDouble
0x00002c       SetLocal1
0x00002d       GetLocal1
0x00002e       PushDouble           1.5
0x000030       GetLocal2
0x000031       GetLocal1
0x000032       Multiply
0x000033       GetLocal1
0x000034       Multiply
0x000035       Subtract
0x000036       Multiply
0x000037       ConvertDouble
0x000038       SetLocal1
0x000039       GetLocal1
0x00003a       ReturnValue

The same method after inlining the Memory API:

0x000000       GetLocal0
0x000001       PushScope
0x000002       PushDouble           0.5
0x000004       GetLocal1
0x000005       Multiply
0x000006       ConvertDouble
0x000007       SetLocal2
0x00000a       GetLocal1
0x00000b       PushByte             0x0
0x000000       SetFloat
0x000012       PushInt              0x5f3759df
0x000016       PushByte             0x0
0x000000       GetInt
0x00001b       PushByte             0x1
0x00001d       ShiftRight
0x00001e       Subtract
0x00001f       PushByte             0x0
0x000000       SetInt
0x000026       PushByte             0x0
0x000000       GetFloat
0x00002b       ConvertDouble
0x00002c       SetLocal1
0x00002d       GetLocal1
0x00002e       PushDouble           1.5
0x000030       GetLocal2
0x000031       GetLocal1
0x000032       Multiply
0x000033       GetLocal1
0x000034       Multiply
0x000035       Subtract
0x000036       Multiply
0x000037       ConvertDouble
0x000038       SetLocal1
0x000039       GetLocal1
0x00003a       ReturnValue

As you can see this is blazing fast. Now the next job is to finish TAAS. Once TAAS is complete, even a method like the inverse square root might be inlined and optimized much better. I did a simple test using the Lorenz attractor from before, and replacing the Vector.<uint> buffer with a ByteArray gave a performance boost of about 5fps. Afterwards I tried getting rid of the Particle class completely and the framerate dropped a little bit. But imagine having the x, y and z coordinates of 300,000 particles stored in an Array. It was still faster than the old version, but not as fast as combining the power of Alchemy with simple ActionScript optimizations like linked lists.

Leaving The Sandbox

If you have been following me on Twitter you might have figured out that I am working on a new project. During the last couple of months I have learned a lot in terms of code analysis and optimizations. And I am not talking about ActionScript optimizations — this is as interesting as a piece of cake. I mean stuff like sparse conditional constant propagation or loop-invariant code motion. This is where it gets interesting.

In order to perform such optimizations considering ActionScript there are two options:

  • Extend the existing ActionScript Compiler
  • Write a compiler that does not take ActionScript as the input but ActionScript Bytecode like AS3C

Extending the ASC for this task is not worth considering in my opinion, and since AS3C is really buggy I decided to start from scratch. The result is a high-level framework to deal with SWF, SWC and ABC files, including abstract structures for control flow analysis or bytecode permutations. The idea was to make manipulations as easy as possible by hiding the complex nature of ABC files, which contain ActionScript bytecode and the description of classes, their visibility etc.
Since this basic framework is now complete I started with the next step: transforming the bytecode into a stack-less representation. The reason is quite simple. The bytecode and AVM+ use a stack-based form for various good reasons. But optimizing stack-based code is hard, because nearly all instructions depend on the stack’s state and thus on the preceding operations.

The idea is to transform the stack-based bytecode into a stack-less three-address-code. This is why I started working on TAAS, Three-Address-ActionScript. TAAS is a stackless representation of ActionScript bytecode and is typed as long as the type can be determined at compile-time. This also means that method calls are resolved, which makes it possible to have an optimization step that inlines them, for instance.
Unfortunately it is absolutely not trivial to convert bytecode to three-address-code, since the control flow of a method has to be considered as well, for instance. This and many other things caused me a lot of headaches during the last week. Most problems are solved, but I have not implemented all instructions of the AVM+ yet. I can, however, already transform the 3D Lorenz attractor to TAAS, and all types are resolved correctly.
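For straight-line code the basic idea can be shown with a small symbolic stack. The Python sketch below is an illustration only: it ignores control flow and typing, which are the hard parts mentioned above, and the tuple format, the temporary names and the `Number(...)` notation for ConvertDouble are assumptions.

```python
BINARY = {"Add": "+", "Subtract": "-", "Multiply": "*"}
CONSTANTS = {"PushByte", "PushInt", "PushDouble"}


def to_three_address(code):
    """Turn straight-line stack code into a list of three-address strings."""
    stack, out = [], []
    counter = 0

    def fresh():
        nonlocal counter
        name = f"t{counter}"
        counter += 1
        return name

    for op, operand in code:
        if op in CONSTANTS:
            stack.append(repr(operand))
        elif op.startswith("GetLocal"):
            # Copy into a temporary so a later SetLocal cannot change
            # the value that is already on the symbolic stack.
            temp = fresh()
            out.append(f"{temp} = local{op[-1]}")
            stack.append(temp)
        elif op.startswith("SetLocal"):
            out.append(f"local{op[-1]} = {stack.pop()}")
        elif op in BINARY:
            right = stack.pop()
            left = stack.pop()
            temp = fresh()
            out.append(f"{temp} = {left} {BINARY[op]} {right}")
            stack.append(temp)
        elif op == "ConvertDouble":
            source = stack.pop()
            temp = fresh()
            out.append(f"{temp} = Number({source})")
            stack.append(temp)
        elif op == "ReturnValue":
            out.append(f"return {stack.pop()}")
        else:
            raise NotImplementedError(op)
    return out


# var half: Number = 0.5 * value;
tac = to_three_address([
    ("PushDouble", 0.5), ("GetLocal1", None), ("Multiply", None),
    ("ConvertDouble", None), ("SetLocal2", None),
])
```

Every result now has an explicit name instead of an implicit stack slot, which is what makes later passes such as inlining or SSA construction tractable.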

Now what is the next step? Of course converting TAAS to a static-single-assignment form which is perfect for optimizations.
Having a powerful framework opens up a lot of possibilities. There are frameworks available for Java which convert Java bytecode to SSA as well which could be converted to TAAS and finally to ActionScript bytecode. One could also start implementing great features like Code Contracts.

I will talk about my work and results also at FOTB this year.

AS3C — take a look inside

I started working on AS3C at the end of last year. After a quick prototype the development stagnated and I added just several fixes and tests to the code. Basically I started AS3C as a complete C# newcomer and because of that the code is very ugly.

Due to the fact that I do not have much free time to continue developing AS3C I think it is the right time to release the source-code on the one hand and to let people experiment with it on the other hand.

You can either download the sources and build AS3C manually (you will need zlib.net) or download a binary from trunk/bin/.

When using AS3C you will need the ActionScript from the SVN. Remember that you write real ActionScript code which gets translated by AS3C. There is also one undocumented and very experimental feature. If you run as3c.exe -optimize main.swf you could get some speed improvements if you have heavy loops using the Math class. But it could also destroy the SWF, so do not forget to make a backup :o)

AS3 compiler open-source: the logical consequences

With the release of the Flex 3 SDK the ActionScript compiler (ASC) became open-source as well. As you know I am working on my own “compiler” to inline bytecode directly with ActionScript. My approach is a post-compile compiler because I wrote it while the ASC was still closed source and there was no way to do that differently.

I still like the approach but it is of course not the best way. The only logical consequence now is to write a patch for the ASC and add some new keywords to it. This would also be easier than what I have to do currently, because I am working with bytecode only and my compiler is half a compiler and half a virtual machine. I think it would be great if a community starts to evolve around the SDK and builds a version with more advanced features. Nicolas?

FITC Toronto: enthusiASM

I am so happy: this year I will speak at FITC Toronto! For me it is also the first conference outside of Europe. So what will it be about? It is the first release of, and session about, my currently nameless inline compiler. This tool allows you to write and debug bytecode by writing ActionScript. It also includes a lot of other nifty features like method injection, which is by the way a real killer feature. If you do not understand it right now, don’t worry: you will love it!

The compiler is currently in a working state (although I finally have to support bytecode 0x1b) and I am implementing several optimization techniques. This is not so easy but I hope to find some solutions to enable branch elimination, constant folding and loop unrolling (which is already getting pretty close) to name a few.
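Of the techniques named here, constant folding is the easiest to show in a few lines. The Python sketch below is a generic peephole version, not the compiler's implementation. It assumes `(opcode, operand)` tuples, only looks at PushInt operands, and ignores both 32-bit overflow and branch targets that might point into a folded sequence.

```python
import operator

FOLDABLE = {
    "Add": operator.add,
    "Subtract": operator.sub,
    "Multiply": operator.mul,
}


def fold_constants(code):
    """Fold `PushInt a, PushInt b, <op>` into a single `PushInt`."""
    out = []
    for ins in code:
        out.append(ins)
        if (len(out) >= 3
                and out[-1][0] in FOLDABLE
                and out[-2][0] == "PushInt"
                and out[-3][0] == "PushInt"):
            fold = FOLDABLE[out[-1][0]]
            value = fold(out[-3][1], out[-2][1])
            # The folded constant can take part in the next fold.
            out[-3:] = [("PushInt", value)]
    return out


# 2 + 3 * 4, followed by a use of a local that cannot be folded
folded = fold_constants([
    ("PushInt", 2), ("PushInt", 3), ("PushInt", 4),
    ("Multiply", None), ("Add", None),
    ("GetLocal1", None), ("Add", None),
])
```

Because each fold leaves a constant on top of the output, nested constant expressions collapse in a single pass, while anything involving a local is left alone.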

I do not want to spoil the fun so any updates are kept private until the FITC for now.

Flashforum Conference 2008

A new year, a new Flashforum Conference. The organizers Sascha and Marc invite you to Cologne from the 2nd to the 5th of June. It is my third time at the Flashforum Conference and I think it will be a great event, just like the last two years. This time the conference will also be a little bit more workshop-based, with two extra days (pre and post conference) full of workshops. I will have one session about Hydra, like in Amsterdam, and another one about my inline bytecode compiler and optimizer. Be prepared, it will definitely rock.

Hello As3c!

A little sneak preview for a tool I am currently working on. What you can see is inline assembler instructions in between ActionScript 3 code. You can also place breakpoints and debug your code like you expect it.

The tool has also some other options that can be very helpful. Stay tuned — once it is robust and nice I will let you know for pure low-level fun and optimizations.

Hello world!


__asm(
  '.fun:',
  Op.findPropertyStrict('public::trace'),
  Op.pushString('Hello World!'),
  Op.callPropertyVoid('public::trace', 1),
  Op.jump('.fun')
);

Would it not be great to write that inline ASM in ActionScript 3 while being able to maintain full debugging capabilities? I think so ;). By the way I hope you noticed that Nicolas just revealed awesome news about future haXe additions.