Tag Archive for 'taas'

Polyglott Programming On The AVM2

Take some time and think about this tweet for a moment. It took me a while to realize that Joel Hooks is right. I was embaressed of myself. How could I forget about that? But I had also a big smile on my face at the same same time. Let me explain why.

The Java to SWF compiler does not compile Java sourcecode but JVM bytecode to ActionScript bytecode. This means I do not have to teach my program the Java language. It only understands JVM bytecode. This seems like an akward decision on the one hand since working on the bytecode level implies lots of problems. But it turns out that this was a really cool decision on the other hand. “Java to SWF compiler” is maybe the wrong description. “any language that compiles to JVM bytecode to SWF compiler” is maybe better.

So what does any language mean? Here is a list of JVM languages. Now you feel maybe like I did after reading that tweet. And I am really looking forward to get Scala up and running.

Some problems still exist. Threading is one issue and I will basically have to do what Scott Peterson did for Alchemy. But reflections, annotations and method overloading have to be solved as well. Some glitches may exist even after figuring everything out. Stacktraces will look pretty weird. However I think this is a really cool project.

Update: I forgot to mention something important. Java supports native code. This means you can build a library that works with OpenGL for instance. Those native methods can not be converted. There are also some other things that do not work. File access is just one of them.

TAAS As A Decompiler

The TAAS compiler is different from the ActionScript compiler since its input is not ActionScript source code but already compiled SWF or SWC files. Just like the haXe compiler can output AS3 instead of a SWF the TAAS compiler can do the same.

Now if you add one and one together you see that the TAAS compiler can be used as a very strong decompiler. My own tests have shown that it will work flawlessly where other commercial decompilers output rubbish. Since the compiler behaves like the Flash Player it will “execute” the bytecode in order to parse it which means it has a very highlevel understanding of the structure inside the SWF.

The only question is now what to do with the source code. I wrote the decompiler for my session at FOTB to show much easier how the optimizations behave. It is also a great tool to debug errors. But should it be opened or not?

To take it one step further one might also be able to write an obfuscator using the TAAS compiler. In my opinion it would be cool to have a strong decompiler and obfuscator, both being open source. We might also add an option to protect SWFs from the decompiler by adding something to the SWF metadata for instance. Of course this is just a simple rule which could be removed by someone once the code is open. What do you think?

Compiling Java and C# to SWF

At the end of my Leaving the Sandbox session I showed two projects I have been working on without telling anyone. The first was a compiler that could compile C# to SWF. The second was another compiling being able to compile Java to SWF. First of all I have to say that both tools are far from being finished. They are a so called proof-of-concept. For me it is enough to know that this is possible. I am also not sure if I will finish one of those tools.

So how does this work? Because I am now working on the AudioTool backend for our highly anticipated launch I had to switch to fulltime Java development. In order to understand the JVM better I started to search for the optimizations that the HotSpot VM performs and stumbled upon a framework called Soot. Voilà. Reading about Soot made me question why nobody is working on such a framework for ActionScript. Probably because universities see Flash just as a toy. For us developers being dependent on the Flash Platform it is of course more than that.

I will quickly explain what Soot does. It reads compiled Java class files and converts the machine code they contain into a three address code representaiton. This code is then heavily optimized. TAAS is basically doing the same. I also want to note that Soot is by far more complex and complete than TAAS right now. It would be megalomania saying that TAAS is Soot for ActionScript. But you get the idea. The one framework converts Java to three address code and the other framework converts ActionScript to three address code. Maybe it is possible to connect them.

This is what I did for the Java compiler. The whole pipeline looks like this:

  1. Compile Java sourcecode with the Java compiler of your choise
  2. Use Soot to get the three address code representation of a compiled binary
  3. Create TAAS expressions for the expressions that Soot uses
  4. Connect the TAAS graph the same way the Soot graph is connected
  5. Compile the TAAS graph to a SWF

This is about it. There is some glue-code involved here and there and I did not implement all Java expressions yet. I did also not bother about threads. Basically one could do what Scott has done for Alchemy. The cool thing is that we get highly optimized Java code because of the Java compiler and Soot. Then the TAAS compiler can run a second time after linking is done and perform platform specific optimizations like inlining and loop invariant code motion for certain expressions.

I already wrote a couple of classes and packed them in a SWC. Those mimic the behaviour of the Java classes like java.lang.System or java.util.LinkedList. On the Java side I implemented the Flash classes like flash.display.Sprite and flash.events.EventDispatcher. The interesting part is that all the methods those classes contain are marked native which means they have no implementation since they are native to the Flash Player.
This is also a great advantage. It is slow in Alchemy to call Flash methods and to communicate with Flash classes. The Java approach does not have this tradeoff. I also do not have to trigger the ActionScript compiler. The conversion form class to swf is entirely done using Soot and TAAS.

So far for the Java compiler. But in order to show a little bit more of the potential I wanted to have a C# compiler working as well. The great thing about advanced languages and a large community is that people have already built a lot of tools. It did not take too long to find an advanced framework that converts C# to Java. Actually this is not be the best approach. But I could not find the “Soot for C#” the evening before my FOTB session.

So I am using net2java to convert .NET code like C# or VisualBasic to Java. That code is then compiled using the Java compiler and I am using Soot again to convert the code to TAAS without much stress.

For me the most important part is to know that it is possible to compile .NET source code and Java binaries to SWF. To complete those tools one just has to implement all the missing expressions and the standard Java library in Flash. Everything could also work the other way around. Compiling from TAAS to .NET, Silverlight and Java is another option.

TAAS progress

I have developed some other optimizations during the past couple of days. Including strength reduction and optimizing tail recursive calls.

Strength reduction can already handle about 55 different cases at the moment. For instance it will convert code like if( x - 1 == 0 ) into if( x == 1 ). But it will also remove expressions that other optimizations could introduce. x + 0 is something that might occur during inline expansion. Or x + Number.NaN for example.

But I really like the tail recursive optimizations. TAAS can detect if a method needs to call itself or not. Take this code for example:

private function sum( i: int, value: int ): int
{
	if( i == 0 )
	{
		return value;
	}

	return sum( i - 1, value + i );
}

You see that the last statement sum( i - 1, value + i ) is tail recursive. This means we can replace the recursive call to sum() with a jump to the beginning of the method after changing the method parameters. The sum() method becomes converted to something like a simple while loop. This means also that one could have written code like this in the first place since this optimization applies only to methods that do not have to be recursive after all.

Knock yourself out with some examples:

First results of TAAS

TAAS in action

Finally I am able to present the first results of TAAS. It was a long way to get here. I was actually not sure at all if I will manage to show something at FOTB 09. Let me explain what happens here.

In the image is the original ActionScript code on the left. The ActionScript bytecode produced by the ASC is in the middle and the compiled TAAS version is on the right. You see that I am running a loop and call a method inside that loop. It would be nice to have small helper methods like this one inlined automatically. TAAS can do this for you now.

So how does this work? First of all an SWF file is parsed and the bytecode is extracted. This bytecode is transformed into a control flow graph. This control flow graph is converted into a graph of TAAS expressions which a lot of benifits. The conversion step works like the Flash Player and kind of executes your code. The result is that all methods and local variables are typed if that is possible.

It is much easier to perform a lot of optimizations now. Like inline expansion. In order to do that I can simply compile the method that is called to TAAS and connect the vertices together in the graph after some adjustments have been made. The inline expansion step will automatically inline methods when it thinks it is useful and possible. You can only inline methods that are private or final and make no use of reflections.

Inlined methods create a lot of overhead. Usually they contribute additional local variables and the code gets bigger. So one step after the inline expansion is to clean everything up again. In this example I can remove those registers thanks to copy propagation and dead code elimination.
And it gets even better. If you take a closer look at the example, the calc method expects two paramters that are of type Number. Therefore the original method uses an Add instruction. TAAS knows about the types and sees that it will pass two integers into this method. Since it makes no sense to convert from int to Number to int it stays with integer in this case.

So after inlining the method ends up with even less local variables. The original local variable t1 is considered useless since it is only used one time. TAAS will put the code getTimer() at the position where t1 has been used and we end up with a heavily optimized method. What I like the most is that I do not have to change the way I write my ActionScript code. All happens behind the scenes automatically.
Some other optimizations are implicit. TAAS will use AddInt instead of Add if both operands are typed int. A FlowOptimizer is also involved and will invert the if expression for the loop. This can reduce the number of required jumps in a method by 50%.

I know that the output is not perfect. But this is all a work in progress and really the first result that I can share. The SWFs speak for them self.

Apparat is now Open Source

The full source code of Apparat is now available at GoogleCode. It is the whole framework behind TDSI and Reducer.

Apparat is released under the GNU Lesser General Public License. Please contact me if you want to contribute to this project. Maybe someone is interested in writing an Ant task for Reducer? I am also happy to receive feedback if you have used the framework to build something cool with it.

Please join the Apparat Discussion Group if you are interested in collaboration.

TurboDieselSportInjection

I am definitly not good at choosing names for software projects. However TurboDieselSportInjection is a release of my experiments from yesterday. It is a spinoff from the whole framework and allows you to inline __bytecode and of course to use the new Memory API.

Hopefully you are kind enough to provide me with some feedback. I am especially interested in Exceptions that occur when reading or writing ABC files. Have fun!

Update: TDSI is now open source!

Alchemy for ActionScript

Today I had to do something else than backend development and since FOTB is getting closer and I could not really continue working on TAAS I decided to add something which is easy to implement and has a huge benifit: Alchemy support in ActionScript.

So what is the idea? TAAS is part of a framework I developed to manipulate SWF, SWC and ABC files. The main focus are of course ABC files since they contain the bytecode which gets executed.
Part of the framework are tools for control flow analysis, various bytecode analyzers and also a search-and-replace system which work on a bytecode level. There are for instance pattern matchers that search for bad code produced by the ASC and replace the match with a more performant set of instructions.

With all those weapons in my arsenal I thought it should be a walk in the park to implement the Alchemy features in a way that makes sense. So the first idea is to have the old functionality AS3C had but more robust. AS3C had a feature that was the __asm function which allowed you to inline instructions. The new framework comes with the old __asm and also another cool method: __bytecode! This will inline raw bytes. This means also you would have to know all the indices for variables you want to use from the constant pool in advance so __asm will still be your friend.

With the __bytecode method it is already possible to use all Alchemy features again. It would also be possible with the __asm method but writing plain bytes is simply more elitist. In order to make it easy for the developer I want a high-level API. Having a class with some static methods is nice of course but also slow. Alchemy is fast because those opcodes that write and read from a ByteArray are no method calls. They are low-level FlashPlayer features.

The first attempt was to write a Memory class that allows you to use the Alchemy features. This class contains raw bytecode implementations and ActionScript code. This means if you do not use the optimizer everything will still work — only 1000 times slower. When looking at the memory class there is another tool of the framework that becomes very helpful. Both the __bytecode and ActionScript stuff should not co-exist with each other. So when we inline the bytecode a dead-code-elimination will simply cleanup afterwards. Since the 0x47 byte for instance is “ReturnVoid” the ActionScript code which would follow afterwards can be dropped. That code is now unreachable.

Step two is to replace all calls to the Memory class with the correct Alchemy opcode. This was really simple and the result is a really really fast way to access a ByteArray while still maintaining a high comfort. Of course one might think now that the __bytecode method becomes useless since no methods of the Memory class are called at all. But if anyone is crazy enough to access the Memory class untyped with a runtime namespace for instance you are still happy to have the code optimized inside. In some circumstances it is simply impossible to figure out that someone called Memory.writeByte(). End of the story: your calls to a ByteArray are always optimized in the best way possible.

This is an example of the Memory.readByte() method before applying optimizations:

0x000000       GetLocal0
0x000001       PushScope
0x000002       FindPropStrict       QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "__bytecode")
0x000004       PushShort            0xd1
0x000007       PushByte             0x35
0x000009       PushByte             0x48
0x00000b       CallPropVoid         QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "__bytecode"), 3
0x00000e       GetLex               QName(PackageNamespace("flash.system"), "ApplicationDomain")
0x000010       GetProperty          QName(PackageNamespace(""), "currentDomain")
0x000012       GetProperty          QName(PackageNamespace(""), "domainMemory")
0x000014       GetLocal1
0x000015       SetProperty          QName(PackageNamespace(""), "position")
0x000017       GetLex               QName(PackageNamespace("flash.system"), "ApplicationDomain")
0x000019       GetProperty          QName(PackageNamespace(""), "currentDomain")
0x00001b       GetProperty          QName(PackageNamespace(""), "domainMemory")
0x00001d       CallProperty         QName(PackageNamespace(""), "readUnsignedByte"), 0
0x000020       ReturnValue

The same method after inlining the bytes and applying various other analysis like dead-code-elimination:

0x000000       GetLocal0
0x000001       PushScope
0x000000       GetLocal1
0x000001       GetByte
0x000002       ReturnValue

This is an example of the famous inverse square root using the Memory API:

private function invSqrt( value: Number ): Number
{
	var half: Number = 0.5 * value;
	Memory.writeFloat( value, 0 );
	Memory.writeInt( 0x5f3759df - ( Memory.readInt( 0 ) >> 1 ), 0 );
	value = Memory.readFloat( 0 );
	value = value * ( 1.5 - half * value * value );
	return value;
}

The same method before optimization in bytecode representation:

0x000000       GetLocal0
0x000001       PushScope
0x000002       PushDouble           0.5
0x000004       GetLocal1
0x000005       Multiply
0x000006       ConvertDouble
0x000007       SetLocal2
0x000008       GetLex               QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "Memory")
0x00000a       GetLocal1
0x00000b       PushByte             0x0
0x00000d       CallPropVoid         QName(PackageNamespace(""), "writeFloat"), 2
0x000010       GetLex               QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "Memory")
0x000012       PushInt              0x5f3759df
0x000014       GetLex               QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "Memory")
0x000016       PushByte             0x0
0x000018       CallProperty         QName(PackageNamespace(""), "readInt"), 1
0x00001b       PushByte             0x1
0x00001d       ShiftRight
0x00001e       Subtract
0x00001f       PushByte             0x0
0x000021       CallPropVoid         QName(PackageNamespace(""), "writeInt"), 2
0x000024       GetLex               QName(PackageNamespace("com.joa_ebert.abc.bytecode.asbridge"), "Memory")
0x000026       PushByte             0x0
0x000028       CallProperty         QName(PackageNamespace(""), "readFloat"), 1
0x00002b       ConvertDouble
0x00002c       SetLocal1
0x00002d       GetLocal1
0x00002e       PushDouble           1.5
0x000030       GetLocal2
0x000031       GetLocal1
0x000032       Multiply
0x000033       GetLocal1
0x000034       Multiply
0x000035       Subtract
0x000036       Multiply
0x000037       ConvertDouble
0x000038       SetLocal1
0x000039       GetLocal1
0x00003a       ReturnValue

The same method after inlining the Memory API:

0x000000       GetLocal0
0x000001       PushScope
0x000002       PushDouble           0.5
0x000004       GetLocal1
0x000005       Multiply
0x000006       ConvertDouble
0x000007       SetLocal2
0x00000a       GetLocal1
0x00000b       PushByte             0x0
0x000000       SetFloat
0x000012       PushInt              0x5f3759df
0x000016       PushByte             0x0
0x000000       GetInt
0x00001b       PushByte             0x1
0x00001d       ShiftRight
0x00001e       Subtract
0x00001f       PushByte             0x0
0x000000       SetInt
0x000026       PushByte             0x0
0x000000       GetFloat
0x00002b       ConvertDouble
0x00002c       SetLocal1
0x00002d       GetLocal1
0x00002e       PushDouble           1.5
0x000030       GetLocal2
0x000031       GetLocal1
0x000032       Multiply
0x000033       GetLocal1
0x000034       Multiply
0x000035       Subtract
0x000036       Multiply
0x000037       ConvertDouble
0x000038       SetLocal1
0x000039       GetLocal1
0x00003a       ReturnValue

As you can see this is blazing fast. Now the next job is to finish TAAS. Once TAAS is complete even a method like the inverse square root might be inlined and optimized much better. I did a simple test using the Lorenz attractor from before and replacing the Vector.<uint> buffer with a ByteArray gave a performance boost of about 5fps. Afterwards I tried getting rid of the Particle class completly and the framerate dropped a little bit. But imagine having 300.000 particle’s x, y and z coodrinates stored in an Array. It was still faster than the old version but not as fast as combining the power of Alchemy with simple ActionScript optimizations like linked lists.

A Simple Method And Taas

I just want to share with you how Taas can actually optimize code and what happens behind the scenes. So in my last post I talked about stackless code and optimizations that are possible but how does it work at all?
Imagine you have got this ActionScript code:

var x: Number = 1.0;
var y: Number;

if(true)
{
	y = 3.0;
}
else
{
	y = 2.0;
}

x += y;

return x;

For the sake of simplicity I will not use local variables in the bytecode. But it will be easier to figure out what happens in the bytecode.

      PushDouble           1.0
      PushTrue
      IfTrue               L0

      PushDouble           2.0
      Jump                 L1

L0:   PushDouble           3.0
L1:   Add
      ReturnValue

This bytecode is nearly the same as the method but without local variables. This should be fairly simple to understand.

When converting bytecode to Taas, its corresponding control flow graph is converted into a control flow graph of Taas expressions. I do not want to go into much detail how the stack based code is converted but here is the graph before and after the transformation from bytecode to Taas. It looks quite similar but there is a major difference. All those constant values in the Taas graph are not values pushed on a stack but assignments to virtual variables which exist only in theory. You can see also that the Add in the bytecode has known operands and type in the Taas version. Since it is currently not know if 2.0 or 3.0 has been used there is a Φ function that says “This value is either 2.0 or 3.0 depending on the path taken at runtime”.

So as I said this code can be optimized much better than stack based code. This graph is very redundant. There are three utilities to compact and simplify the current graph of Taas expressions. The algorithm has been developed for Java bytecode but works with ActionScript very well tool. The concept is quite simple. Perform copy propagation, constant folding and dead code elimination until the graph stops changing. Applying these techniques to the Taas graph yields the following results.

Copy propagation:

Most of the theoretical variables have been eliminated using copy propagation.

Constant folding:

Known constants have been replaced. Even the if condition is no longer needed and the false branch has been removed. The dead code elimination will clean up this code afterwards.

Dead code elimination:

Dead code elimination has removed the dead Jump statement and also the value 2.0 from the Φ expression. Afterwards constant folding replaced the Φ expression with its one constant value. Then another iteration of constant folding replaced the Add expression with the constant value 4.0. Afterwards copy propagation has put the result into the Return statement and voilá.

This result may have seemed quite obvious from the beginning and you may ask who writes such code. Probably nobody. But once you start inlining methods this makes a lot of sense. A lot of preconditions are known in that case and unnecessary branches can be removed. Since inlining methods bloats up the code it is very important to compact it afterwards as much as possible. I hope you see now how interesting all of this might be in the future.

Leaving The Sandbox

If you have been following me on Twitter you might have figured out that I am working on a new project. During the last couple of months I have learned a lot in terms of code analysis and optimizations. And I am not talking about ActionScript optimizations — this is as interesting as a piece of cake. I mean stuff like sparse conditional constant propagation or loop-invariant code motion. This is where it gets interesting.

In order to perform such optimizations considering ActionScript there are two options:

  • Extend the existing ActionScript Compiler
  • Write a compiler that does not take ActionScript as the input but ActionScript Bytecode like AS3C

Extending the ASC for this task is not worh considering in my opinion and since AS3C is really buggy I decided to start from scratch. The result is high-level framework to deal with SWF, SWC and ABC files including abstract structures for control flow analysis or bytecode permutations. The idea was to make manipulations as easy as possible by hiding the complex nature of ABC files which contain ActionScript bytecode and the description of classes, their visibility etc.
Since this basic framework is now complete I started with the next step: transforming the bytecode into a stack-less representation. The reason is quite simple. The bytecode and AVM+ are using a stack-based form for various good reasons. But optimizing stack-based code is hard because the stack plays such an important role since nearly all instructions depend on the stack’s state, thus on the preceding operations.

The idea is to transform the stack-based bytecode into a stack-less three-address-code. This is why I started working on TAAS, Three-Address-ActionScript. TAAS is a stackless representation of ActionScript bytecode and typed as long as the type can be determined at compile-time. This means also that method calls are solved and that it is possible to have an optimization step to inline those for instance.
Unfortunately it is absolutely not trivial to convert bytecode to three-address-code since the control flow of a method has to be considered as well for instance. This and many other things caused me a lot of headaches during the last week. Most problems are solved but I have not implemented all instructions of the AVM+ yet. Although I can already transform the 3D lorenz attractor to TAAS for instance and all types are solved correct.

Now what is the next step? Of course converting TAAS to a static-single-assignment form which is perfect for optimizations.
Having a powerful framework opens up a lot of possibilities. There are frameworks available for Java which convert Java bytecode to SSA as well which could be converted to TAAS and finally to ActionScript bytecode. One could also start implementing great features like Code Contracts.

I will talk about my work and results also at FOTB this year.