Archive for the 'experiments' Category

So I Recorded A New Video

This time recorded on Ubuntu. Did I mention 64bit already?
Do not miss my session at FOTB for a live demo.

Putting Things In Perspective

I am back home in Germany from my trip to San Francisco where I initially announced JITB at FITC. I did not expect it to get around that quickly and that a quick-and-dirty video filmed with my phone would get so much coverage.

I want to put a lot of things regarding JITB in perspective here as a followup to the various comments that I received via mail, reddit, etc.

Flash Replacement

JITB is not a Flash replacement. Imagine a F1 race car and a family van. The family van comes with air conditioning, a radio and a lot of comfort. For the F1 car you will probably have to wear a helmet and need an engineering team to get it running.
The important factor for me is that JITB will run very fast under certain conditions. A race car is also not a jeep. JITB will not support all features of the Flash Player. If you search for a Flash replacement you might want to take a look at lightspark.

Performance

Java is slow and the world is flat. That being said I will not talk about Java performance. JITB compiles to native Java code with a certain overhead. An ActionScript 3 closure is converted into an anonymous class for instance.
JITB converts ActionScript 3 bytecode to Java bytecode at runtime and performs various optimizations. This way we can leverage the speed of the JVM for ActionScript and get great results.
That being said there is also a problem of course. Startup time is not very good. JITB and its parent project Apparat are implemented in the Scala programming language and written for multi core architectures. The startup costs are high and JITB is only useful for long living applications. However compiling ActionScript to Java can be done AOT. This means you need to deal only with the normal JVM startup time. The Flash Player API is implemented in pure Java.

The compiler used for JITB is doing relatively naive optimizations at the moment. Namely constant folding, copy propagation, dead code elimination and strength reduction. An older proof of concept version I implemented a year ago featured also loop invariant code motion, inline expansion and tail-recursive optimizations. The new compiler framework is very powerful and I will add even more algorithms and a graph coloring register allocator for example. So there is some more NP-complete fun ahead making startup time even worse.

Purpose

We need a Flash Player that performs very fast for special purposes. We could use JITB at Audiotool to render mixdowns of tracks or other companies could use it to run their ActionScript code on the server side. You can use JITB to create an offline application that runs on a client machine. You can hack JITB to do what you want. Someone could potentially write a web framework for ActionScript and you could execute that code on Google App Engine if compiled AOT. And it is definitly possible to create Android applications using ActionScript. JITB’s terrain is everything but the browser in my opinion. However I can imagine that someone would be interested in creating a plugin version. With some clever simple caching algorithms it could make sense to get around startup costs but I am not interested in doing this.

Legal

There are no legal obligations to face since both the AVM2 and SWF specifications are published by Adobe. It is also not illegal to compile code for the JVM.

ActionScript Subset

By “a subset of the language” I really mean the language and not the API. JITB supports currently only statically typed ActionScript code. However the internal language model can be adjusted to support dynamic typing as well and with Java 7 things got even easier. In fact JITB does support dynamic code as long as it is able to infer types correct. Does this mean that I am willing to implement the full Flash Player API all by myself? Hell no.

In the next weeks I will probably create a better example that is easier to understand and get some other stuff ready so that you can try everything for yourself.

Introducing JITB

My talk at FITC San Francisco is over and I want to share some of the anouncements from today with you. At the end of my talk I was showing JITB.

What you see in the YouTube video I posted a while ago is a Java program executing a SWF. For FITC I added some more code and an OpenGL based Display List renderer. In other words: I wrote a Flash Player.

However I should rephrase that statement and say I am attempting to build a Flash Player. The current state is available in the sf2010-sprint clone of Apparat. I will merge the changes into the main Apparat branch when I am back home in Germany.

JITB is currently able to translate a subset of ActionScript code at runtime into Java bytecode and runs nearly at the same speed as native Java. This is a really huge improvement compared to standard ActionScript performance. A lot of smart people worked on the JVM and made it really fast. Apparat will allow you to leverage all this hard work in the future. I am also shooting for Java interoperability at some level so that you can call Java classes from within ActionScript. Hopefully you will be able to use JITB on your desktop machine, on a server or on an Android phone. Basically everywhere Java runs.

There are still a lot of things missing. The whole Flash API needs to be implemented. And the Display List rendering needs a proper OpenGL implementation. However I thought this might be some cool stuff to share with you in its early stages.
My hope is that more people start contributing to the project. Maybe some OpenGL guru wants to take care of the Display List rendering or someone else likes to help implement the Flash API in Java.

I also showed a Raytracer by Nico Zimmermann during my presentation and promised to put the URL on my blog so here it is. His company is called Britzpetermann and the address is http://www.britzpetermann.com/.

Update: Please do not think that this implementation is 30x faster than the Flash Player developed by Adobe. One(!) microbenchmark is never a number you should count on. I would like to make clear that I never said this.

New Apparat Example

Good news everyone. The Apparat inline expansion works now to full extent after fixing some minor bugs. A complete example is also available. Just change the paths in the build.properties file and compile everything using Ant.

Apparat Example

Use the inline feature with care. Apparat does not try to optimize your code and performs nothing but dead simple inlining. This can lead to slower code due to the creation of lots of local registers. Your code gets also much bigger and will require more space in memory. I am actually not a fan of manual inlining at all. I think it makes only sense to inline code if you have a powerful optimizer available that will cleanup the whole mess.

The fun story about this example is that the inlined version is slower using the lastes Flash Player release candidate if you have only 40.000 particles. That is why I increased the number of particles to 80.000 ;). I developed the example using an old standalone player and the inlined version was nearly twice as fast. However when I watched the example in the browser with the latest release candidate the game was completely different. Kudos to Adobe for significantly improving the Flash Player performance!

Polyglott Programming On The AVM2

Take some time and think about this tweet for a moment. It took me a while to realize that Joel Hooks is right. I was embaressed of myself. How could I forget about that? But I had also a big smile on my face at the same same time. Let me explain why.

The Java to SWF compiler does not compile Java sourcecode but JVM bytecode to ActionScript bytecode. This means I do not have to teach my program the Java language. It only understands JVM bytecode. This seems like an akward decision on the one hand since working on the bytecode level implies lots of problems. But it turns out that this was a really cool decision on the other hand. “Java to SWF compiler” is maybe the wrong description. “any language that compiles to JVM bytecode to SWF compiler” is maybe better.

So what does any language mean? Here is a list of JVM languages. Now you feel maybe like I did after reading that tweet. And I am really looking forward to get Scala up and running.

Some problems still exist. Threading is one issue and I will basically have to do what Scott Peterson did for Alchemy. But reflections, annotations and method overloading have to be solved as well. Some glitches may exist even after figuring everything out. Stacktraces will look pretty weird. However I think this is a really cool project.

Update: I forgot to mention something important. Java supports native code. This means you can build a library that works with OpenGL for instance. Those native methods can not be converted. There are also some other things that do not work. File access is just one of them.

Getting Rid Of null

A couple of weeks ago I started learning Scala. I can highly recommend it. The language has a lot of great approaches to multithreading and scalability. The reason why I like Scala is because it is so simple yet powerful.

Of course doing something at home influences work. I decided to write a build tool for the Hobnox AudioTool which is about 200 lines of Scala code. Cool thing is that this tool replaces a manually maintained Ant build and all project dependencies are always correct. Plus analyzing the dependencies in a more powerful way allows me to spawn the compiler in parallel. Building the AudioTool is now twice as fast and much more comfortable.

When learning a new programming language you also learn about new concepts. Functional languages in general have a different approach to nullable types. I know Scala is not the only one but let me introduce the concept in terms of ActionScript.

When you have a method that returns either a result or nothing: what do you do? Imagine you have some kind of service and a Dictionary of users. Requesting a user works by his unique id. The Dictionary is private to the class since you want to keep it read only.

function getUser(id: String): User {
  return hasUser(id) ? users[id] : null;
}

If I would now simply ask the service for an unknown user and do something like if(getUser('xyz').isLoggedIn) { trace('Hooray'); } I could probably and up with a null-reference error. No one checks for me if the user exists. So what else could we do? Write a lot of boilerplate code and prepend a check if the user is null or not each time we request one from the service. A much better approach in my opinion is to throw an error as early as possible. In this case we would rewrite the method to something like this:

function getUser(id: String): User {
  if(hasUser(id)) return users[id];
  throw new NoSuchUserError(id);
}

In this scenario we get informed about the error as early as possible. But we are stuck again. First of all ActionScript does not enforce you to catch possible exceptions. This means if you do not read the documentation of a method carefully or look into the source code of a method before calling it somewhere you will never know that it throws an exception. And what if we are actually in a scenario where we do not expect errors for non-existing objects? Think of the Dictionary object throwing each time an error when you access it and the result is null. How could I even check if an object exists in a Dictionary?

try {
  dictionary[key];
  return true;
}
catch(noSuchElementError: NoSuchElementError) {
  return false;
}

I guess you see that this can not be the solution to our problem. In a real world example you may deal with your own collection of objects instead of a Dictionary of course. So we have to get rid of exceptions and null for the optimal solution. Scala’s approach to this problem is the Option type. We always abuse null as a placeholder when we want to express that an element does not exist. The Option means that either Some or None result exists. Rewriting our getUser function using this approach would yield the following ActionScript code.

function getUser(id: String): Option {
  return hasUser(id) ? new Some(user[id]) : new None();
}

Why is this much better than the old approach? When calling the method you will always know that the method has only an optional result value. We get rid of the exception and null values. Our only problem at the moment is ActionScript. The result is now untyped. In an ideal world this method would be written as:

function getUser(id: String): Option.<User> {
  return hasUser(id) ? new Some.<User>(user[id]) : new None.<User>();
}

However we can still tackle this issue by implementing null-representations of our objects. Imagine the User class. You could rewrite the code to something like this.

function getUser(id: String): IUser {
  return hasUser(id) ? user[id] : new NullUser();
}

final class NullUser implements IUser {
  public function get isLoggedIn(): Boolean { return false; }
  public function get name(): String { return 'null'; }
}

And even if you are interested in null-reference errors you could rewrite your code to something like this:

final class NullUser implements IUser {
  public function get isLoggedIn(): Boolean {
    CONFIG::ThrowNullReferenceErrors { throw new NullReferenceError(); }
    return false;
  }
  public function get name(): String {
    CONFIG:: ThrowNullReferenceErrors { throw new NullReferenceError(); }
    return 'null';
  }
}

It is definitely a very different approach. A functional language like Scala allows you to deal much better with Options. But it makes sense to diferentiate between an uninitialized variable which is null and an optional result of a function. Unfortunately this is at the moment very painful with the lack of generics in ActionScript.

Compiling Java and C# to SWF

At the end of my Leaving the Sandbox session I showed two projects I have been working on without telling anyone. The first was a compiler that could compile C# to SWF. The second was another compiling being able to compile Java to SWF. First of all I have to say that both tools are far from being finished. They are a so called proof-of-concept. For me it is enough to know that this is possible. I am also not sure if I will finish one of those tools.

So how does this work? Because I am now working on the AudioTool backend for our highly anticipated launch I had to switch to fulltime Java development. In order to understand the JVM better I started to search for the optimizations that the HotSpot VM performs and stumbled upon a framework called Soot. VoilĂ . Reading about Soot made me question why nobody is working on such a framework for ActionScript. Probably because universities see Flash just as a toy. For us developers being dependent on the Flash Platform it is of course more than that.

I will quickly explain what Soot does. It reads compiled Java class files and converts the machine code they contain into a three address code representaiton. This code is then heavily optimized. TAAS is basically doing the same. I also want to note that Soot is by far more complex and complete than TAAS right now. It would be megalomania saying that TAAS is Soot for ActionScript. But you get the idea. The one framework converts Java to three address code and the other framework converts ActionScript to three address code. Maybe it is possible to connect them.

This is what I did for the Java compiler. The whole pipeline looks like this:

  1. Compile Java sourcecode with the Java compiler of your choise
  2. Use Soot to get the three address code representation of a compiled binary
  3. Create TAAS expressions for the expressions that Soot uses
  4. Connect the TAAS graph the same way the Soot graph is connected
  5. Compile the TAAS graph to a SWF

This is about it. There is some glue-code involved here and there and I did not implement all Java expressions yet. I did also not bother about threads. Basically one could do what Scott has done for Alchemy. The cool thing is that we get highly optimized Java code because of the Java compiler and Soot. Then the TAAS compiler can run a second time after linking is done and perform platform specific optimizations like inlining and loop invariant code motion for certain expressions.

I already wrote a couple of classes and packed them in a SWC. Those mimic the behaviour of the Java classes like java.lang.System or java.util.LinkedList. On the Java side I implemented the Flash classes like flash.display.Sprite and flash.events.EventDispatcher. The interesting part is that all the methods those classes contain are marked native which means they have no implementation since they are native to the Flash Player.
This is also a great advantage. It is slow in Alchemy to call Flash methods and to communicate with Flash classes. The Java approach does not have this tradeoff. I also do not have to trigger the ActionScript compiler. The conversion form class to swf is entirely done using Soot and TAAS.

So far for the Java compiler. But in order to show a little bit more of the potential I wanted to have a C# compiler working as well. The great thing about advanced languages and a large community is that people have already built a lot of tools. It did not take too long to find an advanced framework that converts C# to Java. Actually this is not be the best approach. But I could not find the “Soot for C#” the evening before my FOTB session.

So I am using net2java to convert .NET code like C# or VisualBasic to Java. That code is then compiled using the Java compiler and I am using Soot again to convert the code to TAAS without much stress.

For me the most important part is to know that it is possible to compile .NET source code and Java binaries to SWF. To complete those tools one just has to implement all the missing expressions and the standard Java library in Flash. Everything could also work the other way around. Compiling from TAAS to .NET, Silverlight and Java is another option.

TAAS progress

I have developed some other optimizations during the past couple of days. Including strength reduction and optimizing tail recursive calls.

Strength reduction can already handle about 55 different cases at the moment. For instance it will convert code like if( x - 1 == 0 ) into if( x == 1 ). But it will also remove expressions that other optimizations could introduce. x + 0 is something that might occur during inline expansion. Or x + Number.NaN for example.

But I really like the tail recursive optimizations. TAAS can detect if a method needs to call itself or not. Take this code for example:

private function sum( i: int, value: int ): int
{
	if( i == 0 )
	{
		return value;
	}

	return sum( i - 1, value + i );
}

You see that the last statement sum( i - 1, value + i ) is tail recursive. This means we can replace the recursive call to sum() with a jump to the beginning of the method after changing the method parameters. The sum() method becomes converted to something like a simple while loop. This means also that one could have written code like this in the first place since this optimization applies only to methods that do not have to be recursive after all.

Knock yourself out with some examples:

First results of TAAS

TAAS in action

Finally I am able to present the first results of TAAS. It was a long way to get here. I was actually not sure at all if I will manage to show something at FOTB 09. Let me explain what happens here.

In the image is the original ActionScript code on the left. The ActionScript bytecode produced by the ASC is in the middle and the compiled TAAS version is on the right. You see that I am running a loop and call a method inside that loop. It would be nice to have small helper methods like this one inlined automatically. TAAS can do this for you now.

So how does this work? First of all an SWF file is parsed and the bytecode is extracted. This bytecode is transformed into a control flow graph. This control flow graph is converted into a graph of TAAS expressions which a lot of benifits. The conversion step works like the Flash Player and kind of executes your code. The result is that all methods and local variables are typed if that is possible.

It is much easier to perform a lot of optimizations now. Like inline expansion. In order to do that I can simply compile the method that is called to TAAS and connect the vertices together in the graph after some adjustments have been made. The inline expansion step will automatically inline methods when it thinks it is useful and possible. You can only inline methods that are private or final and make no use of reflections.

Inlined methods create a lot of overhead. Usually they contribute additional local variables and the code gets bigger. So one step after the inline expansion is to clean everything up again. In this example I can remove those registers thanks to copy propagation and dead code elimination.
And it gets even better. If you take a closer look at the example, the calc method expects two paramters that are of type Number. Therefore the original method uses an Add instruction. TAAS knows about the types and sees that it will pass two integers into this method. Since it makes no sense to convert from int to Number to int it stays with integer in this case.

So after inlining the method ends up with even less local variables. The original local variable t1 is considered useless since it is only used one time. TAAS will put the code getTimer() at the position where t1 has been used and we end up with a heavily optimized method. What I like the most is that I do not have to change the way I write my ActionScript code. All happens behind the scenes automatically.
Some other optimizations are implicit. TAAS will use AddInt instead of Add if both operands are typed int. A FlowOptimizer is also involved and will invert the if expression for the loop. This can reduce the number of required jumps in a method by 50%.

I know that the output is not perfect. But this is all a work in progress and really the first result that I can share. The SWFs speak for them self.

TDSI Examples

Everyone likes examples. So here are three examples using TDSI. The archive includes a ready-to-go FDT project with post-compile ANT tasks configured.

Example01

This is the old code of the already optimized attractor using the Memory API instead of a Vector.<uint>.

Example02

In this case there exists no Particle class at all and no linked list. The particle information is stored inside the memory as well. Particles are extended to a fourth value so indexing a particle can be done with a simple bitshift which is very fast.

Example03

The last example uses float instead of double values for the particles. The framerate stays the same which is really cool because the memory usage drops. Before a particle consisted of four doubles which is a total of 4 * 8b = 32b. In this example each particle takes up only 16b. There the memory difference is 0x4B0000b which is about 4.7mb in total.
And also the first version needs about 20mb on my machine which means about 12mb of RAM are not wasted. Pretty cool when thinking about devices with less memory.

By the way I just stumpled across a bug when using [Embed]. Hopefully it will be easy to fix.