Posts Tagged ‘optimisation’

Poster submitted

21 September 2008

I have submitted the soft-copy of my poster. Now it just remains to print it out in colour and stick it together for the poster presentations on Wednesday.

Now to get on with my final report, due all too soon…

Testing a few different JVMs

23 August 2008

So I found that Sun’s HotSpot Java VM already does a fair bit of optimisation, which means the cases where my optimisations actually improve program performance are more limited than I had expected. However, there are plenty of other JVMs around that do much less run-time optimisation. I have tested a number of free JVMs here, but similar results would most likely also be seen in the JVMs used on many small devices such as cellphones.

I tried JamVM, GIJ, CACAO and Kaffe, as well as Sun Java 1.5.0 and 1.6.0 for comparison.

Here are the times I got (running a similar test to last time, though this time without the interface trick used to confuse the JIT optimiser, and with a smaller test size as some of the VMs were quite slow):

andrew@rimu:~/COMP489/test cases$ java ManualMethod #Sun Java 1.6.0_07
Normal: 15 ms
Normal: 34 ms
Normal: 32 ms
Factored: 32 ms
Factored: 31 ms
Factored: 29 ms
Side effects: 0
andrew@rimu:~/COMP489/test cases$ java ManualMethod #Sun Java 1.5.0
Normal: 46 ms
Normal: 46 ms
Normal: 46 ms
Factored: 61 ms
Factored: 39 ms
Factored: 39 ms
Side effects: 0
andrew@rimu:~/COMP489/test cases$ jamvm ManualMethod
Normal: 812 ms
Normal: 837 ms
Normal: 825 ms
Factored: 230 ms
Factored: 230 ms
Factored: 241 ms
Side effects: 0
andrew@rimu:~/COMP489/test cases$ gij ManualMethod
Normal: 3288 ms
Normal: 3228 ms
Normal: 3188 ms
Factored: 287 ms
Factored: 287 ms
Factored: 286 ms
Side effects: 0
andrew@rimu:~/COMP489/test cases$ cacao ManualMethod
Normal: 65 ms
Normal: 76 ms
Normal: 76 ms
Factored: 11 ms
Factored: 10 ms
Factored: 11 ms
Side effects: 0
andrew@rimu:~/COMP489/test cases$ kaffe ManualMethod
Normal: 284 ms
Normal: 307 ms
Normal: 317 ms
Factored: 10 ms
Factored: 10 ms
Factored: 10 ms
Side effects: 0

It seems from this that many VMs do less run-time optimisation (or perhaps none) and have higher method-call overheads, so in these cases my optimisations are quite helpful. Even the older 1.5.0 version of Sun’s HotSpot VM seems to benefit from my optimisations to the bytecode.

You are doing it wrong…

21 August 2008

SimpleLisp is an interpreter, written in Java, for a limited dialect of LISP. I have been using it as an example Java program to evaluate my compiler optimisations. I just tried running a simple Lisp program (which calculates the Fibonacci sequence) in SimpleLisp to see the effect of my optimisations on the run time. Here are some results.

SimpleLisp compiled with jkit, without my optimisation stage:

$ time java org/simplelisp/interpreter/Interpreter examples/Fibonacci.sl
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393

real 0m3.380s
user 0m3.294s
sys 0m0.031s

SimpleLisp compiled with jkit, with my LICM optimisation stage:

$ time java org/simplelisp/interpreter/Interpreter examples/Fibonacci.sl
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393

real 0m4.289s
user 0m4.136s
sys 0m0.051s

Something is not quite right here…

More tests

21 August 2008

Following my tests a couple of days ago and the ensuing discussion, I have done some further testing to better understand what the HotSpot optimiser in the Sun Java VM can do.

In particular, I made an interface called CollectionInterface which my TestCollection class adheres to, and made the test program use that instead of the class type. This means that HotSpot does not have enough static type information to inline method calls on the TestCollection class (or whatever it was doing with them), and so the manually factored loop is significantly faster than the normal one.
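
For reference, TestCollection itself is not shown in this post; roughly it has the shape sketched below (a sketch only, so the body is illustrative rather than the real class):

// Sketch of the interface and collection class used by the test below; the
// real TestCollection is not shown in this post, so the body here is only
// illustrative of its shape.
interface CollectionInterface {
    int size();
}

class TestCollection implements CollectionInterface {
    public int sideEffects = 0;   // read by the test harness after the loops run
    private final int size;

    public TestCollection(int size) {
        this.size = size;
    }

    public int size() {
        // From the call site's point of view, the VM cannot assume this call
        // is free of side effects without analysing (or inlining) the body.
        return size;
    }
}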

Here is my test program now:

public class ManualMethod {
    public static void main(String[] args) {
        TestCollection list = new TestCollection(1000000000);

        long normalTime = test(list);
        long normalTime2 = test(list);
        long normalTime3 = test(list);
        long factoredTime = testFactored(list);
        long factoredTime2 = testFactored(list);
        long factoredTime3 = testFactored(list);

        System.out.println("Normal: " + normalTime + " ms");
        System.out.println("Normal: " + normalTime2 + " ms");
        System.out.println("Normal: " + normalTime3 + " ms");
        System.out.println("Factored: " + factoredTime + " ms");
        System.out.println("Factored: " + factoredTime2 + " ms");
        System.out.println("Factored: " + factoredTime3 + " ms");
        System.out.println("Side effects: " + list.sideEffects);
    }

    private static long test(CollectionInterface list) {
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < list.size(); ++i) {
        }
        return System.currentTimeMillis() - startTime;
    }

    private static long testFactored(CollectionInterface list) {
        int length = list.size();
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < length; ++i) {
        }
        return System.currentTimeMillis() - startTime;
    }
}

And the output:

$ java ManualMethod
Normal: 2192 ms
Normal: 2182 ms
Normal: 2204 ms
Factored: 737 ms
Factored: 727 ms
Factored: 885 ms
Side effects: 0

With the server VM, the effect is now even more pronounced:

$ java -server ManualMethod
Normal: 1458 ms
Normal: 1451 ms
Normal: 1445 ms
Factored: 1 ms
Factored: 0 ms
Factored: 0 ms
Side effects: 0

Presumably the server VM can completely remove the loop in the factored case, as it produces no useful results, but it cannot know whether the call to CollectionInterface.size() has side effects and so cannot remove the loop that calls it on every iteration.

Testing manual optimisation

19 August 2008

I have just made a few simple tests to see how much code movement of the sort performed by my optimisation stage actually helps Java programs’ run time, and the results are fairly disappointing.

First, I tried moving Array.length accesses, using this code:

public class ManualArray {
    public static void main(String[] args) {
        char[] array = new char[10000000];

        long startTime = System.currentTimeMillis();
        for (int i = 0; i < array.length * 100; ++i) {
        }
        long normalTime = System.currentTimeMillis() - startTime;

        startTime = System.currentTimeMillis();
        int length = array.length;
        for (int i = 0; i < length * 100; ++i) {
        }
        long factoredTime = System.currentTimeMillis() - startTime;

        startTime = System.currentTimeMillis();
        for (int i = 0; i < array.length * 100; ++i) {
        }
        long normalTime2 = System.currentTimeMillis() - startTime;

        System.out.println("Normal: " + normalTime + " ms");
        System.out.println("Factored: " + factoredTime + " ms");
        System.out.println("Normal: " + normalTime2 + " ms");
    }
}

The output from running this (compiled with javac 1.6.0_03-p3, running with java 1.6.0_03-p3, both from Sun) looks something like this:

$ java ManualArray
Normal: 2172 ms
Factored: 730 ms
Normal: 726 ms

Running with java -server to use the Java HotSpot Server VM, the results are not promising either:

$ java -server ManualArray
Normal: 9 ms
Factored: 0 ms
Normal: 0 ms

I then tested moving a simple method call:

public class ManualMethod {
    public static void main(String[] args) {
        TestCollection list = new TestCollection(1000000000);

        long startTime = System.currentTimeMillis();
        for (int i = 0; i < list.size(); ++i) {
        }
        long normalTime = System.currentTimeMillis() - startTime;

        startTime = System.currentTimeMillis();
        int length = list.size();
        for (int i = 0; i < length; ++i) {
        }
        long factoredTime = System.currentTimeMillis() - startTime;

        startTime = System.currentTimeMillis();
        for (int i = 0; i < list.size(); ++i) {
        }
        long normalTime2 = System.currentTimeMillis() - startTime;

        System.out.println("Normal: " + normalTime + " ms");
        System.out.println("Factored: " + factoredTime + " ms");
        System.out.println("Normal: " + normalTime2 + " ms");
    }
}

And the results this time:

$ java ManualMethod
Normal: 1451 ms
Factored: 725 ms
Normal: 724 ms
$ java -server ManualMethod
Normal: 4 ms
Factored: 0 ms
Normal: 0 ms

This is rather disappointing, as it suggests that my optimisations will not actually help much. I am inclined to suspect that the reason the code movement does not help is that the JIT compiler in the JVM already does some optimisation of this sort (perhaps by inlining short methods), though it is also possible that the method call overhead is really low.

If I make TestCollection.size() slower by adding some useless code to it (a sketch of the sort of thing I mean is at the end of this post), then the optimisation makes a lot more difference:

$ java ManualMethod
Normal: 11253 ms
Factored: 835 ms
Normal: 11134 ms

If we use the server VM, the results are very strange:

$ java -server ManualMethod
Normal: 7 ms
Factored: 0 ms
Normal: 5773 ms

I cannot see why the first normal run should be so much quicker than the second: the reverse was observed in all the other tests before this, which was presumably due to the JVM doing some JIT compilation once it decided that this would be worthwhile.
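
For reference, the “useless code” added to size() was along these lines (a hypothetical sketch only; the actual padding is not shown in this post, and the class name here is made up):

// Hypothetical illustration only: the real padding is not shown in the post.
// The idea is a few extra arithmetic operations per call whose result is kept
// in a field, so the extra work is observable and cannot simply be discarded.
class PaddedCollection {
    private final int size;
    private int padding;

    PaddedCollection(int size) {
        this.size = size;
    }

    public int size() {
        padding = padding * 31 + size;   // cheap, but not trivially removable work
        return size;
    }
}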

A little refactoring generalises to allow a little more optimisation

24 July 2008

Up until now, my optimisation stage would only move expressions out of loops if they were exactly one of the cases discussed (Array.length, or a call to a pure method meeting various conditions). In doing the checking required to decide when such movement is safe, however, I had a method in my code called isInvariantReference, which checks whether a given expression is guaranteed not to change within the loop: for example, a constant, a local variable that is not assigned to in the loop, an arithmetic expression all of whose parts meet the same condition, or various other kinds of expression whose value cannot change. This check is needed, for example, as part of the condition imposed on the arguments of a pure method call before the call can be moved out of the loop.
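
The general shape of that check is something like the sketch below (the expression classes here are made-up stand-ins, not JKit’s real intermediate representation):

import java.util.Set;

// Made-up expression stand-ins, just enough to show the shape of the check;
// JKit's real intermediate representation is different.
interface Expr {}
class IntConst implements Expr { final int value; IntConst(int v) { value = v; } }
class LocalVar implements Expr { final String name; LocalVar(String n) { name = n; } }
class BinOp implements Expr { final Expr left, right; BinOp(Expr l, Expr r) { left = l; right = r; } }

class InvariantCheck {
    // An expression is treated as loop-invariant if it is a constant, a local
    // variable that the loop never assigns to, or an operation whose operands
    // are all themselves invariant; anything else is conservatively rejected.
    static boolean isInvariantReference(Expr e, Set<String> assignedInLoop) {
        if (e instanceof IntConst) {
            return true;
        }
        if (e instanceof LocalVar) {
            return !assignedInLoop.contains(((LocalVar) e).name);
        }
        if (e instanceof BinOp) {
            BinOp b = (BinOp) e;
            return isInvariantReference(b.left, assignedInLoop)
                && isInvariantReference(b.right, assignedInLoop);
        }
        return false;
    }
}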

I realised that this isInvariantReference method was in fact doing much the same thing as my checks for expressions that can be moved out of the loop. By combining the two, I can move compound expressions rather than just the method calls or field dereferences themselves, which makes the optimisation slightly more effective. The change I made was to have isInvariantReference also detect the Array.length and pure method call cases, and then to replace the code that previously detected those cases with a call to isInvariantReference (plus some other simple checks to prevent infinite recursion). This means the usual loop invariant code motion optimisations can now be made in addition to my new optimisations, and the refactoring also removes a little code duplication from my compiler stage.

For example, take the code snippet below:

for (int i = 0; i < listA.size() * listB.size(); ++i) {
    System.out.println(listA.getWith(i, listB));
}

Assuming the appropriate conditions are met, this code would before have been optimised to something like this:

int $invariant0 = listA.size();
int $invariant1 = listB.size();
for (int i = 0; i < $invariant0 * $invariant1; ++i) {
    System.out.println(listA.getWith(i, listB));
}

It will now be transformed into something like this:

int $invariant0 = listA.size() * listB.size();
for (int i = 0; i < $invariant0; ++i) {
    System.out.println(listA.getWith(i, listB));
}

This saves a local variable and a multiplication each time through the loop.

Many other examples are possible. To be honest this is not likely to be a big improvement over the optimisation as it stood before, but it still seems worth doing, even if only because it makes the code nicer.

Annotating SimpleLisp

16 July 2008

I have now added a fair number of annotations to Dave’s SimpleLisp interpreter, and tested compiling it with my loop invariant code movement optimisations. Results were moderately encouraging: there were a fair number of cases where the Array.length optimisation could be applied, and a few for pure methods (all of which were lengths of collection types).
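
For example, the annotations on a collection-like class have roughly this shape (a hypothetical sketch; the real SimpleLisp classes and the annotation’s package are not shown here):

import java.util.ArrayList;

// Hypothetical stand-in for the purity annotation used by my optimisation stage.
@interface Pure {}

// Illustrative class only, not actual SimpleLisp source: marking the length
// method as pure lets the optimisation stage hoist calls to it out of loops.
class LispList {
    private final ArrayList<Object> elements = new ArrayList<Object>();

    public void add(Object o) {
        elements.add(o);
    }

    @Pure
    public int size() {
        return elements.size();
    }
}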

Non-immutable classes, and edge expressions

23 June 2008

I finally have a reasonable version of the optimisation I planned at the beginning of this project working well. Rather than requiring all the classes passed as arguments to pure methods to be immutable, I can do sufficient analysis (along with yet more annotations) to satisfy a weaker requirement. Pure method calls can safely be moved out of loops if (as before) each argument is an invariant reference and either a primitive type or an immutable class type (marked by the @Immutable annotation), but I can now detect a third case, where the following three conditions are all met:

  1. The reference is unique; specifically, it is a local variable annotated @Unique.
  2. There are no statements in the loop which change the object via the reference (i.e. no calls on it to methods that are neither @Pure nor @Const).
  3. There is nothing that can change objects referenced from within the object (the class is marked @Encapsulated).

The extra annotations (@Unique and @Encapsulated) absolve me from having to solve the aliasing problem, passing the problem onto the programmer (who hopefully understands how eir program works and where aliasing can happen) instead. The @Encapsulated annotation in the last condition is defined to mean that there are guaranteed not to be any references into the internal structure of objects of the class, so the only way to change the state of the object is by calling methods on it.
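
As an illustration of this third case, consider a sketch like the one below (the annotation names are the ones described above, but their definitions and the classes here are made-up stand-ins):

// Made-up annotation definitions standing in for the ones my stage uses.
@interface Unique {}
@interface Encapsulated {}
@interface Pure {}

// A mutable but encapsulated class: its state can only change through its methods.
@Encapsulated
class Counter {
    private int count;
    public void increment() { count++; }
    @Pure public int size() { return count; }
}

class UniqueExample {
    static int sum(int n) {
        @Unique Counter c = new Counter();   // unique local reference, no aliases
        c.increment();
        int total = 0;
        // Nothing in this loop modifies c, so the pure call c.size() is
        // loop-invariant and can be hoisted even though Counter is mutable.
        for (int i = 0; i < n * c.size(); ++i) {
            total += i;
        }
        return total;
    }
}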

While I was testing this, I realised that my optimiser did not consider expressions on flow graph edges (which is how JKit expresses conditions in its intermediate representation) at all: it neither tried to factor them nor checked them when looking for statements that might modify an object. To rectify this I first had to compute the set of relevant edges by finding all edges in the flow graph whose source and destination both lie within the set of points in the LoopBody region. I then used this edge set, as well as the Point set, both when factoring expressions and when looking for expressions that might change an object.
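
The edge filtering itself is straightforward; roughly it does the following (with Point and Edge as made-up stand-ins for JKit’s real types):

import java.util.HashSet;
import java.util.Set;

// Made-up stand-ins for JKit's flow graph types, just to show the filtering.
class Point {}

class Edge {
    final Point source, destination;
    Edge(Point source, Point destination) {
        this.source = source;
        this.destination = destination;
    }
}

class LoopEdges {
    // Keep only the edges whose source and destination both lie within the
    // set of points making up the loop body.
    static Set<Edge> relevantEdges(Set<Edge> allEdges, Set<Point> loopBody) {
        Set<Edge> relevant = new HashSet<Edge>();
        for (Edge e : allEdges) {
            if (loopBody.contains(e.source) && loopBody.contains(e.destination)) {
                relevant.add(e);
            }
        }
        return relevant;
    }
}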

Constant folding etc.

17 April 2008

To make a start with writing optimisation stages in JKit I have written a simple constant folding optimisation stage. This will do some simple compile-time evaluation of arithmetic expressions, for example turning int x = 1 + 2; into int x = 3;.
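
The core of the idea looks something like this (a sketch with made-up expression classes, not JKit’s actual intermediate representation):

// Made-up expression classes used only to sketch the idea; JKit's real
// intermediate representation is different.
interface Expr {}

class IntConst implements Expr {
    final int value;
    IntConst(int value) { this.value = value; }
}

class Add implements Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
}

class ConstantFolder {
    // Recursively fold additions of two integer constants into a single
    // constant, leaving anything else unchanged.
    static Expr fold(Expr e) {
        if (e instanceof Add) {
            Add a = (Add) e;
            Expr l = fold(a.left);
            Expr r = fold(a.right);
            if (l instanceof IntConst && r instanceof IntConst) {
                return new IntConst(((IntConst) l).value + ((IntConst) r).value);
            }
            return new Add(l, r);
        }
        return e;
    }
}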

After talking to Lindsay yesterday we have decided that the next step is for me to write some code to traverse the flow graph to find loops, in particular to identify which vertices are the tops of loops. I still need to work out where this and other information discovered during the traversal should be stored: perhaps in the flow graph somehow, perhaps in separate data structures kept locally, or maybe somewhere else.
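
One simple way to find the tops of loops is a depth-first traversal that records back edges, along these lines (a sketch over a made-up adjacency-list flow graph, not JKit’s representation; this assumes the usual reducible flow graphs):

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: a made-up adjacency-list flow graph, not JKit's representation.
class FlowGraph {
    final Map<Integer, List<Integer>> successors = new HashMap<Integer, List<Integer>>();

    // A vertex is the top of a loop if some edge leads back to it from a vertex
    // that is still on the depth-first search stack (a back edge).
    Set<Integer> loopHeaders(int entry) {
        Set<Integer> headers = new HashSet<Integer>();
        dfs(entry, new HashSet<Integer>(), new HashSet<Integer>(), headers);
        return headers;
    }

    private void dfs(int v, Set<Integer> visited, Set<Integer> onStack, Set<Integer> headers) {
        visited.add(v);
        onStack.add(v);
        List<Integer> succs = successors.get(v);
        if (succs != null) {
            for (int succ : succs) {
                if (onStack.contains(succ)) {
                    headers.add(succ);              // back edge found
                } else if (!visited.contains(succ)) {
                    dfs(succ, visited, onStack, headers);
                }
            }
        }
        onStack.remove(v);
    }
}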

As a first step towards the actual code movement of loop invariants, I will probably add an annotation for certain method calls to say that they can certainly be moved (e.g. if they just return a constant value and have no side-effects). The next step will probably be more annotations for objects which the programmer guarantees will have no aliases (somewhat like ownership, I think), or perhaps some sort of immutability.

Lindsay suggested that another possible direction is to do some sort of constant folding / propagation for immutable objects like Strings (for the length() method, for example), or partially immutable objects like arrays where the length cannot change. The length could be found once and cached in a local variable, or it might be possible to infer the length from how the array is constructed. For example, it would be nice if the following code:

int[] a = new int[n];
System.out.println(a.length);

could be transformed to

int[] a = new int[n];
System.out.println(n);