Archive for August, 2008

A more successful test

24 August 2008

Talking to a friend this afternoon, I thought of another case, probably a relatively common one, where my optimisation would apply and could save a fair bit of time. It arises when two nested loops iterate over different arrays or lists, and expressions inside the inner loop reference particular values from the collections, indexed by the respective loop variables. Viewed another way, this pattern works over the Cartesian product of the two collections.

Here is a simple example:

@Unique int[] a = new int[10000];
@Unique int[] b = new int[20000];

int sum = 0;

for (int i = 0; i < a.length; ++i) {
    for (int j = 0; j < b.length; ++j) {
        sum += a[i] * b[j];
    }
}

System.out.println("Sum: " + sum);

My optimisations move three expressions: a.length and b.length can be hoisted out of both loops, while a[i] can be hoisted out of the inner loop. The latter move is the most helpful one.
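The transformed code would look roughly like this (a sketch of the effect of the moves, not JKit's actual output; the @Unique annotations are omitted for brevity, and the hoisted variable names are my own):

```java
public class NestedProductHoisted {
    public static void main(String[] args) {
        int[] a = new int[10000];
        int[] b = new int[20000];

        int sum = 0;

        // a.length and b.length are invariant in both loops,
        // so they are evaluated once, before the outer loop
        int aLen = a.length;
        int bLen = b.length;

        for (int i = 0; i < aLen; ++i) {
            // a[i] is invariant in the inner loop, so it is loaded once
            // per outer iteration instead of once per inner iteration
            int ai = a[i];
            for (int j = 0; j < bLen; ++j) {
                sum += ai * b[j];
            }
        }

        // Java int arrays are zero-filled, so the sum is 0
        System.out.println("Sum: " + sum);
    }
}
```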

Compiling this with JKit with and without my optimisations and then running with Sun Java 1.6.0 gives these times:
Without optimisation:

$ time java NestedProduct
Sum: 0

real 0m0.923s
user 0m0.829s
sys 0m0.051s

With optimisation:

$ time java NestedProduct
Sum: 0

real 0m0.724s
user 0m0.657s
sys 0m0.019s

Supporting code movement in this case required a few changes to my JKit stage, so it will be interesting to see whether these changes enable any further optimisation in my evaluation programs as well.

Testing a few different JVMs

23 August 2008

I found that Sun’s HotSpot Java VM already does a fair bit of optimisation, so the cases where my optimisations actually improve program performance are more limited than I had expected. However, there are plenty of other JVMs around that do much less run-time optimisation. Here I have tested a number of free JVMs, and similar results would likely be seen in the JVMs used on many small devices such as cellphones.

I tried JamVM, GIJ, CACAO and Kaffe, as well as Sun Java 1.5.0 and 1.6.0 for comparison.

Here are the times I got, running a similar test to last time, though without using interfaces to confuse the JIT optimiser, and with a smaller test size as some of the VMs were quite slow:

andrew@rimu:~/COMP489/test cases$ java ManualMethod #Sun Java 1.6.0_07
Normal: 15 ms
Normal: 34 ms
Normal: 32 ms
Factored: 32 ms
Factored: 31 ms
Factored: 29 ms
Side effects: 0
andrew@rimu:~/COMP489/test cases$ java ManualMethod #Sun Java 1.5.0
Normal: 46 ms
Normal: 46 ms
Normal: 46 ms
Factored: 61 ms
Factored: 39 ms
Factored: 39 ms
Side effects: 0
andrew@rimu:~/COMP489/test cases$ jamvm ManualMethod
Normal: 812 ms
Normal: 837 ms
Normal: 825 ms
Factored: 230 ms
Factored: 230 ms
Factored: 241 ms
Side effects: 0
andrew@rimu:~/COMP489/test cases$ gij ManualMethod
Normal: 3288 ms
Normal: 3228 ms
Normal: 3188 ms
Factored: 287 ms
Factored: 287 ms
Factored: 286 ms
Side effects: 0
andrew@rimu:~/COMP489/test cases$ cacao ManualMethod
Normal: 65 ms
Normal: 76 ms
Normal: 76 ms
Factored: 11 ms
Factored: 10 ms
Factored: 11 ms
Side effects: 0
andrew@rimu:~/COMP489/test cases$ kaffe ManualMethod
Normal: 284 ms
Normal: 307 ms
Normal: 317 ms
Factored: 10 ms
Factored: 10 ms
Factored: 10 ms
Side effects: 0

It seems from this that many VMs do less (or perhaps no) run-time optimisation and have higher method-call overheads, so in these cases my optimisations are quite helpful. Even the older 1.5.0 version of Sun’s HotSpot VM seems to benefit from my optimisations to the bytecode.

You are doing it wrong…

21 August 2008

SimpleLisp is an interpreter, written in Java, for a limited dialect of LISP. I have been using it as an example Java program with which to evaluate my compiler optimisations. I just tried running a simple Lisp program (which calculates the Fibonacci sequence) in SimpleLisp to see the effect of my optimisations on the run time. Here are some results.

SimpleLisp compiled with jkit, without my optimisation stage:

$ time java org/simplelisp/interpreter/Interpreter examples/Fibonacci.sl
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393

real 0m3.380s
user 0m3.294s
sys 0m0.031s

SimpleLisp compiled with jkit, with my LICM optimisation stage:

$ time java org/simplelisp/interpreter/Interpreter examples/Fibonacci.sl
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393

real 0m4.289s
user 0m4.136s
sys 0m0.051s

Something is not quite right here…

More tests

21 August 2008

Following my tests a couple of days ago and the ensuing discussion, I have done some further testing to better understand what the HotSpot optimiser in the Sun Java VM can do.

In particular, I made an interface called CollectionInterface which my TestCollection class implements, and made the test program use the interface type instead of the class type. This means that HotSpot no longer has enough static type information to inline method calls on the TestCollection class (or whatever it was doing with them), and so the manually factored loop is now significantly faster than the normal one.
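Neither CollectionInterface nor TestCollection is shown here; a minimal sketch consistent with how the test program uses them (only the names size and sideEffects and the constructor argument come from the test program; everything else is assumed) would be:

```java
// Hypothetical reconstruction -- the real classes are not shown in this post.
interface CollectionInterface {
    int size();
}

class TestCollection implements CollectionInterface {
    // Printed at the end of the test so the JIT cannot discard the
    // object as unused; stays 0 when size() has no side effects.
    public int sideEffects = 0;

    private final int size;

    public TestCollection(int size) {
        this.size = size;
    }

    public int size() {
        return size;
    }
}
```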

Here is my test program now:

public class ManualMethod {
    public static void main(String[] args) {
        TestCollection list = new TestCollection(1000000000);

        long normalTime = test(list);
        long normalTime2 = test(list);
        long normalTime3 = test(list);
        long factoredTime = testFactored(list);
        long factoredTime2 = testFactored(list);
        long factoredTime3 = testFactored(list);

        System.out.println("Normal: " + normalTime + " ms");
        System.out.println("Normal: " + normalTime2 + " ms");
        System.out.println("Normal: " + normalTime3 + " ms");
        System.out.println("Factored: " + factoredTime + " ms");
        System.out.println("Factored: " + factoredTime2 + " ms");
        System.out.println("Factored: " + factoredTime3 + " ms");
        System.out.println("Side effects: " + list.sideEffects);
    }

    private static long test(CollectionInterface list) {
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < list.size(); ++i) {
        }
        return System.currentTimeMillis() - startTime;
    }

    private static long testFactored(CollectionInterface list) {
        int length = list.size();
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < length; ++i) {
        }
        return System.currentTimeMillis() - startTime;
    }
}

And the output:

$ java ManualMethod
Normal: 2192 ms
Normal: 2182 ms
Normal: 2204 ms
Factored: 737 ms
Factored: 727 ms
Factored: 885 ms
Side effects: 0

With the server VM, the effect is now even more pronounced:

$ java -server ManualMethod
Normal: 1458 ms
Normal: 1451 ms
Normal: 1445 ms
Factored: 1 ms
Factored: 0 ms
Factored: 0 ms
Side effects: 0

Presumably the server VM can completely remove the loop in the factored case, as it produces no useful results, but it cannot know whether the call to CollectionInterface.size() will have side effects, and so cannot remove the loop in the case that calls it on every iteration.
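To see why the call cannot be dropped, note that nothing stops another implementation of the same interface from having a side-effecting size(), so the JIT must assume the worst when it only knows the interface type. A hypothetical illustration (all names here are invented):

```java
interface Sized {
    int size();
}

// A size() with an observable side effect: it counts its own calls.
class CountingSized implements Sized {
    public int calls = 0;

    public int size() {
        ++calls;
        return 3;
    }
}

public class WhySizeMustRun {
    public static void main(String[] args) {
        CountingSized s = new CountingSized();
        Sized view = s;
        for (int i = 0; i < view.size(); ++i) {
            // empty, just like the benchmark loop
        }
        // The condition was evaluated 4 times (3 true, 1 false), so
        // removing the loop would change observable behaviour.
        System.out.println("calls = " + s.calls);
    }
}
```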

Testing manual optimisation

19 August 2008

I have just made a few simple tests to see how much code movement of the sort performed by my optimisation stage actually helps Java programs’ run time, and the results are fairly disappointing.

First, I tried moving calls to Array.length, using this code:

public class ManualArray {
    public static void main(String[] args) {
        char[] array = new char[10000000];

        long startTime = System.currentTimeMillis();
        for (int i = 0; i < array.length * 100; ++i) {
        }
        long normalTime = System.currentTimeMillis() - startTime;

        startTime = System.currentTimeMillis();
        int length = array.length;
        for (int i = 0; i < length * 100; ++i) {
        }
        long factoredTime = System.currentTimeMillis() - startTime;

        startTime = System.currentTimeMillis();
        for (int i = 0; i < array.length * 100; ++i) {
        }
        long normalTime2 = System.currentTimeMillis() - startTime;

        System.out.println("Normal: " + normalTime + " ms");
        System.out.println("Factored: " + factoredTime + " ms");
        System.out.println("Normal: " + normalTime2 + " ms");
    }
}

The output from running this (compiled with javac 1.6.0_03-p3, running with java 1.6.0_03-p3, both from Sun) looks something like this:

$ java ManualArray
Normal: 2172 ms
Factored: 730 ms
Normal: 726 ms

Running with java -server to use the Java HotSpot Server VM, the results are not promising either:

$ java -server ManualArray
Normal: 9 ms
Factored: 0 ms
Normal: 0 ms

I then tested moving a simple method call:

public class ManualMethod {
    public static void main(String[] args) {
        TestCollection list = new TestCollection(1000000000);

        long startTime = System.currentTimeMillis();
        for (int i = 0; i < list.size(); ++i) {
        }
        long normalTime = System.currentTimeMillis() - startTime;

        startTime = System.currentTimeMillis();
        int length = list.size();
        for (int i = 0; i < length; ++i) {
        }
        long factoredTime = System.currentTimeMillis() - startTime;

        startTime = System.currentTimeMillis();
        for (int i = 0; i < list.size(); ++i) {
        }
        long normalTime2 = System.currentTimeMillis() - startTime;

        System.out.println("Normal: " + normalTime + " ms");
        System.out.println("Factored: " + factoredTime + " ms");
        System.out.println("Normal: " + normalTime2 + " ms");
    }
}

And the results this time:

$ java ManualMethod
Normal: 1451 ms
Factored: 725 ms
Normal: 724 ms
$ java -server ManualMethod
Normal: 4 ms
Factored: 0 ms
Normal: 0 ms

This is rather disappointing, as it suggests that my optimisations will not actually help much. I suspect that the code movement does not help because the JIT compiler in the JVM already does some optimisation of this sort (perhaps by inlining short methods), though it is also possible that the method-call overhead is simply very low.

If I make TestCollection.size() slower by adding some useless code to it, then the optimisation makes a lot more difference:

$ java ManualMethod
Normal: 11253 ms
Factored: 835 ms
Normal: 11134 ms

If we use the server VM, the results are very strange:

$ java -server ManualMethod
Normal: 7 ms
Factored: 0 ms
Normal: 5773 ms

I cannot see why the first normal run should be so much quicker than the second: the reverse was observed in all the other tests before this, which was presumably due to the JVM doing some JIT compilation once it decided that this would be worthwhile.
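For reference, the "useless code" slowdown was of this general shape (a hypothetical sketch only; the post does not show the actual change, and the busy-work loop bound here is invented):

```java
// Hypothetical sketch of a deliberately slowed size(); the real change
// is not shown in the post.
class SlowCollection {
    private final int size;

    public SlowCollection(int size) {
        this.size = size;
    }

    public int size() {
        // Busy-work that does not change the return value, so calling
        // size() on every loop iteration becomes expensive while the
        // factored version pays the cost only once.
        int waste = 0;
        for (int k = 0; k < 100; ++k) {
            waste += k;
        }
        return size + waste - waste;
    }
}
```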