Posts Tagged ‘Java’

Poster submitted

21 September 2008

I have submitted the soft-copy of my poster. Now it just remains to print it out in colour and stick it together for the poster presentations on Wednesday.

Now to get on with my final report, due all too soon…

Postering continues

19 September 2008

I have spent the last week working on my poster, which is coming along slowly. I will post it here after I submit it on Sunday. I am getting varying advice, but it seems I have to cut down on the amount of text. It could also do with some better graphic design, though that may not happen.

Also, Stephen Nelson just pointed me to a paper called Verifiable Functional Purity in Java, which looks interesting.

What I have been doing

12 September 2008

Since my last post, I have made another version of the IntersectSquares test, this time using a list class rather than just an array. There were again speed improvements, though not quite as striking.

Other than that, I have started to prepare my poster (which is due Sunday week, 2008-09-21). Alex gave a talk yesterday with some general advice on how to make good posters, and I talked a bit to my supervisors today about it. I am getting some conflicting advice, though, about how to structure the poster, which parts to emphasise more, where to include examples, and so on.

More testing

3 September 2008

I wrote a slightly more interesting program called IntersectSquares, which generates a large array of rectangles and then counts how many times they intersect each other. It is written in a fairly naive way, so there are a fair few expressions for the optimiser to move out of the loops. I tried it here with 100,000 rectangles.
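
I won't post the whole program, but the idea is along these lines (the rectangle representation and intersection test here are only a sketch, not the exact code):

public class IntersectSquares {
    // Simple axis-aligned rectangle; only illustrative, the real test differs in detail.
    static class Rect {
        final int x, y, w, h;
        Rect(int x, int y, int w, int h) {
            this.x = x; this.y = y; this.w = w; this.h = h;
        }
    }

    public static void main(String[] args) {
        java.util.Random random = new java.util.Random();
        Rect[] rects = new Rect[100000];
        for (int i = 0; i < rects.length; ++i) {
            rects[i] = new Rect(random.nextInt(1000000), random.nextInt(1000000),
                                random.nextInt(1000), random.nextInt(1000));
        }

        int count = 0;
        for (int i = 0; i < rects.length; ++i) {
            for (int j = i + 1; j < rects.length; ++j) {
                // Deliberately naive: rects.length, rects[i] and its fields are
                // re-evaluated on every inner iteration, giving the optimiser
                // loop-invariant expressions to hoist.
                if (rects[i].x < rects[j].x + rects[j].w
                        && rects[j].x < rects[i].x + rects[i].w
                        && rects[i].y < rects[j].y + rects[j].h
                        && rects[j].y < rects[i].y + rects[i].h) {
                    ++count;
                }
            }
        }
        System.out.println(count + " intersections");
    }
}

Here are the times: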

With jkit:

andrew@rise:test cases$ time java IntersectSquares; time java IntersectSquares
3290 intersections

real 0m58.843s
user 0m58.470s
sys 0m0.020s
3290 intersections

real 0m58.918s
user 0m58.491s
sys 0m0.040s

With javac:

andrew@rise:test cases$ time java IntersectSquares; time java IntersectSquares; time java IntersectSquares
3290 intersections

real 1m4.690s
user 1m3.783s
sys 0m0.030s
3290 intersections

real 1m4.908s
user 1m3.805s
sys 0m0.051s
3290 intersections

real 1m4.209s
user 1m3.695s
sys 0m0.041s

A more successful test

24 August 2008

Talking to a friend this afternoon, I thought of another case, probably a relatively common one, where my optimisation would apply and could save a fair bit of time: two nested loops iterate over different arrays or lists, and inside the inner loop there are expressions that reference particular values from the collections, indexed by their respective loop variables. To look at it another way, this sort of pattern works over the Cartesian product of the two collections.

Here is a simple example:

@Unique int[] a = new int[10000];
@Unique int[] b = new int[20000];

int sum = 0;

for (int i = 0; i < a.length; ++i) {
    for (int j = 0; j < b.length; ++j) {
        sum += a[i] * b[j];
    }
}

System.out.println("Sum: " + sum);

My optimisation stage moves three expressions: a.length and b.length can be moved out of both loops, while a[i] can be moved out of the inner loop. It is this last movement that helps the most.
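
For illustration, the effect on this example is roughly as if the code had been written like this (my stage actually works on JKit's intermediate representation rather than on source code, so this is only a sketch):

int[] a = new int[10000];
int[] b = new int[20000];

int sum = 0;

int aLength = a.length;    // hoisted out of both loops
int bLength = b.length;    // hoisted out of both loops
for (int i = 0; i < aLength; ++i) {
    int ai = a[i];         // hoisted out of the inner loop
    for (int j = 0; j < bLength; ++j) {
        sum += ai * b[j];
    }
}

System.out.println("Sum: " + sum);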

Compiling this with JKit with and without my optimisations and then running with Sun Java 1.6.0 gives these times:
Without optimisation:

$ time java NestedProduct
Sum: 0

real 0m0.923s
user 0m0.829s
sys 0m0.051s

With optimisation:

$ time java NestedProduct
Sum: 0

real 0m0.724s
user 0m0.657s
sys 0m0.019s

Supporting code movement in this case required a few changes to my JKit stage, so it will be interesting to see if this allows any more optimisation in my evaluation programs as well.

Testing a few different JVMs

23 August 2008

So, as I found earlier, Sun’s HotSpot Java VM does a fair bit of run-time optimisation, and the cases where my optimisations actually improve program performance are more limited than I had hoped. However, there are plenty of other JVMs around that do much less run-time optimisation. I have tested a number of free JVMs here, and similar results would likely be seen with the JVMs used on many small devices such as cellphones.

I tried JamVM, GIJ, CACAO and Kaffe, as well as Sun Java 1.5.0 and 1.6.0 for comparison.

Here are the times I got (running a similar test to last time, though without using an interface to confuse the JIT optimiser, and with a smaller test size, as some of the VMs were quite slow):

andrew@rimu:~/COMP489/test cases$ java ManualMethod #Sun Java 1.6.0_07
Normal: 15 ms
Normal: 34 ms
Normal: 32 ms
Factored: 32 ms
Factored: 31 ms
Factored: 29 ms
Side effects: 0
andrew@rimu:~/COMP489/test cases$ java ManualMethod #Sun Java 1.5.0
Normal: 46 ms
Normal: 46 ms
Normal: 46 ms
Factored: 61 ms
Factored: 39 ms
Factored: 39 ms
Side effects: 0
andrew@rimu:~/COMP489/test cases$ jamvm ManualMethod
Normal: 812 ms
Normal: 837 ms
Normal: 825 ms
Factored: 230 ms
Factored: 230 ms
Factored: 241 ms
Side effects: 0
andrew@rimu:~/COMP489/test cases$ gij ManualMethod
Normal: 3288 ms
Normal: 3228 ms
Normal: 3188 ms
Factored: 287 ms
Factored: 287 ms
Factored: 286 ms
Side effects: 0
andrew@rimu:~/COMP489/test cases$ cacao ManualMethod
Normal: 65 ms
Normal: 76 ms
Normal: 76 ms
Factored: 11 ms
Factored: 10 ms
Factored: 11 ms
Side effects: 0
andrew@rimu:~/COMP489/test cases$ kaffe ManualMethod
Normal: 284 ms
Normal: 307 ms
Normal: 317 ms
Factored: 10 ms
Factored: 10 ms
Factored: 10 ms
Side effects: 0

It seems from this that many VMs do less (or perhaps no) run-time optimisation and have higher method-call overheads, so in these cases my optimisations are quite helpful. Even the older 1.5.0 version of Sun’s HotSpot VM seems to benefit from my optimisations to the bytecode.

You are doing it wrong…

21 August 2008

SimpleLisp is an interpreter, written in Java, for a limited dialect of LISP. I have been using it as an example Java program to evaluate my compiler optimisations. I just tried running a simple Lisp program (to calculate the Fibonacci sequence) in SimpleLisp to see the effect of my optimisations on the run time. Here are some results.

SimpleLisp compiled with jkit, without my optimisation stage:

$ time java org/simplelisp/interpreter/Interpreter examples/Fibonacci.sl
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393

real 0m3.380s
user 0m3.294s
sys 0m0.031s

SimpleLisp compiled with jkit, with my LICM optimisation stage:

$ time java org/simplelisp/interpreter/Interpreter examples/Fibonacci.sl
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393

real 0m4.289s
user 0m4.136s
sys 0m0.051s

Something is not quite right here…

More tests

21 August 2008

Following my tests a couple of days ago and the ensuing discussion, I have done some further testing to better understand what the HotSpot optimiser in the Sun Java VM can do.

In particular, I made an interface called CollectionInterface, which my TestCollection class implements, and made the test program use the interface type instead of the class type. This means that HotSpot no longer has enough static type information to inline method calls on the TestCollection class (or whatever it was doing with them), and so the manually factored loop is now significantly faster than the normal one.
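
For anyone wanting to reproduce this, a minimal CollectionInterface and TestCollection consistent with how the test program below uses them would look something like this (just a sketch; the originals may differ in detail):

interface CollectionInterface {
    int size();
}

class TestCollection implements CollectionInterface {
    public int sideEffects = 0; // printed by the test program below
    private final int size;

    public TestCollection(int size) {
        this.size = size;
    }

    public int size() {
        return size;
    }
}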

Here is my test program now:

public class ManualMethod {
    public static void main(String[] args) {
        TestCollection list = new TestCollection(1000000000);

        long normalTime = test(list);
        long normalTime2 = test(list);
        long normalTime3 = test(list);
        long factoredTime = testFactored(list);
        long factoredTime2 = testFactored(list);
        long factoredTime3 = testFactored(list);

        System.out.println("Normal: " + normalTime + " ms");
        System.out.println("Normal: " + normalTime2 + " ms");
        System.out.println("Normal: " + normalTime3 + " ms");
        System.out.println("Factored: " + factoredTime + " ms");
        System.out.println("Factored: " + factoredTime2 + " ms");
        System.out.println("Factored: " + factoredTime3 + " ms");
        System.out.println("Side effects: " + list.sideEffects);
    }

    private static long test(CollectionInterface list) {
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < list.size(); ++i) {
            // empty body: list.size() is evaluated on every iteration
        }
        return System.currentTimeMillis() - startTime;
    }

    private static long testFactored(CollectionInterface list) {
        int length = list.size();
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < length; ++i) {
            // empty body: the size has already been hoisted into a local
        }
        return System.currentTimeMillis() - startTime;
    }
}

And the output:

$ java ManualMethod
Normal: 2192 ms
Normal: 2182 ms
Normal: 2204 ms
Factored: 737 ms
Factored: 727 ms
Factored: 885 ms
Side effects: 0

With the server VM, the effect is now even more pronounced:

$ java -server ManualMethod
Normal: 1458 ms
Normal: 1451 ms
Normal: 1445 ms
Factored: 1 ms
Factored: 0 ms
Factored: 0 ms
Side effects: 0

Presumably the server VM can completely remove the loop in the factored case, as it does not produce any useful results, but it cannot know whether the call to CollectionInterface.size() has side effects, so it cannot remove the loop in the version that calls size() on every iteration.

Testing manual optimisation

19 August 2008

I have just made a few simple tests to see how much code movement of the sort performed by my optimisation stage actually helps Java programs’ run time, and the results are fairly disappointing.

First, I tried moving array.length out of the loop condition, using this code:

public class ManualArray {
    public static void main(String[] args) {
        char[] array = new char[10000000];

        long startTime = System.currentTimeMillis();
        for (int i = 0; i < array.length * 100; ++i) {
            // empty body: array.length is re-read on every iteration
        }
        long normalTime = System.currentTimeMillis() - startTime;

        startTime = System.currentTimeMillis();
        int length = array.length;
        for (int i = 0; i < length * 100; ++i) {
            // empty body: the length has been hoisted into a local
        }
        long factoredTime = System.currentTimeMillis() - startTime;

        startTime = System.currentTimeMillis();
        for (int i = 0; i < array.length * 100; ++i) {
            // the first loop again, for comparison
        }
        long normalTime2 = System.currentTimeMillis() - startTime;

        System.out.println("Normal: " + normalTime + " ms");
        System.out.println("Factored: " + factoredTime + " ms");
        System.out.println("Normal: " + normalTime2 + " ms");
    }
}

The output from running this (compiled with javac 1.6.0_03-p3, running with java 1.6.0_03-p3, both from Sun) looks something like this:

$ java ManualArray
Normal: 2172 ms
Factored: 730 ms
Normal: 726 ms

Running with java -server to use the Java HotSpot Server VM, the results are not promising either:

$ java -server ManualArray
Normal: 9 ms
Factored: 0 ms
Normal: 0 ms

I then tested moving a simple method call:

public class ManualMethod {
    public static void main(String[] args) {
        TestCollection list = new TestCollection(1000000000);

        long startTime = System.currentTimeMillis();
        for (int i = 0; i < list.size(); ++i) {
            // empty body: list.size() is called on every iteration
        }
        long normalTime = System.currentTimeMillis() - startTime;

        startTime = System.currentTimeMillis();
        int length = list.size();
        for (int i = 0; i < length; ++i) {
            // empty body: the size has been hoisted into a local
        }
        long factoredTime = System.currentTimeMillis() - startTime;

        startTime = System.currentTimeMillis();
        for (int i = 0; i < list.size(); ++i) {
            // the first loop again, for comparison
        }
        long normalTime2 = System.currentTimeMillis() - startTime;

        System.out.println("Normal: " + normalTime + " ms");
        System.out.println("Factored: " + factoredTime + " ms");
        System.out.println("Normal: " + normalTime2 + " ms");
    }
}

And the results this time:

$ java ManualMethod
Normal: 1451 ms
Factored: 725 ms
Normal: 724 ms
$ java -server ManualMethod
Normal: 4 ms
Factored: 0 ms
Normal: 0 ms

This is rather disappointing, as it suggests that my optimisations will not actually help much. I suspect that the code movement does not help because the JIT compiler in the JVM already does some optimisation of this sort (perhaps by inlining short methods), though it is also possible that the method call overhead is simply very low.

If I make TestCollection.size() slower by adding some useless code to it, then the optimisation makes a lot more difference:

$ java ManualMethod
Normal: 11253 ms
Factored: 835 ms
Normal: 11134 ms

If we use the server VM, the results are very strange:

$ java -server ManualMethod
Normal: 7 ms
Factored: 0 ms
Normal: 5773 ms

I cannot see why the first normal run should be so much quicker than the second; in all the earlier tests the reverse was observed, presumably because the JVM applies JIT compilation once it decides a method is worth optimising.

Non-immutable classes, and edge expressions

23 June 2008

I finally have the optimisation I planned at the beginning of this project working well. Rather than requiring all the classes passed as arguments to pure methods to be immutable, I can do enough analysis (along with yet more annotations) to satisfy a weaker requirement. As before, a pure method call can safely be moved out of a loop if each of its arguments is invariant and is of either a primitive type or an immutable class type (marked by the @Immutable annotation), but I can now also detect a third case, where the following three conditions are all met:

  1. The reference is unique; specifically, it is a local variable annotated @Unique.
  2. There are no statements in the loop which change the object via the reference (i.e. no calls on it to methods that are neither @Pure nor @Const).
  3. There is nothing that can change the objects referenced within the object (the class is marked @Encapsulated).

The extra annotations (@Unique and @Encapsulated) absolve me from having to solve the aliasing problem, passing it on to the programmer (who hopefully understands how eir program works and where aliasing can happen) instead. The @Encapsulated annotation in the last condition is defined to mean that there are guaranteed to be no references into the internal structure of objects of the class, so the only way to change the state of such an object is by calling methods on it.
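
As a contrived illustration of the third case (the class and methods here are invented purely for the example):

@Encapsulated
class Counter {
    private int[] counts = new int[16]; // no reference to this array ever escapes

    @Pure
    public int total() {
        int t = 0;
        for (int v : counts) {
            t += v;
        }
        return t;
    }

    public void increment(int i) { // a mutator: neither @Pure nor @Const
        counts[i]++;
    }
}

// ... and in some method elsewhere:
@Unique Counter c = new Counter();
int sum = 0;
for (int i = 0; i < 1000; ++i) {
    // c is a @Unique local, Counter is @Encapsulated, and the loop never calls
    // a method on c that is neither @Pure nor @Const, so the @Pure call
    // c.total() is invariant and can be hoisted out of the loop.
    sum += c.total() + i;
}
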

While I was testing this, I realised that my optimiser did not consider expressions on flow graph edges (which is how JKit expresses conditions in its intermediate representation) at all, neither trying to factor them nor checking them when it looks for statements that might modify an object. To rectify this I had to first compute the set of relevant edges by finding all edges in the flow graph having both source and destination within the set of points in the LoopBody region. I then used this edge set as well as the Point set when factoring expressions and when looking for expressions that might change an object.