Classloader leaks III – “Die Thread, die!”

If you just want a quick fix to the problem without understanding the theory, jump to part VI introducing the ClassLoader Leak Prevention library.

In my previous post we looked at different categories of ClassLoader leaks, and looked at a particular example of a reference from outside the web application ClassLoader (a JVM shutdown hook pointing to a JAI class).

In this post we will look at another category: unterminated threads running in your ClassLoader. This is a problem you can easily create yourself, but it may also come from third party libraries.

MAT analysis with running thread

When doing the “load all classes from third party JARs” test mentioned in my former post, and analyzing it with the technique outlined in my first post, I also ended up with this finding:

[Image: Batik analysis]

As you can see, it is a thread still running inside my ClassLoader. We can also see that the thread seems to be part of the Batik library. I was using version 1.5 beta 4, so let’s dig into the sources.

org.apache.batik.util.SoftReferenceCache (from line 181):

    private static Thread cleanup;

    static {
        cleanup = new Thread() {
                public void run() {
                    while(true) {
...
                    }
                }
            };
        cleanup.setDaemon(true);
        cleanup.start();
    }

org.apache.batik.ext.awt.image.rendered.TileMap (from line 139):

    static Thread cleanup;

    static {
        cleanup = new Thread() {
                public void run() {
                    while(true) {
...
                    }
                }
            };
        cleanup.setDaemon(true);
        cleanup.start();
    }

So, what do we have here? Not one but two static blocks (executed when the class is loaded), each starting a thread that runs in a while(true) loop. Once such a thread is started, its ClassLoader cannot be garbage collected: neither the ClassLoader that loaded the Thread class (if it is a custom subclass of java.lang.Thread), nor the thread’s contextClassLoader. In theory, the contextClassLoader of the thread can be changed (although I believe that rarely makes sense), but for the ClassLoader of a custom Thread subclass to be garbage collected, the thread must stop executing.
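To see which threads would pin a given ClassLoader in the way just described, one can enumerate the live threads and compare both loaders. The following is only a minimal sketch and not part of Batik or any library: ThreadFinder and threadsTiedTo are hypothetical names, and the check uses plain identity comparison, so it assumes no child ClassLoaders are involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only
final class ThreadFinder {

  /**
   * Returns the live threads whose Thread class was loaded by the given
   * ClassLoader, or whose contextClassLoader is the given ClassLoader.
   */
  static List<Thread> threadsTiedTo(ClassLoader loader) {
    List<Thread> result = new ArrayList<>();
    if (loader == null) {
      return result; // null means the bootstrap loader; nothing useful to report
    }
    // getAllStackTraces() returns a map keyed by every live thread in the JVM
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      boolean classFromLoader = t.getClass().getClassLoader() == loader;
      boolean contextIsLoader = t.getContextClassLoader() == loader;
      if (t.isAlive() && (classFromLoader || contextIsLoader)) {
        result.add(t);
      }
    }
    return result;
  }
}
```

In a web application, the argument would typically be the application's own ClassLoader, and any thread returned is a candidate for the shutdown handling discussed in the rest of this post.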

In newer versions of Batik, the two pieces of code above have been merged together into a new class – org.apache.batik.util.CleanerThread. That’s good. What’s not good is that there is at the time of this writing still a while(true) loop… This problem has been reported, and a patch has been proposed.

Stopping the thread – gangsta style

Fortunately, a reference to the thread is held in both SoftReferenceCache and TileMap (as can be seen above). In the new CleanerThread, there is also a static reference:

public class CleanerThread extends Thread {

    static volatile ReferenceQueue queue = null;
    static CleanerThread  thread = null;

That enables us to get hold of the Thread instance using reflection (same as with the shutdown hook in the former post) and call stop() on the Thread. Note that stop() is deprecated, since it may lead to an inconsistent state. (You can read more about that in the Thread.stop() JavaDoc and the document that is linked from there.)
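The reflective lookup could look roughly like the sketch below. It does not touch Batik itself: LibraryWithCleaner is a hypothetical stand-in that mimics the static cleanup field shown above, and StaticThreadLookup with its getStaticThread method is likewise an illustrative name, not an existing API.

```java
import java.lang.reflect.Field;

// Hypothetical helper, for illustration only
final class StaticThreadLookup {

  /** Reads a (possibly private) static field of type Thread from the given class. */
  static Thread getStaticThread(Class<?> owner, String fieldName) {
    try {
      Field field = owner.getDeclaredField(fieldName);
      field.setAccessible(true);      // the field may be private
      return (Thread) field.get(null); // null target because the field is static
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException(
          "Cannot read " + owner.getName() + "." + fieldName, e);
    }
  }
}

// Hypothetical stand-in for a library class that starts a thread in a static block
final class LibraryWithCleaner {
  private static Thread cleanup;

  static {
    cleanup = new Thread(() -> {
      try {
        while (true) {
          Thread.sleep(1000); // placeholder for periodic cleanup work
        }
      } catch (InterruptedException e) {
        // interrupted: leave the loop and let the thread end
      }
    }, "hypothetical-cleaner");
    cleanup.setDaemon(true);
    cleanup.start();
  }
}
```

With these assumptions, `StaticThreadLookup.getStaticThread(LibraryWithCleaner.class, "cleanup")` returns the running thread, which can then be handed to whatever shutdown routine you use.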

In our case, however, leaking ClassLoaders and the eventual java.lang.OutOfMemoryError: PermGen space is a bigger problem than any inconsistent state that, if it occurs, presumably affects only the abandoned instance of our web application. The best thing we can do in the generic case is to give the thread a chance to finish execution first. So in the cleanup Servlet/context listener we looked at last time, we will add this method, and call it once for every thread that needs to be stopped.

public static void forceThreadStop(Thread thread) {
  thread.interrupt(); // Make Thread stop waiting in sleep(), wait() or join()

  try {
    thread.join(2000); // Give the Thread 2 seconds to finish executing
  } catch (InterruptedException e) {
    // We were interrupted while waiting; fall through and check the thread
  }

  // If still not done, kill it
  // (stop() is deprecated; JDK 20 and later throw UnsupportedOperationException here)
  if (thread.isAlive()) {
    thread.stop();
  }
}
Stopping threads gracefully

If you spawn threads from your own code, you should make sure that they either have a definite ending point or, if they need to run over and over again like the watchdog threads in Batik, that there is a way to gracefully tell the thread to stop executing.

So, instead of the while(true), you should have a boolean flag that can be altered in order to tell the thread it’s time to die.

public class MyThread extends Thread {

  private boolean running = true;

  public void run() {
    while(running) {
      // Do something
    }
  }

  public void shutdown() {
    running = false;
  }
}

It is very important to note, however, that the above code is likely to still leak ClassLoaders. This is because the JVM may cache the value of a field per thread, so the running thread is not guaranteed to ever see the new value of the flag. Heinz Kabutz explains this in somewhat more detail in The Java Specialists’ Newsletter edition titled “The Law of the Blind Spot”.

As Heinz shows, the easiest solution is probably to add the volatile keyword.

public class MyThread extends Thread {

  private volatile boolean running = true;

  public void run() {
    while(running) {
      // Do something
    }
  }

  public void shutdown() {
    running = false;
  }
}

I encourage you to read Heinz’s entire article.
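A volatile flag is only checked between iterations, so a thread that spends most of its time in sleep() or wait() may take a long time to notice it. One possible refinement, sketched here under the assumption that the loop body blocks in sleep(), is to combine the flag with interrupt(), the same call forceThreadStop uses above. StoppableWorker is a hypothetical name and the 60 second sleep is just a placeholder for real work.

```java
// Hypothetical example class, for illustration only
final class StoppableWorker extends Thread {

  // volatile so that the write in shutdown() is visible to the running thread
  private volatile boolean running = true;

  StoppableWorker() {
    super("stoppable-worker");
    setDaemon(true);
  }

  @Override
  public void run() {
    while (running) {
      try {
        Thread.sleep(60_000); // placeholder for periodic work that blocks
      } catch (InterruptedException e) {
        // Woken up early; the loop condition re-checks the flag
      }
    }
  }

  /** Asks the thread to stop and wakes it up if it is currently blocked. */
  void shutdown() {
    running = false; // set the flag first, so the woken thread sees it
    interrupt();
  }
}
```

Calling shutdown() followed by join() with a timeout from your context listener should then let the thread end on its own, without resorting to stop().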

That’s all for this time. Until next post, good luck killing those threads!

Links to all parts in the series

Part I – How to find classloader leaks with Eclipse Memory Analyser (MAT)

Part II – Find and work around unwanted references

Part III – “Die Thread, die!”

Part IV – ThreadLocal dangers and why ThreadGlobal may have been a more appropriate name

Part V – Common mistakes and Known offenders

Part VI – “This means war!” (leak prevention library)

Presentation on Classloader leaks (video and slides)

  • seanf

    I noticed that the classloader-leak-prevention library doesn’t call thread.interrupt() as shown in forceThreadStop above. Why is that?

    • I cannot remember avoiding this deliberately, so I think you have found a bug. Thanks for the report – good catch!

      Fixed in Git and will be included in upcoming release.