Monthly Archives: January 2012

Classloader leaks IV – ThreadLocal dangers and why ThreadGlobal may have been a more appropriate name

If you just want a quick fix to the problem without understanding the theory, jump to part IV introducing the ClassLoader Leak Prevention library.

In the series on classloader leaks, it’s time to talk about ThreadLocals. As you know, java.lang.ThreadLocal provides a means to achieve thread safety in a multi-threaded environment, such as a web application. They can also be used to allow a thread access to some data in all layers of an application.

However, unless used correctly ThreadLocals are also a common cause of classloader leaks in web application environment. This is because Application Servers use thread pools, which means that a Thread may very well outlive your web application instance. So in case a ThreadLocal has a reference to something inside your classloader, there is a great risk that it will prevent your classloader from being garbage collected and you end up with java.lang.OutOfMemoryError: PermGen space / Metaspace. So in a web application environment, ThreadLocals should rather be considered to be ThreadGlobals, since it’s best to assume they will remain as long as the Application Server is running unless explicitly remove()d.

Avoid leaks caused by ThreadLocals

The most straightforward way to avoid this problem, is to enclose all your ThreadLocal usage in a try/finally block and make sure you remove the value before the Thread is returned to the thread pool.

try {
  threadLocal.set(value);
  ...
}
finally {
  threadLocal.remove();
}

When and why it leaks

You may think that java.lang.ThreadLocal is implemented with a WeakHashMap with Thread as key. If that was the case, they would probably have been less likely to cause classloader leaks, since all references would be cleared as the ThreadLocal instance was garbage collected.

But instead, ThreadLocal uses the Thread as storage. To be exact, java.lang.Thread has a threadLocals attribute of type java.lang.ThreadLocal.ThreadLocalMap. That is a Map with a WeakReference to the ThreadLocal instance as key and the threads value as value.

Some sources claim, that using custom ThreadLocal subclasses loaded by your webapp classloader will cause leaks, however this is not the case (concluded with both theory and test case – the WeakReference of the key does it’s job).

However ThreadLocal values loaded by our classloader, including values with strong references to such classes (for example a java.util.List of our own classes) are likely to cause leaks. This is due to two facts. The first and obvious is that the references from the Thread to the value are strong references. But then you might ask, isn’t the strong value reference removed when the WeakReference key is garbage collected? Well, like with WeakHashMap the values are not immedately removed. WeakHashMap however uses a ReferenceQueue to keep informed about what keys have been removed, and internally calls the private expungeStaleEntries() when servicing any public method call. ThreadLocalMap however does not use a ReferenceQueue and thus removes the unused values much more rarely and unpredictably, as stated by it’s JavaDoc:

“However, since reference queues are not used, stale entries are guaranteed to be removed only when the table starts running out of space”

Third party example analyzed in Eclipse Memory Analyzer

Here is what an uncleared ThreadLocal can look like in a MAT analysis:
ThreadLocal analysis
This however does not show what causes the problem. In order to find that, make note of the entry index in brackets (38 in our case) and right click the ThreadLocal and select “List objects”, “with outgoing references”
ThreadLocal analysis 2
Find the correct entry with the index from the previous list
ThreadLocal analysis 3
At this stage, “referent” is the ThreadLocal instance and “value” is the value for the Thread. In this case we are lucky, in that a custom ThreadLocal subclass is used, so we can easily see what is causing the problem. In case no subclass is used, we could have right clicked the “referent” and selected “List objects”, “with incoming references” to see what class holds the ThreadLocal
ThreadLocal analysis 4

In this case, the problem is within Apache Axis, version 1.4. Looking at the source code of org.apache.axis.utils.XMLUtils we can confirm that it uses as custom ThreadLocal, kept in the static documentBuilder attribute.

    private static class ThreadLocalDocumentBuilder extends ThreadLocal {
        protected Object initialValue() {
            try {
                return getDOMFactory().newDocumentBuilder();
            } catch (ParserConfigurationException e) {
                log.error(Messages.getMessage("parserConfigurationException00"),
                        e);
            }
            return null; 
        }
    }     
    private static ThreadLocalDocumentBuilder documentBuilder = new ThreadLocalDocumentBuilder(); 

Which fails to be cleared:

    public static DocumentBuilder getDocumentBuilder() throws ParserConfigurationException {
        return (DocumentBuilder) documentBuilder.get();
    }

(Note that get() will set the value to what is returned by the overridden initialValue().)

Preventing ThreadLocal leaks

Update: The ClassLoader Leak Prevention library in part IV includes a much more sophisticated preventative measure against ThreadLocal leaks.

Trying to fix ThreadLocal leaks out of your control during application shutdown is risky due to concurrency issues. Instead you could to take care of the problem while the thread is still under your control, before it is returned to the thread pool. We can achieve this using a Servlet Filter. In the case above we have a static reference to the offending ThreadLocal. We could therefore create a filter like the following one. (Note that the code has been simplified in that all exceptions have been ignored and null is assumed to be returned instead. This of course will not compile, but hopefully makes the principle clearer.)

public class ThreadLocalLeakPreventionFilter implements javax.servlet.Filter {

  private ThreadLocal[] offendingThreadLocals;

  public void init(FilterConfig filterConfig) throws ServletException {
    List<ThreadLocal> threadLocals = new ArrayList<ThreadLocal>();

    // TODO: Needs error handling!!!
    Class clazz = Class.forName("org.apache.axis.utils.XMLUtils");
    if(clazz != null) {
      final Field threadLocalField = 
          clazz.getDeclaredField("documentBuilder");
      if(threadLocalField != null) {
        threadLocalField.setAccessible(true);
        Object threadLocal = threadLocalField.get(null);
        if(threadLocal instanceof ThreadLocal) {
          threadLocals.add((ThreadLocal)threadLocal);
        }
      }
    }
    
    // TODO: Look up more offenders here
    
    this.offendingThreadLocals = 
        threadLocals.toArray(new ThreadLocal[threadLocals.size()]);
  }

  /** 
   * In the doFilter() method we have a chance to clean up the thread
   * before it is returned to the thread pool 
   */
  public void doFilter(ServletRequest servletRequest, 
                       ServletResponse servletResponse, 
                       FilterChain filterChain) 
      throws IOException, ServletException {
    
    try {
      filterChain.doFilter(servletRequest, servletResponse);
    }
    finally {
      // Clean up ThreadLocals
      for(ThreadLocal offendingThreadLocal : offendingThreadLocals) {
        offendingThreadLocal.remove(); // Remove offender from current thread
      }
    }
  }

  public void destroy() {
    offendingThreadLocals = null; // Make available for Garbage Collector
  }
-->
}

In case there was no static reference to the ThreadLocal to grab hold of, we could have looped the entries of the ThreadLocalMap via reflection in the doFilter() method, and looked for entries where either key or value is of a type loaded by the web app classloader. I might get back and show you exactly what that would look like.


Links to all parts in the series

Part I – How to find classloader leaks with Eclipse Memory Analyser (MAT)

Part II – Find and work around unwanted references

Part III – “Die Thread, die!”

Part IV – ThreadLocal dangers and why ThreadGlobal may have been a more appropriate name

Part V – Common mistakes and Known offenders

Part VI – “This means war!” (leak prevention library)

Presentation on Classloader leaks (video and slides)

Classloader leaks III – “Die Thread, die!”

If you just want a quick fix to the problem without understanding the theory, jump to part IV introducing the ClassLoader Leak Prevention library.

In my previous post we looked at different categories of ClassLoader leaks, and looked at a particular example of a reference from outside the web application ClassLoader (a JVM shutdown hook pointing to a JAI class).

In this post we will look at another category; unterminated Threads running in your ClassLoader. This is a problem you can easily create yourself, but it may also come from third party libraries.

MAT analysis with running thread

When doing the “load all classes from third party JARs” test mentioned in my former post, and analyzing it with the technique outlined in my first post, I also ended up with this finding:

Batik analysis

As you can see, it is a thread still running inside my ClassLoader. We can also see, that the thread seems to be part of the Batik library. I was using version 1.5 beta 4, so let’s dig into the sources.

org.apache.batik.util.SoftReferenceCache (from line 181):

    private static Thread cleanup;

    static {
        cleanup = new Thread() {
                public void run() {
                    while(true) {
...
                    }
                }
            };
        cleanup.setDaemon(true);
        cleanup.start();
    }

org.apache.batik.ext.awt.image.rendered.TileMap (from line 139):

    static Thread cleanup;

    static {
        cleanup = new Thread() {
                public void run() {
                    while(true) {
...
                    }
                }
            };
        cleanup.setDaemon(true);
        cleanup.start();
    }

So, what do we have here? Not one but two static blocks (executing as the class is loaded) starting threads that execute in a while(true) loop. Once such a Threads is started, there is no garbage collecting their ClassLoader – neither the ClassLoader having loaded the Thread class (if a custom subclass to java.lang.Thread), nor the Threads contextClassLoader. In theory, the contextClassLoader of the thread can be changed (although I believe that rarely makes sense), but to garbage collect the ClassLoader of a custom Threads subclass, the thread must stop executing.

In newer versions of Batik, the two pieces of code above have been merged together into a new class – org.apache.batik.util.CleanerThread. That’s good. What’s not good is that there is at the time of this writing still a while(true) loop… This problem has been reported, and a patch has been proposed.

Stopping the thread – gangsta style

Fortunately, a referece to the thread is held in both SoftReferenceCache and TileMap (as can be seen above). In the new CleanerThread, there is also a static reference:

public class CleanerThread extends Thread {

    static volatile ReferenceQueue queue = null;
    static CleanerThread  thread = null;

That enables us to get hold of the Thread instance using reflection (same as with the shutdown hook in the former post) and call stop() on the Thread. Note that stop() is deprecated, since it may lead to an incosistent state. (You can read more about that in the Thread.stop() JavaDoc and the document that is linked from there.)

In our case however, leaking ClassLoaders and the eventual java.lang.OutOfMemoryError: PermGen space is a bigger problem than any inconsistent state that – if it occurs – presumably affects the abandoned instance of our web application. The best thing we can do in a generic case, is give the thread a chance to finish execution first. So in the cleanup Servlet/context listener we looked at last time, we will add this method, and call it once for every thread that needs to be stopped.

public static void forceThreadStop(Thread thread) {
  thread.interrupt(); // Make Thread stop waiting in sleep(), wait() or join()

  try {
    thread.join(2000); // Give the Thread 2 seconds to finish executing
  } catch (InterruptedException e) {
    // join failed
  }

  // If still not done, kill it
  if (thread.isAlive())
    thread.stop();

Stopping threads gracefully

In case you spawn threads from your own code, you should make sure that there is either a definitive ending point for them or, in case they need to be executed over and over again like a watchdog thread as in the case with Batik, that there is a way to gracefully tell to Thread to stop executing.

So, instead of the while(true), you should have a boolean flag that can be altered in order to tell the thread it’s time to die.

public class MyThread extends Thread {

  private boolean running = true;

  public void run() {
    while(running) {
      // Do something
    }
  }

  public void shutdown() {
    running = false;
  }
}

It is very important to note however, that the above code is likely to still leak ClassLoaders. This is because the JVM may cache the value of fields per thread, which Heinz Kabutz explains in somewhat more detail in The Java Specialists’ Newsletter edition titled “The Law of the Blind Spot”.

As Heinz shows, the easiest solution is probably to add the volatile keyword.

public class MyThread extends Thread {

  private volatile boolean running = true;

  public void run() {
    while(running) {
      // Do something
    }
  }

  public void shutdown() {
    running = false;
  }
}

I encourage you to read Heinz’s entire article.

That’s all for this time. Until next post, good luck killing those threads!

Links to all parts in the series

Part I – How to find classloader leaks with Eclipse Memory Analyser (MAT)

Part II – Find and work around unwanted references

Part III – “Die Thread, die!”

Part IV – ThreadLocal dangers and why ThreadGlobal may have been a more appropriate name

Part V – Common mistakes and Known offenders

Part VI – “This means war!” (leak prevention library)

Presentation on Classloader leaks (video and slides)

Classloader leaks II – Find and work around unwanted references

If you just want a quick fix to the problem without understanding the theory, jump to part IV introducing the ClassLoader Leak Prevention library.

In my previous post we learnt how to locate classloader leaks using Eclipse Memory Analyzer (MAT).

This time we will discuss different reasons for leaks, look at an example of a leak in a third party library, and see how we can fix that leak by a workaround.

Different reasons for ClassLoader leaks

In order to know what you should be looking for in your heapdump analysis, we could categorize ClassLoader leaks into three different types. In the end, they are all just variants of the first one.

  1. References from outside your webapp – that is from the application server or the JDK classes – to either the ClassLoader itself or one of the classes it has loaded (which in turn has a reference to the ClassLoader), including any instances of such classes.
     
  2. Threads running inside your webapp. If you spawn new threads from within your web application that may not terminate, they are likely to prevent your ClassLoader from being garbage collected. This can happen even if the thread does not use any of the classes loaded by your webapps ClassLoader. This is because threads have a context classloader, to which there is a reference (contextClassLoader) in the java.lang.Thread class. More about this in the next post.
     
  3. ThreadLocals with values whose class is loaded in your webapp. If you use ThreadLocals in your webapp, you need to explicitly clear all ThreadLocals before the webapp closes down. This is because a) the application server uses a thread pool, which means that the thread will outlive your webapp instance and b) ThreadLocal values are actually stored in the java.lang.Thread object. Therefore, this is just a variation of 1.
    (Note: This may be the case most likely created by yourself, but also exists in third party libraries.)

Example of reference from outside your application

When trying to hunt down a ClassLoader leak in our web application, I created a little JSP page in which I looped through all the third party JARs of our application. I tried to load every single class that was found in a custom ClassLoader, added a ZombieMarker to the ClassLoader (see previous post) and then disposed the ClassLoader. I ran the JSP page over and over again until I got a java.lang.OutOfMemoryError: PermGen space. That is, I was able to trigger ClassLoader leaks just by loading classes from our third party libraries… 🙁 It actually turned out to be more than one of them, that triggered this behaviour.

Here is a MAT trace for one of them:

(In this picture, it’s not obvious where our ClassLoader is. The custom ClassLoader was an anonymous inner class in my JSP, so it’s the second entry with the strange class name ending with $1.)

At first glance, it may seem like this is type 2 above, with a running thread. This is not the case however, since the thread itself is not the GC root (not at the bottom level). In fact, there is a Thread involved, but it is not running.

Rather we can see that what keeps our ClassLoader from being garbage collected is a reference from outside the webapp (java.lang.*) to an instance of com.sun.media.jai.codec.TempFileCleanupThread, which in turn is loaded by our ClassLoader. From the names of the referenced and referencing (java.lang.ApplicationShutdownHook) classes, I suspected that a JVM shutdown hook was added by some Java Advanced Imaging (JAI) class when it was loaded.

The com.sun.media.jai.codec.TempFileCleanupThread class is in the Codec part of JAI; version 1.1.2_01 in our case. The sources can be found in the official SVN repo (1.1.2_01 tag). As you can see, TempFileCleanupThread.java class is not in that list. That is because someone thought is was a great idea to put it as a package protected class in FileCacheSeekableStream.java.

There we can also find the source of the leak.

    // Create the cleanup thread. Use reflection to preserve compile-time
    // compatibility with JDK 1.2.
    static {
        try {
            Method shutdownMethod =
                Runtime.class.getDeclaredMethod("addShutdownHook",
                                                new Class[] {Thread.class});

            cleanupThread = new TempFileCleanupThread();

            shutdownMethod.invoke(Runtime.getRuntime(),
                                  new Object[] {cleanupThread});
        } catch(Exception e) {
            // Reset the Thread to null if Method.invoke failed.
            cleanupThread = null;
        }
    }

As suspected, there is a static block that (via reflection) adds a JVM shutdown hook, as soon as the com.sun.media.jai.codec.FileCacheSeekableStream class is loaded. Not very practical in a web application environment, since the JVM will will not shutdown until the application server is shut down.

The JAI TempFileCleanupThread is supposed to delete temporary files when the JVM shuts down. In a web application, what we want is probably to remove those temporary files as soon as the web application is redeployed. If this was our own code, we should have changed this. In this case it’s a third party library, and judging from the SVN trunk, this still has not been fixed, so upgrading doesn’t help. (This has been reported here.)

Cleaning up leaking references at redeploy

In order to clean up references as part of web application shutdown, to prevent ClassLoader leaks, there are two approaches. You can either put the code in the destroy() method of a Servlet that is load-on-startup

  <servlet servlet-name='cleanup' servlet-class='my.CleanupServlet'>
    <load-on-startup>1</load-on-startup>
  </servlet>

or (probably slightly more correct) you can create a javax.servlet.ServletContextListener and add the cleanup to the contextDestroyed() method.

  <listener>
    <listener-class>my.CleanupListener</listener-class>
  </listener>

The workaround

Fortunately, FileCacheSeekableStream keeps a reference to the shutdown hook in our case.

public final class FileCacheSeekableStream extends SeekableStream {

    /** A thread to clean up all temporary files on VM exit (VM 1.3+) */
    private static TempFileCleanupThread cleanupThread = null;

So let’s grab that reference and remove the shutdown hook. But we probably don’t just want to throw away the hook, since in theory that may leave us with temporary files that should has been deleted at JVM shutdown. Instead get the hook, remove it, and then run it immediately.

We may actually turn this into a generic method, to be reused for other third party shutdown hooks we want to remove. (System.out is used for logging, since logging frameworks usually needs to be cleaned up too, and I suggest you do that before calling this method.)

private static void removeShutdownHook(Class clazz, String field) {
  // Note that loading the class may add the hook if not yet present... 
  try {
    // Get the hook
    final Field cleanupThreadField = clazz.getDeclaredField(field);
    cleanupThreadField.setAccessible(true);
    Thread cleanupThread = (Thread) cleanupThreadField.get(null);

    if(cleanupThread != null) {
      // Remove hook to avoid PermGen leak
      System.out.println("  Removing " + cleanupThreadField + " shutdown hook");
      Runtime.getRuntime().removeShutdownHook(cleanupThread);
      
      // Run cleanup immediately
      System.out.println("  Running " + cleanupThreadField + " shutdown hook");
      cleanupThread.start();
      cleanupThread.join(60 * 1000); // Wait up to 1 minute for thread to run
      if(cleanupThread.isAlive())
        System.out.println("STILL RUNNING!!!");
      else
        System.out.println("Done");
    }
    else
      System.out.println("  No " + cleanupThreadField + " shutdown hook");
    
  }
  catch (NoSuchFieldException ex) {
    System.err.println("*** " + clazz.getName() + '.' + field + 
      " not found; has JAR been updated??? ***");
    ex.printStackTrace();
  }
  catch(Exception ex) {
    System.err.println("Unable to unregister " + clazz.getName() + '.' + field);
    ex.printStackTrace();
  }    
}

Now we just call that method in our application shutdown (CleanupServlet.destroy() / CleanupListener.contextDestroyed()) like so:

removeShutdownHook(com.sun.media.jai.codec.FileCacheSeekableStream.class,
  "cleanupThread");

In a worst case scenario, if there is no reference kept to the shutdown hook, we may use reflection into the JVM classes. It would look like this:

final Field field = 
  Class.forName("java.lang.ApplicationShutdownHooks").getDeclaredField("hooks");
field.setAccessible(true);
Map<Thread, Thread> shutdownHooks = (Map<Thread, Thread>) field.get(null);
// Iterate copy to avoid ConcurrentModificationException
for(Thread t : new ArrayList<Thread>(shutdownHooks.keySet())) {
  if(t.getClass().getName().equals("class.name.of.ShutdownHook")) { // TODO: Set name
    // Make sure it's from this web app instance
    if(t.getClass().getClassLoader().equals(this.getClass().getClassLoader())) {
      Runtime.getRuntime().removeShutdownHook(t); // Remove hook to avoid PermGen leak
      t.start(); // Run cleanup immediately
      t.join(60 * 1000); // Wait up to 1 minute for thread to run
    }
  }
}

That’s all for this post. Next time we’ll look at threads running within your ClassLoader.

Update – Bean Validation API begs “FIXME”

I can’t help but post an additional example, that I found just the other day. Had some PermGen errors in a new webapp and this is what I found:

Looking at Validation.java and the inner class javax.validation.Validation.DefaultValidationProviderResolver it does, at least in the current revision, contain these lines of code:

		//cache per classloader for an appropriate discovery
		//keep them in a weak hashmap to avoid memory leaks and allow proper hot redeployment
		//TODO use a WeakConcurrentHashMap
		//FIXME The List<VP> does keep a strong reference to the key ClassLoader, use the same model as JPA CachingPersistenceProviderResolver
		private static final Map<ClassLoader, List<ValidationProvider<?>>> providersPerClassloader =
				new WeakHashMap<ClassLoader, List<ValidationProvider<?>>>();

Isn’t that nice? In the Bean Validation API (JSR 303) – not an implementation but the API – there is a cache that have been created with hot redeployment in mind, and still it has the potential to leaks classloaders. Not only that – the authors of the code have been aware that it can leak classloaders, and still validation-api-1.0.0.GA.jar was released, without any means of manually telling the cache to release our ClassLoader. Sigh…

The leak is triggered when the API is shipped with your application server, but the implementation (Hibernate Validator in my case) is provided in your web application, and thus loaded with your classloader.

Using reflection like above, we stop the leak by getting hold of the Map and remove() our classloader. Alternatively, we could add the JAR of our Validation provider on the Application Server level, so that the cache will not reference our webapp ClassLoader at all.

Links to all parts in the series

Part I – How to find classloader leaks with Eclipse Memory Analyser (MAT)

Part II – Find and work around unwanted references

Part III – “Die Thread, die!”

Part IV – ThreadLocal dangers and why ThreadGlobal may have been a more appropriate name

Part V – Common mistakes and Known offenders

Part VI – “This means war!” (leak prevention library)

Presentation on Classloader leaks (video and slides)