Classloader leaks IV – ThreadLocal dangers and why ThreadGlobal may have been a more appropriate name

If you just want a quick fix to the problem without understanding the theory, jump to part VI introducing the ClassLoader Leak Prevention library.

In this series on classloader leaks, it’s time to talk about ThreadLocals. As you know, java.lang.ThreadLocal provides a means to achieve thread safety in a multi-threaded environment, such as a web application. ThreadLocals can also be used to give a thread access to some data in all layers of an application.

However, unless used correctly, ThreadLocals are also a common cause of classloader leaks in a web application environment. This is because Application Servers use thread pools, which means that a Thread may very well outlive your web application instance. So if a ThreadLocal value references something loaded by your classloader, there is a great risk that it will prevent your classloader from being garbage collected, and you end up with java.lang.OutOfMemoryError: PermGen space / Metaspace. In a web application environment, ThreadLocals should therefore rather be considered ThreadGlobals, since it’s best to assume they will remain for as long as the Application Server is running unless explicitly remove()d.

Avoid leaks caused by ThreadLocals

The most straightforward way to avoid this problem is to enclose all your ThreadLocal usage in a try/finally block and make sure you remove the value before the Thread is returned to the thread pool.

try {
  threadLocal.set(value);
  ...
}
finally {
  threadLocal.remove();
}
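To make the "ThreadGlobal" behaviour concrete, here is a minimal, self-contained sketch. It assumes a single-thread java.util.concurrent pool as a stand-in for the Application Server's thread pool; the class name PooledThreadLocalDemo and the CONTEXT field are hypothetical and only serve the illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PooledThreadLocalDemo {

  // Hypothetical stand-in for a ThreadLocal owned by the web application.
  static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

  /** Runs two tasks on the same pooled thread and returns what the second task sees. */
  static String valueSeenByNextTask(final boolean cleanUp) {
    // A single-thread pool guarantees that both tasks run on the same worker thread.
    ExecutorService pool = Executors.newFixedThreadPool(1);
    try {
      // "Request" 1 sets a value and optionally removes it in finally.
      Runnable firstRequest = () -> {
        try {
          CONTEXT.set("request-1 data");
          // ... application work would happen here ...
        } finally {
          if (cleanUp) {
            CONTEXT.remove();
          }
        }
      };
      // "Request" 2 only reads whatever is left on the thread.
      Callable<String> secondRequest = () -> CONTEXT.get();

      pool.submit(firstRequest).get();
      return pool.submit(secondRequest).get();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("without remove(): " + valueSeenByNextTask(false)); // request-1 data
    System.out.println("with remove():    " + valueSeenByNextTask(true));  // null
  }
}
```

Without the remove() call, the second task sees the value left behind by the first one, which is exactly what happens between requests, and between redeployments, on a pooled server thread.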

When and why it leaks

You may think that java.lang.ThreadLocal is implemented with a WeakHashMap with Thread as key. If that were the case, ThreadLocals would probably have been less likely to cause classloader leaks, since all references would be cleared once the ThreadLocal instance was garbage collected.

But instead, ThreadLocal uses the Thread as storage. To be exact, java.lang.Thread has a threadLocals attribute of type java.lang.ThreadLocal.ThreadLocalMap. That is a Map with a WeakReference to the ThreadLocal instance as key and the thread’s value as value.

Some sources claim that using custom ThreadLocal subclasses loaded by your webapp classloader will cause leaks. However, this is not the case (concluded with both theory and test case – the WeakReference of the key does its job).

However, ThreadLocal values loaded by our classloader, including values with strong references to such classes (for example a java.util.List of our own classes), are likely to cause leaks. This is due to two facts. The first and obvious one is that the references from the Thread to the value are strong references. But then you might ask, isn’t the strong value reference removed when the WeakReference key is garbage collected? Well, as with WeakHashMap, the values are not immediately removed. WeakHashMap, however, uses a ReferenceQueue to keep informed about which keys have been collected, and internally calls the private expungeStaleEntries() when servicing most public method calls. ThreadLocalMap does not use a ReferenceQueue and thus removes the unused values much more rarely and unpredictably, as stated by its JavaDoc:

“However, since reference queues are not used, stale entries are guaranteed to be removed only when the table starts running out of space”
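The effect of that JavaDoc remark can be observed with a small experiment. The sketch below is illustrative only: the class name StaleEntryDemo is hypothetical, System.gc() is merely a hint, and whether the key is reported as collected depends on the JVM and garbage collector in use. The value, on the other hand, is expected to stay reachable, because nothing touches the thread's ThreadLocalMap between dropping the key and checking the value.

```java
import java.lang.ref.WeakReference;

class StaleEntryDemo {

  /** Returns true if the value is still reachable after its ThreadLocal key was dropped. */
  static boolean valueSurvivesKey() {
    ThreadLocal<byte[]> key = new ThreadLocal<>();
    byte[] value = new byte[1024];
    key.set(value);

    // Track both objects without keeping them alive ourselves.
    WeakReference<ThreadLocal<byte[]>> keyRef = new WeakReference<>(key);
    WeakReference<byte[]> valueRef = new WeakReference<>(value);

    // Drop our own strong references; only the ThreadLocalMap entry remains.
    key = null;
    value = null;
    System.gc(); // only a hint, the JVM may ignore it

    // Read the results before doing anything else on this thread.
    boolean valueReachable = valueRef.get() != null;
    boolean keyCollected = keyRef.get() == null;

    System.out.println("key collected:   " + keyCollected);
    System.out.println("value reachable: " + valueReachable);
    return valueReachable;
  }

  public static void main(String[] args) {
    valueSurvivesKey();
  }
}
```

If the key is reported as collected while the value is still reachable, the entry is "stale": it sits in the thread's map with a strong reference to the value until some later ThreadLocal operation on that thread happens to expunge it.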

Third party example analyzed in Eclipse Memory Analyzer

Here is what an uncleared ThreadLocal can look like in a MAT analysis:
ThreadLocal analysis
This, however, does not show what causes the problem. In order to find that, make note of the entry index in brackets (38 in our case), then right-click the ThreadLocal and select “List objects”, “with outgoing references”.
ThreadLocal analysis 2
Find the correct entry using the index from the previous list.
ThreadLocal analysis 3
At this stage, “referent” is the ThreadLocal instance and “value” is the value for the Thread. In this case we are lucky, in that a custom ThreadLocal subclass is used, so we can easily see what is causing the problem. If no subclass had been used, we could have right-clicked the “referent” and selected “List objects”, “with incoming references” to see what class holds the ThreadLocal.
ThreadLocal analysis 4

In this case, the problem is within Apache Axis, version 1.4. Looking at the source code of org.apache.axis.utils.XMLUtils, we can confirm that it uses a custom ThreadLocal, kept in the static documentBuilder attribute.

    private static class ThreadLocalDocumentBuilder extends ThreadLocal {
        protected Object initialValue() {
            try {
                return getDOMFactory().newDocumentBuilder();
            } catch (ParserConfigurationException e) {
                log.error(Messages.getMessage("parserConfigurationException00"),
                        e);
            }
            return null; 
        }
    }     
    private static ThreadLocalDocumentBuilder documentBuilder = new ThreadLocalDocumentBuilder(); 

The value is populated here but never cleared:

    public static DocumentBuilder getDocumentBuilder() throws ParserConfigurationException {
        return (DocumentBuilder) documentBuilder.get();
    }

(Note that get() will set the value to what is returned by the overridden initialValue().)
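The interplay between get(), initialValue() and remove() can be shown with a small sketch. The class InitialValueDemo and its StringBuilder payload are hypothetical stand-ins for the Axis ThreadLocalDocumentBuilder; only the java.lang.ThreadLocal behaviour itself is taken from the JDK.

```java
import java.util.concurrent.atomic.AtomicInteger;

class InitialValueDemo {

  /**
   * Calls get() twice on a fresh ThreadLocal, optionally calling remove() in between,
   * and returns how many times initialValue() was invoked.
   */
  static int initCallsAfterTwoGets(boolean removeBetween) {
    final AtomicInteger initCalls = new AtomicInteger();

    // Hypothetical stand-in for a ThreadLocal subclass such as ThreadLocalDocumentBuilder.
    ThreadLocal<StringBuilder> builder = new ThreadLocal<StringBuilder>() {
      @Override
      protected StringBuilder initialValue() {
        initCalls.incrementAndGet();
        return new StringBuilder();
      }
    };

    builder.get();        // first get(): calls initialValue() and stores the result
    if (removeBetween) {
      builder.remove();   // drops this thread's entry again
    }
    builder.get();        // returns the stored value, or re-initializes after remove()
    builder.remove();     // clean up so the demo itself leaves nothing on the thread
    return initCalls.get();
  }

  public static void main(String[] args) {
    System.out.println(initCallsAfterTwoGets(false)); // prints 1
    System.out.println(initCallsAfterTwoGets(true));  // prints 2
  }
}
```

So merely reading such a ThreadLocal is enough to attach a value to the current thread, and calling remove() is harmless for later callers, since the next get() simply creates a fresh value.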

Preventing ThreadLocal leaks

Update: The ClassLoader Leak Prevention library in part VI includes a much more sophisticated preventative measure against ThreadLocal leaks.

Trying to fix ThreadLocal leaks out of your control during application shutdown is risky due to concurrency issues. Instead, you could take care of the problem while the thread is still under your control, before it is returned to the thread pool. We can achieve this using a Servlet Filter. In the case above we have a static reference to the offending ThreadLocal. We could therefore create a filter like the following one. (Note that the code has been simplified in that all exceptions have been ignored and null is assumed to be returned instead. This of course will not compile, but hopefully makes the principle clearer.)

public class ThreadLocalLeakPreventionFilter implements javax.servlet.Filter {

  private ThreadLocal[] offendingThreadLocals;

  public void init(FilterConfig filterConfig) throws ServletException {
    List<ThreadLocal> threadLocals = new ArrayList<ThreadLocal>();

    // TODO: Needs error handling!!!
    Class clazz = Class.forName("org.apache.axis.utils.XMLUtils");
    if(clazz != null) {
      final Field threadLocalField = 
          clazz.getDeclaredField("documentBuilder");
      if(threadLocalField != null) {
        threadLocalField.setAccessible(true);
        Object threadLocal = threadLocalField.get(null);
        if(threadLocal instanceof ThreadLocal) {
          threadLocals.add((ThreadLocal)threadLocal);
        }
      }
    }
    
    // TODO: Look up more offenders here
    
    this.offendingThreadLocals = 
        threadLocals.toArray(new ThreadLocal[threadLocals.size()]);
  }

  /** 
   * In the doFilter() method we have a chance to clean up the thread
   * before it is returned to the thread pool 
   */
  public void doFilter(ServletRequest servletRequest, 
                       ServletResponse servletResponse, 
                       FilterChain filterChain) 
      throws IOException, ServletException {
    
    try {
      filterChain.doFilter(servletRequest, servletResponse);
    }
    finally {
      // Clean up ThreadLocals
      for(ThreadLocal offendingThreadLocal : offendingThreadLocals) {
        offendingThreadLocal.remove(); // Remove offender from current thread
      }
    }
  }

  public void destroy() {
    offendingThreadLocals = null; // Make available for Garbage Collector
  }
}

In case there was no static reference to the ThreadLocal to grab hold of, we could have looped the entries of the ThreadLocalMap via reflection in the doFilter() method, and looked for entries where either key or value is of a type loaded by the web app classloader. I might get back and show you exactly what that would look like.
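A rough sketch of such a reflective scan might look like the following. It rests on several assumptions: the private field names threadLocals, table and value match the OpenJDK implementation of Thread and ThreadLocalMap, but they are not a public API and may differ elsewhere; on Java 9 and later the reflective access is typically refused unless the JVM is started with --add-opens java.base/java.lang=ALL-UNNAMED, in which case the sketch returns -1. The class ThreadLocalMapScanner and its methods are hypothetical names. The sketch only counts matching entries and only checks each object's own class, so a java.util.List containing application classes would not be detected.

```java
import java.lang.ref.Reference;
import java.lang.reflect.Field;

class ThreadLocalMapScanner {

  /**
   * Counts entries in the current thread's ThreadLocalMap whose key or value
   * was loaded by the given classloader. Returns -1 if reflection is refused
   * or the assumed internal field names do not exist.
   */
  static int countEntriesLoadedBy(ClassLoader loader) {
    try {
      // Assumption: OpenJDK internals, Thread.threadLocals holds the ThreadLocalMap.
      Field threadLocalsField = Thread.class.getDeclaredField("threadLocals");
      threadLocalsField.setAccessible(true);
      Object map = threadLocalsField.get(Thread.currentThread());
      if (map == null) {
        return 0; // this thread has no ThreadLocals yet
      }

      // Assumption: ThreadLocalMap.table is the array of entries.
      Field tableField = map.getClass().getDeclaredField("table");
      tableField.setAccessible(true);
      Object[] table = (Object[]) tableField.get(map);

      int count = 0;
      Field valueField = null;
      for (Object entry : table) {
        if (entry == null) {
          continue; // empty slot
        }
        if (valueField == null) {
          // Assumption: each Entry is a WeakReference with an extra "value" field.
          valueField = entry.getClass().getDeclaredField("value");
          valueField.setAccessible(true);
        }
        Object key = ((Reference<?>) entry).get(); // null if the key was collected
        Object value = valueField.get(entry);
        if (isLoadedBy(key, loader) || isLoadedBy(value, loader)) {
          count++;
        }
      }
      return count;
    } catch (ReflectiveOperationException | RuntimeException e) {
      return -1; // access refused or internals differ from what is assumed here
    }
  }

  /** True if the object's own class was loaded by exactly the given classloader. */
  static boolean isLoadedBy(Object o, ClassLoader loader) {
    return o != null && o.getClass().getClassLoader() == loader;
  }

  public static void main(String[] args) {
    ThreadLocal<ThreadLocalMapScanner> local = new ThreadLocal<>();
    local.set(new ThreadLocalMapScanner());
    System.out.println(countEntriesLoadedBy(ThreadLocalMapScanner.class.getClassLoader()));
    local.remove();
  }
}
```

In a filter, the classloader to pass in would be the web application's own, and instead of counting, matching entries would be cleared before the thread goes back to the pool.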


Links to all parts in the series

Part I – How to find classloader leaks with Eclipse Memory Analyser (MAT)

Part II – Find and work around unwanted references

Part III – “Die Thread, die!”

Part IV – ThreadLocal dangers and why ThreadGlobal may have been a more appropriate name

Part V – Common mistakes and Known offenders

Part VI – “This means war!” (leak prevention library)

Presentation on Classloader leaks (video and slides)