Category Archives: ClassLoader leaks

ClassLoader leaks links

During September 2016 I’ll be speaking about ClassLoader leaks at JavaZone, JDK.IO and JavaOne. For those who listened to my talk and want to read more on the subject, here are the slides, links to my blog series and a link to the ClassLoader Leak Prevention library on GitHub.

Part I – How to find classloader leaks with Eclipse Memory Analyser (MAT)

Part II – Find and work around unwanted references

Part III – “Die Thread, die!”

Part IV – ThreadLocal dangers and why ThreadGlobal may have been a more appropriate name

Part V – Common mistakes and Known offenders

Part VI – “This means war!” (leak prevention library)

Recording (from JDK.IO):

ClassLoader Leak Prevention library 2.0 released

I recently released version 2.0.0 of the ClassLoader Leak Prevention library to Maven Central. This is a major refactoring that provides the following new features.

App server and non-servlet framework integration

The library now has a core module that does not assume a servlet environment. This means that the library can be integrated into environments that do dynamic class loading, such as scripting engines. It also means that Java EE application servers can integrate the library, so that web apps deployed onto that server don’t need to include the library to be protected from java.lang.OutOfMemoryError: PermGen space / Metaspace. More details can be found in the module README.md on GitHub.

Zero-config Servlet 3.0+ module

If you’re in a Servlet 3.0 or 3.1 environment, there is no longer a need for explicitly declaring the <listener> in web.xml. Instead use the classloader-leak-prevention-servlet3 Maven dependency, which handles this for you automatically. For details, see the README.md on GitHub.

Preventions are now plugins

In version 1.x, you needed to subclass the library’s ServletContextListener to add, remove or change behaviour of specific leak prevention measures. In 2.x, each prevention mechanism is a separate class implementing an interface. This makes it easier to implement your own additional preventions, remove measures from the configuration, or subclass and adjust any single mechanism.

Improved logging

While 1.x logged to System.out/System.err unless you subclassed and overrode the log methods, 2.x by default uses java.util.logging (JUL). You can also easily switch to the System.out/System.err behaviour, or provide your own logging.

Please note that bridging JUL to other logging frameworks (for example using jul-to-slf4j) has not been tested, and may produce unexpected results in case something is logged after the logging framework has been shut down by the library.

ClassLoader Leak test framework in Maven

Today I launch another weapon in the ongoing war on Classloader Leaks: The classloader-leak-test-framework. Admittedly, the framework itself is not new. The news is that in order to use it you no longer have to clone the Git repo, because it is now available as a Maven artifact through Maven Central.

If you want to confirm a suspected leak, just add

<dependency>
  <groupId>se.jiderhamn</groupId>
  <artifactId>classloader-leak-test-framework</artifactId>
  <version>1.0.0</version>
  <scope>test</scope>
</dependency>

to your POM and create a test case that you believe would trigger the leak. (Make sure to check GitHub for the current version.)

Heap dump when leak detected

Another improvement to the test framework that I have not previously announced is that the framework can now automatically create a heap dump when a ClassLoader leak is detected. This makes it even easier to track down the cause of the leak and determine the required countermeasures. To activate this feature, add @Leaks(dumpHeapOnError = true) to your test method.

Test framework documentation

For further information on how to use the ClassLoader Leak test framework, see the project’s space on GitHub.

ClassLoader leaks at JavaForum Gothenburg

Tonight I talked about ClassLoader leaks at JavaForum Gothenburg. Below you will find the slides from the presentation, the link to the Leak prevention library on GitHub and links to all the parts of the blog series I made on the subject.

Video

Slides


Heinz Kabutz’s Java Specialists’ Newsletter issue “The Law of the Blind Spot”, referenced on slide 26.
Tomcat Bugzilla entries #48895 and #49159, referenced on slide 41.

Leak prevention library

GitHub project

Links to all parts in blog series

Part I – How to find classloader leaks with Eclipse Memory Analyser (MAT)

Part II – Find and work around unwanted references

Part III – “Die Thread, die!”

Part IV – ThreadLocal dangers and why ThreadGlobal may have been a more appropriate name

Part V – Common mistakes and Known offenders

Part VI – “This means war!” (leak prevention library)

Classloader leaks VI – “This means war!” (Leak Prevention library)

This post is intended to conclude the series about classloader leaks, and I want to use it to declare war! (Not as in Web application archive though…)

I have decided to pursue the fight against classloader leaks, and I invite you to join me. For this purpose I have created a project on GitHub called classloader-leak-prevention. The project consists of two parts.

Classloader leak protection listener

First and foremost there is a component for you to add to your web application, that intends to remove and work around as many of the known issues as possible. This will allow us not to depend on bugs being fixed in third party libraries. Yes, this is somewhat like what Tomcat has built in, but my component covers cases that Tomcat currently does not. Another major advantage is that my component is Application Server independent.

It should be as easy as adding a JAR or .java file, configuring a ServletContextListener in web.xml, and you should be protected against java.lang.OutOfMemoryError: PermGen space caused by your app leaking classloaders (you could still run out of PermGen however, or other apps on the same server may leak). In due time, I hope that parts of it will be configurable to your needs (check back here for updates). If it is not configurable enough, feel free to subclass or create your own GitHub fork.

To configure the main prevention mechanism, just add the component to your project and insert this into your web.xml:

<listener>
  <listener-class>
    se.jiderhamn.classloader.leak.prevention.ClassLoaderLeakPreventor
  </listener-class>
</listener>

It makes sense to keep this listener “outermost” (initializing first, destroying last), so you should normally declare it before any other listeners in web.xml.

Maven

The library is available in Maven Central with the following details:

<dependency>
  <groupId>se.jiderhamn</groupId>
  <artifactId>classloader-leak-prevention</artifactId>
  <version>1.15.2</version>
</dependency>

Download

Non-Maven users can download the JAR with the current version (1.15.2) of the project » here «.

Configuration

The context listener has a number of settings, see the readme on GitHub.

Classloader leak detection / test framework

Another part of the project is a framework that allows the creation of JUnit tests that confirm classloader leaks in third party APIs. It is also possible to test leak prevention mechanisms to confirm that the leak really is avoided.

Read more about this on GitHub.

License

This project is licensed under the Apache 2 license, which allows you to include modified versions of the code in your distributed software, without having to release your source code.

Classloader leaks V – Common mistakes and Known offenders

If you just want a quick fix to the problem without understanding the theory, jump to part VI introducing the ClassLoader Leak Prevention library.

A couple of years ago, there was an article on the Confluence site of the Spring framework written by Magnus Alvestad, titled “Memory leaks where the classloader cannot be garbage collected” (originally located here/here). Apart from a short description of the problem with classloader leaks, it also contained a list of known offenders; practices and third party libraries known to cause classloader leaks, either by incorrect use or as a bug.

Even though far from exhaustive, an article like that was a good reference – a place to start looking if you experienced java.lang.OutOfMemoryError: PermGen space. Unfortunately it seems that page is not available any more. My hope for this post is that it may serve that purpose instead. For that to come true, I also need your help. If you know of a public Java library that does, in a current or previous version, cause classloader leaks, please comment on this post. I intend to keep the list updated. (Please note that this is not an invitation for support requests, asking me whether a particular library does leak. I want you to report leaks that are confirmed, either by yourself or by an online bug report / mailing list discussion. References are preferable!)

Common mistakes

Let’s start with the leaks that aren’t necessarily bugs, but failures to follow guidelines or best practice. These may also be APIs that have the potential to cause classloader leaks, while also providing a means to avoid them.

Uncleared ThreadLocals

As we talked about in the previous post, you should always call remove() on a ThreadLocal in a finally block.

Unstopped threads

In part III of this series we talked about the importance of making sure threads started within your web app stop executing when the application is redeployed.

JDBC driver included in WAR

Your JDBC driver will be registered in java.sql.DriverManager, which means that if you include your JDBC driver inside your web application, there will be a reference to your webapp’s classloader from system classes (see part II). The simple solution is to put the JDBC driver on server level instead, but you can also deregister the driver at application shutdown.
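A minimal sketch of the deregistration alternative, assuming it is called at application shutdown (for example from contextDestroyed() of a ServletContextListener) with the web application's classloader. The class and method names are hypothetical and not part of any library:

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper; the names are made up for this illustration. */
final class JdbcDriverCleanup {

  /**
   * Deregisters every JDBC driver whose class was loaded by the given
   * classloader and returns the class names of the drivers removed.
   */
  static List<String> deregisterDriversLoadedBy(ClassLoader webAppClassLoader) {
    List<String> removed = new ArrayList<>();
    // Copy the enumeration first, so the registry is not modified while iterating
    for (Driver driver : Collections.list(DriverManager.getDrivers())) {
      if (driver.getClass().getClassLoader() != webAppClassLoader) {
        continue; // Belongs to the server or another application; leave it alone
      }
      try {
        DriverManager.deregisterDriver(driver);
        removed.add(driver.getClass().getName());
      } catch (SQLException e) {
        // Best effort: report and carry on with the remaining drivers
        System.err.println("Could not deregister " + driver + ": " + e);
      }
    }
    return removed;
  }
}
```

Comparing classloaders before deregistering matters when the driver lives on server level: in that case it is shared with other applications and should stay registered.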

Logging frameworks

Logging frameworks such as Apache Commons Logging (ACL) – formerly Jakarta Commons Logging (JCL) – log4j and java.util.logging (JUL) will cause classloader leaks under some circumstances.

Apache Commons Logging will cause trouble if the logging framework is supplied outside of the web application, such as within the Application Server. In such a case, you need to add a bit of cleanup code to the ServletContextListener we’ve talked about:

org.apache.commons.logging.LogFactory.release(
  Thread.currentThread().getContextClassLoader());

or

org.apache.commons.logging.LogFactory.release(this.getClass().getClassLoader());

There is an article about this on the Apache Commons Wiki. It is also mentioned in the guide and FAQ.

When it comes to log4j, you can achieve the same kind of leak with some configurations. I’m not sure if calling org.apache.log4j.LogManager.shutdown() in the cleanup ServletContextListener helps, but it’s probably a good idea anyway, at least if there is only a single web application running on the server.

With log4j2, it seems the shutdown mechanism is a bit more complicated. I hope to get back with a more proper solution, but for now see this Stack Overflow answer.

As for java.util.logging (JUL), it will always be outside the web application, since it is part of the JDK. With it you can cause classloader leaks by creating custom log Levels inside your web app. This is what Frank Kieviet uses as example in his blog post on classloader leaks. It seems also that JBoss Logging does exactly that if backed by JUL, and thereby triggers such a leak if loaded within your web app. See report here.
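To illustrate why a custom Level is a problem, the sketch below shows that the JDK itself keeps track of every Level instance that is created. AuditLevel is a hypothetical class standing in for a level defined inside a web app. How strongly the JDK holds on to such instances may differ between JDK versions, so treat this as a demonstration of the registration, not as proof of a leak on your particular JVM:

```java
import java.util.logging.Level;

final class CustomLevelDemo {

  /** Hypothetical custom level, standing in for one defined inside a web app. */
  static final class AuditLevel extends Level {
    private static final long serialVersionUID = 1L;

    AuditLevel(String name, int value) {
      // The Level constructor registers the new instance inside the JDK
      super(name, value);
    }
  }

  /** Returns true if the JDK's own lookup by name hands back this very instance. */
  static boolean isKnownToJdk(Level level) {
    return Level.parse(level.getName()) == level;
  }
}
```

If isKnownToJdk() returns true for an instance of a class loaded by your web app, then classes loaded by the system classloader can reach that instance, and through its class also your classloader.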

There is also a thorough article about the problem with logging frameworks and classloaders here. As a general recommendation, SLF4J supposedly helps prevent at least some of these problems.

Bean introspection

The Java Bean introspection has a cache with strong references, that needs to be cleared by calling

java.beans.Introspector.flushCaches();

in the cleanup ServletContextListener. If you don’t want to create your own, you can use org.springframework.web.util.IntrospectorCleanupListener from the Spring framework.

Custom property editor

If a property editor loaded within the web application (or a property editor for a class loaded in the web application) is registered by calling java.beans.PropertyEditorManager.registerEditor() it needs to be deregistered at application shutdown, or it will cause classloader leaks. Deregistering can be achieved by calling java.beans.PropertyEditorManager.registerEditor() again with the same targetType but null as the second argument (editorClass).
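One way to make sure nothing is forgotten is to route all registrations through a small bookkeeping object and undo them at shutdown. This is only a sketch; PropertyEditorRegistrations is a hypothetical class, while the PropertyEditorManager calls are the ones described above:

```java
import java.beans.PropertyEditorManager;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical bookkeeping class; not part of the JDK or any library. */
final class PropertyEditorRegistrations {

  private final Set<Class<?>> targetTypes = new LinkedHashSet<>();

  /** Registers the editor and remembers the target type for later cleanup. */
  void register(Class<?> targetType, Class<?> editorClass) {
    PropertyEditorManager.registerEditor(targetType, editorClass);
    targetTypes.add(targetType);
  }

  /** Deregisters everything registered through this object; returns the count. */
  int deregisterAll() {
    int count = targetTypes.size();
    for (Class<?> targetType : targetTypes) {
      // null as editorClass removes the registration for that target type
      PropertyEditorManager.registerEditor(targetType, null);
    }
    targetTypes.clear();
    return count;
  }
}
```

Calling deregisterAll() from the cleanup ServletContextListener would then remove the references from PropertyEditorManager to the editor classes.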

Custom default java.net.Authenticator

Custom java.net.Authenticator loaded in your web application and registered with java.net.Authenticator.setDefault() must be unregistered at application shutdown, or it will cause leaks.

Custom default java.net.ProxySelector

Custom java.net.ProxySelector loaded in your web application and registered with java.net.ProxySelector.setDefault() must be unregistered at application shutdown, or it will cause leaks.
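A sketch of the unregistering step for ProxySelector, assuming you kept a reference to whatever selector was the default before your application replaced it. ProxySelectorCleanup and its method are hypothetical names:

```java
import java.net.ProxySelector;

/** Hypothetical helper; the names are made up for this illustration. */
final class ProxySelectorCleanup {

  /**
   * If the current default ProxySelector was loaded by the given classloader,
   * replaces it with the fallback and returns true. Otherwise does nothing.
   */
  static boolean resetIfLoadedBy(ClassLoader webAppClassLoader, ProxySelector fallback) {
    ProxySelector current = ProxySelector.getDefault();
    if (current == null || current.getClass().getClassLoader() != webAppClassLoader) {
      return false; // The default is not ours, so leave it untouched
    }
    ProxySelector.setDefault(fallback);
    return true;
  }
}
```

The classloader check avoids resetting a selector that the server or another application has installed in the meantime.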

Custom java.security.Provider

Custom java.security.Provider loaded in your web application and registered with java.security.Security.addProvider() must be unregistered with java.security.Security.removeProvider() at application shutdown, or it will cause leaks. However, there is also a problem with javax.crypto.JceSecurity that holds a couple of static caches. So in case you’ve used a custom java.security.Provider for cryptographic operations, this will cause a leak even if you call java.security.Security.removeProvider().
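The removeProvider() part could look something like the sketch below, which removes every provider whose class was loaded by a given classloader. The helper is hypothetical, and as noted above it does not address the static caches in javax.crypto.JceSecurity:

```java
import java.security.Provider;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; the names are made up for this illustration. */
final class SecurityProviderCleanup {

  /** Removes providers loaded by the given classloader; returns their names. */
  static List<String> removeProvidersLoadedBy(ClassLoader webAppClassLoader) {
    List<String> removed = new ArrayList<>();
    // getProviders() returns an array copy, so removing while looping is safe
    for (Provider provider : Security.getProviders()) {
      if (provider.getClass().getClassLoader() == webAppClassLoader) {
        Security.removeProvider(provider.getName());
        removed.add(provider.getName());
      }
    }
    return removed;
  }
}
```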

Custom MBean

Custom MBeans registered in the MBeanServer (ManagementFactory.getPlatformMBeanServer().registerMBean()) need to be unregistered at application shutdown, or they will cause leaks.
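A sketch of such a cleanup, under the assumption that all MBeans of your application are registered under ObjectNames that can be matched by a single pattern, such as a common domain. MBeanCleanup is a hypothetical class; the MBeanServer calls are standard JMX:

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical helper; the names are made up for this illustration. */
final class MBeanCleanup {

  /**
   * Unregisters all MBeans in the platform MBeanServer whose name matches
   * the pattern (for example "myapp:*") and returns how many were removed.
   */
  static int unregisterMatching(String objectNamePattern) {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    int count = 0;
    try {
      for (ObjectName name : server.queryNames(new ObjectName(objectNamePattern), null)) {
        server.unregisterMBean(name);
        count++;
      }
    } catch (JMException e) {
      // Malformed pattern, or an MBean that vanished or refused to unregister
      System.err.println("MBean cleanup stopped early: " + e);
    }
    return count;
  }
}
```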

Custom MBean NotificationListener/NotificationFilter/handback

MBeans/MXBeans implementing the javax.management.NotificationBroadcaster interface – or the javax.management.NotificationEmitter sub-interface – may have javax.management.NotificationListeners added to them, in combination with a javax.management.NotificationFilter and/or a handback object. If any of these three are loaded by your web application, you would need to call removeNotificationListener() on shutdown to prevent leaks. Apache Pig has such a leak reported, but users are advised to remove Pig’s SpillableMemoryManager from the MemoryMXBean on application shutdown.

Undestroyed custom ThreadGroup

ThreadGroups are in a hierarchy with the “system” ThreadGroup at the top. Children are added to a parent upon creation, and parents will keep a reference to their child ThreadGroups until the child is destroyed by a call to destroy(). This means that if java.lang.ThreadGroup is subclassed, and that subclass is loaded inside your application and instantiated, then there will be a strong reference to your application’s classloader until that ThreadGroup is destroy()ed.

Known offenders

Among the APIs that are known to cause classloader leaks, without providing a proper cleanup/workaround, are a couple of JDK classes with the habit of keeping a reference to the contextClassLoader of the Thread that first calls them. The solution in this case is to make sure these methods are called once, with some other classloader – such as ClassLoader.getSystemClassLoader() – as contextClassLoader. Preferably we put this code in the contextInitialized() method of our ServletContextListener. See Tomcat’s JreMemoryLeakPreventionListener class for more details.
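The "call it once with another contextClassLoader" workaround can be sketched as a small utility that temporarily swaps the contextClassLoader of the current thread. ContextClassLoaderSwap is a hypothetical name; only Thread.getContextClassLoader() and Thread.setContextClassLoader() come from the JDK:

```java
/** Hypothetical helper; the names are made up for this illustration. */
final class ContextClassLoaderSwap {

  /**
   * Runs the action with the given loader as contextClassLoader of the
   * current thread, and always restores the original loader afterwards.
   */
  static void runWith(ClassLoader temporaryLoader, Runnable action) {
    Thread thread = Thread.currentThread();
    ClassLoader original = thread.getContextClassLoader();
    thread.setContextClassLoader(temporaryLoader);
    try {
      action.run();
    } finally {
      thread.setContextClassLoader(original);
    }
  }
}
```

In contextInitialized() you would pass ClassLoader.getSystemClassLoader() as the loader and, as the action, the first call to one of the offending JDK methods listed in this section.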

Apache ActiveMQ

Apache ActiveMQ registers org.apache.activemq.util.StringArrayEditor as a property editor for String[] in java.beans.PropertyEditorManager (by a static block in org.apache.activemq.util.IntrospectionSupport), but provides no means of deregistering it. Reported here.

Apache Axis

Apache Axis leaks classloaders because of an uncleared ThreadLocal, at least in version 1.4, as we saw in part IV.

Apache Batik

Some versions of Batik SVG Toolkit (at least 1.5 beta 4 up to 1.7) leave unterminated threads, as we saw in part III. It has been reported but at the time of this writing there is still no fix.

Apache Commons Pool / DBCP

Apache Commons Pool, which is used by Apache Commons DBCP, has a feature to automatically evict idle objects, which uses a background thread running in the application’s classloader. Earlier versions suffered from a bug that did not allow the thread to be stopped properly (missing synchronization or volatile, as discussed in part III). See this blog post. It seems that in recent versions you can simply call org.apache.commons.pool2.impl.GenericObjectPool.close() at application shutdown. You can also turn off the idle evict feature at any time by calling org.apache.commons.pool2.impl.GenericObjectPool.setTimeBetweenEvictionRunsMillis() with a negative value.

Apache Commons HttpClient

Apache Commons HttpClient has a MultiThreadedHttpConnectionManager on which you may need to call the static shutdownAll() method in order to stop its ReferenceQueueThread, in case it has been loaded in your web app. This leak may be triggered by the Jersey jersey-apache-client.

Apache CXF

Apache CXF may set org.apache.cxf.transport.http.CXFAuthenticator as the default java.net.Authenticator; see above. (Thanks to Arild Froeland for the report!) Reported here.

Bean Validation API / JSR 303

The Bean Validation API / JSR 303 will leak classloaders, if the API is at the application server level while the implementation, such as Hibernate Validator, is included inside your web application. More details in part II.

CGLIB / Hibernate / Spring / JBoss / Apache Geronimo

CGLIB, used by Hibernate, Spring etc, has had a bug in its proxy code, with uncleared ThreadLocals (see part IV). Bug report on Hibernate. Bug reports on Spring. Bug report on JBoss. Bug report on Apache Geronimo.

dom4j

dom4j has uncleared ThreadLocals (see part IV). Reported here. That bug report is still “open”, but other sources on the net report it to be fixed in 1.6.1.

DOM normalization/serialization

Two very similar issues can be triggered by creating a DOM document, and then either normalizing or serializing it, as in the following code:

import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;

...
Document document = DocumentBuilderFactory.newInstance()
  .newDocumentBuilder().newDocument();

document.normalizeDocument();
// or
DOMImplementationLS implementation = 
  (DOMImplementationLS)document.getImplementation();
implementation.createLSSerializer().writeToString(document);

The reason for the potential leak is that both com.sun.org.apache.xerces.internal.dom.DOMNormalizer and com.sun.org.apache.xml.internal.serialize.DOMSerializerImpl have a static field named abort, each containing a RuntimeException whose stack trace may contain a reference to the first class that invoked the normalization/serialization. Reported here.

com.sun.jndi.ldap.LdapPoolManager

The contextClassLoader of the thread loading the com.sun.jndi.ldap.LdapPoolManager class may be kept from being garbage collected, since it will start a new thread if the system property com.sun.jndi.ldap.connect.pool.timeout is set to a value greater than 0.

EclipseLink

The EclipseLink JPA implementation has had classloader leaks. According to this bug report, they have been fixed since version 1.1.

GeoTools

GeoTools is reported here to have a never-ending thread. As of version 2.6.2 you should call org.geotools.util.WeakCollectionCleaner.exit() in your cleanup.

Google Guice

Google’s IoC/DI framework Guice seems to have had several classloader leaks, at least some of which are not resolved at the time of this writing. Reports here, here and here.

Groovy

Groovy can cause classloader leaks according to this bug report, which is still “open”.

Hessian

Hessian binary web service protocol has suffered from uncleared ThreadLocals (see part IV). Bug reported here. Should be fixed since version 4.0.23 shipped with Resin.

iCal4J

iCal4J seems to have suffered from uncleared ThreadLocals (see part IV). More info here.

Infinispan

Infinispan has had a number of reports of uncleared ThreadLocals. Two examples here and here, but there are more if you look at “Similar Issues” or search Jira yourself.

IntrospectionUtils

IntrospectionUtils copied from Tomcat (org.apache.tomcat.util.IntrospectionUtils) to Apache Commons Modeler (org.apache.commons.modeler.util.IntrospectionUtils) keeps a strong reference cache. After this was reported as a bug, a static clear() method has been added. Make sure to call it on application shutdown.

Java Advanced Imaging (JAI)

Java Advanced Imaging (JAI) library can cause classloader leaks with registered shutdown hooks, as we saw in part II. Bug report here – still “open”.

java.awt.Toolkit.getDefaultToolkit()

The first call to java.awt.Toolkit.getDefaultToolkit() will spawn a new thread with the same contextClassLoader as the caller.

Javassist

I’ve seen reports on the net about PermGen errors when using older versions of Javassist, for example with Hibernate. There are lots of issues in Javassist JIRA; start here and see “Similar Issues”. Since the issues have different fixed versions, you had better use the latest version.

Java Cryptography Architecture (JCA) / MessageDigest initialization

According to Tomcat documentation, a Token poller thread with the same contextClassLoader as the caller, will be created “under certain conditions” when Java Cryptography Architecture is initialized, for example when a MessageDigest is created.

Java Server Faces 2

The JSF API, more precisely javax.faces.component.UIComponentBase, contains a cache that may cause classloader leaks in case the API is at the application server level. More information in this bug report.

javax.imageio / sun.awt.AppContext.getAppContext() / GWT

A call to sun.awt.AppContext.getAppContext() will leave a strong reference to the classloader of the caller. Note that Google Web Toolkit (GWT) will trigger this leak via its use of javax.imageio. Another way to trigger this leak is by creating a new javax.swing.JEditorPane("type", "text").

javax.management.remote.rmi.RMIConnectorServer.start() / sun.misc.GC.requestLatency(long)

sun.misc.GC.requestLatency(long), which is known to be called from javax.management.remote.rmi.RMIConnectorServer.start(), will cause the current contextClassLoader to be unavailable for garbage collection.

javax.net.ssl

If the javax.net.ssl package is used together with a keystore that contains certificates that have unparseable extensions, there will be a reference from system classes to the exception occurring while trying to parse the extension. The backtrace (stacktrace) of that exception is likely to contain references to the code that called the javax.net.ssl classes. In a servlet environment this means the first webapp instance to trigger javax.net.ssl will be prevented from being garbage collected. Reported to Oracle here. Prevented since version 2.1.0. Thanks to “CptS” for the report!

javax.security.auth.Policy

javax.security.auth.Policy.getPolicy() will keep a strong static reference to the contextClassLoader of the first calling thread.

javax.security.auth.login.Configuration

The class javax.security.auth.login.Configuration will keep a strong static reference to the contextClassLoader of the Thread from which the class is loaded.

JGroups

JGroups may cause classloader leaks due to undestroyed custom ThreadGroups. At the time of this writing this issue has been fixed, while this one has not.

LambdaJ

The no longer maintained LambdaJ seems to suffer from uncleared ThreadLocals in some cases.

Logback

Logback causes classloader leaks when using SocketAppender according to this report, in which it is not really clear whether it has been fixed.

Logback also seems to have suffered from uncleared ThreadLocals (see part IV), according to this report. However, that should be fixed since version 0.9.26.

JAXB

javax.xml.bind.DatatypeConverterImpl in the JAXB Reference Implementation shipped with JDK 1.6+ will keep a static reference (datatypeFactory) to a concrete subclass of javax.xml.datatype.DatatypeFactory, that is resolved when the class is loaded (which I believe happens if you have custom bindings that reference the static methods in javax.xml.bind.DatatypeConverter). It seems that if for example you have a version of Xerces inside your application, the factory method may resolve org.apache.xerces.jaxp.datatype.DatatypeFactoryImpl as the implementation to use (rather than com.sun.org.apache.xerces.internal.jaxp.datatype.DatatypeFactoryImpl shipped with the JDK), which means there will be a reference from javax.xml.bind.DatatypeConverterImpl to your classloader.

Mojarra

If the Mojarra JSF implementation is provided by the application server, and you have JSF components included in your .war file, Mojarra may cause classloader leaks by keeping references to the components when the application is redeployed. See bug report here.

Mozilla Rhino

The Rhino JavaScript interpreter has suffered problems related to ThreadLocals. You can read about the details in this blog post. The bug report states that it’s fixed (supposedly in 2005), but not from which version.

MVEL

The MVEL expression language engine has problems with uncleared ThreadLocals. The problem was addressed but version 2.0.19 still leaks, due to a static block adding an MVEL class as a ThreadLocal value. Current bug report here.

OpenOffice Java Uno RunTime (JURT)

The JURT (Java Uno RunTime) library in the OpenOffice UDK contains a peculiar leak. In an effort to avoid OutOfMemoryErrors(!) caused by long-running finalize methods, they created the com.sun.star.lib.util.AsynchronousFinalizer class, which holds a queue of jobs to be executed by a Thread. The problem is that this thread is never allowed to finish, as we talked about in part III. What makes this particularly tricky is that we don’t have any handle to the thread, and not least the fact that jobs are put on the queue, and the thread is started if not yet running, from the finalize() method of other objects. That is, the thread may be started only when other objects are garbage collected, which may – at least in theory – not be until after your web application has already been unloaded! This problem has been reported here.

Oracle JDBC

The Oracle JDBC driver, at least in some versions (such as ojdbc6), has a timeout detection thread, oracle.jdbc.driver.OracleTimeoutPollingThread, that according to reports has as its context classloader the classloader of the web application from which the first JDBC connection is requested, even if the driver itself resides on the server level. It seems this can be prevented by loading the class oracle.jdbc.driver.OracleTimeoutThreadPerVM using the system classloader before any JDBC connections are opened, which can be achieved in contextInitialized() of our ServletContextListener. (Thanks to Hal Deadman for the report!)

Postgresql JDBC

Postgresql JDBC driver will under some circumstances start timer threads with names prefixed PostgreSQL-JDBC-SharedTimer-. These threads may prevent your classloader from being garbage collected, presumably due to the inheritance of the context classloader and access control context. This has been reported here, but at the time of writing it is unclear if this is actually fixed or not. (Thanks to Hal Deadman for the report!)

Serialization

Up to Java 1.4.2, serialization of classes loaded by your web app’s classloader would most likely cause a reference to your classloader to be kept by internal JDK caches. More info in this blog post. Bug report here.

Spring framework

Spring’s use of InheritableThreadLocal has caused leaks. Fixed since 1.2.9 / 2.0.1.

sun.java2d.Disposer

Loading the class sun.java2d.Disposer will spawn a new thread with the same contextClassLoader. More info. Fixed in Java 7.

sun.net.www.http.KeepAliveCache

In order to close reusable HTTP 1.1 connections after a timeout, sun.net.www.http.KeepAliveCache starts a thread which will keep a strong reference to the classloader from which the HTTP connection was initiated (through a ProtectionDomain). See more info in this blog post. According to Magnus Alvestad’s article, the thread will terminate eventually, which may cause intermittent classloader leaks. If you have control over the server end you could add “Connection: close” to the HTTP responses. On the client end you can disable keep alive using -Dhttp.keepAlive=false.

Unified Expression Language

Up until version 2.2, the javax.el.BeanELResolver class of the Java Unified Expression Language / javax.el API kept a static strong reference cache with bean classes that had been introspected. This was reported as a leak and has since been fixed in version 2.2.4+ by using soft references in the cache. For currently released versions, the authors have been kind enough to provide a purgeBeanClasses() method to clear the cache, however this method is both private and non-static… (The JavaDoc of the method is a bit of a funny read, as it suggests the authors realized the problem, but failed to provide a proper solution.)

URLConnection

The caching mechanism of JarURLConnection can prevent JAR files from being reloaded. See this bug report. It is not entirely clear whether this will actually leak classloaders.

XML parsing

The classloader of the first thread to call DocumentBuilderFactory.newInstance().newDocumentBuilder() seems to be unavailable for garbage collection. It is believed this is caused by some JVM internal bug.

Your Application Server

Yes, unfortunately the Application Server itself can be the cause of your classloader leaks. Tomcat (reports here, here, here), Jetty (here) and Resin (reports here, here, here, and more) have been known to suffer from such bugs, but there may be others as well. (There is a report on GlassFish here, but it’s not clear whether the error was actually within GlassFish.)

Pick the right Garbage Collector

It seems that for some reason, the default Garbage Collector of the Sun/Oracle JVM is not your best bet when it comes to freeing up permanent generation space. Assuming you have a multi core/multi CPU server (and who doesn’t these days?) I suggest you consider the Concurrent Mark and Sweep (CMS) Garbage Collector instead (but make sure you read up on the implications first). The CMS Garbage Collector however needs to be explicitly told to unload classes from PermGen space. Use these JVM options:

-XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled

Depending on your JVM version, you may also need -XX:+CMSPermGenSweepingEnabled. Try them out and keep an eye on the console/log file during startup to see if the JVM accepts them.

For further reading on different Garbage Collectors, see here.

Classloader leaks IV – ThreadLocal dangers and why ThreadGlobal may have been a more appropriate name

If you just want a quick fix to the problem without understanding the theory, jump to part VI introducing the ClassLoader Leak Prevention library.

In the series on classloader leaks, it’s time to talk about ThreadLocals. As you know, java.lang.ThreadLocal provides a means to achieve thread safety in a multi-threaded environment, such as a web application. ThreadLocals can also be used to allow a thread access to some data in all layers of an application.

However, unless used correctly, ThreadLocals are also a common cause of classloader leaks in a web application environment. This is because Application Servers use thread pools, which means that a Thread may very well outlive your web application instance. So in case a ThreadLocal has a reference to something inside your classloader, there is a great risk that it will prevent your classloader from being garbage collected and you end up with java.lang.OutOfMemoryError: PermGen space / Metaspace. So in a web application environment, ThreadLocals should rather be considered to be ThreadGlobals, since it’s best to assume they will remain as long as the Application Server is running unless explicitly remove()d.

Avoid leaks caused by ThreadLocals

The most straightforward way to avoid this problem is to enclose all your ThreadLocal usage in a try/finally block and make sure you remove the value before the Thread is returned to the thread pool.

try {
  threadLocal.set(value);
  ...
}
finally {
  threadLocal.remove();
}
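To see why the finally block matters, here is a small self-contained sketch that uses a single-threaded ExecutorService as a stand-in for the application server’s thread pool. The class and field names (ThreadLocalPoolDemo, CURRENT_USER) are made up for the example. Two tasks run after each other on the same pooled thread; without remove(), the second task sees the value left behind by the first.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; the pool stands in for the app server's thread pool
class ThreadLocalPoolDemo {

  static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<String>();

  /** Runs two tasks on one pooled thread and returns what the second task sees. */
  static String valueSeenByNextTask(final boolean cleanUp) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      // First "request": sets the value and optionally removes it again
      Runnable firstRequest = () -> {
        try {
          CURRENT_USER.set("alice");
          // ... handle the request ...
        } finally {
          if (cleanUp) {
            CURRENT_USER.remove();
          }
        }
      };
      pool.submit(firstRequest).get();

      // Second "request", served by the same pooled thread
      Callable<String> secondRequest = () -> CURRENT_USER.get();
      return pool.submit(secondRequest).get();
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("Without remove(): " + valueSeenByNextTask(false));
    System.out.println("With remove():    " + valueSeenByNextTask(true));
  }
}
```

In a real application server the value would typically be an instance of one of your own classes, and the stale entry is what keeps your classloader reachable from the pooled thread.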

When and why it leaks

You may think that java.lang.ThreadLocal is implemented with a WeakHashMap with Thread as key. If that were the case, ThreadLocals would probably have been less likely to cause classloader leaks, since all references would be cleared as the ThreadLocal instance was garbage collected.

But instead, ThreadLocal uses the Thread as storage. To be exact, java.lang.Thread has a threadLocals attribute of type java.lang.ThreadLocal.ThreadLocalMap. That is a Map with a WeakReference to the ThreadLocal instance as key and the thread’s value as value.

Some sources claim that using custom ThreadLocal subclasses loaded by your webapp classloader will cause leaks; however, this is not the case (concluded with both theory and test case – the WeakReference of the key does its job).

However, ThreadLocal values loaded by our classloader, including values with strong references to such classes (for example a java.util.List of our own classes), are likely to cause leaks. This is due to two facts. The first and obvious one is that the references from the Thread to the value are strong references. But then you might ask, isn’t the strong value reference removed when the WeakReference key is garbage collected? Well, like with WeakHashMap, the values are not immediately removed. WeakHashMap however uses a ReferenceQueue to keep informed about what keys have been removed, and internally calls the private expungeStaleEntries() when servicing any public method call. ThreadLocalMap however does not use a ReferenceQueue and thus removes the unused values much more rarely and unpredictably, as stated by its JavaDoc:

“However, since reference queues are not used, stale entries are guaranteed to be removed only when the table starts running out of space”

Third party example analyzed in Eclipse Memory Analyzer

Here is what an uncleared ThreadLocal can look like in a MAT analysis:
ThreadLocal analysis
This however does not show what causes the problem. To find that, make note of the entry index in brackets (38 in our case), right click the ThreadLocal and select “List objects”, “with outgoing references”.
ThreadLocal analysis 2
Find the correct entry using the index from the previous list.
ThreadLocal analysis 3
At this stage, “referent” is the ThreadLocal instance and “value” is the value for the Thread. In this case we are lucky, in that a custom ThreadLocal subclass is used, so we can easily see what is causing the problem. In case no subclass is used, we could have right clicked the “referent” and selected “List objects”, “with incoming references” to see what class holds the ThreadLocal.
ThreadLocal analysis 4

In this case, the problem is within Apache Axis, version 1.4. Looking at the source code of org.apache.axis.utils.XMLUtils we can confirm that it uses a custom ThreadLocal, kept in the static documentBuilder attribute.

    private static class ThreadLocalDocumentBuilder extends ThreadLocal {
        protected Object initialValue() {
            try {
                return getDOMFactory().newDocumentBuilder();
            } catch (ParserConfigurationException e) {
                log.error(Messages.getMessage("parserConfigurationException00"),
                        e);
            }
            return null; 
        }
    }     
    private static ThreadLocalDocumentBuilder documentBuilder = new ThreadLocalDocumentBuilder(); 

Which fails to be cleared:

    public static DocumentBuilder getDocumentBuilder() throws ParserConfigurationException {
        return (DocumentBuilder) documentBuilder.get();
    }

(Note that get() will set the value to what is returned by the overridden initialValue().)
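The behaviour of get() and initialValue() can be verified with a few lines of code. This is a sketch with made-up names (InitialValueDemo, BUILDER); a plain Object stands in for the DocumentBuilder. It shows that the first get() stores the result of initialValue() in the current thread, and that only remove() gets rid of it again.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; a plain Object stands in for the DocumentBuilder
class InitialValueDemo {

  static final AtomicInteger CREATED = new AtomicInteger();

  static final ThreadLocal<Object> BUILDER = new ThreadLocal<Object>() {
    @Override
    protected Object initialValue() {
      CREATED.incrementAndGet();
      return new Object();
    }
  };

  /** Returns {same instance on second get (1/0), creations after two gets, creations after remove + get}. */
  static int[] run() {
    BUILDER.remove(); // start from a clean state for the current thread
    int before = CREATED.get();

    Object first = BUILDER.get();  // calls initialValue() and stores the result in the thread
    Object second = BUILDER.get(); // returns the stored instance, no new call
    int afterTwoGets = CREATED.get() - before;

    BUILDER.remove();              // drops the value from the current thread
    BUILDER.get();                 // calls initialValue() again
    int afterRemoveAndGet = CREATED.get() - before;

    BUILDER.remove();              // clean up after ourselves
    return new int[] { first == second ? 1 : 0, afterTwoGets, afterRemoveAndGet };
  }

  public static void main(String[] args) {
    int[] result = run();
    System.out.println("Same instance on second get(): " + (result[0] == 1));
    System.out.println("initialValue() calls after two get()s: " + result[1]);
    System.out.println("initialValue() calls after remove() and get(): " + result[2]);
  }
}
```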

Preventing ThreadLocal leaks

Update: The ClassLoader Leak Prevention library in part VI includes a much more sophisticated preventative measure against ThreadLocal leaks.

Trying to fix ThreadLocal leaks out of your control during application shutdown is risky due to concurrency issues. Instead you could take care of the problem while the thread is still under your control, before it is returned to the thread pool. We can achieve this using a Servlet Filter. In the case above we have a static reference to the offending ThreadLocal. We could therefore create a filter like the following one. (Note that error handling has been kept to a minimum, to make the principle clearer.)

import java.io.IOException;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class ThreadLocalLeakPreventionFilter implements javax.servlet.Filter {

  private ThreadLocal<?>[] offendingThreadLocals;

  public void init(FilterConfig filterConfig) throws ServletException {
    List<ThreadLocal<?>> threadLocals = new ArrayList<ThreadLocal<?>>();

    try {
      // Look up the static ThreadLocal field via reflection
      Class<?> clazz = Class.forName("org.apache.axis.utils.XMLUtils");
      final Field threadLocalField = 
          clazz.getDeclaredField("documentBuilder");
      threadLocalField.setAccessible(true);
      Object threadLocal = threadLocalField.get(null);
      if(threadLocal instanceof ThreadLocal) {
        threadLocals.add((ThreadLocal<?>) threadLocal);
      }
    }
    catch (Exception e) {
      // Class or field missing, or not accessible; nothing to clean up
      filterConfig.getServletContext().log("Unable to look up ThreadLocal", e);
    }
    
    // TODO: Look up more offenders here
    
    this.offendingThreadLocals = 
        threadLocals.toArray(new ThreadLocal<?>[threadLocals.size()]);
  }

  /** 
   * In the doFilter() method we have a chance to clean up the thread
   * before it is returned to the thread pool 
   */
  public void doFilter(ServletRequest servletRequest, 
                       ServletResponse servletResponse, 
                       FilterChain filterChain) 
      throws IOException, ServletException {
    
    try {
      filterChain.doFilter(servletRequest, servletResponse);
    }
    finally {
      // Clean up ThreadLocals
      for(ThreadLocal<?> offendingThreadLocal : offendingThreadLocals) {
        offendingThreadLocal.remove(); // Remove offender from current thread
      }
    }
  }

  public void destroy() {
    offendingThreadLocals = null; // Make available for Garbage Collector
  }
}

In case there was no static reference to the ThreadLocal to grab hold of, we could have looped over the entries of the ThreadLocalMap via reflection in the doFilter() method, and looked for entries where either key or value is of a type loaded by the web app classloader. I might get back and show you exactly what that would look like.
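A rough sketch of such a reflective scan could look like the following. This is not taken from the ClassLoader Leak Prevention library; the class and method names (ThreadLocalScanner, findOffenders) are made up, and the code relies on JDK internals that may change: the threadLocals field in java.lang.Thread, the table field in ThreadLocalMap and the value field in its entries. On recent JVMs this kind of reflection is refused unless the JVM is started with --add-opens java.base/java.lang=ALL-UNNAMED, in which case the sketch simply returns an empty list.

```java
import java.lang.ref.Reference;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; depends on JDK internal field names (an assumption)
class ThreadLocalScanner {

  /**
   * Finds ThreadLocals of the current thread where the key or the value
   * is of a class loaded by the given ClassLoader. Returns an empty list
   * if the JVM internals are not accessible.
   */
  static List<ThreadLocal<?>> findOffenders(ClassLoader loader) {
    List<ThreadLocal<?>> offenders = new ArrayList<ThreadLocal<?>>();
    try {
      Field threadLocalsField = Thread.class.getDeclaredField("threadLocals");
      threadLocalsField.setAccessible(true);
      Object map = threadLocalsField.get(Thread.currentThread());
      if (map == null) {
        return offenders; // this thread has no ThreadLocals at all
      }

      Field tableField = map.getClass().getDeclaredField("table");
      tableField.setAccessible(true);
      Object[] table = (Object[]) tableField.get(map);
      if (table == null) {
        return offenders;
      }

      Field valueField = null;
      for (Object entry : table) {
        if (entry == null) {
          continue; // unused slot
        }
        if (valueField == null) {
          valueField = entry.getClass().getDeclaredField("value");
          valueField.setAccessible(true);
        }
        Object key = ((Reference<?>) entry).get(); // the ThreadLocal, unless already collected
        Object value = valueField.get(entry);
        if (key instanceof ThreadLocal
            && (isLoadedBy(key, loader) || isLoadedBy(value, loader))) {
          offenders.add((ThreadLocal<?>) key);
        }
      }
    } catch (Exception e) {
      // Internals renamed or not accessible; report nothing rather than fail
    }
    return offenders;
  }

  private static boolean isLoadedBy(Object o, ClassLoader loader) {
    return o != null && o.getClass().getClassLoader() == loader;
  }

  public static void main(String[] args) {
    ThreadLocal<ThreadLocalScanner> local = new ThreadLocal<ThreadLocalScanner>();
    local.set(new ThreadLocalScanner());
    System.out.println(findOffenders(ThreadLocalScanner.class.getClassLoader()));
    local.remove();
  }
}
```

In the finally block of doFilter() you could then call remove() on each returned ThreadLocal. Note that the sketch only compares the direct class of key and value; values such as a java.util.List holding our own classes, and stale entries whose key has already been garbage collected, would need extra handling.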



Classloader leaks III – “Die Thread, die!”

If you just want a quick fix to the problem without understanding the theory, jump to part VI introducing the ClassLoader Leak Prevention library.

In my previous post we looked at different categories of ClassLoader leaks, and looked at a particular example of a reference from outside the web application ClassLoader (a JVM shutdown hook pointing to a JAI class).

In this post we will look at another category: unterminated Threads running in your ClassLoader. This is a problem you can easily create yourself, but it may also come from third party libraries.

MAT analysis with running thread

When doing the “load all classes from third party JARs” test mentioned in my former post, and analyzing it with the technique outlined in my first post, I also ended up with this finding:

Batik analysis

As you can see, it is a thread still running inside my ClassLoader. We can also see that the thread seems to be part of the Batik library. I was using version 1.5 beta 4, so let’s dig into the sources.

org.apache.batik.util.SoftReferenceCache (from line 181):

    private static Thread cleanup;

    static {
        cleanup = new Thread() {
                public void run() {
                    while(true) {
...
                    }
                }
            };
        cleanup.setDaemon(true);
        cleanup.start();
    }

org.apache.batik.ext.awt.image.rendered.TileMap (from line 139):

    static Thread cleanup;

    static {
        cleanup = new Thread() {
                public void run() {
                    while(true) {
...
                    }
                }
            };
        cleanup.setDaemon(true);
        cleanup.start();
    }

So, what do we have here? Not one but two static blocks (executing as the class is loaded) starting threads that execute in a while(true) loop. Once such a Thread is started, there is no garbage collecting its ClassLoader – neither the ClassLoader having loaded the Thread class (if a custom subclass of java.lang.Thread), nor the Thread’s contextClassLoader. In theory, the contextClassLoader of the thread can be changed (although I believe that rarely makes sense), but to garbage collect the ClassLoader of a custom Thread subclass, the thread must stop executing.
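If you want to check from code which threads are tied to a given ClassLoader in this way, the following sketch may help. It only uses public APIs (Thread.getAllStackTraces(), getContextClassLoader()); the class and method names (ThreadFinder, threadsTiedTo) are made up for the example.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library
class ThreadFinder {

  /**
   * Finds live threads (other than the calling one) whose class or
   * context ClassLoader is the given ClassLoader.
   */
  static List<Thread> threadsTiedTo(ClassLoader loader) {
    List<Thread> result = new ArrayList<Thread>();
    for (Thread thread : Thread.getAllStackTraces().keySet()) {
      if (thread == Thread.currentThread()) {
        continue; // the caller is usually a container thread doing the cleanup
      }
      boolean customThreadClass = thread.getClass().getClassLoader() == loader;
      boolean contextLoader = thread.getContextClassLoader() == loader;
      if (customThreadClass || contextLoader) {
        result.add(thread);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // An empty URLClassLoader stands in for the web application ClassLoader
    ClassLoader webappLoader = new URLClassLoader(new URL[0]);
    Thread worker = new Thread(() -> {
      try {
        Thread.sleep(60 * 1000);
      } catch (InterruptedException e) {
        // asked to stop
      }
    }, "demo-worker");
    worker.setDaemon(true);
    worker.setContextClassLoader(webappLoader);
    worker.start();

    System.out.println("Threads tied to loader: " + threadsTiedTo(webappLoader));
    worker.interrupt();
  }
}
```

In a cleanup listener you would pass in your own web application ClassLoader, for example this.getClass().getClassLoader(), and then decide per thread how to stop it.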

In newer versions of Batik, the two pieces of code above have been merged together into a new class – org.apache.batik.util.CleanerThread. That’s good. What’s not good is that there is at the time of this writing still a while(true) loop… This problem has been reported, and a patch has been proposed.

Stopping the thread – gangsta style

Fortunately, a reference to the thread is held in both SoftReferenceCache and TileMap (as can be seen above). In the new CleanerThread, there is also a static reference:

public class CleanerThread extends Thread {

    static volatile ReferenceQueue queue = null;
    static CleanerThread  thread = null;

That enables us to get hold of the Thread instance using reflection (same as with the shutdown hook in the former post) and call stop() on the Thread. Note that stop() is deprecated, since it may lead to an inconsistent state. (You can read more about that in the Thread.stop() JavaDoc and the document that is linked from there.)

In our case however, leaking ClassLoaders and the eventual java.lang.OutOfMemoryError: PermGen space is a bigger problem than any inconsistent state that – if it occurs – presumably affects the abandoned instance of our web application. The best thing we can do in a generic case, is give the thread a chance to finish execution first. So in the cleanup Servlet/context listener we looked at last time, we will add this method, and call it once for every thread that needs to be stopped.

public static void forceThreadStop(Thread thread) {
  thread.interrupt(); // Make Thread stop waiting in sleep(), wait() or join()

  try {
    thread.join(2000); // Give the Thread 2 seconds to finish executing
  } catch (InterruptedException e) {
    // We were interrupted while waiting; restore the flag and carry on
    Thread.currentThread().interrupt();
  }

  // If still not done, kill it
  if (thread.isAlive()) {
    thread.stop(); // Deprecated, see the note above
  }
}

Stopping threads gracefully

In case you spawn threads from your own code, you should make sure that there is either a definitive ending point for them or, in case they need to be executed over and over again like a watchdog thread as in the case with Batik, that there is a way to gracefully tell the Thread to stop executing.

So, instead of the while(true), you should have a boolean flag that can be altered in order to tell the thread it’s time to die.

public class MyThread extends Thread {

  private boolean running = true;

  public void run() {
    while(running) {
      // Do something
    }
  }

  public void shutdown() {
    running = false;
  }
}

It is very important to note however, that the above code is likely to still leak ClassLoaders. This is because the JVM may cache the value of fields per thread, which Heinz Kabutz explains in somewhat more detail in The Java Specialists’ Newsletter edition titled “The Law of the Blind Spot”.

As Heinz shows, the easiest solution is probably to add the volatile keyword.

public class MyThread extends Thread {

  private volatile boolean running = true;

  public void run() {
    while(running) {
      // Do something
    }
  }

  public void shutdown() {
    running = false;
  }
}

I encourage you to read Heinz’s entire article.
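One thing the flag alone does not handle is a thread that is currently blocked in sleep() or wait(): it will not look at the flag until it wakes up. A common way around that, sketched below with made-up names (StoppableWorker, shutdown(long)), is to combine the volatile flag with interrupt() and then wait for the thread to finish, much like forceThreadStop() does but without having to resort to stop().

```java
// Hypothetical example class; Thread.sleep() stands in for real periodic work
class StoppableWorker extends Thread {

  private volatile boolean running = true;

  @Override
  public void run() {
    while (running) {
      try {
        Thread.sleep(1000); // do something, then wait for the next round
      } catch (InterruptedException e) {
        // Woken up early, probably by shutdown(); the loop condition decides
      }
    }
  }

  /**
   * Tells the thread to stop and waits up to timeoutMillis for it to end.
   * Returns true if the thread has terminated.
   */
  boolean shutdown(long timeoutMillis) {
    running = false; // visible to the worker thanks to volatile
    interrupt();     // wake the worker up if it is sleeping
    try {
      join(timeoutMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
    }
    return !isAlive();
  }

  public static void main(String[] args) {
    StoppableWorker worker = new StoppableWorker();
    worker.start();
    System.out.println("Stopped gracefully: " + worker.shutdown(5000));
  }
}
```

Calling shutdown() from contextDestroyed() gives the thread a fair chance to end before the web application ClassLoader is abandoned.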

That’s all for this time. Until next post, good luck killing those threads!


Classloader leaks II – Find and work around unwanted references

If you just want a quick fix to the problem without understanding the theory, jump to part VI introducing the ClassLoader Leak Prevention library.

In my previous post we learnt how to locate classloader leaks using Eclipse Memory Analyzer (MAT).

This time we will discuss different reasons for leaks, look at an example of a leak in a third party library, and see how we can fix that leak by a workaround.

Different reasons for ClassLoader leaks

In order to know what you should be looking for in your heapdump analysis, we could categorize ClassLoader leaks into three different types. In the end, they are all just variants of the first one.

  1. References from outside your webapp – that is from the application server or the JDK classes – to either the ClassLoader itself or one of the classes it has loaded (which in turn has a reference to the ClassLoader), including any instances of such classes.
     
  2. Threads running inside your webapp. If you spawn new threads from within your web application that may not terminate, they are likely to prevent your ClassLoader from being garbage collected. This can happen even if the thread does not use any of the classes loaded by your webapp’s ClassLoader. This is because threads have a context classloader, to which there is a reference (contextClassLoader) in the java.lang.Thread class. More about this in the next post.
     
  3. ThreadLocals with values whose class is loaded in your webapp. If you use ThreadLocals in your webapp, you need to explicitly clear all ThreadLocals before the webapp closes down. This is because a) the application server uses a thread pool, which means that the thread will outlive your webapp instance and b) ThreadLocal values are actually stored in the java.lang.Thread object. Therefore, this is just a variation of 1.
    (Note: This may be the case most likely created by yourself, but also exists in third party libraries.)

Example of reference from outside your application

When trying to hunt down a ClassLoader leak in our web application, I created a little JSP page in which I looped through all the third party JARs of our application. I tried to load every single class that was found in a custom ClassLoader, added a ZombieMarker to the ClassLoader (see previous post) and then disposed the ClassLoader. I ran the JSP page over and over again until I got a java.lang.OutOfMemoryError: PermGen space. That is, I was able to trigger ClassLoader leaks just by loading classes from our third party libraries… 🙁 It actually turned out that more than one of them triggered this behaviour.

Here is a MAT trace for one of them:

(In this picture, it’s not obvious where our ClassLoader is. The custom ClassLoader was an anonymous inner class in my JSP, so it’s the second entry with the strange class name ending with $1.)

At first glance, it may seem like this is type 2 above, with a running thread. This is not the case however, since the thread itself is not the GC root (not at the bottom level). In fact, there is a Thread involved, but it is not running.

Rather we can see that what keeps our ClassLoader from being garbage collected is a reference from outside the webapp (java.lang.*) to an instance of com.sun.media.jai.codec.TempFileCleanupThread, which in turn is loaded by our ClassLoader. From the names of the referenced and referencing (java.lang.ApplicationShutdownHooks) classes, I suspected that a JVM shutdown hook was added by some Java Advanced Imaging (JAI) class when it was loaded.

The com.sun.media.jai.codec.TempFileCleanupThread class is in the Codec part of JAI; version 1.1.2_01 in our case. The sources can be found in the official SVN repo (1.1.2_01 tag). As you can see, a TempFileCleanupThread.java file is not in that list. That is because someone thought it was a great idea to put it as a package-private class in FileCacheSeekableStream.java.

There we can also find the source of the leak.

    // Create the cleanup thread. Use reflection to preserve compile-time
    // compatibility with JDK 1.2.
    static {
        try {
            Method shutdownMethod =
                Runtime.class.getDeclaredMethod("addShutdownHook",
                                                new Class[] {Thread.class});

            cleanupThread = new TempFileCleanupThread();

            shutdownMethod.invoke(Runtime.getRuntime(),
                                  new Object[] {cleanupThread});
        } catch(Exception e) {
            // Reset the Thread to null if Method.invoke failed.
            cleanupThread = null;
        }
    }

As suspected, there is a static block that (via reflection) adds a JVM shutdown hook, as soon as the com.sun.media.jai.codec.FileCacheSeekableStream class is loaded. Not very practical in a web application environment, since the JVM will not shut down until the application server is shut down.

The JAI TempFileCleanupThread is supposed to delete temporary files when the JVM shuts down. In a web application, what we want is probably to remove those temporary files as soon as the web application is redeployed. If this was our own code, we should have changed this. In this case it’s a third party library, and judging from the SVN trunk, this still has not been fixed, so upgrading doesn’t help. (This has been reported here.)

Cleaning up leaking references at redeploy

In order to clean up references as part of web application shutdown, to prevent ClassLoader leaks, there are two approaches. You can either put the code in the destroy() method of a Servlet that is load-on-startup

  <servlet servlet-name='cleanup' servlet-class='my.CleanupServlet'>
    <load-on-startup>1</load-on-startup>
  </servlet>

or (probably slightly more correct) you can create a javax.servlet.ServletContextListener and add the cleanup to the contextDestroyed() method.

  <listener>
    <listener-class>my.CleanupListener</listener-class>
  </listener>

The workaround

Fortunately, FileCacheSeekableStream keeps a reference to the shutdown hook in our case.

public final class FileCacheSeekableStream extends SeekableStream {

    /** A thread to clean up all temporary files on VM exit (VM 1.3+) */
    private static TempFileCleanupThread cleanupThread = null;

So let’s grab that reference and remove the shutdown hook. But we probably don’t just want to throw away the hook, since in theory that may leave us with temporary files that should have been deleted at JVM shutdown. Instead get the hook, remove it, and then run it immediately.

We may actually turn this into a generic method, to be reused for other third party shutdown hooks we want to remove. (System.out is used for logging, since logging frameworks usually need to be cleaned up too, and I suggest you do that before calling this method.)
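The remove-then-run idea can be tried out in isolation, without JAI or reflection, using only the public java.lang.Runtime API. The following is a minimal sketch with made-up names (ShutdownHookDemo, removeAndRunNow); the hook here just sets a flag where a real one would delete temporary files.

```java
// Hypothetical demo class; the hook stands in for a third party cleanup thread
class ShutdownHookDemo {

  /** Registers a hook, then removes and runs it at once. Returns {removed, ran}. */
  static boolean[] removeAndRunNow() {
    final boolean[] ran = { false };
    Thread hook = new Thread(() -> ran[0] = true, "demo-cleanup-hook");
    Runtime.getRuntime().addShutdownHook(hook);

    // removeShutdownHook() returns true if the hook was registered
    boolean removed = Runtime.getRuntime().removeShutdownHook(hook);

    // A registered hook is an unstarted Thread, so we can run it ourselves
    hook.start();
    try {
      hook.join(60 * 1000); // wait up to 1 minute for the cleanup to finish
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return new boolean[] { removed, ran[0] };
  }

  public static void main(String[] args) {
    boolean[] result = removeAndRunNow();
    System.out.println("Removed: " + result[0] + ", ran: " + result[1]);
  }
}
```

Once the hook has been removed, the JVM no longer holds a reference to the Thread instance, which is what allows its ClassLoader to be garbage collected.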

private static void removeShutdownHook(Class clazz, String field) {
  // Note that loading the class may add the hook if not yet present... 
  try {
    // Get the hook
    final Field cleanupThreadField = clazz.getDeclaredField(field);
    cleanupThreadField.setAccessible(true);
    Thread cleanupThread = (Thread) cleanupThreadField.get(null);

    if(cleanupThread != null) {
      // Remove hook to avoid PermGen leak
      System.out.println("  Removing " + cleanupThreadField + " shutdown hook");
      Runtime.getRuntime().removeShutdownHook(cleanupThread);
      
      // Run cleanup immediately
      System.out.println("  Running " + cleanupThreadField + " shutdown hook");
      cleanupThread.start();
      cleanupThread.join(60 * 1000); // Wait up to 1 minute for thread to run
      if(cleanupThread.isAlive())
        System.out.println("STILL RUNNING!!!");
      else
        System.out.println("Done");
    }
    else
      System.out.println("  No " + cleanupThreadField + " shutdown hook");
    
  }
  catch (NoSuchFieldException ex) {
    System.err.println("*** " + clazz.getName() + '.' + field + 
      " not found; has JAR been updated??? ***");
    ex.printStackTrace();
  }
  catch(Exception ex) {
    System.err.println("Unable to unregister " + clazz.getName() + '.' + field);
    ex.printStackTrace();
  }    
}

Now we just call that method in our application shutdown (CleanupServlet.destroy() / CleanupListener.contextDestroyed()) like so:

removeShutdownHook(com.sun.media.jai.codec.FileCacheSeekableStream.class,
  "cleanupThread");

In a worst case scenario, if there is no reference kept to the shutdown hook, we may use reflection into the JVM classes. It would look like this:

final Field field = 
  Class.forName("java.lang.ApplicationShutdownHooks").getDeclaredField("hooks");
field.setAccessible(true);
Map<Thread, Thread> shutdownHooks = (Map<Thread, Thread>) field.get(null);
// Iterate copy to avoid ConcurrentModificationException
for(Thread t : new ArrayList<Thread>(shutdownHooks.keySet())) {
  if(t.getClass().getName().equals("class.name.of.ShutdownHook")) { // TODO: Set name
    // Make sure it's from this web app instance
    if(t.getClass().getClassLoader().equals(this.getClass().getClassLoader())) {
      Runtime.getRuntime().removeShutdownHook(t); // Remove hook to avoid PermGen leak
      t.start(); // Run cleanup immediately
      t.join(60 * 1000); // Wait up to 1 minute for thread to run
    }
  }
}

That’s all for this post. Next time we’ll look at threads running within your ClassLoader.

Update – Bean Validation API begs “FIXME”

I can’t help but post an additional example, that I found just the other day. Had some PermGen errors in a new webapp and this is what I found:

Looking at Validation.java and the inner class javax.validation.Validation.DefaultValidationProviderResolver it does, at least in the current revision, contain these lines of code:

		//cache per classloader for an appropriate discovery
		//keep them in a weak hashmap to avoid memory leaks and allow proper hot redeployment
		//TODO use a WeakConcurrentHashMap
		//FIXME The List<VP> does keep a strong reference to the key ClassLoader, use the same model as JPA CachingPersistenceProviderResolver
		private static final Map<ClassLoader, List<ValidationProvider<?>>> providersPerClassloader =
				new WeakHashMap<ClassLoader, List<ValidationProvider<?>>>();

Isn’t that nice? In the Bean Validation API (JSR 303) – not an implementation but the API – there is a cache that has been created with hot redeployment in mind, and still it has the potential to leak classloaders. Not only that – the authors of the code have been aware that it can leak classloaders, and still validation-api-1.0.0.GA.jar was released, without any means of manually telling the cache to release our ClassLoader. Sigh…

The leak is triggered when the API is shipped with your application server, but the implementation (Hibernate Validator in my case) is provided in your web application, and thus loaded with your classloader.

Using reflection like above, we can stop the leak by getting hold of the Map and calling remove() with our classloader. Alternatively, we could add the JAR of our Validation provider on the Application Server level, so that the cache will not reference our webapp ClassLoader at all.
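Why a WeakHashMap does not help here can be shown with a few lines of plain Java. In this sketch (class name WeakHashMapLeakDemo is made up) a plain Object plays the role of the webapp ClassLoader, and the List value holds a reference back to it, just like a ValidationProvider instance references its ClassLoader through its class. Since the map holds the value strongly and the value holds the key strongly, the weak key can never be cleared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical demo class; plain Objects stand in for ClassLoader and providers
class WeakHashMapLeakDemo {

  /** Returns the cache size after all outside references to the key are dropped. */
  static int sizeWhenValueReferencesKey() {
    Map<Object, List<Object>> cache = new WeakHashMap<Object, List<Object>>();

    Object key = new Object();                    // stands in for the webapp ClassLoader
    List<Object> value = new ArrayList<Object>();
    value.add(key);                               // the value leads back to the key
    cache.put(key, value);

    key = null;   // drop our own references, as when the webapp is undeployed
    value = null;
    System.gc();

    // The entry is still there: map -> value (strong) -> key (strong)
    return cache.size();
  }

  public static void main(String[] args) {
    System.out.println("Entries left in cache: " + sizeWhenValueReferencesKey());
  }
}
```

Removing the entry explicitly, which is what the reflection workaround does, breaks that chain.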


Classloader leaks I – How to find classloader leaks with Eclipse Memory Analyser (MAT)

If you just want a quick fix to the problem without understanding the theory, jump to part VI introducing the ClassLoader Leak Prevention library.

I’m planning a series of posts around classloader leaks, also known as PermGen memory leaks. You have probably arrived at this page because your Java web application crashes with the dreaded java.lang.OutOfMemoryError: PermGen space (or java.lang.OutOfMemoryError: Metaspace, if you’re on Java 8). I will not explain what this error means nor the reason it occurs, since there is lots of information about it on the net – for example, see Frank Kieviet’s blogs on the problem and its solution.

What I will focus on in this first post, is the step between the “what” and the “how” – the “where” that is often forgotten in other online discussions. After you’ve realized you have classloader leaks, you must identify where those leaks are, before you can fix them.

Not many years ago, finding the source of a classloader leak was really tricky – or at least I thought so. The tools at hand were jmap and jhat, which are quite “raw”. Later there were some commercial tools, such as YourKit, to help you in the process. Nowadays there are Open Source alternatives that make it relatively easy to find the offending code. I will show you step by step how to do it.

First things first: the heap dump

The first thing you need to do to find a classloader leak is to acquire a heap dump to analyze. The heap should be dumped after at least one ClassLoader instance has leaked, so that you can analyze what references there are to the leaked instance that prevent it from being garbage collected.

One of the easiest ways to do this, is to add a JVM parameter that makes the (Sun/Oracle) JVM automatically create a heapdump whenever a java.lang.OutOfMemoryError occurs. The advantage of this, is that you don’t have to try to force the appearance of the leak, in case you don’t know what triggers it. This also means you won’t spend time looking for a leak in a heapdump where there is none.

The name of the parameter is -XX:+HeapDumpOnOutOfMemoryError. Add it to your command line, script or configuration file, depending on what application server you are using and how you are starting it. Then run and redeploy the application until it crashes with java.lang.OutOfMemoryError: PermGen space / Metaspace and voilà – there is your heap dump. The name of the file will be something like java_pid18148.hprof, and it will be located in whatever was the startup directory of your application server, which may be different from the directory from where you launched the startup script. You may also decide the directory yourself using the -XX:HeapDumpPath=/directory parameter.

Now that you’ve got your heap dump, download Eclipse Memory Analyzer (MAT), run it and open the heap dump you just acquired.
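If you are unsure whether the parameter actually reached the JVM of your application server, you can ask the JVM from inside, for example from a JSP. The sketch below assumes a HotSpot-based (Sun/Oracle/OpenJDK) JVM, since it uses com.sun.management.HotSpotDiagnosticMXBean; the class name HeapDumpSettings is made up. The same bean can also write a heap dump on demand.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper class; assumes a HotSpot-based JVM
class HeapDumpSettings {

  static HotSpotDiagnosticMXBean diagnostics() {
    return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
  }

  /** "true" if -XX:+HeapDumpOnOutOfMemoryError is in effect, otherwise "false". */
  static String heapDumpOnOutOfMemoryError() {
    return diagnostics().getVMOption("HeapDumpOnOutOfMemoryError").getValue();
  }

  /** The value of -XX:HeapDumpPath as the JVM sees it. */
  static String heapDumpPath() {
    return diagnostics().getVMOption("HeapDumpPath").getValue();
  }

  /**
   * Writes a heap dump of live objects right now. The file must not
   * exist already; give it a .hprof extension to be on the safe side.
   */
  static void dumpNow(String file) throws IOException {
    diagnostics().dumpHeap(file, true);
  }

  public static void main(String[] args) {
    System.out.println("HeapDumpOnOutOfMemoryError = " + heapDumpOnOutOfMemoryError());
    System.out.println("HeapDumpPath = " + heapDumpPath());
  }
}
```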

Open heap dump

An alternative approach is to extract the heap dump from a locally running application server, from inside MAT. Just start MAT and select “Acquire Heap Dump …” from the File menu. This will present you with a list of running Java applications.

Select your application server (make sure it’s not the application server’s bootstrapper / watchdog) and click Finish.

Find a leaked classloader

When you open or acquire a heap dump, MAT will ask you if you want to perform some kind of analysis on the dump, such as looking for memory leak suspects. This may be good for looking for heap leaks, but in my experience is not of much help when it comes to classloader leaks, since the leaked classloaders often have fewer retained (non-Class) objects than the current “non-leaked” one. Therefore I suggest you click Cancel.
Getting Started Wizard
What you should do instead depends on what application server you used when acquiring the heap dump. In case you were using a fairly recent version (>= 4.0.12) of Caucho’s Resin you’re in luck, since it has some features that significantly simplify finding the leaked classloaders. What Resin does, quite ingeniously, is that it adds a marker to each classloader that from Resin’s perspective is ready to be garbage collected. That allows us to simply search for that marker and analyze why the marked classloaders are not garbage collected.

So click the “Open Query Browser” icon, and select “List objects” / “with incoming references”.
List objects
Now type in the class name of the marker, which for Resin version 4.0.12 – 4.0.20 is called com.caucho.loader.ZombieMarker and since Resin 4.0.21 it is called com.caucho.loader.ZombieClassLoaderMarker.
List zombie markers
Clicking Finish will present you with a list of zombie marker instances, one for every classloader that Resin considers ready for garbage collection. You can see the classloader for each of them by clicking the little arrow in front, which will unfold the incoming references.

List of zombie markers

Now you can skip the rest of this section.

I don’t know if any other application servers provide something similar to Resin’s zombie markers, but assuming yours does not, you should do this instead: click the “Open Query Browser” icon, and select “Java Basics” / “Class Loader Explorer”.

Class Loader Explorer
Unless you already know the class name of the classloaders used for each web application in your application server, just click Finish. This will present you with a list of all the classloaders in your heap dump.

Class Loader Explorer list
Hopefully you can figure out by the class names, which ones are – possibly leaked – web application instances. For each such instance, you need to perform the steps in Finding the leak below to determine if that instance is a leaked one.

Different types of references

As you know, the reason for the java.lang.OutOfMemoryError: PermGen space / Metaspace is that the old, unused classloaders are not being garbage collected, and the reason they are not being garbage collected is that there is a reference from outside the classloader either to a class (including any instance of such class) loaded by that classloader, or to the classloader itself. What you might not know, is that there are actually four different types of references in Java. Before moving on to finding your classloader leak, I thought I’d take the time to explain them briefly.

There is the “normal” strong reference, which is what you have unless you make any effort to have a weaker reference. Then there is the weak reference, which you may have used – directly or indirectly, for example via a WeakHashMap. The weak reference works in such a way that the referenced object may be garbage collected whenever there are no more strong references to it. This means that weak references will not themselves cause memory leaks.

Not too long ago, I also learned about soft references and phantom references. Soft references are stronger than weak references. An object will not necessarily be garbage collected, even if the only reference to it is a soft reference. What a soft reference means is that whenever the JVM is about to run out of memory, as a last resort it will garbage collect all the objects with only soft (and possibly weaker) references. The JavaDoc for java.lang.ref.SoftReference says:

All soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError.

The JavaDoc does not explicitly say whether this applies only to normal objects on the heap, or whether it also applies to classes in the PermGen space. While investigating a classloader leak with a SoftReference in the mix, I downloaded the JDK 1.6 sources and tried to find out by studying them. My conclusion from the sources – that it does not apply to PermGen / class allocation – was contrary to what later testing showed… I’m still not certain how this really works, but since it was “Long time, no C” for me, I’m leaning towards believing that softly referenced objects are garbage collected before a java.lang.OutOfMemoryError: PermGen space is thrown. If you know for certain, please leave a comment! Update: I even asked a member of Oracle’s GC development team, who couldn’t give a straight answer…

This leaves us with phantom references. I haven’t really gotten the hang of phantom references yet, but they are weaker than weak references and, from what I understand, so weak that you cannot even reach the referenced object having only a phantom reference to it. Rather, the phantom reference can be used with a ReferenceQueue to be notified when the referenced object is being garbage collected. For now we only need to know two things. 1: You will probably never use any phantom references. 2: Phantom references will not cause classloader leaks.
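As a minimal sketch of how the three weaker reference types look in code, the class below wraps one object in each of them. The class and method names are made up for illustration. It deliberately avoids depending on when the garbage collector runs, so it only shows what holds while the target is still strongly referenced: soft and weak references hand the object back, a phantom reference never does, and nothing has been enqueued yet.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceTypesDemo {
    // Returns four flags, in order: nothing enqueued yet, phantom yields null,
    // soft yields the target, weak yields the target.
    static boolean[] describe(Object target) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        SoftReference<Object> soft = new SoftReference<>(target);
        WeakReference<Object> weak = new WeakReference<>(target);
        // A phantom reference must always be created with a queue.
        PhantomReference<Object> phantom = new PhantomReference<>(target, queue);

        return new boolean[] {
            queue.poll() == null,   // target is still strongly reachable
            phantom.get() == null,  // get() on a phantom reference is always null
            soft.get() == target,   // reachable through the soft reference
            weak.get() == target    // reachable through the weak reference
        };
    }

    public static void main(String[] args) {
        Object target = new Object(); // the "normal" strong reference
        for (boolean flag : describe(target)) {
            System.out.println(flag);
        }
    }
}
```

What happens after the last strong reference is dropped depends on the collector and on memory pressure, which is why the sketch stops short of demonstrating it.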

If you want to read more about the different types of references, see for example this blog entry.

Finding the leak

Now, to find out the cause of your classloader leak, right click on one of the classloaders that you found above: either one that your application server has marked as ready for garbage collection (in that case just right click the zombie marker itself), or one that might be a leaked one. If you’re in the “Class Loader Explorer” you first need to select “Class Loader”. In either case, then select “Path To GC Roots” and, since (presumably) only strong references will cause classloader leaks, select “exclude all phantom/weak/soft etc. references”.
Path To GC Roots
Now one of three things can happen:

I have seen cases where no strong references at all are found. In this case, the classloader should be garbage collected. I won’t discuss now why it isn’t, but might be back with a rant about that. For now, it’s enough to know that it’s not your fault, and there is nothing you can do about it.

If you did not use the zombie marker feature of Resin (or something similar in another app server), you may find a totally legitimate strong reference. As an example, your ClassLoader may be the contextClassLoader of a currently executing thread, such as one from the application server’s thread pool, serving an HTTP request.
Serving HTTP request
(However, being the contextClassLoader of a thread may actually be the cause of the leak – more about that in part III).

Last but not least, we may find the cause of our leak by looking through the references and finding the unwanted one that prevents the classloader from being garbage collected. This reference may be within your own code, a third party library, your application server or the JVM. This is what it would look like if you have put your JDBC driver within your web application, rather than on the application server level.
JDBC driver leak
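To illustrate why a JDBC driver inside the web application can show up like this, here is a minimal sketch. WebAppDriver is a made-up stand-in for a real driver and does nothing useful. The sketch only shows that java.sql.DriverManager, which belongs to the JVM rather than to your application, keeps a reference to every registered driver until it is explicitly deregistered.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Collections;
import java.util.Properties;
import java.util.logging.Logger;

class JdbcDriverLeakDemo {
    // Hypothetical driver, standing in for one bundled inside the web application.
    static class WebAppDriver implements Driver {
        public Connection connect(String url, Properties info) { return null; }
        public boolean acceptsURL(String url) { return false; }
        public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
            return new DriverPropertyInfo[0];
        }
        public int getMajorVersion() { return 1; }
        public int getMinorVersion() { return 0; }
        public boolean jdbcCompliant() { return false; }
        public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new SQLFeatureNotSupportedException();
        }
    }

    // True if DriverManager currently holds this exact driver instance.
    static boolean isRegistered(Driver driver) {
        return Collections.list(DriverManager.getDrivers()).contains(driver);
    }

    // Returns { held after registering, held after deregistering }.
    static boolean[] registerThenDeregister() {
        Driver driver = new WebAppDriver();
        try {
            DriverManager.registerDriver(driver);   // JVM-level reference created
            boolean heldAfterRegister = isRegistered(driver);
            DriverManager.deregisterDriver(driver); // reference released again
            return new boolean[] { heldAfterRegister, isRegistered(driver) };
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] held = registerThenDeregister();
        System.out.println("held after register: " + held[0]);
        System.out.println("held after deregister: " + held[1]);
    }
}
```

If the web application is redeployed without the deregistration step, the driver instance stays registered, and through it its class and the web application’s classloader remain reachable.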
In the following posts, I intend to show a few different examples of what these references might look like, what causes the leak and how to fix or work around the leak.

Until then, good luck hunting down those nasty classloader leaks!

Links to all parts in the series

Part I – How to find classloader leaks with Eclipse Memory Analyser (MAT)

Part II – Find and work around unwanted references

Part III – “Die Thread, die!”

Part IV – ThreadLocal dangers and why ThreadGlobal may have been a more appropriate name

Part V – Common mistakes and Known offenders

Part VI – “This means war!” (leak prevention library)

Presentation on Classloader leaks (video and slides)