Classloader leaks I – How to find classloader leaks with Eclipse Memory Analyser (MAT)

I’m planning a series of posts around classloader leaks, also known as PermGen memory leaks. You have probably arrived at this page because your Java web application crashes with the dreaded java.lang.OutOfMemoryError: PermGen space. I will not explain what this error means nor the reason it occurs, since there is lots of information about it on the net – for example, see Frank Kieviet’s blogs on the problem and its solution.

What I will focus on in this first post, is the step between the “what” and the “how” – the “where” that is often forgotten in other online discussions. After you’ve realized you have classloader leaks, you must identify where those leaks are, before you can fix them.

Not many years ago, finding the source of a classloader leak was really tricky – or at least I thought so. The tools at hand were jmap and jhat, which are quite “raw”. Later there were some commercial tools, such as YourKit to help you in the process. Nowadays there are Open Source alternatives that makes it relatively easy to find the offending code. I will show you step by step how to do it.

First things first: the heap dump

The first thing you need to do to find a classloader leak, is to aquire a heap dump to analyze. The heap should be dumped after at least one ClassLoader instance has leaked, so that you can analyze what references there are to the leaked instance, that prevents it from being garbage collected.

One of the easiest ways to do this, is to add a JVM parameter that makes the (Sun/Oracle) JVM automatically create a heapdump whenever a java.lang.OutOfMemoryError occurs. The advantage of this, is that you don’t have to try to force the appearance of the leak, in case you don’t know what triggers it. This also means you won’t spend time looking for a leak in a heapdump where there is none.

The name of the parameter is -XX:+HeapDumpOnOutOfMemoryError, so add -XX:+HeapDumpOnOutOfMemoryError to your command line, script or configuration file – depending on what application server you are using and how you are starting it. Then run and redeploy the application until it crashes with java.lang.OutOfMemoryError: PermGen space and voila – there is your heap dump. The name of the file will be something like java_pid18148.hprof, and it will be located in whatever was the startup directory of your application server, which may be different from the directory from where you launched the startup script.

Now that you’ve got your heap dump, download Eclipse Memory Analyzer (MAT), run it and open the heap dump you just aquired.

Open heap dump

An alternative approach, is to extract the heap dump from a locally running application server, from inside MAT. Just start MAT and select “Aquire Heap Dump …” from the File menu. This will present you with a list of running Java applications.

Select your application server (make sure it’s not the application servers bootstrapper / watchdog) and click Finish.

Find a leaked classloader

When you open or aquire a heapdump, MAT will ask you if you want to perform some kind of analyzis on the dump, such as looking for memory leak suspects. This may be good for looking for heap leaks, but in my experience is not of much help when it comes to classloader leaks, since the leaked classloaders often have less retained (non-Class) objects than the current “non-leaked” one. Therefore I suggest you click Cancel.
Getting Started Wizard
What you should do instead, depends on what application server you used when aquiring the heap dump. In case you were using a fairly recent version (>= 4.0.12) of Caucho’s Resin you’re in luck, since it has some features that significantly simplifies finding the leaked classloaders. What Resin does, quite geniously, is that it adds a marker to each classloader that from Resins perspective is ready do be garbage collected. That allows us to simply search for that marker and analyze why the marked classloaders are not garbage collected.

So click the “Open Query Browser” icon, and select “List objects” / “with incoming references”.
List objects
Now type in the class name of the marker, which for Resin version 4.0.12 – 4.0.20 is called com.caucho.loader.ZombieMarker and since Resin 4.0.21 it is called com.caucho.loader.ZombieClassLoaderMarker.
List zombie markers
Clicking Finish will present you with a list of zombie marker instances, one for every classloader that Resin considers ready for garbage collection. You can see the classloader for each of them by clicking the little arrow in front, which will unfold the incoming references.

List of zombie markers

Now you can skip the rest of this section.

I don’t know if any other application servers provide something similar to Resins zombie markers, but assuming yours do not, you should do this instead: click the “Open Query Browser” icon, and select “Java Basics” / “Class Loader Explorer”.

Class Loader ExplorerUnless you already know the class name of the classloaders used for each web application in your application server, just click Finish. This will present you with a list of all the classloaders in your heap dump.

Class Loader Explorer listHopefully you can figure out by the class names, which ones are – possibly leaked – web application instances. For each such instance, you need to perform the steps in Finding the leak below to determine if that instance is a leaked one.

Different types of references

As you know, the reason for the java.lang.OutOfMemoryError: PermGen space is that the old, unused classloaders are not being garbage collected, and the reason they are not being garbage collected is that there is a reference from outside the classloader either to a class loaded by that classloader, or to the classloader itself. What you might not know, is that there are actually four different types of references in Java. Before moving on to finding your classloader leak, I thought I’d take the time to explain them briefly.

There is the “normal” strong reference, which is what you have unless you make any effort to have a weaker reference. Then there is the weak reference, which you may have used – directly or indirectly for example via a WeakHashMap. The weak reference works in a such a way, that the referenced object may be garbage collected whenever there are no more strong references to it. This means that weak references will not themselves cause memory leaks.

Not too long ago, I also learned about soft references and phantom references. Soft references are stronger than weak references. An object will not be garbage collected, even if the only reference to it is a soft reference. What a soft reference means, is that whenever the JVM is about to run out of memory, as a last resort it will garbage collect all the objects with only soft (and possibly weaker) references. The JavaDoc for java.lang.ref.SoftReference says

All soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError.

The JavaDoc does not explicitly say whether this applies only to normal objects on the heap, or if this applies also to classes in the PermGen space. While investigating a classloader leak with a SoftReference in the mix, I downloaded the JDK 1.6 sources and tried to find out by studying them. My conclusion from the sources – that it does not apply to PermGen / class allocation – was contrary to what later testing showed… I’m still not certain how this really works, but since it was “Long time, no C” for me, I’m leaning towards believing that soft referenced objects are garbage collected before a java.lang.OutOfMemoryError: PermGen space is thrown. If you know for certain, please leave a comment! Update: I even asked a member of Oracles GC development team that couldn’t give a straight answer…

This leaves us with phantom references. I haven’t really gotten a hold of phantom references yet, but they are weaker than weak references and from what I understand, so weak you cannot even reach the referenced object having only a phantom reference to it. Rather the phantom reference provides some kind of hook for when the referenced object is being garbage collected. For now we will only need to know two things. 1: You will probably never use any phantom references. 2: Phantom references will not cause classloader leaks.

If you want to read more about the different types of references, see for example this blog entry.

Finding the leak

Now, to find out the cause of your classloader leak, right click on one of the classloaders that you found above – either one that you application server has marked as ready for garbage collection (in that case just right click the zombie marker itself), or one that might be a leaked one. If you’re in the “Class Loader Explorer” you need to first select “Class Loader” and in either case you will then select “Path To GC Roots” and then, since (assumingly) only strong references will cause class loader leaks, select “exclude all phantom/weak/soft etc.” references.
Path To GC Roots
Now one of three things can happen:
I have seen cases where no strong references at all are found. In this case, the classloader should be garbage collected. I won’t discuss now why it isn’t, but might be back with a rant about that. For now, it’s enought to know that it’s not your fault, and there is nothing you can do about it.

If you did not use the zombie marker feature of Resin (or similar in other app server), you may find a totally legitimate strong reference. As an example, your ClassLoader may be the contextClassLoader of a currently executing thread, such as one from the application servers thread pool, serving an HTTP request.
Serving HTTP request
(However, being the contextClassLoader of a thread may actually be the cause of the leak – more about that in part III).

Last but not least, we may find the cause of our leak, by looking throught references and finding the unwanted one that prevents the classloaders from being garbage collected. This reference may be within your own code, a third party library, your application server or the JVM. This is what it would look like, in case you have put your JDBC driver within your web application, rather than on the application server level.
JDBC driver leak
In the following posts, I intend to show a few different examples of what these references might look like, what causes the leak and how to fix or work around the leak.

Until then, good luck hunting down those nasty classloader leaks!

Links to all parts in the series

Part I – How to find classloader leaks with Eclipse Memory Analyser (MAT)

Part II – Find and work around unwanted references

Part III – “Die Thread, die!”

Part IV – ThreadLocal dangers and why ThreadGlobal may have been a more appropriate name

Part V – Common mistakes and Known offenders

Part VI – “This means war!” (leak prevention library)

Presentation on Classloader leaks (video and slides)

  • Hal Deadman

    This was a very helpful post. I look forward to the rest of your series. nnI just used your instructions to track down a class loader memory leak in an application we run in Weblogic 12c. We are using the Quartz scheduler configured by Spring but we had a “org.quartz.plugins.management.ShutdownHookPlugin” quartz plugin registered as part of the config. We didn’t need it because Spring should cleanup the scheduler on shutdown and the ShutdownHookPlugin is for JVM termination, not web app undeploy. The ShutdownHookPlugin registers a shutdown hook like so: java.lang.Runtime.getRuntime().addShutdownHook(t); Since t is an anonymous Thread sub-class with a reference to a Quartz class, the classloader can’t be GC’d. I hope the leaks in the rest of our applications are so simple.nnI noticed that after the leak was resolved not all of the webapp classloaders disappeared right away or even after a single GC. I was forcing GC through Jconsole and it seemed like it took a few GC’s to get rid of all the web app class loaders from undeployed instances of the application. nnBy the way, the Weblogic 12c web app classloader is called weblogic.utils.classloaders.ChangeAwareClassLoader. nnI was running Weblogic as a service on Windows so in order to get jmap.exe to be able to talk to the java process running Weblogic I had to run jmap.exe as a service as well by using “psexec -s “. If you don’t do that you get the following unhelpful error message from jmap. nn: Not enough storage is available to process this commandnThe -F option can be used when the target process is not responding

    • Glad you found the post helpful.nnThe JVM shutdown hook was actually the example I planned to use in my next post, although one a bit more unexpected.nnThanks for the detailed tips for WebLogic users, hopefully that will help some future visitor.

  • m.p@zenika.com

    very nice post!nBy the way, could you tell us a bit more about the orphan classloader (w/o strong references) that will never be gc and why?nnThanks in advancenM.n

    • Hi M.nWhy some classloaders are not garbage collected while there is no strong reference to them, remains one of the great JVM mysteries to me.nnHave you tried switching to the Concurrent Mark and Sweep (CMS) garbage collector, as per the bottom of this post?

      • Martijn van Tilburg

        Depending on the chosen garbage collector the JVM is not very eager on collecting the PermGen space (not even with Full GC). With our application we had serious issues with the G1 (Garbage first) collector in Java 7. Only by invoking the garbage collector repeatedly manually, all PermGen was reclaimed.

  • Pingback: Búsqueda de Memory Leaks mediante Eclipse MAT « Java Mania()