Tuesday, May 10, 2011

Tomcat and Webapp Hot-deploy Permgen "Leaks"

I spent the last few days trying to fix a problem where Tomcat runs out of permgen when hot-deploying (removing the old war to undeploy and adding the new war to deploy while Tomcat is running).

Using (old) Tomcat 5.5.20, I attached remotely via JMX from JConsole and watched permgen climb during each deploy, climb a tiny bit during each undeploy, then climb more during the next deploy, and so on. I repeated this for a portlet war (webapp) until it ran out of permgen.

I tweaked the settings until I had this in catalina.sh (note that this opens up JMX insecurely, with no authentication and no SSL):

JAVA_OPTS="$JAVA_OPTS -Djava.rmi.server.hostname=name.of.the.host \
                      -XX:+UseParNewGC \
                      -XX:+UseConcMarkSweepGC \
                      -XX:+CMSPermGenSweepingEnabled \
                      -XX:+CMSClassUnloadingEnabled \
                      -Dcom.sun.management.jmxremote \
                      -Dcom.sun.management.jmxremote.ssl=false \
                      -Dcom.sun.management.jmxremote.port=(specify open port here) \
                      -Dcom.sun.management.jmxremote.authenticate=false \
                      -Djava.library.path=/usr/lib:/usr/lib64 \
                      -Xmx1024m \
                      -Xms1024m \
                      -XX:MaxPermSize=128m \
                      -XX:NewSize=128m \
                      -XX:MaxTenuringThreshold=0 \
                      -XX:+UseTLAB \
                      -XX:SurvivorRatio=128 \
                      "
(note: I would have set the permgen size to 256m here, but I wanted it to fail from lack of permgen during hot-deploys.)
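
The permgen numbers that JConsole and VisualVM chart come from the JVM's memory pool MXBeans. As a rough sketch of reading the same figures from code running inside the JVM being watched: the class name PermGenWatcher is made up for illustration, and the pool-name matching is an assumption based on HotSpot's naming, which differs between collectors and JVM versions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class PermGenWatcher {

    // Assumption: HotSpot pool names. They vary by JVM version and collector,
    // e.g. "CMS Perm Gen" or "PS Perm Gen" on older JVMs, "Metaspace" on Java 8+.
    static boolean isClassMetadataPool(String poolName) {
        return poolName.contains("Perm Gen") || poolName.contains("Metaspace");
    }

    // Returns one line per matching pool: name, used bytes, max bytes (-1 if undefined).
    static List<String> describeClassMetadataPools() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null || !isClassMetadataPool(pool.getName())) {
                continue; // getUsage() returns null for a pool that is no longer valid
            }
            lines.add(String.format("%s: used=%d bytes, max=%d bytes",
                    pool.getName(), usage.getUsed(), usage.getMax()));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeClassMetadataPools()) {
            System.out.println(line);
        }
    }
}
```

Logging these lines before and after each deploy and undeploy would give the quantitative record that eyeballing a chart does not.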

I then switched to Tomcat 7.0.12, did the same, and also tried out VisualVM. I could see that permgen started off much lower, even with the same settings. The deploys appeared to make permgen jump even more than before, though I didn't take quantitative measurements (I probably should have; I was just "spiking" it). The undeploys between the deploys didn't really make permgen go up, so that was nice. Long story short: it still ran out of permgen for the portlet (webapp) that I was redeploying.

I tried again and used Tomcat 7's new /manager/text/findleaks. All this did was print out that the portlet/webapp I was hot-deploying was causing a memory leak. This would have been more useful if I hadn't already been focusing on a particular webapp I knew had an issue. It would really be nice to have something that could tell you not only which webapps are a problem, but could also help you diagnose which classes/instances are sticking around after you undeploy that maybe shouldn't be there. I also think that logging leaks may be more helpful than exposing them through the manager, because many people don't deploy the manager (they remove it).

I then used jmap to create hprofs (dumps) that I first looked at in VisualVM to compare side-by-side. On the server, I did:

ps auxf|grep java
I got the process id and then did the following, first after the first hot-deploy, second after undeploying, and third after redeploying again (just moving around the same war in and out of webapps directory):
jmap -dump:format=b,file=/tmp/(some_filename).hprof (process id)
(note: compress these before scp'ing them if it helps to save time, e.g. tar -czvf filename.hprof.tar.gz filename.hprof.)
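
For completeness, a heap dump can also be triggered from code through HotSpot's diagnostic MXBean, instead of running jmap on the server. This is only a sketch under a few assumptions: it needs a HotSpot JVM on Java 7 or later, it dumps the JVM it runs in (so it would have to execute inside Tomcat, for example from an admin-only page), and the class name HeapDumper is made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

class HeapDumper {

    // Writes an hprof dump of the current JVM to the given file.
    // The file must not exist yet; recent JDKs also require an ".hprof" suffix.
    // liveOnly = true dumps only reachable objects, which is what matters
    // when asking "what is still referenced after the undeploy?".
    static File dumpHeap(File target, boolean liveOnly) {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            diagnostics.dumpHeap(target.getAbsolutePath(), liveOnly);
        } catch (IOException e) {
            throw new IllegalStateException("Heap dump to " + target + " failed", e);
        }
        return target;
    }

    public static void main(String[] args) {
        File target = new File(System.getProperty("java.io.tmpdir"),
                "dump-" + System.currentTimeMillis() + ".hprof");
        System.out.println("Wrote " + dumpHeap(target, true));
    }
}
```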

I tried MAT (Eclipse Memory Analyzer, which is a standalone tool, so you don't have to have/use Eclipse) to look at the pie charts and dominator trees. After loading the heaps to view side-by-side, I found that clicking the Histogram icon (looks like a bar graph) shows an icon to the right called "Compare to Another Heap Dump", so I tried that. When I compared the heap after the first deploy to the heap after the second deploy, the things that stood out (in addition to normal types and collections) were instances resulting from reflection (Method objects), JarFile and URLJarFile instances, and xerces's XSSimpleTypeDecl. However, focusing on the new instances wasn't helpful. I was actually interested in what wasn't new. It would be nice if there were a way to focus on which instances were the same between the two dumps. Is there an easy way of doing that with VisualVM? I didn't see one. And even that would require me to eliminate from the results much of the stuff that should stay the same (Tomcat jars, etc.). It just isn't that easy yet.

I also used JHat to browse and crawl the difference:

jhat -J-Xmx2048m -baseline ~/Desktop/after_1st_hotdeploy.hprof ~/Desktop/after_2nd_hotdeploy.hprof
(note: ironically it first ran out of memory when I tried to compare without using -J to give it more memory.)

It was a little harder to make use of the results here, since they are browsed as served HTML and the diff shows both new and old instances. It is also fairly slow and drags the system down a bit. At first I was again, probably wrongly, focused on the new instances rather than the ones that remained between deploys.

Going back and looking at both MAT and JHat later, I'm thinking that I should have removed all but the absolute necessities for the webapp, because all of the classes loaded by Tomcat for every other webapp, common dependencies, etc. get in the way of trying to diagnose what shouldn't have been left behind. And maybe I should have focused only on what remained after the undeploy.

I read this post/interview about Tomcat 7's memory leak protection and about the various things in webapps that can cause leaks:
http://java.dzone.com/articles/memory-leak-protection-tomcat

It was an older interview and not totally surprising, but it is nice to know that at least someone was working on trying to clean up other people's messes when it comes to memory leaks related to hot-deploys and undeploys.

All that said, here are some recommendations for diagnosing permgen memory leaks related to undeploys/hot-deploys:

  1. Determine which webapps have leaks by either using JConsole or VisualVM to watch permgen as you undeploy and redeploy, or by undeploying and redeploying each webapp and then using Tomcat 7's findleaks to tell you which webapps are leaking.
  2. After determining which webapps to focus on, create a new Tomcat environment like the old one but where you really limit jars and webapps to only those absolutely necessary to test the hot-deploy leak issue.
  3. Finally, do diffs with JHat and determine what is the same (not marked as new) and try to exclude in your mind those packages loaded by Tomcat to focus on the others, or you may use MAT and visually compare dominator trees, etc.

Unfortunately, the last item is the tough one. You might be better off researching possible leaks in versions of jars/libraries you are using and then carefully updating Tomcat and jars used by your webapps to the latest versions, hoping that the leaks have been fixed. Also, do what you can to ensure that threads, connections, and anything else your webapp starts/uses/references (directly or indirectly) is cleaned up as much as possible.
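
As one concrete example of that kind of cleanup, here is a sketch that deregisters JDBC drivers loaded by the webapp's own classloader, since java.sql.DriverManager lives outside the webapp and keeps a reference to every driver registered with it. I did not confirm that this was the cause of the leak in my portlet; it is just a well-known instance of the pattern. The class and method names are made up for illustration.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Collections;

class WebappCleanup {

    // Deregisters every JDBC driver whose class was loaded by the given
    // classloader and returns how many were removed. Drivers loaded by other
    // classloaders (e.g. from Tomcat's shared lib directory) are left alone.
    static int deregisterDriversLoadedBy(ClassLoader webappLoader) {
        int removed = 0;
        // getDrivers() only lists drivers visible to the caller's classloader.
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            if (driver.getClass().getClassLoader() != webappLoader) {
                continue;
            }
            try {
                DriverManager.deregisterDriver(driver);
                removed++;
            } catch (SQLException e) {
                System.err.println("Could not deregister " + driver + ": " + e);
            }
        }
        return removed;
    }
}
```

In a webapp this would typically be called from a ServletContextListener's contextDestroyed method, passing Thread.currentThread().getContextClassLoader(), on the assumption that the container sets the context classloader to the webapp's loader at that point. Threads, timers, and connection pools the webapp starts need the same treatment in the same place.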

Edit: Christopher Schultz on the Tomcat users mailing list said:

Generally speaking, when you have a permgen "leak" across webapp redeployments, this is the situation:

1. Webapp invokes some code that stores a reference to the webapp's WebappClassLoader

2. That reference is not cleared during undeploy [this is the leak]

The big problem is that the WebappClassLoader keeps a reference to all java.lang.Class objects it loads, so it can be quite bulky when it leaks, depending on exactly how many classes your webapp has locally.

markt has a great presentation which lays out the anatomy of these types of memory leaks, how to diagnose them, and often how to fix them. These are slides from a presentation at last year's ApacheCon NA. It's /very/ readable.

http://people.apache.org/~markt/presentations/2010-11-04-Memory-Leaks-60mins.pdf

-chris

That is a very helpful presentation!
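
The situation Christopher describes can be reproduced in miniature without Tomcat. In this sketch an empty URLClassLoader stands in for the WebappClassLoader and a static list stands in for whatever long-lived code ends up holding the reference; both are stand-ins I chose for illustration, not how Tomcat is implemented. Note that System.gc() is only a hint, so the "collected" result is likely but not guaranteed.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

class ClassLoaderLeakDemo {

    // Stands in for state that outlives the webapp, e.g. a static cache in a
    // library loaded by the container's shared classloader.
    static final List<Object> LONG_LIVED_REGISTRY = new ArrayList<Object>();

    // "Deploys" a loader, optionally leaks a reference to it, then "undeploys"
    // by letting the local variable go out of scope.
    static WeakReference<ClassLoader> deployAndUndeploy(boolean leak) {
        ClassLoader webappLoader =
                new URLClassLoader(new URL[0], ClassLoaderLeakDemo.class.getClassLoader());
        if (leak) {
            LONG_LIVED_REGISTRY.add(webappLoader); // step 1: reference stored
        }
        return new WeakReference<ClassLoader>(webappLoader);
    }

    // Nudges the GC a few times and reports whether the loader went away.
    static boolean isCollected(WeakReference<?> ref) {
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc();
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        WeakReference<ClassLoader> clean = deployAndUndeploy(false);
        WeakReference<ClassLoader> leaked = deployAndUndeploy(true);
        System.out.println("clean loader collected:  " + isCollected(clean));
        System.out.println("leaked loader collected: " + isCollected(leaked));
        LONG_LIVED_REGISTRY.clear(); // the fix: clear the reference on undeploy
        System.out.println("after clearing registry: " + isCollected(leaked));
    }
}
```

In a real webapp the leaked loader drags along every class it loaded, which is why the cost shows up in permgen.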
