Monday, January 25, 2010

Heap Dump Analysis with Memory Analyzer, Part 1: Heap Dumps

Almost two years have passed since the Memory Analyzer tool (MAT) was published at Eclipse. Since then we have collected a lot of feedback, questions and comments from people using it, and we have also gathered experience using the tool ourselves. Most people find their way to solving memory problems with MAT relatively easily, but I am convinced there are also a lot of unexplored features and concepts within the tool, which can be very handy if properly understood and used. Therefore I decided to start a series of blog posts dedicated to memory analysis (with MAT) - starting from the basics and covering the different topics in detail. I will try to answer some of the questions which pop up most often, give some (hopefully useful) hints, explain the benefit of certain “unpopular” queries, and (please, please, please…) collect your feedback.

As the Memory Analyzer is a tool working with heap dumps, I will start with a detailed look at heap dumps – what they are, which formats MAT can read, what can be found inside, how one can get them, and so on. If you are interested in the topic, read on.

What Is a Heap Dump?

A heap dump is a snapshot of the memory of a Java process at a certain point in time. There are different formats for persisting this data, and depending on the format it may contain different pieces of information, but in general the snapshot contains information about the Java objects and classes in the heap at the moment the snapshot was triggered. As it is just a snapshot at a given moment, a heap dump does not contain information such as when and where (in which method) an object was allocated.

What Are Heap Dumps Good for?

So what are heap dumps good for? Well, for a lot of things :-)
If there is a system which is crashing sporadically with an OutOfMemoryError, then analyzing an automatically written heap dump with MAT can be a very easy way to find the root cause of the problem (read more here).
If you want to analyze the memory footprint of your application, then MAT and heap dumps are again a good choice. This combination can also help you to find your biggest structures, redundant data structures, space wasted in unused collections, and much more. Such topics will be covered later in this blog series.
If, however, you are trying to find out why too many garbage objects are produced during a certain operation, or want to see which methods allocate most of the objects, then you need a profiler which collects data from the VM over time. Leak detection techniques that rely on analyzing object behaviour (allocation / garbage collection) are difficult to implement using heap dumps (see object identity below).

Types of Heap Dumps

Currently the Memory Analyzer is able to work with HPROF binary heap dumps (produced by JVMs from Sun, HP, SAP, etc.), IBM system dumps (after preprocessing them), and IBM portable heap dumps (PHD) from a variety of IBM platforms. Let’s have a closer look at each of the types.

HPROF Binary Heap Dumps

A detailed specification of the content of an HPROF file can be found here.

Some of the important pieces of information used within MAT are summarized below:

  • Information about all loaded classes. For every class the HPROF dump contains its name, its super-class, its class loader, the fields defined for its instances (name and type), and the static fields of the class with their values.

  • Information about all objects. For every object one can find the class and the values of all fields – both references and primitive fields. Being able to look at the names and the content of certain objects, e.g. the char[] within a huge StringBuilder, the size of a collection, etc., can be very helpful when performing memory analysis.

  • A list of GC roots (what is a GC root?).

  • The call stacks of all threads (in heap dumps from JDK 6 update 14 and above).
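To make the file layout a bit more tangible, here is a minimal sketch that reads the header of an HPROF binary file and counts the top-level records per tag. It is based on my reading of the HPROF format description (a null-terminated version string, a 4-byte identifier size, an 8-byte timestamp, then records made of a 1-byte tag, a 4-byte time offset and a 4-byte body length). The class and method names (`HprofHeaderReader`, `readHeader`, `countRecords`) are made up for this illustration and have nothing to do with how MAT itself parses dumps.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class HprofHeaderReader {

    /** Header fields found at the very start of an HPROF binary file. */
    public static final class Header {
        public final String format;        // e.g. "JAVA PROFILE 1.0.2"
        public final int identifierSize;   // size of object IDs in bytes
        public final long timestampMillis; // dump time, milliseconds since the epoch

        Header(String format, int identifierSize, long timestampMillis) {
            this.format = format;
            this.identifierSize = identifierSize;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Reads the header; all multi-byte values in the file are big-endian. */
    public static Header readHeader(DataInputStream in) throws IOException {
        ByteArrayOutputStream name = new ByteArrayOutputStream();
        int b;
        while ((b = in.readUnsignedByte()) != 0) { // version string ends with a 0 byte
            name.write(b);
        }
        int idSize = in.readInt();
        long timestamp = in.readLong();
        return new Header(new String(name.toByteArray(), StandardCharsets.US_ASCII),
                idSize, timestamp);
    }

    /** Walks the top-level records after the header and counts them per tag. */
    public static Map<Integer, Integer> countRecords(DataInputStream in) throws IOException {
        Map<Integer, Integer> counts = new TreeMap<>();
        int tag;
        while ((tag = in.read()) >= 0) {              // -1 means end of file
            in.readInt();                             // time offset, not needed here
            long length = in.readInt() & 0xFFFFFFFFL; // body length is unsigned
            skipFully(in, length);
            counts.merge(tag, 1, Integer::sum);
        }
        return counts;
    }

    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                // skip() may legitimately return 0, so fall back to reading one byte
                if (in.read() < 0) {
                    throw new EOFException("truncated record");
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(args[0])))) {
            Header header = readHeader(in);
            System.out.println(header.format + ", ID size " + header.identifierSize);
            System.out.println("Records per tag: " + countRecords(in));
        }
    }
}
```

Run it with the path to an .hprof file as the only argument. The classes, objects and GC roots listed above live inside the heap dump records, which this sketch only counts and does not decode.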


IBM System Dumps

    On IBM platforms one can preprocess a system dump (core file) from a Java process with the jxtract tool, and analyze the result with Memory Analyzer on an arbitrary other box (DTFJ libraries have to be additionally installed, see details below in the “How To Get a Heap Dump” section). As the core file contains the whole process memory, this kind of dump also provides all the details seen in an HPROF heap dump (including the field names, primitive fields' values, stack traces, etc.). There is even more information (e.g. process-related information), but at the moment it is not used in Memory Analyzer.

    IBM Portable Heap Dumps (PHD)

    The PHD files are much smaller in size than the corresponding system dumps. However, they contain less information.
    The major difference between the HPROF dumps (or the IBM system dumps) and PHD dumps is that a PHD dump does not contain the values of the primitive fields. Only the non-null references from an object are provided. The second important difference is that the field names are not present, i.e. one can’t distinguish from which field a reference is made, and because of this the presented reference chains (paths) are not as concrete as with the other dumps. Using just the object graph is still enough for the analysis of many memory-related problems, but when the content of some fields is needed to get an idea why an object is too big then one has to use the system dumps.
    Usually when a PHD dump is generated there is also a corresponding javacore file. If they are put together in the same directory when the PHD dump is opened with MAT, then some of the data in the javacore file will also be used.

    So, having less information has both advantages and disadvantages - the PHD dumps are much easier to transport from a customer (smaller size) and can still be used to find the biggest objects in the heap. As they are usually written by default, they are a good place to start the analysis. However, in some cases the information is not enough to analyze the root cause of a problem in detail.

    A Common API for Them All?

    Having different formats for the heap dumps is definitely easier for the VM providers, as they can provide the specific data they have very efficiently. This however doesn’t hold true for the tools, which are faced with the different formats, have to understand each of them, and possibly optimize for every format separately.
    An attempt to solve this problem and make the life of tool writers easier is being made under the Apache Kato project and the related JSR 326. They aim to provide a common API for accessing data from vendor-specific snapshots and thus give tools a standard way to extract the data needed for post-mortem diagnostics (including memory-related problems).

    How To Get a Heap Dump

    How to obtain a heap dump depends on the platform and the used JVM. In general all VMs provide the possibility to request a heap dump manually, or to get one written from the VM when an OutOfMemoryError occurs. The second option is very convenient for the analysis of problems happening on production systems, or happening only sporadically, as one does not have to observe the system and wait for the problem to reoccur.
    A detailed description of how a heap dump can be obtained, depending on the JVM, is provided here.
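As one concrete illustration of requesting a dump manually, here is a small sketch that asks the running JVM to write an HPROF binary dump of itself. It assumes a HotSpot-based JVM that provides `com.sun.management.HotSpotDiagnosticMXBean` (Java 7 or later for the lookup used here); IBM JVMs work differently and are not covered by it. The class name `HeapDumper` and the default file name are made up for the example. For the automatic variant, HotSpot-based JVMs accept the `-XX:+HeapDumpOnOutOfMemoryError` option on the command line.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HeapDumper {

    /**
     * Writes an HPROF binary heap dump of the current JVM to the given file.
     *
     * @param file target file; recent JDKs expect the name to end with ".hprof"
     * @param live if true, only objects reachable at dump time are written
     */
    public static void dump(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // dumpHeap refuses to overwrite an existing file, so remove any old one first
        Files.deleteIfExists(file);
        bean.dumpHeap(file.toAbsolutePath().toString(), live);
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get(args.length > 0 ? args[0] : "manual-dump.hprof");
        dump(target, true);
        System.out.println("Heap dump written to " + target.toAbsolutePath());
    }
}
```

The resulting file can be opened directly in MAT. Passing `live = false` also keeps objects that are no longer reachable, which is relevant for the question about dead objects further down.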

    Object Identity

    One of the questions which we are asked very often is whether MAT can recognize the same objects in two or more heap dumps from the same process. The answer is unfortunately still no. The object IDs provided in the heap dump formats known to us are just the addresses at which the objects are located. As objects are often moved and reordered by the JVM during a GC, these addresses change. Therefore they cannot be used to compare the objects. Tagging the objects while they are allocated is something a profiler could do (usually at a relatively high cost), but in the standard heap dumps described above such information is missing. Some ideas on how to guess identical objects were discussed in this bugzilla entry.

    Are Dead Objects Present In the Heap Dump?

    Another question which often pops up is if garbage objects are included in the heap dump. This again depends on the heap dump, but usually a GC is done before the heap dump is written. Nevertheless there are always some objects which are unreachable from the GC roots, i.e. should be thrown away. The Memory Analyzer removes such objects during the initial parsing of a heap dump in order to simplify the analysis. If you want to have a look at the “garbage” or even want the objects to remain, then find here what to do.
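The idea of "unreachable from the GC roots" can be sketched in a few lines. The example below models a heap as a map from object IDs to the IDs they reference and walks it breadth-first from the roots; whatever is never visited is garbage. This is only an illustration of the concept with made-up names (`ReachabilitySketch`, `reachable`, `unreachable`), not the way the Memory Analyzer implements its parsing.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReachabilitySketch {

    /** Returns the IDs of all objects reachable from the given GC roots. */
    public static Set<Long> reachable(Map<Long, List<Long>> outgoing, Set<Long> gcRoots) {
        Set<Long> seen = new HashSet<>(gcRoots);
        Deque<Long> work = new ArrayDeque<>(gcRoots);
        while (!work.isEmpty()) {
            Long current = work.poll();
            List<Long> targets = outgoing.getOrDefault(current, Collections.<Long>emptyList());
            for (Long target : targets) {
                if (seen.add(target)) { // add() is false if we visited it already
                    work.add(target);
                }
            }
        }
        return seen;
    }

    /** Returns the IDs of all objects in the graph that no GC root can reach. */
    public static Set<Long> unreachable(Map<Long, List<Long>> outgoing, Set<Long> gcRoots) {
        Set<Long> all = new HashSet<>(outgoing.keySet());
        for (List<Long> targets : outgoing.values()) {
            all.addAll(targets);
        }
        all.removeAll(reachable(outgoing, gcRoots));
        return all;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> graph = new HashMap<>();
        graph.put(1L, Arrays.asList(2L, 3L));
        graph.put(2L, Arrays.asList(3L));
        graph.put(3L, Collections.<Long>emptyList());
        // 4 and 5 reference each other, but nothing reachable from a root points to them
        graph.put(4L, Arrays.asList(5L));
        graph.put(5L, Arrays.asList(4L));

        Set<Long> roots = Collections.singleton(1L);
        System.out.println("Garbage: " + unreachable(graph, roots));
    }
}
```

Note that objects 4 and 5 still hold references to each other, yet both count as garbage because no path from a GC root leads to them.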

    In Closing ...
    This was my attempt to give a detailed explanation of the different heap dump formats which the Memory Analyzer understands, and also to answer some of the questions which we frequently get. I'm sure there are still questions to be answered, and the MAT team will be very happy to get them from you, be it as comments here, in our forum, or in bugzilla.

    Comments:

    1. the link for "detailed specification of the content of an HPROF file" https://heap-snapshot.dev.java.net/files/documents/4282/31543/hprof-binary-format.html is BROKEN

      1. I replaced the link with a different one, which is working

    2. Nice post. I really enjoy reading it. Thanks for sharing.

    3. Nice post informative. One question as how to read on the current thread activity any possibilities via MAT?
