Thursday, February 25, 2010

Heap Dump Analysis with Memory Analyzer, Part 2: Shallow Size

In the second part of the blog series dedicated to heap dump analysis with Memory Analyzer (see previous post here) I will have a detailed look at the shallow size of objects in the heap dump. In Memory Analyzer the term shallow size means the size of an object itself, without counting and accumulating the size of other objects referenced from it. For example the shallow size of an instance of java.lang.String will not include the memory needed for the underlying char[].

At first look this seems like a clear definition and a relatively boring topic to read about. So why did I decide to write about it? Because despite the understandable definition, it is not always straightforward (for the tool developers) to calculate the shallow size, or (for a user) to understand how the size was calculated. The reasons? Different JVM vendors, different pointer sizes (32 / 64 bit), different dump formats, insufficient data in some heap dumps, etc. These factors can lead to small differences in the shallow sizes displayed for objects of the same type, and thus to questions.

Is it really important to know the precise size? Not necessarily. If you got a heap dump from an OutOfMemoryError in your production system, and MAT helps you to easily find the leak suspect there (let's say it is some object of 500 MB), then the shallow size of every individual object accumulated in the suspect's size doesn't really matter. The suspect is clear and you can go on and try to fix the problem.

On the other hand, if you are trying to understand the impact of adding some fields to your “base” classes, then the size of the individual instance can be of interest.

In the rest of the post I will look at the information available (or missing) in the different snapshot formats, explain what MAT displays as shallow size in the different cases, and try to answer some of the questions about the shallow size which we usually get. If you are interested, read further.

As I mentioned already, the various snapshot formats contain different pieces of information about the objects. I will look at each of them separately, and additionally differentiate between object instances and classes. For more information on the different heap dump formats see part one of the blog series.

Instance Size in HPROF Heap Dumps

The heap dumps in HPROF binary format do not provide the correct size of each instance. What they provide is the number of bytes used to store the necessary data in the heap dump, but not the number of bytes the VM really needs to store the instance in the heap. Therefore, in MAT we have to (and attempt to) model how the VM would store the instance and how much memory it would need.

The approach we have to calculate the sizes is the following:

Instance Shallow Size = [object header] + space for fields of super class N + [some bytes because of alignment] + … + space for own fields + [some bytes because of alignment]
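To make the formula a bit more tangible, here is a minimal Java sketch of it. This is not MAT's actual code: the class `ShallowSizeSketch`, its parameters, and the concrete numbers (an 8-byte header, 4-byte padding after each class level, 8-byte object alignment, 4-byte references) are assumptions chosen purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Rough model of the formula above; all constants are illustrative assumptions. */
class ShallowSizeSketch {

    /** Rounds size up to the next multiple of alignment (alignment must be > 0). */
    static long alignUp(long size, long alignment) {
        return ((size + alignment - 1) / alignment) * alignment;
    }

    /**
     * @param headerBytes         assumed object header size
     * @param fieldBytesPerClass  bytes of instance fields declared by each class,
     *                            ordered from the topmost super class down to the class itself
     * @param fieldBlockAlignment assumed padding boundary after each class's fields
     * @param objectAlignment     assumed alignment boundary for the whole object
     */
    static long estimate(long headerBytes, List<Integer> fieldBytesPerClass,
                         long fieldBlockAlignment, long objectAlignment) {
        long size = headerBytes;
        for (int fieldBytes : fieldBytesPerClass) {
            // "space for fields of class i" + "[some bytes because of alignment]"
            size = alignUp(size + fieldBytes, fieldBlockAlignment);
        }
        return alignUp(size, objectAlignment);
    }

    public static void main(String[] args) {
        // java.lang.Object declares no fields; a Java 6 era String declares one
        // reference and three ints, i.e. 16 bytes when references take 4 bytes.
        long stringSize = estimate(8, Arrays.asList(0, 16), 4, 8);
        System.out.println("String-like instance: " + stringSize + " bytes");

        // A hypothetical class with a single byte field still fills an aligned slot.
        long tinySize = estimate(8, Arrays.asList(0, 1), 4, 8);
        System.out.println("Single-byte instance: " + tinySize + " bytes");
    }
}
```

Under these assumptions the String-like instance comes out at 24 bytes and the single-byte instance at 16 bytes. The char[] referenced by the String is not part of either number.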

The sizes originally provided in the HPROF file do not contain the object header and the additional space the VM uses to have the object addresses aligned in a certain way. These are exactly the parameters we have to guess on our own. Does this always work? Unfortunately not. In Bugzilla entry 231296 you can find some discussions on the topic, and also what the current state is. Here is just a short summary:

  • The formula we use gives correct results for dumps from 64 bit Sun VMs (1.4, 1.5, 1.6)

  • With the formula we use to calculate the sizes for dumps from 32 bit Sun VMs we observed correct results for 1.6 dumps and small deviations for a handful of objects in 1.5 dumps

  • For the special case of an x64 Sun VM with compressed OOPs we have no solution at the moment. We haven't found a way to tell from the HPROF file that the pointers were compressed
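To see why the pointer size matters so much for the guess, the following sketch compares one and the same hypothetical class under three VM configurations. The header and reference sizes used here (8/4 bytes for 32 bit, 16/8 bytes for 64 bit, 12/4 bytes for 64 bit with compressed OOPs) are commonly cited values for Sun VMs and are taken as assumptions; they do not come from the heap dump, and `PointerSizeSketch` is not part of MAT.

```java
/** Compares one hypothetical object layout under three assumed VM configurations. */
class PointerSizeSketch {

    /** Rounds size up to the next multiple of alignment (alignment must be > 0). */
    static long alignUp(long size, long alignment) {
        return ((size + alignment - 1) / alignment) * alignment;
    }

    /** Header plus reference fields plus primitive bytes, padded to 8 bytes. */
    static long estimate(int headerBytes, int referenceBytes,
                         int referenceFields, int primitiveBytes) {
        long raw = headerBytes + (long) referenceBytes * referenceFields + primitiveBytes;
        return alignUp(raw, 8);
    }

    public static void main(String[] args) {
        int refs = 2;       // hypothetical class with two reference fields
        int primitives = 4; // and one int field

        System.out.println("32 bit:                  " + estimate(8, 4, refs, primitives));
        System.out.println("64 bit:                  " + estimate(16, 8, refs, primitives));
        System.out.println("64 bit, compressed OOPs: " + estimate(12, 4, refs, primitives));
    }
}
```

With these assumed values the same instance takes 24 bytes on 32 bit, 40 bytes on plain 64 bit, and 24 bytes with compressed OOPs. A tool that cannot tell the last two cases apart will therefore be off by a noticeable amount per object.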

Instance Size in IBM System Dumps (read with DTFJ)

DTFJ already provides the correct instance size for the objects, so in MAT we do not have to do any guessing: the instance sizes are correct. What needs to be mentioned is that two instances of the same class in the same heap dump may have different shallow sizes. Until recently the Memory Analyzer was not prepared to handle such a case. More information on when exactly such a difference can appear, and some discussion of the necessary changes, can be found in Bugzilla entry 301228.

Instance Size in PHD Dumps

The sizes provided in PHD dumps (Portable Heap Dumps from IBM JVMs) are also correct, and MAT just displays them without any further computation.

Class Size in HPROF

The HPROF format does not provide information about the memory needed for a class (for bytecode, for jitted code, etc.). For every class the Memory Analyzer will show as shallow size the sum of the shallow sizes of all static fields of the class.
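As a rough illustration of that sum, here is a small sketch. The primitive widths are the ones the HPROF format uses for basic types, and a reference takes the identifier size of the dump. The class `StaticFieldSizeSketch` and the example field list are hypothetical, and the sketch adds no padding; whether MAT pads the sum is not covered in this post.

```java
import java.util.Arrays;
import java.util.List;

/** Sums the slot sizes of a class's static fields; a sketch, not MAT's code. */
class StaticFieldSizeSketch {

    /** Field kinds with the byte widths of the HPROF basic types. */
    enum FieldType {
        BOOLEAN(1), BYTE(1), CHAR(2), SHORT(2), INT(4), FLOAT(4), LONG(8), DOUBLE(8),
        REFERENCE(-1); // marker: width depends on the identifier size of the dump

        final int bytes;

        FieldType(int bytes) {
            this.bytes = bytes;
        }
    }

    /** Adds up the widths of all static fields, using referenceBytes for references. */
    static long classShallowSize(List<FieldType> staticFields, int referenceBytes) {
        long total = 0;
        for (FieldType type : staticFields) {
            total += (type == FieldType.REFERENCE) ? referenceBytes : type.bytes;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical class with four static fields.
        List<FieldType> statics = Arrays.asList(
                FieldType.REFERENCE, FieldType.INT, FieldType.LONG, FieldType.BOOLEAN);

        System.out.println("4-byte identifiers: " + classShallowSize(statics, 4));
        System.out.println("8-byte identifiers: " + classShallowSize(statics, 8));
    }
}
```

For this field list the plain sum is 17 bytes with 4-byte identifiers and 21 bytes with 8-byte identifiers, which shows how small such a "class size" is compared to the bytecode and jitted code that the dump does not describe.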

Class Size in IBM System Dumps

DTFJ provides more information about class sizes. The shallow size for classes reported by MAT includes the size of all methods (bytecode and jitted code sections) and also the on-heap size of the java.lang.Class object.

Class Size in PHD Dumps

The PHD dumps do not contain information about the method sizes. The shallow size for classes in PHD dumps is just the size of the java.lang.Class object.

Shallow Size of a Set of Objects

The "Shallow Size" column appears in many views where objects have been aggregated in groups based on different criteria. The shallow size of a set of objects is just the sum of the shallow sizes of the individual objects in the set. There are two things to mention here, which have raised questions in the past:

  • In a class histogram (i.e. when objects are aggregated based on their class) the table may contain more than one entry with the same class name. This happens if the same class is loaded by more than one class loader. If one is interested in the total shallow size of the instances of all classes with that name, one can filter the histogram by class name and sum up the sizes

  • The classes themselves are added to the record for java.lang.Class. This means that the shallow size in the histogram entry for java.lang.Class is the sum of the shallow sizes of all classes (calculated as described above in the "Class Size ..." paragraphs)
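The first point can be sketched in a few lines of Java. The `Row` type, the loader names and the sizes below are made up for illustration and are not MAT's data model; the sketch only shows that rows with the same class name but different class loaders have to be added up by hand.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sums histogram rows that share a class name across class loaders. */
class HistogramSumSketch {

    /** One histogram row: a class as loaded by one particular class loader. */
    static final class Row {
        final String className;
        final String classLoader;
        final long shallowSize; // summed shallow size of this row's instances

        Row(String className, String classLoader, long shallowSize) {
            this.className = className;
            this.classLoader = classLoader;
            this.shallowSize = shallowSize;
        }
    }

    /** Groups rows by class name only and adds up their shallow sizes. */
    static Map<String, Long> totalByClassName(List<Row> rows) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Row row : rows) {
            totals.merge(row.className, row.shallowSize, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical histogram: com.example.Order was loaded by two loaders.
        List<Row> histogram = Arrays.asList(
                new Row("com.example.Order", "loader A", 2400L),
                new Row("java.lang.String", "system", 48000L),
                new Row("com.example.Order", "loader B", 1200L));

        for (Map.Entry<String, Long> entry : totalByClassName(histogram).entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue() + " bytes");
        }
    }
}
```

Here the two com.example.Order rows add up to 3600 bytes, while the single java.lang.String row stays at 48000 bytes.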

Summary

My personal view is that if one is using the Memory Analyzer to find the root cause of an OutOfMemoryError, then the shallow sizes of the individual objects are not that important.

If for a given purpose one needs to understand in detail the sizes of objects, then it is important to remember that they depend on the concrete JVM and heap dump type, and that in some cases the displayed sizes are not given by the VM but are calculated in MAT. I hope that the short overview given in this blog helps in understanding these details.

What Comes Next?

In the next post I plan to write again about size, but a different one: the retained (or keep alive) size of objects and object sets.