Source: http://www.ibm.com/developerworks/rational/library/06/0822_satish-giridhar/
Introduction
Most programmers agree that defects related to incorrect memory usage
and management are the hardest to isolate, analyze, and fix. Therefore,
they are the costliest defects to have in your programs. These defects
are typically caused by using uninitialized memory, using un-owned
memory, buffer overruns, or faulty heap management.
IBM® Rational® Purify® is an advanced memory usage error detecting
tool that enables software developers and testers to detect memory
errors in C and C++ programs. While a program runs, Purify collects and
analyzes data to accurately identify memory errors that are about to
happen. It provides detailed information, such as the error location
(function call stack) and size of the affected memory, to assist you in
quickly locating the problem areas. It also greatly reduces debugging
time and complexity, so you can focus on fixing the flaw in the
application logic that is causing the error.
Purify is available for all prominent platforms, including IBM® AIX®
on PowerPC®, HP-UX® on PA-RISC, Linux™ on x86 and x86/64, Sun™ Solaris™
on SPARC®, and Microsoft® Windows® on x86 (check the documentation for
an updated list of supported platforms). In this article, you will first
learn about various types of memory access errors with the help of
examples, and then learn how to use Purify for detecting and fixing
those errors. In the Download section, you will find the C source file
(memerrors.c) with the code samples in this article, and you can use
them to experiment with Purify.
Memory errors
Memory errors can be broadly classified into four categories:
- Using memory that you have not initialized
- Using memory that you do not own
- Using more memory than you have allocated (buffer overruns)
- Using faulty heap memory management
Purify detects errors in all of these categories and identifies the
type of the error within a category. Understanding the types of
errors helps you identify and isolate subtle mistakes in your program
that may cause the program to act strangely and unpredictably. In the
rest of this section, various error types are explained using code
samples.
Using memory that you have not initialized
When you read from memory that you forgot to initialize, you get a
garbage value. This error looks deceptively innocent, but it has
the potential to cause mysterious program behavior.
The garbage value that you get could fortuitously happen to be a
meaningful value that your program can handle. For example, some
operating systems initialize a memory block with zeros when it is
allocated for the first time. If zero is a meaningful value for your
program, it may run smoothly, initially. However, after the program runs
for a while, the memory might be freed and reallocated. When a memory
block is recycled, it has the values that were stored in it when it was
last used. These values are unpredictable. Depending upon the value,
your program may crash immediately, may run for a while and crash
sometime later, or may run smoothly but produce strange results. Since
the value could be different in each run, the behavior of the program
can be baffling, making it hard to reproduce the problem consistently.
Purify detects such errors and reports an Uninitialized Memory Read
(UMR) error for every use of uninitialized memory. It further
differentiates between using uninitialized memory and copying a value
from an uninitialized memory location to another memory location. When
uninitialized memory is copied, Purify reports an Uninitialized Memory
Copy (UMC) error. After the copying, the destination location also
holds uninitialized memory; therefore, whenever this memory is used,
Purify reports a UMR.
Listing 1 shows a simple example. There are two integers: i and j. The
integer i is initialized with 10. Then the value of j is copied into i.
Since j has not been initialized, i also has a garbage value after j is
copied into it. Purify maintains the status of each memory location, so
its analysis reveals that, although i has been initialized with 10,
copying an uninitialized value has made i uninitialized as well.
Therefore, Purify reports any usage of i (for example, as an argument to
printf in the next line) as a UMR error.
Listing 1. An example of UMR and UMC errors
void uninit_memory_errors() {
    int i=10, j;
    i = j; /* UMC: j is uninitialized, copied into i */
    printf("i = %d\n", i); /* UMR: Using i, which has junk value */
}
This example is intentionally trivial to make it easy for you to
identify the problem just by inspecting the code. But real-world
applications have many thousands of lines of code and complex control
flow. The location where a valid value is corrupted by copying a
garbage value into it could be in a different function, and potentially
in a different sub-system or library. If you inspect the bar method in
Listing 2 and you do not know much about the foo method, you would not
suspect that i would be corrupted after calling foo. Depending upon the
size and complexity of the source code, you may have to spend
considerable time and effort to analyze and then to rectify this type
of defect. Purify eliminates this effort by reporting a UMR wherever an
uninitialized memory value is used.
Listing 2. Another example of UMR and UMC errors
void foo(int *pi) {
    int j;
    *pi = j; /* UMC: j is uninitialized, copied into *pi */
}
void bar() {
    int i=10;
    foo(&i);
    printf("i = %d\n", i); /* UMR: Using i, which is now junk value */
}
As you notice, whenever a memory location with a UMC error is finally
used, Purify reports a UMR error for that same memory location. UMC
errors may not always be critical, and Purify hides them by default.
Later in this article, you will learn how to see UMC and other errors
that Purify hides.
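For comparison, a corrected counterpart of Listing 2 might look like the following minimal sketch. The names foo_fixed and bar_fixed are hypothetical and are not part of memerrors.c, and the sketch assumes that 0 is an acceptable default for j. Because j is given a value before it is copied, neither a UMC nor a UMR would be expected here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical corrected counterpart of foo() from Listing 2:
   the local is initialized before its value is copied out. */
void foo_fixed(int *pi) {
    int j = 0;            /* assumption: 0 is a sensible default */
    assert(pi != NULL);   /* the caller must pass a valid address */
    *pi = j;              /* copies a defined value, so no UMC */
}

/* Hypothetical counterpart of bar(); returns the value it prints
   so that a caller can check it. */
int bar_fixed(void) {
    int i = 10;
    foo_fixed(&i);
    printf("i = %d\n", i); /* i holds a defined value, so no UMR */
    return i;
}
```

Initializing at the point of declaration is the simplest way to make sure that no code path can copy the variable before it has a value.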
Using memory that you don't own
Explicit memory management and pointer arithmetic present
opportunities for designing compact and efficient programs. However,
incorrect use of these features can lead to complex defects, such as a
pointer referring to memory that you don't own. In this case, too,
reading memory through such pointers may give a garbage value or cause
segmentation faults and core dumps, and using garbage values can cause
unpredictable program behavior or crashes.
Purify detects these errors. In addition to reporting the type of
error, Purify indicates the memory area that the pointer refers to
and where that memory has been allocated. This is typically a good clue
for identifying the cause of the error. This category includes the
following types of errors:
- Null pointer read or write (NPR, NPW)
- Zero page read or write (ZPR, ZPW)
- Invalid pointer read or write (IPR, IPW)
- Free memory read or write (FMR, FMW)
- Beyond stack read or write (BSR, BSW)
Null Pointer Read/Write (NPR, NPW) and Zero Page Read/Write (ZPR, ZPW):
If a pointer's value can potentially be null (NULL), the pointer
should not be de-referenced without checking it for being null. For
example, a call to malloc can return a null result if no memory is
available, so before using the pointer returned by malloc, you need to
check it to make sure that it isn't null. Similarly, a linked list or
tree traversal algorithm needs to check whether the next node or child
node is null.
It is common to forget these checks. Purify detects any memory access
through de-referencing a null pointer, and reports an NPR or NPW error.
When you see this error, examine whether you need to add a null pointer
check or whether you wrongly assumed that your program logic guaranteed
a non-null pointer. On AIX, HP-UX, and under some linker options on
Solaris, dereferencing a null pointer produces a zero value, not a
segmentation fault signal.
Memory is divided into pages, and it is "illegal" to read from or
write to a memory location on the zeroth page. This error is
typically due to a null pointer or incorrect pointer arithmetic
computations. For example, if you have a null pointer to a structure and
you attempt to access various fields of that structure, it will lead to
a zero page read error, or ZPR.
Listing 3 shows a simple example of both NPR and ZPR problems. The
findLastNodeValue method has a defect, in that it does not check whether
the head parameter is null. NPR and ZPR errors occur when the next and
val fields are accessed, respectively.
Listing 3. An example of NPR and ZPR errors
typedef struct node {
    struct node* next;
    int val;
} Node;
int findLastNodeValue(Node* head) {
    while (head->next != NULL) { /* Expect NPR */
        head = head->next;
    }
    return head->val; /* Expect ZPR */
}
void genNPRandZPR() {
    int i = findLastNodeValue(NULL);
}
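One way to add the missing null check to Listing 3 is sketched below. The function name findLastNodeValueChecked and its out-parameter convention are assumptions made for illustration; the original article does not prescribe a particular fix. The function reports an empty list through its return value instead of de-referencing a null head.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
    int val;
} Node;

/* Hypothetical checked variant of findLastNodeValue().
   Returns 1 and stores the last node's value in *out,
   or returns 0 and leaves *out untouched when the list is empty. */
int findLastNodeValueChecked(const Node *head, int *out) {
    assert(out != NULL);          /* programming error if violated */
    if (head == NULL) {
        return 0;                 /* empty list: nothing to read */
    }
    while (head->next != NULL) {  /* head is known to be non-null here */
        head = head->next;
    }
    *out = head->val;
    return 1;
}
```

Returning a status separately from the value avoids reserving a special integer to mean "no node", which matters when every int is a legitimate node value.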
Invalid Pointer Read or Write (IPR, IPW):
Purify tracks all memory operations. When it detects a pointer to a
memory location that has not been allocated to the program, it reports
either an IPR or IPW error, depending on whether it was a read or write
operation. The error can happen for multiple reasons. For example, you
will get this type of error if you have an uninitialized pointer
variable and the garbage value happens to be invalid. As another
example, suppose you wanted to write *pi = i;, where pi is a pointer to
an integer and i is an integer, but by mistake you omitted the * and
wrote just pi = i;. With the help of implicit casting, an integer value
is copied as a pointer value. When you dereference pi again, you may
get an IPR or IPW error. This can also happen when pointer arithmetic
results in an invalid address, even when it is not on the zeroth page.
(See Listing 4.)
Listing 4. An example of IPR and IPW errors
void genIPR() {
    int *ipr = (int *) malloc(4 * sizeof(int));
    int i, j;
    i = *(ipr - 1000); j = *(ipr + 1000); /* Expect IPR */
    free(ipr);
}
void genIPW() {
    int *ipw = (int *) malloc(5 * sizeof(int));
    *(ipw - 1000) = 0; *(ipw + 1000) = 0; /* Expect IPW */
    free(ipw);
}
IPR and IPW errors are commonly encountered in 64-bit applications when
using functions that return a pointer (for example, malloc), because a
pointer is 8 bytes long and an integer is 4 bytes long. If the method
declaration is not included, the compiler assumes that the method
returns an integer, implicitly casts the return value, and retains only
the lower 4 bytes of the pointer value. Purify reports IPR and IPW upon
using this invalid pointer. (See Listing 5.)
Listing 5. Another example of IPR and IPW errors
/*Forgot to include following in a 64-bit application:
#include <malloc.h>
#include <stdlib.h>
*/
void illegalPointer() {
    int *pi = (int*) malloc(4 * sizeof(int));
    pi[0] = 10; /* Expect IPW */
    printf("Array value = %d\n", pi[0]); /* Expect IPR */
}
Free Memory Read or Write (FMR, FMW):
When you use malloc or new, the operating system allocates memory from
the heap and returns a pointer to the location of that memory. When you
don't need this memory anymore, you de-allocate it by calling free or
delete. After de-allocation, the memory at that location should not be
accessed again.
However, you may have more than one pointer in your program pointing
to the same memory location. For instance, while traversing a linked
list, you may have a pointer to a node, but a pointer to that node is
also stored as next in the previous node. Therefore, you have two
pointers to the same memory block. Upon freeing that node, these
pointers become heap dangling pointers, because they point to memory
that has already been freed. Another common cause of this error is
misuse of the realloc method, which may move the block and free the
original one. (See Listing 6.)
The heap management system may respond to another malloc call in the
same program and allocate this freed memory to other, unrelated
objects. If you use a dangling pointer and access the memory through
it, the behavior of the program is undefined. It may result in strange
behavior or a crash. The value read from that location would be
completely unrelated garbage. If you modify memory through a dangling
pointer, and that value is later used for its intended purpose in an
unrelated context, the behavior will be unpredictable. Of course,
either an uninitialized pointer or incorrect pointer arithmetic can
also result in pointing to already freed heap memory.
Listing 6. An example of FMR and FMW errors
int* init_array(int *ptr, int new_size) {
    ptr = (int*) realloc(ptr, new_size*sizeof(int));
    memset(ptr, 0, new_size*sizeof(int));
    return ptr;
}
int* fill_fibonacci(int *fib, int size) {
    int i;
    /* oops, forgot: fib = */ init_array(fib, size);
    /* fib[0] = 0; */ fib[1] = 1;
    for (i=2; i<size; i++)
        fib[i] = fib[i-1] + fib[i-2];
    return fib;
}
void genFMRandFMW() {
    int *array = (int*)malloc(10);
    fill_fibonacci(array, 3);
}
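The defect in Listing 6 is that the pointer returned by realloc is discarded, so the caller keeps using the old, possibly freed block. A corrected version might look like the sketch below. The names init_array_fixed and fill_fibonacci_fixed are hypothetical, and the choice to return NULL on allocation failure is an assumption about how the caller wants to handle errors.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed variant of init_array(): resizes ptr to hold
   new_size ints and zero-fills it. On failure it returns NULL and
   the original block is left untouched (the caller still owns it). */
int *init_array_fixed(int *ptr, size_t new_size) {
    int *resized;
    assert(new_size > 0);
    resized = realloc(ptr, new_size * sizeof *resized);
    if (resized == NULL) {
        return NULL;
    }
    memset(resized, 0, new_size * sizeof *resized);
    return resized;
}

/* Hypothetical fixed variant of fill_fibonacci(): always continues
   with the pointer returned by the resize, never the old one. */
int *fill_fibonacci_fixed(int *fib, size_t size) {
    size_t i;
    int *resized;
    assert(size >= 2);
    resized = init_array_fixed(fib, size);
    if (resized == NULL) {
        return NULL;
    }
    fib = resized;   /* the old value of fib may now be dangling */
    fib[0] = 0;
    fib[1] = 1;
    for (i = 2; i < size; i++) {
        fib[i] = fib[i - 1] + fib[i - 2];
    }
    return fib;
}
```

The caller has to follow the same rule and replace its own copy of the pointer with the returned one, for example array = fill_fibonacci_fixed(array, 10);.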
Beyond Stack Read or Write (BSR, BSW):
If the address of a local variable in a function is directly or
indirectly stored in a global variable, in a heap memory location, or
somewhere in the stack frame of an ancestor function in the call chain,
it becomes a stack dangling pointer upon returning from the function.
When a stack dangling pointer is de-referenced to read from or write to
the memory location, it accesses memory outside of the current stack
boundaries, and Purify reports a BSR or BSW error. Uninitialized pointer
variables or incorrect pointer arithmetic can also result in BSR or BSW
errors.
In the example in Listing 7, the append method returns the address of
a local variable. Upon returning from that method, the stack frame for
the method is freed, and the stack boundary shrinks. Now the returned
pointer is outside the stack bounds. If you use that pointer, Purify
will report a BSR or BSW error. In the example, you would expect
append("IBM ", append("Rational ", "Purify")) to return
"IBM Rational Purify", but it returns garbage, manifesting BSR and BSW
errors.
Listing 7. An example of BSR and BSW errors
char *append(const char* s1, const char *s2) {
    const int MAXSIZE = 128;
    char result[128];
    int i=0, j=0;
    for (j=0; i<MAXSIZE-1 && j<strlen(s1); i++,j++) {
        result[i] = s1[j];
    }
    for (j=0; i<MAXSIZE-1 && j<strlen(s2); i++,j++) {
        result[i] = s2[j];
    }
    result[i] = '\0'; /* terminate right after the last copied char */
    return result; /* returns the address of a local array */
}
void genBSRandBSW() {
    char *name = append("IBM ", append("Rational ", "Purify"));
    printf("%s\n", name); /* Expect BSR */
    *name = '\0'; /* Expect BSW */
}
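A common way to avoid returning the address of a local array, as append does in Listing 7, is to build the result in heap memory that outlives the call. The sketch below illustrates that approach; the name append_heap is hypothetical, and making the caller responsible for calling free is a design assumption rather than something the article specifies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap-based variant of append(): returns a newly
   allocated string holding s1 followed by s2, or NULL if the
   allocation fails. The caller must free() the result. */
char *append_heap(const char *s1, const char *s2) {
    size_t n1, n2;
    char *result;
    assert(s1 != NULL && s2 != NULL);
    n1 = strlen(s1);
    n2 = strlen(s2);
    result = malloc(n1 + n2 + 1);        /* +1 for the terminator */
    if (result == NULL) {
        return NULL;
    }
    memcpy(result, s1, n1);
    memcpy(result + n1, s2, n2 + 1);     /* also copies the '\0' of s2 */
    return result;
}
```

With this variant, the nested call from Listing 7 would leak the inner string, so the intermediate result should be stored in a variable and freed once the outer call has returned.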
Using memory that you haven't allocated, or buffer overruns
When you don't do a boundary check correctly on an array, and then
you go beyond the array boundary while in a loop, that is called a
buffer overrun. Buffer overruns are a very common programming error
resulting from using more memory than you have allocated. Purify can
detect buffer overruns in arrays residing in heap memory, and it
reports them as array bound read (ABR) or array bound write (ABW)
errors. (See Listing 8.)
Listing 8. An example of ABR and ABW errors
void genABRandABW() {
    const char *name = "IBM Rational Purify";
    char *str = (char*) malloc(10);
    strncpy(str, name, 10);
    str[11] = '\0'; /* Expect ABW */
    printf("%s\n", str); /* Expect ABR */
}
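In Listing 8, strncpy fills all 10 bytes without writing a terminator, and the terminator is then written outside the block. A bounded copy that stays inside the allocation could be sketched as follows. The helper name copy_truncated and its truncating behavior are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates cap bytes and copies at most
   cap - 1 characters of src into them, always terminating the
   result inside the block. Returns NULL if allocation fails.
   The caller must free() the result. */
char *copy_truncated(const char *src, size_t cap) {
    char *dst;
    assert(src != NULL && cap > 0);
    dst = malloc(cap);
    if (dst == NULL) {
        return NULL;
    }
    strncpy(dst, src, cap - 1);  /* leaves room for the terminator */
    dst[cap - 1] = '\0';         /* last valid index, so no ABW */
    return dst;
}
```

Deriving every index from the same cap value that was passed to malloc keeps the copy length and the terminator position from drifting apart.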
Using faulty heap memory management
Explicit memory management in C and C++ programming puts the onus of
managing memory on the programmers. Therefore, you must be vigilant
while allocating and freeing heap memory. These are the common memory
management mistakes:
- Memory leaks and potential memory leaks (MLK, PLK, MPK)
- Freeing invalid memory (FIM)
- Freeing mismatched memory (FMM)
- Freeing non-heap memory (FNH)
- Freeing unallocated memory (FUM)
Memory leaks and potential memory leaks:
When all pointers to a heap memory block are lost, that is commonly
called a memory leak. With no valid pointer to that memory, there is no
way you can use or release that memory. You lose a pointer to a memory
block when you overwrite it with another address, when a pointer
variable goes out of scope, or when you free a structure or an array
that has pointers stored in it. Purify scans all of the memory and
reports all memory blocks without any pointers pointing to them as
memory leaks (MLK). In addition, it reports a block as a potential
leak, or PLK (called MPK on Windows platforms), when there are no
pointers to the beginning of the block but there are pointers to the
middle of the block.
Listing 9 shows a simple example of a memory leak and a heap dangling
pointer. In this example, interestingly, methods foo and main
independently seem to be error-free, but together they manifest both
errors. This example demonstrates that interactions between methods may
expose multiple flaws that you may not find simply by inspecting
individual functions. Real-world applications are very complex, so
inspecting them and analyzing the control flow and its consequences is
tedious and time-consuming. Using Purify gives you vital help in
detecting errors in such situations.
First, in the method foo, the pointer pi is overwritten with a new
memory allocation, and all pointers to the old memory block are lost.
This results in leaking the memory block that was allocated in method
main. Purify reports a memory leak (MLK) and specifies the line where
the leaked memory was allocated. It eliminates the slow process of
hunting down the memory block that is leaking, and therefore shortens
the debugging time. You can start debugging at the memory allocation
site where the leak is reported, and then track what you are doing with
that pointer and where you are overwriting it.
Later, the method foo frees up the memory it has allocated, but the
pointer pi still holds the address (it is not set to null). After
returning from method foo to main, when you use the pointer pi, it
refers to the memory that has already been freed, so pi becomes a
dangling pointer. Purify promptly reports an FMW error at that
location.
Listing 9. An example of a memory leak and a dangling pointer
int *pi;
void foo() {
    pi = (int*) malloc(8*sizeof(int)); /* Allocate memory for pi */
    /* Oops, leaked the old memory pointed by pi holding 4 ints */
    /* use pi */
    free(pi); /* foo() is done with pi, so free it */
}
int main(void) {
    pi = (int*) malloc(4*sizeof(int)); /* Expect MLK: foo leaks it */
    foo();
    pi[0] = 10; /* Expect FMW: oops, pi is now a dangling pointer */
}
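Both problems in Listing 9 come from foo reusing a pointer that it does not own and leaving a stale address behind after free. The sketch below shows one possible arrangement, under the assumption that foo only needs scratch space: it works on a local pointer and clears it after freeing. The names release and foo_fixed are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: frees the block and resets the caller's
   pointer, so a later use is an obvious null access rather than
   a silent access to freed memory. */
void release(int **pp) {
    assert(pp != NULL);
    free(*pp);       /* free(NULL) is a no-op, so repeated calls are safe */
    *pp = NULL;
}

/* Hypothetical fixed variant of foo(): uses its own local pointer
   instead of overwriting a pointer owned by the caller.
   Returns 1 on success and 0 if the allocation failed. */
int foo_fixed(void) {
    int *scratch = malloc(8 * sizeof *scratch);
    if (scratch == NULL) {
        return 0;
    }
    scratch[0] = 1;          /* use the scratch block */
    release(&scratch);       /* freed and cleared in one step */
    return scratch == NULL;
}
```

With this arrangement, the block allocated by the caller is still valid after foo_fixed returns, and the caller releases it exactly once when it is done.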
Listing 10 shows an example of a potential memory leak. After
incrementing pointer plk, it points to the middle of the memory block,
but there is no pointer pointing to the beginning of that memory block.
Therefore, a potential memory leak is reported at the memory allocation
site for that block.
Listing 10. An example of a potential memory leak
int *plk = NULL;
void genPLK() {
    plk = (int *) malloc(2 * sizeof(int)); /* Expect PLK */
    plk++;
}
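The potential leak in Listing 10 arises because the only pointer to the block is moved away from its start. A simple way to avoid this, sketched below, is to keep the base pointer unchanged and advance a separate cursor. The function sum_first_n and its return convention are hypothetical and serve only to give the cursor something to do.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example: allocates n ints, fills them with 1..n,
   and sums them by walking a separate cursor. The base pointer
   never moves, so it can be passed to free() afterwards.
   Returns the sum, or -1 if the allocation fails. */
long sum_first_n(size_t n) {
    int *base;
    int *cursor;
    size_t i;
    long total = 0;
    assert(n > 0);
    base = malloc(n * sizeof *base);
    if (base == NULL) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        base[i] = (int)(i + 1);
    }
    for (cursor = base; cursor < base + n; cursor++) {
        total += *cursor;     /* only the cursor advances */
    }
    free(base);               /* base still points to the block start */
    return total;
}
```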
Freeing invalid memory:
This error occurs whenever you attempt to free memory that you are
not allowed to free. This may happen for various reasons: allocating and
freeing memory through inconsistent mechanisms, freeing non-heap
memory (say, freeing a pointer that points to stack memory), or freeing
memory that you haven't allocated. When using Purify for the Windows
platform, all such errors are reported as freeing invalid memory (FIM).
On the UNIX® system, Purify further classifies these errors by
reporting freeing mismatched memory (FMM), freeing non-heap memory
(FNH), and freeing unallocated memory (FUM) to indicate the exact reason
for the error.
Freeing mismatched memory (FMM) is reported when a memory
location is de-allocated by using a function from a different family
than the one used for allocation. For example, you use the new operator
to allocate memory, but use the free method to de-allocate it. Purify
checks for the following families, or matching pairs:
- malloc() / free()
- calloc() / free()
- realloc() / free()
- operator new / operator delete
- operator new[] / operator delete[]
Purify reports any incompatible use of memory allocation and
de-allocation routines as an FMM error. In the example in Listing 11,
the memory was allocated using the malloc method but freed using the
delete operator, which is not the correct counterpart and is therefore
incompatible. Another common example of an FMM error is a C++ program
that allocates an array using the new[] operator, but frees the memory
using the scalar delete operator instead of the array delete[]
operator. These errors are hard to detect through code inspection,
because the memory allocation and de-allocation locations may not be
located close to each other, and because there is no difference in
syntax between an integer pointer and a pointer to an integer array.
Listing 11. An example of a freeing mismatched memory error
void genFMM() {
    int *pi = (int*) malloc(4 * sizeof(int));
    delete pi; /* Expect FMM/FIM: should have used free(pi); */
    pi = new int[5];
    delete pi; /* Expect FMM/FIM: should have used delete[] pi; */
}
A freeing non-heap memory (FNH) error is reported when you call free
with a non-heap address (a stack address, for instance). Freeing
unallocated memory (FUM) is reported when you try to free unallocated
memory, such as memory that you have already freed, or when the pointer
you are trying to free points to the middle of a memory block. Listing
12 shows examples of these errors.
Listing 12. Examples of freeing non-heap memory and freeing unallocated memory errors
void genFNH() {
    int fnh = 0;
    free(&fnh); /* Expect FNH: freeing stack memory */
}
void genFUM() {
    int *fum = (int *) malloc(4 * sizeof(int));
    free(fum+1); /* Expect FUM: fum+1 points to middle of a block */
    free(fum);
    free(fum); /* Expect FUM: freeing already freed memory */
}