View topic as text - Computer Science Forum (http://bbs.xml.org.cn/index.asp) -- "C/C++ Programming Ideas" (http://bbs.xml.org.cn/list.asp?boardid=61) ---- UNDERSTANDING POINTERS (for beginners) (http://bbs.xml.org.cn/dispbbs.asp?boardid=61&rootid=&id=15530)
-- Author: awt -- Posted: 3/14/2005 11:14:00 AM --

UNDERSTANDING POINTERS (for beginners)
by Ted Jensen
Version 0.0
This material is hereby placed in the public domain.
September 5, 1993

In telecommunication conferences on C I have noticed that one of the most difficult problems for beginners was the understanding of pointers. After writing dozens of short messages in attempts to clear up various fuzzy aspects of dealing with pointers, I set up a series of messages arranged in "chapters" which I could draw from or email to various individuals who appeared to need help in this area.

I then posted the whole series to a conference. It was so well received that I decided to clean it up a little and submit it for inclusion in Bob Stout's SNIPPETS file.

I hope to revise this text in the future. To that end, I am hoping that those who read this and find where it is lacking, or in error, or unclear, would notify me of same so that in the next version, should there be one, I can correct these deficiencies.

Many people's messages about pointers in various nets contributed to my knowledge in this area. So, I will just say Thanks to All.

I can be contacted via the echo itself, by email, or at:

    P.O. Box 324
    Redwood City, CA 94064

CHAPTER 1: What is a pointer?

One of the things beginners in C find hardest to understand is the concept of pointers. The purpose of this document is to provide an introduction to pointers and their use to these beginners.

Often the main reason beginners have a problem with pointers is that they have a weak or minimal feeling for variables (as they are used in C). Thus we start with a discussion of C variables in general.

A variable in a program is something with a name, the value of which can vary. The way the compiler and linker handle this is to assign a specific block of memory within the computer to hold the value of that variable. The size of that block depends on the range over which the variable is allowed to vary. For example, on a PC with a 16-bit compiler (the environment assumed throughout this text) the size of an integer variable is 2 bytes, and that of a long integer is 4 bytes. In C the size of a variable type such as an integer need not be the same on all types of machines.
When we declare a variable we inform the compiler of two things, the name of the variable and the type of the variable. For example, we declare a variable of type integer with the name k by writing:

    int k;

On seeing the "int" part of this statement the compiler sets aside 2 bytes (on a PC) of memory to hold the value of the integer. It also sets up a symbol table. And in that table it adds the symbol k and the address in memory where those 2 bytes were set aside.

Thus, if we later write:

    k = 2;

we expect that, at run time, the value 2 will be placed in the memory location reserved for the storage of the value of k.

In a sense there are two "values" associated with k, one being the value of the integer stored there (2 in the above example) and the other being the "value" of the memory location where it is stored, i.e. the address of k. Some texts refer to these two values with the nomenclature rvalue (right value, pronounced "are value") and lvalue (left value, pronounced "el value") respectively.

The lvalue is the value permitted on the left side of the assignment operator '=' (i.e. the address where the result of evaluation of the right side ends up). The rvalue is that which is on the right side of the assignment statement, the '2' above. Note that rvalues cannot be used on the left side of the assignment statement. Thus:

    2 = k;

is illegal.

Okay, now consider:

    int j, k;
    k = 2;
    j = 7;    <-- line 1
    k = j;    <-- line 2

In the above, the compiler interprets the j in line 1 as the address of the variable j (its lvalue) and creates code to copy the value 7 to that address. In line 2, however, the j is interpreted as its rvalue (since it is on the right hand side of the assignment operator '='). That is, here the j refers to the value _stored_ at the memory location set aside for j, in this case 7. So, the 7 is copied to the address designated by the lvalue of k.

Since we are using 2 byte integers in these examples, all copying of rvalues from one storage location to the other is done by copying 2 bytes. Had we been using long integers, we would be copying 4 bytes.

Now, suppose we want a variable designed to hold an lvalue (an address). The size required to hold such a value depends on the system. On older desktop computers with 64K of memory total, the address of any point in memory can be contained in 2 bytes. Computers with more memory would require more bytes to hold an address.
Some computers, such as the IBM PC, might require special handling to hold a segment and offset under certain circumstances. The actual size required is not too important so long as we have a way of informing the compiler that what we want to store is an address.

Such a variable is called a "pointer variable" (for reasons which will hopefully become clearer a little later). In C when we define a pointer variable we do so by preceding its name with an asterisk. In C we also give our pointer a type which, in this case, refers to the type of data stored at the address we will be storing in our pointer. For example, consider the variable definition:

    int *ptr;

ptr is the _name_ of our variable (just as 'k' was the name of our integer variable). The '*' informs the compiler that we want a pointer variable, i.e. to set aside however many bytes are required to store an address in memory. The "int" says that we intend to use our pointer variable to store the address of an integer. Such a pointer is said to "point to" an integer.

Note, however, that when we wrote "int k;" we did not give k a value. If this definition is made outside of any function, ANSI C guarantees that k is initialized to zero. Similarly, ptr has no value, that is we haven't stored an address in it in the above definition. In this case, again if the definition is outside of any function, it is initialized to a null pointer, a value guaranteed not to point to any object, which is written in source code as the macro NULL. NULL is #defined in the standard headers as either 0 or (void *)0. The bit pattern a given machine uses to represent a null pointer, however, need not be all zeros, and while zero is an integer, NULL need not be.

Now suppose we want to store in ptr the address of our integer variable k. To do this we use the unary '&' operator and write:

    ptr = &k;

What the '&' operator does is retrieve the lvalue (address) of k, even though k is on the right hand side of the assignment operator '=', and copy that to the contents of our pointer ptr. Now, ptr is said to "point to" k. Bear with us now, there is only one more operator we need to discuss.

The "dereferencing operator" is the asterisk and it is used as follows:

    *ptr = 7;

This will copy 7 to the address pointed to by ptr. Thus if ptr "points to" (contains the address of) k, the above statement will set the value of k to 7.
That is, when we use the '*' this way we are referring to the value of that which ptr is pointing at, not the value of the pointer itself.

Similarly, we could write:

    printf("%d\n", *ptr);

to print to the screen the integer value stored at the address pointed to by "ptr".

To see how all this fits together, run the following program and then review the code and the output carefully.

---------------------------------------
#include <stdio.h>

int j, k;
int *ptr;

int main(void)
{
    j = 1;
    k = 2;
    ptr = &k;
    printf("\n");
    /* %p expects a void *, hence the casts */
    printf("j has the value %d and is stored at %p\n", j, (void *)&j);
    printf("k has the value %d and is stored at %p\n", k, (void *)&k);
    printf("ptr has the value %p and is stored at %p\n", (void *)ptr, (void *)&ptr);
    printf("The value of the integer pointed to by ptr is %d\n", *ptr);
    return 0;
}
---------------------------------------

To review:

    A variable is defined by giving it a type and a name (e.g. int k;)

    A pointer variable is defined by giving it a type and a name (e.g. int *ptr) where the asterisk tells the compiler that the variable named ptr is a pointer variable and the type tells the compiler what type the pointer is to point to (integer in this case).

    Once a variable is defined, we can get its address by preceding its name with the unary '&' operator, as in &k.

    We can "dereference" a pointer, i.e. refer to the value of that which it points to, by using the unary '*' operator as in *ptr.

    The "lvalue" of a variable is the value of its address, i.e. where it is stored in memory. The "rvalue" of a variable is the value stored in that variable (at that address).

CHAPTER 2: Pointer types and Arrays

Let us consider why we need to identify the "type" of variable that a pointer points to, as in:

    int *ptr;

One reason for doing this is so that later, once ptr "points to" something, if we write:

    *ptr = 2;

the compiler will know how many bytes to copy into that memory location pointed to by ptr. If ptr was defined as pointing to an integer, 2 bytes would be copied, if a long, 4 bytes would be copied. Similarly for floats and doubles the appropriate number will be copied. But, defining the type that the pointer points to permits a number of other interesting ways a compiler can interpret code. For example, consider a block in memory consisting of ten integers in a row. That is, 20 bytes of memory are set aside to hold 10 integers.

Now, let's say we point our integer pointer ptr at the first of these integers. Furthermore let's say that integer is located at memory location 100 (decimal). What happens when we write:

    ptr + 1;

Because the compiler "knows" this is a pointer (i.e. its value is an address) and that it points to an integer (its current address, 100, is the address of an integer), it adds 2 to ptr instead of 1, so the pointer "points to" the _next_ _integer_, at memory location 102. Similarly, were the ptr defined as a pointer to a long, it would add 4 to it instead of 1. The same goes for other data types such as floats, doubles, or even user defined data types such as structures.
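Similarly, since ++ptr and ptr++ are both equivalent to ptr + 1 (though the point in the program when ptr is incremented may be different), incrementing a pointer using the unary ++ operator, either pre- or post-, increments the address it stores by the amount sizeof(type) (i.e. 2 for an integer, 4 for a long, etc.).

Since a block of 10 integers located contiguously in memory is, by definition, an array of integers, this brings up an interesting relationship between arrays and pointers.

Consider the following:

    int my_array[] = {1, 23, 17, 4, -5, 100};

Here we have an array containing 6 integers. We refer to each of these integers by means of a subscript to my_array, i.e. using my_array[0] through my_array[5]. But, we could alternatively access them via a pointer as follows:

    int *ptr;
    ptr = &my_array[0];    /* point our pointer at the first
                              integer in our array */

And then we could print out our array either using the array notation or by dereferencing our pointer. The following code illustrates this:

------------------------------------------------------
#include <stdio.h>

int my_array[] = {1, 23, 17, 4, -5, 100};    /* six sample values */
int *ptr;

int main(void)
{
    int i;
    ptr = &my_array[0];     /* point our pointer to the array */
    printf("\n\n");
    for (i = 0; i < 6; i++)
    {
        printf("my_array[%d] = %d   ", i, my_array[i]);    /*<-- A */
        printf("ptr + %d = %d\n", i, *(ptr + i));           /*<-- B */
    }
    return 0;
}
----------------------------------------------------

Compile and run the above program and carefully note lines A and B and that the program prints out the same values in either case. Also note how we dereferenced our pointer in line B, i.e. we first added i to it and then dereferenced the new pointer. Change line B to read:

    printf("ptr + %d = %d\n", i, *ptr++);

and run it again. Try to predict the outcome and carefully look at the actual outcome.

In C, the standard states that wherever we might use &var_name[0] we can replace that with var_name, thus in our code where we wrote:

    ptr = &my_array[0];

we can write:

    ptr = my_array;

to achieve the same result.

This leads many texts to state that the name of an array is a pointer. While this is a useful approximation (strictly, the array name is converted to a pointer to its first element in most expressions), I prefer to mentally think "the name of the array is a _constant_ pointer". Many beginners (including myself when I was learning) forget that _constant_ qualifier. In my opinion this leads to some confusion. For example, while we can write

    ptr = my_array;

we cannot write

    my_array = ptr;

The reason is that while ptr is a variable, my_array is a constant. That is, the location at which the first element of my_array will be stored cannot be changed once my_array[] has been declared.

Now, let's delve a little further into the difference between the names "ptr" and "my_array" as used above. We said that my_array is a constant pointer. What do we mean by that?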
Well, to understand the term "constant" in this sense, let's go back to our definition of the term "variable". When we define a variable we set aside a spot in memory to hold the value of the appropriate type. Once that is done the name of the variable can be interpreted in one of two ways. When used on the left side of the assignment operator, the compiler interprets it as the memory location to which to move that which lies on the right side of the assignment operator. But, when used on the right side of the assignment operator, the name of a variable is interpreted to mean the contents stored at that memory address set aside to hold the value of that variable.

With that in mind, let's now consider the simplest of constants, as in:

    int i, k;
    i = 2;

Here, while "i" is a variable and thus occupies space in the data portion of memory, "2" is a constant and, as such, instead of setting aside memory in the data segment, it is embedded directly in the code segment of memory. That is, while writing something like k = i; tells the compiler to create code which at run time will look at memory location &i to determine the value to be moved to k, code created by i = 2; simply puts the '2' in the code and there is no referencing of the data segment.

Similarly, in the above, since "my_array" is a constant, once the compiler establishes where the array itself is to be stored, it "knows" the address of my_array[0] and on seeing:

    ptr = my_array;

it simply uses this address as a constant in the code segment and there is no referencing of the data segment beyond that.

Well, that's a lot of technical stuff to digest and I don't expect a beginner to understand all of it on first reading. With time and experimentation you will want to come back and re-read the first 2 chapters. But for now, let's move on to the relationship between pointers, character arrays, and strings.

CHAPTER 3: Pointers and Strings

The study of strings is useful to further tie in the relationship between pointers and arrays. It also makes it easy to illustrate how some of the standard C string functions can be implemented. Finally it illustrates how and when pointers can and should be passed to functions.

In C, strings are arrays of characters. This is not necessarily true in other languages. In Pascal or (most versions of) Basic, strings are treated differently from arrays.
To start off our discussion we will write some code which, while preferred for illustrative purposes, you would probably never write in an actual program. Consider, for example:

    char my_string[40];

    my_string[0] = 'T';
    my_string[1] = 'e';
    my_string[2] = 'd';
    my_string[3] = '\0';

While one would never build a string like this, the end result is a string in that it is an array of characters _terminated_with_a_nul_character_. By definition, in C, a string is an array of characters terminated with the nul character. Note that "nul" is _not_ the same as "NULL". The nul refers to a zero as is defined by the escape sequence '\0'. That is, it occupies one byte of memory. NULL, on the other hand, is the name of the macro used to initialize null pointers, and pointers require more than one byte of storage. NULL is #defined in a header file in your C compiler; nul may not be #defined at all.

Since writing the above code would be very time consuming, C permits two alternate ways of achieving the same thing. First, one might write:

    char my_string[40] = {'T', 'e', 'd', '\0'};

But this also takes more typing than is convenient. So, C permits:

    char my_string[40] = "Ted";

When the double quotes are used, instead of the single quotes as was done in the previous examples, the nul character ( '\0' ) is automatically appended to the end of the string.

In all of the above cases, the same thing happens. The compiler sets aside a contiguous block of memory 40 bytes long to hold characters and initializes it such that the first 4 characters are Ted\0.

Now, consider the following program:

--------- program 3.1 ---------------------------------------
#include <stdio.h>

char strA[80] = "A string to be used for demonstration purposes";
char strB[80];

int main(void)
{
    char *pA;     /* a pointer to type character */
    char *pB;     /* another pointer to type character */
    puts(strA);   /* show string A */
    pA = strA;    /* point pA at string A */
    puts(pA);     /* show what pA is pointing to */
    pB = strB;    /* point pB at string B */
    putchar('\n');        /* move down one line on the screen */
    while (*pA != '\0')   /* line A (see text) */
    {
        *pB++ = *pA++;    /* line B (see text) */
    }
    *pB = '\0';           /* line C (see text) */
    puts(strB);           /* show strB on screen */
    return 0;
}
--------- end program 3.1 -------------------------------------

In the above we start out by defining two character arrays of 80 characters each. Since these are globally defined, they are initialized to all '\0's first. Then, strA has its leading characters initialized to the string in quotes.

Now, moving into the code, we define two character pointers and show the string on the screen. We then "point" the pointer pA at strA.
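That is, by means of the assignment statement we copy the address of strA[0] into our variable pA. We now use puts() to show that which is pointed to by pA on the screen. Consider here that the function prototype for puts() is:

    int puts(const char *s);

For the moment, ignore the "const". The parameter passed to puts is a pointer, that is the _value_ of a pointer (since all parameters in C are passed by value), and the value of a pointer is the address to which it points, or, simply, an address. Thus when we write:

    puts(strA);

as we have seen, we are passing the address of strA[0]. Similarly, when we write:

    puts(pA);

we are passing the same address, since we have set pA = strA.

Given that, follow the code down to the while() statement on line A. Line A states:

    While the character pointed to by pA (i.e. *pA) is not a nul character (i.e. the terminating '\0'), do the following:

Line B states:

    Copy the character pointed to by pA to the space pointed to by pB, then increment pA so it points to the next character and pB so it points to the next space.

When we have copied the last character, pA points to the terminating nul character and the loop ends. However, we have not copied the nul character. And, by definition a string in C _must_ be nul terminated. So, we add the nul character with line C.

It is very educational to run this program with your debugger while watching strA, strB, pA and pB and single stepping through the program. It is even more educational if, instead of simply defining strB[] as has been done above, you initialize it also with something like:

    char strB[80] = "12345678901234567890123456789012345678901234567890";

where the number of digits used is greater than the length of strA, and then repeat the single stepping procedure while watching the above variables. Give these things a try!

Of course, what the above program illustrates is a simple way of copying a string. After playing with the above until you have a good understanding of what is happening, we can proceed to creating our own replacement for the standard strcpy() that comes with C. It might look like:

    char *my_strcpy(char *destination, char *source)
    {
        char *p = destination;
        while (*source != '\0')
        {
            *p++ = *source++;
        }
        *p = '\0';
        return destination;
    }

In this case, I have followed the practice used in the standard routine of returning a pointer to the destination.

Again, the function is designed to accept the values of two character pointers, i.e. addresses, and thus in the previous program we could write:

    int main(void)
    {
        my_strcpy(strB, strA);
        puts(strB);
        return 0;
    }

I have deviated slightly from the form used in standard C, which would have the prototype:

    char *my_strcpy(char *destination, const char *source);

Here the "const" used as a parameter modifier informs the user that the function will not modify the contents pointed to by the source pointer. You can prove this by modifying the function above, and its prototype, to include the "const" modifier as shown.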
Then, within the function you can add a statement which attempts to change the contents of that which is pointed to by source, such as:

    *source = 'X';

which would normally change the first character of the string to an X. The const modifier should cause your compiler to catch this as an error. Try it and see.

Now, let's consider some of the things the above examples have shown us. First off, consider the fact that *ptr++ is to be interpreted as returning the value pointed to by ptr and then incrementing the pointer value. Note that this has to do with the precedence of the operators. Were we to write (*ptr)++ we would increment, not the pointer, but that which the pointer points to! That is, if used on the first character of the above example string the 'T' would be incremented to a 'U'. You can write some simple example code to illustrate this.

Recall again that a string is nothing more than an array of characters. What we have done above is deal with copying an array. It happens to be an array of characters but the technique could be applied to an array of integers, doubles, etc. In those cases, however, we would not be dealing with strings and hence the end of the array would not be _automatically_ marked with a special value like the nul character. We could implement a version that relied on a special value to identify the end. For example, we could copy an array of positive integers by marking the end with a negative integer. On the other hand, it is more usual that when we write a function to copy an array of items other than strings we pass the function the number of items to be copied as well as the address of the array, e.g. something like the following prototype might indicate:

    void int_copy(int *ptrA, int *ptrB, int nbr);

where nbr is the number of integers to be copied. You might want to play with this idea and create an array of integers and see if you can write the function int_copy() and make it work.

This permits using functions to manipulate large arrays. For example, if we have an array of 5000 integers that we want to manipulate with a function, we need only pass to that function the address of the array (and any auxiliary information such as nbr above, depending on what we are doing). The array itself does _not_ get passed, i.e. the whole array is not copied and put on the stack before calling the function, only its address is sent.

Note that this is different from passing, say, an integer to a function. When we pass an integer we make a copy of the integer, i.e. get its value and put it on the stack. Within the function any manipulation of the value passed can in no way affect the original integer. But, with arrays and pointers we can pass the address of the variable and hence manipulate the values of the original variables.

CHAPTER 4: More on Strings

Well, we have progressed quite a way in a short time! Let's back up a little and look at what was done in Chapter 3 on copying of strings but in a different light. Consider the following function:

    char *my_strcpy(char dest[], char source[])
    {
        int i = 0;

        while (source[i] != '\0')
        {
            dest[i] = source[i];
            i++;
        }
        dest[i] = '\0';
        return dest;
    }

Recall that strings are arrays of characters. Here we have chosen to use array notation instead of pointer notation to do the actual copying. The results are the same, i.e. the string gets copied using this notation just as accurately as it did before. This raises some interesting points which we will discuss.

Since parameters are passed by value, in both the passing of a character pointer or the name of the array as above, what actually gets passed is the address of the first element of each array. Thus, the numerical value of the parameter passed is the same whether we use a character pointer or an array name as a parameter. This would tend to imply that somehow:

    source[i]  is the same as  *(p + i);

In fact, this is true, i.e. wherever one writes a[i] it can be replaced with *(a + i) without any problems. In fact, the compiler will create the same code in either case. Now, looking at this last expression, part of it, (a + i), is a simple addition using the + operator and the rules of C state that such an expression is commutative. That is, (a + i) is identical to (i + a). Thus we could write *(i + a) just as easily as *(a + i).

But *(i + a) could have come from i[a]! From all of this comes the curious truth that if:

    char a[20];
    int i;

then writing

    a[3] = 'x';

is the same as writing

    3[a] = 'x';

Try it! Set up an array of characters, integers or longs, etc. and assign the 3rd or 4th element a value using the conventional approach and then print out that value to be sure you have that working. Then reverse the array notation as I have done above. A good compiler will not balk and the results will be identical. A curiosity... nothing more!
Now, looking at our function above, the statement dest[i] = source[i]; is interpreted by C as *(dest + i) = *(source + i); which takes two additions for each value taken on by i. Additions, generally speaking, take more time than incrementations (such as those done using the ++ operator as in i++). This may not be true in modern optimizing compilers, but one can never be sure. Thus, the pointer version may be a bit faster than the array version.

Another way to speed up the pointer version would be to change:

    while (*source != '\0')

to simply

    while (*source)

since the value within the parentheses will go to zero (FALSE) at the same time in either case.

At this point you might want to experiment a bit with writing some of your own programs using pointers. Manipulating strings is a good place to experiment. You might want to write your own versions of such standard functions as:

    strcat();
    strchr();

We will come back to strings and their manipulation through pointers in a future chapter. For now, let's move on and discuss structures for a bit.

CHAPTER 5: Pointers and Structures

As you may know, we can declare the form of a block of data containing different data types by means of a structure declaration. For example, a personnel file might contain structures which look something like:

    struct tag {
        char lname[20];    /* last name */
        char fname[20];    /* first name */
        int age;           /* age */
        float rate;        /* e.g. 12.75 per hour */
    };

Let's say we have a number of these structures in a disk file and we want to read each one out and print out the first and last name of each one so that we can have a list of the people in our files. The remaining information will not be printed out. We will want to do this printing with a function call and pass to that function a pointer to the structure at hand. For demonstration purposes I will use only one structure for now. But realize the goal is the writing of the function, not the reading of the file which, presumably, we know how to do.

We will remember that we can access structure members with the dot operator as in:

-------------- program 5.1 ---------------------
#include <stdio.h>
#include <string.h>

struct tag {
    char lname[20];    /* last name */
    char fname[20];    /* first name */
    int age;           /* age */
    float rate;        /* e.g. 12.75 per hour */
};

struct tag my_struct;  /* declare the structure my_struct */

int main(void)
{
    strcpy(my_struct.lname, "Jensen");
    strcpy(my_struct.fname, "Ted");
    printf("\n%s ", my_struct.fname);
    printf("%s\n", my_struct.lname);
    return 0;
}
-------------- end of program 5.1 --------------

Now, this particular structure is rather small compared to many used in C programs. To the above we might want to add:

    date_of_last_raise;
    last_percent_increase;
    emergency_phone;
    medical_plan;
    Social_S_Nbr;
    etc.....

Now, if we have a large number of employees, what we want to do is manipulate the data in these structures by means of functions.
For example we might want a function to print out the name of any structure passed to it. However, in the original C (Kernighan & Ritchie) it was not possible to pass a structure, only a pointer to a structure could be passed. In ANSI C, it is now permissible to pass the complete structure. But, since our goal here is to learn more about pointers, we won't pursue that.

Anyway, if we pass the whole structure it means there must be enough room on the stack to hold it. With large structures this could prove to be a problem. However, passing a pointer uses a minimum amount of stack space.

In any case, since this is a discussion of pointers, we will discuss how we go about passing a pointer to a structure and then using it within the function.

Consider the case described, i.e. we want a function that will accept as a parameter a pointer to a structure and from within that function we want to access members of the structure. For example we want to print out the name of the employee in our example structure.

Okay, so we know that our pointer is going to point to a structure declared using struct tag. We define such a pointer with the definition:

    struct tag *st_ptr;

and we point it to our example structure with:

    st_ptr = &my_struct;

Now, we can access a given member by de-referencing the pointer. But, how do we de-reference the pointer to a structure? Well, consider the fact that we might want to use the pointer to set the age of the employee. We would write:

    (*st_ptr).age = 63;

Look at this carefully. It says, replace that within the parentheses with that which st_ptr points to, which is the structure my_struct. Thus, this breaks down to the same as my_struct.age.

However, this is a fairly often used expression and the designers of C have created an alternate syntax with the same meaning which is:

    st_ptr->age = 63;

With that in mind, look at the following program.

-------------------- program 5.2 -----------------------
#include <stdio.h>
#include <string.h>

struct tag {
    char lname[20];    /* last name */
    char fname[20];    /* first name */
    int age;           /* age */
    float rate;        /* e.g. 12.75 per hour */
};

struct tag my_struct;              /* define the structure */
void show_name(struct tag *p);     /* function prototype */

int main(void)
{
    struct tag *st_ptr;       /* a pointer to a structure */
    st_ptr = &my_struct;      /* point the pointer to my_struct */
    strcpy(my_struct.lname, "Jensen");
    strcpy(my_struct.fname, "Ted");
    printf("\n%s ", my_struct.fname);
    printf("%s\n", my_struct.lname);
    my_struct.age = 63;
    show_name(st_ptr);        /* pass the pointer */
    return 0;
}

void show_name(struct tag *p)
{
    printf("\n%s ", p->fname);    /* p points to a structure */
    printf("%s ", p->lname);
    printf("%d\n", p->age);
}
-------------------- end of program 5.2 ----------------

The reader should compile and run the various code snippets and, using a debugger, monitor things like my_struct and p while single stepping through main and following the code down into the function to see what is happening.

CHAPTER 6: Some more on Strings, and Arrays of Strings

Well, let's go back to strings for a bit. In the following all assignments are to be understood as being global, i.e. made outside of any function, including main.

We pointed out in an earlier chapter that we could write:

    char my_string[40] = "Ted";

which would allocate space for a 40 byte array and put the string in the first 4 bytes (three for the characters in the quotes and a 4th to handle the terminating '\0').

Actually, if all we wanted to do was store the name "Ted" we could write:

    char my_name[] = "Ted";

and the compiler would count the characters, leave room for the nul character and store the total of the four characters in memory, the location of which would be returned by the array name, in this case my_name.

In some code, instead of the above, you might see:

    char *my_name = "Ted";

which is an alternate approach. Is there a difference between these? The answer is.. yes. Using the array notation 4 bytes of storage in the static memory block are taken up, one for each character and one for the nul character. But, in the pointer notation the same 4 bytes are required, _plus_ N bytes to store the pointer variable my_name (where N depends on the system but is usually a minimum of 2 bytes and can be 4 or more).

In the array notation, my_name is short for &my_name[0], which is the address of the first element of the array. Since the location of the array is fixed during run time, this is a constant (not a variable). In the pointer notation my_name is a variable. As to which is the _better_ method, that depends on what you are going to do within the rest of the program.

Let's now go one step further and consider what happens if each of these definitions is done within a function as opposed to globally outside the bounds of any function.

    void my_function_A(char *ptr)
    {
        char a[] = "ABCDE";
        .
        .
    }

    void my_function_B(char *ptr)
    {
        char *cp = "ABCDE";
        .
        .
    }

In my_function_A the automatic variable is the character array a[]. In my_function_B it is the pointer cp. While C is designed in such a way that a stack is not required on those processors which don't use them, my particular processor (80286) has a stack. I wrote a simple program incorporating functions similar to those above and found that in my_function_A the 5 characters in the string were all stored on the stack.
On the other hand, in my_function_B, the 5 characters were stored in the data space and the pointer was stored on the stack. By making a[] static I could force the compiler to place the 5 characters in the data space as opposed to the stack. I did this exercise to point out just one more difference between dealing with arrays and dealing with pointers. By the way, array initialization of automatic variables as I have done in my_function_A was illegal in the older K&R C and only "came of age" in the newer ANSI C, a fact that may be important when one is considering portability and backwards compatibility.

As long as we are discussing the relationship and differences between pointers and arrays, let's move on to multi-dimensional arrays. Consider, for example, the array:

    char multi[5][10];

Just what does this mean? Well, let's consider it in the following light.

    char multi[5][10];
    ^^^^^^^^^^^^^

If we take the first, underlined, part above and consider it to be a variable in its own right, we have an array of 10 characters with the "name" multi[5]. But this name, in itself, implies an array of 5 somethings. In fact, it means an array of five 10 character arrays. Hence we have an array of arrays. In memory we might think of this as looking like:

    multi[0] = "0123456789"
    multi[1] = "abcdefghij"
    multi[2] = "ABCDEFGHIJ"
    multi[3] = "9876543210"
    multi[4] = "JIHGFEDCBA"

with individual elements being, for example:

    multi[0][3] = '3'
    multi[1][7] = 'h'
    multi[4][0] = 'J'

Since arrays are contiguous in memory, our actual memory block for the above should look like:

    "0123456789abcdefghijABCDEFGHIJ9876543210JIHGFEDCBA"

Now, the compiler knows how many columns are present in the array so it can interpret multi + 1 as the address of the 'a' in the 2nd row above. That is, it adds 10, the number of columns, to get this location. If we were dealing with integers and an array with the same dimension the compiler would add 10*sizeof(int) which, on my machine, would be 20. Thus, the address of the "9" in the 4th row above would be &multi[3][0] or *(multi + 3) in pointer notation. To get to the content of the 2nd element in that row we add 1 to this address and dereference the result as in

    *(*(multi + 3) + 1)

With a little thought we can see that:

    *(*(multi + row) + col)    and
    multi[row][col]            yield the same results.

The following program illustrates this using integer arrays instead of character arrays.
----------------- program 6.1 ----------------------------
#include <stdio.h>

#define ROWS 5
#define COLS 10

int multi[ROWS][COLS];

int main(void)
{
    int row, col;

    /* fill the array using array notation */
    for (row = 0; row < ROWS; row++)
        for (col = 0; col < COLS; col++)
            multi[row][col] = row*col;

    /* print each element twice: array notation, then pointer notation */
    for (row = 0; row < ROWS; row++)
        for (col = 0; col < COLS; col++)
        {
            printf("\n%d ", multi[row][col]);
            printf("%d ", *(*(multi + row) + col));
        }
    return 0;
}
----------------- end of program 6.1 ---------------------

Because of the double de-referencing required in the pointer version, the name of a 2 dimensional array is often loosely said to be a pointer to a pointer (more precisely, it is converted to a pointer to its first row, which is itself an array of COLS integers). With a three dimensional array we would be dealing with an array of arrays of arrays and one more level of de-referencing. Note, however, that here we have initially set aside the block of memory for the array by defining it using array notation. Hence, we are dealing with a constant, not a variable. That is, we are talking about a fixed pointer not a variable pointer. The dereferencing expression used above permits us to access any element in the array of arrays without the need of changing the value of that pointer (the address of multi[0][0] as given by the symbol "multi").

I have written the preceding material to provide an introduction to pointers for newcomers to C. In C, the more one understands about pointers the greater flexibility one has in the writing of code. The above has just scratched the surface of the subject. In time I hope to expand on this material. Therefore, if you have questions, comments, criticisms, etc. concerning that which has been presented, I would greatly appreciate your contacting me using one of the mail addresses cited in the Introduction.
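-- Author: awt -- Posted: 3/14/2005 11:17:00 AM --

I read this article a long time ago and have kept it bookmarked ever since. I originally read it to learn pointers, and to my surprise my English improved along the way... It explains deep ideas in simple terms. I am a beginner myself, and I hope it encourages my fellow beginners.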