Gay But Not Narrow (ruakh) wrote in cprogramming,

Printf-ing a char[] field of a struct.

I hadn't programmed in C for a while, so I thought I'd undertake a small just-for-fun project in it. Almost immediately, I ran into a strange wall. Behind the cut is a short program that demonstrates the problem I'm having (using GCC 3.2.3 for MinGW).

#include <stdio.h> // printf()

typedef struct {
        char s[4];
} four_bytes;

four_bytes identity(four_bytes fb)
{
        return fb;
}

int main()
{
        four_bytes fb1, fb2;

        // Initialize fb1.s to "ABC":
        fb1.s[0] = 'A';
        fb1.s[1] = 'B';
        fb1.s[2] = 'C';
        fb1.s[3] = '\0';

        // Goes crazy (prints garbage, call it a segfault):
        printf("%s\n", identity(fb1).s);

        // Goes crazy (prints garbage, call it a segfault):
        printf("%s\n", (fb2 = identity(fb1)).s);

        // Works fine - prints "ABC\n":
        fb2 = identity(fb1);
        printf("%s\n", fb2.s);

        // Works fine - prints "ABC - 0\n":
        printf("%c", identity(fb1).s[0]);
        printf("%c", identity(fb1).s[1]);
        printf("%c", identity(fb1).s[2]);
        printf(" - %d\n", identity(fb1).s[3]);

        return 0;
}

I just don't get it — it seems like all four groups should work exactly the same (aside from the extra " - 0" in the fourth one, which is only there to demonstrate that I do have a null byte there). Especially the second and third groups: how the heck could those differ from each other? (Obviously this is completely workaroundable, so it's not a big deal; but still, I'd like to understand what's going on!)

Thanks in advance for any thoughts. :-)

14 comments

The %s format identifier specifies a pointer to a string. The identity() function returns a four_bytes struct on the stack - not the same thing.

I don't know what's going on with the 2nd printf(); I'd have to compile with the -S option and inspect the assembly output, or maybe use a debugger.

fb2 = identity(fb1); essentially does a memcpy() so that's legal.

Passing fb2.s to printf() does in fact leave a pointer on the stack so that's legal.

In general, I'd avoid passing large chunks of data to functions, or returning chunks of data from functions. It adds a lot of overhead to the code. The preferred C way is to pass pointers.

If you raise the warning level by adding -Wall -W -O4 to the compilation command, I predict the compiler will complain. If it wasn't dinnertime I'd try it myself. :)

The %s format identifier specifies a pointer to a string. The identity() function returns a four_bytes struct on the stack - not the same thing.

Yeah, I know, but identity(…).s is a char[4], which is a string, no? And if not, how come the version with fb2.s does work?

(And I'm not passing "large chunks of data" — I'm passing a four-byte structure. Which is actually the same amount of data as a pointer on 32-bit systems.)

The suggestion to turn on all warnings is a good one — I don't know why I didn't think of it — but it doesn't clarify anything for me: it complains that in the examples where the program goes crazy, the second argument isn't a pointer; except that the second argument is a pointer, isn't it? It's the name of an array. And what would be different between those examples and the example with fb2.s?

lightning_rose

February 24 2008, 03:16:46 UTC. Edited: February 24 2008, 03:28:52 UTC

identity() is a function that returns a copy of a struct (in this case 4 bytes) on the stack. This is not the same as returning a pointer to a struct on the stack. See my comment to prasun below.

I can't explain why the second example fails, but if the compiler tells me it's not a pointer, then I believe it. :) I think looking at the assembly generated by the compiler would make things clear, but I'm not an x86 assembly programmer so I doubt I would be much help interpreting things.

fb2 is a struct containing an array, so passing fb2.s is the same as passing &fb2.s[0].

The fact that C treats an array name (i.e. a pointer to the beginning of the array) the same as explicitly taking the address of the first element of the array (also a pointer) is what I believe leads to the common misconception that arrays and pointers are the same thing (I assure you they're not, but I think you already know that).
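To make the array-versus-pointer distinction concrete, here is a minimal sketch. The helper names array_member_size and decays_to_first_element are made up for illustration and do not appear in the original program; only the four_bytes struct is taken from it.

```c
#include <stddef.h>

/* Same layout as the struct in the original post. */
typedef struct {
        char s[4];
} four_bytes;

/* Hypothetical helper: sizeof applied to the array member reports the
 * size of the whole array (4), not the size of a pointer. sizeof is one
 * of the contexts in which an array does not decay. */
size_t array_member_size(const four_bytes *fb)
{
        return sizeof fb->s;
}

/* Hypothetical helper: for an lvalue array, the decayed name and the
 * address of element 0 compare equal. Returns 1 if they match. */
int decays_to_first_element(const four_bytes *fb)
{
        const char *p = fb->s;   /* lvalue array decays to a pointer here */
        return p == &fb->s[0];
}
```

So the two spellings give the same address when the array is an lvalue, but sizeof still sees an array of four chars rather than a pointer.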

I understand four_bytes is just that in size, but the principle is that in C it's generally better to pass and return pointers to structs rather than copies of structs.

Edit: I wrote this comment before I saw pm215's response.


This works:
printf("%c\n", identity(fb1).s[0]);
printf("%c\n", identity(fb1).s[1]);
printf("%c\n", identity(fb1).s[2]);

but this does not even compile:
printf("%s\n", (char*)(identity(fb1).s));

structures.c:23: error: cannot convert to a pointer type
structures.c:23: warning: reading through null pointer (argument 2)

Good example.

The following generates the same fatal error, proving identity() does not return a pointer.

printf("%s\n", (char*) identity(fb1) );

In C, casting a pointer of one type to another *never* generates an error.

The following generates the same fatal error, proving identity() does not return a pointer.

Yes, but I never claimed that it did, and my code didn't use it as one. (See pm215's comment below for what seems to be the correct explanation.)

Thanks for trying, though. I gather that you first read my code on an empty stomach, which makes it easy to miss things. :-)
Thanks!

pm215's comment below explains why this is, if you're interested. :-)

Why yes, yes I am interested. :)

Your code is relying on a change between C89 and C99:

pm215@canth:/tmp$ gcc -std=c99 -g -Wall -o foo foo.c
pm215@canth:/tmp$ ./foo
ABC
ABC
ABC
ABC - 0
pm215@canth:/tmp$ gcc -g -Wall -o foo foo.c
foo.c: In function ‘main’:
foo.c:23: warning: format ‘%s’ expects type ‘char *’, but argument 2 has type ‘char[4]’
foo.c:26: warning: format ‘%s’ expects type ‘char *’, but argument 2 has type ‘char[4]’
pm215@canth:/tmp$ ./foo
Segmentation fault (core dumped)

In the two examples which do not work under C89 rules, you have an argument to printf() like identity(fb1).s. This has array type. It is not an lvalue (because the thing on the left of the dot is a function).

The C89 rule for decay of arrays to pointers is given in the C FAQ:

An lvalue [see question 2.5] of type array-of-T which appears in an expression decays (with three exceptions) into a pointer to its first element; the type of the resultant pointer is pointer-to-T.
(The exceptions are when the array is the operand of a sizeof or & operator, or is a literal string initializer for a character array.)

Since identity(fb1).s is not an lvalue, this doesn't happen, and what you have is not a char* but a char[], hence the warning and the crash/weird behaviour.

Under C99, the rule has been relaxed and applies to any expression of array type (with the same three exceptions as C89), not just to lvalues. So if you tell the compiler to apply C99 semantics you do get decay-to-pointer and your code works. [See section 6.3.2.1 para 3 in the C99 standard.]

In the interests of writing more generally portable code, I suggest you stay within the C89 rules rather than fiddling with gcc options :-)
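One way to stay within the C89 rules is to store the returned struct in a named variable before touching its array member, as in the third group of the original program. The sketch below assumes the four_bytes and identity() definitions from the post; format_returned is a hypothetical helper added only for illustration, and it assumes the member holds a NUL-terminated string and that out has room for at least 4 bytes.

```c
#include <stdio.h>

typedef struct {
        char s[4];
} four_bytes;

four_bytes identity(four_bytes fb)
{
        return fb;
}

/* Hypothetical helper: writes the string held in the returned struct
 * into out and returns the number of characters written. */
int format_returned(char *out, four_bytes in)
{
        four_bytes tmp = identity(in);   /* named object, so tmp.s is an lvalue */
        return sprintf(out, "%s", tmp.s); /* tmp.s decays to char * under C89 and C99 */
}
```

Binding the result to a named variable also means the code does not depend on how long the temporary returned by identity() lives, which is a separate concern from the decay rule.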

Thanks!

I didn't actually know that C99 allowed it — I've always used gcc's default, which I think is C89 with some non-standard extensions — I just figured that what I was doing made sense.

Now that I see your explanation, it does make some sense: array names aren't exactly the same as pointers, even if we can usually pretend that they are. :-P

(The silly thing is, I was actually relying on the fact that array names aren't exactly the same as pointers in my code — specifically, on the fact that an array in a structure is an actual in-line part of the structure, and gets copied when the structure does — yet somehow it didn't occur to me that there might be other relevant differences.)
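The point about an array being an in-line part of the structure can be checked with a short sketch. The function name copy_is_independent is hypothetical; the struct is the one from the original program.

```c
typedef struct {
        char s[4];
} four_bytes;

/* Hypothetical helper: returns 1 if changing a copy of the struct
 * leaves the original's array untouched. */
int copy_is_independent(void)
{
        four_bytes a = { "ABC" };   /* fills s with 'A', 'B', 'C', '\0' */
        four_bytes b = a;           /* struct copy: all four bytes of a.s are copied */

        b.s[0] = 'X';               /* modifies only the copy */

        return a.s[0] == 'A' && b.s[0] == 'X' && b.s[1] == 'B';
}
```

Had s been declared as a char * instead, the copy would have duplicated only the pointer, and both structs would have referred to the same characters.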

I still can't claim I completely understand the need for the lvalue restriction, or why (fb2 = identity(fb1))</code> doesn't evaluate to the fb2 lvalue (I'm guessing it has something to do with where the sequence points go, and the rule about one read and one write being allowed between sequence points?), but I can at least understand and accept that these things are the case. :-P

Thanks again. :-)
Darn. I hate when I do <tt>…</code>. You'd think I'd learn. :-P
I still can't claim I completely understand the need for the lvalue restriction

Well, there isn't a need, really -- that's why the C99 committee felt free to remove it ;-)

why (fb2 = identity(fb1)) doesn't evaluate to the fb2 lvalue

The unhelpful answer is "because C99 section 6.5.16 para 3 says so".
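A small sketch of what that rule means in practice: an assignment expression yields a value but is not itself an lvalue. The function assignment_value is made up for illustration.

```c
/* Hypothetical helper: shows that (x = 5) can be used as a value. */
int assignment_value(void)
{
        int x = 0;
        int y = (x = 5) + 1;   /* the assignment expression yields the value 5 */

        /* (x = 5) = 6;           rejected by a C compiler: the result of an
         *                        assignment is not an lvalue */

        return x * 10 + y;     /* x is 5 and y is 6 */
}
```

By the same rule, (fb2 = identity(fb1)) yields the value of fb2 after the assignment rather than the object fb2 itself, so its .s member is a non-lvalue array just like identity(fb1).s.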

Well, sometimes an unhelpful answer is all we have. :-)

Thanks again.
pm215 is correct. I confess I'm not terribly up on changes since c99.

Here's a version that compiles without warnings and executes correctly under both c89 and c99

#include <stdio.h> // printf()

typedef struct {
        char s[4];
} four_bytes;

four_bytes identity(four_bytes fb)
{
        return fb;
}

int main()
{
        four_bytes fb1, fb2;

        // Initialize fb1.s to "ABC":
        fb1.s[0] = 'A';
        fb1.s[1] = 'B';
        fb1.s[2] = 'C';
        fb1.s[3] = '\0';

        // This works fine.
        // calculates the address of s[0] left on the stack by identity()
        printf("%s\n", &identity(fb1).s[0]);

        // Same as previous example, but
        // calculates the address of fb2.s[0]
        // fb2 is also on the stack, but could be anywhere in mem
        printf("%s\n", &(fb2 = identity(fb1)).s[0]);

        // Also works fine
        // proves identity() was correctly copied into fb2
        printf("%s\n", fb2.s);

        // Works fine - prints "ABC\n":
        fb2 = identity(fb1);
        printf("%s\n", fb2.s);

        // Works fine - prints "ABC - 0\n":
        printf("%c", identity(fb1).s[0]);
        printf("%c", identity(fb1).s[1]);
        printf("%c", identity(fb1).s[2]);
        printf(" - %d\n", identity(fb1).s[3]);

        return 0;
}