Sanchit's Blog

String Termination in C/C++

January 9th, 2009 born2c0de No comments

This is a typical assignment given to students who are learning C/C++.
“Write a function that copies the contents of one string into another.”
Note that in this example a string is referred to as a character array.

Some people come up with code similar to this one:

void copy(char [],char []);

int main()
{
   char s1[10],s2[10];
   printf(”Enter String 1:”);
   gets(s1);
   copy(s1,s2);
   printf(”nThe copied string is: “);
   puts(s2);
   return 0;
}

void copy(char x[],char y[])
{
   int i=0;
   while(x[i])
   {
      y[i]=x[i];
      i++;
   }
}

Can you notice that something is missing in the copy() function? Yes, the null termination character was not appended at the end of the new string. I corrected a friend of mine who the same mistake and he asked me why the code works for other strings as well such as “Sanch”.

Luckily, we were near a computer so I could demonstrate my examples straight away to convince him.

Observe the declaration of this character array:

char name[10]=”Sanchit”;

Internally, this String is stored like this:
S,a,n,c,h,i,t,,,

Yes, if the string initial declaration is less than the size provided in the square brackets, the rest of the elements are filled with zeros. This is applicable for arrays of any data type.

Since the ASCII Value of is 0, I can represent my array like this again,

S,a,n,c,h,i,t,0x0,0x0,0x0

Does this ring any bells?

If you create another array of the same size (here: 10) and copy this string to it using the code given above, the rest of the elements after the String are already . So there is no need to the character as C would know where the string terminates.

But what if the String is larger? Assuming “Sanchit Karve” is stored in the same Array, it will look like this:
S,a,n,c,h,i,t, <space> ,K,a,r,v,e, , , …

If you notice, that inspite of the string being larger than the array size, it is stored completely after occupying the space after the array. Since, now the space outside the array has been accessed, the data out there is not zero. Instead they contain garbage values. So, for such situations, we need to append the character at the end of the string, so that C can figure where the string ends. Otherwise, C will output all bytes till a zero is encountered.

So, appending the character is not required if we assume that the string length is lesser the array size. However, we should also ensure that we declare enough space before input larger than the allocated size because of the following reasons:

Unlike other Languages C/C++ do not provide Array Bound Checking features.
Hackers make the most of such types of errors, known as Buffer Overflow Errors, and launch Buffer Overflow Attacks where the string is inserted with shellcode (yes, in hex), and the return address of the function is overwritten to the address of the string. This results in the processor passing control to the shellcode and executing it after the function returns. But just because we append the character does not mean that Buffer Overflow Errors don’t occur. But it’s done just as a safety measure so that the program can run without faults.

Categories: Programming Tags: c++, strings

Passing 2D Arrays to functions

January 9th, 2009 born2c0de No comments

Passing 2D arrays without bounds is a common error people make.
Some people assume that if this works:

void someFunc(int []);

then this should should work too:

void anotherFunc(int [][]);

As normal as it may seem, it is incorrect.

For passing 2D Arrays to a function, the column element must be mentioned like this:

void func1(int [][upperSize]);

But why do we need to include the column element? If C/C++ compilers can figure out the size of a 1D array then why not for 2D?

That’s because 2D Array elements are stored consecutively as two 1D Arrays.

Have a look at this code snippet:

int main()
{
int x[][2]={1,2,3,4,5,6,7,8,9,10};
int *p,*q;
int i;
p=&x[0][0];
for(i=0;i<10;i++)
printf(”%d “,*(p+i));

printf(”nn Now using pointer q:n”);

q=&x[1][0];
for(i=0;i<10;i++)
printf(”%d “,*(q+i));

return 0;
}

Run the Program. You’ll get this as the output.
1 2 3 4 5 6 7 8 9 10

Now using pointer q: 3 4 5 6 7 8 9 10 0 4239532
Now change the number 2 in the array declaration to 5 like this:

int x[][5]={1,2,3,4,5,6,7,8,9,10};

Run the code and observe the output:
1 2 3 4 5 6 7 8 9 10

Now using pointer q: 6 7 8 9 10 0 4239532 0 4235541 1

As you can see, the output using the pointer p remains unchanged because we’re setting it to the first element of the array itself, so it iterates to every element after it.

But q is set to the first element of the 2nd row.
Now which one is the first element of the second row? It’s 6. How do we know it?
It’s because we’ve placed 5 in the column bracket to specify the bounds of each column.

So if we don’t place any value, the compiler gets confused as to how many numbers can be grouped into one column.
Hence whenever such confusion occurs, the compiler generates the error: size of the int[] is unknown or zero.

Categories: Programming Tags: 2D Arrays, c++

Reversing a String recursively

January 9th, 2009 born2c0de No comments

When you’re asked to reverse a string, you’ll mostly use the strrev() function or write your own boring implementation using loops.
Ever tried it recursively?

Have a look at this:
#include <iostream>

using namespace std;

void ret_str(char* s)
{

if(*s != ”)
ret_str(s+1);

cout << *s;

}

int main()
{
ret_str(”born2c0de”);
return 0;
}

Isn’t that some neat piece of code? All we have to do is push the next character to the stack, so when the stack is popped, the characters come out in reverse order.

However this functions isn’t efficient, it’s horribly slow and sluggish and larger strings will result in overloading the stack so you’re better off using the functions which use loops internally.

Categories: Programming Tags: c++, recursion, strings

Consequences of Data type Range Violations

January 9th, 2009 born2c0de 2 comments

I’ve come across a lot of people who wonder why a number stored in a variable, if exceeds its range, turns into a weirder negative number.
To explain this phenomena they offer a conclusion without showing any evidence and this conclusion is in no way close to the truth.
Here’s what happens.

The numbers change because of the complement notation that the computer uses to store negative numbers.

And it’s not the compiler’s fault but the processor’s limitation.
This code will give negative numbers on 32-bit,16-bit and 8bit Microprocessors as each register can hold 32/16/8 bits respectively.
You won’t encounter this problem on 64-bit processors as each register can hold a maximum of 64-bits. (Actually, you will encounter the same problem since the value of INT_MAX will be a 64-bit number for a 64 bit processor….but it’ll work fine with 32-bit numbers)

Adding a Number to a maximum value set in these registers creates the problem.
Have a look at this printf statement (I’ve typecasted it for portability)

printf(”%d = %xn%d = %x”,INT_MAX,INT_MAX,(unsigned long)INT_MAX+1,(unsigned long)INT_MAX+1);

The compiler generates the following code:

push    10000000000000000000000000000000b
push    10000000000000000000000000000000b
push    1111111111111111111111111111111b
push    1111111111111111111111111111111b
push    offset aDXDX ; format
call    _printf
add     esp, 14
…
…
aDXDX  db ‘%d = %x’,0Ah
       db ‘%d = %x’,0

Now, the values in the first 2 push Instructions is the binary representation of INT_MAX (2147483647 for 32-bit systems). Note that it is comprised of only ONE’s.

When 1 is added to INT_MAX, look what happens to the next two push instructions.
This number is 2147483648 but in unsigned notation…but in signed notation it’s -2147483648 since the Most Significant Bit decides the Sign of a Number.

And since by default the compiler assumes that a variable is signed, you get the negative value.

Try the same C code after replacing %d with the %u format specifier.
You’ll notice that 2147483648 will be the output.

Remember, there’s nothing like postive or negative numbers…It’s all about interpretation. 111 (for a 3 bit architecture) could mean 7(without using MSB for sign-convention…ie. unsigned) as well as -1 (using MSB for sign ie. signed).

You might wonder why I’ve blamed the Processor and not the compiler in spite of the fact that the Compiler has precalculated the result of addition and passed it to printf.
The reason is that I’ve compiled the above code in Aggressive Optimization, so the compiler generates push instructions.
Normally the Compiler plays safe and lets the processor take care of such values by generating the following code:

mov eax, 1111111111111111111111111111111b
lea edx, [eax+1]
push edx

Categories: Programming Tags: c++, data types, range violations

void main() v/s int main()

January 8th, 2009 born2c0de 2 comments

Many people use void main() instead of int main() while writing C/C++ programs, inspite of void not being accepted as a return type for main in the C/C++ standard.

I too used to do the same until some programmers on a programming forum told me a few years ago about the C++ Standard.

But what does void do anyway? How is it different from return 0?

I disassembled a few test programs to check what really happens under the hood. It is interesting to know what happens when a program is executed.

Almost everybody believes that main()/WinMain() is the entrypoint of all C/C++ programs but it isn’t so.

On Windows, a start() function gets called before the main() function. It first calls GetVersion(), GetCommandLine(), GetEnvironmentStrings(), GetStartupInfo() and GetModuleHandle() API Functions from kernel32.dll in the exact order. Then it passes the Module Handle, command-line arguments and environment variables as arguments to main() and then calls it.

But what about main()’s return value? Is it read after main() quits?

It doesn’t matter what data type you return from main(). Whether its int or void it doesn’t make any difference.
Here’s what actually happens.
In an executable file, the start() function which calls the main() function, expects a return value of data type integer so that it may decide what argument is passed to the exit() function which is called after main().
If void is chosen as the return type for main, the EAX register is passed as an argument to the exit function.
Since the EAX register almost always contains the return value of any function, so if we say return 0, EAX would be set to zero, that’s all.
Generally whenever void is used, EAX is set to zero at the end of main() , so it is exactly the same as passing return 0.
But since void main() is not a part of the standard, compiler developers are given full freedom to design their own implementation of the code after main() returns. Hence, we cannot always assume that EAX is set to 0 with a void main().

Ever wondered what happens to the return value after main() quits?
Take a look at this disassembled snippet of the start() function from a C/C++ program.

call dword ptr [esi+18h] ; call main()
add esp, 0Ch
; (4 x 3) 12 bytes are cleared from the stack to clear space
; occupied by main()'s three arguments.
push eax ; status for exit. Return value of main() is pushed
call _exit

So no matter what you do, even if you ignore the return value type, the compiler will still pass the contents of the EAX Register into the exit function. This could lead to some unwanted results, and hence it is better to return 0 (ie. EXIT_SUCCESS) or 1(EXIT_FAILURE) depending on when and why you have designed the program to end.

So even though, it is almost always the same, there is a reason why the standard recommends using int. Simply for the following reasons:
returning 0 by default is COMPILER DEPENDENT.
No Guarantee about the content of the EAX Register after main() exits.
Will not be portable to all Operating Systems.
May cause incorrect termination of main()

So even though void main and return 0 are alike, we should try to avoid its use.

Categories: Programming Tags: c++, code entrypoint, int main, standards, void main

Sanchit's Blog

Archive

String Termination in C/C++

Passing 2D Arrays to functions

Reversing a String recursively

Consequences of Data type Range Violations

void main() v/s int main()

Recent Posts

Archives

Categories

Blogroll