zv7qrnb
Sanchit's Blog » strings

Archive

Posts Tagged ‘strings’

String Termination in C/C++

January 9th, 2009 No comments

This is a typical assignment given to students who are learning C/C++.
“Write a function that copies the contents of one string into another.”
Note that in this example a string is referred to as a character array.

Some people come up with code similar to this one:

void copy(char [],char []);

int main()
{
   char s1[10],s2[10];
   printf(”Enter String 1:”);
   gets(s1);
   copy(s1,s2);
   printf(”nThe copied string is: “);
   puts(s2);
   return 0;
}

void copy(char x[],char y[])
{
   int i=0;
   while(x[i])
   {
      y[i]=x[i];
      i++;
   }
}

Can you notice that something is missing in the copy() function? Yes, the null termination character was not appended at the end of the new string. I corrected a friend of mine who the same mistake and he asked me why the code works for other strings as well such as “Sanch”.

Luckily, we were near a computer so I could demonstrate my examples straight away to convince him.

Observe the declaration of this character array:

char name[10]=”Sanchit”;

Internally, this String is stored like this:
S,a,n,c,h,i,t,,,

Yes, if the string initial declaration is less than the size provided in the square brackets, the rest of the elements are filled with zeros. This is applicable for arrays of any data type.

Since the ASCII Value of is 0, I can represent my array like this again,

S,a,n,c,h,i,t,0×0,0×0,0×0

Does this ring any bells?

If you create another array of the same size (here: 10) and copy this string to it using the code given above, the rest of the elements after the String are already . So there is no need to the character as C would know where the string terminates.

But what if the String is larger? Assuming “Sanchit Karve” is stored in the same Array, it will look like this:
S,a,n,c,h,i,t, <space> ,K,a,r,v,e, , , …

If you notice, that inspite of the string being larger than the array size, it is stored completely after occupying the space after the array. Since, now the space outside the array has been accessed, the data out there is not zero. Instead they contain garbage values. So, for such situations, we need to append the character at the end of the string, so that C can figure where the string ends. Otherwise, C will output all bytes till a zero is encountered.

So, appending the character is not required if we assume that the string length is lesser the array size. However, we should also ensure that we declare enough space before input larger than the allocated size because of the following reasons:

  • Unlike other Languages C/C++ do not provide Array Bound Checking features.
  • Hackers make the most of such types of errors, known as Buffer Overflow Errors, and launch Buffer Overflow Attacks where the string is inserted with shellcode (yes, in hex), and the return address of the function is overwritten to the address of the string. This results in the processor passing control to the shellcode and executing it after the function returns. But just because we append the character does not mean that Buffer Overflow Errors don’t occur. But it’s done just as a safety measure so that the program can run without faults.
Categories: Programming Tags: ,

Reversing a String recursively

January 9th, 2009 No comments

When you’re asked to reverse a string, you’ll mostly use the strrev() function or write your own boring implementation using loops.
Ever tried it recursively?

Have a look at this:
#include <iostream>

using namespace std;

void ret_str(char* s)
{

if(*s != ”)
ret_str(s+1);

cout << *s;

}

int main()
{
ret_str(”born2c0de”);
return 0;
}

Isn’t that some neat piece of code? All we have to do is push the next character to the stack, so when the stack is popped, the characters come out in reverse order.

However this functions isn’t efficient, it’s horribly slow and sluggish and larger strings will result in overloading the stack so you’re better off using the functions which use loops internally.

Categories: Programming Tags: , ,

Microsoft does it differently

January 8th, 2009 1 comment

A while back I was studying the Len() function of VB6 and I came across something interesting.

Contrary to our belief that this function counts the number of characters in a string and returns it, it actually does something totally different.

Here’s how it works.

When any string is stored in VB, it is automatically stored in this format (in unicode):

Suppose the String is “ABC”:

06 00 00 00 41 00 42 00 43 00

The 41 to 43 part is a typical unicode style of storing strings, but 2 unicode characters before that, the number of bytes occupied by the string (including zeros) is stored (which is similar to Pascal style strings).
Hence 06 stands for 6 bytes occupied by “ABC” (since it’s stored as A,0,B,0,C,0)

So all that the Len() does is read 4 bytes before the beginning of the string and return that value itself, instead of calculating the length of the string.

So actually, the Len Function does nothing except read the length from the string format and return it.
Want to see how they do it?
Here’s the Disassembled Listing of the Len() Function MSVBVM60.DLL DLL File.

__vbaLenBstr proc near
   string = dword ptr 4

   mov eax, [esp+string] ; eax points to string
   test eax, eax ; ZF,SF,PF = EAX and EAX
   jz short break ; If String is Null then break from loop
   mov eax, [eax-4] ; Gets Unicode Length stored before string.

   ; Here’s how Text from textbox is stored internally:
   ; If text is born2c0de then in memory:
   ; (0×12 0×00) 0×00 0×00 (0×62 0×00 …)
   ; ie. length of string in bytes(unicode) (here 18 bytes)
   ; followed by a Unicode 0 (0×00 0×00)
   ; followed by the actual string in unicode
   ; (’b’ 00 ‘o’ 00 … ‘e’ 00)
   ; So [eax-4] just gets the unicode length of
   ; string which already is stored when a
   ; string is taken as input from keyboard.

   ; The Len() function doesn’t even calculate length!!!
   shr eax, 1 ; Divides Length by 2 to get Actual Length.
   ; Uses eax so it can be used as a return value.

break:
   retn 4
__vbaLenBstr endp