Consequences of Data Type Range Violations
I’ve come across a lot of people who wonder why a number stored in a variable turns into a strange negative number once it exceeds the variable’s range.
To explain this phenomenon, they tend to offer a conclusion without showing any evidence, and that conclusion is nowhere close to the truth.
Here’s what happens.
The numbers change because of the two’s complement notation that the computer uses to store negative numbers.
And it’s not the compiler’s fault but the processor’s limitation.
The code below will give negative numbers on 32-bit, 16-bit and 8-bit microprocessors, as each register can hold 32/16/8 bits respectively.
A 64-bit processor does not make the problem go away, even though each register can hold 64 bits. The range that matters is that of the C type, and on most 64-bit platforms int is still 32 bits wide, so INT_MAX is still 2147483647 and the same thing happens. A 64-bit type such as long long simply moves the limit out to LLONG_MAX.
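Adding a number to the maximum value held in one of these registers creates the problem.
Have a look at this printf statement (I’ve added a cast for portability; it assumes a 32-bit target where int and long are the same width, since %d strictly expects an int):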
printf("%d = %x\n%d = %x", INT_MAX, INT_MAX, (unsigned long)INT_MAX+1, (unsigned long)INT_MAX+1);
The compiler generates the following code:
push 10000000000000000000000000000000b
push 10000000000000000000000000000000b
push 1111111111111111111111111111111b
push 1111111111111111111111111111111b
push offset aDXDX ; format
call _printf
add esp, 14h
…
…
aDXDX db '%d = %x',0Ah
      db '%d = %x',0
Now, printf’s arguments are pushed from right to left, so it is the third and fourth push instructions that hold the binary representation of INT_MAX (2147483647 for 32-bit systems). Note that it consists of 31 ones; the leading zero, which is the sign bit, is not shown.
When 1 is added to INT_MAX, look at what happens in the first two push instructions: the carry ripples through every bit and leaves a 1 followed by 31 zeros.
Read as an unsigned number this is 2147483648, but read as a signed number it is -2147483648, since the most significant bit decides the sign of a number.
And since %d tells printf to treat its argument as a signed int, you get the negative value.
Try the same C code after replacing %d with the %u format specifier.
You’ll notice that 2147483648 will be the output.
Remember, there’s no such thing as a positive or negative number at this level. It’s all about interpretation. 111 (for a 3-bit architecture) could mean 7 (without using the MSB for the sign, i.e. unsigned) as well as -1 (using the MSB for the sign, i.e. signed).
You might wonder why I’ve blamed the processor and not the compiler, in spite of the fact that the compiler has precalculated the result of the addition and passed it to printf.
The reason is that I compiled the above code with aggressive optimization, so the compiler generates push instructions with the constants already folded in.
Normally the compiler plays safe and lets the processor take care of such values by generating the following code:
mov eax, 1111111111111111111111111111111b
lea edx, [eax+1]
push edx