Description:
For integer:
For float:
- Fixed-point Binary
- The float is divided into 2 integers, 1 before decimal and 1 after
- Each is individually represented with their binary
- The number of bits for each side is agreed beforehand
- Floatting-point Number
- floats to the right of the most significant 1
- ±M×BE
- Mantissa * base ^ exponent
- There are different standard:
- IEEE-754: demo
- 1 sign bit (pos/neg)
- 8 bias exponent
- value of 8 bias exponent = real exponent +127
- to be able to repre negative exponent, ex: 2−3→E=124
- meaning the most left bit is -1,
- 24 normalized mantissa
- 1 hidden (always 1) + 23 fraction bit (start from 1/2,1/4,…)
- meaning the mantissa has value > 1
- To calculate: turn a to x/y*2^n
- where y is min
- and x is min that larger than y
- Can be different precision:
- Single-precision:
- 32 bits
- 1 sign bit, 8 exponent bits, 23 fraction bits
- bias = 127
- Double-precision:
- 64 bits
- 1 sign bit, 11 exponent bits, 52 fraction bits
- bias = 1023