

Number Representation​

Unsigned​

$\sum_{i=0}^{w-1} x_i 2^i$

Signed-Magnitude​

The "first" (most significant) bit gives the sign; the remaining bits are treated as an unsigned magnitude.

Biased Notation​

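The stored binary value is the actual value plus a fixed bias, so the actual value is the stored (unsigned) value minus the bias. This is the $E - bias$ convention used for the floating-point exponent.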

Two’s Complement​

$-x_{w-1}2^{w-1}+\sum_{i=0}^{w-2} x_i 2^i$
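The four interpretations above can be compared on a single bit pattern. The following is a minimal sketch, not a reference implementation: the function names are hypothetical, bit strings are written most significant bit first, and the biased decoder assumes the "stored value minus bias" convention that matches the $E - bias$ exponent in the floating-point formula.

```python
def unsigned(bits: str) -> int:
    """Sum of x_i * 2^i, where bits[0] is the most significant bit."""
    w = len(bits)
    return sum(int(b) << (w - 1 - pos) for pos, b in enumerate(bits))


def sign_magnitude(bits: str) -> int:
    """First bit is the sign, the remaining bits are an unsigned magnitude."""
    magnitude = unsigned(bits[1:])
    return -magnitude if bits[0] == "1" else magnitude


def biased(bits: str, bias: int) -> int:
    """Assumed convention: actual value = stored unsigned value - bias."""
    return unsigned(bits) - bias


def twos_complement(bits: str) -> int:
    """The top bit has weight -2^(w-1); the other bits keep positive weights."""
    w = len(bits)
    return -int(bits[0]) * (1 << (w - 1)) + unsigned(bits[1:])


pattern = "1011"
print(unsigned(pattern))         # 11
print(sign_magnitude(pattern))   # -3
print(biased(pattern, bias=7))   # 4
print(twos_complement(pattern))  # -5
```

The same four bits decode to four different values, which is why the representation always has to be known from context.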

Logical Operations​

Shift operations

  • Left Shift: throw away extra bits on left, fill with 0's on the right
  • Right Shift: throw away extra bits on right
    • logical shift: fill with 0's on the left
    • arithmetic shift: replicate the most significant bit (x >> k gives $\lfloor x/2^k \rfloor$, rounding towards negative infinity)
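The shift rules above can be tried out in Python. This is only a sketch: Python integers are unbounded, so a fixed register width (8 bits here, an assumption) has to be simulated with a mask, and the helper names are hypothetical. Python's built-in `>>` on a negative integer already behaves like an arithmetic shift.

```python
WIDTH = 8                  # assumed register width for this illustration
MASK = (1 << WIDTH) - 1    # 0xFF


def left_shift(x: int, k: int) -> int:
    """Shift left, discarding bits that fall off the left end of the register."""
    return (x << k) & MASK


def logical_right_shift(x: int, k: int) -> int:
    """Treat x as a WIDTH-bit pattern and fill with 0s on the left."""
    return (x & MASK) >> k


def arithmetic_right_shift(x: int, k: int) -> int:
    """Python's >> on signed ints replicates the sign: floor(x / 2^k)."""
    return x >> k


print(left_shift(0b10010110, 2))      # 88, i.e. 0b01011000
print(arithmetic_right_shift(-7, 1))  # -4, floor(-3.5)
print(logical_right_shift(-7, 1))     # 124, i.e. 0b11111001 >> 1
```

Note that `-7 >> 1` is `-4`, not `-3`: the arithmetic shift rounds towards negative infinity, whereas integer division in C truncates towards zero.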

Floating Point Representation​

Definition​

$(-1)^s \times (1+Significand) \times 2^{E-bias}$


  • S represents Sign

    1 for negative, 0 for positive

  • Significand

    implicit leading 1, signed-magnitude (not 2’s complement)

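  • Exponent (biased notation)

    Idea: we want the bit pattern of a floating-point number to look small when its actual value is small

    With a $k$-bit exponent field the bias is $2^{k-1}-1$ (127 for 32 bits, 1023 for 64 bits), so the stored values $0 \rightarrow 2^k-1$ cover the actual exponents $-(2^{k-1}-1) \rightarrow 2^{k-1}$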
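To make the field layout concrete, here is a small sketch that splits a 32-bit float into sign, biased exponent, and fraction, then rebuilds the value with the formula above. It uses Python's standard `struct` module. The helper names are hypothetical, and `normal_value` applies only to normalized numbers (stored exponent 1 to 254).

```python
import struct


def decode_float32(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, fraction) of x stored as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 bits after the implicit leading 1
    return sign, exponent, fraction


def normal_value(sign: int, exponent: int, fraction: int) -> float:
    """(-1)^s * (1 + Significand) * 2^(E - bias), for normalized numbers only."""
    significand = fraction / 2**23
    return (-1) ** sign * (1 + significand) * 2.0 ** (exponent - 127)


s, e, f = decode_float32(-6.25)
print(s, e, f)                # 1 129 4718592
print(normal_value(s, e, f))  # -6.25
```

Here $-6.25 = -1.5625 \times 2^2$, so the stored exponent is $2 + 127 = 129$ and the fraction encodes $0.5625$.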

Special Cases​


| Exponent (biased) | Significand | Object |
|---|---|---|
| 0 | 0 | $\pm 0$ |
| 0 | nonzero | Denorm |
| 1-254 | anything | Normal floating point |
| 255 | 0 | $\pm\infty$ |
| 255 | nonzero | NaN |
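The table translates directly into a classifier over the raw 32-bit pattern. This is a sketch with a hypothetical function name, assuming the single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits).

```python
def classify_float32(bits: int) -> str:
    """Classify a 32-bit pattern using the exponent/significand table."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denorm"
    if exponent == 255:
        return "infinity" if fraction == 0 else "NaN"
    return "normal"  # exponent in 1..254


print(classify_float32(0x80000000))  # zero (this is -0)
print(classify_float32(0x00000001))  # denorm
print(classify_float32(0x3F800000))  # normal (this is 1.0)
print(classify_float32(0xFF800000))  # infinity (negative)
print(classify_float32(0x7FC00000))  # NaN
```

The sign bit plays no role in the classification, which is why both zeros and both infinities fall into the same row.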

Overflow and Underflow​

  • Overflow: the magnitude is too large to represent ($>2^{128}$ or $<-2^{128}$ in single precision)
  • Underflow: the magnitude is too small to represent ($-2^{-149}<x<2^{-149}$, excluding 0)
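Both effects can be observed by forcing a Python float (a 64-bit double) into 32 bits. The sketch below relies on `struct.pack` raising `OverflowError` when a finite value does not fit the `f` format. Turning that error into an infinity is an assumption made here to mimic what round-to-nearest hardware would produce, and `to_float32` is a hypothetical helper.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round x to single precision; map a too-large magnitude to +/- infinity."""
    try:
        (y,) = struct.unpack(">f", struct.pack(">f", x))
    except OverflowError:
        # Assumption: mimic hardware, which would overflow to infinity.
        return math.copysign(math.inf, x)
    return y


print(to_float32(2.0**128))    # inf   (overflow)
print(to_float32(-2.0**128))   # -inf  (overflow)
print(to_float32(2.0**-151))   # 0.0   (underflow: below the smallest denorm)
print(to_float32(2.0**-149) == 2.0**-149)  # True: smallest denorm survives
```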

0, Infinity, and NaN

  • 0: exponent and fraction bits all 0s (the sign bit selects +0 or -0)
  • $\infty$ (e.g. $1\div0$)
    • Sign bit 0 or 1, largest exponent (all 1s), 0 in fraction
  • NaN (e.g. $\infty-\infty$, $0\div0$, $\sqrt{-4}$)
    • Sign bit 0 or 1, largest exponent (all 1s), nonzero fraction

op(NaN, X) = NaN
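A short sketch of how these special values behave in arithmetic, using Python floats (64-bit doubles, which follow the same rules). One caveat specific to Python: `1.0 / 0.0` and `math.sqrt(-4)` raise exceptions instead of returning $\infty$ or NaN, so the example starts from `math.inf`.

```python
import math

inf = math.inf
nan = inf - inf              # infinity minus infinity has no meaningful value

print(math.isnan(nan))       # True
print(math.isnan(nan + 1.0)) # True: arithmetic on NaN yields NaN
print(math.isnan(nan * 0.0)) # True
print(nan == nan)            # False: NaN compares unequal even to itself
print(inf + 1.0 == inf)      # True: infinity absorbs finite values
print(0.0 == -0.0)           # True: the two zeros compare equal
```

Because `nan == nan` is false, code that needs to detect NaN should use a dedicated check such as `math.isnan` rather than an equality comparison.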


Denorms

Denormalized number:

  • no (implied) leading 1 (just the Significand); fraction nonzero
  • exponent field all 0s; the actual exponent is 1 – Bias (instead of 0 – Bias)

Special Cases:

  • Smallest denorm: $\pm0.0...01\times2^{-126} = \pm2^{-149}$
  • Largest denorm: $\pm0.1...1\times2^{-126} = \pm(2^{-126}-2^{-149})$
  • Smallest norm: $\pm1.0...0\times2^{-126}=\pm 2^{-126}$
  • Largest norm: $\pm1.1...1\times2^{127}=\pm (2^{128}-2^{104})$
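These four boundary values can be checked by building the corresponding single-precision bit patterns and reading them back as floats. A minimal sketch, with `float32_from_bits` as a hypothetical helper. The comparisons are exact because each of these values is also representable as a 64-bit double.

```python
import struct


def float32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value


smallest_denorm = float32_from_bits(0x00000001)  # exponent 0, fraction 0...01
largest_denorm = float32_from_bits(0x007FFFFF)   # exponent 0, fraction all 1s
smallest_norm = float32_from_bits(0x00800000)    # exponent 1, fraction 0
largest_norm = float32_from_bits(0x7F7FFFFF)     # exponent 254, fraction all 1s

print(smallest_denorm == 2.0**-149)             # True
print(largest_denorm == 2.0**-126 - 2.0**-149)  # True
print(smallest_norm == 2.0**-126)               # True
print(largest_norm == 2.0**128 - 2.0**104)      # True
```

The largest denorm and the smallest norm differ by exactly $2^{-149}$, the same spacing as between adjacent denorms, so there is no gap between the two ranges.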