Loading and Storing of Short Floats

Name:
LDSF $X,$Y,$Z
LDSF $X,$Y,Z
STSF $X,$Y,$Z
STSF $X,$Y,Z

Specification:
LDSF: f($X) ← f(M4[$Y + $Z])
STSF: f(M4[$Y + $Z]) ← f($X)

Timing:

1υ + 1μ

Description:

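LDSF loads and STSF stores a 32-bit floating point number. Registers always hold floating point numbers in the 64-bit format, so LDSF converts the loaded value from the 32-bit to the 64-bit format, and STSF converts the register value from the 64-bit to the 32-bit format before storing it.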

LDSF: Register $X is set to the 64-bit floating point number corresponding to the 32-bit floating point number represented by M4[$Y + $Z] or M4[$Y + Z]. No arithmetic exception occurs, not even if a signaling NaN is loaded.
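
The widening done by LDSF can be sketched at the bit level as follows. This is only an illustrative model, not the reference implementation: the function name ldsf_bits is hypothetical, and the sketch assumes that the fraction bits of a NaN are carried over unchanged (shifted left by 29 bits), so that a signaling NaN stays signaling and no exception is raised.

```python
import struct

def ldsf_bits(word32: int) -> int:
    """Hypothetical model of LDSF: widen a 32-bit float pattern to 64 bits."""
    sign = (word32 >> 31) & 1
    exp = (word32 >> 23) & 0xFF
    frac = word32 & 0x7FFFFF
    if exp == 0xFF:
        # Infinity or NaN: widen the exponent field and keep the fraction
        # bits as they are (assumption: a signaling NaN stays signaling).
        return (sign << 63) | (0x7FF << 52) | (frac << 29)
    # Finite values: every 32-bit float is exactly representable as a
    # 64-bit float, so this conversion never rounds.
    value = struct.unpack(">f", struct.pack(">I", word32))[0]
    return struct.unpack(">Q", struct.pack(">d", value))[0]
```

For example, ldsf_bits(0x3F800000) yields 0x3FF0000000000000, the 64-bit pattern of 1.0.
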
STSF: The value obtained by rounding register $X to a 32-bit floating point number is placed in M4[$Y + $Z] or M4[$Y + Z]. Rounding is done with the current rounding mode, in a manner exactly analogous to the standard conventions for rounding 64-bit results, except that the precision and exponent range are limited. In particular, floating overflow, underflow, and inexact exceptions might occur; a signaling NaN triggers an invalid exception and becomes quiet. The fraction part of a NaN is truncated, if necessary, to a multiple of 2⁻²³ by ignoring the least significant 29 bits.

If we load any two short floats and operate on them once with FADD, FSUB, FMUL, FDIV, FREM, FSQRT, or FINT, and if we then store the result as a short float, we obtain the results required by the IEEE standard for single format arithmetic, because the double format can be shown to have enough precision to avoid any problems of "double rounding". But programmers are usually better off sticking to 64-bit arithmetic unless they have a strong reason to emulate the precise behavior of a 32-bit computer; 32 bits do not offer much precision.
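
The narrowing done by STSF can be sketched in the same spirit. Again this is a rough model under stated assumptions, not the machine's definition: the name stsf_bits and the exception names in the returned set are hypothetical, the current rounding mode is assumed to be round-to-nearest, and underflow detection is left out for brevity. The NaN branch follows the description above: a signaling NaN raises invalid and becomes quiet, and the low 29 fraction bits are dropped.

```python
import struct

def stsf_bits(octa: int):
    """Hypothetical model of STSF: return (32-bit pattern, exception set)."""
    sign = (octa >> 63) & 1
    exp = (octa >> 52) & 0x7FF
    frac = octa & 0xFFFFFFFFFFFFF
    flags = set()
    if exp == 0x7FF and frac != 0:
        # NaN: a clear top fraction bit marks a signaling NaN.
        if not frac & (1 << 51):
            flags.add("invalid")
            frac |= 1 << 51          # the NaN becomes quiet
        # Truncate the fraction by ignoring the least significant 29 bits.
        return (sign << 31) | (0xFF << 23) | (frac >> 29), flags
    value = struct.unpack(">d", struct.pack(">Q", octa))[0]
    try:
        # struct rounds to the nearest 32-bit float (assumed rounding mode).
        word = struct.unpack(">I", struct.pack(">f", value))[0]
    except OverflowError:
        # Too large for the 32-bit exponent range: round-to-nearest
        # overflows to infinity.
        return (sign << 31) | (0xFF << 23), {"overflow", "inexact"}
    back = struct.unpack(">f", struct.pack(">I", word))[0]
    if back != value:
        flags.add("inexact")         # the stored value differs from $X
    return word, flags
```

Storing the 64-bit pattern of 0.1 (0x3FB999999999999A) this way gives the 32-bit pattern 0x3DCCCCCD together with an inexact exception, since 0.1 has no exact 32-bit representation.
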

See also:

LDT and STT (for 32-bit fixed point numbers).