Float8
For conceptual model usage and type mapping, see:
- class pyrogue.Float8(bitSize)
Model class for 8-bit E4M3 floating point numbers (NVIDIA FP8).
- Parameters:
bitSize (int) – Number of bits being represented. Must be 8.
args (Any)
kwargs (Any)
- Return type:
Any
Notes
Format: 1 sign bit, 4 exponent bits, 3 mantissa bits (E4M3). Bias = 7. No infinity representation. NaN encoded as 0x7F. Maximum representable value is 448.0. Supported by NVIDIA Hopper (H100) and Blackwell GPUs.
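The bit layout described above can be illustrated with a small standalone decoder. This is a minimal sketch in plain Python, not pyrogue's implementation: the function name `decode_e4m3` is hypothetical, and it does not call any pyrogue API. It assumes the usual E4M3 conventions (subnormals when the exponent field is 0) and also treats 0xFF as NaN, on the assumption that the sign bit is ignored for NaN; the notes above only name 0x7F.

```python
import math

E4M3_BIAS = 7  # exponent bias stated in the notes above


def decode_e4m3(byte: int) -> float:
    """Decode one E4M3 byte into a Python float (illustrative sketch only)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")

    sign = -1.0 if byte & 0x80 else 1.0   # 1 sign bit
    exponent = (byte >> 3) & 0x0F         # 4 exponent bits
    mantissa = byte & 0x07                # 3 mantissa bits

    # All exponent and mantissa bits set encodes NaN (0x7F).
    # Assumption: 0xFF is treated as NaN as well; there is no infinity.
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan

    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 1 - bias.
        return sign * (mantissa / 8.0) * 2.0 ** (1 - E4M3_BIAS)

    # Normal: implicit leading 1, biased exponent.
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - E4M3_BIAS)


if __name__ == "__main__":
    print(decode_e4m3(0x7E))  # largest finite value
    print(decode_e4m3(0x38))  # exponent field equals the bias, mantissa 0
    print(decode_e4m3(0x7F))  # NaN encoding
```

Under these assumptions, 0x7E decodes to (1 + 6/8) × 2^8 = 448.0, which matches the maximum representable value given in the notes.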