FDOT (2-way, multiple and single vector, FP8 to FP16)

Multi-vector 8-bit floating-point dot-product by vector to half-precision

The instruction computes the fused sum-of-products of a group of two 8-bit floating-point values held in the corresponding 16-bit elements of the two or four first source vectors and the second source vector. The half-precision sum-of-products are scaled by 2-UInt(FPMR.LSCALE[3:0]), before being destructively added without intermediate rounding to the corresponding half-precision elements of the ZA single-vector groups. The 8-bit floating-point encoding format for the elements of the first source vector and the second source vector is selected by FPMR.F8S1 and FPMR.F8S2 respectively.

The single-vector group within each half of or each quarter of the ZA array is selected by the sum of the vector select register and offset , modulo half or quarter the number of ZA array vectors.

The vector group symbol, VGx2 or VGx4, indicates that the ZA operand consists of two or four ZA single-vector groups respectively. The vector group symbol is preferred for disassembly, but optional in assembler source code.

This instruction is unpredicated.

It has encodings from 2 classes: Two ZA single-vectors and Four ZA single-vectors

Two ZA single-vectors
(FEAT_SME_F8F16)

313029282726252423222120191817161514131211109876543210
110000010010Zm0Rv100Zn01off3
opc

Encoding

FDOT ZA.H[<Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B

Decode for this encoding

if !IsFeatureImplemented(FEAT_SME_F8F16) then EndOfDecode(Decode_UNDEF); constant integer v = UInt('010':Rv); constant integer n = UInt(Zn); constant integer m = UInt('0':Zm); constant integer offset = UInt(off3); constant integer nreg = 2;

Four ZA single-vectors
(FEAT_SME_F8F16)

313029282726252423222120191817161514131211109876543210
110000010011Zm0Rv100Zn01off3
opc

Encoding

FDOT ZA.H[<Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B

Decode for this encoding

if !IsFeatureImplemented(FEAT_SME_F8F16) then EndOfDecode(Decode_UNDEF); constant integer v = UInt('010':Rv); constant integer n = UInt(Zn); constant integer m = UInt('0':Zm); constant integer offset = UInt(off3); constant integer nreg = 4;

Assembler Symbols

<Wv>

Is the 32-bit name of the vector select register W8-W11, encoded in the "Rv" field.

<offs>

Is the vector select offset, in the range 0 to 7, encoded in the "off3" field.

<Zn1>

Is the name of the first scalable vector register of the first source multi-vector group, encoded as "Zn".

<Zn2>

Is the name of the second scalable vector register of the first source multi-vector group, encoded as "Zn" plus 1 modulo 32.

<Zm>

Is the name of the second source scalable vector register Z0-Z15, encoded in the "Zm" field.

<Zn4>

Is the name of the fourth scalable vector register of the first source multi-vector group, encoded as "Zn" plus 3 modulo 32.

Operation

CheckFPMREnabled(); CheckStreamingSVEAndZAEnabled(); constant integer VL = CurrentVL; constant integer elements = VL DIV 16; constant integer vectors = VL DIV 8; constant integer vstride = vectors DIV nreg; constant bits(32) vbase = X[v, 32]; integer vec = (UInt(vbase) + offset) MOD vstride; bits(VL) result; for r = 0 to nreg-1 constant bits(VL) operand1 = Z[(n+r) MOD 32, VL]; constant bits(VL) operand2 = Z[m, VL]; constant bits(VL) operand3 = ZAvector[vec, VL]; for e = 0 to elements-1 constant bits(16) op1 = Elem[operand1, e, 16]; constant bits(16) op2 = Elem[operand2, e, 16]; bits(16) sum = Elem[operand3, e, 16]; sum = FP8DotAddFP(sum, op1, op2, FPCR, FPMR); Elem[result, e, 16] = sum; ZAvector[vec, VL] = result; vec = vec + vstride;


Internal version only: aarchmrs v2024-12_rel, pseudocode v2024-12_rel ; Build timestamp: 2024-12-15T22:18

Copyright © 2010-2024 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.