FMOP4A (widening, 4-way)

8-bit floating-point quarter-tile sums of four outer products, accumulating

This instruction generates four independent quarter-tile 8-bit floating-point sums of outer products from the sub-matrices held in the half-vectors of the first and second source vectors (one or two vectors each), and accumulates the results into the corresponding elements of a 32-bit element ZA tile.

Each quarter-tile sum of outer products is generated by multiplying the SVLS÷2 × 4 sub-matrix of 8-bit floating-point values held in a half-vector of the first source vectors by the 4 × SVLS÷2 sub-matrix of 8-bit floating-point values held in a half-vector of the second source vectors, where SVLS is the number of 32-bit elements in a Streaming SVE vector. Each 32-bit container of the first source vectors holds the 4 elements of one row of an SVLS÷2 × 4 sub-matrix. Similarly, each 32-bit container of the second source vectors holds the 4 elements of one column of a 4 × SVLS÷2 sub-matrix.
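The container layout can be illustrated with a small sketch. It assumes a vector is given as a list of VL÷8 byte-sized elements (VL in bits) and splits it into its two half-vector sub-matrices, one 32-bit container per row; the helper names are hypothetical and only model the indexing, not any real API.

```python
def half_vector_submatrices(vec_bytes, vl):
    """Split a VL-bit vector (VL//8 byte elements) into its lower and upper
    half-vector sub-matrices. Each is SVLS/2 x 4: one row per 32-bit container."""
    svls = vl // 32              # 32-bit elements per vector
    dim = svls // 2              # rows per half-vector sub-matrix
    assert len(vec_bytes) == vl // 8
    containers = [vec_bytes[4 * c:4 * c + 4] for c in range(svls)]
    return containers[:dim], containers[dim:]


def as_columns(containers):
    """View the same containers as a 4 x SVLS/2 matrix (second-source layout):
    container j becomes column j."""
    return [[c[k] for c in containers] for k in range(4)]


# Usage: a 256-bit vector gives two 4 x 4 sub-matrices.
lo, hi = half_vector_submatrices(list(range(32)), 256)
print(lo[1])              # row 1 of the lower first-source sub-matrix
print(as_columns(lo)[0])  # row 0 when the lower half is read as second-source columns
```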

This instruction widens the 8-bit floating-point values of the sub-matrices held in the first and second source vectors to single-precision, and multiplies each widened sub-matrix from the first source vectors by the corresponding widened sub-matrix from the second source vectors. The resulting quarter-tile SVLS÷2 × SVLS÷2 single-precision sums of outer products are scaled by 2^(-UInt(FPMR.LSCALE)) before being destructively added to the single-precision floating-point destination tile. This is equivalent to performing a downscaled 4-way dot product and accumulate into each of the destination tile elements.
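For the single-vector form (nreg = mreg = 1), the per-element effect can be written as below. This is a sketch that ignores rounding: the Operation pseudocode performs each step with FP8DotAddFP, whose rounding and special-value handling the formula does not capture. Here Elem(Z, j) stands for the j-th 8-bit element of register Z after widening, and 0 ≤ r, c < SVLS.

```latex
\mathrm{ZA}_{da}[r][c] \leftarrow \mathrm{ZA}_{da}[r][c]
  + 2^{-\mathrm{UInt}(\mathrm{FPMR.LSCALE})}
    \sum_{k=0}^{3} \mathrm{Elem}(Z_n,\, 4r+k)\cdot \mathrm{Elem}(Z_m,\, 4c+k)
```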

The 8-bit floating-point encoding format for the elements of the first source vectors and the second source vectors is selected by FPMR.F8S1 and FPMR.F8S2, respectively.
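FPMR.F8S1 and FPMR.F8S2 choose between the two Arm 8-bit formats, E5M2 and E4M3. The sketch below widens one FP8 byte to a Python float, assuming the OCP FP8 conventions (E5M2: exponent bias 15, with infinities and NaNs; E4M3: exponent bias 7, no infinities, only S.1111.111 is NaN). It takes the format name directly rather than modelling the FPMR field encodings, and `decode_fp8` is a hypothetical helper.

```python
import math


def decode_fp8(byte: int, fmt: str) -> float:
    """Widen one 8-bit floating-point value (E5M2 or E4M3) to a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    if fmt == "E5M2":
        exp, man, bias, mbits = (byte >> 2) & 0x1F, byte & 0x3, 15, 2
        if exp == 0x1F:  # all-ones exponent: infinity or NaN
            return sign * math.inf if man == 0 else math.nan
    elif fmt == "E4M3":
        exp, man, bias, mbits = (byte >> 3) & 0xF, byte & 0x7, 7, 3
        if exp == 0xF and man == 0x7:  # the only NaN encoding, no infinities
            return math.nan
    else:
        raise ValueError(f"unknown FP8 format: {fmt}")
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * man * 2.0 ** (1 - bias - mbits)
    return sign * (1 + man / 2 ** mbits) * 2.0 ** (exp - bias)


print(decode_fp8(0x7E, "E4M3"))  # largest finite E4M3 value
```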

This instruction is unpredicated.

It has encodings from 4 classes: Single and multiple vectors, Single vectors, Multiple and single vectors, and Multiple vectors.

Single and multiple vectors
(FEAT_SME_MOP4 && FEAT_SME_F8F32)

Bits 31-21: 10000000001 | bit 20 (M): 1 | bits 19-17: Zm | bits 16-10: 0000000 | bit 9 (N): 0 | bits 8-6: Zn | bits 5-2: 0000 | bits 1-0: ZAda

Encoding

FMOP4A <ZAda>.S, <Zn>.B, { <Zm1>.B-<Zm2>.B }

Decode for this encoding

if !IsFeatureImplemented(FEAT_SME_MOP4) || !IsFeatureImplemented(FEAT_SME_F8F32) then
    EndOfDecode(Decode_UNDEF);
constant integer n = UInt('0':Zn:'0');
constant integer m = UInt('1':Zm:'0');
constant integer nreg = 1;
constant integer mreg = 2;
constant integer da = UInt(ZAda);

Single vectors
(FEAT_SME_MOP4 && FEAT_SME_F8F32)

Bits 31-21: 10000000001 | bit 20 (M): 0 | bits 19-17: Zm | bits 16-10: 0000000 | bit 9 (N): 0 | bits 8-6: Zn | bits 5-2: 0000 | bits 1-0: ZAda

Encoding

FMOP4A <ZAda>.S, <Zn>.B, <Zm>.B

Decode for this encoding

if !IsFeatureImplemented(FEAT_SME_MOP4) || !IsFeatureImplemented(FEAT_SME_F8F32) then
    EndOfDecode(Decode_UNDEF);
constant integer n = UInt('0':Zn:'0');
constant integer m = UInt('1':Zm:'0');
constant integer nreg = 1;
constant integer mreg = 1;
constant integer da = UInt(ZAda);

Multiple and single vectors
(FEAT_SME_MOP4 && FEAT_SME_F8F32)

Bits 31-21: 10000000001 | bit 20 (M): 0 | bits 19-17: Zm | bits 16-10: 0000000 | bit 9 (N): 1 | bits 8-6: Zn | bits 5-2: 0000 | bits 1-0: ZAda

Encoding

FMOP4A <ZAda>.S, { <Zn1>.B-<Zn2>.B }, <Zm>.B

Decode for this encoding

if !IsFeatureImplemented(FEAT_SME_MOP4) || !IsFeatureImplemented(FEAT_SME_F8F32) then
    EndOfDecode(Decode_UNDEF);
constant integer n = UInt('0':Zn:'0');
constant integer m = UInt('1':Zm:'0');
constant integer nreg = 2;
constant integer mreg = 1;
constant integer da = UInt(ZAda);

Multiple vectors
(FEAT_SME_MOP4 && FEAT_SME_F8F32)

Bits 31-21: 10000000001 | bit 20 (M): 1 | bits 19-17: Zm | bits 16-10: 0000000 | bit 9 (N): 1 | bits 8-6: Zn | bits 5-2: 0000 | bits 1-0: ZAda

Encoding

FMOP4A <ZAda>.S, { <Zn1>.B-<Zn2>.B }, { <Zm1>.B-<Zm2>.B }

Decode for this encoding

if !IsFeatureImplemented(FEAT_SME_MOP4) || !IsFeatureImplemented(FEAT_SME_F8F32) then
    EndOfDecode(Decode_UNDEF);
constant integer n = UInt('0':Zn:'0');
constant integer m = UInt('1':Zm:'0');
constant integer nreg = 2;
constant integer mreg = 2;
constant integer da = UInt(ZAda);
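The four decode blocks differ only in nreg and mreg, which follow the M and N fields. The sketch below decodes a 32-bit word into the same variables and prints it in this section's assembler syntax. The bit positions (M at bit 20, Zm at bits 19-17, N at bit 9, Zn at bits 8-6, ZAda at bits 1-0, bits 31 and 21 set) are a reconstruction from the encoding diagrams in this section, not an official decoder, and the function names are hypothetical.

```python
# Variable fields: M (20), Zm (19-17), N (9), Zn (8-6), ZAda (1-0).
VAR_MASK = (1 << 20) | (0x7 << 17) | (1 << 9) | (0x7 << 6) | 0x3
FIXED_MASK = 0xFFFFFFFF & ~VAR_MASK
FIXED_BITS = (1 << 31) | (1 << 21)  # every other fixed bit is zero


def decode_fmop4a_f8(word: int):
    """Return (da, n, nreg, m, mreg) as in the decode pseudocode, or None."""
    if (word & FIXED_MASK) != FIXED_BITS:
        return None
    mreg = ((word >> 20) & 1) + 1  # M: one or two second-source registers
    zm = (word >> 17) & 0x7
    nreg = ((word >> 9) & 1) + 1   # N: one or two first-source registers
    zn = (word >> 6) & 0x7
    da = word & 0x3
    n = 2 * zn                     # UInt('0':Zn:'0')
    m = 16 + 2 * zm                # UInt('1':Zm:'0')
    return da, n, nreg, m, mreg


def disassemble(word: int) -> str:
    fields = decode_fmop4a_f8(word)
    if fields is None:
        return "<not FMOP4A (widening, 4-way)>"
    da, n, nreg, m, mreg = fields
    first = f"Z{n}.B" if nreg == 1 else f"{{ Z{n}.B-Z{n + 1}.B }}"
    second = f"Z{m}.B" if mreg == 1 else f"{{ Z{m}.B-Z{m + 1}.B }}"
    return f"FMOP4A ZA{da}.S, {first}, {second}"


print(disassemble(0x80200000))
```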

Assembler Symbols

<ZAda>

Is the name of the ZA tile ZA0-ZA3, encoded in the "ZAda" field.

<Zn>

Is the name of the first source scalable vector register, an even-numbered register in the range Z0-Z14, encoded as "Zn" times 2.

<Zm1>

Is the name of the first scalable vector register of the second source multi-vector group, an even-numbered register in the range Z16-Z30, encoded as "Zm" times 2 plus 16.

<Zm2>

Is the name of the second scalable vector register of the second source multi-vector group, an odd-numbered register in the range Z17-Z31, encoded as "Zm" times 2 plus 17.

<Zm>

Is the name of the second source scalable vector register, an even-numbered register in the range Z16-Z30, encoded as "Zm" times 2 plus 16.

<Zn1>

Is the name of the first scalable vector register of the first source multi-vector group, an even-numbered register in the range Z0-Z14, encoded as "Zn" times 2.

<Zn2>

Is the name of the second scalable vector register of the first source multi-vector group, an odd-numbered register in the range Z1-Z15, encoded as "Zn" times 2 plus 1.

Operation

CheckFPMREnabled();
CheckStreamingSVEAndZAEnabled();
constant integer VL = CurrentVL;
constant integer hvsize = VL DIV 2;
constant integer dim = hvsize DIV 32;
constant integer tilesize = 4*dim*dim*32;
constant bits(tilesize) op3 = ZAtile[da, 32, tilesize];
bits(tilesize) result;

for outprod = 0 to 3
    constant integer row_hv = outprod DIV 2;
    constant integer col_hv = outprod MOD 2;
    constant integer row_base = row_hv * dim;
    constant integer col_base = col_hv * dim;
    constant bits(VL) op1 = Z[n + (nreg-1)*col_hv, VL];
    constant bits(VL) op2 = Z[m + (mreg-1)*row_hv, VL];
    for row = 0 to dim-1
        for col = 0 to dim-1
            constant integer row_idx = row_base + row;
            constant integer col_idx = col_base + col;
            constant integer tile_idx = row_idx * dim * 2 + col_idx;
            bits(32) sum = Elem[op3, tile_idx, 32];
            bits(32) rowop;
            bits(32) colop;
            for i = 0 to 3
                Elem[rowop, i, 8] = Elem[op1, 4*row_idx + i, 8];
                Elem[colop, i, 8] = Elem[op2, 4*col_idx + i, 8];
            sum = FP8DotAddFP(sum, rowop, colop, FPCR, FPMR);
            Elem[result, tile_idx, 32] = sum;

ZAtile[da, 32, tilesize] = result;


Internal version only: aarchmrs v2024-12_rel, pseudocode v2024-12_rel; Build timestamp: 2024-12-15T22:18

Copyright © 2010-2024 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.