UTMOPA (4-way)

Unsigned integer sparse sum of four outer products, accumulating

This instruction generates the unsigned integer sum of outer products by multiplying the 2-in-4 selected elements from the dense sub-matrices in the two first source vectors with the corresponding elements of the compressed sparse sub-matrix in the second source vector, and accumulates the results into the corresponding elements of a 32-bit element ZA tile.

The sum of outer products is generated by multiplying the two selected sets of 2-in-4 8-bit unsigned values from each of the overlapping 32-bit containers of the two SVLS×4 sub-matrices in the first source vectors by the four 8-bit unsigned values from the corresponding 32-bit container of the 4×SVLS sub-matrix in the second source vector, where SVLS is the number of 32-bit elements in a streaming vector. The four selected elements from the overlapping 32-bit containers of the first source vectors correspond to two sets of 2-in-4 elements of rows of the two SVLS×4 sub-matrices. Each 32-bit container of the second source vector holds four elements of columns of a compressed 4×SVLS sub-matrix.

The two sets of 2-in-4 8-bit unsigned values from overlapping 32-bit containers of the first source vectors are selected by pairs of 4-bit controls in the indexed segment of the control vector register. If the control bit corresponding to an element in the first source vectors is 0, the element is discarded and does not contribute to the sum of products result. If more than two bits of a 4-bit control are 1, only the elements corresponding to the two least significant set bits are selected.
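The selection rule for a single 4-bit control can be sketched in Python as follows. This is an illustrative model only, not Arm pseudocode: the function name `select_2_in_4` and the list-based container representation are assumptions made for this example. The zero fill for controls with fewer than two set bits follows the `Zeros(8)` initialization in the Operation pseudocode.

```python
def select_2_in_4(control: int, container: list[int]) -> list[int]:
    """Pick up to two 8-bit values from a 4-element container.

    control   -- 4-bit mask; bit e set means element e is a candidate
    container -- four unsigned 8-bit values (index 0 is the lowest byte)
    """
    selected = []
    for e in range(4):  # scan from the least significant control bit
        if len(selected) < 2 and (control >> e) & 1:
            selected.append(container[e])
    # Fewer than two set bits: unfilled slots stay zero and add nothing
    # to the sum of products.
    selected += [0] * (2 - len(selected))
    return selected
```

With a control of `0b1110`, three bits are set, so only elements 1 and 2 are taken and element 3 is ignored.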

The resulting SVLS×SVLS widened 32-bit integer sum of outer products is then destructively added to the 32-bit integer destination tile. This is equivalent to performing a 4-way dot product and accumulating the result into each of the destination tile elements.
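As a rough illustration of the per-element accumulation, the following Python sketch computes one destination element from four already-selected row values and four column values. The name `dot4_accumulate` is hypothetical, and the 32-bit wraparound is an assumption based on the `bits(32) sum` accumulator in the Operation pseudocode.

```python
MASK32 = 0xFFFFFFFF  # the accumulator is a 32-bit element


def dot4_accumulate(acc: int, row_vals: list[int], col_vals: list[int]) -> int:
    """Add a 4-way dot product of unsigned 8-bit values to a 32-bit accumulator."""
    total = acc
    for a, b in zip(row_vals, col_vals):
        total += a * b  # each product is at most 255 * 255
    return total & MASK32  # keep the low 32 bits, as a bits(32) value would
```

A single step adds at most 4 × 255 × 255 = 260100 to the element, so overflow only matters after many accumulations into the same tile.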

This instruction is unpredicated.

SME2
(FEAT_SME_TMOP)

Bits     Field
31-21    10000001011
20-16    Zm
15-13    100
12       K
11-10    Zk
9-6      Zn
5-4      i2
3-2      00
1-0      ZAda

Named fixed bits: u0, u1

Encoding

UTMOPA <ZAda>.S, { <Zn1>.B-<Zn2>.B }, <Zm>.B, <Zk>[<index>]

Decode for this encoding

if !IsFeatureImplemented(FEAT_SME_TMOP) then EndOfDecode(Decode_UNDEF);
constant integer n = UInt(Zn:'0');
constant integer m = UInt(Zm);
constant integer k = UInt('1':K:'1':Zk);
constant integer index = UInt(i2);
constant integer da = UInt(ZAda);
constant boolean op1_unsigned = TRUE;
constant boolean op2_unsigned = TRUE;
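The register-number arithmetic in the decode can be sketched in Python as below. This is a minimal model that takes the already-extracted field values as inputs. The function name `decode_registers` and the dictionary return type are assumptions for illustration, and the feature check is omitted.

```python
def decode_registers(Zn: int, Zm: int, K: int, Zk: int, i2: int, ZAda: int) -> dict:
    """Map raw encoding fields to register numbers, mirroring the decode pseudocode."""
    n = Zn << 1                    # Zn:'0'  -> even register, first of the pair
    m = Zm                         # full 5-bit register number
    k = 0b10100 | (K << 3) | Zk    # '1':K:'1':Zk -> Z20-Z23 or Z28-Z31
    return {"n": n, "m": m, "k": k, "index": i2, "da": ZAda}
```

For example, `Zn=3` names the pair Z6-Z7, and `K=1, Zk=2` names the control register Z30.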

Assembler Symbols

<ZAda>

Is the name of the ZA tile ZA0-ZA3, encoded in the "ZAda" field.

<Zn1>

Is the name of the first scalable vector register of the first source multi-vector group, encoded as "Zn" times 2.

<Zn2>

Is the name of the second scalable vector register of the first source multi-vector group, encoded as "Zn" times 2 plus 1.

<Zm>

Is the name of the second source scalable vector register, encoded in the "Zm" field.

<Zk>

Is the name of the control vector register Z20-Z23 or Z28-Z31, encoded in the "K:Zk" fields.

<index>

Is the control segment index, in the range 0 to 3, encoded in the "i2" field.

Operation

CheckStreamingSVEAndZAEnabled();
constant integer VL = CurrentVL;
constant integer dim = VL DIV 32;
constant integer csize = VL DIV 4;
constant bits(VL) op2 = Z[m, VL];
constant bits(VL) op3 = Z[k, VL];
constant bits(csize) ctrl = Elem[op3, index, csize];
constant bits(dim*dim*32) op4 = ZAtile[da, 32, dim*dim*32];
bits(dim*dim*32) result;

for row = 0 to dim-1
    for col = 0 to dim-1
        array [0..3] of bits(8) erow;
        array [0..3] of bits(8) ecol;
        for j = 0 to 3
            erow[j] = Zeros(8);
            ecol[j] = Elem[op2, 4*col + j, 8];
        for r = 0 to 1
            constant bits(VL) op1 = Z[n+r, VL];
            integer i = 0;
            for e = 0 to 3
                if i < 2 && Elem[ctrl, 8*col + 4*r + e, 1] == '1' then
                    erow[2*r + i] = Elem[op1, 4*row + e, 8];
                    i = i + 1;
        bits(32) sum = Elem[op4, row*dim+col, 32];
        for j = 0 to 3
            sum = sum + (Int(erow[j], op1_unsigned) * Int(ecol[j], op2_unsigned));
        Elem[result, row*dim+col, 32] = sum;

ZAtile[da, 32, dim*dim*32] = result;
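The operation above can be approximated by the following Python reference model, intended only as a sketch for experimenting with small inputs. Several things here are assumptions rather than part of the architecture text: the function name `utmopa_4way`, the representation of vector registers as lists of bytes with index 0 as the least significant byte, the representation of the ZA tile as a row-major list of lists, and the explicit `vl_bits` parameter standing in for `CurrentVL`. The streaming-mode and ZA-enabled checks are not modelled.

```python
def utmopa_4way(za, zn1, zn2, zm, zk, index, vl_bits):
    """Return the updated tile after one UTMOPA (4-way) step.

    za            -- dim x dim list of 32-bit unsigned accumulators
    zn1, zn2      -- first source vector pair, vl_bits // 8 bytes each
    zm            -- second source vector (compressed sub-matrix)
    zk            -- control vector, vl_bits // 8 bytes
    index         -- control segment index, 0 to 3
    """
    dim = vl_bits // 32    # number of 32-bit elements per vector
    csize = vl_bits // 4   # width in bits of one control segment

    # Extract the indexed control segment from the control vector.
    zk_int = int.from_bytes(bytes(zk), "little")
    ctrl = (zk_int >> (index * csize)) & ((1 << csize) - 1)

    result = [list(r) for r in za]
    for row in range(dim):
        for col in range(dim):
            erow = [0] * 4                            # unselected slots stay zero
            ecol = [zm[4 * col + j] for j in range(4)]
            for r, src in enumerate((zn1, zn2)):
                i = 0
                for e in range(4):
                    bit = (ctrl >> (8 * col + 4 * r + e)) & 1
                    if i < 2 and bit:
                        erow[2 * r + i] = src[4 * row + e]
                        i += 1
            total = za[row][col] + sum(a * b for a, b in zip(erow, ecol))
            result[row][col] = total & 0xFFFFFFFF     # 32-bit accumulator
    return result
```

Note that, as in the pseudocode, the control bits are indexed by destination column while the selected bytes come from the container at the destination row of each first source vector.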

Operational information

If PSTATE.DIT is 1:


Internal version only: aarchmrs v2024-12_rel, pseudocode v2024-12_rel ; Build timestamp: 2024-12-15T22:18

Copyright © 2010-2024 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.