Magic Numbers

Addison Wesley - Hacker’s Delight

The basic trick is to multiply by a sort of reciprocal of the divisor d, approximately $2^{32}/d$
, and then to extract the leftmost 32 bits of the product. The details, however, are more complicated, particularly for certain divisors such as 7.

Let us first consider a few specific examples. These illustrate the code that will be generated by the general method. We denote registers as follows:

n - the input integer(numerator)

M - loaded with a “magic number”

t - a temporary register

q - will contain the quotient

r - will contain the remainder

Signed Divide by 3

li		M,0x55555556	Load magic numbers, (2**32 + 2) / 3.
mulhs	q,M,n		q = floor(M*n / 2**32)
shri  	t,n,31        Add 1 to q if 
add   	q,q,t         n is negative. 
muli  	t,q,3         Compute remainder from 
sub   	r,n,t         r = n - q*3.

Proof. The multiply high signed operation (mulhs) cannot overflow, as the product of two 32-bit integers can always be represented in 64 bits and mulhs gives the high-order 32 bits of the 64-bit product. This is equivalent to dividing the 64-bit product by $2^{32}$ and taking the floor of the result, and this is true whether the product is positive or negative. Thus, for n $\ge$0 the above code computes

$q = \lfloor\frac{2^{32}+2}{3}\frac{n}{2^{32}}\rfloor = \lfloor\frac{n}{3}+\frac{2n}{3·2^{32}}\rfloor.$

Now, $n<2^{31}$ , because $2^{31}-1$ is the largest representable positive number. Hence the “error” term $2n/(3·2^{32})$ is less than 1/3 (and is nonnegative), so we have $q=\lfloor n/3\rfloor$ .

For n<0, there is an addition of 1 to the quotient. Hence the code computes

$q = \lfloor\frac{2^{32}+2}{3}\frac{n}{2^{32}}\rfloor+1=\lfloor\frac{2^{32}n+2n+3·2^{32}}{3·2^{32}}\rfloor=\lceil\frac{2^{32}n+2n+1}{3·2^{32}}\rceil,$

where we have used Theorem:

For n, d integers, d>0,

$\lfloor\frac{n}{d}\rfloor=\lceil\frac{n-d+1}{d}\rceil\ and\ \lceil\frac{n}{d}\rceil=\lfloor\frac{n+d-1}{d}\rfloor.$

If d<0:

$\lfloor\frac{n}{d}\rfloor=\lceil\frac{n-d-1}{d}\rceil\ and\ \lceil\frac{n}{d}\rceil=\lfloor\frac{n+d+1}{d}\rfloor.$

Hence

$q=\lceil\frac{n}{3}+\frac{2n+1}{3·2^{32}}\rceil.$

For $-2^{31}\le n\le-1$ ,

$-\frac{1}{3}+\frac{1}{3·2^{32}}\le\frac{2n+1}{3·2^{32}}\le-\frac{1}{3·2^{32}}.$

This establishes that the quotient is correct. That the remainder is correct follows easily from the fact that the remainder must satisfy $n=qd+r$ , the multiplication by 3 cannot overflow (because $-2^{31}/3\le q\le(2^{31}-1)/3$ ), and the subtract cannot overflow because the result must be in the range -2 to +2.

The multiply immediate can be done with two add’s, or a shift and an add, if either gives an improvement in execution time.

On many present-day RISC computers, the quotient can be computed as shown above in nine or ten cycles, whereas the divide instruction might take 20 cycles or so.

Unsigned Divide by 3

For a non-power of 2, let us first consider unsigned division by 3 on a 32-bit machine. Because the dividend n can now be as large as $2^{32}-1$ , the multiplier $(2^{32}+2)/3$ is inadequate, because the error term $2n/3·2^{32}$ can exceed 1/3. However, the multiplier $(2^{33}+1)/3$ is adequate. The code is

li    M,0xAAAAAAAB  Load magic number, (2**33+1)/3. 
mulhu q,M,n         q = floor(M*n/2**32). 
shri  q,q,1 
 
muli  t,q,3         Compute remainder from 
sub   r,n,t         r = n - q*3.

An instruction that gives the high-order 32 bits of a 64-bit unsigned product is required, which we show above as mulhu.

To see that the code is correct, observe that it computes

$q=\lfloor\frac{2^{33}+1}{3}\frac{n}{2^{33}}\rfloor=\lfloor\frac{n}{3}+\frac{n}{3·2^{33}}\rfloor.$

For $0\le n<2^{32},0\le n/(3·2^{33})<1/3$ , so by Theorem:

For n, d integers, $d\ne0$ , and x real,

$\lfloor\frac{n}{d}+x\rfloor=\lfloor\frac{n}{d}\rfloor\ if\ 0\le x<|\frac{1}{d}|,\ and\ \lceil\frac{n}{d}+x\rceil=\lceil\frac{n}{d}\rceil\ if -|\frac{1}{d}|<x\le0.$

$q=\lfloor n/3\rfloor$ .
In computing the remainder, the multiply immediate can overflow if we regard the operands as signed integers, but it does not overflow if we regard them and the result as unsigned. Also, the subtract cannot overflow, because the result is in the range 0 to 2, so the remainder is correct.