Sieving

Once our implementation of the Gaussian integer method was working, we
began searching for (*c*_{1},*c*_{2}) pairs of integers whose residues were
smooth. All values of *c*_{1} in the interval [1,95000] were
considered. We chose to look at only positive values of *c*_{1} for
simplicity; negative values of *c*_{1} are acceptable as far
as the algorithm is concerned. Our sieve size (the set of considered
*c*_{2} values for each value of *c*_{1}) was
.
Again, only
positive *c*_{2} values were considered. These choices were definitely
not optimal.

Since -1 is a quadratic nonresidue modulo *p*, while -2 is a residue, we
chose *r* = -2. (This simplified the algorithm, since
has class number 1.) By factoring *x*^{2}+2 modulo *p*, we obtained a
value of *S* such that
.
We computed the
convergents to the continued fraction of *S*/*p*, and selected *V*and *T* as follows:

These specific values of

For each fixed value of *c*_{1}, the sieve initialized an integer array
*C* of size
,
representing the possible values of *c*_{2}.
For each prime power *q*^{h}, value
was
computed. The contents of the array were incremented by
at every location
.
(The
factor of 1000 and the floor function
are used
so that we may maintain three decimal digits of accuracy yet perform
integer, instead of floating point, calculations.) When the loop over
prime powers *q*^{h} had finished, *C*[*c*_{2}] contained the (approximate)
real logarithm of the smooth part of the residue *c*_{1}*V*-*c*_{2}*T*.

After all of the small primes had been sieved for a fixed value of
*c*_{1}, any *c*_{2} for which *C*[*c*_{2}] (i.e., the sum of the
logarithms of its small prime factors) was close to the actual (real)
logarithm of the residue
*c*_{1}*V* - *c*_{2}*T* was considered ``interesting.''
If

(10) |

then the (

(11) |

the (

Even though we had collected both partially- and fully-smooth pairs of
(*c*_{1},*c*_{2}) values, we ended up using only the fully-smooth pairs to
generate equations. We found enough fully-smooth pairs in the search
space to generate the needed equations; it was not necessary to resort
to a ``large prime variation'' and allow equations with one or two large
prime factors [27]. (The large prime variation would have
increased the density of the set of equations which have to be solved,
which was undesirable.) On average, for each value of *c*_{1} we found
17.27 values of *c*_{2} in the interval
[1,300000] such that (*c*_{1},*c*_{2})was fully-smooth. Overall, we found about
fully-smooth (*c*_{1},*c*_{2}) pairs (out of
checked
values), or about one fully-smooth pair per
possible
pairs.

For each fully-smooth pair (*c*_{1},*c*_{2}), the corresponding equation:

was factored (if possible) over the extended (complex) factor base. It was likely that

Once factored, the equations were sorted, and duplicate appearances of
equations were removed. We found two causes of duplicate equations.
The first cause was two distinct (*c*_{1},*c*_{2}) pairs of integers which
happened to have the same residue
.
The second (and
much more frequent) cause was two pairs (*c*_{1},*c*_{2}) and
(*c*_{1}',*c*_{2}')with
,
,
where *k* is a small
integer. In this case,

and the factors of

Before this system of equations was reduced using structured Gaussian
elimination (see Section 6), it was first ``squeezed''
to eliminate dependent variables. Consider a real prime
which
may be factored into complex primes (*a*+*bs*) and (*a*-*bs*). All three of
these variables could appear as unknowns in any of the 288,017
equations. Since it is sufficient to compute the discrete logarithm
for any two of the three, we may reduce the density of the system by
rewriting all appearances of one of the three variables in terms of the
other two. For each triple of related primes
*q*_{i},(*a*_{i}+*b*_{i}*s*),(*a*_{i}-*b*_{i}*s*), the prime which appeared the fewest number
of times in the system of equations was rewritten in terms of the other
two. In most cases we replaced
(*a*_{i}-*b*_{i}*s*) with
*q*_{i}(*a*_{i}+*b*_{i}*s*)^{-1}; in the remaining instances
(*a*_{i}+*b*_{i}*s*) was
replaced with
*q*_{i}(*a*_{i}-*b*_{i}*s*)^{-1}. This process reduced the number
of unknown variables appearing in the system to 96,321.

The sieving program was written in C and utilized the portable
multiprecision arithmetic package distributed by A. Lenstra and
M. Manasse with their multiple polynomial quadratic sieve program. The
sieve was run on a Silicon Graphics 4D-220 computer. It has 4 R3000
MIPS Computers, Inc., 25 MHz processors, each rated at about 18 mips, or
18 times the speed of a DEC VAX 11/780 computer (and about 1.3 times the
speed of a DECstation 3100). This machine has 128 Mbytes of main
memory. Since our program occupied about 20 Mbytes of memory, all four
processors could sieve simultaneously over different *c*_{1} intervals.

It took our implementation approximately one hour of CPU time (running
on a single processor) to successfully sieve 1000 possible *c*_{1} values,
for a total running time of about 100 hours. Factoring the equations
derived from fully-smooth (*c*_{1},*c*_{2}) pairs took another 104 minutes, on
average, per 1000 possible *c*_{1} values. Neither of these running times
is particularly impressive; performance could probably be improved
significantly without too much effort. However, it should be noted that
even our implementation was able to collect the necessary equations
using only a few days worth of CPU time.

Aside from improving the efficiency of the algorithm, one could achieve about an order of magnitude improvement by working with a smaller factor base and using the single and double large prime variations. The reason we did not do this is that we wished to extrapolate our result to estimate the security of much larger systems. The efficiency of sieving is very easy to estimate, as one can run experiments on small ranges and also use the data from the large factoring experiments. On the other hand, this kind of extrapolation is harder to do in the case of solving linear equations, and so we decided to generate a very large system that we could experiment with.

In order to gain more information concerning the performance of the
Gaussian integer scheme, experiments were also performed on a 224-bit
prime *q* using essentially the same algorithm. This prime was chosen
to be similar to the one used in the Sun system in that
is a prime and
*q* = *T*^{2} + 2*V*^{2} for particular integers
.
The factor base was the same as that used in the case of the
192-bit prime. Sieving occurred over a substantially larger range; all
values of *c*_{1} in the interval
were
considered. Our sieve size (the range of possible *c*_{2} values) was
kept at
.
Sieving took under 1200 hours of CPU time, and
produced about
candidate fully-smooth (*c*_{1},*c*_{2})pairs. After removing those pairs for which *c*_{1} and *c*_{2} were not
relatively prime, factoring both sides of the corresponding
equations, and ``squeezing'' the resulting system,
we were left with a system of 164,841 equations in 94,398 unknowns.