Factoring integers and computing discrete logarithms often requires
solving large systems of linear equations over finite fields. General
surveys of these areas are presented in
[14,17,19]. So far there have been few
implementations of discrete logarithm algorithms, but many
implementations of integer factoring methods. Some of the published
results have involved solving systems with very large numbers of
equations and variables [12]. In factoring, equations have had to be solved
over the field *GF*(2). In that situation, ordinary Gaussian
elimination can be used effectively, since many coefficients (either 32
or 64 depending on machine word size) can be packed into a single
machine word, and addition can be implemented as the exclusive-or
operation. Even so, the large size of the systems to be solved has
often caused storage problems (a 6 × 10^{4} by 6 × 10^{4}
system requires approximately 110 million words of storage on a 32-bit
machine), and it has often been difficult to obtain a correspondingly
large amount of computer time. In many cases, the linear systems were
purposefully kept smaller than would have been optimal from the
standpoint of other parts of the factoring algorithm, just to avoid the
linear algebra difficulties.
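As an illustration of the word-packing idea, here is a hedged sketch (not drawn from any of the implementations cited above) of Gaussian elimination over *GF*(2) in which each equation is packed into a single word and row reduction is one exclusive-or; Python integers stand in for machine words:

```python
# Illustrative sketch: Gaussian elimination over GF(2) with packed rows.
# Each equation is one integer: bit i (0 <= i < n) is the coefficient of
# x_i, and bit n holds the right-hand side.  Row reduction is a single
# XOR, so a real machine handles 32 or 64 coefficients per instruction.

def solve_gf2(rows, n):
    """Return one solution of the packed GF(2) system as a 0/1 list,
    or None if the system is inconsistent."""
    rows = list(rows)
    pivot_row_of = {}                  # pivot column -> row index
    for i in range(len(rows)):
        row = rows[i]
        for col in range(n):
            if (row >> col) & 1:
                if col in pivot_row_of:
                    row ^= rows[pivot_row_of[col]]   # one word-wide XOR
                else:
                    pivot_row_of[col] = i            # new pivot found
                    break
        else:
            if row >> n:               # reduced to "0 = 1": inconsistent
                return None
        rows[i] = row
    x = [0] * n                        # free variables default to 0
    for col in sorted(pivot_row_of, reverse=True):
        row = rows[pivot_row_of[col]]
        acc = (row >> n) & 1
        for j in range(col + 1, n):
            if (row >> j) & 1:
                acc ^= x[j]
        x[col] = acc
    return x
```

On real hardware the inner XOR processes 32 or 64 coefficients at once, which is exactly the packing described above.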

Clearly we cannot hope to be able to solve future systems (even in
*GF*(2)) using only ordinary Gaussian elimination. As the size of
integers being factored increases, so does the size of the system of
equations which must be solved. In addition, the recently discovered
number field sieve currently requires (when applied to factoring general
integers) the solution of equations over the integers, not just modulo
2. (The best way to obtain such a solution appears to be either to
solve the system modulo many small or moderate primes and then apply the
Chinese remainder theorem, or else to solve it modulo a particular
prime, and then lift that solution to one modulo a power of that prime.)
In the case of the number field sieve applied to general integers, the
linear algebra problem is currently one of the critical bottlenecks that
keep it impractical.
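A hedged sketch of the first approach just mentioned (the tiny dense solver, the choice of primes, and the example system are all illustrative only): solve the system independently modulo several primes whose product exceeds the entries of the integer solution, then combine the per-prime solutions with the Chinese remainder theorem:

```python
# Illustrative sketch: recovering an integer solution of A x = b by
# solving modulo several primes and combining with the CRT.  This
# recovers nonnegative entries smaller than the product of the primes;
# negative or rational entries would need an extra reconstruction step.

def solve_mod_p(A, b, p):
    """Gauss-Jordan elimination on the dense system A x = b over GF(p).
    Assumes A is square and invertible modulo p."""
    n = len(A)
    M = [[A[i][j] % p for j in range(n)] + [b[i] % p] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, p)
        M[col] = [v * inv % p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(M[r][j] - f * M[col][j]) % p for j in range(n + 1)]
    return [M[i][n] for i in range(n)]

def crt(residues, primes):
    """Combine x = r_i (mod p_i) into x mod (p_1 * ... * p_k)."""
    x, mod = 0, 1
    for r, p in zip(residues, primes):
        t = (r - x) * pow(mod, -1, p) % p
        x += mod * t
        mod *= p
    return x % mod

def solve_integer_system(A, b, primes):
    per_prime = [solve_mod_p(A, b, p) for p in primes]
    return [crt([sol[i] for sol in per_prime], primes)
            for i in range(len(b))]
```

The second approach (solving modulo one prime and Hensel-lifting to a prime power) trades the several independent solves for repeated refinement with the same matrix.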

Even in cases where the equations have to be solved modulo 2, linear
algebra difficulties are becoming a serious bottleneck. As an example,
the very recent factorization of
*F*_{9} = 2^{2^{9}} + 1 = 2^{512} + 1 (using
a special form of the number field sieve for integers of this form) by
A. Lenstra and M. Manasse involved solving a binary system of very
large dimension.
The attack on the RSA challenge cipher (which
will require factoring a 129 decimal digit integer) that is currently
planned using the quadratic sieve might require solving an even larger
system.

For discrete logarithm algorithms, the linear algebra problem
has always seemed to be even more important than in factoring, since the
equations have to be solved modulo large primes. The largest such
system that has been solved arose in connection with discrete
logarithms in *GF*(2^{127}), and had to be solved modulo
2^{127} - 1 [3,17].

For an *n* by *n* system with *n* = 10^{5}, ordinary Gaussian
elimination takes about *n*^{3} = 10^{15} operations. Modern
supercomputers are capable of between 10^{8} and 10^{9} integer
operations per second, so 10^{15} operations might take a few weeks to
a few months on such a machine, if one matrix operation took one machine
instruction. In practice, since up to 64 matrix operations are
performed in a single supercomputer operation, the required time
decreases substantially. However, for those without access to
supercomputers, time can easily be a barrier. It is possible to obtain
10^{15} machine operations for free, as that is the approximate amount
of work that the recent integer factorization by A. Lenstra and
M. Manasse [12] required. However, that effort involved
a very decentralized and only loosely coordinated computation on
hundreds of workstations. This is acceptable for the sieving stage of
the quadratic sieve algorithm, but it causes problems when one tries to
solve a system of linear equations; solving such systems requires very
close coordination and errors propagate quickly, destroying the validity
of the final result. Thus the linear algebra phase requires the use of
a single machine (although it can be a massively parallel computer), and
it can often be hard to obtain access to one that is fast enough.
Memory requirements also present serious difficulties. If the problem
is to find a solution modulo 2, the full matrix has 10^{10} bits, or
1,250 megabytes. Only a few supercomputers have this much main memory.
On all other machines, one has to work on the matrix in blocks, which
slows down the operation considerably.

Both the time and space requirements become much more serious when
solving equations modulo a ``moderate size'' prime *p*. If *p* has
about 100 bits, the operation count goes up from 10^{15} to 10^{19}
(each multiplication modulo such a *p* costs on the order of 10^{4}
bit operations instead of a single bit operation),
which is prohibitive even for a supercomputer. The storage requirements
increase to 10^{12} bits, which is 125 gigabytes, considerably more
than any existing machine has.

Fast matrix multiplication algorithms do not offer much hope. The
Strassen *n*^{2.81} method [21] is practical for *n* on the
order of several hundred, but does not save enough. Later methods, of
which the Coppersmith-Winograd *n*^{2.376} algorithm [8] is
currently the fastest, are impractical.
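For concreteness, here is a sketch of Strassen's recursion (illustrative only, for *n* a power of two; practical implementations switch to the classical method below a crossover size of a few tens):

```python
# Illustrative sketch of Strassen's method: multiply two n x n matrices
# (n a power of two) with 7 recursive multiplications instead of 8,
# giving the n^{log2 7} ~ n^{2.81} operation count cited above.

def mat_add(X, Y):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def mat_sub(X, Y):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def strassen(A, B):
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    # Split both matrices into quadrants.
    A11 = [r[:h] for r in A[:h]]; A12 = [r[h:] for r in A[:h]]
    A21 = [r[:h] for r in A[h:]]; A22 = [r[h:] for r in A[h:]]
    B11 = [r[:h] for r in B[:h]]; B12 = [r[h:] for r in B[:h]]
    B21 = [r[:h] for r in B[h:]]; B22 = [r[h:] for r in B[h:]]
    # Strassen's seven products.
    M1 = strassen(mat_add(A11, A22), mat_add(B11, B22))
    M2 = strassen(mat_add(A21, A22), B11)
    M3 = strassen(A11, mat_sub(B12, B22))
    M4 = strassen(A22, mat_sub(B21, B11))
    M5 = strassen(mat_add(A11, A12), B22)
    M6 = strassen(mat_sub(A21, A11), mat_add(B11, B12))
    M7 = strassen(mat_sub(A12, A22), mat_add(B21, B22))
    C11 = mat_add(mat_sub(mat_add(M1, M4), M5), M7)
    C12 = mat_add(M3, M5)
    C21 = mat_add(M2, M4)
    C22 = mat_add(mat_sub(mat_add(M1, M3), M2), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot
```

Even with the exponent reduced from 3 to about 2.81, the savings at the sizes discussed here fall far short of the factor of 10^{4} needed above.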

If the equations that arise in factoring and discrete log algorithms were totally random, there would be little hope of solving large systems. However, these equations, while they appear fairly random in many respects, are extremely sparse, with usually no more than 50 non-zero coefficients per equation. Moreover, they are relatively dense in some parts, and very sparse in others. Previous Gaussian elimination implementations, as is mentioned below, already take advantage of some of these features. In addition, several special systematic methods have been developed to take advantage of this sparsity. They are:

1. structured Gaussian elimination (also called intelligent Gaussian elimination) [17],
2. the conjugate gradient and Lanczos algorithms [7,17],
3. the Wiedemann algorithm [22].
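As a concrete (and hedged) illustration of the second family of methods, here is a sketch of the Lanczos iteration over *GF*(*p*) for *A x* = *b* with *A* symmetric; the sparse row representation and the stopping logic are our own simplifications, not the implementation whose performance is reported below:

```python
# Illustrative sketch of Lanczos over GF(p) for A x = b, A symmetric.
# A is stored sparsely, one list of (column, value) pairs per row, so
# each iteration costs one sparse matrix-vector product plus a few
# inner products -- the reason these methods need so little space.

def lanczos_gfp(rows, b, p):
    n = len(b)

    def matvec(v):
        return [sum(c * v[j] for j, c in row) % p for row in rows]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v)) % p

    basis = []                      # saved pairs (w_i, t_i = (w_i, A w_i))
    w, w_prev, Aw_prev, t_prev = list(b), None, None, None
    for _ in range(n + 1):
        Aw = matvec(w)
        t = dot(w, Aw)
        if t == 0:
            break                   # w = 0 means success; w != 0 is breakdown
        basis.append((w, t))
        c1 = dot(Aw, Aw) * pow(t, -1, p) % p
        w_next = [(a - c1 * x) % p for a, x in zip(Aw, w)]
        if w_prev is not None:
            c2 = dot(Aw, Aw_prev) * pow(t_prev, -1, p) % p
            w_next = [(x - c2 * y) % p for x, y in zip(w_next, w_prev)]
        w_prev, Aw_prev, t_prev = w, Aw, t
        w = w_next
    if any(w):
        return None                 # self-orthogonal w_i != 0: rare breakdown
    x = [0] * n                     # x = sum_i ((w_i, b) / t_i) w_i
    for wi, ti in basis:
        c = dot(wi, b) * pow(ti, -1, p) % p
        x = [(xi + c * wij) % p for xi, wij in zip(x, wi)]
    return x
```

For a non-symmetric or non-square *A*, one applies the same iteration to the symmetrized system *A*^{T}*A* *x* = *A*^{T}*b*.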

To the authors' knowledge, the Wiedemann algorithm has never been tested
on a large system. The conjugate gradient and Lanczos methods have been
tested [7,17], but only on fairly small systems.
Structured Gaussian elimination was tested on some very large systems in
[17], but those tests used simulated data, while the runs
on real data derived from a factoring project solved only fairly small
systems. More recently, this method was implemented and used to solve
some fairly large binary systems by Pomerance and Smith
[20]. Even more recently, a version of this method was used
by A. Lenstra and M. Manasse in their factorization of *F*_{9}.

This paper reports on the performance of some of these algorithms on
several very large sparse systems derived from factoring and discrete
logarithm computations. The largest of the systems that were solved
had about 10^{5} unknowns, modulo a prime of 191 bits; this system
arose in the computation of discrete logarithms in a certain prime
field [10]. The basic conclusion is that the
conjugate gradient and Lanczos algorithms have essentially the same
performance and are very effective in finite fields. One of their
advantages is that they use relatively little space. However, even these
algorithms are too slow to tackle very large problems. The Wiedemann
algorithm (which was not implemented) has modest space requirements,
almost exactly the same as those of the conjugate gradient and Lanczos
methods. Its running time is likely to be comparable to those of
the conjugate gradient and Lanczos algorithms, with the precise
comparison depending on the architecture of the computer and
implementation details.
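To make the comparison concrete, the Wiedemann approach can be sketched as follows (our own illustrative code, since the algorithm was not implemented for the experiments reported here): project the Krylov sequence *A*^{i}*b* onto a fixed vector *u*, recover the minimal polynomial of the resulting scalar sequence with the Berlekamp-Massey algorithm, and read *A*^{-1}*b* off that polynomial. For a random *u* the projected sequence has the true minimal polynomial with high probability; here *u* is a parameter so the sketch is deterministic:

```python
# Illustrative sketch of the Wiedemann algorithm over GF(p).

def berlekamp_massey(s, p):
    """Shortest connection polynomial C (with C[0] = 1) such that
    s[n] = -(C[1] s[n-1] + ... + C[L] s[n-L]) for n >= L; returns (C, L)."""
    C, B = [1], [1]
    L, m, b_ = 0, 1, 1
    for n in range(len(s)):
        d = s[n] % p                        # discrepancy
        for i in range(1, L + 1):
            d = (d + C[i] * s[n - i]) % p
        if d == 0:
            m += 1
            continue
        coef = d * pow(b_, -1, p) % p
        T = C[:]
        if len(B) + m > len(C):             # C(x) -= coef * x^m * B(x)
            C += [0] * (len(B) + m - len(C))
        for i, Bi in enumerate(B):
            C[i + m] = (C[i + m] - coef * Bi) % p
        if 2 * L <= n:
            L, B, b_, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return C, L

def wiedemann(rows, b, p, u):
    """Solve A x = b for sparse nonsingular A over GF(p); rows holds one
    list of (column, value) pairs per row of A."""
    n = len(b)

    def matvec(v):
        return [sum(c * v[j] for j, c in row) % p for row in rows]

    s, v = [], list(b)                      # s[i] = u . A^i b, for i < 2n
    for _ in range(2 * n):
        s.append(sum(x * y for x, y in zip(u, v)) % p)
        v = matvec(v)
    C, L = berlekamp_massey(s, p)
    # Minimal polynomial m(t) = t^L + C[1] t^{L-1} + ... + C[L]; since A
    # is nonsingular, m(0) = C[L] != 0, and m(A) b = 0 rearranges to
    #   x = -C[L]^{-1} (C[L-1] I + C[L-2] A + ... + C[0] A^{L-1}) b.
    x, v = [0] * n, list(b)
    for i in range(1, L + 1):
        mi = C[L - i]
        x = [(xi + mi * vj) % p for xi, vj in zip(x, v)]
        v = matvec(v)
    neg_inv = (-pow(C[L], -1, p)) % p
    return [neg_inv * xi % p for xi in x]
```

Like the Lanczos sketch, this stores only a few vectors and the sparse matrix, which is why its space requirements are essentially the same.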

We have also found that structured Gaussian elimination is very effective in reducing a large, sparse problem to a much smaller one. Structured Gaussian elimination takes very little time, and can be implemented so as to take very little space. When dealing with a large sparse system modulo a prime, it appears that the best procedure is to first apply structured Gaussian elimination to obtain a smaller system that is still sparse, and then to solve the smaller system with one of the conjugate gradient, Lanczos, or Wiedemann algorithms. When working with equations modulo 2, it is probably better to use ordinary Gaussian elimination for the final step, or else one can use conjugate gradient, Lanczos, or Wiedemann for the entire problem.
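One of the simplest reductions underlying structured Gaussian elimination can be sketched as follows (an illustrative fragment only, not the full method described in Section 5): a variable that occurs in exactly one equation can be set aside together with that equation and recovered by back-substitution after the rest of the system is solved, and repeating this shrinks the active system:

```python
# Illustrative fragment: shrink a sparse GF(2) system by deferring
# equations that contain a variable occurring nowhere else.  The full
# structured Gaussian elimination also drops weight-0 columns, declares
# "heavy" columns, and more; this shows only the simplest step.
from collections import Counter

def defer_singletons(eqs):
    """eqs: list of (set_of_variable_indices, rhs_bit) over GF(2).
    Returns (core, deferred): the reduced active system, plus deferred
    triples (variable, equation_vars, rhs) that determine the set-aside
    variables by back-substitution once the core is solved."""
    eqs = list(eqs)
    deferred = []
    changed = True
    while changed:
        changed = False
        count = Counter(v for vars_, _ in eqs for v in vars_)
        for i, (vars_, rhs) in enumerate(eqs):
            lone = next((v for v in vars_ if count[v] == 1), None)
            if lone is not None:
                deferred.append((lone, vars_, rhs))
                del eqs[i]
                changed = True
                break
    return eqs, deferred
```

On the very sparse, unevenly dense systems described above, repeated steps of this kind (together with the other reductions) can cut the dimension dramatically before the iterative solver is invoked.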

Section 2 describes the data sets that were used in the computations, as well as the machine on which the algorithms were run. We describe in Section 3 the Lanczos and conjugate gradient algorithms, and their performance. Section 4 briefly discusses the Wiedemann algorithm. Structured Gaussian elimination is detailed in Section 5.