Factoring integers and computing discrete logarithms often requires solving large systems of linear equations over finite fields. General surveys of these areas are presented in [14,17,19]. So far there have been few implementations of discrete logarithm algorithms, but many of integer factoring methods. Some of the published results have involved solving systems of over equations in more than variables . In factoring, equations have had to be solved over the field GF(2). In that situation, ordinary Gaussian elimination can be used effectively, since many coefficients (either 32 or 64 depending on machine word size) can be packed into a single machine word, and addition can be implemented as the exclusive-or operation. Even so, the large size of the systems to be solved has often caused storage problems (a by system requires approximately 110 million words of storage on a 32-bit machine), and it has often been difficult to obtain a correspondingly large amount of computer time. In many cases, the linear systems were purposefully kept smaller than would have been optimal from the standpoint of other parts of the factoring algorithm, just to avoid the linear algebra difficulties.
Clearly we cannot hope to be able to solve future systems (even in GF(2)) using only ordinary Gaussian elimination. As the size of integers being factored increases, so does the size of the system of equations which must be solved. In addition, the recently discovered number field sieve currently requires (when applied to factoring general integers) the solution of equations over the integers, not just modulo 2. (The best way to obtain such a solution appears to be either to solve the system modulo many small or moderate primes and then apply the Chinese remainder theorem, or else to solve it modulo a particular prime, and then lift that solution to one modulo a power of that prime.) In the case of the number field sieve applied to general integers, the linear algebra problem is currently one of the critical bottlenecks that keep it impractical.
Even in cases where the equations have to be solved modulo 2, linear algebra difficulties are becoming a serious bottleneck. As an example, the very recent factorization of F9 = 229 + 1 = 2512 +1 (using a special form of the number field sieve for integers of this form) by A. Lenstra and M. Manasse involved solving a system with dimension . The attack on the RSA challenge cipher (which will require factoring a 129 decimal digit integer) that is currently planned using the quadratic sieve might require solving a system with .
For discrete logarithm algorithms, the linear algebra problem has always seemed to be even more important than in factoring, since the equations have to be solved modulo large primes. The largest system that has been solved was of size about by , modulo 2127 -1, which arose in connection with discrete logarithms in GF(2127) [3,17].
For an system with , ordinary Gaussian elimination takes about operations. Modern supercomputers are capable of between 108 and 109 integer operations per second, so 1015 operations might take a few weeks to a few months on such a machine, if one matrix operation took one machine instruction. In practice, since up to 64 matrix operations are performed in a single supercomputer operation, the required time decreases substantially. However, for those without access to supercomputers, time can easily be a barrier. It is possible to obtain 1015 machine operations for free, as that is the approximate amount of work that the recent integer factorization achieved by A. Lenstra and M. Manasse  required. However, that effort involved a very decentralized and only loosely coordinated computation on hundreds of workstations. This is acceptable for the sieving stage of the quadratic sieve algorithm, but it causes problems when one tries to solve a system of linear equations; solving such systems requires very close coordination and errors propagate quickly, destroying the validity of the final result. Thus the linear algebra phase requires the use of a single machine (although it can be a massively parallel computer), and it can often be hard to obtain access to one that is fast enough. Memory requirements also present serious difficulties. If the problem is to find a solution modulo 2, the full matrix has 1010 bits, or 1,250 megabytes. Only a few supercomputers have this much main memory. On all other machines, one has to work on the matrix in blocks, which slows down the operation considerably.
Both the time and space requirements become much more serious when solving equations modulo a ``moderate size'' prime p. If , the operation count goes up from 1015 to 1019, which is prohibitive even for a supercomputer. The storage requirements increase to 1012 bits, which is 125 gigabytes, considerably more than any existing machine has.
Fast matrix multiplication algorithms do not offer much hope. The Strassen n2.81 method  is practical for n on the order of several hundred, but does not save enough. Later methods, of which the Coppersmith-Winograd n2.376 algorithm  is currently the fastest, are impractical.
If the equations that arise in factoring and discrete log algorithms were totally random, there would be little hope of solving large systems. However, these equations, while they appear fairly random in many respects, are extremely sparse, with usually no more than 50 non-zero coefficients per equation. Moreover, they are relatively dense in some parts, and very sparse in others. Previous Gaussian elimination implementations, as is mentioned below, already take advantage of some of these features. In addition, several special systematic methods have been developed to take advantage of this sparsity. They are:
To the authors' knowledge, the Wiedemann algorithm has never been tested on a large system. The conjugate gradient and Lanczos methods have been tested [7,17], but only on fairly small systems. Structured Gaussian elimination was tested on some very large systems in , but those tests used simulated data, while the runs on real data derived from a factoring project solved only fairly small systems. More recently, this method was implemented and used to solve some fairly large binary systems by Pomerance and Smith . Even more recently, a version of this method was used by A. Lenstra and M. Manasse in their factorization of F9.
This paper reports on the performance of some of these algorithms on several very large sparse systems derived from factoring and discrete logarithm computations. The largest of the systems that were solved had about equations in about 105 unknown, modulo a prime of 191 bits; this system arose in the computation of discrete logarithms in a certain prime field . The basic conclusion is that the conjugate gradient and Lanczos algorithms have essentially the same performance and are very effective in finite fields. One of their advantages is that they use relatively little space. However, even these algorithms are too slow to tackle very large problems. The Wiedemann algorithm (which was not implemented) has modest space requirements, almost exactly the same as those of the conjugate gradient and Lanczos methods. Its running time is likely to be comparable to those of the conjugate gradient and Lanczos algorithms, with the precise comparison depending on the architecture of the computer and implementation details.
We have also found that structured Gaussian elimination is very effective in reducing a large, sparse problem to a much smaller one. Structured Gaussian elimination takes very little time, and can be implemented so as to take very little space. When dealing with a large sparse system modulo a prime, it appears that the best procedure is to first apply structured Gaussian elimination to obtain a smaller system that is still sparse, and then to solve the smaller system with one of the conjugate gradient, Lanczos, or Wiedemann algorithms. When working with equations modulo 2, it is probably better to use ordinary Gaussian elimination for the final step, or else one can use conjugate gradient, Lanczos, or Wiedemann for the entire problem.
Section 2 describes the data sets that were used in the computations, as well as the machine on which the algorithms were run. We describe in Section 3 the Lanczos and conjugate gradient algorithms, and their performance. Section 4 briefly discusses the Wiedemann algorithm. Structured Gaussian elimination is detailed in Section 5