We first describe the Lanczos algorithm.
Suppose that we have to solve the system Ax = w, where A is a symmetric matrix.
The Lanczos algorithm was invented to solve systems with real coefficients. To solve systems over finite fields, we simply take Equations 3 to 8 and apply them in the finite field setting. This raises possible problems, since over a finite field a non-zero vector can be conjugate to itself, so that an inner product used as a divisor can vanish. However, this difficulty, as well as some others that arise, can be overcome in practice.
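The simplest instance of this phenomenon, impossible over the reals, can be seen directly over GF(2) (the example below is ours, not the paper's):

```python
# Over GF(2), a non-zero vector can have zero inner product with itself,
# which cannot happen over the real numbers.

def dot_mod2(u, v):
    return sum(a * b for a, b in zip(u, v)) % 2

v = [1, 1]
assert any(v)               # v is non-zero
assert dot_mod2(v, v) == 0  # yet (v, v) = 0 in GF(2)
```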
In addition to dealing with self-conjugacy, we have to cope with the fact that the systems we need to solve are in general not symmetric, but rather are of the form Bx = u, with B a (generally non-square) sparse matrix; the standard remedy is to apply the algorithm to the symmetric system (B^T B)x = B^T u.
The Lanczos algorithm is not restricted to dealing with sparse matrices,
but that is where it is most useful. At iteration i, we need to
compute the vector
v_{i+1} = A w_i (v_i = A w_{i-1} is already known from the previous iteration), the vector inner products such as (v_{i+1}, v_i), and then form w_{i+1} using Equation 6. The matrix A will in general be dense, even when B is sparse. However, we do not need to form A, since to compute A w_i we use Equation 10 and compute A w_i = B^T (B w_i), that is, two sparse matrix-vector products.
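A minimal sketch of this idea, in our own notation (the paper gives no code): A w is computed as B^T (B w) modulo p, touching only the non-zero entries of B and never forming the dense matrix A.

```python
# Computing A*w = B^T (B w) mod p without forming the dense A = B^T B.
# B is stored by rows as lists of (column, coefficient) pairs.

def mul_B(rows, w, p):
    """y = B w (mod p): one pass over the non-zero entries."""
    return [sum(c * w[j] for j, c in row) % p for row in rows]

def mul_Bt(rows, y, n, p):
    """z = B^T y (mod p): scatter each row's entries, scaled by y[i]."""
    z = [0] * n
    for i, row in enumerate(rows):
        for j, c in row:
            z[j] = (z[j] + c * y[i]) % p
    return z

def mul_A(rows, w, n, p):
    """A w = B^T (B w) (mod p), using only the sparse B."""
    return mul_Bt(rows, mul_B(rows, w, p), n, p)
```

For a 3-by-3 example with p = 7, the result agrees with the product by the explicitly formed dense A = B^T B.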
In practice, B will usually be stored with rows represented by lists of positions where that row has a non-zero entry, and (in the case of non-binary problems) by lists of non-zero coefficients. Thus we need about b pointers and (for non-binary data) about b bits to indicate whether a coefficient is +1 or -1. (Non-zero coefficients other than +1 and -1 are relatively rare and can be treated separately.) The pointers normally have to be of at least log2(n) bits, but one can reduce that by taking advantage of the fact that most indices of positions where the coefficient is non-zero are very small. In any case, storage is not a problem even for very large systems. In the implementations described below, full 32-bit words were used for all pointers and coefficients. The largest of the systems in Table 1 was well within the range addressable this way, so this did not cause any problems on the machine that was used. In fact, our algorithm had separate representations for both B and B^T, which is wasteful.
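The row representation described above can be sketched as follows (the names are ours; the paper describes the layout only in prose). Entries equal to +1 or -1 need one pointer and one sign bit each, while the rare general coefficients are kept in a separate list:

```python
# Sketch of the sparse-row layout: column pointers plus sign bits for
# the +-1 entries, and a short side list for other coefficients.
from dataclasses import dataclass

@dataclass
class SparseRow:
    pm_cols: list    # columns holding a +1 or -1 entry
    pm_signs: list   # one bit per pointer: 0 -> +1, 1 -> -1
    other: list      # (column, coefficient) for the rare other values

def row_dot(row, w, p):
    """Inner product of one stored row with the vector w, mod p.
    +-1 entries cost only an addition or subtraction; general
    coefficients need a multiplication."""
    s = 0
    for j, sign in zip(row.pm_cols, row.pm_signs):
        s = s - w[j] if sign else s + w[j]
    for j, c in row.other:
        s += c * w[j]
    return s % p
```

This mirrors the point made later about the Lanczos implementation: the common +-1 case reduces to additions and subtractions, and only the rare general coefficients require multiplications.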
In a typical situation, we expect that the Lanczos method will be applied to a system that was processed first by structured Gaussian elimination, and so it will not be too sparse. As a result, the cost of the vector inner products ought to be dominated by the cost of the matrix-vector products, 2nbc_1. As we will describe later, memory access times will often be the dominant factor in determining the efficiency of the matrix-vector product.
In discrete logarithm applications, the system given by
Equation 9 is usually overdetermined, and so the aim is
to find the unique x that solves it. In applications to factoring,
though, one is looking for linear dependencies in a set of vectors, and
it is necessary to find several such dependencies. This can be stated as the problem of solving the system in Equation 9 with B fixed, but for several vectors u, so that we need to solve (B^T B)x = B^T u separately for each such u.
Although the above analysis was based on the Lanczos algorithm, the derived complexity bounds also serve as rough approximations for the conjugate gradient (CG) method. The CG and Lanczos algorithms are very similar: their iterations differ slightly, but the operation count is almost exactly the same.
Both the Lanczos and the CG algorithms were implemented, Lanczos for solving equations modulo a large prime, CG for solving equations modulo 2. Both worked effectively, as will be described below. Both usually required n iterations to solve a system of size n. One unexpected problem was encountered, though. In the standard Lanczos and CG algorithms, it is not necessary that the matrix A in Equation 1 be non-singular. As long as w is in the linear span of the columns of A, the algorithms will converge. In the case of equations over finite fields, however, we observed that a system of less than full rank often gave rise to a self-conjugate vector, and thus to an abort of the algorithm. This phenomenon has not been explored carefully. It was not very important in the cases where these algorithms were tried (computing discrete logarithms), since there the systems of linear equations are overdetermined. This issue might be much more important in the case of factoring algorithms, since in that case one needs to find many linear dependencies modulo 2.
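A toy illustration of how rank deficiency can produce a self-conjugate vector (the matrix and modulus below are our own choices, not the paper's data): over GF(5), a non-zero w in the kernel of a rank-1 symmetric A gives (w, A w) = 0, which forces a division by zero in the iteration.

```python
# Rank-deficient symmetric A over GF(5): a non-zero w with A w = 0,
# hence (w, A w) = 0 mod 5, i.e. w is self-conjugate.

p = 5
A = [[1, 2],
     [2, 4]]          # symmetric, rank 1 over GF(5)

def mat_vec(M, v, p):
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]

def dot(u, v, p):
    return sum(a * b for a, b in zip(u, v)) % p

w = [1, 2]            # non-zero: A w = (5, 10) = (0, 0) mod 5
Aw = mat_vec(A, w, p)
assert Aw == [0, 0]
assert dot(w, Aw, p) == 0   # (w, A w) vanishes
```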
The CG implementation uses auxiliary storage to carry out field operations fast. The field F is chosen to be GF(2^r), with r = 19, 20, or 21. F is defined as the polynomials of degree < r over GF(2) modulo a fixed irreducible polynomial of degree r. Elements of F are represented by a full word, with the digits in the binary representation of the word corresponding to the coefficients of the polynomial. The sum of two elements is then represented by the exclusive-or of the corresponding words, and this operation is fast. Multiplication is carried out with the aid of an auxiliary array. Some generator g of the multiplicative group of F is chosen, and then an array w(j) is generated, with w(j) equal to the discrete logarithm, to the base g, of the element represented by j.
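The table-based multiplication can be sketched as follows. For brevity this sketch uses GF(2^8) with the well-known irreducible polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) and generator x + 1 (0x03), rather than the r = 19 to 21 fields of the paper; all function names are ours.

```python
# GF(2^8) multiplication via discrete-log and antilog tables, as a
# stand-in for the paper's GF(2^r), r = 19..21. Words encode polynomial
# coefficients; addition is XOR.

def gf_mul_slow(a, b, poly=0x11B):
    """Bitwise shift-and-add multiplication, reducing modulo poly."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

# Build log/antilog tables for the generator g = 0x03 (i.e. x + 1).
EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x = gf_mul_slow(x, 3)
for i in range(255, 512):          # doubled so LOG[a] + LOG[b] never wraps
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    """Table lookup: a * b = g^(log a + log b)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]
```

With r = 19 to 21 the tables occupy a few megabytes of full words, which is exactly the auxiliary storage trade-off described above.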
The implementation of the Lanczos algorithm is less efficient than that of CG. The very portable, and reasonably fast, multiprecision code that A. Lenstra and M. Manasse distribute with their factoring package was used. When the non-zero entry in the matrix was +1 or -1, addition or subtraction routines were invoked; approximately 90% of the non-zero entries in matrix B were of this form in data set L. In the remaining rare cases, multiprecision multiplication was invoked, even when it would have been more efficient to perform several additions.
The implementation of the Lanczos algorithm solved system L modulo a prime of 191 bits in 44 hours. The CG algorithm solved that system modulo 2 in 1.9 hours. Timings of about 100 iterations of the CG algorithm on system E indicate that this system would require about 190 hours to solve.
While quite large systems can be solved with the CG algorithm, the
timings reported above are rather slow. For system L,
Equation 13 indicates that the total cost ought to be