This method is an adaptation and systematization of some of the standard techniques used in numerical analysis to minimize fill-in during Gaussian elimination, with some additional steps designed to take advantage of the special structure present in matrices arising from integer factorization and discrete logarithm algorithms. The part of the matrix corresponding to the very small primes is actually quite dense, while that corresponding to the large primes is extremely sparse. (In all cases that the authors have looked at, there are even variables corresponding to some large primes that do not appear in any equation.) This fact was taken advantage of in all previous solutions to large systems, in that Gaussian elimination was always performed starting at the sparse end. Had it been performed starting at the dense end, fill-in would have been immediately catastrophic.
By starting at the sparse end, substantial savings have been achieved. No precise measurements are available, but based on some data provided by R. Silverman (personal communication) it appears that about half the time was spent reducing n by n systems to about n/3 by n/3systems, which were very dense. This indicates a factor of more than 10 savings over ordinary Gaussian elimination that starts at the dense end. A. Lenstra has indicated that similar results occurred in his work with M. Manasse.
The basic idea of structured Gaussian elimination is to declare some columns (those with the largest number of non-zero elements) as heavy, and to work only on preserving the sparsity of the remaining light columns. As was suggested by Pomerance and Smith [20], the set of heavy columns is allowed to grow as the algorithm progresses, instead of being chosen once, as was originally proposed [17]. In practice, the matrix would be stored in a sparse encoding, with rows represented by lists of positions where the coefficients are non-zero and with lists of the corresponding coefficients. To visualize the operation of the algorithm, it is easiest to think of the full matrix, though. The algorithm consists of a sequence of steps chosen from the following:
As long as only these steps are taken, the number of non-zero
coefficients in the light part of the matrix will never increase. In
the original description in [17], it was suggested that
one might need to take further steps, involving subtracting multiples of
rows that have
non-zero elements in the light part of the
matrix. However, experiments (some already mentioned in
[17]) suggest that this is not only unnecessary, but also
leads to rapidly increasing storage and time requirements, and so it is
better to avoid such steps.
Experiments mentioned in [17] used a pseudo-random number generator to create data sets that had the statistics of very large discrete logarithm problems over fields of characteristic 2. Those experiments indicated that structured Gaussian elimination ought to be very successful, and that to achieve big reductions in the size of the matrix that has to be solved, the original matrix ought to be kept very sparse, which has implications for the choices of parameters in factorization and discrete logarithm algorithms. Those experiments indicated that if the matrix was sparse enough (either because one started with a sparse data set, or else because enough columns were declared heavy), one could expect a very rapid collapse of the system to a much smaller one. The results of the experiments that we performed on real data confirm these findings. Furthermore, they show that excess equations are a very important factor in the performance of the algorithm. If there are many more equations than unknowns, one can obtain much smaller final systems.
Two versions of the program were implemented, one for solving equations
modulo 2, the other for all other systems. (In the second version,
coefficients were never reduced modulo anything.) Their performances on
data set K were very similar, with the
version producing a
slightly smaller final system. The general versions never produced
coefficients larger than 40 in that case. This situation would probably
change if the matrix were not so sparse.
The matrix is stored in a single linear array, with several smaller arrays being used to hold information about the status of rows and columns (whether a given column is heavy, for example). Each active row (i.e., row that has not been eliminated, was not dropped as unnecessary, and has some non-zero entries in the sparse part) is stored as two adjacent lists, one for the indices of columns in the sparse part of the matrix that have non-zero entries, and one for the indices of the rows of the original matrix that make up the current row. (In the case of the general version, there are also corresponding lists of the coefficients of the matrix entries and of the rows.) When row j is subtracted from row i, a new entry for the modified row, still called i, is created at the end of the linear array, and the space previously occupied by rows i and j is freed up. When available space is exhausted, the large linear array is compacted by invoking a garbage collection routine. If this measure does not free up enough space, then Step 2 is invoked.
Many variations on the above approach are possible. Note that the number of non-zero entries in the light part of the matrix never increases. The only storage requirement that does grow is that for the lists of ancestor rows. Those, however, do not have to be kept in core. If one stores the history of what the algorithm does in a file, the ancestor lists can be reconstructed later. This is the approach used by Pomerance and Smith [20], for example, as well as by A. Lenstra and M. Manasse. Our implementation was motivated by the availability of substantial memory on our machine and the fact that when the ancestor lists get large, the heavy part of the matrix gets quite dense, which is undesirable, and so (as will be described later) it seems better to thin out the matrix by using Step 2.
One advantage of the algorithm as described here is that it can be implemented in very little space. Our implementation keeps all the arrays in core, and is very wasteful in that it uses full 32-bit words for all pointers and coefficients. Since only a modest number of passes through the data were performed, one could keep the rows stored on a disk, and use core storage only for the more frequently accessed arrays that store information about row and column sums.
In our implementation, Step 1 is applied repeatedly until no more columns of weight 1 (i.e., with a single non-zero coefficient) exist, then Step 2 is used. The number of columns that are declared heavy affects the performance of the algorithm to some extent. We usually applied this step to about c/30 columns, where c is the number of currently light columns that have weight > 0. For matrices that were expected to reduce to very small size, such as data set K and sets derived from it, values around c/100 were used. Generally speaking, the smaller the number of columns that are put into the heavy part at a single time, the better the final result, but also more work is required. (Pomerance and Smith [20] use values of about c/1000, for example.) The columns that are declared heavy are those of highest weight. Step 2 is followed by Step 4, which is applied repeatedly. When Step 4 cannot be applied any longer, Step 2 (followed by Step 4) is applied again, and so on. At a certain stage, when the number of heavy columns is a substantial fraction of the expected size of the final dense matrix, Step 3 (followed by Step 1) is invoked. The selection of the point at which to apply Step 3 is very important, and will be discussed later.
Very often, especially if the initial matrix is fairly dense, or there are not many more equations than unknowns, the final stages of structured Gaussian elimination produce rows that have many ancestors, and so the heavy parts of those rows are quite dense. What was done to counteract this tendency was to first run the algorithm as described above, and then rerun it with two parameters c1 and c2that were set based on the experience of the first run. When the number of heavy columns exceeded c1, Step 2 was invoked so as to bring this number all the way up to c2. After this application of Step 2, the sparse part of the matrix usually collapsed very quickly. The results of this step are described below and in Table 3.
The description above is not very precise. The reason is that the various elements of the algorithm can be, and often were, applied in different sequences and with different parameters. The output is fairly sensitive to the choices that are made. No clear guidelines exist as to what the optimal choices are, since it was hard to explore all the possible variations. However, even very suboptimal choices usually lead to small final systems.
The output of the structured Gaussian elimination program is a smaller matrix, which then has to be solved by another method. In our experience (primarily based on data set K), substitution of the results of solving the dense system into the original one gives values for almost all of the original variables in a small number of passes, each of which looks for equations in which only one variable is not yet determined.
| Data Set |
Number of Equations
|
Number of
Unknowns |
|
Size of Dense Matrix |
Percent Reduction |
| A | 35,987 | 35,000 | 1.03 | 9,222 | 73.7 |
| B | 52,254 | 50,001 | 1.05 | 12,003 | 76.0 |
| C | 65,518 | 65,500 | 1.00 | 17,251 | 73.7 |
| D | 123,019 | 106,121 | 1.16 | 12,700 | 88.0 |
| E | 82,470 | 80,015 | 1.03 | 36,810 | 54.0 |
| E1 | 82,470 | 75,015 | 1.10 | 31,285 | 58.3 |
| F | 25,201 | 25,001 | 1.01 | 11,461 | 54.2 |
| G | 30,268 | 25,001 | 1.21 | 10,835 | 56.7 |
| H | 61,343 | 30,001 | 2.04 | 19,011 | 36.6 |
| I | 102,815 | 80,001 | 1.29 | 32,303 | 59.6 |
| J | 226,688 | 199,203 | 1.14 | 90,979 | 54.3 |
Structured Gaussian elimination was very successful on data set K,
since it reduced it to set L very quickly (in about 20 minutes for
reading the data, roughly the same amount of time for the basic run, and
then under an hour to produce the dense set of equations that form set
L). It also worked well on the other systems.
Table 2 summarizes the performance of structured
Gaussian elimination on data sets A through J, and
Table 4 does this for sets K,
,
and M.
The size of the dense matrix is the number of unknowns in the reduced
system. In each reduced data set, the number of equations exceeded the
number of unknowns by 20.
| Data Set | c1 | c2 |
Average Weight of
Dense Row |
| B | - | - |
|
| B | 8,000 | 16,000 | 456 |
| B | 6,000 | 16,000 | 486 |
| B | 6,000 | 20,000 | 149 |
| E | - | - | 6,620 |
| E | 20,000 | 50,000 | 260 |
| E | 20,000 | 60,000 | 115 |
| E1 | - | - | 6,172 |
| E1 | 20,000 | 35,000 | 1,366 |
| E1 | 25,000 | 40,000 | 499 |
| K | - | - | 1,393 |
| K | 3,000 | 3,400 | 883 |
| K | 3,000 | 4,000 | 346 |
| K | 2,500 | 4,000 | 230 |
| K | 2,000 | 4,500 | 140 |
| M | - | - | 2,602 |
| M | 6,750 | 9,750 | 295 |
| M | 6,750 | 10,500 | 212 |
| M | 6,750 | 11,000 | 177 |
Obtaining a small set of equations is not satisfactory by
itself in many cases, since the new system might be so dense as to be
hard to solve. Table 3 presents some data on this point.
For example, while set B was reduced to about 12,000 equations, the
resulting set was very dense, with each row having on average
non-zero entries. When the number of heavy columns was increased
to 20,000 as soon as it exceeded 6,000, the resulting dense matrix had
only 149 non-zero entries per row. Similarly, for set E, the standard
algorithm produces 36,810 equations, with 6,620 non-zeros each on
average. If we only reduce the system to 60,000 equations, the
resulting matrix has average row weight of only 115.
Table 3 also shows how the density of final systems derived from discrete logarithm data could be improved. Data set Kwas reduced to a system in 3,215 unknowns, but that system had, on average, 1,393 non-zero entries per equation. By invoking Step 2 early, before the number of heavy columns reached 3,215, less dense final systems were obtained. Increasing the number of unknowns to 4,500 as soon as 2,000 heavy columns are obtained reduced the density of the smaller system to only 140 non-zero entries per equation. For data set M, a similar order of magnitude decrease in the density of the smaller system was obtained with less than a factor of two increase in the number of unknowns.
| Data Set |
Number of Equations
|
Number of
Unknowns |
|
Size of Dense Matrix |
Percent Reduction |
| K | 288,017 | 96,321 | 2.99 | 3,215 | 96.7 |
| K0 | 216,105 | 95,216 | 2.27 | 3,850 | 96.0 |
| K1 | 165,245 | 93,540 | 1.77 | 4,625 | 95.1 |
| K2 | 144,017 | 94,395 | 1.53 | 9,158 | 90.3 |
| K3 | 144,432 | 92,344 | 1.56 | 5,534 | 94.0 |
| K4 | 144,017 | 89,003 | 1.62 | 3,544 | 96.0 |
| K5 | 115,659 | 90,019 | 1.28 | 6,251 | 93.1 |
| K6 | 101,057 | 88,291 | 1.14 | 7,298 | 91.7 |
| M | 164,841 | 94,398 | 1.75 | 9,508 | 90.0 |
Table 4 presents the results of some experiments that show the influence of extra equations and variations in matrix density. All the runs were performed with the same algorithm on data sets derived from data set K and solved the system modulo 2. The performance of the algorithm on set K that is reported in Table 4 is better than that mentioned before (which reduced it to set L, which has 6,006 unknowns). This is partially due to working modulo 2, but mostly it results from use of somewhat different parameters in the algorithm, and not caring about the density of the final dense matrix.
The results for sets K2, K3, and K4 might seem very counterintuitive, since the densest set (K4) was reduced the most, while the sparsest one (K2), was reduced the least. This appears to be due to the fact that heavy rows tend to have few entries in the sparse part of the matrix. (This has been checked to be the case in the data, and is to be expected, since in the discrete logarithm scheme that was used to derive set K [10], equations come from factoring integers of roughly equal size, so that if there are many prime factors, few of them can be large.) Thus, by selecting the heaviest rows, one makes it easier for structured Gaussian elimination to take advantage of the extreme sparsity of the sparse end of the matrix.
The results obtained with set K may not be entirely characteristic of what one might encounter in other situations because this data set is much larger in the number of unknowns (as well as in the number of equations) than would be optimal for solving the discrete logarithm problem of [10]. If one selected only equations out of K that had roughly the 25,000 unknowns corresponding to the smallest primes and prime elements of smallest norms, set K would have yielded a solution to it. The existence of all the extraneous variables and equations may enable structured Gaussian elimination to yield a better result than it would obtain in more realistic situations.
The results for systems A to J were derived in a non-systematic
manner; it is quite likely that much better results can be
obtained by different choices of parameters. In the case of system K,
and to some extent also systems K0 through K6, more extensive tests
were performed. They were all done with a linear array of length
for storage, and with variations only in the applications
of Steps 2 and 3. The number of columns to which Step 2 was applied did
not seem to have a major influence on the size of the final matrix. On
the other hand, the decision of when to apply Step 3 was very important.
In all the experiments that were carried out, essentially all the excess
rows were dropped at the same time; the effect of spreading out this
procedure was not studied. It appeared that the best time to apply Step
3, if one is simply trying to minimize the size of the final system, is
when the number of heavy columns reaches the size of the final matrix,
since in that case the system tends to collapse very quickly after the
application of Step 3. In practice, what this means is that one has to
experiment with several different thresholds for when to apply Step 3.
Since the generation of the dense equations usually takes several times
(and when the dense system is large, many times) longer than structured
Gaussian elimination, this does not effect the total running time very
much.
On the basis of the experiments that have been performed so far, it appears that the best results are achieved if the point at which Step 3 is applied is chosen so that the steps that follow reduce the matrix very rapidly, without any additional applications of Step 2. For example, the entry for set K in Table 4 was obtained by specifying that Step 3 be applied as soon as the number of heavy columns exceeded 3,200. The resulting matrix collapsed right afterwards. On the other hand, when the excess rows were deleted as soon as the number of heavy columns exceeded 3,150, the algorithm took a lot longer to terminate, resulted in a final matrix with 3,425 columns (but with the density of the final matrix lower than in the other case). At an extreme, set K2 (which corresponds to dropping 144,000 rows right at the beginning of the algorithm) resulted in a final matrix of size 9,158. In the case of systems with fewer excess equations (such as set A), some results indicate that it is preferable to apply Step 3 somewhat earlier.
The results reported here are very specific to our implementation. One of the essential features of the program was that it kept the lists of ancestor rows in core memory. Thus the algorithm was constrained most severely by the size of the big linear array, which essentially limited the total number of ancestor rows of the active rows. This had the desirable indirect benefit of producing a relatively sparse final matrix, but it undoubtedly made the algorithm behave quite differently than it would have otherwise. In particular, it must have skewed the comparisons between different data sets (since initially smaller data sets in effect could have more ancestor rows). It might be worthwhile to experiment with various variations of this method.
In the case of data set J (which came from the factorization of F9), our version of structured Gaussian elimination reduced 199,203 equations to 90,979. A. Lenstra and M. Manasse used their version of the program to reduce set J to about 72,000 equations. Their program did not maintain ancestor lists, but the final matrix was almost completely dense, with about 36,000 non-zero coefficients per equation. Our program produced equations that on average had 7,000 non-zero coefficients. Looking at the output of the program, it appears that as soon as the number of heavy columns exceeded about 70,000, catastrophic collapse of the sparse part of the matrix began. The increase in the size of the heavy part was caused by space restrictions which bounded the size of the ancestor lists. In the F9 case, since the final matrix was solved using ordinary Gaussian elimination, the decrease in the density of the final matrix that our program gave was probably not worth the increase in size. In other applications, especially when dealing with solving equations modulo large primes, and with sparser initial systems, our strategy is likely to be preferable.
The main conclusion that can be drawn from the results of these experiments is that sparse systems produce much smaller final systems than do denser ones. What is perhaps even more important, however, is that excess equations substantially improve the performance of the algorithm. When one has access to a distributed network of machines with a lot of available computing time, but where solving the matrix might be a bottleneck (due to a need to perform the calculations on a single processor) one can simplify the task substantially by choosing a larger factor base and obtaining more equations. In extreme cases, when using the quadratic sieve, for example, and when only small machines are available, it might even be worthwhile not to use the two large primes variation of A. Lenstra and M. Manasse [13] (which in any case only appears to be useful for integers > 10100), or possibly not even the old single large prime variation.
It appears that structured Gaussian elimination should be used as a preliminary step in all linear systems arising from integer factorization and discrete logarithm algorithms. It takes very little time to run, and produces smaller systems, in many cases dramatically smaller. For this method to work best, linear systems ought to be sparse, and, perhaps most important, there should be considerably more equations than unknowns. Producing the extra equations requires more effort, but it does simplify the linear algebra steps.