next up previous
Next: Structured Gaussian elimination Up: No Title Previous: Lanczos and conjugate gradient

  
The Wiedemann algorithm

This algorithm is described carefully in [22]. As was pointed out by E. Kaltofen, the basic idea of this algorithm has been known in numerical analysis for a long time in Krylov subspace methods [23]. (It is rather interesting to note that the Lanczos algorithm is also a Krylov subspace method.) The main innovation in the Wiedemann algorithm is the use of the Berlekamp-Massey algorithm [15,16], which in a finite field setting allows one to determine linear recurrence coefficients very fast, even for huge systems. Here we present only an outline of the method which will enable us to estimate its efficiency. Let us assume that we need to solve

 
Bx = u, (23)

where B is a sparse non-singular $n \times n$ matrix. (See [22] for methods of dealing with non-square and singular matrices.) B is not required to be symmetric in this algorithm. Suppose that the minimal polynomial of B is given by

\begin{displaymath}\sum_{j=0}^N c_jB^j = 0,\;\; N \leq n,\;\; c_0 \neq 0.
\end{displaymath} (24)

Then for any n-vector v, we have

 \begin{displaymath}\sum_{j=0}^{N} c_j B^{j+k}v = 0,\;\; \forall k \geq 0.
\end{displaymath} (25)

If we let vk = Bkv, then Equation 25 says that any one of the coordinates of $v_0,v_1,\ldots{},v_{2n}$ satisfies the linear recurrence with coefficients $c_0,c_1,\ldots{},c_N$. Given any 2nterms of a linear recurrent sequence of order $\leq n$, the Berlekamp-Massey algorithm will find the minimal polynomial of that sequence in O(n2) operations. (There are even faster variants of the algorithm [1,2,4] which are likely to be practical for very large systems.) Hence if we apply the Berlekamp-Massey algorithm to each of the first K coordinates of the vectors $v_0,\ldots{},v_{2n}$, in O(Kn2) steps we will obtain the Kpolynomials whose least common multiple is likely to be the minimal polynomial of B. Once we find $c_0,\ldots{},c_N$, we can obtain the solution to Equation 23 from
 
u = $\displaystyle -c_0^{-1} \sum_{j=1}^{N} c_j B^j u$  
  = $\displaystyle B \left[ -c_0^{-1} \sum_{j=1}^N c_j B^{j-1} u \right].$ (26)

If B has b non-zero coefficients, most of them $\pm 1$, and c1is again the cost of an addition or subtraction in the field we work with, then computation of $v_0,\ldots{},v_{2n}$ costs about

 \begin{displaymath}a\ n\ b\ c_1.
\end{displaymath} (27)

This is just about the cost of the Lanczos and CG algorithms. Wiedemann's method saves a factor of 2 by not having to multiply by both B and BT, but loses a factor of 2 by having to compute Bjv for $0 \leq j \leq 2n$. (In the case of binary data, when all the non-zero coefficients are 1, the cost drops to nbc1, a reduction similar to that in the Lanczos and CG algorithms.) The Berlekamp-Massey algorithm is expected to be very fast, with a cost of cn2 for some small constant c, and this ought to be much less than Equation 27. Finally, obtaining u through Equation 26 will cost about

\begin{displaymath}n\ b\ c_1\ +\ n^2\ c_2.
\end{displaymath} (28)

Thus if c2 is not too large compared to c1, or if the matrix is dense enough, the total cost of obtaining a solution using the Wiedemann algorithm is expected to be about 1.5 times as large as through the CG and Lanczos methods, even if K=1 suffices to determine the minimal polynomial of B.

It is possible to cut down on the cost of the Wiedemann algorithm if one uses additional storage. If one uses v= u, and stores $v_0,\ldots{},v_n$, then the computation of x in Equation 26 will only cost O(n2). In general, however, storing $v_0,\ldots{},v_n$ in main memory is likely to be impossible, as that involves n2 storage of n2 field elements. On the other hand, since each of the vj is only needed once during the computation of x, the vj can be stored on a disk. If disk access is sufficiently rapid, this method could be used to avoid the additional cost, and thus make the Wiedemann method perform just about as fast as the CG and Lanczos algorithms.

Another way to eliminate the need for the extra n matrix-vector products in the Wiedemann algorithm (and thus reduce its cost so that it is no more than that of the CG and Lanczos methods) is to carry out the Berlekamp-Massey computation at the same time that the vk are computed. At the cost of keeping around several additional vectors, this should allow one to construct the solution vector right away.

The assumption made above that taking K=1 will suffice does not seem unreasonable for large fields. In the binary case, one can use an approach similar to that in the CG implementation described in Section 3 to generate as many sequences as the word size of the computer being used by taking the vector v to be over GF(2r). This approach would also make it possible to obtain several solutions at once (as in Equation 14), once the cjare determined.

Most of the large systems that are likely to be solved in the near future are binary. In those cases, the discussion above implies that on a true random access machine, the Wiedemann algorithm is likely to be slower than CG or Lanczos by a factor of 3/2, and could approach their speed only by using substantial additional storage. However, on most machines data access is likely to be the main factor determining the efficiency of the algorithm. In the CG and Lanczos algorithms, the vectors wi that are multiplied by A have to be on the order of 20 bits, and for all foreseeable problems longer than 16 bits. In the Wiedemann algorithm, it is conceivable that it would suffice to work with 3 or 4 bit vectors. (This is a point that needs testing.) Therefore it is possible that one could utilize the cache much more efficiently.

The general conclusion is that the Wiedemann algorithm is of about the same efficiency as the CG and Lanczos algorithms. However, it is quite a bit more complicated to program, and some crucial steps, such as the randomization procedures described in [22] for dealing with non-square and highly singular cases, have apparently never been tested. (Our analysis above assumes that they would not cause any problems.) It would be desirable to implement the Wiedemann algorithm and test it on some large systems.


next up previous
Next: Structured Gaussian elimination Up: No Title Previous: Lanczos and conjugate gradient
Brian A. LaMacchia
1999-10-25