![]() |
(24) |
If B has b non-zero coefficients, most of them
,
and c1is again the cost of an addition or subtraction in the field we work
with, then computation of
costs about
| (28) |
It is possible to cut down on the cost of the Wiedemann algorithm if one
uses additional storage. If one uses v= u, and stores
,
then the computation of x in
Equation 26 will only cost O(n2). In general, however,
storing
in main memory is likely to be impossible, as
that involves n2 storage of n2 field elements. On the other hand,
since each of the vj is only needed once during the computation of
x, the vj can be stored on a disk. If disk access is sufficiently
rapid, this method could be used to avoid the additional cost, and thus
make the Wiedemann method perform just about as fast as the CG and
Lanczos algorithms.
Another way to eliminate the need for the extra n matrix-vector products in the Wiedemann algorithm (and thus reduce its cost so that it is no more than that of the CG and Lanczos methods) is to carry out the Berlekamp-Massey computation at the same time that the vk are computed. At the cost of keeping around several additional vectors, this should allow one to construct the solution vector right away.
The assumption made above that taking K=1 will suffice does not seem unreasonable for large fields. In the binary case, one can use an approach similar to that in the CG implementation described in Section 3 to generate as many sequences as the word size of the computer being used by taking the vector v to be over GF(2r). This approach would also make it possible to obtain several solutions at once (as in Equation 14), once the cjare determined.
Most of the large systems that are likely to be solved in the near future are binary. In those cases, the discussion above implies that on a true random access machine, the Wiedemann algorithm is likely to be slower than CG or Lanczos by a factor of 3/2, and could approach their speed only by using substantial additional storage. However, on most machines data access is likely to be the main factor determining the efficiency of the algorithm. In the CG and Lanczos algorithms, the vectors wi that are multiplied by A have to be on the order of 20 bits, and for all foreseeable problems longer than 16 bits. In the Wiedemann algorithm, it is conceivable that it would suffice to work with 3 or 4 bit vectors. (This is a point that needs testing.) Therefore it is possible that one could utilize the cache much more efficiently.
The general conclusion is that the Wiedemann algorithm is of about the same efficiency as the CG and Lanczos algorithms. However, it is quite a bit more complicated to program, and some crucial steps, such as the randomization procedures described in [22] for dealing with non-square and highly singular cases, have apparently never been tested. (Our analysis above assumes that they would not cause any problems.) It would be desirable to implement the Wiedemann algorithm and test it on some large systems.