[Optimization that reduce the accesses to an array]
In Section-2, I show the final version of the Stockham algorithm, From this algorithm, with a little ingenuity, we can reduce the accesses to an array. First of all, I show the List-7 again.
List-7: Final version of the Stockham Algorithm
#include <complex> #include <cmath> typedef std::complex<double> complex_t; void fft0(int n, int s, bool eo, complex_t* x, complex_t* y) // n : sequence length // s : stride // eo : x is output if eo == 0, y is output if eo == 1 // x : input sequence(or output sequence if eo == 0) // y : work area(or output sequence if eo == 1) { const int m = n/2; const double theta0 = 2*M_PI/n; if (n == 1) { if (eo) for (int q = 0; q < s; q++) y[q] = x[q]; } else { for (int p = 0; p < m; p++) { const complex_t wp = complex_t(cos(p*theta0), -sin(p*theta0)); for (int q = 0; q < s; q++) { const complex_t a = x[q + s*(p + 0)]; const complex_t b = x[q + s*(p + m)]; y[q + s*(2*p + 0)] = a + b; y[q + s*(2*p + 1)] = (a - b) * wp; } } fft0(n/2, 2*s, !eo, y, x); } } void fft(int n, complex_t* x) // Fourier transform // n : sequence length // x : input/output sequence { complex_t* y = new complex_t[n]; fft0(n, 1, 0, x, y); delete[] y; for (int k = 0; k < n; k++) x[k] /= n; } void ifft(int n, complex_t* x) // Inverse Fourier transform // n : sequence length // x : input/output sequence { for (int p = 0; p < n; p++) x[p] = conj(x[p]); complex_t* y = new complex_t[n]; fft0(n, 1, 0, x, y); delete[] y; for (int k = 0; k < n; k++) x[k] = conj(x[k]); }
If we look at this code carefully,
when n == 1
in fft0
,
this code does not calculate anything, this code does the copy only.
If we substitute a calculation result into the destination array beforehand,
we can remove this copy.
If I rewrite this code to such a code, the program will be as follows.
List-8: The Stockham algorithm that reduce the accesses to an array
#include <complex> #include <cmath> typedef std::complex<double> complex_t; void fft0(int n, int s, bool eo, complex_t* x, complex_t* y) // n : sequence length // s : stride // eo : x is output if eo == 0, y is output if eo == 1 // x : input sequence(or output sequence if eo == 0) // y : work area(or output sequence if eo == 1) { const int m = n/2; const double theta0 = 2*M_PI/n; if (n == 2) { complex_t* z = eo ? y : x; for (int q = 0; q < s; q++) { const complex_t a = x[q + 0]; const complex_t b = x[q + s]; z[q + 0] = a + b; z[q + s] = a - b; } } else if (n >= 4) { for (int p = 0; p < m; p++) { const complex_t wp = complex_t(cos(p*theta0), -sin(p*theta0)); for (int q = 0; q < s; q++) { const complex_t a = x[q + s*(p + 0)]; const complex_t b = x[q + s*(p + m)]; y[q + s*(2*p + 0)] = a + b; y[q + s*(2*p + 1)] = (a - b) * wp; } } fft0(n/2, 2*s, !eo, y, x); } } void fft(int n, complex_t* x) // Fourier transform // n : sequence length // x : input/output sequence { complex_t* y = new complex_t[n]; fft0(n, 1, 0, x, y); delete[] y; for (int k = 0; k < n; k++) x[k] /= n; } void ifft(int n, complex_t* x) // Inverse Fourier transform // n : sequence length // x : input/output sequence { for (int p = 0; p < n; p++) x[p] = conj(x[p]); complex_t* y = new complex_t[n]; fft0(n, 1, 0, x, y); delete[] y; for (int k = 0; k < n; k++) x[k] = conj(x[k]); }
As a side effect,
because the butterfly operation can be simplified when n == 2
,
we can also erase useless multiplication of only to multiply 1.
By the way, I made the FFT library using this algorithm. If you are interested, please to this page.