Introduction to the Stockham FFT

[Introduction to the Stockham FFT]

This page is a homepage explaining the Stockham algorithm which is a kind of the Fast Fourier Transform (FFT). I made this homepage for people who can not understand the Stockham algorithm but can understand the Cooley-Tukey algorithm. You will be able to understand that Stockham algorithm is a transformation of Cooley-Tukey algorithm if you read this page.

First of all, I show the Cooley-Tukey algorithm. When I implement the Cooley-Tukey algorithm using recursive call, the program will be as follows.

List-1: Cooley-Tukey Algorithm

#include <complex>
#include <cmath>
#include <utility>

typedef std::complex<double> complex_t;

void butterfly(int n, int q, complex_t* x) // Butterfly operation
// n : sequence length
// q : block start point
// x : input/output squence
{
    const int m = n/2;
    const double theta0 = 2*M_PI/n;

    if (n > 1) {
        for (int p = 0; p < m; p++) {
            const complex_t wp = complex_t(cos(p*theta0), -sin(p*theta0));
            const complex_t a = x[q + p + 0];
            const complex_t b = x[q + p + m];
            x[q + p + 0] =  a + b;
            x[q + p + m] = (a - b) * wp;
        }
        butterfly(n/2, q + 0, x);
        butterfly(n/2, q + m, x);
    }
}

void bit_reverse(int n, complex_t* x) // Bitreversal operation
// n : squence length
// x : input/output sequence
{
    for (int i = 0, j = 1; j < n-1; j++) {
        for (int k = n >> 1; k > (i ^= k); k >>= 1);
        if (i < j) std::swap(x[i], x[j]);
    }
}

void fft(int n, complex_t* x) // Fourier transform
// n : sequence length
// x : input/output sequence
{
    butterfly(n, 0, x);
    bit_reverse(n, x);
    for (int k = 0; k < n; k++) x[k] /= n;
}

void ifft(int n, complex_t* x) // Inverse Fourier transform
// n : sequence length
// x : input/output sequence
{
    for (int p = 0; p < n; p++) x[p] = conj(x[p]);
    butterfly(n, 0, x);
    bit_reverse(n, x);
    for (int k = 0; k < n; k++) x[k] = conj(x[k]);
}

The characteristic of Cooley-Tukey algorithm is bit_reverse(). We need bit_reverse() to sort the result of FFT in a natural order. This is an important point that is different from the Stockham algorithm. In the Stockham algorithm, even if there is no bit_reverse(), the result of FFT is sorted in a natural order.

However, it is possible by the Cooley-Tukey algorithm even if we do not use bit_reverse() if we only want to sort the result of FFT in a natural order. The result of FFT becomes a natural order sequence if we do as follows.

List-2: Natural Order Cooley-Tukey Algorithm

#include <complex>
#include <cmath>

typedef std::complex<double> complex_t;

void fft0(int n, int q, complex_t* x, complex_t* y)
// n : sequence length
// q : block start point
// x : input/output sequence
// y : work area
{
    const int m = n/2;
    const double theta0 = 2*M_PI/n;

    if (n > 1) {
        for (int p = 0; p < m; p++) { // Butterfly operation
            const complex_t wp = complex_t(cos(p*theta0), -sin(p*theta0));
            const complex_t a = x[q + p + 0];
            const complex_t b = x[q + p + m];
            y[q + p + 0] =  a + b;
            y[q + p + m] = (a - b) * wp;
        }
        fft0(n/2, q + 0, y, x);
        fft0(n/2, q + m, y, x);
        for (int p = 0; p < m; p++) { // Composition of even components and odd components
            x[q + 2*p + 0] = y[q + p + 0]; // Even components
            x[q + 2*p + 1] = y[q + p + m]; // Odd components
        }
    }
}

void fft(int n, complex_t* x) // Fourier transform
// n : sequence length
// x : input/output sequence
{
    complex_t* y = new complex_t[n]; // Allocation of work arera
    fft0(n, 0, x, y);
    delete[] y;
    for (int k = 0; k < n; k++) x[k] /= n;
}

void ifft(int n, complex_t* x) // Inverse Fourier transform
// n : sequence length
// x : input/output sequence
{
    for (int p = 0; p < n; p++) x[p] = conj(x[p]);
    complex_t* y = new complex_t[n]; // Allocation of work area
    fft0(n, 0, x, y);
    delete[] y;
    for (int k = 0; k < n; k++) x[k] = conj(x[k]);
}

When we assume that \(F_N\) is Fourier transform (the size is \(N, N = 2^L, L \ge 0, L\) is an integer) and assume that \(x_p~(p = 0,1,\ldots,N-1)\) is input sequence and assume that \(X_k~(k = 0,1,\ldots,N-1)\) is output sequence, Fourier transform is represented as \(X_k = F_N(x_p) = \frac{1}{N}\sum_{p=0}^{N-1}x_p W_N^{kp},~ W_N = \exp\left(-j\frac{2\pi}{N}\right),~j = \sqrt{-1}\). Now, if we assume \(N \ge 2\) and we decompose \(F_N\) by a recursion, the following relationship is satisfied.

\[ \begin{eqnarray*} 2X_{2k} & = & F_{\frac{N}{2}}(x_p + x_{p+\frac{N}{2}}) \\ 2X_{2k+1} & = & F_{\frac{N}{2}}\left((x_p - x_{p+\frac{N}{2}}) W_N^p\right) \\ p & = & 0,1,\ldots,\frac{N}{2}-1 \\ k & = & 0,1,\ldots,\frac{N}{2}-1 \\ \end{eqnarray*} \]

In this way, if we decompose Fourier transform by a recursion, even components(\(X_{2k}\)) and odd components(\(X_{2k+1}\)) are obtained. Therefore, in order to change the output to a natural order sequence, we have to obtain \(X_k\) by composing \(X_{2k}\) and \(X_{2k+1}\). The code performing it is the List-2. However, this code is slow. So, it is a turn of the Stockham algorithm.

In order to obtain the Stockham algorithm, let's start by eliminating the uselessness from the List-2. First, we notice that we are accessing to the array at two places. One is at the butterfly operation. Another is the place that we are composing even components and odd components. We can avoid uselessness if we execute butterfly operation and composition of even components and odd components at the same time. Using this idea, if we transform the List-2, it will be as follows.

List-3: Recursive version of the Stockham Algorithm

#include <complex>
#include <cmath>

typedef std::complex<double> complex_t;

void fft1(int n, int s, int q, complex_t* x, complex_t* y);

void fft0(int n, int s, int q, complex_t* x, complex_t* y)
// n : sequence length
// s : stride
// q : selection of even or odd
// x : input/output sequence
// y : work area
{
    const int m = n/2;
    const double theta0 = 2*M_PI/n;

    if (n == 1) {}
    else {
        for (int p = 0; p < m; p++) {
            // Butterfly operation and composition of even components and odd components
            const complex_t wp = complex_t(cos(p*theta0), -sin(p*theta0));
            const complex_t a = x[q + s*(p + 0)];
            const complex_t b = x[q + s*(p + m)];
            y[q + s*(2*p + 0)] =  a + b;
            y[q + s*(2*p + 1)] = (a - b) * wp;
        }
        fft1(n/2, 2*s, q + 0, y, x); // Even place FFT (y:input, x:output)
        fft1(n/2, 2*s, q + s, y, x); // Odd place FFT (y:input, x:output)
    }
}

void fft1(int n, int s, int q, complex_t* x, complex_t* y)
// n : sequence length
// s : stride
// q : selection of even or odd
// x : input sequence
// y : output sequence
{
    const int m = n/2;
    const double theta0 = 2*M_PI/n;

    if (n == 1) { y[q] = x[q]; }
    else {
        for (int p = 0; p < m; p++) {
            // Butterfly Operation and composition of even components and odd components
            const complex_t wp = complex_t(cos(p*theta0), -sin(p*theta0));
            const complex_t a = x[q + s*(p + 0)];
            const complex_t b = x[q + s*(p + m)];
            y[q + s*(2*p + 0)] =  a + b;
            y[q + s*(2*p + 1)] = (a - b) * wp;
        }
        fft0(n/2, 2*s, q + 0, y, x); // Even place FFT (y:input/output, x:work area)
        fft0(n/2, 2*s, q + s, y, x); // Odd place FFT (y:input/output, x:work area)
    }
}

void fft(int n, complex_t* x) // Fourier transform
// n : sequence length
// x : input/output sequence
{
    complex_t* y = new complex_t[n]; // Allocation of work area
    fft0(n, 1, 0, x, y);
    delete[] y;
    for (int k = 0; k < n; k++) x[k] /= n;
}

void ifft(int n, complex_t* x) // Inverse Fourier transform
// n : sequence length
// x : input/output sequence
{
    for (int p = 0; p < n; p++) x[p] = conj(x[p]);
    complex_t* y = new complex_t[n]; // Allocation of work area
    fft0(n, 1, 0, x, y);
    delete[] y;
    for (int k = 0; k < n; k++) x[k] = conj(x[k]);
}

This code has become a mutual recursion in order to minimize the accesses to an array. Stockham algorithm requires work area y for sorting. It saves sorted results once in the work area. If the saved results are written back immediately to x, the program is able to avoid becoming a mutual recursion. But, if we do so, it becomes meaningless that we have reduced the accesses to an array by executing butterfly operation and composition at the same time. For this reason, this code has become a mutual recursion.

I will explain about fft0() and fft1().

fft0() is the Fourier transform that the size is n. s is the stride of array access. q is used for selection of even or odd. x is input/output sequence and y is work area.

fft1() is the Fourier transform that the size is n. s is the stride of array access. q is used for selection of even or odd. x is input sequence and y is output sequence.

By the way, were you able to understand this transformation? In fact, this code is the Stockham algorithm. This is very rare recursive version of the Stockham algorithm. So, let's transform the calculation into a recurrence formula with this condition: \(n=2^{L-h},~m=2^{-1}n,~s=2^h,~x_h(q,p)=x_h[q+sp]\).

\[ \begin{eqnarray*} x_{h+1}(q,p) & = & x_h(q,p) + x_h(q,p + m) \\ x_{h+1}(q + s,p) & = & \left(x_h(q,p) - x_h(q,p + m)\right)W_N^{sp} \\ q & = & 0,1,\ldots,s-1 \\ p & = & 0,1,\ldots,m-1 \\ \end{eqnarray*} \]

\(x_h[~]\) is array of the \(h\)-step calculation. When we begin from \(x_0[~]\) and get \(x_L[~]\), we complete the FFT. However, \(x_L[~]\) needs to be divided by \(N\).

This program is called Dacimation In Frequency (DIF). In the case of Decimation In Time (DIT), the program will be as follows.

List-4: DIT Stockham Algorithm

#include <complex>
#include <cmath>

typedef std::complex<double> complex_t;

void fft1(int n, int s, int q, complex_t* x, complex_t* y);

void fft0(int n, int s, int q, complex_t* x, complex_t* y)
// n : sequence length
// s : stride
// q : selection of even or odd
// x : input/output sequence
// y : work area
{
    const int m = n/2;
    const double theta0 = 2*M_PI/n;

    if (n == 1) {}
    else {
        fft1(n/2, 2*s, q + 0, y, x); // Even place FFT(x:input, y:output)
        fft1(n/2, 2*s, q + s, y, x); // Odd place FFT(x:input, y:output)
        for (int p = 0; p < m; p++) {
            const complex_t wp = complex_t(cos(p*theta0), -sin(p*theta0));
            const complex_t a = y[q + s*(2*p + 0)];
            const complex_t b = y[q + s*(2*p + 1)] * wp;
            x[q + s*(p + 0)] = a + b;
            x[q + s*(p + m)] = a - b;
        }
    }
}

void fft1(int n, int s, int q, complex_t* x, complex_t* y)
// n : sequence length
// s : stride
// q : selection of even or odd
// x : output sequence
// y : input sequence
{
    const int m = n/2;
    const double theta0 = 2*M_PI/n;

    if (n == 1) { x[q] = y[q]; }
    else {
        fft0(n/2, 2*s, q + 0, y, x); // Even place FFT(y:input/output, x:work area)
        fft0(n/2, 2*s, q + s, y, x); // Odd place FFT(y:input/output, x:work area)
        for (int p = 0; p < m; p++) {
            const complex_t wp = complex_t(cos(p*theta0), -sin(p*theta0));
            const complex_t a = y[q + s*(2*p + 0)];
            const complex_t b = y[q + s*(2*p + 1)] * wp;
            x[q + s*(p + 0)] = a + b;
            x[q + s*(p + m)] = a - b;
        }
    }
}

void fft(int n, complex_t* x) // Fourier transform
// n : sequence length
// x : input/output sequence
{
    complex_t* y = new complex_t[n]; // Allocation of work area
    fft0(n, 1, 0, x, y);
    delete[] y;
    for (int k = 0; k < n; k++) x[k] /= n;
}

void ifft(int n, complex_t* x) // Inverse Fourier transform
// n : sequence length
// x : input/output sequence
{
    for (int p = 0; p < n; p++) x[p] = conj(x[p]);
    complex_t* y = new complex_t[n]; // Allocation of work area
    fft0(n, 1, 0, x, y);
    delete[] y;
    for (int k = 0; k < n; k++) x[k] = conj(x[k]);
}

Let's transform the calculation into a recurrence formula with this condition: \(m=2^h,~t=2^{L-h},~s=2^{-1}t,~x_h(q,p)=x_h[q+tp]\).

\[ \begin{eqnarray*} x_{h+1}(q,p) & = & x_h(q,p) + x_h(q + s,p)W_N^{sp} \\ x_{h+1}(q,p + m) & = & x_h(q,p) - x_h(q + s,p)W_N^{sp} \\ q & = & 0,1,\ldots,s-1 \\ p & = & 0,1,\ldots,m-1 \\ \end{eqnarray*} \]

Unfortunately, this recursive version of the Stockham FFT is slow. So, we will lead the high-speed iterative version of the Stockham FFT in the next page.