The development of digital signal processing dates from the 1960s,
with the use of mainframe digital computers for number-crunching
applications such as the Fast Fourier Transform (FFT), which
allows the frequency spectrum of a signal to be computed rapidly.
These techniques were not widely used at the time, because
suitable computing equipment was available only in universities
and other scientific research institutions.
DSP technology is now commonplace in devices such as mobile
phones, multimedia computers, video recorders, CD players, hard
disc drive controllers, modems and fax machines, and will soon
replace analog circuitry in TV sets and telephones. It is used for
signal compression and decompression, for example in CD systems
and in digital cellular phones, where it allows a greater number
of calls to be handled simultaneously within each local "cell".
Application areas include telecommunications, computers,
consumer electronics, automotive systems, industrial controls, GPS,
medical instrumentation and defence/aerospace, with products such
as speech synthesizers, high-speed modem chip sets, TV set-top box
chip sets and MPEG encoders/decoders.
Over the last 40 years, different algorithms have been proposed to
compute the FFT, which stands at the core of DSP operations
on IC chips. To improve the capacity of processors to handle
a large flow of data in real time, these algorithms aim to reduce
the computational load (the number of serial multiplications)
and the communication load, and hence the total time it takes
to generate the results.
FFT Invention
The conceptual key to this invention is the formulation of the radix-r
FFT as composed of butterflies with identical structures and
a systematic means of accessing the corresponding multiplier
coefficients. This enables the design of a processing element
(PE) which utilizes r complex multipliers in parallel to implement
each of the butterfly computations. There is a simple mapping
from the three indices (FFT stage, butterfly, element) to the
addresses of the multiplier coefficients needed. For a
single-processor environment, this type of PE would result in a
decrease in time delay for the complete FFT by a factor of O(r).
Trivial multiplications encountered during the execution of
particular butterflies may be avoided by simple checks on the
coefficient addresses. Avoiding trivial multiplications reduces
the computational load of particular butterflies, but would not
be advantageous (in terms of decreasing time delay for the
complete FFT) in situations where multiple PEs are being executed
in parallel on different processors.
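The idea of identical butterflies, coefficient addressing and
trivial-multiplication checks can be illustrated with a minimal
sketch. It is a textbook radix-r decimation-in-time FFT, not the
patented PE structure: the function names, the coefficient table
layout and the mapping in coeff_address are assumptions made for
illustration only.

```python
import cmath


def digit_reverse(index, r, num_digits):
    """Reverse the base-r digits of index (num_digits digits wide)."""
    reversed_index = 0
    for _ in range(num_digits):
        reversed_index = reversed_index * r + index % r
        index //= r
    return reversed_index


def coeff_address(stage, butterfly, element, r, num_stages):
    """Map (stage, butterfly, element) to a coefficient table address.

    Assumed textbook DIT mapping, not taken from the patent.
    """
    offset = butterfly % (r ** stage)  # position inside the current block
    return element * offset * r ** (num_stages - 1 - stage)


def fft_radix_r(samples, r):
    """Return (spectrum, skipped) for len(samples) == r ** num_stages."""
    if r < 2:
        raise ValueError("radix must be at least 2")
    n = len(samples)
    num_stages = 0
    while r ** num_stages < n:
        num_stages += 1
    if r ** num_stages != n:
        raise ValueError("length must be a power of the radix")

    # Coefficient table: address a holds exp(-2*pi*i*a/n).
    table = [cmath.exp(-2j * cmath.pi * a / n) for a in range(n)]
    data = [samples[digit_reverse(i, r, num_stages)] for i in range(n)]
    skipped = 0

    for stage in range(num_stages):
        span = r ** stage  # distance between the r inputs of a butterfly
        for butterfly in range(n // r):
            base = (butterfly // span) * span * r + butterfly % span
            inputs = []
            for element in range(r):
                value = data[base + element * span]
                address = coeff_address(stage, butterfly, element,
                                        r, num_stages)
                if address == 0:
                    skipped += 1  # coefficient is 1: trivial multiplication
                else:
                    value = value * table[address]
                inputs.append(value)
            # Same adder structure for every butterfly: an r-point DFT.
            for k in range(r):
                data[base + k * span] = sum(
                    inputs[q] * table[(q * k * (n // r)) % n]
                    for q in range(r)
                )
    return data, skipped
```

Calling fft_radix_r(samples, 4) on 16 samples returns the spectrum
together with the number of coefficient multiplications skipped
because their address was 0. In this sketch the check costs one
comparison per element, which is why it pays off on a single
processor but not when butterflies run in lockstep.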
Radix-4 Engine.
"Butterfly-processing element for efficient Fast Fourier
Transform method and apparatus", US patent no. US-6751643.
Each of the complex multipliers and complex adders in the JDSPE
(Jaber DSP Engine) can be implemented in parallel. The complex
multiplier/mixer mega-function can multiply two complex numbers
or mix two signals, and its input width, output width and
processing latency can be customized.
Complex Multiplier/Mixer With Four Parallel Multipliers
Complex Mixer
Since multipliers are in general more costly than adders in
a hardware implementation, an alternative model of the complex
multiplier is illustrated below.
Complex Multiplier/Mixer With Three Parallel Multipliers
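The trade-off between the four-multiplier and three-multiplier
models can be sketched with the well-known rearrangement of the
complex product. This is a generic illustration with hypothetical
function names; the exact wiring of the illustrated circuits is
not given in the text and may differ.

```python
def complex_multiply_4(a, b, c, d):
    """(a + jb)(c + jd) with four real multiplications, two additions."""
    return a * c - b * d, a * d + b * c


def complex_multiply_3(a, b, c, d):
    """(a + jb)(c + jd) with three real multiplications, five additions."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2  # (real part, imaginary part)
```

Both functions return the same (real, imaginary) pair. The second
trades one multiplier for three extra adders. When c + jd is a
fixed coefficient, d - c and c + d can be computed in advance,
which leaves three multiplications and three additions per product.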
JABER (Wireless/Data/Image/Video Encryption-Compression) is a
powerful new high-tech digital information management software
platform that provides dramatic capabilities for protecting
digital information (text/data, images and video) and reducing
its storage and transmission requirements across a wide range
of platforms, applications and media.
JABER
integrates into a single package the world's highest compression
and encryption technologies, secure dual database architecture
capabilities, and a multitude of other features all of which
enable it to greatly surpass competitive technologies in performance,
function, and versatility.
Based
upon the unique integration of artificial intelligence, neural
networks and various proprietary technologies, JABER offers
solutions to telecommunications, computer, broadcast and numerous
other industries never before available. Due to its versatility
and capabilities, this technology can be employed in virtually
every application that handles, stores, transmits or utilizes
digital information. Most importantly, JaberTech is positioned
to take this new technology to market immediately both to
general business customers as a solution to data security
and transmission and to specialized customers such as healthcare
companies as their solution to data management and privacy.
The
FFT Module
The second aspect of the FFT invention is that the Jaber PEs
are also useful in parallel multiprocessing environments.
In essence, the precedence relations between the butterflies
in the radix-r FFT are such that the execution of r butterflies
in parallel is feasible during each FFT stage. If each butterfly
is executed on a Jaber PE, it means that each of the r parallel
processors would always be executing the same instruction
simultaneously, which is very desirable for SIMD implementations
on some of the latest DSP cards.
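The claim that r butterflies can run side by side in each stage
can be checked with a small sketch. It assumes the standard
in-place decimation-in-time addressing (inputs of one butterfly
are r ** stage locations apart); the function names and the
grouping into steps are hypothetical and are not taken from the
patent.

```python
def butterfly_addresses(stage, butterfly, r):
    """Memory addresses of the r inputs of one butterfly (assumed layout)."""
    span = r ** stage
    base = (butterfly // span) * span * r + butterfly % span
    return [base + q * span for q in range(r)]


def lockstep_schedule(stage, n, r):
    """Group the n // r butterflies of one stage into steps of r."""
    butterflies = range(n // r)
    return [
        [butterfly_addresses(stage, b, r) for b in butterflies[first:first + r]]
        for first in range(0, n // r, r)
    ]


def is_conflict_free(step):
    """True if no two butterflies of a step touch the same address."""
    touched = [address for block in step for address in block]
    return len(touched) == len(set(touched))
```

For a 64-point radix-4 FFT, lockstep_schedule(stage, 64, 4) gives
four steps of four butterflies for each of the three stages, and
every step passes is_conflict_free. Each of the four processors
can therefore execute the same butterfly instruction on its own
block of data without waiting for the others.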
Radix-4
Module
"Butterfly-processing element for efficient Fast Fourier
Transform method and apparatus", US
patent no. US-6751643.
The success of computational science in accurately describing and
modelling the real world has helped to fuel the ever-increasing
demand for cheap computing power. Scientists are continually
looking for ways to test the limits of theories, using high
performance computing to simulate more realistic systems in
greater detail. Parallel computing offers a way to tackle these
problems in a cost-effective manner.
One reason for this is economic. By making use of "off
the shelf" components, parallel computers can offer higher
performance at lower prices than machines which use specially
developed processors. In addition, the inherent scalability
of parallel computers allows for them to be upgraded as the
need arises. Whereas serial architectures are upgraded by making
the previous processors obsolete, parallel architectures can,
in theory, be upgraded simply by adding more processors.
However, there is another reason: fundamental physical law
will ultimately limit the speed of single processors,
irrespective of the economics. Movement of information forms
the basis of a computer, but the speed of this movement is eventually
limited by the speed of light. If instead the distance travelled
by this information were reduced, the need to avoid the
uncertainties introduced by quantum mechanics would eventually
limit how closely the paths along which the information travels
could be spaced.
These two reasons of economics and physics, coupled with the
inherent scalability of parallel computers, point to a future of
high performance computing which is based in some way on the
ideas of parallelism.
Parallel Multiprocessing for the Fast Fourier Transform
The computation of fast Fourier transforms (FFTs) is the
cornerstone of many supercomputer applications. These include
not only the common ones such as digital signal processing,
speech recognition, image processing and petroleum seismic
analysis, but also less obvious applications, such as
computational fluid dynamics, medical technology, multiple
precision arithmetic and computational number theory. Computations
worthy of a parallel computer generally fall into four categories:
1) one or a few very long 1-D FFTs.
2) many small or moderate-sized 1-D FFTs.
3) one or a few large 2-D FFTs.
4) one or a few large 3-D FFTs.
The most significant problem in spectral analysis lies in
processing its data on parallel multiprocessors. The difficulty
is in finding a feasible algorithm that meets the following
objectives:
1) The algorithm should be easy to implement on DSP cards of
the newest technology.
2) The r parallel processors should execute a single instruction
simultaneously.
3) Reduce the number of NOPs (no-operations) to a minimum.
4) Reduce the communication load between the r processors to
a minimum.
5) Reduce the computational load to a minimum.
6) No pipeline breaks (or "pipeline stalls"), that is, none of
the delays caused on a pipelined processor when a transfer of
control is taken.
7) Simplicity in design.
Parallel
Implementation of the FFT
The last component in the picture is triggered as soon as the
first circuit part is performing its last iteration.
"PARALLEL
MULTIPROCESSING FOR THE FAST FOURIER TRANSFORM WITH PIPELINE
ARCHITECTURE" US patent no. 6792441
Digital Signal Processing (DSP) is
an engineering field that continues to extend its theoretical
foundations and practical implications in the modern world.
From the fulfillment of day-to-day needs, such as personal
communications, to sophisticated systems for biomedical
and tactical applications, DSP has a strong and ever-increasing
participation in the areas of work that are revolutionizing
our society.
Typical DSP operations require many simple additions and
multiplications, each of which requires us to:
- Fetch two operands.
- Perform the addition or multiplication (usually both).
- Store the result or hold it for a repetition.
To fetch the two operands in a single instruction cycle,
we need to be able to make two memory accesses simultaneously.
In fact, since we also need to store the result and to read
the instruction itself, we really need more than two memory
accesses per instruction cycle. Understanding how different
aspects of the kernel affect memory architecture and usage
allows the application to be fine-tuned and customized. For
this reason, DSP processors usually support multiple memory
accesses in the same instruction cycle. Since it is not
possible to access two different memory addresses
simultaneously over a single memory bus, there are two common
methods to achieve multiple memory accesses per instruction
cycle:
- Harvard architecture.
- Modified von Neumann architecture.
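The fetch, multiply-accumulate and store pattern described above,
and the pressure it puts on the memory bus, can be sketched as
follows. The cycle model is a deliberate simplification and an
assumption of this example: each bus is taken to carry one access
per cycle, and the function names are hypothetical.

```python
def multiply_accumulate(coefficients, samples):
    """Sum of products, counting the operand reads it needs."""
    accumulator = 0
    operand_reads = 0
    for coefficient, sample in zip(coefficients, samples):
        accumulator += coefficient * sample  # fetch two operands, multiply, add
        operand_reads += 2
    return accumulator, operand_reads


def cycles_per_step(accesses, buses):
    """Cycles for one step if each bus carries one access per cycle."""
    return -(-accesses // buses)  # ceiling division
```

One multiply-accumulate step needs an instruction fetch and two
operand reads, so three accesses. Under this model
cycles_per_step(3, 1) is 3 on a single shared bus, while
cycles_per_step(3, 3) is 1 when three accesses can proceed at once.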
DSPs are therefore typically used to input large amounts of data,
perform mathematical transformations on that data and then
output the resulting data, all at very high rates. In a
real-time system, data flow must be understood and controlled
in order to achieve high performance. Analyzing the timing
characteristics for accessing data and switching between
data requestors can maximize bandwidth in a system. Since
the CPU should only be used for sporadic (non-periodic)
accesses to individual locations, it is preferable that
the data flow be controlled by an independent device;
otherwise the system can incur performance degradation.
Such peripheral devices, which can control data transfers
between an I/O subsystem and a memory subsystem in the same
manner that a processor can control such transfers, reduce
CPU interrupt latencies and leave precious DSP cycles free
for other tasks, leading to increased performance. Special
channels were created, along with circuitry to control them,
which allow the transfer of information without the processor
controlling every aspect of the transfer. This circuitry
is normally part of the system chipset (a number of integrated
circuits designed to perform one or more related functions)
on the DSP board.
The Read/Write Address Generator:
The main objective of the Read/Write Address Generator, which
is treated as part of the I/O system, is to provide the block
of memory addresses from which the butterfly’s input data
is collected, or into which the butterfly’s processed output
data is stored.
Read/Write FFT Address Generator Structure.
The Multiplier Coefficients Address Generator:
The main role of the coefficient address generator is to provide
a block of memory addresses from which the multiplier
coefficients are collected and fed to the inputs of the
butterfly’s multipliers for processing.
The Multiplier Coefficients Address Generator Structure.
The Control Unit:
The flowchart of the control unit, which is responsible for
providing certain parameters to the DIT RAD (Reading Address
Generator), the DIT twiddle factor address generator and the
WAD (Writing Address Generator), is illustrated in the figure
below. As shown in this figure, this complex process is
implemented by means of three simple resettable and programmable
counters. These control the flow of data by providing the right
parameters to the DIT reading and coefficient address generators,
so that they supply a word of length r (a series of r input
data or coefficient addresses) to the input of the butterfly PE,
and to the writing address generator, so that it supplies a
block of r addresses in which the butterfly's processed output
data is stored.
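How three resettable, programmable counters could drive the
reading, coefficient and writing address generators is sketched
below. The text does not say what each counter counts, so the
roles chosen here (stage, block and offset within a block), the
in-place layout and all names are assumptions made for
illustration, not the patented design.

```python
class Counter:
    """Resettable counter with a programmable limit."""

    def __init__(self, limit=1):
        self.limit = limit
        self.value = 0

    def program(self, limit):
        """Load a new limit and reset the count."""
        self.limit = limit
        self.value = 0

    def tick(self):
        """Advance by one; return True when the count wraps to zero."""
        self.value += 1
        if self.value == self.limit:
            self.value = 0
            return True
        return False


def control_unit(n, r):
    """Yield (stage, read, coefficient, write) address blocks.

    Assumes n == r ** num_stages with n >= r >= 2.
    """
    num_stages = 0
    while r ** num_stages < n:
        num_stages += 1
    if r < 2 or n < r or r ** num_stages != n:
        raise ValueError("n must be a power of the radix r")

    stage, block, offset = Counter(num_stages), Counter(), Counter()
    finished = False
    while not finished:
        span = r ** stage.value          # spacing of butterfly inputs
        block.program(n // (r * span))   # blocks in this stage
        offset.program(span)             # butterflies per block
        stage_done = False
        while not stage_done:
            base = block.value * r * span + offset.value
            read = [base + q * span for q in range(r)]
            coefficient = [q * offset.value * (n // (r * span))
                           for q in range(r)]
            write = list(read)           # in-place: results overwrite inputs
            yield stage.value, read, coefficient, write
            if offset.tick():
                stage_done = block.tick()
        finished = stage.tick()
```

Iterating over control_unit(16, 4) produces eight blocks of four
addresses, four per stage. In stage 1, for example, the second
butterfly reads addresses 1, 5, 9 and 13 and uses coefficient
addresses 0, 1, 2 and 3.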
"ADDRESS
GENERATOR FOR FAST FOURIER TRANSFORM PROCESSOR", US
patent application no. US-60-289302 and European
patent application Serial no: PCT/US01/07602.
The Company That Offers Unique DSP System Solutions Through The
Parallel Implementation of Its Innovative DSP Core Engines For
The Third Millennium's Ultra-High-Speed Applications