# A Systolic Array Processor for Adaptive Channel Estimation

Adaptive signal processing, whose major applications include adaptive equalization and adaptive array antennas, is expected to play an important role in future broadband wireless communications with transmission bit rates of, say, several tens of Mbit/s. A signal processor that can estimate channel-related parameters in real time is indispensable in such applications. This paper briefly describes a systolic array signal processor for channel estimation that we recently developed using parallel processing techniques.

## Takahiro Asai and Tadashi Matsumoto

# Introduction

Adaptive estimation of channel-related parameters, such as the tap coefficients of adaptive equalizers and the element weights of adaptive array antennas, is considered key to making signal detectors robust against variations in the communication environment. In broadband mobile radio communications, the time-varying nature of intersymbol interference and co-channel interference makes it very difficult to determine optimal parameters in real time when adaptive interference cancellation is the goal. The Recursive Least Squares (RLS) algorithm is a reasonable choice for adaptive parameter estimation because of its much faster convergence; however, its complexity is of the order of the square of the number of parameters to be estimated. To solve this problem, several pipelining techniques for hardware implementation of the RLS algorithm have been proposed [1]-[3]; because of the way they operate, they are collectively and figuratively referred to as the systolic array technique. This paper outlines a systolic array RLS processor that we have developed for broadband mobile communication applications.

# Key Issues in Broadband Mobile Communications

#### ■ Multipath Propagation

In broadband mobile transmission at rates of several tens of Mbit/s, intersymbol interference (ISI) is a crucial problem because the delay spread becomes far larger than the symbol duration. Figure 1 shows a broadband radio propagation model, which characterizes multipath propagation arising from reflection, diffraction and/or scattering. The delayed signals cause ISI, which degrades transmission performance. The higher the transmission rate, the more severe the ISI, so reducing it is of great importance.
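The effect of the transmission rate can be made concrete with a discrete-time tapped-delay-line model of the multipath channel: each delayed path is a tap, and the number of symbol periods covered by the delay spread is the number of past symbols that interfere with the current one. The sketch below is illustrative only; the function names and the 1 µs delay spread are hypothetical and do not come from the measurements in this paper.

```python
import math

def isi_span_symbols(delay_spread_s, symbol_rate_hz):
    """Number of symbol periods covered by a given delay spread (rounded up).

    The small tolerance absorbs floating-point rounding in the product.
    """
    return math.ceil(delay_spread_s * symbol_rate_hz - 1e-9)

def multipath_channel(symbols, taps):
    """Tapped-delay-line model r[n] = sum_k h[k] * s[n-k]; noise is omitted."""
    received = []
    for n in range(len(symbols)):
        acc = 0j
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * symbols[n - k]
        received.append(acc)
    return received

# Hypothetical 1 us delay spread evaluated at two symbol rates.
for rate in (1e6, 12e6):
    print(f"{rate / 1e6:.0f} Msymbol/s -> delay spread covers "
          f"{isi_span_symbols(1e-6, rate)} symbol period(s)")
```

The same physical delay spread covers twelve times as many symbol periods at the higher rate, which is why ISI grows with the transmission rate.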

#### ■ Maintaining Signal Strength

To meet the demand for broadband communications, a higher frequency band such as the microwave band may have to be used. Propagation loss is larger at higher frequencies, so maintaining sufficient signal strength at the receiver is essential.
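As a rough illustration of why higher carrier frequencies cost signal strength, the Friis free-space path loss between isotropic antennas grows as 20·log10 of the frequency. This is a simplification: the mobile channel of Figure 1 also involves reflection, diffraction and scattering. The carrier frequencies and the 1 km distance below are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def free_space_path_loss_db(distance_m, freq_hz):
    """Friis free-space path loss between isotropic antennas: 20*log10(4*pi*d*f/c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C)

# Hypothetical carriers: 2 GHz versus 8 GHz at the same 1 km distance.
extra_loss = free_space_path_loss_db(1e3, 8e9) - free_space_path_loss_db(1e3, 2e9)
print(f"additional loss: {extra_loss:.2f} dB")  # equals 20*log10(4)
```

Quadrupling the carrier frequency adds about 12 dB of loss under this model, which must be recovered elsewhere, for example by array gain.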

#### ■ Co-channel Interference

To increase system user capacity, the reuse distance of the same frequency in a cellular system should be as small as possible. For this purpose, the effects of co-channel interference, shown in Figure 1, must be reduced.
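The trade-off between reuse distance and co-channel interference can be sketched with the textbook idealized hexagonal-cell model, where the reuse ratio is D/R = sqrt(3K) for cluster size K, and a rough first-tier signal-to-interference ratio assumes six equidistant interferers and a path-loss exponent n. This model and the exponent of 4 are assumptions for illustration; they are not part of the system described here.

```python
import math

def reuse_ratio(cluster_size):
    """Co-channel reuse ratio D/R = sqrt(3K) for an idealized hexagonal layout."""
    return math.sqrt(3 * cluster_size)

def first_tier_sir_db(cluster_size, path_loss_exp=4.0):
    """Rough S/I with six equidistant first-tier interferers: (D/R)^n / 6."""
    return 10 * math.log10(reuse_ratio(cluster_size) ** path_loss_exp / 6)

for k in (3, 4, 7):
    print(f"K = {k}: D/R = {reuse_ratio(k):.2f}, S/I = {first_tier_sir_db(k):.1f} dB")
```

Shrinking the cluster (smaller D/R) raises capacity but lowers S/I, which is why co-channel interference suppression becomes necessary.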

#### ■ Synchronization

In broadband mobile communications, transmission performance is degraded by imperfect synchronization, so its effects must be mitigated.

# Adaptive Equalizer and Adaptive Array Antenna

As a key technique for solving the problems described above, the joint use of an adaptive equalizer and an adaptive array antenna has been studied [1].

#### ■ Adaptive Equalizer

An adaptive equalizer can reduce the effects of intersymbol interference. In particular, maximum likelihood sequence estimation (MLSE) is considered the most effective approach, both for reducing intersymbol interference and for combining delayed signal components, thereby maintaining sufficient received signal strength. However, the computational complexity of MLSE grows exponentially with the channel memory length, which makes the computational effort prohibitive.
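The exponential growth can be quantified for a Viterbi-based MLSE: with an M-ary constellation and a channel memory of L symbols, the trellis has M^L states and M^(L+1) branch metrics per symbol. The helper below is a hypothetical illustration of that count for QPSK (M = 4).

```python
def mlse_trellis_size(constellation_size, memory_symbols):
    """States and branch metrics per symbol of a Viterbi-based MLSE trellis.

    With an M-ary constellation and a channel spanning L past symbols,
    the trellis has M**L states and M**(L+1) branches per symbol interval.
    """
    states = constellation_size ** memory_symbols
    return states, states * constellation_size

for memory in range(1, 7):
    states, branches = mlse_trellis_size(4, memory)  # QPSK: M = 4
    print(f"L = {memory}: {states} states, {branches} branch metrics per symbol")
```

Each additional symbol of channel memory multiplies the work by four for QPSK.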

#### ■ Adaptive Array Antenna

The primary function of an adaptive array antenna is to weight the element outputs, by setting the amplitude and phase applied to each element, and to combine them so that the resulting antenna beam pattern is, in some sense, optimal for reception of the desired signal. The weights are chosen to satisfy a given criterion and are updated in response to changes in the propagation environment.
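The weighting-and-combining operation and the resulting beam pattern can be sketched for a uniform linear array. Assumptions not stated in the text: half-wavelength element spacing, angles measured from broadside, and the common convention y = w^H x (conjugated weights). The weights below are fixed, not adaptive; they simply steer the beam toward 20° to show how a pattern is evaluated.

```python
import cmath
import math

def steering_vector(n_elements, theta_deg, spacing_wl=0.5):
    """Plane-wave response of a uniform linear array; theta measured from broadside."""
    phase = 2 * math.pi * spacing_wl * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * phase * k) for k in range(n_elements)]

def combine(weights, element_outputs):
    """Weight each element output and sum: y = w^H x."""
    return sum(w.conjugate() * x for w, x in zip(weights, element_outputs))

def pattern_db(weights, theta_deg):
    """Array power gain toward theta in dB, floored to avoid log(0)."""
    g = abs(combine(weights, steering_vector(len(weights), theta_deg)))
    return 20 * math.log10(max(g, 1e-12))

# Fixed (non-adaptive) weights steering an 8-element beam toward 20 degrees.
w = [a / 8 for a in steering_vector(8, 20.0)]
for theta in (0.0, 20.0, 40.0, 60.0):
    print(f"{theta:5.1f} deg: {pattern_db(w, theta):7.2f} dB")
```

An adaptive array differs only in how w is chosen: instead of pointing the main lobe, the weights are computed from data so that interference directions are suppressed.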

The effects of delayed signal and interference components can be reduced by using an adaptive array antenna. Figure 2 shows an example beam pattern obtained with an adaptive array antenna. The gains toward the delayed desired signal and the interference signals are 20 dB or more below the gain toward the desired signal, so their effects are reduced. To avoid increasing the computational complexity of the adaptive equalizer, the desired signal and its delayed version are not combined in Figure 2. The performance degradation caused by imperfect synchronization can be reduced by using a fractional-tap transversal filter in the adaptive array antenna [1].

#### ■ Adaptive Algorithm

An adaptive equalizer needs to estimate the impulse response of the multipath channel, and an adaptive array antenna needs to determine the weights on its antenna elements. Adaptive algorithms are used for both purposes. Figure 3 shows the basic structure of an adaptive filter, where the filter input, output and reference signal are denoted by u, y and d, respectively. The estimation error is defined as the difference between d and y, and the adaptive algorithm updates the coefficients of the adaptive filter to minimize this error.
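The quantities in Figure 3 can be written down directly. The sketch below assumes the convention y = w^H u for the transversal filter output and shows the exponentially weighted squared-error cost that an RLS algorithm minimizes; mapping its `forgetting` argument to the forgetting factor β² of the RLS algorithm is our labeling, and all function names are hypothetical.

```python
def filter_output(w, u):
    """Transversal filter output y = w^H u = sum_k conj(w_k) * u_k."""
    return sum(wk.conjugate() * uk for wk, uk in zip(w, u))

def estimation_error(w, u, d):
    """Estimation error e = d - y between the reference d and the filter output y."""
    return d - filter_output(w, u)

def weighted_cost(w, snapshots, forgetting=1.0):
    """Exponentially weighted sum of |e|^2 over (u, d) snapshots.

    The newest snapshot has weight 1 and older ones are discounted by
    `forgetting` per step (the role played by the RLS forgetting factor).
    """
    n = len(snapshots)
    return sum(forgetting ** (n - 1 - i) * abs(estimation_error(w, u, d)) ** 2
               for i, (u, d) in enumerate(snapshots))

w = [1 + 0j, 0.5j]
print(filter_output(w, [2, 1j]), estimation_error(w, [2, 1j], 3))
```

With a forgetting factor below 1, old snapshots gradually lose influence, which lets the estimate track a time-varying channel.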

For channel estimation in adaptive equalizers, as shown in Figure 3, the transmitter periodically sends a reference sequence that is known to the receiver. The received waveform of this known reference signal is fed to the adaptive algorithm in the configuration of Figure 3, and the filter coefficients produced by the adaptive algorithm are the estimate of the impulse response of the multipath channel.

As with adaptive equalizers, adaptive array antennas require a known reference signal to form the optimal beam pattern, and the known signal is used in the same way. In this case, the filter coefficients derived by the adaptive algorithm are the weights on the antenna elements of the adaptive array.

The optimum filter coefficients **w** can be obtained by

$$\mathbf{w} = \mathbf{R}_{xx}^{-1} \mathbf{P} \tag{1}$$

where R<sub>xx</sub> is the covariance matrix of the input signal u, and P is the cross-correlation vector between the input signal u and the reference signal d. The covariance matrix R<sub>xx</sub> can be calculated from the received sampled data, and the optimal weight w is then updated every time R<sub>xx</sub> is recalculated. Although updating w according to Eq. (1) is optimal in the sense that the estimation error is minimized, the matrix inversion in Eq. (1) requires prohibitively large computational effort. Hence, w has to be calculated recursively. Among the many recursive algorithms for obtaining the filter coefficients, the RLS algorithm achieves the fastest convergence and does not require explicit matrix inversion. However, its computational complexity is still of the order of the square of the number of parameters to be estimated, so real-time operation of the RLS algorithm remains a heavy burden when the number of parameters is large.
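For reference, the conventional (inverse-correlation-matrix) form of the RLS algorithm can be sketched as follows; this is the textbook recursion, not the systolic QR-based form used by the board. Each update touches every element of the N×N matrix P, which is where the O(N²) cost per sample comes from. The example applies it to channel estimation from a known training sequence, under assumptions of ours: the known symbols are the filter input u, the received samples are the reference d (so the channel taps equal the conjugate of w), noise-free reception, a hypothetical 3-tap channel, λ = 0.99 and P(0) = δ⁻¹I with δ = 10⁻³. The 41-symbol training length mirrors the figure quoted later in the paper.

```python
import random

def rls_update(w, P, u, d, lam):
    """One RLS step for y = w^H u; O(N^2) work because P is N x N."""
    n = len(w)
    Pu = [sum(P[i][j] * u[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(u[i].conjugate() * Pu[i] for i in range(n))
    k = [x / denom for x in Pu]                                  # gain vector
    xi = d - sum(w[i].conjugate() * u[i] for i in range(n))      # a priori error
    w = [w[i] + k[i] * xi.conjugate() for i in range(n)]
    uHP = [sum(u[i].conjugate() * P[i][j] for i in range(n)) for j in range(n)]
    P = [[(P[i][j] - k[i] * uHP[j]) / lam for j in range(n)] for i in range(n)]
    return w, P, xi

def estimate_channel(training, received, n_taps, lam=0.99, delta=1e-3):
    """Estimate channel taps h from known symbols s and received r = h * s."""
    w = [0j] * n_taps
    P = [[complex(1.0 / delta if i == j else 0.0) for j in range(n_taps)]
         for i in range(n_taps)]
    for n in range(len(training)):
        # tap-input vector [s(n), s(n-1), ...], zero before the sequence starts
        u = [training[n - k] if n - k >= 0 else 0j for k in range(n_taps)]
        w, P, _ = rls_update(w, P, u, received[n], lam)
    return [x.conjugate() for x in w]  # w^H u = sum h_k s(n-k)  =>  h = conj(w)

if __name__ == "__main__":
    rng = random.Random(1)
    qpsk = [complex(a, b) / 2 ** 0.5 for a in (1, -1) for b in (1, -1)]
    s = [rng.choice(qpsk) for _ in range(41)]
    h_true = [0.9 + 0.1j, 0.4 - 0.3j, 0.1j]  # hypothetical channel
    r = [sum(h_true[k] * s[n - k] for k in range(3) if n - k >= 0) for n in range(41)]
    print(estimate_channel(s, r, 3))
```

The small residual bias comes from the regularizing initialization P(0) = δ⁻¹I, which decays as more training symbols are processed.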

# Systolic Array Processor

A systolic array processor [2]-[6] is effective for high-speed processing of the RLS algorithm. With a systolic array RLS processor, the tap coefficients of an equalizer and the element weights of an adaptive array antenna can be updated in a very short time.

#### ■ Features

A systolic array processor consists of individual cells, each of which has local memory and is connected to its neighboring cells in a regular lattice. Its regularity and local interconnections make it well suited to VLSI implementation. It uses an orthogonal triangularization technique known in matrix algebra as QR decomposition for parallel pipelined processing. Because the structure of the systolic array processor is very simple, it is easy to partition the processor effectively. Furthermore, RLS signal processing on a systolic array processor is numerically stable under limited arithmetic precision.
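The key idea of QR-based RLS can be illustrated with a real-valued sketch (complex data work the same way with complex rotations). Each new data row is folded into an upper-triangular matrix R by Givens rotations, and the check at the end confirms that RᵀR equals AᵀA, the unnormalized sample covariance behind R<sub>xx</sub> in Eq. (1). The covariance matrix is therefore never formed or inverted explicitly, which is the source of the numerical robustness mentioned above. The function name and the data rows are hypothetical.

```python
import math

def qr_update(R, x, beta=1.0):
    """Fold one data row x into the upper-triangular factor R with Givens rotations.

    Each stored row is first scaled by beta (exponential forgetting); element x[i]
    is then rotated into R[i][i] so that x[i] becomes zero.
    """
    x = list(x)
    n = len(x)
    for i in range(n):
        for j in range(i, n):
            R[i][j] *= beta
        r_new = math.hypot(R[i][i], x[i])
        if r_new == 0.0:
            continue  # nothing to eliminate in this row
        c, s = R[i][i] / r_new, x[i] / r_new
        for j in range(i, n):
            R[i][j], x[j] = c * R[i][j] + s * x[j], -s * R[i][j] + c * x[j]
    return R

rows = [[1.0, 2.0, 0.5], [0.3, -1.0, 2.0], [2.0, 0.0, 1.0], [-1.0, 1.0, 1.0]]
R = [[0.0] * 3 for _ in range(3)]
for row in rows:
    qr_update(R, row)

# R^T R reproduces A^T A without forming or inverting the covariance matrix.
RtR = [[sum(R[k][i] * R[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
print(all(math.isclose(a, b, abs_tol=1e-12)
          for ra, rb in zip(RtR, AtA) for a, b in zip(ra, rb)))
```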

#### ■ Configuration

Figure 4 shows a block diagram of the systolic array processor for three parameters to be estimated, where $\beta^2$ is the forgetting factor of the RLS algorithm. Three types of processing cells are used in this architecture. The circles and squares represent the Boundary and Internal cells, respectively, and the final cell is a simple two-input multiplier. The dots along the diagonal of the array represent storage elements. The entire array is controlled by a single clock: on each clock cycle, every cell receives data from its neighboring cells and performs a specific operation on it. The resulting data are stored within the cell and passed on to neighboring cells on the next clock cycle [3].
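The cell behaviour can be modeled in software. The following is a minimal sequential sketch, not the board's design. It rests on several assumptions: it uses the standard complex Givens-rotation cell equations from the QRD-RLS literature; it processes each snapshot row by row instead of in the skewed clock-by-clock wavefront of the hardware, which gives the same arithmetic; in this model the final cell multiplies the product of Boundary-cell cosines by the reference-column output to give the a posteriori residual; and it extracts the weights by back substitution, which the text does not describe. The class and method names are hypothetical. Stored values are scaled by β each step, so squared errors are forgotten by β².

```python
import math
import random

class QRDRLSArray:
    """Cell-level model of a triangular QRD-RLS array (Boundary, Internal, final cells)."""

    def __init__(self, n, beta):
        self.n, self.beta = n, beta
        self.r = [0.0] * n                     # Boundary cells: real diagonal of R
        self.R = [[0j] * n for _ in range(n)]  # Internal cells: R[i][j] for j > i
        self.z = [0j] * n                      # reference-column Internal cells

    def boundary_cell(self, i, x_in, gamma_in):
        r_new = math.sqrt((self.beta * self.r[i]) ** 2 + abs(x_in) ** 2)
        if r_new == 0.0:
            c, s = 1.0, 0j                     # nothing to rotate yet
        else:
            c, s = self.beta * self.r[i] / r_new, x_in / r_new
        self.r[i] = r_new
        return c, s, c * gamma_in              # gamma accumulates the cosines

    def internal_cell(self, stored, x_in, c, s):
        x_out = c * x_in - s * self.beta * stored
        stored_new = c * self.beta * stored + s.conjugate() * x_in
        return x_out, stored_new

    def update(self, u, d):
        """Feed one snapshot (u, d); return the final-cell output (a posteriori residual)."""
        x = [complex(v) for v in u]
        y, gamma = complex(d), 1.0
        for i in range(self.n):
            c, s, gamma = self.boundary_cell(i, x[i], gamma)
            for j in range(i + 1, self.n):
                x[j], self.R[i][j] = self.internal_cell(self.R[i][j], x[j], c, s)
            y, self.z[i] = self.internal_cell(self.z[i], y, c, s)
        return gamma * y                       # final cell: two-input multiplier

    def weights(self):
        """Back substitution R v = z, then w = conj(v) so that d is fitted by w^H u."""
        v = [0j] * self.n
        for i in reversed(range(self.n)):
            acc = self.z[i] - sum(self.R[i][j] * v[j] for j in range(i + 1, self.n))
            v[i] = acc / self.r[i]
        return [x.conjugate() for x in v]

if __name__ == "__main__":
    rng = random.Random(7)
    w_true = [0.5 + 0.2j, -0.3j, 0.8 + 0j]     # hypothetical parameters
    array = QRDRLSArray(3, beta=0.99)          # forgetting factor beta**2
    for _ in range(20):
        u = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(3)]
        d = sum(w.conjugate() * x for w, x in zip(w_true, u))
        residual = array.update(u, d)
    print(abs(residual), array.weights())
```

With noise-free data the residual drops to rounding level once three independent snapshots have been seen, and the back-substituted weights match the generating values.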

# Hardware Implementation of Systolic Array Processor

#### ■ Architecture

Systolic array RLS algorithms are generally known to be relatively insensitive to the limited arithmetic precision of fixed-point signal processing. This motivated us to choose a fixed-point signal processing architecture, because it allows faster processing than floating-point signal processing. Figure 5 shows a photograph of the prototype systolic array processor board, which can estimate up to 10 parameters. The board comprises 19 application-specific integrated circuit (ASIC) chips, each with approximately one million gates.
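A 32-bit fixed-point datapath represents each value as a signed integer with an implied binary point. The split between integer and fractional bits used on the board is not stated, so the sketch below assumes a hypothetical 24 fractional bits (range of about ±128, resolution 2⁻²⁴). It shows quantization with round-to-nearest and saturation, and a fixed-point multiply that rescales the 64-bit product.

```python
WORD_BITS = 32
FRAC_BITS = 24  # hypothetical split; the board's actual format is not stated

MAX_INT = (1 << (WORD_BITS - 1)) - 1
MIN_INT = -(1 << (WORD_BITS - 1))

def to_fixed(x):
    """Round to nearest and saturate to a signed 32-bit fixed-point integer."""
    q = round(x * (1 << FRAC_BITS))
    return max(MIN_INT, min(MAX_INT, q))

def to_float(q):
    return q / (1 << FRAC_BITS)

def fixed_mul(a, b):
    """Fixed-point product: rescale by 2**-FRAC_BITS with rounding, then saturate."""
    prod = a * b
    q = (prod + (1 << (FRAC_BITS - 1))) >> FRAC_BITS
    return max(MIN_INT, min(MAX_INT, q))

print(f"quantization error of 0.1: {abs(to_float(to_fixed(0.1)) - 0.1):.3e}")
print(f"0.5 * 0.25 = {to_float(fixed_mul(to_fixed(0.5), to_fixed(0.25)))}")
```

Saturation, rather than wrap-around, keeps an overflow from flipping the sign of a stored value.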

#### ■ Processing Speed

The clock speed at which the entire board is synchronized and operates properly is 18.842 MHz. At this clock speed, one processing cycle of an Internal cell takes about 500 ns, and that of a Boundary cell about 80 ns. Using 32-bit fixed-point signal processing, the board takes approximately 35 $\mu$s to estimate 10 parameters from 41 known symbols. Compared with RLS processing on a DSP (digital signal processor) [7] under the same conditions except for the symbol rate, the systolic array processor board is about 100 times faster.
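A quick sanity check converts the quoted (approximate) figures into clock periods. This is simple arithmetic on the numbers above and says nothing about how the cycles are scheduled inside the hardware.

```python
CLOCK_HZ = 18.842e6       # board clock
INTERNAL_CELL_S = 500e-9  # one Internal-cell processing cycle (approx.)
BOUNDARY_CELL_S = 80e-9   # one Boundary-cell processing cycle (approx.)
TOTAL_S = 35e-6           # time to estimate 10 parameters (approx.)
N_SYMBOLS = 41            # known symbols used

clock_period_ns = 1e9 / CLOCK_HZ
internal_clocks = INTERNAL_CELL_S * CLOCK_HZ
boundary_clocks = BOUNDARY_CELL_S * CLOCK_HZ
total_clocks = TOTAL_S * CLOCK_HZ
us_per_symbol = TOTAL_S / N_SYMBOLS * 1e6

print(f"clock period        : {clock_period_ns:.2f} ns")
print(f"Internal cell cycle : {internal_clocks:.1f} clock periods")
print(f"Boundary cell cycle : {boundary_clocks:.1f} clock periods")
print(f"whole estimate      : {total_clocks:.0f} clock periods")
print(f"per known symbol    : {us_per_symbol:.2f} us")
```

The whole 10-parameter estimate corresponds to roughly 660 clock periods, or under 1 µs per known symbol.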

#### ■ Antenna Pattern and Bit Error Rate

An MMSE adaptive array antenna experiment was then conducted using the developed board and a complex baseband fading/array response simulator [7]. A 12 Msymbol/s quaternary phase-shift keying (QPSK) signal was transmitted over the simulator. Three signal components arrived at the receiver: one desired signal, with the other two taken as interference. The direction of arrival (DOA) of the desired signal was set at 20°, and those of the two interference components at $40^{\circ}$ and $60^{\circ}$, respectively. It was assumed that an N-element (N = 2, 4, 8) linear array antenna with a minimum element spacing of half a wavelength was used, and that no fading was present.

Figure 6 Obtained Antenna Pattern (Experimental Results)

Figure 6 shows an example of the beam pattern obtained without noise when the number of elements is 8. The gain toward the interference signals is about 15 dB or more below the gain toward the desired signal, so the effect of the interference can be significantly reduced.
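The nulling seen in Figure 6 can be reproduced qualitatively with the closed-form MMSE solution of Eq. (1) for the stated geometry. Assumptions that are not part of the experiment: DOAs measured from array broadside, unit-power mutually uncorrelated sources (equal strengths, as in the experiment), the reference d equal to the desired signal, and a small noise variance of 10⁻³ added so that R<sub>xx</sub> is invertible even though Figure 6 was measured without noise. The computed numbers will therefore not match the measured pattern; the sketch only shows the mechanism.

```python
import cmath
import math

def steering(n, theta_deg):
    """ULA response, half-wavelength spacing, theta from broadside (assumed)."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * phase * k) for k in range(n)]

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small complex system A x = b."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def mmse_weights(n, desired_deg, interferer_degs, noise_var=1e-3):
    """w = Rxx^{-1} p for unit-power, uncorrelated sources plus white noise."""
    a_d = steering(n, desired_deg)
    sources = [a_d] + [steering(n, t) for t in interferer_degs]
    Rxx = [[sum(a[i] * a[j].conjugate() for a in sources) + (noise_var if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    p = a_d  # E[x d*] when the reference d is the desired signal
    return solve(Rxx, p)

def gain_db(w, theta_deg):
    """Power gain |w^H a(theta)|^2 in dB."""
    a = steering(len(w), theta_deg)
    g = abs(sum(wk.conjugate() * ak for wk, ak in zip(w, a)))
    return 20 * math.log10(max(g, 1e-15))

for n in (2, 4, 8):
    w = mmse_weights(n, 20.0, [40.0, 60.0])
    ref = gain_db(w, 20.0)
    print(n, [round(gain_db(w, t) - ref, 1) for t in (40.0, 60.0)])
```

Printing the relative gains for N = 2, 4 and 8 shows how suppression of the two interferers improves once the array has enough elements.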

Figure 7 shows the measured bit error rate (BER) performance with the number N of antenna elements as a parameter, under the same conditions as in Figure 6, where E<sub>b</sub> denotes the signal energy per bit and N<sub>0</sub> the noise power spectral density. For comparison, the BER obtained using a DSP under the same conditions except for the symbol rate [7], and the theoretical BER for a 1-path static channel model, are also plotted in Figure 7.
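For orientation, the theoretical curve can be computed assuming that the "1-path static channel" reference corresponds to coherently detected, Gray-coded QPSK in additive white Gaussian noise, whose bit error rate is ½·erfc(√(E<sub>b</sub>/N<sub>0</sub>)). That correspondence is our assumption; the text does not give the formula used.

```python
import math

def qpsk_ber_awgn(ebn0_db):
    """BER of coherent Gray-coded QPSK on a static AWGN channel: 0.5*erfc(sqrt(Eb/N0))."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))

for snr_db in range(0, 11, 2):
    print(f"Eb/N0 = {snr_db:2d} dB: BER = {qpsk_ber_awgn(snr_db):.2e}")
```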

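The BER results obtained with the systolic array processor board agree well with those obtained using the DSP. Since the two interference sources in Figure 7 have the same strength as the desired signal, they can be suppressed if $N \ge 3$, because an N-element array has $N-1$ degrees of freedom with which to place nulls while maintaining its gain toward the desired signal. This can be observed in Figure 7.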

# Conclusion

This paper has outlined a systolic array RLS processor ASIC developed primarily for broadband mobile communication applications. The developed chips perform 32-bit fixed-point signal processing, and the prototype board can estimate up to 10 parameters in approximately 35 $\mu$s, about one hundredth of the processing time required with a DSP. The interference cancelling performance of the prototype board was demonstrated through an MMSE adaptive array antenna experiment, and the experimental results agree well with those of DSP-based experiments.

#### References

- [1] K. Fukawa: "A Cascading Connection of Adaptive Array and MLSE Detector and its Performance", Technical Report of IEICE, AP97-146, 1997.
- [2] S. Haykin: Adaptive Filter Theory, Prentice-Hall, 1996.
- [3] S. Haykin, J. Litva and T. J. Shepherd: Radar Array Processing, Springer-Verlag, 1993.
- [4] Raymond J. Lackey, Herbert F. Baurle and John Barile: "Application-Specific Super Computer", Proc. SPIE, Real Time Signal Processing XI, Vol. 977, pp. 187-195, 1988.
- [5] H. Leung and S. Haykin: "Stability of Recursive QRD-LS Algorithms Using Finite-Precision Systolic Array Implementation", IEEE Trans. ASSP, Vol. 37, No. 5, pp. 760-763, 1989.
- [6] Christopher R. Ward, Philip J. Hargrave and John G. McWhirter: "A Novel Algorithm and Architecture for Adaptive Digital Beamforming", IEEE Trans. AP, Vol. 34, No. 3, pp. 338-346, 1986.
- [7] S. Tsukamoto, T. Saso, T. Sakaki, H. Yosihino and T. Matsumoto: "A Complex Baseband Fading Array Response Simulator", Technical Report of IEICE, RCS98-206, 1999.