Lemberg, Jerry; Martelius, Mikko; Roverato, Enrico; Antonov, Yury; Nieminen, Tero; Stadius, Kari; Anttila, Lauri; Valkama, Mikko; Kosunen, Marko; Ryynänen, Jussi

A 1.5-1.9-GHz all-digital tri-phasing transmitter with an integrated multilevel class-D power amplifier achieving 100-MHz RF bandwidth

Published in:
IEEE Journal of Solid-State Circuits

DOI:
10.1109/JSSC.2019.2902753

Published: 01/06/2019

Please cite the original version:
A 1.5–1.9-GHz All-Digital Tri-Phasing Transmitter With an Integrated Multilevel Class-D Power Amplifier Achieving 100-MHz RF Bandwidth

Jerry Lemberg, Mikko Martelius, Student Member, IEEE, Enrico Roverato, Yury Antonov, Student Member, IEEE, Tero Nieminen, Kari Stadius, Member, IEEE, Lauri Anttila, Member, IEEE, Mikko Valkama, Senior Member, IEEE, Marko Kosunen, Member, IEEE, and Jussi Ryynänen, Senior Member, IEEE

Abstract—We present a prototype RF transmitter with an integrated multilevel class-D power amplifier (PA), implemented in 28-nm CMOS. The transmitter utilizes tri-phasing modulation, which combines three constant-envelope phase-modulated signals with coarse amplitude modulation in the PA. This new architecture achieves the back-off efficiency of multilevel outphasing, without linearity-degrading discontinuities in the RF output waveform. Because all signal processing is performed in the time domain up to the PA, the entire system is implemented with digital circuits and structures, thus also enabling the use of synthesis and place-and-route CAD tools for the RF front end. The effectiveness of the digital tri-phasing concept is supported by extensive measurement results. Improved wideband performance is validated through the transmission of orthogonal frequency-division multiplexing (OFDM) bandwidths up to 100 MHz. Enhanced reconfigurability is demonstrated with non-contiguous carrier aggregation and digital carrier generation between 1.5 and 1.9 GHz without a frequency synthesizer. For a 20-MHz 256-QAM OFDM signal at 3.5% error vector magnitude (EVM), the transmitter achieves 22.6-dBm output power and 14.6% PA efficiency. Thanks to the high linearity enabled by tri-phasing, no digital predistortion is needed for the PA.

Index Terms—All-digital transmitter, class-D power amplifier (PA), digital interpolating phase modulator (DIPM), multilevel outphasing, tri-phasing.

I. INTRODUCTION

The continuous development of wireless communications poses formidable requirements on the design of radio transmitters and their power amplifiers (PAs). Generating RF signals with high peak-to-average power ratio (PAPR) makes it challenging for the transmitter to simultaneously achieve excellent linearity and high power efficiency. In addition, the increasing need for reconfigurability and compatibility with nanoscale CMOS technologies favors digital-intensive transmitter architectures, where automated design methods such as synthesis and place-and-route are also desirable for the RF front end [1]. One architecture with good potential to meet all these requirements is outphasing [2], which utilizes two constant-envelope phase-modulated signal components to produce both amplitude and phase modulations in the RF carrier. However, although outphasing transmitters enable the use of efficient switch-mode PAs, their efficiency quickly declines in power back-off.

Previously, multilevel outphasing was proposed as a solution to the aforementioned problem [3], [4]. As shown in Fig. 1(a), multilevel outphasing introduces discrete amplitude levels to dynamically follow the envelope of the transmitted waveform, thus leading to enhanced PA efficiency. However, only a handful of CMOS implementations of multilevel outphasing PAs have been published [5]–[9]. Among these, the works in [5] and [6] demonstrated the efficiency improvement enabled by multilevel operation for high-PAPR signals, but they also reported distortion caused by amplitude-level transitions. The remaining works [7]–[9] achieved similar efficiency gains in continuous-wave (CW) measurements but did not disclose any time- or frequency-domain results with modulated signals. Moreover, to the best of the authors’ knowledge, fully integrated realizations of multilevel outphasing transmitters (including RF front end and PA) have not been published.

In our recent paper [10], we carried out a comprehensive theoretical analysis of the multilevel outphasing architecture, and concluded that amplitude-level transitions cause inherent linearity-degrading discontinuities in the RF output waveform. In order to eliminate this degradation without compromising efficiency, we developed the new transmitter architecture,
to the envelope of the transmitted waveform, whereas fine amplitude modulation within each discrete level is provided by varying the outphasing angle between the phase-modulated signals $S_1$ and $S_2$. As we explained in detail in [10], this architecture has two fundamental problems. The first is that amplitude-level transitions cause narrow pulses in at least one of the signal components $A \cdot S_1$ and $A \cdot S_2$, as both the coarse PA amplitude $A$ and the outphasing angle change instantly. These narrow pulses cannot be avoided and lead to distortion since they cannot be reproduced by the PA. The second problem concerns the overall shape of the combined RF waveform $V$ around amplitude-level transitions. As shown in Fig. 1(a), it is evident that the harmonic content of $V$ is significantly different before and after $A$ increases. This discontinuity generates a wideband spectral impurity that also affects the main signal band where it cannot be filtered.

Tri-phasing solves both challenges without compromising efficiency. The basic concept is illustrated in Fig. 1(b). Similar to those in multilevel outphasing, the components $S_0$, $S_1$, and $S_2$ are generated with phase modulators, but in this case, only $S_0$ is further modulated by $A$ in the PA. The key idea behind tri-phasing is to match the coarse amplitude step of $A \cdot S_0$ with the full amplitude range of the outphasing pair $S_1 + S_2$. Hence, during an increasing (decreasing) amplitude-level transition, the jump in $A \cdot S_0$ caused by increasing (decreasing) $A$ by one level is perfectly compensated by a $0^\circ$-to-$180^\circ$ ($180^\circ$-to-$0^\circ$) change in the angle between $S_1$ and $S_2$. This approach makes the transition invisible in the combined RF output $V$, thus eliminating harmonic discontinuities. Furthermore, tri-phasing can also avoid the generation of narrow pulses, provided that each coarse amplitude transition is synchronized with the nearest rising or falling edge of $S_0$. Such a requirement is realized by utilizing digital interpolating phase modulators (DIPMs) [2], which enable precise control of the individual toggling instants of $S_0$, $S_1$, and $S_2$. Because the number of active PA units at any given output amplitude is equal between the two architectures shown in Fig. 1, tri-phasing maintains the back-off efficiency improvement of multilevel outphasing.

Mathematically, for the tri-phasing transmitter with four discrete amplitude levels considered in this paper, the signal composition is defined by

$$V(t) = A(t) \cdot S_0(t) + S_1(t) + S_2(t) \quad (1)$$

$$S_0(t) = \cos(\omega_c t + \phi(t)) \quad (2)$$

$$S_1(t) = \cos(\omega_c t + \phi(t) + \theta(t)) \quad (3)$$

$$S_2(t) = \cos(\omega_c t + \phi(t) - \theta(t)) \quad (4)$$

where $\omega_c$ is the angular carrier frequency, $\phi(t)$ is the baseband signal phase, and $\theta(t)$ is the outphasing angle. Substituting (2)–(4) into (1) yields

$$V(t) = [A(t) + 2 \cos(\theta(t))] \cdot \cos(\omega_c t + \phi(t)). \quad (5)$$

Ideally, the signal-component separator (SCS) must choose $A(t) \in \{0, 2, 4, 6\}$ and $\theta(t) \in [0^\circ, 90^\circ]$ such that the envelope $r(t) = A(t) + 2\cos(\theta(t))$ of (5) matches the baseband signal magnitude. Obviously, in the fabricated circuit, the actual discrete amplitude levels will not be exactly proportional to $\{0, 2, 4, 6\}$. This is taken into account by the SCS, without modifying the basic dependence on $\cos(\theta(t))$ in (5). Note that the complexity of the tri-phasing SCS is equivalent to that of multilevel outphasing [14], and thus the only hardware overhead of the new architecture is the additional phase modulator required to generate $S_0$.

**Fig. 1.** Operating principle of digital-intensive transmitters based on (a) multilevel outphasing modulation and (b) tri-phasing modulation.
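The decomposition in (1)–(5) can be illustrated with a minimal Python sketch. It assumes the ideal levels $\{0, 2, 4, 6\}$ and a magnitude normalized to $[0, 8]$; the function names `separate` and `compose` are hypothetical and are not taken from the SCS described in [10] or [14].

```python
import math

LEVELS = (0, 2, 4, 6)  # ideal coarse amplitude levels from the text


def separate(r):
    """Ideal signal-component separation: return (A, theta) for magnitude r.

    Assumes 0 <= r <= 8 and ideal levels; theta is returned in radians.
    """
    if not 0.0 <= r <= 8.0:
        raise ValueError("magnitude must lie in [0, 8]")
    # Largest level not exceeding r; the outphasing pair supplies the rest.
    A = max(a for a in LEVELS if a <= r)
    residual = min(r - A, 2.0)            # equals 2*cos(theta)
    theta = math.acos(residual / 2.0)
    return A, theta


def compose(A, theta, carrier_phase):
    """Evaluate (1)-(4) at one instant, with carrier_phase = w_c*t + phi(t)."""
    s0 = math.cos(carrier_phase)
    s1 = math.cos(carrier_phase + theta)
    s2 = math.cos(carrier_phase - theta)
    return A * s0 + s1 + s2


A, theta = separate(3.0)
print(A, round(math.degrees(theta), 1))   # 2 60.0
```

In this sketch, a magnitude just below 2 gives $A = 0$ with $\theta \approx 0^\circ$, while a magnitude of exactly 2 gives $A = 2$ with $\theta = 90^\circ$, so the envelope in (5) stays continuous across the level change.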

### III. Transmitter Front End

The top-level block diagram of the implemented transmitter is depicted in Fig. 2. At the core of the circuit, there are three phase modulators based on the DIPM concept [2], which are controlled by an on-chip DSP unit. A 6.8-GHz local oscillator (LO) input is divided into the 1.7-GHz system clock. This signal is delayed by the phase generator into 16 equally spaced phases, which are used for clocking the modulators. The generated polar component \(S_0\) drives three PA pairs of the integrated multilevel class-D PA, which can be individually turned on and off by the 3-bit amplitude signal \(A\). The outphasing components \(S_1\) and \(S_2\) drive the fourth always-active PA pair. The transmitter features the possibility to choose which of the PA pairs is operated as the outphasing pair and the order in which the remaining PA pairs are enabled. Thus, linearity near zero-amplitude can be optimized by selecting the outphasing PA pair with the least mismatch, indicated by the smallest minimum output power. Alternatively, the outphasing PA pair with the highest maximum output power could be selected to eliminate possible gaps in achievable output power levels. Due to the voltage-subtracting nature of the off-chip power combiner, one input to each PA pair is inverted. Adjustable buffers are inserted between phase modulators and PA to equalize the routing delays.

The discrete-time inputs of the tri-phasing DSP are the PA amplitude \(A[n]\), the outphasing angle \(\theta[n]\), and the upconverted polar phase \(\Phi[n]\) that is defined as

\[
\Phi[n] = 2\pi f_c n T_s + \phi(n T_s) \tag{6}
\]

where \(f_c = \omega_c / 2\pi\) is the carrier frequency, and \(T_s\) is the sample period. These signals are computed offline and loaded into a 64k-word static RAM (SRAM), from where they are streamed at a sample rate \(F_s = 1/T_s\) equal to the 1.7-GHz system clock. Note that \(f_c\) is a variable in (6), since the DIPM supports the generation of carrier frequencies such that \(f_c \neq F_s\).

In the rest of this section, the main circuit blocks of the transmitter front end are described in detail, whereas the multilevel class-D PA is discussed in Section IV.

#### A. Phase Modulators

In recent years, digital-intensive delay-based architectures have become a popular approach to implement RF phase modulators [1], [11], [15]–[19]. In this paper, the three phase modulators are based on the DIPM concept, which was originally published in [2] and successfully demonstrated in our previous transmitter prototype [12], [20]. The key innovation behind the DIPM is to use a digitally controlled delay to determine the toggling instants of the RF output waveform, rather than directly using the delayed signal as the output. The time-domain locations of the toggling instants, corresponding to the points where the phase function crosses any integer multiple of 180°, are calculated by means of linear interpolation. The main benefits of this method stem from the resulting glitch-free RF waveform, as well as the inherent sinc² attenuation of digital sampling images. These features enable wider RF bandwidth and better spectral properties compared to typical sample-and-hold digital phase modulators.
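The interpolation step can be sketched in a few lines of Python. This is only an illustration of locating the crossings of integer multiples of 180° between two phase samples; it assumes a non-decreasing phase in radians, and `toggling_instants` is a hypothetical name rather than the on-chip implementation.

```python
import math


def toggling_instants(phase_prev, phase_cur):
    """Return [(fraction, m), ...] for crossings of m*pi within one sample.

    fraction is the crossing position as a fraction of the sample period,
    found by linear interpolation between the two phase samples.
    """
    if phase_cur < phase_prev:
        raise ValueError("sketch assumes a non-decreasing phase")
    first = math.floor(phase_prev / math.pi) + 1   # first multiple above phase_prev
    last = math.floor(phase_cur / math.pi)         # last multiple up to phase_cur
    span = phase_cur - phase_prev
    return [((m * math.pi - phase_prev) / span, m) for m in range(first, last + 1)]


# CW example with the carrier equal to the sample rate: the phase advances
# by 2*pi per sample, so two toggling instants fall in each sample period.
for fraction, m in toggling_instants(0.5 * math.pi, 2.5 * math.pi):
    print(round(fraction, 3), m)
```

Multiplying each fraction by 1024 and truncating would give a 10-bit position within the sample period, matching the resolution quoted for the DIPM.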

1) DIPM Implementation: Fig. 3(a) shows the block diagram and operating principle of the DIPM. The four digital-to-time converters (DTCs) produce time-controlled pulses that determine the toggling instants of the pseudo-differential RF outputs \(S^+(t)\) and \(S^-(t)\). Each DTC covers 25% of \(T_s\), thus enabling up to four output transitions within a single period of the 1.7-GHz system clock. A set/reset (SR) latch is employed in the output stage, instead of the T flip-flop used in our previous work [2], [12], [20]. The SR latch enables the digital control of the direction (rising/falling) of the output transitions, in addition to their location. This technique ensures predictable high/low state of \(S^+(t)\) and \(S^-(t)\) across all three phase modulators, thus avoiding the possibility of a deleterious 180° phase shift in any of \( S_0, S_1, \) or \( S_2 \).

The schematic of the \( i \)th DTC is detailed in Fig. 3(b). The overall target resolution of the DIPM is 10 bits. Therefore, for the \( n \)th sample period, the \( i \)th DTC must be capable of producing a pulse with 8-bit delay resolution during the time interval

\[
(n - 1 + \frac{i}{4}) T_s \leq t < (n - 1 + \frac{i + 1}{4}) T_s. \tag{7}
\]

Because of the large delay tuning range (147 ps with \( \approx 0.6 \) ps steps for each DTC), a segmented approach is adopted. A multiplexer controlled by the 2 MSBs of the delay control word \( k_i[n] \) initially selects one among four coarse phases of the 1.7-GHz system clock. The chosen phase is further delayed through a varactor-based digitally controlled delay line (DCDL) according to the 6 LSBs of \( k_i[n] \). A logical AND with the enable signal \( e_i[n] \) determines whether an output pulse is generated or not in (7). A pulse generator triggered by rising edges decreases the duty cycle of the DCDL output, thus avoiding overlap between the inputs of the combining OR gates shown in Fig. 3(a). Finally, a demultiplexer controlled by \( s_i[n] \) selects whether the produced pulse is sent to \( r_i(t) \) or \( f_i(t) \), thus triggering a rising or falling edge in the RF output, respectively. The DTC control inputs \( k_i[n], e_i[n], \) and \( s_i[n] \) are generated by the DSP circuit discussed in Section III-C.
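The segmentation and the quoted delay figures can be checked with a short Python sketch. It assumes, for illustration, that the two MSBs of a 10-bit toggling position select the DTC (each DTC covering one quarter of \(T_s\)) and that the remaining 8 bits form \(k_i[n]\); the function name `split_toggle_code` is hypothetical.

```python
F_S = 1.7e9          # system clock / sample rate from the text, in Hz
T_S = 1.0 / F_S      # sample period, in seconds


def split_toggle_code(code10):
    """Split a 10-bit toggling position into (dtc, coarse_phase, fine)."""
    if not 0 <= code10 < 1024:
        raise ValueError("expected a 10-bit code")
    dtc = code10 >> 8        # assumed: 2 MSBs pick the quarter of T_s
    k = code10 & 0xFF        # 8-bit delay control word k_i[n]
    coarse = k >> 6          # 2 MSBs of k: one of four coarse clock phases
    fine = k & 0x3F          # 6 LSBs of k: DCDL setting
    return dtc, coarse, fine


dtc_range_ps = T_S / 4 * 1e12      # delay range covered by one DTC
step_ps = dtc_range_ps / 256       # 8-bit step within that range

print(split_toggle_code(0b1001000011))                # (2, 1, 3)
print(round(dtc_range_ps, 1), round(step_ps, 2))      # 147.1 0.57
```

The computed range and step agree with the 147 ps and ≈0.6 ps values given above.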

2) Amplitude Synchronization: As explained in Section II, each coarse amplitude transition in the PA must be synchronized with the nearest rising or falling edge of \( S_0 \) in order to avoid the generation of narrow pulses. This task is performed as shown in Fig. 4. The synchronized 3-bit amplitude \( A(t) \) is obtained by resampling the discrete-time SCS output \( A[n] \) to one of the polar DTC pulses \( u_0(t), \ldots, u_3(t) \) [Fig. 3(b)]. The 2-bit signal \( Q[n] \), computed by the tri-phasing DSP, selects which of the four pulses is used for resampling. The challenge with this approach is that the union of \( u_0(t), \ldots, u_3(t) \) spans the whole sample period, thus leading to setup/hold timing violations when the resampling edge is very close to the rising edge of the system clock. To avoid this problem, \( A[n] \) is delayed by \( T_s/2 \) prior to resampling with \( u_2(t) \) or \( u_3(t) \), as indicated by the MSB of \( Q[n] \). This arrangement ensures sufficient timing margin against setup/hold violations for any resampling edge location within the sample period.
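The selection logic can be pictured with a short Python sketch. It assumes, purely for illustration, that \( Q[n] \) is the index of the quarter of the sample period containing the \( S_0 \) edge used for resampling; the text does not specify how the DSP derives \( Q[n] \), and `select_resampling_pulse` is a hypothetical name.

```python
def select_resampling_pulse(edge_fraction):
    """Return (q, delay_half_period) for an S0 edge at edge_fraction of T_s.

    Assumption: pulse u_q(t) comes from the DTC covering the quarter of the
    sample period that contains the edge (0 <= edge_fraction < 1).
    """
    if not 0.0 <= edge_fraction < 1.0:
        raise ValueError("edge_fraction must lie in [0, 1)")
    q = int(edge_fraction * 4)           # 2-bit pulse index, 0..3
    delay_half_period = bool(q >> 1)     # MSB of Q: delay A[n] by half a period
    return q, delay_half_period


print(select_resampling_pulse(0.10))     # (0, False)
print(select_resampling_pulse(0.60))     # (2, True)
```

Under this assumption, edges in the second half of the sample period always resample a half-period-delayed copy of \( A[n] \), which keeps the resampling edge away from the instant at which \( A[n] \) changes.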

3) Calibration: For a number of practical reasons (e.g., DCDL range and linearity, and routing delays), the rising/falling edge delays produced by the DIPM have a slight nonlinear dependence on the 10-bit delay control word [20]. This nonlinearity can be calibrated on startup with the lookup table (LUT) method illustrated in Fig. 5, which is similar to that used in [1] and [11]. First, the static transfer curve of the DIPM is obtained through an on-chip circuit that measures the delay between a chosen coarse phase and the modulator output by counting the pulses of an asynchronous clock signal during a given time. To calculate the delay, the number of pulses occurring within the measured delay is divided by the total number of pulses, similar to [21]. Second, a signed compensation signal \( c[n] \) is calculated off-chip such that, when added to the 10-bit input control word \( k'[n] \), it results in a compensated control word \( k[n] \) that causes the DIPM to produce the wanted delay \( u(t) \). An on-chip error LUT is populated with the computed values of \( c[n] \), and the entire process is repeated twice to independently calibrate rising and falling edge delays. During normal signal transmission, the correct compensation value is indexed by \( k'[n] \) as well as the rise/fall selector \( s[n] \). Storing the compensation signal \( c[n] \) instead of the fully compensated control word \( k[n] \) allows the LUT wordlength to be decreased from 10 to 7 bits, which reduces the required memory size by 30%.
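A minimal Python sketch of the off-chip LUT computation is given below. It assumes a nearest-delay search over a measured transfer curve, which is one plausible way to obtain \( c[n] \) and is not claimed to be the procedure used for the chip; the 64-code curve is synthetic, and `build_error_lut` is a hypothetical name.

```python
import math


def build_error_lut(measured, lsb):
    """Return c[k'] so that measured[k' + c[k']] is the delay nearest k' * lsb."""
    codes = range(len(measured))
    lut = []
    for k_in in codes:
        target = k_in * lsb
        k_best = min(codes, key=lambda k: abs(measured[k] - target))
        lut.append(k_best - k_in)       # signed compensation value
    return lut


# Synthetic 64-code transfer curve with a smooth 2-LSB nonlinearity
# (illustrative only, not measured data).
N, LSB = 64, 1.0
measured = [LSB * (k + 2.0 * math.sin(2 * math.pi * k / N)) for k in range(N)]
lut = build_error_lut(measured, LSB)

worst_before = max(abs(measured[k] - k * LSB) for k in range(N))
worst_after = max(abs(measured[k + lut[k]] - k * LSB) for k in range(N))
print(worst_before, worst_after)
```

Because only the small signed difference is stored, the table entries need fewer bits than the full control word, which is the motivation for the 7-bit wordlength mentioned above.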

For further delay fine-tuning, the adjustable buffers between the modulators and the PAs are used to compensate for delay mismatches between paths, as shown in Fig. 2. The optimal delays for the modulated signals \( S_i \) are discovered by minimizing the CW output power of the outphasing PA pair with a 90° outphasing angle and maximizing the output power of each polar PA pair. Delay offsets between PA pairs are then eliminated with similar measurements of two PA pairs at a time. The amplitude-signal \((A)\) delays are set by minimizing the noise in the spectrum of a modulated signal.

#### B. Phase Generator

The 16 clock phases used by the DIPMs are produced by a phase generator circuit, consisting of 16 identical all-digital programmable delay lines. Fig. 6 describes the implementation of this block. Each delay line is a cascade of 18 delay tuning elements, which contain only basic logic gates selected from the standard cell library. Delay tuning is performed by reconfiguring the clock path through a different combination of logic gates. The total tuning range of the delay lines is approximately 570 ps, while step resolution down to 1 ps is achieved by exploiting the propagation delay differences between buffers with different MOS gate lengths. On system startup, before calibrating the DIPMs, each delay line is tuned to match one of the 16 equally spaced phases of the 1.7-GHz system clock. This is achieved by iteratively measuring delays with the previously described on-chip circuit and adjusting the delay-line settings off-chip. Compared to using a delay-locked loop, the chosen phase generation approach results in lower clock jitter, since no closed-loop delay control is needed after initial calibration.
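As a quick consistency check of these numbers, the following Python sketch computes the target phase offsets relative to the first phase. It assumes that any fixed minimum delay is common to all 16 delay lines and can therefore be ignored; the variable names are illustrative.

```python
F_S = 1.7e9                      # system clock from the text, in Hz
T_S_PS = 1e12 / F_S              # clock period, about 588.2 ps
N_PHASES = 16
TUNING_RANGE_PS = 570.0          # approximate delay-line tuning range from the text

spacing_ps = T_S_PS / N_PHASES
targets_ps = [i * spacing_ps for i in range(N_PHASES)]

print(round(spacing_ps, 2))      # 36.76
print(round(targets_ps[-1], 1))  # 551.5
```

The largest target offset, about 551.5 ps, lies within the quoted tuning range of approximately 570 ps, and the 1-ps step resolution is a small fraction of the 36.76-ps phase spacing.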

#### C. Tri-Phasing DSP

The control inputs for the DIPMs (delay control words, edge enables, rise/fall selectors) are computed from \(A[n]\), \(\Phi[n]\), and \(\theta[n]\) by an on-chip DSP unit, which implements the linear interpolation equations for tri-phasing defined in [10]. These are more complex than the original DIPM equations [2], since they include the additional two-stage interpolation required during amplitude-level transitions. Therefore, the DSP unit does not implement the iterative algorithm of our previous transmitter prototype [22] but uses a custom architecture featuring six hardware dividers to follow more closely the procedure described in [10].

The two-stage interpolation process performed by the tri-phasing DSP is illustrated graphically by the example of Fig. 7. First, the upconverted polar phase is linearly interpolated between two consecutive samples \(\Phi[n-1]\) and \(\Phi[n]\). The rising (falling) edges of \(S_0\) correspond to the points where the interpolated phase crosses even (odd) multiples of 180°. The amplitude-level increment is synchronized to the crossing of 0° by the circuit described in Section III-A, whereas the outphasing angle simultaneously switches from 0° to 90°. Consequently, the phases of \(S_1\) and \(S_2\) are interpolated to the same 0° crossing of \(S_0\) before \(A(t)\) increases and from a ±90° offset after that. The results of the overall process are the toggling instants of \(S_0\), \(S_1\), and \(S_2\) with 10-bit resolution over the clock period, which directly determine the control inputs for the corresponding DIPMs.

### IV. Multilevel Class-D Power Amplifier

The integrated multilevel PA [13] consists of eight class-D units that are directly connected to output pads. This arrangement potentially enables the CMOS chip to be used as a driver for an external high-power GaN booster. Alternatively, the circuit can act as a standalone PA, and the chosen setup facilitates testing with off-chip power combiners of various topologies or with different center frequencies [23]. In this paper, all measurements are performed with the PA units connected to a coupled-line subtracting combiner implemented on PCB, with a center frequency of 1.8 GHz and 3-dB passband of approximately 400 MHz. Although the PA supports both voltage adding and subtracting combiners [24], the latter category has the benefit of substantially reducing voltage ripple at the on-chip supply and ground rails, thus leading to improved reliability of the entire system. In a finalized product, where versatile testing capabilities are not required, the power combiner could also be implemented with on-chip transformers [5], provided that the CMOS PA produces sufficient output power.

The circuit schematic of a single class-D unit is shown in Fig. 8(a). The output stage utilizes a cascode structure with 1.8-V thick-oxide transistors, enabling a supply voltage of 3.6 V. The 0.9- and 2.7-V bias voltages are generated by on-chip inverters with feedback resistors, not included in the schematic. Tri-phasing operation requires that the output of each PA unit be switched on and off during transmission, according to how many bits of \(A(t)\) are asserted. Because a class-D PA operates approximately as a voltage source and power combination is performed in voltage mode, the OFF state means producing a constant output voltage. For this purpose, the circuit includes ON/OFF logic developed from the concept presented in [25]. The operating principle of the ON/OFF logic is illustrated by the waveforms in Fig. 8(b). When \(A\) is high, the XOR gates generate identical output signals, which are reproduced by the NAND gates and propagated to the output. When \(A\) is low, the XOR gate outputs are inverse of each other, which leads to constantly high NAND gate output and low PA-unit output voltage. This solution enables generating the output-stage gate voltages in both ON and OFF states in a manner that allows constant bias voltages and quick switching between the states.

Fig. 8. (a) Simplified schematic and (b) operating principle of a single class-D PA unit with ON/OFF logic [13].

V. CHIP IMPLEMENTATION

The prototype transmitter was implemented in a 28-nm fully depleted silicon-on-insulator (FDSOI) CMOS process. In addition to the digital front end, the coarse phase generator and the three DIPMs were also designed with digital CAD tools, therefore leaving only the LO feed and RF output stages as full-custom analog designs. The 16 identical delay lines of the phase generator were described with a gate-level netlist using only standard cells. Some customizations of the place-and-route flow were enforced to better control the interconnection delay between tuning elements. Post-layout simulations were performed with analog tools.

The DIPMs were described using a combination of behavioral register-transfer level (RTL) and gate-level netlists, with the DCDL imported into the flow as a custom analog macro. Successful execution of the synthesis and place-and-route flows requires the correct definition of timing constraints, in part to ensure that the 16 coarse phases preserve their relative delays within the phase modulator. However, an even more critical aspect is that these signals also clock the modulator delay control data, which has tight timing margins of $T_s/4$ due to time-interleaved DTCs. Correct clocking enables up to four output transitions within one sample period, which ensures wideband modulation capability and correct operation when employing digital carrier generation. These implementation aspects have been discussed for the previous prototype in [20].

Final verification of operation and accuracy was done with back-annotated functional simulations, thus avoiding the need for a heavy, fully analog post-layout approach.

The micrograph of the fabricated chip is shown in Fig. 9(a). The total core area is 3.2 mm$^2$. Because this area is pad-limited, most of the empty space between circuit blocks is filled with bypass capacitors. For better protection against supply noise, the on-chip ground of the PA is separated from the ground of other circuits, and the grounds are only connected outside the chip. The large on-chip inductor is part of a wideband test output stage similar to that of [12], which is not used for this paper. Fig. 9(b) shows the test PCB with the implemented coupled-line power combiner. The CMOS chip (encapsulated by black epoxy) is wirebonded directly to the combiner input stripes. Therefore, the power-combiner losses are not de-embedded from the measurement results presented in this paper.

VI. MEASUREMENT RESULTS

Fig. 10 shows the measured rising edge delays at the output of each phase modulator, as a function of the 10-bit control word. After calibrating the 16 delay lines of the phase generator, the transfer curve exhibits visible discontinuities [Fig. 10(a)], because the DCDL characteristics do not match the slope and range required to cover exactly $T_s/16$. This residual nonlinearity is significantly reduced by applying the LUT correction method described in Section III-A. As shown in Fig. 10(b), most of the measured points are within 1 ps of the ideal transfer curve, which is consistent with the 10-bit resolution of the modulators. A few points with larger error are due to mismatches between the propagation delays of the 16 coarse phases to different modulators, creating gaps in the uncalibrated transfer curve that cannot be corrected with the LUT method. In a future implementation, this can be avoided by including individually calibrated delay-tuning elements for the coarse phases in each modulator, which enables eliminating the timing mismatches.

Fig. 11. CW measurement results at $f_c = 1.7$ GHz. (a) PA efficiency as a function of output power. (b) Output power as a function of outphasing angle.

CW measurement results are reported in Fig. 11. All PA-efficiency figures are based on total power consumption in 1.8- and 3.6-V voltage domains as shown in Fig. 8(a). The power consumption in the 1.0-V supply used by parts of the ON/OFF logic is dominated by the phase modulators and thus not included. The peak output power is 29.7 dBm at 1.77 GHz, with a PA efficiency of 34.7%. At 1.7 GHz, these numbers are slightly reduced to 29.4 dBm and 32.4% due to the power-combiner frequency response. The measured noise of $-129$ dBc/Hz is limited by the noise floor of the measurement equipment. In a PA simulation with ideal 1.7-GHz input signals, the noise at 1.75 GHz is $-142$ dBc/Hz. The PA efficiency is depicted as a function of output power in Fig. 11(a), demonstrating that tri-phasing, similar to multilevel outphasing, achieves up to $3.9 \times$ higher back-off efficiency than single-level outphasing. 
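The relation between the reported output power, PA efficiency, and the implied supply power can be made explicit with a small Python sketch. It assumes that PA efficiency is defined as RF output power divided by the power drawn from the 1.8- and 3.6-V domains, as stated above; the helper names are hypothetical.

```python
def dbm_to_watts(p_dbm):
    """Convert a power level in dBm to watts."""
    return 10 ** (p_dbm / 10) / 1000


def dc_power_watts(p_out_dbm, efficiency):
    """Supply power implied by an output power and a PA efficiency (0..1)."""
    return dbm_to_watts(p_out_dbm) / efficiency


print(round(dbm_to_watts(29.7), 3))            # 0.933
print(round(dc_power_watts(29.7, 0.347), 2))   # 2.69
```

For the peak CW point, 29.7 dBm corresponds to about 0.93 W of RF power, which at 34.7% efficiency implies roughly 2.7 W drawn from the two PA supply domains.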

Fig. 12 plots the time-domain RF output waveform for a 1-MHz baseband sinewave. The $y$-axis grid shown in Fig. 12 corresponds to the measured values of the coarse amplitude levels (with $\theta = 0^\circ$). Therefore, amplitude-level transitions occur each time the envelope of the RF waveform crosses one of the $y$-axis grid lines. The zoomed-in view of a single amplitude-level transition reveals a small discontinuity, which can be attributed to the sudden change in supply current when one PA pair is turned on or off. Nevertheless, this discontinuity is much smaller than those reported in [6], proving that the proposed tri-phasing approach is effective in making discrete amplitude-level transitions more linear.

Fig. 13(a) displays the output spectrum of a 20-MHz 64-QAM OFDM signal at a carrier frequency of 1.7 GHz. The measured adjacent-channel leakage ratio (ACLR) is 36.6 dBc. Modulated output power and PA efficiency are strong functions of the amount of PAPR reduction performed in the baseband, which, in turn, is limited by the error vector magnitude (EVM) constraint set by the radio standard. Table I reports the results with various subcarrier modulations, measured at the EVM limits specified for 5G base stations [26]. For 64-QAM, the maximum output power and PA efficiency are 23.7 dBm and 16.4%, respectively. As shown in Fig. 13(b), the EVM stays below the 8% limit through 25 dB of digital scaling performed in the I/Q domain, before the SCS. The minimum EVM is lower than 2.3% without any DPD.

The smoother amplitude-level transitions in tri-phasing lead to improved linearity, which enables the generation of modulated signals of wider bandwidth. Measurement results with 40- and 80-MHz bandwidths, formed by aggregating 20-MHz OFDM carriers, are presented in Fig. 14 and Table II. Furthermore, the output spectrum of a 100-MHz aggregated OFDM signal is shown in Fig. 15. The total output power is 19.0 dBm, and the EVM of each 20-MHz 64-QAM carrier component ranges from 3.7% to 4.3%. To the best of the authors’ knowledge, this is the widest reported RF bandwidth for a sub-6-GHz integrated CMOS transmitter at such high power levels.

Table III details the power consumption breakdown of the entire transmitter front end, measured during the transmission of the 100-MHz signal shown in Fig. 15. The most power-hungry block is the digital front end, which also includes the 64k-word data SRAM used to test the prototype. The power consumption of the SRAM alone is estimated to be above 50 mW from post-layout simulations.

One of the main motivations behind digital RF is the need for transceivers with increasing programmability. Two examples of flexibility enabled by the DIPM concept are presented in Fig. 16. First, the digital carrier generation is demonstrated in Fig. 16(a). This feature allows a 20-MHz OFDM carrier to be generated at any center frequency between 0.2 and 2.5 GHz, without changing the 1.7-GHz input clock of the transmitter [12]. In this prototype, the range is limited to 1.5–1.9 GHz only by the passband of the external power combiner. Second, Fig. 16(b) shows non-contiguous aggregation of three 20-MHz OFDM carriers. No noise floor degradation is observed in the gap centered at 1.71 GHz, proving that sampling images are sufficiently attenuated by the sinc² response of the DIPM.

Fig. 17(a) shows the wide-span spectra of the 100-MHz signal from Fig. 15, and the 20-MHz signal centered at 1.6 GHz from Fig. 16(a). Both spectra reveal a small peak at 450 MHz, caused by the frequency response of the combiner, and a visible second harmonic around 2\(f_c\), which arises from duty-cycle mismatches. The second harmonic could be suppressed, for example, by using the pulsewidth correction technique adopted in [1]. Moreover, the signal centered at 1.6 GHz also contains a number of smaller spurs. These spurs, including the ones seen at 1.8 and 1.9 GHz in Fig. 17(b), result from clock signal coupling and nonlinear amplification in the phase modulators. This causes intermodulation between the output signal at 1.6 GHz and the clock signal at 1.7 GHz, both of which also contain harmonic components. Nevertheless, even though no special care was taken for such issues in this prototype, all spurs are well below $-40$ dBc, except at 1.8 GHz. Furthermore, thanks to a combination of high sample rate and sinc$^2$ response of the DIPM, no sampling images are seen in the spectra, and the third harmonic of the carrier is attenuated by the power combiner to nearly below the noise floor.
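The spur locations follow from simple two-tone intermodulation arithmetic, sketched below in Python. This is a simplified frequency-bookkeeping model that only lists where products of the form $m f_{\text{out}} + n f_{\text{clk}}$ fall; it says nothing about spur levels, and `intermod_products` is a hypothetical helper.

```python
def intermod_products(f_out, f_clk, max_order, f_lo, f_hi):
    """Return sorted (f, m, n) with f = m*f_out + n*f_clk inside [f_lo, f_hi].

    Only true mixing products are listed (m and n both nonzero), with
    |m| + |n| <= max_order. Frequencies are integers in MHz.
    """
    found = []
    for m in range(-max_order, max_order + 1):
        for n in range(-max_order, max_order + 1):
            if m == 0 or n == 0 or abs(m) + abs(n) > max_order:
                continue
            f = m * f_out + n * f_clk
            if f_lo <= f <= f_hi:
                found.append((f, m, n))
    return sorted(found)


# Output signal at 1600 MHz, clock at 1700 MHz, products up to fifth order.
for f, m, n in intermod_products(1600, 1700, 5, 1400, 2000):
    print(f, m, n)
```

With these inputs the model places third-order products at 1.5 and 1.8 GHz and fifth-order products at 1.4 and 1.9 GHz, which is consistent with the spurs observed at 1.8 and 1.9 GHz.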

Part of the noise near the signal band of any modulated signal arises from the previously mentioned intermodulation between the output signal and the clock. The wideband signal can be considered a combination of several single-frequency components, each of which is modulated by the clock signal, and the resulting spurs accumulate as noise. To a lesser extent, a similar effect is also caused by the remaining static nonlinearity shown in Fig. 10(b), which creates unwanted phase modulation at any frequency except $F_s$. An additional major noise source is damped PA supply ringing after each amplitude-level transition, which is caused by the instantaneous change in current consumption and the LC network of bonding wires and supply capacitors [5]. In the spectrum, this appears as the bumps on both sides of the signal band, as shown in Fig. 17. It is noteworthy that all of these noise sources originate in implementation nonidealities, as tri-phasing eliminates the systematic glitches inherent in multilevel outphasing. As such, future development should focus on reducing these types of noise in order to reach the full potential of tri-phasing. Methods of achieving this could include improving the isolation between clock and output signals in the phase modulators and reducing PA supply ringing with flip-chip packaging or damping legs [6]. Test measurements and simulations illustrating these sources of noise are described in the Appendix.

In order to estimate the potential impact of tri-phasing after tackling the issues discussed above, we simulated the PA without any of the described nonidealities using a 20-MHz OFDM signal at 1.7-GHz carrier frequency, and compared the noise at 1.75 GHz. In the simulation with multilevel outphasing, the noise is 7.9 dB below the measured value of $-113$ dBc/Hz. Compared to the multilevel outphasing simulation, the noise in the tri-phasing simulation is improved by 9.8 dB.
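As a rough worked illustration of these figures, and assuming the decibel differences simply add, the implied noise levels at 1.75 GHz can be written out. The symbols $N_\text{meas}$, $N_\text{MLO}$, and $N_\text{TP}$ (measured, simulated multilevel outphasing, and simulated tri-phasing) are introduced here only for illustration and do not appear in the paper.

$$N_\text{meas} = -113\ \text{dBc/Hz}, \qquad N_\text{MLO} = N_\text{meas} - 7.9\ \text{dB} = -120.9\ \text{dBc/Hz}, \qquad N_\text{TP} = N_\text{MLO} - 9.8\ \text{dB} = -130.7\ \text{dBc/Hz}.$$

Under this reading, the idealized tri-phasing simulation sits $7.9 + 9.8 = 17.7$ dB below the measured value.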

Table IV shows the comparison of this paper to recent digital-intensive integrated CMOS transmitters that achieve at least 40-MHz RF bandwidth. Among the compared transmitters, our implementation features the largest peak and modulated output power, as well as the widest reported bandwidth of 100 MHz.

VII. Conclusion

This paper presented the implementation of an RF transmitter with an integrated multilevel class-D PA, fabricated in 28-nm CMOS. The transmitter is based on a new tri-phasing architecture, which achieves the back-off efficiency improvement of multilevel outphasing without linearity-degrading discontinuities in the output waveform. Thanks to the enhanced linearity, the transmitter delivers up to 100-MHz modulated signal bandwidth and supports up to 256-QAM subcarrier modulation in 20-/40-MHz OFDM signals without the need for DPD. The digital-intensive implementation leads to higher transmitter reconfigurability and compatibility with nanoscale CMOS while also enabling the use of synthesis and place-and-route design tools for the RF front end. Therefore, this paper successfully exploits the advantages of digital RF to improve the performance of multilevel time-based transmitter architectures, thus paving the way for their integration into future radio transceivers and system-on-chip solutions for modern wireless systems.

Appendix

Figs. 18 and 19 illustrate the nonidealities that were identified as sources of noise in Section VI. Fig. 18(a) depicts the measured spectrum of a CW signal at $(1024 - 16)/1024 \cdot F_c \approx 1.67$ GHz, showing spurs at several frequencies. To verify the origin of these spurs, Fig. 18(b) shows the result of a MATLAB model of intermodulation between output and clock signals occurring in the phase modulators, as described in Section VI. Comparison between these figures indicates that the intermodulation explains most of the measured spurs.
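The CW test frequency and the grid on which output/clock intermodulation products fall can be checked with a short sketch. It assumes $F_c = 1.7$ GHz (the clock frequency quoted in Section VI, which reproduces the stated 1.67 GHz) and considers only products $m f_\text{cw} + n F_c$ with $m + n = 1$, which land at $F_c - m\,\Delta$ with $\Delta = F_c - f_\text{cw}$. The function names are hypothetical, and this is not the MATLAB model used by the authors.

```python
from fractions import Fraction

F_C_HZ = Fraction(1_700_000_000)  # assumed: 1.7-GHz clock from Section VI


def cw_frequency(k, n=1024, f_c=F_C_HZ):
    """Exact test-tone frequency (n - k)/n * f_c in Hz."""
    return Fraction(n - k, n) * f_c


def spur_grid(k, max_index=3, n=1024, f_c=F_C_HZ):
    """Frequencies f_c + i*delta with delta = f_c - f_cw.

    i = -1 is the test tone itself and i = 0 is the clock.
    """
    delta = f_c - cw_frequency(k, n, f_c)
    return [f_c + i * delta for i in range(-max_index, max_index + 1)]


f_cw = cw_frequency(16)
delta = F_C_HZ - f_cw
print(f"f_cw  = {float(f_cw) / 1e9:.4f} GHz")
print(f"delta = {float(delta) / 1e6:.4f} MHz")
for f in spur_grid(16):
    print(f"candidate product at {float(f) / 1e9:.4f} GHz")
```

With these assumptions the candidate spurs are spaced by about 26.6 MHz around the clock frequency. Which of them actually appear, and at what level, depends on the coupling mechanism and is not predicted by this arithmetic.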
Fig. 19. (a) Time-domain magnitude of a downconverted test signal around one transition. (b) Simulated effect on the spectrum of an otherwise ideal 20-MHz OFDM signal.

The effects of supply ringing at amplitude-level transitions are demonstrated in Fig. 19(a), showing the measured time-domain output amplitude around one transition, derived from a downconverted test signal. The effect of the transition can be approximated as amplitude modulation by a damped sinusoid with a frequency of 140 MHz. Such modulation at each amplitude-level transition was included in an otherwise ideal MATLAB simulation of a 20-MHz OFDM signal, and the spectrum is depicted in Fig. 19(b). The shape of the resulting noise clearly resembles the measurement results in Fig. 17.
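The damped-sinusoid approximation can be sketched numerically as follows. Only the 140-MHz ringing frequency comes from the text; the sample rate, decay time constant, ringing depth, and function name are assumptions picked for illustration, and the sketch models a single transition on a unit envelope rather than a full OFDM simulation.

```python
import numpy as np

FS_HZ = 2e9        # assumed simulation sample rate
F_RING_HZ = 140e6  # ringing frequency reported in the text
TAU_S = 20e-9      # assumed decay time constant (not given in the source)
DEPTH = 0.1        # assumed initial ringing depth relative to the envelope


def ringing_envelope(n_samples, transition_index, fs=FS_HZ,
                     f_ring=F_RING_HZ, tau=TAU_S, depth=DEPTH):
    """Unit envelope with a damped sinusoid added from the transition onward."""
    t = (np.arange(n_samples) - transition_index) / fs
    after = t >= 0
    ring = np.zeros(n_samples)
    ring[after] = (depth * np.exp(-t[after] / tau)
                   * np.sin(2 * np.pi * f_ring * t[after]))
    return 1.0 + ring


env = ringing_envelope(n_samples=4000, transition_index=1000)

# Spectrum of the ringing term alone (the constant envelope is removed).
spectrum = np.abs(np.fft.rfft(env - 1.0))
freqs = np.fft.rfftfreq(env.size, d=1 / FS_HZ)
peak_hz = freqs[np.argmax(spectrum)]
print(f"ringing energy peaks near {peak_hz / 1e6:.1f} MHz")
```

Because this envelope multiplies the RF signal, its spectral peak translates into noise on both sides of the carrier at an offset close to the ringing frequency, which is in line with the bumps described for Fig. 17. The width and height of those bumps depend on the assumed decay and depth.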

REFERENCES


Mikko Martelius (S’16) was born in Laitila, Finland, in 1988. He received the B.Sc. and M.Sc. (Hons.) degrees in electrical engineering from Aalto University, Espoo, Finland, in 2012 and 2015, respectively, where he is currently pursuing the Ph.D. degree with the Department of Electronics and Nanoengineering.

His research interests include switch-mode power amplifiers and digital-intensive transmitters.

Enrico Roverato was born in Padua, Italy, in 1988. He received the B.Sc. degree (cum laude) in information engineering from the University of Padua, Padua, in 2010, and the M.Sc. and D.Sc. degrees (Hons.) in electrical engineering from Aalto University, Espoo, Finland, in 2012 and 2017, respectively. From 2012 to 2018, he was with the Department of Electronics and Nanoengineering, Aalto University, where he carried out research on high-speed DSP algorithms for all-digital RF transmitters. He is currently a Senior Digital IC Designer with CoreHW, Tampere, Finland, and a Consultant with Huawei Technologies, Helsinki, Finland, where he is working on next-generation cellular transceivers.

Yury Antonov (S’15) received the Engineering degree in computer design and technology from Bauman Moscow State Technical University (BMSTU), Moscow, Russia, in 2007, and the M.Sc. degree (Hons.) in electrical engineering from Aalto University, Espoo, Finland, in 2014, where he is currently pursuing the Ph.D. degree.

From 2004 to 2010, he was a Circuit Designer with several research and development companies in Moscow, where he was developing CPLD/FPGA-based mixed-signal circuits for the synchronization of scalable radar-phased arrays and designing downconversion front ends for lidar-based measurement solutions. He is now with Aalto University, where he has worked on multiple successful tape-outs from 28 to 65 nm. His research interests are in multi-phase and spur-free frequency synthesis for RF/mm-wave front ends.

Mr. Antonov has served as a reviewer and co-reviewer for a number of international journals and conferences. He was a recipient of the Best Student Paper Award from the 22nd European Conference on Circuit Theory and Design.

Tero Nieminen received the M.Sc. degree in electrical engineering from the Helsinki University of Technology, Espoo, Finland, in 2007, and the Lic.Tech. and D.Sc. degrees in electrical engineering from Aalto University, Espoo, in 2012 and 2016, respectively.

From 2005 to 2016, he was with the Electronic Circuit Design Laboratory, Helsinki University of Technology, and with Aalto University. Since 2017, he has been with CoreHW, Tampere, Finland, where he provides analog CMOS design for Huawei Technologies, Helsinki, Finland, and is also involved in ADCs for receiver circuits. His research work has concentrated on analog/mixed-signal CMOS circuits, including analog-to-digital converters (ADCs), digital-to-time converters, and serial input-output interfaces.

Kari Stadius (S’95–M’03) received the M.Sc., Lic.Tech., and D.Sc. (Hons.) degrees in electrical engineering from the Helsinki University of Technology, Espoo, Finland, in 1994, 1997, and 2010, respectively.

He is currently a Staff Scientist with the Department of Electronics and Nanoengineering, School of Electrical Engineering, Aalto University, Espoo. He has authored or co-authored more than 100 refereed journal and conference papers in analog and RF circuit design. His research interests include the design and analysis of RF transceiver blocks, with an emphasis on frequency synthesis.

Lauri Anttila (S’06–M’11) received the M.Sc. and D.Sc. (Hons.) degrees in electrical engineering from the Tampere University of Technology (TUT), Tampere, Finland, in 2004 and 2011, respectively. From 2016 to 2017, he was a Visiting Research Fellow with the Department of Electronics and Nanoengineering, Aalto University, Espoo, Finland. Since 2016, he has been a University Researcher with the Department of Electrical Engineering, TUT. He has co-authored more than 100 refereed articles and three book chapters. His research interests include signal processing for wireless communications, transmitter and receiver linearization, and radio implementation challenges in 5G cellular radio, full-duplex radio, and large-scale antenna systems.

Mikko Valkama (S’99–M’02–SM’15) received the B.Sc. and Ph.D. degrees (Hons.) in electrical engineering from the Tampere University of Technology, Tampere, Finland, in 2000 and 2001, respectively. He was a Visiting Post-Doctoral Research Fellow with the Communications Systems and Signal Processing Institute, San Diego State University (SDSU), San Diego, CA, USA, in 2003. He is currently a Full Professor and the Department Head of Electrical Engineering, Tampere University, Tampere. His research interests include radio communications, radio localization, and radio-based sensing, especially on 5G and beyond mobile radio networks.

Dr. Valkama received the Best Ph.D. Thesis Award from the Finnish Academy of Science and Letters in 2002 for his dissertation entitled "Advanced I/Q Signal Processing for Wideband Receivers: Models and Algorithms."

Marko Kosunen (S’97–M’07) received the M.Sc., Lic.Sc., and D.Sc. (Hons.) degrees from the Helsinki University of Technology, Espoo, Finland, in 1998, 2001, and 2006, respectively.

He is currently a Senior Researcher with the Department of Electronics and Nanoengineering, Aalto University, Espoo, where he is involved in the implementation of digital-intensive transceiver circuits and medical sensor electronics, especially wireless transceiver DSP algorithms and communication circuits.

Jussi Ryynänen (S’99–M’04–SM’16) was born in Ilmajoki, Finland, in 1973. He received the M.Sc. and D.Sc. degrees in electrical engineering from the Helsinki University of Technology, Espoo, Finland, in 1998 and 2004, respectively.

He was the Head of the Department of Electronics and Nanoengineering, Aalto University, Espoo, in 2016, where he is currently a Professor. He has authored or co-authored more than 140 refereed journal and conference papers in analog and RF circuit design. He holds seven patents on RF circuits. His research interests are integrated transceiver circuits for wireless applications.

Prof. Ryynänen has served as a TPC Member for the European Solid-State Circuits Conference (ESSCIRC) and the IEEE International Solid-State Circuits Conference (ISSCC), and as a Guest Editor for the IEEE JOURNAL OF SOLID-STATE CIRCUITS.