Why not lose the tail?

The question: What is the necessity of the tail current source in the differential amplifier?

A naive approach goes like this: what we ultimately want is to remove from the amplified signal the part that is due to the $V_{CM}$ component of the input. With the setup shown above, it seems we can achieve that simply by taking the difference $V_{out1} - V_{out2}$, since $V_{CM}$ is the same on both arms and its effect should cancel. Let's examine that through equations:

$$V_{out1} = V_{DD} - I_{D1} R_{D}$$

Assuming everything is biased in saturation,

$$I_{D1} = k ((V_{CM} + x) - V_{t})^{2}$$

that makes

$$V_{out1} = V_{DD} - k ((V_{CM} + x) - V_{t})^{2} R_{D}$$

Similarly,

$$V_{out2} = V_{DD} - k ((V_{CM} - x) - V_{t})^{2} R_{D}$$

where $k = \frac{1}{2} \mu_{n} C_{ox} \frac{W}{L}$. Now,

$$\begin{aligned} V_{out1} - V_{out2} &= - k R_{D} \left[ (V_{CM} + x - V_{t})^{2} - (V_{CM} - x - V_{t})^{2} \right] \\ &= - 4 k R_{D} x (V_{CM} - V_{t}) \end{aligned}$$
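The algebra above can be checked numerically. The following is a minimal sketch, assuming the ideal square-law model with grounded sources; the component values (`K`, `V_T`, `V_DD`, `R_D`) are illustrative assumptions and do not come from the text.

```python
# Square-law model of one resistively loaded arm (source grounded, saturation assumed).
# All numeric values are illustrative assumptions.
K = 0.5e-3    # k = (1/2) * mu_n * C_ox * (W/L), in A/V^2
V_T = 0.5     # threshold voltage, V
V_DD = 3.0    # supply voltage, V
R_D = 1e3     # load resistor, ohm


def v_out(v_gate):
    """Drain voltage of one arm for a given gate voltage."""
    i_d = K * (v_gate - V_T) ** 2
    return V_DD - i_d * R_D


def diff_out(v_cm, x):
    """V_out1 - V_out2 for gate voltages V_CM + x and V_CM - x."""
    return v_out(v_cm + x) - v_out(v_cm - x)


def diff_out_closed_form(v_cm, x):
    """The closed-form difference: -4 k R_D x (V_CM - V_t)."""
    return -4 * K * R_D * x * (v_cm - V_T)


if __name__ == "__main__":
    x = 0.05
    for v_cm in (1.0, 1.2, 1.5):
        print(v_cm, diff_out(v_cm, x), diff_out_closed_form(v_cm, x))
```

For each `v_cm` the two printed values should agree up to floating-point rounding, while changing from one `v_cm` to the next, which is the $V_{CM}$ dependence under discussion.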

We see that the output taken this way depends linearly on $V_{CM}$, which at first may not make intuitive sense. We tend to think: even though the MOSFETs on both arms are non-linear devices (because of the 'square' dependence of $I_{D}$ on $(V_{GS} - V_{t})$), $V_{CM}$ is exactly the same on both arms, so it shifts the DC operating point of both by the same amount, up or down. The $I_{D}$ due to this $V_{CM}$ should therefore be, and will be, the same through both arms.

On top of that $V_{CM}$ we apply our differential input signals $+x$ and $-x$. Assuming these are small signals, they should ideally experience the same $g_{m}$ about the DC operating point set by $V_{CM}$ on both arms. If we do not wish to assume that $g_{m}$ is linearized in the proximity of $V_{CM}$, I can argue that the non-linearity of $g_{m}$ itself is exactly the same on both arms and should cancel when the difference is taken.

Taking the difference of the two outputs then cancels the contribution of $V_{CM}$, as $V_{out1}$ and $V_{out2}$ either drop or rise by exactly the same amount when $V_{CM}$ changes. The signal $+x$ on $M1$, however, shifts $V_{out1}$ by a certain amount, and the signal $-x$ on $M2$ shifts $V_{out2}$ by exactly the same amount in the opposite direction. So the only thing that should be visible in the difference of $V_{out1}$ and $V_{out2}$ is twice the change in the $V_{out}$ of each arm.

$$V_{out1 | V_{CM}} = U$$

$$V_{out2 | V_{CM}} = U$$

$$V_{out1 | V_{CM}} - V_{out2 | V_{CM}} = U - U = 0$$

Similarly,

$$V_{out1 | V_{+x}} = + D$$

$$V_{out2 | V_{-x}} = - D$$

$$V_{out1 | V_{+x}} - V_{out2 | V_{-x}} = + D - (- D) = 2 D$$

Now adding the effects of both the signals, we get

$$V_{out1 | V_{CM} + x} - V_{out2 | V_{CM} - x} = 2 D$$

But this is in stark contrast with the result we previously obtained. What are we missing?

One might swiftly say that the property of additivity does not apply to non-linear systems. Splitting each input into $V_{CM}$ and $x$, examining the output due to each of them separately, and then adding those outputs together in the expectation that the sum equals the output for the combined input $V_{CM} + x$ is outright wrong. And you may say that it is because of this wrong assumption that we erroneously arrive at the result that the effect of $V_{CM}$ is nullified when we take $V_{out1} - V_{out2}$.

But that is only half the answer.

We applied the property of additivity only after linearizing the system in the proximity of $V_{CM}$. $V_{CM}$ is the same for both MOSFETs, so at any instant of time both are biased at exactly the same point. For a small signal riding on $V_{CM}$ the system looks linear, so, as we always do, we can add the small-signal $v_{out}$ to $V_{out | V_{CM}}$. This approach is therefore legitimate. Let us examine the error introduced by assuming linearity in the small-signal proximity of $V_{CM}$, compared with the actual value calculated using the large-signal $I_{D}$ equation.

We shall first linearize the system about $V_{C M}$ and find the $g_{m}$ at that point.

$$\begin{aligned} g_{m | V_{CM}} &= \mu_{n} C_{ox} \frac{W}{L} (V_{CM} - V_{t}) \\ &= 2 k (V_{CM} - V_{t}) \end{aligned}$$

The output due to $V_{C M}$ is

$$I_{D | V_{CM}} = k (V_{CM} - V_{t})^{2}$$

$$\begin{aligned} V_{out | V_{CM}} &= V_{DD} - I_{D | V_{CM}} R_{D} \\ &= V_{DD} - k R_{D} (V_{CM} - V_{t})^{2} \end{aligned}$$

The output due to small signals $+ x$ and $- x$ is

$$\begin{aligned} v_{out1} &= - g_{m} R_{D} x = - 2 k R_{D} x (V_{CM} - V_{t}) \\ v_{out2} &= g_{m} R_{D} x = 2 k R_{D} x (V_{CM} - V_{t}) \end{aligned}$$

Adding the above two, we get the total outputs $V_{o u t 1}$ and $V_{o u t 2}$ .

$$\begin{aligned} V_{out1} &= V_{out | V_{CM}} + v_{out1} \\ V_{out2} &= V_{out | V_{CM}} + v_{out2} \\ V_{out1} - V_{out2} &= - 4 k R_{D} x (V_{CM} - V_{t}) \end{aligned}$$

The small-signal analysis neglects the non-linearity caused by the 'squaring' operation and assumes a constant $g_{m}$ throughout the neighbourhood of $V_{CM}$. Shockingly enough, the result obtained by assuming that the proximity of $V_{CM}$ is linear is equal to the result obtained without that assumption. This tells us that the non-linearity due to the 'square' is indeed cancelled inherently when we take the differential output $V_{out1} - V_{out2}$.

Let me reiterate our findings so far:

- The non-linearity caused by the 'square' in the $I_{D}$ equation is inherently cancelled when the differential output is taken.
- So, we can effectively use the property of additivity to add the outputs due to $V_{CM}$ and $x$ individually and arrive correctly at the output due to the combined $V_{CM} + x$.
- Even though the non-linearity (which we thought was the cause of this problem) is nullified, the dependence of the differential output $V_{out1} - V_{out2}$ on $V_{CM}$ is still present.
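The findings above can be illustrated with a short numerical sketch. It compares the linearized (small-signal) model with the square-law (large-signal) model, first for a single arm and then for the differential output. The component values and the chosen `V_CM` are illustrative assumptions.

```python
# Illustrative, assumed values; sources grounded, saturation assumed.
K, V_T, V_DD, R_D = 0.5e-3, 0.5, 3.0, 1e3
V_CM = 1.2


def v_out_large(v_gate):
    """Large-signal (square-law) drain voltage of one arm."""
    return V_DD - K * R_D * (v_gate - V_T) ** 2


def v_out_small(x):
    """Linearized drain voltage about V_CM for a gate perturbation x."""
    g_m = 2 * K * (V_CM - V_T)
    return v_out_large(V_CM) - g_m * R_D * x


def linearization_errors(x):
    """Return (single-arm error, differential error) of the linearized model."""
    single = v_out_large(V_CM + x) - v_out_small(x)
    diff_large = v_out_large(V_CM + x) - v_out_large(V_CM - x)
    diff_small = v_out_small(x) - v_out_small(-x)
    return single, diff_large - diff_small


if __name__ == "__main__":
    for x in (0.01, 0.05, 0.1):
        print(x, linearization_errors(x))
```

The single-arm error grows as $-k R_{D} x^{2}$, while the differential error stays at zero up to rounding: the $x^{2}$ terms of the two arms are identical and drop out of the difference.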

The answer to this would seem obvious and trivial to anyone who is slightly more observant than I am.

For a given $V_{CM}$ the differential output is linear in $x$ regardless of the model used (small-signal or large-signal). To reemphasize, the individual outputs $V_{out1}$ and $V_{out2}$ have not become linear; it is their difference that has become linear with respect to $x$ (devoid of any higher-order terms).

At any given $V_{CM}$ the $V_{out}$ due to that particular $V_{CM}$ is exactly the same on both arms and gets cancelled. Only the $V_{out}$ due to the differential input remains, but the factor by which this input is amplified depends strongly on $V_{CM}$ itself. In other words, the differential output has become linear in the differential input (for large signals as well), but this linear amplification factor depends on $V_{CM}$.

The graph above shows $V_{out}$ vs $x$ for different $V_{CM}$. It is clear that for any $V_{CM}$ the $V_{out}$ due exclusively to $V_{CM}$ (that is, at $x = 0$) is $0$. If $V_{CM}$ changes with time, the gain of the amplifier (the slope of the $V_{out}$ vs $x$ curve) also changes. The $V_{CM}$ that we have been calling our bias point is itself one of the inputs, and since the gain of the system depends on an input, the system is non-linear. Nor is $V_{CM}$ really a bias point, as it is a variable.
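The dependence of the slope on $V_{CM}$ can be tabulated with a few lines of code. This is a sketch under the same square-law assumptions as before, with illustrative component values; `diff_gain` is a hypothetical helper that estimates the slope of $V_{out1} - V_{out2}$ versus $x$ by a central difference.

```python
# Illustrative, assumed values; sources grounded, saturation assumed.
K, V_T, V_DD, R_D = 0.5e-3, 0.5, 3.0, 1e3


def diff_out(v_cm, x):
    """V_out1 - V_out2 without a tail current source."""
    v1 = V_DD - K * R_D * (v_cm + x - V_T) ** 2
    v2 = V_DD - K * R_D * (v_cm - x - V_T) ** 2
    return v1 - v2


def diff_gain(v_cm, dx=1e-3):
    """Slope of the differential output versus x (central difference)."""
    return (diff_out(v_cm, dx) - diff_out(v_cm, -dx)) / (2 * dx)


if __name__ == "__main__":
    for v_cm in (0.8, 1.0, 1.2, 1.4):
        print(f"V_CM = {v_cm:.1f} V -> gain = {diff_gain(v_cm):.3f}")
```

Each printed gain should match $-4 k R_{D} (V_{CM} - V_{t})$: the curves all pass through the origin, but their slope moves with $V_{CM}$.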

The non-linearity in $V_{out}$ is due to the varying $V_{CM}$. To overcome this we should somehow nullify the effect of $V_{CM}$ on the circuit. Connecting a constant tail current source turns the source terminals of both MOSFETs into a floating net. $V_{CM}$ is an external noise input that we cannot control; increasing or decreasing $V_{CM}$ adjusts the voltage of the floating source net such that $V_{GS}$ remains constant, at the value needed to support half the tail current through each arm (as the circuit is fully symmetric). We can picture it this way: the current is constant, so any momentary increase in $V_{GS}$ causes a momentary rise in current, which accumulates charge at the source net (as not all of it is allowed through the constant current source), raising its voltage and bringing $V_{GS}$ back to normal. Effectively, this has fixed our 'bias' point and removed the dependence of $V_{out}$ on $V_{CM}$. Now the differential input $x$ is a perturbation over the $V_{GS}$ fixed by the tail current source rather than over $V_{CM}$ as in the earlier case. Can we then replace $V_{CM}$ with this $V_{GS_{bias}}$ in the $V_{out}$ expression?

$$V_{out} = - 4 k R_{D} x (V_{GS_{bias}} - V_{t})$$

If this expression is correct then we will have successfully removed the dependence of $V_{o u t}$ on $V_{C M}$ and perfectly linearized the output of the differential amplifier. But is it correct?
The sum of currents through the two arms must be equal to $I_{S S}$ (tail current source).
When a differential input is applied:

$$\begin{aligned} I_{D1} + I_{D2} &= k (V_{GS_{bias}} + x - V_{t})^{2} + k (V_{GS_{bias}} - x - V_{t})^{2} \\ &= 2 k \left( (V_{GS_{bias}} - V_{t})^{2} + x^{2} \right) \end{aligned}$$

Likewise when no differential input is applied:

$$\begin{aligned} I_{bias} = I_{SS} &= 2 k (V_{GS_{bias}} - V_{t})^{2} \\ V_{GS_{bias}} &= \sqrt{\frac{I_{SS}}{2 k}} + V_{t} \end{aligned}$$

We see that there is a $2 k x^{2}$ term in excess of $I_{SS}$ when the differential input is applied. That means we cannot simply replace $V_{CM}$ with our new 'bias' point $V_{GS_{bias}}$. To keep the total current $I_{D1} + I_{D2} = I_{SS}$, the $V_{GS}$ must adjust itself with the differential input $x$. So the $V_{GS}$ of the two MOSFETs is not $V_{GS_{bias}} + x$ and $V_{GS_{bias}} - x$ but something different. The $x$ term stays, as it is an external input, but the commonly shared part of $V_{GS}$ must vary to keep the total current constant.

Since this commonly shared part of $V_{GS}$ varies, calling it a bias would be incorrect, so we will call it $V_{com}$. Solving for $V_{com}$ under the constant total current constraint, we get:

$$\begin{aligned} k (V_{com} + x - V_{t})^{2} + k (V_{com} - x - V_{t})^{2} &= I_{SS} \\ 2 k \left( (V_{com} - V_{t})^{2} + x^{2} \right) &= I_{SS} \\ V_{com} &= \sqrt{\frac{I_{SS}}{2 k} - x^{2}} + V_{t} \end{aligned}$$
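The constant-current constraint can be verified numerically. The sketch below compares the total current when the shared gate-source voltage is naively held at its $x = 0$ value with the total current when it follows the $V_{com}$ expression. The values of `K`, `V_T` and `I_SS` are illustrative assumptions, and the square-law model only applies while both transistors conduct, that is for $x^{2} \le I_{SS} / 4k$.

```python
import math

# Illustrative, assumed values.
K = 0.5e-3      # A/V^2
V_T = 0.5       # V
I_SS = 0.5e-3   # tail current, A


def v_com(x):
    """Shared gate-source component that keeps I_D1 + I_D2 = I_SS."""
    return math.sqrt(I_SS / (2 * K) - x ** 2) + V_T


def total_current(v_shared, x):
    """I_D1 + I_D2 for gate-source voltages v_shared + x and v_shared - x."""
    return K * (v_shared + x - V_T) ** 2 + K * (v_shared - x - V_T) ** 2


V_GS_BIAS = v_com(0.0)

if __name__ == "__main__":
    for x in (0.0, 0.05, 0.1, 0.2):
        fixed = total_current(V_GS_BIAS, x)   # shared V_GS held at its bias value
        adjusted = total_current(v_com(x), x)  # shared V_GS follows V_com(x)
        print(x, fixed - I_SS, adjusted - I_SS)
```

The first difference grows as $2 k x^{2}$, while the second stays at zero up to rounding.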

We can replace the $V_{CM}$ in the previously derived $V_{out}$ with this expression for $V_{com}$. This is done in the Analysis of Differential Amplifier.

Without the differential input $x$ we see that $V_{com | x = 0} = V_{GS_{bias}}$, as expected. But this common voltage is not constant: it shrinks with increasing differential input $x$ to keep the total current constant. Are we back to square one? In the previous case, without the tail current source, $V_{com} = V_{CM}$, and the variation in $V_{CM}$ can be very large compared with the variation in $x$ (which is often a small signal). Now, even though $V_{com}$ still varies, it no longer depends on $V_{CM}$.

By having the constant tail current source, we have significantly reduced the variations in $V_{c o m}$ by changing its dependence from $V_{C M}$ to $x$ . Now $V_{o u t}$ only depends on $x$ not $V_{C M}$ , but it is still non-linear due to the $x^{2}$ term. For as long as $x$ is a small signal we can ignore the effect due to the $x^{2}$ term and approximate $V_{o u t}$ to be linear.
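As a rough numerical illustration of this conclusion, the sketch below computes the differential output with an ideal tail current source, letting the source node settle at $V_{CM} - V_{com}$ as derived above. It assumes the square-law model, both transistors in saturation, and illustrative component values; `linear_estimate` is a hypothetical helper that drops the $x^{2}$ term under the square root.

```python
import math

# Illustrative, assumed values.
K, V_T, R_D = 0.5e-3, 0.5, 1e3
I_SS = 0.5e-3   # ideal tail current, A


def source_voltage(v_cm, x):
    """Voltage of the shared source node: V_CM - V_com(x)."""
    v_com = math.sqrt(I_SS / (2 * K) - x ** 2) + V_T
    return v_cm - v_com


def diff_out_with_tail(v_cm, x):
    """V_out1 - V_out2 with gates at V_CM + x and V_CM - x and an ideal tail source."""
    v_s = source_voltage(v_cm, x)
    i_d1 = K * (v_cm + x - v_s - V_T) ** 2
    i_d2 = K * (v_cm - x - v_s - V_T) ** 2
    return -(i_d1 - i_d2) * R_D


def linear_estimate(x):
    """Small-x approximation: -4 k R_D x sqrt(I_SS / 2k)."""
    return -4 * K * R_D * x * math.sqrt(I_SS / (2 * K))


if __name__ == "__main__":
    for x in (0.01, 0.05, 0.2):
        outs = [diff_out_with_tail(v_cm, x) for v_cm in (1.0, 1.5, 2.0)]
        print(x, outs, linear_estimate(x))
```

For each `x` the three outputs should coincide regardless of `v_cm`, and they drift away from `linear_estimate(x)` only as `x` grows.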