**Przeglądaj wersję html pliku:**

### Introduction to Linear Algebra (ang)

A (TERSE) INTRODUCTION TO

Linear Algebra

Yitzhak Katznelson

(DRAFT)

Contents

I

Vector spaces 1.1 Vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Linear dependence, bases, and dimension . . . . . . . . . . . 1.3 Systems of linear equations. . . . . . . . . . . . . . . . . . . Linear operators and matrices 2.1 Linear Operators (maps, transformations) 2.2 Operator Multiplication . . . . . . . . . . 2.3 Matrix multiplication. . . . . . . . . . . 2.4 Matrices and operators. . . . . . . . . . 2.5 Kernel, range, nullity, and rank . . . . . . 2.6 Normed ﬁnite dimensional linear spaces .

1 1 9 14 23 23 27 28 31 35 39 43 43 47 51 51 53 56 58 61 65 65 69

II

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

III

Duality of vector spaces 3.1 Linear functionals . . . . . . . . . . . . . . . . . . . . . . . 3.2 The adjoint . . . . . . . . . . . . . . . . . . . . . . . . . . . Determinants 4.1 Permutations . . . . . . . 4.2 Multilinear maps . . . . . 4.3 Alternating n-forms . . . . 4.4 Determinant of an operator 4.5 Determinant of a matrix .

IV

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

V

Invariant subspaces 5.1 Invariant subspaces . . . . . . . . . . . . . . . . . . . . . . . 5.2 The minimal polynomial . . . . . . . . . . . . . . . . . . . . iii

IV

L INEAR A LGEBRA

5.3 5.4 5.5 5.6 5.7 5.8 VI

Reducing. . . . . . . . . . Semisimple systems. . . . Nilpotent operators . . . . The cyclic decomposition . The Jordan canonical form Functions of an operator .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

75 81 83 86 88 92 95 95 102 104 105 109 111 112 115 115 118 121 126 129 137 137 138 139 142 144 147 153

Operators on inner-product spaces 6.1 Inner-product spaces . . . . . . . 6.2 Duality and the adjoint. . . . . 6.3 Unitary and orthogonal operators . 6.4 Self-adjoint operators . . . . . . . 6.5 Normal operators. . . . . . . . . 6.6 Positive operators. . . . . . . . . 6.7 Polar decomposition . . . . . . . Additional topics 7.1 Quadratic forms . . . . . . . . 7.2 Positive matrices . . . . . . . 7.3 Nonnegative matrices . . . . . 7.4 Stochastic matrices. . . . . . 7.5 Representation of ﬁnite groups

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

VII

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

A

Appendix A.1 Equivalence relations — partitions. A.2 Maps . . . . . . . . . . . . . . . A.3 Groups . . . . . . . . . . . . . . . A.4 Group actions . . . . . . . . . . . A.5 Fields, Rings, and Algebras . . . A.6 Polynomials . . . . . . . . . . . . Index

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

JANUARY 1, 2006

—DRAFT—

Chapter I

Vector spaces

1.1

VECTOR SPACES

The notions of group and ﬁeld are deﬁned in the Appendix: A.3 and A.5.1 respectively. The ﬁelds Q (of rational numbers), R (of real numbers), and C (of complex numbers) are familiar, and are the most commonly used. Most of the notions and results we discuss are valid for vector spaces over arbitrary underlying ﬁelds. When we do not need to specify the underlying ﬁeld we denote it by the generic F and refer to its elements as scalars. Results that require speciﬁc ﬁelds will be stated explicitly in terms of the appropriate ﬁeld. 1.1.1 D EFINITION : A vector space V over a ﬁeld F is an abelian group (the group operation written as addition) and a binary product (a, v) → av of F × V into V, satisfying the following conditions: v-s 1. 1 v = v v-s 2. a(bv) = (ab)v, v-s 3. (a + b)v = av + bv, a(v + u) = av + au.

A real vector space is a vector space over the ﬁeld R; A complex vector space is one over the ﬁeld C. Vector spaces may have additional geometric structure, such as inner product, which we study in Chapter VI, or additional algebraic structure, such as multiplication, which we just mention in passing. 1

2

L INEAR A LGEBRA

E XAMPLES : a. Fn , the space of all F-valued n-tuples (a1 , . . . , an ) with addition and scalar multiplication deﬁned by (a1 , . . . , an ) + (b1 , . . . , bn ) = (a1 + b1 , . . . , an + bn ) c(a1 , . . . , an ) = (ca1 , . . . , can ) If the underlying ﬁeld is R, resp. C, we denote the space Rn , resp. Cn . We write the n-tuples as rows, as we did here, or as columns. (We sometime write Fn resp. Fn when we want to specify that vectors are written as c r columns, resp. rows.) b. M(n, m; F), the space of all F-valued n × m matrices, that is, arrays a11 a21 A= . . . an1

a1m a2m . . ... . . . . anm

... ...

with entries form F. The addition and scalar multiplication are again done entry by entry. As a vector space M(n, m; F) is virtually identical with Fmn , except that we write the entries in the rectangular array instead of a row or a column. We write M(n; F) instead of M(n, n; F) and when the underlying ﬁeld is either assumed explicitly, or is arbitrary, we may write simply M(n, m) or M(n), as the case may be. c. F[x], the space1 of all polynomials an xn with coefﬁcients from F. Addition and multiplication by scalars are deﬁned formally either as the standard addition and multiplication of functions, or by adding (and multiplying by scalars) the corresponding coefﬁcients. The two ways deﬁne the same operations.

F[x] is an algebra over F, i.e., a vector space with an additional structure, multiplication. See A.5.2.

1

JANUARY 1, 2006

—DRAFT—

I.

V ECTOR SPACES

3

d. The set CR ([0, 1]) of all continuous real-valued functions f on [0, 1], and the set C([0, 1]) of all continuous complex-valued functions f on [0, 1], with the standard operations of addition and of multiplication of functions by a scalars. CR ([0, 1]) is a real vector space. C([0, 1]) is naturally a complex vector space, but becomes a real vector space if we limit the allowable scalars to real numbers only. e. The set C ∞ ([−1, 1]) of all inﬁnitely differentiable real-valued functions f on [−1, 1], with the standard operations on functions. f. The set TN of 2π-periodic trigonometric polynomials of degree ≤ N : the functions admitting a representation as a sum of the form |n|≤N an einx . Standard operations on functions. g. The set of functions f which satisfy the differential equation 3f (x) − sin xf (x) + 2f (x) = 0. Standard operations. 1.1.2 I SOMORPHISM . The expression “virtually identical” in the comparison, in Example b. above, of M(n, m; F) with Fmn , is not a proper mathematical term. The proper term here is isomorphic. D EFINITION : A map ϕ : V1 → V2 is called linear if, for all scalars a, b and vectors v1 , v2 ∈ V1 (1.1.1) ϕ(av1 + bv2 ) = aϕ(v1 ) + bϕ(v2 ).

Two vector spaces V1 and V2 over the same ﬁeld are isomorphic if there exist a bijective2 linear map ϕ : V1 → V2 .

2

That is ϕ maps V1 onto V2 and the map is 1 − 1 (and linear); see Appendix A.2.

—DRAFT—

JANUARY 1, 2006

4

L INEAR A LGEBRA

1.1.3 S UBSPACES . A (vector) subspace of a vector space V is a subset which is closed under the operations of addition and multiplication by scalars deﬁned in V. In other words, W ⊂ V is a subspace if a1 w1 + a2 w2 ∈ W for all scalars aj and vectors wj ∈ W. E XAMPLES : a. Solution-set of a system of homogeneous linear equations. Here V = Fn . Given the scalars aij , 1 ≤ i ≤ k, 1 ≤ j ≤ n we consider the solution-set of the system of k homogeneous linear equations

n

(1.1.2)

j=1

aij xj = 0,

i = 1, . . . , k.

This is the set of all n-tuples (x1 , . . . , xn ) ∈ Fn for which all k equations are satisﬁed. If both (x1 , . . . , xn ) and (y1 , . . . , yn ) are solutions of (1.1.2), and a and b are scalars, then for each i,

n n n

aij (axj + byj ) = a

j=1 j=1

aij xj + b

j=1

aij yj = 0.

It follows that the solution-set of (1.1.2) is a subspace of Fn . b. In the space C ∞ (R) of all inﬁnitely differentiable real-valued functions f on R with the standard operations, the set of functions f that satisfy the differential equation f (x) − 5f (x) + 2f (x) − f (x) = 0. Again, we can include, if we want, complex valued functions and allow, if we want, complex scalars. c. Subspaces of M(n): The set of diagonal matrices—the n × n matrices with zero entries off the diagonal (aij = 0 for i = j).

JANUARY 1, 2006

—DRAFT—

I.

V ECTOR SPACES

5

The set of (lower) triangular matrices—the n×n matrices with zero entries above the diagonal (aij = 0 for i < j). Similarly set of upper triangular matrices, (aij = 0 for i > j). d. Intersection of subspaces: If Wj are subspaces of a space V, then ∩Wj is a subspace of V. e. The sum3 of subspaces: Wj is deﬁned by Wj = { vj : vj ∈ Wj }.

f. The span of a subset: The span of a subset E ⊂ V, denoted span[E], is the set { aj ej : aj ∈ F, ej ∈ E} of all the ﬁnite linear combinations of elements of E. span[E] is a subspace; clearly the smallest subspace of V that contains E. 1.1.4 D IRECT SUMS . If V1 , . . . , Vk are vector spaces over F, the (formal) direct sum k Vj = V1 ⊕ · · · ⊕ Vk is the set {(v1 , . . . , vk ) : vj ∈ Vj } in which 1 we deﬁne addition: (v1 , . . . , vk ) + (u1 , . . . , uk ) = (v1 + u1 , . . . , vk + uk ), and multiplication by scalars: a(v1 , . . . , vk ) = (av1 , . . . , avk ). D EFINITION : The subspaces Wj , j = 1, . . . , k of a vector space V are independent if vj = 0 with vj ∈ Wj implies that vj = 0 for all j. Proposition. If Wj are subspaces of V, then the map Φ of W1 ⊕ · · · ⊕ Wk into W1 + · · · + Wk , deﬁned by Φ : (v1 , . . . , vk ) → v1 + · · · + vk , is an isomorphism if, and only if, the subspaces are independent.

Don’t confuse the sum of subspaces with the a subspace, see exercise I.1.5 below.

3

union of subspaces which is seldom

—DRAFT—

JANUARY 1, 2006

6

L INEAR A LGEBRA

P ROOF : Φ is clearly linear and surjective. To prove it injective we need to check that every vector in the range has a unique preimage, that is, to show that (1.1.3) vj , vj ∈ Wj and v1 + · · · + vk = v1 + · · · + vk

implies that vj = vj for every j. Subtracting and writing vj = vj − vj , (1.1.3) is equivalent to: vj = 0 with vj ∈ Wj , which implies that vj = 0 for all j.

Notice that Φ is the “natural” map of the formal direct sum onto the sum of subspaces of a given space. In view of the proposition we refer to the sum Wj of independent subspaces of a vector space as direct sum and write Wj instead of Wj . If V = U ⊕ W, we refer to either U or W as a complement of the other in V. 1.1.5 Q UOTIENT SPACES . A subspace W of a vector space V deﬁnes an equivalence relation4 in V: (1.1.4) x≡y (mod W) if x − y ∈ W.

In order to establish that this is indeed an equivalence relation we need to check that it is a. reﬂexive (clear, since x − x = 0 ∈ W), b. symmetric (clear, since if x − y ∈ W, then y − x = −(x − y) ∈ W), and c. transitive, (if x−y ∈ W and y−z ∈ W, then x−z = (x−y)+(y−z) ∈ W). The equivalence relation partitions V into cosets or “translates” of W, that is into sets of the form x + W = {v : v = x + w, w ∈ W}. So far we used only the group structure and not the fact that addition in V is commutative, nor the fact that we can multiply by scalars. This information will be used now. We deﬁne the quotient space V/W to be the space whose elements are the equivalence classes mod W in V, and whose vector space structure, addition and multiplication by scalars, is given by:

4

See Appendix A.1

JANUARY 1, 2006

—DRAFT—

I.

V ECTOR SPACES

7

if x = x + W and y = y + W are cosets, and a ∈ F, then ˜ ˜ (1.1.5) x+y =x+y+W =x+y ˜ ˜ and a˜ = ax. x

The deﬁnition needs justiﬁcation. We deﬁned the sum of two cosets by taking one element of each, adding them and taking the coset containing the sum as the sum of the cosets. We need to show that the result is well deﬁned, i.e., that it does not depend on the choice of the representatives in the cosets. In other words, we need to verify that if x ≡ x1 (mod W) and y ≡ y1 (mod W), then x + y ≡ x1 + y1 (mod W). But, x = x1 + w, y = y1 + w with w, w ∈ W implies that x + y = x1 + w + y1 + w = x1 + y1 + w + w , and, since w + w ∈ W we have x + y ≡ x1 + y1 (mod W). Notice that the “switch” w + y1 = y1 + w is justiﬁed by the commutativity of the addition in V. The deﬁnition of a˜ is justiﬁed similarly: assuming x ≡ x1 (mod W) x then ax − ax1 = a(x − x1 ) ∈ W, (since W is a subspace, closed under multiplication by scalars) and ax ≡ ax1 (mod W) . 1.1.6 T ENSOR PRODUCTS . Given vector spaces V and U over F, the set of all the (formal) sums aj vj ⊗ uj , where aj ∈ F, vj ∈ V and uj ∈ U; with (formal) addition and multiplication by elements of F, is a vector space over F. The tensor product V ⊗ U is, by deﬁnition, the quotient of this space by the subspace spanned by the elements of the form a. (1.1.6) b. c. (v1 + v2 ) ⊗ u − (v1 ⊗ u + v2 ⊗ u), v ⊗ (u1 + u2 ) − (v ⊗ u1 + v ⊗ u2 ), a (v ⊗ u) − (av) ⊗ u, (av) ⊗ u − v ⊗ (au),

for all v, vj ∈ V u, uj ∈ U and a ∈ F. In other words, V ⊗ U is the space of formal sums the the equivalence relation generated by: a. (1.1.7) b. c. (v1 + v2 ) ⊗ u ≡ v1 ⊗ u + v2 ⊗ u, v ⊗ (u1 + u2 ) ≡ v ⊗ u1 + v ⊗ u2 , a (v ⊗ u) ≡ (av) ⊗ u ≡ v ⊗ (au). —DRAFT—

aj vj ⊗ uj modulo

JANUARY 1, 2006

8

L INEAR A LGEBRA

Example. If V = F[x] and U = F[y], then p(x) ⊗ q(y) can be identiﬁed with the product p(x)q(y) and V ⊗ U with F[x, y]. EXERCISES FOR SECTION 1.1

I.1.1. Verify that R is a vector space over Q, and that C is a vector space over either Q or R. I.1.2. Verify that the intersection of subspaces is a subspace. I.1.3. Verify that the sum of subspaces is a subspace. I.1.4. Prove that M(n, m; F) and Fmn are isomorphic. I.1.5. Let U and W be proper subspaces of a vector space V, neither of them contains the other. Show that U ∪ W is not a subspace.

Hint: Take u ∈ U \ W, w ∈ W \ U and consider u + w.

∗I.1.6. If F is ﬁnite, n > 1, then Fn is a union of a ﬁnite number of lines. Assuming that F is inﬁnite, show that the union of a ﬁnite number of subspaces of V, none of which contains all others, is not a subspace.

Hint: Let Vj , j = 1, . . . , k be the subspaces in question. Show that there is no loss

in generality in assuming that their union spans V. Now you need to show that Vj is not all of V. Show that there is no loss of generality in assuming that V1 is not contained in the union of the others. Take v1 ∈ V1 \ j=1 Vj , and w ∈ V1 ; show that / av1 + w ∈ Vj , a ∈ F, for no more than k values of a. I.1.7. Let p > 1 be a positive integer. Recall that two integers, m, n are congruent (mod p), written n ≡ m (mod p), if n − m is divisible by p. This is an equivalence relation (see Appendix A.1). For m ∈ Z, denote by m the coset (equivalence class) of ˜ m, that is the set of all integers n such that n ≡ m (mod p). a. Every integer is congruent (mod p) to one of the numbers [0, 1, . . . , p − 1]. In other words, there is a 1 – 1 correspondence between Zp , the set of cosets (mod p), and the integers [0, 1, . . . , p − 1]. b. As in subsection 1.1.5 above, we deﬁne the quotient ring Zp = Z/(p) (both notations are common) as the space whose elements are the cosets (mod p) in Z, and deﬁne addition and multiplication by: m + n = (m + n) and m· n = m·n. ˜ ˜ ˜ ˜ Prove that the addition and multiplication so deﬁned are associative, commutative and satisfy the distributive law. JANUARY 1, 2006

—DRAFT—

I.

V ECTOR SPACES

9

c. Prove that Zp , endowed with these operations, is a ﬁeld if, and only if, p is prime.

Hint: You may use the following fact: if p is a prime, and both n and m are not

divisible by p then nm is not divisible by p. Show that this implies that if n = 0 ˜ in Zp , then {˜ m : m ∈ Zp } covers all of Zp . n˜ ˜ 1.2 LINEAR DEPENDENCE, BASES, AND DIMENSION

Let V be a vector space. A linear combination of vectors v1 , . . . , vk is a sum of the form v = aj vj with scalar coefﬁcients aj . A linear combination is non-trivial if at least one of the coefﬁcients is not zero. 1.2.1 Recall that The span of a set A ⊂ V, denoted span[A], is the set of all vectors v that can be written as linear combinations of elements in A. D EFINITION : A set A ⊂ V is a spanning set if span[A] = V. 1.2.2 D EFINITION : A set A ⊂ V is linearly independent if for every sequence {v1 , . . . , vl } of distinct vectors in A, the only vanishing linear combination of the vj ’s is trivial; that is, if aj vj = 0 then aj = 0 for all j. If the set A is ﬁnite, we enumerate its elements as v1 , . . . , vm and write the elements in its span as aj vj . By deﬁnition, independence of A means that the representation of v = 0 is unique. Notice, however, that this implies that the representation of every vector in span[A] is unique, since l aj vj = l bj vj 1 1 implies l (aj − bj )vj = 0 so that aj = bj for all j. 1 1.2.3 A minimal spanning set is a spanning set such that no proper subset thereof is spanning. A maximal independent set is an independent set such that no set that contains it properly is independent. Lemma. a. A minimal spanning set is independent. b. A maximal independent set is spanning. P ROOF : a. Let A be a minimal spanning set. If aj vj = 0, with distinct vj ∈ A, and for some k, ak = 0, then vk = −a−1 j=k aj vj . This permits k the substitution of vk in any linear combination by the combination of the other —DRAFT—

JANUARY 1, 2006

10

L INEAR A LGEBRA

vj ’s, and shows that vk is redundant: the span of {vj : j = k} is the same as the original span, contradicting the minimality assumption. b. If B is independent and u ∈ span[B], then the union {u} ∪ B is inde/ pendent: assume otherwise, then there exists {v1 , . . . , vl } ⊂ B and coefﬁcients d and cj , not all zero, such that du + cj vj = 0. Assuming d = 0 implies u = −d−1 cj vj and u would be in span[v1 , . . . , vl ] ⊂ span[B], contradicting the assumption u ∈ span[B]; so d = 0. But now cj vj = 0 with some / non-vanishing coefﬁcients, contradicting the assumption that B is independent. It follows that if B is maximal independent, then u ∈ span[B] for every u ∈ V, and B is spanning. D EFINITION : A basis for V is an independent spanning set in V. Thus, {v1 , . . . , vn } is a basis for V if, and only if, every v ∈ V has a unique representation as a linear combination of {v1 , . . . , vn }, that is a representation (or expansion) of the form v = aj vj . By the lemma, a minimal spanning set is a basis, and a maximal independent set is a basis. A ﬁnite dimensional vector space is a vector space that has a ﬁnite basis. (See also Deﬁnition 1.2.4.) Theorem. If V is ﬁnite dimensional then: a. Every spanning set can be trimmed to a basis. b. Every independent set can be expanded to a basis. P ROOF : a. Let {vj }N be a spanning set for V. Call inessential a vector vl j=1 that is linearly dependent on {vj }l−1 , and essential otherwise. Observe that j=1 an inessential vl is linearly dependent on the essential vectors preceding it. Remove the inessential vectors. Since every vj is either essential or linearly dependent on the preceding essential vectors, the essential vectors span V and are independent, hence form a basis. b. Let {uj }k be independent, and let {ej }n be a basis for V. Write j=1 j=1 wj = uj for j = 1, . . . , k, and wk+j = ej for j = 1, . . . , n. The sequence {wj } contains the basis {ej } and is therefore spanning. Now remove, as in part a. the inessential vectors to obtain a basis, and observe that the ﬁrst k vectors, namely {uj }k are all essential, and hence form part of the basis. j=1

JANUARY 1, 2006

—DRAFT—

I.

V ECTOR SPACES

11

E XAMPLES : a. In Fn we write ej for the vector whose j’th entry is equal to 1 and all the other entries are zero.{e1 , . . . , en } is a basis for Fn , and the unique

a1

. representation of v = . in terms of this basis is v = .

an

aj ej . We refer

to {e1 , . . . , en } as the standard basis for Fn . b. The standard basis for M(n, m): let eij denote the n × m matrix whose ij’th entry is 1and all the other zero. {eij } is a basis for M(n, m), and

a11

a21 ..

... ...

. an1

... ...

a1m a2m . . . anm

=

aij eij is the expansion.

c. The space F[x] is not ﬁnite dimensional. The inﬁnite sequence {xn }∞ n=0 is linearly independent, in fact a basis, and, as we see in the following subsection, it cannot have a ﬁnite basis. 1.2.4 S TEINITZ ’ LEMMA AND THE DEFINITION OF DIMENSION .

Lemma (Steinitz). Assume span[v1 , . . . , vn ] = V and {u1 , . . . , um } linearly independent in V. Claim: the vectors vj can be (re)ordered so that, for every k = 1, . . . , m, the sequence {u1 , . . . , uk , vk+1 , . . . , vn } spans V. In particular, m ≤ n. P ROOF : Write u1 = aj vj , possible since span[v1 , . . . , vn ] = V. Reorder the vj s, if necessary, to guarantee that a1 = 0. Now v1 = a−1 (u1 − n aj vj ), which means that span[u1 , v2 , . . . , vn ] j=2 1 contains every vj and hence is equal to V. Continue recursively: assume that, having reoredered the vj ’s if necessary, we have {u1 , . . . , uk , vk+1 , . . . , vn } spans V. Observe that unless k = m, we have k < n (since uk+1 is not in the span of {u1 , . . . , uk } at least one additional v is needed). If k = m we are done. If k < m we write uk+1 = k aj uj + n j=1 j=k+1 bj vj , and since {u1 , . . . , um } is linearly independent, at least one of the coefﬁcients bj is not zero. Reordering the remaining vj ’s if necessary, we may assume that bk+1 = 0 and obtain, —DRAFT—

JANUARY 1, 2006

12

L INEAR A LGEBRA

as before, that vk+1 ∈ span[u1 , . . . , uk+1 , vk+2 , . . . , vn ], and, once again, the span is V. Repeating the step (a total of) m times proves the claim of the lemma. Theorem. If {v1 , . . . , vn } and {u1 , . . . , um } are both bases, then m = n. P ROOF : Since {v1 , . . . , vn } is spanning and {u1 , . . . , um } independent we have m ≤ n. Reversing the roles we have n ≤ m. Steinitz’ lemma is a reﬁnement of part b. of theorem 1.2.3: in a ﬁnite dimensional vector space, every independent set can be expanded to a basis by adding, if necessary, elements from any given spanning set. The additional information here, that any spanning set has at least as many elements as any independent set, that is the basis for the current theorem, is what enables the deﬁnition of dimension. D EFINITION : A vector space V is ﬁnite dimensional if it has a ﬁnite basis. The dimension, dim V is the number of elements in any basis for V. (Well deﬁned since all bases have the same cardinality.) As you are asked to check in Exercise I.2.9 below, a subspace W of a ﬁnite dimensional space V is ﬁnite dimensional and, unless W = V, the dimension dim W of W is strictly lower than dim V. The codimension of a subspace W in V is, by deﬁnition, dim V − dim W. 1.2.5 The following observation is sometimes useful.

Proposition. Let U and W be subspaces of an n dimensional space V, and assume that dim U + dim W > n. Then U ∩ W = {0}. P ROOF : Let {uj }l be a basis for U and {wj }m be a basis for W. Since j=1 j=1 l + m > n the set {uj }l ∪ {wj }m is linearly dependent, i.e., the exist j=1 j=1 a nontrivial vanishing linear combination cj uj + dj wj = 0. If all the coefﬁcients cj were zero, we would have a vanishing nontrivial combination of the basis elements {wj }m , which is ruled out. Similarly not all the dj ’s j=1 vanish. We now have the nontrivial cj uj = − dj wj in U ∩ W.

JANUARY 1, 2006

—DRAFT—

I.

V ECTOR SPACES

13

EXERCISES FOR SECTION 1.2

I.2.1. The set {vj : 1 ≤ j ≤ k} is linearly dependent if, and only if, v1 = 0 or there exists l ∈ [2, k] such that vl is a linear combination of vectors in {vj : 1 ≤ j ≤ l − 1}. I.2.2. Let V be a vector space, W ⊂ V a subspace. Let v, u ∈ V \ W, and assume that u ∈ span[W, v]. Prove that v ∈ span[W, u]. I.2.3. What is the dimension of C5 considered as a vector space over R? I.2.4. Is R ﬁnite dimensional over Q? I.2.5. Is C ﬁnite dimensional over R? I.2.6. Check that for every A ⊂ V, span[A] is a subspace of V, and is the smallest subspace containing A. I.2.7. Let U, W be subspaces of a vector space V, and assume U ∩W = {0}. Assume that {u1 , . . . , uk } ⊂ U and {w1 , . . . , wl } ⊂ W are (each) linearly independent. Prove that {u1 , . . . , uk } ∪ {w1 , . . . , wl } is linearly independent. I.2.8. Prove that the subspaces Wj ⊂ V, j = 1, . . . , N are independent (see Deﬁnition 1.1.4) if, and only if, Wj ∩ l=j Wl = {0} for all j. I.2.9. Let V be ﬁnite dimensional. Prove that every subspace W ⊂ V is ﬁnite dimensional, and that dim W ≤ dim V with equality only if W = V. I.2.10. If V is ﬁnite dimensional, every subspace W ⊂ V is a direct summand. ∗I.2.11. Assume that V is n-dimensional vector space over an inﬁnite F. Let {Wj } be a ﬁnite collection of distinct m-dimensional subspaces. a. Prove that no Wj is contained in the union of the others. b. Prove that there is a subspace U ⊂ V which is a complement of every Wj . Hint: See exercise I.1.6. I.2.12. Let V and W be ﬁnite dimensional subspaces of a vector space. Prove that V + W and V ∩ W are ﬁnite dimensional and that (1.2.1) dim(V ∩ W) + dim(V + W) = dim V + dim W.

I.2.13. If Wj , j = 1, . . . , k, are ﬁnite dimensional subspaces of a vector space V then Wj is ﬁnite dimensional and dim Wj ≤ dim Wj , with equality if, and only if, the subspaces Wj are independent. I.2.14. Let V be an n-dimensional vector space, and let V1 ⊂ V be a subspace of dimension m.

—DRAFT—

JANUARY 1, 2006

14

L INEAR A LGEBRA a. Prove that V/V1 —the quotient space—is ﬁnite dimensional.

b. Let {v1 , . . . , vm } be a basis for V1 and let {w1 , . . . , wk } be a basis for V/V1 . ˜ ˜ For j ∈ [1, k], let wj be an element of the coset wj . ˜ Prove: {v1 , . . . , vm } ∪ {w1 , . . . , wk } is a basis for V. Hence k + m = n. I.2.15. Let V be a real vector space. Let rl = (al,1 , . . . , al,p ) ∈ Rp , 1 ≤ l ≤ s be linearly independent. Let v1 , . . . , vp ∈ V be linearly independent. Prove that the p vectors ul = 1 al,j vj , l = 1, . . . , s, are linearly independent in V. I.2.16. Let V and U be ﬁnite dimensional spaces over F. Prove that the tensor product V ⊗ U is ﬁnite dimensional. Speciﬁcally, show that if {ej }n and {fk }m are bases j=1 k=1 for V and U, then {ej ⊗ fk }, 1 ≤ j ≤ n, 1 ≤ k ≤ m, is a basis for V ⊗ U, so that dim V ⊗ U = dim V dim U. ∗I.2.17. Assume that any three of the ﬁve R3 vectors vj = (xj , yj , zj ), j = 1, . . . , 5, are linearly independent. Prove that the vectors

2 2 wj = (x2 , yj , zj , xj yj , xj zj , yj zj ) j

are linearly independent in R6 .

Hint:

Find non-zero (a, b, c) such that axj + byj + czj = 0 for j = 1, 2. Find non-zero (d, e, f ) such that dxj + eyj + f zj = 0 for j = 3, 4. Observe (and use) the fact (ax5 + by5 + cz5 )(dx5 + ey5 + f z5 ) = 0

1.3

SYSTEMS OF LINEAR EQUATIONS.

How do we ﬁnd out if a set {vj }, j = 1, . . . , m of vectors in Fn is linearly c dependent? How do we ﬁnd out if a vector u belongs to span[v1..., vm ]?

Given the vectors vj =

a1j . . , . anj

c1

. j = 1, . . . , m, and and u = . , we .

cn

express the conditions xj vj = 0 for the ﬁrst question, and the second, in terms of the coordinates.

JANUARY 1, 2006

xj vj = u for

—DRAFT—

I.

V ECTOR SPACES

15

For the ﬁrst we obtain the system of homogeneous linear equations: a11 x1 + . . . a21 x1 + . . . . . . an1 x1 + . . . or,

m

(1.3.1)

+ a1m xm + a2m xm . . .

=0 =0

+ anm xm = 0

(1.3.2)

j=1

aij xj = 0,

i = 1, . . . , n.

For the second question we obtain the non-homogeneous system:

m

(1.3.3)

j=1

aij xj = ci ,

i = 1, . . . , n.

We need to determine if the solution-set of (1.3.2), namely the set of all m-tuples (x1 , . . . , xm ) ∈ Fm for which all n equations hold, is trivial or not, i.e., if there are solutions other than (0, . . . , 0). For (1.3.3) we need to know if the solution-set is empty or not. In both cases we would like to identify the solution set as completely and as explicitely as possible. 1.3.1 Conversely, given the system (1.3.2) we can rewrite it as

a11 a1m . . x1 . + · · · + xm . . . an1 anm

(1.3.4)

=0

Our ﬁrst result depends only on dimension. The m vectors in (1.3.4) are elements of the n-dimensional space Fn . If m > n, any m vectors in Fn are dec c pendent, and since we have a nontrivial solution if, and only if, these columns are dependent, the system has nontrivial solution. This proves the following theorem. Theorem. A system of n homogeneous linear equations in m > n unknowns has nontrivial solutions. —DRAFT—

JANUARY 1, 2006

16

L INEAR A LGEBRA

Similarly, rewriting (1.3.3) in the form

a1m a11 . . . + · · · + xm . x1 . . anm an1

c1

(1.3.5)

. = . , .

cn

it is clear that the system given by (1.3.3) has a solution if, and only if, the

column

c1 . . . cn

is in the span of columns

a1j . . , . anj

j ∈ [1, m].

1.3.2 The classical approach to solving systems of linear equations is the Gaussian elimination— an algorithm for replacing the given system by an equivalent system that can be solved easily. We need some terminology: D EFINITION : The systems

m

(A) (1.3.6) (B)

j=1 j=1 m

aij xj = ci , bij xj = di ,

i = 1, . . . , k. i = 1, . . . , l.

are equivalent if they have the same solution-set (in Fm ). The matrices a11 . . . a1m a21 . . . a2m . A= . . . . ... . ak1 . . . akm

and

Aaug

a11 . . . a1m c1 a21 . . . a2m c2 . = . . . . . . ... . . ak1 . . . akm ck

are called the matrix and the augmented matrix of the system (A). The augmented matrix is obtained from the matrix by adding, as additional column, the column of the values, that is, the right-hand side of the respective equations. The augmented matrix contains all the information of the system (A). Any k × (m + 1) matrix is the augmented matrix of a system of linear equations in m unknowns.

JANUARY 1, 2006

—DRAFT—

I.

V ECTOR SPACES

17

1.3.3

ROW EQUIVALENCE OF MATRICES . The matrices

D EFINITION :

(1.3.7)

a11 . . . a1m a21 . . . a2m . . . . . ... . ak1 . . . akm

and

b11 . . . b1m b21 . . . b2m . . . . . ... . bl1 . . . blm

are row equivalent if their rows span the same subspace of Fm ; equivalently: r if each row of either matrix is a linear combination of the rows of the other. Proposition. Two systems of linear equations in m unknowns

m

(A)

j=1 m

aij xj = ci , bij xj = di ,

j=1

i = 1, . . . , k. i = 1, . . . , l.

(B)

are equivalent if their respective augmented matrices are row equivalent. P ROOF : Assume that the augmented matrices are row equivalent. If (x1 , . . . , xm ) is a solution for system (A) and (bi1 , . . . , bim , di ) = then

m

αi,k (ak1 , . . . , akm , ck )

bij xj =

j=1 k,j

αi,k akj xj =

k

αi,k ck = di

and (x1 , . . . , xm ) is a solution for system (B). D EFINITION : The row rank of a matrix A ∈ M(k, m) is the dimension of the span of its rows in Fm . Row equivalent matrices clearly have the same rank. —DRAFT—

JANUARY 1, 2006

18

L INEAR A LGEBRA

1.3.4 R EDUCTION TO row echelon FORM . The classical method of solving systems of linear equations, homogeneous or not, is the Gaussian elimination. It is an algorithm to replace the system at hand by an equivalent system that is easier to solve. a11 . . . a1m a21 . . . a2m . D EFINITION : A matrix A = . is in row echelon form if the . . . ... . ak1 . . . following conditions are satisﬁed akm

ref–1 The ﬁrst q rows of A are linearly independent in Fm , the remaining k−q rows are zero. ref–2 There are integers 1 ≤ l1 < l2 < · · · < lq ≤ m such that for j ≤ q, the ﬁrst nonzero entry in the j’th row is 1, occuring in the lj ’th column. ref–3 The entry 1 in row j is the only nonzero entry in the lj column.

One can rephrase the last three conditions as: The lj ’th columns (the “main” columns) are the ﬁrst q elements of the standard basis of Fk , and every other c column is a linear combination of the “main” columns that precede it. Theorem. Every matrix is row equivalent to a matrix in row-echelon form. P ROOF : If A = 0 there’s nothing to prove. Assuming A = 0, we describe an algorithm to reduce A to row-echelon form. The operations performed on the matrix are: a. Reordering (i.e., permuting) the rows, b. Multiplying a row by a non-zero constant, c. Adding a multiple of one row to another. These operations do not change the span of the rows so that the equivalence class of the matrix is maintained. (We shall return later, in Exercise II.3.10, to express these operations as matrix multiplications.) Let l1 be the index of the ﬁrst column that is not zero. Reorder the rows so that a1,l1 = 0, and multiply the ﬁrst row by a−11 . 1,l Subtract from the j’th row, j = 1, the ﬁrst row multiplied by aj,l1 .

JANUARY 1, 2006

—DRAFT—

I.

V ECTOR SPACES

19

Now all the columns before l1 are zero and column l1 has 1 in the ﬁrst row, and zero elswhere. Denote its row rank of A by q. If q = 1 all the entries below the ﬁrst row are now zero and we are done. Otherwise let l2 be the index of the ﬁrst column that has a nonzero entry in a row beyond the ﬁrst. Notice that l2 > l1 . Keep the ﬁrst row in its place, reorder the remaining rows so that a2,l2 = 0, and multiply the second row5 by a−12 . 2,l Subtruct from the j’th row, j = 2, the second row multiplied by aj,l2 . Repeat the sequence a total of q times. The ﬁrst q rows, r1 , . . . , rq , are (now) independent: a combination cj rj has entry cj in the lj ’th place, and can be zero only if cj = 0 for all j. If there is a nonzero entry beyond the current q’th row, necessarily beyond the lq ’th column, we could continue and get a row independent of the ﬁrst q, contradicting the deﬁnition of q. Thus, after q steps, all the rows beyond the q’th are zero. Observe that the scalars used in the process belong to the smallest ﬁeld that contains all the coefﬁcients of A. 1.3.5 If A and Aaug are the matrix and the augmented matrix of a system (A) and we apply the algorithm of the previous subsection to both, we observe that since the augmented matrix has the additional column on the right hand side, the ﬁrst q (the row rank of A) steps in the algorithm for either A or Aaug are identical. Having done q repetitions, A is reduced to row-echelon form, while Aaug may or may not be. If the row rank of Aaug is q, then the algorithm for Aaug ends as well; otherwise we have lq+1 = m + 1, and the row-echelon form for the augmented matrix is the same as that of A but with an added row and an added “main” column, both having 0 for all but the last entries, and 1 for the last entry. In the latter case, the system corresponding to the row-reduced augmented matrix has as it last equation 0 = 1 and the system has no solutions. On the other hand, if the row rank of the augmented matrix is the same as that of A, the row-echelon form of the augmented matrix is an augmentation of

5

We keep referring to the entries of the successively modiﬁed matrix as aij .

—DRAFT—

JANUARY 1, 2006

20

L INEAR A LGEBRA

the row-echelon form of A. In this case we can assign arbitrary values to the variables xi , i = lj , j = 1, . . . , q, move the corresponding terms to the right hand side and, writing Cj for the sum, we obtain (1.3.8) xlj = Cj , j = 1, . . . , q.

Theorem. A necessary and sufﬁcient condition for the system (A) to have solutions is that the row rank of the augmented matrix be equal to that of the matrix of the system. The discussion preceding the statement of the theorem not only proves the theorem but offers a concrete way to solve the system. The unknowns are now split into two groups, q “main” ones and m − q “secondary”. We have “m − q degrees of freedom”: the m − q secondary unknowns become free parameters that can be assigned arbitrary values, and these values determine the “main” unknowns uniquely. Remark: Notice that the split into “main” and “secondary” unknowns depends on the speciﬁc deﬁnition of “row–echelon form”; counting the columns in a different order may result in a different split, though the number q of “main” variables would be the same—the row rank of A. Corollary. A linear system of n equations in n unknowns with matrix A has solutions for all augmented matrices if, and only if, the only solution of the corresponding homogeneous system is the trivial solution. P ROOF : The condition on the homogeneous system amounts to “the rows of A are independent”, and no added columns can increase the row rank. 1.3.6 D EFINITION : The column rank of a matrix A ∈ M(k, m) is the dimension of the span of its columns in Fk . c Linear relations between columns of A are solutions of the homogeneous system given by A. If B is row-equivalent to A, the columns of A and B have the same set of linear relations, (see Proposition 1.3.3). In particular, if B is in row-echelon form and {lj }q are the indices of the “main” columns in B, then j=1 the lj ’th columns in A, j = 1, . . . , q, are independent, and every other column is a linear combination of these.

JANUARY 1, 2006

—DRAFT—

I.

V ECTOR SPACES

21

It follows that the column rank of A is equal to its row rank. We shall refer to the common value simply as the rank of A. EXERCISES FOR SECTION 1.3

I.3.1. Identify the matrix A ∈ M(n) of row rank n that is in row echelon form. I.3.2. A system of linear equations with rational coefﬁcients, that has a solution in C, has a solution in Q. Equivalently, vectors in Qn that are linearly dependent over C, are rationally dependent.

Hint: The last sentence of Subsection 1.3.4.

I.3.3. A system of linear equations with rational coefﬁcients, has the same number of “degrees of freedom” over Q as it does over C. I.3.4. An aﬃne subspace of a vector space is a translate of a subspace, that is a set of the form v0 + V0 = {v0 + v : v ∈ V0 }, where v0 is a ﬁxed vector and V0 ⊂ V is a subspace. (Thus a line in V is a translate of a one-dimensional subspace.) Prove that a set A ⊂ V is an afﬁne subspace if, and only if, choices of u1 , . . . , uk ∈ A, and scalars aj , j = 1, . . . , k such that aj uj ∈ A for all aj = 1.

I.3.5. If A ⊂ V is an afﬁne subspace and u0 ∈ A, then A − u0 = {u − u0 : u ∈ A} is a subspace of V. Moreover, the subspace A − u0 , the “corresponding subspace” does not depend on the choice of u0 . I.3.6. The solution set of a system of k linear equations in m unknowns is an afﬁne subspace of Fm . The solution set of the corresponding homogeneous system is the “corresponding subspace”. a11 . . . a1m a1j a21 . . . a2m . I.3.7. Consider the matrix A = . . and its columns vj = . . . . . . ... . ak1 . . . akm Prove that a column vi end up as a “main column” in the row echelon form of A if, and only if, it is linearly independent of the columns vj , j < i. b11 . . . b1m b21 . . . b2m I.3.8. (continuation) Denote by B = . . the matrix in row echelon . . . ... . bk1 . . . bkm form obtained from A by the algorithm described above. Let l1 < l2 , . . . be the indices

akj

—DRAFT—

JANUARY 1, 2006

22

L INEAR A LGEBRA

of the main columns in B and i the index of another column. Prove (1.3.9) vi =

lj <i

bji vlj .

I.3.9. What is the row echelon form of the 7 × 6 matrix A, if its columns Cj , j = 1, . . . , 6 satisfy the following conditions: a. C1 = 0; b. C2 = 3C1 ; c. C3 is not a (scalar) multiple of C1 ; d. C4 = C1 + 2C2 + 3C3 ; e. C5 = 6C3 ; f. C6 is not in the span of C2 and C3 .

j j j I.3.10. Given polynomials P1 = 0 aj x , P2 = 0 bj x , S = 0 sj x of degrees n, m, and l < n + m respectively, we want to ﬁnd polynomials q1 = m−1 n−1 cj xj , and q2 = 0 dj xj , such that 0 n m l

(1.3.10)

P1 q1 + P2 q2 = S.

Reduce the polynomial equation (1.3.10) to a system of linear equations, the unknown being the coefﬁcients c0 , . . . , cm−1 of q1 , and d0 , . . . , dn−1 of q2 . The associated homogeneous system corresponds to the case S = 0. Show that it has a nontrivial solutions if, and only if, P1 and P2 have a nontrivial common factor. (You may assume the unique factorization theorem, A.6.3.)

JANUARY 1, 2006

—DRAFT—

Chapter II

Linear operators and matrices

2.1

LINEAR OPERATORS (MAPS, TRANSFORMATIONS)

2.1.1

Let V and W be vector spaces over the same ﬁeld. A map T : V → W is linear if for all vectors vj ∈ V and scalars

D EFINITION : aj , (2.1.1)

T (a1 v1 + a2 v2 ) = a1 T v1 + a2 T v2 .

This was discussed brieﬂy in 1.1.2. Linear maps are also called linear operators, linear transformations, homomorphisms, etc. The adjective “linear” is sometimes assumed implicitly. The term we use most of the time is operator. E XAMPLES : a. If {v1 , . . . , vn } is a basis for V and {w1 , . . . , wn } ⊂ W is arbitrary, then the map vj → wj , j = 1, . . . , n extends (uniquely) to a linear map T from V to W deﬁned by (2.1.2) T: aj vj → aj wj .

Every linear operator from V into W is obtained this way. b. Let V be the space of all continuous, 2π-periodic functions on the line. For every x0 deﬁne Tx0 , the translation by x0 : Tx0 : f (x) → fx0 (x) = f (x − x0 ). 23

24

L INEAR A LGEBRA

c. The transpose. a11 a21 A= . . . an1

(2.1.3)

... ...

a11 . . . an1 a1m a2m a12 . . . an2 → ATr = . . . . . . . ... . ... . a1m . . . anm . . . anm

which maps M(n, m; F) onto M(m, n; F). d. Differentiation on F[x]:

n n

(2.1.4)

D:

0

aj xj →

1

jaj xj−1 .

There is no limiting process involved and the deﬁnition is valid for arbitrary ﬁeld F. e. Differentiation on TN :

N N

(2.1.5)

D:

−N

an einx →

−N

in an einx .

There is no limiting process involved. f. Differentiation on C ∞ [0, 1], the complex vector space of inﬁnitely differentiable complex-valued functions on [0, 1]: (2.1.6) D: f → f = df . dx

g. If V = W ⊕ U every v ∈ V has a unique representation v = w + u with w ∈ W, u ∈ U. The map π 1 : v → w is the identity on W and maps U to {0}. It is called the projection of V on W along U. The operator π 1 is linear since, if v = w + u and v1 = w1 + u1 , then av + bv1 = (aw + bw1 ) + (au + bu1 ), and π 1 (av + bv1 ) = aπ 1 v + bπ 1 v1 . Similarly, π 2 : v → u is called the projection of V on U along W. π 1 and π 2 are referred to as the projections corresponding to the direct sum decomposion.

JANUARY 1, 2006

—DRAFT—

II.

L INEAR OPERATORS AND MATRICES

25

2.1.2 We denote the space of all linear maps from V into W by L(V, W). Another common notation is HOM (V, W). The two most important cases in what follows are: W = V, and W = F, the ﬁeld of scalars. When W = V we write L(V) instead of L(V, V). When W is the underlying ﬁeld, we refer to the linear maps as linear functionals or linear forms on V. Instead of L(V, F) we write V ∗ , and refer to it as the dual space of V. 2.1.3 If T ∈ L(V, W) is bijective, it is invertible, and the inverse map T −1 is linear from W onto V. This is seen as follows: by (2.1.1), (2.1.7) T −1 (a1 T v1 + a2 T v2 ) =T −1 (T (a1 v1 + a2 v2 )) = a1 v1 + a2 v2 = a1 T −1 (T v1 ) + a2 T −1 (T v2 ),

and, as T is surjective, T vj are arbitrary vectors in W. Recall (see 1.1.2) that an isomorphism of vector spaces, V and W is a bijective linear map T : V → W. An isomorphism of a space onto itself is called an automorphism. V and W are isomorphic if there is an isomorphism of the one onto the other. The relation is clearly reﬂexive and, by the previous paragraph, symmetric. Since the concatenation (see 2.2.1) of isomorphisms is an isomorphism, the relation is also transitive and so is an equivalence relation. The image of a basis under an isomorphism is a basis, see exercise II.1.2; it follows that the dimension is an isomorphism invariant. If V is a ﬁnite dimensional vector space over F, every basis v = {v1 , . . . , vn } of V deﬁnes an isomorphism Cv of V onto Fn by:

(2.1.8)

Cv : v =

aj vj →

a1 . . . an

=

aj ej .

Cv v is the coordinate vector of v relative to the basis v. Notice that this is a special case of example a. above: we map the basis elements vj on the corresponding elements ej of the standard basis, and extend by linearity. If V and W are both n-dimensional, with bases v = {v1 , . . . , vn }, and w = {w1 , . . . , wn } respectively, the map T : aj vj → aj wj is an isomor—DRAFT—

JANUARY 1, 2006

26

L INEAR A LGEBRA

phism. This shows that the dimension is a complete invariant: ﬁnite dimensional vector spaces over F are isomorphic if, and only if, they have the same dimension. 2.1.4 The sum of linear maps T, S ∈ L(V, W), and the multiple of a linear map by a scalar are deﬁned by: for every v ∈ V, (2.1.9) (T + S)v = T v + Sv, (aT )v = a(T v).

Observe that (T + S) and aT , as deﬁned, are linear maps from V to W, i.e., elements of L(V, W). Proposition. Let V and W be vector spaces over F. Then, with the addition and multiplication by a scalar deﬁned by (2.1.9), L(V, W) is a vector space deﬁned over F. If both V and W are ﬁnite dimensional, then so is L(V, W), and dim L(V, W) = dim V dim W. P ROOF : The proof that L(V, W) is a vector space over F is straightforward checking, left to the reader. The statement about the dimension is exercise II.1.3 below.

EXERCISES FOR SECTION 2.1

II.1.1. Show that if A is linearly dependent in V and T ∈ L(V, W), then T A is linearly dependent in W. II.1.2. Prove that an injective map T ∈ L(V, W) is an isomorphism if, and only if, it maps some basis of V onto a basis of W, and this is the case if, and only if, it maps every basis of V onto a basis of W. II.1.3. Let V and W be ﬁnite dimensional with bases v = {v1 , . . . , vn } and w = {w1 , . . . , wm } respectively. Let ϕij ∈ L(V, W) be deﬁned by ϕij vi = wj and ϕij vk = 0 for k = i. Prove that {ϕij : 1 ≤ i ≤ n, 1 ≤ j ≤ m} is a basis for L(V, W). JANUARY 1, 2006

—DRAFT—

II.

L INEAR OPERATORS AND MATRICES

27

2.2

OPERATOR MULTIPLICATION

2.2.1 For T ∈ L(V, W) and S ∈ L(W, U) we deﬁne ST ∈ L(V, U) by concatenation, that is: (ST )v = S(T v). ST is a linear operator since (2.2.1) ST (a1 v1 + a2 v2 ) = S(a1 T v1 + a2 T v2 ) = a1 ST v1 + a2 ST v2 .

In particular, if V = W = U, we have T , S, and T S all in L(V). Proposition. With the product ST deﬁned above, L(V) is an algebra over F. P ROOF : The claim is that the product is associative and, with the addition deﬁned by (2.1.9) above, distributive. This is straightforward checking, left to the reader. The algebra L(V) is not commutative unless dim V = 1, in which case it is simply the underlying ﬁeld. The set of automorphisms, i.e., invertible elements in L(V) is a group under multiplication, denoted GL(V). 2.2.2 Given an operator T ∈ L(V) the powers T j of T are well deﬁned for all j ≥ 1, and we deﬁne T 0 = I. Since we can take linear combinations of the powers of T we have P (T ) well deﬁned for all polynomials P ∈ F[x]. We denote (2.2.2) P(T ) = {P (T ) : P ∈ F[x]}.

P(T ) will be the main tool in understanding the way in which T acts on V. EXERCISES FOR SECTION 2.2

II.2.1. Prove that P(T ) is a commutative subalgebra of L(V). II.2.2. For T ∈ L(V) denote comm[T ] = {S : S ∈ L(V), ST = T S}, the set of operators that commute with T . Prove that comm[T ] is a subalgebra of L(V). II.2.3. Verify that GL(V) is in fact a group.

—DRAFT—

JANUARY 1, 2006

28

L INEAR A LGEBRA

2.3

MATRIX MULTIPLICATION.

2.3.1

We deﬁne the product of a 1 × n matrix (row) r = (a1 , . . . , an ) and an

b1 bn

. n × 1 matrix (column) c = . , to be the scalar given by .

(2.3.1)

r·c=

aj bj ,

Given A ∈ M(l, m) and B ∈ M(m, n), we deﬁne the product AB as the l × n matrix C whose entries cij are given by (2.3.2) cij = ri (A) · cj (B) =

k

aik bkj

(ri (A) denotes the i’th row in A, and cj (B) denotes the j’th column in B). Notice that the product is deﬁned only when the number of columns in A (the length of the row) is the same as the number of rows in B, (the height of the column).

The product is associative: given A ∈ M(l, m), B ∈ M(m, n), and C ∈ M(n, p), then AB ∈ M(l, n) and (AB)C ∈ M(l, p) is well deﬁned. Similarly, A(BC) is well deﬁned and one checks that A(BC) = (AB)C by verifying that the r, s entry in either is i,j arj bji cis . The product is distributive: for Aj ∈ M(l, m), Bj ∈ M(m, n),

(2.3.3) (A1 + A2 )(B1 + B2 ) = A1 B1 + A1 B2 + A2 B1 + A2 B2 ,

and commutes with multiplication by scalars:A(aB) = aAB. Proposition. The map (A, B) → AB, of M(l, m) × M(m, n) to M(l, n), is linear in B for every ﬁxed A, and in A for every ﬁxed B. P ROOF : The statement just summarizes the properties of the multiplication discussed above.

JANUARY 1, 2006

—DRAFT—

II.

L INEAR OPERATORS AND MATRICES

29

2.3.2

Write the n × m matrix aij

a11 a21 . . . an1 ... ... ... ...

1≤i≤n 1≤j≤m

as a “single column of rows”,

a1m r1 r2 a2m = . . . anm rn

a11 a1m a21 a2m . = . . anm an1

... ... . . . ...

where ri = ai,1 . . .

ai,m ∈ Fm . Notice that if x1 , . . . , xn ∈ Fn , then r r

a11 a21 . . . an1 ... ... ... ... a1m a2m . = x1 , . . . , xn . . anm r1 r2 . = . . rn

n

(2.3.4)

x1 , . . . , xn

xi ri .

i=1

Similarly, writing the matrix as a “single row of columns”,

a11 a21 . . . an1 ... ... ... ... a1m a11 a12 a1m a2m a21 a22 a2m . = . . . . . . = c1 , c2 , . . . , cm . . . . . . . . anm an1 an2 anm

we have

(2.3.5) a11 a21 . . . an1 ... ... ... ... a1m y1 y2 a2m . . = c1 , c2 , . . . , cm . . . . anm ym y1 y2 . = . . ym

m

y j cj .

j=1

2.3.3

If l = m = n matrix multiplication is a product within M(n).

Proposition. With the multiplication deﬁned above, M(n) is an algebra over F. The matrix I = In = (δj,k ) = n eii is the identity1 element in M(n). 1 The invertible elements in M(n), aka the non-singular matrices, form a group under multiplication, the general linear group GL(n, F). Theorem. A matrix A ∈ M(n) is invertible if, and only if its rank is n.

1

δj,k is the Kronecker delta, equal to 1 if j = k, and to 0 otherwise.

—DRAFT—

JANUARY 1, 2006

30

L INEAR A LGEBRA

P ROOF : Exercise II.3.2 below (or equation (2.3.4)) give that the row rank of BA is no bigger than the row rank of A. If BA = I, the row rank of A is at least the row rank of I, which is clearly n. On the other hand, if A is row equivalent to I, then its row echelon form is I, and by Exercise II.3.10 below, reduction to row echelon form amounts to multiplication on the left by a matrix, so that A has a left inverse. This implies, see Exercise II.3.12, that A is invertible.

EXERCISES FOR SECTION 2.3

II.3.1. Let r be the 1 × n matrix all whose entries are 1, and c the n × 1 matrix all whose entries are 1. Compute rc and cr. II.3.2. Prove that each of the columns of the matrix AB is a linear combinations of the columns of A, and that each row of AB is a linear combination of the rows of B. II.3.3. Prove: If A is a diagonal matrix with distinct entries on the diagonal, and if B is a matrix such that AB = BA, then B is diagonal. II.3.4. Denote by Ξ(n; i, j), 1 ≤ i, j ≤ n, the n × n matrix k=i,j ekk + eij + eji (the entries ξlk are all zero except for ξij = ξji = 1, and ξkk = 1 if k = i, j. This is the matrix obtained from the identity by interchanging rows i and j. Let A ∈ M(n, m) and B ∈ M(m, n). Describe Ξ(n; i, j)A and BΞ(n; i, j). II.3.5. Let σ be a permutation of [1, . . . , n]. Let Aσ be the n×n matrix whose entries aij are deﬁned by (2.3.6) aij = 1 if i = σ(j) 0 otherwise.

Let B ∈ M(n, m) and C ∈ M(m, n). Describe Aσ B and CAσ . II.3.6. A matrix whose entries are either zero or one, with precisely one non-zero entry in each row and in each column is called a permutation matrix. Show that the matrix Aσ described in the previous exercise is a permutation matrix and that every permutation matrix is equal to Aσ for some σ ∈ Sn . II.3.7. Show that the map σ → Aσ deﬁned above is multiplicative: Aστ = Aσ Aτ . (στ is deﬁned by concatenation: στ (j) = σ(τ (j)) for all j ∈ [1, n].) JANUARY 1, 2006

—DRAFT—

II.

L INEAR OPERATORS AND MATRICES

31

II.3.8. Denote by eij , 1 ≤ i, j ≤ n, the n × n matrix whose entries are all zero except for the ij entry which is 1. With A ∈ M(n, m) and B ∈ M(m, n). Describe eij A and Beij . II.3.9. Describe an n × n matrix A(c, i, j) such that multiplying on the appropriate side, an n × n matrix B by it, has the effect of replacing the i’th row in B by the sum of the i’th row and c times the j’th row. Do the same for columns. II.3.10. Show that each of the steps in the reduction of a matrix A to its row-echelon form (see 1.3.4) can be accomplished by left multiplication of A by an appropriate matrix, so that the entire reduction to row-echelon form can be accomplished by left multiplication by an appropriate matrix. Conclude that if the row rank of A ∈ M(n) is n, then A is left-invertible. II.3.11. Let A ∈ M(n) be non-singular and let B = (A, I), the matrix obtained by “augmenting” A by the identity matrix, that is by adding to A the columns of I in their given order as columns n + 1, . . . , 2n. Show that the matrix obtained by reducing B to row echelon form is (I, A−1 ). II.3.12. Prove that if A ∈ M(n, m) and B ∈ M(m, l) then (AB)Tr = B Tr ATr . Show that if A ∈ M(n) has a left inverse then ATr has a right inverse and if A has a right inverse then ATr has a left inverse. Use the fact that A and ATr have the same rank to show that if A has a left inverse B it also has a right inverse C and since B = B(AC) = (BA)C = C, we have BA = AB = I and A has an inverse. Where does the fact that we deal with ﬁnite dimensional spaces enter the proof? II.3.13. What are the ranks and the inverses (when they exist) of the matrices 1 1 1 1 1 1 1 1 1 1 0 2 1 0 0 1 1 1 1 0 2 2 1 1 1 1 7 1 , (2.3.7) 0 0 1 1 1 . 2 1 2 1 2 , 2 2 2 2 0 5 0 9 1 0 0 0 1 1 0 5 0 0 0 5 0 0 7 0 0 0 0 1 II.3.14. Denote An = 2.4 1 n . Prove that Am An = Am+n for all integers m, n. 0 1 MATRICES AND OPERATORS.

2.4.1 Recall that we write the elements of Fn as columns. A matrix A in M(m, n) deﬁnes, by multiplication on the left, an operator TA from Fn to Fm . —DRAFT—

JANUARY 1, 2006

32

L INEAR A LGEBRA

The columns of A are the images, under TA , of the standard basis vectors of Fn (see (2.3.5)). Conversly, given T ∈ L(Fn , Fm ), if we take A = AT to be the m × n matrix whose columns are T ej , where {e1 , . . . , en } is the standard basis in Fn , we have TA = T . Finally we observe that by Proposition 2.3.1 the map A → TA is linear. This proves: Theorem. There is a 1-1 linear correspondence T ↔ AT between L(Fn , Fm ) and M(m, n) such that T ∈ L(Fn , Fm ) is obtained as a left multiplication by the m × n matrix, AT . 2.4.2 If T ∈ L(Fn , Fm ) and S ∈ L(Fm , Fl ) and AT ∈ M(m, n), resp. AS ∈ M(l, m) are the corresponding matrices, then ST ∈ L(Fn , Fl ), AS AT ∈ M(l, n), and AST = AS AT . In particular, if n = m = l, we obtain Theorem. The map T ↔ AT is an algebra isomorphism between L(Fn ) and M(n). 2.4.3 The special thing about Fn is that it has a “standard basis”. The correspondence T ↔ AT (or A ↔ TA ) uses the standard basis implicitly. Consider now general ﬁnite dimensional vector spaces V and W. Let T ∈ L(V, W) and let v = {v1 , . . . , vn } be a basis for V. As mentioned earlier, the images {T v1 , . . . , T vn } of the basis elements determine T completely. In fact, expanding any vector v ∈ V as v = cj vj , we must have T v = cj T vj . On the other hand, given any vectors yj ∈ W, j = 1, . . . , n we obtain an element T ∈ L(V, W) by declaring that T vj = yj for j = 1, . . . , n, and (necessarily) T ( aj vj ) = aj yj . Thus, the choice of a basis in V determines a 1-1 correspondence between the elements of L(V, W) and n-tuples of vectors in W. 2.4.4 If w = {w1 , . . . , wm } is a basis for W, and T vj = for any vector v = cj vj , we have (2.4.1) Tv = cj T vj =

j k m k=1 tk,j wk ,

then,

cj tk,j wk =

k j

cj tk,j wk .

JANUARY 1, 2006

—DRAFT—

II.

L INEAR OPERATORS AND MATRICES

33

Given the bases {v1 , . . . , vn } and {w1 , . . . , wm }, the full information about T is contained in the matrix t11 t21 = . . . tm1

(2.4.2)

AT,v,w

... ...

... . . . tmn

t1n t2n . = (Cw T v1 , . . . , Cw T vn ). . .

The “coordinates operators”, Cw , assign to each vector in W the column of its coordinates with respect to the basis w, see (2.1.8). When W = V and w = v we write AT,v instead of AT,v,v . Given the bases v and w, and the matrix AT,v,w , the operator T is explicitly deﬁned by (2.4.1) or equivalently by (2.4.3) Cw T v = AT,v,w Cv v.

Let A ∈ M(m, n), and denote by Sv the vector in W whose coordinates with respect to w are given by the column A Cv v. So deﬁned, S is clearly a linear operator in L(V, W) and AS,v,w = A. This gives: Theorem. Given the vector spaces V and W with bases v = {v1 , . . . , vn } and w = {w1 , . . . , wm } repectively, the map T → AT,v,w is a bijection of L(V, W) onto M(m, n). 2.4.5 C HANGE OF BASIS . Assume now that W = V, and that v and w are arbitrary bases. The v-coordinates of a vector v are given by Cv v and the w-coordinates of v by Cw v. If we are given the v-coordinates of a vector v, say x = Cv v, and we need the w-coordinates of v, we observe that v = C-1 x, v and hence Cw v = Cw C-1 x. In other words, the operator v (2.4.4) Cw,v = Cw C-1 v

on Fn assigns to the v-coordinates of a vector v ∈ V its w-coordinates. The factor C-1 identiﬁes the vector from its v-coordinates, and Cw assigns to the v identiﬁed vector its w-coordinates; the space V remains in the background. Notice that C-1 = Cw,v v,w —DRAFT—

JANUARY 1, 2006

34

L INEAR A LGEBRA

Suppose that we have the matrix AT,w of an operator T ∈ L(V) relative to a basis w, and we need to have the matrix AT,v of the same operator T , but relative to a basis v. (Much of the work in linear algebra revolves around ﬁnding a basis relative to which the matrix of a given operator is as simple as possible—a simple matrix is one that sheds light on the structure, or properties, of the operator.) Claim: (2.4.5) AT,v = Cv,w AT,w Cw,v ,

Cw,v assigns to the v-coordinates of a vector v ∈ V its w-coordinates; AT,w replaces the w-coordinates of v by those of T v; Cv,w identiﬁes T v from its w-coordinates, and produces its v-coordinates. 2.4.6 How special are the matrices (operators) Cw,v ? They are clearly nonsingular, and that is a complete characterization. Proposition. Given a basis w = {w1 , . . . , wn } of V, the map v → Cw,v is a bijection of the set of bases v of V onto GL(n, F). P ROOF : Injectivity: Since Cw is non-singular, the equality Cw,v1 = Cw,v2 implies C-11 = C-12 , and since C-11 maps the elements of the standard basis v v v of Fn onto the corresponding elements in v1 , and C-12 maps the same vectors v onto the corresponding elements in v2 , we have v1 = v2 . Surjectivity: Let S ∈ GL(n, F) be arbitrary. We shall exhibit a base v such that S = Cw,v . By deﬁnition, Cw wj = ej , (recall that {e1 , . . . , en } is the standard basis for Fn ). Deﬁne the vectors vj by the condition: Cw vj = Sej , that is, vj is the vector whose w-coordinates are given by the j’th column of S. As S is non-singular the vj ’s are linearly independent, hence form a basis v of V. For all j we have vj = C-1 ej and Cw,v ej = Cw vj = Sej . This proves v that S = Cw,v 2.4.7 S IMILARITY. The matrices B1 and B2 are said to be similar if they represent the same operator T in terms of (possibly) different bases, that is, B1 = AT,v and B2 = AT,w . If B1 and B2 are similar, they are related by (2.4.5). By Proposition 2.4.6 we have

JANUARY 1, 2006

—DRAFT—

II.

L INEAR OPERATORS AND MATRICES

35

Proposition. The Matrices B1 and B2 are similar if, and only if there exists C ∈ GL(n, F) such that (2.4.6) B1 = CB2 C −1 .

We shall see later (see exercise V.6.3) that if there exists such C with entries in some ﬁeld extension of F, then one exists in M(n, F). 2.4.8 The operators S, T ∈ L(V) are said to be similar if there is an operator R ∈ GL(V) such that (2.4.7) T = RSR−1 .

EXERCISES FOR SECTION 2.4

II.4.1. Prove that S, T ∈ L(V) are similar if, and only if, their matrices (relative to any basis) are similar. An equivalent condition is: for any basis w there is a basis v such that AT,v = AS,w . II.4.2. Let Fn [x] be the space of polynomials 0 aj xj . Let D be the differentiation operator and T = 2D + I. a. What is the matrix corresponding to T relative to the basis {xj }n ? j=0 n l n b. Verify that, if uj = l=j x , then {uj }j=0 is a basis, and ﬁnd the matrix corresponding to T relative to this basis. II.4.3. Prove that if A ∈ M(l, m), the map T : B → AB is a linear operator M(m, n) → M(l, n). In particular, if n = 1, M(m, 1) = Fm and M(l, 1) = Fl and c c T ∈ L(Fm , Fl ). What is the relation between A and the matrix AT deﬁned in 2.4.3 c c (for the standard bases, and with n there replaced here by l)? 2.5 KERNEL, RANGE, NULLITY, AND RANK

n

2.5.1

D EFINITION : The kernel of an operator T ∈ L(V, W) is the set ker(T ) = {v ∈ V : T v = 0}.

The range of T is the set range(T ) = T V = {w ∈ W : w = T v for some v ∈ V}. The kernel is also called the nullspace of T . —DRAFT—

JANUARY 1, 2006

36

L INEAR A LGEBRA

Proposition. Assume T ∈ L(V, W). Then ker(T ) is a subspace of V, and range(T ) is a subspace of W. P ROOF : If v1 , v2 ∈ ker(T ) then T (a1 v1 + a2 v2 ) = a1 T v1 + a2 T v2 = 0. If vj = T uj then a1 v1 + a2 v2 = T (a1 u1 + a2 u2 ). If V is ﬁnite dimensional and T ∈ L(V, W) then both ker(T ) and range(T ) are ﬁnite dimensional; the ﬁrst since it is a subspace of a ﬁnite dimensional space, the second as the image of one, (since, if {v1 , . . . , vn } is a basis for V, {T v1 , . . . , T vn } spans range(T )). We deﬁne the rank of T , denoted ρ(T), as the dimension of range(T ). We deﬁne the nullity of T , denoted ν(T), as the dimension of ker(T ). Theorem (Rank and nullity). Assume T ∈ L(V, W), V ﬁnite dimensional. (2.5.1) ρ(T) + ν(T) = dim V.

P ROOF : Let {v1 , . . . , vl } be a basis for ker(T ), l = ν(T), and extend it to a basis of V by adding {u1 , . . . , uk }. By 1.2.4 we have l+k = dim V. The theorem follows if we show that k = ρ(T). We do it by showing that {T u1 , . . . , T uk } is a basis for range(T ). Write any v ∈ V as l ai vi + k bi ui . Since T vi = 0, we have i=1 i=1 T v = k bi T ui , which shows that {T u1 , . . . , T uk } spans range(T ). i=1 We claim that {T u1 , . . . , T uk } is also independent. To show this, assume k k that k cj T uj = 0, then T j=1 cj uj = 0, that is j=1 cj uj ∈ ker(T ). j=1 k Since {v1 , . . . , vl } is a basis for ker(T ), we have j=1 cj uj = l dj vj for j=1 appropriate constants dj . But {v1 , . . . , vl } ∪ {u1 , . . . , uk } is independent, and we obtain cj = 0 for all j. The proof gives more than is claimed in the theorem. It shows that T can be “factored” as a product of two maps. The ﬁrst is the quotient map V → V/ ker(T ); vectors that are congruent modulo ker(T ) have the same image under T . The second, V/ ker(T ) → T V is an isomorphism. (This is the Homomorphism Theorem of groups in our context.)

JANUARY 1, 2006

—DRAFT—

II.

L INEAR OPERATORS AND MATRICES

37

2.5.2 The identity operator, deﬁned by Iv = v, is an identity element in the algebra L(V). The invertible elements in L(V) are the automorphisms of V, that is, the bijective linear maps. In the context of ﬁnite dimensional spaces, either injectivity (i.e. being 1-1) or surjectivity (onto) implies the other: Theorem. Let V be a ﬁnite dimensional vector space, T ∈ LV. Then (2.5.2) ker(T ) = {0} ⇐⇒ range(T ) = V,

and either condition is equivalent to: “T is invertible”, aka “nonsingular”. P ROOF : ker(T ) = {0} is equivalent to ν(T) = 0, and range(T ) = V is equivalent to ρ(T) = dim V. Now apply (2.5.1). 2.5.3 As another illustration of how the “rank and nullity” theorem can be used, consider the following statment (which can be seen directly as a consequence of exercise I.2.12) Theorem. Let V = V1 ⊕ V2 be ﬁnite dimensional, dim V1 = k. Let W ⊂ V be a subspace of dimension l > k. Then dim W ∩ V2 ≥ l − k. P ROOF : Denote by π 1 the restriction to W of the projection of V on V1 along V2 . Since the rank of π 1 is clearly ≤ k, the nullity is ≥ l − k. In other words, the kernel of this map, namely W ∩ V2 , has dimension ≥ l − k.

EXERCISES FOR SECTION 2.5

II.5.1. Assume T, S ∈ L(V). Prove that ν(ST) ≤ ν(S) + ν(T) . II.5.2. Give an example of two 2 × 2 matrices A and B such that ρ(AB) = 1 and ρ(BA) = 0. II.5.3. Given vector spaces V and W over the same ﬁeld. Let {vj }n ⊂ V and j=1 n {wj }j=1 ⊂ W. Prove that there exists a linear map T : span[v1 , . . . , vn ] → W such that T vj = wj , j = 1, . . . , n if, and only if, the following implication holds:

n n

If aj , j = 1 . . . , n are scalars, and

1

aj vj = 0,

then

1

aj wj = 0.

—DRAFT—

JANUARY 1, 2006

38

L INEAR A LGEBRA

Can the deﬁnition of T be extended to the entire V? II.5.4. What is the relationship of the previous exercise to Theorem 1.3.5? II.5.5. The operators T, S ∈ L(V) are called “equivalent” if there exist invertible A, B ∈ L(V) such that S = AT B (so that T = A−1 SB −1 ).

Prove that if V is ﬁnite dimensional then T, S are “equivalent” if, and only if ρ(S) = ρ(T) . II.5.6. Give an example of two operators on F3 that are equivalent but not similar. II.5.7. Assume T, S ∈ L(V). Prove that the following statements are equivalent: a. ker(S) ⊂ ker(T ), b. There exists R ∈ L(V) such that T = RS.

Hint: For the implication a. =⇒ b.: Choose a basis {v1 , . . . , vs } for ker(S). Expand

it to a basis for ker(T ) by adding {u1 , . . . , ut−s }, and expand further to a basis for V by adding the vectors {w1 , . . . , wn−t }. The sequence {Su1 , . . . , Sut−s } ∪ {Sw1 , . . . , Swn−t } is independent, so that R can be deﬁned arbitrarily on it (and extended by linearity to an operator on the entire space). Deﬁne R(Suj ) = 0, R(Swj ) = T wj . The other implication is obvious. II.5.8. Assume T, S ∈ L(V). Prove that the following statements are equivalent: a. range(S) ⊂ range(T ), b. There exists R ∈ L(V) such that S = T R.

Hint: Again, b. =⇒ a. is obvious.

For a. =⇒ b. Take a basis {v1 , . . . , vn } for V. Let uj , j = 1, . . . , n be such that T uj = Svj , (use assumption a.). Deﬁne Rvj = uj (and extend by linearity). II.5.9. Find bases for the null space, ker(A), matrix (acting on rows in R5 ) 1 0 0 5 0 1 0 −3 0 0 1 2 3 2 1 11 1 2 0 −1 JANUARY 1, 2006 and for the range, range(A), of the 9 2 1 . 32 13

—DRAFT—

II.

L INEAR OPERATORS AND MATRICES

39

II.5.10. Let T ∈ L(V ), l ∈ N. Prove: a. ker(T l ) ⊆ ker(T l+1 ); equality if, and only if range(T l ) ∩ ker(T ) = {0}. b. range(T l+1 ) ⊆ range(T l ); equality if, and only if, ker(T l+1 ) = ker(T l ). c. If ker(T l+1 ) = ker(T l ), then ker(T l+k+1 ) = ker(T l+k ) for all positive integers k. II.5.11. An operator T is idempotent if T 2 = T . Prove that an idempotent operator is a projection on range(T ) along ker(T ). 2.6 NORMED FINITE DIMENSIONAL LINEAR SPACES

2.6.1 A norm on a real or complex vector space V is a nonnegative function v → v that satisﬁes the conditions a. Positivity: 0 = 0 and if v = 0 then v > 0. b. Homogeneity: av = |a| v for scalars a and vectors v. c. The triangle inequality: v + u ≤ v + u . These properties guarantee that ρ(v, u) = v − u is a metric on the space, and with a metric one can use tools and notions from point-set topology such as limits, continuity, convergence, inﬁnite series, etc. A vector space endowed with a norm is a normed vector space. 2.6.2 If V and W are isomorphic real or complex n-dimensional spaces and S is an isomorphism of V onto W, then a norm · ∗ on W can be transported to V by deﬁning v = Sv ∗ . This implies that all possible norms on a real n-dimensional space are copies of norms on Rn , and all norms on a complex n-dimensional space are copies of norms on Cn . A ﬁnite dimensional V can be endowed with many different norms; yet, all these norms are equivalent in the following sense: D EFINITION : The norms · 1 and · 2 are equivalent, written: · if there is a positive constant C such that for all v ∈ V C −1 v

1 1

∼ ·

2

≤ v

2

≤C v

1

—DRAFT—

JANUARY 1, 2006

40

L INEAR A LGEBRA

The metrics ρ1 , ρ2 , deﬁned by equivalent norms, are equivalent: for v, u ∈ V C −1 ρ1 (v, u) ≤ ρ2 (v, u) ≤ Cρ1 (v, u). which means that they deﬁne the same topology—the familiar topology of Rn or Cn . 2.6.3 If V and W are normed vector spaces we deﬁne a norm on L(V, W) by writing, for T ∈ L(V, W), (2.6.1) Equivalently, (2.6.2) T = inf{C : T v ≤ C v for all v ∈ H}. T = max T v = max

v =1 v=0

Tv . v

To check that (2.6.1) deﬁnes a norm we observe that properties a. and b. are obvious, and that c. follows from2 (T + S)v ≤ T v + Sv ≤ T v + S v ≤( T + S ) v .

L(V) is an algebra and we observe that the norm deﬁned by (2.6.1) on L(V) is submultiplicative: we have ST v ≤ S T v ≤ S T v , where S, T ∈ L(V) and v ∈ V, which means (2.6.3) ST ≤ S T .

EXERCISES FOR SECTION 2.6

II.6.1. Let V be n-dimensional real or complex vector space, v = {v1 , . . . , vn } a basis for V. Write aj vj v,1 = |aj |, and aj vj v,∞ = max|aj |. Prove: a. (2.6.4)

2

·

v,1

and ·

v,∞

are norms on V, and ·

v,∞

≤ ·

v,1

≤n ·

v,∞

Notice that the norms appearing in the inequalities are the ones deﬁned on W, L(V, W), and V, respectively.

JANUARY 1, 2006

—DRAFT—

II.

L INEAR OPERATORS AND MATRICES

41

b. (2.6.5)

If · is any norm on V then, for all v ∈ V, v

v,1

max vj ≥ v .

II.6.2. Let · j , j = 1, 2, be norms on V, and ρj the induced metrics. Let {vn }∞ n=0 be a sequence in V and assume that ρ1 (vn , v0 ) → 0. Prove ρ2 (vn , v0 ) → 0. II.6.3. Let {vn }∞ be bounded in V. Prove that 0 vn z n converges for every z n=0 such that |z| < 1. Hint: Prove that the partial sums form a Cauchy sequence in the metric deﬁned by the norm. II.6.4. Let V be n-dimensional real or complex normed vector space. The unit ball in V is the set B1 = {v ∈ V : v ≤ 1}. Prove that B1 is convex: If v, u ∈ B1 , 0 ≤ a ≤ 1, then av + (1 − a)u ∈ B1 . Bounded: For every v ∈ V, there exist a (positive) constant λ such that cv ∈ B for / |c| > λ. Symmetric, centered at 0: If v ∈ B and |a| ≤ 1 then av ∈ B. II.6.5. Let V be n-dimensional real or complex vector space, and let B be a bounded symmetric convex set centered at 0. Deﬁne u = inf{a > 0 : a−1 u ∈ B}. Prove that this deﬁnes a norm on V, and the unit ball for this norm is the given B II.6.6. Describe a norm 1 while (1, 1, 1) 0 < 100 .

0 ∞

on R3 such that the standard unit vectors have norm 1

II.6.7. Let V be a normed linear space and T ∈ L(V). Prove that the set of vectors v ∈ V whose T -orbit, {T n v}, is bounded is a subspace of V.

—DRAFT—

JANUARY 1, 2006

42

L INEAR A LGEBRA

JANUARY 1, 2006

—DRAFT—

Chapter III

Duality of vector spaces

3.1

LINEAR FUNCTIONALS

Let V be a ﬁnite dimensional vector space with basis {v1 , . . . , vn }. Every element v ∈ V can be written, in exactly one way, as

n

(3.1.1)

v=

1

aj (v)vj ,

the notation aj (v) comes to emphasize the dependence of the coefﬁcients on the vector v. Let v = n aj (v)vj , and u = n aj (u)vj . If c, d ∈ F, then 1 1

n

cv + du =

1

(caj (v) + daj (u))vj

so that aj (cv + du) = caj (v) + daj (u). In other words, aj (v) are linear functionals on V. A standard notation for the image of a vector v under a linear functional v ∗ is (v, v ∗ ). Accordingly we denote the linear functionals corresponding to aj (v) ∗ by vj and write

n

(3.1.2)

∗ aj (v) = (v, vj ) so that v = 1

∗ (v, vj )vj .

∗ Proposition. The linear functionals vj , j = 1, . . . , n form a basis for the dual ∗. space V

43

44

L INEAR A LGEBRA

P ROOF : Let u∗ ∈ V ∗ . Write bj (u∗ ) = (vj , u∗ ), then for any v ∈ V, (v, u∗ ) = (

j ∗ (v, vj )vj , u∗ ) = j ∗ (v, vj )bj (u∗ ) = (v, ∗ bj (u∗ )vj ),

∗ ∗ ∗ and u∗ = bj (u∗ )vj . It follows that {v1 , . . . , vn } spans V ∗ . On the other ∗ ∗ ∗ ∗ hand, {v1 , . . . , vn } is independent since cj vj = 0 implies (vk , cj vj ) = ck = 0 for all k.

Corollary. dim V ∗ = dim V.

∗ The basis {vj }n , j = 1, . . . , n is called the dual basis of {v1 , . . . , vn }. It 1 is characterized by the condition

(3.1.3)

∗ (vj , vk ) = δj,k ,

δj,k is the Kronecker delta, it takes the value 1 if j = k, and 0 otherwise. 3.1.1 The way we add linear functionals or multiply them by scalars guarantees that the form (expression) (v, v ∗ ), v ∈ V and v ∗ ∈ V ∗ , is bilinear, that is linear in v for every ﬁxed v ∗ , and linear in v ∗ for any ﬁxed v. Thus every v ∈ V deﬁnes a linear functional on V ∗ . ∗ ∗ If {v1 , . . . , vn } is a basis for V, and {v1 , . . . , vn } the dual basis in V ∗ , then ∗ ∗ (3.1.3) identiﬁes {v1 , . . . , vn } as the dual basis of {v1 , . . . , vn }. The roles of V and V ∗ are perfectly symmetric and what we have is two spaces in duality, the duality between them deﬁned by the bilinear form (v, v ∗ ). (3.1.2) works in ∗ ∗ both directions, thus if {v1 , . . . , vn } and {v1 , . . . , vn } are dual bases, then for all v ∈ V and v ∗ ∈ V ∗ ,

n n ∗ (v, vj )vj , 1

(3.1.4)

v=

v∗ =

1

∗ (vj , v ∗ )vj .

Fn

The dual of Fn (i.e., Fn written as columns) can be identiﬁed with Fn (i.e., c r written as rows) and the pairing (v, v ∗ ) as the matrix product v ∗ v of the row v ∗ by the column v, (exercise III.1.4 below). The dual of the standard basis of Fn is the standard basis Fn . c r

JANUARY 1, 2006

—DRAFT—

III.

D UALITY OF VECTOR SPACES

45

3.1.2 A NNIHILATOR . Given a set A ⊂ V, the set of all the linear functionals v ∗ ∈ V ∗ that vanish identically on A is called the annihilator of A and denoted A⊥ . Clearly, A⊥ is a subspace of V ∗ . Functionals that annihilate A vanish on span[A] as well, and functionals that annihilate span[A] clearly vanish on A; hence A⊥ = (span[A])⊥ .

⊥ Proposition. Let V1 ⊂ V be a subspace, then dim V1 + dim V1 = dim V.

P ROOF : Let {v1 , . . . , vm } be a basis for V1 , and let {vm+1 , . . . , vn } complete ∗ ∗ it to a basis for V. Let {v1 , . . . , vn } be the dual basis. ⊥ ⊥ ∗ ∗ We claim, that {vm+1 , . . . , vn } is a basis for V1 ; hence dim V1 = n − m proving the proposition. ∗ ∗ ⊥ By (3.1.3) we have {vm+1 , . . . , vn } ⊂ V1 , and we know these vectors to ⊥ be independent. We only need to prove that they span V1 . ∗ ⊥ Let w∗ ∈ V1 , Write w∗ = n aj vj , and observe that aj = (vj , w∗ ). j=1 ∗ ⊥ Now w∗ ∈ V1 implies aj = 0 for 1 ≤ j ≤ m, so that w∗ = n aj vj . m+1 Theorem. Let A ⊂ V, v ∈ V and assume that (v, u∗ ) = 0 for every u∗ ∈ A⊥ . Then v ∈ span[A]. Equivalent statement: If v ∈ span[A] then there exists u∗ ∈ A⊥ such that / (v, u∗ ) = 0. P ROOF : If v ∈ span[A], then dim span[A, v] = dim span[A] + 1, hence / dim span[A, v]⊥ = dim span[A]⊥ −1. It follows that span[A]⊥ span[A, v]⊥ , and since functionals in A⊥ which annihilate v annihilate span[A, v], there exist functionals in A⊥ that do not annihilate v. 3.1.3 Let V be a ﬁnite dimensional vector space and V1 ⊂ V a subspace. Restricting the domain of a linear functional in V ∗ to V1 deﬁnes a linear functional on V1 . The functionals whose restriction to V1 is zero are, by deﬁnition, the el⊥ ements of V1 . The restrictions of v ∗ and u∗ to V1 are equal if, and only if, ⊥ v ∗ − u∗ ∈ V1 . This, combined with exercise III.1.2 below, gives a natural ∗ ⊥ identiﬁcation of V1 with the quotient space V ∗ /V1 . —DRAFT—

JANUARY 1, 2006

46

L INEAR A LGEBRA

EXERCISES FOR SECTION 3.1

III.1.1. Given a linearly independent {v1 , . . . , vk } ⊂ V and scalars {aj }k . Prove j=1 ∗ ∗ ∗ that there exists v ∈ V such that (vj , v ) = aj for 1 ≤ j ≤ k. III.1.2. If V1 is a subspace of a ﬁnite dimensional space V then every linear functional on V1 is the restriction to V1 of a linear functional on V. III.1.3. Let V be a ﬁnite dimensional vector space, V1 ⊂ V a subspace. Let ∗ r ⊥ ⊥ {uk }k=1 ⊂ V ∗ be linearly independent mod V1 (i.e., if ck u∗ ∈ V1 , then ck = 0, k ∗ ⊥ ∗ k = 1, . . . , r). Let {vj }s ⊂ V1 , be independent. Prove that {u∗ } ∪ {vj } is linearly j=1 k independent in V ∗ . III.1.4. Show that every linear functional on Fn is given by some (a1 , . . . , an ) ∈ Fn c r as

x1 x1

. . . → (a1 , . . . , an ) . = . .

xn xn

aj xj

III.1.5. Let V and W be ﬁnite dimensional vector spaces. a. Prove that for every v ∈ V and w∗ ∈ W ∗ the map ϕv,w∗ : T → (T v, w∗ ) is a linear functional on L(V, W). b. Prove that the map v ⊗ w∗ → ϕv,w∗ is an isomorphism of V ⊗ W ∗ onto the dual space of L(V, W).

∗ III.1.6. Let V be a complex vector space, {vj }s ⊂ V ∗ , and w∗ ∈ V ∗ such that for j=1 all v ∈ V, ∗ | v, w∗ | ≤ maxs | v, vj |. j=1 ∗ Prove that w∗ ∈ span[{vj }s ]. j=1

III.1.7. Linear functionals on RN [x]: 1. Show that for every x ∈ R the map ϕx deﬁned by (P, ϕx ) = P (x) is a linear functional on RN [x]. 2. If {x1 , . . . , xm } are distinct and m ≤ N + 1, then ϕxj are linearly independent. 3. For every x ∈ R and l ∈ N, l ≤ N , the map ϕx deﬁned by (P, ϕx ) = P (l) (x) is a (non-trivial) linear functional on RN [x]. JANUARY 1, 2006

(l) (l)

—DRAFT—

III.

D UALITY OF VECTOR SPACES

47

III.1.8. Let xj ∈ R, lj ∈ N, and assume that the pairs (xj , lj ), j = 1, . . . , N + 1, are distinct. Denote by #(m) the number of such pairs with lj > m. (l ) a. Prove that a necessary condition for the functionals ϕxjj to be independent on RN [x] is: (3.1.5) for every m ≤ N ,

(1)

#(m) ≤ N − m.

b. Check that ϕ1 , ϕ−1 , and ϕ0 are linearly dependent in the dual of R2 [x], hence (1) (3.1.5) is not sufﬁcient. Are ϕ1 , ϕ−1 , and ϕ0 linearly dependent in the dual of R3 [x]? 3.2 THE ADJOINT

3.2.1 The concatenation of T ∈ L(V, W), and w∗ ∈ W ∗ , is a linear map from V to the underlying ﬁeld, i.e. a linear functional v ∗ on V. With T ﬁxed, the mapping w∗ → w∗ T is a linear operator T ∗ ∈ L(W ∗ , V ∗ ). It is called the adjoint of T . The basic relationship between T, T ∗ , and the bilinear forms (v, v ∗ ) and (w, w∗ ) is: For all v ∈ V and w∗ ∈ W ∗ , (3.2.1) (T v, w∗ ) = (v, T ∗ w∗ ).

w∗ T

Notice that the left-hand side is the bilinear form on (W, W ∗ ), while the righthand side in (V, V ∗ ). 3.2.2 Proposition. (3.2.2) ρ(T* ) = ρ(T) .

P ROOF : Let T ∈ L(V, W), assume ρ(T) = r, and let {v1 , . . . , vn } be a basis for V such that {vr+1 , . . . , vn } is a basis for ker(T ). We have seen (see the proof of theorem 2.5.1) that {T v1 , . . . , T vr } is a basis for T V = range(T ). Denote wj = T vj , j = 1, . . . , r. Add the vectors wj , j = r + 1, . . . , m so ∗ ∗ that {w1 , . . . , wm } be a basis for W. Let {w1 , . . . , wm } be the dual basis. ∗ ∗ Fix k > r; for every j ≤ r we have (vj , T ∗ wk ) = (wj , wk ) = 0 which ∗ ∗ means T ∗ wk = 0. Thus T ∗ W ∗ is spanned by {T ∗ wj }r . j=1 ∗ w ∗ ) = (w , w ∗ ) = δ , which implies that For 1 ≤ i, j ≤ r, (vi , T j i i,j j ∗ {T ∗ wj }r is linearly independent in V ∗ . j=1 ∗ ∗ Thus, {T ∗ w1 , . . . , T ∗ wr } is a basis for T ∗ W ∗ , and ρ(T* ) = ρ(T). —DRAFT—

JANUARY 1, 2006

48

L INEAR A LGEBRA

3.2.3 We have seen in 3.1.1 that if V = Fn , W = Fm , both with standard c c bases, then V ∗ = Fn , W ∗ = Fm , and the standard basis of Fm is the dual basis r r r of the standard basis of Fn . c

t11 . . . t1n . . If A = AT = . . . . . , is the matrix of T with respect to the stan. . tm1 . . . tmn

dard bases, then the operator T is given as left multiplication by A on Fn and c the bilinear form (T v, w), for w ∈ Fm and v ∈ Fn , is just the matrix product r c (3.2.3) w(Av) = (wA)v.

It follows that T ∗ w = wAT , that is, the action of T ∗ on the row vectors in Fm r is obtained as multiplication on the right by the same matrix A = AT . If we want1 to have the matrix of T ∗ relative to the standard bases in Fn c and Fm , acting on columns by left multiplication, all we need to do is transpose c wA and obtain T ∗ wTr = ATr wTr . 3.2.4 Proposition. Let T ∈ L(V, W). Then (3.2.4) range(T )⊥ = ker(T ∗ ) and range(T ∗ )⊥ = ker(T ).

P ROOF : w∗ ∈ range(T )⊥ is equivalent to (T v, w∗ ) = (v, T ∗ w∗ ) = 0 for all v ∈ V, and (v, T ∗ w∗ ) = 0 for all v ∈ V is equivalent to T ∗ w∗ = 0. The condition v ∈ range(T ∗ )⊥ is equivalent to (v, T ∗ w∗ ) = 0 for all ∗ ∈ W ∗ , and T v = 0 is equivalent to (T v, w ∗ ) = (v, T ∗ w ∗ ) = 0 i.e. w v ∈ range(T ∗ )⊥ .

EXERCISES FOR SECTION 3.2

This will be the case when there is a natural way to identify the vector space with its dual, for instance when we work with inner product spaces. If the “identiﬁcation” is sesquilinear, as is the case when F = C the matrix for the adjoint is the complex conjugate of ATr , , see Chapter VI.

1

JANUARY 1, 2006

—DRAFT—

III.

D UALITY OF VECTOR SPACES

49

If V = W ⊕ U and S is the projection of V on W along U (see 2.1.1.g), what is the adjoint S ∗ ? III.2.1. III.2.2. Let A ∈ M(m, n; R). Prove

ρ(ATr A) = ρ(A)

III.2.3.

∗ Prove that, in the notation of 3.2.2, {wj }j=r+1,...,m is a basis for

ker(T ∗ ).

III.2.4. A vector v ∈ V is an eigenvector for T ∈ L(V) if T v = λv with

λ ∈ F; λ is the corresponding eigenvalue. Let v ∈ V be an eigenvector of T with eigenvalue λ, and w ∈ V ∗ an eigenvector of the adjoint T ∗ with eigenvalue λ∗ = λ. Prove that (v, w∗ ) = 0.

—DRAFT—

JANUARY 1, 2006

50

L INEAR A LGEBRA

JANUARY 1, 2006

—DRAFT—

Chapter IV

Determinants

4.1

PERMUTATIONS

A permutation of a set is a bijective, that is 1-1, map of the set onto itself. The set of permutations of the set [1, . . . , n] is denoted Sn . It is a group under concatenation—given σ, τ ∈ Sn deﬁne τ σ by (τ σ)(j) = τ (σ(j)) for all j. The identity element of Sn is the trivial permutation e deﬁned by e(j) = j for all j. Sn with this operation is called the symmetric group on [1, . . . , n]. 4.1.1 If σ ∈ Sn and a ∈ [1, . . . , n] the set {σ k (a)}, is called the σ-orbit of a. If σa = a the orbit is trivial, i.e., reduced to a single point (which is left unmoved by σ). A permutation σ is called a cycle, and denoted (a1 , . . . , al ), if {aj }l j=1 is its unique nontrivial orbit, aj+1 = σ(aj ) for 1 ≤ j < l, and a1 = σal . The length of the cycle, l, is the period of a1 under σ, that is, the ﬁrst positive integer such that σ l (a1 ) = a1 . Observe that σ is determined by the cyclic order of the entries, thus (a1 , . . . , al ) = (al , a1 , . . . , al−1 ). Given σ ∈ Sn , the σ-orbits form a partition of [1, . . . , n], the corresponding cycles commute, and their product is σ. Cycles of length 2 are called transpositions. Lemma. Every permutation σ ∈ Sn is a product of transpositions. P ROOF : Since every σ ∈ Sn is a product of cycles, it sufﬁces to show that every cycle is a product of transpositions. Observe that (a1 , . . . , al ) = (al , a1 , a2 , . . . , al−1 ) = (a1 , a2 )(a2 , a3 ) · · · (al−1 , al ) 51

52

L INEAR A LGEBRA

(al trades places with al−1 , then with al−2 , etc., until it settles in place of a1 ; every other aj moves once, to the original place of aj+1 ). Thus, every cycle of length l is a product of l − 1 transpositions. Another useful observation concerns conjugation in Sn . If σ, τ ∈ Sn , and τ (i) = j then τ σ −1 maps σ(i) to j and στ σ −1 maps σ(i) to σ(j). This means that the cycles of στ σ −1 are obtained from the cycles of τ by replacing the entries there by their σ images. In particular, all cycles of a given length are conjugate in Sn . 4.1.2 T HE SIGN OF A PERMUTATION . There are several equivalent ways to deﬁne the sign of a permutation σ ∈ Sn . The sign, denoted sgn [σ], is to take the values ±1, assign the value −1 to each transposition, and be multiplicative: sgn [στ ] = sgn [σ] sgn [τ ], in other words, be a homomorphism of Sn onto the multiplicative group {1, −1}. All these requirements imply that if σ can be written as a product of k transpositions, then sgn [σ] = (−1)k . But in order to use this as the deﬁnition of sgn one needs to prove that the numbers of factors in all the representations of any σ ∈ Sn as products of transpositions have the same parity. Also, ﬁnding the value of sgn [σ] this way requires a concrete representation of σ as a product of transpositions. We introduce sgn in a different way: D EFINITION : A set J of pairs {(k, l)} is appropriate for Sn if it contains exactly one of (j, i), (i, j) for every pair i, j, 1 ≤ i < j ≤ n. The simplest example is J = {(i, j) : 1 ≤ i < j ≤ n}. A more general example of an appropriate set is: for τ ∈ Sn , (4.1.1) Jτ = {(τ (i), τ (j)) : 1 ≤ i < j ≤ n}.

If J is appropriate for Sn , and σ ∈ Sn , then1 (4.1.2)

i<j

1

sgn (σ(j) − σ(i)) =

(i,j)∈J

sgn (σ(j) − σ(i)) sgn (j − i)

The sign of integers has the usual meaning.

JANUARY 1, 2006

—DRAFT—

IV.

D ETERMINANTS

53

since reversing a pair (i, j) changes both sgn (σ(j) − σ(i)) and sgn (j − i), and does not affect their product. We deﬁne the sign of a permutation σ by (4.1.3) sgn [σ] =

i<j

sgn (σ(j) − σ(i))

Proposition. The map sgn : σ → sgn [σ] is a homomorphism of Sn onto the multiplicative group {1, −1}. The sign of any transposition is −1. P ROOF : The multiplicativity is shown as follows: sgn [στ ] =

i<j

sgn (στ (j) − στ (i)) sgn (στ (j) − στ (i)) sgn (τ (j) − τ (i))

i<j i<j

=

sgn (τ (j) − τ (i))

= sgn [σ] sgn [τ ]. Since the sign of the identity permutation is +1, the multiplicativity implies that conjugate permutations have the same sign. In particular all transpositions have the same sign. The computation for (1, 2) is particularly simple: sgn (j − 1) = sgn (j − 2) = 1 for all j > 2, while sgn (1 − 2) = −1 and the sign of all transpositions is −1.

EXERCISES FOR SECTION 4.1

IV.1.1. Let σ be a cycle of length k; prove that sgn [σ] = (−1)(k−1) .

IV.1.2. Let σ ∈ Sn and assume that its has s orbits (including the trivial orbits, i.e., ﬁxed points). Prove that sgn [σ] = (−1)n−s IV.1.3. Let σj ∈ Sn , j = 1, 2 be cycles with different orbits, Prove that the two commute if, and only if, their (nontrivial) orbits are disjoint. 4.2 MULTILINEAR MAPS

Let Vj , j = 1, . . . , k, and W be vector spaces over a ﬁeld F. A map (4.2.1) ψ : V1 × V2 · · · × Vk → W —DRAFT—

JANUARY 1, 2006

54

L INEAR A LGEBRA

is multilinear, or k-linear, (bilinear—if k = 2) if ψ(v1 , . . . , vk ) is linear in each entry vj when the other entries are held ﬁxed. When all the Vj ’s are equal to some ﬁxed V we say that ψ is k-linear on V. If W is the underlying ﬁeld F, we refer to ψ as a k -linear form or just k-form. E XAMPLES : a. Multiplication in an algebra, e.g., (S, T ) → ST in LV or (A, B) → AB in M(n). b. ψ(v, v ∗ ) = (v, v ∗ ), the value of a linear functional v ∗ ∈ V ∗ on a vector v ∈ V, is a bilinear form on V × V ∗ .

∗ c. Given k linear functionals vj ∈ V ∗ , the product ψ(v1 , . . . , vk ) = of is a k-form on V. ∗ (vj , vj )

d. Let V1 = F[x] and V2 = F[y] the map (p(x), q(y)) → p(x)q(y) is a bilinear map from F[x] × F[y] onto the space F[x, y] of polynomials in the two variables. 4.2.1 The deﬁnition of the tensor product V1 ⊗ V2 , see 1.1.6, guarantees that the map (4.2.2) Ψ(v, u) = v ⊗ u.

of V1 × V2 into V1 ⊗ V2 is bilinear. It is special in that every bilinear map from (V1 , V2 ) “factors through it”: Theorem. Let ϕ be a bilinear map from (V1 , V2 ) into W. Then there is a linear map Φ : V1 ⊗ V2 −→ W such that ϕ = ΦΨ. The proof consists in checking that, for vj ∈ V1 and uj ∈ V2 , vj ⊗ uj = 0 =⇒ ϕ(vj , uj ) = 0

so that writing Φ(v ⊗ u) = ϕ(v, u) deﬁnes Φ unambiguously, and checking that so deﬁned, Φ is linear. We leave the checking to the reader.

JANUARY 1, 2006

—DRAFT—

IV.

D ETERMINANTS

55

4.2.2 Let V and W be ﬁnite dimensional vector spaces. Given v ∗ ∈ V ∗ w ∈ W, and v ∈ V, the map v → (v, v ∗ )w is clearly a linear map from V to W (a linear functional on V times a ﬁxed vector in W) and we denote it (temporarily) by v ∗ ⊗ w. Theorem. The map Φ : v ∗ ⊗ w → v ∗ ⊗ w ∈ L(V, W) extends by linearity to an isomorphism of V ∗ ⊗ W onto L(V, W). P ROOF : As in 4.2.1 we verify that all the representations of zero in the tensor product are mapped to 0, so that we do have a linear extension. ∗ Let T ∈ L(V, W), v = {vj } a basis for V, and v∗ = {vj } the dual basis. Then, for v ∈ V, (4.2.3) Tv = T

∗ (v, vj )vj = ∗ (v, vj )T vj = ∗ vj ⊗ T vj v,

∗ so that T = vj ⊗ T vj . This shows that Φ is surjective and, since the two spaces have the same dimension, a linear map of one onto the other is an isomorphism.

When there is no room for confusion we omit the underlining and write the operator as v ∗ ⊗ w instead of v ∗ ⊗ w. EXERCISES FOR SECTION 4.2

IV.2.1. Assume ϕ(v, u) bilinear on V1 × V2 . Prove that the map T : u → ϕu (v) is a ∗ linear map from V2 into (the dual space) V1 . Similarly, S : v → v ϕ(u) is linear from ∗ V1 to V2 . IV.2.2. Let V1 and V2 be ﬁnite dimensional, with bases {v1 , . . . , vm } and {u1 , . . . , un } respectively. Show that every bilinear form ϕ on (V1 , V2 ) is given by an m × n matrix m n (ajk ) such that if v = 1 xj vj and u = 1 yk uk then a11 . . . a1n y1 . . ... . . . . ajk xj yk = (x1 , . . . , xm ) . . . am1 . . . amn yn

(4.2.4)

ϕ(v, u) =

IV.2.3. What is the relation between the matrix in IV.2.2 and the maps S and T deﬁned in IV.2.1?

—DRAFT—

JANUARY 1, 2006

56

L INEAR A LGEBRA

IV.2.4. Let V1 and V2 be ﬁnite dimensional, with bases {v1 , . . . , vm } and {u1 , . . . , un } ∗ ∗ respectively, and let {v1 , . . . , vm } be the dual basis of {v1 , . . . , vm }. Let T ∈ L(V1 , V2 ) and let a11 . . . a1m . . AT = . . . . . . . an1 . . . anm be its matrix relative to the given bases. Prove (4.2.5) T =

∗ aij (vj ⊗ ui ).

4.2.3 If Ψ and Φ are k-linear maps of V1 × V2 · · · × Vk into W and a, b ∈ F then aΨ + bΦ is k-linear. Thus, the k-linear maps of V1 × V2 · · · × Vk into W form a vector space which we denote by ML({Vj }k , W). j=1 When all the Vj are the same space V, the notation is: ML(V ⊕k , W). The reference to W is omitted when W = F. 4.2.4 Example b. above identiﬁes enough k-linear forms

4.3 ALTERNATING N -FORMS

4.3.1 D EFINITION : An n-linear form ϕ(v1 , . . . , vn ) on V is alternating if ϕ(v1 , . . . , vn ) = 0 whenever one of the entry vectors is repeated, i.e., if vk = vl for some k = l. If ϕ is alternating, and k = l then ϕ(· · · , vk , · · · , vl , · · · ) = ϕ(· · · , vk , · · · , vl + vk , · · · ) (4.3.1) = ϕ(· · · , −vl , · · · , vl + vk , · · · ) = ϕ(· · · , −vl , · · · , vk , · · · ) = −ϕ(· · · , vl , · · · , vk , · · · ), which proves that a transposition (k, l) on the entries of ϕ changes its sign. It follows that for any permutation σ ∈ Sn (4.3.2) ϕ(vσ(1) , . . . , vσ(n) ) = sgn [σ]ϕ(v1 , . . . , vn ).

Condition (4.3.2) explains the term alternating and when the characteristic of F is = 2, can be taken as the deﬁnition.

JANUARY 1, 2006

—DRAFT—

IV.

D ETERMINANTS

57

If ϕ is alternating, and if one of the entry vectors is a linear combination of the others, we use the linearity of ϕ in that entry and write ϕ(v1 , . . . , vn ) as a linear combination of ϕ evaluated on several n-tuples each of which has a repeated entry. Thus, if {v1 , . . . , vn } is linearly dependent, ϕ(v1 , . . . , vn ) = 0. It follows that if dim V < n, there are no nontrivial alternating n-forms on V. Theorem. Assume dim V = n. The space of alternating n-forms on V is one dimensional: there exists one and, up to scalar multiplication, unique non-trivial alternating n-form D on V. D(v1 , . . . , vn ) = 0 if, and only if, {v1 , . . . , vn } is a basis. P ROOF : We show ﬁrst that if ϕ is an alternating n-form, it is completely determined by its value on any given basis of V. This will show that any two alternating n-forms are proportional, and the proof will also make it clear how to deﬁne a non-trivial alternating n-form. If {v1 , . . . , vn } is a basis for V and ϕ an alternating n-form on V, then ϕ(vj1 , . . . , vjn ) = 0 unless {j1 , . . . , jn } is a permutation, say σ, of {1, . . . , n}, and then ϕ(vσ(1) , . . . , vσ(n) ) = sgn [σ]ϕ(v1 , . . . , vn ). If {u1 , . . . , un } is an arbitrary n-tuple, we express each ui in terms of the basis {v1 , . . . , vn }:

n

(4.3.3)

uj =

i=1

ai,j vi ,

j = 1, . . . , n

and the multilinearity implies ϕ(u1 , . . . , un ) = (4.3.4) =

σ∈Sn

a1,j1 · · · an,jn ϕ(vj1 , . . . , vjn ) sgn [σ]a1,σ(1) · · · , an,σ(n) ϕ(v1 , . . . , vn ).

This show that ϕ(v1 , . . . , vn ) determines ϕ(u1 , . . . , un ) for all n-tuples, and all alternating n-forms are proportional. This also shows that unless ϕ is trivial, ϕ(v1 , . . . , vn ) = 0 for every independent (i.e., basis) {v1 , . . . , vn }. For the existence we ﬁx a basis {v1 , . . . , vn } and set D(v1 , . . . , vn ) = 1. Write D(vσ(1) , . . . , vσ(n) ) = sgn [σ] (for σ ∈ Sn ) and D(vj1 , . . . , vjn ) = 0 if there is a repeated entry. —DRAFT—

JANUARY 1, 2006

58

L INEAR A LGEBRA

For arbitrary n-tuple {u1 , . . . , un } deﬁne D(u1 , . . . , un ) by (4.3.4), that is (4.3.5) D(u1 , . . . , un ) =

σ∈Sn

sgn [σ]a1,σ(1) · · · an,σ(n) .

The fact that D is n-linear is clear: it is deﬁned by multilinear expansion. To check that it is alternating take τ ∈ Sn and write D(uτ (1) , . . . , uτ (n) ) = (4.3.6) =

σ∈Sn σ∈Sn

sgn [σ]aτ (1),σ(1) · · · aτ (n),σ(n)

sgn [σ]a1,τ −1 σ(1) · · · an,τ −1 σ(n) = sgn [τ ]D(u1 , . . . , un )

since sgn [τ −1 σ] = sgn [τ ] sgn [σ]. Observe that if {u1 , . . . , un } is given by (4.3.3) then {T u1 , . . . , T un } is given by

n

(4.3.7) and (4.3.4) implies (4.3.8)

T uj =

i=1

ai,j T vi ,

j = 1, . . . , n

D(T u1 , . . . , T un ) =

4.4

D(u1 , . . . , un ) D(T v1 , . . . , T vn ) D(v1 , . . . , vn )

DETERMINANT OF AN OPERATOR

4.4.1 (4.4.1)

D EFINITION :

The determinant det T of an operator T ∈ L(V) is det T = D(T v1 , . . . , T vn ) D(v1 , . . . , vn )

where {v1 , . . . , vn } is an arbitrary basis of V and D is a non-trivial alternating n-form. The independence of det T from the choice of the basis is guaranteed by (4.3.8). Proposition. det T = 0 if, and only if, T is singular, (i.e., ker(T ) = {0}).

JANUARY 1, 2006

—DRAFT—

IV.

D ETERMINANTS

59

P ROOF : T is singular if, and only if, it maps a basis onto a linearly dependent set. D(T v1 , . . . , T vn ) = 0 if, and only if, {T v1 , . . . , T vn } is linearly dependent. 4.4.2 Proposition. If T, S ∈ L(V) then (4.4.2) det T S = det T det S.

P ROOF : If either S or T is singular both sides of (4.4.4) are zero. If det S = 0, {Svj } is a basis, and by (4.4.1), det T S = D(T Sv1 , . . . , T Svn ) D(Sv1 , . . . , Svn ) · = det T det S. D(Sv1 , . . . , Svn ) D(v1 , . . . , vn )

4.4.3 O RIENTATION . When V is a real vector space, a non-trivial alternating n-form D determines an equivalence relation among bases. The bases {vj } and {uj } are declared equivalent if D(v1 , . . . , vn ) and D(u1 , . . . , un ) have the same sign. Using −D instead of D reverses the signs of all the readings, but maintains the equivalence. An orientation on V is a choice which of the two equivalence classes to call positive. 4.4.4 A subspace W ⊂ V is T -invariant, (T ∈ L(V)), if T w ∈ W whenever w ∈ W. The restriction TW , deﬁned by w → T w for w ∈ W, is clearly a linear operator on W. T induces also an operator TV/W on the quotient space V/W, see 5.1.5. Proposition. If W ⊂ V is T -invariant, then (4.4.3) det T = det TW det TV/W .

P ROOF : Let {wj }n be a basis for V, such that {wj }k is a basis for W. If TW 1 1 is singular then T is singular and both sides of (4.4.3) are zero. If TW is nonsingular, then w = {T w1 , . . . , T wk } is a basis for W, and {T w1 , . . . , T wk ; wk+1 , . . . , wn } is a basis for V. Let D be a nontrivial alternating n-form on V. Then Φ(u1 , . . . , uk ) = D(u1 , . . . , uk ; wk+1 , . . . , wn ) is a nontrivial alternating k-form on W. —DRAFT—

JANUARY 1, 2006

60

L INEAR A LGEBRA

The value of D(T w1 , . . . , T wk ; uk+1 , . . . , un ) is unchanged if we replace the variables uk+1 , . . . , un by ones that are congruent to them mod W, and the form Ψ(˜k+1 , . . . , un ) = D(T w1 , . . . , T wk ; uk+1 , . . . , un ) is therefore a u ˜ well deﬁned nontrivial alternating n − k-form on V/W. det T = D(T w1 , . . . , T wn ) = D(w1 , . . . , wn )

D(T w1 , . . . , T wk ; wk+1 , . . . , wn ) D(T w1 , . . . , T wn ) · = D(w1 , . . . , wn ) D(T w1 , . . . , T wk ; wk+1 , . . . , wn ) Φ(T w1 , . . . , T wk ) Ψ(T wk+1 , . . . , T wn ) · = det TW det TV/W . Φ(w1 , . . . , wk ) Ψ(wk+1 , . . . , wn ) ˜ ˜

Corollary. If V = Vj and all the Vj ’s are T -invariant, and TVj denotes the restriction of T to Vj , then (4.4.4) 4.4.5 det T =

j

det TVj .

T HE CHARACTERISTIC POLYNOMIAL OF AN OPERATOR .

D EFINITIONS : The characteristic polynomial of an operator T ∈ L(V) is the polynomial χT (λ) = det (T − λ) ∈ F[λ]. Opening up the expression D(T v1 − λv1 , . . . , T vn − λvn ), we see that χT is a polynomial of degree n = dim V, with leading coefﬁcient (−1)n . By proposition 4.4.1, χT (λ) = 0 if, and only if, T − λ is singular, that is if, and only if, ker(T − λ) = {0}. The zeroes of χT are called eigenvalues of T and the set of eigenvalues of T is called the spectrum of T , and denoted σ(T ). For λ ∈ σ(T ), (the nontrivial) ker(T − λ) is called the eigenspace of λ. The non-zero vectors v ∈ ker(T − λ) (that is the vectors v = 0 such that T v = λv) are the eigenvectors of T corresponding to the eigenvalue λ. EXERCISES FOR SECTION 4.4

IV.4.1. Prove that if T is non-singular, then det T −1 = (det T )−1 IV.4.2. If W ⊂ V is T -invariant, then χT (λ) = χTW χTV/W . JANUARY 1, 2006

—DRAFT—

IV.

D ETERMINANTS

61

4.5

DETERMINANT OF A MATRIX

4.5.1 Let A = {aij } ∈ M(n). The determinant of A can be deﬁned in several equivalent ways: the ﬁrst—as the determinant of the operator that A deﬁnes on Fn by matrix multiplication; another, the standard deﬁnition, is directly by the following formula, motivated by (4.3.5): a11 a21 det A = . . . an1 a1n a2n sgn [σ]a1,σ(1) · · · an,σ(n) . . = . ... . σ∈Sn . . . ann ... ...

(4.5.1)

The reader should check that the two ways are in fact equivalent. They each have advantages. The ﬁrst deﬁnition, in particular, makes it transparent that det(AB) = det A det B; the second is sometimes readier for computation. 4.5.2 C OFACTORS , EXPANSIONS , AND INVERSES . For a ﬁxed pair (i, j) the elements in the sum above that have aij as a factor are those for which σ(i) = j their sum is (4.5.2)

σ∈Sn , σ(i)=j

sgn [σ]a1,σ(1) · · · an,σ(n) = aij Aij .

The sum, with the factor aij removed, denoted Aij in (4.5.2), is called the cofactor at (i, j). Observe that partitioning the sum in (4.5.1) according to the value σ(i) for some ﬁxed i gives the expansion of the determinant along its i’th row: (4.5.3) det A =

j

aij Aij .

If we consider a “mismatched” sum: j aij Akj for i = k, we obtain the determinant of the matrix obtained from A by replacing the k’th row by the i’th. Since this matrix has two identical rows, its determinant is zero, that is (4.5.4) for i = k,

j

aij Akj = 0.

JANUARY 1, 2006

—DRAFT—

62 A11 A12 Finally, write A = . . . A1n

L INEAR A LGEBRA ... ... ... ... An1 An2 . and observe that . . Ann

j

aij Akj is the

ik’th entry of the matrix AA so that equtions (4.5.3) and (4.5.4) combined are equivalent to (4.5.5) AA = det A I.

1 det(A) A.

Proposition. The inverse of a non-singular matrix A ∈ M(n) is

Historically, the matrix A was called the adjoint of A, but the term adjoint is now used mostly in the context of duality. 4.5.3 T HE CHARACTERISTIC POLYNOMIAL OF A MATRIX . The characteristic polynomial of a matrix A ∈ M(n) is the polynomial χA (λ) = det (A − λ). Proposition. If A, B ∈ M(n) are similar then they have the same characteristic polynomial. In other words, χA is similarity invariant. P ROOF : Similar matrices have the same determinant: they represent the same operator using different basis and the determinant of an operator is independent of the basis. Equivalently, if B = CAC −1 , then det B = det(CAC −1 ) = det C det A(det C)−1 = det A. Also, if B = CAC −1 , then B−λ = C(A−λ)C −1 , which implies det(B− λ) = det(A − λ). The converse is not always true—matrices (or operators) that have the same characteristic polynomials may not be similar. See exercise IV.5.2. If we write χA = an = (−1)n ,

n j 0 aj λ ,

then

n

a0 = det A,

and

an−1 = (−1)n−1

1

aii .

The sum n aii , denoted trace A, is called the trace of the matrix A. Like any 1 part of χA , the trace is similarity invariant.

JANUARY 1, 2006

—DRAFT—

IV.

D ETERMINANTS

63

The trace is just one coefﬁcient of the characteristic polynomial and is not a complete invariant. However, we shall see later that the traces of Aj for all 1 ≤ j ≤ n determine χA (λ) completely. EXERCISES FOR SECTION 4.5

IV.5.1. A matrix A = {aij } ∈ M(n) is upper triangular if aij = 0 when i > j. A is lower triangular if aij = 0 when i < j. Prove that if A is either upper or lower n triangular then det A = i=1 aii . IV.5.2. Let A = I be a lower triangular matrix with all the diagonal elements equal to 1. Prove that χA = χI (I is the identity matrix); is A similar to I? IV.5.3. How can the algorithm of reduction to row echelon form be used to compute determinants? IV.5.4. Let A ∈ M(n). A deﬁnes an operator on Fn , as well as on M(n), both by matrix multiplication. What is the relation between the values of det A as operator in the two cases? IV.5.5. Prove the following properties of the trace: 1. If A, B ∈ M(n), then trace(A + B) = trace A + trace B. 2. If A ∈ M(m, n) and B ∈ M(n, m), then trace AB = trace BA. IV.5.6. If A, B ∈ M(2), then (AB − BA)2 = − det (AB − BA)I.

IV.5.7. Prove that the characteristic polynomial of the n × n matrix A = (ai,j ) is n equal to i=1 (ai,i − λ) plus a polynomial of degree bounded by n − 2. IV.5.8. Assuming F = C, prove that trace ai,j is equal to the sum (including multiplicity) of the zeros of the characteristic polynomial of ai,j . In other words, if n the characteristic polynomial of ai,j is equal to j=1 (λ−λj ), then λj = ai,i . IV.5.9. Let A = (ai,j ) ∈ M(n) and let m > n/2. Assume that ai,j = 0 whenever both i ≤ m and j ≤ m. Prove that det(A) = 0. IV.5.10. The Fibonacci sequence is the sequence {fn } deﬁned inductively by: f1 = 1, f2 = 1, and fn = fn−1 + fn−2 for n ≥ 3, so that the start of the sequence is 1, 1, 2, 3, 5, 8, 13, 21, 34, . . . . Let (ai,j ) be an n × n matrix such that ai,j = 0 when |j − i| > 1 (that is the only non-zero elements are on the diagonal, just above it, or just below it). Prove that the

—DRAFT—

JANUARY 1, 2006

64

L INEAR A LGEBRA

number of non-zero terms in the expansion of the detrminant of (ai,j ) is at most equal to fn+1 . IV.5.11. The Vandermonde determinant. Given scalars aj , j = 1, . . . , n, the Vandermonde determinant V (a1 , . . . , an ) is deﬁned by 1 1 V (a1 , . . . , an ) = . . . a1 a2 . . . a2 1 a2 2 . . . a2 n ... ... . . . ... an−1 1 an−1 2 . . . an−1 n

1 an

Use the following steps to compute V (a1 , . . . , an ). Observe that 1 1 . V (a1 , . . . , an , x) = . . a1 a2 . . . a2 1 a2 2 . . . a2 n x2 ... ... . . . ... ... an 1 an 2 . . . an n xn

1 an 1 x is a polynomial of degree n (in x). a. Prove that

n

V (a1 , . . . , an , x) = V (a1 , . . . , an )

j=1

(x − aj )

b. Use induction to prove V (a1 , . . . , an ) =

IV.5.12. A trigonometric polynomial P (x) = has a zero of order (j) m (a point x0 such that P (x0 ) = 0 for j = 0, . . . m − 1) is identically zero. IV.5.13. Let C ∈ M(n, C) be non-singular. Let C, resp. C, be the matrix whose entries are the real parts, resp. the imaginary parts, of the corresponding entries in C. Prove that for all but a ﬁnite number of values of a ∈ R, the matrix C + a C is non-singular. Hint: Show that replacing a single column in C by the corresponding column in C + a C creates a non-singular matrix for all but one value of a. (The determinant is a non-trivial linear function of a.) IV.5.14. Given that the matrices B1 , B2 ∈ M(n; R) are similar in M(n; C), show that they are similar in M(n; R).

i<j (aj − ai ). m iαj x that j=1 aj e

JANUARY 1, 2006

—DRAFT—

Chapter V

Invariant subspaces

The study of linear operators on a ﬁxed vector space V (as opposed to linear maps between different spaces) takes full advantage of the fact that L(V) is an algebra. Polynomials in T play an important role in the understanding of T itself. In particular they provide a way to decompose V into a direct sum of T -invariant subspaces (see below) on each of which the behaviour of T is relatively simple. Studying the behavior of T on various subspaces justiﬁes the following deﬁnition. D EFINITION : A linear system, or simply a system, is a pair (V, T ) where V is a vector space and T ∈ L(V). When we add adjectives they apply in the appropriate place, so that a ﬁnite dimensional system is a system in which V is ﬁnite dimensional, while an invertible system is one in which T is invertible.

5.1 INVARIANT SUBSPACES

5.1.1

Let (V, T ) be a linear system.

D EFINITION : A subspace V1 ⊂ V is T -invariant if T V1 ⊆ V1 . If V1 is T -invariant and v ∈ V1 , then T j v ∈ V1 for all j, and in fact P (T )v ∈ V1 for every polynomial P . Thus, V1 is P (T )-invariant for all P ∈ F[x]. E XAMPLES : a. Both ker(T ) and range(T ) are (clearly) T -invariant. b. If S ∈ L(V) and ST = T S, then ker(S) and range(S) are T -invariant since if Sv = 0 then ST v = T Sv = 0, and T SV = S(T V) ⊂ SV. In particular, if P is a polynomial then ker(P (T )) and range(P (T )) are T -invariant. 65

66

L INEAR A LGEBRA

c. Given v ∈ V, the set span[T, v] = {P (T )v : P ∈ F[x]} is clearly a subspace, clearly T -invariant, and clearly the smallest T -invariant subspace containing v. 5.1.2 Recall (see 4.4.5) that λ ∈ F is an eigenvalue of T if ker(T − λ) is nontrivial, i.e., if there exists vectors v = 0 such that T v = λv (called eigenvectors “associated with”, or “corresponding to” λ). Eigenvectors provide the simplest—namely, one dimensional—T -invariant subspaces. The spectrum σ(T ) is the set of all the eigenvalues of T . It is (see 4.4.5) the set of zeros of the characteristic polynomial χT (λ) = det (T − λ). If the underlying ﬁeld F is algebraically closed every non-constant polynomial has zeros in F and every T ∈ L(V) has non-empty spectrum. Proposition (Spectral Mapping theorem). Let T ∈ L(V), λ ∈ σ(T ), and P ∈ F[x]. Then a. P (λ) ∈ σ(P (T )). b. For all k ∈ N, (5.1.1) ker((P (T ) − P (λ))k ) ⊃ ker((T − λ)k ).

c. If F is algebraically closed, then σ(P (T )) = P (σ(T )). P ROOF : a. (P (x)−P (λ)) is divisible by x−λ: (P (x)−P (λ)) = Q(x)(x−λ), and (P (T ) − P (λ)) = Q(T )(T − λ) is not invertible. b. (P (x)−P (λ)) = Q(x)(x−λ) implies: (P (x)−P (λ))k = Qk (x−λ)k , and (P (T ) − P (λ))k = Qk (T )(T − λ)k . If v ∈ ker((T − λ)k ), i.e., (T − λ)k v = 0, then (P (T ) − P (λ))k v = Qk (T )(T − λ)k v = 0. c. If F is algebraically closed and µ ∈ F, denote by cj (µ) the roots of P (x)−µ, and by mj their multiplicities, so that P (x) − µ = (x − cj (µ))mj , and P (T ) − µ = (T − cj (µ))mj .

Unless cj (µ) ∈ σ(T ) for some j, all the factors are invertible, and so is their product. Remark: If F is not algebraically closed, σ(P (T )) may be strictly bigger than P (σ(T )). For example, if F = R, T is a rotation by π/2 on R2 , and P (x) = x2 , then σ(T ) = ∅ while σ(T 2 ) = {−1}.

JANUARY 1, 2006

—DRAFT—

V.

I NVARIANT SUBSPACES

67

5.1.3 T -invariant subspaces are P (T )-invariant for all polynomials P . Notice, however, that a subspace W can be T 2 -invariant, and not be T -invariant. Example: V = R2 and T maps (x, y) to (y, x). T 2 = I, the identity, so that everything is T 2 -invariant. But only the diagonal {(x, x) : x ∈ R} is T -invariant. Assume that T, S ∈ L(V) commute. a. T commutes with P (S) for every polynomial P ; consequently (see 5.1.1 b.) ker(P (S)) and range(P (S)) are T -invariant. In particular, for every λ ∈ F, ker(S − λ) is T -invariant. b. If W is a S-invariant subspace, then T W is S-invariant. This follows from: ST W = T SW ⊂ T W. There is no claim that W is T -invariant1 . Thus, kernels offer “a special situation.” c. If v is an eigenvector for S with eigenvalue λ, it is contained in ker(S − λ) which is T invariant. If ker(S − λ) is one dimensional, then v is an eigenvector for T . 5.1.4 Theorem. Let W ⊂ V, and T ∈ L(V). The following statements are equivalent: a. W is T -invariant; b. W ⊥ is T ∗ -invariant. P ROOF : For all w ∈ W and u∗ ∈ W ⊥ we have (T w, u∗ ) = (w, T ∗ u∗ ). Statement a. is equivalent to the left-hand side being identically zero; statement b. to the vanishing of the right-hand side.

An obvious example is S = I, which commutes with every operator T , and for which all subspaces are invariant.

1

—DRAFT—

JANUARY 1, 2006

68

L INEAR A LGEBRA

5.1.5 If W ⊂ V is a T -invariant subspace, we deﬁne the restriction TW of T to W by TW v = T v for v ∈ W. The operator TW is clearly linear on W, and every TW -invariant subspace W1 ⊂ W is T -invariant. Similarly, if W is T -invariant, T induces a linear operator TV/W on the quotient V/W as follows: (5.1.2) TV/W (v + W) = T v + W.

v + W is the coset of W containing v and, we justify the deﬁnition by showing that it is independent of the choice of the representative: if v1 − v ∈ W then, by the T -invariance of W, T v1 − T v = T (v1 − v) ∈ W. The reader should check that TV/W is in fact linear. 5.1.6 The fact that when F algebraically closed, every operator T ∈ L(V) has eigenvectors, applies equally to (V ∗ , T ∗ ). If V is n-dimensional and u∗ ∈ V ∗ is an eigenvector for T ∗ , then Vn−1 = ∗ ]⊥ = {v ∈ V : (v, u∗ ) = 0} is T invariant and dim V [u n−1 = n − 1. Repeating the argument in Vn−1 we ﬁnd a T -invariant Vn−2 ⊂ Vn−1 of dimension n − 2, and repeating the argument a total of n − 1 times we obtain: Theorem. Assume that F is algebraically closed, and let V be a ﬁnite dimensional vector space over F. For any T ∈ L(V), there exist a ladder2 {Vj }, j = 0, . . . , n, of T -invariant subspaces of Vn = V, such that (5.1.3) V0 = {0}, Vn = V; Vj−1 ⊂ Vj , and dim Vj = j.

Corollary. If F is algebraically closed, then every matrix A ∈ M(n; F) is similar to an upper triangular matrix. P ROOF : Apply the theorem to the operator T of left multiplication by A on Fn . Choose vj in Vj \ Vj−1 , j = 1, . . . , n, then {v1 , . . . , vn } is a basis for V c and the matrix B corresponding to T in this basis is (upper) triangular. The matrices A and B represent the same operator relative to two bases, hence are similar.

2

Also called a complete

ﬂag.

—DRAFT—

JANUARY 1, 2006

V.

I NVARIANT SUBSPACES

69

Observe that if the underlying ﬁeld is R, which is not algebraically closed, and T is a rotation by π/2 on R2 , T admits no invariant subspaces. EXERCISES FOR SECTION 5.1

V.1.1. Let W be T -invariant, P a polynomial. Prove that P (T )W = P (TW ). V.1.2. Let W be T -invariant, P a polynomial. Prove that P (T )V/W = P (TV/W ). V.1.3. Let W be T -invariant. Prove that ker(TW ) = ker(T ) ∩ W. V.1.4. Prove that every upper triangular matrix is similar to a lower triangular one (and vice versa). V.1.5. If V1 ⊂ V is a subspace, then the set {S : S ∈ L(V), SV1 ⊂ V1 } is a subalgebra of L(V). V.1.6. Show that if S and T commute and v is an eigenvector for S, it need not be an eigenvector for T (so that the assumption in the ﬁnal remark of 5.1.3 that ker(S − λ) is one dimensional is crucial). V.1.7. Prove theorem 5.1.6 without using duality. Hint: Start with an eigenvector u1 of T . Set U1 = span[u1 ]; Let u2 ∈ V/ U1 be an ˜ ˜ eigenvector of TV/ U1 , u2 ∈ V a representative of u2 , and U2 = span[u1 , u2 ]. Verify that U2 is T -invariant. Let u3 ∈ V/ U2 be an eigenvector of TV/ U2 , etc. ˜ 5.2 THE MINIMAL POLYNOMIAL

5.2.1 T HE MINIMAL POLYNOMIAL FOR (T, v). Given v ∈ V, let m be the ﬁrst positive integer k such that {T j v}k is linearly 0 dependent or, equivalently, that T m v is a linear combination of {T j v}m−1 , 0 say3

m−1

(5.2.1)

T v=−

0

m

aj T j v.

The coefﬁcients aj are uniquely determined—this since, by assumption, is independent. For k > 0 we have T m+k v = m−1 aj T j+k v, and 0 induction on k establishes that T m+k v ∈ span[v, . . . , T m−1 v]. It follows that {T j v}m−1 is a basis for span[T, v]. 0 {T j v}m−1 0

3

The minus sign is there to give the common notation: minPT,v (x) = xm +

m−1 0

aj xj

—DRAFT—

JANUARY 1, 2006

70

L INEAR A LGEBRA

D EFINITION : The polynomial minPT,v (x) = xm + m−1 aj xj , with aj 0 deﬁned by (5.2.1) is called the minimal polynomial for (T, v). Theorem. minPT,v (x) is the monic polynomial of lowest degree that satisﬁes P (T )v = 0. The set NT,v = {P ∈ F[x] : P (T )v = 0} is an ideal in F[x]. The theorem identiﬁes minPT,v as its generator. Observe that P (T )v = 0 is equivalent to “P (T )u = 0 for all u ∈ span[T, v]”. 5.2.2 C YCLIC VECTORS . A vector v ∈ V is cyclic for the system (V, T ) if it is not contained in any T -invariant proper subspace, that is, if span[T, v] = V. Not every linear system admits cyclic vectors4 ; a system that does is called a cyclic system. If v is a cyclic vector for (V, T ) and minPT,v (x) = xn + n−1 aj xj , then 0 the matrix of T with respect to the basis v = {v, T v, . . . , T n−1 v} has the form

0 0 ... 1 0 . . . = 0 1 . . . . . . . . . 0 0 ... 0 0 0 . . . −a0 −a1 −a2 . . .

(5.2.2)

AT,v

1 −an−1

We normalize v so that D(v, T v, . . . , T n−1 v) = 1 and compute the characteristic polynomial (see 4.4.5) of T , using the basis v = {v, T v, . . . , T n−1 v}: (5.2.3)

χT (λ) = det (T − λ) = D(T v − λv, . . . , T n v − λT n−1 v).

Replace T n v = − n−1 ak T k v, and observe that the only nonzero summand 0 in the expansion of D(T v−λv, . . . , T j v−λT j−1 v, . . . , T n−1 v−λT n−2 v, T k v) is obtained by taking −λT j v for j ≤ k and T j v for j > k so that D(T v − λv, . . . , T n−1 v − λT n−2 v, T k v) = (−λ)k (−1)n−k−1 = (−1)n−1 λk .

4

Consider T = I

JANUARY 1, 2006

—DRAFT—

V.

I NVARIANT SUBSPACES

71

Adding these, with the weights −ak for k < n−1 and −λ−an−1 for k = n−1, we obtain (5.2.4)

χT (λ) = (−1)n minPT,v (λ).

In particular, (5.2.4) implies that if T has a cyclic vector, then χT (T ) = 0. This is a special case, and a step in the proof, of the following theorem. Theorem (Hamilton-Cayley). χT (T ) = 0. P ROOF : We show that χT is a multiple of minPT,v for every u ∈ V. This implies χT (T )v = 0 for all u ∈ V, i.e., χT (T ) = 0. Let u ∈ V, denote U = span[T, u] and minPT,u = λm + m−1 aj λj . 0 The vectors u, T u, . . . , T m−1 u form a basis for U. Complete {T j u}m−1 to a 0 basis for V by adding w1 , . . . , wn−m . Let AT be the matrix of T with respect to this basis. The top left m × m submatrix of AT is the matrix of T U , and the (n − m) × m rectangle below it has only zero entries. It follows that χT = χT U Q, where Q is the characteristic polynomial of the (n−m)×(n−m) lower right submatrix of A, and since χT U = (−1)m minPT,u (by (5.2.4) applied to T U ) the proof is complete. An alternate way to word the proof, and to prove an additional claim along the way, is to proceed by induction on the dimension of the space V. A LTERNATE PROOF : If n = 1 the claim is obvious. Assume the statement valid for all systems of dimension smaller than n. Let u ∈ V, u = 0, and U = span[T, u]. If U = V the claims are a consequence of (5.2.4) as explained above. Otherwise, U and V/ U have both dimension smaller than n and, by Proposition 4.4.3 applied to T − λ, (exercise IV.4.2) we have χT = χT U χTV/ U . By the induction hypothesis, χTV/ U (TV/ U ) = 0, which means that χTV/ U (T ) maps V into U, and since χT U (T ) maps U to 0 we have χT (T ) = 0. The additional claim is: Proposition. Every prime factor of χT is a factor of minPT,u for some u ∈ V. —DRAFT—

JANUARY 1, 2006

72

L INEAR A LGEBRA

P ROOF : We return to the proof by induction, and add the statement of the proposition to the induction hypothesis. Each prime factor of χT is either a factor of χT U or of χTV/ U and, by the strengthened induction hypothesis, is either a factor of minPT,u or of minPTV/ U ,˜ for some v = v + U ∈ V/ U. ˜ v In the latter case, observe that minPT,v (T )v = 0. Reducing mod U gives minPT,v (TV/ U )˜ = 0, which implies that minPTV/ U ,˜ divides minPT,v . v v 5.2.3 Going back to the matrix deﬁned in (5.2.2), let P (x) = xn + be an arbitrary monic polynomial, the matrix

0 1 0 . . . 0 0 ... 0 ... 1 ... . . . 0 ... 0 0 0 . . . −b0 −b1 −b2 . . . .

bj xj

(5.2.5)

1 −bn−1

is called the companion matrix of the polynomial P . If {u0 , . . . , un−1 } is a basis for V, and S ∈ L(V) is deﬁned by: Suj = n−2 uj+1 for j < n − 1, and Sun−1 = − 0 bj uj . Then u0 is cyclic for (V, S), the matrix (5.2.5) is the matrix AS,u of S with respect to the basis u = {u0 , . . . , un−1 }, and minPS,u0 = P . Thus, every monic polynomial of degree n is minPS,u , the minimal polynomial of some cyclic vector u in an n-dimensional system (V, S). 5.2.4 T HE MINIMAL POLYNOMIAL . Let T ∈ L(V). The set NT = {P : P ∈ F[x], P (T ) = 0} is an ideal in F[x]. The monic generator5 for NT , is called the minimal polynomial of T and denoted minPT . To put it simply: minPT is the monic polynomial P of least degree such that P (T ) = 0. Since the dimension of L(V) is n2 , any n2 + 1 powers of T are linearly dependent. This proves that NT is non-trivial and that the degree of minPT is at most n2 . By the Hamilton-Cayley Theorem, χT ∈ NT which means that minPT divides χT and its degree is therefore no bigger than n.

5

See A.6.1

JANUARY 1, 2006

—DRAFT—

V.

I NVARIANT SUBSPACES

73

The condition P (T ) = 0 is equivalent to “P (T )v = 0 for all v ∈ V”, and the condition “P (T )v = 0” is equivalent to minPT,v divides minPT . A moment’s reﬂection gives: Proposition. minPT is the least common multiple of minPT,v for all v ∈ V. Invoking proposition 5.2.2 we obtain Corollary. Every prime factor of χT is a factor of minPT . We shall see later (exercise V.3.7) that there are always vectors v such that minPT is equal to minPT,v . 5.2.5 The minimal poynomial gives much information on T and on polynomials in T . Lemma. Let P1 be a polynomial. Then P1 (T ) is invertible if, and only if, P1 is relatively prime to minPT . P ROOF : Denote P = gcd(P1 , minPT ). By Theorem A.6.2, there exist polynomials q, q1 such that q1 P1 + q minPT = P . Substituting T for x we have q1 (T )P1 (T ) = P (T ). If P = 1, P1 (T ) is invertible and q1 (T ) is its inverse. If P = 1 we write minPT = P Q, so that P (T )Q(T ) = minPT (T ) = 0 and hence ker(P (T )) ⊃ range(Q(T )). The minimality of minPT guarantees that Q(T ) = 0 so that range(Q(T )) = {0}, and since P is a factor of P1 , ker(P1 (T )) ⊃ ker(P (T )) = {0} and P1 (T ) is not invertible. Comments: a. If P1 (x) = x, the lemma says that T itself is invertible if, and only if, minPT (0) = 0. The proof for this case reads: if minPT = xQ(x), and T is invertible, then Q(T ) = 0, contradicting the minimality. On the other hand if minPT (0) = a = 0, write R(x) = a−1 x−1 (a − minPT ), and observe that T R(T ) = I − a−1 minPT (T ) = I. b. If minPT is P (x), then the minimal polynomial for T + λ is P (x − λ). It follows that T − λ is invertible unless x − λ divides minPT , that is unless minPT (λ) = 0. —DRAFT—

JANUARY 1, 2006

74

L INEAR A LGEBRA

EXERCISES FOR SECTION 5.2

V.2.1. Let T ∈ L(V) and v ∈ V. Prove that if u ∈ span[T, v], then minPT,u divides minPT,v . V.2.2. Let U be a T -invariant subspace of V and TV/ U the operator induced on V/ U. Let v ∈ V, and let v be its image in V/ U. Prove that minPTV/ U ,˜ divides minPT,v . ˜ v V.2.3. If (V, T ) is cyclic (has a cyclic vector), then every S that commutes with T is a polynomial in T . (In other words, P(T ) is a maximal commutative subalgebra of L(V).) Hint: If v is cyclic, and Sv = P (T )v for some polynomial P , then S = P (T ) V.2.4. (V, T ) is cyclic if, and only if, deg minPT = dim V V.2.5. If minPT is irreducible then minPT,v = minPT for every v = 0 in V. V.2.6. Let P1 , P2 ∈ F[x]. Prove: ker(P1 (T )) ∩ ker(P2 (T )) = ker(gcd(P1 , P2 )). V.2.7. (Schur’s lemma) A system {W, S}, S ⊂ L(W), is minimal if no nontrivial subspace of W is invariant under every S ∈ S. Assume {W, S} minimal, and T ∈ L(W). a. If T commute with every S ∈ S, so does P (T ) for every polynomial P . b. If T commute with every S ∈ S, then ker(T ) is either {0} or W. That means that T is either invertible or identically zero. c. With T as above, the minimal polynomial minPT is irreducible. d. If T commute with every S ∈ S, and the underlying ﬁeld is C, then T = λI.

Hint: The minimal polynomial of T must be irreducible, hence linear.

V.2.8. Assume T invertible and deg minPT = m. Prove that minPT-1 (x) = cxm minPT (x−1 ), where c = minPT (0)−1 . V.2.9. Let T ∈ L(V). Prove that minPT vanishes at every zero of χT .

Hint: If T v = λv then T k v = λk v and P (T )v = P (λ)v for any polynomial.

V.2.10. What is the characteristic, resp. minimal, polynomial of the 7 × 7 matrix ai,j deﬁned by ai,j = 1 if 3 ≤ j = i + 1 ≤ 7, 0 otherwise.

JANUARY 1, 2006

—DRAFT—

V.

I NVARIANT SUBSPACES

75

V.2.11. Assume that A is a non-singular matrix and let ϕ(x) = xk + 0 aj xj be its minimal polynomial. Prove that a0 = 0 and explain how knowing ϕ gives an efﬁcient way to compute the inverse A−1 . 5.3 REDUCING.

k−1

5.3.1 Let (V, T ) be a linear system. A subspace V1 ⊂ V reduces T if it is T -invariant and has a T -invariant complement, that is, a T -invariant subspace V2 such that V = V1 ⊕ V2 . A system (V, T ) that admits no reducing subspaces is irreducible. We say also that T is irreducible on V. An invariant subspace is irreducible if T restricted to it is irreducible . Theorem. Every system (V, T ) is completely decomposable, that is, can be decomposed into a direct sum of irreducible systems. P ROOF : Use induction on n = dim V. If n = 1 the system is trivially irreducible. Assume the validity of the statement for n < N and let (V, T ) be of dimension N . If (V, T ) is irreducible the decomposition is trivial. If (V, T ) is reducible, let V = V1 ⊕ V2 be a non-trivial decomposition with T -invariant Vj . Then dim Vj < N , hence each system (Vj , TVj ) is completely decomposable, Vj = k Vj,k with every Vj,k T -invariant, and V = j,k Vj,k . Our interest in reducing subspaces is that operators can be analyzed separately on each direct summand (of a direct sum of invariant subspaces). The effect of a direct sum decomposition into T -invariant subspaces on the matrix representing T (relative to an appropriate basis) can be seen as follows: Assume V = V1 ⊕ V2 , with T -invariant Vj , and {v1 , . . . , vn } is a basis for V such that the ﬁrst k elements are a basis for V1 while the the last l = n − k elements are a basis for V2 . The entries ai,j of the matrix AT of T relative to this basis are zero unless both i and j are ≤ k, or both are > k. AT consists of two square blocks centered on the diagonal. The ﬁrst is the k × k matrix of T restricted to V1 (relative to the basis {v1 , . . . , vk }), and the second is the l × l matrix of T restricted to V2 (relative to {vk+1 , . . . , vn }). —DRAFT—

JANUARY 1, 2006

76

L INEAR A LGEBRA

Similarly, if V = s Vj is a decomposition with T -invariant compoj=1 nents, and we take as basis for V the union of s successive blocks—the bases of Vj , then the matrix AT relative to this basis is the diagonal sum6 of square matrices, Aj , i.e., consists of s square matrices A1 ,. . . , As along the diagonal (and zero everywhere else). For each j, Aj is the matrix representing the action of T on Vj relative to the chosen basis. 5.3.2 The rank and nullity theorem (see Chapter II, 2.5) gives an immediate characterization of operators whose kernels are reducing. Proposition. Assume V ﬁnite dimensional and T ∈ L(V). ker(T ) reduces T if, and only if, ker(T ) ∩ range(T ) = {0}. P ROOF : Assume ker(T ) ∩ range(T ) = {0}. Then the sum ker(T ) + range(T ) is a direct sum and, since dim (ker(T ) ⊕ range(T )) = dim ker(T ) + dim range(T ) = dim V, we have V = ker(T ) ⊕ range(T ). Both ker(T ) and range(T ) are T -invariant and the direct sum decomposition proves that they are reducing. The opposite implication is proved in Proposition 5.3.3 below. Corollary. ker(T ) and range(T ) reduce T if, and only if, ker(T 2 ) = ker(T ). P ROOF : For any T ∈ L(V) we have ker(T 2 ) ⊇ ker(T ) and the inclusion is proper if, and only if, there exist vectors v such that T v = 0 but T 2 v = 0, which amounts to T v ∈ ker(T ). 5.3.3 Given that V1 ⊂ V reduces T —there exists a T -invariant V2 such that V = V1 ⊕ V2 —how uniquely determined is V2 . Cosidering the somewhat extreme example T = I, the condition of T -invariance is satiﬁed trivially and we realize that V2 is far from being unique. There are, however, cases in which the “complementary invariant subspace”, if there is one, is uniquely determined. We propose to show now that this is the case for the T -invariant subspaces ker(T ) and range(T ).

6

Also called the direct

sum of Aj , j = 1, . . . , s

—DRAFT—

JANUARY 1, 2006

V.

I NVARIANT SUBSPACES

77

Proposition. Let V be ﬁnite dimensional and T ∈ L(V). a. If V2 ⊂ V is T -invariant and V = ker(T ) ⊕ V2 , then V2 = range(T ). b. If V1 ⊂ V is T -invariant and V = V1 ⊕ range(T ), then V1 = ker(T ). P ROOF : a. As dim ker(T ) + dim V2 = dim V = dim ker(T ) + dim range(T ), we have dim V2 = dim range(T ). Also, since ker(T ) ∩ V2 = {0}, T is 1-1 on V2 and dim T V2 = dim range(T ). Now, T V2 = range(T ) ⊂ V2 and, since they have the same dimension, V2 = range(T ). b. T V1 ⊂ V1 ∩ range(T ) = {0}, and hence V1 ⊂ ker(T ). Since V1 has the same dimension as ker(T ) they are equal. 5.3.4 T HE CANONICAL PRIME - POWER DECOMPOSITION .

Lemma. If P1 and P2 are relatively prime, then (5.3.1) ker(P1 (T )) ∩ ker(P2 (T )) = {0}.

If also P1 (T )P2 (T ) = 0 then V = ker(P1 (T )) ⊕ ker(P2 (T )), and the corresponding projections are poynomials in T . P ROOF : Given that P1 and P2 are relatively prime there exist, By Appendix A.6.1, polynomials q1 , q2 such that q1 P1 + q2 P2 = 1. Substituting T for the variable we have (5.3.2) q1 (T )P1 (T ) + q2 (T )P2 (T ) = I.

If v ∈ ker(P1 (T )) ∩ ker(P2 (T )), that is, P1 (T )v = P2 (T )v = 0, then v = q1 (T )P1 (T )v + q2 (T )P2 (T )v = 0. This proves (5.3.1) which implies, in particular, that dim ker(P1 (T )) + dim ker(P2 (T )) ≤ n. If P1 (T )P2 (T ) = 0, then the range of either Pj (T ) is contained in the kernel of the other. By the Rank and Nullity theorem (5.3.3) n = dim ker(P1 (T ))+ dim range(P1 (T )) ≤ dim ker(P1 (T )) + dim ker(P2 (T )) ≤ n.

It follows that dim ker(P1 (T )) + dim ker(P2 (T )) = n, which implies that ker(P1 (T )) ⊕ ker(P2 (T )) is all of V. —DRAFT—

JANUARY 1, 2006

78

L INEAR A LGEBRA

Observe that the proof shows that (5.3.4) range(P1 (T )) = ker(P2 (T )) and range(P2 (T )) = ker(P1 (T )).

Equation (5.3.2) implies that ϕ2 (T ) = q1 (T )P1 (T ) = I − q2 (T )P2 (T ) is the identity on ker(P2 (T )) and zero on ker(P1 (T )), that is, ϕ2 is the projection onto ker(P2 (T )) along ker(P1 (T )). Similarly, ϕ1 (T ) = q2 (T )P2 (T ) is the projection onto ker(P1 (T )) along ker(P2 (T )). Corollary. For every factorization minPT = l Pj into pairwise relatively j=1 prime factors, we have a direct sum decomposition of V

l

(5.3.5)

V=

j=1

ker(Pj (T )).

P ROOF : Use induction on the number of factors. For the prime-power factorization minPT = Φj j , where the Φj ’s are distinct prime (irreducible) polynomials in F[x], and mj their respective multiplicities, we obtain the canonical prime-power decomposition of (V, T ):

k m

(5.3.6)

m

V=

j=1

ker(Φj j (T )).

m

The subspaces ker(Φj j (T )) are called the primary components of (V, T ) Comments: By the Cayley-Hamilton theorem and corollary 5.2.4, the primepower factors of χT are those of minPT , with at least the same multiplicities, that is: (5.3.7)

χT =

Φj j ,

s

with sj ≥ mj .

m m

The minimal polynomial of T restricted to ker(Φj j (T )) is Φj j and its s m characteristic polynomial is Φj j . The dimension of ker(Φj j (T )) is sj deg(Φj ).

JANUARY 1, 2006

—DRAFT—

V.

I NVARIANT SUBSPACES

79

5.3.5 When the underlying ﬁeld F is algebraically closed, and in particular when F = C, every irreducible polynomial in F[x] is linear and every polynomial is a product of linear factors, see Appendix A.6.5. Recall that the spectrum of T is the set σ(T ) = {λj } of zeros of χT or, equivalently, of minPT . the prime-power factorization of minPT (for systems over an algebraically closed ﬁeld) has the form minPT = λ∈σ(T ) (x − λ)m(λ) where m(λ) is the multiplicity of λ in minPT . The space Vλ = ker((T − λ)m(λ) ) is called the generalized eigenspace, or, nilspace of λ. The canonical decomposition of (V, T ) is given by: (5.3.8) V=

λ∈σ(T )

Vλ .

5.3.6 The projections ϕj (T ) corresponding to the the canonical prime-power decomposition are given by ϕj (T ) = qj (T ) i=j Φmi (T ), where the polynoi mials qi are given by the representations (see Corollary A.6.2) qj

i=j ∗ Φmi + qj Φj i mj

= 1.

An immediate consequence of the fact that these are all polynomials in T is that they all commute, and commute with T . m If W ⊂ V is T -invariant then the subspaces ϕj (T )W = W ∩ker(Φj j (T )), are T -invariant and we have a decomposition

k

(5.3.9)

W=

j=1

ϕj (T )W

Proposition. The T -invariant subspace W is reducing if, and only if, ϕj (T )W m is a reducing subspace of ker(Φj j (T )) for every j. P ROOF : If W is reducing and U is a T -invariant complement, then ker(Φj j (T )) = ϕj (T )V = ϕj (T )W ⊕ ϕj (T )U, and both components are T -invariant. m Conversely, if Uj is T -invariant and ker(Φj j (T )) = ϕj (T )W ⊕ Uj , then U= Uj is an invariant complement to W. —DRAFT—

JANUARY 1, 2006

m

80

L INEAR A LGEBRA

5.3.7 Recall (see 5.3.1) that if V = s Vj is a direct sum decomposition j=1 into T invariant subspaces, and if we take for a basis on V the union of bases of the summands Vj , then the matrix of T with respect to this basis is the diagonal sum of the matrices of the restrictions of T to the components Vj . By that we mean

A1 0 AT = 0 . . . 0 0 A2 0 . . . 0 ... 0 A3 0 0 ... 0 .. . 0 0 0 . . . . . . As

(5.3.10)

where Aj is the matrix of TVj (the restriction of T to the component Vj in the decomposition.) EXERCISES FOR SECTION 5.3

V.3.1. Let T ∈ L(V), k > 0 and integer. Prove that ker(T k ) reduces T if, and only if ker(T k+1 ) = ker(T k ). Hint: Both ker(T k ) and range(T k ) are T -invariant. V.3.2. Let T ∈ L(V), and V = U ⊕ W with both summands T -invariant. Let π be the projection onto U along W. Prove that π commutes with T . V.3.3. Prove that if (V, T ) is irreducible, then its minimal polynomial is “prime power” that is, minPT = Φm with Φ irreducible and m ≥ 1. V.3.4. If Vj = ker(Φj j (T )) is a primary component of (V, T ), the minimal polynom mial of TVj is Φj j . V.3.5. Show that if minPT = Φm , with Φ irreducible, then there exist vectors v ∈ V such that minPT,v = minPT . V.3.6. Let v1 , v2 ∈ V and assume that minPT,v1 and minPT,v2 are relatively prime. Prove that minPT,v1 +v2 = minPT,v1 minPT,v2 . Hint: Write Pj = minPT,vj Q = minPT,v1 +v2 , and let qj be polynomials such that q1 P1 + q2 P2 = 1. Then Qq2 P2 (T )(v1 + v2 ) = Q(T )(v1 ) = 0, and so P1 Q. Similarly P2 Q, hence P1 P2 Q. Also, P1 P2 (T )(v1 + v2 ) = 0, and Q P1 P2 . V.3.7. Show that there always exist vectors v ∈ V such that minPT,v = minPT . Hint: use the prime-power decomposition and the previous exercise. JANUARY 1, 2006

m

—DRAFT—

V.

I NVARIANT SUBSPACES

81

5.4

SEMISIMPLE SYSTEMS.

5.4.1 D EFINITION : The system (V, T ) is semisimple if every T -invariant subspace of V is reducing. Theorem. The system (V, T ) is semisimple if, and only if, minPT is square free (that is, the multiplicities mj of the factors in the canonical factorization m minPT = Φj j are all 1). P ROOF : Proposition 5.3.6 reduces the general case to that in which minPT is Φm with Φ irreducible. a. When m > 1. Φ(T ) is not invertible and hence the invariant subspace ker(Φ(T )) is non-trivial nor is it all of V. ker(Φ(T )2 ) is strictly bigger than ker(Φ(T )) and, by corollary 5.3.2, ker(Φ(T )) is not Φ(T )-reducing, and hence not T -reducing. b. When m = 1. Observe ﬁrst that minPT,v = Φ for every non-zero v ∈ V. This since minPT,v divides Φ and Φ is prime. It follows that the dimension of span[T, v] is equal to the degree d of Φ, and hence: every non-trivial T invariant subspace has dimension ≥ d. Let W ⊂ V be a proper T -invariant subspace, and v1 ∈ W. The subspace span[T, v1 ] ∩ W is T -invariant and is properly contained in span[T, v1 ], so that its dimension is smaller than d, hence span[T, v1 ] ∩ W = {0}. It follows that W1 = span[T, (W, v1 )] = W ⊕ span[T, v1 ]. If W1 = V, let v2 ∈ V \ W1 and deﬁne W2 = span[T, (W, v1 , v2 )]. The argument above shows that W2 = W ⊕ span[T, v1 ] ⊕ span[T, v2 ]. This can be repeated until, for the appropriate7 k, we have (5.4.1) and

k 1

V=W⊕

k j=1 span[T, vj ]

span[T, vj ] is clearly T -invariant.

Remark: Notice that if we take W = {0}, the decomposition (5.4.1) expresses (V, T ) as a direct sum of cyclic subsystems.

7

The dimension of Wi+1 is dim Wi + d, so that kd = dim V − dim W.

—DRAFT—

JANUARY 1, 2006

82

L INEAR A LGEBRA

5.4.2 If (V, T ) is semisimple and F is algebraically closed, and in particular if F = C, all irreducible polynomials in F[x] are linear. If Φj (T ) = T − λj with λj ∈ F, then the canonical prime-power decomposition has the form (5.4.2) V= ker(T − λj ),

and, for each j, the restriction of T to ker(T − λj ) is just multiplication by λj . 5.4.3 If F is not algebraically closed and minPT = Φ is irreducible, but nonlinear, we have much the same phenomenon, but in somewhat hidden form. Lemma. Let T ∈ L(V), and assume that minPT is irreducible in F[x]. Then P(T ) = {P (T ) : P ∈ F[x]} is a ﬁeld.

P ROOF : If P ∈ F[x] and P (T ) = 0, then gcd(P, Φ) = 1 and hence P (T ) is invertible. Thus, every non-zero element in P(T ) is invertible and P(T ) is a ﬁeld. 5.4.4 V can now be considered as a vector space over the extended ﬁeld P(T ) by considering the action of P (T ) on v as a multiplication of v by the “scalar” P (T ) ∈ P(T ). This deﬁnes a system (VP(T ) , T ). A subspace of (VP(T ) ) is precisely a T -invariant subspace of V. The subspace span[T, v], in V (over F) becomes “the line through v in (VP(T ) )”, i.e. the set of all multiples of v by scalars from P(T ); the statement “Every subspace of a ﬁnite-dimensional vector space (here V over P(T )), has a basis.” translates here to: “Every T -invariant subspace of V is a direct sum of cyclic subspaces, that is subspaces of the form span[T, v].” EXERCISES FOR SECTION 5.4

V.4.1. a. If T is diagonalizable then (V, T ) is semisimple. b. If F is algebraically closed and (V, T ) is semisimple, then T is diagonalizable V.4.2. An algebra B ⊂ L(V) is semisimple if every T ∈ B is semisimple. Prove that if B is commutative and semisimple, then dim B ≤ dim V. JANUARY 1, 2006

—DRAFT—

V.

I NVARIANT SUBSPACES

83

V.4.3. Let V = V1 ⊕ V0 and let B ⊂ L(B) be the set of all the operators S such that SV1 ⊂ V0 , and SV0 = {0}. Prove that B is a commutative subalgebra of L(V) and that dim B = dim V0 dim V1 . When is B semisimple? V.4.4. Let B be the subset of M(2; R) of the matrices of the form a −b b . Prove a

that B is an algebra over R, which is in fact a ﬁeld isomorphic to C. V.4.5. Let V be an n-dimensional real vector space, and T ∈ L(V) an operator with non-linear irreducible minimal polynomial. Prove that n is even and explain: (V, T ) is “isomorphic” to (Cn/2 , sI) (s a complex number, I the identity on Cn/2 ). 5.5 NILPOTENT OPERATORS

The canonical prime-power decomposition reduces every system to a direct sum of systems whose minimal polynomial is a power of an irreducible polynomial Φ. If F is algebraically closed, and in particular if F = C, the irreducible polynomials are linear, Φ(x) = (x − λ) for some scalar λ. We consider here the case of linear Φ, and discuss the general case in section 5.6. If minPT = (x−λ)m , then minP(T-λ) = xm and the structure of S = T −λ clariﬁes the structure of T . We therefore focus on the case λ = 0. 5.5.1 D EFINITION : An operator T ∈ L(V) is nilpotent if for some positive integer k, T k = 0. The height of (V, T ), denoted height[(V, T )], is the smallest positive k for which T k = 0. If T k = 0, minPT divides xk , hence it is a power of x. In other words, T is nilpotent of height k if minPT (x) = xk . For every v ∈ V, minPT,v (x) = xl for an appropriate l. The height, height[v], of a vector v (under the action of T ) is the degree of minPT,v , that is smallest integer l such that T l v = 0. It is the height of T W , where W = span[T, v], the span of v under T . Since for v = 0, height[T v] = height[v] − 1, elements of maximal height are not in range(T ). E XAMPLE : V is the space of all (algebraic) polynomials of degree bounded by m, (so that {xj }m is a basis for V), T the differentiation operator: j=0

m m m−1

(5.5.1)

T(

0

aj xj ) =

1

jaj xj−1 =

0

(j + 1)aj+1 xj .

JANUARY 1, 2006

—DRAFT—

84

L INEAR A LGEBRA

The vector w = xm has height m + 1, and {T j w}m is a basis for V (so that j=0

x w is a cyclic vector). If we take vj = (m−j)! as basis elements, the operator takes the form of the standard shift of height m + 1.

m−j

D EFINITION : A k -shift is a k-dimensional system {V, T } with T nilpotent of height k. A standard shift is a k-shift for some k, that is, a cyclic nilpotent system. If {V, T } is a k-shift, v0 ∈ V and height[v0 ] = k, then {T j v0 }k−1 is a j=0 basis for V, and the action of T is to map each basis element, except for the last, to the next one, and map the last basis element to 0. The matrix of T with respect to this basis is

0 0 ... 1 0 . . . = 0 1 . . . . . . . . . 0 0 ... 0 0 0 0 0 0 . . . . . . 1 0

(5.5.2)

AT,v

Shifts are the building blocks that nilpotent systems are made of. 5.5.2 Theorem (Cyclic decomposition for nilpotent operators). Let (V, T ) be a ﬁnite dimensional nilpotent system of height k. Then V = Vj , where Vj are T -invariant, and (Vj , TVj ) is a standard shift. Moreover, if we arrange the direct summands so that kj = height[(Vj , T )] is monotone non-increasing, then {kj } is uniquely determined. P ROOF : We use induction on k = height[(V, T )]. a. If k = 1, then T = 0 and any decomposition V = sional subspaces will do. Vj into one dimen-

b. Assume the statement valid for systems of height less that k and let (V, T ) be a (ﬁnite dimensional) nilpotent system of height k. Write Win = ker(T ) ∩ T V, and let Wout ⊂ ker(T ) be a complementary subspace, i.e., ker(T ) = Win ⊕ Wout . (T V, T ) is nilpotent of height k − 1 and, by the induction hypothesis, m admits a decomposition T V = j=1 Vj into standard shifts. Denote lj =

JANUARY 1, 2006

—DRAFT—

V.

I NVARIANT SUBSPACES

85

height[(Vj , T )]. Let vj be of height lj in Vj (so that Vj = span[T, vj ]), and ˜ ˜ observe that {T lj −1 vj } is a basis for Win . ˜ Let vj be such that vj = T vj , write Vj = span[T, vj ], and let Wout = ˜ i≤l Wi be a direct sum decomposition into one dimensional subspaces. The claim now is (5.5.3) V= Vj ⊕ Wi .

To prove (5.5.3) we need to show that the spaces {Vj , Wi }, i = 1, . . . , l, j = 1, . . . , m, are independent and span V. Independence: Assume there is a non-trivial relation uj + wi = 0 with uj ∈ Vj and wi ∈ Wi . Let h = max height[uj ]. If h > 1, then T h−1 uj = T h−1 uj + wi = 0 and we obtain a non-trivial relation between the Vj ’s. A contradiction. If h = 1 we obtain a non-trivial relation between elements of a basis of ker(T ). Again a contradiction. Spanning: Denote U = span[{Wi , Vj }], i = 1, . . . , l, j = 1, . . . , m. T U contains every vj , and hence T U = T V. It folows that U ⊃ Win and since it ˜ contains (by its deﬁnition) Wout , we have U ⊃ ker(T ). For arbitrary v ∈ V, let v ∈ U be such that T v = T v . Then v − v ∈ ˆ ˆ ˆ ker(T ) ⊂ U so that v ∈ U, and U = V. Finally, if we denote by n(h) the number of summands Vj in (5.5.3) of dimension (i.e., height) h, then n(k) = dim T k−1 V while, for l = 0, . . . , k − 2, we have

k

(5.5.4)

dim T l V =

h=l+1

(h − l)n(h),

which determines {n(h)} completely. Corollary. An irreducible nilpotent system is a standard shift. EXERCISES FOR SECTION 5.5

V.5.1. Assume F = R and minPT = Φ(x) = x2 + 1. Prove that a + bT → a + bi is a (ﬁeld) isomorphism of FΦ onto C.

—DRAFT—

JANUARY 1, 2006

86

L INEAR A LGEBRA What is FΦ if minPT = Φ(x) = x2 + 3?

V.5.2. Assume minPT = Φm with irreducible Φ. Can you explain (justify) the statement: (V, T ) is “essentially” a standard m-shift over FΦ . 5.6 THE CYCLIC DECOMPOSITION

We now show that the canonical prime-power decomposition can be reﬁned to a cyclic decomposition. D EFINITION : A cyclic decomposition of a system (V, T ) is a direct sum decomposition of the system into irreducible cyclic subspaces, that is, irreducible subspaces of the form span[T, v]. The summands in the canonical prime-power decomposition have the form ker(Φm (T )) with an irreducible polynomial Φ. We show here that such systems (whose minimal polynomial is Φm , with irreducible Φ) admit a cyclic decomposition. In the previous section we proved the special case8 in which Φ(x) = x. If we use the point of view proposed in subsection 5.4.4, the general case is nothing more than the nilpotent case over the ﬁeld P(T ) and nothing more need be proved. For the reader not used to switching underlying ﬁelds we repeat the proof of the nilpotent case in the present context. 5.6.1 We assume now that minPT = Φm with irreducible Φ of degree d. For every v ∈ V, minPT,v = Φk(v) , 1 ≤ k ≤ m, and maxv k(v) = m; we refer to k(v) as the Φ-height, or simply height, of v. Theorem. There exist vectors vj ∈ V such that V = span[T, vj ]. Moreover, the set of the Φ–heights of the vj ’s is uniquely determined. P ROOF : We use induction on the Φ-height m. a. m = 1. See 5.4. b. Assume that minPT = Φm , and the theorem valid for heights lower than m. Write Win = ker(Φ(T )) ∩ Φ(T )V and let Wout ⊂ ker(Φ(T )) be a complemen8

Notice that when Φ(x) = x, a cyclic space is what we called a standard

shift.

JANUARY 1, 2006

—DRAFT—

V.

I NVARIANT SUBSPACES

87

tary T -invariant subspace, i.e., ker(Φ(T )) = Win ⊕ Wout . Such complementary T -invariant subspace of ker(Φ(T )) exists since the system (ker(Φ(T )), T ) is semisimple, see 5.4. (Φ(T )V, T ) is of height m − 1 and, by the induction hypothesis, admits a decomposition Φ(T )V = m Vj into cyclic subspaces, Vj = span[T, vj ]. ˜ j=1 Let vj be such that vj = Φ(T )vj . ˜ Write Vj = span[T, vj ], and let Wout = i≤l Wi be a direct sum decomposition into cyclic subspaces. The claim now is (5.6.1) V= Vj ⊕ Wi .

To prove (5.6.1) we need to show that the spaces {Vj , Wi }, i = 1, . . . , l, j = 1, . . . , m, are independent, and that they span V. Independence: Assume there is a non-trivial relation uj + wi = 0 with uj ∈ Vj and wi ∈ Wi . Let h = max Φ-height[uj ]. If h > 1, then Φ(T )h−1 uj = Φ(T )h−1 uj + wi = 0 and we obtain a non-trivial relation between the Vj ’s. A contradiction. If h = 1 we obtain a non-trivial relation between elements of a basis of ker(Φ)(T ). Again a contradiction. Spanning: Denote U = span[{Wi , Vj }], i = 1, . . . , l, j = 1, . . . , m. Notice ﬁrst that U ⊃ ker(T ). Φ(T )U contains every vj , and hence T U = T V. For v ∈ V, let v ∈ U be ˜ ˜ such that T v = T v . Then v − v ∈ ker(T ) ⊂ U so that v ∈ U, and U = V. ˜ ˜ Finally, just as in the previous subsection, denote by n(h) the number of vj ’s of Φ–height h in the decomposition. Then dn(m) = dim Φ(T )m−1 V and, for l = 0, . . . , m − 2, we have

k

(5.6.2)

dim Φ(T ) V = d

h=l+1

l

(h − l)n(h),

which determines {n(h)} completely. 5.6.2 T HE GENERAL CASE . We now reﬁne the canonical prime-power decomposition (5.3.6) by applying Theorem 5.6.1 to each of the summands: —DRAFT—

JANUARY 1, 2006

88

L INEAR A LGEBRA

Theorem (General cyclic decomposition). Let (V, T ) be a linear system over m a ﬁeld F. Let minPT = Φj j be the prime-power decomposition of its minimal polynomial. Then (V, T ) admits a cyclic decomposition V= Vk .

l(k)

For each k, the minimal polynomial of T on Vk is Φj(k) for some l(k) ≤ mj(k) , and mj(k) = max l(k). The polynomials Φj(k) are called the elementary divisors of T . Remark: We deﬁned cyclic decomposition as one in which the summands are irreducible. The requirement of irreducibility is satisﬁed automatically if the minimal polynomial is a “prime-power”, i.e., has the form Φm with irreducible Φ. If one omits this requirement and the minimal polynomial has several relatively prime factors, we no longer have uniqueness of the decomposition since the direct sum of cyclic subspaces with relatively prime minimal polynomials is itself cyclic. EXERCISES FOR SECTION 5.6

V.6.1. Assume minPT,v = Φm with irreducible Φ. Let u ∈ span[T, v], and assume Φ-height[u] = m. Prove that span[T, u] = span[T, v]. V.6.2. Give an example of two operators, T and S in L(C5 ), such that minPT = minPS and χT = χS , and yet S and T are not similar. V.6.3. Assume F is a subﬁeld of F1 . Let B1 , B2 ∈ M(n, F) and assume that they are F1 -similar, i.e., B2 = C −1 B1 C for some invertible C ∈ M(n, F1 ). Prove that they are F-similar. 5.7 THE JORDAN CANONICAL FORM

l(k)

5.7.1 BASES AND CORRESPONDING MATRICES . Let (V, T ) be cyclic, that is V = span[T, v], and minPT = minPT,v = Φm , with Φ irreducible of degree d. The cyclic decomposition provides several natural bases: i. The (ordered) set {T j v}dm−1 is a basis, and the matrix of T with respect j=0 to this basis is the companion matrix of Φm .

JANUARY 1, 2006

—DRAFT—

V.

I NVARIANT SUBSPACES

89

ii. Another natural, and in some ways more useful, basis in this context is (5.7.1)

d−1 {T k v}d−1 ∪ {Φ(T )T k v}d−1 ∪ · · · ∪ {Φ(T )m−1 T k v}k=0 k=0 k=0

And the matrix AΦm of T relative to this ordered basis consists of m copies of the companion matrix of Φ arranged on the diagonal, with 1’s in the unused positions in the sub-diagonal. If AΦ is the companion matrix of Φ then the matrix AΦ4 is

AΦ 1 =

AΦ 1

(5.7.2)

AΦ4

AΦ 1

AΦ

5.7.2 Consider the special case of linear Φ, which is the rule when the underlying ﬁeld F is algebraically closed, and in particular when F = C. If Φ(x) = x − λ for some λ ∈ F, then its companion matrix is 1 × 1 with λ its only entry. Since now d = 1 the basis (5.7.1) is now simply {(T − λ)j v}m−1 and the j=0 matrix A(x−λ)m in this case is the m × m matrix that has all its diagonal entries equal to λ, all the entries just below the diagonal (assuming m > 1) are equal to 1, and all the other entries are 0.

λ 1 0 = . . . 0 0 0 λ 1 . . . 0 0 0 0 λ ... ... ... ... ... 1 0 0 0 0 . . . λ 1 0 0 0 . . . 0 λ JANUARY 1, 2006

(5.7.3)

A(x−λ)m

—DRAFT—

90

L INEAR A LGEBRA

5.7.3 T HE J ORDAN CANONICAL FORM . Consider a system (V, T ) such that all the irreducible factors of minPT are linear, (in particular, an arbitrary system (V, T ) over C). The prime-power factorization of minPT is now 9 minPT =

λ∈σ(T )

(x − λ)m(λ)

where m(λ) is the multiplicity of λ in minPT . The space Vλ = ker((T − λ)m(λ) ) is called the generalized eigenspace or nilspace of λ, see 5.3.4. The canonical decomposition of (V, T ) is given by: (5.7.4) V=

λ∈σ(T )

Vλ .

For λ ∈ σ(T ), the restriction of T − λ to Vλ is nilpotent of height m(λ). We apply to Vλ the cyclic decomposition Vλ = span[T, vj ].

h(v )−1

j and take as basis in span[T, vj ] the set {(T − λ)s vj }s=0 , where h(vj ) is the (T − λ)-height of vj . The matrix of the restriction of T to each span[T, vj ] has the form (5.7.3), the matrix AT,Vλ of T on Vλ is the diagonal sum of these, and the matrix of T on V is the diagonal sum of AT,Vλ for all λ ∈ σ(T ).

5.7.4 T HE CANONICAL FORM FOR REAL VECTOR SPACES . When (V, T ) is deﬁned over R, the irreducible factors Φ of minPT are either linear or quadratic, i.e., have the form Φ(x) = x − λ, or Φ(x) = x2 + 2bx + c with b2 − c < 0.

The companion matrix in the quadratic case is (5.7.5)

9

0 −c . 1 −2b

Recall that the spectrum of T is the set σ(T ) = {λj } of the eigenvalues of T , that is, the set of zeros of minPT .

JANUARY 1, 2006

—DRAFT—

V.

I NVARIANT SUBSPACES

91

√ (Over C we have x2 + 2bx + c = (x − λ)(x − λ) with λ = −b + b2 − c, and the matrix similar to the diagonal matrix with λ and λ on the diagonal.) EXERCISES FOR SECTION 5.7

V.7.1. Assume that v1 , . . . , vk are eigenvectors of T with the associated eigenvalues λ1 , . . . , λk all distinct. Prove that v1 , . . . , vk are linearly independent. V.7.2. Show that if we allow complex coefﬁcients, the matrix (5.7.5) is similar to √ λ 0 with λ = −b + b2 − c. 0 λ 0 0 2 V.7.3. T is given by the matrix AT = 1 0 0 acting on F3 . 0 1 0 a. What is the basic decomposition when F = C, when F = R, and when F = Q? b. Prove that when F = Q every non-zero vector is cyclic. Hence, every non-zero rational vector is cyclic when F = R or C. c. What happens to the basic decomposition under the action of an operator S that commutes with T ? d. Describe the set of matrices A ∈ M(3; F) that commute with AT where F = C, R, resp. Q. 0 −1 is not similar to a triangular matrix if the un1 0 derlying ﬁeld is R, and is diagonalizable over C. Why doesn’t this contradict exercise V.6.3? V.7.4. Prove that the matrix V.7.5. Let A ∈ M(n; C) such that {Aj : j ∈ N} is bounded (all the entries are uniformly bounded). Prove that all the eigenvalues of A are of absolute value not bigger than 1. Moreover, if λ ∈ σ(A) and |λ| = 1, there are no ones under λ in the Jordan canonical form of A. V.7.6. Let A ∈ M(n; C) such that {Aj : j ∈ Z} is bounded. Prove that A is diagonalizable, and all its eigenvalues have absolute value 1. V.7.7. Let T ∈ L(V). Write χT = Φj j with Φj irreducible, but not necessarily distinct, and mj are the corresponding heights in the cyclic decomposition of the system. Find a basis of the form (5.7.1) for each of the components. and describe the matrix of T relative to this basis.

m

—DRAFT—

JANUARY 1, 2006

92

L INEAR A LGEBRA

V.7.8. Let A be the m × m matrix A(x−λ)m deﬁned in (5.7.3). Compute An for all n > 1. 5.8 FUNCTIONS OF AN OPERATOR

5.8.1 T HEORETICAL . If P = F, we deﬁned P (T ) by

aj xj is a polynomial with coefﬁcients in aj T j .

P (T ) =

Is there a natural extension of the deﬁnition to a larger class of functions? The map P → P (T ) is a homomorphism of F[x] onto a subalgebra of L(V). We can often extend the homomorphism to a bigger function space, but in most cases the range stays the same. The advantage will be in having a better match with the natural notation arising in applications. Assume that the underlying ﬁeld is either R or C. Write minPT (z) = λ∈σ(T ) (z − λ)m(λ) and observe that a necessary and sufﬁcient condition for a polynomial Q to be divisible by minPT is that Q be divisible by (z − λ)m(λ) for every λ ∈ σ(T ), that is, have a zero of order at least m(λ) at λ. It follows that P1 (T ) = P2 (T ) if, and only if, the Taylor expansion of the two polynomials are the same up to, and including, the term of order m(λ) − 1 at every λ ∈ σ(T ). In particular, if m(λ) = 1 for all λ ∈ σ(T ) (i.e., if (V, T ) is semisimple) the condition P1 (λ) = P2 (λ) for all λ ∈ σ(T ) is equivalent to P1 (T ) = P2 (T ). If f is an arbitrary numerical function deﬁned on σ(T ), the only consistent way to deﬁne f (T ) is by setting f (T ) = P (T ) where P is any polynomial that takes the same values as f at each point of σ(T ). This deﬁnes a homomorphism of the space of all numerical functions on σ(T ) onto the (the same old) subalgebra generated by T in L(V). In the general (not necessarily semisimple) case, f needs to be deﬁned and sufﬁciently differentiable10 in a neighborhood of every λ ∈ σ(T ), and we deﬁne f (T ) = P (T ) where P is a polynomial whose Taylor expansion is the

10

That is, differentiable at least m(λ) − 1 times.

JANUARY 1, 2006

—DRAFT—

V.

I NVARIANT SUBSPACES

93

same as that of f up to, and including, the term of order m(λ) − 1 at every λ ∈ σ(T ). 5.8.2 M ORE PRACTICAL . The discussion in the previous subsection can only be put to use in practice if one has the complete spectral information about T —its minimal polynomial, its zeros including their multiplicities given explicitly. One can often deﬁne F (T ) without explicit knowledge of this information if F holomorphic in a sufﬁciently large set, and always if F is an entire function, that is a function that admits a power series representation in the entire complex plane. This is done formally just as it was for polynomials, namely, for F (z) = ∞ an z n , we write F (T ) = ∞ an T n . To verify that the deﬁni0 0 tion makes sense we check the convergence of the series. Since L(V) is ﬁnite dimensional so that all the norms on it are equivalent, we can use a submultiplicative, “operator norm”, as deﬁned by (2.6.1). This keeps the estimates a little cleaner since T n ≤ T n , and if the radius of convergence of the series is bigger than T , the convergence of ∞ an T n is assured. 0 Two simple examples: a. Assume the norm used is submultiplicative, and T < 1, then (I − T ) is invertible and (I − T )−1 = ∞ T n . n=0 Tn b. Deﬁne eaT = . The series clearly convergent for every T ∈ L(V) n! and number a. As a function of the parameter a it has the usual properties of the exponential function. We can consider it as a function of T and check if e(T +S) = eT eS . We ﬁnd that the answer is yes if S and T commute, but no in general. EXERCISES FOR SECTION 5.8

V.8.1. Prove that eaT ebT = e(a+b)T . V.8.2. Prove that if S and T commute, then e(T +S) = eT eS . V.8.3. Give an example of operators S and T such that e(T +S) = eT eS . Hint: Try for eS eT = eT eS .

—DRAFT—

JANUARY 1, 2006

94

L INEAR A LGEBRA

JANUARY 1, 2006

—DRAFT—

Chapter VI

Operators on inner-product spaces

6.1

INNER-PRODUCT SPACES

Inner-product spaces, are real or complex vector spaces endowed with an additional structure, called inner-product. The inner-product permits the introduction of a fair amount of geometry. Finite dimensional real inner-product spaces are often called Euclidean spaces. Complex inner-product spaces are also called Unitary spaces.

6.1.1 D EFINITION : a. An inner-product on a real vector space V is a symmetric, real-valued, positive deﬁnite bilinear form on V. That is, a form satisfying 1. u, v = v, u 2. u, v is bilinear. 3. u, u ≥ 0, with u, u = 0 if, and only if, u = 0. b. An inner-product on a complex vector space V is a Hermitian1 , complexvalued, positive deﬁnite, sesquilinear form on V. That is, a form satisfying 1. u, v = v, u 2. u, v is sesquilinear, that is, linear in u and skew linear in v: λu, v = λ u, v and u, λv = λ u, v . 3. u, u ≥ 0, with u, u = 0 if and only if u = 0.

1

A complex-valued form ϕ is Hermitian if ϕ(u, v) = ϕ(v, u).

95

96

L INEAR A LGEBRA

Notice that the sesquilinearity follows from the Hermitian symmetry, condition 1., combined with the assumption of linearity in the ﬁrst entry. E XAMPLES : a. The classical Euclidean n-space E n is Rn in which a, b = a = (a1 , . . . , an ) and b = (b1 , . . . , bn ). aj bj where

b. The space CR ([0, 1]) of all continuous real-valued functions on [0, 1]. The inner-product is deﬁned by f, g = f (x)g(x)dx. c. In Cn for a = (a1 , . . . , an ) and b = (b1 , . . . , bn ) we set a, b = aj ¯j b Tr . If we conwhich can be written as matrix multiplication: a, = ab b

a1 b1

. . sider the vector as columns, a = . and b = . then . .

an bn

a, b = bTr a.

d. The space C([0, 1]) of all continuous complex-valued functions on [0, 1]. 1 The inner-product is deﬁned by f, g = 0 f (x)g(x)dx. We shall reserve the notation H for inner-product vector spaces. 6.1.2 (6.1.1) Lemma (Cauchy–Schwarz). (6.1.2) | u, v | ≤ u v . Given an inner-product space H we deﬁne a norm on it by: v = v, v .

P ROOF : If v is a scalar multiple of u we have equality. If v, u are not proportional, then for λ ∈ R, 0 < u + λv, u + λv = u

2

+ 2λ

u, v + λ2 v 2 .

A quadratic polynomial with real coefﬁcients and no real roots has negative discriminant, here ( u, v )2 − u 2 v 2 < 0. For every τ with |τ | = 1 we have | τ u, v | ≤ u v ; take τ such that τ u, v = | u, v |.

JANUARY 1, 2006

—DRAFT—

VI.

O PERATORS ON INNER - PRODUCT SPACES

97

The norm has the following properties: a. Positivity: If v = 0 then v > 0; 0 = 0. b. Homogeneity: av = |a| v for scalars a and vectors v. c. The triangle inequality: v + u ≤ v + u . d. The parallelogram law: v + u

2

+ v−u

2

= 2( v

2

+ u 2 ).

Properties a. and b. are obvious. Property c. is equivalent to v

2

+ u

2

+2

v, u ≤ v

2

+ u

2

+2 v

u ,

which reduces to (6.1.2). The parallelogram law is obtained by “opening brack2. ets” in the inner-products that correspond the the various The ﬁrst three properties are common to all norms, whether deﬁned by an inner-product or not. They imply that the norm can be viewed as length, and ρ(u, v) = u − v has the properties of a metric. The parallelogram law, on the other hand, is speciﬁc to, and in fact characteristic of, the norms deﬁned by an inner-product. A norm deﬁned by an inner-product determines the inner-product, see exercises VI.1.11 and VI.1.12. 6.1.3 O RTHOGONALITY. Let H be an inner-product space. The vectors v, u in H are (mutually) orthogonal, denoted v ⊥ u, if v, u = 0. Observe that, since u, v = v, u , the relation is symmetric: u ⊥ v ⇐⇒ v ⊥ u. The vector v is orthogonal to a set A ⊂ H, denoted v ⊥ A, if it is orthogonal to every vector in A. If v ⊥ A, u ⊥ A, and w ∈ A is arbitrary, then av + bu, w = a v, w + b u, w = 0. It follows that for any set A ⊂ H, the set2 A⊥ = {v : v ⊥ A} is a subspace of H. Similarly, if we assume that v ⊥ A, w1 ∈ A, and w2 ∈ A, we obtain v, aw1 + bw2 = a v, w1 + ¯ v, w2 = 0 so that v ⊥ (span[A]). In other ¯ b ⊥ = (span[A])⊥ . words: A

2

This notation is consistent with 3.1.2, see 6.2.1 below.

—DRAFT—

JANUARY 1, 2006

98

L INEAR A LGEBRA

A vector v is normal if v = 1. A sequence {v1 , . . . , vm } is orthonormal if (6.1.3) vi , vj = δi,j (i.e., 1 if i = j, and 0 if i = j);

that is, if the vectors vj are normal and pairwise orthogonal. Lemma. Let {u1 , . . . , um } be orthonormal, v, w ∈ H arbitrary. a. {u1 , . . . , um } is linearly independent. b. The vector v1 = v − m v, uj uj is orthogonal to span[u1 , . . . , um ]. 1 c. If {u1 , . . . , um } is an orthonormal basis, then

m

(6.1.4)

v=

1

v, uj uj .

d. Parseval’s identity. If {u1 , . . . , um } is an orthonormal basis for H, then

m

(6.1.5)

v, w =

1

v, uj w, uj .

e. Bessel’s inequality and identity. If {uj } is orthonormal then (6.1.6) | v, uj |2 ≤ v 2 .

2

If {u1 , . . . , um } is an orthonormal basis for H, then v P ROOF : a. If aj uj = 0 then ak =

=

m 1 |

v, uj |2 .

aj uj , uk = 0 for all k ∈ [1, m].

b. v1 , uk = v, uk − v, uk = 0 for all k ∈ [1, m]; (skew-)linearity extends the orthogonality to linear combinations, that is to the span of {u1 , . . . , um }. c. If the span is the entire H, v1 is orthogonal to itself, and so v1 = 0. d. v, w = =

j

v, uj uj , v, uj w, uj

w, ul ul =

j,l

v, uj w, ul uj , ul

e. This is clearly weaker that (6.1.5).

JANUARY 1, 2006

—DRAFT—

VI.

O PERATORS ON INNER - PRODUCT SPACES

99

6.1.4 Proposition (Gram-Schmidt). Let {v1 , . . . , vm } be independent. There exists an orthonormal {u1 , . . . , um } such that for all k ∈ [1, m], (6.1.7) span[u1 , . . . , uk ] = span[v1 , . . . , vk ].

P ROOF : (By induction on m). The independence of {v1 , . . . , vm } implies that v1 = 0. Write u1 = v1 / v1 . Then u1 is normal and (6.1.7) is satisﬁed for k = 1. Assume that {u1 , . . . , ul } is orthonormal and that (6.1.7) is satisﬁed for k ≤ l. Since vl+1 ∈ span[{v1 , . . . , vl }] the vector

l

vl+1 = vl+1 − ˜

j=1

vl+1 , uj uj

is non-zero and we set ul+1 = vl+1 / vl+1 . ˜ ˜ One immediate corollary is: every ﬁnite dimensional H has an orthonormal basis. Another is that every orthonormal sequence {uj }k can be completed to 1 an orthonormal basis. For this we observe that {uj }k is independent, complete 1 it to a basis, apply the Gram-Schmidt process and notice that it does not change the vectors uj , 1 ≤ j ≤ k. If W ⊂ H is a subspace, {vj }n is a basis for H such that {vj }m is a basis 1 1 for W, then the basis {uj }n obtained by the Gram-Schmidt process splits into 1 two: {uj }m ∪ {uj }n , where {uj }m is an o.n. basis3 for W and {uj }n 1 m+1 1 m+1 is one for W ⊥ . This gives a direct sum (in fact, orthogonal) decomposition H = W ⊕ W ⊥. The map

m

(6.1.8)

πW : v →

1

v, uj uj

is called the orthogonal projection onto W. It depends only on W and not on the particular basis we started from. In fact, if v = v1 + v2 = u1 + u2 with v1 and u1 in W, and both v2 and u2 in W ⊥ , we have v1 − u1 = u2 − v2 ∈ W ∩ W ⊥

3

“o.n.” is short for “orthonormal”

—DRAFT—

JANUARY 1, 2006

100

L INEAR A LGEBRA

which means v1 − u1 = u2 − v2 = 0. 6.1.5 The deﬁnition of the distance ρ(v1 , v2 ) (= v1 − v2 ) between two vectors, extends to that of the distance between a point (v ∈ H) and a set (E ⊂ H) by setting ρ(v, E) = inf u∈E ρ(v, u). The distance between two sets, Ej ⊂ H j = 1, 2, is deﬁned by (6.1.9) ρ(E1 , E2 ) = inf{ v1 − v2 : vj ∈ Ej }.

Proposition. Let W ⊂ H be a subspace, and v ∈ H. Then ρ(v, W) = v − π W v . In other words, π W v is the vector in W closest to v. The proof is left as an exercise (VI.1.5) below). EXERCISES FOR SECTION 6.1

VI.1.1. Let V be a ﬁnite dimensional real or complex space, and {v1 , . . . , vn } a basis. Explain: “declaring {v1 , . . . , vn } to be orthonormal deﬁnes an inner-product on V”. VI.1.2. Prove that if H is a complex inner-product space and T ∈ L(H), there exists an orthonormal basis for H such that the matrix of T with respect to this basis is triangular.

Hint: See corollary 5.1.6.

VI.1.3. a. Let H be a real inner-product space. The vectors v, u are mutually orthogonal if, and only if, v + u 2 = v 2 + u 2 . b. If H is a complex inner-product space, v, u ∈ H, then v + u 2 = v 2 + u 2 is necessary, but not sufﬁcient, for v ⊥ u. Hint: Connect to the condition “< u, v > purely imaginary”. c. If H is a complex inner-product space, and v, u ∈ H, the condition: For all a, b ∈ C, av + bu 2 = |a|2 v 2 + |b|2 u 2 is necessary and sufﬁcient for v ⊥ u. d. Let V and U be subspaces of H. Prove that V ⊥ U if, and only if, for v ∈ V and u ∈ U, v + u 2 = v 2 + u 2 . e. The set {v1 , . . . , vm } is orthonormal if, and only if aj vj 2 = all choices of scalars aj , j = 1, . . . , m. (Here H is either real or complex.) JANUARY 1, 2006 |aj |2 for

—DRAFT—

VI.

O PERATORS ON INNER - PRODUCT SPACES

101

VI.1.4. Show that the map π W deﬁned in (6.1.8) is an idempotent linear operator 4 and is independent of the particular basis used in its deﬁnition. VI.1.5. Prove proposition 6.1.5. VI.1.6. Let Ej = vj + Wj be afﬁne subspaces in H. What is ρ(E1 , E2 )? VI.1.7. Show that the sequence {u1 , . . . , um } obtained by the Gram-Schmidt procedure is essentially unique: each uj is unique up to multiplication by a number of modulus 1.

Hint: If {v1 , . . . , vm } is independent, Wk = span[{v1 , . . . , vk }], k = 0, . . . , m − 1,

then uj is cπ Wj−1 vj , with |c| = π Wj−1 vj ⊥ ⊥

−1

.

VI.1.8. Over C: Every matrix is unitarily equivalent to a triangular matrix. VI.1.9. Let A ∈ M(n, C) and assume that its rows wj , considered as vectors in n C are pairwise orthogonal. Prove that AATr is a diagonal matrix, and conclude that |det A| = wj . Let {v1 , . . . , vn } ⊂ Cn be the rows of the matrix A. Prove Hadamard’s inequality: VI.1.10. (6.1.10) |det A| ≤ vj

Hint: Write Wk = span[{v1 , . . . , vk }], k = 0, . . . , n − 1, wj = π Wj−1 vj , and apply ⊥

the previous problem. VI.1.11. Prove that in a real inner-product space, the inner-product is determined by the norm: (polarization formula over R) (6.1.11) u, v = 1 4 u+v

2

− u−v

2

VI.1.12. Prove: In a complex inner-product space, the inner-product is determined by the norm, in fact, (polarization formula over C) (6.1.12) u, v = 1 4 u+v

2

− u−v

2

+ i u + iv

2

− i u − iv

2

.

4

An operator T is idempotent if T 2 = T .

—DRAFT—

JANUARY 1, 2006

102

L INEAR A LGEBRA

VI.1.13. Show that the polarization formula (6.1.12) does not depend on positivity, to wit, deﬁne the Hermitian quadratic form associated with a sesquilinear Hermitian form ψ (on a vector space over C or a subﬁeld thereof) by: (6.1.13) Prove (6.1.14) ψ(u, v) = 1 Q(u + v) − Q(u − v) + iQ(u + iv) − iQ(u − iv) . 4 Q(v) = ψ(v, v).

VI.1.14. A bilinear form ϕ on a vector space V over a ﬁeld of characteristic = 2, can be expressed uniquely as a sum of a symmetric and an alternating form: ϕ = ϕsym + ϕalt where 2ϕsym (v, u) = ϕ(v, u) + ϕ(u, v) and 2ϕalt (v, u) = ϕ(v, u) − ϕ(u, v). The quadratic form associated with ϕ is, by deﬁnition q(v) = ϕ(v, v). Show that q determines ϕsym , in fact (6.1.15) ϕsym (v, u) = 1 q(v + u) − q(v) − q(u) . 2

6.2

DUALITY AND THE ADJOINT.

6.2.1 H AS ITS OWN DUAL . The inner-product deﬁned in H associates with every vector u ∈ H the linear functional ϕu : v → v, u . In fact every linear functional is obtained this way: Theorem. Let ϕ be a linear functional on a ﬁnite dimensional inner-product space H. Then there exist a unique u ∈ H such that ϕ = ϕu , that is, (6.2.1) for all v ∈ H. P ROOF : Let {wj } be an orthonormal basis in H, and let u = ϕ(wj )wj . For every v ∈ H we have v = v, wj wj , and by Parseval’s identity, 6.1.3, (6.2.2) ϕ(v) = v, wj ϕ(wj ) = v, u . ϕ(v) = v, u

In particular, an orthonormal basis in H is its own dual basis.

JANUARY 1, 2006

—DRAFT—

VI.

O PERATORS ON INNER - PRODUCT SPACES

103

6.2.2 T HE ADJOINT OF AN OPERATOR . Once we identify H with its dual space, the adjoint of an operator T ∈ L(H) is again an operator on H. We repeat the argument of 3.2 in the current context. Given u ∈ H, the mapping v → T v, u is a linear functional and therefore equal to v → v, w for some w ∈ H. We write T ∗ u = w and check that u → w is linear. In other words T ∗ is a linear operator on H, characterized by (6.2.3) T v, u = v, T ∗ u .

Lemma. For T ∈ L(H), (T ∗ )∗ = T . P ROOF : v, (T ∗ )∗ u = T ∗ v, u = u, T ∗ v = T u, v = v, T u .

Proposition 3.2.4 reads in the present context as Proposition. For T ∈ L(H), range(T ) = (ker(T ∗ ))⊥ .

P ROOF : T x, y = x, T ∗ y so that y ⊥ range(T ) if, and only if y ∈ ker(T ∗ ).

6.2.3

T HE ADJOINT OF A MATRIX .

Tr

D EFINITION : The adjoint of a matrix A ∈ M(n, C) is the matrix A∗ = A . A is self-adjoint, aka Hermitian, if A = A∗ , that is, if aij = aji for all i, j. If A = AT,v is the matrix of an operator T relative to an orthonormal basis v, see 2.4.3, and AT ∗ ,v is the matrix of T ∗ relative to the same basis, then, writing the inner-product as matrix multiplication: (6.2.4) T v, u = uTr Av = (A u) v,

Tr Tr

and

v, T ∗ u = (A∗ u)Tr v,

we obtain AT ∗ ,v = (AT,v )∗ . The matrix of the adjoint is the adjoint of the matrix. In particular, T is self-adjoint if, and only if, AT,v , for some (every) orthonormal basis v, is self-adjoint. EXERCISES FOR SECTION 6.2

VI.2.1. Prove that if T, S ∈ L(H), then (ST )∗ = T ∗ S ∗ .

—DRAFT—

JANUARY 1, 2006

104

L INEAR A LGEBRA

VI.2.2. Prove that if T ∈ L(H), then ker(T ∗ T ) = ker(T ). VI.2.3. Prove that χT ∗ is the complex conjugate of χT . ¯ VI.2.4. If T v = λv, T ∗ u = µu, and µ = λ, then v, u = 0. VI.2.5. Rewrite the proof of Theorem 6.2.1 along these lines: If ker(ϕ) = H then ϕ = 0 and u∗ = 0. Otherwise, dim ker(ϕ) = dim H − 1 and (ker(ϕ))⊥ = ∅. Take any non-zero u ∈ (ker(ϕ))⊥ and set u∗ = c˜ where the constant c is the one that ˜ u −2 guarantees u, c˜ = ϕ(˜), that is c = u ˜ u u ¯ ˜ ϕ(˜). u 6.3 UNITARY AND ORTHOGONAL OPERATORS

We have mentioned that the norm in H deﬁnes a metric, the distance between the vectors v and u given by ρ(v, u) = v−u . Mappings that preserve a metric are called isometries (of the given metric). Operators U ∈ L(H) which are isometries, that is such that U v = v for all v ∈ H are called unitary operators when H is complex, and orthogonal when H is real. The operator U is unitary if U v 2 = U v, U v = v, U ∗ U v = v, v which is equivalent to U ∗ U = I or U ∗ = U −1 . Proposition. Let H be an inner-product space, T ∈ L(H). The following statements are equivalent: a. T is unitary; b. T maps some orthonormal basis onto an orthonormal basis; c. T maps every orthonormal basis onto an orthonormal basis. The columns of the matrix of a unitary operator U relative to an orthonormal basis {vj }, are the coefﬁcient vectors of U vj and, by Parseval’s identity 6.1.3, are orthonormal in Cn (resp. Rn ). Such matrices (with orthonormal columns) are called unitary when the underlying ﬁeld is C, and orthogonal when the ﬁeld is R. The set U (n) ⊂ M(n, C) of unitary n×n matrices is a group under matrix multiplication. It is caled the unitary group. The set O(n) ⊂ M(n, R) of orthogonal n × n matrices is a group under matrix multiplication. It is caled the orthogonal group.

JANUARY 1, 2006

—DRAFT—

VI.

O PERATORS ON INNER - PRODUCT SPACES

105

D EFINITION : The matrices A, B ∈ M(n) are unitarily equivalent if there exists U ∈ U (n) such that A = U −1 BU . The matrices A, B ∈ M(n) are orthogonally equivalent if there exists C ∈ O(n) such that A = O−1 BO. The added condition here, compared to similarity, is that the conjugating matrix U , resp. O, be unitary, resp. orthogonal, and not just invertible. EXERCISES FOR SECTION 6.3

VI.3.1. Prove that the set of rows of a unitary matrix is orthonormal. VI.3.2. Prove that the spectrum of a unitary operator is contained in the unit circle {z : |z| = 1}. VI.3.3. An operator T whose spectrum is contained in the unit circle is similar to a unitary operator if, and only if, it is semisimple. VI.3.4. An operator T whose spectrum is contained in the unit circle is unitary if, and only if, it is semisimple and eigenvectors corresponding to distinct eigenvalues are mutually orthogonal. VI.3.5. Let T ∈ L(H) be invertible and assume that T j is uniformly bounded for j ∈ Z. Prove that T is similar to a unitary operator. 6.4 SELF-ADJOINT OPERATORS

6.4.1 D EFINITION : An operator T ∈ L(H) is self-adjoint if it coincides with it’s adjoint: T = T ∗ , (that is, if T u, v = u, T v for every u, v ∈ H). 1 For every T ∈ L(H), the operators T = 1 (T + T ∗ ), T = 2i (T − T ∗ ), 2 T ∗ T , and T T ∗ are all self-adjoint. Proposition. Assume that T is self-adjoint on H. a. σ(T ) ⊂ R. b. If W ⊂ H is T -invariant then so is W ⊥ (the orthogonal complement of W). In particular, every T -invariant subspace is reducing, so that T is semisimple. c. If W ⊂ H is T -invariant then T W , the restriction of T to W, is selfadjoint. —DRAFT—

JANUARY 1, 2006

106

L INEAR A LGEBRA

P ROOF : a. If λ ∈ σ(T ) and v is a corresponding eigenvector, then λ v

2

¯ ¯ = T v, v = v, T v = λ v 2 , so that λ = λ.

b. If v ∈ W ⊥ then, for any w ∈ W, T v, w = v, T w = 0 (since T w ∈ W), so that T v ∈ W ⊥ . c. The condition T w1 , w2 = w1 , T w2 is valid when wj ∈ W since it holds for all vectors in H.

6.4.2 Part b. of the proposition implies that for self-adjoint operators T the generalized eigenspaces Hλ , λ ∈ σ(T ), are not generalized, they are simply kernels: Hλ = ker(T − λ). The Canonical Decomposition Theorem reads in this context: Proposition. Assume T self-adjoint. Then H = ⊕λ∈σ(T ) ker(T − λ). 6.4.3 The ﬁnal improvement we bring to the Canonical Decomposition Theorem for self-adjoint operators is the fact that the eigenspaces corresponding to distinct eigenvalues are mutually orthogonal: if T is self-adjoint, T v1 = λ1 v1 , T v2 = λ2 v1 , and λ1 = λ2 , then5 , λ1 v1 , v2 = T v1 , v2 = v1 , T v2 = λ2 v1 , v2 = λ2 v1 , v2 , so that v1 , v2 = 0. Theorem (The spectral theorem for self-adjoint operators). Let H be an innerproduct space and T a self-adjoint operator on H. Then H = λ∈σ(T ) Hλ where THλ , the restriction of T to Hλ , is multiplication by λ, and Hλ1 ⊥ Hλ2 when λ1 = λ2 . An equivalent formulation of the theorem is:

5

Remember that λ2 ∈ R.

JANUARY 1, 2006

—DRAFT—

VI.

O PERATORS ON INNER - PRODUCT SPACES

107

Theorem (Variant). Let H be an inner-product space and T a self-adjoint operator on H. Then H has an orthonormal basis all whose elements are eigenvectors for T . Denote by π λ the orthogonal projection on Hλ . The theorem states: (6.4.1) I=

λ∈σ(T )

πλ,

and

T =

λ∈σ(T )

λπ λ .

The decomposition H = λ∈σ(T ) Hλ is often referred to as the spectral decomposition induced by T on H. The representation of T as λ∈σ(T ) λπ λ is its spectral decomposition. 6.4.4 If {u1 , . . . , un } is an orthonormal basis whose elements are eigenvectors for T , say T uj = λj uj , then (6.4.2) Tv = λj v, uj uj aj uj , |λj |2 | v, uj |2 .

for all v ∈ H. Consequently, writing aj = v, uj and v = (6.4.3) T v, v = λj |aj |2 and Tv

2

=

Observations. Assume T self-adjoint. a. T = maxλ∈σ(T ) |λ|.

b. If T ≤ 1. Then there exists a unitary operator U that commutes with T , such that T = 1 (U + U ∗ ). 2 P ROOF : a. If λm is an eigenvalue with maximal absolute value in σ(T ), then T ≥ T um = maxλ∈σ(T ) |λ|. Conversely, by (6.4.3), Tv

2

=

|λj |2 | v, uj |2 ≤ max|λj |2

| v, uj |2 = max|λj |2 v 2 .

b. By part a., σ(T ) ⊂ [−1, 1]. For λj ∈ σ(T ) write ζj = λj + i 1 − λ2 , so j that λj = ζj and |ζj | = 1. Deﬁne: U v = ζj v, uj uj . —DRAFT—

JANUARY 1, 2006

108

L INEAR A LGEBRA

6.4.5 Theorem (Spectral theorem for Hermitian/symmetric matrices). Every Hermitian matrix in M(n, C) is unitarily equivalent to a diagonal matrix. Every symmetric matrix in M(n, R) is orthogonally equivalent to a diagonal matrix. P ROOF : A Hermitian matrix A ∈ M(n, C) is self-adjoint (i.e., the operator on Cn of multiplication by A is self-adjoint). If the underlying ﬁeld is R the condition is being symmetric. In either case, theorem 6.4.3 guarantees that the standard Cn , resp. Rn , has an orthonormal basis {vj } all whose elements are eigenvectors for the operator of multiplication by A. The matrix C of transition from the standard basis to {vj } is unitary, resp. orthogonal, and CAC −1 = CAC ∗ is diagonal. 6.4.6 C OMMUTING SELF - ADJOINT OPERATORS . Let T is self-adjoint, H = λ∈σ(T ) Hλ . If S commutes with T , then S maps each Hλ into itself. Since the subspaces Hλ are mutually orthogonal, if S is self-adjoint then so is its restriction to every Hλ , and we can apply Theorem 6.4.3 to each one of these restrictions and obtain, in each, an orthonormal basis made up of eigenvectors of S. Since every vector in Hλ is an eigenvector for T we obtained an orthonormal basis each of whose elements is an eigenvector both for T and for S. We now have the decomposition H=

λ∈σ(T ),µ∈σ(S )

Hλ,µ ,

where Hλ,µ = ker(T − λ) ∩ ker(S − µ). By induction on the number of operators we obtain the following theorem. Theorem. Let H be a ﬁnite dimensional inner-product space, and {Tj } commuting self-adjoint operators on H. Then there exists an orthonormal basis {uk } in H such that each uk is an eigenvector of every Tj . EXERCISES FOR SECTION 6.4

VI.4.1. Let T ∈ L(H) be self-adjoint, let λ1 ≤ λ2 ≤ · · · ≤ λn be its eigenvalues and {uj } the corresponding orthonormal eigenvectors. Prove the “minmax principle”: (6.4.4) JANUARY 1, 2006 λl =

dim W=l v∈W, v =1

min

max

T v, v .

—DRAFT—

VI.

O PERATORS ON INNER - PRODUCT SPACES

109

Hint: Every l-dimensional subspace intersects span[{uj }n ], see 1.2.5. j=l

VI.4.2. Let W ⊂ H be a subspace, and π W the orthogonal projection onto W. Prove that if T is self-adjoint on H, then π W T is self-adjoint on W. VI.4.3. Use exercise VI.2.2 to prove that a self-adjoint operator T on H is semisimple (Lemma 6.4.1, part b.). 6.5 NORMAL OPERATORS.

D EFINITION : An operator T ∈ L(H) is normal if it commutes with it’s adjoint: T T ∗ = T ∗ T . Self-adjoint operators are clearly normal. Unitary operators are normal since for unitary U we have U ∗ U = U U ∗ = I. If T is normal then S = T T ∗ = T ∗ T is self-adjoint. 6.5.1 T HE SPECTRAL THEOREM operator T ∈ L(H), the operators

FOR NORMAL OPERATORS .

1 2i (T

For every

T1 = T = 1 (T + T ∗ ) and T2 = T = 2

− T ∗)

are both self-adjoint, and T = (T1 + iT2 ). T is normal if, and only if, T1 and T2 commute. Theorem. Let T ∈ L(H) be normal. Then there is an orthonormal basis {uk } of H such that every uk is an eigenvector for T . P ROOF : As above, write T1 = T + T ∗ , T2 = −i(T − T ∗ ). Since T1 and T2 are commuting self-adjoint operators, Theorem 6.4.6 guarantees the existence of an orthonormal basis {uk } ⊂ H such that each uk is an eigenvector of both T1 and T2 . If Tj = k tj,k π uk , j = 1, 2, then (6.5.1) T =

k

(t1,k + it2,k )π uk ,

and the vectors uk are eigenvectors of T with eigenvalues (t1,k + it2,k ). —DRAFT—

JANUARY 1, 2006

110

L INEAR A LGEBRA

6.5.2

A subalgebra A ⊂ L(H) is self-adjoint if S ∈ A implies that S ∗ ∈ A.

Theorem. Let A ⊂ L(H) be a self-adjoint commutative subalgebra. Then there is an orthonormal basis {uk } of H such that every uk is a common eigenvector of every T ∈ A. P ROOF : The elements of A are normal and A is spanned by the self-adjoint elements it contains. Apply Theorem 6.4.6. EXERCISES FOR SECTION 6.5

VI.5.1. If S is normal (or just semisimple), a necessary and sufﬁcient condition for an operator Q to commute with S is that all the eigenspaces of S be Q-invariant. VI.5.2. If S is normal and Q commutes with S it commutes also with S ∗ VI.5.3. If T ∈ L(H) and {T n }n∈Z is bounded, then T is similar to a unitary operator. (T = S −1 U S) VI.5.4. Prove without using the spectral theorems: a. For any Q ∈ L(H), ker(Q∗ Q) = ker(Q). b. If S is normal, then ker(S) = ker(S ∗ ). c. If T is self-adjoint, then ker(T ) = ker(T 2 ). d. If S is normal, then ker(S) = ker(S 2 ). e. Normal operators are semisimple. VI.5.5. Prove without using the spectral theorems: If S is normal. then a. For all v ∈ H, S ∗ v = Sv . ¯ b. If Sv = λv then S ∗ v = λv. VI.5.6. If S is normal then S and S ∗ have the same eigenvectors with the corresponding eigenvalues complex conjugate. In particular, σ(S * ) = σ(S ). If ∗ ∗ T1 = S = S+S and T2 = S = S−S , then if Sv = λv, we have T1 v = λ v, 2 2i and T2 v = λ v. VI.5.7. Prove that the dimension of any commutative self-adjoint subalgebra of L(H) is bounded by dim H, and every such algebra is contained in a commutative self-adjoint subalgebra of L(H) of dimension dim H. JANUARY 1, 2006

—DRAFT—

VI.

O PERATORS ON INNER - PRODUCT SPACES

111

6.6

POSITIVE OPERATORS.

6.6.1 (6.6.1)

A self-adjoint operator S is nonnegative, written S ≥ 0, if Sv, v ≥ 0

for every v ∈ H. S is positive, written S > 0, if, in addition, Sv, v = 0 only for v = 0. 6.6.2 Lemma. A self-adjoint operator S is nonnegative, resp. positive, if, and only if, σ(S ) ⊂ [0, ∞), resp σ(S ) ⊂ (0, ∞). P ROOF : Use the spectral decomposition S = λ∈σ(T ) λπ λ . We have Sv, v = λj π j v 2 , which, clearly, is nonnegative for all v ∈ H if, and only if, λ ≥ 0 for all λ ∈ σ(S ). If σ(S ) ⊂ (0, ∞) and v 2= π λ v 2 > 0 then Sv, v > 0. If 0 ∈ σ(S ) take v ∈ ker(S), then Sv, v = 0 and S is not positive. 6.6.3 PARTIAL ORDERS ON THE SET OF SELF - ADJOINT OPERATORS . Let T and S be self-adjoint operators. The notions of positivity and nonnegativity deﬁne partial orders, “>” and “≥” on the set of self-adjoint operators on H. We write T > S if T − S > 0, and T ≥ S if T − S ≥ 0. Proposition. Let T and S be self-adjoint operators on H, and assume T ≥ S. Let σ(T ) = {λj } and σ(S ) = {µj }, both arranged in nondecreasing order. Then λj ≥ µj for j = 1, . . . , n. P ROOF : Use the minmax principle, exercise VI.4.1: λj =

dim W=j v∈W, v =1

min

max

T v, v ≥

dim W=j v∈W, v =1

min

max

Sv, v = µj

Remark: The condition “λj ≥ µj for j = 1, . . . , n” is necessary but, even if T and S commute, not suﬃcient for T ≥ S (unless n = 1). As example consider : {v1 , . . . , vn } is an orthonormal basis, T deﬁned by: T vj = 2jvj ; and S deﬁned by: Sv1 = 3v1 , Svj = vj for j > 1. The eigenvalues of T − S are νj = 2j − 1 for j > 1, but ν1 = −1. —DRAFT—

JANUARY 1, 2006

112

L INEAR A LGEBRA

6.7

POLAR DECOMPOSITION

6.7.1 Theorem. A positive operator S on H has a unique positive square root. √ √ λπ λ , where we P ROOF : Write S = λ∈σ(T ) λπ λ as above, and S = √ take the positive square roots of the (positive) λ’s. Then ( S)2 = S. If T is positive and T 2 = S then T and S commute so that T preserves all the eigenspaces Hλ of S. On each Hλ we have S = λI, (the identity operator √ on Hλ ) so that T = λJ, with positive square root, J positive, and J 2 = I. The eigenvalues of J are ±1, and the positivity of J implies that they are all 1, √ so J = I and T = S. A nonnegative operator S has square roots: write H = Hnull ⊕ Hpos where Hnull = ker(S) and Hpos = λ∈σ(S ) \{0} Hλ . The restriction Spos of S to Hpos is positive and, by the theorem, has a unique square root Spos . The restriction of S to Hnull is zero, and any operator T such that T 2 = 0 can serve as a square root of S on Hnull . This is the source of ambiguity in √ the deﬁnition of the square root. Setting S to be the operator that keeps both Hnull and Hpos invariant, is zero on Hnull , and is Spos on Hpos , is now a uniquely deﬁned nonnegative operator whose square is S. We’ll denote it, as √ 1 for positive S, by S or by S 2 . 6.7.2 Lemma. Let Hj ⊂ H, j = 1, 2, be isomorphic subspaces. Let U1 be an isometry H1 → H2 . Then there are unitary operators U on H that extend U1 .

⊥ ⊥ P ROOF : Deﬁne U on H1 as an arbitrary isometry onto H2 (which has the same dimension) and extend by linearity.

6.7.3 Lemma. Let A, B ∈ L(H), and assume that Av = Bv for all v ∈ H. Then there exists a unitary operator U such that B = U A. P ROOF : Clearly ker(A) = ker(B). Let {u1 , . . . , un } be an orthonormal basis of H such that {u1 , . . . , um } is a basis for ker(A) = ker(B). The subspace n range(A) is spanned by {Auj }n j=m+1 and range(B) is spanned by {Buj }j=m+1 . The map U1 : Auj → Buj extends by linearity to an isometry of range(A) onto range(B). Now apply Lemma 6.7.2, and remember that, on the range of A, U = U1

JANUARY 1, 2006

—DRAFT—

VI.

O PERATORS ON INNER - PRODUCT SPACES

113

Remark: The operator U is unique if, and only if, A (or B) is invertible. 6.7.4 We observed, 6.4.1, that for any T ∈ L(H), the operators S1 = T ∗ T and S2 = T T ∗ are self adjoint. Notice that unless T is normal, S1 = S2 . For any v ∈ H S1 v, v = T v, T v = T v

2

and

S2 v, v = T ∗ v 2 ,

so that both S1 and S2 are nonnegative, and both are positive if T is nonsingular. Let T ∈ L(H). The operators S1 = T ∗ T and S2 = T T ∗ are nonnegative and hence have nonnegative square roots. Observe that Tv

2

= T v, T v = T ∗ T v, v = S1 v, v = = S1 v, S1 v = S1 v 2 .

√ By Lemma 6.7.3, with A = S1 and B = T there exist unitary operators U √ such that T = U S1 . This proves Theorem (Polar decomposition6 ). Every operator T ∈ L(H) admits a representation (6.7.1) where U is unitary and R = √ T = U R, T ∗ T nonnegative.

Remark: Starting with T ∗ and taking adjoints at the end, one obtains also a √ representation of the form T = R1 U1 , with unitary U1 and R1 = T T ∗ nonnegative. Typically R1 = R, as shown by following example. Let T be the map on C2 deﬁned by T v1 = v2 , and T v2 = 0. Then R is the orthogonal projection onto the line of the scalar multiples of v1 , R1 is the orthogonal projection onto the multiples of v2 , and U = U1 maps each vj on the other. √ We shall use the notation |T | = T ∗ T .

6

Not to be confused with the polarisation formula,

—DRAFT—

JANUARY 1, 2006

114

L INEAR A LGEBRA

6.7.5 With T , |T |, and U as above, let {µ1 , . . . , µn } denote the eigenval√ ues of (the self-adjoint) |T | = T ∗ T , let {u1 , . . . , un } be the corresponding orthonormal eigenvectors, and denote vj = U −1 uj . Then {v1 , . . . , vn } is orthonormal, |T |v = µj v, uj uj , and (6.7.2) Tv = µj v, uj vj .

This is sometimes written7 as (6.7.3) T = µj uj ⊗ vj .

EXERCISES FOR SECTION 6.7

VI.7.1. Let {w1 , . . . , wn } be an orthonormal basis for H and let T be the (weighted) shift operator on {w1 , . . . , wn }, deﬁned by T wj = (n − j)wj+1 for j < n, and T wn = 0. Describe U and R in (6.7.1), as well as R1 and U1 above. VI.7.2. An operator T is bounded below by c, written T ≥ c, on a subspace V ⊂ H if T v ≥ c v for every v ∈ V. Assume that {u1 , . . . , un } and {v1 , . . . , vn } are orthonormal sequences, µj > 0, µj+1 ≤ µj , and T = µj uj ⊗ vj . Show that µj = max{c : there exists a j-dimensional subspace on which T ≥ c.}

7

See 4.2.2.

JANUARY 1, 2006

—DRAFT—

Chapter VII

Additional topics

Unless stated explicitely otherwise, the underlying ﬁeld of the vector spaces discussed in this chapter is either R or C.

7.1 QUADRATIC FORMS

7.1.1 A quadratic form on an n-dimensional inner-product space H is a function of the form Q(v) = T v, v with T ∈ L(H). A basis v = {v1 , . . . , vn } transforms Q into a function Qv of n variables on the underlying ﬁeld, R or C as the case may be. We use the notation appropriate1 for C. ¯ Write v = n xj vj and ai,j = T vi , vj ; then T v, v = i,j ai,j xi xj and 1 (7.1.1) Qv (x1 , . . . , xn ) =

i,j

ai,j xi xj ¯

expresses Q in terms of the variables {xj }, (i.e., the v-coordinates of v). We denote the matrix coefﬁcients (ai,j ) by Av , write the coordinates as of

x1

. a column vector, x = . , and observe that .

xn

(7.1.2)

Qv (x1 , . . . , xn ) = Ax, x = xTr Av x

transfers the action to Fn .

1

If the underlying ﬁeld is R the complex conjugation can be simply ignored

115

116

L INEAR A LGEBRA

7.1.2 When the underlying ﬁeld is R the quadratic form Q is real-valued. It does not determine the entries ai,j uniquely. Since xj xi = xi xj , the value of Q depends on ai,j + aj,i and not on each of the summands separately. We may therefore assume, without modifying Q, that ai,j = aj,i , thereby making the matrix Av = (ai,j ) symmetric. For real-valued quadratic forms over C the following lemma guarantees that the matrix of coefﬁcients is Hermitian. Lemma. A quadratic form xTr Av x on Cn is real-valued if, and only if, the matrix of coefﬁcients Av is Hermitian2 i.e., ai,j = aj,i . P ROOF : If ai,j = aj,i for all i, j, then i,j ai,j xi xj is it own complex conjugate. Conversely, assume that i,j ai,j xi xj is real-valued for all x1 , . . . , xn ∈ C. Taking xj = 0 for j = k, and xk = 1, we obtain ak,k ∈ R. Taking xk = xl = 1 and xj = 0 for j = k, l, we obtain ak,l + al,k ∈ R, i.e. ak,l = − al,k . For xk = i, xl = 1 we obtain i(ak,l − al,k ) ∈ R, i.e., ak,l = al,k and combining the two we have ak,l = al,k . 7.1.3 If we replace the basis v by another, say w, the coefﬁcients undergo a linear change of variables. There exists a matrix ∈ M(n), that transC

forms by left multiplication the w-coordinates y =

y1 . . . yn

of a vector into its

v-coordinates: x = Cy. Now (7.1.3) Qv (x1 , . . . , xn ) = xTr Av x = yTr C Av C y

Tr

and the matrix representing Q in terms of the variables yj , is3 (7.1.4) Aw = C Av C = C ∗ Av C.

Tr

Notice that the form now is C ∗ AC , rather then C −1 AC (which deﬁnes similarity). The two notions agree if C is unitary, since then C ∗ = C −1 , and the matrix of coefﬁcients for the variables {yj } is C −1 AC.

2 3

Equivalently, if the operator T is self-adjoint. The adjoint of a matrix is introduced in 6.2.3.

JANUARY 1, 2006

—DRAFT—

VII.

A DDITIONAL TOPICS

117

7.1.4 The fact that the matrix of coefﬁcients of a real-valued quadratic form Q is self-adjoint makes it possible to simplify Q by a (unitary) change of variables that reduces it to a linear combination of squares. If the given matrix is A, we invoke the spectral theorem, 6.4.5, to obtain a unitary matrix U , such that U ∗ AU = U −1 AU is a diagonal matrix whose diagonal consists of the complete collection, including multiplicity, of the eigenvalues {λj } of A. In other words, if x = U y, then (7.1.5) Q(x1 , . . . , xn ) = λj |yj |2 .

There are other matrices C which diagonalize Q, and the coefﬁcients in the diagonal representation Q(y1 , . . . , yn ) = bj |yj |2 depend on the one used. What does not depend on the particular choice of C is the number n+ of positive coefﬁcients, the number n0 of zeros and the number n− of negative coefﬁcients. This is known as The law of inertia. D EFINITION : A quadratic form Q(v) on a (real or complex) vector space V is positive, resp. negative if Q(v) > 0, resp. Q(v) < 0, for all v = 0 in V. On an inner-product space Q(v) = Av, v with a self-adjoint operator A, and our current deﬁnition is consistent with the deﬁnition in 6.6.1: the operator A is positive if so is Q(v) = Av, v . Denote n+ = max dim V1 (7.1.6)

V1

: :

Q is positive on V1 . Q is negative on V1 .

n− = max dim V1

V1

and, n0 = n − n+ − n− . Proposition. Let v be a basis in terms of which Q(y1 , . . . , yn ) = bj |yj |2 , and arrange the coordinates so that bj > 0 for j ≤ m and bj ≤ 0 for j > m. Then m = n+ . P ROOF : Denote V+ = span[v1 , . . . vm ], and V≤0 = span[vm+1 , . . . vn ] the complementary subspace. Q(y1 , . . . , yn ) is clearly positive on V+ , so that m ≤ n+ . On the other hand, by Theorem 2.5.3, every subspace W of dimension > m has elements v ∈ V≤0 , and for such v we clearly have Q(v) ≤ 0. —DRAFT—

JANUARY 1, 2006

118

L INEAR A LGEBRA

The proposition applied to −Q shows that n− equals the number of negative bj ’s. This proves Theorem (Law of inertia). Let Q be a real-valued quadratic form. Then in any representation Q(y1 , . . . , yn ) = bj |yj |2 , the number of positive coefﬁcients is n+ , the number of negative coefﬁcients is n− , and the number of zeros is n0 .

EXERCISES FOR SECTION 7.1

VII.1.1. Prove that if Av, v = Bv, v for all v ∈ Rn , with A, B ∈ M(n, R), and both symmetric, then A = B. 7.2 POSITIVE MATRICES

A matrix A ∈ M(m, C) is positive4 if all the entries are positive. A is nonnegative if all the entries are nonnegative. Similarly, a vector v ∈ Cm is positive, resp. nonnegative, if all its entries are positive, resp. non-negative. With Aj denoting either matrices or vectors, A1 ≥ A2 , A1 A2 , and A1 > A2 will mean respectively that A1 − A2 is nonnegative, nonnegative but not zero, positive. 7.2.1 We write A norm of A.

sp

= max{|τ | : τ ∈ σ(A)}, and refer to it as the spectral

D EFINITION : An eigenvalue λ of a matrix A is called dominant if a. λ is simple (that is ker((A − λ)2 ) = ker(A − λ) is one dimensional), and b. every other eigenvalue µ of A satisﬁes |µ| < |λ|. Notice that b. implies that |λ| = A sp . Theorem (Perron). Let A = (ai,j ) be a positive matrix. Then it has a positive dominant eigenvalue and a positive corresponding eigenvector. Moreover, there is no other nonnegative eigenvector for A.

4

Not to be confused with positivity of the

operator of multiplication by A.

JANUARY 1, 2006

—DRAFT—

VII.

A DDITIONAL TOPICS

119

P ROOF : Let p(A) be the set of all positive numbers µ such that there exist nonnegative vectors v = 0 such that (7.2.1) Av ≥ µv.

Clearly mini ai,i ∈ p(A); also µ ≤ m maxi,j ai,j for all µ ∈ p(A). Hence p(A) is non-empty and bounded. Write λ = supµ∈p(A) µ. We propose to show that λ ∈ p(A), and is the dominant eigenvalue for A. Let µn ∈ p(A) be such that µn → λ, and vn = (vn (1), · · · , vn (m)) 0 such that Avn ≥ µn vn . We normalize vn by the condition j vn (j) = 1, and since now 0 ≤ vn (j) ≤ 1 for all n and j, we can choose a (sub)sequence nk such that vnk (j) converges for each 1 ≤ j ≤ m. Denote the limit by v∗ (j) and let v∗ = (v∗ (1), · · · , v∗ (m)). Since all the entries of Avnk converge to the corresponding entries in Av∗ we have j v∗ (j) = 1, and (7.2.2) Av∗ ≥ λv∗ .

Claim: the inequality (7.2.2) is in fact an equality, so that λ is an eigenvalue and v∗ a corresponding eigenvector. If one of the entries in λv∗ , say λv∗ (l), were smaller than the l’th entry in Av∗ , we could replace v∗ by v∗∗ = v∗ + εel (where el is the unit vector that has 1 as its l’th entry and zero everywhere else) with ε > 0 small enough to have Av∗ (l) ≥ λv∗∗ (l). Since Ael is (strictly) positive, we would have Av∗∗ > Av∗ ≥ λv∗∗ , and for δ > 0 sufﬁciently small we would have Av∗∗ ≥ (λ + δ)v∗∗ contradicting the deﬁnition of λ. Since Av is positive for any v 0, a nonnegative vector which is an eigenvector of A with positive eigenvalue, is positive. In particular, v∗ > 0. Claim: λ is a simple eigenvalue. a. If Au = λu for some vector u, then A u = λ u and A u = λ u. So it would be enough to show that u is a constant multiple of v∗ under the —DRAFT—

JANUARY 1, 2006

120

L INEAR A LGEBRA

assumption that u has real entries. There exists a constant c = 0 such that v∗ + cu has nonnegative entries and at least one vanishing entry. Since v∗ + cu is an eigenvector for λ, the previous remark shows that v∗ + cu = 0 and u is a multiple of v∗ . b. We need to show that ker((A − λ)2 ) = ker((A − λ)). Assume the contrary, and let u ∈ ker((A − λ)2 ) \ ker((A − λ)), so that (7.2.3) Au = λu + cv∗

with c = 0. Splitting (7.2.3) into its real and imaginary parts we have: (7.2.4) A u = λ u + cv∗ A u = λ u + cv∗ .

Either c1 = c = 0 or c2 = c = 0 (or both). This shows that there is no loss of generality in assuming that u and c in (7.2.3) are real valued. Replace u, if necessary, by u1 = −u to obtain Au1 = λu1 + c1 v∗ with c1 > 0. Let a > 0 be such that u1 + av∗ > 0, and observe that A(u1 + av∗ ) = λ(u1 + av∗ ) + c1 v∗ so that A(u1 + av∗ ) > λ(u1 + av∗ ) contradicting the maximality of λ. c. Claim: Every eigenvalue µ = λ of A satisﬁes |µ| < λ. Let µ be an eigenvalue of A, and let w = 0 be a corresponding eigenvector: Aw = µw. Denote |w| = (|w(1)|, · · · , |w(m)|). The positivity of A implies A|w| ≥ |Aw| and, (7.2.5) A|w| ≥ |Aw| ≥ |µ||w|

so that |µ| ∈ p(A), i.e., |µ| ≤ λ. If |µ| = λ we must have equality in (7.2.5) and |w| = cv∗ . Equality in (7.2.5) can only happen if A|w| = |Aw| which means that all the entries in w have the same argument, i.e. w = eiϑ |w|, in other words, w is a constant multiple of v∗ , and µ = λ. Finally, let µ = λ be an eigenvalue of A and w a corresponding eigenvector. Tr The adjoint A∗ = A is a positive matrix and has the same dominant eigenvalue λ. If v ∗ is the positive eigenvector corresponding to λ then w, v ∗ = 0 (see exercise VI.2.4) and since v ∗ is strictly positive, w must have both positive and negative entries.

JANUARY 1, 2006

—DRAFT—

VII.

A DDITIONAL TOPICS

121

EXERCISES FOR SECTION 7.2

VII.2.1. What part of the conclusion of Perron’s theorem remains valid if the assumption is replaced by “A is similar to a positive matrix” ? 7.3 NONNEGATIVE MATRICES

Nonnegative matrices exhibit a variety of modes of behavior. Consider the following n × n matrices a. The identity matrix. 1 is the only eigenvalue, multiplicity n. b. The nilpotent matrix having ones below the diagonal, zeros elsewhere. The spectrum is {0}. c. The matrix Aσ of a permutation σ ∈ Sn . The spectrum depends on the decomposition of σ into cycles. If σ is a unique cycle then the spectrum of Aσ is the set of roots of unity of order n. The eigenvalue 1 has (1, . . . , 1) as a unique eigenvector. If the decomposition of σ consists of k cycles of lengths lj , j = 1, . . . , k, then the spectrum of Aσ is the union of the sets of roots of unity of order lj . The eigenvalue 1 now has multiplicity k. 7.3.1 Let 11 denote the matrix all of whose entries are 1. If A ≥ 0 then 1 1 1 A + m 11 > 0 and has, by Perron’s theorem, a dominant eigenvalue λm and a corresponding positive eigenvector vm which we normalize by the condition n j=1 vm (j) = 1. λm is monotone non increasing as m → ∞ and converges to a limit λ ≥ 0 which clearly dominates the spectrum of A. λ can well be zero, as can be seen from example b. above. For a sequence {mi } the vectors vmi converge to a (normalized) nonnegative vector v∗ which, by continuity, is an eigenvector for λ. Thus, a nonnegative matrix has λ = A sp as an eigenvalue with nonnegative eigenvector v∗ , however 1. λ may be zero, 2. λ may have high multiplicity, —DRAFT—

JANUARY 1, 2006

122

L INEAR A LGEBRA

3. λ may not have positive eigenvectors. 4. There may be other eigenvalues of modulus A

sp .

The ﬁrst three problems disappear, and the last explained for transitive nonnegative matrices. See below. 7.3.2 D EFINITIONS . Assume A ≥ 0. We use the following terminology: A connects the index j to i (connects (j, i) for short) directly if ai,j = 0. Since Aej = ai,j ei , A connects (j, i) if ei appears (with nonzero coefﬁcient) in the expansion of Aej . A connects j to i (connects (j, i) for short) if there is a connecting chain for (j, i), that is, a sequence {sl }k such that j = s0 , i = sk and k asl ,sl−1 = l=1 l=0 0. The existence of connecting chain for (j, i) is equivalent to: ei appears (with nonzero coefﬁcient) in the expansion of Ak ej . An index j is A-recurrent if A connects it to itself—there is a connecting chain for (j, j). The lengths k of connecting chains for (j, j) are called return times for j. Since connecting chains for (j, j) can be concatenated, the set of return times for a recurrent index is an additive semigroup of N. A is transitive 5 if it connects every pair (j, i). Lemma. If A is a nonnegative transitive matrix, every index is A-recurrent. In particular, A is not nilpotent. P ROOF : Left as an exercise. Corollary. If A is a nonnegative transitive matrix then λ = A

sp

> 0.

7.3.3 We write i ≤A j if A connects (i, j). This deﬁnes a partial order and induces an equivalence relation in the set of A-recurrent indices. (The nonrecurrent indices are not equivalent to themselves, nor to anybody else.) We can reorder the indices in a way that gives each equivalence class a consecutive bloc, and is compatible with the partial order, i.e., such that for nonequivalent indices, i ≤A j implies i ≤ j. This ordering is not unique: equivalent indices can be ordered arbitrarily within their equivalence class; pairs of

5

Also called irreducible, or ergodic.

JANUARY 1, 2006

—DRAFT—

VII.

A DDITIONAL TOPICS

123

equivalence classes may be ≤A comparable or not comparable in which case each may precede the other; non-recurrent indices may be placed consistently in more than one place. Yet, such order gives the matrix A a “quasi-supertriangular form”: if we denote the coefﬁcients of the “reorganized” A again by ai,j , then ai,j = 0 for i greater than the end of the bloc containing j. That means that now A has square transitive matrices centered on the diagonal—the squares Jl × Jl corresponding to the equivalence classes, while the entries on the rest of diagonal, at the non-recurrent indices, as well as in the rest of the sub-diagonal, are all zeros. This reduces much of the study of the general A to that of transitive matrices. 7.3.4 We focus now on transitive matrices. n j A nonnegative matrix A is transitive if, and only if, B = j=1 A is positive. Since, by 7.3.1, λ = A sp is an eigenvalue for A, it follows that β = n λj is an eigenvalue for B, having the same eigenvalue v∗ . 1 Either by observing that β = B sp , or by invoking the part in Perron’s theorem stating that (up to constant multiples) there is only one nonnegative eigenvector for B (and it is in fact positive), we see that β is the dominant eigenvalue for B and v∗ is positive. Lemma. Assume A transitive, v ≥ 0, µ > 0, Av positive vector u ≥ v such that Au > µu. µv. Then there exists a

P ROOF : As in the proof of Perron’s theorem: let l be such that Av(l) > µvl , let 0 < ε1 < Av(l) − µvl and v1 = v + ε1 el . Then Av ≥ µv1 , hence Av1 = Av + ε1 Ael ≥ µv1 + ε1 Ael , and Av1 is strictly bigger than µv1 at l and at all the entries on which Ael is positive, that is the i’s such that ai,l > 0. Now deﬁne v2 = v1 + ε2 Ael with ε2 > 0 sufﬁciently small so that Av2 ≥ µv2 with strict inequality for l and the indices on which Ael + A2 el is positive. Continue in the same manner with v3 , and Av3 ≥ µv3 with strict inequality on the support of (I + A + A2 + A3 )el etc. The transitivity of A guarantees that after k ≤ n such steps we obtain u = vk > 0 such that Au > µu. —DRAFT—

JANUARY 1, 2006

124

L INEAR A LGEBRA

The lemma implies in particular that if, for some µ > 0, there exists a vector v ≥ 0 such that Av µv, then µ < A sp . This since the condition Au > µu 1 implies6 (A + m 11)u > (1 + a)µu for a > 0 sufﬁciently small, and all m. In 1 turn this implies λm > (1 + a)µ for all m, and hence λ ≥ (1 + a)µ. In what follows we simplify the notation somewhat by normalizing (multiplying by a positive constant) the nonnegative transitive matrices under discussion so that A sp = 1. Corollary. Assume A sp = 1. If µ = eiϕ is an eigenvalue of A and uµ a corresponding7 eigenvector, then |uµ | = v∗ . P ROOF : A|uµ | ≥ |Auµ | = |µuµ | = |uµ |.

sp

If A|uµ | = |uµ | the lemma would contradict the assumption A

= 1.

7.3.5 For v ∈ Cn , |v| > 0 we write arg v = (arg v1 , . . . , arg vn ), and8 ei arg v = (ei arg v1 , . . . , ei arg vn ). The key observation is: if Auµ = µuµ , then A|uµ | = |Auµ | which means that every entry in Auµ is a linear combination of entries of uµ having the same argument, that is on which arg uµ is constant. The set [1, . . . , n] is partitioned into the level sets Ij on which arg uµ = ϑj , and A maps el for every l ∈ Ij , and hence span[{el }k∈Ij ], into span[{ek }k∈Is ] where ϑs = ϑj + ϕ. Let ν = eiψ be another eigenvalue of A, with eigenvector uν = ei arg uν v∗ , and let Jk be the level sets on which arg uν = γk . A maps el for every l ∈ Jk , into span[{em }m∈Js ] where γs = γk + ψ. It follows that for l ∈ Ij ∩ Jk , Ael ∈ span[{ek }k∈Is ] ∩ span[{em }m∈Jt ] where ϑs = ϑj +ϕ and γt = γk +ψ. If we write uµν = ei(arg uµ +arg uν ) v∗ , then arg Aei(ϑj +γk ) el = arg uµ + arg uν + ϕ + ψ, which means: Auµν = µν uµν . This proves that the product µν = ei(ϕ+ψ) of eigenvalues of A is an eigenvalue, and σ(A) ∩{z : |z| = 1} is a subgroup of the multiplicative unit circle; i.e., the group of roots of unity of order m for an appropriate m.

6 7

See 7.3.1 for the notation. Normalized: j |uµ (j)| = 1. 8 The notation considers Cn as an algebra of functions on the space [1, . . . , n].

JANUARY 1, 2006

—DRAFT—

VII.

A DDITIONAL TOPICS

125

The group σ(A) ∩{z : |z| = 1}, (or {eit : A sp eit ∈ σ(A)} if A is not normalized), is called the period group of A and its order m is the periodicity of A. We call the partition of [1, . . . , n] into the the level sets Ij of arg vµ , where µ is a generator of the period group of A, the basic partition. The subspaces Vj = span[{el : l ∈ Ij }] are Am -invariant and the restriction of Am to Vj is transitive with the dominant eigenvalue 1, and v∗,j = l∈Ij v∗ (l)el the corresponding eigenvector. The restriction of Am to Vj has |Ij | − 1 eigenvalues of modulus < 1. Summing for 1 ≤ j ≤ m and invoking the Spectral Mapping Theorem, 5.1.2, we see that A has n − m eigenvalues of modulus < 1. This proves that the eigenvalues in the period group are simple and have no generalized eigenvectors. Theorem (Frobenius). Let A be a transitive nonnegative n × n matrix. Then λ = A sp is a simple eigenvalue of A and has a positive eigenvector v∗ . The set {eit : λeit ∈ σ(A)} is a subgroup of the unit circle. 7.3.6 D EFINITION : A matrix A ≥ 0 is strongly transitive if Am is transitive for all m ∈ [1, . . . , n]. Theorem. If A is strongly transitive, then A sp is a dominant eigenvalue for A, and has a positive corresponding eigenvector. P ROOF : The periodicity of A has to be 1.

EXERCISES FOR SECTION 7.3

VII.3.1. A nonnegative matrix A is nilpotent if, and only if, no index is A-recurrent. VII.3.2. Prove that a nonnegative matrix A is transitive if, and only if, B = l=1 Al is positive. Hint: Check that A connects (i, j) if, and only if, n Al connects j to i directly. l=1 VII.3.3. Prove that the conclusion Perron’s theorem holds under the weaker assumption: “the matrix A is nonnegative and has a full row of positive entries”. VII.3.4. Prove that if the elements Ij of the basic partition are not equal in size, then ker(A) is nontrivial.

n

—DRAFT—

JANUARY 1, 2006

126

L INEAR A LGEBRA

Hint: Show that dim ker(A) ≥ max|Ij | − min|Ij |.

VII.3.5. Describe the matrix of a transitive A if the basis elements are reordered so that the elements of the basic partition are blocs of consecutive integers in [1, . . . , n], VII.3.6. Prove that if A ≥ 0 is transitive, then so is A∗ . VII.3.7. Prove that if A ≥ 0 is transitive, λ = A sp , and v ∗ is the positive eigenvector of A∗ , normalized by the condition v∗ , v ∗ = 1 then for all v ∈ Cn , (7.3.1) 1 N →∞ N lim

N

λ−j Aj v = v, v ∗ v∗ .

1

VII.3.8. Let σ be a permutation of [1, . . . , n]. Let Aσ be the n × n matrix whose entries aij are deﬁned by (7.3.2) aij = 1 if i = σ(j) 0 otherwise.

What is the spectrum of Aσ , and what are the corresponding eigenvectors. VII.3.9. Let 1 < k < n, and let σ ∈ Sn , be the permutation consisting of the two cycles (1, . . . , k) and (k + 1, . . . , n), and Aσ as deﬁned above. (So that the corresponding operator on Cn maps the basis vector ei onto eσ(i) .) a. Describe the positive eigenvectors of A. What are the corresponding eigenvalues? b. Let 0 < a, b < 1. Denote by Aa,b the matrix obtained from A by replacing the k’th and the n’th columns of A by (ci,k ) and (ci,n ), resp., where c1,k = 1 − a, ck+1,k = a and all other entries zero; c1,n = b, ck+1,n = 1 − b and all other entries zero. Show that 1 is a simple eigenvalue of Aa.b and ﬁnd a positive corresponding eigenvector. Show also that for other eigenvalues there are no nonnegative eignevectors. 7.4 STOCHASTIC MATRICES.

7.4.1 A stochastic matrix is a nonnegative matrix A = (ai,j ) such that the sum of the entries in each column9 is 1: (7.4.1)

i

9

ai,j = 1.

The action of the matrix is (left) multiplication of column vectors. The columns of the matrix are the images of the standard basis in Rn or Cn

JANUARY 1, 2006

—DRAFT—

VII.

A DDITIONAL TOPICS

127

A probability vector is a nonnegative vector π = (pl ) ∈ Rn such that l pl = 1. Observe that if A is a stochastic matrix and π a probability vector, then Aπ is a probability vector. In applications, one considers a set of possible outcomes of an “experiment” at a given time. The outcomes are often referred to as states, and a probability vector assigns probabilities to the various states. The word probability is taken here in a broad sense—if one is studying the distribution of various populations, the “probability” of a given population is simply its proportion in the total population. A (stationary) n-state Markov chain is a sequence {vj }j≥0 of probability vectors in Rn , such that (7.4.2) vj = Avj−1 = Aj v0 ,

where A is an n × n stochastic matrix. The matrix A is the transition matrix, and the vector v0 is referred to as the initial probability vector. The parameter j is often referred to as time. 7.4.2 P OSITIVE TRANSITION MATRIX . When the transition matrix A is positive, we get a clear description of the evolution of the Markov chain from Perron’s theorem 7.2.1. Condition (7.4.1) is equivalent to u∗ A = u∗ , where u∗ is the row vector (1, . . . , 1). This means that the dominant eigenvalue for A∗ is 1, hence the dominant eigenvalue for A is 1. If v∗ is the corresponding (positive) eigenvector, normalized so as to be a probability vector, then Av∗ = v∗ and hence Aj v∗ = v∗ for all j. If w is another eigenvector (or generalized eigenvector), it is orthogonal to u∗ , that is: n w(j) = 0. Also, |Al w(j)| is exponentially small (as a 1 function of l). If v0 is any probability vector, we write v0 = cv∗ + w with w in the span of the eigenspaces of the non dominant eigenvalues. By the remark above c = v0 (j) = 1. Then Al v0 = v∗ + Al w and, since Al w → 0 as l → ∞, we have Al v0 → v∗ . Finding the vector v∗ amounts to solving a homogeneous system of n equations (knowing a-priori that the solution set is one dimensional). The observa—DRAFT—

JANUARY 1, 2006

128

L INEAR A LGEBRA

tion v∗ = lim Al v0 , with v0 an arbitrary probability vector, may be a fast way way to obtain a good approximation of v∗ . 7.4.3 T RANSITIVE TRANSITION MATRIX . Denote vµ the eigenvectors of A corresponding to eigenvalues µ of absolute value 1, normalized so that v1 = v∗ is a probability vector, and |vµ | = v∗ . If the periodicity of A is m, then, for every probability vector v0 , the sequence Aj v0 is equal to an m-periodic sequence (periodic sequence of of period m) plus a sequence that tends to zero exponentially fast. Observe that for an eigenvalue µ = 1 of absolute value 1, m µl = 0. It 1 follows that if v0 is a probability vector, then (7.4.3) 1 k+m l A v0 → v∗ m l=k+1

exponential fast (as a function of k). 7.4.4 R EVERSIBLE M ARKOV CHAINS . One way of obtaining a transition matrix is from a nonnegative symmetric matrix (pi,j ) by writing Wj = i pij pi,j and, assuming Wj > 0 for all i, ai,j = Wi . Then A = (ai,j ) is stochastic since i ai,j = 1 for all j. We can identify the “stable distribution”—the A-invariant vector—by thinking in terms of “population mouvement”. Assume that at a given time we have population of size bj in state j and in the next unit of time ai,j proportion of this population shifts to state i. The absolute size of the j to i shift is ai,j bj so that the new distribution is given by Ab, where b is the column vector with entries bj . This description applies to any stochastic matrix, and the stable distribution is given by b which is invariant under A, Ab = b. The easiest way to ﬁnd b in this case is to go back to the matrix (pi,j ) and the weights Wj . The vector w with entries Wj is A-invariant in a very strong sense. Not only is Aw = w, but the exchange of mass between any two states is even: • the mass going from i to j is: Wi aj,i = pj,i , • the mass going from j to i is: Wj ai,j = pi,j , • the two are equal since pi,j = pj,i .

JANUARY 1, 2006

—DRAFT—

VII.

A DDITIONAL TOPICS

129

EXERCISES FOR SECTION 7.4

VII.4.1. Let σ be a permutation of [1, . . . , n]. Let Aσ be the n × n matrix whose entries aij are deﬁned by (7.4.4) aij = 1 if i = σ(j) 0 otherwise.

What is the spectrum of Aσ , and what are the corresponding eigenvectors. VII.4.2. Let 1 < k < n, and let σ ∈ Sn , be the permutation consisting of the two cycles (1, . . . , k) and (k + 1, . . . , n), and Aσ as deﬁned above. (So that the corresponding operator on Cn maps the basis vector ei onto eσ(i) .) a. Describe the positive eigenvectors of A. What are the corresponding eigenvalues? b. Let 0 < a, b < 1. Denote by Aa,b the matrix obtained from A by replacing the k’th and the n’th columns of A by (ci,k ) and (ci,n ), resp., where c1,k = 1 − a, ck+1,k = a and all other entries zero; c1,n = b, ck+1,n = 1 − b and all other entries zero. Show that 1 is a simple eigenvalue of Aa.b and ﬁnd a positive corresponding eigenvector. Show also that for other eigenvalues there are no nonnegative eignevectors.

7.5

REPRESENTATION OF FINITE GROUPS

A representation of a group G in a vector space V is a homomorphism σ : g → g of G into the group GL(V) of invertible elements in LV. Throughout this section G will denote a ﬁnite group. A representation of G in V turns V into a G-module, or a G-space. That means that in addition to the vector space operations there is an action of G on V by linear maps: for every g ∈ G and v ∈ V the element g v ∈ V is well deﬁned and, g(av1 + bv2 ) = agv1 + bgv2 while (g1 g2 )v = g1 (g2 v).

The data (σ, V), i.e., V as a G-space, is called a representation of G in V. The representation is faithful if σ is injective. Typically, σ is assumed known and is omitted from the notation. We shall use the terms G-space, G-module, and representation as synonyms. —DRAFT—

JANUARY 1, 2006

130

L INEAR A LGEBRA

We shall deal mainly in the case in which the underlying ﬁeld is C, or R, and the space has an inner-product structure. The inner-product is assumed for convenience only: it identiﬁes the space with its dual, and makes LV selfadjoint. An inner product can always be introduced (e.g., by declaring a given basis to be orthonormal). 7.5.1 T HE DUAL REPRESENTATION . If σ is a representation of G in V we obtain a representation σ ∗ of G in V ∗ by setting σ ∗ (g) = (σ(g −1 )∗ (the adjoint of the inverse of the action of G on V). Since both g → g−1 and g → g∗ reverse the order of factors in a product, their combination as used above preserves the order, and we have σ ∗ (g1 g2 ) = σ ∗ (g1 )σ ∗ (g2 ) so that σ ∗ is in fact a homomorphism. When V is endowed with an inner product, and is thereby identiﬁed with its dual, and if σ is unitary, then σ ∗ = σ. 7.5.2 Let Vj be G-spaces. We extend the actions of G to V1 ⊕V2 and V1 ⊗V2 by declaring10 (7.5.1) g(v1 ⊕ v2 ) = gv1 ⊕ gv2 and g(v1 ⊗ v2 ) = gv1 ⊗ gv2

∗ L(V1 , V2 ) = V2 ⊗ V1 and as such it is a G-space.

7.5.3 G- MAPS . Let Hj be G-spaces, j = 1, 2. A map S : H1 → H2 is a G-map if it commutes with the action of G. This means: for every g ∈ G, Sg = gS. The domains of the various actions is more explicit in the diagram H1 − − → H2 −−

g S S

g

H1 − − → H2 −− and the requirement is that it commute.

Observe that the symbol g signiﬁes, in (7.5.1) and elswhere, different operators, acting on different spaces.

10

JANUARY 1, 2006

—DRAFT—

VII.

A DDITIONAL TOPICS

131

The preﬁx G- can be attached to all words describing linear maps, thus, a G-isomorphism is an isomorphism which is a G-map, etc. If Vj , j = 1, 2, are G-spaces, we denote by LG (V1 , V2 ) the space of linear G-maps of V1 into V2 . 7.5.4 Lemma. Let S : H1 → H2 be a G-homomorphism. Then ker(S) is a subrepresentation, i.e., G-subspace, of H1 , and range(S) is a subrepresentation of H2 . D EFINITION : Two representations Hj of G are equivalent if there is a G-isomorphism S : H1 → H2 , that is, if they are isomorphic as G-spaces. 7.5.5 (7.5.2) AVERAGING , I. For a ﬁnite subgroup G ⊂ GL(H) we write IG = {v ∈ H : gv = v for all g ∈ G}.

In words: IG is the space of all the vectors in H which are invariant under every g in G. Theorem. The operator (7.5.3) is a projection onto IG . P ROOF : π G is clearly the identity on IG . All we need to do is show that 1 range(π G ) = IG , and for that observe that if v = |G| g∈G gu, then g1 v = 1 g1 gu |G| g∈G πG = 1 g |G| g∈G

and since {g1 g : g ∈ G} = G, we have g1 v = v. 7.5.6 AVERAGING , II. The operator Q = and can be used to deﬁne a new inner product (7.5.4) v, u

Q g∈G

g∗ g is positive, selfadjoint,

= Qv, u =

g∈G

gv, gu

—DRAFT—

JANUARY 1, 2006

132

L INEAR A LGEBRA

and the corresponding norm v

2 Q

=

g∈G

gv, gv =

g∈G

gv 2 .

Since {g : g ∈ G} = {gh : g ∈ G}, we have (7.5.5) hv, hu

Q

=

g∈G

ghv, ghu = Qv, u ,

and hv Q = v Q . Thus, G is a subgroup of the “unitary group” corresponding to ·, · Q . Denote by HQ the inner product space obtained by replacing the given inner-product by ·, · Q . Let {u1 , . . . , un } be an orthonormal basis of H, and {v1 , . . . , vn } be an orthonormal basis of HQ . Deﬁne S ∈ GL(H) by imposing Suj = vj . Now, S is an isometry from H onto HQ , g unitary on HQ (for any g ∈ G), and S −1 an isometry from HQ back to H; hence S −1 gS ∈ U (n). In other words, S conjugates G to a subgroup of the unitary group U (H). This proves the following theorem Theorem. Every ﬁnite subgroup of GL(H) is conjugate to a subgoup of the unitary group U (H). 7.5.7 D EFINITION : A unitary representation of a group G in an innerproduct space H is a representation such that g is unitary for all g ∈ G. The following is an immediate corollary of Theorem 7.5.6 Theorem. Every ﬁnite dimensional representation of a ﬁnite group is equivalent to a unitary representation. 7.5.8 Let G be a ﬁnite group and H a ﬁnite dimensional G-space (a ﬁnite dimensional representation of G). A subspace U ⊂ H is G-invariant if it is invariant under all the maps g, g ∈ G. If U ⊂ H is G-invariant, restricting the maps g, g ∈ G, to U deﬁnes U as a representation of G and we refer to U as a subrepresentation of H. A subspace U is G-reducing if it is G-invariant and has a G-invariant complement, i.e., H = U ⊕ V with both summands G-invariant.

JANUARY 1, 2006

—DRAFT—

VII.

A DDITIONAL TOPICS

133

Lemma. Every G-invariant subspace is reducing. P ROOF : Endow the space with the inner product given by (7.5.4) (which makes the representation unitary) and observe that if U is a nontrivial G-invariant subspace, then so is its orthogonal complement, and we have a direct sum decomposition H = U ⊕ V with both summands G-invariant. We say that (the representation) H is irreducible if there is no non-trivial G-invariant subspace of H and (completely) reducible otherwise. In the terminology of V.2.7, H is irreducible if (H, G) is minimal. Thus, if H is reducible, there is a (non-trivial) direct sum decomposition H = U ⊕ V with both summands G-invariant. We say, in this case, that σ is the sum of the representations U and V. If either representation is reducible we can write it as a sum of representations corresponding to a further direct sum decomposition of the space ( U or V) into G invariant subspaces. After no more than dim H such steps we obtain H as a sum of irreducible representations. This proves the following theorem: Theorem. Every ﬁnite dimensional representation H of a ﬁnite group G is a sum of irreducible representations. That is (7.5.6) H= Uj

Uniqueness of the decomposition into irreducibles Lemma. Let V and U be irreducible subrepresentations of H. Then, either W = U ∩ V = {0}, or U = V. P ROOF : W is clearly G-invariant. 7.5.9 T HE REGULAR REPRESENTATION . Let G be a ﬁnite group. Denote by 2 (G) the vector space of all complex valued functions on G, and deﬁne the inner product, for ϕ, ψ ∈ 2 (G), by ϕ, ψ =

x∈G

ϕ(x)ψ(x).

JANUARY 1, 2006

—DRAFT—

134

L INEAR A LGEBRA

2 (G)

For g ∈ G, the left translation by g is the operator τ (g) on (τ (g)ϕ)(x) = ϕ(g −1 x). Clearly τ (g) is linear and, in fact, unitary. Moreover,

deﬁned by

−1 −1 (τ (g1 g2 )ϕ)(x) = ϕ((g1 g2 )−1 x) = ϕ(g2 (g1 x)) = (τ (g1 )τ (g2 )ϕ)(x)

so that τ (g1 g2 ) = τ (g1 )τ (g2 ) and τ is a unitary representation of G. It is called the regular representation of G. If H ⊂ G is a subgroup we denote by 2 (G/H) the subspace of 2 (G) of the functions that are constant on left cosets of H. Since multiplication on the left by arbitrary g ∈ G maps left H-cosets onto left H-cosets, 2 (G/H) is τ (g) invariant, and unless G is simple, that is—has no nontrivial subgroups, τ is reducible. If H is not a maximal subgroup, that is, there exists a proper subgroup H1 that contains H properly, then left cosets of H1 split into left cosets of H so that 2 (G/H1 ) ⊂ 2 (G/H) and τ 2 (G/H) is reducible. This proves the following: Lemma. If the regular representation of G is irreducible, then G is simple. The converse is false! A cyclic group of order p, with prime p, is simple. Yet its regular representation is reducible. In fact, Proposition. Every representation of a ﬁnite abelian group is a direct sum of one-dimensional representations. P ROOF : Exercise VII.5.2 7.5.10 Let W be a G space and let , be an inner-product in W. Fix a non-zero vector u ∈ W and, for v ∈ W and g ∈ G, deﬁne (7.5.7) fv (g) = g−1 v, u

The map S : v → fv is a linear map from W into 2 (G). If W is irreducible and v = 0, the set {gv : g ∈ G} spans W which implies that fv = 0, i.e., S is injective.

JANUARY 1, 2006

—DRAFT—

VII.

A DDITIONAL TOPICS

135

Observe that for γ ∈ G, (7.5.8) τ (γ)fv (g) = fv (γ −1 g) = g−1 γv, u = fγv (g),

so that the space SW = WS ⊂ 2 (G) is a reducing subspace of the regular representation of 2 (G) and S maps σ onto the (restriction of the) regular representation τ (to) on WS . This proves in particular Proposition. Every irreducible representation of G is equivalent to a subrepresentation of the regular representation. Corollary. There are only a ﬁnite number of distinct irreducible representations of a ﬁnite group G. EXERCISES FOR SECTION 7.5

VII.5.1. If G is ﬁnite abelian group and σ a representation of G in H, then the linear span of {σ(g) : g ∈ G} is a selfadjoint commutative subalgebra of LH. VII.5.2. Prove that every representation of a ﬁnite abelian group is a direct sum of one-dimensional representations. Hint: 6.5.2 1 n VII.5.3. Consider the representation of Z in R2 deﬁned by σ(n) = . Check 0 1 the properties shown above for representations of ﬁnite groups that fail for σ.

—DRAFT—

JANUARY 1, 2006

136

L INEAR A LGEBRA

JANUARY 1, 2006

—DRAFT—

Appendix

A.1

EQUIVALENCE RELATIONS — PARTITIONS.

A.1.1 E QUIVALENCE RELATIONS . A binary relation on a set X is a subset R ⊂ X × X. We write xRy when (x, y) ∈ R. E XAMPLES : a. Equality: R = {(x, x) : x ∈ X}, xRy means x = y. b. Order in Z: R = {(x, y) : x < y}. D EFINITION : An equivalence relation in a set X, is a binary relation (denoted here x ≡ y) that is reﬂexive: for all x ∈ X, x ≡ x; symmetric: for all x, y ∈ X, if x ≡ y, then y ≡ x; and transitive: for all x, y, z ∈ X, if x ≡ y and y ≡ z, then x ≡ z. E XAMPLES : a. Of the two binary relations above, equality is an equivalence relation, order is not. b. Congruence modulo an integer. Here X = Z, the set of integers. Fix an integer k. x is congruent to y modulo k and write x ≡ y (mod k) if x−y is an integer multiple of k. c. For X = {(m, n) : m, n ∈ Z, n = 0}, deﬁne (m, n) ≡ (m1 , n1 ) by the condition mn1 = m1 n. This will be familiar if we write the pairs as m n instead of (m, n) and observe that the condition mn1 = m1 n is the one 1 deﬁning the equality of the rational fractions m and m1 . n n 137

138

L INEAR A LGEBRA

A.1.2

PARTITIONS .

D EFINITION : A partition of X is a collection P of (pairwise) disjoint subsets Pα ⊂ X whose union is X. A partition P deﬁnes an equivalence relation: by deﬁnition, x ≡ y if, and only if, x and y belong to the same element of the partition. Conversely, given an equivalence relation on X, we deﬁne the equivalence class of x ∈ X as the set Ex = {y ∈ X : x ≡ y}. The deﬁning properties of equivalence can be rephrased as: a. x ∈ Ex , b. If y ∈ Ex , then x ∈ Ey , and c. If y ∈ Ex , and z ∈ Ey , then z ∈ Ex . These conditions guarantee that different equivalence classes are disjoint and the collection of all the equivalence classes is a partition of X (which deﬁnes the given equivalence relation). EXERCISES FOR SECTION A.1

A.1.1. Write R1 ⊂ R × R = {(x, y) : |x − y| < 1} and x ∼1 y when (x, y) ∈ R1 . Is this an equivalence relation, and if not—what fails? A.1.2. Identify the equivalence classes for congruence mod k. A.2 MAPS

The terms used to describe properties of maps vary by author, by time, by subject matter, etc. We shall use the following: A map ϕ : X → Y is injective if x1 = x2 =⇒ ϕ(x1 ) = ϕ(x2 ). Equivalent terminology: ϕ is one-to-one (or 1–1), or ϕ is a monomorphism. A map ϕ : X → Y is surjective if ϕ(X) = {ϕ(x) : x ∈ X} = Y . Equivalent terminology: ϕ is onto, or ϕ is an epimorphism. A map ϕ : X → Y is bijective if it is both injective and surjective: for every y ∈ Y there is precisely one x ∈ X such that y = ϕ(x). Bijective maps are invertible—the inverse map deifed by: ϕ−1 (y) = x if y = ϕ(x). Maps that preserve some structure are called morphisms, often with a preﬁx providing additional information. Besides the mono- and epi- mentioned above, we use systematically homomorphism, isomorphism, etc. A permutation of a set is a bijective map of the set onto itself.

JANUARY 1, 2006

—DRAFT—

A PPENDIX

139

A.3

GROUPS

A.3.1 D EFINITION : A group is a pair (G, ∗), where G is a set and ∗ is a binary operation (x, y) → x ∗ y, deﬁned for all pairs (x, y) ∈ G × G, taking values in G, and satisfying the following conditions: G-1 The operation is associative: For x, y, z ∈ G, (x ∗ y) ∗ z = x ∗ (y ∗ z). G-2 There exists a unique element e ∈ G called the identity element or the unit of G, such that e ∗ x = x ∗ e = x for all x ∈ G. G-3 For every x ∈ G there exists a unique element x−1 , called the inverse of x, such that x−1 ∗ x = x ∗ x−1 = e. A group (G, ∗) is Abelian, or commutative if x ∗ y = y ∗ x for all x and y. The group operation in a commutative group is often written and referred to as addition, in which case the identity element is written as 0, and the inverse of x as −x. When the group operation is written as multiplication, the operation symbol ∗ is sometimes written as a dot (i.e., x · y rather than x ∗ y) and is often omitted altogether. We also simplify the notation by referring to the group, when the binary operation is “assumed known”, as G, rather than (G, ∗). E XAMPLES : a. (Z, +), the integers with standard addition. b. (R \ {0}, ·), the non-zero real numbers, standard multiplication. c. Sn , the symmetric group on [1, . . . , n]. Here n is a positive integer, the elements of Sn are all the permutations σ of the set [1, . . . , n], and the operation is concatenation: for σ, τ ∈ Sn and 1 ≤ j ≤ n we set (τ σ)(j) = τ (σ(j)). More generally, if X is a set, the collection S(X) of permutations, i.e., invertible self-maps of X, is a group under concatenation. (Thus Sn = S([1, . . . , n])). The ﬁrst two examples are commutative; the third, if n > 2, is not. —DRAFT—

JANUARY 1, 2006

140

L INEAR A LGEBRA

A.3.2

Let Gi , i = 1, 2, be groups. A map ϕ : G1 → G2 is a homomorphism if ϕ(xy) = ϕ(x)ϕ(y)

D EFINITION : (A.3.1)

Notice that the multiplication on the left-hand side is in G1 , while that on the right-hand side is in G2 . The deﬁnition of homomorphism is quite broad; we don’t assume the mapping to be injective (1-1), nor surjective (onto). We use the proper adjectives explicitly whenever relevant: monomorphism for injective homomorphism and epimorphism for one that is surjective. An isomorphism is a homomorphism which is bijective, that is both injective and surjective. Bijective maps are invertible, and the inverse of an isomorphism is an isomorphism. For the proof we only have to show that ϕ−1 is multiplicative (as in (A.3.1)), that is that for g, h ∈ G2 , ϕ−1 (gh) = ϕ−1 (g)ϕ−1 (h). But, if g = ϕ(x) and h = ϕ(y), this is equivalent to gh = ϕ(xy), which is the multiplicativity of ϕ. If ϕ : G1 → G2 and ψ : G2 → G3 are both isomorphisms, then ψϕ : G1 → G3 is an isomorphism as well.. We say that two groups G and G1 are isomorphic if there is an isomorphism of one onto the other. The discussion above makes it clear that this is an equivalence relation. A.3.3 I NNER AUTOMORPHISMS AND CONJUGACY CLASSES . An isomorphism of a group onto itself is called an automorphism. A special class of automorphisms, the inner automorphisms, are the conjugations by elements y ∈ G: (A.3.2) Ay x = y −1 xy

One checks easily (left as exercise) that for all y ∈ G, the map Ay is in fact an automorphism. An important fact is that conjugacy, deﬁned by x ∼ z if z = Ay x = y −1 xy for some y ∈ G, is an equivalence relation. To check that every x is conjugate

JANUARY 1, 2006

—DRAFT—

A PPENDIX

141

to itself take y = e, the identity. If z = Ay x, then x = Ay−1 z, proving the symmetry. Finally, if z = y −1 xy and u = w−1 zw, then u = w−1 zw = w−1 y −1 xyw = (yw)−1 x(yw), which proves the transitivity. The equivalence classes deﬁned on G by conjugation are called conjugacy classes. A.3.4 S UBGROUPS AND COSETS . A subgroup of a group G is a subset H ⊂ G such that

D EFINITION :

SG-1 H is closed under multiplication, that is, if h1 , h2 ∈ H then h1 h2 ∈ H. SG-2 e ∈ H. SG-3 If h ∈ H, then h−1 ∈ H E XAMPLES : a. {e}, the subset whose only term is the identity element b. In Z, the set qZ of all the integral multiples of some integer q. This is a special case of the following example. c. For any x ∈ G, the set {xk }k∈Z is the subgroup generated by x. The element x is of order m, if the group it generates is a cyclic group of order m. (That is if m is the smallest positive integer for which xm = e). x has inﬁnite order if {xn } is inﬁnite, in which case n → xn is an isomorphism of Z onto the group generated by x. d. If ϕ : G → G1 is a homomorphism and e1 denotes the identity in G1 , then {g ∈ G : ϕg = e1 } is a subgroup of G (the kernel of ϕ). e. The subset of Sn of all the permutations that leave some (ﬁxed) l ∈ [1, . . . , n] in its place, that is, {σ ∈ Sn : σ(l) = l}. Let H ⊂ G a subgroup. For x ∈ G write xH = {xz : z ∈ H}. Sets of the form xH are called left cosets of H. —DRAFT—

JANUARY 1, 2006

142

L INEAR A LGEBRA

Lemma. For any x, y ∈ G the cosets xH and yH are either identical or disjoint. In other words, the collection of distinct xH is a partition of G. P ROOF : We check that the binary relation deﬁned by “x ∈ yH” which is usually denoted by x ≡ y (mod H), is an equivalence relation. The cosets xH are the elements of the associated partition. a. Reﬂexive: x ∈ xH, since x = xe and e ∈ H. b. Symmetric: If y ∈ xH then x ∈ yH. y ∈ xH means that there exists z ∈ H, such that y = xz. But then yz −1 = x, and since z −1 ∈ H, x ∈ yH. c. Transitive: If w ∈ yH and y ∈ xH, then w ∈ xH. For appropriate z1 , z2 ∈ H, y = xz1 and w = yz2 = xz1 z2 , and z1 z2 ∈ H. EXERCISES FOR SECTION A.3

A.3.1. Check that, for any group G and every y ∈ G, the map Ay x = y −1 xy is an automorphism of G. A.3.2. Let G be a ﬁnite group of order m. Let H ⊂ G be a subgroup. Prove that the order of H divides m. A.4 GROUP ACTIONS

A.4.1 ACTIONS . D EFINITION : An action of G on X is a homomorphism ϕ of G into S(X), the group of invertible self-maps (permutations) of X. The action deﬁnes a map (g, x) → ϕ(g)x. The notation ϕ(g)x often replaced, when ϕ is “understood”, by the simpler gx, and the assumption that ϕ is a homomorphism is equivalent to the conditions: ga1. ex = x for all x ∈ X, (e is the identity element of G). ga2. (g1 g2 )x = g1 (g2 x) for all gj ∈ G, x ∈ X. E XAMPLES : a. G acts on itself (X = G) by left multiplication: (x, y) → xy. b. G acts on itself (X = G) by right multiplication (by the inverse): (x, y) → yx−1 . (Remember that (ab)−1 = b−1 a−1 )

JANUARY 1, 2006

—DRAFT—

A PPENDIX

143

c. G acts on itself by conjugation: (x, y) → ϕ(x)y where ϕ(x)y = xyx−1 . d. Sn acts as mappings on {1, . . . , n}. A.4.2 O RBITS . The orbit of an element x ∈ X under the action of a group G is the set Orb (x) = {gx : g ∈ G}. The orbits of a G action form a partition of X. This means that any two orbits, Orb (x1 ) and Orb (x2 ) are either identical (as sets) or disjoint. In fact, −1 −1 if x ∈ Orb (y), then x = g0 y and then y = g0 x, and gy = gg0 x. Since −1 the set {gg0 : g ∈ G} is exactly G, we have Orb (y) = Orb (x). If x ∈ ˜ Orb (x1 ) ∩ Orb (x2 ) then Orb (˜) = Orb (x1 ) = Orb (x2 ). The corresponding x equivalence relation is: x ≡ y when Orb (x) = Orb (y). E XAMPLES : a. A subgroup H ⊂ G acts on G by right multiplication: (h, g) → gh. The orbit of g ∈ G under this action is the (left) coset gH. b. Sn acts on [1, . . . , n], (σ, j) → σ(j). Since the action is transitive, there is a unique orbit—[1, . . . , n]. c. If σ ∈ Sn , the group (σ) (generated by σ) is the subgroup {σ k } of all the powers of σ. Orbit of elements a ∈ [1, . . . , n] under the action of (σ), i.e. the set {σ k (a)}, are called cycles of σ and are written (a1 , . . . , al ), where aj+1 = σ(aj ), and l, the period of a1 under σ, is the ﬁrst positive integer such that σ l (a1 ) = a1 . Notice that cycles are “enriched orbits”, that is orbits with some additional structure, here the cyclic order inherited from Z. This cyclic order deﬁnes σ uniquely on the orbit, and is identiﬁed with the permutation that agrees with σ on the elements that appear in it, and leaves every other element in its place. For example, (1, 2, 5) is the permutation that maps 1 to 2, maps 2 to 5, and 5 to 1, leaving every other element unchanged. Notice that n, the cardinality of the complete set on which Sn acts, does not enter the notation and is in fact irrelevant (provided that all the entries in the cycle are bounded by it; here n ≥ 5). Thus, breaking [1, . . . , n] into σ-orbits amounts to writing σ as a product of disjoint cycles. —DRAFT—

JANUARY 1, 2006

144

L INEAR A LGEBRA

A.4.3 C ONJUGATION . Two actions of a group G, ϕ1 : G × X1 → X1 , and ϕ2 : G × X2 → X2 are conjugate to each other if there is an invertible map Ψ : X1 → X2 such that for all x ∈ G and y ∈ X1 , (A.4.1) ϕ2 (x)Ψy = Ψ(ϕ1 (x)y) or, equivalently, ϕ2 = Ψϕ1 Ψ−1 .

This is often stated as: the following diagrams commute X1 − − → X1 −−

Ψ ϕ2 ϕ1

X1 − − → X1 −− or

Ψ−1

ϕ1

Ψ

ϕ2

Ψ

X2 − − → X2 −−

X2 − − → X2 −−

meaning that the concatenation of maps associated with arrows along a path depends only on the starting and the end point, and not on the path chosen.

A.5 FIELDS, RINGS, AND ALGEBRAS

A.5.1

F IELDS .

D EFINITION : A (commutative) ﬁeld, (F, +, ·) is a set F endowed with two binary operations, addition: (a, b) → a + b, and multiplication: (a, b) → a · b (we often write ab instead of a · b) such that: F-1 (F, +) is a commutative group, its identity (zero) is denoted by 0. F-2 (F \ {0}, ·) is a commutative group, whose identity is denoted 1, and a · 0 = 0 · a = 0 for all a ∈ F. F-3 Addition and multiplication are related by the distributive law: a(b + c) = ab + ac. E XAMPLES : a. Q, the ﬁeld of rational numbers. b. R, the ﬁeld of real numbers. c. C, the ﬁeld of complex numbers.

JANUARY 1, 2006

—DRAFT—

A PPENDIX

145

d. Z2 denotes the ﬁeld consisting of the two elements 0, 1, with addition and multiplication deﬁned mod 2 (so that 1 + 1 = 0). Similarly, if p is a prime, the set Zp of residue classes mod p, with addition and multiplication mod p, is a ﬁeld. (See exercise I.5.2.) A.5.2 R INGS .

D EFINITION : A ring is a triplet (R, +, ·), R is a set, + and · binary operations on R called addition, resp. multiplication, such that (R, +) is a commutative group, the multiplication is associative (but not necessarily commutative), and the addition and multiplication are related by the distributive laws: a(b + c) = ab + ac, and (b + c)a = ba + ca.

A subring R1 of a ring R is a subset of R that is a ring under the operations induced by the ring operations, i.e., addition and multiplication, in R. Z is an example of a commutative ring with a multiplicative identity; 2Z, (the even integers), is a subring. 2Z is an example of a commutative ring without a multiplicative identity. A.5.3 A LGEBRAS .

D EFINITION : An Algebra over a ﬁeld F is a ring A and a multiplication of elements of A by scalars (elements of F), that is, a map F × A → A such that if we denote the image of (a, u) by au we have, for a, b ∈ F and u, v ∈ A, identity: associativity: distributivity: 1u = u; a(bu) = (ab)u, a(uv) = (au)v; (a + b)u = au + bu, and a(u + v) = au + av.

A subalgebra A1 ⊂ A is a subring of A that is also closed under multiplication by scalars. E XAMPLES : a. F[x] – The algebra of polynomials in one variable x with coefﬁcients from F, and the standard addition, multiplication, and multiplication by scalars. It is an algebra over F. —DRAFT—

JANUARY 1, 2006

146

L INEAR A LGEBRA

b. C[x, y] – The (algebra of) polynomials in two variables x, y with complex coefﬁcients, and the standard operations. C[x, y] is “complex algebra”, that is an algebra over C. Notice that by restricting the scalar ﬁeld to, say, R, a complex algebra can be viewed as a “real algebra” i.e., and algebra over R. The underlying ﬁeld is part of the deﬁnition of an algebra. The “complex” and the “real” C[x, y] are diﬀerent algebras. c. M(n), the n × n matrices with matrix multiplication as product. D EFINITION : A left (resp. right) ideal in a ring R is a subring I that is closed under multiplication on the left (resp. right) by elements of R: for a ∈ R and h ∈ I we have ah ∈ I (resp. ha ∈ I). A two-sided ideal is a subring that is both a left ideal and a right ideal. A left (resp. right, resp. two-sided) ideal in an algebra A is a subalgebra of A that is closed under left (resp. right, resp, either left or right) multiplication by elements of A. If the ring (resp. algebra) is commutative the adjectives “left”, “right” are irrelevant. Assume that R has an identity element. For g ∈ R, the set Ig = {ag : a ∈ R} is a left ideal in R, and is clearly the smallest (left) ideal that contains g. Ideals of the form Ig are called principal left ideals, and g a generator of Ig . One deﬁnes principal right ideals similarly. A.5.4 Z AS A RING . Notice that since multiplication by an integer can be accomplished by repeated addition, the ring Z has the (uncommon) property that every subgroup in it is in fact an ideal. Another special property is: Z is a principal ideal domain—every nontriv1 ideal I ⊂ Z is principal, that is, has the form mZ for some positive integer ial m. In fact if m is the smallest positive element of I and n ∈ I, n > 0, we can “divide with remainder” n = qm + r with q, r integers, and 0 ≤ r < m. Since both n and qm are in I so is r. Since m is the smallest positive element in I,

1

Not reduced to {0}.

JANUARY 1, 2006

—DRAFT—

A PPENDIX

147

r = 0 and n = qm. Thus, all the positive elements of I are divisible by m (and so are their negatives). If mj ∈ Z, j = 1, 2, the set Im1 ,m2 = {n1 m1 + n2 m2 : n1 , n2 ∈ Z} is an ideal in Z, and hence has the form gZ. As g divides every element in Im1 ,m2 , it divides both m1 and m2 ; as g = n1 m1 + n2 m2 for appropriate nj , every common divisor of m1 and m2 divides g. It follows that g is their greatest common divisor, g = gcd(m1 , m2 ). We summarize: Proposition. If m1 and m2 are integers, then for appropriate integers n1 , n2 , gcd(m1 , m2 ) = n1 m1 + n2 m2 . EXERCISES FOR SECTION A.5

A.5.1. Let R be a ring with identity, B ⊂ R a set. Prove that the ideal generated by B, that is the smallest ideal that contains B, is: I = { aj bj : aj ∈ R. bj ∈ B}. A.5.2. Verify that Zp is a ﬁeld. Hint: If p is a prime and 0 < m < p then gcd(m, p) = 1. A.5.3. Prove that the set of invertible elements in a ring with and identity is a multiplicative group. A.5.4. Show that the set of polynomials {P : P = j≥2 aj xj } is an ideal in F[x], and that {P : P = j≤7 aj xj } is an additive subgroup but not an ideal. A.6 POLYNOMIALS

Let F be a ﬁeld and F[x] the algebra of polynomials P = n aj xj in the 0 variable x with coefﬁcients from F. The degree of P , deg(P ), is the highest power of x appearing in P with non-zero coefﬁcient. If deg(P ) = n, then an xn is called the leading term of P , and an the leading coeﬃcient. A polynomial is called monic if its leading coefﬁcient is 1. A.6.1 D IVISION WITH REMAINDER . By deﬁnition, an ideal in a ring is principal if it consists of all the multiples of one of its elements, called a generator of the ideal. The ring F[x] shares with Z the property of being a principal ideal domain—every ideal is principal. The proof for F[x] is virtually the same as the one we had for Z, and is again based on division with remainder. —DRAFT—

JANUARY 1, 2006

148

L INEAR A LGEBRA

Theorem. Let P, F ∈ F[x]. There exist polynomials Q, R ∈ F[x] such that deg(R) < deg(F ), and (A.6.1) P = QF + R.

P ROOF : Write P = n aj xj and F = m bj xj with an = 0 and bm = 0, j=0 j=0 so that deg(P ) = n, deg(F ) = m. If n < m there is nothing to prove: P = 0 · F + P . If n ≥ m, we write qn−m = an /bm , and P1 = P − qn−m xn−m F , so that P = qn−m xn−m F + P1 with n1 = deg(P1 ) < n. If n1 < m we are done. If n1 ≥ m, write the leading term of P1 as a1,n1 xn1 , and set qn1 −m = a1,n1 /bm , and P2 = P1 − qn1 −m xn1 −m F . Now deg(P2 ) < deg(P1 ) and P = (qn−m xn−m + qn1 −m xn1 −m )F + P2 . Repeating the procedure a total of k times, k ≤ n − m + 1, we obtain P = QF + Pk with deg(Pk ) < m, and the statement follows with R = Pk . Corollary. Let I ⊂ F[x] be an ideal, and let P0 be an element of minimal degree in I. Then P0 is a generator for I. P ROOF : If P ∈ I, write P = QP0 + R, with deg(R) < deg(P0 ). Since R = P − QP0 ∈ I, and 0 is the only element of I whose degree is smaller than deg(P0 ), P = QP0 . The generator P0 is unique up to multiplication by a scalar. If P1 is another generator, each of the two divides the other and since the degree has to be the same the quotients are scalars. It follows that if we normalize P0 by requiring that it be monic, that is with leading coefﬁcient 1, it is unique and we refer to it as the generator. A.6.2 Given polynomials Pj , j = 1, . . . , l any ideal that contains them all must contain all the polynomials P = qj Pj with arbitrary polynomial coefﬁcients qj . On the other hand the set of all theses sums is clearly an ideal in F[x]. It follows that the ideal generated by {Pj } is equal to the set of polynomials of the form P = qj Pj with polynomial coefﬁcients qj . The generator G of this ideal divides every one of the Pj ’s, and, since G can be expressed as qj Pj , every common factor of all the Pj ’s divides G. In

JANUARY 1, 2006

—DRAFT—

A PPENDIX

149

other words, G = gcd{P1 , . . . , Pl }, the greatest common divisor of {Pj }. This implies Theorem. Given polynomials Pj , j = 1, . . . , l there exist polynomials qj such that gcd{P1 , . . . , Pl } = qj Pj . In particular: Corollary. If P1 and P2 are relatively prime, there exist polynomials q1 , q2 such that P1 q1 + P2 q2 = 1. A.6.3 FACTORIZATION . A polynomial P in F[x] is irreducible or prime if it has no proper factors, that is, if every factor of P is either scalar multiple of P or a scalar. Lemma. If gcd(P, P1 ) = 1 and P P1 P2 , then P P2 . P ROOF : There exist q, q1 such that qP + q1 P1 = 1. Then the left-hand side of qP P2 + q1 P1 P2 = P2 is divisible by P , and hence so is P2 . Theorem (Prime power factorization). Every P ∈ F[x] admits a factorization m P = Φj j , where each factor Φj is irreducible in F[x], and they are all distinct. The factorization is unique up to the order in which the factors are enumerated, and up to multiplication by non-zero scalars.

A.6.4 T HE FUNDAMENTAL THEOREM OF ALGEBRA . A ﬁeld F is algebraically closed if it has the property that every P ∈ F[x] has roots in F, that is elements λ ∈ F such that P (λ) = 0. The so-called fundamental theorem of algebra states that C is algebraically closed. Theorem. Given a non-constant polynomial P with complex coefﬁcients, there exist complex numbers λ such that P (λ) = 0. —DRAFT—

JANUARY 1, 2006

150

L INEAR A LGEBRA

A.6.5 We now observe that P (λ) = 0 is equivalent to the statement that (z − λ) divides P . By Theorem A.6.1, P (z) = (z − λ)Q(z) + R with deg R smaller than deg (z − λ) = 1, so that R is a constant. Evaluating P (z) = (z−λ)Q(z)+R at z = λ shows that R = P (λ), hence the claimed equivalence. It follows that a non-constant polynomial P ∈ C[z] is prime if and only if it is linear. Theorem. Let P ∈ C[z] be a polynomial of degree n. There exist complex numbers λ1 , . . . , λn , (not necessarily distinct), and a = 0 (the leading coefﬁcient of P ), such that

n

(A.6.2)

P (z) = a

1

(z − λj ).

The theorem and its proof apply verbatim to polynomials over any algebraically closed ﬁeld. A.6.6 FACTORIZATION IN R[x]. The factorization (A.6.2) applies, of course, to polynomials with real coefﬁcients, but the roots need not be real. The basic example is P (x) = x2 + 1 with the roots ±i. We observe that for polynomials P whose coefﬁcients are all real, we have ¯ = P (λ), which means in particular that if λ is a root of P then so is λ. ¯ P (λ) A second observation is that (A.6.3) ¯ (x − λ)(x − λ) = x2 − 2x λ + |λ|2

has real coefﬁcients. Combining these observations with (A.6.2) we obtain that the prime factors in R[x] are the linear polynomials and the quadratic of the form (A.6.3) where λ ∈ R. Theorem. Let P ∈ R[x] be a polynomial of degree n. P admits a factorization (A.6.4) P (z) = a (x − λj ) Qj (x),

where a is the leading coefﬁcient, {λj } is the set of real zeros of P and Qj are irreducible quadratic polynomials of the form (A.6.3) corresponding to (pairs of conjugate) non-real roots of P .

JANUARY 1, 2006

—DRAFT—

A PPENDIX

151

Either product may be empty, in which case it is interpreted as 1. As mentioned above, the factors appearing in (A.6.4) need not be distinct— the same factor may be repeated several times. We can rewrite the product as (A.6.5) P (z) = a (x − λj )lj Qj j (x),

k

with λj and Qj now distinct, and the exponents lj resp. kj their multiplicities. k The factors (x − λj )lj and Qj j (x) appearing in (A.6.5) are pairwise relatively prime.

—DRAFT—

JANUARY 1, 2006

152

L INEAR A LGEBRA

JANUARY 1, 2006

—DRAFT—

Index

Adjoint matrix of, 103 of an operator, 103 Afﬁne subspace, 21 Algebra, 145 Alternating n-form, 56 Annihilator, 45 Basic partition, 125 Basis, 10 dual, 44 standard, 11 Bilinear map, 54 Bilinear form, 44, 47 Canonical prime-power decomposition, 78 Cauchy–Schwarz, 96 Characteristic polynomial of a matrix, 62 of an operator, 60 Codimension, 12 Complement, 6 Coset, 141 Cycle, 51 Cyclic decomposition, 86 system, 70 vector, 70 Decomposition cyclic, 86

Determinant of a matrix, 61 of an operator, 58 Diagonal sum, 76, 80 Dimension, 11, 12 Direct sum formal, 5 of subspaces, 6 Eigenspace, 60, 66 generalized, 79, 90 Eigenvalue, 49, 60, 66 Eigenvector, 49, 60, 66 Elementary divisor, 88 Equivalence relation, 137 Euclidean space, 95 Factorization in R[x], 150 prime-power, 78, 79, 149 Field, 1, 144 Flag, 68 Frobenius, 125 Gaussian elimination, 16, 18 Group, 1, 139 Hadamard’s inequality, 101 Hamilton-Cayley, 71 Hermitian form, 95 quadratic form, 102

153

154

L INEAR A LGEBRA

Ideal, 146 Idempotent, 101 Independent subspaces, 5 vectors, 9 Inertia, law of, 118 Inner-product, 95 Irreducible polynomial, 149 system, 75 Isomorphism, 3 Jordan canonical form, 88, 90 Kernel, 35 Ladder, 68 Linear system, 65 Linear equations homogeneous, 14 non-homogeneous, 14 Markov chain, 127 reversible, 128 Matrix orthogonal, 104 unitary , 104 augmented, 16 companion, 72 diagonal, 5 Hermitian, 103 nonnegative, 121 permutation, 30 positive, 118 self-adjoint, 103 stochastic, 126 strongly transitive, 125 transitive, 122 JANUARY 1, 2006

triangular, 5, 63, 68 Minimal system, 74 Minimal polynomial, 72 for (T,v), 70 Minmax principle, 108 Monic polynomial, 147 Multilinear form, 54 map, 53 Nilpotent, 83 Nilspace, 79, 90 Nonsingular, 37 Norm, 39 Normal operator, 109 Nullity, 35 Nullspace, 35 Operator nonnegative, 111 normal, 109 orthogonal, 104 positive, 111 self-adjoint, 105 unitary, 104 Orientation, 59 Orthogonal operator, 104 projection, 99 vectors, 97 Orthogonal equivalence, 105 Orthonormal, 98 Period group, 125 Permutation, 51, 138 Perron, 118 Polarization, 101

—DRAFT—

I NDEX

155

Primary components, 78 Probability vector, 127 Projection along a subspace, 24 orthogonal, 99 Quadratic form positive, 117 Quadratic forms, 115 Quotient space, 6 Range, 35 Rank column, 20 of a matrix, 21 of an operator, 35 row, 17 Reducing subspace, 75 Regular representation, 133 Ring, 145 Row echelon form, 18 Row equivalence, 17 Schur’s lemma, 74 Self-adjoint algebra, 110 matrix, 108 operator, 105 Semisimple, 81 k-shift , 84 Similar, 35 Similarity, 34 Solution-set, 4 Span, 5, 9 Spectral mapping theorem, 66 Spectral norm, 118 Spectral Theorems, 106–110 Spectrum, 60, 66, 79 Steinitz’ lemma, 11

Symmetric group, 51 Tensor product, 7 Trace, 62 Transition matrix, 127 Transposition, 51 Unitary operator, 104 space, 95 Unitary equivalence, 105 Vandermonde, 64 Vector space, 1 complex, 1 real, 1

—DRAFT—

JANUARY 1, 2006

Symbols

C, 1 Q, 1 R, 1

χT , 60

Cv , 25 Cw,v , 33 dim V, 11 {e1 , . . . , en }, 10 Fn , 2 F[x], 2 GL(H) , 131 GL(V) , 27 height[v], 83 M(n; F), 2 M(n, m; F), 2 minPT , 72 minPT,v , 69 O(n), 104 P(T ), 27, 82 Sn , 51 span[E], 5 span[T, v], 66 sp , 118 TW , 68 U (n), 104

156