### Elementary linear algebra (English)

ELEMENTARY LINEAR ALGEBRA

K. R. MATTHEWS

DEPARTMENT OF MATHEMATICS

UNIVERSITY OF QUEENSLAND

Second Online Version, December 1998

Comments to the author at krm@maths.uq.edu.au

Contents

1 LINEAR EQUATIONS
1.1 Introduction to linear equations
1.2 Solving linear equations
1.3 The Gauss–Jordan algorithm
1.4 Systematic solution of linear systems
1.5 Homogeneous systems
1.6 PROBLEMS

2 MATRICES
2.1 Matrix arithmetic
2.2 Linear transformations
2.3 Recurrence relations
2.4 PROBLEMS
2.5 Non–singular matrices
2.6 Least squares solution of equations
2.7 PROBLEMS

3 SUBSPACES
3.1 Introduction
3.2 Subspaces of $F^n$
3.3 Linear dependence
3.4 Basis of a subspace
3.5 Rank and nullity of a matrix
3.6 PROBLEMS

4 DETERMINANTS
4.1 PROBLEMS

5 COMPLEX NUMBERS
5.1 Constructing the complex numbers
5.2 Calculating with complex numbers
5.3 Geometric representation of C
5.4 Complex conjugate
5.5 Modulus of a complex number
5.6 Argument of a complex number
5.7 De Moivre's theorem
5.8 PROBLEMS

6 EIGENVALUES AND EIGENVECTORS
6.1 Motivation
6.2 Definitions and examples
6.3 PROBLEMS

7 Identifying second degree equations
7.1 The eigenvalue method
7.2 A classification algorithm
7.3 PROBLEMS

8 THREE–DIMENSIONAL GEOMETRY
8.1 Introduction
8.2 Three–dimensional space
8.3 Dot product
8.4 Lines
8.5 The angle between two vectors
8.6 The cross–product of two vectors
8.7 Planes
8.8 PROBLEMS

9 FURTHER READING

List of Figures

1.1 Gauss–Jordan algorithm
2.1 Reflection in a line
2.2 Projection on a line
4.1 Area of triangle OPQ
5.1 Complex addition and subtraction
5.2 Complex conjugate
5.3 Modulus of a complex number
5.4 Apollonius circles
5.5 Argument of a complex number
5.6 Argument examples
5.7 The nth roots of unity
5.8 The roots of $z^n = a$
6.1 Rotating the axes
7.1 An ellipse example
7.2 ellipse: standard form
7.3 hyperbola: standard forms
7.4 parabola: standard forms (i) and (ii)
7.5 parabola: standard forms (iii) and (iv)
7.6 1st parabola example
7.7 2nd parabola example
8.1 Equality and addition of vectors
8.2 Scalar multiplication of vectors
8.3 Representation of three–dimensional space
8.4 The vector AB
8.5 The negative of a vector
8.6 (a) Equality of vectors; (b) Addition and subtraction of vectors
8.7 Position vector as a linear combination of i, j and k
8.8 Representation of a line
8.9 The line AB
8.10 The cosine rule for a triangle
8.11 Pythagoras' theorem for a right–angled triangle
8.12 Distance from a point to a line
8.13 Projecting a segment onto a line
8.14 The vector cross–product
8.15 Vector equation for the plane ABC
8.16 Normal equation of the plane ABC
8.17 The plane $ax + by + cz = d$
8.18 Line of intersection of two planes
8.19 Distance from a point to the plane $ax + by + cz = d$

Chapter 1

LINEAR EQUATIONS

1.1 Introduction to linear equations

A linear equation in $n$ unknowns $x_1, x_2, \ldots, x_n$ is an equation of the form

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b,$$

where $a_1, a_2, \ldots, a_n, b$ are given real numbers.

For example, with $x$ and $y$ instead of $x_1$ and $x_2$, the linear equation $2x + 3y = 6$ describes the line passing through the points $(3, 0)$ and $(0, 2)$. Similarly, with $x$, $y$ and $z$ instead of $x_1$, $x_2$ and $x_3$, the linear equation $2x + 3y + 4z = 12$ describes the plane passing through the points $(6, 0, 0)$, $(0, 4, 0)$, $(0, 0, 3)$.

A system of $m$ linear equations in $n$ unknowns $x_1, x_2, \ldots, x_n$ is a family of linear equations

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m.
\end{aligned}$$

We wish to determine if such a system has a solution, that is, to find out if there exist numbers $x_1, x_2, \ldots, x_n$ which satisfy each of the equations simultaneously. We say that the system is consistent if it has a solution. Otherwise the system is called inconsistent.

Note that the above system can be written concisely as

$$\sum_{j=1}^{n} a_{ij}x_j = b_i, \qquad i = 1, 2, \ldots, m.$$

The matrix

$$\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}$$

is called the coefficient matrix of the system, while the matrix

$$\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & & & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{bmatrix}$$

is called the augmented matrix of the system.

Geometrically, solving a system of linear equations in two (or three) unknowns is equivalent to determining whether or not a family of lines (or planes) has a common point of intersection.

EXAMPLE 1.1.1 Solve the equation $2x + 3y = 6$.

Solution. The equation $2x + 3y = 6$ is equivalent to $2x = 6 - 3y$ or $x = 3 - \frac{3}{2}y$, where $y$ is arbitrary. So there are infinitely many solutions.

EXAMPLE 1.1.2 Solve the system

$$\begin{aligned}
x + y + z &= 1 \\
x - y + z &= 0.
\end{aligned}$$

Solution. We subtract the second equation from the first, to get $2y = 1$ and $y = \frac{1}{2}$. Then $x = y - z = \frac{1}{2} - z$, where $z$ is arbitrary. Again there are infinitely many solutions.

EXAMPLE 1.1.3 Find a polynomial of the form $y = a_0 + a_1x + a_2x^2 + a_3x^3$ which passes through the points $(-3, -2)$, $(-1, 2)$, $(1, 5)$, $(2, 1)$.

Solution. When $x$ has the values $-3, -1, 1, 2$, then $y$ takes corresponding values $-2, 2, 5, 1$ and we get four equations in the unknowns $a_0, a_1, a_2, a_3$:

$$\begin{aligned}
a_0 - 3a_1 + 9a_2 - 27a_3 &= -2 \\
a_0 - a_1 + a_2 - a_3 &= 2 \\
a_0 + a_1 + a_2 + a_3 &= 5 \\
a_0 + 2a_1 + 4a_2 + 8a_3 &= 1.
\end{aligned}$$

This system has the unique solution $a_0 = 93/20$, $a_1 = 221/120$, $a_2 = -23/20$, $a_3 = -41/120$. So the required polynomial is

$$y = \frac{93}{20} + \frac{221}{120}x - \frac{23}{20}x^2 - \frac{41}{120}x^3.$$
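The coefficients in Example 1.1.3 can be checked by machine. The following is a minimal sketch, not part of the original text: the helper name `solve_exact` is hypothetical, and it assumes the system is square with a unique solution. It uses Python's exact `Fraction` type so that no rounding occurs.

```python
from fractions import Fraction

def solve_exact(rows):
    """Reduce a square augmented matrix [A | b]; assumes a unique solution exists."""
    m = [[Fraction(v) for v in row] for row in rows]
    n = len(m)
    for col in range(n):
        # pick a row at or below the diagonal with a non-zero entry in this column
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # scale the pivot row so that the pivot becomes 1
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # clear every other entry of the column
        for r in range(n):
            if r != col and m[r][col] != 0:
                t = m[r][col]
                m[r] = [a - t * b for a, b in zip(m[r], m[col])]
    return [row[-1] for row in m]

# one equation a0 + a1*x + a2*x^2 + a3*x^3 = y per data point
points = [(-3, -2), (-1, 2), (1, 5), (2, 1)]
augmented = [[x**k for k in range(4)] + [y] for x, y in points]
coeffs = solve_exact(augmented)
print(coeffs)
```

Substituting the four points back into the resulting polynomial is a quick independent check of the answer.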

In [26, pages 33–35] there are examples of systems of linear equations which arise from simple electrical networks using Kirchhoff's laws for electrical circuits.

Solving a system consisting of a single linear equation is easy. However, if we are dealing with two or more equations, it is desirable to have a systematic method of determining if the system is consistent and to find all solutions.

Instead of restricting ourselves to linear equations with rational or real coefficients, our theory goes over to the more general case where the coefficients belong to an arbitrary field. A field $F$ is a set which possesses operations of addition and multiplication which satisfy the familiar rules of rational arithmetic. There are ten basic properties that a field must have:

THE FIELD AXIOMS.

1. $(a + b) + c = a + (b + c)$ for all $a, b, c$ in $F$;
2. $(ab)c = a(bc)$ for all $a, b, c$ in $F$;
3. $a + b = b + a$ for all $a, b$ in $F$;
4. $ab = ba$ for all $a, b$ in $F$;
5. there exists an element $0$ in $F$ such that $0 + a = a$ for all $a$ in $F$;
6. there exists an element $1$ in $F$ such that $1a = a$ for all $a$ in $F$;
7. to every $a$ in $F$, there corresponds an additive inverse $-a$ in $F$, satisfying $a + (-a) = 0$;
8. to every non–zero $a$ in $F$, there corresponds a multiplicative inverse $a^{-1}$ in $F$, satisfying $aa^{-1} = 1$;
9. $a(b + c) = ab + ac$ for all $a, b, c$ in $F$;
10. $0 \neq 1$.

With standard definitions such as $a - b = a + (-b)$ and $\frac{a}{b} = ab^{-1}$ for $b \neq 0$, we have the following familiar rules:

$$\begin{aligned}
-(a + b) &= (-a) + (-b), & (ab)^{-1} &= a^{-1}b^{-1}; \\
-(-a) &= a, & (a^{-1})^{-1} &= a; \\
-(a - b) &= b - a, & \left(\frac{a}{b}\right)^{-1} &= \frac{b}{a}; \\
\frac{a}{b} + \frac{c}{d} &= \frac{ad + bc}{bd}; \\
\frac{a}{b}\cdot\frac{c}{d} &= \frac{ac}{bd}; \\
\frac{ab}{ac} &= \frac{b}{c}, & \frac{a}{\left(\frac{b}{c}\right)} &= \frac{ac}{b}; \\
-(ab) &= (-a)b = a(-b); \\
-\left(\frac{a}{b}\right) &= \frac{-a}{b} = \frac{a}{-b}; \\
0a &= 0; \\
(-a)^{-1} &= -(a^{-1}).
\end{aligned}$$

Fields which have only finitely many elements are of great interest in many parts of mathematics and its applications, for example to coding theory. It is easy to construct fields containing exactly $p$ elements, where $p$ is a prime number.

First we must explain the idea of modular addition and modular multiplication. If $a$ is an integer, we define $a \pmod{p}$ to be the least remainder on dividing $a$ by $p$. That is, if $a = bp + r$, where $b$ and $r$ are integers and $0 \le r < p$, then $a \pmod{p} = r$.

For example, $-1 \pmod{2} = 1$, $3 \pmod{3} = 0$, $5 \pmod{3} = 2$.

Then addition and multiplication mod $p$ are defined by

$$\begin{aligned}
a \oplus b &= (a + b) \pmod{p} \\
a \otimes b &= (ab) \pmod{p}.
\end{aligned}$$

For example, with $p = 7$, we have $3 \oplus 4 = 7 \pmod{7} = 0$ and $3 \otimes 5 = 15 \pmod{7} = 1$. Here are the complete addition and multiplication tables mod 7:

$$\begin{array}{c|ccccccc}
\oplus & 0 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
0 & 0 & 1 & 2 & 3 & 4 & 5 & 6 \\
1 & 1 & 2 & 3 & 4 & 5 & 6 & 0 \\
2 & 2 & 3 & 4 & 5 & 6 & 0 & 1 \\
3 & 3 & 4 & 5 & 6 & 0 & 1 & 2 \\
4 & 4 & 5 & 6 & 0 & 1 & 2 & 3 \\
5 & 5 & 6 & 0 & 1 & 2 & 3 & 4 \\
6 & 6 & 0 & 1 & 2 & 3 & 4 & 5
\end{array}
\qquad
\begin{array}{c|ccccccc}
\otimes & 0 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 2 & 3 & 4 & 5 & 6 \\
2 & 0 & 2 & 4 & 6 & 1 & 3 & 5 \\
3 & 0 & 3 & 6 & 2 & 5 & 1 & 4 \\
4 & 0 & 4 & 1 & 5 & 2 & 6 & 3 \\
5 & 0 & 5 & 3 & 1 & 6 & 4 & 2 \\
6 & 0 & 6 & 5 & 4 & 3 & 2 & 1
\end{array}$$

If we now let $Z_p = \{0, 1, \ldots, p - 1\}$, then it can be proved that $Z_p$ forms a field under the operations of modular addition and multiplication mod $p$. For example, the additive inverse of 3 in $Z_7$ is 4, so we write $-3 = 4$ when calculating in $Z_7$. Also the multiplicative inverse of 3 in $Z_7$ is 5, so we write $3^{-1} = 5$ when calculating in $Z_7$.

In practice, we write $a \oplus b$ and $a \otimes b$ as $a + b$ and $ab$ or $a \times b$ when dealing with linear equations over $Z_p$.

The simplest field is $Z_2$, which consists of two elements 0, 1 with addition satisfying $1 + 1 = 0$. So in $Z_2$, $-1 = 1$ and the arithmetic involved in solving equations over $Z_2$ is very simple.

EXAMPLE 1.1.4 Solve the following system over $Z_2$:

$$\begin{aligned}
x + y + z &= 0 \\
x + z &= 1.
\end{aligned}$$

Solution. We add the first equation to the second to get $y = 1$. Then $x = 1 - z = 1 + z$, with $z$ arbitrary. Hence the solutions are $(x, y, z) = (1, 1, 0)$ and $(0, 1, 1)$.

We use Q and R to denote the fields of rational and real numbers, respectively. Unless otherwise stated, the field used will be Q.
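The arithmetic of $Z_p$ is easy to experiment with. The sketch below is illustrative only, and the function names are hypothetical. It rebuilds the mod 7 tables, finds inverses by brute-force search, and lists the solutions of Example 1.1.4 by trying every triple in $Z_2$.

```python
p = 7  # any prime works here

def add(a, b):
    return (a + b) % p

def mul(a, b):
    return (a * b) % p

def additive_inverse(a):
    # Python's % always returns a remainder in 0..p-1, matching the text's convention
    return (-a) % p

def multiplicative_inverse(a):
    # brute-force search, adequate for small p; a must be non-zero
    return next(b for b in range(1, p) if mul(a, b) == 1)

addition_table = [[add(a, b) for b in range(p)] for a in range(p)]
multiplication_table = [[mul(a, b) for b in range(p)] for a in range(p)]

# Example 1.1.4: x + y + z = 0 and x + z = 1 over Z_2, by exhaustive search
solutions = [
    (x, y, z)
    for x in (0, 1) for y in (0, 1) for z in (0, 1)
    if (x + y + z) % 2 == 0 and (x + z) % 2 == 1
]
print(additive_inverse(3), multiplicative_inverse(3), solutions)
```

Exhaustive search is only sensible for very small fields; the elimination method of the following sections is the systematic alternative.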

1.2 Solving linear equations

We show how to solve any system of linear equations over an arbitrary field, using the GAUSS–JORDAN algorithm. We first need to define some terms.

DEFINITION 1.2.1 (Row–echelon form) A matrix is in row–echelon form if

(i) all zero rows (if any) are at the bottom of the matrix and

(ii) if two successive rows are non–zero, the second row starts with more zeros than the first (moving from left to right).

For example, the matrix

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

is in row–echelon form, whereas the matrix

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

is not in row–echelon form.

The zero matrix of any size is always in row–echelon form.

DEFINITION 1.2.2 (Reduced row–echelon form) A matrix is in reduced row–echelon form if

1. it is in row–echelon form,
2. the leading (leftmost non–zero) entry in each non–zero row is 1,
3. all other elements of the column in which the leading entry 1 occurs are zeros.

For example the matrices

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 0 & 1 & 2 & 0 & 0 & 2 \\ 0 & 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

are in reduced row–echelon form, whereas the matrices

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

are not in reduced row–echelon form, but are in row–echelon form.

The zero matrix of any size is always in reduced row–echelon form.

Notation. If a matrix is in reduced row–echelon form, it is useful to denote the column numbers in which the leading entries 1 occur, by $c_1, c_2, \ldots, c_r$, with the remaining column numbers being denoted by $c_{r+1}, \ldots, c_n$, where $r$ is the number of non–zero rows. For example, in the $4 \times 6$ matrix above, we have $r = 3$, $c_1 = 2$, $c_2 = 4$, $c_3 = 5$, $c_4 = 1$, $c_5 = 3$, $c_6 = 6$.

The following operations are the ones used on systems of linear equations and do not change the solutions.

DEFINITION 1.2.3 (Elementary row operations) There are three types of elementary row operations that can be performed on matrices:

1. Interchanging two rows: $R_i \leftrightarrow R_j$ interchanges rows $i$ and $j$.
2. Multiplying a row by a non–zero scalar: $R_i \to tR_i$ multiplies row $i$ by the non–zero scalar $t$.
3. Adding a multiple of one row to another row: $R_j \to R_j + tR_i$ adds $t$ times row $i$ to row $j$.

DEFINITION 1.2.4 (Row equivalence) Matrix $A$ is row–equivalent to matrix $B$ if $B$ is obtained from $A$ by a sequence of elementary row operations.

EXAMPLE 1.2.1 Working from left to right,

$$A = \begin{bmatrix} 1 & 2 & 0 \\ 2 & 1 & 1 \\ 1 & -1 & 2 \end{bmatrix}
\xrightarrow{R_2 \to R_2 + 2R_3}
\begin{bmatrix} 1 & 2 & 0 \\ 4 & -1 & 5 \\ 1 & -1 & 2 \end{bmatrix}
\xrightarrow{R_2 \leftrightarrow R_3}
\begin{bmatrix} 1 & 2 & 0 \\ 1 & -1 & 2 \\ 4 & -1 & 5 \end{bmatrix}
\xrightarrow{R_1 \to 2R_1}
\begin{bmatrix} 2 & 4 & 0 \\ 1 & -1 & 2 \\ 4 & -1 & 5 \end{bmatrix} = B.$$

Thus $A$ is row–equivalent to $B$. Clearly $B$ is also row–equivalent to $A$, by performing the inverse row–operations $R_1 \to \frac{1}{2}R_1$, $R_2 \leftrightarrow R_3$, $R_2 \to R_2 - 2R_3$ on $B$.

It is not difficult to prove that if $A$ and $B$ are row–equivalent augmented matrices of two systems of linear equations, then the two systems have the same solution sets; that is, a solution of the one system is a solution of the other. For example the systems whose augmented matrices are $A$ and $B$ in the above example are respectively

$$\begin{aligned} x + 2y &= 0 \\ 2x + y &= 1 \\ x - y &= 2 \end{aligned}
\qquad\text{and}\qquad
\begin{aligned} 2x + 4y &= 0 \\ x - y &= 2 \\ 4x - y &= 5 \end{aligned}$$

and these systems have precisely the same solutions.
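The three elementary row operations translate directly into code. The sketch below is an illustration rather than part of the original text: the function names are hypothetical, rows are numbered from 1 as in the text, and each function returns a new matrix. It replays Example 1.2.1 and then undoes it with the inverse operations.

```python
from fractions import Fraction

def interchange(m, i, j):
    """R_i <-> R_j."""
    out = [row[:] for row in m]
    out[i - 1], out[j - 1] = out[j - 1], out[i - 1]
    return out

def scale(m, i, t):
    """R_i -> t R_i, where t must be non-zero."""
    if t == 0:
        raise ValueError("the scalar must be non-zero")
    out = [row[:] for row in m]
    out[i - 1] = [t * v for v in out[i - 1]]
    return out

def add_multiple(m, j, t, i):
    """R_j -> R_j + t R_i."""
    out = [row[:] for row in m]
    out[j - 1] = [a + t * b for a, b in zip(m[j - 1], m[i - 1])]
    return out

A = [[1, 2, 0], [2, 1, 1], [1, -1, 2]]

# R2 -> R2 + 2R3, then R2 <-> R3, then R1 -> 2R1
B = scale(interchange(add_multiple(A, 2, 2, 3), 2, 3), 1, 2)

# the inverse operations, applied in the reverse order, recover A
A_again = add_multiple(interchange(scale(B, 1, Fraction(1, 2)), 2, 3), 2, -2, 3)
print(B, A_again == A)
```

Because every operation has an inverse of the same type, row equivalence is symmetric, which is what the round trip demonstrates.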

1.3 The Gauss–Jordan algorithm

We now describe the GAUSS–JORDAN ALGORITHM. This is a process which starts with a given matrix $A$ and produces a matrix $B$ in reduced row–echelon form, which is row–equivalent to $A$. If $A$ is the augmented matrix of a system of linear equations, then $B$ will be a much simpler matrix than $A$, from which the consistency or inconsistency of the corresponding system is immediately apparent and in fact the complete solution of the system can be read off.

STEP 1. Find the first non–zero column moving from left to right (column $c_1$) and select a non–zero entry from this column. By interchanging rows, if necessary, ensure that the first entry in this column is non–zero. Multiply row 1 by the multiplicative inverse of $a_{1c_1}$, thereby converting $a_{1c_1}$ to 1. For each non–zero element $a_{ic_1}$, $i > 1$ (if any) in column $c_1$, add $-a_{ic_1}$ times row 1 to row $i$, thereby ensuring that all elements in column $c_1$, apart from the first, are zero.

STEP 2. If the matrix obtained at Step 1 has its 2nd, ..., $m$th rows all zero, the matrix is in reduced row–echelon form. Otherwise suppose that the first column which has a non–zero element in the rows below the first is column $c_2$. Then $c_1 < c_2$. By interchanging rows below the first, if necessary, ensure that $a_{2c_2}$ is non–zero. Then convert $a_{2c_2}$ to 1 and by adding suitable multiples of row 2 to the remaining rows, where necessary, ensure that all remaining elements in column $c_2$ are zero.

The process is repeated and will eventually stop after $r$ steps, either because we run out of rows, or because we run out of non–zero columns. In general, the final matrix will be in reduced row–echelon form and will have $r$ non–zero rows, with leading entries 1 in columns $c_1, \ldots, c_r$, respectively.

EXAMPLE 1.3.1

$$\begin{aligned}
\begin{bmatrix} 0 & 0 & 4 & 0 \\ 2 & 2 & -2 & 5 \\ 5 & 5 & -1 & 5 \end{bmatrix}
&\xrightarrow{R_1 \leftrightarrow R_2}
\begin{bmatrix} 2 & 2 & -2 & 5 \\ 0 & 0 & 4 & 0 \\ 5 & 5 & -1 & 5 \end{bmatrix}
\xrightarrow{R_1 \to \frac{1}{2}R_1}
\begin{bmatrix} 1 & 1 & -1 & \frac{5}{2} \\ 0 & 0 & 4 & 0 \\ 5 & 5 & -1 & 5 \end{bmatrix} \\
&\xrightarrow{R_3 \to R_3 - 5R_1}
\begin{bmatrix} 1 & 1 & -1 & \frac{5}{2} \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 4 & -\frac{15}{2} \end{bmatrix}
\xrightarrow{R_2 \to \frac{1}{4}R_2}
\begin{bmatrix} 1 & 1 & -1 & \frac{5}{2} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 4 & -\frac{15}{2} \end{bmatrix} \\
&\xrightarrow{\substack{R_1 \to R_1 + R_2 \\ R_3 \to R_3 - 4R_2}}
\begin{bmatrix} 1 & 1 & 0 & \frac{5}{2} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -\frac{15}{2} \end{bmatrix}
\xrightarrow{R_3 \to -\frac{2}{15}R_3}
\begin{bmatrix} 1 & 1 & 0 & \frac{5}{2} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \\
&\xrightarrow{R_1 \to R_1 - \frac{5}{2}R_3}
\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\end{aligned}$$

The last matrix is in reduced row–echelon form.

REMARK 1.3.1 It is possible to show that a given matrix over an arbitrary field is row–equivalent to precisely one matrix which is in reduced row–echelon form.

A flow–chart for the Gauss–Jordan algorithm, based on [1, page 83], is presented in figure 1.1 below.
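The steps above can be sketched in a few lines of Python. This is one possible rendering under stated assumptions, not the author's program: the name `rref` is hypothetical, the entries are rationals held as exact `Fraction` values, and the pivot in each column is taken to be the first non–zero entry on or below the current row.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, 1-based columns of the leading 1s)."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    leading_cols = []
    i = 0  # index of the row that receives the next leading 1
    for j in range(n_cols):
        if i == n_rows:
            break  # we have run out of rows
        # first non-zero entry in column j, on or below row i (None if there is none)
        p = next((r for r in range(i, n_rows) if m[r][j] != 0), None)
        if p is None:
            continue  # nothing to do in this column
        m[i], m[p] = m[p], m[i]           # interchange rows
        lead = m[i][j]
        m[i] = [v / lead for v in m[i]]   # make the leading entry 1
        for q in range(n_rows):           # clear the rest of column j
            if q != i and m[q][j] != 0:
                t = m[q][j]
                m[q] = [a - t * b for a, b in zip(m[q], m[i])]
        leading_cols.append(j + 1)
        i += 1
    return m, leading_cols

# the matrix of Example 1.3.1
reduced, cols = rref([[0, 0, 4, 0], [2, 2, -2, 5], [5, 5, -1, 5]])
print(reduced, cols)
```

The order of operations differs slightly from the hand computation in Example 1.3.1, but by Remark 1.3.1 the reduced row–echelon form reached is the same.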

1.4 Systematic solution of linear systems.

Suppose a system of $m$ linear equations in $n$ unknowns $x_1, \ldots, x_n$ has augmented matrix $A$ and that $A$ is row–equivalent to a matrix $B$ which is in reduced row–echelon form, via the Gauss–Jordan algorithm. Then $A$ and $B$ are $m \times (n + 1)$. Suppose that $B$ has $r$ non–zero rows and that the leading entry 1 in row $i$ occurs in column number $c_i$, for $1 \le i \le r$. Then

$$1 \le c_1 < c_2 < \cdots < c_r \le n + 1.$$

[Figure 1.1: Gauss–Jordan algorithm (flow chart).]

Also assume that the remaining column numbers are $c_{r+1}, \ldots, c_{n+1}$, where

$$1 \le c_{r+1} < c_{r+2} < \cdots < c_n \le n + 1.$$

Case 1: $c_r = n + 1$. The system is inconsistent. For the last non–zero row of $B$ is $[0, 0, \ldots, 1]$ and the corresponding equation is

$$0x_1 + 0x_2 + \cdots + 0x_n = 1,$$

which has no solutions. Consequently the original system has no solutions.

Case 2: $c_r \le n$. The system of equations corresponding to the non–zero rows of $B$ is consistent. First notice that $r \le n$ here.

If $r = n$, then $c_1 = 1, c_2 = 2, \ldots, c_n = n$ and

$$B = \begin{bmatrix}
1 & 0 & \cdots & 0 & d_1 \\
0 & 1 & \cdots & 0 & d_2 \\
\vdots & & & \vdots & \vdots \\
0 & 0 & \cdots & 1 & d_n \\
0 & 0 & \cdots & 0 & 0 \\
\vdots & & & \vdots & \vdots \\
0 & 0 & \cdots & 0 & 0
\end{bmatrix}.$$

There is a unique solution $x_1 = d_1, x_2 = d_2, \ldots, x_n = d_n$.

If $r < n$, there will be more than one solution (infinitely many if the field is infinite). Indeed, all solutions are obtained by taking the unknowns $x_{c_1}, \ldots, x_{c_r}$ as dependent unknowns and using the $r$ equations corresponding to the non–zero rows of $B$ to express these unknowns in terms of the remaining independent unknowns $x_{c_{r+1}}, \ldots, x_{c_n}$, which can take on arbitrary values:

$$\begin{aligned}
x_{c_1} &= b_{1\,n+1} - b_{1c_{r+1}}x_{c_{r+1}} - \cdots - b_{1c_n}x_{c_n} \\
&\;\;\vdots \\
x_{c_r} &= b_{r\,n+1} - b_{rc_{r+1}}x_{c_{r+1}} - \cdots - b_{rc_n}x_{c_n}.
\end{aligned}$$

In particular, taking $x_{c_{r+1}} = 0, \ldots, x_{c_{n-1}} = 0$ and $x_{c_n} = 0, 1$ respectively, produces at least two solutions.

EXAMPLE 1.4.1 Solve the system

$$\begin{aligned}
x + y &= 0 \\
x - y &= 1 \\
4x + 2y &= 1.
\end{aligned}$$

Solution. The augmented matrix of the system is

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 1 \\ 4 & 2 & 1 \end{bmatrix}$$

which is row equivalent to

$$B = \begin{bmatrix} 1 & 0 & \frac{1}{2} \\ 0 & 1 & -\frac{1}{2} \\ 0 & 0 & 0 \end{bmatrix}.$$

We read off the unique solution $x = \frac{1}{2}$, $y = -\frac{1}{2}$.

(Here $n = 2$, $r = 2$, $c_1 = 1$, $c_2 = 2$. Also $c_r = c_2 = 2 < 3 = n + 1$ and $r = n$.)

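EXAMPLE 1.4.2 Solve the system

$$\begin{aligned}
2x_1 + 2x_2 - 2x_3 &= 5 \\
7x_1 + 7x_2 + x_3 &= 10 \\
5x_1 + 5x_2 - x_3 &= 5.
\end{aligned}$$

Solution. The augmented matrix is

$$A = \begin{bmatrix} 2 & 2 & -2 & 5 \\ 7 & 7 & 1 & 10 \\ 5 & 5 & -1 & 5 \end{bmatrix}$$

which is row equivalent to

$$B = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

We read off inconsistency for the original system.

(Here $n = 3$, $r = 3$, $c_1 = 1$, $c_2 = 3$. Also $c_r = c_3 = 4 = n + 1$.)

EXAMPLE 1.4.3 Solve the system

$$\begin{aligned}
x_1 - x_2 + x_3 &= 1 \\
x_1 + x_2 - x_3 &= 2.
\end{aligned}$$

Solution. The augmented matrix is

$$A = \begin{bmatrix} 1 & -1 & 1 & 1 \\ 1 & 1 & -1 & 2 \end{bmatrix}$$

which is row equivalent to

$$B = \begin{bmatrix} 1 & 0 & 0 & \frac{3}{2} \\ 0 & 1 & -1 & \frac{1}{2} \end{bmatrix}.$$

The complete solution is $x_1 = \frac{3}{2}$, $x_2 = \frac{1}{2} + x_3$, with $x_3$ arbitrary.

(Here $n = 3$, $r = 2$, $c_1 = 1$, $c_2 = 2$. Also $c_r = c_2 = 2 < 4 = n + 1$ and $r < n$.)

EXAMPLE 1.4.4 Solve the system

$$\begin{aligned}
6x_3 + 2x_4 - 4x_5 - 8x_6 &= 8 \\
3x_3 + x_4 - 2x_5 - 4x_6 &= 4 \\
2x_1 - 3x_2 + x_3 + 4x_4 - 7x_5 + x_6 &= 2 \\
6x_1 - 9x_2 + 11x_4 - 19x_5 + 3x_6 &= 1.
\end{aligned}$$

Solution. The augmented matrix is

$$A = \begin{bmatrix}
0 & 0 & 6 & 2 & -4 & -8 & 8 \\
0 & 0 & 3 & 1 & -2 & -4 & 4 \\
2 & -3 & 1 & 4 & -7 & 1 & 2 \\
6 & -9 & 0 & 11 & -19 & 3 & 1
\end{bmatrix}$$

which is row equivalent to

$$B = \begin{bmatrix}
1 & -\frac{3}{2} & 0 & \frac{11}{6} & -\frac{19}{6} & 0 & \frac{1}{24} \\
0 & 0 & 1 & \frac{1}{3} & -\frac{2}{3} & 0 & \frac{5}{3} \\
0 & 0 & 0 & 0 & 0 & 1 & \frac{1}{4} \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}.$$

The complete solution is

$$\begin{aligned}
x_1 &= \tfrac{1}{24} + \tfrac{3}{2}x_2 - \tfrac{11}{6}x_4 + \tfrac{19}{6}x_5, \\
x_3 &= \tfrac{5}{3} - \tfrac{1}{3}x_4 + \tfrac{2}{3}x_5, \\
x_6 &= \tfrac{1}{4},
\end{aligned}$$

with $x_2, x_4, x_5$ arbitrary.

(Here $n = 6$, $r = 3$, $c_1 = 1$, $c_2 = 3$, $c_3 = 6$; $c_r = c_3 = 6 < 7 = n + 1$; $r < n$.)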
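The case analysis of this section can be mirrored in code: reduce the augmented matrix, then inspect the columns $c_1, \ldots, c_r$ of the leading entries. The following sketch is illustrative only. The names `rref` and `classify` are hypothetical, and the field is assumed to be Q, so that "more than one solution" means infinitely many.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, 1-based columns of the leading 1s)."""
    m = [[Fraction(v) for v in row] for row in rows]
    leading, i = [], 0
    for j in range(len(m[0])):
        p = next((r for r in range(i, len(m)) if m[r][j] != 0), None)
        if p is None:
            continue
        m[i], m[p] = m[p], m[i]
        m[i] = [v / m[i][j] for v in m[i]]
        for q in range(len(m)):
            if q != i and m[q][j] != 0:
                t = m[q][j]
                m[q] = [a - t * b for a, b in zip(m[q], m[i])]
        leading.append(j + 1)
        i += 1
        if i == len(m):
            break
    return m, leading

def classify(augmented):
    """Return (verdict, columns of the independent unknowns) for a system over Q."""
    n = len(augmented[0]) - 1            # number of unknowns
    _, leading = rref(augmented)
    if leading and leading[-1] == n + 1:
        return "inconsistent", []        # Case 1: c_r = n + 1
    free = [j for j in range(1, n + 1) if j not in leading]
    return ("unique" if len(leading) == n else "infinitely many"), free

ex_141 = classify([[1, 1, 0], [1, -1, 1], [4, 2, 1]])
ex_142 = classify([[2, 2, -2, 5], [7, 7, 1, 10], [5, 5, -1, 5]])
ex_143 = classify([[1, -1, 1, 1], [1, 1, -1, 2]])
ex_144 = classify([[0, 0, 6, 2, -4, -8, 8], [0, 0, 3, 1, -2, -4, 4],
                   [2, -3, 1, 4, -7, 1, 2], [6, -9, 0, 11, -19, 3, 1]])
print(ex_141, ex_142, ex_143, ex_144)
```

The list of free columns returned for Example 1.4.4 corresponds to the arbitrary unknowns $x_2, x_4, x_5$ found by hand.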

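EXAMPLE 1.4.5 Find the rational number $t$ for which the following system is consistent and solve the system for this value of $t$.

$$\begin{aligned}
x + y &= 2 \\
x - y &= 0 \\
3x - y &= t.
\end{aligned}$$

Solution. The augmented matrix of the system is

$$A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & -1 & 0 \\ 3 & -1 & t \end{bmatrix}$$

which is row–equivalent to the simpler matrix

$$B = \begin{bmatrix} 1 & 1 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & t - 2 \end{bmatrix}.$$

Hence if $t \neq 2$ the system is inconsistent. If $t = 2$ the system is consistent and

$$B = \begin{bmatrix} 1 & 1 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}
\to \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}.$$

We read off the solution $x = 1$, $y = 1$.

EXAMPLE 1.4.6 For which rationals $a$ and $b$ does the following system have (i) no solution, (ii) a unique solution, (iii) infinitely many solutions?

$$\begin{aligned}
x - 2y + 3z &= 4 \\
2x - 3y + az &= 5 \\
3x - 4y + 5z &= b.
\end{aligned}$$

Solution. The augmented matrix of the system is

$$A = \begin{bmatrix} 1 & -2 & 3 & 4 \\ 2 & -3 & a & 5 \\ 3 & -4 & 5 & b \end{bmatrix}
\xrightarrow{\substack{R_2 \to R_2 - 2R_1 \\ R_3 \to R_3 - 3R_1}}
\begin{bmatrix} 1 & -2 & 3 & 4 \\ 0 & 1 & a - 6 & -3 \\ 0 & 2 & -4 & b - 12 \end{bmatrix}
\xrightarrow{R_3 \to R_3 - 2R_2}
\begin{bmatrix} 1 & -2 & 3 & 4 \\ 0 & 1 & a - 6 & -3 \\ 0 & 0 & -2a + 8 & b - 6 \end{bmatrix} = B.$$

Case 1. $a \neq 4$. Then $-2a + 8 \neq 0$ and we see that $B$ can be reduced to a matrix of the form

$$\begin{bmatrix} 1 & 0 & 0 & u \\ 0 & 1 & 0 & v \\ 0 & 0 & 1 & \frac{b - 6}{-2a + 8} \end{bmatrix}$$

and we have the unique solution $x = u$, $y = v$, $z = (b - 6)/(-2a + 8)$.

Case 2. $a = 4$. Then

$$B = \begin{bmatrix} 1 & -2 & 3 & 4 \\ 0 & 1 & -2 & -3 \\ 0 & 0 & 0 & b - 6 \end{bmatrix}.$$

If $b \neq 6$ we get no solution, whereas if $b = 6$ then

$$B = \begin{bmatrix} 1 & -2 & 3 & 4 \\ 0 & 1 & -2 & -3 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\xrightarrow{R_1 \to R_1 + 2R_2}
\begin{bmatrix} 1 & 0 & -1 & -2 \\ 0 & 1 & -2 & -3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$

We read off the complete solution $x = -2 + z$, $y = -3 + 2z$, with $z$ arbitrary.

EXAMPLE 1.4.7 Find the reduced row–echelon form of the following matrix over $Z_3$:

$$\begin{bmatrix} 2 & 1 & 2 & 1 \\ 2 & 2 & 1 & 0 \end{bmatrix}.$$

Hence solve the system

$$\begin{aligned}
2x + y + 2z &= 1 \\
2x + 2y + z &= 0
\end{aligned}$$

over $Z_3$.

Solution.

$$\begin{bmatrix} 2 & 1 & 2 & 1 \\ 2 & 2 & 1 & 0 \end{bmatrix}
\xrightarrow{R_2 \to R_2 - R_1}
\begin{bmatrix} 2 & 1 & 2 & 1 \\ 0 & 1 & -1 & -1 \end{bmatrix}
= \begin{bmatrix} 2 & 1 & 2 & 1 \\ 0 & 1 & 2 & 2 \end{bmatrix}
\xrightarrow{R_1 \to 2R_1}
\begin{bmatrix} 1 & 2 & 1 & 2 \\ 0 & 1 & 2 & 2 \end{bmatrix}
\xrightarrow{R_1 \to R_1 + R_2}
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 2 & 2 \end{bmatrix}.$$

The last matrix is in reduced row–echelon form.

To solve the system of equations whose augmented matrix is the given matrix over $Z_3$, we see from the reduced row–echelon form that $x = 1$ and $y = 2 - 2z = 2 + z$, where $z = 0, 1, 2$. Hence there are three solutions to the given system of linear equations: $(x, y, z) = (1, 2, 0)$, $(1, 0, 1)$ and $(1, 1, 2)$.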
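The same elimination works over $Z_p$ once division is replaced by multiplication by a modular inverse. The sketch below is an assumption-laden illustration, not the author's method: the name `rref_mod_p` is hypothetical, $p$ is assumed prime, and the inverse is computed as $a^{p-2} \bmod p$ (Fermat's little theorem). It reproduces Example 1.4.7 and confirms the solution set by exhaustive search.

```python
def rref_mod_p(rows, p):
    """Reduced row-echelon form over Z_p, for a prime p."""
    m = [[v % p for v in row] for row in rows]
    i = 0
    for j in range(len(m[0])):
        if i == len(m):
            break
        pivot = next((r for r in range(i, len(m)) if m[r][j] != 0), None)
        if pivot is None:
            continue
        m[i], m[pivot] = m[pivot], m[i]
        inv = pow(m[i][j], p - 2, p)      # multiplicative inverse, valid since p is prime
        m[i] = [(inv * v) % p for v in m[i]]
        for q in range(len(m)):
            if q != i and m[q][j] != 0:
                t = m[q][j]
                m[q] = [(a - t * b) % p for a, b in zip(m[q], m[i])]
        i += 1
    return m

reduced = rref_mod_p([[2, 1, 2, 1], [2, 2, 1, 0]], 3)

# exhaustive check of 2x + y + 2z = 1, 2x + 2y + z = 0 over Z_3
solutions = [
    (x, y, z)
    for x in range(3) for y in range(3) for z in range(3)
    if (2 * x + y + 2 * z) % 3 == 1 and (2 * x + 2 * y + z) % 3 == 0
]
print(reduced, solutions)
```

The row operations are applied in a different order from the hand computation, yet the reduced form agrees, as Remark 1.3.1 predicts.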

1.5 Homogeneous systems

A system of homogeneous linear equations is a system of the form

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0.
\end{aligned}$$

Such a system is always consistent as $x_1 = 0, \ldots, x_n = 0$ is a solution. This solution is called the trivial solution. Any other solution is called a non–trivial solution.

For example the homogeneous system

$$\begin{aligned}
x - y &= 0 \\
x + y &= 0
\end{aligned}$$

has only the trivial solution, whereas the homogeneous system

$$\begin{aligned}
x - y + z &= 0 \\
x + y + z &= 0
\end{aligned}$$

has the complete solution $x = -z$, $y = 0$, $z$ arbitrary. In particular, taking $z = 1$ gives the non–trivial solution $x = -1$, $y = 0$, $z = 1$.

There is a simple but fundamental theorem concerning homogeneous systems.

THEOREM 1.5.1 A homogeneous system of $m$ linear equations in $n$ unknowns always has a non–trivial solution if $m < n$.

Proof. Suppose that $m < n$ and that the coefficient matrix of the system is row–equivalent to $B$, a matrix in reduced row–echelon form. Let $r$ be the number of non–zero rows in $B$. Then $r \le m < n$ and hence $n - r > 0$ and so the number $n - r$ of arbitrary unknowns is in fact positive. Taking one of these unknowns to be 1 gives a non–trivial solution.

REMARK 1.5.1 Let two systems of homogeneous equations in $n$ unknowns have coefficient matrices $A$ and $B$, respectively. If each row of $B$ is a linear combination of the rows of $A$ (i.e. a sum of multiples of the rows of $A$) and each row of $A$ is a linear combination of the rows of $B$, then it is easy to prove that the two systems have identical solutions. The converse is true, but is not easy to prove. Similarly if $A$ and $B$ have the same reduced row–echelon form, apart from possibly zero rows, then the two systems have identical solutions and conversely.

There is a similar situation in the case of two systems of linear equations (not necessarily homogeneous), with the proviso that in the statement of the converse, the extra condition that both the systems are consistent is needed.
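The proof of Theorem 1.5.1 is constructive, and the construction can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, the field is Q, and the first independent unknown is set to 1 with all other independent unknowns set to 0.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, 0-based columns of the leading 1s)."""
    m = [[Fraction(v) for v in row] for row in rows]
    leading, i = [], 0
    for j in range(len(m[0])):
        if i == len(m):
            break
        p = next((r for r in range(i, len(m)) if m[r][j] != 0), None)
        if p is None:
            continue
        m[i], m[p] = m[p], m[i]
        m[i] = [v / m[i][j] for v in m[i]]
        for q in range(len(m)):
            if q != i and m[q][j] != 0:
                t = m[q][j]
                m[q] = [a - t * b for a, b in zip(m[q], m[i])]
        leading.append(j)
        i += 1
    return m, leading

def nontrivial_solution(coefficients):
    """A non-trivial solution of the homogeneous system, or None if only x = 0 works."""
    b, leading = rref(coefficients)
    n = len(coefficients[0])
    free = [j for j in range(n) if j not in leading]
    if not free:
        return None
    f = free[0]
    x = [Fraction(0)] * n
    x[f] = Fraction(1)                  # one arbitrary unknown is taken to be 1
    for i, c in enumerate(leading):
        x[c] = -b[i][f]                 # row i reads x_c + b[i][f] * x_f = 0
    return x

print(nontrivial_solution([[1, -1, 1], [1, 1, 1]]))   # two equations, three unknowns
print(nontrivial_solution([[1, -1], [1, 1]]))         # only the trivial solution
```

When $m < n$ the list of free columns is never empty, which is exactly the counting argument $n - r > 0$ in the proof.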

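1.6 PROBLEMS

1. Which of the following matrices of rationals is in reduced row–echelon form?

$$\text{(a)}\ \begin{bmatrix} 1 & 0 & 0 & 0 & -3 \\ 0 & 0 & 1 & 0 & 4 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix}
\quad \text{(b)}\ \begin{bmatrix} 0 & 1 & 0 & 0 & 5 \\ 0 & 0 & 1 & 0 & -4 \\ 0 & 0 & 0 & -1 & 3 \end{bmatrix}
\quad \text{(c)}\ \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & -2 \end{bmatrix}$$

$$\text{(d)}\ \begin{bmatrix} 0 & 1 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\quad \text{(e)}\ \begin{bmatrix} 1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\quad \text{(f)}\ \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

$$\text{(g)}\ \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 2 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

[Answers: (a), (e), (g)]

2. Find reduced row–echelon forms which are row–equivalent to the following matrices:

$$\text{(a)}\ \begin{bmatrix} 0 & 0 & 0 \\ 2 & 4 & 0 \end{bmatrix}
\quad \text{(b)}\ \begin{bmatrix} 0 & 1 & 3 \\ 1 & 2 & 4 \end{bmatrix}
\quad \text{(c)}\ \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
\quad \text{(d)}\ \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ -4 & 0 & 0 \end{bmatrix}.$$

[Answers:

$$\text{(a)}\ \begin{bmatrix} 1 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\quad \text{(b)}\ \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 3 \end{bmatrix}
\quad \text{(c)}\ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad \text{(d)}\ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.]$$

3. Solve the following systems of linear equations by reducing the augmented matrix to reduced row–echelon form:

(a)
$$\begin{aligned}
x + y + z &= 2 \\
2x + 3y - z &= 8 \\
x - y - z &= -8
\end{aligned}$$

(b)
$$\begin{aligned}
x_1 + x_2 - x_3 + 2x_4 &= 10 \\
3x_1 - x_2 + 7x_3 + 4x_4 &= 1 \\
-5x_1 + 3x_2 - 15x_3 - 6x_4 &= 9
\end{aligned}$$

(c)
$$\begin{aligned}
3x - y + 7z &= 0 \\
2x - y + 4z &= \tfrac{1}{2} \\
x - y + z &= 1 \\
6x - 4y + 10z &= 3
\end{aligned}$$

(d)
$$\begin{aligned}
2x_2 + 3x_3 - 4x_4 &= 1 \\
2x_3 + 3x_4 &= 4 \\
2x_1 + 2x_2 - 5x_3 + 2x_4 &= 4 \\
2x_1 - 6x_3 + 9x_4 &= 7
\end{aligned}$$

[Answers: (a) $x = -3$, $y = \frac{19}{4}$, $z = \frac{1}{4}$; (b) inconsistent; (c) $x = -\frac{1}{2} - 3z$, $y = -\frac{3}{2} - 2z$, with $z$ arbitrary; (d) $x_1 = \frac{19}{2} - 9x_4$, $x_2 = -\frac{5}{2} + \frac{17}{4}x_4$, $x_3 = 2 - \frac{3}{2}x_4$, with $x_4$ arbitrary.]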

4. Show that the following system is consistent if and only if $c = 2a - 3b$ and solve the system in this case.

$$\begin{aligned}
2x - y + 3z &= a \\
3x + y - 5z &= b \\
-5x - 5y + 21z &= c.
\end{aligned}$$

[Answer: $x = \frac{a + b}{5} + \frac{2}{5}z$, $y = \frac{-3a + 2b}{5} + \frac{19}{5}z$, with $z$ arbitrary.]

5. Find the value of $t$ for which the following system is consistent and solve the system for this value of $t$.

$$\begin{aligned}
x + y &= 1 \\
tx + y &= t \\
(1 + t)x + 2y &= 3.
\end{aligned}$$

[Answer: $t = 2$; $x = 1$, $y = 0$.]

6. Solve the homogeneous system

$$\begin{aligned}
-3x_1 + x_2 + x_3 + x_4 &= 0 \\
x_1 - 3x_2 + x_3 + x_4 &= 0 \\
x_1 + x_2 - 3x_3 + x_4 &= 0 \\
x_1 + x_2 + x_3 - 3x_4 &= 0.
\end{aligned}$$

[Answer: $x_1 = x_2 = x_3 = x_4$, with $x_4$ arbitrary.]

7. For which rational numbers $\lambda$ does the homogeneous system

$$\begin{aligned}
x + (\lambda - 3)y &= 0 \\
(\lambda - 3)x + y &= 0
\end{aligned}$$

have a non–trivial solution?

[Answer: $\lambda = 2, 4$.]

8. Solve the homogeneous system

$$\begin{aligned}
3x_1 + x_2 + x_3 + x_4 &= 0 \\
5x_1 - x_2 + x_3 - x_4 &= 0.
\end{aligned}$$

[Answer: $x_1 = -\frac{1}{4}x_3$, $x_2 = -\frac{1}{4}x_3 - x_4$, with $x_3$ and $x_4$ arbitrary.]

9. Let $A$ be the coefficient matrix of the following homogeneous system of $n$ equations in $n$ unknowns:

$$\begin{aligned}
(1 - n)x_1 + x_2 + \cdots + x_n &= 0 \\
x_1 + (1 - n)x_2 + \cdots + x_n &= 0 \\
\cdots &= 0 \\
x_1 + x_2 + \cdots + (1 - n)x_n &= 0.
\end{aligned}$$

Find the reduced row–echelon form of $A$ and hence, or otherwise, prove that the solution of the above system is $x_1 = x_2 = \cdots = x_n$, with $x_n$ arbitrary.

10. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be a matrix over a field $F$. Prove that $A$ is row–equivalent to $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ if $ad - bc \neq 0$, but is row–equivalent to a matrix whose second row is zero, if $ad - bc = 0$.

11. For which rational numbers $a$ does the following system have (i) no solutions (ii) exactly one solution (iii) infinitely many solutions?

$$\begin{aligned}
x + 2y - 3z &= 4 \\
3x - y + 5z &= 2 \\
4x + y + (a^2 - 14)z &= a + 2.
\end{aligned}$$

[Answer: $a = -4$, no solution; $a = 4$, infinitely many solutions; $a \neq \pm 4$, exactly one solution.]

12. Solve the following system of homogeneous equations over $Z_2$:

$$\begin{aligned}
x_1 + x_3 + x_5 &= 0 \\
x_2 + x_4 + x_5 &= 0 \\
x_1 + x_2 + x_3 + x_4 &= 0 \\
x_3 + x_4 &= 0.
\end{aligned}$$

[Answer: $x_1 = x_2 = x_4 + x_5$, $x_3 = x_4$, with $x_4$ and $x_5$ arbitrary elements of $Z_2$.]

13. Solve the following systems of linear equations over $Z_5$:

(a)
$$\begin{aligned}
2x + y + 3z &= 4 \\
4x + y + 4z &= 1 \\
3x + y + 2z &= 0
\end{aligned}$$

(b)
$$\begin{aligned}
2x + y + 3z &= 4 \\
4x + y + 4z &= 1 \\
x + y &= 3.
\end{aligned}$$

[Answer: (a) $x = 1$, $y = 2$, $z = 0$; (b) $x = 1 + 2z$, $y = 2 + 3z$, with $z$ an arbitrary element of $Z_5$.]

14. If $(\alpha_1, \ldots, \alpha_n)$ and $(\beta_1, \ldots, \beta_n)$ are solutions of a system of linear equations, prove that

$$((1 - t)\alpha_1 + t\beta_1, \ldots, (1 - t)\alpha_n + t\beta_n)$$

is also a solution.

15. If $(\alpha_1, \ldots, \alpha_n)$ is a solution of a system of linear equations, prove that the complete solution is given by $x_1 = \alpha_1 + y_1, \ldots, x_n = \alpha_n + y_n$, where $(y_1, \ldots, y_n)$ is the general solution of the associated homogeneous system.

16. Find the values of $a$ and $b$ for which the following system is consistent. Also find the complete solution when $a = b = 2$.

$$\begin{aligned}
x + y - z + w &= 1 \\
ax + y + z + w &= b \\
3x + 2y + aw &= 1 + a.
\end{aligned}$$

[Answer: $a \neq 2$ or $a = 2 = b$; $x = 1 - 2z$, $y = 3z - w$, with $z, w$ arbitrary.]

17. Let $F = \{0, 1, a, b\}$ be a field consisting of 4 elements.

(a) Determine the addition and multiplication tables of $F$. (Hint: Prove that the elements $1 + 0$, $1 + 1$, $1 + a$, $1 + b$ are distinct and deduce that $1 + 1 + 1 + 1 = 0$; then deduce that $1 + 1 = 0$.)

(b) A matrix $A$, whose elements belong to $F$, is defined by

$$A = \begin{bmatrix} 1 & a & b & a \\ a & b & b & 1 \\ 1 & 1 & 1 & a \end{bmatrix},$$

prove that the reduced row–echelon form of $A$ is given by the matrix

$$B = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & b \\ 0 & 0 & 1 & 1 \end{bmatrix}.$$

Chapter 2

MATRICES

2.1 Matrix arithmetic

A matrix over a field F is a rectangular array of elements from F. The symbol $M_{m\times n}(F)$ denotes the collection of all m × n matrices over F. Matrices will usually be denoted by capital letters and the equation $A = [a_{ij}]$ means that the element in the i–th row and j–th column of the matrix A equals $a_{ij}$. It is also occasionally convenient to write $a_{ij} = (A)_{ij}$. For the present, all matrices will have rational entries, unless otherwise stated.

EXAMPLE 2.1.1 The formula $a_{ij} = 1/(i + j)$ for 1 ≤ i ≤ 3, 1 ≤ j ≤ 4 defines a 3 × 4 matrix $A = [a_{ij}]$, namely
\[
A = \begin{bmatrix}
\frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5}\\
\frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6}\\
\frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7}
\end{bmatrix}.
\]

DEFINITION 2.1.1 (Equality of matrices) Matrices A and B are said to be equal if A and B have the same size and corresponding elements are equal; that is, A and B ∈ $M_{m\times n}(F)$ and $A = [a_{ij}]$, $B = [b_{ij}]$, with $a_{ij} = b_{ij}$ for 1 ≤ i ≤ m, 1 ≤ j ≤ n.

DEFINITION 2.1.2 (Addition of matrices) Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be of the same size. Then A + B is the matrix obtained by adding corresponding elements of A and B; that is
\[
A + B = [a_{ij}] + [b_{ij}] = [a_{ij} + b_{ij}].
\]


DEFINITION 2.1.3 (Scalar multiple of a matrix) Let $A = [a_{ij}]$ and t ∈ F (that is, t is a scalar). Then tA is the matrix obtained by multiplying all elements of A by t; that is
\[
tA = t[a_{ij}] = [ta_{ij}].
\]

DEFINITION 2.1.4 (Additive inverse of a matrix) Let $A = [a_{ij}]$. Then −A is the matrix obtained by replacing the elements of A by their additive inverses; that is
\[
-A = -[a_{ij}] = [-a_{ij}].
\]

DEFINITION 2.1.5 (Subtraction of matrices) Matrix subtraction is defined for two matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ of the same size, in the usual way; that is
\[
A - B = [a_{ij}] - [b_{ij}] = [a_{ij} - b_{ij}].
\]

DEFINITION 2.1.6 (The zero matrix) For each m, n the matrix in $M_{m\times n}(F)$, all of whose elements are zero, is called the zero matrix (of size m × n) and is denoted by the symbol 0.

The matrix operations of addition, scalar multiplication, additive inverse and subtraction satisfy the usual laws of arithmetic. (In what follows, s and t will be arbitrary scalars and A, B, C are matrices of the same size.)

1. (A + B) + C = A + (B + C);
2. A + B = B + A;
3. 0 + A = A;
4. A + (−A) = 0;
5. (s + t)A = sA + tA, (s − t)A = sA − tA;
6. t(A + B) = tA + tB, t(A − B) = tA − tB;
7. s(tA) = (st)A;
8. 1A = A, 0A = 0, (−1)A = −A;
9. tA = 0 ⇒ t = 0 or A = 0.

Other similar properties will be used when needed.


DEFINITION 2.1.7 (Matrix product) Let $A = [a_{ij}]$ be a matrix of size m × n and $B = [b_{jk}]$ be a matrix of size n × p (that is, the number of columns of A equals the number of rows of B). Then AB is the m × p matrix $C = [c_{ik}]$ whose (i, k)–th element is defined by the formula
\[
c_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk} = a_{i1} b_{1k} + \cdots + a_{in} b_{nk}.
\]

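EXAMPLE 2.1.2

1. $\begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6\\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 1\times 5 + 2\times 7 & 1\times 6 + 2\times 8\\ 3\times 5 + 4\times 7 & 3\times 6 + 4\times 8 \end{bmatrix} = \begin{bmatrix} 19 & 22\\ 43 & 50 \end{bmatrix}$;

2. $\begin{bmatrix} 5 & 6\\ 7 & 8 \end{bmatrix} \begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 23 & 34\\ 31 & 46 \end{bmatrix} \neq \begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6\\ 7 & 8 \end{bmatrix}$;

3. $\begin{bmatrix} 1\\ 2 \end{bmatrix} \begin{bmatrix} 3 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 4\\ 6 & 8 \end{bmatrix}$;

4. $\begin{bmatrix} 3 & 4 \end{bmatrix} \begin{bmatrix} 1\\ 2 \end{bmatrix} = \begin{bmatrix} 11 \end{bmatrix}$;

5. $\begin{bmatrix} 1 & -1\\ 1 & -1 \end{bmatrix} \begin{bmatrix} 1 & -1\\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 0 & 0\\ 0 & 0 \end{bmatrix}$.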
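The defining formula for the matrix product can be tried out directly. The following is a minimal sketch, assuming matrices are stored as Python lists of rows; the function name `mat_mul` is a hypothetical helper introduced here, not notation from the text. It recomputes the first four products of Example 2.1.2.

```python
def mat_mul(a, b):
    """Product of an m x n matrix a and an n x p matrix b (lists of rows).

    Entry (i, k) is the sum over j of a[i][j] * b[j][k], as in Definition 2.1.7.
    """
    n = len(b)
    if any(len(row) != n for row in a):
        raise ValueError("number of columns of a must equal number of rows of b")
    p = len(b[0])
    return [[sum(row[j] * b[j][k] for j in range(n)) for k in range(p)]
            for row in a]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

print(mat_mul(A, B))                 # [[19, 22], [43, 50]]
print(mat_mul(B, A))                 # [[23, 34], [31, 46]]
print(mat_mul([[1], [2]], [[3, 4]]))  # [[3, 4], [6, 8]]
print(mat_mul([[3, 4]], [[1], [2]]))  # [[11]]
```

The first two lines of output differ, which illustrates that AB and BA need not be equal.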

Matrix multiplication obeys many of the familiar laws of arithmetic apart from the commutative law.

1. (AB)C = A(BC) if A, B, C are m × n, n × p, p × q, respectively;
2. t(AB) = (tA)B = A(tB), A(−B) = (−A)B = −(AB);
3. (A + B)C = AC + BC if A and B are m × n and C is n × p;
4. D(A + B) = DA + DB if A and B are m × n and D is p × m.

We prove the associative law only: First observe that (AB)C and A(BC) are both of size m × q. Let $A = [a_{ij}]$, $B = [b_{jk}]$, $C = [c_{kl}]$. Then
\[
((AB)C)_{il} = \sum_{k=1}^{p} (AB)_{ik}\, c_{kl}
= \sum_{k=1}^{p} \Bigl( \sum_{j=1}^{n} a_{ij} b_{jk} \Bigr) c_{kl}
= \sum_{k=1}^{p} \sum_{j=1}^{n} a_{ij} b_{jk} c_{kl}.
\]
Similarly
\[
(A(BC))_{il} = \sum_{j=1}^{n} \sum_{k=1}^{p} a_{ij} b_{jk} c_{kl}.
\]

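However the double summations are equal. For sums of the form
\[
\sum_{j=1}^{n} \sum_{k=1}^{p} d_{jk} \quad\text{and}\quad \sum_{k=1}^{p} \sum_{j=1}^{n} d_{jk}
\]
represent the sum of the np elements of the rectangular array $[d_{jk}]$, by rows and by columns, respectively. Consequently
\[
((AB)C)_{il} = (A(BC))_{il}
\]
for 1 ≤ i ≤ m, 1 ≤ l ≤ q. Hence (AB)C = A(BC).

The system of m linear equations in n unknowns
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\]
is equivalent to a single matrix equation
\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}
=
\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{bmatrix},
\]
that is AX = B, where $A = [a_{ij}]$ is the coefficient matrix of the system, $X = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}$ is the vector of unknowns and $B = \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{bmatrix}$ is the vector of constants.

Another useful matrix equation equivalent to the above system of linear equations is
\[
x_1 \begin{bmatrix} a_{11}\\ a_{21}\\ \vdots\\ a_{m1} \end{bmatrix}
+ x_2 \begin{bmatrix} a_{12}\\ a_{22}\\ \vdots\\ a_{m2} \end{bmatrix}
+ \cdots
+ x_n \begin{bmatrix} a_{1n}\\ a_{2n}\\ \vdots\\ a_{mn} \end{bmatrix}
=
\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{bmatrix}.
\]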

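EXAMPLE 2.1.3 The system
\[
\begin{aligned}
x + y + z &= 1\\
x - y + z &= 0
\end{aligned}
\]
is equivalent to the matrix equation
\[
\begin{bmatrix} 1 & 1 & 1\\ 1 & -1 & 1 \end{bmatrix}
\begin{bmatrix} x\\ y\\ z \end{bmatrix}
= \begin{bmatrix} 1\\ 0 \end{bmatrix}
\]
and to the equation
\[
x \begin{bmatrix} 1\\ 1 \end{bmatrix}
+ y \begin{bmatrix} 1\\ -1 \end{bmatrix}
+ z \begin{bmatrix} 1\\ 1 \end{bmatrix}
= \begin{bmatrix} 1\\ 0 \end{bmatrix}.
\]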

2.2 Linear transformations

An n–dimensional column vector is an n × 1 matrix over F. The collection of all n–dimensional column vectors is denoted by $F^n$. Every matrix is associated with an important type of function called a linear transformation.

DEFINITION 2.2.1 (Linear transformation) With $A \in M_{m\times n}(F)$, we associate the function $T_A : F^n \to F^m$ defined by $T_A(X) = AX$ for all $X \in F^n$. More explicitly, using components, the above function takes the form
\[
\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\
y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\
&\;\;\vdots\\
y_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n,
\end{aligned}
\]
where $y_1, y_2, \ldots, y_m$ are the components of the column vector $T_A(X)$.

The function just defined has the property that
\[
T_A(sX + tY) = sT_A(X) + tT_A(Y) \tag{2.1}
\]
for all s, t ∈ F and all n–dimensional column vectors X, Y. For
\[
T_A(sX + tY) = A(sX + tY) = s(AX) + t(AY) = sT_A(X) + tT_A(Y).
\]


REMARK 2.2.1 It is easy to prove that if $T : F^n \to F^m$ is a function satisfying equation 2.1, then $T = T_A$, where A is the m × n matrix whose columns are $T(E_1), \ldots, T(E_n)$, respectively, where $E_1, \ldots, E_n$ are the n–dimensional unit vectors defined by
\[
E_1 = \begin{bmatrix} 1\\ 0\\ \vdots\\ 0 \end{bmatrix}, \;\ldots,\;
E_n = \begin{bmatrix} 0\\ 0\\ \vdots\\ 1 \end{bmatrix}.
\]

One well–known example of a linear transformation arises from rotating the (x, y)–plane in 2–dimensional Euclidean space, anticlockwise through θ radians. Here a point (x, y) will be transformed into the point $(x_1, y_1)$, where
\[
\begin{aligned}
x_1 &= x\cos\theta - y\sin\theta\\
y_1 &= x\sin\theta + y\cos\theta.
\end{aligned}
\]

In 3–dimensional Euclidean space, the equations
\[
\begin{aligned}
x_1 &= x\cos\theta - y\sin\theta, & y_1 &= x\sin\theta + y\cos\theta, & z_1 &= z;\\
x_1 &= x, & y_1 &= y\cos\phi - z\sin\phi, & z_1 &= y\sin\phi + z\cos\phi;\\
x_1 &= x\cos\psi - z\sin\psi, & y_1 &= y, & z_1 &= x\sin\psi + z\cos\psi;
\end{aligned}
\]
correspond to rotations about the positive z, x, y–axes, anticlockwise through θ, φ, ψ radians, respectively.

The product of two matrices is related to the product of the corresponding linear transformations: If A is m × n and B is n × p, then the function $T_A T_B : F^p \to F^m$, obtained by first performing $T_B$, then $T_A$, is in fact equal to the linear transformation $T_{AB}$. For if $X \in F^p$, we have
\[
T_A T_B(X) = A(BX) = (AB)X = T_{AB}(X).
\]

The following example is useful for producing rotations in 3–dimensional animated design. (See [27, pages 97–112].)

EXAMPLE 2.2.1 The linear transformation resulting from successively rotating 3–dimensional space about the positive z, x, y–axes, anticlockwise through θ, φ, ψ radians respectively, is equal to $T_{ABC}$, where

\[
C = \begin{bmatrix} \cos\theta & -\sin\theta & 0\\ \sin\theta & \cos\theta & 0\\ 0 & 0 & 1 \end{bmatrix},\quad
B = \begin{bmatrix} 1 & 0 & 0\\ 0 & \cos\phi & -\sin\phi\\ 0 & \sin\phi & \cos\phi \end{bmatrix},\quad
A = \begin{bmatrix} \cos\psi & 0 & -\sin\psi\\ 0 & 1 & 0\\ \sin\psi & 0 & \cos\psi \end{bmatrix}.
\]

The matrix ABC is quite complicated:
\[
A(BC) = \begin{bmatrix} \cos\psi & 0 & -\sin\psi\\ 0 & 1 & 0\\ \sin\psi & 0 & \cos\psi \end{bmatrix}
\begin{bmatrix} \cos\theta & -\sin\theta & 0\\ \cos\phi\sin\theta & \cos\phi\cos\theta & -\sin\phi\\ \sin\phi\sin\theta & \sin\phi\cos\theta & \cos\phi \end{bmatrix}
\]
\[
= \begin{bmatrix}
\cos\psi\cos\theta - \sin\psi\sin\phi\sin\theta & -\cos\psi\sin\theta - \sin\psi\sin\phi\cos\theta & -\sin\psi\cos\phi\\
\cos\phi\sin\theta & \cos\phi\cos\theta & -\sin\phi\\
\sin\psi\cos\theta + \cos\psi\sin\phi\sin\theta & -\sin\psi\sin\theta + \cos\psi\sin\phi\cos\theta & \cos\psi\cos\phi
\end{bmatrix}.
\]

EXAMPLE 2.2.2 Another example of a linear transformation arising from geometry is reflection of the plane in a line l inclined at an angle θ to the positive x–axis.

We reduce the problem to the simpler case θ = 0, where the equations of transformation are $x_1 = x$, $y_1 = -y$. First rotate the plane clockwise through θ radians, thereby taking l into the x–axis; next reflect the plane in the x–axis; then rotate the plane anticlockwise through θ radians, thereby restoring l to its original position.

Figure 2.1: Reflection in a line. (Labels: the point (x, y), its image $(x_1, y_1)$, the line l and the angle θ.)

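In terms of matrices, we get transformation equations
\[
\begin{aligned}
\begin{bmatrix} x_1\\ y_1 \end{bmatrix}
&= \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} 1 & 0\\ 0 & -1 \end{bmatrix}
\begin{bmatrix} \cos(-\theta) & -\sin(-\theta)\\ \sin(-\theta) & \cos(-\theta) \end{bmatrix}
\begin{bmatrix} x\\ y \end{bmatrix}\\
&= \begin{bmatrix} \cos\theta & \sin\theta\\ \sin\theta & -\cos\theta \end{bmatrix}
\begin{bmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x\\ y \end{bmatrix}\\
&= \begin{bmatrix} \cos 2\theta & \sin 2\theta\\ \sin 2\theta & -\cos 2\theta \end{bmatrix}
\begin{bmatrix} x\\ y \end{bmatrix}.
\end{aligned}
\]

The more general transformation
\[
\begin{bmatrix} x_1\\ y_1 \end{bmatrix}
= a \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x\\ y \end{bmatrix}
+ \begin{bmatrix} u\\ v \end{bmatrix}, \quad a > 0,
\]
represents a rotation, followed by a scaling and then by a translation. Such transformations are important in computer graphics. See [23, 24].

EXAMPLE 2.2.3 Our last example of a geometrical linear transformation arises from projecting the plane onto a line l through the origin, inclined at angle θ to the positive x–axis. Again we reduce that problem to the simpler case where l is the x–axis and the equations of transformation are $x_1 = x$, $y_1 = 0$.

Figure 2.2: Projection on a line. (Labels: the point (x, y), its image $(x_1, y_1)$, the line l and the angle θ.)

In terms of matrices, we get transformation equations
\[
\begin{aligned}
\begin{bmatrix} x_1\\ y_1 \end{bmatrix}
&= \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} 1 & 0\\ 0 & 0 \end{bmatrix}
\begin{bmatrix} \cos(-\theta) & -\sin(-\theta)\\ \sin(-\theta) & \cos(-\theta) \end{bmatrix}
\begin{bmatrix} x\\ y \end{bmatrix}\\
&= \begin{bmatrix} \cos\theta & 0\\ \sin\theta & 0 \end{bmatrix}
\begin{bmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x\\ y \end{bmatrix}\\
&= \begin{bmatrix} \cos^2\theta & \cos\theta\sin\theta\\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}
\begin{bmatrix} x\\ y \end{bmatrix}.
\end{aligned}
\]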
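The "rotate, apply the simple map, rotate back" construction used for reflection and projection can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names (`mat_mul`, `rotation`, `conjugate_by_rotation`, `close`) and the test angle 0.7 are choices made here, and floating-point results are compared up to a small tolerance.

```python
import math


def mat_mul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))] for i in range(len(a))]


def rotation(theta):
    """Matrix of the anticlockwise rotation of the plane through theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def conjugate_by_rotation(middle, theta):
    """Rotate clockwise by theta, apply `middle`, then rotate anticlockwise by theta."""
    return mat_mul(rotation(theta), mat_mul(middle, rotation(-theta)))


def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices up to rounding error."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


theta = 0.7  # arbitrary test angle, in radians
reflection = conjugate_by_rotation([[1, 0], [0, -1]], theta)
projection = conjugate_by_rotation([[1, 0], [0, 0]], theta)

c, s = math.cos(theta), math.sin(theta)
reflection_formula = [[math.cos(2 * theta), math.sin(2 * theta)],
                      [math.sin(2 * theta), -math.cos(2 * theta)]]
projection_formula = [[c * c, c * s], [s * c, s * s]]

print(close(reflection, reflection_formula))   # True
print(close(projection, projection_formula))   # True
```

As a further sanity check, reflecting twice gives the identity and projecting twice is the same as projecting once, which is what one expects geometrically.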

2.3 Recurrence relations

DEFINITION 2.3.1 (The identity matrix) The n × n matrix $I_n = [\delta_{ij}]$, defined by $\delta_{ij} = 1$ if i = j, $\delta_{ij} = 0$ if i ≠ j, is called the identity matrix of order n. In other words, the columns of the identity matrix of order n are the unit vectors $E_1, \ldots, E_n$, respectively. For example,
\[
I_2 = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}.
\]

THEOREM 2.3.1 If A is m × n, then $I_m A = A = A I_n$.

DEFINITION 2.3.2 (k–th power of a matrix) If A is an n × n matrix, we define $A^k$ recursively as follows: $A^0 = I_n$ and $A^{k+1} = A^k A$ for k ≥ 0.

For example $A^1 = A^0 A = I_n A = A$ and hence $A^2 = A^1 A = AA$.

The usual index laws hold provided AB = BA:

1. $A^m A^n = A^{m+n}$, $(A^m)^n = A^{mn}$;
2. $(AB)^n = A^n B^n$;
3. $A^m B^n = B^n A^m$;
4. $(A + B)^2 = A^2 + 2AB + B^2$;
5. $(A + B)^n = \sum_{i=0}^{n} \binom{n}{i} A^i B^{n-i}$;

6. $(A + B)(A - B) = A^2 - B^2$.

We now state a basic property of the natural numbers.

AXIOM 2.3.1 (PRINCIPLE OF MATHEMATICAL INDUCTION) If for each n ≥ 1, $P_n$ denotes a mathematical statement and

(i) $P_1$ is true,
(ii) the truth of $P_n$ implies that of $P_{n+1}$ for each n ≥ 1,

then $P_n$ is true for all n ≥ 1.

EXAMPLE 2.3.1 Let $A = \begin{bmatrix} 7 & 4\\ -9 & -5 \end{bmatrix}$. Prove that
\[
A^n = \begin{bmatrix} 1 + 6n & 4n\\ -9n & 1 - 6n \end{bmatrix} \quad\text{if } n \geq 1.
\]

Solution. We use the principle of mathematical induction. Take $P_n$ to be the statement
\[
A^n = \begin{bmatrix} 1 + 6n & 4n\\ -9n & 1 - 6n \end{bmatrix}.
\]
Then $P_1$ asserts that
\[
A^1 = \begin{bmatrix} 1 + 6\times 1 & 4\times 1\\ -9\times 1 & 1 - 6\times 1 \end{bmatrix}
= \begin{bmatrix} 7 & 4\\ -9 & -5 \end{bmatrix},
\]
which is true. Now let n ≥ 1 and assume that $P_n$ is true. We have to deduce that
\[
A^{n+1} = \begin{bmatrix} 1 + 6(n+1) & 4(n+1)\\ -9(n+1) & 1 - 6(n+1) \end{bmatrix}
= \begin{bmatrix} 7 + 6n & 4n + 4\\ -9n - 9 & -5 - 6n \end{bmatrix}.
\]
Now
\[
\begin{aligned}
A^{n+1} = A^n A
&= \begin{bmatrix} 1 + 6n & 4n\\ -9n & 1 - 6n \end{bmatrix}
\begin{bmatrix} 7 & 4\\ -9 & -5 \end{bmatrix}\\
&= \begin{bmatrix} (1+6n)7 + (4n)(-9) & (1+6n)4 + (4n)(-5)\\ (-9n)7 + (1-6n)(-9) & (-9n)4 + (1-6n)(-5) \end{bmatrix}\\
&= \begin{bmatrix} 7 + 6n & 4n + 4\\ -9n - 9 & -5 - 6n \end{bmatrix},
\end{aligned}
\]
and “the induction goes through”.

The last example has an application to the solution of a system of recurrence relations:
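An induction proof such as the one in Example 2.3.1 can be spot-checked by computing a few powers directly. The following is a minimal sketch, assuming matrices are lists of rows; `mat_mul`, `mat_pow` and `closed_form` are hypothetical helper names. A finite check of this kind does not replace the proof; it only guards against arithmetic slips.

```python
def mat_mul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))] for i in range(len(a))]


def mat_pow(a, k):
    """A^k from the recursive definition A^0 = I_n, A^(k+1) = A^k A."""
    n = len(a)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, a)
    return result


def closed_form(n):
    """The formula for A^n claimed in Example 2.3.1."""
    return [[1 + 6 * n, 4 * n], [-9 * n, 1 - 6 * n]]


A = [[7, 4], [-9, -5]]
agrees = all(mat_pow(A, n) == closed_form(n) for n in range(1, 21))
print(agrees)  # True
```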


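EXAMPLE 2.3.2 The following system of recurrence relations holds for all n ≥ 0:
\[
\begin{aligned}
x_{n+1} &= 7x_n + 4y_n\\
y_{n+1} &= -9x_n - 5y_n.
\end{aligned}
\]
Solve the system for $x_n$ and $y_n$ in terms of $x_0$ and $y_0$.

Solution. Combine the above equations into a single matrix equation
\[
\begin{bmatrix} x_{n+1}\\ y_{n+1} \end{bmatrix}
= \begin{bmatrix} 7 & 4\\ -9 & -5 \end{bmatrix}
\begin{bmatrix} x_n\\ y_n \end{bmatrix},
\]
or $X_{n+1} = AX_n$, where $A = \begin{bmatrix} 7 & 4\\ -9 & -5 \end{bmatrix}$ and $X_n = \begin{bmatrix} x_n\\ y_n \end{bmatrix}$.

We see that
\[
\begin{aligned}
X_1 &= AX_0\\
X_2 &= AX_1 = A(AX_0) = A^2 X_0\\
&\;\;\vdots\\
X_n &= A^n X_0.
\end{aligned}
\]
(The truth of the equation $X_n = A^n X_0$ for n ≥ 1, strictly speaking, follows by mathematical induction; however for simple cases such as the above, it is customary to omit the strict proof and supply instead a few lines of motivation for the inductive statement.)

Hence the previous example gives
\[
\begin{bmatrix} x_n\\ y_n \end{bmatrix} = X_n
= \begin{bmatrix} 1 + 6n & 4n\\ -9n & 1 - 6n \end{bmatrix}
\begin{bmatrix} x_0\\ y_0 \end{bmatrix}
= \begin{bmatrix} (1 + 6n)x_0 + (4n)y_0\\ (-9n)x_0 + (1 - 6n)y_0 \end{bmatrix},
\]
and hence $x_n = (1 + 6n)x_0 + 4ny_0$ and $y_n = (-9n)x_0 + (1 - 6n)y_0$, for n ≥ 1.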
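The closed form obtained in Example 2.3.2 can be compared with direct iteration of the recurrence. This is a small sketch; the function names `iterate` and `closed_form` and the starting values used below are illustrative assumptions.

```python
def iterate(x0, y0, n):
    """Apply x -> 7x + 4y, y -> -9x - 5y exactly n times, starting from (x0, y0)."""
    x, y = x0, y0
    for _ in range(n):
        x, y = 7 * x + 4 * y, -9 * x - 5 * y
    return x, y


def closed_form(x0, y0, n):
    """The solution x_n, y_n stated at the end of Example 2.3.2."""
    return (1 + 6 * n) * x0 + 4 * n * y0, -9 * n * x0 + (1 - 6 * n) * y0


# Arbitrary integer starting values, chosen only for the comparison.
for x0, y0 in [(1, 0), (0, 1), (2, -3)]:
    for n in range(1, 11):
        assert iterate(x0, y0, n) == closed_form(x0, y0, n)
print("closed form agrees with iteration")
```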

2.4 PROBLEMS

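1. Let A, B, C, D be matrices defined by
\[
A = \begin{bmatrix} 3 & 0\\ -1 & 2\\ 1 & 1 \end{bmatrix},\quad
B = \begin{bmatrix} 1 & 5 & 2\\ -1 & 1 & 0\\ -4 & 1 & 3 \end{bmatrix},\quad
C = \begin{bmatrix} -3 & -1\\ 2 & 1\\ 4 & 3 \end{bmatrix},\quad
D = \begin{bmatrix} 4 & -1\\ 2 & 0 \end{bmatrix}.
\]
Which of the following matrices are defined? Compute those matrices which are defined.
\[
A + B,\; A + C,\; AB,\; BA,\; CD,\; DC,\; D^2.
\]

[Answers: $A + C$, $BA$, $CD$, $D^2$;
\[
\begin{bmatrix} 0 & -1\\ 1 & 3\\ 5 & 4 \end{bmatrix},\quad
\begin{bmatrix} 0 & 12\\ -4 & 2\\ -10 & 5 \end{bmatrix},\quad
\begin{bmatrix} -14 & 3\\ 10 & -2\\ 22 & -4 \end{bmatrix},\quad
\begin{bmatrix} 14 & -4\\ 8 & -2 \end{bmatrix}.]
\]

2. Let $A = \begin{bmatrix} -1 & 0 & 1\\ 0 & 1 & 1 \end{bmatrix}$. Show that if B is a 3 × 2 matrix such that $AB = I_2$, then
\[
B = \begin{bmatrix} a & b\\ -a - 1 & 1 - b\\ a + 1 & b \end{bmatrix}
\]
for suitable numbers a and b. Use the associative law to show that $(BA)^2 B = B$.

3. If $A = \begin{bmatrix} a & b\\ c & d \end{bmatrix}$, prove that $A^2 - (a + d)A + (ad - bc)I_2 = 0$.

4. If $A = \begin{bmatrix} 4 & -3\\ 1 & 0 \end{bmatrix}$, use the fact $A^2 = 4A - 3I_2$ and mathematical induction, to prove that
\[
A^n = \frac{3^n - 1}{2} A + \frac{3 - 3^n}{2} I_2 \quad\text{if } n \geq 1.
\]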

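5. A sequence of numbers $x_1, x_2, \ldots, x_n, \ldots$ satisfies the recurrence relation $x_{n+1} = ax_n + bx_{n-1}$ for n ≥ 1, where a and b are constants. Prove that
\[
\begin{bmatrix} x_{n+1}\\ x_n \end{bmatrix} = A \begin{bmatrix} x_n\\ x_{n-1} \end{bmatrix},
\]
where $A = \begin{bmatrix} a & b\\ 1 & 0 \end{bmatrix}$, and hence express $\begin{bmatrix} x_{n+1}\\ x_n \end{bmatrix}$ in terms of $\begin{bmatrix} x_1\\ x_0 \end{bmatrix}$.

If a = 4 and b = −3, use the previous question to find a formula for $x_n$ in terms of $x_1$ and $x_0$.

[Answer: $x_n = \dfrac{3^n - 1}{2} x_1 + \dfrac{3 - 3^n}{2} x_0$.]

6. Let $A = \begin{bmatrix} 2a & -a^2\\ 1 & 0 \end{bmatrix}$.

(a) Prove that
\[
A^n = \begin{bmatrix} (n + 1)a^n & -na^{n+1}\\ na^{n-1} & (1 - n)a^n \end{bmatrix} \quad\text{if } n \geq 1.
\]

(b) A sequence $x_0, x_1, \ldots, x_n, \ldots$ satisfies the recurrence relation $x_{n+1} = 2ax_n - a^2 x_{n-1}$ for n ≥ 1. Use part (a) and the previous question to prove that $x_n = na^{n-1}x_1 + (1 - n)a^n x_0$ for n ≥ 1.

7. Let $A = \begin{bmatrix} a & b\\ c & d \end{bmatrix}$ and suppose that $\lambda_1$ and $\lambda_2$ are the roots of the quadratic polynomial $x^2 - (a + d)x + ad - bc$. ($\lambda_1$ and $\lambda_2$ may be equal.)

Let $k_n$ be defined by $k_0 = 0$, $k_1 = 1$ and for n ≥ 2
\[
k_n = \sum_{i=1}^{n} \lambda_1^{n-i} \lambda_2^{i-1}.
\]
Prove that
\[
k_{n+1} = (\lambda_1 + \lambda_2)k_n - \lambda_1\lambda_2 k_{n-1},
\]
if n ≥ 1. Also prove that
\[
k_n = \begin{cases}
(\lambda_1^n - \lambda_2^n)/(\lambda_1 - \lambda_2) & \text{if } \lambda_1 \neq \lambda_2,\\
n\lambda_1^{n-1} & \text{if } \lambda_1 = \lambda_2.
\end{cases}
\]
Use mathematical induction to prove that if n ≥ 1,
\[
A^n = k_n A - \lambda_1\lambda_2 k_{n-1} I_2.
\]
[Hint: Use the equation $A^2 = (a + d)A - (ad - bc)I_2$.]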

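8. Use Question 7 to prove that if $A = \begin{bmatrix} 1 & 2\\ 2 & 1 \end{bmatrix}$, then
\[
A^n = \frac{3^n}{2} \begin{bmatrix} 1 & 1\\ 1 & 1 \end{bmatrix}
+ \frac{(-1)^{n-1}}{2} \begin{bmatrix} -1 & 1\\ 1 & -1 \end{bmatrix}
\]
if n ≥ 1.

9. The Fibonacci numbers are defined by the equations $F_0 = 0$, $F_1 = 1$ and $F_{n+1} = F_n + F_{n-1}$ if n ≥ 1. Prove that
\[
F_n = \frac{1}{\sqrt{5}} \left( \left( \frac{1 + \sqrt{5}}{2} \right)^n - \left( \frac{1 - \sqrt{5}}{2} \right)^n \right)
\]
if n ≥ 0.

10. Let r > 1 be an integer. Let a and b be arbitrary positive integers. Sequences $x_n$ and $y_n$ of positive integers are defined in terms of a and b by the recurrence relations
\[
\begin{aligned}
x_{n+1} &= x_n + ry_n\\
y_{n+1} &= x_n + y_n,
\end{aligned}
\]
for n ≥ 0, where $x_0 = a$ and $y_0 = b$.

Use Question 7 to prove that
\[
\frac{x_n}{y_n} \to \sqrt{r} \quad\text{as } n \to \infty.
\]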

2.5 Non–singular matrices

DEFINITION 2.5.1 (Non–singular matrix) A square matrix A ∈ Mn×n (F ) is called non–singular or invertible if there exists a matrix B ∈ Mn×n (F ) such that AB = In = BA. Any matrix B with the above property is called an inverse of A. If A does not have an inverse, A is called singular.

THEOREM 2.5.1 (Inverses are unique) If A has inverses B and C, then B = C.

Proof. Let B and C be inverses of A. Then $AB = I_n = BA$ and $AC = I_n = CA$. Then $B(AC) = BI_n = B$ and $(BA)C = I_nC = C$. Hence because B(AC) = (BA)C, we deduce that B = C.

REMARK 2.5.1 If A has an inverse, it is denoted by $A^{-1}$. So
\[
AA^{-1} = I_n = A^{-1}A.
\]
Also if A is non–singular, it follows that $A^{-1}$ is also non–singular and
\[
(A^{-1})^{-1} = A.
\]

THEOREM 2.5.2 If A and B are non–singular matrices of the same size, then so is AB. Moreover
\[
(AB)^{-1} = B^{-1}A^{-1}.
\]

Proof.
\[
(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AI_nA^{-1} = AA^{-1} = I_n.
\]
Similarly
\[
(B^{-1}A^{-1})(AB) = I_n.
\]

REMARK 2.5.2 The above result generalizes to a product of m non–singular matrices: If $A_1, \ldots, A_m$ are non–singular n × n matrices, then the product $A_1 \cdots A_m$ is also non–singular. Moreover
\[
(A_1 \cdots A_m)^{-1} = A_m^{-1} \cdots A_1^{-1}.
\]
(Thus the inverse of the product equals the product of the inverses in the reverse order.)

EXAMPLE 2.5.1 If A and B are n × n matrices satisfying $A^2 = B^2 = (AB)^2 = I_n$, prove that AB = BA.

Solution. Assume $A^2 = B^2 = (AB)^2 = I_n$. Then A, B, AB are non–singular and $A^{-1} = A$, $B^{-1} = B$, $(AB)^{-1} = AB$. But $(AB)^{-1} = B^{-1}A^{-1}$ and hence AB = BA.

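EXAMPLE 2.5.2 $A = \begin{bmatrix} 1 & 2\\ 4 & 8 \end{bmatrix}$ is singular. For suppose $B = \begin{bmatrix} a & b\\ c & d \end{bmatrix}$ is an inverse of A. Then the equation $AB = I_2$ gives
\[
\begin{bmatrix} 1 & 2\\ 4 & 8 \end{bmatrix}
\begin{bmatrix} a & b\\ c & d \end{bmatrix}
= \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}
\]
and equating the corresponding elements of column 1 of both sides gives the system
\[
\begin{aligned}
a + 2c &= 1\\
4a + 8c &= 0
\end{aligned}
\]
which is clearly inconsistent.

THEOREM 2.5.3 Let $A = \begin{bmatrix} a & b\\ c & d \end{bmatrix}$ and Δ = ad − bc ≠ 0. Then A is non–singular. Also
\[
A^{-1} = \Delta^{-1} \begin{bmatrix} d & -b\\ -c & a \end{bmatrix}.
\]

REMARK 2.5.3 The expression ad − bc is called the determinant of A and is denoted by the symbols det A or $\begin{vmatrix} a & b\\ c & d \end{vmatrix}$.

Proof. Verify that the matrix $B = \Delta^{-1} \begin{bmatrix} d & -b\\ -c & a \end{bmatrix}$ satisfies the equation $AB = I_2 = BA$.

EXAMPLE 2.5.3 Let
\[
A = \begin{bmatrix} 0 & 1 & 0\\ 0 & 0 & 1\\ 5 & 0 & 0 \end{bmatrix}.
\]
Verify that $A^3 = 5I_3$, deduce that A is non–singular and find $A^{-1}$.

Solution. After verifying that $A^3 = 5I_3$, we notice that
\[
A \left( \tfrac{1}{5} A^2 \right) = I_3 = \left( \tfrac{1}{5} A^2 \right) A.
\]
Hence A is non–singular and $A^{-1} = \tfrac{1}{5} A^2$.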
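The formula of Theorem 2.5.3 is short enough to write out as code. The sketch below uses exact rational arithmetic from Python's standard `fractions` module; the function name `inverse_2x2` and the convention of returning `None` when Δ = 0 are assumptions made for this illustration.

```python
from fractions import Fraction


def inverse_2x2(m):
    """Inverse of [[a, b], [c, d]] by the formula of Theorem 2.5.3.

    Returns None when the determinant ad - bc is zero.
    """
    (a, b), (c, d) = m
    delta = a * d - b * c
    if delta == 0:
        return None
    inv_delta = Fraction(1) / delta
    return [[d * inv_delta, -b * inv_delta],
            [-c * inv_delta, a * inv_delta]]


print(inverse_2x2([[1, 2], [4, 8]]))     # None: the matrix of Example 2.5.2
print(inverse_2x2([[7, 4], [-9, -5]]))   # entries -5, -4, 9, 7 as Fractions
```

Here the second matrix is the one from Example 2.3.1; its determinant is 1, so the inverse has integer entries.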


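THEOREM 2.5.4 If the coefficient matrix A of a system of n equations in n unknowns is non–singular, then the system AX = B has the unique solution $X = A^{-1}B$.

Proof. Assume that $A^{-1}$ exists.

1. (Uniqueness.) Assume that AX = B. Then
\[
(A^{-1}A)X = A^{-1}B, \quad I_nX = A^{-1}B, \quad X = A^{-1}B.
\]

2. (Existence.) Let $X = A^{-1}B$. Then
\[
AX = A(A^{-1}B) = (AA^{-1})B = I_nB = B.
\]

THEOREM 2.5.5 (Cramer’s rule for 2 equations in 2 unknowns) The system
\[
\begin{aligned}
ax + by &= e\\
cx + dy &= f
\end{aligned}
\]
has a unique solution if $\Delta = \begin{vmatrix} a & b\\ c & d \end{vmatrix} \neq 0$, namely
\[
x = \frac{\Delta_1}{\Delta}, \quad y = \frac{\Delta_2}{\Delta},
\]
where
\[
\Delta_1 = \begin{vmatrix} e & b\\ f & d \end{vmatrix} \quad\text{and}\quad \Delta_2 = \begin{vmatrix} a & e\\ c & f \end{vmatrix}.
\]

Proof. Suppose Δ ≠ 0. Then $A = \begin{bmatrix} a & b\\ c & d \end{bmatrix}$ has inverse
\[
A^{-1} = \Delta^{-1} \begin{bmatrix} d & -b\\ -c & a \end{bmatrix}
\]
and we know that the system
\[
A \begin{bmatrix} x\\ y \end{bmatrix} = \begin{bmatrix} e\\ f \end{bmatrix}
\]
has the unique solution
\[
\begin{bmatrix} x\\ y \end{bmatrix}
= A^{-1} \begin{bmatrix} e\\ f \end{bmatrix}
= \frac{1}{\Delta} \begin{bmatrix} d & -b\\ -c & a \end{bmatrix} \begin{bmatrix} e\\ f \end{bmatrix}
= \frac{1}{\Delta} \begin{bmatrix} de - bf\\ -ce + af \end{bmatrix}
= \frac{1}{\Delta} \begin{bmatrix} \Delta_1\\ \Delta_2 \end{bmatrix}
= \begin{bmatrix} \Delta_1/\Delta\\ \Delta_2/\Delta \end{bmatrix}.
\]
Hence $x = \Delta_1/\Delta$, $y = \Delta_2/\Delta$.

COROLLARY 2.5.1 The homogeneous system
\[
\begin{aligned}
ax + by &= 0\\
cx + dy &= 0
\end{aligned}
\]
has only the trivial solution if $\Delta = \begin{vmatrix} a & b\\ c & d \end{vmatrix} \neq 0$.

EXAMPLE 2.5.4 The system
\[
\begin{aligned}
7x + 8y &= 100\\
2x - 9y &= 10
\end{aligned}
\]
has the unique solution $x = \Delta_1/\Delta$, $y = \Delta_2/\Delta$, where
\[
\Delta = \begin{vmatrix} 7 & 8\\ 2 & -9 \end{vmatrix} = -79, \quad
\Delta_1 = \begin{vmatrix} 100 & 8\\ 10 & -9 \end{vmatrix} = -980, \quad
\Delta_2 = \begin{vmatrix} 7 & 100\\ 2 & 10 \end{vmatrix} = -130.
\]
So $x = \frac{980}{79}$ and $y = \frac{130}{79}$.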
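Cramer's rule for two equations in two unknowns can be written as a few lines of code. This is a minimal sketch using exact fractions; `det2` and `cramer_2x2` are hypothetical names, and integer coefficients are assumed.

```python
from fractions import Fraction


def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix [[a, b], [c, d]]."""
    return a * d - b * c


def cramer_2x2(a, b, c, d, e, f):
    """Solve ax + by = e, cx + dy = f by Cramer's rule (integer coefficients assumed)."""
    delta = det2(a, b, c, d)
    if delta == 0:
        raise ValueError("determinant is zero: Cramer's rule does not apply")
    delta1 = det2(e, b, f, d)
    delta2 = det2(a, e, c, f)
    return Fraction(delta1, delta), Fraction(delta2, delta)


x, y = cramer_2x2(7, 8, 2, -9, 100, 10)
print(x, y)  # 980/79 130/79
```

Substituting back, 7x + 8y and 2x − 9y evaluate to 100 and 10 exactly, since no rounding is involved.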

THEOREM 2.5.6 Let A be a square matrix. If A is non–singular, the homogeneous system AX = 0 has only the trivial solution. Equivalently, if the homogeneous system AX = 0 has a non–trivial solution, then A is singular.

Proof. If A is non–singular and AX = 0, then $X = A^{-1}0 = 0$.

REMARK 2.5.4 If $A_{*1}, \ldots, A_{*n}$ denote the columns of A, then the equation
\[
AX = x_1A_{*1} + \cdots + x_nA_{*n}
\]
holds. Consequently theorem 2.5.6 tells us that if there exist scalars $x_1, \ldots, x_n$, not all zero, such that
\[
x_1A_{*1} + \cdots + x_nA_{*n} = 0,
\]
that is, if the columns of A are linearly dependent, then A is singular. An equivalent way of saying that the columns of A are linearly dependent is that one of the columns of A is expressible as a sum of certain scalar multiples of the remaining columns of A; that is, one column is a linear combination of the remaining columns.

EXAMPLE 2.5.5
\[
A = \begin{bmatrix} 1 & 2 & 3\\ 1 & 0 & 1\\ 3 & 4 & 7 \end{bmatrix}
\]
is singular. For it can be verified that A has reduced row–echelon form
\[
\begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & 1\\ 0 & 0 & 0 \end{bmatrix}
\]
and consequently AX = 0 has a non–trivial solution x = −1, y = −1, z = 1.

REMARK 2.5.5 More generally, if A is row–equivalent to a matrix containing a zero row, then A is singular. For then the homogeneous system AX = 0 has a non–trivial solution.

An important class of non–singular matrices is that of the elementary row matrices.

DEFINITION 2.5.2 (Elementary row matrices) There are three types, $E_{ij}$, $E_i(t)$, $E_{ij}(t)$, corresponding to the three kinds of elementary row operation:

1. $E_{ij}$, (i ≠ j) is obtained from the identity matrix $I_n$ by interchanging rows i and j.
2. $E_i(t)$, (t ≠ 0) is obtained by multiplying the i–th row of $I_n$ by t.
3. $E_{ij}(t)$, (i ≠ j) is obtained from $I_n$ by adding t times the j–th row of $I_n$ to the i–th row.

EXAMPLE 2.5.6 (n = 3.)
\[
E_{23} = \begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{bmatrix},\quad
E_2(-1) = \begin{bmatrix} 1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & 1 \end{bmatrix},\quad
E_{23}(-1) = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & -1\\ 0 & 0 & 1 \end{bmatrix}.
\]


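The elementary row matrices have the following distinguishing property:

THEOREM 2.5.7 If a matrix A is pre–multiplied by an elementary row–matrix, the resulting matrix is the one obtained by performing the corresponding elementary row–operation on A.

EXAMPLE 2.5.7
\[
E_{23} \begin{bmatrix} a & b\\ c & d\\ e & f \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} a & b\\ c & d\\ e & f \end{bmatrix}
= \begin{bmatrix} a & b\\ e & f\\ c & d \end{bmatrix}.
\]

COROLLARY 2.5.2 The three types of elementary row–matrices are non–singular. Indeed

1. $E_{ij}^{-1} = E_{ij}$;
2. $E_i(t)^{-1} = E_i(t^{-1})$;
3. $(E_{ij}(t))^{-1} = E_{ij}(-t)$.

Proof. Taking $A = I_n$ in the above theorem, we deduce the following equations:
\[
\begin{aligned}
E_{ij}E_{ij} &= I_n\\
E_i(t)E_i(t^{-1}) &= I_n = E_i(t^{-1})E_i(t) \quad\text{if } t \neq 0\\
E_{ij}(t)E_{ij}(-t) &= I_n = E_{ij}(-t)E_{ij}(t).
\end{aligned}
\]

EXAMPLE 2.5.8 Find the 3 × 3 matrix $A = E_3(5)E_{23}(2)E_{12}$ explicitly. Also find $A^{-1}$.

Solution.
\[
A = E_3(5)E_{23}(2) \begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{bmatrix}
= E_3(5) \begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 2\\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 2\\ 0 & 0 & 5 \end{bmatrix}.
\]
To find $A^{-1}$, we have
\[
\begin{aligned}
A^{-1} &= (E_3(5)E_{23}(2)E_{12})^{-1}\\
&= E_{12}^{-1}(E_{23}(2))^{-1}(E_3(5))^{-1}\\
&= E_{12}E_{23}(-2)E_3(5^{-1})\\
&= E_{12}E_{23}(-2) \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & \frac{1}{5} \end{bmatrix}\\
&= E_{12} \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & -\frac{2}{5}\\ 0 & 0 & \frac{1}{5} \end{bmatrix}
= \begin{bmatrix} 0 & 1 & -\frac{2}{5}\\ 1 & 0 & 0\\ 0 & 0 & \frac{1}{5} \end{bmatrix}.
\end{aligned}
\]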
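The three kinds of elementary row matrix are easy to build from the identity matrix, which gives a way to recheck Example 2.5.8. In this sketch the function names `swap`, `scale` and `add_multiple` are assumptions standing for $E_{ij}$, $E_i(t)$ and $E_{ij}(t)$; rows are numbered from 1, as in the text.

```python
from fractions import Fraction


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))] for i in range(len(a))]


def swap(n, i, j):
    """E_ij: interchange rows i and j of I_n (rows numbered from 1)."""
    e = identity(n)
    e[i - 1], e[j - 1] = e[j - 1], e[i - 1]
    return e


def scale(n, i, t):
    """E_i(t): multiply row i of I_n by the non-zero scalar t."""
    e = identity(n)
    e[i - 1][i - 1] = t
    return e


def add_multiple(n, i, j, t):
    """E_ij(t): add t times row j of I_n to row i."""
    e = identity(n)
    e[i - 1][j - 1] = t
    return e


# A = E_3(5) E_23(2) E_12 and its inverse E_12 E_23(-2) E_3(1/5), as in Example 2.5.8.
A = mat_mul(scale(3, 3, 5), mat_mul(add_multiple(3, 2, 3, 2), swap(3, 1, 2)))
A_inv = mat_mul(swap(3, 1, 2),
                mat_mul(add_multiple(3, 2, 3, -2), scale(3, 3, Fraction(1, 5))))

print(A)                                 # [[0, 1, 0], [1, 0, 2], [0, 0, 5]]
print(mat_mul(A, A_inv) == identity(3))  # True
```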

REMARK 2.5.6 Recall that A and B are row–equivalent if B is obtained from A by a sequence of elementary row operations. If $E_1, \ldots, E_r$ are the respective corresponding elementary row matrices, then
\[
B = E_r(\ldots(E_2(E_1A))\ldots) = (E_r \cdots E_1)A = PA,
\]
where $P = E_r \cdots E_1$ is non–singular. Conversely if B = PA, where P is non–singular, then A is row–equivalent to B. For as we shall now see, P is in fact a product of elementary row matrices.

THEOREM 2.5.8 Let A be a non–singular n × n matrix. Then

(i) A is row–equivalent to $I_n$,
(ii) A is a product of elementary row matrices.

Proof. Assume that A is non–singular and let B be the reduced row–echelon form of A. Then B has no zero rows, for otherwise the equation AX = 0 would have a non–trivial solution. Consequently $B = I_n$.

It follows that there exist elementary row matrices $E_1, \ldots, E_r$ such that $E_r(\ldots(E_1A)\ldots) = B = I_n$ and hence $A = E_1^{-1} \cdots E_r^{-1}$, a product of elementary row matrices.

THEOREM 2.5.9 Let A be n × n and suppose that A is row–equivalent to $I_n$. Then A is non–singular and $A^{-1}$ can be found by performing the same sequence of elementary row operations on $I_n$ as were used to convert A to $I_n$.

Proof. Suppose that $E_r \cdots E_1A = I_n$. In other words $BA = I_n$, where $B = E_r \cdots E_1$ is non–singular. Then $B^{-1}(BA) = B^{-1}I_n$ and so $A = B^{-1}$, which is non–singular.

Also $A^{-1} = (B^{-1})^{-1} = B = E_r(\ldots(E_1I_n)\ldots)$, which shows that $A^{-1}$ is obtained from $I_n$ by performing the same sequence of elementary row operations as were used to convert A to $I_n$.
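Theorem 2.5.9 suggests a procedure: row–reduce A while applying the same operations to $I_n$. The following sketch does this on the partitioned matrix $[A|I_n]$ with exact fractions. It is one possible arrangement of the row operations, not necessarily the order a hand computation would use; the name `inverse` and the choice to return `None` for a singular matrix are assumptions of this illustration.

```python
from fractions import Fraction


def inverse(a):
    """Inverse of a square matrix by Gauss-Jordan reduction of [A | I_n].

    Returns None if A is not row-equivalent to I_n (that is, A is singular).
    """
    n = len(a)
    # Build the partitioned matrix [A | I_n] with exact rational entries.
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None
        m[col], m[pivot] = m[pivot], m[col]           # interchange two rows
        p = m[col][col]
        m[col] = [x / p for x in m[col]]              # scale the pivot row
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                # subtract a multiple of the pivot row from row r
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


print(inverse([[0, 1, 0], [1, 0, 2], [0, 0, 5]]))  # the inverse found in Example 2.5.8
print(inverse([[1, 2], [4, 8]]))                   # None: the matrix of Example 2.5.2
```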


REMARK 2.5.7 It follows from theorem 2.5.9 that if A is singular, then A is row–equivalent to a matrix whose last row is zero.

EXAMPLE 2.5.9 Show that $A = \begin{bmatrix} 1 & 2\\ 1 & 1 \end{bmatrix}$ is non–singular, find $A^{-1}$ and express A as a product of elementary row matrices.

Solution. We form the partitioned matrix $[A|I_2]$ which consists of A followed by $I_2$. Then any sequence of elementary row operations which reduces A to $I_2$ will reduce $I_2$ to $A^{-1}$. Here
\[
\begin{aligned}
[A|I_2] &= \left[\begin{array}{cc|cc} 1 & 2 & 1 & 0\\ 1 & 1 & 0 & 1 \end{array}\right]\\
R_2 \to R_2 - R_1 \quad & \left[\begin{array}{cc|cc} 1 & 2 & 1 & 0\\ 0 & -1 & -1 & 1 \end{array}\right]\\
R_2 \to (-1)R_2 \quad & \left[\begin{array}{cc|cc} 1 & 2 & 1 & 0\\ 0 & 1 & 1 & -1 \end{array}\right]\\
R_1 \to R_1 - 2R_2 \quad & \left[\begin{array}{cc|cc} 1 & 0 & -1 & 2\\ 0 & 1 & 1 & -1 \end{array}\right].
\end{aligned}
\]
Hence A is row–equivalent to $I_2$ and A is non–singular. Also
\[
A^{-1} = \begin{bmatrix} -1 & 2\\ 1 & -1 \end{bmatrix}.
\]
We also observe that
\[
E_{12}(-2)E_2(-1)E_{21}(-1)A = I_2.
\]
Hence
\[
\begin{aligned}
A^{-1} &= E_{12}(-2)E_2(-1)E_{21}(-1)\\
A &= E_{21}(1)E_2(-1)E_{12}(2).
\end{aligned}
\]

The next result is the converse of Theorem 2.5.6 and is useful for proving the non–singularity of certain types of matrices.

THEOREM 2.5.10 Let A be an n × n matrix with the property that the homogeneous system AX = 0 has only the trivial solution. Then A is non–singular. Equivalently, if A is singular, then the homogeneous system AX = 0 has a non–trivial solution.


Proof. If A is n × n and the homogeneous system AX = 0 has only the trivial solution, then it follows that the reduced row–echelon form B of A cannot have zero rows and must therefore be $I_n$. Hence A is non–singular.

COROLLARY 2.5.3 Suppose that A and B are n × n and $AB = I_n$. Then $BA = I_n$.

Proof. Let $AB = I_n$, where A and B are n × n. We first show that B is non–singular. Assume BX = 0. Then A(BX) = A0 = 0, so (AB)X = 0, $I_nX = 0$ and hence X = 0. So B is non–singular by Theorem 2.5.10.

Then from $AB = I_n$ we deduce $(AB)B^{-1} = I_nB^{-1}$ and hence $A = B^{-1}$. The equation $BB^{-1} = I_n$ then gives $BA = I_n$.

Before we give the next example of the above criterion for non–singularity, we introduce an important matrix operation.

DEFINITION 2.5.3 (The transpose of a matrix) Let A be an m × n matrix. Then $A^t$, the transpose of A, is the matrix obtained by interchanging the rows and columns of A. In other words if $A = [a_{ij}]$, then $(A^t)_{ji} = a_{ij}$. Consequently $A^t$ is n × m.

The transpose operation has the following properties:

1. $(A^t)^t = A$;
2. $(A \pm B)^t = A^t \pm B^t$ if A and B are m × n;
3. $(sA)^t = sA^t$ if s is a scalar;
4. $(AB)^t = B^tA^t$ if A is m × n and B is n × p;
5. If A is non–singular, then $A^t$ is also non–singular and $(A^t)^{-1} = (A^{-1})^t$;
6. $X^tX = x_1^2 + \cdots + x_n^2$ if $X = [x_1, \ldots, x_n]^t$ is a column vector.

We prove only the fourth property. First check that both $(AB)^t$ and $B^tA^t$ have the same size (p × m). Moreover, corresponding elements of both matrices are equal. For if $A = [a_{ij}]$ and $B = [b_{jk}]$, we have
\[
\begin{aligned}
\bigl((AB)^t\bigr)_{ki} &= (AB)_{ik}\\
&= \sum_{j=1}^{n} a_{ij} b_{jk}\\
&= \sum_{j=1}^{n} (B^t)_{kj} (A^t)_{ji}\\
&= (B^tA^t)_{ki}.
\end{aligned}
\]

There are two important classes of matrices that can be defined concisely in terms of the transpose operation.

DEFINITION 2.5.4 (Symmetric matrix) A real matrix A is called symmetric if $A^t = A$. In other words A is square (n × n say) and $a_{ji} = a_{ij}$ for all 1 ≤ i ≤ n, 1 ≤ j ≤ n. Hence
\[
A = \begin{bmatrix} a & b\\ b & c \end{bmatrix}
\]
is a general 2 × 2 symmetric matrix.

DEFINITION 2.5.5 (Skew–symmetric matrix) A real matrix A is called skew–symmetric if $A^t = -A$. In other words A is square (n × n say) and $a_{ji} = -a_{ij}$ for all 1 ≤ i ≤ n, 1 ≤ j ≤ n.

REMARK 2.5.8 Taking i = j in the definition of skew–symmetric matrix gives $a_{ii} = -a_{ii}$ and so $a_{ii} = 0$. Hence
\[
A = \begin{bmatrix} 0 & b\\ -b & 0 \end{bmatrix}
\]
is a general 2 × 2 skew–symmetric matrix.

We can now state a second application of the above criterion for non–singularity.

COROLLARY 2.5.4 Let B be an n × n skew–symmetric matrix. Then $A = I_n - B$ is non–singular.

Proof. Let $A = I_n - B$, where $B^t = -B$. By Theorem 2.5.10 it suffices to show that AX = 0 implies X = 0.

We have $(I_n - B)X = 0$, so X = BX. Hence $X^tX = X^tBX$. Taking transposes of both sides gives
\[
\begin{aligned}
(X^tBX)^t &= (X^tX)^t\\
X^tB^t(X^t)^t &= X^t(X^t)^t\\
X^t(-B)X &= X^tX\\
-X^tBX &= X^tX = X^tBX.
\end{aligned}
\]
Hence $X^tX = -X^tX$ and $X^tX = 0$. But if $X = [x_1, \ldots, x_n]^t$, then $X^tX = x_1^2 + \cdots + x_n^2 = 0$ and hence $x_1 = 0, \ldots, x_n = 0$.


2.6 Least squares solution of equations

Suppose AX = B represents a system of linear equations with real coefficients which may be inconsistent, because of the possibility of experimental errors in determining A or B. For example, the system
\[
\begin{aligned}
x &= 1\\
y &= 2\\
x + y &= 3.001
\end{aligned}
\]
is inconsistent.

It can be proved that the associated system $A^tAX = A^tB$ is always consistent and that any solution of this system minimizes the sum $r_1^2 + \cdots + r_m^2$, where $r_1, \ldots, r_m$ (the residuals) are defined by
\[
r_i = a_{i1}x_1 + \cdots + a_{in}x_n - b_i,
\]
for i = 1, . . . , m. The equations represented by $A^tAX = A^tB$ are called the normal equations corresponding to the system AX = B and any solution of the system of normal equations is called a least squares solution of the original system.

EXAMPLE 2.6.1 Find a least squares solution of the above inconsistent system.

Solution. Here $A = \begin{bmatrix} 1 & 0\\ 0 & 1\\ 1 & 1 \end{bmatrix}$, $X = \begin{bmatrix} x\\ y \end{bmatrix}$, $B = \begin{bmatrix} 1\\ 2\\ 3.001 \end{bmatrix}$.

Then $A^tA = \begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0\\ 0 & 1\\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1\\ 1 & 2 \end{bmatrix}$.

Also $A^tB = \begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1\\ 2\\ 3.001 \end{bmatrix} = \begin{bmatrix} 4.001\\ 5.001 \end{bmatrix}$.

So the normal equations are
\[
\begin{aligned}
2x + y &= 4.001\\
x + 2y &= 5.001
\end{aligned}
\]
which have the unique solution
\[
x = \frac{3.001}{3}, \quad y = \frac{6.001}{3}.
\]
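The normal equations of Example 2.6.1 can be formed and solved mechanically. The sketch below assumes matrices stored as lists of rows and uses exact fractions so that 3.001 is represented without rounding; `transpose` and `mat_mul` are hypothetical helpers, and the 2 × 2 system is solved by Cramer's rule.

```python
from fractions import Fraction


def transpose(a):
    """Interchange the rows and columns of a."""
    return [list(col) for col in zip(*a)]


def mat_mul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))] for i in range(len(a))]


A = [[1, 0], [0, 1], [1, 1]]
B = [[Fraction(1)], [Fraction(2)], [Fraction("3.001")]]

At = transpose(A)
N = mat_mul(At, A)      # coefficient matrix of the normal equations
rhs = mat_mul(At, B)    # right-hand side of the normal equations

# Solve the 2 x 2 normal equations by Cramer's rule.
delta = N[0][0] * N[1][1] - N[0][1] * N[1][0]
x = (rhs[0][0] * N[1][1] - N[0][1] * rhs[1][0]) / delta
y = (N[0][0] * rhs[1][0] - rhs[0][0] * N[1][0]) / delta

print(N)      # [[2, 1], [1, 2]]
print(x, y)   # 3001/3000 6001/3000
```

The printed fractions are 3.001/3 and 6.001/3 written with integer numerator and denominator.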


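EXAMPLE 2.6.2 Points $(x_1, y_1), \ldots, (x_n, y_n)$ are experimentally determined and should lie on a line y = mx + c. Find a least squares solution to the problem.

Solution. The points have to satisfy
\[
\begin{aligned}
mx_1 + c &= y_1\\
&\;\;\vdots\\
mx_n + c &= y_n,
\end{aligned}
\]
or AX = B, where
\[
A = \begin{bmatrix} x_1 & 1\\ \vdots & \vdots\\ x_n & 1 \end{bmatrix},\quad
X = \begin{bmatrix} m\\ c \end{bmatrix},\quad
B = \begin{bmatrix} y_1\\ \vdots\\ y_n \end{bmatrix}.
\]
The normal equations are given by $(A^tA)X = A^tB$. Here
\[
A^tA = \begin{bmatrix} x_1 & \cdots & x_n\\ 1 & \cdots & 1 \end{bmatrix}
\begin{bmatrix} x_1 & 1\\ \vdots & \vdots\\ x_n & 1 \end{bmatrix}
= \begin{bmatrix} x_1^2 + \cdots + x_n^2 & x_1 + \cdots + x_n\\ x_1 + \cdots + x_n & n \end{bmatrix}.
\]
Also
\[
A^tB = \begin{bmatrix} x_1 & \cdots & x_n\\ 1 & \cdots & 1 \end{bmatrix}
\begin{bmatrix} y_1\\ \vdots\\ y_n \end{bmatrix}
= \begin{bmatrix} x_1y_1 + \cdots + x_ny_n\\ y_1 + \cdots + y_n \end{bmatrix}.
\]
It is not difficult to prove that
\[
\Delta = \det(A^tA) = \sum_{1 \leq i < j \leq n} (x_i - x_j)^2,
\]
which is positive unless $x_1 = \cdots = x_n$. Hence if not all of $x_1, \ldots, x_n$ are equal, $A^tA$ is non–singular and the normal equations have a unique solution. This can be shown to be
\[
m = \frac{1}{\Delta} \sum_{1 \leq i < j \leq n} (x_i - x_j)(y_i - y_j), \quad
c = \frac{1}{\Delta} \sum_{1 \leq i < j \leq n} (x_i y_j - x_j y_i)(x_i - x_j).
\]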
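The closed formulas for m and c at the end of Example 2.6.2 are sums over pairs of points, which makes them simple to evaluate. This is a minimal sketch, assuming points with integer or `Fraction` coordinates; the function name `fit_line` and the sample points are illustrative choices, not data from the text.

```python
from fractions import Fraction
from itertools import combinations


def fit_line(points):
    """Least squares line y = m x + c through the given (x, y) points.

    Uses the pairwise-sum formulas of Example 2.6.2; requires that the
    x-coordinates are not all equal.
    """
    pairs = list(combinations(points, 2))
    delta = sum((xi - xj) ** 2 for (xi, _), (xj, _) in pairs)
    if delta == 0:
        raise ValueError("all x-coordinates are equal: no unique solution")
    m_num = sum((xi - xj) * (yi - yj) for (xi, yi), (xj, yj) in pairs)
    c_num = sum((xi * yj - xj * yi) * (xi - xj) for (xi, yi), (xj, yj) in pairs)
    return Fraction(m_num) / delta, Fraction(c_num) / delta


# Points lying exactly on y = 2x + 1 are recovered exactly.
print(fit_line([(0, 1), (1, 3), (2, 5)]))   # (Fraction(2, 1), Fraction(1, 1))

# Points not on a line: the result agrees with solving the normal equations
# 5m + 3c = 3, 3m + 3c = 2 by hand.
print(fit_line([(0, 0), (1, 1), (2, 1)]))   # (Fraction(1, 2), Fraction(1, 6))
```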

REMARK 2.6.1 The matrix At A is symmetric.


2.7 PROBLEMS

1. Let $A = \begin{bmatrix} 1 & 4\\ -3 & 1 \end{bmatrix}$. Prove that A is non–singular, find $A^{-1}$ and express A as a product of elementary row matrices.

[Answer: $A^{-1} = \begin{bmatrix} \frac{1}{13} & -\frac{4}{13}\\ \frac{3}{13} & \frac{1}{13} \end{bmatrix}$, $A = E_{21}(-3)E_2(13)E_{12}(4)$ is one such decomposition.]

2. A square matrix $D = [d_{ij}]$ is called diagonal if $d_{ij} = 0$ for i ≠ j. (That is, the off–diagonal elements are zero.) Prove that pre–multiplication of a matrix A by a diagonal matrix D results in a matrix DA whose rows are the rows of A multiplied by the respective diagonal elements of D. State and prove a similar result for post–multiplication by a diagonal matrix.

Let diag $(a_1, \ldots, a_n)$ denote the diagonal matrix whose diagonal elements $d_{ii}$ are $a_1, \ldots, a_n$, respectively. Show that
\[
\mathrm{diag}\,(a_1, \ldots, a_n)\,\mathrm{diag}\,(b_1, \ldots, b_n) = \mathrm{diag}\,(a_1b_1, \ldots, a_nb_n)
\]
and deduce that if $a_1 \cdots a_n \neq 0$, then diag $(a_1, \ldots, a_n)$ is non–singular and
\[
(\mathrm{diag}\,(a_1, \ldots, a_n))^{-1} = \mathrm{diag}\,(a_1^{-1}, \ldots, a_n^{-1}).
\]
Also prove that diag $(a_1, \ldots, a_n)$ is singular if $a_i = 0$ for some i.

3. Let $A = \begin{bmatrix} 0 & 0 & 2\\ 1 & 2 & 6\\ 3 & 7 & 9 \end{bmatrix}$. Prove that A is non–singular, find $A^{-1}$ and express A as a product of elementary row matrices.

[Answers: $A^{-1} = \begin{bmatrix} -12 & 7 & -2\\ \frac{9}{2} & -3 & 1\\ \frac{1}{2} & 0 & 0 \end{bmatrix}$, $A = E_{12}E_{31}(3)E_{23}E_3(2)E_{12}(2)E_{13}(24)E_{23}(-9)$ is one such decomposition.]

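4. Find the rational number k for which the matrix $A = \begin{bmatrix} 1 & 2 & k\\ 3 & -1 & 1\\ 5 & 3 & -5 \end{bmatrix}$ is singular. [Answer: k = −3.]

5. Prove that $A = \begin{bmatrix} 1 & 2\\ -2 & -4 \end{bmatrix}$ is singular and find a non–singular matrix P such that PA has last row zero.

6. If $A = \begin{bmatrix} 1 & 4\\ -3 & 1 \end{bmatrix}$, verify that $A^2 - 2A + 13I_2 = 0$ and deduce that $A^{-1} = -\frac{1}{13}(A - 2I_2)$.

7. Let $A = \begin{bmatrix} 1 & 1 & -1\\ 0 & 0 & 1\\ 2 & 1 & 2 \end{bmatrix}$.

(i) Verify that $A^3 = 3A^2 - 3A + I_3$.
(ii) Express $A^4$ in terms of $A^2$, A and $I_3$ and hence calculate $A^4$ explicitly.
(iii) Use (i) to prove that A is non–singular and find $A^{-1}$ explicitly.

[Answers: (ii) $A^4 = 6A^2 - 8A + 3I_3 = \begin{bmatrix} -11 & -8 & -4\\ 12 & 9 & 4\\ 20 & 16 & 5 \end{bmatrix}$;
(iii) $A^{-1} = A^2 - 3A + 3I_3 = \begin{bmatrix} -1 & -3 & 1\\ 2 & 4 & -1\\ 0 & 1 & 0 \end{bmatrix}$.]

8. (i) Let B be an n × n matrix such that $B^3 = 0$. If $A = I_n - B$, prove that A is non–singular and $A^{-1} = I_n + B + B^2$. Show that the system of linear equations AX = b has the solution
\[
X = b + Bb + B^2b.
\]
(ii) If $B = \begin{bmatrix} 0 & r & s\\ 0 & 0 & t\\ 0 & 0 & 0 \end{bmatrix}$, verify that $B^3 = 0$ and use (i) to determine $(I_3 - B)^{-1}$ explicitly.

[Answer: $\begin{bmatrix} 1 & r & s + rt\\ 0 & 1 & t\\ 0 & 0 & 1 \end{bmatrix}$.]

9. Let A be n × n.

(i) If $A^2 = 0$, prove that A is singular.
(ii) If $A^2 = A$ and $A \neq I_n$, prove that A is singular.

10. Use Question 7 to solve the system of equations
\[
\begin{aligned}
x + y - z &= a\\
z &= b\\
2x + y + 2z &= c
\end{aligned}
\]
where a, b, c are given rationals. Check your answer using the Gauss–Jordan algorithm.

[Answer: x = −a − 3b + c, y = 2a + 4b − c, z = b.]

11. Determine explicitly the following products of 3 × 3 elementary row matrices.

(i) $E_{12}E_{23}$ (ii) $E_1(5)E_{12}$ (iii) $E_{12}(3)E_{21}(-3)$ (iv) $(E_1(100))^{-1}$ (v) $E_{12}^{-1}$ (vi) $(E_{12}(7))^{-1}$ (vii) $(E_{12}(7)E_{31}(1))^{-1}$.

[Answers:
(i) $\begin{bmatrix} 0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{bmatrix}$
(ii) $\begin{bmatrix} 0 & 5 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{bmatrix}$
(iii) $\begin{bmatrix} -8 & 3 & 0\\ -3 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}$
(iv) $\begin{bmatrix} \frac{1}{100} & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}$
(v) $\begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{bmatrix}$
(vi) $\begin{bmatrix} 1 & -7 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}$
(vii) $\begin{bmatrix} 1 & -7 & 0\\ 0 & 1 & 0\\ -1 & 7 & 1 \end{bmatrix}$.]

12. Let A be the following product of 4 × 4 elementary row matrices:
\[
A = E_3(2)E_{14}E_{42}(3).
\]
Find A and $A^{-1}$ explicitly.

[Answers: $A = \begin{bmatrix} 0 & 3 & 0 & 1\\ 0 & 1 & 0 & 0\\ 0 & 0 & 2 & 0\\ 1 & 0 & 0 & 0 \end{bmatrix}$, $A^{-1} = \begin{bmatrix} 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0\\ 0 & 0 & \frac{1}{2} & 0\\ 1 & -3 & 0 & 0 \end{bmatrix}$.]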

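13. Determine which of the following matrices over $\mathbb{Z}_2$ are non–singular and find the inverse, where possible.

(a) $\begin{bmatrix} 1 & 1 & 0 & 1\\ 0 & 0 & 1 & 1\\ 1 & 1 & 1 & 1\\ 1 & 0 & 0 & 1 \end{bmatrix}$
(b) $\begin{bmatrix} 1 & 1 & 0 & 1\\ 0 & 1 & 1 & 1\\ 1 & 0 & 1 & 0\\ 1 & 1 & 0 & 1 \end{bmatrix}$.

[Answer: (a) $\begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & 0 & 0 & 1\\ 1 & 0 & 1 & 0\\ 1 & 1 & 1 & 0 \end{bmatrix}$.]

14. Determine which of the following matrices are non–singular and find the inverse, where possible.

(a) $\begin{bmatrix} 1 & 1 & 1\\ -1 & 1 & 0\\ 2 & 0 & 0 \end{bmatrix}$
(b) $\begin{bmatrix} 2 & 2 & 4\\ 1 & 0 & 1\\ 0 & 1 & 0 \end{bmatrix}$
(c) $\begin{bmatrix} 4 & 6 & -3\\ 0 & 0 & 7\\ 0 & 0 & 5 \end{bmatrix}$
(d) $\begin{bmatrix} 2 & 0 & 0\\ 0 & -5 & 0\\ 0 & 0 & 7 \end{bmatrix}$
(e) $\begin{bmatrix} 1 & 2 & 4 & 6\\ 0 & 1 & 2 & 0\\ 0 & 0 & 1 & 2\\ 0 & 0 & 0 & 2 \end{bmatrix}$
(f) $\begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 5 & 7 & 9 \end{bmatrix}$.

[Answers:
(a) $\begin{bmatrix} 0 & 0 & \frac{1}{2}\\ 0 & 1 & \frac{1}{2}\\ 1 & -1 & -1 \end{bmatrix}$
(b) $\begin{bmatrix} -\frac{1}{2} & 2 & 1\\ 0 & 0 & 1\\ \frac{1}{2} & -1 & -1 \end{bmatrix}$
(d) $\begin{bmatrix} \frac{1}{2} & 0 & 0\\ 0 & -\frac{1}{5} & 0\\ 0 & 0 & \frac{1}{7} \end{bmatrix}$
(e) $\begin{bmatrix} 1 & -2 & 0 & -3\\ 0 & 1 & -2 & 2\\ 0 & 0 & 1 & -1\\ 0 & 0 & 0 & \frac{1}{2} \end{bmatrix}$.]

15. Let A be a non–singular n × n matrix. Prove that $A^t$ is non–singular and that $(A^t)^{-1} = (A^{-1})^t$.

16. Prove that $A = \begin{bmatrix} a & b\\ c & d \end{bmatrix}$ has no inverse if ad − bc = 0.

[Hint: Use the equation $A^2 - (a + d)A + (ad - bc)I_2 = 0$.]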

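17. Prove that the real matrix $A = \begin{bmatrix} 1 & a & b \\ -a & 1 & c \\ -b & -c & 1 \end{bmatrix}$ is non–singular by proving that $A$ is row–equivalent to $I_3$.

18. If $P^{-1}AP = B$, prove that $P^{-1}A^nP = B^n$ for $n \geq 1$.

19. Let $A = \begin{bmatrix} 2/3 & 1/4 \\ 1/3 & 3/4 \end{bmatrix}$, $P = \begin{bmatrix} 1 & 3 \\ -1 & 4 \end{bmatrix}$. Verify that $P^{-1}AP = \begin{bmatrix} 5/12 & 0 \\ 0 & 1 \end{bmatrix}$ and deduce that
\[ A^n = \frac{1}{7} \begin{bmatrix} 3 & 3 \\ 4 & 4 \end{bmatrix} + \frac{1}{7} \left( \frac{5}{12} \right)^n \begin{bmatrix} 4 & -3 \\ -4 & 3 \end{bmatrix}. \]

20. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be a Markov matrix; that is, a matrix whose elements are non–negative and satisfy $a + c = 1 = b + d$. Also let $P = \begin{bmatrix} b & 1 \\ c & -1 \end{bmatrix}$. Prove that if $A \neq I_2$ then

(i) $P$ is non–singular and $P^{-1}AP = \begin{bmatrix} 1 & 0 \\ 0 & a + d - 1 \end{bmatrix}$,

(ii) $A^n \to \dfrac{1}{b + c} \begin{bmatrix} b & b \\ c & c \end{bmatrix}$ as $n \to \infty$, if $A \neq \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.

21. If $X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$ and $Y = \begin{bmatrix} -1 \\ 3 \\ 4 \end{bmatrix}$, find $XX^t$, $X^tX$, $YY^t$, $Y^tY$.

[Answers: $\begin{bmatrix} 5 & 11 & 17 \\ 11 & 25 & 39 \\ 17 & 39 & 61 \end{bmatrix}$, $\begin{bmatrix} 35 & 44 \\ 44 & 56 \end{bmatrix}$, $\begin{bmatrix} 1 & -3 & -4 \\ -3 & 9 & 12 \\ -4 & 12 & 16 \end{bmatrix}$, $26$.]

22. Prove that the system of linear equations
\[ \begin{aligned} x + 2y &= 4 \\ x + y &= 5 \\ 3x + 5y &= 12 \end{aligned} \]
is inconsistent and find a least squares solution of the system.

[Answer: $x = 6$, $y = -7/6$.]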


23. The points $(0, 0)$, $(1, 0)$, $(2, -1)$, $(3, 4)$, $(4, 8)$ are required to lie on a parabola $y = a + bx + cx^2$. Find a least squares solution for $a, b, c$. Also prove that no parabola passes through these points.

[Answer: $a = \frac{1}{5}$, $b = -2$, $c = 1$.]

24. If $A$ is a symmetric $n \times n$ real matrix and $B$ is $n \times m$, prove that $B^tAB$ is a symmetric $m \times m$ matrix.

25. If $A$ is $m \times n$ and $B$ is $n \times m$, prove that $AB$ is singular if $m > n$.

26. Let $A$ and $B$ be $n \times n$. If $A$ or $B$ is singular, prove that $AB$ is also singular.

Chapter 3

SUBSPACES

3.1 Introduction

Throughout this chapter, we will be studying $F^n$, the set of all n–dimensional column vectors with components from a field $F$. We continue our study of matrices by considering an important class of subsets of $F^n$ called subspaces. These arise naturally, for example, when we solve a system of $m$ linear homogeneous equations in $n$ unknowns. We also study the concept of linear dependence of a family of vectors. This was introduced briefly in Chapter 2, Remark 2.5.4. Other topics discussed are the row space, column space and null space of a matrix over $F$, and the dimension of a subspace, particular examples of the latter being the rank and nullity of a matrix.

3.2 Subspaces of $F^n$

DEFINITION 3.2.1 A subset $S$ of $F^n$ is called a subspace of $F^n$ if

1. The zero vector belongs to $S$ (that is, $0 \in S$);
2. If $u \in S$ and $v \in S$, then $u + v \in S$ ($S$ is said to be closed under vector addition);
3. If $u \in S$ and $t \in F$, then $tu \in S$ ($S$ is said to be closed under scalar multiplication).

EXAMPLE 3.2.1 Let $A \in M_{m \times n}(F)$. Then the set of vectors $X \in F^n$ satisfying $AX = 0$ is a subspace of $F^n$ called the null space of $A$ and is denoted here by $N(A)$. (It is sometimes called the solution space of $A$.)

Proof. (1) $A0 = 0$, so $0 \in N(A)$; (2) If $X, Y \in N(A)$, then $AX = 0$ and $AY = 0$, so $A(X + Y) = AX + AY = 0 + 0 = 0$ and so $X + Y \in N(A)$; (3) If $X \in N(A)$ and $t \in F$, then $A(tX) = t(AX) = t0 = 0$, so $tX \in N(A)$.

For example, if $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, then $N(A) = \{0\}$, the set consisting of just the zero vector. If $A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$, then $N(A)$ is the set of all scalar multiples of $[-2, 1]^t$.

EXAMPLE 3.2.2 Let $X_1, \ldots, X_m \in F^n$. Then the set consisting of all linear combinations $x_1X_1 + \cdots + x_mX_m$, where $x_1, \ldots, x_m \in F$, is a subspace of $F^n$. This subspace is called the subspace spanned or generated by $X_1, \ldots, X_m$ and is denoted here by $\langle X_1, \ldots, X_m \rangle$. We also call $X_1, \ldots, X_m$ a spanning family for $S = \langle X_1, \ldots, X_m \rangle$.

Proof. (1) $0 = 0X_1 + \cdots + 0X_m$, so $0 \in \langle X_1, \ldots, X_m \rangle$;

(2) If $X, Y \in \langle X_1, \ldots, X_m \rangle$, then $X = x_1X_1 + \cdots + x_mX_m$ and $Y = y_1X_1 + \cdots + y_mX_m$, so
\[ \begin{aligned} X + Y &= (x_1X_1 + \cdots + x_mX_m) + (y_1X_1 + \cdots + y_mX_m) \\ &= (x_1 + y_1)X_1 + \cdots + (x_m + y_m)X_m \in \langle X_1, \ldots, X_m \rangle. \end{aligned} \]

(3) If $X \in \langle X_1, \ldots, X_m \rangle$ and $t \in F$, then $X = x_1X_1 + \cdots + x_mX_m$ and
\[ \begin{aligned} tX &= t(x_1X_1 + \cdots + x_mX_m) \\ &= (tx_1)X_1 + \cdots + (tx_m)X_m \in \langle X_1, \ldots, X_m \rangle. \end{aligned} \]

For example, if $A \in M_{m \times n}(F)$, the subspace generated by the columns of $A$ is an important subspace of $F^m$ and is called the column space of $A$. The column space of $A$ is denoted here by $C(A)$. Also the subspace generated by the rows of $A$ is a subspace of $F^n$ and is called the row space of $A$ and is denoted by $R(A)$.

EXAMPLE 3.2.3 For example $F^n = \langle E_1, \ldots, E_n \rangle$, where $E_1, \ldots, E_n$ are the n–dimensional unit vectors. For if $X = [x_1, \ldots, x_n]^t \in F^n$, then $X = x_1E_1 + \cdots + x_nE_n$.

EXAMPLE 3.2.4 Find a spanning family for the subspace $S$ of $\mathbb{R}^3$ defined by the equation $2x - 3y + 5z = 0$.

Solution. ($S$ is in fact the null space of $[2, -3, 5]$, so $S$ is indeed a subspace of $\mathbb{R}^3$.)

If $[x, y, z]^t \in S$, then $x = \frac{3}{2}y - \frac{5}{2}z$. Then
\[ \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \frac{3}{2}y - \frac{5}{2}z \\ y \\ z \end{bmatrix} = y \begin{bmatrix} \frac{3}{2} \\ 1 \\ 0 \end{bmatrix} + z \begin{bmatrix} -\frac{5}{2} \\ 0 \\ 1 \end{bmatrix} \]
and conversely. Hence $[\frac{3}{2}, 1, 0]^t$ and $[-\frac{5}{2}, 0, 1]^t$ form a spanning family for $S$.

The following result is easy to prove:

LEMMA 3.2.1 Suppose each of $X_1, \ldots, X_r$ is a linear combination of $Y_1, \ldots, Y_s$. Then any linear combination of $X_1, \ldots, X_r$ is a linear combination of $Y_1, \ldots, Y_s$.

As a corollary we have

THEOREM 3.2.1 Subspaces $\langle X_1, \ldots, X_r \rangle$ and $\langle Y_1, \ldots, Y_s \rangle$ are equal if each of $X_1, \ldots, X_r$ is a linear combination of $Y_1, \ldots, Y_s$ and each of $Y_1, \ldots, Y_s$ is a linear combination of $X_1, \ldots, X_r$.

COROLLARY 3.2.1 Subspaces $\langle X_1, \ldots, X_r, Z_1, \ldots, Z_t \rangle$ and $\langle X_1, \ldots, X_r \rangle$ are equal if each of $Z_1, \ldots, Z_t$ is a linear combination of $X_1, \ldots, X_r$.

EXAMPLE 3.2.5 If $X$ and $Y$ are vectors in $\mathbb{R}^n$, then $\langle X, Y \rangle = \langle X + Y, X - Y \rangle$.

Solution. Each of $X + Y$ and $X - Y$ is a linear combination of $X$ and $Y$. Also
\[ X = \tfrac{1}{2}(X + Y) + \tfrac{1}{2}(X - Y) \quad \text{and} \quad Y = \tfrac{1}{2}(X + Y) - \tfrac{1}{2}(X - Y), \]
so each of $X$ and $Y$ is a linear combination of $X + Y$ and $X - Y$.

There is an important application of Theorem 3.2.1 to row equivalent matrices (see Deﬁnition 1.2.4): THEOREM 3.2.2 If A is row equivalent to B, then R(A) = R(B).

Proof. Suppose that B is obtained from A by a sequence of elementary row operations. Then it is easy to see that each row of B is a linear combination of the rows of A. But A can be obtained from B by a sequence of elementary operations, so each row of A is a linear combination of the rows of B. Hence by Theorem 3.2.1, R(A) = R(B).

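REMARK 3.2.1 If $A$ is row equivalent to $B$, it is not always true that $C(A) = C(B)$.

For example, if $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$, then $B$ is in fact the reduced row–echelon form of $A$. However we see that
\[ C(A) = \left\langle \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\rangle = \left\langle \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\rangle \]
and similarly $C(B) = \left\langle \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\rangle$.

Consequently $C(A) \neq C(B)$, as $\begin{bmatrix} 1 \\ 1 \end{bmatrix} \in C(A)$ but $\begin{bmatrix} 1 \\ 1 \end{bmatrix} \notin C(B)$.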

3.3 Linear dependence

We now recall the definition of linear dependence and independence of a family of vectors in $F^n$ given in Chapter 2.

DEFINITION 3.3.1 Vectors $X_1, \ldots, X_m$ in $F^n$ are said to be linearly dependent if there exist scalars $x_1, \ldots, x_m$, not all zero, such that
\[ x_1X_1 + \cdots + x_mX_m = 0. \]
In other words, $X_1, \ldots, X_m$ are linearly dependent if some $X_i$ is expressible as a linear combination of the remaining vectors.

$X_1, \ldots, X_m$ are called linearly independent if they are not linearly dependent. Hence $X_1, \ldots, X_m$ are linearly independent if and only if the equation $x_1X_1 + \cdots + x_mX_m = 0$ has only the trivial solution $x_1 = 0, \ldots, x_m = 0$.

EXAMPLE 3.3.1 The following three vectors in $\mathbb{R}^3$
\[ X_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad X_2 = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}, \quad X_3 = \begin{bmatrix} -1 \\ 7 \\ 12 \end{bmatrix} \]
are linearly dependent, as $2X_1 + 3X_2 + (-1)X_3 = 0$.

REMARK 3.3.1 If $X_1, \ldots, X_m$ are linearly independent and
\[ x_1X_1 + \cdots + x_mX_m = y_1X_1 + \cdots + y_mX_m, \]
then $x_1 = y_1, \ldots, x_m = y_m$. For the equation can be rewritten as
\[ (x_1 - y_1)X_1 + \cdots + (x_m - y_m)X_m = 0 \]
and so $x_1 - y_1 = 0, \ldots, x_m - y_m = 0$.

THEOREM 3.3.1 A family of $m$ vectors in $F^n$ will be linearly dependent if $m > n$. Equivalently, any linearly independent family of $m$ vectors in $F^n$ must satisfy $m \leq n$.

Proof. The equation $x_1X_1 + \cdots + x_mX_m = 0$ is equivalent to $n$ homogeneous equations in $m$ unknowns. By Theorem 1.5.1, such a system has a non–trivial solution if $m > n$.

The following theorem is an important generalization of the last result and is left as an exercise for the interested student:

THEOREM 3.3.2 A family of $s$ vectors in $\langle X_1, \ldots, X_r \rangle$ will be linearly dependent if $s > r$. Equivalently, a linearly independent family of $s$ vectors in $\langle X_1, \ldots, X_r \rangle$ must have $s \leq r$.

Here is a useful criterion for linear independence which is sometimes called the left–to–right test:

THEOREM 3.3.3 Vectors $X_1, \ldots, X_m$ in $F^n$ are linearly independent if

(a) $X_1 \neq 0$;

(b) For each $k$ with $1 < k \leq m$, $X_k$ is not a linear combination of $X_1, \ldots, X_{k-1}$.

One application of this criterion is the following result:

THEOREM 3.3.4 Every subspace $S$ of $F^n$ can be represented in the form $S = \langle X_1, \ldots, X_m \rangle$, where $m \leq n$.

Proof. If $S = \{0\}$, there is nothing to prove: we take $X_1 = 0$ and $m = 1$. So we assume $S$ contains a non–zero vector $X_1$; then $\langle X_1 \rangle \subseteq S$ as $S$ is a subspace. If $S = \langle X_1 \rangle$, we are finished. If not, $S$ will contain a vector $X_2$, not a linear combination of $X_1$; then $\langle X_1, X_2 \rangle \subseteq S$ as $S$ is a subspace. If $S = \langle X_1, X_2 \rangle$, we are finished. If not, $S$ will contain a vector $X_3$ which is not a linear combination of $X_1$ and $X_2$. This process must eventually stop, for at stage $k$ we have constructed a family of $k$ linearly independent vectors $X_1, \ldots, X_k$, all lying in $F^n$, and hence $k \leq n$.

There is an important relationship between the columns of $A$ and $B$, if $A$ is row–equivalent to $B$.

THEOREM 3.3.5 Suppose that $A$ is row equivalent to $B$ and let $c_1, \ldots, c_r$ be distinct integers satisfying $1 \leq c_i \leq n$. Then

(a) Columns $A_{*c_1}, \ldots, A_{*c_r}$ of $A$ are linearly dependent if and only if the corresponding columns of $B$ are linearly dependent; indeed more is true:
\[ x_1A_{*c_1} + \cdots + x_rA_{*c_r} = 0 \iff x_1B_{*c_1} + \cdots + x_rB_{*c_r} = 0. \]

(b) Columns $A_{*c_1}, \ldots, A_{*c_r}$ of $A$ are linearly independent if and only if the corresponding columns of $B$ are linearly independent.

(c) If $1 \leq c_{r+1} \leq n$ and $c_{r+1}$ is distinct from $c_1, \ldots, c_r$, then
\[ A_{*c_{r+1}} = z_1A_{*c_1} + \cdots + z_rA_{*c_r} \iff B_{*c_{r+1}} = z_1B_{*c_1} + \cdots + z_rB_{*c_r}. \]

Proof. First observe that if $Y = [y_1, \ldots, y_n]^t$ is an n–dimensional column vector and $A$ is $m \times n$, then
\[ AY = y_1A_{*1} + \cdots + y_nA_{*n}. \]
Also $AY = 0 \iff BY = 0$, if $B$ is row equivalent to $A$. Then (a) follows by taking $y_i = x_j$ if $i = c_j$ and $y_i = 0$ otherwise.

(b) is logically equivalent to (a), while (c) follows from (a) as
\[ \begin{aligned} A_{*c_{r+1}} = z_1A_{*c_1} + \cdots + z_rA_{*c_r} &\iff z_1A_{*c_1} + \cdots + z_rA_{*c_r} + (-1)A_{*c_{r+1}} = 0 \\ &\iff z_1B_{*c_1} + \cdots + z_rB_{*c_r} + (-1)B_{*c_{r+1}} = 0 \\ &\iff B_{*c_{r+1}} = z_1B_{*c_1} + \cdots + z_rB_{*c_r}. \end{aligned} \]

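EXAMPLE 3.3.2 The matrix
\[ A = \begin{bmatrix} 1 & 1 & 5 & 1 & 4 \\ 2 & -1 & 1 & 2 & 2 \\ 3 & 0 & 6 & 0 & -3 \end{bmatrix} \]
has reduced row–echelon form equal to
\[ B = \begin{bmatrix} 1 & 0 & 2 & 0 & -1 \\ 0 & 1 & 3 & 0 & 2 \\ 0 & 0 & 0 & 1 & 3 \end{bmatrix}. \]

We notice that $B_{*1}$, $B_{*2}$ and $B_{*4}$ are linearly independent and hence so are $A_{*1}$, $A_{*2}$ and $A_{*4}$. Also
\[ \begin{aligned} B_{*3} &= 2B_{*1} + 3B_{*2} \\ B_{*5} &= (-1)B_{*1} + 2B_{*2} + 3B_{*4}, \end{aligned} \]
so consequently
\[ \begin{aligned} A_{*3} &= 2A_{*1} + 3A_{*2} \\ A_{*5} &= (-1)A_{*1} + 2A_{*2} + 3A_{*4}. \end{aligned} \]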
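The column relations of Example 3.3.2 can be checked mechanically. The following is a minimal Python sketch, assuming exact rational arithmetic via the standard `fractions` module; the helper names `rref` and `col` are illustrative and not part of the text.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, pivot column indices, 0-based)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        # look for a row at or below r with a non-zero entry in column c
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]              # row interchange
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]      # make the leading entry 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:      # clear the rest of column c
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def col(matrix, j):
    """Column j (0-based) of a matrix stored as a list of rows."""
    return [row[j] for row in matrix]

A = [[1, 1, 5, 1, 4],
     [2, -1, 1, 2, 2],
     [3, 0, 6, 0, -3]]
B, pivots = rref(A)

# The same relation holds among the columns of A and of B (Theorem 3.3.5).
for M in (A, B):
    assert col(M, 2) == [2 * a + 3 * b for a, b in zip(col(M, 0), col(M, 1))]
print(pivots)  # 0-based pivot columns, i.e. columns 1, 2 and 4 of the text
```

Exact fractions are used instead of floating point so that the comparison with the hand computation is not affected by rounding.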

3.4 Basis of a subspace

We now come to the important concept of basis of a vector subspace.

DEFINITION 3.4.1 Vectors $X_1, \ldots, X_m$ belonging to a subspace $S$ are said to form a basis of $S$ if

(a) Every vector in $S$ is a linear combination of $X_1, \ldots, X_m$;

(b) $X_1, \ldots, X_m$ are linearly independent.

Note that (a) is equivalent to the statement that $S = \langle X_1, \ldots, X_m \rangle$, as we automatically have $\langle X_1, \ldots, X_m \rangle \subseteq S$. Also, in view of Remark 3.3.1 above, (a) and (b) are equivalent to the statement that every vector in $S$ is uniquely expressible as a linear combination of $X_1, \ldots, X_m$.

EXAMPLE 3.4.1 The unit vectors $E_1, \ldots, E_n$ form a basis for $F^n$.

REMARK 3.4.1 The subspace $\{0\}$, consisting of the zero vector alone, does not have a basis. For every vector in a linearly independent family must necessarily be non–zero. (For example, if $X_1 = 0$, then we have the non–trivial linear relation
\[ 1X_1 + 0X_2 + \cdots + 0X_m = 0 \]
and $X_1, \ldots, X_m$ would be linearly dependent.)

However if we exclude this case, every other subspace of $F^n$ has a basis:

THEOREM 3.4.1 A subspace of the form $\langle X_1, \ldots, X_m \rangle$, where at least one of $X_1, \ldots, X_m$ is non–zero, has a basis $X_{c_1}, \ldots, X_{c_r}$, where $1 \leq c_1 < \cdots < c_r \leq m$.

Proof. (The left–to–right algorithm). Let $c_1$ be the least index $k$ for which $X_k$ is non–zero. If $c_1 = m$ or if all the vectors $X_k$ with $k > c_1$ are linear combinations of $X_{c_1}$, terminate the algorithm and let $r = 1$. Otherwise let $c_2$ be the least integer $k > c_1$ such that $X_k$ is not a linear combination of $X_{c_1}$. If $c_2 = m$ or if all the vectors $X_k$ with $k > c_2$ are linear combinations of $X_{c_1}$ and $X_{c_2}$, terminate the algorithm and let $r = 2$. Eventually the algorithm will terminate at the r–th stage, either because $c_r = m$, or because all vectors $X_k$ with $k > c_r$ are linear combinations of $X_{c_1}, \ldots, X_{c_r}$.

Then it is clear by the construction of $X_{c_1}, \ldots, X_{c_r}$, using Corollary 3.2.1, that

(a) $\langle X_{c_1}, \ldots, X_{c_r} \rangle = \langle X_1, \ldots, X_m \rangle$;

(b) the vectors $X_{c_1}, \ldots, X_{c_r}$ are linearly independent by the left–to–right test.

Consequently $X_{c_1}, \ldots, X_{c_r}$ form a basis (called the left–to–right basis) for the subspace $\langle X_1, \ldots, X_m \rangle$.

EXAMPLE 3.4.2 Let $X$ and $Y$ be linearly independent vectors in $\mathbb{R}^n$. Then the subspace $\langle 0, 2X, X, -Y, X + Y \rangle$ has left–to–right basis consisting of $2X, -Y$.

A subspace $S$ will in general have more than one basis. For example, any permutation of the vectors in a basis will yield another basis. Given one particular basis, one can determine all bases for $S$ using a simple formula. This is left as one of the problems at the end of this chapter.

We settle for the following important fact about bases:

THEOREM 3.4.2 Any two bases for a subspace $S$ must contain the same number of elements.

Proof. For if $X_1, \ldots, X_r$ and $Y_1, \ldots, Y_s$ are bases for $S$, then $Y_1, \ldots, Y_s$ form a linearly independent family in $S = \langle X_1, \ldots, X_r \rangle$ and hence $s \leq r$ by Theorem 3.3.2. Similarly $r \leq s$ and hence $r = s$.

DEFINITION 3.4.2 This number is called the dimension of $S$ and is written $\dim S$. Naturally we define $\dim \{0\} = 0$.

It follows from Theorem 3.3.1 that for any subspace $S$ of $F^n$, we must have $\dim S \leq n$.

EXAMPLE 3.4.3 If $E_1, \ldots, E_n$ denote the n–dimensional unit vectors in $F^n$, then $\dim \langle E_1, \ldots, E_i \rangle = i$ for $1 \leq i \leq n$.

The following result gives a useful way of exhibiting a basis.

THEOREM 3.4.3 A linearly independent family of $m$ vectors in a subspace $S$, with $\dim S = m$, must be a basis for $S$.

Proof. Let $X_1, \ldots, X_m$ be a linearly independent family of vectors in a subspace $S$, where $\dim S = m$. We have to show that every vector $X \in S$ is expressible as a linear combination of $X_1, \ldots, X_m$. We consider the following family of vectors in $S$: $X_1, \ldots, X_m, X$. This family contains $m + 1$ elements and is consequently linearly dependent by Theorem 3.3.2. Hence we have
\[ x_1X_1 + \cdots + x_mX_m + x_{m+1}X = 0, \tag{3.1} \]
where not all of $x_1, \ldots, x_{m+1}$ are zero. Now if $x_{m+1} = 0$, we would have
\[ x_1X_1 + \cdots + x_mX_m = 0, \]
with not all of $x_1, \ldots, x_m$ zero, contradicting the assumption that $X_1, \ldots, X_m$ are linearly independent. Hence $x_{m+1} \neq 0$ and we can use equation 3.1 to express $X$ as a linear combination of $X_1, \ldots, X_m$:
\[ X = \frac{-x_1}{x_{m+1}}X_1 + \cdots + \frac{-x_m}{x_{m+1}}X_m. \]
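The left–to–right algorithm of Theorem 3.4.1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the text's own procedure: vectors are lists of rationals, the test "is $X_k$ a linear combination of the vectors kept so far" is replaced by an equivalent rank comparison, and the names `rank` and `left_to_right_basis` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors (used as rows), by exact Gaussian elimination."""
    m = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def left_to_right_basis(vectors):
    """0-based indices c1 < ... < cr of the left-to-right basis of the span."""
    chosen = []
    for k, v in enumerate(vectors):
        # v is kept exactly when adding it raises the rank, that is, when it
        # is not a linear combination of the vectors already chosen
        if rank([vectors[i] for i in chosen] + [v]) > len(chosen):
            chosen.append(k)
    return chosen

# Example 3.4.2 with the (assumed) concrete choice X = [1,0,0], Y = [0,1,0]:
X, Y = [1, 0, 0], [0, 1, 0]
family = [[0, 0, 0],
          [2 * x for x in X],
          X,
          [-y for y in Y],
          [x + y for x, y in zip(X, Y)]]
print(left_to_right_basis(family))  # positions of 2X and -Y in the family
```

Recomputing the rank at every step is wasteful but keeps the sketch close to the statement of the algorithm; a single row reduction of the matrix with columns $X_1, \ldots, X_m$ would give the same indices as its pivot columns.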

3.5 Rank and nullity of a matrix

We can now define three important integers associated with a matrix.

DEFINITION 3.5.1 Let $A \in M_{m \times n}(F)$. Then

(a) column rank $A = \dim C(A)$;

(b) row rank $A = \dim R(A)$;

(c) nullity $A = \dim N(A)$.

We will now see that the reduced row–echelon form $B$ of a matrix $A$ allows us to exhibit bases for the row space, column space and null space of $A$. Moreover, an examination of the number of elements in each of these bases will immediately result in the following theorem:

THEOREM 3.5.1 Let $A \in M_{m \times n}(F)$. Then

(a) column rank $A$ = row rank $A$;

(b) column rank $A$ + nullity $A = n$.

Finding a basis for $R(A)$: The $r$ non–zero rows of $B$ form a basis for $R(A)$ and hence row rank $A = r$.

For we have seen earlier that $R(A) = R(B)$. Also
\[ \begin{aligned} R(B) &= \langle B_{1*}, \ldots, B_{m*} \rangle \\ &= \langle B_{1*}, \ldots, B_{r*}, 0, \ldots, 0 \rangle \\ &= \langle B_{1*}, \ldots, B_{r*} \rangle. \end{aligned} \]

The linear independence of the non–zero rows of $B$ is proved as follows: Let the leading entries of rows $1, \ldots, r$ of $B$ occur in columns $c_1, \ldots, c_r$. Suppose that
\[ x_1B_{1*} + \cdots + x_rB_{r*} = 0. \]
Then equating components $c_1, \ldots, c_r$ of both sides of the last equation gives $x_1 = 0, \ldots, x_r = 0$, in view of the fact that $B$ is in reduced row–echelon form.

Finding a basis for $C(A)$: The $r$ columns $A_{*c_1}, \ldots, A_{*c_r}$ form a basis for $C(A)$ and hence column rank $A = r$. For it is clear that columns $c_1, \ldots, c_r$ of $B$ form the left–to–right basis for $C(B)$ and consequently from parts (b) and (c) of Theorem 3.3.5, it follows that columns $c_1, \ldots, c_r$ of $A$ form the left–to–right basis for $C(A)$.

Finding a basis for $N(A)$: For notational simplicity, let us suppose that $c_1 = 1, \ldots, c_r = r$. Then $B$ has the form
\[ B = \begin{bmatrix} 1 & 0 & \cdots & 0 & b_{1\,r+1} & \cdots & b_{1n} \\ 0 & 1 & \cdots & 0 & b_{2\,r+1} & \cdots & b_{2n} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 & b_{r\,r+1} & \cdots & b_{rn} \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \end{bmatrix}. \]

Then $N(B)$ and hence $N(A)$ are determined by the equations
\[ \begin{aligned} x_1 &= (-b_{1\,r+1})x_{r+1} + \cdots + (-b_{1n})x_n \\ &\;\;\vdots \\ x_r &= (-b_{r\,r+1})x_{r+1} + \cdots + (-b_{rn})x_n, \end{aligned} \]
where $x_{r+1}, \ldots, x_n$ are arbitrary elements of $F$. Hence the general vector $X$ in $N(A)$ is given by
\[ \begin{bmatrix} x_1 \\ \vdots \\ x_r \\ x_{r+1} \\ \vdots \\ x_n \end{bmatrix} = x_{r+1} \begin{bmatrix} -b_{1\,r+1} \\ \vdots \\ -b_{r\,r+1} \\ 1 \\ \vdots \\ 0 \end{bmatrix} + \cdots + x_n \begin{bmatrix} -b_{1n} \\ \vdots \\ -b_{rn} \\ 0 \\ \vdots \\ 1 \end{bmatrix} \tag{3.2} \]
\[ = x_{r+1}X_1 + \cdots + x_nX_{n-r}. \]

Hence $N(A)$ is spanned by $X_1, \ldots, X_{n-r}$, as $x_{r+1}, \ldots, x_n$ are arbitrary. Also $X_1, \ldots, X_{n-r}$ are linearly independent. For equating the right hand side of equation 3.2 to $0$ and then equating components $r + 1, \ldots, n$ of both sides of the resulting equation gives $x_{r+1} = 0, \ldots, x_n = 0$. Consequently $X_1, \ldots, X_{n-r}$ form a basis for $N(A)$.

Theorem 3.5.1 now follows. For we have
\[ \begin{aligned} \text{row rank } A &= \dim R(A) = r \\ \text{column rank } A &= \dim C(A) = r. \end{aligned} \]
Hence row rank $A$ = column rank $A$.

Also
\[ \text{column rank } A + \text{nullity } A = r + \dim N(A) = r + (n - r) = n. \]

DEFINITION 3.5.2 The common value of column rank $A$ and row rank $A$ is called the rank of $A$ and is denoted by rank $A$.

EXAMPLE 3.5.1 Given that the reduced row–echelon form of
\[ A = \begin{bmatrix} 1 & 1 & 5 & 1 & 4 \\ 2 & -1 & 1 & 2 & 2 \\ 3 & 0 & 6 & 0 & -3 \end{bmatrix} \]
is equal to
\[ B = \begin{bmatrix} 1 & 0 & 2 & 0 & -1 \\ 0 & 1 & 3 & 0 & 2 \\ 0 & 0 & 0 & 1 & 3 \end{bmatrix}, \]
find bases for $R(A)$, $C(A)$ and $N(A)$.

Solution. $[1, 0, 2, 0, -1]$, $[0, 1, 3, 0, 2]$ and $[0, 0, 0, 1, 3]$ form a basis for $R(A)$. Also
\[ A_{*1} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad A_{*2} = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad A_{*4} = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} \]
form a basis for $C(A)$.

Finally $N(A)$ is given by
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} -2x_3 + x_5 \\ -3x_3 - 2x_5 \\ x_3 \\ -3x_5 \\ x_5 \end{bmatrix} = x_3 \begin{bmatrix} -2 \\ -3 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_5 \begin{bmatrix} 1 \\ -2 \\ 0 \\ -3 \\ 1 \end{bmatrix} = x_3X_1 + x_5X_2, \]
where $x_3$ and $x_5$ are arbitrary. Hence $X_1$ and $X_2$ form a basis for $N(A)$. Here rank $A = 3$ and nullity $A = 2$.

EXAMPLE 3.5.2 Let $A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$. Then $B = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}$ is the reduced row–echelon form of $A$.

Hence $[1, 2]$ is a basis for $R(A)$ and $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ is a basis for $C(A)$. Also $N(A)$ is given by the equation $x_1 = -2x_2$, where $x_2$ is arbitrary. Then
\[ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -2x_2 \\ x_2 \end{bmatrix} = x_2 \begin{bmatrix} -2 \\ 1 \end{bmatrix} \]
and hence $\begin{bmatrix} -2 \\ 1 \end{bmatrix}$ is a basis for $N(A)$. Here rank $A = 1$ and nullity $A = 1$.

EXAMPLE 3.5.3 Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$. Then $B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ is the reduced row–echelon form of $A$. Hence $[1, 0]$, $[0, 1]$ form a basis for $R(A)$ while $[1, 3]^t$, $[2, 4]^t$ form a basis for $C(A)$. Also $N(A) = \{0\}$. Here rank $A = 2$ and nullity $A = 0$.

We conclude this introduction to vector spaces with a result of great theoretical importance.

THEOREM 3.5.2 Every linearly independent family of vectors in a subspace $S$ can be extended to a basis of $S$.

Proof. Suppose $S$ has basis $X_1, \ldots, X_m$ and that $Y_1, \ldots, Y_r$ is a linearly independent family of vectors in $S$. Then
\[ S = \langle X_1, \ldots, X_m \rangle = \langle Y_1, \ldots, Y_r, X_1, \ldots, X_m \rangle, \]
as each of $Y_1, \ldots, Y_r$ is a linear combination of $X_1, \ldots, X_m$. Then applying the left–to–right algorithm to the second spanning family for $S$ will yield a basis for $S$ which includes $Y_1, \ldots, Y_r$.
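The recipe of this section (non–zero rows of $B$ for $R(A)$, pivot columns of $A$ for $C(A)$, one vector per non–pivot column for $N(A)$) can be sketched in Python. This is a minimal illustration assuming rational entries and exact arithmetic; the function names `rref` and `bases` are hypothetical, and unlike the notational simplification in the text the pivot columns need not be the first $r$ columns.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, pivot column indices, 0-based)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def bases(A):
    """Bases for the row space, column space and null space of A."""
    B, pivots = rref(A)
    n = len(A[0])
    row_basis = B[:len(pivots)]                             # non-zero rows of B
    col_basis = [[row[c] for row in A] for c in pivots]     # pivot columns of A
    null_basis = []
    for j in (j for j in range(n) if j not in pivots):      # free variables
        x = [Fraction(0)] * n
        x[j] = Fraction(1)
        for i, c in enumerate(pivots):
            x[c] = -B[i][j]          # x_c = -(b_ij) x_j, read off row i of B
        null_basis.append(x)
    return row_basis, col_basis, null_basis

A = [[1, 1, 5, 1, 4],
     [2, -1, 1, 2, 2],
     [3, 0, 6, 0, -3]]
row_basis, col_basis, null_basis = bases(A)
rank_A, nullity_A = len(row_basis), len(null_basis)
print(rank_A, nullity_A)  # these add up to the number of columns of A
```

For the matrix of Example 3.5.1 the three lists agree with the bases found by hand in the solution.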

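3.6 PROBLEMS

1. Which of the following subsets of $\mathbb{R}^2$ are subspaces?

(a) $[x, y]$ satisfying $x = 2y$;
(b) $[x, y]$ satisfying $x = 2y$ and $2x = y$;
(c) $[x, y]$ satisfying $x = 2y + 1$;
(d) $[x, y]$ satisfying $xy = 0$;
(e) $[x, y]$ satisfying $x \geq 0$ and $y \geq 0$.

[Answer: (a) and (b).]

2. If $X, Y, Z$ are vectors in $\mathbb{R}^n$, prove that
\[ \langle X, Y, Z \rangle = \langle X + Y, X + Z, Y + Z \rangle. \]

3. Determine if $X_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 2 \end{bmatrix}$, $X_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 2 \end{bmatrix}$ and $X_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 3 \end{bmatrix}$ are linearly independent in $\mathbb{R}^4$.

4. For which real numbers $\lambda$ are the following vectors linearly independent in $\mathbb{R}^3$?
\[ X_1 = \begin{bmatrix} \lambda \\ -1 \\ -1 \end{bmatrix}, \quad X_2 = \begin{bmatrix} -1 \\ \lambda \\ -1 \end{bmatrix}, \quad X_3 = \begin{bmatrix} -1 \\ -1 \\ \lambda \end{bmatrix}. \]

5. Find bases for the row, column and null spaces of the following matrix over $\mathbb{Q}$:
\[ A = \begin{bmatrix} 1 & 1 & 2 & 0 & 1 \\ 2 & 2 & 5 & 0 & 3 \\ 0 & 0 & 0 & 1 & 3 \\ 8 & 11 & 19 & 0 & 11 \end{bmatrix}. \]

6. Find bases for the row, column and null spaces of the following matrix over $\mathbb{Z}_2$:
\[ A = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix}. \]

7. Find bases for the row, column and null spaces of the following matrix over $\mathbb{Z}_5$:
\[ A = \begin{bmatrix} 1 & 1 & 2 & 0 & 1 & 3 \\ 2 & 1 & 4 & 0 & 3 & 2 \\ 0 & 0 & 0 & 1 & 3 & 0 \\ 3 & 0 & 2 & 4 & 3 & 2 \end{bmatrix}. \]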

8. Find bases for the row, column and null spaces of the matrix $A$ defined in section 1.6, Problem 17. (Note: In this question, $F$ is a field of four elements.)

9. If $X_1, \ldots, X_m$ form a basis for a subspace $S$, prove that
\[ X_1,\; X_1 + X_2,\; \ldots,\; X_1 + \cdots + X_m \]
also form a basis for $S$.

10. Let $A = \begin{bmatrix} a & b & c \\ 1 & 1 & 1 \end{bmatrix}$. Find conditions on $a, b, c$ such that (a) rank $A = 1$; (b) rank $A = 2$.

[Answer: (a) $a = b = c$; (b) at least two of $a, b, c$ are distinct.]

11. Let $S$ be a subspace of $F^n$ with $\dim S = m$. If $X_1, \ldots, X_m$ are vectors in $S$ with the property that $S = \langle X_1, \ldots, X_m \rangle$, prove that $X_1, \ldots, X_m$ form a basis for $S$.

12. Find a basis for the subspace $S$ of $\mathbb{R}^3$ defined by the equation
\[ x + 2y + 3z = 0. \]
Verify that $Y_1 = [-1, -1, 1]^t \in S$ and find a basis for $S$ which includes $Y_1$.

13. Let $X_1, \ldots, X_m$ be vectors in $F^n$. If $X_i = X_j$, where $i < j$, prove that $X_1, \ldots, X_m$ are linearly dependent.

14. Let $X_1, \ldots, X_{m+1}$ be vectors in $F^n$. Prove that
\[ \dim \langle X_1, \ldots, X_{m+1} \rangle = \dim \langle X_1, \ldots, X_m \rangle \]
if $X_{m+1}$ is a linear combination of $X_1, \ldots, X_m$, but
\[ \dim \langle X_1, \ldots, X_{m+1} \rangle = \dim \langle X_1, \ldots, X_m \rangle + 1 \]
if $X_{m+1}$ is not a linear combination of $X_1, \ldots, X_m$.

Deduce that the system of linear equations $AX = B$ is consistent if and only if
\[ \text{rank } [A|B] = \text{rank } A. \]

15. Let $a_1, \ldots, a_n$ be elements of $F$, not all zero. Prove that the set of vectors $[x_1, \ldots, x_n]^t$ where $x_1, \ldots, x_n$ satisfy
\[ a_1x_1 + \cdots + a_nx_n = 0 \]
is a subspace of $F^n$ with dimension equal to $n - 1$.

16. Prove Lemma 3.2.1, Theorem 3.2.1, Corollary 3.2.1 and Theorem 3.3.2.

17. Let $R$ and $S$ be subspaces of $F^n$, with $R \subseteq S$. Prove that
\[ \dim R \leq \dim S \]
and that equality implies $R = S$. (This is a very useful way of proving equality of subspaces.)

18. Let $R$ and $S$ be subspaces of $F^n$. If $R \cup S$ is a subspace of $F^n$, prove that $R \subseteq S$ or $S \subseteq R$.

19. Let $X_1, \ldots, X_r$ be a basis for a subspace $S$. Prove that all bases for $S$ are given by the family $Y_1, \ldots, Y_r$, where
\[ Y_i = \sum_{j=1}^{r} a_{ij}X_j, \]
and where $A = [a_{ij}] \in M_{r \times r}(F)$ is a non–singular matrix.

Chapter 4

DETERMINANTS

DEFINITION 4.0.1 If $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$, we define the determinant of $A$ (also denoted by $\det A$) to be the scalar
\[ \det A = a_{11}a_{22} - a_{12}a_{21}. \]
The notation $\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$ is also used for the determinant of $A$.

If $A$ is a real matrix, there is a geometrical interpretation of $\det A$. If $P = (x_1, y_1)$ and $Q = (x_2, y_2)$ are points in the plane, forming a triangle with the origin $O = (0, 0)$, then apart from sign, $\frac{1}{2} \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}$ is the area of the triangle $OPQ$. For, using polar coordinates, let $x_1 = r_1 \cos \theta_1$ and $y_1 = r_1 \sin \theta_1$, where $r_1 = OP$ and $\theta_1$ is the angle made by the ray $OP$ with the positive x–axis. Then triangle $OPQ$ has area $\frac{1}{2} OP \cdot OQ \sin \alpha$, where $\alpha = \angle POQ$. If triangle $OPQ$ has anti–clockwise orientation, then the ray $OQ$ makes angle $\theta_2 = \theta_1 + \alpha$ with the positive x–axis. (See Figure 4.1.)

[Figure 4.1: Area of triangle $OPQ$.]

Also $x_2 = r_2 \cos \theta_2$ and $y_2 = r_2 \sin \theta_2$, where $r_2 = OQ$. Hence
\[ \begin{aligned} \text{Area } OPQ &= \tfrac{1}{2} OP \cdot OQ \sin \alpha \\ &= \tfrac{1}{2} OP \cdot OQ \sin(\theta_2 - \theta_1) \\ &= \tfrac{1}{2} OP \cdot OQ (\sin \theta_2 \cos \theta_1 - \cos \theta_2 \sin \theta_1) \\ &= \tfrac{1}{2} (OQ \sin \theta_2 \cdot OP \cos \theta_1 - OQ \cos \theta_2 \cdot OP \sin \theta_1) \\ &= \tfrac{1}{2} (y_2x_1 - x_2y_1) \\ &= \tfrac{1}{2} \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}. \end{aligned} \]

Similarly, if triangle $OPQ$ has clockwise orientation, then its area equals $-\frac{1}{2} \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}$.

For a general triangle $P_1P_2P_3$, with $P_i = (x_i, y_i)$, $i = 1, 2, 3$, we can take $P_1$ as the origin. Then the above formula gives
\[ \frac{1}{2} \begin{vmatrix} x_2 - x_1 & y_2 - y_1 \\ x_3 - x_1 & y_3 - y_1 \end{vmatrix} \quad \text{or} \quad -\frac{1}{2} \begin{vmatrix} x_2 - x_1 & y_2 - y_1 \\ x_3 - x_1 & y_3 - y_1 \end{vmatrix}, \]
according as vertices $P_1P_2P_3$ are anti–clockwise or clockwise oriented.

We now give a recursive definition of the determinant of an $n \times n$ matrix $A = [a_{ij}]$, $n \geq 3$.

DEFINITION 4.0.2 (Minor) Let $M_{ij}(A)$ (or simply $M_{ij}$ if there is no ambiguity) denote the determinant of the $(n - 1) \times (n - 1)$ submatrix of $A$ formed by deleting the i–th row and j–th column of $A$. ($M_{ij}(A)$ is called the $(i, j)$ minor of $A$.)

Assume that the determinant function has been defined for matrices of size $(n - 1) \times (n - 1)$. Then $\det A$ is defined by the so–called first–row Laplace expansion:
\[ \begin{aligned} \det A &= a_{11}M_{11}(A) - a_{12}M_{12}(A) + \cdots + (-1)^{1+n}a_{1n}M_{1n}(A) \\ &= \sum_{j=1}^{n} (-1)^{1+j}a_{1j}M_{1j}(A). \end{aligned} \]

For example, if $A = [a_{ij}]$ is a $3 \times 3$ matrix, the Laplace expansion gives
\[ \begin{aligned} \det A &= a_{11}M_{11}(A) - a_{12}M_{12}(A) + a_{13}M_{13}(A) \\ &= a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \\ &= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) \\ &= a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}. \end{aligned} \]

(The recursive definition also works for $2 \times 2$ determinants, if we define the determinant of a $1 \times 1$ matrix $[t]$ to be the scalar $t$:
\[ \det A = a_{11}M_{11}(A) - a_{12}M_{12}(A) = a_{11}a_{22} - a_{12}a_{21}.) \]

EXAMPLE 4.0.1 If $P_1P_2P_3$ is a triangle with $P_i = (x_i, y_i)$, $i = 1, 2, 3$, then the area of triangle $P_1P_2P_3$ is
\[ \frac{1}{2} \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix} \quad \text{or} \quad -\frac{1}{2} \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}, \]
according as the orientation of $P_1P_2P_3$ is anti–clockwise or clockwise.

For from the definition of $3 \times 3$ determinants, we have
\[ \begin{aligned} \frac{1}{2} \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix} &= \frac{1}{2} \left( x_1 \begin{vmatrix} y_2 & 1 \\ y_3 & 1 \end{vmatrix} - y_1 \begin{vmatrix} x_2 & 1 \\ x_3 & 1 \end{vmatrix} + \begin{vmatrix} x_2 & y_2 \\ x_3 & y_3 \end{vmatrix} \right) \\ &= \frac{1}{2} \begin{vmatrix} x_2 - x_1 & y_2 - y_1 \\ x_3 - x_1 & y_3 - y_1 \end{vmatrix}. \end{aligned} \]

One property of determinants that follows immediately from the definition is the following:

THEOREM 4.0.1 If a row of a matrix is zero, then the value of the determinant is zero.
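The recursive definition translates almost directly into code. The following Python sketch is illustrative only: the names `submatrix`, `det` and `triangle_area` are assumptions rather than notation from the text, indices are 0-based (so the sign $(-1)^{1+j}$ becomes `(-1) ** j`), and the cost grows like $n!$, so it is suitable for small matrices only.

```python
def submatrix(a, i, j):
    """Matrix obtained from a by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Determinant of a square matrix by first-row Laplace expansion."""
    if len(a) == 1:
        return a[0][0]          # determinant of the 1 x 1 matrix [t] is t
    return sum((-1) ** j * a[0][j] * det(submatrix(a, 0, j))
               for j in range(len(a)))

def triangle_area(p1, p2, p3):
    """Area of the triangle P1 P2 P3 as in Example 4.0.1 (sign removed)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return abs(det([[x1, y1, 1], [x2, y2, 1], [x3, y3, 1]])) / 2

print(det([[1, 2], [3, 4]]))                      # 1*4 - 2*3
print(triangle_area((0, 0), (1, 0), (0, 1)))      # right triangle with legs 1
```

Taking the absolute value removes the dependence on orientation; the sign of the determinant itself tells whether the vertices are listed anti–clockwise (positive) or clockwise (negative).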

(The corresponding result for columns also holds, but here a simple proof by induction is needed.)

One of the simplest determinants to evaluate is that of a lower triangular matrix.

THEOREM 4.0.2 Let $A = [a_{ij}]$, where $a_{ij} = 0$ if $i < j$. Then
\[ \det A = a_{11}a_{22} \cdots a_{nn}. \tag{4.1} \]

An important special case is when $A$ is a diagonal matrix. If $A = \text{diag}(a_1, \ldots, a_n)$ then $\det A = a_1 \cdots a_n$. In particular, for a scalar matrix $tI_n$, we have $\det(tI_n) = t^n$.

Proof. Use induction on the size $n$ of the matrix. The result is true for $n = 2$. Now let $n > 2$ and assume the result true for matrices of size $n - 1$. If $A$ is $n \times n$, then expanding $\det A$ along row 1 gives
\[ \det A = a_{11} \begin{vmatrix} a_{22} & 0 & \cdots & 0 \\ a_{32} & a_{33} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ a_{n2} & a_{n3} & \cdots & a_{nn} \end{vmatrix} = a_{11}(a_{22} \cdots a_{nn}) \]
by the induction hypothesis.

If $A$ is upper triangular, equation 4.1 remains true and the proof is again an exercise in induction, with the slight difference that the column version of Theorem 4.0.1 is needed.

REMARK 4.0.1 It can be shown that the expanded form of the determinant of an $n \times n$ matrix $A$ consists of $n!$ signed products $\pm a_{1i_1}a_{2i_2} \cdots a_{ni_n}$, where $(i_1, i_2, \ldots, i_n)$ is a permutation of $(1, 2, \ldots, n)$, the sign being $1$ or $-1$, according as the number of inversions of $(i_1, i_2, \ldots, i_n)$ is even or odd. An inversion occurs when $i_r > i_s$ but $r < s$. (The proof is not easy and is omitted.)

The definition of the determinant of an $n \times n$ matrix was given in terms of the first–row expansion. The next theorem says that we can expand the determinant along any row or column. (The proof is not easy and is omitted.)

THEOREM 4.0.3
\[ \det A = \sum_{j=1}^{n} (-1)^{i+j}a_{ij}M_{ij}(A) \]
for $i = 1, \ldots, n$ (the so–called i–th row expansion) and
\[ \det A = \sum_{i=1}^{n} (-1)^{i+j}a_{ij}M_{ij}(A) \]
for $j = 1, \ldots, n$ (the so–called j–th column expansion).

REMARK 4.0.2 The expression $(-1)^{i+j}$ obeys the chess–board pattern of signs:
\[ \begin{bmatrix} + & - & + & \cdots \\ - & + & - & \cdots \\ + & - & + & \cdots \\ \vdots & & & \end{bmatrix}. \]

The following theorems can be proved by straightforward inductions on the size of the matrix:

THEOREM 4.0.4 A matrix and its transpose have equal determinants; that is
\[ \det A^t = \det A. \]

THEOREM 4.0.5 If two rows of a matrix are equal, the determinant is zero. Similarly for columns.

THEOREM 4.0.6 If two rows of a matrix are interchanged, the determinant changes sign.

EXAMPLE 4.0.2 If $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ are distinct points, then the line through $P_1$ and $P_2$ has equation
\[ \begin{vmatrix} x & y & 1 \\ x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{vmatrix} = 0. \]

For, expanding the determinant along row 1, the equation becomes
\[ ax + by + c = 0, \]
where
\[ a = \begin{vmatrix} y_1 & 1 \\ y_2 & 1 \end{vmatrix} = y_1 - y_2 \quad \text{and} \quad b = -\begin{vmatrix} x_1 & 1 \\ x_2 & 1 \end{vmatrix} = x_2 - x_1. \]

This represents a line, as not both $a$ and $b$ can be zero. Also this line passes through $P_i$, $i = 1, 2$. For the determinant has its first and i–th rows equal if $x = x_i$ and $y = y_i$ and is consequently zero.

There is a corresponding formula in three–dimensional geometry. If $P_1, P_2, P_3$ are non–collinear points in three–dimensional space, with $P_i = (x_i, y_i, z_i)$, $i = 1, 2, 3$, then the equation
\[ \begin{vmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix} = 0 \]
represents the plane through $P_1, P_2, P_3$. For, expanding the determinant along row 1, the equation becomes $ax + by + cz + d = 0$, where
\[ a = \begin{vmatrix} y_1 & z_1 & 1 \\ y_2 & z_2 & 1 \\ y_3 & z_3 & 1 \end{vmatrix}, \quad b = -\begin{vmatrix} x_1 & z_1 & 1 \\ x_2 & z_2 & 1 \\ x_3 & z_3 & 1 \end{vmatrix}, \quad c = \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}. \]

As we shall see in chapter 6, this represents a plane if at least one of $a, b, c$ is non–zero. However, apart from sign and a factor $\frac{1}{2}$, the determinant expressions for $a, b, c$ give the values of the areas of projections of triangle $P_1P_2P_3$ on the $(y, z)$, $(x, z)$ and $(x, y)$ planes, respectively. Geometrically, it is then clear that at least one of $a, b, c$ is non–zero. It is also possible to give an algebraic proof of this fact.

Finally, the plane passes through $P_i$, $i = 1, 2, 3$, as the determinant has its first and i–th rows equal if $x = x_i$, $y = y_i$, $z = z_i$ and is consequently zero.

We now work towards proving that a matrix is non–singular if its determinant is non–zero.

DEFINITION 4.0.3 (Cofactor) The $(i, j)$ cofactor of $A$, denoted by $C_{ij}(A)$ (or $C_{ij}$ if there is no ambiguity), is defined by
\[ C_{ij}(A) = (-1)^{i+j}M_{ij}(A). \]

REMARK 4.0.3 It is important to notice that $C_{ij}(A)$, like $M_{ij}(A)$, does not depend on $a_{ij}$. Use will be made of this observation presently.

In terms of the cofactor notation, Theorem 4.0.3 takes the form

THEOREM 4.0.7
\[ \det A = \sum_{j=1}^{n} a_{ij}C_{ij}(A) \]
for $i = 1, \ldots, n$ and
\[ \det A = \sum_{i=1}^{n} a_{ij}C_{ij}(A) \]
for $j = 1, \ldots, n$.

Another result involving cofactors is

THEOREM 4.0.8 Let $A$ be an $n \times n$ matrix. Then

(a) $\displaystyle \sum_{j=1}^{n} a_{ij}C_{kj}(A) = 0$ if $i \neq k$.

Also

(b) $\displaystyle \sum_{i=1}^{n} a_{ij}C_{ik}(A) = 0$ if $j \neq k$.

Proof. If $A$ is $n \times n$ and $i \neq k$, let $B$ be the matrix obtained from $A$ by replacing row $k$ by row $i$. Then $\det B = 0$ as $B$ has two identical rows. Now expand $\det B$ along row $k$. We get
\[ \begin{aligned} 0 = \det B &= \sum_{j=1}^{n} b_{kj}C_{kj}(B) \\ &= \sum_{j=1}^{n} a_{ij}C_{kj}(A), \end{aligned} \]
in view of Remark 4.0.3.

78

CHAPTER 4. DETERMINANTS

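DEFINITION 4.0.4 (Adjoint) If $A = [a_{ij}]$ is an $n \times n$ matrix, the adjoint of $A$, denoted by $\operatorname{adj} A$, is the transpose of the matrix of cofactors. Hence
$$\operatorname{adj} A = \begin{bmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{bmatrix}.$$

Theorems 4.0.7 and 4.0.8 may be combined to give

THEOREM 4.0.9 Let $A$ be an $n \times n$ matrix. Then
$$A(\operatorname{adj} A) = (\det A)I_n = (\operatorname{adj} A)A.$$

Proof.
$$(A \operatorname{adj} A)_{ik} = \sum_{j=1}^{n} a_{ij} (\operatorname{adj} A)_{jk} = \sum_{j=1}^{n} a_{ij} C_{kj}(A) = \delta_{ik} \det A = ((\det A)I_n)_{ik}.$$
Hence $A(\operatorname{adj} A) = (\det A)I_n$. The other equation is proved similarly.

COROLLARY 4.0.1 (Formula for the inverse) If $\det A \neq 0$, then $A$ is non–singular and
$$A^{-1} = \frac{1}{\det A} \operatorname{adj} A.$$

EXAMPLE 4.0.3 The matrix
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 8 & 8 & 9 \end{bmatrix}$$
is non–singular. For
$$\det A = \begin{vmatrix} 5 & 6 \\ 8 & 9 \end{vmatrix} - 2\begin{vmatrix} 4 & 6 \\ 8 & 9 \end{vmatrix} + 3\begin{vmatrix} 4 & 5 \\ 8 & 8 \end{vmatrix} = -3 + 24 - 24 = -3 \neq 0.$$
Also
$$A^{-1} = \frac{1}{-3}\begin{bmatrix} C_{11} & C_{21} & C_{31} \\ C_{12} & C_{22} & C_{32} \\ C_{13} & C_{23} & C_{33} \end{bmatrix}
= -\frac{1}{3}\begin{bmatrix}
\begin{vmatrix} 5 & 6 \\ 8 & 9 \end{vmatrix} & -\begin{vmatrix} 2 & 3 \\ 8 & 9 \end{vmatrix} & \begin{vmatrix} 2 & 3 \\ 5 & 6 \end{vmatrix} \\[2ex]
-\begin{vmatrix} 4 & 6 \\ 8 & 9 \end{vmatrix} & \begin{vmatrix} 1 & 3 \\ 8 & 9 \end{vmatrix} & -\begin{vmatrix} 1 & 3 \\ 4 & 6 \end{vmatrix} \\[2ex]
\begin{vmatrix} 4 & 5 \\ 8 & 8 \end{vmatrix} & -\begin{vmatrix} 1 & 2 \\ 8 & 8 \end{vmatrix} & \begin{vmatrix} 1 & 2 \\ 4 & 5 \end{vmatrix}
\end{bmatrix}
= -\frac{1}{3}\begin{bmatrix} -3 & 6 & -3 \\ 12 & -15 & 6 \\ -8 & 8 & -3 \end{bmatrix}.$$

The following theorem is useful for simplifying and numerically evaluating a determinant. Proofs are obtained by expanding along the corresponding row or column.

THEOREM 4.0.10 The determinant is a linear function of each row and column. For example

(a)
$$\begin{vmatrix} a_{11} + a'_{11} & a_{12} + a'_{12} & a_{13} + a'_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
+ \begin{vmatrix} a'_{11} & a'_{12} & a'_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$$

(b)
$$\begin{vmatrix} ta_{11} & ta_{12} & ta_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= t\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}.$$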
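The inverse formula of Corollary 4.0.1 can be checked mechanically on the matrix of Example 4.0.3. The following Python sketch is only an illustration: the function names `minor`, `det`, `adjoint` and `inverse` are our own, and the determinant is computed by first-row cofactor expansion, which is fine for small matrices but far too slow for large ones.

```python
from fractions import Fraction

def minor(m, i, j):
    """Matrix obtained from m by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1  # convention: the empty determinant equals 1
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def adjoint(m):
    """Transpose of the matrix of cofactors: entry (i, j) is C_ji."""
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, j, i)) for j in range(n)]
            for i in range(n)]

def inverse(m):
    """A^{-1} = (1 / det A) adj A, valid only when det A is non-zero."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(c, d) for c in row] for row in adjoint(m)]

A = [[1, 2, 3], [4, 5, 6], [8, 8, 9]]
print(det(A))      # -3
print(adjoint(A))  # [[-3, 6, -3], [12, -15, 6], [-8, 8, -3]]
```

Exact rational arithmetic (`Fraction`) is used so that the product of $A$ with the computed inverse is exactly the identity matrix rather than an approximation.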

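COROLLARY 4.0.2 If a multiple of a row is added to another row, the value of the determinant is unchanged. Similarly for columns.

Proof. We illustrate with a $3 \times 3$ example, but the proof is really quite general.
$$\begin{aligned}
\begin{vmatrix} a_{11} + ta_{21} & a_{12} + ta_{22} & a_{13} + ta_{23} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
&= \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
+ \begin{vmatrix} ta_{21} & ta_{22} & ta_{23} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \\
&= \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
+ t\begin{vmatrix} a_{21} & a_{22} & a_{23} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \\
&= \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} + t \times 0 \\
&= \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}.
\end{aligned}$$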

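To evaluate a determinant numerically, it is advisable to reduce the matrix to row–echelon form, recording any sign changes caused by row interchanges, together with any factors taken out of a row, as in the following examples.

EXAMPLE 4.0.4 Evaluate the determinant
$$\begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 8 & 8 & 9 \end{vmatrix}.$$

Solution. Using row operations $R_2 \to R_2 - 4R_1$ and $R_3 \to R_3 - 8R_1$ and then expanding along the first column, gives
$$\begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 8 & 8 & 9 \end{vmatrix}
= \begin{vmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & -8 & -15 \end{vmatrix}
= \begin{vmatrix} -3 & -6 \\ -8 & -15 \end{vmatrix}
= -3\begin{vmatrix} 1 & 2 \\ -8 & -15 \end{vmatrix}
= -3\begin{vmatrix} 1 & 2 \\ 0 & 1 \end{vmatrix} = -3.$$

EXAMPLE 4.0.5 Evaluate the determinant
$$\begin{vmatrix} 1 & 1 & 2 & 1 \\ 3 & 1 & 4 & 5 \\ 7 & 6 & 1 & 2 \\ 1 & 1 & 3 & 4 \end{vmatrix}.$$

Solution.
$$\begin{aligned}
\begin{vmatrix} 1 & 1 & 2 & 1 \\ 3 & 1 & 4 & 5 \\ 7 & 6 & 1 & 2 \\ 1 & 1 & 3 & 4 \end{vmatrix}
&= \begin{vmatrix} 1 & 1 & 2 & 1 \\ 0 & -2 & -2 & 2 \\ 0 & -1 & -13 & -5 \\ 0 & 0 & 1 & 3 \end{vmatrix}
= -2\begin{vmatrix} 1 & 1 & 2 & 1 \\ 0 & 1 & 1 & -1 \\ 0 & -1 & -13 & -5 \\ 0 & 0 & 1 & 3 \end{vmatrix} \\
&= -2\begin{vmatrix} 1 & 1 & 2 & 1 \\ 0 & 1 & 1 & -1 \\ 0 & 0 & -12 & -6 \\ 0 & 0 & 1 & 3 \end{vmatrix}
= 2\begin{vmatrix} 1 & 1 & 2 & 1 \\ 0 & 1 & 1 & -1 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & -12 & -6 \end{vmatrix} \\
&= 2\begin{vmatrix} 1 & 1 & 2 & 1 \\ 0 & 1 & 1 & -1 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 30 \end{vmatrix} = 60.
\end{aligned}$$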
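The bookkeeping used in Examples 4.0.4 and 4.0.5 (a sign change for each row interchange, a factor for each pivot, no change when a multiple of one row is added to another) can be sketched in Python as follows. This is an illustrative sketch, not a library routine; the name `det_by_row_reduction` is our own, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def det_by_row_reduction(rows):
    """Reduce to upper triangular form, tracking row swaps and pivots."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Look for a row at or below the diagonal with a non-zero entry.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)      # no pivot in this column: determinant is 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result        # a row interchange changes the sign
        result *= a[col][col]       # factor contributed by the pivot
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            # Adding a multiple of a row to another row leaves det unchanged.
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return result

print(det_by_row_reduction([[1, 2, 3], [4, 5, 6], [8, 8, 9]]))  # -3
print(det_by_row_reduction([[1, 1, 2, 1], [3, 1, 4, 5],
                            [7, 6, 1, 2], [1, 1, 3, 4]]))       # 60
```

The intermediate matrices need not coincide with those in the worked examples, since the sketch always pivots on the first available non-zero entry, but the final values agree.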

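EXAMPLE 4.0.6 (Vandermonde determinant) Prove that
$$\begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{vmatrix} = (b - a)(c - a)(c - b).$$

Solution. Subtracting column 1 from columns 2 and 3, then expanding along row 1, gives
$$\begin{aligned}
\begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{vmatrix}
&= \begin{vmatrix} 1 & 0 & 0 \\ a & b - a & c - a \\ a^2 & b^2 - a^2 & c^2 - a^2 \end{vmatrix}
= \begin{vmatrix} b - a & c - a \\ b^2 - a^2 & c^2 - a^2 \end{vmatrix} \\
&= (b - a)(c - a)\begin{vmatrix} 1 & 1 \\ b + a & c + a \end{vmatrix}
= (b - a)(c - a)(c - b).
\end{aligned}$$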

REMARK 4.0.4 From theorems 4.0.6, 4.0.10 and corollary 4.0.2, we deduce

(a) $\det(E_{ij}A) = -\det A$,

(b) $\det(E_i(t)A) = t \det A$, if $t \neq 0$,

(c) $\det(E_{ij}(t)A) = \det A$.

It follows that if $A$ is row–equivalent to $B$, then $\det B = c \det A$, where $c \neq 0$. Hence $\det B \neq 0 \Leftrightarrow \det A \neq 0$ and $\det B = 0 \Leftrightarrow \det A = 0$. Consequently from theorem 2.5.8 and remark 2.5.7, we have the following important result:

THEOREM 4.0.11 Let $A$ be an $n \times n$ matrix. Then

(i) $A$ is non–singular if and only if $\det A \neq 0$;

(ii) $A$ is singular if and only if $\det A = 0$;

(iii) the homogeneous system $AX = 0$ has a non–trivial solution if and only if $\det A = 0$.

EXAMPLE 4.0.7 Find the rational numbers $a$ for which the following homogeneous system has a non–trivial solution and solve the system for these values of $a$:
$$\begin{aligned} x - 2y + 3z &= 0 \\ ax + 3y + 2z &= 0 \\ 6x + y + az &= 0. \end{aligned}$$

Solution. The coefficient determinant of the system is
$$\begin{aligned}
\Delta = \begin{vmatrix} 1 & -2 & 3 \\ a & 3 & 2 \\ 6 & 1 & a \end{vmatrix}
&= \begin{vmatrix} 1 & -2 & 3 \\ 0 & 3 + 2a & 2 - 3a \\ 0 & 13 & a - 18 \end{vmatrix}
= \begin{vmatrix} 3 + 2a & 2 - 3a \\ 13 & a - 18 \end{vmatrix} \\
&= (3 + 2a)(a - 18) - 13(2 - 3a) \\
&= 2a^2 + 6a - 80 = 2(a + 8)(a - 5).
\end{aligned}$$

So $\Delta = 0 \Leftrightarrow a = -8$ or $a = 5$ and these values of $a$ are the only values for which the given homogeneous system has a non–trivial solution.

If $a = -8$, the coefficient matrix has reduced row–echelon form equal to
$$\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}$$

and so the complete solution is $x = z$, $y = 2z$, with $z$ arbitrary. If $a = 5$, the coefficient matrix has reduced row–echelon form equal to
$$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}$$
and so the complete solution is $x = -z$, $y = z$, with $z$ arbitrary.

EXAMPLE 4.0.8 Find the values of $t$ for which the following system is consistent and solve the system in each case:
$$\begin{aligned} x + y &= 1 \\ tx + y &= t \\ (1 + t)x + 2y &= 3. \end{aligned}$$

Solution. Suppose that the given system has a solution $(x_0, y_0)$. Then the following homogeneous system
$$\begin{aligned} x + y + z &= 0 \\ tx + y + tz &= 0 \\ (1 + t)x + 2y + 3z &= 0 \end{aligned}$$
will have a non–trivial solution $x = x_0$, $y = y_0$, $z = -1$.

Hence the coefficient determinant $\Delta$ is zero. However
$$\Delta = \begin{vmatrix} 1 & 1 & 1 \\ t & 1 & t \\ 1 + t & 2 & 3 \end{vmatrix}
= \begin{vmatrix} 1 & 0 & 0 \\ t & 1 - t & 0 \\ 1 + t & 1 - t & 2 - t \end{vmatrix}
= \begin{vmatrix} 1 - t & 0 \\ 1 - t & 2 - t \end{vmatrix} = (1 - t)(2 - t).$$

Hence $t = 1$ or $t = 2$. If $t = 1$, the given system becomes
$$\begin{aligned} x + y &= 1 \\ x + y &= 1 \\ 2x + 2y &= 3 \end{aligned}$$
which is clearly inconsistent. If $t = 2$, the given system becomes
$$\begin{aligned} x + y &= 1 \\ 2x + y &= 2 \\ 3x + 2y &= 3 \end{aligned}$$

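which has the unique solution $x = 1$, $y = 0$.

To finish this section, we present an old (1750) method of solving a system of $n$ equations in $n$ unknowns called Cramer's rule. The method is not used in practice. However it has a theoretical use as it reveals explicitly how the solution depends on the coefficients of the augmented matrix.

THEOREM 4.0.12 (Cramer's rule) The system of $n$ linear equations in $n$ unknowns $x_1, \ldots, x_n$
$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n
\end{aligned}$$
has a unique solution if $\Delta = \det[a_{ij}] \neq 0$, namely
$$x_1 = \frac{\Delta_1}{\Delta},\quad x_2 = \frac{\Delta_2}{\Delta},\quad \ldots,\quad x_n = \frac{\Delta_n}{\Delta},$$
where $\Delta_i$ is the determinant of the matrix formed by replacing the $i$–th column of the coefficient matrix $A$ by the entries $b_1, b_2, \ldots, b_n$.

Proof. Suppose the coefficient determinant $\Delta \neq 0$. Then by corollary 4.0.1, $A^{-1}$ exists and is given by $A^{-1} = \frac{1}{\Delta}\operatorname{adj} A$ and the system has the unique solution
$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= A^{-1}\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
= \frac{1}{\Delta}\begin{bmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
= \frac{1}{\Delta}\begin{bmatrix} b_1C_{11} + b_2C_{21} + \cdots + b_nC_{n1} \\ b_1C_{12} + b_2C_{22} + \cdots + b_nC_{n2} \\ \vdots \\ b_1C_{1n} + b_2C_{2n} + \cdots + b_nC_{nn} \end{bmatrix}.$$
However the $i$–th component of the last vector is the expansion of $\Delta_i$ along column $i$. Hence
$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \frac{1}{\Delta}\begin{bmatrix} \Delta_1 \\ \Delta_2 \\ \vdots \\ \Delta_n \end{bmatrix}
= \begin{bmatrix} \Delta_1/\Delta \\ \Delta_2/\Delta \\ \vdots \\ \Delta_n/\Delta \end{bmatrix}.$$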
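Although Cramer's rule is not a practical method, it is easy to write down for small systems. The Python sketch below is illustrative only: the names `det` and `cramer` are our own, and the determinant is computed by first-row cofactor expansion. The sample system is the one from Problem 14 of the next section, whose answer $x = 2$, $y = 3$, $z = 4$ is stated there.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j]
               * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def cramer(a, b):
    """Solve a x = b by Cramer's rule; requires det(a) != 0."""
    delta = det(a)
    if delta == 0:
        raise ValueError("coefficient determinant is zero")
    solution = []
    for i in range(len(a)):
        # Delta_i: replace column i of the coefficient matrix by b.
        a_i = [row[:i] + [b_k] + row[i + 1:] for row, b_k in zip(a, b)]
        solution.append(Fraction(det(a_i), delta))
    return solution

coefficients = [[-2, 3, -1], [1, 2, -1], [-2, -1, 1]]
right_hand_side = [1, 4, -3]
print(cramer(coefficients, right_hand_side))
```

Each unknown costs one $n \times n$ determinant, which is why row reduction is preferred in practice.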

4.1 PROBLEMS

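1. If the points $P_i = (x_i, y_i)$, $i = 1, 2, 3, 4$ form a quadrilateral with vertices in anti–clockwise orientation, prove that the area of the quadrilateral equals
$$\frac{1}{2}\left( \begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix} + \begin{vmatrix} x_2 & x_3 \\ y_2 & y_3 \end{vmatrix} + \begin{vmatrix} x_3 & x_4 \\ y_3 & y_4 \end{vmatrix} + \begin{vmatrix} x_4 & x_1 \\ y_4 & y_1 \end{vmatrix} \right).$$
(This formula generalizes to a simple polygon and is known as the Surveyor's formula.)

2. Prove that the following identity holds by expressing the left–hand side as the sum of 8 determinants:
$$\begin{vmatrix} a + x & b + y & c + z \\ x + u & y + v & z + w \\ u + a & v + b & w + c \end{vmatrix} = 2\begin{vmatrix} a & b & c \\ x & y & z \\ u & v & w \end{vmatrix}.$$

3. Prove that
$$\begin{vmatrix} n^2 & (n + 1)^2 & (n + 2)^2 \\ (n + 1)^2 & (n + 2)^2 & (n + 3)^2 \\ (n + 2)^2 & (n + 3)^2 & (n + 4)^2 \end{vmatrix} = -8.$$

4. Evaluate the following determinants:
$$\text{(a)}\ \begin{vmatrix} 246 & 427 & 327 \\ 1014 & 543 & 443 \\ -342 & 721 & 621 \end{vmatrix} \qquad \text{(b)}\ \begin{vmatrix} 1 & 2 & 3 & 4 \\ -2 & 1 & -4 & 3 \\ 3 & -4 & -1 & 2 \\ 4 & 3 & -2 & -1 \end{vmatrix}.$$
[Answers: (a) $-29400000$; (b) $900$.]

5. Compute the inverse of the matrix
$$A = \begin{bmatrix} 1 & 0 & -2 \\ 3 & 1 & 4 \\ 5 & 2 & -3 \end{bmatrix}$$
by first computing the adjoint matrix.
[Answer: $A^{-1} = \dfrac{-1}{13}\begin{bmatrix} -11 & -4 & 2 \\ 29 & 7 & -10 \\ 1 & -2 & 1 \end{bmatrix}$.]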

6. Prove that the following identities hold:

(i) $\begin{vmatrix} 2a & 2b & b - c \\ 2b & 2a & a + c \\ a + b & a + b & b \end{vmatrix} = -2(a - b)^2(a + b)$,

(ii) $\begin{vmatrix} b + c & b & c \\ c & c + a & a \\ b & a & a + b \end{vmatrix} = 2a(b^2 + c^2)$.

7. Let $P_i = (x_i, y_i)$, $i = 1, 2, 3$. If $x_1, x_2, x_3$ are distinct, prove that there is precisely one curve of the form $y = ax^2 + bx + c$ passing through $P_1$, $P_2$ and $P_3$.

8. Let
$$A = \begin{bmatrix} 1 & 1 & -1 \\ 2 & 3 & k \\ 1 & k & 3 \end{bmatrix}.$$
Find the values of $k$ for which $\det A = 0$ and hence, or otherwise, determine the value of $k$ for which the following system has more than one solution:
$$\begin{aligned} x + y - z &= 1 \\ 2x + 3y + kz &= 3 \\ x + ky + 3z &= 2. \end{aligned}$$
Solve the system for this value of $k$ and determine the solution for which $x^2 + y^2 + z^2$ has least value.
[Answer: $k = 2$; $x = 10/21$, $y = 13/21$, $z = 2/21$.]

9. By considering the coefficient determinant, find all rational numbers $a$ and $b$ for which the following system has (i) no solutions, (ii) exactly one solution, (iii) infinitely many solutions:
$$\begin{aligned} x - 2y + bz &= 3 \\ ax + 2z &= 2 \\ 5x + 2y &= 1. \end{aligned}$$
Solve the system in case (iii).
[Answer: (i) $ab = 12$ and $a \neq 3$, no solution; $ab \neq 12$, unique solution; $a = 3$, $b = 4$, infinitely many solutions; $x = -\frac{2}{3}z + \frac{2}{3}$, $y = \frac{5}{3}z - \frac{7}{6}$, with $z$ arbitrary.]

10. Express the determinant of the matrix
$$B = \begin{bmatrix} 1 & 1 & 2 & 1 \\ 1 & 2 & 3 & 4 \\ 2 & 4 & 7 & 2t + 6 \\ 2 & 2 & 6 - t & t \end{bmatrix}$$
as a polynomial in $t$ and hence determine the rational values of $t$ for which $B^{-1}$ exists.
[Answer: $\det B = (t - 2)(2t - 1)$; $t \neq 2$ and $t \neq \frac{1}{2}$.]

11. If $A$ is a $3 \times 3$ matrix over a field and $\det A \neq 0$, prove that

(i) $\det(\operatorname{adj} A) = (\det A)^2$,

(ii) $(\operatorname{adj} A)^{-1} = \dfrac{1}{\det A}A = \operatorname{adj}(A^{-1})$.

12. Suppose that $A$ is a real $3 \times 3$ matrix such that $A^tA = I_3$.

(i) Prove that $A^t(A - I_3) = -(A - I_3)^t$.

(ii) Prove that $\det A = \pm 1$.

(iii) Use (i) to prove that if $\det A = 1$, then $\det(A - I_3) = 0$.

13. If $A$ is a square matrix such that one column is a linear combination of the remaining columns, prove that $\det A = 0$. Prove that the converse also holds.

14. Use Cramer's rule to solve the system
$$\begin{aligned} -2x + 3y - z &= 1 \\ x + 2y - z &= 4 \\ -2x - y + z &= -3. \end{aligned}$$
[Answer: $x = 2$, $y = 3$, $z = 4$.]

15. Use remark 4.0.4 to deduce that
$$\det E_{ij} = -1,\quad \det E_i(t) = t,\quad \det E_{ij}(t) = 1$$
and use theorem 2.5.8 and induction, to prove that
$$\det(BA) = \det B \det A,$$
if $B$ is non–singular. Also prove that the formula holds when $B$ is singular.

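16. Prove that
$$\begin{vmatrix} a + b + c & a + b & a & a \\ a + b & a + b + c & a & a \\ a & a & a + b + c & a + b \\ a & a & a + b & a + b + c \end{vmatrix} = c^2(2b + c)(4a + 2b + c).$$

17. Prove that
$$\begin{vmatrix} 1 + u_1 & u_1 & u_1 & u_1 \\ u_2 & 1 + u_2 & u_2 & u_2 \\ u_3 & u_3 & 1 + u_3 & u_3 \\ u_4 & u_4 & u_4 & 1 + u_4 \end{vmatrix} = 1 + u_1 + u_2 + u_3 + u_4.$$

18. Let $A \in M_{n \times n}(F)$. If $A^t = -A$, prove that $\det A = 0$ if $n$ is odd and $1 + 1 \neq 0$ in $F$.

19. Prove that
$$\begin{vmatrix} 1 & r & r & r \\ 1 & 1 & r & r \\ 1 & 1 & 1 & r \\ 1 & 1 & 1 & 1 \end{vmatrix} = (1 - r)^3.$$

20. Express the determinant
$$\begin{vmatrix} 1 & a^2 - bc & a^4 \\ 1 & b^2 - ca & b^4 \\ 1 & c^2 - ab & c^4 \end{vmatrix}$$
as the product of one quadratic and four linear factors.
[Answer: $(b - a)(c - a)(c - b)(a + b + c)(b^2 + bc + c^2 + ac + ab + a^2)$.]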

Chapter 5

COMPLEX NUMBERS

5.1 Constructing the complex numbers

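One way of introducing the field $\mathbb{C}$ of complex numbers is via the arithmetic of $2 \times 2$ matrices.

DEFINITION 5.1.1 A complex number is a matrix of the form
$$\begin{bmatrix} x & -y \\ y & x \end{bmatrix},$$
where $x$ and $y$ are real numbers.

Complex numbers of the form $\begin{bmatrix} x & 0 \\ 0 & x \end{bmatrix}$ are scalar matrices and are called real complex numbers and are denoted by the symbol $\{x\}$.

The real complex numbers $\{x\}$ and $\{y\}$ are respectively called the real part and imaginary part of the complex number $\begin{bmatrix} x & -y \\ y & x \end{bmatrix}$.

The complex number $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ is denoted by the symbol $i$.

We have the identities
$$\begin{aligned}
\begin{bmatrix} x & -y \\ y & x \end{bmatrix}
&= \begin{bmatrix} x & 0 \\ 0 & x \end{bmatrix} + \begin{bmatrix} 0 & -y \\ y & 0 \end{bmatrix}
= \begin{bmatrix} x & 0 \\ 0 & x \end{bmatrix} + \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} y & 0 \\ 0 & y \end{bmatrix} \\
&= \{x\} + i\{y\}, \\
i^2 &= \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} = \{-1\}.
\end{aligned}$$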

Complex numbers of the form $i\{y\}$, where $y$ is a non–zero real number, are called imaginary numbers.

If two complex numbers are equal, we can equate their real and imaginary parts:
$$\{x_1\} + i\{y_1\} = \{x_2\} + i\{y_2\} \Rightarrow x_1 = x_2 \text{ and } y_1 = y_2,$$
if $x_1, x_2, y_1, y_2$ are real numbers. Noting that $\{0\} + i\{0\} = \{0\}$ gives the useful special case
$$\{x\} + i\{y\} = \{0\} \Rightarrow x = 0 \text{ and } y = 0,$$
if $x$ and $y$ are real numbers.

The sum and product of two real complex numbers are also real complex numbers:
$$\{x\} + \{y\} = \{x + y\},\qquad \{x\}\{y\} = \{xy\}.$$
Also, as real complex numbers are scalar matrices, their arithmetic is very simple. They form a field under the operations of matrix addition and multiplication. The additive identity is $\{0\}$, the additive inverse of $\{x\}$ is $\{-x\}$, the multiplicative identity is $\{1\}$ and the multiplicative inverse of $\{x\}$ is $\{x^{-1}\}$. Consequently
$$\begin{aligned}
\{x\} - \{y\} &= \{x\} + (-\{y\}) = \{x\} + \{-y\} = \{x - y\}, \\
\frac{\{x\}}{\{y\}} &= \{x\}\{y\}^{-1} = \{x\}\{y^{-1}\} = \{xy^{-1}\} = \left\{\frac{x}{y}\right\}.
\end{aligned}$$

It is customary to blur the distinction between the real complex number $\{x\}$ and the real number $x$ and write $\{x\}$ as $x$. Thus we write the complex number $\{x\} + i\{y\}$ simply as $x + iy$.

More generally, the sum of two complex numbers is a complex number:
$$(x_1 + iy_1) + (x_2 + iy_2) = (x_1 + x_2) + i(y_1 + y_2); \tag{5.1}$$
and (using the fact that scalar matrices commute with all matrices under matrix multiplication and $\{-1\}A = -A$ if $A$ is a matrix), the product of two complex numbers is a complex number:
$$\begin{aligned}
(x_1 + iy_1)(x_2 + iy_2) &= x_1(x_2 + iy_2) + (iy_1)(x_2 + iy_2) \\
&= x_1x_2 + x_1(iy_2) + (iy_1)x_2 + (iy_1)(iy_2) \\
&= x_1x_2 + ix_1y_2 + iy_1x_2 + i^2y_1y_2 \\
&= (x_1x_2 + \{-1\}y_1y_2) + i(x_1y_2 + y_1x_2) \\
&= (x_1x_2 - y_1y_2) + i(x_1y_2 + y_1x_2).
\end{aligned} \tag{5.2}$$
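The matrix construction of Definition 5.1.1 can be tried out directly. The short Python sketch below is illustrative (the helper names `cnum`, `mat_add` and `mat_mul` are our own); it represents $x + iy$ as a nested list and checks $i^2 = \{-1\}$ and the product formula (5.2) on one sample pair.

```python
def cnum(x, y):
    """The complex number x + iy as the 2x2 matrix [[x, -y], [y, x]]."""
    return [[x, -y], [y, x]]

def mat_add(p, q):
    """Entrywise sum of two 2x2 matrices."""
    return [[p[r][c] + q[r][c] for c in range(2)] for r in range(2)]

def mat_mul(p, q):
    """Ordinary matrix product of two 2x2 matrices."""
    return [[sum(p[r][k] * q[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

i = cnum(0, 1)
print(mat_mul(i, i) == cnum(-1, 0))          # i^2 = {-1}

# (2 + 3i)(4 - 5i), compared with formula (5.2)
x1, y1, x2, y2 = 2, 3, 4, -5
product = mat_mul(cnum(x1, y1), cnum(x2, y2))
print(product == cnum(x1 * x2 - y1 * y2, x1 * y2 + y1 * x2))
```

Both comparisons print `True`; the product matrix again has the form $\begin{bmatrix} x & -y \\ y & x \end{bmatrix}$, which is the closure property the text relies on.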

The set $\mathbb{C}$ of complex numbers forms a field under the operations of matrix addition and multiplication. The additive identity is $0$, the additive inverse of $x + iy$ is the complex number $(-x) + i(-y)$, the multiplicative identity is $1$ and the multiplicative inverse of the non–zero complex number $x + iy$ is the complex number $u + iv$, where
$$u = \frac{x}{x^2 + y^2} \quad\text{and}\quad v = \frac{-y}{x^2 + y^2}.$$
(If $x + iy \neq 0$, then $x \neq 0$ or $y \neq 0$, so $x^2 + y^2 \neq 0$.)

From equations 5.1 and 5.2, we observe that addition and multiplication of complex numbers is performed just as for real numbers, replacing $i^2$ by $-1$, whenever it occurs.

A useful identity satisfied by complex numbers is
$$r^2 + s^2 = (r + is)(r - is).$$
This leads to a method of expressing the ratio of two complex numbers in the form $x + iy$, where $x$ and $y$ are real complex numbers.
$$\frac{x_1 + iy_1}{x_2 + iy_2} = \frac{(x_1 + iy_1)(x_2 - iy_2)}{(x_2 + iy_2)(x_2 - iy_2)} = \frac{(x_1x_2 + y_1y_2) + i(-x_1y_2 + y_1x_2)}{x_2^2 + y_2^2}.$$

The process is known as rationalization of the denominator.

5.2 Calculating with complex numbers

We can now do all the standard linear algebra calculations over the field of complex numbers: find the reduced row–echelon form of a matrix whose elements are complex numbers, solve systems of linear equations, find inverses and calculate determinants.

For example, solve the system
$$\begin{aligned} (1 + i)z + (2 - i)w &= 2 + 7i \\ 7z + (8 - 2i)w &= 4 - 9i. \end{aligned}$$
The coefficient determinant is
$$\begin{aligned}
\begin{vmatrix} 1 + i & 2 - i \\ 7 & 8 - 2i \end{vmatrix}
&= (1 + i)(8 - 2i) - 7(2 - i) \\
&= (8 - 2i) + i(8 - 2i) - 14 + 7i \\
&= -4 + 13i \neq 0.
\end{aligned}$$
Hence by Cramer's rule, there is a unique solution:
$$\begin{aligned}
z &= \frac{\begin{vmatrix} 2 + 7i & 2 - i \\ 4 - 9i & 8 - 2i \end{vmatrix}}{-4 + 13i} \\
&= \frac{(2 + 7i)(8 - 2i) - (4 - 9i)(2 - i)}{-4 + 13i} \\
&= \frac{2(8 - 2i) + (7i)(8 - 2i) - \{4(2 - i) - 9i(2 - i)\}}{-4 + 13i} \\
&= \frac{16 - 4i + 56i - 14i^2 - \{8 - 4i - 18i + 9i^2\}}{-4 + 13i} \\
&= \frac{31 + 74i}{-4 + 13i} \\
&= \frac{(31 + 74i)(-4 - 13i)}{(-4)^2 + 13^2} \\
&= \frac{838 - 699i}{(-4)^2 + 13^2} \\
&= \frac{838}{185} - \frac{699}{185}i
\end{aligned}$$
and similarly $w = \dfrac{-698}{185} + \dfrac{229}{185}i$.

An important property enjoyed by complex numbers is that every complex number has a square root:

THEOREM 5.2.1 If $w$ is a non–zero complex number, then the equation $z^2 = w$ has a solution $z \in \mathbb{C}$.

Proof. Let $w = a + ib$, $a, b \in \mathbb{R}$.

Case 1. Suppose $b = 0$. Then if $a > 0$, $z = \sqrt{a}$ is a solution, while if $a < 0$, $i\sqrt{-a}$ is a solution.

Case 2. Suppose $b \neq 0$. Let $z = x + iy$, $x, y \in \mathbb{R}$. Then the equation $z^2 = w$ becomes
$$(x + iy)^2 = x^2 - y^2 + 2xyi = a + ib,$$
so equating real and imaginary parts gives
$$x^2 - y^2 = a \quad\text{and}\quad 2xy = b.$$
Hence $x \neq 0$ and $y = b/(2x)$. Consequently
$$x^2 - \left(\frac{b}{2x}\right)^2 = a,$$
so $4x^4 - 4ax^2 - b^2 = 0$ and $4(x^2)^2 - 4a(x^2) - b^2 = 0$. Hence
$$x^2 = \frac{4a \pm \sqrt{16a^2 + 16b^2}}{8} = \frac{a \pm \sqrt{a^2 + b^2}}{2}.$$
However $x^2 > 0$, so we must take the $+$ sign, as $a - \sqrt{a^2 + b^2} < 0$. Hence
$$x^2 = \frac{a + \sqrt{a^2 + b^2}}{2},\qquad x = \pm\sqrt{\frac{a + \sqrt{a^2 + b^2}}{2}}.$$

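Then $y$ is determined by $y = b/(2x)$.

EXAMPLE 5.2.1 Solve the equation $z^2 = 1 + i$.

Solution. Put $z = x + iy$. Then the equation becomes
$$(x + iy)^2 = x^2 - y^2 + 2xyi = 1 + i,$$
so equating real and imaginary parts gives
$$x^2 - y^2 = 1 \quad\text{and}\quad 2xy = 1.$$
Hence $x \neq 0$ and $y = 1/(2x)$. Consequently
$$x^2 - \left(\frac{1}{2x}\right)^2 = 1,$$
so $4x^4 - 4x^2 - 1 = 0$. Hence
$$x^2 = \frac{4 \pm \sqrt{16 + 16}}{8} = \frac{1 \pm \sqrt{2}}{2}.$$
Hence
$$x^2 = \frac{1 + \sqrt{2}}{2} \quad\text{and}\quad x = \pm\sqrt{\frac{1 + \sqrt{2}}{2}}.$$
Then
$$y = \frac{1}{2x} = \pm\frac{1}{\sqrt{2}\sqrt{1 + \sqrt{2}}}.$$
Hence the solutions are
$$z = \pm\left( \sqrt{\frac{1 + \sqrt{2}}{2}} + \frac{i}{\sqrt{2}\sqrt{1 + \sqrt{2}}} \right).$$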
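The proof of Theorem 5.2.1 is constructive, so it translates directly into a small routine. The Python sketch below follows the two cases of the proof; the function name `complex_sqrt` is our own, floating-point arithmetic is assumed, and only one of the two square roots (the one with $x \geq 0$) is returned.

```python
import math

def complex_sqrt(a, b):
    """Return (x, y) with (x + iy)^2 = a + ib, following Theorem 5.2.1."""
    if b == 0:
        # Case 1: w is real.
        if a >= 0:
            return (math.sqrt(a), 0.0)
        return (0.0, math.sqrt(-a))
    # Case 2: x^2 = (a + sqrt(a^2 + b^2)) / 2, then y = b / (2x).
    x = math.sqrt((a + math.hypot(a, b)) / 2)
    y = b / (2 * x)
    return (x, y)

x, y = complex_sqrt(1, 1)        # Example 5.2.1: z^2 = 1 + i
print(x, y)
print(complex(x, y) ** 2)        # approximately (1+1j)
```

The other solution is $-(x + iy)$. In floating point the expression $a + \sqrt{a^2 + b^2}$ can lose accuracy when $a < 0$ and $|b|$ is very small; the sketch does not guard against that.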

EXAMPLE 5.2.2 Solve the equation $z^2 + (\sqrt{3} + i)z + 1 = 0$.

Solution. Because every complex number has a square root, the familiar formula
$$z = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
for the solution of the general quadratic equation $az^2 + bz + c = 0$ can be used, where now $a\,(\neq 0), b, c \in \mathbb{C}$. Hence
$$\begin{aligned}
z &= \frac{-(\sqrt{3} + i) \pm \sqrt{(\sqrt{3} + i)^2 - 4}}{2} \\
&= \frac{-(\sqrt{3} + i) \pm \sqrt{(3 + 2\sqrt{3}i - 1) - 4}}{2} \\
&= \frac{-(\sqrt{3} + i) \pm \sqrt{-2 + 2\sqrt{3}i}}{2}.
\end{aligned}$$
Now we have to solve $w^2 = -2 + 2\sqrt{3}i$. Put $w = x + iy$. Then $w^2 = x^2 - y^2 + 2xyi = -2 + 2\sqrt{3}i$ and equating real and imaginary parts gives $x^2 - y^2 = -2$ and $2xy = 2\sqrt{3}$. Hence $y = \sqrt{3}/x$ and so $x^2 - 3/x^2 = -2$. So $x^4 + 2x^2 - 3 = 0$ and $(x^2 + 3)(x^2 - 1) = 0$. Hence $x^2 - 1 = 0$ and $x = \pm 1$. Then $y = \pm\sqrt{3}$. Hence $(1 + \sqrt{3}i)^2 = -2 + 2\sqrt{3}i$ and the formula for $z$ now becomes
$$\begin{aligned}
z &= \frac{-\sqrt{3} - i \pm (1 + \sqrt{3}i)}{2} \\
&= \frac{1 - \sqrt{3} + (\sqrt{3} - 1)i}{2} \quad\text{or}\quad \frac{-1 - \sqrt{3} - (1 + \sqrt{3})i}{2}.
\end{aligned}$$

EXAMPLE 5.2.3 Find the cube roots of 1.

Solution. We have to solve the equation $z^3 = 1$, or $z^3 - 1 = 0$. Now $z^3 - 1 = (z - 1)(z^2 + z + 1)$. So $z^3 - 1 = 0 \Rightarrow z - 1 = 0$ or $z^2 + z + 1 = 0$. But
$$z^2 + z + 1 = 0 \Rightarrow z = \frac{-1 \pm \sqrt{1^2 - 4}}{2} = \frac{-1 \pm \sqrt{3}i}{2}.$$
So there are 3 cube roots of 1, namely $1$ and $(-1 \pm \sqrt{3}i)/2$.

We state the next theorem without proof. It states that every non–constant polynomial with complex number coefficients has a root in the field of complex numbers.

THEOREM 5.2.2 (Gauss) If $f(z) = a_nz^n + a_{n-1}z^{n-1} + \cdots + a_1z + a_0$, where $a_n \neq 0$ and $n \geq 1$, then $f(z) = 0$ for some $z \in \mathbb{C}$.

The factor theorem states that if $a \in F$ is a root of a polynomial $f(z)$ with coefficients from a field $F$, then $z - a$ is a factor of $f(z)$, that is $f(z) = (z - a)g(z)$, where the coefficients of $g(z)$ also belong to $F$. By repeated application of this result, together with Theorem 5.2.2, we can factorize any polynomial with complex coefficients into a product of linear factors with complex coefficients:
$$f(z) = a_n(z - z_1)(z - z_2) \cdots (z - z_n).$$
A number of computational algorithms are available for finding good approximations to the roots of a polynomial with complex coefficients.
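As a small numerical illustration of the factorization into linear factors, the quadratic of Example 5.2.2 can be checked with Python's standard `cmath` module. This is a sketch under floating-point arithmetic; the variable and function names are our own. The roots are obtained from the quadratic formula, and the product $(z - z_1)(z - z_2)$ is compared with $z^2 + (\sqrt{3} + i)z + 1$.

```python
import cmath
import math

b = math.sqrt(3) + 1j          # coefficient of z in z^2 + (sqrt(3) + i) z + 1
c = 1

root_of_discriminant = cmath.sqrt(b * b - 4 * c)   # principal square root
z1 = (-b + root_of_discriminant) / 2
z2 = (-b - root_of_discriminant) / 2

def f(z):
    """The original quadratic."""
    return z * z + b * z + c

def factored(z):
    """The same polynomial written as a product of linear factors."""
    return (z - z1) * (z - z2)

sample = 0.3 + 0.7j
print(abs(f(z1)), abs(f(z2)))              # both close to 0
print(abs(f(sample) - factored(sample)))   # close to 0
```

Equivalently, the roots satisfy $z_1 + z_2 = -(\sqrt{3} + i)$ and $z_1z_2 = 1$, which is a quick way to check a hand computation.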

5.3 Geometric representation of C

Complex numbers can be represented as points in the plane, using the correspondence $x + iy \leftrightarrow (x, y)$. The representation is known as the Argand diagram or complex plane. The real complex numbers lie on the $x$–axis, which is then called the real axis, while the imaginary numbers lie on the $y$–axis, which is known as the imaginary axis. The complex numbers with positive imaginary part lie in the upper half plane, while those with negative imaginary part lie in the lower half plane.

Because of the equation
$$(x_1 + iy_1) + (x_2 + iy_2) = (x_1 + x_2) + i(y_1 + y_2),$$
complex numbers add vectorially, using the parallelogram law. Similarly, the complex number $z_1 - z_2$ can be represented by the vector from $(x_2, y_2)$ to $(x_1, y_1)$, where $z_1 = x_1 + iy_1$ and $z_2 = x_2 + iy_2$. (See Figure 5.1.)

The geometrical representation of complex numbers can be very useful when complex number methods are used to investigate properties of triangles and circles. It is very important in the branch of calculus known as Complex Function theory, where geometric methods play an important role.

We mention that the line through two distinct points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ has the form $z = (1 - t)z_1 + tz_2$, $t \in \mathbb{R}$, where $z = x + iy$ is any point on the line and $z_i = x_i + iy_i$, $i = 1, 2$. For the line has parametric equations
$$x = (1 - t)x_1 + tx_2,\qquad y = (1 - t)y_1 + ty_2$$
and these can be combined into a single equation $z = (1 - t)z_1 + tz_2$.

Figure 5.1: Complex addition and subtraction.

Circles have various equation representations in terms of complex numbers, as will be seen later.

5.4 Complex conjugate

DEFINITION 5.4.1 (Complex conjugate) If $z = x + iy$, the complex conjugate of $z$ is the complex number defined by $\overline{z} = x - iy$. Geometrically, the complex conjugate of $z$ is obtained by reflecting $z$ in the real axis (see Figure 5.2).

Figure 5.2: The complex conjugate of $z$: $\overline{z}$.

The following properties of the complex conjugate are easy to verify:

1. $\overline{z_1 + z_2} = \overline{z_1} + \overline{z_2}$;

2. $\overline{-z} = -\overline{z}$;

3. $\overline{z_1 - z_2} = \overline{z_1} - \overline{z_2}$;

4. $\overline{z_1z_2} = \overline{z_1}\,\overline{z_2}$;

5. $\overline{(1/z)} = 1/\overline{z}$;

6. $\overline{(z_1/z_2)} = \overline{z_1}/\overline{z_2}$;

7. $z$ is real if and only if $\overline{z} = z$;

8. With the standard convention that the real and imaginary parts are denoted by $\operatorname{Re} z$ and $\operatorname{Im} z$, we have
$$\operatorname{Re} z = \frac{z + \overline{z}}{2},\qquad \operatorname{Im} z = \frac{z - \overline{z}}{2i};$$

9. If $z = x + iy$, then $z\overline{z} = x^2 + y^2$.

THEOREM 5.4.1 If $f(z)$ is a polynomial with real coefficients, then its non–real roots occur in complex–conjugate pairs, i.e. if $f(z) = 0$, then $f(\overline{z}) = 0$.

Proof. Suppose $f(z) = a_nz^n + a_{n-1}z^{n-1} + \cdots + a_1z + a_0 = 0$, where $a_n, \ldots, a_0$ are real. Then
$$\begin{aligned}
0 = \overline{0} = \overline{f(z)} &= \overline{a_nz^n + a_{n-1}z^{n-1} + \cdots + a_1z + a_0} \\
&= \overline{a_nz^n} + \overline{a_{n-1}z^{n-1}} + \cdots + \overline{a_1z} + \overline{a_0} \\
&= a_n\overline{z}^n + a_{n-1}\overline{z}^{n-1} + \cdots + a_1\overline{z} + a_0 \\
&= f(\overline{z}).
\end{aligned}$$

EXAMPLE 5.4.1 Discuss the position of the roots of the equation $z^4 = -1$ in the complex plane.

Solution. The equation $z^4 = -1$ has real coefficients and so its roots come in complex conjugate pairs. Also if $z$ is a root, so is $-z$. Also there are clearly no real roots and no imaginary roots. So there must be one root $w$ in the first quadrant, with all remaining roots being given by $\overline{w}$, $-w$ and $-\overline{w}$. In fact, as we shall soon see, the roots lie evenly spaced on the unit circle.

The following theorem is useful in deciding if a polynomial $f(z)$ has a multiple root $a$; that is if $(z - a)^m$ divides $f(z)$ for some $m \geq 2$. (The proof is left as an exercise.)

THEOREM 5.4.2 If $f(z) = (z - a)^mg(z)$, where $m \geq 2$ and $g(z)$ is a polynomial, then $f'(a) = 0$ and the polynomial and its derivative have a common root.

From theorem 5.4.1 we obtain a result which is very useful in the explicit integration of rational functions (i.e. ratios of polynomials) with real coefficients.

THEOREM 5.4.3 If $f(z)$ is a non–constant polynomial with real coefficients, then $f(z)$ can be factorized as a product of real linear factors and real quadratic factors.

Proof. In general $f(z)$ will have $r$ real roots $z_1, \ldots, z_r$ and $2s$ non–real roots $z_{r+1}, \overline{z}_{r+1}, \ldots, z_{r+s}, \overline{z}_{r+s}$, occurring in complex–conjugate pairs by theorem 5.4.1. Then if $a_n$ is the coefficient of highest degree in $f(z)$, we have the factorization
$$f(z) = a_n(z - z_1) \cdots (z - z_r) \times (z - z_{r+1})(z - \overline{z}_{r+1}) \cdots (z - z_{r+s})(z - \overline{z}_{r+s}).$$
We then use the following identity for $j = r + 1, \ldots, r + s$ which in turn shows that paired terms give rise to real quadratic factors:
$$\begin{aligned}
(z - z_j)(z - \overline{z}_j) &= z^2 - (z_j + \overline{z}_j)z + z_j\overline{z}_j \\
&= z^2 - 2(\operatorname{Re} z_j)z + (x_j^2 + y_j^2),
\end{aligned}$$
where $z_j = x_j + iy_j$.

A well–known example of such a factorization is the following:

EXAMPLE 5.4.2 Find a factorization of $z^4 + 1$ into real linear and quadratic factors.

Solution. Clearly there are no real roots. Also we have the preliminary factorization $z^4 + 1 = (z^2 - i)(z^2 + i)$. Now the roots of $z^2 - i$ are easily verified to be $\pm(1 + i)/\sqrt{2}$, so the roots of $z^2 + i$ must be $\pm(1 - i)/\sqrt{2}$. In other words the roots are $w = (1 + i)/\sqrt{2}$ and $\overline{w}$, $-w$, $-\overline{w}$. Grouping conjugate–complex terms gives the factorization
$$\begin{aligned}
z^4 + 1 &= (z - w)(z - \overline{w})(z + w)(z + \overline{w}) \\
&= (z^2 - 2z\operatorname{Re} w + w\overline{w})(z^2 + 2z\operatorname{Re} w + w\overline{w}) \\
&= (z^2 - \sqrt{2}z + 1)(z^2 + \sqrt{2}z + 1).
\end{aligned}$$

5.5 Modulus of a complex number

DEFINITION 5.5.1 (Modulus) If $z = x + iy$, the modulus of $z$ is the non–negative real number $|z|$ defined by $|z| = \sqrt{x^2 + y^2}$. Geometrically, the modulus of $z$ is the distance from $z$ to $0$ (see Figure 5.3).

Figure 5.3: The modulus of $z$: $|z|$.

More generally, $|z_1 - z_2|$ is the distance between $z_1$ and $z_2$ in the complex plane. For
$$\begin{aligned}
|z_1 - z_2| &= |(x_1 + iy_1) - (x_2 + iy_2)| = |(x_1 - x_2) + i(y_1 - y_2)| \\
&= \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}.
\end{aligned}$$

The following properties of the modulus are easy to verify, using the identity $|z|^2 = z\overline{z}$:

(i) $|z_1z_2| = |z_1||z_2|$;

(ii) $|z^{-1}| = |z|^{-1}$;

(iii) $\left|\dfrac{z_1}{z_2}\right| = \dfrac{|z_1|}{|z_2|}$.

For example, to prove (i):
$$\begin{aligned}
|z_1z_2|^2 &= (z_1z_2)\overline{z_1z_2} = (z_1z_2)\overline{z_1}\,\overline{z_2} \\
&= (z_1\overline{z_1})(z_2\overline{z_2}) = |z_1|^2|z_2|^2 = (|z_1||z_2|)^2.
\end{aligned}$$
Hence $|z_1z_2| = |z_1||z_2|$.

EXAMPLE 5.5.1 Find $|z|$ when $z = \dfrac{(1 + i)^4}{(1 + 6i)(2 - 7i)}$.

Solution.
$$\begin{aligned}
|z| &= \frac{|1 + i|^4}{|1 + 6i||2 - 7i|} \\
&= \frac{(\sqrt{1^2 + 1^2})^4}{\sqrt{1^2 + 6^2}\sqrt{2^2 + (-7)^2}} \\
&= \frac{4}{\sqrt{37}\sqrt{53}}.
\end{aligned}$$
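The value found in Example 5.5.1 is easy to confirm with Python's built-in complex type, where `abs` returns the modulus. This is only a floating-point spot check; the variable names are our own.

```python
import math

# z from Example 5.5.1, evaluated directly
z = (1 + 1j) ** 4 / ((1 + 6j) * (2 - 7j))

# the value obtained from the multiplicative properties of the modulus
expected = 4 / (math.sqrt(37) * math.sqrt(53))

print(abs(z), expected)   # the two numbers agree to rounding error
```

The direct evaluation multiplies out the quotient first, whereas the worked solution never needs the real and imaginary parts of $z$; that is the practical benefit of properties (i) to (iii).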

THEOREM 5.5.1 (Ratio formulae) If $z$ lies on the line through $z_1$ and $z_2$:
$$z = (1 - t)z_1 + tz_2,\quad t \in \mathbb{R},$$
we have the useful ratio formulae:

(i) $\left|\dfrac{z - z_1}{z - z_2}\right| = \left|\dfrac{t}{1 - t}\right|$ if $z \neq z_2$,

(ii) $\left|\dfrac{z - z_1}{z_1 - z_2}\right| = |t|$.

Circle equations. The equation $|z - z_0| = r$, where $z_0 \in \mathbb{C}$ and $r > 0$, represents the circle centre $z_0$ and radius $r$. For example the equation $|z - (1 + 2i)| = 3$ represents the circle $(x - 1)^2 + (y - 2)^2 = 9$.

Another useful circle equation is the circle of Apollonius:
$$\left|\frac{z - a}{z - b}\right| = \lambda,$$
where $a$ and $b$ are distinct complex numbers and $\lambda$ is a positive real number, $\lambda \neq 1$. (If $\lambda = 1$, the above equation represents the perpendicular bisector of the segment joining $a$ and $b$.)

Figure 5.4: Apollonius circles: $\dfrac{|z + 2i|}{|z - 2i|} = \dfrac{1}{4}, \dfrac{3}{8}, \dfrac{1}{2}, \dfrac{5}{8};\ \dfrac{4}{1}, \dfrac{8}{3}, \dfrac{2}{1}, \dfrac{8}{5}$.

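An algebraic proof that the above equation represents a circle, runs as follows. We use the following identities:

(i) $|z - a|^2 = |z|^2 - 2\operatorname{Re}(z\overline{a}) + |a|^2$

(ii) $\operatorname{Re}(z_1 \pm z_2) = \operatorname{Re} z_1 \pm \operatorname{Re} z_2$

(iii) $\operatorname{Re}(tz) = t\operatorname{Re} z$ if $t \in \mathbb{R}$.

We have
$$\begin{aligned}
\left|\frac{z - a}{z - b}\right| = \lambda
&\Leftrightarrow |z - a|^2 = \lambda^2|z - b|^2 \\
&\Leftrightarrow |z|^2 - 2\operatorname{Re}\{z\overline{a}\} + |a|^2 = \lambda^2(|z|^2 - 2\operatorname{Re}\{z\overline{b}\} + |b|^2) \\
&\Leftrightarrow (1 - \lambda^2)|z|^2 - 2\operatorname{Re}\{z(\overline{a} - \lambda^2\overline{b})\} = \lambda^2|b|^2 - |a|^2 \\
&\Leftrightarrow |z|^2 - 2\operatorname{Re}\left\{z\,\frac{\overline{a} - \lambda^2\overline{b}}{1 - \lambda^2}\right\} = \frac{\lambda^2|b|^2 - |a|^2}{1 - \lambda^2} \\
&\Leftrightarrow |z|^2 - 2\operatorname{Re}\left\{z\,\frac{\overline{a} - \lambda^2\overline{b}}{1 - \lambda^2}\right\} + \left|\frac{a - \lambda^2b}{1 - \lambda^2}\right|^2 = \frac{\lambda^2|b|^2 - |a|^2}{1 - \lambda^2} + \left|\frac{a - \lambda^2b}{1 - \lambda^2}\right|^2.
\end{aligned}$$
Now it is easily verified that
$$|a - \lambda^2b|^2 + (1 - \lambda^2)(\lambda^2|b|^2 - |a|^2) = \lambda^2|a - b|^2.$$
So we obtain
$$\begin{aligned}
\left|\frac{z - a}{z - b}\right| = \lambda
&\Leftrightarrow \left|z - \frac{a - \lambda^2b}{1 - \lambda^2}\right|^2 = \frac{\lambda^2|a - b|^2}{|1 - \lambda^2|^2} \\
&\Leftrightarrow \left|z - \frac{a - \lambda^2b}{1 - \lambda^2}\right| = \frac{\lambda|a - b|}{|1 - \lambda^2|}.
\end{aligned}$$
The last equation represents a circle centre $z_0$, radius $r$, where
$$z_0 = \frac{a - \lambda^2b}{1 - \lambda^2} \quad\text{and}\quad r = \frac{\lambda|a - b|}{|1 - \lambda^2|}.$$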
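The centre and radius formulas just derived can be evaluated numerically. The Python sketch below is illustrative (the function name `apollonius_circle` is our own) and uses the sample values $a = 1 + i$, $b = 5 + 2i$, $\lambda = 2$, that is, the circle $|z - 1 - i| = 2|z - 5 - 2i|$.

```python
import math

def apollonius_circle(a, b, lam):
    """Centre and radius of |z - a| = lam * |z - b| for lam > 0, lam != 1."""
    if lam <= 0 or lam == 1:
        raise ValueError("lam must be positive and different from 1")
    centre = (a - lam ** 2 * b) / (1 - lam ** 2)
    radius = lam * abs(a - b) / abs(1 - lam ** 2)
    return centre, radius

a, b, lam = 1 + 1j, 5 + 2j, 2
centre, radius = apollonius_circle(a, b, lam)
print(centre, radius)     # approximately 19/3 + (7/3)i and sqrt(68)/3

# A point on the circle should satisfy the defining ratio.
point = centre + radius
print(abs(point - a) / abs(point - b))   # approximately 2
```

Checking one point of the computed circle against the original ratio is a cheap safeguard against sign slips in the formulas.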

There are two special points on the circle of Apollonius, the points $z_1$ and $z_2$ defined by
$$\frac{z_1 - a}{z_1 - b} = \lambda \quad\text{and}\quad \frac{z_2 - a}{z_2 - b} = -\lambda,$$
or
$$z_1 = \frac{a - \lambda b}{1 - \lambda} \quad\text{and}\quad z_2 = \frac{a + \lambda b}{1 + \lambda}. \tag{5.3}$$
It is easy to verify that $z_1$ and $z_2$ are distinct points on the line through $a$ and $b$ and that $z_0 = \frac{z_1 + z_2}{2}$. Hence the circle of Apollonius is the circle based on the segment $z_1, z_2$ as diameter.

EXAMPLE 5.5.2 Find the centre and radius of the circle
$$|z - 1 - i| = 2|z - 5 - 2i|.$$

Solution. Method 1. Proceed algebraically and simplify the equation
$$|x + iy - 1 - i| = 2|x + iy - 5 - 2i|$$
or
$$|x - 1 + i(y - 1)| = 2|x - 5 + i(y - 2)|.$$
Squaring both sides gives
$$(x - 1)^2 + (y - 1)^2 = 4((x - 5)^2 + (y - 2)^2),$$
which reduces to the circle equation
$$x^2 + y^2 - \frac{38}{3}x - \frac{14}{3}y + 38 = 0.$$
Completing the square gives
$$\left(x - \frac{19}{3}\right)^2 + \left(y - \frac{7}{3}\right)^2 = \left(\frac{19}{3}\right)^2 + \left(\frac{7}{3}\right)^2 - 38 = \frac{68}{9},$$
so the centre is $\left(\frac{19}{3}, \frac{7}{3}\right)$ and the radius is $\sqrt{\frac{68}{9}}$.

Method 2. Calculate the diametrical points $z_1$ and $z_2$ defined above by equations 5.3:
$$\begin{aligned} z_1 - 1 - i &= 2(z_1 - 5 - 2i) \\ z_2 - 1 - i &= -2(z_2 - 5 - 2i). \end{aligned}$$
We find $z_1 = 9 + 3i$ and $z_2 = (11 + 5i)/3$. Hence the centre $z_0$ is given by
$$z_0 = \frac{z_1 + z_2}{2} = \frac{19}{3} + \frac{7}{3}i$$
and the radius $r$ is given by
$$r = |z_1 - z_0| = \left|\frac{19}{3} + \frac{7}{3}i - (9 + 3i)\right| = \left|-\frac{8}{3} - \frac{2}{3}i\right| = \frac{\sqrt{68}}{3}.$$

5.6 Argument of a complex number

Let $z = x + iy$ be a non–zero complex number, $r = |z| = \sqrt{x^2 + y^2}$. Then we have $x = r\cos\theta$, $y = r\sin\theta$, where $\theta$ is the angle made by $z$ with the positive $x$–axis. So $\theta$ is unique up to addition of a multiple of $2\pi$ radians.

DEFINITION 5.6.1 (Argument) Any number $\theta$ satisfying the above pair of equations is called an argument of $z$ and is denoted by $\arg z$. The particular argument of $z$ lying in the range $-\pi < \theta \leq \pi$ is called the principal argument of $z$ and is denoted by $\operatorname{Arg} z$ (see Figure 5.5).

Figure 5.5: The argument of $z$: $\arg z = \theta$.

We have
$$z = r\cos\theta + ir\sin\theta = r(\cos\theta + i\sin\theta)$$
and this representation of $z$ is called the polar representation or modulus–argument form of $z$.

EXAMPLE 5.6.1 $\operatorname{Arg} 1 = 0$, $\operatorname{Arg}(-1) = \pi$, $\operatorname{Arg} i = \frac{\pi}{2}$, $\operatorname{Arg}(-i) = -\frac{\pi}{2}$.

We note that $y/x = \tan\theta$ if $x \neq 0$, so $\theta$ is determined by this equation up to a multiple of $\pi$. In fact
$$\operatorname{Arg} z = \tan^{-1}\frac{y}{x} + k\pi,$$
where $k = 0$ if $x > 0$; $k = 1$ if $x < 0$, $y > 0$; $k = -1$ if $x < 0$, $y < 0$.

To determine $\operatorname{Arg} z$ graphically, it is simplest to draw the triangle formed by the points $0, x, z$ on the complex plane, mark in the positive acute angle $\alpha$ between the rays $0, x$ and $0, z$ and determine $\operatorname{Arg} z$ geometrically, using the fact that $\alpha = \tan^{-1}(|y|/|x|)$, as in the following examples:

EXAMPLE 5.6.2 Determine the principal argument of $z$ for the following complex numbers:
$$z = 4 + 3i,\quad -4 + 3i,\quad -4 - 3i,\quad 4 - 3i.$$

Solution. Referring to Figure 5.6, we see that $\operatorname{Arg} z$ has the values
$$\alpha,\quad \pi - \alpha,\quad -\pi + \alpha,\quad -\alpha,$$
where $\alpha = \tan^{-1}\frac{3}{4}$.

Figure 5.6: Argument examples.

An important property of the argument of a complex number states that the sum of the arguments of two non–zero complex numbers is an argument of their product:

THEOREM 5.6.1 If $\theta_1$ and $\theta_2$ are arguments of $z_1$ and $z_2$, then $\theta_1 + \theta_2$ is an argument of $z_1z_2$.

Proof. Let $z_1$ and $z_2$ have polar representations $z_1 = r_1(\cos\theta_1 + i\sin\theta_1)$ and $z_2 = r_2(\cos\theta_2 + i\sin\theta_2)$. Then
$$\begin{aligned}
z_1z_2 &= r_1(\cos\theta_1 + i\sin\theta_1)\,r_2(\cos\theta_2 + i\sin\theta_2) \\
&= r_1r_2(\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2 + i(\cos\theta_1\sin\theta_2 + \sin\theta_1\cos\theta_2)) \\
&= r_1r_2(\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)),
\end{aligned}$$
which is the polar representation of $z_1z_2$, as $r_1r_2 = |z_1||z_2| = |z_1z_2|$. Hence $\theta_1 + \theta_2$ is an argument of $z_1z_2$.

An easy induction gives the following generalization to a product of $n$ complex numbers:

COROLLARY 5.6.1 If $\theta_1, \ldots, \theta_n$ are arguments for $z_1, \ldots, z_n$ respectively, then $\theta_1 + \cdots + \theta_n$ is an argument for $z_1 \cdots z_n$.

Taking $\theta_1 = \cdots = \theta_n = \theta$ in the previous corollary gives

COROLLARY 5.6.2 If $\theta$ is an argument of $z$, then $n\theta$ is an argument for $z^n$.

THEOREM 5.6.2 If $\theta$ is an argument of the non–zero complex number $z$, then $-\theta$ is an argument of $z^{-1}$.

Proof. Let $\theta$ be an argument of $z$. Then $z = r(\cos\theta + i\sin\theta)$, where $r = |z|$. Hence
$$\begin{aligned}
z^{-1} &= r^{-1}(\cos\theta + i\sin\theta)^{-1} \\
&= r^{-1}(\cos\theta - i\sin\theta) \\
&= r^{-1}(\cos(-\theta) + i\sin(-\theta)).
\end{aligned}$$
Now $r^{-1} = |z|^{-1} = |z^{-1}|$, so $-\theta$ is an argument of $z^{-1}$.

COROLLARY 5.6.3 If $\theta_1$ and $\theta_2$ are arguments of $z_1$ and $z_2$, then $\theta_1 - \theta_2$ is an argument of $z_1/z_2$.
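The formula $\operatorname{Arg} z = \tan^{-1}(y/x) + k\pi$ given earlier in this section can be written out as a short routine and compared with the standard library function `math.atan2`, which returns an angle in $[-\pi, \pi]$. The sketch is illustrative: the name `principal_arg` is our own, and the case $x < 0$, $y = 0$, which the formula in the text does not list, is assigned $k = 1$ here so that $\operatorname{Arg}(-1) = \pi$.

```python
import math

def principal_arg(x, y):
    """Arg z for z = x + iy with x != 0, via atan(y/x) + k*pi."""
    if x == 0:
        raise ValueError("the formula requires x != 0")
    if x > 0:
        k = 0
    elif y > 0:
        k = 1
    elif y < 0:
        k = -1
    else:
        k = 1   # negative real axis (our assumption: Arg z = pi)
    return math.atan(y / x) + k * math.pi

alpha = math.atan(3 / 4)
for x, y in [(4, 3), (-4, 3), (-4, -3), (4, -3)]:   # Example 5.6.2
    print((x, y), principal_arg(x, y), math.atan2(y, x))
```

For the four numbers of Example 5.6.2 the two columns agree and equal $\alpha$, $\pi - \alpha$, $-\pi + \alpha$, $-\alpha$ with $\alpha = \tan^{-1}\frac{3}{4}$. In practice `math.atan2(y, x)` is the more convenient call, since it also handles $x = 0$.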

In terms of principal arguments, we have the following equations:

(i) $\operatorname{Arg}(z_1z_2) = \operatorname{Arg} z_1 + \operatorname{Arg} z_2 + 2k_1\pi$,

(ii) $\operatorname{Arg}(z^{-1}) = -\operatorname{Arg} z + 2k_2\pi$,

(iii) $\operatorname{Arg}(z_1/z_2) = \operatorname{Arg} z_1 - \operatorname{Arg} z_2 + 2k_3\pi$,

(iv) $\operatorname{Arg}(z_1 \cdots z_n) = \operatorname{Arg} z_1 + \cdots + \operatorname{Arg} z_n + 2k_4\pi$,

(v) $\operatorname{Arg}(z^n) = n\operatorname{Arg} z + 2k_5\pi$,

where $k_1, k_2, k_3, k_4, k_5$ are integers.

In numerical examples, we can write (i), for example, as
$$\operatorname{Arg}(z_1z_2) \equiv \operatorname{Arg} z_1 + \operatorname{Arg} z_2.$$

EXAMPLE 5.6.3 Find the modulus and principal argument of
$$z = \left(\frac{\sqrt{3} + i}{1 + i}\right)^{17}$$
and hence express $z$ in modulus–argument form.

Solution. $|z| = \dfrac{|\sqrt{3} + i|^{17}}{|1 + i|^{17}} = \dfrac{2^{17}}{(\sqrt{2})^{17}} = 2^{17/2}$.
$$\begin{aligned}
\operatorname{Arg} z &\equiv 17\operatorname{Arg}\left(\frac{\sqrt{3} + i}{1 + i}\right) \\
&= 17(\operatorname{Arg}(\sqrt{3} + i) - \operatorname{Arg}(1 + i)) \\
&= 17\left(\frac{\pi}{6} - \frac{\pi}{4}\right) = \frac{-17\pi}{12}.
\end{aligned}$$
Hence $\operatorname{Arg} z = \frac{-17\pi}{12} + 2k\pi$, where $k$ is an integer. We see that $k = 1$ and hence $\operatorname{Arg} z = \frac{7\pi}{12}$. Consequently $z = 2^{17/2}\left(\cos\frac{7\pi}{12} + i\sin\frac{7\pi}{12}\right)$.

DEFINITION 5.6.2 If $\theta$ is a real number, then we define $e^{i\theta}$ by
$$e^{i\theta} = \cos\theta + i\sin\theta.$$
More generally, if $z = x + iy$, then we define $e^z$ by
$$e^z = e^xe^{iy}.$$
For example,
$$e^{\frac{i\pi}{2}} = i,\quad e^{i\pi} = -1,\quad e^{-\frac{i\pi}{2}} = -i.$$

The following properties of the complex exponential function are left as exercises:

THEOREM 5.6.3

(i) $e^{z_1}e^{z_2} = e^{z_1 + z_2}$,

(ii) $e^{z_1} \cdots e^{z_n} = e^{z_1 + \cdots + z_n}$,

(iii) $e^z \neq 0$,

(iv) $(e^z)^{-1} = e^{-z}$,

(v) $e^{z_1}/e^{z_2} = e^{z_1 - z_2}$,

(vi) $\overline{e^z} = e^{\overline{z}}$.

THEOREM 5.6.4 The equation $e^z = 1$ has the complete solution $z = 2k\pi i$, $k \in \mathbb{Z}$.

Proof. First we observe that
$$e^{2k\pi i} = \cos(2k\pi) + i\sin(2k\pi) = 1.$$
Conversely, suppose $e^z = 1$, $z = x + iy$. Then $e^x(\cos y + i\sin y) = 1$. Hence $e^x\cos y = 1$ and $e^x\sin y = 0$. Hence $\sin y = 0$ and so $y = n\pi$, $n \in \mathbb{Z}$. Then $e^x\cos(n\pi) = 1$, so $e^x(-1)^n = 1$, from which follows $(-1)^n = 1$ as $e^x > 0$. Hence $n = 2k$, $k \in \mathbb{Z}$ and $e^x = 1$. Hence $x = 0$ and $z = 2k\pi i$.

5.7

De Moivre’s theorem

The next theorem has many uses and is a special case of theorem 5.6.3(ii). Alternatively it can be proved directly by induction on n. THEOREM 5.7.1 (De Moivre) If n is a positive integer, then (cos θ + i sin θ)n = cos nθ + i sin nθ. As a ﬁrst application, we consider the equation z n = 1. THEOREM 5.7.2 The equation z n = 1 has n distinct solutions, namely 2kπi the complex numbers ζk = e n , k = 0, 1, . . . , n − 1. These lie equally spaced on the unit circle |z| = 1 and are obtained by starting at 1, moving round the circle anti–clockwise, incrementing the argument in steps of 2π . n (See Figure 5.7) 2πi We notice that the roots are the powers of the special root ζ = e n .

Figure 5.7: The nth roots of unity.

Proof. With $\zeta_k$ defined as above,
\[ \zeta_k^n = \left( e^{\frac{2k\pi i}{n}} \right)^n = e^{\frac{2k\pi i}{n} n} = 1, \]
by De Moivre's theorem. However $|\zeta_k| = 1$ and $\arg \zeta_k = \frac{2k\pi}{n}$, so the complex numbers $\zeta_k$, $k = 0, 1, \ldots, n - 1$, lie equally spaced on the unit circle. Consequently these numbers must be precisely all the roots of $z^n - 1$. For the polynomial $z^n - 1$, being of degree $n$ over a field, can have at most $n$ distinct roots in that field.

The more general equation $z^n = a$, where $a \in \mathbb{C}$, $a \neq 0$, can be reduced to the previous case:

Let $\alpha$ be an argument of $a$, so that $a = |a| e^{i\alpha}$. Then if $w = |a|^{1/n} e^{i\alpha/n}$, we have
\[ w^n = \left( |a|^{1/n} e^{\frac{i\alpha}{n}} \right)^n = (|a|^{1/n})^n \left( e^{\frac{i\alpha}{n}} \right)^n = |a| e^{i\alpha} = a. \]
So $w$ is a particular solution. Substituting for $a$ in the original equation, we get $z^n = w^n$, or $(z/w)^n = 1$. Hence the complete solution is $z/w = e^{\frac{2k\pi i}{n}}$, $k = 0, 1, \ldots, n - 1$, or
\[ z_k = |a|^{1/n} e^{\frac{i\alpha}{n}} e^{\frac{2k\pi i}{n}} = |a|^{1/n} e^{\frac{i(\alpha + 2k\pi)}{n}}, \tag{5.4} \]
$k = 0, 1, \ldots, n - 1$. So the roots are equally spaced on the circle $|z| = |a|^{1/n}$ and are generated from the special solution having argument equal to $(\arg a)/n$, by incrementing the argument in steps of $2\pi/n$. (See Figure 5.8.)

Figure 5.8: The roots of $z^n = a$.

EXAMPLE 5.7.1 Factorize the polynomial $z^5 - 1$ as a product of real linear and quadratic factors.

Solution. The roots are $1, e^{\frac{2\pi i}{5}}, e^{\frac{-2\pi i}{5}}, e^{\frac{4\pi i}{5}}, e^{\frac{-4\pi i}{5}}$, using the fact that non–real roots come in conjugate–complex pairs. Hence
\[ z^5 - 1 = (z - 1)(z - e^{\frac{2\pi i}{5}})(z - e^{\frac{-2\pi i}{5}})(z - e^{\frac{4\pi i}{5}})(z - e^{\frac{-4\pi i}{5}}). \]
Now
\[ (z - e^{\frac{2\pi i}{5}})(z - e^{\frac{-2\pi i}{5}}) = z^2 - z(e^{\frac{2\pi i}{5}} + e^{\frac{-2\pi i}{5}}) + 1 = z^2 - 2z \cos\frac{2\pi}{5} + 1. \]
Similarly
\[ (z - e^{\frac{4\pi i}{5}})(z - e^{\frac{-4\pi i}{5}}) = z^2 - 2z \cos\frac{4\pi}{5} + 1. \]
This gives the desired factorization.

EXAMPLE 5.7.2 Solve $z^3 = i$.

Solution. $|i| = 1$ and $\mathrm{Arg}\, i = \frac{\pi}{2} = \alpha$. So by equation 5.4, the solutions are
\[ z_k = |i|^{1/3} e^{\frac{i(\alpha + 2k\pi)}{3}}, \quad k = 0, 1, 2. \]
First, $k = 0$ gives
\[ z_0 = e^{\frac{i\pi}{6}} = \cos\frac{\pi}{6} + i\sin\frac{\pi}{6} = \frac{\sqrt{3}}{2} + \frac{i}{2}. \]
Next, $k = 1$ gives
\[ z_1 = e^{\frac{5\pi i}{6}} = \cos\frac{5\pi}{6} + i\sin\frac{5\pi}{6} = \frac{-\sqrt{3}}{2} + \frac{i}{2}. \]
Finally, $k = 2$ gives
\[ z_2 = e^{\frac{9\pi i}{6}} = \cos\frac{9\pi}{6} + i\sin\frac{9\pi}{6} = -i. \]
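Equation 5.4 translates directly into a short computation. The following is a minimal sketch, assuming Python's standard `cmath` module; the function name `nth_roots` is illustrative and not part of the text. It uses the principal argument of $a$ as the value of $\alpha$.

```python
import cmath
import math

def nth_roots(a, n):
    """Return the n solutions of z**n = a (a non-zero), following equation 5.4."""
    if a == 0 or n < 1:
        raise ValueError("need a != 0 and n >= 1")
    r = abs(a) ** (1.0 / n)        # |a|^(1/n)
    alpha = cmath.phase(a)         # principal argument of a, used as alpha
    return [r * cmath.exp(1j * (alpha + 2 * math.pi * k) / n) for k in range(n)]

# The equation z^3 = i of Example 5.7.2.
roots = nth_roots(1j, 3)
for k, z in enumerate(roots):
    # The last column is |z^3 - i|, which should be zero up to rounding error.
    print(k, z, abs(z ** 3 - 1j))
```

Up to floating-point rounding, the three printed roots should agree with $z_0$, $z_1$ and $z_2$ found in Example 5.7.2.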

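We finish this chapter with two more examples of De Moivre's theorem.

EXAMPLE 5.7.3 If
\[ C = 1 + \cos\theta + \cdots + \cos(n - 1)\theta, \qquad S = \sin\theta + \cdots + \sin(n - 1)\theta, \]
prove that
\[ C = \frac{\sin\frac{n\theta}{2}}{\sin\frac{\theta}{2}} \cos\frac{(n - 1)\theta}{2} \quad \text{and} \quad S = \frac{\sin\frac{n\theta}{2}}{\sin\frac{\theta}{2}} \sin\frac{(n - 1)\theta}{2}, \]
if $\theta \neq 2k\pi$, $k \in \mathbb{Z}$.

Solution.
\[ \begin{aligned}
C + iS &= 1 + (\cos\theta + i\sin\theta) + \cdots + (\cos(n - 1)\theta + i\sin(n - 1)\theta) \\
&= 1 + e^{i\theta} + \cdots + e^{i(n-1)\theta} \\
&= 1 + z + \cdots + z^{n-1}, \quad \text{where } z = e^{i\theta} \\
&= \frac{1 - z^n}{1 - z}, \quad \text{if } z \neq 1, \text{ i.e. } \theta \neq 2k\pi, \\
&= \frac{1 - e^{in\theta}}{1 - e^{i\theta}} = \frac{e^{\frac{in\theta}{2}} (e^{\frac{-in\theta}{2}} - e^{\frac{in\theta}{2}})}{e^{\frac{i\theta}{2}} (e^{\frac{-i\theta}{2}} - e^{\frac{i\theta}{2}})} \\
&= e^{i(n-1)\frac{\theta}{2}} \, \frac{\sin\frac{n\theta}{2}}{\sin\frac{\theta}{2}} \\
&= \left( \cos(n - 1)\tfrac{\theta}{2} + i\sin(n - 1)\tfrac{\theta}{2} \right) \frac{\sin\frac{n\theta}{2}}{\sin\frac{\theta}{2}}.
\end{aligned} \]
The result follows by equating real and imaginary parts.

EXAMPLE 5.7.4 Express $\cos n\theta$ and $\sin n\theta$ in terms of $\cos\theta$ and $\sin\theta$, using the equation $\cos n\theta + i\sin n\theta = (\cos\theta + i\sin\theta)^n$.

Solution. The binomial theorem gives
\[ (\cos\theta + i\sin\theta)^n = \cos^n\theta + \binom{n}{1} \cos^{n-1}\theta\,(i\sin\theta) + \binom{n}{2} \cos^{n-2}\theta\,(i\sin\theta)^2 + \cdots + (i\sin\theta)^n. \]
Equating real and imaginary parts gives
\[ \cos n\theta = \cos^n\theta - \binom{n}{2} \cos^{n-2}\theta \sin^2\theta + \cdots \]
\[ \sin n\theta = \binom{n}{1} \cos^{n-1}\theta \sin\theta - \binom{n}{3} \cos^{n-3}\theta \sin^3\theta + \cdots. \]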
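The closed forms of Example 5.7.3 are easy to check numerically against the defining sums. The sketch below assumes only Python's standard `math` module; the function names and the sample values of $\theta$ and $n$ are illustrative choices, not taken from the text.

```python
import math

def direct_sums(theta, n):
    """C = 1 + cos(theta) + ... + cos((n-1)theta), S = sin(theta) + ... + sin((n-1)theta)."""
    C = sum(math.cos(k * theta) for k in range(n))   # the k = 0 term is 1
    S = sum(math.sin(k * theta) for k in range(n))   # the k = 0 term is 0
    return C, S

def closed_form(theta, n):
    """Closed forms of Example 5.7.3, valid when theta is not a multiple of 2*pi."""
    ratio = math.sin(n * theta / 2) / math.sin(theta / 2)
    half_angle = (n - 1) * theta / 2
    return ratio * math.cos(half_angle), ratio * math.sin(half_angle)

theta, n = 0.7, 9   # arbitrary sample values
print(direct_sums(theta, n))
print(closed_form(theta, n))
```

The two printed pairs should agree to within rounding error for any $\theta$ that is not a multiple of $2\pi$.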

5.8 PROBLEMS

1. Express the following complex numbers in the form $x + iy$, $x, y$ real:

(i) $(-3 + i)(14 - 2i)$; (ii) $\dfrac{2 + 3i}{1 - 4i}$; (iii) $\dfrac{(1 + 2i)^2}{1 - i}$.

[Answers: (i) $-40 + 20i$; (ii) $-\frac{10}{17} + \frac{11}{17}i$; (iii) $-\frac{7}{2} + \frac{i}{2}$.]

2. Solve the following equations:

(i) $iz + (2 - 10i)z = 3z + 2i$,

(ii) $(1 + i)z + (2 - i)w = -3i$
     $(1 + 2i)z + (3 + i)w = 2 + 2i$.

[Answers: (i) $z = -\frac{9}{41} - \frac{i}{41}$; (ii) $z = -1 + 5i$, $w = \frac{19}{5} - \frac{8i}{5}$.]

3. Express $1 + (1 + i) + (1 + i)^2 + \ldots + (1 + i)^{99}$ in the form $x + iy$, $x, y$ real. [Answer: $(1 + 2^{50})i$.]

4. Solve the equations: (i) $z^2 = -8 - 6i$; (ii) $z^2 - (3 + i)z + 4 + 3i = 0$.

[Answers: (i) $z = \pm(1 - 3i)$; (ii) $z = 2 - i$, $1 + 2i$.]

5. Find the modulus and principal argument of each of the following complex numbers:

(i) $4 + i$; (ii) $-\frac{3}{2} - \frac{i}{2}$; (iii) $-1 + 2i$; (iv) $\frac{1}{2}(-1 + i\sqrt{3})$.

[Answers: (i) $\sqrt{17}$, $\tan^{-1}\frac{1}{4}$; (ii) $\frac{\sqrt{10}}{2}$, $-\pi + \tan^{-1}\frac{1}{3}$; (iii) $\sqrt{5}$, $\pi - \tan^{-1} 2$.]

6. Express the following complex numbers in modulus-argument form:

(i) $z = (1 + i)(1 + i\sqrt{3})(\sqrt{3} - i)$.

(ii) $z = \dfrac{(1 + i)^5 (1 - i\sqrt{3})^5}{(\sqrt{3} + i)^4}$.

[Answers: (i) $z = 4\sqrt{2}(\cos\frac{5\pi}{12} + i\sin\frac{5\pi}{12})$; (ii) $z = 2^{7/2}(\cos\frac{11\pi}{12} + i\sin\frac{11\pi}{12})$.]

7. (i) If $z = 2(\cos\frac{\pi}{4} + i\sin\frac{\pi}{4})$ and $w = 3(\cos\frac{\pi}{6} + i\sin\frac{\pi}{6})$, find the polar form of

(a) $zw$; (b) $\dfrac{z}{w}$; (c) $\dfrac{w}{z}$; (d) $\dfrac{z^5}{w^2}$.

(ii) Express the following complex numbers in the form $x + iy$:

(a) $(1 + i)^{12}$; (b) $\left( \dfrac{1 - i}{\sqrt{2}} \right)^{-6}$.

[Answers: (i): (a) $6(\cos\frac{5\pi}{12} + i\sin\frac{5\pi}{12})$; (b) $\frac{2}{3}(\cos\frac{\pi}{12} + i\sin\frac{\pi}{12})$; (c) $\frac{3}{2}(\cos(-\frac{\pi}{12}) + i\sin(-\frac{\pi}{12}))$; (d) $\frac{32}{9}(\cos\frac{11\pi}{12} + i\sin\frac{11\pi}{12})$; (ii): (a) $-64$; (b) $-i$.]

8. Solve the equations:

(i) $z^2 = 1 + i\sqrt{3}$; (ii) $z^4 = i$; (iii) $z^3 = -8i$; (iv) $z^4 = 2 - 2i$.

[Answers: (i) $z = \pm\frac{(\sqrt{3} + i)}{\sqrt{2}}$; (ii) $i^k(\cos\frac{\pi}{8} + i\sin\frac{\pi}{8})$, $k = 0, 1, 2, 3$; (iii) $z = 2i$, $-\sqrt{3} - i$, $\sqrt{3} - i$; (iv) $z = i^k 2^{3/8}(\cos\frac{\pi}{16} - i\sin\frac{\pi}{16})$, $k = 0, 1, 2, 3$.]

9. Find the reduced row–echelon form of the complex matrix
\[ \begin{bmatrix} 2 + i & -1 + 2i & 2 \\ 1 + i & -1 + i & 1 \\ 1 + 2i & -2 + i & 1 + i \end{bmatrix}. \]
[Answer: $\begin{bmatrix} 1 & i & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$.]

10. (i) Prove that the line equation $lx + my = n$ is equivalent to
\[ \overline{p}z + p\overline{z} = 2n, \]
where $p = l + im$.

(ii) Use (i) to deduce that reflection in the straight line $\overline{p}z + p\overline{z} = n$ is described by the equation
\[ \overline{p}w + p\overline{z} = n. \]
[Hint: The complex number $l + im$ is perpendicular to the given line.]

(iii) Prove that the line $|z - a| = |z - b|$ may be written as $\overline{p}z + p\overline{z} = n$, where $p = b - a$ and $n = |b|^2 - |a|^2$. Deduce that if $z$ lies on the Apollonius circle $\frac{|z - a|}{|z - b|} = \lambda$, then $w$, the reflection of $z$ in the line $|z - a| = |z - b|$, lies on the Apollonius circle $\frac{|z - a|}{|z - b|} = \frac{1}{\lambda}$.

11. Let $a$ and $b$ be distinct complex numbers and $0 < \alpha < \pi$.

(i) Prove that each of the following sets in the complex plane represents a circular arc and sketch the circular arcs on the same diagram:
\[ \mathrm{Arg}\, \frac{z - a}{z - b} = \alpha, \ -\alpha, \ \pi - \alpha, \ \alpha - \pi. \]
Also show that $\mathrm{Arg}\, \frac{z - a}{z - b} = \pi$ represents the line segment joining $a$ and $b$, while $\mathrm{Arg}\, \frac{z - a}{z - b} = 0$ represents the remaining portion of the line through $a$ and $b$.

(ii) Use (i) to prove that four distinct points $z_1, z_2, z_3, z_4$ are concyclic or collinear, if and only if the cross–ratio
\[ \frac{z_4 - z_1}{z_4 - z_2} \Big/ \frac{z_3 - z_1}{z_3 - z_2} \]
is real.

(iii) Use (ii) to derive Ptolemy's Theorem: Four distinct points $A, B, C, D$ are concyclic or collinear, if and only if one of the following holds:
\[ AB \cdot CD + BC \cdot AD = AC \cdot BD \]
\[ BD \cdot AC + AD \cdot BC = AB \cdot CD \]
\[ BD \cdot AC + AB \cdot CD = AD \cdot BC. \]

Chapter 6

EIGENVALUES AND EIGENVECTORS

6.1 Motivation

We motivate the chapter on eigenvalues by discussing the equation
\[ ax^2 + 2hxy + by^2 = c, \]
where not all of $a, h, b$ are zero. The expression $ax^2 + 2hxy + by^2$ is called a quadratic form in $x$ and $y$ and we have the identity
\[ ax^2 + 2hxy + by^2 = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & h \\ h & b \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = X^t A X, \]
where $X = \begin{bmatrix} x \\ y \end{bmatrix}$ and $A = \begin{bmatrix} a & h \\ h & b \end{bmatrix}$. $A$ is called the matrix of the quadratic form.

We now rotate the $x, y$ axes anticlockwise through $\theta$ radians to new $x_1, y_1$ axes. The equations describing the rotation of axes are derived as follows:

Let $P$ have coordinates $(x, y)$ relative to the $x, y$ axes and coordinates $(x_1, y_1)$ relative to the $x_1, y_1$ axes. Then referring to Figure 6.1:

Figure 6.1: Rotating the axes.

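\[ \begin{aligned}
x = OQ &= OP \cos(\theta + \alpha) \\
&= OP(\cos\theta\cos\alpha - \sin\theta\sin\alpha) \\
&= (OP\cos\alpha)\cos\theta - (OP\sin\alpha)\sin\theta \\
&= OR\cos\theta - PR\sin\theta \\
&= x_1\cos\theta - y_1\sin\theta.
\end{aligned} \]
Similarly $y = x_1\sin\theta + y_1\cos\theta$.

We can combine these transformation equations into the single matrix equation:
\[ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}, \]
or $X = PY$, where $X = \begin{bmatrix} x \\ y \end{bmatrix}$, $Y = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$ and $P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$.

We note that the columns of $P$ give the directions of the positive $x_1$ and $y_1$ axes. Also $P$ is an orthogonal matrix: we have $PP^t = I_2$ and so $P^{-1} = P^t$. The matrix $P$ has the special property that $\det P = 1$.

A matrix of the type $P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ is called a rotation matrix. We shall show soon that any $2 \times 2$ real orthogonal matrix with determinant equal to 1 is a rotation matrix.

We can also solve for the new coordinates in terms of the old ones:
\[ \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = Y = P^t X = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}, \]
so $x_1 = x\cos\theta + y\sin\theta$ and $y_1 = -x\sin\theta + y\cos\theta$. Then
\[ X^t A X = (PY)^t A (PY) = Y^t (P^t A P) Y. \]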

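Now suppose, as we later show, that it is possible to choose an angle $\theta$ so that $P^t A P$ is a diagonal matrix, say $\mathrm{diag}(\lambda_1, \lambda_2)$. Then
\[ X^t A X = \begin{bmatrix} x_1 & y_1 \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \lambda_1 x_1^2 + \lambda_2 y_1^2 \tag{6.1} \]
and relative to the new axes, the equation $ax^2 + 2hxy + by^2 = c$ becomes $\lambda_1 x_1^2 + \lambda_2 y_1^2 = c$, which is quite easy to sketch. This curve is symmetrical about the $x_1$ and $y_1$ axes, with $P_1$ and $P_2$, the respective columns of $P$, giving the directions of the axes of symmetry.

Also it can be verified that $P_1$ and $P_2$ satisfy the equations
\[ AP_1 = \lambda_1 P_1 \quad \text{and} \quad AP_2 = \lambda_2 P_2. \]
These equations force a restriction on $\lambda_1$ and $\lambda_2$. For if $P_1 = \begin{bmatrix} u_1 \\ v_1 \end{bmatrix}$, the first equation becomes
\[ \begin{bmatrix} a & h \\ h & b \end{bmatrix} \begin{bmatrix} u_1 \\ v_1 \end{bmatrix} = \lambda_1 \begin{bmatrix} u_1 \\ v_1 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} a - \lambda_1 & h \\ h & b - \lambda_1 \end{bmatrix} \begin{bmatrix} u_1 \\ v_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \]
Hence we are dealing with a homogeneous system of two linear equations in two unknowns, having a non–trivial solution $(u_1, v_1)$. Hence
\[ \begin{vmatrix} a - \lambda_1 & h \\ h & b - \lambda_1 \end{vmatrix} = 0. \]
Similarly, $\lambda_2$ satisfies the same equation. In expanded form, $\lambda_1$ and $\lambda_2$ satisfy
\[ \lambda^2 - (a + b)\lambda + ab - h^2 = 0. \]
This equation has real roots
\[ \lambda = \frac{a + b \pm \sqrt{(a + b)^2 - 4(ab - h^2)}}{2} = \frac{a + b \pm \sqrt{(a - b)^2 + 4h^2}}{2}. \tag{6.2} \]
(The roots are distinct if $a \neq b$ or $h \neq 0$. The case $a = b$ and $h = 0$ needs no investigation, as it gives an equation of a circle.)

The equation $\lambda^2 - (a + b)\lambda + ab - h^2 = 0$ is called the eigenvalue equation of the matrix $A$.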
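Formula 6.2 can be tried out on a concrete quadratic form. The following is a small sketch using only Python's standard `math` module; the function name and the coefficients $a = 5$, $h = 2$, $b = 2$ are illustrative assumptions, not values from the text.

```python
import math

def quadratic_form_eigenvalues(a, h, b):
    """Roots of lambda^2 - (a + b) lambda + (ab - h^2) = 0, computed via formula 6.2."""
    root = math.sqrt((a - b) ** 2 + 4 * h ** 2)   # the quantity under the root is never negative
    return (a + b - root) / 2, (a + b + root) / 2

a, h, b = 5, 2, 2   # hypothetical coefficients of a x^2 + 2h xy + b y^2
lam1, lam2 = quadratic_form_eigenvalues(a, h, b)
for lam in (lam1, lam2):
    # Second column: the determinant of [[a - lam, h], [h, b - lam]], which should vanish.
    print(lam, (a - lam) * (b - lam) - h * h)
```

Because $(a - b)^2 + 4h^2 \geq 0$, the square root is always real, which is the computational counterpart of the statement that the eigenvalue equation has real roots.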

6.2 Definitions and examples

DEFINITION 6.2.1 (Eigenvalue, eigenvector) Let $A$ be a complex square matrix. Then if $\lambda$ is a complex number and $X$ a non–zero complex column vector satisfying $AX = \lambda X$, we call $X$ an eigenvector of $A$, while $\lambda$ is called an eigenvalue of $A$. We also say that $X$ is an eigenvector corresponding to the eigenvalue $\lambda$.

So in the above example $P_1$ and $P_2$ are eigenvectors corresponding to $\lambda_1$ and $\lambda_2$, respectively. We shall give an algorithm which starts from the eigenvalues of $A = \begin{bmatrix} a & h \\ h & b \end{bmatrix}$ and constructs a rotation matrix $P$ such that $P^t A P$ is diagonal.

As noted above, if $\lambda$ is an eigenvalue of an $n \times n$ matrix $A$, with corresponding eigenvector $X$, then $(A - \lambda I_n)X = 0$, with $X \neq 0$, so $\det(A - \lambda I_n) = 0$ and there are at most $n$ distinct eigenvalues of $A$.

Conversely if $\det(A - \lambda I_n) = 0$, then $(A - \lambda I_n)X = 0$ has a non–trivial solution $X$ and so $\lambda$ is an eigenvalue of $A$ with $X$ a corresponding eigenvector.

DEFINITION 6.2.2 (Characteristic equation, polynomial) The equation $\det(A - \lambda I_n) = 0$ is called the characteristic equation of $A$, while the polynomial $\det(A - \lambda I_n)$ is called the characteristic polynomial of $A$. The characteristic polynomial of $A$ is often denoted by $\mathrm{ch}_A(\lambda)$.

Hence the eigenvalues of $A$ are the roots of the characteristic polynomial of $A$.

For a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, it is easily verified that the characteristic polynomial is $\lambda^2 - (\mathrm{trace}\, A)\lambda + \det A$, where $\mathrm{trace}\, A = a + d$ is the sum of the diagonal elements of $A$.

EXAMPLE 6.2.1 Find the eigenvalues of $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ and find all eigenvectors.

Solution. The characteristic equation of $A$ is $\lambda^2 - 4\lambda + 3 = 0$, or
\[ (\lambda - 1)(\lambda - 3) = 0. \]
Hence $\lambda = 1$ or 3. The eigenvector equation $(A - \lambda I_n)X = 0$ reduces to
\[ \begin{bmatrix} 2 - \lambda & 1 \\ 1 & 2 - \lambda \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \]
or
\[ \begin{aligned} (2 - \lambda)x + y &= 0 \\ x + (2 - \lambda)y &= 0. \end{aligned} \]
Taking $\lambda = 1$ gives
\[ \begin{aligned} x + y &= 0 \\ x + y &= 0, \end{aligned} \]
which has solution $x = -y$, $y$ arbitrary. Consequently the eigenvectors corresponding to $\lambda = 1$ are the vectors $\begin{bmatrix} -y \\ y \end{bmatrix}$, with $y \neq 0$.

Taking $\lambda = 3$ gives
\[ \begin{aligned} -x + y &= 0 \\ x - y &= 0, \end{aligned} \]
which has solution $x = y$, $y$ arbitrary. Consequently the eigenvectors corresponding to $\lambda = 3$ are the vectors $\begin{bmatrix} y \\ y \end{bmatrix}$, with $y \neq 0$.

Our next result has wide applicability:

THEOREM 6.2.1 Let $A$ be a $2 \times 2$ matrix having distinct eigenvalues $\lambda_1$ and $\lambda_2$ and corresponding eigenvectors $X_1$ and $X_2$. Let $P$ be the matrix whose columns are $X_1$ and $X_2$, respectively. Then $P$ is non–singular and
\[ P^{-1} A P = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}. \]

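Proof. Suppose $AX_1 = \lambda_1 X_1$ and $AX_2 = \lambda_2 X_2$. We show that the system of homogeneous equations
\[ xX_1 + yX_2 = 0 \]
has only the trivial solution. Then by theorem 2.5.10 the matrix $P = [X_1 | X_2]$ is non–singular. So assume
\[ xX_1 + yX_2 = 0. \tag{6.3} \]
Then $A(xX_1 + yX_2) = A0 = 0$, so $x(AX_1) + y(AX_2) = 0$. Hence
\[ x\lambda_1 X_1 + y\lambda_2 X_2 = 0. \tag{6.4} \]
Multiplying equation 6.3 by $\lambda_1$ and subtracting from equation 6.4 gives
\[ (\lambda_2 - \lambda_1) y X_2 = 0. \]
Hence $y = 0$, as $(\lambda_2 - \lambda_1) \neq 0$ and $X_2 \neq 0$. Then from equation 6.3, $xX_1 = 0$ and hence $x = 0$.

Then the equations $AX_1 = \lambda_1 X_1$ and $AX_2 = \lambda_2 X_2$ give
\[ AP = A[X_1 | X_2] = [AX_1 | AX_2] = [\lambda_1 X_1 | \lambda_2 X_2] = [X_1 | X_2] \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = P \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}, \]
so
\[ P^{-1} A P = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}. \]

EXAMPLE 6.2.2 Let $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ be the matrix of example 6.2.1. Then $X_1 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$ and $X_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ are eigenvectors corresponding to eigenvalues 1 and 3, respectively. Hence if $P = \begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix}$, we have
\[ P^{-1} A P = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}. \]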
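The construction of theorem 6.2.1 can be mimicked in exact arithmetic. The sketch below is one possible way to do it, assuming Python's standard `fractions` module; the helper names are illustrative. It relies on the observation that if $\lambda$ is an eigenvalue, the rows of $A - \lambda I_2$ are proportional, so any non–zero row $(p, q)$ yields the eigenvector $(q, -p)$. The eigenvectors it returns may differ from those of Example 6.2.2 by a non–zero scalar factor, which does not affect $P^{-1}AP$.

```python
from fractions import Fraction

def eigenvector_2x2(A, lam):
    """Return a non-zero X with (A - lam I)X = 0, assuming lam is an eigenvalue of A."""
    (a, b), (c, d) = A
    for p, q in ((a - lam, b), (c, d - lam)):
        if p != 0 or q != 0:
            return (q, -p)        # solves p*x + q*y = 0, hence both proportional rows
    return (1, 0)                 # A = lam I: every non-zero vector is an eigenvector

def matmul(M, N):
    """Product of two 2 x 2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def inverse(M):
    """Exact inverse of a non-singular 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return ((d / det, -b / det), (-c / det, a / det))

A = ((2, 1), (1, 2))                       # the matrix of Example 6.2.1
X1 = eigenvector_2x2(A, 1)
X2 = eigenvector_2x2(A, 3)
P = ((X1[0], X2[0]), (X1[1], X2[1]))       # columns of P are X1 and X2
print(matmul(matmul(inverse(P), A), P))    # entries are printed as Fraction objects
```

The printed matrix should be the diagonal matrix with entries 1 and 3, in the order in which the eigenvalues were supplied.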

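There are two immediate applications of theorem 6.2.1. The first is to the calculation of $A^n$: If $P^{-1}AP = \mathrm{diag}(\lambda_1, \lambda_2)$, then $A = P\, \mathrm{diag}(\lambda_1, \lambda_2) P^{-1}$ and
\[ A^n = \left( P \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} P^{-1} \right)^n = P \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}^n P^{-1} = P \begin{bmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{bmatrix} P^{-1}. \]

The second application is to solving a system of linear differential equations
\[ \begin{aligned} \frac{dx}{dt} &= ax + by \\ \frac{dy}{dt} &= cx + dy, \end{aligned} \]
where $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is a matrix of real or complex numbers and $x$ and $y$ are functions of $t$. The system can be written in matrix form as $\dot{X} = AX$, where
\[ X = \begin{bmatrix} x \\ y \end{bmatrix} \quad \text{and} \quad \dot{X} = \begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = \begin{bmatrix} \frac{dx}{dt} \\ \frac{dy}{dt} \end{bmatrix}. \]

We make the substitution $X = PY$, where $Y = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$. Then $x_1$ and $y_1$ are also functions of $t$ and
\[ \dot{X} = P\dot{Y} = AX = A(PY), \quad \text{so} \quad \dot{Y} = (P^{-1}AP)Y = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} Y. \]
Hence $\dot{x}_1 = \lambda_1 x_1$ and $\dot{y}_1 = \lambda_2 y_1$.

These differential equations are well–known to have the solutions $x_1 = x_1(0)e^{\lambda_1 t}$ and $y_1 = y_1(0)e^{\lambda_2 t}$, where $x_1(0)$ is the value of $x_1$ when $t = 0$.

[If $\frac{dx}{dt} = kx$, where $k$ is a constant, then
\[ \frac{d}{dt}\left( e^{-kt}x \right) = -ke^{-kt}x + e^{-kt}\frac{dx}{dt} = -ke^{-kt}x + e^{-kt}kx = 0. \]
Hence $e^{-kt}x$ is constant, so $e^{-kt}x = e^{-k0}x(0) = x(0)$. Hence $x = x(0)e^{kt}$.]

However $\begin{bmatrix} x_1(0) \\ y_1(0) \end{bmatrix} = P^{-1} \begin{bmatrix} x(0) \\ y(0) \end{bmatrix}$, so this determines $x_1(0)$ and $y_1(0)$ in terms of $x(0)$ and $y(0)$. Hence ultimately $x$ and $y$ are determined as explicit functions of $t$, using the equation $X = PY$.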
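Both applications can be sketched in a few lines of code. The following assumes Python's standard `math` and `fractions` modules; the function names are illustrative, and the matrix $A$ with rows $(2, 1)$ and $(1, 2)$, its eigenvalues 1 and 3, and the eigenvector matrix $P$ are those of Examples 6.2.1 and 6.2.2. The initial values used for the differential equations are arbitrary.

```python
import math
from fractions import Fraction

def matmul(M, N):
    """Product of two 2 x 2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def inverse(M):
    """Inverse of a non-singular 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return ((d / det, -b / det), (-c / det, a / det))

def power(P, lam1, lam2, n):
    """A^n = P diag(lam1^n, lam2^n) P^{-1}."""
    D = ((lam1 ** n, 0), (0, lam2 ** n))
    return matmul(matmul(P, D), inverse(P))

def solve_ode(P, lam1, lam2, x0, y0, t):
    """Value at time t of the solution of X' = AX with X(0) = (x0, y0)."""
    Pinv = inverse(P)
    x1_0 = Pinv[0][0] * x0 + Pinv[0][1] * y0     # (x1(0), y1(0)) = P^{-1} (x(0), y(0))
    y1_0 = Pinv[1][0] * x0 + Pinv[1][1] * y0
    x1 = x1_0 * math.exp(lam1 * t)               # x1 = x1(0) e^{lam1 t}
    y1 = y1_0 * math.exp(lam2 * t)               # y1 = y1(0) e^{lam2 t}
    return (P[0][0] * x1 + P[0][1] * y1, P[1][0] * x1 + P[1][1] * y1)   # X = P Y

P = ((-1, 1), (1, 1))
print(power(P, 1, 3, 3))                  # should equal A*A*A
print(solve_ode(P, 1, 3, 1.0, 0.0, 0.5))  # (x(0.5), y(0.5)) for x(0) = 1, y(0) = 0
```

The sketch handles real eigenvalues only, because `math.exp` does not accept complex arguments; a complex variant would use `cmath.exp` instead.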

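EXAMPLE 6.2.3 Let $A = \begin{bmatrix} 2 & -3 \\ 4 & -5 \end{bmatrix}$. Use the eigenvalue method to derive an explicit formula for $A^n$ and also solve the system of differential equations
\[ \begin{aligned} \frac{dx}{dt} &= 2x - 3y \\ \frac{dy}{dt} &= 4x - 5y, \end{aligned} \]
given $x = 7$ and $y = 13$ when $t = 0$.

Solution. The characteristic polynomial of $A$ is $\lambda^2 + 3\lambda + 2$ which has distinct roots $\lambda_1 = -1$ and $\lambda_2 = -2$. We find corresponding eigenvectors $X_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $X_2 = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$. Hence if $P = \begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix}$, we have $P^{-1}AP = \mathrm{diag}(-1, -2)$. Hence
\[ \begin{aligned}
A^n &= \left( P\, \mathrm{diag}(-1, -2) P^{-1} \right)^n = P\, \mathrm{diag}((-1)^n, (-2)^n) P^{-1} \\
&= \begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} (-1)^n & 0 \\ 0 & (-2)^n \end{bmatrix} \begin{bmatrix} 4 & -3 \\ -1 & 1 \end{bmatrix} \\
&= (-1)^n \begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2^n \end{bmatrix} \begin{bmatrix} 4 & -3 \\ -1 & 1 \end{bmatrix} \\
&= (-1)^n \begin{bmatrix} 1 & 3 \times 2^n \\ 1 & 4 \times 2^n \end{bmatrix} \begin{bmatrix} 4 & -3 \\ -1 & 1 \end{bmatrix} \\
&= (-1)^n \begin{bmatrix} 4 - 3 \times 2^n & -3 + 3 \times 2^n \\ 4 - 4 \times 2^n & -3 + 4 \times 2^n \end{bmatrix}.
\end{aligned} \]

To solve the differential equation system, make the substitution $X = PY$. Then $x = x_1 + 3y_1$, $y = x_1 + 4y_1$. The system then becomes
\[ \begin{aligned} \dot{x}_1 &= -x_1 \\ \dot{y}_1 &= -2y_1, \end{aligned} \]
so $x_1 = x_1(0)e^{-t}$, $y_1 = y_1(0)e^{-2t}$. Now
\[ \begin{bmatrix} x_1(0) \\ y_1(0) \end{bmatrix} = P^{-1} \begin{bmatrix} x(0) \\ y(0) \end{bmatrix} = \begin{bmatrix} 4 & -3 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 7 \\ 13 \end{bmatrix} = \begin{bmatrix} -11 \\ 6 \end{bmatrix}, \]
so $x_1 = -11e^{-t}$ and $y_1 = 6e^{-2t}$. Hence $x = -11e^{-t} + 3(6e^{-2t}) = -11e^{-t} + 18e^{-2t}$, $y = -11e^{-t} + 4(6e^{-2t}) = -11e^{-t} + 24e^{-2t}$.

For a more complicated example we solve a system of inhomogeneous recurrence relations.

EXAMPLE 6.2.4 Solve the system of recurrence relations
\[ \begin{aligned} x_{n+1} &= 2x_n - y_n - 1 \\ y_{n+1} &= -x_n + 2y_n + 2, \end{aligned} \]
given that $x_0 = 0$ and $y_0 = -1$.

Solution. The system can be written in matrix form as
\[ X_{n+1} = AX_n + B, \]
where
\[ A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} -1 \\ 2 \end{bmatrix}. \]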

It is then an easy induction to prove that
\[ X_n = A^n X_0 + (A^{n-1} + \cdots + A + I_2)B. \tag{6.5} \]
Also it is easy to verify by the eigenvalue method that
\[ A^n = \frac{1}{2} \begin{bmatrix} 1 + 3^n & 1 - 3^n \\ 1 - 3^n & 1 + 3^n \end{bmatrix} = \frac{1}{2}U + \frac{3^n}{2}V, \]
where $U = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$ and $V = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$. Hence
\[ \begin{aligned}
A^{n-1} + \cdots + A + I_2 &= \frac{n}{2}U + \frac{(3^{n-1} + \cdots + 3 + 1)}{2}V \\
&= \frac{n}{2}U + \frac{(3^n - 1)}{4}V.
\end{aligned} \]
Then equation 6.5 gives
\[ X_n = \left( \frac{1}{2}U + \frac{3^n}{2}V \right) \begin{bmatrix} 0 \\ -1 \end{bmatrix} + \left( \frac{n}{2}U + \frac{(3^n - 1)}{4}V \right) \begin{bmatrix} -1 \\ 2 \end{bmatrix}, \]
which simplifies to
\[ \begin{bmatrix} x_n \\ y_n \end{bmatrix} = \begin{bmatrix} (2n + 1 - 3^n)/4 \\ (2n - 5 + 3^n)/4 \end{bmatrix}. \]
Hence $x_n = (2n + 1 - 3^n)/4$ and $y_n = (2n - 5 + 3^n)/4$.

REMARK 6.2.1 If $(A - I_2)^{-1}$ existed (that is, if $\det(A - I_2) \neq 0$, or equivalently, if 1 is not an eigenvalue of $A$), then we could have used the formula
\[ A^{n-1} + \cdots + A + I_2 = (A^n - I_2)(A - I_2)^{-1}. \tag{6.6} \]
However the eigenvalues of $A$ are 1 and 3 in the above problem, so formula 6.6 cannot be used there.

Our discussion of eigenvalues and eigenvectors has been limited to $2 \times 2$ matrices. The discussion is more complicated for matrices of size greater than two and is best left to a second course in linear algebra. Nevertheless the following result is a useful generalization of theorem 6.2.1. The reader is referred to [28, page 350] for a proof.

THEOREM 6.2.2 Let $A$ be an $n \times n$ matrix having distinct eigenvalues $\lambda_1, \ldots, \lambda_n$ and corresponding eigenvectors $X_1, \ldots, X_n$. Let $P$ be the matrix whose columns are respectively $X_1, \ldots, X_n$. Then $P$ is non–singular and
\[ P^{-1}AP = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}. \]

Another useful result which covers the case where there are multiple eigenvalues is the following (The reader is referred to [28, pages 351–352] for a proof):

THEOREM 6.2.3 Suppose the characteristic polynomial of $A$ has the factorization
\[ \det(\lambda I_n - A) = (\lambda - c_1)^{n_1} \cdots (\lambda - c_t)^{n_t}, \]
where $c_1, \ldots, c_t$ are the distinct eigenvalues of $A$. Suppose that for $i = 1, \ldots, t$, we have $\mathrm{nullity}(c_i I_n - A) = n_i$. For each $i$, choose a basis $X_{i1}, \ldots, X_{in_i}$ for the eigenspace $N(c_i I_n - A)$. Then the matrix
\[ P = [X_{11} | \cdots | X_{1n_1} | \cdots | X_{t1} | \cdots | X_{tn_t}] \]
is non–singular and $P^{-1}AP$ is the following diagonal matrix
\[ P^{-1}AP = \begin{bmatrix} c_1 I_{n_1} & 0 & \cdots & 0 \\ 0 & c_2 I_{n_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_t I_{n_t} \end{bmatrix}. \]
(The notation means that on the diagonal there are $n_1$ elements $c_1$, followed by $n_2$ elements $c_2$, ..., $n_t$ elements $c_t$.)
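Returning to Example 6.2.4, a closed form obtained by the eigenvalue method can be checked against direct iteration of the recurrence. The sketch below assumes Python's standard `fractions` module and tests the closed form $x_n = (2n + 1 - 3^n)/4$, $y_n = (2n - 5 + 3^n)/4$; the function names are illustrative.

```python
from fractions import Fraction

def iterate(n):
    """Apply x -> 2x - y - 1, y -> -x + 2y + 2 exactly n times, starting from (0, -1)."""
    x, y = 0, -1
    for _ in range(n):
        x, y = 2 * x - y - 1, -x + 2 * y + 2   # both updates use the old x and y
    return x, y

def closed_form(n):
    """Closed form for (x_n, y_n), kept exact by using fractions."""
    return Fraction(2 * n + 1 - 3 ** n, 4), Fraction(2 * n - 5 + 3 ** n, 4)

for n in range(6):
    print(n, iterate(n), closed_form(n))
```

Agreement for the first few values of $n$ is not a proof, but it is a quick way to catch a sign slip in the algebra.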

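6.3 PROBLEMS

1. Let $A = \begin{bmatrix} 4 & -3 \\ 1 & 0 \end{bmatrix}$. Find a non–singular matrix $P$ such that $P^{-1}AP = \mathrm{diag}(1, 3)$ and hence prove that
\[ A^n = \frac{3^n - 1}{2}A + \frac{3 - 3^n}{2}I_2. \]

2. If $A = \begin{bmatrix} 0.6 & 0.8 \\ 0.4 & 0.2 \end{bmatrix}$, prove that $A^n$ tends to a limiting matrix
\[ \begin{bmatrix} 2/3 & 2/3 \\ 1/3 & 1/3 \end{bmatrix} \]
as $n \to \infty$.

3. Solve the system of differential equations
\[ \begin{aligned} \frac{dx}{dt} &= 3x - 2y \\ \frac{dy}{dt} &= 5x - 4y, \end{aligned} \]
given $x = 13$ and $y = 22$ when $t = 0$.

[Answer: $x = 7e^t + 6e^{-2t}$, $y = 7e^t + 15e^{-2t}$.]

4. Solve the system of recurrence relations
\[ \begin{aligned} x_{n+1} &= 3x_n - y_n \\ y_{n+1} &= -x_n + 3y_n, \end{aligned} \]
given that $x_0 = 1$ and $y_0 = 2$.

[Answer: $x_n = 2^{n-1}(3 - 2^n)$, $y_n = 2^{n-1}(3 + 2^n)$.]

5. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be a real or complex matrix with distinct eigenvalues $\lambda_1, \lambda_2$ and corresponding eigenvectors $X_1, X_2$. Also let $P = [X_1 | X_2]$.

(a) Prove that the system of recurrence relations
\[ \begin{aligned} x_{n+1} &= ax_n + by_n \\ y_{n+1} &= cx_n + dy_n \end{aligned} \]
has the solution
\[ \begin{bmatrix} x_n \\ y_n \end{bmatrix} = \alpha\lambda_1^n X_1 + \beta\lambda_2^n X_2, \]
where $\alpha$ and $\beta$ are determined by the equation
\[ \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = P^{-1} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}. \]

(b) Prove that the system of differential equations
\[ \begin{aligned} \frac{dx}{dt} &= ax + by \\ \frac{dy}{dt} &= cx + dy \end{aligned} \]
has the solution
\[ \begin{bmatrix} x \\ y \end{bmatrix} = \alpha e^{\lambda_1 t} X_1 + \beta e^{\lambda_2 t} X_2, \]
where $\alpha$ and $\beta$ are determined by the equation
\[ \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = P^{-1} \begin{bmatrix} x(0) \\ y(0) \end{bmatrix}. \]

6. Let $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ be a real matrix with non–real eigenvalues $\lambda = a + ib$ and $\overline{\lambda} = a - ib$, with corresponding eigenvectors $X = U + iV$ and $\overline{X} = U - iV$, where $U$ and $V$ are real vectors. Also let $P$ be the real matrix defined by $P = [U | V]$. Finally let $a + ib = re^{i\theta}$, where $r > 0$ and $\theta$ is real.

(a) Prove that
\[ \begin{aligned} AU &= aU - bV \\ AV &= bU + aV. \end{aligned} \]

(b) Deduce that
\[ P^{-1}AP = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}. \]

(c) Prove that the system of recurrence relations
\[ \begin{aligned} x_{n+1} &= a_{11}x_n + a_{12}y_n \\ y_{n+1} &= a_{21}x_n + a_{22}y_n \end{aligned} \]
has the solution
\[ \begin{bmatrix} x_n \\ y_n \end{bmatrix} = r^n \{ (\alpha U + \beta V)\cos n\theta + (\beta U - \alpha V)\sin n\theta \}, \]
where $\alpha$ and $\beta$ are determined by the equation
\[ \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = P^{-1} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}. \]

(d) Prove that the system of differential equations
\[ \begin{aligned} \frac{dx}{dt} &= a_{11}x + a_{12}y \\ \frac{dy}{dt} &= a_{21}x + a_{22}y \end{aligned} \]
has the solution
\[ \begin{bmatrix} x \\ y \end{bmatrix} = e^{at} \{ (\alpha U + \beta V)\cos bt + (\beta U - \alpha V)\sin bt \}, \]
where $\alpha$ and $\beta$ are determined by the equation
\[ \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = P^{-1} \begin{bmatrix} x(0) \\ y(0) \end{bmatrix}. \]
[Hint: Let $\begin{bmatrix} x \\ y \end{bmatrix} = P \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$. Also let $z = x_1 + iy_1$. Prove that
\[ \dot{z} = (a - ib)z \]
and deduce that
\[ x_1 + iy_1 = e^{at}(\alpha + i\beta)(\cos bt - i\sin bt). \]
Then equate real and imaginary parts to solve for $x_1, y_1$ and hence $x, y$.]

7. (The case of repeated eigenvalues.) Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and suppose that the characteristic polynomial of $A$, $\lambda^2 - (a + d)\lambda + (ad - bc)$, has a repeated root $\alpha$. Also assume that $A \neq \alpha I_2$. Let $B = A - \alpha I_2$.

(i) Prove that $(a - d)^2 + 4bc = 0$.

(ii) Prove that $B^2 = 0$.

(iii) Prove that $BX_2 \neq 0$ for some vector $X_2$; indeed, show that $X_2$ can be taken to be $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ or $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$.

(iv) Let $X_1 = BX_2$. Prove that $P = [X_1 | X_2]$ is non–singular,
\[ AX_1 = \alpha X_1 \quad \text{and} \quad AX_2 = \alpha X_2 + X_1 \]
and deduce that
\[ P^{-1}AP = \begin{bmatrix} \alpha & 1 \\ 0 & \alpha \end{bmatrix}. \]

8. Use the previous result to solve the system of differential equations
\[ \begin{aligned} \frac{dx}{dt} &= 4x - y \\ \frac{dy}{dt} &= 4x + 8y, \end{aligned} \]
given that $x = 1 = y$ when $t = 0$.

[To solve the differential equation
\[ \frac{dx}{dt} - kx = f(t), \quad k \text{ a constant}, \]
multiply throughout by $e^{-kt}$, thereby converting the left–hand side to $\frac{d}{dt}(e^{-kt}x)$.]

[Answer: $x = (1 - 3t)e^{6t}$, $y = (1 + 6t)e^{6t}$.]

9. Let
\[ A = \begin{bmatrix} 1/2 & 1/2 & 0 \\ 1/4 & 1/4 & 1/2 \\ 1/4 & 1/4 & 1/2 \end{bmatrix}. \]

(a) Verify that $\det(\lambda I_3 - A)$, the characteristic polynomial of $A$, is given by
\[ (\lambda - 1)\lambda(\lambda - \tfrac{1}{4}). \]

(b) Find a non–singular matrix $P$ such that $P^{-1}AP = \mathrm{diag}(1, 0, \frac{1}{4})$.

(c) Prove that
\[ A^n = \frac{1}{3} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} + \frac{1}{3 \cdot 4^n} \begin{bmatrix} 2 & 2 & -4 \\ -1 & -1 & 2 \\ -1 & -1 & 2 \end{bmatrix} \]
if $n \geq 1$.

10. Let
\[ A = \begin{bmatrix} 5 & 2 & -2 \\ 2 & 5 & -2 \\ -2 & -2 & 5 \end{bmatrix}. \]

(a) Verify that $\det(\lambda I_3 - A)$, the characteristic polynomial of $A$, is given by
\[ (\lambda - 3)^2(\lambda - 9). \]

(b) Find a non–singular matrix $P$ such that $P^{-1}AP = \mathrm{diag}(3, 3, 9)$.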

Chapter 7

Identifying second degree equations

7.1 The eigenvalue method

In this section we apply eigenvalue methods to determine the geometrical nature of the second degree equation
\[ ax^2 + 2hxy + by^2 + 2gx + 2fy + c = 0, \tag{7.1} \]
where not all of $a, h, b$ are zero.

Let $A = \begin{bmatrix} a & h \\ h & b \end{bmatrix}$ be the matrix of the quadratic form $ax^2 + 2hxy + by^2$. We saw in section 6.1, equation 6.2 that $A$ has real eigenvalues $\lambda_1$ and $\lambda_2$, given by
\[ \lambda_1 = \frac{a + b - \sqrt{(a - b)^2 + 4h^2}}{2}, \quad \lambda_2 = \frac{a + b + \sqrt{(a - b)^2 + 4h^2}}{2}. \]

We show that it is always possible to rotate the $x, y$ axes to $x_1, y_1$ axes whose positive directions are determined by eigenvectors $X_1$ and $X_2$ corresponding to $\lambda_1$ and $\lambda_2$ in such a way that relative to the $x_1, y_1$ axes, equation 7.1 takes the form
\[ a'x^2 + b'y^2 + 2g'x + 2f'y + c = 0. \tag{7.2} \]
Then by completing the square and suitably translating the $x_1, y_1$ axes, to new $x_2, y_2$ axes, equation 7.2 can be reduced to one of several standard forms, each of which is easy to sketch. We need some preliminary definitions.

DEFINITION 7.1.1 (Orthogonal matrix) An $n \times n$ real matrix $P$ is called orthogonal if
\[ P^t P = I_n. \]
It follows that if $P$ is orthogonal, then $\det P = \pm 1$. For
\[ \det(P^t P) = \det P^t \det P = (\det P)^2, \]
so $(\det P)^2 = \det I_n = 1$. Hence $\det P = \pm 1$.

If $P$ is an orthogonal matrix with $\det P = 1$, then $P$ is called a proper orthogonal matrix.

THEOREM 7.1.1 If $P$ is a $2 \times 2$ orthogonal matrix with $\det P = 1$, then
\[ P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]
for some $\theta$.

REMARK 7.1.1 Hence, by the discussion at the beginning of Chapter 6, if $P$ is a proper orthogonal matrix, the coordinate transformation
\[ \begin{bmatrix} x \\ y \end{bmatrix} = P \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} \]
represents a rotation of the axes, with new $x_1$ and $y_1$ axes given by the respective columns of $P$.

Proof. Suppose that $P^t P = I_2$, where $\Delta = \det P = 1$. Let
\[ P = \begin{bmatrix} a & b \\ c & d \end{bmatrix}. \]
Then the equation
\[ P^t = P^{-1} = \frac{1}{\Delta}\, \mathrm{adj}\, P \]
gives
\[ \begin{bmatrix} a & c \\ b & d \end{bmatrix} = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \]
Hence $a = d$, $b = -c$ and so
\[ P = \begin{bmatrix} a & -c \\ c & a \end{bmatrix}, \]
where $a^2 + c^2 = 1$. But then the point $(a, c)$ lies on the unit circle, so $a = \cos\theta$ and $c = \sin\theta$, where $\theta$ is uniquely determined up to multiples of $2\pi$.
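The proof is constructive: the angle can be read off from the first column $(a, c)$ of $P$. Here is a minimal sketch using Python's standard `math` module; the function name and the tolerance parameter `tol` are assumptions made for working in floating point.

```python
import math

def rotation_angle(P, tol=1e-9):
    """Return theta in (-pi, pi] with P = [[cos t, -sin t], [sin t, cos t]].

    Raises ValueError when P is not proper orthogonal, up to the tolerance tol.
    """
    (a, b), (c, d) = P
    # P^t P = I means the columns (a, c) and (b, d) are orthogonal unit vectors.
    orthogonal = (
        abs(a * a + c * c - 1) < tol
        and abs(b * b + d * d - 1) < tol
        and abs(a * b + c * d) < tol
    )
    if not orthogonal or abs(a * d - b * c - 1) > tol:
        raise ValueError("P is not a proper orthogonal matrix")
    return math.atan2(c, a)    # the point (a, c) = (cos theta, sin theta) on the unit circle

theta = 2.0   # arbitrary sample angle
P = ((math.cos(theta), -math.sin(theta)), (math.sin(theta), math.cos(theta)))
print(rotation_angle(P))
```

Since `math.atan2` returns a value in $(-\pi, \pi]$, the sketch picks one representative of the angle, which is determined only up to multiples of $2\pi$.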

DEFINITION 7.1.2 (Dot product). If $X = \begin{bmatrix} a \\ b \end{bmatrix}$ and $Y = \begin{bmatrix} c \\ d \end{bmatrix}$, then $X \cdot Y$, the dot product of $X$ and $Y$, is defined by
\[ X \cdot Y = ac + bd. \]
The dot product has the following properties:

(i) $X \cdot (Y + Z) = X \cdot Y + X \cdot Z$;
(ii) $X \cdot Y = Y \cdot X$;
(iii) $(tX) \cdot Y = t(X \cdot Y)$;
(iv) $X \cdot X = a^2 + b^2$ if $X = \begin{bmatrix} a \\ b \end{bmatrix}$;
(v) $X \cdot Y = X^t Y$.

The length of $X$ is defined by
\[ ||X|| = \sqrt{a^2 + b^2} = (X \cdot X)^{1/2}. \]
We see that $||X||$ is the distance between the origin $O = (0, 0)$ and the point $(a, b)$.

THEOREM 7.1.2 (Geometrical interpretation of the dot product) Let $A = (x_1, y_1)$ and $B = (x_2, y_2)$ be points, each distinct from the origin $O = (0, 0)$. Then if $X = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$ and $Y = \begin{bmatrix} x_2 \\ y_2 \end{bmatrix}$, we have
\[ X \cdot Y = OA \cdot OB \cos\theta, \]
where $\theta$ is the angle between the rays $OA$ and $OB$.

Proof. By the cosine law applied to triangle $OAB$, we have
\[ AB^2 = OA^2 + OB^2 - 2\, OA \cdot OB \cos\theta. \tag{7.3} \]
Now $AB^2 = (x_2 - x_1)^2 + (y_2 - y_1)^2$, $OA^2 = x_1^2 + y_1^2$, $OB^2 = x_2^2 + y_2^2$. Substituting in equation 7.3 then gives
\[ (x_2 - x_1)^2 + (y_2 - y_1)^2 = (x_1^2 + y_1^2) + (x_2^2 + y_2^2) - 2\, OA \cdot OB \cos\theta, \]
which simplifies to give $OA \cdot OB \cos\theta = x_1 x_2 + y_1 y_2 = X \cdot Y$.

It follows from theorem 7.1.2 that if $A = (x_1, y_1)$ and $B = (x_2, y_2)$ are points distinct from $O = (0, 0)$ and $X = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$ and $Y = \begin{bmatrix} x_2 \\ y_2 \end{bmatrix}$, then $X \cdot Y = 0$ means that the rays $OA$ and $OB$ are perpendicular. This is the reason for the following definition:

DEFINITION 7.1.3 (Orthogonal vectors) Vectors $X$ and $Y$ are called orthogonal if
\[ X \cdot Y = 0. \]
There is also a connection with orthogonal matrices:

THEOREM 7.1.3 Let $P$ be a $2 \times 2$ real matrix. Then $P$ is an orthogonal matrix if and only if the columns of $P$ are orthogonal and have unit length.

Proof. $P$ is orthogonal if and only if $P^t P = I_2$. Now if $P = [X_1 | X_2]$, the matrix $P^t P$ is an important matrix called the Gram matrix of the column vectors $X_1$ and $X_2$. It is easy to prove that
\[ P^t P = [X_i \cdot X_j] = \begin{bmatrix} X_1 \cdot X_1 & X_1 \cdot X_2 \\ X_2 \cdot X_1 & X_2 \cdot X_2 \end{bmatrix}. \]
Hence the equation $P^t P = I_2$ is equivalent to
\[ \begin{bmatrix} X_1 \cdot X_1 & X_1 \cdot X_2 \\ X_2 \cdot X_1 & X_2 \cdot X_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \]
or, equating corresponding elements of both sides:
\[ X_1 \cdot X_1 = 1, \quad X_1 \cdot X_2 = 0, \quad X_2 \cdot X_2 = 1, \]
which says that the columns of $P$ are orthogonal and of unit length.

The next theorem describes a fundamental property of real symmetric matrices and the proof generalizes to symmetric matrices of any size.

THEOREM 7.1.4 If $X_1$ and $X_2$ are eigenvectors corresponding to distinct eigenvalues $\lambda_1$ and $\lambda_2$ of a real symmetric matrix $A$, then $X_1$ and $X_2$ are orthogonal vectors.

Proof. Suppose
\[ AX_1 = \lambda_1 X_1, \quad AX_2 = \lambda_2 X_2, \tag{7.4} \]
where $X_1$ and $X_2$ are non–zero column vectors, $A^t = A$ and $\lambda_1 \neq \lambda_2$.

We have to prove that $X_1^t X_2 = 0$. From equation 7.4,
\[ X_2^t A X_1 = \lambda_1 X_2^t X_1 \tag{7.5} \]
and
\[ X_1^t A X_2 = \lambda_2 X_1^t X_2. \tag{7.6} \]
From equation 7.5, taking transposes,
\[ (X_2^t A X_1)^t = (\lambda_1 X_2^t X_1)^t \]
so
\[ X_1^t A^t X_2 = \lambda_1 X_1^t X_2. \]
Hence
\[ X_1^t A X_2 = \lambda_1 X_1^t X_2. \tag{7.7} \]
Finally, subtracting equation 7.6 from equation 7.7, we have
\[ (\lambda_1 - \lambda_2) X_1^t X_2 = 0 \]
and hence, since $\lambda_1 \neq \lambda_2$,
\[ X_1^t X_2 = 0. \]

THEOREM 7.1.5 Let $A$ be a real $2 \times 2$ symmetric matrix with distinct eigenvalues $\lambda_1$ and $\lambda_2$. Then a proper orthogonal $2 \times 2$ matrix $P$ exists such that
\[ P^t A P = \mathrm{diag}(\lambda_1, \lambda_2). \]
Also the rotation of axes
\[ \begin{bmatrix} x \\ y \end{bmatrix} = P \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} \]
"diagonalizes" the quadratic form corresponding to $A$:
\[ X^t A X = \lambda_1 x_1^2 + \lambda_2 y_1^2. \]

Proof. Let $X_1$ and $X_2$ be eigenvectors corresponding to $\lambda_1$ and $\lambda_2$. Then by theorem 7.1.4, $X_1$ and $X_2$ are orthogonal. By dividing $X_1$ and $X_2$ by their lengths (i.e. normalizing $X_1$ and $X_2$) if necessary, we can assume that $X_1$ and $X_2$ have unit length. Then by theorem 7.1.3, $P = [X_1 | X_2]$ is an orthogonal matrix. By replacing $X_1$ by $-X_1$, if necessary, we can assume that $\det P = 1$. Then by theorem 6.2.1, we have
\[ P^t A P = P^{-1} A P = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}. \]
Also under the rotation $X = PY$,
\[ \begin{aligned}
X^t A X &= (PY)^t A (PY) = Y^t (P^t A P) Y = Y^t\, \mathrm{diag}(\lambda_1, \lambda_2)\, Y \\
&= \lambda_1 x_1^2 + \lambda_2 y_1^2.
\end{aligned} \]

EXAMPLE 7.1.1 Let $A$ be the symmetric matrix
\[ A = \begin{bmatrix} 12 & -6 \\ -6 & 7 \end{bmatrix}. \]
Find a proper orthogonal matrix $P$ such that $P^t A P$ is diagonal.

Solution. The characteristic equation of $A$ is $\lambda^2 - 19\lambda + 48 = 0$, or
\[ (\lambda - 16)(\lambda - 3) = 0. \]
Hence $A$ has distinct eigenvalues $\lambda_1 = 16$ and $\lambda_2 = 3$. We find corresponding eigenvectors
\[ X_1 = \begin{bmatrix} -3 \\ 2 \end{bmatrix} \quad \text{and} \quad X_2 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}. \]
Now $||X_1|| = ||X_2|| = \sqrt{13}$. So we take
\[ X_1 = \frac{1}{\sqrt{13}} \begin{bmatrix} -3 \\ 2 \end{bmatrix} \quad \text{and} \quad X_2 = \frac{1}{\sqrt{13}} \begin{bmatrix} 2 \\ 3 \end{bmatrix}. \]
Then if $P = [X_1 | X_2]$, the proof of theorem 7.1.5 shows that
\[ P^t A P = \begin{bmatrix} 16 & 0 \\ 0 & 3 \end{bmatrix}. \]
However $\det P = -1$, so replacing $X_1$ by $-X_1$ will give $\det P = 1$.
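The steps in the proof of theorem 7.1.5 (find eigenvectors, normalize them, flip a sign if the determinant is negative) can be sketched as follows. This is an illustration only, using Python's standard `math` module; the function names are not from the text, and the eigenvalues are assumed to be known and distinct. The matrix returned may differ from the one found by hand by the signs of its columns, but it is still proper orthogonal.

```python
import math

def proper_orthogonal_diagonalizer(a, h, b, lam1, lam2):
    """Proper orthogonal P with P^t A P = diag(lam1, lam2) for A = [[a, h], [h, b]].

    lam1 and lam2 are assumed to be the distinct eigenvalues of A.
    """
    columns = []
    for lam in (lam1, lam2):
        rows = ((a - lam, h), (h, b - lam))
        p, q = next(r for r in rows if r != (0, 0))   # a non-zero row of A - lam I
        norm = math.hypot(p, q)
        columns.append((q / norm, -p / norm))          # unit eigenvector
    (u1, v1), (u2, v2) = columns
    if u1 * v2 - u2 * v1 < 0:                          # det P = -1: replace X1 by -X1
        u1, v1 = -u1, -v1
    return ((u1, u2), (v1, v2))

def matmul(M, N):
    """Product of two 2 x 2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def transpose(M):
    return tuple(zip(*M))

A = ((12, -6), (-6, 7))                                # the matrix of Example 7.1.1
P = proper_orthogonal_diagonalizer(12, -6, 7, 16, 3)
print(matmul(matmul(transpose(P), A), P))              # should be close to diag(16, 3)
```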

Figure 7.1: $12x^2 - 12xy + 7y^2 + 60x - 38y + 31 = 0$.

REMARK 7.1.2 (A shortcut) Once we have determined one eigenvector $X_1 = \begin{bmatrix} a \\ b \end{bmatrix}$, the other can be taken to be $\begin{bmatrix} -b \\ a \end{bmatrix}$, as these vectors are always orthogonal. Also $P = [X_1 | X_2]$ will have $\det P = a^2 + b^2 > 0$.

We now apply the above ideas to determine the geometric nature of second degree equations in $x$ and $y$.

EXAMPLE 7.1.2 Sketch the curve determined by the equation
\[ 12x^2 - 12xy + 7y^2 + 60x - 38y + 31 = 0. \]

Solution. With $P$ taken to be the proper orthogonal matrix defined in the previous example by
\[ P = \begin{bmatrix} 3/\sqrt{13} & 2/\sqrt{13} \\ -2/\sqrt{13} & 3/\sqrt{13} \end{bmatrix}, \]
then as theorem 7.1.1 predicts, $P$ is a rotation matrix and the transformation
\[ X = \begin{bmatrix} x \\ y \end{bmatrix} = PY = P \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} \]
or more explicitly
\[ x = \frac{3x_1 + 2y_1}{\sqrt{13}}, \quad y = \frac{-2x_1 + 3y_1}{\sqrt{13}}, \tag{7.8} \]
will rotate the $x, y$ axes to positions given by the respective columns of $P$. (More generally, we can always arrange for the $x_1$ axis to point either into the first or fourth quadrant.)

Now $A = \begin{bmatrix} 12 & -6 \\ -6 & 7 \end{bmatrix}$ is the matrix of the quadratic form
\[ 12x^2 - 12xy + 7y^2, \]
so we have, by Theorem 7.1.5
\[ 12x^2 - 12xy + 7y^2 = 16x_1^2 + 3y_1^2. \]
Then under the rotation $X = PY$, our original quadratic equation becomes
\[ 16x_1^2 + 3y_1^2 + \frac{60}{\sqrt{13}}(3x_1 + 2y_1) - \frac{38}{\sqrt{13}}(-2x_1 + 3y_1) + 31 = 0, \]
or
\[ 16x_1^2 + 3y_1^2 + \frac{256}{\sqrt{13}}x_1 + \frac{6}{\sqrt{13}}y_1 + 31 = 0. \]
Now complete the square in $x_1$ and $y_1$:
\[ 16\left( x_1^2 + \frac{16}{\sqrt{13}}x_1 \right) + 3\left( y_1^2 + \frac{2}{\sqrt{13}}y_1 \right) + 31 = 0, \]
\[ 16\left( x_1 + \frac{8}{\sqrt{13}} \right)^2 + 3\left( y_1 + \frac{1}{\sqrt{13}} \right)^2 = 16\left( \frac{8}{\sqrt{13}} \right)^2 + 3\left( \frac{1}{\sqrt{13}} \right)^2 - 31 = 48. \tag{7.9} \]
Then if we perform a translation of axes to the new origin $(x_1, y_1) = (-\frac{8}{\sqrt{13}}, -\frac{1}{\sqrt{13}})$:
\[ x_2 = x_1 + \frac{8}{\sqrt{13}}, \quad y_2 = y_1 + \frac{1}{\sqrt{13}}, \]
equation 7.9 reduces to
\[ 16x_2^2 + 3y_2^2 = 48, \]
or
\[ \frac{x_2^2}{3} + \frac{y_2^2}{16} = 1. \]

This equation is now in one of the standard forms listed below as Figure 7.2 and is that of an ellipse whose centre is at $(x_2, y_2) = (0, 0)$ and whose axes of symmetry lie along the $x_2, y_2$ axes. In terms of the original $x, y$ coordinates, we find that the centre is $(x, y) = (-2, 1)$. Also $Y = P^t X$, so equations 7.8 can be solved to give
\[ x_1 = \frac{3x - 2y}{\sqrt{13}}, \quad y_1 = \frac{2x + 3y}{\sqrt{13}}. \]
Hence the $y_2$–axis is given by
\[ 0 = x_2 = x_1 + \frac{8}{\sqrt{13}} = \frac{3x - 2y}{\sqrt{13}} + \frac{8}{\sqrt{13}}, \]
or $3x - 2y + 8 = 0$. Similarly the $x_2$ axis is given by $2x + 3y + 1 = 0$.

This ellipse is sketched in Figure 7.1.

Figure 7.2: $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$, $0 < b < a$: an ellipse.

Figures 7.2, 7.3, 7.4 and 7.5 are a collection of standard second degree equations: Figure 7.2 is an ellipse; Figures 7.3 are hyperbolas (in both these examples, the asymptotes are the lines $y = \pm\frac{b}{a}x$); Figures 7.4 and 7.5 represent parabolas.

EXAMPLE 7.1.3 Sketch $y^2 - 4x - 10y - 7 = 0$.

Figure 7.3: (i) $\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$; (ii) $\frac{x^2}{a^2} - \frac{y^2}{b^2} = -1$, $0 < b$, $0 < a$.

Figure 7.4: (i) $y^2 = 4ax$, $a > 0$; (ii) $y^2 = 4ax$, $a < 0$.

Figure 7.5: (iii) $x^2 = 4ay$, $a > 0$; (iv) $x^2 = 4ay$, $a < 0$.

Solution. Complete the square:
\[ \begin{aligned} y^2 - 10y + 25 - 4x - 32 &= 0 \\ (y - 5)^2 &= 4x + 32 = 4(x + 8), \end{aligned} \]
or $y_1^2 = 4x_1$, under the translation of axes $x_1 = x + 8$, $y_1 = y - 5$. Hence we get a parabola with vertex at the new origin $(x_1, y_1) = (0, 0)$, i.e. $(x, y) = (-8, 5)$.

The parabola is sketched in Figure 7.6.

EXAMPLE 7.1.4 Sketch the curve $x^2 - 4xy + 4y^2 + 5y - 9 = 0$.

Solution. We have $x^2 - 4xy + 4y^2 = X^t A X$, where
\[ A = \begin{bmatrix} 1 & -2 \\ -2 & 4 \end{bmatrix}. \]

The characteristic equation of A is λ2 −5λ = 0, so A has distinct eigenvalues λ1 = 5 and λ2 = 0. We ﬁnd corresponding unit length eigenvectors 1 X1 = √ 5 1 −2 1 , X2 = √ 5 2 1 .

Then P = [X1 |X2 ] is a proper orthogonal matrix and under the rotation of axes X = P Y , or x1 + 2y1 √ x = 5 −2x1 + y1 √ , y = 5

we have

$$x^2 - 4xy + 4y^2 = \lambda_1 x_1^2 + \lambda_2 y_1^2 = 5x_1^2.$$

The original quadratic equation becomes

$$5x_1^2 + \frac{5}{\sqrt{5}}(-2x_1 + y_1) - 9 = 0$$
$$5\left(x_1^2 - \frac{2}{\sqrt{5}}x_1\right) + \sqrt{5}\,y_1 - 9 = 0$$
$$5\left(x_1 - \frac{1}{\sqrt{5}}\right)^2 = 10 - \sqrt{5}\,y_1 = -\sqrt{5}\,(y_1 - 2\sqrt{5}),$$

or $x_2^2 = -\frac{1}{\sqrt{5}}y_2$, where the $x_1$, $y_1$ axes have been translated to $x_2$, $y_2$ axes using the transformation

$$x_2 = x_1 - \frac{1}{\sqrt{5}}, \quad y_2 = y_1 - 2\sqrt{5}.$$

Hence the vertex of the parabola is at $(x_2, y_2) = (0, 0)$, i.e. $(x_1, y_1) = \left(\frac{1}{\sqrt{5}}, 2\sqrt{5}\right)$, or $(x, y) = \left(\frac{21}{5}, \frac{8}{5}\right)$. The axis of symmetry of the parabola is the line $x_2 = 0$, i.e. $x_1 = 1/\sqrt{5}$. Using the rotation equations in the form

$$x_1 = \frac{x - 2y}{\sqrt{5}}, \quad y_1 = \frac{2x + y}{\sqrt{5}},$$

we have

$$\frac{x - 2y}{\sqrt{5}} = \frac{1}{\sqrt{5}}, \quad \text{or} \quad x - 2y = 1.$$

The parabola is sketched in Figure 7.7.

[Figure 7.6: $y^2 - 4x - 10y - 7 = 0$.]

[Figure 7.7: $x^2 - 4xy + 4y^2 + 5y - 9 = 0$.]
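The numbers in Example 7.1.4 can be checked with exact rational arithmetic. The following is only a sketch: the helper names `conic` and `char_poly_2x2` are illustrative and do not come from the text.

```python
from fractions import Fraction as F

def conic(x, y):
    """Left-hand side of x^2 - 4xy + 4y^2 + 5y - 9 = 0 (Example 7.1.4)."""
    return x * x - 4 * x * y + 4 * y * y + 5 * y - 9

def char_poly_2x2(a, h, b):
    """(trace, det) of [[a, h], [h, b]]; the characteristic
    polynomial is lambda^2 - trace*lambda + det."""
    return a + b, a * b - h * h

trace, det = char_poly_2x2(1, -2, 4)   # A = [[1, -2], [-2, 4]]
vertex = (F(21, 5), F(8, 5))           # vertex claimed in the example

print(trace, det)                  # coefficients of lambda^2 - 5*lambda + 0
print(conic(*vertex))              # zero exactly when the vertex lies on the curve
print(vertex[0] - 2 * vertex[1])   # equals 1 when the vertex lies on x - 2y = 1
```

A zero determinant confirms that one eigenvalue vanishes, which is what makes the curve a parabola rather than an ellipse or hyperbola.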

7.2 A classification algorithm

There are several possible degenerate cases that can arise from the general second degree equation. For example $x^2 + y^2 = 0$ represents the point (0, 0); $x^2 + y^2 = -1$ defines the empty set, as does $x^2 = -1$ or $y^2 = -1$; $x^2 = 0$ defines the line $x = 0$; $(x + y)^2 = 0$ defines the line $x + y = 0$; $x^2 - y^2 = 0$ defines the lines $x - y = 0$, $x + y = 0$; $x^2 = 1$ defines the parallel lines $x = \pm 1$; $(x + y)^2 = 1$ likewise defines two parallel lines $x + y = \pm 1$.

We state without proof a complete classification [1] of the various cases that can possibly arise for the general second degree equation

$$ax^2 + 2hxy + by^2 + 2gx + 2fy + c = 0. \tag{7.10}$$

[1] This classification forms the basis of a computer program which was used to produce the diagrams in this chapter. I am grateful to Peter Adams for his programming assistance.

It turns out to be more convenient to first perform a suitable translation of axes, before rotating the axes. Let

$$\Delta = \begin{vmatrix} a & h & g \\ h & b & f \\ g & f & c \end{vmatrix}, \quad C = ab - h^2, \quad A = bc - f^2, \quad B = ca - g^2.$$

If $C \neq 0$, let

$$\alpha = \frac{-\begin{vmatrix} g & h \\ f & b \end{vmatrix}}{C}, \quad \beta = \frac{-\begin{vmatrix} a & g \\ h & f \end{vmatrix}}{C}. \tag{7.11}$$

CASE 1. $\Delta = 0$.

(1.1) $C \neq 0$. Translate axes to the new origin $(\alpha, \beta)$, where $\alpha$ and $\beta$ are given by equations 7.11:

$$x = x_1 + \alpha, \quad y = y_1 + \beta.$$

Then equation 7.10 reduces to

$$ax_1^2 + 2hx_1y_1 + by_1^2 = 0.$$

(a) $C > 0$: Single point $(x, y) = (\alpha, \beta)$.
(b) $C < 0$: Two non–parallel lines intersecting in $(x, y) = (\alpha, \beta)$. The lines are
$$\frac{y - \beta}{x - \alpha} = \frac{-h \pm \sqrt{-C}}{b} \quad \text{if } b \neq 0,$$
$$x = \alpha \quad \text{and} \quad \frac{y - \beta}{x - \alpha} = -\frac{a}{2h}, \quad \text{if } b = 0.$$

(1.2) $C = 0$.

(a) $h = 0$.
(i) $a = g = 0$.
(A) $A > 0$: Empty set.
(B) $A = 0$: Single line $y = -f/b$.
(C) $A < 0$: Two parallel lines $y = \dfrac{-f \pm \sqrt{-A}}{b}$.
(ii) $b = f = 0$.
(A) $B > 0$: Empty set.
(B) $B = 0$: Single line $x = -g/a$.
(C) $B < 0$: Two parallel lines $x = \dfrac{-g \pm \sqrt{-B}}{a}$.

(b) $h \neq 0$.
(i) $B > 0$: Empty set.
(ii) $B = 0$: Single line $ax + hy = -g$.
(iii) $B < 0$: Two parallel lines $ax + hy = -g \pm \sqrt{-B}$.

CASE 2. $\Delta \neq 0$.

(2.1) $C \neq 0$. Translate axes to the new origin $(\alpha, \beta)$, where $\alpha$ and $\beta$ are given by equations 7.11:

$$x = x_1 + \alpha, \quad y = y_1 + \beta.$$

Equation 7.10 becomes

$$ax_1^2 + 2hx_1y_1 + by_1^2 = -\frac{\Delta}{C}. \tag{7.12}$$

CASE 2.1(i) $h = 0$. Equation 7.12 becomes $ax_1^2 + by_1^2 = \dfrac{-\Delta}{C}$.

(a) $C < 0$: Hyperbola.
(b) $C > 0$ and $a\Delta > 0$: Empty set.
(c) $C > 0$ and $a\Delta < 0$.
(i) $a = b$: Circle, centre $(\alpha, \beta)$, radius $\dfrac{\sqrt{g^2 + f^2 - ac}}{|a|}$.
(ii) $a \neq b$: Ellipse.

CASE 2.1(ii) $h \neq 0$. Rotate the $(x_1, y_1)$ axes with the new positive $x_2$–axis in the direction of

$$[(b - a + R)/2, \; -h], \quad \text{where } R = \sqrt{(a - b)^2 + 4h^2}.$$

Then equation 7.12 becomes

$$\lambda_1 x_2^2 + \lambda_2 y_2^2 = -\frac{\Delta}{C}, \tag{7.13}$$

where $\lambda_1 = (a + b - R)/2$, $\lambda_2 = (a + b + R)/2$. Here $\lambda_1\lambda_2 = C$.

(a) $C < 0$: Hyperbola. Here $\lambda_2 > 0 > \lambda_1$ and equation 7.13 becomes
$$\frac{x_2^2}{u^2} - \frac{y_2^2}{v^2} = \frac{-\Delta}{|\Delta|}, \quad \text{where } u = \sqrt{\frac{|\Delta|}{C\lambda_1}}, \; v = \sqrt{\frac{|\Delta|}{-C\lambda_2}}.$$
(b) $C > 0$ and $a\Delta > 0$: Empty set.
(c) $C > 0$ and $a\Delta < 0$: Ellipse. Here $\lambda_1, \lambda_2, a, b$ have the same sign and $\lambda_1 \neq \lambda_2$ and equation 7.13 becomes
$$\frac{x_2^2}{u^2} + \frac{y_2^2}{v^2} = 1, \quad \text{where } u = \sqrt{\frac{\Delta}{-C\lambda_1}}, \; v = \sqrt{\frac{\Delta}{-C\lambda_2}}.$$

(2.2) $C = 0$.

(a) $h = 0$.
(i) $a = 0$: Then $b \neq 0$ and $g \neq 0$. Parabola with vertex
$$\left(\frac{-A}{2gb}, \; -\frac{f}{b}\right).$$
Translate axes to $(x_1, y_1)$ axes:
$$y_1^2 = -\frac{2g}{b}x_1.$$
(ii) $b = 0$: Then $a \neq 0$ and $f \neq 0$. Parabola with vertex
$$\left(-\frac{g}{a}, \; \frac{-B}{2fa}\right).$$
Translate axes to $(x_1, y_1)$ axes:
$$x_1^2 = -\frac{2f}{a}y_1.$$

(b) $h \neq 0$: Parabola. Let
$$k = \frac{ga + hf}{a + b}.$$
The vertex of the parabola is
$$\left(\frac{2akf - hk^2 - hac}{d}, \; \frac{a(k^2 + ac - 2kg)}{d}\right), \quad \text{where } d = 2a(gh - af).$$
Now translate to the vertex as the new origin, then rotate to $(x_2, y_2)$ axes with the positive $x_2$–axis along $[sa, -sh]$, where $s = \operatorname{sign}(a)$. (The positive $x_2$–axis points into the first or fourth quadrant.) Then the parabola has equation
$$x_2^2 = \frac{-2st}{\sqrt{a^2 + h^2}}\,y_2,$$
where $t = (af - gh)/(a + b)$.

REMARK 7.2.1 If $\Delta = 0$, it is not necessary to rotate the axes. Instead it is always possible to translate the axes suitably so that the coefficients of the terms of the first degree vanish.

EXAMPLE 7.2.1 Identify the curve

$$2x^2 + xy - y^2 + 6y - 8 = 0. \tag{7.14}$$

Solution. Here

$$\Delta = \begin{vmatrix} 2 & \frac{1}{2} & 0 \\ \frac{1}{2} & -1 & 3 \\ 0 & 3 & -8 \end{vmatrix} = 0.$$

Let $x = x_1 + \alpha$, $y = y_1 + \beta$ and substitute in equation 7.14 to get

$$2(x_1 + \alpha)^2 + (x_1 + \alpha)(y_1 + \beta) - (y_1 + \beta)^2 + 6(y_1 + \beta) - 8 = 0. \tag{7.15}$$

Then equating the coefficients of $x_1$ and $y_1$ to 0 gives

$$4\alpha + \beta = 0$$
$$\alpha - 2\beta + 6 = 0,$$

which has the unique solution $\alpha = -\frac{2}{3}$, $\beta = \frac{8}{3}$. Then equation 7.15 simplifies to

$$2x_1^2 + x_1y_1 - y_1^2 = 0 = (2x_1 - y_1)(x_1 + y_1),$$

so relative to the $x_1$, $y_1$ coordinates, equation 7.14 describes two lines: $2x_1 - y_1 = 0$ or $x_1 + y_1 = 0$. In terms of the original x, y coordinates, these lines become $2(x + \frac{2}{3}) - (y - \frac{8}{3}) = 0$ and $(x + \frac{2}{3}) + (y - \frac{8}{3}) = 0$, i.e. $2x - y + 4 = 0$ and $x + y - 2 = 0$, which intersect in the point

$$(x, y) = (\alpha, \beta) = \left(-\frac{2}{3}, \frac{8}{3}\right).$$

EXAMPLE 7.2.2 Identify the curve

$$x^2 + 2xy + y^2 + 2x + 2y + 1 = 0. \tag{7.16}$$

Solution. Here

$$\Delta = \begin{vmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{vmatrix} = 0.$$

Let $x = x_1 + \alpha$, $y = y_1 + \beta$ and substitute in equation 7.16 to get

$$(x_1 + \alpha)^2 + 2(x_1 + \alpha)(y_1 + \beta) + (y_1 + \beta)^2 + 2(x_1 + \alpha) + 2(y_1 + \beta) + 1 = 0. \tag{7.17}$$

Then equating the coefficients of $x_1$ and $y_1$ to 0 gives the same equation

$$2\alpha + 2\beta + 2 = 0.$$

Take $\alpha = 0$, $\beta = -1$. Then equation 7.17 simplifies to

$$x_1^2 + 2x_1y_1 + y_1^2 = 0 = (x_1 + y_1)^2,$$

and in terms of x, y coordinates, equation 7.16 becomes

$$(x + y + 1)^2 = 0, \quad \text{or} \quad x + y + 1 = 0.$$
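The first step of the classification, computing $\Delta$ and $C$ and reading off the branch of CASE 1, is easy to mechanise. The sketch below covers only that step and is not the program mentioned in the footnote; the function names `invariants` and `degenerate_case` are illustrative assumptions.

```python
from fractions import Fraction as F

def invariants(a, h, b, g, f, c):
    """Return (Delta, C) for a x^2 + 2h xy + b y^2 + 2g x + 2f y + c = 0."""
    delta = (a * (b * c - f * f)
             - h * (h * c - f * g)
             + g * (h * f - b * g))   # 3x3 determinant, expanded along the first row
    return delta, a * b - h * h

def degenerate_case(a, h, b, g, f, c):
    """Coarse label for CASE 1 (Delta = 0); returns None when Delta != 0."""
    delta, C = invariants(a, h, b, g, f, c)
    if delta != 0:
        return None
    if C > 0:
        return "single point"
    if C < 0:
        return "two intersecting lines"
    return "empty set, one line, or two parallel lines"

# Example 7.2.1: 2x^2 + xy - y^2 + 6y - 8 = 0, so h = 1/2, g = 0, f = 3.
print(degenerate_case(2, F(1, 2), -1, 0, 3, -8))
# Example 7.2.2: x^2 + 2xy + y^2 + 2x + 2y + 1 = 0.
print(degenerate_case(1, 1, 1, 1, 1, 1))
```

Note that the coefficients passed in are $a, h, b, g, f, c$ of equation 7.10, so the $xy$, $x$ and $y$ coefficients of the curve must be halved first. Exact fractions avoid a spurious non-zero $\Delta$ from rounding.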

7.3 PROBLEMS

1. Sketch the curves
(i) $x^2 - 8x + 8y + 8 = 0$;
(ii) $y^2 - 12x + 2y + 25 = 0$.

2. Sketch the hyperbola $4xy - 3y^2 = 8$ and find the equations of the asymptotes.
[Answer: $y = 0$ and $y = \frac{4}{3}x$.]

3. Sketch the ellipse $8x^2 - 4xy + 5y^2 = 36$ and find the equations of the axes of symmetry.
[Answer: $y = 2x$ and $x = -2y$.]

4. Sketch the conics defined by the following equations. Find the centre when the conic is an ellipse or hyperbola, asymptotes if an hyperbola, the vertex and axis of symmetry if a parabola:
(i) $4x^2 - 9y^2 - 24x - 36y - 36 = 0$;
(ii) $5x^2 - 4xy + 8y^2 + 4\sqrt{5}x - 16\sqrt{5}y + 4 = 0$;
(iii) $4x^2 + y^2 - 4xy - 10y - 19 = 0$;
(iv) $77x^2 + 78xy - 27y^2 + 70x - 30y + 29 = 0$.
[Answers: (i) hyperbola, centre $(3, -2)$, asymptotes $2x - 3y - 12 = 0$, $2x + 3y = 0$;
(ii) ellipse, centre $(0, \sqrt{5})$;
(iii) parabola, vertex $(-\frac{7}{5}, -\frac{9}{5})$, axis of symmetry $2x - y + 1 = 0$;
(iv) hyperbola, centre $(-\frac{1}{10}, -\frac{7}{10})$, asymptotes $7x + 9y + 7 = 0$ and $11x - 3y - 1 = 0$.]

5. Identify the lines determined by the equations:
(i) $2x^2 + y^2 + 3xy - 5x - 4y + 3 = 0$;
(ii) $9x^2 + y^2 - 6xy + 6x - 2y + 1 = 0$;
(iii) $x^2 + 4xy + 4y^2 - x - 2y - 2 = 0$.
[Answers: (i) $2x + y - 3 = 0$ and $x + y - 1 = 0$; (ii) $3x - y + 1 = 0$; (iii) $x + 2y + 1 = 0$ and $x + 2y - 2 = 0$.]

Chapter 8

THREE–DIMENSIONAL GEOMETRY

8.1 Introduction

In this chapter we present a vector–algebra approach to three–dimensional geometry. The aim is to present standard properties of lines and planes, with minimum use of complicated three–dimensional diagrams such as those involving similar triangles. We summarize the chapter:

Points are defined as ordered triples of real numbers and the distance between points $P_1 = (x_1, y_1, z_1)$ and $P_2 = (x_2, y_2, z_2)$ is defined by the formula

$$P_1P_2 = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}.$$

Directed line segments $\overrightarrow{AB}$ are introduced as three–dimensional column vectors: If $A = (x_1, y_1, z_1)$ and $B = (x_2, y_2, z_2)$, then

$$\overrightarrow{AB} = \begin{bmatrix} x_2 - x_1 \\ y_2 - y_1 \\ z_2 - z_1 \end{bmatrix}.$$

If P is a point, we let $\mathbf{P} = \overrightarrow{OP}$ and call $\mathbf{P}$ the position vector of P.

With suitable definitions of lines, parallel lines, there are important geometrical interpretations of equality, addition and scalar multiplication of vectors.

(i) Equality of vectors: Suppose A, B, C, D are distinct points such that no three are collinear. Then $\overrightarrow{AB} = \overrightarrow{CD}$ if and only if $\overrightarrow{AB} \parallel \overrightarrow{CD}$ and $\overrightarrow{AC} \parallel \overrightarrow{BD}$. (See Figure 8.1.)

(ii) Addition of vectors obeys the parallelogram law: Let A, B, C be non–collinear. Then $\overrightarrow{AB} + \overrightarrow{AC} = \overrightarrow{AD}$, where D is the point such that $\overrightarrow{AB} \parallel \overrightarrow{CD}$ and $\overrightarrow{AC} \parallel \overrightarrow{BD}$. (See Figure 8.1.)

(iii) Scalar multiplication of vectors: Let $\overrightarrow{AP} = t\overrightarrow{AB}$, where A and B are distinct points. Then P is on the line AB, $AP = |t|\,AB$ and
(a) $P = A$ if $t = 0$, $P = B$ if $t = 1$;
(b) P is between A and B if $0 < t < 1$;
(c) B is between A and P if $1 < t$;
(d) A is between P and B if $t < 0$.
(See Figure 8.2.)

[Figure 8.1: Equality and addition of vectors. $\overrightarrow{AB} = \overrightarrow{CD}$, $\overrightarrow{AC} = \overrightarrow{BD}$, $\overrightarrow{AB} + \overrightarrow{AC} = \overrightarrow{AD}$.]

[Figure 8.2: Scalar multiplication of vectors. $\overrightarrow{AP} = t\overrightarrow{AB}$, $0 < t < 1$.]

The dot product $X \cdot Y$ of vectors $X = \begin{bmatrix} a_1 \\ b_1 \\ c_1 \end{bmatrix}$ and $Y = \begin{bmatrix} a_2 \\ b_2 \\ c_2 \end{bmatrix}$, is defined by

$$X \cdot Y = a_1a_2 + b_1b_2 + c_1c_2.$$

The length $||X||$ of a vector X is defined by

$$||X|| = (X \cdot X)^{1/2}$$

and the Cauchy–Schwarz inequality holds:

$$|X \cdot Y| \leq ||X|| \cdot ||Y||.$$

The triangle inequality for vector length now follows as a simple deduction:

$$||X + Y|| \leq ||X|| + ||Y||.$$

Using the equation $AB = ||\overrightarrow{AB}||$, we deduce the corresponding familiar triangle inequality for distance:

$$AB \leq AC + CB.$$

The angle $\theta$ between two non–zero vectors X and Y is then defined by

$$\cos\theta = \frac{X \cdot Y}{||X|| \cdot ||Y||}, \quad 0 \leq \theta \leq \pi.$$

This definition makes sense. For by the Cauchy–Schwarz inequality,

$$-1 \leq \frac{X \cdot Y}{||X|| \cdot ||Y||} \leq 1.$$

Vectors X and Y are said to be perpendicular or orthogonal if $X \cdot Y = 0$. Vectors of unit length are called unit vectors. The vectors

$$\mathbf{i} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{j} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{k} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

are unit vectors and every vector is a linear combination of $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$:

$$\begin{bmatrix} a \\ b \\ c \end{bmatrix} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}.$$

Non–zero vectors X and Y are parallel or proportional if the angle between X and Y equals 0 or $\pi$; equivalently if $X = tY$ for some real number t. Vectors X and Y are then said to have the same or opposite direction, according as $t > 0$ or $t < 0$.

We are then led to study straight lines. If A and B are distinct points, it is easy to show that $AP + PB = AB$ holds if and only if

$$\overrightarrow{AP} = t\overrightarrow{AB}, \quad \text{where } 0 \leq t \leq 1.$$

A line is defined as a set consisting of all points P satisfying

$$\mathbf{P} = \mathbf{P}_0 + tX, \; t \in \mathbb{R} \quad \text{or equivalently} \quad \overrightarrow{P_0P} = tX,$$

for some fixed point $P_0$ and fixed non–zero vector X called a direction vector for the line. Equivalently, in terms of coordinates,

$$x = x_0 + ta, \quad y = y_0 + tb, \quad z = z_0 + tc,$$

where $P_0 = (x_0, y_0, z_0)$ and not all of a, b, c are zero.

There is then one and only one line passing through two distinct points A and B. It consists of the points P satisfying

$$\overrightarrow{AP} = t\overrightarrow{AB},$$

where t is a real number.

The cross–product $X \times Y$ provides us with a vector which is perpendicular to both X and Y. It is defined in terms of the components of X and Y: Let $X = a_1\mathbf{i} + b_1\mathbf{j} + c_1\mathbf{k}$ and $Y = a_2\mathbf{i} + b_2\mathbf{j} + c_2\mathbf{k}$. Then

$$X \times Y = a\mathbf{i} + b\mathbf{j} + c\mathbf{k},$$

where

$$a = \begin{vmatrix} b_1 & c_1 \\ b_2 & c_2 \end{vmatrix}, \quad b = -\begin{vmatrix} a_1 & c_1 \\ a_2 & c_2 \end{vmatrix}, \quad c = \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix}.$$

The cross–product enables us to derive elegant formulae for the distance from a point to a line, the area of a triangle and the distance between two skew lines.

Finally we turn to the geometrical concept of a plane in three–dimensional space. A plane is a set of points P satisfying an equation of the form

$$\mathbf{P} = \mathbf{P}_0 + sX + tY, \quad s, t \in \mathbb{R}, \tag{8.1}$$

where X and Y are non–zero, non–parallel vectors. In terms of coordinates, equation 8.1 takes the form

$$x = x_0 + sa_1 + ta_2$$
$$y = y_0 + sb_1 + tb_2$$
$$z = z_0 + sc_1 + tc_2,$$

where $P_0 = (x_0, y_0, z_0)$.

There is then one and only one plane passing through three non–collinear points A, B, C. It consists of the points P satisfying

$$\overrightarrow{AP} = s\overrightarrow{AB} + t\overrightarrow{AC},$$

where s and t are real numbers.

The cross–product enables us to derive a concise equation for the plane through three non–collinear points A, B, C, namely

$$\overrightarrow{AP} \cdot (\overrightarrow{AB} \times \overrightarrow{AC}) = 0.$$

When expanded, this equation has the form

$$ax + by + cz = d,$$

where $a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$ is a non–zero vector which is perpendicular to $\overrightarrow{P_1P_2}$ for all points $P_1$, $P_2$ lying in the plane. Any vector with this property is said to be a normal to the plane.

It is then easy to prove that two planes with non–parallel normal vectors must intersect in a line.

We conclude the chapter by deriving a formula for the distance from a point to a plane.

8.2 Three–dimensional space

DEFINITION 8.2.1 Three–dimensional space is the set $E^3$ of ordered triples (x, y, z), where x, y, z are real numbers. The triple (x, y, z) is called a point P in $E^3$ and we write P = (x, y, z). The numbers x, y, z are called, respectively, the x, y, z coordinates of P.

The coordinate axes are the sets of points:

{(x, 0, 0)} (x–axis), {(0, y, 0)} (y–axis), {(0, 0, z)} (z–axis).

The only point common to all three axes is the origin O = (0, 0, 0).

The coordinate planes are the sets of points:

{(x, y, 0)} (xy–plane), {(0, y, z)} (yz–plane), {(x, 0, z)} (xz–plane).

The positive octant consists of the points (x, y, z), where x > 0, y > 0, z > 0.

We think of the points (x, y, z) with z > 0 as lying above the xy–plane, and those with z < 0 as lying beneath the xy–plane. A point P = (x, y, z) will be represented as in Figure 8.3. The point illustrated lies in the positive octant.

DEFINITION 8.2.2 The distance $P_1P_2$ between points $P_1 = (x_1, y_1, z_1)$ and $P_2 = (x_2, y_2, z_2)$ is defined by the formula

$$P_1P_2 = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}.$$

For example, if P = (x, y, z), $OP = \sqrt{x^2 + y^2 + z^2}$.

[Figure 8.3: Representation of three-dimensional space.]

[Figure 8.4: The vector $\overrightarrow{AB}$.]

DEFINITION 8.2.3 If $A = (x_1, y_1, z_1)$ and $B = (x_2, y_2, z_2)$ we define the symbol $\overrightarrow{AB}$ to be the column vector

$$\overrightarrow{AB} = \begin{bmatrix} x_2 - x_1 \\ y_2 - y_1 \\ z_2 - z_1 \end{bmatrix}.$$

We let $\mathbf{P} = \overrightarrow{OP}$ and call $\mathbf{P}$ the position vector of P.

The components of $\overrightarrow{AB}$ are the coordinates of B when the axes are translated to A as origin of coordinates.

We think of $\overrightarrow{AB}$ as being represented by the directed line segment from A to B and think of it as an arrow whose tail is at A and whose head is at B. (See Figure 8.4.)

Some mathematicians think of $\overrightarrow{AB}$ as representing the translation of space which takes A into B.

The following simple properties of $\overrightarrow{AB}$ are easily verified and correspond to how we intuitively think of directed line segments:

(i) $\overrightarrow{AB} = 0 \Leftrightarrow A = B$;
(ii) $\overrightarrow{BA} = -\overrightarrow{AB}$;
(iii) $\overrightarrow{AB} + \overrightarrow{BC} = \overrightarrow{AC}$ (the triangle law);
(iv) $\overrightarrow{BC} = \overrightarrow{AC} - \overrightarrow{AB} = \mathbf{C} - \mathbf{B}$;
(v) if X is a vector and A a point, there is exactly one point B such that $\overrightarrow{AB} = X$, namely that defined by $\mathbf{B} = \mathbf{A} + X$.

To derive properties of the distance function and the vector function $\overrightarrow{P_1P_2}$, we need to introduce the dot product of two vectors in $\mathbb{R}^3$.

8.3 Dot product

DEFINITION 8.3.1 If $X = \begin{bmatrix} a_1 \\ b_1 \\ c_1 \end{bmatrix}$ and $Y = \begin{bmatrix} a_2 \\ b_2 \\ c_2 \end{bmatrix}$, then $X \cdot Y$, the dot product of X and Y, is defined by

$$X \cdot Y = a_1a_2 + b_1b_2 + c_1c_2.$$

[Figure 8.5: The negative of a vector. $v = \overrightarrow{AB}$, $-v = \overrightarrow{BA}$.]

[Figure 8.6: (a) Equality of vectors, $\overrightarrow{AB} = \overrightarrow{CD}$; (b) Addition and subtraction of vectors, $\overrightarrow{AC} = \overrightarrow{AB} + \overrightarrow{BC}$, $\overrightarrow{BC} = \overrightarrow{AC} - \overrightarrow{AB}$.]

The dot product has the following properties:

(i) $X \cdot (Y + Z) = X \cdot Y + X \cdot Z$;
(ii) $X \cdot Y = Y \cdot X$;
(iii) $(tX) \cdot Y = t(X \cdot Y)$;
(iv) $X \cdot X = a^2 + b^2 + c^2$ if $X = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$;
(v) $X \cdot Y = X^tY$;
(vi) $X \cdot X = 0$ if and only if $X = 0$.

The length of X is defined by

$$||X|| = \sqrt{a^2 + b^2 + c^2} = (X \cdot X)^{1/2}.$$

We see that $||\mathbf{P}|| = OP$ and more generally $||\overrightarrow{P_1P_2}|| = P_1P_2$, the distance between $P_1$ and $P_2$.
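The definitions of dot product, length and distance translate directly into a few lines of code. This is a minimal sketch with vectors and points held as plain tuples; the names `dot`, `length`, `vec` and `distance` are illustrative, not notation from the text.

```python
import math

def dot(x, y):
    """X . Y = a1*a2 + b1*b2 + c1*c2 (Definition 8.3.1)."""
    return sum(p * q for p, q in zip(x, y))

def length(x):
    """||X|| = (X . X)^(1/2)."""
    return math.sqrt(dot(x, x))

def vec(a, b):
    """Components of the vector from point a to point b (Definition 8.2.3)."""
    return tuple(q - p for p, q in zip(a, b))

def distance(p1, p2):
    """P1P2 = ||vector from P1 to P2|| (Definition 8.2.2)."""
    return length(vec(p1, p2))

print(dot((1, 2, 3), (4, 5, 6)))
print(distance((0, 0, 0), (1, 2, 2)))
```

Defining `distance` through `vec` and `length` mirrors the observation that the distance between two points is the length of the vector joining them.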

[Figure 8.7: Position vector as a linear combination of $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$. $\mathbf{P} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$.]

Vectors having unit length are called unit vectors. The vectors

$$\mathbf{i} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{j} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{k} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

are unit vectors. Every vector is a linear combination of $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$:

$$\begin{bmatrix} a \\ b \\ c \end{bmatrix} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}.$$

(See Figure 8.7.)

It is easy to prove that

$$||tX|| = |t| \cdot ||X||,$$

if t is a real number. Hence if X is a non–zero vector, the vectors

$$\pm\frac{1}{||X||}X$$

are unit vectors.

A useful property of the length of a vector is

$$||X \pm Y||^2 = ||X||^2 \pm 2X \cdot Y + ||Y||^2. \tag{8.2}$$

The following important property of the dot product is widely used in mathematics:

THEOREM 8.3.1 (The Cauchy–Schwarz inequality) If X and Y are vectors in $\mathbb{R}^3$, then

$$|X \cdot Y| \leq ||X|| \cdot ||Y||. \tag{8.3}$$

Moreover if $X \neq 0$ and $Y \neq 0$, then

$$X \cdot Y = ||X|| \cdot ||Y|| \Leftrightarrow Y = tX, \; t > 0,$$
$$X \cdot Y = -||X|| \cdot ||Y|| \Leftrightarrow Y = tX, \; t < 0.$$

Proof. If $X = 0$, then inequality 8.3 is trivially true. So assume $X \neq 0$. Now if t is any real number, by equation 8.2,

$$0 \leq ||tX - Y||^2 = ||tX||^2 - 2(tX) \cdot Y + ||Y||^2 = t^2||X||^2 - 2(X \cdot Y)t + ||Y||^2 = at^2 - 2bt + c,$$

where $a = ||X||^2 > 0$, $b = X \cdot Y$, $c = ||Y||^2$. Hence

$$a\left(t^2 - \frac{2b}{a}t + \frac{c}{a}\right) \geq 0$$
$$\left(t - \frac{b}{a}\right)^2 + \frac{ca - b^2}{a^2} \geq 0.$$

Substituting $t = b/a$ in the last inequality then gives

$$\frac{ac - b^2}{a^2} \geq 0,$$

so

$$|b| \leq \sqrt{ac} = \sqrt{a}\sqrt{c}$$

and hence inequality 8.3 follows.

To discuss equality in the Cauchy–Schwarz inequality, assume $X \neq 0$ and $Y \neq 0$. Then if $X \cdot Y = ||X|| \cdot ||Y||$, we have for all t

$$||tX - Y||^2 = t^2||X||^2 - 2tX \cdot Y + ||Y||^2 = t^2||X||^2 - 2t||X|| \cdot ||Y|| + ||Y||^2 = (t||X|| - ||Y||)^2.$$

Taking $t = ||Y||/||X||$ then gives $||tX - Y||^2 = 0$ and hence $tX - Y = 0$. Hence $Y = tX$, where $t > 0$. The case $X \cdot Y = -||X|| \cdot ||Y||$ is proved similarly.

COROLLARY 8.3.1 (The triangle inequality for vectors) If X and Y are vectors, then

$$||X + Y|| \leq ||X|| + ||Y||. \tag{8.4}$$

Moreover if $X \neq 0$ and $Y \neq 0$, then equality occurs in inequality 8.4 if and only if $Y = tX$, where $t > 0$.

Proof.

$$||X + Y||^2 = ||X||^2 + 2X \cdot Y + ||Y||^2 \leq ||X||^2 + 2||X|| \cdot ||Y|| + ||Y||^2 = (||X|| + ||Y||)^2$$

and inequality 8.4 follows.

If $||X + Y|| = ||X|| + ||Y||$, then the above proof shows that

$$X \cdot Y = ||X|| \cdot ||Y||.$$

Hence if $X \neq 0$ and $Y \neq 0$, the first case of equality in the Cauchy–Schwarz inequality shows that $Y = tX$ with $t > 0$.

The triangle inequality for vectors gives rise to a corresponding inequality for the distance function:

THEOREM 8.3.2 (The triangle inequality for distance) If A, B, C are points, then

$$AC \leq AB + BC. \tag{8.5}$$

Moreover if $B \neq A$ and $B \neq C$, then equality occurs in inequality 8.5 if and only if $\overrightarrow{AB} = r\overrightarrow{AC}$, where $0 < r < 1$.

Proof.

$$AC = ||\overrightarrow{AC}|| = ||\overrightarrow{AB} + \overrightarrow{BC}|| \leq ||\overrightarrow{AB}|| + ||\overrightarrow{BC}|| = AB + BC.$$

Moreover if equality occurs in inequality 8.5 and $B \neq A$ and $B \neq C$, then $X = \overrightarrow{AB} \neq 0$ and $Y = \overrightarrow{BC} \neq 0$ and the equation $AC = AB + BC$ becomes $||X + Y|| = ||X|| + ||Y||$. Hence the case of equality in the vector triangle inequality gives

$$Y = \overrightarrow{BC} = tX = t\overrightarrow{AB}, \quad \text{where } t > 0.$$

Then

$$\overrightarrow{BC} = \overrightarrow{AC} - \overrightarrow{AB} = t\overrightarrow{AB}$$
$$\overrightarrow{AC} = (1 + t)\overrightarrow{AB}$$
$$\overrightarrow{AB} = r\overrightarrow{AC},$$

where $r = 1/(t + 1)$ satisfies $0 < r < 1$.

8.4 Lines

DEFINITION 8.4.1 A line in $E^3$ is the set $L(P_0, X)$ consisting of all points P satisfying

$$\mathbf{P} = \mathbf{P}_0 + tX, \; t \in \mathbb{R} \quad \text{or equivalently} \quad \overrightarrow{P_0P} = tX, \tag{8.6}$$

for some fixed point $P_0$ and fixed non–zero vector X. (See Figure 8.8.)

Equivalently, in terms of coordinates, equation 8.6 becomes

$$x = x_0 + ta, \quad y = y_0 + tb, \quad z = z_0 + tc,$$

where not all of a, b, c are zero.

The following familiar property of straight lines is easily verified.

THEOREM 8.4.1 If A and B are distinct points, there is one and only one line containing A and B, namely $L(A, \overrightarrow{AB})$ or more explicitly the line defined by $\overrightarrow{AP} = t\overrightarrow{AB}$, or equivalently, in terms of position vectors:

$$\mathbf{P} = (1 - t)\mathbf{A} + t\mathbf{B} \quad \text{or} \quad \mathbf{P} = \mathbf{A} + t\overrightarrow{AB}. \tag{8.7}$$

Equations 8.7 may be expressed in terms of coordinates: if $A = (x_1, y_1, z_1)$ and $B = (x_2, y_2, z_2)$, then

$$x = (1 - t)x_1 + tx_2, \quad y = (1 - t)y_1 + ty_2, \quad z = (1 - t)z_1 + tz_2.$$

[Figure 8.8: Representation of a line. $\overrightarrow{P_0P} = t\overrightarrow{CD}$.]

[Figure 8.9: The line segment AB. $\mathbf{P} = \mathbf{A} + t\overrightarrow{AB}$, $0 < t < 1$.]

There is an important geometric significance in the number t of the above equation of the line through A and B. The proof is left as an exercise:

THEOREM 8.4.2 (Joachimsthal's ratio formulae) If t is the parameter occurring in theorem 8.4.1, then

(i) $|t| = \dfrac{AP}{AB}$; (ii) $\left|\dfrac{t}{1 - t}\right| = \dfrac{AP}{PB}$ if $P \neq B$.

Also

(iii) P is between A and B if $0 < t < 1$;
(iv) B is between A and P if $1 < t$;
(v) A is between P and B if $t < 0$.

(See Figure 8.9.)

For example, $t = \frac{1}{2}$ gives the mid–point P of the segment AB:

$$\mathbf{P} = \frac{1}{2}(\mathbf{A} + \mathbf{B}).$$

EXAMPLE 8.4.1 L is the line AB, where A = (−4, 3, 1), B = (1, 1, 0); M is the line CD, where C = (2, 0, 2), D = (−1, 3, −2); N is the line EF, where E = (1, 4, 7), F = (−4, −3, −13). Find which pairs of lines intersect and also the points of intersection.

Solution. In fact only L and N intersect, in the point $(-\frac{2}{3}, \frac{5}{3}, \frac{1}{3})$. For example, to determine if L and N meet, we start with vector equations for L and N:

$$\mathbf{P} = \mathbf{A} + t\overrightarrow{AB}, \quad \mathbf{Q} = \mathbf{E} + s\overrightarrow{EF},$$

equate $\mathbf{P}$ and $\mathbf{Q}$ and solve for s and t:

$$(-4\mathbf{i} + 3\mathbf{j} + \mathbf{k}) + t(5\mathbf{i} - 2\mathbf{j} - \mathbf{k}) = (\mathbf{i} + 4\mathbf{j} + 7\mathbf{k}) + s(-5\mathbf{i} - 7\mathbf{j} - 20\mathbf{k}),$$

which on simplifying, gives

$$5t + 5s = 5$$
$$-2t + 7s = 1$$
$$-t + 20s = 6.$$

This system has the unique solution $t = \frac{2}{3}$, $s = \frac{1}{3}$ and this determines a corresponding point P where the lines meet, namely $P = (-\frac{2}{3}, \frac{5}{3}, \frac{1}{3})$.

The same method yields inconsistent systems when applied to the other pairs of lines.
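The method of Example 8.4.1 can be sketched in code: solve two of the three coordinate equations for t and s by Cramer's rule, then test the remaining one. The function name `intersect` and the use of exact fractions are choices made for this illustration, not part of the text.

```python
from fractions import Fraction
from itertools import combinations

def sub(p, q):
    """Componentwise p - q as exact fractions."""
    return tuple(Fraction(a) - Fraction(b) for a, b in zip(p, q))

def intersect(A, B, C, D):
    """Common point of lines AB and CD, or None if no unique one is found."""
    u, v, w = sub(B, A), sub(D, C), sub(C, A)   # solve t*u - s*v = w
    for i, j in combinations(range(3), 2):
        det = u[i] * (-v[j]) - (-v[i]) * u[j]
        if det != 0:
            t = (w[i] * (-v[j]) - (-v[i]) * w[j]) / det
            s = (u[i] * w[j] - w[i] * u[j]) / det
            # The remaining coordinate equation must hold as well.
            if all(t * u[k] - s * v[k] == w[k] for k in range(3)):
                return tuple(Fraction(a) + t * c for a, c in zip(A, u))
            return None
    return None   # parallel direction vectors: no unique common point

L = ((-4, 3, 1), (1, 1, 0))
M = ((2, 0, 2), (-1, 3, -2))
N = ((1, 4, 7), (-4, -3, -13))
print(intersect(*L, *N))   # the point found in the example
print(intersect(*L, *M))   # None: the system is inconsistent
```

Returning `None` for parallel direction vectors is a simplification: coincident lines are not distinguished from parallel disjoint ones.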

EXAMPLE 8.4.2 If A = (5, 0, 7) and B = (2, −3, 6), find the points P on the line AB which satisfy AP/PB = 3.

Solution. Use the formulae

$$\mathbf{P} = \mathbf{A} + t\overrightarrow{AB} \quad \text{and} \quad \left|\frac{t}{1 - t}\right| = \frac{AP}{PB} = 3.$$

Then

$$\frac{t}{1 - t} = 3 \text{ or } -3,$$

so $t = \frac{3}{4}$ or $t = \frac{3}{2}$. The corresponding points are $(\frac{11}{4}, -\frac{9}{4}, \frac{25}{4})$ and $(\frac{1}{2}, -\frac{9}{2}, \frac{11}{2})$.
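The two parameter values in Example 8.4.2 come from solving $t/(1-t) = \pm k$ with $k = 3$. A short sketch, assuming a ratio $k > 0$ with $k \neq 1$ (the function name `points_with_ratio` is illustrative):

```python
from fractions import Fraction

def points_with_ratio(A, B, k):
    """Points P = A + t(B - A) on line AB with AP/PB = k (k > 0, k != 1)."""
    # t/(1-t) = k gives t = k/(1+k); t/(1-t) = -k gives t = k/(k-1).
    ts = [Fraction(k) / (1 + k), Fraction(k) / (k - 1)]
    return [tuple(a + t * (b - a) for a, b in zip(A, B)) for t in ts]

for P in points_with_ratio((5, 0, 7), (2, -3, 6), 3):
    print(P)
```

The first point lies between A and B ($0 < t < 1$) and the second lies beyond B ($t > 1$), in line with theorem 8.4.2.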

DEFINITION 8.4.2 Let X and Y be non–zero vectors. Then X is parallel or proportional to Y if $X = tY$ for some $t \in \mathbb{R}$. We write $X \parallel Y$ if X is parallel to Y. If $X = tY$, we say that X and Y have the same or opposite direction, according as $t > 0$ or $t < 0$.

DEFINITION 8.4.3 If A and B are distinct points on a line L, the non–zero vector $\overrightarrow{AB}$ is called a direction vector for L.

It is easy to prove that any two direction vectors for a line are parallel.

DEFINITION 8.4.4 Let L and M be lines having direction vectors X and Y, respectively. Then L is parallel to M if X is parallel to Y. Clearly any line is parallel to itself.

It is easy to prove that the line through a given point A and parallel to a given line CD has an equation $\mathbf{P} = \mathbf{A} + t\overrightarrow{CD}$.

THEOREM 8.4.3 Let $X = a_1\mathbf{i} + b_1\mathbf{j} + c_1\mathbf{k}$ and $Y = a_2\mathbf{i} + b_2\mathbf{j} + c_2\mathbf{k}$ be non–zero vectors. Then X is parallel to Y if and only if

$$\begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} = \begin{vmatrix} b_1 & c_1 \\ b_2 & c_2 \end{vmatrix} = \begin{vmatrix} a_1 & c_1 \\ a_2 & c_2 \end{vmatrix} = 0. \tag{8.8}$$

Proof. The case of equality in the Cauchy–Schwarz inequality (theorem 8.3.1) shows that X and Y are parallel if and only if

$$|X \cdot Y| = ||X|| \cdot ||Y||.$$

Squaring gives the equivalent equality

$$(a_1a_2 + b_1b_2 + c_1c_2)^2 = (a_1^2 + b_1^2 + c_1^2)(a_2^2 + b_2^2 + c_2^2),$$

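which simplifies to

$$(a_1b_2 - a_2b_1)^2 + (b_1c_2 - b_2c_1)^2 + (a_1c_2 - a_2c_1)^2 = 0,$$

which is equivalent to

$$a_1b_2 - a_2b_1 = 0, \quad b_1c_2 - b_2c_1 = 0, \quad a_1c_2 - a_2c_1 = 0,$$

which is equation 8.8.

Equality of geometrical vectors has a fundamental geometrical interpretation:

THEOREM 8.4.4 Suppose A, B, C, D are distinct points such that no three are collinear. Then $\overrightarrow{AB} = \overrightarrow{CD}$ if and only if $\overrightarrow{AB} \parallel \overrightarrow{CD}$ and $\overrightarrow{AC} \parallel \overrightarrow{BD}$. (See Figure 8.1.)

Proof. If $\overrightarrow{AB} = \overrightarrow{CD}$ then

$$\mathbf{B} - \mathbf{A} = \mathbf{D} - \mathbf{C},$$
$$\mathbf{C} - \mathbf{A} = \mathbf{D} - \mathbf{B}$$

and so $\overrightarrow{AC} = \overrightarrow{BD}$. Hence $\overrightarrow{AB} \parallel \overrightarrow{CD}$ and $\overrightarrow{AC} \parallel \overrightarrow{BD}$.

Conversely, suppose that $\overrightarrow{AB} \parallel \overrightarrow{CD}$ and $\overrightarrow{AC} \parallel \overrightarrow{BD}$. Then

$$\overrightarrow{AB} = s\overrightarrow{CD} \quad \text{and} \quad \overrightarrow{AC} = t\overrightarrow{BD},$$

or

$$\mathbf{B} - \mathbf{A} = s(\mathbf{D} - \mathbf{C}) \quad \text{and} \quad \mathbf{C} - \mathbf{A} = t(\mathbf{D} - \mathbf{B}).$$

We have to prove $s = 1$ or equivalently, $t = 1$.

Now subtracting the second equation above from the first, gives

$$\mathbf{B} - \mathbf{C} = s(\mathbf{D} - \mathbf{C}) - t(\mathbf{D} - \mathbf{B}),$$

so

$$(1 - t)\mathbf{B} = (1 - s)\mathbf{C} + (s - t)\mathbf{D}.$$

If $t \neq 1$, then

$$\mathbf{B} = \frac{1 - s}{1 - t}\mathbf{C} + \frac{s - t}{1 - t}\mathbf{D}$$

and B would lie on the line CD. Hence $t = 1$.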

8.5 The angle between two vectors

DEFINITION 8.5.1 Let X and Y be non–zero vectors. Then the angle between X and Y is the unique value of $\theta$ defined by

$$\cos\theta = \frac{X \cdot Y}{||X|| \cdot ||Y||}, \quad 0 \leq \theta \leq \pi.$$

REMARK 8.5.1 By Cauchy's inequality, we have

$$-1 \leq \frac{X \cdot Y}{||X|| \cdot ||Y||} \leq 1,$$

so the above equation does define an angle $\theta$.

In terms of components, if $X = [a_1, b_1, c_1]^t$ and $Y = [a_2, b_2, c_2]^t$, then

$$\cos\theta = \frac{a_1a_2 + b_1b_2 + c_1c_2}{\sqrt{a_1^2 + b_1^2 + c_1^2}\,\sqrt{a_2^2 + b_2^2 + c_2^2}}. \tag{8.9}$$

The next result is the well-known cosine rule for a triangle.

THEOREM 8.5.1 (Cosine rule) If A, B, C are points with $A \neq B$ and $A \neq C$, then the angle $\theta$ between vectors $\overrightarrow{AB}$ and $\overrightarrow{AC}$ satisfies

$$\cos\theta = \frac{AB^2 + AC^2 - BC^2}{2AB \cdot AC}, \tag{8.10}$$

or equivalently

$$BC^2 = AB^2 + AC^2 - 2AB \cdot AC\cos\theta.$$

(See Figure 8.10.)

Proof. Let $A = (x_1, y_1, z_1)$, $B = (x_2, y_2, z_2)$, $C = (x_3, y_3, z_3)$. Then

$$\overrightarrow{AB} = a_1\mathbf{i} + b_1\mathbf{j} + c_1\mathbf{k}$$
$$\overrightarrow{AC} = a_2\mathbf{i} + b_2\mathbf{j} + c_2\mathbf{k}$$
$$\overrightarrow{BC} = (a_2 - a_1)\mathbf{i} + (b_2 - b_1)\mathbf{j} + (c_2 - c_1)\mathbf{k},$$

where

$$a_i = x_{i+1} - x_1, \quad b_i = y_{i+1} - y_1, \quad c_i = z_{i+1} - z_1, \quad i = 1, 2.$$

[Figure 8.10: The cosine rule for a triangle. $\cos\theta = \frac{AB^2 + AC^2 - BC^2}{2AB \cdot AC}$.]

Now by equation 8.9,

$$\cos\theta = \frac{a_1a_2 + b_1b_2 + c_1c_2}{AB \cdot AC}.$$

Also

$$AB^2 + AC^2 - BC^2 = (a_1^2 + b_1^2 + c_1^2) + (a_2^2 + b_2^2 + c_2^2) - ((a_2 - a_1)^2 + (b_2 - b_1)^2 + (c_2 - c_1)^2) = 2a_1a_2 + 2b_1b_2 + 2c_1c_2.$$

Equation 8.10 now follows, since

$$\overrightarrow{AB} \cdot \overrightarrow{AC} = a_1a_2 + b_1b_2 + c_1c_2.$$

EXAMPLE 8.5.1 Let A = (2, 1, 0), B = (3, 2, 0), C = (5, 0, 1). Find the angle $\theta$ between vectors $\overrightarrow{AB}$ and $\overrightarrow{AC}$.

Solution.

$$\cos\theta = \frac{\overrightarrow{AB} \cdot \overrightarrow{AC}}{AB \cdot AC}.$$

Now

$$\overrightarrow{AB} = \mathbf{i} + \mathbf{j} \quad \text{and} \quad \overrightarrow{AC} = 3\mathbf{i} - \mathbf{j} + \mathbf{k}.$$

[Figure 8.11: Pythagoras' theorem for a right–angled triangle. $AB^2 + AC^2 = BC^2$.]

Hence

$$\cos\theta = \frac{1 \times 3 + 1 \times (-1) + 0 \times 1}{\sqrt{1^2 + 1^2 + 0^2}\,\sqrt{3^2 + (-1)^2 + 1^2}} = \frac{2}{\sqrt{2}\sqrt{11}} = \frac{\sqrt{2}}{\sqrt{11}}.$$

Hence $\theta = \cos^{-1}\dfrac{\sqrt{2}}{\sqrt{11}}$.
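Equation 8.9 is straightforward to evaluate numerically. The sketch below recomputes the angle of Example 8.5.1 in floating point; the function name `angle` is an assumption of this illustration.

```python
import math

def angle(A, B, C):
    """Angle (radians) between the vectors AB and AC, via equation 8.9."""
    x = [b - a for a, b in zip(A, B)]
    y = [c - a for a, c in zip(A, C)]
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    return math.acos(dot / (norm_x * norm_y))

theta = angle((2, 1, 0), (3, 2, 0), (5, 0, 1))
print(math.cos(theta), math.sqrt(2) / math.sqrt(11))   # the two values should agree
```

For nearly parallel vectors, rounding can push the quotient slightly outside $[-1, 1]$, so a production version would clamp it before calling `math.acos`.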

DEFINITION 8.5.2 If X and Y are vectors satisfying $X \cdot Y = 0$, we say X is orthogonal or perpendicular to Y.

REMARK 8.5.2 If A, B, C are points forming a triangle and $\overrightarrow{AB}$ is orthogonal to $\overrightarrow{AC}$, then the angle $\theta$ between $\overrightarrow{AB}$ and $\overrightarrow{AC}$ satisfies $\cos\theta = 0$ and hence $\theta = \frac{\pi}{2}$ and the triangle is right–angled at A.

Then we have Pythagoras' theorem:

$$BC^2 = AB^2 + AC^2. \tag{8.11}$$

We also note that $BC \geq AB$ and $BC \geq AC$ follow from equation 8.11. (See Figure 8.11.)

EXAMPLE 8.5.2 Let A = (2, 9, 8), B = (6, 4, −2), C = (7, 15, 7). Show that $\overrightarrow{AB}$ and $\overrightarrow{AC}$ are perpendicular and find the point D such that ABDC forms a rectangle.

[Figure 8.12: Distance from a point to a line.]

Solution.

$$\overrightarrow{AB} \cdot \overrightarrow{AC} = (4\mathbf{i} - 5\mathbf{j} - 10\mathbf{k}) \cdot (5\mathbf{i} + 6\mathbf{j} - \mathbf{k}) = 20 - 30 + 10 = 0.$$

Hence $\overrightarrow{AB}$ and $\overrightarrow{AC}$ are perpendicular. Also, the required fourth point D clearly has to satisfy the equation

$$\overrightarrow{BD} = \overrightarrow{AC}, \quad \text{or equivalently} \quad \mathbf{D} - \mathbf{B} = \overrightarrow{AC}.$$

Hence

$$\mathbf{D} = \mathbf{B} + \overrightarrow{AC} = (6\mathbf{i} + 4\mathbf{j} - 2\mathbf{k}) + (5\mathbf{i} + 6\mathbf{j} - \mathbf{k}) = 11\mathbf{i} + 10\mathbf{j} - 3\mathbf{k},$$

so D = (11, 10, −3).

THEOREM 8.5.2 (Distance from a point to a line) If C is a point and L is the line through A and B, then there is exactly one point P on L such that $\overrightarrow{CP}$ is perpendicular to $\overrightarrow{AB}$, namely

$$\mathbf{P} = \mathbf{A} + t\overrightarrow{AB}, \quad t = \frac{\overrightarrow{AC} \cdot \overrightarrow{AB}}{AB^2}. \tag{8.12}$$

Moreover if Q is any point on L, then $CQ \geq CP$ and hence P is the point on L closest to C.

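The shortest distance CP is given by

$$CP = \frac{\sqrt{AC^2AB^2 - (\overrightarrow{AC} \cdot \overrightarrow{AB})^2}}{AB}. \tag{8.13}$$

(See Figure 8.12.)

Proof. Let $\mathbf{P} = \mathbf{A} + t\overrightarrow{AB}$ and assume that $\overrightarrow{CP}$ is perpendicular to $\overrightarrow{AB}$. Then

$$\overrightarrow{CP} \cdot \overrightarrow{AB} = 0$$
$$(\mathbf{P} - \mathbf{C}) \cdot \overrightarrow{AB} = 0$$
$$(\mathbf{A} + t\overrightarrow{AB} - \mathbf{C}) \cdot \overrightarrow{AB} = 0$$
$$(\overrightarrow{CA} + t\overrightarrow{AB}) \cdot \overrightarrow{AB} = 0$$
$$\overrightarrow{CA} \cdot \overrightarrow{AB} + t(\overrightarrow{AB} \cdot \overrightarrow{AB}) = 0$$
$$-\overrightarrow{AC} \cdot \overrightarrow{AB} + t(\overrightarrow{AB} \cdot \overrightarrow{AB}) = 0,$$

so equation 8.12 follows.

The inequality $CQ \geq CP$, where Q is any point on L, is a consequence of Pythagoras' theorem.

Finally, as $\overrightarrow{CP}$ and $\overrightarrow{PA}$ are perpendicular, Pythagoras' theorem gives

$$CP^2 = AC^2 - PA^2 = AC^2 - ||t\overrightarrow{AB}||^2 = AC^2 - t^2AB^2$$
$$= AC^2 - \left(\frac{\overrightarrow{AC} \cdot \overrightarrow{AB}}{AB^2}\right)^2 AB^2 = \frac{AC^2AB^2 - (\overrightarrow{AC} \cdot \overrightarrow{AB})^2}{AB^2},$$

as required.

EXAMPLE 8.5.3 The closest point on the line through A = (1, 2, 1) and B = (2, −1, 3) to the origin is $P = (\frac{17}{14}, \frac{19}{14}, \frac{20}{14})$ and the corresponding shortest distance equals $\frac{5}{14}\sqrt{42}$.

Another application of theorem 8.5.2 is to the projection of a line segment on another line: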

[Figure 8.13: Projecting the segment $C_1C_2$ onto the line AB.]

THEOREM 8.5.3 (The projection of a line segment onto a line) Let $C_1$, $C_2$ be points and $P_1$, $P_2$ be the feet of the perpendiculars from $C_1$ and $C_2$ to the line AB. Then

$$P_1P_2 = |\overrightarrow{C_1C_2} \cdot \hat{n}|,$$

where

$$\hat{n} = \frac{1}{AB}\overrightarrow{AB}.$$

Also

$$C_1C_2 \geq P_1P_2. \tag{8.14}$$

(See Figure 8.13.)

Proof. Using equations 8.12, we have

$$\mathbf{P}_1 = \mathbf{A} + t_1\overrightarrow{AB}, \quad \mathbf{P}_2 = \mathbf{A} + t_2\overrightarrow{AB},$$

where

$$t_1 = \frac{\overrightarrow{AC_1} \cdot \overrightarrow{AB}}{AB^2}, \quad t_2 = \frac{\overrightarrow{AC_2} \cdot \overrightarrow{AB}}{AB^2}.$$

Hence

$$\overrightarrow{P_1P_2} = (\mathbf{A} + t_2\overrightarrow{AB}) - (\mathbf{A} + t_1\overrightarrow{AB}) = (t_2 - t_1)\overrightarrow{AB},$$

so

$$P_1P_2 = ||\overrightarrow{P_1P_2}|| = |t_2 - t_1|\,AB = \left|\frac{\overrightarrow{AC_2} \cdot \overrightarrow{AB}}{AB^2} - \frac{\overrightarrow{AC_1} \cdot \overrightarrow{AB}}{AB^2}\right| AB = \frac{|\overrightarrow{C_1C_2} \cdot \overrightarrow{AB}|}{AB^2}\,AB = |\overrightarrow{C_1C_2} \cdot \hat{n}|,$$

where $\hat{n}$ is the unit vector

$$\hat{n} = \frac{1}{AB}\overrightarrow{AB}.$$

Inequality 8.14 then follows from the Cauchy–Schwarz inequality 8.3.
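Equations 8.12 and 8.13 give a direct recipe for the closest point on a line. The following sketch assumes integer coordinates so that exact fractions can be used; the function name `foot_of_perpendicular` is illustrative. It reproduces the numbers of Example 8.5.3.

```python
from fractions import Fraction

def foot_of_perpendicular(A, B, C):
    """Return (P, CP^2): the point P on line AB closest to C, and CP squared.

    Assumes integer coordinates and A != B.
    """
    ab = [b - a for a, b in zip(A, B)]
    ac = [c - a for a, c in zip(A, C)]
    ab2 = sum(p * p for p in ab)                             # AB^2
    t = Fraction(sum(p * q for p, q in zip(ac, ab)), ab2)    # equation 8.12
    P = tuple(a + t * d for a, d in zip(A, ab))
    cp2 = sum(p * p for p in ac) - t * t * ab2               # CP^2 = AC^2 - t^2 AB^2
    return P, cp2

P, cp2 = foot_of_perpendicular((1, 2, 1), (2, -1, 3), (0, 0, 0))
print(P)     # fractions print in lowest terms, so 20/14 appears as 10/7
print(cp2)   # 75/14, which is (5*sqrt(42)/14)^2
```

Returning the squared distance keeps the arithmetic exact; taking a square root at the end gives the distance of equation 8.13.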

DEFINITION 8.5.3 Two non–intersecting lines are called skew if they have non–parallel direction vectors. Theorem 8.5.3 has an application to the problem of showing that two skew lines have a shortest distance between them. (The reader is referred to problem 16 at the end of the chapter.) Before we turn to the study of planes, it is convenient to introduce the cross–product of two vectors.

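8.6 The cross–product of two vectors

DEFINITION 8.6.1 Let $X = a_1\mathbf{i} + b_1\mathbf{j} + c_1\mathbf{k}$ and $Y = a_2\mathbf{i} + b_2\mathbf{j} + c_2\mathbf{k}$. Then $X \times Y$, the cross–product of X and Y, is defined by

$$X \times Y = a\mathbf{i} + b\mathbf{j} + c\mathbf{k},$$

where

$$a = \begin{vmatrix} b_1 & c_1 \\ b_2 & c_2 \end{vmatrix}, \quad b = -\begin{vmatrix} a_1 & c_1 \\ a_2 & c_2 \end{vmatrix}, \quad c = \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix}.$$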
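The three 2 × 2 determinants of Definition 8.6.1 can be written out component by component. This is a minimal sketch with vectors as tuples $(a, b, c)$; the names `cross` and `dot` are illustrative.

```python
def cross(x, y):
    """Cross-product of X = (a1, b1, c1) and Y = (a2, b2, c2), per Definition 8.6.1."""
    a1, b1, c1 = x
    a2, b2, c2 = y
    return (b1 * c2 - b2 * c1,       # a =  |b1 c1; b2 c2|
            -(a1 * c2 - a2 * c1),    # b = -|a1 c1; a2 c2|
            a1 * b2 - a2 * b1)       # c =  |a1 b1; a2 b2|

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

X, Y = (1, 2, 3), (4, 5, 6)
Z = cross(X, Y)
print(Z)
print(dot(X, Z), dot(Y, Z))   # both vanish: Z is perpendicular to X and to Y
```

The two zero dot products illustrate the statement made in the chapter introduction that $X \times Y$ is perpendicular to both X and Y.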

The vector cross–product has the following properties which follow from properties of 2 × 2 and 3 × 3 determinants:

(i) $\mathbf{i} \times \mathbf{j} = \mathbf{k}$, $\mathbf{j} \times \mathbf{k} = \mathbf{i}$, $\mathbf{k} \times \mathbf{i} = \mathbf{j}$;
(ii) $X \times X = 0$;
(iii) $Y \times X = -X \times Y$;
(iv) $X \times (Y + Z) = X \times Y + X \times Z$;
(v) $(tX) \times Y = t(X \times Y)$;
(vi) (Scalar triple product formula) if $Z = a_3\mathbf{i} + b_3\mathbf{j} + c_3\mathbf{k}$, then
$$X \cdot (Y \times Z) = \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} = (X \times Y) \cdot Z;$$
(vii) $X \cdot (X \times Y) = 0 = Y \cdot (X \times Y)$;
(viii) $||X \times Y|| = \sqrt{||X||^2||Y||^2 - (X \cdot Y)^2}$;
(ix) if X and Y are non–zero vectors and $\theta$ is the angle between X and Y, then
$$||X \times Y|| = ||X|| \cdot ||Y||\sin\theta.$$
(See Figure 8.14.)

From theorem 8.4.3 and the definition of cross–product, it follows that non–zero vectors X and Y are parallel if and only if $X \times Y = 0$; hence by (vii), the cross–product of two non–parallel, non–zero vectors X and Y, is a non–zero vector perpendicular to both X and Y.

LEMMA 8.6.1 Let X and Y be non–zero, non–parallel vectors.

(i) Z is a linear combination of X and Y, if and only if Z is perpendicular to $X \times Y$;
(ii) Z is perpendicular to X and Y, if and only if Z is parallel to $X \times Y$.

Proof. Let X and Y be non–zero, non–parallel vectors. Then $X \times Y \neq 0$. Then if $X \times Y = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$, we have

$$\det\,[X \times Y|X|Y]^t = \begin{vmatrix} a & b & c \\ a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \end{vmatrix} = (X \times Y) \cdot (X \times Y) > 0.$$


[Figure 8.14: The vector cross–product.]

Hence the matrix $[X \times Y|X|Y]$ is non–singular. Consequently the linear system
$$r(X \times Y) + sX + tY = Z \qquad (8.15)$$
has a unique solution $r, s, t$.

(i) Suppose $Z = sX + tY$. Then
$$Z \cdot (X \times Y) = sX \cdot (X \times Y) + tY \cdot (X \times Y) = s0 + t0 = 0.$$
Conversely, suppose that
$$Z \cdot (X \times Y) = 0. \qquad (8.16)$$
Now from equation 8.15, $r, s, t$ exist satisfying
$$Z = r(X \times Y) + sX + tY.$$
Then equation 8.16 gives
$$0 = (r(X \times Y) + sX + tY) \cdot (X \times Y) = r\|X \times Y\|^2 + sX \cdot (X \times Y) + tY \cdot (X \times Y) = r\|X \times Y\|^2.$$
Hence $r = 0$, since $X \times Y \neq 0$, and $Z = sX + tY$, as required.

(ii) Suppose $Z = \lambda(X \times Y)$. Then clearly $Z$ is perpendicular to $X$ and $Y$.

Conversely suppose that $Z$ is perpendicular to $X$ and $Y$. Now from equation 8.15, $r, s, t$ exist satisfying
$$Z = r(X \times Y) + sX + tY.$$
Then
$$sX \cdot X + tX \cdot Y = X \cdot Z = 0,$$
$$sY \cdot X + tY \cdot Y = Y \cdot Z = 0,$$
from which it follows (multiply the first equation by $s$, the second by $t$, and add) that
$$(sX + tY) \cdot (sX + tY) = 0.$$
Hence $sX + tY = 0$ and so $s = 0$, $t = 0$, as $X$ and $Y$ are non–parallel. Consequently $Z = r(X \times Y)$, as required.

The cross–product gives a compact formula for the distance from a point to a line, as well as the area of a triangle.

THEOREM 8.6.1 (Area of a triangle) If $A, B, C$ are distinct non–collinear points, then

(i) the distance $d$ from $C$ to the line $AB$ is given by
$$d = \frac{\|\overrightarrow{AB} \times \overrightarrow{AC}\|}{AB}, \qquad (8.17)$$

(ii) the area of the triangle $ABC$ equals
$$\frac{\|\overrightarrow{AB} \times \overrightarrow{AC}\|}{2} = \frac{\|A \times B + B \times C + C \times A\|}{2}. \qquad (8.18)$$

Proof. The area $\Delta$ of triangle $ABC$ is given by
$$\Delta = \frac{AB \cdot CP}{2},$$
where $P$ is the foot of the perpendicular from $C$ to the line $AB$. Now by formula 8.13, we have
$$CP = \frac{\sqrt{AC^2 \cdot AB^2 - (\overrightarrow{AC} \cdot \overrightarrow{AB})^2}}{AB} = \frac{\|\overrightarrow{AB} \times \overrightarrow{AC}\|}{AB},$$
where the second equality uses property (viii) of the cross–product. This gives formula 8.17, and the first formula of equation 8.18 then follows from $\Delta = \frac{AB \cdot CP}{2}$.

The second formula of equation 8.18 follows from the equations
$$\begin{aligned}
\overrightarrow{AB} \times \overrightarrow{AC} &= (B - A) \times (C - A) \\
&= \{(B - A) \times C\} - \{(B - A) \times A\} \\
&= \{B \times C - A \times C\} - \{B \times A - A \times A\} \\
&= B \times C - A \times C - B \times A \\
&= B \times C + C \times A + A \times B,
\end{aligned}$$
as required.
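Both parts of Theorem 8.6.1 lend themselves to a quick numerical check. The sketch below is an assumption-laden illustration rather than part of the text: the helper names are hypothetical, and the sample triangle is the one from Problem 14 of Section 8.8, whose area is stated there to be $\frac{1}{2}\sqrt{333}$.

```python
from math import isclose, sqrt


def sub(p, q):
    """Componentwise difference p - q."""
    return tuple(a - b for a, b in zip(p, q))


def add(p, q):
    """Componentwise sum p + q."""
    return tuple(a + b for a, b in zip(p, q))


def cross(u, v):
    """Cross-product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def norm(u):
    """Euclidean length of a 3-vector."""
    return sqrt(sum(a * a for a in u))


def distance_point_to_line(a, b, c):
    """Distance from c to the line through a and b (formula 8.17)."""
    return norm(cross(sub(b, a), sub(c, a))) / norm(sub(b, a))


def triangle_area(a, b, c):
    """Area of the triangle abc (first formula of 8.18)."""
    return norm(cross(sub(b, a), sub(c, a))) / 2


A, B, C = (-3, 0, 2), (6, 1, 4), (-5, 1, 0)   # sample triangle

n1 = cross(sub(B, A), sub(C, A))                            # AB x AC
n2 = add(add(cross(A, B), cross(B, C)), cross(C, A))        # A x B + B x C + C x A
area = triangle_area(A, B, C)
d = distance_point_to_line(A, B, C)

print(n1, n2, area, d)
```

The two vectors `n1` and `n2` coincide, which is the content of the second formula in 8.18, and `d * AB / 2` reproduces the area.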

8.7 Planes

DEFINITION 8.7.1 A plane is a set of points $P$ satisfying an equation of the form
$$P = P_0 + sX + tY, \quad s, t \in \mathbb{R}, \qquad (8.19)$$
where $X$ and $Y$ are non–zero, non–parallel vectors.

For example, the $xy$–plane consists of the points $P = (x, y, 0)$ and corresponds to the plane equation
$$P = xi + yj = O + xi + yj.$$
In terms of coordinates, equation 8.19 takes the form
$$\begin{aligned}
x &= x_0 + sa_1 + ta_2 \\
y &= y_0 + sb_1 + tb_2 \\
z &= z_0 + sc_1 + tc_2,
\end{aligned}$$
where $P_0 = (x_0, y_0, z_0)$ and $(a_1, b_1, c_1)$ and $(a_2, b_2, c_2)$ are non–zero and non–proportional.

THEOREM 8.7.1 Let $A, B, C$ be three non–collinear points. Then there is one and only one plane through these points, namely the plane given by the equation
$$P = A + s\overrightarrow{AB} + t\overrightarrow{AC}, \qquad (8.20)$$
or equivalently
$$\overrightarrow{AP} = s\overrightarrow{AB} + t\overrightarrow{AC}. \qquad (8.21)$$
(See Figure 8.15.)

[Figure 8.15: Vector equation for the plane $ABC$, showing $\overrightarrow{AB'} = s\overrightarrow{AB}$, $\overrightarrow{AC'} = t\overrightarrow{AC}$ and $\overrightarrow{AP} = s\overrightarrow{AB} + t\overrightarrow{AC}$.]

Proof. First note that equation 8.20 is indeed the equation of a plane through $A$, $B$ and $C$, as $\overrightarrow{AB}$ and $\overrightarrow{AC}$ are non–zero and non–parallel and $(s, t) = (0, 0)$, $(1, 0)$ and $(0, 1)$ give $P = A$, $B$ and $C$, respectively. Call this plane $\mathcal{P}$.

Conversely, suppose $P = P_0 + sX + tY$ is the equation of a plane $\mathcal{Q}$ passing through $A, B, C$. Then $A = P_0 + s_0X + t_0Y$, so the equation for $\mathcal{Q}$ may be written
$$P = A + (s - s_0)X + (t - t_0)Y = A + s'X + t'Y;$$
so in effect we can take $P_0 = A$ in the equation of $\mathcal{Q}$. Then the fact that $B$ and $C$ lie on $\mathcal{Q}$ gives equations
$$B = A + s_1X + t_1Y, \qquad C = A + s_2X + t_2Y,$$
or
$$\overrightarrow{AB} = s_1X + t_1Y, \qquad \overrightarrow{AC} = s_2X + t_2Y. \qquad (8.22)$$
Then equations 8.22 and equation 8.20 show that
$$\mathcal{P} \subseteq \mathcal{Q}.$$
Conversely, it is straightforward to show that because $\overrightarrow{AB}$ and $\overrightarrow{AC}$ are not parallel, we have
$$\begin{vmatrix} s_1 & t_1 \\ s_2 & t_2 \end{vmatrix} \neq 0.$$


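Hence equations 8.22 can be solved for $X$ and $Y$ as linear combinations of $\overrightarrow{AB}$ and $\overrightarrow{AC}$, allowing us to deduce that
$$\mathcal{Q} \subseteq \mathcal{P}.$$
Hence $\mathcal{Q} = \mathcal{P}$.

THEOREM 8.7.2 (Normal equation for a plane) Let
$$A = (x_1, y_1, z_1), \quad B = (x_2, y_2, z_2), \quad C = (x_3, y_3, z_3)$$
be three non–collinear points. Then the plane through $A, B, C$ is given by
$$\overrightarrow{AP} \cdot (\overrightarrow{AB} \times \overrightarrow{AC}) = 0, \qquad (8.23)$$
or equivalently,
$$\begin{vmatrix} x - x_1 & y - y_1 & z - z_1 \\ x_2 - x_1 & y_2 - y_1 & z_2 - z_1 \\ x_3 - x_1 & y_3 - y_1 & z_3 - z_1 \end{vmatrix} = 0, \qquad (8.24)$$
where $P = (x, y, z)$. (See Figure 8.16.)

[Figure 8.16: Normal equation of the plane $ABC$, with $\overrightarrow{AD} = \overrightarrow{AB} \times \overrightarrow{AC}$ and $\overrightarrow{AD} \cdot \overrightarrow{AP} = 0$.]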


[Figure 8.17: The plane $ax + by + cz = d$.]

REMARK 8.7.1 Equation 8.24 can be written in more symmetrical form as
$$\begin{vmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix} = 0. \qquad (8.25)$$

Proof (of Theorem 8.7.2). Let $\mathcal{P}$ be the plane through $A, B, C$. Then by equation 8.21, we have $P \in \mathcal{P}$ if and only if $\overrightarrow{AP}$ is a linear combination of $\overrightarrow{AB}$ and $\overrightarrow{AC}$ and so by lemma 8.6.1(i), using the fact that $\overrightarrow{AB} \times \overrightarrow{AC} \neq 0$ here, if and only if $\overrightarrow{AP}$ is perpendicular to $\overrightarrow{AB} \times \overrightarrow{AC}$. This gives equation 8.23.

Equation 8.24 is the scalar triple product version of equation 8.23, taking into account the equations
$$\begin{aligned}
\overrightarrow{AP} &= (x - x_1)i + (y - y_1)j + (z - z_1)k, \\
\overrightarrow{AB} &= (x_2 - x_1)i + (y_2 - y_1)j + (z_2 - z_1)k, \\
\overrightarrow{AC} &= (x_3 - x_1)i + (y_3 - y_1)j + (z_3 - z_1)k.
\end{aligned}$$

REMARK 8.7.2 Equation 8.24 gives rise to a linear equation in $x$, $y$ and $z$:
$$ax + by + cz = d,$$


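where $ai + bj + ck \neq 0$. For
$$\begin{vmatrix} x - x_1 & y - y_1 & z - z_1 \\ x_2 - x_1 & y_2 - y_1 & z_2 - z_1 \\ x_3 - x_1 & y_3 - y_1 & z_3 - z_1 \end{vmatrix} = \begin{vmatrix} x & y & z \\ x_2 - x_1 & y_2 - y_1 & z_2 - z_1 \\ x_3 - x_1 & y_3 - y_1 & z_3 - z_1 \end{vmatrix} - \begin{vmatrix} x_1 & y_1 & z_1 \\ x_2 - x_1 & y_2 - y_1 & z_2 - z_1 \\ x_3 - x_1 & y_3 - y_1 & z_3 - z_1 \end{vmatrix} \qquad (8.26)$$
and expanding the first determinant on the right–hand side of equation 8.26 along row 1 gives an expression
$$ax + by + cz$$
where
$$a = \begin{vmatrix} y_2 - y_1 & z_2 - z_1 \\ y_3 - y_1 & z_3 - z_1 \end{vmatrix}, \quad b = -\begin{vmatrix} x_2 - x_1 & z_2 - z_1 \\ x_3 - x_1 & z_3 - z_1 \end{vmatrix}, \quad c = \begin{vmatrix} x_2 - x_1 & y_2 - y_1 \\ x_3 - x_1 & y_3 - y_1 \end{vmatrix}.$$
But $a, b, c$ are the components of $\overrightarrow{AB} \times \overrightarrow{AC}$, which in turn is non–zero, as $A, B, C$ are non–collinear here.

Conversely if $ai + bj + ck \neq 0$, the equation $ax + by + cz = d$ does indeed represent a plane. For if say $a \neq 0$, the equation can be solved for $x$ in terms of $y$ and $z$:
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \frac{d}{a} \\ 0 \\ 0 \end{bmatrix} + y \begin{bmatrix} -\frac{b}{a} \\ 1 \\ 0 \end{bmatrix} + z \begin{bmatrix} -\frac{c}{a} \\ 0 \\ 1 \end{bmatrix},$$
which gives the plane
$$P = P_0 + yX + zY,$$
where $P_0 = (\frac{d}{a}, 0, 0)$ and $X = -\frac{b}{a}i + j$ and $Y = -\frac{c}{a}i + k$ are evidently non–parallel vectors.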

REMARK 8.7.3 The plane equation $ax + by + cz = d$ is called the normal form, as it is easy to prove that if $P_1$ and $P_2$ are two points in the plane, then $ai + bj + ck$ is perpendicular to $\overrightarrow{P_1P_2}$. Any non–zero vector with this property is called a normal to the plane. (See Figure 8.17.)

By lemma 8.6.1(ii), it follows that every vector $X$ normal to a plane through three non–collinear points $A, B, C$ is parallel to $\overrightarrow{AB} \times \overrightarrow{AC}$, since $X$ is perpendicular to $\overrightarrow{AB}$ and $\overrightarrow{AC}$.
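Theorem 8.7.2 and Remark 8.7.2 together suggest a direct recipe for the normal form of the plane through three non–collinear points: take $\overrightarrow{AB} \times \overrightarrow{AC}$ as the normal $(a, b, c)$ and obtain $d$ by substituting $A$. The following sketch illustrates this under stated assumptions: the function names are hypothetical, and the sample points are those of Problem 15 in Section 8.8, for which the stated answer is $2x - 7y + 6z = 21$.

```python
def sub(p, q):
    """Componentwise difference p - q."""
    return tuple(a - b for a, b in zip(p, q))


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross-product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def plane_through(a, b, c):
    """Return (normal, d) with normal . P = d for the plane through a, b, c."""
    normal = cross(sub(b, a), sub(c, a))   # AB x AC
    if normal == (0, 0, 0):
        raise ValueError("the three points are collinear")
    return normal, dot(normal, a)          # d from substituting A


A, B, C = (2, 1, 4), (1, -1, 2), (4, -1, 1)   # sample points
normal, d = plane_through(A, B, C)
print(normal, d)
```

Any non–zero multiple of `(normal, d)` describes the same plane, so a different ordering of the points may return the equation with all signs reversed.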

EXAMPLE 8.7.1 Show that the planes
$$x + y - 2z = 1 \quad \text{and} \quad x + 3y - z = 4$$
intersect in a line and find the distance from the point $C = (1, 0, 1)$ to this line.

Solution. Solving the two equations simultaneously gives
$$x = -\frac{1}{2} + \frac{5}{2}z, \qquad y = \frac{3}{2} - \frac{1}{2}z, \qquad (8.27)$$
where $z$ is arbitrary. Hence
$$xi + yj + zk = -\frac{1}{2}i + \frac{3}{2}j + z\left(\frac{5}{2}i - \frac{1}{2}j + k\right),$$
which is the equation of a line $\mathcal{L}$ through $A = (-\frac{1}{2}, \frac{3}{2}, 0)$ and having direction vector $\frac{5}{2}i - \frac{1}{2}j + k$.

We can now proceed in one of three ways to find the closest point on $\mathcal{L}$ to $C$.

One way is to use equation 8.17 with $B$ defined by
$$\overrightarrow{AB} = \frac{5}{2}i - \frac{1}{2}j + k.$$
Another method minimizes the distance $CP$, where $P$ ranges over $\mathcal{L}$.

A third way is to find an equation for the plane through $C$, having $\frac{5}{2}i - \frac{1}{2}j + k$ as a normal. Such a plane has equation
$$5x - y + 2z = d,$$
where $d$ is found by substituting the coordinates of $C$ in the last equation:
$$d = 5 \times 1 - 0 + 2 \times 1 = 7.$$
We now find the point $P$ where the plane intersects the line $\mathcal{L}$. Then $\overrightarrow{CP}$ will be perpendicular to $\mathcal{L}$ and $CP$ will be the required shortest distance from $C$ to $\mathcal{L}$. We find using equations 8.27 that
$$5\left(-\frac{1}{2} + \frac{5}{2}z\right) - \left(\frac{3}{2} - \frac{1}{2}z\right) + 2z = 7,$$
so $z = \frac{11}{15}$. Hence $P = (\frac{4}{3}, \frac{17}{15}, \frac{11}{15})$, and the required distance is $CP = \frac{\sqrt{330}}{15}$.

[Figure 8.18: Line of intersection of two planes.]
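The arithmetic of Example 8.7.1 can be replayed with exact rational numbers. The sketch below follows the third method of the solution (intersecting $\mathcal{L}$ with the plane through $C$ normal to $\mathcal{L}$); the names `point_on_line` and `plane_value` are hypothetical helpers, and the parametrization is the one given by equations 8.27.

```python
from fractions import Fraction
from math import isclose, sqrt


def point_on_line(z):
    """Point of the line from equations 8.27, using z as the parameter."""
    z = Fraction(z)
    return (Fraction(-1, 2) + Fraction(5, 2) * z,
            Fraction(3, 2) - Fraction(1, 2) * z,
            z)


def plane_value(p):
    """Left-hand side 5x - y + 2z of the plane normal to the line."""
    x, y, z = p
    return 5 * x - y + 2 * z


C = (1, 0, 1)
d = plane_value(C)   # right-hand side of the plane through C

# plane_value(point_on_line(z)) is affine in z, so two samples determine it.
f0 = plane_value(point_on_line(0))
f1 = plane_value(point_on_line(1))
z = (d - f0) / (f1 - f0)

P = point_on_line(z)
CP = sqrt(sum((p - c) ** 2 for p, c in zip(P, C)))
print(d, z, P, CP)
```

Because `Fraction` arithmetic is exact, the coordinates of `P` can be compared with the fractions in the example by plain equality; only the final square root is a floating-point value.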

It is clear that through a given line and a point not on that line, there passes exactly one plane. If the line is given as the intersection of two planes, each in normal form, there is a simple way of finding an equation for this plane. More explicitly we have the following result:

THEOREM 8.7.3 Suppose the planes
$$a_1x + b_1y + c_1z = d_1 \qquad (8.28)$$
$$a_2x + b_2y + c_2z = d_2 \qquad (8.29)$$
have non–parallel normals. Then the planes intersect in a line $\mathcal{L}$. Moreover the equation
$$\lambda(a_1x + b_1y + c_1z - d_1) + \mu(a_2x + b_2y + c_2z - d_2) = 0, \qquad (8.30)$$
where $\lambda$ and $\mu$ are not both zero, gives all planes through $\mathcal{L}$. (See Figure 8.18.)

Proof. Assume that the normals $a_1i + b_1j + c_1k$ and $a_2i + b_2j + c_2k$ are non–parallel. Then by theorem 8.4.3, not all of
$$\Delta_1 = \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix}, \quad \Delta_2 = \begin{vmatrix} b_1 & c_1 \\ b_2 & c_2 \end{vmatrix}, \quad \Delta_3 = \begin{vmatrix} a_1 & c_1 \\ a_2 & c_2 \end{vmatrix} \qquad (8.31)$$
are zero. If say $\Delta_1 \neq 0$, we can solve equations 8.28 and 8.29 for $x$ and $y$ in terms of $z$, as we did in the previous example, to show that the intersection forms a line $\mathcal{L}$.

We next have to check that if $\lambda$ and $\mu$ are not both zero, then equation 8.30 represents a plane. (Whatever set of points equation 8.30 represents, this set certainly contains $\mathcal{L}$.) Equation 8.30 can be rewritten as
$$(\lambda a_1 + \mu a_2)x + (\lambda b_1 + \mu b_2)y + (\lambda c_1 + \mu c_2)z - (\lambda d_1 + \mu d_2) = 0.$$
Then we clearly cannot have all the coefficients
$$\lambda a_1 + \mu a_2, \quad \lambda b_1 + \mu b_2, \quad \lambda c_1 + \mu c_2$$
zero, as otherwise the vectors $a_1i + b_1j + c_1k$ and $a_2i + b_2j + c_2k$ would be parallel.

Finally, if $\mathcal{P}$ is a plane containing $\mathcal{L}$, let $P_0 = (x_0, y_0, z_0)$ be a point of $\mathcal{P}$ not on $\mathcal{L}$. Then if we define $\lambda$ and $\mu$ by
$$\lambda = -(a_2x_0 + b_2y_0 + c_2z_0 - d_2), \qquad \mu = a_1x_0 + b_1y_0 + c_1z_0 - d_1,$$
then at least one of $\lambda$ and $\mu$ is non–zero. Then the coordinates of $P_0$ satisfy equation 8.30, which therefore represents a plane passing through $\mathcal{L}$ and $P_0$ and hence identical with $\mathcal{P}$.

EXAMPLE 8.7.2 Find an equation for the plane through $P_0 = (1, 0, 1)$ and passing through the line of intersection of the planes
$$x + y - 2z = 1 \quad \text{and} \quad x + 3y - z = 4.$$

Solution. The required plane has the form
$$\lambda(x + y - 2z - 1) + \mu(x + 3y - z - 4) = 0,$$
where not both of $\lambda$ and $\mu$ are zero. Substituting the coordinates of $P_0$ into this equation gives
$$-2\lambda + \mu(-4) = 0, \qquad \lambda = -2\mu.$$
So the required equation is
$$-2\mu(x + y - 2z - 1) + \mu(x + 3y - z - 4) = 0,$$
or
$$-x + y + 3z - 2 = 0.$$

Our final result is a formula for the distance from a point to a plane.
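Before turning to that formula, the computation of Example 8.7.2 can be replayed with the explicit choice of $\lambda$ and $\mu$ made in the proof of Theorem 8.7.3. The sketch below is illustrative only: planes are stored as hypothetical tuples `(a, b, c, d)` meaning $ax + by + cz = d$, and the function names are assumptions of this example.

```python
def residual(plane, p):
    """Value of ax + by + cz - d at the point p, for plane = (a, b, c, d)."""
    a, b, c, d = plane
    x, y, z = p
    return a * x + b * y + c * z - d


def plane_through_line_and_point(plane1, plane2, p0):
    """Plane of the family 8.30 through p0, returned as (a, b, c, d)."""
    lam = -residual(plane2, p0)   # lambda, as in the proof of Theorem 8.7.3
    mu = residual(plane1, p0)     # mu
    if lam == 0 and mu == 0:
        raise ValueError("p0 lies on both planes, hence on their line of intersection")
    # lam * (plane1 equation) + mu * (plane2 equation), coefficient by coefficient
    return tuple(lam * u + mu * v for u, v in zip(plane1, plane2))


plane1 = (1, 1, -2, 1)   # x + y - 2z = 1
plane2 = (1, 3, -1, 4)   # x + 3y - z = 4
P0 = (1, 0, 1)

result = plane_through_line_and_point(plane1, plane2, P0)
on_line = (2, 1, 1)      # a point satisfying both plane equations
print(result, residual(result, P0), residual(result, on_line))
```

The returned coefficients are a non–zero multiple of those in $-x + y + 3z = 2$, so they describe the same plane as the answer obtained in the example.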


[Figure 8.19: Distance from a point $P_0$ to the plane $ax + by + cz = d$.]

THEOREM 8.7.4 (Distance from a point to a plane) Let $P_0 = (x_0, y_0, z_0)$ and $\mathcal{P}$ be the plane
$$ax + by + cz = d. \qquad (8.32)$$
Then there is a unique point $P$ on $\mathcal{P}$ such that $\overrightarrow{P_0P}$ is normal to $\mathcal{P}$. Moreover
$$P_0P = \frac{|ax_0 + by_0 + cz_0 - d|}{\sqrt{a^2 + b^2 + c^2}}.$$
(See Figure 8.19.)

Proof. The line through $P_0$ normal to $\mathcal{P}$ is given by
$$P = P_0 + t(ai + bj + ck),$$
or in terms of coordinates
$$x = x_0 + at, \quad y = y_0 + bt, \quad z = z_0 + ct.$$
Substituting these formulae in equation 8.32 gives
$$a(x_0 + at) + b(y_0 + bt) + c(z_0 + ct) = d,$$
$$t(a^2 + b^2 + c^2) = -(ax_0 + by_0 + cz_0 - d),$$
so
$$t = -\frac{ax_0 + by_0 + cz_0 - d}{a^2 + b^2 + c^2}.$$
Then
$$\begin{aligned}
P_0P = \|\overrightarrow{P_0P}\| &= \|t(ai + bj + ck)\| \\
&= |t|\sqrt{a^2 + b^2 + c^2} \\
&= \frac{|ax_0 + by_0 + cz_0 - d|}{a^2 + b^2 + c^2}\sqrt{a^2 + b^2 + c^2} \\
&= \frac{|ax_0 + by_0 + cz_0 - d|}{\sqrt{a^2 + b^2 + c^2}}.
\end{aligned}$$
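The proof of Theorem 8.7.4 is constructive, so it translates into a short computation of both the foot $P$ and the distance $P_0P$. The sketch below assumes integer inputs so that the parameter $t$ can be held exactly as a `Fraction`; the function name `foot_and_distance` is hypothetical, and the sample data are those of Problem 13 in Section 8.8.

```python
from fractions import Fraction
from math import isclose, sqrt


def foot_and_distance(p0, plane):
    """Foot of the perpendicular from p0 to ax + by + cz = d, and its length.

    plane is a tuple (a, b, c, d) of integers with (a, b, c) non-zero.
    """
    a, b, c, d = plane
    x0, y0, z0 = p0
    n2 = a * a + b * b + c * c
    if n2 == 0:
        raise ValueError("(a, b, c) must be non-zero")
    r = a * x0 + b * y0 + c * z0 - d
    t = -Fraction(r, n2)                         # parameter from the proof
    foot = (x0 + a * t, y0 + b * t, z0 + c * t)  # P = P0 + t(ai + bj + ck)
    return foot, abs(r) / sqrt(n2)


A = (6, -1, 11)
plane = (3, 4, 5, 10)    # 3x + 4y + 5z = 10
B, AB = foot_and_distance(A, plane)
print(B, AB)
```

The foot is returned with exact rational coordinates, while the distance is a float because of the square root in the denominator.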

Other interesting geometrical facts about lines and planes are left to the problems at the end of this chapter.

8.8 PROBLEMS

1. Find the point where the line through $A = (3, -2, 7)$ and $B = (13, 3, -8)$ meets the $xz$–plane.
[Ans: $(7, 0, 1)$.]

2. Let $A, B, C$ be non–collinear points. If $E$ is the mid–point of the segment $BC$ and $F$ is the point on the segment $EA$ satisfying $\frac{AF}{EF} = 2$, prove that
$$F = \frac{1}{3}(A + B + C).$$
($F$ is called the centroid of triangle $ABC$.)

3. Prove that the points $(2, 1, 4)$, $(1, -1, 2)$, $(3, 3, 6)$ are collinear.

4. If $A = (2, 3, -1)$ and $B = (3, 7, 4)$, find the points $P$ on the line $AB$ satisfying $PA/PB = 2/5$.
[Ans: $(\frac{16}{7}, \frac{29}{7}, \frac{3}{7})$ and $(\frac{4}{3}, \frac{1}{3}, -\frac{13}{3})$.]

5. Let $\mathcal{M}$ be the line through $A = (1, 2, 3)$ parallel to the line joining $B = (-2, 2, 0)$ and $C = (4, -1, 7)$. Also $\mathcal{N}$ is the line joining $E = (1, -1, 8)$ and $F = (10, -1, 11)$. Prove that $\mathcal{M}$ and $\mathcal{N}$ intersect and find the point of intersection.
[Ans: $(7, -1, 10)$.]

6. Prove that the triangle formed by the points $(-3, 5, 6)$, $(-2, 7, 9)$ and $(2, 1, 7)$ is a $30^\circ$, $60^\circ$, $90^\circ$ triangle.

7. Find the point on the line $AB$ closest to the origin, where $A = (-2, 1, 3)$ and $B = (1, 2, 4)$. Also find this shortest distance.
[Ans: $(-\frac{16}{11}, \frac{13}{11}, \frac{35}{11})$ and $\sqrt{\frac{150}{11}}$.]

8. A line $\mathcal{N}$ is determined by the two planes
$$x + y - 2z = 1 \quad \text{and} \quad x + 3y - z = 4.$$
Find the point $P$ on $\mathcal{N}$ closest to the point $C = (1, 0, 1)$ and find the distance $PC$.
[Ans: $(\frac{4}{3}, \frac{17}{15}, \frac{11}{15})$ and $\frac{\sqrt{330}}{15}$.]

9. Find a linear equation describing the plane perpendicular to the line of intersection of the planes $x + y - 2z = 4$ and $3x - 2y + z = 1$ and which passes through $(6, 0, 2)$.
[Ans: $3x + 7y + 5z = 28$.]

10. Find the length of the projection of the segment $AB$ on the line $\mathcal{L}$, where $A = (1, 2, 3)$, $B = (5, -2, 6)$ and $\mathcal{L}$ is the line $CD$, where $C = (7, 1, 9)$ and $D = (-1, 5, 8)$.
[Ans: $\frac{17}{3}$.]

11. Find a linear equation for the plane through $A = (3, -1, 2)$, perpendicular to the line $\mathcal{L}$ joining $B = (2, 1, 4)$ and $C = (-3, -1, 7)$. Also find the point of intersection of $\mathcal{L}$ and the plane and hence determine the distance from $A$ to $\mathcal{L}$.
[Ans: $5x + 2y - 3z = 7$, $(\frac{111}{38}, \frac{52}{38}, \frac{131}{38})$, $\sqrt{\frac{293}{38}}$.]

12. If $P$ is a point inside the triangle $ABC$, prove that
$$P = rA + sB + tC,$$
where $r + s + t = 1$ and $r > 0$, $s > 0$, $t > 0$.

13. If $B$ is the point where the perpendicular from $A = (6, -1, 11)$ meets the plane $3x + 4y + 5z = 10$, find $B$ and the distance $AB$.
[Ans: $B = (\frac{123}{50}, \frac{-286}{50}, \frac{255}{50})$ and $AB = \frac{59}{\sqrt{50}}$.]

14. Prove that the triangle with vertices $(-3, 0, 2)$, $(6, 1, 4)$, $(-5, 1, 0)$ has area $\frac{1}{2}\sqrt{333}$.

15. Find an equation for the plane through $(2, 1, 4)$, $(1, -1, 2)$, $(4, -1, 1)$.
[Ans: $2x - 7y + 6z = 21$.]

16. Lines $\mathcal{L}$ and $\mathcal{M}$ are non–parallel in 3–dimensional space and are given by equations
$$P = A + sX, \qquad Q = B + tY.$$
(i) Prove that there is precisely one pair of points $P$ and $Q$ such that $\overrightarrow{PQ}$ is perpendicular to $X$ and $Y$.
(ii) Explain why $PQ$ is the shortest distance between lines $\mathcal{L}$ and $\mathcal{M}$. Also prove that
$$PQ = \frac{|(X \times Y) \cdot \overrightarrow{AB}|}{\|X \times Y\|}.$$

17. If $\mathcal{L}$ is the line through $A = (1, 2, 1)$ and $C = (3, -1, 2)$, while $\mathcal{M}$ is the line through $B = (1, 0, 2)$ and $D = (2, 1, 3)$, prove that the shortest distance between $\mathcal{L}$ and $\mathcal{M}$ equals $\frac{13}{\sqrt{62}}$.

18. Prove that the volume of the tetrahedron formed by four non–coplanar points $A_i = (x_i, y_i, z_i)$, $1 \leq i \leq 4$, is equal to
$$\frac{1}{6}\,|(\overrightarrow{A_1A_2} \times \overrightarrow{A_1A_3}) \cdot \overrightarrow{A_1A_4}|,$$
which in turn equals the absolute value of the determinant
$$\frac{1}{6}\begin{vmatrix} 1 & x_1 & y_1 & z_1 \\ 1 & x_2 & y_2 & z_2 \\ 1 & x_3 & y_3 & z_3 \\ 1 & x_4 & y_4 & z_4 \end{vmatrix}.$$

19. The points $A = (1, 1, 5)$, $B = (2, 2, 1)$, $C = (1, -2, 2)$ and $D = (-2, 1, 2)$ are the vertices of a tetrahedron. Find the equation of the line through $A$ perpendicular to the face $BCD$ and the distance of $A$ from this face. Also find the shortest distance between the skew lines $AD$ and $BC$.
[Ans: $P = (1 + t)(i + j + 5k)$; $2\sqrt{3}$; $3$.]

Chapter 9

FURTHER READING

Matrix theory has many applications to science, mathematics, economics and engineering. Some of these applications can be found in the books [2, 3, 4, 5, 11, 13, 16, 20, 26, 28].

For the numerical side of matrix theory, [6] is recommended. Its bibliography is also useful as a source of further references.

For applications to:

1. Graph theory, see [7, 13];
2. Coding theory, see [8, 15];
3. Game theory, see [13];
4. Statistics, see [9];
5. Economics, see [10];
6. Biological systems, see [12];
7. Markov non–negative matrices, see [11, 13, 14, 17];
8. The general equation of the second degree in three variables, see [18];
9. Affine and projective geometry, see [19, 21, 22];
10. Computer graphics, see [23, 24].

Bibliography

[1] B. Noble. Applied Linear Algebra, 1969. Prentice Hall, NJ.
[2] B. Noble and J.W. Daniel. Applied Linear Algebra, third edition, 1988. Prentice Hall, NJ.
[3] R.P. Yantis and R.J. Painter. Elementary Matrix Algebra with Application, second edition, 1977. Prindle, Weber and Schmidt, Inc., Boston, Massachusetts.
[4] T.J. Fletcher. Linear Algebra through its Applications, 1972. Van Nostrand Reinhold Company, New York.
[5] A.R. Magid. Applied Matrix Models, 1984. John Wiley and Sons, New York.
[6] D.R. Hill and C.B. Moler. Experiments in Computational Matrix Algebra, 1988. Random House, New York.
[7] N. Deo. Graph Theory with Applications to Engineering and Computer Science, 1974. Prentice–Hall, N.J.
[8] V. Pless. Introduction to the Theory of Error–Correcting Codes, 1982. John Wiley and Sons, New York.
[9] F.A. Graybill. Matrices with Applications in Statistics, 1983. Wadsworth, Belmont, Ca.
[10] A.C. Chiang. Fundamental Methods of Mathematical Economics, second edition, 1974. McGraw–Hill Book Company, New York.
[11] N.J. Pullman. Matrix Theory and its Applications, 1976. Marcel Dekker Inc., New York.
[12] J.M. Geramita and N.J. Pullman. An Introduction to the Application of Nonnegative Matrices to Biological Systems, 1984. Queen’s Papers in Pure and Applied Mathematics 68. Queen’s University, Kingston, Canada.
[13] M. Pearl. Matrix Theory and Finite Mathematics, 1973. McGraw–Hill Book Company, New York.
[14] J.G. Kemeny and J.L. Snell. Finite Markov Chains, 1967. Van Nostrand Reinhold, N.J.
[15] E.R. Berlekamp. Algebraic Coding Theory, 1968. McGraw–Hill Book Company, New York.
[16] G. Strang. Linear Algebra and its Applications, 1988. Harcourt Brace Jovanovich, San Diego.
[17] H. Minc. Nonnegative Matrices, 1988. John Wiley and Sons, New York.
[18] G.C. Preston and A.R. Lovaglia. Modern Analytic Geometry, 1971. Harper and Row, New York.
[19] J.A. Murtha and E.R. Willard. Linear Algebra and Geometry, 1969. Holt, Rinehart and Winston, Inc., New York.
[20] L.A. Pipes. Matrix Methods for Engineering, 1963. Prentice–Hall, Inc., N.J.
[21] D. Gans. Transformations and Geometries, 1969. Appleton–Century–Crofts, New York.
[22] J.N. Kapur. Transformation Geometry, 1976. Affiliated East–West Press, New Delhi.
[23] G.C. Reid. Postscript Language Tutorial and Cookbook, 1988. Addison–Wesley Publishing Company, New York.
[24] D. Hearn and M.P. Baker. Computer Graphics, 1989. Prentice–Hall, Inc., N.J.
[25] C.G. Cullen. Linear Algebra with Applications, 1988. Scott, Foresman and Company, Glenview, Illinois.
[26] R.E. Larson and B.H. Edwards. Elementary Linear Algebra, 1988. D.C. Heath and Company, Lexington, Massachusetts, Toronto.
[27] N. Magnenat–Thalmann and D. Thalmann. State–of–the–art in Computer Animation, 1989. Springer–Verlag, Tokyo.
[28] W.K. Nicholson. Elementary Linear Algebra, 1990. PWS–Kent, Boston.

Index

2 × 2 determinant, 71
algorithm, Gauss-Jordan, 8
angle between vectors, 166
asymptotes, 137
basis, left-to-right algorithm, 62
Cauchy-Schwarz inequality, 159
centroid, 185
column space, 56
complex number, 89
complex number, imaginary number, 90
complex number, imaginary part, 89
complex number, rationalization, 91
complex number, real, 89
complex number, real part, 89
complex numbers, Apollonius’ circle, 100
complex numbers, Argand diagram, 95
complex numbers, argument, 103
complex numbers, complex conjugate, 96
complex numbers, complex exponential, 107
complex numbers, complex plane, 95
complex numbers, cross-ratio, 114
complex numbers, De Moivre, 107
complex numbers, lower half plane, 95
complex numbers, modulus, 99
complex numbers, modulus-argument form, 103
complex numbers, polar representation, 103
complex numbers, ratio formulae, 100
complex numbers, square root, 92
complex numbers, upper half plane, 95
coordinate axes, 154
coordinate planes, 154
cosine rule, 166
determinant, 38
determinant, cofactor, 76
determinant, diagonal matrix, 74
determinant, Laplace expansion, 73
determinant, lower triangular, 74
determinant, minor, 72
determinant, recursive definition, 72
determinant, scalar matrix, 74
determinant, Surveyor’s formula, 85
determinant, upper triangular, 74
differential equations, 120
direction of a vector, 164
distance, 154
distance to a plane, 184
dot product, 131, 156
eigenvalue, 118
eigenvalues, characteristic equation, 118
eigenvector, 118
ellipse, 137
equation, linear, 1
equations, consistent system of, 1, 11
equations, Cramer’s rule, 39
equations, dependent unknowns, 11
equations, homogeneous system of, 16
equations, homogeneous, non–trivial solution, 16
equations, homogeneous, trivial solution, 16
equations, inconsistent system of, 1
equations, independent unknowns, 11
equations, system of linear, 1
factor theorem, 95
field, 3
field, additive inverse, 4
field, multiplicative inverse, 4
Gauss’ theorem, 95
hyperbola, 137
imaginary axis, 95
independence, left-to-right test, 59
inversion, 74
Joachimsthal, 163
least squares, 47
least squares, normal equations, 47
least squares, residuals, 47
length of a vector, 131, 157
linear combination, 17
linear dependence, 58
linear equations, Cramer’s rule, 84
linear transformation, 27
linearly independent, 41
mathematical induction, 31
matrices, row–equivalence of, 7
matrix, 23
matrix, addition, 23
matrix, additive inverse, 24
matrix, adjoint, 78
matrix, augmented, 2
matrix, coefficient, 26
matrix, coefficient, 2
matrix, diagonal, 49
matrix, elementary row, 41
matrix, elementary row operations, 7
matrix, equality, 23
matrix, Gram, 132
matrix, identity, 31
matrix, inverse, 36
matrix, invertible, 36
matrix, Markov, 53
matrix, non–singular, 36
matrix, non–singular diagonal, 49
matrix, orthogonal, 130
matrix, power, 31
matrix, product, 25
matrix, proper orthogonal, 130
matrix, reduced row–echelon form, 6
matrix, row-echelon form, 6
matrix, scalar multiple, 24
matrix, singular, 36
matrix, skew–symmetric, 46
matrix, subtraction, 24
matrix, symmetric, 46
matrix, transpose, 45
matrix, unit vectors, 28
matrix, zero, 24
modular addition, 4
modular multiplication, 4
normal form, 180
orthogonal matrix, 116
orthogonal vectors, 168
parabola, 137
parallel lines, 164
parallelogram law, 150
perpendicular vectors, 168
plane, 176
plane through 3 points, 176, 178
position vector, 156
positive octant, 154
projection on a line, 171
rank, 66
real axis, 95
recurrence relations, 32
reflection equations, 29
rotation equations, 28
row space, 56
scalar multiplication of vectors, 150
scalar triple product, 173
skew lines, 172
subspace, 55
subspace, basis, 61
subspace, dimension, 63
subspace, generated, 56
subspace, null space, 55
Three–dimensional space, 154
triangle inequality, 160
unit vectors, 158
vector cross-product, 172
vector equality, 149, 165
vector, column, 27
vector, of constants, 26
vector, of unknowns, 26
vectors, parallel vectors, 164