
Appendix A - Calculus

A.1 Overview

This appendix gives a very brief introduction to calculus with a focus on the tools needed in physics.

A.2 Functions of real numbers

In calculus, we work with functions and their properties, rather than with variables as we do in algebra. We are usually concerned with describing functions in terms of their slope, the areas (or volumes) that they enclose, their curvature, their roots (where they have a value of zero) and their continuity. The functions that we will examine are mappings from one or more independent real numbers to one real number. By convention, we will use $x$, $y$, $z$ to indicate independent variables, and $f()$ and $g()$ to denote functions. For example, if we say:

$$\begin{align*}
f(x) &= x^2\\
\therefore f(2) &= 4
\end{align*}$$

we mean that $f(x)$ is a function that can be evaluated for any real number, $x$, and the result of evaluating the function is to square the number $x$. In the second line, we evaluated the function with $x=2$. Similarly, we can have a function, $g(x,y)$, of multiple variables:

$$\begin{align*}
g(x,y)&=x^2+2y^2\\
\therefore g(2,3)&=22
\end{align*}$$

We can easily visualize a function of 1 variable by plotting it, as in Figure A.1.


Figure A.1: $f(x)=x^2$ plotted between $x=-5$ and $x=+5$.

Plotting a function of 2 variables is a little trickier, since we need to do it in three dimensions (one axis for $x$, one axis for $y$, and one axis for $g(x,y)$). Figure A.2 shows an example of plotting a function of 2 variables.


Figure A.2: $g(x,y)=x^2+2y^2$ plotted for $x$ between -5 and +5 and for $y$ between -5 and +5. A function of two variables can be visualized as a surface in three dimensions. One can also visualize the function by looking at its “contours” (the lines drawn in the $xy$ plane).

Unfortunately, it becomes difficult to visualize functions of more than 2 variables, although one can usually look at projections of those functions to try to visualize some of their features (for example, contour maps are 2D projections of 3D surfaces, as shown in the $xy$ plane of Figure A.2). When you encounter a function, it is good practice to try to visualize it if you can (a short plotting sketch is given after the list below). For example, ask yourself the following questions:

  • Does the function have one or more maxima and/or minima?
  • Does the function cross zero?
  • Is the function continuous everywhere?
  • Is the function always defined for any value of the independent variables?
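
To make this concrete, here is a minimal plotting sketch (assuming the standard numpy and matplotlib packages are available; the variable names are just illustrative) that reproduces curves like those in Figures A.1 and A.2:

```python
# Sketch: plot f(x) = x^2 as a curve, and g(x, y) = x^2 + 2y^2 as contours.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 200)

plt.figure()
plt.plot(x, x**2)                 # f(x) = x^2, as in Figure A.1
plt.xlabel("x")
plt.ylabel("f(x)")

X, Y = np.meshgrid(x, x)
plt.figure()
plt.contour(X, Y, X**2 + 2*Y**2)  # contours of g(x,y) = x^2 + 2y^2, as in Figure A.2
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```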

A.3 Derivatives

Consider the function $f(x)=x^2$ that is plotted in Figure A.1. For any value of $x$, we can define the slope of the function as the “steepness of the curve”. For values of $x>0$ the function increases as $x$ increases, so we say that the slope is positive. For values of $x<0$, the function decreases as $x$ increases, so we say that the slope is negative. A synonym for the word slope is “derivative”, which is the word that we prefer to use in calculus. The derivative of a function $f(x)$ is given the symbol $\frac{df}{dx}$ to indicate that we are referring to the slope of $f(x)$ when plotted as a function of $x$.

We need to specify which variable we are taking the derivative with respect to when the function has more than one variable but only one of them should be considered independent. For example, the function $f(x)=ax^2+b$ will have different values if $a$ and $b$ are changed, so we have to be precise in specifying that we are taking the derivative with respect to $x$. The following notations are equivalent ways to say that we are taking the derivative of $f(x)$ with respect to $x$:

$$\frac{df}{dx}=\frac{d}{dx} f(x) = f'(x) = f'$$

The notation with the prime ($f'(x)$, $f'$) can be useful to indicate that the derivative itself is also a function of $x$.

The slope (derivative) of a function tells us how rapidly the value of the function changes when the independent variable changes. For $f(x)=x^2$, as $x$ gets more and more positive, the function gets steeper and steeper; the derivative is thus increasing with $x$. The sign of the derivative tells us if the function is increasing or decreasing, whereas its absolute value tells us how quickly the function is changing (how steep it is).

We can approximate the derivative by evaluating how much $f(x)$ changes when $x$ changes by a small amount, say, $\Delta x$. In the limit of $\Delta x\to 0$, we get the derivative. In fact, this is the formal definition of the derivative:

$$\boxed{\frac{df}{dx}=\lim_{\Delta x\to 0}\frac{\Delta f}{\Delta x} =\lim_{\Delta x\to 0}\frac{f(x+\Delta x)-f(x)}{\Delta x}}$$

where $\Delta f$ is the small change in $f(x)$ that corresponds to the small change, $\Delta x$, in $x$. This makes the notation for the derivative clearer: $dx$ is $\Delta x$ in the limit where $\Delta x\to0$, and $df$ is $\Delta f$ in the same limit.
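
Before working through the algebra, it can help to see the limit numerically. The following short sketch (plain Python, with illustrative step sizes) evaluates the difference quotient for $f(x)=x^2$ at $x=3$ and shows it approaching $2x=6$ as $\Delta x$ shrinks:

```python
# Sketch: the difference quotient of f(x) = x^2 approaches 2x as dx -> 0.
def f(x):
    return x**2

x = 3.0
for dx in (1.0, 0.1, 0.001, 1e-6):
    print(dx, (f(x + dx) - f(x)) / dx)   # tends to 2*x = 6.0
```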

As an example, let us determine the function $f'(x)$ that is the derivative of $f(x)=x^2$. We start by calculating $\Delta f$:

$$\begin{align*}
\Delta f &= f(x+\Delta x)-f(x)\\
&=(x+\Delta x)^2 - x^2\\
&=x^2+2x\Delta x+\Delta x^2 -x^2\\
&=2x\Delta x+\Delta x^2
\end{align*}$$

We now calculate $\frac{\Delta f}{\Delta x}$:

$$\begin{align*}
\frac{\Delta f}{\Delta x}&=\frac{2x\Delta x+\Delta x^2}{\Delta x}\\
&=2x+\Delta x
\end{align*}$$

and take the limit $\Delta x\to 0$:

$$\begin{align*}
\frac{df}{dx}&=\lim_{\Delta x\to 0 }\frac{\Delta f}{\Delta x}\\
&=\lim_{\Delta x\to 0 }(2x+\Delta x)\\
&=2x
\end{align*}$$

We have thus found that the function $f'(x)=2x$ is the derivative of the function $f(x)=x^2$. This is illustrated in Figure A.3. Note that:

  • For $x>0$, $f'(x)$ is positive and increasing with increasing $x$, just as we described earlier (the function $f(x)$ is increasing and getting steeper).
  • For $x<0$, $f'(x)$ is negative and decreasing in magnitude as $x$ increases. Thus $f(x)$ decreases and gets less steep as $x$ increases.
  • At $x=0$, $f'(x)=0$, indicating that, at the origin, the function $f(x)$ is (momentarily) flat.

Figure A.3: $f(x)=x^2$ and its derivative, $f'(x)=2x$, plotted for $x$ between -5 and +5.

A.3.1 Common derivatives and properties

It is beyond the scope of this document to derive the functional form of the derivative for any function using equation (A.4). Table A.1 below gives the derivatives for common functions. In all cases, $x$ is the independent variable, and all other variables should be thought of as constants:

Table A.1: Common derivatives of functions.

Function, $f(x)$ | Derivative, $f'(x)$
$f(x)=a$ | $f'(x)=0$
$f(x)=x^n$ | $f'(x)=nx^{n-1}$
$f(x)=\sin(x)$ | $f'(x)=\cos(x)$
$f(x)=\cos(x)$ | $f'(x)=-\sin(x)$
$f(x)=\tan(x)$ | $f'(x)=\frac{1}{\cos^2(x)}$
$f(x)=e^x$ | $f'(x)=e^x$
$f(x)=\ln(x)$ | $f'(x)=\frac{1}{x}$

If two functions of 1 variable, $f(x)$ and $g(x)$, are combined into a third function, $h(x)$, then there are simple rules for finding the derivative, $h'(x)$, based on the derivatives $f'(x)$ and $g'(x)$. These are summarized in Table A.2 below.

Table A.2: Derivatives of combined functions.

Function, $h(x)$ | Derivative, $h'(x)$
$h(x)=f(x)+g(x)$ | $h'(x)=f'(x)+g'(x)$
$h(x)=f(x)-g(x)$ | $h'(x)=f'(x)-g'(x)$
$h(x)=f(x)g(x)$ | $h'(x)=f'(x)g(x)+f(x)g'(x)$ (the product rule)
$h(x)=\frac{f(x)}{g(x)}$ | $h'(x)=\frac{f'(x)g(x)-f(x)g'(x)}{g^2(x)}$ (the quotient rule)
$h(x)=f(g(x))$ | $h'(x)=f'(g(x))g'(x)$ (the chain rule)
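
As a quick check of the rules in Table A.2, one can use a computer algebra system. The sketch below assumes the sympy package is installed and verifies the product and chain rules on two example functions:

```python
# Sketch: symbolic check of the product rule and the chain rule with sympy.
import sympy as sp

x = sp.symbols('x')
f, g = sp.sin(x), x**2

product_rule = sp.diff(f, x) * g + f * sp.diff(g, x)
print(sp.simplify(sp.diff(f * g, x) - product_rule))        # 0, so the product rule holds

chain_rule = sp.cos(x**2) * 2*x                              # f'(g(x)) g'(x) for sin(x^2)
print(sp.simplify(sp.diff(sp.sin(x**2), x) - chain_rule))    # 0, so the chain rule holds
```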

A.3.2 Partial derivatives and gradients

So far, we have only looked at the derivative of a function of a single independent variable and used it to quantify how much the function changes when the independent variable changes. We can proceed analogously for a function of multiple variables, $f(x,y)$, by quantifying how much the function changes along the direction associated with a particular variable. This is illustrated in Figure A.4 for the function $f(x,y)=x^2-2y^2$, which looks somewhat like a saddle.


Figure A.4: $f(x,y)=x^2-2y^2$ plotted for $x$ between -5 and +5 and for $y$ between -5 and +5. The point $P$ labelled on the figure shows the value of the function at $f(-2,-2)$. The two lines show the function evaluated when one of $x$ or $y$ is held constant.

Suppose that we wish to determine the derivative of the function $f(x,y)$ at $x=-2$ and $y=-2$. In this case, it does not make sense to simply determine “the derivative”; rather, we must specify in which direction we want the derivative. That is, we need to specify in which direction we are interested in quantifying the rate of change of the function.

One possibility is to quantify the rate of change in the $x$ direction. The solid line in Figure A.4 shows the part of the function surface where $y$ is fixed at -2, that is, the function evaluated as $f(x,y=-2)$. The point $P$ on the figure shows the value of the function when $x=-2$ and $y=-2$. By looking at the solid line at point $P$, we can see that as $x$ increases, the value of the function is gently decreasing. The derivative of $f(x,y)$ with respect to $x$ when $y$ is held constant and evaluated at $x=-2$ and $y=-2$ is thus negative. Rather than saying “the derivative of $f(x,y)$ with respect to $x$ when $y$ is held constant”, we say “the partial derivative of $f(x,y)$ with respect to $x$”.

Since the partial derivative is different from the ordinary derivative (as it implies that we are holding independent variables fixed), we give it a different symbol: we use $\partial$ instead of $d$:

$$\frac{\partial f}{\partial x}=\frac{\partial}{\partial x}f(x,y) \quad \text{(partial derivative of $f$ with respect to $x$)}$$

Calculating the partial derivative is very easy, as we just treat all variables as constants except for the variable with respect to which we are differentiating[1]. For the function $f(x,y)=x^2-2y^2$, we have:

$$\begin{align*}
\frac{\partial f}{\partial x}&=\frac{\partial}{\partial x}\left(x^2-2y^2\right) = 2x\\
\frac{\partial f}{\partial y}&=\frac{\partial}{\partial y}\left(x^2-2y^2\right) = -4y
\end{align*}$$

At $x=-2$, the partial derivative of $f(x,y)$ with respect to $x$ is indeed negative, consistent with our observation that, along the solid line, at point $P$, the function is decreasing.

A function will have as many partial derivatives as it has independent variables. Also note that, just like a normal derivative, a partial derivative is still a function. The partial derivative with respect to a variable tells us how steep the function is in the direction in which that variable increases and whether it is increasing or decreasing.

Since the partial derivatives tell us how the function changes in a particular direction, we can use them to find the direction in which the function changes the most rapidly. For example, suppose that the surface from Figure A.4 corresponds to a real physical surface and that we place a ball at point $P$. We wish to know in which direction the ball will roll. The direction that it will roll in is the opposite of the direction in which $f(x,y)$ increases the most rapidly (i.e. it will roll in the direction in which $f(x,y)$ decreases the most rapidly). The direction in which the function increases the most rapidly is called the “gradient” and is denoted by $\nabla f(x,y)$.

Since the gradient is a direction, it cannot be represented by a single number. Rather, we use a “vector” to indicate this direction. Since $f(x,y)$ has two independent variables, the gradient will be a vector with two components. The components of the gradient are given by the partial derivatives:

$$\nabla f(x,y) = \frac{\partial f}{\partial x}\hat x+\frac{\partial f}{\partial y} \hat y$$

where $\hat x$ and $\hat y$ are the unit vectors in the $x$ and $y$ directions, respectively (sometimes, the unit vectors are denoted $\hat i$ and $\hat j$). The direction of the gradient tells us in which direction the function increases the fastest, and the magnitude of the gradient tells us how much the function increases in that direction.

The gradient is itself a function, but it is not a real-valued function (in the sense of evaluating to a real number), since it evaluates to a vector. It is a mapping from the real numbers $x$, $y$ to a vector. As you take more advanced calculus courses, you will eventually encounter “vector calculus”, which is just the calculus of the functions of multiple variables to which you were just introduced. The key point to remember here is that the gradient can be used to find the vector that points in the direction of maximal increase of the corresponding multi-variate function. This is precisely the quantity that we need in physics to determine in which direction a ball will roll when placed on a surface (it will roll in the direction opposite to the gradient vector).
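
As an illustration, the following sketch (plain Python, with an arbitrary small step $h$) estimates the partial derivatives and the gradient of $f(x,y)=x^2-2y^2$ at the point $P=(-2,-2)$ using finite differences, and compares them with the exact result $(2x,\,-4y)=(-4,\,8)$:

```python
# Sketch: numerical gradient of f(x,y) = x^2 - 2y^2 at P = (-2, -2).
def f(x, y):
    return x**2 - 2*y**2

def gradient(x, y, h=1e-6):
    dfdx = (f(x + h, y) - f(x, y)) / h   # partial derivative with respect to x
    dfdy = (f(x, y + h) - f(x, y)) / h   # partial derivative with respect to y
    return (dfdx, dfdy)

print(gradient(-2.0, -2.0))   # approximately (-4, 8); a ball at P rolls opposite to this
```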

A.3.3 Common uses of derivatives in physics

The simplest case of using a derivative is to describe the speed of an object. If an object covers a distance $\Delta x$ in a period of time $\Delta t$, its “average speed”, $v_{avg}$, is defined as the distance covered by the object divided by the amount of time it took to cover that distance:

$$v_{avg} = \frac{\Delta x}{\Delta t}$$

If the object changes speed (for example it is slowing down) over the distance $\Delta x$, we can still define its “instantaneous speed”, $v$, by measuring the amount of time, $\Delta t$, that it takes the object to cover a very small distance, $\Delta x$. The instantaneous speed is defined in the limit where $\Delta x \to 0$:

$$v = \lim_{\Delta x\to 0}\frac{\Delta x}{\Delta t}=\frac{dx}{dt}$$

which is precisely the derivative of $x(t)$ with respect to $t$. Here, $x(t)$ is a function that gives the position, $x$, of the object along some $x$ axis as a function of time. The speed of the object is thus the rate of change of its position.

Similarly, if the speed is changing with time, then we can define the “acceleration”, $a$, of an object as the rate of change of its speed:

$$a = \frac{dv}{dt}$$
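
For instance, with a hypothetical position function $x(t)=5t^2$ (so that $v=10t$ and $a=10$), speed and acceleration can be estimated numerically as successive derivatives. This is only a sketch with arbitrary step sizes:

```python
# Sketch: speed and acceleration as successive derivatives of position x(t) = 5 t^2.
def position(t):
    return 5.0 * t**2

def speed(t, dt=1e-6):
    return (position(t + dt) - position(t)) / dt   # v = dx/dt

def acceleration(t, dt=1e-3):
    return (speed(t + dt) - speed(t)) / dt          # a = dv/dt

print(speed(2.0), acceleration(2.0))   # roughly 20 and 10
```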

A.4 Anti-derivatives and integrals

In the previous section, we were concerned with determining the derivative of a function $f(x)$. The derivative is useful because it tells us how the function $f(x)$ varies as a function of $x$. In physics, we often know how a function varies, but we do not know the actual function. In other words, we often have the opposite problem: we are given the derivative of a function, and wish to determine the actual function. For this case, we will limit our discussion to functions of a single independent variable.

Suppose that we are given a function $f(x)$ and we know that this is the derivative of some other function, $F(x)$, which we do not know. We call $F(x)$ the anti-derivative of $f(x)$. The anti-derivative of a function $f(x)$, written $F(x)$, thus satisfies the property:

$$\frac{dF}{dx}=f(x)$$

Since we have a symbol for indicating that we take the derivative with respect to $x$ ($\frac{d}{dx}$), we also have a symbol, $\int dx$, for indicating that we take the anti-derivative with respect to $x$:

$$\begin{align*}
\int f(x) dx &= F(x) \\
\therefore \frac{d}{dx}\left(\int f(x) dx\right) &= \frac{dF}{dx}=f(x)
\end{align*}$$

Earlier, we justified the symbol for the derivative by pointing out that it is like $\frac{\Delta f}{\Delta x}$ but for the case when $\Delta x\to 0$. Similarly, we will justify the anti-derivative sign, $\int f(x) dx$, by showing that it is related to a sum of $f(x)\Delta x$ terms in the limit $\Delta x\to 0$. The $\int$ sign looks like an “S” for sum.

While it is possible to exactly determine the derivative of a function $f(x)$, the anti-derivative can only be determined up to a constant. Consider for example a different function, $\tilde F(x)=F(x)+C$, where $C$ is a constant. The derivative of $\tilde F(x)$ with respect to $x$ is given by:

$$\begin{align*}
\frac{d\tilde{F}}{dx}&=\frac{d}{dx}\left(F(x)+C\right)\\
&=\frac{dF}{dx}+\frac{dC}{dx}\\
&=\frac{dF}{dx}+0\\
&=f(x)
\end{align*}$$

Hence, the function $\tilde F(x)=F(x)+C$ is also an anti-derivative of $f(x)$. The constant $C$ can often be determined using additional information (sometimes called “initial conditions”). Recall the function $f(x)=x^2$, shown in Figure A.3 (left panel). If you imagine shifting the whole function up or down, the derivative would not change. In other words, if the origin of the axes were not drawn on the left panel, you would still be able to determine the derivative of the function (how steep it is). Adding a constant, $C$, to a function is exactly the same as shifting the function up or down, which does not change its derivative. Thus, when you know the derivative, you cannot know the value of $C$, unless you are also told that the function must go through a specific point (a so-called initial condition).

In order to determine the derivative of a function, we used equation (A.4). We now need to derive an equivalent prescription for determining the anti-derivative. Suppose that we have the two pieces of information required to determine $F(x)$ completely, namely:

  1. the function $f(x)=\frac{dF}{dx}$ (its derivative).
  2. the condition that $F(x)$ must pass through a specific point, $F(x_0)=F_0$.

Figure A.5: Determining the anti-derivative, $F(x)$, given the function $f(x)=2x$ and the initial condition that $F(x)$ passes through the point $(x_0,F_0)=(1,3)$.

The procedure for determining the anti-derivative $F(x)$ is illustrated above in Figure A.5. We start by drawing the point that we know the function $F(x)$ must go through, $(x_0,F_0)$. We then choose a value of $\Delta x$ and use the derivative, $f(x)$, to calculate $\Delta F_0$, the amount by which $F(x)$ changes when $x$ changes by $\Delta x$. Using the derivative $f(x)$ evaluated at $x_0$, we have:

$$\begin{align*}
\frac{\Delta F_0}{\Delta x} &\approx f(x_0) \quad (\text{in the limit } \Delta x\to 0)\\
\therefore \Delta F_0 &= f(x_0) \Delta x
\end{align*}$$

We can then estimate the value of the function $F_1=F(x_1)$ at the next point, $x_1=x_0+\Delta x$, as illustrated by the black arrow in Figure A.5:

$$\begin{align*}
F_1&=F(x_1)\\
&=F(x_0+\Delta x) \\
&\approx F_0 + \Delta F_0\\
&\approx F_0+f(x_0)\Delta x
\end{align*}$$

Now that we have determined the value of the function $F(x)$ at $x=x_1$, we can repeat the procedure to determine the value of the function $F(x)$ at the next point, $x_2=x_1+\Delta x$. Again, we use the derivative evaluated at $x_1$, $f(x_1)$, to determine $\Delta F_1$, and add that to $F_1$ to get $F_2=F(x_2)$, as illustrated by the grey arrow in Figure A.5:

$$\begin{align*}
F_2&=F(x_1+\Delta x) \\
&\approx F_1+\Delta F_1\\
&\approx F_1+f(x_1)\Delta x\\
&\approx F_0+f(x_0)\Delta x+f(x_1)\Delta x
\end{align*}$$

Using summation notation, we can generalize the result and write the function $F(x)$ evaluated at any point, $x_N=x_0+N\Delta x$:

$$F(x_N) \approx F_0+\sum_{i=1}^{i=N} f(x_{i-1}) \Delta x$$

The result above will become exactly correct in the limit $\Delta x\to 0$:

$$F(x_N) = F(x_0)+\lim_{\Delta x\to 0}\sum_{i=1}^{i=N} f(x_{i-1}) \Delta x$$
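
The stepping procedure described above is easy to carry out numerically. The sketch below (plain Python, with an illustrative step size) reconstructs $F(x)$ from $f(x)=2x$ and the initial condition $F(1)=3$ used in Figure A.5, stepping up to $x=4$:

```python
# Sketch: rebuild F(x) from its derivative f(x) = 2x and the condition F(1) = 3.
def f(x):
    return 2.0 * x

x, F = 1.0, 3.0            # the known point (x_0, F_0)
dx, N = 0.01, 300          # 300 steps of 0.01 take us from x = 1 to x = 4

for i in range(N):
    F += f(x) * dx         # F_{i+1} = F_i + f(x_i) * dx
    x += dx

print(x, F)                # close to x = 4 and F(4) = 3 + (4^2 - 1^2) = 18
```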

Let us take a closer look at the sum. Each term in the sum is of the form $f(x_{i-1})\Delta x$, and is illustrated in Figure A.6 for the same case as in Figure A.5 (that is, Figure A.6 shows $f(x)$, which we know, and Figure A.5 shows $F(x)$, which we are trying to find).


Figure A.6: The function $f(x)=2x$ and illustration of the terms $f(x_0)\Delta x$ and $f(x_1)\Delta x$ as the area between the curve $f(x)$ and the $x$ axis when $\Delta x\to 0$.

As you can see, each term in the sum corresponds to the area of a rectangle between the function $f(x)$ and the $x$ axis (with a piece missing). In the limit where $\Delta x\to 0$, the missing pieces (shown by the hashed areas in Figure A.6) will vanish and $f(x_i)\Delta x$ will become exactly the area between $f(x)$ and the $x$ axis over a length $\Delta x$. The sum of the rectangular areas will thus approach the area between $f(x)$ and the $x$ axis between $x_0$ and $x_N$:

$$\lim_{\Delta x\to 0}\sum_{i=1}^{i=N} f(x_{i-1}) \Delta x=\text{area between $f(x)$ and the $x$ axis from $x_0$ to $x_N$}$$

Re-arranging equation (A.29) gives us a prescription for determining the anti-derivative:

$$F(x_N) - F(x_0)=\lim_{\Delta x\to 0}\sum_{i=1}^{i=N} f(x_{i-1}) \Delta x$$

We see that if we determine the area between $f(x)$ and the $x$ axis from $x_0$ to $x_N$, we can obtain the difference between the anti-derivative evaluated at the two points, $F(x_N)-F(x_0)$.

The difference between the anti-derivative, $F(x)$, evaluated at two different values of $x$ is called the integral of $f(x)$ and has the following notation:

$$\boxed{\int_{x_0}^{x_N}f(x) dx=F(x_N) - F(x_0)=\lim_{\Delta x\to 0}\sum_{i=1}^{i=N} f(x_{i-1}) \Delta x}$$

As you can see, the integral has labels that specify the range over which we calculate the area between $f(x)$ and the $x$ axis. A common notation to express the difference $F(x_N) - F(x_0)$ is to use brackets:

$$\int_{x_0}^{x_N}f(x) dx=F(x_N) - F(x_0) =\big[ F(x) \big]_{x_0}^{x_N}$$

Recall that we wrote the anti-derivative with the same $\int$ symbol earlier:

$$\int f(x) dx = F(x)$$

The symbol $\int f(x) dx$ without the limits is called the indefinite integral. You can also see that when you take the (definite) integral (i.e. the difference between $F(x)$ evaluated at two points), any constant that is added to $F(x)$ will cancel. Physical quantities are always based on definite integrals, so when we write the constant $C$ it is primarily for completeness and to emphasize that we have an indefinite integral.

As an example, let us determine the integral of $f(x)=2x$ between $x=1$ and $x=4$, as well as the indefinite integral of $f(x)$, which is the case that we illustrated in Figures A.5 and A.6. Using equation (A.32), we have:

$$\begin{align*}
\int_{x_0}^{x_N}f(x) dx&=\lim_{\Delta x\to 0}\sum_{i=1}^{i=N} f(x_{i-1}) \Delta x \\
&=\lim_{\Delta x\to 0}\sum_{i=1}^{i=N} 2x_{i-1} \Delta x
\end{align*}$$

where we have:

$$\begin{align*}
x_0 &=1 \\
x_N &=4 \\
\Delta x &= \frac{x_N-x_0}{N}
\end{align*}$$

Note that $N$ is the number of times we have $\Delta x$ in the interval between $x_0$ and $x_N$. Thus, taking the limit of $\Delta x\to 0$ is the same as taking the limit $N\to\infty$. Let us illustrate the sum for the case where $N=3$, and thus when $\Delta x=1$, corresponding to the illustration in Figure A.6:

$$\begin{align*}
\sum_{i=1}^{i=N=3} 2x_{i-1} \Delta x &=2x_0\Delta x+2x_1\Delta x+2x_2\Delta x\\
&=2\Delta x (x_0+x_1+x_2) \\
&=2 \frac{x_3-x_0}{N}(x_0+x_1+x_2) \\
&=2 \frac{(4)-(1)}{(3)}(1+2+3) \\
&=12
\end{align*}$$

where in the second line, we noticed that we could factor out the $2\Delta x$ because it appears in each term. Since we only used 4 points, this is a pretty coarse approximation of the integral, and we expect it to be an underestimate (as the missing area represented by the hashed lines in Figure A.6 is quite large).

If we repeat this for a larger value of $N$, $N=6$ ($\Delta x = 0.5$), we should obtain a more accurate answer:

$$\begin{align*}
\sum_{i=1}^{i=6} 2x_{i-1} \Delta x &=2 \frac{x_6-x_0}{N}(x_0+x_1+x_2+x_3+x_4+x_5)\\
&=2\frac{4-1}{6} (1+1.5+2+2.5+3+3.5)\\
&=13.5
\end{align*}$$

Writing this out again for the general case so that we can take the limit $N\to\infty$, and factoring out the $2\Delta x$:

$$\begin{align*}
\sum_{i=1}^{i=N} 2x_{i-1} \Delta x &=2 \Delta x\sum_{i=1}^{i=N}x_{i-1}\\
&=2 \frac{x_N-x_0}{N}\sum_{i=1}^{i=N}x_{i-1}
\end{align*}$$

Now, consider the combination:

$$\frac{1}{N}\sum_{i=1}^{i=N}x_{i-1}$$

that appears above. This corresponds to the arithmetic average of the values from $x_0$ to $x_{N-1}$ (sum the values and divide by the number of values). In the limit where $N\to \infty$, the value $x_{N-1}\approx x_N$, so this becomes the average value of $x$ in the interval between $x_0$ and $x_N$, which is simply given by the value of $x$ at the midpoint of the interval:

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{i=N}x_{i-1}=\frac{1}{2}(x_N+x_0)$$

Putting everything together:

$$\begin{align*}
\lim_{N\to\infty}\sum_{i=1}^{i=N} 2x_{i-1} \Delta x &=2 (x_N-x_0)\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{i=N}x_{i-1}\\
&=2 (x_N-x_0)\frac{1}{2}(x_N+x_0)\\
&=x_N^2 - x_0^2\\
&=(4)^2 - (1)^2 = 15
\end{align*}$$

where in the last line, we substituted in the values of $x_0=1$ and $x_N=4$. Writing this as the integral:

$$\int_{x_0}^{x_N}2x dx=F(x_N) - F(x_0)=x_N^2 - x_0^2$$

we can immediately identify the anti-derivative and the indefinite integral:

$$\begin{align*}
F(x) &= x^2 +C \\
\int 2xdx&=x^2 +C
\end{align*}$$

This is of course the result that we expected, and we can check our answer by taking the derivative of $F(x)$:

$$\frac{dF}{dx}=\frac{d}{dx}(x^2+C) = 2x$$

We have thus confirmed that $F(x)=x^2+C$ is the anti-derivative of $f(x)=2x$.
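
The Riemann sums above are also easy to evaluate numerically. This sketch (plain Python, function and variable names are just illustrative) computes the left sum of $f(x)=2x$ on the interval from 1 to 4 for increasing $N$ and shows it approaching the exact value of 15:

```python
# Sketch: left Riemann sums of f(x) = 2x on [1, 4] converge to the integral, 15.
def left_sum(f, x0, xN, N):
    dx = (xN - x0) / N
    return sum(f(x0 + i * dx) * dx for i in range(N))   # uses x_0 ... x_{N-1}

for N in (3, 6, 100, 10_000):
    print(N, left_sum(lambda x: 2.0 * x, 1.0, 4.0, N))  # 12.0, 13.5, 14.91, 14.9991
```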

A.4.1 Common anti-derivatives and properties

Table A.3 below gives the anti-derivatives (indefinite integrals) for common functions. In all cases, $x$ is the independent variable, and all other variables should be thought of as constants:

Table A.3: Common indefinite integrals of functions.

Function, $f(x)$ | Anti-derivative, $F(x)$
$f(x)=a$ | $F(x)=ax+C$
$f(x)=x^n$ | $F(x)=\frac{1}{n+1}x^{n+1}+C$
$f(x)=\frac{1}{x}$ | $F(x)=\ln(|x|)+C$
$f(x)=\sin(x)$ | $F(x)=-\cos(x)+C$
$f(x)=\cos(x)$ | $F(x)=\sin(x)+C$
$f(x)=\tan(x)$ | $F(x)=-\ln\left(\left|\cos(x)\right|\right)+C$
$f(x)=e^x$ | $F(x)=e^x+C$
$f(x)=\ln(x)$ | $F(x)=x\ln(x)-x+C$

Note that, in general, it is much more difficult to obtain the anti-derivative of a function than it is to take its derivative. A few common properties to help evaluate indefinite integrals are shown in Table A.4 below.

Table A.4: Some properties of indefinite integrals.

Anti-derivative | Equivalent anti-derivative
$\int (f(x)+g(x)) dx$ | $\int f(x)dx+\int g(x) dx$ (sum)
$\int (f(x)-g(x)) dx$ | $\int f(x)dx-\int g(x) dx$ (subtraction)
$\int af(x) dx$ | $a\int f(x)dx$ (multiplication by constant)
$\int f'(x)g(x) dx$ | $f(x)g(x)-\int f(x)g'(x) dx$ (integration by parts)
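
As an example of the last row (integration by parts), the sketch below (assuming the sympy package is installed) checks the identity on $\int x e^x\,dx$:

```python
# Sketch: integration by parts, with f'(x) = e^x and g(x) = x.
import sympy as sp

x = sp.symbols('x')
lhs = sp.integrate(x * sp.exp(x), x)                  # the integral of g(x) f'(x)
rhs = x * sp.exp(x) - sp.integrate(sp.exp(x), x)      # f(x) g(x) - integral of f(x) g'(x)
print(sp.simplify(lhs - rhs))                         # 0, so the two forms agree
```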

A.4.2 Common uses of integrals in Physics - from a sum to an integral

Integrals are extremely useful in physics because they are related to sums. If we assume that our mathematician friends (or computers) can determine anti-derivatives for us, using integrals is not that complicated.

The key idea in physics is that integrals are a tool for easily performing sums. As we saw above, integrals correspond to the area underneath a curve, which is found by summing the (different) areas of an infinite number of infinitely small rectangles. In physics, it is often the case that we need to take the sum of an infinite number of small things that keep varying, just like the areas of the rectangles.

Consider, for example, a rod of length, $L$, and total mass $M$, as shown in Figure A.7. If the rod is uniform in density, then if we cut it into, say, two equal pieces, those two pieces will weigh the same. We can define a “linear mass density”, $\mu$, for the rod, as the mass per unit length of the rod:

$$\mu = \frac{M}{L}$$

The linear mass density has dimensions of mass over length and can be used to find the mass of any length of rod. For example, if the rod has a mass of $M=5\,{\rm kg}$ and a length of $L=2\,{\rm m}$, then the mass density is:

$$\mu=\frac{M}{L}=\frac{(5\,{\rm kg})}{(2\,{\rm m})}=2.5\,{\rm kg/m}$$

Knowing the mass density, we can now easily find the mass, $m$, of a piece of rod that has a length of, say, $l=10\,{\rm cm}$. Using the mass density, the mass of the $10\,{\rm cm}$ rod is given by:

$$m=\mu l=(2.5\,{\rm kg/m})(0.1\,{\rm m})=0.25\,{\rm kg}$$

Now suppose that we have a rod of length $L$ that is not uniform, as in Figure A.7, and that does not have a constant linear mass density. Perhaps the rod gets wider and wider, or it has holes in it that make it not uniform. Imagine that the mass density of the rod is instead given by a function, $\mu(x)$, that depends on the position along the rod, where $x$ is the distance measured from one side of the rod.


Figure A.7: A rod with a varying linear density. To calculate the mass of the rod, we consider a small mass element $\Delta m_i$ of length $\Delta x$ at position $x_i$. The total mass of the rod is found by summing the mass of the small mass elements.

Now, we cannot simply determine the mass of the rod by multiplying $\mu(x)$ and $L$, since we do not know which value of $x$ to use. In fact, we have to use all of the values of $x$, between $x=0$ and $x=L$.

The strategy is to divide the rod up into $N$ pieces of length $\Delta x$. If we label our pieces of rod with an index $i$, we can say that the piece that is at position $x_i$ has a tiny mass, $\Delta m_i$. We assume that $\Delta x$ is small enough so that $\mu(x)$ can be taken as constant over the length of that tiny piece of rod. Then, the tiny piece of rod at $x=x_i$ has a mass, $\Delta m_i$, given by:

$$\Delta m_i = \mu(x_i) \Delta x$$

where $\mu(x_i)$ is evaluated at the position, $x_i$, of our tiny piece of rod. The total mass, $M$, of the rod is then the sum of the masses of the tiny rods, in the limit where $\Delta x\to 0$:

$$\begin{align*}
M &= \lim_{\Delta x\to 0}\sum_{i=1}^{i=N}\Delta m_i \\
&= \lim_{\Delta x\to 0}\sum_{i=1}^{i=N} \mu(x_i) \Delta x
\end{align*}$$

But this is precisely the definition of the integral (equation (A.29)), which we can easily evaluate with an anti-derivative:

$$\begin{align*}
M &=\lim_{\Delta x\to 0}\sum_{i=1}^{i=N} \mu(x_i) \Delta x \\
&= \int_0^L \mu(x) dx \\
&= G(L) - G(0)
\end{align*}$$

where $G(x)$ is the anti-derivative of $\mu(x)$.

Suppose that the mass density is given by the function:

$$\mu(x)=ax^3$$

with anti-derivative (Table A.3):

$$G(x)=a\frac{1}{4}x^4 + C$$

Let $a=5\,{\rm kg/m^4}$ and let's say that the length of the rod is $L=0.5\,{\rm m}$. The total mass of the rod is then:

$$\begin{align*}
M&=\int_0^L \mu(x) dx \\
&=\int_0^L ax^3 dx \\
&= G(L)-G(0)\\
&=\left[ a\frac{1}{4}L^4 \right] - \left[ a\frac{1}{4}0^4 \right]\\
&=5\,{\rm kg/m^4}\,\frac{1}{4}(0.5\,{\rm m})^4 \\
&\approx 78\,{\rm g}
\end{align*}$$
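
The same answer can be obtained by carrying out the sum over mass elements numerically, which is a useful sanity check. The sketch below (plain Python, illustrative names) chops the rod into many small pieces of length $\Delta x$ and sums $\mu(x_i)\Delta x$:

```python
# Sketch: mass of the rod with mu(x) = a x^3, by summing small mass elements.
a, L = 5.0, 0.5            # a in kg/m^4, rod length L in m

def mu(x):
    return a * x**3        # linear mass density in kg/m

N = 100_000                # number of small pieces
dx = L / N
M = sum(mu(i * dx) * dx for i in range(N))
print(M)                   # about 0.078 kg, i.e. roughly 78 g, matching G(L) - G(0)
```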

With a little practice, you can solve this type of problem without writing out the sum explicitly. Picture an infinitesimal piece of the rod of length $dx$ at position $x$. It will have an infinitesimal mass, $dm$, given by:

$$dm = \mu(x) dx$$

The total mass of the rod is then the sum (i.e. the integral) of the mass elements:

$$M = \int dm$$

and we really can think of the $\int$ sign as a sum, when the things being summed are infinitesimally small. In the above equation, we still have not specified the range in $x$ over which we want to take the sum; that is, we need some sort of index for the mass elements to make this a meaningful definite integral. Since we already know how to express $dm$ in terms of $dx$, we can substitute our expression for $dm$ using the one with $dx$:

$$M = \int dm = \int_0^L \mu(x) dx$$

where we have made the integral definite by specifying the range over which to sum, since we can use $x$ to “label” the mass elements.

One should note that coming up with the above integral is physics. Solving it is math. We will worry much more about writing out the integral than evaluating its value. Evaluating the integral can always be done by a mathematician friend or a computer, but determining which integral to write down is the physicist’s job!

A.5 Summary

The derivative of a function, $f(x)$, with respect to $x$ can be written as:

$$\frac{d}{dx} f(x)=\frac{df}{dx}=f'(x)$$

and measures the rate of change of the function with respect to $x$. The derivative of a function is generally itself a function. The derivative is defined as:

$$f'(x) = \lim_{\Delta x \to 0}\frac{f(x+\Delta x)-f(x)}{\Delta x}$$

Graphically, the derivative of a function represents the slope of the function, and it is positive if the function is increasing, negative if the function is decreasing and zero if the function is flat. Derivatives can usually be determined analytically for the smooth functions that we encounter in physics.

A partial derivative measures the rate of change of a multi-variate function, $f(x,y)$, with respect to one of its independent variables. The partial derivative with respect to one of the variables is evaluated by taking the derivative of the function with respect to that variable while treating all other independent variables as if they were constant. The partial derivative of a function (with respect to $x$) is written as:

$$\frac{\partial f}{\partial x}$$

The gradient of a function, $\nabla f(x,y)$, is a vector in the direction in which that function is increasing most rapidly. It is given by:

$$\nabla f(x,y)=\frac{\partial f}{\partial x}\hat x + \frac{\partial f}{\partial y} \hat y$$

Given a function, $f(x)$, its anti-derivative with respect to $x$, $F(x)$, is written:

$$F(x) = \int f(x) dx$$

$F(x)$ is such that its derivative with respect to $x$ is $f(x)$:

$$\frac{dF}{dx}=f(x)$$

The anti-derivative of a function is only ever defined up to a constant, $C$. We usually write this as:

$$\int f(x) dx = F(x) + C$$

since the derivative of $F(x)+C$ will also be equal to $f(x)$. The anti-derivative is also called the “indefinite integral” of $f(x)$.

The definite integral of a function $f(x)$, between $x=a$ and $x=b$, is written:

$$\int_a^b f(x) dx$$

and is equal to the difference in the anti-derivative evaluated at $x=a$ and $x=b$:

$$\int_a^b f(x) dx = F(b) - F(a)$$

where the constant $C$ no longer matters, since it cancels out. Physical quantities only ever depend on definite integrals, since they must be determined without an arbitrary constant.

Definite integrals are very useful in physics because they are related to a sum. Given a function $f(x)$, one can relate the sum of terms of the form $f(x_{i-1})\Delta x$ over a range of values from $x=x_0$ to $x=x_N$ to the integral of $f(x)$ over that range:

$$\lim_{\Delta x\to 0}\sum_{i=1}^{i=N} f(x_{i-1}) \Delta x = \int_{x_0}^{x_N}f(x) dx=F(x_N) - F(x_0)$$

A.6 Thinking about the Material

A.7 Sample problems and solutions

A.7.1 Problems

A.7.2 Solutions