Tuesday, September 9, 2008

Line of Best Fit For Points in Three Dimensional Space

Theory:
(from : http://mathforum.org/library/drmath/view/69103.html)
The line you are looking for is called the 3D Orthogonal Distance Regression (ODR) line.  This line can be computed without writing a
lot of computer code if you already have code for the Eigen decomposition or the simgular value decomposition (SVD).

Let's parameterize the line like this,

x = x0 + at
y = y0 + bt
z = z0 + ct

where (x0,y0,z0) is a point on the line and (a,b,c) is a unit vector.

We want to minimize the sum of squared distances from the points to the line. The vector equation for the sum leads to

f(x0,y0,z0,a,b,c) = SUM{[c(yi - y0) - b(z - z0)]^2 +
[a(zi - z0) - c(x - x0)]^2 +
[b(xi - x0) - a(y - y0)]^2}

If we take the first derivatives with respect to x0, y0, and z0 and set the results equal to zero we get equations that can be manipulated
to yield

(x0-xbar)/a = (y0-ybar)/b = (z0-zbar)/c

where (xbar,ybar,zbar) is the centroid of the data. Notice that (xbar,ybar,zbar) is a valid choice for (x0,y0,z0). This proves that
the centroid is on the ODR line, and so we choose it for (x0,y0,z0).

Now we just need to find vector (a,b,c).

We need a little notation to make the next steps easier. Let C be the centroid, let L be the ODR line, and let P be the plane through C such
L is perpendicular to P. We are doing this in order to make use of the ODR plane analysis, which is closely related.

By the Pythagorean theorem, for a set of points Xi,

SUM[distance(Xi,L)^2] = SUM[distance(Xi,C)^2] -
SUM[distance(Xi,P)^2]

We want to choose (a,b,c) so that SUM[distance(Xi,L)^2] is minimized. This is the same as maximizing SUM[distance(Xi,P)^2]. (Notice that
SUM[distance(Xi,C)^2] is a constant).

The ODR plane analysis can be found here:

Orthogonal Distance Regression Planes
http://mathforum.org/library/drmath/view/63765.html

The ODR plane analysis uses matrices A and M. For the ODR plane we want the eigen vector of A that corresponds to its smallest
eigenvalue. This is the same as the singular vector of M that corresponds to its smallest singular value. For the ODR line we
instead want the eigen vector of A that corresponds to its largest eigenvalue, or the singular vector of M that corresponds to its
largest singular value. This maximizes the desired sum. The decomposition of A or M gives us the vectors for both the ODR plane
and the ODR line at the same time.

To summarize, the 3D ODR line contains the centroid, and has as its
direction vector the eigen vector (or singular vector) referred to above



No comments: