The Iterative Weighted Least Squares and the Fisher's Scoring methods are the two most commonly used iterative maximum likelihood optimization methods in generalized linear models. The Fisher's Scoring method is given in terms of the gradient vector, while the Iterative Weighted Least Squares method is based on the adjusted dependent vector. Using the relation between the expected Hessian matrix and the weighted sum of squares established for the quasi-likelihood function, and the link between the expected Hessian and the weighted sum of cross products, a proof of the theorem on the equivalence of the two quasi-Newton schemes is presented.
INTRODUCTION
The maximum likelihood estimator is an alternative to the minimum variance unbiased estimator (Scott and Nowak, 2006). In generalized linear models, parameter estimation is accomplished by an iterative maximum likelihood procedure. Generalized linear models extend the idea of linear regression to models with non-normal error distributions (Smyth, 2002). This is done (Allen, 1987) by replacing the objective function f(x) with the log likelihood function l(θ, y).
Stokes et al. (1975) and McCullagh and Nelder (1992) used the logit, defined as the logarithm of the ratio between the probability of success and the probability of failure, to demonstrate the concept of a link function in generalized linear models. Based on this, the weight function of the Iterative Weighted Least Squares method is defined.
Definitions: Let β^(k) be the estimate of the parameter vector β at iteration k; then the Fisher's Scoring method is given as

$$\beta^{(k+1)} = \beta^{(k)} + \left[-E(H)\right]^{-1} g$$

where,

H = the Hessian matrix and g is the gradient vector:

$$H = \frac{\partial^2 l}{\partial\beta_r\,\partial\beta_s}, \qquad g = \frac{\partial l}{\partial\beta}$$
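For concreteness, the minimal sketch below (not part of the original derivation) implements one Fisher's Scoring step for a logistic model, where the expected information −E(H) = X'WX replaces the observed Hessian; the function name, the toy data and the choice of logit link are assumptions made purely for illustration.

```python
import numpy as np

def fisher_scoring_step(beta, X, y):
    """One Fisher's Scoring update, beta + [-E(H)]^{-1} g,
    for a logistic model, where -E(H) = X'WX."""
    eta = X @ beta                        # systematic component
    mu = 1.0 / (1.0 + np.exp(-eta))       # inverse logit link
    g = X.T @ (y - mu)                    # gradient (score) vector
    W = np.diag(mu * (1.0 - mu))          # weight matrix
    return beta + np.linalg.solve(X.T @ W @ X, g)

# toy data, assumed purely for illustration
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = (rng.random(50) < 0.5).astype(float)

beta = np.zeros(2)
for _ in range(10):
    beta = fisher_scoring_step(beta, X, y)
print(beta)   # converged maximum likelihood estimate
```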
The Iterative Weighted Least Squares method is a maximum likelihood estimation method for generalized linear models. The solution is given as follows:

$$\beta^{(k+1)} = (X'WX)^{-1}X'Wz$$

where,

z = the adjusted dependent vector,

$$z = \eta + (y - \mu)\frac{d\eta}{d\mu}$$

and η = Xβ is the systematic component of the model; X is the design matrix. Wedderburn (1974) stated the theorem on the equivalence of the Fisher's Scoring method and the Iterative Weighted Least Squares method and showed that
$$-E\left(\frac{\partial^2 K}{\partial\beta_r\,\partial\beta_s}\right) = E\left(\frac{\partial K}{\partial\beta_r}\frac{\partial K}{\partial\beta_s}\right) = \sum_i \frac{1}{V(\mu_i)}\frac{\partial\mu_i}{\partial\beta_r}\frac{\partial\mu_i}{\partial\beta_s}$$

where,

K = the quasi-likelihood function, having properties similar to those of the log likelihood function.
McCullagh and Nelder (1992), using the log likelihood function l and the adjusted dependent variate, established that

$$\frac{\partial l}{\partial\beta_r} = \sum_i W_i (y_i - \mu_i)\frac{d\eta_i}{d\mu_i}\,x_{ir}$$

where:

$$W_i^{-1} = \left(\frac{d\eta_i}{d\mu_i}\right)^2 V(\mu_i)$$
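A corresponding sketch of one Iterative Weighted Least Squares step, again assuming a logistic model with the logit link (so that W_i = μ_i(1 − μ_i)) and using illustrative names of my own, may be written as follows.

```python
import numpy as np

def iwls_step(beta, X, y):
    """One IWLS update: solve (X'WX) beta_new = X'Wz, where
    z = eta + (y - mu) * deta/dmu is the adjusted dependent variate
    and W_i^{-1} = (deta_i/dmu_i)^2 V(mu_i)."""
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))       # inverse logit link
    deta_dmu = 1.0 / (mu * (1.0 - mu))    # derivative of the logit link
    W = np.diag(mu * (1.0 - mu))          # here W_i = mu_i (1 - mu_i)
    z = eta + (y - mu) * deta_dmu         # adjusted dependent variate
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ z)
```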
These facts are used to present a formal proof of the theorem. The result established by Wedderburn (1974) is given as Theorem 1 below, together with its proof.
MATERIALS AND METHODS
Theorem 1: Let y_i (i = 1,…,n) be independent observations with expectations μ_i and variances V(μ_i). Let K(y_i, μ_i) be the quasi-likelihood function of the observation y_i and suppose that μ is expressed as a function of parameters β_1,…,β_m. Then
$$-E\left(\frac{\partial^2 K}{\partial\beta_r\,\partial\beta_s}\right) = E\left(\frac{\partial K}{\partial\beta_r}\frac{\partial K}{\partial\beta_s}\right) = \sum_{i=1}^{n}\frac{1}{V(\mu_i)}\frac{\partial\mu_i}{\partial\beta_r}\frac{\partial\mu_i}{\partial\beta_s}$$
Proof: Note that, for a single observation,

$$\frac{\partial K}{\partial\beta_r} = \frac{y-\mu}{V(\mu)}\frac{\partial\mu}{\partial\beta_r}$$

so that

$$E\left(\frac{\partial K}{\partial\beta_r}\frac{\partial K}{\partial\beta_s}\right) = \frac{E(y-\mu)^2}{V(\mu)^2}\frac{\partial\mu}{\partial\beta_r}\frac{\partial\mu}{\partial\beta_s} = \frac{1}{V(\mu)}\frac{\partial\mu}{\partial\beta_r}\frac{\partial\mu}{\partial\beta_s}$$

since V(μ) = var(y). Also, we have

$$\frac{\partial^2 K}{\partial\beta_r\,\partial\beta_s} = -\frac{1}{V(\mu)}\frac{\partial\mu}{\partial\beta_r}\frac{\partial\mu}{\partial\beta_s} + (y-\mu)\frac{\partial}{\partial\beta_s}\left[\frac{1}{V(\mu)}\frac{\partial\mu}{\partial\beta_r}\right]$$

and, on taking expectations, the second term vanishes because E(y − μ) = 0, so that

$$-E\left(\frac{\partial^2 K}{\partial\beta_r\,\partial\beta_s}\right) = \frac{1}{V(\mu)}\frac{\partial\mu}{\partial\beta_r}\frac{\partial\mu}{\partial\beta_s}$$

Summing over the n independent observations completes the proof.
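The identity of Theorem 1 can also be checked numerically. The sketch below, a Monte Carlo illustration rather than part of the proof, assumes a single parameter and a Poisson-type variance function V(μ) = μ with μ = exp(βx); all names and values are chosen only for the example.

```python
import numpy as np

# Monte Carlo check of Theorem 1 for a single observation with
# variance function V(mu) = mu and mean mu = exp(beta * x).
rng = np.random.default_rng(1)
beta, x = 0.7, 1.3
mu = np.exp(beta * x)
dmu_dbeta = x * mu                       # dmu/dbeta

y = rng.poisson(mu, size=500_000)
score = (y - mu) / mu * dmu_dbeta        # dK/dbeta = (y - mu)/V(mu) * dmu/dbeta

lhs = np.mean(score ** 2)                # estimate of E[(dK/dbeta)^2]
rhs = dmu_dbeta ** 2 / mu                # (1/V(mu)) (dmu/dbeta)^2
print(lhs, rhs)                          # agree up to simulation error
```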
The quasi-likelihood function and the log likelihood function have similar properties. For this reason, we consider the expectation of the Hessian matrix defined on the log likelihood function.
The log likelihood function and Fisher's information
The log likelihood for a binary response variable can be written as:

$$l = \sum_{i=1}^{n}\left[y_i\log\mu_i + (1-y_i)\log(1-\mu_i)\right]$$

which becomes

$$l = \sum_{i=1}^{n}\left[y_i\log\frac{\mu_i}{1-\mu_i} + \log(1-\mu_i)\right] = \sum_{i=1}^{n}\left[y_i\eta_i - \log(1+e^{\eta_i})\right]$$

with the logit link η_i = log[μ_i/(1 − μ_i)].
From this we obtain

$$\frac{\partial l}{\partial\beta_r} = \sum_{i=1}^{n}(y_i-\mu_i)x_{ir}$$

since

$$\frac{\partial l}{\partial\eta_i} = y_i - \mu_i, \qquad \eta_i = \sum_s x_{is}\beta_s, \qquad \frac{\partial\mu_i}{\partial\eta_i} = \mu_i(1-\mu_i)$$

so that

$$\frac{\partial^2 l}{\partial\beta_r\,\partial\beta_s} = -\sum_{i=1}^{n}\frac{\partial\mu_i}{\partial\eta_i}x_{ir}x_{is} = -\sum_{i=1}^{n}\mu_i(1-\mu_i)x_{ir}x_{is}$$
The Fisher information for β is given (Silvey, 1970) as −E(∂²l/∂β_r∂β_s):

$$-E\left(\frac{\partial^2 l}{\partial\beta_r\,\partial\beta_s}\right) = \sum_{i=1}^{n}\mu_i(1-\mu_i)x_{ir}x_{is} = (X'WX)_{rs}$$

where:

$$W = \operatorname{diag}\{\mu_i(1-\mu_i)\}$$

Therefore, −E(H) = X'WX, the Fisher information.
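The identity −E(H) = X'WX can be verified numerically for the binary log likelihood; with the canonical logit link the Hessian does not depend on y, so a finite-difference Hessian of l should match −X'WX directly. The sketch below uses assumed toy data purely for illustration.

```python
import numpy as np

def loglik(beta, X, y):
    """Binary log likelihood: sum of y*eta - log(1 + exp(eta))."""
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(40), rng.normal(size=40)])
y = (rng.random(40) < 0.5).astype(float)
beta = np.array([0.2, -0.4])

# Hessian of l by central second differences
h, p = 1e-4, beta.size
I = np.eye(p)
H = np.empty((p, p))
for r in range(p):
    for s in range(p):
        H[r, s] = (loglik(beta + h*I[r] + h*I[s], X, y)
                   - loglik(beta + h*I[r] - h*I[s], X, y)
                   - loglik(beta - h*I[r] + h*I[s], X, y)
                   + loglik(beta - h*I[r] - h*I[s], X, y)) / (4 * h * h)

mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
W = np.diag(mu * (1.0 - mu))
print(np.allclose(-H, X.T @ W @ X, atol=1e-4))   # True: -E(H) = X'WX
```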
The Iterative Weighted Least Squares method is derivable from the Fisher's Scoring method, as shown by the following theorem and proof.
Theorem 2 (the main theorem): The Iterative Weighted Least Squares and the Fisher's Scoring methods are equivalent optimization schemes in generalized linear models.
Proof: Let the adjusted dependent variate be

$$z = \eta + (y - \mu)\frac{d\eta}{d\mu}$$

where η = Xβ is the systematic component of the model. Let the gradient vector be g = ∂l/∂β and let A = −E(∂²l/∂β_r∂β_s). Let

$$\bar{H} = E\left(\frac{\partial^2 l}{\partial\beta_r\,\partial\beta_s}\right) = -A$$
The replacement of ∂²l/∂β_r∂β_s with E(∂²l/∂β_r∂β_s) in the Newton-Raphson method yields the Fisher's Scoring method.
From the Newton-Raphson update

$$\beta^{(k+1)} = \beta^{(k)} - H^{-1}g$$

we hence obtain the Fisher's Scoring update

$$\beta^{(k+1)} = \beta^{(k)} - \bar{H}^{-1}g = \beta^{(k)} + A^{-1}g$$
For a single observation with constant dispersion, the dispersion function a(φ) disappears, giving

$$\frac{\partial l}{\partial\beta_r} = \frac{(y-\mu)}{V(\mu)}\frac{d\mu}{d\eta}\,x_r = W(y-\mu)\frac{d\eta}{d\mu}\,x_r$$
Taking all the observations together,

$$g = \frac{\partial l}{\partial\beta} = X'W(y-\mu)\frac{d\eta}{d\mu}$$

where x_i = (x_{i1},…,x_{im}) denotes the ith row of X and the summation runs over all n individual observations. The components of g are

$$g_r = \sum_{i=1}^{n} W_i(y_i-\mu_i)\frac{d\eta_i}{d\mu_i}\,x_{ir}$$
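These components can be computed two equivalent ways, componentwise as above or as g = X'W(z − η). The sketch below, again assuming a logistic model and toy data of my own choosing, confirms the two agree.

```python
import numpy as np

# Components of g computed two equivalent ways for a logistic model:
# g_r = sum_i W_i (y_i - mu_i) (deta/dmu)_i x_ir   and   g = X'W(z - eta).
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = (rng.random(30) < 0.5).astype(float)
beta = np.array([0.1, 0.5])

eta = X @ beta
mu = 1.0 / (1.0 + np.exp(-eta))
deta_dmu = 1.0 / (mu * (1.0 - mu))
w = mu * (1.0 - mu)                          # diagonal of W
z = eta + (y - mu) * deta_dmu                # adjusted dependent variate

g_direct = X.T @ (w * (y - mu) * deta_dmu)   # componentwise definition
g_via_z = X.T @ (w * (z - eta))              # g = X'W(z - eta)
print(np.allclose(g_direct, g_via_z))        # True
```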
The new estimate is β^(k+1) = β^(k) + δβ = β^(k) + A^(−1)g, so that Aβ^(k+1) = Aβ^(k) + Aδβ = Aβ^(k) + g. Since A = X'WX, the component r of Aβ^(k+1) is given as:

$$(A\beta^{(k+1)})_r = \sum_s\sum_i W_i x_{ir}x_{is}\beta_s^{(k)} + \sum_i W_i(y_i-\mu_i)\frac{d\eta_i}{d\mu_i}\,x_{ir}$$

and the adjusted component based on z is given as

$$(X'Wz)_r = \sum_i W_i x_{ir}z_i = \sum_i W_i x_{ir}\left[\eta_i + (y_i-\mu_i)\frac{d\eta_i}{d\mu_i}\right]$$

but

$$\eta_i = \sum_s x_{is}\beta_s^{(k)}$$

Hence, the two expressions coincide, that is,

$$X'WX\,\beta^{(k+1)} = X'Wz \qquad\Longrightarrow\qquad \beta^{(k+1)} = (X'WX)^{-1}X'Wz$$
which is the Iterative Weighted Least Squares update. Thus, we have shown that the Fisher's Scoring algorithm is the same as the Iterative Weighted Least Squares algorithm.
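A numerical illustration of Theorem 2, under the same assumed logistic setup used in the earlier sketches, shows the two updates producing identical iterates, up to machine precision, from a common starting value.

```python
import numpy as np

def inv_logit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def scoring_update(b, X, y):
    """Fisher's Scoring: b + (X'WX)^{-1} g."""
    mu = inv_logit(X @ b)
    W = np.diag(mu * (1.0 - mu))
    return b + np.linalg.solve(X.T @ W @ X, X.T @ (y - mu))

def iwls_update(b, X, y):
    """IWLS: (X'WX)^{-1} X'Wz with the adjusted dependent variate z."""
    eta = X @ b
    mu = inv_logit(eta)
    W = np.diag(mu * (1.0 - mu))
    z = eta + (y - mu) / (mu * (1.0 - mu))
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ z)

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(60), rng.normal(size=60)])
y = (rng.random(60) < inv_logit(0.5 - X[:, 1])).astype(float)

b_fs, b_ls = np.zeros(2), np.zeros(2)
for k in range(8):
    b_fs, b_ls = scoring_update(b_fs, X, y), iwls_update(b_ls, X, y)
    print(k + 1, np.max(np.abs(b_fs - b_ls)))   # ~0 at every iteration
```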
RESULTS AND DISCUSSION
The exploration of alternative estimation schemes in generalized linear models arises from the complexities associated with the computation of the Hessian matrix in the Newton-Raphson method. Each element of the Hessian matrix involves a weight matrix, both partial and ordinary differential operators and the systematic component of the model. The quasi-Newton methods avoid the direct use of the Hessian matrix by considering its expected value. The proof that both methods are equivalent rests on the fact that the expected value of the Hessian matrix is E(H) = −X'WX and that the gradient vector g is the product of the expected Hessian matrix and the discrepancy between the current and previous quasi-Newton updates. The log likelihood function for a binary response variable has been used first to establish that the expected Hessian matrix used in the Fisher's Scoring method is actually the Fisher information matrix X'WX used in the Iterative Weighted Least Squares method. The gradient vector, or score function, is a weighted differential operator of the systematic component of the model. Computational ease remains the guiding factor in the choice of either method.
CONCLUSION
Parameter estimates of generalized linear models can be obtained using the Fisher's Scoring method or the Iterative Weighted Least Squares method. The Fisher's Scoring method uses the gradient vector, while the Iterative Weighted Least Squares method uses the adjusted dependent variate. These differences notwithstanding, both methods yield the same solutions. The ease of computation of the gradient vector g and of the adjusted dependent variate becomes the deciding factor as to which method to adopt in any given situation.
ACKNOWLEDGEMENTS
I acknowledge the contributions of Prof. T.A. Bamiduro, who introduced me to Generalized Linear Models. The research of R.W.M. Wedderburn on Quasi-likelihood functions and P. McCullagh and J.A. Nelder on Generalized Linear Models have been very useful in the development of this study.