
Journal of Modern Mathematics and Statistics

ISSN (Print): 1994-5388

On the Equivalence of Two Quasi-Newton Schemes in Generalized Linear Models

Mbe Egom Nja
Pages: 25-28


Abstract

The Iterative Weighted Least Squares and the Fisher's Scoring methods are the two most commonly used iterative maximum likelihood optimization methods in generalized linear models. The Fisher's Scoring method is given in terms of the gradient vector, while the Iterative Weighted Least Squares method is based on the adjusted dependent vector. Using the relation between the expected Hessian matrix and the weighted sum of squares established for the quasi-likelihood function, together with the link between the expected Hessian and the weighted sum of cross products, a proof of the theorem on the equivalence of the two quasi-Newton schemes is presented.


INTRODUCTION

The maximum likelihood estimator is an alternative to the minimum variance unbiased estimator (Scott and Nowak, 2006). In generalized linear models, parameter estimation is accomplished by an iterative maximum likelihood procedure. Generalized linear models extend the idea of nonlinear regression to models with non-normal error distributions (Smyth, 2002). This is done (Allen, 1987) by replacing the objective function f(x) with the log likelihood function l(θ, y).

Stokes et al. (1975) and McCullagh and Nelder (1992) used the logit, defined as the logarithm of the ratio of the probability of success to the probability of failure, to demonstrate the concept of a link function in generalized linear models. Based on this, the weight function of the Iterative Weighted Least Squares method is defined.
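For concreteness, the following worked instance (a standard GLM fact, supplied here for illustration rather than taken from the original) shows how the logit link determines the weight function:

```latex
% Illustrative worked instance: logit link with mean pi and
% variance function V(pi) = pi(1 - pi).
\[
  \eta = \log\frac{\pi}{1-\pi}, \qquad
  \frac{d\eta}{d\pi} = \frac{1}{\pi(1-\pi)}, \qquad
  W^{-1} = \left(\frac{d\eta}{d\pi}\right)^{2} V(\pi) = \frac{1}{\pi(1-\pi)},
\]
% so the weight for the logit model is W = pi(1 - pi).
```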

Definitions: Let β^(k) be the estimate of the parameter vector β at iteration k. Then the Fisher's Scoring method is given as

β^(k+1) = β^(k) - [E(H)]^(-1) g

where H is the Hessian matrix and g is the gradient vector.

The Iterative Weighted Least Squares method is a maximum likelihood estimation method for generalized linear models. The solution is given as follows:

β^(k+1) = (X'WX)^(-1) X'Wz

where z is the adjusted dependent vector, W is the weight matrix, η = Xβ is the systematic component of the model and X is the design matrix. Wedderburn (1974) stated the theorem on the equivalence of the Fisher's Scoring method and the Iterative Weighted Least Squares method and showed that

E(∂²K/∂β_r∂β_s) = -Σ_i (1/V(μ_i)) (∂μ_i/∂β_r)(∂μ_i/∂β_s)

where K is the quasi-likelihood function, which has properties similar to those of the log likelihood function.
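Read side by side, the two updates are easy to compare in code. The following is a minimal sketch (not part of the original paper) of one step of each scheme for the logistic model; the function names and the use of numpy are illustrative assumptions, not the author's implementation.

```python
import numpy as np

def fisher_scoring_step(beta, X, y):
    """One Fisher's Scoring step for a logistic model (illustrative sketch).

    Uses the gradient g = X'(y - mu) and A = -E(H) = X'WX,
    so the update is beta + A^(-1) g.
    """
    eta = X @ beta                      # systematic component eta = X beta
    mu = 1.0 / (1.0 + np.exp(-eta))     # inverse logit
    W = mu * (1.0 - mu)                 # diagonal of the weight matrix
    g = X.T @ (y - mu)                  # gradient (score) vector
    A = X.T @ (W[:, None] * X)          # A = X'WX
    return beta + np.linalg.solve(A, g)

def iwls_step(beta, X, y):
    """One IWLS step: weighted least squares of the adjusted dependent vector z on X."""
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    W = mu * (1.0 - mu)
    z = eta + (y - mu) / W              # z = eta + (y - mu) d(eta)/d(mu)
    return np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
```

For the logit link, d(eta)/d(mu) = 1/[mu(1 - mu)], which is why the residual is divided by W when forming z in the sketch.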

McCullagh and Nelder (1992), using the log likelihood function l and the adjusted component based on z, established that

(Aβ^(k+1))_r = Σ_i W_i x_ir z_i

where A = -E(H) = X'WX and z_i = η_i + (y_i - μ_i)(dη_i/dμ_i) is the adjusted dependent variate.

These facts are used to present a formal proof of the main theorem. Theorem 1 and its proof are due to Wedderburn (1974).

MATERIALS AND METHODS

Theorem 1: Let y_i (i = 1,...,n) be independent observations with expectations μ_i and variances V(μ_i). Let K(y_i, μ_i) be the quasi-likelihood function of the observation y_i and suppose that μ is expressed as a function of parameters β_1,...,β_m. Then

E(∂²K/∂β_r∂β_s) = -E(∂K/∂β_r · ∂K/∂β_s) = -Σ_i (1/V(μ_i)) (∂μ_i/∂β_r)(∂μ_i/∂β_s)

Proof: Note that, for a single observation, ∂K/∂μ = (y - μ)/V(μ), so that

E(∂K/∂β_r · ∂K/∂β_s) = E[(y - μ)²]/V(μ)² · (∂μ/∂β_r)(∂μ/∂β_s) = (1/V(μ)) (∂μ/∂β_r)(∂μ/∂β_s)

since V(μ) = var(y).

Also, we have

∂²K/∂β_r∂β_s = (y - μ) ∂/∂β_s[(1/V(μ)) ∂μ/∂β_r] - (1/V(μ)) (∂μ/∂β_r)(∂μ/∂β_s)

whose first term has expectation zero because E(y - μ) = 0, so that E(∂²K/∂β_r∂β_s) = -E(∂K/∂β_r · ∂K/∂β_s). Summing over the n independent observations completes the proof.
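As a concrete instance of the theorem (illustrative, not from the paper), take the Poisson-type variance function V(μ) = μ with the log link μ_i = e^(x_i'β):

```latex
% Worked instance of Theorem 1 (illustrative): V(mu) = mu, log link.
\[
  \mu_i = e^{x_i'\beta} \;\Rightarrow\;
  \frac{\partial \mu_i}{\partial \beta_r} = \mu_i x_{ir},
\]
\[
  -E\!\left(\frac{\partial^2 K}{\partial \beta_r\,\partial \beta_s}\right)
  = \sum_i \frac{1}{V(\mu_i)}
      \frac{\partial \mu_i}{\partial \beta_r}
      \frac{\partial \mu_i}{\partial \beta_s}
  = \sum_i \frac{(\mu_i x_{ir})(\mu_i x_{is})}{\mu_i}
  = \sum_i \mu_i\, x_{ir} x_{is},
\]
% which is (X'WX)_{rs} with W = diag(mu_i), the familiar log-linear weight.
```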

The quasi-likelihood function and the log likelihood function have similar properties. For this reason, we consider the expectation of the Hessian matrix defined on the log likelihood function.

The log likelihood function and Fisher's information

The log likelihood for a binary response variable can be written as:

l = Σ_i [y_i log π_i + (1 - y_i) log(1 - π_i)]

which becomes, under the logit link η_i = log[π_i/(1 - π_i)] = x_i'β,

l = Σ_i [y_i η_i - log(1 + e^η_i)]

From this,

∂l/∂β_r = Σ_i (y_i - π_i) x_ir

since ∂η_i/∂β_r = x_ir and π_i = e^η_i/(1 + e^η_i), so that

∂²l/∂β_r∂β_s = -Σ_i π_i(1 - π_i) x_ir x_is

The Fisher information for β is given (Silvey, 1970) as -E(∂²l/∂β_r∂β_s):

-E(∂²l/∂β_r∂β_s) = Σ_i π_i(1 - π_i) x_ir x_is = (X'WX)_rs

where W = diag[π_i(1 - π_i)] is the weight matrix. Therefore, -E(H) = X'WX = Fisher's information.
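This identity is easy to verify numerically. The sketch below is illustrative and not from the paper (it assumes numpy, and uses the fact that for the canonical logit link the observed and expected Hessians coincide); it compares a finite-difference Hessian of l with -X'WX:

```python
import numpy as np

def loglik(beta, X, y):
    """Binary log likelihood with logit link: sum_i [y_i eta_i - log(1 + e^eta_i)]."""
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))

def num_hessian(f, b, h=1e-4):
    """Central-difference Hessian of f at b (generic numerical check)."""
    m = b.size
    H = np.zeros((m, m))
    I = np.eye(m)
    for r in range(m):
        for s in range(m):
            H[r, s] = (f(b + h*I[r] + h*I[s]) - f(b + h*I[r] - h*I[s])
                       - f(b - h*I[r] + h*I[s]) + f(b - h*I[r] - h*I[s])) / (4*h*h)
    return H

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta = np.array([0.5, -0.3, 0.2])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta))).astype(float)

pi = 1 / (1 + np.exp(-X @ beta))
XtWX = X.T @ ((pi * (1 - pi))[:, None] * X)       # X'WX, W = diag(pi_i(1 - pi_i))
H = num_hessian(lambda b: loglik(b, X, y), beta)
print(np.allclose(-H, XtWX, atol=1e-4))           # True: -E(H) = X'WX
```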

The Iterative Weighted Least Squares method is derivable from the Fisher's Scoring method, as the following theorem and proof show.

Theorem 2 (the main theorem): The Iterative Weighted Least Squares and the Fisher's Scoring methods are equivalent optimization schemes in generalized linear models.

Proof: Let the adjusted dependent variate be

z = η + (y - μ) dη/dμ

where η = Xβ is the systematic component of the model.

Let the gradient vector g = ∂l/∂β and A = -E(∂²l/∂β_r∂β_s).

Let the weight matrix W be defined by

W^(-1) = (dη/dμ)² V(μ)

The replacement of ∂²l/∂β_r∂β_s with E(∂²l/∂β_r∂β_s) in the Newton-Raphson method yields the Fisher's Scoring method.

From the Newton-Raphson update

β^(k+1) = β^(k) - H^(-1) g

Hence, with H replaced by E(H) = -A,

β^(k+1) = β^(k) + A^(-1) g

For a single observation with constant dispersion, α(φ) disappears, giving the score

∂l/∂β_r = [(y - μ)/V(μ)] (dμ/dη) x_r = W (y - μ) (dη/dμ) x_r

Taking all the observations together, with x_i = (x_i1,...,x_im)' the i-th row of the design matrix (summation over all n individual observations), the components of g are

g_r = Σ_i W_i (y_i - μ_i) (dη_i/dμ_i) x_ir

The new estimate is β^(k+1) = β^(k) + δβ = β^(k) + A^(-1) g, so that Aβ^(k+1) = Aβ^(k) + Aδβ = Aβ^(k) + g. The component r of Aβ^(k) is given as:

(Aβ^(k))_r = Σ_s Σ_i W_i x_ir x_is β_s^(k) = Σ_i W_i x_ir η_i

and the adjusted component based on z is given as

(Aβ^(k+1))_r = Σ_i W_i x_ir η_i + Σ_i W_i (y_i - μ_i)(dη_i/dμ_i) x_ir = Σ_i W_i x_ir z_i

but

z_i = η_i + (y_i - μ_i) dη_i/dμ_i

Hence,

X'WX β^(k+1) = X'Wz, i.e., β^(k+1) = (X'WX)^(-1) X'Wz

which is the Iterative Weighted Least Squares update. Thus, we have shown that the Fisher's Scoring algorithm is the same as the Iterative Weighted Least Squares algorithm.
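A short numerical experiment confirms the theorem; the following is an illustrative sketch under the same logistic-model assumptions as the earlier one, not part of the original paper. Both schemes are run from the same starting value, and the iterates coincide at every step.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(60), rng.normal(size=(60, 2))])
p_true = 1 / (1 + np.exp(-(0.4 - 0.8 * X[:, 1] + 0.3 * X[:, 2])))
y = rng.binomial(1, p_true).astype(float)

b_fs = np.zeros(3)   # Fisher's Scoring iterate
b_ls = np.zeros(3)   # IWLS iterate
for _ in range(8):
    # Fisher's Scoring: beta + A^(-1) g with A = X'WX, g = X'(y - mu)
    mu = 1 / (1 + np.exp(-(X @ b_fs)))
    W = mu * (1 - mu)
    b_fs = b_fs + np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - mu))

    # IWLS: (X'WX)^(-1) X'Wz with z = eta + (y - mu) d(eta)/d(mu)
    eta = X @ b_ls
    mu = 1 / (1 + np.exp(-eta))
    W = mu * (1 - mu)
    z = eta + (y - mu) / W
    b_ls = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))

print(np.allclose(b_fs, b_ls))   # True: identical iterates, as the theorem asserts
```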

RESULTS AND DISCUSSION

The exploration of alternative estimation schemes in generalized linear models arises from the complexities associated with computing the Hessian matrix in the Newton-Raphson method. Each entry of the Hessian matrix involves a weight matrix, both partial and ordinary differential operators and the systematic component of the model. The quasi-Newton methods avoid the direct use of the Hessian matrix by considering its expected value. The proof that both methods are equivalent rests on the fact that the expected value of the Hessian matrix is E(H) = -X'WX and that the gradient vector g is the product of the matrix A = -E(H) and the discrepancy between the current and previous quasi-Newton updates. The log likelihood function for a binary response variable has been used first to establish that the expected Hessian matrix used in the Fisher's Scoring method is actually the Fisher's information matrix, X'WX, used in the Iterative Weighted Least Squares method. The gradient vector, or score function, is a weighted differential operator applied to the systematic component of the model. Computational ease remains the guiding factor in the choice of either method.

CONCLUSION

Parameter estimates of generalized linear models can be obtained using the Fisher's Scoring method or the Iterative Weighted Least Squares method. The Fisher's Scoring method uses the gradient vector, while the Iterative Weighted Least Squares method uses the adjusted dependent variate. These differences notwithstanding, both methods yield the same solutions. The ease of computation of the gradient vector g or of the adjusted dependent variate becomes the deciding factor as to which method to adopt in any given situation.

ACKNOWLEDGEMENTS

I acknowledge the contributions of Prof. T.A. Bamiduro, who introduced me to Generalized Linear Models. The research of R.W.M. Wedderburn on Quasi-likelihood functions and P. McCullagh and J.A. Nelder on Generalized Linear Models have been very useful in the development of this study.

How to cite this article:

Mbe Egom Nja. On the Equivalence of Two Quasi-Newton Schemes in Generalized Linear Models.
DOI: https://doi.org/10.36478/jmmstat.2009.25.28
URL: https://www.makhillpublications.co/view-article/1994-5388/jmmstat.2009.25.28