This study proposes a new learning algorithm for extracting independent source signals from an artificially mixed signal. An adaptive self-normalized radial basis function (ASN-RBF) neural network is developed and trained by the proposed learning algorithm to model the nonlinearity from the latent variables to the observations. The joint probability density function and the marginal probability density functions are used to determine the inverse of the nonlinear mixing matrix, which is assumed to exist and to be approximable. The centers of the ASN-RBF network are initialized with the weights between the input and hidden layers to update the parameters in the generative model. The proposed algorithm is well suited for nonlinear data analysis problems and is theoretically interesting. A minimum of three signals is considered for simulation. Simulation results show the feasibility of the proposed algorithm. The performance of the proposed network is compared with that of the Independent Component Analysis (ICA) algorithm and is illustrated with computer-simulated experiments.
INTRODUCTION
In many signal and data analysis situations, observed data are known to be some mixture of underlying sources. The mixing process may be linear or nonlinear and, while the structure of the mixing process may be known, the mixture parameters (in the linear case, the mixing matrix) are unknown. Blind Source Separation (BSS) is a technique that allows separating a number of source signals from observed mixtures of those sources without prior knowledge of the mixing process (Cichocki and Amari, 2002). For example, if there are many speakers in a room, each microphone receives a different mixture of the speaker signals. The task is then to separate the original (unmixed) speaker signals from the mixtures received at the microphones. These techniques have attracted considerable attention in recent years, with an increasing number of approaches available. So far, several authors have studied the difficult problem of nonlinear blind source separation and proposed a few efficient demixing algorithms (Burel, 1992; Deco and Brauer, 1995; Pajunen, 1998; Hyvarinen and Pajunen, 1999). Model-free methods based on Kohonen's Self-Organizing Map (SOM) have been proposed to extract independent sources from a nonlinear mixture, but they suffer from exponential growth of network complexity and from interpolation error in recovering continuous sources (Herrmann and Yang, 1996; Pajunen et al., 1996; Lin and Grier, 1997). A nonlinear blind source separation algorithm has been proposed using two-layer perceptrons trained by gradient descent to minimize the mutual information (Burel, 1992). Subsequently, a backpropagation algorithm was developed for Burel's model using the natural gradient method (Yang et al., 1997); in their model, cross nonlinearities are included. An entropy-based direct algorithm has been proposed for blind source separation in post-nonlinear mixtures (Taleb et al., 1995). In addition, the extension of related linear ICA theories to the context of nonlinear mixtures has resulted in the development of nonlinear ICA (Pajunen et al., 1996). The idea of nonlinear ICA is to employ a nonlinear function to transform the nonlinear mixture such that the outputs become statistically independent after the transformation. However, this transform is not unique without specific constraints on the nonlinear mixing function: if x and y are two independent random variables, then f(x) and g(y) are also statistically independent regardless of the nonlinear functions f and g. Although many difficulties exist for this problem, several nonlinear ICA algorithms have been proposed and developed (Pajunen et al., 1996; Lee et al., 1997). The existence and uniqueness of nonlinear ICA have been discussed in detail (Herrmann and Yang, 1996; Pajunen et al., 1996; Pajunen, 1996, 1999; Lin et al., 1997; Pajunen, 1998), and it has been pointed out that a solution of nonlinear ICA always exists (Hyvarinen and Pajunen, 1999). The solution becomes unique up to a rotation provided that the mixing function is constrained to a conformal mapping for a 2-dimensional problem, together with some other assumptions such as bounded support of the probability density function (pdf) (Herault and Jutten, 1986, 1991; Li and Sejnowski, 1995; Makeig and Bell, 1996; Papadias and Paulraj, 1997; Van der Veen et al., 1997; Linde et al., 1980). Several authors have also introduced classes of adaptive algorithms for source separation (Cardoso and Laheld, 1996).
A contrast function, which consists of the mutual information and partial moments of the outputs of the separation system, has been defined to separate nonlinear mixtures (Ying et al., 2001).
In this study, we propose a novel learning algorithm to extract independent components from a mixture signal. The ASN-RBF neural network is implemented in MATLAB and trained by the proposed algorithm to minimize the objective function, which is the difference between the joint probability density function (pdf) and the product of the marginal pdfs of the output vectors. Training continues until a stable value of the weight vector is obtained.
FORMULATION OF THE PROBLEM
Blind Source Separation (BSS) is a technique that allows separating a number of source signals from observed mixtures of those sources without prior knowledge of the mixing process (Comon et al., 1991; Lin et al., 1997). Suppose that two audio signals are generated simultaneously by two sources placed at two different locations and that two observers (e.g., microphones) receive mixtures of these two signals. Each observed signal is a weighted sum of the two source signals; we denote the sources as m1 and m2, respectively. The two observed signals can then be represented as:
O_1 = w_{11} m_1 + w_{12} m_2        (1)

O_2 = w_{21} m_1 + w_{22} m_2        (2)
where, w11, w12, w21 and w22 are parameters whose values depend on the locations of the two sources. These values can be represented as a linear mixing matrix W, as given in Eq. 3.
The observed signals can be represented in vector form as given in Eq. 4:
W = [w_{11}  w_{12}; w_{21}  w_{22}]        (3)

O = W m        (4)
where, O = [O_1, O_2]^T is the vector of observed signals and m = [m_1, m_2]^T is the vector of source signals.
Now, the blind source separation problem can be defined as the estimation of the two original source signals m1 and m2 from the observed signals O1 and O2 alone. This problem is also called the cocktail-party problem, as mentioned by many researchers.
If we knew the mixing matrix W, we could easily separate the two source signals by computing the inverse of the mixing matrix, W^{-1}, which is also called the demixing matrix:
m = W^{-1} O        (5)
The problem is considerably more difficult in practice because the mixing matrix W is not known a priori. "Blind" means that we know very little, if anything, about the mixing matrix and make few assumptions about the source signals.
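As a minimal numerical illustration of Eq. 4 and 5 (the 2x2 matrix values below are hypothetical, chosen only for this sketch), the following MATLAB fragment mixes two sources and recovers them exactly when W is known:

% Minimal sketch (hypothetical mixing matrix): mixing and exact recovery when W is known
t  = 0:0.001:0.999;                  % 1000 time samples
m1 = sin(2*pi*5*t);                  % source 1: 5 Hz sine
m2 = sign(sin(2*pi*3*t));            % source 2: square-like wave
M  = [m1; m2];                       % 2 x 1000 matrix of sources
W  = [0.6 0.4; 0.3 0.7];             % assumed mixing matrix (unknown in the blind setting)
O  = W*M;                            % observed mixtures O1 (row 1) and O2 (row 2)
M_hat = W\O;                         % demixing with the known inverse of W
disp(max(abs(M_hat(:) - M(:))))      % numerically zero: perfect recovery

In the blind setting W is unavailable, so the separating matrix must instead be learned from the observations alone.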
The block diagram of the mixing and demixing networks is shown in Fig. 1. The three signals M1, M2 and M3 (which are assumed to be independent) are mixed by a random mixing network and observed as inputs to the microphones. The microphone outputs are given as inputs to the ASN-RBF neural network. The weights are updated by the stochastic gradient descent algorithm and training continues until there is no change in the weight values in consecutive iterations.
This problem can be solved by the classical method, Independent Component Analysis (ICA), which assumes that the two sources are statistically independent of each other and non-Gaussian. ICA was originally developed to deal with problems closely related to the cocktail-party problem. Since the recent increase of interest in ICA, it has become clear that this principle has many other interesting applications as well.
Fig. 1: Block diagram of the mixing and demixing networks
Fig. 2: Block diagram of the design method used to solve the BSS problem
Without loss of generality, we can assume that both the mixture variables and the independent components have zero mean. If this is not true, then the observed variables can always be centered by subtracting the sample mean, which makes the model zero-mean (Hyvärinen, 1999).
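A generic centering step (not taken from the paper's code; the matrix O is assumed to hold one observed mixture per row) might look like:

% Centering sketch: subtract the sample mean of each observed mixture (row-wise)
O_centered = O - mean(O, 2)*ones(1, size(O, 2));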
The ICA can solve the BSS problem under the following assumptions (Comon, 1994):

• The sources m_i are statistically independent.
• The sources must have non-Gaussian distributions.

However, by using ICA, the variances and the order of the independent sources cannot be determined.
As shown in Fig. 2, it is necessary to design a suitable neural network for the blind source separation problem. In our research, we have developed the ASN-RBF neural network, since it exhibits fast training, simplicity and good generalization. To extract independent components from the mixture signal, an appropriate objective function is chosen and minimized by the unsupervised learning algorithm.
STOCHASTIC GRADIENT DESCENT ALGORITHM (SGDA)
To separate independent components from the observed signal, we require an objective function. The objective function is chosen such that it yields the original signals when it is minimized (Yogesh and Rai, 2002). In signal processing, when the components of the output vector become independent, its joint probability density function factorizes into the product of its marginal pdfs, as given in Eq. 6:
f(m) = \prod_{i=1}^{n} f_i(m_i)        (6)
where, m_i is the ith component of the output signal. The pdf of m parameterized by W can be written as given in Eq. 7 (Papoulis, 1991):
f(m, W) = |J| f_O(O)        (7)
where, |J| is the determinant of the Jacobian matrix J, which is defined as:
J = [\partial O_1/\partial m_1  \partial O_1/\partial m_2; \partial O_2/\partial m_1  \partial O_2/\partial m_2]        (8)
From Eq. 4, each element in Eq. 8 can be represented in terms of w as:
\partial O_i/\partial m_j = w_{ij}        (9)
Therefore, Eq. 8 can now be written as:
J = [w_{11}  w_{12}; w_{21}  w_{22}] = W        (10)
Now Eq. 7 can be written as:
f(m, W) = |det W| f_O(O)        (11)
To extract independent components from the observed signal, the difference between the joint pdf and the product of the marginal pdfs has to be determined. When the components become independent, this difference becomes zero. It can be represented as:
(12)
Since the logarithm provides computational simplicity, taking the logarithm on both sides of Eq. 12 gives:
(13)
Substituting the value of f(m, W) from Eq. 11 into Eq. 13, we get:
(14)
Because the pdf of the input vector is independent of the parameter vector W, the objective function for optimization becomes:
(15)
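As a hedged sketch only, the following MATLAB fragment assumes the objective takes the familiar maximum-likelihood form C(W) = -log|det W| - Σ_i E[log f_i(m_i)] and substitutes a logistic density for the marginal pdfs in place of the Edgeworth expansion used in the paper; W and O are illustrative variable names.

% Hedged sketch of an empirical independence objective (assumed form, logistic marginals)
% W: candidate demixing matrix (n x n), O: centered observations (n x T)
Y    = W*O;                                    % candidate output components
logp = -2*log(cosh(Y/2)) - log(4);             % log of the assumed logistic marginal density
C    = -log(abs(det(W))) - mean(sum(logp, 1)); % value to be driven down during training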
Now, the Edgeworth series is used to expand the second term in Eq. 15. The first three terms of this expansion are:
(16)
Here, the random variables O_j have mean μ, variance σ^2 and higher-order cumulants k_r = σ^r λ_r. Ψ^{(j)}(m) is the jth derivative of Ψ(m) with respect to m. Cumulants can also be expressed in terms of moments; the rth-order cumulants are expressed in terms of moments as follows:
(17)
(18)
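For zero-mean data the low-order cumulants reduce to simple combinations of moments (κ_2 = μ_2, κ_3 = μ_3, κ_4 = μ_4 - 3μ_2^2); a small estimation sketch (variable names are illustrative) is:

% Sketch: estimating low-order cumulants of one observed channel from sample moments
o   = o - mean(o);         % enforce zero mean
mu2 = mean(o.^2);          % second moment (variance)
mu3 = mean(o.^3);          % third moment
mu4 = mean(o.^4);          % fourth moment
k3  = mu3;                 % third cumulant (zero-mean case)
k4  = mu4 - 3*mu2^2;       % fourth cumulant (zero-mean case)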
After simplification, the gradient of Eq. 15 becomes:
(19)
The stochastic gradient descent algorithm for the weight update can now be written as:
w(t+1) = w(t) - η \partial C/\partial w        (20)
Substituting the gradient of the cost from Eq. 19, the weight update rule can now be written as:
(21)
The advantage of the Edgeworth series is that the error is controlled, so that it is a true asymptotic expansion.
Algorithm description
Step 1: Initialize the parameters.

• Assign weights between the input and hidden layers.
• Assign weights between the hidden and output layers.
• Set η = 0.99, σ = 0.09.
Step 2: Apply the input.
Step 3: Compute the output.
Step 4: Update the weights between the hidden and output layers:

d1 = inv(det(hou_old));                  % reciprocal of the determinant of hou_old (previous hidden-to-output weights)
for k = 1:output_neurons
    % d2 for output neuron k is computed here (original expression not reproduced)
end
deloutput = d1 - (d2*mixtures');
w(t+1) = w(t) + lrp*deloutput(k) + 0.05; % lrp is the learning rate parameter
Step 5: Evaluate O(t)=W(t)*x(t).
Fig. 3: Architecture of the adaptive self-normalized radial basis function neural network
Step 6: Repeat Steps 2-5 until a stable value of w(t) is obtained.
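The overall loop of Steps 2-6 can be sketched as below. The sketch substitutes the well-known natural-gradient ICA update for the Edgeworth-based gradient of Eq. 19; it is an assumption used only as a placeholder, not the paper's exact rule, and all variable names are illustrative.

% Skeleton of the iterative weight update (Steps 2-6); gradient term is a stand-in only
n   = size(O, 1);                         % number of mixtures / outputs
W   = eye(n) + 0.01*randn(n);             % Step 1: initialize demixing weights
eta = 0.1;                                % learning rate (adapted over time in the paper)
for it = 1:500
    Y  = W*O;                             % Steps 2-3 and 5: apply input, compute outputs
    g  = tanh(Y);                         % assumed score nonlinearity
    dW = (eye(n) - (g*Y')/size(O, 2))*W;  % placeholder for the gradient of Eq. 19
    W_new = W + eta*dW;                   % Step 4: update weights
    if norm(W_new - W, 'fro') < 1e-6      % Step 6: stop when the weights stabilize
        W = W_new;  break
    end
    W = W_new;
end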
ASN-RBF neural architecture: To choose an architecture for a given problem, the generalization error has to be minimized and, to obtain quantitative measures for it, we have to consider the characteristics of the network such as the number of layers, the number of nodes in the hidden layer and the connectivity. A priori information about the problem may be included here. The radial basis function neural network is proposed for the BSS problem since it exhibits rapid training, simplicity and generality. In recent years, there has been increasing interest in using radial basis function neural networks for many problems. Like backpropagation and counterpropagation neural networks, it is a feedforward neural network capable of representing a nonlinear relationship between the input and output vector spaces. The network consists of three layers: an input layer, a single layer of nonlinear processing hidden neurons and an output layer.
The ASN-RBF neural network architecture is shown in Fig. 3. The 1000 samples from the input signals are given as inputs to the input layer. The input layer behaves as a fan-in/fan-out layer since it performs no computation, i.e., it does not process the inputs. The outputs from the input layer are given as inputs to the hidden layer.
The output of the ASN-RBF neural network is obtained from Eq. 22:
y_i(x) = \sum_{k=1}^{N} w_{ik} \Phi_k(x)        (22)
for i = 1, 2, 3, ..., m

where, x ∈ R^{n+1} is an input vector and Φ_k(·) is a radial basis function given by e^{-D_i^2/(2σ)^2}, where D_i^2 = (X - W_ik)^T (X - W_ik), σ is the spread factor, which controls the width of the radial basis function, w_ik are the weights in the output layer, N is the number of neurons in the hidden layer and c_k ∈ R^{n×1} are the RBF centers in the input vector space. For each neuron in the hidden layer, the Euclidean distance between its associated center and the input to the network is computed (Haykin, 1994). The output of a hidden neuron is a nonlinear function of this distance. Finally, the output of the network is computed as a weighted sum of the hidden-layer outputs, scaled as Output_of_outputn(b) = input_to_outputn(b)/α, where α is the scaling parameter, which determines the convergence of the learning algorithm. During training, if α is very low, the total error becomes NaN; it is increased gradually so that, for a particular value, the network converges and the error is reduced to an acceptable value.
The centers c_k are fixed points that are assumed to perform an adequate sampling of the input vector space. They are usually chosen as a subset of the input data.
The weight vector W_ik determines the value of X that produces the maximum output from the neuron. The response at other values of X drops quickly as X deviates from W_ik, becoming negligible when X is far from W_ik.
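A forward pass through the hidden and output layers described above can be sketched as follows (C, Wout, sigma and alpha are assumed to be given; the names are illustrative, not taken from the paper's code):

% Sketch of one ASN-RBF forward pass for a single input vector x (n x 1)
N   = size(C, 2);                      % number of hidden neurons (centers stored column-wise in C)
phi = zeros(N, 1);
for k = 1:N
    d2     = sum((x - C(:,k)).^2);     % squared Euclidean distance to center k
    phi(k) = exp(-d2/(2*sigma)^2);     % Gaussian radial basis response
end
y = (Wout*phi)/alpha;                  % weighted sum of hidden outputs, scaled by alpha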
PERFORMANCE MEASURES AND EXPERIMENTAL RESULTS
To analyze the performance of an algorithm, we must answer the following questions:

• When has something been learned?
• When is the network good?
• How long does learning take?
Learning means fitting a model to a set of training data, such that for a given set of input patterns the desired output patterns are reproduced (Poggio and Girosi, 1989).

Learning criterion: When we have binary output units, we can define that an example has been learned if the correct output has been produced by the network. In general, we need a global error measure E and we define a critical error Ec. The condition is not only that the global error at every output node should not exceed a certain value; all patterns should also have been learned to a certain extent. If the output units are continuous-valued, e.g. within the interval [0, 1], then we might define anything <0.4 as 0 and anything >0.6 as 1; whatever lies between 0.4 and 0.6 is considered incorrect. In this way, a certain tolerance is possible. We also have to distinguish between performance on the training set and on the test set, so we need a quantitative measure of generalization ability.
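For continuous outputs in [0, 1], the tolerance rule above can be checked as in this small sketch (y_out and target are hypothetical variables holding the network outputs and the desired binary outputs for one pattern):

% Sketch of the tolerance-based learning criterion for outputs in [0, 1]
is_one  = (y_out > 0.6) & (target == 1);   % output counted as a correct 1
is_zero = (y_out < 0.4) & (target == 0);   % output counted as a correct 0
learned = all(is_one | is_zero);           % pattern learned only if every output is correct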
The performance of the proposed algorithm can be analyzed by the parameters given below.
Convergence: The algorithm minimizes the objective function. Minimization of the objective function can be visualized using error surfaces: the metaphor is that the system moves on the error surface towards a local minimum. The error surface, or landscape, is typically visualized by plotting the error as a function of two weights. Error surfaces represent a visualization of part of the search space, i.e., the space in which the weights are optimized. Weight spaces are typically high-dimensional, so what is visualized is the error corresponding to just two weights. The error function depends not only on the data to be learned but also on the activations.
Convergence with the stochastic gradient algorithm is typically faster than with the backpropagation algorithm, and there is a large literature on further improvements. In order to increase the convergence rate, the learning rate parameter η can also be changed over time:
(23)
where, c and d are parameters whose values are chosen between 0 and 1. There are various ways in which the learning rate can be adapted; Newton, steepest descent, conjugate gradient and quasi-Newton methods are all alternatives.
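Purely as an illustration of one possible decaying schedule with parameters in (0, 1) (an assumption, not necessarily the form of Eq. 23):

% Hedged example of a time-decaying learning rate (not the exact form of Eq. 23)
eta0 = 0.99;                  % initial learning rate, as used in the experiments
c    = 0.95;                  % decay factor in (0, 1)
eta  = @(t) eta0*c.^t;        % learning rate at iteration t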
Local minima: One of the problems with all gradient descent algorithms is that they may get stuck in local minima. There are various ways in which these can be escaped. Noise can be introduced by shaking the weights, which means that a random variable is added to the weights. Alternatively, the algorithm can be run again using a different initialization of the weights. It has been argued that, because the space is so high-dimensional, there is always a 'ridge' where an escape from a local minimum is possible (Lappalainen and Giannakopoulos, 1999; Lee et al., 1997). Because error functions are normally visualized with only very few dimensions, one gets the impression that a backpropagation algorithm is very likely to get stuck in a local minimum; this appears not to be the case in high dimensions. Since the proposed algorithm is stochastic, it does not get trapped in local minima.
Performance index: After the separating matrix W has been computed by the stochastic gradient descent algorithm, the separation quality can be measured by the so-called performance index (PI) (Cichocki and Amari, 2002):
(24)
where, s_ij denotes the element in the ith row and jth column of P (P = A·W).
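A direct implementation of a Cichocki-Amari-style performance index (written from the commonly used definition; the paper's exact normalization in Eq. 24 may differ) is:

% Sketch: performance index for the global matrix P = A*W (small PI = good separation)
P  = A*W;
n  = size(P, 1);
Pa = abs(P);
rowterm = sum(sum(Pa, 2)./max(Pa, [], 2) - 1);   % how far each row is from having one dominant entry
colterm = sum(sum(Pa, 1)./max(Pa, [], 1) - 1);   % same measure column-wise
PI = (rowterm + colterm)/(n*(n - 1));            % normalized index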
Fig. 4: Comparison of SNR for the ICA and SGDA algorithms
Table 1: SNR of separated signals using ICA
Table 2: SNR of separated signals using SGDA
Signal to noise ratio: Another measure of separation quality is the Signal-to-Noise Ratio (SNR) of the separated outputs, given by Eq. 25 (Guillermo et al., 2003):
SNR = 10 \log_{10} \left( \sum_t m(t)^2 / \sum_t n(t)^2 \right)        (25)
where, m(t) is the desired signal, o(t) is the estimated source signal and n(t) = o(t) - m(t) is the noise, i.e., the undesired part of the output.
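In code, the SNR of one recovered output can be computed as in this sketch (m_true and o_est are illustrative names; in practice the permutation and scaling of the outputs must be matched to the sources first):

% Sketch: output SNR in dB for one recovered source
noise  = o_est - m_true;                              % n(t) = o(t) - m(t)
SNR_dB = 10*log10(sum(m_true.^2)/sum(noise.^2));      % power ratio of desired signal to noise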
From the quantitative analysis of Tables 1 and 2, it is observed that the Signal-to-Noise Ratio (SNR) of the signals separated by the proposed stochastic gradient descent algorithm is much lower than that of the ICA algorithm; the comparison for 1000 sampled data points is shown in Fig. 4.
EXPERIMENTAL RESULTS
The three audio signals, listed in Table 3, are nonlinearly mixed by the mixing matrix A given in Eq. 26, which is generated by the MATLAB function rand().
(26)
The learning rate parameter was initially chosen as 0.99 and was varied according to Eq. 23 during training. The spread factor for the radial basis functions was set to 0.09. The centers of the basis functions were initially set to the weight matrix between the input and hidden layers. The original, mixture and separated output signals are shown in Fig. 5: Fig. 5a shows the artificially generated source signals, Fig. 5b shows the mixture signal obtained after the three source signals are mixed by the mixing matrix A and Fig. 5c shows the signals recovered by the ASN-RBF neural network.
Fig. 5a: Original source signals

Fig. 5b: Mixture signal

Fig. 5c: Recovered source signal
Table 3: Three source signals used for simulation
CONCLUSION
This study has proposed a novel neural learning algorithm for extracting independent sources from an artificially mixed signal. An adaptive self-normalized radial basis function neural network has been developed and trained by the proposed stochastic gradient descent optimization algorithm with fixed centers to update the parameters in the generative model. Many different approaches have been attempted by numerous researchers using neural networks, each claiming various degrees of success, but the separation of signals in real environments still needs improvement. The performance of the proposed network has been compared with that of the ICA algorithm and illustrated with computer-simulated experiments.