ISSN: 1682-3915
© Medwell Journals, 2016

## Fast Architecture Multiplier Less Based DWT

<sup>1</sup>A. Akilandeswari and <sup>2</sup>P. Sakthivel
<sup>1</sup>Department of Electronics and Communication Engineering, St. Joseph's Institute of Technology, Chennai, India
<sup>2</sup>Department of Electronics and Communication Engineering, Anna University, Chennai, India

**Abstract:** Multimedia technologies are now in widespread use, so image compression techniques need to improve both in performance and in features. Because of its advantages over traditional transforms, the Discrete Wavelet Transform (DWT) has become very popular in image processing. This study presents a Fast Architecture (FA) for the 2-D DWT based on an improved lifting scheme. Building on the embedded decimation technique used for the 1-D DWT, a pipelined and parallel 2-D DWT structure with a multiplier-less pipeline method is proposed. The advantage of this technique is that it maximizes utilization of the designed hardware. For an input image of size N×N, it performs J levels of decomposition in approximately $2N^2(1-4^{-J})/3$ clock cycles. With this technique, throughput rate and output latency are improved at the cost of some additional hardware, so the proposed architecture is a better alternative for high-speed applications.

Key words: Multimedia, fast architecture, applications, traditional, transforms

### INTRODUCTION

Multimedia technologies are widely used in many devices today. These technologies perform various operations on images, such as image analysis, image compression and image processing.
Among the transform techniques, the Discrete Wavelet Transform (DWT) compares favourably with the convolution-based and plain lifting-based transforms in terms of computation speed, hardware cost, efficiency and hardware utilization. A fast architecture for the DWT is introduced here to achieve all of these advantages. The lifting scheme already reduces the number of computations, but a fast architecture combined with an improved lifting scheme brings further advantages. Transforms used in high-speed or low-power applications need high efficiency and low hardware cost. The 2D-DWT is composed of multirate filters and decomposes the signal into different sub-bands in the time and frequency domains. The fast architecture adds a speed advantage and increases the number of decomposition levels, and this higher computational ability improves hardware utilization.

There are several established transform methods. In the convolution-based (traditional) method, the input signal is filtered by low-pass and high-pass filters into low-pass and high-pass sub-bands. These sub-bands are then subsampled, and the image is recovered from them. This method consumes considerable power and requires a large number of computations, so it is slow. The wavelet transform was introduced to overcome these drawbacks of the convolution-based transform.

There are also many methods for computing the discrete wavelet transform, the best known being the lifting scheme. The lifting architecture for the 2D-DWT decomposes the input signal in three simple steps: splitting, which separates the signal into the sequences fed to the high-pass and low-pass paths; lifting, which performs the multiplication and addition operations (computations) on these sequences; and scaling.
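The three steps just listed (splitting, lifting, scaling) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the architecture proposed in this study: the constants are the standard lifting coefficients of the 9/7 filter (they are not quoted in the text), the signal is mirrored at its borders, the scaling convention (low-pass times K, high-pass divided by K) is one common choice, and all function names are hypothetical.

```python
# Standard 9/7 lifting constants (assumed; not listed in the paper).
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
K = 1.149604398


def lift_odd(odd, even, c):
    """Add c * (left even + right even) to each odd sample; mirror at the right edge."""
    last = len(even) - 1
    return [odd[i] + c * (even[i] + even[min(i + 1, last)]) for i in range(len(odd))]


def lift_even(even, odd, c):
    """Add c * (left odd + right odd) to each even sample; mirror at the left edge."""
    return [even[i] + c * (odd[max(i - 1, 0)] + odd[i]) for i in range(len(even))]


def forward_97(x):
    """One level of 1-D 9/7 lifting for an even-length sequence."""
    even, odd = list(x[0::2]), list(x[1::2])   # step 1: splitting
    odd = lift_odd(odd, even, ALPHA)           # step 2: lifting (predict)
    even = lift_even(even, odd, BETA)          #         lifting (update)
    odd = lift_odd(odd, even, GAMMA)           #         lifting (predict)
    even = lift_even(even, odd, DELTA)         #         lifting (update)
    low = [K * s for s in even]                # step 3: scaling
    high = [d / K for d in odd]
    return low, high


def inverse_97(low, high):
    """Undo forward_97 by running the same steps backwards with opposite signs."""
    even = [s / K for s in low]
    odd = [d * K for d in high]
    even = lift_even(even, odd, -DELTA)
    odd = lift_odd(odd, even, -GAMMA)
    even = lift_even(even, odd, -BETA)
    odd = lift_odd(odd, even, -ALPHA)
    x = [0.0] * (len(even) + len(odd))
    x[0::2], x[1::2] = even, odd               # interleave even and odd samples
    return x
```

Because every lifting step only adds a function of the other half of the samples, each step can be undone exactly by subtracting the same quantity, which is why `inverse_97(*forward_97(x))` recovers `x` up to floating-point rounding. Note that each lifting step here still contains one multiplication by a constant, which is the operation the multiplier-less scheme sets out to remove.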
This technique requires few computations, but it cannot be used for all high-speed applications; rather, it is used for real-time applications. It also still involves a number of multiplication and addition operations, so the lifting scheme uses multipliers. Multipliers increase the hardware and area and make the design costly. In this study, we therefore propose a new technique for high-speed applications, called fast architecture with improved lifting scheme for the discrete wavelet transform. In this technique, the lifting steps are executed in a parallel and pipelined manner, and the multipliers are removed and replaced by shift-add circuits, which reduces the hardware and its cost. As a result, the throughput rate increases and the output latency decreases, so the technique is better than the conventional 2D-DWT.

The 1D-DWT with embedded decimation takes the input image and decomposes it into low-pass and high-pass components, which appear alternately at the output. Building on this, we propose the fast architecture for the 2D-DWT using parallel and pipelined techniques. It uses two horizontal filters and one vertical filter. Because these filters operate in a parallel and pipelined fashion, hardware utilization is maximized, and the speed of execution (throughput rate) is twice that of the simple lifting-based 2D-DWT architecture.

**Literature review:** A great deal of research has been done on DWT architectures. A few of these works are discussed here.

A memory-efficient architecture was proposed by Chiang *et al.* (2005). It requires less internal memory than existing lifting-based 2D-DWT architectures. Its main components are a 1-D row processor, an internal memory and a 1-D column processor.
When input is given, the row processor performs the row-wise DWT operation and the resulting coefficients are stored in the internal memory. Once the internal memory holds a sufficient number of coefficients, the column-wise DWT is performed. This technique therefore requires less internal memory.

Movva and Srinivasan (2003) proposed a novel architecture for the lifting-based discrete wavelet transform (Daqrouq, 2005). This architecture uses the lifting scheme instead of the convolution-based DWT, together with the Single Sample Overlap (SSO) scheme. SSO gives a low-memory block representation, and the lifting scheme lowers the computational complexity of the transform. The components of this architecture communicate with each other through shared memories and all work in parallel. A further modification in this architecture is the use of Canonic Signed Digit (CSD) representation, which lowers the number of computations and decreases the area and power requirements.

Qin et al. (2004) proposed an efficient architecture for the 2D-DWT. The architecture consists of four main blocks: a 1D-DWT unit, a main processor, a memory management unit and eight SRAM memories. The 2D-DWT is performed in row-column fashion: once the row DWT is finished, the coefficients are stored in SRAM and the column DWT is then performed. The coefficients obtained from the 1D-DWT unit are stored in the eight SRAMs, and four of these coefficients are read or written in one clock cycle. This is handled by the memory management unit, which selects the memories on which the operations are to be performed. The main controller/processor controls all actions performed in this architecture. The 1D-DWT unit is also pipelined, so the throughput is improved as well.

Iwahashi and Kiya (2010) proposed a new lifting architecture for the non-separable 2D-DWT which is compatible with JPEG-2000.
The main aim of that work is to make the 2D-DWT compatible with JPEG-2000. In this technique, the 1D-DWT lifting steps of the horizontal and vertical transformations are replaced and synthesized into a smaller number of lifting steps with non-separable 2D functions. The advantages of this technique include fewer lifting steps, reduced lifting latency and high throughput.

Wu and Lin (2005) proposed a high-performance, memory-efficient architecture for the 5/3 and 9/7 discrete wavelet transforms of the JPEG 2000 codec. This 2D-DWT technique uses column-row wise transformation and has three components: a column processor, a transposing buffer and a row processor. The row processor performs a partial transformation on the column-transformed coefficients. This shortens the path of each pipeline stage, and the partial transformation in the row processor also reduces the required internal memory.

The design and FPGA implementation of an improved lifting scheme based DWT for OFDM systems was proposed by Deepthi *et al.* (2011). This technique is mainly used for wireless communication applications.

Mohanty and Meher (2013) proposed a memory-efficient, high-speed convolution-based architecture for the 2D-DWT. This architecture uses line buffers instead of frame buffers, and the size of these buffers is independent of the throughput, which makes the technique very useful in applications that require higher throughput. The architecture uses down-sampled filtering, which decreases the computational complexity. Parallelism gives 100% hardware utilization, and the power consumption is lower than that of the older convolution-based 2D-DWT architecture.

Martina and Masera (2006) proposed a low-complexity, efficient VLSI implementation of 9/7 wavelet filters. With this technique, the number of multiplication operations in the lifting steps is reduced, giving low complexity while achieving a high-quality JPEG2000 codec.
**Problem statement:** With the increasing use of multimedia technologies, ever more efficient techniques for image compression are required. The Discrete Wavelet Transform was introduced to overcome the issues of the convolution-based transform for image compression. The convolution-based transform requires a larger number of computations and more memory. These drawbacks are removed by the DWT. The DWT can be performed with different methods, as follows:

- Direct form structure
- Poly phase structure
- Lattice structure
- Lifting scheme

Among these techniques, the lifting scheme for the DWT is well known and widely used. The lifting scheme has three simple steps, i.e., splitting, lifting and scaling. In this technique, the DWT filter bank is decomposed into a sequence of lifting steps. The wavelet transform is then computed with a small number of lifting steps of arithmetic operations (multiplication, addition, etc.), which corresponds to factoring the polyphase matrix into a sequence of upper and lower triangular matrices. It requires fewer computations and less storage memory.

The lifting-based DWT still uses multipliers, and it is well known that multipliers increase the hardware cost. To overcome this issue, the multiplier-less lifting-based DWT was proposed, in which multiplication operations are replaced by shift-add operations. However, the multiplier-less lifting-based DWT has several issues of its own. Its speed decreases because of the sequential operations it performs. The samples become available at the output one after another, which means it can perform only a single-level transform, so the output latency increases. Its operations require a larger number of internal cycles, and the sequential use of the lifting steps leads to low hardware utilization. To overcome these problems, we propose a parallel and pipelined architecture with improved lifting, i.e., a multiplier-less based DWT.
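To make the replacement of a multiplication by shift-add operations concrete, the following is a minimal Python sketch. It is an illustration under assumptions rather than the circuit of this study: the constant is quantized to a hypothetical 8 fractional bits, the function name is invented for the example, and the coefficient used in the usage note (-1.586134342, the standard first lifting coefficient of the 9/7 filter) is not quoted in the text.

```python
def shift_add_multiply(x, coeff, frac_bits=8):
    """Multiply integer x by a constant using only shifts and adds.

    The constant is quantized once, at design time, to a fixed-point
    integer with frac_bits fractional bits. At run time the product is
    accumulated from shifted copies of x, one per set bit of the constant.
    """
    q = round(abs(coeff) * (1 << frac_bits))  # design-time quantization
    acc = 0
    bit = 0
    while q:
        if q & 1:             # this bit of the constant is set
            acc += x << bit   # add x shifted to the bit position
        q >>= 1
        bit += 1
    acc >>= frac_bits         # drop the fractional bits (floor)
    return -acc if coeff < 0 else acc
```

For example, `shift_add_multiply(100, 1.5)` accumulates `100 << 7` and `100 << 8` and returns 150, while `shift_add_multiply(100, -1.586134342)` returns -158, against an exact product of about -158.6. The number of adders grows with the number of set bits in the quantized constant, which is the area cost traded against the removed multiplier.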
In this technique, the lifting steps are performed in parallel, so the output latency decreases. The pipelined architecture also maximizes hardware utilization, and it can perform J levels of decomposition of the image in fewer internal clock cycles. The multiplier-less lifting scheme reduces the hardware cost, so the design can be used in high-speed and low-power DSP applications at reduced cost. This transform technique is also called fast architecture with improved lifting based DWT.

### MATERIALS AND METHODS

**Proposed method:** As discussed above, the lifting scheme for the 2D-DWT is a better technique than the convolution-based transform. In the proposed technique, the computational complexity is reduced because multiplication and addition operations are replaced by shift-add operations. The lifting scheme performs the transform in three steps, as follows.

**Splitting of input data set:** In this phase, the input data set is split into even and odd samples by down samplers. The basic idea behind the splitting is to break up the polyphase matrices of the wavelet filters into upper and lower triangular matrices. Suppose h'(z) and g'(z) are the low-pass and high-pass analysis filters and h(z) and g(z) are the low-pass and high-pass synthesis filters; then the decomposition and reconstruction polyphase matrices are given as follows:

$$p'(z) = \begin{bmatrix} h'_{e}(z) & h'_{o}(z) \\ g'_{e}(z) & g'_{o}(z) \end{bmatrix}$$

$$p(z) = \begin{bmatrix} h_{e}(z) & h_{o}(z) \\ g_{e}(z) & g_{o}(z) \end{bmatrix}$$

**Predict and update:** The main aim of this step is to minimize redundancy and give a compact representation of the split data.
In this step, the even data set is used to predict the odd data set; the difference between the predicted value and the original one is computed and replaces the odd sample:

$$Y(2n+1) = X_{o}(2n+1)-P(X_{e})$$

In the update stage, the even coefficients are lifted using the neighboring wavelet coefficients, that is, the odd coefficients. The update step is called primal lifting and the predict step is called dual lifting.

**Scaling:** In this step, the output is scaled in order to normalize its value. Scaling is performed within the predict and update steps themselves, with K and 1/K as the normalization factors for the predict and update polynomials. This conventional lifting scheme is shown in Fig. 1.

Fig. 1: Conventional lifting scheme

In conventional lifting, calculating the predict and update step coefficients involves a number of multiplication and addition operations. These steps are calculated according to the following equation in the lifting scheme:

$$P1(n) = x_{o}(n) + a[x_{e}(n) + x_{e}(n+1)]$$ (1)

From Eq. 1, it is clear that every lifting step requires one multiplication and two addition operations. We therefore propose a fast architecture with an improved lifting scheme that contains no multipliers; the multipliers are replaced by shift-add operations. The coefficients of every predict and update stage, conventionally calculated as in Fig. 2, are instead calculated using the shift-add operation of Fig. 3, whose main parts are a comparator, an adder, a counter and a multiplexer. The conventional lifting is thus improved by using this shift-add multiplier to calculate the predict and update stage coefficients, as shown in Fig. 4. The resulting multiplier-less lifting scheme is shown in Fig. 5.

Fig. 2: Conventional coefficients calculation of lifting stages

Fig. 3: Proposed shift-add operations

Fig. 4: Proposed shift-add multiplier for lifting for coefficients calculation

To solve the issues discussed in the previous section, we propose a fast architecture for the 2D-DWT with the 9/7 DWT filter, together with the improved lifting scheme discussed above. The block diagram of the fast architecture is given in Fig. 6. The proposed fast architecture has the following main blocks.

Fig. 5: Proposed lifting scheme using multiplier less architecture

Fig. 6: Proposed fast architecture for 2D-DWT

**IDBU unit (Input Data Buffer Unit):** It mainly consists of two first-in first-out RAMs, which store the input data samples for the row transform module. The samples are stored in odd and even fashion, i.e., even samples are stored in RAM2 and odd samples are stored in RAM1.

**Wavelet unit:** It consists of:

- Two horizontal filters
- One vertical filter
- SNU (Scale Normalization Unit)

Table 1: Sub band of input image

| IN | O1 | O2 | O3 | O4 | |
|----|----|----|----|----|----|
| 0 | x(0, 0) 0 | | | | |
| 1 | x(0, 1) 0 | | | | |
| 2 | x(0, 2) 0 | 0 | H(0, 0) | | |
| 3 | x(0, 3) 0 | 0 | L(0, 0) | | |
| 4 | x(2, 0) x(1, 0) | 0 | H(0, 1) | | |
| 5 | x(2, 1) x(1, 1) | 0 | L(0, 1) | | |
| 6 | x(2, 2) x(1, 2) | H(1, 0) | H(2, 0) | HL(0, 0) | HH(0, 0) |
| 7 | x(2, 3) x(1, 3) | L(1, 0) | L(2, 0) | LL(0, 0) | LH(0, 0) |
| 8 | 0 x(3, 0) | H(1, 1) | H(2, 1) | HL(0, 1) | HH(0, 1) |
| 9 | 0 x(3, 1) | L(1, 1) | L(2, 1) | LL(0, 1) | LH(0, 1) |
| 10 | 0 x(3, 2) | H(3, 0) | 0 | HL(1, 0) | HH(1, 0) |
| 11 | 0 x(3, 3) | L(3, 0) | 0 | LL(1, 0) | LH(1, 0) |
| 12 | | H(3, 1) | 0 | HL(1, 1) | HH(1, 1) |
| 13 | | L(3, 1) | 0 | LL(1, 1) | LH(1, 1) |

The proposed architecture performs the 2D-DWT row transform first and then the column transform. The column transform is performed once a sufficient number of data samples has been obtained from the row transform.
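The row-then-column order and the resulting four sub-bands can be sketched in software as follows. This is a behavioural illustration only, under stated assumptions: to keep the example short and free of multiplications, it swaps in the simpler integer 5/3 lifting filter (whose steps need only shifts and adds) in place of the 9/7 filter used in this study, it mirrors samples at the borders, and the names `lift53` and `dwt2_one_level` are hypothetical. Nothing here models the pipelining or timing of the hardware.

```python
def lift53(x):
    """One level of integer 5/3 lifting on an even-length list (shifts and adds only)."""
    even, odd = x[0::2], x[1::2]
    last = len(even) - 1
    # Predict: odd sample minus the floor-average of its two even neighbours.
    high = [odd[i] - ((even[i] + even[min(i + 1, last)]) >> 1)
            for i in range(len(odd))]
    # Update: even sample plus a rounded quarter of its two detail neighbours.
    low = [even[i] + ((high[max(i - 1, 0)] + high[i] + 2) >> 2)
           for i in range(len(even))]
    return low, high


def transform_columns(block):
    """Apply lift53 down every column and return (low rows, high rows)."""
    lo_cols, hi_cols = [], []
    for col in zip(*block):
        lo, hi = lift53(list(col))
        lo_cols.append(lo)
        hi_cols.append(hi)
    # Transpose back so the results are stored row by row again.
    return [list(r) for r in zip(*lo_cols)], [list(r) for r in zip(*hi_cols)]


def dwt2_one_level(img):
    """Row transform first, then column transform, giving LL, LH, HL, HH."""
    L, H = [], []
    for row in img:                  # row (horizontal) transform
        lo, hi = lift53(list(row))
        L.append(lo)
        H.append(hi)
    LL, LH = transform_columns(L)    # column (vertical) transform of the L half
    HL, HH = transform_columns(H)    # column (vertical) transform of the H half
    return LL, LH, HL, HH
```

In this naming, the first letter refers to the row filter and the second to the column filter, which matches Table 1, where the vertical filter turns H coefficients into HL and HH and L coefficients into LL and LH. A 4x4 image whose rows are all `[0, 0, 8, 8]` gives non-zero values only in LL and HL, because the image varies along the rows but not along the columns.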
The exact operation is performed in the following sequence. The input data samples are split and stored in the IDBU unit, which consists of two RAMs: even-numbered sample rows are stored in RAM2 and odd-numbered sample rows are stored in RAM1. The wavelet unit reads these samples in one internal cycle, so the IDBU unit must read them at double the rate of the internal cycle. To avoid mixing of samples, the two RAM memories are of different sizes. While an even-numbered row is being read into RAM2, the first half of the row's data samples is transformed concurrently. Before the second half of the row is transformed, the next odd-numbered row is read into RAM1. Once the even and odd data samples are written into HF2 and HF1 (the horizontal filters), the row-wise transform is performed on them. At the end of this step, the row-transformed coefficients are available for the next step.

After the row transform, the column transform is performed by the vertical filter. The vertical filter operates only when a sufficient number of data samples is available from the horizontal filters. It accepts two separate inputs, one from HF1 and one from HF2, and generates two sub-bands of the input image, either HH and HL or LH and LL. This is the one-level decomposition of an input image, shown in Table 1.

**Proposed architecture for 2D-DWT for horizontal filter:** Two 1D-DWT horizontal filters are used, one for even rows and one for odd rows, to produce the transformed coefficients of the input data samples. The detailed circuit diagram of the proposed horizontal filter is shown in Fig. 7. The horizontal filter accepts the input, processes it and gives the row-transformed coefficients at its output.

Fig. 7: Proposed architecture for horizontal filter

When coefficients are available at the output, the improved lifting operation, i.e., predict and update lifting, is performed on these row-transformed coefficients.
This lifting operation is shown in the figure. During the row transformation, the horizontal filter produces the transformed coefficients as low-frequency and high-frequency coefficients alternately at its output. When the high-pass coefficients are available, they are fed back to the processing element, as shown in the figure, and the low-pass coefficients are produced from them. When sufficient row-transformed coefficients (even and odd) are available at the output of the horizontal filter, they are passed to the vertical filter for the column transform.

**Proposed architecture for 2D-DWT for vertical filter:** The 2D-DWT operates sequentially. First, the input sequence is divided into even and odd rows and passed to the horizontal transform, which performs the row-wise transformation followed by the lifting operation. Once a sufficient number of coefficients is available at the output of the horizontal filter, these coefficients are given as input to the vertical filter, which performs the column-wise transformation. This column-wise transformation is very important in the fast architecture because it is used to reduce the output latency; a number of pipeline stages are used for this purpose. Figure 8 shows the proposed architecture of the vertical filter for the 2D-DWT with the 9/7 filter.

**Proposed scale normalization unit:** As discussed earlier, there are three stages in the lifting scheme of the 2D-DWT, the third of which is scaling of the output. Scaling normalizes the output and is applied when decomposing the image into approximation coefficients and detail coefficients at different levels. With scale normalization, the visibility of edges and of small but important parts of an image can be improved by enlarging its high-frequency parts. As we have seen, this architecture is pipelined and can perform many levels of decomposition.
With every further decomposition, contrast and edges improve, so the scale normalization unit also achieves edge enhancement. The proposed architecture is shown in Fig. 9.

Fig. 8: Proposed architecture for vertical filter

Fig. 9: Proposed architecture for scale normalization unit

### RESULTS AND DISCUSSION

The proposed fast architecture based discrete wavelet transform using the lifting scheme is designed for an FPGA platform and implemented in Verilog. The design is simulated in the Xilinx ISE 14.1 design tool. The synthesis reports and FPGA design summaries are given below to compare the proposed architecture with the multiplier-less architecture of the 9/7 2D-DWT (Tables 2-4).

Table 2: Design summary of multiplier less 2D-DWT architecture (device utilization summary)

| Logic utilization | Used | Available | Utilization (%) | Note(s) |
|---|---|---|---|---|
| Number of slice flip-flops | 82 | 10,944 | 1 | |
| Number of 4 input LUTs | 265 | 10,944 | 2 | |
| Number of occupied slices | 150 | 5,472 | 2 | |
| Number of slices containing only related logic | 150 | 150 | 100 | |
| Number of slices containing unrelated logic | 0 | 150 | 0 | |
| Total number of 4 input LUTs | 265 | 10,944 | 2 | |

Table 3: Design summary of conventional fast architecture for 2D-DWT (device utilization summary)

| Logic utilization | Used | Available | Utilization (%) | Note(s) |
|---|---|---|---|---|
| Number of slice flip-flops | 29 | 10,944 | 1 | |
| Number of 4 input LUTs | 25 | 10,944 | 2 | |
| Number of occupied slices | 18 | 5,472 | 2 | |
| Number of slices containing only related logic | 18 | 18 | 100 | |
| Number of slices containing unrelated logic | 0 | 18 | 0 | |
| Total number of 4 input LUTs | 25 | 10,944 | 1 | |
| Number used as logic | 10 | 10,944 | | |
| Number used as shift registers | 15 | | | |

Table 4: Design summary of proposed fast architecture for 2D-DWT (device utilization summary)

| Logic utilization | Used | Available | Utilization (%) | Note(s) |
|---|---|---|---|---|
| Number of slice flip-flops | 56 | 10,944 | 1 | |
| Number of 4 input LUTs | 71 | 10,944 | 2 | |
| Number of occupied slices | 62 | 5,472 | 2 | |
| Number of slices containing only related logic | 62 | 62 | 100 | |
| Number of slices containing unrelated logic | 0 | 62 | 0 | |
| Total number of 4 input LUTs | 91 | 10,944 | 1 | |
| Number used as logic | 64 | | | |
| Number used as route-thru | 20 | | | |
| Number used as shift registers | 17 | | | |

The results above compare the multiplier-less architecture with the proposed pipelined and parallel architecture. In the pipelined and parallel method, the decomposition and filtering of the input stream are divided into different parts that are executed in parallel, so that the critical data path of the filtering is minimized. The pipelined architecture also maximizes hardware utilization, which is a very important factor in the design of a discrete wavelet transform.

We have used a multiplier-less architecture, in which multipliers are replaced by shift and add operations. A multiplier adds hardware to any circuit, and as the hardware increases, so does the cost. With the multiplier-less architecture, the multipliers are eliminated, which reduces the hardware and its cost. In the conventional lifting scheme of the DWT, the multipliers lead to a number of multiplication and addition operations, so the calculation complexity increases. With the multiplier-less architecture, the calculation complexity also decreases.

Table 5: The comparison with respect to the slices occupied in proposed and multiplier less architecture and conventional fast architecture

| Architecture used | Slices used (area) |
|---|---|
| Multiplier less architecture | 497 |
| Conventional fast architecture | 72 |
| Proposed fast architecture | 209 |

The power consumed is also an important factor in any electronic circuit; ideally, it should be as low as possible. From the results given below, the power required by the proposed architecture is lower than that of the multiplier-less architecture, which is a clear advantage of this method of DWT.
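Alongside area and power, the speed claim from the abstract can be made concrete. The sketch below evaluates the quoted cycle count of approximately $2N^2(1-4^{-J})/3$ and compares it with a per-level sum. The per-level interpretation is an assumption made here for illustration, not a statement from the study: level j is taken to operate on an (N/2^j)×(N/2^j) image and to consume two samples per clock cycle, which would be consistent with two horizontal filters working in parallel. Both function names are hypothetical.

```python
def cycles_closed_form(n, levels):
    """Cycle count 2*N^2*(1 - 4^-J)/3, evaluated in exact integer arithmetic."""
    return 2 * n * n * (4 ** levels - 1) // (3 * 4 ** levels)


def cycles_per_level(n, levels):
    """Sum over levels, assuming level j takes (N / 2^j)^2 / 2 cycles (two samples per cycle)."""
    return sum(((n >> j) ** 2) // 2 for j in range(levels))
```

For a 512×512 image and three levels, both functions return 172,032 cycles: 131,072 for the first level, 32,768 for the second and 8,192 for the third. The geometric series explains the factor $(1-4^{-J})$, since each further level works on a quarter of the samples of the previous one.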
**Comparison:** The comparison points below refer to the design summaries of the proposed and multiplier-less architectures given above.

**Slices occupied:** Table 5 compares the slices occupied by the proposed architecture, the multiplier-less architecture and the conventional fast architecture. The area required is one of the most important factors in the design of a VLSI circuit. In an FPGA implementation, it is determined by the number of slices, the number of flip-flops and the number of LUTs used. From Table 5, the area required by the proposed fast architecture (209 slices) is about 42% of the area required by the multiplier-less architecture (497 slices), i.e., a reduction of about 58%. The area of the proposed fast architecture is higher than that of the conventional fast architecture (72 slices). Although the shift registers used in place of the multipliers increase the area, eliminating the multipliers reduces the power required by the designed circuit, and power is a critical factor in electronic circuit design. This is again a major advantage of using the proposed architecture on an FPGA platform.

Fig. 10: Comparison of various elements

Table 6: The power required for multiplier less architecture

| Parameter | Dynamic | Quiescent | Total |
|---|---|---|---|
| Supply power (W) | 0.008 | 0.167 | 0.174 |

Table 7: The power required for conventional fast architecture

| Parameter | Dynamic | Quiescent | Total |
|---|---|---|---|
| Supply power (W) | 0.008 | 0.167 | 0.174 |

Table 8: The power required for proposed fast architecture

| Parameter | Dynamic | Quiescent | Total |
|---|---|---|---|
| Supply power (W) | 0.008 | 0.167 | 0.174 |
Figure 10 shows the elements required by the multiplier-less architecture, the conventional fast architecture and the proposed fast architecture. It can be seen that the numbers of slices, flip-flops and LUTs required by the proposed architecture are lower than those of the multiplier-less architecture and higher than those of the conventional fast architecture. The proposed fast architecture has another advantage, however, namely its power consumption.

**Power consumption:** The power required by the multiplier-less architecture is shown in Table 6, that of the conventional fast architecture in Table 7 and that of the proposed fast architecture in Table 8. As discussed in the previous section, power is an important factor in electronic circuit design. Table 9 compares the proposed architecture and the multiplier-less architecture with respect to power consumption, together with the power required by various types of multipliers used in the discrete wavelet transform. The power required by the proposed architecture is greatly reduced, so it is an efficient method for the discrete wavelet transform.

Table 9: The comparison of proposed architecture and multiplier

| Multiplier used | Power required |
|---|---|
| Modified BZ-FAD multiplier | 126.00 |
| Shift and add multiplier | 194.00 |
| Booth multiplier | 379.12 |
| Array multiplier | 231.50 |
| Wallace tree multiplier | 289.90 |
| Multiplier less architecture | 0.1740 |
| Conventional fast architecture | 0.1740 |
| Proposed fast architecture | 0.1720 |

Fig. 11: Comparison of power

Figure 11 shows the power consumption of the multiplier-less architecture, the conventional fast architecture and the proposed fast architecture. It can be seen that the power required by the proposed fast architecture is lower than that of the other two architectures, so it is the better method among the three.

### CONCLUSION

The Discrete Wavelet Transform (DWT) is widely used in image compression techniques such as JPEG2000. The DWT has been used for many types of applications, such as image coding, image compression, speech analysis and pattern recognition. The DWT is a multi-resolution decomposition that splits a signal into several components in different frequency bands, so it always requires a large amount of computational memory. The conventional DWT uses multipliers, which increase the hardware cost and the computational complexity. Comparing the results of the multiplier-less architecture with those of the fast architecture with the multiplier-less scheme based DWT, we can conclude that the latter reduces the hardware cost, because multipliers are replaced by shift-add operations, and with it the computational complexity. The proposed method also consumes less power, which matters because power consumption in electronic circuits should be kept as low as possible. The area used on the FPGA platform is another important factor, and the area required by the proposed architecture is reduced. We can therefore conclude that the fast architecture using the multiplier-less based DWT is better than the multiplier-less lifting scheme.

#### REFERENCES

- Chiang, J.S., C.H. Hsia, H.J. Chen and T.J. Lo, 2005. VLSI architecture of low memory and high speed 2D lifting-based discrete wavelet transform for JPEG2000 applications. Proceedings of the IEEE International Symposium on Circuits and Systems ISCAS 2005, May 23-26, 2005, IEEE, USA., pp: 4554-4557.
- Daqrouq, K., 2005. ECG baseline wandering reduction using discrete wavelet transform. Asian J. Inform. Technol., 4: 989-995.
- Deepthi, H.S., S.S. Manure, C.P. Raj, S.S. Bhusare and U.L. Naik, 2011. Design and FPGA implementation of improved lifting scheme based DWT for OFDM systems. Proceedings of the 3rd International Conference on Advances in Recent Technologies in Communication and Computing, November 14-15, 2011, Bangalore, India, pp: 184-187.
- Iwahashi, M. and H. Kiya, 2010. A new lifting structure of non separable 2D DWT with compatibility to JPEG 2000. Proceedings of the IEEE International Conference on Acoustics Speech and Signal Processing, March 14-19, 2010, Dallas, TX., USA., pp: 1306-1309.
- Martina, M. and G. Masera, 2006. Low-complexity, efficient 9/7 wavelet filters VLSI implementation. IEEE Trans. Circ. Syst. II: Express Briefs, 53: 1289-1293.
- Mohanty, B.K. and P.K. Meher, 2013. Memory-efficient high-speed convolution-based generic structure for multilevel 2-D DWT. IEEE Trans. Circ. Syst. Video Technol., 23: 353-363.
- Movva, S. and S. Srinivasan, 2003. A novel architecture for lifting-based discrete wavelet transform for JPEG2000 standard suitable for VLSI implementation. Proceedings of the 16th International Conference on VLSI Design, January 4-8, 2003, New Delhi, India, pp: 202-207.
- Qin, X., X.L. Yan, C.P. Yang and X. Zhao, 2004. Novel VLSI architecture of 2-D DWT/IDWT for JPEG2000 based on diagonal storage. Proceedings of the 7th International Conference on Solid-State and Integrated Circuits Technology, Volume 3, October 18-21, 2004, Beijing, China, pp: 1657-1660.
- Wu, B.F. and C.F. Lin, 2005. A high-performance and memory-efficient pipeline architecture for the 5/3 and 9/7 discrete wavelet transform of JPEG2000 codec. IEEE Trans. Circ. Syst. Video Technol., 15: 1615-1628.