

# Design of Conventional and Modified Router Design for NOC and its FPGA Implementation

Manjunatha S. 1SP16LVS02 M.Tech (VLSI and Embedded Systems) SEA CET Bengaluru, India

# ABSTRACT

FPGA-based systems help improve performance metrics such as throughput for SoC- and NoC-based designs. A NoC integrates a complex network system into a single device or chip. This reduces the manufacturing cost of the SoC, increases performance and reduces time consumption. In this paper, two different NoC designs are synthesized and implemented. First, the conventional NoC 2X2 router includes five channels, namely local, east, west, south and north. Each channel has two multiplexers along with a crossbar switch. Second, the proposed NoC 2X2 router includes four nodes, and each node has a priority encoder, a random arbiter and a router design using the XY algorithm. The designs are written in Verilog, synthesized in Xilinx ISE 14.7, simulated in ModelSim 6.3f and implemented on an Artix-7 FPGA board. The simulation results are presented in detail with their inputs and outputs. The performance analysis shows that the proposed NoC 2X2 router reduces the area overhead relative to the conventional NoC 2X2 router, and the comparison is tabulated.

# **Keywords**

System on chip (SoC), Network on Chip (NoC), Router, FPGA

# 1. INTRODUCTION

In the modern era, everyone wants small devices that offer more features at lower cost. This is possible only if many functions are combined into a single device or chip [1]. To this end, researchers developed the concept of nanotechnology; "nano" denotes a factor of $10^{-9}$, that is, very small devices. The next major challenge was how to integrate intellectual-property blocks from various sources rapidly and efficiently [2]. To overcome this, researchers introduced the "System on Chip" (SoC), an active topic in modern networking technology. It is a good solution for integrating many functions into a single device or chip. An SoC is an integrated circuit that combines all the components of an electronic system on a single chip [3]. The chip can contain digital, analog and mixed-signal blocks and can operate in any frequency range. The SoC architecture contains a processor, timer, system bus and power-controller circuits [4].

The advantages of an SoC include low power and area consumption, faster operation and lower cost per gate. SoC computation is reliable and efficient. The main drawback of these devices is the difficulty of integrating a huge number of components into one device and interconnecting them [5]. To avoid this complexity, a new technique was introduced, the "Network on Chip" (NoC). A NoC integrates a complex network system into a single device or chip. This reduces the manufacturing cost of the SoC, increases performance and reduces time consumption [6]. Efficient communication requires transferring data from sender to receiver without noise or interruption. The NoC combines transmitter and receiver in a single device [7]. NoC circuit structures can be divided into two categories [8]:

1) Circuit-switched NoC
2) Packet-switched NoC

These two categories differ in how data is switched between nodes. The circuit-switched NoC is a type of mesh-topology network in which a connection is established between two nodes before any data is transferred between them. Typical examples of circuit switching are the telephone line and ISDN. A circuit-switched network operates in three phases: 1) circuit establishment, 2) data transfer and 3) circuit disconnection. In the first phase, a circuit is established from one node to another, and signalling informs the receiver that data will arrive. In the second phase, data is sent from one node to the other over the network. In the final phase, which occurs after the data transfer, the established connection between the nodes is released [9]. Overall, in this scheme a physical link is set up between the sender and receiver nodes before communication starts. A packet-switched network, in contrast, groups the transmitted data into blocks of suitable size called packets. Each packet is transmitted independently from one node to another, and a suitable path is allocated to each packet as needed [10]. The goals of packet switching are to optimize the use of the available links, minimize the response time of communication and maximize the robustness of the communication network. For NoC, the packet-switching method is safer and generally preferable to the circuit-switching method.

This paper presents two NoC designs that are synthesized and implemented: a conventional NoC 2X2 router and a proposed NoC 2X2 router. The paper is organized as follows. Section 2 reviews existing research on NoC design, Section 3 describes the design of the proposed method, and Section 4 analyzes the results obtained from both NoC designs. The performance analysis and the conclusions are given in Sections 5 and 6, respectively.

# 2. REVIEW OF LITERATURE

This section reviews existing research in the NoC design domain. The review considers work published in leading venues such as IEEE Xplore. Wu et al. [11] addressed NoC (Network-on-Chip) power optimization through the routing algorithm. They present a combined routing algorithm (CRA) with a novel Multi-NoC design that uses distinct routing algorithms for different subnets. Their experimental results show that CRA consumes on average 15.58% less power than Catnap, the state-of-the-art power-efficient Multi-NoC design, and that its EDP (energy-delay product) is on average 8.59% lower than that of Catnap.

Lan and Muthukumar [12] proposed an efficient virtual channel (VC) buffer management structure and a dynamic VC allocation mechanism for the router to minimize latency and area (buffer allocation) overhead. The performance was evaluated for different load scenarios, and comparisons with existing VC allocation algorithms are discussed.

Kamali and Hessabi [13] presented adaptive NoC, a configurable, cycle-accurate FPGA-based NoC simulator that can be configured via software. A wide range of parameters is configurable on the FPGA side of the simulator, and the software side is implemented on an embedded soft-core processor. The authors implemented a dual-clock architecture as an innovation in the virtualization methodology. It can also share idle time slots, which helps to simulate bigger NoCs and reduces simulation time drastically.

Malviya [14] presented a router whose five connections operate at the same time without interruption. Several mechanisms are introduced to improve router performance, including the XY routing algorithm, a forward flow mechanism, input-output buffering and packet switching.

Rohini et al. [15] presented a router design based on a deadlock-free NoC architecture. The work implements the XY algorithm and offers an area-efficient design.

Kale and Gaikwad [16] illustrate the design of a NoC router and focus on current router design issues, namely using less hardware while achieving high throughput. The method yields power-optimized results: the router consumes 7 mW for a 16-bit packet size.

The work of Anirudh and Soumya [17] gives an application-specific routing algorithm for mesh-based topologies with irregular core sizes. It aims at a more efficient algorithm by reducing the length of the path between two communicating cores. The proposed algorithm follows wormhole switching, uses the XY routing algorithm as its basis and improves on it.

Jiang et al. [18] introduced a virtual-channel-based fully adaptive routing algorithm for runtime thermal-aware management of 3D NoCs. To collect throttling information, instead of transmitting the topology information of the whole network, they use a 12-bit register that holds the state of the routers one hop away. Experimental results show that the proposed algorithm achieves better network latency and throughput at low power compared with traditional algorithms.

The work of Palesi et al. [19] explores the design of a more efficient ASRA that distributes traffic uniformly by exploiting knowledge of the communication topology and bandwidth (BW). The proposed method uses off-line analysis to estimate the expected load on the links of the system. ASRA is analyzed against several other routing schemes, with each router steering traffic toward less loaded paths to reduce network congestion.

Imbewa and Khalid [20] present a Fast Light-Weight NoC Router (FLNR) designed for FPGAs. FLNR is a 5-port packet-switched (PS) wormhole router that uses the XY routing algorithm and round-robin arbitration (RRA), and its size is parameterizable. The buffer size is reduced by decreasing the number of control fields in a packet. FLNR is compared with previously proposed routers in terms of frequency, area and zero-load latency, and the results show that FLNR is significantly superior.

The analysis of this recent research outlines the state of the art in NoC design across different methods. It indicates that most of the designed NoCs still face performance issues.

# 3. PROPOSED METHOD

To overcome the issues associated with NoC design, the following routing methodology is adopted.

## **3.1 Conventional NOC Architecture**

The existing router architecture proposed by Bhanwala et al. [21] is shown in Figure 1. A multiplexer uses its address (select) bits to choose one of its data inputs and forward it to the multiplexer output. The 4x1 multiplexer can be built from basic gates such as inverters and AND gates. FIFO writes are governed by the write control logic, which generates a binary pointer that indicates the memory location to write. The pointer is incremented after each successful write, and the read control logic increments its pointer in the same way. A crossbar switch builds the connections between the channels. The present work uses a crossbar switch with 5 inputs and 5 outputs, which switches data from an input port to an output port while the router operates.



Fig 1: The router architecture given by Bhanwala et al. [21]
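The FIFO pointer behaviour and the 4x1 multiplexer described above can be illustrated with a small behavioural model. This is only a sketch in Python, not the Verilog of [21]: the class name `Fifo`, the function `mux4`, the depth parameter and the use of an occupancy counter for the full/empty checks are all assumptions made for illustration.

```python
class Fifo:
    """Behavioural FIFO model with binary write and read pointers (illustrative)."""

    def __init__(self, depth=8):
        self.depth = depth          # assumed depth; the source does not state one
        self.mem = [0] * depth
        self.wptr = 0               # binary pointer to the next write location
        self.rptr = 0               # binary pointer to the next read location
        self.count = 0              # number of words currently stored

    def write(self, data, winc=1):
        """Store data when winc is asserted and the FIFO is not full."""
        if winc and self.count < self.depth:
            self.mem[self.wptr] = data
            self.wptr = (self.wptr + 1) % self.depth  # increment after a successful write
            self.count += 1
            return True
        return False

    def read(self, rinc=1):
        """Return the oldest word when rinc is asserted and the FIFO is not empty."""
        if rinc and self.count > 0:
            data = self.mem[self.rptr]
            self.rptr = (self.rptr + 1) % self.depth  # increment after a successful read
            self.count -= 1
            return data
        return None


def mux4(inputs, sel):
    """4x1 multiplexer: the 2-bit select value picks one of four inputs."""
    return inputs[sel & 0b11]
```

For example, writing 10 and then 20 into a `Fifo(depth=2)` and reading twice returns 10 and then 20, and a third write before any read is rejected because the FIFO is full.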

### A. Proposed NOC Architecture

The proposed NoC router consists of a crossbar switch, a priority encoder and a random arbiter. Figure 2 shows the router architecture of the proposed method.





Fig 2: Proposed NoC Router architecture

In the proposed system, the NoC design contains a random arbiter, a priority encoder and XY routing logic, as shown in Figure 2. The priority encoder has five inputs: the local packet input (packet-in) and four port inputs (PIN), one for each of the north, south, east and west directions. Each input is 16 bits wide. The encoder chooses an input according to the selection-line information, which is produced by the random arbiter. The priority encoder is connected to the outputs through the crossbar switch, which drives the output selected for the chosen input. The data packet is output at the north, south, east or west node [22].
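One way to picture the interplay of random arbiter and priority encoder is the following Python sketch. The paper does not give the internals of the arbiter, so everything here is an assumption for illustration: a 4-bit LFSR supplies a pseudo-random offset, the request vector is rotated by that offset, and a plain priority encoder then picks the first asserted request. The names `lfsr4_next`, `priority_encode` and `random_arbiter` are hypothetical.

```python
PORTS = ["local", "north", "south", "east", "west"]  # assumed port ordering


def lfsr4_next(state):
    """One step of a 4-bit LFSR with feedback from the two most significant bits."""
    bit = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | bit) & 0xF


def priority_encode(requests):
    """Return the index of the first asserted request, or None if there is none."""
    for index, requested in enumerate(requests):
        if requested:
            return index
    return None


def random_arbiter(requests, state):
    """Grant one requesting port; priority starts at a pseudo-random offset.

    Returns (granted_index_or_None, next_lfsr_state).
    """
    state = lfsr4_next(state)
    offset = state % len(requests)
    rotated = requests[offset:] + requests[:offset]
    hit = priority_encode(rotated)
    if hit is None:
        return None, state
    return (hit + offset) % len(requests), state
```

Calling `random_arbiter([False, True, False, False, False], 1)` always grants index 1 (north in the assumed ordering), because only one port is requesting; when several ports request, the grant depends on the LFSR state.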

### B. XY-Routing algorithm on NoC

XY routing is a deterministic, distributed routing algorithm that uses coordinates to identify the destination and forwards the packet through the network.



Fig 3: Flow chart of implemented (XY) routing algorithm

The x coordinate runs along the columns in the horizontal direction, while the y coordinate runs in the vertical direction. XY routing is used mainly because it yields a deadlock-free and livelock-free network. A drawback is that the scheme tends to load the center of the network, which makes traffic irregular. The flow chart of the implemented XY routing algorithm is shown in Figure 3.
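The routing decision of standard XY routing can be sketched in a few lines of Python. This is a generic illustration rather than a transcription of the flow chart in Figure 3: the packet first travels along x until the column matches, then along y. The direction convention (x grows toward east, y grows toward north) and the function names `xy_route` and `xy_path` are assumptions.

```python
def xy_route(current, destination):
    """Return the output port for one hop of XY routing."""
    cx, cy = current
    dx, dy = destination
    if dx > cx:
        return "east"    # correct the x coordinate first
    if dx < cx:
        return "west"
    if dy > cy:
        return "north"   # then correct the y coordinate
    if dy < cy:
        return "south"
    return "local"       # packet has arrived


def xy_path(source, destination):
    """List every node visited from source to destination under XY routing."""
    step = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}
    node = source
    path = [node]
    while node != destination:
        mx, my = step[xy_route(node, destination)]
        node = (node[0] + mx, node[1] + my)
        path.append(node)
    return path
```

In a 2X2 mesh, `xy_path((0, 0), (1, 1))` visits `(0, 0)`, `(1, 0)` and `(1, 1)`. Because every packet finishes its x movement before turning into y, cyclic channel dependencies cannot form, which is why the scheme is deadlock free.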

# 4. RESULTS ANALYSIS

The designed system is verified using appropriate tools. The design is executed in Xilinx ISE 14.7 and simulated in ModelSim. The results give a detailed description of the outcome based on the inputs and outputs. The architecture is described in Verilog, a hardware description language widely used in VLSI. The proposed design is synthesized from the Verilog code in Xilinx ISE 14.7, simulated using ModelSim 6.3f and implemented on an Artix-7 FPGA board. The device is 7A100T-3 CSG324.

**Modified NoC 2X2 Design:**

The top module of the proposed modified NoC 2X2 architecture is shown in Figure 4. It has a global clock, a reset and four 16-bit input pins, packet\_in1, packet\_in2, packet\_in3 and packet\_in4. Correspondingly, it has four output pins, data\_out\_n1, data\_out\_n2, data\_out\_n3 and data\_out\_n4.



Fig 4: Modified NOC-2X2 Top Module

Figure 5 shows the simulation results of the modified 2X2 NoC router. The current XY address is taken as 0000. For the input packet\_in1 = 16'h87aa, the 16th bit (the select bit) is 1 and the destination port address given by packet\_in1[14:11] is 0000, so the output appears as data\_out\_n1 = 8'haa. Similarly, packet\_in2, packet\_in3 and packet\_in4 produce the outputs data\_out\_n2 = 8'hbb, data\_out\_n3 = 8'hcc and data\_out\_n4 = 8'hdd.

| Input        | Value    | Output        | Value |
|--------------|----------|---------------|-------|
| packet\_in1  | 16'h87aa | data\_out\_n1 | 8'haa |
| packet\_in2  | 16'h8fbb | data\_out\_n2 | 8'hbb |
| packet\_in3  | 16'ha7cc | data\_out\_n3 | 8'hcc |
| packet\_in4  | 16'ha8dd | data\_out\_n4 | 8'hdd |

Fig 5: Modified NOC-2X2 Module Simulation Results.
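The packet fields used in this simulation can be checked with a short Python sketch. It assumes, based on the description above, that bit 15 is the select bit, bits [14:11] hold the destination address and the low byte is the payload that appears on the output; bits [10:8] are not described in the text and are left uninterpreted here. The function name `decode_packet` is hypothetical.

```python
def decode_packet(packet):
    """Split a 16-bit packet into the fields assumed above."""
    return {
        "select": (packet >> 15) & 0x1,   # bit 15: select bit
        "dest": (packet >> 11) & 0xF,     # bits [14:11]: destination address
        "payload": packet & 0xFF,         # bits [7:0]: data seen on the output
    }


# Packet values taken from the Fig 5 simulation.
for value in (0x87AA, 0x8FBB, 0xA7CC, 0xA8DD):
    fields = decode_packet(value)
    print(f"{value:04x}: select={fields['select']} "
          f"dest={fields['dest']:04b} payload={fields['payload']:02x}")
```

Under these assumptions the four packets decode to destination addresses 0000, 0001, 0100 and 0101 with payloads aa, bb, cc and dd, which agrees with the outputs reported for the four nodes.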



• **Conventional NOC-2X2 Design Results:** The top module of the conventional 2X2 NoC routing architecture is shown in Figure 6. It has a global clock, a reset input, write and read enable inputs (winc, rinc), four 8-bit input ports din1, din2, din3 and din4, a 2-bit select line (sel) and a 3-bit select line for the crossbar (sel\_c). It has four 8-bit output pins, dout1, dout2, dout3 and dout4.



Fig 6: Conventional NOC-2X2 Top Module

Figure 7 shows the simulation results of the conventional 2X2 NoC router.

When the clock is active, with reset = 0, din1 = 10, din2 = 20, din3 = 30, din4 = 40 and sel = 2'b00, sel\_c changes along with the winc and rinc input signals. Based on these changes, the corresponding outputs dout1, dout2, dout3 and dout4 are generated.



Fig 7: Conventional NOC-2X2 Module Simulation Results

The simulation results obtained for the modified NoC single router are shown in Figure 8. In the current work, the east output is required and the XY address is taken as 0000. The input is packet\_in1 = 16'h87aa, i.e. the 16th bit (select line) equals 1 and the destination port address given by packet\_in[14:11] is 0000. The output is data out na = 8'haa.



Fig 8: Modified NOC-Single Router Simulation Results

The simulation result of the conventional NoC single router is shown in Figure 9. When the clock is active, with reset = 0, din\_l = 10, din\_e = 20, din\_w = 30, din\_s = 40, din\_n = 50 and sel = 2'b00, sel\_c changes along with the winc and rinc input signals. Based on these changes, the corresponding 8-bit outputs dout\_e, dout\_w, dout\_s, dout\_n and dout\_l are generated.

| sel\_c | dout\_l | dout\_e | dout\_w | dout\_s | dout\_n |
|--------|---------|---------|---------|---------|---------|
| 000    | 10      | 20      | 30      | 40      | 50      |
| 001    | 20      | 30      | 40      | 50      | 10      |
| 010    | 30      | 40      | 50      | 10      | 20      |
| 011    | 40      | 50      | 10      | 20      | 30      |
| 100    | 50      | 10      | 20      | 30      | 40      |

Fig 9: Conventional NOC-Single Router Simulation Results
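The output values reported for Figure 9 follow a simple rotation: for each value of sel\_c, every output takes the input that lies sel\_c positions further along the port order l, e, w, s, n. The Python sketch below reproduces that pattern. It is inferred from the simulation values only and is not the RTL of the crossbar; the function name `crossbar` is hypothetical, and the behaviour for sel\_c values 101 to 111 is not shown in the results, so the sketch rejects them.

```python
PORT_ORDER = ["l", "e", "w", "s", "n"]  # local, east, west, south, north


def crossbar(dins, sel_c):
    """Return the five outputs [dout_l, dout_e, dout_w, dout_s, dout_n].

    dins is [din_l, din_e, din_w, din_s, din_n]; output k receives the
    input at position (k + sel_c) modulo 5.
    """
    n = len(dins)
    if not 0 <= sel_c < n:
        raise ValueError("sel_c outside the range shown in the simulation")
    return [dins[(k + sel_c) % n] for k in range(n)]


dins = [10, 20, 30, 40, 50]
for sel_c in range(5):
    print(f"sel_c={sel_c:03b}", dict(zip(PORT_ORDER, crossbar(dins, sel_c))))
```

With din\_l = 10, din\_e = 20, din\_w = 30, din\_s = 40 and din\_n = 50, the sketch yields the same dout values as the simulation for sel\_c = 000 through 100.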

# 5. PERFORMANCE ANALYSIS

Table 1 gives the area utilization summary of the modified and conventional NoC 2X2 router designs. The Artix-7 FPGA device is selected, and its available resources are tabulated together with the resources utilized by the modified and conventional NoC 2X2 routers.

Table 1: Area Utilization comparison of Modified and conventional NOC-2X2 Router Design

| Resource Utilization              | Available | MNR | CNR  | Overhead |
|-----------------------------------|-----------|-----|------|----------|
| Number of Slice Registers         | 126800    | 205 | 1264 | 83.78%   |
| Number of Slice LUTs              | 63400     | 265 | 1024 | 74.12%   |
| Number of fully used LUT-FF pairs | 1912      | 194 | 376  | 48.40%   |
| Number of bonded IOBs             | 210       | 86  | 73   | NA       |
| Number of BUFG/BUFGCTRLs          | 32        | 1   | 1    | NA       |
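The overhead column can be reproduced from the MNR and CNR columns. The short Python check below assumes that MNR and CNR denote the modified and conventional NoC routers and that the overhead is the relative reduction (CNR - MNR) / CNR expressed in percent; the function name `area_reduction` is hypothetical.

```python
def area_reduction(conventional, modified):
    """Relative resource reduction of the modified design, in percent."""
    return round(100.0 * (conventional - modified) / conventional, 2)


# (resource, MNR, CNR) taken from Table 1.
rows = [
    ("Slice Registers", 205, 1264),
    ("Slice LUTs", 265, 1024),
    ("Fully used LUT-FF pairs", 194, 376),
]
for name, mnr, cnr in rows:
    print(f"{name}: {area_reduction(cnr, mnr):.2f}%")
```

Under this definition the three rows evaluate to 83.78%, 74.12% and 48.40%, matching the tabulated figures.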



# 6. CONCLUSION

In this project, a random arbiter with the XY routing algorithm is implemented. The router in the proposed technique serves all incoming packets randomly without deadlock or livelock. Hence the proposed design with the random arbiter improves performance and reduces area by decreasing packet stacking, which makes it more suitable for NoC design. Routers of this type reduce the number of instances required to design routers of various sizes, such as the 2X2 NoC. The proposed NoC router is compared with the conventional NoC router in Table 1.

The modified NoC 2X2 router improves on the conventional NoC 2X2 router by 83.78%, 74.12% and 48.40% in the number of slice registers, slice LUTs and LUT-FF pairs, respectively. This clearly shows that the proposed design achieves better area utilization than the conventional approach.

In future work, the proposed router will be integrated into a full NoC architecture, i.e. with network interfaces, links and IP modules.

# 7. REFERENCES

- [1] Wang, Albert ZH. On-chip ESD protection for integrated circuits: an IC design perspective. Vol. 663. Springer Science & Business Media, 2006.
- [2] Brune, Harald, et al. Nanotechnology: assessment and perspectives. Vol. 27. Springer Science & Business Media, 2006.
- [3] Hwang, L. James, and Reno L. Sanchez. "Method and system for integrating cores in FPGA-based system-on-chip (SoC)." U.S. Patent No. 6,941,538. 6 Sep. 2005.
- [4] Kim, Jae-hyun. "System on chip processor for multimedia devices." U.S. Patent No. 7,171,050. 30 Jan. 2007.
- [5] Benini, Luca, and Giovanni De Micheli. "Networks on chips: A new SoC paradigm." *computer* 35.1 (2002): 70-78.
- [6] Windhorst, Torsten, and Gordon Blount. "Carbon-carbon composites: a summary of recent developments and applications." *Materials & Design* 18.1 (1997): 11-15.
- [7] Sahoo, Swetaleena. *Network on chip modelling using CDMA concept*. Diss. 2014.
- [8] Marculescu, Radu, Jingcao Hu, and Umit Y. Ogras. "Key research problems in NoC design: a holistic perspective." *Hardware/Software Codesign and System Synthesis, 2005. CODES+ ISSS'05. Third IEEE/ACM/IFIP International Conference on.* IEEE, 2005.
- [9] Biber, Jeffrey A., and Zon-Hong Hsieh. "Host network communication with transparent connection devices." U.S. Patent No. 5,170,394. 8 Dec. 1992.
- [10] Chiang, Ching-Chuan, et al. "Routing in clustered multihop, mobile wireless networks with fading channel." Proceedings of IEEE SICON. Vol. 97. No. 1997. 1997.

- [11] Wu, Ji, Dezun Dong, and Li Wang. "NoC power optimization using combined routing algorithms." *Computer and Information Science (ICIS), 2017 IEEE/ACIS 16th International Conference on.* IEEE, 2017.
- [12] Lan, Yun Long, and V. Muthukumar. "Efficient virtual channel allocator for NoC router micro-architecture." *System-on-Chip Conference (SOCC), 2017 30th IEEE International.* IEEE, 2017.
- [13] Kamali, Hadi Mardani, and Shahin Hessabi. "AdopNoC-A fast and flexible FPGA based Noc algorithm." Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
- [14] Malviya, Swati, H. O. D. I. T. Former, and MITS FMS. "Five Port Router for Network on Chip." *Proc. of ARRL and TAPR Digital Communications Conference*. 2010.
- [15] Rohini, Udupi, G. R., and G. A. Bidkar. "Design and implementation of deadlock free NoC Router Architecture." International Journal of Advanced Research in Electronics and Communication Engineering 2.4 (2013): pp-485.
- [16] Kale, Ms AS, and M. A. Gaikwad. "Design and Analysis of On-Chip Router for Network On Chip." International Journal of Computer Trends and Technology, July to Aug 2011 (2011).
- [17] Anirudh, Grandhi Sai, and Soumya J. "Routing algorithm for applications Specific on chip with irregular core sizes." 2017 IEEE International Symposium
- [18] Jiang, Xin, Xiangyang Lei, Lian Zeng, and Takahiro Watanabe. "High performance virtual channel fully adaptive thermal aware routing for 3D NoC." Graduate School of Information, Production & Systems, Waseda University, 2017.
- [19] Palesi, Maurizio, Shashi Kumar, and Vincenzo Catania. "Bandwidth-aware routing algorithms for networks-on-chip platforms." *IET computers & digital techniques* 3.5 (2009): 413-429.
- [20] Imbewa, Abdelrazag, and Mohammed AS Khalid. "FLNR: A fast light-weight NoC router for FPGAs." *Circuits and Systems (MWSCAS), 2013 IEEE 56th International Midwest Symposium on.* IEEE, 2013.
- [21] Bhanwala, Amit, Mayank Kumar, and Yogendera Kumar. "FPGA based design of low power reconfigurable router for Network on Chip (NoC)." *Computing, Communication & Automation (ICCCA), 2015 International Conference on.* IEEE, 2015.
- [22] Kendaganna Swamy, S., Anand Jatti, and B. V. Uma. "Performance enhancement and area optimization of 3× 3 NOC using random arbiter." *International Advance Computing Conference (IACC)*. 2015.