To implement a specific design on an FPGA, a number of strategies can be followed while writing HDL code. The number of resources used and their allocation vary with the logic that implements the design. Every FPGA has a fixed number of programmable logic blocks, I/O banks and memory elements.
The CLBs (Configurable Logic Blocks), which are part of the programmable resources, contain flip-flops, look-up tables (LUTs) and multiplexers. Together with the programmable routing, these CLBs can form a complex web of combinational and sequential circuits.
The LUTs mainly define the behaviour of the combinational logic described in VHDL or Verilog code. A LUT simply generates an output based on its input combination: it holds a customized truth table with one entry for every possible input combination. The results are stored in advance and are loaded into the LUT when the FPGA is powered up.
The following sections give a brief introduction to manually estimating the number of LUTs required to implement a piece of HDL code in hardware.
Implementing MUX from LUT6
Let’s start with an easy example, based on the schematic of a MUX obtained from the Vivado tool. A single LUT6 has 6 input pins and one output pin, so it can implement a 4:1 MUX (4 data inputs + 2 select lines = 6 inputs).
Consider, for example, a MUX with a 4-bit input and a 2-bit sel line, implemented in HDL.
The schematic then looks as depicted in the illustration below.
We have seen that a MUX with a 4-bit input can be implemented using a single LUT6, but what happens if the input has more than 4 bits? In that case the MUX is implemented across several logic levels.
Consider an input of N bits. One LUT can accommodate 4 bits, so N/4 LUTs are needed to carry N bits. The first logic level (where all the outputs are evaluated simultaneously) therefore produces N/4 outputs, which travel to the next logic level, where (N/4)/4 (= N/16) LUTs are required to process those N/4 inputs. This process continues until the quotient reaches 1. The total is then simply the sum of the LUTs required in all the logic levels.
If N is not divisible by 4, we round N/4 up to the next integer, that is, we take the least integer not smaller than N/4 (the ceiling).
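The rule above can be written as a short script. This is a minimal sketch of the manual estimate only, not of what a synthesis tool does; the function name `estimate_mux_luts` and the parameter `data_bits_per_lut` are names chosen here for illustration.

```python
def estimate_mux_luts(n_bits, data_bits_per_lut=4):
    """Return (total_luts, logic_levels) for the manual estimate.

    data_bits_per_lut=4 assumes a LUT6 with two inputs used as select lines.
    """
    if n_bits < 1 or data_bits_per_lut < 2:
        raise ValueError("n_bits must be >= 1 and data_bits_per_lut >= 2")
    total_luts = 0
    logic_levels = 0
    remaining = n_bits
    while True:
        # Ceiling division: LUTs needed in this logic level.
        remaining = (remaining + data_bits_per_lut - 1) // data_bits_per_lut
        total_luts += remaining
        logic_levels += 1
        # Each LUT produces one output that feeds the next level;
        # stop once a single output is left.
        if remaining == 1:
            break
    return total_luts, logic_levels


print(estimate_mux_luts(8))    # (3, 2)
print(estimate_mux_luts(128))  # (43, 4)
```

Integer ceiling division is used instead of floating-point arithmetic so the result stays exact for any input width.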
Let's consider the example of a 128:1 MUX.
Total required LUTs = 32+8+2+1 = 43.
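To see where each term of this sum comes from, the per-level counts can be listed explicitly. The sketch below is illustrative; `luts_per_level` is a hypothetical helper, and it assumes 4 data bits per LUT as in the example.

```python
def luts_per_level(n_bits, data_bits_per_lut=4):
    """Return the list of LUT counts, one entry per logic level."""
    if n_bits < 1 or data_bits_per_lut < 2:
        raise ValueError("n_bits must be >= 1 and data_bits_per_lut >= 2")
    counts = []
    remaining = n_bits
    while True:
        # Round up: a partly filled LUT still counts as one LUT.
        remaining = (remaining + data_bits_per_lut - 1) // data_bits_per_lut
        counts.append(remaining)
        if remaining == 1:
            return counts


levels = luts_per_level(128)
for level, count in enumerate(levels, start=1):
    print(f"logic level {level}: {count} LUTs")
print("total:", sum(levels))  # total: 43
```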
N includes only those input bits on which the output depends. In the above example of a 128:1 MUX, we assumed that two of the six LUT inputs are already occupied by select lines. If that is not the case, we divide N by 6 instead.
The number of LUT inputs also varies, because different FPGAs provide LUTs of different sizes: some have 4 inputs whereas others have 6. The number dividing N therefore varies with the type of LUT.
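How strongly the divisor affects the estimate can be checked with a small sketch. The function `estimate_luts` and its parameters `lut_inputs` and `reserved_inputs` are assumptions made for this illustration; how many inputs are actually reserved depends on the design and the device.

```python
def estimate_luts(n_bits, lut_inputs=6, reserved_inputs=2):
    """Manual LUT estimate where the divisor is lut_inputs - reserved_inputs."""
    divisor = lut_inputs - reserved_inputs
    if n_bits < 1 or divisor < 2:
        raise ValueError("need n_bits >= 1 and at least 2 free inputs per LUT")
    total = 0
    remaining = n_bits
    while True:
        remaining = (remaining + divisor - 1) // divisor  # ceiling division
        total += remaining
        if remaining == 1:
            return total


# N = 128 with different assumptions about the LUT
print(estimate_luts(128, lut_inputs=6, reserved_inputs=2))  # divide by 4 -> 43
print(estimate_luts(128, lut_inputs=6, reserved_inputs=0))  # divide by 6 -> 27
print(estimate_luts(128, lut_inputs=4, reserved_inputs=0))  # divide by 4 -> 43
```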
Next, we look at some examples and compare the LUT counts calculated with the above formula against the values reported by the Vivado tool.
8 bit input and single bit output
Consider a sequential circuit with an 8-bit input, where the output is decided by the value of the “sel” signal: based on the current value of “sel”, the corresponding input bit is driven to the output.
HDL Simulator evaluation:
The diagram below depicts the behaviour of the above code; one can clearly see that 3 LUTs and 1 flip-flop are needed and that 2 logic levels are present.
Total LUTs required = (8/4) + LIF(8/16) = 2 + 1 = 3, where LIF is the least integer function (ceiling function). Here we reach 1 as the final quotient after dividing by 4 twice, so the logic level count is 2.
8 bit input and two bit output
Consider a sequential circuit with an 8-bit input, where the output is decided by the value of the “sel” signal. Here, the output is 2 bits wide.
Total LUTs required = LUT[output(0)] + LUT[output(1)] = 2*[(8/4) + LIF(8/16)] = 2*[2 + 1] = 6, where LIF is the least integer function (ceiling function). The logic level count is also 2 here.
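The per-output sum can be sketched as follows. This treats every output bit independently and ignores any sharing of LUTs that a tool might apply; `luts_for_one_output` and `luts_for_all_outputs` are hypothetical helper names.

```python
def luts_for_one_output(n_bits, data_bits_per_lut=4):
    """Manual LUT estimate for a single output bit that depends on n_bits."""
    if n_bits < 1 or data_bits_per_lut < 2:
        raise ValueError("n_bits must be >= 1 and data_bits_per_lut >= 2")
    total = 0
    remaining = n_bits
    while True:
        remaining = (remaining + data_bits_per_lut - 1) // data_bits_per_lut
        total += remaining
        if remaining == 1:
            return total


def luts_for_all_outputs(bits_per_output, data_bits_per_lut=4):
    """Sum the estimates of all output bits (no sharing assumed)."""
    return sum(luts_for_one_output(n, data_bits_per_lut) for n in bits_per_output)


# Two output bits, each depending on 8 input bits
print(luts_for_all_outputs([8, 8]))  # 6
```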
HDL Simulator evaluation:
(when output(0) and output(1) depend on different values of input)
Here, although the input is declared as (8 downto 0), output(0) and output(1) each depend on only 8 bits of the input, and those bits differ between the two outputs; therefore we take N = 8 in the manual calculation.
The diagram below depicts the behaviour of the above code; one can clearly see that 6 LUTs and 2 flip-flops are needed and that 2 logic levels are present, which matches our calculation.
(when output(0) and output(1) depend on same values of input)
Here, output(0) and output(1) depend on the same eight input bits, in a similar fashion.
The diagram below depicts the behaviour of the above code; one can clearly see that 4 LUTs and 2 flip-flops are needed and that 2 logic levels are present.
Here the LUT count is lower than the calculated value. This is probably because of the tool's ability to reduce LUT usage through resource sharing, as output(0) and output(1) share the same LUTs in the first logic level.
Uncertainty in calculating LUTs
Sometimes the LUT count differs from the calculated value, either because of resource sharing (as explained above) or because other resources, such as MUXes, are introduced during resource allocation.
Consider a sequential circuit with a 128-bit input and a single output signal.
For the above mentioned code:
Calculated LUTs = (128/4) + (128/16) + (128/64) + LIF(128/256) = 32 + 8 + 2 + 1 = 43
But the LUT count reported by Vivado was 35, because at the second logic level 2:1 MUXes were used in place of LUTs.
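The gap between the manual estimate and the tool's report can be summarised for the cases discussed in this article. The numbers below are the ones quoted above; the script only tabulates them and is not a model of how the tool allocates resources.

```python
# (case, calculated LUTs, LUTs reported by the tool), as quoted in the text
cases = [
    ("8-bit input, 1-bit output", 3, 3),
    ("8-bit input, 2-bit output, different input bits", 6, 6),
    ("8-bit input, 2-bit output, same input bits", 6, 4),
    ("128-bit input, 1-bit output", 43, 35),
]

# Positive difference means the manual estimate is higher than the report.
differences = [calculated - reported for _, calculated, reported in cases]

for (name, calculated, reported), diff in zip(cases, differences):
    print(f"{name}: calculated {calculated}, reported {reported}, difference {diff}")
```

In these examples the manual value is never lower than the reported one, but that is an observation about four cases, not a guarantee.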
With only a fixed number of programmable and memory resources available, it sometimes becomes crucial to keep the LUT count in check. The manual calculation gives a rough estimate of the likely LUT count for a given piece of HDL code; the actual value may vary between tools, or between versions of the same tool, as each has its own way of allocating resources.
If your current HDL design has put forth an unprecedented situation, and you are looking for solutions, Logic Fruit Technologies can help you find them. Right from planning the entire schematic to optimizing the final design, Logic Fruit can assist you in every possible way. We have developed timing and resource optimized data paths, supporting bandwidths up to 2Tbps, in FPGAs.
Logic Fruit Technologies is driven by more than a decade of experience in high speed protocols and interfaces, including 1G/10G/40G/100G Ethernet, PCIe (Gen1-Gen6), USB3.0/4.0, CPRI/ORAN, Display Port, ARINC818 etc., and is well equipped to facilitate every design requirement.