



# **ELEC 305**

# **Digital System Design Lab**

**Fall 2024**

## **Lecture 2:** Revisiting Fundamentals



- In this lecture we will design a digital system for an example task (a fire detector) and revisit our fundamentals along the way by dissecting each design choice
- Specifically, we'll talk about…
	- Digital vs. analog
	- Using processors vs. application-specific circuits, and software design vs. hardware design
	- Combinational and sequential logic
	- Intro to HDLs and how to use them for realizing digital circuits on FPGAs
- We will treat all of these only lightly in this lecture though; detailed treatment will follow in later lectures and labs. Think of this lecture as an intro to the topics the course covers as well as a review of some prerequisite material.



- We basically want a system that predicts whether or not there is a fire in a room
- Let's break down an example product: [Nest by Google](https://store.google.com/gb/product/nest_protect_2nd_gen?hl=en-GB&pli=1)





**• Temperature and Smoke sensors**  $\rightarrow$  **These are enough for us now, ignore the others** (we're not trying to build a product, this is just a case study)

**Sensors**

- Split-spectrum smoke sensor
- 10-year electrochemical carbon monoxide sensor
- Temperature
- Occupancy (120° field of view up to 6 m [20 ft])
- Ambient light
- Accelerometer



**Task - Fire Detector**

© 2024 Burak Soner


- We can buy the temperature sensor in TR $\rightarrow$ [SHT30](https://www.direnc.net/gravity-analog-sht30-nem-ve-sicaklik-sensoru) gives an analog voltage output proportional to the temperature (it also gives out the humidity level on a separate channel, but ignore that)
- The smoke sensor is a bit hard to get; assume we built one based on ADI's advice [here](https://www.analog.com/en/design-notes/a148450-smoke-alarm-system-2-0.html) using LEDs and photodetectors, which gives an analog voltage output proportional to "smoke density"
- Voltage outputs from these sensors typically have a clear bijective (1-to-1 and onto) mapping to the physical quantities they measure






- Naive attempt at a fire detection algorithm using these 2 sensors:
	- →*TA*: temperature, *SA*: smoke density
	- $\rightarrow X = TA * p1 + SA * p2 + b1$  : affine combination of *SA* and *TA*
	- $\rightarrow Y = TA(t) - TA(t - 100\,\text{ms})$ : rise in temperature over 100 ms
	- $\rightarrow$ if (*X* > *THD1*) and (*Y* > *THD2*) then "fire detected, start alarm"
	- →if (*X* < *THD3*) and (*Y* < *THD4*) then "fire extinguished / cooling, stop alarm"
	- →*THD3-4* has hysteresis against *THD1-2* (i.e., *THD4*<<*THD2*, *THD3*<<*THD1*)
- Through some calibration we can find OK values for p1, p2, b1 and *THD1-4*
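
The decision logic above can be sketched in a few lines of C++. This is only an illustration: the `DetectorParams` and `FireDetector` names are made up here, and any numeric values you plug in are placeholders standing in for calibrated ones. Note how the alarm state is kept when neither condition fires, which is what makes the hysteresis work.

```cpp
#include <cassert>

// Calibration constants (hypothetical container; values come from calibration).
struct DetectorParams {
    double p1, p2, b1;              // affine weights and bias used for X
    double thd1, thd2, thd3, thd4;  // alarm-on (THD1, THD2) and alarm-off (THD3, THD4)
};

class FireDetector {
public:
    explicit FireDetector(const DetectorParams& p) : p_(p) {
        // Hysteresis: the "off" thresholds must sit below the "on" thresholds.
        assert(p.thd3 < p.thd1 && p.thd4 < p.thd2);
    }

    // ta_now: current temperature, ta_prev: temperature 100 ms earlier,
    // sa: smoke density. Returns the alarm state after this update.
    bool update(double ta_now, double ta_prev, double sa) {
        const double x = ta_now * p_.p1 + sa * p_.p2 + p_.b1;
        const double y = ta_now - ta_prev;
        if (x > p_.thd1 && y > p_.thd2) {
            alarm_ = true;    // fire detected, start alarm
        } else if (x < p_.thd3 && y < p_.thd4) {
            alarm_ = false;   // fire extinguished / cooling, stop alarm
        }
        return alarm_;        // otherwise the previous state is kept
    }

private:
    DetectorParams p_;
    bool alarm_ = false;
};
```

Calling `update()` once per sample with the current and the 100 ms old temperature is enough to drive the alarm output.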





- OK, we have the sensors and an algorithm to make sense of the sensor readings
- Everything works on paper
- Let's start designing its realization





- **First design choice**  $\rightarrow$  **analog vs. digital** (we'll choose digital of course, but still, let's investigate)
	- Analog: Continuous-time, "continuous-valued". The physical world is mostly analog.
	- Digital: Discrete-time, discrete-valued. Computers are mostly digital nowadays.

- Our algorithm inputs (sensors) are analog. We can digitize them straight away and use digital computation OR keep them as is and use analog computation
- Let's consider the analog case first (we will not start designing analog circuits now, but we will consider them as modules to make sense of the design in the analog domain)

# **Design - Analog vs. Digital**



#### **Analog computing**

- Constant levels (parameters) can be realized with some resistive dividers + buffers.
- There are 6 different operators, we need to design analog realizations for each:
	- add  $\rightarrow$  passive avg (with resistors) + 1 amplifier of gain=2
	- comparison  $\rightarrow$  differential amplifier (2 transistors)
	- mult  $\rightarrow$  typically 4+ transistors (a.k.a. "modulator")
	- delay  $\rightarrow$  other than an RC delay (which is not really a delay!), it's tough to build, but possible (see [bucket brigade or CTD](https://www.n5dux.com/ham/files/pdf/Analog%20Delay%20Lines.pdf))
	- AND, latch $\rightarrow$  these are inherently digital but they have analog implementations for a given "logic" voltage level. An AND gate is 2 transistors in series, and a latch is a bistable configuration of those (typically called SR)





#### **Analog computing**

- OK we **were** able to build the system in analog fashion, what's the problem?
- Alongside design challenges with certain components (e.g., a simple analog delay is much harder to build than its digital counterpart), analog designs suffer significantly from external disturbances, noise and loss.
- Specifically, since analog values are continuous, minor inaccuracies such as tolerances, parasitics, thermal effects etc. change the information they carry.
- **Furthermore, signal losses are always present and typically vary unpredictably. Together, these cause** the signals to always be "dirty", i.e., you never have a deterministic output like 0/1 as in digital.
- In our fire detector example, we could tune the parameters assuming clean signals, or more realistically a certain set of "dirty" signals, and the system could still fail (either false positives or false negatives) under unexpected amounts of noise and loss due to such disturbances.

# **Design - Analog vs. Digital**

### **Motivation for digital computing**

- **Noise and inaccuracy are unavoidable though,** so what can we do?
- **The digital abstraction is a "workaround" to this**
- If our application allows us to settle for a few distinct voltage levels instead of the whole voltage range, we might recover the correct signal from its noisy mix and avoid error
- For example, if we can get by with only 3 distinct levels, e.g., {0, 0.5, 1} V, we can treat 0.3V as 0.5V and 0.1V as 0V, and so on. This way, if there's a < 0.25V disturbance, we're good!
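
The level-recovery idea above can be sketched as a nearest-level decision. The function name `snap_to_level` is made up for this illustration, and the levels are the {0, 0.5, 1} V set from the example.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Snap a noisy voltage to the nearest allowed level.
// With levels spaced 0.5 V apart, any disturbance below 0.25 V is undone.
double snap_to_level(double v, const std::vector<double>& levels) {
    assert(!levels.empty());
    double best = levels[0];
    for (double level : levels) {
        if (std::fabs(v - level) < std::fabs(v - best)) {
            best = level;
        }
    }
    return best;
}
```

With `levels = {0.0, 0.5, 1.0}`, an input of 0.3 V is read as 0.5 V and 0.1 V is read as 0 V, as in the example.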



• Next step  $\rightarrow$  let's try to realize a digital design for our system



# **Design - Analog vs. Digital**

### **Digital computing**

▪ To realize our design with digital computation, we first need to "digitize" the analog inputs, i.e., sample them in time and discretize ("quantize") the values

> $\rightarrow$ Note: We will not cover sampling in detail in this course (consider taking DSP: ELEC 303 for that if you haven't), but we will cover quantization and its effects in detail.

- Temperature and smoke density have slow dynamics, so sampling them at a modest ADC clock of 1 kHz would be more than enough.
- SHT30 transfer characteristics are shown on the right. Let's assume our smoke sensor has a similar response curve, and that 30%-70% of the total range would correspond to safe and dangerous smoke density level limits respectively



#### **Digital computing**

■ Quantization  $\rightarrow$  Let's chop this 0.3V-2.7V voltage range up into 32 pieces and represent it with a uniform fixed-point number representation (5 bits)

- This means we'll have the following values to work with (other values will be rounded to these somehow): {0.300, 0.375, 0.450, … , 2.550, 2.625}

> $\rightarrow$ Note how we don't have 2.7 anymore; that would require a 33rd value

▪ We can now treat this set of 32 values as a 5-bit digital value set (i.e.,  $\{b00000, b00001, b00010, ..., b11110, b11111\}$ ) and design around it
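
A minimal sketch of this quantizer is given below. The slides leave the rounding rule open ("rounded to these somehow"), so the choice here, rounding down to the start of each 0.075 V piece and clamping out-of-range inputs, is an assumption, as are the function and constant names.

```cpp
#include <cassert>
#include <cstdint>

constexpr double kVMin = 0.3;    // lower end of the sensor range (V)
constexpr double kVMax = 2.7;    // upper end of the sensor range (V)
constexpr int kLevels = 32;      // 5 bits
constexpr double kStep = (kVMax - kVMin) / kLevels;  // about 0.075 V per code

// Map a voltage to a 5-bit code (0..31), rounding down within each piece.
// Out-of-range inputs are clamped to the first / last code.
std::uint8_t quantize_5bit(double v) {
    if (v <= kVMin) return 0;
    int code = static_cast<int>((v - kVMin) / kStep);
    if (code > kLevels - 1) code = kLevels - 1;
    return static_cast<std::uint8_t>(code);
}

// Voltage represented by a 5-bit code: 0.300, 0.375, ..., 2.625.
double code_to_voltage(std::uint8_t code) {
    assert(code < kLevels);
    return kVMin + code * kStep;
}
```

Code 31 maps back to about 2.625 V, which matches the note that 2.7 V itself is not representable.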



#### **Digital computing**

- Let's review our operators again like we did in the analog case:
	- add  $\rightarrow$  Recall the full adder (this is the 1-bit version of course, we'll need the 5-bit version but it's the same thing; don't worry, we'll review these when the time comes)
	- mult  $\rightarrow$  same approach, different circuit for multiplication
	- delay  $\rightarrow$  1 kHz clock + a 5-bit latch + a counter triggering at 100
- Overall, the design complexity turned out to be significantly lower than the analog case because we were able to use the **digital abstraction** and do gate-level design!

#### **Comparison**

Now let's take a step back and compare what we have in digital vs. analog

- Digital (mostly) saves us from the detrimental effects of noise, we get deterministic outputs (to be fair the output of the task was binary like yes-fire / no-fire so it's inherently more amenable to a digital design, anyhow  $\rightarrow$ analog = noise problems)
- Digital looks simpler (thanks to the gate abstraction), but it actually has **many more transistors**, which means it spends more power, and probably takes up more area
	- 5-bit full adder requires [on the order of 100s of transistors.](https://ieeexplore.ieee.org/document/5407195) The analog adder needed only 1!!
- **This also implies analog chips should be cheaper than their digital counterparts**  $\rightarrow$  **Yes… but not really! Once you're in** production, the material costs are naturally lower yes, but the development effort drives the costs up in analog!
- The easier development process drove costs down for digital, especially in CMOS, and led to the famous Moore's law (exponential scaling in number of transistors per mm<sup>2</sup>), which basically meant this for hard analog tasks: "if you can do it fast / resolute enough in digital, don't bother with the analog design, you'll be better off in the long run"



#### **Comparison**

- *"So… do we always choose digital?!"* →No, proven behavioral design + financial means (and time!) to support the physical design effort + millions / billions of volume could mean analog is better. But if you're prototyping or need re-configurability after deployment, digital is probably better.
- **This is typically a very complex analysis since the business implications are huge though, so don't** take my word for it.
- Today, power and signal path (radio & comms) chips are mostly analog, and recently the analog AI accelerator market is growing (e.g., see [Mythic](https://mythic.ai/) and [Blumind](https://www.youtube.com/watch?v=lG5_EaIJXhc)) together with neuromorphic designs.
- There are also designs based on different materials / processes such as carbon nanotube FETs, memristors, phase-change memory etc. and chipmakers are trying to find ways to increase efficiency with tricks like compute-in-memory (a.k.a. in-memory compute? the terms are relatively new).
- > 50% of the market is still digital and mixed-signal (where the reconfigurable parts are mostly digital)



Blumind's presentation at tinyML Asia 2023 had a few informative slides about this (the video is linked above).



Many startups like Blumind have popped up in the last 10 years, trying to build the ultimate edge AI accelerator with the best TOPS/Watt. Movidius (acquired by Intel) was probably one of the first, with their "Green Computing" vision processors ("Myriad", not an analog design but it was revolutionary at the time). We haven't seen a "winner" yet; the problem  $\rightarrow$  requirement variation is just too high across different apps

### **Bonus: Other types of digital systems**

- We only discussed electrical approaches so far, and more specifically voltage-based signaling
- **That's a bit unfair, the first digital computer was mechanical!**  $\rightarrow$  **[The Babbage Engine](https://www.computerhistory.org/babbage/)**
- Since an adder is easier to build with gears than mul and div (sound familiar?), Babbage designed a machine that realizes the [method of finite differences](https://en.wikipedia.org/wiki/Finite_difference_method#:~:text=The%20finite%20difference%20method%20relies,uniform%20grid%20(see%20image).):
	- basically polynomial approx allowing arbitrary ops with addition only, of course with a certain approx error
	- Input numbers with levers, turn the wheel, get your answer
- **This is a clever design optimization, we'll frequently** do stuff like this to work our way around constraints
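
To make the "addition only" idea concrete, here is a small sketch that tabulates a polynomial from its initial value and differences. The function name `tabulate` is made up for this illustration. For a polynomial the table is exact; for other functions one would first pick an approximating polynomial, which is where the approximation error comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// diffs = {f(0), Δf(0), Δ²f(0), ...} for a polynomial f sampled at x = 0, 1, 2, ...
// Returns f(0), f(1), ..., f(count - 1) using additions only.
std::vector<long> tabulate(std::vector<long> diffs, std::size_t count) {
    assert(!diffs.empty());
    std::vector<long> values;
    values.reserve(count);
    for (std::size_t k = 0; k < count; ++k) {
        values.push_back(diffs[0]);
        // Each "turn of the wheel": every column absorbs the column to its right.
        for (std::size_t i = 0; i + 1 < diffs.size(); ++i) {
            diffs[i] += diffs[i + 1];
        }
    }
    return values;
}
```

For example, $f(x) = x^2 + x + 41$ has $f(0) = 41$, $\Delta f(0) = 2$ and $\Delta^2 f = 2$, so `tabulate({41, 2, 2}, 5)` produces the first five values without a single multiplication.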



### **Bonus: Other types of digital systems**

**• Mechanical/electrical analogies are well-known by now though, so maybe this wasn't a shock.** Well how about a [swarming-behaviour-of-soldier-crabs digital computer from 2012](https://www.technologyreview.com/2012/04/12/186779/computer-scientists-build-computer-using-swarms-of-crabs/) !!

"Back in the early 80s, a couple of computer scientists … studied how it might be possible to build a computer out of billiard balls… This information is processed through gates in which the billiard balls either collide and emerge in a direction that is the result of the ballistics of the collision, or don't collide and emerge with the same velocities. Now … a couple of pals have built what is essentially billiard ball computer using soldier crabs. "We demonstrate that swarms of soldier crabs can implement logical gates when placed in a geometrically constrained environment," they say"



■ A true digital computer, but not a practical (or animal/environmentally friendly) one for sure. Takeaway: there are other ways to realize digital systems, not just voltage-based, not even just electrical!



#### **Bonus: Other types of digital systems (and more)**

OK these were interesting for sure, but let's get back to the practical approaches and recap:

- Voltage-based signaling is extremely amenable to CMOS (dominant manufacturing technology today) and most components consume approx. 0 power in resting state (e.g., think non-volatile SSDs), making it the dominant approach. We'll exclusively focus on voltage-based in this course.
- However, computing based on current amplitude, or on phase or frequency instead of amplitude, is also possible, and all have different applications (think BJTs vs. FETs)
- **Other emerging technologies for the curious:** 
	- Optical computing based on fiber modes, nanophotonics, …
	- Reservoir computing with organic-electrical hybrids  $\rightarrow$  [Brainoware](https://www.nature.com/articles/s41928-023-01069-w) (extremely interesting)
	- Quantum computing

- …

# Design - Circuits vs. Processors



- **OK we chose digital, let's look back at our design**  $\rightarrow$  we built dedicated circuits for our task
- This isn't the only way though; we're all more familiar with using processors for realizing algorithms like this in the digital realm.
- **Specifically for a task like ours (fire detector), we** interface with physical signals, so we utilize embedded processors (microcontrollers, "MCUs")
- On an MCU, this algorithm would not take longer than  $\approx$  50 lines of code!

- Let's have a go on an Arduino:

```
// Placeholder parameters and thresholds; real values come from calibration.
const int p1 = 1, p2 = 1, b1 = 0;
const int THD1 = 40, THD2 = 2, THD3 = 20, THD4 = 0;

// Map an ADC reading to a 5-bit code (0..31), assuming a 10-bit ADC (0..1023).
// Simplified: a real version would apply the sensor transfer characteristics.
uint8_t quantize_to_5bits(int adc) {
  return (uint8_t)(adc >> 5);
}

void setup() {
  pinMode(1, OUTPUT); // alarm output
}

void loop() {
  uint8_t TA_100msdelay = quantize_to_5bits(analogRead(A0)); // older sample
  delay(100); // unit is ms
  uint8_t TA = quantize_to_5bits(analogRead(A0));            // current sample
  uint8_t SA = quantize_to_5bits(analogRead(A1));
  int X = TA * p1 + SA * p2 + b1;
  int Y = TA - TA_100msdelay;
  if ((X > THD1) && (Y > THD2)) {
    digitalWrite(1, HIGH);
  } else if ((X < THD3) && (Y < THD4)) {
    digitalWrite(1, LOW);
  }
  delay(1); // for stability at approx. 1kHz sampling
}
```
(this is of course not exactly the same implementation, the timing's off, we're not using 5-bits etc., but it serves our purpose of analyzing processor implementations)



- **That was much easier than designing a digital circuit with gates etc. (higher level of** abstraction!) and it's still digital.
- However, this implementation failed to realize a few things (alongside precise timing):
	- Parallelization: the computation of X and Y use the same arithmetic resources on the processor, but on the circuit we can just synthesize two separate units and compute them simultaneously to gain speed if needed.
	- Arbitrary arithmetic: the smallest data type on the Arduino is 8 bits, but with our circuit design we could go down to 5 bits and use "just enough" resources for the task if cost was an issue.
	- Task specificity: it's not just the arithmetic that's wasteful, the processor has TONS of extra overhead (it has to "boot" to a stable state for starters). This is unavoidable as the processor is useless without this specific overhead. This means power, space, … all sorts of extra costs.
- **Of course, the Arduino is an especially weak processor, and some processors** (especially custom MCUs or massively parallel ones like GPUs or ones with programmable fabric in them) do allow this sort of customization so this comparison doesn't generalize well, but it does show our point.



- So in general, when would we choose to build circuits instead of software on processors?
- Naturally, the reasons differ with respect to your design goals, but here are a few:
	- **First step towards an ASIC:** Probably the most popular reason outside the defence sector. When you're going to build a system in the millions (think of something like a 555, ≈[1B/year](https://en.wikipedia.org/wiki/555_timer_IC#cite_note-Dummies-5) ), every transistor counts, so you scrap all unnecessary components and build an Application Specific Integrated Circuit (ASIC). Building and verifying your digital circuit for this task is a much stronger verification before the ASIC phase compared to doing that on an MCU because you will surely not put that MCU in your ASIC!
	- **High performance:** Software is flexible but its performance is bounded by the hardware architecture of the processor. For instance, the AVX-512 extensions on modern processors have convoluted software implementations that get the best performance out of them for 512-bit vector ops (refs: [1](https://www.intel.com/content/www/us/en/developer/articles/technical/optimizing-maxloc-operation-using-avx-512-vector-instructions.html#gs.4drtl9), [2](https://stackoverflow.com/questions/75904198/simple-avx512-dot-product-loop-only-10-6x-faster-expected-16x)). However, if you want anything larger (e.g., image processing), you're out of luck. Processor makers cannot support that sort of flexibility continuously unless it's economically feasible (it's almost always not!). FPGAs (on which we build arbitrary circuits, not processor-friendly software) typically fill this gap.



- Reasons geared towards mission-critical apps like aviation and defence (more refs: [reddit](https://www.reddit.com/r/FPGA/comments/rf8dz5/use_of_an_fpga_in_aviation_and_mission_critical/), [vhdlwhiz](https://vhdlwhiz.com/fpga-or-microcontroller/), [fpgainsights](https://fpgainsights.com/fpga/fpga-in-aerospace-and-defense-advancements-and-applications/#:~:text=FPGAs%20are%20indispensable%20for%20high,of%20data%20at%20astonishing%20speeds.)):
	- **Flexibility:** Changing requirements over the course of a project might render your previous processor choice obsolete (legacy comm protocols, power requirements, wider parallelization, …), incurring a lot of technical debt. The custom circuit (FPGA) approach is flexible here since you can upsize/downsize your design as needed, without system/board-level changes.
	- **Security / anti-tampering:** Reverse-engineering is **significantly** easier for software than an FPGA bitstream, and doing so in a semantically meaningful way on FPGAs (e.g., getting behavioral HDL code from the bitstream) is even harder. One reason: compiling software is significantly less complex than implementation + place-and-route on the FPGA. There are also a lot of encryption or obfuscation based protection methods. In general, there is a lot of interesting ongoing work on this (1, 2, 3, 4).
	- **Reliability and testing:** More abstraction means easier design, but harder testing! Hardware can be verified more reliably than software for this reason (+ thanks to companies like Xilinx you have super-accurate device models, that's part of why Vivado is huge). Furthermore, when you have flexible hardware, you can do things like [thermal-aware design](https://par.nsf.gov/servlets/purl/10169648) to avoid [failures](https://semiengineering.com/thermal-cycling-failure-in-electronics/) and further improve long-term reliability.



- **OK we've decided on a custom digital circuit for this task, let's examine components.** There are basically two types of logic components: combinational and sequential
	- Combinational logic = output depends only on input
		- e.g., button pushed, LED turns on and stays on as long as the button is down, turns off when button is up
	- Sequential logic = output depends on input + "state" (memory)
		- e.g., button pushed, LED turns on, however this time it turns off when the button is pressed again
- Sequential logic has memory! Depending on how the memory is arranged, how it can be accessed and the data it holds, different types of "automata" can be built: Finite state machines, "Pushdown" machines ( $\approx$  FSM + stack), random-access machines (very similar to current Von Neumann computers with CPU + RAM), Turing machines.
- **Check [John E. Savage's book](https://cs.brown.edu/people/jsavage/book/pdfs/ModelsOfComputation.pdf) for more on automata theory if you're curious. My knowledge** on this topic is limited so I don't want to mislead anyone here with a light treatment.
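
The two button/LED examples above can be modeled in software to highlight the difference. This is only a behavioral sketch (the names are made up, and the press is detected as a rising edge of the button signal, which is an assumption about how "pressed again" is sensed).

```cpp
#include <cassert>

// Combinational: the output is a pure function of the current input.
bool led_combinational(bool button) { return button; }

// Sequential: the output depends on the input and on stored state.
class ToggleLed {
public:
    // Call once per sample of the button signal; returns the LED state.
    bool step(bool button) {
        if (button && !prev_button_) {
            led_ = !led_;          // toggle on each new press (rising edge)
        }
        prev_button_ = button;     // remember the input for edge detection
        return led_;
    }

private:
    bool led_ = false;             // state: is the LED on?
    bool prev_button_ = false;     // state: previous button level
};

// Usage sketch: press, hold, release, press again.
void demo_toggle() {
    ToggleLed led;
    assert(led.step(true));     // first press turns the LED on
    assert(led.step(true));     // holding the button does not toggle again
    assert(led.step(false));    // releasing keeps it on
    assert(!led.step(true));    // second press turns it off
}
```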

- Recall the basic set of logic gates
- A gate is defined by an input-output function (truth table) + temporal response (propagation delay)
- These are our "lego pieces", we connect them in various ways to realize both combinational and sequential logic, hence, our digital circuits
- However, some of these are easier to manufacture and integrate than others (e.g., NAND flash) and there are also simplification algorithms (e.g., SOP and POS) so we typically do not rely on all of them at the same time





- Addition is combinational, the operation has no state. It just takes two inputs and computes the output based on a set of logic rules
- Specifically, our 5-bit adder needs to do the following:
	- Do 5 x 1-bit additions
	- Manage the carry bit in each step
	- Push out the 6-bit result (5 bit + 1 carry)
- Note that the circuit does this **"non-stop"**, there is no state change to wait for, it keeps outputting A+B as fast as possible, this is the essence of combinational circuits!

### **Combinational Circuit (add)**

- Recall the 1-bit full adder on the right
- To extend this to 5-bits, we connect a 0 to the first  $C_{in}$  bit, and connect the  $C_{out}$  of each consecutive adder to the  $C_{in}$  of the next. The final  $C_{out}$  is the additional mandatory 1 bit resulting from the addition operation
	- Recall: addition of two N-bit numbers creates an additional bit, output gets represented by N+1 bits. Multiplication of two N-bit numbers becomes 2N bits.
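
The chaining described above can be mimicked in software as a bit-level model. The sketch below is not a hardware description, just a way to check the logic; the names `full_adder` and `ripple_add` are made up, and inputs are assumed to fit in `n_bits`.

```cpp
#include <cassert>
#include <cstdint>

struct FullAdderOut {
    bool sum;
    bool carry;
};

// 1-bit full adder written with gate-level operations (XOR, AND, OR).
FullAdderOut full_adder(bool a, bool b, bool cin) {
    const bool a_xor_b = (a != b);
    return {a_xor_b != cin, (a && b) || (a_xor_b && cin)};
}

// N-bit ripple-carry adder: the result has N+1 bits.
std::uint32_t ripple_add(std::uint32_t a, std::uint32_t b, int n_bits) {
    assert(n_bits > 0 && n_bits < 32);
    std::uint32_t result = 0;
    bool carry = false;  // the first C_in is tied to 0
    for (int i = 0; i < n_bits; ++i) {
        const FullAdderOut out = full_adder((a >> i) & 1u, (b >> i) & 1u, carry);
        result |= static_cast<std::uint32_t>(out.sum) << i;
        carry = out.carry;  // C_out of this stage feeds C_in of the next
    }
    result |= static_cast<std::uint32_t>(carry) << n_bits;  // final C_out is bit N
    return result;
}
```

For the 5-bit case, `ripple_add(31, 31, 5)` gives 62, which needs the sixth (carry) bit.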








#### **Combinational Circuit (add)**

- The logic is OK. But there is one more aspect, mostly covered in a "ceremonial" manner in introductory courses like ELEC 205: **propagation delay**
- Circuits have finite bandwidth, you can't shift voltages up and down between logic levels in 0 time (such a square wave requires ∞ bandwidth!)
- **The rising and falling times make up the delay, and** the worst case delay of this circuit as a whole (considering all input→output links) defines the latency







#### **Combinational Circuit (add)**

- Even though this is purely combinational, the higher-level app typically at least saves the values in time as a series, so the system is typically state-based (has memory)
- This typically means the circuit is clocked at a certain speed. As expected, this clock speed is dictated by the worst case delays in the circuit (we can't try saving the output of the next cycle before this cycle's output settles!)

Note: There's one important "lookahead" point here: FPGAs implement gates with [look-up tables](https://digilent.com/blog/fpga-configurable-logic-block/) [inside "configurable logic blocks" \(CLBs\)](https://digilent.com/blog/fpga-configurable-logic-block/) to be able to make them programmable. However, these CLBs naturally have different propagation characteristics since they are different from actual fixed gates, so the delay we compute on paper will not be equal to the delay on the FPGA. Furthermore, it will change from FPGA to FPGA due to the way the CLBs are set up on the chip. Managing the delay on an ASIC requires a lot of [black magic](https://www.amazon.com/High-Speed-Digital-Design-Handbook/dp/0133957241) as discussed earlier so that's a whole other topic. We'll see more on [managing delay and clocks in FPGAs](https://www.reddit.com/r/FPGA/comments/p8fi6b/how_to_find_fmax/) later.



- In the case of this $n$-bit full adder, our worst-case delay is:

 $t_n = 4 + 2(n-2) + 2 = 2n + 2$ 

where $n$ is the number of bits and $t_n$ is in units of the per-gate delay (assuming all gates have the same delay)

#### **Combinational Circuit (add)**

- Notice something here  $\rightarrow$  Our full adder has a worst case delay of 12 unit gate delays in 5-bit, and it grows linearly with number of bits (in 32-bit it has 66, that's huge!!)
- Our current design is called a "ripple carry adder", and it's a naive approach at an adder circuit with a large word width.
- Faster versions are available, such as the "carry lookahead adder" which uses a partial version of the full adder as the building block and adds a "carry lookahead logic" component and attains O(1) in 6 (constant) unit delays. One downside  $\rightarrow$  this needs gates with more than 2 inputs for n>2.
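
As a quick numeric check of the ripple-carry figures above, the sketch below evaluates the $2n + 2$ delay and turns it into a rough upper bound on the clock frequency. The function names are made up, the per-gate delay is a free parameter, and the bound assumes the adder is the only logic between two registers (it ignores wiring and flip-flop setup times).

```cpp
#include <cassert>

// Worst-case delay of an n-bit ripple-carry adder, in unit gate delays.
constexpr int ripple_delay_units(int n_bits) {
    return 2 * n_bits + 2;
}

// Rough upper bound on the clock frequency (Hz) for a given per-gate delay (s).
double max_clock_hz(int n_bits, double gate_delay_s) {
    assert(n_bits >= 2 && gate_delay_s > 0.0);
    return 1.0 / (ripple_delay_units(n_bits) * gate_delay_s);
}
```

With a hypothetical 1 ns gate delay, the 5-bit adder (12 unit delays) could be clocked at roughly 83 MHz at best, far above what the 1 kHz fire detector needs.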





- **OK we've characterized a combinational component, went over the preliminary design** aspects, we'll dive deeper in later lectures and labs
- Let's do the same for the delay function to review sequential components
- **The delay function can be implemented as follows:** 
	- a 1 kHz clock
	- a 5-bit latch
	- a counter triggering at 100, saves the delayed 5-bit value to the latch at each trigger
- **Leaping ahead a bit**  $\rightarrow$  **the counter already contains latches, so let's just have a look at the** counter and the clock in detail.
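
A behavioral sketch of this delay function is shown below, with one call to `tick()` standing in for one period of the 1 kHz clock. The class name and the initial latch value of 0 are assumptions. The value returned in a given tick is the one stored before that tick, which mimics how a clocked register only shows its new content after the clock edge.

```cpp
#include <cassert>
#include <cstdint>

class DelayUnit {
public:
    // Call once per 1 kHz clock tick with the current 5-bit sample.
    // Returns the latched (older) value visible during this tick.
    std::uint8_t tick(std::uint8_t sample) {
        assert(sample < 32);               // 5-bit input
        const std::uint8_t out = latch_;   // value stored before this clock edge
        if (++count_ == kPeriod) {         // counter triggers at 100
            count_ = 0;
            latch_ = sample;               // save the sample into the 5-bit latch
        }
        return out;
    }

private:
    static constexpr int kPeriod = 100;    // 100 ticks at 1 kHz = 100 ms
    int count_ = 0;
    std::uint8_t latch_ = 0;
};
```

Note that the latch is refreshed once every 100 ms, so the stored value is between 0 and 100 ms old rather than exactly 100 ms old at every tick.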



#### **Clock**

- The clock is not exactly a digital component, it's rather something like a signal source. Generating a stable / accurate clock of a desired freq on a chip is a **loaded** topic and it's not in the scope of this course, we will only see basics.
- The [clock generation](https://en.wikipedia.org/wiki/Clock_generator) process starts from an oscillation source: a crystal, a tank circuit, and in some niche high-freq use cases (>10 GHz) → cavity resonators (DRO, Gunn etc.). Since oscillators typically cannot exactly produce the desired signal (harmonics and exact frequency), we add some signal conditioning (e.g., filters) and scalers (multipliers / dividers).
- Also sometimes, when the speed of these "source oscillators" is not enough for the application (e.g., crystals are typically limited at ≈200 MHz, but you want to clock a processor at 3 GHz), we may use something called a [phase-locked loop (PLL)](https://en.wikipedia.org/wiki/Phase-locked_loop).
- PLLs are not only used to multiply clocks though, they have a pretty vast application space, and designing a good one for a given set of application requirements is very hard work  $\rightarrow$  check these [extra pll references](https://www.ti.com.cn/cn/lit/ml/snaa106c/snaa106c.pdf) and the last chapters of the famous ["High Speed Digital Design: A Handbook of Black Magic"](https://www.amazon.com/High-Speed-Digital-Design-Handbook/dp/0133957241) book for more info on clock generation

### **Sequential Circuit (counter)**

- The simple free-running counter is the canonical example of a sequential circuit. Its current count value is its state, and it jumps between states at each clock tick.
- In other words, its input is always the same: tick, tick, tick, … and its output is only dependent on its current state (remember Moore vs. Mealy finite state machines? This is a Moore FSM)
- Our counter FSM of limit=100 therefore needs 100 states. To represent 100 different states with binary logic, we'll need 7 bits (6 makes 64 states available, not enough, and 8  $\rightarrow$  256 states, too much).
- **These state bits are typically stored in circuits via flip-flops, the most popular one being a [D flip-flop.](https://en.wikipedia.org/wiki/Flip-flop_(electronics)#D_flip-flop)**
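
The state-count arithmetic and the Moore-style behavior can be sketched as follows. The names `state_bits` and `ModCounter` are made up, and this is a software model of the behavior, not a flip-flop-level design.

```cpp
#include <cassert>

// Minimum number of state bits (flip-flops) needed to encode `states` states.
constexpr int state_bits(unsigned states) {
    int bits = 0;
    while ((1u << bits) < states) {
        ++bits;
    }
    return bits;
}

// Free-running modulo counter modeled as a Moore FSM.
class ModCounter {
public:
    explicit ModCounter(unsigned limit) : limit_(limit) { assert(limit > 0); }

    // Next-state logic: the only "input" is the clock tick itself.
    void tick() { state_ = (state_ + 1 == limit_) ? 0 : state_ + 1; }

    // Moore output: depends on the current state only.
    unsigned output() const { return state_; }

private:
    unsigned limit_;
    unsigned state_ = 0;
};
```

`state_bits(100)` evaluates to 7, matching the count above, and a `ModCounter(100)` wraps back to 0 on its 100th tick.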









#### **Sequential Circuit (counter)**

- How does a digital component do this though? How does the flip-flop "latch" onto a certain value and retain that? We do not know of any gates that can do this.
- **Answer**  $\rightarrow$  **bistable multivibrator!**
- **The feedback connection on the SR part makes this** possible. It's a circuit that is stable at two points and unstable at all others. The trigger pushes the output between those two states.
- **The transistor version is a bit more intuitive. You might** remember this sort of feedback behavior from lab work on ["Schmitt triggers"](https://en.wikipedia.org/wiki/Schmitt_trigger) in introductory circuits courses.
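
To see the two stable points in action, here is a small software model of an SR latch built from two cross-coupled NOR gates. It is an idealized sketch (no propagation delay; the gates are simply re-evaluated a few times until the feedback loop settles), and the struct name is made up.

```cpp
#include <cassert>

struct SrLatch {
    bool q = false;
    bool q_bar = true;

    // Apply the S and R inputs and let the cross-coupled NOR gates settle.
    void apply(bool s, bool r) {
        assert(!(s && r));  // S = R = 1 is the forbidden input combination
        for (int i = 0; i < 3; ++i) {
            q = !(r || q_bar);      // NOR gate 1, fed back from q_bar
            q_bar = !(s || q);      // NOR gate 2, fed back from q
        }
    }
};
```

A set pulse (`apply(true, false)`) followed by `apply(false, false)` leaves `q` at 1: the feedback loop holds the value with both inputs released, which is exactly the memory behavior the flip-flop relies on.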









#### **Sequential Circuit (counter)**

- There are two main types of counter circuits: synchronous and asynchronous. They have different use cases, but in our case it doesn't really matter because the clock frequency is very low





#### **Sequential Circuit (counter)**

- **The propagation delay issue and related design aspects are valid also for sequential** circuits just like combinational circuits
- However, the fact that a slow clock dictates the operation of the circuit makes things easier in the case of our fire detector  $\rightarrow$  the delays are orders of magnitude smaller than one clock period, so the delays become negligible.
- **The main challenges in sequential circuits arise from issues related to clock** management (e.g., CDC) and other application-level issues (sampling etc.).
- Terminology note → we put together flip-flops to build **registers**. The register abstraction is important because it will be our most basic unit of memory!



- **OK** we analyzed digital vs. analog designs, investigated using processors vs. custom circuits and dove a bit deeper into how our digital circuit can be implemented. We are ready to deploy this realization.
- Old school  $\rightarrow$  we can deploy this using discrete logic ICs (the 74 series)



- This might actually be an option for our small fire detector system example, but we can all imagine  $\rightarrow$  for anything larger, this approach will not be scalable and that's certainly not how people build larger systems like processors etc. nowadays…



- **The need is clear**  $\rightarrow$  **another level of abstraction on top of gates, a language, so that we** can define and simulate the "behavior" of such circuits clearly.
- **The language is to serve as both the input of an automated "gate netlist generator" for** easy deployment, as well as a specification of the behavioral model of the circuit.





- We do have languages like this now, they are called hardware description languages (HDLs). Digital designers typically use these while designing circuits instead of netlist drawings like we did earlier. The most popular ones are VHDL and Verilog (and also SystemVerilog)
- HDLs are not even restricted to digital! There is an analog version of Verilog called Verilog-A which analog designers use to describe and simulate circuits (SPICE fashion)
- This way the design procedure gets separated into two automated parts which can be improved independently  $\rightarrow$  synthesizing a gate-level netlist + realizing a physical layout for that netlist on a chosen piece of hardware (an FPGA, or an ASIC with a [pre-determined underlying structure](https://en.wikipedia.org/wiki/Standard_cell))



- **The VHDL vs. Verilog (+SystemVerilog) debate is endless**, highly resembles any debate on any programming language or software tool, and in my opinion it's a bit funny



- The debate is also pointless of course, as these two HDLs were not created differently for "artistic" reasons; they serve different purposes.
- Industry-dependent and geographical (cultural?) differences typically dictate the choice





- VHDL was [invented](https://www.doulos.com/knowhow/vhdl/a-brief-history-of-vhdl/) by and for the defence industry (specifically, US DoD) and tied to [MIL-STD-454](https://apps.dtic.mil/sti/tr/pdf/ADA304607.pdf); an excerpt is quoted below.
- The dominance of the US over the global defence arena might have been the reason for this → most non-US defence industries are also highly inclined towards VHDL as opposed to Verilog; the latter sees wider use in commercial sectors.

The Department of Defense (DOD) is engaged in a number of programs which require VHDL (VHSIC Hardware Description Language) models of ASIC's and systems. Specifically, the details of the deliverable VHDL models are expressed in a combination of documents such as MIL-STD-454, the VHDL Data Item Description (VHDL-DID (DI-EGDS-80811)) and any additional requirements specified in any given Contract Deliverable Data Items ("CDRLs" or "data items").

VHDL data items capture the behavior and structure of an electronic system, subsystem, or device. The primary purpose of these data items is to document hardware designs in a machine executable, simulatable, and hierarchical format. VHDL models themselves must be inspected to insure that they meet the requirements specified in the contract or VHDL-DID, as applicable. The VHDL-DID may be tailored by the contract requirements for some applications.

For acceptance, VHDL simulation models provided to the Government as CDRLs must satisfy some known acceptance and verification criteria and procedure. These criteria and procedures are the purpose of this document.

- **We will study VHDL almost exclusively in this course**, not because of its ties with the defence industry, but mostly because it's a very explicit (formally, "strongly typed") language that lets us picture the hardware more clearly (fewer abstractions).



- **Example VHDL code from an online source** for an up counter is shown below
- We'll dive into this in more detail later on, but for now let's point out an important difference with programming languages:
	- This is not code that gets executed line by line like in C, Python, … The whole source describes a circuit (the entity!), with input/output ports, signals (on wires) and "processes" inside the behavioral definition signifying the sequential components.

```
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
-- FPGA projects using Verilog code VHDL code
-- fpga4student.com: FPGA projects, Verilog projects, VHDL projects
-- VHDL project: VHDL code for counters with testbench
-- VHDL project: VHDL code for up counter
entity UP_COUNTER is
    Port ( clk: in std_logic; -- clock input
           reset: in std_logic; -- reset input
           counter: out std_logic_vector(3 downto 0) -- output 4-bit counter
     );
end UP_COUNTER;
architecture Behavioral of UP_COUNTER is
signal counter_up: std_logic_vector(3 downto 0);
begin
-- up counter
process(clk)
begin
if(rising_edge(clk)) then
    if(reset='1') then
         counter_up <= x"0";
    else
        counter_up <= counter_up + x"1";
    end if;
 end if;
end process;
 counter <= counter_up;
end Behavioral;
```
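
To make the "not executed line by line" point concrete, here is a minimal sketch with two concurrent statements written in the "wrong" order. The entity `concurrent_demo` and the signal names are hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity concurrent_demo is
    port ( a, b : in  std_logic;
           y    : out std_logic );
end concurrent_demo;

architecture Dataflow of concurrent_demo is
    signal n : std_logic;  -- internal wire
begin
    -- y is written before n here, yet the described circuit is the same
    -- as with the two lines swapped: both statements are always active.
    y <= not n;      -- inverter driven by wire n
    n <= a and b;    -- AND gate driving wire n
end Dataflow;
```

Each statement describes a piece of hardware that exists all the time, so together they describe a NAND of `a` and `b` regardless of the order in the source. In C or Python, reading `n` before assigning it would be a bug.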
- Verilog is a commercial effort, [invented](https://digilent.com/reference/learn/fundamentals/digital-logic/verilog-hdl-background-and-history/start) at a company called Gateway Design Automation (acquired by [Cadence](https://www.cadence.com/en_US/home.html))
- At that point VHDL was open while Verilog was proprietary. Since this would have prevented widespread adoption, Verilog was converted to an open IEEE standard (#1364).
- **SystemVerilog (superset of Verilog)** followed this with #1800.

### A Tale of Two HDLs

#### **VHDL**

ADA-like verbose syntax, lots of redundancy (which can be good!)

Extensible types and simulation engine. Logic representations are not built in and have evolved with time (IEEE-1164).

Design is composed of entities, each of which can have multiple architectures. A configuration chooses what architecture is used for a given instance of an entity.

Behavioral, dataflow and structural modeling. Synthesizable subset...

Harder to learn and use, not technology-specific, DoD mandate

#### Verilog

C-like concise syntax

Built-in types and logic representations. Oddly, this led to slightly incompatible simulators from different vendors.

Design is composed of modules.

Behavioral, dataflow and structural modeling. Synthesizable subset...

Easy to learn and use, fast simulation, good for hardware design

6.111 Fall 2019, Lecture 2



- **Once the HDL source is ready, we feed it to two automatic tools in series:** 
	- 1) gate netlist generator, which is typically called "synthesis", and
	- 2) place-and-route, which is typically called "implementation"
- The synthesis output is generic: it's the gate-level design that we drew earlier and can be implemented anywhere (discrete digital ICs, FPGAs, ASICs). We'll have a better idea about this after having a look at VHDL fundamentals, so let's focus on implementation for now.
- Alongside the synthesis output, the implementation tool takes in chip constraints, and plans a layout of the synthesized circuit on the die using its "resources".
- **For ASICs this is typically a free-for-all situation (although there are some standards),** and in FPGAs these "resources" are called configurable logic blocks (CLBs)



- The CLB unit is the core of the FPGA as it realizes the letter P (Field **Programmable** Gate Array). It can mimic most of the fundamental digital units we covered earlier (gates, or more complicated units like the D flip-flop). It typically has a complicated design with many resources inside to maintain versatility.
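
One way to picture how a single block can mimic different units is a look-up table (LUT) followed by an optional flip-flop. The sketch below is a heavily simplified, hypothetical model and not the structure of any particular vendor's CLB. The entity name and the generics `INIT` and `USE_REG` are made up for illustration; in a real FPGA the equivalent settings come from the bitstream rather than from generics.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Simplified, hypothetical logic cell: a 4-input LUT plus an optional D flip-flop.
-- Real CLBs contain many more resources than this.
entity simple_logic_cell is
    generic ( INIT    : std_logic_vector(15 downto 0) := x"8000"; -- truth table; x"8000" acts as a 4-input AND
              USE_REG : boolean := false );                        -- true: output is taken from the flip-flop
    port ( clk : in  std_logic;
           i   : in  std_logic_vector(3 downto 0);  -- LUT inputs
           o   : out std_logic );
end simple_logic_cell;

architecture Behavioral of simple_logic_cell is
    signal lut_out : std_logic;
    signal ff_out  : std_logic := '0';
begin
    -- The four inputs simply index into the 16-entry truth table.
    lut_out <= INIT(to_integer(unsigned(i)));

    -- Optional storage element on the LUT output.
    process(clk)
    begin
        if rising_edge(clk) then
            ff_out <= lut_out;
        end if;
    end process;

    -- "Configuration" selects the combinational or the registered output.
    o <= ff_out when USE_REG else lut_out;
end Behavioral;
```

Changing `INIT` turns the same cell into any 4-input gate, and setting `USE_REG` to true makes it behave like that gate followed by a D flip-flop.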










- Between these programmable CLBs is another programmable component, the "interconnect"
- The implementation tool takes the gate-level design, assigns the resources inside the CLBs to certain roles as per the needs of the design, and then uses the switch matrices on the interconnect buses (typically 2D, see right) to connect those configured CLBs together and realize the circuit.



- As you can imagine, finding the optimal combination of CLB assignments and interconnect configurations is not trivial, especially so for large circuits. Implementation tools typically work iteratively and require a significant amount of computation with high-fidelity physical models.
- **This is part of why Vivado is such a huge** software package. It's literally building a circuit on the FPGA automatically like this.

#### Synthesis and Mapping for FPGAs

• Infer logic: choose the FPGA CLBs that efficiently implement various parts of the HDL code



• Place-and-route: with area and/or speed in mind, choose the needed macros by location and route the interconnect



"This design only uses 10% of the FPGA. Let's use the CLB in one corner to minimize the distance between blocks."

6.111 Fall 2019, Lecture 1

- **Once the implementation phase is completed,** the designer then has the option to upload the generated bitstream to program the FPGA and finally realize the circuit in hardware.
- However, there is a crucial step before this that any serious project must consider: **[simulation](https://www.reddit.com/r/FPGA/comments/yvm9pe/is_there_a_way_to_simulate_fpga_projects_virtually/).**
- **The reason is simple: by simulating this design** before deployment in the development tool, you can 1) give the circuit arbitrary inputs very easily (as opposed to doing this via a signal generator + logic analyzer on a desktop lab unit), and 2) debug internal signals alongside outputs in response to those inputs.

- A few types of simulation are typically available:

- Behavioral and post-synthesis simulations are better for uncovering your coding bugs since they are faster (no routing info), but are typically inaccurate for timing. Post-implementation simulations are closer to the real case.
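
As a rough sketch of what a behavioral simulation involves, here is a small self-contained testbench for a 4-bit up counter. The entity names `up_counter_dut` and `up_counter_tb`, the 10 ns clock period and the checked value are all assumptions made for illustration. The counter is a compact stand-in rewritten with `numeric_std` so that the file compiles on its own.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Compact stand-in for a 4-bit up counter (hypothetical names).
entity up_counter_dut is
    port ( clk, reset : in  std_logic;
           counter    : out std_logic_vector(3 downto 0) );
end up_counter_dut;

architecture Behavioral of up_counter_dut is
    signal count : unsigned(3 downto 0) := (others => '0');
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if reset = '1' then
                count <= (others => '0');
            else
                count <= count + 1;
            end if;
        end if;
    end process;
    counter <= std_logic_vector(count);
end Behavioral;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Testbench: it has no ports and exists only in simulation.
entity up_counter_tb is
end up_counter_tb;

architecture Sim of up_counter_tb is
    signal clk     : std_logic := '0';
    signal reset   : std_logic := '1';
    signal counter : std_logic_vector(3 downto 0);
    signal done    : boolean := false;
begin
    dut : entity work.up_counter_dut
        port map ( clk => clk, reset => reset, counter => counter );

    -- 10 ns clock period (arbitrary choice); the clock stops when the test ends.
    clk <= not clk after 5 ns when not done else '0';

    stimulus : process
    begin
        wait for 22 ns;   -- hold reset across the rising edges at 5 ns and 15 ns
        reset <= '0';
        wait for 50 ns;   -- rising edges at 25, 35, 45, 55 and 65 ns: five counts
        assert counter = "0101"
            report "counter should be 5 after five clock edges" severity error;
        done <= true;
        wait;             -- end of stimulus
    end process;
end Sim;
```

The stimulus process plays the role of the signal generator, and internal signals such as `count` can be inspected in the simulator's waveform view, which is much harder to do on real hardware.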



- Synthesis, implementation and accurate simulation are **fascinating** technologies, and we can safely say these are among the primary "accelerators" of the modern semiconductor industry.
- While they certainly sound like topics that only people like 30-year Xilinx veterans would know something about (especially place-and-route and post-implementation simulation), [there are successful open source efforts](https://www.reddit.com/r/FPGA/comments/u0y17a/is_there_a_free_open_source_fpga_programming/) (FOSS) in this domain! Some examples <sup>†</sup>:
	- Icarus Verilog → <https://steveicarus.github.io/iverilog/>
	- YosysHQ's "nextpnr" → <https://github.com/YosysHQ/nextpnr> + <https://arxiv.org/pdf/1903.10407.pdf>
	- A Python-based "modern" HDL → <https://github.com/amaranth-lang/amaranth>
	- EDA playground's online simulator → <https://edaplayground.com/>
- The board support is not great in these though, so handle with care.

<sup>†</sup> special thanks to [İhsan Kehribar](https://www.kehribartech.com/) for introducing me to these



- OK we've covered almost every aspect (except hardware testing), let's recap this intro:
	- We laid out task requirements and chose sensors
	- We compared and contrasted digital vs. analog realizations of the fire detector
	- We investigated a (very simple) processor implementation of this algorithm and discussed what advantages could be leveraged if this was instead implemented via a custom circuit
	- We analyzed the modules in the circuit design and discussed their implementation details (using combinational and sequential logic components, gate-level design)
	- We investigated deployment options for the circuit. We discussed HDLs as a scalable alternative to gate-level representation, and how FPGA deployment is the natural choice. We briefly discussed FPGA structures as well as FPGA toolkits which allow for automatic netlist generation from HDLs as well as place-and-route and simulation.





# **next → mandatory lab tutorial**

## **we'll have a look at Vivado + VHDL projects**



**© 2024 Burak Soner 10/10/2024 52**