The BASIL Networks Public Blog contains information on Product Designs, New Technologies, Manufacturing, Technology Law, Trade Secrets & IP, Cyber Security, LAN Security, Product Development Security
Internet of Things (IoT) - Security, Privacy, Safety - Platform Development Project Part-8
saltuzzo | 12 January, 2018 14:58
Part 8: Embedded Processor Systems - System On a Chip (SoC), System In a Package (SIP)
"We are drowning in information but starved for knowledge." - John Naisbitt
Part 1 Introduction - Setting the Atmosphere for the Series (September 26, 2016)
Quick review to set the atmosphere for Part 8
What we want to cover in Part 8:
Since the beginning of this series in September 2016 there have been many hacked IoT devices using COTS embedded hardware and software, creating high visibility for security and privacy. The growing database of breaches encouraged us to present a more detailed hardware and software discussion to assist designers and educate newcomers on the new challenges of security and privacy. Due to the complexity of today's processors we will continue to follow our technical presentation methodology, Overview → Basic → Detailed (OBD). We will address the many sections of the Core IoT Platform separately to keep each presentation at a reasonable length. The full details will be presented during the actual hardware, firmware and software design stages.
The preliminary preview of the entire index is shown below and will be updated with links as we progress through the live series. Comments are welcome both publicly and privately. If you want to participate privately, we will acknowledge your participation only with your permission. We hope to share insights that address hardware and software solutions to the security, privacy and safety issues of IoT devices. Parts of the embedded processor series apply to desktops and tablets as well as embedded IoT devices, since they all share the common elements of a CPU system.
A Brief CPU Summary:
In order to prevent such intrusions and unwanted code from being executed we have to understand the hardware, firmware and software details used on the core IoT Platform. When we analyze the embedded processor marketplace we see that there are really only a handful of different categories of processors and a large variation of licensing of the same or similar cores. It would take a several-hundred-page book to cover all the processor differences, which is really outside the scope of this series; however, we will look at the major players in 32-bit CPU cores and start there, which is well within our scope. The major players are Intel®, AMD®, ARM®, NXP®, Microchip® and a smaller but still applicable x86 player, ZFMicro®. Three are x86 based: Intel, AMD and ZFMicro. NXP acquired the original Motorola/Freescale 68K CPU line through M&A; ARM Ltd stands alone and has licensed its technology to many players, including all of the above. Microchip is in a unique position due to its several M&As with Atmel, SMSC and others. The processor lines for these major players cover a broad spectrum of applications, which makes it difficult to select a processor that will allow full control to ensure security and privacy at the core level.
All the major players compete with their own versions of an IDE (Integrated Development Environment) package, and once selected you are basically locked into that manufacturer. Using a third-party package allows a fast turn of a product to market; however, it does not guarantee privacy or security. In order to ensure security and privacy, a detailed understanding and disclosure of the internals of the processor and the software packages used is a must, as well as access to the core macro assembler to be able to incorporate a user's integrated security methodology. That being said, we will now present the basics of CPU architectures in order to ask the right questions when performing our selection process.
The variations of processors today range from a dollar to hundreds of dollars depending on the bit size (8/16/32/64 bits), speed (from a few kilohertz to gigahertz) and integrated peripheral functions. The main function of the CPU is the programmability of a sequence of instructions fetched from a memory system to control a user's process. What makes CPUs different in the industry is the Micro-Coded ROM (MCR), which defines the unique set of machine instructions for each CPU manufacturer. If you change the Micro-Coded ROM you change the processor's instruction set; even though the CPU still controls memory access and some logic functions, it now has a unique macro assembler assigned to it. Open source compilers like GCC have incorporated several families of processors into the compiler, allowing the user to write code in C and compile it for several types of processors.
The remaining blocks of the CPU all perform similar functions. The memory controller and sequencer control access to memory locations and also control the CPU jump tables. The Central Process Controller, or Instruction Process Decoder, performs the instructions fetched from memory and keeps track of the programmed instructions with a program address counter. The Execution Control Unit performs basic arithmetic and logic functions and incorporates a set of general purpose registers. Of the remaining interfaces, the External Memory stores the application program and other application parameters, and the Data/Address BUS & Control are for adding user peripherals. The BUS control allows direct access to the memory controller for fast data transfers. The final section, the Power-Up Entry, is a special one-time execution during power-on that allows the user to enter a unique memory address at which instruction fetch and execution begin. To understand the core requirements of security one should understand how the internals of the selected processor function. This introductory presentation will shed light on the complexities of designing an embedded system with the highest level of security possible, from the core hardware to firmware to application software.
During the CPU presentation we will be creating a Key Security Requirements (KSR) list to be used for the Embedded Processor selection process. It is important to keep in mind that all the security requirements may not be met with a COTS (Commercial-Off-The-Shelf) embedded processor, which may shed some light on the limitations of COTS embedded systems and the compromises being made to put a product on the market.
CPU (Central Processor Unit) What They All Have In Common:
The roadmaps for Intel®, AMD®, Microchip®, NXP® and other manufacturers of embedded processors are very well documented, leading us from the 4-bit microprocessor (for historical reading, see the Intel 4004) up to the 64-bit processor families on the market today. Our intent here is to present the core functionality of all Central Processing Units to understand how we will implement the security policies for the Core IoT Platform. The CPU is just a programmable block of logic gates that allows the user to program a set of instructions designed into the processor unit, connecting real-world peripherals, transferring data and performing arithmetic and logical computations on digital data.
When we perform an Internet search on embedded processors we get inundated with millions of hits on a variety of devices, from Single Board Computers (SBC) to general purpose MIPS, ARM, etc. type microcontroller CPUs and other names associated with the programmable device. These embedded systems are a finished product that the manufacturer supplies with an associated IDE (Integrated Development Environment) package to get a product on the market fast. Putting a product on the market with a canned IDE, without knowing or understanding the software and the amount of control it has, raises questions about the vulnerabilities of the IDE to hackers. The better we understand the device we are programming, the more confidence we will have in securing the device for our application. With that said, let's start at the core of the embedded system, the Central Processing Unit. Figure 8.0 is a functional block diagram of a typical CPU. When we look at Figure 8.0 we see that the Instruction Process Logic Block is the central controlling core of the processing unit and everything else is an "internal peripheral" used as a support interface device for the core process block. The real-world connections for the CPU are just an interconnecting buffer block forming a protective communications mechanism to the internal BUS & Control. All central processing units have a few sections in common; they all have:
An Instruction Process Logic Block
• Instruction Fetch Queue Buffer
• An Instruction Process Decoder to execute a series of programmed instructions stored at specific memory locations.
• A Micro-Coded ROM (MCR) - Contains the Machine Instruction Set - Registered, Patented or Trademarked.
• An Execution Control Unit - AKA - Arithmetic Logic Unit - performs data and logic manipulation
A Memory Access Controller, AKA, Memory Management Unit (MMU) - logic to fetch and store data to memory addresses.
A Memory Data/Address Interface Control BUS to attach External Memory, RAM, EEPROM, DRAM etc.
A Set of General Purpose Registers - usually eight registers plus a Stack, Status, Program & Control Register
A Small Internal RAM Buffer for general purpose registers for the Execution Control Unit
An External Data/Address Interface BUS to transfer data to/from the CPU internal logic
A Clock interface controller for CPU process timing.
A Power On Sequence (POS) unique to the CPU core to start executing instructions from a defined memory location.
Core Functions of the Central Processor Unit Architecture:
Today there are peripherals that include an embedded processor to handle complex instructions, reducing the communication required with the external main CPU that controls the system. This adds a new challenge to the security of the system, since the peripheral today does not necessarily need the CPU to communicate; it just needs access to the local BUS. Some of the more advanced intelligent peripherals are easily hacked today due to the vulnerability of BUS access and the Internet controller used to communicate through the Internet; hence, "The Reaper IoT Botnet Infects Millions of Networks" is just the beginning of the security and privacy challenges. The challenge today is to incorporate security in all parts of the design, especially in the BUS protocols, in order to prevent unwanted intrusions and hidden rootkits.
From Figure 8.0, which defines the Central Processor Core functions, we may then define anything outside the core as a peripheral, whether the peripheral function remains inside the actual chip or is connected to the external pins of the chip. If outside access to the CPU is allowed then full control of the IoT platform can be obtained, and so enter the Security, Privacy & Safety Policies to ensure the CPU performs its programmed sequences without interference. It is important that we understand all the vulnerabilities of the selected CPU core and how it communicates from the internal bus to the real-world external bus; so enters the cliché "If you don't know where you are going, any BUS will take you there," and the adventure begins. The Memory Access Controller is considered a peripheral that connects external memory to the CPU core in order to execute instructions. It is half of the memory interface peripheral and is one of the vulnerabilities that we enter on the KSR list. Some embedded processor SoCs include both EEPROM and RAM to address simple controllers; however, for our Core IoT Platform we would like much more control than a fixed embedded system.
Instruction Process Logic Block
Instruction Process Decoder (IPD)
From a security point of view the IPD should not be accessible to the outside world and should not be modifiable in any way. If you do a bit of research, there are some processors that allow micro-code updates when loading specific operating systems. These patches are generally firmware patches linked to other BIOS (Basic Input/Output System) instructions to optimize the processor for the selected OS. Older processors, before 2007, required the operating system to control all the BIOS functions as part of the OS and did not use the firmware BIOS except for loading the OS at POST; this is not the case today, which we will cover as we progress through the series.
Intel processors incorporate an SMM (System Management Mode) controller to hold the setup of system parameters for access to the required core functions that the OS has privilege to. Initially, System Management Mode was used for implementing Advanced Power Management (APM) features. However, over time, some BIOS manufacturers have relied on SMM for other functionality, like making a USB keyboard work in Legacy BIOS mode.
System Management Mode can also be abused to run high-privileged rootkits, as demonstrated at Black Hat 2008 and 2015. The Sony BMG copy-protection rootkit scandal of 2005 is just one example; robbing banks is still against the law yet still happens, and hackers are always tempted and will execute these kits.
Types of Processors, MIPS, RISC, CISC, Pipeline, Clocked.
Processors that are not RISC based handle complete functions via a series of steps in sequence and are called CISC (Complex Instruction Set Computers), such as the Digital Equipment PDP-11 and VAX systems that incorporated a polynomial evaluation instruction, based on a CISC Instruction Set Architecture, as well as several other complex instructions. The Intel IA32, IA64 and x86 product lines and the NXP 68000 are all CISC architecture processors.
The summary of RISC vs. CISC is that a CISC processor performs a function in hardware, such as a multiply of Reg0 and Reg1 that moves the result to Reg0, and requires only one line of code, MULT A, B (in C code, A = A * B). The MCR only needs one word for the entire operation. A RISC processor performs this same multiply in several steps, such as:
MOV Reg0, A
MOV Reg1, B
MUL Reg0, Reg1
MOV A, Reg0
It is obvious that more memory and more clock cycles are required, as well as more compile time, for the same function. However, even though the CISC version looks like a single operation, there are still clock cycles involved. Also, in a CISC-based system only the result is saved, and if B has to be used again it has to be reloaded, whereas in a RISC-based system B is still in a register. We will come back to this later when we get into processor selection and security. When we look at the FPGAs (Field Programmable Gate Arrays) available today, they allow us to apply both worlds to obtain greater performance. We will cover FPGAs and CPLDs (Complex Programmable Logic Devices) when we present the peripheral interface section of the series; keep in mind that Intel Corporation purchased Altera Corporation, so new advancements combining processor technology and FPGAs fall in the probability arena.
The Instruction Process Decoder (IPD) is a key mechanism that determines the complexity of a processor core. Every manufacturer of processors, be it a stand-alone CPU or one embedded among a host of peripherals, will incorporate an Instruction Process Decoder. It is this decoder logic that makes the block of logic a processor, since the IPD allows a sequence of instructions to be performed in a unique sequence of steps. The logic used within the IPD determines the type of Instruction Set Architecture for the processor. Simply presented, there are generally two types of architecture that apply to processors: the first is Memory Mapped I/O architecture and the second is Dedicated I/O architecture. Both types incorporate a memory access architecture that may be used for I/O peripherals; however, the dedicated I/O type incorporates an independent I/O instruction set that is separate from the typical fetch and store memory functions. In the Dedicated I/O architecture there is usually a separate set of address lines for peripherals, communicating through a selected register set that identifies the I/O physical address to send the data to. In a Memory Mapped I/O architecture, users may treat I/O peripherals just like a memory location, with all the memory instruction set functions within the processor. The Intel x86 processor line has a separate I/O instruction set and uses the first 16 bits of the address bus as a shared memory and I/O bus, since only one of the two functions, memory or I/O, may be accessed at a time. The NXP 68000 set of processors uses the Memory Address/Data BUS for both peripheral I/O and memory access. Both types of architecture have their pros and cons.
All processor instructions require clock cycles to be implemented; the IPD is simply a sequencer that, when input with a specific bit pattern, performs a specific sequence on the processor core. Since all processor instructions that are not I/O-peripheral related are of a fixed nature, the exact timing for each instruction may be calculated from the number of clock cycles it takes and the period of the clock executing the instruction. This is an important part of the processor and is added to the KSR list for the processor selection. I/O timing is dependent on real-world inputs and may vary when waiting for responses to continue; it is calculated based on the peripheral features and the application.
Pipeline processors that include pipeline instruction logic still require clocks to function, since they require some type of status flag setup as well as results to be placed somewhere. Pipelined instructions have the advantage of a fixed execution time, since it is based on two controllable parameters: the clock period and the propagation delays through the pipeline logic. This gives them an advantage over clocked instruction sequences, especially if the clock is much slower than the pipeline execution time of the instruction. Pipelined instructions are also less likely to be interrupted during the cycle since they are a fixed hardware function, whereas a clocked instruction may be interrupted in the middle of its process: the return state is pushed on the stack (which requires more clocks), the interrupt handling process executes until the interruption is completed, and then the interrupted instruction continues after the return state is popped off the stack. We will address processor interrupts and the security policies required to handle them later in the series.
Micro-Coded ROM (MCR) for the unique registered, patented or trademarked instruction set.
In order to have feedback on an instruction's process in real time, a set of FLAG bits is used by the process block to identify the final state of the last instruction. Flags are set during the execution of instructions to record the state or result of the instruction. The assigned bit position is specific to the processor design. Figure 8.1 shows typical Intel Instruction Set Architecture flag bit assignments. Generally there are three classes of FLAG bits: Status, Control and System. Example: if we compare the data in two registers and they are equal, the ZF (Zero Flag) is set to 1.
It is not uncommon for a large number of instructions to be designed into the hardware processor. The MCR and IPD, along with the Sequencer, interact to control the step sequence for all the defined instructions. There is no magic to a processor design, just a large block of logic gates controlling 1's and 0's. If the instruction is a pipelined type then it is started with a single clock cycle and the data is pipelined through the fixed logic for the process. There will always be a debate on which is better, pipeline logic or clocked logic blocks; regardless, there are applications better suited to each. The clarity will present itself when we address security and peripheral communications in the sections that follow. CPUs that incorporate a Reduced Instruction Set methodology direct instructions to a fixed logic function, generally a pipelined logic set that makes up the Instruction Set Architecture (ISA), which requires fewer clock cycles to complete.
Instruction Control Sequencer - Loop & Jump Control:
Execution Control Unit (ECU) - AKA - Arithmetic Logic Unit (ALU)
Instruction Process Logic Block Summary:
One of the areas we have not covered yet is the interrupt architecture of the processor core. Interrupts are an independent function, usually integrated into the core instruction architecture, that allows an interruption of the instruction process, usually at the end of the instruction. The architecture uses a memory pointer to a small contiguous block of memory, called the Stack, specifically used for interrupts. When an interrupt occurs, the state of the core is pushed on the stack to be used for returning to the interrupted process; we will return to interrupts in the Memory section that follows.
Memory & Memory Access Controllers (MAC):
Allocating memory space is critical to a secure user process as well as a secure operating system, so before we get into memory allocation let's look at the actual hardware used to communicate with the memory. Memory chips handle access by selecting an address that contains a cell of data; that data is either placed on the memory data bus (a read) or data already on the memory data bus is stored at that address (a write). A direct read or write is the fastest memory cycle that exists in any processor block; delays enter the picture when we add address translation and protection to the simple memory access. Enter the MMU or MPU blocks of logic, as well as some intelligence for encryption, to determine whether the address being accessed is assigned to the process being executed, along with several other conditions. I would guess that many of you reading this blog have heard of Spectre and Meltdown (not the movies of 2015 and 2004, though the conditions they name are disasters of their own). These two vulnerabilities have to do with memory access from an unwanted source intercepting the process instructions to gain access to the system. This is both a hardware and a software issue, so we will cover the hardware side now, and when we get to the security software part of the series we will address it in detail.
Regardless of the memory type, be it Static RAM or Dynamic RAM, the access still has to be protected from unwanted intrusions, a challenge that has been here from the beginning. There are basically two categories of memory, Volatile and Non-Volatile: Volatile memory loses its data when the power is turned off, and obviously Non-Volatile memory retains its data when the power is turned off. There are basically three types of volatile Random Access Memory today: Static (SRAM), Dynamic (DRAM) and Pseudostatic (PSRAM). Static (SRAM) does not require any refresh to maintain its memory content; however, the cost is chip density and reduced memory size. Dynamic (DRAM) does require the cells to be refreshed (rewriting the cell data) periodically in order to maintain data storage, and requires a special DRAM controller that shares real-world data access with the refresh cycle. A new kid on the block is Pseudostatic (PSRAM), which is dynamic RAM with a built-in refresh that functions like SRAM but at slightly slower speeds to handle the refresh; as with the DRAM controller, it requires a configuration process to maintain the thermals and access timing within the chip.
Within the volatile SRAM type there are some sub-types: Conventional memory and Content Addressable Memory (CAM). Conventional RAM is arranged as a single address to a single data word and is present in all desktops, tablets, smartphones and 98% of the servers in the cloud: one address location, one data location. Content Addressable Memory (CAM) is an older technology that has been around for over 60 years, but very little has been discussed or applied using this category in today's processors. There are multiple patents on CAM and its applications filed from 1970 through 2012 for various versions, from companies like IBM®, AMD®, IDT®, TI® and many others seeking dominance in the market. CAM is primarily used with Content Addressable Parallel Processors (CAPP); however, it does not have to be applied to parallel processors in general. The technology is very applicable to Artificial Intelligence, since search, compare and execute functions are a major part of AI processing, and CAPP gives higher speed and functionality than conventional processor systems; it is a subject we will address at another time, since it is beyond the scope of this project.
The memory controller handles access to the internals of the CPU; since instructions cannot be executed without memory, the CPU issues fetch instruction requests to the Memory Access Controller to start an instruction process from a memory address. Memory access is generally defined with the CPU specifications and features in order to incorporate security policies. There are simple Memory Access Controllers that just handle a large linear array of memory without any special features; then there are Memory Management Units, which incorporate memory segmentation and protection but are also integrated with other blocks of the CPU architecture. The MAC, MMU and MPU are just controllers that define memory access requirements attached to the CPU; they require some type of physical memory attached to them to function.
Memory access is a major security issue: how it is connected to the Core CPU and how it is allocated. Figure 8.2 shows the functional blocks of a typical memory access architecture connected to the core CPU, consisting of a Dynamic Random Access Controller, DRAM, EEPROM and SRAM. Since nothing happens without memory, it stands to reason that hackers want access to plant viruses, worms, rootkits and other code to obtain access, which makes the memory hardware a high-profile part of the security architecture.
The simple Memory Access Controller is just a buffer cache allowing access to the memory array in linear form, from address 0000 to address nnnn: one big linear array of memory addresses. This is the most vulnerable memory access scheme, since there is no protection of data or program code regardless of where it is placed in memory; everything is accessed without any fault or protection. From Figure 8.2 we see that the MMU or MPU controls all aspects of memory access, including Direct Memory Access from I/O devices, which is a hardware security vulnerability that we will address in more detail as we progress through the series. This is a major concern and is noted in the KSR list.
The Memory Management Unit (MMU) is a controller that intercepts "ALL" memory access. Generally the common definition is that the MMU handles accesses to memory requested by the CPU; however, that is not the whole story. For devices that incorporate DMA (Direct Memory Access) hardware, the MMU also handles this function and is termed the I/O MMU. This does create security vulnerabilities, since direct writing to memory from a device may place code as well as data in memory, and if that code is in a page that is legal to the current program it may be executed. These devices are typically USB ports, FireWire, or storage type peripherals. The MMU's primary functions are to control memory segmentation and paging, translate virtual memory to physical memory accesses, and handle memory cache, protection and bus arbitration. Hence the MMU is just an add-on to the CPU to create a complete CPU (Central Processing Unit). Unwanted access to the MMU and memory is one of the root security vulnerabilities that has existed since its inception and still exists today.
The pros are paging, segmentation, virtual-to-physical memory translation, and security (with a question mark), achieved by intercepting all memory transactions through a TLB (Translation Lookaside Buffer) to ensure multiple processes are directed to the CPU accordingly. This does slow down access, since it adds additional clock cycles to perform the translation. The MMU segmentation protection methodology allows the memory to be partitioned and controlled into Kernel (trusted) segment, code or program (protected) segment, Supervisory and Data segment groups that add security to the processor system. MMUs are incorporated into the majority of 32-bit processors available to the COTS (Commercial-Off-The-Shelf) market today. We will research the marketplace for simple processors without an MMU, such that the MMU may be added as a peripheral, for comparison as the series progresses. Intel has incorporated an MMU as part of the processor since the 80286 release, and since then ARM, AMD, MIPS and other COTS embedded processors have incorporated MMUs.
An interrupt function is incorporated in all CPUs today. The interrupt sequence allows the core CPU to share time with other processes by saving the complete state of the core CPU along with a return location to restore the previous state and continue processing. Each manufacturer specifies how its interrupt function processes information to accomplish the switching. Some use an interrupt controller that allows the fetching of a block of addresses, defined as the interrupt vectors, that hold pointers to processes to be executed. Those that do not use an interrupt controller are limited to only a single interrupt, along with an interrupt handler that must poll all devices to determine which one caused the interrupt. Time sharing is an important feature when selecting a single processor core system, since it will determine the overall program efficiency when several peripherals are attached. This is an issue with multi-core chips as well, since each processor runs independently and shares memory. The number of processors and the interrupt capabilities will be put on the KSR list; they create a major security vulnerability if not handled properly.
Interrupts are an integral part of the CPU that allow the CPU to multi-task by interrupting a current process, jumping to another process, then returning when the interrupting process is completed. CPU architectures generally have a single interrupt process that requires an external interrupt controller, allowing management of several interrupt lines connected to peripherals that are assigned unique memory address locations containing pointers to the interrupt handler code. We will cover interrupt structures as we proceed through the series. From the Instruction Process Logic Block we can conclude that instructions are single threaded and require a memory fetch request for each instruction to be executed, which points to the memory as a key security vulnerability.
Each of the user-defined memory segments has access control parameters to prevent unwanted access. Any intrusion into protected areas creates a page fault interrupt to a jump location that contains the error handling code to be executed, sending a notification or correcting the intrusion. We will cover interrupt control processes in the Interface section as the series progresses. The Memory Management Unit's control registers are generally integrated into the CPU's core control for better security, as in the Intel IA32 and IA64 product lines.
Memory Protection Unit (MPU)
MMU, MPU Summary:
OK, now it is time to bring up interrupts again. Now that we have all this memory access control, what happens if an unwanted access occurs, as it does in multi-tasking and multi-processor systems? The answer is that a page fault interrupt is generated, and a separate stack of contiguous memory outside the normal OS points to a block of code that handles the page fault interrupt. A few conditions are tested; some internal problems are fixed automatically and the process is resumed. Some errors halt the system and some just notify the user with options. Hackers are constantly playing with code injection and triggering page faults to find back doors into the OS. In a multi-tasking application the task manager in the MMU handles the segmentation and virtual memory allocation, along with all the page/bank switching, to save each task's position in the task scheduler.
The interrupt sequence allows the "single thread" core CPU to share time with other processes by saving the complete state of the core CPU along with the task's return location, so the previous state can be restored and processing continued. Each manufacturer specifies how its interrupt function processes information to accomplish the switching. Some use an interrupt controller that fetches from a block of addresses, defined as the interrupt vectors, which hold pointers to the processes to be executed. Those that do not use an interrupt controller are limited to a single interrupt, and the interrupt handler must poll all devices to determine which one caused the interrupt. Time sharing is an important feature when selecting a single processor core system, since it determines the overall program efficiency when several peripherals are attached. This is an issue with multi-core chips as well, since each processor runs independently and shares memory. The number of processors and the interrupt capabilities will be put on the KSR list as well; handled improperly, they create a major security vulnerability.
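The two dispatch schemes described above can be sketched in C. This is an illustration only, not any particular vendor's mechanism: the table size, handler type and function names (`vector_table`, `dispatch_vectored`, `dispatch_polled`) are all assumptions for the sketch.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_IRQ_SOURCES 4              /* illustrative source count */

typedef void (*irq_handler_t)(void);
typedef int  (*irq_pending_fn)(void);  /* returns nonzero if the device raised the line */

/* Vectored scheme: a contiguous block of pointers; the interrupt
   controller supplies an index and the core jumps through the table. */
static irq_handler_t vector_table[NUM_IRQ_SOURCES];

/* Demo handler used in the usage example: just counts invocations. */
static int demo_hits;
static void demo_handler(void) { demo_hits++; }

void dispatch_vectored(unsigned irq_num)
{
    if (irq_num < NUM_IRQ_SOURCES && vector_table[irq_num] != NULL)
        vector_table[irq_num]();       /* indirect jump to the handler */
}

/* Polled scheme: a single interrupt line, so the handler must ask
   each peripheral in turn whether it caused the interrupt. */
void dispatch_polled(const irq_pending_fn pending[],
                     const irq_handler_t handlers[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pending[i]())
            handlers[i]();
}
```

The polled loop makes the efficiency point above concrete: worst case, every attached peripheral is queried on every interrupt, while the vectored scheme reaches the right handler in a single indirect jump.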
Task handlers require the management of all the tasks opened for processing, as well as a separate interrupt structure for each task; this is where bank/page switching and virtual segmentation are used for management both locally and globally. CPU architectures generally have a single interrupt process that jumps to a specific location pointed to by an internal CPU register set to a memory address. Some cores require an external interrupt controller to manage several interrupt lines connected to peripherals; each line is assigned a unique memory address location that contains a pointer to the interrupt handler code. Generally, interrupt handlers are accessed indirectly from a contiguous block of memory specifically assigned by the core processor and integrated with an interrupt controller.
If access to the memory is permitted while the CPU is off executing tasks, then all activity in the Core IoT Platform may be monitored, intercepted and controlled. Therefore, controlling access to this section of the processor is noted on the KSR list to address during the design. Several embedded processor chips incorporate the Memory Management Unit (MMU) as part of the Memory Access Controller.
The majority of embedded platforms work in the linear memory address mode, and segmentation is generally not implemented, probably for simplicity in applications that do not require an OS to function. Another major concern for the MMU is the connection of a Direct Memory Access (DMA) controller, which also has direct access to memory without any CPU intervention. This is a major security concern since DMA controllers transfer data in both directions at very high speeds to/from the real world BUS interface. Intel incorporated DMA controllers in the original 8088 PC and used them for all disk transfers for speed. When the 80286 was introduced the disk controller was redesigned and DMA was not used, since the 80286 was faster transferring directly. Direct Memory Access for devices like USB, FireWire and other streaming devices presents a security risk, since they access a block of defined memory without CPU intervention. Memory access is a critical security vulnerability since memory contains everything that is happening in the core system; it is therefore at the top level of the KSR list of concerns.
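One common software mitigation for the DMA concern is to validate every requested transfer against a window of memory the system is willing to expose, before the controller is programmed; this is the software analogue of what an IOMMU enforces in hardware. The sketch below is illustrative: the structure names and register layout (`dma_request_t`, `dma_window_t`) are assumptions, not any real controller's interface.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical DMA transfer request: target address and byte count. */
typedef struct {
    uintptr_t dest;
    uint32_t  length;
} dma_request_t;

/* The only region the OS permits the device to touch. */
typedef struct {
    uintptr_t base;
    uint32_t  size;
} dma_window_t;

/* Refuse any transfer that falls outside the permitted window. */
bool dma_request_allowed(const dma_request_t *req, const dma_window_t *win)
{
    if (req->length == 0 || req->length > win->size)
        return false;                       /* empty or larger than the window */
    if (req->dest < win->base)
        return false;                       /* starts below the window */
    /* overflow-safe check that dest + length stays inside the window */
    return req->dest - win->base <= win->size - req->length;
}
```

The end-of-range check is written as a subtraction rather than `dest + length <= base + size` so that a maliciously large `length` cannot wrap the arithmetic around and slip past the test.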
External BUS Interface - Data /Address & Control for External Devices & Memory
I/O Interface Control BUS:
Clock Interface Controller (CIC) for CPU process timing
Power On Sequence (POS) Initialization Test unique to the CPU core to start executing instructions from a defined memory location.
At this point we will end this part; the remainder of the CPU core system will be covered in the next part.
This part of the series is just the beginning and is meant to be an outline for reference. The embedded processor world is expanding at such a rate that security is being bypassed for the fastest time to market, at the expense of the public's privacy and safety.
Reference Links for Part 8:
Reference documents for continued reading are listed below:
Part 9+ "Preliminary Outline" Embedded Processor Systems: Continued
Publishing this series on a website or reprinting is authorized by displaying the following, including the hyperlink to BASIL Networks, PLLC either at the beginning or end of each part.
For Website Link: cut and paste this code:
Internet of Things (IoT) -Security, Privacy, Safety-Platform Development Project Part-7
saltuzzo | 23 November, 2017 06:56
Part 7: IPv4, IPv6, Protocols - Network, Transport & Application: Continued
"Design is a way of life, a point of view. It involves the whole complex of visual communications: talent, creative ability, manual skill, and technical knowledge. Aesthetics and economics, technology and psychology are intrinsically related to the process." - Paul Rand
Part 1 Introduction - Setting the Atmosphere for the Series (September 26, 2016)
Quick review to set the atmosphere for Part 7
We presented the first part of the Ethernet Protocol hardware characteristics, the software frame structure and how it identifies devices on its network.
What we want to cover in Part 7:
The Checksum Algorithms and the Ethernet Protocol Cyclic Redundancy Check (CRC-32)
Enjoy the series.
Let's Get Started: A "BIT" of CRC and Checksum History
The object of the theory is to detect errors in transmission using some sort of error detection code sequence, introduced back in the 1940's by Richard W. Hamming, who modernized the development of error-correcting codes. The field evolved quickly through the contributions of other theorists and gave us the term Hamming Distance; and so we have the introduction to checksums and CRC by way of error-detecting and error-correcting codes incorporating Hamming Distance theory.
There are many published documents detailing different methodologies to test a block of bits; in fact so many that some sort of standard had to be set for the Internet in order to ensure data transfer integrity, with all systems using the same methodology at both ends for consistency throughout the networks. As we stated in previous parts of this series, as long as you follow the protocols in place the data will be routed source to destination; however, there is no guarantee that the data will not contain bit errors, and that is the main task of the CRC and checksum algorithms. User data extraction is up to the application, which encodes/decodes the user's data and should contain some sort of data integrity checking process as well. With that said, we will look at the protocols being presented, which are Ethernet, IP, TCP, and UDP, and the checking methodologies used to ensure data transfer integrity. There is an intense amount of research on serial data bit testing and on grouping bits into blocks of various sizes, attempting to create the best process for identifying a single bit error in a large block of data during transfer. We will give some references at the end of this part; one could make a career in error-detecting codes, but that is not the intent here, just an overview with enough understanding to implement the algorithms for our IoT Core Platform development. Many web pages give the source code for the Hamming distance function in several languages, eliminating the need to present it here.
We often see fields in Internet protocol headers labeled "Checksum", Frame Check Sequence (FCS), CRC and others; these fields may hold a checksum, a CRC, or some other error checking value unique to the protocol. To clear this up we will look at the checksum functions and the Cyclic Redundancy Check (CRC) algorithm, which are the main functions used for many of the protocols, specifically for the initial IoT Core Platform. We will return to checksums and CRC later in the series when we address the programming of selected protocols and the OSI model.
Hamming Distance And The Checksum:
The IntelHex file format checksum is the interesting section of the IntelHex File format that we want to cover here, computed over all the data bytes in a single line as shown in Figure 7.0 below. A line has a maximum length of [Start Code + Byte Count + Address + Record Type + Data], where each eight bit byte is represented by two ASCII hex characters (0-9, A-F). With the single-character start code ":", a one byte count field, a two byte address, a one byte record type and up to 255 data bytes (the count field is a single byte), the maximum line length is [1 + 1*2 + 2*2 + 1*2 + 255*2] = 519 ASCII characters, plus 2 characters for the checksum, which is appended to the line after calculation.
The checksum calculation is the two's complement of the least significant byte of the sum of the decoded byte values in the line, less the start character, the ASCII colon ":". So how is this effective? Let's look at Figure 7.1, the Hex Number Notation, and the actual data transitions of the hex digits 0-9, A-F in binary format: they are 0x30-0x39 and 0x41-0x46 respectively, with binary string values 00110000-00111001 and 01000001-01000110. As we see, the number of transitions needed to make any two binary hex digit strings the same is always less than 5. This is important for checksums, since the lower the HD number the better the chance of catching a bit error in a sequence. Upper case A-F is used since bits 5, 6 and 7 never change, leaving only 5 bits to test. Bit 4 gives a minimum HD of 1, and the detection of a bit reversal would only require the checksums to be different. This is the reason the simple IntelHex File checksum is effective. This checksum algorithm would have a greater opportunity to miss the detection of a bit reversal if the entire range of 255 byte values were allowed.
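The calculation is small enough to show in full. As commonly specified for the format, the checksum is the two's complement of the low byte of the sum of the record's decoded byte values (byte count, address, record type and data); the function name below is our own.

```c
#include <stdint.h>
#include <stddef.h>

/* IntelHex record checksum: two's complement of the least significant
   byte of the sum of the decoded bytes (start code ":" excluded). */
uint8_t intelhex_checksum(const uint8_t *bytes, size_t n)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += bytes[i];           /* uint8_t keeps only the LSB of the sum */
    return (uint8_t)(-sum);        /* two's complement */
}
```

For the well-known sample record `:10010000214601360121470136007EFE09D2190140`, summing the decoded bytes and negating the low byte reproduces the trailing checksum 0x40, and a receiver can verify a line simply by checking that the sum of all decoded bytes including the checksum is 0x00.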
The IntelHex File Format is one of the least efficient methodologies for transferring data and would totally burden the Internet; however, for loading an embedded processor's program memory, FPGAs, CPLDs, EEPROMs etc. it is efficient and accurate, which is why several embedded processor manufacturers incorporated it into their Integrated Development Environment (IDE) tools, and it is still widely used today. Since the IntelHex File Format applies only to a single text line of characters with data blocks of 255 bytes or less, the algorithm is not applicable as an Internet data transfer checking function, where large amounts of data must be transferred.
Take into account that what we are looking for is actual bit reversals in the transfer of data from source to destination through a medium. So let us look at the data and ask the following questions while visualizing the data blocks shown in Figure 7.2.
For this example we are only using the ASCII hex bytes for simplicity, since the resultant G(x) size for the sum would be little more than 16 bits: the highest changing bit is bit six (0x40), and a worst case sum of 64 x 1032 = 66,048 would require a 17 bit sum register for G(x). Looking at the bit patterns in Figure 7.1 and the blocks in Figure 7.2, in order to get the same checksum from bit reversals, two consecutive blocks would have to have specific bit reversals that subtract from one and add to the other to give the same sum. The probability of this happening is very small due to the uniqueness of the ASCII hex bit patterns, and the probability of detecting one to several bit reversals is very good. This is why the ASCII hex file format has high reliability. If all eight bits were used, this would reduce the reliability and be difficult to handle if the total number of blocks were to increase. These are the basics for understanding bit reversals and the front door to error-detecting, error-correcting code sequences.
Moving forward to the late 70's, an individual from Lawrence Livermore Labs is credited with creating a flexible checksum algorithm that incorporates a variable block size; it was given the name the Fletcher Checksum after its creator, John G. Fletcher (1934-2012). This added more credibility to the error-detection process, and it is used throughout the Internet today. However, keep in mind that all checksum algorithms based on summation have their limitations, since a sum and XOR of bits gives only a single result for a block of bits. Adding two byte/word bit strings is a simple, non-CPU-taxing process and is the fastest algorithm for obtaining the sum of a block of bits for a simple checksum; however, as stated, it does have its limitations.
The Fletcher Checksum added a checksum size to the process, which allowed variations for different applications and improved performance. If the groups are sized properly for the Hamming Distance, the probability of missing the detection of a bit reversal is reduced to a very usable state for transferring large amounts of data as well as for loading embedded system memory over the Internet; however, its performance at present gigabit network speeds is still questionable. The Quality of Service (QoS) of the Internet is quite high, and if the physical layer is properly installed the transfer errors are very low; in a gigabit network, even if a few packets have to be resent, it would not be noticed over time.
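The 16 bit variant of Fletcher's algorithm shows the key improvement over a plain sum: a second running sum accumulates the first, which makes the result sensitive to byte position, so two swapped bytes (which a simple sum would never catch) change the checksum. This is a textbook sketch of the common mod-255 form, not tuned for speed.

```c
#include <stdint.h>
#include <stddef.h>

/* Fletcher-16: two running sums modulo 255.  sum1 is an ordinary
   checksum; sum2 accumulates sum1 and therefore weights each byte
   by its position in the block. */
uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (sum1 + data[i]) % 255;
        sum2 = (sum2 + sum1) % 255;     /* position-dependent term */
    }
    return (uint16_t)((sum2 << 8) | sum1);
}
```

The widely quoted test vector gives fletcher16("abcde") = 0xC8F0, while the reversed string "edcba" yields a different value, demonstrating the order sensitivity that a plain summation checksum lacks.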
Cyclic Redundancy Check CRC Overview:
Hamming Distance, Polynomials And The CRC:
Hamming Distance, CRC, Checksum Summary:
Looking at systems in mathematics and definitions, multiplication is group addition and division is group subtraction; since the basis of the checksum is group addition, we see that CRC polynomial evaluation and the checksum have similar properties. The uniqueness of the CRC is that the polynomial allows a combination of groups because of the division, and it has a greater probability of detecting a bit reversal because of the Hamming Distance. The conclusion is that error-detection for bit reversals during transmission will remain a challenging topic for anyone trying to create a foolproof methodology. The interesting part of all this is the ability to detail the limitations of the algorithms and still be able to implement them and generally rely on their functionality, as we have for the past 50 years and will continue to do until a better methodology is created. We will be implementing various security protocols and policies once the hardware has been defined for the Core IoT Platform. This covers the initial presentation of the Internet IPv4 and IPv6 in general and gives us enough detail to select the hardware platform. We will implement the Checksum and CRC in a software module for the first time through development; hardware implementation of CRC algorithms will be addressed after the initial Core IoT Platform has been through a POD (Proof of Design).
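Since we will be implementing the CRC in a software module first, a bit-at-a-time version of the standard Ethernet/zlib CRC-32 (reflected polynomial 0xEDB88320) makes the polynomial division visible. Production code normally uses a table-driven or hardware version for speed; this sketch favors clarity.

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC-32 as used by Ethernet and zlib: reflected polynomial
   0xEDB88320, initial value all ones, final bit inversion. */
uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;                  /* initial value */
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                          /* fold the next byte in */
        for (int b = 0; b < 8; b++)              /* long division, one bit at a time */
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u
                             : (crc >> 1);
    }
    return ~crc;                                 /* final XOR */
}
```

The standard check value for any CRC-32 implementation is the ASCII string "123456789", which must produce 0xCBF43926; this is a convenient self-test to keep in the software module.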
A Brief History of the Embedded and CPU Major Players
There are several other players that cross license cores and put their own name on them, which we did not mention here for simplicity. All of the above companies and several other younger players incorporate the ARM Cortex line of processors, since ARM allows cross licensing of the processor technology. This allows each of these manufacturers to incorporate the ARM processor technology and add their own unique interfaces and software development environment. ARM also distributes its own development software as well as training for the processor line. OK, that is a brief history of the embedded processor marketplace from 1971 to 2017, and it shows that nothing is as stable as we would like it to be when developing hardware. For the selection process we have to decide on a 32 or 64 bit processor and which manufacturer will be producing that processor for several years. Researching embedded processors on the Internet we see that many are available; however, reviewing the Last-Time-Buy (LTB) notices we see that many are being discontinued by the end of 2018/2019. That means we would have to redesign the platform before it has even been on the market for a year. Silicon rollover is one of the major concerns in the hardware development process. If you are in the market for the long term you have to make long term decisions to ensure cost effectiveness. This is usually overlooked during the startup stages, since the main objective is to get the product to market first and create the market need and identity.
The Processors Dilemma:
Going back in time, the original Intel 80486, introduced in 1989, was the first processor incorporating the tightly woven pipeline architecture, and it remained in the embedded market for over 15 years before Intel officially stopped manufacturing the chip. However, a couple of manufacturers still produce chips with the x86 pipeline process under one of the few remaining perpetual licenses. The ones I found on the Internet that sell the chip, and not a fully assembled single board computer, are the AMD® GEODE™ series, a system on a chip with a graphics engine, and the ZFMicro™ ZFx86™, a 100 MHz 486DX pipeline processor with an 80 bit FPU core, no graphics engine, a 33 MHz PCI BUS and an IDE drive interface, all under 1 watt; both the ZFx86 and the GEODE series are SoCs. The Microchip MIPS32/64® processor line is a RISC (Reduced Instruction Set Computer) M-Class processor core, and as of 2017 MIPS32/64® processors are still being used for residential gateways, routers and other Android/Linux OS based embedded systems. MIPS originated at MIPS Technologies back in the early days. From this history, MIPS processors would be the likely choice for the IoT Core Platform. The MIPS architecture is still a pipeline architecture, with added features that add up to five additional cycles to complete the fetch and execution while balancing the system clock to the instruction performance.
Which Embedded Processor To Choose?
Putting our top level requirements on the table for a flexible IoT Core Platform with reasonable RAM and FLASH memory to execute the simple to the complex application does present a challenge. There are two schools of thought when developing a platform: the first is to reduce the chip count to the smallest possible number up front and struggle with the selection of embedded peripherals that can be shared; the second is to start with a stand alone processor, add the memory and peripherals selected for a proof of design, then reduce the design for cost savings. There are pros and cons to both approaches. Our approach in this series is to create a functional block diagram using single blocks for each function to get a top level (40,000 foot) view of all the functionality we would like for the platform. From the functional block diagram we will look at how an embedded architecture can incorporate some of the blocks, and build the system platform from there.
For those who prefer running an OS like Linux or others, we will keep that in mind when selecting the embedded processor system. The embedded marketplace has created a lot of work for the Linux development teams that certify Linux OS implementations on the hundreds of embedded processors available today. Remember, if we just look at the core functions, we can use other technology to add functionality later, as long as we maintain control over the functional components for implementing security policies.
The Embedded Processor Selection:
Peripheral vs. Functions
Table 7.0 Core IoT Platform Peripherals and Functions
OK, the functional block diagram shows many peripherals attached to the main bus; it is not very difficult to create a block diagram like this considering the number of features we would like to see in the IoT Core Platform. The two items that should stand out are the 32 bit Parallel Interface Controller and the Custom User Interface Controller. If we were to remove all the other peripherals, these two controllers alone would allow us to add just about any type of peripheral that can be imagined within the boundaries of the processor.
I am not a big fan of wireless in a process control area for many reasons that we will cover when we get into the security and software development parts of the series.
This is the first conceptual block diagram presentation of the Core IoT Platform; as we continue the series we will apply any changes to the platform as required for the applications and optimization.
Reference Links for Part 7:
The high level expert links for the CRC and Checksum are listed below; there are so many Internet references on this subject that listing them all would take several pages, which is not the intent of this series.
The iSCSI CRC32C Digest and the Simultaneous Multiply and Divide Algorithm (January 30, 2002), Luben Tuikov & Vicente Cavanna
Cyclic Redundancy Code (CRC) Polynomial Selection For Embedded Networks (2004), Philip Koopman & Tridib Chakravarty
Performance of Checksums and CRCs over Real Data (1998), Craig Partridge, Jim Hughes & Jonathan Stone
Part 8 Preliminary Outline:
Publishing this series on a website or reprinting is authorized by displaying the following, including the hyperlink to BASIL Networks, PLLC either at the beginning or end of each part.
For Website Link: cut and paste this code: