Archive for the ‘FPGA’ Category

January 27th, 2012

FPGA Basics

January 27th, 2012

FPGA vs ASIC

Definitions
FPGA: A Field-Programmable Gate Array (FPGA) is a semiconductor device containing programmable logic components called “logic blocks” and programmable interconnects. Logic blocks can be programmed to perform the function of basic logic gates such as AND and XOR, or more complex combinational functions such as decoders or mathematical functions.

ASIC: An application-specific integrated circuit (ASIC) is an integrated circuit designed for a particular use rather than for general-purpose use. Processors, RAM, ROM, etc. are examples of ASICs.
FPGA vs ASIC
Speed
ASICs beat FPGAs on speed. Because an ASIC is designed for a specific application, it can be optimized to the maximum, so ASIC designs can run at higher clock speeds.
Cost
FPGAs are cost-effective for small designs and low volumes. For complex, high-volume designs (such as 32-bit processors), ASICs are cheaper per unit.
Size/Area
FPGAs contain many LUTs and routing channels, which are configured and connected by a bitstream (the program). Because they are built for general-purpose use and re-usability, FPGA implementations are generally larger than the corresponding ASIC designs. For example, a logic cell provides both registered and non-registered outputs; if a design needs only the non-registered output, the extra circuitry is wasted. An ASIC includes only what the design needs, so it is smaller.
Power
FPGA designs consume more power than ASIC designs. As explained above, the unused circuitry wastes power, and an FPGA leaves less room for power optimization. An ASIC design can be optimized for power to the fullest.
Time to Market
FPGA designs take less time to reach the market because the design cycle is shorter than that of an ASIC. There is no need for layouts, masks, or other back-end processes. The flow is simple: Specifications -> HDL + simulation -> Synthesis -> Place and Route (along with static timing analysis) -> Download the design onto the FPGA and verify. An ASIC additionally requires floor planning and more advanced verification. The FPGA flow eliminates the mask and re-spin stages, and the vendor tools largely automate floor planning, place and route, and timing analysis, because the design is mapped onto an already verified, characterized FPGA device.

Type of Design
ASICs can be mixed-signal or purely analog designs. Such designs are not possible on FPGA chips.
Customization
ASICs have the upper hand when it comes to customization. The device can be fully customized because an ASIC is designed to a given specification. Just imagine implementing a 32-bit processor on an FPGA!
Prototyping
Because FPGAs are reusable, they are used to prototype ASICs. The ASIC design's HDL code is first loaded onto an FPGA and tested for correct results. Once the design is error-free, it moves on to the next steps. Clearly, an FPGA may be needed when designing an ASIC.
Non Recurring Engineering/Expenses
NRE refers to the one-time cost of researching, designing, and testing a new product, and is generally associated with ASICs. FPGAs carry no comparable upfront cost, which makes FPGA designs cost-effective.
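The Cost and NRE points can be combined into a rough break-even estimate. This is only a sketch under a simple linear cost model that is not part of the original post: the symbols n (units built), c_f and c_a (per-unit FPGA and ASIC cost), and N (ASIC NRE) are introduced here for illustration, and the formulas assume the amsmath package.

```latex
\begin{align*}
C_{\text{FPGA}}(n) &= c_f \, n \\
C_{\text{ASIC}}(n) &= N + c_a \, n \\
C_{\text{ASIC}}(n^{*}) = C_{\text{FPGA}}(n^{*})
  \;&\Longrightarrow\; n^{*} = \frac{N}{c_f - c_a}, \qquad c_f > c_a
\end{align*}
```

With purely hypothetical numbers, N = $1,000,000, c_f = $50 and c_a = $5, the break-even volume is n* = 1,000,000 / 45, or about 22,000 units. Below that volume the FPGA is cheaper overall; above it the ASIC wins.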
Simpler Design Cycle
Because software handles much of the routing, placement, and timing, FPGA designs have a shorter design cycle than ASICs.
More Predictable Project Cycle
With no potential re-spins, wafer capacity constraints, etc., FPGA designs have a more predictable project cycle.
Tools
Tools for FPGA design are relatively cheaper than those for ASIC design.
Re-Usability
A single FPGA can be used for various applications simply by reprogramming it (loading a bitstream built from new HDL code). ASICs are by definition application-specific and cannot be reused.

What are FPGAs

Field Programmable Gate Arrays (FPGAs) are programmable semiconductor devices that are based around a matrix of configurable logic blocks (CLBs) connected via programmable interconnects.

What are ASICs

Application Specific Integrated Circuits (ASICs) are devices custom built for the particular design.

What are FPGA Design Advantages

    Faster time-to-market – no layout, masks or other manufacturing steps are needed
    No upfront NRE (non-recurring expenses) – costs typically associated with an ASIC design
    Simpler design cycle – due to software that handles much of the routing, placement, and timing
    More predictable project cycle – due to elimination of potential re-spins, wafer capacities, etc.
    Field reprogrammability – a new bitstream can be uploaded remotely

What are FPGA Design Limitations

    Power consumption – FPGAs fundamentally use a lot more power than ASICs
    Price – per unit, they also fundamentally cost more
    Speed – ASICs can still blow any FPGA away in sheer speed, although design techniques can help with this issue
    Density – ASICs can still pack a lot more logic into a single chip than an FPGA
    IP – modern, complex IP (a complete PCI Express or HyperTransport core, for example) may take up most or all of an FPGA but only 10% of an ASIC

What are ASIC Design Advantages

    Full custom capability – for design since device is manufactured to design specs
    Lower unit costs – for very high volume designs
    Smaller form factor – since device is manufactured to design specs
    Higher raw internal clock speeds

What are ASIC Design Limitations

    High NRE cost – for design since device is manufactured to design specs
    Longer time-to-market – layout, masks and other manufacturing steps are needed
    Less predictable project cycle – due to potential re-spins, wafer capacities, etc.
    No field reprogrammability – the device cannot be reconfigured after manufacture

http://only-vlsi.blogspot.com/2008/05/fpga-vs-asic.html

http://electronicsbus.com/fpga-vs-asic-design-verification-comparison/

January 27th, 2012

FPGA

fpga
fpga tutorial
fpga programming
fpga design
fpga4fun
fpga development board
fpga 2012
fpga basics
fpga bitcoin
fpga arcade
fpga architecture
fpga applications
fpga arduino
fpga adc
fpga arm
fpga advantage
fpga asic
fpga altera
fpga acceleration
fpga board
fpga basics
fpga bitcoin
fpga bitcoin mining
fpga bitcoin miner
fpga blog
fpga books
fpga breakout
fpga buy
fpga bitstream
fpga companies
fpga chip
fpga conference
fpga card
fpga cpu
fpga conference 2012
fpga cost
fpga core
fpga course
fpga central
fpga design
fpga development board
fpga dsp
fpga design flow
fpga development
fpga definition
fpga design process
fpga design engineer
fpga design engineer salary
fpga development board xilinx
fpga ethernet
fpga engineer
fpga emulator
fpga evaluation board
fpga engineer salary
fpga engineer resume
fpga encryption
fpga evolution
fpga evaluation kit
fpga editor
fpga for fun
fpga fft
fpga for dummies
fpga forum
fpga firmware
fpga fabric
fpga floating point
fpga from scratch
fpga fir filter
fpga for beginners
fpga guru
fpga gpu
fpga games
fpga getting started
fpga gate count
fpga guitar effects
fpga gameboy
fpga gps
fpga gates
fpga gigabit ethernet
fpga hdmi
fpga hobby
fpga hpc
fpga high frequency trading
fpga hardware
fpga hello world
fpga heatsink
fpga h 264
fpga hacking
fpga hardware acceleration
fpga interview questions
fpga image processing
fpga introduction
fpga ip
fpga ip cores
fpga intern
fpga intro
fpga i2c
fpga implementation
fpga in trading
fpga jobs
fpga journal
fpga jp morgan
fpga jobs san jose
fpga java
fpga jtag
fpga jobs in chicago
fpga jobs boston ma
fpga jobs huntsville al
fpga jobs virginia
fpga kit
fpga keyboard interface
fpga kalman filter
fpga kickstarter
fpga keyboard
fpga kit xilinx
fpga kit cost
fpga kit india
fpga karachi
fpga kit spartan 3e
fpga lut
fpga labview
fpga linux
fpga lookup table
fpga logic cell
fpga labview tutorial
fpga logic analyzer
fpga latch
fpga lvds
fpga lcd controller
fpga mining
fpga manufacturers
fpga mezzanine card
fpga miner
fpga memory
fpga market data
fpga market
fpga market size
fpga matlab
fpga market share
fpga news
fpga nes
fpga netezza
fpga neural network
fpga netlist
fpga national instruments
fpga network processor
fpga nios
fpga nic
fpga nintendo
fpga oscilloscope
fpga opencl
fpga open source
fpga overview
fpga oscillator
fpga os x
fpga open core
fpga on mac
fpga operating system
fpga os
fpga programming
fpga projects
fpga prototyping by verilog examples
fpga programming tutorial
fpga project ideas
fpga prototyping by vhdl examples
fpga programming language
fpga prototyping
fpga programmer
fpga pcie
fpga qpi
fpga quadrature decoder
fpga que es
fpga quant
fpga questions
fpga quadcopter
fpga quartus
fpga qam demodulator design
fpga quadrature encoder
fpga quadrature
fpga rtl
fpga resume
fpga replay
fpga random number generator
fpga reverse engineering
fpga research
fpga ram
fpga ray tracing
fpga reliability
fpga router
fpga starter kit
fpga simulator
fpga synthesis
fpga slice
fpga speed grade
fpga software
fpga spi
fpga speed
fpga supercomputer
fpga synthesizer
fpga tutorial
fpga trading
fpga training
fpga tutorial xilinx
fpga technology
fpga tools
fpga to asic conversion
fpga testing
fpga tcp stack
fpga timing closure
fpga uses
fpga usb
fpga uart
fpga usb interface
fpga usb core
fpga usb stick
fpga ucf
fpga ucf file
fpga update
fpga usb 3.0
fpga vs asic
fpga vs microcontroller
fpga vs cpld
fpga vs gpu
fpga vga
fpga vendors
fpga vs dsp
fpga vhdl
fpga verification
fpga vs microprocessor
fpga wiki
fpga with adc
fpga world
fpga wishbone
fpga wireless
fpga wifi
fpga web server
fpga with arm
fpga wall street
fpga works
fpga xilinx
fpga x86
fpga xilinx tutorial
fpga xilinx spartan 3
fpga xmc
fpga xilinx board
fpga xilinx spartan
fpga xilinix
fpga xaui
fpga xilinx spartan-3e
fpga youtube
fpga yorkshire
fpga yield
fpga you
fpga yuv to rgb
fpga yahoo groups
fpga york
fpga your pc
fpga yout
fpga you program
fpga z80
fpga z80 core
fpga zx spectrum
fpga zx81
fpga zurich
fpga zebu
fpga zx
fpga zero-crossing detector
fpga z-transform
fpga zealand

January 21st, 2012

Multiplexer 16-to-4 using if-then-elsif-else Statement

-- Multiplexer 16-to-4 using if-then-elsif-else Statement
-- download from www.pld.com.cn & www.fpga.com.cn

library ieee;
use ieee.std_logic_1164.all;

entity mux is port(
    a, b, c, d: in std_logic_vector(3 downto 0);  -- four 4-bit data inputs
    s: in std_logic_vector(1 downto 0);           -- 2-bit select
    x: out std_logic_vector(3 downto 0));         -- selected 4-bit output
end mux;

architecture archmux of mux is
begin
    -- s is in the sensitivity list so that x also updates
    -- when only the select input changes
    mux4_1: process (a, b, c, d, s)
    begin
        if s = "00" then
            x <= a;
        elsif s = "01" then
            x <= b;
        elsif s = "10" then
            x <= c;
        else
            x <= d;
        end if;
    end process mux4_1;
end archmux;
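
A quick way to check the multiplexer is a small simulation testbench. The one below is only a sketch: the names mux_tb, sim, dut and stimulus, the data values and the 10 ns delays are arbitrary choices made here, and it assumes the mux entity above has been compiled into the work library. It holds the data inputs constant and changes only s between checks, which is exactly the case that exposes a process whose sensitivity list does not include s.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Testbench entities have no ports.
entity mux_tb is
end mux_tb;

architecture sim of mux_tb is
    signal a, b, c, d, x : std_logic_vector(3 downto 0);
    signal s             : std_logic_vector(1 downto 0);
begin
    -- Device under test: the mux entity from the listing above.
    dut: entity work.mux
        port map (a => a, b => b, c => c, d => d, s => s, x => x);

    stimulus: process
    begin
        -- Hold the data inputs constant so only the select line changes.
        a <= "0001"; b <= "0010"; c <= "0100"; d <= "1000";

        s <= "00"; wait for 10 ns;
        assert x = "0001" report "s=00 should select a" severity error;

        s <= "01"; wait for 10 ns;
        assert x = "0010" report "s=01 should select b" severity error;

        s <= "10"; wait for 10 ns;
        assert x = "0100" report "s=10 should select c" severity error;

        s <= "11"; wait for 10 ns;
        assert x = "1000" report "s=11 should select d" severity error;

        report "mux testbench finished" severity note;
        wait;  -- suspend forever; the simulation ends when no events remain
    end process stimulus;
end sim;
```

Any VHDL-93 simulator should accept this. An assertion failure points at the select value whose output did not follow.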

January 21st, 2012

High-Performance Reconfigurable Computing

Duncan Buell, University of South Carolina; Tarek El-Ghazawi, George Washington University; Kris Gaj, George Mason University; Volodymyr Kindratenko, University of Illinois at Urbana-Champaign
High-performance reconfigurable computers (HPRCs) [1,2] based on conventional processors and field-programmable gate arrays (FPGAs) [3] have been gaining the attention of the high-performance computing community in the past few years [4]. These synergistic systems have the potential to exploit coarse-grained functional parallelism as well as fine-grained instruction-level parallelism through direct hardware execution on FPGAs.
HPRCs, also known as reconfigurable supercomputers, have shown orders-of-magnitude improvement in performance, power, size, and cost over conventional high-performance computers (HPCs) in some compute-intensive integer applications. However, they still have not achieved high performance gains in most general scientific applications. Programming HPRCs is still not straightforward and, depending on the programming tool, can range from designing hardware to software programming that requires substantial hardware knowledge.
The development of HPRCs has made substantial progress in the past several years, and nearly all major high-performance computing vendors now have HPRC product lines. This reflects a clear belief that HPRCs have tremendous potential and that resolving all remaining issues is just a matter of time.
This special issue will shed some light on the state of the field of high-performance reconfigurable computing.

What Are High-Performance Reconfigurable Computers?

HPRCs are parallel computing systems that contain multiple microprocessors and multiple FPGAs. In current settings, the design uses FPGAs as coprocessors that are deployed to execute the small portion of the application that takes most of the time—under the 10-90 rule, the 10 percent of code that takes 90 percent of the execution time. FPGAs can certainly accomplish this when computations lend themselves to implementation in hardware, subject to the limitations of the current FPGA chip architectures and the overall system data transfer constraints.
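The 10-90 rule also bounds what an FPGA coprocessor can deliver. As a rough illustration using Amdahl's law (not part of the original article), let p be the fraction of execution time moved to the FPGA and s the speedup achieved on that portion. Both symbols are introduced here, data-transfer overhead is ignored, and the formulas assume the amsmath package.

```latex
\begin{align*}
S(s) &= \frac{1}{(1 - p) + \dfrac{p}{s}}, \qquad p = 0.9 \\[4pt]
S(10) &= \frac{1}{0.1 + 0.09} \approx 5.3 \\
S(100) &= \frac{1}{0.1 + 0.009} \approx 9.2 \\
\lim_{s \to \infty} S(s) &= \frac{1}{1 - p} = 10
\end{align*}
```

Under these assumptions, even an arbitrarily fast FPGA kernel caps the whole-application speedup at 10x, because the remaining 10 percent of the time still runs on the microprocessor.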
In theory, any hardware reconfigurable devices that change their configurations under the control of a program can replace the FPGAs to satisfy the same key concepts behind this class of architectures. FPGAs, however, are the currently available technology that provides the most desirable level of hardware reconfigurability. Xilinx, followed by Altera, dominates the FPGA market, but new startups are also beginning to enter this market.
FPGAs are based on SRAM, but they vary in structure. Figure A in the “FPGA Architecture” sidebar shows an FPGA’s internal structure based on the Xilinx architecture style. The configurable logic block (CLB) is the basic building block for creating logic. It includes RAM used as a lookup table and flip-flops for buffering, as well as multiplexers and carry logic. A side-by-side 2D array of switching matrices for programmable routing connects the 2D array of CLBs.
Figure A. FPGA internal structure based on the Xilinx architecture style. An FPGA can be described as “islands” of (reconfigurable) logic in a “sea” of (reconfigurable) connectors.
Figure B. Typical FPGA design flow.

FPGA Architecture

Ross Freeman, one of the founders of Xilinx (www.xilinx.com), invented field-programmable gate arrays in the mid-1980s [1]. Other current FPGA vendors include Altera (www.altera.com), Actel (www.actel.com), Lattice Semiconductor (www.latticesemi.com), and Atmel (www.atmel.com).
As Figure A shows, an FPGA is a semiconductor device consisting of programmable logic elements, interconnects, and input/output (I/O) blocks (IOBs)—all runtime user-configurable—that allow implementing complex digital circuits. The IOBs form a ring around the outer edge of the microchip; each IOB provides individually selectable I/O access to one of the I/O pins on the exterior of the FPGA package. A rectangular array of logic blocks lies inside the IOB ring.
A typical FPGA logic block consists of a four-input lookup table (LUT) and a flip-flop. Modern FPGA devices also include higher-level functionality embedded into the silicon, such as generic DSP blocks, high-speed IOBs, embedded memories, and embedded processors. Programmable interconnect wiring is implemented so that it’s possible to connect logic blocks to logic blocks and IOBs to logic blocks arbitrarily.
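The LUT-plus-flip-flop logic block can be sketched as a behavioral VHDL model. This is an illustration only, not a vendor primitive: the entity name logic_cell, the generic INIT and the ports i, comb and q are names chosen here. The 16-bit INIT value is the truth table, where bit n is the output for input value n, so changing INIT changes the function the block computes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity logic_cell is
    generic (
        -- 16-entry truth table; bit n is the output for input value n.
        -- The default x"6996" encodes a 4-input XOR (odd parity).
        INIT : std_logic_vector(15 downto 0) := x"6996"
    );
    port (
        clk  : in  std_logic;
        i    : in  std_logic_vector(3 downto 0);  -- LUT inputs
        comb : out std_logic;                     -- non-registered output
        q    : out std_logic                      -- registered output
    );
end logic_cell;

architecture behav of logic_cell is
    signal lut_out : std_logic;
begin
    -- The LUT is a 16x1 memory addressed by the four inputs.
    lut_out <= INIT(to_integer(unsigned(i)));
    comb    <= lut_out;

    -- The flip-flop captures the LUT output on each rising clock edge.
    reg: process (clk)
    begin
        if rising_edge(clk) then
            q <= lut_out;
        end if;
    end process reg;
end behav;
```

Instantiating the same entity with INIT => x"8000" would give a 4-input AND instead, since only input value 15 (all ones) then maps to '1'.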
A slice (using Xilinx terminology) or adaptive logic module (using Altera terminology), which contains a small set of basic building blocks—for example, two LUTs, two flip-flops, and some control logic—is the basic unit area when determining an FPGA-based design’s size. Configurable logic blocks (CLBs) consist of multiple slices. Modern FPGAs consist of tens of thousands of CLBs and a programmable interconnection network arranged in a rectangular grid.
Unlike a standard application-specific integrated circuit that performs a single specific function for a chip’s lifetime, an FPGA chip can be reprogrammed to perform a different function in a matter of microseconds. Typically, either source code written in a hardware description language, such as VHDL or Verilog, or a schematic design provides the functionality that an FPGA assumes at runtime.
As Figure B shows, in the first step, a synthesis process generates a technology-mapped netlist. A map, place, and route process then fits the netlist to the actual FPGA architecture. The process generates a bitstream, the final binary configuration file that can be used to reconfigure the FPGA. Timing analysis, simulation, and other verification methodologies can validate the map, place, and route results.
Reference
1. S.M. Trimberger, ed., Field-Programmable Gate Array Technology, Kluwer Academic, 1994.

Progress in System Hardware and Programming Software

During the past few years, many hardware systems have begun to resemble parallel computers. When such systems originally appeared, they were not designed to be scalable—they were merely a single board of one or more FPGA devices connected to a single board of one or more microprocessors via the microprocessor bus or the memory interface.
The recent SRC-6 and SRC-7 parallel architectures from SRC Computers use a crossbar switch that can be stacked for further scalability. In addition, traditional high-performance computing vendors—specifically, Silicon Graphics Inc. (SGI), Cray, and Linux Networx—have incorporated FPGAs into their parallel architectures. In addition to the SRC-7, models of such HPC systems include the SGI RASC RC100 and the Cray XD1 and XT4. The Linux Networx work focuses on the design of the acceleration boards and on coupling them with PC nodes for constructing clusters.
On the software side, SRC Computers provides a semi-integrated solution that addresses the hardware (FPGA) and software (microprocessor) sides of the application separately. The hardware side is expressed using Carte C or Carte Fortran as a separate function, compiled separately and linked to the compiled C (or Fortran) software side to form one application.
Other hardware vendors use a third-party software tool, such as Impulse C, Handel-C, Mitrion C, or DSPlogic’s RC Toolbox. However, these tools handle only the FPGA side of the application, and each machine has its own application interface to call those functions. At present, Mitrion C and Handel-C support the SGI RASC, while Mitrion C, Impulse C, and RC Toolbox support the Cray XD1. Only a library-based parallel tool such as the message-passing interface can handle scaling an application beyond one node in a parallel system.

Research Challenges and the Evolving HPRC Community

FPGAs were first introduced as glue logic and eventually became popular in embedded systems. When FPGAs were applied to computing, they were introduced as a back-end processing engine that plugs into a CPU bus. The CPU in this case did not participate in the computation, but only served as the front end (host) to facilitate working with the FPGA.
The limitations of each of these scenarios left many issues that have not been explored, yet they are of great importance to HPRC and the scientific applications it targets. These issues include the need for programming tools that address the overall parallel architecture. Such tools must be able to exploit the synergism between hardware and software execution and should be able to understand and exploit the multiple granularities and localities in such architectures.
The need for parallel and reconfigurable performance profiling and debugging tools also must be addressed. With the multiplicity of resources, operating system support and middleware layers are needed to shield users from having to deal with the hardware’s intricate details. Further, application-portability issues should be thoroughly investigated. In addition, new chip architectures that can address the floating-point requirements of scientific applications should be explored. Portable libraries that can support scientific applications must be sought, and the need for more closely integrated microprocessor and FPGA architectures to facilitate the data-intensive hardware/software interactions should be further studied.
As researchers pursue developments to meet a wide range of HPRC requirements, the failure to incorporate standardization into some of these efforts would be detrimental. It can be particularly useful if academia, industry, and government work together to create a community that can approach these problems with the full intellectual intensity it deserves, subject to the needs of the end users and the experience of the implementers.
Some of this community-forming has been already observed. On the one hand, OpenFPGA (www.openfpga.org) has recently been formed as a consortium that mainly pursues standardization. On the other, the NSF has recently granted to the University of Florida and George Washington University an Industry/University Center for High-Performance Reconfigurable Computing (http://chrec.ufl.edu) award. The center includes more than 20 industry and government members who will guide the university research projects.

In this Issue

We have selected five articles for this special issue that represent the latest trends and developments in the HPRC field. The first two cover particularly important topics: a C-to-FPGA compiler and a library framework for code portability across different RC platforms. The third article describes an extensive collection of FPGA software development patterns, and the last two describe HPRC applications.
In “Trident: From High-Level Language to Hardware Circuitry,” Justin Tripp, Maya Gokhale, and Kristopher Peterson describe an effort undertaken at the Los Alamos National Laboratory to build Trident, a high-level-language to hardware-description-language compiler that translates C language programs to FPGA hardware circuits. While several such compilers are commercially available, Trident’s unique characteristics include its open source availability, open framework, ability to use custom floating-point libraries, and ability to retarget to new FPGA board architectures. The authors enumerate the compiler framework’s building blocks and provide some results obtained on the Cray XD1 platform.
“V-Force: An Extensible Framework for Reconfigurable Computing” by Miriam Leeser and her colleagues and students from Northeastern University and the College of the Holy Cross outlines their efforts to implement the Vforce framework. Based on the object-oriented VSIPL++ standard, Vforce encapsulates hardware-specific implementations behind a standard API, thus insulating application-level code from hardware-specific details. As a result, as long as the third-party hardware-specific implementation is available, the same application code can run on different reconfigurable computer architectures with no change. The authors include examples of applications and results from using Vforce for application development.
In “Achieving High Performance with FPGA-Based Computing,” Martin Herbordt and his students from Boston University share a valuable collection of FPGA software design patterns. The authors start with an observation that the performance of HPC applications accelerated with FPGA coprocessors is “unusually sensitive” to the quality of the implementation. They examine reasons for such a “sensitivity,” list numerous methods and techniques to avoid generating “implementational heat,” and provide a few application examples that greatly benefit from the uncovered design patterns.
“Sparse Matrix Computations on Reconfigurable Hardware,” by Gerald Morris and Viktor Prasanna describes implementations of conjugate gradient and Jacobi sparse matrix solvers. In “Using FPGA Devices to Accelerate Biomolecular Simulations,” Sadaf Alam and her colleagues from the Oak Ridge National Laboratory and SRC Computers describe an effort to port a production supercomputing application, a molecular dynamics code called Amber, to a reconfigurable supercomputer platform. Although the speedups obtained while porting these applications—highly optimized for the conventional microprocessors—to an SRC-6 reconfigurable computer are not spectacular, these articles accurately capture the overall trend.
High-performance reconfigurable computing has demonstrated its potential to accelerate demanding computational applications. Much, however, must be done before this technology becomes a mainstream computing paradigm. The articles in this issue highlight a small subset of challenging problems that must be addressed. We encourage you to get involved with HPRC and contribute to this newly developing field.
References

  1. D.A. Buell, J.M. Arnold, and W.J. Kleinfelder, eds., Splash 2: FPGAs in a Custom Computing Machine, IEEE CS Press, 1996.
  2. M.B. Gokhale and P.S. Graham, Reconfigurable Computing: Accelerating Computation with Field-Programmable Gate Arrays, Springer, 2005.
  3. S.M. Trimberger, ed., Field-Programmable Gate Array Technology, Kluwer Academic, 1994.
  4. T. El-Ghazawi et al., “Reconfigurable Supercomputing Tutorial,” Int’l Conf. High-Performance Computing, Networking, Storage and Analysis (SC06); http://sc06.supercomputing.org/schedule/event_detail.php?evid=5072.

Duncan Buell is a professor in the Department of Computer Science and Engineering at the University of South Carolina, Columbia. Buell received a PhD in mathematics from the University of Illinois at Chicago. Contact him at buell@sc.edu.
Tarek El-Ghazawi is a professor in the Department of Electrical and Computer Engineering at the George Washington University, Washington, D.C. El-Ghazawi received a PhD in electrical and computer engineering from New Mexico State University. Contact him at tarek@gwu.edu.
Kris Gaj is an associate professor in the Department of Electrical and Computer Engineering at George Mason University, Fairfax, Virginia. Gaj received a PhD in electrical engineering from Warsaw University of Technology, Poland. Contact him at kgaj@gmu.edu.
Volodymyr Kindratenko is a senior research scientist at the National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana. He received a DSc in analytical chemistry from the University of Antwerp, Belgium. Contact him at kindr@ncsa.uiuc.edu.




Project Summary

High-performance reconfigurable computing, the focus of CHREC, holds tremendous promise in addressing the needs of a broad range of applications, in areas such as signal and image processing, cryptology, communications processing, data and text mining, optimization, bioinformatics, and complex system simulations. Reconfigurable systems span a variety of platform types, from leading-edge machines on earth to mission-critical machines in space. Advantages from a reconfigurable approach can be realized in terms of performance, power, size, cooling, cost, versatility, scalability, and dependability to name a few, important facets where conventional computing infrastructure alone is proving unable to meet the needs of an increasing number of critical applications. Preliminary thrust areas for CHREC include device and core building blocks, reconfigurable systems and services, design automation and programming methods and tools, and reconfigurable and parallel algorithms and applications. Research projects in these areas are formulated on an annual basis in concert with Center partners, emphasizing a keen interest in exploring and evaluating new methods as well as key tradeoff analyses.
Although a relatively new field, reconfigurable computing (RC) has come to the forefront as an important processing paradigm for high-performance computing (HPC), often in concert with conventional microprocessor-based computing. With RC, the full potential of underlying electronics in a system may be better realized in an adaptive manner. At the heart of RC, field-programmable hardware in its many forms has the potential to revolutionize the performance and efficiency of systems for HPC as well as deployable systems in high-performance embedded computing (HPEC). One ideal of the RC paradigm is to achieve the performance, scalability, power, and cooling advantages of the “Master of a trade,” custom hardware, with the versatility, flexibility, and efficacy of the “Jack of all trades,” a general-purpose processor. As is commonplace with components for HPC such as microprocessors, memory, networking, storage, etc., critical technologies for RC can also be leveraged from other IT markets to achieve a better performance-cost ratio, most notably the field-programmable gate array or FPGA. Each of these devices is inherently heterogeneous, being a predefined mixture of configurable logic cells and powerful, fixed resources.
Many opportunities and challenges exist in realizing the full potential of reconfigurable hardware for HPC. Among the opportunities offered by field-programmable hardware are a high degree of on-chip parallelism that can be mapped directly from dataflow characteristics of the application’s defining parallel algorithm, user control over low-level resource definition and allocation, and user-defined data format and precision rendered efficiently in hardware. In realizing these opportunities, there are many vertical challenges, where we seek to bridge the semantic gap between the high level at which HPC applications are developed and the low level (i.e. HDL) at which hardware is typically defined. There are also many horizontal challenges, where we seek to integrate or marry diverse resources such as microprocessors, FPGAs, and memory in optimal relationships, in essence bridging the paradigm gap between conventional and reconfigurable processing at various levels in the system and software architectures.
Success is expected to come from both revolutionary and evolutionary advances. For example, at one end of the spectrum, internal design strategies of field-programmable devices need to be reevaluated in light of a broad range of HPC and HPEC applications, not only to potentially achieve a more effective mixture of on-chip fixed resources alongside reconfigurable logic blocks, but also as a prime target for higher-level programming and translation. At the other end of the spectrum, new concepts and tools are needed to analyze the algorithmic basis of applications under study (e.g. inherent control-flow vs. data-flow components, numeric format vs. dynamic range), and new programming models to render this basis in an abstracted design strategy, so as to potentially target and exploit a combination of resources (e.g. general-purpose processors, reconfigurable processors, and special-purpose processors such as GPUs, DSPs, and NPs). While attempting to build highly heterogeneous systems composed of resources from many diverse categories can be cost-prohibitive, and a goal of uni-paradigm application design for multi-paradigm computing may be extremely difficult to perfect, one of the inherent advantages of RC is that it promises to support these goals in a more flexible and cost-effective manner. Between the two extremes of devices and programming models for multi-paradigm computing, many challenges await with new concepts and tools – compilers, core libraries, system services, debug and performance analysis tools, etc. These and related steps will be of paramount importance for the transition of RC technologies into the mainstream of HPC and HPEC.