Princeton Parallel Processor

The Princeton Parallel Processor is a 40-core chip designed to test concepts in scalable computing. Its novel architectural features include a clumpy cache coherence framework and bandwidth-limiting technology. I was responsible for designing and testing the host board for this many-core chip. The host board, implemented on an ML605 development kit with a Virtex-6 FPGA, connects the compute and memory resources within a computing node. Each node can communicate with other identical nodes in a larger system to share processing and memory resources. The report below describes the overall structure and goals of the project, including the design of the chip interface, inter-node interface, packet routing, memory controller, and I/O control. Challenges addressed include working within pin limits, increasing bandwidth across limited channels, abstracting the structure of random access memories, instantiating and interfacing with Xilinx COREgen modules, designing safe mechanisms to transfer signals across clock domains, combining deadlock-free networks hierarchically while preserving their deadlock-free properties, using the Xilinx synthesis flow to load custom logic onto FPGAs, and adapting hardware platforms to a specific purpose.
  • Date: September 2013 - June 2014
  • Fields: Computer Architecture, Memory Hierarchy, Multi-Node Systems, Many-Core Processors, Network Deadlock, Asynchronous Buffer Design, System Design
  • Tools: Verilog, Xilinx COREgen, Xilinx Synthesis Pipeline (X-flow), simv logic simulation
  • Group Members: David Wentzlaff, Yanqi Zhou, Tri Nguyen, Mike McKeown, Yaosheng Fu, Jonathan Balkind
  • Documentation: Report (PDF), Official Website
  • In the News: PCWorld