Pixelstrom: Image-Combining System

“It makes my heart sing” – Henry Fuchs

Abstract
Pixelstrom is an FPGA-based image-combining system for parallel graphics systems. The goal of our current work is to reduce network traffic and latency in order to increase the performance of parallel visualization systems. The architecture of Pixelstrom also allows huge datasets to be loaded without any preprocessing step (e.g. level of detail). Initial data distribution uses a standard Ethernet network, whereas combining and returning the images differ from traditional parallel rendering methods: the calculated sub-images are grabbed directly from the DVI ports and composited by an FPGA-based combiner.

Introduction
The system is designed to visualize large 3D models and scenes, as well as highly complex shaders, in real time. Compared with traditional parallel graphics systems, the FPGA-based system reduces network traffic, latency and memory accesses by grabbing the rendered sub-images directly from the DVI ports of the render servers. A specially designed hardware combiner merges the rendered sub-images. Related publications and commercial systems in this area often have complex system configurations, are too expensive, or merge sub-images in cascaded network processing units. In contrast to NVIDIA's Scalable Link Interface, our system is not limited to a fixed number of GPUs.

Implementation
Because FPGAs are flexible, different parallel rendering algorithms can be implemented. Because the rendered image data arrive as genlocked signals, the hardware combiner needs no external RAM. This is one reason for the cost-effectiveness of the hardware. Another is the use of standard components such as FPGAs and DVI-D, which also keeps the system small and compact. The FPGA currently implements the sort-first algorithm. In this case each DVI-D connection transfers only color information to the combiner. In a final step the combiner merges the calculated sub-images additively. To achieve a high speed-up in parallel rendering clusters that use hardware rendering, the models have to be spatialized for an efficient culling process.
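The additive merge can be illustrated in software. The following is a minimal sketch, not the FPGA design: it assumes that every render server outputs a full-size frame that is black outside its own screen region, so that adding the frames pixel by pixel reassembles the complete image. The function name combine_additive and the clamping to 255 are assumptions made for illustration only.

```python
BLACK = (0, 0, 0)


def combine_additive(sub_images):
    """Merge equally sized RGB frames by per-channel addition.

    Each frame is a list of rows, each row a list of (r, g, b) tuples.
    Channel sums are clamped to 255 (an assumption; the source only
    states that sub-images are merged additively).
    """
    height = len(sub_images[0])
    width = len(sub_images[0][0])
    combined = [[BLACK] * width for _ in range(height)]
    for image in sub_images:
        for y in range(height):
            for x in range(width):
                r, g, b = combined[y][x]
                dr, dg, db = image[y][x]
                combined[y][x] = (
                    min(r + dr, 255),
                    min(g + dg, 255),
                    min(b + db, 255),
                )
    return combined


# Two hypothetical servers in a sort-first split of a 4x2 frame:
# server 0 renders the left half, server 1 the right half.
left = [[(200, 10, 10), (200, 10, 10), BLACK, BLACK] for _ in range(2)]
right = [[BLACK, BLACK, (10, 10, 200), (10, 10, 200)] for _ in range(2)]
frame = combine_additive([left, right])
```

Because the regions do not overlap in a sort-first split, every output pixel receives a non-black contribution from at most one server, which is why color information alone is sufficient in this mode.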

In interactive and dynamic scenes, good load balancing is essential. The implemented dynamic view frustum splits the view frustum of the scene into n new frusta. The size of each frustum is coordinated by the render client and recalculated every frame depending on the response time of each server. Because all render servers are synchronized and the latency of the combiners is static, more than one hardware combiner can be used. Cascading the combiners removes the limitation to a fixed number of render servers, so performance can in principle be scaled by adding further render servers.
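One way to picture the per-frame frustum sizing is a proportional rebalancing of screen columns. The sketch below is an assumption, not the documented Pixelstrom scheme: it estimates each server's throughput from the width it rendered and its response time in the last frame, then hands out the next frame's widths in proportion. The function name rebalance_widths and its parameters are hypothetical.

```python
def rebalance_widths(widths, response_times_ms, total_width, min_width=1):
    """Return new per-server column counts for the next frame.

    widths            -- columns each server rendered in the last frame
    response_times_ms -- measured response time of each server
    total_width       -- horizontal resolution of the full frame
    """
    # Estimated throughput of each server in columns per millisecond.
    speeds = [w / t for w, t in zip(widths, response_times_ms)]
    total_speed = sum(speeds)
    # Faster servers receive a proportionally wider frustum.
    new_widths = [
        max(min_width, round(total_width * s / total_speed)) for s in speeds
    ]
    # Absorb rounding error in the widest slice so the slices tile the frame.
    widest = new_widths.index(max(new_widths))
    new_widths[widest] += total_width - sum(new_widths)
    return new_widths


# XGA width split evenly; server 0 needed 20 ms, server 1 only 10 ms.
next_widths = rebalance_widths([512, 512], [20, 10], 1024)
```

In this example the slower server's share shrinks to roughly one third of the frame. A real implementation would probably smooth the measurements over several frames to avoid oscillation.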

Results
The setup for the first evaluation consisted of one render client and two render servers equipped with NVIDIA FX 3000G graphics boards. The following results were measured at XGA resolution with a dynamic view frustum. The latency of the FPGA combiner is about 40 microseconds.

Model             Triangles   Spatialized   1 Server   2 Servers
Mercedes 300SL      800 000   No            20 fps     30 fps
Infinity Triant   1 227 000   Yes           29 fps     59 fps
Synthetic Scene   2 000 000   Yes           22 fps     44 fps
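
The frame rates in the table can be turned into speed-up and parallel efficiency figures. The short calculation below uses only the measured values from the table; the helper name speedup_and_efficiency is introduced here for illustration.

```python
# (fps with 1 server, fps with 2 servers), taken from the results table.
results = {
    "Mercedes 300SL": (20, 30),
    "Infinity Triant": (29, 59),
    "Synthetic Scene": (22, 44),
}


def speedup_and_efficiency(fps_one, fps_n, servers):
    """Speed-up is the frame-rate ratio; efficiency is speed-up per server."""
    speedup = fps_n / fps_one
    return speedup, speedup / servers


summary = {
    model: speedup_and_efficiency(fps_one, fps_two, servers=2)
    for model, (fps_one, fps_two) in results.items()
}

for model, (speedup, efficiency) in summary.items():
    print(f"{model}: speed-up {speedup:.2f}, efficiency {efficiency:.0%}")
```

The unspatialized Mercedes model reaches a speed-up of 1.5 with two servers, while both spatialized models reach a speed-up of about 2, which is consistent with the role of spatialization in efficient culling.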

Future work on this project is to implement the sort-last algorithm and to analyze cascading, alpha blending and other renderers, e.g. a ray tracer or a volume renderer.