Download White Paper

Remote Direct Memory Access (RDMA) with Atlas10 Cameras

RDMA for 10GigE Cameras Whitepaper

This white paper delves into the ways in which RDMA technology is transforming high-speed Ethernet machine vision cameras. It explores the origins and mechanics of RDMA, as well as the RoCEv2 standard, providing a detailed comparison with traditional protocols such as TCP and UDP. The paper emphasizes the zero-copy transmission capability that RDMA provides, which enables data transmission to bypass the CPU and OS. Furthermore, the paper presents a comprehensive overview of how this feature affects 10GigE and 25GigE cameras, including preliminary benchmark results. By reading this white paper, you can gain valuable insights into the impact of RDMA technology on the machine vision camera industry.

What’s Inside:

• Why Use UDP for the GigE Vision standard?
• Challenges of UDP for 10GigE Cameras
• Removing the CPU Bottleneck
• Remote DMA over Converged Ethernet (RoCE v2)
• RDMA Connection and Transfer Steps + Notes
• RDMA for GigE Vision
• What about GPU Memory?
• UDP versus RDMA Benchmarks
• The Future is Fast

After clicking “submit” a download link will be provided.
(English, Japanese, Korean and Chinese PDFs available for download)

Sneak Peek

Surviving Industrial Challenges

To build these compact embedded vision systems, however, application designers must navigate the challenges of harsher operating environments and the complexities of building smaller, faster, more power-efficient systems. They must work to validate their system through time-consuming stages, starting from the proof of concept (POC), to prototyping, and finally to a minimum viable product (MVP) or a Full Custom Design (FCD). Off-the-shelf embedded development kits, such as those from NVIDIA, Xilinx, or Raspberry Pi, offer a quick solution to building a proof-of-concept design. However, many camera modules and embedded development boards offer little to no protection from the harsh environments of industrial spaces. A considerable amount of time must be spent on designing and testing prototypes that are protected against dust and moisture (IP67 or IP65), electromagnetic interference (EMC immunity) and ...

Fill out the form above and download the full white paper

Adopting RDMA over the Ethernet network enables the following benefits:

• Fastest throughput and lowest latency available at all speeds of Ethernet networks, compatible with existing switching infrastructure and cabling.
• Complete hardware offload of packet handling with no CPU involvement through zero-copy; ability to send and receive data to and from remote buffers.
• Comprehensive ecosystem of industrial connectivity solutions offering secure connectors, EMI shielding, ground isolation, and Power over Ethernet (PoE).
• Supported by many vendors of both hardware and software solutions (including Broadcom, Marvell, NVIDIA, and Intel), promoting interoperability.

The Host Channel Adapter (HCA) manages RoCE operations, implementing in hardware all the logic needed to execute the RDMA protocol. Data segmentation and reassembly, as well as flow control, are managed by the HCA, allowing the sender and receiver applications to work with whole buffers.
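To picture what "working with whole buffers" means, the following Python sketch imitates segmentation and reassembly in software. It is only a toy illustration: on a real RoCE link the HCA does this in hardware, and the function names, the 4096-byte payload size, and the 10,240-byte stand-in frame are all assumptions made for the example.

```python
def segment(buffer: bytes, payload_size: int) -> list[tuple[int, bytes]]:
    """Split one whole buffer into (sequence number, payload) packets."""
    return [
        (seq, buffer[offset:offset + payload_size])
        for seq, offset in enumerate(range(0, len(buffer), payload_size))
    ]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the original buffer by joining payloads in sequence order."""
    return b"".join(payload for _, payload in sorted(packets))


# Hypothetical stand-in for one image buffer (10,240 bytes).
frame = bytes(range(256)) * 40

# The sender hands over one whole buffer; packets exist only "on the wire".
packets = segment(frame, payload_size=4096)

# The receiver gets one whole buffer back, whatever the packet boundaries were.
restored = reassemble(packets)
```

The application code on either side touches only `frame` and `restored`. The packet list is the part an HCA would keep out of the application's view.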
The RDMA channel is initiated by “pinning” memory. A memory region is reserved on the host for RDMA usage, the necessary protections are applied, and then the host passes the address to the HCA and removes itself from the data path. This registered memory region can now be used for any RDMA operation, bypassing the operating system and generating no additional CPU load.

RoCE RDMA transactions use three queues. The send and receive queues handle all data transactions and are always created together as a queue pair (QP). The completion queue (CQ) is used to track the completion of the work scheduled on the QP. The QP enables application-level flow control to notify the sender of available buffers for RDMA transfer on the receiver’s end.

Figure 7: In a conventional networking stack (left), data transfer from one application to another application on a remote machine involves multiple buffer copies and context switching to mobilize the CPU at each stage. In a RoCE network, data is transferred directly from the initiator application to the target application with no CPU involvement, bypassing the OS stack entirely.

What do you need to enable RDMA with LUCID 10GigE Cameras?
• LUCID Atlas10 camera with RDMA firmware
• RDMA 10GigE Host Channel Adapter (HCA)
• (Optional) 10GigE Switch with PFC priority-enabled VLAN
• Cat6 or better Ethernet cable
Before RDMA data transfers can take place, a queue pair (QP) and a completion queue (CQ) must be created for each hardware port, known as a channel adapter (CA), on the RDMA network.

What is needed on the camera:
• Camera QP (Send and Receive Queues)
• Camera CQ (Completion Queue)

Host Channel Adapter (HCA) Port 1:
• HCA1 QP (Send and Receive Queues)
• HCA1 CQ (Completion Queue)

Host Channel Adapter (HCA) Port 2:
• HCA2 QP (Send and Receive Queues)
• HCA2 CQ (Completion Queue)

Registering and reserving host memory for RDMA user applications is possible thanks to Network Direct in MS Windows® and the libibverbs library in Linux. RDMA data transfers can only start after memory is registered. This is done by “pinning” the host’s memory: a memory region is reserved by the OS and registered with the HCA.

How are RDMA devices programmed in applications?

It starts with RDMA verbs, which are the low-level building blocks for RDMA applications. They are accessed through various API libraries; the two major ones are the libibverbs API for Linux and the Network Direct SPI for Microsoft Windows. These APIs allow RDMA devices to establish channel adapter connections, pin and register memory, execute data transfers, and terminate connections.

There are two types of verbs: one-sided and two-sided. One-sided verbs allow remote devices (such as cameras) to completely bypass the CPU/OS when sending data. Two-sided verbs act more like traditional sockets that utilize the CPU/OS. LUCID uses two-sided verbs. Using two-sided verbs still removes several sources of CPU overhead compared to conventional Ethernet data transfers. Using two-sided verbs is necessary for requeuing transfers and polling the CQ. These tasks take up negligible CPU resources.

RDMA Verbs and Verbs APIs
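The two-sided flow described above (post receive buffers, let data arrive, poll the CQ, requeue the buffer) can be pictured with a small Python toy model. This is a sketch under stated assumptions, not the libibverbs or Network Direct API: `ToyQueuePair`, `post_receive`, `deliver`, and `poll_completion` are hypothetical names invented for illustration, and `deliver` stands in for work that a real HCA would do in hardware.

```python
from collections import deque


class ToyQueuePair:
    """Toy receive side of a queue pair plus its completion queue.

    Hypothetical model for illustration only; not a real verbs API.
    """

    def __init__(self):
        self.receive_queue = deque()     # buffers posted by the host application
        self.completion_queue = deque()  # finished transfers waiting to be polled

    def post_receive(self, buffer: bytearray) -> None:
        """Host application makes a pre-registered buffer available."""
        self.receive_queue.append(buffer)

    def deliver(self, payload: bytes) -> bool:
        """Stand-in for the HCA placing an incoming send into a posted buffer."""
        if not self.receive_queue:
            return False  # no buffer posted: the sender has to wait
        buffer = self.receive_queue.popleft()
        if len(payload) > len(buffer):
            raise ValueError("payload larger than posted buffer")
        buffer[:len(payload)] = payload
        self.completion_queue.append((buffer, len(payload)))
        return True

    def poll_completion(self):
        """Return the next (buffer, length) completion, or None if none is ready."""
        return self.completion_queue.popleft() if self.completion_queue else None


qp = ToyQueuePair()
for _ in range(2):                  # the host posts two 8-byte buffers up front
    qp.post_receive(bytearray(8))

frames = [b"frame-0", b"frame-1", b"frame-2"]  # hypothetical camera payloads
received = []

for frame in frames:
    while not qp.deliver(frame):
        # Receiver is out of buffers: poll the CQ, consume the data, requeue.
        buffer, length = qp.poll_completion()
        received.append(bytes(buffer[:length]))
        qp.post_receive(buffer)

# Drain the completions that are still pending.
while (done := qp.poll_completion()) is not None:
    buffer, length = done
    received.append(bytes(buffer[:length]))
    qp.post_receive(buffer)
```

In this model the only work left to the host application is polling the completion queue and requeuing buffers, which are the two tasks the text attributes to two-sided verbs. The third frame is held back until a buffer has been requeued, which mirrors the application-level flow control the queue pair provides.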