Isaac Event Vision
Neuromorphic DVS plugin for NVIDIA Isaac Lab
Event cameras report per-pixel brightness changes asynchronously, with microsecond latency and 120 dB of dynamic range. That suits fast robotics well. The catch is that training RL policies on them needs simulation, and every existing event-camera simulator runs offline, one video at a time. Isaac Event Vision is the first event-camera sensor that generates events inside Isaac Lab's GPU-parallel simulation loop, across thousands of environments at once.
What it does
- Sub-frame event generation from RTX motion vectors. Each rendered frame is warped to intermediate timepoints, so events stay temporally accurate without rendering extra frames.
- Runs natively across vectorised environments on GPU. A streaming-accumulation mode cuts memory overhead by around 2 GB at 1024 parallel environments.
- Configurable sensor-noise model covering threshold mismatch, refractory period, leak events, hot pixels, and shot noise.
- Four ML-ready representations (event frame, voxel grid, time surface, event-count image), plus HDF5 (Prophesee/Metavision-compatible) and numpy output.
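To make the representations above concrete, here is a minimal NumPy sketch of three of them (event-count image, voxel grid, time surface) built from flat event arrays. The function names, argument order, and the bilinear-in-time voxel weighting are assumptions chosen for illustration; they are not the plugin's actual API, which runs batched on GPU.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the plugin's API.
# Events are flat arrays: t (seconds), x and y (integer pixels), p (+1 ON / -1 OFF).

def event_count_image(x, y, p, height, width):
    """Two-channel image of per-pixel event counts: channel 0 = OFF, channel 1 = ON."""
    img = np.zeros((2, height, width), dtype=np.float64)
    channel = (p > 0).astype(np.intp)
    np.add.at(img, (channel, y, x), 1.0)  # add.at accumulates repeated indices
    return img.astype(np.float32)

def voxel_grid(t, x, y, p, height, width, n_bins=5):
    """Signed polarity spread over n_bins time slices with linear weights in time."""
    grid = np.zeros((n_bins, height, width), dtype=np.float64)
    if t.size == 0:
        return grid.astype(np.float32)
    span = max(float(t.max() - t.min()), 1e-9)
    tn = (t - t.min()) / span * (n_bins - 1)   # normalised time in [0, n_bins - 1]
    lo = np.floor(tn).astype(np.intp)
    hi = np.minimum(lo + 1, n_bins - 1)
    w_hi = tn - lo                              # weight of the later bin
    np.add.at(grid, (lo, y, x), p * (1.0 - w_hi))
    np.add.at(grid, (hi, y, x), p * w_hi)
    return grid.astype(np.float32)

def time_surface(t, x, y, height, width, t_ref, tau=0.03):
    """Exponential decay of the most recent event time at each pixel."""
    last = np.full((height, width), -np.inf)
    np.maximum.at(last, (y, x), t)
    return np.exp((last - t_ref) / tau).astype(np.float32)  # pixels with no event map to 0
```

Each function returns a dense array with a fixed shape per environment, which is what lets a representation be stacked across environments as a policy observation.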
How it works
The gap
Modern manipulation and locomotion RL runs thousands of environments in parallel on a single GPU. Event-camera simulators never fit that loop. v2e, ESIM, and DVS-Voltmeter all consume pre-recorded video and emit event files offline, so training a policy on event observations meant breaking the simulation loop entirely.
Approach
Isaac Sim's RTX renderer already emits per-pixel motion vectors alongside every frame. By warping the current frame along those vectors at several sub-steps, the plugin interpolates log-intensity at intermediate timepoints and fires events wherever the contrast threshold is crossed. The events stay temporally accurate, the added cost is small because no extra frames are rendered, and it all runs on-GPU.
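The idea can be sketched on the CPU with NumPy for a single camera. This is a simplified illustration under stated assumptions, not the plugin's implementation: the function names are hypothetical, motion vectors are assumed to give each pixel's displacement in pixels since the previous frame, warping uses nearest-neighbour sampling, and noise is left out.

```python
import numpy as np

def warp_to_substep(frame, motion_px, s):
    """Approximate the image at fraction s in (0, 1] between the previous frame (s=0)
    and `frame` (s=1). Assumption: motion_px[y, x] = (dx, dy) is the displacement in
    pixels of the surface point at (x, y) since the previous frame."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    back = 1.0 - s  # fraction of the motion still to come at sub-step s
    src_x = np.clip(np.rint(xs + back * motion_px[..., 0]), 0, w - 1).astype(np.intp)
    src_y = np.clip(np.rint(ys + back * motion_px[..., 1]), 0, h - 1).astype(np.intp)
    return frame[src_y, src_x]  # nearest-neighbour sampling keeps the sketch short

def generate_events(ref_log, frame, motion_px, t0, dt, n_sub=4, threshold=0.2, eps=1e-3):
    """Fire events wherever log-intensity moves at least `threshold` from the per-pixel
    reference level. Returns (events, new_ref_log); events has columns t, x, y, polarity."""
    ref_log = ref_log.copy()
    chunks = []
    for k in range(1, n_sub + 1):
        s = k / n_sub
        log_i = np.log(warp_to_substep(frame, motion_px, s) + eps)
        diff = log_i - ref_log
        n = np.floor(np.abs(diff) / threshold).astype(np.int64)  # crossings per pixel
        pol = np.sign(diff).astype(np.int64)                     # +1 ON, -1 OFF
        ys, xs = np.nonzero(n)
        counts = n[ys, xs]
        t = np.full(counts.sum(), t0 + s * dt)                   # sub-frame timestamp
        chunks.append(np.column_stack([
            t,
            np.repeat(xs, counts),
            np.repeat(ys, counts),
            np.repeat(pol[ys, xs], counts),
        ]))
        ref_log += pol * n * threshold  # advance the reference only by whole thresholds
    return np.concatenate(chunks), ref_log
```

In this sketch a bright bar that moves two pixels between frames produces events at the intermediate column with an intermediate timestamp, rather than everything landing on the frame boundary.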
Validation
I validated event output against v2e across rotation, translation, ego-motion, and looming scenes, in both clean and noisy configurations. Agreement is strongest on the ego-motion pan: spatial correlation 0.84, density cosine similarity 0.78, spatial IoU 0.64, ON/OFF polarity within 0.2%, and the plugin active on 98.7% of pixels, with event counts within ~16% of v2e. Pure rotation and translation agree less closely (IoU near 0.35), which is expected: sub-frame motion-vector interpolation places events at intermediate timepoints that v2e's frame-aligned output does not, so the two diverge most where motion is fastest.
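For readers who want to reproduce this kind of comparison, the sketch below shows one plausible way to compute the three spatial metrics from per-pixel event-count maps. The exact definitions are assumptions on my part for illustration (Pearson correlation of counts, cosine similarity of the flattened maps, IoU of the active-pixel masks); the repo's validation script may define them differently, and the function name is hypothetical.

```python
import numpy as np

def agreement_metrics(counts_a, counts_b):
    """Compare two per-pixel event-count maps of identical shape.
    Assumes each map has at least one event and is not constant,
    otherwise the ratios below are undefined."""
    a = np.asarray(counts_a, dtype=np.float64).ravel()
    b = np.asarray(counts_b, dtype=np.float64).ravel()
    corr = float(np.corrcoef(a, b)[0, 1])                          # Pearson correlation
    cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    active_a, active_b = a > 0, b > 0                              # pixels with any event
    iou = float((active_a & active_b).sum() / (active_a | active_b).sum())
    return {"spatial_correlation": corr, "cosine_similarity": cosine, "spatial_iou": iou}
```

IoU only looks at which pixels fired at all, so it penalises events displaced by a pixel or two much more heavily than the correlation and cosine scores do.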
Next steps
The point of running in the loop is to train policies on event observations, so the next step is a downstream RL benchmark: a Shadow Hand in-hand cube reorientation task across the plugin's observation variants (event frame, voxel grid, time surface, plus noisy-sensor and low-frame-rate ablations against an RGB baseline), to measure what event input actually buys a policy. That training is future work, not a result yet.
Performance
The repo's benchmark measures per-step event-generation latency on GPU at batch size 1 with noise disabled. Motion-vector event generation holds near 2–4 ms even at 480×640, well inside a real-time RL step. Throughput scales further with batched environments.