While most filtering approaches based on random finite sets have focused on improving tracking performance, this paper argues that computation time is just as important for enabling real-time applications such as pedestrian detection. To this end, we investigate the use of OpenCL to accelerate random finite set-based Bayesian filtering on a heterogeneous system.
In detail, we developed an efficient, fully functional pedestrian-tracking system that runs under real-time constraints while maintaining adequate tracking accuracy. An extensive accuracy evaluation was carried out to verify that the accuracy requirements were met. This was followed by extensive profiling to identify execution bottlenecks, which were then targeted for OpenCL acceleration.
Video throughput improved from roughly 15 fps to 100 fps (about 6×) on average when processing typical MOT benchmark videos. Moreover, worst-case frame processing improved 18×, from nearly 2 fps to 36 fps, thereby comfortably meeting the real-time constraints. Our implementation is released as open-source code.
In the multi-target tracking problem, the number of targets to be tracked is unknown a priori and stochastically varies with time. At the sensor, a random number of measurements is received due to detection uncertainty and false alarms. Consequently, standard Bayesian filtering techniques are not directly applicable, since it is not known which of the received measurements, if any, should be used to update which target state, if any, at each sensor scan.
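To make the association problem concrete, the following is a minimal sketch of how a single sensor scan could arise under detection uncertainty and false alarms. It is not taken from the paper's implementation: the 1-D state, the `SensorModel` struct, the `simulate_scan` function, and all parameter names are hypothetical, and the Poisson/uniform clutter model is a common assumption rather than something stated above.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical 1-D sensor model; every name and parameter here is illustrative.
struct SensorModel {
    double detection_prob;    // probability that a target produces a measurement
    double clutter_rate;      // expected number of false alarms per scan (Poisson mean)
    double noise_std;         // standard deviation of the measurement noise
    double fov_min, fov_max;  // field of view over which false alarms are uniform
};

// Returns one scan: an unordered set of measurements with no target labels.
std::vector<double> simulate_scan(const std::vector<double>& targets,
                                  const SensorModel& s, std::mt19937& rng) {
    assert(s.detection_prob >= 0.0 && s.detection_prob <= 1.0);
    assert(s.clutter_rate >= 0.0 && s.noise_std > 0.0 && s.fov_min < s.fov_max);

    std::vector<double> scan;
    std::bernoulli_distribution detected(s.detection_prob);
    std::normal_distribution<double> noise(0.0, s.noise_std);

    // Each target is detected independently; missed targets leave no trace.
    for (double x : targets) {
        if (detected(rng)) scan.push_back(x + noise(rng));
    }

    // False alarms: Poisson-distributed count, uniformly spread over the field of view.
    if (s.clutter_rate > 0.0) {
        std::poisson_distribution<int> clutter_count(s.clutter_rate);
        std::uniform_real_distribution<double> position(s.fov_min, s.fov_max);
        const int n = clutter_count(rng);
        for (int i = 0; i < n; ++i) scan.push_back(position(rng));
    }

    // Shuffle so that the order carries no information about the origin.
    std::shuffle(scan.begin(), scan.end(), rng);
    return scan;
}
```

The filter only ever sees the returned vector: its length generally differs from the number of targets, and nothing in it indicates which entries are false alarms. This is why a standard per-target Bayesian update cannot be applied directly.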
SYSTEM DESIGN AND IMPLEMENTATION
The main design goal has been modularity: each module is developed in isolation with clearly defined input/output interfaces, which keeps the system complexity manageable and speeds up development. Figure 1 presents a high-level view of the overall pedestrian tracking system resulting from this approach.
The GM-PHD filter works by propagating the posterior PHD of the multi-target state in time during each of its recursions. The GM-PHD filter recursion is carried out as shown in Figure 2. Each block represents a C++ class method performing a specific function. The arrows represent the data flow, whereby the GM terms representing the PHD of the multi-target state travel back and forth between the prediction and update modules.
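The recursion described above can be sketched for a scalar state as follows. This is an illustrative reduction of the standard GM-PHD prediction and update equations, not the class methods of Figure 2: the `Gaussian` and `Model` structs, the function names, the fixed birth list, and the constant clutter density are all assumptions made for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One Gaussian mixture (GM) term of the PHD: weight, mean, variance (scalar state).
struct Gaussian { double w, m, P; };

// Hypothetical linear-Gaussian model; names are illustrative.
struct Model {
    double F, Q;             // motion: x_k = F * x_{k-1} + noise with variance Q
    double H, R;             // measurement: z = H * x + noise with variance R
    double p_survive;        // probability that a target persists to the next scan
    double p_detect;         // probability that a target is detected
    double clutter_density;  // false-alarm intensity kappa(z), assumed constant
};

constexpr double kPi = 3.14159265358979323846;

// Prediction: surviving terms pass through the motion model; birth terms are appended.
std::vector<Gaussian> predict(const std::vector<Gaussian>& prev,
                              const std::vector<Gaussian>& births, const Model& md) {
    std::vector<Gaussian> out = births;
    for (const Gaussian& g : prev)
        out.push_back({md.p_survive * g.w, md.F * g.m, md.F * g.P * md.F + md.Q});
    return out;
}

// Update: one missed-detection copy per term, plus one Kalman-updated term
// per (measurement, predicted term) pair, normalised per measurement.
std::vector<Gaussian> update(const std::vector<Gaussian>& pred,
                             const std::vector<double>& scan, const Model& md) {
    assert(md.clutter_density > 0.0);  // keeps the normaliser strictly positive
    std::vector<Gaussian> out;
    for (const Gaussian& g : pred)
        out.push_back({(1.0 - md.p_detect) * g.w, g.m, g.P});
    for (double z : scan) {
        std::vector<Gaussian> terms;
        double sum = 0.0;
        for (const Gaussian& g : pred) {
            const double S = md.H * g.P * md.H + md.R;  // innovation variance
            const double K = g.P * md.H / S;            // Kalman gain
            const double r = z - md.H * g.m;            // innovation
            const double q = std::exp(-0.5 * r * r / S) / std::sqrt(2.0 * kPi * S);
            const double w = md.p_detect * g.w * q;
            terms.push_back({w, g.m + K * r, (1.0 - K * md.H) * g.P});
            sum += w;
        }
        for (Gaussian& t : terms) {
            t.w /= md.clutter_density + sum;
            out.push_back(t);
        }
    }
    return out;
}

// The sum of the GM weights is the expected number of targets.
double expected_cardinality(const std::vector<Gaussian>& gm) {
    double n = 0.0;
    for (const Gaussian& g : gm) n += g.w;
    return n;
}
```

Feeding the output of `update` back into `predict` at the next scan gives the back-and-forth data flow between the two modules. Note that the number of terms grows with every update, so a practical filter also prunes low-weight terms and merges nearby ones; that step is omitted here.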
As can be seen from Figure 4, the GM-PHD tracker appears to perform perfectly in the easy scenario when estimating the individual target states, as illustrated by the OSPA measure. The cardinality estimate, however, shows that even in ideal conditions, the tracker does make occasional mistakes. This is attributed to the adaptive birth distribution model embedded in the tracker algorithm, which needs several initial scans containing successive measurements of a new target before it confirms that target as a new track.
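For readers unfamiliar with the OSPA measure, the sketch below computes it for small sets of scalar states. It is an illustration only, not the evaluation code behind Figure 4: the function name `ospa`, the scalar state, and the parameter values in the usage note are assumptions, and the brute-force search over permutations stands in for the optimal assignment solver (for example the Hungarian algorithm) that a real evaluation would need.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// OSPA distance of order p with cutoff c between two finite sets of scalar states.
// Brute force over permutations: only suitable for a handful of targets.
double ospa(std::vector<double> X, std::vector<double> Y, double c, double p) {
    assert(c > 0.0 && p >= 1.0);
    if (X.size() > Y.size()) std::swap(X, Y);  // ensure |X| <= |Y|
    const std::size_t m = X.size();
    const std::size_t n = Y.size();
    if (n == 0) return 0.0;  // both sets empty

    std::vector<std::size_t> perm(n);
    std::iota(perm.begin(), perm.end(), std::size_t{0});

    // Localisation part: best assignment of the m smaller-set elements to Y,
    // with each pairwise distance capped at the cutoff c.
    double best = std::numeric_limits<double>::infinity();
    do {
        double cost = 0.0;
        for (std::size_t i = 0; i < m; ++i)
            cost += std::pow(std::min(std::fabs(X[i] - Y[perm[i]]), c), p);
        best = std::min(best, cost);
    } while (std::next_permutation(perm.begin(), perm.end()));

    // Cardinality part: every unmatched element is penalised with the cutoff c.
    const double cardinality_penalty = std::pow(c, p) * static_cast<double>(n - m);
    return std::pow((best + cardinality_penalty) / static_cast<double>(n), 1.0 / p);
}
```

With `c = 10` and `p = 1`, the sets `{0}` and `{0, 100}` give a distance of 5: the matched pair contributes nothing, while the single unmatched element contributes the full cutoff, averaged over the larger set. A cardinality error therefore shows up in the OSPA value even when every matched state is exact.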
These results are presented in Figure 10. As shown in these plots, the OpenCL implementation not only offers an advantage in raw execution time, but also scales much better than its pure C++ counterpart. The execution times rise nearly exponentially for both MOT videos in the case of the C++ version, while the rise is much less steep, closer to linear, for the OpenCL-based implementation.
In this paper, we have developed two random finite set-based Bayesian filtering approaches: the Gaussian mixture probability hypothesis density (GM-PHD) filter and the labeled multi-Bernoulli (LMB) filter. The two approaches were designed in a highly modular way. After evaluating their accuracy on the multi-target tracking problem, we found that the LMB filter was more appropriate for tracking pedestrians.
Then, we implemented the LMB filter in C++ and carried out extensive execution profiling. OpenCL was used to offload the execution bottlenecks identified by the profiling. The experimental results demonstrated a substantial speedup. In particular, the frame rate improved from 15 fps to 100 fps on average, and the worst-case frame rate improved 18×, from 2 fps to 36 fps.
Source: Sun Yat-sen University
Authors: Biao Hu | Uzair Sharif | Rajat Koner | Guang Chen | Kai Huang | Feihu Zhang | Walter Stechele | Alois Knoll