This service takes as input drone-flight video (live or recorded) with metadata in KLV or MAVLink format and generates a classification of the elements present in the footage.
The service implements several methodologies for detecting objects of different classes in the frames that compose a video. Different search algorithms are applied across each frame, combined with different convolutional neural network architectures. This unit can be trained for the specific objects described in the use case.
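To make the idea of searching along a frame concrete, the sketch below shows the simplest such strategy, a sliding window that enumerates candidate regions for a classifier to score. Window size, stride, and frame dimensions are illustrative assumptions; modern detectors such as SSD or R-FCN perform this search implicitly through convolution rather than with an explicit loop.

```python
def sliding_windows(frame_w, frame_h, win_w, win_h, stride):
    """Yield (x, y, w, h) candidate boxes covering the frame.

    Each box would be passed to a classifier; boxes scoring above a
    threshold become detections. Parameters are illustrative only.
    """
    for y in range(0, frame_h - win_h + 1, stride):
        for x in range(0, frame_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)

# Enumerate 128x128 candidate windows over a 640x480 frame with stride 64.
boxes = list(sliding_windows(640, 480, 128, 128, 64))
```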
Architectures
Technically, different convolutional neural network architectures will be implemented with different object detection algorithms. All of them are built on classification network architectures pre-trained on ImageNet:
- Single-Shot Multibox Detector (SSD) with MobileNets
- SSD with Inception V2
- Region-based Fully Convolutional Networks (R-FCN) with ResNet-101
- Region-based Convolutional Neural Networks (R-CNN) with ResNet-101
- R-CNN with ResNet v2
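Since each candidate pairs a detection meta-algorithm with a pre-trained backbone, the selection could be wired up as a simple registry. The sketch below is an assumption about how the service might organize this choice; the registry keys and the `DetectorConfig` structure are hypothetical, only the architecture list itself comes from the document.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DetectorConfig:
    meta_algorithm: str  # detection head: "SSD", "R-FCN", or "R-CNN"
    backbone: str        # ImageNet-pre-trained classification network

# Hypothetical registry of the five candidate architectures listed above.
REGISTRY = {
    "ssd_mobilenet":    DetectorConfig("SSD", "MobileNet"),
    "ssd_inception_v2": DetectorConfig("SSD", "Inception V2"),
    "rfcn_resnet101":   DetectorConfig("R-FCN", "ResNet-101"),
    "rcnn_resnet101":   DetectorConfig("R-CNN", "ResNet-101"),
    "rcnn_resnet_v2":   DetectorConfig("R-CNN", "ResNet v2"),
}

def select_detector(name):
    """Look up the configuration for the requested architecture."""
    return REGISTRY[name]
```

A registry like this keeps model selection a one-line configuration change when benchmarking speed/accuracy trade-offs across the candidates.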
Components
- Computer vision module: Reads the frames from the video stream and applies the analysis and object-identification algorithms. It outputs the 2D coordinates of a rectangle bounding the identified object and the frame in which the identification was made.
- Metadata decoding module: In charge of decoding the metadata channel in Key-Length-Value (KLV) or an equivalent format (to be decided).
- Fusion module: Synchronizes the data coming from the computer vision and metadata decoding modules and calculates the 3D coordinates (latitude, longitude and altitude) of the identified object.
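The framing the metadata decoding module handles can be sketched with a toy KLV parser. Real MISB KLV streams use 16-byte universal keys and BER-encoded lengths; the simplified version below (1-byte key, 1-byte length) only illustrates the key/length/value walk and is not the service's actual decoder.

```python
def parse_klv(buf):
    """Parse a simplified KLV byte stream into (key, value) pairs.

    Simplifying assumption: keys and lengths are each one byte, so every
    packet is laid out as [key][length][value...].
    """
    items = []
    i = 0
    while i + 2 <= len(buf):
        key = buf[i]
        length = buf[i + 1]
        value = buf[i + 2 : i + 2 + length]
        items.append((key, value))
        i += 2 + length
    return items

# Two packets: key 0x01 carrying b"AB", key 0x02 carrying b"C".
stream = bytes([0x01, 2, 65, 66, 0x02, 1, 67])
packets = parse_klv(stream)
```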
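The fusion step can be illustrated with a deliberately simplified geolocation model: given the platform position from the metadata and a detection's pixel centre from the vision module, estimate the object's latitude and longitude. The sketch assumes a nadir-pointing camera, flat ground at zero altitude, and a known horizontal field of view; all of these are simplifying assumptions, not the service's actual geometry.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def geolocate(drone_lat, drone_lon, drone_alt_m,
              px, py, frame_w, frame_h, hfov_deg):
    """Estimate (lat, lon, alt) of a detection, assuming nadir view and flat ground."""
    # Ground sample distance (metres per pixel) from altitude and field of view.
    gsd = 2 * drone_alt_m * math.tan(math.radians(hfov_deg) / 2) / frame_w
    # Pixel offset from the frame centre, converted to metres on the ground.
    east_m = (px - frame_w / 2) * gsd
    north_m = (frame_h / 2 - py) * gsd  # image y grows downward
    # Small-offset metres-to-degrees conversion (equirectangular approximation).
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(drone_lat))))
    return drone_lat + dlat, drone_lon + dlon, 0.0  # object assumed on the ground

# Detection right of and below frame centre, drone at 120 m over (40.0, -3.0).
lat, lon, alt = geolocate(40.0, -3.0, 120.0, 480, 360, 640, 480, 60.0)
```

In practice the module would also need the camera's attitude (from the metadata stream) and a terrain model, but the metres-to-degrees conversion at the core stays the same.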
Execution and deployment
On-Board Edge device
- NVIDIA Jetson Nano
- Raspberry Pi 4 – 2 GB RAM
- Raspberry Pi 3 B+