Introduction
This data integration guide helps you integrate imagery and metadata for full motion video captured by an Unmanned Aerial Vehicle (UAV) so that the data can be processed by Vantor Raptor Sync.
For the data to be usable by Raptor Sync, both imagery and metadata need to be translated into a format Raptor Sync understands, and they also need to be within the tolerances for successful geo-registration. The tolerances for geo-registration are described in the Raptor Sync Product Specification.
Canonical Video Logical Representation
The Canonical Video Format (or canv for short) is a simple, human-understandable and well-defined data representation. Its simplicity makes it suitable for integration tasks, data exchange and normalized storage.
This simplicity, together with the support of a small but growing integration toolbox, makes the Canonical Video Format the preferred alternative during the early stages of a Raptor Sync integration project, or during Raptor Sync evaluation.
- In the Canonical Video Format there's an unambiguous one-to-one mapping between any given image in the video and its corresponding metadata record.
- Images are represented as JPEGs, with the requirement that all images for a Canonical Video share the same dimensions.
- Metadata records are represented as JSON objects, and they describe the extrinsic and intrinsic parameters for the pinhole camera capturing the image.
Layout of Metadata
The complete layout for a metadata JSON record is depicted in the subsections below. JSON fields enclosed in <> are intended to be replaced with real data.
{
    "pos": [...],
    "att": [...],
    "lens": {...}
}
Camera World Position
The camera's world position is described by the JSON array "pos". It contains the three values latitude, longitude and height. The camera world position uses WGS84 as its horizontal reference system, where latitude and longitude are given in degrees.
"pos": [
<latitude in degrees>,
<longitude in degrees>,
<height above geoid, in meters and with negative sign>
],
The vertical reference system used is EGM2008 (which roughly equals mean sea level). The height is given in meters and shall be assigned a negative sign for heights that are above the reference. The reason for the negative height is that the reference frame describing the position and attitude is a North-East-Down (NED) system, where the positive z-axis points down. Hence altitudes above the reference are negative.
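As an illustration of the sign convention, consider a camera flying 120 m above the geoid (the coordinates below are made up for the example); its height is stored negated:

```python
# Illustrative camera position; Canonical Video uses a North-East-Down
# sense for heights, so a camera above the geoid gets a negative value.
latitude_deg = 59.3293          # WGS84 latitude, degrees
longitude_deg = 18.0686         # WGS84 longitude, degrees
altitude_above_geoid_m = 120.0  # height above EGM2008, meters

pos = [latitude_deg, longitude_deg, -altitude_above_geoid_m]
```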
Camera Attitude
The camera's attitude is described by the JSON array "att". It contains the three values yaw, pitch and roll, all given in radians. The attitude is applied to a North-East-Down (NED) reference frame, where the x-axis points to the north, the y-axis points to the east and the z-axis points to the center of Earth.
"att": [
<yaw rotation in radians>,
<pitch rotation in radians>,
<roll rotation in radians>
],
The attitude angles are applied in the following order to form the camera's pose:
- The yaw angle is applied to the z-axis. A rotation with a positive sign results in the camera "turning to the right".
- The pitch angle is applied to the y-axis. A rotation with a positive sign results in the camera "looking up".
- The roll angle is applied to the x-axis. A rotation with a positive sign results in the camera "tilting to the right".
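Many sensors report attitude in degrees; if so, the angles must be converted to radians before being written to "att". A minimal sketch with illustrative values:

```python
import math

# Illustrative attitude reported by a sensor, in degrees.
yaw_deg, pitch_deg, roll_deg = 90.0, -15.0, 2.5

# Canonical Video expects radians, in the order yaw, pitch, roll.
att = [math.radians(a) for a in (yaw_deg, pitch_deg, roll_deg)]
```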
Camera Lens
The camera's lens, or intrinsic, parameters are described by two mandatory and three optional parameters in the JSON object "lens".
"lens": {
"hfov": <horizontal field of view in radians>,
"vfov": <vertical field of view in radians>,
"k2": <radial lens distortion coefficient>,
"k3": <radial lens distortion coefficient>,
"k4": <radial lens distortion coefficient>
}
The mandatory parameters are fields of view for the horizontal and vertical image directions, given in radians.
The optional parameters are the coefficients for radial lens distortion. If the camera is calibrated and the coefficients are given, the image will be "undistorted" by Raptor Sync before geo-registration. If coefficients are not provided, they default to zero (0.0).
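The sketch below assembles a "lens" object from illustrative sensor values, adding the distortion coefficients only when a calibration is available (all variable names and values here are made up for the example):

```python
import math

# Illustrative fields of view reported by the sensor, in degrees.
hfov_deg, vfov_deg = 62.0, 37.0

# Optional calibration; None for an uncalibrated camera.
calibration = {'k2': -0.012, 'k3': 0.0004, 'k4': 0.0}  # illustrative

lens = {'hfov': math.radians(hfov_deg),
        'vfov': math.radians(vfov_deg)}
if calibration is not None:
    # Omitted coefficients default to 0.0 in Raptor Sync.
    lens.update(calibration)
```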
Canonical Video Storage
Beside the logical representation of Canonical Video there's also a physical storage format. The storage format is built upon Zip archives acting as containers for JSON documents and JPEG images.
A single Canonical Video consists of two Zip archives: one archive with the file extension .canv and the other with the file extension .ims.
The archive with the .canv extension constitutes the main file for a video. During normal handling it is the only file that explicitly needs to be addressed while working with the tools in the integration toolbox; the tools implicitly access the .ims files when needed.
Metadata records and images are mapped to each other using their filenames inside their respective Zip archives.
Canv (.canv) Files
When listing a file with the .canv extension (e.g., using unzip -l <file.canv>), something like the following can be observed.
M Filemode Length Date Time File
- ---------- -------- ----------- -------- ----------
-rw-rw-r-- 400 13-Apr-2016 17:31:20 0000.json
-rw-rw-r-- 399 13-Apr-2016 17:31:20 0001.json
-rw-rw-r-- 398 13-Apr-2016 17:31:20 0002.json
-rw-rw-r-- 399 13-Apr-2016 17:31:20 0003.json
-rw-rw-r-- 398 13-Apr-2016 17:31:20 0004.json
-rw-rw-r-- 400 13-Apr-2016 17:31:20 0005.json
-rw-rw-r-- 399 13-Apr-2016 17:31:20 0006.json
-rw-rw-r-- 396 13-Apr-2016 17:31:20 0007.json
[..]
-rw-rw-r-- 401 13-Apr-2016 17:31:20 5419.json
-rw-rw-r-- 399 13-Apr-2016 17:31:20 5420.json
-rw-rw-r-- 144 13-Apr-2016 17:31:20 index.json
-rw-rw-r-- 1261 13-Apr-2016 17:31:20 proc.json
- ---------- -------- ----------- -------- ----------
2141665 5423 files
The JSON files with numeric file names are documents representing the camera metadata for each frame in the Canonical Video.
The file index.json is a document with common metadata for the video, and the file proc.json is a document describing how the video was created.
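Since the storage format is plain Zip, a .canv archive can be read with standard tooling. The sketch below builds a tiny in-memory stand-in archive and reads one frame record back with Python's built-in zipfile module (the record content is made up; a real archive holds full metadata records):

```python
import io
import json
import zipfile

# Build a minimal stand-in .canv archive in memory for the example.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('0000.json', json.dumps({'pos': [59.0, 18.0, -120.0]}))
    zf.writestr('index.json', json.dumps({'num_frames': 1}))

# Read the metadata record for frame 0 back out.
with zipfile.ZipFile(buf) as zf:
    record = json.loads(zf.read('0000.json'))
```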
Ims (.ims) Files
The .ims files are similar, but instead of JSON files they contain JPEG images, enumerated and named in the same way as the metadata records in the .canv files.
An .ims file can look like this:
M Filemode Length Date Time File
- ---------- -------- ----------- -------- ----------
-rw-rw-r-- 355154 13-Apr-2016 17:31:20 0000.jpeg
-rw-rw-r-- 368887 13-Apr-2016 17:31:20 0001.jpeg
-rw-rw-r-- 367164 13-Apr-2016 17:31:20 0002.jpeg
-rw-rw-r-- 366716 13-Apr-2016 17:31:20 0003.jpeg
-rw-rw-r-- 365012 13-Apr-2016 17:31:20 0004.jpeg
-rw-rw-r-- 365622 13-Apr-2016 17:31:20 0005.jpeg
-rw-rw-r-- 364577 13-Apr-2016 17:31:20 0006.jpeg
-rw-rw-r-- 364064 13-Apr-2016 17:31:20 0007.jpeg
[..]
-rw-rw-r-- 347939 13-Apr-2016 17:31:20 5419.jpeg
-rw-rw-r-- 343970 13-Apr-2016 17:31:20 5420.jpeg
-rw-rw-r-- 41 13-Apr-2016 17:31:20 index.json
-rw-rw-r-- 1261 13-Apr-2016 17:31:20 proc.json
- ---------- -------- ----------- -------- ----------
1778528143 5423 files
Relation to the Protobuf Format
The native format for communicating with the Raptor Sync Server is a Google Protobuf network message protocol (see the Raptor Sync User Guide for information about the video server and its interfaces). The Protobuf protocol shares many similarities with the Canonical Video Format when it comes to how the data is represented. There are however also a few important differences worth mentioning.
In the Protobuf format:
- The attitude angles yaw, pitch and roll are in degrees instead of in radians.
- The horizontal and vertical fields of view angles are in degrees instead of in radians.
- The image payload can be either PNG or uncompressed grayscale or RGB byte arrays.
The documented Protobuf specification, raptor_sync_message_definition.proto, is included in the software package for the Raptor Sync Server.
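A translation from a Canonical Video record to the Protobuf protocol therefore needs a radians-to-degrees conversion for the angular fields. A minimal sketch (the record below is illustrative; the actual message fields are defined in raptor_sync_message_definition.proto):

```python
import math

# Illustrative Canonical Video record (all angles in radians).
canv_record = {
    'att': [math.radians(90.0), math.radians(-15.0), 0.0],
    'lens': {'hfov': math.radians(62.0), 'vfov': math.radians(37.0)},
}

# The Protobuf protocol expects degrees for attitude and fields of view.
att_deg = [math.degrees(a) for a in canv_record['att']]
hfov_deg = math.degrees(canv_record['lens']['hfov'])
vfov_deg = math.degrees(canv_record['lens']['vfov'])
```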
Tools in the Integration Toolbox
The various tools in the integration toolbox are mentioned and briefly described below.
- Vivid™ Terrain Explorer (previously known as Precision3D Explorer). It provides advanced visualization of the Canonical Video projected on top of the Vivid Terrain surface model.
- raptor-canv. Python package with the functionality to create, read and write Canonical Videos, without the user having to know the details about the physical storage format.
- raptor-sync-cli. Python package with the functionality to interact (start, stop, and geo-register) with the Raptor Sync server. Provides CLI for the geo-registration of Canonical Video.
- raptor-tiny-geo: Python package with simple and limited functionality for conversion between a few common coordinate systems, e.g., conversion between ECEF and WGS84, or between ellipsoid and geoid heights.
- raptor-tiny-cam: Python package with simple and limited functionality for common camera transforms.
Case Study: Convert to Canonical Video
A case study where the Vivid Terrain Explorer Image Folder Format is converted to the Canonical Video Format, utilizing the raptor-canv Python module. Once in the Canonical Video Format, the video can be geo-registered using the raptor-sync-cli package.
The Image Folder Format is used by the Vivid Terrain Explorer to manage image datasets. For this particular case the Image Folder Format uses a file triplet to represent each frame in a sequence: a PNG file with the pixel data and two JSON files, one with the common camera metadata and the other with coefficients for the radial lens distortion. The two JSON files have the file extensions .meta and .dist respectively. Beyond the different file extensions, the three files in a triplet share the same base filename.
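Collecting the file triplets amounts to grouping the files on their base name. A sketch using pathlib, with made-up file names (a .dist file may be missing for uncalibrated frames):

```python
import pathlib

# Illustrative Image Folder contents; frame 0001 has no .dist file.
paths = [pathlib.Path(n) for n in
         ['0000.png', '0000.meta', '0000.dist', '0001.png', '0001.meta']]

# Group the files by base name (stem), keyed on file extension.
by_stem = {}
for p in paths:
    by_stem.setdefault(p.stem, {})[p.suffix] = p

# Build (png, meta, dist) triplets; dist is None when not present.
files = [(d['.png'], d['.meta'], d.get('.dist'))
         for _, d in sorted(by_stem.items())]
```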
Given a file path for the Canonical Video target file, and a list of file triplets from the Image Folder, the core logic for the data conversion is exemplified below. The Python code requires the built-in json, pathlib and typing modules, the Image type from the PIL package and the Canv and Ims types from the vantor_canv package.
import json
import pathlib
from typing import Any, Dict

from PIL import Image
from vantor_canv import Canv, Ims
A function with the core conversion logic from Image Folder to Canonical Video could look like the following.
def if2canv(files: list[tuple], canv_path: pathlib.Path) -> None:
To create a new Canonical Video, the number of frames and the size of the images need to be known.
    image_size = Image.open(files[0][0], formats=['png']).size
    num_files = len(files)
The objects which implement the Canonical Video Format can now be created.
    ims_path = canv_path.with_suffix('.ims')
    canv = Canv.new(canv_path, num_files, image_size,
                    pathlib.Path(ims_path.name), [])
    ims = Ims.new(ims_path, num_files, [])
With the help of a JSON data translation function the entire conversion can be implemented by a simple for-loop, where the converted data and the images are appended to the canv and ims objects.
    for png_file, meta_file, dist_file in files:
        canv_obj = to_canv(read_json(meta_file)['Projector'],
                           read_json(dist_file)['Projection'] if dist_file is not None else None)
        image = Image.open(png_file, formats=['png'])
        canv.append(canv_obj)
        ims.append(image)
For completeness, this is the JSON data conversion function:
def to_canv(meta: Dict[str, Any], dist: Dict[str, Any] | None) -> Dict[str, Any]:
    obj = {
        'cam': {
            'pos': meta['Position'],
            'att': meta['Attitude'],
            'lens': {
                'hfov': meta['FovH'],
                'vfov': meta['FovV']
            }
        }
    }
    if dist is not None:
        obj['cam']['lens']['k2'] = dist['K2']
        obj['cam']['lens']['k3'] = dist['K3']
        obj['cam']['lens']['k4'] = dist['K4']
    return obj
That's all.
Tips & Tricks
To get the expected results from Raptor Sync, it is important that both the metadata and the image data are within the tolerances of Raptor Sync (see the Raptor Sync Product Specification for further details). A first step in integrating Raptor Sync with a new source of data is to successfully translate the data to the Canonical Video Format, and then to visualize and assess the result in Vivid Terrain Explorer.
Geographic Location
The first thing is to get the geographic location of the video right. Does the general location look okay when the video is viewed in the Vivid™ Terrain Explorer? Good! Continue with the camera orientation then.
If not, there are a few tips & tricks to consider:
- Are latitude and longitude mixed up? The order shall be latitude first, then longitude.
- Are latitude and longitude given in degrees? If not, convert them to degrees (in Python the math.degrees() function can be used).
- Is the height given with a negative sign? If not, just add a minus sign (-) in front of the value.
Camera Attitude
Once the geographic location is in place the next step is to get the camera orientation somewhat right. In the visualization in Vivid Terrain Explorer the camera orientation is easy to notice as a video frame is graphically represented by its camera center, camera frustum (the lines from the camera center to the image corners) and the image itself.
If it does not look as expected, the following is worth considering:
- Have the attitude angles been given in the expected order? They should come in the order yaw, pitch, roll.
- Are the attitude angles given in radians? If not, convert them to radians (in Python the math.radians() function can be used).
- Is the camera turning in the opposite direction from what is expected? Perhaps the attitude angles are not adapted to the same sort of NED frame as Raptor Sync and Canonical Video (see the section about Camera Attitude for more details). Try adding a minus sign (-) in front of an attitude angle that turns in the opposite direction.
Image Compared to Surface Model
With both the geographic location and the camera orientation in place, the next step is to compare the visual content of the image with the surface model in the immediate surroundings of where the image is projected.
If the image is visually similar to the surface model beneath it, i.e., the same stuff is visible, but with some spatial or perspective displacement, the data is ready to be geo-registered using Raptor Sync.
If not, a few tips & tricks can help:
- Is there a big difference in scale between the image and the surface model? Perhaps the field of view values have been given in degrees instead of radians? Convert them to radians (in Python the math.radians() function can be used).
- Does the image appear to have an incorrect aspect ratio? Perhaps the values for horizontal and vertical field of view have been swapped?
- Is there a minor difference in scale between the image and the surface model? One probable cause is that the height is based on an incorrect vertical reference (placing the camera too low or too high, and resulting in a minor scaling difference). Canonical Video expects the height to be relative to the EGM2008 geoid (in Python, code in the raptor-tiny-geo package can be used for conversion).
- An alternative cause for a minor difference in scale is that the camera sensor reports slightly erroneous field of view values. In that case the camera needs to be calibrated (see the Raptor Sync Product Specification for details about field of view tolerances; calibration is outside the scope of this Integration Guide, though).