# boundingbox

The boundingbox provider module produces simple bounding boxes from a set of input annotations. For more complex outputs, see the localization provider module, which uses the same input source type as boundingbox but generates additional buffers appropriate for the training protocols of Faster-RCNN and SSD. The top-left corner of a bounding box is (xmin, ymin) and the bottom-right corner is (xmax, ymax).

Input data used to provision bounding box output buffers can be provided via the manifest file as pointers to json annotation files:

```
@FILE
/annotations/0001.json
/annotations/0002.json
/annotations/0003.json
```


Note that one would typically not use a boundingbox provider in isolation, but would pair it with the image file that the annotation refers to. The image information is omitted here. Each annotation is a JSON file whose top-level field `"object"` contains the bounding box, class name, and difficulty flag for each object in the image. For example:

```json
{
  "object": [
    {
      "bndbox": {
        "xmax": 262,
        "xmin": 207,
        "ymax": 75,
        "ymin": 10
      },
      "difficult": false,
      "name": "tvmonitor"
    },
    {
      "bndbox": {
        "xmax": 431,
        "xmin": 369,
        "ymax": 335,
        "ymin": 127
      },
      "difficult": false,
      "name": "person"
    }
  ]
}
```
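Reading such an annotation back into box tuples is straightforward; the helper below is an illustrative sketch, not part of the dataloader, and its name is mine:

```python
import json

def load_boxes(path):
    """Read one annotation file and return
    (xmin, ymin, xmax, ymax, name, difficult) tuples."""
    with open(path) as f:
        annot = json.load(f)
    boxes = []
    for obj in annot["object"]:
        bb = obj["bndbox"]
        boxes.append((bb["xmin"], bb["ymin"], bb["xmax"], bb["ymax"],
                      obj["name"], obj["difficult"]))
    return boxes
```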


To generate these JSON files from the XML format used by some object localization datasets such as PASCAL VOC, see the main neon repository.
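The neon repository contains the canonical converter; the transformation itself can be sketched as below. Element names follow the PASCAL VOC XML schema, and the helper name is mine:

```python
import json
import xml.etree.ElementTree as ET

def voc_xml_to_json(xml_path, json_path):
    """Convert a PASCAL VOC XML annotation into the JSON layout shown above."""
    root = ET.parse(xml_path).getroot()
    objects = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            # VOC stores difficulty as "0"/"1"; the JSON layout uses a boolean
            "difficult": obj.findtext("difficult", "0") == "1",
            "bndbox": {k: int(float(bb.findtext(k)))
                       for k in ("xmin", "ymin", "xmax", "ymax")},
        })
    with open(json_path, "w") as f:
        json.dump({"object": objects}, f, indent=2)
```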

The dataloader generates on the fly the targets required for training neon's Faster-RCNN model. Several configuration parameters control how bounding boxes are provisioned:

| Name | Default | Description |
|------|---------|-------------|
| `height` (uint) | *Required* | Height of the provisioned image (pixels) |
| `width` (uint) | *Required* | Width of the provisioned image (pixels) |
| `class_names` (vector of strings) | *Required* | List of class names (e.g. `["person", "tvmonitor"]`). Must match the names provided in the JSON annotation files. |
| `max_bbox_count` (uint) | *Required* | Maximum number of bounding boxes in the output buffer. |
| `name` (string) | `""` | Name prepended to the output buffer name |
| `output_type` (string) | `"uint8_t"` | Output data type. |
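Putting the parameters together, a boundingbox configuration fragment might look like the following. The values are illustrative only, and the exact top-level layout depends on the dataloader version:

```json
{
    "height": 1000,
    "width": 1000,
    "class_names": ["person", "tvmonitor"],
    "max_bbox_count": 64,
    "output_type": "float"
}
```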

Defining M as the `max_bbox_count` parameter and N as the batch size, this provider creates the following buffer, which is consumed by the Faster-RCNN model:

| Buffer Name | Shape | Description |
|-------------|-------|-------------|
| `boundingbox` | (N, M * 4) | Ground truth bounding box coordinates. Boxes are padded into a larger, fixed-size buffer. |
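The padding can be pictured with a small NumPy sketch under the shape convention in the table; the helper name, the zero fill value, and the dtype are assumptions, not aeon's actual implementation:

```python
import numpy as np

def pad_boxes(boxes, max_bbox_count):
    """Pad a variable-length list of (xmin, ymin, xmax, ymax) rows into a
    fixed (max_bbox_count * 4,) buffer, zero-filled past the real boxes.
    One such row is emitted per batch item, giving the (N, M * 4) shape."""
    out = np.zeros((max_bbox_count, 4), dtype=np.float32)
    n = min(len(boxes), max_bbox_count)
    if n:
        out[:n] = np.asarray(boxes[:n], dtype=np.float32)
    return out.reshape(-1)
```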

For Faster-RCNN, we handle variable image sizes by padding each image into a fixed-size canvas before passing it to the network. The image configuration is used as above, with the added flags `crop_enable` set to `False` and `fixed_aspect_ratio` set to `True`. These settings scale the image to the largest size that fits in the output canvas and place it in the upper-left corner. Note that the angle transformation is not supported.

For SSD, we handle variable image sizes by resizing (warping) an image of `input_size` to the network input size `output_size`. The image configuration is used as above, with the added parameters `expand_ratio` set to `{1., 4.}`, `expand_probability` set to `1.`, and `emit_constraint_type` set to `center`. These settings place the original image at a random position inside an output canvas enlarged by a ratio drawn randomly from the range 1 to 4. Expansion is applied before cropping (according to the sampled patch).

For patch sampling, you can define a number of `batch_samplers`. Each batch sampler contains a sampler structure and a sample constraint. If provided, `max_sample` determines how many patch samples (satisfying the constraints) this patch sampler may generate at most in `max_trials` trials during a single patch-sampling step. The constraints, if specified, can be `min_jaccard_overlap`, `max_jaccard_overlap`, or both; at least one ground truth box has to meet the constraints for a sample to be accepted. The sampler parameters (`min_scale`, `max_scale`, `min_aspect_ratio`, and `max_aspect_ratio`) bind the dimensions of the sample to the specified scale and aspect-ratio ranges. Note that the angle transformation is not supported.
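The sampling constraints above hinge on the Jaccard (IoU) overlap between a sampled patch and the ground-truth boxes. A sketch of the acceptance test follows; parameter names mirror the description, but the helpers are illustrative and the actual aeon implementation differs:

```python
def jaccard(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    if inter == 0.0:
        return 0.0
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def patch_satisfies(patch, gt_boxes,
                    min_jaccard_overlap=None, max_jaccard_overlap=None):
    """A sampled patch is accepted when at least one ground-truth box
    meets the configured overlap constraints."""
    for gt in gt_boxes:
        j = jaccard(patch, gt)
        if min_jaccard_overlap is not None and j < min_jaccard_overlap:
            continue
        if max_jaccard_overlap is not None and j > max_jaccard_overlap:
            continue
        return True
    return False
```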