PSP Benchmark for Model-Specific Aerial Ship and Plane Detection

Overview

To facilitate model-specific aerial object detection, we focus on two representative object categories in aerial imagery, ships and airplanes, and construct two fine-grained object detection datasets, PSP.Ship and PSP.Plane. Existing fine-grained datasets often mix specific classes (e.g., Asagiri-class) with generic, unresolved sub-categories (e.g., Destroyer) at the same taxonomic level. Our datasets instead introduce a carefully curated hierarchical taxonomy that organizes object models consistently at a fine-grained level. Within this taxonomy, PSP.Ship contains 106 ship models and PSP.Plane contains 30 airplane models, providing the most extensive coverage of model-specific categories among existing fine-grained aerial object detection datasets.

Highlights:
  • The most extensive coverage of model-specific categories.
  • Carefully curated hierarchical taxonomy.
  • Fine-grained attribute annotations.

News

  • [2026-05-11] Dataset homepage is released.
  • [YYYY-MM-DD] Evaluation server is online.
  • [YYYY-MM-DD] Dataset download links are available.

Dataset Information

This section describes the data sources, annotation protocol, hierarchical taxonomy, attribute annotations, and dataset splits of PSP.Ship and PSP.Plane.

Task Definition

The goal of this benchmark is model-specific aerial object detection. Given an aerial image, an algorithm must detect all target objects and predict both a rotated bounding box and a fine-grained model label for each. Unlike conventional object detection, which only recognizes coarse categories such as ship or airplane, this task requires distinguishing visually similar object models that differ only subtly in size, style, and structure.

PSP.Ship

The images in PSP.Ship are collected from Google Earth imagery and selected images from existing datasets, including ShipRSImageNet, MCSD, and DOTA-v2, with spatial resolutions ranging from 0.1 m to 2 m. For the Google Earth imagery, ship instances are manually annotated with rotated bounding boxes and model-specific labels. For the selected images from existing datasets, the annotations are derived from the original labels and further refined.

In particular, only some instances in ShipRSImageNet are annotated with specific ship models, while the rest are labeled with unresolved generic categories such as "other destroyer". Ship instances in MCSD are labeled with subcategory names such as "destroyer", and instances in DOTA-v2 are labeled only at the coarse level of "ship". We therefore refine these annotations by assigning model-specific ship labels to the unresolved instances from ShipRSImageNet and to all instances from MCSD and DOTA-v2. In addition, some of the original bounding boxes are horizontal or fail to enclose objects tightly, so we manually correct them into more accurate rotated bounding boxes.

Overall, PSP.Ship contains 1,953 images with 5,214 ship instances covering 106 ship models. The dataset is randomly split into training and test sets at a ratio of 4:1. The training set contains 1,563 images with 4,181 instances spanning 105 ship models, while the test set contains 390 images with 1,033 instances spanning 91 ship models.
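A random 4:1 split of this kind can be sketched as follows. This is only an illustration: the function name, the seed, the placeholder image ids, and the rounding rule are assumptions, and the snippet does not reproduce the released split.

```python
import random


def split_4_to_1(image_ids, seed=0):
    """Randomly split image ids into train/test parts at roughly 4:1.

    The seed and the round-up rule are assumptions made for illustration.
    """
    ids = sorted(image_ids)              # fix the order before shuffling
    random.Random(seed).shuffle(ids)     # deterministic shuffle for a given seed
    n_train = (4 * len(ids) + 4) // 5    # integer ceil(4n / 5)
    return ids[:n_train], ids[n_train:]


# Placeholder ids; the real file names are not known here.
train_ids, test_ids = split_4_to_1(f"img_{i:04d}" for i in range(1953))
print(len(train_ids), len(test_ids))  # 1563 390
```

With 1,953 ids and this rounding rule, the two parts have the same sizes as the reported training and test sets, although which images land in each part depends entirely on the assumed seed.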

PSP.Ship Taxonomy

Instead of treating each ship model as an independent flat category, we organize the 106 ship models into a four-level hierarchical taxonomy following publicly available ship classification schemes. The first level is the root category ship. The second level groups ships by operational roles into six categories: surface combatants (SC), amphibious warfare ships (AW), auxiliary ships (AS), aircraft carriers (AC), cargo vessels (Cargo), and mine warfare vessels (MW). The third level corresponds to more specific ship types under each operational role. For example, surface combatants are further divided into Cruiser, Destroyer, Frigate, and Littoral Combat Ship. The fourth level corresponds to specific ship classes, which constitute the model-level categories in the dataset, such as Akizuki-class, Arleigh Burke-class, and Asagiri-class.
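One way to hold such a taxonomy in code is a child-to-parent map, from which the path to the root and the ancestor at any level can be recovered. The sketch below covers only a small slice of the tree; placing the three example classes under Destroyer is an assumption made for illustration, and the helper names are hypothetical. The released taxonomy file remains the authority.

```python
# Child -> parent links for a small slice of the four-level taxonomy.
# Assumption: the three example classes sit under "Destroyer".
PARENT = {
    "SC": "Ship",
    "Cruiser": "SC",
    "Destroyer": "SC",
    "Frigate": "SC",
    "Littoral Combat Ship": "SC",
    "Akizuki-class": "Destroyer",
    "Arleigh Burke-class": "Destroyer",
    "Asagiri-class": "Destroyer",
}


def path_to_root(label):
    """Return the labels from the root (level 1) down to `label`."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def ancestor_at_level(label, level):
    """Return the ancestor of `label` at a 1-indexed hierarchy level."""
    path = path_to_root(label)
    if not 1 <= level <= len(path):
        raise ValueError(f"{label!r} has no ancestor at level {level}")
    return path[level - 1]


print(path_to_root("Asagiri-class"))
# ['Ship', 'SC', 'Destroyer', 'Asagiri-class']
print(ancestor_at_level("Asagiri-class", 2))
# SC
```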

PSP.Ship Attributes

For each ship class, we annotate 11 attributes derived from publicly available information. These attributes capture visual cues from three perspectives: absolute size cues, including overall length and beam; stylistic semantic cues, represented by the operator; and structural detail cues, including bow, stern, midships structure, funnel, fore bulwark, vertical replenishment point, gun type, and missile launcher type.

The attribute values include numerical measurements, binary indicators, categorical labels, and textual descriptors, providing rich auxiliary knowledge for fine-grained model discrimination.
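A per-class attribute record could be represented as a small typed structure. The sketch below is hypothetical: the field names mirror the 11 attributes listed above, but the value types, the unit (metres), and every example value are assumptions and are not taken from the released attribute files.

```python
from dataclasses import dataclass, asdict, fields


@dataclass(frozen=True)
class ShipAttributes:
    # Absolute size cues (numerical; metres are an assumed unit)
    length_m: float
    beam_m: float
    # Stylistic semantic cue
    operator: str
    # Structural detail cues (the value types here are assumptions)
    bow: str
    stern: str
    midships_structure: str
    funnel: str
    fore_bulwark: bool
    vertical_replenishment_point: bool
    gun_type: str
    missile_launcher_type: str


# Made-up values for a fictional class, for illustration only.
example = ShipAttributes(
    length_m=150.0, beam_m=20.0, operator="Example Navy",
    bow="raked", stern="transom", midships_structure="single block",
    funnel="twin", fore_bulwark=True, vertical_replenishment_point=False,
    gun_type="single turret", missile_launcher_type="vertical launch",
)
print(len(fields(ShipAttributes)))   # 11
print(asdict(example)["operator"])   # Example Navy
```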

PSP.Plane

The images in PSP.Plane are collected from existing datasets, namely MAR20 and FAIR1M-v2, with spatial resolutions ranging from 0.3 m to 1.1 m. We retain the original category annotations and directly adopt the official training and validation splits of MAR20 and FAIR1M-v2, which are merged into a unified dataset.

The resulting PSP.Plane dataset contains 6,192 training images with 28,216 instances and 4,567 validation images with 23,572 instances. Both splits cover 30 airplane model categories.

PSP.Plane Taxonomy

Similar to PSP.Ship, the 30 airplane models in PSP.Plane are organized into a four-level hierarchical taxonomy according to publicly available airplane classification schemes. The first level is the root category plane. The second level groups planes by usage into military (M) and civil (C). The third level further refines each branch into functional plane types. For example, the civil branch includes wide-body airliner, narrow-body airliner, and regional airliner. The fourth level corresponds to specific airplane models such as Airbus A330, Boeing 747, and COMAC C919.

PSP.Plane Attributes

For each airplane model, we annotate 9 attributes covering the same aspects as those in PSP.Ship, including absolute size cues, stylistic semantic cues, and structural detail cues. Specifically, the absolute size cues include wingspan, length, and aspect ratio; the stylistic semantic cue is represented by the manufacturer; and the structural detail cues include wing shape, wing position, engine number, canard, and tail shape.

Dataset Summary

| Dataset | Image Sources | Spatial Resolution | Images | Instances | Categories | Attributes |
|---|---|---|---|---|---|---|
| PSP.Ship | Google Earth, ShipRSImageNet, MCSD, DOTA-v2 | 0.1 m – 2 m | 1,953 | 5,214 | 106 ship models | 11 |
| PSP.Plane | MAR20, FAIR1M-v2 | 0.3 m – 1.1 m | 10,759 | 51,788 | 30 airplane models | 9 |

Dataset Statistics

The following statistics summarize the scale of PSP.Ship and PSP.Plane.

  • PSP.Ship: 1,953 images, 5,214 instances, 106 ship models, 11 ship attributes.
  • PSP.Plane: 10,759 images, 51,788 instances, 30 airplane models, 9 airplane attributes.

Dataset

Usage License

License: This dataset is available for non-commercial academic research only. Users must not redistribute the dataset or use it for commercial purposes without prior permission from the dataset owners.

Target Categories

| Dataset | Root Category | Model Categories | Description |
|---|---|---|---|
| PSP.Ship | Ship | 106 | Model-specific ship detection in aerial images. |
| PSP.Plane | Plane | 30 | Model-specific airplane detection in aerial images. |

Download Links

| File | Description | Size | Download Link |
|---|---|---|---|
| PSP.Ship Training Set | Training images and annotations for PSP.Ship | To be updated | Download |
| PSP.Plane Training Set | Training annotations and formatting script for PSP.Plane | To be updated | Download |
| Taxonomy and Attribute Files | Hierarchical taxonomy and attribute annotations for PSP.Ship | To be updated | Download |
| Toolkit | Dataloader and evaluator based on MMRotate-1.x framework | To be updated | Download |

Evaluation

Submission Format

Please upload your detection results as a compressed file (e.g., .zip) containing prediction files for all test images. Each prediction should include the image identifier, category label, confidence score, and rotated bounding box coordinates.

image_name category_name score x1 y1 x2 y2 x3 y3 x4 y4
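The line layout above can be written and read back with a few lines of Python. This is a minimal sketch under stated assumptions: fields are separated by single spaces, the score and coordinates use the decimal precision chosen here, and category names may contain spaces (for example "Arleigh Burke-class"), so the parser treats the first token as the image name and the last nine tokens as the score and coordinates. The function names are hypothetical, and the evaluation server's own parser may differ.

```python
def format_prediction(image_name, category_name, score, corners):
    """Serialize one detection as 'image_name category_name score x1 y1 ... x4 y4'."""
    if len(corners) != 4:
        raise ValueError("a rotated box needs exactly four (x, y) corners")
    coords = " ".join(f"{v:.1f}" for point in corners for v in point)
    return f"{image_name} {category_name} {score:.4f} {coords}"


def parse_prediction(line):
    """Parse one line; the category may span several space-separated tokens."""
    parts = line.split()
    if len(parts) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(parts)}")
    image_name = parts[0]
    category_name = " ".join(parts[1:-9])   # everything between name and score
    score = float(parts[-9])
    values = [float(v) for v in parts[-8:]]
    corners = [(values[i], values[i + 1]) for i in range(0, 8, 2)]
    return image_name, category_name, score, corners


line = format_prediction(
    "P0001", "Arleigh Burke-class", 0.9,
    [(10, 20), (110, 20), (110, 50), (10, 50)],
)
print(line)
# P0001 Arleigh Burke-class 0.9000 10.0 20.0 110.0 20.0 110.0 50.0 10.0 50.0
print(parse_prediction(line)[1])
# Arleigh Burke-class
```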

Evaluation Metrics

We adopt mean Average Precision (mAP) at an IoU threshold of 0.5 as the basic evaluation metric. Let i denote the index of a hierarchy level. We report two types of hierarchical mAP: the multi-level fine-grained mAP (mAP^f_i) and the multi-level coarse-grained mAP (mAP^c_i).
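To make the IoU criterion concrete for rotated boxes, the sketch below computes the IoU of two convex quadrilaterals in pure Python, using Sutherland-Hodgman clipping for the intersection and the shoelace formula for areas. It is an illustration only: the function names are hypothetical, and the toolkit's evaluator remains the reference for official scores.

```python
def polygon_area(poly):
    """Signed shoelace area: positive for counter-clockwise vertex order."""
    return 0.5 * sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1])
    )


def _ccw(poly):
    """Return the polygon in counter-clockwise order."""
    return poly if polygon_area(poly) >= 0 else poly[::-1]


def clip_convex(subject, clip):
    """Sutherland-Hodgman: intersect convex `subject` with convex CCW `clip`."""
    output = list(subject)
    for (ax, ay), (bx, by) in zip(clip, clip[1:] + clip[:1]):
        if not output:
            break
        inputs, output = output, []

        def side(p):
            # > 0 when p lies to the left of the directed edge a -> b
            return (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)

        for p, q in zip(inputs, inputs[1:] + inputs[:1]):
            sp, sq = side(p), side(q)
            if sp >= 0:
                output.append(p)               # p is inside or on the edge
            if (sp > 0 > sq) or (sp < 0 < sq):
                t = sp / (sp - sq)             # crossing point along p -> q
                output.append((p[0] + t * (q[0] - p[0]),
                               p[1] + t * (q[1] - p[1])))
    return output


def rotated_iou(box_a, box_b):
    """IoU of two convex boxes, each given as four (x, y) corners in order."""
    a = _ccw([tuple(p) for p in box_a])
    b = _ccw([tuple(p) for p in box_b])
    inter_poly = clip_convex(a, b)
    inter = abs(polygon_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = polygon_area(a) + polygon_area(b) - inter
    return inter / union if union > 0 else 0.0


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
diamond = [(1, 0), (2, 1), (1, 2), (0, 1)]   # a rotated box inside the square
print(rotated_iou(square, diamond))          # 0.5
```

In this toy case the rotated box sits exactly on the 0.5 threshold; whether the boundary value itself counts as a match depends on the evaluator's comparison and is not specified here.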

| Metric | Definition |
|---|---|
| mAP@0.5 | Mean Average Precision at an IoU threshold of 0.5 over all model-specific categories. |
| mAP^f_i | Multi-level fine-grained mAP at the i-th hierarchy level. It evaluates model-specific detection performance. The score of a coarse category is computed as the mean AP over all its model-specific subclasses. |
| mAP^c_i | Multi-level coarse-grained mAP at the i-th hierarchy level. It evaluates detection performance at the current semantic granularity by ignoring confusion among model-specific classes under the same ancestor category. A prediction is regarded as correct if the predicted class and the ground-truth class belong to the same category at the corresponding hierarchy level. |
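The two hierarchical rules described above can be sketched on a toy taxonomy: the fine-grained score of a coarse category is the mean AP of its model-level descendants, and the coarse-grained rule accepts a prediction whose ancestor at the given level equals that of the ground truth. Everything concrete below is an assumption for illustration: the taxonomy slice, the made-up AP values, and the helper names. The snippet does not reproduce the official evaluator.

```python
from statistics import mean

# Toy child -> parent links; grouping these classes under "Destroyer" is assumed.
PARENT = {
    "SC": "Ship",
    "Destroyer": "SC",
    "Akizuki-class": "Destroyer",
    "Arleigh Burke-class": "Destroyer",
    "Asagiri-class": "Destroyer",
}


def path_to_root(label):
    """Return the labels from the root (level 1) down to `label`."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def fine_grained_score(ap_per_model, coarse_category):
    """Mean AP over the model-level classes that descend from `coarse_category`."""
    aps = [ap for model, ap in ap_per_model.items()
           if coarse_category in path_to_root(model)]
    return mean(aps)


def same_at_level(pred_label, gt_label, level):
    """Coarse-grained match: equal ancestors at the 1-indexed `level`."""
    return path_to_root(pred_label)[level - 1] == path_to_root(gt_label)[level - 1]


# Made-up per-model AP values, not benchmark results.
ap_per_model = {"Akizuki-class": 0.8, "Arleigh Burke-class": 0.6, "Asagiri-class": 0.7}
print(round(fine_grained_score(ap_per_model, "Destroyer"), 3))   # 0.7
print(same_at_level("Akizuki-class", "Asagiri-class", 3))        # True
print(same_at_level("Akizuki-class", "Asagiri-class", 4))        # False
```

Under the coarse-grained rule, confusing two classes that share the same level-3 ancestor still counts as correct at level 3, while the same confusion is an error at the model level.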

Evaluation Server

Submit detection results and view evaluation reports through the online evaluation server, which also hosts the leaderboard.

The following tables compare general detectors with state-of-the-art fine-grained object detection methods on the PSP.Ship and PSP.Plane datasets.

PSP.Ship

Performance comparison between general detectors and state-of-the-art fine-grained object detection methods on PSP.Ship.
| Type | Method | Publication | Backbone | Sch. | Ship | SC | AW | CG | AS | MW | AC | #Params. | GFLOPs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| One-stage | R. RetinaNet | TPAMI 2020 | Swin-T | | 24.9 | 18.3 | 32.7 | 14.6 | 19.7 | 5.9 | 31.9 | 37.2 | 161.7 |
| One-stage | R. FCOS | ICCV 2019 | Swin-T | | 59.0 | 47.4 | 59.9 | 48.1 | 46.0 | 51.5 | 66.2 | 35.1 | 131.6 |
| One-stage | R. ATSS | CVPR 2020 | Swin-T | | 61.0 | 49.6 | 57.9 | 44.7 | 49.3 | 44.7 | 77.5 | 35.3 | 134.3 |
| One-stage | R3Det | AAAI 2021 | Swin-T | | 32.1 | 33.6 | 33.8 | 20.3 | 20.2 | 22.7 | 45.3 | 42.6 | 237.7 |
| One-stage | S2A-Net | TGRS 2021 | Swin-T | | 45.5 | 45.0 | 49.1 | 38.7 | 28.9 | 21.6 | 63.1 | 37.7 | 128.1 |
| One-stage | R. GLIP | CVPR 2022 | Swin-T | | 57.4 | 59.4 | 56.5 | 35.4 | 41.3 | 32.3 | 71.4 | 221.2 | 150.0 |
| One-stage | OM (R. FCOS) | IGARSS 2025 | Swin-T | | 60.8 (+1.8) | 56.3 | 56.0 | 48.2 | 44.2 | 45.2 | 81.8 | 35.3 | 134.4 |
| Transformer-based | R. Deformable DETR | ICLR 2021 | Swin-T | 50e | 30.1 | 30.5 | 36.2 | 18.6 | 20.9 | 15.4 | 29.0 | 40.9 | 136.0 |
| Transformer-based | RHINO | WACV 2025 | Swin-T | | 63.8 | 63.8 | 55.5 | 46.5 | 50.7 | 36.4 | 68.5 | 51.0 | 193.0 |
| Transformer-based | R. Grounding DINO | ECCV 2024 | Swin-T | | 58.6 | 55.3 | 50.7 | 50.0 | 44.3 | 40.1 | 72.7 | 164.9 | 152.0 |
| Two-stage | R. Faster R-CNN | TPAMI 2016 | Swin-T | | 61.9 | 57.6 | 56.9 | 51.9 | 48.7 | 21.0 | 75.9 | 44.4 | 138.0 |
| Two-stage | Oriented R-CNN | ICCV 2021 | Swin-T | | 79.3 | 79.8 | 69.6 | 68.6 | 58.9 | 41.8 | 93.7 | 44.5 | 138.2 |
| Two-stage | ReDet | CVPR 2021 | ReR50 | | 68.5 | 69.7 | 67.0 | 56.9 | 50.4 | 50.1 | 59.7 | 30.8 | 60.5 |
| Two-stage | Gliding Vertex | TPAMI 2021 | Swin-T | | 62.1 | 58.0 | 50.7 | 57.2 | 48.0 | 50.2 | 71.4 | 44.5 | 138.2 |
| Two-stage | EQLv2 (Oriented R-CNN) | TPAMI 2023 | Swin-T | | 60.3 (-19.0) | 62.2 | 57.2 | 41.3 | 51.5 | 26.8 | 37.1 | 44.5 | 138.1 |
| Two-stage | LogN (Oriented R-CNN) | IJCV 2024 | Swin-T | | 71.9 (-7.4) | 70.5 | 70.0 | 62.5 | 50.5 | 48.9 | 83.3 | 44.5 | 138.2 |
| Two-stage | R. HiCLPL (R. Faster R-CNN) | ACMMM 2022 | Swin-T | | 16.1 (-45.8) | 9.7 | 21.3 | 12.9 | 10.2 | 6.5 | 34.3 | 46.7 | 138.2 |
| Two-stage | ISCL (Oriented R-CNN) | TGRS 2022 | Swin-T | | 76.4 (-2.9) | 80.3 | 65.9 | 71.1 | 53.8 | 44.3 | 89.8 | 58.2 | 152.7 |
| Two-stage | PETDet (Oriented R-CNN) | TGRS 2023 | Swin-T | | 76.7 (-2.6) | 79.4 | 67.4 | 55.6 | 57.6 | 61.7 | 79.3 | 46.2 | 133.0 |
| Two-stage | PCLDet (Oriented R-CNN) | TGRS 2023 | Swin-T | | 78.4 (-0.9) | 80.4 | 66.0 | 63.7 | 58.0 | 58.5 | 89.8 | 46.0 | 139.8 |
| Two-stage | OM (Oriented R-CNN) | IGARSS 2025 | Swin-T | | 77.9 (-1.4) | 79.2 | 67.7 | 55.6 | 57.6 | 61.7 | 79.3 | 45.6 | 138.2 |
| Two-stage | EagleVision (Oriented R-CNN) | arXiv 2025 | Swin-T | | 64.5 (-14.8) | 61.1 | 63.4 | 59.3 | 46.0 | 37.4 | 75.7 | 669.0 | 142.0 |
| Ours | ExpertDet (R. ATSS) | - | Swin-T | | 63.0 (+2.0) | 54.0 | 55.8 | 48.4 | 50.4 | 54.5 | 75.9 | 34.2 | 134.3 |
| Ours | ExpertDet (ReDet) | - | ReR50 | | 70.9 (+2.4) | 69.2 | 67.1 | 60.4 | 52.3 | 55.2 | 70.4 | 31.0 | 60.5 |
| Ours | ExpertDet (Oriented R-CNN) | - | Swin-T | | 82.5 (+3.2) | 81.1 | 68.5 | 68.5 | 63.4 | 60.3 | 93.9 | 44.6 | 138.2 |

PSP.Plane

Performance comparison on PSP.Plane.
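| Type | Method | Airplane | Military | Civil |
|---|---|---|---|---|
| One-stage | R. RetinaNet | 58.5 | 72.5 | 30.6 |
| One-stage | R. FCOS | 61.9 | 76.9 | 31.8 |
| One-stage | R. ATSS | 62.6 | 77.6 | 32.8 |
| One-stage | R3Det | 56.5 | 71.6 | 26.4 |
| One-stage | S2A-Net | 62.6 | 79.6 | 28.7 |
| One-stage | R. GLIP | 58.4 | 72.4 | 30.3 |
| One-stage | OM (R. FCOS) | 64.2 (+2.3) | 80.5 | 31.7 |
| Transformer-based | R. Deformable DETR | 51.8 | 65.0 | 25.6 |
| Transformer-based | RHINO | 60.0 | 76.2 | 27.8 |
| Transformer-based | R. Grounding DINO | 59.3 | 74.7 | 28.3 |
| Two-stage | R. Faster R-CNN | 61.1 | 78.6 | 26.3 |
| Two-stage | Oriented R-CNN | 66.4 | 84.7 | 29.8 |
| Two-stage | ReDet | 67.9 | 85.7 | 32.4 |
| Two-stage | Gliding Vertex | 57.7 | 73.9 | 25.4 |
| Two-stage | EQLv2 (Oriented R-CNN) | 66.1 (-0.3) | 85.1 | 28.2 |
| Two-stage | LogN (Oriented R-CNN) | 64.4 (-2.0) | 83.5 | 26.3 |
| Two-stage | R. HiCLPL (R. Faster R-CNN) | 59.2 (-1.9) | 75.3 | 26.9 |
| Two-stage | ISCL (Oriented R-CNN) | 65.1 (-1.4) | 82.2 | 30.8 |
| Two-stage | PETDet (Oriented R-CNN) | 63.1 (-3.3) | 79.5 | 30.4 |
| Two-stage | PCLDet (Oriented R-CNN) | 66.3 (-0.1) | 84.3 | 30.2 |
| Two-stage | OM (Oriented R-CNN) | 63.5 (-2.9) | 79.2 | 31.9 |
| Two-stage | EagleVision (Oriented R-CNN) | 66.3 (-0.1) | 84.1 | 30.6 |
| Ours | ExpertDet (R. ATSS) | 64.9 (+2.3) | 80.5 | 33.8 |
| Ours | ExpertDet (ReDet) | 68.8 (+0.9) | 86.1 | 34.1 |
| Ours | ExpertDet (Oriented R-CNN) | 67.4 (+1.0) | 85.7 | 30.7 |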

Code for ExpertDet