PSP Benchmark

Overview

To facilitate model-specific aerial object detection, we focus on two representative object categories in aerial imagery, ships and aircraft, and construct two fine-grained object detection datasets, PSP.Ship and PSP.Plane. Existing fine-grained datasets often mix specific classes (e.g., Asagiri-class) with generic, unresolved sub-categories (e.g., Destroyer) at the same taxonomic level. In contrast, our datasets introduce a carefully curated hierarchical taxonomy that organizes object models consistently at a fine-grained level. Within this taxonomy, PSP.Ship contains 106 ship models and PSP.Plane contains 30 airplane models, which is the most extensive coverage of model-specific categories among existing fine-grained aerial object detection datasets.

Highlights:
The most extensive coverage of model-specific categories.
Carefully curated hierarchical taxonomy.
Fine-grained attribute annotations.

News

[2026-05-11] Dataset homepage is released.
[YYYY-MM-DD] Evaluation server is online.
[YYYY-MM-DD] Dataset download links are available.

Dataset Information

This section describes the data sources, annotation protocol, hierarchical taxonomy, attribute annotations, and dataset splits of PSP.Ship and PSP.Plane.

Task Definition

The goal of this benchmark is model-specific aerial object detection. Given an aerial image, algorithms are required to detect all target objects and predict their rotated bounding boxes as well as their fine-grained model labels. Compared with conventional object detection that only recognizes coarse categories such as ship or airplane, this task requires distinguishing visually similar object models with subtle differences in size, style, and structure.
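
A rotated bounding box is often stored as a center, a size, and an angle, while the submission format described under Evaluation lists four corner points. The conversion between the two can be sketched as follows. This is a minimal illustration, assuming the angle is given in radians and applied with the standard 2D rotation matrix in image coordinates. The page does not state which box parameterization the annotations use, and `rbox_to_corners` is a hypothetical helper, not part of the released toolkit.

```python
import math


def rbox_to_corners(cx, cy, w, h, angle):
    """Convert (center, size, angle in radians) to four (x, y) corner points."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Corner offsets of the unrotated box, listed in order around the box.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset about the origin, then translate to the box center.
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    ]
```

With an angle of zero the function returns the corners of an ordinary axis-aligned box, which makes it easy to sanity-check against horizontal annotations.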

PSP.Ship

Data overview of PSP.Ship.

The images in PSP.Ship are collected from Google Earth imagery and selected images from existing datasets, including ShipRSImageNet, MCSD, and DOTA-v2, with spatial resolutions ranging from 0.1 m to 2 m. For the Google Earth imagery, ship instances are manually annotated with rotated bounding boxes and model-specific labels. For the selected images from existing datasets, the annotations are derived from the original labels and further refined.

In particular, only some instances in ShipRSImageNet are annotated with specific ship models, while the rest carry unresolved generic categories such as "other destroyer". Meanwhile, ship instances in MCSD are labeled with subcategory names such as "destroyer", and instances in DOTA-v2 are labeled only at the coarse level of "ship". We therefore refine these annotations by assigning model-specific ship labels to unresolved instances from ShipRSImageNet as well as those from MCSD and DOTA-v2. In addition, some of the original bounding boxes are horizontal or fail to tightly enclose objects, so we manually correct them into more accurate rotated bounding boxes.

Overall, PSP.Ship contains 1,953 images with 5,214 ship instances covering 106 ship models. The dataset is randomly split into training and test sets at a ratio of 4:1. The training set contains 1,563 images with 4,181 instances spanning 105 ship models, while the test set contains 390 images with 1,033 instances spanning 91 ship models.

PSP.Ship Taxonomy

Instead of treating each ship model as an independent flat category, we organize the 106 ship models into a four-level hierarchical taxonomy following publicly available ship classification schemes. The first level is the root category ship. The second level groups ships by operational roles into six categories: surface combatants (SC), amphibious warfare ships (AW), auxiliary ships (AS), aircraft carriers (AC), cargo vessels (Cargo), and mine warfare vessels (MW). The third level corresponds to more specific ship types under each operational role. For example, surface combatants are further divided into Cruiser, Destroyer, Frigate, and Littoral Combat Ship. The fourth level corresponds to specific ship classes, which constitute the model-level categories in the dataset, such as Akizuki-class, Arleigh Burke-class, and Asagiri-class.
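
The four-level hierarchy can be represented as a child-to-parent mapping, from which the ancestor of any ship class at a given level is recovered by walking upward. The sketch below covers only the nodes named in the paragraph above. Placing the three example classes under Destroyer follows their public designation rather than the released taxonomy file, whose format is not described here. The names `PARENT`, `path_from_root`, and `ancestor_at_level` are hypothetical.

```python
# Child -> parent links for a small excerpt of the PSP.Ship taxonomy.
# Only nodes named in the text are listed; the released taxonomy file may use
# a different format and different identifiers (this layout is an assumption).
PARENT = {
    "surface combatants": "ship",       # level 2 -> level 1
    "Cruiser": "surface combatants",    # level 3 -> level 2
    "Destroyer": "surface combatants",
    "Frigate": "surface combatants",
    "Littoral Combat Ship": "surface combatants",
    "Akizuki-class": "Destroyer",       # level 4 -> level 3
    "Arleigh Burke-class": "Destroyer",
    "Asagiri-class": "Destroyer",
}


def path_from_root(label: str) -> list[str]:
    """Return the chain [root, ..., label] by following parent links."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def ancestor_at_level(label: str, level: int) -> str:
    """Return the ancestor of `label` at a 1-indexed hierarchy level."""
    path = path_from_root(label)
    if not 1 <= level <= len(path):
        raise ValueError(f"{label!r} has no ancestor at level {level}")
    return path[level - 1]
```

A parent map keeps the taxonomy in a single flat table, so extending the excerpt to all 106 ship models would only require adding entries.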

PSP.Ship Attributes

For each ship class, we annotate 11 attributes derived from publicly available information. These attributes capture visual cues from three perspectives: absolute size cues, including overall length and beam; stylistic semantic cues, represented by the operator; and structural detail cues, including bow, stern, midships structure, funnel, fore bulwark, vertical replenishment point, gun type, and missile launcher type.

The attribute values include numerical measurements, binary indicators, categorical labels, and textual descriptors, providing rich auxiliary knowledge for fine-grained model discrimination.

PSP.Plane

Data overview of PSP.Plane.

The images in PSP.Plane are collected from existing datasets, namely MAR20 and FAIR1M-v2, with spatial resolutions ranging from 0.3 m to 1.1 m. We retain the original category annotations and directly adopt the official training and validation splits of MAR20 and FAIR1M-v2, which are merged into a unified dataset.

The resulting PSP.Plane dataset contains 6,192 training images with 28,216 instances and 4,567 validation images with 23,572 instances. Both splits cover 30 airplane model categories.

PSP.Plane Taxonomy

Similar to PSP.Ship, the 30 airplane models in PSP.Plane are organized into a four-level hierarchical taxonomy according to publicly available airplane classification schemes. The first level is the root category plane. The second level groups planes by usage into military (M) and civil (C). The third level further refines each branch into functional plane types. For example, the civil branch includes wide-body airliner, narrow-body airliner, and regional airliner. The fourth level corresponds to specific airplane models such as Airbus A330, Boeing 747, and COMAC C919.

PSP.Plane Attributes

For each airplane model, we annotate 9 attributes covering the same aspects as those in PSP.Ship, including absolute size cues, stylistic semantic cues, and structural detail cues. Specifically, the absolute size cues include wingspan, length, and aspect ratio; the stylistic semantic cue is represented by the manufacturer; and the structural detail cues include wing shape, wing position, engine number, canard, and tail shape.

Dataset Summary

Dataset	Image Sources	Spatial Resolution	Images	Instances	Categories	Attributes
PSP.Ship	Google Earth, ShipRSImageNet, MCSD, DOTA-v2	0.1 m – 2 m	1,953	5,214	106 ship models	11
PSP.Plane	MAR20, FAIR1M-v2	0.3 m – 1.1 m	10,759	51,788	30 airplane models	9

Dataset Statistics

The following statistics summarize the scale of PSP.Ship and PSP.Plane.

PSP.Ship Images: 1,953
PSP.Ship Instances: 5,214
Ship Models: 106
Ship Attributes: 11
PSP.Plane Images: 10,759
PSP.Plane Instances: 51,788
Airplane Models: 30
Airplane Attributes: 9

Dataset

Usage License

License: This dataset is available for non-commercial academic research only. Users must not redistribute the dataset or use it for commercial purposes without prior permission from the dataset owners.

Target Categories

Dataset	Root Category	Model Categories	Description
PSP.Ship	Ship	106	Model-specific ship detection in aerial images.
PSP.Plane	Plane	30	Model-specific airplane detection in aerial images.

Download Links

File	Description	Size	Download Link
PSP.Ship Training Set	Training images and annotations for PSP.Ship	To be updated	Download
PSP.Plane Training Set	Training annotations and formatting script for PSP.Plane	To be updated	Download
Taxonomy and Attribute Files	Hierarchical taxonomy and attribute annotations for PSP.Ship	To be updated	Download
Toolkit	Dataloader and evaluator based on MMRotate-1.x framework	To be updated	Download

Evaluation

Submission Format

Please upload your detection results as a compressed file (e.g., .zip) containing prediction files for all test images. Each prediction should include the image identifier, category label, confidence score, and rotated bounding box coordinates.

image_name category_name score x1 y1 x2 y2 x3 y3 x4 y4
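
Writing and reading one result line in this layout can be sketched as below. Several details are assumptions, since the page does not specify them: fields are separated by single spaces, image names contain no spaces, and the numeric precision is arbitrary. Because model names such as Arleigh Burke-class contain spaces, the parser treats the last nine tokens as the score and coordinates and joins everything between the image name and the score into the category name. How the evaluation server handles such names is not documented here, and both function names are hypothetical.

```python
def format_prediction(image_name, category_name, score, corners):
    """Build one result line: image_name category_name score x1 y1 ... x4 y4."""
    if len(corners) != 4:
        raise ValueError("a rotated box needs exactly four (x, y) corners")
    coords = " ".join(f"{x:.1f} {y:.1f}" for x, y in corners)
    return f"{image_name} {category_name} {score:.3f} {coords}"


def parse_prediction(line):
    """Split a result line back into its fields.

    The last nine tokens are the score and eight coordinates, the first token
    is the image name, and whatever lies between is the category name. This
    tolerates category names that contain spaces (an assumption on our side).
    """
    tokens = line.split()
    if len(tokens) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(tokens)}")
    image_name = tokens[0]
    category_name = " ".join(tokens[1:-9])
    score = float(tokens[-9])
    values = [float(v) for v in tokens[-8:]]
    corners = list(zip(values[0::2], values[1::2]))
    return image_name, category_name, score, corners
```

Checking that every line survives a round trip through both functions is a cheap way to catch formatting mistakes before uploading.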

Evaluation Metrics

We adopt mean Average Precision (mAP) at an IoU threshold of 0.5 as the basic evaluation metric. Let i index the hierarchy level. We report two types of hierarchical mAP: the multi-level fine-grained mAP (mAP_fⁱ) and the multi-level coarse-grained mAP (mAP_cⁱ).

Metric	Definition
mAP@0.5	Mean Average Precision at an IoU threshold of 0.5 over all model-specific categories.
mAP_fⁱ	Multi-level fine-grained mAP at the i-th hierarchy level. It evaluates model-specific detection performance. The score of a coarse category is computed as the mean AP over all its model-specific subclasses.
mAP_cⁱ	Multi-level coarse-grained mAP at the i-th hierarchy level. It evaluates detection performance at the current semantic granularity by ignoring confusion among model-specific classes under the same ancestor category. A prediction is regarded as correct if the predicted class and the ground-truth class belong to the same category at the corresponding hierarchy level.
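
The two hierarchical metrics differ in where the taxonomy enters the computation, which the following sketch tries to make concrete. It is one reading of the definitions above and not the official evaluator. It assumes per-model AP values are already available and that `ancestor_of` maps each model label to its category at hierarchy level i. Both function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fine_grained_scores(ap_per_model, ancestor_of):
    """Score each level-i category as the mean AP of its model-specific subclasses.

    ap_per_model: {model_label: AP}
    ancestor_of:  {model_label: category at hierarchy level i}
    """
    grouped = defaultdict(list)
    for model, ap in ap_per_model.items():
        grouped[ancestor_of[model]].append(ap)
    return {category: mean(aps) for category, aps in grouped.items()}


def coarsen_labels(labels, ancestor_of):
    """Replace model labels by their level-i ancestors before AP is computed,
    so confusion between models under the same ancestor is not penalized."""
    return [ancestor_of[label] for label in labels]
```

Under this reading, the fine-grained metric averages AP values that were computed on model-specific labels, whereas the coarse-grained metric would pass both ground-truth and predicted labels through `coarsen_labels` and then run a standard AP computation at IoU 0.5.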

Evaluation Server

Submit detection results and view evaluation reports through the online evaluation server. The leaderboard is available within the Evaluation Server.

Open Evaluation Server

The following tables report the performance comparison between general detectors and state-of-the-art fine-grained object detection methods on the PSP.Ship and PSP.Plane datasets.

PSP.Ship

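Performance comparison between general detectors and state-of-the-art fine-grained object detection methods on PSP.Ship. Values in parentheses are the change relative to the base detector named in parentheses after the method.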
Method	Publication	Backbone	Sch.	Ship	SC	AW	CG	AS	MW	AC	#Params.	GFLOPs
One-stage
R. RetinaNet	TPAMI 2020	Swin-T	2×	24.9	18.3	32.7	14.6	19.7	5.9	31.9	37.2	161.7
R. FCOS	ICCV 2019	Swin-T	2×	59.0	47.4	59.9	48.1	46.0	51.5	66.2	35.1	131.6
R. ATSS	CVPR 2020	Swin-T	2×	61.0	49.6	57.9	44.7	49.3	44.7	77.5	35.3	134.3
R³Det	AAAI 2021	Swin-T	2×	32.1	33.6	33.8	20.3	20.2	22.7	45.3	42.6	237.7
S²A-Net	TGRS 2021	Swin-T	2×	45.5	45.0	49.1	38.7	28.9	21.6	63.1	37.7	128.1
R. GLIP	CVPR 2022	Swin-T	2×	57.4	59.4	56.5	35.4	41.3	32.3	71.4	221.2	150.0
OM (R. FCOS)	IGARSS 2025	Swin-T	2×	60.8 (+1.8)	56.3	56.0	48.2	44.2	45.2	81.8	35.3	134.4
Transformer-based
R. Deformable DETR	ICLR 2021	Swin-T	50e	30.1	30.5	36.2	18.6	20.9	15.4	29.0	40.9	136.0
RHINO	WACV 2025	Swin-T	2×	63.8	63.8	55.5	46.5	50.7	36.4	68.5	51.0	193.0
R. Grounding DINO	ECCV 2024	Swin-T	2×	58.6	55.3	50.7	50.0	44.3	40.1	72.7	164.9	152.0
Two-stage
R. Faster R-CNN	TPAMI 2016	Swin-T	2×	61.9	57.6	56.9	51.9	48.7	21.0	75.9	44.4	138.0
Oriented R-CNN	ICCV 2021	Swin-T	2×	79.3	79.8	69.6	68.6	58.9	41.8	93.7	44.5	138.2
ReDet	CVPR 2021	ReR50	2×	68.5	69.7	67.0	56.9	50.4	50.1	59.7	30.8	60.5
Gliding Vertex	TPAMI 2021	Swin-T	2×	62.1	58.0	50.7	57.2	48.0	50.2	71.4	44.5	138.2
EQLv2 (Oriented R-CNN)	TPAMI 2023	Swin-T	2×	60.3 (-19.0)	62.2	57.2	41.3	51.5	26.8	37.1	44.5	138.1
LogN (Oriented R-CNN)	IJCV 2024	Swin-T	2×	71.9 (-7.4)	70.5	70.0	62.5	50.5	48.9	83.3	44.5	138.2
R. HiCLPL (R. Faster R-CNN)	ACMMM 2022	Swin-T	2×	16.1 (-45.8)	9.7	21.3	12.9	10.2	6.5	34.3	46.7	138.2
ISCL (Oriented R-CNN)	TGRS 2022	Swin-T	2×	76.4 (-2.9)	80.3	65.9	71.1	53.8	44.3	89.8	58.2	152.7
PETDet (Oriented R-CNN)	TGRS 2023	Swin-T	2×	76.7 (-2.6)	79.4	67.4	55.6	57.6	61.7	79.3	46.2	133.0
PCLDet (Oriented R-CNN)	TGRS 2023	Swin-T	2×	78.4 (-0.9)	80.4	66.0	63.7	58.0	58.5	89.8	46.0	139.8
OM (Oriented R-CNN)	IGARSS 2025	Swin-T	2×	77.9 (-1.4)	79.2	67.7	55.6	57.6	61.7	79.3	45.6	138.2
EagleVision (Oriented R-CNN)	arXiv 2025	Swin-T	3×	64.5 (-14.8)	61.1	63.4	59.3	46.0	37.4	75.7	669.0	142.0
Ours
ExpertDet (R. ATSS)	-	Swin-T	2×	63.0 (+2.0)	54.0	55.8	48.4	50.4	54.5	75.9	34.2	134.3
ExpertDet (ReDet)	-	ReR50	2×	70.9 (+2.4)	69.2	67.1	60.4	52.3	55.2	70.4	31.0	60.5
ExpertDet (Oriented R-CNN)	-	Swin-T	2×	82.5 (+3.2)	81.1	68.5	68.5	63.4	60.3	93.9	44.6	138.2

PSP.Plane

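Performance comparison on PSP.Plane. Values in parentheses are the change relative to the base detector named in parentheses after the method.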
Method	Airplane	Military	Civil
One-stage
R. RetinaNet	58.5	72.5	30.6
R. FCOS	61.9	76.9	31.8
R. ATSS	62.6	77.6	32.8
R³Det	56.5	71.6	26.4
S²A-Net	62.6	79.6	28.7
R. GLIP	58.4	72.4	30.3
OM (R. FCOS)	64.2 (+2.3)	80.5	31.7
Transformer-based
R. Deformable DETR	51.8	65.0	25.6
RHINO	60.0	76.2	27.8
R. Grounding DINO	59.3	74.7	28.3
Two-stage
R. Faster R-CNN	61.1	78.6	26.3
Oriented R-CNN	66.4	84.7	29.8
ReDet	67.9	85.7	32.4
Gliding Vertex	57.7	73.9	25.4
EQLv2 (Oriented R-CNN)	66.1 (-0.3)	85.1	28.2
LogN (Oriented R-CNN)	64.4 (-2.0)	83.5	26.3
R. HiCLPL (R. Faster R-CNN)	59.2 (-1.9)	75.3	26.9
ISCL (Oriented R-CNN)	65.1 (-1.4)	82.2	30.8
PETDet (Oriented R-CNN)	63.1 (-3.3)	79.5	30.4
PCLDet (Oriented R-CNN)	66.3 (-0.1)	84.3	30.2
OM (Oriented R-CNN)	63.5 (-2.9)	79.2	31.9
EagleVision (Oriented R-CNN)	66.3 (-0.1)	84.1	30.6
Ours
ExpertDet (R. ATSS)	64.9 (+2.3)	80.5	33.8
ExpertDet (ReDet)	68.8 (+0.9)	86.1	34.1
ExpertDet (Oriented R-CNN)	67.4 (+1.0)	85.7	30.7

Code for ExpertDet