This article introduces several datasets I have worked with.

KITTI

The KITTI dataset, jointly created by the Karlsruhe Institute of Technology in Germany and the Toyota Technological Institute in the United States, is a large-scale dataset for autonomous driving scenarios.
The KITTI data collection platform comprises 2 grayscale cameras, 2 color cameras, a Velodyne 3D LiDAR, 4 optical lenses, and 1 GPS navigation system.
In practice, the left color camera and the LiDAR are the most commonly used sensors.

Data Format

| #Values | Name | Description |
| --- | --- | --- |
| 1 | type | Describes the type of object: 'Car', 'Van', 'Truck', 'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram', 'Misc' or 'DontCare' |
| 1 | truncated | Float from 0 (non-truncated) to 1 (truncated), where truncated refers to the object leaving image boundaries |
| 1 | occluded | Integer (0,1,2,3) indicating occlusion state: 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown |
| 1 | alpha | Observation angle of object, ranging [-pi..pi] |
| 4 | bbox | 2D bounding box of object in the image (0-based index): contains left, top, right, bottom pixel coordinates |
| 3 | dimensions | 3D object dimensions: height, width, length (in meters) |
| 3 | location | 3D object location x, y, z in camera coordinates (in meters) |
| 1 | rotation_y | Rotation ry around Y-axis in camera coordinates [-pi..pi] |
| 1 | score | Only for results: Float, indicating confidence in detection, needed for p/r curves, higher is better |
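
To make the layout concrete, here is a minimal sketch of parsing one line of a label file according to the table above (the function name is my own, not part of the official devkit):

```python
def parse_kitti_label_line(line):
    """Parse one line of a KITTI object label into a dict (see table above)."""
    f = line.split()
    return {
        'type': f[0],
        'truncated': float(f[1]),
        'occluded': int(f[2]),
        'alpha': float(f[3]),
        'bbox': [float(v) for v in f[4:8]],         # left, top, right, bottom
        'dimensions': [float(v) for v in f[8:11]],  # height, width, length (m)
        'location': [float(v) for v in f[11:14]],   # x, y, z in camera coords (m)
        'rotation_y': float(f[14]),
        'score': float(f[15]) if len(f) > 15 else None,  # results files only
    }
```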

KITTI 2D Detection

Only the following two downloads are needed:

  • Download left color images of object data set (12 GB)
  • Download training labels of object data set (5 MB)

Training Notes

Classes: 'Car', 'Van', 'Truck', 'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram', 'Misc', and 'DontCare'.
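
When training a 2D detector, one usually keeps only a subset of these classes and skips 'DontCare' regions. A minimal sketch; which classes to keep is an assumed choice, not something prescribed by KITTI:

```python
# Map the classes we train on to contiguous IDs; everything else
# (including 'DontCare') is skipped.
KEEP_CLASSES = {'Car': 0, 'Pedestrian': 1, 'Cyclist': 2}  # assumed choice

def filter_kitti_labels(label_path):
    """Yield (class_id, (left, top, right, bottom)) for kept classes."""
    with open(label_path) as f:
        for line in f:
            fields = line.split()
            if fields[0] not in KEEP_CLASSES:
                continue
            left, top, right, bottom = map(float, fields[4:8])
            yield KEEP_CLASSES[fields[0]], (left, top, right, bottom)
```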

CrowdHuman

The CrowdHuman dataset, released by Megvii, is a dataset for human detection; most of its images come from Google image search. Each image contains roughly 23 people on average, with all kinds of occlusion. Every human instance is annotated with a head bounding box, a visible-region bounding box, and a full-body bounding box.

Dataset Preparation

CrowdHuman official website
Personal mirror link
The dataset comprises three train archives, one val archive, one test archive, and two annotation files in odgt format.

Data Processing

The official annotations come in the odgt format, which training frameworks generally do not consume directly. For example, to train with YOLO, the annotations first have to be converted.
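
For reference, each line of an .odgt file is a self-contained JSON record. A simplified record looks roughly like this (the values are illustrative; the fields follow the official annotation description, where hbox, vbox and fbox are the head, visible-region and full-body boxes, each as [x, y, w, h]):

```
{"ID": "example_image_id",
 "gtboxes": [
   {"tag": "person",
    "hbox": [123, 129, 63, 64],
    "vbox": [108, 115, 233, 220],
    "fbox": [108, 115, 233, 411],
    "extra": {"box_id": 0, "occ": 1}}
 ]}
```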
The following script converts the odgt annotations to YOLO format:

```python
import os
import json
from PIL import Image


def load_func(fpath):
    """Load and parse an ODGT file (one JSON record per line)."""
    assert os.path.exists(fpath), f"File not found: {fpath}"
    with open(fpath, 'r') as fid:
        lines = fid.readlines()
    records = [json.loads(line.strip('\n')) for line in lines]
    return records


def convert_crowdhuman_odgt_to_txt(odgt_path, output_dir):
    """
    Convert CrowdHuman ODGT annotations to YOLO-format TXT files.
    Converts box coordinates from [x, y, w, h] to [x_center, y_center, w, h]
    and normalizes them by the image size.
    """
    # Create the output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)

    # Create classes.txt
    with open(os.path.join(output_dir, 'classes.txt'), 'w') as f:
        f.write('person\n')

    # Load ODGT annotations
    bbox_records = load_func(odgt_path)

    # Process each record
    for record in bbox_records:
        # Get image ID
        image_id = record['ID']
        txt_filename = f"{image_id}.txt"
        txt_path = os.path.join(output_dir, txt_filename)

        # Get image size -- adjust this path if your layout differs
        img_path = os.path.join(os.path.dirname(odgt_path), "images",
                                f"{image_id}.jpg")  # assumes an images/ folder next to the ODGT file
        img = Image.open(img_path)
        img_width, img_height = img.size

        with open(txt_path, 'w') as f:
            for bbox in record['gtboxes']:
                # Skip 'mask' labels and boxes flagged as ignore
                if bbox['tag'] == 'mask' or bbox.get('extra', {}).get('ignore', 0) == 1:
                    continue

                # Full-body box coordinates [x, y, w, h]
                x, y, w, h = bbox['fbox']

                # Compute the box center
                x_center = x + w / 2
                y_center = y + h / 2

                # Normalize the coordinates
                x_center /= img_width
                y_center /= img_height
                w /= img_width
                h /= img_height

                # Write in YOLO format: class_id x_center y_center width height
                bbox_str = f"0 {x_center} {y_center} {w} {h}"
                f.write(f"{bbox_str}\n")


def main():
    # The two paths you need to adapt
    odgt_path = './data/val/annotation_val.odgt'  # Replace with your ODGT file path
    output_dir = './data/val/labels'              # Replace with the desired output directory

    try:
        convert_crowdhuman_odgt_to_txt(odgt_path, output_dir)
        print(f"Conversion finished. Label files saved to: {output_dir}")
        print("\nFormat of each txt file:")
        print("class_id x_center y_center width height")
        print("where class_id=0 means the 'person' class")
        print("Note: coordinates are in YOLO format (normalized center + width/height)")
    except Exception as e:
        print(f"Error during conversion: {str(e)}")


if __name__ == "__main__":
    main()
```

You need to set the dataset paths in the main function; and if the images do not live in an images/ folder next to the odgt file, adjust convert_crowdhuman_odgt_to_txt accordingly. (PS: this is where I realized the dataset really was crawled from the web; many images even carry watermarks, and their sizes are all over the place.)
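
A quick way to sanity-check the conversion is to draw the converted boxes back onto an image. A minimal sketch using PIL (the function name and paths below are mine, purely illustrative):

```python
from PIL import Image, ImageDraw

def preview_yolo_label(img_path, txt_path, out_path):
    """Draw YOLO-format boxes back onto the image as a sanity check."""
    img = Image.open(img_path)
    draw = ImageDraw.Draw(img)
    w_img, h_img = img.size
    with open(txt_path) as f:
        for line in f:
            _, xc, yc, w, h = map(float, line.split())
            # De-normalize from (center, size) back to pixel corners
            left, top = (xc - w / 2) * w_img, (yc - h / 2) * h_img
            right, bottom = (xc + w / 2) * w_img, (yc + h / 2) * h_img
            draw.rectangle([left, top, right, bottom], outline='red', width=2)
    img.save(out_path)

preview_yolo_label('./data/val/images/example.jpg',
                   './data/val/labels/example.txt',
                   './preview.jpg')
```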

nuScenes

nuScenes is a public large-scale dataset for autonomous driving, comprising 1000 driving scenes collected in Boston and Singapore. The cameras run at 12 Hz, while the LiDAR runs at 20 Hz.

Dataset Structure

```
- v1.0-mini
    - maps
        four maps (.jpg)
    - samples
        - CAM_BACK
        - CAM_BACK_LEFT
        - LIDAR_TOP
        - RADAR_BACK_LEFT
        ... (the sensors' data)
    - sweeps
        same layout as 'samples'  # transitional or intermediate (non-key) frames
    - v1.0-mini
        labels (the annotation and metadata tables, .json)
```
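
If you work with the official nuscenes-devkit, the mini split laid out above can be loaded directly. A minimal sketch, assuming the devkit is installed (pip install nuscenes-devkit) and that dataroot points at the folder containing maps/, samples/, sweeps/ and v1.0-mini/:

```python
from nuscenes.nuscenes import NuScenes

# dataroot is the directory that holds maps/, samples/, sweeps/, v1.0-mini/
nusc = NuScenes(version='v1.0-mini', dataroot='./data/nuscenes', verbose=True)

nusc.list_scenes()  # prints the scenes in the mini split

# Key frames live under samples/; each sample links one record per sensor
sample = nusc.sample[0]
cam_front = nusc.get('sample_data', sample['data']['CAM_FRONT'])
print(cam_front['filename'])  # e.g. a .jpg under samples/CAM_FRONT/
```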