cursebreaker-parser-rust/DESIGN.md
2025-12-30 18:48:18 +09:00

Cursebreaker Unity Parser - Design Document

Project Overview

A high-performance Rust library for parsing and querying Unity project files (.unity scenes, .prefab prefabs, and .asset ScriptableObjects).

Goals

  1. Parse Unity YAML Format: Handle Unity's YAML 1.1 format with custom tags (!u!) and file ID references
  2. Extract Structure: Parse GameObjects, Components, and their properties into queryable data structures
  3. High Performance: Optimized for large Unity projects with minimal memory footprint
  4. Type Safety: Strong typing for Unity's component system
  5. Library-First: Designed as a reusable SDK for other Rust tools

Target File Formats

  • .unity - Unity scene files
  • .prefab - Unity prefab files
  • .asset - Unity ScriptableObject and other asset files

All three formats share the same underlying YAML structure with Unity-specific extensions, provided the project uses text serialization (Asset Serialization Mode set to "Force Text").

Unity File Format Structure

Unity files use YAML 1.1 with special conventions:

```yaml
%YAML 1.1
%TAG !u! tag:unity3d.com,2011:
--- !u!1 &1866116814460599870
GameObject:
  m_ObjectHideFlags: 0
  m_Component:
  - component: {fileID: 8151827567463220614}
  - component: {fileID: 8755205353704683373}
  m_Name: CardGrabber
--- !u!224 &8151827567463220614
RectTransform:
  m_GameObject: {fileID: 1866116814460599870}
  m_LocalPosition: {x: 0, y: 0, z: 0}
```
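The first parsing step could split a file into documents using only the header lines. This is a minimal, dependency-free sketch that assumes every document starts with a header of the form `--- !u!N &ID`, as in the example above. The names `RawDocument`, `parse_header` and `split_documents` are hypothetical. Each body is a borrowed `&str` slice of the input, so no text is copied. Parsing the YAML inside each body is left to the YAML library.

```rust
/// A borrowed view of one Unity YAML document (zero-copy over the input text).
#[derive(Debug, PartialEq)]
pub struct RawDocument<'a> {
    pub class_id: u32, // N from `!u!N`
    pub file_id: i64,  // ID from `&ID`
    pub body: &'a str, // lines after the header, up to the next `---`
}

/// Parses a header line such as `--- !u!1 &1866116814460599870`.
/// Tokens after the anchor (Unity sometimes appends `stripped`) are ignored.
fn parse_header(line: &str) -> Option<(u32, i64)> {
    let mut parts = line.strip_prefix("--- ")?.split_whitespace();
    let class_id = parts.next()?.strip_prefix("!u!")?.parse::<u32>().ok()?;
    let file_id = parts.next()?.strip_prefix('&')?.parse::<i64>().ok()?;
    Some((class_id, file_id))
}

/// Splits a Unity YAML file into documents. The `%YAML`/`%TAG` directives
/// before the first header are skipped because no document is open yet.
pub fn split_documents(text: &str) -> Vec<RawDocument<'_>> {
    let mut docs = Vec::new();
    // (class_id, file_id, byte offset where the body starts)
    let mut current: Option<(u32, i64, usize)> = None;
    let mut offset = 0; // byte offset of the current line
    for line in text.split_inclusive('\n') {
        let trimmed = line.trim_end(); // also drops `\r` from CRLF files
        if trimmed.starts_with("--- ") {
            // Close the previous document just before this header line.
            if let Some((class_id, file_id, start)) = current.take() {
                docs.push(RawDocument { class_id, file_id, body: &text[start..offset] });
            }
            current = parse_header(trimmed).map(|(c, f)| (c, f, offset + line.len()));
        }
        offset += line.len();
    }
    // The last document runs to the end of the file.
    if let Some((class_id, file_id, start)) = current {
        docs.push(RawDocument { class_id, file_id, body: &text[start..] });
    }
    docs
}
```

Because header detection only inspects line prefixes, this step is cheap enough to run before deciding which documents need full YAML parsing. That could help with the lazy-loading goal.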

Key Concepts

  1. Documents: Each --- starts a new YAML document representing a Unity object
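  2. Type Tags: !u!N gives the object's Unity class ID (e.g., !u!1 = GameObject, !u!224 = RectTransform)
  3. Anchors: &ID defines the object's local file ID, unique within the file
  4. File References: {fileID: N} references an object in the same file by its anchor ID; {fileID: 0} is a null reference
  5. GUID References: {fileID: N, guid: G, type: T} references an object inside another asset, identified by that asset's GUID (stored in its .meta file)
  6. Properties: Each object's serialized fields form a YAML mapping; built-in Unity fields are prefixed with m_, while fields of user scripts (MonoBehaviour) keep their C# names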
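The three reference forms listed above can be distinguished with a small parser. This sketch assumes that references are always written as single-line flow mappings, as in the example file. The `UnityRef` enum and `parse_ref` function are hypothetical. They make the null case explicit, which the `PropertyValue::FileRef` variant in the data model does not.

```rust
/// A reference as written in Unity YAML flow mappings.
#[derive(Debug, PartialEq)]
pub enum UnityRef {
    Null,                                    // {fileID: 0}
    Local(i64),                              // {fileID: N}
    External { file_id: i64, guid: String }, // {fileID: N, guid: G, type: T}
}

/// Parses `{fileID: N}` or `{fileID: N, guid: G, type: T}`; returns None otherwise.
pub fn parse_ref(s: &str) -> Option<UnityRef> {
    let inner = s.trim().strip_prefix('{')?.strip_suffix('}')?;
    let mut file_id = None;
    let mut guid = None;
    for pair in inner.split(',') {
        let (key, value) = pair.split_once(':')?;
        match key.trim() {
            "fileID" => file_id = Some(value.trim().parse::<i64>().ok()?),
            "guid" => guid = Some(value.trim().to_string()),
            _ => {} // `type` and any other keys are ignored in this sketch
        }
    }
    match (file_id?, guid) {
        (0, None) => Some(UnityRef::Null),
        (id, None) => Some(UnityRef::Local(id)),
        (id, Some(guid)) => Some(UnityRef::External { file_id: id, guid }),
    }
}
```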

Architecture

Core Components

```text
cursebreaker-parser/
├── src/
│   ├── lib.rs              # Public API exports
│   ├── parser/             # YAML parsing layer
│   │   ├── mod.rs
│   │   ├── yaml.rs         # YAML document parser
│   │   ├── unity_tag.rs    # Unity type tag handler (!u!)
│   │   └── reference.rs    # FileID/GUID reference parser
│   ├── model/              # Data model
│   │   ├── mod.rs
│   │   ├── document.rs     # UnityDocument struct
│   │   ├── object.rs       # UnityObject base
│   │   ├── gameobject.rs   # GameObject type
│   │   ├── component.rs    # Component types
│   │   └── property.rs     # Property value types
│   ├── types/              # Unity type system
│   │   ├── mod.rs
│   │   ├── type_id.rs      # Unity type ID -> name mapping
│   │   └── component_types.rs
│   ├── query/              # Query API
│   │   ├── mod.rs
│   │   ├── project.rs      # UnityProject (multi-file)
│   │   ├── find.rs         # Find objects/components
│   │   └── filter.rs       # Filter/search utilities
│   └── error.rs            # Error types
```
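The mapping in `type_id.rs` could start as a plain `match`. This sketch lists only a few class IDs from Unity's YAML Class ID Reference. The function name `class_name` is hypothetical. A complete table would more likely be generated from that reference than written by hand. The root mapping key of each document (e.g. `GameObject:`) also carries the class name, so a mapping like this one could also serve to cross-check it.

```rust
/// Maps a Unity class ID (the N in `!u!N`) to its class name.
/// Only a handful of well-known IDs are included in this sketch.
pub fn class_name(type_id: u32) -> Option<&'static str> {
    match type_id {
        1 => Some("GameObject"),
        4 => Some("Transform"),
        20 => Some("Camera"),
        23 => Some("MeshRenderer"),
        33 => Some("MeshFilter"),
        114 => Some("MonoBehaviour"),
        224 => Some("RectTransform"),
        _ => None, // unknown or not yet listed
    }
}
```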

Data Model

```rust
use std::collections::HashMap;
use std::path::PathBuf;

use indexmap::IndexMap;

/// Serialized fields of one object, keyed by field name and kept in file order.
pub type PropertyMap = IndexMap<String, PropertyValue>;

// Core types
pub struct UnityFile {
    pub path: PathBuf,
    pub documents: Vec<UnityDocument>,
}

pub struct UnityDocument {
    pub type_id: u32,           // Class ID from !u!N
    pub file_id: i64,           // Local file ID from &ID
    pub class_name: String,     // Root mapping key, e.g. "GameObject"
    pub properties: PropertyMap,
}

pub struct UnityProject {
    pub files: HashMap<PathBuf, UnityFile>,
    // Reference resolution cache
}

// Property values (simplified)
#[derive(Debug, Clone, PartialEq)]
pub enum PropertyValue {
    Integer(i64),
    Float(f64),
    String(String),
    Boolean(bool),
    FileRef { file_id: i64, guid: Option<String> },
    Vector3 { x: f64, y: f64, z: f64 },
    Color { r: f64, g: f64, b: f64, a: f64 },
    Array(Vec<PropertyValue>),
    Object(PropertyMap),
}
```
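The design does not say how the typed variants such as `Vector3` are produced. One option is a post-pass that promotes a generic `{x, y, z}` mapping into the typed variant. This sketch assumes that approach. It uses a trimmed-down `PropertyValue` with `BTreeMap` in place of `IndexMap` so that it has no dependencies, and `BTreeMap` does not preserve field order. The method names `as_f64` and `promote_vector3` are hypothetical. Note that YAML writes `{x: 0, y: 0, z: 0}` with integer scalars, so integers must be accepted too.

```rust
use std::collections::BTreeMap;

// Trimmed-down stand-in for the design's PropertyValue (sketch only).
#[derive(Debug, Clone, PartialEq)]
pub enum PropertyValue {
    Integer(i64),
    Float(f64),
    Vector3 { x: f64, y: f64, z: f64 },
    Object(BTreeMap<String, PropertyValue>),
}

impl PropertyValue {
    /// Numeric view of a scalar: `0` parses as Integer, `1.5` as Float.
    fn as_f64(&self) -> Option<f64> {
        match self {
            PropertyValue::Integer(v) => Some(*v as f64),
            PropertyValue::Float(v) => Some(*v),
            _ => None,
        }
    }

    /// Turns an Object with exactly the numeric keys x, y, z into Vector3;
    /// any other value is returned unchanged.
    pub fn promote_vector3(self) -> PropertyValue {
        if let PropertyValue::Object(map) = &self {
            if map.len() == 3 {
                let get = |k: &str| map.get(k).and_then(PropertyValue::as_f64);
                if let (Some(x), Some(y), Some(z)) = (get("x"), get("y"), get("z")) {
                    return PropertyValue::Vector3 { x, y, z };
                }
            }
        }
        self
    }
}
```

The same pattern would extend to `Color` with the keys r, g, b, a.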

Performance Considerations

  1. Streaming Parser: Parse YAML incrementally rather than loading the entire file into memory
  2. Lazy Loading: Only parse files when accessed
  3. Reference Caching: Cache resolved references to avoid repeated lookups
  4. Zero-Copy Where Possible: Borrow string slices from the input instead of allocating owned copies
  5. Parallel Parsing: Support parsing multiple files concurrently
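Multi-file parallelism needs no shared mutable state when each file is parsed independently. This sketch uses the standard library's scoped threads (`std::thread::scope`) in place of rayon. It counts documents per file as a stand-in for real parsing, and the function names are hypothetical. It spawns one thread per file, which suits a handful of files. For a whole project, rayon's work-stealing `par_iter` would avoid spawning thousands of threads.

```rust
use std::thread;

/// Stand-in for per-file parsing: counts `--- !u!` document headers.
fn count_documents(text: &str) -> usize {
    text.lines().filter(|l| l.starts_with("--- !u!")).count()
}

/// Processes each file's text on its own scoped thread.
/// Results are returned in the same order as the input.
pub fn count_all(files: &[&str]) -> Vec<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = files
            .iter()
            .map(|text| s.spawn(move || count_documents(text)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Scoped threads can borrow the input slices directly, which keeps this compatible with the zero-copy goal.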

Dependencies

  • yaml-rust2 or serde_yaml - YAML parsing (evaluate both; serde_yaml is no longer maintained)
  • serde - Serialization/deserialization
  • rayon - Parallel processing (optional, for multi-file parsing)
  • thiserror - Error handling
  • indexmap - Ordered maps for properties

Testing Strategy

  1. Unit Tests: Each parser component tested independently
  2. Integration Tests: Full file parsing with real Unity files
  3. Sample Data: Use the PiratePanic project as the test corpus
  4. Benchmarks: Performance tests on large Unity projects
  5. Fuzzing: Fuzz testing for parser robustness (future)

API Design Goals

Simple File Parsing

```rust
let file = UnityFile::from_path("Scene.unity")?;
for doc in &file.documents {
    println!("{}: {}", doc.class_name, doc.file_id);
}
```

Query API

```rust
let project = UnityProject::from_directory("Assets/")?;

// Find all GameObjects
let objects = project.find_all_by_type("GameObject");

// Find by name
let player = project.find_by_name("Player")?;

// Get components
let transform = player.get_component("Transform")?;
let position = transform.get_vector3("m_LocalPosition")?;
```
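Name and component lookups like these could be served by an index from file ID to document. This sketch works within a single file and assumes the documents have already been reduced to a simplified `Doc` record. `Doc`, `Index` and the field names are hypothetical. It returns `Option` where the API above uses `?`. `get_component` follows the GameObject's `m_Component` file IDs and picks the first document whose class matches.

```rust
use std::collections::HashMap;

// Hypothetical, already-parsed view of one document (sketch only).
#[derive(Debug)]
pub struct Doc {
    pub file_id: i64,
    pub class_name: String,
    pub name: Option<String>, // m_Name, for GameObjects
    pub components: Vec<i64>, // fileIDs from m_Component[*].component
}

pub struct Index {
    docs: Vec<Doc>,
    by_id: HashMap<i64, usize>, // file_id -> position in `docs`
}

impl Index {
    pub fn new(docs: Vec<Doc>) -> Self {
        let by_id = docs.iter().enumerate().map(|(i, d)| (d.file_id, i)).collect();
        Index { docs, by_id }
    }

    pub fn find_all_by_type<'a>(&'a self, class: &'a str) -> impl Iterator<Item = &'a Doc> + 'a {
        self.docs.iter().filter(move |d| d.class_name == class)
    }

    pub fn find_by_name(&self, name: &str) -> Option<&Doc> {
        self.find_all_by_type("GameObject")
            .find(|d| d.name.as_deref() == Some(name))
    }

    /// Follows the GameObject's component fileIDs and returns the first match.
    pub fn get_component(&self, go: &Doc, class: &str) -> Option<&Doc> {
        go.components
            .iter()
            .filter_map(|id| self.by_id.get(id).map(|&i| &self.docs[i]))
            .find(|c| c.class_name == class)
    }
}
```

A project-wide version would need to key documents by (path, file ID), because file IDs are unique only within a single file.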

Reference Resolution

```rust
// Follow references automatically
let gameobject = project.get_object(file_id)?;
let transform_ref = gameobject.get_file_ref("m_Component[0].component")?;
let transform = project.resolve_reference(transform_ref)?;
```
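The path string passed to `get_file_ref` above implies a small path syntax: dotted field names with optional `[index]` suffixes. This sketch resolves such a path against a property tree. The grammar is an assumption inferred from that one example. The `Value` enum and `get_path` function are hypothetical and dependency-free.

```rust
use std::collections::BTreeMap;

// Minimal property tree for this sketch.
#[derive(Debug, PartialEq)]
pub enum Value {
    Int(i64),
    FileRef { file_id: i64, guid: Option<String> },
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Resolves a path like `m_Component[0].component`; None if any step is missing.
pub fn get_path<'a>(root: &'a Value, path: &str) -> Option<&'a Value> {
    let mut current = root;
    for segment in path.split('.') {
        // Split "name[0][1]" into the key "name" and the trailing indices.
        let (key, mut rest) = match segment.find('[') {
            Some(i) => (&segment[..i], &segment[i..]),
            None => (segment, ""),
        };
        if !key.is_empty() {
            match current {
                Value::Object(map) => current = map.get(key)?,
                _ => return None,
            }
        }
        // Apply each `[n]` index in turn.
        while let Some(stripped) = rest.strip_prefix('[') {
            let end = stripped.find(']')?;
            let index: usize = stripped[..end].parse().ok()?;
            match current {
                Value::Array(items) => current = items.get(index)?,
                _ => return None,
            }
            rest = &stripped[end + 1..];
        }
        if !rest.is_empty() {
            return None; // malformed segment, e.g. "name[0]x"
        }
    }
    Some(current)
}
```

The result is only the stored `{fileID: ...}` value. Resolving it to a document is a separate lookup, which is why the API above keeps `get_file_ref` and `resolve_reference` as distinct steps.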

Future Enhancements (Out of Scope for v1)

  • Unity YAML serialization (writing files)
  • C# script parsing
  • Asset dependency graphs
  • Unity version detection and compatibility
  • Binary-serialized asset support (projects not using text serialization, common in older Unity versions)
  • Meta file parsing (.meta files)

Success Criteria

  1. Successfully parse all files in PiratePanic sample project
  2. Extract all GameObjects and Components with properties
  3. Resolve all internal file references correctly
  4. Parse large scene files (>10MB) in <100ms
  5. Memory usage scales linearly with file size
  6. Clean, documented public API