2025-12-30 12:16:52 +09:00

Unity Parser Design Document

Overview

Unity Parser is a Rust library for parsing local Unity projects (scenes and prefabs) from their YAML representation (.unity and .prefab files) and loading the resulting data into an ECS world.

The primary goal is to enable users to:

  • Selectively extract only the data they care about (minimal memory footprint).
  • Mirror Unity MonoBehaviour types in Rust with minimal boilerplate.
  • Query the fully instantiated scene (including all nested prefabs) using ECS queries.

Use cases include:

  • Modding tools
  • Static analysis
  • Database generation
  • Asset inspection / reporting
  • Custom exporters

The library is offline-only: it works exclusively on exported Unity project files (YAML + assets). No runtime or in-engine integration is planned.

Core Principles

  • Minimal memory usage: Only parse and store components explicitly requested by the user.
  • Fast setup: Users declare desired types via a single procedural macro.
  • Full prefab instantiation: All prefabs (including nested/variant) are fully expanded into the scene.
  • Simple querying: Users work directly with the ECS world (Sparsey) or optional helper methods.

Architecture

ECS Backend

  • Sparsey is used as the ECS implementation.
    • Rationale: Lightweight, excellent insertion performance, no archetype overhead.
    • Query performance trade-off is acceptable because queries are infrequent (typically once or a few times per tool run, not per-frame like in games).
  • Each loaded scene gets its own World (Sparsey terminology).
  • The ECS world is exposed directly to users for maximum flexibility.
  • Optional ergonomic helpers may be added later (e.g., scene.foreach::<(GameObject, Transform, Interactable)>(|...|)).

Data Flow

  1. User configures which component types to parse (via macro).
  2. Library scans project for relevant .unity, .prefab, and .meta files.
  3. Scenes and prefabs are stream-parsed as YAML.
  4. Only declared components are deserialized and inserted.
  5. Prefab instances are recursively instantiated (new fileID mapping per nesting level).
  6. After all objects are created, world transforms are computed in a post-process pass.
  7. Resulting World is returned (or cached).

User Configuration

Users declare all desired types with a single procedural macro:

#[unity_parser(
    // Built-in Unity components (non-script)
    unity_types(Transform, MeshFilter, MeshRenderer, Collider /* ... */),
    
    // Custom MonoBehaviour components
    custom_types(Interactable, Harvestable, LootContainer, EnemyAI),
    
    // Asset types beyond scenes/prefabs (future extension)
    asset_types(/* Material, Texture2D */)
)]
struct MyProjectConfig;
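A rough sketch of what the macro could expand to, assuming it generates a trait implementation on the config struct. The `ParserConfig` trait, its constants, and `wants_class` are hypothetical. The numbers are Unity's YAML class IDs (the `XXX` in `--- !u!XXX`); mapping the abstract `Collider` onto concrete subclasses such as BoxCollider and MeshCollider is an assumption about how that entry would be handled.

```rust
/// Hypothetical trait that `#[unity_parser(...)]` could implement for the config struct.
trait ParserConfig {
    /// Unity class IDs (the `XXX` in `--- !u!XXX &fileID`) of built-in components to keep.
    const UNITY_CLASS_IDS: &'static [u32];
    /// C# class names listed in `custom_types`, resolved to script GUIDs later.
    const CUSTOM_SCRIPTS: &'static [&'static str];

    /// Decides from the document header alone whether a document is worth deserializing.
    fn wants_class(class_id: u32) -> bool {
        const MONO_BEHAVIOUR: u32 = 114;
        Self::UNITY_CLASS_IDS.contains(&class_id)
            // Every custom script is stored as a MonoBehaviour (class 114); those
            // documents are kept here and filtered by script GUID afterwards.
            || (class_id == MONO_BEHAVIOUR && !Self::CUSTOM_SCRIPTS.is_empty())
    }
}

struct MyProjectConfig;

// Roughly what the macro could generate for the example above.
impl ParserConfig for MyProjectConfig {
    const UNITY_CLASS_IDS: &'static [u32] = &[
        4,  // Transform
        33, // MeshFilter
        23, // MeshRenderer
        65, // BoxCollider  (assumed expansion of `Collider`)
        64, // MeshCollider (assumed expansion of `Collider`)
    ];
    const CUSTOM_SCRIPTS: &'static [&'static str] =
        &["Interactable", "Harvestable", "LootContainer", "EnemyAI"];
}
```

Keeping the decision on class IDs means it can be made from the header line before any of the document body is read.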

Rules

  • unity_types: Built-in Unity components (no associated script).
  • custom_types: User-defined structs that mirror MonoBehaviour scripts.
    • Struct name must exactly match the C# class name.
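    • The parser will automatically locate the corresponding .cs file and read its GUID from the accompanying .cs.meta file; MonoBehaviour entries in the YAML are matched through the guid in their m_Script reference.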
  • Users must explicitly list every component they want. Nothing is parsed by default.
  • Examples and common sets will be provided in documentation.
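Unity stores each script's GUID in its `.cs.meta` file, and every MonoBehaviour document points back to it through its `m_Script` reference. A minimal sketch of the matching step, using plain string scanning rather than a YAML library; `script_guid`, `match_custom_type`, and the GUID-to-class map are hypothetical names, not the library's API.

```rust
use std::collections::HashMap;

/// Pulls the GUID out of a MonoBehaviour's script reference, e.g.
/// `m_Script: {fileID: 11500000, guid: 0123456789abcdef0123456789abcdef, type: 3}`.
fn script_guid(mono_behaviour: &str) -> Option<&str> {
    let line = mono_behaviour
        .lines()
        .map(str::trim_start)
        .find(|l| l.starts_with("m_Script:"))?;
    let start = line.find("guid:")? + "guid:".len();
    // The GUID runs until the next `,` or the closing `}` of the inline mapping.
    let guid = line[start..].split(|c: char| c == ',' || c == '}').next()?.trim();
    (!guid.is_empty()).then_some(guid)
}

/// Maps a MonoBehaviour document to one of the requested `custom_types`.
/// `class_by_guid` (script GUID -> C# class name) is assumed to be built
/// beforehand from each script's `.cs.meta` file.
fn match_custom_type<'a>(
    mono_behaviour: &str,
    class_by_guid: &'a HashMap<String, String>,
) -> Option<&'a str> {
    let guid = script_guid(mono_behaviour)?;
    class_by_guid.get(guid).map(String::as_str)
}
```

MonoBehaviours whose GUID is not in the map belong to scripts the user did not request and can be dropped.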

Component Definition

Components are plain Rust structs mirroring Unity's serialized fields.

#[derive(Component)]
struct Transform {
    local_position: Vec3,
    local_rotation: Quat,
    local_scale: Vec3,
    world_matrix: Mat4,        // Computed in post-process
    parent: Option<Entity>,
    children: Vec<Entity>,
}

#[derive(Component)]
struct Interactable {
    interaction_prompt: String,
    radius: f32,
}
  • Users can implement custom parsing logic if needed.
  • Derive macros will offer automatic field parsing for common cases.
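A sketch of what hand-written parsing logic might look like, assuming a hypothetical `FromUnityFields` trait that receives the top-level `key: value` pairs of one YAML document (a derive macro could generate the same impl). It assumes the C# fields are named `interactionPrompt` and `radius`, since Unity serializes fields under their C# names; the field collector is deliberately naive.

```rust
use std::collections::HashMap;

/// Hypothetical trait for building a component from one YAML document's
/// top-level fields; a derive macro could generate these impls.
trait FromUnityFields: Sized {
    fn from_fields(fields: &HashMap<String, String>) -> Self;
}

#[derive(Debug, Default, PartialEq)]
struct Interactable {
    interaction_prompt: String,
    radius: f32,
}

impl FromUnityFields for Interactable {
    fn from_fields(fields: &HashMap<String, String>) -> Self {
        Interactable {
            // Unity writes the C# field names, so the snake_case mapping is explicit.
            interaction_prompt: fields.get("interactionPrompt").cloned().unwrap_or_default(),
            // Missing or unparsable values fall back to the type's default.
            radius: fields
                .get("radius")
                .and_then(|v| v.parse::<f32>().ok())
                .unwrap_or_default(),
        }
    }
}

/// Collects `key: value` pairs indented by exactly two spaces (the direct
/// fields of a document). Nested mappings and list items are not handled correctly.
fn flat_fields(doc: &str) -> HashMap<String, String> {
    doc.lines()
        .filter(|l| l.starts_with("  ") && !l.starts_with("   "))
        .filter_map(|l| l.trim().split_once(": "))
        .map(|(k, v)| (k.to_string(), v.trim().to_string()))
        .collect()
}
```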

Special Cases

  • GameObject: Not a true component, but stored as a component containing:
    • name: String
    • layer: u32
    • active: bool

Prefab Instantiation

  • Full support for nested prefabs (modern Unity prefab workflow).
  • Strategy:
    • Prefabs are parsed exactly like scenes.
    • When a PrefabInstance is encountered, the referenced prefab is loaded recursively.
    • A new HashMap<fileID → Entity> mapping is created for each nesting level.
    • Overrides are applied only to property values (via propertyPath); this is the current scope.
    • TODO: Support added/removed components, reordered children, removed GameObjects.
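The nesting strategy can be sketched with plain integers standing in for Sparsey entities. Everything here (`Prefab`, `Override`, `World`, `instantiate`) is hypothetical; real PrefabInstance documents list their overrides as modifications with a target, a propertyPath and a value, which `Override` loosely models. The sketch assumes prefab references contain no cycles and resolves overrides only against objects defined directly in the nested prefab.

```rust
use std::collections::HashMap;

type FileId = i64;
type Entity = u32;

/// Stand-in for a parsed prefab.
struct Prefab {
    /// fileIDs of objects defined directly in this prefab.
    objects: Vec<FileId>,
    /// Nested PrefabInstances: (name of the source prefab, its property overrides).
    nested: Vec<(&'static str, Vec<Override>)>,
}

/// One property override of a PrefabInstance.
struct Override {
    target: FileId,        // fileID of the object inside the source prefab
    property_path: String, // e.g. "m_Name" or "m_LocalPosition.x"
    value: String,
}

/// Stand-in for the ECS world: an entity counter plus overridden properties.
#[derive(Default)]
struct World {
    next_entity: Entity,
    properties: HashMap<(Entity, String), String>,
}

/// Instantiates `name` and, recursively, every nested prefab. Each call builds
/// its own fileID -> Entity map, so equal fileIDs at different nesting levels
/// never collide. Panics on an unknown prefab name (a real loader would warn).
fn instantiate(name: &str, prefabs: &HashMap<&str, Prefab>, world: &mut World) -> HashMap<FileId, Entity> {
    let prefab = &prefabs[name];
    let mut ids = HashMap::new();
    for &file_id in &prefab.objects {
        ids.insert(file_id, world.next_entity);
        world.next_entity += 1;
    }
    for (source, overrides) in &prefab.nested {
        // New mapping for the nested level.
        let nested_ids = instantiate(source, prefabs, world);
        // Only property overrides are applied.
        for o in overrides {
            if let Some(&entity) = nested_ids.get(&o.target) {
                world.properties.insert((entity, o.property_path.clone()), o.value.clone());
            }
        }
    }
    ids
}
```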

Asset Handling

All parsable assets implement a trait:

trait AssetParser: Sized { // `Sized` is needed because `parse` returns `Self` inside a `Result`
    fn extensions() -> &'static [&'static str];
    fn parse(yaml: &YamlNode, context: &ParseContext) -> Result<Self>;
}
  • Built-in: .unity (scenes), .prefab (prefabs).
  • .meta files are parsed to build GUID ↔ path mappings.
  • Future extension possible for other YAML assets (e.g., ScriptableObjects).
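The GUID ↔ path mapping can be sketched as a small two-way index. It assumes each `.meta` file has a top-level `guid:` line and that the asset path is the meta path minus its `.meta` suffix, which matches Unity's layout; `GuidIndex` and `add_meta` are hypothetical names, and walking the project directory (e.g. with `std::fs::read_dir`) is left out.

```rust
use std::collections::HashMap;

/// Returns the value of the top-level `guid:` line of a `.meta` file.
fn guid_from_meta(meta_text: &str) -> Option<&str> {
    meta_text
        .lines()
        .find_map(|line| line.strip_prefix("guid:"))
        .map(str::trim)
        .filter(|guid| !guid.is_empty())
}

/// Two-way GUID <-> asset path index.
#[derive(Default)]
struct GuidIndex {
    path_by_guid: HashMap<String, String>,
    guid_by_path: HashMap<String, String>,
}

impl GuidIndex {
    /// Registers one `.meta` file given its path and contents.
    fn add_meta(&mut self, meta_path: &str, meta_text: &str) {
        let (Some(asset_path), Some(guid)) = (meta_path.strip_suffix(".meta"), guid_from_meta(meta_text)) else {
            return; // not a .meta file or no GUID: skip (a real tool would log a warning)
        };
        self.path_by_guid.insert(guid.to_string(), asset_path.to_string());
        self.guid_by_path.insert(asset_path.to_string(), guid.to_string());
    }
}
```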

Selective Parsing & Memory

  • Only components listed in the config macro are parsed.
  • During YAML streaming, documents of unrequested component types (!u!XXX, where XXX is the Unity class ID) are skipped entirely: no allocation, no temporary structures.
  • Goal: Load even very large scenes (hundreds of thousands of objects) into moderate RAM when only a subset of components is requested.
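A minimal sketch of header-based skipping, assuming documents are split on Unity's `--- !u!<classID> &<fileID>` header lines (some headers end in ` stripped`). Lines are read into one reused buffer, so lines belonging to unwanted documents are discarded without a per-line allocation; only wanted documents are copied out. `parse_header` and `wanted_documents` are hypothetical names, and the kept bodies would still need a YAML parser.

```rust
use std::io::{self, BufRead};

/// Parses a header like `--- !u!4 &123456789` (optionally ending in ` stripped`)
/// into `(class_id, file_id)`. Returns None for any other line.
fn parse_header(line: &str) -> Option<(u32, i64)> {
    let rest = line.strip_prefix("--- !u!")?;
    let mut parts = rest.split_whitespace();
    let class_id = parts.next()?.parse::<u32>().ok()?;
    let file_id = parts.next()?.strip_prefix('&')?.parse::<i64>().ok()?;
    Some((class_id, file_id))
}

/// Streams a .unity/.prefab file and returns `(class_id, file_id, body)` for
/// wanted documents only.
fn wanted_documents<R: BufRead>(mut reader: R, wanted: &[u32]) -> io::Result<Vec<(u32, i64, String)>> {
    let mut out = Vec::new();
    let mut current: Option<(u32, i64, String)> = None;
    let mut line = String::new(); // reused for every line
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break; // end of file
        }
        let text = line.trim_end();
        if let Some((class_id, file_id)) = parse_header(text) {
            // A new header closes the previous document (if it was kept).
            out.extend(current.take());
            if wanted.contains(&class_id) {
                current = Some((class_id, file_id, String::new()));
            }
        } else if let Some((_, _, body)) = current.as_mut() {
            body.push_str(text);
            body.push('\n');
        }
        // Lines of unwanted documents (and the %YAML/%TAG preamble) fall through here.
    }
    out.extend(current);
    Ok(out)
}
```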

Transform Hierarchy

  • Local transforms are parsed immediately.
  • Parent/child relationships are recorded.
  • World matrices and full hierarchy are computed in a single post-process pass after all entities exist.
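The post-process pass can be sketched with entities as plain indices and a hand-rolled 4×4 matrix. Rotation is omitted for brevity (translation and uniform scale only); if the `Vec3`/`Quat`/`Mat4` types above come from glam (an assumption), `Mat4::from_scale_rotation_translation` would build the full local matrix. Roots are visited first, so each parent's world matrix is ready before its children use it. `compute_world` and `translate_scale` are hypothetical helpers.

```rust
/// Row-major 4x4 matrix, column-vector convention (translation in column 3).
type Mat4 = [[f32; 4]; 4];

fn mul(a: &Mat4, b: &Mat4) -> Mat4 {
    let mut out = [[0.0; 4]; 4];
    for i in 0..4 {
        for j in 0..4 {
            out[i][j] = (0..4).map(|k| a[i][k] * b[k][j]).sum();
        }
    }
    out
}

/// Local matrix from a translation and a uniform scale (no rotation).
fn translate_scale(t: [f32; 3], s: f32) -> Mat4 {
    [
        [s, 0.0, 0.0, t[0]],
        [0.0, s, 0.0, t[1]],
        [0.0, 0.0, s, t[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]
}

/// Single pass over the hierarchy: world = parent_world * local.
/// `parent[e]` is the parent index of entity `e`, or None for a root.
fn compute_world(local: &[Mat4], parent: &[Option<usize>]) -> Vec<Mat4> {
    let mut children = vec![Vec::new(); local.len()];
    let mut stack = Vec::new();
    for (e, p) in parent.iter().enumerate() {
        match p {
            Some(p) => children[*p].push(e),
            None => stack.push(e), // roots start the traversal
        }
    }
    let mut world = local.to_vec(); // for roots, world == local
    while let Some(e) = stack.pop() {
        for &c in &children[e] {
            world[c] = mul(&world[e], &local[c]);
            stack.push(c);
        }
    }
    world
}
```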

Caching

  • Optional caching to SQLite.
  • Single database file containing all scenes.
  • Tables:
    • scenes(scene_path PRIMARY KEY, hash, timestamp)
    • entities(entity_id, scene_path, gameobject_name, layer, active)
    • One table per component type (e.g., transform, interactable)
  • Cache contains only final ECS data (post-instantiation, post-transform pass).
  • No sophisticated invalidation: user controls caching via flag/option.
    • parse(..., use_cache: bool)
    • CLI: --cache / --no-cache
  • When caching is enabled, the cache is regenerated completely if it is missing or if any source file is newer than it.
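The regeneration rule can be written as a small decision function. It assumes file modification times are the freshness signal (they could be read with `std::fs::metadata(path)?.modified()?`); `LoadPlan` and `plan` are hypothetical names, and the stored hash column is not consulted here.

```rust
use std::time::SystemTime;

/// What to do for one `parse(..., use_cache)` call.
#[derive(Debug, PartialEq)]
enum LoadPlan {
    /// Caching disabled: parse the YAML and leave the cache untouched.
    ParseOnly,
    /// Cache missing or stale: parse the YAML and regenerate the whole cache.
    ParseAndRebuild,
    /// Cache is at least as new as every source file: load from SQLite.
    ReadCache,
}

fn plan(use_cache: bool, cache_mtime: Option<SystemTime>, source_mtimes: &[SystemTime]) -> LoadPlan {
    if !use_cache {
        return LoadPlan::ParseOnly;
    }
    match cache_mtime {
        None => LoadPlan::ParseAndRebuild,
        Some(cached) if source_mtimes.iter().any(|&m| m > cached) => LoadPlan::ParseAndRebuild,
        Some(_) => LoadPlan::ReadCache,
    }
}
```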

API Sketch

// Rust has no named arguments; parameter names are shown as comments.
let world = unity_parser::parse::<MyProjectConfig>(
    "/path/to/unity/project",           // project_root
    vec!["Assets/Scenes/Level1.unity"], // scenes
    true,                               // use_cache
    Some(4),                            // max_parallel
)?;
  • ParserBuilder may be added later for more configuration.
  • Parallel parsing of independent scenes/prefabs is supported (rayon, limited to 4 jobs by default to control memory).
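Bounded parallelism can be sketched without extra dependencies using `std::thread::scope` and a shared work counter. With rayon, the same bound would come from a dedicated pool built via `rayon::ThreadPoolBuilder::new().num_threads(4)`; `parse_all` and its closure-based signature are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Runs `parse_one` over every scene path with at most `max_parallel` worker
/// threads (4 if None). Workers pull the next index from a shared counter, so
/// one slow scene does not stall the others. Results keep the input order.
fn parse_all<T, F>(scenes: &[&str], max_parallel: Option<usize>, parse_one: F) -> Vec<T>
where
    T: Send,
    F: Fn(&str) -> T + Sync,
{
    let workers = max_parallel.unwrap_or(4).max(1).min(scenes.len().max(1));
    let next = AtomicUsize::new(0);
    let results: Mutex<Vec<Option<T>>> = Mutex::new((0..scenes.len()).map(|_| None).collect());
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(&path) = scenes.get(i) else { break };
                let parsed = parse_one(path); // parse outside the lock
                results.lock().unwrap()[i] = Some(parsed);
            });
        }
    });
    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|r| r.expect("every index is filled by a worker"))
        .collect()
}
```

Capping the worker count caps how many scenes are held in memory mid-parse at once, which is the reason given for the default of 4.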

Error Handling

  • Malformed YAML or missing references: log warning/error, continue parsing.
  • Missing expected component fields: log, insert default/None where possible.
  • Critical failures (e.g., corrupted scene file): return Err.
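A sketch of the two error tiers, under assumptions: a file that does not start with `%YAML` (Unity's text-serialized files begin with a `%YAML 1.1` directive) is treated as corrupted and returned as `Err`, while a missing or malformed field only records a warning and falls back to a default. `LoadError`, `Diagnostics`, `check_header`, and `field_or_default` are hypothetical; a real tool would more likely forward warnings to a logging crate than collect them in a `Vec`.

```rust
/// Critical failures that abort loading a file and are returned as `Err`.
#[derive(Debug, PartialEq)]
enum LoadError {
    /// Not Unity text-serialized YAML at all: treated as a corrupted file.
    NotUnityYaml,
}

/// Non-fatal problems collected while parsing; loading continues.
#[derive(Default)]
struct Diagnostics {
    warnings: Vec<String>,
}

/// Critical check performed once per file.
fn check_header(text: &str) -> Result<(), LoadError> {
    if text.starts_with("%YAML") {
        Ok(())
    } else {
        Err(LoadError::NotUnityYaml)
    }
}

/// Reads a numeric field such as `radius: 2.5` from one document. A missing or
/// malformed value records a warning and yields `default` instead of failing.
fn field_or_default(doc: &str, key: &str, default: f32, diag: &mut Diagnostics) -> f32 {
    let raw = doc
        .lines()
        .find_map(|l| l.trim().strip_prefix(key)?.strip_prefix(':'));
    match raw.map(|v| v.trim().parse::<f32>()) {
        Some(Ok(value)) => value,
        Some(Err(_)) => {
            diag.warnings.push(format!("malformed `{key}`, using {default}"));
            default
        }
        None => {
            diag.warnings.push(format!("missing `{key}`, using {default}"));
            default
        }
    }
}
```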

Future Considerations / TODOs

  • ParserBuilder API
  • Automatic derive for common component parsing
  • Support for added/removed components in prefab overrides
  • Component serialization versioning
  • More asset types (Materials, Animators, etc.)
  • Binary cache format for faster loading
  • Helper query methods on top of raw Sparsey API

Testing

To test this repository, a separate project will be created in the same repository directory. It will load the "Cursebreaker" game project from a path configured in the .env file.

Summary

Unity Parser aims to be the fastest, most memory-efficient way to extract structured data from Unity YAML projects in Rust, with a focus on user-defined components and full prefab instantiation. By leveraging Sparsey and aggressive selective parsing, it enables tools that process massive Unity scenes on ordinary hardware.