# Cursebreaker Unity Parser - Design Document

## Project Overview

A high-performance Rust library for parsing and querying Unity project files (`.unity` scenes, `.prefab` prefabs, and `.asset` ScriptableObjects).

## Goals

1. **Parse Unity YAML Format**: Handle Unity's YAML 1.1 format with custom tags (`!u!`) and file ID references
2. **Extract Structure**: Parse GameObjects, Components, and their properties into queryable data structures
3. **High Performance**: Optimized for large Unity projects with minimal memory footprint
4. **Type Safety**: Strong typing for Unity's component system
5. **Library-First**: Designed as a reusable SDK for other Rust tools

## Target File Formats

- `.unity` - Unity scene files
- `.prefab` - Unity prefab files
- `.asset` - Unity ScriptableObject and other asset files

All three formats share the same underlying YAML structure with Unity-specific extensions.

## Unity File Format Structure

Unity files use YAML 1.1 with special conventions:

```yaml
%YAML 1.1
%TAG !u! tag:unity3d.com,2011:
--- !u!1 &1866116814460599870
GameObject:
  m_ObjectHideFlags: 0
  m_Component:
  - component: {fileID: 8151827567463220614}
  - component: {fileID: 8755205353704683373}
  m_Name: CardGrabber
--- !u!224 &8151827567463220614
RectTransform:
  m_GameObject: {fileID: 1866116814460599870}
  m_LocalPosition: {x: 0, y: 0, z: 0}
```

### Key Concepts

1. **Documents**: Each `---` starts a new YAML document representing a Unity object
2. **Type Tags**: `!u!N` indicates the Unity type (e.g., `!u!1` = GameObject, `!u!224` = RectTransform)
3. **Anchors**: `&ID` defines a local file ID for the object
4. **File References**: `{fileID: N}` references objects by their ID (local or external)
5. **GUID References**: `{guid: ...}` references external assets
6. **Properties**: All Unity objects have serialized fields (usually prefixed with `m_`)

## Architecture

### Core Components

```
cursebreaker-parser/
├── src/
│   ├── lib.rs                  # Public API exports
│   ├── parser/                 # YAML parsing layer
│   │   ├── mod.rs
│   │   ├── yaml.rs             # YAML document parser
│   │   ├── unity_tag.rs        # Unity type tag handler (!u!)
│   │   └── reference.rs        # FileID/GUID reference parser
│   ├── model/                  # Data model
│   │   ├── mod.rs
│   │   ├── document.rs         # UnityDocument struct
│   │   ├── object.rs           # UnityObject base
│   │   ├── gameobject.rs       # GameObject type
│   │   ├── component.rs        # Component types
│   │   └── property.rs         # Property value types
│   ├── types/                  # Unity type system
│   │   ├── mod.rs
│   │   ├── type_id.rs          # Unity type ID -> name mapping
│   │   └── component_types.rs
│   ├── query/                  # Query API
│   │   ├── mod.rs
│   │   ├── project.rs          # UnityProject (multi-file)
│   │   ├── find.rs             # Find objects/components
│   │   └── filter.rs           # Filter/search utilities
│   └── error.rs                # Error types
```

### Data Model

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// Core types
pub struct UnityFile {
    pub path: PathBuf,
    pub documents: Vec<UnityDocument>,
}

pub struct UnityDocument {
    pub type_id: u32,        // From !u!N
    pub file_id: i64,        // From &ID
    pub class_name: String,  // E.g., "GameObject"
    pub properties: PropertyMap,
}

pub struct UnityProject {
    // Key type assumed to be the file path
    pub files: HashMap<PathBuf, UnityFile>,
    // Reference resolution cache
}

// Property values (simplified)
pub enum PropertyValue {
    Integer(i64),
    Float(f64),
    String(String),
    Boolean(bool),
    // GUID representation assumed to be a string
    FileRef { file_id: i64, guid: Option<String> },
    Vector3 { x: f64, y: f64, z: f64 },
    Color { r: f64, g: f64, b: f64, a: f64 },
    Array(Vec<PropertyValue>),
    Object(PropertyMap),
}
```

## Performance Considerations

1. **Streaming Parser**: Parse YAML incrementally rather than loading the entire file into memory
2. **Lazy Loading**: Only parse files when accessed
3. **Reference Caching**: Cache resolved references to avoid repeated lookups
4. **Zero-Copy Where Possible**: Use string slices and borrowed data where feasible
5. **Parallel Parsing**: Support parsing multiple files concurrently

## Dependencies

- `yaml-rust2` or `serde_yaml` - YAML parsing (evaluate both)
- `serde` - Serialization/deserialization
- `rayon` - Parallel processing (optional, for multi-file parsing)
- `thiserror` - Error handling
- `indexmap` - Ordered maps for properties

## Testing Strategy

1. **Unit Tests**: Each parser component tested independently
2. **Integration Tests**: Full file parsing with real Unity files
3. **Sample Data**: Use the PiratePanic project as the test corpus
4. **Benchmarks**: Performance tests on large Unity projects
5. **Fuzzing**: Fuzz testing for parser robustness (future)

## API Design Goals

### Simple File Parsing

```rust
let file = UnityFile::from_path("Scene.unity")?;
for doc in &file.documents {
    println!("{}: {}", doc.class_name, doc.file_id);
}
```

### Query API

```rust
let project = UnityProject::from_directory("Assets/")?;

// Find all GameObjects
let objects = project.find_all_by_type("GameObject");

// Find by name
let player = project.find_by_name("Player")?;

// Get components
let transform = player.get_component("Transform")?;
let position = transform.get_vector3("m_LocalPosition")?;
```

### Reference Resolution

```rust
// Follow references automatically
let gameobject = project.get_object(file_id)?;
let transform_ref = gameobject.get_file_ref("m_Component[0].component")?;
let transform = project.resolve_reference(transform_ref)?;
```

## Future Enhancements (Out of Scope for v1)

- Unity YAML serialization (writing files)
- C# script parsing
- Asset dependency graphs
- Unity version detection and compatibility
- Binary .unity format support (older Unity versions)
- Meta file parsing (.meta files)

## Success Criteria

1. Successfully parse all files in the PiratePanic sample project
2. Extract all GameObjects and Components with properties
3. Resolve all internal file references correctly
4. Parse large scene files (>10MB) in <100ms
5. Memory usage scales linearly with file size
6. Clean, documented public API
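
As an illustration of the Documents, Type Tags, and Anchors concepts described under Key Concepts, the following is a minimal sketch of how a document header line such as `--- !u!1 &1866116814460599870` could be split into a type ID and a file ID. The names `DocumentHeader` and `parse_document_header` are hypothetical and not part of the planned API; the sketch uses only the standard library and does not reflect how `yaml.rs` or `unity_tag.rs` will actually be written.

```rust
/// Type tag and anchor taken from a Unity document header line,
/// e.g. `--- !u!1 &1866116814460599870`.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq, Eq)]
pub struct DocumentHeader {
    pub type_id: u32, // from `!u!N`
    pub file_id: i64, // from `&ID`
}

/// Hypothetical helper: returns `None` when the line is not a
/// well-formed document header.
pub fn parse_document_header(line: &str) -> Option<DocumentHeader> {
    // A header starts with the YAML document separator.
    let rest = line.trim().strip_prefix("---")?;
    let mut tokens = rest.split_whitespace();

    // First token: the Unity type tag, `!u!N`.
    let type_id = tokens.next()?.strip_prefix("!u!")?.parse::<u32>().ok()?;

    // Second token: the anchor, `&ID`.
    let file_id = tokens.next()?.strip_prefix('&')?.parse::<i64>().ok()?;

    // Any further tokens (such as a trailing `stripped` marker)
    // are ignored in this sketch.
    Some(DocumentHeader { type_id, file_id })
}
```

Returning `Option` keeps the sketch short; a real implementation would more likely return a `Result` with an error type from `error.rs` so callers can report which part of the header was malformed.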
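
The `type_id.rs` module in the architecture is described as a "Unity type ID -> name mapping". One possible shape for it, shown here as a sketch only, is a plain `match` over a handful of well-known Unity class IDs. The function name `class_name_for_type_id` is an assumption, and the table is deliberately incomplete.

```rust
/// Hypothetical lookup from a Unity class ID (the `N` in `!u!N`)
/// to its class name. Only a few common IDs are listed; unknown
/// IDs return `None` rather than guessing.
pub fn class_name_for_type_id(type_id: u32) -> Option<&'static str> {
    match type_id {
        1 => Some("GameObject"),
        4 => Some("Transform"),
        20 => Some("Camera"),
        23 => Some("MeshRenderer"),
        33 => Some("MeshFilter"),
        114 => Some("MonoBehaviour"),
        224 => Some("RectTransform"),
        _ => None,
    }
}
```

Because each YAML document also carries its class name as the top-level key (for example `GameObject:`), a table like this may be most useful for validating that the tag and the key agree, or for filtering by type before the document body is parsed.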
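
The File References and GUID References concepts can be sketched in the same spirit. The snippet below parses the text of a flow mapping such as `{fileID: 8151827567463220614}` into a structure mirroring the `FileRef` variant of `PropertyValue`. In the real library the YAML crate would normally deliver these mappings already parsed, so this standalone string-based version is purely illustrative; `FileRef` as a struct and `parse_file_ref` are hypothetical names, and the GUID used in the usage note is a made-up placeholder.

```rust
/// A reference as written in a Unity flow mapping.
/// (Hypothetical struct mirroring `PropertyValue::FileRef`.)
#[derive(Debug, PartialEq, Eq)]
pub struct FileRef {
    pub file_id: i64,
    pub guid: Option<String>,
}

/// Hypothetical helper: parses `{fileID: N}` or
/// `{fileID: N, guid: G, type: T}`. Returns `None` if the text is
/// not a flow mapping or has no valid `fileID` entry.
pub fn parse_file_ref(text: &str) -> Option<FileRef> {
    // Strip the surrounding braces of the flow mapping.
    let inner = text.trim().strip_prefix('{')?.strip_suffix('}')?;

    let mut file_id = None;
    let mut guid = None;

    for entry in inner.split(',') {
        let (key, value) = entry.split_once(':')?;
        match key.trim() {
            "fileID" => file_id = Some(value.trim().parse::<i64>().ok()?),
            "guid" => guid = Some(value.trim().to_string()),
            _ => {} // other keys, such as `type`, are ignored in this sketch
        }
    }

    // A mapping without `fileID` (e.g. a vector like `{x: 0, y: 0, z: 0}`)
    // is not treated as a reference.
    Some(FileRef { file_id: file_id?, guid })
}
```

For example, `parse_file_ref("{fileID: 11500000, guid: 0123456789abcdef0123456789abcdef, type: 3}")` would yield a `FileRef` with `file_id` 11500000 and the placeholder GUID, while `{x: 0, y: 0, z: 0}` yields `None`, which lets a caller fall through to vector or generic object handling.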