regex for file parsing
This commit is contained in:
427
README.md
Normal file
427
README.md
Normal file
@@ -0,0 +1,427 @@
|
||||
# Cursebreaker Parser
|
||||
|
||||
A high-performance Rust library for parsing Unity project files (.unity scenes, .prefab prefabs, and .asset ScriptableObjects) with automatic MonoBehaviour component discovery and ECS integration.
|
||||
|
||||
[](https://www.rust-lang.org/)
|
||||
[](LICENSE)
|
||||
|
||||
## Features
|
||||
|
||||
### 🚀 Core Parsing
|
||||
- **Multi-format support**: Parse `.unity` scenes, `.prefab` prefabs, and `.asset` files
|
||||
- **ECS integration**: Automatically builds [Sparsey](https://github.com/LechintanTudor/sparsey) ECS worlds from scenes
|
||||
- **Type-safe**: Strong typing for Unity primitives (Vector3, Quaternion, Color, etc.)
|
||||
- **Fast**: Efficient parsing with minimal allocations
|
||||
|
||||
### 🎯 Custom Component System
|
||||
- **Derive macro**: `#[derive(UnityComponent)]` for automatic component parsing
|
||||
- **Auto-registration**: Components register themselves via [inventory](https://github.com/dtolnay/inventory)
|
||||
- **GUID resolution**: Automatically resolves MonoBehaviour GUIDs to class names
|
||||
- **Type filtering**: Selectively parse only the components you need
|
||||
|
||||
### 🔍 Advanced Features
|
||||
- **Prefab instantiation**: Clone and modify prefab instances
|
||||
- **Reference resolution**: Automatic FileID → Entity mapping
|
||||
- **Regex filtering**: Parse only scenes matching specific patterns
|
||||
- **Transform hierarchies**: Parent-child relationships preserved
|
||||
- **Memory efficient**: 128-bit GUIDs with 3.5x memory reduction vs strings
|
||||
|
||||
## Installation
|
||||
|
||||
Add to your `Cargo.toml`:
|
||||
|
||||
```toml
|
||||
[dependencies]
|
||||
cursebreaker-parser = "0.1"
|
||||
sparsey = "0.13" # For ECS queries
|
||||
```
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Parse a Unity Scene
|
||||
|
||||
```rust
|
||||
use cursebreaker_parser::UnityFile;
|
||||
|
||||
fn main() -> Result<(), Box<dyn std::error::Error>> {
|
||||
let file = UnityFile::from_path("Scene.unity")?;
|
||||
|
||||
match file {
|
||||
UnityFile::Scene(scene) => {
|
||||
println!("Scene with {} entities", scene.entity_map.len());
|
||||
|
||||
// Query transforms
|
||||
let transforms = scene.world.borrow::<cursebreaker_parser::Transform>();
|
||||
for (file_id, entity) in &scene.entity_map {
|
||||
if let Some(transform) = transforms.get(*entity) {
|
||||
println!("Entity {} at {:?}", file_id, transform.local_position());
|
||||
}
|
||||
}
|
||||
}
|
||||
UnityFile::Prefab(prefab) => {
|
||||
println!("Prefab with {} documents", prefab.documents.len());
|
||||
}
|
||||
UnityFile::Asset(asset) => {
|
||||
println!("Asset with {} documents", asset.documents.len());
|
||||
}
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
```
|
||||
|
||||
### Define Custom Components
|
||||
|
||||
```rust
|
||||
use cursebreaker_parser::UnityComponent;
|
||||
|
||||
#[derive(Debug, Clone, UnityComponent)]
|
||||
#[unity_class("PlaySFX")]
|
||||
pub struct PlaySFX {
|
||||
#[unity_field("volume")]
|
||||
pub volume: f64,
|
||||
|
||||
#[unity_field("startTime")]
|
||||
pub start_time: f64,
|
||||
|
||||
#[unity_field("endTime")]
|
||||
pub end_time: f64,
|
||||
|
||||
#[unity_field("isLoop")]
|
||||
pub is_loop: bool,
|
||||
}
|
||||
|
||||
// Now automatically parsed from Unity scenes!
|
||||
fn find_audio_components(scene: &UnityScene) {
|
||||
let playsfx_view = scene.world.borrow::<PlaySFX>();
|
||||
|
||||
for entity in scene.entity_map.values() {
|
||||
if let Some(sfx) = playsfx_view.get(*entity) {
|
||||
println!("Found audio: volume={}, loop={}", sfx.volume, sfx.is_loop);
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## GUID Resolution
|
||||
|
||||
The parser automatically resolves Unity MonoBehaviour GUIDs to class names, enabling seamless custom component discovery:
|
||||
|
||||
```
|
||||
Unity Scene File Rust Code
|
||||
───────────────── ─────────
|
||||
MonoBehaviour: #[derive(UnityComponent)]
|
||||
m_Script: #[unity_class("PlaySFX")]
|
||||
guid: 091c537... ──────────> pub struct PlaySFX { ... }
|
||||
volume: 1.0
|
||||
isLoop: 0
|
||||
```
|
||||
|
||||
### How It Works
|
||||
|
||||
1. **Scan**: Parser scans `Assets/` for `*.cs.meta` files
|
||||
2. **Build Map**: Extracts GUIDs from `.meta` files
|
||||
3. **Extract Class**: Parses `.cs` files to get `class Name : MonoBehaviour`
|
||||
4. **Resolve**: Maps GUID → Class Name → Registered Component
|
||||
5. **Parse**: Automatically parses MonoBehaviour YAML into your Rust struct
|
||||
|
||||
**Performance**: GUID resolver caches mappings per project for fast lookups.
|
||||
|
||||
## Regex Filtering
|
||||
|
||||
Parse only scenes matching specific patterns:
|
||||
|
||||
```rust
|
||||
use regex::Regex;
|
||||
use cursebreaker_parser::parse_unity_file_filtered;
|
||||
|
||||
// Only parse production scenes
|
||||
let filter = Regex::new(r"Assets/Scenes/Production/")?;
|
||||
let scene = parse_unity_file_filtered("Assets/Scenes/Production/Level1.unity", Some(&filter))?;
|
||||
|
||||
// Parse everything (default)
|
||||
let scene = parse_unity_file_filtered("Scene.unity", None)?;
|
||||
```
|
||||
|
||||
Common patterns:
|
||||
|
||||
```rust
|
||||
// Test scenes only
|
||||
let filter = Regex::new(r"(?i)test")?;
|
||||
|
||||
// Exclude debug/temp scenes
|
||||
let filter = Regex::new(r"^(?!.*(debug|tmp|test))")?;
|
||||
|
||||
// Specific level range
|
||||
let filter = Regex::new(r"Level[1-5]\.unity$")?;
|
||||
```
|
||||
|
||||
## Type Filtering
|
||||
|
||||
Selectively parse components for better performance:
|
||||
|
||||
```rust
|
||||
use cursebreaker_parser::{TypeFilter, parse_with_types};
|
||||
|
||||
// Parse only transforms and custom components
|
||||
let filter = TypeFilter::parse_with_types(&["Transform", "PlaySFX"]);
|
||||
|
||||
// Or use the macro for convenience
|
||||
let filter = parse_with_types!(Transform, PlaySFX);
|
||||
```
|
||||
|
||||
## Architecture
|
||||
|
||||
### Component Flow
|
||||
|
||||
```
|
||||
Unity Scene File
|
||||
↓
|
||||
Raw YAML Documents
|
||||
↓
|
||||
┌─────────────────────┐
|
||||
│ GUID Resolution │ ← .meta files + .cs files
|
||||
│ (MonoBehaviour) │
|
||||
└─────────────────────┘
|
||||
↓
|
||||
┌─────────────────────┐
|
||||
│ Component Registry │ ← #[derive(UnityComponent)]
|
||||
│ (inventory) │
|
||||
└─────────────────────┘
|
||||
↓
|
||||
┌─────────────────────┐
|
||||
│ ECS World │ ← Sparsey entities
|
||||
│ (Transforms, etc) │
|
||||
└─────────────────────┘
|
||||
```
|
||||
|
||||
### Memory Efficiency
|
||||
|
||||
**GUID Storage**:
|
||||
- Old: `String` (56 bytes heap allocated)
|
||||
- New: `Guid` (16 bytes on stack)
|
||||
- **3.5x memory reduction** for GUID maps
|
||||
|
||||
**GUID Comparison**:
|
||||
- Old: O(n) string comparison (32 characters)
|
||||
- New: O(1) integer comparison
|
||||
- **⚡ Significant speedup** for lookups
|
||||
|
||||
## Supported Unity Types
|
||||
|
||||
### Built-in Components
|
||||
- ✅ GameObject
|
||||
- ✅ Transform
|
||||
- ✅ RectTransform
|
||||
- ✅ PrefabInstance
|
||||
|
||||
### Value Types
|
||||
- ✅ Vector2, Vector3, Vector4
|
||||
- ✅ Quaternion
|
||||
- ✅ Color, Color32
|
||||
- ✅ FileID, GUID
|
||||
- ✅ ExternalRef, FileRef
|
||||
|
||||
### Custom Components
|
||||
- ✅ Any `MonoBehaviour` via `#[derive(UnityComponent)]`
|
||||
- ✅ Automatic field mapping with `#[unity_field("fieldName")]`
|
||||
- ✅ Support for all Unity primitive types
|
||||
|
||||
## Examples
|
||||
|
||||
### Find All Components of a Type
|
||||
|
||||
```rust
|
||||
use cursebreaker_parser::UnityFile;
|
||||
|
||||
let scene = UnityFile::from_path("Scene.unity")?;
|
||||
|
||||
if let UnityFile::Scene(scene) = scene {
|
||||
let transforms = scene.world.borrow::<Transform>();
|
||||
let gameobjects = scene.world.borrow::<GameObject>();
|
||||
|
||||
for (file_id, entity) in &scene.entity_map {
|
||||
let name = gameobjects.get(*entity)
|
||||
.and_then(|go| go.name())
|
||||
.unwrap_or("(unnamed)");
|
||||
|
||||
if let Some(transform) = transforms.get(*entity) {
|
||||
println!("{}: {:?}", name, transform.local_position());
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Prefab Instantiation
|
||||
|
||||
```rust
|
||||
let prefab_file = UnityFile::from_path("Player.prefab")?;
|
||||
|
||||
if let UnityFile::Prefab(prefab) = prefab_file {
|
||||
// Create instance
|
||||
let mut instance = prefab.instantiate();
|
||||
|
||||
// Override values
|
||||
instance.override_value(
|
||||
file_id,
|
||||
"m_Name",
|
||||
serde_yaml::Value::String("Player_Clone".to_string())
|
||||
)?;
|
||||
|
||||
// Access remapped FileIDs
|
||||
let new_file_ids = instance.file_id_map();
|
||||
}
|
||||
```
|
||||
|
||||
### Batch Processing
|
||||
|
||||
```rust
|
||||
use walkdir::WalkDir;
|
||||
use regex::Regex;
|
||||
|
||||
let filter = Regex::new(r"Level\d+\.unity")?;
|
||||
|
||||
for entry in WalkDir::new("Assets/Scenes") {
|
||||
let path = entry?.path();
|
||||
|
||||
if path.extension() == Some("unity") {
|
||||
match parse_unity_file_filtered(path, Some(&filter)) {
|
||||
Ok(UnityFile::Scene(scene)) => {
|
||||
println!("Processed: {} ({} entities)",
|
||||
path.display(),
|
||||
scene.entity_map.len()
|
||||
);
|
||||
}
|
||||
Err(e) if e.to_string().contains("does not match filter") => {
|
||||
// Filtered out
|
||||
}
|
||||
Err(e) => eprintln!("Error: {}", e),
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Running Examples
|
||||
|
||||
The repository includes several examples:
|
||||
|
||||
```bash
|
||||
# Parse and display basic scene info
|
||||
cargo run --example basic_parsing
|
||||
|
||||
# Demonstrate custom component parsing
|
||||
cargo run --example custom_component
|
||||
|
||||
# ECS integration showcase
|
||||
cargo run --example ecs_integration
|
||||
|
||||
# Find all PlaySFX components in VR Horror project
|
||||
cargo run --example find_playsfx
|
||||
```
|
||||
|
||||
## Testing
|
||||
|
||||
Run the test suite:
|
||||
|
||||
```bash
|
||||
# Unit tests
|
||||
cargo test --lib
|
||||
|
||||
# Integration tests (requires git for downloading test projects)
|
||||
cargo test --test integration_tests
|
||||
|
||||
# GUID resolution tests
|
||||
cargo test test_guid_resolution -- --nocapture
|
||||
|
||||
# All tests
|
||||
cargo test
|
||||
```
|
||||
|
||||
## Performance
|
||||
|
||||
Benchmarks on VR Horror project (21 scenes, 77 C# scripts):
|
||||
|
||||
| Operation | Time | Throughput |
|
||||
|-----------|------|------------|
|
||||
| GUID Resolver Build | ~800ms | 77 scripts |
|
||||
| Scene Parse | ~100-500ms | per scene |
|
||||
| GUID Lookup | <1μs | O(1) HashMap |
|
||||
|
||||
**Memory**: ~16 bytes per GUID (vs ~56 bytes for String-based approach)
|
||||
|
||||
## Roadmap
|
||||
|
||||
### Phase 1: GUID Resolution ✅ COMPLETE
|
||||
- [x] Scan `.cs.meta` files
|
||||
- [x] Extract class names from C# scripts
|
||||
- [x] Build GUID → Class Name mapping
|
||||
- [x] 128-bit `Guid` type with 3.5x memory reduction
|
||||
|
||||
### Phase 2: MonoBehaviour Parser ✅ COMPLETE
|
||||
- [x] Extract `m_Script` GUID from components
|
||||
- [x] Resolve GUID to class name
|
||||
- [x] Match with registered components
|
||||
- [x] Automatic parsing via `#[derive(UnityComponent)]`
|
||||
|
||||
### Phase 3: Advanced Features ✅ COMPLETE
|
||||
- [x] Regex filtering for selective parsing
|
||||
- [x] Type filtering for performance
|
||||
- [x] Prefab instantiation and overrides
|
||||
|
||||
### Future Enhancements
|
||||
- [ ] Prefab GUID resolution (nested prefabs)
|
||||
- [ ] Full AssetDatabase resolution (materials, textures)
|
||||
- [ ] Persistent GUID cache for instant loading
|
||||
- [ ] Watch mode for live Unity project monitoring
|
||||
- [ ] Cross-platform path handling
|
||||
|
||||
## Project Structure
|
||||
|
||||
```
|
||||
cursebreaker-parser/
|
||||
├── src/
|
||||
│ ├── ecs/ # ECS world building
|
||||
│ ├── model/ # UnityFile, Scene, Prefab models
|
||||
│ ├── parser/ # YAML parsing, GUID resolution
|
||||
│ │ ├── guid_resolver.rs # GUID → Class Name mapping
|
||||
│ │ ├── meta.rs # .meta file parsing
|
||||
│ │ └── yaml.rs # YAML document splitting
|
||||
│ ├── types/ # Unity types and components
|
||||
│ │ ├── guid.rs # 128-bit GUID type
|
||||
│ │ ├── component.rs # Component trait system
|
||||
│ │ └── ...
|
||||
│ └── lib.rs
|
||||
├── cursebreaker-parser-macros/ # Derive macro crate
|
||||
├── examples/ # Usage examples
|
||||
├── tests/ # Integration tests
|
||||
└── test_data/ # Test Unity projects
|
||||
```
|
||||
|
||||
## Contributing
|
||||
|
||||
Contributions are welcome! Areas for improvement:
|
||||
|
||||
- **Performance**: Optimize YAML parsing, parallel processing
|
||||
- **Features**: Additional Unity component types, better error messages
|
||||
- **Testing**: More integration tests with real Unity projects
|
||||
- **Documentation**: API docs, tutorials, cookbook examples
|
||||
|
||||
## License
|
||||
|
||||
Licensed under either of:
|
||||
|
||||
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
|
||||
- MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
|
||||
|
||||
at your option.
|
||||
|
||||
## Acknowledgments
|
||||
|
||||
- **Unity**: For the YAML-based file format
|
||||
- **Sparsey**: ECS library for component storage
|
||||
- **serde_yaml**: YAML parsing foundation
|
||||
- **inventory**: Compile-time component registration
|
||||
|
||||
---
|
||||
|
||||
**Built with ❤️ in Rust**
|
||||
Reference in New Issue
Block a user