Data Formats
sgraph supports multiple data formats, each optimized for different use cases. This guide explains when and how to use each format.
Overview
| Format | Use Case | File Size | Performance | Human Readable |
|---|---|---|---|---|
| XML | Large models, production | Compact | Very Fast | Moderate |
| Deps | Scripting, simple analysis | Small | Fast | Very High |
| JSON | Web applications | Medium | Moderate | High |
| GraphML | Graph visualization tools | Large | Moderate | Low |
XML Format
The XML format is the primary format for sgraph, designed for performance and scalability.
Structure
<model version="2.1">
  <elements t="architecture">
    <e n="root_element">
      <e n="child_element" i="2">
        <r r="3" t="relationship_type" />
      </e>
      <e i="3" n="target_element">
      </e>
    </e>
  </elements>
</model>
Key Features
- Integer References: Elements use integer IDs (i attribute) for efficient relationships
- Compact Representation: Minimal XML overhead
- Hierarchical Structure: Nested elements represent containment
- Relationships: <r> tags define associations between elements
- Attributes: Custom attributes stored as XML attributes
Attributes Reference
Element Attributes (<e>)
- n: Name of the element
- i: Unique integer identifier
- t: Type of the element
- Custom attributes as needed
Relationship Attributes (<r>)
- r: Reference to target element ID
- t: Type of relationship
- Custom attributes for relationship metadata
Example: C Project Structure
<model version="2.1">
  <elements t="c_project">
    <e n="nginx">
      <e n="src">
        <e n="core">
          <e n="nginx.c" t="source_file">
            <r r="2" t="includes" />
            <r r="3" t="includes" />
          </e>
          <e i="2" n="nginx.h" t="header_file">
          </e>
          <e i="3" n="config.h" t="header_file">
          </e>
        </e>
      </e>
    </e>
  </elements>
</model>
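The integer references in this example can be resolved back to full element paths with nothing but the standard library. The following is a minimal sketch, not sgraph's own loader: it assumes each <r> tag carries a single target ID, that every referenced ID exists in the document, and the function name resolve_relationships is made up for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<model version="2.1">
  <elements t="c_project">
    <e n="nginx">
      <e n="src">
        <e n="core">
          <e n="nginx.c" t="source_file">
            <r r="2" t="includes" />
            <r r="3" t="includes" />
          </e>
          <e i="2" n="nginx.h" t="header_file" />
          <e i="3" n="config.h" t="header_file" />
        </e>
      </e>
    </e>
  </elements>
</model>"""


def resolve_relationships(xml_text):
    """Return (source_path, target_path, type) triples (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    id_to_path = {}  # value of the i attribute -> full element path
    pending = []     # (source_path, target_id, relationship_type)

    def walk(node, parent_path):
        path = f"{parent_path}/{node.get('n')}"
        if node.get("i") is not None:
            id_to_path[node.get("i")] = path
        for rel in node.findall("r"):
            pending.append((path, rel.get("r"), rel.get("t")))
        for child in node.findall("e"):
            walk(child, path)

    for top in root.find("elements").findall("e"):
        walk(top, "")

    # IDs are resolved only after the full walk, because a reference may
    # point at an element that appears later in the document.
    return [(src, id_to_path[target], rel_type) for src, target, rel_type in pending]


for triple in resolve_relationships(XML):
    print(":".join(triple))
```

The resolution is deferred until the whole tree has been visited because nginx.c refers to IDs 2 and 3 before those elements are declared.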
Working with XML
from sgraph import SGraph, SElement
from sgraph.modelapi import ModelApi

# Create and export to XML
model = SGraph(SElement(None, ''))  # start from an empty root element
# ... build your model ...
model.to_xml('output.xml')

# Load from XML
api = ModelApi(filepath='model.xml')
elements = api.getAllElements()
Deps Format
The Deps format is a simple, line-based text format perfect for scripting and quick analysis.
Structure
source_path:target_path:relationship_type
source_path:target_path:relationship_type:attribute1=value1,attribute2=value2
Examples
Basic Dependencies
/nginx/src/core/nginx.c:/nginx/src/core/nginx.h:includes
/nginx/src/core/nginx.c:/nginx/src/core/config.h:includes
/myapp/views.py:/myapp/models.py:imports
/myapp/views.py:/myapp/utils.py:imports
With Attributes
/api/user.py:/api/auth.py:imports:frequency=high,critical=true
/frontend/login.js:/api/auth.py:calls:method=POST,async=true
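A line in this layout can be split into its fields with plain string operations. The sketch below is an illustration rather than sgraph's parser: parse_deps_line is a hypothetical name, attribute values are kept as strings, and paths are assumed not to contain colons.

```python
def parse_deps_line(line):
    """Split one Deps line into (source, target, type, attributes)."""
    # maxsplit=3 keeps the whole attribute list together in the fourth field.
    parts = line.strip().split(":", 3)
    if len(parts) < 3:
        raise ValueError(f"not a relationship line: {line!r}")
    source, target, rel_type = parts[:3]

    attributes = {}
    if len(parts) == 4 and parts[3]:
        for pair in parts[3].split(","):
            key, _, value = pair.partition("=")
            attributes[key] = value  # values stay strings; no type inference
    return source, target, rel_type, attributes


print(parse_deps_line("/api/user.py:/api/auth.py:imports:frequency=high,critical=true"))
```

Because the text format carries no type information, a value such as critical=true comes back as the string "true", not a boolean.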
Hierarchical Elements Only
When no relationships exist, the format shows the hierarchical structure:
/nginx
/nginx/src
/nginx/src/core
/nginx/src/core/nginx.c
/nginx/src/core/nginx.h
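Element-only lines like these already encode the containment hierarchy in their paths. As a rough sketch (build_tree is a hypothetical helper, and nested dictionaries are just one convenient representation), the tree can be rebuilt as follows:

```python
def build_tree(paths):
    """Rebuild a nested dict of element names from slash-separated paths."""
    tree = {}
    for path in paths:
        if not path.strip():
            continue  # ignore blank lines
        node = tree
        for name in path.strip().strip("/").split("/"):
            # setdefault creates the child on first sight and reuses it afterwards
            node = node.setdefault(name, {})
    return tree


lines = [
    "/nginx",
    "/nginx/src",
    "/nginx/src/core",
    "/nginx/src/core/nginx.c",
    "/nginx/src/core/nginx.h",
]
tree = build_tree(lines)
print(sorted(tree["nginx"]["src"]["core"]))
```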
Working with Deps Format
from sgraph import SGraph
from sgraph.converters.deps_to_xml import DepsToXml
from sgraph.converters.xml_to_deps import XmlToDeps
# Export to Deps (model is an existing SGraph instance)
model.to_deps('dependencies.txt')
# Convert between formats
deps_to_xml = DepsToXml()
deps_to_xml.convert('dependencies.txt', 'model.xml')
xml_to_deps = XmlToDeps()
xml_to_deps.convert('model.xml', 'dependencies.txt')
Command Line Usage
# Convert XML to Deps
python -m sgraph.converters.xml_to_deps model.xml output.deps
# Convert Deps to XML
python -m sgraph.converters.deps_to_xml dependencies.txt model.xml
# Filter deps files
grep "\.py:" dependencies.txt > python_deps.txt
JSON Format
JSON format provides a web-friendly representation of sgraph models.
Structure
{
  "model_version": "2.1",
  "root": {
    "name": "root",
    "type": "root",
    "children": [
      {
        "name": "module1",
        "type": "module",
        "attributes": {
          "language": "python",
          "lines": 150
        },
        "children": [],
        "relationships": [
          {
            "target_path": "/module2",
            "type": "imports",
            "attributes": {}
          }
        ]
      }
    ]
  }
}
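A document shaped like this can be traversed recursively to list every relationship with its source path. The sketch below rests on one assumption inferred from the "/module2" target path above: the root node itself is not part of element paths. The name iter_relationships is illustrative only.

```python
import json

DOC = json.loads("""
{
  "model_version": "2.1",
  "root": {
    "name": "root",
    "type": "root",
    "children": [
      {
        "name": "module1",
        "type": "module",
        "attributes": {"language": "python", "lines": 150},
        "children": [],
        "relationships": [
          {"target_path": "/module2", "type": "imports", "attributes": {}}
        ]
      }
    ]
  }
}
""")


def iter_relationships(node, parent_path=""):
    """Yield (source_path, target_path, type) for a node and its descendants."""
    path = f"{parent_path}/{node['name']}"
    for rel in node.get("relationships", []):
        yield path, rel["target_path"], rel["type"]
    for child in node.get("children", []):
        yield from iter_relationships(child, path)


# Start below the root so that paths look like "/module1" (assumption, see above).
relationships = [
    rel
    for child in DOC["root"]["children"]
    for rel in iter_relationships(child)
]
print(relationships)
```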
Working with JSON
from sgraph.converters.sgraph_json import SGraphJson
from sgraph.converters.xml_to_json import XmlToJson
# Convert to JSON
converter = XmlToJson()
converter.convert('model.xml', 'model.json')
# Load JSON in web applications
import json
with open('model.json', 'r') as f:
    model_data = json.load(f)
GraphML Format
GraphML is a standard XML format for graphs, supported by many visualization tools.
Features
- Compatible with Gephi, yEd, Cytoscape
- Rich metadata support
- Standard format for graph exchange
Working with GraphML
from sgraph.converters.xml_to_graphml import XmlToGraphMl
# Convert to GraphML
converter = XmlToGraphMl()
converter.convert('model.xml', 'graph.graphml')
# Import into visualization tools:
# - Gephi: File > Open > graph.graphml
# - yEd: File > Open > graph.graphml
# - Cytoscape: File > Import > Network from File
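To show what such a file contains, here is a minimal sketch that writes a GraphML document by hand from (source, target, type) triples using only the standard library. It is not how sgraph's converter works internally; triples_to_graphml and the "type" edge key are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def triples_to_graphml(triples):
    """Serialize a list of (source, target, type) triples as a GraphML string."""
    ET.register_namespace("", GRAPHML_NS)  # emit GraphML as the default namespace
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    # Declare one string-valued edge attribute that carries the relationship type.
    ET.SubElement(root, f"{{{GRAPHML_NS}}}key", {
        "id": "type", "for": "edge", "attr.name": "type", "attr.type": "string",
    })
    graph = ET.SubElement(root, f"{{{GRAPHML_NS}}}graph", {
        "id": "G", "edgedefault": "directed",
    })

    seen = set()
    for source, target, _ in triples:
        for path in (source, target):
            if path not in seen:  # one node per distinct path
                seen.add(path)
                ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", {"id": path})

    for source, target, rel_type in triples:
        edge = ET.SubElement(graph, f"{{{GRAPHML_NS}}}edge", {
            "source": source, "target": target,
        })
        data = ET.SubElement(edge, f"{{{GRAPHML_NS}}}data", {"key": "type"})
        data.text = rel_type

    return ET.tostring(root, encoding="unicode")


graphml_text = triples_to_graphml([
    ("/myapp/views.py", "/myapp/models.py", "imports"),
    ("/myapp/views.py", "/myapp/utils.py", "imports"),
])
print(graphml_text)
```

Every node and edge becomes its own XML element with spelled-out attribute names, which is consistent with GraphML being the largest format in the overview table.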
Format Comparison
Performance Benchmarks
| Format | 1K Elements | 100K Elements | 1M Elements |
|---|---|---|---|
| XML Load | 10ms | 500ms | 5s |
| Deps Load | 5ms | 200ms | 2s |
| JSON Load | 15ms | 800ms | 8s |
| GraphML Load | 20ms | 1.2s | 12s |
File Size Comparison
For a typical software project with 10K elements:
| Format | File Size | Compression Ratio |
|---|---|---|
| XML | 2.5 MB | 1.0x (baseline) |
| Deps | 800 KB | 3.1x smaller |
| JSON | 4.2 MB | 1.7x larger |
| GraphML | 8.1 MB | 3.2x larger |
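The ratios in the last column are each file size compared with the XML baseline. A short sketch of that arithmetic, assuming 1 MB = 1000 KB (describe is a hypothetical helper):

```python
# File sizes from the table above, in KB (assuming 1 MB = 1000 KB).
sizes_kb = {"XML": 2500, "Deps": 800, "JSON": 4200, "GraphML": 8100}
baseline = sizes_kb["XML"]


def describe(fmt):
    """Express a format's size relative to the XML baseline."""
    size = sizes_kb[fmt]
    if size == baseline:
        return "1.0x (baseline)"
    if size < baseline:
        return f"{baseline / size:.1f}x smaller"
    return f"{size / baseline:.1f}x larger"


for fmt in sizes_kb:
    print(fmt, describe(fmt))
```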
Best Practices
Choosing the Right Format
Use XML when:
- You work with large models (>10K elements)
- You need maximum performance
- You are building production systems
- You must preserve all metadata and attributes
Use Deps when:
- You are doing simple dependency analysis
- You are scripting or automating tasks
- You need human-readable output
- You work with shell tools (grep, awk, etc.)
Use JSON when:
- You are building web applications
- You need JavaScript compatibility
- You are creating REST APIs
- Your models are moderate in size (<50K elements)
Use GraphML when:
- You are importing into visualization tools
- You are sharing models with researchers
- You need standards compliance
- You are running one-time analysis tasks
Performance Optimization
For Large Models
# Use XML format for storage
model.to_xml('large_model.xml')
# Load with ModelApi for efficient querying
api = ModelApi(filepath='large_model.xml')
# Use specific queries instead of loading all elements
functions = api.getElementsByType('function')
specific_elements = api.getElementsByName('main')
For Streaming Processing
# Process deps format line by line for very large files
def process_large_deps_file(filepath):
    with open(filepath, 'r') as f:
        for line in f:
            parts = line.strip().split(':')
            # Skip element-only lines and malformed lines with fewer than three fields
            if len(parts) < 3:
                continue
            source, target, rel_type = parts[:3]
            # Process dependency
            yield source, target, rel_type
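One thing a line-by-line pass makes possible is aggregating over a file without holding it in memory. The sketch below counts incoming relationships per target; iter_deps and count_incoming are hypothetical names, and an in-memory io.StringIO stands in for a real file so the example is self-contained.

```python
import io
from collections import Counter


def iter_deps(lines):
    """Yield (source, target, type) from an iterable of Deps lines."""
    for line in lines:
        parts = line.strip().split(":")
        if len(parts) >= 3:  # element-only lines have no relationship fields
            yield parts[0], parts[1], parts[2]


def count_incoming(lines):
    """Count how many relationships point at each target path."""
    counts = Counter()
    for _, target, _ in iter_deps(lines):
        counts[target] += 1
    return counts


sample = io.StringIO(
    "/api/user.py:/api/auth.py:imports:frequency=high,critical=true\n"
    "/frontend/login.js:/api/auth.py:calls:method=POST,async=true\n"
    "/myapp/views.py:/myapp/models.py:imports\n"
    "/nginx/src\n"
)
incoming = count_incoming(sample)
print(incoming.most_common(1))
```

Only the counter grows with the number of distinct targets; the lines themselves are discarded as soon as they are processed.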
Memory Management
# For very large models, use streaming
from sgraph.loader.modelloader import ModelLoader
# Load incrementally
loader = ModelLoader()
for element_batch in loader.load_streaming('huge_model.xml', batch_size=1000):
    # Process batch; process_elements is a placeholder for your own handler
    process_elements(element_batch)
Migration Between Formats
Preserving Metadata
When converting between formats, be aware of metadata preservation:
| From → To | Elements | Relationships | Attributes | Conversion Speed |
|---|---|---|---|---|
| XML → Deps | ✅ Paths only | ✅ | ⚠️ Limited | Fast |
| XML → JSON | ✅ | ✅ | ✅ | Medium |
| XML → GraphML | ✅ | ✅ | ✅ | Slow |
| Deps → XML | ✅ | ✅ | ⚠️ Limited | Fast |
Batch Conversion
import os
from sgraph.converters.xml_to_deps import XmlToDeps
def convert_project_models(input_dir, output_dir):
    """Convert all XML models to Deps format"""
    converter = XmlToDeps()
    os.makedirs(output_dir, exist_ok=True)  # create the output directory if missing
    for filename in os.listdir(input_dir):
        if filename.endswith('.xml'):
            input_path = os.path.join(input_dir, filename)
            # Swap only the extension; str.replace would also alter '.xml' inside the name
            output_name = os.path.splitext(filename)[0] + '.deps'
            output_path = os.path.join(output_dir, output_name)
            print(f"Converting {filename}...")
            converter.convert(input_path, output_path)
# Usage
convert_project_models('models/', 'deps_output/')
Advanced Features
Custom Attributes in XML
<e n="MyClass" t="class" visibility="public" complexity="high" loc="250">
<r r="2" t="inherits" strength="strong" />
</e>
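When reading such an element, the custom attributes are whatever remains after the reserved ones (n, i, t for elements; r, t for relationships) are set aside. A minimal sketch using the standard library, with custom_attributes as a hypothetical helper:

```python
import xml.etree.ElementTree as ET

ELEMENT_RESERVED = {"n", "i", "t"}
RELATIONSHIP_RESERVED = {"r", "t"}


def custom_attributes(node, reserved):
    """Return the attributes of an XML node that are not reserved names."""
    return {key: value for key, value in node.attrib.items() if key not in reserved}


element = ET.fromstring(
    '<e n="MyClass" t="class" visibility="public" complexity="high" loc="250">'
    '<r r="2" t="inherits" strength="strong" />'
    "</e>"
)
element_custom = custom_attributes(element, ELEMENT_RESERVED)
relationship_custom = custom_attributes(element.find("r"), RELATIONSHIP_RESERVED)
print(element_custom, relationship_custom)
```

XML attribute values are always strings, so a numeric attribute such as loc has to be converted by the caller.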
Relationship Attributes in Deps
/src/main.py:/src/utils.py:imports:frequency=10,last_used=2023-12-01
Nested Attributes in JSON
{
  "name": "MyFunction",
  "attributes": {
    "metrics": {
      "complexity": 15,
      "lines": 45,
      "parameters": 3
    },
    "metadata": {
      "author": "developer",
      "last_modified": "2023-12-01"
    }
  }
}
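Nested attributes like these have no direct counterpart in formats that store flat key=value pairs, such as Deps. One possible workaround, shown here purely as an assumption and not as an sgraph convention, is to flatten them into dotted keys before export (flatten_attributes is a hypothetical helper):

```python
def flatten_attributes(attributes, prefix=""):
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in attributes.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_attributes(value, full_key))
        else:
            flat[full_key] = value
    return flat


nested = {
    "metrics": {"complexity": 15, "lines": 45, "parameters": 3},
    "metadata": {"author": "developer", "last_modified": "2023-12-01"},
}
flat = flatten_attributes(nested)
print(",".join(f"{key}={value}" for key, value in flat.items()))
```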
Understanding these formats allows you to choose the right tool for each task and integrate sgraph into your development workflow effectively!