Data Formats

sgraph supports multiple data formats, each optimized for different use cases. This guide explains when and how to use each format.

Overview

Format	Use Case	File Size	Performance	Human Readable
XML	Large models, production	Compact	Very Fast	Moderate
Deps	Scripting, simple analysis	Small	Fast	Very High
JSON	Web applications	Medium	Moderate	High
GraphML	Graph visualization tools	Large	Moderate	Low

XML Format

The XML format is the primary format for sgraph, designed for performance and scalability.

Structure

<model version="2.1">
  <elements t="architecture">
    <e n="root_element">
      <e n="child_element" i="2">
        <r r="3" t="relationship_type" />
      </e>
      <e i="3" n="target_element">
      </e>
    </e>
  </elements>
</model>

Key Features

Integer References: Elements carry integer IDs (the i attribute), and relationships point at those IDs instead of repeating full paths
Compact Representation: Minimal XML overhead
Hierarchical Structure: Nested elements represent containment
Relationships: <r> tags define associations between elements
Attributes: Custom attributes stored as XML attributes

Attributes Reference

Element Attributes (`<e>`)

n - Name of the element
i - Unique integer identifier
t - Type of the element
Custom attributes as needed

Relationship Attributes (`<r>`)

r - Reference to target element ID
t - Type of relationship
Custom attributes for relationship metadata
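
As a rough illustration of how the integer references fit together, the sketch below parses the Structure example with Python's standard `xml.etree.ElementTree` (not the sgraph API) and resolves each `r` value to the path of the element whose `i` matches. The helper names `index_elements` and `resolve_relationships` are made up for this example.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """
<model version="2.1">
  <elements t="architecture">
    <e n="root_element">
      <e n="child_element" i="2">
        <r r="3" t="relationship_type" />
      </e>
      <e i="3" n="target_element">
      </e>
    </e>
  </elements>
</model>
"""


def index_elements(node, parent_path, paths_by_id, pending):
    """Walk nested <e> tags, recording each element's path and its <r> children."""
    for child in node.findall("e"):
        path = f"{parent_path}/{child.get('n')}"
        if child.get("i") is not None:
            paths_by_id[child.get("i")] = path
        for rel in child.findall("r"):
            pending.append((path, rel.get("r"), rel.get("t")))
        index_elements(child, path, paths_by_id, pending)


def resolve_relationships(xml_text):
    """Return (source_path, target_path, type) tuples with IDs resolved to paths."""
    root = ET.fromstring(xml_text)
    paths_by_id, pending = {}, []
    index_elements(root.find("elements"), "", paths_by_id, pending)
    # Resolve only after the full walk, since a target may appear after its source.
    return [(src, paths_by_id[target_id], rel_type)
            for src, target_id, rel_type in pending]


relationships = resolve_relationships(XML_TEXT)
print(relationships)
```

Resolution is deferred until the whole tree has been walked because, as in the example, an `<r>` tag can refer to an element that is declared later in the file.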

Example: C Project Structure

<model version="2.1">
  <elements t="c_project">
    <e n="nginx">
      <e n="src">
        <e n="core">
          <e n="nginx.c" t="source_file">
            <r r="2" t="includes" />
            <r r="3" t="includes" />
          </e>
          <e i="2" n="nginx.h" t="header_file">
          </e>
          <e i="3" n="config.h" t="header_file">
          </e>
        </e>
      </e>
    </e>
  </elements>
</model>

Working with XML

from sgraph import SGraph, SElement
from sgraph.modelapi import ModelApi

# Create a model with an empty root element and export it to XML
model = SGraph(SElement(None, ''))
# ... build your model ...
model.to_xml('output.xml')

# Load from XML
api = ModelApi(filepath='model.xml')
elements = api.getAllElements()

Deps Format

The Deps format is a simple, line-based text format perfect for scripting and quick analysis.

Structure

source_path:target_path:relationship_type
source_path:target_path:relationship_type:attribute1=value1,attribute2=value2

Examples

Basic Dependencies

/nginx/src/core/nginx.c:/nginx/src/core/nginx.h:includes
/nginx/src/core/nginx.c:/nginx/src/core/config.h:includes
/myapp/views.py:/myapp/models.py:imports
/myapp/views.py:/myapp/utils.py:imports

With Attributes

/api/user.py:/api/auth.py:imports:frequency=high,critical=true
/frontend/login.js:/api/auth.py:calls:method=POST,async=true
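
To make the field layout concrete, here is a minimal parsing sketch in plain Python. It is not the sgraph parser: `parse_deps_line` is a hypothetical helper, and it assumes that paths contain no colons and that attribute values contain no commas.

```python
def parse_deps_line(line):
    """Split one Deps line into (source, target, rel_type, attributes)."""
    # Split at most three times so colons inside attribute values survive.
    parts = line.strip().split(":", 3)
    if len(parts) < 3:
        raise ValueError(f"not a relationship line: {line!r}")
    source, target, rel_type = parts[:3]
    attributes = {}
    if len(parts) == 4 and parts[3]:
        for pair in parts[3].split(","):
            key, _, value = pair.partition("=")
            attributes[key] = value  # values stay strings, e.g. "true"
    return source, target, rel_type, attributes


plain = parse_deps_line("/myapp/views.py:/myapp/models.py:imports")
with_attrs = parse_deps_line(
    "/api/user.py:/api/auth.py:imports:frequency=high,critical=true"
)
print(plain)
print(with_attrs)
```

Attribute values are kept as strings here; converting `true` or `10` to typed values would be a separate, application-specific step.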

Hierarchical Elements Only

When a model has no relationships, the file lists one element path per line, which conveys the hierarchy:

/nginx
/nginx/src
/nginx/src/core
/nginx/src/core/nginx.c
/nginx/src/core/nginx.h
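
A path-only listing like this can be turned back into a containment tree with a few lines of Python. The sketch below uses nested dictionaries as a stand-in for real model elements; `build_tree` is an illustrative name, not an sgraph function.

```python
def build_tree(paths):
    """Turn a list of absolute element paths into nested dicts keyed by name."""
    tree = {}
    for path in paths:
        node = tree
        for name in path.strip().strip("/").split("/"):
            if not name:  # ignore blank lines
                continue
            node = node.setdefault(name, {})
    return tree


PATHS = [
    "/nginx",
    "/nginx/src",
    "/nginx/src/core",
    "/nginx/src/core/nginx.c",
    "/nginx/src/core/nginx.h",
]
tree = build_tree(PATHS)
print(tree)
```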

Working with Deps Format

from sgraph import SGraph
from sgraph.converters.deps_to_xml import DepsToXml
from sgraph.converters.xml_to_deps import XmlToDeps

# Export an existing SGraph instance (`model`, built or loaded earlier) to Deps
model.to_deps('dependencies.txt')

# Convert between formats
deps_to_xml = DepsToXml()
deps_to_xml.convert('dependencies.txt', 'model.xml')

xml_to_deps = XmlToDeps()
xml_to_deps.convert('model.xml', 'dependencies.txt')

Command Line Usage

# Convert XML to Deps
python -m sgraph.converters.xml_to_deps model.xml output.deps

# Convert Deps to XML
python -m sgraph.converters.deps_to_xml dependencies.txt model.xml

# Keep only lines whose source or target is a Python file
grep "\.py:" dependencies.txt > python_deps.txt

JSON Format

JSON format provides a web-friendly representation of sgraph models.

Structure

{
  "model_version": "2.1",
  "root": {
    "name": "root",
    "type": "root",
    "children": [
      {
        "name": "module1",
        "type": "module",
        "attributes": {
          "language": "python",
          "lines": 150
        },
        "children": [],
        "relationships": [
          {
            "target_path": "/module2",
            "type": "imports",
            "attributes": {}
          }
        ]
      }
    ]
  }
}
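
Once loaded, a document of this shape can be walked recursively. The following sketch collects every relationship as a (source path, target path, type) tuple using only the standard `json` module. It assumes the root node is left out of element paths, which matches the `/module2` target in the example but is not confirmed here; `iter_relationships` is a hypothetical helper.

```python
import json

MODEL_JSON = """
{
  "model_version": "2.1",
  "root": {
    "name": "root",
    "type": "root",
    "children": [
      {
        "name": "module1",
        "type": "module",
        "attributes": {"language": "python", "lines": 150},
        "children": [],
        "relationships": [
          {"target_path": "/module2", "type": "imports", "attributes": {}}
        ]
      }
    ]
  }
}
"""


def iter_relationships(node, parent_path=""):
    """Yield (source_path, target_path, type) for every relationship under node."""
    path = f"{parent_path}/{node['name']}"
    for rel in node.get("relationships", []):
        yield path, rel["target_path"], rel["type"]
    for child in node.get("children", []):
        yield from iter_relationships(child, path)


model = json.loads(MODEL_JSON)
# Start from the root's children so paths read "/module1", not "/root/module1".
edges = [edge
         for child in model["root"]["children"]
         for edge in iter_relationships(child)]
print(edges)
```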

Working with JSON

from sgraph.converters.sgraph_json import SGraphJson
from sgraph.converters.xml_to_json import XmlToJson

# Convert to JSON
converter = XmlToJson()
converter.convert('model.xml', 'model.json')

# Load JSON in web applications
import json
with open('model.json', 'r') as f:
    model_data = json.load(f)

GraphML Format

GraphML is a standard XML format for graphs, supported by many visualization tools.

Features

Compatible with Gephi, yEd, Cytoscape
Rich metadata support
Standard format for graph exchange
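
For a sense of what a GraphML file contains, the sketch below builds a small one from dependency tuples with the standard library. The layout that sgraph's own converter produces may differ; the key IDs (`d0`, `d1`), the generated node IDs, and the function name `edges_to_graphml` are choices made for this example.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def q(tag):
    """Qualify a tag name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{tag}"


def edges_to_graphml(edges):
    """Build a GraphML document (as a string) from (source, target, type) tuples."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(q("graphml"))
    # Declare the data keys: element path on nodes, relationship type on edges.
    ET.SubElement(root, q("key"), {"id": "d0", "for": "node",
                                   "attr.name": "path", "attr.type": "string"})
    ET.SubElement(root, q("key"), {"id": "d1", "for": "edge",
                                   "attr.name": "type", "attr.type": "string"})
    graph = ET.SubElement(root, q("graph"), {"id": "G", "edgedefault": "directed"})

    # Paths contain slashes, so use generated IDs and keep the path as node data.
    paths = sorted({p for src, dst, _ in edges for p in (src, dst)})
    node_ids = {path: f"n{i}" for i, path in enumerate(paths)}
    for path, node_id in node_ids.items():
        node = ET.SubElement(graph, q("node"), {"id": node_id})
        ET.SubElement(node, q("data"), {"key": "d0"}).text = path
    for src, dst, rel_type in edges:
        edge = ET.SubElement(graph, q("edge"),
                             {"source": node_ids[src], "target": node_ids[dst]})
        ET.SubElement(edge, q("data"), {"key": "d1"}).text = rel_type
    return ET.tostring(root, encoding="unicode")


EDGES = [
    ("/nginx/src/core/nginx.c", "/nginx/src/core/nginx.h", "includes"),
    ("/nginx/src/core/nginx.c", "/nginx/src/core/config.h", "includes"),
]
graphml_text = edges_to_graphml(EDGES)
print(graphml_text)
```

Each node and edge carries its metadata in `<data>` children that refer to a declared `<key>`. That per-item markup is one reason GraphML files tend to be larger than the equivalent compact XML.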

Working with GraphML

from sgraph.converters.xml_to_graphml import XmlToGraphMl

# Convert to GraphML
converter = XmlToGraphMl()
converter.convert('model.xml', 'graph.graphml')

# Import into visualization tools:
# - Gephi: File > Open > graph.graphml
# - yEd: File > Open > graph.graphml  
# - Cytoscape: File > Import > Network from File

Format Comparison

Performance Benchmarks

Format	1K Elements	100K Elements	1M Elements
XML Load	10ms	500ms	5s
Deps Load	5ms	200ms	2s
JSON Load	15ms	800ms	8s
GraphML Load	20ms	1.2s	12s

File Size Comparison

For a typical software project with 10K elements:

Format	File Size	Size Relative to XML
XML	2.5 MB	1.0x (baseline)
Deps	800 KB	3.1x smaller
JSON	4.2 MB	1.7x larger
GraphML	8.1 MB	3.2x larger
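
The last column follows directly from the file sizes. A short check of the arithmetic, treating 800 KB as 0.8 MB (decimal units, an assumption) and using an illustrative helper name:

```python
SIZES_MB = {"XML": 2.5, "Deps": 0.8, "JSON": 4.2, "GraphML": 8.1}


def relative_size(fmt, baseline="XML"):
    """Describe a format's size relative to the baseline format."""
    size, base = SIZES_MB[fmt], SIZES_MB[baseline]
    if size == base:
        return "1.0x (baseline)"
    if size < base:
        return f"{base / size:.1f}x smaller"
    return f"{size / base:.1f}x larger"


for name in SIZES_MB:
    print(name, relative_size(name))
```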

Best Practices

Choosing the Right Format

Use XML when:

You are working with large models (>10K elements)
You need maximum performance
You are building production systems
You must preserve all metadata and attributes

Use Deps when:

You are doing simple dependency analysis
You are scripting or automating tasks
You need human-readable output
You are working with shell tools (grep, awk, etc.)

Use JSON when:

You are building web applications
You need JavaScript compatibility
You are creating REST APIs
Your models are moderate in size (<50K elements)

Use GraphML when:

You are importing into visualization tools
You are sharing with researchers
You need standards compliance
You are running one-time analysis tasks
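
These guidelines can overlap, for example a web application with a very large model. The sketch below encodes one possible reading of them as a small helper. The function name, its parameters, and the order in which the rules are checked are all assumptions of this example, not part of sgraph.

```python
def suggest_format(element_count, for_visualization=False, for_web=False,
                   for_shell_tools=False):
    """Return a format name following one possible ordering of the guidelines."""
    if for_visualization:
        return "GraphML"
    # JSON is suggested only for moderate-sized models (<50K elements).
    if for_web and element_count < 50_000:
        return "JSON"
    if for_shell_tools:
        return "Deps"
    # Default, and the recommendation for large models.
    return "XML"


print(suggest_format(5_000, for_web=True))
print(suggest_format(200_000, for_web=True))
```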

Performance Optimization

For Large Models

from sgraph.modelapi import ModelApi

# Use XML format for storage (`model` is an existing SGraph instance)
model.to_xml('large_model.xml')

# Load with ModelApi for efficient querying
api = ModelApi(filepath='large_model.xml')

# Use specific queries instead of loading all elements
functions = api.getElementsByType('function')
specific_elements = api.getElementsByName('main')

For Streaming Processing

# Process deps format line by line for very large files
def process_large_deps_file(filepath):
    with open(filepath, 'r') as f:
        for line in f:
            parts = line.strip().split(':')
            # Skip blank and path-only lines, which have fewer than three fields
            if len(parts) < 3:
                continue
            source, target, rel_type = parts[:3]
            # Process dependency
            yield source, target, rel_type
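
Because a generator yields one dependency at a time, aggregate statistics can be computed without holding the whole file in memory. The sketch below counts incoming relationships per target. It reads from any iterable of lines so that it runs on an in-memory sample; `iter_deps`, `count_incoming`, and the path `/myapp/admin.py` are made up for this example.

```python
import io
from collections import Counter


def iter_deps(lines):
    """Yield (source, target, rel_type) from an iterable of Deps lines."""
    for line in lines:
        parts = line.strip().split(":")
        if len(parts) >= 3:  # skip blank and path-only lines
            yield parts[0], parts[1], parts[2]


def count_incoming(lines):
    """Count how many relationships point at each target path."""
    incoming = Counter()
    for _source, target, _rel_type in iter_deps(lines):
        incoming[target] += 1
    return incoming


SAMPLE = io.StringIO(
    "/myapp/views.py:/myapp/models.py:imports\n"
    "/myapp/views.py:/myapp/utils.py:imports\n"
    "/myapp/admin.py:/myapp/models.py:imports\n"
    "/myapp\n"
)
incoming = count_incoming(SAMPLE)
print(incoming.most_common(1))
```

For a real file, pass an open file object instead of the `StringIO` sample; only the counter grows with the number of distinct targets.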

Memory Management

# For very large models, use streaming
from sgraph.loader.modelloader import ModelLoader

# Load incrementally
loader = ModelLoader()
for element_batch in loader.load_streaming('huge_model.xml', batch_size=1000):
    # Process batch (process_elements stands for your own handler function)
    process_elements(element_batch)

Migration Between Formats

Preserving Metadata

When converting between formats, be aware of metadata preservation:

From → To	Elements	Relationships	Attributes	Performance
XML → Deps	✅ Paths only	✅	⚠️ Limited	Fast
XML → JSON	✅	✅	✅	Medium
XML → GraphML	✅	✅	✅	Slow
Deps → XML	✅	✅	⚠️ Limited	Fast

Batch Conversion

import os
from sgraph.converters.xml_to_deps import XmlToDeps

def convert_project_models(input_dir, output_dir):
    """Convert all XML models in input_dir to Deps files in output_dir."""
    converter = XmlToDeps()
    os.makedirs(output_dir, exist_ok=True)  # create the output directory if needed

    for filename in os.listdir(input_dir):
        if filename.endswith('.xml'):
            input_path = os.path.join(input_dir, filename)
            # Swap only the extension; str.replace would also hit '.xml' mid-name
            stem = os.path.splitext(filename)[0]
            output_path = os.path.join(output_dir, stem + '.deps')

            print(f"Converting {filename}...")
            converter.convert(input_path, output_path)

# Usage
convert_project_models('models/', 'deps_output/')

Advanced Features

Custom Attributes in XML

<e n="MyClass" t="class" visibility="public" complexity="high" loc="250">
  <r r="2" t="inherits" strength="strong" />
</e>
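
When reading such a file by hand, custom attributes are simply whatever remains after the reserved names from the Attributes Reference (`n`, `i`, `t` on elements; `r`, `t` on relationships) are set aside. A minimal sketch with the standard library, using a hypothetical `custom_attributes` helper:

```python
import xml.etree.ElementTree as ET

# Attribute names with a fixed meaning, as listed in the Attributes Reference.
RESERVED_ELEMENT_ATTRS = {"n", "i", "t"}
RESERVED_RELATIONSHIP_ATTRS = {"r", "t"}

SNIPPET = """
<e n="MyClass" t="class" visibility="public" complexity="high" loc="250">
  <r r="2" t="inherits" strength="strong" />
</e>
"""


def custom_attributes(node, reserved):
    """Return the attributes of an XML node that are not reserved names."""
    return {k: v for k, v in node.attrib.items() if k not in reserved}


element = ET.fromstring(SNIPPET)
element_custom = custom_attributes(element, RESERVED_ELEMENT_ATTRS)
relationship_custom = custom_attributes(element.find("r"),
                                        RESERVED_RELATIONSHIP_ATTRS)
print(element_custom)
print(relationship_custom)
```

XML attribute values are always strings, so a numeric attribute such as `loc` arrives as `"250"` and has to be converted by the caller.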

Relationship Attributes in Deps

/src/main.py:/src/utils.py:imports:frequency=10,last_used=2023-12-01

Nested Attributes in JSON

{
  "name": "MyFunction",
  "attributes": {
    "metrics": {
      "complexity": 15,
      "lines": 45,
      "parameters": 3
    },
    "metadata": {
      "author": "developer",
      "last_modified": "2023-12-01"
    }
  }
}
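
XML attributes and Deps `key=value` pairs are flat, so nested JSON attributes may need flattening before they can be carried into those formats. The sketch below shows one way to do that with dotted keys. The dotted-key convention and the `flatten_attributes` name are assumptions of this example, and whether sgraph's converters do anything similar is not stated here.

```python
def flatten_attributes(attributes, prefix=""):
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in attributes.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_attributes(value, full_key))
        else:
            flat[full_key] = value
    return flat


NESTED = {
    "metrics": {"complexity": 15, "lines": 45, "parameters": 3},
    "metadata": {"author": "developer", "last_modified": "2023-12-01"},
}
flat = flatten_attributes(NESTED)
print(flat)
```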

Understanding these formats allows you to choose the right tool for each task and integrate sgraph into your development workflow effectively!
