Data Formats

sgraph supports multiple data formats, each optimized for different use cases. This guide explains when and how to use each format.

Overview

Format Use Case File Size Performance Human Readable
XML Large models, production Compact Very Fast Moderate
Deps Scripting, simple analysis Small Fast Very High
JSON Web applications Medium Moderate High
GraphML Graph visualization tools Large Moderate Low

XML Format

The XML format is the primary format for sgraph, designed for performance and scalability.

Structure

<model version="2.1">
  <elements t="architecture">
    <e n="root_element">
      <e n="child_element" i="2">
        <r r="3" t="relationship_type" />
      </e>
      <e i="3" n="target_element">
      </e>
    </e>
  </elements>
</model>

Key Features

  • Integer References: Elements use integer IDs (i attribute) for efficient relationships
  • Compact Representation: Minimal XML overhead
  • Hierarchical Structure: Nested elements represent containment
  • Relationships: <r> tags define associations between elements
  • Attributes: Custom attributes stored as XML attributes

Attributes Reference

Element Attributes (<e>)

  • n - Name of the element
  • i - Unique integer identifier
  • t - Type of the element
  • Custom attributes as needed

Relationship Attributes (<r>)

  • r - Reference to target element ID
  • t - Type of relationship
  • Custom attributes for relationship metadata

Example: C Project Structure

<model version="2.1">
  <elements t="c_project">
    <e n="nginx">
      <e n="src">
        <e n="core">
          <e n="nginx.c" t="source_file">
            <r r="2" t="includes" />
            <r r="3" t="includes" />
          </e>
          <e i="2" n="nginx.h" t="header_file">
          </e>
          <e i="3" n="config.h" t="header_file">
          </e>
        </e>
      </e>
    </e>
  </elements>
</model>

Working with XML

from sgraph import SGraph
from sgraph.modelapi import ModelApi

# Create and export to XML
model = SGraph(SElement(None, ''))
# ... build your model ...
model.to_xml('output.xml')

# Load from XML
api = ModelApi(filepath='model.xml')
elements = api.getAllElements()

Deps Format

The Deps format is a simple, line-based text format perfect for scripting and quick analysis.

Structure

source_path:target_path:relationship_type
source_path:target_path:relationship_type:attribute1=value1,attribute2=value2

Examples

Basic Dependencies

/nginx/src/core/nginx.c:/nginx/src/core/nginx.h:includes
/nginx/src/core/nginx.c:/nginx/src/core/config.h:includes
/myapp/views.py:/myapp/models.py:imports
/myapp/views.py:/myapp/utils.py:imports

With Attributes

/api/user.py:/api/auth.py:imports:frequency=high,critical=true
/frontend/login.js:/api/auth.py:calls:method=POST,async=true

Hierarchical Elements Only

When no relationships exist, the format shows the hierarchical structure:

/nginx
/nginx/src
/nginx/src/core
/nginx/src/core/nginx.c
/nginx/src/core/nginx.h

Working with Deps Format

from sgraph import SGraph
from sgraph.converters.deps_to_xml import DepsToXml
from sgraph.converters.xml_to_deps import XmlToDeps

# Export to Deps
model.to_deps('dependencies.txt')

# Convert between formats
deps_to_xml = DepsToXml()
deps_to_xml.convert('dependencies.txt', 'model.xml')

xml_to_deps = XmlToDeps()
xml_to_deps.convert('model.xml', 'dependencies.txt')

Command Line Usage

# Convert XML to Deps
python -m sgraph.converters.xml_to_deps model.xml output.deps

# Convert Deps to XML  
python -m sgraph.converters.deps_to_xml dependencies.txt model.xml

# Filter deps files
grep "\.py:" dependencies.txt > python_deps.txt

JSON Format

JSON format provides a web-friendly representation of sgraph models.

Structure

{
  "model_version": "2.1",
  "root": {
    "name": "root",
    "type": "root",
    "children": [
      {
        "name": "module1",
        "type": "module",
        "attributes": {
          "language": "python",
          "lines": 150
        },
        "children": [],
        "relationships": [
          {
            "target_path": "/module2",
            "type": "imports",
            "attributes": {}
          }
        ]
      }
    ]
  }
}

Working with JSON

from sgraph.converters.sgraph_json import SGraphJson
from sgraph.converters.xml_to_json import XmlToJson

# Convert to JSON
converter = XmlToJson()
converter.convert('model.xml', 'model.json')

# Load JSON in web applications
import json
with open('model.json', 'r') as f:
    model_data = json.load(f)

GraphML Format

GraphML is a standard XML format for graphs, supported by many visualization tools.

Features

  • Compatible with Gephi, yEd, Cytoscape
  • Rich metadata support
  • Standard format for graph exchange

Working with GraphML

from sgraph.converters.xml_to_graphml import XmlToGraphMl

# Convert to GraphML
converter = XmlToGraphMl()
converter.convert('model.xml', 'graph.graphml')

# Import into visualization tools:
# - Gephi: File > Open > graph.graphml
# - yEd: File > Open > graph.graphml  
# - Cytoscape: File > Import > Network from File

Format Comparison

Performance Benchmarks

Format 1K Elements 100K Elements 1M Elements
XML Load 10ms 500ms 5s
Deps Load 5ms 200ms 2s
JSON Load 15ms 800ms 8s
GraphML Load 20ms 1.2s 12s

File Size Comparison

For a typical software project with 10K elements:

Format File Size Compression Ratio
XML 2.5 MB 1.0x (baseline)
Deps 800 KB 3.1x smaller
JSON 4.2 MB 1.7x larger
GraphML 8.1 MB 3.2x larger

Best Practices

Choosing the Right Format

Use XML when:

  • Working with large models (>10K elements)
  • Need maximum performance
  • Building production systems
  • Preserving all metadata and attributes

Use Deps when:

  • Simple dependency analysis
  • Scripting and automation
  • Human-readable output needed
  • Working with shell tools (grep, awk, etc.)

Use JSON when:

  • Building web applications
  • Need JavaScript compatibility
  • Creating REST APIs
  • Moderate-sized models (<50K elements)

Use GraphML when:

  • Importing into visualization tools
  • Sharing with researchers
  • Need standards compliance
  • One-time analysis tasks

Performance Optimization

For Large Models

# Use XML format for storage
model.to_xml('large_model.xml')

# Load with ModelApi for efficient querying
api = ModelApi(filepath='large_model.xml')

# Use specific queries instead of loading all elements
functions = api.getElementsByType('function')
specific_elements = api.getElementsByName('main')

For Streaming Processing

# Process deps format line by line for very large files
def process_large_deps_file(filepath):
    with open(filepath, 'r') as f:
        for line in f:
            if ':' in line:
                parts = line.strip().split(':')
                source, target, rel_type = parts[:3]
                # Process dependency
                yield source, target, rel_type

Memory Management

# For very large models, use streaming
from sgraph.loader.modelloader import ModelLoader

# Load incrementally
loader = ModelLoader()
for element_batch in loader.load_streaming('huge_model.xml', batch_size=1000):
    # Process batch
    process_elements(element_batch)

Migration Between Formats

Preserving Metadata

When converting between formats, be aware of metadata preservation:

From → To Elements Relationships Attributes Performance
XML → Deps ✅ Paths only ⚠️ Limited Fast
XML → JSON Medium
XML → GraphML Slow
Deps → XML ⚠️ Limited Fast

Batch Conversion

import os
from sgraph.converters.xml_to_deps import XmlToDeps

def convert_project_models(input_dir, output_dir):
    """Convert all XML models to Deps format"""
    converter = XmlToDeps()
    
    for filename in os.listdir(input_dir):
        if filename.endswith('.xml'):
            input_path = os.path.join(input_dir, filename)
            output_path = os.path.join(output_dir, filename.replace('.xml', '.deps'))
            
            print(f"Converting {filename}...")
            converter.convert(input_path, output_path)

# Usage
convert_project_models('models/', 'deps_output/')

Advanced Features

Custom Attributes in XML

<e n="MyClass" t="class" visibility="public" complexity="high" loc="250">
  <r r="2" t="inherits" strength="strong" />
</e>

Relationship Attributes in Deps

/src/main.py:/src/utils.py:imports:frequency=10,last_used=2023-12-01

Nested Attributes in JSON

{
  "name": "MyFunction",
  "attributes": {
    "metrics": {
      "complexity": 15,
      "lines": 45,
      "parameters": 3
    },
    "metadata": {
      "author": "developer",
      "last_modified": "2023-12-01"
    }
  }
}

Understanding these formats allows you to choose the right tool for each task and integrate sgraph into your development workflow effectively!