
DataTape Usage Guide

This guide explains how to work with Treasury DataTapes, including running existing datatapes, understanding their outputs, and customizing parameters.

Available DataTapes

Treasury DataTapes are production-ready datasets that can be generated on demand:

DataTape             Script                      Output
CONCORD              run_concord.py              Backup servicer report for systems managed by Concord
DEALER_TRANSACTIONS  run_dealer_transactions.py  Partner and vendor payment processing
GREAT_AMERICA        run_great_america.py        System and payment data for Great America portfolios
KLIM                 run_klim.py                 Essential loan information and customer demographics
SLA                  run_sla.py                  Solar Loan Asset (SLA) Facility systems and operational metr...
TEP                  run_tep.py                  Tax Equity Partnerships datatape analytics
THEOREM              run_theorem.py              Interest analysis and loan portfolio reporting

Running DataTapes

Basic Execution

Each datatape has a runner script in its own directory:

# Navigate to the datatape directory
cd datatapes/concord

# Run the datatape
python run_concord.py

# Or, from the project root, run it as a module
python -m datatapes.concord.run_concord

Common Parameters

Most datatape runners support these parameters:

# Specify target date
python run_concord.py --target-date "2025-06-18"

# Bypass cached results and refresh the data
python run_concord.py --force-refresh

# Run for specific portfolios
python run_concord.py --portfolio "Sunnova SAP IV LLC"

# Use a custom configuration file
python run_concord.py --config "./custom_config.yaml"
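The flags above could be declared with Python's standard argparse module. The following is a minimal sketch, not the parser actually used by run_concord.py: the flag names come from this guide, while build_parser, parse_target_date, and the default of today's date are hypothetical.

```python
import argparse
from datetime import date
from typing import Optional


def parse_target_date(value: str) -> date:
    """Convert an ISO date string such as 2025-06-18 into a date."""
    try:
        return date.fromisoformat(value)
    except ValueError as exc:
        raise argparse.ArgumentTypeError(f"expected YYYY-MM-DD, got {value!r}") from exc


def build_parser(today: Optional[date] = None) -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented in this guide."""
    parser = argparse.ArgumentParser(description="Run a Treasury DataTape (sketch)")
    # Assumption: the target date defaults to today when the flag is omitted.
    parser.add_argument("--target-date", type=parse_target_date,
                        default=today or date.today())
    parser.add_argument("--force-refresh", action="store_true",
                        help="bypass cached results and refresh the data")
    # action="append" lets --portfolio be repeated; None means "no portfolio filter".
    parser.add_argument("--portfolio", action="append", default=None,
                        help="restrict the run to this portfolio (repeatable)")
    parser.add_argument("--config", default=None,
                        help="path to a custom YAML configuration file")
    parser.add_argument("--verbose", action="store_true")
    return parser
```

With this design, repeating --portfolio collects every value into a list, so build_parser().parse_args(["--portfolio", "Sunnova SAP IV LLC", "--portfolio", "Sunnova TEP 8-B LLC"]).portfolio holds both names.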

Understanding DataTape Outputs

Output Structure

Each datatape generates the following outputs:

datatapes/[name]/
├── completed_output/
│   ├── [DataTape]_YYYY-MM-DD.csv     # Main output file
│   └── [DataTape]_YYYY-MM-DD.xlsx    # Excel version (if enabled)
├── profiling_output/
│   └── [DataTape]_YYYY-MM-DD_profile.html  # Data quality report
└── logs/
    └── [DataTape]_YYYY-MM-DD.log     # Execution log
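Because file names follow a fixed pattern, a short script can work out where a given run's files should be, for instance to confirm that a scheduled run produced everything it should. This is a sketch under assumptions: expected_outputs and missing_outputs are hypothetical helpers, and the caller must supply the actual filename prefix that stands in for [DataTape].

```python
from datetime import date
from pathlib import Path


def expected_outputs(root: Path, name: str, prefix: str, run_date: date) -> dict[str, Path]:
    """Return the paths implied by the directory layout for one run.

    Hypothetical helper: `name` is the datatape directory (e.g. "concord") and
    `prefix` is whatever replaces [DataTape] in the file names.
    """
    base = root / "datatapes" / name
    stamp = f"{prefix}_{run_date.isoformat()}"
    return {
        "csv": base / "completed_output" / f"{stamp}.csv",
        "xlsx": base / "completed_output" / f"{stamp}.xlsx",  # only if Excel output is enabled
        "profile": base / "profiling_output" / f"{stamp}_profile.html",
        "log": base / "logs" / f"{stamp}.log",
    }


def missing_outputs(paths: dict[str, Path], skip: tuple[str, ...] = ("xlsx",)) -> list[str]:
    """List output kinds whose files do not exist; Excel is treated as optional by default."""
    return [kind for kind, path in paths.items() if kind not in skip and not path.exists()]
```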

Column Documentation

Every datatape includes column documentation that records, for each output column:

  • Column Name: The output column name
  • Source: Which database table(s) provide the data
  • Type: Data type (VARCHAR, DECIMAL, DATE, etc.)
  • Description: Business meaning and usage
  • Calculated: Whether the field is derived or taken directly from the database

Example from the Concord datatape:

Column           Source                         Type     Description
System Name      CONCORD.CORE_SYSTEM            VARCHAR  Unique identifier for the solar system
Customer Name    CONCORD.CUSTOMER_DATA          VARCHAR  Customer's full legal name
Monthly Payment  CONCORD.PAYMENTS (calculated)  DECIMAL  Average monthly payment amount
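Since the documentation lists every expected column, it can also serve as a check on a finished output file. In this sketch, ColumnDoc and check_header are hypothetical names; the record simply mirrors the documentation fields listed above and is populated with the three Concord rows from the example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDoc:
    """One documented output column (hypothetical structure mirroring the fields above)."""
    name: str
    source: str
    data_type: str
    description: str
    calculated: bool = False


def check_header(csv_text: str, docs: list[ColumnDoc]) -> tuple[list[str], list[str]]:
    """Return (documented columns missing from the CSV, CSV columns that are undocumented)."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    documented = [doc.name for doc in docs]
    missing = [name for name in documented if name not in header]
    extra = [name for name in header if name not in documented]
    return missing, extra


# The three example rows from the Concord column documentation.
CONCORD_DOCS = [
    ColumnDoc("System Name", "CONCORD.CORE_SYSTEM", "VARCHAR",
              "Unique identifier for the solar system"),
    ColumnDoc("Customer Name", "CONCORD.CUSTOMER_DATA", "VARCHAR",
              "Customer's full legal name"),
    ColumnDoc("Monthly Payment", "CONCORD.PAYMENTS", "DECIMAL",
              "Average monthly payment amount", calculated=True),
]
```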

Data Quality Reports

Each datatape run also generates an automated data quality report (the HTML profile in profiling_output/) covering:

  • Column Profiling: Data types, null counts, unique values
  • Data Distribution: Min/max values, common patterns
  • Quality Issues: Missing data, outliers, format problems
  • Row Counts: Total records and any filtering applied
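The HTML report remains the reference, but its basic statistics are simple to reproduce for a quick spot check of a CSV. This stdlib-only sketch counts rows, nulls, and distinct values per column; profile_csv is a hypothetical helper, treating empty cells as nulls is an assumption, and the real profiler may define nulls, patterns, and outliers differently.

```python
import csv
import io


def profile_csv(csv_text: str) -> dict:
    """Hypothetical spot-check: row count plus per-column null and distinct counts.

    Assumption: an empty or whitespace-only cell counts as null.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    nulls = {col: 0 for col in columns}
    distinct = {col: set() for col in columns}
    rows = 0
    for row in reader:
        rows += 1
        for col in columns:
            # Short rows yield None for missing cells; treat those as empty too.
            value = (row.get(col) or "").strip()
            if value:
                distinct[col].add(value)
            else:
                nulls[col] += 1
    return {
        "rows": rows,
        "columns": {col: {"nulls": nulls[col], "distinct": len(distinct[col])} for col in columns},
    }
```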

Working with Specific DataTapes

Concord DataTape

Purpose: Core system data for Concord-managed portfolios
Key Outputs: Customer demographics, equipment details, financial data
Typical Use Cases: Portfolio reporting, customer analysis, payment tracking

cd datatapes/concord
python run_concord.py --target-date "2025-06-18"

KLIM DataTape

Purpose: Key Loan Information Matrix for loan analytics
Key Outputs: Loan details, customer info, payment history
Typical Use Cases: Credit analysis, portfolio performance, regulatory reporting

cd datatapes/klim
python run_klim.py --target-date "2025-06-18"

Dealer Transactions DataTape

Purpose: Partner and vendor payment processing
Key Outputs: AP payments, dealer system data, transaction tracking
Typical Use Cases: Vendor management, payment reconciliation

cd datatapes/dealer_transactions
python run_dealer_transactions.py --target-date "2025-06-18"
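To build several datatapes for the same date, the per-datatape commands above could be driven from a single script. This is a hypothetical sketch (run_all and build_command are illustrative names): it assumes each datatape lives in datatapes/<name> with a runner named run_<name>.py, as for the three datatapes shown here, and it only prints the commands unless dry_run is set to False.

```python
import subprocess
import sys
from pathlib import Path


def build_command(name: str, target_date: str) -> list[str]:
    """Command for one runner, assuming the script is named run_<name>.py."""
    return [sys.executable, f"run_{name}.py", "--target-date", target_date]


def run_all(root: Path, names: list[str], target_date: str,
            dry_run: bool = True) -> list[str]:
    """Hypothetical batch helper: run each datatape from its own directory.

    Returns the names of datatapes whose runner exited non-zero.
    With dry_run=True (the default) the commands are printed, not executed.
    """
    failed = []
    for name in names:
        workdir = root / "datatapes" / name  # equivalent to "cd datatapes/<name>"
        cmd = build_command(name, target_date)
        if dry_run:
            print(f"(cd {workdir} && {' '.join(cmd)})")
            continue
        # check=False so one failing datatape does not abort the rest of the batch.
        result = subprocess.run(cmd, cwd=workdir, check=False)
        if result.returncode != 0:
            failed.append(name)
    return failed
```

Running each runner with cwd set to its directory mirrors the cd-then-run steps above, so any relative paths the runners rely on behave as they would interactively.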

DataTape Configuration

Viewing Configuration

Each datatape's configuration is stored in its [name]_config.yaml file:

# View Concord configuration
cat datatapes/concord/concord_config.yaml

# View KLIM configuration
cat datatapes/klim/klim_config.yaml

Key Configuration Elements

Data Sources: Database connections and query files

queries:
  core_system:
    file: "combined_core_system.sql"
    description: "Main system data"

Output Settings: File format and naming

output:
  file_prefix: "Concord_DataTape"
  format: "csv"
  include_profiling: true

Column Definitions: Output structure

columns:
  - name: "System Name"
    required: true
    description: "Unique system identifier"
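Before a long run, a structural check of the parsed configuration can catch mistakes such as a column without a name. The sketch below validates a configuration that has already been loaded into a dictionary (for example with PyYAML's yaml.safe_load, a third-party package). validate_config is a hypothetical helper, and the required keys and allowed formats are assumptions inferred from the three snippets above, not the project's actual schema.

```python
def validate_config(config: dict) -> list[str]:
    """Return a list of problems found in a parsed datatape config.

    Assumed schema, based only on the examples in this guide:
    queries.<key>.file, output.file_prefix, output.format, columns[].name.
    """
    problems = []
    for key, query in (config.get("queries") or {}).items():
        if not isinstance(query, dict) or not query.get("file"):
            problems.append(f"queries.{key}: missing 'file'")
    output = config.get("output") or {}
    if not output.get("file_prefix"):
        problems.append("output: missing 'file_prefix'")
    # Assumption: only the two formats mentioned in this guide are valid.
    if output.get("format", "csv") not in {"csv", "xlsx"}:
        problems.append(f"output.format: unsupported value {output.get('format')!r}")
    seen = set()
    for i, column in enumerate(config.get("columns") or []):
        name = column.get("name") if isinstance(column, dict) else None
        if not name:
            problems.append(f"columns[{i}]: missing 'name'")
        elif name in seen:
            problems.append(f"columns[{i}]: duplicate name {name!r}")
        else:
            seen.add(name)
    return problems
```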

Troubleshooting

Common Issues

Problem                       Solution
"Database connection failed"  Check VPN connection and database credentials
"SQL query timeout"           Use --force-refresh or check query performance
"Missing output columns"      Review configuration file and SQL query results
"Permission denied"           Ensure write access to output directories
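For the "Permission denied" case, a pre-flight check can confirm write access before any queries run. This hypothetical helper (unwritable_dirs) creates the three output directories from the output layout if needed and tests them with os.access; on some network filesystems os.access can be inaccurate, so an actual write failure remains the final word.

```python
import os
from pathlib import Path

# The three output folders from the datatape directory layout.
OUTPUT_SUBDIRS = ("completed_output", "profiling_output", "logs")


def unwritable_dirs(datatape_dir: Path, create: bool = True) -> list[Path]:
    """Hypothetical pre-flight check: return output dirs that cannot be written to."""
    problems = []
    for sub in OUTPUT_SUBDIRS:
        path = datatape_dir / sub
        if create:
            try:
                path.mkdir(parents=True, exist_ok=True)
            except OSError:
                problems.append(path)
                continue
        # Creating files in a directory needs both write and execute (search) permission.
        if not (path.is_dir() and os.access(path, os.W_OK | os.X_OK)):
            problems.append(path)
    return problems
```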

Log Files

Each datatape run writes a detailed log to its logs/ directory:

# View the log for a specific run date
cat datatapes/concord/logs/Concord_2025-06-18.log

# Follow the log while a run is in progress
tail -f datatapes/concord/logs/Concord_2025-06-18.log
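Because each log name carries its run date, the newest log can be found without knowing that date in advance. This sketch assumes the <Prefix>_YYYY-MM-DD.log naming shown above and compares the dates parsed from filenames rather than file modification times; latest_log is a hypothetical helper.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

# Assumes log names of the form <Prefix>_YYYY-MM-DD.log.
LOG_NAME = re.compile(r"^(?P<prefix>.+)_(?P<day>\d{4}-\d{2}-\d{2})\.log$")


def latest_log(log_dir: Path, prefix: str) -> Optional[Path]:
    """Return the log whose filename date is most recent, or None if there is none."""
    best = None
    for path in log_dir.glob(f"{prefix}_*.log"):
        match = LOG_NAME.match(path.name)
        if not match or match["prefix"] != prefix:
            continue  # not a dated log for this prefix
        try:
            day = date.fromisoformat(match["day"])
        except ValueError:
            continue  # an impossible date such as 2025-13-40
        if best is None or day > best[0]:
            best = (day, path)
    return best[1] if best else None
```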

Data Quality Issues

If data quality reports show issues:

  1. Missing Data: Check source systems for data availability
  2. Format Problems: Review SQL query logic and transformations
  3. Unexpected Values: Validate against business rules
  4. Performance Issues: Consider query optimization or caching
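Checking values against business rules can start as a table of per-column predicates applied to each row. The rules below are placeholders (this guide does not define the actual business rules), and RULES and rule_violations are hypothetical names used only to show the pattern.

```python
import csv
import io
from decimal import Decimal, InvalidOperation
from typing import Callable

Rule = Callable[[str], bool]


def is_non_negative_decimal(value: str) -> bool:
    """True if the cell parses as a decimal number that is zero or greater."""
    try:
        return Decimal(value) >= 0
    except InvalidOperation:
        return False  # empty, non-numeric, or NaN


# Hypothetical placeholder rules keyed by output column name.
RULES: dict[str, Rule] = {
    "System Name": lambda v: bool(v.strip()),
    "Monthly Payment": is_non_negative_decimal,
}


def rule_violations(csv_text: str, rules: dict[str, Rule]) -> list[tuple[int, str, str]]:
    """Return (data row number, column, value) for every cell that fails its rule."""
    violations = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for column, rule in rules.items():
            value = row.get(column) or ""
            if not rule(value):
                violations.append((row_number, column, value))
    return violations
```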

Best Practices

Regular Execution

  • Run datatapes on a consistent schedule
  • Use target dates that align with business cycles
  • Monitor data quality reports for trends

Documentation Review

  • Keep column documentation updated
  • Document any custom configurations
  • Review SQL queries for business logic changes

Output Management

  • Archive completed outputs regularly
  • Clean up old cache files periodically
  • Monitor disk space usage
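Archiving can key off the date embedded in each output filename. This sketch moves dated CSV and Excel outputs older than a cutoff into an archive folder inside completed_output; archive_old_outputs, the folder name, and the 90-day default are hypothetical choices, not project conventions.

```python
import re
import shutil
from datetime import date, timedelta
from pathlib import Path

# Matches the _YYYY-MM-DD.csv / _YYYY-MM-DD.xlsx suffix of completed outputs.
DATED_NAME = re.compile(r"_(\d{4}-\d{2}-\d{2})\.(csv|xlsx)$")


def archive_old_outputs(output_dir: Path, today: date, keep_days: int = 90) -> list[Path]:
    """Hypothetical policy: move dated outputs older than keep_days into output_dir/archive."""
    cutoff = today - timedelta(days=keep_days)
    archive = output_dir / "archive"
    moved = []
    # sorted() materializes the listing before any files are moved.
    for path in sorted(output_dir.iterdir()):
        match = DATED_NAME.search(path.name)
        if not path.is_file() or not match:
            continue
        try:
            file_date = date.fromisoformat(match.group(1))
        except ValueError:
            continue
        if file_date < cutoff:
            archive.mkdir(exist_ok=True)
            target = archive / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```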

Advanced Usage

Custom Portfolios

Restrict a run to one or more portfolios:

# Single portfolio
python run_concord.py --portfolio "Sunnova SAP IV LLC"

# Multiple portfolios
python run_concord.py --portfolio "Sunnova SAP IV LLC" --portfolio "Sunnova TEP 8-B LLC"

Date Ranges

Generate historical data:

# Specific date
python run_concord.py --target-date "2025-05-01"

# Loop over several target dates (for example, month starts)
for date in 2025-01-01 2025-02-01 2025-03-01; do
    python run_concord.py --target-date "$date"
done
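For longer ranges, the dates can be generated instead of typed out. This Python sketch produces the first day of each month in an inclusive range and builds one command per date; like the shell loop, it assumes it is run from datatapes/concord, and month_starts and run_range are hypothetical names. It only prints the commands unless dry_run is False.

```python
import subprocess
import sys
from datetime import date


def month_starts(start: date, end: date) -> list[date]:
    """First day of every month from start's month up to end, inclusive."""
    current = start.replace(day=1)
    months = []
    while current <= end:
        months.append(current)
        # Roll over to January of the next year after December.
        if current.month == 12:
            current = current.replace(year=current.year + 1, month=1)
        else:
            current = current.replace(month=current.month + 1)
    return months


def run_range(start: date, end: date, dry_run: bool = True) -> None:
    """Hypothetical driver; assumes the working directory is datatapes/concord."""
    for day in month_starts(start, end):
        cmd = [sys.executable, "run_concord.py", "--target-date", day.isoformat()]
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing date
```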

Performance Optimization

For large datatapes:

# Use cached results when possible
python run_concord.py  # Uses cache by default

# Force refresh only when needed
python run_concord.py --force-refresh

# Monitor performance
python run_concord.py --verbose
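The default-cache, explicit-refresh behavior described above is a common pattern: results are stored under a key derived from the query and the target date, and a refresh flag skips the lookup. The sketch below illustrates that pattern with a JSON file cache; cached_query, the cache location, the key scheme, and the file format are assumptions and do not describe how the runners actually cache data.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_query(cache_dir: Path, query_name: str, target_date: str,
                 fetch: Callable[[], list], force_refresh: bool = False) -> list:
    """Hypothetical cache: return stored rows for (query_name, target_date) if present.

    On a miss, or when force_refresh is True, call fetch() and overwrite the entry.
    """
    key = hashlib.sha256(f"{query_name}|{target_date}".encode()).hexdigest()[:16]
    path = cache_dir / f"{query_name}_{key}.json"
    if path.exists() and not force_refresh:
        return json.loads(path.read_text())
    rows = fetch()
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(rows))
    return rows
```

Keying on the target date means a run for a new date never reuses another date's results, while repeated runs for the same date skip the database until a refresh is requested.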