DataTape Usage Guide
This guide explains how to work with Treasury DataTapes, including running existing datatapes, understanding their outputs, and customizing parameters.
Available DataTapes
Treasury DataTapes are production-ready datasets that can be generated on-demand:
| DataTape | Script | Output |
|---|---|---|
| CONCORD | run_concord.py | Backup servicer report for systems managed by Concord |
| DEALER_TRANSACTIONS | run_dealer_transactions.py | Partner and vendor payment processing |
| GREAT_AMERICA | run_great_america.py | System and payment data for Great America portfolios |
| KLIM | run_klim.py | Essential loan information and customer demographics |
| SLA | run_sla.py | Solar Loan Asset (SLA) Facility systems and operational metr... |
| TEP | run_tep.py | Tax Equity Partnerships datatape analytics |
| THEOREM | run_theorem.py | Interest analysis and loan portfolio reporting |
Running DataTapes
Basic Execution
Each datatape has its own runner script in the respective directory:
# Navigate to the datatape directory
cd datatapes/concord
# Run the datatape
python run_concord.py
# Or run from the project root
python -m datatapes.concord.run_concord
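The mapping from a datatape name to its `python -m` module path can be sketched as below. This is a minimal illustration, not part of the project: `RUNNER_SCRIPTS` and `runner_module` are hypothetical names, and the sketch assumes every datatape follows the `datatapes/<name>/run_<name>.py` layout shown for Concord.

```python
# Script names copied from the "Available DataTapes" table.
RUNNER_SCRIPTS = {
    "CONCORD": "run_concord.py",
    "DEALER_TRANSACTIONS": "run_dealer_transactions.py",
    "GREAT_AMERICA": "run_great_america.py",
    "KLIM": "run_klim.py",
    "SLA": "run_sla.py",
    "TEP": "run_tep.py",
    "THEOREM": "run_theorem.py",
}


def runner_module(datatape: str) -> str:
    """Return the dotted module path usable with `python -m` from the project root."""
    script = RUNNER_SCRIPTS[datatape.upper()]
    stem = script[: -len(".py")]      # e.g. "run_concord"
    directory = stem[len("run_"):]    # e.g. "concord" (assumed directory name)
    return f"datatapes.{directory}.{stem}"
```

For example, `runner_module("concord")` yields the module path used in the command above.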
Common Parameters
Most datatape runners support these parameters:
# Specify target date
python run_concord.py --target-date "2025-06-18"
# Force refresh cached data
python run_concord.py --force-refresh
# Run for specific portfolios
python run_concord.py --portfolio "Sunnova SAP IV LLC"
# Use custom configuration
python run_concord.py --config "./custom_config.yaml"
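When runs are launched from another Python process, the flags above can be assembled programmatically. The helper below is a sketch under assumptions: `build_runner_command` is a hypothetical name, and it only builds an argument list from the four flags documented here, without checking that a given runner actually supports them.

```python
import sys
from typing import List, Optional, Sequence


def build_runner_command(
    script: str,
    target_date: Optional[str] = None,
    force_refresh: bool = False,
    portfolios: Sequence[str] = (),
    config: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for a datatape runner script."""
    command = [sys.executable, script]
    if target_date is not None:
        command += ["--target-date", target_date]
    if force_refresh:
        command.append("--force-refresh")
    for portfolio in portfolios:
        # One --portfolio flag per portfolio name.
        command += ["--portfolio", portfolio]
    if config is not None:
        command += ["--config", config]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Passing a list rather than a single string avoids shell quoting problems with portfolio names that contain spaces.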
Understanding DataTape Outputs
Output Structure
Each datatape generates the following outputs:
datatapes/[name]/
├── completed_output/
│   ├── [DataTape]_YYYY-MM-DD.csv            # Main output file
│   └── [DataTape]_YYYY-MM-DD.xlsx           # Excel version (if enabled)
├── profiling_output/
│   └── [DataTape]_YYYY-MM-DD_profile.html   # Data quality report
└── logs/
    └── [DataTape]_YYYY-MM-DD.log            # Execution log
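The naming pattern in this layout can be turned into concrete paths for a given run date. The sketch below is illustrative only: `expected_outputs` is a hypothetical helper, and the `label` argument stands in for the `[DataTape]` placeholder (for example `Concord`, as seen in the log file names later in this guide).

```python
from datetime import date
from pathlib import Path
from typing import Dict


def expected_outputs(root: Path, directory: str, label: str, run_date: date) -> Dict[str, Path]:
    """Return the paths a run is expected to produce, following the tree above."""
    stamp = run_date.isoformat()  # YYYY-MM-DD
    base = root / "datatapes" / directory
    return {
        "csv": base / "completed_output" / f"{label}_{stamp}.csv",
        "xlsx": base / "completed_output" / f"{label}_{stamp}.xlsx",  # only if Excel output is enabled
        "profile": base / "profiling_output" / f"{label}_{stamp}_profile.html",
        "log": base / "logs" / f"{label}_{stamp}.log",
    }
```

Checking `path.exists()` on each entry after a run is one simple way to confirm that every expected file was written.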
Column Documentation
Every datatape includes comprehensive column documentation showing:
- Column Name: The output column name
- Source: Which database table(s) provide the data
- Type: Data type (VARCHAR, DECIMAL, DATE, etc.)
- Description: Business meaning and usage
- Calculated: Whether the field is derived or read directly from the database
Example from Concord datatape:
| Column | Source | Type | Description |
|---|---|---|---|
| System Name | CONCORD.CORE_SYSTEM | VARCHAR | Unique identifier for the solar system |
| Customer Name | CONCORD.CUSTOMER_DATA | VARCHAR | Customer's full legal name |
| Monthly Payment | CONCORD.PAYMENTS (calculated) | DECIMAL | Average monthly payment amount |
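One way to hold this documentation in code is a small record type with one field per attribute listed above. The sketch below is hypothetical (`ColumnDoc` and `calculated_columns` are not names from the project); the three entries are copied from the Concord example table.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ColumnDoc:
    name: str
    source: str
    dtype: str
    description: str
    calculated: bool = False  # True when the value is derived rather than read directly


CONCORD_COLUMNS = [
    ColumnDoc("System Name", "CONCORD.CORE_SYSTEM", "VARCHAR",
              "Unique identifier for the solar system"),
    ColumnDoc("Customer Name", "CONCORD.CUSTOMER_DATA", "VARCHAR",
              "Customer's full legal name"),
    ColumnDoc("Monthly Payment", "CONCORD.PAYMENTS", "DECIMAL",
              "Average monthly payment amount", calculated=True),
]


def calculated_columns(columns: List[ColumnDoc]) -> List[str]:
    """Return the names of columns that are derived rather than direct."""
    return [column.name for column in columns if column.calculated]
```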
Data Quality Reports
Each datatape run generates an automated data quality report including:
- Column Profiling: Data types, null counts, unique values
- Data Distribution: Min/max values, common patterns
- Quality Issues: Missing data, outliers, format problems
- Row Counts: Total records and any filtering applied
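To make the column profiling items concrete, the sketch below computes row counts, null counts, and unique-value counts per column using only the standard library. It is an illustration of the idea, not the profiler the datatapes use; `profile_rows` is a hypothetical name, empty strings are treated as nulls by assumption, and the sample rows are made up.

```python
import csv
import io
from typing import Dict, Iterable, Set


def profile_rows(rows: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count rows, nulls (empty values), and distinct non-null values per column."""
    stats: Dict[str, Dict[str, int]] = {}
    seen: Dict[str, Set[str]] = {}
    for row in rows:
        for column, value in row.items():
            if column is None:
                continue  # extra fields beyond the header are ignored
            col = stats.setdefault(column, {"rows": 0, "nulls": 0, "unique": 0})
            col["rows"] += 1
            if value is None or value.strip() == "":
                col["nulls"] += 1
            else:
                seen.setdefault(column, set()).add(value)
    for column, values in seen.items():
        stats[column]["unique"] = len(values)
    return stats


# Made-up sample data in the shape of a datatape CSV.
sample = io.StringIO(
    "System Name,Monthly Payment\n"
    "SYS-1,120.50\n"
    "SYS-2,\n"
    "SYS-3,120.50\n"
)
report = profile_rows(csv.DictReader(sample))
```

On real output, the same function could be fed `csv.DictReader(open(path, newline=""))` for a file under `completed_output/`.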
Working with Specific DataTapes
Concord DataTape
Purpose: Core system data for Concord-managed portfolios
Key Outputs: Customer demographics, equipment details, financial data
Typical Use Cases: Portfolio reporting, customer analysis, payment tracking
KLIM DataTape
Purpose: Key Loan Information Matrix for loan analytics
Key Outputs: Loan details, customer info, payment history
Typical Use Cases: Credit analysis, portfolio performance, regulatory reporting
Dealer Transactions DataTape
Purpose: Partner and vendor payment processing
Key Outputs: AP payments, dealer system data, transaction tracking
Typical Use Cases: Vendor management, payment reconciliation
DataTape Configuration
Viewing Configuration
Each datatape's configuration is stored in its [name]_config.yaml file:
# View Concord configuration
cat datatapes/concord/concord_config.yaml
# View KLIM configuration
cat datatapes/klim/klim_config.yaml
Key Configuration Elements
Data Sources: Database connections and query files
Output Settings: File format and naming
Column Definitions: Output structure
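A quick sanity check on a loaded configuration might confirm that all three elements are present. The sketch below is an assumption-heavy illustration: the section keys `data_sources`, `output`, and `columns` are hypothetical stand-ins for whatever names the real `[name]_config.yaml` files use, and `missing_sections` is not a project function.

```python
from typing import Any, List, Mapping

# Hypothetical top-level keys; adjust to match the actual config files.
REQUIRED_SECTIONS = ("data_sources", "output", "columns")


def missing_sections(config: Mapping[str, Any]) -> List[str]:
    """Return the required top-level sections that the config does not define."""
    return [section for section in REQUIRED_SECTIONS if section not in config]
```

The mapping would typically come from parsing the YAML file, for instance with `yaml.safe_load` from PyYAML if that package is installed.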
Troubleshooting
Common Issues
| Problem | Solution |
|---|---|
| "Database connection failed" | Check VPN connection and database credentials |
| "SQL query timeout" | Use --force-refresh or check query performance |
| "Missing output columns" | Review configuration file and SQL query results |
| "Permission denied" | Ensure write access to output directories |
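For the "Permission denied" case, write access can be checked before a run. The sketch below assumes the three output subdirectories from the Output Structure section; `unwritable_dirs` is a hypothetical helper, and the demo runs against a temporary directory rather than a real datatape folder.

```python
import os
import tempfile
from pathlib import Path
from typing import List

OUTPUT_SUBDIRS = ("completed_output", "profiling_output", "logs")


def unwritable_dirs(datatape_dir: Path) -> List[Path]:
    """Return output subdirectories that are missing or not writable."""
    problems = []
    for name in OUTPUT_SUBDIRS:
        path = datatape_dir / name
        if not path.is_dir() or not os.access(path, os.W_OK):
            problems.append(path)
    return problems


# Demo: only logs/ exists, so the other two are reported.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "logs").mkdir()
    demo_problems = [path.name for path in unwritable_dirs(root)]
```

An empty result from `unwritable_dirs(Path("datatapes/concord"))` suggests the permission problem lies elsewhere, for example in a cache location.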
Log Files
Each datatape run creates detailed logs:
# View latest run log
cat datatapes/concord/logs/Concord_2025-06-18.log
# Monitor real-time execution
tail -f datatapes/concord/logs/Concord_2025-06-18.log
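Because log names embed the run date, the most recent log can be found without typing the date by hand. The sketch below assumes the `[DataTape]_YYYY-MM-DD.log` naming shown above; `latest_log` is a hypothetical helper that works on file names only.

```python
import re
from typing import Iterable, Optional

LOG_NAME = re.compile(r"^.+_(?P<stamp>\d{4}-\d{2}-\d{2})\.log$")


def latest_log(names: Iterable[str]) -> Optional[str]:
    """Return the log file name with the most recent date stamp, or None."""
    dated = []
    for name in names:
        match = LOG_NAME.match(name)
        if match:
            # ISO dates sort chronologically when compared as text.
            dated.append((match["stamp"], name))
    return max(dated)[1] if dated else None
```

A typical call would be `latest_log(p.name for p in Path("datatapes/concord/logs").iterdir())`, with `Path` imported from `pathlib`.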
Data Quality Issues
If data quality reports show issues:
- Missing Data: Check source systems for data availability
- Format Problems: Review SQL query logic and transformations
- Unexpected Values: Validate against business rules
- Performance Issues: Consider query optimization or caching
Best Practices
Regular Execution
- Run datatapes on a consistent schedule
- Use target dates that align with business cycles
- Monitor data quality reports for trends
Documentation Review
- Keep column documentation updated
- Document any custom configurations
- Review SQL queries for business logic changes
Output Management
- Archive completed outputs regularly
- Clean up old cache files periodically
- Monitor disk space usage
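Archiving can be scripted by reading the date stamp in each output file name. The sketch below is one possible approach, not an existing project tool: `archive_old_outputs` is a hypothetical helper, the 90-day retention default is an arbitrary assumption, and the demo uses empty files in a temporary directory.

```python
import re
import shutil
import tempfile
from datetime import date, timedelta
from pathlib import Path
from typing import List

STAMP = re.compile(r"_(\d{4}-\d{2}-\d{2})\.(csv|xlsx)$")


def archive_old_outputs(output_dir: Path, archive_dir: Path,
                        today: date, keep_days: int = 90) -> List[str]:
    """Move outputs whose file-name date is older than keep_days into archive_dir."""
    cutoff = today - timedelta(days=keep_days)
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(output_dir.iterdir()):
        match = STAMP.search(path.name)
        if match and date.fromisoformat(match.group(1)) < cutoff:
            shutil.move(str(path), str(archive_dir / path.name))
            moved.append(path.name)
    return moved


# Demo with placeholder files: only the January output is older than 90 days.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "completed_output"
    out.mkdir()
    for name in ("Concord_2025-01-01.csv", "Concord_2025-06-18.csv"):
        (out / name).touch()
    demo_moved = archive_old_outputs(out, Path(tmp) / "archive", today=date(2025, 6, 18))
```

For the disk space item, `shutil.disk_usage(path).free` reports the remaining bytes on the volume holding the outputs.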
Advanced Usage
Custom Portfolios
Filter datatapes for specific portfolios:
# Single portfolio
python run_concord.py --portfolio "Sunnova SAP IV LLC"
# Multiple portfolios
python run_concord.py --portfolio "Sunnova SAP IV LLC" --portfolio "Sunnova TEP 8-B LLC"
Date Ranges
Generate historical data:
# Specific date
python run_concord.py --target-date "2025-05-01"
# Use with automation tools for date ranges
for date in 2025-01-01 2025-02-01 2025-03-01; do
  python run_concord.py --target-date "$date"
done
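For longer ranges, the list of target dates can be generated instead of typed out. The sketch below produces the first day of each month between two dates, matching the values used in the loop above; `month_starts` is a hypothetical helper, and monthly spacing is an assumption about the desired cadence.

```python
from datetime import date
from typing import Iterator


def month_starts(start: date, end: date) -> Iterator[date]:
    """Yield the first day of each month from start's month through end's month."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1
```

Each yielded date can be formatted with `.isoformat()` and passed as the `--target-date` value.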
Performance Optimization
For large datatapes: