DataTape Configuration Reference¶
Each Treasury DataTape is configured through a YAML file that defines data sources, transformations, and output specifications. This guide explains the configuration structure and options available for each datatape.
Configuration Files¶
Each datatape has its own configuration file in its directory:
datatapes/
├── concord/concord_config.yaml # 130+ columns
├── dealer_transactions/dealer_transactions_config.yaml # 47+ columns
├── great_america/ga_config.yaml # 79+ columns
├── klim/klim_config.yaml # 28+ columns
├── sla/sla_config.yaml # 76+ columns
└── tep/tep_config.yaml # 111+ columns
Standard Configuration Structure¶
All datatape configurations follow this standard structure:
# DataTape Metadata
name: "DataTape Name"
description: "Purpose and scope of this datatape"
version: "1.0.0"
# File and Directory Paths
paths:
sql_dir: "./sql" # SQL query files location
output_dir: "./completed_output" # Generated files location
cache_dir: "./query_cache" # Cached query results
log_dir: "./logs" # Execution logs
# Database Configuration
database:
use_global_settings: true # Use shared database settings
# SQL Query Definitions
queries:
query_name:
file: "query_file.sql" # SQL file name
description: "Query purpose" # What this query retrieves
parameters: [] # Query parameters
# Output Configuration
output:
file_prefix: "DataTape_Name" # Output file naming
format: "csv" # Output format
include_profiling: true # Generate data quality reports
# Column Definitions (generated automatically)
columns: [] # Column metadata and validation
Real DataTape Examples¶
CONCORD DataTape Configuration¶
File: datatapes/concord/concord_config.yaml
Purpose: Backup servicer report for systems managed by Concord
Output: 156 columns
DEALER_TRANSACTIONS DataTape Configuration¶
File: datatapes/dealer_transactions/dealer_transactions_config.yaml
Purpose: Partner and vendor payment processing
Output: 17 columns
GREAT_AMERICA DataTape Configuration¶
File: datatapes/great_america/great_america_config.yaml
Purpose: System and payment data for Great America portfolios
Output: 57 columns
For complete configuration examples, see each datatape's individual documentation page.
Configuration Components¶
Query Definitions¶
Each SQL query is defined with metadata:
queries:
query_name:
file: "query_file.sql" # SQL file in sql/ directory
description: "Business purpose" # What data this retrieves
parameters: # Optional query parameters
- name: "target_date"
description: "Report date"
- name: "portfolio_filter"
description: "Portfolio selection"
Path Configuration¶
Standard paths used by all datatapes:
paths:
sql_dir: "./sql" # Where SQL files are stored
output_dir: "./completed_output" # Where CSV/Excel files are generated
cache_dir: "./query_cache" # Where query results are cached
log_dir: "./logs" # Where execution logs are written
profiling_dir: "./profiling_output" # Where data quality reports go
Output Settings¶
Control how datatapes are generated and formatted:
output:
file_prefix: "DataTape_Name" # Prefix for output files
format: "csv" # Primary format (csv, xlsx, json)
encoding: "utf-8" # File encoding
include_headers: true # Include column headers
include_profiling: true # Generate data quality report
date_format: "%Y-%m-%d" # Date formatting in output
Database Configuration¶
All datatapes use shared database settings:
database:
use_global_settings: true # Use centralized DB config
# Global settings include:
# - Connection parameters
# - Credential management
# - Connection pooling
# - Query timeout settings
Column Documentation¶
Each datatape automatically generates comprehensive column documentation that includes:
Column Metadata Structure¶
columns:
- name: "System Name" # Output column name
source: "CONCORD.CORE_SYSTEM" # Source table/query
type: "VARCHAR(50)" # Data type and length
description: "Unique identifier for the solar system"
calculated: false # Direct from DB vs. derived
- name: "Monthly Payment Amount"
source: "CONCORD.PAYMENTS"
type: "DECIMAL(10,2)"
description: "Average monthly payment calculated from payment history"
calculated: true # This is a derived field
Column Categories¶
Direct Database Fields: Pulled directly from source tables
- System identifiers
- Customer demographics
- Financial amounts
- Dates and timestamps
Calculated Fields: Derived through SQL logic or transformations - Averages and aggregations - Date calculations - Status derivations - Concatenated values
Reference Data: From CSV files or lookup tables - Portfolio mappings - Category codes - Business rules
SQL Query Structure¶
Each datatape uses multiple SQL files to organize data retrieval:
Query Organization¶
sql/
├── core_data.sql # Primary business entities
├── customer_info.sql # Customer demographics
├── financial_data.sql # Payments, invoices, loans
├── equipment_data.sql # Equipment and installations
└── reference_data.sql # Lookup tables and mappings
Parameter Usage¶
SQL queries support parameterization for dynamic execution:
-- Example: Parameter usage in SQL
SELECT
system_id,
customer_name,
payment_amount
FROM payments
WHERE payment_date >= :target_date
AND portfolio_name IN (:portfolio_list)
Parameters are automatically provided by the runner scripts based on command-line arguments.
Best Practices¶
Configuration Management¶
- Keep configurations in version control
- Document any custom changes
- Use consistent naming conventions
- Test configuration changes in development
SQL Organization¶
- Group related queries in logical files
- Use clear, descriptive file names
- Document complex business logic in SQL comments
- Optimize queries for performance
Column Documentation¶
- Provide clear business descriptions
- Indicate calculated vs. direct fields
- Document any transformations applied
- Keep documentation current with schema changes
Output Management¶
- Use consistent file naming patterns
- Include dates in output filenames
- Generate data quality reports
- Archive historical outputs
Troubleshooting Configuration Issues¶
Common Problems¶
| Issue | Cause | Solution |
|---|---|---|
| "SQL file not found" | Incorrect path in config | Check sql_dir path and file names |
| "Column missing from output" | Query doesn't return expected column | Review SQL query and column aliases |
| "Parameter not found" | SQL parameter not defined | Add parameter to query configuration |
| "Database connection failed" | Incorrect database settings | Verify use_global_settings: true |
Validation Steps¶
- Check YAML Syntax: Use a YAML validator to check configuration syntax
- Verify File Paths: Ensure all SQL files exist in the specified locations
- Test SQL Queries: Run SQL queries independently to verify they work
- Check Column Names: Ensure SQL output columns match expected names
- Validate Parameters: Confirm all SQL parameters are properly defined
For detailed examples of each datatape's configuration, see the individual datatape documentation pages.