Skip to content

DataTape Configuration Reference

Each Treasury DataTape is configured through a YAML file that defines data sources, transformations, and output specifications. This guide explains the configuration structure and options available for each datatape.

Configuration Files

Each datatape has its own configuration file in its directory:

datatapes/
├── concord/concord_config.yaml           # 130+ columns
├── dealer_transactions/dealer_transactions_config.yaml  # 47+ columns  
├── great_america/ga_config.yaml          # 79+ columns
├── klim/klim_config.yaml                 # 28+ columns
├── sla/sla_config.yaml                   # 76+ columns
└── tep/tep_config.yaml                   # 111+ columns

Standard Configuration Structure

All datatape configurations follow this standard structure:

# DataTape Metadata
name: "DataTape Name"
description: "Purpose and scope of this datatape"
version: "1.0.0"

# File and Directory Paths
paths:
  sql_dir: "./sql"                    # SQL query files location
  output_dir: "./completed_output"    # Generated files location  
  cache_dir: "./query_cache"         # Cached query results
  log_dir: "./logs"                  # Execution logs

# Database Configuration  
database:
  use_global_settings: true          # Use shared database settings

# SQL Query Definitions
queries:
  query_name:
    file: "query_file.sql"           # SQL file name
    description: "Query purpose"     # What this query retrieves
    parameters: []                   # Query parameters

# Output Configuration
output:
  file_prefix: "DataTape_Name"       # Output file naming
  format: "csv"                      # Output format
  include_profiling: true            # Generate data quality reports

# Column Definitions (generated automatically)
columns: []                          # Column metadata and validation

Real DataTape Examples

CONCORD DataTape Configuration

File: datatapes/concord/concord_config.yaml
Purpose: Backup servicer report for systems managed by Concord
Output: 156 columns

DEALER_TRANSACTIONS DataTape Configuration

File: datatapes/dealer_transactions/dealer_transactions_config.yaml
Purpose: Partner and vendor payment processing
Output: 17 columns

GREAT_AMERICA DataTape Configuration

File: datatapes/great_america/great_america_config.yaml
Purpose: System and payment data for Great America portfolios
Output: 57 columns

For complete configuration examples, see each datatape's individual documentation page.

Configuration Components

Query Definitions

Each SQL query is defined with metadata:

queries:
  query_name:
    file: "query_file.sql"           # SQL file in sql/ directory
    description: "Business purpose"  # What data this retrieves
    parameters:                      # Optional query parameters
      - name: "target_date"
        description: "Report date"
      - name: "portfolio_filter"
        description: "Portfolio selection"

Path Configuration

Standard paths used by all datatapes:

paths:
  sql_dir: "./sql"                   # Where SQL files are stored
  output_dir: "./completed_output"   # Where CSV/Excel files are generated  
  cache_dir: "./query_cache"        # Where query results are cached
  log_dir: "./logs"                 # Where execution logs are written
  profiling_dir: "./profiling_output" # Where data quality reports go

Output Settings

Control how datatapes are generated and formatted:

output:
  file_prefix: "DataTape_Name"       # Prefix for output files
  format: "csv"                      # Primary format (csv, xlsx, json)
  encoding: "utf-8"                  # File encoding
  include_headers: true              # Include column headers
  include_profiling: true            # Generate data quality report
  date_format: "%Y-%m-%d"           # Date formatting in output

Database Configuration

All datatapes use shared database settings:

database:
  use_global_settings: true          # Use centralized DB config
  # Global settings include:
  # - Connection parameters
  # - Credential management  
  # - Connection pooling
  # - Query timeout settings

Column Documentation

Each datatape automatically generates comprehensive column documentation that includes:

Column Metadata Structure

columns:
  - name: "System Name"              # Output column name
    source: "CONCORD.CORE_SYSTEM"   # Source table/query
    type: "VARCHAR(50)"             # Data type and length
    description: "Unique identifier for the solar system"
    calculated: false               # Direct from DB vs. derived

  - name: "Monthly Payment Amount"   
    source: "CONCORD.PAYMENTS"      
    type: "DECIMAL(10,2)"           
    description: "Average monthly payment calculated from payment history"
    calculated: true                # This is a derived field

Column Categories

Direct Database Fields: Pulled directly from source tables - System identifiers - Customer demographics
- Financial amounts - Dates and timestamps

Calculated Fields: Derived through SQL logic or transformations - Averages and aggregations - Date calculations - Status derivations - Concatenated values

Reference Data: From CSV files or lookup tables - Portfolio mappings - Category codes - Business rules

SQL Query Structure

Each datatape uses multiple SQL files to organize data retrieval:

Query Organization

sql/
├── core_data.sql          # Primary business entities
├── customer_info.sql      # Customer demographics  
├── financial_data.sql     # Payments, invoices, loans
├── equipment_data.sql     # Equipment and installations
└── reference_data.sql     # Lookup tables and mappings

Parameter Usage

SQL queries support parameterization for dynamic execution:

-- Example: Parameter usage in SQL
SELECT 
    system_id,
    customer_name,
    payment_amount
FROM payments 
WHERE payment_date >= :target_date
  AND portfolio_name IN (:portfolio_list)

Parameters are automatically provided by the runner scripts based on command-line arguments.

Best Practices

Configuration Management

  • Keep configurations in version control
  • Document any custom changes
  • Use consistent naming conventions
  • Test configuration changes in development

SQL Organization

  • Group related queries in logical files
  • Use clear, descriptive file names
  • Document complex business logic in SQL comments
  • Optimize queries for performance

Column Documentation

  • Provide clear business descriptions
  • Indicate calculated vs. direct fields
  • Document any transformations applied
  • Keep documentation current with schema changes

Output Management

  • Use consistent file naming patterns
  • Include dates in output filenames
  • Generate data quality reports
  • Archive historical outputs

Troubleshooting Configuration Issues

Common Problems

Issue Cause Solution
"SQL file not found" Incorrect path in config Check sql_dir path and file names
"Column missing from output" Query doesn't return expected column Review SQL query and column aliases
"Parameter not found" SQL parameter not defined Add parameter to query configuration
"Database connection failed" Incorrect database settings Verify use_global_settings: true

Validation Steps

  1. Check YAML Syntax: Use a YAML validator to check configuration syntax
  2. Verify File Paths: Ensure all SQL files exist in the specified locations
  3. Test SQL Queries: Run SQL queries independently to verify they work
  4. Check Column Names: Ensure SQL output columns match expected names
  5. Validate Parameters: Confirm all SQL parameters are properly defined

For detailed examples of each datatape's configuration, see the individual datatape documentation pages.