Skip to content

THEOREM Datatape

Interest analysis and loan data reports for Theorem portfolio using. Includes interest calculations, loan tape data, payment tracking, and pipeline analysis.

Overview

Generation Frequency: Monthly Development Status: Stable

Key Features: - Interest calculation engine - Loan tape data processing - Payment tracking integration

Key Stakeholders: - Treasury Reporting Team - Theorem

Output Columns

The following columns are included in the final datatape output (filtered from SQL queries based on configuration):

Column Analysis Summary

  • Total columns found in SQL queries: 24
  • Columns specified in output_columns config: 7
  • Columns found in SQL and included in output: 4
  • Missing columns (in config but not found in SQL): 3

Note: Only columns that exist in SQL queries are included in the final output.

Column Name Required Source Query Source Table Source Field Derived Transformations
accrued_interest Yes interest_calc lpt accrued_interest Yes
calculation_date Yes interest_calc calculation_date Yes
last_payment_date Yes interest_calc lpt last_payment_date No
principal_balance Yes interest_calc lpt principal_balance Yes

Note: Columns marked as 'Derived' are calculated or transformed rather than directly selected from the source table.

Data Sources

The configuration pulls data from the following sources:

Source Name Type Query/File Description
thorem_pipeline database thorem_pipeline.sql SQL query for thorem_pipeline data
loan_tape database loan_tape.sql SQL query for loan_tape data
theorem_payments database theorem_payments.sql SQL query for theorem_payments data
interest_calc database interest_calc.sql SQL query for interest_calc data

Configuration

Portfolio Groups

theorem_portfolios: - Theorem Portfolio A - Theorem Portfolio B

Database Configuration

  • dbcreds_environment: fusionods

Paths

  • sql_dir: ./sql
  • cache_dir: ./query_cache
  • output_dir: ./completed_output
  • log_dir: ./logs

SQL Queries

SELECT t.*
FROM ods.vw_theorem_pipeline_data t
SELECT t.*
FROM ods.vw_loan_tape t
SELECT t.*
FROM ods.vw_theorem_payments_data t
-- Optimized query for interest calculations
-- Parameters: {{cut_off_date}} should be replaced with the desired cutoff date

SELECT 
    lpt."Loan Account",
    ROUND(lpt."Balance"::numeric, 2) as principal_balance,
    ROUND((lpt."Interest Rate" / 100.0)::numeric, 4) as annual_rate,
    lpt."Transaction Date" as last_payment_date,
    '{{cut_off_date}}'::date as calculation_date,
    ('{{cut_off_date}}'::date - lpt."Transaction Date"::date) as days_since_payment,
    ROUND((lpt."Interest Rate" / 100.0 / 360)::numeric, 6) as daily_rate,
    ROUND((lpt."Balance" * (lpt."Interest Rate" / 100.0 / 360))::numeric, 2) as daily_interest,
    ROUND((lpt."Balance" * (lpt."Interest Rate" / 100.0 / 360) * 
        ('{{cut_off_date}}'::date - lpt."Transaction Date"::date))::numeric, 2) as accrued_interest,
    cl."Status" as loan_status,
    cl."Loan Type",
    s."Sunnova System ID",
    s."Placed InService Date"
FROM (
    SELECT DISTINCT ON ("Loan Account")
        "Loan Account",
        "Created Date",
        "Transaction Amount",
        "Transaction Date",
        "Transaction Time",
        "Balance",
        "Interest Rate",
        "Interest"
    FROM sfdc_staging.vwx_loan__loan_payment_transaction__c
    WHERE "Transaction Date" <= '{{cut_off_date}}'
    AND NOT "Reversed"
    ORDER BY "Loan Account", "Transaction Date" DESC, "Transaction Time" DESC
) lpt
JOIN sfdc_staging.vwx_loan__loan_account__c cl 
    ON lpt."Loan Account" = cl."Record ID"
JOIN sfdc_staging.vwx_system__c s 
    ON cl."System" = s."Record ID"
WHERE s."Financing Company" = 'Theorem'
AND s."Asset Portfolio - Customer" = 'Theorem'
AND cl."Loan Type" != 'Scheduled Prepayment';

Data Processing

Data processing is handled by: datatapes/theorem/run_theorem.py

Processing Steps

The data processing includes: 1. SQL query execution 2. Data transformation and cleaning 3. Column mapping and validation 4. Output file generation

Python Modules

This datatape includes additional Python modules with business logic:

theorem_interest_data

Theorem Interest Analysis Script This script analyzes interest calculations for Theorem loans based on user-provided month.

analyze_interest_changes(current_df: pd.DataFrame, previous_df: pd.DataFrame) -> pd.DataFrame

Analyze month-over-month interest changes.

Parameters:

Name Type Description Default
current_df DataFrame

Current month's interest data

required
previous_df DataFrame

Previous month's interest data

required

Returns:

Type Description
DataFrame

DataFrame containing comparison analysis

Source code in datatapes/theorem/theorem_interest_data.py
def analyze_interest_changes(current_df: pd.DataFrame, 
                           previous_df: pd.DataFrame) -> pd.DataFrame:
    """
    Analyze month-over-month interest changes.

    Args:
        current_df: Current month's interest data
        previous_df: Previous month's interest data

    Returns:
        DataFrame containing comparison analysis
    """
    try:
        comparison = pd.merge(
            previous_df,
            current_df,
            on="Loan Account",
            suffixes=('_prev', '_curr')
        )

        comparison['interest_change'] = (
            comparison['accrued_interest_curr'] - 
            comparison['accrued_interest_prev']
        )

        return comparison
    except Exception as e:
        log.error(f"Failed to analyze interest changes: {str(e)}")
        return pd.DataFrame()

create_db_connection(db_config: dict) -> object

Create database connection using SQLAlchemy.

Parameters:

Name Type Description Default
db_config dict

Dictionary containing database configuration

required

Returns:

Type Description
object

SQLAlchemy engine object

Source code in datatapes/theorem/theorem_interest_data.py
def create_db_connection(db_config: dict) -> object:
    """
    Create database connection using SQLAlchemy.

    Args:
        db_config: Dictionary containing database configuration

    Returns:
        SQLAlchemy engine object
    """
    try:
        connection_string = (
            f"postgresql+pg8000://{db_config['username']}:{db_config['password']}"
            f"@{db_config['host']}:{db_config['port']}/{db_config['database']}"
        )
        return create_engine(connection_string, client_encoding='utf8')
    except Exception as e:
        log.error(f"Failed to create database connection: {str(e)}")
        sys.exit(1)

display_summary_statistics(comparison_df: pd.DataFrame) -> None

Display summary statistics using Rich tables.

Parameters:

Name Type Description Default
comparison_df DataFrame

DataFrame containing comparison data

required
Source code in datatapes/theorem/theorem_interest_data.py
def display_summary_statistics(comparison_df: pd.DataFrame) -> None:
    """
    Display summary statistics using Rich tables.

    Args:
        comparison_df: DataFrame containing comparison data
    """
    try:
        # Create summary statistics table
        stats_table = Table(title="Summary Statistics")
        stats_table.add_column("Statistic")
        stats_table.add_column("Previous Month")
        stats_table.add_column("Current Month")
        stats_table.add_column("Change")

        stats = comparison_df[[
            'accrued_interest_prev',
            'accrued_interest_curr',
            'interest_change'
        ]].describe()

        for idx in stats.index:
            stats_table.add_row(
                idx,
                f"{stats['accrued_interest_prev'][idx]:.2f}",
                f"{stats['accrued_interest_curr'][idx]:.2f}",
                f"{stats['interest_change'][idx]:.2f}"
            )

        console.print(stats_table)

        # Display significant changes
        significant_changes = comparison_df[
            abs(comparison_df['interest_change']) > 100
        ].sort_values('interest_change', ascending=False)

        if not significant_changes.empty:
            changes_table = Table(title="Significant Interest Changes")
            changes_table.add_column("Loan Account")
            changes_table.add_column("Previous Interest")
            changes_table.add_column("Current Interest")
            changes_table.add_column("Change")

            for _, row in significant_changes.iterrows():
                changes_table.add_row(
                    str(row['Loan Account']),
                    f"{row['accrued_interest_prev']:.2f}",
                    f"{row['accrued_interest_curr']:.2f}",
                    f"{row['interest_change']:.2f}"
                )

            console.print(changes_table)

    except Exception as e:
        log.error(f"Failed to display summary statistics: {str(e)}")

execute_and_save_query(conn: Connection, query: str, cutoff_date: str, query_name: str, output_path: Path) -> None

Execute a query and save its results.

Parameters:

Name Type Description Default
conn Connection

Database connection

required
query str

SQL query string

required
cutoff_date str

Cutoff date string

required
query_name str

Name of the query for the output file

required
output_path Path

Path where the file should be saved

required
Source code in datatapes/theorem/theorem_interest_data.py
def execute_and_save_query(conn: Connection, query: str, cutoff_date: str, 
                          query_name: str, output_path: Path) -> None:
    """
    Execute a query and save its results.

    Args:
        conn: Database connection
        query: SQL query string
        cutoff_date: Cutoff date string
        query_name: Name of the query for the output file
        output_path: Path where the file should be saved
    """
    try:
        formatted_query = query.replace('{{cut_off_date}}', cutoff_date)
        df = pd.read_sql(formatted_query, conn)

        if not df.empty:
            timestamp = datetime.now().strftime('%Y%m%d')
            current_month = datetime.strptime(cutoff_date, '%Y-%m-%d').strftime('%B_%Y').lower()
            filename = output_path / f"{current_month}_{timestamp}_{query_name}.csv"
            df.to_csv(filename, index=False)
            log.info(f"Saved {len(df)} records to {filename}")
        else:
            log.warning(f"No data returned for query: {query_name}")

    except Exception as e:
        log.error(f"Failed to execute and save query {query_name}: {str(e)}")

get_cutoff_date(default_date: date = None, prompt_text: str = 'Enter cutoff date') -> str

Get cutoff date from user input with validation.

Parameters:

Name Type Description Default
default_date date

Default date to display

None
prompt_text str

Text to display in the prompt

'Enter cutoff date'

Returns:

Type Description
str

Validated cutoff date string

Source code in datatapes/theorem/theorem_interest_data.py
def get_cutoff_date(default_date: date = None, prompt_text: str = "Enter cutoff date") -> str:
    """
    Get cutoff date from user input with validation.

    Args:
        default_date: Default date to display
        prompt_text: Text to display in the prompt

    Returns:
        Validated cutoff date string
    """
    default = str(default_date or date.today())
    while True:
        date_str = Prompt.ask(
            prompt_text,
            default=default,
            show_default=True
        )
        if validate_date(date_str):
            return date_str
        log.error("Invalid date format. Please use YYYY-MM-DD")

load_additional_queries(sql_folder: Path) -> Dict[str, str]

Load additional SQL queries from the sql folder.

Parameters:

Name Type Description Default
sql_folder Path

Path to the folder containing SQL files

required

Returns:

Type Description
Dict[str, str]

Dictionary of query names and their contents

Source code in datatapes/theorem/theorem_interest_data.py
def load_additional_queries(sql_folder: Path) -> Dict[str, str]:
    """
    Load additional SQL queries from the sql folder.

    Args:
        sql_folder: Path to the folder containing SQL files

    Returns:
        Dictionary of query names and their contents
    """
    queries = {}
    query_files = ['loan_tape.sql', 'theorem_payments.sql', 'thorem_pipeline.sql']

    for filename in query_files:
        try:
            file_path = sql_folder / filename
            if file_path.exists():
                with open(file_path, 'r') as f:
                    queries[file_path.stem] = f.read()
                log.info(f"Loaded query: {filename}")
            else:
                log.warning(f"Query file not found: {filename}")
        except Exception as e:
            log.error(f"Failed to load query {filename}: {str(e)}")

    return queries

load_config(config_path: str) -> dict

Load database configuration from INI file.

Parameters:

Name Type Description Default
config_path str

Path to the configuration file

required

Returns:

Type Description
dict

Dictionary containing database configuration

Source code in datatapes/theorem/theorem_interest_data.py
def load_config(config_path: str) -> dict:
    """
    Load database configuration from INI file.

    Args:
        config_path: Path to the configuration file

    Returns:
        Dictionary containing database configuration
    """
    try:
        config = configparser.ConfigParser()
        if not config_path.exists():
            log.error(f"Configuration file not found at {config_path}")
            return {}

        config.read(config_path)

        # Try different possible section names
        section_names = ['ODS-Ploomber', 'ODS',]

        for section in section_names:
            if section in config:
                return dict(config[section])

        # If no known section is found, try to use the first available section
        if config.sections():
            first_section = config.sections()[0]
            log.warning(f"Using configuration from section: {first_section}")
            return dict(config[first_section])

        log.error("No valid database configuration section found")
        return {}
    except Exception as e:
        log.error(f"Failed to load configuration: {str(e)}")
        return {}

load_sql_query(sql_path: str) -> str

Load SQL query from file.

Parameters:

Name Type Description Default
sql_path str

Path to the SQL query file

required

Returns:

Type Description
str

SQL query string

Source code in datatapes/theorem/theorem_interest_data.py
def load_sql_query(sql_path: str) -> str:
    """
    Load SQL query from file.

    Args:
        sql_path: Path to the SQL query file

    Returns:
        SQL query string
    """
    try:
        with open(sql_path, 'r') as f:
            return f.read()
    except Exception as e:
        log.error(f"Failed to load SQL query: {str(e)}")
        sys.exit(1)

main(config_path: Optional[str] = None, sql_folder: Optional[str] = None, output_path: Optional[str] = None)

Main execution function.

Parameters:

Name Type Description Default
config_path Optional[str]

Path to the configuration file (optional)

None
sql_folder Optional[str]

Path to the folder containing SQL files (optional)

None
output_path Optional[str]

Path where output files should be saved (optional)

None
Source code in datatapes/theorem/theorem_interest_data.py
def main(config_path: Optional[str] = None, 
         sql_folder: Optional[str] = None,
         output_path: Optional[str] = None):
    """
    Main execution function.

    Args:
        config_path: Path to the configuration file (optional)
        sql_folder: Path to the folder containing SQL files (optional)
        output_path: Path where output files should be saved (optional)
    """
    try:
        # Set default paths if not provided
        script_path = Path(__file__).resolve()
        if not config_path:
            config_path = Path('Z:/Private/thandolwethu.dlamini/configurations/connections.ini')
        if not sql_folder:
            sql_folder = script_path.parent / 'sql'
        if not output_path:
            output_path = Path(r"C:\Users\thandolwethu.dlamini\OneDrive - Sunnova Energy International\Documents - FinOps\Treasury\Reporting\Code\TreasuryReporting\CompletedOutput\Theorem")

        # Ensure output directory exists
        output_path.mkdir(parents=True, exist_ok=True)

        # Load configuration
        db_config = load_config(config_path)
        if not db_config:
            log.error("Failed to load valid database configuration")
            sys.exit(1)

        # Create database connection
        engine = create_db_connection(db_config)

        # Load SQL queries
        interest_query = load_sql_query(sql_folder / 'interest_calc.sql')
        additional_queries = load_additional_queries(sql_folder)

        # Get cutoff dates
        # Get cutoff dates
        end_of_prev_month, end_of_prev_prev_month = get_default_dates()

        log.info("Getting cutoff dates...")
        current_cutoff = get_cutoff_date(
            default_date=end_of_prev_month,
            prompt_text="Enter cutoff date for current month"
        )
        previous_cutoff = get_cutoff_date(
            default_date=end_of_prev_prev_month, 
            prompt_text="Enter cutoff date for previous month"
        )

        # Execute queries and save results
        with engine.connect() as conn:
            # Process interest calculation
            current_df = pd.read_sql(
                interest_query.replace('{{cut_off_date}}', current_cutoff),
                conn
            )
            previous_df = pd.read_sql(
                interest_query.replace('{{cut_off_date}}', previous_cutoff),
                conn
            )

            # Analyze interest data
            comparison = analyze_interest_changes(current_df, previous_df)
            display_summary_statistics(comparison)

            # Save interest data
            current_month = datetime.strptime(current_cutoff, '%Y-%m-%d').strftime('%B_%Y')
            save_interest_data(current_df, current_month.lower(), output_path)

            # Process additional queries
            for query_name, query in additional_queries.items():
                execute_and_save_query(conn, query, current_cutoff, 
                                    query_name, output_path)

    except Exception as e:
        log.error(f"An error occurred: {str(e)}")
        sys.exit(1)

save_interest_data(df: pd.DataFrame, month: str, output_path: Path) -> None

Save interest data to CSV file.

Parameters:

Name Type Description Default
df DataFrame

DataFrame containing interest data

required
month str

Month identifier for filename

required
output_path Path

Path where the file should be saved

required
Source code in datatapes/theorem/theorem_interest_data.py
def save_interest_data(df: pd.DataFrame, month: str, output_path: Path) -> None:
    """
    Save interest data to CSV file.

    Args:
        df: DataFrame containing interest data
        month: Month identifier for filename
        output_path: Path where the file should be saved
    """
    try:
        if df.empty:
            log.warning(f"No data found for {month}")
            return

        timestamp = datetime.now().strftime('%Y%m%d')
        filename = output_path / f"{month}_{timestamp}_theorem_interest.csv"
        df.to_csv(filename, index=False)
        log.info(f"Saved {len(df)} records to {filename}")

    except Exception as e:
        log.error(f"Failed to save interest data: {str(e)}")

validate_date(date_str: str) -> Optional[date]

Validate date string format and return date object.

Parameters:

Name Type Description Default
date_str str

Date string in YYYY-MM-DD format

required

Returns:

Type Description
Optional[date]

datetime.date object if valid, None otherwise

Source code in datatapes/theorem/theorem_interest_data.py
def validate_date(date_str: str) -> Optional[date]:
    """
    Validate date string format and return date object.

    Args:
        date_str: Date string in YYYY-MM-DD format

    Returns:
        datetime.date object if valid, None otherwise
    """
    try:
        return datetime.strptime(date_str, '%Y-%m-%d').date()
    except ValueError:
        return None