THEOREM Datatape¶

Interest analysis and loan data reports for Theorem portfolio using. Includes interest calculations, loan tape data, payment tracking, and pipeline analysis.

Overview¶

Generation Frequency: Monthly Development Status: Stable

Key Features: - Interest calculation engine - Loan tape data processing - Payment tracking integration

Key Stakeholders: - Treasury Reporting Team - Theorem

Output Columns¶

The following columns are included in the final datatape output (filtered from SQL queries based on configuration):

Column Analysis Summary¶

Total columns found in SQL queries: 24
Columns specified in output_columns config: 7
Columns found in SQL and included in output: 4
Missing columns (in config but not found in SQL): 3

Note: Only columns that exist in SQL queries are included in the final output.

Column Name	Required	Source Query	Source Table	Source Field	Derived
accrued_interest	Yes	interest_calc	lpt	accrued_interest	Yes
calculation_date	Yes	interest_calc		calculation_date	Yes
last_payment_date	Yes	interest_calc	lpt	last_payment_date	No
principal_balance	Yes	interest_calc	lpt	principal_balance	Yes

Note: Columns marked as 'Derived' are calculated or transformed rather than directly selected from the source table.

Data Sources¶

The configuration pulls data from the following sources:

Source Name	Type	Query/File	Description
thorem_pipeline	database	thorem_pipeline.sql	SQL query for thorem_pipeline data
loan_tape	database	loan_tape.sql	SQL query for loan_tape data
theorem_payments	database	theorem_payments.sql	SQL query for theorem_payments data
interest_calc	database	interest_calc.sql	SQL query for interest_calc data

Configuration¶

Portfolio Groups¶

theorem_portfolios: - Theorem Portfolio A - Theorem Portfolio B

Database Configuration¶

dbcreds_environment: fusionods

Paths¶

sql_dir: ./sql
cache_dir: ./query_cache
output_dir: ./completed_output
log_dir: ./logs

SQL Queries¶

thorem_pipelineloan_tapetheorem_paymentsinterest_calc

SELECT t.*
FROM ods.vw_theorem_pipeline_data t

SELECT t.*
FROM ods.vw_loan_tape t

SELECT t.*
FROM ods.vw_theorem_payments_data t

-- Optimized query for interest calculations
-- Parameters: {{cut_off_date}} should be replaced with the desired cutoff date

SELECT 
    lpt."Loan Account",
    ROUND(lpt."Balance"::numeric, 2) as principal_balance,
    ROUND((lpt."Interest Rate" / 100.0)::numeric, 4) as annual_rate,
    lpt."Transaction Date" as last_payment_date,
    '{{cut_off_date}}'::date as calculation_date,
    ('{{cut_off_date}}'::date - lpt."Transaction Date"::date) as days_since_payment,
    ROUND((lpt."Interest Rate" / 100.0 / 360)::numeric, 6) as daily_rate,
    ROUND((lpt."Balance" * (lpt."Interest Rate" / 100.0 / 360))::numeric, 2) as daily_interest,
    ROUND((lpt."Balance" * (lpt."Interest Rate" / 100.0 / 360) * 
        ('{{cut_off_date}}'::date - lpt."Transaction Date"::date))::numeric, 2) as accrued_interest,
    cl."Status" as loan_status,
    cl."Loan Type",
    s."Sunnova System ID",
    s."Placed InService Date"
FROM (
    SELECT DISTINCT ON ("Loan Account")
        "Loan Account",
        "Created Date",
        "Transaction Amount",
        "Transaction Date",
        "Transaction Time",
        "Balance",
        "Interest Rate",
        "Interest"
    FROM sfdc_staging.vwx_loan__loan_payment_transaction__c
    WHERE "Transaction Date" <= '{{cut_off_date}}'
    AND NOT "Reversed"
    ORDER BY "Loan Account", "Transaction Date" DESC, "Transaction Time" DESC
) lpt
JOIN sfdc_staging.vwx_loan__loan_account__c cl 
    ON lpt."Loan Account" = cl."Record ID"
JOIN sfdc_staging.vwx_system__c s 
    ON cl."System" = s."Record ID"
WHERE s."Financing Company" = 'Theorem'
AND s."Asset Portfolio - Customer" = 'Theorem'
AND cl."Loan Type" != 'Scheduled Prepayment';

Data Processing¶

Data processing is handled by: datatapes/theorem/run_theorem.py

Processing Steps¶

The data processing includes: 1. SQL query execution 2. Data transformation and cleaning 3. Column mapping and validation 4. Output file generation

Python Modules¶

This datatape includes additional Python modules with business logic:

theorem_interest_data¶

Theorem Interest Analysis Script This script analyzes interest calculations for Theorem loans based on user-provided month.

`analyze_interest_changes(current_df: pd.DataFrame, previous_df: pd.DataFrame) -> pd.DataFrame` ¶

Analyze month-over-month interest changes.

Parameters:

Name	Type	Description	Default
`current_df`	`DataFrame`	Current month's interest data	required
`previous_df`	`DataFrame`	Previous month's interest data	required

Returns:

Type	Description
`DataFrame`	DataFrame containing comparison analysis

Source code in datatapes/theorem/theorem_interest_data.py

def analyze_interest_changes(current_df: pd.DataFrame, 
                           previous_df: pd.DataFrame) -> pd.DataFrame:
    """
    Analyze month-over-month interest changes.

    Args:
        current_df: Current month's interest data
        previous_df: Previous month's interest data

    Returns:
        DataFrame containing comparison analysis
    """
    try:
        comparison = pd.merge(
            previous_df,
            current_df,
            on="Loan Account",
            suffixes=('_prev', '_curr')
        )

        comparison['interest_change'] = (
            comparison['accrued_interest_curr'] - 
            comparison['accrued_interest_prev']
        )

        return comparison
    except Exception as e:
        log.error(f"Failed to analyze interest changes: {str(e)}")
        return pd.DataFrame()

`create_db_connection(db_config: dict) -> object` ¶

Create database connection using SQLAlchemy.

Parameters:

Name	Type	Description	Default
`db_config`	`dict`	Dictionary containing database configuration	required

Returns:

Type	Description
`object`	SQLAlchemy engine object

Source code in datatapes/theorem/theorem_interest_data.py

def create_db_connection(db_config: dict) -> object:
    """
    Create database connection using SQLAlchemy.

    Args:
        db_config: Dictionary containing database configuration

    Returns:
        SQLAlchemy engine object
    """
    try:
        connection_string = (
            f"postgresql+pg8000://{db_config['username']}:{db_config['password']}"
            f"@{db_config['host']}:{db_config['port']}/{db_config['database']}"
        )
        return create_engine(connection_string, client_encoding='utf8')
    except Exception as e:
        log.error(f"Failed to create database connection: {str(e)}")
        sys.exit(1)

`display_summary_statistics(comparison_df: pd.DataFrame) -> None` ¶

Display summary statistics using Rich tables.

Parameters:

Name	Type	Description	Default
`comparison_df`	`DataFrame`	DataFrame containing comparison data	required

Source code in datatapes/theorem/theorem_interest_data.py

def display_summary_statistics(comparison_df: pd.DataFrame) -> None:
    """
    Display summary statistics using Rich tables.

    Args:
        comparison_df: DataFrame containing comparison data
    """
    try:
        # Create summary statistics table
        stats_table = Table(title="Summary Statistics")
        stats_table.add_column("Statistic")
        stats_table.add_column("Previous Month")
        stats_table.add_column("Current Month")
        stats_table.add_column("Change")

        stats = comparison_df[[
            'accrued_interest_prev',
            'accrued_interest_curr',
            'interest_change'
        ]].describe()

        for idx in stats.index:
            stats_table.add_row(
                idx,
                f"{stats['accrued_interest_prev'][idx]:.2f}",
                f"{stats['accrued_interest_curr'][idx]:.2f}",
                f"{stats['interest_change'][idx]:.2f}"
            )

        console.print(stats_table)

        # Display significant changes
        significant_changes = comparison_df[
            abs(comparison_df['interest_change']) > 100
        ].sort_values('interest_change', ascending=False)

        if not significant_changes.empty:
            changes_table = Table(title="Significant Interest Changes")
            changes_table.add_column("Loan Account")
            changes_table.add_column("Previous Interest")
            changes_table.add_column("Current Interest")
            changes_table.add_column("Change")

            for _, row in significant_changes.iterrows():
                changes_table.add_row(
                    str(row['Loan Account']),
                    f"{row['accrued_interest_prev']:.2f}",
                    f"{row['accrued_interest_curr']:.2f}",
                    f"{row['interest_change']:.2f}"
                )

            console.print(changes_table)

    except Exception as e:
        log.error(f"Failed to display summary statistics: {str(e)}")

`execute_and_save_query(conn: Connection, query: str, cutoff_date: str, query_name: str, output_path: Path) -> None` ¶

Execute a query and save its results.

Parameters:

Name	Type	Description	Default
`conn`	`Connection`	Database connection	required
`query`	`str`	SQL query string	required
`cutoff_date`	`str`	Cutoff date string	required
`query_name`	`str`	Name of the query for the output file	required
`output_path`	`Path`	Path where the file should be saved	required

Source code in datatapes/theorem/theorem_interest_data.py

def execute_and_save_query(conn: Connection, query: str, cutoff_date: str, 
                          query_name: str, output_path: Path) -> None:
    """
    Execute a query and save its results.

    Args:
        conn: Database connection
        query: SQL query string
        cutoff_date: Cutoff date string
        query_name: Name of the query for the output file
        output_path: Path where the file should be saved
    """
    try:
        formatted_query = query.replace('{{cut_off_date}}', cutoff_date)
        df = pd.read_sql(formatted_query, conn)

        if not df.empty:
            timestamp = datetime.now().strftime('%Y%m%d')
            current_month = datetime.strptime(cutoff_date, '%Y-%m-%d').strftime('%B_%Y').lower()
            filename = output_path / f"{current_month}_{timestamp}_{query_name}.csv"
            df.to_csv(filename, index=False)
            log.info(f"Saved {len(df)} records to {filename}")
        else:
            log.warning(f"No data returned for query: {query_name}")

    except Exception as e:
        log.error(f"Failed to execute and save query {query_name}: {str(e)}")

`get_cutoff_date(default_date: date = None, prompt_text: str = 'Enter cutoff date') -> str` ¶

Get cutoff date from user input with validation.

Parameters:

Name	Type	Description	Default
`default_date`	`date`	Default date to display	`None`
`prompt_text`	`str`	Text to display in the prompt	`'Enter cutoff date'`

Returns:

Type	Description
`str`	Validated cutoff date string

Source code in datatapes/theorem/theorem_interest_data.py

def get_cutoff_date(default_date: date = None, prompt_text: str = "Enter cutoff date") -> str:
    """
    Get cutoff date from user input with validation.

    Args:
        default_date: Default date to display
        prompt_text: Text to display in the prompt

    Returns:
        Validated cutoff date string
    """
    default = str(default_date or date.today())
    while True:
        date_str = Prompt.ask(
            prompt_text,
            default=default,
            show_default=True
        )
        if validate_date(date_str):
            return date_str
        log.error("Invalid date format. Please use YYYY-MM-DD")

`load_additional_queries(sql_folder: Path) -> Dict[str, str]` ¶

Load additional SQL queries from the sql folder.

Parameters:

Name	Type	Description	Default
`sql_folder`	`Path`	Path to the folder containing SQL files	required

Returns:

Type	Description
`Dict[str, str]`	Dictionary of query names and their contents

Source code in datatapes/theorem/theorem_interest_data.py

def load_additional_queries(sql_folder: Path) -> Dict[str, str]:
    """
    Load additional SQL queries from the sql folder.

    Args:
        sql_folder: Path to the folder containing SQL files

    Returns:
        Dictionary of query names and their contents
    """
    queries = {}
    query_files = ['loan_tape.sql', 'theorem_payments.sql', 'thorem_pipeline.sql']

    for filename in query_files:
        try:
            file_path = sql_folder / filename
            if file_path.exists():
                with open(file_path, 'r') as f:
                    queries[file_path.stem] = f.read()
                log.info(f"Loaded query: {filename}")
            else:
                log.warning(f"Query file not found: {filename}")
        except Exception as e:
            log.error(f"Failed to load query {filename}: {str(e)}")

    return queries

`load_config(config_path: str) -> dict` ¶

Load database configuration from INI file.

Parameters:

Name	Type	Description	Default
`config_path`	`str`	Path to the configuration file	required

Returns:

Type	Description
`dict`	Dictionary containing database configuration

Source code in datatapes/theorem/theorem_interest_data.py

def load_config(config_path: str) -> dict:
    """
    Load database configuration from INI file.

    Args:
        config_path: Path to the configuration file

    Returns:
        Dictionary containing database configuration
    """
    try:
        config = configparser.ConfigParser()
        if not config_path.exists():
            log.error(f"Configuration file not found at {config_path}")
            return {}

        config.read(config_path)

        # Try different possible section names
        section_names = ['ODS-Ploomber', 'ODS',]

        for section in section_names:
            if section in config:
                return dict(config[section])

        # If no known section is found, try to use the first available section
        if config.sections():
            first_section = config.sections()[0]
            log.warning(f"Using configuration from section: {first_section}")
            return dict(config[first_section])

        log.error("No valid database configuration section found")
        return {}
    except Exception as e:
        log.error(f"Failed to load configuration: {str(e)}")
        return {}

`load_sql_query(sql_path: str) -> str` ¶

Load SQL query from file.

Parameters:

Name	Type	Description	Default
`sql_path`	`str`	Path to the SQL query file	required

Returns:

Type	Description
`str`	SQL query string

Source code in datatapes/theorem/theorem_interest_data.py

def load_sql_query(sql_path: str) -> str:
    """
    Load SQL query from file.

    Args:
        sql_path: Path to the SQL query file

    Returns:
        SQL query string
    """
    try:
        with open(sql_path, 'r') as f:
            return f.read()
    except Exception as e:
        log.error(f"Failed to load SQL query: {str(e)}")
        sys.exit(1)

`main(config_path: Optional[str] = None, sql_folder: Optional[str] = None, output_path: Optional[str] = None)` ¶

Main execution function.

Parameters:

Name	Type	Description	Default
`config_path`	`Optional[str]`	Path to the configuration file (optional)	`None`
`sql_folder`	`Optional[str]`	Path to the folder containing SQL files (optional)	`None`
`output_path`	`Optional[str]`	Path where output files should be saved (optional)	`None`

Source code in datatapes/theorem/theorem_interest_data.py

def main(config_path: Optional[str] = None, 
         sql_folder: Optional[str] = None,
         output_path: Optional[str] = None):
    """
    Main execution function.

    Args:
        config_path: Path to the configuration file (optional)
        sql_folder: Path to the folder containing SQL files (optional)
        output_path: Path where output files should be saved (optional)
    """
    try:
        # Set default paths if not provided
        script_path = Path(__file__).resolve()
        if not config_path:
            config_path = Path('Z:/Private/thandolwethu.dlamini/configurations/connections.ini')
        if not sql_folder:
            sql_folder = script_path.parent / 'sql'
        if not output_path:
            output_path = Path(r"C:\Users\thandolwethu.dlamini\OneDrive - Sunnova Energy International\Documents - FinOps\Treasury\Reporting\Code\TreasuryReporting\CompletedOutput\Theorem")

        # Ensure output directory exists
        output_path.mkdir(parents=True, exist_ok=True)

        # Load configuration
        db_config = load_config(config_path)
        if not db_config:
            log.error("Failed to load valid database configuration")
            sys.exit(1)

        # Create database connection
        engine = create_db_connection(db_config)

        # Load SQL queries
        interest_query = load_sql_query(sql_folder / 'interest_calc.sql')
        additional_queries = load_additional_queries(sql_folder)

        # Get cutoff dates
        # Get cutoff dates
        end_of_prev_month, end_of_prev_prev_month = get_default_dates()

        log.info("Getting cutoff dates...")
        current_cutoff = get_cutoff_date(
            default_date=end_of_prev_month,
            prompt_text="Enter cutoff date for current month"
        )
        previous_cutoff = get_cutoff_date(
            default_date=end_of_prev_prev_month, 
            prompt_text="Enter cutoff date for previous month"
        )

        # Execute queries and save results
        with engine.connect() as conn:
            # Process interest calculation
            current_df = pd.read_sql(
                interest_query.replace('{{cut_off_date}}', current_cutoff),
                conn
            )
            previous_df = pd.read_sql(
                interest_query.replace('{{cut_off_date}}', previous_cutoff),
                conn
            )

            # Analyze interest data
            comparison = analyze_interest_changes(current_df, previous_df)
            display_summary_statistics(comparison)

            # Save interest data
            current_month = datetime.strptime(current_cutoff, '%Y-%m-%d').strftime('%B_%Y')
            save_interest_data(current_df, current_month.lower(), output_path)

            # Process additional queries
            for query_name, query in additional_queries.items():
                execute_and_save_query(conn, query, current_cutoff, 
                                    query_name, output_path)

    except Exception as e:
        log.error(f"An error occurred: {str(e)}")
        sys.exit(1)

`save_interest_data(df: pd.DataFrame, month: str, output_path: Path) -> None` ¶

Save interest data to CSV file.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	DataFrame containing interest data	required
`month`	`str`	Month identifier for filename	required
`output_path`	`Path`	Path where the file should be saved	required

Source code in datatapes/theorem/theorem_interest_data.py

def save_interest_data(df: pd.DataFrame, month: str, output_path: Path) -> None:
    """
    Save interest data to CSV file.

    Args:
        df: DataFrame containing interest data
        month: Month identifier for filename
        output_path: Path where the file should be saved
    """
    try:
        if df.empty:
            log.warning(f"No data found for {month}")
            return

        timestamp = datetime.now().strftime('%Y%m%d')
        filename = output_path / f"{month}_{timestamp}_theorem_interest.csv"
        df.to_csv(filename, index=False)
        log.info(f"Saved {len(df)} records to {filename}")

    except Exception as e:
        log.error(f"Failed to save interest data: {str(e)}")

`validate_date(date_str: str) -> Optional[date]` ¶

Validate date string format and return date object.

Parameters:

Name	Type	Description	Default
`date_str`	`str`	Date string in YYYY-MM-DD format	required

Returns:

Type	Description
`Optional[date]`	datetime.date object if valid, None otherwise

Source code in datatapes/theorem/theorem_interest_data.py

def validate_date(date_str: str) -> Optional[date]:
    """
    Validate date string format and return date object.

    Args:
        date_str: Date string in YYYY-MM-DD format

    Returns:
        datetime.date object if valid, None otherwise
    """
    try:
        return datetime.strptime(date_str, '%Y-%m-%d').date()
    except ValueError:
        return None

THEOREM Datatape¶

Overview¶

Output Columns¶

Column Analysis Summary¶

Data Sources¶

Configuration¶

Portfolio Groups¶

Database Configuration¶

Paths¶

SQL Queries¶

Data Processing¶

Processing Steps¶

Python Modules¶

theorem_interest_data¶

analyze_interest_changes(current_df: pd.DataFrame, previous_df: pd.DataFrame) -> pd.DataFrame ¶

create_db_connection(db_config: dict) -> object ¶

display_summary_statistics(comparison_df: pd.DataFrame) -> None ¶

execute_and_save_query(conn: Connection, query: str, cutoff_date: str, query_name: str, output_path: Path) -> None ¶

get_cutoff_date(default_date: date = None, prompt_text: str = 'Enter cutoff date') -> str ¶

load_additional_queries(sql_folder: Path) -> Dict[str, str] ¶

load_config(config_path: str) -> dict ¶

load_sql_query(sql_path: str) -> str ¶

main(config_path: Optional[str] = None, sql_folder: Optional[str] = None, output_path: Optional[str] = None) ¶

save_interest_data(df: pd.DataFrame, month: str, output_path: Path) -> None ¶

validate_date(date_str: str) -> Optional[date] ¶

`analyze_interest_changes(current_df: pd.DataFrame, previous_df: pd.DataFrame) -> pd.DataFrame` ¶

`create_db_connection(db_config: dict) -> object` ¶

`display_summary_statistics(comparison_df: pd.DataFrame) -> None` ¶

`execute_and_save_query(conn: Connection, query: str, cutoff_date: str, query_name: str, output_path: Path) -> None` ¶

`get_cutoff_date(default_date: date = None, prompt_text: str = 'Enter cutoff date') -> str` ¶

`load_additional_queries(sql_folder: Path) -> Dict[str, str]` ¶

`load_config(config_path: str) -> dict` ¶

`load_sql_query(sql_path: str) -> str` ¶

`main(config_path: Optional[str] = None, sql_folder: Optional[str] = None, output_path: Optional[str] = None)` ¶

`save_interest_data(df: pd.DataFrame, month: str, output_path: Path) -> None` ¶

`validate_date(date_str: str) -> Optional[date]` ¶