export_utils

matching.export_utils

Utilities for cleaning and exporting energy community matching results.

Functions

Name	Description
check_for_issues	Check for any remaining issues in the cleaned DataFrame.
clean_column	Clean a single column of text data.
clean_dataframe_parallel	Clean DataFrame columns in parallel with progress visualization.
diagnose_problematic_columns	Perform detailed diagnostics on potentially problematic columns.
export_results	Clean and export results to CSV file with proper formatting.
get_text_columns	Identify potential text columns that need cleaning.

check_for_issues

matching.export_utils.check_for_issues(df)

Check for any remaining issues in the cleaned DataFrame.

Args: df: DataFrame to check

Returns: Dictionary of issues found by column and type
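
The reference does not say which issue types are checked or how the returned dictionary is nested. The sketch below shows one plausible shape, `{column: {issue_type: count}}`. It is not the library's implementation: `find_issues`, the two issue names, and the use of a plain dict of lists in place of a DataFrame are all assumptions made so the example runs on the standard library alone.

```python
import re

# Hypothetical issue type: control characters other than tab, LF and CR.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_issues(columns):
    """Return {column: {issue_type: count}} for columns with at least one issue.

    `columns` is a dict of lists standing in for a DataFrame (assumption).
    """
    issues = {}
    for name, values in columns.items():
        texts = [v for v in values if isinstance(v, str)]
        newlines = sum(1 for t in texts if "\n" in t or "\r" in t)
        control = sum(1 for t in texts if CONTROL_CHARS.search(t))

        found = {}
        if newlines:
            found["embedded_newlines"] = newlines
        if control:
            found["control_characters"] = control
        if found:
            issues[name] = found
    return issues


sample = {
    "name": ["Solar Coop", "Wind\nCollective"],
    "score": [0.91, 0.87],
}
report = find_issues(sample)
```

Columns without problems are left out of the result, so an empty dictionary means nothing was flagged.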

clean_column

matching.export_utils.clean_column(args)

Clean a single column of text data.

Args: args: Tuple containing (column_name, series)

Returns: Tuple of (column_name, cleaned_series)
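
Taking one tuple and returning one tuple lets a function be passed directly to a `map`-style call, which is probably why the signature looks this way. The sketch below illustrates only that calling convention. `clean_text_column` is a hypothetical stand-in, the whitespace-collapsing rule is an assumption, and a plain list replaces the pandas Series.

```python
import re


def clean_text_column(args):
    """Hypothetical stand-in: (column_name, values) in, (column_name, cleaned) out."""
    column_name, values = args
    cleaned = [
        # Collapse runs of whitespace (including newlines and tabs) to one space.
        re.sub(r"\s+", " ", v).strip() if isinstance(v, str) else v
        for v in values
    ]
    return column_name, cleaned


name, cleaned = clean_text_column(
    ("address", ["  12 Main\nStreet ", None, "Unit\t4"])
)
```

Non-string values such as `None` pass through unchanged in this sketch. The real function may treat missing values differently.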

clean_dataframe_parallel

matching.export_utils.clean_dataframe_parallel(df, max_workers=None)

Clean DataFrame columns in parallel with progress visualization.

Args:
    df: DataFrame to clean
    max_workers: Maximum number of worker threads (defaults to CPU count)

Returns: Cleaned DataFrame
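
A minimal sketch of the per-column fan-out described above, assuming a thread pool from `concurrent.futures`. The names `strip_column` and `clean_columns_parallel` are hypothetical, a dict of lists stands in for the DataFrame, and progress reporting is omitted. `ThreadPoolExecutor` does not itself default to the CPU count when `max_workers` is `None`, so the sketch resolves that default explicitly.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def strip_column(args):
    """Hypothetical per-column worker using the (name, values) tuple convention."""
    column_name, values = args
    return column_name, [v.strip() if isinstance(v, str) else v for v in values]


def clean_columns_parallel(columns, max_workers=None):
    """Clean every column on a thread pool and rebuild the table."""
    # Fall back to the CPU count, as the documented default suggests.
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so column order is kept.
        return dict(pool.map(strip_column, columns.items()))


table = {"name": [" Solar Coop "], "city": ["Ghent  "], "score": [0.91]}
cleaned = clean_columns_parallel(table, max_workers=2)
```

Each column is independent of the others, so they can be processed concurrently without any locking.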

diagnose_problematic_columns

matching.export_utils.diagnose_problematic_columns(df, columns=None)

Perform detailed diagnostics on potentially problematic columns.

Args:
    df: DataFrame to analyze
    columns: List of columns to check (if None, will check all text columns)

Returns: Dictionary with detailed diagnostics per column
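
The diagnostic fields are not listed in this reference, so the following is only an illustration of the `columns=None` fallback and of a per-column result dictionary. `diagnose_columns` and every key in its output are hypothetical, and a dict of lists stands in for the DataFrame.

```python
def diagnose_columns(table, columns=None):
    """Return {column: {metric: value}} for the requested columns."""
    if columns is None:
        # Fallback: inspect every column holding at least one string.
        columns = [
            name for name, values in table.items()
            if any(isinstance(v, str) for v in values)
        ]

    report = {}
    for name in columns:
        texts = [v for v in table[name] if isinstance(v, str)]
        report[name] = {
            "text_values": len(texts),
            "max_length": max((len(t) for t in texts), default=0),
            "with_quotes": sum(1 for t in texts if '"' in t),
            "with_newlines": sum(1 for t in texts if "\n" in t),
        }
    return report


table = {
    "name": ['The "Green" Coop', "Wind\nCollective", None],
    "score": [0.9, 0.8, 0.7],
}
report = diagnose_columns(table)
```

Passing `columns=["score"]` would force a report for that column even though it holds no text.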

export_results

matching.export_utils.export_results(
    df,
    output_path,
    parallel_workers=None,
    verify=True,
)

Clean and export results to CSV file with proper formatting.

Args:
    df: DataFrame to export
    output_path: Path to save the CSV file
    parallel_workers: Number of threads for parallel processing
    verify: Whether to verify the exported file

Returns: True if export was successful, False otherwise
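
How verification works is not specified here. One plausible reading is that the file is read back and compared against what was written. The sketch below follows that reading with the standard `csv` module. `export_rows` is a hypothetical name, rows are plain dicts rather than a DataFrame, and the row-count check is an assumption about what "verify" means.

```python
import csv
import os
import tempfile


def export_rows(rows, fieldnames, output_path, verify=True):
    """Write rows to CSV and return True on success, False on failure."""
    try:
        with open(output_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)

        if verify:
            # Read the file back and confirm no rows were lost or split.
            with open(output_path, newline="", encoding="utf-8") as fh:
                read_back = list(csv.DictReader(fh))
            if len(read_back) != len(rows):
                return False
        return True
    except (OSError, csv.Error):
        # Report failure through the return value instead of raising.
        return False


rows = [{"name": "Solar Coop", "match": "Solar Cooperative, Ghent"}]
with tempfile.TemporaryDirectory() as tmp:
    ok = export_rows(rows, ["name", "match"], os.path.join(tmp, "results.csv"))
    bad = export_rows(
        rows, ["name", "match"], os.path.join(tmp, "missing_dir", "results.csv")
    )
```

Because failure is signalled by the return value, a caller has to check it explicitly, for example `if not export_results(df, path): ...`.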

get_text_columns

matching.export_utils.get_text_columns(df)

Identify potential text columns that need cleaning.

Args: df: DataFrame to analyze

Returns: List of column names that are likely to contain text
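
"Likely to contain text" implies a heuristic, but the reference does not say which one. A simple possibility, sketched here on a dict of lists, is to keep any column that holds at least one string value. The function name and the rule are both assumptions.

```python
def find_text_columns(columns):
    """Return names of columns containing at least one string value."""
    return [
        name
        for name, values in columns.items()
        if any(isinstance(v, str) for v in values)
    ]


table = {
    "name": ["Solar Coop", None],
    "score": [0.91, 0.87],
    "city": ["Ghent", "Leuven"],
}
text_columns = find_text_columns(table)
```

With pandas, a comparable first pass would be `df.select_dtypes(include="object").columns`, although object-dtype columns are not guaranteed to hold only strings.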