export_utils
matching.export_utils
Utilities for cleaning and exporting energy community matching results.
Functions
| Name | Description |
|---|---|
| check_for_issues | Check for any remaining issues in the cleaned DataFrame. |
| clean_column | Clean a single column of text data. |
| clean_dataframe_parallel | Clean DataFrame columns in parallel with progress visualization. |
| diagnose_problematic_columns | Perform detailed diagnostics on potentially problematic columns. |
| export_results | Clean and export results to CSV file with proper formatting. |
| get_text_columns | Identify potential text columns that need cleaning. |
check_for_issues
matching.export_utils.check_for_issues(df)
Check for any remaining issues in the cleaned DataFrame.
Args:
    df: DataFrame to check
Returns: Dictionary of issues found by column and type
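The shape of the returned dictionary is not spelled out above. The following is only a sketch of what a "by column and type" report could look like, assuming pandas is installed. The function name `issues_sketch` and the two issue categories (`embedded_newlines`, `surrounding_whitespace`) are hypothetical and are not taken from `matching.export_utils`.

```python
import pandas as pd


def issues_sketch(df):
    """Hypothetical stand-in for check_for_issues; the issue types are assumptions."""
    issues = {}
    for col in df.columns:
        # Only string values can carry the text problems counted here.
        values = [v for v in df[col] if isinstance(v, str)]
        found = {}
        newlines = sum(1 for v in values if "\n" in v or "\r" in v)
        if newlines:
            found["embedded_newlines"] = newlines
        padded = sum(1 for v in values if v != v.strip())
        if padded:
            found["surrounding_whitespace"] = padded
        if found:
            issues[col] = found
    return issues


df = pd.DataFrame({"name": ["Solar\nCoop", " Wind Farm"], "members": [12, 40]})
report = issues_sketch(df)
print(report)
```

Columns without findings are left out of the report in this sketch, so an empty dictionary would mean nothing was flagged.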
clean_column
matching.export_utils.clean_column(args)
Clean a single column of text data.
Args:
    args: Tuple containing (column_name, series)
Returns: Tuple of (column_name, cleaned_series)
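Taking a single tuple and returning a tuple is a common shape for functions that are handed to an executor's `map`. The sketch below illustrates only that calling convention, assuming pandas is installed. The function `clean_column_sketch` and its whitespace-collapsing rule are hypothetical; the actual cleaning rules of `clean_column` are not documented here.

```python
import pandas as pd


def clean_column_sketch(args):
    """Hypothetical stand-in for clean_column; the cleaning rule is an assumption."""
    column_name, series = args
    # Collapse runs of whitespace (including newlines) in string values only;
    # missing or non-string values pass through unchanged.
    cleaned = series.map(
        lambda v: " ".join(v.split()) if isinstance(v, str) else v
    )
    return column_name, cleaned


raw = pd.Series(["  Solar\nCoop ", None, "Wind  Farm"])
name, cleaned = clean_column_sketch(("title", raw))
print(name, cleaned.tolist())
```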
clean_dataframe_parallel
matching.export_utils.clean_dataframe_parallel(df, max_workers=None)
Clean DataFrame columns in parallel with progress visualization.
Args:
    df: DataFrame to clean
    max_workers: Maximum number of worker threads (defaults to CPU count)
Returns: Cleaned DataFrame
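One plausible way to fan per-column work out to worker threads is `concurrent.futures.ThreadPoolExecutor`. The sketch below assumes pandas is installed and leaves out the progress visualization. Both `_clean_one` and `clean_parallel_sketch` are hypothetical names, and the strip-only cleaning rule is an assumption, not the module's behavior.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def _clean_one(args):
    """Hypothetical per-column worker using the (column_name, series) tuple shape."""
    column_name, series = args
    return column_name, series.map(lambda v: v.strip() if isinstance(v, str) else v)


def clean_parallel_sketch(df, max_workers=None):
    """Hypothetical stand-in for clean_dataframe_parallel (no progress bar)."""
    # Mirror the documented default: fall back to the CPU count.
    workers = max_workers or os.cpu_count() or 1
    result = df.copy()
    tasks = [(col, df[col]) for col in df.columns]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in task order, one tuple per column.
        for column_name, cleaned in pool.map(_clean_one, tasks):
            result[column_name] = cleaned
    return result


df = pd.DataFrame({"name": [" a ", "b "], "n": [1, 2]})
out = clean_parallel_sketch(df, max_workers=2)
print(out)
```

The sketch works on a copy, so the input DataFrame is left unmodified; whether the real function does the same is not stated above.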
diagnose_problematic_columns
matching.export_utils.diagnose_problematic_columns(df, columns=None)
Perform detailed diagnostics on potentially problematic columns.
Args:
    df: DataFrame to analyze
    columns: List of columns to check (if None, will check all text columns)
Returns: Dictionary with detailed diagnostics per column
export_results
matching.export_utils.export_results(
df,
output_path,
parallel_workers=None,
verify=True,
)
Clean and export results to CSV file with proper formatting.
Args:
    df: DataFrame to export
    output_path: Path to save the CSV file
    parallel_workers: Number of threads for parallel processing
    verify: Whether to verify the exported file
Returns: True if export was successful, False otherwise
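A write-then-read-back pattern is one way a `verify` flag and a boolean return value could fit together. The sketch below assumes pandas is installed and skips the cleaning step. The function `export_sketch`, the shape comparison, and the choice of caught exceptions are all assumptions rather than the documented behavior of `export_results`.

```python
import csv
import os
import tempfile

import pandas as pd


def export_sketch(df, output_path, verify=True):
    """Hypothetical stand-in for export_results; returns True on success."""
    try:
        df.to_csv(output_path, index=False, encoding="utf-8", quoting=csv.QUOTE_MINIMAL)
        if verify:
            # Assumed verification: reload the file and compare row/column counts.
            reloaded = pd.read_csv(output_path)
            if reloaded.shape != df.shape:
                return False
        return True
    except (OSError, ValueError):
        # Unwritable paths and unreadable files are reported as failure.
        return False


df = pd.DataFrame({"name": ["Solar Coop, Berlin", "Wind Farm"], "members": [12, 40]})
with tempfile.TemporaryDirectory() as tmp:
    ok = export_sketch(df, os.path.join(tmp, "results.csv"))
    bad = export_sketch(df, os.path.join(tmp, "missing_dir", "results.csv"))
print(ok, bad)
```

Returning a boolean instead of raising keeps a batch run going, at the cost of hiding the reason for a failure unless it is logged separately.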
get_text_columns
matching.export_utils.get_text_columns(df)
Identify potential text columns that need cleaning.
Args:
    df: DataFrame to analyze
Returns: List of column names that are likely to contain text
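How "likely to contain text" is decided is not described above. A simple heuristic, shown here as a sketch that assumes pandas is installed, is to select columns by dtype. The function `text_columns_sketch` is hypothetical, and the real `get_text_columns` may apply different or additional criteria.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def text_columns_sketch(df):
    """Hypothetical stand-in for get_text_columns, based on dtype alone."""
    return [
        col
        for col in df.columns
        if is_object_dtype(df[col]) or is_string_dtype(df[col])
    ]


df = pd.DataFrame({"name": ["a", "b"], "members": [1, 2], "score": [0.5, 0.7]})
print(text_columns_sketch(df))
```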