

# analysis_utils

`utils.analysis_utils`

Utility functions for energy community analysis workflow.

## Functions

| Name | Description |
|----|----|
| [find_latest_file](#nova_ec.utils.analysis_utils.find_latest_file) | Find the most recent file matching the pattern in the directory. |
| [generate_analysis_summary](#nova_ec.utils.analysis_utils.generate_analysis_summary) | Generate and display a summary of the analysis results. |
| [load_solar_systems](#nova_ec.utils.analysis_utils.load_solar_systems) | Find and load solar systems data. |
| [match_systems_to_energy_communities](#nova_ec.utils.analysis_utils.match_systems_to_energy_communities) | Match solar systems to energy communities. |
| [merge_energy_community_columns](#nova_ec.utils.analysis_utils.merge_energy_community_columns) | Merge similar energy community columns from the FFSAEC and CCEC groups. |
| [validate_energy_communities](#nova_ec.utils.analysis_utils.validate_energy_communities) | Validate energy community data files. |

### find_latest_file

``` python
utils.analysis_utils.find_latest_file(directory, pattern)
```

Find the most recent file matching the pattern in the directory.

Args:

- `directory`: Directory to search.
- `pattern`: Glob pattern for files.

Returns: Path to the most recent file, or None if no files found
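The behavior described above can be approximated with the standard library. The following is a minimal sketch, not the module's actual implementation: the name `find_latest_file_sketch` is hypothetical, and it assumes that "most recent" means the latest modification time, which the docstring does not state.

```python
from pathlib import Path
from typing import Optional


def find_latest_file_sketch(directory, pattern) -> Optional[Path]:
    """Return the newest file in `directory` matching `pattern`, or None."""
    # Collect regular files that match the glob pattern.
    candidates = [p for p in Path(directory).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Assumption: "most recent" is judged by modification time (st_mtime).
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

If the real function instead orders files by a timestamp embedded in the file name, the `key` function would need to parse that name rather than call `stat()`.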

### generate_analysis_summary

``` python
utils.analysis_utils.generate_analysis_summary(
    final_output_df,
    output_path,
    log_path,
)
```

Generate and display a summary of the analysis results.

Args:

- `final_output_df`: DataFrame with analysis results.
- `output_path`: Path to the output file.
- `log_path`: Path to the log file.

### load_solar_systems

``` python
utils.analysis_utils.load_solar_systems(config, solar_systems_file=None)
```

Find and load solar systems data.

Args:

- `config`: Configuration dictionary.
- `solar_systems_file`: Optional path to the solar systems file.

Returns: DataFrame with solar systems data, or None if loading failed

### match_systems_to_energy_communities

``` python
utils.analysis_utils.match_systems_to_energy_communities(
    solar_systems_df,
    CCEC_2023=None,
    CCEC_2024=None,
    FFSAEC_2023=None,
    FFSAEC_2024=None,
)
```

Match solar systems to energy communities.

Args:

- `solar_systems_df`: DataFrame containing solar system data.
- `CCEC_2023`: Coal Closure Energy Communities for 2023.
- `CCEC_2024`: Coal Closure Energy Communities for 2024.
- `FFSAEC_2023`: Fossil Fuel Statistical Areas for 2023.
- `FFSAEC_2024`: Fossil Fuel Statistical Areas for 2024.

Returns: DataFrame with matching results

### merge_energy_community_columns

``` python
utils.analysis_utils.merge_energy_community_columns(df)
```

Merge similar energy community columns from the FFSAEC and CCEC groups.

For example:

- `ffsaec_2023_fips_state` and `ffsaec_2024_fips_state` become `fips_state`
- `ccec_2023_symbol` and `ccec_2024_symbol` become `symbol`

Merging prefers non-empty values (neither NaN nor an empty string): the FFSAEC columns are scanned first, in order, and CCEC values are used only when no FFSAEC value is available.

Parameters:

- `df` (`pd.DataFrame`): DataFrame that contains the original FFSAEC and CCEC columns.

Returns: `pd.DataFrame` with the new merged columns; the original FFSAEC and CCEC source columns are dropped.

Note: The attributes to merge are listed inside the function body; adjust those lists if there are additional or different attributes.
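The priority rule can be illustrated on a single record. This is a dependency-free sketch that works on a plain dict rather than a DataFrame, so it is not the module's implementation. The names `is_empty`, `merge_row`, and `PREFIXES` are hypothetical, and the scan order within each group (2023 before 2024) is an assumption.

```python
import math

# Assumed scan order: FFSAEC columns first, then CCEC; 2023 before 2024.
PREFIXES = ("ffsaec_2023_", "ffsaec_2024_", "ccec_2023_", "ccec_2024_")


def is_empty(value):
    """True for None, NaN, or an empty string."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value == ""


def merge_row(row, attributes):
    """Collapse prefixed columns into one column per attribute."""
    merged = dict(row)
    for attr in attributes:
        sources = [p + attr for p in PREFIXES if p + attr in row]
        if not sources:
            continue
        # Take the first non-empty value in priority order, else None.
        merged[attr] = next(
            (row[c] for c in sources if not is_empty(row[c])), None
        )
        # Drop the original source columns.
        for c in sources:
            del merged[c]
    return merged


row = {
    "system_id": 7,  # hypothetical identifier column
    "ffsaec_2023_fips_state": float("nan"),
    "ffsaec_2024_fips_state": "",
    "ccec_2023_fips_state": "42",
    "ccec_2024_fips_state": "42",
}
print(merge_row(row, ["fips_state"]))  # {'system_id': 7, 'fips_state': '42'}
```

Both FFSAEC values are empty here, so the merged `fips_state` falls back to the first CCEC value.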

### validate_energy_communities

``` python
utils.analysis_utils.validate_energy_communities(
    config,
    skip_validation=False,
    debug_paths=False,
)
```

Validate energy community data files.

Args:

- `config`: Configuration dictionary.
- `skip_validation`: Whether to skip validation.
- `debug_paths`: Whether to print detailed debug info for file paths.

Returns: True if validation passed or skipped, False otherwise
