

# system_data

`data.system_data`

Module for loading and processing solar system data.

## Classes

| Name | Description |
|----|----|
| [DateValidationResult](#nova_ec.data.system_data.DateValidationResult) | Model for date validation results. |

### DateValidationResult

``` python
data.system_data.DateValidationResult()
```

Model for date validation results.

## Functions

| Name | Description |
|----|----|
| [add_state_full_name_column](#nova_ec.data.system_data.add_state_full_name_column) | Add full state names to the DataFrame. |
| [calculate_distances](#nova_ec.data.system_data.calculate_distances) | Calculate distances between approximate and ArcGIS coordinates. |
| [clean_data](#nova_ec.data.system_data.clean_data) | Load and clean the data. |
| [convert_and_validate_dates](#nova_ec.data.system_data.convert_and_validate_dates) | Convert date columns to datetime and validate the conversion. |
| [count_rows_with_missing_geographical_info](#nova_ec.data.system_data.count_rows_with_missing_geographical_info) | Count rows with missing geographical information. |
| [filter_columns](#nova_ec.data.system_data.filter_columns) | Filter DataFrame to keep only specified columns. |
| [get_unique_states](#nova_ec.data.system_data.get_unique_states) | Get unique states in the dataset. |
| [load_data](#nova_ec.data.system_data.load_data) | Load data from a CSV file. |
| [print_validation_results](#nova_ec.data.system_data.print_validation_results) | Print date validation results in a table format. |
| [process_solar_systems](#nova_ec.data.system_data.process_solar_systems) | Process solar system data from a file. |
| [systems_with_no_county](#nova_ec.data.system_data.systems_with_no_county) | Identify systems with no county information. |
| [systems_with_no_lat_lon](#nova_ec.data.system_data.systems_with_no_lat_lon) | Identify systems with no latitude and longitude. |

### add_state_full_name_column

``` python
data.system_data.add_state_full_name_column(df, state_dict)
```

Add full state names to the DataFrame.

Args:

- `df`: DataFrame to process
- `state_dict`: Dictionary mapping state abbreviations to full names

Returns: DataFrame with added StateFullName column
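As a rough illustration of the mapping step, the sketch below uses `Series.map`. It is a stand-in, not the module's implementation: the function name, the source column name `State`, and the choice to leave unknown abbreviations as `NaN` are all assumptions. Only the `StateFullName` output column comes from the description above.

``` python
import pandas as pd


def add_state_full_name_sketch(df: pd.DataFrame, state_dict: dict) -> pd.DataFrame:
    """Hypothetical stand-in for add_state_full_name_column."""
    out = df.copy()  # work on a copy so the caller's frame is not mutated
    # Assumption: abbreviations are stored in a column named "State".
    # Abbreviations absent from state_dict map to NaN.
    out["StateFullName"] = out["State"].map(state_dict)
    return out


systems = pd.DataFrame({"State": ["CA", "NY", "ZZ"]})
named = add_state_full_name_sketch(systems, {"CA": "California", "NY": "New York"})
```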

### calculate_distances

``` python
data.system_data.calculate_distances(df, max_workers=8)
```

Calculate distances between approximate and ArcGIS coordinates.

Args:

- `df`: DataFrame containing coordinate columns
- `max_workers`: Maximum number of parallel workers

Returns: Tuple of (DataFrame with distance column, summary dict, list of skipped indices)
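The three-part return value is easier to picture with a small sketch. Everything specific here is an assumption made for illustration: the four coordinate column names, the `distance_km` output column, the keys of the summary dict, the use of the haversine formula, and the use of a thread pool to honor `max_workers`. The real function may differ on any of these.

``` python
import math
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

EARTH_RADIUS_KM = 6371.0
# Hypothetical column names; the actual ones are not documented here.
COORD_COLS = ["approx_lat", "approx_lon", "arcgis_lat", "arcgis_lon"]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def calculate_distances_sketch(df, max_workers=8):
    """Hypothetical stand-in for calculate_distances."""
    out = df.copy()
    # Rows missing any coordinate are skipped and reported by index.
    complete = out[COORD_COLS].notna().all(axis=1)
    skipped = out.index[~complete].tolist()
    rows = out.loc[complete, COORD_COLS].itertuples(index=False, name=None)
    # pool.map preserves input order, so results line up with the rows.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        distances = list(pool.map(lambda r: haversine_km(*r), rows))
    out["distance_km"] = float("nan")
    out.loc[complete, "distance_km"] = distances
    summary = {"computed": len(distances), "skipped": len(skipped)}
    return out, summary, skipped


coords = pd.DataFrame(
    {
        "approx_lat": [0.0, 10.0],
        "approx_lon": [0.0, 10.0],
        "arcgis_lat": [0.0, None],
        "arcgis_lon": [1.0, 10.0],
    }
)
result, summary, skipped = calculate_distances_sketch(coords, max_workers=2)
```

In this toy input the second row lacks an ArcGIS latitude, so it appears in `skipped` and its distance stays `NaN`.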

### clean_data

``` python
data.system_data.clean_data(filepath)
```

Load and clean the data.

Args: filepath: Path to the data file

Returns: Cleaned DataFrame

### convert_and_validate_dates

``` python
data.system_data.convert_and_validate_dates(df, date_columns)
```

Convert date columns to datetime and validate the conversion.

Args:

- `df`: DataFrame to process
- `date_columns`: List of date columns to convert

Returns: Tuple of (processed DataFrame, validation results)
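One plausible way to convert and validate in a single pass is to coerce unparseable values to `NaT` and count how many values were lost. The sketch below assumes that approach. The function name, the `install_date` column, and the plain-dict result with `total`, `missing`, and `failed` keys are illustrative only; the actual function presumably reports through `DateValidationResult`, whose fields are not documented here.

``` python
import pandas as pd


def convert_and_validate_dates_sketch(df, date_columns):
    """Hypothetical stand-in for convert_and_validate_dates."""
    out = df.copy()
    results = {}
    for col in date_columns:
        original = out[col]
        # errors="coerce" turns unparseable values into NaT instead of raising.
        converted = pd.to_datetime(original, errors="coerce")
        # A failure is a value that was present before but is NaT afterwards.
        failed = int((original.notna() & converted.isna()).sum())
        results[col] = {
            "total": len(original),
            "missing": int(original.isna().sum()),
            "failed": failed,
        }
        out[col] = converted
    return out, results


raw = pd.DataFrame({"install_date": ["2021-03-01", "not a date", None]})
converted, report = convert_and_validate_dates_sketch(raw, ["install_date"])
```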

### count_rows_with_missing_geographical_info

``` python
data.system_data.count_rows_with_missing_geographical_info(df)
```

Count rows with missing geographical information.

Args: df: DataFrame to check

Returns: Number of rows with missing geographical information
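The description does not say which columns count as geographical information or whether a row qualifies when any field or all fields are missing. The sketch below assumes three hypothetical columns (`County`, `Latitude`, `Longitude`) and counts a row once if any of them is missing.

``` python
import pandas as pd

# Hypothetical column names; adjust to the real schema.
GEO_COLS = ["County", "Latitude", "Longitude"]


def count_missing_geo_sketch(df: pd.DataFrame) -> int:
    """Hypothetical stand-in for count_rows_with_missing_geographical_info."""
    # any(axis=1) flags a row once, even if several geographic fields are missing.
    return int(df[GEO_COLS].isna().any(axis=1).sum())


geo = pd.DataFrame(
    {
        "County": ["Alameda", None, "Kings"],
        "Latitude": [37.8, 40.7, None],
        "Longitude": [-122.3, -74.0, None],
    }
)
missing_count = count_missing_geo_sketch(geo)
```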

### filter_columns

``` python
data.system_data.filter_columns(df, columns_to_keep)
```

Filter DataFrame to keep only specified columns.

Args:

- `df`: DataFrame to filter
- `columns_to_keep`: List of columns to keep

Returns: Filtered DataFrame
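A minimal sketch of column filtering with plain pandas indexing follows. The function name and sample columns are made up, and how the real function treats names that are not in the DataFrame is not documented; this version lets pandas raise `KeyError`.

``` python
import pandas as pd


def filter_columns_sketch(df: pd.DataFrame, columns_to_keep) -> pd.DataFrame:
    """Hypothetical stand-in for filter_columns."""
    # Indexing with a list keeps the requested order; pandas raises KeyError
    # if any requested column is absent.
    return df[list(columns_to_keep)].copy()


wide = pd.DataFrame({"system_id": [1, 2], "State": ["CA", "NY"], "notes": ["a", "b"]})
narrow = filter_columns_sketch(wide, ["State", "system_id"])
```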

### get_unique_states

``` python
data.system_data.get_unique_states(df)
```

Get unique states in the dataset.

Args: df: DataFrame to check

Returns: Set of unique states
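For illustration, collecting unique values into a set might look like the sketch below. The `State` column name and the decision to drop missing values before building the set are assumptions.

``` python
import pandas as pd


def get_unique_states_sketch(df: pd.DataFrame) -> set:
    """Hypothetical stand-in for get_unique_states."""
    # Assumption: states live in a "State" column and NaN is not a state.
    return set(df["State"].dropna().unique())


records = pd.DataFrame({"State": ["CA", "NY", "CA", None]})
states = get_unique_states_sketch(records)
```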

### load_data

``` python
data.system_data.load_data(filepath)
```

Load data from a CSV file.

Args: filepath: Path to the CSV file

Returns: DataFrame containing the loaded data

Raises: FileNotFoundError: If the file doesn’t exist
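The documented contract (return a DataFrame, raise `FileNotFoundError` for a missing file) can be sketched with `pandas.read_csv`, which already raises that exception for a missing local path. The explicit check below only makes the failure point obvious. The function name and the sample CSV contents are illustrative.

``` python
import tempfile
from pathlib import Path

import pandas as pd


def load_data_sketch(filepath) -> pd.DataFrame:
    """Hypothetical stand-in for load_data."""
    path = Path(filepath)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return pd.read_csv(path)


# Write a tiny CSV to a temporary directory and load it back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "systems.csv"
    csv_path.write_text("system_id,State\n1,CA\n2,NY\n")
    loaded = load_data_sketch(csv_path)
```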

### print_validation_results

``` python
data.system_data.print_validation_results(validation_results)
```

Print date validation results in a table format.

Args: validation_results: Dictionary of validation results by column
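The exact table layout is not specified, so the following is only one way such output could be produced with the standard library. It assumes each per-column result is a dict with hypothetical `total`, `missing`, and `failed` keys, which may not match the fields of `DateValidationResult`.

``` python
def print_validation_results_sketch(validation_results: dict) -> None:
    """Hypothetical stand-in for print_validation_results."""
    headers = ["column", "total", "missing", "failed"]
    rows = [
        [col, str(r["total"]), str(r["missing"]), str(r["failed"])]
        for col, r in validation_results.items()
    ]
    # Each table column is as wide as its longest cell, header included.
    widths = [max(len(cell) for cell in column) for column in zip(headers, *rows)]

    def fmt(cells):
        return " | ".join(cell.ljust(width) for cell, width in zip(cells, widths))

    print(fmt(headers))
    print("-+-".join("-" * width for width in widths))
    for row in rows:
        print(fmt(row))


print_validation_results_sketch(
    {"install_date": {"total": 3, "missing": 1, "failed": 1}}
)
```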

### process_solar_systems

``` python
data.system_data.process_solar_systems(systems_path, date_cols, systems_cols)
```

Process solar system data from a file.

Args:

- `systems_path`: Path to the systems data file
- `date_cols`: List of date columns to process
- `systems_cols`: List of columns to include

Returns: Processed DataFrame

### systems_with_no_county

``` python
data.system_data.systems_with_no_county(df)
```

Identify systems with no county information.

Args: df: DataFrame to check

Returns: Boolean Series indicating rows with no county

### systems_with_no_lat_lon

``` python
data.system_data.systems_with_no_lat_lon(df)
```

Identify systems with no latitude and longitude.

Args: df: DataFrame to check

Returns: Boolean Series indicating rows with no lat/lon
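Because this function and `systems_with_no_county` both return Boolean Series, their results can be used directly as masks. The sketch below shows that pattern with stand-in functions. The column names `County`, `Latitude`, and `Longitude` are assumptions, as is treating a row as lacking coordinates when either value is missing.

``` python
import pandas as pd


def systems_with_no_county_sketch(df: pd.DataFrame) -> pd.Series:
    """Hypothetical stand-in: True where the county is missing."""
    return df["County"].isna()


def systems_with_no_lat_lon_sketch(df: pd.DataFrame) -> pd.Series:
    """Hypothetical stand-in: True where latitude or longitude is missing."""
    return df["Latitude"].isna() | df["Longitude"].isna()


sites = pd.DataFrame(
    {
        "County": ["Alameda", None, "Kings"],
        "Latitude": [37.8, 40.7, None],
        "Longitude": [-122.3, -74.0, -73.9],
    }
)
no_county = systems_with_no_county_sketch(sites)
no_lat_lon = systems_with_no_lat_lon_sketch(sites)

# Boolean masks select or exclude rows directly.
located = sites[~no_lat_lon]
```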
