system_data
data.system_data
Module for loading and processing solar system data.
Classes
| Name | Description |
|---|---|
| DateValidationResult | Model for date validation results. |
DateValidationResult
data.system_data.DateValidationResult()
Model for date validation results.
Functions
| Name | Description |
|---|---|
| add_state_full_name_column | Add full state names to the DataFrame. |
| calculate_distances | Calculate distances between approximate and ArcGIS coordinates. |
| clean_data | Load and clean the data. |
| convert_and_validate_dates | Convert date columns to datetime and validate the conversion. |
| count_rows_with_missing_geographical_info | Count rows with missing geographical information. |
| filter_columns | Filter DataFrame to keep only specified columns. |
| get_unique_states | Get unique states in the dataset. |
| load_data | Load data from a CSV file. |
| print_validation_results | Print date validation results in a table format. |
| process_solar_systems | Process solar system data from a file. |
| systems_with_no_county | Identify systems with no county information. |
| systems_with_no_lat_lon | Identify systems with no latitude and longitude. |
add_state_full_name_column
data.system_data.add_state_full_name_column(df, state_dict)
Add full state names to the DataFrame.
Args:
  df: DataFrame to process
  state_dict: Dictionary mapping state abbreviations to full names
Returns: DataFrame with added StateFullName column
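A minimal sketch of how such a mapping could work, not the module's actual implementation. The input column name `State` is an assumption (the reference above names only the output column, `StateFullName`), and the sketch copies the DataFrame rather than modifying it in place.

```python
import pandas as pd


def add_state_full_name_column(df: pd.DataFrame, state_dict: dict) -> pd.DataFrame:
    """Sketch: map abbreviations in a hypothetical 'State' column to full names."""
    out = df.copy()  # leave the caller's DataFrame untouched
    # Abbreviations missing from state_dict become NaN.
    out["StateFullName"] = out["State"].map(state_dict)
    return out


df = pd.DataFrame({"State": ["CA", "NY", "ZZ"]})
result = add_state_full_name_column(df, {"CA": "California", "NY": "New York"})
```

Rows whose abbreviation is not in `state_dict` end up with a missing value, which makes unmapped states easy to find with `result["StateFullName"].isna()`.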
calculate_distances
data.system_data.calculate_distances(df, max_workers=8)
Calculate distances between approximate and ArcGIS coordinates.
Args:
  df: DataFrame containing coordinate columns
  max_workers: Maximum number of parallel workers
Returns: Tuple of (DataFrame with distance column, summary dict, list of skipped indices)
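The reference does not say which distance formula or unit the function uses. As an illustration only, the sketch below assumes great-circle (haversine) distance in kilometres and works on plain tuples instead of a DataFrame. The helper names `haversine_km` and `distances_for_pairs`, and the keys of the summary dict, are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distances_for_pairs(pairs, max_workers=8):
    """pairs: list of (lat1, lon1, lat2, lon2); entries containing None are skipped."""
    skipped = [i for i, p in enumerate(pairs) if any(v is None for v in p)]
    skipped_set = set(skipped)
    valid = [(i, p) for i, p in enumerate(pairs) if i not in skipped_set]
    # Fan the per-row computation out over a thread pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda item: haversine_km(*item[1]), valid))
    distances = {i: d for (i, _), d in zip(valid, results)}
    summary = {"computed": len(distances), "skipped": len(skipped)}
    return distances, summary, skipped


pairs = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0), (None, 0.0, 1.0, 1.0)]
distances, summary, skipped = distances_for_pairs(pairs)
```

The three return values mirror the documented tuple: per-row distances, a summary, and the indices that could not be processed.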
clean_data
data.system_data.clean_data(filepath)
Load and clean the data.
Args:
  filepath: Path to the data file
Returns: Cleaned DataFrame
convert_and_validate_dates
data.system_data.convert_and_validate_dates(df, date_columns)
Convert date columns to datetime and validate the conversion.
Args:
  df: DataFrame to process
  date_columns: List of date columns to convert
Returns: Tuple of (processed DataFrame, validation results)
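One common way to convert and validate dates in pandas is to coerce unparseable values to `NaT` and count how many non-missing inputs were lost. The sketch below shows that pattern. It is an assumption about the approach, and it reports results as plain dicts with hypothetical keys (`total`, `failed`) rather than `DateValidationResult` objects, whose fields are not documented here. The column name `InstallDate` is also illustrative.

```python
import pandas as pd


def convert_dates(df: pd.DataFrame, date_columns: list) -> tuple:
    """Sketch: coerce columns to datetime and record how many values failed."""
    out = df.copy()
    results = {}
    for col in date_columns:
        original = out[col]
        converted = pd.to_datetime(original, errors="coerce")
        # A failure is a value present before conversion but NaT afterwards.
        failed = int((original.notna() & converted.isna()).sum())
        results[col] = {"total": int(len(original)), "failed": failed}
        out[col] = converted
    return out, results


df = pd.DataFrame({"InstallDate": ["2021-03-01", "not a date", None]})
converted, results = convert_dates(df, ["InstallDate"])
```

Values that were already missing are not counted as failures, so the `failed` count reflects only inputs that could not be parsed.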
count_rows_with_missing_geographical_info
data.system_data.count_rows_with_missing_geographical_info(df)
Count rows with missing geographical information.
Args:
  df: DataFrame to check
Returns: Number of rows with missing geographical information
filter_columns
data.system_data.filter_columns(df, columns_to_keep)
Filter DataFrame to keep only specified columns.
Args:
  df: DataFrame to filter
  columns_to_keep: List of columns to keep
Returns: Filtered DataFrame
get_unique_states
data.system_data.get_unique_states(df)
Get unique states in the dataset.
Args:
  df: DataFrame to check
Returns: Set of unique states
load_data
data.system_data.load_data(filepath)
Load data from a CSV file.
Args:
  filepath: Path to the CSV file
Returns: DataFrame containing the loaded data
Raises: FileNotFoundError: If the file doesn't exist
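Because a missing file raises `FileNotFoundError`, callers can handle that case explicitly. The sketch below uses a hypothetical stand-in, `load_csv`, that follows the documented contract. It assumes the loader is a thin wrapper around `pandas.read_csv`, which the reference does not confirm.

```python
from pathlib import Path

import pandas as pd


def load_csv(filepath) -> pd.DataFrame:
    """Sketch: load a CSV file, failing early with a clear message if it is absent."""
    path = Path(filepath)
    if not path.is_file():
        raise FileNotFoundError(f"Data file not found: {path}")
    return pd.read_csv(path)


try:
    load_csv("does_not_exist.csv")
    outcome = "loaded"
except FileNotFoundError:
    # Handle the documented failure mode instead of letting it propagate.
    outcome = "missing"
```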
print_validation_results
data.system_data.print_validation_results(validation_results)
Print date validation results in a table format.
Args:
  validation_results: Dictionary of validation results by column
process_solar_systems
data.system_data.process_solar_systems(systems_path, date_cols, systems_cols)
Process solar system data from a file.
Args:
  systems_path: Path to the systems data file
  date_cols: List of date columns to process
  systems_cols: List of columns to include
Returns: Processed DataFrame
systems_with_no_county
data.system_data.systems_with_no_county(df)
Identify systems with no county information.
Args:
  df: DataFrame to check
Returns: Boolean Series indicating rows with no county
systems_with_no_lat_lon
data.system_data.systems_with_no_lat_lon(df)
Identify systems with no latitude and longitude.
Args:
  df: DataFrame to check
Returns: Boolean Series indicating rows with no lat/lon
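Functions that return a Boolean Series, such as `systems_with_no_county` and `systems_with_no_lat_lon`, are typically used as masks to select or count rows. The sketch below builds comparable masks by hand. The column names `County`, `Latitude`, and `Longitude` are assumptions, as is treating a row as lacking coordinates when either value is missing.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "County": ["Travis", None, "Kings"],
        "Latitude": [30.27, None, None],
        "Longitude": [-97.74, None, -73.94],
    }
)

# Hypothetical equivalents of the two Boolean Series described above.
no_county = df["County"].isna()
no_lat_lon = df["Latitude"].isna() | df["Longitude"].isna()

rows_without_county = df[no_county]  # select the matching rows
n_without_coords = int(no_lat_lon.sum())  # True counts as 1 when summed
```

Masks of this kind can also be combined with `&` and `|` to express compound conditions, for example rows that lack both a county and coordinates.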