

# dataframe_utils

`utils.dataframe_utils`

Data quality diagnostics utilities.

This module provides functions for working with pandas DataFrames: attaching metadata, combining, comparing, summarizing, and validating DataFrames, and diagnosing data quality issues such as duplicates, missing values, and data type mismatches.

## Functions

| Name | Description |
|----|----|
| [attach_metadata](#nova_fde.utils.dataframe_utils.attach_metadata) | Attach metadata to a DataFrame. |
| [combine_dataframes](#nova_fde.utils.dataframe_utils.combine_dataframes) | Combine multiple DataFrames into a single DataFrame. |
| [compare_dataframes](#nova_fde.utils.dataframe_utils.compare_dataframes) | Compare two DataFrames and identify differences. |
| [create_summary_dataframe](#nova_fde.utils.dataframe_utils.create_summary_dataframe) | Create a summary DataFrame from a dictionary of DataFrames. |
| [diagnose_dataframe](#nova_fde.utils.dataframe_utils.diagnose_dataframe) | Diagnose issues in a DataFrame. |
| [extract_dataframes](#nova_fde.utils.dataframe_utils.extract_dataframes) | Extract DataFrames from various result types. |
| [get_attached_frames](#nova_fde.utils.dataframe_utils.get_attached_frames) | Get the DataFrames attached to a DataFrame. |
| [has_attached_frames](#nova_fde.utils.dataframe_utils.has_attached_frames) | Check whether a DataFrame has attached DataFrames. |
| [safe_to_dataframe](#nova_fde.utils.dataframe_utils.safe_to_dataframe) | Safely convert various input types to a DataFrame. |
| [validate_dataframe](#nova_fde.utils.dataframe_utils.validate_dataframe) | Validate a DataFrame against requirements. |

### attach_metadata

``` python
utils.dataframe_utils.attach_metadata(df, metadata)
```

Attach metadata to a DataFrame.

#### Parameters

| Name     | Type         | Description                      | Default    |
|----------|--------------|----------------------------------|------------|
| df       | pd.DataFrame | DataFrame to attach metadata to. | *required* |
| metadata | Dict         | Metadata dictionary to attach.   | *required* |

#### Returns

| Name | Type         | Description                       |
|------|--------------|-----------------------------------|
|      | pd.DataFrame | DataFrame with attached metadata. |
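This reference does not say where the metadata is stored. A minimal sketch, assuming it goes into pandas' built-in `DataFrame.attrs` dictionary (an assumption, not confirmed here), could look like this. pandas marks `attrs` as experimental, so metadata may not survive every operation.

``` python
from typing import Any, Dict

import pandas as pd


def attach_metadata(df: pd.DataFrame, metadata: Dict[str, Any]) -> pd.DataFrame:
    """Illustrative sketch, not the module's source: merge metadata into a copy's attrs."""
    result = df.copy()
    # Assumption: metadata lives in pandas' experimental DataFrame.attrs dict
    result.attrs.update(metadata)
    return result


df = pd.DataFrame({"id": [1, 2], "value": [10.0, 20.0]})
tagged = attach_metadata(df, {"source": "billing_export", "run_id": 42})
print(tagged.attrs)  # {'source': 'billing_export', 'run_id': 42}
```

Copying first leaves the caller's DataFrame unchanged; whether the real function copies or mutates in place is not documented.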

### combine_dataframes

``` python
utils.dataframe_utils.combine_dataframes(
    data_frames,
    join_column=None,
    how='left',
)
```

Combine multiple DataFrames into a single DataFrame.

#### Parameters

| Name | Type | Description | Default |
|----|----|----|----|
| data_frames | Dict\[str, pd.DataFrame\] | Dictionary of DataFrames to combine. | *required* |
| join_column | Optional\[str\] | Column to join on, by default None. If None, the function tries to find common columns to join on. | `None` |
| how | str | Join method, by default `'left'`. | `'left'` |

#### Returns

| Name | Type         | Description                                       |
|------|--------------|---------------------------------------------------|
|      | pd.DataFrame | Combined DataFrame or empty DataFrame if no data. |
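A minimal sketch of the described behavior, assuming the frames are merged pairwise from left to right with `pd.merge` and that, when `join_column` is None, each merge uses the columns the two frames share. Skipping empty frames and raising `ValueError` when no shared columns exist are choices of this sketch, not documented behavior.

``` python
from functools import reduce
from typing import Dict, Optional

import pandas as pd


def combine_dataframes(
    data_frames: Dict[str, pd.DataFrame],
    join_column: Optional[str] = None,
    how: str = "left",
) -> pd.DataFrame:
    """Illustrative sketch: merge frames left to right with pd.merge."""
    # Assumption: empty frames are skipped
    frames = [df for df in data_frames.values() if not df.empty]
    if not frames:
        return pd.DataFrame()  # nothing to combine

    def merge_pair(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
        if join_column is not None:
            on = [join_column]
        else:
            # Assumption: fall back to every column name the two frames share
            on = [c for c in left.columns if c in right.columns]
            if not on:
                raise ValueError("no common columns to join on")
        return pd.merge(left, right, on=on, how=how)

    return reduce(merge_pair, frames)


systems = pd.DataFrame({"system_id": [1, 2, 3], "name": ["a", "b", "c"]})
usage = pd.DataFrame({"system_id": [1, 3], "kwh": [5.0, 7.5]})
combined = combine_dataframes({"systems": systems, "usage": usage}, join_column="system_id")
print(combined)
```

With the default left join, every system is kept and systems without usage get `NaN` in `kwh`.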

### compare_dataframes

``` python
utils.dataframe_utils.compare_dataframes(
    df1,
    df2,
    key_column,
    compare_columns=None,
    console=None,
)
```

Compare two DataFrames and identify differences.

#### Parameters

| Name | Type | Description | Default |
|----|----|----|----|
| df1 | pd.DataFrame | First DataFrame to compare. | *required* |
| df2 | pd.DataFrame | Second DataFrame to compare. | *required* |
| key_column | str | Column used as the key for matching rows between the two DataFrames. | *required* |
| compare_columns | List\[str\] | Columns to compare. If None, all columns common to both DataFrames are compared. | `None` |
| console | Console | Rich Console instance for logging. If None, print statements are used. | `None` |

#### Returns

| Name | Type             | Description                        |
|------|------------------|------------------------------------|
|      | Dict\[str, Any\] | Dictionary with comparison results |
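A minimal sketch of one way to perform this comparison, using an outer `pd.merge` with `indicator=True` to separate unmatched keys from matched rows. The result keys (`only_in_df1`, `only_in_df2`, `mismatch_counts`) are hypothetical, since this reference does not document the dictionary's structure, and the `console` logging parameter is omitted.

``` python
from typing import Any, Dict, List, Optional

import pandas as pd


def compare_dataframes(
    df1: pd.DataFrame,
    df2: pd.DataFrame,
    key_column: str,
    compare_columns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Illustrative sketch; assumes key_column is unique within each frame."""
    if compare_columns is None:
        compare_columns = [c for c in df1.columns if c in df2.columns and c != key_column]
    merged = pd.merge(
        df1[[key_column] + compare_columns],
        df2[[key_column] + compare_columns],
        on=key_column,
        how="outer",
        suffixes=("_1", "_2"),
        indicator=True,  # adds a "_merge" column: left_only / right_only / both
    )
    both = merged[merged["_merge"] == "both"]
    mismatches = {}
    for col in compare_columns:
        left, right = both[f"{col}_1"], both[f"{col}_2"]
        # Count values that differ, treating two missing values as equal
        differs = (left != right) & ~(left.isna() & right.isna())
        mismatches[col] = int(differs.sum())
    return {
        "only_in_df1": merged.loc[merged["_merge"] == "left_only", key_column].tolist(),
        "only_in_df2": merged.loc[merged["_merge"] == "right_only", key_column].tolist(),
        "mismatch_counts": mismatches,
    }


a = pd.DataFrame({"id": [1, 2, 3], "price": [10, 20, 30]})
b = pd.DataFrame({"id": [2, 3, 4], "price": [20, 35, 40]})
diff = compare_dataframes(a, b, key_column="id")
print(diff)
```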

### create_summary_dataframe

``` python
utils.dataframe_utils.create_summary_dataframe(data_frames)
```

Create a summary DataFrame from a dictionary of DataFrames.

#### Parameters

| Name | Type | Description | Default |
|----|----|----|----|
| data_frames | Dict\[str, pd.DataFrame\] | Dictionary of DataFrames to summarize. | *required* |

#### Returns

| Name | Type         | Description                                            |
|------|--------------|--------------------------------------------------------|
|      | pd.DataFrame | Summary DataFrame with information about each dataset. |
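The reference does not list which statistics the summary contains. A minimal sketch, assuming one row per dataset with its row count, column count, and total number of missing cells (the column names `dataset`, `rows`, `columns`, and `missing_values` are hypothetical):

``` python
from typing import Dict

import pandas as pd


def create_summary_dataframe(data_frames: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Illustrative sketch: one summary row per named DataFrame."""
    rows = [
        {
            "dataset": name,
            "rows": len(df),
            "columns": df.shape[1],
            "missing_values": int(df.isna().sum().sum()),  # missing cells across all columns
        }
        for name, df in data_frames.items()
    ]
    # Passing columns= keeps the schema even when data_frames is empty
    return pd.DataFrame(rows, columns=["dataset", "rows", "columns", "missing_values"])


summary = create_summary_dataframe({
    "systems": pd.DataFrame({"id": [1, 2], "name": ["a", None]}),
    "usage": pd.DataFrame({"id": [1], "kwh": [3.2]}),
})
print(summary)
```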

### diagnose_dataframe

``` python
utils.dataframe_utils.diagnose_dataframe(
    df,
    key_columns=None,
    id_columns=None,
    console=None,
)
```

Diagnose issues in a DataFrame.

#### Parameters

| Name | Type | Description | Default |
|----|----|----|----|
| df | pd.DataFrame | DataFrame to diagnose | *required* |
| key_columns | List\[str\] | List of columns that should be unique keys | `None` |
| id_columns | List\[str\] | List of columns that should contain IDs (e.g., system IDs, account IDs) | `None` |
| console | Console | Rich Console instance for logging. If None, print statements are used. | `None` |

#### Returns

| Name | Type             | Description                      |
|------|------------------|----------------------------------|
|      | Dict\[str, Any\] | Dictionary of diagnostic results |
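A minimal sketch of the three kinds of checks the module description mentions (duplicates, missing values, and type problems). The result keys are hypothetical, and flagging ID columns stored as floats is an assumed heuristic for type mismatches: in pandas, a single missing value in an integer column converts it to `float64`. The `console` parameter is omitted.

``` python
from typing import Any, Dict, List, Optional

import pandas as pd


def diagnose_dataframe(
    df: pd.DataFrame,
    key_columns: Optional[List[str]] = None,
    id_columns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Illustrative sketch; result keys and checks are assumptions."""
    results: Dict[str, Any] = {
        "row_count": len(df),
        # Missing values per column, listing only columns that have any
        "missing_values": {c: int(n) for c, n in df.isna().sum().items() if n > 0},
    }
    if key_columns:
        # Rows whose key combination appears more than once (all copies counted)
        results["duplicate_keys"] = int(df.duplicated(subset=key_columns, keep=False).sum())
    if id_columns:
        # IDs stored as floats (e.g. 1001.0) usually mean a missing value forced a dtype change
        results["float_id_columns"] = [
            c for c in id_columns if pd.api.types.is_float_dtype(df[c])
        ]
    return results


df = pd.DataFrame({
    "account_id": [1001, 1002, None, 1002],
    "month": ["2024-01", "2024-01", "2024-02", "2024-01"],
    "amount": [10.0, 12.5, 3.0, 12.5],
})
report = diagnose_dataframe(df, key_columns=["account_id", "month"], id_columns=["account_id"])
print(report)
```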

### extract_dataframes

``` python
utils.dataframe_utils.extract_dataframes(result)
```

Extract DataFrames from various result types.

#### Parameters

| Name   | Type | Description                                  | Default    |
|--------|------|----------------------------------------------|------------|
| result | Any  | Result object that might contain DataFrames. | *required* |

#### Returns

| Name | Type                      | Description                         |
|------|---------------------------|-------------------------------------|
|      | Dict\[str, pd.DataFrame\] | Dictionary of extracted DataFrames. |
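The reference does not say which result types are recognized. A minimal sketch, assuming the function handles a bare DataFrame, dictionaries, lists or tuples, and arbitrary objects whose public attributes hold DataFrames; the generated key names (`data`, `frame_0`, ...) are hypothetical.

``` python
from dataclasses import dataclass
from typing import Any, Dict

import pandas as pd


def extract_dataframes(result: Any) -> Dict[str, pd.DataFrame]:
    """Illustrative sketch; the key naming scheme is an assumption."""
    if isinstance(result, pd.DataFrame):
        return {"data": result}
    if isinstance(result, dict):
        return {str(k): v for k, v in result.items() if isinstance(v, pd.DataFrame)}
    if isinstance(result, (list, tuple)):
        return {f"frame_{i}": v for i, v in enumerate(result) if isinstance(v, pd.DataFrame)}
    if hasattr(result, "__dict__"):
        # Fall back to public attributes of arbitrary result objects
        return {
            name: value
            for name, value in vars(result).items()
            if isinstance(value, pd.DataFrame) and not name.startswith("_")
        }
    return {}  # scalars, None, strings, ...


@dataclass
class QueryResult:  # hypothetical result type for the example
    systems: pd.DataFrame
    row_count: int


frames = extract_dataframes(QueryResult(pd.DataFrame({"id": [1, 2, 3]}), row_count=3))
print(list(frames))  # ['systems']
```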

### get_attached_frames

``` python
utils.dataframe_utils.get_attached_frames(df)
```

Get the DataFrames attached to a DataFrame.

#### Parameters

| Name | Type         | Description                            | Default    |
|------|--------------|----------------------------------------|------------|
| df   | pd.DataFrame | DataFrame to get attached frames from. | *required* |

#### Returns

| Name | Type | Description |
|----|----|----|
|  | Dict\[str, pd.DataFrame\] | Dictionary of attached frames or empty dict if none found. |
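Where attached frames are stored is not documented here. A minimal sketch, assuming they are kept as a dictionary under a hypothetical `"attached_frames"` key in pandas' experimental `DataFrame.attrs`:

``` python
from typing import Dict

import pandas as pd

# Hypothetical attrs key; the module's real storage location is not documented
ATTACHED_KEY = "attached_frames"


def get_attached_frames(df: pd.DataFrame) -> Dict[str, pd.DataFrame]:
    """Illustrative sketch: read attached frames from df.attrs, or return {}."""
    frames = df.attrs.get(ATTACHED_KEY)
    # Return a shallow copy so callers cannot mutate the stored dict by accident
    return dict(frames) if isinstance(frames, dict) else {}


main = pd.DataFrame({"id": [1, 2]})
main.attrs[ATTACHED_KEY] = {"errors": pd.DataFrame({"id": [2], "reason": ["missing meter"]})}
print(list(get_attached_frames(main)))      # ['errors']
print(get_attached_frames(pd.DataFrame()))  # {}
```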

### has_attached_frames

``` python
utils.dataframe_utils.has_attached_frames(df)
```

Check whether a DataFrame has attached DataFrames.

#### Parameters

| Name | Type         | Description         | Default    |
|------|--------------|---------------------|------------|
| df   | pd.DataFrame | DataFrame to check. | *required* |

#### Returns

| Name | Type | Description                                                 |
|------|------|-------------------------------------------------------------|
|      | bool | True if the DataFrame has attached frames, False otherwise. |
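A minimal sketch of this check, assuming attached frames live in a dictionary under a hypothetical `"attached_frames"` key in `DataFrame.attrs` (the real storage location is not documented in this reference):

``` python
import pandas as pd

# Hypothetical attrs key used only for this illustration
ATTACHED_KEY = "attached_frames"


def has_attached_frames(df: pd.DataFrame) -> bool:
    """Illustrative sketch: True only if attrs holds a dict with at least one DataFrame."""
    frames = df.attrs.get(ATTACHED_KEY)
    return isinstance(frames, dict) and any(
        isinstance(v, pd.DataFrame) for v in frames.values()
    )


report = pd.DataFrame({"id": [1, 2]})
report.attrs[ATTACHED_KEY] = {"rejected_rows": pd.DataFrame({"id": [3]})}
print(has_attached_frames(report))          # True
print(has_attached_frames(pd.DataFrame()))  # False
```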

### safe_to_dataframe

``` python
utils.dataframe_utils.safe_to_dataframe(data)
```

Safely convert various input types to a DataFrame.

#### Parameters

| Name | Type | Description                   | Default    |
|------|------|-------------------------------|------------|
| data | Any  | Data to convert to DataFrame. | *required* |

#### Returns

| Name | Type | Description |
|----|----|----|
|  | pd.DataFrame | Converted DataFrame or empty DataFrame if conversion fails. |
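A minimal sketch of a defensive conversion, assuming the function passes DataFrames through, converts Series with `to_frame()`, tries the `pd.DataFrame` constructor for everything else, and returns an empty DataFrame when the constructor raises `ValueError` or `TypeError`. Which inputs the real function accepts is not documented here.

``` python
from typing import Any

import pandas as pd


def safe_to_dataframe(data: Any) -> pd.DataFrame:
    """Illustrative sketch: convert common inputs, returning an empty frame on failure."""
    if data is None:
        return pd.DataFrame()
    if isinstance(data, pd.DataFrame):
        return data
    if isinstance(data, pd.Series):
        return data.to_frame()
    try:
        return pd.DataFrame(data)
    except (ValueError, TypeError):
        # e.g. scalars, or dicts of lists with different lengths
        return pd.DataFrame()


print(len(safe_to_dataframe([{"a": 1}, {"a": 2}])))       # 2
print(safe_to_dataframe({"a": [1, 2], "b": [3]}).empty)  # True
print(safe_to_dataframe(42).empty)                       # True
```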

### validate_dataframe

``` python
utils.dataframe_utils.validate_dataframe(
    df,
    required_columns=None,
    column_types=None,
    non_null_columns=None,
    unique_columns=None,
    console=None,
)
```

Validate a DataFrame against requirements.

#### Parameters

| Name | Type | Description | Default |
|----|----|----|----|
| df | pd.DataFrame | DataFrame to validate | *required* |
| required_columns | List\[str\] | List of required column names | `None` |
| column_types | Dict\[str, str\] | Dictionary mapping column names to required types | `None` |
| non_null_columns | List\[str\] | List of columns that should not contain null values | `None` |
| unique_columns | List\[str\] | List of columns that should have unique values | `None` |
| console | Console | Rich Console instance for logging. If None, print statements are used. | `None` |

#### Returns

| Name | Type             | Description                        |
|------|------------------|------------------------------------|
|      | Dict\[str, Any\] | Dictionary with validation results |
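A minimal sketch of these checks, assuming the expected types in `column_types` are pandas dtype names such as `"int64"` and that the result has hypothetical `valid` and `errors` keys. The `console` parameter is omitted.

``` python
from typing import Any, Dict, List, Optional

import pandas as pd


def validate_dataframe(
    df: pd.DataFrame,
    required_columns: Optional[List[str]] = None,
    column_types: Optional[Dict[str, str]] = None,
    non_null_columns: Optional[List[str]] = None,
    unique_columns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Illustrative sketch; the result structure is an assumption."""
    errors: List[str] = []
    for col in required_columns or []:
        if col not in df.columns:
            errors.append(f"missing required column: {col}")
    for col, expected in (column_types or {}).items():
        # Assumption: types are compared as pandas dtype names, e.g. "int64"
        if col in df.columns and str(df[col].dtype) != expected:
            errors.append(f"{col}: expected {expected}, got {df[col].dtype}")
    for col in non_null_columns or []:
        if col in df.columns and df[col].isna().any():
            errors.append(f"{col}: {int(df[col].isna().sum())} null value(s)")
    for col in unique_columns or []:
        if col in df.columns and df[col].duplicated().any():
            errors.append(f"{col}: contains duplicate values")
    return {"valid": not errors, "errors": errors}


df = pd.DataFrame({"id": [1, 2, 2], "email": ["a@x.io", None, "c@x.io"]})
report = validate_dataframe(
    df,
    required_columns=["id", "email", "created_at"],
    column_types={"id": "int64"},
    non_null_columns=["email"],
    unique_columns=["id"],
)
print(report["valid"])  # False
for err in report["errors"]:
    print(err)
```

Collecting every failure instead of stopping at the first one lets a single call report all problems with the input.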
