Treasury Analytics Core

dataframe_utils

utils.dataframe_utils

Data quality diagnostics utilities.

This module provides functions for diagnosing data quality issues in pandas DataFrames, including detecting duplicates, missing values, and data type mismatches.

Functions

| Name | Description |
| --- | --- |
| attach_metadata | Attach metadata to a DataFrame. |
| combine_dataframes | Combine multiple DataFrames into a single DataFrame. |
| compare_dataframes | Compare two DataFrames and identify differences. |
| create_summary_dataframe | Create a summary DataFrame from a dictionary of DataFrames. |
| diagnose_dataframe | Diagnose issues in a DataFrame. |
| extract_dataframes | Extract DataFrames from various result types. |
| get_attached_frames | Get attached data frames from a DataFrame. |
| has_attached_frames | Check if a DataFrame has attached data frames. |
| safe_to_dataframe | Safely convert various input types to a DataFrame. |
| validate_dataframe | Validate a DataFrame against requirements. |

attach_metadata

utils.dataframe_utils.attach_metadata(df, metadata)

Attach metadata to a DataFrame.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | pd.DataFrame | DataFrame to attach metadata to. | required |
| metadata | Dict | Metadata dictionary to attach. | required |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | pd.DataFrame | DataFrame with attached metadata. |
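
This page does not say where the metadata is stored. As a minimal sketch, assuming storage in pandas' own `DataFrame.attrs` dictionary (an experimental pandas feature), the behavior could look like the following. The name `attach_metadata_sketch` is hypothetical and is not part of this library.

```python
import pandas as pd


def attach_metadata_sketch(df: pd.DataFrame, metadata: dict) -> pd.DataFrame:
    """Hypothetical stand-in: keep metadata in DataFrame.attrs (an assumption)."""
    out = df.copy()  # leave the caller's frame untouched
    out.attrs.update(metadata)
    return out


trades = pd.DataFrame({"account_id": [1, 2], "amount": [100.0, 250.5]})
tagged = attach_metadata_sketch(trades, {"source": "ledger", "as_of": "2024-01-31"})
print(tagged.attrs)
```

Working on a copy keeps the input frame free of side effects; whether the real function copies or mutates in place is not stated here.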

combine_dataframes

utils.dataframe_utils.combine_dataframes(
    data_frames,
    join_column=None,
    how='left',
)

Combine multiple DataFrames into a single DataFrame.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data_frames | Dict[str, pd.DataFrame] | Dictionary of DataFrames to combine. | required |
| join_column | Optional[str] | Column to join on, by default None. If None, will try to find common columns. | None |
| how | str | Join method, by default 'left'. | 'left' |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | pd.DataFrame | Combined DataFrame or empty DataFrame if no data. |
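
To make the documented parameters concrete, here is a rough sketch in plain pandas of a join that falls back to shared columns when `join_column` is None. The function `combine_sketch` is hypothetical; the fallback rule (join on every column common to all frames) is an assumption, since this page only says the function "will try to find common columns".

```python
from functools import reduce
from typing import Dict, Optional

import pandas as pd


def combine_sketch(
    data_frames: Dict[str, pd.DataFrame],
    join_column: Optional[str] = None,
    how: str = "left",
) -> pd.DataFrame:
    """Hypothetical stand-in, not the library's implementation."""
    frames = [df for df in data_frames.values() if df is not None and not df.empty]
    if not frames:
        return pd.DataFrame()  # documented: empty DataFrame if no data
    if join_column is None:
        # Assumption: join on the columns present in every frame.
        common = set(frames[0].columns)
        for frame in frames[1:]:
            common &= set(frame.columns)
        if not common:
            return pd.DataFrame()
        on = sorted(common)
    else:
        on = [join_column]
    # Merge the frames pairwise, left to right, in dictionary order.
    return reduce(lambda left, right: left.merge(right, on=on, how=how), frames)


accounts = pd.DataFrame({"account_id": [1, 2, 3], "name": ["A", "B", "C"]})
balances = pd.DataFrame({"account_id": [1, 2], "balance": [10.0, 20.0]})
combined = combine_sketch({"accounts": accounts, "balances": balances})
print(combined)
```

With the default `how="left"`, the first frame in the dictionary determines which rows survive, so account 3 is kept with a missing balance.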

compare_dataframes

utils.dataframe_utils.compare_dataframes(
    df1,
    df2,
    key_column,
    compare_columns=None,
    console=None,
)

Compare two DataFrames and identify differences.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df1 | pd.DataFrame | First DataFrame | required |
| df2 | pd.DataFrame | Second DataFrame | required |
| key_column | str | Column to use as key for matching rows | required |
| compare_columns | List[str] | List of columns to compare (if None, compares all common columns) | None |
| console | Console | Rich Console instance for logging. If None, print statements are used. | None |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | Dict[str, Any] | Dictionary with comparison results |
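
The keys of the returned dictionary are not listed on this page. The sketch below shows one way a key-based comparison could be done in plain pandas, using an outer merge with `indicator=True`. The function `compare_sketch` and the result keys `only_in_df1`, `only_in_df2`, and `differences` are hypothetical.

```python
from typing import Any, Dict, List, Optional

import pandas as pd


def compare_sketch(
    df1: pd.DataFrame,
    df2: pd.DataFrame,
    key_column: str,
    compare_columns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in; result keys are assumptions."""
    if compare_columns is None:
        # Documented default: compare all common columns (the key excluded here).
        compare_columns = [c for c in df1.columns if c in df2.columns and c != key_column]
    merged = df1.merge(
        df2, on=key_column, how="outer", suffixes=("_1", "_2"), indicator=True
    )
    both = merged[merged["_merge"] == "both"]
    differences = {}
    for col in compare_columns:
        a, b = both[f"{col}_1"], both[f"{col}_2"]
        same = (a == b) | (a.isna() & b.isna())  # treat two nulls as equal
        differences[col] = both.loc[~same, key_column].tolist()
    return {
        "only_in_df1": merged.loc[merged["_merge"] == "left_only", key_column].tolist(),
        "only_in_df2": merged.loc[merged["_merge"] == "right_only", key_column].tolist(),
        "differences": differences,
    }


old = pd.DataFrame({"id": [1, 2, 3], "amount": [10, 20, 30]})
new = pd.DataFrame({"id": [2, 3, 4], "amount": [20, 35, 40]})
result = compare_sketch(old, new, key_column="id")
print(result)
```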

create_summary_dataframe

utils.dataframe_utils.create_summary_dataframe(data_frames)

Create a summary DataFrame from a dictionary of DataFrames.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data_frames | Dict[str, pd.DataFrame] | Dictionary of DataFrames to summarize. | required |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | pd.DataFrame | Summary DataFrame with information about each dataset. |
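
The columns of the summary are not specified here. As an illustration only, a summary with one row per dataset might be built as follows; `summarize_sketch` and the columns `dataset`, `rows`, `columns`, and `null_cells` are hypothetical choices.

```python
from typing import Dict

import pandas as pd


def summarize_sketch(data_frames: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Hypothetical stand-in; the summary columns are assumptions."""
    rows = [
        {
            "dataset": name,
            "rows": len(df),
            "columns": df.shape[1],
            "null_cells": int(df.isna().sum().sum()),
        }
        for name, df in data_frames.items()
    ]
    # Passing columns explicitly keeps the layout stable for an empty input.
    return pd.DataFrame(rows, columns=["dataset", "rows", "columns", "null_cells"])


frames = {
    "accounts": pd.DataFrame({"account_id": [1, 2, 3]}),
    "balances": pd.DataFrame({"account_id": [1, 2], "balance": [10.0, None]}),
}
summary = summarize_sketch(frames)
print(summary)
```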

diagnose_dataframe

utils.dataframe_utils.diagnose_dataframe(
    df,
    key_columns=None,
    id_columns=None,
    console=None,
)

Diagnose issues in a DataFrame.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | pd.DataFrame | DataFrame to diagnose | required |
| key_columns | List[str] | List of columns that should be unique keys | None |
| id_columns | List[str] | List of columns that should contain IDs (e.g., system IDs, account IDs) | None |
| console | Console | Rich Console instance for logging. If None, print statements are used. | None |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | Dict[str, Any] | Dictionary of diagnostic results |
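
The module description mentions duplicates, missing values, and data type mismatches. The following sketch shows how such checks could be expressed in plain pandas. The function `diagnose_sketch` and every key in its report are hypothetical; in particular, flagging float-typed ID columns is only one plausible reading of a "type mismatch".

```python
from typing import Any, Dict, List, Optional

import pandas as pd


def diagnose_sketch(
    df: pd.DataFrame,
    key_columns: Optional[List[str]] = None,
    id_columns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in; report keys are assumptions."""
    report: Dict[str, Any] = {
        "row_count": len(df),
        # Only columns that actually contain nulls are listed.
        "null_counts": {c: int(n) for c, n in df.isna().sum().items() if n > 0},
        "dtypes": df.dtypes.astype(str).to_dict(),
    }
    if key_columns:
        # keep=False counts every row that shares its key with another row.
        report["duplicate_key_rows"] = int(
            df.duplicated(subset=key_columns, keep=False).sum()
        )
    if id_columns:
        # Integer IDs read as floats usually mean nulls crept in upstream.
        report["float_typed_ids"] = [
            c for c in id_columns if pd.api.types.is_float_dtype(df[c])
        ]
    return report


ledger = pd.DataFrame(
    {
        "account_id": [1, 1, 2],
        "system_id": [10.0, None, 30.0],
        "amount": [5.0, 5.0, None],
    }
)
report = diagnose_sketch(ledger, key_columns=["account_id"], id_columns=["system_id"])
print(report)
```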

extract_dataframes

utils.dataframe_utils.extract_dataframes(result)

Extract DataFrames from various result types.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| result | Any | Result object that might contain DataFrames. | required |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | Dict[str, pd.DataFrame] | Dictionary of extracted DataFrames. |
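
Which result types are supported is not listed on this page. As a sketch under that uncertainty, an extractor could dispatch on a few common container types as shown below. The function `extract_sketch` and the generated keys (`result`, `frame_0`, and so on) are hypothetical.

```python
from typing import Any, Dict

import pandas as pd


def extract_sketch(result: Any) -> Dict[str, pd.DataFrame]:
    """Hypothetical stand-in; supported types and key names are assumptions."""
    if isinstance(result, pd.DataFrame):
        return {"result": result}
    if isinstance(result, dict):
        # Keep only the values that are DataFrames.
        return {str(k): v for k, v in result.items() if isinstance(v, pd.DataFrame)}
    if isinstance(result, (list, tuple)):
        return {
            f"frame_{i}": v
            for i, v in enumerate(result)
            if isinstance(v, pd.DataFrame)
        }
    return {}  # nothing recognizable as a DataFrame


frame = pd.DataFrame({"x": [1, 2]})
from_dict = extract_sketch({"prices": frame, "note": "not a frame"})
from_list = extract_sketch([frame, 42])
print(list(from_dict), list(from_list))
```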

get_attached_frames

utils.dataframe_utils.get_attached_frames(df)

Get attached data frames from a DataFrame.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | pd.DataFrame | DataFrame to get attached frames from. | required |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | Dict[str, pd.DataFrame] | Dictionary of attached frames or empty dict if none found. |

has_attached_frames

utils.dataframe_utils.has_attached_frames(df)

Check if a DataFrame has attached data frames.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | pd.DataFrame | DataFrame to check. | required |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | bool | True if the DataFrame has attached frames, False otherwise. |
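
Neither this entry nor get_attached_frames says where attached frames live. Purely as an illustration, the pair of lookups could work as below if the frames were kept in `DataFrame.attrs` under a fixed key. The key `attached_frames` and both function names are hypothetical.

```python
from typing import Dict

import pandas as pd

ATTACHED_KEY = "attached_frames"  # hypothetical storage key, not from the library


def get_attached_sketch(df: pd.DataFrame) -> Dict[str, pd.DataFrame]:
    """Hypothetical stand-in: read attached frames from df.attrs."""
    frames = df.attrs.get(ATTACHED_KEY, {})
    if not isinstance(frames, dict):
        return {}
    return {k: v for k, v in frames.items() if isinstance(v, pd.DataFrame)}


def has_attached_sketch(df: pd.DataFrame) -> bool:
    """True when at least one attached frame is found."""
    return bool(get_attached_sketch(df))


positions = pd.DataFrame({"account_id": [1, 2], "position": [100, 200]})
fx_rates = pd.DataFrame({"currency": ["EUR"], "rate": [1.08]})
positions.attrs[ATTACHED_KEY] = {"fx_rates": fx_rates}

plain = pd.DataFrame({"a": [1]})
print(has_attached_sketch(positions), has_attached_sketch(plain))
```

Defining the boolean check in terms of the getter keeps the two functions from disagreeing about what counts as an attached frame.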

safe_to_dataframe

utils.dataframe_utils.safe_to_dataframe(data)

Safely convert various input types to a DataFrame.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | Any | Data to convert to DataFrame. | required |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | pd.DataFrame | Converted DataFrame or empty DataFrame if conversion fails. |
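
The documented contract (return an empty DataFrame when conversion fails) can be sketched with a guarded call to the `pd.DataFrame` constructor. The function `safe_to_dataframe_sketch` is hypothetical, and the set of exceptions it catches is an assumption.

```python
from typing import Any

import pandas as pd


def safe_to_dataframe_sketch(data: Any) -> pd.DataFrame:
    """Hypothetical stand-in: never raise, fall back to an empty frame."""
    if isinstance(data, pd.DataFrame):
        return data
    if data is None:
        return pd.DataFrame()
    try:
        return pd.DataFrame(data)
    except (ValueError, TypeError):
        # e.g. a bare scalar, or columns of unequal length
        return pd.DataFrame()


records = [{"account_id": 1, "amount": 9.5}, {"account_id": 2, "amount": 3.0}]
converted = safe_to_dataframe_sketch(records)
fallback = safe_to_dataframe_sketch(42)
print(converted.shape, fallback.empty)
```

Because a failed conversion and a genuinely empty input both yield an empty frame, callers that need to tell them apart would have to check the input themselves.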

validate_dataframe

utils.dataframe_utils.validate_dataframe(
    df,
    required_columns=None,
    column_types=None,
    non_null_columns=None,
    unique_columns=None,
    console=None,
)

Validate a DataFrame against requirements.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | pd.DataFrame | DataFrame to validate | required |
| required_columns | List[str] | List of required column names | None |
| column_types | Dict[str, str] | Dictionary mapping column names to required types | None |
| non_null_columns | List[str] | List of columns that should not contain null values | None |
| unique_columns | List[str] | List of columns that should have unique values | None |
| console | Console | Rich Console instance for logging. If None, print statements are used. | None |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | Dict[str, Any] | Dictionary with validation results |
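
The four optional checks map naturally onto simple pandas operations. The sketch below is one possible reading, not the library's code: `validate_sketch` is hypothetical, the result keys `valid` and `errors` are assumptions, and comparing `column_types` values against dtype strings such as `"float64"` is a guess at what "required types" means.

```python
from typing import Any, Dict, List, Optional

import pandas as pd


def validate_sketch(
    df: pd.DataFrame,
    required_columns: Optional[List[str]] = None,
    column_types: Optional[Dict[str, str]] = None,
    non_null_columns: Optional[List[str]] = None,
    unique_columns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in; result keys are assumptions."""
    errors: List[str] = []
    for col in required_columns or []:
        if col not in df.columns:
            errors.append(f"missing column: {col}")
    for col, expected in (column_types or {}).items():
        # Assumption: types are given as pandas dtype names.
        if col in df.columns and str(df[col].dtype) != expected:
            errors.append(f"{col}: expected {expected}, got {df[col].dtype}")
    for col in non_null_columns or []:
        if col in df.columns and df[col].isna().any():
            errors.append(f"{col}: contains nulls")
    for col in unique_columns or []:
        if col in df.columns and df[col].duplicated().any():
            errors.append(f"{col}: contains duplicates")
    return {"valid": not errors, "errors": errors}


payments = pd.DataFrame({"account_id": [1, 2, 2], "amount": [1.5, None, 3.0]})
outcome = validate_sketch(
    payments,
    required_columns=["account_id", "currency"],
    column_types={"amount": "float64"},
    non_null_columns=["amount"],
    unique_columns=["account_id"],
)
print(outcome)
```

Collecting every failure before returning, rather than stopping at the first, gives the caller a complete picture in one pass.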