unravel.cluster_stats.table module#
Use cstats_table
(ct
) from UNRAVEL to summarize volumes of the top x regions and collapsing them into parent regions until a criterion is met.
- Prereqs:
This command is usually run via
cstats_summary
.
- Inputs:
CSVs with sunburst data for each cluster (e.g., cluster_`*`_sunburst.csv).
*`cluster_info.txt in the parent dir (made by ``cstats_fdr` and copied by
cstats_org_data
).
- Outputs:
A color-coded xlsx table summarizing the top regions and their volumes for each cluster.
A hierarchically sorted CSV with regional volumes for each cluster.
Sorting by hierarchy and volume:#
- Group by Depth: Starting from the earliest depth column, for each depth level:
Sum the volumes of all rows sharing the same region (or combination of regions up to that depth).
Sort these groups by their aggregate volume in descending order, ensuring larger groups are prioritized.
- Sort Within Groups: Within each group created in step 1:
Sort the rows by their individual volume in descending order.
- Maintain Grouping Order:
As we move to deeper depth levels, maintain the grouping and ordering established in previous steps (only adjusting the order within groups based on the new depth’s aggregate volumes).
Note
CCFv3-2020_info.csv is in UNRAVEL/unravel/core/csvs/
It has columns: structure_id_path,very_general_region,collapsed_region_name,abbreviation,collapsed_region,other_abbreviation,other_abbreviation_defined,layer,sunburst
Alternatively, use CCFv3-2017_info.csv or provide a custom CSV with the same columns.
Usage:#
cstats_table [-vcd <val_clusters_dir>] [-t <number of top regions>] [-pv <perecent volume criterion>] [-csv CCFv3-2020_info.csv] [-rgb sunburst_RGBs.csv] [-v]
- unravel.cluster_stats.table.sort_sunburst_hierarchy(df)[source]#
Sort the DataFrame by hierarchy and volume.
- unravel.cluster_stats.table.can_collapse(df, depth_col)[source]#
Determine if regions in the specified depth column can be collapsed. Returns a DataFrame with regions that can be collapsed based on volume and count criteria.
- unravel.cluster_stats.table.calculate_top_regions(df, top_n, percent_vol_threshold, verbose=False)[source]#
Identify the top regions based on the dynamically collapsed hierarchy, ensuring they meet the specified percentage volume criterion.
- Parameters:
df_collapsed – DataFrame with the hierarchy collapsed where meaningful.
top_n – The number of top regions to identify.
percent_vol_threshold – The minimum percentage of total volume these regions should represent.
- Returns:
DataFrame with the top regions and their volumes if the criterion is met; otherwise, None.