unravel.cluster_stats.cstats module

Use cstats from UNRAVEL to validate clusters based on differences in cell/object or label density, using t-tests (two groups) or Tukey’s tests (more than two groups).

Input files:
  • *_density_data.csv from cstats_validation (e.g., in each subdir named after the rev_cluster_index.nii.gz file)

Outputs:
  • ./_valid_clusters_stats/

Note

  • Organize data in directories for each comparison (e.g., psilocybin > saline, etc.)

  • This script will loop through all directories in the current working dir and process the data in each subdir.

  • Each subdir should contain .csv files with the density data for each cluster.

  • The first 2 groups reflect the main comparison for validation rates.

  • Clusters are not considered valid if the effect direction does not match the expected direction.

CSV naming conventions:
  • Condition: first word before ‘_’ in the file name

  • Side: last word before .csv (LH or RH)
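The naming convention above can be illustrated with a small parser. This is a minimal sketch, not the module's own code: the helper name `parse_density_csv_name` is hypothetical, and it assumes the side token is separated from the rest of the name by an underscore, as in the bilateral examples.

```python
from pathlib import Path


def parse_density_csv_name(filename):
    """Return (condition, side) parsed from a density CSV file name.

    Hypothetical helper for illustration only.
    condition: the text before the first underscore.
    side: 'LH' or 'RH' if the name ends with that token, otherwise None.
    """
    stem = Path(filename).stem  # drop the .csv extension
    parts = stem.split("_")
    condition = parts[0]
    side = parts[-1] if parts[-1] in ("LH", "RH") else None
    return condition, side


bilateral = parse_density_csv_name("condition1_sample01_cell_density_data_LH.csv")
unilateral = parse_density_csv_name("condition2_sample03_label_density_data.csv")
print(bilateral, unilateral)
```

A file without an `_LH` or `_RH` suffix yields `None` for the side, which corresponds to the unilateral case.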

Example unilateral inputs in the subdirs:
  • condition1_sample01_<cell|label>_density_data.csv

  • condition1_sample02_<cell|label>_density_data.csv

  • condition2_sample03_<cell|label>_density_data.csv

  • condition2_sample04_<cell|label>_density_data.csv

Example bilateral inputs (if any file has _LH.csv or _RH.csv, the command will attempt to pool data):
  • condition1_sample01_<cell|label>_density_data_LH.csv

  • condition1_sample01_<cell|label>_density_data_RH.csv

Examples

  • Grouping data by condition prefixes:

    cstats --groups psilocybin saline --condition_prefixes saline psilocybin

      • This treats all ‘psilocybin*’ conditions as one group and all ‘saline*’ conditions as another.

      • Because this effectively leaves two conditions, they are compared using a t-test.

Columns in the .csv files:

sample, cluster_ID, <cell_count|label_volume>, cluster_volume, <cell_density|label_density>, …

Usage for t-tests:

cstats --groups <group1> <group2> -hg <group1|group2> [-cp <condition_prefixes>] [-alt <two-sided|less|greater>] [-pvt <p_value_threshold.txt>] [-v]

Usage for Tukey’s tests:

cstats --groups <group1> <group2> <group3> <group4> … -hg <group1|group2> [-cp <condition_prefixes>] [-alt <two-sided|less|greater>] [-pvt <p_value_threshold.txt>] [-v]

unravel.cluster_stats.cstats.parse_args()

unravel.cluster_stats.cstats.condition_selector(df, condition, unique_conditions, condition_column='Conditions')

Create a condition selector to handle pooling of data in a DataFrame based on specified conditions. This function checks if the ‘condition’ is exactly present in the ‘Conditions’ column or is a prefix of any condition in this column. If the exact condition is found, it selects those rows. If the condition is a prefix (e.g., ‘saline’ matches ‘saline-1’, ‘saline-2’), it selects all rows where the ‘Conditions’ column starts with this prefix. An error is raised if the condition is neither found as an exact match nor as a prefix.

Parameters:
  • df (pd.DataFrame) – DataFrame whose ‘Conditions’ column contains the conditions of interest.

  • condition (str) – The condition or prefix of interest.

  • unique_conditions (list) – List of unique conditions in the ‘Conditions’ column to validate against.

Returns:

A boolean Series to select rows based on the condition.

Return type:

pd.Series
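The exact-match-then-prefix logic described for condition_selector can be sketched as follows. This is an illustrative reimplementation under assumptions, not the module's source: the function name `select_condition` is hypothetical, and the use of `ValueError` is a guess, since the documentation only says that an error is raised.

```python
import pandas as pd


def select_condition(df, condition, unique_conditions, condition_column="Conditions"):
    """Build a boolean mask: exact match first, then prefix match, else error."""
    if condition in unique_conditions:
        # Exact match: select only rows with this exact condition
        return df[condition_column] == condition
    if any(str(c).startswith(condition) for c in unique_conditions):
        # Prefix match: e.g. 'saline' selects 'saline-1' and 'saline-2'
        return df[condition_column].str.startswith(condition)
    # Assumption: the error type used by the real function is not documented
    raise ValueError(f"Condition {condition!r} not found as an exact match or prefix.")


df = pd.DataFrame(
    {"Conditions": ["saline-1", "saline-2", "psilocybin"], "density": [1.0, 2.0, 3.0]}
)
unique = df["Conditions"].unique().tolist()
mask = select_condition(df, "saline", unique)
print(df[mask])
```

Checking for an exact match before the prefix match keeps a condition such as ‘saline’ from silently absorbing ‘saline-1’ when ‘saline’ itself is present in the data.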

unravel.cluster_stats.cstats.cluster_validation_data_df(density_col, has_hemisphere, csv_files, groups, data_col, data_col_pooled, condition_prefixes=None)

Aggregate the data from all .csv files, pool bilateral data if hemispheres are present, optionally pool data by condition, and return the DataFrame.

Parameters:
  • density_col – the column name for the density data

  • has_hemisphere – whether the data files contain hemisphere indicators (e.g., _LH.csv or _RH.csv)

  • csv_files – a list of .csv files

  • groups – a list of group names

  • data_col – the column name for the data (cell_count or label_volume)

  • data_col_pooled – the column name for the pooled data

  • condition_prefixes – optional condition prefixes used to pool data by condition (default: None)

Returns:

the DataFrame containing the cluster data
  • Columns: ‘condition’, ‘sample’, ‘cluster_ID’, ‘cell_count’, ‘cluster_volume’, ‘cell_density’

Return type:

  • data_df (pd.DataFrame)
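One plausible way to pool bilateral data is sketched below. The pooling rule is an assumption, since the documentation does not state it: counts and cluster volumes are summed across hemispheres for each sample and cluster, and density is then recomputed as count divided by volume. The example values are made up for illustration.

```python
import pandas as pd

# Hypothetical left- and right-hemisphere rows for one sample and two clusters
lh = pd.DataFrame({
    "condition": "saline", "sample": "sample01", "cluster_ID": [1, 2],
    "cell_count": [10, 4], "cluster_volume": [2.0, 1.0],
})
rh = pd.DataFrame({
    "condition": "saline", "sample": "sample01", "cluster_ID": [1, 2],
    "cell_count": [30, 6], "cluster_volume": [2.0, 1.0],
})

# Stack both hemispheres, then sum counts and volumes per sample and cluster
both = pd.concat([lh, rh], ignore_index=True)
pooled = both.groupby(
    ["condition", "sample", "cluster_ID"], as_index=False
)[["cell_count", "cluster_volume"]].sum()

# Assumed rule: recompute density from the pooled totals
pooled["cell_density"] = pooled["cell_count"] / pooled["cluster_volume"]
print(pooled)
```

Recomputing density from pooled totals weights each hemisphere by its volume, which differs from averaging the two per-hemisphere densities when the volumes are unequal.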

unravel.cluster_stats.cstats.valid_clusters_t_test(df, group1, group2, density_col, alternative='two-sided')

Perform unpaired t-tests for each cluster in the DataFrame and return the results as a DataFrame.

Parameters:
  • df – the DataFrame containing the cluster data. Columns: ‘condition’, ‘sample’, ‘cluster_ID’, ‘cell_count’, ‘cluster_volume’, ‘cell_density’

  • group1 – the name of the first group

  • group2 – the name of the second group

  • density_col – the column name for the density data

  • alternative – the alternative hypothesis (‘two-sided’, ‘less’, or ‘greater’)

Returns:

the DataFrame containing the t-test results
  • Columns: ‘cluster_ID’, ‘comparison’, ‘higher_mean_group’, ‘p-value’, ‘significance’

Return type:

  • stats_df (pd.DataFrame)
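A per-cluster unpaired t-test of this kind can be sketched with `scipy.stats.ttest_ind`. This is an illustration under assumptions, not the module's implementation: the function name `t_test_per_cluster` is hypothetical, SciPy's default equal-variance test is used because the documentation does not say which variant applies, and the ‘significance’ column is omitted because its labels and thresholds are not documented here.

```python
import pandas as pd
from scipy.stats import ttest_ind


def t_test_per_cluster(df, group1, group2, density_col, alternative="two-sided"):
    """Run an unpaired t-test on density_col for each cluster_ID."""
    rows = []
    for cluster_id, cluster_df in df.groupby("cluster_ID"):
        a = cluster_df.loc[cluster_df["condition"] == group1, density_col]
        b = cluster_df.loc[cluster_df["condition"] == group2, density_col]
        # Assumption: equal variances (SciPy's default Student's t-test)
        _, p_value = ttest_ind(a, b, alternative=alternative)
        rows.append({
            "cluster_ID": cluster_id,
            "comparison": f"{group1} vs {group2}",
            "higher_mean_group": group1 if a.mean() > b.mean() else group2,
            "p-value": p_value,
        })
    return pd.DataFrame(rows)


# Made-up densities for a single cluster
data = pd.DataFrame({
    "condition": ["psilocybin"] * 3 + ["saline"] * 3,
    "sample": [f"sample{i:02d}" for i in range(1, 7)],
    "cluster_ID": [1] * 6,
    "cell_density": [10.0, 11.0, 12.0, 1.0, 2.0, 3.0],
})
stats_df = t_test_per_cluster(data, "psilocybin", "saline", "cell_density")
print(stats_df)
```

With `alternative='greater'`, the test asks whether the first group's mean exceeds the second's, which is one way to encode an expected effect direction.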

unravel.cluster_stats.cstats.perform_tukey_test(df, groups, density_col)

Perform Tukey’s HSD test for each cluster in the DataFrame and return the results as a DataFrame.

Parameters:
  • df – the DataFrame containing the cluster data. Columns: ‘condition’, ‘sample’, ‘cluster_ID’, ‘cell_count’, ‘cluster_volume’, ‘cell_density’

  • groups – a list of group names

  • density_col – the column name for the density data

Returns:

the DataFrame containing the Tukey’s HSD test results
  • Columns: ‘cluster_ID’, ‘comparison’, ‘higher_mean_group’, ‘p-value’, ‘significance’

Return type:

  • stats_df (pd.DataFrame)
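A per-cluster Tukey’s HSD comparison can be sketched with `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later). This is only an illustration: the function name `tukey_per_cluster` is hypothetical, the module may well use a different library, and the ‘significance’ column is again left out because its labels are not documented here.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import tukey_hsd


def tukey_per_cluster(df, groups, density_col):
    """Run Tukey's HSD across groups for each cluster_ID."""
    rows = []
    for cluster_id, cluster_df in df.groupby("cluster_ID"):
        samples = [
            cluster_df.loc[cluster_df["condition"] == g, density_col].to_numpy()
            for g in groups
        ]
        result = tukey_hsd(*samples)  # result.pvalue is a groups x groups matrix
        for i, j in combinations(range(len(groups)), 2):
            higher = groups[i] if samples[i].mean() > samples[j].mean() else groups[j]
            rows.append({
                "cluster_ID": cluster_id,
                "comparison": f"{groups[i]} vs {groups[j]}",
                "higher_mean_group": higher,
                "p-value": result.pvalue[i, j],
            })
    return pd.DataFrame(rows)


# Made-up densities for one cluster and three groups
data = pd.DataFrame({
    "condition": ["a"] * 3 + ["b"] * 3 + ["c"] * 3,
    "cluster_ID": [1] * 9,
    "cell_density": [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 20.0, 21.0, 22.0],
})
tukey_df = tukey_per_cluster(data, ["a", "b", "c"], "cell_density")
print(tukey_df)
```

Each cluster contributes one row per pair of groups, so three groups yield three comparisons per cluster.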

unravel.cluster_stats.cstats.main()