a2ihelper

a2ihelper is a Python library for downstream analysis of A-to-I editing. It was built under Python 3.10.13 to statistically analyze REDItools2 output files. Both frequency and proportion data can be analyzed. The principal statistical tests used in 68 papers are included in a2iHelperPy.

Requirements

numpy (version >= 1.26.4)

pandas (version >= 2.2.0)

scipy (version >= 1.12.0)

scikit-posthocs (version >= 0.9.0)

Optional, for plotting:

matplotlib (version >= 3.8.3)

seaborn (version >= 0.13.2)

Optional, for editing detection:

reditools (version = 2.0)

pysam (version >= 0.22.0)

sortedcontainers (version >= 2.4.0)

psutil (version >= 5.9.8)

netifaces (version >= 0.11.0)
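To confirm that an environment meets the minimum versions listed above, a check along the following lines may help. This is a minimal sketch using only the standard library; MINIMUM_VERSIONS, version_tuple, and find_unmet are illustrative names, not part of a2ihelper, and the optional packages can be added to the dictionary in the same way.

```python
import re
from importlib import metadata

# Minimum versions of the core requirements, copied from the list above.
MINIMUM_VERSIONS = {
    'numpy': '1.26.4',
    'pandas': '2.2.0',
    'scipy': '1.12.0',
    'scikit-posthocs': '0.9.0',
}


def version_tuple(version):
    """Turn '1.26.4' into (1, 26, 4); only the first three numeric parts are kept."""
    parts = [int(p) for p in re.findall(r'\d+', version)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def find_unmet(minimums, lookup=metadata.version):
    """Return {package: installed version or None} for every unmet requirement."""
    unmet = {}
    for name, minimum in minimums.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            unmet[name] = None  # package is not installed at all
            continue
        if version_tuple(installed) < version_tuple(minimum):
            unmet[name] = installed
    return unmet


if __name__ == '__main__':
    for package, found in find_unmet(MINIMUM_VERSIONS).items():
        print(f'{package}: found {found}, need >= {MINIMUM_VERSIONS[package]}')
```

An empty result from find_unmet(MINIMUM_VERSIONS) means every core requirement is installed at or above its minimum version.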

Installing

We recommend using pip to install the package:

pip install a2ihelper

You can also clone the repository with git and install it via setup.py.

Quickstart

You need:

Mapped BAM files.
GTF annotation files (the same version used for mapping).
List of genes or coordinates of interest.

Get coordinates from a list of genes:

Provide the list of genes whose coordinates you want and the path to the GTF annotation file (set gzip_file=True if the file is gzipped). It is important to use the same annotation file that was used for mapping.

import a2ihelper as a2i  # assumed import; the examples below refer to the package as a2i

genes = ['Notch1','B2m','Hdc','Il1b']

path_ref_annotation='~/../reference_files/gencode.annotation.gtf'

gene_coord = a2i.call_reditools2.get_genes_positions(genes, path_ref_annotation, gzip_file=False)
gene_coord

{'Notch1': 'chr2:26457903-26516663', 'B2m': 'chr2:122147686-122153083',
'Hdc': 'chr2:126593667-126619299', 'Il1b': 'chr2:129364570-129371139'}
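For orientation, this kind of lookup can be pictured as a scan over the "gene" records of the GTF file. The sketch below is not the library's implementation: gene_coordinates_from_gtf_lines and open_gtf are hypothetical helpers, and it assumes a GENCODE-style GTF in which each gene record carries a gene_name attribute.

```python
import gzip
import re


def open_gtf(path, gzip_file=False):
    """Open a plain or gzipped GTF file in text mode."""
    return gzip.open(path, 'rt') if gzip_file else open(path)


def gene_coordinates_from_gtf_lines(lines, genes):
    """Map each requested gene name to a 'chrom:start-end' string."""
    wanted = set(genes)
    coords = {}
    for line in lines:
        if line.startswith('#'):  # skip header/comment lines
            continue
        fields = line.rstrip('\n').split('\t')
        # GTF columns: seqname, source, feature, start, end, score,
        # strand, frame, attributes
        if len(fields) < 9 or fields[2] != 'gene':
            continue
        match = re.search(r'gene_name "([^"]+)"', fields[8])
        if match and match.group(1) in wanted:
            coords[match.group(1)] = f'{fields[0]}:{fields[3]}-{fields[4]}'
    return coords
```

Genes that are missing from the annotation are simply absent from the returned dictionary, so comparing its keys with the input list is a quick way to spot misspelled gene names.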

Running REDItools2 from a2ihelper

import os

genes_positions = gene_coord.values()  # coordinates from the gene_coord dictionary
bam_dir = '/mnt/d/rna_editing/bams/'  # directory with the mapped BAM files
in_bam_file_list = [bam_dir + f for f in os.listdir(bam_dir) if f.endswith('.bam')]  # list of BAM files
path_out_res = '/mnt/d/rna_editing/a2i_helper_output/res/'  # directory for output files
ref_genome_file = '/mnt/d/reference_files/mus_musculus/GRCm39.genome.fa'  # reference genome
path_reditools = '/mnt/d/reditools2.0/src/cineca/'  # directory where reditools.py is installed
reditools_options = '--strict'  # optional arguments, separated by single spaces

# Run REDItools2 once per BAM file, over every gene region
for in_bam_file in in_bam_file_list:
    a2i.call_reditools2.run_per_gene_position_list(genes_positions, in_bam_file, path_out_res,
                                                   ref_genome_file, path_reditools,
                                                   reditools_options=reditools_options, n_jobs=10)

This generates the RES files in the path_out_res directory.

Next, prepare your metadata file. Its columns must be, in order: the path to the RES file, the sample name, the region, the condition, and the coordinates.
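A metadata table with this column order could be assembled as follows. This is a sketch under stated assumptions: only the column name region is confirmed by the examples in this guide (meta.region); the other column names, the CSV layout with a header row, and the RES file names are hypothetical placeholders.

```python
import csv
import io

# Column order follows the description above. Only 'region' is a confirmed
# column name; the other names are assumptions made for illustration.
COLUMNS = ['path', 'sample', 'region', 'condition', 'coordinates']

# Hypothetical RES file names; use the files written to path_out_res.
rows = [
    ('/mnt/d/rna_editing/a2i_helper_output/res/control_1_B2m.res',
     'control_1', 'B2m', 'control', 'chr2:122147686-122153083'),
    ('/mnt/d/rna_editing/a2i_helper_output/res/treated_1_B2m.res',
     'treated_1', 'B2m', 'treated', 'chr2:122147686-122153083'),
]


def metadata_to_csv(rows):
    """Serialize metadata rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = metadata_to_csv(rows)
```

The resulting text can be saved to disk and loaded with pandas.read_csv to obtain a DataFrame such as the meta object used in the next steps.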

Analyzing ONE region

df, df_a, df_g = a2i.editing.merge_files_one_region(meta[meta.region=='B2m'])

The function merge_files_one_region() returns three DataFrames: the first holds the editing frequency, the second the adenine counts, and the third the guanine (inosine) counts.
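The relationship between the three tables can be illustrated with the usual definition of editing frequency, G / (A + G). Whether a2ihelper computes the frequency in exactly this way is an assumption; editing_frequency is a hypothetical helper, not a library function.

```python
def editing_frequency(a_count, g_count):
    """Fraction of reads showing G (inosine) at a position; None if uncovered."""
    total = a_count + g_count
    if total == 0:
        return None  # no coverage, so the frequency is undefined
    return g_count / total


# Counts for one position across three samples (made-up numbers).
a_counts = {'sample_1': 90, 'sample_2': 45, 'sample_3': 0}
g_counts = {'sample_1': 10, 'sample_2': 5, 'sample_3': 0}

frequencies = {
    sample: editing_frequency(a_counts[sample], g_counts[sample])
    for sample in a_counts
}
```

Keeping the raw A and G counts alongside the frequency matters because the proportion tests later in this guide work on counts, not on ratios.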

Analyzing ALL regions

df, df_a, df_g, region_list = a2i.editing.merge_files_all_regions(meta)

The function merge_files_all_regions() returns three DataFrames and one list with the sequence of gene counts. The first DataFrame holds the editing frequency, the second the adenine counts, and the third the guanine (inosine) counts.

Statistics for FREQUENCY

Mann-Whitney U test

df_pv = a2i.editing.mannwhitney_test(df,
                                     only_pvalue=True,
                                     pvalue_filter_limit=0.05,
                                     fdr_correction=True,
                                     fdr_filter_limit=0.05,
                                     return_only_significant=True)
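The interplay between the raw p-value limit and the FDR limit can be illustrated with the Benjamini-Hochberg procedure. The guide does not state which correction fdr_correction applies, so treating it as Benjamini-Hochberg is an assumption, and benjamini_hochberg is a hypothetical helper written for this sketch.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values, one per position
adjusted = benjamini_hochberg(raw)

# Keep positions that pass both limits, mirroring the two filter arguments.
significant = [
    i for i, (p, q) in enumerate(zip(raw, adjusted)) if p < 0.05 and q < 0.05
]
```

Because the adjusted value is never smaller than the raw one, the FDR limit is the stricter of the two filters when both are set to 0.05.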

ANOVA Tukey

df_pv = a2i.editing.anova_tukey_test(df,
                                     only_pvalue=True,
                                     pvalue_filter_limit_anova=0.05,
                                     pvalue_filter_limit_tukey=0.05,
                                     return_only_significant=True)

Kruskal Dunn

df_pv = a2i.editing.kruskal_dunn_test(df,
                                      only_pvalue=True,
                                      pvalue_filter_limit_kruskal=0.05,
                                      pvalue_filter_limit_dunn=0.05,
                                      return_only_significant=True)

Statistics for PROPORTION

Pooling replicates

You may need to pool the replicates before performing a chi-square or Fisher test. To do that, use pool_positions(), which sums the counts at each coordinate across replicates using a G-test of independence.

a, g = a2i.editing.pool_positions(df_a,
                                  df_g,
                                  pvalue_filter_limit=0.05,
                                  gtest_filter_limit=0,
                                  bh_correction=False)
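One plausible reading of this step is that replicates are summed once a G-test shows no evidence that their A/G proportions differ. The sketch below computes the G statistic for a single position and then pools the counts. It is an assumption about the approach, not the code behind pool_positions(), and both function names are hypothetical.

```python
from math import log


def g_test_heterogeneity(replicates):
    """G statistic and degrees of freedom for a replicates x (A, G) table.

    replicates is a list of (a_count, g_count) pairs, one per replicate.
    """
    total_a = sum(a for a, _ in replicates)
    total_g = sum(g for _, g in replicates)
    grand_total = total_a + total_g
    g_stat = 0.0
    for a, g in replicates:
        row_total = a + g
        for observed, column_total in ((a, total_a), (g, total_g)):
            if observed > 0:  # a zero cell contributes nothing to G
                expected = row_total * column_total / grand_total
                g_stat += 2 * observed * log(observed / expected)
    degrees_of_freedom = len(replicates) - 1  # (rows - 1) * (2 columns - 1)
    return g_stat, degrees_of_freedom


def pool_counts(replicates):
    """Sum the A and G counts over all replicates."""
    return sum(a for a, _ in replicates), sum(g for _, g in replicates)
```

The p-value for the statistic can be obtained from the chi-square distribution, for example with scipy.stats.chi2.sf(g_stat, degrees_of_freedom).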

Chi-square test

chi = a2i.editing.chi2_test(a, g,
                            only_pvalue=True,
                            pvalue_filter_limit=0.05)
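For a single position, the comparison amounts to a test on a 2x2 table of pooled A and G counts in two conditions. The following is a minimal standard-library sketch of Pearson's chi-square test without continuity correction; chi2_test_2x2 is a hypothetical name, and whether a2ihelper applies a correction is not stated in this guide.

```python
from math import erfc, sqrt


def chi2_test_2x2(a1, g1, a2, g2):
    """Pearson chi-square statistic and p-value for [[a1, g1], [a2, g2]]."""
    observed = [[a1, g1], [a2, g2]]
    row_totals = [a1 + g1, a2 + g2]
    column_totals = [a1 + a2, g1 + g2]
    total = sum(row_totals)
    if 0 in row_totals or 0 in column_totals:
        raise ValueError('every row and column needs at least one count')
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * column_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With one degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Condition 1: 10 A reads and 20 G reads; condition 2: 20 A and 10 G.
statistic, p_value = chi2_test_2x2(10, 20, 20, 10)
```

Note that scipy.stats.chi2_contingency applies Yates' continuity correction to 2x2 tables by default, so its p-values will be somewhat larger than the ones from this sketch.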

Fisher test

fis = a2i.editing.fisher_test(a, g,
                              only_pvalue=True,
                              pvalue_filter_limit=0.05)
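When counts are small, the exact test is the safer choice for the same 2x2 table of pooled A and G counts. The sketch below implements a two-sided Fisher exact test with the standard library only, summing the probabilities of all tables that are no more likely than the observed one. fisher_exact_2x2 is a hypothetical name and is not the library's implementation.

```python
from math import comb


def fisher_exact_2x2(a1, g1, a2, g2):
    """Two-sided Fisher exact p-value for the table [[a1, g1], [a2, g2]]."""
    row1, row2 = a1 + g1, a2 + g2
    column1 = a1 + a2
    denominator = comb(row1 + row2, column1)

    def probability(x):
        # Hypergeometric probability of x counts in the top-left cell,
        # with all marginal totals held fixed.
        return comb(row1, x) * comb(row2, column1 - x) / denominator

    lowest = max(0, column1 - row2)
    highest = min(row1, column1)
    observed_probability = probability(a1)
    # The small tolerance guards against floating-point ties.
    p_value = sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= observed_probability * (1 + 1e-9)
    )
    return min(p_value, 1.0)


# Condition 1: 3 A reads and 0 G reads; condition 2: 0 A and 3 G.
p_value = fisher_exact_2x2(3, 0, 0, 3)
```

scipy.stats.fisher_exact offers the same two-sided test and is the more practical choice for large count tables.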
