ArcFISH as a command line tool
This page demonstrates how to use ArcFISH as a command line tool. Make sure arcfish is installed (see installation), then activate the environment that contains it and confirm that the package is visible. With a conda environment:
>>> conda activate arcfish_env
>>> python -m pip show arcfish
Name: arcfish
Version: ...
With a Python virtual environment:
>>> source .arcfish_env/bin/activate
>>> python -m pip show arcfish
Name: arcfish
Version: ...
This tutorial uses the DNA seqFISH+ data from Takei et al., 2021 to call chromatin loops, TADs, and A/B compartments. The data can be downloaded from the 4DN data portal under the IDs 4DNFIW4S8M6J (biological replicate 1), 4DNFI4LI6NNV (biological replicate 2), and 4DNFIDUJQDNO (biological replicate 3).
Download the files and arrange them as follows:
.
├── data
│ ├── takei_science_2021
│ │ ├── 4DNFIW4S8M6J.csv
│ │ ├── 4DNFI4LI6NNV.csv
│ │ └── 4DNFIDUJQDNO.csv
├── output
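The layout above can be prepared and checked from the shell. The following is a minimal sketch that only uses the directory and file names shown in the tree; it does not download anything, and the `missing` counter is an illustrative variable, not part of ArcFISH.

```shell
# Create the directory layout expected by the commands in this tutorial.
mkdir -p data/takei_science_2021 output

# Report which of the three replicate files are present or still missing.
missing=0
for id in 4DNFIW4S8M6J 4DNFI4LI6NNV 4DNFIDUJQDNO; do
    f="data/takei_science_2021/${id}.csv"
    if [ -f "$f" ]; then
        echo "found:   $f"
    else
        echo "missing: $f"
        missing=$((missing + 1))
    fi
done
echo "$missing file(s) missing"
```

Once the script reports no missing files, the preprocessing command can be run from the same working directory.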
Preprocessing
First run the following to preprocess the data and store them as AnnData objects:
arcfish preprocess \
-i "rep1::data/takei_science_2021/4DNFIW4S8M6J.csv,\
rep2::data/takei_science_2021/4DNFI4LI6NNV.csv,\
rep3::data/takei_science_2021/4DNFIDUJQDNO.csv" \
-o pp_chr_adata \
-v X::1000,Y::1000,Z::1000
The -v argument specifies how to convert the input data to nm. Since the 3D coordinates are stored as µm, we multiply each axis by 1000.
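To make the scaling concrete, the sketch below applies a per-axis factor to a tiny CSV with awk. It only illustrates the arithmetic (µm times 1000 gives nm) and is not how ArcFISH performs the conversion internally; `um_to_nm` is a hypothetical helper and the coordinates are made up.

```shell
# Scale each coordinate column by the factor given for its axis.
# Arguments: factor for X, factor for Y, factor for Z (1000 converts µm to nm).
um_to_nm() {
    awk -F, -v fx="$1" -v fy="$2" -v fz="$3" \
        'NR == 1 { print; next }   # keep the header line unchanged
         { printf "%g,%g,%g\n", $1 * fx, $2 * fy, $3 * fz }'
}

# Made-up coordinates in µm, for illustration only.
printf 'X,Y,Z\n1.25,0.5,2\n' | um_to_nm 1000 1000 1000
```

If the coordinates were already stored in nm, the corresponding factors would be 1.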
This step creates a directory pp_chr_adata with a .h5ad file for each chromosome. The processed data will be used in all following steps.
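A quick way to confirm that preprocessing produced its output is to count the .h5ad files in the directory. This is a small sketch; `count_h5ad` is a hypothetical helper, and no assumption is made about how the individual files are named.

```shell
# Count the .h5ad files directly inside a directory (one per chromosome is expected).
count_h5ad() {
    find "$1" -maxdepth 1 -type f -name '*.h5ad' | wc -l | tr -d ' '
}

if [ -d pp_chr_adata ]; then
    echo "pp_chr_adata contains $(count_h5ad pp_chr_adata) .h5ad file(s)"
else
    echo "pp_chr_adata not found; run 'arcfish preprocess' first"
fi
```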
Loop calling
Once the processed data are stored in pp_chr_adata, loop calling can be done by
arcfish loop -i pp_chr_adata -o output/loop_res.bedpe
which stores the loop calling result to output/loop_res.bedpe.
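The result file can then be inspected with standard tools. The exact column layout of the ArcFISH output is not described here, so the sketch below assumes the common BEDPE convention: tab-separated fields whose first six columns are chrom1, start1, end1, chrom2, start2, end2. `bedpe_spans` is a hypothetical helper; adjust the column numbers if the actual file differs.

```shell
# Print chromosome, both anchor starts, and the span between them
# for each intra-chromosomal pair in a BEDPE file.
# Lines whose start fields are not numeric (for example a header) are skipped.
bedpe_spans() {
    awk -F'\t' '$2 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ && $1 == $4 {
        print $1, $2, $5, $5 - $2
    }' "$1"
}

# Show the first few pairs if the loop calling result exists.
if [ -f output/loop_res.bedpe ]; then
    bedpe_spans output/loop_res.bedpe | head
fi
```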
If desired, loop calling parameters can be changed by adding additional arguments:
* fdr: FDR cutoff to define loop candidates, by default 0.1.
* pval: p-value cutoff to filter loop summits, by default 1e-5.
* lo: minimum loop length to be considered, by default 100Kb.
* up: maximum loop length to be considered, by default 1Mb.
* gap: candidates within gap of each other are treated as belonging to the same summit, by default 50Kb.
* outer: local background size, by default 50Kb.
For example, to test loops ranging from 10Kb to 2Mb, change the command to
arcfish loop -i pp_chr_adata -o output/loop_res.bedpe -lo 10000 -up 2000000
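The length thresholds are given as plain numbers of base pairs, as in `-lo 10000` for 10Kb. As a minimal sketch, the values for a 10Kb to 2Mb range can be computed with shell arithmetic to avoid miscounting zeros; the variable names `lo` and `up` are illustrative, and the command is only printed here.

```shell
# Convert the thresholds to base pairs.
lo=$((10 * 1000))        # 10 Kb
up=$((2 * 1000 * 1000))  # 2 Mb

# Print the resulting command; remove "echo" to actually run it.
echo arcfish loop -i pp_chr_adata -o output/loop_res.bedpe -lo "$lo" -up "$up"
```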
Domain calling
To call TADs, run the following
arcfish domain -i pp_chr_adata -o output/domain_res.bedpe
The domain calling result is in output/domain_res.bedpe.
Additional optional arguments for TAD calling are:
* fdr: FDR cutoff to define TAD boundaries, by default 0.1.
* window: region size to consider intra/inter-domain contacts, by default 100Kb.
* tree: whether to return hierarchical TADs, by default True.
* min: minimum TAD size, by default 0.
Compartment calling
To call A/B compartments, run the following
arcfish cpmt -i pp_chr_adata -o output/cpmt_res.bedpe
An optional -min argument can be passed in to specify the minimum A/B compartment size, by default 0.
Note
The compartment calling result here is not very meaningful, as only a 1.5Mb region (60 loci at 25Kb resolution) is imaged on each chromosome. The example mainly serves to demonstrate the compartment calling workflow.
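Since the loop, domain, and compartment steps all read the same pp_chr_adata directory, they can be chained in one script. The following is a sketch using only the commands and default options shown above; `run_step` and the `DRY_RUN` switch are hypothetical conveniences, not ArcFISH features.

```shell
# With DRY_RUN=1 (the default here) the commands are only printed.
# Set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Run the three calling steps on the same preprocessed directory.
mkdir -p output
for step in loop domain cpmt; do
    run_step arcfish "$step" -i pp_chr_adata -o "output/${step}_res.bedpe"
done
```

Running it with `DRY_RUN=0` requires that the environment with ArcFISH is active and that preprocessing has already been completed.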