import anndata as ad
1 Introduction
The starting point is a raw counts matrix.
2 Seurat
The Seurat object contains the following important data structures:
assays: can hold multiple assays (Assay in Seurat 3, and Assay5 in Seurat 5), such as RNA and SCT. Each assay stores gene-by-cell matrices (rows are genes/features, columns are cells/barcodes) as layers, such as counts, data, and scale.data.
meta.data: stores cell-level metadata. Gene-level metadata is stored in the meta.data of the corresponding assay.
graphs: a list of graphs (usually nearest-neighbor graphs) that are used for clustering and other analyses.
reductions: stores low-dimensional representations of cells or genes, such as PCA, UMAP, and t-SNE.
  For cells: the matrix of cell embeddings, which holds the coordinates of each cell in a low-dimensional space that summarizes the most important variation in the data.
  For genes: the matrix of gene loadings, which describes how much each gene contributes to each PC, that is, which genes drive the variation captured by a given PC.
Other components include, for example, clusters, commands, and misc.
A Seurat object is usually just saved as an RDS file.
3 AnnData
AnnData is usually saved as an H5AD file.
The active data matrix, typically a SciPy sparse matrix, is in ad.X; it usually holds the normalized and log(1+x)-transformed counts. Its rows are cells/observations and its columns are genes/variables.
Alternative versions of the active data are in layers, a dictionary-like mapping that stores each version under a separate key, such as ad.layers["raw"].
In Seurat, all of these matrices are stored in the layers of a single assay, e.g. counts, data, and scale.data in the RNA assay.
Simple annotations (metadata) for cells and genes are in obs and var, both pandas DataFrame objects.
In Seurat, this corresponds to meta.data, or to the meta.data of a given assay (for genes).
Index cells and genes with obs_names and var_names, which directly mirror the indices of obs and var.
Subset AnnData with either cells, genes or both by name index, numerical index, and/or boolean index, such as ad[:4, ["LYZ", "FOS"]], which returns a view of the AnnData.
Multidimensional annotations for cells and genes are in obsm and varm, dictionary-like mappings that store each annotation under a separate key, such as ad.obsm["X_pca"].
In Seurat, these are mainly stored in reductions.
Annotations for cell-cell and gene-gene pairs are in obsp and varp, which are also dictionary-like mappings. All matrices in obsp must have shape n_cells x n_cells and use obs_names as the index for both dimensions; all matrices in varp must have shape n_genes x n_genes and use var_names as the index for both dimensions.
In Seurat, graphs are an example of this.
Unstructured annotations and general metadata are in uns, a Python dictionary.
In Seurat, misc plays a similar role.
Note: subsetting AnnData like ad[:4, ["LYZ", "FOS"]] returns a view. There are two cases where you end up with a copy instead:
Call .copy(), as in ad[:4, ["LYZ", "FOS"]].copy().
Modify any element of the view; the view object is then turned into a copy.
4 MuData
mudata is the extension of anndata to multimodal datasets. It builds on anndata by treating the data from each modality as a separate AnnData object and combining them into one joint, multimodal container, the MuData object.

