Dataset download
File explanation Download
Homo sapiens (Human) Download
Mus musculus (Mouse) Download
Rattus norvegicus (Rat) Download
Arabidopsis thaliana (Mouse-ear cress) Download
Danio rerio (Zebrafish) (Brachydanio rerio) Download
Gallus gallus (Chicken) Download
Bos taurus (Bovine) Download
Dictyostelium discoideum (Slime mold) Download
Canis lupus familiaris (Dog) (Canis familiaris) Download
Sus scrofa (Pig) Download
Bacillus subtilis (strain 168) Download
Caenorhabditis elegans Download
Oryza sativa subsp. japonica (rice) Download
Xenopus laevis (African clawed frog) Download
Drosophila melanogaster (Fruit fly) Download
Saccharomyces cerevisiae (Baker's yeast) Download
Escherichia coli (Strain K12 / DH10B) Download
Schizosaccharomyces pombe (strain 972 / ATCC 24843) (fission yeast) Download

File loading:

Each file contains the scores of all predictors integrated in PhaSePred. We recommend reading these files with pandas.read_json() in Python:

import pandas as pd

json_path = "path/to/downloaded_file.json"  # placeholder: path of a downloaded dataset file
df = pd.read_json(json_path)

If pd.read_json raises an error, you can instead load the JSON file as a dict:

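import json

json_path = "path/to/downloaded_file.json"  # placeholder: path of a downloaded dataset file
with open(json_path, "r") as f:
    data = json.load(f)  # dict keyed by UniProt entry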

Key fields in the json file:

The JSON file is indexed by UniProt entry. For each entry, the following fields are provided: 'Entry name', 'Status', 'Gene names', 'Organism', 'Sequence', 'catGRANULE', 'PLAAC', 'PScore', 'ESpritz-DisProt', 'Hydropathy', 'DeepCoil', 'SEG', 'Charged residue', 'Phos', 'DeepPhase', 'PhaSePred', 'InterPro'. The 'Entry name', 'Status', 'Gene names', 'Organism', and 'Sequence' fields provide basic protein information, and the remaining fields provide predictions of properties related to liquid-liquid phase separation (LLPS):
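A minimal sketch of pulling the basic protein information for one entry, assuming the file has been loaded with json into a dict of the form {UniProt entry: {field name: value}}, as the description above suggests. The entry "EXAMPLE1", the helper basic_info and all values in the miniature record are hypothetical, invented only for illustration.

```python
import json

# Hypothetical miniature record mimicking the assumed layout
# {UniProt entry: {field name: value}}; every value is invented.
example_json = """
{
  "EXAMPLE1": {
    "Entry name": "EXAMPLE1_HUMAN",
    "Status": "reviewed",
    "Gene names": "GENE1",
    "Organism": "Homo sapiens (Human)",
    "Sequence": "MSEQNKGGSS",
    "PLAAC": {"NLLR": 0.12}
  }
}
"""

BASIC_FIELDS = ["Entry name", "Status", "Gene names", "Organism", "Sequence"]


def basic_info(data, entry):
    """Return the basic protein information fields for one UniProt entry."""
    record = data[entry]
    return {field: record.get(field) for field in BASIC_FIELDS}


data = json.loads(example_json)
info = basic_info(data, "EXAMPLE1")
print(info["Organism"], len(info["Sequence"]))  # Homo sapiens (Human) 10

# Predictor fields are assumed to be nested dicts, e.g. data[entry]["PLAAC"]["NLLR"].
print(data["EXAMPLE1"]["PLAAC"]["NLLR"])  # 0.12
```

With a real file, replace example_json with the dict returned by json.load.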
1. catGRANULE
'residue': Residue-level score by catGRANULE. Scores of the first 25 and the last 25 residues are missing.
'single': Protein-level score by catGRANULE.
'start' & 'end': Index pairs of regions whose scores rank in the top 10% within the corresponding species.
'rnk': Rank of the protein-level score in the corresponding species (1-rank was shown).
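Because the catGRANULE residue-level list skips the first 25 and last 25 residues, its positions do not line up directly with the sequence. A minimal sketch of re-aligning it by padding with None, assuming the list covers exactly the residues in between (length = sequence length minus 50); align_residue_scores is a hypothetical helper and the scores are invented.

```python
def align_residue_scores(scores, seq_len, n_start_missing, n_end_missing):
    """Pad a truncated residue-level score list with None so that
    index i corresponds to residue i of the full sequence (0-based).

    Assumes len(scores) == seq_len - n_start_missing - n_end_missing.
    """
    expected = seq_len - n_start_missing - n_end_missing
    if len(scores) != expected:
        raise ValueError(f"expected {expected} scores, got {len(scores)}")
    return [None] * n_start_missing + list(scores) + [None] * n_end_missing


# catGRANULE: the first 25 and last 25 residues have no score.
seq_len = 60
toy_scores = [0.1] * (seq_len - 50)  # invented values for illustration
aligned = align_residue_scores(toy_scores, seq_len, 25, 25)
print(len(aligned), aligned[24], aligned[25])  # 60 None 0.1
```

The same helper should fit PScore, whose residue-level scores skip the first residue and the last two: align_residue_scores(scores, len(sequence), 1, 2).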
2. PLAAC
'residue': Residue-level score by PLAAC.
'NLLR': Protein-level score by PLAAC.
'start' & 'end': Index pairs of regions with PLAAC score >= 0.5.
'rnk': Rank of the NLLR score in the corresponding species (1-rank was shown).
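The PLAAC 'start' & 'end' fields mark regions with score >= 0.5. A minimal sketch of finding such regions in a residue-level score list; it is an illustration of simple thresholding, not necessarily how PhaSePred computed its regions (for example, a minimum region length may apply). The 0-based, end-inclusive convention and the helper name regions_above are assumptions.

```python
def regions_above(scores, threshold):
    """Return (start, end) pairs of maximal runs where score >= threshold.

    Indices are 0-based and end is inclusive (assumed convention; the
    files' own 'start'/'end' indexing may differ). None entries, i.e.
    missing scores, break a run.
    """
    regions = []
    start = None
    for i, s in enumerate(scores):
        if s is not None and s >= threshold:
            if start is None:
                start = i  # a new run begins
        elif start is not None:
            regions.append((start, i - 1))  # the run ended on the previous residue
            start = None
    if start is not None:
        regions.append((start, len(scores) - 1))  # run reaches the last residue
    return regions


toy = [0.2, 0.6, 0.7, 0.1, 0.5, 0.5, 0.4]  # invented scores
print(regions_above(toy, 0.5))  # [(1, 2), (4, 5)]
```

The same function covers the other threshold-based fields below by changing the cutoff, e.g. 4 for PScore, 0.5 for Hydropathy and 0.75 for DeepCoil.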
3. PScore
'residue': Residue-level score by PScore. Scores of the first residue and the last two residues are missing.
'single': Protein-level score by PScore.
'start' & 'end': Index pairs of regions with PScore >= 4.
'rnk': Rank of the protein-level score in the corresponding species (1-rank was shown).
4. ESpritz-DisProt
'residue': Residue-level disorder (IDR) score by ESpritz-DisProt.
'label': Residue-level label of IDR. 'D' represents a predicted disordered residue, and 'O' represents a predicted ordered residue.
'start' & 'end': Index pairs of regions labeled with 'D'.
'rnk': Rank of the proportion of disordered residues in the corresponding species (1-rank was shown).
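The ESpritz-DisProt 'label' field carries one 'D' or 'O' per residue, and 'rnk' is based on the proportion of 'D' residues. A minimal sketch of recomputing that proportion and the runs of 'D', assuming 'label' is a string or list of single characters; label_summary is a hypothetical helper and the label string is invented.

```python
from itertools import groupby


def label_summary(labels, target):
    """Fraction of residues labeled `target`, plus (start, end) pairs
    (0-based, end inclusive) of consecutive runs of that label."""
    labels = list(labels)
    if not labels:
        return 0.0, []
    regions = []
    pos = 0
    for value, run in groupby(labels):  # group consecutive identical labels
        length = sum(1 for _ in run)
        if value == target:
            regions.append((pos, pos + length - 1))
        pos += length
    fraction = labels.count(target) / len(labels)
    return fraction, regions


frac, regions = label_summary("DDOOODDDOO", "D")  # invented labels
print(frac, regions)  # 0.5 [(0, 1), (5, 7)]
```

The SEG 'label' field can be summarized the same way with target '1' (or 1, if the labels are stored as integers).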
5. Hydropathy
'residue': Residue-level hydropathy score by localCIDER.
'single': Protein-level hydropathy score by localCIDER.
'start' & 'end': Index pairs of regions with Hydropathy score >= 0.5.
'rnk': Rank of the protein-level Hydropathy score in the corresponding species (1-rank was shown).
6. DeepCoil
'coiled-coil': Residue-level coiled-coil score.
'sharpen': Sharpened coiled coil propensity with a peak detection algorithm.
'start' & 'end': Index pairs of regions with DeepCoil score >= 0.75.
7. SEG
'label': Residue-level low-complexity region (LCR) label. '1' represents a predicted low-complexity residue.
'start' & 'end': Index pairs of regions labeled with '1'.
'rnk': Rank of the proportion of low-complexity residues in the corresponding species (1-rank was shown).
8. Charged residue
'label': Residue-level label of charged residue. '1' represents a positively charged residue, '-1' represents a negatively charged residue.
'FCR': Fraction of charged residues.
'POS_pos': Index of residues labeled with '1'.
'POS_neg': Index of residues labeled with '-1'.
'rnk': Rank of the FCR in the corresponding species (1-rank was shown).
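'FCR', 'POS_pos' and 'POS_neg' all follow from the residue-level charge labels. A minimal sketch of that relationship, assuming neutral residues are labeled 0 and that labels may be stored as integers or strings; charge_summary is a hypothetical helper, the labels are invented, and positions are 0-based (the files' own indexing may differ).

```python
def charge_summary(labels):
    """Recompute FCR and positive/negative positions from residue-level
    charge labels (1 = positive, -1 = negative, 0 = neutral, assumed).

    Labels are converted with int(), so both 1 and "1" are accepted.
    """
    values = [int(v) for v in labels]
    pos_pos = [i for i, v in enumerate(values) if v == 1]
    pos_neg = [i for i, v in enumerate(values) if v == -1]
    fcr = (len(pos_pos) + len(pos_neg)) / len(values) if values else 0.0
    return fcr, pos_pos, pos_neg


fcr, pos_pos, pos_neg = charge_summary([0, 1, -1, 0, 1, 0, 0, -1])
print(fcr, pos_pos, pos_neg)  # 0.5 [1, 4] [2, 7]
```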
9. Phos
'Phos': Residue-level phosphorylation label. '1' represents a phosphorylation site recorded in PhosphoSitePlus.
'POS': Index of residues labeled with '1'.
'rnk': Rank of the Phos frequency in the human proteome (1-rank was shown).
10. DeepPhase
'single': Protein-level score by DeepPhase.
'location': Protein subcellular localization.
'url': URL of the immunofluorescence (IF) image.
'rnk': Rank of the DeepPhase score in the human proteome (1-rank was shown).
11. PhaSePred
'SaPS-8fea': Protein-level score by 8-feature SaPS model.
'PdPS-8fea': Protein-level score by 8-feature PdPS model.
'SaPS-10fea': Protein-level score by 10-feature SaPS model (available only for the human proteome).
'PdPS-10fea': Protein-level score by 10-feature PdPS model (available only for the human proteome).
'SaPS-8fea_rnk': Rank of the protein-level score by 8-feature SaPS model in the corresponding species (1-rank was shown).
'PdPS-8fea_rnk': Rank of the protein-level score by 8-feature PdPS model in the corresponding species (1-rank was shown).
'SaPS-10fea_rnk': Rank of the protein-level score by 10-feature SaPS model in the human proteome (1-rank was shown).
'PdPS-10fea_rnk': Rank of the protein-level score by 10-feature PdPS model in the human proteome (1-rank was shown).
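A minimal sketch of ranking proteins by one PhaSePred score with pandas, assuming the file was loaded as a dict {entry: {"PhaSePred": {score name: value}}}. The three entries and their scores are invented for illustration.

```python
import pandas as pd

# Invented scores for three hypothetical entries, in the assumed layout
# {entry: {"PhaSePred": {score name: value}}}.
data = {
    "EXAMPLE1": {"PhaSePred": {"SaPS-8fea": 0.91, "PdPS-8fea": 0.40}},
    "EXAMPLE2": {"PhaSePred": {"SaPS-8fea": 0.15, "PdPS-8fea": 0.88}},
    "EXAMPLE3": {"PhaSePred": {"SaPS-8fea": 0.52, "PdPS-8fea": 0.33}},
}

# One row per entry, one column per PhaSePred score.
scores = pd.DataFrame.from_dict(
    {entry: rec["PhaSePred"] for entry, rec in data.items()}, orient="index"
)

# Highest SaPS-8fea score first.
top_saps = scores.sort_values("SaPS-8fea", ascending=False)
print(top_saps.index.tolist())  # ['EXAMPLE1', 'EXAMPLE3', 'EXAMPLE2']
```

Building the table with orient="index" keeps the UniProt entries as row labels. Keys that only some entries have, such as the 10-feature scores outside the human proteome, would simply become NaN in that column.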
12. InterPro
'PfamID': Pfam IDs of domains detected by InterPro.
'domain': Domain names of corresponding PfamIDs.
'start' & 'end': Index pairs of detected domains.
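The InterPro fields appear to describe each domain through parallel lists. A minimal sketch of combining them into one row per domain, assuming 'PfamID', 'domain', 'start' and 'end' are equal-length lists in the same order. domain_table is a hypothetical helper, and the IDs, names and coordinates below are placeholders, not real annotations.

```python
def domain_table(interpro):
    """Zip the parallel InterPro lists into (PfamID, domain, start, end) rows.

    Assumes the four lists have equal length and matching order.
    """
    keys = ("PfamID", "domain", "start", "end")
    lengths = {len(interpro[k]) for k in keys}
    if len(lengths) != 1:
        raise ValueError("InterPro lists have different lengths")
    return list(zip(*(interpro[k] for k in keys)))


# Placeholder record with invented values.
record = {
    "PfamID": ["PF_A", "PF_B"],
    "domain": ["DomainA", "DomainB"],
    "start": [10, 120],
    "end": [80, 190],
}
for row in domain_table(record):
    print(row)
# ('PF_A', 'DomainA', 10, 80)
# ('PF_B', 'DomainB', 120, 190)
```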