File loading:
We provide the scores of all predictors integrated in PhaSePred.
We recommend reading these files with pandas.read_json() in Python:
import pandas as pd
df = pd.read_json(json_path)
If pd.read_json raises an error, you can load the JSON file as a dict with the following code:
import json
with open(json_path, 'r') as f:
    df = json.load(f)  # a dict keyed by UniProt entry
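Once the file is loaded as a dict, each UniProt entry maps to a nested set of fields. The sketch below shows one way to look up a nested value without raising an error when a field is absent. The accession "P00000", the toy values, and the helper name get_score are hypothetical and only illustrate the access pattern; the real files may nest or type the values differently.

```python
import json

# Hypothetical miniature record mimicking the described layout.
# The accession and all values are made up for illustration.
toy_json = """
{
  "P00000": {
    "Entry name": "TOY_HUMAN",
    "Sequence": "MKDEGSRR",
    "PScore": {"single": 1.2, "rnk": 0.4},
    "PhaSePred": {"SaPS-8fea": 0.31, "PdPS-8fea": 0.12}
  }
}
"""

data = json.loads(toy_json)


def get_score(data, accession, predictor, key):
    """Return data[accession][predictor][key], or None if any level is absent."""
    value = data
    for level in (accession, predictor, key):
        if not isinstance(value, dict) or level not in value:
            return None
        value = value[level]
    return value


saps = get_score(data, "P00000", "PhaSePred", "SaPS-8fea")
missing = get_score(data, "P00000", "DeepPhase", "single")
```

Returning None instead of raising KeyError is a convenience choice here, since some fields are described as available only for the human proteome.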
Key fields in the JSON file:
The JSON file is indexed by UniProt entry. For each entry, the following fields are provided: 'Entry name', 'Status', 'Gene names', 'Organism', 'Sequence', 'catGRANULE', 'PLAAC',
'PScore', 'ESpritz-DisProt', 'Hydropathy', 'DeepCoil', 'SEG', 'Charged residue', 'Phos', 'DeepPhase', 'PhaSePred', 'InterPro'.
The 'Entry name', 'Status', 'Gene names', 'Organism', and 'Sequence' fields provide basic protein information, and the remaining fields provide
predictions of LLPS-related properties:
1. catGRANULE
'residue': Residue-level score by catGRANULE. Scores of the first 25 residues and the last 25 residues are missing.
'single': Protein-level score by catGRANULE.
'start' & 'end': Index pairs of regions with top 10% ranked scores in the corresponding species.
'rnk': Rank of the protein-level score in the corresponding species (1-rank was shown).
2. PLAAC
'residue': Residue-level score by PLAAC.
'NLLR': Protein-level score by PLAAC.
'start' & 'end': Index pairs of regions with PLAAC score >= 0.5.
'rnk': Rank of the NLLR score in the corresponding species (1-rank was shown).
3. PScore
'residue': Residue-level score by PScore. The scores of the first residue and the last two residues are missing.
'single': Protein-level score by PScore.
'start' & 'end': Index pairs of regions with PScore >= 4.
'rnk': Rank of the protein-level score in the corresponding species (1-rank was shown).
4. ESpritz-DisProt
'residue': Residue-level score of IDR by ESpritz-DisProt.
'label': Residue-level label of IDR. 'D' represents a predicted disordered residue, and 'O' represents a predicted ordered residue.
'start' & 'end': Index pairs of regions labeled with 'D'.
'rnk': Rank of the proportion of disordered residues in the corresponding species (1-rank was shown).
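To illustrate how the 'label', 'start' & 'end', and proportion fields relate to one another, the sketch below derives region boundaries and the labeled fraction from a label string. It is not the code used to build the files. The function names are hypothetical, and the 0-based, end-inclusive index convention is an assumption that may not match the convention used in the JSON files.

```python
from itertools import groupby


def labeled_regions(labels, target="D"):
    """Return (start, end) pairs for maximal runs of `target` in `labels`.

    Indices are 0-based and `end` is inclusive (an assumed convention).
    """
    regions = []
    position = 0
    for value, run in groupby(labels):
        length = len(list(run))
        if value == target:
            regions.append((position, position + length - 1))
        position += length
    return regions


def labeled_fraction(labels, target="D"):
    """Fraction of positions carrying `target`; 0.0 for an empty input."""
    if not labels:
        return 0.0
    return sum(1 for value in labels if value == target) / len(labels)


toy_labels = "DDDOOOODDO"  # made-up label string, not real ESpritz output
regions = labeled_regions(toy_labels)
fraction = labeled_fraction(toy_labels)
```

The same helpers could be applied to the SEG labels by passing target="1", assuming those labels are stored as a string or a list of strings.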
5. Hydropathy
'residue': Residue-level hydropathy score by localCIDER.
'single': Protein-level hydropathy score by localCIDER.
'start' & 'end': Index pairs of regions with Hydropathy score >= 0.5.
'rnk': Rank of the protein-level Hydropathy score in the corresponding species (1-rank was shown).
6. DeepCoil
'coiled-coil': Residue-level score of coiled coil domain.
'sharpen': Sharpened coiled coil propensity with a peak detection algorithm.
'start' & 'end': Index pairs of regions with DeepCoil score >= 0.75.
7. SEG
'label': Residue-level label of LCR. '1' represents a predicted low-complexity residue.
'start' & 'end': Index pairs of regions labeled with '1'.
'rnk': Rank of the proportion of low-complexity residues in the corresponding species (1-rank was shown).
8. Charged residue
'label': Residue-level label of charged residue. '1' represents a positively charged residue, '-1' represents a negatively charged residue.
'FCR': Fraction of charged residues.
'POS_pos': Index of residues labeled with '1'.
'POS_neg': Index of residues labeled with '-1'.
'rnk': Rank of the FCR in the corresponding species (1-rank was shown).
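As a rough illustration of how 'FCR', 'POS_pos', and 'POS_neg' follow from the residue-level labels, the sketch below recomputes them from a toy label list. The function name charge_summary is hypothetical, uncharged residues are assumed to be labeled 0, and the positions are 0-based here, which may differ from the indexing used in the files.

```python
def charge_summary(labels):
    """Summarize charge labels (1 = positive, -1 = negative, anything else = uncharged).

    Labels may be ints or numeric strings; positions are 0-based (assumed).
    """
    values = [int(v) for v in labels]
    pos = [i for i, v in enumerate(values) if v == 1]
    neg = [i for i, v in enumerate(values) if v == -1]
    # FCR: fraction of residues that carry either charge.
    fcr = (len(pos) + len(neg)) / len(values) if values else 0.0
    return {"FCR": fcr, "POS_pos": pos, "POS_neg": neg}


toy_labels = [0, 1, -1, -1, 0, 0, 1, 1]  # made-up labels for an 8-residue protein
summary = charge_summary(toy_labels)
```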
9. Phos
'Phos': Residue-level label of phosphorylated residue. '1' represents a phosphorylation site recorded in PhosphoSitePlus.
'POS': Index of residues labeled with '1'.
'rnk': Rank of the Phos frequency in the human proteome (1-rank was shown).
10. DeepPhase
'single': Protein-level score by DeepPhase.
'location': Protein subcellular localization.
'url': URL of the immunofluorescence (IF) image.
'rnk': Rank of the DeepPhase score in the human proteome (1-rank was shown).
11. PhaSePred
'SaPS-8fea': Protein-level score by 8-feature SaPS model.
'PdPS-8fea': Protein-level score by 8-feature PdPS model.
'SaPS-10fea': Protein-level score by 10-feature SaPS model (Only available for human proteome).
'PdPS-10fea': Protein-level score by 10-feature PdPS model (Only available for human proteome).
'SaPS-8fea_rnk': Rank of the protein-level score by 8-feature SaPS model in the corresponding species (1-rank was shown).
'PdPS-8fea_rnk': Rank of the protein-level score by 8-feature PdPS model in the corresponding species (1-rank was shown).
'SaPS-10fea_rnk': Rank of the protein-level score by 10-feature SaPS model in the human proteome (1-rank was shown).
'PdPS-10fea_rnk': Rank of the protein-level score by 10-feature PdPS model in the human proteome (1-rank was shown).
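Because the 10-feature scores are only available for the human proteome, code that handles several species may need a fallback. The sketch below picks the 10-feature score when it is present and the 8-feature score otherwise. Preferring the 10-feature model is an assumption made for illustration, not a recommendation from the authors, and the function name preferred_score is hypothetical. How missing scores are stored (absent key or null) is also assumed; both cases are handled.

```python
def preferred_score(phasepred, model="SaPS"):
    """Return (key, score) for the 10-feature model if present, else the 8-feature one.

    `phasepred` is the 'PhaSePred' dict of one entry; `model` is "SaPS" or "PdPS".
    Returns (None, None) when neither score is available.
    """
    for n_features in (10, 8):
        key = f"{model}-{n_features}fea"
        value = phasepred.get(key)
        if value is not None:
            return key, value
    return None, None


# Made-up score dicts for illustration only.
human_like = {"SaPS-8fea": 0.4, "PdPS-8fea": 0.3, "SaPS-10fea": 0.7, "PdPS-10fea": 0.6}
other_species = {"SaPS-8fea": 0.2, "PdPS-8fea": 0.1}

human_choice = preferred_score(human_like)
other_choice = preferred_score(other_species, model="PdPS")
```

Note that scores from the 8-feature and 10-feature models are not necessarily comparable, so it may be safer to keep track of the returned key alongside the score.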
12. InterPro
'PfamID': PfamID of domains detected by InterPro.
'domain': Domain names of corresponding PfamIDs.
'start' & 'end': Index pairs of detected domains.
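If 'PfamID', 'domain', 'start', and 'end' are stored as parallel lists of equal length (an assumption about the file layout), the domains of one entry can be assembled into one record per domain as sketched below. The function name domain_table and the toy identifiers are hypothetical.

```python
def domain_table(interpro):
    """Combine the parallel InterPro lists into one dict per detected domain."""
    return [
        {"PfamID": pfam_id, "domain": name, "start": start, "end": end}
        for pfam_id, name, start, end in zip(
            interpro["PfamID"], interpro["domain"], interpro["start"], interpro["end"]
        )
    ]


# Made-up InterPro field for illustration; the IDs and names are placeholders.
toy_interpro = {
    "PfamID": ["PF_TOY_1", "PF_TOY_2"],
    "domain": ["toy_domain_a", "toy_domain_b"],
    "start": [10, 120],
    "end": [80, 190],
}

domains = domain_table(toy_interpro)
```

The same zip pattern would apply to the 'start' & 'end' index pairs of the other predictors under the same parallel-list assumption.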