NABpred v1.1 | LLPS Propensity Predictor
Sequence-based prediction of protein phase separation

NABpred

NABpred estimates the probability that a nucleic-acid binding protein (NAB) is associated with liquid-liquid phase separation (LLPS). It embeds the sequence with ESM2-650M, concatenates the mean-pooled and max-pooled embeddings (ℝ²⁵⁶⁰), and passes the result through a compact MLP classifier. It supports single-sequence and batch input from FASTA, CSV, or Excel.

ESM2-650M · 33 layers | Pooling · 2×1280 → 2560 | MLP · 2560→128→1 | Single & batch prediction | Input: 10–2000 aa

Submit Sequences

Paste one sequence, multi-FASTA, or upload a CSV/Excel file for batch prediction.

Sequence input: FASTA or raw sequence; multi-FASTA is supported.
File upload: drop or browse for a CSV or Excel file (.csv, .xlsx, .xls, .tsv; up to 5 MB), with column 1 = identifier and column 2 = sequence.
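
The file layout described above (column 1 = identifier, column 2 = sequence) could be read into batch records roughly as follows. This is a minimal sketch using Python's standard `csv` module; the function name `read_batch_csv`, the `skip_header` flag, and the toy identifiers are assumptions for illustration, not part of NABpred.

```python
import csv
import io

def read_batch_csv(text, skip_header=False):
    """Parse CSV text into [{"id": ..., "sequence": ...}] records.

    Assumes column 1 is the identifier and column 2 is the sequence.
    Extra columns are ignored; rows with fewer than two cells are skipped.
    """
    rows = csv.reader(io.StringIO(text))
    if skip_header:
        next(rows, None)  # drop a header row if the file has one
    records = []
    for row in rows:
        if len(row) < 2:
            continue
        records.append({"id": row[0].strip(), "sequence": row[1].strip().upper()})
    return records

# Toy input: two made-up records and one blank line that is skipped.
example = "demo_1,ACDEFGHIKL\n\ndemo_2,mnpqrstvwy\n"
records = read_batch_csv(example)
```

For a tab-separated file, passing `delimiter="\t"` to `csv.reader` would be the corresponding change.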

Batch summary report

The report lists the total number of sequences processed (valid and errors), the number of LLPS-positive sequences (as a count and as a percentage of valid predictions), the mean and median P(LLPS), the top score, a 10-bin score distribution (non-LLPS vs. LLPS), and a top-5 propensity leaderboard.

All results are shown in a table with the columns: #, Identifier, Length, P(LLPS), Logit, Class.
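
The summary quantities in this report can be reproduced from a list of scores with a few lines of standard-library Python. The sketch below is illustrative only: the function name `summarize`, the equal-width binning over [0, 1], and counting a score equal to the threshold as positive are assumptions, not details confirmed by the page.

```python
from statistics import mean, median

def summarize(scores, threshold=0.5, bins=10, top_k=5):
    """Summarize a list of (identifier, probability) pairs."""
    probs = [p for _, p in scores]
    # Equal-width histogram over [0, 1]; p == 1.0 falls in the last bin.
    hist = [0] * bins
    for p in probs:
        hist[min(int(p * bins), bins - 1)] += 1
    # Assumption: a score equal to the threshold counts as positive.
    positives = sum(p >= threshold for p in probs)
    ranked = sorted(scores, key=lambda s: s[1], reverse=True)
    return {
        "n_valid": len(probs),
        "positives": positives,
        "positive_pct": 100.0 * positives / len(probs) if probs else 0.0,
        "mean": mean(probs) if probs else 0.0,
        "median": median(probs) if probs else 0.0,
        "top_score": max(probs, default=0.0),
        "histogram": hist,
        "leaderboard": ranked[:top_k],
    }

# Made-up scores, used only to exercise the function.
demo = [("a", 0.9), ("b", 0.2), ("c", 0.5), ("d", 0.75)]
report = summarize(demo)
```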

    Methods

    A four-stage pipeline from raw sequence to calibrated LLPS probability.

    The input sequence is validated, then embedded by the ESM2-650M transformer, producing a per-residue representation of dimension L × 1280. Masked mean and max pooling along the residue axis yields a fixed 2560-dim vector capturing both average and salient local features.
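
To make the pooling step concrete, here is a minimal pure-Python sketch on a toy 3 × 2 matrix. It assumes the mask flags real residues with 1 and padding or special tokens with 0, and that the mean half precedes the max half in the concatenated vector; neither detail is stated on this page. A real implementation would operate on tensors with D = 1280, giving 2 × 1280 = 2560 values.

```python
def masked_mean_max_pool(hidden, mask):
    """Pool per-residue vectors (L x D) into one vector of length 2*D.

    hidden: list of L rows, each a list of D floats.
    mask:   list of L flags; 1 keeps the position, 0 excludes it.
    """
    kept = [row for row, m in zip(hidden, mask) if m]
    if not kept:
        raise ValueError("mask excludes every position")
    dim = len(kept[0])
    mean_part = [sum(row[j] for row in kept) / len(kept) for j in range(dim)]
    max_part = [max(row[j] for row in kept) for j in range(dim)]
    return mean_part + max_part  # concatenation: D + D = 2*D

# Toy example: the third row is masked out, so its large values are ignored.
hidden = [[1.0, 4.0], [3.0, 0.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = masked_mean_max_pool(hidden, mask)
```

Masking matters for both halves: without it, padding positions would drag the mean toward their values and could win the max.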

    This vector passes through a small fully-connected classifier — 2560 → 128 → 1 with ReLU and dropout — trained to discriminate LLPS-associated nucleic-acid binding proteins from non-LLPS NABs. The output logit is converted to a probability via sigmoid, with a default threshold of τ = 0.50.
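
The final logit-to-class step can be sketched as below. The result keys mirror the response fields named in the API reference (classification, score, logit, threshold); the label strings `"LLPS"` and `"non-LLPS"` and the choice to treat a score exactly equal to τ as positive are assumptions.

```python
import math

def sigmoid(logit):
    """Logistic function, written to avoid overflow for large |logit|."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)

def classify(logit, threshold=0.50):
    """Convert a raw logit into a probability and a thresholded label."""
    score = sigmoid(logit)
    # Assumption: scores equal to the threshold count as positive.
    label = "LLPS" if score >= threshold else "non-LLPS"
    return {
        "classification": label,
        "score": score,
        "logit": logit,
        "threshold": threshold,
    }

positive = classify(2.0)
negative = classify(-2.0)
```

Because sigmoid(0) = 0.5, the default threshold τ = 0.50 is equivalent to asking whether the logit is non-negative.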

    STEP 01 · Validate: strip FASTA, uppercase, L ∈ [10, 2000], 20-AA.
    STEP 02 · Embed (ESM2-650M): per-residue hidden states, shape L × 1280.
    STEP 03 · Pool: masked mean + max along L, concatenated to ℝ²⁵⁶⁰.
    STEP 04 · Classify: MLP 2560→128→1, sigmoid, threshold τ = 0.50.
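
Step 01 is the easiest stage to mirror on the client side before submitting anything. The following sketch applies the stated rules (strip FASTA header lines, uppercase, length 10 to 2000, 20-letter alphabet) to a single record; the function names and error messages are assumptions, and the server's own validation may differ in detail.

```python
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LEN, MAX_LEN = 10, 2000

def clean_sequence(raw):
    """Drop FASTA header lines and whitespace, then uppercase."""
    lines = [ln.strip() for ln in raw.splitlines()]
    body = "".join(ln for ln in lines if ln and not ln.startswith(">"))
    return "".join(body.split()).upper()

def validate(raw):
    """Return the cleaned sequence, or raise ValueError if it breaks a rule."""
    seq = clean_sequence(raw)
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        raise ValueError(f"length {len(seq)} outside [{MIN_LEN}, {MAX_LEN}]")
    bad = sorted(set(seq) - VALID_AA)
    if bad:
        raise ValueError(f"non-standard residues: {''.join(bad)}")
    return seq

# Toy single-record FASTA input with a wrapped, lowercase sequence.
cleaned = validate(">demo\nacdefghik\nlmnpq\n")
```

This handles one record at a time; a multi-FASTA input would first need to be split on its `>` header lines.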

    API Reference

    Single or batch JSON endpoint for scripted and programmatic use.

    POST /predict accepts either { "sequence": "..." } for one sequence or { "sequences": [...] } for a batch (max 16 per request — chunk client-side for larger sets).
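
Client-side chunking for larger sets might look like the sketch below, which splits records into groups of at most 16 and builds one JSON body per request. The helper names and the `base_url` parameter are hypothetical (the page does not give a host), and `post_json` is defined but not called here, so the example runs without network access.

```python
import json
import urllib.request

MAX_BATCH = 16  # per-request limit stated above

def chunked(items, size=MAX_BATCH):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def build_payloads(records):
    """Turn [{"id", "sequence"}, ...] into JSON bodies for POST /predict."""
    return [json.dumps({"sequences": chunk}) for chunk in chunked(records)]

def post_json(base_url, body):
    """Send one JSON body to {base_url}/predict and decode the JSON reply.

    base_url is a placeholder for wherever the service is hosted.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/predict",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# 40 made-up records become three requests of 16, 16, and 8 sequences.
records = [{"id": f"demo_{i}", "sequence": "ACDEFGHIKLMNPQ"} for i in range(40)]
payloads = build_payloads(records)
sizes = [len(json.loads(p)["sequences"]) for p in payloads]
```

The per-request responses would then need to be merged by the caller, for example by concatenating their `results` lists.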

    Endpoint: POST /predict
    Single: request { "sequence": "MTEIT…YL" } → response { classification, score, logit, threshold, length }
    Batch: request { "sequences": [{"id":"FUS","sequence":"…"}, …] } → response { count, ok, positives, errors, results: [...] }
    Constraints: length 10–2000 · standard AA alphabet (ACDEFGHIKLMNPQRSTVWY) · max 16 sequences per request
    Health: GET /health → response { status, hf_token_configured, max_batch }