ClinVar-BERT Prediction Lookup
Search for ClinVar variant predictions by gene, variant, or ClinVar identifiers.
Model: weijiang99/clinvarbert
Dataset: Dataset on Figshare, a ~540 MB Parquet file of ClinVar-BERT predictions for ClinVar variants
Search Options
Enter one or more search criteria below. Leave fields empty to ignore them.
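The "leave fields empty to ignore them" behaviour can be sketched as a small query builder. This is a minimal illustration, not the tool's actual code: the column names, the file name `predictions.parquet`, and the choice to combine criteria with AND are all assumptions.

```python
# Hypothetical mapping from search-form fields to Parquet column names;
# the real column names in the dataset may differ.
FIELD_TO_COLUMN = {
    "gene": "gene",
    "vcv": "vcv",
    "rcv": "rcv",
    "scv": "scv",
    "variation_id": "variation_id",
    "chromosome": "chrom",
    "position": "pos",
    "ref": "ref",
    "alt": "alt",
}

MAX_RESULTS = 100  # the interface limits results to 100 entries per search


def build_query(filters, table="read_parquet('predictions.parquet')"):
    """Return (sql, params) using only the non-empty filters.

    Criteria are combined with AND (an assumption), and values are passed
    as bound parameters rather than interpolated into the SQL string.
    """
    clauses, params = [], []
    for field, value in filters.items():
        if value is None or str(value).strip() == "":
            continue  # empty field: ignore this criterion
        if field not in FIELD_TO_COLUMN:
            raise KeyError(f"unknown search field: {field}")
        value = str(value).strip()
        if field == "position":
            value = int(value)  # positions are compared as integers
        clauses.append(f"{FIELD_TO_COLUMN[field]} = ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "TRUE"
    sql = f"SELECT * FROM {table} WHERE {where} LIMIT {MAX_RESULTS}"
    return sql, params


sql, params = build_query({"gene": "BRCA1", "vcv": "", "position": "43045712"})
print(sql)
print(params)
```

With DuckDB installed, the resulting pair could be run as `duckdb.connect().execute(sql, params).fetchall()`, assuming the Parquet file is available locally under the placeholder name used here.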
Results
Search Results (click a Comment cell to see the full text)
Column Descriptions
- Gene: Gene symbol
- Variant: Genomic coordinates (GRCh38) in format chr:pos ref>alt
- AA Change: Amino acid change (e.g., H1860P)
- Prediction: Model's predicted label (P/LP, VUS, or B/LB)
- Model Prediction Probabilities: P/LP, VUS, and B/LB probabilities (displayed one per line)
- Clinical Significance: ClinVar's clinical significance
- Submission Classification: Original submission classification
- Conflicting Submissions: True if this variant has conflicting classifications across submissions (some benign, some pathogenic)
- ClinVar Identifiers: VCV, RCV, SCV
- Comment: Submission comment or evidence text (click the cell to view the full text)
Results are limited to 100 entries per search.
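The Variant column format described above (chr:pos ref>alt) can be split back into its parts with a short parser. This is a sketch under assumptions: whether the chromosome carries a `chr` prefix and how the fields are separated are not specified here, so the pattern accepts both prefixed and unprefixed forms, and the example value is illustrative only.

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# Assumed layout: optional "chr" prefix, colon, position, whitespace, ref>alt.
_VARIANT_RE = re.compile(
    r"^(?:chr)?(?P<chrom>[0-9]{1,2}|X|Y|MT?):(?P<pos>\d+)\s+"
    r"(?P<ref>[ACGT]+)>(?P<alt>[ACGT]+)$",
    re.IGNORECASE,
)


def parse_variant(text: str) -> Variant:
    """Parse a 'chr:pos ref>alt' string into its four components."""
    match = _VARIANT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in 'chr:pos ref>alt' format: {text!r}")
    return Variant(
        chrom=match["chrom"].upper(),
        pos=int(match["pos"]),
        ref=match["ref"].upper(),
        alt=match["alt"].upper(),
    )


print(parse_variant("chr17:43045712 A>G"))
```

The parsed fields line up with the Chromosome, Position (GRCh38), Reference Allele, and Alternate Allele search inputs.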
Example Searches
Try these examples:
| Gene Symbol | ClinVar VCV | ClinVar RCV | ClinVar SCV | ClinVar VariationID | Chromosome | Position (GRCh38) | Reference Allele | Alternate Allele |
|---|---|---|---|---|---|---|---|---|
View overall statistics about the prediction dataset.
About This Tool
This web interface allows you to search and view ClinVarBERT model predictions for ClinVar variants.
Model Information
- Model: ClinVarBERT - A BERT-based model for variant pathogenicity prediction
- Training: Fine-tuned on ClinVar variant submissions with clinical evidence text
- Output: Classifies variants into three categories:
  - P/LP: Pathogenic/Likely Pathogenic
  - VUS: Variant of Uncertain Significance
  - B/LB: Benign/Likely Benign
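To illustrate how the three class probabilities relate to a single label, the sketch below takes the highest-probability class and formats the probabilities one per line. That the Prediction column is simply the argmax of the three probabilities is an assumption, and the function names are hypothetical.

```python
LABELS = ("P/LP", "VUS", "B/LB")


def predicted_label(probs: dict) -> str:
    """Return the class with the highest probability (assumed argmax rule).

    Ties are broken in LABELS order.
    """
    missing = [label for label in LABELS if label not in probs]
    if missing:
        raise ValueError(f"missing probabilities for: {missing}")
    return max(LABELS, key=lambda label: probs[label])


def format_probabilities(probs: dict) -> str:
    """Render the three probabilities one per line, as in the results table."""
    return "\n".join(f"{label}: {probs[label]:.3f}" for label in LABELS)


example = {"P/LP": 0.7, "VUS": 0.2, "B/LB": 0.1}  # made-up values
print(predicted_label(example))
print(format_probabilities(example))
```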
Dataset
- Size: ~540 MB Parquet file
- Format: Parquet (efficient columnar storage)
- License: CC BY 4.0
- Source: ClinVar variant submissions with model predictions
Features
- Multi-criteria search (gene, ClinVar identifiers, coordinates)
- Fast queries using DuckDB
- Probability distributions for each prediction
- Comparison with ClinVar labels
- Dataset statistics
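The comparison with ClinVar labels can be pictured as collapsing ClinVar's clinical significance terms into the model's three classes and counting agreements. The following is a rough sketch only: the row keys (`clinical_significance`, `prediction`) and the mapping are assumptions, and terms outside the mapping (for example, conflicting classifications) are counted separately rather than compared.

```python
from collections import Counter

# Assumed mapping from ClinVar clinical significance terms to the three classes.
CLINVAR_TO_CLASS = {
    "pathogenic": "P/LP",
    "likely pathogenic": "P/LP",
    "pathogenic/likely pathogenic": "P/LP",
    "uncertain significance": "VUS",
    "benign": "B/LB",
    "likely benign": "B/LB",
    "benign/likely benign": "B/LB",
}


def agreement_summary(rows):
    """Count how often the model prediction matches the mapped ClinVar label."""
    counts = Counter()
    for row in rows:
        term = row["clinical_significance"].strip().lower()
        clinvar_class = CLINVAR_TO_CLASS.get(term)
        if clinvar_class is None:
            counts["unmapped"] += 1  # term has no three-class equivalent
        elif clinvar_class == row["prediction"]:
            counts["agree"] += 1
        else:
            counts["disagree"] += 1
    compared = counts["agree"] + counts["disagree"]
    return {
        "agree": counts["agree"],
        "disagree": counts["disagree"],
        "unmapped": counts["unmapped"],
        "agreement_rate": counts["agree"] / compared if compared else None,
    }


# Made-up rows for illustration only.
rows = [
    {"clinical_significance": "Pathogenic", "prediction": "P/LP"},
    {"clinical_significance": "Likely benign", "prediction": "VUS"},
    {"clinical_significance": "Uncertain significance", "prediction": "VUS"},
    {"clinical_significance": "Conflicting classifications of pathogenicity", "prediction": "VUS"},
]
print(agreement_summary(rows))
```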
Citation
If you use this tool or data, please cite the associated publication.
Links
- Model on Hugging Face
- Dataset on Figshare
- GitHub Repository
License
This tool is part of the ClinVar LLM project.