3. Data, Science & AI

24 skills

Found 9989 skills

Total Stars: 6.7M
Avg Stars: 667

nemo-curator

davila7

18.0K

GPU-accelerated data curation tool for preparing high-quality training datasets for LLMs, featuring deduplication, quality filtering, and content safety checks.

RAPIDS
LLM
Data Curation

fluidsim

davila7

18.0K

Framework for Python-based computational fluid dynamics simulations, supporting Navier-Stokes equations, turbulence analysis, and HPC with FFT methods.

Navier-Stokes
FFT
Pseudospectral

fine-tuning-with-trl

davila7

18.0K

Fine-tunes LLMs using RLHF techniques (SFT, DPO, PPO) with HuggingFace Transformers for preference alignment and reward optimization.

TRL
RLHF
HuggingFace

sparse-autoencoder-training

davila7

18.0K

Guides training and analysis of Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable features for model analysis.

Sparse Autoencoders
SAELens
Interpretable Features

optimizing-attention-flash

davila7

18.0K

Accelerates transformer training/inference with 2-4x speedup and 10-20x memory reduction using Flash Attention for long sequences.

Flash Attention
PyTorch
H100

torch-geometric

davila7

18.0K

Provides tools for building and training Graph Neural Networks (GNNs) for node classification, link prediction, and molecular property prediction using PyTorch Geometric.

PyTorch Geometric
Graph Neural Networks

pennylane

davila7

18.0K

Python library for quantum machine learning, quantum circuit design, and hybrid quantum-classical model training with automatic differentiation and PyTorch integration.

Quantum Circuits
Automatic Differentiation
PyTorch

diffdock

davila7

18.0K

Predicts protein-ligand binding poses and confidence scores using diffusion models, supporting PDB and SMILES inputs for structure-based drug design.

Diffusion Models
Molecular Docking

segment-anything-model

davila7

18.0K

Provides zero-shot image segmentation using points, boxes, or masks as prompts, or automatically generates all object masks in an image.

Image Segmentation
Zero-Shot
Prompt-Based

exploratory-data-analysis

davila7

18.0K

Automates exploratory data analysis for scientific datasets, detecting file types and generating structured reports with quality metrics and recommendations.

Exploratory Data Analysis
Scientific Data
Data Quality Metrics

drugbank-database

davila7

18.0K

Provides access to and analysis of comprehensive drug data from DrugBank, including properties, interactions, targets, and pharmacology for research and discovery.

DrugBank
Pharmacology
Drug-Drug Interactions

geniml

davila7

18.0K

Enables machine learning analysis of genomic regions using BED files, including region embeddings and scATAC-seq processing.

BED files
Region2Vec
scATAC-seq

zinc-database

davila7

18.0K

Accesses the ZINC database for drug discovery, enabling compound searches by ID, SMILES, or similarity, plus 3D structure analysis for virtual screening.

ZINC
SMILES
Virtual Screening

cobrapy

davila7

18.0K

Enables constraint-based metabolic modeling with FBA, FVA, gene knockouts, and SBML support for systems biology and metabolic engineering analysis.

COBRA
FBA
SBML

senior-prompt-engineer

davila7

18.0K

Provides advanced prompt engineering for LLM optimization, RAG, agent design, and structured outputs to enhance AI product performance.

RAG
Agent Design
Chain-of-Thought

biorxiv-database

davila7

18.0K

Searches bioRxiv preprints by keyword, author, or date range, retrieving metadata and PDFs for scientific literature reviews.

bioRxiv
preprint
life sciences

uniprot-database

davila7

18.0K

Provides direct REST API access for protein data retrieval from UniProt, including searches, FASTA sequences, and ID mapping for bioinformatics workflows.

UniProt
REST API
FASTA

llama-factory

davila7

18.0K

Provides a no-code web interface for fine-tuning large language models, with quantization support and multimodal capabilities.

LLM Fine-tuning
QLoRA
Multimodal

labarchive-integration

davila7

18.0K

Provides API integration for electronic lab notebooks (ELN) to manage entries, attachments, and workflows with scientific tools including Jupyter and REDCap.

ELN
Jupyter
REDCap

pubmed-database

davila7

18.0K

Provides direct REST API access to PubMed for querying biomedical literature, supporting advanced Boolean/MeSH queries, batch processing, and citation management.

PubMed
REST API
MeSH

scientific-schematics

davila7

18.0K

Creates publication-quality scientific diagrams with AI-driven refinement for neural networks, biological pathways, and complex visualizations.

Neural Networks
Biological Pathways
Scientific Visualization

alphafold-database

davila7

18.0K

Accesses AlphaFold's database of AI-predicted protein structures, enabling retrieval by UniProt ID and analysis of confidence metrics for structural biology research.

AlphaFold
UniProt
pLDDT

seaborn

davila7

18.0K

Statistical visualization library for creating scatter, box, violin, heatmap, and regression plots for exploratory data analysis and publication-ready figures.

Seaborn
Data Visualization
Exploratory Data Analysis

sentencepiece

davila7

18.0K

Language-independent text tokenizer using BPE and Unigram algorithms, optimized for speed and multilingual support in AI models.

BPE
Unigram
Tokenization