A curated list of resources dedicated to Natural Language Processing
Please read the contribution guidelines before contributing, and add your favourite NLP resource by raising a pull request.
This list covers natural language processing — linguistic analysis, multilingual tooling, classical and neural methods, datasets, and evaluation. Large language models are included only where they advance or evaluate a core NLP task or capability (tokenization, multilinguality, MT, summarization, NER, QA, factuality, probing, distillation). General-purpose chatbots, agent frameworks, prompt-template repositories, code-generation tools, and RAG application starter kits live in other lists — see See Also.
- Research Summaries and Trends
- Prominent NLP Research Labs
- Tutorials
- Libraries
- Services
- Annotation Tools
- Tasks and Methods
- Text Embeddings
- Tokenization, Morphology, and Segmentation
- POS Tagging and Dependency Parsing
- Named Entity Recognition and Information Extraction
- Coreference Resolution
- Text Classification and Sentiment Analysis
- Topic Modeling
- Summarization
- Machine Translation
- Question Answering and Reading Comprehension
- Information Extraction Beyond NER
- Retrieval and Embeddings
- Speech and Text
- Datasets
- Multilingual NLP Frameworks
- Language Models for NLP
- Pretraining and Adaptation
- Multilingual and Cross-Lingual Models
- Evaluation and Benchmarks
- Reasoning and Test-Time Compute
- Long Context and Alternative Architectures
- Factuality, Hallucination, Calibration
- Probing and Interpretability
- Efficient and Small Language Models
- Instruction Tuning and Preference Optimization
- Bias, Fairness, Safety in NLP
- NLP per Language
- See Also
- Citation
Where to follow current NLP research:
- ACL Anthology - canonical archive of papers from ACL, EMNLP, NAACL, EACL, COLING, and related venues.
- NLP-Progress - tracks state-of-the-art results across common NLP tasks and datasets.
- Papers With Code: NLP - papers, benchmarks, and leaderboards for NLP tasks.
- Sebastian Ruder's newsletter - regular roundups of NLP research and trends.
- ACL Rolling Review - the rolling review process feeding ACL-affiliated venues.
- The Gradient - long-form essays on ML and NLP research.
- Visual NLP Paper Summaries - illustrated summaries of recent papers.
- NLP's ImageNet moment has arrived - 2018 essay on the rise of pretrained language models.
- Survey of the State of the Art in Natural Language Generation - 2017 NLG survey.
- The Illustrated Transformer and The Illustrated BERT, ELMo, and co. - canonical visual explanations.
- The Berkeley NLP Group - Notable contributions include a tool to reconstruct long-dead languages, referenced here, which takes corpora from 637 languages currently spoken in Asia and the Pacific and reconstructs their common ancestor.
- Language Technologies Institute, Carnegie Mellon University - Notable projects include the Avenue Project, a syntax-driven machine translation system for endangered languages like Quechua and Aymara, and previously Noah's Ark, which created AQMAR to improve NLP tools for Arabic.
- NLP research group, Columbia University - Responsible for creating BOLT (interactive error handling for speech translation systems) and an unnamed project to characterize laughter in dialogue.
- The Center for Language and Speech Processing, Johns Hopkins University - Recently in the news for developing speech recognition software to create a diagnostic test for Parkinson's disease, here.
- Computational Linguistics and Information Processing Group, University of Maryland - Notable contributions include Human-Computer Cooperation for Word-by-Word Question Answering and modeling the development of phonetic representations.
- Penn Natural Language Processing, University of Pennsylvania - famous for creating the Penn Treebank and the Penn Discourse Treebank.
- The Stanford Natural Language Processing Group - One of the top NLP research labs in the world, notable for creating Stanford CoreNLP and their coreference resolution system.
General Machine Learning
- Machine Learning 101 from Google's Senior Creative Engineer explains Machine Learning for engineers and executives alike
- AI Playbook - a16z AI playbook is a great link to forward to your managers or content for your presentations
- Sebastian Ruder's Newsletter for commentary on the best of NLP research.
- How To Label Data guide to managing larger linguistic annotation projects
- Depends on the Definition collection of blog posts covering a wide array of NLP topics with detailed implementation
Introductions and Guides to NLP
- Understand & Implement Natural Language Processing
- NLP in Python - Collection of Github notebooks
- Natural Language Processing: An Introduction - Oxford
- NLP from Scratch with PyTorch
- Hands-On NLTK Tutorial - NLTK Tutorials, Jupyter notebooks
- Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit - An online and print book introducing NLP concepts using NLTK. The book's authors also wrote the NLTK library.
- Train a new language model from scratch - Hugging Face 🤗
- Advanced NLP with spaCy - Free online course covering text processing, large-scale data analysis, processing pipelines, and training neural network models for custom NLP tasks.
- Kaggle NLP Learning Guide - Beginner-friendly tutorials including getting started guides, deep learning for NLP, and visual explanations of techniques like BERT, GloVe, and TF-IDF.
Blogs and Newsletters
- Deep Learning, NLP, and Representations
- The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) and The Illustrated Transformer
- Natural Language Processing by Hal Daumé III
- arXiv: Natural Language Processing (Almost) from Scratch
- Karpathy's The Unreasonable Effectiveness of Recurrent Neural Networks
- Machine Learning Mastery: Deep Learning for Natural Language Processing
- Visual NLP Paper Summaries
- Advanced Natural Language Processing - CS 685, UMass Amherst CS
- Deep Natural Language Processing - Lectures series from Oxford
- Deep Learning for Natural Language Processing (CS224n) - Richard Socher and Christopher Manning's Stanford Course
- Neural Networks for NLP - Carnegie Mellon Language Technology Institute
- Deep NLP Course by Yandex Data School, covering key ideas from text embeddings to machine translation, including sequence modeling and language models.
- fast.ai Code-First Intro to Natural Language Processing - This covers a blend of traditional NLP topics (including regex, SVD, naive bayes, tokenization) and recent neural network approaches (including RNNs, seq2seq, GRUs, and the Transformer), as well as addressing urgent ethical issues, such as bias and disinformation. Find the Jupyter Notebooks here
- Machine Learning University - Accelerated Natural Language Processing - Lectures go from introduction to NLP and text processing to Recurrent Neural Networks and Transformers. Material can be found here.
- Applied Natural Language Processing - Lecture series from IIT Madras, taking you from the basics all the way to autoencoders. The GitHub notebooks for this course are also available here
- DeepLearning.AI Natural Language Processing Specialization - 4-course program covering sentiment analysis, word embeddings, RNNs, LSTMs, attention mechanisms, and Transformer models like BERT and T5 for tasks including machine translation and summarization.
- Stanford CS336: Language Modeling from Scratch - end-to-end course on building language models, including data, tokenization, training, and evaluation.
- Stanford CS25: Transformers United - seminar series with guest lectures from authors of recent transformer and NLP research.
- Cohere LLM University - free course on LLMs, embeddings, semantic search, and NLP applications.
- Hugging Face NLP Course - hands-on NLP with Transformers, Datasets, and Tokenizers libraries.
- Speech and Language Processing - free, by Prof. Dan Jurafsky
- Natural Language Processing - free, NLP notes by Dr. Jacob Eisenstein at GeorgiaTech
- NLP with PyTorch - Delip Rao and Brian McMahan
- Text Mining in R
- Natural Language Processing with Python
- Practical Natural Language Processing
- Natural Language Processing with Spark NLP
- Deep Learning for Natural Language Processing by Stephan Raaijmakers
- Real-World Natural Language Processing - by Masato Hagiwara
- Natural Language Processing in Action, Second Edition - by Hobson Lane and Maria Dyshel
- Transformers in Action - by Nicole Koenigstein
- The Math Behind Artificial Intelligence - by Tiago Monteiro | A free FreeCodeCamp book teaching the math behind AI in plain English from an engineering point of view. It covers linear algebra, calculus, probability & statistics, and optimization theory with analogies, real-life applications, and Python code examples.
Node.js and JavaScript - Node.js Libraries for NLP | Back to Top
- Twitter-text - A JavaScript implementation of Twitter's text processing library
- Knwl.js - A Natural Language Processor in JS
- Retext - Extensible system for analyzing and manipulating natural language
- NLP Compromise - Natural Language processing in the browser
- Natural - general natural language facilities for node
- Poplar - A web-based annotation tool for natural language processing (NLP)
- NLP.js - An NLP library for building bots
- node-question-answering - Fast and production-ready question answering w/ DistilBERT in Node.js
Python - Python NLP Libraries | Back to Top
- sentimental-onix - Sentiment models for spaCy using ONNX
- TextAttack - Adversarial attacks, adversarial training, and data augmentation in NLP
- TextBlob - Providing a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of Natural Language Toolkit (NLTK) and Pattern, and plays nicely with both 👍
- spaCy - Industrial strength NLP with Python and Cython 👍
- textacy - Higher level NLP built on spaCy
- gensim - Python library to conduct unsupervised semantic modelling from plain text 👍
- scattertext - Python library to produce d3 visualizations of how language differs between corpora
- GluonNLP (archived) - A deep learning toolkit for NLP, built on MXNet/Gluon.
- AllenNLP (archived) - An NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.
- PyTorch-NLP - NLP research toolkit designed to support rapid prototyping with better data loaders, word vector loaders, neural network layer representations, common NLP metrics such as BLEU
- Rosetta - Text processing tools and wrappers (e.g. Vowpal Wabbit)
- PyNLPl - Python Natural Language Processing Library. General purpose NLP library for Python, handles some specific formats like ARPA language models, Moses phrasetables, GIZA++ alignments.
- foliapy - Python library for working with FoLiA, an XML format for linguistic annotation.
- PySS3 - Python package implementing the SS3 white-box text classifier; ships with interactive visualization tools that explain predictions.
- jPTDP - A toolkit for joint part-of-speech (POS) tagging and dependency parsing. jPTDP provides pre-trained models for 40+ languages.
- BigARTM - a fast library for topic modelling
- Snips NLU - A production ready library for intent parsing
- Chazutsu - A library for downloading & parsing standard NLP research datasets
- Word Forms - Word forms can accurately generate all possible forms of an English word
- Multilingual Latent Dirichlet Allocation (LDA) - A multilingual and extensible document clustering pipeline
- Natural Language Toolkit (NLTK) - A library containing a wide variety of NLP functionality, supporting over 50 corpora.
- NLP Architect - A library for exploring the state-of-the-art deep learning topologies and techniques for NLP and NLU
- Flair - A very simple framework for state-of-the-art multilingual NLP built on PyTorch. Includes BERT, ELMo and Flair embeddings.
- Kashgari - Simple, Keras-powered multilingual NLP framework, allows you to build your models in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS) and text classification tasks. Includes BERT and word2vec embedding.
- FARM - Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
- Haystack - End-to-end Python framework for building natural language search interfaces to data. Leverages Transformers and the State-of-the-Art of NLP. Supports DPR, Elasticsearch, HuggingFace’s Modelhub, and much more!
- Rita DSL - a DSL, loosely based on RUTA on Apache UIMA. Allows you to define language patterns (rule-based NLP) which are then translated into spaCy or, if you prefer fewer features and something lightweight, regex patterns.
- Transformers - Natural Language Processing for TensorFlow 2.0 and PyTorch.
- Tokenizers - Tokenizers optimized for Research and Production.
- fairseq - Facebook AI Research implementations of SOTA seq2seq models in PyTorch.
- corex_topic - Hierarchical Topic Modeling with Minimal Domain Knowledge
- Sockeye - Neural Machine Translation (NMT) toolkit that powers Amazon Translate.
- DL Translate - A deep learning-based translation library for 50 languages, built on Hugging Face Transformers and Facebook's mBART Large.
- Jury - Evaluation of NLP model outputs offering various automated metrics.
- python-ucto - Unicode-aware regular-expression based tokenizer for various languages. Python binding to C++ library, supports FoLiA format.
- Pearmut - Human annotation tool for multilingual NLP tasks, such as machine translation.
- Stanza - Stanford NLP's Python toolkit for tokenization, POS, lemma, dependency parsing, and NER across 70+ languages.
- Sentence-Transformers - sentence/document embeddings, semantic search, and re-ranking; current standard for retrieval-style NLP.
- Argilla - open-source data annotation and feedback collection platform for LLM and NLP datasets.
- HuggingFace Datasets - standardized loaders and processing for thousands of NLP datasets.
- HuggingFace Evaluate - reference implementations for NLP metrics.
- sacrebleu - reproducible BLEU/chrF/TER scoring for machine translation.
- COMET - learned MT metrics, current de-facto standard.
- LangTest - 60+ test types for NLP model robustness, bias, and fairness.
C++ - C++ Libraries | Back to Top
- InsNet - A neural network library for building instance-dependent NLP models with padding-free dynamic batching.
- MIT Information Extraction Toolkit - C, C++, and Python tools for named entity recognition and relation extraction
- CRF++ - Open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data & other Natural Language Processing tasks.
- CRFsuite - CRFsuite is an implementation of Conditional Random Fields (CRFs) for labeling sequential data.
- BLLIP Parser - BLLIP Natural Language Parser (also known as the Charniak-Johnson parser)
- colibri-core - C++ library, command line tools, and Python binding for extracting and working with basic linguistic constructions such as n-grams and skipgrams in a quick and memory-efficient way.
- ucto - Unicode-aware regular-expression based tokenizer for various languages. Tool and C++ library. Supports FoLiA format.
- libfolia - C++ library for the FoLiA format
- frog - Memory-based NLP suite developed for Dutch: PoS tagger, lemmatiser, dependency parser, NER, shallow parser, morphological analyzer.
- MeTA - ModErn Text Analysis: a C++ data sciences toolkit for mining big text data.
- MeCab (Japanese)
- Moses
- StarSpace - a library from Facebook for creating embeddings of word-level, paragraph-level, document-level and for text classification
- QSMM - adaptive probabilistic top-down and bottom-up parsers
Java - Java NLP Libraries | Back to Top
- Stanford NLP
- OpenNLP
- NLP4J
- Word2vec in Java
- ReVerb Web-Scale Open Information Extraction
- OpenRegex An efficient and flexible token-based regular expression language and engine.
- CogcompNLP - Core libraries developed in the U of Illinois' Cognitive Computation Group.
- MALLET - MAchine Learning for LanguagE Toolkit - package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
- RDRPOSTagger - A robust POS tagging toolkit available (in both Java & Python) together with pre-trained models for 40+ languages.
Scala - Scala NLP Libraries | Back to Top
- Saul - Library for developing NLP systems, including built in modules like SRL, POS, etc.
- ATR4S - Toolkit with state-of-the-art automatic term recognition methods.
- tm - Implementation of topic modeling based on regularized multilingual PLSA.
- word2vec-scala - Scala interface to word2vec model; includes operations on vectors like word-distance and word-analogy.
- Epic - Epic is a high performance statistical parser written in Scala, along with a framework for building complex structured prediction models.
- Spark NLP - Spark NLP is a natural language processing library built on top of Apache Spark ML that provides simple, performant & accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment.
R - R NLP Libraries | Back to Top
- text2vec - Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
- wordVectors - An R package for creating and exploring word2vec and other word embedding models
- RMallet - R package to interface with the Java machine learning tool MALLET
- dfr-browser - Creates d3 visualizations for browsing topic models of text in a web browser.
- dfrtopics - R package for exploring topic models of text.
- sentiment_classifier - Sentiment Classification using Word Sense Disambiguation and WordNet Reader
- jProcessing - Japanese Natural Language Processing Libraries, with Japanese sentiment classification
- corporaexplorer - An R package for dynamic exploration of text collections
- tidytext - Text mining using tidy tools
- spacyr - R wrapper to spaCy NLP
- CRAN Task View: Natural Language Processing
Clojure - Clojure NLP Libraries | Back to Top
- Clojure-openNLP - Natural Language Processing in Clojure (opennlp)
- Inflections-clj - Rails-like inflection library for Clojure and ClojureScript
- postagga - A library to parse natural language in Clojure and ClojureScript
Rust - Rust NLP Libraries | Back to Top
- whatlang — Natural language recognition library based on trigrams
- rust-bert - Ready-to-use NLP pipelines and Transformer-based models
- snips-nlu-rs (archived — Snips was discontinued) - A production ready library for intent parsing
NLP++ - NLP++ Language | Back to Top
- VSCode Language Extension - NLP++ Language Extension for VSCode
- nlp-engine - NLP++ engine to run NLP++ code on Linux including a full English parser
- VisualText - Homepage for the NLP++ Language
- NLP++ Wiki - Wiki entry for the NLP++ language
Julia - Julia NLP Libraries | Back to Top
- CorpusLoaders - A variety of loaders for various NLP corpora
- Languages - A package for working with human languages
- TextAnalysis - Julia package for text analysis
- TextModels - Neural Network based models for Natural Language Processing
- WordTokenizers - High performance tokenizers for natural language processing and other related tasks
- Word2Vec - Julia interface to word2vec
NLP as API with higher level functionality such as NER, Topic tagging and so on | Back to Top
- Wit-ai - Natural Language Interface for apps and devices
- IBM Watson's Natural Language Understanding - API and Github demo
- Amazon Comprehend - NLP and ML suite covers most common tasks like NER, tagging, and sentiment analysis
- Google Cloud Natural Language API - Syntax analysis, NER, sentiment analysis, and content tagging in at least 9 languages, including English and Chinese (Simplified and Traditional).
- ParallelDots - High level Text Analysis API Service ranging from Sentiment Analysis to Intent Analysis
- Microsoft Cognitive Service
- TextRazor
- Rosette
- Textalytic - Natural Language Processing in the Browser with sentiment analysis, named entity extraction, POS tagging, word frequencies, topic modeling, word clouds, and more
- NLP Cloud - SpaCy NLP models (custom and pre-trained ones) served through a RESTful API for named entity recognition (NER), POS tagging, and more.
- Cloudmersive - Unified and free NLP APIs that perform actions such as speech tagging, text rephrasing, language translation/detection, and sentence parsing
- GATE - General Architecture for Text Engineering, 15+ years old, free and open source
- Anafora is a free and open-source, web-based raw text annotation tool
- brat - brat rapid annotation tool is an online environment for collaborative text annotation
- doccano - doccano is free, open-source, and provides annotation features for text classification, sequence labeling and sequence to sequence
- INCEpTION - A semantic annotation platform offering intelligent assistance and knowledge management
- Prodigy is an annotation tool powered by active learning, costs $
- LightTag - Hosted and managed text annotation tool for teams, costs $
- rstWeb - open source local or online tool for discourse tree annotations
- GitDox - open source server annotation tool with GitHub version control and validation for XML data and collaborative spreadsheet grids
- Datasaur supports various NLP tasks for individuals or teams, freemium based
- Konfuzio - team-first hosted and on-prem text, image and PDF annotation tool powered by active learning, freemium based, costs $
- UBIAI - Easy-to-use text annotation tool for teams with the most comprehensive auto-annotation features. Supports NER, relations, and document classification, as well as OCR annotation for invoice labeling, costs $
- Shoonya - Shoonya is a free and open-source data annotation platform with a wide variety of organization- and workspace-level management features. Shoonya is data agnostic and can be used by teams to annotate data at scale with various levels of verification stages.
- Annotation Lab - Free End-to-End No-Code platform for text annotation and DL model training/tuning. Out-of-the-box support for Named Entity Recognition, Classification, Relation extraction and Assertion Status Spark NLP models. Unlimited support for users, teams, projects, documents. Not FOSS.
- FLAT - FLAT is a web-based linguistic annotation environment based around the FoLiA format, a rich XML-based format for linguistic annotation. Free and open source.
- Argilla - open-source platform for collecting human feedback, building NLP and LLM datasets, and curating preference data.
- Label Studio - open-core multi-modal labeling platform; widely used for NLP labeling.
NLP tasks organized by linguistic problem. Each subsection lists foundational/classical work first, then neural approaches, then LLM-based methods where relevant. For modern LM-specific research (pretraining, evaluation, retrieval, reasoning, etc.) see Language Models for NLP.
Static word embeddings (foundational):
- word2vec - implementation - explainer blog
- GloVe - explainer blog
- fastText - implementation; subword n-grams handle OOV well, still useful for low-resource languages.
- sense2vec - word sense disambiguation.
- Paragraph Vectors / doc2vec
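The word-analogy arithmetic these static embeddings popularized ("king - man + woman ≈ queen") is just vector addition plus cosine similarity. A minimal stdlib sketch with hand-made 3-d vectors (illustrative toy values, not trained embeddings):

```python
import math

# Toy 3-d "embeddings" chosen by hand so the analogy works;
# real word2vec/GloVe vectors have 100-300 dimensions.
vectors = {
    "king":  [0.8, 0.9, 0.1],
    "queen": [0.8, 0.1, 0.9],
    "man":   [0.2, 0.9, 0.1],
    "woman": [0.2, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word w maximizing cosine(vec(b) - vec(a) + vec(c), vec(w))."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(analogy("man", "woman", "king"))  # -> queen
```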
Contextual embeddings:
- ELMo - deep contextualized word representations.
- CoVe - contextualized vectors learned from MT.
- ULMFiT - language-model fine-tuning for text classification.
- InferSent - sentence representations from NLI.
Modern sentence and document embeddings: see Retrieval for NLP (Sentence-Transformers, E5, BGE-M3, Nomic, GritLM) and MTEB for current leaderboards.
- SentencePiece - language-agnostic subword tokenization.
- BPE and Unigram LM - the two dominant subword schemes.
- Stanza - tokenization, lemma, and morphology for 70+ languages.
- UDPipe - tokenization, tagging, lemmatization, parsing for Universal Dependencies.
- Morfessor - unsupervised morphological segmentation.
Tokenizer research and architecture (also see Language Models):
- Byte-Pair Encoding (Sennrich et al.) - subword units for neural MT; foundation of modern tokenizers.
- SentencePiece - language-agnostic subword tokenization (BPE and Unigram).
- Tokenizers - fast Rust implementations of BPE, WordPiece, Unigram.
- ByT5 - tokenizer-free byte-level model.
- CANINE - tokenization-free encoder operating on Unicode characters.
- How Good is Your Tokenizer? - tokenizer fairness across languages.
- Byte Latent Transformer (BLT) (Meta, 2024) - dynamic byte-level patching that matches BPE-tokenized models at scale; revives the tokenizer-free direction.
- SuperBPE (2025) - superword tokenization that improves on BPE for downstream tasks.
- Over-Tokenized Transformer (ICML 2025) - decouples input and output vocabularies; shows a log-linear relationship between input vocabulary size and training loss, scaling vocabulary independently of model size.
- Foundations of Tokenization (ICLR 2025) - first formal unified framework for tokenizer models using stochastic-map category theory; establishes conditions for statistical consistency.
- The Token Tax: Systematic Bias in Multilingual Tokenization (2025) - quantifies how tokenization fertility predicts model accuracy across languages, exposing structural cost penalties for morphologically complex and low-resource languages.
- Reducing Tokenization Premiums for Low-Resource Languages (2026) - post-hoc vocabulary additions that coalesce multi-token character sequences for low-resource languages, reducing inference cost without retraining.
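The core BPE training loop (Sennrich et al.) is small enough to sketch in plain Python: count adjacent symbol pairs over a word-frequency dictionary, merge the most frequent pair into one symbol, repeat. A toy version, assuming a tiny three-word corpus (real tokenizers such as SentencePiece and Hugging Face Tokenizers add normalization, byte fallback, and fast data structures):

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs

def merge_pair(vocab, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Word frequencies, each word split into characters plus an end-of-word marker.
vocab = {tuple("low") + ("</w>",): 5,
         tuple("lower") + ("</w>",): 2,
         tuple("lowest") + ("</w>",): 3}

merges = []
for _ in range(3):
    most_frequent = get_pair_counts(vocab).most_common(1)[0][0]
    vocab = merge_pair(vocab, most_frequent)
    merges.append(most_frequent)

print(merges)  # first learned merge is ('l', 'o'), seen in all 10 word tokens
```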
- Universal Dependencies - cross-linguistically consistent treebanks, 100+ languages.
- spaCy and Stanza - production parsers across many languages.
- Deep Biaffine Attention for Neural Dependency Parsing - foundational neural parsing architecture.
- Trankit - light-weight transformer-based multilingual NLP toolkit.
- Self-Attentive Constituency Parsing (Kitaev & Klein) - strong neural constituency parser.
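Graph-based parsers like the biaffine model score every candidate (head, dependent) pair and then decode a tree. A minimal sketch of the decoding idea with hand-made scores (real parsers compute the score matrix with a neural network and use an MST algorithm to guarantee a well-formed tree; greedy argmax is shown here for brevity):

```python
# scores[dep][head]: plausibility that `head` governs `dep`.
# Hand-made numbers for the sentence "she ate fish" (illustrative only).
scores = {
    "she":  {"ROOT": 0.10, "ate": 0.80, "fish": 0.10},
    "ate":  {"ROOT": 0.90, "she": 0.05, "fish": 0.05},
    "fish": {"ROOT": 0.10, "she": 0.10, "ate": 0.80},
}

# Greedy decoding: every word independently picks its highest-scoring head.
heads = {dep: max(candidates, key=candidates.get)
         for dep, candidates in scores.items()}
print(heads)  # -> {'she': 'ate', 'ate': 'ROOT', 'fish': 'ate'}
```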
Foundational and neural:
- CoNLL-2003 NER - canonical English NER benchmark.
- Neural Architectures for NER (Lample et al.) - BiLSTM-CRF, the long-time go-to NER architecture.
- Flair - contextual string embeddings, strong NER across languages.
- spaCy NER - production-ready.
Open and instruction-following IE:
- Universal NER - instruction-tuned LM for open-set NER across languages.
- GLiNER (2023) - small, generalist NER model that handles arbitrary entity types at inference.
- GoLLIE - guideline-following information extraction with LMs.
- REBEL - end-to-end relation extraction as seq2seq.
LLM-based:
- GPT-NER - LLMs for named entity recognition.
- Can LLMs Replace Sentence-Level NER? (2024) - cost-quality tradeoffs.
- Generative NER in the Era of LLMs (2026) - eight open LLMs across four NER benchmarks; PEFT with structured outputs matches encoder-based NER.
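NER benchmarks like CoNLL-2003 are scored with exact-match span F1 rather than per-token accuracy: a predicted entity counts only if both its boundaries and its type match the gold span. A simplified scorer over BIO tags (the official conlleval script additionally handles malformed tag sequences):

```python
def extract_spans(tags):
    """Turn a BIO tag sequence into a set of (type, start, end) spans."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.add((etype, start, i))
            start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
        # "I-" tags simply continue the current span
    return spans

def span_f1(gold_tags, pred_tags):
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    true_pos = len(gold & pred)
    if not true_pos:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "B-ORG"]  # right span, wrong type
print(span_f1(gold, pred))  # -> 0.5
```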
- End-to-End Neural Coreference (Lee et al.) - foundation for modern neural coreference.
- SpanBERT - span-based pretraining; strong coreference baseline.
- coref-hoi - higher-order inference coreference.
- maverick-coref (2024) - efficient coreference matching the best larger systems.
- LingMess - linguistically-motivated category-based coreference scoring.
LLM-based:
- LLMs for Coreference Resolution - prompting and fine-tuning for coreference.
- Multilingual Coreference Shared Task: Can LLMs Dethrone Traditional Approaches? (2025) - 9 systems across 4 LLM-based and 5 traditional approaches; traditional methods still lead but LLMs are closing the gap.
- fastText classifier - strong, fast linear baseline.
- Sentiment Treebank (SST) - canonical fine-grained sentiment dataset.
- SetFit - few-shot text classification without prompts.
- FastFit - fast few-shot for many-class settings.
- SST / IMDB / AG News with DeBERTa-v3 - current encoder-fine-tuning baseline.
- PySS3 - white-box, interpretable text classifier.
- LLMs as Annotators - using LLMs for text classification labeling, with caveats.
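Before reaching for fine-tuned encoders, the classical baselines listed here (fastText-style linear models, Naive Bayes over bags of words) are worth keeping as a mental model. A tiny multinomial Naive Bayes with add-one smoothing, on a hand-made toy corpus (illustrative only):

```python
import math
from collections import Counter, defaultdict

train = [("loved this movie great acting", "pos"),
         ("great fun loved it", "pos"),
         ("terrible boring plot", "neg"),
         ("boring waste of time terrible", "neg")]

word_counts = defaultdict(Counter)   # per-class word frequencies
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    def log_score(label):
        prior = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        # Add-one (Laplace) smoothing so unseen words don't zero out a class.
        likelihood = sum(
            math.log((word_counts[label][w] + 1) / (total + len(vocab)))
            for w in text.split() if w in vocab)
        return prior + likelihood
    return max(class_counts, key=log_score)

print(predict("loved the acting"))     # -> pos
print(predict("boring and terrible"))  # -> neg
```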
- Latent Dirichlet Allocation (Blei et al.) - foundational topic model.
- gensim - LDA, LSI, HDP in Python.
- BigARTM - fast regularized topic modeling.
- BERTopic - clustering-based topic modeling on top of contextual embeddings; common modern default.
- Top2Vec - jointly learns topic and document vectors.
- CorEx Topic - hierarchical topic modeling with anchor words.
- TextRank - extractive graph-based summarization.
- Pointer-Generator Networks (See et al.) - foundational neural abstractive summarization.
- PEGASUS - gap-sentences pretraining for summarization.
- BART - widely used denoising seq2seq baseline.
- BookSum and SCROLLS - long-document summarization benchmarks.
LLM-based:
- Benchmarking LLMs for News Summarization - LLMs vs fine-tuned summarizers.
- Element-Aware Summarization with LLMs - structured prompting for summarization.
- Understanding LLM Reasoning for Abstractive Summarization (2025) - explicit reasoning improves fluency but hurts factual grounding; longer reasoning budgets can harm faithfulness.
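TextRank's core idea, ranking sentences by how much vocabulary they share with the rest of the document, can be sketched without the full PageRank iteration by scoring each sentence by its total word overlap with the others (a simplification: real TextRank normalizes the similarities and runs PageRank over the resulting graph):

```python
def words(sentence):
    return set(sentence.lower().replace(".", "").split())

def overlap(a, b):
    """Shared-word count; TextRank additionally normalizes by sentence length."""
    return len(words(a) & words(b))

doc = [
    "The model translates text between many languages.",
    "Translation quality is measured with automatic metrics.",
    "The model supports many languages and high quality text.",
    "Cats are popular pets.",
]

# Score each sentence by its total similarity to the others; the most
# "central" sentence becomes a one-line extractive summary.
scores = {s: sum(overlap(s, t) for t in doc if t is not s) for s in doc}
summary = max(doc, key=scores.get)
print(summary)  # -> "The model supports many languages and high quality text."
```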
Statistical and foundational neural:
- Moses - reference statistical MT system.
- Attention Is All You Need - transformer; reset the field.
- Marian NMT - efficient C++ NMT framework.
- Fairseq - PyTorch sequence modeling toolkit.
Massively multilingual:
- NLLB-200 - MT for 200 languages.
- MADLAD-400 - 400+ language MT.
- SeamlessM4T - speech and text MT, 100+ languages.
Evaluation:
- COMET - learned MT metric; current de-facto standard alongside chrF.
- sacrebleu - reproducible BLEU/chrF/TER scoring.
- BERTScore - similarity-based generation metric.
LLM-based:
- Is ChatGPT a Good Translator? - LLMs as machine translation systems.
- Adapting LLMs for Document-Level MT (2024) - LLMs for context-aware translation.
- GPT-4 vs Human Translators - quality comparison on professional MT.
- Multilingual MT with Open LLMs at Practical Scale (2025) - benchmarks sub-10B open LLMs on 28-language MT; matches GPT-4-turbo and Google Translate.
- Bridging the Linguistic Divide: Survey on LLMs for MT (2025) - survey of how instruction-following, in-context learning, and preference alignment have restructured MT methodology.
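Of the metrics above, chrF is simple enough to sketch: it is an F-score over character n-grams of hypothesis and reference. A simplified single-order version (real chrF, as implemented in sacrebleu, averages over n-gram orders 1-6 and weights recall with beta=2):

```python
from collections import Counter

def char_ngrams(text, n):
    text = text.replace(" ", "")  # chrF ignores whitespace by default
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf_single_order(hypothesis, reference, n=2):
    """F1 over character n-grams (simplified chrF: one order, beta=1)."""
    hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
    matched = sum((hyp & ref).values())  # clipped n-gram matches
    if not matched:
        return 0.0
    precision = matched / sum(hyp.values())
    recall = matched / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(chrf_single_order("the cat sat", "the cat sat"))  # -> 1.0
print(chrf_single_order("the cat sat", "a dog ran"))    # -> 0.0
```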
Datasets and foundational systems:
- SQuAD / SQuAD 2.0 - extractive reading comprehension.
- Natural Questions - real-user questions over Wikipedia.
- HotpotQA - multi-hop reasoning.
- TriviaQA - distantly-supervised QA.
- DrQA - open-domain QA over Wikipedia.
- Document-QA - multi-paragraph reading comprehension.
Modern open-domain QA:
- DPR and FiD - retrieve-then-read; the standard pre-LLM open-domain QA pipeline.
- Atlas - retrieval-augmented LM for few-shot QA.
- See also Retrieval for NLP.
LLM-era:
- GPT-4 with retrieval on TriviaQA / NQ
- Self-RAG (2023) - retrieval, generation, and self-critique.
- GAIA - general AI assistant benchmark including multi-step QA.
- OpenIE 6 - schema-free open information extraction.
- Template-Based Information Extraction without the Templates
- Privee: An Architecture for Automatically Analyzing Web Privacy Policies
- REBEL - end-to-end relation extraction.
- DocRED - document-level relation extraction benchmark.
- LLMs for Semantic Role Labeling (2025) - generative LLMs with RAG and self-correction surpass encoder-decoder BERT-style models on SRL in English and Chinese.
- Adapting LLMs for Minimal-edit GEC (2025) - decoder-only LLMs with a novel error-rate adaptation schedule set new SOTA on BEA-test grammatical error correction.
Dense and late-interaction retrieval, increasingly the substrate for QA and IR:
- DPR (Dense Passage Retrieval) - dual-encoder retrieval baseline.
- ColBERT and ColBERTv2 - late-interaction retrieval; strong on out-of-domain data.
- E5 and E5-Mistral - widely used dense embedding families.
- BGE and BGE-M3 (2024) - multilingual, multi-functionality embeddings; top of MTEB across languages.
- Nomic Embed (2024) - fully open, reproducible embedding model.
- Matryoshka Representation Learning - nested embeddings supporting variable dimensionality at inference.
- GritLM (2024) - unified generation and embedding from one model.
- RAG (Retrieval-Augmented Generation) - the original retrieval-augmented framework; foundation for modern QA pipelines.
- Gemini Embedding (2025) - Gemini-derived dense embeddings; SOTA on MMTEB across 250+ languages and on cross-lingual retrieval (XOR-Retrieve, XTREME-UP).
- Qwen3-Embedding (2025) - decoder-based embedding series (0.6B-8B) built on Qwen3; #1 on MTEB Multilingual and MTEB Code, surpassing prior proprietary models.
- Rank1 (2025) - first reranking model trained with test-time compute via DeepSeek-R1 reasoning-trace distillation; SOTA on instruction-following and OOD retrieval.
- ReasonEmbed (2025) - embedding model for reasoning-intensive retrieval with ReMixer data synthesis and Redapter adaptive training; record nDCG@10 of 38.1 on BRIGHT.
- ColBERT-Att (2026) - extends late-interaction retrieval by integrating query and document attention weights into ColBERT scoring; improves recall on MS-MARCO, BEIR, and LoTTE.
Embedding and retrieval benchmarks:
- MMTEB (2025) - community expansion of MTEB to 500+ tasks across 250+ languages.
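As a sketch of how dual-encoder retrieval and Matryoshka-style truncation interact: passages are ranked by cosine similarity to the query embedding, and nested embeddings let the same ranking be re-run on just a prefix of the dimensions. The vectors below are toy values, not real model outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy embeddings standing in for dual-encoder outputs.
query = [0.9, 0.1, 0.3, 0.2]
passages = {
    "p1": [0.8, 0.2, 0.3, 0.1],
    "p2": [0.1, 0.9, 0.0, 0.4],
}

# Full-dimensional ranking, as in DPR-style dense retrieval.
ranked = sorted(passages, key=lambda p: cosine(query, passages[p]), reverse=True)

# Matryoshka-style truncation: score with only the first 2 dimensions.
# With nested training, short prefixes remain usable for cheaper retrieval.
ranked_k2 = sorted(passages, key=lambda p: cosine(query[:2], passages[p][:2]), reverse=True)
```

In practice the truncated ranking is used for a fast first pass, with full-dimensional scoring reserved for reranking.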
A short pointer set, since this borders adjacent fields:
- Whisper - multilingual ASR; the modern open default.
- SeamlessM4T - unified speech and text translation.
- Canary (NVIDIA, 2024) - top open multilingual ASR model.
- Wav2Vec 2.0 - foundational self-supervised speech pretraining.
- Coqui TTS and VieNeu-TTS - open TTS.
Dataset hubs and lists:
- HuggingFace Datasets Hub - the central index for modern NLP datasets, with versioned, streamable loaders.
- nlp-datasets - large collection of NLP datasets.
- gensim-data - data repository for pretrained NLP models and NLP corpora.
Pretraining-scale corpora (open):
- The Pile - 825 GiB diverse text corpus.
- RedPajama / RedPajama-V2 (2023-2024) - reproductions of LLaMA pretraining data; V2 is 30T tokens with quality signals.
- Dolma (AI2, 2023-2024) - 3T-token open pretraining corpus with documented filtering pipeline.
- FineWeb / FineWeb-Edu (2024) - 15T-token cleaned web corpus; FineWeb-Edu filters for educational quality.
- CulturaX - 6.3T tokens across 167 languages.
- Common Corpus (2024) - 2T-token open-license multilingual corpus.
Task and instruction datasets:
- Universal Dependencies - cross-linguistically consistent treebank annotation, 100+ languages.
- Tülu 3 SFT Mixture (2024) - open instruction-tuning data behind Tülu 3.
- tiny_qa_benchmark_pp - tiny multilingual QA datasets plus a library to generate your own synthetic copies.
- UDPipe - trainable pipeline for tokenization, tagging, lemmatization, and dependency parsing of Universal Dependencies treebanks and other CoNLL-U files. Written primarily in C++, it offers fast, reliable multilingual processing.
- NLP-Cube - NLP pipeline covering sentence splitting, tokenization, lemmatization, part-of-speech tagging, and dependency parsing. Written in Python with DyNet 2.0; offers standalone use (CLI/Python bindings) and a REST API server.
- UralicNLP - NLP library focused on endangered Uralic languages such as the Sami, Mordvin, Mari, and Komi languages; it also supports non-endangered languages such as Finnish, and non-Uralic languages such as Swedish and Arabic. Provides morphological analysis, generation, lemmatization, and disambiguation.
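UDPipe and the Universal Dependencies tooling above consume the CoNLL-U format: comment lines starting with `#`, then one token per line with ten tab-separated columns. A minimal reader, with column names per the UD specification:

```python
def parse_conllu(block):
    """Parse a CoNLL-U sentence block into a list of token dicts."""
    # The ten CoNLL-U columns, in order, per the Universal Dependencies spec.
    cols = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]
    tokens = []
    for line in block.strip().splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        tokens.append(dict(zip(cols, line.split("\t"))))
    return tokens

sample = """# text = Dogs bark.
1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_
2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_
"""
toks = parse_conllu(sample)
```

This ignores multiword-token ranges (IDs like `1-2`) and empty nodes, which a production reader would need to handle.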
Pretrained language models and the research around them, scoped to NLP tasks and linguistic phenomena. For general-purpose LLM tooling, agents, or RAG application kits, see See Also.
Encoders (still the workhorse for classical NLP tasks):
- BERT - bidirectional transformer pretraining; foundation for most encoder-based NLP work since 2018.
- RoBERTa - robustly optimized BERT pretraining; common encoder baseline.
- DeBERTa / DeBERTa-v3 - disentangled attention; strong on classification, NER, NLI.
- ELECTRA - replaced-token-detection pretraining, sample-efficient.
- ModernBERT (2024) - modernized encoder with rotary embeddings, FlashAttention, 8K context; current go-to encoder for classification, NER, retrieval.
- NeoBERT (2025) - 250M-parameter encoder integrating modern architecture improvements (RoPE, 4K context, optimized depth-to-width); state of the art on MTEB, surpasses ModernBERT and RoBERTa-large under identical fine-tuning.
Encoder-decoder and seq2seq:
- T5 and FLAN-T5 - text-to-text framing for NLP tasks; strong instruction-tuned encoder-decoder baselines.
- BART - denoising seq2seq pretraining; widely used for summarization and generation.
Open decoder-only LMs (used as substrate for NLP tasks):
- Llama 3 / 3.1 / 3.3 (Meta, 2024-2025) - widely adopted open-weight family; default base for fine-tuning across NLP tasks.
- Qwen 2.5 / Qwen 3 (Alibaba, 2024-2025) - strong multilingual coverage, especially Chinese; often top open model on multilingual benchmarks.
- DeepSeek-V3 (2024) - efficient MoE pretraining; competitive open base model.
- OLMo 2 (AI2, 2025) - fully open: weights, training data, code; reproducibility benchmark.
- Gemma 2 / Gemma 3 (Google, 2024-2025) - open small/mid-size models with strong NLP-task performance.
- Mistral / Mixtral - efficient dense and sparse-MoE open models.
- What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? - encoder vs decoder vs encoder-decoder for NLP transfer.
- XLM-R - cross-lingual masked LM trained on CommonCrawl, 100 languages.
- mT5 - multilingual T5 covering 101 languages.
- BLOOM - 176B-parameter open multilingual LM, 46 natural languages.
- Aya 23 / Aya Expanse (Cohere For AI, 2024) - massively multilingual instruction-tuned models covering 23-101 languages.
- Glot500 - encoder for 500+ languages, focus on low-resource.
- NLLB-200 - No Language Left Behind: MT for 200 languages.
- MADLAD-400 - 400+ language MT model and 3T-token multilingual corpus.
- SeamlessM4T / Seamless (Meta, 2023-2024) - multilingual and multimodal speech-text translation, 100+ languages.
- SEA-LION / SeaLLM (2024-2025) - LMs targeting Southeast Asian languages.
- Babel (2025) - open multilingual LLMs (9B and 83B) covering the top 25 languages by speaker population (~90% of global speakers); surpasses comparably-sized open multilingual models on XCOPA, XNLI, MGSM, FLORES-200.
- Lugha-Llama (Princeton/Mila, 2025) - Llama-3.1-8B adapted for low-resource African languages via the curated WURA corpus; SOTA open-source results on IrokoBench and AfriQA.
- AfriqueLLM (McGill, 2026) - suite of open LLMs (4B-14B) continued-pretrained on 26B tokens across 20 African languages with a comprehensive empirical study of data mixing.
- TranslateGemma (Google, 2026) - open translation-specialized models built on Gemma 3, covering 55 language pairs via SFT and RL with quality-reward models.
- MiLMMT-46 (Xiaomi, 2026) - open multilingual MT scaled across 46 languages, matching commercial systems like Google Translate and Gemini 3 Pro.
NLU and cross-lingual:
- GLUE and SuperGLUE - English NLU benchmarks.
- XTREME and XGLUE - cross-lingual NLU.
- XNLI - cross-lingual natural language inference, 15 languages.
- FLORES-200 - MT evaluation across 200 languages.
- MTEB - Massive Text Embedding Benchmark; standard for sentence/document encoders.
- BEIR - heterogeneous IR benchmark for retrieval models.
Modern LM evaluation (2023-2026):
- HELM - holistic evaluation across NLP tasks, accuracy and beyond.
- BIG-bench - 200+ tasks probing language model capabilities.
- MMLU - multitask knowledge evaluation across 57 subjects.
- MMLU-Pro (2024) - harder, more discriminative successor to MMLU.
- GPQA - graduate-level Q&A, "Google-proof" reasoning evaluation.
- IFEval - verifiable instruction-following evaluation.
- Chatbot Arena (LMSYS) - human-preference ELO leaderboard for chat models.
- LiveBench (2024) - contamination-resistant benchmark with monthly refresh.
- LM Evaluation Harness - unified framework for LM benchmark evaluation.
- MMLU-ProX (2025) - multilingual extension of MMLU-Pro to 29 typologically diverse languages; reveals up to 24.3% performance gap between high- and low-resource languages.
- MultiChallenge (2025) - multi-turn conversational benchmark exposing simultaneous instruction-following and in-context-reasoning failures; all tested frontier models score below 50%.
- FRAMES (2025) - unified RAG evaluation: 824 multi-hop questions requiring factuality, retrieval accuracy, and cross-document reasoning together.
Long-context evaluation:
- Needle in a Haystack - retrieval probe for long-context windows.
- RULER (2024) - synthetic long-context tasks beyond simple retrieval.
- LongBench - bilingual long-context benchmark across NLP tasks.
- LongBench v2 (2025) - 503 expert-crafted multiple-choice questions spanning 8K-2M-word contexts with deep multi-hop reasoning; humans score 53.7% under time pressure.
- U-NIAH (2025) - extends needle-in-haystack with multi-needle and nested configurations; shows RAG mitigates lost-in-the-middle for smaller LLMs but degrades reasoning models.
A trend-defining direction in 2024-2026: models that produce explicit reasoning traces and benefit from extra inference compute.
- Chain-of-Thought Prompting - foundational result; intermediate reasoning steps improve performance.
- Self-Consistency - majority vote over sampled CoT chains.
- Tree of Thoughts - search over reasoning trees.
- Self-Refine and Reflexion - self-correction at inference time.
- Large Language Models are Zero-Shot Reasoners - chain-of-thought for NLP reasoning tasks.
- Let's Verify Step by Step - process-supervised reward models for reasoning.
- DeepSeek-R1 (2025) - open reasoning model trained with pure RL; replicated o1-style behavior in the open.
- OpenAI o1 / o3 (2024-2025) - test-time-compute reasoning systems.
- Scaling LLM Test-Time Compute Optimally (2024) - systematic study of inference-time compute tradeoffs.
- s1: Simple Test-Time Scaling (2025) - small open reasoning recipe via budget-forcing.
- Kimi k1.5 (2025) - long-context RL with policy optimization (no MCTS, no PRM) reaching o1-level performance; introduces long-CoT distillation into short-CoT models.
- rStar-Math (2025) - small policy model paired with a process preference model trained via MCTS rollouts; enables small LMs to bootstrap reasoning without distilling from larger models.
- DAPO (2025) - open GRPO-based RL training system with four key improvements (decoupled clipping, dynamic sampling, token-level loss, entropy bonus); reproduces and surpasses DeepSeek-R1-Zero-level reasoning.
- VAPO (2025) - value-model-based RL with length-adaptive GAE and token-level clipping; surpasses value-free GRPO methods on AIME 2024 with stable training.
- ThinkPRM (2025) - generative process reward models that produce chain-of-thought verification per step, matching discriminative PRMs with 1% of the supervision labels.
- OpenThoughts (2025) - 1000+ controlled experiments on data recipes for open reasoning models; SOTA on AIME 2025 matching closed distillation baselines.
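Self-consistency, from the list above, is simple enough to state in a few lines: sample several chains-of-thought at nonzero temperature, extract each chain's final answer, and take a majority vote. The sampled answers below are illustrative; a real implementation would call an LM and parse each chain.

```python
from collections import Counter

def self_consistency(final_answers):
    """Majority vote over final answers extracted from sampled CoT chains."""
    winner, _count = Counter(final_answers).most_common(1)[0]
    return winner

# Pretend five chains were sampled at temperature > 0 and their answers parsed out.
sampled = ["42", "41", "42", "42", "17"]
voted = self_consistency(sampled)
```

The vote marginalizes over reasoning paths: individual chains may err, but errors tend to disagree while correct chains tend to converge.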
- Mamba and Mamba-2 - selective state-space models, linear-time long-context alternative to attention.
- RWKV - RNN-transformer hybrid scaling to large parameter counts.
- Jamba (2024) - hybrid Mamba-Transformer-MoE architecture.
- RoPE and YaRN - rotary position embeddings and context-length extension.
- Position Interpolation - extending context windows with minimal fine-tuning.
- Lost in the Middle - long-context degradation patterns in NLP tasks.
- RAG vs Long-Context LLMs (2024) - tradeoffs for QA over long inputs.
- Titans: Learning to Memorize at Test Time (2025) - neural long-term memory module that learns to memorize historical context at test time; scales beyond 2M tokens, outperforms transformers and modern linear-recurrent models on language modeling and reasoning.
- MiniMax-01 (2025) - 456B-parameter hybrid combining lightning (linear) attention with sparse softmax attention; matches GPT-4o-level NLP performance at up to 4M-token inference contexts.
- Native Sparse Attention (NSA) (2025) - trainable sparse attention combining coarse-grained compression with fine-grained selection; large speedups at 64K with no NLP-benchmark degradation.
- LongRoPE2 (2025) - identifies undertraining of high-frequency RoPE dimensions and applies evolutionary-search rescaling; extends LLaMA3-8B to 128K with 80x fewer training tokens than Meta's recipe.
- Characterizing SSM and Hybrid LM Long-Context Performance (2025) - first comprehensive memory and speed analysis of transformer, SSM, and hybrid models up to 220K tokens; SSMs are up to 4x faster, hybrids balance recall and efficiency.
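Rotary position embeddings, central to the context-extension work above (YaRN, Position Interpolation, LongRoPE2), rotate each 2-D slice of a query or key by an angle proportional to its position, so attention dot products depend only on the relative offset between positions. A minimal sketch of that property:

```python
import math

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to a vector of even length."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        # Each (even, odd) pair rotates at its own frequency base**(-i/d).
        angle = pos * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q, k = [1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 1.0, 0.0]
# Relative-position property: offsets (5-3) and (2-0) give the same score.
a = dot(rope(q, 3), rope(k, 5))
b = dot(rope(q, 0), rope(k, 2))
```

Context-extension methods like Position Interpolation and YaRN work by rescaling the per-pair angles so positions beyond the training length map back into the trained angular range.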
- Survey of Hallucination in Natural Language Generation - taxonomy and mitigation strategies.
- TruthfulQA - benchmark for truthfulness in question answering.
- FActScore - fine-grained factual precision in long-form generation.
- LongFact / SAFE (2024) - long-form factuality benchmark and search-augmented evaluator.
- SelfCheckGPT - sampling-based hallucination detection.
- RAGAS - reference-free evaluation for RAG and QA pipelines.
- Lookback Lens (2024) - attention-pattern-based hallucination detection in long-context generation.
- Calibration of LLMs on Multiple Choice (2024) - calibration analysis under format effects.
- HalluLens (2025) - hallucination benchmark with extrinsic/intrinsic taxonomy and dynamic test-set regeneration to resist data leakage.
- Atomic Calibration (2025) - claim-level calibration analysis for long-form generation; models are substantially worse-calibrated on extended outputs than on single claims.
- FRANQ (2025) - faithfulness-aware uncertainty quantification for RAG fact-checking; formally separates faithfulness from factuality.
- MUCH (2025) - multilingual claim-hallucination benchmark across English, French, Spanish, German with token-level logits released for principled UQ evaluation.
- HalluHard (2026) - hard multi-turn hallucination benchmark for citation-required responses; ~30% hallucination rates persist even with web search.
- CURE: Think Through Uncertainty (2026) - trains models to reason about claim-level uncertainty before generating; large gains on biography factuality and FactBench AUROC.
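The sampling-based detection idea behind SelfCheckGPT can be caricatured in a few lines: resample the model several times and flag claims the resamples fail to support. SelfCheckGPT itself scores support with NLI or QA models; the keyword-overlap heuristic here is only an illustrative stand-in.

```python
def selfcheck_score(claim_keywords, resampled_outputs):
    """Fraction of resampled generations that mention none of the claim's keywords.

    Higher score = the claim is less consistent across samples = more suspect.
    """
    unsupported = sum(
        1 for out in resampled_outputs
        if not any(kw.lower() in out.lower() for kw in claim_keywords)
    )
    return unsupported / len(resampled_outputs)

samples = [
    "The Eiffel Tower is in Paris.",
    "The Eiffel Tower stands in Paris, France.",
    "The Eiffel Tower is in Lyon.",
]
score = selfcheck_score(["Paris"], samples)
```

The underlying assumption is that facts the model genuinely knows recur across samples, while hallucinations vary from draw to draw.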
- A Primer in BERTology - what BERT learns about language.
- Probing Classifiers (Belinkov) - methodology, limitations, alternatives.
- Locating and Editing Factual Associations in GPT (ROME) - causal tracing of factual recall.
- The Pyramid of NLP Probes - structural probing for linguistic knowledge.
- Toy Models of Superposition - foundation for the sparse-feature view of transformer representations.
- Towards Monosemanticity / Scaling Monosemanticity (Anthropic, 2024) - sparse autoencoders extracting interpretable features from production-scale LMs.
- Sparse Autoencoders Find Highly Interpretable Features - SAE methodology for LM interpretability.
- Neuronpedia - open platform browsing SAE features across models.
- Influence Functions Scale to LLMs (2023) - identifying training examples driving model behavior.
- Circuit Tracing: Revealing Computational Graphs in Language Models (Anthropic, 2025) - introduces cross-layer transcoders and attribution graphs to construct an interpretable replacement model; enables prompt-level circuit tracing of feature-to-feature causal interactions.
- On the Biology of a Large Language Model (Anthropic, 2025) - applies attribution graphs to Claude 3.5 Haiku across multi-hop reasoning, rhyme planning, and jailbreak case studies.
- Transcoders Beat Sparse Autoencoders for Interpretability (2025) - shows transcoders (reconstructing layer outputs from inputs) yield more interpretable features than SAEs; introduces skip transcoders.
- Survey on Sparse Autoencoders for LLM Interpretability (EMNLP 2025) - reference survey of SAE architectures, training strategies, feature explanation, and evaluation.
- Finding Highly Interpretable Prompt-Specific Circuits (2026) - identifies circuits at the per-prompt level (rather than per-task); reveals mechanism clustering by prompt family.
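A probing classifier in the sense of the Belinkov survey above is just a small supervised model trained on frozen representations: if a linear readout recovers a property, the representation encodes it linearly. A toy perceptron probe over made-up 2-D "representations":

```python
def train_probe(reps, labels, lr=0.1, epochs=50):
    """Perceptron probe: labels in {-1, +1}, reps are frozen feature vectors."""
    w, b = [0.0] * len(reps[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(reps, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy "frozen representations" where the probed property is linearly decodable.
reps = [[2.0, 0.1], [1.8, 0.0], [0.0, 2.1], [0.2, 1.9]]
labels = [1, 1, -1, -1]
w, b = train_probe(reps, labels)
```

A standing caveat from the probing literature: high probe accuracy shows the information is present, not that the model uses it; hence the causal methods (ROME, circuit tracing) listed above.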
Distillation and small models:
- DistilBERT and MiniLM - distilled encoders for production NLP.
- Phi-3 / Phi-4 (Microsoft, 2024) - small models trained on curated data, competitive with much larger ones on NLP benchmarks.
- SmolLM2 (HuggingFace, 2025) - fully open small-LM family with reproducible training data.
- SmolLM3 (HuggingFace, 2025) - 3B fully open decoder pretrained on 11.2T tokens with NoPE and YaRN for 128K context; competitive with 4B-class models.
- Gemma 3 Technical Report (Google, 2025) - 1B-27B open models with high local-to-global attention ratio to keep KV-cache tractable at 128K context.
- Qwen3 Technical Report (Alibaba, 2025) - dense and MoE models 0.6B-235B with unified thinking/non-thinking modes; the 30B-A3B MoE matches larger dense models while activating only 3B parameters.
- Apple Intelligence Foundation Language Models (Apple, 2025) - on-device 3B model using KV-cache sharing and 2-bit QAT for 37.5% cache memory reduction without accuracy loss.
- Sentence-Transformers - sentence and paragraph embeddings via Siamese BERT.
- SetFit - few-shot text classification without prompts.
- FastFit - fast few-shot classification for many-class settings.
- GTE, BGE, and Stella - compact text embedding models near the top of MTEB.
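Sentence-Transformers-style models typically derive a sentence vector by mean-pooling token embeddings over non-padding positions. The recipe, with made-up token vectors standing in for encoder outputs:

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors where the attention mask is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    sums, count = [0.0] * dim, 0
    for emb, mask in zip(token_embeddings, attention_mask):
        if mask:
            count += 1
            for i, v in enumerate(emb):
                sums[i] += v
    return [s / count for s in sums]

# Three token positions; the last one is padding and must not dilute the mean.
sentence_vec = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
```

Masking matters: averaging over padding positions would shift every sentence vector by an amount that depends on sequence length rather than content.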
Quantization and serving (relevant when deploying NLP models at scale):
- GPTQ - post-training quantization for transformers.
- AWQ - activation-aware weight quantization.
- KVTuner (ICML 2025) - sensitivity-aware layer-wise mixed-precision KV-cache quantization; up to 21% throughput improvement over uniform KV8.
- GGUF / llama.cpp - portable quantized inference.
- vLLM - PagedAttention-based high-throughput LM serving.
- SGLang - structured generation and efficient serving.
- Text Generation Inference (TGI) - HF production serving for LMs.
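At its core, weight-only quantization maps floats onto a small integer grid plus a scale; GPTQ and AWQ refine how the scale and rounding are chosen, but the round-trip looks like this symmetric per-tensor int8 sketch:

```python
def quantize_int8(weights):
    """Symmetric quantization: w ≈ scale * q with integer q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [scale * v for v in q]

weights = [0.4, -1.27, 0.03, 0.9]
q, scale = quantize_int8(weights)
reconstructed = dequantize(q, scale)
```

Rounding error is bounded by half a quantization step (scale / 2), which is why methods like AWQ rescale salient channels before quantizing: a smaller effective scale on important weights means a tighter error bound where it matters.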
Parameter-efficient fine-tuning:
- LoRA and QLoRA - low-rank adapters and quantized fine-tuning; the standard for adapting LMs to NLP tasks on modest hardware.
- DoRA (2024) - weight-decomposed low-rank adaptation.
- PEFT - HuggingFace library bundling LoRA, prefix tuning, IA3, and others.
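The LoRA update is small enough to write out: the frozen weight W is left untouched, and a low-rank product A·B (scaled by alpha/r) is added to the layer output. A pure-Python sketch; with B initialized to zero, as in the paper, the adapted layer starts out identical to the frozen one:

```python
def matmul(X, Y):
    """Plain nested-list matrix multiply."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_forward(x, W, A, B, alpha=16, r=2):
    # y = x @ W + (alpha / r) * x @ A @ B ; only A (d x r) and B (r x k) train.
    base = matmul(x, W)
    update = matmul(matmul(x, A), B)
    s = alpha / r
    return [[bb + s * u for bb, u in zip(br, ur)] for br, ur in zip(base, update)]

x = [[1.0, 2.0]]
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen weight (identity, for illustration)
A = [[0.3], [0.1]]             # d x r with r = 1
B_zero = [[0.0, 0.0]]          # r x k, zero-initialized as in LoRA
y = lora_forward(x, W, A, B_zero, alpha=16, r=1)
```

Because only A and B receive gradients, the trainable parameter count is 2·d·r per adapted matrix instead of d·k, which is what makes fine-tuning feasible on modest hardware.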
- FLAN - finetuned language models as zero-shot learners.
- InstructGPT - training LMs to follow instructions with human feedback.
- Self-Instruct - bootstrapping instruction data from LMs.
- Super-NaturalInstructions - 1600+ NLP tasks with instructions.
- Constitutional AI - training LMs with AI-generated feedback against a written constitution.
- Direct Preference Optimization - simpler alternative to RLHF; widely adopted.
- Tülu 3 (AI2, 2024) - fully open post-training recipe with state-of-the-art results among open models.
- LIMA - "less is more for alignment"; small high-quality SFT data goes a long way.
- TRL - reference library for SFT, DPO, GRPO, and RLHF.
- Magpie (2024-2025) - synthesizes high-quality instruction-response pairs by prompting aligned LMs with nothing; SFT on the filtered subset matches official Llama-3-Instruct.
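The DPO objective needs only sequence log-probabilities from the policy and a frozen reference model: no reward model and no sampling loop. A sketch of the per-pair loss, with made-up log-probabilities:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * [(log pi_c - log ref_c) - (log pi_r - log ref_r)])"""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy prefers the chosen response more than the reference does: low loss.
good = dpo_loss(pi_chosen=-10.0, pi_rejected=-14.0, ref_chosen=-11.0, ref_rejected=-12.0)
# No preference margin yet: loss is exactly log(2).
neutral = dpo_loss(-11.0, -12.0, -11.0, -12.0)
```

The reference log-probs anchor the update, penalizing drift from the base model the same way the KL term does in RLHF, while beta controls how sharply preferences are enforced.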
- StereoSet - measuring stereotypical bias in pretrained LMs.
- CrowS-Pairs - social bias measurement in masked LMs.
- WinoBias - gender bias in coreference resolution.
- HolisticBias - bias measurement across many demographic axes.
- RealToxicityPrompts - toxicity in LM generation.
- Sycophancy in Language Models - models tailoring answers to user beliefs.
- Alignment Faking in Large Language Models (Anthropic, 2024) - models strategically complying during training.
- WildGuard (2024) - open safety moderation model and benchmark.
- Emergent Misalignment (2025) - finetuning on a narrow task (insecure code) unexpectedly produces broad alignment failures across unrelated domains.
- SafeDialBench (2025) - multilingual (Chinese/English) safety benchmark of 4000+ multi-turn dialogues across 22 scenarios and 7 jailbreak strategies.
- TeleAI-Safety (2025) - modular jailbreak evaluation framework integrating 19 attacks, 29 defenses, and 19 evaluation methods across 14 models and 12 risk categories.
- IndicSafe (2026) - multilingual safety benchmark across 12 Indic languages; reveals 12.8% cross-language agreement, with over-refusal in low-resource scripts.
- VLAF: Value-Conflict Alignment Faking (2026) - alignment faking occurs in models as small as 7B in 37% of cases when policy conflicts with internalized values; steering-vector mitigation reduces it 94%.
Resources organized by human language. Click a section to expand.
- CAMeL Tools - Python toolkit for Arabic NLP including dialect ID, morphology, NER.
- goarabic - Go package for Arabic text processing.
- jsastem - JavaScript Arabic stemmer.
- PyArabic - Python library for Arabic.
- RFTokenizer - trainable segmenter for Arabic, Hebrew, and Coptic.
- Farasa - QCRI segmentation, POS tagging, and NER for Arabic.
- AraBERT - Arabic BERT family.
- CAMeLBERT - BERT models for MSA, dialectal, and Classical Arabic.
- AraELECTRA - efficient Arabic pretraining (released alongside AraBERT).
- Jais (2023-2024) - bilingual Arabic-English open LM family.
- ALLaM (SDAIA, 2024) - Arabic-first foundation models.
- Multidomain Datasets - largest available multi-domain Arabic sentiment analysis resources.
- LABR - large Arabic book reviews dataset.
- Arabic Stopwords - aggregated Arabic stopwords.
- ArabicMMLU (2024) - Arabic MMLU benchmark.
- jieba - Python package for Chinese word segmentation.
- SnowNLP - Python package for Chinese NLP.
- FudanNLP - Java library for Chinese text processing.
- HanLP - multilingual NLP library with strong Chinese support.
- LTP - HIT Language Technology Platform: segmentation, POS, NER, parsing.
- Chinese-BERT-wwm - whole-word masking BERT for Chinese.
- MacBERT - improved Chinese BERT with MLM-as-correction pretraining.
- Qwen 2.5 / Qwen 3 - Alibaba's open Chinese-strong LM family.
- ChatGLM3 / GLM-4 - Tsinghua's bilingual Chinese-English LMs.
- Baichuan 2 - open Chinese LM.
- Yi - 01.AI's bilingual open LMs.
- DeepSeek-V3 - efficient open MoE model with strong Chinese.
- funNLP - large collection of Chinese NLP tools and resources.
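Dictionary-based Chinese word segmentation can be sketched with forward maximum matching, the classic greedy baseline. This is only the textbook version: jieba itself builds a DAG over dictionary matches and falls back to an HMM for out-of-vocabulary words.

```python
def forward_max_match(text, vocab, max_word_len=4):
    """Greedy forward maximum matching over a word dictionary."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in vocab:
                words.append(piece)
                i += length
                break
    return words

vocab = {"自然语言", "处理", "自然"}
segmented = forward_max_match("自然语言处理", vocab)
```

Greedy longest-match fails on overlap ambiguities, which is exactly the gap the statistical and neural segmenters listed above were built to close.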
- Named Entity Recognition for Danish
- DaNLP - NLP resources in Danish.
- Awesome Danish - curated list of resources for Danish language technology.
- python-frog - Python binding to Frog, an NLP suite for Dutch (POS tagging, lemmatization, dependency parsing, NER).
- SimpleNLG_NL - Dutch surface realiser for natural language generation, based on the SimpleNLG implementation.
- Alpino - dependency parser for Dutch (also does POS tagging and lemmatization).
- Kaldi NL - Dutch speech-recognition models based on Kaldi.
- spaCy Dutch model - industrial-strength NLP with a Dutch pipeline.
- German-NLP - curated list of open-access, open-source, and off-the-shelf resources and tools developed with a focus on German.
- awesome-hungarian-nlp - curated list of free resources for Hungarian NLP.
- Hindi Dependency Treebank - A multi-representational multi-layered treebank for Hindi and Urdu
- Universal Dependencies Treebank in Hindi
- Parallel Universal Dependencies Treebank in Hindi - A smaller part of the above-mentioned treebank.
- ISI FIRE Stopwords List (Hindi and Bangla)
- Peter Graham's Stopwords List
- NLTK Corpus 60k Words POS Tagged, Bangla, Hindi, Marathi, Telugu
- Hindi Movie Reviews Dataset ~1k Samples, 3 polarity classes
- BBC News Hindi Dataset 4.3k Samples, 14 classes
- IIT Patna Hindi ABSA Dataset 5.4k Samples, 12 Domains, 4k aspect terms, aspect and sentence level polarity in 4 classes
- Bangla ABSA 5.5k Samples, 2 Domains, 10 aspect terms
- IIT Patna Movie Review Sentiment Dataset 2k Samples, 3 polarity labels
- SAIL 2015 Twitter and Facebook labelled sentiment samples in Hindi, Bengali, Tamil, Telugu.
- IIT Bombay CFILT Resources - Sentiwordnet, parallel labelled corpora, sense-annotated corpora, and Marathi polarity-labelled corpus.
- TDIL-IC - aggregates many useful Indic-language resources and provides access to otherwise gated datasets.
- Hindi2Vec and nlp-for-hindi - ULMFiT-style language models for Hindi.
- IIT Patna Bilingual Word Embeddings Hi-En
- fastText word embeddings for 157 languages, trained on Common Crawl and Wikipedia.
- Hindi and Bengali Word2Vec
- Hindi and Urdu Elmo Model
- Sanskrit Albert Trained on Sanskrit Wikipedia and OSCAR corpus
- Multi-Task Deep Morphological Analyzer - deep morphological parser for Hindi and Urdu.
- Indic NLP Library - tokenization, transliteration, MT helpers across 18 Indic languages.
- SivaReddy's Dependency Parser (Python3 port) - dependency parsing and POS tagging for Kannada, Hindi, and Telugu.
- iNLTK - NLP toolkit for Indic languages on PyTorch/Fastai.
- AI4Bharat IndicNLP Suite - tools, datasets, and models across 22 Indic languages.
- IndicBERT v2 (2022-2024) - multilingual BERT for 23 Indic languages.
- IndicTrans2 (2023-2024) - high-quality MT for 22 Indic languages.
- OpenHathi (Sarvam AI, 2023) - bilingual Hindi-English LLaMA continuation.
- Airavata (2024) - instruction-tuned Hindi LLM.
- Sarvam-1 (2024) - multilingual LM trained from scratch on 10 Indic languages.
- BharatGPT / Krutrim (2024) - Indic-focused foundation models.
- bahasa - natural language toolkit for Indonesian.
- Indonesian Word Embedding
- Indonesian fastText trained on Wikipedia.
- IndoBERT (IndoNLU) - pretrained Indonesian LM with the IndoNLU benchmark suite.
- IndoBERT (IndoLEM) - alternative IndoBERT with the IndoLEM benchmark.
- NusaCrowd / Cendol (2023-2024) - large-scale community datasets and Cendol instruction-tuned LMs for Indonesian and regional languages.
- Sailor - open Southeast-Asian LMs covering Indonesian.
- SEA-LION (2024) - Singapore AI's open Southeast-Asian LM with strong Indonesian.
- Kompas and Tempo collections at ILPS
- PANL10N for POS tagging - 39K sentences and 900K word tokens.
- IDN for POS tagging - 10K sentences and 250K word tokens.
- Indonesian Treebank and Universal Dependencies-Indonesian
- IndoSum - text summarization and classification.
- Wordnet-Bahasa - large, free, semantic dictionary.
- KoNLPy - Python package for Korean natural language processing.
- Mecab (Korean) - C++ library for Korean NLP.
- KoalaNLP - Scala library for Korean NLP.
- KoNLP - R package for Korean NLP.
- kss - Korean sentence splitter.
- Kiwi - fast Korean morphological analyzer.
- KoBERT - Korean BERT from SKT.
- KLUE-RoBERTa - models trained on the KLUE benchmark.
- Polyglot-Ko - open Korean LMs.
- EXAONE 3.5 (LG, 2024) - bilingual Korean-English open LM family.
- HyperCLOVA X - Naver's Korean foundation model.
- KAIST Corpus - corpus from the Korea Advanced Institute of Science and Technology in Korean.
- Naver Sentiment Movie Corpus in Korean
- Chosun Ilbo archive - dataset in Korean from a major South Korean newspaper.
- Chat data - chatbot data in Korean.
- Petitions - expired petition data from the Blue House National Petition Site.
- Korean Parallel corpora - NMT dataset for Korean to French and Korean to English.
- KorQuAD - Korean SQuAD dataset (v1.0 and v2.1) with Wiki HTML source.
- Hazm - Persian NLP toolkit.
- Parsivar - Persian language processing toolkit.
- Perke - Persian keyphrase extraction.
- Perstem - Persian stemmer, morphological analyzer, and partial POS tagger.
- ParsiAnalyzer - Persian analyzer for Elasticsearch.
- virastar - Persian text cleaning.
- ParsBERT - Persian BERT.
- PersianMind (2023-2024) - Persian instruction-tuned LM.
- Dorna (Part AI, 2024) - Llama-3-based Persian instruction model.
- Bijankhan Corpus - tagged corpus suitable for Persian (Farsi) NLP research, ~2.6M manually tagged words across 40 POS tags.
- Uppsala Persian Corpus (UPC) - large freely available Persian corpus, 2.7M tokens annotated with 31 POS tags.
- Large-Scale Colloquial Persian - LSCP: 120M sentences from 27M casual Persian tweets with dependency, POS, and sentiment annotations.
- ArmanPersoNERCorpus - 250K tokens, 7,682 sentences with NER tags in IOB format.
- FarsiYar PersianNER - ~25M tokens, ~1M Persian sentences from Persian Wikipedia Corpus.
- PERLEX - first Persian dataset for relation extraction (translated SemEval-2010 Task 8).
- Persian Syntactic Dependency Treebank - 29,982 annotated sentences covering most verbs of the Persian valency lexicon.
- Uppsala Persian Dependency Treebank (UPDT) - dependency-based syntactically annotated corpus.
- Hamshahri - standard reliable Persian text collection used at CLEF 2008-2009.
- Polish-NLP - curated list of resources dedicated to Polish NLP: models, tools, and datasets.
- Portuguese-nlp - curated list of Portuguese NLP resources and tools.
- spanlp - Python library to detect, censor, and clean profanity, hate speech, and bullying in Spanish, with data from 21 Spanish-speaking countries.
- Colombian Political Speeches
- Copenhagen Treebank
- Spanish Billion Words Corpus with Word2Vec embeddings
- Compilation of Spanish Unannotated Corpora
- BETO - BERT for Spanish.
- RoBERTa-bne - Spanish RoBERTa trained on the Spanish National Library corpus.
- Latxa (2024) - open foundation LM for Basque, also covers Spanish.
- Salamandra (BSC, 2024) - multilingual LM with strong Spanish coverage from the Barcelona Supercomputing Center.
- RigoChat (2024) - Spanish-instruction-tuned open model.
- Spanish Word Embeddings (multiple methods/corpora)
- Spanish fastText Embeddings
- Spanish sent2vec Sentence Embeddings
- PyThaiNLP - Thai NLP in Python.
- JTCC - character cluster library in Java.
- CutKum - word segmentation with deep learning in TensorFlow.
- Thai Language Toolkit - tokenization and POS tagging.
- SynThai - word segmentation and POS tagging using deep learning.
- WangchanBERTa - pretrained Thai language model.
- Typhoon (SCB 10X, 2024) - open Thai LLM family.
- OpenThaiGPT (2023-2024) - open Thai instruction-tuned models.
- Sailor - open Southeast-Asian LM family covering Thai.
- Inter-BEST - text corpus with 5M words and word segmentation.
- Prime Minister 29 - dataset of speeches by Thailand's 29th Prime Minister.
- awesome-ukrainian-nlp - curated list of Ukrainian NLP datasets, models, etc.
- UkrainianLT - curated list focused on machine translation and speech processing.
- underthesea - Vietnamese NLP toolkit.
- vn.vitk - Vietnamese text processing toolkit.
- VnCoreNLP - Vietnamese NLP toolkit.
- pyvi - Python Vietnamese core NLP toolkit.
- VieNeu-TTS - on-device Vietnamese text-to-speech with voice cloning.
- PhoBERT - pretrained LM for Vietnamese.
- BARTpho - sequence-to-sequence pretrained model for Vietnamese.
- PhoGPT (VinAI, 2023-2024) - open generative LM for Vietnamese.
- Vistral (2024) - Mistral-based Vietnamese chat model.
- Sailor (2024) - open multilingual LM family covering Vietnamese, Thai, Indonesian, and other Southeast Asian languages.
- Vietnamese Treebank - 10K sentences for the constituency parsing task.
- BKTreeBank - Vietnamese dependency treebank.
- UD_Vietnamese - Vietnamese Universal Dependency Treebank.
- VIVOS - free Vietnamese speech corpus, 15 hours of recorded speech (HCMUS AILab).
- VNTQcorpus(big).txt - 1.75M news sentences.
- ViText2SQL - Vietnamese Text-to-SQL semantic parsing dataset (EMNLP-2020 Findings).
- EVB Corpus - 20M words across 15 bilingual books, 100 parallel English-Vietnamese texts, 250 parallel law texts, 5K news articles, and 2K film subtitles.
- Russian: pymorphy2 - morphological analyzer and POS tagger for Russian.
- Asian Languages: ICU Tokenizer implementation in Elasticsearch for Thai, Lao, Chinese, Japanese, and Korean.
- Ancient Languages: CLTK - the Classical Language Toolkit, a Python library and collection of texts for NLP in ancient languages.
- Hebrew: NLPH_Resources - collection of papers, corpora, and linguistic resources for NLP in Hebrew.
Adjacent curated lists for topics out of scope here:
- awesome-llm - general-purpose large language model resources.
- awesome-generative-ai - generative AI across modalities.
- awesome-rag - retrieval-augmented generation systems and tooling.
- awesome-prompt-engineering - prompting techniques and template libraries.
- awesome-mlops - production ML, including LLM serving.
If you find this repository useful, please consider citing this list:
@misc{awesome-nlp,
title = {Awesome NLP},
author = {Kim, Keon Woo},
year = {2018},
url = {https://github.com/keon/awesome-nlp},
note = {GitHub repository}
}
License - CC0