August 2023

sentibank: Unleashing the emotions of textual data

Providing a trusted foundation for sentiment analysis via verified dictionaries


May 31, 2023

sentibank is an open database project that embraces emotions as the heart of textual data. The journey began with a realisation of the profound impact emotions have on every aspect of human life.

Sentiments are at the core of human decision-making and understanding: they influence social dynamics, transform the political landscape and even drive economic fluctuations. Naturally, sentiment analysis has become an increasingly critical technique across social science domains, ranging from policy-making to business analytics. Deep learning models often achieve higher accuracy than simpler lexicon models in sentiment analysis tasks, but their inherently opaque nature poses challenges for applications in high-stakes domains like government policymaking or mental health diagnosis, where transparent and interpretable decision-making is crucial. Dictionary-based sentiment analysis therefore remains important, particularly in computational social science fields where interpretability is paramount, and improving it remains vital. The approach relies on expert-curated lexicons, yet effectively applying these lexicons in rule-based systems faces several challenges.


Scattered Landscape

Lexicons are scattered across many sources, such as GitHub repositories, appendices of publications, supplementary materials, and author/institutional websites. This fragmented distribution poses a significant challenge for researchers who seek to leverage sentiment analysis effectively. Furthermore, the lexicons are distributed in diverse file formats, necessitating the tedious process of exporting and importing data into a format compatible with the researcher’s workflow.

Championing Quality

Many lexicons, including peer-reviewed ones, contain duplicate entries with conflicting labels. A substantial portion of the existing dictionaries in sentibank (60%) required the removal of duplicates, function words, and entries lacking substantive sentiment content.
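To make the quality problem concrete, here is a minimal sketch of how duplicates with conflicting labels and function words might be detected and filtered. It is not sentibank's actual cleaning procedure: the entries, the `FUNCTION_WORDS` list, and the helpers `find_conflicts` and `clean_lexicon` are all hypothetical, and dropping conflicting terms is only one possible policy (manual review is another).

```python
from collections import defaultdict

# Hypothetical function-word list; a real one would be far longer.
FUNCTION_WORDS = {"the", "a", "an", "of", "and"}


def find_conflicts(entries):
    """Return terms that appear with more than one distinct label."""
    labels_by_term = defaultdict(set)
    for term, label in entries:
        # Normalise case and whitespace so "Good" and "good" are one term.
        labels_by_term[term.strip().lower()].add(label)
    return {
        term: sorted(labels)
        for term, labels in labels_by_term.items()
        if len(labels) > 1
    }


def clean_lexicon(entries, function_words=FUNCTION_WORDS):
    """Keep one entry per term; drop conflicting terms and function words."""
    conflicts = find_conflicts(entries)
    cleaned = {}
    for term, label in entries:
        key = term.strip().lower()
        if key in conflicts or key in function_words:
            continue
        cleaned[key] = label
    return cleaned


# Toy entries invented for illustration; not taken from any real lexicon.
raw_entries = [
    ("good", "positive"),
    ("Good", "positive"),   # exact duplicate after normalisation
    ("sick", "negative"),
    ("sick", "positive"),   # duplicate with a conflicting label
    ("the", "neutral"),     # function word
]

conflicts = find_conflicts(raw_entries)
cleaned = clean_lexicon(raw_entries)
print(conflicts)
print(cleaned)
```

In this toy run only "good" survives: "sick" is dropped because its labels conflict, and "the" is dropped as a function word.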

An encyclopaedic hub of sentiment dictionaries

With sentibank, researchers can access a vast array of sentiment lexicons from diverse domains and genres. This streamlined approach not only reduces the hassle of locating and validating individual lexicons but also expands the use cases of these invaluable legacy resources. sentibank's centralised hub empowers users to combine and compare lexicons, facilitating more comprehensive sentiment analysis. Moreover, it opens up opportunities for data integration across domains and genres, enriching the depth and accuracy of research findings.
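As a rough illustration of what combining and comparing lexicons can look like, the sketch below works on two toy word-to-score dictionaries. The data and the helpers `compare_lexicons` and `combine_lexicons` are hypothetical and not part of sentibank. Averaging scores also assumes both lexicons use the same scale, which real lexicons often do not, so rescaling would usually come first.

```python
def compare_lexicons(lex_a, lex_b):
    """Return the shared words and those whose polarity differs."""
    shared = sorted(set(lex_a) & set(lex_b))
    # A word disagrees when one lexicon scores it above zero and the other does not.
    disagreements = [w for w in shared if (lex_a[w] > 0) != (lex_b[w] > 0)]
    return shared, disagreements


def combine_lexicons(lex_a, lex_b):
    """Merge two lexicons, averaging the scores of words found in both."""
    combined = {}
    for word in set(lex_a) | set(lex_b):
        scores = [lex[word] for lex in (lex_a, lex_b) if word in lex]
        combined[word] = sum(scores) / len(scores)
    return combined


# Toy lexicons invented for illustration; assumed to share one score scale.
lexicon_a = {"great": 3.0, "awful": -3.0, "sick": -1.0}
lexicon_b = {"great": 1.0, "sick": 1.0, "dull": -0.5}

shared, disagreements = compare_lexicons(lexicon_a, lexicon_b)
combined = combine_lexicons(lexicon_a, lexicon_b)
print(shared, disagreements)
print(sorted(combined.items()))
```

Words such as "sick", which the two toy lexicons score with opposite polarity, are exactly the cases where a side-by-side comparison is more informative than either lexicon alone.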

Utilise sentibank in your research

Import and access our curated collection of sentiment dictionaries in three frictionless steps.

    # 1. Installation (run in a shell)
    pip install sentibank

    # 2. Import modules
    from sentibank import archive

    # 3. Access dictionaries
    load = archive.load()
    vader = load.dict("VADER_v2014")

For further information, such as loading dictionaries based on the predefined lexicon identifiers, visit our documentation.
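Once a dictionary is loaded, one simple way to apply it is to sum the scores of the words it matches. The sketch below assumes the dictionary behaves like a plain mapping from lowercase word to numeric score. That is an assumption: the structure sentibank returns may differ between lexicons, so check the documentation. The toy lexicon and the helper `score_text` are hypothetical.

```python
import re


def score_text(text, lexicon):
    """Sum the scores of matched words and report which words matched."""
    # Naive tokenisation: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    matched = [(tok, lexicon[tok]) for tok in tokens if tok in lexicon]
    total = sum(score for _, score in matched)
    return total, matched


# Toy scores invented for illustration; not values from any real lexicon.
toy_lexicon = {"love": 2.0, "terrible": -2.0, "slow": -1.0}

total, matched = score_text(
    "I love the design, but the app is terrible and slow.", toy_lexicon
)
print(total)
print(matched)
```

Returning the matched words alongside the total keeps the result inspectable: every point in the score traces back to a specific dictionary entry, which is the interpretability advantage of the dictionary-based approach.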

Author

Nick Oh

Explore our documentation for sentibank

ANEW

Emotional ratings across the dimensions of Pleasure, Arousal, and Dominance

General Inquirer

Lexicon categorising words across multiple psycholinguistic dimensions

OpinionLexicon

A dictionary for product reviews, comprising words curated for informal language

SentiWordNet

A comprehensive dictionary that assigns graded sentiment scores to WordNet synsets

VADER

A gold-standard lexicon optimised for social media sentiment analysis

WordNet-Affect

Affective labels that are hierarchically organised based on WordNet synsets
