Sentiment Analysis using Swift

Lately I’ve been working on several projects using conversational UI or “Bot” interactions. A key pattern when designing “Bots” is listening to both the intent and the sentiment of the text the user provides. Sentiment analysis helps you determine the mood of the user, which can be an important tool in deciding when to offer help or involve a human. You might think of this as the “help representative” moment we are all familiar with. Using sentiment analysis, you can try to offer help before that moment occurs.

There are several great sentiment analysis node.js packages, but I couldn’t find anything that runs offline in Swift. A majority of the node.js projects seem to be forks of the Sentiment package. The Sentiment-v2 package worked best for many of my cases and became my starting point.

A majority of the sentiment analysis packages available through NPM use the same scoring approach. First they parse a provided phrase into individual words. For example, “Cats are amazing” would be turned into an array of words, i.e. [“cats”, “are”, “amazing”].

Next, a dictionary of keywords and associated weights is created. This scoring dictionary is built from the AFINN wordlist and the Emoji Sentiment Ranking. In a nutshell, words like “amazing” have a positive weight whereas words like “bad” have a negative weight. The weight of each word in the provided phrase is added together to get the total weight of the phrase. If the phrase has a negative weight, chances are your user is starting to get frustrated, or is at least talking about a negative subject.
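The scoring approach above can be sketched in a few lines of Swift. This is a minimal illustration only: the wordWeights table is a tiny hand-picked sample, not the full AFINN list.

```swift
import Foundation

// A tiny sample of AFINN-style weights (illustrative, not the real list).
let wordWeights: [String: Int] = [
    "amazing": 4,
    "bad": -3,
    "stupid": -2,
]

func score(_ phrase: String) -> Int {
    // 1. Parse the phrase into individual lowercased words.
    let words = phrase.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { !$0.isEmpty }
    // 2. Sum the weight of every word found in the dictionary.
    return words.reduce(0) { $0 + (wordWeights[$1] ?? 0) }
}

print(score("Cats are amazing"))  // 4
print(score("Cats are bad"))      // -3
```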

Using this approach I created the SentimentlySwift playground to demonstrate how this can be done on device using Swift. This playground uses the same AFINN wordlist and Emoji Sentiment Ranking weights to determine a sentiment analysis score without the network dependency. To make comparisons easier, I tried to mirror the Sentiment package API as much as possible. The below demonstrates the output for a few of the test phrases included with Sentiment.

let sentiment = Sentimently()

print(sentiment.score("Cats are stupid."))
// analysisResult(score: -2, comparative: -0.66666666666666663, positive: [], negative: ["stupid"], wordTokens: [wordToken(word: "cats", wordStem: Optional("cat")), wordToken(word: "are", wordStem: Optional("be")), wordToken(word: "stupid", wordStem: Optional("stupid"))])

print(sentiment.score("Cats are very stupid."))
// analysisResult(score: -3, comparative: -0.75, positive: [], negative: ["stupid"], wordTokens: [wordToken(word: "cats", wordStem: Optional("cat")), wordToken(word: "are", wordStem: Optional("be")), wordToken(word: "very", wordStem: Optional("very")), wordToken(word: "stupid", wordStem: Optional("stupid"))])

print(sentiment.score("Cats are totally amazing!"))
// analysisResult(score: 4, comparative: 1.0, positive: ["amazing"], negative: [], wordTokens: [wordToken(word: "cats", wordStem: Optional("cat")), wordToken(word: "are", wordStem: Optional("be")), wordToken(word: "totally", wordStem: Optional("totally")), wordToken(word: "amazing", wordStem: Optional("amaze"))])

var testInject = [sentimentWeightValue]()
testInject.append(sentimentWeightValue(word: "cats", score: 5))
testInject.append(sentimentWeightValue(word: "amazing", score: 2))
print(sentiment.score("Cats are totally amazing!", addWeights: testInject))
// analysisResult(score: 7, comparative: 1.75, positive: ["cats", "amazing"], negative: [], wordTokens: [wordToken(word: "cats", wordStem: Optional("cat")), wordToken(word: "are", wordStem: Optional("be")), wordToken(word: "totally", wordStem: Optional("totally")), wordToken(word: "amazing", wordStem: Optional("amaze"))])
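One way to support per-call weight injection like addWeights is to overlay the injected weights on the base dictionary for a single scoring pass. The sketch below shows the idea with illustrative names; it is not the playground’s actual internals.

```swift
// Hypothetical illustration of per-call weight injection: the injected
// weights override the base dictionary for this scoring pass only.
let baseWeights: [String: Int] = ["amazing": 4]
let addWeights: [String: Int] = ["cats": 5, "amazing": 2]

// Dictionary.merging keeps the injected value on a key collision.
let merged = baseWeights.merging(addWeights) { _, injected in injected }

print(merged["cats"] ?? 0)     // 5
print(merged["amazing"] ?? 0)  // 2
```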

Although the APIs are similar, there is one important difference between the two approaches. The SentimentlySwift playground uses NSLinguisticTagger to tokenize the provided phrase. SentimentlySwift first parses the phrase into a series of word slices, each one a word tokenized using the options provided to the NSLinguisticTagger. Next the slices are enumerated and an optional “tag”, or word stem, is calculated for each. For example, in the phrase “cats are amazing”, the word “amazing” produces the word stem “amaze”; likewise, the word “hiking” produces the word stem “hike”.

The following snippet shows an example of how this can be implemented.

public struct wordToken {
    let word: String
    let wordStem: String?

    init(word: String, wordStem: String?) {
        self.word = word
        self.wordStem = wordStem
    }
}

func lemmatize(_ text: String) -> [wordToken] {
    let text = text.lowercased()
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .omitOther]
    let tagger = NSLinguisticTagger(tagSchemes: NSLinguisticTagger.availableTagSchemes(forLanguage: "en"),
                                    options: Int(options.rawValue))
    tagger.string = text
    var tokens: [wordToken] = []
    // NSRange counts UTF-16 code units, so use the NSString length
    // rather than the Swift character count.
    let range = NSRange(location: 0, length: (text as NSString).length)
    tagger.enumerateTags(in: range, scheme: NSLinguisticTagSchemeLemma, options: options) { tag, tokenRange, _, _ in
        // Each slice is a word; the tag, when present, is its lemma (word stem).
        let word = (text as NSString).substring(with: tokenRange)
        tokens.append(wordToken(word: word, wordStem: tag))
    }
    return tokens
}

You might be asking why this is important. By implementing lemmatisation you increase your AFINN hit rate and improve your overall analysis scoring.
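As a sketch of why the hit rate improves, consider a weight table that contains “hike” but not “hiking” (a hypothetical sample, not the real AFINN contents). Falling back to the word stem turns a dictionary miss into a hit:

```swift
// Hypothetical sample weight table: contains the lemma "hike" only.
let weights: [String: Int] = ["hike": 2]

func weight(word: String, wordStem: String?) -> Int {
    // Try the surface form first, then fall back to the lemma.
    if let hit = weights[word] { return hit }
    if let stem = wordStem, let hit = weights[stem] { return hit }
    return 0
}

print(weight(word: "hiking", wordStem: "hike"))  // 2 — a hit only via the stem
print(weight(word: "hiking", wordStem: nil))     // 0 — a miss without it
```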

This is one trick I’ve found for improving, or at least monitoring, your conversational UI or “Bot” interactions.