Zoltar: The fortune telling AI
Published:
In this blog post, I’m excited to talk about one of my project - Zoltar the AI. Zoltar is an artificial conversation entity that has the potential for both simplicity and tremendous growth. I’ll provide a brief overview of the project, its dependencies, and walk you through the main concepts involved in creating a conversational AI. Then, based on the knowledge gained, I’ll explain the technical workings of this project in detail. To conclude, I’ll share a video demonstrating a conversation I had with Zoltar. So, let’s dive right in!
The Fortune Teller project, known as Zoltar, is an AI powered by a custom-made dataset created from subreddits and New York Times Horoscope Data. For simplicity, I chose to utilize Python’s NLTK package, which is well-suited for cloud-based environments. This project is completely data-driven, meaning that more data leads to better results. Now, let’s start from the beginning to understand what exactly an AI is and how it can engage in conversations without explicit programming.
AI is a sub-domain of data-driven computational practices that encompasses technologies like machine learning and deep learning. Back in 1955, Marvin Minsky and John McCarthy, both pioneers of Artificial Intelligence, defined it as any task that, if performed by a human, would be considered an application of intelligence. While traditional data-driven domains like machine learning focus on learning from data and making predictions, AI aims to replicate human behavior with notable success.
The fortune-telling AI uses data to comprehend sentence structure and the words provided. To delve deeper into its functioning, let’s familiarize ourselves with a few key terms:
NLTK: Short for Natural Language Processing Toolkit, NLTK provides tools and functions for processing human language inputs. It is employed in various applications such as text classification, chatbots, sentiment analysis, and language translation. This is the main package used to build Zoltar.
Tokenization: Tokenization is the process of dividing large quantities of text into smaller parts called tokens. It consists of word and sentence tokenization.
Word Tokenization: This operation is accomplished using the word_tokenize() function of NLTK, which splits a sentence into individual words.
Sentence Tokenization: The sent_tokenize() module is used to obtain a list of sentences from a paragraph, enabling us to analyze average words per sentence.
Stemming: Stemming is a process of linguistic normalization that reduces words to their root form or chops off derivational affixes. For instance, words like “connection,” “connected,” and “connecting” are reduced to the common word “connect.”
POS Tagging: POS tagging identifies the grammatical group of a given word and looks for relationships within the sentence, assigning a corresponding tag to each word.
Lemmatization: Lemmatization reduces words to their base form, ensuring linguistically correct lemmas. It is usually more sophisticated than stemming, as it takes into account the context of the word. For example, “better” is lemmatized to “good.”
Stop Words: Stop words are considered noise in text analysis. They are generic, repetitive, and irrelevant words that do not contribute significantly to the overall context. Removing stop words helps refine the data.
Now that we have a grasp of these fundamental concepts, let’s gain a technical overview of how Zoltar operates.The first step is to collect relevant data for training the model. I sourced New York Times fortune data from Kaggle and selected subreddits as data sources. Next, sentence and word lemmatization are performed to understand the data at its basic building units.
Following this, irrelevant content such as stop words and punctuation marks are removed. Sentence vectorization is then used to group the stripped sentences into a bag-of-words model.The model is now ready to take user input and generate the most appropriate response by performing cosine similarity operations.
And there you have it! Your very own conversational AI, Zoltar, is now ready to hold stimulating conversations (well, not quite). To gain deeper insights into the project or try building it yourself, feel free to refer to the GitHub repository here for the code and data. As a bonus, I recorded a conversation I had with Zoltar and added some creepy music to set the ambiance. Enjoy the video!
If you have any questions about the terms or the project itself, don’t hesitate to reach out to me. I’m also open to discussing any improvements or additions you may have in mind for Zoltar.