CS PhD student at IGDTUW | UNESCO Research Fellow | Don Lavoie Fellow at GMU | Elinor Ostrom Doctoral Fellow | ex-Google | ex-Adobe | ex-Mozilla
Published May 05, 2018
Chatbots are conversational interfaces that offer a new way for individuals to interact with computer systems, and they are increasingly present in messaging applications. Because they are versatile, they can adapt to and solve different business problems. A chatbot allows a user to simply ask questions in the same manner that they would address a human, rather than searching online or filling in forms. For our project, we have created a chatbot in an Indian regional language, namely Hindi. We have developed an Android application interface with our Natural Language Processing model as Python code on the back end. Specifically, the domain of our project is a Hindi-language chatbot for breastfeeding mothers. We have used retrieval-based models, which use a collection of predefined responses and a heuristic to pick an appropriate response based on the input and context. Our heuristic is a mix of rule-based expression matching and machine learning classifiers. Essentially, the problem is a multiclass classification problem with pattern matching to the nearest question in the training set of predefined responses.
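The two-stage heuristic described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the rule table, the question-answer pairs, and the function names are hypothetical, and a simple token-overlap score stands in for the trained classifier.

```python
import re

# Hypothetical rule table: regular expression -> canned reply (illustrative only).
RULES = [
    (re.compile(r"^(नमस्ते|हैलो)"), "नमस्ते! आप क्या पूछना चाहती हैं?"),
]

# Hypothetical predefined question-answer pairs (placeholders, not the real dataset).
QA_PAIRS = [
    ("बच्चे को दूध कब पिलाना चाहिए", "उत्तर 1"),
    ("दूध कम बन रहा है क्या करें", "उत्तर 2"),
]


def tokens(text):
    """Replace common punctuation with spaces and return the set of words."""
    return set(re.sub(r"[?!,.।]", " ", text).split())


def respond(query):
    """Return a predefined response for the query."""
    # Stage 1: rule-based expression match.
    for pattern, reply in RULES:
        if pattern.search(query.strip()):
            return reply

    # Stage 2: fall back to the stored question with the highest
    # Jaccard token overlap (a stand-in for a trained classifier).
    query_tokens = tokens(query)

    def overlap(pair):
        question_tokens = tokens(pair[0])
        union = query_tokens | question_tokens
        return len(query_tokens & question_tokens) / len(union) if union else 0.0

    best_question, best_answer = max(QA_PAIRS, key=overlap)
    return best_answer
```

Checking the rules first keeps greetings and other fixed phrases out of the similarity search, so the fallback only has to handle genuine questions.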
A bot is artificial-intelligence-based software designed to perform a defined series of tasks on its own, without human assistance. By the most basic definition, interactive chatbots are question answering systems: information retrieval systems whose objective is to form an answer to the query themselves, instead of simply returning a list of documents, by using concepts from the Natural Language Processing domain. In a typical information-seeking session, the user submits a query and the system returns a result; the session is then concluded.
In question answering systems, the first and most basic step is to identify the type of question. There are broadly two types of questions: factoid and narrative. Factoid questions are mainly WH-type or single-word questions. For example: "Who wrote The Simpsons and Their Mathematical Secrets?" or "What is the average age of the onset of autism?" Narrative (complex) questions are those that demand an abstractive, well-constructed, and thoughtful answer. For example: "What do scholars think about Jefferson's position on dealing with pirates?"
The most common paradigms for QA systems are NLP-based, knowledge-based, and hybrid approaches. The NLP-based approach, used prominently by IBM Watson and Google, involves four main steps: Question Processing, Document Retrieval, Passage Ranking, and Answer Processing. The knowledge-based approach, famously used by Siri, builds a semantic representation of the query and performs named entity recognition as an initial step. It then maps this semantic representation to queries over structured data or resources, such as geospatial databases.
For the scope of our course project, we have built an NLP-based Hindi-language medical assistant for breastfeeding mothers, expecting women, and their spouses. We have collected and expanded our data set to six hundred question-answer pairs for training purposes. We have implemented four different algorithms, including Cosine Similarity, Naive Bayes, and a Word2Vec model. We have explained each of these in detail in subsequent sections.
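As an illustration of the cosine similarity approach, the sketch below represents each question as a bag-of-words count vector and retrieves the stored question closest to the query, using cos(a, b) = (a · b) / (|a| |b|). The function names and the whitespace tokenization are assumptions made for this example; the full report describes the actual implementation.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts, assuming whitespace tokenization."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_question(query, questions):
    """Return the stored question most similar to the query."""
    query_vec = vectorize(query)
    return max(questions, key=lambda q: cosine(query_vec, vectorize(q)))
```

The answer paired with the returned question would then be sent back to the user. Weighting terms by TF-IDF instead of raw counts is a common refinement, since it reduces the influence of very frequent words.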
Natural Language Processing is essentially concerned with how technology can be used to extract meaning from human language and interpret it meaningfully. NLP allows technology such as Amazon Alexa to make sense of what we are saying and decide how to respond to it. Without NLP, an AI that needs to take text as input, process it, and produce answers would be relatively useless.
Chatbots would provide little to no value without Natural Language Processing. It forms the backbone of all the chat applications and speech-based devices we use. It is a powerful domain that equips us with the ability to process textual information and make sense of it. When you type the message "Hello", it is the NLP component of the system that recognizes that you have posted a standard greeting, and the system in turn uses its AI capabilities to come up with a fitting response.
Some of the most common problems that arise when texting a chatbot are spelling errors, grammatical errors, and poor language use in general. Advanced NLP capabilities can identify spelling and grammatical errors and allow the chatbot to understand what you really mean to say despite the mistakes. NLP can also decipher the intent behind a message. A simple case is deciding whether the user has made a statement or asked a question. Although this may sound trivial at first glance, it can have a significant impact on a chatbot's ability to carry on a successful conversation with a user.
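The statement-versus-question check can be approximated with a simple heuristic. The sketch below is only an assumption about how such a check might look: the list of Hindi interrogative words is hypothetical and not exhaustive, and a production system would more likely use a trained intent classifier.

```python
# Hypothetical, non-exhaustive list of Hindi interrogative words.
QUESTION_WORDS = {"क्या", "कब", "क्यों", "कैसे", "कहाँ", "कौन", "कितना", "कितनी"}


def is_question(message):
    """Guess whether a message is a question rather than a statement."""
    text = message.strip()
    # A trailing question mark is the strongest signal.
    if text.endswith("?"):
        return True
    # Otherwise look for an interrogative word anywhere in the message.
    words = text.replace("।", " ").split()
    return any(word in QUESTION_WORDS for word in words)
```

Users often omit punctuation when texting, which is why the heuristic does not rely on the question mark alone.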
While NLP certainly cannot work miracles or guarantee that a chatbot will always respond correctly to every message, it is powerful enough to make or break a chatbot's success.
The main challenge while creating a chatbot pertaining to the Indian languages is the limited availability of datasets in the Indian languages.
The initial dataset has been adapted from a book on breastfeeding written by Dr. Vijay Kumar, SWACHH Foundation. The book consists of 100 question-answer pairs. The book has been translated and parsed into Hindi, and this forms the core of our dataset.
In addition to this, we have extended our set of 100 question-answer pairs to 600 by manual data augmentation.
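One way to use such an augmented dataset, assuming that paraphrases of the same question share a class label, is to train a multiclass classifier that maps a new query to the label of a predefined answer. The sketch below is a minimal multinomial Naive Bayes classifier with Laplace smoothing; the class name, the label scheme, and the whitespace tokenization are assumptions for illustration, not the project's code.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total = len(labels)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        words = [w for w in text.split() if w in self.vocab]

        def log_score(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[label] / self.total)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            return score

        return max(self.class_counts, key=log_score)
```

With labels that identify a predefined answer, the predicted label can be used to look up the response to return.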
Word2Vec
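A common way to use Word2Vec for question matching is to average the word vectors of each sentence and compare the averages with cosine similarity. The sketch below illustrates that idea under loud assumptions: the three-dimensional vectors are hand-made toy values standing in for trained embeddings, and the function names are hypothetical. It is not the model described in the full report.

```python
import math

# Toy 3-dimensional vectors standing in for trained Word2Vec embeddings.
# These values are invented for illustration only.
EMBEDDINGS = {
    "दूध": [0.9, 0.1, 0.0],
    "स्तनपान": [0.8, 0.2, 0.1],
    "दर्द": [0.0, 0.9, 0.3],
    "बुखार": [0.1, 0.8, 0.4],
}


def sentence_vector(text):
    """Average the vectors of known words; return None if none are known."""
    vectors = [EMBEDDINGS[w] for w in text.split() if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Unlike bag-of-words matching, this comparison can score two sentences as similar even when they share no words, provided their words have nearby embeddings.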
Results can be viewed in the full report.
A medical chatbot provides answers and diagnoses based on queries entered by the user, which can be symptoms. As we progress, the chatbot's symptom recognition and diagnosis performance could be greatly improved by expanding the Hindi-language dataset to support more medical features. More generic problems and symptoms can also be added, and the functionality of the chatbot can be extended to include men as well. We can also include further parameters, such as the location, duration, and intensity of symptoms, to make the chatbot much more generic.
As for future scope, we hope to enlarge the dataset, either from external resources or through our own manual efforts. Once we have enough data, we can apply a Recurrent Neural Network (RNN) approach to the same problem statement, and we hope to get even better results with RNNs.
We can also widen the scope of our project domain from something as specific as a chatbot for breastfeeding mothers to a chatbot with expertise in gynaecology, or a general medical health assistant chatbot that covers a wide range of problems and issues.
Full Report: https://drive.google.com/file/d/1aN0jVzX0KUYehlKpU8fPE7aoSb80TUXa/view