
Language Bot Sprint
A major pharmaceutical company solicited proposals for a Hebrew-language chatbot that could answer questions about a particular drug. I was given two PDFs, each containing around 20 pages of Hebrew content, and asked to create a demo chatbot that could deliver this information to customers in a friendly manner.
Additional Challenges
I am unable to read or speak Hebrew and had no team members who could read or speak the language.
Our company did not have a Hebrew language processor.
The deadline for the demo was in two weeks.
Initial Approach
I started by researching the Hebrew language so I could anticipate the extra challenges we might face when using a non-Hebrew language processor to process Hebrew. I learned that Hebrew is read right to left, that most Hebrew texts contain no vowel markings, and that some Hebrew characters change form at the end of words. I also learned that Hebrew is a heavily gendered language: word forms can change based on both the speaker’s gender and the gender of the person being addressed.
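To make two of these properties concrete, here is a minimal Python sketch of one way vowel markings and final-form letters could be normalized before text is compared. It is an illustration only, not code from the project; the function names are hypothetical, and folding final letters is an assumption that only makes sense for matching, never for display.

```python
import unicodedata

# Final-form letters mapped to their regular forms (assumed useful for matching only).
FINAL_TO_REGULAR = str.maketrans({
    "\u05da": "\u05db",  # final kaf   -> kaf
    "\u05dd": "\u05de",  # final mem   -> mem
    "\u05df": "\u05e0",  # final nun   -> nun
    "\u05e3": "\u05e4",  # final pe    -> pe
    "\u05e5": "\u05e6",  # final tsadi -> tsadi
})


def strip_niqqud(text: str) -> str:
    """Remove combining marks, which include Hebrew vowel points."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_hebrew(text: str) -> str:
    """Strip vowel points, then fold final-form letters to regular forms."""
    return strip_niqqud(text).translate(FINAL_TO_REGULAR)
```

Note that `strip_niqqud` removes every combining mark, so it would also strip accents from Latin text that passes through it.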
I used online translators to translate the content and created an outline of each document that featured headings in both Hebrew and English to use as a map for myself when searching for specific information in the documents. The first document gave a comprehensive overview of the product as well as detailed instructions on how to use it. The second document was a series of FAQs, many of which were repeated multiple times throughout with slight variations.
I decided to structure my conversation flow such that the information of the first document would be prioritized as it seemed more detailed and oriented towards helping customers understand the product. I organized the information into several categories: About the drug, Before using the drug, Side effects, and FAQs. I also created a flow that would give users step-by-step instructions on how to use the drug.
Collaboration with Engineers
Since we didn’t have a Hebrew language processor, we experimented with translating Hebrew inputs into English and using our English language processor to select responses. Initial tests had mixed results. Ultimately, we shelved this solution because we did not have enough time to test and implement it.
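The shape of that shelved experiment can be sketched roughly as follows. This is a hypothetical outline, not our actual implementation: `translate_to_english` and `match_intent` stand in for a translation service and the English language processor, and the placeholder strings stand in for real Hebrew text.

```python
from typing import Callable, Dict, Optional


def select_response(
    hebrew_input: str,
    translate_to_english: Callable[[str], str],
    match_intent: Callable[[str], Optional[str]],
    responses: Dict[str, str],
    fallback: str,
) -> str:
    """Translate the user's Hebrew input, match it in English, reply in Hebrew."""
    english_text = translate_to_english(hebrew_input)
    intent = match_intent(english_text)
    if intent is None:
        return fallback
    return responses.get(intent, fallback)


# Stand-in components for illustration only (all hypothetical).
def demo_translate(text: str) -> str:
    lookup = {"<Hebrew question about side effects>": "what are the side effects"}
    return lookup.get(text, "")


def demo_match_intent(english_text: str) -> Optional[str]:
    return "side_effects" if "side effects" in english_text.lower() else None


demo_responses = {"side_effects": "<Hebrew answer about side effects>"}
demo_fallback = "<Hebrew error message>"
```

Calling `select_response("<Hebrew question about side effects>", demo_translate, demo_match_intent, demo_responses, demo_fallback)` returns the stored Hebrew answer, while unrecognized input falls through to the error message. Any mistranslation in the first step carries into the intent match, which is one plausible reason for mixed results.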
Notes on Translation
The chatbot required user-facing text in Hebrew (e.g., greetings, confirmation questions, and error messages). Since nobody on the team understood Hebrew, I wrote the bot’s Hebrew content myself despite my lack of familiarity with the language. To increase the odds of generating comprehensible translations, I opted to write simple sentences. As often as I could, I copied text directly from the documents so that the sentences would be well formed.
When creating sample user phrases to train the bot on, I was less concerned about simplifying my language since we would be training it on many different phrases and the users would never see these directly. I tried to come up with as many different variations as possible.
A Change in Direction
At the end of our second week, there were two major changes to the project:
The company extended the deadline by one week.
We were able to get a native Hebrew speaker on contract.
This significantly changed my approach. Now that I knew a native speaker could review the text, I prioritized rewriting the user-facing text so it was friendlier and did not shy away from more complicated concepts. Once the native speaker had edited all user-facing text, I sent them the sample user phrases to edit and add to so we could train the bot on them.
Testing
The native speaker came up with sample questions to test the bot on. We ended up with a 60% correct response rate, which is much lower than ideal, but not surprising since we were using an English language processor to deal with Hebrew phrases. We collaborated to create more sample Hebrew phrases to train the bot on.
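For reference, a correct response rate like this can be computed with a small harness along these lines. This is a sketch under assumptions, not the tooling we used: `predict` is a hypothetical stand-in for the bot, and each test case pairs a question with the response category it should trigger.

```python
from typing import Callable, List, Optional, Tuple

TestCase = Tuple[str, str]  # (question, expected response category)
Failure = Tuple[str, str, Optional[str]]  # (question, expected, actual)


def evaluate(
    test_cases: List[TestCase],
    predict: Callable[[str], Optional[str]],
) -> Tuple[float, List[Failure]]:
    """Return the fraction of correct responses and the list of misses."""
    if not test_cases:
        raise ValueError("test_cases must not be empty")
    failures: List[Failure] = []
    for question, expected in test_cases:
        actual = predict(question)
        if actual != expected:
            failures.append((question, expected, actual))
    correct = len(test_cases) - len(failures)
    return correct / len(test_cases), failures
```

The list of misses is the useful part in practice: each failed question is a candidate to add, with variations, to the sample phrases used for training.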
Next Steps
The company chose a different proposal. However, had they chosen to move forward with this one, our first priority would have been to get a working Hebrew language processor.
In the meantime, I would have collaborated with the Hebrew translator to create lists of synonyms for each of the verbs commonly used when discussing the product, so that the English language processor would recognize a verb that has been gendered or is in a different tense.
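A minimal sketch of how such synonym lists might be applied is shown below. It is hypothetical: the function names are illustrative, the tokenization is a plain whitespace split, and the example forms of the verb "to take" would need review by the native speaker before use.

```python
from typing import Dict, List


def build_variant_index(synonyms: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every listed variant (and the canonical form itself) to the canonical form."""
    index: Dict[str, str] = {}
    for canonical, variants in synonyms.items():
        index[canonical] = canonical
        for variant in variants:
            index[variant] = canonical
    return index


def canonicalize(text: str, index: Dict[str, str]) -> str:
    """Replace each known variant in the text with its canonical form."""
    return " ".join(index.get(token, token) for token in text.split())


# Illustrative entry: infinitive as the canonical form, followed by
# masculine present, feminine present, and first-person past variants.
verb_synonyms = {
    "לקחת": ["לוקח", "לוקחת", "לקחתי"],
}
variant_index = build_variant_index(verb_synonyms)
```

With this index, `canonicalize` rewrites any listed gendered or tensed form to the single canonical verb before the phrase reaches the language processor, so only one form needs to be recognized. Punctuation attached to a word would defeat the lookup, so a real version would need proper tokenization.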