I mostly want to run machine learning algorithms on the repurposed Dell R710 server I got. Specifically, I wanted to run the Universal Sentence Encoder to generate sentence embeddings. But the pip-installed version of TensorFlow doesn’t work with the E5660 CPU I have, because it requires the AVX CPU instruction set. Without it, TensorFlow just gives me an “Illegal instruction (core dumped)” message. I can kind of get around it by using the Anaconda version of TensorFlow. One exception I found is that the multi-language pre-trained model requires the tensorflow_text package. This package is not compatible with the Anaconda version of TensorFlow for some reason. So once again, I’m left with English text as the only option and cannot compare sentence embeddings across different languages.
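A quick way to check whether a machine will hit this problem is to look for the avx flag in /proc/cpuinfo before installing anything. Here is a minimal sketch, assuming a Linux host where that file exists. The helper names cpu_flags and has_avx are made up for illustration and are not part of TensorFlow or any other library.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # The flags line is a space-separated list such as "fpu sse sse2 avx".
            return set(value.split())
    return set()


def has_avx(cpuinfo_text):
    """True if the exact flag 'avx' is listed (a bare 'avx2' does not count)."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not on Linux, or the file is unreadable: report no flags found.
        text = ""
    print("AVX supported:", has_avx(text))
```

If this prints False, the stock pip wheel is likely to crash the same way mine did, and a build compiled without AVX is the way to go.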
I went to the NeurIPS 2019 conference in December and focused on NLP and reinforcement learning (RL) topics. The former is what I do for work: analyzing call center conversations, understanding what works for customer interactions, and making suggestions to clients based on their data. The latter is my personal interest, which started all the way back when DeepMind beat the world’s best Go players. At the RL sessions, the tutorials mentioned using imitation learning for natural language understanding tasks and for generating responses to questions or chit-chat. People have had some success, but the approach has some common pitfalls, such as repeatedly using the most likely responses and producing responses that are too short. Sometimes the responses are too simple, and the bots fall into a cycle of “I don’t understand what you are saying”. One night at a social gathering, I got to meet Drs. David Silver and Richard Sutton. They briefly mentioned that I could try to set up the RL environment like a conversation and see if the agents can learn from the conversations. And in one of the workshops related to NLG conversations, people talked about using various rewards to penalize repetitiveness and encourage the generation of varied text.
So that got me thinking. In addition to making a chatbot that mimics call-center agent and customer interactions, I can design an environment that helps me discover why people are calling. What I can do is set up categories of actions such as “give refund”, “cancel service”, “keep the customer on the phone”, etc., and use regret (the reward of the best action minus the reward of the action taken), where the reward is 1 if the customer ends up happy and 0 if the customer ends up mad. This may help me find the best action in a specific situation, the one that results in the outcome the client wants most.
Granted, this might not directly give me the causal reasons why the customer called. It’s closer to me designing a potential causal relationship graph before building the model and then testing whether the causal relationship is correct. Even if it does not give me the exact cause, if it gives me the best action to take, then at least I have a product that does what the clients want. So my goal for the new year is to design an environment where I can test this idea and see if I can get the best type of action to take.
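To make the regret idea concrete, here is a rough sketch that treats the action categories as arms of a bandit with a 0/1 reward. I use the usual convention that regret is the expected reward of the best action minus that of the action taken, so it is never negative. The happiness probabilities are invented for illustration, and the epsilon-greedy learner is just a simple stand-in I picked, not something from the conference talks.

```python
import random

ACTIONS = ["give refund", "cancel service", "keep the customer on the phone"]

# Hypothetical chance that each action leaves the customer happy (reward 1)
# rather than mad (reward 0). These numbers are made up, not measured.
HAPPY_PROB = {
    "give refund": 0.7,
    "cancel service": 0.2,
    "keep the customer on the phone": 0.4,
}


def expected_regret(action, happy_prob=HAPPY_PROB):
    """Expected reward of the best action minus that of the chosen action."""
    return max(happy_prob.values()) - happy_prob[action]


def run_epsilon_greedy(steps=2000, epsilon=0.1, seed=0):
    """Learn action values from 0/1 rewards and track cumulative regret."""
    rng = random.Random(seed)
    counts = {a: 0 for a in ACTIONS}
    values = {a: 0.0 for a in ACTIONS}
    total_regret = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore a random category
        else:
            action = max(ACTIONS, key=values.get)  # exploit the best estimate
        reward = 1 if rng.random() < HAPPY_PROB[action] else 0
        counts[action] += 1
        # Incremental mean of the rewards observed for this action.
        values[action] += (reward - values[action]) / counts[action]
        total_regret += expected_regret(action)
    return values, total_regret
```

In a real setting the probabilities are unknown, so the regret can only be estimated, but the learned values still point to the action category that most often leads to the outcome the client wants.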
To familiarize myself with the Gym framework from OpenAI, I set up my own card game environment and made a double deep Q-learning network. Here are some resources if you want to start your own:
Taking some design hints from https://github.com/zmcx16/OpenAI-Gym-Hearts
Some more help to design the environment: https://datascience.stackexchange.com/questions/28858/card-game-for-gym-reward-shaping
Other projects using gym: https://awesomeopensource.com/projects/openai-gym
Probably already programmed here: https://awesomeopensource.com/project/datamllab/rlcard
Create your own gym environments:
https://github.com/openai/gym/blob/master/docs/environments.md
https://stable-baselines.readthedocs.io/en/master/guide/custom_env.html
https://github.com/openai/gym/blob/master/docs/creating-environments.md
https://towardsdatascience.com/creating-a-custom-openai-gym-environment-for-stock-trading-be532be3910e
https://medium.com/@apoddar573/making-your-own-custom-environment-in-gym-c3b65ff8cdaa
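For anyone new to the double deep Q-learning mentioned above, the main difference from plain DQN is how the training target is built: the online network picks the next action, and the target network scores it. Below is a small sketch of just that target computation in plain Python. The function name and argument layout are my own choices for illustration, and the Q-values would normally come from two neural networks rather than hand-written lists.

```python
def double_dqn_targets(rewards, dones, next_q_online, next_q_target, gamma=0.99):
    """Compute Double DQN targets for a batch of transitions.

    next_q_online[i] and next_q_target[i] are lists of Q-values for the next
    state of transition i, from the online and target networks respectively.
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, next_q_online, next_q_target):
        if done:
            # Terminal transition: there is no future value to bootstrap from.
            targets.append(float(r))
            continue
        # Select the action with the online network...
        best_action = max(range(len(q_on)), key=q_on.__getitem__)
        # ...but evaluate that action with the target network.
        targets.append(r + gamma * q_tg[best_action])
    return targets
```

Plain DQN would instead take the maximum of the target network's own Q-values, which tends to overestimate action values.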
For the conversation task, I will need an environment that mimics a conversation. A reset starts a new conversation. Render just shows how the conversation has gone up to a specific point. Each step can be a turn of the conversation, with the text randomly chosen from the pool for the same category. A downside I can see with this approach is that the training data might not produce the best solution for the customer service session. It may just be the bare minimum needed to get to the desired outcome. But I hope that, because I can design my own regret in an RL framework, I can penalize things like the length of the conversation or a bad sentiment/emotional outcome while I’m trying to achieve the outcome of retaining the customer.
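Here is a rough sketch of what such an environment could look like. It mirrors the reset/step/render interface of Gym but does not import gym, so it runs on its own. Everything specific in it is an assumption for illustration: the utterance pools, the rule that a refund retains the customer, and the per-turn penalty that stands in for penalizing conversation length.

```python
import random

# Hypothetical utterance pools, one per action category.
AGENT_POOLS = {
    "give refund": ["I can refund that charge for you.", "Let me process a refund."],
    "cancel service": ["I will cancel the service today.", "Your plan is now cancelled."],
    "keep the customer on the phone": ["Can you tell me more?", "Let me look into that."],
}
CUSTOMER_REPLIES = ["I am still not sure.", "That does not solve my problem."]
ACTIONS = list(AGENT_POOLS)


class ConversationEnv:
    """Toy conversation environment with a Gym-style interface."""

    def __init__(self, max_turns=6, turn_penalty=0.05, seed=None):
        self.max_turns = max_turns
        self.turn_penalty = turn_penalty
        self.rng = random.Random(seed)
        self.transcript = []

    def reset(self):
        """Start a new conversation and return the opening customer line."""
        self.transcript = [("customer", "I want to talk about my bill.")]
        return self.transcript[-1][1]

    def step(self, action):
        """Take one agent turn; `action` is an index into ACTIONS."""
        category = ACTIONS[action]
        self.transcript.append(("agent", self.rng.choice(AGENT_POOLS[category])))
        agent_turns = sum(1 for speaker, _ in self.transcript if speaker == "agent")
        # Assumed toy rule: a refund retains the customer, a cancellation loses them.
        retained = category == "give refund"
        done = retained or category == "cancel service" or agent_turns >= self.max_turns
        # Every turn costs a little, so shorter conversations score higher.
        reward = (1.0 if retained else 0.0) - self.turn_penalty
        if not done:
            self.transcript.append(("customer", self.rng.choice(CUSTOMER_REPLIES)))
        observation = self.transcript[-1][1]
        return observation, reward, done, {"category": category, "turns": agent_turns}

    def render(self):
        """Return the conversation so far, one 'speaker: text' line per turn."""
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.transcript)
```

A typical loop would call env.reset(), then repeatedly call env.step(action) until done is True, and print(env.render()) to inspect the conversation. In a real version the observation would be something like a sentence embedding of the last utterance, and the customer replies would come from actual call transcripts.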
I encountered a similar situation before when I was making a chatbot using Rasa. It has a simpler RL environment, where a user can choose which route to take in a certain situation. But when I used it, the policy was too simple and did not achieve what I wanted, and in particular it did not give me a causal relationship. I hope it can be integrated into this framework and become more useful.
Recently, we have been trying to identify the emotion or sentiment in a phone conversation. The first step, which we tried a while back, was the Universal Sentence Encoder. That algorithm transforms a sentence into a fixed-length sentence embedding of 512 numbers. But we are ignoring the acoustic features altogether with this approach. We are getting 0.70 test accuracy on about 2000 sentences/utterances, but hope to do better. So we started trying to combine the text embedding with acoustic features such as MFCC and many others offered by Librosa. I used an LSTM with the features for each window as the time steps, over however many windows I have. Of course, I had to pad to the maximum number of windows. I even tried to include all the features Librosa offers, but was not able to increase the accuracy beyond 0.73. After the concatenation layer of text and audio signals, I played around with both LSTM and Dense layers, with just Dense layers being more accurate. I suspect there just aren’t enough samples, especially considering the LSTM for the acoustic signals. So I will try to see if I can get more labeled data for the audio signal to improve accuracy.
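The padding step is easy to get subtly wrong, so here is a minimal sketch of it in plain Python. It assumes each utterance is a list of windows and each window is a list of acoustic feature values (for example MFCC coefficients). The function name pad_windows is made up for illustration, and the mask it returns is one way to remember which windows are real. Keras has its own pad_sequences utility and a Masking layer for the same purpose.

```python
def pad_windows(sequences, pad_value=0.0):
    """Pad per-utterance feature sequences to the length of the longest one.

    Assumes at least one utterance, and that the first utterance has at least
    one window so the number of features per window can be read from it.
    Returns (padded, mask), where mask has 1 for real windows and 0 for padding.
    """
    max_len = max(len(seq) for seq in sequences)
    n_features = len(sequences[0][0])
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Copy the real windows, then append all-pad windows at the end.
        padded.append([list(w) for w in seq] + [[pad_value] * n_features for _ in range(n_pad)])
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask
```

With only about 2000 utterances, a lot of the LSTM input can end up being padding when a few utterances are much longer than the rest, which may be part of why the Dense layers did better for me.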