Monday, June 22, 2009

BeVocal vs Voxeo - Which one is good for a beginner?

Recently I have been learning ccxml and vxml. I chose to start with ccxml, but I realized that I had to understand how vxml works before I could do anything with ccxml. So for the past two weeks I have been working on vxml.

I signed up for free accounts with both BeVocal and Voxeo. Here is my simple comparison as an IVR beginner.

Comparison by category (BeVocal, then Voxeo)

Customer Support and Forum

BeVocal: Worst! I have never gotten a response from them. I guess they do not entertain free account members. If so, they should mention this on their site. In addition, I am still unable to access their news forum discussion groups.

Voxeo: Extremely efficient, with fast responses. I have no problem accessing their forum discussion boards.

Text to Speech (TTS)

BeVocal: Sounds more natural. I have seldom needed to customize my code to get better speech. BeVocal sounds better than Voxeo, especially when reading a long sentence or a paragraph.

Voxeo: A short sentence or a few words rendered by voxeo TTS are okay. Anything longer needs tweaking, such as manually inserting <break> or using commas to split the sentence into groups of a few words so that the speech sounds more natural. This could be a problem if the text is converted to speech on the fly, as soon as the TTS engine receives it from a Web application. Voxeo claims that the speech will be better with their paid voice version. In addition, some words from voxeo TTS never sound right to my ears. For example, slash, or the punctuation mark /, is always pronounced as flash. The word project is sometimes read as pro-jack and sometimes as pro-ject.
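
To illustrate the <break> tweak mentioned for Voxeo, here is a minimal sketch of a prompt with manual pauses. The form id, the wording, and the pause lengths are made up for illustration; whether a given pause actually improves the speech is something to judge by ear on each TTS engine.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="announcement">
    <block>
      <prompt>
        Thank you for calling.
        <!-- Explicit pause between sentences -->
        <break time="300ms"/>
        Your status report is ready,
        <!-- Shorter pause to split a long sentence into phrases -->
        <break time="150ms"/>
        and a copy has been sent to your mailbox.
      </prompt>
    </block>
  </form>
</vxml>
```

This works for text written by hand. Text generated on the fly by a Web application would need the breaks inserted by that application before it is sent to the voice server.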

Voice Recognition

BeVocal: Acceptable.

Voxeo: Frustrating. You had better code with DTMF (telephone keypad input). I am not sure whether voxeo's voice recognition engine uses a genetic algorithm so that it could learn in the long run.
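
As a rough sketch of the DTMF workaround, the form below turns off speech input with the standard inputmodes property and collects a single key press. The form id, field name, and menu wording are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="mainMenu">
    <!-- Accept keypad input only within this form -->
    <property name="inputmodes" value="dtmf"/>
    <field name="choice" type="digits?length=1">
      <prompt>For sales, press 1. For support, press 2.</prompt>
      <noinput>
        Sorry, I did not get a key press.
        <reprompt/>
      </noinput>
      <filled>
        <if cond="choice == '1'">
          <prompt>You chose sales.</prompt>
        <elseif cond="choice == '2'"/>
          <prompt>You chose support.</prompt>
        <else/>
          <!-- Any other digit: clear the field so the menu is asked again -->
          <prompt>That is not a valid choice.</prompt>
          <clear namelist="choice"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```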

HTTP Integration

BeVocal: As of this writing, I still don't know what the actual voice URL of BeVocal is, or where to find it. Thus, I cannot hand application control (using <goto>) back to a particular dialog of the vxml on the BeVocal server. I need to duplicate all the scripts on my Web server in order to continue the application. BeVocal has probably published the HTTP connection information somewhere on their site. For the moment, however, I just want to test and see how my Web application works with voice dialogs.

When you serve your vxml via HTTP to the BeVocal voice server, the response must set the Content-Type header to application/voicexml+xml. Otherwise, expect an error.

Voxeo: I like voxeo's approach of using a Web URL as the voice browser URL. It is a lot simpler for a beginner like me. The concept is simple and easy to implement. With the Web URL approach, I can simply use <goto> to fetch my vxml files or to return to any <form> in the same vxml file. No files need to be duplicated on my Web server, and I don't need to research how to do HTTP integration at this early stage of learning.

Voxeo will process your vxml with the Content-Type header set to either text/html or application/voicexml+xml.
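
The two uses of <goto> described for the Web URL approach can be sketched as follows. The form ids, the field name, and the www.example.com URL are placeholders, not anything taken from Voxeo or BeVocal.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="start">
    <block>
      <prompt>Welcome.</prompt>
      <!-- Jump to another form in this same document -->
      <goto next="#askAccount"/>
    </block>
  </form>

  <form id="askAccount">
    <field name="account" type="digits">
      <prompt>Please key in your account number.</prompt>
      <filled>
        <!-- Fetch a second document from the Web server and enter its
             "confirm" form (placeholder URL) -->
        <goto next="http://www.example.com/ivr/lookup.vxml#confirm"/>
      </filled>
    </field>
  </form>
</vxml>
```

Note that <goto> only transfers control. To send the collected value to the Web application, <submit> with a namelist would be the element to look at.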

Grammar and Syntax

It is hard to say which one is easier to work with. I cannot tell which one follows the W3C spec more closely either. They are slightly different.

For example, BeVocal requires the following; otherwise, it will error out.
    <vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">

For Voxeo, we can have:
    <vxml version="2.1">
or
    <vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">


BeVocal accepts the following, and the script runs without any problem:
  <form id="q">
    <block>
      <subdialog name="result" src="#personalInfo">
        <filled>
          ...
        </filled>
      </subdialog>
    </block>
  </form>
On Voxeo, the same script never runs, yet no error is generated, which makes debugging very difficult. I have left a note for their support team and hope that they will speak to their engineers about this.

To work around this, it is better to follow the W3C specification 100%. If something goes wrong, check the spec first.
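
For reference, the VoiceXML specification lists <subdialog> as a form item, which means it sits directly under <form> rather than inside a <block>. That may be why the fragment above is skipped on Voxeo, although this is only a guess and has not been confirmed by their support. A minimal spec-conforming sketch follows; the body of the personalInfo form and the returned variable name (code) are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="q">
    <!-- subdialog is a form item, so it is a direct child of form -->
    <subdialog name="result" src="#personalInfo">
      <filled>
        <!-- Values returned by the subdialog appear as properties of result -->
        <prompt>Your code is <value expr="result.code"/>.</prompt>
      </filled>
    </subdialog>
  </form>

  <!-- Hypothetical subdialog target: collects one value and returns it -->
  <form id="personalInfo">
    <field name="code" type="digits">
      <prompt>Please key in your code.</prompt>
      <filled>
        <return namelist="code"/>
      </filled>
    </field>
  </form>
</vxml>
```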

How Easy to Test an Application?

BeVocal: BeVocal displays a 1-800 number plus PIN, a direct number, and a SIP address on every Web page for application testing. For some unknown reason, I can only use the direct number. The 1-800 number and SIP have never worked for me.

Voxeo: Voxeo provides six ways to test every application: a 1-800 number plus PIN, a direct number, Skype, FWD, SIP, and even an iNum number. Every method works like a charm for me.

How Many Applications Can Be Tested at a Time?

BeVocal: One. BeVocal only allows you to activate one application for testing at a time.

Voxeo: Many. Each application is assigned its own number and PIN.

Application Debugging

BeVocal: BeVocal's Log Browser can be used to debug your vxml, but you cannot view a log while the application is executing. Each log can be reviewed after execution, with color highlighting. All errors are highlighted in red, telling you exactly which line is at fault, similar to the debugging tools of other programming environments.

Their Vocal Debugger lets you step through the script and even pause your application at a certain point. The pause is not a breakpoint of the kind that other programming tools let you set beforehand; you can only set it while the application is running.

If you don't like using a phone to test your voice application, you can use their Vocal Scripter. It simulates the dialing process and converts your text responses to voice for your application. But I am not fond of it; I would like to test the voice quality as well.

Voxeo: Voxeo provides only a single tool, called Application Debugger. You can keep it open while you are testing your application. When an error occurs, it is highlighted in red. Unfortunately, the error is not exactly what we are looking for in our script. The errors are mostly Java stack traces with line numbers that we don't care about. That is useful to Voxeo support or their developers for further diagnosis. I like that BeVocal tells me exactly which line of my script has a problem.

Unlike the BeVocal debugger, the Voxeo debugger won't let you step through the application directly. But it prints execution messages as if documenting the call scenario, so you can look at the output and see whether there is any abnormality.

Their error logs can be retrieved later, but as soon as you close the debugger, all the color highlighting is gone forever. The log is plain text. Within a day, you can easily view it in your browser. After that, you have to download it first because Voxeo puts the logs in a .GZ file. Because of this, I guess, Voxeo provides another tool called Prophecy Log Search. Still, there is no color highlighting. At this moment, I don't find it useful; in fact, I dislike it. To me, the tool is too heavy. The search is slow. JavaScript errors are everywhere. It supports neither IE 7 nor SeaMonkey 1.1.14, which I use heavily for my Web development.

One important thing that I learned in these two weeks is the shadow variable. At the beginning, I was confused by this term as used in IVR. A variable is a variable, so what is a shadow variable? If I am correct, it is similar to a read-only property of an object. When we use a pre-defined element such as <record>, we can of course set its attributes. But <record> also exposes other properties after it executes, such as duration, size, termchar and maxtime. They are all read-only. In the IVR world, such read-only properties are called shadow variables.
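
As a small sketch of how this looks in practice: the recording itself is stored in the variable named by the name attribute of <record>, and its shadow variables are read as name$.property. The form id, the variable name msg, and the attribute values below are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="leaveMessage">
    <record name="msg" beep="true" maxtime="30s" finalsilence="3s" dtmfterm="true">
      <prompt>Please leave your message after the beep, then press any key.</prompt>
      <filled>
        <!-- msg holds the recording; msg$ holds its read-only shadow variables -->
        <log>duration in milliseconds: <value expr="msg$.duration"/></log>
        <log>size in bytes: <value expr="msg$.size"/></log>
        <log>terminating key, if any: <value expr="msg$.termchar"/></log>
        <log>stopped by maxtime: <value expr="msg$.maxtime"/></log>
        <prompt>Here is what you said. <audio expr="msg"/></prompt>
      </filled>
    </record>
  </form>
</vxml>
```

The <log> output should show up in the platform's debugger or log viewer, which makes this a convenient way to inspect shadow variables while learning.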

In closing, as a beginner, I would rather work with voxeo because their support encourages me to do more with their products, something I could not find with other companies. Although their voice recognition frustrates me all the time, I found a way to work around it. Of course, I hope that voxeo will improve it soon.

1 comment:

  1. Greetings! We're glad you had a positive experience overall with our Voxeo platform and are pleased to see you recommend it to others. I thought I would respond to a couple of points here:

    1. On the TTS issue, I am sorry to hear the results were not up to your expectations. Did you engage with our support team at all on this issue? They may have some suggestions to offer you. If you haven't yet done so, please email support@voxeo.com or raise a ticket through our Evolution developer portal.

    2. Also regarding TTS, you wrote "Voxeo claims that the speech will be better with their pay voice version." In fact, the TTS engine you received for free in our developer edition is the exact same TTS engine that is in our commercial product. The free and commercial products are the same code base. Now, if you do want to use the TTS engine from another company, for example Nuance or Loquendo, you *can* use those TTS engines with our platform, but in those cases there *are* additional license fees. Voxeo's own TTS engine (and ASR engine) is included for free with no additional costs.

    3. Regarding your ASR difficulties, I am VERY surprised to hear of your frustration. We have a good number of companies using our free ASR engine in production situations without any problems. Our director of speech technologies is very interested to learn more about what precisely you were doing. Would you be open to emailing your contact info to me at dyork@voxeo.com ? I would like to put him in touch with you directly.

    4. Regarding your comments on the Application Debugger and Prophecy Log Search, I will pass that feedback along to our engineering team.

    Thanks again for your evaluation and feedback. Best wishes with your continued studies. As you are probably aware, we have all sorts of documentation at http://docs.voxeo.com/. We also have a developer blog and other info at http://blogs.voxeo.com/. Please do also make use of our support team - they are there to help any and all developers who are working with our platform.

    Thanks,
    Dan
