Platform SDK: Agent

Be Efficient and Natural

When people are accomplishing tasks, effective conversations are typically exchanges of brief information. Often, elements in the discussion are established between the parties and then referred to indirectly using abbreviated responses. These forms of abbreviation are beneficial because they are efficient, and they also imply that the speaker and listener have a common context; that is, that they are communicating. Using appropriate forms of abbreviation also makes a dialogue more natural.

One form of conversational abbreviation is the use of contractions. Avoiding them makes a speaker seem more formal and rigid, and sometimes less human. Most human conversation takes more liberties with linguistic rules than written text does.

Another common form of abbreviation in conversations is anaphora, the use of pronouns. For example, when someone asks, "Have you seen Bill today?" responses that substitute "him" for "Bill" are more natural than responses that repeat the name. The substitution is a cue that the parties in the dialogue share a common context of who "him" is. Keep in mind that the word "I" refers to the character when he or she says it.

Shared context is also communicated by the use of linguistic ellipsis, the truncation of many of the words in the original query. For example, the listener could respond, "Yes, I saw him," which demonstrates the shared context of when, or even respond with a simple "Yes," which demonstrates the shared context of both who and when.

Implicit understanding can also be conveyed through other forms of abbreviated conversational style, where content is inferred without repetition, as shown in the following example:

User: I'd like a Chicago-style pizza.
Character: With "Extra Cheese"?

Similarly, if someone says, "It is hot in here," the phrase is understandable and requires no further detail if you know where the speaker is. However, if the context is not well established or is ambiguous, eliminating all contextual references may leave the user confused.
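The trade-off between abbreviation and clarity can be sketched in code. The following Python function is only an illustration, not part of Microsoft Agent: the name reply_about and its parameters are hypothetical, and it assumes the application tracks which person was mentioned most recently. It uses the pronoun when the context is established and falls back to the name when it is not.

```python
def reply_about(person, pronoun, last_mentioned, seen):
    """Build a reply to a question such as "Have you seen Bill today?".

    Hypothetical helper: the pronoun is used only when `person` is the
    entity most recently mentioned in the dialogue. Otherwise the name
    is repeated so that the reference stays unambiguous.
    """
    reference = pronoun if last_mentioned == person else person
    if seen:
        return f"Yes, I saw {reference}."
    return f"No, I haven't seen {reference}."


# Context established: Bill was just mentioned, so the pronoun is natural.
print(reply_about("Bill", "him", last_mentioned="Bill", seen=True))

# Context not established: someone else was mentioned last, so use the name.
print(reply_about("Bill", "him", last_mentioned="Ann", seen=False))
```

A real dialogue would need richer context tracking than a single "last mentioned" value, but the same rule applies: abbreviate only when the reference cannot be misread.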

When using abbreviated communication, always consider the user's context and the type of content. It is appropriate to use longer descriptions for new and unfamiliar information. However, even with long descriptive information, try to break it up into smaller chunks. This gives you the ability to change the animation as the character speaks. It also provides greater opportunity for the user to interrupt the character, especially when using speech input.
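One way to break a long description into smaller chunks is to split it at sentence boundaries before sending it to the character. The following Python sketch is an assumption about how an application might prepare its text, not a Microsoft Agent feature: the function name split_into_chunks and the max_chars parameter are hypothetical, and the sentence detection is deliberately simple (it splits after ".", "!", or "?").

```python
import re


def split_into_chunks(text, max_chars=80):
    """Group the sentences of `text` into chunks of at most `max_chars`.

    A single sentence longer than `max_chars` is kept whole rather than
    being cut in the middle.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # The current chunk is full; start a new one with this sentence.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


description = "This is the first step. Now do the second step. Finally, finish up."
for chunk in split_into_chunks(description, max_chars=50):
    print(chunk)
```

Each chunk could then be passed to a separate speech request, with a change of animation between requests, so that the user has natural points at which to interrupt.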

Consistency is important in speech output. Strange speech patterns or prosody may make the character seem less intelligent. Similarly, switching between TTS and recorded speech may cause users to perceive the character as strange or as having more than one personality. Lip-synced mouth movements can improve the intelligibility of speech. Microsoft Agent automatically supports lip-syncing for TTS engines that comply with its required SAPI interfaces. Lip-syncing is also supported for recorded speech: sound files can be enhanced with the Microsoft Linguistic Sound Editing Tool.