SemVox GmbH

How much AI do Smart Assistants need?

Blog post   •   Jul 10, 2018 15:34 CEST

With the AI buzzword being the key to current developments in almost all fields of innovation, it’s time to look at the real value it brings to smart assistants.

What are the latest developments in AI and how do you make use of them?

Machine learning (ML) is the undisputed “big thing” in AI right now. We are very excited about the possibilities of machine learning and employ ML technology for different tasks, e.g., natural language understanding and user modelling, and also in customized form for application backends.

Deep learning does not equal Artificial Intelligence, and is not applicable in every area.

There is currently a lot of hype around machine learning, and some people believe that it alone will solve most, if not all, hard computational problems. Our response is that if all you have is a hammer, everything starts to look like a nail. Like all good craftsmen, we prefer a full toolbox. Nevertheless, it is true that machine learning offers powerful solutions for dialog systems. Moreover, it can provide a lot of synergy in combination with symbolic approaches. As a team, we have an exceptionally solid background across AI as a field, so we constantly evaluate and embrace new technologies to devise better solutions. Often, this results in a combination of newer and older AI technology.

The field of AI attacks the “hardest” computational problems, but the notion of what counts as a “hard” problem changes over time. A good example is chess: playing it was once considered a task that definitely required human intelligence, but that is no longer the case. A computer can beat most ordinary humans at chess by extensively searching the possibility space of the game. This brute-force approach is considered “shallow” AI now that you can buy inexpensive chess computers in your local supermarket, but it was a hard problem not too long ago.
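To make the idea of searching a game's possibility space concrete, here is a minimal sketch. It does not use chess, whose search space is far too large for a short example, but a toy stone-taking game that is an assumption of this illustration: players alternately remove one to three stones, and whoever takes the last stone wins. The function names are hypothetical, and a cache is added so that repeated positions are only evaluated once.

```python
from functools import lru_cache

MAX_TAKE = 3  # assumption of this toy game: remove 1 to 3 stones per turn


@lru_cache(maxsize=None)
def can_win(stones: int) -> bool:
    """Return True if the player to move can force a win."""
    # Taking the last stone wins, so a player facing an empty pile has lost.
    if stones == 0:
        return False
    # Try every legal move: the position is won if at least one move
    # leaves the opponent in a lost position.
    return any(
        not can_win(stones - take)
        for take in range(1, min(MAX_TAKE, stones) + 1)
    )


def best_move(stones: int):
    """Return a move that forces a win, or None if every move loses."""
    for take in range(1, min(MAX_TAKE, stones) + 1):
        if not can_win(stones - take):
            return take
    return None
```

With seven stones, for instance, the search finds that taking three leaves the opponent with four, a position from which every move loses. Real chess programs cannot search to the end of the game like this and have to cut the search off and estimate positions instead.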

The methods used in AI evolve and change with the available hardware, software, and data. The chess task was mainly solved by faster hardware. Now that vast amounts of data are available, data-centric approaches are becoming feasible, and we see very good results. But that does not mean that the past findings of AI are obsolete now, or that there will be no other paradigms in the future. Also, machine learning is not equally well applicable to all areas. To get the best results, we need to use the full set of established AI methods, and keep improving with what becomes available.

Can dialog modelling be substituted with dialog learning?

While it seems a good idea at first glance, a complete substitution is neither realistic nor desirable in most cases. Some types of dialogs are well suited for dialog learning, either from corpora or dynamically during the interaction.

Systems akin to the famous “Eliza” psychotherapist could harvest protocols of therapy sessions to get samples for the type of noncommittal responses expected of a therapist. Many chatbot development kits use similar techniques. However, this usually amounts to little more than learning “how a typical conversation goes” in order to produce a convincing dialog. More sophisticated learning techniques are required to actually produce a cooperative interaction.

Most practical systems have requirements that can be problematic with a pure learning approach. At SemVox, we are primarily concerned with task-oriented interactions. For a task-oriented speech interface, the entertainment factor is secondary (although it is of course positive if a system is fun to use). The main focus is to provide an efficient and reliable way to complete tasks.

For this, it is important that the application designer is able to exactly define what the system will do in a given situation. This does not mean that the system must always be rigid and deterministic. If desired, fuzzy and probabilistic elements can be introduced at will. It also does not follow that every possible path has to be enumerated — techniques such as planning can be employed to capture complex domains with variability. Additionally, it is a good feature for an application to dynamically adapt to the user’s preferences and habits.

However, depending on the application, tight control over the system’s actions can be a hard requirement. For premium applications in areas like automotive or medical services, it is mission-critical that they are reliable, testable and built to specification. Also, well thought-out usability rules provided by ergonomics experts should be honoured, even if collected interaction samples tell a different story.

Another limiting factor is that the data to learn from is typically sparse at the design time of a new application. Dialog corpora that capture the interactions to learn from may be unavailable, since the application simply does not exist yet. There are methods to gather interaction samples for non-existing systems, such as Wizard-of-Oz experiments, but it is difficult and prohibitively expensive to produce a sensible corpus that covers all possible situations for non-trivial applications this way. In many cases, it is very doubtful that the effort would be less than that of an explicit specification process for the application.

Concerning dialog modelling, the most promising aspect to be learned automatically is deriving constraints and defaults on the tasks that stem from facts about the user, the interaction context and the situational context. For example, a system can learn all kinds of implicit in-task or meta-task preferences by observing the behaviour patterns of the user.
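As a rough illustration of learning defaults from observed behaviour, the following sketch simply counts which value a user picks for a given task parameter and proposes it as a default once it has been seen often and consistently enough. The class, the slot name and both thresholds are assumptions made for this example and do not describe a SemVox component.

```python
from collections import Counter, defaultdict


class PreferenceModel:
    """Learns per-slot default values from observed user choices."""

    def __init__(self, min_observations=3, min_share=0.6):
        # Hypothetical thresholds: how much evidence is needed and how
        # dominant a value must be before it is proposed as a default.
        self.min_observations = min_observations
        self.min_share = min_share
        self._counts = defaultdict(Counter)

    def observe(self, slot, value):
        """Record that the user chose `value` for `slot`."""
        self._counts[slot][value] += 1

    def default_for(self, slot):
        """Return the learned default for `slot`, or None if unsure."""
        counts = self._counts.get(slot)
        if not counts:
            return None
        total = sum(counts.values())
        value, hits = counts.most_common(1)[0]
        if total >= self.min_observations and hits / total >= self.min_share:
            return value
        return None


model = PreferenceModel()
for choice in ["fastest", "fastest", "scenic", "fastest"]:
    model.observe("route_type", choice)
learned = model.default_for("route_type")
```

Returning no default when the evidence is thin keeps the designer's explicit specification in charge until the learned preference is reasonably reliable.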

Can computers be emotional?

Given the current state of AI, it is safe to say that a computer program cannot experience emotions as living creatures do. However, from a practical viewpoint, it is possible to create an emotional model that takes into account the context and external input. The model can then derive an emotional state that influences the behaviour of the program. For example, if a dialog system is insulted repeatedly, it can be programmed to refuse to answer the user at some point.
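The insult example can be sketched in a few lines. The word list, the threshold and all names below are assumptions chosen for illustration. A real system would derive the input from its language understanding component rather than from keyword spotting.

```python
import re

INSULTS = {"stupid", "idiot", "useless"}  # illustrative word list only


class MoodModel:
    """Tracks a single 'irritation' value derived from user input."""

    def __init__(self, refuse_at=3):
        self.refuse_at = refuse_at  # hypothetical refusal threshold
        self.irritation = 0

    def update(self, is_insult):
        # Insults raise the irritation level, polite turns lower it again.
        if is_insult:
            self.irritation += 1
        else:
            self.irritation = max(0, self.irritation - 1)

    @property
    def refuses(self):
        return self.irritation >= self.refuse_at


def respond(model, utterance):
    """Update the emotional state, then let it influence the answer."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    model.update(bool(words & INSULTS))
    if model.refuses:
        return "I would rather not continue like this."
    return "How can I help?"
```

The emotional state sits between input and reaction: the same utterance can produce a different answer depending on what came before.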

In our opinion, the main point to consider is not the philosophical question, but rather the question of “how can an emotional capability improve the interaction with a computer”. There are two sides to this: on the one hand, the system can have a grasp of the emotional state of the user; on the other hand, it can simulate an emotional state of its own. The relevant feature in both aspects is how they influence the reactions of the system.

A model of the user’s emotional and para-emotional state (e.g., anger or impatience) makes it possible to truly adapt and respond intelligently to the user’s needs and desires. From the basic emotional state and how it develops over time, other parameters can be derived, such as whether the user’s cognitive load is high, whether she is satisfied or confused by the interaction, etc. Based on this information, the system can decide to offer more or less information in a single turn, use shorter prompts, or make any number of other adaptations to the interaction that go beyond the basic action plan required to complete a task.

The system can also be provided with a model for its own emotional state. Some systems use this merely as a gimmick, but without a better reason, the user does not get much value beyond some fleeting novelty entertainment. There is also a question of plausibility. Your “intelligent” light switch could always assure you that it truly enjoys lighting the room for you. At best, this would come across as a bit silly, and for most people, that joke would get old really quickly. To be accepted and make sense, emotional expression must be used with sensitivity and must also take the social setting of the interaction into account.

Where does SemVox apply AI technology?

SemVox has broad experience with numerous fields of AI, including computational linguistics, phonetics, semantic modelling and inference, and machine learning. This enables us to employ the full spectrum of available technologies. Our approach is to carefully select and use whatever tool is best suited for a given task.

In many cases these tools are already built into the platform, but sometimes the platform has to be extended to form a custom solution. In the case of built-in capabilities, the general ODP S3 API offers several “docking points” where an application can connect and use them. For example, if an application requires advanced reasoning over semantic structures, such as a user model, backend modules can make use of a built-in engine that provides rule-based inferences. Beyond high-level services, the API also gives the backend developer a toolbox for common low-level operations for manipulating semantic structures, such as pattern matching, unification and overlay.
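For readers unfamiliar with these operations, the sketch below shows one common reading of them on semantic structures represented as nested Python dictionaries. It is not the ODP S3 API, and the function names and the dictionary representation are assumptions of this example. Unification merges two structures and fails on conflicting values. Overlay is treated here as a default unification in which the newer (covering) structure wins over the older (background) one. The sketch assumes that structures never contain None as a value, since None is used to signal failure.

```python
def matches(pattern, structure):
    """True if every feature required by `pattern` is in `structure`."""
    if isinstance(pattern, dict):
        return isinstance(structure, dict) and all(
            key in structure and matches(value, structure[key])
            for key, value in pattern.items()
        )
    return pattern == structure


def unify(a, b):
    """Merge two structures, or return None if their values conflict."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_value in b.items():
            if key in result:
                merged = unify(result[key], b_value)
                if merged is None:
                    return None  # conflict somewhere below this feature
                result[key] = merged
            else:
                result[key] = b_value
        return result
    return a if a == b else None


def overlay(covering, background):
    """Merge two structures, letting `covering` win on conflicts."""
    if isinstance(covering, dict) and isinstance(background, dict):
        result = dict(background)
        for key, value in covering.items():
            if key in background:
                result[key] = overlay(value, background[key])
            else:
                result[key] = value
        return result
    return covering
```

In a dialog, overlay is handy when the user revises a request: the new utterance replaces the conflicting parts of the previous request while everything else is carried over.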

Other tools can be utilized as a backend service. For example, if an application needs planning capabilities to organize more complex tasks, an external constraint satisfaction solver such as OptaPlanner can be employed.
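To give an impression of what a constraint satisfaction problem looks like, here is a small sketch that orders three errands using plain backtracking search. It does not use OptaPlanner or its API. The errands, the time slots and all function names are assumptions made up for this example.

```python
def solve(variables, domains, constraints, assignment=None):
    """Return one assignment satisfying all constraints, or None."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return dict(assignment)
    # Pick the next variable that has no value yet.
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        # Only descend if the partial assignment violates no constraint.
        if all(check(assignment) for check in constraints):
            result = solve(variables, domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]  # undo the choice and try the next value
    return None


def all_different(assignment):
    """No two errands may share a time slot."""
    values = list(assignment.values())
    return len(values) == len(set(values))


def before(first, second):
    """Build a constraint requiring `first` to come before `second`."""
    def check(assignment):
        if first in assignment and second in assignment:
            return assignment[first] < assignment[second]
        return True  # not applicable until both have a slot
    return check


errands = ["fuel", "pharmacy", "dinner"]
slots = {errand: [1, 2, 3] for errand in errands}
rules = [all_different, before("fuel", "dinner"), before("pharmacy", "fuel")]
plan = solve(errands, slots, rules)
```

A dedicated solver becomes worthwhile once the problem has many variables, soft constraints or an objective to optimise, which this sketch does not attempt to handle.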

What is our vision for the evolution of dialog systems in the medium term?

A very important point is that dialog systems need to become much more pleasant and efficient to use than they are today. Compared to human performance, current systems are still way too clumsy and insufficiently aware of the situational context.

One everyday example is multi-party dialog. In a situation with multiple dialog participants, each participant must obey certain rules with respect to turn taking (i.e., when to speak). Humans can do this subconsciously and very fast. They also take a variety of cues into account, such as prosody and body language. For a smooth and efficient dialog flow, it is crucial that a dialog system can process such multi-modal information rapidly and reliably, and this holds even if there are just two speakers.

This also applies in the reverse direction. Humans can handle turn-taking well. To do so, they rely on subtle and not-so-subtle signals from dialog partners. Current systems do not provide many cues in that direction. Dialog-enabled robotic systems are a very promising platform in this regard. The “embodied setting” opens a whole new dimension for multimodal expression and sensing of emotional and communicative cues. It also adds much more plausibility for a system to have emotions if it actually is present in the environment and exhibits increased situational awareness.

Another essential task is to strike the balance between easy specification of dialog applications and expressive power. Traditional simplistic approaches like state-based machines are not adequate for real-world systems, since they have massive problems with scaling and interaction flexibility. However, with growing complexity, it becomes more challenging to design, specify and test applications. The ODP S3 framework offers an integrated workbench, which continues to evolve and improve, with editors for all aspects of the development process.

How can we make use of AI technology without compromising user privacy?

Some customers have expressed concerns about data security in the context of AI, especially when it comes to cloud-based solutions that involve collecting and processing user data, and potentially storing that information on third-party hardware. We take this very seriously and offer solutions for different degrees of data sensitivity.

In some application areas, such as health care, it is absolutely crucial that users can be sure that private information about them is not inappropriately shared. In other cases, for example if the user is anonymous anyway (think of a robot guide in a museum), data security may not be that important. To achieve the right amount of data protection, the individual solutions must be selected and adapted to the requirements of the application.

Some AI techniques do not involve collecting data about the user at all. If user data is necessary, there are a number of approaches to protect it to differing degrees. Relevant data can be stored locally, anonymised, obfuscated, aggregated or encrypted (or a combination thereof), user profiles can be anonymous, and data can be safely discarded after use, to name just a few of the possibilities. If data protection is an issue, it must not be added as an afterthought, but be incorporated into the design of the application from the beginning. Depending on the concrete use case, SemVox is ready to discuss and devise the right strategy with potential customers and evaluate the trade-off between available functionality and data privacy.
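Two of the options mentioned above can be sketched with the Python standard library alone. The first function replaces a user identifier with a keyed hash, which strictly speaking is pseudonymisation rather than full anonymisation, since anyone holding the key can recompute the mapping. The second reduces individual events to counts and drops small groups. The function names, the event format and the group-size threshold are assumptions of this example, not a description of how SemVox products handle data.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymise(user_id: str, secret_key: bytes) -> str:
    """Replace a user id with a keyed SHA-256 hash (HMAC)."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def aggregate(events, min_group_size=2):
    """Count events per intent and drop groups that are too small."""
    counts = Counter(event["intent"] for event in events)
    return {
        intent: count
        for intent, count in counts.items()
        if count >= min_group_size
    }


events = [
    {"user": "alice", "intent": "navigate"},
    {"user": "bob", "intent": "navigate"},
    {"user": "alice", "intent": "call_doctor"},
]
summary = aggregate(events)
```

Here the single, potentially revealing `call_doctor` event does not appear in the summary, while the more common `navigate` intent is still available for improving the application.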