Design and Implementation of Human-machine Voice Interaction System in Car Navigation

As a natural human-machine interface, voice can make car navigation systems safer and more user-friendly to operate. A comparison of the functions of car navigation systems at home and abroad shows that support for voice interaction is a development trend. Survey data from the market research firm J.D. Power and Associates likewise shows that 56% of consumers prefer voice-activated navigation systems, so developing an in-vehicle voice navigation system is worthwhile. China already has the technical foundation for such systems: text-to-speech (TTS) conversion and voice command recognition based on small and medium vocabularies have both reached a practical level. Building on the research group's car navigation system and two domestic voice engines, this paper develops a car navigation system that supports voice interaction.

Car voice navigation system structure

Functionally, the car voice navigation system divides into car navigation and navigation voice interaction. The car navigation functions include GPS satellite positioning, electronic map browsing and query, intelligent path planning, and real-time display of navigation information such as vehicle location and speed; the navigation voice interaction functions divide into voice operation and voice prompts. Based on the needs of human-computer interaction, the hardware framework designed for the voice navigation system is shown in Figure 1.

Hardware framework of voice navigation system

The human-machine interface between the voice navigation system and the user consists of five interactive devices: touch screen, buttons, microphone, display screen, and loudspeaker. This hardware framework supports both the conventional manual interaction mode and the voice interaction mode. The whole system comprises three subsystems, the navigation subsystem, the speech recognition subsystem, and the speech synthesis subsystem, which communicate through interfaces to coordinate the voice navigation tasks.

Design of Dialogue Mode of Man-machine Voice Interaction System for Vehicle Navigation

Navigation system state transition network

The entire navigation system is a complex human-computer interaction system. To facilitate the design of the voice dialogue mode, the system is first divided into states, and the state transition network of the whole system is then described from the perspective of human-computer interaction. The system is divided into six functional states, such as map browsing and function selection, plus one exit state. Figure 2 shows the state transition network among these states.

State transition network

The nodes in the figure represent the states of the system, and the arrowed lines represent transitions from a source state to a target state. The network treats a user operation as a driving event that completes the transition from one state to another; a path through the network represents one specific interaction process.
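The event-driven network just described can be sketched as a simple lookup structure; the state and event names below are illustrative, not taken from the actual system:

```python
# Sketch of a state transition network: states are nodes, user operations
# are driving events, and each (source state, event) pair maps to a target
# state. Unknown events leave the current state unchanged.

class StateTransitionNetwork:
    def __init__(self, initial_state):
        self.state = initial_state
        self.transitions = {}  # (source_state, event) -> target_state

    def add_transition(self, source, event, target):
        self.transitions[(source, event)] = target

    def fire(self, event):
        """Drive the network with a user operation; return the new state."""
        key = (self.state, event)
        if key in self.transitions:
            self.state = self.transitions[key]
        return self.state

stn = StateTransitionNetwork("map_browsing")
stn.add_transition("map_browsing", "open_menu", "function_selection")
stn.add_transition("function_selection", "plan_route", "path_planning")
stn.add_transition("function_selection", "back", "map_browsing")

stn.fire("open_menu")    # moves to function_selection
stn.fire("plan_route")   # moves to path_planning
```

A sequence of `fire` calls corresponds to one path through the network, i.e. one interaction process.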

The Design of Dialogue Mode of Various State Nodes in Navigation System

To simplify the description of the internal dialogue mode of each state node, the state nodes are numbered S1 to S7 as shown in Figure 2, and Tmn denotes the transition from state node Sm to state node Sn. In addition, referring to the representation used in Stateflow models, a dialogue model for describing the human-machine voice interaction system of car navigation is proposed. The description of a transition within a state node is redefined with four attributes:

t = {P1, P2, P3, P4} (1)

Here t denotes a transition and P1 to P4 are its attributes: P1 is the voice event, P2 is the voice output, P3 is the additional condition, and P4 is the transition action.

In this way, a transition t describes, for one round of dialogue, the user's voice input, the system's voice output, the conditions restricting the dialogue, and the actions performed by the system.
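The four-attribute transition of Equation (1) can be sketched as a small data structure; the example event, prompt, and sub-state names are illustrative assumptions, not from the paper:

```python
from dataclasses import dataclass
from typing import Callable

# Sketch of the four-attribute transition t = {P1, P2, P3, P4}:
# P1 voice event, P2 voice output, P3 additional condition, P4 action.

@dataclass
class Transition:
    voice_event: str                    # P1: recognized user utterance
    voice_output: str                   # P2: prompt the system speaks back
    condition: Callable[[dict], bool]   # P3: extra guard on system context
    action: Callable[[dict], None]      # P4: action performed on firing

    def try_fire(self, event, context):
        """Fire only if the event matches and the condition holds;
        return the voice output the system should speak, else None."""
        if event == self.voice_event and self.condition(context):
            self.action(context)
            return self.voice_output
        return None

# Example: a "zoom in" command handled only in the map roaming sub-state.
t = Transition(
    voice_event="zoom in",
    voice_output="Zooming in",
    condition=lambda ctx: ctx["sub_state"] == "map_roaming",
    action=lambda ctx: ctx.update(zoom=ctx["zoom"] + 1),
)
ctx = {"sub_state": "map_roaming", "zoom": 3}
t.try_fire("zoom in", ctx)   # returns "Zooming in"; zoom becomes 4
```

The condition attribute (P3) is what later allows one state node, such as map browsing, to branch between its sub-states.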

Take the map browsing state as an example to illustrate how the dialogue mode is designed. The map browsing state consists of two mutually exclusive sub-states: the map roaming state and the vehicle guidance state (see Figure 2). Because the human-computer interactions in these two sub-states are mostly the same, they are unified under the map browsing state. For interactions that must distinguish the two sub-states, an additional condition can test which sub-state is current and branch accordingly. The dialogue mode design of the map browsing state node is shown in Figure 3.

The Design of Dialogue Mode of Map Browsing State Node

Implementation of voice control commands

The implementation scheme for voice control commands is shown in Figure 4. The left box in the figure represents the state transition network (STN) of the dialogue mode of the entire voice navigation system. Following the dialogue mode design, the system is divided into seven state nodes, such as the map browsing, function selection, and path planning states. Each state node has its own voice dialogue mode, composed of several internal transitions. The entire voice navigation system is therefore a two-layer state transition network whose internal transitions are driven by voice events. A voice event is generated by the interface module of the navigation subsystem according to the user's intention sent from the voice recognition subsystem.

Implementation Scheme of Voice Control Command

The process of implementing voice control commands comprises four steps:

1. The voice recognition engine recognizes the user's voice against the current command vocabulary and obtains a recognition result.

2. The management window obtains the recognition result, looks up the corresponding control command in the "recognition word-control command" mapping, and sends the control command to the interface module of the navigation subsystem as the user's intention.

3. The interface module responds to the user's intention and changes the state of the voice navigation system through voice events.

4. The interface module judges from the new state of the voice navigation system whether the current command vocabulary needs to change and, if so, changes it through the management window.
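The steps above can be sketched as follows; all state names, recognition words, and command identifiers here are illustrative assumptions, not taken from the actual system:

```python
# Sketch of the voice-command path: the recognition result is mapped
# through the "recognition word-control command" table, the resulting
# command changes the system state as a voice event, and the command
# vocabulary is switched when the state changes.

WORD_TO_COMMAND = {              # "recognition word-control command" mapping
    "zoom in": "CMD_ZOOM_IN",
    "plan route": "CMD_PLAN_ROUTE",
}

# Each state node has its own active command vocabulary (step 4).
STATE_VOCABULARY = {
    "map_browsing": ["zoom in", "plan route"],
    "path_planning": ["confirm", "cancel"],
}

def handle_recognition(result, state):
    """Steps 2-4: map the recognized word to a command, fire the voice
    event, and return the (possibly changed) state and its vocabulary."""
    command = WORD_TO_COMMAND.get(result)      # step 2: look up the command
    if command == "CMD_PLAN_ROUTE":            # step 3: voice event changes
        state = "path_planning"                #         the system state
    return state, STATE_VOCABULARY[state]      # step 4: current vocabulary

state, vocab = handle_recognition("plan route", "map_browsing")
# state is now "path_planning", and vocab is that state's command vocabulary
```

Switching the vocabulary per state keeps the candidate set small, which is what allows a small-vocabulary engine to stay accurate.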

Recognition method of POI name

Besides control commands, the recognition subsystem must also recognize POI (point of interest) names. The biggest difference between POI name recognition and control command recognition is the size of the candidate set. In this system, the candidate set for control command recognition holds at most about 30 entries, whereas for POI name recognition, taking the Beijing electronic map used here as an example, there are 20,172 POI points; the candidate set is thus several orders of magnitude larger than for control command recognition.

When the command word recognition engine is used, it must be given a current vocabulary: the entries in the candidate set have to be converted into a vocabulary before recognition can take place. Moreover, an ASR engine based on small and medium vocabularies cannot generate a vocabulary of more than 20,000 entries, so POI name recognition adopts a scheme different from control command recognition. For control commands, the candidate set can be represented by a single vocabulary, so an online recognition method is used. For POI names, a single vocabulary cannot hold all the candidates, so an offline traversal recognition scheme based on the engine's offline recognition function is proposed, describing the entire candidate set with multiple vocabularies. The specific process is shown in Figure 5.

specific process

This scheme divides the candidate POI set into n subsets and generates a vocabulary for each subset. Each vocabulary in turn serves as the current vocabulary for offline recognition, and the local recognition results are gathered into a temporary vocabulary; recognition over this temporary vocabulary yields the globally optimal result. Because the process traverses every subset, it is equivalent to matching the optimal result over the entire candidate set, so recognition accuracy is preserved. The cost is that recognition time grows with the number of recognition passes.
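The traversal scheme can be sketched as below. The `recognize` function is a stand-in for the engine's offline command-word recognition, and the toy prefix-based scorer and sample POI names are purely illustrative:

```python
# Sketch of offline traversal recognition: split the candidate POI set
# into subsets, run local recognition on each subset's vocabulary, gather
# the local winners into a temporary vocabulary, then recognize once more
# over it to obtain the globally best match.

def recognize(utterance, vocabulary, score):
    """Placeholder for the engine: best-scoring entry in the vocabulary."""
    return max(vocabulary, key=lambda entry: score(utterance, entry))

def traverse_recognize(utterance, poi_names, subset_size, score):
    temp_vocabulary = []
    for i in range(0, len(poi_names), subset_size):   # traverse each subset
        subset = poi_names[i:i + subset_size]
        temp_vocabulary.append(recognize(utterance, subset, score))
    # Global pass over the local winners.
    return recognize(utterance, temp_vocabulary, score)

# Toy scoring function: length of the common prefix with the utterance.
def score(utterance, entry):
    n = 0
    for a, b in zip(utterance, entry):
        if a != b:
            break
        n += 1
    return n

pois = ["Beijing Hotel", "Beijing Station", "Summer Palace", "Beihai Park"]
best = traverse_recognize("Beijing Sta", pois, 2, score)  # "Beijing Station"
```

Because every subset is visited, the final pass compares the best candidate from each part of the set, which is why accuracy matches a single large-vocabulary search while each individual pass stays within the engine's vocabulary limit.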

Implementation scheme of voice prompt of navigation system

Voice prompts in the navigation system are produced by a dedicated speech synthesis subsystem. Implementing a voice prompt takes two steps: making a request and executing the request. The requester and executor form a client/server (C/S) model in which the speech synthesis subsystem acts as the server. Since a speech synthesis engine usually cannot output several synthesized speech streams at once, synthesis requests can conflict. When a conflict occurs, the most direct strategies are either to abort the ongoing synthesis and start the new one, or to keep the ongoing synthesis and ignore the new request. A management module is therefore designed into the speech synthesis subsystem to decide how each synthesis conflict is handled.

For the speech synthesis subsystem, a synthesis request is a random event, denoted Qi. Each synthesis request Qi carries a priority attribute whose value depends on the importance of the requested prompt information, as shown in Table 1. The processing flow of the management module is shown in Figure 6: if the priority of the next request Qi+1 is higher than that of the current request Qi, then Qi+1 is synthesized first.

Management module processing flow
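The management module's conflict rule can be sketched as follows; the prompt texts and the convention that a larger number means higher priority are illustrative assumptions, not the paper's Table 1:

```python
# Sketch of the synthesis conflict rule: a new request pre-empts the one
# being synthesized only if its priority is strictly higher; otherwise
# the new request is ignored and the current synthesis continues.

class SynthesisManager:
    def __init__(self):
        self.current = None   # (text, priority) being synthesized, or None

    def request(self, text, priority):
        """Handle a synthesis request; return the text now being spoken."""
        if self.current is None or priority > self.current[1]:
            self.current = (text, priority)   # abort and pre-empt
        return self.current[0]                # else keep current synthesis

mgr = SynthesisManager()
mgr.request("Approaching destination", 1)   # starts synthesis
mgr.request("Battery low", 0)               # ignored: lower priority
mgr.request("Turn left in 50 meters", 2)    # pre-empts current synthesis
```

This single comparison implements both of the direct strategies mentioned above, selecting between them per request according to priority.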

Test verification of in-vehicle voice navigation system

Figure 7 is a photograph of the car voice navigation system built in this work. A verification test of voice navigation was carried out on this system, and the car navigation functions listed in Table 2 were completed through voice interaction. The tests show that the system's states transition completely and correctly according to the designed dialogue mode, that the human-machine dialogue for each navigation function completes correctly, and that the system's voice prompts also work correctly.

Photograph of the car voice navigation system

In addition, the system's ability to respond correctly to voice control commands was tested. All 49 recognition words of the voice control commands in the map browsing state were tested with clear, fluent speech, three times each: 49 × 3 = 147 trials, of which 132 succeeded and 15 failed, a success rate of 89.8%. The system therefore responds to voice control commands effectively.

In the test of large-scale POI name recognition, POI names of 2 to 10 characters were tested, with 10 names of each length. Each POI name was tested at most twice; a second test was run only if the first failed. The test results are shown in Table 3.

The first-pass recognition accuracy of the offline traversal scheme is 86.7%, and the accuracy after a second pass is 93.3%. The average time for a correct recognition ranges from 6.1 s to 10.4 s, and the average weighted by the statistical distribution of POI name lengths is 8.3 s. These figures show that the scheme can use a small-vocabulary keyword recognition engine to recognize a large vocabulary of POI names with satisfactory accuracy, though at the cost of a long recognition time.

Conclusion

This paper has completed the design and implementation of a human-machine voice interaction system for car navigation, together with experimental verification of the system in a laboratory environment.

The experiments show that synthesized speech can provide rich and flexible voice prompts, allowing users to operate the navigation system without diverting too much attention. Further work will focus on improving recognition accuracy and reducing the average time for a correct recognition.
