References

One promising approach to improving ASR performance is to use context knowledge about expected utterances. This information can greatly reduce the search space and lead to fewer missed recognitions. Oualil et al. [1] analyzed the benefits of using context information for pre-processing versus using it for post-recognition.
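As a rough illustration of the post-recognition use of context, the following minimal sketch picks, from an n-best list, the best-scoring hypothesis that the context considers plausible. It is not the method of [1]; the function name, the scores, the callsign, and the command strings are all hypothetical and chosen only for illustration.

```python
def pick_with_context(nbest, expected):
    """Return the best-scoring hypothesis that the context considers plausible.

    nbest:    list of (hypothesis, score) pairs; a higher score is more likely.
    expected: set of hypotheses predicted from context (hypothetical input).
    Falls back to the overall best hypothesis if none of them is expected.
    """
    if not nbest:
        raise ValueError("nbest must contain at least one hypothesis")
    ranked = sorted(nbest, key=lambda pair: pair[1], reverse=True)
    for hypothesis, _score in ranked:
        if hypothesis in expected:
            return hypothesis
    return ranked[0][0]


# Hypothetical example: the recognizer slightly prefers "18" over "80",
# but only "DESCEND 80" is plausible in the current traffic situation.
nbest = [("DLH123 DESCEND 18", 0.52), ("DLH123 DESCEND 80", 0.48)]
expected = {"DLH123 DESCEND 80", "DLH123 REDUCE 220"}
print(pick_with_context(nbest, expected))
```

Using context for pre-processing would instead restrict the recognizer's search space before decoding, which this sketch does not cover.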

Helmke et al. extend the use of context by generating it from an assistance system, i.e. an AMAN, to support ABSR [2]. In 2016, it was shown that ABSR significantly reduces controllers’ workload, which translates into reduced fuel burn and increased runway throughput. These results were quantified in [3] and [4], respectively. The MALORCA project aims at automatically adapting the speech recognition building blocks to different approach areas. Learning of command prediction, i.e. the relevant part of the assistant system, was described in [5]. Automatic adaptation results for the Vienna and Prague approach areas, obtained from 22 hours of controller-pilot speech recordings and the corresponding radar tracks, were presented in [6]. Command recognition error rates of the baseline system were reduced from 7.9% to below 0.6% for Prague and from 18.9% to 3.2% for Vienna. The building blocks and their adaptation to different approach areas were presented in [7]. Most recently, it was reported that no safety issues were observed: the controller detected all misrecognitions when speech recognition failed [8].
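The command recognition error rates quoted above can also be expressed as relative reductions. The short sketch below does this arithmetic; the function name is an assumption made for illustration, and because the Prague figure is reported only as "below 0.6%", the value computed for Prague is a lower bound.

```python
def relative_reduction(baseline_pct, adapted_pct):
    """Relative error-rate reduction in percent, given two error rates in percent."""
    if baseline_pct <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_pct - adapted_pct) / baseline_pct * 100.0


# Error rates as quoted in the text (baseline -> adapted system).
prague = relative_reduction(7.9, 0.6)   # lower bound, since the result was "below 0.6%"
vienna = relative_reduction(18.9, 3.2)

print(f"Prague: at least {prague:.1f}% relative reduction")
print(f"Vienna: {vienna:.1f}% relative reduction")
```

Under these figures, the relative reduction is at least about 92% for Prague and about 83% for Vienna.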

The MALORCA project continues exploring Statistical Language Models (SLMs). Even though SLMs have been shown to perform better than grammar-based models [9], the MALORCA project has raised a unique challenge that calls for combining both model types, as the initial amount of transcribed data is relatively small (< 4 hours). As this can lead to poor coverage of ATM commands, the MALORCA project alleviates this problem by leveraging the ICAO grammar and constructing a hybrid SLM from this grammar and an already trained SLM.

The grammar specifies a set of rules that define the correspondence of command words to ATM concepts (classes). These classes can then be used to build a class-based SLM, which has been shown to improve ASR performance. Intuitively, this class-based LM overcomes the lack of data by mapping everything to a class space. In this class space, correlations can be learned at the concept level, unlike in the regular SLMs used earlier [1]. The class-based LMs and regular SLMs are linearly interpolated [10] to produce the final hybrid SLM. Finally, this hybrid LM is converted to a first-pass decoding finite state transducer and employed in the ABSR pipeline; see [11] for details of AM and LM training in the MALORCA project.
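The idea of interpolating a class-based LM with a regular word-based LM can be sketched with toy bigram models. This is a minimal illustration under stated assumptions, not the MALORCA implementation: the class names, the probabilities, and the interpolation weight `lam` are hypothetical, and the class-based probability uses the common factorization P(class of word | class of previous word) × P(word | class of word).

```python
def class_based_prob(prev, word, word_to_class, class_bigram, word_given_class):
    """P_class(word | prev) = P(class(word) | class(prev)) * P(word | class(word))."""
    transition = class_bigram.get((word_to_class[prev], word_to_class[word]), 0.0)
    emission = word_given_class.get(word, 0.0)
    return transition * emission


def interpolate(p_word, p_class, lam):
    """Linear interpolation: lam * P_word + (1 - lam) * P_class."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_word + (1.0 - lam) * p_class


# Hypothetical toy models. The word-level LM has seen "three descend"
# but never "three climb"; the class-level LM knows that a COMMAND
# often follows a DIGIT (e.g. the last digit of a callsign).
word_to_class = {"three": "DIGIT", "descend": "COMMAND", "climb": "COMMAND"}
class_bigram = {("DIGIT", "COMMAND"): 0.8}
word_given_class = {"descend": 0.5, "climb": 0.5}
word_bigram = {("three", "descend"): 0.6}

p_word = word_bigram.get(("three", "climb"), 0.0)
p_class = class_based_prob("three", "climb", word_to_class,
                           class_bigram, word_given_class)
p_hybrid = interpolate(p_word, p_class, lam=0.5)
print(p_word, p_class, p_hybrid)
```

In this toy setting the unseen word pair receives zero probability from the word-level model alone, while the interpolated model still assigns it a non-zero probability through the class space. In practice the weight would be tuned on held-out data rather than fixed by hand.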

The problem with a pure grammar-based approach is that controllers often deviate from ICAO standard phraseology. Unique rules for annotation are needed to enable the exchange of data between different annotators or even between different approach areas. Such a solution cannot be provided by researchers alone, because it requires detailed and specific domain knowledge as well as a joint view on the problem. Only a broad industrial consortium is in a position to create an appropriate solution. Here, the structure of SESAR, with its exploratory and industrial research parts, plays an important role. As several partners of the MALORCA project take part in both the exploratory and the industrial research part of SESAR, they are in a position to bridge the gap between research and industry by aligning the work between the two parts and thereby speeding up the deployment of new technologies in ATM. The SESAR 2020 funded solution 16-04 agreed on an ontology, i.e. unique rules, for command transcription and annotation [12]. The main elements of the ontology, agreed by the 16-04 partners, are callsign and instruction. The 16-04 partners include Air Navigation Service Providers (ANS CR, Avinor, Austro Control, DFS, LFV, NATS, Romatsa), research institutes (CRIDA, DLR), the ATM supplier industry (Frequentis, Indra, and Thales), and Integra as ATM consultancy.


Recent Publications:

The recognition performance achieved in MALORCA for the Prague and Vienna approach areas is recapitulated, and new statistics obtained from various error analysis processes are presented. Results are detailed for different types of ATC commands, followed by the reasons for the performance drops. [13]

Speech recognition results and ATCos’ feedback on using a Commercial-Off-The-Shelf (COTS) speech recognizer from Nuance in combination with the Command Prediction and Checker components of DLR. [14]

Based on the experience with an ATC approach hypotheses generator, a prototypic tower command hypotheses generator (TCHG) was developed to face current and future challenges in the aerodrome environment. [15]

Reference to the predecessor projects AcListant® and AcListant®-Strips can be found here.

[1] Y. Oualil, M. Schulder, H. Helmke, A. Schmidt, and D. Klakow, “Real-Time Integration of Dynamic Context Information for Improving Automatic Speech Recognition,” Interspeech, Dresden, Germany, 2015.

[2] H. Helmke, J. Rataj, T. Mühlhausen, O. Ohneiser, H. Ehr, M. Kleinert, Y. Oualil, and M. Schulder, “Assistant-Based Speech Recognition for ATM Applications,” in 11th USA/Europe Air Traffic Management Research and Development Seminar (ATM2015), Lisbon, Portugal, 2015.

[3] H. Helmke, O. Ohneiser, Th. Mühlhausen, and M. Wies, “Reducing controller workload with automatic speech recognition”, in IEEE/AIAA 35th Digital Avionics Systems Conference (DASC). Sacramento, California, 2016.

© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

[4] H. Helmke, O. Ohneiser, J. Buxbaum, and Chr. Kern, “Increasing ATM efficiency with assistant-based speech recognition”, in 12th USA/Europe Air Traffic Management Research and Development Seminar (ATM2017). Seattle, Washington, 2017.

[5] M. Kleinert, H. Helmke, G. Siol, H. Ehr, M. Finke, A. Srinivasamurthy, and Y. Oualil, “Machine learning of controller command prediction models from recorded radar data and controller speech utterances,” 7th SESAR Innovation Days, Belgrade, 2017.

[6] M. Kleinert, H. Helmke, G. Siol, H. Ehr, A. Cerna, C. Kern, D. Klakow, P. Motlicek et al., “Semi-supervised Adaptation of Assistant Based Speech Recognition Models for different Approach Areas”, in IEEE/AIAA 37th Digital Avionics Systems Conference (DASC). London, England, 2018.

© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

[7] M. Kleinert, H. Helmke, H. Ehr, Chr. Kern, D. Klakow, P. Motlicek, M. Singh, and G. Siol, “Building Blocks of Assistant Based Speech Recognition for Air Traffic Management Applications”. 8th SESAR Innovation Days, Salzburg, 2018.

[8] M. Kleinert, H. Helmke, G. Siol, H. Ehr, D. Klakow, M. Singh, P. Motlicek, Chr. Kern, A. Cerna, and P. Hlousek, “Adaptation of Assistant Based Speech Recognition to New Domains and its Acceptance by Air Traffic Controllers”, in Proc. of the 2nd International Conference on Intelligent Human Systems Integration (IHSI 2019): Integrating People and Intelligent Systems, Feb. 2019, San Diego, California, USA.

[9] Y. Oualil, D. Klakow, G. Szaszák, A. Srinivasamurthy, H. Helmke, P. Motlicek, “Context-aware speech recognition and understanding system for air traffic control domain”, in IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2017), Okinawa, Japan, Dec. 2017, pp. 404-408.

[10] M. Singh, Y. Oualil, D. Klakow, “Approximated and domain-adapted LSTM language models for first-pass decoding in speech recognition”, in Proceedings of the 18th Annual Conference of the International Speech Communication Association (INTERSPEECH), Stockholm, Sweden, September 2017, pp. 2720-2724.

[11] A. Srinivasamurthy, P. Motlicek, I. Himawan, G. Szaszák, Y. Oualil, and H. Helmke, “Semi-supervised learning with semantic knowledge extraction for improved speech recognition in air traffic control,” in INTERSPEECH 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, Aug. 2017.

[12] H. Helmke, M. Slotty, M. Poiger, D. F. Herrer, O. Ohneiser et al., “Ontology for transcription of ATC speech commands of SESAR 2020 solution PJ.16-04,” in IEEE/AIAA 37th Digital Avionics Systems Conference (DASC). London, United Kingdom, 2018.

© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

[13] H. Helmke, M. Kleinert, J. Rataj, P. Motlicek, D. Klakow, C. Kern, and P. Hlousek, “Cost Reductions Enabled by Machine Learning in ATM – How can Automatic Speech Recognition enrich human operators’ performance?” in 13th USA/Europe Air Traffic Management Research and Development Seminar (ATM2019), Vienna, Austria, 2019.

[14] M. Kleinert, H. Helmke, S. Moos, P. Hlousek, C. Windisch, O. Ohneiser, H. Ehr, A. Labreuil, “Reducing Controller Workload by Automatic Speech Recognition Assisted Radar Label Maintenance”. 9th SESAR Innovation Days, Athens, 2019.

[15] O. Ohneiser, H. Helmke, M. Kleinert, G. Siol, H. Ehr, S. Hobein, A. Predescu, J. Bauer, “Tower Controller Command Prediction for Future Speech Recognition Applications”. 9th SESAR Innovation Days, Athens, 2019.