A speech synthesizer is a blind computer user’s main means of learning what is on the screen, and when no braille display is available as an alternative, it is generally the only one. Because support for tactile displays on the Android platform is still in its infancy, text-to-speech is all the more important for users with visual impairments.
At the time of this writing, several Russian speech synthesis solutions are available to Android users, differing in synthesis quality and each with its own set of drawbacks. We invite you to get to know them better and listen to how they sound.
Because Android does not let the user switch speech synthesizers “on the fly”, a single voice engine often has to read multilingual text, typically a mix of Russian and English. Even switching the interface entirely to English does not solve the problem: to read Russian-language web pages or messages, you still have to switch to a Russian speech synthesizer by going all the way through the menus. In addition, working through a speech synthesizer is generally slower than reading the screen visually, so many experienced blind users set TTS to its maximum reading speed to compensate.
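To make the mixed-script problem concrete, here is a minimal Python sketch that splits a string into Cyrillic and Latin runs, which is the kind of segmentation that would be needed if each run were handed to a different voice. It is only an illustration: the function names `script_of` and `split_by_script` are our own, and none of the engines reviewed here is known to work this way.

```python
from typing import List, Tuple


def script_of(ch: str) -> str:
    """Classify a character as 'cyrillic', 'latin', or 'neutral'."""
    if "\u0400" <= ch <= "\u04FF":      # basic Unicode Cyrillic block
        return "cyrillic"
    if ch.isascii() and ch.isalpha():   # unaccented Latin letters only
        return "latin"
    return "neutral"                    # digits, spaces, punctuation, etc.


def split_by_script(text: str) -> List[Tuple[str, str]]:
    """Split text into (script, fragment) runs.

    Neutral characters are attached to the run they follow; leading
    neutral characters adopt the script of the first letter after them.
    """
    runs: List[List[str]] = []
    for ch in text:
        script = script_of(ch)
        if not runs:
            runs.append([script, ch])
        elif script == "neutral" or script == runs[-1][0]:
            runs[-1][1] += ch
        elif runs[-1][0] == "neutral":
            runs[-1][0] = script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, fragment) for script, fragment in runs]


for script, fragment in split_by_script("Сайт www.tiflocomp.ru открыт"):
    print(script, repr(fragment))
```

With a single engine, all three fragments in this example go to the same voice, which is why the way each synthesizer handles Latin text matters so much in practice.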
For this reason, this review looks not only at the quality of speech synthesis as such, but also at how each engine reads Latin-script text and how fast it can speak, both of which matter to people who use text-to-speech with a screen reader.
At the end of the review you will find a summary table rating the existing solutions on three key parameters: synthesis quality, speech speed, and support for reading Latin text.
Acapela TTS Voices
Acapela Group has ported several dozen of its speech synthesizers to Android, among them the well-known Russian voice Alena. In terms of synthesis quality, this commercial solution is at a fairly high level, but it has a number of shortcomings. In fairness, the product was still in beta at the time of writing.
First, the engine as a whole is extremely unstable and prone to errors on all supported Android versions, and after an error it has to be restarted.
Second, Alena shares a common flaw: it does not pronounce standalone Russian letters that have no sound of their own, such as the soft and hard signs.
Third, the voice tends to swallow the endings of phrases, especially at the boundary between Cyrillic and Latin text, as the example above shows.
On the positive side, this is a high-quality voice that is suitable not only for one-off tasks such as voicing a text, but also for everyday work with a screen reader: it responds faster than the SVOX engines and does not share their problems with reading standalone Latin letters.
Latin-script text is read according to the rules of English, although the pronunciation is extremely far from correct.
Alena’s maximum speed is not very high, so those who like to work fast will most likely be disappointed.
To get started, download the general Acapela TTS Voices engine from the Play Market and select the voice you want from its menu. In the menu that opens, press the “Buy” button and complete the standard purchase procedure, then reopen that voice’s menu and press the “Download” button to load the synthesizer.
System requirements: Android 2.2 or higher.
Price: 3.30 EUR.
Captin TTS Engine
Anatoly Kamynin has ported the Captain speech synthesizer, widely known in narrow circles, to the Android operating system.
At the time of this writing the product is in public testing, but some people already use it as their main system voice.
As the audio recording shows, the voice is highly intelligible but sounds unnatural, which makes it hard for an unaccustomed listener to understand.
Among its shortcomings and quirks, the following stands out.
On some devices, the synthesizer duplicates the first character of a text segment, which creates a slight stuttering effect at the beginning of phrases.
On the positive side, its response speed is record-setting, surpassing all existing alternatives, and it offers more flexible configuration, in particular a multi-level setting for how much detail to give when reading non-alphabetic characters, ranging from rare punctuation characters to every space character.
The synthesizer reads Latin-script text strictly according to the rules of Latin, which, if you know those reading rules, lets you work with almost any language written in the Latin script.
In speech speed, Captain is also among the leaders, delivering some of the highest rates.
System requirements: Android 2.2 to 3.0.
System requirements: Android 4.0 or higher.
eSpeak TTS
The Eyes-Free Project community, one of the main developers of specialized software for Android, has ported the well-known non-commercial eSpeak speech synthesizer to this operating system.
As the demonstration shows, its Russian speech has a number of significant shortcomings.
First, in Russian text the synthesizer does not read uppercase characters, as happened in the recording above with the words “Hello, Me, My, Details”.
Second, eSpeak splits each string of Cyrillic characters it receives into many small fragments of a few letters or even a single letter, which explains its ragged speech. This is especially noticeable when a word ends in a soft sign, which is almost always read separately.
Third, the overall quality of the audio signal is rather low, which is especially noticeable with headphones.
On the positive side, it responds slightly faster than its counterparts and, above all, it is free.
This synthesizer reads Latin-script text according to the rules of English, with acceptable quality.
As for maximum speech speed, eSpeak unfortunately cannot offer high rates, so do not expect much from it in this respect.
System requirements: Android 2.2 or higher.
Milena in the Mobile Accessibility RU package
This speech synthesizer from Vocalizer is not a universal voice engine that plugs into the Android system TTS service, but a built-in component of the Russian-language Mobile Accessibility screen access suite.
As a result, this voice cannot be used by any program other than the Russian localization of the Code Factory product.
The synthesizer has fairly high sound quality and a decent response speed, although thorough testing of the latter is complicated by the engine being built into a specific application.
Overall, this is a very high-quality speech synthesis solution, known from many other platforms, but it also has some drawbacks.
First, because Milena is embedded in the Code Factory product, the user can work with it either inside the Mobile Accessibility environment or in the Android system at large, but in the latter case only through the MA screen reader.
Second, Milena’s voice has a peculiar pronunciation of some letters in certain combinations with their neighbors, for example the letter “ч”, as in the Russian word for “marked”, which comes up often when interfaces are read aloud.
On the positive side, because the synthesizer is built into the Mobile Accessibility package, the user gets finer settings than the Android voice service provides, such as adjusting how punctuation marks are read or enabling phonetic reading of symbols.
Milena reads Latin-script text according to the rules of English; the general rules are roughly followed, but the pronunciation is often rather poor.
In terms of speech speed, this is one of the fastest synthesizers.
Cost: 69 EUR.
Link in the Play Market for a 30-day trial version …
Link to the full version in the Play Market …
SVOX Classic TTS
This solution offers two commercial Russian voices from SVOX.
They are a female voice named Katya and a male voice named Yuri, which, being closely related, share similar advantages and disadvantages.
The only differences worth noting are the richer low frequencies of Yuri’s voice and, in our opinion, his more correct intonation. That said, Yuri has subtle defects when pronouncing hissing consonants, audible, for example, in the letter “ч” in the Russian word for “dot”.
As for their shared characteristics, the demo files show that the synthesis quality is quite high and the text is understood without much difficulty. However, these voices also have a number of disadvantages.
First, the synthesizer often reads fragments of text that mix letters and non-alphabetic characters symbol by symbol rather than as a whole, as happened with the link fragment “www.tiflocomp.ru”. The same thing happens with e-mail addresses and with ordinary text in which spacing rules are not strictly observed, for example in SMS messages.
Second, when text is being entered, the synthesizer reads the letters I, V, X, L, C, D and M as Roman numerals, which is extremely inconvenient for a blind user who relies on TTS not only to read books but for absolutely all work. Moreover, the reading of Roman numerals itself contains errors; for example, MI is read as the number 101 rather than 1001.
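For reference, the conventional way to evaluate a Roman numeral can be sketched in a few lines of Python. This is the standard subtractive rule, not SVOX’s implementation, and the function name `roman_to_int` is our own; the sketch does no validation, so malformed numerals are not rejected.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral to an integer using the subtractive rule.

    A symbol is subtracted when a larger symbol follows it (as in IV or XC)
    and added otherwise. Input is assumed to be a well-formed numeral.
    """
    total = 0
    for i, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        # Value of the following symbol, or 0 at the end of the string.
        next_value = ROMAN_VALUES[numeral[i + 1]] if i + 1 < len(numeral) else 0
        total += -value if value < next_value else value
    return total


print(roman_to_int("MI"))   # M (1000) followed by I (1)
print(roman_to_int("CI"))   # C (100) followed by I (1)
```

Under this rule MI evaluates to 1001, while 101 is the value of CI.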
Third, the engine’s text processor does not handle Arabic numerals correctly and misreads many numbers. This is most noticeable with decimal fractions.
Fourth, the synthesizer as a whole responds more slowly than many of its counterparts, although the difference is extremely small.
Unfortunately, when it comes to reading Latin-script text, the SVOX voices cannot boast good intelligibility.
As the demo recording shows, SVOX generally reads Latin-script text according to the rules of Latin, but in places it tries to pronounce it in the English manner; for example, the word “voice” is read as “voike”, not “voice”.
Their maximum speed is not particularly high either.
System requirements: SVOX Classic Text To Speech Engine and Android 2.1 or higher.
Cost: 2.99 USD (for each voice separately).
Link in the Play Market for a 14-day trial version of Katja …
Link in the Play Market to the 14-day trial version of Yuri …
Link to the full version of Katja in the Play Market …
Link to the full version of Yuri in the Play Market …
TTS Online
This synthesizer, developed by Sergei Nechiporenko and distributed free of charge, is a cloud service built on the Google TTS API. In essence, the program does no speech synthesis of its own: it plugs into the Android voice service, sends the text to a Google server, and plays back the audio it receives.
As a result, the synthesizer needs an active Internet connection with a reasonably high data transfer rate in order to work.
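The proxy arrangement described here can be sketched roughly as follows. This is a hypothetical Python illustration, not the actual TTS Online code: the class `CloudProxySynthesizer` and the injected `fetch_audio` and `play` callables are our own names, and the real network request and Android audio playback are deliberately left out.

```python
from typing import Callable


class CloudProxySynthesizer:
    """A 'synthesizer' that delegates all synthesis to a remote service."""

    def __init__(
        self,
        fetch_audio: Callable[[str], bytes],  # hypothetical: text in, audio bytes out
        play: Callable[[bytes], None],        # hypothetical: plays audio on the device
    ) -> None:
        self._fetch_audio = fetch_audio
        self._play = play

    def speak(self, text: str) -> bool:
        """Send text to the server and play the reply.

        Returns False when the server cannot be reached, in which case
        nothing is spoken at all.
        """
        try:
            audio = self._fetch_audio(text)
        except ConnectionError:
            return False
        self._play(audio)
        return True


# Usage with stand-ins for the network and the audio output.
played = []
online = CloudProxySynthesizer(lambda text: text.encode("utf-8"), played.append)
print(online.speak("Привет"))


def no_network(text: str) -> bytes:
    raise ConnectionError("no Internet connection")


offline = CloudProxySynthesizer(no_network, played.append)
print(offline.speak("Привет"))
```

Because every phrase makes a round trip to the server, both availability and response time depend entirely on the connection.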
As the demo audio recording shows, Google TTS uses the already familiar Katya voice from SVOX, so it has all the same characteristics described in the SVOX Classic TTS section.
The one thing to note separately is that TTS Online does not support adjusting the speech rate, so the user has to put up with its leisurely pace.
In addition, at the time of this writing TTS Online does not work under Android 4.0.x Ice Cream Sandwich, and according to the developer there are no plans to change this in the foreseeable future.
System requirements: Android 2.2 to 3.x.
Below is a summary table of the existing Russian speech synthesizers for Android, covering voice quality, the rules each applies when reading Latin text, and maximum speech speed.
| Synthesizer | Voice quality | Reading Latin | Maximum speech rate |
| Acapela TTS Voices | Very high | According to the rules of the English language | Average |
| Captin TTS Engine | Low | According to the rules of Latin | High |
| eSpeak TTS | Very low | According to the rules of the English language | Low |
| Milena from Mobile Accessibility RU | High | According to the rules of the English language | High |
| SVOX Classic TTS | High | According to the rules of Latin with distortion | Average |
| TTS Online | High | According to the rules of Latin with distortion | Very low |