zyint's blog

Text-To-Speech Interface

진트 2009. 2. 28. 07:59

Text-to-Speech Overview

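The core of the TTS API is the ISpVoice interface. Through it an application can set the text to speak, adjust voice characteristics, switch voices, and handle real-time events while speech is in progress. In fact, for most applications this one interface is enough for basic TTS functionality.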

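An application gets access to the ISpVoice interface by creating a COM object. The ISpVoice object exposes the various TTS voices through a single interface, and each ISpVoice object represents one voice. Two different ISpVoice objects can select the same voice, and each can then be modified independently of the other.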

Speaking

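When an application creates an ISpVoice object, the object initializes itself with the default voice. In other words, a newly created object is ready to speak text immediately, with no separate initialization step. The application can then call Speak or SpeakStream to have the voice speak any Unicode text.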

Speak	Speaks a text string or file.
SpeakStream	Speaks a text stream or plays an audio (WAV) stream.
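
As a rough sketch of the steps just described (create the COM object, then call Speak), the following C++ wraps them in one function. The name speak_text is made up for this illustration. The code assumes the Windows SDK headers and linking against ole32.lib and sapi.lib; on other platforms it compiles to a stub that returns false.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <sapi.h>   // ISpVoice, CLSID_SpVoice (Windows SDK)
#endif

// Hypothetical helper: speaks `text` synchronously with the default voice.
// Returns true on success. SAPI is unavailable on non-Windows platforms,
// so there the function is a stub that always returns false.
bool speak_text(const wchar_t* text) {
#ifdef _WIN32
    if (FAILED(::CoInitialize(NULL))) {
        return false;
    }
    ISpVoice* voice = NULL;
    HRESULT hr = ::CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL,
                                    IID_ISpVoice,
                                    reinterpret_cast<void**>(&voice));
    if (SUCCEEDED(hr)) {
        // The new object already has the default voice selected.
        // SPF_DEFAULT: the call blocks until speaking has finished.
        hr = voice->Speak(text, SPF_DEFAULT, NULL);
        voice->Release();
    }
    ::CoUninitialize();
    return SUCCEEDED(hr);
#else
    (void)text;  // unused in the stub
    return false;
#endif
}
```

On Windows, a call such as speak_text(L"Hello world") should be all that is needed, since no voice has to be selected first.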

Synchronous vs. Asynchronous Speaking

Speak and SpeakStream can run synchronously (the call does not return until the text has been completely spoken) or asynchronously (the call returns immediately while the text continues to be spoken in the background). Speaking asynchronously lets the application do other work while the text is spoken, such as highlighting the text being spoken, driving an animation, or monitoring controls. Otherwise, the simpler synchronous mode is enough.

Getting Status Information

When speaking asynchronously, the application can obtain the current status (text position, speech done state, bookmarks, and so on).

There are two ways to do this:

Call the GetStatus method of the ISpVoice object periodically.

GetStatus	Returns current speech and event status information.
WaitUntilDone	Delays until either the voice has completed speaking or the specified time interval has elapsed.
SpeakCompleteEvent	Returns an event handle that will be signaled when speech is done.

Set up the ISpVoice object for real-time event handling, so that the application is notified as soon as an event occurs (Real-time Event Management, inherited from ISpEventSource).

SetInterest	Sets the type of events to queue.
GetEvents	Returns the queued events.
GetInfo	Returns information about the event queue.
SetNotifySink	Sets up the instance to make free-threaded calls through ISpNotifySink::Notify.
SetNotifyWindowMessage	Sets a window handle to receive notifications as window messages.
SetNotifyCallbackFunction	Sets a callback function to receive notifications.
SetNotifyCallbackInterface	Sets an object derived from ISpNotifyCallback to receive notifications.
SetNotifyWin32Event	Sets up a Win32 event object to be used by this instance for notifications.
WaitForNotifyEvent	A blocking call which waits for a notification.
GetNotifyEventHandle	Retrieves Win32 event handle associated with this notify source.
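
The first approach, polling, might look like the sketch below. The helper names current_word and speak_and_poll are invented for this example. It assumes COM is already initialized and that the caller passes a valid ISpVoice pointer. The SAPI part is compiled only on Windows, while the word-extraction helper is portable.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#include <sapi.h>
#include <cstdio>
#endif

// Hypothetical helper: cuts out the word currently being spoken, given the
// character offset and length that SPVOICESTATUS reports for the input text.
std::wstring current_word(const std::wstring& text,
                          unsigned long pos, unsigned long len) {
    if (pos >= text.size()) {
        return std::wstring();  // offset outside the text: nothing to show
    }
    return text.substr(pos, len);
}

#ifdef _WIN32
// Starts speaking asynchronously, then polls GetStatus about every 100 ms.
// Assumes COM is initialized and `voice` is a valid ISpVoice pointer.
HRESULT speak_and_poll(ISpVoice* voice, const std::wstring& text) {
    HRESULT hr = voice->Speak(text.c_str(), SPF_ASYNC, NULL);  // returns at once
    if (FAILED(hr)) {
        return hr;
    }
    // WaitUntilDone returns S_FALSE on timeout and S_OK once speech is done.
    while ((hr = voice->WaitUntilDone(100)) == S_FALSE) {
        SPVOICESTATUS status = {};
        if (SUCCEEDED(voice->GetStatus(&status, NULL))) {
            std::wprintf(L"speaking: %ls\n",
                         current_word(text, status.ulInputWordPos,
                                      status.ulInputWordLen).c_str());
        }
    }
    return hr;
}
#endif
```

Polling is simple but only samples the state at intervals, so short words can be missed. The event-based approach avoids this at the cost of more setup.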

Flow Control

Most TTS applications let the user suspend and continue speech synthesis using the Pause and Resume methods.

Pause	Pauses the output speech at the nearest alert boundary.
Resume	Resumes speaking.
Skip	Skips ahead or backward to a new input text position while speaking.

Modifying Voice Attributes

Sometimes the default TTS settings need to be changed.

There are two ways to do this. Typically, the API functions are used as global settings that affect the speech independently of the currently selected voice or the document being spoken, while the XML tags are usually used in a much narrower scope, affecting only the speaking style within a single document.

Calling the ISpVoice API functions:

SetRate	Sets the speaking rate in real time.
GetRate	Returns the current speaking rate.
SetVolume	Sets the speech volume level in real time.
GetVolume	Returns the current speech volume level.
SetVoice	Sets the identity of the voice used for synthesis.
GetVoice	Retrieves the object token that identifies the current voice.

Embedding special Extensible Markup Language (XML) tags in the text, so that the settings are changed from within the text itself.
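
The two approaches could be sketched side by side as follows. All function names here are hypothetical. The value ranges (rate from -10 to 10, volume from 0 to 100) and the volume and rate tags come from the SAPI 5 documentation as I understand it, so treat them as assumptions to verify. The SAPI calls are compiled only on Windows; the helpers are portable.

```cpp
#include <algorithm>
#include <string>
#ifdef _WIN32
#include <windows.h>
#include <sapi.h>
#endif

// Assumed valid ranges: SetRate takes -10..10, SetVolume takes 0..100.
long clamp_rate(long rate) {
    return std::max(-10L, std::min(10L, rate));
}
unsigned short clamp_volume(long volume) {
    return static_cast<unsigned short>(std::max(0L, std::min(100L, volume)));
}

// Wraps `text` in SAPI XML tags that set the volume and absolute rate for
// this text only. The text is inserted verbatim, so the caller must make
// sure it contains no stray markup characters.
std::wstring with_rate_and_volume(const std::wstring& text,
                                  long rate, long volume) {
    return L"<volume level=\"" + std::to_wstring(clamp_volume(volume)) + L"\">"
         + L"<rate absspeed=\"" + std::to_wstring(clamp_rate(rate)) + L"\">"
         + text + L"</rate></volume>";
}

#ifdef _WIN32
// Approach 1: global settings on the voice object itself.
HRESULT apply_globally(ISpVoice* voice, long rate, long volume) {
    HRESULT hr = voice->SetRate(clamp_rate(rate));
    if (FAILED(hr)) {
        return hr;
    }
    return voice->SetVolume(clamp_volume(volume));
}

// Approach 2: settings embedded in the spoken text as XML tags.
HRESULT speak_marked_up(ISpVoice* voice, const std::wstring& text,
                        long rate, long volume) {
    const std::wstring xml = with_rate_and_volume(text, rate, volume);
    return voice->Speak(xml.c_str(), SPF_IS_XML, NULL);
}
#endif
```

With the first function the settings stay in effect for everything the voice says afterwards, whereas the XML tags apply only to the text they enclose.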

Audio Output

Most desktop applications never configure audio output, but SAPI TTS lets the application choose whether audio is sent to the sound card, to a memory buffer, or to some other hardware. ISpVoice provides the following Audio Output Control methods.

SetOutput	Sets the current output object. A value of NULL may be used to select the default audio device.
GetOutputStream	Retrieves a pointer to the current output stream.
GetOutputObjectToken	Retrieves the object token for the current output object.

Miscellaneous

SetPriority	Sets the priority for the voice.
GetPriority	Retrieves the current voice priority level.
SetAlertBoundary	Specifies which event should be used as the insertion point for alerts.
GetAlertBoundary	Retrieves the event that is currently being used as the insertion point for alerts.
IsUISupported	Determines if the specified type of UI is supported.
DisplayUI	Displays the requested UI.
SetSyncSpeakTimeout	Sets the timeout interval, in milliseconds, after which synchronous Speak and SpeakStream calls to this instance of the voice time out.
GetSyncSpeakTimeout	Retrieves the timeout interval for synchronous speech operations for this ISpVoice instance.

References

Text-to-Speech Overview, MSDN

This post was written in Springnote.
