Phonetic Translation Library

Open source Platform-compatible Transliteration


Features

For Developers

Using PhTranslation COM Objects

API Usage

PhWordpad Application

PhTranslator Application

Alphabet Mappings

PhAlphabetEditor

Bengali Phonetic Table

Gujarati Phonetic Table

Hindi Phonetic Table

Kannada Phonetic Table

Malayalam Phonetic Table

Oriya Phonetic Table

Sanskrit Phonetic Table

Tamil Phonetic Table

Telugu Phonetic Table

Example Transliterations

Project Download Page: http://sourceforge.net/projects/phtranslator/


Features

Phonetic Translation Library is a project that aims to create reusable components (C++ libraries, COM components, and edit controls) for phonetic transliteration of Indian languages such as Telugu, Tamil, and Kannada. Reuse, platform compatibility, and ease of use are the primary design considerations. This release ships with both ready-to-use end-user tools and programming libraries and components, with in-built support for 10 Indian languages.

End-user tools available as part of this are:

·        A simple phonetic transliteration application for typical users who use Indian language content (mails, instant messaging, greeting card designs). Just enter the words and get them transliterated phonetically to the chosen language.

·        A complete rich-text editing application for typesetting documents in Indian languages. It is aimed at more serious users who want to use phonetic transliteration to create Indian-language documents (desktop publishing), and is helpful for preserving old Indian classical books and documents in their native-language form. It supports multiple platform-compatible output formats such as HTML, OpenOffice document, and Acrobat PDF.

Library API/Components available for Developers are:

·        C/C++ API for using directly as a Library (through source reference or Dll reference) from any C/C++ project

·        COM components for use with scripting languages (VBScript/JavaScript). The component is Registration-Free COM ready.

·        DllImport options for use with .Net and VB applications

·        Multi-platform compatible Phonetic RichEdit Text control ready to be used with GUI applications. Supports Mac, Linux and Windows versions

Supported Languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Oriya, Punjabi, Sanskrit, Tamil, Telugu

In addition to these in-built translation tables, custom tables are supported: you can save any in-built table to a phonetic table file, edit it, and load it as a custom language.


For Developers

Phonetic Translation Library was created with the explicit intent of being easy to use and reusable across platforms. With as few as three lines of code, you can start translating any Indic-language content phonetically. A simple example:

 

  #include "PhTranslateLib.h"

 

  wchar_t szOutput[128];

  ::Translate(GetSanskritTranslator(), "bhaaShaa parivartanam", szOutput, 128); // szOutput will be भाषा परिवर्तनम्

 

If you are a developer who would like to use PhTranslation library in your own applications, you have multiple options to do so.

·        Linking to the PhTranslation Library source code directly: This is useful if you are a C/C++ developer

·        Using the PhTranslation DLLs for P/Invoke: This is useful if you are a .Net Developer

·        Using the PhTranslation COM components: This is useful for any cross-language development

 

To link directly against the source code, add the PhTranslateLib project to your solution and build it. If you do not want to build from source, #include the PhTranslateLib.h header file and link PhTranslateLib.Lib into your code. PhTranslateLib is compiled as a DLL by default, so when you link against PhTranslateLib.Lib you must place PhTranslateLib.Dll in your output directory to make it accessible to your application.

#include "PhTranslateLib.h" and link against PhTranslateLib.Lib

PhTranslateLib.h declares the translator methods that you call from your application: the GetxxxTranslator() methods retrieve a translator object for one particular language, and the Translate() methods perform the actual text translation. Note that the Translate methods accept both char* and wchar_t* inputs.

 

The GetTranslator methods supported are:

 

    // Get the Telugu Translator.

    // The output of this method must be sent as input to the Translate Method.

    PHTRANSLATELIB_API void* GetTeluguTranslator();

 

    // Get the Bengali Translator.

    // The output of this method must be sent as input to the Translate Method.

    PHTRANSLATELIB_API void* GetBengaliTranslator();

 

    // Get the Gujarati Translator.

    // The output of this method must be sent as input to the Translate Method.

    PHTRANSLATELIB_API void* GetGujaratiTranslator();

 

    // Get the Hindi Translator.

    // The output of this method must be sent as input to the Translate Method.

    PHTRANSLATELIB_API void* GetHindiTranslator();

 

    // Get the Kannada Translator.

    // The output of this method must be sent as input to the Translate Method.

    PHTRANSLATELIB_API void* GetKannadaTranslator();

 

    // Get the Malayalam Translator.

    // The output of this method must be sent as input to the Translate Method.

    PHTRANSLATELIB_API void* GetMalayalamTranslator();

 

    // Get the Punjabi Translator.

    // The output of this method must be sent as input to the Translate Method.

    PHTRANSLATELIB_API void* GetPunjabiTranslator();

 

    // Get the Oriya Translator.

    // The output of this method must be sent as input to the Translate Method.

    PHTRANSLATELIB_API void* GetOriyaTranslator();

 

    // Get the Sanskrit Translator.

    // The output of this method must be sent as input to the Translate Method.

    PHTRANSLATELIB_API void* GetSanskritTranslator();

 

    // Get the Tamil Translator.

    // The output of this method must be sent as input to the Translate Method.

    PHTRANSLATELIB_API void* GetTamilTranslator();
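Because every getter above has the same shape (no arguments, an opaque void* result), an application that lets the user pick a language can keep the getters in a lookup table. The following is a minimal sketch of that idea, not part of the library: FindTranslator and the TranslatorGetter alias are hypothetical names, and the three getters are stand-in definitions so the sketch compiles on its own. In a real project you would remove the stand-ins and include PhTranslateLib.h instead.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in getters so this sketch compiles without the library.
// In a real project, delete these and #include "PhTranslateLib.h".
static int g_teluguStub, g_tamilStub, g_hindiStub;
void* GetTeluguTranslator() { return &g_teluguStub; }
void* GetTamilTranslator()  { return &g_tamilStub; }
void* GetHindiTranslator()  { return &g_hindiStub; }

// Every getter has the same signature: no arguments, opaque pointer result.
using TranslatorGetter = void* (*)();

// Hypothetical helper: returns the translator registered for a language
// name, or nullptr when the name is not in the table.
void* FindTranslator(const std::string& language)
{
    static const std::map<std::string, TranslatorGetter> getters = {
        { "Telugu", &GetTeluguTranslator },
        { "Tamil",  &GetTamilTranslator  },
        { "Hindi",  &GetHindiTranslator  },
    };
    const auto it = getters.find(language);
    return it == getters.end() ? nullptr : it->second();
}
```

A caller would pass the result of FindTranslator("Telugu") to Translate after checking it for nullptr. The remaining languages can be added to the table in the same way.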

All the above GetxxxTranslator() methods return a translator object based on the in-built translation tables. They use the mappings defined in the LanguageCodes.h header file (refer to the tables in the Alphabet Mappings section for the in-built transliterations). However, if you wish to create a translator based on your own phonetic table mappings, use the CreateCustomTranslator() method instead of the GetxxxTranslator() methods.

 

    // Creates a Translator based on the PhoneticTables loaded from the specified file.

    // The output of this method must be sent as input to the Translate Method.

    // Use ReleaseCustomTranslator method to release the created translator.

    PHTRANSLATELIB_API void* CreateCustomTranslator(const char* szPhoneticTableFilePath);

 

    // Releases a Translator previously created with the CreateCustomTranslator() method

    PHTRANSLATELIB_API void ReleaseCustomTranslator(void* Translator);

 

 

Note that, unlike the in-built translator objects, a custom translator needs to be released once you are done with it. To release it, use the ReleaseCustomTranslator method. No such release is required for the translators returned by the GetxxxTranslator methods.
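In C++ the create/release pairing can be tied to scope so that the release is never forgotten. The sketch below wraps the pointer in a std::unique_ptr with a custom deleter. CustomTranslatorPtr, CustomTranslatorDeleter and OpenCustomTranslator are hypothetical names, and the two library functions are replaced by counting stand-ins so the sketch compiles on its own; how the real CreateCustomTranslator reports a failed load is not covered here.

```cpp
#include <cassert>
#include <memory>

// Stand-ins so the sketch compiles without the library. They only count
// live translators; the real functions are declared in PhTranslateLib.h.
static int g_liveTranslators = 0;
void* CreateCustomTranslator(const char* /*szPhoneticTableFilePath*/)
{
    ++g_liveTranslators;
    return new int(0);
}
void ReleaseCustomTranslator(void* translator)
{
    --g_liveTranslators;
    delete static_cast<int*>(translator);
}

// Deleter that hands the pointer back to the library.
struct CustomTranslatorDeleter
{
    void operator()(void* translator) const { ReleaseCustomTranslator(translator); }
};

// Hypothetical alias: owns a custom translator and releases it on scope exit.
using CustomTranslatorPtr = std::unique_ptr<void, CustomTranslatorDeleter>;

// Hypothetical helper: creates a custom translator wrapped in the owner type.
CustomTranslatorPtr OpenCustomTranslator(const char* tablePath)
{
    return CustomTranslatorPtr(CreateCustomTranslator(tablePath));
}
```

With this wrapper, translator.get() is what you pass to Translate, and the translator is released automatically when the CustomTranslatorPtr goes out of scope, including on early returns. Translators obtained from the GetxxxTranslator methods should not be wrapped this way, since they are not meant to be released.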

 

Once you have the translator object ready, you can supply it to one of the Translate methods to get the actual translation done. The Translate methods supported are:

 

    // Translates the given Phonetic English string.

    // Parameters:

    //  [in]  Translator: This must be a value returned by one of the GetTranslator methods or the CreateCustomTranslator method

    //  [in]  szInput: The Phonetic English String that is to be translated

    //  [out] szOutput: The Translated String in Unicode representation

    //  [in]  nLen: Maximum number of wide chars to be filled. szOutput[nLen-1] will be '\0' if the buffer is too small.

    //  [return] Returns the length of the full converted string. szOutput might hold only a fraction of it if nLen is too small.

    //  Remarks: Pass szOutput as NULL to get the required length of the buffer.

    PHTRANSLATELIB_API int Translate(void* Translator, const char* szInput,

                                      wchar_t* szOutput, const int nLen);

 

    // Translates the given Phonetic English string. If the string contains non-ASCII characters, they will be

    // inserted into the output string as is.

    // Parameters:

    //  [in]  Translator: This must be a value returned by one of the GetTranslator methods or the CreateCustomTranslator method

    //  [in]  szInput: The Phonetic English String that is to be translated

    //  [out] szOutput: The Translated String in Unicode representation

    //  [in]  nLen: Maximum number of wide chars to be filled. szOutput[nLen-1] will be '\0' if the buffer is too small.

    //  [return] Returns the length of the full converted string. szOutput might hold only a fraction of it if nLen is too small.

    //  Remarks: Pass szOutput as NULL to get the required length of the buffer.

    PHTRANSLATELIB_API int TranslateW(void* Translator, const wchar_t* szInput,

                                      wchar_t* szOutput, const int nLen);

 

    // Translates the given string and returns the required buffer size to be allocated to hold the output.

    // You can directly use the return value to allocate the buffer size as wchar_t* psz = new wchar_t[GetTranslatedBufferLength(...)];

    // The actual translated buffer can later be filled into the allocated string by supplying it along with the pHint to the GetTranslatedBuffer() method.

    // Parameters:

    //  [in]  Translator: This must be a value returned by one of the GetTranslator methods or the CreateCustomTranslator method

    //  [in]  szInput: The Phonetic English String that is to be translated

    //  [Out] ppHint: Returns a Hint object pointer that can be used to retrieve the translated buffer later

    PHTRANSLATELIB_API int GetTranslatedBufferLength(void* Translator, const char* szInput, void** ppHint);

 

    // Translates the given string and returns the required buffer size to be allocated to hold the output.

    // You can directly use the return value to allocate the buffer size as wchar_t* psz = new wchar_t[GetTranslatedBufferLengthW(...)];

    // The actual translated buffer can later be filled into the allocated string by supplying it along with the pHint to the GetTranslatedBuffer() method.

    // Parameters:

    //  [in]  Translator: This must be a value returned by one of the GetTranslator methods or the CreateCustomTranslator method

    //  [in]  szInput: The Phonetic English String that is to be translated

    //  [Out] ppHint: Returns a Hint object pointer that can be used to retrieve the translated buffer later

    PHTRANSLATELIB_API int GetTranslatedBufferLengthW(void* Translator, const wchar_t* szInput, void** ppHint);

 

    // Retrieves the translated buffer previously computed with the GetTranslatedBufferLength/GetTranslatedBufferLengthW method.

    // Upon success, the Hint object will be reset to NULL to restrict its further usage.

    // Parameters:

    //  [out] szOutput: The buffer to hold the Translated String in Unicode representation

    //  [in/Out] ppHint: The Hint object pointer. This will be reset to NULL upon return.

    PHTRANSLATELIB_API void GetTranslatedBuffer(wchar_t* szOutput, void** ppHint);

 

 

TranslateW is the Unicode version of Translate. Both methods take the English phonetic text as input and fill the supplied buffer with the translated content; they differ only in their input text format. TranslateW accepts wchar_t input, while Translate accepts char input, but both return the output as wchar_t. This is because the input string usually contains only English alphabet characters (which can be represented with just char), whereas the translated string contains Unicode characters that require wchar_t.

 

Also note that to use these methods you need to pass in a buffer with enough space pre-allocated. But how do you know how much space is needed to hold the translated output? There is no fixed formula, but twice the input string length is usually sufficient. If you want to know the exact output buffer size, call the method with NULL specified for the szOutput parameter. The Translate method will then return the required size of the output buffer, which you can pass to the memory allocation routine. Of course, you then need to call the Translate method again with the allocated buffer to get the actual content.

 

A typical usage is:

 

// First call: pass NULL to query the required buffer size

int nOutputSize = TranslateW(GetTeluguTranslator(), L"PhoneticStringInput", NULL, 0);

 

wchar_t* psz = new wchar_t[nOutputSize];

 

// Second call: fill the allocated buffer

TranslateW(GetTeluguTranslator(), L"PhoneticStringInput", psz, nOutputSize);

 

delete[] psz;

 

Note that TranslateW is called twice (once with the output buffer set to NULL, and again with the actual allocated buffer).

However, this way of determining the output buffer size is not efficient, because the translation is performed twice (once to calculate the output size, and once for the actual translation). As noted, there is no way to calculate the output size other than to actually compute it.
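One way to pay for the second pass only when it is really needed is to combine the two sizing strategies: try a buffer of twice the input length first, and translate again only if the returned length shows that the guess was too small. The sketch below illustrates this under stated assumptions. TranslateToString is a hypothetical helper, and Translate is a stand-in written to follow the contract in the header comments (it "translates" by repeating each input character g_stubRepeat times) so that the sketch compiles without the library. Since it is not stated whether the returned length counts the terminating '\0', the helper reserves one extra slot either way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Stand-in for the library's Translate(), following the documented contract:
// returns the full output length, writes at most nLen wide chars, and
// null-terminates. The real function is declared in PhTranslateLib.h.
static int g_stubRepeat = 1;      // how many output chars per input char
static int g_translateCalls = 0;  // counts calls, to show how many passes ran
int Translate(void* /*Translator*/, const char* szInput, wchar_t* szOutput, const int nLen)
{
    ++g_translateCalls;
    std::wstring full;
    for (const char* p = szInput; *p != '\0'; ++p)
        full.append(static_cast<std::size_t>(g_stubRepeat), static_cast<wchar_t>(*p));
    if (szOutput != nullptr && nLen > 0)
    {
        const std::size_t n = std::min(full.size(), static_cast<std::size_t>(nLen - 1));
        full.copy(szOutput, n);
        szOutput[n] = L'\0';
    }
    return static_cast<int>(full.size());
}

// Hypothetical helper: guess first, translate again only if the guess was too small.
std::wstring TranslateToString(void* translator, const char* input)
{
    // First attempt: twice the input length, plus one slot for the terminator.
    std::vector<wchar_t> buffer(2 * std::strlen(input) + 1);
    const int needed = Translate(translator, input, buffer.data(),
                                 static_cast<int>(buffer.size()));

    // Assumption: the return value may or may not count the terminator, so
    // treat "needed >= capacity" as truncation and reserve needed + 1.
    if (needed >= static_cast<int>(buffer.size()))
    {
        buffer.assign(static_cast<std::size_t>(needed) + 1, L'\0');
        Translate(translator, input, buffer.data(), static_cast<int>(buffer.size()));
    }
    return std::wstring(buffer.data());
}
```

In the common case the first call succeeds and the text is translated once; only unusually long outputs trigger the second call.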

To make this process efficient, you can use the next set of methods: GetTranslatedBufferLength and GetTranslatedBuffer. Call GetTranslatedBufferLength to learn the size of the output buffer. As before, this translates the content and returns the output size. However, it does not discard the translated string; instead, it holds on to it internally and gives you a pointer that you can pass to GetTranslatedBuffer to retrieve it. This way, the translation is done only once. Sample usage:

 

void* pHint = NULL;

 

wchar_t* psz = new wchar_t[GetTranslatedBufferLengthW(GetTeluguTranslator(), L"PhoneticStringInput", &pHint)];

 

GetTranslatedBuffer(psz, &pHint); // Retrieve the stored buffer

 

delete[] psz;

 

Note that the pHint pointer links GetTranslatedBufferLength and GetTranslatedBuffer. (GetTranslatedBufferLengthW is the Unicode version of GetTranslatedBufferLength; use whichever matches your input character type. Both work the same way with GetTranslatedBuffer.)
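The length/hint pair can be hidden behind a small helper that returns a std::wstring, so callers never handle the raw buffer or the hint. The following is a sketch only. TranslateOnce is a hypothetical name, and the two library functions are replaced by stand-ins that mimic the documented behaviour (the length call stores the result behind the hint, the buffer call copies it out and resets the hint to NULL). The stand-in "translation" is a plain copy of the input.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Stand-ins so the sketch compiles without the library; the real functions
// are declared in PhTranslateLib.h. g_pendingHints counts unreleased hints.
static int g_pendingHints = 0;
int GetTranslatedBufferLengthW(void* /*Translator*/, const wchar_t* szInput, void** ppHint)
{
    std::wstring* stored = new std::wstring(szInput);  // "translation" = plain copy
    ++g_pendingHints;
    *ppHint = stored;
    return static_cast<int>(stored->size()) + 1;       // room for the terminator
}
void GetTranslatedBuffer(wchar_t* szOutput, void** ppHint)
{
    std::wstring* stored = static_cast<std::wstring*>(*ppHint);
    stored->copy(szOutput, stored->size());
    szOutput[stored->size()] = L'\0';
    delete stored;
    --g_pendingHints;
    *ppHint = nullptr;                                 // documented: hint is reset
}

// Hypothetical helper: translate once and return the result as a std::wstring.
std::wstring TranslateOnce(void* translator, const wchar_t* input)
{
    void* hint = nullptr;
    const int length = GetTranslatedBufferLengthW(translator, input, &hint);

    // One extra zero-filled slot keeps the result terminated even if the
    // reported length turns out not to include the terminator (assumption).
    std::vector<wchar_t> buffer(static_cast<std::size_t>(std::max(length, 0)) + 1, L'\0');

    // Always fetch the buffer so the string held behind the hint is handed over.
    GetTranslatedBuffer(buffer.data(), &hint);
    assert(hint == nullptr);
    return std::wstring(buffer.data());
}
```

A call such as TranslateOnce(GetTeluguTranslator(), L"PhoneticStringInput") then replaces the allocate, fill, and delete[] sequence with a single expression.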

Using PhTranslation COM Objects

The Phonetic Translation COM objects are shipped in PhTranslateCOM.dll. They are useful if you plan to use the library from COM-compatible languages, such as VBScript, or through .Net-COM interop. (Note: If you are using the library from regular C++ applications or through .Net P/Invoke, you do not need these COM objects; you can skip this section and refer to the API Usage section instead.)

To start using the COM components, you first need to register the DLL on your machine by using the RegSvr32.exe command as below:

   C:\Windows\System32\regsvr32.exe PhTranslateCOM.dll

Once the component is registered, you can use CoCreateInstance as usual to create the objects. A sample is shown below:

 

    IPhTranslator* pTranslator = NULL;

 

    HRESULT hr = CoCreateInstance(__uuidof(SanskritTranslator), NULL, CLSCTX_ALL, IID_IPhTranslator, (LPVOID*)&pTranslator);

 

    if(FAILED(hr)) printf("\nCoCreateInstance Failed with 0x%x", hr);

 

You need to #include "PhTranslateCOM_i.h" to be able to use the UUIDs. The supported class IDs are:

 

EXTERN_C const CLSID CLSID_TeluguTranslator; // ProgID: PhTranslation.TeluguTranslator

 

EXTERN_C const CLSID CLSID_BengaliTranslator; // ProgID: PhTranslation.BengaliTranslator

 

EXTERN_C const CLSID CLSID_HindiTranslator; // ProgID: PhTranslation.HindiTranslator

 

EXTERN_C const CLSID CLSID_GujaratiTranslator; // ProgID: PhTranslation.GujaratiTranslator

 

EXTERN_C const CLSID CLSID_KannadaTranslator; // ProgID: PhTranslation.KannadaTranslator

 

EXTERN_C const CLSID CLSID_MalayalamTranslator; // ProgID: PhTranslation.MalayalamTranslator

 

EXTERN_C const CLSID CLSID_PunjabiTranslator; // ProgID: PhTranslation.PunjabiTranslator

 

EXTERN_C const CLSID CLSID_SanskritTranslator; // ProgID: PhTranslation.SanskritTranslator

 

EXTERN_C const CLSID CLSID_TamilTranslator; // ProgID: PhTranslation.TamilTranslator

 

EXTERN_C const CLSID CLSID_OriyaTranslator; // ProgID: PhTranslation.OriyaTranslator

 

EXTERN_C const CLSID CLSID_CustomTranslator; // ProgID: PhTranslation.CustomTranslator

 

All translator objects implement the IPhTranslator interface, which is defined as below:

[

      object,

      uuid(581A99EE-6C43-42F2-9A48-0CE5EE15C469),

      dual,

      nonextensible,

      helpstring("IPhTranslator Interface"),

      pointer_default(unique)

]

interface IPhTranslator : IDispatch{

      [id(1), helpstring("method Translate")] HRESULT Translate([in] BSTR inPhoneticString, [out,retval] BSTR* pTranslatedString);

      [id(2), helpstring("method SavePhoneticTable")] HRESULT SavePhoneticTable([in] BSTR bstrPhTableFilePath);

      [id(3), helpstring("method LoadPhoneticTable")] HRESULT LoadPhoneticTable([in] BSTR bstrPhTableFilePath);

};

To use a COM object for translation, all you need to do is create the COM object with the appropriate UUID or ProgID and call the IPhTranslator::Translate() method. For the custom translator, you need to call the LoadPhoneticTable() method before making any Translate calls. Note that LoadPhoneticTable() does not work on the other translators. You can use the SavePhoneticTable() method to save the phonetic tables to a file; this method applies to all translators (custom and in-built).

If you would like to use these components in a Registration-Free COM manner, you first need to add an activation manifest for your application with the COM class types defined in it. A sample activation manifest is shown below:

<?xml version="1.0" encoding="utf-8"?>

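<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0" xmlns:asmv2="urn:schemas-microsoft-com:asm.v2" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#">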

  <assemblyIdentity name="Your_Application_Name.exe" version="1.0.0.0" type="win32" />

  <file name="PhTranslateCOM.dll" asmv2:size="201728">

    <hash xmlns="urn:schemas-microsoft-com:asm.v2">

      <dsig:Transforms>

        <dsig:Transform Algorithm="urn:schemas-microsoft-com:HashTransforms.Identity" />

      </dsig:Transforms>

      <dsig:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1" />

      <dsig:DigestValue>CZZ+w0pLx36LPKY2orTea7eLqWw=</dsig:DigestValue>

    </hash>

    <typelib tlbid="{1abde916-4c9a-4866-815f-b4a26cb7347f}" version="1.0" helpdir="" resourceid="0" flags="HASDISKIMAGE" />

    <comClass clsid="{a4e71d2a-480a-4cf0-a4a8-dcbfb3f99f32}" threadingModel="Apartment" tlbid="{1abde916-4c9a-4866-815f-b4a26cb7347f}" progid="PhTranslation.TeluguTranslator.1" description="TeluguTranslator Class" />

    <comClass clsid="{5aa852e3-965d-4dca-8b15-cb64125be3d6}" threadingModel="Apartment" tlbid="{1abde916-4c9a-4866-815f-b4a26cb7347f}" progid="PhTranslation.BengaliTranslator.1" description="BengaliTranslator Class" />

    <comClass clsid="{dc5e7dd2-4565-4041-beac-45e3692f6a61}" threadingModel="Apartment" tlbid="{1abde916-4c9a-4866-815f-b4a26cb7347f}" progid="PhTranslation.HindiTranslator.1" description="HindiTranslator Class" />

    <comClass clsid="{8fe6872e-c45b-4c98-b6f7-5b6c6f137e00}" threadingModel="Apartment" tlbid="{1abde916-4c9a-4866-815f-b4a26cb7347f}" progid="PhTranslation.GujaratiTranslator.1" description="GujaratiTranslator Class" />

    <comClass clsid="{e692a113-b73a-4896-8bf5-6a0d3c87dbc1}" threadingModel="Apartment" tlbid="{1abde916-4c9a-4866-815f-b4a26cb7347f}" progid="PhTranslation.KannadaTranslator.1" description="KannadaTranslator Class" />

    <comClass clsid="{c46f00bb-fadd-404b-8e78-ed29ca9707e3}" threadingModel="Apartment" tlbid="{1abde916-4c9a-4866-815f-b4a26cb7347f}" progid="PhTranslation.MalayalamTranslator.1" description="MalayalamTranslator Class" />

    <comClass clsid="{b15cd237-a8a6-4c74-9d98-d16ce7451009}" threadingModel="Apartment" tlbid="{1abde916-4c9a-4866-815f-b4a26cb7347f}" progid="PhTranslation.PunjabiTranslator.1" description="PunjabiTranslator Class" />

    <comClass clsid="{f075931b-17ff-491a-a3bb-7fff6b5a31b5}" threadingModel="Apartment" tlbid="{1abde916-4c9a-4866-815f-b4a26cb7347f}" progid="PhTranslation.SanskritTranslator.1" description="SanskritTranslator Class" />

    <comClass clsid="{ba6d442f-3b70-42e4-adb0-b89cff57a188}" threadingModel="Apartment" tlbid="{1abde916-4c9a-4866-815f-b4a26cb7347f}" progid="PhTranslation.TamilTranslator.1" description="TamilTranslator Class" />

    <comClass clsid="{e05bf268-26fe-49f2-ad1f-9cfb97b7f5fa}" threadingModel="Apartment" tlbid="{1abde916-4c9a-4866-815f-b4a26cb7347f}" progid="PhTranslation.OriyaTranslator.1" description="OriyaTranslator Class" />

    <comClass clsid="{5d6b9596-63f2-4989-9b43-a8b79c28c0af}" threadingModel="Apartment" tlbid="{1abde916-4c9a-4866-815f-b4a26cb7347f}" progid="PhTranslation.CustomTranslator.1" description="CustomTranslator Class" />

  </file>

            <comInterfaceExternalProxyStub name="IPhTranslator" iid="{581A99EE-6C43-42F2-9A48-0CE5EE15C469}" proxyStubClsid32="{00020424-0000-0000-C000-000000000046}" baseInterface="{00000000-0000-0000-C000-000000000046}" tlbid="{1abde916-4c9a-4866-815f-b4a26cb7347f}"/>

 

</assembly>

Once you have saved the above manifest to disk, you can use the Activation Context API to load it at run time and create instances of the COM objects. A complete example is shown below:

 

#include <ComDef.h>

#include <Windows.h>

#include <AtlStr.h>

#include <stdio.h>

#include <stdlib.h>

#include <tchar.h>

 

#include "PhTranslateCOM_i.h"

 

#define Error(x)    exit(_ftprintf(stderr, _T("%s"), x)) // TCHAR-aware, so _T("...") messages print correctly

 

int _tmain(int argc, _TCHAR* argv[])

{

    TCHAR szInput[32];

 

    printf("\nEnter some Phonetic Text to appear in Sanskrit: ");

    wscanf(L"%20s", szInput);

 

    TCHAR szCurDir[MAX_PATH];

    GetCurrentDirectory(MAX_PATH, szCurDir);

 

    CString strManifest(szCurDir);

    strManifest += _T("\\COMTestClient.exe.manifest"); // Path of Activation Manifest

 

    CoInitialize(NULL);

 

    // Create an Activation Context structure

    ACTCTX actctx;

    ZeroMemory(&actctx, sizeof(actctx));

    actctx.cbSize = sizeof(actctx);

    actctx.lpSource = strManifest; // Give complete path to the Activation Manifest

 

    HANDLE pActCtx = CreateActCtx(&actctx);

    if(pActCtx == INVALID_HANDLE_VALUE)   {     Error(_T("CreateActCtx"));      return -1;  }

 

    ULONG_PTR lpCookie;

    if(!ActivateActCtx(pActCtx, &lpCookie))     { Error(_T("ActivateActCtx")); return -1; }

 

    IPhTranslator* pTranslator = NULL;

 

    HRESULT hr = CoCreateInstance(__uuidof(SanskritTranslator), NULL, CLSCTX_ALL, IID_IPhTranslator, (LPVOID*)&pTranslator);

    if(FAILED(hr))

        printf("\nCoCreateInstance Failed with 0x%x", hr);

    else

    {

        _bstr_t bstrInput(szInput);

 

        BSTR bstrOutput;

 

        if(FAILED(hr = pTranslator->Translate(bstrInput, &bstrOutput)))

            printf("\nTranslation Failed with 0x%x", hr);

        else

        {

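            MessageBox(NULL, bstrOutput, _T("Translated Text"), MB_OK);

            printf("\nRegistration Free COM worked fine !!\n");

            SysFreeString(bstrOutput); // The caller owns the BSTR returned by Translate

        }

        pTranslator->Release(); // Release the translator before COM is uninitialized

    }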

 

    DeactivateActCtx(0, lpCookie); // Undo ActivateActCtx

    ReleaseActCtx(pActCtx);

    CoUninitialize();

 

    return 0;

}

 

Phonetic Translation COM objects are Dual Interface compatible. Hence, they can be invoked from any script language. A simple VBScript that uses these objects for translating phonetic strings into Indian languages is shown below:

 

  dim strTranslated

 

  str = "telugu"

  set obj = createobject("PhTranslation.TeluguTranslator")

  strTranslated = obj.Translate(str)

 

  str = "bengaalI"

  set obj = createobject("PhTranslation.BengaliTranslator")

  strTranslated = strTranslated + vbcrlf + obj.Translate(str)

 

  str = "gujaraatI"

  set obj = createobject("PhTranslation.GujaratiTranslator")

  strTranslated = strTranslated + vbcrlf + obj.Translate(str)

 

  str = "hinDI"

  set obj = createobject("PhTranslation.HindiTranslator")

  strTranslated = strTranslated + vbcrlf + obj.Translate(str)

 

  str = "kannada"

  set obj = createobject("PhTranslation.KannadaTranslator")

  strTranslated = strTranslated + vbcrlf + obj.Translate(str)

 

  str = "malayaalam"

  set obj = createobject("PhTranslation.MalayalamTranslator")

  strTranslated = strTranslated + vbcrlf + obj.Translate(str)

 

  str = "panjaabI"

  set obj = createobject("PhTranslation.PunjabiTranslator")

  strTranslated = strTranslated + vbcrlf + obj.Translate(str)

 

  str = "oriyaa"

  set obj = createobject("PhTranslation.OriyaTranslator")

  strTranslated = strTranslated + vbcrlf + obj.Translate(str)

 

  str = "saMskzrtaM"

  set obj = createobject("PhTranslation.SanskritTranslator")

  strTranslated = strTranslated + vbcrlf + obj.Translate(str)

 

  str = "tamizL"

  set obj = createobject("PhTranslation.TamilTranslator")

  strTranslated = strTranslated + vbcrlf + obj.Translate(str)

 

  Msgbox(strTranslated)

API Usage

While the Phonetic Translation COM objects are useful for scripting languages like VBScript and for .Net-COM interop, the regular Phonetic Translation Library API is easier to use from regular C++ and from P/Invoke-compatible languages (such as .Net).

The API offers GetxxxTranslator() and Translate() methods that work together.  A typical usage would be as below:

 

#include "PhTranslateLib.h"

 

wchar_t szTranslatedString[128];

 

::Translate(GetTeluguTranslator(), "iDi telugu vaakyaM", szTranslatedString, 128);

 

The Translate() method takes its input phonetic string in char* format. TranslateW() is its sibling that accepts wchar_t* input. In both cases, the translated content is always written to a wchar_t* buffer. All GetxxxTranslator() methods return a pointer to one of the in-built translators (such as the Bengali translator or the Gujarati translator). If you want to load a custom translator based on custom phonetic tables, use the CreateCustomTranslator() method as shown below:

 

wchar_t szTranslatedString[128];

 

void* pTranslator = CreateCustomTranslator("c:\\MyPhoneticTable.PhTable");

 

Translate(pTranslator, "input sentence", szTranslatedString, 128);

 

ReleaseCustomTranslator(pTranslator);

 

Note that the Translate method takes a wchar_t* buffer as one of its parameters and fills it with the translated content. We may not know the required buffer size a priori. In that case, we can either specify a large enough buffer (typically twice the input size should work), or use the GetTranslatedBufferLength method along with GetTranslatedBuffer.

A typical .Net usage of these two methods is:

 

    m_CurrentTranslator = GetTeluguTranslator();

 

    String strInput = textBox_Input.Text;

    IntPtr ppHint = System.IntPtr.Zero;

           

    int nLen = GetTranslatedBufferLengthW(m_CurrentTranslator, strInput, ref ppHint);

    StringBuilder strBufferW = new StringBuilder(nLen);

    GetTranslatedBuffer(strBufferW, ref ppHint);

 

    textBox_Output.Text = strBufferW.ToString();

 

GetTranslatedBufferLengthW is the Unicode version of GetTranslatedBufferLength. Note that after calling GetTranslatedBufferLength we should not use the Translate method; instead, we should use the GetTranslatedBuffer method. GetTranslatedBufferLength returns a hint pointer that should be passed to the GetTranslatedBuffer method. There are two reasons for this:

1.      The GetTranslatedBufferLength method computes the required output buffer size by actually performing the translation. By calling GetTranslatedBuffer, we can retrieve the translated string without recomputing it.

2.      The hint returned by GetTranslatedBufferLength points to the actual transliterated string, which must be passed to the GetTranslatedBuffer method (where it is freed). Not calling GetTranslatedBuffer after calling GetTranslatedBufferLength leads to a memory leak.

The DllImports for .Net will typically look like:

    [DllImport("PhTranslateLib.dll")]

    public static extern IntPtr GetTeluguTranslator();

 

    [DllImport("PhTranslateLib.dll")]

    public static extern IntPtr GetBengaliTranslator();

 

    [DllImport("PhTranslateLib.dll")]

    public static extern IntPtr GetGujaratiTranslator();

 

    [DllImport("PhTranslateLib.dll")]

    public static extern IntPtr GetHindiTranslator();

 

    [DllImport("PhTranslateLib.dll")]

    public static extern IntPtr GetKannadaTranslator();

 

    [DllImport("PhTranslateLib.dll")]

    public static extern IntPtr GetMalayalamTranslator();

 

    [DllImport("PhTranslateLib.dll")]

    public static extern IntPtr GetPunjabiTranslator();

 

    [DllImport("PhTranslateLib.dll")]

    public static extern IntPtr GetOriyaTranslator();

 

    [DllImport("PhTranslateLib.dll")]

    public static extern IntPtr GetSanskritTranslator();

 

    [DllImport("PhTranslateLib.dll")]

    public static extern IntPtr GetTamilTranslator();

 

    [DllImport("PhTranslateLib.dll")]

   public static extern IntPtr CreateCustomTranslator([MarshalAs(UnmanagedType.LPStr)] String szPhoneticTableFilePath);

 

    [DllImport("PhTranslateLib.dll")]

    public static extern void ReleaseCustomTranslator(IntPtr Translator);

 

    [DllImport("PhTranslateLib.dll")]

    public static extern int Translate(IntPtr Translator, [MarshalAs(UnmanagedType.LPStr)] String szInput, [MarshalAs(UnmanagedType.LPWStr)] StringBuilder szOutput, int nLen);

   

    [DllImport("PhTranslateLib.dll")]

    public static extern int TranslateW(IntPtr Translator, [MarshalAs(UnmanagedType.LPWStr)] String szInput, [MarshalAs(UnmanagedType.LPWStr)] StringBuilder szOutput, int nLen);

   

    [DllImport("PhTranslateLib.dll")]

     public static extern int GetTranslatedBufferLength(IntPtr Translator, [MarshalAs(UnmanagedType.LPStr)] String szInput, ref IntPtr ppHint);

 

    [DllImport("PhTranslateLib.dll")]

     public static extern int GetTranslatedBufferLengthW(IntPtr Translator, [MarshalAs(UnmanagedType.LPWStr)] String szInput, ref IntPtr ppHint);

 

    [DllImport("PhTranslateLib.dll")]

    public static extern void GetTranslatedBuffer([MarshalAs(UnmanagedType.LPWStr)] StringBuilder szOutput, ref System.IntPtr ppHint);

 

    [DllImport("PhTranslateLib.dll")]

    public static extern bool SavePhoneticTable(IntPtr Translator, [MarshalAs(UnmanagedType.LPStr)] String szFilePath);

 

 

In-built phonetic tables can be saved to files by using the SavePhoneticTable method, as shown below:

           

   

    SavePhoneticTable(GetSanskritTranslator(), saveFileDialog1.FileName);

 

 

The phonetic table is a text file that lists the phonetic string and Unicode character mappings. You can edit and reload it as a custom phonetic table (with CreateCustomTranslator method).


PhWordpad Application

PhWordpad is a simple WordPad-style application that transliterates words on the fly as you type them. You can select any of the in-built languages or use your own custom phonetic table for the transcription.

This application supports rich-text formatting operations such as Bold, Italic, and Underline and, to some extent, lists and tables.

As you type words in English, they are transcribed phonetically using the currently selected transliteration scheme. You can optionally turn off on-the-fly translation by unchecking the ‘Translate Words As I type them’ option in the ‘Translate’ menu. In that case, you can select any word and press Ctrl+T to transcribe it as needed.

PhWordpad supports the standard Copy/Paste operations. When you paste content from other applications, the pasted words are not translated automatically. This is useful if you wish to paste English content as is. However, if you wish to translate the pasted content, select it and press Ctrl+T to force transcription of each word in the selection.

Once the typesetting is complete, you can save the content in multiple formats, such as:

·        OpenOffice document files (*.odt)

·        Html files (*.html)

A special ‘Export PDF’ option is also available; it renders the typed content as-is to a PDF document.

When opening documents, however, only Html files can be opened for editing. So, if you want to continue editing your document later, save it in Html format.

The application also supports themes that change the look of its UI, such as Motif and CDE. You can change the UI theme by selecting the appropriate option from the Theme combo-box in the tool box near the status bar.

Note that some languages might require font support to be rendered properly. You can select and change the font options from the combo-boxes near the top tool bar.

Known Issues:

·        Selecting only part of a table, or only a few rows of a list, and pressing Ctrl+T may not work as expected. The workaround is to select the complete table and apply Ctrl+T to it, or to select the content of each cell individually and transcribe the cells separately.

·        Words are usually transcribed as soon as each word is complete. However, if you move the cursor far across pages with mouse clicks, words typed afterwards may sometimes not be transcribed immediately on completion. This is due to the way the Qt TextEdit control computes its word position. The workaround is to continue typing; the words will then be transcribed. Alternatively, click somewhere else in the text (to change the cursor position) and then click back at the current position. This should reset the current word calculation so that words are again transcribed as soon as they are completed.


PhTranslator Application

This is a simple .Net application based on the PhTranslation API that performs simple word translations. You can enter phonetic English words and convert them to any supported language.

Each letter is translated as you type, so you can view the results instantly. To view the alphabet chart, click the ‘Help: Alphabet Chart’ link at the bottom right of the window. It opens the phonetic table chart for the currently selected language in an HTML browser.

The Font Options button allows you to adjust the font options for the output view. You can select different fonts and colors and see how the output looks. Then the output text can be copied to any Unicode compatible application for further processing.


Alphabet Mappings

Phonetic Translation Library performs the translations based on table lookup methods. Each language is defined in terms of its characters mapped to different English phonetic strings. You can use the ‘PhAlphabetEditor’ tool to modify any of those in-built tables to suit your own needs – effectively modifying the transliteration behavior.
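The table lookup described above can be sketched in C++ as a greedy longest-match search. This is an illustration only, not the library's implementation: the PhoneticTable type, the sampleTable and lookupLongestMatch names, and the pass-through of unmapped characters are all assumptions, and the sample entries are a handful of illustrative Devanagari mappings. The sketch covers only the lookup step and ignores how vowels combine with consonants.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical table type: phonetic string -> Unicode code point.
using PhoneticTable = std::map<std::string, char32_t>;

// A few illustrative Devanagari entries; not the library's table data.
PhoneticTable sampleTable() {
    return {
        {"k",  U'\u0915'},  // DEVANAGARI LETTER KA
        {"kh", U'\u0916'},  // DEVANAGARI LETTER KHA
        {"t",  U'\u0924'},  // DEVANAGARI LETTER TA
        {"T",  U'\u091F'},  // DEVANAGARI LETTER TTA (keys are case sensitive)
        {"z1", U'\u0967'},  // DEVANAGARI DIGIT ONE
    };
}

// At each input position, try the longest candidate key first so that
// "kh" wins over "k". Unmapped characters are copied through unchanged
// (an assumption made for this sketch).
std::u32string lookupLongestMatch(const PhoneticTable& table, const std::string& input) {
    std::size_t maxKey = 0;
    for (const auto& entry : table) {
        assert(!entry.first.empty());  // an empty phonetic string is meaningless
        maxKey = std::max(maxKey, entry.first.size());
    }

    std::u32string out;
    std::size_t pos = 0;
    while (pos < input.size()) {
        bool matched = false;
        for (std::size_t len = std::min(maxKey, input.size() - pos); len > 0; --len) {
            auto it = table.find(input.substr(pos, len));
            if (it != table.end()) {
                out.push_back(it->second);
                pos += len;
                matched = true;
                break;
            }
        }
        if (!matched) {
            out.push_back(static_cast<char32_t>(static_cast<unsigned char>(input[pos])));
            ++pos;
        }
    }
    return out;
}
```

With this sample table, the input "kh" yields the single character mapped to "kh" rather than the character for "k" followed by a stray "h", which is why the longest key is tried first.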

For example, users may have different preferences for letters such as ta and da, and some may want to map extra characters for a symbol. All of this is possible with the PhAlphabetEditor tool. Tables modified with this editor can be loaded as custom tables in the PhWordpad application or the PhTranslator application.

The editor tool is described first, followed by the default table layout for each language. These are the default tables as initially coded. You can modify them to create custom tables. (Note that modifying a default mapping does not delete it; you only create an additional custom table to use as your own preferred translator. You can switch back to the original default table at any time by selecting that language name, or continue to use your own custom translator.)

PhAlphabetEditor

This application allows you to map English phonetic strings to Unicode characters of Indian languages. You can view mappings for any in-built language table supported or load your own custom table.

Characters are color coded to indicate whether they are treated as Vowels, Consonants, Digits or Special Symbols. Hovering the mouse over any character shows a tip that summarizes the current state of the character.

A vowel has two forms: an independent form and a dependent form. The dependent form dictates how a consonant looks when that vowel is applied to it.

All consonants in Indic languages inherit the vowel ‘a’ (the first vowel) by default. So, ‘a’ does not have a dependent form.

Virama or Halant is the character that removes the inherent vowel from the consonants. All languages have one Virama/Halant symbol (and only one).
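The interplay of independent forms, dependent forms, the inherent vowel and the Virama can be sketched in C++ as follows. This is a simplified illustration under stated assumptions, not the library's algorithm: the Letter struct and the compose function are hypothetical, the input is assumed to be already split into letters, and the code points are the standard Devanagari ones.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Kind { Consonant, Vowel };

// Hypothetical description of one table entry.
struct Letter {
    Kind kind;
    char32_t independent;  // the consonant itself, or the stand-alone vowel form
    char32_t dependent;    // vowel sign; 0 when there is none (inherent 'a', consonants)
};

constexpr char32_t kVirama = U'\u094D';  // DEVANAGARI SIGN VIRAMA

const Letter kKa {Kind::Consonant, U'\u0915', 0};
const Letter kLa {Kind::Consonant, U'\u0932', 0};
const Letter kMa {Kind::Consonant, U'\u092E', 0};
const Letter kSsa{Kind::Consonant, U'\u0937', 0};
const Letter kA  {Kind::Vowel, U'\u0905', 0};          // inherent vowel, no sign
const Letter kAa {Kind::Vowel, U'\u0906', U'\u093E'};
const Letter kIi {Kind::Vowel, U'\u0908', U'\u0940'};

std::u32string compose(const std::vector<Letter>& letters) {
    std::u32string out;
    bool bareConsonant = false;  // true while the last consonant has no vowel yet
    for (const Letter& l : letters) {
        assert(l.kind == Kind::Vowel || l.dependent == 0);  // consonants have no sign
        if (l.kind == Kind::Consonant) {
            if (bareConsonant) out.push_back(kVirama);  // remove the inherent vowel
            out.push_back(l.independent);
            bareConsonant = true;
        } else if (bareConsonant) {
            if (l.dependent != 0) out.push_back(l.dependent);  // 'a' adds nothing
            bareConsonant = false;
        } else {
            out.push_back(l.independent);  // vowel not attached to a consonant
        }
    }
    if (bareConsonant) out.push_back(kVirama);  // word ends in a bare consonant
    return out;
}
```

For instance, the letter sequence l, a, k, Sh, m, I composes to the same code points as the Devanagari output shown for lakShmI in the example transliterations, with a Virama between the consonants of the conjunct.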

To modify the mappings of phonetic strings, double-click any character and modify the settings in the dialog that opens. For example, if you wish to swap the phonetic strings for ‘t’ and ‘T’, double-click each of them, remove the existing letter and add the new one. Once the adjustment is done, use the ‘Test’ button to check whether the mapping works as expected. (Note that mappings are case sensitive, so it is possible to map ‘t’ to one character and ‘T’ to another.)

Also, you can map multiple phonetic strings to a single character (as shown in the picture here). In this example, we are editing the dependent form of the character आ. The Unicode value of this dependent character is 0x093E (as displayed in the dialog title). Its original independent form is 0x0906. You can see the dependent form displayed at the top, while its original independent form is displayed below. The top form is how the vowel will look when applied to a consonant. (The O will be replaced with the consonant shape, retaining the tail.)

The editor does not perform any checking on the mappings, so it is possible for one phonetic string to be mapped to multiple characters. Make sure you test with enough input words to ensure your mappings have no conflicts.
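Since the editor leaves conflict checking to you, a check of this kind can be sketched in C++. This is a hypothetical helper, not part of the library or the editor: Mapping and findConflicts are assumed names, and the mappings are assumed to have already been read into memory as pairs of phonetic string and code point.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One mapping as entered in the editor: phonetic string -> Unicode code point.
using Mapping = std::pair<std::string, char32_t>;

// Returns every phonetic string that is assigned to more than one character,
// together with the set of characters that claim it.
std::map<std::string, std::set<char32_t>> findConflicts(const std::vector<Mapping>& mappings) {
    std::map<std::string, std::set<char32_t>> byString;
    for (const Mapping& m : mappings) {
        assert(!m.first.empty());  // an empty phonetic string cannot be typed
        byString[m.first].insert(m.second);
    }
    // Keep only the strings claimed by two or more distinct characters.
    for (auto it = byString.begin(); it != byString.end();) {
        if (it->second.size() < 2) {
            it = byString.erase(it);
        } else {
            ++it;
        }
    }
    return byString;
}
```

Because keys are compared exactly, ‘t’ and ‘T’ count as different strings, in line with the case-sensitive mappings described above.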

When the editing is done, you can save the modified table using the File|Save menu option. The saved .phTable file can later be loaded as a custom table from the PhWordpad application or the PhTranslator application.

In the editor application, you will notice two small edit boxes, named From and To. You can use these to set the range of characters to be displayed. Enter the start and end character codes in Hex. The values are automatically rounded to the 8th and 16th byte (to align to the table correctly).


 

Bengali Phonetic Table

Vowels

a; aa, A; i; ee, ii, I; u; oo, U; zr; zl; e, E; ai; o, O; au; zR; zL; AO; M; H

Digits

z0; z1; z2; z3; z4; z5; z6; z7; z8; z9

Consonants

k, K; kh, Kh; g, G; gh, Gh; NG; ch; Ch; j, J; jh, Jh; NY; T; Th; d; dh; N; t; th; D; Dh; n; p; ph, f; b, B; bh, Bh; m; y; r, R; l, L; v, w; sh; Sh; s; h; zd; zdh; zy

Special Symbols

|; ||

 


Gujarati Phonetic Table

Vowels

a; aa, A; i; ee, ii, I; u; oo, U; zr; zl; e; E; ai; o; O; au; zR; zL; AO; M; H; OM

Digits

z0; z1; z2; z3; z4; z5; z6; z7; z8; z9

Consonants

k, K; kh, Kh; g, G; gh, Gh; NG; ch; Ch; j, J; jh, Jh; NY; T; Th; d; dh; N; t; th; D; Dh; n; p; ph, f; b, B; bh, Bh; m; y; r, R; l; L; v, w; sh; Sh; s; h

Special Symbols

zS; |; ||

 


Hindi Phonetic Table

Vowels

a; aa, A; i; ee, ii, I; u; oo, U; zr; zl; e; E; ai; o; O; au; zR; zL; AO; M; H; OM

Digits

z0; z1; z2; z3; z4; z5; z6; z7; z8; z9

Consonants

k, K; kh, Kh; g, G; gh, Gh; NG; ch; Ch; j, J; jh, Jh; NY; T; Th; d; dh; N; t; th; D; Dh; n; zN; p; ph, f; b, B; bh, Bh; m; y; r; R; l; L; zL; v, w; sh; Sh; s; h; zk; zkh; zg; zj; zd; zdh; zph; zy

Special Symbols

zS; |; ||

 


Kannada Phonetic Table

Vowels

a; aa, A; i; ee, ii, I; u; oo, U; zr; zl; e; E; ai; o; O; au; zR; zL; M; H

Digits

z0; z1; z2; z3; z4; z5; z6; z7; z8; z9

Consonants

k, K; kh, Kh; g, G; gh, Gh; NG; ch; Ch; j, J; jh, Jh; NY; T; Th; d; dh; N; t; th; D; Dh; n; p; ph; f; b, B; bh, Bh; m; y; r; R; l; L; v, w; sh; Sh; s; h

Special Symbols

zs; |; ||

 


Malayalam Phonetic Table

Vowels

a; aa, A; i; ee, ii, I; u; oo, U; zr; zl; e; E; ai; o; O; au; zR; zL; M; H

Digits

z0; z1; z2; z3; z4; z5; z6; z7; z8; z9

Consonants

k, K; kh, Kh; g, G; gh, Gh; NG; ch; Ch; j, J; jh, Jh; NY; T; Th; d; dh; N; t; th; D; Dh; n; p; ph, f; b, B; bh, Bh; m; y; r; R; l; L; zL; v, w; sh; Sh; s; h

Special Symbols

|; ||

 


Oriya Phonetic Table

Vowels

a; aa, A; i; ee, ii, I; u; oo, U; zr; zl; e, E; ai; o, O; au; zR; zL; AO; M; H

Digits

z0; z1; z2; z3; z4; z5; z6; z7; z8; z9

Consonants

k, K; kh, Kh; g, G; gh, Gh; NG; ch; Ch; j, J; jh, Jh; NY; T; Th; d; dh; N; t; th; D; Dh; n; p; ph, f; b, B; bh, Bh; m; y; r; l; L; v, w; sh; Sh; s; h; zd; zdh; zy

Special Symbols

|; ||

 


Sanskrit Phonetic Table

Vowels

a; aa, A; i; ee, ii, I; u; oo, U; zr; zl; e, E; ai; o, O; au; zR; zL; AO; M; H; OM

Digits

z0; z1; z2; z3; z4; z5; z6; z7; z8; z9

Consonants

k, K; kh, Kh; g, G; gh, Gh; NG; ch; Ch; j, J; jh, Jh; NY; T; Th; d; dh; N; t; th; D; Dh; n; zN; p; ph, f; b, B; bh, Bh; m; y; r; R; l; L; zL; v, w; sh; Sh; s; h; zx

Special Symbols

zS; |; ||

 


Tamil Phonetic Table

Vowels

a; aa, A; i; ee, ii, I; u; oo, U; e; E; ai; o; O; au; M; H

Digits

z0; z1; z2; z3; z4; z5; z6; z7; z8; z9

Consonants

k, K; NG; ch; j, J; NY; T; N; t; n; zN; p; m; y; r; R; l; L; zL; v, w; sh; Sh; s; h

Special Symbols

|; ||

 


Telugu Phonetic Table

Vowels

a; aa, A; i; ee, ii, I; u; oo, U; zr; zl; e; E; ai; o; O; au; zR; zL; AO; M; H

Digits

z0; z1; z2; z3; z4; z5; z6; z7; z8; z9

Consonants

k, K; kh, Kh; g, G; gh, Gh; NG; ch; Ch; j, J; jh, Jh; NY; T; Th; d; dh; N; t; th; D; Dh; n; p; ph, f; b, B; bh, Bh; m; y; r; R; l; L; v, w; sh; Sh; s; h

Special Symbols

|; ||

 


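Example Transliterations:

Each input word below is followed by its transliterations, listed by output script in this order: Bengali, Gujarati, Devanagari (Hindi), Kannada, Malayalam, Oriya, Gurmukhi, Devanagari (Sanskrit), Tamil, Telugu. Not every word is shown in every script.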


shaarNgapaaNi

·        শার্ণ্গপাণি

·        શાર્ણ્ગપાણિ

·        शार्ण्गपाणि

·        ಶಾರ್ಣ್ಗಪಾಣಿ

·        ശാര്ണ്ഗപാണി

·        ଶାର୍ଣ୍ଗପାଣି

·        ਸ਼ਾਰ੍ਣ੍ਗਪਾਣਿ

·        शार्ण्गपाणि

·        శార్ణ్గపాణి

vijNYaanaM

·        বিজ্ঞানং

·        વિજ્ઞાનં

·        विज्ञानं

·        ವಿಜ್ಞಾನಂ

·        വിജ്ഞാനം

·        ିଜ୍ଞାନଂ

·        ਵਿਜ੍ਞਾਨਂ

·        विज्ञानं

·        விஜ்ஞாநஂ

·        విజ్ఞానం

taDEvaahamasmiH ||

·        তদেবাহমস্মিঃ ॥

·        તદેવાહમસ્મિઃ ॥

·        तदेवाहमस्मिः ॥

·        ತದೇವಾಹಮಸ್ಮಿಃ ॥

·        തദേവാഹമസ്മിഃ ॥

·        ତଦୋହମସ୍ମିଃ ॥

·        ਤਦੇਵਾਹਮਸ੍ਮਿਃ ॥

·        तदेवाहमस्मिः ॥

·        తదేవాహమస్మిః ॥

tamizL

·        தமிழ்

lakShmI

·        লক্ষ্মী

·        લક્ષ્મી

·        लक्ष्मी

·        ಲಕ್ಷ್ಮೀ

·        ലക്ഷ്മീ

·        ଲକ୍ଷ୍ମୀ

·        ਲਕ੍ਸ਼੍ਮੀ

·        लक्ष्मी

·        லக்ஷ்மீ

·        లక్ష్మీ

 

kaLyaaNaM

·        কল্যাণং

·        કળ્યાણં

·        कळ्याणं

·        ಕಳ್ಯಾಣಂ

·        കള്യാണം

·        କଳ୍ଯାଣଂ

·        ਕਲ਼੍ਯਾਣਂ

·        कळ्याणं

·        கள்யாணஂ

·        కళ్యాణం

 

Download

Phtranslator is available at the downloads section of http://sourceforge.net/projects/phtranslator/

Copyright

This is a product of CineFx Research Labs, distributed under LGPL in the hope that it will be useful. No warranty whatsoever is implied, including MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Refer to the LGPL for more details.

Author: Gopalakrishna Palem

Project: http://phtranslator.sourceforge.net/

Copyright (C) 2009 CineFx Digital Media Pvt Ltd.

