Table of Contents

ParallelWeb  

WWW-wide translation of HTML pages

The most popular application of this project is Guhgel - a German->Saxon translator applied to the search engine Google.

The same technique was independently developed for German->Swabian translation in UNiMUT's Schwobifying Proxy.

When installed as a CGI script, "parallelweb" translates an HTML document and rewrites all of its links, so that following a link translates the referenced document as well.

Structure

To get this nice effect, three problems have to be solved (and all of them are solved to a certain degree):

  • Parse HTML and separate markup from text

  • On markup: Rewrite hyperlinks so that referenced pages are translated automatically.

  • On text: Translate it.
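The link-rewriting step can be illustrated with a minimal sketch. It is not taken from RelinkHTML.py: the CGI address, the query parameter name "url" and the function name relink are assumptions made only for this example. Each link is first made absolute against the address of the page being translated, then wrapped so that it points back at the translating script.

```python
from urllib.parse import urljoin, urlencode, urlsplit

# Hypothetical address of the installed CGI script (assumption for illustration).
CGI_URL = "http://example.org/cgi-bin/parallelweb"


def relink(base_url, href, cgi_url=CGI_URL):
    """Return a link that fetches `href` through the translating script."""
    # Resolve relative links against the URL of the page being translated.
    absolute = urljoin(base_url, href)
    # Leave non-web links such as mailto: untouched.
    if urlsplit(absolute).scheme not in ("http", "https"):
        return href
    # Pass the absolute target as a query parameter; "url" is an assumed name.
    return cgi_url + "?" + urlencode({"url": absolute})


print(relink("http://www.example.com/docs/index.html", "../about.html"))
```

Making the link absolute before wrapping it matters, because a relative link would otherwise be resolved against the address of the CGI script instead of the original page.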

Although the last point was the original motivation, the first two consumed most of the development time. The reasons are the crappy design of HTML, including its extensions, and the mistakes made by 99% of HTML designers, who either use ugly WYSIWYG HTML editors or at least avoid any HTML validation.

The heart of the project is the module RelinkHTML.py, which takes care of the ugly technical part so that many modules (see the example directory) can build on it.
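How such a division of labour might look is sketched below. This is not the actual RelinkHTML interface: the class name RelinkSketch, the helper relink_html and the two callbacks are hypothetical, and the standard-library html.parser stands in for the sgmllib mentioned in the ToDo list, which no longer exists in Python 3. The parser copies markup through, hands every link target to a relink callback and every piece of text to a translate callback.

```python
from html import escape
from html.parser import HTMLParser


class RelinkSketch(HTMLParser):
    """Copy HTML through, rewriting <a href> targets and translating text."""

    def __init__(self, relink, translate):
        # convert_charrefs=False reports entities separately from plain text.
        super().__init__(convert_charrefs=False)
        self.relink = relink        # callback: href -> rewritten href
        self.translate = translate  # callback: text -> translated text
        self.out = []
        self.raw_tag = None         # set while inside <script> or <style>

    def _tag(self, tag, attrs, end=">"):
        parts = [tag]
        for name, value in attrs:
            if value is None:       # attribute without a value, e.g. "disabled"
                parts.append(name)
                continue
            if tag == "a" and name == "href":
                value = self.relink(value)
            # The parser unescapes attribute values, so escape them again.
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + end)

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.raw_tag = tag
        self._tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._tag(tag, attrs, end=" />")

    def handle_endtag(self, tag):
        if tag == self.raw_tag:
            self.raw_tag = None
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Script and style content is code, not text, so it is not translated.
        self.out.append(data if self.raw_tag else self.translate(data))

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def relink_html(source, relink, translate):
    """Run the sketch parser over `source` and return the rewritten HTML."""
    parser = RelinkSketch(relink, translate)
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


# Toy callbacks: prefix every link and "translate" by upper-casing the text.
page = '<p>Hallo <a href="seite.html">Welt</a> &amp; mehr</p>'
print(relink_html(page, lambda href: "/translate?url=" + href, str.upper))
```

Passing the two callbacks in from outside mirrors the idea that the parsing part is written once while each example module only supplies its own text transformation. The sketch still rewrites only a href; frames, forms, images and broken markup would need extra handling.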

Usage

Look into the "Makefile"! :-]

ToDo

  • RelinkHTML seems to be a never-ending story; it still does not work with many HTML pages.

  • sgmllib.py contains some bugs as far as I can see, and htmlentitydefs.py is inconsistent with respect to numeric character entities.

  • Adaptation of the modules to work with the Apache module mod_python, which is much more efficient than starting the Python interpreter again and again.

  • Is there someone who is able and willing to install these scripts for general usage on his machine?

  • Can someone tell me why happydoc recreates the whole home/user/parallelweb/doc path inside my doc directory? I can't make the links work in general this way!

  • Today I feel that a statically typed language like Haskell would have been the better choice for this project, although Python is much better suited than Perl or PHP. In Python the data structures are built dynamically, which makes it easy to lose track of them and a horror to restructure anything.

Modules and Packages   

example/

ChineseDoubleBass

Replaces vowels according to the scheme of the song about the three Chinese with a double bass.

Cute

Makes German text sound a little more cute.

Echo

Applies an echo effect to German HTML files :-)

Ehmulator

Adds some tongues to German HTML files, so the text sounds as if spoken by a famous Bavarian politician :-)

Kapostroph

Replaces vowels according to the scheme of the song about the three Chinese with a double bass.

Loschka

Applies a so-called spoon effect to German HTML files :-)

Saxophone

Translates German HTML files into the Saxon dialect :-)

Wisdom

Adds some wise words to German HTML files :-)

modules/

ExtDict

Extends UserDict with the methods transpose() and keyregexp().

FormattedOutput

Output of special markup.

GermanGrammar

Helps with the processing of German words.

RelinkHTML

Parses HTML pages, makes links absolute and invokes a translation on the text.



This document was automatically generated on Mon Oct 6 18:11:15 2003 by HappyDoc version 2.1