WWW-wide translation of HTML pages
The most popular application of this project is
Guhgel
- a German->Saxon translator applied to the search engine Google.
The same technique was independently developed
for the German->Swabian translation for
UNiMUT's Schwobifying Proxy
"parallelweb" when installed as CGI script
translates an HTML document
and rewrites all of its links,
so that following a link
also translates the referenced document.
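The link rewriting described above can be sketched roughly as follows. This is a minimal illustration in Python 3, not the project's actual code: the proxy address `PROXY`, the query parameter name `url` and the function name `relink` are assumptions made up for this example.

```python
from urllib.parse import urljoin, quote

# Hypothetical address of the installed CGI script (an assumption).
PROXY = "http://example.org/cgi-bin/parallelweb.cgi"


def relink(href, base_url, proxy=PROXY):
    """Return a link that fetches `href` through the translating proxy."""
    # Resolve relative links against the address of the current page.
    absolute = urljoin(base_url, href)
    # Pass the absolute address to the proxy as a fully quoted parameter.
    return proxy + "?url=" + quote(absolute, safe="")


print(relink("news/today.html", "http://www.example.com/index.html"))
```

Because every link in the translated page points back to the script, each followed link is fetched, translated and relinked in turn.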
Structure
To get this nice effect, three problems have to be solved
(and they are all solved to a certain degree):
Although the last point was the original motivation,
the first two points consumed most of the development time
because of the crappy design of HTML, including its extensions,
and because of the mistakes made by 99% of HTML designers,
who either use ugly WYSIWYG HTML editors or at least
avoid any HTML validation.
The heart of the project is the module RelinkHTML.py
which provides the ugly technical part
whereas many modules (see the example directory)
can benefit from it.
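To give an idea of what such a module has to do, here is a minimal sketch in Python 3 that parses a page, makes its links absolute and passes the text through a translation function. It uses the standard `html.parser` module rather than `sgmllib`, and the names `Relinker` and `relink_html` are hypothetical; it is not the interface of RelinkHTML.py. Comments, doctype declarations and processing instructions are simply dropped in this sketch.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin


class Relinker(HTMLParser):
    """Re-emit HTML with absolutized links and translated text."""

    RAW_TAGS = ("script", "style")   # content that must not be translated

    def __init__(self, base_url, translate):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.translate = translate
        self.in_raw = False
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.RAW_TAGS:
            self.in_raw = True
        parts = [tag]
        for name, value in attrs:
            if value is None:                 # attribute without a value
                parts.append(name)
                continue
            if name in ("href", "src"):
                value = urljoin(self.base_url, value)
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + ">")

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)      # emit <br/> as <br>
        self.in_raw = False                   # a self-closed tag has no content

    def handle_endtag(self, tag):
        if tag in self.RAW_TAGS:
            self.in_raw = False
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.in_raw:
            self.out.append(data)             # pass scripts and styles through
        else:
            self.out.append(escape(self.translate(data), quote=False))


def relink_html(html, base_url, translate):
    """Parse `html`, absolutize its links and translate its text."""
    parser = Relinker(base_url, translate)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(relink_html('<p>Hallo <a href="b.html">Welt</a></p>',
                  "http://www.example.com/a/index.html", str.upper))
```

Here `str.upper` stands in for a real translator; any function that maps a string to a string could be plugged in the same way.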
Usage
Look into the "Makefile" ! :-]
ToDo
RelinkHTML seems to be a never-ending story;
it still does not work with many HTML pages.
sgmllib.py contains some bugs, as far as I can see,
and htmlentitydefs.py is inconsistent with respect
to numeric character entities.
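For comparison, in Python 3 `sgmllib` no longer exists and `htmlentitydefs` has become `html.entities`. The following small sketch, which is not part of this project, shows how the Python 3 standard library resolves named, decimal and hexadecimal references to the same character.

```python
from html import unescape
from html.entities import name2codepoint

named = unescape("&auml;")         # named entity
decimal = unescape("&#228;")       # decimal numeric reference
hexadecimal = unescape("&#xE4;")   # hexadecimal numeric reference

# All three spellings denote the same character U+00E4.
print(named == decimal == hexadecimal == "\u00e4")
# The name table maps the entity name to the same code point.
print(name2codepoint["auml"])
```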
Adapt the modules to work with the Apache module mod_python,
which is much more efficient
than starting the Python interpreter again and again.
Is there someone who is able and willing to install
these scripts for general usage on his machine?
Can someone tell me why happydoc recreates the whole home/user/parallelweb/doc path
inside my doc directory?
This way I can't make the links work in general!
Today I feel that a statically typed language like Haskell
would have been the better choice for this project,
although Python is much better suited than Perl or PHP.
In Python the data structures are built dynamically,
which makes it easy to lose track of the structures
and a horror to restructure anything.
Modules and Packages
example/
ChineseDoubleBass | Replaces vowels following the scheme of the song about the three Chinese with a double bass.
Cute | Makes German text sound a little cuter.
Echo | Applies an echo effect to German HTML files :-)
Ehmulator | Adds some tongues to German HTML files, so that the text sounds as if spoken by a famous Bavarian politician :-)
Kapostroph | Replaces vowels following the scheme of the song about the three Chinese with a double bass.
Loschka | Applies a so-called spoon effect to German HTML files :-)
Saxophone | Translates German HTML files into the Saxon dialect :-)
Wisdom | Adds some wise words to German HTML files :-)
modules/
ExtDict | Extends UserDict with the methods transpose() and keyregexp().
FormattedOutput | Output of special markup.
GermanGrammar | Helps with processing German words.
RelinkHTML | Parses HTML pages, absolutizes links and invokes a translation on the text.
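As an illustration of how a dictionary with transpose() and keyregexp() could serve a translator, here is a small Python 3 sketch. The behaviour of both methods is an assumption derived only from their names (swap keys and values; build a regular expression matching any key), and the replacement rules are invented for the example, so this need not match the real ExtDict module.

```python
import re
from collections import UserDict


class ExtDict(UserDict):
    """Hypothetical reconstruction; the real module may behave differently."""

    def transpose(self):
        """Return a new ExtDict with keys and values swapped."""
        return ExtDict({value: key for key, value in self.data.items()})

    def keyregexp(self):
        """Return a compiled pattern that matches any key, longest first."""
        keys = sorted(self.data, key=len, reverse=True)
        return re.compile("|".join(re.escape(key) for key in keys))


# Invented replacement rules, only for demonstration.
table = ExtDict({"ei": "ee", "au": "oo"})
pattern = table.keyregexp()

# Replace every occurrence of a key by its value.
print(pattern.sub(lambda match: table[match.group(0)], "Haus und Stein"))
```

Sorting the keys by length keeps a longer key from being shadowed by a shorter one that is its prefix.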