I have modified the soffice2html Perl script by Steve Slaven (http://hoopajoo.net) so that it outputs plain text only.
The script is meant to be used by phpDig (see below) to index OpenOffice.org files. It is distributed under the GPL. You can download the tarball of the script here:
soffice2txt-0.1.tgz
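For readers who want to see what "getting only text" involves: an OpenOffice.org 1.x file is a zip archive whose document body is stored in a member named content.xml. The sketch below is not the soffice2txt.pl code; it is an illustrative Python version of the same idea, and the function names sxw_to_text and local_name are made up for this example. It keeps only the text of paragraph and heading elements, so text inside frames nested in a paragraph may appear twice.

```python
import zipfile
from xml.etree import ElementTree


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def sxw_to_text(source):
    """Return the text of an OpenOffice.org document, one paragraph per line.

    `source` is a file path or a binary file-like object (hypothetical helper,
    not part of soffice2txt).
    """
    with zipfile.ZipFile(source) as archive:
        root = ElementTree.fromstring(archive.read("content.xml"))

    lines = []
    for element in root.iter():
        # text:p is a paragraph, text:h is a heading.
        if local_name(element.tag) in ("p", "h"):
            text = "".join(element.itertext()).strip()
            if text:
                lines.append(text)
    return "\n".join(lines)
```

Calling sxw_to_text("report.sxw") (a hypothetical file name) would return the document text as a single string, which is the kind of output an indexer needs.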
To index OpenOffice.org files, version 1.8.6 of phpDig must be modified. The files "admin/robot_functions.php" and "include/config.php" must be adapted to support the new MIME types and to declare the text conversion tool (soffice2txt.pl, although another tool could be used).
You can find the patch file here:
phpdig-1.8.6_openoffice.diff
To apply the patch, try the following commands:
unzip phpdig-1.8.6.zip
patch -p1 < phpdig-1.8.6_openoffice.diff
Note: Your HTTP server must know the MIME types of the OpenOffice.org applications. If they are not already declared, add the following MIME types to the MIME types file in the configuration directory of your web server (on an Apache server, this file is called "mime.types"):
application/vnd.sun.xml.writer sxw
application/vnd.sun.xml.calc sxc
application/vnd.sun.xml.draw sxd
application/vnd.sun.xml.impress sxi
application/vnd.sun.xml.math sxm
application/vnd.sun.xml.writer.template stw
application/vnd.sun.xml.calc.template stc
application/vnd.sun.xml.draw.template std
application/vnd.sun.xml.impress.template sti
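To check whether a server already declares these types, the list above can be compared against the contents of the mime.types file. The following is a minimal sketch, assuming the usual Apache layout of one MIME type per line followed by its extensions, with "#" starting a comment. The names REQUIRED and missing_mime_types are hypothetical, and the file path in the usage note is only an example.

```python
# The nine entries listed above, as MIME type -> file extension.
REQUIRED = {
    "application/vnd.sun.xml.writer": "sxw",
    "application/vnd.sun.xml.calc": "sxc",
    "application/vnd.sun.xml.draw": "sxd",
    "application/vnd.sun.xml.impress": "sxi",
    "application/vnd.sun.xml.math": "sxm",
    "application/vnd.sun.xml.writer.template": "stw",
    "application/vnd.sun.xml.calc.template": "stc",
    "application/vnd.sun.xml.draw.template": "std",
    "application/vnd.sun.xml.impress.template": "sti",
}


def missing_mime_types(mime_types_text):
    """Return the 'type extension' lines that still need to be added."""
    declared = {}
    for line in mime_types_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        mime_type, *extensions = line.split()
        declared.setdefault(mime_type, set()).update(extensions)

    return [
        f"{mime_type} {extension}"
        for mime_type, extension in REQUIRED.items()
        if extension not in declared.get(mime_type, set())
    ]
```

For example, passing the text of the server's mime.types file (its location depends on the installation, for instance somewhere under the Apache configuration directory) returns the lines to append, or an empty list if nothing is missing.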
If you improve the script or run into any problems, feel free to contact me.