Software

soffice2txt
I have modified the soffice2html Perl script by Steve Slaven (http://hoopajoo.net) so that it outputs plain text only.
The script is meant to be used by phpDig (see below) to index OpenOffice.org files. It is distributed under the GPL. You can download the tarball of the script here: soffice2txt-0.1.tgz
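To give an idea of what such a conversion involves, here is a minimal Python sketch. It is not the soffice2txt Perl script and does not claim to reproduce its behaviour. It only relies on the fact that an OpenOffice.org 1.x file is a ZIP archive whose body is stored in "content.xml". The function name soffice_to_text is hypothetical, and nested paragraphs (for example inside text boxes) may appear twice in the output.

```python
import re
import zipfile
import xml.etree.ElementTree as ET


def soffice_to_text(path):
    """Return the text of an OpenOffice.org 1.x file, one paragraph per line.

    Illustrative sketch only: the name and the output layout are assumptions,
    not taken from soffice2txt.pl.
    """
    # The document is a ZIP archive; the body lives in content.xml.
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    lines = []
    for element in root.iter():
        # Tags look like "{namespace}p"; keep paragraphs (p) and headings (h).
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in ("p", "h"):
            text = "".join(element.itertext())
            # Collapse runs of whitespace so each paragraph fits on one line.
            text = re.sub(r"\s+", " ", text).strip()
            if text:
                lines.append(text)
    return "\n".join(lines)
```

Calling soffice_to_text("report.sxw") (a hypothetical file name) would return the document text as a single string, ready to be handed to an indexer.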
phpDig
To index OpenOffice.org files, version 1.8.6 of phpDig must be modified. The files "admin/robot_functions.php" and "include/config.php" must be adapted to support the new MIME types and to declare the text conversion tool (soffice2txt.pl, although another tool could be used instead).
You can find the patch file here: phpdig-1.8.6_openoffice.diff
To apply the patch, try the following commands:

unzip phpdig-1.8.6.zip
patch -p1 < phpdig-1.8.6_openoffice.diff
Note: Your HTTP server must know the MIME types of the OpenOffice.org applications. If they are not already declared, add the following MIME types to the MIME types file in the configuration directory of your web server (on Apache, this file is called "mime.types").

application/vnd.sun.xml.writer    sxw
application/vnd.sun.xml.calc   sxc
application/vnd.sun.xml.draw   sxd
application/vnd.sun.xml.impress   sxi
application/vnd.sun.xml.math   sxm
application/vnd.sun.xml.writer.template   stw
application/vnd.sun.xml.calc.template   stc
application/vnd.sun.xml.draw.template   std
application/vnd.sun.xml.impress.template   sti
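
To check which of these entries are still missing from an existing "mime.types" file, something like the following Python sketch could be used. It assumes the usual Apache layout (one MIME type per line, followed by its extensions, with "#" starting a comment). The function name missing_mime_entries is hypothetical.

```python
# Extension -> MIME type, copied from the list above.
REQUIRED = {
    "sxw": "application/vnd.sun.xml.writer",
    "sxc": "application/vnd.sun.xml.calc",
    "sxd": "application/vnd.sun.xml.draw",
    "sxi": "application/vnd.sun.xml.impress",
    "sxm": "application/vnd.sun.xml.math",
    "stw": "application/vnd.sun.xml.writer.template",
    "stc": "application/vnd.sun.xml.calc.template",
    "std": "application/vnd.sun.xml.draw.template",
    "sti": "application/vnd.sun.xml.impress.template",
}


def missing_mime_entries(mime_types_text, required=REQUIRED):
    """Return the lines that still have to be added to a mime.types file."""
    declared = {}
    for line in mime_types_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        mime_type, *extensions = line.split()
        for extension in extensions:
            declared[extension.lower()] = mime_type
    # Keep only the entries that are absent or mapped to another type.
    return [
        f"{mime_type}    {extension}"
        for extension, mime_type in required.items()
        if declared.get(extension) != mime_type
    ]
```

Passing the contents of the file (for example open("mime.types").read(), with the path adjusted to your server) returns the lines to append. An empty list means every type is already declared.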
If you improve the script or run into any problems, feel free to contact me.
Mail: bonnet.jeanphilippe@free.fr