2007年1月3日 星期三

image spam

FuzzyOCR is a SpamAssassin plugin built specifically to deal with the growing volume of image spam. Below is the installation guide for RHEL/CentOS and Fedora (source: MailScanner wiki):

Prerequisites

a. You’ll need the netpbm and libungif RPMs from the Red Hat / CentOS YUM repository.
b. You’ll also need the GOCR rpm (build it yourself or use the Dag Wieers repository).
c. Finally you’ll need the String::Approx perl module.
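
As a rough way to see which of these prerequisites are still missing, the sketch below reports any named command that is not on the PATH. The helper name `missing_cmds` is hypothetical, and the tool names in the example call (`giftopnm` from netpbm, `giffix` from libungif-progs) are assumptions about what those packages ship on your system.

```shell
# Print each named command that cannot be found on the PATH.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Example call (tool names are assumptions, adjust as needed):
# missing_cmds gocr giftopnm giffix
```

An empty result means every listed tool was found.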

Install the natively available RPMS

YUM does a fantastic job here.
# yum install netpbm netpbm-devel netpbm-progs gtk+-devel libungif libungif-devel libungif-progs

Next we build and install the gocr RPM from its source tarball.
# mkdir /root/FuzzyOcrPlugin
# cd /root/FuzzyOcrPlugin
# wget http://jaist.dl.sourceforge.net/sourceforge/jocr/gocr-[latest-version].tar.gz
# tar xzf gocr-[latest-version].tar.gz
# cd gocr-[latest-version]
# perl -e "s/^%configure --with-netpbm=no/%configure/g;" -pi gocr.spec
# cd ..
# tar czf gocr-[latest-version].custom.tar.gz gocr-[latest-version]
# rm -fr gocr-[latest-version]
# rpmbuild -ta gocr-[latest-version].custom.tar.gz
# cd /usr/src/redhat/RPMS/i386/
# rpm -ivh gocr-[latest-version]-1.i386.rpm gocr-devel-[latest-version]-1.i386.rpm
# cd -

Now we come to the String::Approx installation. Don’t trouble yourself; simply use CPAN for this.
# perl -MCPAN -eshell
cpan> install String::Approx

Finally we download and install the FuzzyOCR library and cf file.

The latest release is always available as fuzzyocr-latest.tar.gz (though this could change without notice). The main download page is at http://fuzzyocr.own-hero.net/wiki/Downloads
# wget http://users.own-hero.net/~decoder/fuzzyocr/fuzzyocr-latest.tar.gz
# tar xzf fuzzyocr-latest.tar.gz
# cd Fuzzy*
# mv FuzzyOcr.cf FuzzyOcr.pm /etc/mail/spamassassin
# mv FuzzyOcr.words.sample /etc/mail/spamassassin/FuzzyOcr.words
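
Before testing, it may help to confirm that the three files ended up where the commands above put them. The sketch below lists any that are missing; the helper name `missing_fuzzyocr_files` is hypothetical, and the default directory is simply the one used in this guide.

```shell
# Print every expected FuzzyOCR file that is missing from the given directory.
missing_fuzzyocr_files() {
    dir=${1:-/etc/mail/spamassassin}
    for f in FuzzyOcr.cf FuzzyOcr.pm FuzzyOcr.words; do
        [ -f "$dir/$f" ] || printf '%s\n' "$dir/$f"
    done
}

# Example call:
# missing_fuzzyocr_files /etc/mail/spamassassin
```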

Test out the setup

And finally the test.
# spamassassin -x -D --lint

Check for lines like:
[4466] dbg: plugin: fixed relative path: /etc/mail/spamassassin/FuzzyOcr.pm
[4466] dbg: plugin: loading FuzzyOcr from /etc/mail/spamassassin/FuzzyOcr.pm
[4466] dbg: plugin: registered FuzzyOcr=HASH(0x9467294)
[4466] dbg: plugin: FuzzyOcr=HASH(0x9467294) implements 'parse_c
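
The debug run prints a lot of output, so a filter along these lines can narrow it down to the plugin lines shown above. This is a sketch only; the helper name `fuzzyocr_debug_lines` is hypothetical, and the pattern is just a guess based on the sample lines in this post.

```shell
# Keep only the plugin debug lines that mention FuzzyOcr (reads standard input).
fuzzyocr_debug_lines() {
    grep 'dbg: plugin:.*FuzzyOcr'
}

# Example call; debug output goes to standard error, hence the 2>&1:
# spamassassin -x -D --lint 2>&1 | fuzzyocr_debug_lines
```

If nothing is printed, the plugin was probably not loaded.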

Trial run:

The image above is the image spam from one of the junk emails I received. With the FuzzyOCR plugin added, SpamAssassin's score report gained:
* 5.0 FUZZY_OCR BODY: Mail contains an image with common spam text inside
* Words found:
* "company" in 1 lines
* "trade" in 1 lines
* "target" in 1 lines
* (3 word occurrences found)
Pretty impressive. Even with my own eyes I couldn't spot those three words right away!

More information:

Google Search
FuzzyOCR website
