FuzzyOCR is a SpamAssassin plugin built specifically to deal with the growing volume of image spam. Below is an installation guide for RHEL/CentOS and Fedora (source: MailScanner wiki).
Prerequisites
a. You’ll need the netpbm and libungif RPMs from the Red Hat / CentOS YUM repository.
b. You’ll also need the GOCR RPM (build it yourself or use the Dag Wieers repository).
c. Finally you’ll need the String::Approx Perl module.
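Before starting, it can help to see which of the required tools are already on the system. The following is a minimal sketch and not part of the original guide: `missing_commands` is a made-up helper name, and the command names in the usage comment (`gocr`, `giftopnm`, `giffix`) are an assumption about which binaries the packages above provide.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
# `missing_commands` is a hypothetical helper name, not part of FuzzyOcr.
missing_commands() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}

# Example (tool names are assumptions about what the RPMs install):
# missing_commands gocr giftopnm giffix
```

An empty result would suggest the listed tools are already installed; any name printed still needs its package.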
Install the natively available RPMs
YUM does a fantastic job here.
# yum install netpbm netpbm-devel netpbm-progs gtk+-devel libungif libungif-devel libungif-progs
Next we install the gocr RPM.
# mkdir /root/FuzzyOcrPlugin
# cd /root/FuzzyOcrPlugin
# wget http://jaist.dl.sourceforge.net/sourceforge/jocr/gocr-[latest-version].tar.gz
# tar xzf gocr-[latest-version].tar.gz
# cd gocr-[latest-version]
# perl -e "s/^%configure --with-netpbm=no/%configure/g;" -pi gocr.spec
# cd ..
# tar czf gocr-[latest-version].custom.tar.gz gocr-[latest-version]
# rm -fr gocr-[latest-version]
# rpmbuild -ta gocr-[latest-version].custom.tar.gz
# cd /usr/src/redhat/RPMS/i386/
# rpm -ivh gocr-[latest-version]-1.i386.rpm gocr-devel-[latest-version]-1.i386.rpm
# cd -
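The perl one-liner in the steps above is meant to drop the `--with-netpbm=no` option from the `%configure` line in `gocr.spec`, so that gocr is built with netpbm support. A quick way to confirm the edit took effect before repacking the tarball might look like the sketch below. `spec_is_patched` is a hypothetical helper name, and the check assumes the spec file uses exactly the option string shown in this guide.

```shell
#!/bin/sh
# Return 0 if the spec file has no %configure line that disables netpbm,
# 1 if such a line is still present, 2 if the file does not exist.
# `spec_is_patched` is a hypothetical helper name.
spec_is_patched() {
    [ -f "$1" ] || return 2
    if grep -q '^%configure --with-netpbm=no' "$1"; then
        return 1
    fi
    return 0
}

# Example (run inside the unpacked gocr source directory):
# spec_is_patched gocr.spec && echo "gocr.spec patched"
```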
Now we come to the String::Approx installation. Don’t trouble yourself, simply use CPAN for this.
# perl -MCPAN -e shell
cpan> install String::Approx
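Once CPAN finishes, it may be worth confirming that Perl can actually load the module before moving on. A minimal sketch, assuming `perl` is on the PATH; `perl_module_installed` is a made-up helper name.

```shell
#!/bin/sh
# Return 0 if the named Perl module can be loaded, non-zero otherwise.
# `perl_module_installed` is a hypothetical helper name.
perl_module_installed() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Example:
# perl_module_installed String::Approx && echo "String::Approx is available"
```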
Finally we download and install the FuzzyOCR library and cf file.
The latest release is always available as fuzzyocr-latest.tar.gz (though this could change without notice). The main download page is at http://fuzzyocr.own-hero.net/wiki/Downloads
# wget http://users.own-hero.net/~decoder/fuzzyocr/fuzzyocr-latest.tar.gz
# tar xzf fuzzyocr-latest.tar.gz
# cd Fuzzy*
# mv FuzzyOcr.cf FuzzyOcr.pm /etc/mail/spamassassin
# mv FuzzyOcr.words.sample /etc/mail/spamassassin/FuzzyOcr.words
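After moving the files, a quick check that all three ended up in the SpamAssassin configuration directory can save a confusing lint run. This is only a sketch: `check_fuzzyocr_files` is a hypothetical helper, and the directory is passed as an argument so the function is not tied to `/etc/mail/spamassassin`.

```shell
#!/bin/sh
# Report any of the three FuzzyOcr files missing from the given directory.
# Returns 0 if all are present, 1 otherwise.
# `check_fuzzyocr_files` is a hypothetical helper name.
check_fuzzyocr_files() {
    dir=$1
    status=0
    for f in FuzzyOcr.cf FuzzyOcr.pm FuzzyOcr.words; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            status=1
        fi
    done
    return $status
}

# Example:
# check_fuzzyocr_files /etc/mail/spamassassin
```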
Test out the setup
And finally the test.
# spamassassin -x -D --lint
Check for lines like:
[4466] dbg: plugin: fixed relative path: /etc/mail/spamassassin/FuzzyOcr.pm
[4466] dbg: plugin: loading FuzzyOcr from /etc/mail/spamassassin/FuzzyOcr.pm
[4466] dbg: plugin: registered FuzzyOcr=HASH(0x9467294)
[4466] dbg: plugin: FuzzyOcr=HASH(0x9467294) implements 'parse_c
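The debug output of `--lint` is long, so filtering it for the plugin name makes the relevant lines easier to find. A minimal sketch, assuming the debug lines are written to stderr (hence the `2>&1` in the usage comment) and that the plugin lines contain the string `FuzzyOcr` as in the sample above; `fuzzyocr_lines` is a made-up helper name.

```shell
#!/bin/sh
# Read debug output on stdin and keep only plugin lines mentioning FuzzyOcr.
# `fuzzyocr_lines` is a hypothetical helper name.
fuzzyocr_lines() {
    grep 'dbg: plugin: .*FuzzyOcr'
}

# Example:
# spamassassin -x -D --lint 2>&1 | fuzzyocr_lines
```

If the filter prints nothing, the plugin was probably not loaded at all.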
Trial run:
The image above is the image spam from one of the spam emails I received. With the FuzzyOcr plugin added, SpamAssassin adds the following to the score:
* 5.0 FUZZY_OCR BODY: Mail contains an image with common spam text inside
* Words found:
* "company" in 1 lines
* "trade" in 1 lines
* "target" in 1 lines
* (3 word occurrences found)
Pretty impressive: I couldn't spot these three words at a glance with my own eyes!
More information:
Google Search
FuzzyOCR website
Wednesday, January 3, 2007
image spam
Posted by 企鵝佬 at 8:33 PM
Labels: FuzzyOCR, spamassassin