Releasing the Full Paper and the Data

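Yesterday my professor asked me to send over the training data and the paper file. While I was putting them together, I decided to release everything here so that others running similar experiments can refer to it too.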

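There are conditions you need to agree to, though: I take no responsibility for the accuracy of the spam/ham labels in this data, and the data must be used for academic purposes only.
Please download and use it only if you agree to the above.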

본문과 덧글의 동시출현 자질을 이용한 역 카이제곱 기반 블로그 덧글 스팸 필터 시스템

A Comment Spam Filter System based on Inverse Chi-Square Using of Co-occurrence Feature between Comment and Blog Post

Abstract

Blogs are the best medium for personal use and, moreover, can be used for corporate communication. Alongside this freedom to write, however, comes abuse in the form of blog comment spam.

A common spam filter uses only features taken from the comment itself. It is hard to achieve high accuracy this way, because spam comments are shorter than ham comments, which leaves the filtering algorithm short of features.

This paper proposes a similarity assumption between the main post and its comments, and a spam filter algorithm that adds a co-occurrence information feature to the existing term probability feature. After adding this feature, we did in fact obtain higher accuracy than a common filter that uses only the term probability feature.
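
For readers who want a feel for the idea before opening the paper, here is a rough sketch. It is not the code from the paper. It assumes the usual Fisher-style inverse chi-square combining of per-term spam probabilities, and the co-occurrence feature (cooccurrence_prob) is a hypothetical stand-in of my own: the share of comment words that also appear in the post, turned into one extra probability. The function names and the 0.01/0.99 clamp are likewise assumptions for illustration.

```python
import math


def chi2q(x2, df):
    """Upper-tail probability of a chi-square variable with even df."""
    assert df > 0 and df % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_score(probs):
    """Combine spam probabilities (each strictly between 0 and 1).

    Returns a value in [0, 1]; values near 1 look like spam.
    """
    n = len(probs)
    if n == 0:
        return 0.5
    spam = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    ham = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (spam - ham + 1.0) / 2.0


def cooccurrence_prob(comment_tokens, post_tokens):
    """Hypothetical feature: low word overlap with the post reads as spammy."""
    comment = set(comment_tokens)
    if not comment:
        return 0.5
    overlap = len(comment & set(post_tokens)) / len(comment)
    # Clamp so the logarithms in fisher_score stay finite.
    return min(0.99, max(0.01, 1.0 - overlap))


def score_comment(term_probs, comment_tokens, post_tokens):
    """Term probabilities plus the assumed co-occurrence feature."""
    extra = cooccurrence_prob(comment_tokens, post_tokens)
    return fisher_score(list(term_probs) + [extra])
```

With the same term probabilities, a comment that shares no words with the post scores higher (more spam-like) than one that reuses the post's vocabulary, which is the intuition behind the similarity assumption. How the paper actually weights the feature is described in the full text.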

JeonHeeWon_full_paper

trainingset
