Delta Engine Blog

AI, Robotics, multiplatform game development and Strict programming language

BlogEngine.net Spam Comment Remover

This blog has almost 500 posts. I moved from DasBlog to BlogEngine.net in 2009. While the number of spam comments was still manageable in 2009, it has now reached an absurd 41,000. Keep in mind that this is despite constantly removing spam comments from recent posts and adding several countermeasures (captcha, comment approval). In 2009 I wrote a little console program that removed most spam comments while migrating from DasBlog to BlogEngine.net, but it was not very good and simply deleted most of the comments.

This time, instead of removing all comments, I decided to write a little tool that automatically removes only the spam comments. To be honest, not many comments were written on the exdream.com posts from 2009 to 2012, but they are still useful. I started with a small console app, written in a few hours while trying out some new lambda tricks and doing TDD (tests first, then production code). Today I added a GUI, refactored a lot and made sure 100% of the code is covered by tests (except for the UI code).

The result is called BlogEngineSpamCommentRemover. You can download the executable here or get the source code at: https://github.com/DeltaEngine/BlogEngineSpamCommentRemover

You can also start it as a console application (which is how the project started out, and where I wrote all the unit tests first).

About the code: as you can see from the screenshots, the program is simple but not trivial. It does a lot of string comparisons, has a huge list of spam words and "nice" spam comment phrases (see below) and contacts a web service to check whether a comment is spam. It also features a console application, a WPF GUI version and unit tests covering 100% of the code (except the UI code, of course). It is still under 300 lines of code (about 800 if you count comments and empty lines).
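For illustration, here is a minimal sketch of the file handling part of such a tool. It is not code from the BlogEngineSpamCommentRemover repository: the class name `PostCommentCleaner` and the `isSpam` predicate are hypothetical, and the sketch assumes that BlogEngine.net's XML storage keeps each post in its own XML file, with one `<comment>` element (carrying an `approved` attribute) per comment nested under `<comments>`. Check that layout against your own App_Data folder and keep a backup before writing anything back.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not taken from the BlogEngineSpamCommentRemover source.
// Assumed post layout: <post>...<comments><comment approved="True">...</comment></comments></post>
public static class PostCommentCleaner
{
	// Removes every <comment> element the predicate flags as spam and returns
	// how many were removed. The document is modified in place.
	public static int RemoveSpamComments(XDocument post, Func<XElement, bool> isSpam)
	{
		// ToList() materializes the matches first, so removing elements
		// does not interfere with the ongoing enumeration.
		var spamComments = post.Descendants("comment").Where(isSpam).ToList();
		foreach (var comment in spamComments)
			comment.Remove();
		return spamComments.Count;
	}

	// Loads one post file, cleans it and only writes it back if something changed.
	public static int CleanPostFile(string path, Func<XElement, bool> isSpam)
	{
		var post = XDocument.Load(path);
		int removed = RemoveSpamComments(post, isSpam);
		if (removed > 0)
			post.Save(path);
		return removed;
	}
}
```

Passing `c => (string)c.Attribute("approved") != "True"` as the predicate would only drop unapproved comments; word, domain and phrase checks would go into the same predicate.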

I tried to follow our new Delta Engine Coding Style, which is based on Clean Code by Uncle Bob (Robert C. Martin). Some things might not be perfect yet, but every time I look through the code I apply more refactoring. I worked on this pretty much all day and have done enough fine-tuning for now, but if I (or anyone else) decide to extend or modify the program, more refactoring will happen.

To reach 100% coverage, BlogEngine.net blog post files with comments had to be loaded and the spam comment checker web service had to be contacted. This is obviously slow, so I tried out some mocking and test frameworks, but ended up going back to plain NUnit and running everything with NCrunch. In some other research projects this week I used Moq, but it was not needed for this project in the end. Moq can't mock static methods, so I also tried the Fakes framework (part of the VS11 quality testing tools, previously called Moles) and TypeMock, but both are way overpowered and did not make the code simpler. Executing all tests with ReSharper's test runner takes about 100 ms, but there I excluded all file and web service tests. In NCrunch nothing is excluded and a full re-run takes about a second, which is totally fine. I am not sure how this would scale to big solutions with multiple projects. Unit tests should obviously run as quickly as possible, but as long as a full run takes under one second and happens in the background anyway, I see no need to optimize unit test performance further.
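Moq can only mock interfaces and virtual or abstract members, which is why a static web service call is hard to replace in tests. Besides Fakes or TypeMock, another option is to put the call behind a small interface and hand a fake to the tests. The following is a hypothetical sketch of that technique, not the project's code: `ISpamCheckService`, `FakeSpamCheckService` and `CommentChecker` are made-up names, the real web service client is omitted, and the rule that blank comments count as spam only exists to show a local check that never reaches the service.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical seam: a production class would implement this by calling the real
// spam checking web service; unit tests pass a fast in-memory fake instead.
public interface ISpamCheckService
{
	bool IsSpam(string author, string content);
}

// In-memory fake for unit tests: flags comments whose content is in a known set
// (compared case-insensitively) and counts how often it was asked.
public class FakeSpamCheckService : ISpamCheckService
{
	private readonly HashSet<string> knownSpam;
	public int CallCount { get; private set; }

	public FakeSpamCheckService(IEnumerable<string> knownSpam)
	{
		this.knownSpam = new HashSet<string>(knownSpam, StringComparer.OrdinalIgnoreCase);
	}

	public bool IsSpam(string author, string content)
	{
		CallCount++;
		return knownSpam.Contains(content);
	}
}

// The code under test depends on the interface instead of a static method,
// so no mocking framework is needed to replace the web service.
public class CommentChecker
{
	private readonly ISpamCheckService service;

	public CommentChecker(ISpamCheckService service)
	{
		this.service = service;
	}

	public bool IsSpam(string author, string content)
	{
		// Hypothetical cheap local rule: blank comments are spam without asking the service.
		if (string.IsNullOrWhiteSpace(content))
			return true;
		return service.IsSpam(author, content);
	}
}
```

The production implementation of `ISpamCheckService` would then get its own, slower tests, which could be excluded from quick runs in the same way the file and web service tests were excluded from the ReSharper run.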

Finally, here is the list of "nice" spam comment phrases. It already catches most of the remaining spam after unapproved comments and comments with obvious spam words or domains have been removed:

public static List<string> NiceSpamComments = new List<string>(new[]
{
	// http://www.goseewrite.com/2010/12/great-spam-comments/
	"this is good information",
	"interesting read, thanks",
	"i have learned a lot from your blog",
	"i used to love reading your blog",
	"you're such an inspiration",
	"i love this blog",
	"i absolutely do not believe you",
	"i really do not understand you",
	"i'm happy i found this blog",
	"you must be a genius",
	"you must be a genious",
	"excellent blog post, i look forward to reading more",
	"pleasure in the job puts perfection in the work",
	"blown away by the content and quality of your site",
	"what i liked about her,",
	"the post is really the best on this laudable topic.",
	"know yourself and you will win all battles.",
	"hi mate this is interesting article",
	"will make sure i check your posts more often",
	"really interesting article",
	"its funny how those little things can drive you mad",
	"i would like to thank you for the efforts you have made in writing this post.",
	"i am hoping the same best work from you in the future as well.",
	"your creative writing abilities has inspired me to start my own blog",
	"really the blogging is spreading its wings rapidly.",
	"your write up is a fine example of it.",
	"pretty good post.",
	"i just stumbled upon your blog and wanted to say ",
	"i have really enjoyed reading your blog posts",
	"i'll be subscribing to your feed and i hope you post again soon.",
	"i have bookmarked it and i am looking forward to reading new articles.",
	"keep up the good work. i hope you can continue this kind ",
	"increase the importance and interactivity of site.",
	"i don? usually reply to posts but i will in this case.",
	"definitely agree with what you stated.",
	"your explanation was certainly the easiest to understand.",
	"you managed to hit the nail right on the head and explained out everything ",
	"this is a nice blog.",
	"good clean ui and nice informative blogs.",
	"i will be coming back in a bit, thanks for the great post.",
	"i put a link to your blog at my site, hope you don't mind?",
	"that? too nice, when it comes in india hope it can make a rocking place for ",
	"thanks for taking the time to discuss this, ",
	"i feel strongly about information and love learning more on this.",
	"if possible, as you gain expertise, it is extremely helpful for me.",
	"would you mind updating your blog with more information?",
	"interesting blog. ",
	"it would be great if you can provide more details about it.",
	" thanks a load!",
	"nice article, i must say i never scan one thing that summed it up therefore well.",
	"one thing like this could be scan once in an exceedingly whereas, ",
	"i have been reading a lot on here and have picked up some great ideas.",
	"i appreciate you taking the time to write all this up for us",
	"hhmmmm very interesting article!",
	"this is a great site, very handy, just what i was looking for",
	"keep up the good work, many thanks!",
	"i was very pleased to find this site.",
	"i wanted to thank you for this great read!",
	"i definitely enjoying every little bit of it and i have you bookmarked to check out",
	"thank you for another informative post!",
	"i appreciate your efforts.",
	"its really very interesting article and really had me thinking",
	"will make sure i check your posts more often!",
	"it contains a useful information in it..thanks",
	"cheers for the info. it was a good read.",
	"don’t stop blogging! it’s nice to read a sane commentary for once",
	"great blog here. keep it up!",
	"please try to include more information if possible.",
	"i have just started working with this software and am having a few problems.",
	"is there any place to go where i can get some more information?",
	"this sounds like a great app.",
	"it never ceases to amaze me that so many different apps are hitting the market.",
	"thanks for the cool pic too.",
	"i had been looking for this product.",
	"finally i found it in your blog.",
	"great post! thanks for the information",
	"great post, you’ve helped me a lot",
	"i thought you were going to chip in with some decisive insght at the end there",
	"i am so satisfied that i have found this your post because i have been searching ",
	"i will definitely bookmark your website and wait for other useful and info",
	"i usually don? post in blogs but",
	"your blog appears quite informative.",
	"can you please tell me how can i read your rss blog?",
	"one thing like this could be scan once in an exceedingly whereas",
	"i like your blog so much that i feel i have to wish you",
	"i love reading through your blog, i wanted to leave a little comment",
	"wishing you the best of luck for all your blogging efforts.",
	"i just got this in the mail this week.",
});
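A phrase list like this only helps together with a matching step. Here is a minimal sketch, assuming a plain case-insensitive substring search and the order described above (unapproved comments first, then obvious spam words and domains, then the nice phrases). `SpamPhraseMatcher` and its parameters are hypothetical and the project's actual checks may differ. Comparing case-insensitively on both sides means the capitalization of an entry never decides whether it matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of applying lists like NiceSpamComments to a comment.
// Not the project's implementation; the word and domain lists are supplied by the caller.
public class SpamPhraseMatcher
{
	private readonly List<string> spamWords;
	private readonly List<string> spamDomains;
	private readonly List<string> niceSpamPhrases;

	public SpamPhraseMatcher(IEnumerable<string> spamWords, IEnumerable<string> spamDomains,
		IEnumerable<string> niceSpamPhrases)
	{
		this.spamWords = spamWords.ToList();
		this.spamDomains = spamDomains.ToList();
		this.niceSpamPhrases = niceSpamPhrases.ToList();
	}

	public bool IsSpam(bool approved, string website, string content)
	{
		// 1. Unapproved comments are removed first.
		if (!approved)
			return true;
		// 2. Obvious spam words in the text.
		if (ContainsAny(content, spamWords))
			return true;
		// 3. Known spam domains in the website field or in links inside the text.
		if (ContainsAny(website, spamDomains) || ContainsAny(content, spamDomains))
			return true;
		// 4. Generic "nice" phrases.
		return ContainsAny(content, niceSpamPhrases);
	}

	// Case-insensitive substring search; null or empty text never matches.
	private static bool ContainsAny(string text, IEnumerable<string> needles)
	{
		if (string.IsNullOrEmpty(text))
			return false;
		return needles.Any(needle => needle.Length > 0 &&
			text.IndexOf(needle, StringComparison.OrdinalIgnoreCase) >= 0);
	}
}
```

A substring match is cheap but blunt: a genuine comment that happens to contain "pretty good post." would be flagged too, so it is worth reviewing what gets removed before saving the post files.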

Hopefully this project is useful to anyone stumbling upon it and helps with removing spam comments.