boat_broker wrote: I had considered using Regular Expressions before, but I was too lazy to try and learn how they worked. When I saw the new “Rules” Section, I was kind of hoping that it might do the same thing more easily. However, I have yet to see any update on just how to use the Rules Section. Is there any discussion of that section, or instructions for it?
We are all pretty excited about the Rule Filter. It is very powerful and could even replace several other filters if you really wanted to put the extra effort into it.
The usage is pretty straightforward; in fact, there is a German wiki entry which hasn't been translated to English yet. The only tricky part is handling Regular Expressions, which are a whole topic in themselves. People have written books about them, but once you know your way around that language, creating rules for the Rules Filter containing Regular Expressions will be quick and painless.
I'm making extensive use of the Rule Filter myself, although "extensive" in terms of quantity; I am also trying hard to improve my knowledge of Regular Expressions and incorporate it into new rules on a daily basis. I will introduce my most effective rules on this board in the very near future, but it might be in the German section. Other board members and I tried to come up with a strategy for discussing rules for the Rule Filter: what to filter, what not to filter, what to watch out for in terms of false positives, etc.
We started discussing it, but obviously no "project" has been started yet.

Getting back to regular expressions: after playing around with them for a while, I believe that the RegEx \b[Rr][Ee]\b:\s\B will do what I was asking, i.e. knock out any email with a blank after the “Re:” while not eliminating a message with a word after the blank. I extrapolated this to Fw: and Fwd: as well (\b[Ff][Ww]\b:\s\B and \b[Ff][Ww][Dd]\b:\s\B), so I believe that I am covered. Comments would be welcome.
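A quick way to sanity-check these patterns outside of Spamihilator is a few lines of Python. This is only an approximation: Python's `re` module stands in for the Boost RegEx library Spamihilator uses, `re.search` is assumed to behave like the Rule Filter's "contains (regex)" option, the Fwd: pattern is taken to be `\b[Ff][Ww][Dd]\b:\s\B`, and the sample subjects are made up.

```python
import re

# boat_broker's patterns; the Fwd: variant is assumed to be \b[Ff][Ww][Dd]\b:\s\B
PATTERNS = [
    r"\b[Rr][Ee]\b:\s\B",
    r"\b[Ff][Ww]\b:\s\B",
    r"\b[Ff][Ww][Dd]\b:\s\B",
]

def is_blank_reply(subject):
    """True if any pattern is found anywhere in the subject (search, not full match)."""
    return any(re.search(p, subject) for p in PATTERNS)

# Hypothetical sample subjects
for subject in ["Re: ", "Fw: ", "Fwd: ", "Re: Meeting on Monday", "Fwd: invoice"]:
    print(repr(subject), is_blank_reply(subject))
```

The first three subjects are flagged and the last two are not. Note that the Fw: pattern on its own does not catch "Fwd: ", because there is no word boundary between "w" and "d"; that is why a separate Fwd: pattern is needed.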
Seeing you experiment with special features like word boundaries makes me happy; it is generally a good way of eliminating false positives.
OK, let's break it down:
Your original RegExp:
\b[Rr][Ee]\b:\s\B
- Regex Matching Mode is "case insensitive"
Since Spami uses the "case insensitive" flag in its RegExp library, the Rule Filter and the Spamword Filter are both case insensitive.
Therefore your rule could already be simplified to
\bRE\b:\s\B
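To see that the character classes become unnecessary once matching is case insensitive, here is a small comparison in Python. `re.IGNORECASE` stands in for the case-insensitive flag Spamihilator sets in its regex library; this assumes equivalent behavior and is not Spamihilator's own code.

```python
import re

ORIGINAL = re.compile(r"\b[Rr][Ee]\b:\s\B", re.IGNORECASE)
SIMPLIFIED = re.compile(r"\bRE\b:\s\B", re.IGNORECASE)

# Hypothetical subjects covering every upper/lower case mix of "re"
samples = ["Re: ", "RE: ", "re: ", "rE: ", "Re: Hello"]

for s in samples:
    # Both compiled patterns should agree on every sample
    print(repr(s), bool(ORIGINAL.search(s)), bool(SIMPLIFIED.search(s)))
```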
- Word Boundaries "\b"
I'm still having trouble grasping the effect of word boundaries in special and advanced scenarios.
Regular-Expressions.info wrote: There are three different positions that qualify as word boundaries:
- A) Before the first character in the string, if the first character is a word character.
- B) After the last character in the string, if the last character is a word character.
- C) Between two characters in the string, where one is a word character and the other is not a word character.
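These three rules can be checked directly by asking a regex engine where `\b` matches. The sketch below uses Python's `re` module, whose `\b` follows the same rules; "BA.RE:" is the example string discussed further down in this post.

```python
import re

def word_boundaries(text):
    r"""Return every index at which \b matches in text."""
    return [m.start() for m in re.finditer(r"\b", text)]

print(word_boundaries("RE"))      # [0, 2]: rule A at 0, rule B at 2
print(word_boundaries("BA.RE:"))  # [0, 2, 3, 5]: rule A at 0, rule C at 2, 3 and 5
```

There is no boundary at the end of "BA.RE:" because its last character ":" is not a word character.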
Your first "\b" before "RE" makes sense.
It covers Scenario "A". The only problem is that it could also cover Scenario "C".
A word character is covered by the class [0-9a-z_] (matching is case insensitive, so this includes upper case letters), while a non-word character is everything else. So the "\bRE" part of your RegExp would also find a match in the string "BA.RE:".
To avoid this, I would recommend looking into the use of anchors for the start and end of the line instead of word boundaries.
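A minimal comparison of a word boundary and a start-of-line anchor, again with Python's `re` standing in for Boost RegEx with the case-insensitive flag on. The subjects are made-up examples.

```python
import re

# Word boundary at the start: also matches when "RE:" follows a non-word character
WITH_BOUNDARY = re.compile(r"\bRE:", re.IGNORECASE)
# Caret anchor: only matches when "RE:" starts the subject
WITH_ANCHOR = re.compile(r"^RE:", re.IGNORECASE)

for subject in ["Re: hello", "BA.RE:"]:
    print(repr(subject), bool(WITH_BOUNDARY.search(subject)), bool(WITH_ANCHOR.search(subject)))
# 'Re: hello' True True
# 'BA.RE:' True False
```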
Your second "\b", after "E" and before ":", is redundant. Between "E" and ":" there is always a word boundary by definition "C" (see above), so you could leave it out.
- Negated Word Boundaries "\B"
I can't figure out your intention in this case.
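One way to see what `\s\B` actually does: a `\B` right after a whitespace character only succeeds if the next character is also a non-word character, or if the string ends there. The Python sketch below (a stand-in for Boost RegEx; the subjects are hypothetical) shows that the pattern does let "Re: Hello" through, but it also flags a subject with two blanks before the text, or with punctuation after the blank, which may not be what was intended.

```python
import re

PATTERN = re.compile(r"\bRE\b:\s\B", re.IGNORECASE)

subjects = [
    "Re: ",        # blank at the end: \B succeeds at the end of the string -> match
    "Re: Hello",   # word character after the blank: \B fails -> no match
    "Re:  Hello",  # two blanks: \s takes the first, \B sits between two blanks -> match
    "Re: !!!",     # "!" is not a word character: \B succeeds -> match
]

for s in subjects:
    print(repr(s), bool(PATTERN.search(s)))
```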
As for matching a RegExp, the Rule Filter offers two different options: "contains (regex)" and "matches (regex)".
The easiest and shortest solution to your task I could come up with is this one:
Subject matches (regex) RE:\s*
Or, as an example of how to use anchors:
Subject contains (regex) ^RE:\s*$
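The two rules above should behave the same. This sketch assumes, as the examples suggest but nothing documents, that "matches (regex)" requires the whole subject to match (like Boost's `regex_match`) and "contains (regex)" accepts a match anywhere (like `regex_search`). Python's `re.fullmatch` and `re.search` are used to mimic them; the function names are hypothetical.

```python
import re

FLAGS = re.IGNORECASE  # Spamihilator matches case insensitively

def rule_matches(pattern, subject):
    """Assumed behavior of 'matches (regex)': the entire subject must match."""
    return re.fullmatch(pattern, subject, FLAGS) is not None

def rule_contains(pattern, subject):
    """Assumed behavior of 'contains (regex)': a match anywhere is enough."""
    return re.search(pattern, subject, FLAGS) is not None

SUBJECTS = ["Re:", "Re: ", "RE:   ", "Re: Hello", "Fwd: Re: "]

for subject in SUBJECTS:
    print(repr(subject),
          rule_matches(r"RE:\s*", subject),
          rule_contains(r"^RE:\s*$", subject))
```

Note that `\s*` also accepts a bare "Re:" with no blank at all; `\s+` would require at least one.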
It looks like the Rules Section may allow me to use RegExps in specific areas of an email (i.e. subject, body, etc.). I am looking forward to exploring the potential of this "wizard".
The same for me and some others here in this forum.
None of us is really a pro when it comes to RegExps; we simply like to explore and experiment with the endless possibilities.
RegEx implementations come in many different flavors, like Perl, Java and .NET. They have slight differences, so experimenting is sometimes the only way of telling whether something works or not.
Spamihilator uses the Boost RegEx Library, which is a Perl flavor.
Check out the four different matching modes (case insensitive, single-line, multi-line and free-spacing).
There is no official statement about which of them are turned on within Spamihilator.
It's pretty obvious that "case insensitive" matching is enabled; about the rest I don't know.
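As a rough illustration of why the active modes matter, here is how Python's equivalents of the multi-line and single-line modes change a match on a multi-line message body. Which modes Spamihilator enables is still unknown, the body text is made up, and Python's flags only approximate Boost's. Defaults also differ between engines: according to the Boost.Regex documentation, Boost behaves as if Perl's multi-line modifier were on unless that is switched off.

```python
import re

BODY = "Hello,\nRE: your order\nBye"

# Without multi-line mode, ^ only matches at the very start of the text
print(bool(re.search(r"^RE:", BODY, re.IGNORECASE)))                 # False
# With multi-line mode, ^ also matches right after each line break
print(bool(re.search(r"^RE:", BODY, re.IGNORECASE | re.MULTILINE)))  # True

# Without single-line ("dot matches newline") mode, . stops at line breaks
print(bool(re.search(r"Hello.*order", BODY)))                        # False
print(bool(re.search(r"Hello.*order", BODY, re.DOTALL)))             # True
```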
Regards,
Quellcore