boat_broker wrote: I believe that you correctly identified my intent with \b[Rr][Ee]\b:\s\B. However, with the RegEx Coach VM it does seem to work. It recognizes "Re:" and "Re: " and "Re: " (a re followed by a tab). At the same time it passes "Re: boats" and "Re:boats". So far, all is good. I will let you know if I change anything and I will go back to my formula and try to explain my reasoning. I do need to review the "anchor" concept that you mentioned. I should also note that I came up with this as an entry into the Spam Words file not the Rule Filter. I believe that it will be the same in the Rule Filter but it should be a little more precise.
I finally got your RegExp to work in The RegEx Coach, but even there it also matches a subject like "BA RE:". This might be all right for you; I just wanted to point out that any text may precede the "RE:" as long as it is separated from "RE:" by a word boundary, that is, by a non-word character such as a space.
But when I tried your RegExp in Spami, it did NOT work in any of the cases shown in my screenshot of the Training Area above.
I understand the reasoning behind your RegEx now, even the "\B" at the end ;-)
Except for the problem with text before the "RE:" it's fine; I just couldn't get it to work in Spami.
I cannot tell you why it is not working, but it must have something to do with the word boundaries.
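To make the "BA RE:" observation reproducible, here is a minimal sketch using Python's `re` module. Python is only a stand-in: it is neither The RegEx Coach nor the Boost library used by Spami, so the results in Spami may well differ. The helper `contains` and the anchored variant `ANCHORED` are hypothetical names for illustration and do not come from Spami.

```python
import re

# The expression under discussion, unchanged.
PATTERN = re.compile(r"\b[Rr][Ee]\b:\s\B")

# Hypothetical stricter variant (assumption): anchor the expression to the
# start and end of the subject so that nothing may precede "RE:".
ANCHORED = re.compile(r"^[Rr][Ee]:\s*$")


def contains(regex, text):
    """Substring search, comparable to a "contains (regex)" test."""
    return regex.search(text) is not None


subjects = ["Re: ", "Re:\t", "Re: boats", "Re:boats", "BA RE: "]
results = {s: (contains(PATTERN, s), contains(ANCHORED, s)) for s in subjects}

for subject, (original, anchored) in results.items():
    print(repr(subject), "original:", original, "anchored:", anchored)
```

In this sketch the original expression accepts "BA RE: " while the anchored variant rejects it. Note that the anchored variant also accepts a bare "Re:" without trailing whitespace, which the original expression does not, because "\s" needs one character to consume.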
boat_broker wrote: Thank you so much for the time you put into your reply. It was way more than I had hoped.
I'm still trying to learn more about RegExps myself, and I'm sure I'll benefit from this discussion too, so call me selfish if you want ;-).
boat_broker wrote: I’ve been pecking away at this today and my head is spinning. I am curious if the virtual machine that you are using is available. I have been getting different results than you mention (or I am interpreting them incorrectly) when I use The Regex Coach (http://weitz.de/regex-coach) by Edi Weitz that is mentioned in the Spamihilator Wiki RegEx Tutorial (http://wiki.spamihilator.com/doku.php?id=en:tutorials:regex).
I still use "The RegEx Coach" myself from time to time, but unfortunately it doesn't seem to interpret Perl RegExps in exactly the same way as Spami does through the Boost Library (http://www.boost.org/doc/libs/1_42_0/libs/regex/doc/html/index.html).
Using this fantastic program to develop RegExps for single words should be no problem; I have done that myself plenty of times with great success.
But when it comes to more complex expressions with word boundaries, anchors and line feeds, I don't trust "The RegEx Coach" anymore, since I have had more than one case where an expression worked in the program but not in Spami.
One example:
When you have a text consisting of two lines separated by a forced line break (hit "ENTER" at the end of the first line), you can match this line break with the RegEx "\s" in "The RegEx Coach", but in the Boost Library it takes two "\s".
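One possible explanation (an assumption, not verified in Spami) is that a Windows line break is stored as two characters, carriage return plus line feed, and each of them needs its own "\s". The sketch below shows that effect with Python's `re` module, which is only a stand-in for Boost; the sample strings are made up.

```python
import re

unix_text = "first line\nsecond line"       # line break = LF only
windows_text = "first line\r\nsecond line"  # line break = CR + LF

one_ws = re.compile(r"line\sse")    # exactly one whitespace character
two_ws = re.compile(r"line\s\sse")  # exactly two whitespace characters
any_ws = re.compile(r"line\s+se")   # one or more whitespace characters

for name, text in [("LF", unix_text), ("CRLF", windows_text)]:
    print(name,
          bool(one_ws.search(text)),
          bool(two_ws.search(text)),
          bool(any_ws.search(text)))
```

If this assumption holds, writing "\s+" instead of a fixed number of "\s" would make an expression independent of how the line break is stored.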
Most RegExp test programs look for a substring match of your RegExp. This corresponds to the method "contains (regex)", whereas Spami also offers the method "matches (regex)", where the whole test text has to be matched by your RegExp. This makes it a little harder to develop RegExps that use the method "matches (regex)".
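The difference between the two methods can be sketched in Python, where `re.search` behaves like a substring ("contains") test and `re.fullmatch` requires the whole text to match. This is only an analogy to Spami's two methods, not their implementation; the subject string is a made-up example.

```python
import re

subject = "BA RE: boats"
regex = r"[Rr][Ee]:"

# "contains (regex)": the expression may match anywhere in the text.
contains_hit = re.search(regex, subject) is not None

# "matches (regex)": the expression must cover the whole text.
matches_hit = re.fullmatch(regex, subject) is not None

# To get "contains" behaviour out of a whole-text match, pad the
# expression with ".*" on both sides.
padded_hit = re.fullmatch(r".*" + regex + r".*", subject) is not None

print(contains_hit, matches_hit, padded_hit)
```

So an expression developed in a substring-oriented tester usually needs explicit padding such as ".*" before it behaves the same way under a whole-text match.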
My newest favorite playground for RegExps is the QuickREx plug-in for the Eclipse development platform.
While looking for the URL, I just learned that QuickREx is now also available as a standalone application:
http://www.bastian-bergerhoff.com/eclipse/features/web/QuickREx/standalone.html
The behavior is way closer to the one in Spami.
Of course, the problem with the different methods "contains (regex)" and "matches (regex)" persists.
Choose the Jakarta ORO Perl implementation and set the global ORO Perl flag "Case insensitive".
This only requires checking/unchecking a few boxes, don't panic ;-)
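As a rough illustration of what a case-insensitive flag changes, the Python sketch below compares the explicit character classes "[Rr][Ee]" with a plain "re" compiled under `re.IGNORECASE`. Python stands in for QuickREx here, and the sample subjects are made up.

```python
import re

explicit = re.compile(r"\b[Rr][Ee]\b:")          # case handled in the pattern
flagged = re.compile(r"\bre\b:", re.IGNORECASE)  # case handled by the flag

samples = ["Re: boats", "RE: boats", "re: boats", "rE: boats", "Are: boats"]

# Both expressions should agree on every sample.
agreement = [bool(explicit.search(s)) == bool(flagged.search(s)) for s in samples]
print(agreement)
```

With the flag set, an expression typed in lower case should therefore behave like the "[Rr][Ee]" spelling in the tester.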

Regards,
Quellcore