Understanding "Regular Expressions" and "Expr

For all users who don't speak German!

Moderator: Forum-Team


Post by boat_broker » 17 Mar 2006, 00:22

Hi,

I've been using Spami for a few months now. Fabulous program!

I have been receiving a large number of spam messages with the word "stock" in the subject line. I added "stock" (at 100%) to the spam word file and did not check the "Regular Expression" box. I expected Spami to identify a message with any variation of the word "stock" (i.e., stock, Stock, STOCK) as spam. However, I often get spam passed through with the word "Stock" in the subject line. When I open the header I can see the word "stock" in the subject (it isn't in HTML, at least I don't think it is).

Am I doing something wrong?

Thanks,
Boat Broker
boat_broker
Spam Hunter
Posts: 18
Registered: 16 Mar 2006, 23:24


Post by Dieter » 17 Mar 2006, 01:16

boat_broker wrote: However, I often get spam passed through with the word "Stock" in the subject line.

I'm not quite sure, but I believe that only the body of an e-mail is checked by the Spam Word Filter, not the header lines.

Regards
Dieter
Intel(R) Core(TM)2 Duo CPU E8500 3.16 GHz, 3.25 GB Ram, WinXP Home Edition SP3,
Outlook Express 6, Outlook 2007
Intel(R) Core(TM)2 Duo CPU T8100 2.10 GHz, 4.00 GB Ram, Windows 7 Ultimate,
Windows Live Mail, Outlook 2007
T-Online/1&1/Gmx/Strato, Call & Surf Comfort Plus, DSL 16000
Dieter
Assistant
Posts: 1538
Registered: 14 Sep 2003, 11:16
Location: München


Post by michel » 18 Mar 2006, 14:38

Hi!

The subject line will also be checked.

Sincerely,
Michel Krämer
Chuck Norris doesn't kill Spam. He uses Spamihilator! ;-)
michel
Administrator
Administration
Beta-Tester
Forum-Team
Plugin Programmer
Posts: 4314
Registered: 22 Mar 2003, 02:16
Location: Buseck


Post by Dieter » 18 Mar 2006, 14:53

michel wrote: The subject line will also be checked.

OK :wink: I did say I wasn't sure!

Regards
Dieter

Post by Chactory » 18 Mar 2006, 18:28

Hi boat_broker,

AFAIK your entry "stock" should match "Stock" as well, at least if I tested it properly.

The Spam Word Filter checks only the subject line and the message text; the same goes for the Learning Filter.
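A minimal Python sketch of what a plain (non-regex) entry like "stock" would be expected to do: compare without regard to case and look at both the subject line and the message text. This is not Spamihilator's code; the function name `contains_spam_word` is hypothetical, and matching the word anywhere inside the text (rather than only as a whole word) is an assumption.

```python
def contains_spam_word(word: str, subject: str, body: str) -> bool:
    """Hypothetical plain-word check: case-insensitive substring search
    over the subject line and the message text (other headers ignored)."""
    needle = word.casefold()  # casefold() makes "Stock" and "STOCK" compare equal to "stock"
    return needle in subject.casefold() or needle in body.casefold()


if __name__ == "__main__":
    # Subject lines reported in this thread as having slipped through.
    for subject in ("GrandSlam Stock",
                    "fwd: Unbiased stock info and valuable insider data",
                    "Smallcap Stock of Interest"):
        print(subject, "->", contains_spam_word("stock", subject, ""))
```

If the filter behaved like this sketch, all three subject lines quoted in this thread would be caught.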

Perhaps this finding, that the other header lines aren't filtered, could be taken as a suggestion for further development of Spamihilator?

Until Michel finds time to look into this, you may want to use other filters and plugins as well; they could bring you some relief. :-) IMHO the best filters are:
- Scripts Filter
- DCC Filter
- Hercule
- URL Filter
- Empty Mail Filter
- Charset Plugin
- Spam Word Filter
- Learning Filter

With kind regards, Chactory
Vostro 3450, Intel Core i5 2410M 2,3 GHz, 4 GB DDR3 SDRAM 1333 MHz, Windows 7 Pro 64 Bit SP1
Online help: «DE» − Chactory's tips: «DE» − Anbuva's FAQ: «DE» and «EN» − Bob Loeffler's FAQ: «EN»

Chactory
Administrator
Administration
Beta-Tester
Forum-Team
Posts: 8627
Registered: 10 Jan 2004, 00:19
Location: Kiel (D)


Post by boat_broker » 21 Mar 2006, 18:55

Thanks all,

Dieter, I appreciate your answer, but I'm glad the subject line is checked as well! Thanks, Michel, for clarifying that.

Chactory, I have listed my filters below along with their settings. If you notice a problem, please comment. I’ve tried repeatedly to load the Hercule filter, but my computer doesn’t seem to want to download it.

Also, I have the spam word “stock” set to 100%, and the sensitivity set to medium aggressiveness and 99%.

I noticed that the recommended plugin settings are "Stop filtering when spam is found" and "Stop filtering process when non-spam is found" (1B). However, as you’ll notice, I have set most filters to continue to the next filter when non-spam is found. I believe this should weed out spam better. Am I right?
************************************************************

1) Stop filtering when spam is found
2) Continue with the next filter when spam is found

A) Continue to next filter when non-spam is found
B) Stop filtering process when non-spam is found


White string filter 1B
Blacklist Filter 1A
Spam word filter 1A
DCC filter 1A
Learning filter 1A
Alphabet soup filter 1A
Newsletter Plugin 1B
Attachment filter 1A
Addressee filter 1A
Empty mail filter 1A
Mystic signs filter 1A
Air filter 1A
Image filter 1A

************************************************************

Here are some subject lines that have passed through even though I added "stock" to my spam word list at 100% (copied from the header information):

Subject: GrandSlam Stock
Subject: fwd: Unbiased stock info and valuable insider data
Subject: Smallcap Stock of Interest

If it would help to copy the entire message here I could do so.


Thanks,
Mike

Post by Dieter » 21 Mar 2006, 19:30

boat_broker wrote: I noticed that the recommended plugin settings are "Stop filtering when spam is found" and "Stop filtering process when non-spam is found" (1B). However, as you’ll notice, I have set most filters to continue to the next filter when non-spam is found. I believe this should weed out spam better. Am I right?

No! You shouldn't change the recommended settings. Each filter has three possible exits:
  1. classified as spam
  2. classified as non-spam
  3. undefined
With the 3rd exit, processing simply continues with the next filter. A change should only be made if a filter can recognize both spam and non-spam and you don't want, for example, to accept its non-spam result.
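Dieter's three exits can be modelled in a few lines of Python. This is a hypothetical simplification, not Spamihilator's implementation: the names `Verdict`, `run_chain`, `white_string_filter`, `spam_word_filter` and `attachment_filter` are invented, and two rules are assumptions marked in the comments (a later definite verdict replaces an earlier one; mail not classified as spam goes to the inbox). The point it illustrates is that a filter that finds nothing returns "undefined", so its A/B setting never comes into play.

```python
from enum import Enum


class Verdict(Enum):
    SPAM = "spam"            # exit 1
    NON_SPAM = "non-spam"    # exit 2
    UNDEFINED = "undefined"  # exit 3: the filter has no opinion


def run_chain(mail, chain):
    """Run (check, stop_on_spam, stop_on_non_spam) entries in order.

    stop_on_spam: True = setting 1 (stop), False = setting 2 (continue).
    stop_on_non_spam: True = setting B (stop), False = setting A (continue).
    """
    verdict = Verdict.UNDEFINED
    for check, stop_on_spam, stop_on_non_spam in chain:
        result = check(mail)
        if result is Verdict.UNDEFINED:
            continue  # exit 3 always moves on, whatever the settings
        verdict = result  # assumption: a later definite verdict replaces an earlier one
        if result is Verdict.SPAM and stop_on_spam:
            break
        if result is Verdict.NON_SPAM and stop_on_non_spam:
            break
    # assumption: mail that is not classified as spam goes to the inbox
    return Verdict.SPAM if verdict is Verdict.SPAM else Verdict.NON_SPAM


def white_string_filter(mail):
    # Hypothetical whitelist: can only say "non-spam" or "no opinion".
    return Verdict.NON_SPAM if "[boatclub]" in mail["subject"] else Verdict.UNDEFINED


def spam_word_filter(mail):
    # Hypothetical: can only say "spam" or "no opinion", never "non-spam".
    text = (mail["subject"] + " " + mail["body"]).casefold()
    return Verdict.SPAM if "stock" in text else Verdict.UNDEFINED


def attachment_filter(mail):
    # Hypothetical: flags executable attachments, otherwise no opinion.
    risky = any(name.lower().endswith((".exe", ".pif")) for name in mail["attachments"])
    return Verdict.SPAM if risky else Verdict.UNDEFINED


if __name__ == "__main__":
    # Every filter on the recommended setting 1B.
    chain = [(white_string_filter, True, True),
             (spam_word_filter, True, True),
             (attachment_filter, True, True)]
    clean_text_bad_attachment = {"subject": "Photos", "body": "See attached.",
                                 "attachments": ["photos.exe"]}
    club_mail = {"subject": "[boatclub] stock report", "body": "", "attachments": []}
    print(run_chain(clean_text_bad_attachment, chain))  # Verdict.SPAM
    print(run_chain(club_mail, chain))                  # Verdict.NON_SPAM
```

In this model a message with clean text but an executable attachment still reaches the attachment filter under 1B, because the spam word filter returns "undefined", not "non-spam". Setting B only ends processing for a filter that actively answers "non-spam", such as the whitelist filter in the second example.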

Regards
Dieter

Post by boat_broker » 21 Mar 2006, 20:00

White string filter 1B
Blacklist Filter 1A
Spam word filter 1A
DCC filter 1A
Learning filter 1A
Alphabet soup filter 1A
Newsletter Plugin 1B
Attachment filter 1A
Addressee filter 1A
Empty mail filter 1A
Mystic signs filter 1A
Air filter 1A
Image filter 1A


OK Dieter, please bear with me; my brain doesn't do flow charts well. I'll use just two filters as an example: the Spam word filter (first filter) and the Attachment filter (second filter).

If I get an e-mail with an unacceptable attachment but otherwise clean content, I would expect it to go through the Spam Word Filter and be classified as non-spam. At that point, since the Spam Word Filter considers it non-spam, it could either A) continue to the next filter or B) stop the filtering process. Wouldn't I want it to pass on to the next filter (in this case the Attachment filter), i.e. selection A, rather than stop the filtering process and put the message in my inbox? If I used selection B, wouldn't it stop the filtering process and pass the message directly to my inbox instead of on to the Attachment filter?

Thanks,
Mike

Post by Chactory » 23 Mar 2006, 00:34

Hello to you all,

boat_broker wrote: I have listed my filters below along with their settings. If you notice a problem, please comment.
Suggested filter sequence and settings:
White string filter 1B 2B ("positive" filters at the beginning)
Newsletter Plugin 1B 2B
- Scripts Filter 1A
DCC filter 1A 1A
- Hercule 1A
- URL Filter 1A
Attachment filter 1A 1A
Empty mail filter 1A 1A
Air filter 1A 1A
- Charset Plugin 1A
Mystic signs filter 1A 1A
Alphabet soup filter 1A 1A
Spam word filter 1A 1A
Learning filter 1A 1B (Learning filter is the ultimate one)
Addressee filter 1A (misfires on BCC mail)
Blacklist Filter 1A (blacklists aren't dependable)
Image filter 1A (redundant with Hercule)

boat_broker wrote: I’ve tried repeatedly to load the Hercule filter, but my computer doesn’t seem to want to download it.
Please try again later, because it's a very powerful filter.

boat_broker wrote: I have the spam word “stock” set to 100%, and the sensitivity set to medium aggressiveness and 99%.
If you didn't check the regex box, I can't explain why it fails. :-( IMHO you don't need to set 99%; just keep 100%. I don't know whether Spamihilator works well with 99%.

Kind regards, Chactory

Post by boat_broker » 23 Mar 2006, 22:15

Thanks everyone,

Sorry I don’t seem to be grasping this quickly. I think it might be best for me to search the English Forum more at this point. I haven’t figured out how to easily search the German forum yet. The translators add several steps to the process.

I haven’t changed my settings yet. Chactory, in the list of your filters you have several instances where you have used the code I supplied twice after a filter you listed. Sometimes the second choice is in blue. Were you suggesting something by this? I noticed that you use the A) choice (Continue to next filter when non-spam is found) more often than the “default” B) choice. That’s encouraging.

I have a request, though I don’t know who to ask and I don’t know if it is feasible. Would it be possible to develop a data base on the WEB which would include two sample e-mails for each filter? One would be spam, the other non-spam for each filter. We could copy it (or, even better, just type in our address and have it sent to our e-mail client) and send it to ourselves. Then we could repeatedly test the filters with the same data to determine if the filters were working the way we expect. It would also allow everyone in the discussion to use the same data. Just a thought. I don’t have any idea how to implement it yet.

I can switch back to the 100% threshold. I was just concerned because I understood the directions to say that Spami filters any e-mail over the threshold (default 100%), so I figured a message scoring exactly 100% wouldn’t be filtered, and I dropped the threshold to 99%.
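Mike's reasoning hinges on whether "over the threshold" means strictly greater than, or greater than or equal to. The sketch below, with a hypothetical function name `exceeds_threshold`, shows both readings for a message that scores exactly 100%; which comparison Spamihilator actually uses is not stated in this thread.

```python
def exceeds_threshold(score: int, threshold: int, inclusive: bool) -> bool:
    """Hypothetical threshold test on whole-number percentages.

    inclusive=True  -> spam when score >= threshold ("at or over")
    inclusive=False -> spam when score >  threshold ("strictly over")
    """
    return score >= threshold if inclusive else score > threshold


if __name__ == "__main__":
    for threshold in (100, 99):
        for inclusive in (False, True):
            print(threshold, inclusive, exceeds_threshold(100, threshold, inclusive))
```

Only the strict reading with a 100% threshold lets a 100% score through; with a 99% threshold both readings flag it.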

I’ll try to keep up with this forum. Even with Spami running at 50% efficiency it is great! It seldom kicks out any non-spam.

Best regards,
Mike

Post by Chactory » 23 Mar 2006, 23:06

Hello Mike,

boat_broker wrote: Sorry I don’t seem to be grasping this quickly.
That's not true; you're catching on so fast that I can hardly keep up. ;-)

Sorry for not explaining this in my post above: the second code after each filter is the setting I use myself. The ones in blue differ from yours.

boat_broker wrote: Would it be possible to develop a database on the web that would include two sample e-mails for each filter?
BillyX did this in the early days, but unfortunately his server hasn't been reachable for months. Perhaps we can find somebody else to set it up again. Your idea is much more complete. I hope Michel Krämer will read this.

boat_broker wrote: It seldom kicks out any non-spam.
This is the deciding point, I guess.

See here (sorry, in German, but only two sentences): http://www.spamihilator.com/forum/viewt ... 9027#29027

Kind regards, Chactory

Post by boat_broker » 24 Mar 2006, 18:07

Thanks for the morale boost.

Unfortunately... that brings up another question that I was going to ask later. That post included some spam word entries that let Spami cope with misspelled or disguised words. Can someone explain the logic behind them?

For instance, "* POLICY VIOLATION ! *" is broken down to (I've given each letter its own line):
^[cC].?[\.]?
[iI1! \|][iI1! \|]?.?[\.]?
[aA][aA]?.?[\.]?
[lLiI17 \|][lLiI17 \|]?.?[\.]?
[iI1! \|][iI1! \|]?.?[\.]?
[sS235][sS235]?$

I see that a single "?" is placed after the options for each letter and that the word is enclosed by a roof symbol (which I can't find on my keyboard) and a "$". However, the purpose of the repeated brackets and the ".?" and "?.?" spacers eludes me. It appears that the brackets contain characters a spammer might substitute for a given letter. But why the repeated brackets, and why are some separated by ".?" or "?.?" and some not separated at all? "[\.]" is used for most letters but not all. Why?
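Joined into one entry, as the post implies, the six lines spell out c-i-a-l-i-s. The Python sketch below rebuilds that pattern with a comment on each construct: `[...]` matches any single listed character (the look-alikes a spammer might substitute), `?` makes the item before it optional (so the second bracket allows a doubled letter), `.?` allows one optional filler character of any kind, `[\.]?` allows an optional literal dot, and `^` and `$` tie the match to the start and end of the text. What looks like a "?.?" spacer is really the `?` of the optional second bracket followed by `.?`. The sketch assumes the entry is tested against a single word and that Spamihilator's regex engine treats these constructs the way Python's `re` module does; neither is confirmed in this thread, and `matches` is a hypothetical helper.

```python
import re

# The six pieces quoted in the post, one per letter of c-i-a-l-i-s.
PIECES = [
    r"^[cC].?[\.]?",                    # ^ = start of the text; c, optional filler, optional dot
    r"[iI1! \|][iI1! \|]?.?[\.]?",      # an i look-alike, optional second one, filler, dot
    r"[aA][aA]?.?[\.]?",                # a, optional second a, filler, dot
    r"[lLiI17 \|][lLiI17 \|]?.?[\.]?",  # an l look-alike, optional second one, filler, dot
    r"[iI1! \|][iI1! \|]?.?[\.]?",      # an i look-alike again
    r"[sS235][sS235]?$",                # an s look-alike, optional second one; $ = end of the text
]
FULL = "".join(PIECES)
ANCHORED = re.compile(FULL)
UNANCHORED = re.compile(FULL[1:-1])  # same pattern with the leading ^ and trailing $ removed


def matches(word: str) -> bool:
    """True if the whole word fits the anchored pattern."""
    return ANCHORED.search(word) is not None


if __name__ == "__main__":
    for word in ("cialis", "C1AL1S", "c.i.a.l.i.s", "c-i-a-l-i-s", "ciallis",
                 "c1@lis", "specialist"):
        print(f"{word!r}: {matches(word)}")
    # Without the anchors the pattern also fires inside an innocent word.
    print("unanchored, 'specialist':", UNANCHORED.search("specialist") is not None)
```

"c1@lis" is rejected because neither bracket for the letter a contains "@". "specialist" is rejected only because of `^` and `$`; without them the pattern finds "cialis" inside the word.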

If I wanted to modify the letter "a" to include "@", how would I do this?
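One way to accept "@" for the letter a is to add it inside both brackets of that letter's line, so that either the first or the doubled a may be an "@". A minimal Python sketch, assuming Spamihilator's regex syntax treats `@` as an ordinary character inside brackets, as Python's `re` does; `piece_matches` is a hypothetical helper that tests one letter's piece on its own.

```python
import re

A_ORIGINAL = r"[aA][aA]?.?[\.]?"   # the line for "a" as quoted in the post
A_WITH_AT = r"[aA@][aA@]?.?[\.]?"  # same line with "@" added to both brackets


def piece_matches(piece: str, text: str) -> bool:
    # fullmatch: the whole text must be consumed by this one piece
    return re.fullmatch(piece, text) is not None


if __name__ == "__main__":
    for text in ("a", "A", "@", "@@"):
        print(text, piece_matches(A_ORIGINAL, text), piece_matches(A_WITH_AT, text))
```

In the full entry, only the third line would change to `[aA@][aA@]?.?[\.]?`; the other lines stay as they are.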

Thanks all,
Mike

Post by Chactory » 24 Mar 2006, 18:58

Hi Mike!

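^ is the circumflex accent, as in ê (´ and ` are the accents for é and è); on a US keyboard it is Shift+6.
You can learn about regular expressions here, for example: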
http://www.regular-expressions.info/index.html

Kind regards, Chactory

Post by boat_broker » 24 Mar 2006, 23:13

Thanks Chactory,

I'm maxed out now. I'll probably try to digest and return. Have a good weekend.

Mike

Post by Chactory » 24 Mar 2006, 23:34

boat_broker wrote: I'm maxed out now. I'll probably try to digest and return. Have a good weekend.

Bye, Mike, and thanks for your good wishes. :wink: