by dabserver » 24 Apr 2008, 08:44
I have been googling this problem for many days, and I can now say that Spamihilator ignores some of the rules for decoding base64 and quoted-printable Subject headers!!
I have read this forum and, as far as I can see, users have been reporting this for two years or more. Why is this problem still not fixed?
The point is that decoding the Subject header gives you character codes, but those codes cannot be used as they are to pick the right letters in the system's default font (codepage).
Above all, for a great many national codepages, the character codes of the charset named in the header have to be mapped to the corresponding codes in the Windows system codepage.
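To make this concrete, here is a minimal Python sketch of the decoding step. It is only an illustration, not Spamihilator's actual code. The sample word, the variable names and the `decode_subject` helper are assumptions made up for the example. The idea is that the charset named inside the encoded word, not the system codepage, decides how the decoded bytes turn into letters.

```python
import base64
from email.header import decode_header

# Build a sample encoded-word Subject: "Привет" in koi8-r, base64-encoded.
# (The word and the variable names are assumptions for illustration.)
raw = "Привет".encode("koi8-r")
subject = "=?koi8-r?B?" + base64.b64encode(raw).decode("ascii") + "?="


def decode_subject(value, fallback="windows-1251"):
    """Decode each encoded word with the charset named in the header."""
    parts = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            # Use the declared charset; fall back only when none is given.
            parts.append(data.decode(charset or fallback, errors="replace"))
        else:
            parts.append(data)  # already a plain str
    return "".join(parts)


correct = decode_subject(subject)      # honours the koi8-r label
garbled = raw.decode("windows-1251")   # what ignoring the label amounts to
print(correct)
print(garbled)
```

The second `print` shows the kind of unreadable subject you get when the same bytes are read as windows-1251.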
For example, take the Cyrillic codepages (iso-8859-5, koi8-r and others): in these codepages the letters sit at different positions than they do in codepage 1251.
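iso8859-5: http://www-sbras.nsc.ru/gif/inter/cp-iso8859-5.gif
koi8-r: http://www-sbras.nsc.ru/gif/inter/cp-koi8-r.gif
windows-1251: http://www-sbras.nsc.ru/gif/inter/cp-cp1251.gif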
One more example: IBM - Cyrillic in Unicode (http://www.ibm.com/developerworks/linux/library/l-u-cyr/). As you can see there (and in the images above), the same national letters have different codes in the koi8-r, iso-8859-5 and Windows-1251 codepages. Russian Windows XP defaults to windows-1251, so a subject decoded from koi8-r or iso-8859-5 is not readable Russian text when it is shown to the user in the windows-1251 codepage.
Of the 3000 messages I collected over two months, about 2000 were decoded incorrectly by Spamihilator. I do not know how many of them are not spam, because Spamihilator displays them in unreadable form.
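The differing letter codes are easy to check. The short Python sketch below (the `CODEPAGES` tuple and the `codes_for` helper are names made up for this example) prints the single-byte code of the same letter in each of the three codepages, using Python's built-in codecs.

```python
# Codepages compared in this post; Python ships codecs for all three.
CODEPAGES = ("koi8-r", "iso-8859-5", "windows-1251")


def codes_for(letter):
    """Return the single-byte code of one letter in each codepage."""
    return {cp: letter.encode(cp)[0] for cp in CODEPAGES}


for letter in "Ёъ":
    print(letter, codes_for(letter))
```

For Ё this gives 179, 161 and 168, and for ъ it gives 223, 234 and 250, so one and the same byte means a different letter depending on which codepage you read it in.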
Have I convinced you now? :)
Note: I understand that solving this problem is neither simple nor quick! Let's think about a solution together!
I think the best way to support national codepages is to let users set the system default codepage in the Spamihilator settings (or Spamihilator could determine it automatically), and to let users edit and extend a separate file that maps character codes between codepages.
The file format can be very simple. For example (each line maps a code obtained when decoding the subject to the corresponding code in the system default codepage):
!koi8-r=windows-1251
179=168
223=250
...
!
!iso-8859-5=windows-1251
161=168
234=250
...
and so on. I think there are many examples of exactly this kind of conversion on PHP (Java) or Apache forums.
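As a rough sketch of how such a file could be read and applied, here is a small Python example. The function names `parse_mapping` and `recode` are made up, and the parsing rules are my assumptions about the format above: a line `!src=dst` opens a section, a lone `!` closes it, and `...` is just the elision from this post and is skipped. Bytes without an entry are passed through unchanged.

```python
SAMPLE = """\
!koi8-r=windows-1251
179=168
223=250
...
!
!iso-8859-5=windows-1251
161=168
234=250
...
"""


def parse_mapping(text):
    """Parse the '!src=dst' / 'code=code' format sketched in the post."""
    tables = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "...":
            continue  # blank line or the post's own elision marker
        if line.startswith("!"):
            header = line[1:]
            if not header:
                current = None  # a lone '!' closes the section
                continue
            src, dst = header.split("=", 1)
            current = tables.setdefault((src.lower(), dst.lower()), {})
        elif current is not None:
            src_code, dst_code = line.split("=", 1)
            current[int(src_code)] = int(dst_code)
    return tables


def recode(raw, table):
    """Remap each byte through the table; unmapped bytes stay unchanged."""
    lookup = bytes(table.get(b, b) for b in range(256))
    return raw.translate(lookup)


tables = parse_mapping(SAMPLE)
koi8_to_1251 = tables[("koi8-r", "windows-1251")]
fixed = recode("Ёъ".encode("koi8-r"), koi8_to_1251)
print(fixed.decode("windows-1251"))
```

With only the two sample entries per section, just Ё and ъ are converted; a real file would need an entry for every letter that sits at a different position in the two codepages.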
Once this is implemented, the only remaining problem is decoding the message body correctly in Spamihilator's "view message" window. To save development and implementation time, simply let users change (choose) the font right in the "view message" window. (I have some TrueType fonts with a remapped codepage that show text from other codepages correctly on a windows-1251 system, and such fonts are not hard for users to make themselves.)
What will the Spamihilator developer say about all this? Isn't it time to solve this problem?