Source Code

Study Notes on 《STL源码剖析》 (The Annotated STL Sources), Part 4

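20. An STL functor has to define its associated types so that adapters can extract them; this is what makes the functor adaptable. The associated types are nothing more than typedefs, everything is resolved at compile time, and there is no effect on run-time efficiency. A functor's associated types mainly describe its argument types and its return type. For convenience, the STL already defines two classes for this purpose: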

template <class Arg, class Result>
struct unary_function {
    typedef Arg argument_type;
    typedef Result result_type;
};
 
template <class Arg1, class Arg2, class Result>
struct binary_function {
    typedef Arg1 first_argument_type;
    typedef Arg2 second_argument_type;
    typedef Result result_type;
};      
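To make the role of these typedefs concrete, here is a minimal sketch of a functor and an adapter that reads its associated types. The names `my_plus` and `my_binder2nd` are hypothetical, written in the spirit of `plus` and `bind2nd`; the two base classes are redeclared locally because `std::unary_function` and `std::binary_function` were removed from the standard library in C++17.

```cpp
#include <cassert>
#include <type_traits>

// Local copies of the two base classes shown in the note.
template <class Arg, class Result>
struct unary_function {
    typedef Arg argument_type;
    typedef Result result_type;
};

template <class Arg1, class Arg2, class Result>
struct binary_function {
    typedef Arg1 first_argument_type;
    typedef Arg2 second_argument_type;
    typedef Result result_type;
};

// A functor becomes adaptable simply by inheriting the typedefs.
template <class T>
struct my_plus : binary_function<T, T, T> {
    T operator()(const T& x, const T& y) const { return x + y; }
};

// Hypothetical adapter: it binds the second argument and uses the functor's
// associated types to declare its own members and call signature.
template <class Operation>
class my_binder2nd
    : public unary_function<typename Operation::first_argument_type,
                            typename Operation::result_type> {
    Operation op;
    typename Operation::second_argument_type value;
public:
    my_binder2nd(const Operation& o,
                 const typename Operation::second_argument_type& v)
        : op(o), value(v) {}
    typename Operation::result_type
    operator()(const typename Operation::first_argument_type& x) const {
        return op(x, value);
    }
};

// The typedefs are resolved entirely at compile time.
static_assert(std::is_same<my_binder2nd<my_plus<int> >::result_type, int>::value,
              "result_type is propagated through the adapter");
```

Constructing `my_binder2nd<my_plus<int> >(my_plus<int>(), 10)` yields a unary functor that adds 10 to its argument. A functor without the typedefs would still be callable, but the adapter could not be instantiated for it.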


Saturday, October 29th, 2011 | Study Notes | No comments

Study Notes on 《STL源码剖析》 (The Annotated STL Sources), Part 3

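18. The copy algorithm uses function overloading to call memmove directly for char* and wchar_t*, and template specialization to call memmove for types with a trivial operator=. For a RandomAccessIterator it determines the loop count from the distance between first and last; for an InputIterator it keeps incrementing and checks on every step whether last has been reached.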

template <class InputIterator, class OutputIterator>
inline OutputIterator __copy(InputIterator first, InputIterator last,
	OutputIterator result, input_iterator_tag)
{
	for ( ; first != last; ++result, ++first)
		*result = *first;
	return result;
}
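The dispatch between the two loop strategies can be sketched as below. This is a simplified illustration rather than the SGI STL source: the names `my_copy` and `copy_dispatch` are made up for the example, and the memmove fast paths for char*, wchar_t* and trivially assignable types are left out.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Input iterators: the only way to know when to stop is to compare with last.
template <class InputIterator, class OutputIterator>
OutputIterator copy_dispatch(InputIterator first, InputIterator last,
                             OutputIterator result, std::input_iterator_tag) {
    for (; first != last; ++result, ++first)
        *result = *first;
    return result;
}

// Random access iterators: the trip count is known before the loop starts.
template <class RandomAccessIterator, class OutputIterator>
OutputIterator copy_dispatch(RandomAccessIterator first, RandomAccessIterator last,
                             OutputIterator result, std::random_access_iterator_tag) {
    typedef typename std::iterator_traits<RandomAccessIterator>::difference_type Distance;
    for (Distance n = last - first; n > 0; --n, ++result, ++first)
        *result = *first;
    return result;
}

// Entry point: select an overload from the iterator's category tag.
template <class InputIterator, class OutputIterator>
OutputIterator my_copy(InputIterator first, InputIterator last, OutputIterator result) {
    typedef typename std::iterator_traits<InputIterator>::iterator_category Category;
    return copy_dispatch(first, last, result, Category());
}

// Usage sketch: a list takes the input-iterator path, a vector the
// random-access path; both must produce the same result.
inline bool copy_paths_agree() {
    int raw[] = {1, 2, 3, 4};
    std::vector<int> v(raw, raw + 4);
    std::list<int> l(raw, raw + 4);
    std::vector<int> from_vec(4), from_list(4);
    my_copy(v.begin(), v.end(), from_vec.begin());
    my_copy(l.begin(), l.end(), from_list.begin());
    return from_vec == v && from_list == v;
}
```

Because the category tags form an inheritance chain, a bidirectional iterator such as list's falls back to the `input_iterator_tag` overload, while an exact match on `random_access_iterator_tag` is preferred for vectors and raw pointers.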


Wednesday, October 26th, 2011 | Study Notes | No comments

Study Notes on 《STL源码剖析》 (The Annotated STL Sources), Part 2

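12. set, map, multiset and multimap are all implemented on top of a red-black tree. A red-black tree is a balanced binary search tree. Compared with a binary search tree that has lost its balance, a balanced one spends more time on insertion and deletion but searches faster; compared with other balanced binary search trees, the red-black tree is said to perform better statistically. A red-black tree iterator holds a pointer to a node and moves according to fixed rules; it is a bidirectional iterator. As with list, every element lives in its own node, so an insertion does not invalidate other iterators, and memory is allocated and released on demand.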
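The iterator stability described in note 12 can be checked with a small sketch like the following. It uses std::set from the standard library; the function name is made up for the example.

```cpp
#include <cassert>
#include <set>

// Returns true if an iterator obtained before a batch of insertions still
// refers to the same element afterwards.
inline bool set_iterator_survives_inserts() {
    std::set<int> s;
    s.insert(50);
    std::set<int>::iterator it = s.find(50);   // points at the node holding 50
    assert(it != s.end());
    for (int i = 0; i < 100; ++i)
        s.insert(i);                           // each new element gets its own node
    // The node holding 50 was never moved, so `it` is still valid.
    // Being bidirectional, it can step to either in-order neighbour.
    std::set<int>::iterator prev = it; --prev;
    std::set<int>::iterator next = it; ++next;
    return *it == 50 && *prev == 49 && *next == 51;
}
```

The same experiment with a std::vector would not be safe, since a vector may reallocate its storage on insertion.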

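13. A binary search tree offers logarithmic average time, but that behaviour rests on the assumption that the input data is sufficiently random. A hashtable offers constant average time for insertion, deletion and search, and that behaviour is statistical in nature and does not depend on the randomness of the input data.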
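To illustrate why the logarithmic behaviour depends on the input order, here is a sketch of a plain, unbalanced binary search tree (not the red-black tree used by the STL). The types and function names are invented for the example. Inserting keys in sorted order degenerates the tree into a chain, while a well-mixed order keeps it shallow.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Plain (unbalanced) binary search tree, just enough to measure its height.
struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

inline void bst_insert(std::unique_ptr<Node>& root, int key) {
    std::unique_ptr<Node>* cur = &root;
    while (*cur)                                   // walk down to an empty slot
        cur = key < (*cur)->key ? &(*cur)->left : &(*cur)->right;
    cur->reset(new Node(key));
}

inline std::size_t bst_height(const Node* n) {
    if (!n) return 0;
    std::size_t l = bst_height(n->left.get());
    std::size_t r = bst_height(n->right.get());
    return 1 + (l > r ? l : r);
}

// Height of the tree obtained by inserting the keys in the given order.
inline std::size_t height_after_inserting(const std::vector<int>& keys) {
    std::unique_ptr<Node> root;
    for (std::size_t i = 0; i < keys.size(); ++i)
        bst_insert(root, keys[i]);
    return bst_height(root.get());
}
```

With the keys 1 to 7, the sorted order gives a height of 7, whereas the order 4, 2, 6, 1, 3, 5, 7 gives a height of 3.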

Wednesday, October 12th, 2011 | Study Notes | No comments

Decoding JScript.Encode

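I have always used scrdec18.exe to decode web pages that were encoded as JScript.Encode with Microsoft's screnc.exe. I searched around today and found that the official site also provides the source code. Hoho, it compiles cleanly under VC6, so tonight when I get home I will add it to mdecoder. These foreign developers really are adorable.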

Attachments:

Win32 command line executable : scrdec18.exe (53 Kb)
Source (will compile cleanly on most Unix systems): scrdec18.c

The official site also has an article on how to break this encoding, "Breaking The Windows Script Encoder", which is reproduced below:

Breaking The Windows Script Encoder

The Windows Script Encoder (screnc.exe) is a Microsoft tool that can be used to encode your scripts (i.e. JScript, ASP pages, VBScript). Yes: encode, not encrypt. The use of this tool is to be able to prevent people from looking at, or modifying, your scripts. Microsoft recommends using the Script Encoder to obfuscate your ASP pages, so in case your server is compromised the hacker would be unable to find out how your ASP applications work.

You can download the Windows Script Encoder at http://www.microsoft.com/downloads/details.aspx?FamilyID=e7877f67-c447-4873-b1b0-21f0626a6329&displaylang=en

The documentation already says the following:

Note that this encoding only prevents casual viewing of your code; it will not prevent the determined hacker from seeing what you’ve done and how.

(By the way, because of this text, I did not deem it necessary to inform Microsoft of this article).

Also, an encoded script is protected against tampering and modifications:

After encoding, if you change even one character in the encoded text, the integrity of the entire script is lost and it can no longer be used.

So we can make the following observations:

  • We are a “determined hacker”. *grins*
  • If it’s about “preventing casual viewing”, what’s wrong with encoding mechanisms like a simple XOR or even uuencode, base64, and URL-encoding?
  • Anyone using this tool will be convinced that it’s safe to hard-code all usernames, passwords, and “secret” algorithms into their ASP-pages. And any “determined hacker” will be able to get to them anyway.

Okay. So even Microsoft says this can be broken. Can’t be difficult then. It wasn’t. Writing this article took me at least twice the time I needed for breaking it. But I think this can be a very nice exercise for anyone who wants to learn more about analysing codes like this, with known plaintext, known ciphertext, and unknown key and algorithm. (Actually, a COM object that can do the encoding is shipped with IE 5.0, so reverse engineering this will reveal the algorithm, but that’s no fun, is it?)

So, how does this work?

The Script Encoder works in a very simple way. It takes two parameters: the filename of the file containing the script, and the name of the output file, containing the encoded script.

What part of the file will be encoded depends on the filename extension, as well as on the presence of a so-called “encoding marker”. This encoding marker allows you to exclude part of your script from being encoded. This can be very handy for JavaScripts, because the encoded scripts will only work on MSIE 5.0 or higher…. (of course this is not an issue for ASP and VB scripts that run on a web server!).

Say, you’ve got this HTML page with a script you want to hide from prying eyes:

<HTML>
<HEAD>
<TITLE>Page with secret information</TITLE>
<SCRIPT LANGUAGE="JScript">
<!--//
//**Start Encode**
alert ("this code should be kept secret!!!!");
//-->
</SCRIPT>
</HEAD>
<BODY>
This page contains secret information.
</BODY>
</HTML>

This is what it looks like after running Windows Script Encoder:

<HTML>
<HEAD>
<TITLE>Page with secret information</TITLE>
<SCRIPT LANGUAGE="JScript.Encode">
<!--//
//**Start Encode**#@~^QwAAAA==@#@&P~,l^+DDPvEY4kdP1W[n,/tK;V9P4
~V+aY,/nm.nD"Z"eE#p@#@&&JOO@*@#@&qhAAAA==^#~@&
</SCRIPT>
</HEAD>
<BODY>
This page contains secret information.
</BODY>
</HTML>

As you can see, the <script language="..."> has been changed into "JScript.Encode". The Script Encoder uses the Scripting.Encoder COM-object to do the actual encoding. The decoding will be done by the script interpreter itself (so we cannot simply call a Scripting.Decoder, because that doesn't exist).

Okay, let's play!

Plaintext: Hoi
Encoded:   #@~^FQAAAA==@#@&CGb@#@&zz O@*@#@&WwIAAA==^#~@

Plaintext: Hai
Encoded:   #@~^FQAAAA==@#@&CCb@#@&zz O@*@#@&TQIAAA==^#~@

Plaintext: HaiHai
           HaiHai
Encoded:   #@~^IgAAAA==@#@&CCbCmk@#@&CmrCmk@#@&JzRR@*@#@&mgUAAA==^#~@

Cute. As you can see, @#@& appears to be a newline (@# = CR, @& = LF), and the position of a character does (sometimes...) matter (the first time HaiHai becomes CCbCmk and the second time it's CmrCmk).
Let's just encode a line with a lot of A's:

//**Start Encode**#@~^lgAAAA==@#@&b)zbzbbzbz)bzb)bzb))zbbz)bzbbz))bzbzb)b))zb)bz)bzb))zbb))zb)bz )zb)zbzbbzbz)bzb)bzb))zbbz)bzbbz))bzbzb)b))zb)bz)bzb))zbb))zb)bz)zb)zb@#@&zJO @*@#@&vyIAAA==^#~@

The algorithm

After staring at this for some time, I discovered that the red part was repeating (actually, the entire string is repeating itself after 64 characters). Also, it seems to be that the character 'A' has three different representations: b, z, and ). If you encode a string of B's you'll see the same pattern, but with different characters.

This means the encoding will look something like this:

int pick_encoding[64] = { ... };
int lookuptable[96][3] = { ... };

char encode_char (char c, int pos)
{
    if (!specialchar (c))
        return lookuptable[c-32][pick_encoding[pos%64]];
    else
        return escapedchar (c);
}

I assumed that only the ASCII codes 32 to 126 inclusive, and 9 (TAB) are encoded. The rest is being escaped in a similar fashion as CR and LF.

What’s left is the stuff before and after the encoded string. I did not look into this (yet). It will probably contain a checksum and some information about the length of the encoded script.

The encoding tables

So now we’ll have to find out those tables for the encoding. The pick_encoding table is very simple to discover by just looking at the pattern that was the result of encoding all those A’s.

int pick_encoding[64] = { 1, 2, 0, 1, 2, 0, 2, 0, 0, 2, 0, 2, 1, 0, 2, 0, 1, 0, 2, 0, 1, 1, 2, 0, 0, 2, 1, 0, 2, 0, 0, 2, 1, 1, 0, 2, 0, 2, 0, 1, 0, 1, 1, 2, 0, 1, 0, 2, 1, 0, 2, 0, 1, 1, 2, 0, 0, 1, 1, 2, 0, 1, 0, 2 };

The string of A’s had a CR and LF in front of them, so after skipping the first two digits, you’ll see that 0, 1, 2, 0, 2, 0, 0, 2 perfectly matches b, ), z, b, z, b, b, z , having b=0, )=1 and z=2.
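As a cross-check of the table against the observed output, the following sketch predicts how a run of A's should be encoded. The function name is invented for the example; the table and the three representations of 'A' (b, ), z) are taken from the article, and the run is assumed to start at stream position 2, right after the CR and LF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// pick_encoding table as given in the article.
static const int pick_encoding[64] = {
    1, 2, 0, 1, 2, 0, 2, 0, 0, 2, 0, 2, 1, 0, 2, 0,
    1, 0, 2, 0, 1, 1, 2, 0, 0, 2, 1, 0, 2, 0, 0, 2,
    1, 1, 0, 2, 0, 2, 0, 1, 0, 1, 1, 2, 0, 1, 0, 2,
    1, 0, 2, 0, 1, 1, 2, 0, 0, 1, 1, 2, 0, 1, 0, 2
};

// Predict the encoding of `count` consecutive 'A' characters whose first
// character sits at stream position `start`.
inline std::string predict_encoded_As(std::size_t start, std::size_t count) {
    static const char rep[3] = { 'b', ')', 'z' };   // representations 0, 1, 2 of 'A'
    std::string out;
    for (std::size_t i = 0; i < count; ++i)
        out += rep[pick_encoding[(start + i) % 64]];
    return out;
}
```

For the first 62 characters the prediction reproduces the encoded string shown earlier, up to the point where the position wraps around to 0.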

The other table is a matrix that holds three different representations for each character. Which one will be used, depends on the pick_encoding table. To find out this matrix, just make a file that will cause every character to be encoded three times. Make sure the algorithm is ‘reset’ by padding the lines so each group will start on a 64-byte boundary.

   aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
!!!aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
"""aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
###aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
$$$aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Etcetera. Note that there are only 59 bytes of padding a’s because the CR and LF at the end of the line count too! (59 + 2 + 3 = 64).
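A probe file of this shape could be generated with something like the sketch below. The function names are hypothetical, and putting the TAB line first is an assumption based on the order of the resulting table; the article does not say how the TAB row was obtained.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One probe line: three copies of the character, 59 'a' characters of
// padding, then CR LF, for 64 bytes in total.
inline std::string probe_line(char c) {
    std::string line(3, c);
    line.append(59, 'a');
    line += "\r\n";
    assert(line.size() == 64);
    return line;
}

// All probe lines: TAB first (assumed), then the characters 32 through 126.
inline std::vector<std::string> make_probe_lines() {
    std::vector<std::string> lines;
    lines.push_back(probe_line('\t'));
    for (int c = 32; c <= 126; ++c)
        lines.push_back(probe_line(static_cast<char>(c)));
    return lines;
}
```

When writing the lines to disk, the file should be opened in binary mode so that the explicit CR LF pairs are not translated a second time.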

After encoding this you can remove the encoded a’s again, as well as the @#@& for the CR and LF. This is what remains:

d7i P~, “Ze JEr a:[ ^yf ]Yu ['L BvE `cv #b* eMC _Q3 ~SB OR R c z&J !TZ Fq8 +y &f2 c*W *Xl v+ G{F %0R ,1O )l= iIp @!@!@! 'x{ @*@*@* g_Q @$@$@$ b)z A$~ Z/; f9G 23A sow M!V Cu_ q(& 9Bx |Fn SJd H\t 1Hg r6} nKh p}5 I]” ?j U KP: ji` .#j q (po 5eI }t\ $,] -w’ TDY 7?% {m| =|# lCm 48( m^1 N[9 +n 0W6 oLT t44 krb L%N 3V0 Vs^ :hs xU WGK w2a ;5$ D .M /dk YOD E;! \-7 hAS 6aX XzH y". `P uk- 8N) U=?

So what is this? It's the encoded representation of the ASCII characters 9, and 32 through 126. Every character has got three different representations, so this sums up to 3*(127-32 + 1) = 288 characters.

You'll see that the < , > and @ characters are escaped too, resulting in the following table:

Esc Org
@# \r
@& \n
@! <
@* >
@$ @
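The escape table translates directly into a small routine. The sketch below only undoes the five escape sequences listed above and leaves every other character untouched (so the result is still position-encoded); the function name is made up, and copying unknown sequences through unchanged is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replace the two-character escape sequences with the characters they
// stand for. Anything else is copied through unchanged.
inline std::string unescape(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '@' && i + 1 < in.size()) {
            char repl = 0;
            switch (in[i + 1]) {
                case '#': repl = '\r'; break;
                case '&': repl = '\n'; break;
                case '!': repl = '<';  break;
                case '*': repl = '>';  break;
                case '$': repl = '@';  break;
            }
            if (repl) { out += repl; ++i; continue; }   // consume both characters
        }
        out += in[i];
    }
    return out;
}
```

For example, `unescape("@#@&CGb@#@&")` gives CR LF, the three still-encoded characters CGb, and another CR LF.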

I've removed the @!, @* and @$ from the encoded text too and replaced them with question marks, so the table will stay nice. This is what you get as a hex dump:

unsigned char encoding[288] = {
    0x64,0x37,0x69, 0x50,0x7E,0x2C, 0x22,0x5A,0x65, 0x4A,0x45,0x72,
    0x61,0x3A,0x5B, 0x5E,0x79,0x66, 0x5D,0x59,0x75, 0x5B,0x27,0x4C,
    0x42,0x76,0x45, 0x60,0x63,0x76, 0x23,0x62,0x2A, 0x65,0x4D,0x43,
    0x5F,0x51,0x33, 0x7E,0x53,0x42, 0x4F,0x52,0x20, 0x52,0x20,0x63,
    0x7A,0x26,0x4A, 0x21,0x54,0x5A, 0x46,0x71,0x38, 0x20,0x2B,0x79,
    0x26,0x66,0x32, 0x63,0x2A,0x57, 0x2A,0x58,0x6C, 0x76,0x7F,0x2B,
    0x47,0x7B,0x46, 0x25,0x30,0x52, 0x2C,0x31,0x4F, 0x29,0x6C,0x3D,
    0x69,0x49,0x70, 0x3F,0x3F,0x3F, 0x27,0x78,0x7B, 0x3F,0x3F,0x3F,
    0x67,0x5F,0x51, 0x3F,0x3F,0x3F, 0x62,0x29,0x7A, 0x41,0x24,0x7E,
    0x5A,0x2F,0x3B, 0x66,0x39,0x47, 0x32,0x33,0x41, 0x73,0x6F,0x77,
    0x4D,0x21,0x56, 0x43,0x75,0x5F, 0x71,0x28,0x26, 0x39,0x42,0x78,
    0x7C,0x46,0x6E, 0x53,0x4A,0x64, 0x48,0x5C,0x74, 0x31,0x48,0x67,
    0x72,0x36,0x7D, 0x6E,0x4B,0x68, 0x70,0x7D,0x35, 0x49,0x5D,0x22,
    0x3F,0x6A,0x55, 0x4B,0x50,0x3A, 0x6A,0x69,0x60, 0x2E,0x23,0x6A,
    0x7F,0x09,0x71, 0x28,0x70,0x6F, 0x35,0x65,0x49, 0x7D,0x74,0x5C,
    0x24,0x2C,0x5D, 0x2D,0x77,0x27, 0x54,0x44,0x59, 0x37,0x3F,0x25,
    0x7B,0x6D,0x7C, 0x3D,0x7C,0x23, 0x6C,0x43,0x6D, 0x34,0x38,0x28,
    0x6D,0x5E,0x31, 0x4E,0x5B,0x39, 0x2B,0x6E,0x7F, 0x30,0x57,0x36,
    0x6F,0x4C,0x54, 0x74,0x34,0x34, 0x6B,0x72,0x62, 0x4C,0x25,0x4E,
    0x33,0x56,0x30, 0x56,0x73,0x5E, 0x3A,0x68,0x73, 0x78,0x55,0x09,
    0x57,0x47,0x4B, 0x77,0x32,0x61, 0x3B,0x35,0x24, 0x44,0x2E,0x4D,
    0x2F,0x64,0x6B, 0x59,0x4F,0x44, 0x45,0x3B,0x21, 0x5C,0x2D,0x37,
    0x68,0x41,0x53, 0x36,0x61,0x58, 0x58,0x7A,0x48, 0x79,0x22,0x2E,
    0x09,0x60,0x50, 0x75,0x6B,0x2D, 0x38,0x4E,0x29, 0x55,0x3D,0x3F
};

So, encoding character c at position i goes as follows:

  • look up which representation to use (the first, second or third): pick_encoding[i mod 64]
  • find the representations in the huge table:
    encoding[c * 3]
  • encoded character =
    encoding[c*3 + pick_encoding[i%64]];

Because the table starts at 9 and then goes to 32, you’ll have to do some corrections. But we’ll get to that later, as we are not really interested in encoding after all. We want to be able to do some decoding!
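The index correction mentioned here could look like the following sketch. The helper names are invented for the example; the layout they assume (row 0 for TAB, rows 1 to 95 for the characters 32 to 126) follows from the order of the encoding table.

```cpp
#include <cassert>

// Row of the encoding table that holds the three representations of c,
// or -1 if c is not in the table. Row 0 is TAB; rows 1..95 are the
// characters 32..126, so 'A' (code 65) lands in row 34.
inline int table_row(unsigned char c) {
    if (c == 9) return 0;
    if (c >= 32 && c <= 126) return c - 31;
    return -1;
}

// Offset into the flat 288-byte table for character c, given the
// representation number (0, 1 or 2) selected by pick_encoding.
inline int table_offset(unsigned char c, int pick) {
    int row = table_row(c);
    assert(row >= 0 && pick >= 0 && pick < 3);
    return row * 3 + pick;
}
```

Note that '<', '>' and '@' fall inside the range but are escaped instead, which is why their rows hold question marks in the table.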

The decoding tables

The pick_encoding table will stay the same. This is because each character (except for the escaped ones, of course) will be in the same place as the original. Then, we could just look up the encoded character in the table. For instance, an ‘A’ in encoded text (hex 0x41) occurs in these places in the ‘encoding’ table:

  • row 9, group 4, representation 1 = ‘B’
  • row 10, group 3, representation 3 = ‘E’
  • row 23, group 1, representation 2 = ‘w’

So an ‘A’ in the encoded text is a B, E or w, depending on its position. Where there is a 0 in the pick_encoding table, it’s a B, for 1 it’s a w, and for 2 it’s an E.

You don’t want to go looking through the encoding table each time, trying to find those numbers. By transforming the encoding table into another table, you can just go to position 0x41 (actually, 0x41 - 31 to correct for it skipping everything below space except for TAB), and pick the correct representation.

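unsigned char transformed[3][128];   /* indexed by encoded byte; 0x7F occurs in the table */

/* Invert the encoding table: transformed[j][e] is the original character
   whose j-th representation is the encoded byte e. */
void maketrans (void)
{
    int i, j;

    for (i=31; i<=126; i++)          /* i == 31 stands for the TAB row */
        for (j=0; j<3; j++)
            transformed[j][encoding[(i-31)*3 + j]] = (i==31) ? 9 : i;
}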

With this matrix, it’s very simple to look up the original character by simply looking it up in our table. Assume i is the position of the character and c is the character again. Then:

decoded = transformed[pick_encoding[i%64]][c];
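Putting the pieces together, the sketch below decodes a run of encoded characters and can be checked against the "Hoi" and "Hai" samples from earlier in the article. The tables are the ones given in the article; `decode_run` is a made-up helper that assumes the run contains no escape sequences and that the caller knows the stream position of its first character. The `transformed` array is sized 128 here because the byte 0x7F occurs in the table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static const int pick_encoding[64] = {
    1, 2, 0, 1, 2, 0, 2, 0, 0, 2, 0, 2, 1, 0, 2, 0,
    1, 0, 2, 0, 1, 1, 2, 0, 0, 2, 1, 0, 2, 0, 0, 2,
    1, 1, 0, 2, 0, 2, 0, 1, 0, 1, 1, 2, 0, 1, 0, 2,
    1, 0, 2, 0, 1, 1, 2, 0, 0, 1, 1, 2, 0, 1, 0, 2
};

static const unsigned char encoding[288] = {
    0x64,0x37,0x69, 0x50,0x7E,0x2C, 0x22,0x5A,0x65, 0x4A,0x45,0x72,
    0x61,0x3A,0x5B, 0x5E,0x79,0x66, 0x5D,0x59,0x75, 0x5B,0x27,0x4C,
    0x42,0x76,0x45, 0x60,0x63,0x76, 0x23,0x62,0x2A, 0x65,0x4D,0x43,
    0x5F,0x51,0x33, 0x7E,0x53,0x42, 0x4F,0x52,0x20, 0x52,0x20,0x63,
    0x7A,0x26,0x4A, 0x21,0x54,0x5A, 0x46,0x71,0x38, 0x20,0x2B,0x79,
    0x26,0x66,0x32, 0x63,0x2A,0x57, 0x2A,0x58,0x6C, 0x76,0x7F,0x2B,
    0x47,0x7B,0x46, 0x25,0x30,0x52, 0x2C,0x31,0x4F, 0x29,0x6C,0x3D,
    0x69,0x49,0x70, 0x3F,0x3F,0x3F, 0x27,0x78,0x7B, 0x3F,0x3F,0x3F,
    0x67,0x5F,0x51, 0x3F,0x3F,0x3F, 0x62,0x29,0x7A, 0x41,0x24,0x7E,
    0x5A,0x2F,0x3B, 0x66,0x39,0x47, 0x32,0x33,0x41, 0x73,0x6F,0x77,
    0x4D,0x21,0x56, 0x43,0x75,0x5F, 0x71,0x28,0x26, 0x39,0x42,0x78,
    0x7C,0x46,0x6E, 0x53,0x4A,0x64, 0x48,0x5C,0x74, 0x31,0x48,0x67,
    0x72,0x36,0x7D, 0x6E,0x4B,0x68, 0x70,0x7D,0x35, 0x49,0x5D,0x22,
    0x3F,0x6A,0x55, 0x4B,0x50,0x3A, 0x6A,0x69,0x60, 0x2E,0x23,0x6A,
    0x7F,0x09,0x71, 0x28,0x70,0x6F, 0x35,0x65,0x49, 0x7D,0x74,0x5C,
    0x24,0x2C,0x5D, 0x2D,0x77,0x27, 0x54,0x44,0x59, 0x37,0x3F,0x25,
    0x7B,0x6D,0x7C, 0x3D,0x7C,0x23, 0x6C,0x43,0x6D, 0x34,0x38,0x28,
    0x6D,0x5E,0x31, 0x4E,0x5B,0x39, 0x2B,0x6E,0x7F, 0x30,0x57,0x36,
    0x6F,0x4C,0x54, 0x74,0x34,0x34, 0x6B,0x72,0x62, 0x4C,0x25,0x4E,
    0x33,0x56,0x30, 0x56,0x73,0x5E, 0x3A,0x68,0x73, 0x78,0x55,0x09,
    0x57,0x47,0x4B, 0x77,0x32,0x61, 0x3B,0x35,0x24, 0x44,0x2E,0x4D,
    0x2F,0x64,0x6B, 0x59,0x4F,0x44, 0x45,0x3B,0x21, 0x5C,0x2D,0x37,
    0x68,0x41,0x53, 0x36,0x61,0x58, 0x58,0x7A,0x48, 0x79,0x22,0x2E,
    0x09,0x60,0x50, 0x75,0x6B,0x2D, 0x38,0x4E,0x29, 0x55,0x3D,0x3F
};

static unsigned char transformed[3][128];

// Invert the encoding table, as in the article's maketrans().
inline void maketrans() {
    for (int i = 31; i <= 126; ++i)
        for (int j = 0; j < 3; ++j)
            transformed[j][encoding[(i - 31) * 3 + j]] =
                static_cast<unsigned char>(i == 31 ? 9 : i);
}

// Decode a run of encoded characters; `start` is the stream position of
// the first one. Escape sequences are not handled in this sketch.
inline std::string decode_run(const std::string& enc, std::size_t start) {
    maketrans();
    std::string out;
    for (std::size_t k = 0; k < enc.size(); ++k) {
        unsigned char c = static_cast<unsigned char>(enc[k]);
        assert(c < 128);
        out += static_cast<char>(transformed[pick_encoding[(start + k) % 64]][c]);
    }
    return out;
}
```

In the samples the script text starts at position 2, because the CR and LF in front of it occupy positions 0 and 1; `decode_run("CGb", 2)` therefore recovers Hoi.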

The encoding of the length-field

So what’s left is to find out how many characters there are to decode. If we just keep decoding stuff, we will decode part of the HTML that’s behind the encoded script. This can be avoided by stopping when a ‘<’ is encountered (‘<’ will never appear in an encoded stream), but even in the case we are looking at a ‘pure’ script file (*.js or *.vbs), there is some checksum stuff behind the actual data, which we should not decode.

I created a number of files of different size. By giving them a *.js extension the entire file is encoded without the Script Encoder looking for a start marker. The results are below (only the first 12 bytes are displayed).

Length  First 12 bytes                         ASCII
1       23 40 7E 5E 41 51 41 41-41 41 3D 3D    #@~^AQAAAA==
2       23 40 7E 5E 41 67 41 41-41 41 3D 3D    #@~^AgAAAA==
3       23 40 7E 5E 41 77 41 41-41 41 3D 3D    #@~^AwAAAA==
4       23 40 7E 5E 42 41 41 41-41 41 3D 3D    #@~^BAAAAA==
5       23 40 7E 5E 42 51 41 41-41 41 3D 3D    #@~^BQAAAA==
6       23 40 7E 5E 42 67 41 41-41 41 3D 3D    #@~^BgAAAA==
7       23 40 7E 5E 42 77 41 41-41 41 3D 3D    #@~^BwAAAA==
8       23 40 7E 5E 43 41 41 41-41 41 3D 3D    #@~^CAAAAA==
9       23 40 7E 5E 43 51 41 41-41 41 3D 3D    #@~^CQAAAA==
32      23 40 7E 5E 49 41 41 41-41 41 3D 3D    #@~^IAAAAA==
48      23 40 7E 5E 4D 41 41 41-41 41 3D 3D    #@~^MAAAAA==
80      23 40 7E 5E 55 41 41 41-41 41 3D 3D    #@~^UAAAAA==
96      23 40 7E 5E 59 41 41 41-41 41 3D 3D    #@~^YAAAAA==
103     23 40 7E 5E 5A 77 41 41-41 41 3D 3D    #@~^ZwAAAA==
104     23 40 7E 5E 61 41 41 41-41 41 3D 3D    #@~^aAAAAA==
111     23 40 7E 5E 62 77 41 41-41 41 3D 3D    #@~^bwAAAA==
116     23 40 7E 5E 64 41 41 41-41 41 3D 3D    #@~^dAAAAA==
166     23 40 7E 5E 70 67 41 41-41 41 3D 3D    #@~^pgAAAA==
216     23 40 7E 5E 32 41 41 41-41 41 3D 3D    #@~^2AAAAA==
265     23 40 7E 5E 43 51 45 41-41 41 3D 3D    #@~^CQEAAA==
451     23 40 7E 5E 77 77 45 41-41 41 3D 3D    #@~^wwEAAA==

The length seems to be encoded in the 5th to 10th byte, and 41 appears to represent zero. The first byte of the length seems to increase by one when the length increases by 4. Also, the second byte alternates between 41, 51, 67, and 77.

If you look at length 166, this value is 0x70, where it should be 0x41 + (166/4) = 0x6a. So something goes wrong, and it can be narrowed down to length 104, where it suddenly jumps from 0x5a to 0x61. This puzzled me for a long time, until I realised that 0x5a = ‘Z’ and 0x61 = ‘a’. And yes, the length turns out to be Base64 encoded indeed :)
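Reading the length field back can be sketched as follows. The function names are invented for the example. Treating the decoded bytes as a little-endian 32-bit number is an inference from the rows for lengths 265 and 451 in the table, not something the article states.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Value (0..63) of one character of the standard Base64 alphabet, or -1.
inline int b64_value(char ch) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const char* p = ch ? std::strchr(alphabet, ch) : 0;
    return p ? static_cast<int>(p - alphabet) : -1;
}

// Decode the six Base64 characters of the length field (bytes 5 to 10 of
// the stream, the part in front of "==") into a number. Six characters
// carry 36 bits; the first 32 are read as a little-endian integer.
inline unsigned long decode_length(const std::string& field) {
    assert(field.size() == 6);
    unsigned long long bits = 0;
    for (int i = 0; i < 6; ++i) {
        int v = b64_value(field[i]);
        assert(v >= 0);
        bits = (bits << 6) | static_cast<unsigned long long>(v);
    }
    bits >>= 4;                                   // drop the 4 padding bits
    unsigned long b0 = static_cast<unsigned long>((bits >> 24) & 0xFF);
    unsigned long b1 = static_cast<unsigned long>((bits >> 16) & 0xFF);
    unsigned long b2 = static_cast<unsigned long>((bits >> 8) & 0xFF);
    unsigned long b3 = static_cast<unsigned long>(bits & 0xFF);
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
}
```

Applied to the table, the field pgAAAA gives 166 and wwEAAA gives 451, which matches the lengths of the test files.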

The checksum

At the end of the encoded data is apparently some kind of checksum. I did not look into this any further.

The decoder program

The further working of the decoder program, which can be downloaded from the scrdec home page, is left as an exercise to the reader. It’s implemented as a “Turing-like” state machine. The decoder will treat .js and .vbs files as fully encoded, while .htm(l) and .asp files are seen as files that contain script amongst other things – like HTML code.

The decoder simply takes two arguments: input filename (encoded), and output filename (decoded).

There is one thing lacking in the decoder: the value of the <SCRIPT LANGUAGE="…"> attribute is not changed back into its original form. You’d better use a tool like sed for that.

Conclusion

It’s not just sad that Microsoft made a tool like this. They’ve probably asked Bill Gates’ little nephew to write this code. The really bad part is that Microsoft actually recommends people to use this piece of crap, and because of that, people will rely on it, even though the documentation hints that it’s unsafe. (Nobody reads the docs anyway…)

Security by obscurity is a bad, bad idea. Instead of encouraging that approach, Microsoft should educate programmers to find other ways to store their passwords and sensitive data, and tell them that an algorithm or any other piece of code that needs to be ‘hidden’, is just bad design.


Tuesday, July 21st, 2009 | Web Trojan Decoding | 4 comments

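MDecoder Goes Open Source

To celebrate my girlfriend passing the postgraduate entrance exam, MDecoder is now open source.

Download: http://blog.mtian.net/wp-content/uploads/2009/05/mdecoder_source.zip

I have not tidied it up and cannot be bothered to write documentation, so here are just a few quick notes:

The code is pretty poorly written; anyone who needs it will have to make do. Heh.

It requires Aogo's regular expression library; please download it yourself from http://www.aogosoft.com/downpage.asp?table=soft&id=172 .

If you find a bug, please email me at adian410@yahoo.com.cn (no promise that it will be fixed).

Update: the open-sourced code is version 0.1. Later versions are not open source, and their source code is for sale.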


Monday, May 4th, 2009 | MDecoder | 7 comments