CGI programming in assembler?!? Easily!
This article has been circulating on the Internet for quite some time, but as its author I think I have the right to repost it here. Much (if not all) of what is written here is outdated and may seem useless at first glance, but having gone down this road, six years later I can say that it was not wasted effort. So.
In this article I want to talk about the CGI interface in general, its implementation for Windows, and the use of assembly language for writing CGI programs in particular. A full description of CGI is outside the scope of this article: there is a sea of material on the subject on the Internet, and I see no point in retelling it all here.
CGI theory
CGI stands for Common Gateway Interface. As the name suggests, this interface serves as a gateway between the server (here I mean the server program) and some external program written for the OS on which the server is running. In other words, CGI defines exactly how data is passed from the server program to the CGI program and back. The interface places no restrictions on what the CGI program is written in: it can be a regular executable or any other file, as long as the server can run it (under Windows, for example, it can be a file whose extension is associated with some program). From the moment you invoke the CGI program (for example, by pressing the button of a form to which the call is attached) until you see the result in the browser window, the following happens:
- A web client (for example, a browser) creates a connection to the server specified in the URL;
- The web client sends a request to the server, usually using one of two methods, GET or POST;
- Data from a client request (eg form field values) is passed by the server using the CGI interface to the CGI program specified in the URL;
- The CGI program processes the client's data received from the server and, based on this processing, generates a response to the client, which it sends over the same CGI interface to the server, and the server, in turn, sends it directly to the client;
- The server terminates the connection with the client.
The standard CGI specification assumes that the server can communicate with the program in the following ways:
- Environment variables - they can be set by the server when starting the program;
- Standard input stream (STDIN) - with its help the server can transfer data to the program;
- Standard output stream (STDOUT) - the program can write its own output to it, which is transmitted to the server;
- Command line - the server can pass some parameters to the program in it.
Standard input/output streams are very convenient and widely used on UNIX systems, which cannot be said of Windows, so there is a CGI specification developed specifically for Windows systems called "Windows CGI". Of course, the standard input/output streams can also be used for CGI programming under Windows. I will not go into the "Windows CGI" standard here, for at least two reasons. The first and most important is that at the moment not all HTTP servers for Windows support this specification (in particular, my favorite, Apache 1.3.19, does not). You can see the second reason for yourself by typing "Windows CGI" into any search engine. I will mention only the general outline of this interface: all data from the server to the CGI program is passed through an ordinary Windows *.ini file, whose name is given to the program on the command line. The server has already carefully divided the data in the file into sections, and you only have to use the "GetPrivateProfile*" functions to extract it. The response goes back to the server through another file, whose name is given in the corresponding entry of the ini file.
What kind of data can a client pass to a CGI program? Almost any. Typically the program receives the values of the form fields that the user filled in, but it can also be arbitrary binary data, such as an image or a music file. Data can be sent to the server by two different methods, GET and POST. When we create a form on our page, we explicitly state which method should be used to send the data entered by the user. This is done with the method attribute of the main form tag, for example:
<form action="/cgi-bin/test.exe" method="GET">
When data is sent by the GET method, the browser reads the data from the form and appends it to the URL of the script after a question mark. If the form has several fields, they are separated by the "&" sign, and each field name is joined to its value by the "=" sign. For example, suppose the script "/cgi-bin/test.exe" is attached to the form's button, the first field of the form is called "your_name" and the second "your_age". The request generated by the browser may then look like this:
GET /cgi-bin/test.exe?your_name=Pupkin&your_age=90 HTTP/1.0
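The encoding the browser performs here can be reproduced in a few lines. The sketch below uses Python's standard urllib.parse module purely for illustration (Python is not part of the original article); the field names and script path are the ones from the example above.

```python
from urllib.parse import urlencode, parse_qs

# Form fields as the browser collects them (names taken from the example).
fields = {"your_name": "Pupkin", "your_age": "90"}

# name=value pairs joined with "&" and appended to the script URL after "?".
query = urlencode(fields)
request_line = "GET /cgi-bin/test.exe?" + query + " HTTP/1.0"
print(request_line)

# The receiving side can split the string back into fields.
decoded = parse_qs(query)
print(decoded)
```

Note that parse_qs returns a list of values for every name, because a form may legitimately send the same field more than once.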
The GET method has several weaknesses. The first and most important is that, because the data is transmitted in the URL, the amount of data that can be sent is limited. The second weakness also follows from the URL: confidentiality. With this kind of transfer the data remains completely exposed. So GET is fine if the form has 2-3 small fields, but what do we do if there is more data? The answer is to use the POST method!
With the POST method, the data is sent to the server as a block in the request body rather than in the URL, which frees our hands to increase the amount of information transmitted. For the form from the example above, the POST request sent to the server will look something like this:
POST /cgi-bin/test.exe HTTP/1.0
Accept: text/plain
Accept: text/html
Accept: */*
Content-Type: application/x-www-form-urlencoded
Content-Length: 28

your_name=Pupkin&your_age=90
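To make the relationship between the headers and the data block concrete, here is a small Python sketch that assembles such a request by hand. The helper name build_post_request is hypothetical, and only the headers that matter for the body are included; the point is that Content-Length is simply the byte count of the encoded form data and that an empty line separates the headers from the body.

```python
from urllib.parse import urlencode


def build_post_request(path, fields):
    """Return the raw bytes of a simple HTTP/1.0 POST request."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # the empty line separates the headers from the body
    ).encode("ascii")
    return head + body


request = build_post_request(
    "/cgi-bin/test.exe", {"your_name": "Pupkin", "your_age": "90"}
)
print(request.decode("ascii"))
```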
As mentioned above, after receiving the data the server must pass it on to the CGI program. In the standard CGI specification, the data sent by the client in a GET request is placed by the server in the program's environment variable QUERY_STRING. With a POST request the data is placed on the program's standard input stream, from where it can be read. In addition, for such a request the server sets two more environment variables, CONTENT_LENGTH and CONTENT_TYPE, which give the length of the request body in bytes and its content type.
In addition to the data itself, the server sets other environment variables for the called program. Here are some of them:
REQUEST_METHOD
The HTTP method by which the request was made
Example: REQUEST_METHOD=GET
QUERY_STRING
Query string if the GET method was used
Example: QUERY_STRING=your_name=Pupkin&your_age=90&hobby=asm
CONTENT_LENGTH
Length in bytes of the request body
Example: CONTENT_LENGTH=31
CONTENT_TYPE
Type of the request body
Example: CONTENT_TYPE=application/x-www-form-urlencoded
GATEWAY_INTERFACE
CGI protocol version
Example: GATEWAY_INTERFACE=CGI/1.1
REMOTE_ADDR
The IP address of the remote host, that is, the client who pressed the button in the form
Example: REMOTE_ADDR=10.21.23.10
REMOTE_HOST
The name of the remote host. This can be its domain name or, for example, the computer name in a Windows environment; if neither can be obtained, the field contains the IP address
Example: REMOTE_HOST=wasm.ru
SCRIPT_NAME
The name of the script used in the request.
Example: SCRIPT_NAME=/cgi-bin/gols.pl
SCRIPT_FILENAME
The name of the script file on the server.
Example: SCRIPT_FILENAME=c:/page/cgi-bin/gols.pl
SERVER_SOFTWARE
Server software
Example: SERVER_SOFTWARE=Apache/1.3.19 (Win32)
The called CGI program can read any of its environment variables set by the server and use it to its advantage.
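The rule "GET data arrives in QUERY_STRING, POST data arrives on stdin" fits in a few lines. The Python sketch below is only an illustration of that rule: the function name read_request_data is hypothetical, and the environment and input stream are passed in as parameters so the logic can be tried without a server.

```python
import io


def read_request_data(environ, stdin):
    """Return the raw form data of a GET or POST request as bytes."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # Read exactly CONTENT_LENGTH bytes from the standard input stream.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length)
    # For GET the data is in the QUERY_STRING environment variable.
    return environ.get("QUERY_STRING", "").encode("ascii", "replace")


get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "your_name=Pupkin&your_age=90"}
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "28"}

print(read_request_data(get_env, io.BytesIO(b"")))
print(read_request_data(post_env, io.BytesIO(b"your_name=Pupkin&your_age=90")))
```

In a real CGI program the two arguments would be os.environ and sys.stdin.buffer.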
That is all in brief. For more detailed information about the Common Gateway Interface, see the specialized documentation; I gave this description as a reminder, or, if you did not know it, to bring you up to speed. Now let's try to do something in practice.
Practical part
For practice we need at least three things. The first is an HTTP server for Windows; I tried all the examples on Apache 1.3.19 for Windows, which is free to download. And we need not just any server, but one configured to run CGI scripts! See the documentation of the server you are using for how to do this. The second thing we need is, of course, an assembler, and it must support building Win32 console applications. I use Tasm, but Fasm, Masm and many other *asms will do just as well. And finally, the most important thing: the desire to do it.
So, I assume that you have successfully installed and configured the server, that the server's document root contains an index.html file, and that this file is displayed nicely in the browser when you type the address 127.0.0.1. I will also assume that somewhere in the depths of the server folders there is a "cgi-bin" folder in which scripts are allowed to run.
Let's check the server setup and write a small script at the same time. Our script will be an ordinary *.bat file. I foresee the questions: how? really? Yes, an ordinary batch file. As mentioned above, the CGI specification does not distinguish between file types; the main thing is that the server can run the file and that the file, in turn, has access to stdin/stdout and the environment variables. A bat file has this access, if not in full, and for an example it suits us fine. Let's create a file with the following content:
@echo off
rem Response header
echo Content-type: text/html
echo.
rem Response body
echo "<html><body><h1>Hi!</h1>
echo "GET request received data: %QUERY_STRING%</body></html>
Let's call the file test.bat and place it in the directory for running scripts, most likely the "cgi-bin" directory. Next we need to call this script somehow. In principle this can be done directly, by typing something like "http://127.0.0.1/cgi-bin/test.bat" into the browser's address bar, but let's call it from our main page and check the operation of the GET method at the same time. Let's create an index.html file in the server's document root containing a form that sends its data to the script. A minimal page along these lines is enough (the field name your_name is only an example):
<html><body>
<form action="/cgi-bin/test.bat" method="GET">
<input type="text" name="your_name">
<input type="submit" value="send">
</form>
</body></html>
Now, when you open the server (http://127.0.0.1 in the browser's address bar), a form should appear. Type something into it and click the "send" button; if everything was done correctly, you will see the response of our bat script. Now let's see what we wrote there.
As you might guess, the "echo" command writes to stdout. The first thing we do is pass the header of our response to the server: "echo Content-type: text/html". This is a standard CGI header that says what kind of content we are sending, plain text or an HTML document in this case; there are other headers as well. A very important point is that the header must be separated from the response body by an empty line, which we produce with the next command, "echo.". Then comes the body of the response itself, a regular HTML document. In the body, for clarity, I display one of the environment variables passed to us by the server, QUERY_STRING. As already mentioned, with the GET method (which is exactly our case) it holds all the data entered by the user, and we can see it in the script's response. You may have noticed the out-of-place quotes in the last two lines of the file, immediately after "echo". They are there because of a peculiarity of bat files: HTML tags are framed by the characters "<" and ">", and in bat files these same characters mean input/output redirection, so we cannot use them freely here.
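The shape of the response (header, empty line, body) is the same whatever language the script is written in. As a rough illustration, here is the same response assembled in Python. The function name build_cgi_response is hypothetical, and the html.escape call is an addition of this sketch, not something the bat file does: it keeps user input from being interpreted as markup.

```python
import html


def build_cgi_response(query_string):
    """Return a CGI response: header line, empty line, then the HTML body."""
    header = "Content-Type: text/html\r\n"
    body = (
        "<html><body><h1>Hi!</h1>"
        "GET request received data: " + html.escape(query_string) +
        "</body></html>\r\n"
    )
    # The empty line between header and body is mandatory.
    return header + "\r\n" + body


response = build_cgi_response("your_name=Pupkin&your_age=90")
print(response)
```

A CGI program would write this string to its standard output and let the server forward it to the client.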
I recommend playing around with similar bat scripts a little; it can be very useful. Try looking at other environment variables. A small digression: on UNIX systems the command interpreter languages are very well developed, and the line between programming in a shell language and programming in a "real" programming language is in some cases very blurred, so simple scripts on UNIX systems are often written in shell languages. The Windows interpreter cmd.exe (or, earlier, command.com) is clearly too weak for these purposes.
Now let's move on to the main task of this article: actually writing a CGI program in assembler. Given all of the above, we can list what the CGI interface requires of our program:
- The program must be able to read the standard input stream (stdin) in order to access the data passed by the POST method;
- The program must be able to write to the standard output stream (stdout) in order to send the result of its work to the server;
- It follows from the first two points that, for the server to be able to pass something to our program on stdin and for the program to answer on stdout, the CGI program must be a console application;
- Our program must be able to read its environment variables.
Let's start with the last point. To access the environment variables of a Windows application, the GetEnvironmentStrings API function is used. The function takes no arguments and returns a pointer to a block of environment strings of the form NAME=VALUE, each terminated by a zero byte; the whole block ends with a double zero. When the program is launched by the server, the CGI-specific variables described above are added to the standard variables in the program's environment. When you run the program from the command line you will not see them, of course.
To write something to stdout or read from stdin, we first need to get the handles of these streams. This is done with the GetStdHandle API function, which takes one of the following values as its parameter:
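The layout of that block (zero-terminated strings, closed by one more zero) is easy to get wrong, so here is a small Python sketch that walks such a block. It does not call the Windows API; the function name parse_environment_block is hypothetical and the block is passed in as plain bytes.

```python
def parse_environment_block(block):
    """Split a NAME=VALUE\\0...\\0\\0 block into a list of strings."""
    entries = []
    pos = 0
    # An empty string (a zero byte right at pos) marks the end of the block.
    while pos < len(block) and block[pos] != 0:
        end = block.index(b"\0", pos)  # terminator of the current string
        entries.append(block[pos:end].decode("ascii", "replace"))
        pos = end + 1                  # first byte of the next string
    return entries


sample = b"REQUEST_METHOD=GET\0QUERY_STRING=your_name=Pupkin\0\0"
print(parse_environment_block(sample))
```

The assembler program later in the article does the same walk with esi as the read pointer, checking the byte after each zero to detect the end of the block.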
- STD_INPUT_HANDLE - for stdin (standard input);
- STD_OUTPUT_HANDLE - for stdout (standard output);
- STD_ERROR_HANDLE - for stderr.
The function returns the handle we need for read/write operations. The next step is to read from and write to these streams. This is done with the ordinary file operations ReadFile and WriteFile. There is one subtlety here. You might think that WriteConsole/ReadConsole could be used for this purpose. For the console that is indeed true and works fine: the results, just as with WriteFile, are displayed on the console. But this works only until we run our program as a script on the server. When the server starts our program, the handles returned by GetStdHandle are no longer console handles; they are pipe handles, which are needed for communication between two applications.
Here is a small example of what a CGI program in assembler might look like:
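The fact that a program started by another program gets a pipe instead of a console can be observed without a web server. In the Python sketch below (an illustration only, with Python standing in for both the server and the CGI program) the parent starts a child with its stdout redirected to a pipe, and the child reports whether it thinks stdout is a console.

```python
import subprocess
import sys

# Code run by the child process: report whether stdout is a console.
child_code = "import sys; sys.stdout.write(str(sys.stdout.isatty()))"

# The parent plays the role of the server: it captures the child's stdout
# through a pipe, just as a web server does with a CGI program.
result = subprocess.run([sys.executable, "-c", child_code], stdout=subprocess.PIPE)
child_saw_console = result.stdout.decode().strip()

print("child believes stdout is a console:", child_saw_console)
```

This is why console-only functions stop working under the server, while file-style functions such as WriteFile keep working on both kinds of handle.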
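.386
.model flat,stdcall
includelib import32.lib

.const
PAGE_READWRITE    = 4h
MEM_COMMIT        = 1000h
MEM_RESERVE       = 2000h
MEM_RELEASE       = 8000h
STD_INPUT_HANDLE  = -10
STD_OUTPUT_HANDLE = -11

.data
hStdout  dd ?
hStdin   dd ?
hMem     dd ?
header:
         db "Content-Type: text/html",13,10,13,10,0
start_html:
         db "<html><body><h3>The environment of a CGI program looks like this:</h3>",13,10,0
for_stdin:
         db "<h3>STDIN of the program contains:</h3>",13,10,0
end_html:
         db "</body></html>",13,10,0
nwritten dd ?
; not used by this example
toscr    db 10 dup(32)
         db " - File type",0

.code
_start:
        xor   ebx,ebx                          ; ebx = 0 for the whole program
        call  GetStdHandle,STD_OUTPUT_HANDLE
        mov   hStdout,eax
        call  GetStdHandle,STD_INPUT_HANDLE
        mov   hStdin,eax

        call  write_stdout, offset header      ; header + empty line
        call  write_stdout, offset start_html

        ; work buffer; 64 KB is assumed to be enough for this demo
        call  VirtualAlloc,ebx,10000h,MEM_COMMIT+MEM_RESERVE,PAGE_READWRITE
        mov   hMem,eax
        mov   edi,eax                          ; edi -> output buffer
        call  GetEnvironmentStringsA
        mov   esi,eax                          ; esi -> environment block

next_symbol:
        mov   al,[esi]                         ; next character of the block
        or    al,al
        jz    end_string                       ; zero: end of one NAME=VALUE string
        mov   [edi],al                         ; copy the character
next_string:
        cmpsb                                  ; used only to advance esi and edi by one
        jmp   short next_symbol
end_string:
        mov   dword ptr [edi],">rb<"           ; lands in memory as "<br>"
        add   edi,3                            ; cmpsb above adds the fourth byte
        cmp   byte ptr [esi+1],0               ; double zero: end of the block?
        jnz   next_string
        inc   edi                              ; step past the last "<br>"
        stosb                                  ; al = 0 here: terminate the string

        call  write_stdout, hMem
        call  write_stdout, offset for_stdin

        ; amount of data waiting on stdin
        ; (CONTENT_LENGTH is the more reliable source for a POST body)
        call  GetFileSize,[hStdin],ebx
        mov   edi,hMem
        call  ReadFile,[hStdin],edi,eax,offset nwritten,ebx
        add   edi,[nwritten]                   ; nwritten = bytes actually read
        mov   byte ptr [edi],0                 ; terminate what was read
        call  write_stdout, hMem
        call  write_stdout, offset end_html

        call  VirtualFree,hMem,ebx,MEM_RELEASE
        call  ExitProcess,ebx                  ; exit code 0

; write a zero-terminated string to stdout
write_stdout proc bufOffs:dword
        call  lstrlen,bufOffs
        call  WriteFile,[hStdout],bufOffs,eax,offset nwritten,0
        ret
write_stdout endp

extrn GetEnvironmentStringsA:near
extrn GetStdHandle:near
extrn ReadFile:near
extrn WriteFile:near
extrn GetFileSize:near
extrn VirtualAlloc:near
extrn VirtualFree:near
extrn ExitProcess:near
extrn lstrlen:near
ends
end _start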
The executable file is built with the commands:
tasm32.exe /ml test.asm
tlink32.exe /Tpe /ap /o test.obj
Do not forget that the program must be a console application.
You can call this program using the HTML form described above: just change the name test.bat in the form to test.exe and copy the program to /cgi-bin/. You can also set the form's method to POST; the program handles that as well.
I also want to note that the program can be called in another way. You can create a file in the cgi-bin directory, for example test.cgi, containing the single line "#!c:/_path_/test.exe", and refer to it in requests; the server, in turn, will read its first line and run the exe file. For this to work, the *.cgi extension must be registered in the HTTP server's settings as an extension for scripts. With this approach the server launches our program with the command line "test.exe path_to_test.cgi". This has several advantages. The first is that the person calling our script will not even guess what it is written in. The second is that, since we are given the name of the file containing our line, we can, for example, put any settings for our script into that file, which simplifies debugging. By the way, this is how all interpreters work: you have probably noticed that all perl/php/etc. programs contain a similar line pointing to the interpreter itself. So, when the server starts a CGI program whose extension is registered as a script in the settings, it reads the first line of the file, and if the line has the format described above, it launches the program named in that line, followed by a space and the name of the file. Suppose the line names the Perl interpreter. Having received such a gift, Perl begins executing the file; since "#" starts a comment in Perl, it skips the first line and the rest of the script runs. All in all, a convenient thing.
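The first-line lookup described here can be sketched in a few lines of Python. This is only an approximation of what a server does: the function name command_for_script is hypothetical, the paths are the placeholders used above, and arguments after the interpreter path (which real "#!" lines may carry) are deliberately not handled.

```python
def command_for_script(script_path, first_line):
    """Return [interpreter, script_path] for a '#!' line, or None otherwise."""
    if not first_line.startswith("#!"):
        return None
    interpreter = first_line[2:].strip()
    if not interpreter:
        return None
    # The script's own path is passed to the interpreter as an argument.
    return [interpreter, script_path]


command = command_for_script("c:/page/cgi-bin/test.cgi", "#!c:/_path_/test.exe\n")
print(command)
```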