• Revista PROGRAMAR: Já está disponível a edição #53 da revista programar. Faz já o download aqui!

orium

Breve tutorial de brainfuck

1 mensagem neste tópico

Boas,

Pode haver coisa mais bela que uma liguagem que nos faz feliz por fazer aparecer meia duzia de caracteres no ecrã?

Brainfuck e' uma dessas linguagens (linguagem esotérica), que, acreditem ou não e' Turing-complete.

Todo o código fonte e' composto apenas por 8 caracteres (operadores), todos os outros caracteres são ignorados (cuidado para não usar esses caracteres no que estão a' espera de ser comentários).

Antes de mais nada eu vou postar aqui o codigo fonte do compilador... sim... o codigo fonte do compilador, leram bem... Tem apenas ~270 linhas. Podem ver no cabeçalho como o compilar.

bf.asm:

;; bf.asm: Copyright (C) 1999-2001 by Brian Raiter, under the GNU
;; General Public License (version 2 or later). No warranty.
;;
;; To build:
;;	nasm -f bin -o bf bf.asm && chmod +x bf
;; To use:
;;	bf < foo.b > foo && chmod +x foo

BITS 32

;; This is the size of the data area supplied to compiled programs.

%define arraysize	30000

;; For the compiler, the text segment is also the data segment. The
;; memory image of the compiler is inside the code buffer, and is
;; modified in place to become the memory image of the compiled
;; program. The area of memory that is the data segment for compiled
;; programs is not used by the compiler. The text and data segments of
;; compiled programs are really only different areas in a single
;; segment, from the system's point of view. Both the compiler and
;; compiled programs load the entire file contents into a single
;; memory segment which is both writeable and executable.

%define	TEXTORG		0x45E9B000
%define	DATAOFFSET	0x2000
%define	DATAORG		(TEXTORG + DATAOFFSET)

;; Here begins the file image.

	org	TEXTORG

;; At the beginning of the text segment is the ELF header and the
;; program header table, the latter consisting of a single entry. The
;; two structures overlap for a space of eight bytes. Nearly all
;; unused fields in the structures are used to hold bits of code.

;; The beginning of the ELF header.

	db	0x7F, "ELF"		; ehdr.e_ident

;; The top(s) of the main compiling loop. The loop jumps back to
;; different positions, depending on how many bytes to copy into the
;; code buffer. After doing that, esi is initialized to point to the
;; epilog code chunk, a copy of edi (the pointer to the end of the
;; code buffer) is saved in ebp, the high bytes of eax are reset to
;; zero (via the exchange with ebx), and then the next character of
;; input is retrieved.

emitputchar:	add	esi, byte (putchar - decchar) - 4
emitgetchar:	lodsd
emit6bytes:	movsd
emit2bytes:	movsb
emit1byte:	movsb
compile:	lea	esi, [byte ecx + epilog - filesize]
	xchg	eax, ebx
	cmp	eax, 0x00030002		; ehdr.e_type    (0x0002)
					; ehdr.e_machine (0x0003)
	mov	ebp, edi		; ehdr.e_version
	jmp	short getchar

;; The entry point for the compiler (and compiled programs), and the
;; location of the program header table.

	dd	_start			; ehdr.e_entry
	dd	proghdr - $$		; ehdr.e_phoff

;; The last routine of the compiler, called when there is no more
;; input. The epilog code chunk is copied into the code buffer. The
;; text origin is popped off the stack into ecx, and subtracted from
;; edi to determine the size of the compiled program. This value is
;; stored in the program header table, and then is moved into edx.
;; The program then jumps to the putchar routine, which sends the
;; compiled program to stdout before falling through to the epilog
;; routine and exiting.

eof:		movsd				; ehdr.e_shoff
	xchg	eax, ecx
	pop	ecx
	sub	edi, ecx		; ehdr.e_flags
	xchg	eax, edi
	stosd
	xchg	eax, edx
	jmp	short putchar		; ehdr.e_ehsize

;; 0x20 == the size of one program header table entry.

	dw	0x20			; ehdr.e_phentsize

;; The beginning of the program header table. 1 == PT_LOAD, indicating
;; that the segment is to be loaded into memory.

proghdr:	dd	1			; ehdr.e_phnum & phdr.p_type
					; ehdr.e_shentsize
	dd	0			; ehdr.e_shnum & phdr.p_offset
					; ehdr.e_shstrndx

;; (Note that the next four bytes, in addition to containing the first
;; two instructions of the bracket routine, also comprise the memory
;; address of the text origin.)

	db	0			; phdr.p_vaddr

;; The bracket routine emits code for the "[" instruction. This
;; instruction translates to a simple "jmp near", but the target of
;; the jump will not be known until the matching "]" is seen. The
;; routine thus outputs a random target, and pushes the location of
;; the target in the code buffer onto the stack.

bracket:	mov	al, 0xE9
	inc	ebp
	push	ebp			; phdr.p_paddr
	stosd
	jmp	short emit1byte

;; This is where the size of the executable file is stored in the
;; program header table. The compiler updates this value just before
;; it outputs the compiled program. This is the only field in the two
;; headers that differs between the compiler and its compiled
;; programs. (While the compiler is reading input, the first byte of
;; this field is also used as an input buffer.)

filesize:	dd	compilersize		; phdr.p_filesz

;; The size of the program in memory. This entry creates an area of
;; bytes, arraysize in size, all initialized to zero, starting at
;; DATAORG.

	dd	DATAOFFSET + arraysize	; phdr.p_memsz

;; The code chunk for the "." instruction. eax is set to 4 to invoke
;; the write system call. ebx, the file handle to write to, is set to
;; 1 for stdout. ecx points to the buffer containing the bytes to
;; output, and edx equals the number of bytes to output. (Note that
;; the first byte of the first instruction, which is also the least
;; significant byte of the p_flags field, encodes to 0xB3. Having the
;; 2-bit set marks the memory containing the compiler, and its
;; compiled programs, as writeable.)

putchar:	mov	bl, 1			; phdr.p_flags
	mov	al, 4
	int	0x80			; phdr.p_align

;; The epilog code chunk. After restoring the initialized registers,
;; eax and ebx are both zero. eax is incremented to 1, so as to invoke
;; the exit system call. ebx specifies the process's return value.

epilog:		popa
	inc	eax
	int	0x80

;; The code chunks for the ">", "<", "+", and "-" instructions.

incptr:		inc	ecx
decptr:		dec	ecx
incchar:	inc	byte [ecx]
decchar:	dec	byte [ecx]

;; The main loop of the compiler continues here, by obtaining the next
;; character of input. This is also the code chunk for the ","
;; instruction. eax is set to 3 to invoke the read system call. ebx,
;; the file handle to read from, is set to 0 for stdin. ecx points to
;; a buffer to receive the bytes that are read, and edx equals the
;; number of bytes to read.

getchar:	mov	al, 3
	xor	ebx, ebx
	int	0x80

;; If eax is zero or negative, then there is no more input, and the
;; compiler proceeds to the eof routine.

	or	eax, eax
	jle	eof

;; Otherwise, esi is advanced four bytes (from the epilog code chunk
;; to the incptr code chunk), and the character read from the input is
;; stored in al, with the high bytes of eax reset to zero.

	lodsd
	mov	eax, [ecx]

;; The compiler compares the input character with ">" and "<". esi is
;; advanced to the next code chunk with each failed test.

	cmp	al, '>'
	jz	emit1byte
	inc	esi
	cmp	al, '<'
	jz	emit1byte
	inc	esi

;; The next four tests check for the characters "+", ",", "-", and
;; ".", respectively. These four characters are contiguous in ASCII,
;; and so are tested for by doing successive decrements of eax.

	sub	al, '+'
	jz	emit2bytes
	dec	eax
	jz	emitgetchar
	inc	esi
	inc	esi
	dec	eax
	jz	emit2bytes
	dec	eax
	jz	emitputchar

;; The remaining instructions, "[" and "]", have special routines for
;; emitting the proper code. (Note that the jump back to the main loop
;; is at the edge of the short-jump range. Routines below here
;; therefore use this jump as a relay to return to the main loop;
;; however, in order to use it correctly, the routines must be sure
;; that the zero flag is cleared at the time.)

	cmp	al, '[' - '.'
	jz	bracket
	cmp	al, ']' - '.'
relay:		jnz	compile

;; The endbracket routine emits code for the "]" instruction, as well
;; as completing the code for the matching "[". The compiler first
;; emits "cmp dh, [ecx]" and the first two bytes of a "jnz near". The
;; location of the missing target in the code for the "[" instruction
;; is then retrieved from the stack, the correct target value is
;; computed and stored, and then the current instruction's jmp target
;; is computed and emitted.

endbracket:	mov	eax, 0x850F313A
	stosd
	lea	esi, [byte edi - 8]
	pop	eax
	sub	esi, eax
	mov	[eax], esi
	sub	eax, edi
	stosd
	jmp	short relay

;; This is the entry point, for both the compiler and its compiled
;; programs. The shared initialization code sets eax and ebx to zero,
;; ecx to the beginning of the array that is the compiled programs's
;; data area, and edx to one. (This also clears the zero flag for the
;; relay jump below.) The registers are then saved on the stack, to be
;; restored at the very end.

_start:
	xor	eax, eax
	xor	ebx, ebx
	mov	ecx, DATAORG
	cdq
	inc	edx
	pusha

;; At this point, the compiler and its compiled programs diverge.
;; Although every compiled program includes all the code in this file
;; above this point, only the eleven bytes directly above are actually
;; used by both. This point is where the compiler begins storing the
;; generated code, so only the compiler sees the instructions below.
;; This routine first modifies ecx to contain TEXTORG, which is stored
;; on the stack, and then offsets it to point to filesize. edi is set
;; equal to codebuf, and then the compiler enters the main loop.

codebuf:
	mov	ch, (TEXTORG >> 8) & 0xFF
	push	ecx
	mov	cl, filesize - $$
	lea	edi, [byte ecx + codebuf - filesize]
	jmp	short relay

;; Here ends the file image.

compilersize	equ	$ - $$

Depois de compilado, movam o ficheiro para algum lugar da $PATH (ex: /usr/local/bin) (se quiserem instalar-lo claro...).

Tambem vai decerto dar jeito um Makefile para tornar a compilação dos vossos programas mais prática:

Makefile:

srcfile=source.bf # Change me
ofile=outbin      # Change me

all: $(ofile)

$(ofile): $(srcfile)
        bf < $< > $@
        chmod 700 $@

clean:
        rm -f $(ofile)

Em brainfuck o mundo e' uma array unidimensional, e' essa a memoria que têm (inicialmente a posicao no array e' 0 e todas as celulas estão a 0). Podem navegar no array, incrementar/decrementar celulas dessa array, pedir input (apenas na forma de caracteres), escrever output, e executar loops (com uma condição imutavel).

Lista de operadores:

>   Desloca-se uma celula para a direita

<   Desloca-se uma celula para a esquera

+   Incrementa o valor da celula

-     Decrementa o valor da celula

.     Escreve o caracter correspondente a' celula actual

,     Escreve o caracter para a celula actual

[     Delimita o inicio de um loop

]     Delimita o fim de um loop

Os loops realizam-se enquanto o valor da celula actual (actual no inicio/fim do ciclo, essa expressao e' avaliada a cada iteração) for não-zero.

Ex:

[-] Isto coloca a celula actual a 0(ponto) Enquanto a celula actual não for zero decrementa 

Por exemplo, para copiar (na verdade mover) um numero para outra celula pode-se fazer:

, Pede input ao user

[->+<] Decrementa a 1º celula(virgula) Move_se para a 2º(virgula) incremeta_a(virgula) e volta novamente para a 1º(ponto)
              Assim que a 1º celula estiver a 0(virgula) tudo estara' codiado(virgula) e o loop termina

> Move_se para a 2º celula
. Escreve o resultado (que deve ser igual ao input)

Um exemplo parecido que permite somar dois numeros:

,>,< Pede 2 caracteres ao user

[->+<]

>.

Nota: E' provavel que o caracter somado não tenha repesentacao grafica, um bom caracter a somar e' o newline (10).

Outro programa interesante e' o que lê uma scring e mostra invertida:

-- -- -- -- -- Simula uma iteracao(virgula) no loop esta celula vai passar a ter o valor zero(virgula) que vai também ser o nosso indicador de fim 
                  quando formos ler a string ao contrario
[
        ++ ++ ++ ++ ++ Incrementa 10 que foi o valor decrementado na iteracao anterior(virgula) com o unico proposito de fazer o loop terminar
        > Avanca para ler mais um caracter

        , Le um caracter

        -- -- -- -- --  Decremeta 10 para que se o caracter lido for o newline (que vale 10) o ciclo termine
]

< O ultimo caracter lido foi o newline(virgula) e a celula actual vale zero

[.<] Escreve todos os caracteres ate encontrar a celula a zero(virgula) coisa que e' assegurada logo no inicio do programa

++ ++ ++ ++ ++ . Escreve uma newline (que valendo 10(virgula) como temos a certeza da 1º celula estar a 0(virgula) basta isto)

Ha ate' quem seja suficientemente génio/louco para escrever em brainfuck um algorithmo de teste de primalidade: http://esoteric.sange.fi/brainfuck/bf-source/prog/PRIME.BF. Podem ver mais programas interesantes em http://esoteric.sange.fi/brainfuck/bf-source/prog/

Espero que se divirtam tanto como eu me diverti a escrever esta liguagem tão interesante.

Codigo e tutorial: Diogo Sousa aka orium (Considerem o código GPL3)

0

Partilhar esta mensagem


Link para a mensagem
Partilhar noutros sites

Crie uma conta ou ligue-se para comentar

Só membros podem comentar

Criar nova conta

Registe para ter uma conta na nossa comunidade. É fácil!


Registar nova conta

Entra

Já tem conta? Inicie sessão aqui.


Entrar Agora