David Kano Ikeda

Overview

Quokka is a relatively small language I've been working on (around 20 default keywords) that allows communication and allocation of resources over any standard USB cable. It shouldn't be limited by the port either: as long as both devices have either a USB port or RX/TX pins, and the ability to use them, the language works.

A quick note

This project is still under development. At the time of writing (3/5/26), I have finished most of the language (lexer, AST, parser, and testing scripts), and I plan to finish the compiler by the end of the month. View the progress here: https://github.com/davidikeda/quokka

Lexer

The lexer is custom made, in C of course. Originally I thought about going with a program that would generate the lexer (Flex/Bison), but I decided to just code it myself for more control. There are two main parts to the lexer right now: the main C program, and lexer_test.c, which is what gets turned into an executable.

lexer_test.c

//
// Created by David Ikeda on 2/10/2026.
//

#include "../lexer.h"
#include <stdio.h>
#include <stdlib.h>

static const char *tokenName(TokenType t)
{
    switch (t)
    {
        case TOK_EOF: return "EOF";
        case TOK_IDENTIFIER: return "IDENTIFIER";
        case TOK_ELSE: return "ELSE";
        case TOK_NUMBER: return "NUMBER";
        case TOK_STRING: return "STRING";
        // ... remaining token types omitted here
        default: return "UNKNOWN";
    }
}

int main(void)
{
    FILE *f = fopen("test.qk", "r");
    if (!f)
    {
        perror("test.qk");
        return 1;
    }

    Lexer *lx = lexerInit(f);

    for (;;)
    {
        Token tok = lexerNextToken(lx);
        printf("[%d:%d] %-12s %s\n",
            tok.line,
            tok.column,
            tokenName(tok.type),
            tok.value ? tok.value : "");

        if (tok.value) {
            free(tok.value);
        }

        if (tok.type == TOK_EOF)
            break;
    }

    lexerFree(lx);
    fclose(f);
    return 0;
}

test.qk is the file I use for lexer testing, and this is (hopefully) what the syntax ends up being. I aimed for a mix of C, C++, Python, and a little bit of Java.

test.qk

@import "logging.j"
@import "usb_driver.j"

new device USB1 as Keyboard;
USB1.connect();

if (USB1.status() == "connected") then {
    USB1.write(header="KEY-UP", payload="A");
    USB1.write(header="KEY-DOWN", payload="A");
} else {
    log("Keyboard not detected, aborting.");
};

After bundling and building lexer_test.c, this was the output. Something I noticed and needed to fix is that end-of-line characters like semicolons were reported one row ahead of where they actually are. They still got parsed in the correct order, but the lexer was saying they were somewhere they were not. (Fixed 2/17.)

plaintext

[1:0] AT           
[1:1] IMPORT       
[1:8] STRING       logging.j
[2:0] AT           
[2:1] IMPORT       
[2:8] STRING       usb_driver.j
[4:0] NEW          
[4:4] IDENTIFIER   device
[4:11] IDENTIFIER   USB1
[4:16] AS           
[4:19] IDENTIFIER   Keyboard
[4:27] SEMICOLON    
[5:0] IDENTIFIER   USB1
[5:4] DOT          
[5:5] CONNECT      
[5:12] LPAREN       
[5:13] RPAREN       
[5:14] SEMICOLON    
[7:0] IF           
[7:3] LPAREN       
[7:4] IDENTIFIER   USB1
[7:8] DOT          
[7:9] IDENTIFIER   status
[7:15] LPAREN       
[7:16] RPAREN       
[7:18] EQUAL        
[7:21] STRING       connected
[7:32] RPAREN       
[7:34] THEN         
[7:39] LBRACE       
[8:4] IDENTIFIER   USB1
[8:8] DOT          
[8:9] WRITE        
[8:14] LPAREN       
[8:15] IDENTIFIER   header
[8:21] ASSIGN       
[8:22] STRING       KEY-UP
[8:30] COMMA        
[8:32] IDENTIFIER   payload
[8:39] ASSIGN       
[8:40] STRING       A
[8:43] RPAREN       
[8:44] SEMICOLON    
[9:4] IDENTIFIER   USB1
[9:8] DOT          
[9:9] WRITE        
[9:14] LPAREN       
[9:15] IDENTIFIER   header
[9:21] ASSIGN       
[9:22] STRING       KEY-DOWN
[9:32] COMMA        
[9:34] IDENTIFIER   payload
[9:41] ASSIGN       
[9:42] STRING       A
[9:45] RPAREN       
[9:46] SEMICOLON    
[10:0] RBRACE       
[10:2] ELSE         
[10:7] LBRACE       
[11:4] IDENTIFIER   log
[11:7] LPAREN       
[11:8] STRING       Keyboard not detected, aborting.
[11:42] RPAREN       
[11:43] SEMICOLON    
[12:0] RBRACE       
[12:1] SEMICOLON    
[13:0] EOF
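
In the output above, words like connect and write come out as their own token types, while status and log stay IDENTIFIER. A common way to get that behavior is to read the whole word first and then look it up in a keyword table. The sketch below shows that idea only; the names SketchTokenType, SketchKeyword, and sketchClassifyWord are hypothetical and are not taken from the real lexer.h, and the table holds just a handful of the keywords visible in the test output.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of token types; the real enum lives in lexer.h. */
typedef enum {
    SK_IDENTIFIER,
    SK_IF, SK_THEN, SK_ELSE,
    SK_NEW, SK_AS, SK_IMPORT,
    SK_CONNECT, SK_WRITE
} SketchTokenType;

/* One row of the keyword table: the spelling and the token it maps to. */
typedef struct {
    const char *text;
    SketchTokenType type;
} SketchKeyword;

static const SketchKeyword sketchKeywords[] = {
    { "if",      SK_IF      },
    { "then",    SK_THEN    },
    { "else",    SK_ELSE    },
    { "new",     SK_NEW     },
    { "as",      SK_AS      },
    { "import",  SK_IMPORT  },
    { "connect", SK_CONNECT },
    { "write",   SK_WRITE   },
};

/* Return the keyword token for `word`, or SK_IDENTIFIER if it is not
   in the table. The comparison is case-sensitive (an assumption). */
static SketchTokenType sketchClassifyWord(const char *word)
{
    size_t count = sizeof sketchKeywords / sizeof sketchKeywords[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(word, sketchKeywords[i].text) == 0)
            return sketchKeywords[i].type;
    }
    return SK_IDENTIFIER;
}
```

With a table like this, adding a keyword is one new row plus one new enum value. A linear scan is fine for a table this small; with 150 or so entries, a sorted table with bsearch() or a hash lookup might be worth considering.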

The way the lexer works is pretty standard. The function that gets called the most is lexerAdvance(), which does what it sounds like: it advances the "cursor" to the next character by 1 (lx->column++).

What I did to make comments work is just to ignore them (again, very standard). This is accomplished by the lexerSkipWhitespace() function. The function is just a while loop that advances past any whitespace, and if lx->current == '/' && lx->next == '/', it keeps advancing until it reaches a newline or the end of file (EOF), effectively ignoring the comment.

Since I have not added every keyword yet, there are only 20 currently implemented out of the 152 (give or take) planned. Now I know that doesn't sound like half done at all, but it's just brainless work which will take me an hour or two. Anyway, since most of the language is the abundance of keywords, there is not much else. The rest of the lexer is making tokens (makeToken()) and identifying tokens, numerics, strings, and symbols (which personally I don't count as keywords even though they kind of are).
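
The cursor movement and comment skipping described above can be sketched roughly as follows. This is not the actual Quokka source: SketchLexer, sketchInit, sketchAdvance, and sketchSkipWhitespace are hypothetical names, and the sketch reads from an in-memory string rather than a FILE* so that it stays self-contained. The current, next, and column fields follow the names mentioned in the text; line numbering starting at 1 and columns at 0 matches the test output.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real Lexer struct. */
typedef struct {
    const char *src;  /* NUL-terminated input buffer (assumption) */
    size_t pos;       /* index of `current` within src */
    char current;     /* character under the cursor, '\0' at end of input */
    char next;        /* one character of lookahead */
    int line;         /* 1-based line number */
    int column;       /* 0-based column number */
} SketchLexer;

static void sketchInit(SketchLexer *lx, const char *src)
{
    lx->src = src;
    lx->pos = 0;
    lx->current = src[0];
    lx->next = src[0] ? src[1] : '\0';
    lx->line = 1;
    lx->column = 0;
}

/* Move the cursor forward by one character, keeping line/column in sync. */
static void sketchAdvance(SketchLexer *lx)
{
    if (lx->current == '\0')
        return;                 /* already at end of input */

    if (lx->current == '\n') {  /* leaving a newline starts a new line */
        lx->line++;
        lx->column = 0;
    } else {
        lx->column++;
    }

    lx->pos++;
    lx->current = lx->src[lx->pos];
    lx->next = lx->current ? lx->src[lx->pos + 1] : '\0';
}

/* Skip whitespace and // comments until a real token character appears. */
static void sketchSkipWhitespace(SketchLexer *lx)
{
    for (;;) {
        if (lx->current == ' ' || lx->current == '\t' ||
            lx->current == '\r' || lx->current == '\n') {
            sketchAdvance(lx);
        } else if (lx->current == '/' && lx->next == '/') {
            /* Comment: consume up to, but not including, the newline. */
            while (lx->current != '\n' && lx->current != '\0')
                sketchAdvance(lx);
        } else {
            return;
        }
    }
}
```

In this sketch, a token's position would be read from lx->line and lx->column right after sketchSkipWhitespace() returns and before any of the token's characters are consumed, so the reported position always points at the token's first character.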

AST

[placeholder for text :P]

Parser

[placeholder for text :P]

Validators

[placeholder for text :P]

Compiler

[placeholder for text :P]