Apache NuttX and small systems - NuttX Core Size
We continue our exploration of Apache NuttX for small embedded systems. In the previous post, we examined a simple "Hello, World!" example and explored how small it could be on NuttX.
Now, we take it a step further by disabling all possible NuttX features, allowing the toolchain to remove as much code as possible. This approach leaves us with the core of NuttX—components that can't be eliminated through configuration options and compiler optimizations.
Finally, we implement a trivial application using two different approaches—the POSIX-compliant method and the non-portable alternative—highlighting the trade-offs between achieving portability and minimal system size.
NuttX core image
Ready-to-compile code and configurations are available in this repository. Before going any further, I highly recommend viewing the previous post from this series, as this post builds on the information covered earlier.
Our goal now is to prepare three slightly different configurations. Each subsequent configuration will strip away key OS components until the system is no longer POSIX-compliant but remains functional.
The new application code is the simplest possible loop, ensuring that no additional OS code is included in our image:
Config 1
We start from the setup we completed last time, but we enable our new oscore application and boot into it:
CONFIG_ARCH="arm"
CONFIG_ARCH_BOARD_CUSTOM=y
CONFIG_ARCH_BOARD_CUSTOM_DIR="boards/arm/stm32/nucleo-f302r8/"
CONFIG_ARCH_BOARD_CUSTOM_DIR_RELPATH=y
CONFIG_ARCH_BOARD_CUSTOM_NAME="nucleo-f302r8"
CONFIG_ARCH_BOARD_NUCLEO_F302R8=y
CONFIG_ARCH_BUTTONS=y
CONFIG_ARCH_CHIP="stm32"
CONFIG_ARCH_CHIP_STM32=y
CONFIG_ARCH_CHIP_STM32F302R8=y
CONFIG_ARCH_MINIMAL_VECTORTABLE=y
CONFIG_ARCH_MINIMAL_VECTORTABLE_DYNAMIC=y
CONFIG_ARCH_NUSER_INTERRUPTS=5
CONFIG_BOARD_LOOPSPERMSEC=16717
CONFIG_DEBUG_FULLOPT=y
CONFIG_DEBUG_SYMBOLS=y
CONFIG_DEFAULT_SMALL=y
CONFIG_DEFAULT_TASK_STACKSIZE=384
CONFIG_IDLETHREAD_STACKSIZE=384
CONFIG_INIT_ENTRYPOINT="oscore_main"
CONFIG_INTELHEX_BINARY=y
CONFIG_LTO_FULL=y
CONFIG_NAME_MAX=0
CONFIG_NFILE_DESCRIPTORS_PER_BLOCK=3
CONFIG_PATH_MAX=32
CONFIG_PID_INITIAL_COUNT=3
CONFIG_RAILAB_MINIMAL_OSCORE=y
CONFIG_RAM_SIZE=16386
CONFIG_RAM_START=0x20000000
CONFIG_RAW_BINARY=y
CONFIG_SIG_ALLOC_ACTIONS=0
CONFIG_SIG_PREALLOC_ACTIONS=0
CONFIG_SIG_PREALLOC_IRQ_ACTIONS=0
CONFIG_START_DAY=6
CONFIG_START_MONTH=12
CONFIG_START_YEAR=2011
CONFIG_STDIO_BUFFER_SIZE=32
CONFIG_STM32_JTAG_SW_ENABLE=y
CONFIG_STM32_USART2=y
CONFIG_TASK_NAME_SIZE=0
CONFIG_USART2_RXBUFSIZE=0
CONFIG_USART2_SERIAL_CONSOLE=y
CONFIG_USART2_TXBUFSIZE=32
The resource consumption is as follows:
For comparison, let's look at the results we got for "Hello, World!":
As we can see, there is not much difference here. The empty program is slightly smaller than the printing one. With the console enabled, the overhead of printf and sleep support is negligible.
The console support code is included in the image anyway, leaving no way for the compiler to optimize it out.
The next step is to remove console support.
Config 2
In this configuration we completely disable support for the serial port and console. If our application doesn't require UART support, this is an easy optimization. The obvious downside is the lack of printing capabilities; therefore, printf-debugging becomes impossible.
Without /dev/console, the system won't be able to initialize standard I/O streams, which is a POSIX violation. Serial port support is not required, but file descriptors 0, 1, and 2 are reserved for stdin, stdout, and stderr, respectively.
NuttX allows you to redirect the standard streams to /dev/null if the console is not supported. We just need to enable support for the NULL device.
The modifications in the config are as follows:
The memory report is:
This saves 5,768 bytes of FLASH and 196 bytes of SRAM compared to the console-enabled setup with UART—a significant reduction!
Config 3
Now let's take one final step and disable both /dev/console and /dev/null. This way, we should completely remove file system support, as there are no files used in our image. Since we don't use files at all, we can also disable file descriptor cloning when a new task is started. At this point, our system is intentionally no longer POSIX-compliant.
Changes in configuration:
While compiling our new program, we notice an additional warning that appears:
external/nuttx/sched/group/group_setupidlefiles.c:115:4: warning: #warning file descriptors 0-2 are not opened [-Wcpp]
There is no available device that can be used as a backend for file descriptors 0-2. This means that any OS feature using standard I/O streams is no longer allowed.
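To make the consequence concrete, here is a small host-side sketch of the POSIX contract we have just given up. It is meant for a regular POSIX system, not for the "Config 3" image (where the file descriptor layer is gone, so a check like this may not even be available). The helper names fd_is_open and count_open_stdio are made up for this illustration.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: true if 'fd' refers to an open descriptor.
 * fcntl(F_GETFD) fails with EBADF for descriptors that are not open.
 */

bool fd_is_open(int fd)
{
  return fcntl(fd, F_GETFD) != -1;
}

/* Hypothetical helper: how many of stdin/stdout/stderr are usable.
 * A POSIX program normally expects this to be 3 at startup.
 */

int count_open_stdio(void)
{
  int count = 0;

  count += fd_is_open(STDIN_FILENO) ? 1 : 0;
  count += fd_is_open(STDOUT_FILENO) ? 1 : 0;
  count += fd_is_open(STDERR_FILENO) ? 1 : 0;

  return count;
}
```

On a typical desktop system count_open_stdio() returns 3. In our stripped-down image, the equivalent answer is 0, which is exactly what the compiler warning is telling us.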
The result is:
We see a huge reduction in FLASH and easily break the 10 KB FLASH barrier.
At this point, everything that can be disabled has been disabled, and everything the compiler is able to remove has been removed. Without modifying the kernel sources, we can't go any lower for the architecture used. I think we can call this the "NuttX Core."
Below is the complete list of symbols, with a comment about the OS module to which each one belongs:
00000001 b g_nx_initstate | sched
00000002 b g_ino | fs
00000002 t oscore_main | apps
00000004 b g_errno | libc
00000004 t g_idle_topstack | arch
00000004 d g_irqmap_count | sched
00000004 b g_lastpid | sched
00000004 b g_mmheap | mm
00000004 b g_npidhash | sched
00000004 b g_pidhash | sched
00000004 b g_reboot_notifier_list | sched
00000004 b g_running_tasks | sched
00000004 b g_system_ticks | sched
00000006 T abort | libc
00000008 b g_inactivetasks | sched
00000008 b g_pendingtasks | sched
00000008 b g_readytorun | sched
00000008 b g_sigpendingaction | sched
00000008 b g_sigpendingirqaction | sched
00000008 b g_sigpendingirqsignal | sched
00000008 b g_sigpendingsignal | sched
00000008 b g_waitingforsignal | sched
00000008 d g_wdactivelist | sched
0000000a t start | arch
0000000c T __assert | libc
0000000c d g_sync_nb | fs
0000000c t tls_get_info | libc
00000010 t panic_notifier_call_chain | sched
00000012 t memset.constprop.0 | libc
00000014 t __errno | libc
00000014 t free | mm
00000014 t sq_remfirst | misc
00000018 t irq_unexpected_isr | sched
00000018 t memcpy.constprop.0.isra.0 | libc
00000018 t sched_lock.isra.0 | sched
0000001c t arm_svcall | arch
0000001c t nxsched_gettid | sched
0000001e t inode_free | fs
00000020 t strlcpy.isra.0 | libc
00000024 t up_release_stack.isra.0 | arch
00000026 t wd_cancel.isra.0 | sched
00000028 b g_irqvector | sched
0000002c T arm_doirq | arch
0000002c t up_mdelay.constprop.0 | arch
0000002c t zalloc | mm
00000030 T up_saveusercontext | arch
00000038 t group_postinitialize | sched
00000038 b g_tasklisttable | sched
00000038 t irq_dispatch | sched
0000003e t tls_init_info | sched
00000040 t exception_direct | arch
00000040 t nxtask_start | sched
0000004c t nxsig_release_pendingsigaction | sched
0000004c t sync_reboot_handler | fs
00000050 b g_last_regs | arch
00000050 t group_initialize | sched
00000050 t irq_attach.constprop.0.isra.0 | sched
00000050 t stm32_timerisr | arch
00000054 t arm_hardfault | arch
00000054 t nxsched_release_tcb.isra.0 | sched
00000054 t nxsched_remove_readytorun | sched
0000005c t sched_unlock.isra.0 | sched
00000060 t nxsched_merge_pending | sched
00000062 T exception_common | arch
00000062 b g_irqmap | sched
0000006c t up_initial_state | arch
0000007c b g_idletcb | sched
00000096 t task_fssync | fs
0000009a t files_putlist.part.0 | fs
000000a0 t nxsched_add_readytorun | sched
000000a8 b g_kthread_group | sched
000000d0 b g_sigpool | sched
000000dc t mm_unlock | mm
000000e0 t _exit.isra.0 | sched
000000e4 T _assert | sched
000000fa t mm_delayfree.constprop.0 | mm
00000110 t mm_malloc | mm
00000120 t mm_lock | mm
0000013c t group_leave | sched
00000180 T __start | arch
00000188 T _vectors | arch
00000734 t nx_start | sched
Now, let's look at memory usage per OS module for FLASH:
OS module | sched | arch | mm | fs | libc | misc | apps |
---|---|---|---|---|---|---|---|
FLASH Size [B] | 3744 | 1424 | 1094 | 422 | 124 | 20 | 2 |
And next, for SRAM:
OS module | sched | arch | fs | libc | mm |
---|---|---|---|---|---|
SRAM Size [B] | 795 | 80 | 14 | 4 | 4 |
Most of the symbols come from sched and arch, which is what you would expect.
When adding data from the tables, we can see that the results differ from those returned after compilation. I don't know exactly where this comes from. Part of the difference may be due to data alignment, but even with that, the numbers still don't add up. If anyone knows the explanation, please let me know.
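For reference, the per-module numbers can be reproduced with a small host-side tool. The sketch below is only an illustration, and all names in it (account_symbol, module_lookup, struct module_total) are made up for this post. It assumes the annotated "SIZE TYPE NAME | MODULE" line format used in the symbol list and the following accounting convention: text symbols (t/T) count as FLASH, .bss symbols (b/B) count as SRAM, and initialized data (d/D) counts in both, because its initial values are stored in FLASH and copied to SRAM at startup. With this convention, the fs module adds up to 422 bytes of FLASH and 14 bytes of SRAM.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_MODULES 16

struct module_total
{
  char          name[16];
  unsigned long flash;   /* bytes attributed to FLASH */
  unsigned long sram;    /* bytes attributed to SRAM */
};

static struct module_total g_totals[MAX_MODULES];
static int g_nmodules;

/* Find the entry for 'name', creating it on first use. */

struct module_total *module_lookup(const char *name)
{
  int i;

  for (i = 0; i < g_nmodules; i++)
    {
      if (strcmp(g_totals[i].name, name) == 0)
        {
          return &g_totals[i];
        }
    }

  if (g_nmodules == MAX_MODULES)
    {
      return NULL;
    }

  /* The table is zero-initialized, so the copy stays NUL-terminated */

  strncpy(g_totals[g_nmodules].name, name, sizeof(g_totals[0].name) - 1);
  return &g_totals[g_nmodules++];
}

/* Account one "SIZE TYPE NAME | MODULE" line. Returns 0 on success. */

int account_symbol(const char *line)
{
  struct module_total *m;
  unsigned long size;
  char symbol[64];
  char module[16];
  char type;

  assert(line != NULL);

  if (sscanf(line, "%lx %c %63s | %15s", &size, &type, symbol, module) != 4)
    {
      return -1;
    }

  m = module_lookup(module);
  if (m == NULL)
    {
      return -1;
    }

  switch (type)
    {
      case 't': case 'T':   /* code */
        m->flash += size;
        break;

      case 'b': case 'B':   /* zero-initialized data */
        m->sram += size;
        break;

      case 'd': case 'D':   /* initialized data: image in FLASH, copy in SRAM */
        m->flash += size;
        m->sram += size;
        break;

      default:
        return -1;
    }

  return 0;
}
```

Feeding every line of the symbol list through account_symbol() and printing the table gives a quick way to redo this breakdown for a different configuration or NuttX release.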
In this configuration, we intentionally don't use files, so more OS logic has been removed by the compiler. We dropped almost all logic related to the file system, but it's interesting that there are 422 bytes of code left from fs.
We basically removed all kernel code responsible for hardware abstraction, the very layer that makes it possible to separate kernel space from user space.
Now the question is whether the OS in this state makes sense at all and can be practically used. Are we able to implement any application without files in NuttX? It depends. If we accept the loss of portability and design an application in a non-POSIX way, it's possible. In the next section I'll show how.
blinky
This time, we'll implement the classic "blinky" example. The goal here is to demonstrate the use of NuttX in a non-standard way.
Here, we have two versions of minimalistic "blinky": one implemented in a portable way using files and the other not POSIX-compliant but with minimal resource usage. The functionality of both applications will be the same; the only difference is that one will use a portable interface, while the other won't. The basis for both examples is "Config 3" from above.
POSIX-way blinky
Let's start with the file-based version. In this case, we'll use the user LED driver available in NuttX. For this, we need to disable the LED control by the OS and give the control to the user application. An alternative solution would be to use a GPIO driver, but we won't focus on that here.
The required configuration changes are:
The application code is shown below:
#include <nuttx/config.h>

#include <sys/ioctl.h>
#include <unistd.h>
#include <fcntl.h>

#include <nuttx/leds/userled.h>

#define LEDS_DEVPATH "/dev/userleds"

int main(int argc, FAR char *argv[])
{
  userled_set_t ledset = 0;  /* Start with all LEDs off */
  int ret;
  int fd;

  /* Open user LED device */

  fd = open(LEDS_DEVPATH, O_WRONLY);
  if (fd < 0)
    {
      return -1;
    }

  while (1)
    {
      /* Toggle LED */

      ledset ^= 1;

      /* Set LED */

      ret = ioctl(fd, ULEDIOC_SETALL, ledset);
      if (ret != 0)
        {
          return -1;
        }

      /* Wait some time */

      sleep(1);
    }

  return 0;
}
This version of the example gives us:
Non-portable blinky
Now, it's time for a file-free implementation to eliminate the file abstraction from our firmware. In this case, we'll use the STM32 architecture features directly, in a non-portable manner.
The user LED driver support is no longer needed, but we have to keep CONFIG_ARCH_LEDS=n.
To access architecture-specific APIs from the application context, we need to manually add the architecture directory to the build system. For instance, when compiling NuttX with CMake and our application is called blinky2, we have to add the following lines to the application's CMakeLists.txt:
target_include_directories(apps_blinky2 PRIVATE ${CMAKE_SOURCE_DIR}/arch/arm/src/stm32)
target_include_directories(apps_blinky2 PRIVATE ${CMAKE_SOURCE_DIR}/arch/arm/src/common)
The changes to the previous program are straightforward: we directly configure the GPIO, and instead of changing the LED state via the file interface, write the GPIO state directly using the architecture interface. The modified code looks as follows:
#include <nuttx/config.h>

#include <stdbool.h>
#include <unistd.h>

#include "stm32.h"

#define GPIO_LED1 (GPIO_OUTPUT|GPIO_PUSHPULL|GPIO_SPEED_50MHz| \
                   GPIO_OUTPUT_CLEAR|GPIO_PORTB|GPIO_PIN13)

int main(int argc, FAR char *argv[])
{
  bool ledon = false;

  /* Initialize LED GPIO */

  stm32_configgpio(GPIO_LED1);

  while (1)
    {
      /* Toggle LED */

      ledon = !ledon;

      /* Set LED */

      stm32_gpiowrite(GPIO_LED1, ledon);

      /* Wait some time */

      sleep(1);
    }

  return 0;
}
It's worth noting that this code can be portable across architectures that define the same API functions for GPIO. As a result, it'll work on most STM32 chips supported in NuttX. However, due to some inconsistencies in architecture ports, it's not compatible with all STM32 chips at the time of writing this.
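One way to limit the damage from such inconsistencies is to keep the architecture call behind a single hook, so only one function changes when the chip does. The following is a minimal sketch of that idea, not something taken from NuttX: struct blinker, blinker_init, blinker_toggle, and fake_led_write are hypothetical names, and the fake backend exists only so the logic can be exercised on a host. On the target, the backend would simply wrap stm32_gpiowrite(GPIO_LED1, on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook: the only function allowed to touch hardware. */

typedef void (*led_write_t)(bool on);

struct blinker
{
  led_write_t write;  /* Chip-specific backend */
  bool        on;     /* Last state written to the LED */
};

/* Host-side stand-in for the real backend (assumption for this sketch). */

static bool g_fake_pin;
static int  g_fake_writes;

void fake_led_write(bool on)
{
  g_fake_pin = on;
  g_fake_writes++;
}

/* Bind a backend and drive the LED to a known (off) state. */

void blinker_init(struct blinker *b, led_write_t write)
{
  assert(b != NULL && write != NULL);

  b->write = write;
  b->on    = false;
  b->write(b->on);
}

/* Invert the LED state, push it to the backend, and return the new state. */

bool blinker_toggle(struct blinker *b)
{
  assert(b != NULL);

  b->on = !b->on;
  b->write(b->on);
  return b->on;
}
```

The indirect call costs a few bytes, so for the absolute minimum image a compile-time macro would be the cheaper variant of the same idea.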
The resource usage for this implementation is:
The difference between the file-based version and the non-portable version is 4,380 bytes of FLASH and 84 bytes of SRAM.
This simple example demonstrates the GPIO interface, but NuttX on STM32 offers other easy-to-use low-level APIs like DMA, timers, PWM, or ADC. Additionally, bus drivers like SPI or I2C in NuttX are designed to provide low-level interfaces for kernel drivers that are used without files. We can abuse this interface and call it directly in user space. Finally, hardware description headers are available—often of much higher quality than the code provided by vendors—enabling direct manipulation of registers.
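To give a feel for what direct register manipulation looks like, here is a host-testable sketch of two typical building blocks. The helper names reg_modify32 and gpio_bsrr_value are made up for this post; in real NuttX code you would use the register access helpers and the register addresses from the hardware headers instead. The second helper assumes the usual STM32 GPIO BSRR layout, where bits 0-15 set a pin and bits 16-31 reset it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read-modify-write of a 32-bit register.
 * Bits in 'clearbits' are cleared first, then bits in 'setbits' are set.
 */

static inline void reg_modify32(volatile uint32_t *reg,
                                uint32_t clearbits, uint32_t setbits)
{
  uint32_t val;

  assert(reg != NULL);

  val  = *reg;
  val &= ~clearbits;
  val |= setbits;
  *reg = val;
}

/* Hypothetical helper: value to write to a GPIO BSRR register in order
 * to drive 'pin' high (on == true) or low (on == false). Assumes the
 * STM32 layout: set bits in the lower half-word, reset bits in the upper.
 */

static inline uint32_t gpio_bsrr_value(unsigned int pin, bool on)
{
  assert(pin < 16);

  return on ? (UINT32_C(1) << pin) : (UINT32_C(1) << (pin + 16));
}
```

For the LED on pin 13 used above, gpio_bsrr_value(13, true) yields 0x00002000 and gpio_bsrr_value(13, false) yields 0x20000000. Because BSRR is write-only and atomic, no read-modify-write is needed for the pin itself; reg_modify32 is the pattern for configuration registers.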
The presented approach is applicable only in the NuttX FLAT build, where there's no hardware-protected separation between user space and kernel code. For small systems, however, this is the only sensible build mode, since it works on less powerful chips.
Summary
We have collected some data on the size of the NuttX image under various simple scenarios. This data can serve as a baseline for future resource usage analysis in NuttX releases.
If we want to save more space, we can manually modify the kernel code by excluding functions that we know we won't use and that the compiler can't remove, for example by eliminating the remaining signal logic. However, since major improvements are unlikely in this area, I won't discuss this topic here.
Even without the file interface, NuttX still offers many features that we can use, such as the architecture-specific code, libc, libm, synchronization mechanisms, and more. By utilizing NuttX in this manner, we can achieve very low resource usage.
While it's technically possible to use non-portable interfaces, doing so is generally not recommended for NuttX users, as it sacrifices the primary advantage of the OS: portability and modularity. For minimal applications, there are certainly more appropriate tools available. In many small system cases, it's likely you don't even need an RTOS.
But... if, for some reason, you truly want to use NuttX for a small project, and you're fully aware of the disadvantages of the solutions mentioned here, do what you want with your code ;) With a few small tweaks to the build system, you can easily access architecture-specific APIs and register definitions directly from your application. Personally, I prefer to work with a single tool when possible (Emacs fan here), even if it means hacking it to suit my needs. For this reason, I push NuttX to its limits in my small projects.
That’s all for today. Now that we have explored how small NuttX can be in its simplest form, it’s time to examine the requirements for more advanced OS features that we’ve kept disabled so far. In the next post, we’ll take a closer look at these through simple examples to determine which ones are suitable for small systems.