ProxmoxVE

mirror of https://github.com/community-scripts/ProxmoxVE.git synced 2026-03-03 17:55:53 +00:00

Author	SHA1	Message	Date
CanbiZ (MickLesk)	ebc3512f50	fix: improve error trace propagation for telemetry - post_update_to_api: Attempts 2/3 now send medium_error (16KB truncated log) instead of short_error (generic description only). This is the primary fix — when attempt 1 fails (120KB payload too large/timeout), attempts 2/3 no longer discard all log data. - _send_abort_telemetry: Increased container fallback from 20 to 200 log lines (capped at 16KB). Added SILENT_LOGFILE as fallback source. Added exit code explanation header and error_category to payload. - get_error_text/get_full_log: Added SILENT_LOGFILE as last-resort fallback when INSTALL_LOG, combined log, and BUILD_LOG are all empty/missing.	2026-03-02 14:38:55 +01:00
CanbiZ (MickLesk)	17de8e761b	fix: replace generic exit 1 with specific exit codes in ct/ and install/ scripts (#12475) Part of #12467, scripts only (no framework changes). New exit codes 250-254 registered in api.func and error_handler.func: - 250: App download failed or version not determined - 251: App file extraction failed (corrupt/incomplete archive) - 252: App required file or resource not found - 253: App data migration required, update aborted - 254: App user declined prompt or input timed out Existing codes reused where applicable: - 10: privileged/Docker required (unifi-os-server) - 64: invalid user input (postgresql, tomcat) - 71: system error (pulse useradd) - 150: service failed to start (docker, npmplus) - 153: build failed (booklore) - 233: app not installed (evcc, endurain, grafana, loki, itsm-ng) - 236: hardware not detected (unifi-os-server /dev/net/tun) - 238: OS not supported (frigate)	2026-03-02 13:57:42 +01:00
CanbiZ (MickLesk)	5dc244a8c1	core: standardize exit codes and add mappings (#12467 ) * Standardize exit codes and add mappings Replace generic exit 1 usages with specific numeric exit codes and add corresponding explanations to the error lookup. This commit updates multiple misc/* scripts to return distinct codes for validation, Proxmox/LXC, networking, download and curl errors (e.g. 103-123, 64, 107-120, 206, 0 for explicit user cancels). It also updates curl error handling to propagate the original curl exit code and adds new entries in explain_exit_code and the error handler to improve diagnostics. * Set exit code 115 for update_os errors Change exit status from 6 to 115 in misc/alpine-install.func's update_os() error handlers when failing to download tools.func or when the expected functions are missing. This gives a distinct exit code for these specific failure cases. * Add tools/addon exit codes and use them Introduce exit codes 232-238 for Tools & Addon scripts in misc/api.func and misc/error_handler.func. Update addon scripts (tools/addon/adguardhome-sync.sh, tools/addon/copyparty.sh, tools/addon/cronmaster.sh) to return specific codes instead of generic exit 1: 238 for unsupported OS and 233 when the application is not installed/upgrade prerequisites are missing. This makes failures more descriptive and aligns scripts with the central error explanations. * Standardize exit codes in exporter addons Unify exit codes across exporter addon scripts: return 238 for unsupported OS detections and 233 when an update is requested but the exporter is not installed. Applied to nextcloud-exporter.sh, pihole-exporter.sh, prometheus-paperless-ngx-exporter.sh, and qbittorrent-exporter.sh to make failure modes distinguishable for callers/automation. * Use specific exit codes in addon scripts Replace generic exit 1 with distinct exit codes across multiple addon scripts to enable finer-grained error handling in automation. 
Exit codes introduced: 10 for Docker/Compose missing or user-declined Docker install, 233 for "nothing to update" cases, and 238 for unsupported OS cases. Affected files: tools/addon/arcane.sh, coolify.sh, dockge.sh, dokploy.sh, filebrowser-quantum.sh, filebrowser.sh, immich-public-proxy.sh, jellystat.sh, runtipi.sh. * Use specific exit codes in addon scripts Replace generic exit 1 with specific exit codes across multiple addon scripts to improve error signaling and handling. Files updated: tools/addon/add-netbird-lxc.sh (exit 238 on unsupported distro), tools/addon/add-tailscale-lxc.sh (treat user cancel as exit 0), tools/addon/glances.sh (exit 233 when not installed), tools/addon/komodo.sh (distinct exits for missing compose, legacy DB, backup/download failures, docker checks), tools/addon/netdata.sh (distinct exits for unsupported PVE versions, OS/codename detection, repo lookups), and tools/addon/phpmyadmin.sh (distinct exits for unsupported OS, network/download issues, package install/start failures, and invalid input). These changes make failures easier to identify and automate recovery or reporting. * Use specific exit codes in PVE scripts Replace generic exit 1 with distinct exit codes across tools/pve scripts to provide clearer failure signals for callers. post-pve-install.sh now returns 105 for unsupported Proxmox versions; pve-privilege-converter.sh uses 104 for non-root, 234 when no containers, and 235 for backup/conversion failures; update-apps.sh maps backup failures to 235, missing containers/selections to 234 (and UI cancellations to 0), missing backup storage to 119, and returns the actual container update exit code on failure. These changes improve diagnostics and allow external tooling to react to specific error conditions. * Standardize exit codes and behaviors Adjust exit codes and abort handling across multiple PVE helper scripts to provide clearer outcomes for automation and interactive flows. 
Changes include: - container-restore-from-backup.sh, core-restore-from-backup.sh: return 235 when no backups found (was 1). - fstrim.sh: treat user cancellation of non-ext4 warning as non-error (exit 0 instead of 1). - kernel-clean.sh: treat no selection or user abort as non-error (exit 0 instead of 1). - lxc-delete.sh: return 234 when no containers are present; treat no selection as non-error (exit 0). - nic-offloading-fix.sh: use specific non-zero codes for root check and tool install failures (exit 104, 237) and 236 when no matching interfaces (was 1). - pbs_microcode.sh, post-pmg-install.sh, post-pbs-install.sh: use distinct exit codes (232 and 105) for detected VM/PVE/unsupported distro conditions instead of generic 1. These modifications make scripts return distinct codes for different failure modes and ensure user-initiated aborts or benign conditions exit with 0 where appropriate. * Use exit 105 for unsupported PVE versions Standardize error handling by replacing generic exit 1 with exit 105 in pve_check() across multiple VM template scripts to indicate unsupported Proxmox VE versions. Also add API exit code 226 message for "Proxmox: VM disk import or post-creation setup failed" in misc/api.func. Affected files include misc/api.func and various vm/-vm.sh scripts. Use specific exit codes in VM scripts Replace generic exit 1 with distinct exit codes across vm/*.sh to make failures more actionable for callers. Changes include: use 226 for missing imported-disk references, 237 for pv installation failures, 115 for download/extract/ISO-related failures, 214 for insufficient disk space during FreeBSD decompression, and 119 for missing storage detection. Updated scripts: archlinux-vm.sh, docker-vm.sh, haos-vm.sh, openwrt-vm.sh, opnsense-vm.sh, truenas-vm.sh, umbrel-os-vm.sh.	2026-03-02 10:55:20 +01:00
CanbiZ (MickLesk)	9f15ca6242	Simplify lxc-attach logging and terminal handling Remove host-side tee capture of lxc-attach output and PIPESTATUS handling; lxc-attach is now invoked directly and the exit code is taken from $?. Simplify install log retrieval by pulling /root/.install-<SESSION_ID>.log directly and removing the fallback that used the host-captured terminal log, related temp-file size checks, and timeout logic. Remove terminal-state restores and input-draining (stty/dd) and stop redirecting reads from /dev/tty so interactive reads use standard input; similar simplifications applied to the retry flow. Also remove cleanup of the discarded capture log. These changes reduce complexity and terminal manipulation, at the cost of losing the previous terminal-capture fallback for installs that failed to produce a container-side log.	2026-03-02 10:46:25 +01:00
CanbiZ (MickLesk)	750b904abc	Flush terminal input and make repo cache global Restore and sanitize terminal state before prompting by draining stale input from /dev/tty (dd iflag=nonblock) and adding a short sleep, then perform timed reads from /dev/tty in misc/build.func and misc/error_handler.func. Also make _REPO_CACHE a global associative array (declare -gA) with fallbacks in misc/tools.func so the cache survives when tools.func is sourced inside a function scope.	2026-03-02 10:40:17 +01:00
CanbiZ (MickLesk)	96b5411d1d	Use PG_VERSION var; default to vendor repos Pass PG_VERSION from the Mealie installer (replace POSTGRES_VERSION with PG_VERSION) and update misc/tools.func to prefer vendor package repos by default. Adjusted comments/examples for setup_mysql and setup_postgresql to reflect the new default behavior, and changed the local default for USE_MYSQL_REPO to true. These changes align variable naming in the installer and clarify that official MySQL/PGDG repositories are used unless explicitly disabled.	2026-03-02 10:35:30 +01:00
CanbiZ (MickLesk)	a3404102ce	Update tools.func	2026-03-02 10:29:04 +01:00
CanbiZ (MickLesk)	374c4492d9	Read prompts from a fresh /dev/tty to avoid tty corruption Replace pre-opened _RECOVERY_TTY handling with direct reads from /dev/tty in misc/build.func and misc/error_handler.func. The change opens /dev/tty at prompt time (with stty sane) so the prompt reads aren't affected by tty state corruption from lxc-attach|tee, simplifies the read logic by using a local response variable with a timeout, and removes the pre-open/close bookkeeping for _RECOVERY_TTY.	2026-03-02 10:25:19 +01:00
CanbiZ (MickLesk)	3cfe86512d	Fix read failures in install scripts and recovery menu Two critical bugs fixed: 1. Install scripts (80+) using 'read' for interactive prompts all fail because lxc-attach stdin was redirected from /dev/null. Change to /dev/tty so install scripts like immich, elementsynapse, etc. can prompt the user interactively. 2. Recovery menu read gets 'Input/output error' from /dev/tty after the lxc-attach|tee pipeline corrupts the terminal state. Pre-open a separate file descriptor to /dev/tty BEFORE the pipeline starts. This fd survives any tty corruption and is used as fallback for the recovery menu read. Fixes the 'command not found' issue where user input falls through to the parent shell. Both build.func (main install + APT retry) and error_handler.func (fallback cleanup prompt) are updated with the same pattern.	2026-03-02 10:05:55 +01:00
CanbiZ (MickLesk)	53efcdc9df	Redirect lxc-attach stdin; restore terminal Prevent the lxc-attach pipeline from consuming the host's stdin by redirecting its stdin from /dev/null, keeping /dev/tty available for the recovery menu after SIGINT or failures (avoids "read: read error: Input/output error"). Also restore terminal state after the pipeline by running `stty sane` (errors ignored). Applied these changes to both installation invocation sites in misc/build.func.	2026-03-02 09:29:55 +01:00
CanbiZ (MickLesk)	d9e53d5a16	Remove non-TTY static spinner fallback Delete the stderr TTY check and the static spinner printf/early return in msg_info. The function now always calls color_spinner and starts the animated spinner in the background, removing the special-case for piped/non-TTY environments and simplifying terminal handling.	2026-03-02 09:26:24 +01:00
CanbiZ (MickLesk)	e3af8ad287	Only strip v for digit tags; sanitize version Improve GitHub release tag handling: only remove a leading 'v' when it's followed by a digit (avoids mangling tags like "version/..."), and sanitize the derived version string for filenames by replacing '/' with '-'. Use the sanitized version when constructing the downloaded tarball filename to prevent invalid or unexpected paths.	2026-03-02 09:16:25 +01:00
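The tag handling described in e3af8ad287 can be sketched roughly as follows. This is a minimal illustration, not the code from misc/tools.func: the function names `normalize_tag` and `sanitize_for_filename` are hypothetical and exist only for this example.

```bash
#!/usr/bin/env bash

# Hypothetical helper: derive a version string from a release tag.
# A leading "v" is removed only when a digit follows it, so tags such as
# "version/2.0" are left untouched.
normalize_tag() {
  local tag="$1"
  case "$tag" in
    v[0-9]*) printf '%s\n' "${tag#v}" ;;
    *)       printf '%s\n' "$tag" ;;
  esac
}

# Hypothetical helper: make a version string safe to embed in a filename
# by replacing every "/" with "-".
sanitize_for_filename() {
  printf '%s\n' "$1" | tr '/' '-'
}
```

For example, `sanitize_for_filename "$(normalize_tag "version/2.0")"` yields a name without path separators, which is the property the commit relies on when building the tarball filename.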
CanbiZ (MickLesk)	f2970522a9	Handle non-TTY output; simplify verbosity check Detect non-TTY stderr in msg_info() and print a static progress indicator instead of launching the background spinner (which is unreliable when output is piped). Remove the non-TTY check from is_verbose_mode() and add comments clarifying that non-TTY behavior is handled in msg_info(). Apply the same verbosity simplification to vm-core.func. This keeps spinner visuals working when passed through pipes while avoiding backgrounding issues.	2026-03-02 09:08:55 +01:00
CanbiZ (MickLesk)	edd88d33d5	tools.func: Improve stability with retry logic, caching, and debug mode (#10351 ) * refactor(tools.func): use distro packages by default for stability - fetch_and_deploy_gh_release: add validation for empty app names - Derives app name from repo if not provided - Prevents '/root/.: Is a directory' error (fixes #10342) - setup_hwaccel: fix Intel driver app names for fetch_and_deploy_gh_release - Add proper app names: intel-igc-core, intel-igc-opencl, libigdgmm12, intel-opencl-icd - setup_mariadb: use distro packages by default - Default: apt packages (default-mysql-server, mariadb-server) - Optional: USE_MARIADB_REPO=true for official MariaDB repo - Fixes GPG key/mirror availability issues - setup_mysql: use distro packages by default - Default: apt packages (default-mysql-server, mysql-server) - Optional: USE_MYSQL_REPO=true for official MySQL repo - Keeps Debian Trixie 8.4 LTS handling when using official repo - setup_postgresql: use distro packages by default - Default: apt packages (postgresql, postgresql-client) - Optional: USE_PGDG_REPO=true for official PGDG repo - setup_docker: use distro packages by default - Default: docker.io package - Optional: USE_DOCKER_REPO=true for official Docker repo - Maintains Portainer support in both modes This refactoring prioritizes stability by using well-tested distro packages while maintaining the option to use official repos for specific version requirements. 
* feat(tools.func): add retry logic and debug mode for stability New helper functions: - curl_with_retry: Robust curl wrapper with retry logic (3 attempts) - curl_api_with_retry: API calls with HTTP status handling - download_gpg_key: GPG key download with retry and dearmor support - debug_log: Conditional debug output when TOOLS_DEBUG=true Replaced critical curl calls: - MongoDB GPG key download - NodeSource GPG key download - PostgreSQL GPG key download - PHP (Sury) keyring download - MySQL GPG key download - setup_deb822_repo GPG import Benefits: - Automatic retry on transient network failures - Configurable timeouts (CURL_TIMEOUT, CURL_CONNECT_TO) - Debug mode for troubleshooting (TOOLS_DEBUG=true) - Consistent error handling across all GPG key imports * feat(tools.func): extend retry logic to all major downloads Added curl_with_retry to all critical download operations: - Adminer download - Composer installer - FFmpeg (binary and source) - Go tarball - Ghostscript source - ImageMagick source - rbenv and ruby-build - uv (astral-sh) - yq binary - Go version check Extended timeouts for large downloads: - CURL_TIMEOUT=300 for FFmpeg, Go (large tarballs) - CURL_TIMEOUT=180 for Ghostscript, ImageMagick Remaining without retry (intentional): - download_with_progress (specialized function) - Rustup installer (piped to shell) - Portainer version check (non-critical) Total curl_with_retry/download_gpg_key usage: 27 locations * typo * Fix removed features in refactor branch - Add libmfx-gen1.2 back for Intel Quick Sync Video encoding (Debian 12+13) - Restore tmpfiles.d configuration for MariaDB /run/mysqld persistence - Fix MariaDB fallback version from 11.4 to 12.2 (latest GA version) These changes were incorrectly removed in the refactor commits. 
* Optimize tools.func: fix typos, duplicate debug_log, Node.js version, PG backup, Intel VPL * Optimize tools.func: intelligent fallbacks, retry logic, caching, DNS pre-check - curl_with_retry: DNS pre-check + exponential backoff - download_gpg_key: Auto-detect key format, validation - ensure_dependencies: Batch dpkg-query check, individual fallback - install_packages_with_retry: Progressive recovery (dpkg fix, broken deps, individual packages) - verify_repo_available: Caching with TTL to avoid repeated HTTP requests - get_fallback_suite: Dynamic HTTP availability check cascade - ensure_apt_working: APT lock handling, progressive recovery - safe_service_restart: Wait-for-ready with configurable timeout, retry logic - get_latest_github_release: Fallback to tags API, prerelease support, rate limit handling * formatting * tools.func: Smarter parallel jobs calculation with load awareness - get_parallel_jobs: Add memory-based limiting (1.5GB/job), load awareness, and container detection for conservative limits - get_default_php_version: Add future versions (Debian 14, Ubuntu 26.04), update defaults to 8.3 - get_default_python_version: Add future versions, update defaults to 3.12 * fix: whitespace cleanup and indentation fix in tools.func	2026-03-02 08:44:59 +01:00
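The retry-with-exponential-backoff idea mentioned for curl_with_retry in edd88d33d5 can be illustrated with a generic wrapper. This is only a sketch under stated assumptions: `retry_with_backoff` is a hypothetical name, the doubling factor is assumed (the commit says "exponential backoff" without giving the factor), and the DNS pre-check is omitted.

```bash
#!/usr/bin/env bash

# Hypothetical helper: run a command up to max_attempts times, doubling the
# pause between attempts (assumed backoff factor of 2).
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command...>
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while :; do
    # Success ends the loop immediately.
    "$@" && return 0
    # Give up once the attempt budget is used.
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

A call such as `retry_with_backoff 3 2 curl -fsSL -o /tmp/key.gpg "$url"` would wait 2s and then 4s between the three attempts; the URL and path here are placeholders.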
CanbiZ (MickLesk)	fd67210906	Migrate: DokPloy, Komodo, Coolify, Dockge, Runtipi to Addons (#12275 ) * feat: add Docker-based tool addons for dockge, komodo, dokploy, npmplus Create addon scripts following the arcane.sh pattern for Docker-based tools that can be installed on any existing Docker LXC: - dockge: Docker Compose stack manager (port 5001) - komodo: Build/deployment system with MongoDB/FerretDB (port 9120) - dokploy: PaaS via external installer with Redis (port 3000) - npmplus: Nginx Proxy Manager Plus via Compose (port 81) Each addon includes: - Docker availability check - Install with full configuration - Update via docker compose pull - Uninstall with container cleanup - ASCII header files Original ct/ and install/ scripts are preserved for now. * refactor: convert Docker tools to addons, remove old scripts Convert dockge, komodo, dokploy, coolify from standalone ct/install scripts to addon pattern (like arcane.sh). Added: - tools/addon/dockge.sh (port 5001) - tools/addon/komodo.sh (port 9120, MongoDB/FerretDB choice) - tools/addon/dokploy.sh (port 3000, external installer) - tools/addon/coolify.sh (port 8000, external installer) - tools/headers/ for all 4 Removed: - ct/dockge.sh, ct/komodo.sh, ct/alpine-komodo.sh, ct/dokploy.sh, ct/coolify.sh - install/dockge-install.sh, install/komodo-install.sh, install/alpine-komodo-install.sh - install/dokploy-install.sh, install/coolify-install.sh - frontend/public/json/ for dockge, komodo, dokploy, coolify - tools/addon/npmplus.sh (not an addon candidate) These tools are Docker-only and fit the addon pattern: they require an existing Docker LXC and manage containers via docker compose. * feat: add addon JSON configs for dockge, komodo, dokploy, coolify Recreate JSON configs with type=addon, script paths pointing to tools/addon/.sh, null resources (addon runs on existing Docker LXC), and update instructions in notes. 
feat: add Runtipi addon + upgrade all addons with Proxmox host check, optional Docker install, Alpine support - New: tools/addon/runtipi.sh with full Alpine support (gcompat for musl) - New: tools/headers/runtipi ASCII header - Updated: runtipi.json to addon type with null resources - Removed: ct/runtipi.sh, install/runtipi-install.sh (migrated to addon) - All addons (dockge, komodo, dokploy, coolify, runtipi) now have: - check_proxmox_host(): warns when running on PVE host, default N - check_or_install_docker(): optional Docker install (Debian+Alpine) - Alpine-aware curl bootstrap and dependency installation * readd ct, update information * Create runtipi.sh * refactor: remove inline header_info from addons, use core.func get_header() - get_header() in core.func now maps APP_TYPE=addon to tools/headers/ path - Removed 5 duplicate ASCII art header_info functions from addon scripts - Addons now use the shared header_info() from core.func + tools/headers/ files * chore(tools): add Github source links to dockge, komodo, dokploy, coolify, runtipi addons * fix(runtipi): drop Alpine support; add OS compat notes to docker addon JSONs	2026-03-02 08:44:49 +01:00
CanbiZ (MickLesk)	fddc47064d	core: read from /dev/tty in all interactive prompts | fix empty or cropped logs due build process (#12406)	2026-02-28 14:51:26 +01:00
CanbiZ (MickLesk)	a2dc3f44d3	feat: graceful fallback for apt-get update failures (#12386) Add apt_update_safe() function that warns instead of aborting when apt-get update fails (e.g. enterprise repo 401 Unauthorized). Shows a helpful hint about disabling the enterprise repo when no subscription is active. Replaces direct apt-get update calls in build.func and install.func.	2026-02-27 14:39:39 +01:00
CanbiZ (MickLesk)	774bbbc6d5	core: Improve error outputs across core functions (#12378) * Improve error outputs across core functions * Update tools.func	2026-02-27 13:59:02 +01:00
CanbiZ (MickLesk)	7d79a15ddf	Fix missing libGL.so.1 in Nvidia LXC containers (#12372)	2026-02-26 22:27:46 +01:00
CanbiZ (MickLesk)	83a19adbb4	tools.func: Improve GitHub/Codeberg API error handling and error output (#12330)	2026-02-25 21:01:45 +01:00
CanbiZ (MickLesk)	b09f2db2a9	Handle job-control signals and clear tostop Prevent terminal job-control signals from suspending the script during recovery by trapping TSTP, TTIN and TTOU (instead of only TSTP) and restoring them on exit. Also clear the terminal 'tostop' flag in stop_spinner() with `stty -tostop` to avoid background spinner I/O from stopping the process group.	2026-02-25 14:45:11 +01:00
CanbiZ (MickLesk)	dd46dd2d87	core: remove duplicate traps, consolidate error handling and harden signal traps (#12316 ) * fix(zammad): configure Elasticsearch for LXC container startup - Set discovery.type: single-node (required for single-node ES) - Set xpack.security.enabled: false (not needed in local LXC) - Set bootstrap.memory_lock: false (fails in unprivileged LXC) - Add startup wait loop (up to 60s) to ensure ES is ready before Zammad installation continues Fixes #12301-related recurring Elasticsearch startup failures * refactor(api): eliminate duplicate traps, harden error handling & telemetry Phase 1 - Structural: - Remove api_exit_script() and 5 inline traps from build.func - error_handler.func is now the sole trap owner via catch_errors() - Update api.func comment reference (api_exit_script -> on_exit) Phase 2 - Quality: - Add stop_spinner() + cursor restore to error_handler(), on_interrupt(), on_terminate(), on_hangup() to prevent spinner/cursor artifacts - Enhance _send_abort_telemetry() with error text (last 20 log lines), duration calculation, and 2 retry attempts (was fire-and-forget) - Harden json_escape() to also strip DEL (0x7F) character * fix(build): show spinner during post_update_to_api to prevent Ctrl+Z abort post_update_to_api can take up to 33 seconds worst-case (3 curl attempts x 10s timeout + sleep delays). Without any terminal output during this time, users think the script is stuck and press Ctrl+Z, which prevents the recovery menu from ever appearing. Add msg_info spinner before both post_update_to_api calls in the failure path (initial report + final force retry after recovery menu). * fix(build): prevent SIGTSTP from killing recovery dialog - Replace msg_info/stop_spinner with plain echo for telemetry reporting The background spinner process in non-interactive shells (bash -c) can trigger SIGTSTP, stopping the entire process group before the recovery dialog appears. Plain echo avoids this. 
- Add trap '' TSTP at failure path entry to ignore suspension signals Prevents Ctrl+Z or terminal-related SIGTSTP from interrupting the recovery menu. Restored with trap - TSTP before exit. - Root cause: msg_info starts a background process (spinner &) that is not properly detached in non-interactive shells where job control (set -m) is OFF. The disown builtin has no effect without job control, leaving the spinner in the same process group. This can cause terminal I/O conflicts during the 33-second post_update_to_api retry window, resulting in [2]+ Stopped. * fix(test): initialize colors and remove illegal local in test harness - Call load_functions() after sourcing core.func to initialize color/formatting/icon variables (RD, GN, YW, CL, TAB, etc.) - Remove 'local' keyword from top-level scope (not inside function) - Default REPO_SOURCE to ref_api instead of main * chore: remove test-recovery-dialog.sh from branch * Revert "fix(zammad): configure Elasticsearch for LXC container startup" This reverts commit `10e450b72f`. * fix(build): show telemetry status only in verbose mode Telemetry reporting is an implementation detail that doesn't help the user during failure recovery. Wrap echo statements with VERBOSE check so they only appear when verbose mode is enabled.	2026-02-25 14:08:24 +01:00
Tempest	8c0016c0a7	Fix detection of ssh keys (#12230)	2026-02-25 08:23:14 +01:00
CanbiZ (MickLesk)	0e364adb54	core: fix broken "command not found" after err_trap (#12280)	2026-02-24 14:22:07 +01:00
CanbiZ (MickLesk)	56de2d1e39	feat(tools): add get_latest_gh_tag helper function (#12261) Adds a reusable function to fetch the latest tag from a GitHub repo. Useful for projects that only use tags, not full releases (e.g. mongodb/mongo-tools). Features: - Optional prefix filter (e.g. '100.' or 'v') - Optional prefix stripping for clean version output - Skips pre-release tags (rc, alpha, beta, dev, test) - Sorts by version (sort -V) to find the latest - Respects GITHUB_TOKEN for rate limiting	2026-02-24 11:07:40 +01:00
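The selection rules listed for get_latest_gh_tag (prefix filter, pre-release skip, `sort -V`, optional prefix stripping) can be sketched offline. This is not the helper itself: `pick_latest_tag` is a hypothetical name, it reads tag names from stdin instead of calling the GitHub API, and GITHUB_TOKEN handling is left out.

```bash
#!/usr/bin/env bash

# Hypothetical sketch: choose the newest tag from a list on stdin.
# $1: optional prefix the tag must start with (e.g. "100." or "v")
# $2: "true" to strip that prefix from the result (assumed flag)
pick_latest_tag() {
  local prefix="${1:-}" strip="${2:-false}" latest
  latest="$(
    # Keep only tags that start with the prefix (all tags if it is empty).
    awk -v p="$prefix" 'p == "" || index($0, p) == 1' |
      # Drop pre-release style tags.
      grep -Eiv 'rc|alpha|beta|dev|test' |
      # Version-aware sort, newest last.
      sort -V | tail -n 1
  )"
  [ -n "$latest" ] || return 1
  if [ "$strip" = "true" ]; then
    latest="${latest#"$prefix"}"
  fi
  printf '%s\n' "$latest"
}
```

Note that the substring-based pre-release filter is deliberately crude in this sketch; a tag that merely contains "test" or "dev" anywhere would also be skipped.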
CanbiZ (MickLesk)	5f13d29c57	fix(telemetry): prevent sporadic stuck 'configuring' status on success Root cause: post_update_to_api set POST_UPDATE_DONE=true even after all 3 retry attempts failed (curl timeout, API error). This prevented the EXIT trap (api_exit_script) from retrying with fresh attempts. Changes: - Only set POST_UPDATE_DONE=true on actual HTTP 2xx success - If all 3 attempts fail, EXIT trap gets 3 more fresh attempts - Increase timeout from 5s to 10s for final status updates (STATUS_TIMEOUT) Progress pings keep 5s (TELEMETRY_TIMEOUT) since they're lightweight - post_update_to_api_extended: add proper retry logic + HTTP code check (was fire-and-forget with no retry)	2026-02-23 17:24:06 +01:00
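The fix in 5f13d29c57 hinges on marking the update as done only after an HTTP 2xx response. A rough sketch of that control flow is shown here; `is_http_success` and `report_with_retry` are hypothetical names, and the sender is passed in as a function that prints a status code, standing in for the real curl call.

```bash
#!/usr/bin/env bash

# Hypothetical helper: true only for HTTP 2xx status codes.
is_http_success() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *)           return 1 ;;
  esac
}

# Hypothetical sketch of the retry loop.
# $1: name of a function that prints an HTTP status code (stand-in for curl)
# $2: number of attempts (defaults to 3, matching the commit description)
report_with_retry() {
  local send_cmd="$1" attempts="${2:-3}" i=1 code
  POST_UPDATE_DONE=false
  while [ "$i" -le "$attempts" ]; do
    code="$("$send_cmd" "$i" 2>/dev/null)" || code="000"
    if is_http_success "$code"; then
      # Only a confirmed 2xx marks the update as done.
      POST_UPDATE_DONE=true
      return 0
    fi
    i=$((i + 1))
  done
  # All attempts failed: the flag stays false so a later handler can retry.
  return 1
}
```

Leaving `POST_UPDATE_DONE=false` after a failed run is the point of the change: a later exit handler can then make fresh attempts instead of assuming the status was delivered.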
CanbiZ (MickLesk)	3c83654666	fix(telemetry): move 'configuring' transition to right after container creation Validation status was persisting through container start, network check, and OS customization. Now transitions to 'configuring' immediately after create_lxc_container returns. Validation only covers storage/template/cluster checks as intended.	2026-02-23 17:11:14 +01:00
CanbiZ (MickLesk)	ae3a249854	Capture lxc-attach output to host log Pipe lxc-attach output through tee into /tmp/.install-capture-${SESSION_ID}.log and use PIPESTATUS[0] to get the real lxc-attach exit code. Prefer a pulled container-side INSTALL_LOG when it exists and is >100 bytes; otherwise fall back to the host-captured terminal log (stripping ANSI codes) and append it to the combined log so get_full_log() can find it. Apply the same capture behavior to the retry path and remove temporary capture files on completion. This makes install output reliable when container-side logging is missing (DNS errors, early crashes, or missing silent() usage).	2026-02-23 17:08:38 +01:00
CanbiZ (MickLesk)	a8a1cbcf3e	feat(telemetry): add 'validation' status, fix status transitions, show 20 log lines Status flow is now: installing → validation → configuring → success/failed Changes: - post_progress_to_api() accepts optional status parameter (default: configuring) - build.func: Send 'validation' before storage/template/cluster checks - build.func: Send 'configuring' just before lxc-attach (app install) - build.func: Remove redundant progress pings during container start/network - install.func + alpine-install.func: Accept status parameter in container-side post_progress_to_api() - core.func + vm-core.func: silent() now shows last 20 lines on error (was 10)	2026-02-23 17:01:18 +01:00
CanbiZ (MickLesk)	60f9622998	fix(core): keep host-side logging on BUILD_LOG after INSTALL_LOG export After 'export INSTALL_LOG' in build.func, get_active_logfile() returned the container's INSTALL_LOG path for all host-side logging, causing msg_info/msg_ok/msg_error on the host to write to /root/.install-SESSION.log (the host file, not the container's) instead of BUILD_LOG. This made BUILD_LOG incomplete and get_full_log() unable to send full traces. Fix: Add _HOST_LOGFILE (not exported, invisible to container) so the host always logs to BUILD_LOG. Container still uses INSTALL_LOG as before.	2026-02-23 16:07:13 +01:00
CanbiZ (MickLesk)	691cec80ab	core: Enhance signal handling, reported "status" and logs (#12216) * Enhance telemetry, signal handling, and logs Improve failure telemetry and signal handling across the installer: add get_full_log() to collect/strip/truncate install logs and include them in API payloads with a truncated retry; add CONTAINER_INSTALLING flag around lxc-attach and stop containers on abort to avoid orphaned "installing/configuring" records; introduce _send_abort_telemetry() (curl fallback for container context) and _stop_container_if_installing() helpers; centralize and simplify EXIT/ERR/INT/TERM/HUP traps and handlers (including a new on_hangup handler) and update VM scripts to report numeric exit codes. Also ensure best-effort log collection is performed and tweak error categorization for certain signals. * Include full log in error telemetry Use get_full_log (up to 120KB) to populate the error telemetry field so the API receives the full installation trace; fall back to get_error_text (last ~20 lines) if the full log is empty. Removed collection and inclusion of a separate install_log field from the JSON payloads and simplified the retry payloads/comments accordingly. The change ensures error reports contain the complete trace while avoiding duplicate large log fields and keeps graceful failure handling (get_full_log || true). * Anonymize IP addresses in get_full_log Mask IPv4 addresses in logs when collecting full log output: added a sed step that replaces the last two octets with "x.x" to avoid exposing full IPs (GDPR). Also updated the comment to reflect anonymization; existing steps that strip carriage returns and ANSI escape sequences remain in place before truncating with head -c.	2026-02-23 14:30:48 +01:00
CanbiZ (MickLesk)	69de9fa57e	Improve error handling and logging for LXC builds (#12208) Prevent host-side error_handler from being triggered during in-container install/recovery by delaying re-enabling set -Eeuo pipefail and the ERR trap in misc/build.func until after install/recovery completes; add explanatory comments. Update misc/error_handler.func to fall back to BUILD_LOG if container-internal log path is unavailable, show the last 20 log lines when present, refine container vs host detection (check INSTALL_LOG file and /root), copy INSTALL_LOG into /root and write a .failed flag with the exit code for host-side detection, and ensure full-log output and container removal prompt are shown appropriately in host context. Tweak misc/core.func silent() output to include a "Full log" path and adjust formatting.	2026-02-23 13:22:09 +01:00
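The log clean-up chain described in 691cec80ab (strip carriage returns and ANSI escapes, mask IPv4 addresses, truncate with `head -c`) can be approximated like this. It is a sketch, not the body of get_full_log(): `sanitize_log` is a hypothetical name, the exact regular expressions are assumptions, and the 122880-byte default simply mirrors the "up to 120KB" figure in the message.

```bash
#!/usr/bin/env bash

# Hypothetical sketch: clean a log stream read from stdin.
# $1: maximum number of bytes to keep (assumed default: 120 KiB)
sanitize_log() {
  local max_bytes="${1:-122880}" esc
  esc="$(printf '\033')"
  # 1. Remove carriage returns.
  # 2. Remove ANSI CSI sequences such as colour codes.
  # 3. Replace the last two octets of anything shaped like an IPv4 address.
  # 4. Truncate to the byte budget.
  tr -d '\r' |
    sed "s/${esc}\[[0-9;]*[A-Za-z]//g" |
    sed -E 's/([0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3}\.[0-9]{1,3}/\1.x.x/g' |
    head -c "$max_bytes"
}
```

One caveat of this simple pattern is that four-part version strings (for example `1.2.3.4`) are masked as well, which may or may not be acceptable for a diagnostic log.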
CanbiZ (MickLesk)	1976a1715c	bookworm arc support	2026-02-23 13:08:39 +01:00
CanbiZ (MickLesk)	6ba22c82d7	Add Signed-By to Debian APT source entries Update misc/tools.func to include Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg on multiple Debian repository blocks (bullseye, bookworm, trixie/sid and their -security entries) for non-free and non-free-firmware components. This ensures APT uses the specified keyring to verify repository metadata and improves reproducible/secure apt configuration.	2026-02-23 12:59:42 +01:00
CanbiZ (MickLesk)	171d830c22	fix(tools): add GitHub API rate-limit detection and GITHUB_TOKEN support (#12176 ) - check_for_gh_release: add Authorization header when GITHUB_TOKEN is set, detect HTTP 403 and show actionable rate-limit hint - fetch_and_deploy_gh_release: improve retry loop with specific 403 handling, exponential backoff, and token export hint on rate-limit failure	2026-02-22 11:54:21 +01:00
CanbiZ (MickLesk)	491081ffbf	Add post_progress_to_api lightweight telemetry ping Introduce post_progress_to_api() in misc/api.func — a non-blocking, fire-and-forget curl ping (gated by DIAGNOSTICS and RANDOM_UUID) that updates telemetry status to "configuring". Wire this progress ping into multiple scripts (alpine-install.func, install.func, build.func, core.func) at key milestones (container start, network ready, customization, creation, cleanup) and replace/deduplicate some earlier post_to_api calls. Also update error_handler.func to always report failures immediately via post_update_to_api to ensure failures are captured even before/after container lifecycle.	2026-02-18 16:19:19 +01:00
CanbiZ (MickLesk)	6cc8877852	Add timeouts and prioritize telemetry on exit Prevent hangs when pulling logs from containers by wrapping pct pull calls with timeout (8s) and running ensure_log_on_host under timeout (10s). Always send telemetry (post_update_to_api) before attempting best-effort log collection so status is reported even if log retrieval blocks. Update EXIT/ERR/SIGHUP/SIGINT/SIGTERM traps and consolidate error/interrupt handlers to use the new timeouted log collection. Changes in misc/build.func and misc/error_handler.func.	2026-02-18 13:14:59 +01:00
CanbiZ (MickLesk)	b439960222	core: Execution ID & Telemetry Improvements (#12041 ) * fix: send telemetry BEFORE log collection in signal handlers - Swap ensure_log_on_host/post_update_to_api order in on_interrupt, on_terminate, api_exit_script, and inline SIGHUP/SIGINT/SIGTERM traps - For signal exits (>128): send telemetry immediately, then best-effort log collection - Add 2>/dev/null || true to all I/O in signal handlers to prevent SIGPIPE - Fix on_exit: exit_code=0 now reports 'done' instead of 'failed 1' - Root cause: pct pull hangs on dying containers blocked telemetry updates, leaving 595+ records stuck in 'installing' daily * feat: add execution_id to all telemetry payloads - Generate EXECUTION_ID from RANDOM_UUID in variables() - Export EXECUTION_ID to container environment - Add execution_id field to all 8 API payloads in api.func - Add execution_id to post_progress_to_api in install.func and alpine-install.func - Fallback to RANDOM_UUID when EXECUTION_ID not set (backward compat) * fix: correct telemetry type values for PVE and addon scripts - PVE scripts (tools/pve/): change type 'tool' -> 'pve' - Addon scripts (tools/addon/): fix 4 scripts that wrongly used 'tool' -> 'addon' (netdata, add-tailscale-lxc, add-netbird-lxc, all-templates) - api.func: post_tool_to_api sends type='pve', default fallback 'pve' - Aligns with PocketBase categories: lxc, vm, pve, addon * fix: persist diagnostics opt-in inside containers for addon telemetry - install.func + alpine-install.func: create /usr/local/community-scripts/diagnostics inside the container when DIAGNOSTICS=yes (from build.func export) - Enables addon scripts running later inside containers to find the opt-in - Update init_tool_telemetry default type from 'tool' to 'pve' * refactor: clean up diagnostics/telemetry opt-in system - diagnostics_check(): deduplicate heredoc (was 2x 22 lines), improve whiptail text with clear what/what-not collected, add telemetry + privacy links - diagnostics_menu(): better UX
with current status, clear enable/disable buttons, note about existing containers - variables(): change DIAGNOSTICS default from 'yes' to 'no' (safe: no telemetry before user consents via diagnostics_check) - install.func + alpine-install.func: persist BOTH yes AND no in container so opt-out is explicit (not just missing file = no) - Fix typo 'menue' -> 'menu' in config file comments * fix: no pre-selection in telemetry dialog, link to telemetry-service README - Add --defaultno so 'No, opt out' is focused by default (user must Tab to Yes) - Change privacy link from discussions/1836 to telemetry-service#privacy--compliance * fix: use radiolist for telemetry dialog (no pre-selection) - Replace --yesno with --radiolist: user must actively SPACE-select an option - Both options start as OFF (no pre-selection) - Cancel/Exit defaults to 'no' (opt-out) * simplify: inline telemetry dialog text like other whiptail dialogs * improve: telemetry dialog with more detail, link to PRIVACY.md - Add what we collect / don't collect sections back to dialog - Link to telemetry-service/docs/PRIVACY.md instead of README anchor - Update config file comment with same link	2026-02-18 10:24:06 +01:00
CanbiZ (MickLesk)	3ce3c6f613	tools/pve: add data analytics / formatting / linting (#12034 ) * core: add progress; fix exit status Introduce post_progress_to_api() in alpine-install.func and install.func to send a lightweight, fire-and-forget telemetry ping (HTTP POST) that updates an existing telemetry record to "configuring" when DIAGNOSTICS=yes and RANDOM_UUID is set. The function is non-blocking (curl -m 5, errors ignored) and is invoked during container setup and after OS updates to signal active progress. Also adjust api_exit_script() in build.func to report success (post_update_to_api "done" "0") for cases where the script exited normally but a completion status wasn't posted, instead of reporting failure. * Safer tools.func load and improved error handling Replace process-substitution sourcing of tools.func with an explicit curl -> variable -> source via /dev/stdin, adding failure messages and a check that expected functions (e.g. fetch_and_deploy_gh_release) are present (misc/alpine-install.func, misc/install.func). Add categorize_error mapping for exit code 10 -> "config" (misc/api.func). Tweak build.func: minor pipeline formatting and change the ERR trap to capture the actual exit code and only call ensure_log_on_host/post_update on non-zero exits, preventing erroneous failure reporting. * tools: add data init and auto-reporting to tools and pve section Introduce telemetry helpers in misc/api.func: _telemetry_report_exit (reports success/failure via post_tool_to_api/post_addon_to_api) and init_tool_telemetry (reads DIAGNOSTICS, starts install timer and installs an EXIT trap to auto-report). Integrate telemetry into many tools/addon and tools/pve scripts by sourcing the remote api.func and calling init_tool_telemetry (guarded with declare -f). Also apply a minor arithmetic formatting tweak in misc/build.func for RECOVERY_ATTEMPT.	2026-02-17 16:36:20 +01:00
CanbiZ (MickLesk)	f07f2cb04e	core: error-handler improvements | better exit_code handling | better tools.func source check (#12019 ) * core: add progress; fix exit status Introduce post_progress_to_api() in alpine-install.func and install.func to send a lightweight, fire-and-forget telemetry ping (HTTP POST) that updates an existing telemetry record to "configuring" when DIAGNOSTICS=yes and RANDOM_UUID is set. The function is non-blocking (curl -m 5, errors ignored) and is invoked during container setup and after OS updates to signal active progress. Also adjust api_exit_script() in build.func to report success (post_update_to_api "done" "0") for cases where the script exited normally but a completion status wasn't posted, instead of reporting failure. * Safer tools.func load and improved error handling Replace process-substitution sourcing of tools.func with an explicit curl -> variable -> source via /dev/stdin, adding failure messages and a check that expected functions (e.g. fetch_and_deploy_gh_release) are present (misc/alpine-install.func, misc/install.func). Add categorize_error mapping for exit code 10 -> "config" (misc/api.func). Tweak build.func: minor pipeline formatting and change the ERR trap to capture the actual exit code and only call ensure_log_on_host/post_update on non-zero exits, preventing erroneous failure reporting.	2026-02-17 13:25:17 +01:00
CanbiZ (MickLesk)	c9ecb1ccca	core: smart recovery for failed installs | extend exit_codes (#11221 ) * feat(build.func): smart error recovery menu for failed installations Replace simple Y/n removal prompt with interactive recovery menu: - Option 1: Remove container and exit (default, auto after 60s timeout) - Option 2: Keep container for debugging - Option 3: Retry installation with verbose mode enabled - Option 4: Retry with 1.5x RAM and +1 CPU core (OOM errors only) Improvements: - Detect OOM errors (exit codes 137, 243) and offer resource increase - Show human-readable error explanation using explain_exit_code() - Recursive rebuild preserves ALL settings from advanced/app.vars/default.vars - Settings preserved: Network (IP, Gateway, VLAN, MTU, Bridge), Features (Nesting, FUSE, TUN, GPU), Storage, SSH keys, Tags, Hostname, etc. - Show rebuild summary before retry (old→new CTID, resources, network) - New container ID generated automatically for rebuilds This helps users recover from transient failures without re-running the entire script manually. * fix(api.func): fix duplicate exit codes and add missing error codes Exit code fixes: - Remove duplicate definitions for codes 243, 254 (Node.js vs DB) - Reassign MySQL/MariaDB to 240-242, 244 (was 241-244) - Reassign MongoDB to 250-253 (was 251-254) New exit codes added (based on GitHub issues analysis): - 6: curl couldn't resolve host (DNS failure) - 7: curl failed to connect (network unreachable) - 22: curl HTTP error (404, 429 rate limit, 500) - 28: curl timeout (very common in download failures) - 35: curl SSL error - 102: APT lock held by another process - 124: Command timeout - 141: SIGPIPE (broken pipe) Also update OOM detection to include exit code 134 (SIGABRT) which is commonly seen in Node.js heap overflow issues. Fixes based on analysis of ~500 GitHub issues.
* fix(exit-codes): sync error_handler.func and api.func with conflict-free code ranges - Add curl error codes (6, 7, 22, 28, 35) - Add APT lock code (102), timeout (124), signals (134, 141) - Move Python codes: 210-212 → 160-162 (avoid Proxmox conflict) - Move PostgreSQL codes: 231-234 → 170-173 - Move MySQL/MariaDB codes: 241-244 → 180-183 - Move MongoDB codes: 251-254 → 190-193 - Keep Node.js at 243-249, Proxmox at 200-231 - Both files now synchronized with identical mappings * feat(exit-codes): add systemd and build error codes (150-154) - 150: Systemd service failed to start - 151: Systemd service unit not found - 152: Permission denied (EACCES) - 153: Build/compile failed (make/gcc/cmake) - 154: Node.js native addon build failed (node-gyp) Based on issue analysis: 57 service failures, 25 build failures, 22 node-gyp issues * fix(build): restore smart recovery and add OOM/DNS retry paths * feat(build): APT in-place repair, exit 1 subclassification, new exit codes - Add APT/DPKG in-place recovery: detects exit 100/101/102/255 and exit 1 with APT log patterns, offers to repair dpkg state and re-run install script without destroying the container - Add exit 1 subclassification: analyzes combined log to identify root cause (APT, OOM, network, command-not-found) and routes to appropriate recovery option - Add exit 10 hint: shows privileged mode / nesting suggestion - Add exit 127 hint: extracts missing command name from logs - Refactor recovery menu: use named option variables (APT_OPTION, OOM_OPTION, DNS_OPTION) instead of hardcoded option numbers, supports up to 6 dynamic options cleanly - Map missing exit codes in api.func: curl 27/36/45/47/55, signals 129 (SIGHUP) / 131 (SIGQUIT), npm 239 * feat(api+build): map 25 more exit codes, add SIGHUP trap, network/perm hints api.func: - Map 25+ new exit codes that were showing as 'Unknown' in telemetry: curl: 3, 16, 18, 24, 26, 32-34, 39, 44, 46, 48, 51, 52, 57, 59, 61, 63, 79, 92, 95; signals: 125, 132, 144, 146 - Update 
code 8 description (FTP + apk untrusted key) - Update header comment with full supported ranges build.func: - Add SIGHUP trap: reports 'failed/129' to API when terminal is closed, should significantly reduce the 2841 stuck 'installing' records - Add exit 52 (empty reply) and 57 (poll error) to network issue detection for DNS override recovery option - Add exit 125/126 hint: suggests privileged mode for permission errors * fix: sync error_handler fallback, Alpine APK repair, retry limit error_handler.func: - Sync fallback explain_exit_code() with api.func: add 25+ codes that were missing (curl 16/18/24/26/27/32-34/36/39/44-48/51/52/55/57/59/ 61/63/79/92/95, signals 125/129/131/132/144/146, npm 239, code 3/8) - Ensures consistent error descriptions even when api.func isn't loaded build.func: - Alpine APK repair: detect var_os=alpine and run 'apk fix && apk cache clean && apk update' instead of apt-get/dpkg commands - Show 'Repair APK state' instead of 'APT/DPKG' in menu for Alpine - Retry safety counter: OOM x2 retry limited to max 2 attempts (prevents infinite RAM doubling via RECOVERY_ATTEMPT env var) - Show attempt count in rebuild summary * fix(build): preserve exit code in ERR trap to prevent false exit_code=0 The ERR trap called ensure_log_on_host before post_update_to_api, which reset $? to 0 (success). This caused ~15-20 records/day to be reported as 'failed' with exit_code=0 instead of the actual error code. Root cause chain: 1. Command fails with exit code N → ERR trap fires ($? = N) 2. ensure_log_on_host succeeds → $? becomes 0 3. post_update_to_api 'failed' '$?' → sends 'failed/0' (wrong!) 4. POST_UPDATE_DONE=true → EXIT trap skips the correct code Fix: capture $? into _ERR_CODE before ensure_log_on_host runs. * Implement telemetry settings and repo source detection Add telemetry configuration and repository source detection function.	2026-02-17 12:14:46 +01:00
CanbiZ (MickLesk)	2dddeaf966	Call get_lxc_ip in start() before updates (#12015 )	2026-02-17 09:08:09 +01:00
CanbiZ (MickLesk)	896714e06f	core/vm's: ensure script state is sent on script exit (#11991 ) * Ensure API update is sent on script exit Add exit-time telemetry handling across scripts to avoid orphaned "installing" records. Introduce local exit_code capture in api_exit_script and cleanup handlers and, when POST_TO_API_DONE is true but POST_UPDATE_DONE is not, post a final status (marking failures on non-zero exit codes, or marking done/failed in VM cleanups based on exit code). Changes touch misc/build.func, misc/vm-core.func and various vm/-vm.sh cleanup functions to reliably send post_update_to_api on normal or early exits. Update api.func * fix(telemetry): add missing exit codes to explain_exit_code() - Add curl error codes: 4, 5, 8, 23, 25, 30, 56, 78 - Add code 10: Docker/privileged mode required (used in ~15 scripts) - Add code 75: Temporary failure (retry later) - Add BSD sysexits.h codes: 64-77 - Sync error_handler.func fallback with canonical api.func	2026-02-16 17:14:00 +01:00
CanbiZ (MickLesk)	ae8dd5ba36	tools.func: persist /usr/local/bin in shell PATHs (#11970 )	2026-02-16 10:30:05 +01:00
CanbiZ (MickLesk)	2edd2ffaf8	fix: remove duplicate error handler from alpine-install.func (#11971 ) - Remove legacy error_handler(), on_exit(), on_interrupt(), on_terminate() and set/trap definitions from alpine-install.func (already provided by error_handler.func which is sourced on line 10) - The local error_handler() expected positional args as required, but catch_errors() sets trap as 'error_handler' (without args), causing unbound variable error with set -u (nounset) - error_handler.func uses default values which is set -u safe - Also align legacy trap in install.func network_check() to standard format Fixes #11929	2026-02-16 08:51:05 +01:00
CanbiZ (MickLesk)	a8977a25d4	fix(build): SIGINT/SIGTERM traps now exit properly - Add 'exit 130' after SIGINT trap handler - Add 'exit 143' after SIGTERM trap handler - Fixes issue where interrupted scripts stayed as 'Installing' in telemetry - Previously the traps only sent the update but didn't terminate the script	2026-02-15 10:46:21 +01:00
CanbiZ (MickLesk)	81ec696ac5	fix(api): handle nested VM RAM detection gracefully - Add || true to dmidecode pipelines to prevent script abort when 'Configured Memory Speed: Unknown' is returned (no numeric match) - Fixes #11913 edge case for nested ProxmoxVE VMs	2026-02-15 10:32:57 +01:00
CanbiZ (MickLesk)	39afa87703	revert(api): revert PR #11913 RAM speed changes, fix REPO_SOURCE fallback	2026-02-15 10:11:14 +01:00
CanbiZ (MickLesk)	6c5f2aa9c1	Set default REPO_SOURCE to ProxmoxVE Update hardcoded fallback REPO_SOURCE from ProxmoxVED to ProxmoxVE and clarify comment about production vs dev repository naming. Add fail-safe '|| true' to several detection pipelines (lspci for GPU and grep commands reading /proc/cpuinfo for CPU vendor/model) to avoid non-zero exits or empty outputs causing function failures and improve robustness.	2026-02-15 10:06:33 +01:00
CanbiZ (MickLesk)	240f1f391f	fix(api): handle missing RAM speed in nested VMs (#11913 ) - Add || true to dmidecode pipelines to prevent error when speed is 'Unknown' - Validate RAM_SPEED is a valid integer, fallback to 0 - Send ram_speed as integer (not string) in all JSON payloads for PocketBase Fixes: dmidecode in nested VMs returns 'Configured Memory Speed: Unknown' which causes grep to fail and triggers catch_errors handler.	2026-02-14 23:33:51 +01:00
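
The RAM-speed fix in 240f1f391f can be illustrated with a minimal sketch. It is not the code from misc/api.func: the function name parse_ram_speed and the grep pattern are assumptions made for illustration, and the input is passed as a string rather than read from dmidecode so the example runs anywhere. It shows why a non-matching grep aborts a script under set -e with pipefail, how `|| true` avoids that, and how the integer check falls back to 0.

```shell
#!/usr/bin/env bash
set -Eeuo pipefail

# Hypothetical helper (not from api.func): pull the first integer out of a
# dmidecode-style "Configured Memory Speed" line.
parse_ram_speed() {
  local line="$1" speed
  # grep exits 1 when nothing matches; with pipefail that would abort the
  # script under `set -e`, so `|| true` makes the pipeline best-effort.
  speed=$(printf '%s\n' "$line" | grep -oE '[0-9]+' | head -n1 || true)
  # Accept only a plain non-negative integer; otherwise fall back to 0.
  [[ "$speed" =~ ^[0-9]+$ ]] || speed=0
  printf '%s\n' "$speed"
}

# Emit the value unquoted so a JSON consumer sees a number, not a string.
printf '{"ram_speed": %s}\n' "$(parse_ram_speed "Configured Memory Speed: 3200 MT/s")"
printf '{"ram_speed": %s}\n' "$(parse_ram_speed "Configured Memory Speed: Unknown")"
```

The second call is the nested-VM case from the commit message: no digits are present, so the value sent is 0 rather than an empty string.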

1480 Commits