devops

backend/devops

sophon 2b0e1c0413 modify scripts (2026-01-11 11:50:55 +08:00)

batch_del.py            modify scripts    2025-11-07 10:08:19 +08:00
docker_patch.sh         modify scripts    2026-01-11 11:50:55 +08:00
em_reports_consume.sh   modify scripts    2026-01-11 10:36:07 +08:00
origin_parsers.py       modify scripts    2026-01-11 11:50:55 +08:00
parse_filename.py       modify scripts    2025-11-03 16:21:46 +08:00
parsers.py              modify scripts    2025-11-07 09:03:35 +08:00
readme.md               modify scripts    2026-01-11 11:50:55 +08:00

readme.md

Login

Username: admin

Password: paperless

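The user the container runs as must be specified explicitly.

Set both USERMAP_UID and USERMAP_GID; otherwise the scripts mounted into the container from the host may not be executable.

See https://docs.paperless-ngx.com/configuration/#USERMAP_UID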
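When a mounted script fails to run, a quick diagnosis is to compare the file's owner, group and permission bits against USERMAP_UID/USERMAP_GID. The following is a minimal sketch, assuming a Unix system and the paperless-ngx default of 1000 for both IDs when they are unset; `can_run_script` is a hypothetical helper, and it deliberately ignores supplementary groups, ACLs and `noexec` mounts.

```python
import os
import stat


def can_run_script(path: str, uid: int, gid: int) -> bool:
    """Hypothetical helper: could a process running as uid/gid start `path`?

    Only the owner/group/other permission bits are checked; supplementary
    groups, ACLs and `noexec` mounts are ignored. A script launched through
    its shebang must be both readable and executable.
    """
    st = os.stat(path)
    mode = st.st_mode
    if uid == 0:
        # root skips read checks but still needs at least one execute bit
        return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    if st.st_uid == uid:
        need = stat.S_IRUSR | stat.S_IXUSR
    elif st.st_gid == gid:
        need = stat.S_IRGRP | stat.S_IXGRP
    else:
        need = stat.S_IROTH | stat.S_IXOTH
    return (mode & need) == need


if __name__ == "__main__":
    # Assumed defaults; use the values from your compose file.
    uid = int(os.environ.get("USERMAP_UID", "1000"))
    gid = int(os.environ.get("USERMAP_GID", "1000"))
    script = "/usr/src/paperless/scripts/parse_filename.py"
    if os.path.exists(script):
        print(script, "runnable:", can_run_script(script, uid, gid))
```

If this reports `False`, either `chmod +x` the script on the host or change its ownership to match the configured IDs.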

Custom filename-parsing script

# Documentation
https://docs.paperless-ngx.com/advanced_usage/#file-name-handling
https://docs.paperless-ngx.com/configuration/#PAPERLESS_POST_CONSUME_SCRIPT

# Configuration
environment:
  PAPERLESS_POST_CONSUME_SCRIPT: "/usr/src/paperless/scripts/parse_filename.py"
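The repository's parse_filename.py is not reproduced in this readme. As an illustration only, a Python post-consume script could look like the sketch below. It assumes paperless-ngx passes `DOCUMENT_ID` and `DOCUMENT_ORIGINAL_FILENAME` to the script as environment variables (check the PAPERLESS_POST_CONSUME_SCRIPT docs for your version) and uses a made-up naming convention, `YYYY-MM-DD_Correspondent_Title.pdf`. It only parses and prints; writing the values back to the document is left out.

```python
#!/usr/bin/env python3
"""Hypothetical post-consume script sketch (not the repository's parse_filename.py).

Parses a made-up naming convention, "YYYY-MM-DD_Correspondent_Title.pdf",
out of the original filename and prints the result.
"""
import os
import re
from datetime import date
from pathlib import PurePath
from typing import NamedTuple, Optional

# Hypothetical convention: ISO date, correspondent (no underscores), free-form title.
FILENAME_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})_(?P<correspondent>[^_]+)_(?P<title>.+)$"
)


class ParsedName(NamedTuple):
    created: date
    correspondent: str
    title: str


def parse_filename(name: str) -> Optional[ParsedName]:
    """Return the parsed parts, or None if `name` does not follow the convention."""
    stem = PurePath(name).stem  # drop directories and the extension
    m = FILENAME_RE.match(stem)
    if m is None:
        return None
    try:
        created = date.fromisoformat(m["date"])
    except ValueError:  # well-formed but impossible date, e.g. 2025-13-40
        return None
    return ParsedName(created, m["correspondent"].strip(), m["title"].strip())


def main() -> None:
    # Assumed to be set by paperless-ngx when it runs the post-consume script.
    doc_id = os.environ.get("DOCUMENT_ID")
    original = os.environ.get("DOCUMENT_ORIGINAL_FILENAME", "")
    parsed = parse_filename(original)
    if parsed is None:
        print(f"document {doc_id}: {original!r} does not match, nothing to do")
        return
    # A real script would now update the document, e.g. through the REST API.
    print(f"document {doc_id}: created={parsed.created} "
          f"correspondent={parsed.correspondent!r} title={parsed.title!r}")


if __name__ == "__main__":
    main()
```

The script must be executable inside the container (shebang plus execute bit), which is why the USERMAP_UID/USERMAP_GID setting matters.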

The source-code modifications can be applied by running the docker_patch.sh script inside the container.
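docker_patch.sh itself is not shown here. As a hedged illustration of the same idea in Python (not the repository's script), the sketch below inserts a force-skip block in front of the existing `if skip_archive_for_text and original_has_text:` line in /usr/src/paperless/src/paperless_tesseract/parsers.py, copying that line's indentation and using a marker comment so that a second run changes nothing. `patch_source` is a hypothetical name, and the sketch assumes the anchor line is present in your paperless version.

```python
from pathlib import Path

TARGET = Path("/usr/src/paperless/src/paperless_tesseract/parsers.py")
ANCHOR = "if skip_archive_for_text and original_has_text:"
MARKER = "# force skip ocr process."

# Lines inserted before ANCHOR, without indentation (added at patch time).
PATCH_LINES = [
    MARKER,
    "if not original_has_text:",
    "    original_has_text = True",
    '    text_original = "this is default content, as we skipped ocr process..."',
    '    self.log.warning("Cannot read text from Document, use default message.")',
    "",
]


def patch_source(source: str) -> str:
    """Return `source` with the force-skip block inserted before ANCHOR."""
    if MARKER in source:
        return source  # already patched: keep the operation idempotent
    lines = source.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.strip() == ANCHOR:
            indent = line[: len(line) - len(line.lstrip())]
            block = [(indent + text if text else "") + "\n" for text in PATCH_LINES]
            return "".join(lines[:i] + block + lines[i:])
    raise ValueError(f"anchor not found: {ANCHOR!r}")


def main() -> None:
    if not TARGET.exists():
        print(f"{TARGET} not found; run this inside the paperless container")
        return
    original = TARGET.read_text(encoding="utf-8")
    patched = patch_source(original)
    if patched == original:
        print(f"{TARGET} already patched")
    else:
        TARGET.write_text(patched, encoding="utf-8")
        print(f"patched {TARGET}")


if __name__ == "__main__":
    main()
```

Changes made inside a running container are lost when the container is recreated from its image, so a patch like this has to be reapplied after upgrades; the services that import parsers.py also need a restart to pick it up.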

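For documents whose PDF text cannot be extracted directly, paperless falls back to OCR, which in complex cases runs twice. This is very slow and resource-intensive, and the only way around it is to modify the source code: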

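# /usr/src/paperless/src/paperless_tesseract/parsers.py:

        # force skip ocr process.
        # Pretend every document already has a text layer; documents without
        # extractable text get a placeholder as their content instead of OCR.
        if not original_has_text:
            original_has_text = True
            text_original = "this is default content, as we skipped ocr process..."
            self.log.warning("Cannot read text from Document, use default message.")

        # Existing check: the early return that skips OCRmyPDF still requires
        # skip_archive_for_text to be true, so the patch only takes effect then.
        if skip_archive_for_text and original_has_text:
            self.log.debug("Document has text, skipping OCRmyPDF entirely.")
            self.text = text_original
            return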
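To see exactly what the patch changes, the branch can be reduced to a small pure function. This is a simplified model, not paperless code: `decide`, `Decision` and `force_skip` are hypothetical names, and the real parser does much more. It makes one consequence of the snippet visible: with `skip_archive_for_text` false, a patched document without text still falls through to the OCRmyPDF code path.

```python
from typing import NamedTuple, Optional

# Placeholder text assigned by the force-skip patch.
PLACEHOLDER = "this is default content, as we skipped ocr process..."


class Decision(NamedTuple):
    reaches_ocrmypdf: bool  # True: parse() continues into the OCRmyPDF code path
    text: Optional[str]     # content set by the early return, else None


def decide(text_original: Optional[str], original_has_text: bool,
           skip_archive_for_text: bool, force_skip: bool = True) -> Decision:
    """Simplified, hypothetical model of the patched branch in parse()."""
    if force_skip and not original_has_text:
        # the patch: treat the document as having text, use a placeholder
        original_has_text = True
        text_original = PLACEHOLDER
    if skip_archive_for_text and original_has_text:
        # existing early return: OCRmyPDF is skipped entirely
        return Decision(reaches_ocrmypdf=False, text=text_original)
    return Decision(reaches_ocrmypdf=True, text=None)


for skip_archive in (True, False):
    print(f"skip_archive_for_text={skip_archive}:", decide(None, False, skip_archive))
```

In other words, the patch is only effective together with settings that make `skip_archive_for_text` true.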
