devops/docker/paperless/plugins/readme.md
2026-01-11 11:50:55 +08:00

## Login
### Username: admin
### Password: paperless
## A user must be specified
### Set both USERMAP_UID and USERMAP_GID; otherwise scripts mounted into the container from the host may fail to execute.
### See https://docs.paperless-ngx.com/configuration/#USERMAP_UID
## Custom file name parsing script
```Bash
# Documentation
# https://docs.paperless-ngx.com/advanced_usage/#file-name-handling
# https://docs.paperless-ngx.com/configuration/#PAPERLESS_POST_CONSUME_SCRIPT
# Configuration
environment:
  PAPERLESS_POST_CONSUME_SCRIPT: "/usr/src/paperless/scripts/parse_filename.py"
```
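
The `parse_filename.py` script itself is not shown in this README. The following is only a minimal sketch of what such a post-consume script could look like. It assumes a hypothetical `YYYY-MM-DD - Correspondent - Title.ext` naming convention, and it assumes the `DOCUMENT_ID`, `DOCUMENT_ORIGINAL_FILENAME` and `DOCUMENT_FILE_NAME` environment variables that the paperless-ngx documentation lists for post-consume scripts. The pattern and the `parse_filename` helper are illustrative, not the real script.

```python
#!/usr/bin/env python3
"""Hypothetical post-consume sketch: derive metadata from the original file name."""
import os
import re
from datetime import date
from pathlib import Path
from typing import Optional

# Assumed naming convention (not taken from the real script):
#   "2024-01-15 - ACME Corp - Invoice 42.pdf"
FILENAME_PATTERN = re.compile(
    r"^(?P<created>\d{4}-\d{2}-\d{2}) - (?P<correspondent>.+?) - (?P<title>.+)$"
)


def parse_filename(filename: str) -> Optional[dict]:
    """Return created/correspondent/title parsed from the name, or None if it does not match."""
    stem = Path(filename).stem  # drop the extension
    match = FILENAME_PATTERN.match(stem)
    if match is None:
        return None
    try:
        created = date.fromisoformat(match["created"])
    except ValueError:
        return None  # looks like a date but is not a valid one
    return {
        "created": created,
        "correspondent": match["correspondent"].strip(),
        "title": match["title"].strip(),
    }


def main() -> None:
    # Environment variables assumed to be set by paperless-ngx for post-consume scripts.
    document_id = os.environ.get("DOCUMENT_ID")
    filename = os.environ.get("DOCUMENT_ORIGINAL_FILENAME") or os.environ.get(
        "DOCUMENT_FILE_NAME", ""
    )
    parsed = parse_filename(filename)
    if parsed is None:
        print(f"document {document_id}: {filename!r} does not match, nothing to do")
        return
    print(f"document {document_id}: {parsed}")


if __name__ == "__main__":
    main()
```

The sketch only prints the parsed fields. Writing them back to the document (for example through the REST API) is left out on purpose, because this README does not describe how the real script does it.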
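## Source code patch, applied by running the docker_patch.sh script inside the container
### When the text of a PDF cannot be read directly, paperless starts an OCR scan, and in complex cases runs it twice, which is very slow and resource-intensive. The only way to avoid this is to patch the source code.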
```Bash
# /usr/src/paperless/src/paperless_tesseract/parsers.py :
# force skip ocr process.
if not original_has_text:
    original_has_text = True
    text_original = "this is default content, as we skipped ocr process..."
    self.log.warning("Cannot read text from Document, use default message.")
if skip_archive_for_text and original_has_text:
    self.log.debug("Document has text, skipping OCRmyPDF entirely.")
    self.text = text_original
    return
```
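
The contents of `docker_patch.sh` are not shown in this README. As a rough sketch of how the change above could be applied idempotently, the hypothetical `apply_patch` helper below inserts the forced-skip block directly before the existing `if skip_archive_for_text and original_has_text:` line and reuses that line's indentation. It assumes this line appears exactly once in `parsers.py`, which may not hold for every paperless-ngx version.

```python
"""Hypothetical sketch: insert the skip-OCR block into parsers.py source text."""
import re

MARKER = "# force skip ocr process."

# The existing line in parsers.py that the new block is placed in front of.
ANCHOR = re.compile(
    r"^(?P<indent>[ \t]*)if skip_archive_for_text and original_has_text:[ \t]*$",
    re.MULTILINE,
)

PATCH_LINES = [
    MARKER,
    "if not original_has_text:",
    "    original_has_text = True",
    '    text_original = "this is default content, as we skipped ocr process..."',
    '    self.log.warning("Cannot read text from Document, use default message.")',
]


def apply_patch(source: str) -> str:
    """Return the source with the block inserted; leave already-patched source unchanged."""
    if MARKER in source:
        return source  # already patched, do nothing
    match = ANCHOR.search(source)
    if match is None:
        raise ValueError("anchor line not found; parsers.py layout may have changed")
    indent = match["indent"]
    block = "".join(f"{indent}{line}\n" for line in PATCH_LINES)
    # match.start() is the beginning of the anchor line, before its indentation.
    return source[: match.start()] + block + source[match.start():]
```

In the container, this would be used by reading `/usr/src/paperless/src/paperless_tesseract/parsers.py`, passing its text through `apply_patch`, and writing the result back. Matching on the anchor line instead of a fixed line number keeps the sketch tolerant of small upstream changes, and the marker check makes repeated runs harmless.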