## Login

### Username: admin

### Password: paperless

## Specify the user and group IDs

### Set USERMAP_UID and USERMAP_GID correctly; otherwise scripts mounted into the container from the host may fail to execute.

### See https://docs.paperless-ngx.com/configuration/#USERMAP_UID

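The reason the IDs matter can be illustrated with the classic Unix permission check. The sketch below is a simplified model, not paperless-ngx code: the function name `can_execute` is hypothetical, and it ignores supplementary groups, ACLs, and mount options such as `noexec`. It assumes the container process runs with the UID and GID given by USERMAP_UID and USERMAP_GID.

```python
import stat


def can_execute(mode: int, file_uid: int, file_gid: int, uid: int, gid: int) -> bool:
    """Simplified Unix check: may a process running as uid/gid execute this file?

    Hypothetical helper for illustration only. Exactly one permission class
    (owner, group, or other) is consulted, as in the traditional Unix model.
    """
    if uid == 0:
        # root may execute a file as long as any execute bit is set
        return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    if uid == file_uid:
        return bool(mode & stat.S_IXUSR)
    if gid == file_gid:
        return bool(mode & stat.S_IXGRP)
    return bool(mode & stat.S_IXOTH)


# A script with mode 0o750 owned by host user 1000:1000
print(can_execute(0o750, 1000, 1000, uid=1000, gid=1000))  # matching IDs
print(can_execute(0o750, 1000, 1000, uid=1001, gid=1001))  # mismatched IDs
```

With matching IDs the owner execute bit applies; with mismatched IDs only the "other" bits apply, which in this example do not grant execution. On a real host, the mode and owner can be read with `os.stat(path)` and passed in.
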
## Custom filename parsing script

```yaml
# Documentation:
#   https://docs.paperless-ngx.com/advanced_usage/#file-name-handling
#   https://docs.paperless-ngx.com/configuration/#PAPERLESS_POST_CONSUME_SCRIPT

# Configuration:
environment:
  PAPERLESS_POST_CONSUME_SCRIPT: "/usr/src/paperless/scripts/parse_filename.py"
```

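The contents of `parse_filename.py` are not shown here, so the following is only a minimal sketch of what such a post-consume script might look like. The naming convention `YYYY-MM-DD - correspondent - title.ext`, the regular expression, and the function names are assumptions made for illustration. The environment variable names (`DOCUMENT_ID`, `DOCUMENT_FILE_NAME`, `DOCUMENT_ORIGINAL_FILENAME`) should be checked against the post-consume documentation for the installed paperless-ngx version.

```python
import os
import re
from typing import Optional

# Hypothetical convention: "2023-05-17 - ACME Corp - Invoice 42.pdf"
FILENAME_PATTERN = re.compile(
    r"^(?P<created>\d{4}-\d{2}-\d{2}) - (?P<correspondent>.+?) - (?P<title>.+)\.[^.]+$"
)


def parse_filename(filename: str) -> Optional[dict]:
    """Split a filename into created/correspondent/title, or return None."""
    match = FILENAME_PATTERN.match(os.path.basename(filename))
    return match.groupdict() if match else None


def main() -> None:
    # Document details are passed to the script as environment variables.
    filename = (
        os.environ.get("DOCUMENT_ORIGINAL_FILENAME")
        or os.environ.get("DOCUMENT_FILE_NAME", "")
    )
    fields = parse_filename(filename)
    # A real script would now update the document, for example through the
    # REST API; this sketch only reports what it parsed.
    print(f"document {os.environ.get('DOCUMENT_ID')}: {fields}")


if __name__ == "__main__":
    main()
```

The script file must be executable by the container user, which is where the USERMAP_UID and USERMAP_GID settings come in.
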
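## Source changes, which can be applied by running the docker_patch.sh script inside the container

### When the text of a PDF cannot be read directly, paperless starts an OCR scan, and in complex cases runs it twice, which is very slow and resource-intensive. The only way around this is to patch the source:

```python
# /usr/src/paperless/src/paperless_tesseract/parsers.py :
# (fragment from inside the parser method; indentation is relative to the method body)

# Added: force-skip the OCR process by pretending the document already has text.
if not original_has_text:
    original_has_text = True
    text_original = "this is default content, as we skipped ocr process..."
    self.log.warning("Cannot read text from Document, use default message.")

# Existing upstream code: with original_has_text forced to True, this branch returns early.
if skip_archive_for_text and original_has_text:
    self.log.debug("Document has text, skipping OCRmyPDF entirely.")
    self.text = text_original
    return
```
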
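The `docker_patch.sh` script itself is not shown, so the following is only a sketch of one way the change above could be applied automatically. It assumes the upstream file still contains the line `if skip_archive_for_text and original_has_text:`. The names `ANCHOR`, `MARKER`, `PATCH_LINES`, and `apply_patch` are hypothetical.

```python
ANCHOR = "if skip_archive_for_text and original_has_text:"
MARKER = "# force skip ocr process."

PATCH_LINES = [
    MARKER,
    "if not original_has_text:",
    "    original_has_text = True",
    '    text_original = "this is default content, as we skipped ocr process..."',
    '    self.log.warning("Cannot read text from Document, use default message.")',
    "",
]


def apply_patch(source: str) -> str:
    """Insert the force-skip block before the first anchor line, keeping its indentation."""
    if MARKER in source:
        return source  # already patched, nothing to do
    out = []
    patched = False
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip()
        if not patched and stripped.startswith(ANCHOR):
            indent = line[: len(line) - len(stripped)]
            for patch_line in PATCH_LINES:
                out.append((indent + patch_line if patch_line else "") + "\n")
            patched = True
        out.append(line)
    if not patched:
        raise ValueError("anchor line not found; the upstream source may have changed")
    return "".join(out)
```

To use it, read `/usr/src/paperless/src/paperless_tesseract/parsers.py`, pass the text through `apply_patch`, and write the result back. The marker check makes a second run a no-op, and the error on a missing anchor avoids silently leaving the file unpatched after an upgrade.
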