modify scripts

2025-11-05 17:25:41 +08:00
parent b7dffc539c
commit 808dbaa985
4 changed files with 980 additions and 9 deletions


@@ -34,7 +34,23 @@ environment:
PAPERLESS_POST_CONSUME_SCRIPT: "/usr/src/paperless/scripts/parse_filename.py"
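By default, paperless does not delete duplicate files, so adding the same file repeatedly causes it to be scanned, loaded, and rejected with an error over and over. I could not find a configuration option for this, so I fixed it by modifying the source directly:
For documents whose PDF text cannot be read directly, paperless starts an OCR scan, and in complex cases runs it twice, which is very slow and resource-intensive. The only fix is to modify the source: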
/usr/src/paperless/src/paperless_tesseract/parsers.py :
# force skip ocr process.
if not original_has_text:
    original_has_text = True
    text_original = "this is default content, as we skipped ocr process..."
    self.log.warning("Cannot read text from Document, use default message.")
if skip_archive_for_text and original_has_text:
    self.log.debug("Document has text, skipping OCRmyPDF entirely.")
    self.text = text_original
    return
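
The decision logic of the patch above can be tried outside paperless with a small standalone sketch. The function name `resolve_text`, the constant `DEFAULT_TEXT`, and the way `original_has_text` is derived here are assumptions made for illustration; they are not taken from the paperless source, which computes these values inside the parser.

```python
from typing import Optional, Tuple

# Placeholder text used by the patch when no text could be extracted.
DEFAULT_TEXT = "this is default content, as we skipped ocr process..."


def resolve_text(
    text_original: Optional[str],
    skip_archive_for_text: bool,
) -> Tuple[Optional[str], bool]:
    """Hypothetical helper mirroring the patched branch.

    Returns (text, ocr_needed). The check for existing text is an
    assumption for this sketch; the real parser may use another rule.
    """
    original_has_text = bool(text_original and text_original.strip())

    # Patched behaviour: pretend the document has text so OCR is not needed.
    if not original_has_text:
        original_has_text = True
        text_original = DEFAULT_TEXT

    # Same early return as in the patch: keep the text and skip OCR.
    if skip_archive_for_text and original_has_text:
        return text_original, False

    # Otherwise the caller would fall through to the OCR step.
    return None, True


if __name__ == "__main__":
    print(resolve_text(None, True))
    print(resolve_text("Invoice 2025-11", True))
    print(resolve_text(None, False))
```

As the last call shows, the early return still depends on `skip_archive_for_text`: when that flag is false, the sketch (like the patched branch) falls through to OCR even though the placeholder text has been set.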
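By default, paperless does not delete duplicate files, so adding the same file repeatedly causes it to be scanned, loaded, and rejected with an error over and over. I could not find a configuration option for this at first, so I fixed it by modifying the source directly (a configuration option does exist; see docker-compose.yml):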
/usr/src/paperless/src/documents/consumer.py