Starting with this issue, I will be collecting some commonly used LaTeX-related tools, which brings this account back to its original purpose.
Brief Introduction#
In the past, the best website/software for recognizing mathematical formulas and converting them to code was Mathpix. It supports a range of conversions, such as pdf or png to tex, md, and docx. Whether you are converting pdf documents or recognizing screenshots, it covers the whole ecosystem, from the mobile Snip app to the desktop client, web version, and browser plugins, and the user experience is very smooth.
However, the free version's quota is quite small, and the paid subscription is expensive (50 dollars a year); in addition, the web version occasionally suffers from unstable network connections.
But now Mistral OCR, billed as the strongest OCR tool on Earth, has emerged. Mistral is a French AI startup, which can be thought of as the European counterpart of DeepSeek, and its pricing is very low (the OCR function can convert thousands of pages of pdf for about 1 dollar).
Usage Example#
File Upload#
Directly visit the official website chat interface (you may need to register an account with your phone number):
Upload a file in the chat box as you would with other large language models, and simply enter "convert to markdown". For example, here I uploaded a paper related to DeepSeek-R1:
Wait a few seconds, and you will get the converted markdown code.
You can then open the result in a markdown editor such as Typora or Obsidian and export it to pdf, docx, and other formats (this may require installing pandoc).
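If you prefer to skip the editor, the export step can also be scripted. The following is a minimal sketch, assuming pandoc is already installed and on your PATH; the helper names build_pandoc_command and convert_markdown are made up for illustration and are not part of any tool mentioned here.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(md_path: str, out_format: str = "docx") -> list[str]:
    """Return the pandoc argument list for converting md_path to out_format."""
    source = Path(md_path)
    target = source.with_suffix(f".{out_format}")
    # pandoc infers the output format from the extension passed to -o.
    return ["pandoc", str(source), "-o", str(target)]


def convert_markdown(md_path: str, out_format: str = "docx") -> Path:
    """Run pandoc on md_path and return the path of the converted file."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH; install it first")
    command = build_pandoc_command(md_path, out_format)
    # check=True raises CalledProcessError if pandoc exits with an error.
    subprocess.run(command, check=True)
    return Path(command[-1])
```

Calling convert_markdown("paper.md") would write paper.docx next to the source file. Note that pdf output additionally requires a PDF engine, such as a LaTeX distribution, to be installed.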
Web Version Effect Display#
For more effect displays, please refer to the official website introduction:
This is the original pdf file:
This is the effect displayed after conversion in Typora (Theme: Newsprint):
As you can see, the result is quite good apart from the images. If you also want to extract the images and keep them in their original positions, you will need the method below.
Advanced Configuration#
In addition to processing files in the official website chat interface, you can also perform batch processing through API calls.
Thanks to @nicekate for providing the Python code, which allows local calls to the Mistral API for file processing, and there is also a corresponding demonstration video on Bilibili.
- GitHub repository address: https://github.com/nicekate/mistral-ocr
- Bilibili video: 【Test Mistral OCR: The World's Best Document Understanding Model?】 https://www.bilibili.com/video/BV1Bw92YiEEH
The configuration method is also very simple; you just need to apply for your own API key, then clone the above repository and fill in the corresponding API key.
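For orientation, here is roughly what a single OCR request through the official mistralai Python SDK looks like. This is only a sketch and not the code in the repository above: ocr_pdf_url and join_pages are hypothetical helper names, and the client.ocr.process call and the mistral-ocr-latest model name follow Mistral's published SDK examples as I understand them, so please check them against the current documentation before relying on them.

```python
def join_pages(page_markdowns: list[str]) -> str:
    """Join per-page markdown into one document, separated by blank lines."""
    return "\n\n".join(text.strip() for text in page_markdowns)


def ocr_pdf_url(document_url: str, api_key: str) -> str:
    """Send one publicly reachable PDF URL to Mistral OCR and return markdown."""
    # Imported lazily so join_pages works without the SDK installed.
    # Assumption: import path and call signature as in mistralai 1.x.
    from mistralai import Mistral

    client = Mistral(api_key=api_key)
    response = client.ocr.process(
        model="mistral-ocr-latest",
        document={"type": "document_url", "document_url": document_url},
    )
    # Each page in the response carries its own markdown string.
    return join_pages([page.markdown for page in response.pages])
```

A call such as ocr_pdf_url("https://example.com/paper.pdf", api_key), where the URL is a placeholder, would return the whole document as one markdown string. The repository's script works on local files and also saves the extracted images, which this sketch does not attempt.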
Apply for API Key#
Click "API Keys" in the left menu bar of the console, then click "Create new key" in the upper right corner, and copy the generated key.
Download Python Code#
First, clone the above repository to your local machine:
git clone https://github.com/nicekate/mistral-ocr.git
Then install the dependencies:
pip install mistralai
Modify the API key and PDF file path in pdf_ocr.py (lines 72-73):
API_KEY = "fill in your own api key"
PDF_PATH = "xxx.pdf"
Run the file, and you will find an output folder named ocr_results_xxx in the same directory, containing the converted markdown files and image files.
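If you would rather not write the key into the script (for example, because the folder is under version control), one option is to read it from an environment variable. This is a minimal sketch: the variable name MISTRAL_API_KEY and the helper load_api_key are my own choices for illustration, and the repository's script does not do this by default.

```python
import os


def load_api_key(env_var: str = "MISTRAL_API_KEY") -> str:
    """Return the API key stored in env_var, or raise a clear error."""
    # Strip whitespace that often sneaks in when a key is pasted.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

With this in place, the assignment in pdf_ocr.py could become API_KEY = load_api_key(), after exporting the variable in your shell.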
Local Conversion Effect Display#
Theme: Typora GitHub
Batch Processing#
Sometimes you need to convert several PDF files at once, so I added batch processing on top of this script; it converts every PDF file in a given directory.
The specific modifications are as follows:
- Added the get_pdf_files_in_directory function to scan the specified folder and return the full paths of all PDF files.
- In __main__, replaced the manually specified PDF file path with automatic retrieval of PDF files from the folder.
- If there are no PDF files in the folder, the script prompts the user.
Here is the newly added code snippet:
import os  # needed by get_pdf_files_in_directory; harmless if pdf_ocr.py already imports it


def process_pdfs(pdf_paths: list, api_key: str) -> None:
    """Run the existing process_pdf function from pdf_ocr.py on each file in turn."""
    for pdf_path in pdf_paths:
        try:
            output_dir = process_pdf(pdf_path, api_key)
            print(f"File {pdf_path} processed, results saved in: {output_dir}")
        except Exception as e:
            # Report the failure and continue with the remaining files
            print(f"Error processing file {pdf_path}: {e}")


def get_pdf_files_in_directory(directory: str) -> list:
    """Get all PDF file paths in the specified directory"""
    pdf_files = []
    for file in os.listdir(directory):
        if file.endswith(".pdf"):
            pdf_files.append(os.path.join(directory, file))
    return pdf_files


if __name__ == "__main__":
    # Usage example
    API_KEY = "your_mistral_api_key"
    DIRECTORY = "your_pdf_file"  # Specify the folder name containing PDF files
    # Get all PDF files in the folder
    PDF_PATHS = get_pdf_files_in_directory(DIRECTORY)
    if not PDF_PATHS:
        print(f"No PDF files found in directory {DIRECTORY}.")
    else:
        process_pdfs(PDF_PATHS, API_KEY)
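One limitation of the scan above is that it only matches a lowercase .pdf extension and returns files in whatever order the operating system lists them. As a possible variation, here is a sketch using pathlib that also picks up names like REPORT.PDF, skips folders, and returns a sorted list. The function name find_pdfs is hypothetical and is not in the repository.

```python
from pathlib import Path


def find_pdfs(directory: str) -> list[Path]:
    """Return the PDF files directly inside directory, sorted by name."""
    folder = Path(directory)
    if not folder.is_dir():
        raise NotADirectoryError(f"{directory} is not a folder")
    return sorted(
        path
        for path in folder.iterdir()
        # Compare the extension case-insensitively and ignore subfolders.
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

The returned Path objects can be converted with str(path) before being handed to process_pdfs. Like the original function, this does not descend into subfolders.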
The complete file has been uploaded to GitHub (the repository is linked in the references below); feel free to star it.
References#
- The first step to perfectly translating PDF - Mistral OCR preliminary usage guide https://zhuanlan.zhihu.com/p/28801320889
- 【Test Mistral OCR: The World's Best Document Understanding Model?】 https://www.bilibili.com/video/BV1Bw92YiEEH
- https://github.com/nicekate/mistral-ocr
- Added batch processing functionality based on the former: https://github.com/YZDame/mistral-ocr