
Generate image metadata. Locally.

Create rich, structured metadata from images using local vision models — without uploads, subscriptions, or data leaving your Mac.

Requires Apple Silicon Mac with macOS 26

VisionTagger generated metadata for an image using local AI

Choose a model — or bring your own

Download preconfigured vision models in-app, or link your own GGUF model and projector files. Choose the model that best fits your images and quality requirements, then fine-tune generation with adjustable parameters to get consistent, repeatable results. All processing runs locally on Apple Silicon (M1 or later), using your Mac's performance instead of a cloud service.

VisionTagger model selection interface

Define your own metadata schema

Generate only the metadata you actually need. Enable built-in sections such as Title, Description, Keywords, Content & Style, and Safety & Compliance — then extend them with custom sections and fields tailored to your workflow. For each field, choose a data type (Boolean, Text, or List of Texts) and write a prompt that instructs the model exactly what to extract. The result is structured metadata that matches your conventions and stays consistent across large batches.

VisionTagger content configuration showing customizable metadata sections and fields
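For illustration, a custom schema and the structured output it could yield might look like the following sketch. The field names, prompts, and values are hypothetical, and this is not VisionTagger's on-disk format; it only shows how each data type maps to a typed value.

```python
# Hypothetical sketch: a custom schema (field name, data type, prompt)
# and a plausible structured result for one image.
schema = [
    {"field": "contains_people", "type": "Boolean",
     "prompt": "Does the image show any people?"},
    {"field": "era", "type": "Text",
     "prompt": "Estimate the furniture era shown, e.g. 'mid-century'."},
    {"field": "materials", "type": "List of Texts",
     "prompt": "List the visible materials."},
]

# A plausible result, typed according to the schema above
result = {
    "contains_people": False,          # Boolean
    "era": "mid-century",              # Text
    "materials": ["teak", "brass"],    # List of Texts
}

# Each value matches its declared type
types = {"Boolean": bool, "Text": str, "List of Texts": list}
assert all(isinstance(result[f["field"]], types[f["type"]]) for f in schema)
```

Because every image in a batch is filled against the same schema, downstream tools can rely on the same keys and types appearing in every record.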

Export metadata where you need it

Publish metadata in the format that best fits your pipeline. For XMP sidecars and embedded metadata, VisionTagger integrates with ExifTool — an industry-standard, widely trusted utility. Your metadata will appear in apps like Adobe Lightroom, Bridge, Capture One, Photo Mechanic, and any other software that reads XMP. Write back to your Photos Library, export JSON, CSV, or TXT per image, or generate a single file for an entire run. Add Finder tags for fast organization in macOS. Select multiple outputs at once and configure them together — so one generation pass can feed every destination you use.

Example of VisionTagger publish configuration
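To show what the ExifTool side of this looks like, here is a minimal sketch of the kind of command that copies an XMP sidecar's tags into the image file itself. The paths and helper are hypothetical; VisionTagger drives ExifTool for you, so this is only a peek at the underlying mechanism.

```python
import subprocess


def embed_from_sidecar(image_path: str) -> list[str]:
    """Build an ExifTool command that copies all tags from an XMP
    sidecar (e.g. chair.xmp next to chair.jpg) into the image file.
    Hypothetical helper for illustration."""
    sidecar = image_path.rsplit(".", 1)[0] + ".xmp"
    return ["exiftool", "-tagsfromfile", sidecar, "-all:all", image_path]


cmd = embed_from_sidecar("/Photos/chair.jpg")
# subprocess.run(cmd, check=True)  # uncomment to execute; requires ExifTool
```

Sidecars have the advantage that the original image bytes stay untouched; embedding writes the same tags into the file so they travel with it.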

Add context for smarter results

Give the model more to work with. Add a free-text Context Hint to describe your batch — like "product photos for a vintage furniture store" — enable GPS Location to look up place names from embedded coordinates via Apple Maps, or pass Existing Metadata such as capture details and editorial fields already in your files. Each source is optional, works independently, and feeds directly into the prompt for more accurate, location-aware results.

VisionTagger Additional Context panel showing context sources

Automate with Shortcuts

Run VisionTagger's full processing pipeline without opening the app. Two dedicated Shortcuts actions — Generate Image Metadata for files in Finder, and Generate Photo Metadata for your Photos Library — let you generate and export metadata headlessly. Use the app's current settings or supply a self-contained settings preset for reproducible results. The actions work everywhere Shortcuts does: the Shortcuts app, Finder Quick Actions, folder automations, the command line, and AppleScript.

VisionTagger Shortcuts integration showing automation actions
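As a sketch of command-line use: once you have saved a shortcut that wraps one of these actions, Apple's `shortcuts` CLI can run it against a folder. The shortcut name and path below are hypothetical.

```python
import subprocess


def run_shortcut(name: str, input_path: str) -> list[str]:
    """Assemble an invocation of Apple's `shortcuts` CLI. Assumes you have
    saved a shortcut (hypothetical name below) that uses the
    'Generate Image Metadata' action."""
    return ["shortcuts", "run", name, "--input-path", input_path]


cmd = run_shortcut("Tag New Exports", "/Users/me/Pictures/export")
# subprocess.run(cmd, check=True)  # macOS only
```

The same shortcut can then also be attached to a Finder Quick Action or a folder automation without any changes.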

System requirements

  • macOS Tahoe 26.0 or later

  • Apple Silicon required (M1 or later)

  • For optimal performance with larger models, 16 GB of RAM or more is recommended

  • Model storage: plan for ~4–8 GB per model (downloaded locally)

From images to metadata — in six steps

Watch demo on YouTube

One-Time Purchase

€29.99
Launch offer €24.99

VAT included (except US & CA)

Free trial: 100 images, no time limit
Single payment. No recurring fees.
Single user. Multiple Macs.
Download Free Trial
Buy VisionTagger

Secure payment via FastSpring

VisionTagger FAQ

Which vision models are included?

VisionTagger includes four preconfigured vision models: Qwen2.5-VL 7B Instruct, Gemma 3 4B IT, InternVL3 8B Instruct, and Pixtral 12B. Smaller models generally run faster, while larger models may produce higher-detail output but require more memory, depending on your Mac and chosen settings. Use the trial to compare models and tweak parameters until the results match your workflow and preferred level of detail.

Can I use my own models?

Yes. If you have a GGUF-compatible vision model and its matching projector file (also GGUF), you can link them in VisionTagger and use them like the built-in options. You are responsible for ensuring your use of third-party models complies with their licenses and terms.

Does VisionTagger require an internet connection?

VisionTagger runs locally and does not upload your images or generated metadata. An internet connection is only needed to download models in-app and to check for and download app updates.

How does the free trial work?

The free trial lets you process up to 100 images at no cost, with no time limit. You can explore the full workflow — model selection, built-in sections, custom fields, and export options — before purchasing.

Which image formats and sources are supported?

VisionTagger supports common image formats such as JPEG, PNG, TIFF, HEIC, and WebP, as well as various RAW formats including DNG. You can select images from folders on your Mac or directly from your Photos Library.

Can I customize the metadata fields?

Yes. In addition to built-in sections (Title, Description, Keywords, Content & Style, Safety & Compliance), you can create custom sections and add your own fields. Each field supports a data type (Boolean, Text, or List of Texts) and its own prompt, so you can tailor exactly what the model extracts.

Can I adjust the description verbosity?

Yes. You can choose between three levels: Brief (a single concise sentence, suitable for alt text), Standard (two sentences with context, ideal for captions), or Detailed (a comprehensive description).

Can I control which keywords are generated?

Yes. You can set a keyword count range so the model generates a consistent number of keywords per image. You can also define keywords to always include at the start or end of the list, and specify keywords to exclude. After generation, you can manually reorder, edit, add, or delete keywords for each individual image before exporting.
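The keyword controls described here amount to straightforward list post-processing. The sketch below illustrates the logic; the function name and exact behavior are illustrative, not VisionTagger's actual code.

```python
def finalize_keywords(generated, always_first=(), always_last=(), exclude=()):
    """Illustrative sketch: apply pinned and excluded keywords to a
    model-generated list."""
    # Drop excluded keywords, plus duplicates of the pinned ones
    banned = set(exclude) | set(always_first) | set(always_last)
    kept = [k for k in generated if k not in banned]
    return list(always_first) + kept + list(always_last)


print(finalize_keywords(
    ["teak", "chair", "blurry"],
    always_first=["furniture"],
    exclude=["blurry"],
))  # ['furniture', 'teak', 'chair']
```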

What outputs can VisionTagger create?

VisionTagger can export JSON, CSV, or TXT per image, or a single JSON/CSV/TXT file for an entire batch. It can also apply Finder tags. For XMP sidecars and embedding metadata into image files, VisionTagger integrates with ExifTool (installed separately).
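As a sketch of downstream processing, per-image JSON exports can be folded into a single CSV table with a few lines of scripting. The flat `file`/`title`/`keywords` layout here is hypothetical, not VisionTagger's actual JSON shape.

```python
import csv
import io
import json


def merge_to_csv(json_docs: list[str]) -> str:
    """Fold per-image JSON exports (hypothetical flat layout) into one CSV."""
    rows = [json.loads(doc) for doc in json_docs]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["file", "title", "keywords"])
    writer.writeheader()
    for row in rows:
        row["keywords"] = "; ".join(row["keywords"])  # flatten keyword list
        writer.writerow(row)
    return out.getvalue()


docs = [
    '{"file": "chair.jpg", "title": "Teak armchair", "keywords": ["teak", "chair"]}',
]
print(merge_to_csv(docs))
```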

Do I need to install ExifTool?

ExifTool is only required for XMP sidecars and embedding metadata into image files. If you only export JSON/CSV/TXT or apply Finder tags, you do not need ExifTool.

Can VisionTagger write back to my Photos Library?

Yes. VisionTagger can write metadata back to your Photos Library when you choose that output option. You will always see a publish summary before anything is written.

Can I tune the model parameters?

Yes. In Settings you can adjust generation parameters such as temperature, max tokens, context length, top-P, and top-K using sliders. This helps you balance creativity versus consistency and control output length and detail.

How fast is it, and what Mac do I need?

VisionTagger requires Apple Silicon (M1 or later) and runs on macOS Tahoe 26.0 or later. Speed depends on your Mac, the selected model, image resolution, and your chosen metadata fields. Smaller models typically run faster; larger models can produce higher-quality results but may require more RAM.

How much disk space do models use?

Model downloads are stored locally. Plan for roughly 4–8 GB per model (varies by model).

Will VisionTagger overwrite existing files or metadata?

VisionTagger shows a publish summary before writing any outputs and warns you if existing files may be overwritten. You can review the actions and confirm before anything is saved.

Can I automate VisionTagger?

Yes. VisionTagger integrates with Apple Shortcuts through two actions: Generate Image Metadata (for files in Finder) and Generate Photo Metadata (for your Photos Library). Both run the full pipeline headlessly and export results to your configured destinations. You can use them in the Shortcuts app, Finder Quick Actions, folder automations, the command line, and AppleScript. Optionally supply a settings preset exported from the app for reproducible automation.

Does the GPS Location feature send my data anywhere?

GPS coordinates embedded in your images are sent anonymously to Apple Maps to look up place names. Only the coordinates are sent — Apple does not collect personal data associated with your Maps usage. The GPS Location feature is disabled by default.

Does VisionTagger collect any usage data or analytics?

No. VisionTagger does not include analytics or telemetry, and it does not upload your data. License activation and update checks make network requests, but only for those functions.