Comparing VLM Run's Visual AI Agent with ChatGPT, Gemini and Claude across a wide set of use-cases
Detect all objects present in this scene. For each detected object, determine if it is associated with writing tasks (such as pens, pencils, paper, books, whiteboards, laptops, or tablets). Segment only those writing-related items and provide their locations or bounding boxes. Return a brief description of each segmented object and its relevance to writing.
Generate a new creative image inspired by this reference, but with a completely different artistic style and composition.
Detect and extract only the most important details in the news article title, such as key subjects, events, or names, and present them in a markdown format. Only show the title and dates from the image overlaid with bounding boxes.
Detect all the birds in the image and visualize them.
Segment out the turtles in the image.
Identify and list all major rooms in the floor plan, providing the square footage for each. Redesign the space using Japandi-style furnishings, emphasizing natural light, neutral color palettes, and minimalist forms. Return an updated floor plan with labeled rooms and suggested furniture layout.
Enhance and beautify the handwritten text in the document, so it appears clear and aesthetically pleasing. Replace the entire background with a realistic desk scene that includes a laptop and a coffee cup. Make the text look as if it is neatly written in a notebook, with a pen and the coffee cup nearby. Ensure the final image is high-quality, artifact-free, and visually appealing.
You are an expert in X-ray analysis and disease detection. Analyze the given X-ray, mark the affected area and give a disease description only if the patient has a disease, else analyze as a normal X-ray.
Step 1: Detect and highlight all people present in the image. Step 2: Apply deoldification techniques to restore true-to-life color and detail. Return a high-quality, artifact-free video clearly demonstrating the deoldified scene, focusing on preserving facial features and overall realism.
You are provided with two images: one of a dress (the first image) and one of a person (the second image). First, accurately detect the boundaries of the person and the dress. Next, generate a highly realistic virtual try-on by seamlessly compositing the dress onto the person, ensuring natural fit, alignment, and that the person appears fully and appropriately dressed. All visible body parts and clothing edges should be preserved and blended with the new garment. Once the transformation is complete, produce a high-quality, artifact-free video of the person wearing the dress from the provided inputs, with special attention to natural body contours, color consistency, and overall realism.
Detect the keypoints of the eyes in the image.
Count the number of fruits in the image by detecting their center.
Segment all the furniture in the living room.
Detect all the cars and pedestrians at the intersection.
Detect the light sources in the image.
Detect all the UI elements in this image and visualize the boxes.
Segment out the faces in this image and visualize the segments.
Write an exciting sports narrative describing the action in this image, including player dynamics and game context, and then animate a video of this.
Generate an image of this girl on Mars. I want her in front of a futuristic settlement house on Mars, as if she has been there for her whole life, enjoying the day. I want her outfit and pose to blend into the surroundings and look natural.
Give a detailed explanation of the plan and generate detailed 3D architectural visualizations and interior design concepts based on the plan.
Generate detailed 3D architectural visualizations and interior design concepts based on this floor plan, including furniture placement and lighting.
Get the layout of the document and visualize all the text.
Extract all the passport details, like name, DOB, etc., and visualize them with boxes around the location they were extracted from.
Can you blur out the patient name, DOB, sex and marital status?
Parse this document and extract all the handwritten fields. Detect the locations of each of the fields and visualize them.
Crop into the clock in the image and extract the time shown.
First detect the license plates in the image and blur them.
Analyze this image quality and enhance the image to a high resolution.
Extract the first and last page from the document, visualize and analyze them.
Get the best highlight showing the most fireworks in this video.
Crop the first 3 minutes of the video and visualize it.
Extract any text visible in the document, their locations and visualize them.