Exploring Multimodal LLMs: Text, Image, and Video Integration