Complex Reasoning on Digital Images of Lampung Batik Motifs Using an LVLM

Ari Kurniawan Saputra; Robby Yuli Endra; Fenty Ariani; Erlangga Erlangga; Iing Lukman

doi:10.36448/expert.v16i1.4926

Abstract

This study applies a Large Vision-Language Model (LVLM) to perform complex Chain-of-Thought (CoT) reasoning on digital images of Lampung batik motifs. Lampung batik is a traditional textile heritage of the Lampung people, characterized by four distinctive motifs: Leluak Tehambur (Metro City), Kapal Pesagi (South Lampung Regency), Pohon Hayat (Pesawaran Regency), and Motif Bambu (Pringsewu Regency). Existing CNN-based approaches cannot explain the cultural meaning embedded in these motifs, which motivates the need for a model capable of semantic reasoning. A self-collected dataset, BLD-28, comprising 28 images, was gathered from four official Dekranasda offices in Lampung Province and annotated by cultural experts with an inter-annotator agreement of κ = 0.89. The InternVL2-8B model was fine-tuned using Low-Rank Adaptation (LoRA, r = 64, α = 128) with a multi-task loss function that combines classification and CoT generation objectives. The results show that InternVL2-8B achieves an accuracy of 94.37%, an mIoU of 88.12%, and a Reasoning Coherence Score (RCS) of 4.62/5.00, significantly outperforming all CNN and LVLM baselines (McNemar test, p < 0.001). CoT reasoning was shown to improve classification accuracy by 3.21 points over direct classification, demonstrating the feasibility of LVLMs for culturally grounded recognition of traditional Indonesian textile motifs.
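To make the LoRA setting in the abstract concrete, the following is a minimal sketch of the generic low-rank update W' = W + (α / r) · B · A from the LoRA formulation, not the authors' training code. Only r = 64 and α = 128 come from the abstract; the function names, the 4096 × 4096 layer size, and the toy matrices are assumptions chosen for illustration.

```python
# Illustrative sketch of the LoRA update; names and sizes are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_scale(r, alpha):
    """Scaling factor applied to the low-rank update B @ A."""
    return alpha / r


def lora_effective_weight(w, a, b, r, alpha):
    """Return W + (alpha / r) * B @ A.

    w: frozen weight, shape (d_out, d_in)
    a: trainable matrix, shape (r, d_in)
    b: trainable matrix, shape (d_out, r)
    """
    scale = lora_scale(r, alpha)
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def lora_param_count(d_out, d_in, r):
    """Number of trainable parameters LoRA adds to one linear layer."""
    return r * (d_out + d_in)


if __name__ == "__main__":
    # r and alpha as reported in the abstract.
    print("scale:", lora_scale(64, 128))

    # Assumed layer size (not stated in the abstract).
    d = 4096
    added = lora_param_count(d, d, 64)
    print("trainable:", added, "of", d * d, "frozen weights")

    # Toy rank-1 example with the same alpha / r ratio of 2.
    w = [[1, 0], [0, 1]]
    a = [[1, 2]]
    b = [[1], [0]]
    print(lora_effective_weight(w, a, b, r=1, alpha=2))
```

With r = 64 and α = 128 the update is scaled by 2. Under the assumed 4096 × 4096 layer, the adapter would add 524,288 trainable parameters against 16,777,216 frozen ones, which is why a small dataset can be used to adapt a large model without updating all of its weights.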

Keywords

Lampung Batik; Image; CoT; Motif; LVLM.

EXPERT: Jurnal Manajemen Sistem Informasi dan Teknologi

Published by Pusat Studi Teknologi Informasi, Fakultas Ilmu Komputer, Universitas Bandar Lampung
Gedung M Lt.2 Pascasarjana Universitas Bandar Lampung
Jln Zainal Abidin Pagaralam No.89 Gedong Meneng, Rajabasa, Bandar Lampung,
LAMPUNG, INDONESIA

This work is licensed under a Creative Commons Attribution 4.0 International License.
