Proceedings: GI 2019

Rapid Sequence Matching for Visualization Recommender Systems

Shaoliang Nie (North Carolina State University), Christopher G. Healey (North Carolina State University), Rada Y. Chirkova (North Carolina State University), Juan L. Reutter (Pontificia Universidad Católica de Chile)

Proceedings of Graphics Interface 2019: Kingston, Ontario, 28 - 31 May 2019

DOI 10.20380/GI2019.05

  • BibTeX

    author = {Nie, Shaoliang and Healey, Christopher G. and Chirkova, Rada Y. and Reutter, Juan L.},
    title = {Rapid Sequence Matching for Visualization Recommender Systems},
    booktitle = {Proceedings of Graphics Interface 2019},
    series = {GI 2019},
    year = {2019},
    issn = {0713-5424},
    isbn = {978-0-9947868-4-5},
    location = {Kingston, Ontario},
    numpages = {8},
    doi = {10.20380/GI2019.05},
    publisher = {Canadian Information Processing Society},
    keywords = {Visualization systems, recommendation systems, similarity measures, locality sensitive hashing},


We present a method to support high-quality visualization recommendations for analytic tasks. Visualization converts large datasets into images that allow viewers to efficiently explore, discover, and validate within their data. Visualization recommenders have been proposed that store past sequences, each an ordered collection of design choices leading to successful task completion, and then match them against an ongoing visualization construction. Based on this matching, a system recommends visualizations that better support the analysts’ tasks. Scalability becomes a problem when many sequences are stored. One solution would be to index the sequence database. During matching, however, we require sequences that are similar to the partially constructed visualization, not only those that are identical to it, which an exact-match index cannot provide. We implement a locality sensitive hashing algorithm that converts visualizations into set representations, then uses Jaccard similarity to store similar sequence nodes in common hash buckets. This allows us to match partial sequences against a database containing tens of thousands of full sequences in less than 100 ms. Experiments show that our algorithm locates 95% or more of the sequences found in an exhaustive search, producing high-quality visualization recommendations.
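To illustrate the general idea of hashing set representations so that items with high Jaccard similarity tend to share buckets, the following is a minimal sketch of MinHash with banding, a standard locality sensitive hashing scheme for Jaccard similarity. It is not the authors' implementation. The encoding of a visualization as a set of strings such as "mark=bar", the names `LSHIndex` and `minhash_signature`, and the parameter values (32 hash functions, 16 bands, a 0.5 threshold) are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _seeded_hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=32):
    """MinHash signature: the minimum hash value under each seeded hash."""
    items = set(items)
    if not items:
        raise ValueError("cannot compute a signature for an empty set")
    return tuple(min(_seeded_hash(seed, x) for x in items)
                 for seed in range(num_hashes))


class LSHIndex:
    """Hypothetical index: similar sets are likely to share a band bucket."""

    def __init__(self, num_hashes=32, bands=16):
        if num_hashes % bands != 0:
            raise ValueError("num_hashes must be divisible by bands")
        self.num_hashes = num_hashes
        self.bands = bands
        self.rows = num_hashes // bands
        self.buckets = defaultdict(set)  # (band index, band values) -> keys
        self.sets = {}                   # key -> stored set

    def _band_keys(self, signature):
        # Split the signature into bands; each band is one bucket key.
        for b in range(self.bands):
            yield (b, signature[b * self.rows:(b + 1) * self.rows])

    def add(self, key, items):
        items = frozenset(items)
        self.sets[key] = items
        for band_key in self._band_keys(minhash_signature(items, self.num_hashes)):
            self.buckets[band_key].add(key)

    def query(self, items, threshold=0.5):
        """Return (similarity, key) pairs at or above threshold, best first."""
        items = frozenset(items)
        candidates = set()
        for band_key in self._band_keys(minhash_signature(items, self.num_hashes)):
            candidates |= self.buckets.get(band_key, set())
        # Verify candidates with the exact Jaccard similarity.
        scored = [(jaccard(items, self.sets[k]), k) for k in candidates]
        return sorted([pair for pair in scored if pair[0] >= threshold],
                      reverse=True)


if __name__ == "__main__":
    index = LSHIndex()
    # Made-up set encodings of stored visualization nodes.
    index.add("seq1", ["mark=bar", "x=year", "y=sales", "color=region"])
    index.add("seq2", ["mark=line", "x=month", "y=temperature"])
    print(index.query(["mark=bar", "x=year", "y=sales"], threshold=0.5))
```

Only sets that collide with the query in at least one band are compared exactly, so lookup cost depends on bucket sizes rather than on the size of the whole database. The trade-off is that a similar set can occasionally be missed, which is consistent with the abstract reporting recall relative to an exhaustive search rather than exact equivalence.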