VisualStringDistances
- VisualStringDistances.Glyph (Type)
- VisualStringDistances.Glyph (Method)
- VisualStringDistances.GlyphCoordinates
- VisualStringDistances.glyph!
- VisualStringDistances.printglyph
- VisualStringDistances.visual_distance
VisualStringDistances.Glyph (Type)
Glyph <: AbstractArray{Bool,2}
Holds the bitmap associated with a Unifont glyph in a packed format.
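One common way to pack a glyph is to store each pixel row as bits of unsigned bytes and still expose it as a `Bool` matrix through the AbstractArray interface (`size` plus `getindex`). The sketch below is illustrative only: `PackedBitmap` is a hypothetical type, and the packing layout (one `UInt8` per 8 pixels, leftmost pixel in the most significant bit) is an assumption, not the documented layout of `Glyph`.

```julia
# Hypothetical packed bitmap, NOT the package's Glyph type.
# Assumed layout: each row is stored as UInt8 words, 8 pixels per byte,
# with the leftmost pixel in the most significant bit.
struct PackedBitmap <: AbstractMatrix{Bool}
    bytes::Matrix{UInt8}   # size (nrows, bytes_per_row)
    ncols::Int             # number of pixel columns actually in use
end

Base.size(g::PackedBitmap) = (size(g.bytes, 1), g.ncols)

function Base.getindex(g::PackedBitmap, i::Int, j::Int)
    @boundscheck checkbounds(g, i, j)
    byte = g.bytes[i, (j - 1) ÷ 8 + 1]   # byte that holds column j
    shift = 7 - (j - 1) % 8              # bit position of column j in that byte
    return (byte >> shift) & 0x01 == 0x01
end

# One row of 8 pixels with the first and last pixels set.
g = PackedBitmap(reshape(UInt8[0b10000001], 1, 1), 8)
```

Because only `size` and `getindex` are defined, generic functions such as `sum(g)` or `collect(g)` work on the packed data without unpacking it first.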
VisualStringDistances.Glyph (Method)
Glyph(s::AbstractString) -> Glyph
Construct a Glyph from a string.
Examples
julia> Glyph("abc")
------------------------
------------------------
------------------------
---------#--------------
---------#--------------
---------#--------------
--####---#-###----####--
-#----#--##---#--#----#-
------#--#----#--#------
--#####--#----#--#------
-#----#--#----#--#------
-#----#--#----#--#------
-#---##--##---#--#----#-
--###-#--#-###----####--
------------------------
------------------------
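The bitmap above is 16 rows by 24 columns, which corresponds to three 16×8 half-width Unifont cells placed side by side. The following minimal sketch shows that horizontal composition under the assumption that each character has already been rendered to a 16-row `BitMatrix`. The `compose` function and the stand-in cells are hypothetical and do not contain real Unifont data.

```julia
# Hypothetical: join per-character 16-row bitmaps left to right,
# the way the 16×24 "abc" picture is three 16×8 cells side by side.
function compose(cells::Vector{BitMatrix})
    isempty(cells) && return falses(16, 0)
    return reduce(hcat, cells)
end

# Stand-in cells (not real Unifont glyphs): a vertical bar in column 1, and a blank cell.
bar = falses(16, 8); bar[:, 1] .= true
blank = falses(16, 8)
g = compose([bar, blank, bar])   # 16×24
```

Full-width characters would simply contribute 16-column cells, so the total width varies with the characters involved.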
VisualStringDistances.GlyphCoordinates (Type)
GlyphCoordinates{T} <: AbstractVector{T}
A sparse representation of a Glyph.
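A glyph is mostly unset pixels, so one natural sparse form keeps only the positions of the set ("black") pixels. This form is also a convenient input for a transport problem in which each black pixel is a point of mass. The sketch below is hypothetical (`pixel_coordinates` is not the package's `GlyphCoordinates`). It assumes coordinates are `(row, column)` pairs listed in Julia's column-major order.

```julia
# Hypothetical sketch, not GlyphCoordinates itself: keep only the
# (row, column) positions of the set pixels of a Bool matrix.
function pixel_coordinates(bitmap::AbstractMatrix{Bool})
    # findall returns CartesianIndex values in column-major order
    return [(I[1], I[2]) for I in findall(bitmap)]
end

m = falses(3, 3)
m[1, 2] = true
m[3, 1] = true
coords = pixel_coordinates(m)
```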
VisualStringDistances.glyph! (Method)
glyph!(v::Vector{UInt8}) -> Glyph
Create a Glyph from a vector of bytes, assuming the vector represents a single Unifont character. Modifies v and may share its memory.
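GNU Unifont distributes glyphs in `.hex` files, one character per line in the form `CODEPOINT:HEXDATA`. The data is either 32 hex digits (16 rows of 8 pixels) or 64 hex digits (16 rows of 16 pixels). The sketch below decodes such a data field into a `BitMatrix`. `parse_unifont_hex` is a hypothetical helper written for illustration. It does not reproduce `glyph!`, which works on a byte vector in place.

```julia
# Hypothetical helper, not the package's glyph!: decode the HEXDATA field of one
# line of a GNU Unifont .hex file into a 16-row BitMatrix.
# 32 hex digits -> 16×8 (half width); 64 hex digits -> 16×16 (full width).
function parse_unifont_hex(hexdata::AbstractString)
    ndigits = length(hexdata)
    ndigits in (32, 64) || throw(ArgumentError("expected 32 or 64 hex digits, got $ndigits"))
    width = 4 * ndigits ÷ 16          # pixels per row: 8 or 16
    digits_per_row = ndigits ÷ 16     # 2 or 4 hex digits per row
    bitmap = falses(16, width)
    for row in 1:16
        chunk = hexdata[(row - 1) * digits_per_row + 1 : row * digits_per_row]
        bits = parse(UInt16, chunk; base = 16)
        for col in 1:width
            # the leftmost pixel is the most significant bit of the row
            bitmap[row, col] = (bits >> (width - col)) & 0x0001 == 0x0001
        end
    end
    return bitmap
end

# Synthetic data (not a real Unifont glyph): top-left and bottom-right pixels set.
half = parse_unifont_hex("80" * "00"^14 * "01")
full = parse_unifont_hex("8000" * "0000"^15)
```

For a full `.hex` line, the data field can be taken as `split(line, ':')[2]` before decoding.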
VisualStringDistances.printglyph (Function)
printglyph([io=stdout], g::Union{Char, AbstractString, Glyph})
Prints a visual representation of g to io.
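The pictures in this document use `#` for a set pixel and `-` for an unset one, with one text line per pixel row. The sketch below is a hypothetical `print_bitmap` that reproduces that style for any `Bool` matrix. Like `printglyph`, it takes an optional `io` argument that defaults to `stdout`. This is an assumption about the output format inferred from the examples, not the package's implementation.

```julia
# Hypothetical printglyph-style renderer for any Bool matrix:
# '#' for a set pixel, '-' for an unset one, one line per pixel row.
function print_bitmap(io::IO, bitmap::AbstractMatrix{Bool})
    for row in eachrow(bitmap)
        println(io, join(b ? '#' : '-' for b in row))
    end
end

# Default to stdout when no stream is given.
print_bitmap(bitmap::AbstractMatrix{Bool}) = print_bitmap(stdout, bitmap)

io = IOBuffer()
print_bitmap(io, Bool[1 0; 0 1])
out = String(take!(io))
```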
VisualStringDistances.visual_distance (Method)
visual_distance(::Type{T}, s::Union{Char,AbstractString},
                t::Union{Char,AbstractString}; D=KL(one(T)), ϵ=T(0.1),
                normalize=nothing) where {T}
Computes a measure of distance between the strings s and t in terms of their visual representations as rendered by GNU Unifont, quantified by an unbalanced Sinkhorn divergence from UnbalancedOptimalTransport.jl.
- The keyword argument D chooses the UnbalancedOptimalTransport.AbstractDivergence used to penalize the creation or destruction of "mass" (black pixels). For D = VisualStringDistances.KL(ρ) with some number ρ ≥ 0, the distance is non-negative and is zero if and only if the two visual representations of the strings are the same, as is generally desired.
- The keyword argument ϵ sets the "entropic regularization" in the Sinkhorn divergence; see the UnbalancedOptimalTransport.jl documentation for more information. In short, a smaller ϵ computes a quantity more directly related to the cost of moving mass, but takes longer to compute.
- The keyword argument normalize can be chosen to be a function that returns a normalizing constant given the maximum length of the two strings; the result is divided by that constant. The choice normalize=identity thus divides the result by the maximum length of the two strings. The choice normalize=sqrt has been found to give a good balance in some settings.
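The `normalize` keyword, as described above, amounts to dividing the raw divergence by `normalize(max(length(s), length(t)))`, with `nothing` meaning no division. The hypothetical helper below spells out that arithmetic. It assumes "length" means the character count returned by Julia's `length`, which is 1 for a `Char`. It is not the package's internal code.

```julia
# Hypothetical helper illustrating the normalize keyword as documented:
# divide the raw distance by normalize(max length); `nothing` leaves it unchanged.
function apply_normalize(raw::Real, s::Union{Char,AbstractString},
                         t::Union{Char,AbstractString}, normalize)
    normalize === nothing && return raw
    return raw / normalize(max(length(s), length(t)))
end

r_none = apply_normalize(8.0, "abcd", "ab", nothing)
r_id   = apply_normalize(8.0, "abcd", "ab", identity)  # divides by 4
r_sqrt = apply_normalize(8.0, "abcd", "ab", sqrt)      # divides by sqrt(4) = 2
```

With `identity`, long strings are scaled down proportionally to their length. `sqrt` sits between no normalization and that full per-character scaling.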
One may use printglyph to see the visual representation of the strings as rendered by GNU Unifont.
At the time of this writing, GNU Unifont is capable of rendering 57086 different Unicode characters. However, it renders some Unicode characters with the same graphical representation; specifically, 689 distinct Unicode characters have duplicate representations. Here is a set of six duplicates, for example:
- 'Ꮋ': Unicode U+13BB (category Lu: Letter, uppercase)
- 'Н': Unicode U+041D (category Lu: Letter, uppercase)
- 'ꓧ': Unicode U+A4E7 (category Lo: Letter, other)
- 'Ⲏ': Unicode U+2C8E (category Lu: Letter, uppercase)
- 'Η': Unicode U+0397 (category Lu: Letter, uppercase)
- 'H': ASCII/Unicode U+0048 (category Lu: Letter, uppercase)
The visual distance between these, therefore, is returned as zero (up to numerical error).
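Duplicates like these can be found by grouping characters under their rendered bitmap and keeping the groups with more than one member. The sketch below assumes a hypothetical `Dict` from characters to bitmaps is already available. It includes no real Unifont data, and `duplicate_groups` is not part of the package.

```julia
# Hypothetical sketch: group characters whose rendered bitmaps are identical.
# `bitmaps` maps each character to its bitmap (real Unifont data not included).
function duplicate_groups(bitmaps::AbstractDict{Char,<:AbstractMatrix{Bool}})
    groups = Dict{BitMatrix,Vector{Char}}()
    for (c, bm) in bitmaps
        # arrays hash and compare by content, so equal bitmaps share a key
        push!(get!(groups, BitMatrix(bm), Char[]), c)
    end
    # keep only bitmaps shared by two or more characters, each group sorted
    return [sort(cs) for cs in values(groups) if length(cs) > 1]
end

h1 = falses(2, 2); h1[1, 1] = true
h2 = copy(h1)                 # same picture as h1
o = falses(2, 2)              # a different picture
groups = duplicate_groups(Dict('H' => h1, 'Η' => h2, 'x' => o))
```

Summing `length` over the returned groups counts the characters that share their bitmap with at least one other character.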
Example
julia> using VisualStringDistances
julia> printglyph("abc")
------------------------
------------------------
------------------------
---------#--------------
---------#--------------
---------#--------------
--####---#-###----####--
-#----#--##---#--#----#-
------#--#----#--#------
--#####--#----#--#------
-#----#--#----#--#------
-#----#--#----#--#------
-#---##--##---#--#----#-
--###-#--#-###----####--
------------------------
------------------------
julia> printglyph("def")
------------------------
------------------------
------------------------
------#-------------##--
------#------------#----
------#------------#----
--###-#---####-----#----
-#---##--#----#--#####--
-#----#--#----#----#----
-#----#--######----#----
-#----#--#---------#----
-#----#--#---------#----
-#---##--#----#----#----
--###-#---####-----#----
------------------------
------------------------
julia> visual_distance("abc", "def")
31.57060117541754
julia> visual_distance("abc", "abe")
4.979840716647487