HtmlToText
-- -- toggle navigation ged computation description datasets evaluations prepare your method participants and methods results graph classif description datasets how to participate publications dates / inscriptions organizers icpr 2016 - graph distance contest measuring the dissimilarity between graphs is a key step in data analysis based on graph representations. this contest focusses on the definition and the computation of general dissimilarity measures through two challenges: challenge 1 : computation of the exact or an approximate graph edit distance challenge 2 : computation of a dissimilarity measure for graph classification different datasets are considered with symbolic or numerical attributes on both nodes and edges. methods submitted to challenge 1 can also be submitted to challenge 2. this contest is supported by the technical committee on graph-based representations in pattern recognition ( tc 15 ) of the international association of pattern recognition ( iapr ) news april 2017 results of challenge 1 submitted to a journal dec. 2016 icpr / results sept. 2016 submission deadline submissions number of methods - challenge 1: 8 - challenge 2: 2 description -- challenge 1 ged computation goal compute the exact or an approximate ged, for several pairs of graphs of several datasets , under the following constraints: running time limited to 30 seconds maximum for each pair of graphs edit costs are imposed for each dataset datasets are divided into two groups: tiny graphs (less than 10 nodes): selected such that the exact ged is known for all pairs of graphs small graphs (less than 30 nodes): selected such that the exact ged is known for some pairs of graphs larger graphs (less that 70 nodes): the exact ged is unknown evaluation methods are compared according to running time closeness to reference distances given two graphs, the reference distance can be: the exact ged computed by the a*-based method [ 1 ], if available (depends on the size of the graphs), or the smallest distance computed by methods tested in this challenge: obtained by the organizers with a*-based method [ 1 ], beam-search method [ 1 ], hungarian-based bipartite ged [ 2 ] obtained by the participants more information about evaluation . participation this challenge is open to any method based on the computation of edit paths: exact methods methods based on heuristics linear and quadratic programming approaches algorithms improving computational time, e.g. multi-threaded methods (using at most 4 threads) datasets three groups of datasets are considered. dataset #graphs mean #nodes mean degree min #nodes max #nodes node attributes edge attributes exact ged alkane 150 8.9 1.8 1 10 chemical symbol valence (bound type) 99.5467% acyclic 185 8.2 1.8 3 11 85.7595% pah 94 20.7 20.4 10 28 no mao 68 18.4 2.1 11 27 no cmu 111 30 30 30 - distance no for a finer analysis, each dataset of the second group is decomposed into several subsets of graphs having the same number of nodes, and one subset of graphs having different number of nodes. each subset is considered as a dataset. dataset #subsets #graphs / subset mean #nodes mean degree min #nodes max #nodes node attributes edge attributes exact ged muta 7 10 10 10 10 chemical symbol valence (bound type) no 20 20 20 30 30 30 40 40 40 50 50 50 60 60 60 70 70 70 10, 20, ..., 70 10 70 grec 4 10 5 5 5 x,y coordinate and type type 90% (100% soon) 10 10 10 15 15 15 20 20 20 5, 10, 15, ..., 20 5 20 other attributes can be found in the datasets. attributes listed here are those used in the challenge. see prepare your method for the expressions of the costs using these attributes. about these datasets all datasets are taken from existing benchmarks: pah, mao, alkane, acyclic are part of the greyc's chemistry dataset . grec, muta and cmu are part of gdr4ged [ 3 ]. grec and muta are subsets of the iam graph database repository , and cmu is a set of delaunay graphs computed from the house image sequence of the cmu/vasc image database . visit links for more information. download datasets adapted to the challenge: gdc-c1.tar.gz graph format : graph exchange language (gxl) a example of code is given for each cost function. -- evaluation for each dataset d (or subset), each set of cost parameters c and each method m, the distance is computed for all pairs of graphs. note that for cmu, each graph is compared with graphs representing the same image rotated at 10, 20, 30, 40, 50, 60, 70, 80, and 90 degrees only (see readme file in datasets). then, methods are compared according to the following metrics [ 3 ] computed for each triplet (d,c,m): mean deviation of the computed distances from the reference distances number of times the reference distance is found mean running time time/deviation score since running time is compared, your method needs to be executed on our computer. prepare your method given two graphs g1.gxl and g2.gxl, each encoding a graph, your program must: load the graphs and their attributes from the gxl files compute the distance based on edit operations print the following information on the standard output : opt_sol;time;node_match opt_sol is an integer: 0 (not found), 1 (found), 2 (don't know) time (in seconds) is the total running time of steps 1 and 2, printed with 8 digits after the comma node_match is a vector encoding substitutions and removals of nodes of g1 it is indexed by nodes of g1 in the order defined in the gxl file (attribute id ) for each node i of g1 node_match i = j if i is substituted by the node of g2 with index j , or node_match i = -1 if i is removed step 1 or 2 must include the computation of the costs. moreover, it cannot run more than 30 seconds, and no more than 4 threads can be used. since all graphs are simple, the full edit path can be easily reconstructed from this vector without any ambiguity. its total cost defines the value of the distance computed by your method. it cannot be less than the exact ged. we are in charge of the computation of the full edit path and the associated distance from node_match . example your program computes a distance in 0.5 seconds between a graph g1 with 4 nodes and a graph g2 with 5 nodes, with operations on nodes given by: substitutions 1->2 , 2->1 , 4->4 removal: node 3 of g1 insertions: nodes 4 and 5 of g2 and induced operations on edges. your method does not know if the distance is exact. your program must print: 2;0.50000000;2 1 -1 4 on the standard output. other operations are computed from this last information and from the graphs. programs we accept command-line programs for linux 64 bits (16gb ram, 4 quad-core amd opteron f 8350 2ghz): binaries, executable jar files source code in c, c++, java matlab scripts (with/out mex files, compiled or not) other languages may be possible on demand there are 3 types of pairs of attributes on nodes and edges, so you have to prepare 3 programs , as described below. 1. for symbolic graphs (alkane, acyclic, pah, mao, muta) 6 inputs node costs c node,sub , c node,del/ins edge costs c edge,sub , c edge,del/ins graphs as gxl files g1.gxl, g2.gxl myexec_symbolic c node,sub c node,del/ins c edge,sub c edge,del/ins g1.gxl g2.gxl with costs computed by myexec_symbolic as follows: dataset node attribute node costs edge attribute edge costs alkane chemical symbol (label) l c sub (u,v)=0 if l(u)=l(v) , c node,sub else c del/ins (u)=c node,del/ins for all node u valence (bound type) t c sub (a,b)=0 if t(a)=t(b) , c edge,sub else c del/ins (a)=c edge,del/ins for all edge a pah mao acyclic muta several values of parameters c node,sub , c node,del/ins , c edge,sub , c edge,del/ins are tested. 2. for grec dataset 2 inputs graphs as gxl files g1.gxl, g2.gxl myexec_grec g1.gxl g2.gxl with costs computed by myexec_grec as follows: grec dataset used attributes costs nodes (x,y) coordinates pos , and type (label) l c sub (u,v) 0.5*d euc (pos(u),pos(v)) if l(u)=l(v) (same labels) 90 else (different labels) c ins/del 45 edges type0 (string: line or a