ivci:nuva-utils

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
ivci:nuva-utils [2024/03/20 13:21] fkaagivci:nuva-utils [2025/04/25 09:03] (current) fkaag
Line 1: Line 1:
 ====== Python utilities to handle NUVA ====== ====== Python utilities to handle NUVA ======
-[[https://github.com/fkaag71/nuva-utils/tree/master/NUVA%20Utils|library available on GitHub]] allows to retrieve and explore NUVA. It is a work in progress, that will be progressively enriched to provide metrics on code systems based upon their mapping to NUVA codes.+[[https://pypi.org/project/nuva-utils/|nuva_utils]] is Python package available from the PyPi repository.
  
-The supported functions are so far+It can be installed with command
-<code python+<code> 
-get_nuva_version()+pip install nuva-utils
 </code> </code>
-Returns the version index for the last publication of NUVA. 
  
 +The supported functions are so far:
 <code python> <code python>
-get_nuva(version+def nuva_version(): 
-</code> +    """ 
-Uploads in the current directory the referenced version in RDF/XML format as **nuva_ans.rdf**, and creates rebased version **nuva_ivci.rdf**.+    Returns the current version of the NUVA graph available from https://ivci.org/nuva 
 +    """ 
 +def nuva_core_graph(): 
 +    """ 
 +    Returns the core graph of NUVA as RDFLib graph 
 +    :return: the core graph 
 +    """ 
 +def nuva_add_codes_to_graph(g,codesystem,codes): 
 +    """ 
 +    Adds the alignments for an external code system.
  
-<code python> +    g: The graph where the alignments are to be added 
-split_nuva(+    codesystem: The code system of the aligments 
-</code> +    codes: an array of Dict objects, such as {'CVX':'CVX-219','NUVA':'VAC1188')} 
-From the uploaded **nuva_ivci.rdf** filecreates split version as collection of files in RDF/Turtle format+    """ 
-  * **nuva_core.ttl** including the concepts for vaccinesvalencestarget diseases and their labels in English +def nuva_add_lang(g,lang): 
-  * **nuva_lang_XX.ttl** includes all translations for language XX +    """  
-  * **nuva_refcode_YYY.ttl** includes the concepts and the NUVA alignments for code system YYY+    Adds language graph to base graph 
 +    """      
 +def nuva_get_vaccines(g,lang,onlyAbstract= False): 
 +    """ 
 +    Return a Dict of all NUVA vaccines and their properties 
 +    """ 
 +def nuva_translate(g,lang1,lang2)
 +    """ 
 +    Extracts from a graph the translation across 2 languages 
 +    """ 
 +def nuva_optimize(g,codesystem,onlyAbstract): 
 +    """ 
 +    Determines the optimal mapping of a code system to NUVA, either full or limited to abstract vaccines. 
 +    Returns a dictionary with three items: 
 +    - bestcodes, a dictionary of all NUVA concepts 
 +    - revcodes, a dictionary of all codes in the code system 
 +    - metrics, the computed metrics of the code system
  
-<code python> +    For each NUVA concept, bestcodes is formed by: 
-refturtle_to_map(code) +    - label: the English label of the concept 
-</code> +    - isAbstract: whether the concept is abstract 
-Starting from the **nuva_refcode_YYY.ttl** file for the given code, creates a simple CSV file **nuva_refcode_YYY.csv** with alignments between the given code and NUVA.+    - nbequiv: the number of codes that match exactly the NUVA concept 
 +    - blur: the number of concepts covered by the narrowest codes for the NUVA conceptIf nbequiv is not 0, blur should be 1 
 +    - codes: the list of codes with the given blur
  
-<code python> +    For each code in the code system, revcodes is formed by: 
-map_to_turtle(code)+    - label: the English label of the corresponding NUVA concept 
 +    - cardinality: the number of NUVA concepts covered by the given code 
 +    - may: the list of these NUVA concepts 
 +    - blur: the number of NUVA concepts for which the given code is the best possible one 
 +    - best: the list of these NUVA concepts, that is a subset of "may" 
 + 
 +    The metrics is formed by: 
 +    - completeness: the share of NUVA concepts that can be represented by a code, even roughly 
 +    - precision: the inverse of the average blur over all the codes in the code system, when using the most optimal one for each concept. 
 +    - redundancy: for the NUVA concepts that have exact alignments in the code system, the average number of such alignments. 
 +    """                  
 </code> </code>
-Assuming that the **nuva_refcode_YYY.csv** file has been copied to work file **nuva_code_YYY.csv**, then edited for enhancing the alignments, creates a Turtle work file **nuva_code_YYY.ttl** for further processing. 
  
-Note that the refcode file contains the NUVA English labels of vaccines for convenience, but these are not required nor processed from the work code file.+Here an example of use: 
 +  - Retrieve the NUVA version 
 +  - Retrieve the NUVA core graph 
 +  - Complement it with ATC alignments 
 +  - Complement it with French labels 
 +  - Display the list of vaccines 
 +  - Display a translation table from English to French 
 +  - Determine the best possible mapping from and to ATC and the corresponding metrics
  
-<code python+<code Python
-query_core(q) +import os 
-</code> +import nuva_utils 
-Runs a SPARQL query q against the core graph loaded from **nuva_core.ttl**+from pathlib import Path 
 +from nuva_utils.nuva_utils import *
  
-<code python> +# Here the main program - Adapt the work directory to your environment
-query_code(q,code) +
-</code> +
-Runs a SPARQL query q against a graph formed by merging **nuva_core.ttl** and the work file **nuva_code_YYY.ttl**, thus allowing to run checks and measures on the alignment.+
  
 +os.chdir(str(Path.home())+"/Documents/NUVA")
 +version = nuva_version()
 +print(version)
  
-<code> +g = nuva_core_graph() 
-eval_code(code+print ("Core graph loaded")
-</code> +
-Produces the metrics for a code system, given a **nuva_code_YYY.csv** file for alignments.+
  
-Subproducts are: +codes = [] 
-  * **nuva_reverse_YYY.csv** : file with all NUVA codes matching a given external code +csv_file = open("NUVA_refcode_ATC.csv",'r',encoding="utf-8-sig",newline='') 
-  * **nuva_best_YYY.csv**: file with the best possible external code for a given NUVA code+reader = csv.DictReader(csv_file,delimiter=';'
 +codesystem = reader.fieldnames[0] 
 +for row in reader: 
 +    codes.append(row)
  
-An example use sequence is included in the file: +nuva_add_codes_to_graph(g,codesystem,codes) 
-<code python> +nuva_add_lang(g,'fr') 
-# Here the main program - Adapt the work directory to your environment+vaccines = nuva_get_vaccines(g,'fr'
 +print(vaccines) 
 +trans = nuva_translate(g,'en','fr'
 +print(trans) 
 +eval_codes = nuva_optimize(g,codesystem,False) 
 +bestcodes = eval_codes['bestcodes'
 +revcodes = eval_codes['revcodes'
 +metrics = eval_codes['metrics']
  
-os.chdir(str(Path.home())+"/Documents/NUVA"+rev_fname = f"{codesystem}/nuva_reverse_{codesystem}.csv" 
-get_nuva(get_nuva_version()) +best_fname= f"{codesystem}/nuva_best_{codesystem}.csv" 
-split_nuva() +metrics_fname=f"{codesystem}/nuva_metrics_{codesystem}.txt"
-refturtle_to_map("CVX"+
-shutil.copyfile("nuva_refcode_CVX.csv","nuva_code_CVX.csv") +
-map_to_turtle("CVX")+
  
-q = """  +print ("Create best codes report "+best_fname) 
-   # All vaccines against smallpox +best_file = open(best_fname,'w',encoding="utf-8",newline='') 
-    SELECT ?vcode ?vl WHERE {  +best_writer = csv.writer(best_file, delimiter=';') 
-    ?dis rdfs:subClassOf nuva:Disease . +best_writer.writerow(["NUVA","Label","IsAbstract",f"Best {codesystem}""Equiv"]
-    ?dis rdfs:label "Smallpox-Monkeypox"@en . +for nuva_code in bestcodes
-    ?vac rdfs:subClassOf nuva:Vaccine +    best_writer.writerow([nuva_code,bestcodes[nuva_code]['label'],bestcodes[nuva_code]['isAbstract'], 
-    ?vac rdfs:label ?vl  +                            bestcodes[nuva_code]['codes'], bestcodes[nuva_code]['nbequiv']]) 
-    ?vac skos:notation ?vcode . +best_file.close
-    ?vac nuvs:containsValence ?val .  +
-    ?val nuvs:prevents ?dis  +
- } +
-""" +
-res = query_core(q+
-for row in res+
-     print (f"{row.vcode} - {row.vl}")+
  
-res = eval_code("CVX") +print ("Create reverse codes report "+rev_fname
-print ("Completeness {:.1%} ".format(res['Completeness'])+rev_file = open(rev_fname,'w',encoding="utf-8",newline=''
-print ("Precision {:.1%} ".format(res['Precision']))+rev_writer = csv.writer(rev_file, delimiter=';') 
 +rev_writer.writerow([codesystem,"Label","Cardinality","May code", "Blur", "Best code for"]) 
 +for extcode in revcodes: 
 +    rev_writer.writerow([extcode,revcodes[extcode]['label'],  
 +                            revcodes[extcode]['cardinality'],revcodes[extcode]['may'],  
 +                            revcodes[extcode]['blur'], revcodes[extcode]['best']]) 
 +rev_file.close
  
-</code>+nbnuva = len(bestcodes) 
 +nbcodes = len(revcodes)
  
 +print (f"NUVA version :{version}\n")
 +print (f"Number of NUVA concepts : {nbnuva}")
 +print ("Completeness: {:.1%}\n".format(metrics['completeness']))
 +print (f"Number of aligned codes: {nbcodes}")
 +print ("Precision: {:.1%}".format(metrics['precision']))
 +print ("Redundancy: {:.3}".format(metrics['redundancy']))
 +</code>
  • ivci/nuva-utils.txt
  • Last modified: 2025/04/25 09:03
  • by fkaag