ivci:nuva-utils

ivci:nuva-utils [2024/01/05 13:54] fkaag
ivci:nuva-utils [2025/04/25 09:03] (current) fkaag
Line 1:
 ====== Python utilities to handle NUVA ======
-[[https://github.com/fkaag71/nuva-utils/tree/master/NUVA%20Utils|library available on GitHub]] allows to retrieve and explore NUVA. It is a work in progress, that will be progressively enriched to provide metrics on code systems based upon their mapping to NUVA codes.
+[[https://pypi.org/project/nuva-utils/|nuva_utils]] is a Python package available from the PyPI repository.

-The supported functions are so far
-<code python>
-get_nuva_version()
+It can be installed with the command:
+<code>
+pip install nuva-utils
 </code>
-Returns the version index for the last publication of NUVA.

+The supported functions so far are:
 <code python>
-get_nuva(version)
-</code>
-Uploads in the current directory the referenced version in RDF/XML format as **nuva_ans.rdf**, and creates rebased version **nuva_ivci.rdf**.
+def nuva_version():
+    """
+    Returns the current version of the NUVA graph available from https://ivci.org/nuva
+    """
+def nuva_core_graph():
+    """
+    Returns the core graph of NUVA as an RDFLib graph
+    :return: the core graph
+    """
+def nuva_add_codes_to_graph(g,codesystem,codes):
+    """
+    Adds the alignments for an external code system.

-<code python>
-split_nuva()
-</code>
-From the uploaded **nuva_ivci.rdf** file, creates split version as collection of files in RDF/Turtle format
-  * **nuva_core.ttl** including the concepts for vaccines, valences, target diseases and their labels in English
-  * **nuva_lang_XX.ttl** includes all translations for language XX
-  * **nuva_refcode_YYY.ttl** includes the concepts and the NUVA alignments for code system YYY
+    g: The graph where the alignments are to be added
+    codesystem: The code system of the alignments
+    codes: an array of Dict objects, such as {'CVX':'CVX-219','NUVA':'VAC1188'}
+    """
+def nuva_add_lang(g,lang):
+    """
+    Adds a language graph to the base graph
+    """
+def nuva_get_vaccines(g,lang,onlyAbstract= False):
+    """
+    Returns a Dict of all NUVA vaccines and their properties
+    """
+def nuva_translate(g,lang1,lang2):
+    """
+    Extracts from a graph the translation between two languages
+    """
+def nuva_optimize(g,codesystem,onlyAbstract):
+    """
+    Determines the optimal mapping of a code system to NUVA, either full or limited to abstract vaccines.
+    Returns a dictionary with three items:
+    - bestcodes, a dictionary of all NUVA concepts
+    - revcodes, a dictionary of all codes in the code system
+    - metrics, the computed metrics of the code system

-<code python>
-refturtle_to_map(code)
-</code>
-Starting from the **nuva_refcode_YYY.ttl** file for the given code, creates a simple CSV file **nuva_refcode_YYY.csv** with alignments between the given code and NUVA.
+    For each NUVA concept, bestcodes is formed by:
+    - label: the English label of the concept
+    - isAbstract: whether the concept is abstract
+    - nbequiv: the number of codes that match exactly the NUVA concept
+    - blur: the number of concepts covered by the narrowest codes for the NUVA concept. If nbequiv is not 0, blur should be 1
+    - codes: the list of codes with the given blur

-<code python>
-map_to_turtle(code)
-</code>
-Assuming that the **nuva_refcode_YYY.csv** file has been copied to work file **nuva_code_YYY.csv**, then edited for enhancing the alignments, creates Turtle work file **nuva_code_YYY.ttl** for further processing.
+    For each code in the code system, revcodes is formed by:
+    - label: the English label of the corresponding NUVA concept
+    - cardinality: the number of NUVA concepts covered by the given code
+    - may: the list of these NUVA concepts
+    - blur: the number of NUVA concepts for which the given code is the best possible one
+    - best: the list of these NUVA concepts, which is a subset of "may"

-Note that the refcode file contains the NUVA English labels of vaccines for convenience, but these are not required nor processed from the work code file
-
-<code python>
-query_core(q)
+    The metrics item is formed by:
+    - completeness: the share of NUVA concepts that can be represented by a code, even roughly
+    - precision: the inverse of the average blur over all the codes in the code system, when using the best one for each concept
+    - redundancy: for the NUVA concepts that have exact alignments in the code system, the average number of such alignments.
+    """
 </code>
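The three metrics described for ''nuva_optimize'' can be illustrated on a toy ''bestcodes'' dictionary. The sketch below is only one reading of the docstring, not the package's implementation: the helper ''summarize_metrics'' and the sample concepts and codes are hypothetical, and averaging the blur over the concepts that have at least one code is an assumption.

```python
# Hypothetical illustration of the metrics described for nuva_optimize.
# This is NOT the nuva_utils implementation, only one reading of its docstring.

def summarize_metrics(bestcodes):
    """Compute completeness, precision and redundancy from a bestcodes-like dict."""
    total = len(bestcodes)
    # Concepts that can be represented by at least one code, even roughly
    covered = [c for c in bestcodes.values() if c["codes"]]
    # Concepts that have at least one exact alignment
    exact = [c for c in covered if c["nbequiv"] > 0]

    completeness = len(covered) / total if total else 0.0
    # Assumption: the blur is averaged over the covered concepts only
    avg_blur = sum(c["blur"] for c in covered) / len(covered) if covered else 0.0
    precision = 1 / avg_blur if avg_blur else 0.0
    redundancy = sum(c["nbequiv"] for c in exact) / len(exact) if exact else 0.0
    return {"completeness": completeness,
            "precision": precision,
            "redundancy": redundancy}

# Made-up sample data that follows the documented bestcodes fields
sample = {
    "VAC9001": {"label": "Sample A", "isAbstract": False, "nbequiv": 2, "blur": 1, "codes": ["X1", "X2"]},
    "VAC9002": {"label": "Sample B", "isAbstract": True, "nbequiv": 0, "blur": 3, "codes": ["X3"]},
    "VAC9003": {"label": "Sample C", "isAbstract": False, "nbequiv": 1, "blur": 1, "codes": ["X4"]},
    "VAC9004": {"label": "Sample D", "isAbstract": False, "nbequiv": 0, "blur": 0, "codes": []},
}

metrics = summarize_metrics(sample)
print("Completeness: {:.1%}".format(metrics["completeness"]))
print("Precision: {:.1%}".format(metrics["precision"]))
print("Redundancy: {:.3}".format(metrics["redundancy"]))
```

In this toy data, three of the four concepts have a code, two of them have exact alignments, and one is only reachable through a code that covers three concepts, which lowers the precision.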
-Runs a SPARQL query q against the core graph loaded from **nuva_core.ttl**

-<code python>
-query_code(q,code)
-</code>
-Runs SPARQL query q against a graph formed by merging **nuva_core.ttl** and the work file **nuva_code_YYY.ttl**, thus allowing to run checks and measures on the alignment.
+Here is an example of use:
+  - Retrieve the NUVA version
+  - Retrieve the NUVA core graph
+  - Complement it with ATC alignments
+  - Complement it with French labels
+  - Display the list of vaccines
+  - Display the translation table from English to French
+  - Determine the best possible mapping from and to ATC and the corresponding metrics
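The alignment step takes ''codes'' as a list of dictionaries, one per row, with the code system named by the first column. A minimal sketch of how such a list can be built with the standard ''csv'' module from in-memory data follows; the helper name ''read_alignments'' and the second sample row are hypothetical, while the semicolon delimiter and the first-column convention are taken from the example on this page.

```python
import csv
import io

def read_alignments(stream):
    """Read a semicolon-delimited alignment table into (codesystem, codes)."""
    reader = csv.DictReader(stream, delimiter=';')
    # One plain dict per row, keyed by the column headers
    codes = [dict(row) for row in reader]
    # The first column header names the external code system
    codesystem = reader.fieldnames[0] if reader.fieldnames else None
    return codesystem, codes

# In-memory sample; the first data row mirrors the docstring example,
# the second one is a made-up placeholder
sample = io.StringIO("CVX;NUVA\nCVX-219;VAC1188\nCVX-000;VAC0000\n")
codesystem, codes = read_alignments(sample)
print(codesystem, codes)
```

The resulting pair has the shape expected by ''nuva_add_codes_to_graph(g,codesystem,codes)'' as documented above.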
  
+<code Python>
+import csv
+import os
+import nuva_utils
+from pathlib import Path
+from nuva_utils.nuva_utils import *

-An example use sequence is included in the file:
-<code python>
 # Here the main program - Adapt the work directory to your environment

 os.chdir(str(Path.home())+"/Documents/NUVA")
-get_nuva(get_nuva_version())
-split_nuva()
-refturtle_to_map("CVX")
-shutil.copyfile("nuva_refcode_CVX.csv","nuva_code_CVX.csv")
-map_to_turtle("CVX")
+version = nuva_version()
+print(version)

-q1 = """
-   # All vaccines against smallpox
-    SELECT ?vcode ?vl WHERE {
-    ?dis rdfs:subClassOf nuva:Disease .
-    ?dis rdfs:label "Smallpox-Monkeypox"@en .
-    ?vac rdfs:subClassOf nuva:Vaccine .
-    ?vac rdfs:label ?vl .
-    ?vac skos:notation ?vcode .
-    ?vac nuvs:containsValence ?val .
-    ?val nuvs:prevents ?dis
- }
-"""
-res = query_core(q1)
-for row in res:
-    print (str(row[0])+"-"+str(row[1]))
+# Keep the returned graph: all the following calls work on it
+g = nuva_core_graph()
+print ("Core graph loaded")

-q2="""
-    # List CVX Codes
-    SELECT ?cvx ?nuva ?lvac WHERE {
-    ?vac rdfs:subClassOf nuva:Vaccine .
-    ?vac skos:notation ?nuva .
-    ?vac skos:exactMatch ?code .
-    ?code rdfs:subClassOf nuva:CVX .
-    ?code skos:notation ?cvx .
-    ?vac rdfs:label $lvac
-    }
-"""
-res=query_code(q2,"CVX")
-for row in res:
-    print ("CVX "+str(row[0])+" = "+str(row[1])+" "+str(row[2]))
-</code>
+# Load the alignments; the first column header names the code system
+codes = []
+csv_file = open("NUVA_refcode_ATC.csv",'r',encoding="utf-8-sig",newline='')
+reader = csv.DictReader(csv_file,delimiter=';')
+codesystem = reader.fieldnames[0]
+for row in reader:
+    codes.append(row)
+csv_file.close()
+
+nuva_add_codes_to_graph(g,codesystem,codes)
+nuva_add_lang(g,'fr')
+vaccines = nuva_get_vaccines(g,'fr')
+print(vaccines)
+trans = nuva_translate(g,'en','fr')
+print(trans)
+eval_codes = nuva_optimize(g,codesystem,False)
+bestcodes = eval_codes['bestcodes']
+revcodes = eval_codes['revcodes']
+metrics = eval_codes['metrics']
+
+# The reports go to a subdirectory named after the code system
+os.makedirs(codesystem, exist_ok=True)
+rev_fname = f"{codesystem}/nuva_reverse_{codesystem}.csv"
+best_fname= f"{codesystem}/nuva_best_{codesystem}.csv"
+metrics_fname=f"{codesystem}/nuva_metrics_{codesystem}.txt"
+
+print ("Create best codes report "+best_fname)
+best_file = open(best_fname,'w',encoding="utf-8",newline='')
+best_writer = csv.writer(best_file, delimiter=';')
+best_writer.writerow(["NUVA","Label","IsAbstract",f"Best {codesystem}", "Equiv"])
+for nuva_code in bestcodes:
+    best_writer.writerow([nuva_code,bestcodes[nuva_code]['label'],bestcodes[nuva_code]['isAbstract'],
+                            bestcodes[nuva_code]['codes'], bestcodes[nuva_code]['nbequiv']])
+best_file.close()
+
+print ("Create reverse codes report "+rev_fname)
+rev_file = open(rev_fname,'w',encoding="utf-8",newline='')
+rev_writer = csv.writer(rev_file, delimiter=';')
+rev_writer.writerow([codesystem,"Label","Cardinality","May code", "Blur", "Best code for"])
+for extcode in revcodes:
+    rev_writer.writerow([extcode,revcodes[extcode]['label'],
+                            revcodes[extcode]['cardinality'],revcodes[extcode]['may'],
+                            revcodes[extcode]['blur'], revcodes[extcode]['best']])
+rev_file.close()

+nbnuva = len(bestcodes)
+nbcodes = len(revcodes)
+
+print (f"NUVA version :{version}\n")
+print (f"Number of NUVA concepts : {nbnuva}")
+print ("Completeness: {:.1%}\n".format(metrics['completeness']))
+print (f"Number of aligned codes: {nbcodes}")
+print ("Precision: {:.1%}".format(metrics['precision']))
+print ("Redundancy: {:.3}".format(metrics['redundancy']))
+</code>
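The report files in the example are opened and closed by hand. A possible alternative, sketched here with the standard library only, is a small helper that creates the output directory and writes the rows inside a ''with'' block, so that the file is closed even if an error occurs. The helper ''write_report'' and the sample row are hypothetical and not part of nuva_utils.

```python
import csv
import tempfile
from pathlib import Path

def write_report(path, header, rows):
    """Write a semicolon-delimited report, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # The with block closes the file even if writing fails
    with path.open('w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f, delimiter=';')
        writer.writerow(header)
        writer.writerows(rows)
    return path

# Demonstration in a temporary directory with a made-up row
with tempfile.TemporaryDirectory() as tmp:
    out = write_report(Path(tmp) / "ATC" / "nuva_best_ATC.csv",
                       ["NUVA", "Label"],
                       [["VAC0000", "Sample vaccine"]])
    lines = out.read_text(encoding="utf-8").splitlines()
    print(lines)
```

With such a helper, the two report sections of the example would each reduce to one call that passes the header and a list of rows built from ''bestcodes'' or ''revcodes''.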
  • ivci/nuva-utils.1704462878.txt.gz
  • Last modified: 2024/01/05 13:54
  • by fkaag