新书推介:《语义网技术体系》
作者:瞿裕忠,胡伟,程龚
   XML论坛     W3CHINA.ORG讨论区     计算机科学论坛     SOAChina论坛     Blog     开放翻译计划     新浪微博  
 
  • 首页
  • 登录
  • 注册
  • 软件下载
  • 资料下载
  • 核心成员
  • 帮助
  •   Add to Google

    >> 本版讨论Semantic Web(语义Web,语义网或语义万维网, Web 3.0)及相关理论,如:Ontology(本体,本体论), OWL(Web Ontology Langauge,Web本体语言), Description Logic(DL, 描述逻辑),RDFa,Ontology Engineering等。
    [返回] 计算机科学论坛W3CHINA.ORG讨论区 - Web新技术讨论『 Semantic Web(语义Web)/描述逻辑/本体 』 → SPARQL and relational databases: getting started 查看新帖用户列表

      发表一个新主题  发表一个新投票  回复主题  (订阅本版) 您是本帖的第 4110 个阅读者浏览上一篇主题  刷新本主题   树形显示贴子 浏览下一篇主题
     * 贴子主题: SPARQL and relational databases: getting started 举报  打印  推荐  IE收藏夹 
       本主题类别:     
     admin 帅哥哟,离线,有人找我吗?
      
      
      
      威望:9
      头衔:W3China站长
      等级:计算机硕士学位(管理员)
      文章:5255
      积分:18407
      门派:W3CHINA.ORG
      注册:2003/10/5

    姓名:(无权查看)
    城市:(无权查看)
    院校:(无权查看)
    给admin发送一个短消息 把admin加入好友 查看admin的个人资料 搜索admin在『 Semantic Web(语义Web)/描述逻辑/本体 』的所有贴子 点击这里发送电邮给admin  访问admin的主页 引用回复这个贴子 回复这个贴子 查看admin的博客楼主
    发贴心情 SPARQL and relational databases: getting started


    http://www.snee.com/bobdc.blog/2008/10/sparql_and_relational_database.html

    SPARQL and relational databases: getting started
    Asking about tables and columns and doing a simple join.

    In an [URL=http://www.snee.com/xml/xml2006/owlrdbms.html]earlier project[/URL] I did querying relational data with SPARQL, I wanted to demonstrate how adding OWL metadata made it possible to answer useful queries that couldn't have been answered with just the original data. I did this using two simple databases of one table each, but recently, while helping someone who was also using the [URL=http://sites.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/]D2RQ[/URL] interface to provide access to some relational data, I decided to get more comfortable with using SPARQL to query data from multi-table relational databases.

    I wanted to find ways to:

    [URL=http://www.flickr.com/photos/toasty/1540997910/][/URL]
    list a database's tables

    list the columns in those tables

    run a query that joined data from at least two of the tables

    It was all easier than I though it would be.

    For a database to query, I went to the sample database section of [URL=http://dev.mysql.com/doc/#sampledb]MySQL's documentation page[/URL] and got the "world" database. This database has a country table with columns for country names, country codes unique within the table, and other information about the country such as its population and head of state. Another table, named countrylanguage, has a column for a country code to link to the country table, a column for a language spoken in that country, another for a figure showing the percentage of the country's residents who speak that language, and a fourth column for a boolean IsOfficial value.

    The following SQL query against that database lists the name of each country, a language spoken there, and what percentage of the population speak that language, with the rows sorted from high percentages to low:

    SELECT Name, Language, Percentage FROM country, countrylanguage
    WHERE country.Code = countrylanguage.CountryCode
    ORDER BY Percentage DESC;

    Here's how I developed a SPARQL query doing the same thing.

    As I described in the writeup of my earlier project, the first step of using D2RQ with a particular database is pointing its generate-mapping utility at a database installed in a running copy of MySQL (or one of the other relational database managers that D2RQ supports) to generate an SQL-to-SPARQL mapping file for that database. Next, you start up the d2r-server program with that mapping file as a parameter to run a server that provides a SPARQL endpoint for that database.

    You could then send SPARQL queries to that server's SPARQL endpoint [URL=http://www.snee.com/bobdc.blog/2008/09/querying_wikidbpedia_for_presi.html#id203795]using Curl[/URL]. For exploring the data, though, I prefer the SNORQL interface, because I can enter a query on a web form, click the Go button, and browse the result. (The SNORQL interface in the latest official release of D2RQ [URL=http://sourceforge.net/mailarchive/forum.php?thread_name=48F7C172.3040003%40snee.com&forum_name=d2rq-map-devel]doesn't get along with Firefox 3.0[/URL]—apparently there's already a fixed version in D2RQ's cvs tree—so I used Chrome for this.)

    When exploring a relational database, I first want to know what its tables are and what columns are in those tables. To do this with SPARQL queries, the D2RQ mapping file that I generated earlier provided some good clues. (Keep in mind that there may be other SPARQL interfaces to relational databases in the future, and they may not all map the relational structures to RDF the same way.)

    First: what are the tables in the database? D2RQ treats each table row as a resource of type tablename. For example, it treats the country table's row for France as a resource identified as http://localhost:2020/resource/country/FRA, which has a http://www.w3.org/1999/02/22-rdf-syntax-ns#type of http://localhost:2020/resource/vocab/country. So, by asking for a list of all the types with the following query (with DISTINCT added so that each only shows up once),

    SELECT DISTINCT ?o WHERE {
      ?s  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o
    }

    SNORQL's browse mode gives me this list (SNORQL defines "vocab" as the prefix for the http://localhost:2020/resource/vocab/ URI), which corresponds to the tables declared in the world database:

    vocab:country
    vocab:countrylanguage
    vocab:city

    What are the columns of these tables? For an example of one column's name, the world database's country table has a row for France with a figure of 59225700 in the Population column. D2RQ represents the column name as a property named tablename_columnname, so the triple describing France's population tells us that http://localhost:2020/resource/country/FRA has a http://localhost:2020/resource/vocab/country_Population of 59225700. How do we list all these predicate names? As I mentioned in [URL=http://www.snee.com/bobdc.blog/2008/08/how_you_can_explore_a_new_set.html]How you can explore a new set of linked data[/URL], the following query lists all the predicates in an RDF-based data collection:

    SELECT DISTINCT ?p WHERE {?s ?p ?o}

    Displaying all predicates in the world database, though, includes a few extras along with the ones representing the table columns. The table columns are all in the http://localhost:2020/resource/vocab namespace, so the best way I could think of to query for predicates that were in that namespace was like this:

    SELECT DISTINCT ?p
    WHERE { ?s ?p ?o.
            FILTER(regex(str(?p),"http://localhost:2020/resource/vocab/")).
    }

    It's pretty kludgy, so I'd love to hear of a better alternative. It selects all the predicates in the data set and then only passes along the ones whose URLs, when converted to a string, have "http://localhost:2020/resource/vocab/" in them. If another namespace used http://localhost:2020/resource/vocab/foo as its URL, this FILTER would select predicates in that namespace as well, because it has that "http://localhost:2020/resource/vocab/" substring in there, but I couldn't find a better way to do this. (Although SPARQL implements a subset of XPath, the [URL=http://www.w3.org/TR/1999/REC-xpath-19991116#function-namespace-uri]namespace-uri[/URL] function is not part of this subset.)

    I won't show you all 24 predicate names that that this query returns, but for the SPARQL version of the SQL query above that listed country names, languages, and percentages, I picked out the tablename_columnname predicates I needed from the list of 24 and used them to create this query:

    SELECT ?name ?language ?percentage
    WHERE { ?s1 vocab:country_Name ?name;
                vocab:country_Code ?ccode.
            ?s2 vocab:countrylanguage_CountryCode ?ccode;
                vocab:countrylanguage_Language ?language;
                vocab:countrylanguage_Percentage ?percentage.
    }
    ORDER BY DESC(?percentage)

    It's not a very fancy query. The most interesting part is the use of the ccode variable to connect information in triples from the world database's country table with information in triples from the database's countrylanguage table. When I first started this I had no idea how I was going to do this SPARQL equivalent of an SQL join, but once I sat down and tried it was intuitive enough. I won't show you all 984 results rows, but here's a selection from the middle to give you the flavor of what's there:

    "Réunion"        "Creole French"  91.5
    "Germany"        "German"         91.3
    "Seychelles"     "Seselwa"        91.3
    "Romania"        "Romanian"       90.7
    "American Samoa" "Samoan"         90.6
    "Syria"          "Arabic"         90.0
    "Swaziland"      "Swazi"          89.9
    "Bahamas"        "Creole English" 89.7
    "Chile"          "Spanish"        89.7
    "Sweden"         "Swedish"        89.5

    Now I feel more confident about forging ahead to explore sets of relational data that I can access with SPARQL queries using the D2RQ interface.

    Posted by Bob DuCharme on October 20, 2008 5:24 PM | [URL=http://www.snee.com/bobdc.blog/2008/10/sparql_and_relational_database.html]Permalink[/URL]

    TrackBack
    TrackBack URL for this entry:
    http://www.snee.com/cgi-sys/cgiwrap/bobd/managed-mt/mt-tb.cgi/447

    Comments
    (Note: I usually close comments for an entry a few weeks after posting it to avoid comment spam.)

    Bob,

    Queries like SELECT DISTINCT ?p WHERE {?s ?p ?o} are OK for RDF stores because of their specialized indexing strategies, but I don't think this would work well with D2RQ against a relational database.

    It would be OK if the RDBMS had a very small amount of data. But on a database of any size, such queries are not practical. I suspect D2RQ will just try to get all the data in the entire database and then try to figure out predicates. Same for query you are suggesting for getting the table names.

    D2RQ translates database schema in the following way - all tables get represented as owl:Class, all columns are exposed as properties connected to the classes using rdfs:domain statements.

    So, a much less expensive way to get all tables is simply by using SELECT ?table WHERE {?table rdf:type owl:Class}

    Then to get columns you can do ?column rdfs:domain ?table. In fact, this single query will get you all tables with the corresponding columns

    SELECT ?table ?column WHERE{?column rdfs:domain ?table}

    Regards,

    Irene

    Posted by: Irene Polikoff | [URL=http://www.snee.com/bobdc.blog/2008/10/sparql_and_relational_database.html#comment-2135]October 20, 2008 9:18 PM[/URL]

    Thanks Irene! That all makes sense to me, but
    I didn't see any use of the [URL=http://www.w3.org/2002/07/owl#]http://www.w3.org/2002/07/owl#[/URL] namespace (or even the string "owl") in the mapping file generated by D2RQ, and

      SELECT ?table WHERE {?table rdf:type }

    got me no results. This also got me no results:

    SELECT ?table ?column WHERE{?column rdfs:domain ?table}


       收藏   分享  
    顶(0)
      




    ----------------------------------------------

    -----------------------------------------------

    第十二章第一节《用ROR创建面向资源的服务》
    第十二章第二节《用Restlet创建面向资源的服务》
    第三章《REST式服务有什么不同》
    InfoQ SOA首席编辑胡键评《RESTful Web Services中文版》
    [InfoQ文章]解答有关REST的十点疑惑

    点击查看用户来源及管理<br>发贴IP:*.*.*.* 2008/10/21 22:04:00
     
     GoogleAdSense
      
      
      等级:大一新生
      文章:1
      积分:50
      门派:无门无派
      院校:未填写
      注册:2007-01-01
    给Google AdSense发送一个短消息 把Google AdSense加入好友 查看Google AdSense的个人资料 搜索Google AdSense在『 Semantic Web(语义Web)/描述逻辑/本体 』的所有贴子 点击这里发送电邮给Google AdSense  访问Google AdSense的主页 引用回复这个贴子 回复这个贴子 查看Google AdSense的博客广告
    2025/9/18 8:43:24

    本主题贴数1,分页: [1]

    管理选项修改tag | 锁定 | 解锁 | 提升 | 删除 | 移动 | 固顶 | 总固顶 | 奖励 | 惩罚 | 发布公告
    W3C Contributing Supporter! W 3 C h i n a ( since 2003 ) 旗 下 站 点
    苏ICP备05006046号《全国人大常委会关于维护互联网安全的决定》《计算机信息网络国际联网安全保护管理办法》
    62.500ms