New R package flatxml: working with XML files as R dataframes

The world is flat

The new R package flatxml provides functions for working with XML files. When parsing an XML document, fxml_importXMLFlat produces a special dataframe that is 'flat' by its very nature but still contains all the information needed to describe the hierarchical structure of the underlying XML document (for details on the dataframe, see the reference for the fxml_importXMLFlat function). flatxml offers a set of functions to work with this dataframe.
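To make the idea of a 'flat' representation concrete, here is a minimal sketch in Python (not R, and not flatxml's own code). It walks an XML tree and emits one record per element, keeping the hierarchy in an element ID, a parent ID and a depth value. The field names elemid, parent, depth, tag, attributes and value are assumptions chosen for illustration; flatxml's actual dataframe layout may differ and is documented in the fxml_importXMLFlat reference.

```python
import xml.etree.ElementTree as ET


def flatten_xml(xml_text):
    """Return one record per element; the hierarchy survives as parent IDs and depths.

    The field names are hypothetical and chosen for illustration only.
    """
    root = ET.fromstring(xml_text)
    rows = []

    def visit(elem, parent_id, depth):
        elem_id = len(rows) + 1  # IDs are assigned in document order, starting at 1
        # elem.text is only the text before the first child; mixed content is simplified away
        text = (elem.text or "").strip()
        rows.append({
            "elemid": elem_id,
            "parent": parent_id,  # None marks the root element
            "depth": depth,
            "tag": elem.tag,
            "attributes": dict(elem.attrib),
            "value": text or None,  # whitespace-only text counts as no value
        })
        for child in elem:
            visit(child, elem_id, depth + 1)

    visit(root, None, 1)
    return rows


doc = '<books><book id="1"><title>R</title></book><book id="2"><title>XML</title></book></books>'
rows = flatten_xml(doc)
for r in rows:
    print(r["elemid"], r["parent"], r["depth"], r["tag"], r["value"])
```

Every row is a plain record, yet walking the parent IDs back up reconstructs the original tree, which is the sense in which a flat table can still carry the document's structure.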

Apart from representing the XML document in a dataframe structure, flatxml relates to dataframes in yet another way: the fxml_toDataFrame function can be used to extract data from an XML document into a dataframe, e.g. to analyze the data with statistical functions. Because in this case there is no need to represent the structure of the XML document as such (it's all about the data contained in the document), the result no longer reflects the document's hierarchical structure; it is just a normal dataframe.
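The difference between the two representations can be sketched in Python as well. The hypothetical helper siblings_to_table below is an illustration, not the implementation of fxml_toDataFrame: it treats every element with a given tag as one record, turns its attributes and child elements into columns, and discards the rest of the document structure. That loss of hierarchy is what makes the result an ordinary table.

```python
import xml.etree.ElementTree as ET


def siblings_to_table(xml_text, record_tag):
    """Hypothetical helper: one row per <record_tag> element, its children as columns.

    Only the record elements and their direct children are kept; the rest of the
    document hierarchy is discarded, so the result is a plain table.
    """
    root = ET.fromstring(xml_text)
    table = []
    for record in root.iter(record_tag):
        row = dict(record.attrib)  # attributes of the record become columns, too
        for field in record:
            row[field.tag] = (field.text or "").strip()
        table.append(row)
    return table


doc = """<results>
  <subject id="1"><age>34</age><score>7.5</score></subject>
  <subject id="2"><age>29</age><score>8.1</score></subject>
</results>"""
table = siblings_to_table(doc, "subject")
columns = sorted({key for row in table for key in row})
# Values arrive as strings; convert them before doing statistics.
mean_score = sum(float(row["score"]) for row in table) / len(table)
print(columns, mean_score)
```

The resulting list of dictionaries could be handed to pandas.DataFrame for further analysis. Because all values are kept as strings, numeric columns such as score need an explicit conversion first.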

Each XML element, for example <tag attribute="some value">Here is some text</tag>, has certain characteristics that can be accessed via the flatxml interface functions after an XML document has been imported with fxml_importXMLFlat. These characteristics are:

  • value: The (text) value of the element, "Here is some text" in the example above
  • attributes: The XML attributes of the element; in the example above, the attribute named attribute with the value "some value"
  • children: The elements on the next lower hierarchical level, i.e. the elements directly contained in the current element
  • parent: The element on the next higher hierarchical level, i.e. the element of which the current element is a child
  • siblings: The other elements on the same hierarchical level that share the current element's parent
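Once a document is flat, each of these characteristics becomes a simple lookup on the table. The Python sketch below assumes a hand-written flat table with hypothetical fields (elemid, parent, tag, attributes, value) and hypothetical helper names such as get_value and get_children; it illustrates the idea only and is not flatxml's API. Children are the rows whose parent is the current element, and siblings are the other rows that share the current element's parent.

```python
# Hand-written flat table (hypothetical field names) for the document
# <library><book lang="en">R Programming</book><book lang="de">XML</book></library>
FLAT = [
    {"elemid": 1, "parent": None, "tag": "library", "attributes": {}, "value": None},
    {"elemid": 2, "parent": 1, "tag": "book", "attributes": {"lang": "en"}, "value": "R Programming"},
    {"elemid": 3, "parent": 1, "tag": "book", "attributes": {"lang": "de"}, "value": "XML"},
]


def _row(flat, elemid):
    """Look up the record of one element by its ID."""
    return next(r for r in flat if r["elemid"] == elemid)


def get_value(flat, elemid):
    return _row(flat, elemid)["value"]


def get_attributes(flat, elemid):
    return _row(flat, elemid)["attributes"]


def get_children(flat, elemid):
    # Children: every element whose parent is the current element.
    return [r["elemid"] for r in flat if r["parent"] == elemid]


def get_parent(flat, elemid):
    return _row(flat, elemid)["parent"]  # None for the root element


def get_siblings(flat, elemid):
    # Siblings: other elements with the same parent; the root has none.
    parent = get_parent(flat, elemid)
    if parent is None:
        return []
    return [r["elemid"] for r in flat if r["parent"] == parent and r["elemid"] != elemid]


print(get_value(FLAT, 2), get_attributes(FLAT, 2), get_children(FLAT, 1),
      get_parent(FLAT, 3), get_siblings(FLAT, 2))
```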


Structure of the flatxml interface

The flatxml interface to access these characteristics follows a simple logic: For each of the characteristics there are typically three functions available:

  • fxml_has...(): Determines if the current XML element has (at least one instance of) the characteristic
  • fxml_num...(): Returns the number of instances of the characteristic for the current XML element (e.g. the number of child elements)
  • fxml_get...(): Returns (the IDs of) the respective characteristics of the current XML element (e.g. the children of the current element)
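One way to read the has/num/get pattern is that the other two functions can be derived from get: the number is the length of what get returns, and has is simply a count greater than zero. The Python sketch below shows this for children and attributes on a small hand-written flat table. The function names mirror the fxml_has...(), fxml_num...(), fxml_get...() scheme but are hypothetical stand-ins, not the package's implementation.

```python
# Hand-written flat table (hypothetical field names) for the document
# <library><book lang="en"><title>R</title></book><book/></library>
FLAT = [
    {"elemid": 1, "parent": None, "tag": "library", "attributes": {}},
    {"elemid": 2, "parent": 1, "tag": "book", "attributes": {"lang": "en"}},
    {"elemid": 3, "parent": 2, "tag": "title", "attributes": {}},
    {"elemid": 4, "parent": 1, "tag": "book", "attributes": {}},
]


def get_children(flat, elemid):
    """IDs of the child elements of elemid."""
    return [r["elemid"] for r in flat if r["parent"] == elemid]


def num_children(flat, elemid):
    return len(get_children(flat, elemid))


def has_children(flat, elemid):
    return num_children(flat, elemid) > 0


def get_attributes(flat, elemid):
    """Attribute names and values of elemid."""
    return next(r["attributes"] for r in flat if r["elemid"] == elemid)


def num_attributes(flat, elemid):
    return len(get_attributes(flat, elemid))


def has_attributes(flat, elemid):
    return num_attributes(flat, elemid) > 0


for elemid in (1, 2, 4):
    print(elemid, has_children(FLAT, elemid), num_children(FLAT, elemid), get_children(FLAT, elemid))
```

Deriving has and num from get keeps the three functions consistent with each other, at the cost of building the full list even when only a yes/no answer is needed.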


Learn more

For more information on the flatxml package please go to http://www.zuckarelli.de/flatxml/index.html.
