Tag: ReStructured Text

Converting Python data into a ReStructured Text table

This probably exists but I couldn’t find it. I wanted to export a bunch of data from a Python/Django application into something a non-coder could understand. The data was not going to be a plain CSV, but a document with various tables and explanations of what each table is. Because ReStructured Text seems to be the winning format in the Python world, I decided to go with that.

Generating the text part was easy and straightforward. The question was how to export tables. I decided to represent tables as lists of dicts, and I ended up building this little module:

from collections import defaultdict
from io import StringIO


def dict_to_rst_table(data):
    field_names, column_widths = _get_fields(data)
    with StringIO() as output:
        output.write(_generate_header(field_names, column_widths))
        for row in data:
            output.write(_generate_row(row, field_names, column_widths))
        return output.getvalue()


def _generate_header(field_names, column_widths):
    with StringIO() as output:
        for field_name in field_names:
            output.write(f"+-{'-' * column_widths[field_name]}-")
        output.write("+\n")
        for field_name in field_names:
            output.write(
                f"| {field_name} {' ' * (column_widths[field_name] - len(field_name))}"
            )
        output.write("|\n")
        for field_name in field_names:
            output.write(f"+={'=' * column_widths[field_name]}=")
        output.write("+\n")
        return output.getvalue()


def _generate_row(row, field_names, column_widths):
    with StringIO() as output:
        for field_name in field_names:
            output.write(
                f"| {row[field_name]}{' ' * (column_widths[field_name] - len(str(row[field_name])))} "
            )
        output.write("|\n")
        for field_name in field_names:
            output.write(f"+-{'-' * column_widths[field_name]}-")
        output.write("+\n")
        return output.getvalue()


def _get_fields(data):
    field_names = []
    column_widths = defaultdict(lambda: 0)
    for row in data:
        for field_name in row:
            if field_name not in field_names:
                field_names.append(field_name)
            column_widths[field_name] = max(
                column_widths[field_name], len(field_name), len(str(row[field_name]))
            )
    return field_names, column_widths

It’s straightforward and simple. It currently cannot deal with dicts that have different sets of columns: a row that lacks one of the keys raises a KeyError.
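One possible way to cope with rows that have different keys is to normalize the data before building the table. The sketch below is only an illustration: `normalize_rows` and its `placeholder` parameter are hypothetical names, not part of the module above.

```python
def normalize_rows(data, placeholder=""):
    """Return copies of the rows that all share the same keys.

    Keys are collected in first-seen order; a key missing from a row
    is filled with `placeholder` (an assumption, pick what suits you).
    """
    field_names = []
    for row in data:
        for field_name in row:
            if field_name not in field_names:
                field_names.append(field_name)
    return [
        {field_name: row.get(field_name, placeholder) for field_name in field_names}
        for row in data
    ]
```

The result could then be passed to dict_to_rst_table, since every row now has every key.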

Should this be turned into a reusable library?


Turning a list of dicts into a ReStructured Text table

I recently found myself having to prepare a report of some mortgage calculations so that non-technical domain experts could read it, evaluate it, and tell me whether my math and the way I was using certain APIs were correct.

Since I’m using Python, I decided to go as native as possible and make my little script generate a ReStructured Text file that I would then convert into HTML, PDF, or whatever else was needed. The result of certain calculations ended up looking like a data table expressed as a list of dicts, all with the same keys. I wrote a function that turns that list of dicts into the appropriately formatted ReStructured Text.

For example, given this data:

creators = [{"name": "Guido van Rossum", "language": "Python"}, 
            {"name": "Alan Kay", "language": "Smalltalk"},
            {"name": "John McCarthy", "language": "Lisp"}]

when you call it with:

dict_to_rst_table(creators)

it produces:

+------------------+-----------+
| name             | language  |
+==================+===========+
| Guido van Rossum | Python    |
+------------------+-----------+
| Alan Kay         | Smalltalk |
+------------------+-----------+
| John McCarthy    | Lisp      |
+------------------+-----------+

The full code for this is:

from collections import defaultdict

from io import StringIO


def dict_to_rst_table(data):
    field_names, column_widths = _get_fields(data)
    with StringIO() as output:
        output.write(_generate_header(field_names, column_widths))
        for row in data:
            output.write(_generate_row(row, field_names, column_widths))
        return output.getvalue()


def _generate_header(field_names, column_widths):
    with StringIO() as output:
        for field_name in field_names:
            output.write(f"+-{'-' * column_widths[field_name]}-")
        output.write("+\n")
        for field_name in field_names:
            output.write(f"| {field_name} {' ' * (column_widths[field_name] - len(field_name))}")
        output.write("|\n")
        for field_name in field_names:
            output.write(f"+={'=' * column_widths[field_name]}=")
        output.write("+\n")
        return output.getvalue()


def _generate_row(row, field_names, column_widths):
    with StringIO() as output:
        for field_name in field_names:
            output.write(f"| {row[field_name]}{' ' * (column_widths[field_name] - len(str(row[field_name])))} ")
        output.write("|\n")
        for field_name in field_names:
            output.write(f"+-{'-' * column_widths[field_name]}-")
        output.write("+\n")
        return output.getvalue()


def _get_fields(data):
    field_names = []
    column_widths = defaultdict(lambda: 0)
    for row in data:
        for field_name in row:
            if field_name not in field_names:
                field_names.append(field_name)
            column_widths[field_name] = max(column_widths[field_name], len(field_name), len(str(row[field_name])))
    return field_names, column_widths

Feel free to use it as you see fit, and if you’d like this to be a nicely tested reusable pip package, let me know and I’ll turn it into one. One thing I would need to add is more robustness against malformed data and handling for more cases of data that looks different.
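As a rough idea of what a test for such a package might check, here is a small sketch that verifies a generated table is rectangular and that its column borders line up. The function name `is_well_formed_grid_table` is hypothetical, and the check is deliberately simpler than the full grid table rules in Docutils.

```python
def is_well_formed_grid_table(table_text):
    """Return True if the text looks like a rectangular RST grid table."""
    lines = table_text.splitlines()
    if len(lines) < 3:
        return False
    if not lines[0].startswith("+") or not lines[-1].startswith("+"):
        return False
    width = len(lines[0])
    # Column boundaries are the positions of "+" in the top border.
    boundaries = [i for i, char in enumerate(lines[0]) if char == "+"]
    for line in lines:
        if len(line) != width:
            return False
        if line.startswith("+"):
            # Border line: only "+", "-" or "=", with the same boundaries.
            if set(line) - set("+-="):
                return False
            if [i for i, char in enumerate(line) if char == "+"] != boundaries:
                return False
        elif line.startswith("|"):
            # Content line: a "|" at every column boundary.
            if any(line[i] != "|" for i in boundaries):
                return False
        else:
            return False
    return True
```

A check like this would catch, for example, a cell value that is longer than its computed column width.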

If I turn it into a pip package, it would be released from Eligible, as I wrote this code while working there and we are happy to contribute to open source.

ReStructured Text on WordPress

I am coming from the Plone world. My web site used to be Plone and various websites I still maintain are still Plone-based. In Plone, one of the formats for writing text is ReStructured Text (RST), and Matt has convinced me it is the right way… or at least, one good way. But in the world of WordPress, RST is not a first-class citizen.

Searching for “restructured text wordpress” I found an article titled “Using reStructuredText with WordPress”. That seemed like good news. The author talks about a plug in, but getting it to work required some work on my side, and that is what I am trying to show here.

Disclaimer: poorly written article ahead (and above), take the information, ignore the prose.

First, the plug in is not distributed as a compressed archive (tar.gz, tar.bz2, etc.) like software generally is, but as a TXT file. You have to download rest.php-1.2.txt to the plugins directory of your WordPress install and rename it to “rest.php”.

On the computer where WordPress is installed you should also have Docutils installed, particularly the program rst2html. The plug in has a hard-coded path to reach that program, around line 18:

// Set this to the prefix of your docutils installation.
$prefix = "/usr/local";

// Set this to the path of rst2html.py
$rst2html = "$prefix/bin/rst2html.py";

Since I am running Debian GNU/Linux and installed Docutils from the package, the binary was in /usr/bin/, so I had to change the prefix to:

// Set this to the prefix of your docutils installation.
$prefix = "/usr";

Also, due to various personal preferences, I changed line 116 to:

$execstr = $rst2html . ' --no-generator --no-source-link --rfc-references --no-doc-title --initial-header-level=2 --footnote-references="superscript"';

To learn more about these options, read the man page of rst2html.
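For readers more at home in Python than PHP, the same idea (finding rst2html instead of hard-coding its path, and passing the options above) can be sketched as follows. This is not the plug in's code: the helper names `find_rst2html`, `build_rst2html_command` and `rst_to_html` are hypothetical, and the sketch assumes the executable is on the PATH under one of the two names tried.

```python
import shutil
import subprocess

# The options used in the post, as an argument list instead of one shell string.
RST2HTML_OPTIONS = [
    "--no-generator",
    "--no-source-link",
    "--rfc-references",
    "--no-doc-title",
    "--initial-header-level=2",
    "--footnote-references=superscript",
]


def find_rst2html():
    """Return the first rst2html executable found on PATH, or None."""
    for name in ("rst2html.py", "rst2html"):
        path = shutil.which(name)
        if path is not None:
            return path
    return None


def build_rst2html_command(executable, options=RST2HTML_OPTIONS):
    """Return the full command as a list suitable for subprocess.run."""
    return [executable, *options]


def rst_to_html(rst_text, executable):
    """Pipe RST text through rst2html and return the HTML it prints."""
    command = build_rst2html_command(executable)
    result = subprocess.run(
        command, input=rst_text, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Passing the options as a list avoids the shell quoting needed in the single command string.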

The other important change is that around line 100, there’s some code to replace the “more” HTML links that WordPress inserts with RST links. The problem with that is that the code searches for a particular text (“(more…)”):

$pattern = '\(more...\)';
$replacement = "\n\n`(more...) < \1>`__\n\n";
$text = ereg_replace($pattern, $replacement, $text);

but that text is skin-dependent, so I changed it to:

$pattern = '(.*)';
$replacement = "\n\n`\2 < \1>`__\n\n";
$text = ereg_replace($pattern, $replacement, $text);

which is more generic and will correct all the links, no matter what they say. Now that I take a look at it, the blank lines in the RST version shouldn’t be there. This change had an unfortunate consequence: the text of the a element of an HTML link can contain HTML-specific markup, and in my case it had one HTML entity. The right thing would be to expand all HTML entities into text, but I just did a search and replace for this particular one. An ugly hack indeed:

$pattern = '»';
$replacement = "»";
$text = ereg_replace($pattern, $replacement, $text);
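To illustrate the "right thing" mentioned above, here is a minimal Python sketch (not the plug in's PHP) that turns HTML links into RST links and expands every HTML entity in the link text using the standard library's html.unescape. The regular expression assumes simple `<a ... href="...">...</a>` markup with a double-quoted href, and `html_links_to_rst` is a hypothetical name.

```python
import html
import re

# Assumes a double-quoted href and no nested <a> elements.
LINK_PATTERN = re.compile(r'<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>', re.DOTALL)


def html_links_to_rst(text):
    """Replace each HTML link in text with an anonymous RST hyperlink."""

    def replace(match):
        url = html.unescape(match.group(1))
        # Expand all entities (&raquo;, &amp;, ...) instead of just one.
        label = html.unescape(match.group(2))
        return f"`{label} <{url}>`__"

    return LINK_PATTERN.sub(replace, text)
```

A label that itself contains backticks or angle brackets would still need escaping, which this sketch does not attempt.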

And at last, the jewel. If you don’t write -*- mode: rst -*- somewhere in the article, it won’t be treated as RST; it will be treated as the built-in simple WordPress markup language, whatever that is called. You can prefix it with two dots and a space (“.. ”) at the beginning of the line to turn it into an RST comment, so the marker itself doesn’t appear in the final article.
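How the plug in actually looks for the marker is not shown here, so the following is only a guess at the idea, written in Python: a post counts as RST if the Emacs-style mode line appears anywhere in it. The name `is_rst_post` and the tolerance for extra spaces are assumptions.

```python
import re

# Matches "-*- mode: rst -*-", allowing flexible spacing (an assumption).
MODE_LINE = re.compile(r"-\*-\s*mode:\s*rst\s*-\*-")


def is_rst_post(text):
    """Return True if the post carries the RST mode line anywhere."""
    return MODE_LINE.search(text) is not None
```

Because the search is not anchored, the marker is also found when it is written as an RST comment, that is, prefixed with two dots and a space.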

You’ll see that there are many nasty things here, lots of hacks, etc. The right thing would have been to fix the bugs, make a customization page to set the path and the RST options, and make the solution more generic. Then pack it and release it. I am not going to do that; enough time has been spent writing it down here, and now I am back to my main project. This is related to what I said in my previous post about worrying less about the perfection of my web site and working more on my personal projects.

Enjoy!

Comments in the original post

  1. Matt Dorn Says:

    Finally caught up with your new site and updated my RSS feeds–looks nice! But I like that I can now read your entire articles within my RSS reader instead of just the introduction.

    I agree that Plone is not ideal for a personal site, though I think I’ll be keeping mine for a while. I would like to get comments activated without hosing my Apache caching setup–that seems to be the main challenge, because running a Plone site without caching is not advisable.

    Re. our shared enthusiasm for Restructured Text, I just had another occasion to appreciate it. You can use it in Tracs. I also got Tracs working with Darcs, which I know you use as well. Have you thought about making your software projects available via Tracs projects? If so, the writeup I just did might help you:

    http://mattdorn.com/content/trac-darcs-restructuredtext

    February 1st, 2007 at 13:01

  2. Pupeno Says:

    Hello Matt,

    Nice to see you active in the community. When I was re-structuring my web site I’ve decided to use Trac for my projects but after a quick try I gave up. A quick try to use it with Darcs (or nothing at all) that is.

    Now that you have written a tutorial about it I am going to read it and give it a nice try. Isn’t it nice that I helped you start with darcs and now you help me start with trac plus darcs? That’s karma!

    February 1st, 2007 at 20:40

  3. Ivan Says:

    Dear Pupeno,

    I’ve just installed this plugin to a blog on my local machine (not the blog above). It doesn’t quite work. If I could get it working it would be an absolute godsend. Please help.

    When I add the mode line to a post, wordpress then ignores the entire body of the post – I get the title and nothing else. Any idea what might be wrong?

    I have no knowledge of php at all, my main language is Python. Would it be possible to write plugins in Python?

    My blog is fairly new. I’m enjoying using wordpress, but the rest of the time I more or less live in emacs and use rest for most of my documentation. If I could write my blog posts in rest I should be very happy. Would be prepared to put some work into this to get it working.

    Best wishes

    Ivan

    June 15th, 2007 at 12:52

  4. Pupeno Says:

    Hello Ivan,

    Be sure that the plug in is activated. Otherwise, contact the plug in author.

    I am not sure about programming plug ins in Python but I doubt it’ll be possible, you should check in WordPress’ site and mailing lists.

    I am only a user of WordPress, my knowledge of its internals is null, but if you ask in WordPress’ mailing list or IRC channels, I am sure you’ll get a lot of help.

    June 16th, 2007 at 13:53

  5. you know something? » Blog Archive » Happy! ReST Says:

    […] If I had thought about it calmly I would have figured it out right away, but the reason it did not work before was that I had not installed rst2html.py beforehand and rewritten the path inside rest.php. […]

    September 7th, 2007 at 2:41

  6. ROTR » reStructuredText for WordPress is now a Launchpad project Says:

    […] Pupeno […]

    September 9th, 2007 at 5:30