structured text revisited [Dec. 13th, 2002|12:07 pm]
LogJam
logjam
[evan]

An elaboration on the hyphen question:

There are something like six different characters which look like a horizontal line in the Unicode standard, but we're only concerned with a few here:
  • The hyphen-minus (U+002D), found on your keyboard, but more or less deprecated because it's ambiguous.
  • The (Unicode) hyphen (U+2010), used to join words.
  • The en dash (U+2013), which is discussed here (eru = my hero).
  • The em dash (U+2014), discussed previously.
  • And finally, the minus sign (U+2212, which is its own character now).

So we map --- to the em dash, -- to the en dash (as is done in TeX) but now we're left with - meaning either hyphen or minus sign. TeX handles this because it has a math mode: when you're typing math, - means minus sign, otherwise it's hyphen. But how should LogJam do it?
(I also see that the Textism one (discussed below) uses one hyphen surrounded by spaces for an en dash and two for an em dash... is that better?)
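A minimal sketch of the TeX-style mapping, in Python purely for illustration (this is not LogJam's code, and the name `tex_dashes` is made up). The only subtlety is ordering: the longer run has to be replaced first.

```python
EM_DASH = "\u2014"
EN_DASH = "\u2013"


def tex_dashes(text: str) -> str:
    """Map '---' to an em dash and '--' to an en dash, TeX style."""
    # Replace the longest run first; otherwise '---' would turn into
    # an en dash followed by a stray '-'.
    return text.replace("---", EM_DASH).replace("--", EN_DASH)


print(tex_dashes("pages 8--12"))   # en dash between the numbers
print(tex_dashes("wait---what"))   # em dash between the words
print(tex_dashes("well-known"))    # single '-' is left alone
```

A single "-" passes through untouched, which is exactly the open hyphen-or-minus question. Runs of four or more are not handled specially in this sketch.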


Some more questions:
How should we do blockquote? Nested lists? The original StructuredText does it by indenting the paragraphs, but that will probably be cumbersome for this editing interface. I'd like to avoid explicit markup (like writing "bq" somewhere, as I've seen others do it)...
The problem with both of these is that the formatting applies across paragraphs, so simply including a character at the beginning won't do (because the next paragraph will fall out of the formatting). So what would represent these concepts to you? >>blockquoted text<< ?



And I visited colin's journal because he's also into fancy-looking text, and aside from noticing the ellipses (which I'll stick into LogJam, too) I found a link to this via typographica. Funny coincidence, eh?


See also Restructured Text, which appears to be the successor to Structured Text, and is full of good ideas. But there, they use `` for monospaced text. Arrgh! So many different ways to do it.

Comments:
From: revjim
2002-12-13 12:53 pm (UTC)
The minus vs. hyphen for "-" is a difficult choice. I think, as far as text goes, the hyphen is more likely to be used than the minus, but that really isn't reason enough to make a user have to take special steps to create a minus sign. I think, since hyphens should always(?) be touching a word, if there is a space on both sides of the symbol, make it a minus; otherwise, it is a hyphen.
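As a rough illustration of that rule (a Python sketch, not anything LogJam does; `resolve_single_dash` is a made-up name): a lone "-" with whitespace on both sides becomes a minus sign, any other lone "-" becomes a hyphen, and runs of two or more are left for the dash rules.

```python
import re

HYPHEN = "\u2010"
MINUS = "\u2212"


def resolve_single_dash(text: str) -> str:
    """Turn a lone '-' into a minus sign or a hyphen based on spacing."""
    # Whitespace on both sides: treat it as a minus sign.
    text = re.sub(r"(?<=\s)-(?=\s)", MINUS, text)
    # Any remaining '-' that is not part of '--' or '---': treat as hyphen.
    text = re.sub(r"(?<!-)-(?!-)", HYPHEN, text)
    return text


print(resolve_single_dash("5 - 3"))       # minus sign
print(resolve_single_dash("well-known"))  # hyphen
```

One visible weakness: a negative number such as "-5" has no space after the symbol, so this sketch would give it a hyphen rather than a minus sign.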

If you are trying to make an all-inclusive system, I suggest the following, as well:


======Heading Size 1======
=====Heading Size 2=====
====Heading Size 3====
===Heading Size 4===
==Heading Size 5==
(or something similar)
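A sketch of how such lines could be recognised, in Python for illustration only (the function name and the choice of `<h1>` to `<h5>` as output are assumptions). With six "=" for size 1 and two for size 5, the level works out to 7 minus the number of "=" signs.

```python
import re

# Two to six '=' on each side; the backreference forces both sides to match.
HEADING = re.compile(r"^(={2,6})(.+?)\1$")


def heading_to_html(line: str) -> str:
    """Convert '===Title===' style lines to <hN>; return other lines unchanged."""
    match = HEADING.match(line.strip())
    if match is None:
        return line
    level = 7 - len(match.group(1))  # '======' -> 1, '==' -> 5
    return f"<h{level}>{match.group(2).strip()}</h{level}>"


print(heading_to_html("======Heading Size 1======"))  # <h1>Heading Size 1</h1>
print(heading_to_html("==Heading Size 5=="))          # <h5>Heading Size 5</h5>
```

The heading text is not escaped here; a real converter would need to do that as well.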


I like this syntax for blockquote:

>>blockquoted text here<<

however, this works just as well:

]]blockquoted text here[[

I, like you, agree that using 'bq' is NOT a good method.
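For what it's worth, paired markers make the multi-paragraph case easy to handle. A minimal Python sketch, assuming the markers are never nested and that HTML escaping is dealt with elsewhere (`blockquotes_to_html` is a made-up name):

```python
import re

# Non-greedy match so that two separate quotes are not merged into one.
BLOCKQUOTE = re.compile(r">>(.*?)<<", re.DOTALL)


def blockquotes_to_html(text: str) -> str:
    """Wrap >>...<< spans in <blockquote>, one <p> per inner paragraph."""

    def wrap(match: re.Match) -> str:
        # Blank lines inside the markers separate paragraphs.
        parts = re.split(r"\n\s*\n", match.group(1))
        inner = "".join(f"<p>{p.strip()}</p>" for p in parts if p.strip())
        return f"<blockquote>{inner}</blockquote>"

    return BLOCKQUOTE.sub(wrap, text)


print(blockquotes_to_html(">>one\n\ntwo<<"))
# <blockquote><p>one</p><p>two</p></blockquote>
```

Swapping in ]]...[[ would only change the pattern, not the approach.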


I prefer /this for/ emphasized text, but your *method* works fine too.


"----" should mean a horizontal rule.


I also suggest using a system similar to my suggestion for the hyphen when it comes to dashes. Currently, most people denote an em dash using the "space minus minus space" construct. Therefore, I suggest allowing that construct to continue. Since an en dash is used to denote "through" I suggest that two minus signs surrounded by NON-space characters be rendered as an en dash, while two minus signs surrounded by space characters be rendered as an em dash.
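In code, that spacing rule might look something like the following Python sketch (illustrative only; `spaced_dashes` is a made-up name, and the surrounding spaces are kept in the output as an assumption):

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"


def spaced_dashes(text: str) -> str:
    """'--' between spaces becomes an em dash, '--' between non-spaces an en dash."""
    # " -- " with whitespace on both sides: em dash (spaces are kept).
    text = re.sub(r"(?<=\s)--(?=\s)", EM_DASH, text)
    # "8--12" with non-space, non-dash characters on both sides: en dash.
    text = re.sub(r"(?<=[^\s-])--(?=[^\s-])", EN_DASH, text)
    return text


print(spaced_dashes("pages 8--12"))   # en dash
print(spaced_dashes("wait -- what"))  # em dash
```

Longer runs such as "---" or "----" are deliberately left alone, so a horizontal-rule rule could still claim them.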


I think it is very important that the typographical substitutions look normal and readable so that, when a pure ASCII representation of text is needed, no conversion is needed.


Also, as far as the typographical substitutions go, if at all possible, please implement a method to leave the formatting options on (bold, block quote, etc) while turning the quotes, dashes, and ellipses off. The reason behind this is that many people, myself included, prefer not to use proper typographical characters for one reason: they don't cut and paste well at all.


Unlike the author of Textile, when using Structured Text, I don't like to have to type any HTML at all. This way, I can write format converters that will convert it into anything, including plain text. Therefore, I suggest automatically quoting <, >, ", and &.
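Quoting those four characters is a small job. A Python sketch for illustration (`quote_html` is a made-up name, not part of any existing tool):

```python
# '&' must come first; otherwise the '&' in '&lt;' would be escaped again.
REPLACEMENTS = (
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
)


def quote_html(text: str) -> str:
    """Escape the HTML metacharacters <, >, " and & in plain text."""
    for char, entity in REPLACEMENTS:
        text = text.replace(char, entity)
    return text


print(quote_html('<b> & "x"'))  # &lt;b&gt; &amp; &quot;x&quot;
```

Python's standard library offers `html.escape` for the same purpose; the explicit table is used here only to make the ordering visible.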


I'm sure I could ramble on about this for hours.

From: evan
2002-12-13 01:22 pm (UTC)
[headings]
Do you ever use headings on LJ? Does anyone?

[blockquotes]
Any arguments for or against square brackets versus angle? Any prior system to emulate here?

I also suggest using a system similar to my suggestion for the hyphen when it comes to dashes. Currently, most people denote an em dash using the "space minus minus space" construct. Therefore, I suggest allowing that construct to continue. Since an en dash is used to denote "through" I suggest that two minus signs surrounded by NON-space characters be rendered as an en dash, while two minus signs surrounded by space characters be rendered as an em dash.

Actually, most things I read don't use spaces around them at all. If I hadn't just read your comment and was trying to make my own heuristic, I would've assumed that "space minus minus space" is definitely *not* an em dash.

I suggest automatically quoting [html metacharacters]
Done already. See the README linked in the other post. :)

From: revjim
2002-12-13 01:08 pm (UTC)
Oh yes... one more thing...

in order to preserve the highest possible amount of readability, and in order to prevent people from having to type abnormally as much as possible, I suggest a different approach for the quotes.

I suggest "things in double quotes" to use standard double quotes on both sides and for 'things in single quotes' to do the same. Then, use a nice hefty regular expression to decide which set should go which way. Additionally, if a single quote is encountered within a word (like I'm) it should deduce the proper encoding for that as well. This way I don't have to have ``silly looking text'' when viewing it without the encoding.
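A sketch of one possible heuristic, in Python and using a character scan rather than a regular expression (the name `curl_quotes` and the exact rule are assumptions): a straight quote opens if it starts the text or follows whitespace, an opening bracket, or another opening quote; otherwise it closes.

```python
OPENERS = {'"': "\u201c", "'": "\u2018"}
CLOSERS = {'"': "\u201d", "'": "\u2019"}


def curl_quotes(text: str) -> str:
    """Replace straight quotes with curly ones based on the preceding character."""
    out = []
    for ch in text:
        if ch in OPENERS:
            prev = out[-1] if out else " "
            # Opening position: start of text, after whitespace, after an
            # opening bracket, or directly after another opening quote.
            opens = prev.isspace() or prev in "([{" or prev in OPENERS.values()
            out.append(OPENERS[ch] if opens else CLOSERS[ch])
        else:
            out.append(ch)
    return "".join(out)


print(curl_quotes("\"things in double quotes\" and 'single quotes'"))
print(curl_quotes("I'm"))
```

An apostrophe inside a word comes out as a right single quote, which is the usual character for it. The rule is still only a guess: a leading apostrophe, as in 'tis, would be turned into an opening quote.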

From: evan
2002-12-13 01:31 pm (UTC)
What if someone writes, "He said, 'I'm goin' to the store.'"?
How do I know that the apostrophes don't close the quotes? An autoclosing system will get confused.

nice hefty regular expression
Tee hee. If only I were using an expressive programming language. :)

silly looking text
Some people don't seem to mind.

I'd like to use " and ' for HTML links, in the way Structured Text does it.

I see your point about cut'n'paste, though. I suppose a "Don't use Unicode characters in StructuredText" option would be sufficient?
The real solution is for us to actually type the proper quote characters when we want them, but our current keyboard format will probably stick with us for a while.

From: decklin
2002-12-13 04:42 pm (UTC)
I suggest "things in double quotes" to use standard double quotes on both sides and for 'things in single quotes' to do the same.

I think the TeX way of doing things is a pretty well-entrenched convention. I'm already used to it, and the alternative is far too DWIMish for me to be comfortable switching to structured. (Currently I'm a bit of an oddball in that I use preformat on every single post...[1][2])

[1] Evan: the main reason for this is that I'm very picky about making sure all text in my lastn view is enclosed in a block element, e.g., <p>. Current server-side autoformatting just adds <br>s. Have you dealt with this? It seems a lot of system styles assume the latter and include the <p> themselves, which I feel is very wrong...

[2] Which brings up another question: how do I use an HTML element which the structured markup does not provide for, like <strike>? HTML metachars are escaped, so... (sorry for being too lazy to make another comment in the appropriate place for this.)

From: evan
2002-12-14 01:17 pm (UTC)
Do you mean your entire post is enclosed within one <p>? I assume not... with structured text your post ends up as one long line (single newlines become spaces, while multiple newlines separate paragraphs which are properly tagged with both opening and closing <p> tags).
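Roughly, in Python (a sketch of the behaviour described, not LogJam's actual code; `paragraphs_to_html` is a made-up name):

```python
import re


def paragraphs_to_html(text: str) -> str:
    """Blank lines separate paragraphs; each one gets both <p> and </p>."""
    blocks = re.split(r"\n\s*\n", text.strip())
    # Inside a paragraph, single newlines (and any other whitespace runs)
    # collapse to one space, so the whole post ends up on one long line.
    return "".join(
        "<p>" + " ".join(block.split()) + "</p>"
        for block in blocks
        if block.strip()
    )


print(paragraphs_to_html("first line\nsame paragraph\n\nsecond paragraph"))
# <p>first line same paragraph</p><p>second paragraph</p>
```

Blockquotes, lists and other special blocks are ignored in this sketch.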

From: decklin
2002-12-14 02:57 pm (UTC)
No, multiple <p>s.

However, if you have no idea what HTML even is, and you don't touch any formatting options, you get a whole bunch of text with some <br>s in it, where there should be block elements. it works, but it's bad HTML. I think some styles even assume this and include, e.g., <p>%%event%%. (yuck.)

I assume by "properly tagged" you mean that it's not just going to be two newlines -> </p><p>, but that a non-special block at the beginning of an entry is marked up with a <p> and that the last non-special block gets a closing </p>. (and that transitions elsewhere from/to blockquote or whatever to/from unadorned blocks are handled correctly, etc. this is mostly just a special case of that.)

From: colin
2002-12-29 04:54 pm (UTC)

Markup the planet semantically

The style I’m currently working on has an insanely deep stylesheet. Hopefully I can promote well-structured HTML to the masses.

Tangential gripe: Why isn’t <blockquote> valid within <p>? To me, it makes perfect sense that a blockquote be embedded within a paragraph, and it is common practice in writing. Arrghh.

From: evan
2002-12-29 05:13 pm (UTC)

Re: Markup the planet semantically

That blockquote thing confuses me too. :(

From: colin
2002-12-30 12:56 pm (UTC)

Re: Markup the planet semantically

Ahh, it looks like XHTML 2.0 will remedy this. The content model for p (attributes: Common) is (PCDATA | Inline | List | blockquote | pre | table)*. Whew.

From: colin
2002-12-30 12:57 pm (UTC)

Re: Markup the planet semantically

In comparison with earlier versions of HTML, where a paragraph could only contain inline text, XHTML2's paragraphs represent the conceptual idea of a paragraph, and so may contain lists, blockquotes, pre's and tables as well as inline text. They may not, however, contain directly nested p elements.

From: colin
2002-12-29 04:48 pm (UTC)
I prefer the ---/-- for em/en dashes. I agree, though, that --/ - is more typing-like and is how Word behaves (for convention’s sake). My major complaint with Textile (and I expressed this to Dean) is that there is no way to get tight-set en dashes. If I want to write, “pages 8–12”, I have to enter the en dash manually, and that’s counterintuitive. It’s a common enough construct that there should be some way to create it easily.