LogJam

[Nov. 10th, 2001|09:05 pm]
LogJam
logjam
[evan]
I've understood the premises of XML for a long time, but I've always been confused about the use of attributes and element content.

For example, why would I choose one of these formats over the other:

<user>
	<username>test</username>
	<password>test</password>
	<usefastserver />
</user>
versus
<user usefastserver="1" password="test" name="test" />
Can anyone explain it, or why I should use some option in between?
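Both forms carry the same three values. What differs is how a program gets at them. Here is a minimal sketch of reading each one back. It uses Python's standard-library `xml.etree.ElementTree` purely for illustration (LogJam itself is not written in Python), and the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET

ELEMENT_FORM = """<user>
    <username>test</username>
    <password>test</password>
    <usefastserver />
</user>"""

ATTRIBUTE_FORM = '<user usefastserver="1" password="test" name="test" />'


def read_element_form(text):
    """Values are element text; the flag is true if <usefastserver/> exists."""
    user = ET.fromstring(text)
    return {
        "name": user.findtext("username"),
        "password": user.findtext("password"),
        "usefastserver": user.find("usefastserver") is not None,
    }


def read_attribute_form(text):
    """Everything lives on the single <user> element as string attributes."""
    user = ET.fromstring(text)
    return {
        "name": user.get("name"),
        "password": user.get("password"),
        "usefastserver": user.get("usefastserver") == "1",
    }


print(read_element_form(ELEMENT_FORM))
print(read_attribute_form(ATTRIBUTE_FORM))
```

The main practical difference shows up in the flag. In the element form, its presence is the value. The attribute form needs an agreed convention such as `"1"`.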

Comments:
From: compwiz
2001-11-10 09:14 pm (UTC)
speaking of XML, are you planning to implement some kind of journal-export feature to XML or some other format sometime in the future in LogJam?
From: evan
2001-11-10 09:20 pm (UTC)
That was a goal, yes.
From: piman
2001-11-10 09:22 pm (UTC)
There is, in fact, no real reason to do one over the other in simple situations. I prefer elements, because they're easier to parse from scripts. Other people prefer attributes because they look cleaner.

By making something an attribute, however, you're making a statement that "this property will never contain subproperties." For things like fastserver, name, and password, that's probably okay. But think about what might possibly need subproperties at some point in the future when deciding between attributes and elements.
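To make the subproperty point concrete, suppose a server setting later needs both a host and a port. This is a hypothetical extension, and the `<server>`, `<host>` and `<port>` names are invented for illustration. As elements, the new structure nests naturally. As an attribute, both values must be packed into one string with an ad-hoc encoding that every reader has to know about. A minimal Python sketch:

```python
import xml.etree.ElementTree as ET

# Hypothetical future version: the server setting has grown sub-properties.
NESTED = """<user name="test">
  <server>
    <host>fast.example.com</host>
    <port>8080</port>
  </server>
</user>"""

# Attribute-only equivalent: both values packed into one "host:port" string.
PACKED = '<user name="test" server="fast.example.com:8080" />'


def server_from_elements(text):
    """Each sub-property is its own element, so no custom parsing is needed."""
    server = ET.fromstring(text).find("server")
    return server.findtext("host"), int(server.findtext("port"))


def server_from_attribute(text):
    """The reader must know and re-implement the ad-hoc packing convention."""
    host, _, port = ET.fromstring(text).get("server").rpartition(":")
    return host, int(port)


print(server_from_elements(NESTED))
print(server_from_attribute(PACKED))
```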
From: evan
2001-11-10 09:29 pm (UTC)
OK, now that you've revealed your knowledge, time for question two:

When generating these files, should I generate a DOM tree and use libxml to output it? Or is just writing the files myself OK?

I'm trying to decide between DOM and SAX, basically...
From: visions
2001-11-10 09:33 pm (UTC)
I typically generate trees and output the tree... but you can easily write the file yourself.
From: piman
2001-11-10 09:35 pm (UTC)
I generally use a DOM and output it with libxml(2). It means less worrying about escaping characters.
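The escaping point is easy to demonstrate. This minimal sketch uses Python's `xml.etree.ElementTree` as a stand-in for libxml2. The idea carries over, but the API does not. Naive string formatting produces a document that a parser rejects, while building a tree and serializing it escapes the text automatically:

```python
import xml.etree.ElementTree as ET

password = 'a<b & "c"'

# Naive string formatting: '<' and '&' go out unescaped, so the result
# is not well-formed XML and a parser rejects it.
naive = "<user><password>%s</password></user>" % password
try:
    ET.fromstring(naive)
    naive_is_well_formed = True
except ET.ParseError:
    naive_is_well_formed = False

# Building a tree and serializing it: the serializer escapes the text,
# and parsing the output gives back the original string.
user = ET.Element("user")
ET.SubElement(user, "password").text = password
built = ET.tostring(user, encoding="unicode")

print(naive_is_well_formed)  # False
print(built)
print(ET.fromstring(built).findtext("password") == password)  # True
```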
From: jope
2001-11-11 01:50 am (UTC)
I'm not sure the SAX option even applies, except for reading in data already in XML format for processing, as opposed to adding XML formatting to non-XML data. That said, be careful building an entire DOM tree and only outputting it at the very end: if the tree gets too large, you could be hurting. I don't know libxml well enough to know whether it allows you to build and output your sub-trees iteratively (it probably does), which would probably be enough to avoid this problem.
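Here is one way to emit sub-trees iteratively, sketched in Python with `xml.etree.ElementTree` rather than libxml. The root tags are written by hand. Each entry is built as a small tree, serialized, and then dropped, so memory use is bounded by one entry rather than the whole document. The `<journal>`/`<entry>` layout and the `write_journal` function are hypothetical:

```python
import io
import xml.etree.ElementTree as ET


def write_journal(entries, out):
    """Write <journal> by hand and serialize one <entry> sub-tree at a time.

    'entries' can be any iterable (e.g. a generator reading from disk), so
    the full document never has to exist in memory at once.
    """
    out.write("<journal>")
    for subject, body in entries:
        entry = ET.Element("entry")
        ET.SubElement(entry, "subject").text = subject
        ET.SubElement(entry, "body").text = body
        # Serialize this sub-tree now; it becomes garbage once 'entry'
        # is rebound on the next iteration.
        out.write(ET.tostring(entry, encoding="unicode"))
    out.write("</journal>")


buf = io.StringIO()
write_journal([("Hello", "first post"), ("XML", "a < b")], buf)
print(buf.getvalue())
```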
From: alchemist
2001-11-11 04:44 am (UTC)
Actually, SAX combined with TRaX (or however it's supposed to be capitalized) works wonders for generating XML. That's what I'm using in my current project.

Of course, I do Java, and the libs might not be ported everywhere else yet.
From: alchemist
2001-11-11 04:46 am (UTC)
For generating XML, using a SAX transformer makes life so much easier compared to using a DOM tree. I've done both in the past week, for the first time, and I found SAX to be tremendously more programmer-friendly.

HTH, YMMV, etc, etc
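alchemist's approach uses Java's SAX and TrAX (`javax.xml.transform`) APIs, which are not shown here. As a rough analogue only, Python's standard library has `xml.sax.saxutils.XMLGenerator`. It is a SAX content handler that writes XML as events arrive. This sketch drives it by hand to emit the element form of the user record (the `write_user` function is hypothetical):

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_user(out, name, password, fast):
    """Emit the element form of a user record as a stream of SAX events.

    XMLGenerator escapes character data itself, and nothing is buffered
    as a tree: each event is written as soon as it is generated.
    """
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("user", {})
    for tag, value in (("username", name), ("password", password)):
        gen.startElement(tag, {})
        gen.characters(value)
        gen.endElement(tag)
    if fast:
        # An element with no content stands in for the boolean flag.
        gen.startElement("usefastserver", {})
        gen.endElement("usefastserver")
    gen.endElement("user")
    gen.endDocument()


buf = io.StringIO()
write_user(buf, "test", "test", True)
print(buf.getvalue())
```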
From: avva
2001-11-10 09:41 pm (UTC)
1. The first version allows future nesting: maybe your values will have nested elements in their own right.
2. The first version allows more flexible and simple processing in the future. It's easier to standardise tag names than attribute names across the document. Doing something like "walk the file, change all <username> into <name>" is easier than parsing and replacing attributes.
3. You're more restricted in what you can put into attribute names/values. See the spec for details; I don't remember them.
4. Hmm... DTDs are easier to write for the first form? I'm not sure.

I'm personally biased towards the first form, but not too strongly.
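avva's second point, renaming every `<username>` to `<name>`, is a short tree walk when the data lives in elements. Here is a minimal sketch with Python's `xml.etree.ElementTree`, applied to a hypothetical `<users>` file in the element form:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<users>"
    "<user><username>a</username><password>x</password></user>"
    "<user><username>b</username><password>y</password></user>"
    "</users>"
)

# Collect every <username> in the document first, then rename each one.
for elem in list(doc.iter("username")):
    elem.tag = "name"

print(ET.tostring(doc, encoding="unicode"))
```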
From: tribelessnomad
2001-11-10 09:52 pm (UTC)

Attributes vs. elements

I think that should be:

3. You have more control over what can be put into attribute names/values.
From: tribelessnomad
2001-11-10 10:14 pm (UTC)

Attributes vs. elements

(Sorry, hit ENTER too soon.)

I think that should be:

3. You have more control over what can be put into attribute values.

As for 4, a DTD that doesn't define attributes is simpler, but I can't think when it would matter. The slight additional complexity when defining attributes is well worth it if you might want to restrict the attribute's type later.

Here are a few more points:
  • If you design your document to have a simple 3-level (document/record/field) structure with no attributes, you give users some additional -- and extremely simple -- processing options.
  • If a document will be hand-coded or file size should be kept down, attributes are often preferable.
  • If XML is viewed as HTML (e.g., when it's embedded as a data island), attribute values are hidden but element content is displayed as unformatted text. This has sometimes been put forward as a guiding principle in determining whether attributes or child elements should be used, but in practice you can rarely make the unformatted text look right anyway, so the other considerations usually outweigh this one.
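To illustrate the first bullet: with a plain document/record/field layout and no attributes, a generic tool can flatten the file into rows without knowing the schema. Here is a minimal Python sketch; the `<users>` document and the CSV output are assumptions for illustration:

```python
import csv
import io
import xml.etree.ElementTree as ET

DOC = """<users>
  <user><username>alice</username><password>x</password></user>
  <user><username>bob</username><password>y</password></user>
</users>"""


def records_to_csv(text):
    """Treat the root's children as records and their children as fields.

    Field names are taken from the first record's tags, so this assumes
    every record has the same fields (and at least one record exists).
    """
    records = list(ET.fromstring(text))
    fields = [field.tag for field in records[0]]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for record in records:
        writer.writerow([record.findtext(name, "") for name in fields])
    return out.getvalue()


print(records_to_csv(DOC))
```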

From: hober
2001-11-10 09:41 pm (UTC)
From: android5
2001-11-11 03:19 am (UTC)

One point not mentioned yet

The first form creates a tree of nodes instead of a single node. In the first form, you can have multiple instances of the same child element under <user> (which doesn't make sense in this situation).
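Any conforming parser can confirm this. Repeating a child element is well-formed, so nothing stops a second `<username>`. Repeating an attribute on one element, by contrast, violates XML's well-formedness rules and must be rejected. Here is a minimal sketch with Python's `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# Two <username> children: well-formed, so the parser accepts it and the
# application has to decide what a second username means.
repeated_elements = "<user><username>a</username><username>b</username></user>"
names = [e.text for e in ET.fromstring(repeated_elements).findall("username")]

# Two name attributes on one element: not well-formed, so it is rejected.
repeated_attributes = '<user name="a" name="b" />'
try:
    ET.fromstring(repeated_attributes)
    duplicate_attribute_rejected = False
except ET.ParseError:
    duplicate_attribute_rejected = True

print(names, duplicate_attribute_rejected)  # ['a', 'b'] True
```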
From: cmm
2001-11-11 11:41 am (UTC)
> I've understood the premises of XML for a long time, but I've always been confused about the use of attributes and element content.
[...]
Can anyone explain it


why, sure. the people who brought us XML and related acronym-heavy stuff are basically bozos.

attributes are entirely superfluous. if you are free to define your own XML-based data format (and I understand that you are, here), simply don't use them. you'll have one headache less.

hth.
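If you do drop attributes, existing attribute-style data converts mechanically. Here is a minimal Python sketch. The rule used (each attribute becomes a child element holding its value as text) is an assumption, and it turns the flag into `<usefastserver>1</usefastserver>` rather than an empty element:

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(elem):
    """Recursively move each attribute of elem (and its descendants) into a
    child element of the same name, placed before any existing children.

    Attribute order carries no meaning in XML, so names are sorted to make
    the output deterministic.
    """
    for child in list(elem):
        attributes_to_elements(child)
    for index, (name, value) in enumerate(sorted(elem.attrib.items())):
        new_child = ET.Element(name)
        new_child.text = value
        elem.insert(index, new_child)
    elem.attrib.clear()
    return elem


user = ET.fromstring('<user usefastserver="1" password="test" name="test" />')
print(ET.tostring(attributes_to_elements(user), encoding="unicode"))
```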
From: jnala
2001-11-12 09:16 pm (UTC)

Advantages of attributes

Others have explained the advantages of elements. One downside of elements is that, unlike attributes, they can't be given default values in a DTD. Another is that if you have several child key/value pairs, you're forced to either impose an arbitrary order on them, list every possible order in the DTD, or accept documents with bogus repeated subelements.

It feels unnatural to me to index content based on CDATA grandchild nodes, so I tend to put keys (in the RDBMS sense, a combination of children which uniquely specify the element) in attributes. I'm not sure if this prejudice has any basis in technical reality.

In your example case, I would definitely make username an attribute, probably make password an attribute, and definitely not make usefastserver an attribute.
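Here is a sketch of that layout in Python with `xml.etree.ElementTree`: `username` and `password` as attributes, and `usefastserver` as an empty child element. With the key in an attribute, a lookup can match on the `<user>` element itself instead of a grandchild text node. The `<users>` wrapper and the `find_user` helper are hypothetical:

```python
import xml.etree.ElementTree as ET

DOC = """<users>
  <user username="alice" password="x"><usefastserver /></user>
  <user username="bob" password="y" />
</users>"""

users = ET.fromstring(DOC)


def find_user(name):
    """Look a user up by its key attribute using an ElementPath predicate.

    Assumes 'name' contains no single quote, since it is spliced into the
    path expression as-is. Returns None if no user matches.
    """
    return users.find(f"user[@username='{name}']")


bob = find_user("bob")
print(bob.get("password"), bob.find("usefastserver") is not None)  # y False
```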

If you're hand-reading and hand-writing the files yourself, it's easier to avoid attributes entirely, but then why use XML?

Oh, BTW, I highly recommend "XML in a Nutshell" for excellent concise practical discussions of issues like this.
From: mudpoet
2001-11-15 01:34 am (UTC)
IBM answers both of your questions here:
http://www-106.ibm.com/developerworks/education/xmlintro/xmlintro.html

I highly recommend reading through the entire thing. It's not long at all - it helped me a lot.
From: mudpoet
2001-11-15 01:59 am (UTC)
I just realized that I posted the wrong URL. Forget that one. This one explains elements vs. attributes:
http://www.intelligenteai.com/XMLRepository/elements_versus_attributes.htm

This one, while really a SAX article, does compare SAX vs. DOM a bit:
http://www-106.ibm.com/developerworks/xml/library/x-saxapi/index.html
From: evan
2001-11-15 09:28 am (UTC)
Oh, good, I was looking at the first link you posted and wondering what I was missing... :)