1 # remark-parse [![Build Status][build-badge]][build-status] [![Coverage Status][coverage-badge]][coverage-status] [![Chat][chat-badge]][chat]
[Parser][] for [**unified**][unified]. Parses markdown to an
[**MDAST**][mdast] syntax tree. Used in the [**remark**
processor][processor]. Can be [extended][extend] to change how
markdown is parsed.
```sh
npm install remark-parse
```
```js
var unified = require('unified');
var createStream = require('unified-stream');
var markdown = require('remark-parse');
var html = require('remark-html');

var processor = unified()
  .use(markdown, {commonmark: true})
  .use(html);

process.stdin
  .pipe(createStream(processor))
  .pipe(process.stdout);
```
* [processor.use(parse\[, options\])](#processoruseparse-options)
* [parse.Parser](#parseparser)
* [Extending the Parser](#extending-the-parser)
  * [Parser#blockTokenizers](#parserblocktokenizers)
  * [Parser#blockMethods](#parserblockmethods)
  * [Parser#inlineTokenizers](#parserinlinetokenizers)
  * [Parser#inlineMethods](#parserinlinemethods)
  * [function tokenizer(eat, value, silent)](#function-tokenizereat-value-silent)
  * [tokenizer.locator(value, fromIndex)](#tokenizerlocatorvalue-fromindex)
  * [eat(subvalue)](#eatsubvalue)
  * [add(node\[, parent\])](#addnode-parent)
  * [add.test()](#addtest)
  * [add.reset(node\[, parent\])](#addresetnode-parent)
  * [Turning off a tokenizer](#turning-off-a-tokenizer)
54 ### `processor.use(parse[, options])`
56 Configure the `processor` to read markdown as input and process an
57 [**MDAST**][mdast] syntax tree.
61 Options are passed directly, or passed later through [`processor.data()`][data].
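As a sketch (assuming `unified` and `remark-parse` are installed), the two forms look like this; note that the `settings` data key shown for the late form is an assumption about the processor set-up, not something this package guarantees:

```js
var unified = require('unified');
var markdown = require('remark-parse');

/* Options passed directly: */
var direct = unified().use(markdown, {commonmark: true});

/* Options passed later through `data()` (sketch: assumes the
 * processor reads parse settings from the `settings` key). */
var late = unified().use(markdown).data('settings', {commonmark: true});
```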
##### `options.gfm`

GFM mode (`boolean`, default: `true`) turns on:
71 * [Fenced code blocks](https://help.github.com/articles/github-flavored-markdown/#fenced-code-blocks)
72 * [Autolinking of URLs](https://help.github.com/articles/github-flavored-markdown/#url-autolinking)
73 * [Deletions (strikethrough)](https://help.github.com/articles/github-flavored-markdown/#strikethrough)
74 * [Task lists](https://help.github.com/articles/writing-on-github/#task-lists)
75 * [Tables](https://help.github.com/articles/github-flavored-markdown/#tables)
77 ##### `options.commonmark`
```md
This is a paragraph
and this is also part of the preceding paragraph.
```
84 CommonMark mode (`boolean`, default: `false`) allows:
86 * Empty lines to split blockquotes
* Parentheses (`(` and `)`) around link and image titles
88 * Any escaped [ASCII-punctuation][escapes] character
89 * Closing parenthesis (`)`) as an ordered list marker
90 * URL definitions (and footnotes, when enabled) in blockquotes
92 CommonMark mode disallows:
94 * Code directly following a paragraph
* ATX-headings (`# Hash headings`) without spacing after opening hashes
  or before closing hashes
97 * Setext headings (`Underline headings\n---`) when following a paragraph
98 * Newlines in link and image titles
* White space in link and image URLs in auto-links (links in brackets,
  `<` and `>`)
101 * Lazy blockquote continuation, lines not preceded by a closing angle
102 bracket (`>`), for lists, code, and thematicBreak
104 ##### `options.footnotes`
```md
Something something[^or something?].

And something else[^1].

[^1]: This reference footnote contains a paragraph...
```
116 Footnotes mode (`boolean`, default: `false`) enables reference footnotes and
117 inline footnotes. Both are wrapped in square brackets and preceded by a caret
118 (`^`), and can be referenced from inside other footnotes.
120 ##### `options.blocks`
Blocks (`Array.<string>`, default: list of [block HTML elements][blocks])
lets users define which HTML elements are treated as block level.
130 ##### `options.pedantic`
```md
Check out some_file_name.txt
```
136 Pedantic mode (`boolean`, default: `false`) turns on:
* Emphasis (`_alpha_`) and importance (`__bravo__`) with underscores
  in words
140 * Unordered lists with different markers (`*`, `-`, `+`)
* If `commonmark` is also turned on, ordered lists with different
  markers (`.` or `)`)
* Removal of fewer spaces in list items (at most four, instead of
  the whole indent)
### `parse.Parser`

Access to the [parser][], if you need it.
150 ## Extending the Parser
152 Most often, using transformers to manipulate a syntax tree produces
153 the desired output. Sometimes, mainly when introducing new syntactic
entities with a certain level of precedence, interfacing with the parser
is necessary.
157 If the `remark-parse` plug-in is used, it adds a [`Parser`][parser] constructor
158 to the `processor`. Other plug-ins can add tokenizers to the parser’s prototype
159 to change how markdown is parsed.
161 The below plug-in adds a [tokenizer][] for at-mentions.
```js
module.exports = mentions;

function mentions() {
  var Parser = this.Parser;
  var tokenizers = Parser.prototype.inlineTokenizers;
  var methods = Parser.prototype.inlineMethods;

  /* Add an inline tokenizer (defined in the following example). */
  tokenizers.mention = tokenizeMention;

  /* Run it just before `text`. */
  methods.splice(methods.indexOf('text'), 0, 'mention');
}
```
179 ### `Parser#blockTokenizers`
181 An object mapping tokenizer names to [tokenizer][]s. These
182 tokenizers (for example: `fencedCode`, `table`, and `paragraph`) eat
183 from the start of a value to a line ending.
185 ### `Parser#blockMethods`
Array of `blockTokenizers` names (`string`) specifying the order in
which block tokenizers run.
190 ### `Parser#inlineTokenizers`
192 An object mapping tokenizer names to [tokenizer][]s. These tokenizers
193 (for example: `url`, `reference`, and `emphasis`) eat from the start
194 of a value. To increase performance, they depend on [locator][]s.
196 ### `Parser#inlineMethods`
Array of `inlineTokenizers` names (`string`) specifying the order in
which inline tokenizers run.
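These order arrays are plain JavaScript arrays, so a plug-in can position its tokenizer with ordinary array operations. A small sketch (the `methods` contents and the `addBefore` helper are illustrative, not part of the API):

```js
/* Hypothetical helper: insert a tokenizer name before an existing one. */
function addBefore(methods, existing, name) {
  var index = methods.indexOf(existing);
  if (index === -1) {
    throw new Error('Missing method: `' + existing + '`');
  }
  methods.splice(index, 0, name);
  return methods;
}

/* Illustrative subset of inline method names: */
var methods = ['escape', 'autoLink', 'reference', 'text'];
addBefore(methods, 'text', 'mention');
/* methods is now ['escape', 'autoLink', 'reference', 'mention', 'text'] */
```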
201 ### `function tokenizer(eat, value, silent)`
```js
tokenizeMention.notInLink = true;
tokenizeMention.locator = locateMention;

function tokenizeMention(eat, value, silent) {
  var match = /^@(\w+)/.exec(value);

  if (match) {
    if (silent) {
      return true;
    }

    return eat(match[0])({
      type: 'link',
      url: 'https://social-network/' + match[1],
      children: [{type: 'text', value: match[0]}]
    });
  }
}
```
224 The parser knows two types of tokenizers: block level and inline level.
225 Block level tokenizers are the same as inline level tokenizers, with
226 the exception that the latter must have a [locator][].
228 Tokenizers _test_ whether a document starts with a certain syntactic
229 entity. In _silent_ mode, they return whether that test passes.
230 In _normal_ mode, they consume that token, a process which is called
231 “eating”. Locators enable tokenizers to function faster by providing
232 information on where the next entity may occur.
###### Signatures

* `Node? = tokenizer(eat, value)`
* `boolean? = tokenizer(eat, value, silent)`
###### Parameters

* `eat` ([`Function`][eat]) — Eat, when applicable, an entity
* `value` (`string`) — Value which may start an entity
* `silent` (`boolean`, optional) — Whether to detect or consume
###### Properties

* `locator` ([`Function`][locator])
  — Required for inline tokenizers
* `onlyAtStart` (`boolean`)
  — Whether nodes can only be found at the beginning of the document
* `notInBlock` (`boolean`)
  — Whether nodes cannot be in blockquotes, lists, or footnote
  definitions
* `notInList` (`boolean`)
  — Whether nodes cannot be in lists
* `notInLink` (`boolean`)
  — Whether nodes cannot be in links
###### Returns

* In _silent_ mode, whether a node can be found at the start of `value`
* In _normal_ mode, a node if it can be found at the start of `value`
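To make this contract concrete, here is a sketch that exercises the mention tokenizer from the earlier example against a stand-in `eat` (the real `eat` also tracks the value and patches positional information):

```js
function tokenizeMention(eat, value, silent) {
  var match = /^@(\w+)/.exec(value);

  if (match) {
    if (silent) {
      /* Silent mode: only report that the test passes. */
      return true;
    }

    /* Normal mode: consume the match and return a node. */
    return eat(match[0])({
      type: 'link',
      url: 'https://social-network/' + match[1],
      children: [{type: 'text', value: match[0]}]
    });
  }
}

/* Stand-in for the real `eat`: returns the node unchanged. */
function fakeEat(subvalue) {
  return function (node) {
    return node;
  };
}

tokenizeMention(fakeEat, '@alpha rest', true); /* => true */
tokenizeMention(fakeEat, 'plain text', true); /* => undefined */
tokenizeMention(fakeEat, '@alpha rest').url; /* => 'https://social-network/alpha' */
```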
264 ### `tokenizer.locator(value, fromIndex)`
```js
function locateMention(value, fromIndex) {
  return value.indexOf('@', fromIndex);
}
```
Locators are required for inline tokenization to keep the process
performant. Locators enable inline tokenizers to function faster by
providing information on where the next entity may occur. Locators
may be wrong: it’s OK if there actually isn’t a node to be found at
the index they return, but they must not skip past any entities.
###### Parameters

* `value` (`string`) — Value which may contain an entity
* `fromIndex` (`number`) — Position to start searching at
###### Returns

Index at which an entity may start, or `-1` otherwise.
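For instance, the mention locator above is a plain function and can be exercised directly:

```js
function locateMention(value, fromIndex) {
  return value.indexOf('@', fromIndex);
}

locateMention('Hello @alpha and @bravo', 0); /* => 6 */
locateMention('Hello @alpha and @bravo', 7); /* => 17 */
locateMention('No mentions here', 0); /* => -1 */
```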
### `eat(subvalue)`

```js
var add = eat('foo');
```
Eat `subvalue`, which is a string at the start of the
[tokenize][tokenizer]d `value` (it’s tracked to ensure the correct
value is eaten).
###### Parameters

* `subvalue` (`string`) - Value to eat.
305 ### `add(node[, parent])`
```js
var add = eat('foo');
add({type: 'text', value: 'foo'});
```
312 Add [positional information][location] to `node` and add it to `parent`.
###### Parameters

* `node` ([`Node`][node]) - Node to patch position on and insert
* `parent` ([`Node`][node], optional) - Place to add `node` to in
  the syntax tree. Defaults to the currently processed node.
### `add.test()`

Get the [positional information][location] which would be patched on
the node, were it added.
###### Returns

[`Location`][location].
333 ### `add.reset(node[, parent])`
`add`, but resets the internal location. Useful for example in
lists, where the same content is first eaten for a list, and later
again for each list item.
###### Parameters

* `node` ([`Node`][node]) - Node to patch position on and insert
* `parent` ([`Node`][node], optional) - Place to add `node` to in
  the syntax tree. Defaults to the currently processed node.
349 ### Turning off a tokenizer
In rare situations, you may want to turn off a tokenizer to avoid parsing
that syntactic feature. This can be done by deleting the tokenizer from
your Parser’s `blockTokenizers` (or `blockMethods`) or `inlineTokenizers`
(or `inlineMethods`).
356 The following example turns off indented code blocks:
```js
var remarkParse = require('remark-parse');

delete remarkParse.Parser.prototype.blockTokenizers.indentedCode;
```
## License

[MIT][license] © [Titus Wormer][author]
368 [build-badge]: https://img.shields.io/travis/wooorm/remark.svg
370 [build-status]: https://travis-ci.org/wooorm/remark
372 [coverage-badge]: https://img.shields.io/codecov/c/github/wooorm/remark.svg
374 [coverage-status]: https://codecov.io/github/wooorm/remark
376 [chat-badge]: https://img.shields.io/gitter/room/wooorm/remark.svg
378 [chat]: https://gitter.im/wooorm/remark
380 [license]: https://github.com/wooorm/remark/blob/master/LICENSE
382 [author]: http://wooorm.com
384 [npm]: https://docs.npmjs.com/cli/install
386 [unified]: https://github.com/wooorm/unified
388 [data]: https://github.com/unifiedjs/unified#processordatakey-value
390 [processor]: https://github.com/wooorm/remark/blob/master/packages/remark
392 [mdast]: https://github.com/wooorm/mdast
394 [escapes]: http://spec.commonmark.org/0.25/#backslash-escapes
396 [node]: https://github.com/wooorm/unist#node
398 [location]: https://github.com/wooorm/unist#location
400 [parser]: https://github.com/wooorm/unified#processorparser
402 [extend]: #extending-the-parser
404 [tokenizer]: #function-tokenizereat-value-silent
406 [locator]: #tokenizerlocatorvalue-fromindex
[eat]: #eatsubvalue

[add]: #addnode-parent
412 [blocks]: https://github.com/wooorm/remark/blob/master/packages/remark-parse/lib/block-elements.json