# remark-parse

[![Build][build-badge]][build]
[![Coverage][coverage-badge]][coverage]
[![Downloads][downloads-badge]][downloads]
[![Size][size-badge]][size]
[![Sponsors][sponsors-badge]][collective]
[![Backers][backers-badge]][collective]
[![Chat][chat-badge]][chat]

[Parser][] for [**unified**][unified].
Parses Markdown to [**mdast**][mdast] syntax trees.
Used in the [**remark** processor][remark] but can be used on its own as well.
Can be [extended][extend] to change how markdown is parsed.

## Sponsors


* πŸ₯‡ ZEIT
* πŸ₯‡ Gatsby
* πŸ₯‡ Netlify
* Holloway
* You?

[**Read more about the unified collective on Medium Β»**][announcement]

## Install

[npm][]:

```sh
npm install remark-parse
```

## Use

```js
var unified = require('unified')
var createStream = require('unified-stream')
var markdown = require('remark-parse')
var remark2rehype = require('remark-rehype')
var html = require('rehype-stringify')

var processor = unified()
  .use(markdown, {commonmark: true})
  .use(remark2rehype)
  .use(html)

process.stdin.pipe(createStream(processor)).pipe(process.stdout)
```

[See **unified** for more examples Β»][unified]

## Table of Contents

* [API](#api)
  * [`processor().use(parse[, options])`](#processoruseparse-options)
  * [`parse.Parser`](#parseparser)
* [Extending the Parser](#extending-the-parser)
  * [`Parser#blockTokenizers`](#parserblocktokenizers)
  * [`Parser#blockMethods`](#parserblockmethods)
  * [`Parser#inlineTokenizers`](#parserinlinetokenizers)
  * [`Parser#inlineMethods`](#parserinlinemethods)
  * [`function tokenizer(eat, value, silent)`](#function-tokenizereat-value-silent)
  * [`tokenizer.locator(value, fromIndex)`](#tokenizerlocatorvalue-fromindex)
  * [`eat(subvalue)`](#eatsubvalue)
  * [`add(node[, parent])`](#addnode-parent)
  * [`add.test()`](#addtest)
  * [`add.reset(node[, parent])`](#addresetnode-parent)
  * [Turning off a tokenizer](#turning-off-a-tokenizer)
* [Security](#security)
* [Contribute](#contribute)
* [License](#license)

## API

[See **unified** for API docs Β»][unified]

### `processor().use(parse[, options])`

Configure the `processor` to read Markdown as input and process [**mdast**][mdast] syntax trees.

##### `options`

Options can be passed directly, or passed later through [`processor.data()`][data].

###### `options.gfm`

GFM mode (`boolean`, default: `true`).

```markdown
hello ~~hi~~ world
```

Turns on:

* [Fenced code blocks](https://help.github.com/articles/creating-and-highlighting-code-blocks#fenced-code-blocks)
* [Autolinking of URLs](https://help.github.com/articles/autolinked-references-and-urls)
* [Deletions (strikethrough)](https://help.github.com/articles/basic-writing-and-formatting-syntax#styling-text)
* [Task lists](https://help.github.com/articles/basic-writing-and-formatting-syntax#task-lists)
* [Tables](https://help.github.com/articles/organizing-information-with-tables)

###### `options.commonmark`

CommonMark mode (`boolean`, default: `false`).

```markdown
This is a paragraph
    and this is also part of the preceding paragraph.
```

Allows:

* Empty lines to split blockquotes
* Parentheses (`(` and `)`) around link and image titles
* Any escaped [ASCII punctuation][escapes] character
* Closing parenthesis (`)`) as an ordered list marker
* URL definitions (and footnotes, when enabled) in blockquotes

Disallows:

* Indented code blocks directly following a paragraph
* ATX headings (`# Hash headings`) without spacing after opening hashes or before closing hashes
* Setext headings (`Underline headings\n---`) when following a paragraph
* Newlines in link and image titles
* White space in link and image URLs in auto-links (links in brackets, `<` and `>`)
* Lazy blockquote continuation, lines not preceded by a greater than character (`>`), for lists, code, and thematic breaks

###### `options.footnotes`

Footnotes mode (`boolean`, default: `false`).

```markdown
Something something[^or something?].

And something else[^1].

[^1]: This reference footnote contains a paragraph...

    * ...and a list
```

Enables reference footnotes and inline footnotes.
Both are wrapped in square brackets and preceded by a caret (`^`), and can be referenced from inside other footnotes.

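For example, a minimal sketch (reusing `unified` and `remark-parse` as in the Use section above) that turns footnotes on when parsing:

```js
var unified = require('unified')
var markdown = require('remark-parse')

// Enable footnotes in addition to the default GFM mode.
var processor = unified().use(markdown, {footnotes: true})

// `parse` returns the mdast syntax tree, which can now contain footnote nodes.
var tree = processor.parse('Something something[^or something?].')
```
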
###### `options.pedantic`

Pedantic mode (`boolean`, default: `false`).

```markdown
Check out some_file_name.txt
```

Turns on:

* Emphasis (`_alpha_`) and importance (`__bravo__`) with underscores in words
* Unordered lists with different markers (`*`, `-`, `+`)
* If `commonmark` is also turned on, ordered lists with different markers (`.`, `)`)
* Removing fewer spaces in list items (at most four, instead of the whole indent)

###### `options.blocks`

Blocks (`Array.<string>`, default: list of [block HTML elements][blocks]).

```markdown
<block>foo
</block>
```

Defines which HTML elements are seen as block level.

### `parse.Parser`

Access to the [parser][], if you need it.

## Extending the Parser

Typically, using [*transformers*][transformer] to manipulate a syntax tree produces the desired output.
Sometimes, such as when introducing new syntactic entities with a certain precedence, interfacing with the parser is necessary.

If the `remark-parse` plugin is used, it adds a [`Parser`][parser] constructor function to the `processor`.
Other plugins can add tokenizers to its prototype to change how Markdown is parsed.

The below plugin adds a [tokenizer][] for at-mentions.

```js
module.exports = mentions

function mentions() {
  var Parser = this.Parser
  var tokenizers = Parser.prototype.inlineTokenizers
  var methods = Parser.prototype.inlineMethods

  // Add an inline tokenizer (defined in the following example).
  tokenizers.mention = tokenizeMention

  // Run it just before `text`.
  methods.splice(methods.indexOf('text'), 0, 'mention')
}
```

### `Parser#blockTokenizers`

Map of names to [tokenizer][]s (`Object.<Function>`).
These tokenizers (such as `fencedCode`, `table`, and `paragraph`) eat from the start of a value to a line ending.

See `#blockMethods` below for a list of methods that are included by default.

### `Parser#blockMethods`

List of `blockTokenizers` names (`Array.<string>`).
Specifies the order in which tokenizers run; a sketch of registering a new block tokenizer follows the list.

Precedence of default block methods is as follows:

* `newline`
* `indentedCode`
* `fencedCode`
* `blockquote`
* `atxHeading`
* `thematicBreak`
* `list`
* `setextHeading`
* `html`
* `footnote`
* `definition`
* `table`
* `paragraph`

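Block tokenizers hook in the same way as the inline `mentions` example above.
The following is a minimal sketch, not part of remark-parse, that registers a hypothetical `note` block tokenizer for lines starting with `!!!` and runs it just before `paragraph`:

```js
module.exports = notes

function notes() {
  var Parser = this.Parser
  var tokenizers = Parser.prototype.blockTokenizers
  var methods = Parser.prototype.blockMethods

  // Add a block tokenizer (defined below).
  tokenizers.note = tokenizeNote

  // Run it just before `paragraph`.
  methods.splice(methods.indexOf('paragraph'), 0, 'note')
}

// Eats a single `!!! …` line and adds a hypothetical `note` node for it.
function tokenizeNote(eat, value, silent) {
  var match = /^!!! (.+)/.exec(value)

  if (match) {
    if (silent) {
      return true
    }

    return eat(match[0])({
      type: 'note',
      children: [{type: 'text', value: match[1]}]
    })
  }
}
```

Block tokenizers do not need a locator; the inline tokenizers described next do.
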
### `Parser#inlineTokenizers`

Map of names to [tokenizer][]s (`Object.<Function>`).
These tokenizers (such as `url`, `reference`, and `emphasis`) eat from the start of a value.
To increase performance, they depend on [locator][]s.

See `#inlineMethods` below for a list of methods that are included by default.

### `Parser#inlineMethods`

List of `inlineTokenizers` names (`Array.<string>`).
Specifies the order in which tokenizers run.

Precedence of default inline methods is as follows:

* `escape`
* `autoLink`
* `url`
* `html`
* `link`
* `reference`
* `strong`
* `emphasis`
* `deletion`
* `code`
* `break`
* `text`

### `function tokenizer(eat, value, silent)`

There are two types of tokenizers: block level and inline level.
Both are functions, and work the same, but inline tokenizers must have a [locator][].

The following example shows an inline tokenizer that is added by the mentions plugin above.

```js
tokenizeMention.notInLink = true
tokenizeMention.locator = locateMention

function tokenizeMention(eat, value, silent) {
  var match = /^@(\w+)/.exec(value)

  if (match) {
    if (silent) {
      return true
    }

    return eat(match[0])({
      type: 'link',
      url: 'https://social-network/' + match[1],
      children: [{type: 'text', value: match[0]}]
    })
  }
}
```

Tokenizers *test* whether a document starts with a certain syntactic entity.
In *silent* mode, they return whether that test passes.
In *normal* mode, they consume that token, a process which is called β€œeating”.

Locators enable inline tokenizers to function faster by providing where the next entity may occur.

###### Signatures

* `Node? = tokenizer(eat, value)`
* `boolean? = tokenizer(eat, value, silent)`

###### Parameters

* `eat` ([`Function`][eat]) β€” Eat, when applicable, an entity
* `value` (`string`) β€” Value which may start an entity
* `silent` (`boolean`, optional) β€” Whether to detect or consume

###### Properties

* `locator` ([`Function`][locator]) β€” Required for inline tokenizers
* `onlyAtStart` (`boolean`) β€” Whether nodes can only be found at the beginning of the document
* `notInBlock` (`boolean`) β€” Whether nodes cannot be in blockquotes, lists, or footnote definitions
* `notInList` (`boolean`) β€” Whether nodes cannot be in lists
* `notInLink` (`boolean`) β€” Whether nodes cannot be in links

###### Returns

* `boolean?`, in *silent* mode β€” whether a node can be found at the start of `value`
* [`Node?`][node], in *normal* mode β€” the node, if one can be found at the start of `value`

### `tokenizer.locator(value, fromIndex)`

Locators are required for inline tokenizers.
Their role is to keep parsing performant.

The following example shows a locator that is added by the mentions tokenizer above.

```js
function locateMention(value, fromIndex) {
  return value.indexOf('@', fromIndex)
}
```

Locators enable inline tokenizers to function faster by providing information on where the next entity *may* occur.
Locators may be wrong; it’s OK if there actually isn’t a node to be found at the index they return.

###### Parameters

* `value` (`string`) β€” Value which may contain an entity
* `fromIndex` (`number`) β€” Position to start searching at

###### Returns

`number` β€” Index at which an entity may start, and `-1` otherwise.

### `eat(subvalue)`

```js
var add = eat('foo')
```

Eat `subvalue`, which is a string at the start of the [tokenized][tokenizer] `value`.

###### Parameters

* `subvalue` (`string`) β€” Value to eat

###### Returns

[`add`][add].

### `add(node[, parent])`

```js
var add = eat('foo')

add({type: 'text', value: 'foo'})
```

Add [positional information][position] to `node` and add `node` to `parent`.

###### Parameters

* `node` ([`Node`][node]) β€” Node to patch position on and to add
* `parent` ([`Parent`][parent], optional) β€” Place to add `node` to in the syntax tree.
  Defaults to the currently processed node

###### Returns

[`Node`][node] β€” The given `node`.

### `add.test()`

Get the [positional information][position] that would be patched on `node` by `add`.

###### Returns

[`Position`][position].

### `add.reset(node[, parent])`

`add`, but resets the internal position.
Useful for example in lists, where the same content is first eaten for a list, and later for list items.

###### Parameters

* `node` ([`Node`][node]) β€” Node to patch position on and insert
* `parent` ([`Node`][node], optional) β€” Place to add `node` to in the syntax tree.
  Defaults to the currently processed node

###### Returns

[`Node`][node] β€” The given node.

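Inside a tokenizer, these pieces are typically chained: `eat` consumes the matched substring and the returned `add` attaches the node.
A condensed, hypothetical sketch (assuming `value` starts with `'@mention'`):

```js
// Inside a tokenizer, after matching `'@mention'` at the start of `value`.
var add = eat('@mention')

// Positional information that `add` would patch onto the node.
var position = add.test()

// Create the node, patch the position onto it, and add it to the tree.
var node = add({type: 'text', value: '@mention'})
```
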
### Turning off a tokenizer

In some situations, you may want to turn off a tokenizer to avoid parsing that syntactic feature.

Preferably, use the [`remark-disable-tokenizers`][remark-disable-tokenizers] plugin to turn off tokenizers.
Alternatively, this can be done by replacing the tokenizer from `blockTokenizers` (or `blockMethods`) or `inlineTokenizers` (or `inlineMethods`).

The following example turns off indented code blocks:

```js
var remarkParse = require('remark-parse')

remarkParse.Parser.prototype.blockTokenizers.indentedCode = indentedCode

// Replacement tokenizer that never eats anything, so indented code is no
// longer recognized.
function indentedCode() {
  return true
}
```

## Security

As Markdown is sometimes used for HTML, and improper use of HTML can open you up to a [cross-site scripting (XSS)][xss] attack, use of remark can also be unsafe.
When going to HTML, use remark in combination with the [**rehype**][rehype] ecosystem, and use [`rehype-sanitize`][sanitize] to make the tree safe.

Use of remark plugins could also open you up to other attacks.
Carefully assess each plugin and the risks involved in using them.

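As a sketch of that recommendation (assuming the default schema of `rehype-sanitize` fits your use case), the pipeline from the Use section can be extended like this:

```js
var unified = require('unified')
var markdown = require('remark-parse')
var remark2rehype = require('remark-rehype')
var sanitize = require('rehype-sanitize')
var html = require('rehype-stringify')

// Sanitize the HTML syntax tree before stringifying it, so unsafe content
// does not end up in the output.
var processor = unified()
  .use(markdown)
  .use(remark2rehype)
  .use(sanitize)
  .use(html)
```
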
## Contribute

See [`contributing.md`][contributing] in [`remarkjs/.github`][health] for ways to get started.
See [`support.md`][support] for ways to get help.
Ideas for new plugins and tools can be posted in [`remarkjs/ideas`][ideas].

A curated list of awesome remark resources can be found in [**awesome remark**][awesome].

This project has a [Code of Conduct][coc].
By interacting with this repository, organisation, or community you agree to abide by its terms.

## License

[MIT][license] Β© [Titus Wormer][author]

[build-badge]: https://img.shields.io/travis/remarkjs/remark.svg

[build]: https://travis-ci.org/remarkjs/remark

[coverage-badge]: https://img.shields.io/codecov/c/github/remarkjs/remark.svg

[coverage]: https://codecov.io/github/remarkjs/remark

[downloads-badge]: https://img.shields.io/npm/dm/remark-parse.svg

[downloads]: https://www.npmjs.com/package/remark-parse

[size-badge]: https://img.shields.io/bundlephobia/minzip/remark-parse.svg

[size]: https://bundlephobia.com/result?p=remark-parse

[sponsors-badge]: https://opencollective.com/unified/sponsors/badge.svg

[backers-badge]: https://opencollective.com/unified/backers/badge.svg

[collective]: https://opencollective.com/unified

[chat-badge]: https://img.shields.io/badge/join%20the%20community-on%20spectrum-7b16ff.svg

[chat]: https://spectrum.chat/unified/remark

[health]: https://github.com/remarkjs/.github

[contributing]: https://github.com/remarkjs/.github/blob/master/contributing.md

[support]: https://github.com/remarkjs/.github/blob/master/support.md

[coc]: https://github.com/remarkjs/.github/blob/master/code-of-conduct.md

[ideas]: https://github.com/remarkjs/ideas

[awesome]: https://github.com/remarkjs/awesome-remark

[license]: https://github.com/remarkjs/remark/blob/master/license

[author]: https://wooorm.com

[npm]: https://docs.npmjs.com/cli/install

[unified]: https://github.com/unifiedjs/unified

[data]: https://github.com/unifiedjs/unified#processordatakey-value

[remark]: https://github.com/remarkjs/remark/tree/master/packages/remark

[blocks]: https://github.com/remarkjs/remark/blob/master/packages/remark-parse/lib/block-elements.js

[mdast]: https://github.com/syntax-tree/mdast

[escapes]: https://spec.commonmark.org/0.29/#backslash-escapes

[node]: https://github.com/syntax-tree/unist#node

[parent]: https://github.com/syntax-tree/unist#parent

[position]: https://github.com/syntax-tree/unist#position

[parser]: https://github.com/unifiedjs/unified#processorparser

[transformer]: https://github.com/unifiedjs/unified#function-transformernode-file-next

[extend]: #extending-the-parser

[tokenizer]: #function-tokenizereat-value-silent

[locator]: #tokenizerlocatorvalue-fromindex

[eat]: #eatsubvalue

[add]: #addnode-parent

[announcement]: https://medium.com/unifiedjs/collectively-evolving-through-crowdsourcing-22c359ea95cc

[remark-disable-tokenizers]: https://github.com/zestedesavoir/zmarkdown/tree/master/packages/remark-disable-tokenizers

[xss]: https://en.wikipedia.org/wiki/Cross-site_scripting

[rehype]: https://github.com/rehypejs/rehype

[sanitize]: https://github.com/rehypejs/rehype-sanitize