HTML Cleaner with options

Clean complex HTML code for easy reuse

Full implementation of the sanitize-html library

Note that attributes, classes, and some HTML tags are not allowed by default.
You can set your own options in the options panels below.

All available features:

Where to start?

  1. Upload an HTML file or paste text
  2. Click Clean HTML
  3. Check the result below

Or paste your HTML

Discard text outside HTML tags? (Yes / No)

Allow all tags? (Yes / No) If not, you can set detailed options below.

Allow all attributes? (Yes / No) If not, you can set detailed options below.

Allowed attributes (Yes / No)

Columns: Tag, Attribute, Multiple, Attribute values

Set allowed classes

Columns: Tag, Classes

Set allowed styles (regex only)

Columns: Tag, Style, Regex

Allowed tags (click the info button to see them)

  • Discard
  • Escape
  • Recursive escape (escapes the tag and all its content)

Transform tag to another (Yes / No)

All involved tags and attributes must be allowed!

Columns: From, To, Attrs, Vals

Tags filter

Choose tags to remove. Empty fields aren't counted. Use ^$ in the Text field to match empty text.

Columns: Tag, Attrs, Attributes, Text, Media children, Position

Text filter

Columns: Tags, Replace, By

iframe filter

The iframe tag and src attribute must be allowed. Hostname targets the link and Domain targets the domain name.

  • iframe hostnames
  • iframe domains

Script filter

The script tag and src attribute must be allowed. Hostname targets the link and Domain targets the domain name.

  • script hostnames
  • script domains

Allowed schemes (Yes / No)

You can add more HTML tags in the generated HTML code and check the rendering.

HTML code

Rendered page

A powerful HTML cleaning tool based on the sanitize-html library

This online tool is built on the sanitize-html Node.js library. Whether you want to clean HTML files or test the library, we've implemented all of its functions.

Use cases

You can use this tool to clean an HTML file that you have downloaded. You can:

  • Discard or keep text outside HTML tags
  • Extract raw text from an HTML file by disallowing tags
  • Allow specific tags and choose nesting level
  • Filter tags
  • Filter attributes and their values
  • Filter classes
  • Filter inline styles
  • Transform tag to another
  • Filter text
  • Filter iframe by domain or URL
  • Filter script by domain or URL
  • Filter schemes

About sanitize-html

sanitize-html provides a simple HTML sanitizer with a clear API.

sanitize-html is tolerant. It is well suited for cleaning up HTML fragments such as those created by CKEditor and other rich text editors. It is especially handy for removing unwanted CSS when copying and pasting from Word.

sanitize-html allows you to specify the tags you want to permit, and the permitted attributes for each of those tags.

If a tag is not permitted, the contents of the tag are not discarded. There are some exceptions to this: by default, the contents of script, style, textarea and option tags are discarded along with the tag.
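The Discard, Escape and Recursive escape choices in the options panel appear to correspond to the library's disallowedTagsMode option. A minimal sketch of such an options object follows; the variable name and the tag list are illustrative assumptions, and the object is meant to be passed as the second argument of sanitizeHtml(dirty, options).

```javascript
// Illustrative options object; the name escapeOptions is hypothetical.
const escapeOptions = {
  allowedTags: ['b', 'i'],
  // 'discard' (the default) drops disallowed tags but keeps their text,
  // 'escape' outputs disallowed tags as escaped text,
  // 'recursiveEscape' also escapes every tag nested inside them.
  disallowedTagsMode: 'escape'
};

console.log(escapeOptions.disallowedTagsMode); // escape
```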

The syntax of poorly closed p and img elements is cleaned up.

href attributes are validated to ensure they only contain http, https, ftp and mailto URLs. Relative URLs are also allowed. Ditto for src attributes.

Allowing particular urls as a src to an iframe tag by filtering hostnames is also supported.

HTML comments are not preserved. Additionally, sanitize-html escapes ALL text content - this means that ampersands, greater-than, and less-than signs are converted to their equivalent HTML character references (& --> &amp;, < --> &lt;, and so on). Additionally, in attribute values, quotation marks are escaped as well (" --> &quot;).
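The escaping rules above can be sketched in a few lines of plain JavaScript. This is an illustration of the mapping only, not the library's internal code; the function names are hypothetical, and the sketch escapes every ampersand without trying to preserve entities that are already present in the input.

```javascript
// Hypothetical helpers that mirror the escaping rules described above.
function escapeText(text) {
  return text
    .replace(/&/g, '&amp;')   // ampersand first, so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function escapeAttributeValue(value) {
  // Attribute values additionally escape double quotes.
  return escapeText(value).replace(/"/g, '&quot;');
}

console.log(escapeText('Tom & Jerry <3'));     // Tom &amp; Jerry &lt;3
console.log(escapeAttributeValue('say "hi"')); // say &quot;hi&quot;
```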

Discard or keep text outside HTML tags

Some text editing applications generate HTML to allow copying over to a web application. These can sometimes include undesirable control characters after the terminating html tag. By default sanitize-html will not discard these characters, instead returning them in the sanitized string. This behaviour can be modified using the enforceHtmlBoundary option.

Setting this option to true will instruct sanitize-html to discard all characters outside of html tag boundaries -- before <html> and after </html> tags.
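As a minimal sketch, the option is a single boolean on the options object passed to sanitizeHtml(dirty, options). The variable name below is an illustrative assumption.

```javascript
// Illustrative options object; the name boundaryOptions is hypothetical.
const boundaryOptions = {
  // Discard all characters before <html> and after </html>.
  enforceHtmlBoundary: true
};

console.log(boundaryOptions.enforceHtmlBoundary); // true
```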

Extract raw text from an HTML file by disallowing tags

By disallowing all tags, you instruct the app to remove every tag from your HTML file or pasted text. This is very handy when you only need the text from long and messy files.
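With the library itself, disallowing everything comes down to passing empty allowlists. A minimal sketch, with a hypothetical variable name, intended for sanitizeHtml(dirty, options):

```javascript
// Illustrative options object; the name textOnlyOptions is hypothetical.
const textOnlyOptions = {
  allowedTags: [],       // no tag is permitted, so only text remains
  allowedAttributes: {}  // no attribute is permitted either
};

console.log(textOnlyOptions.allowedTags.length); // 0
```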

Allow specific tags and choose nesting level

You can limit the depth of HTML tags in the document with the nestingLimit option. For example, a limit of 6 prevents the user from nesting tags more than 6 levels deep. Tags deeper than that are stripped out exactly as if they were disallowed. Note that this means text is preserved in the usual ways where appropriate.
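A minimal sketch of the nesting limit described above, with a hypothetical variable name, intended for sanitizeHtml(dirty, options):

```javascript
// Illustrative options object; the name nestingOptions is hypothetical.
const nestingOptions = {
  // Tags nested more than 6 levels deep are treated as disallowed.
  nestingLimit: 6
};

console.log(nestingOptions.nestingLimit); // 6
```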

You can see the default allowed tags above, in the Allowed tags option. You can also allow any other tag you want.

Filter tags

You can provide a filter function to remove unwanted tags. In the options, choose the tag and set conditions on it, such as its class name, attributes, or text.
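In the library this filter is the exclusiveFilter option: a function that receives a frame object (with fields such as tag, attribs and text) and returns true when the tag should be removed. The sketch below removes links that contain no text; the variable name is a hypothetical choice.

```javascript
// Illustrative options object; the name filterOptions is hypothetical.
const filterOptions = {
  // Return true to remove the tag. Here: drop <a> tags with empty text.
  exclusiveFilter: function (frame) {
    return frame.tag === 'a' && !frame.text.trim();
  }
};

// Calling the filter directly with hand-made frames:
console.log(filterOptions.exclusiveFilter({ tag: 'a', attribs: {}, text: '  ' }));   // true
console.log(filterOptions.exclusiveFilter({ tag: 'a', attribs: {}, text: 'Home' })); // false
```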

Filter attributes and their values

You can make a list of allowed attributes, and optionally of allowed values for each attribute. With the Multiple option enabled, several allowed values may appear in the same attribute, separated by spaces. Otherwise the attribute must exactly match one and only one of the allowed values.
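As a sketch of how this looks in the library's allowedAttributes option: plain strings allow an attribute with any value, while an object with name, multiple and values restricts the values. The tag choices and the variable name are illustrative assumptions.

```javascript
// Illustrative options object; the name attributeOptions is hypothetical.
const attributeOptions = {
  allowedTags: ['a', 'iframe'],
  allowedAttributes: {
    a: ['href', 'name', 'target'],
    iframe: [
      'src',
      {
        name: 'sandbox',
        multiple: true, // several space-separated values may appear together
        values: ['allow-popups', 'allow-same-origin', 'allow-scripts']
      }
    ]
  }
};

console.log(attributeOptions.allowedAttributes.iframe[1].values.length); // 3
```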

Filter classes

If you wish to allow specific CSS classes on a particular element, you can do so with the allowedClasses option. Any other CSS classes are discarded. This implies that the class attribute is allowed on that element.
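A minimal sketch of allowedClasses, keyed by tag name. The class names and the variable name are illustrative assumptions.

```javascript
// Illustrative options object; the name classOptions is hypothetical.
const classOptions = {
  allowedTags: ['p'],
  allowedClasses: {
    // Only these classes survive on <p>; any other class is removed.
    p: ['fancy', 'simple']
  }
};

console.log(classOptions.allowedClasses.p.includes('fancy')); // true
```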

Filter inline styles

If you wish to allow specific CSS styles on a particular element, you can do that with the allowedStyles option. Simply declare your desired attributes as regular expression options within an array for the given attribute.

Specific elements will inherit allowlisted attributes from the global (*) attribute. Any other CSS styles are discarded.

You must also use allowedAttributes to activate the style attribute for the relevant elements. Otherwise this feature will never come into play.

When constructing regular expressions, don't forget ^ and $. It's not enough to say "the string should contain this." It must also say "and only this."

URLs in inline styles are NOT filtered by any mechanism other than your regular expression.
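The points above can be sketched as one options object: the style attribute is enabled through allowedAttributes, and each CSS property maps to an array of anchored regular expressions. The specific patterns, tags and names are illustrative assumptions. The last lines show why the ^ and $ anchors matter.

```javascript
// Illustrative options object; the name styleOptions is hypothetical.
const styleOptions = {
  allowedTags: ['p', 'span'],
  allowedAttributes: { p: ['style'], span: ['style'] },
  allowedStyles: {
    '*': {
      // Anchored with ^ and $ so the whole value must match.
      'color': [/^#[0-9a-f]{3,6}$/i],
      'text-align': [/^left$/, /^right$/, /^center$/]
    },
    p: {
      'font-size': [/^\d+rem$/]
    }
  }
};

// An unanchored pattern accepts values that merely contain the word.
const unanchored = /red/;
const anchored = /^red$/;
const suspicious = 'url(http://example.com/red.png)';
console.log(unanchored.test(suspicious)); // true
console.log(anchored.test(suspicious));   // false
```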

Transform tag to another

What if you want to add or change an attribute? What if you want to transform one tag to another? No problem, it's simple!

You can specify the * wildcard instead of a tag name to transform all tags.
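A sketch of the transformTags option in its function form, which returns the new tag name and attributes. A plain string such as 'ul' can be used instead of a function when only the tag name changes. The tags, the class value and the variable name are illustrative assumptions; note that the resulting tag and attribute are also listed as allowed.

```javascript
// Illustrative options object; the name transformOptions is hypothetical.
const transformOptions = {
  allowedTags: ['ul', 'li'],
  allowedAttributes: { ul: ['class'] },
  transformTags: {
    // Rename <ol> to <ul> and set a class on it.
    ol: function (tagName, attribs) {
      return { tagName: 'ul', attribs: { class: 'foo' } };
    }
  }
};

const result = transformOptions.transformTags.ol('ol', {});
console.log(result.tagName, result.attribs.class); // ul foo
```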

Filter text

Use this option to filter the text in the file. You can match a literal word or use a regular expression.
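In the library, text filtering is done with the textFilter option, a function that receives the text and the enclosing tag name and returns the replacement text. The sketch below replaces three dots with an ellipsis entity everywhere except inside links; the replacement rule and the variable name are illustrative assumptions.

```javascript
// Illustrative options object; the name textFilterOptions is hypothetical.
const textFilterOptions = {
  textFilter: function (text, tagName) {
    if (tagName === 'a') {
      return text; // leave link text untouched
    }
    return text.replace(/\.\.\./g, '&hellip;');
  }
};

console.log(textFilterOptions.textFilter('Wait...', 'p')); // Wait&hellip;
console.log(textFilterOptions.textFilter('Wait...', 'a')); // Wait...
```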

Filter iframe by domain or url

If you would like to allow iframe tags but want to control the domains that are allowed through, you can provide an array of hostnames and/or an array of domains that you would like to allow as iframe sources. These arrays are properties of the options object passed as an argument to the sanitize-html function.

These arrays are checked against the HTML passed to the function, and only src URLs that match the allowed hostnames or domains are kept. The URL in the HTML must be formatted correctly (with a valid hostname) as an embedded iframe; otherwise the module strips the src from the iframe.
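The sketch below shows the two option arrays, followed by a hypothetical helper that illustrates the difference between a hostname (exact match on the host) and a domain (the domain itself or any of its subdomains). The helper is not part of sanitize-html and handles absolute URLs only; the host names are illustrative assumptions.

```javascript
// Illustrative options object; the name iframeOptions is hypothetical.
const iframeOptions = {
  allowedTags: ['p', 'iframe'],
  allowedAttributes: { iframe: ['src'] },
  allowedIframeHostnames: ['www.youtube.com', 'player.vimeo.com'],
  allowedIframeDomains: ['zoom.us']
};

// Hypothetical helper, not a library function.
function isAllowedSource(src, hostnames, domains) {
  let host;
  try {
    host = new URL(src).hostname;
  } catch (err) {
    return false; // malformed or relative URL
  }
  return hostnames.includes(host) ||
    domains.some((d) => host === d || host.endsWith('.' + d));
}

const check = (src) => isAllowedSource(
  src, iframeOptions.allowedIframeHostnames, iframeOptions.allowedIframeDomains);

console.log(check('https://www.youtube.com/embed/abc')); // true  (hostname)
console.log(check('https://us02web.zoom.us/j/1'));       // true  (subdomain)
console.log(check('https://evil.example/embed'));        // false
```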

Filter script by domain or url

As with iframes, you can allow script tags only when their src matches a list of allowlisted domains or a list of allowlisted hostnames.
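A minimal sketch of the corresponding options, mirroring the iframe case. The domain names and the variable name are illustrative assumptions.

```javascript
// Illustrative options object; the name scriptOptions is hypothetical.
const scriptOptions = {
  allowedTags: ['script'],
  allowedAttributes: { script: ['src'] },
  allowedScriptDomains: ['authorized.com'],       // domain and its subdomains
  allowedScriptHostnames: ['www.authorized.com']  // exact host only
};

console.log(scriptOptions.allowedScriptDomains[0]); // authorized.com
```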

Filter schemes

By default, we allow the following URL schemes in cases where href, src, etc. are allowed: [ 'http', 'https', 'ftp', 'mailto' ]. You can override this if you want to.
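Overriding the list is done with the allowedSchemes option. The sketch below starts from the four schemes listed above and adds tel as an example; the extra scheme and the variable names are illustrative assumptions.

```javascript
// The default list as stated in the text above.
const listedSchemes = ['http', 'https', 'ftp', 'mailto'];

// Illustrative options object; the name schemeOptions is hypothetical.
const schemeOptions = {
  allowedSchemes: listedSchemes.concat(['tel'])
};

console.log(schemeOptions.allowedSchemes.join(', ')); // http, https, ftp, mailto, tel
```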
