Rich text editors and rendering engines

(writer.zohopublic.com)

by lewisjoe

samwillis

Everyone who's ever built on top of contenteditable has war stories; my first bash with it was about 19 years ago. But tools like ProseMirror and Lexical solve most of those issues and separate the content from the presentation. 99% of the time they do what you want, but that 1% is impossible (page breaks!). Personally I'm a fan of TipTap; it's a great abstraction over the lower-level complexities of ProseMirror. The TipTap/ProseMirror/Yjs combination is insanely good for building collaborative editing.

Amused by the nod to the Ladybird WASM idea at the end, I'm taking credit for that: https://news.ycombinator.com/item?id=35521878. Good ideas spread. (Joe, apologies for not getting back to your email, it's been a rather busy few months)

unshavedyak

As someone interested in exploring this area -- specifically writing a very thin input highlighter/wysiwyg-lite thing -- i keep hearing about war stories but i'm curious how much people have discussed it, talked about it, etc.

Are there any documented best-practices for writing your own? (aside from "don't" hah)

I get that common wisdom says it is kind of insane, but i'd like to learn why, concretely. It's possible a fraction of what i want is not the "hard part", so i feel compelled to at least explore it.

braden-lk

We use Tiptap/Prosemirror/Yjs at LegendKeeper. It’s a great setup, but TipTap is a very shallow abstraction over PM. (Not knocking it at all, it’s still a great library.) To anyone looking at Tiptap, just know you’ll need to learn Prosemirror pretty thoroughly, too, if you want to do more sophisticated stuff.

ignoramous

Once Flutter fully supports WASM, then that's another viable path forward, as well: https://docs.flutter.dev/platform-integration/web/wasm

TRiG_Ireland

Why does this website use spans and weird javascript to fake links, instead of normal a tags? It means that right-click to open in a new tab doesn't work. Very strange. And very irritating.

NikkiA

It also loaded with white bars instead of text at first, a very irritating web trend.

skc

I once tried to build an editor for collaboratively writing fiction. I used contenteditable.

I think that was the most humbling programming experience of my life. I gave up after a few weeks of banging my head against the wall.

superasn

And every time you use something custom, you end up breaking things for users in ways you can't even imagine, even when you think you have all the edge cases covered.

For example, I use a custom DefaultKeyBinding.dict on my laptop, in which I've defined some shortcuts for text editing. It works in a textarea but not in GDocs, which makes editing cumbersome for me.

I've always wondered why this was the issue until I read this article. Now I see why it's broken.

amadeuspagel

Rich text editing doesn't imply pagination. Contenteditable is not abandoned. Lexical[1], Meta's framework for text editing, relies on contenteditable.

[1]: https://lexical.dev/

lewisjoe

The very reason libraries like Lexical & ProseMirror exist is that it's painful to work with contenteditable APIs.

And even with those libraries, try implementing features like multi-columns and cross-block selections.

And the most important problem: try implementing a pure HTML editor using ProseMirror / Lexical, i.e. the editor should accept HTML exactly as it is, from arbitrary sources, the way contenteditable does. (You can't.)

Those libraries depend on tightly controlling what goes into the editor, which is an amazing trade-off if you are building a tightly controlled editor. But good luck building an email editor that accepts any wild HTML.

Closi

Also note some of the limitations with Lexical - e.g. you can't even drag and drop highlighted text within a block.

jitl

Lexical is not mature and I wouldn’t recommend it for a project yet. Draft.JS (Facebook’s previous ContentEditable framework) stagnated for a long time. Use ProseMirror/TipTap - much lower risk.

KRAKRISMOTT

But Lexical provides so much more semantic data. You can build actual operational transforms, while for ProseMirror the quickest way to get collaborative editing is almost always binding it to a third-party CRDT like Yjs (which admittedly works, but is somewhat harder to customize than operational transforms).

jitl

Well the 3rd party collaboration libraries for ProseMirror are all using Operation (Step) and transformation interfaces exposed by ProseMirror, right? Or do you mean to say those are too "black box" to build nicely on top of?

https://prosemirror.net/docs/ref/#transform

https://prosemirror.net/docs/ref/#collab
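For readers unfamiliar with the "transform" idea being discussed, here is a minimal sketch of it for plain-text inserts. It is a toy, not ProseMirror's Step API and not Lexical's: the `InsertOp` type, the `site` tie-breaker, and both function names are assumptions made up for illustration.

```typescript
// A toy text operation: insert `text` at offset `pos`.
// Hypothetical type for illustration; not a ProseMirror Step.
interface InsertOp {
  pos: number;
  text: string;
  site: string; // author id, used only to break ties at equal positions
}

function applyInsert(doc: string, op: InsertOp): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Rewrite `op` so that it can be applied to a document on which
// `other` (a concurrent insert) has already been applied.
function transformInsert(op: InsertOp, other: InsertOp): InsertOp {
  const otherGoesFirst =
    other.pos < op.pos || (other.pos === op.pos && other.site < op.site);
  // If the other insert lands before ours, our position shifts right.
  return otherGoesFirst ? { ...op, pos: op.pos + other.text.length } : op;
}
```

Two peers that apply the same pair of concurrent inserts in opposite orders end up with the same text, as long as each transforms the later operation against the earlier one. Real editors also have to transform deletes, marks, and structural changes, which is where most of the difficulty lives.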

vxNsr

He alludes to the reason why it’s like this at the very end… Google and Microsoft see it as their competitive advantage, so they have no reason to collaborate. But guess who also makes the two most popular browsers… So not only do the companies that have made great improvements over contenteditable have no reason to collaborate on a new standard, those two companies also have a very big reason never to improve the APIs in their own browsers.

Another reason our activist FTC head should focus her sights on Google after she’s done pulling Amazon apart.

AshleysBrain

contenteditable may have its flaws, but I'm not sure if it's fair to call it "abandoned" as browser vendors do still maintain it - for example this very week Firefox 115 came out and the release notes[1] refer to updates to how contenteditable works.

[1] https://www.mozilla.org/en-US/firefox/115.0/releasenotes/

jitl

That’s a fix to a bug[1] I reported in October 2021! Big thanks to Masayuki Nakano who worked on this.

[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=1735608

neovive

This is both fascinating and sad. It does feel somewhat reminiscent of the state of HTML (pre-HTML5) before W3C and WHATWG agreed to work together on the development of HTML5. I suppose Google, Microsoft, Zoho, and other large companies that have implemented their own layout systems consider their implementation a competitive advantage and would not find it worth the effort to collaborate on a new standard to supplant contenteditable, if that is even possible.

geokon

A bit of a tangential question ...

If one were to write a simple text editor from scratch - is there some place with a "spec" or list of standard modern text editor UI paradigms one would need to implement to make the editor feel natural/normal ? (so it feels like Kate/GEdit/contenteditable)

I kinda get there are a lot of little subtle quirks and corner cases that need to be covered so it doesn't feel weird or janky.

For instance, Emacs CUA-mode is an example of an incredibly incomplete implementation.

jitl

It’s not a spec, but I like “Text editing hates you too” [1] and the article that inspired it, “text rendering hates you” [2].

Also Apple’s old documentation about how their text layout system works. That link is annoying to find so I don’t have it on hand.

[1]: https://lord.io/text-editing-hates-you-too/

[2]: https://faultlore.com/blah/text-hates-you/

richthegeek

I work on a project at a large (although not Google-scale) company building a collaborative RTE.

The short answer is no, there are very few universal truths about how text editing works. You'll experience differences between combinations of operating systems, input modes (Japanese IME for example), software keyboards on Android, languages (RTL languages), and that's just for text itself.

Then, when you get to more complex features, even simple things like "what happens when I double or triple click on this" are a complete crapshoot.

0xbadcafebee

Some parts of a web browser are generally useful for applications. The APIs that control how the application interfaces with the host operating system, security, networking, etc. But the parts of a browser that deal with actual content are often problematic.

As a casual user of HTML and CSS, I'm often struck by how stupid the layout controls are. How very simple things ("centering text") are often confusing and difficult. And it's striking how these layout and content limitations make it necessary to add layer after layer of complexity (mostly in the form of JavaScript and CSS) in order for anyone to make a "modern" web page successfully. Whereas older applications that are not "web-based" allow any casual user to create rich document presentation just by pointing and clicking.

The Web today is like an Indy 500 race car with foot pedals. I think we need to consider the future and how we want computers to work, and start moving towards that, rather than perpetually carrying forward the status quo.

skydhash

Because the web was not meant for applications, but documents. Even JS was a toy project to do fun stuff. Now, we want full-fledged applications on top of it because it happens to be the common platform and learning JS is easy (no types, no pointers).

As soon as Flash came to be, we should have thought of a standard better suited for applications.

jitl

I think the OP is kinda off the mark. The next step for web editor abstractions is not implementing layout engines so we can render page breaks correctly. It’s a library that separates handling input from rendering output. The biggest constricting factor with ContentEditable is that your rendered view also dictates how your input works. We need a library that handles all the user interaction and accessible, localized text input flawlessly but invisibly while integrating well with the OS. If we have that, developers can build their layout system on top or just use DOM layout without worrying about their DOM structure being compatible with ContentEditable input quirks across all the browsers, or their HTML changing in random ways under their feet.
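A minimal sketch of the separation described above, assuming a deliberately tiny model: input arrives as abstract intents, state is plain data, and rendering is a separate function that could be swapped for DOM, canvas, or paginated output. The `InputIntent` shape and all names here are hypothetical, and a real input layer would also have to cover IME composition, selection, accessibility, and grapheme clusters.

```typescript
// Editor state is plain data; nothing in this sketch touches the DOM.
interface EditorState {
  text: string;
  caret: number; // offset into `text`, in UTF-16 code units
}

// Hypothetical, simplified intents that an input layer would emit.
type InputIntent =
  | { kind: "insertText"; text: string }
  | { kind: "deleteBackward" }
  | { kind: "moveCaret"; to: number };

function applyIntent(state: EditorState, intent: InputIntent): EditorState {
  switch (intent.kind) {
    case "insertText":
      return {
        text:
          state.text.slice(0, state.caret) +
          intent.text +
          state.text.slice(state.caret),
        caret: state.caret + intent.text.length,
      };
    case "deleteBackward":
      if (state.caret === 0) return state;
      // Simplification: deletes one code unit, not one grapheme cluster.
      return {
        text: state.text.slice(0, state.caret - 1) + state.text.slice(state.caret),
        caret: state.caret - 1,
      };
    case "moveCaret":
      return {
        ...state,
        caret: Math.max(0, Math.min(intent.to, state.text.length)),
      };
  }
}

// One possible renderer: hard-wrap into fixed-width lines.
// A different renderer could consume the same state unchanged.
function renderLines(state: EditorState, width: number): string[] {
  const lines: string[] = [];
  for (let i = 0; i < state.text.length; i += width) {
    lines.push(state.text.slice(i, i + width));
  }
  return lines.length > 0 ? lines : [""];
}
```

The point of the split is that `applyIntent` never needs to know how the text is drawn, so the rendered structure cannot dictate how input behaves.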

There are a lot of problems with doing a hidden ContentEditable for input, which is what Google Docs does. Last time I looked, the input / focused DOM element was actually hidden inside the blinking caret. But that strategy is clumsy with assistive technology because the accessible bits don’t “line up” with the actual UI drawn to the screen. It also breaks some system conventions like smooth cursor movement when holding spacebar on iOS’s keyboard. You can try it in Google Docs and see what I mean.

Those kinds of issues are actually why Lexical (FB’s new text editor toolkit) doesn’t use hidden input according to Lexical’s author trueadm on Twitter (can’t find my citation). Those same issues also make us hesitate to move that direction in Notion’s editor.

If you want to use a custom “layout engine” implemented on top of the DOM today, you can use Skia CanvasKit (https://skia.org/docs/user/modules/quickstart/) or Flutter Web which is based on Skia. Although CanvasKit is kinda slow and text looks bad on iOS.

enriquto

> Google Docs uses its own layouting system for its editor---which means, it doesn’t use browser’s contenteditable API

Ah!, so that's the reason why copy-paste is badly broken in google docs (and jupyter notebooks as well)? Now I understand the reason. It's because they didn't reimplement it again.

I can copy-paste between all my windows and firefox tabs, except google docs and jupyter notebooks. This is because the selection that they show is not an actual text selection that the system understands, but just a hand-drawn color rectangle around the text. It's a fake selection, of sorts!

jitl

Someone can break copy paste in ContentEditable just as easily. All you need to do is preventDefault on the copy or paste event, and then do your own thing poorly.

Basically every rich editor on the web that’s any good overrides copy/paste with custom handling.
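A rough sketch of what such custom copy handling can look like. The `CopyEventLike` interface only mirrors the members of the browser's `ClipboardEvent` that are used here so the example runs without DOM typings; the `SelectedRun` model and the function names are hypothetical and not taken from any particular editor.

```typescript
// Structural stand-ins for the parts of ClipboardEvent / DataTransfer we use.
interface ClipboardDataLike {
  setData(format: string, data: string): void;
}
interface CopyEventLike {
  clipboardData: ClipboardDataLike | null;
  preventDefault(): void;
}

// Hypothetical model of the selection in the editor's own document.
interface SelectedRun {
  text: string;
  bold?: boolean;
}

function escapeClipboardHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function handleCopy(event: CopyEventLike, selection: SelectedRun[]): void {
  // Nothing to serialize: leave the browser's default behaviour alone.
  if (!event.clipboardData || selection.length === 0) return;

  const plain = selection.map((run) => run.text).join("");
  const html = selection
    .map((run) => {
      const escaped = escapeClipboardHtml(run.text);
      return run.bold ? `<b>${escaped}</b>` : escaped;
    })
    .join("");

  // Always provide text/plain too, so pasting into plain-text targets works.
  event.clipboardData.setData("text/plain", plain);
  event.clipboardData.setData("text/html", html);
  // Without this, the browser replaces our data with its default copy.
  event.preventDefault();
}
```

In a browser this would be wired up with something like `element.addEventListener("copy", (e) => handleCopy(e, currentSelection()))`, where `currentSelection` is whatever the editor uses to read its own model. Doing "your own thing poorly" usually means forgetting the plain-text flavour or the escaping.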

emadda

An alternative to contenteditable is to use a plain <textarea>, measure the inline width of the rendered text to get the absolute position of each word, and then overlay styles on top.

You need to debounce/throttle the change event handler so it doesn't fire too many re-renders while typing.

I used this approach for https://bigwav.app for highlighting text.

You can click the demo .bigwav file to take a look.
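A minimal sketch of the overlay idea described above, under a simplifying assumption: instead of measuring word positions, it builds HTML for a mirror element that sits behind or on top of the textarea with identical font and wrapping, which is a related variant and not necessarily what bigwav.app does. The function names are made up for illustration; `debounce` is the usual timer-based helper.

```typescript
function escapeOverlayHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build the overlay markup: same text as the textarea, with every
// case-sensitive occurrence of `term` wrapped in <mark>.
function highlightToHtml(text: string, term: string): string {
  if (term === "") return escapeOverlayHtml(text);
  let out = "";
  let cursor = 0;
  for (;;) {
    const hit = text.indexOf(term, cursor);
    if (hit === -1) break;
    out += escapeOverlayHtml(text.slice(cursor, hit));
    out += "<mark>" + escapeOverlayHtml(term) + "</mark>";
    cursor = hit + term.length;
  }
  return out + escapeOverlayHtml(text.slice(cursor));
}

// Delay `fn` until `waitMs` have passed without another call.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number
): (...args: A) => void {
  let handle: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (handle !== undefined) clearTimeout(handle);
    handle = setTimeout(() => fn(...args), waitMs);
  };
}
```

In a page, the textarea's `input` listener would call a debounced function that sets the mirror element's `innerHTML` to `highlightToHtml(textarea.value, term)`. The escaping matters because the user's text ends up in `innerHTML`.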

morninglight

Simple RTF editors are surprisingly few. The AbiWord project looked promising but the MS Windows version ceased development and other versions didn't get much love. I'm surprised that even a lightweight RTF editor has difficulty finding a place.

Someone

> Simple RTF editors are surprisingly few

RTF isn’t standardized and, reading https://en.wikipedia.org/wiki/Rich_Text_Format I get the impression there are fairly wide differences between versions.

From the same page, it also isn’t simple:

- files can use at least 18 different text encodings.

- you can include Unicode code points, but if you do, they “must be followed by the nearest representation of this character in the specified code page”

- if you want to create a table in an RTF file, it seems you have to include an OLE object

- for compatibility with existing files, it seems an editor will have to support the Windows Metafile format.

So, I would guess any simple RTF editor would be incompatible with a significant fraction of existing RTF files.
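To make the Unicode point in the list above concrete, here is a small sketch of how a writer might emit text runs using RTF's `\uN` escapes, with `?` as the fallback character that follows each escape. It assumes the default of one fallback character per escape, ignores code pages entirely, and treats control characters like any other non-ASCII unit, which a real writer would not do (newlines usually become `\par`). The function name is made up.

```typescript
// Encode a string as RTF body text.
// - `\`, `{` and `}` are RTF syntax and must be backslash-escaped.
// - Printable ASCII passes through unchanged.
// - Everything else becomes \uN followed by a "?" fallback, where N is the
//   UTF-16 code unit written as a signed 16-bit decimal number.
function toRtfText(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    const unit = input.charCodeAt(i);
    if (ch === "\\" || ch === "{" || ch === "}") {
      out += "\\" + ch;
    } else if (unit >= 0x20 && unit < 0x7f) {
      out += ch;
    } else {
      // Units above 32767 are written as negative numbers.
      const signed = unit > 0x7fff ? unit - 0x10000 : unit;
      out += "\\u" + signed + "?";
    }
  }
  return out;
}
```

For example, `toRtfText("é")` yields `\u233?`, and a character outside the Basic Multilingual Plane comes out as two escapes, one per surrogate. A reader that does not understand `\u` sees only the `?` fallbacks, which is part of why round-tripping between RTF implementations is lossy.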

jitl

What’s the use-case? The system RTF editor on macOS is quite serviceable, and the Windows one isn’t too bad either.

morninglight

Agreed. However, it was nice to have source code available to work around minor annoyances. I guess most folks just live with what they got.

catapart

I feel the pain of this author, but between stuff like pell.js and more and more web devs understanding and taking advantage of template literals for rendering, we're not really far off from what is being described.

jitl

pell.js does nothing for a developer who wants to build an experience like Notion or Google Docs. Pell’s approach is to say “ok browser do whatever you want”. Very hard to make that collaborative, build complex UI within it, or even predict the resulting HTML structure coming out of it. No chance to do custom layout on top of Pell.

catapart

As a developer who built my own note app (like an anemic Notion), using pell.js as a backbone, I tend to disagree that it does nothing to help. I'll grant you that it is so lightweight as to not provide much luxury, but I find that far more useful than the 'kitchen sink' approach most libraries take. I don't need anyone to coddle me, I need someone to expose functionality to me in a way that works best for my intentions and architecture. pell.js wraps up inputs and delivers me formatted content. That's all I need it to do in order to take input from multiple users and provide that formatted content to the data store and other users. I was able to customize all the layouts I needed with the actions extensions that the library provides for. The lightweight, event-based functionality made my render stack the only slowdown for nesting new pell.js instances inside layout blocks so that I could use the top level instance to insert an inline image and a new pell.js span, and then that span's instance to format whatever blocks of content I wanted beside the image. I was honestly pretty surprised by how easily the utility just worked.

Not sure what all you are trying to do that template literals and a simple input formatter can't do, but if you elaborate on an example or something maybe I can see where the library's failures are?

jitl

> Not sure what all you are trying to do

I work on non-anemic Notion and built the current @-mention menu and overhauled the text editor system to enable cross-block text selection.

The criteria I use to evaluate a rich text editor component are:

- How does collaboration work? Does the framework give me enough for OT or CRDT bindings?

- Does it use a tree-like data structure or do you need to add this later?

- Can you easily build interactions like @-mention menu or tab completion like VS Code?

- Does the editor tolerate concurrent state updates without breaking Input Method Editor composition, i.e. a remote user comments on text while the current user is inputting a Chinese character?

- Do demos of the above features work on Android? In Chinese?

From reading the Pell source I didn’t see much that would help with my criteria.
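As an illustration of the IME criterion in the list above, one common tactic is to hold back remote updates while a composition is in progress and flush them when it ends. The sketch below shows only that queueing step; it is an assumption about one possible design, not how Notion, ProseMirror, or Lexical do it, and the class and method names are hypothetical. A real editor would also need to rebase the queued updates against whatever the composition changed.

```typescript
// Buffers remote updates while the local user is composing text with an IME.
class CompositionAwareQueue<U> {
  private composing = false;
  private pending: U[] = [];
  private readonly applyUpdate: (update: U) => void;

  constructor(applyUpdate: (update: U) => void) {
    this.applyUpdate = applyUpdate;
  }

  // Call from the editor's `compositionstart` listener.
  compositionStart(): void {
    this.composing = true;
  }

  // Call from the `compositionend` listener; flushes in arrival order.
  compositionEnd(): void {
    this.composing = false;
    const queued = this.pending;
    this.pending = [];
    for (const update of queued) this.applyUpdate(update);
  }

  // Call whenever an update arrives from another user.
  receiveRemote(update: U): void {
    if (this.composing) {
      this.pending.push(update);
    } else {
      this.applyUpdate(update);
    }
  }
}
```

The reason for deferring is that touching the DOM nodes under an active composition tends to cancel or corrupt it, and the exact failure mode differs by browser, OS, and keyboard, which is why the Android and Chinese-input demos in the criteria are useful smoke tests.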

catapart

You seem a little confused by my initial comment. What you seem to be saying, here, is that pell.js isn't a good rich text editor. What I said was that pell can be used to write a good rich text editor. If I were building a rich text editor, why would I want my input mechanism to build my tree or replicatable data bindings for me? And it's not like pell stands in the way of those things. It's a glorified text type input.

I'm just going to chalk this one up to a misunderstanding and hope you have a good day!

amw-zero

A perfect example of the limits of HTML rendering. It works for a large class of applications, but it's not the end of the line in terms of all UI.
